Add British Library books dataset (#3603) · huggingface/datasets@4c417d5

github-actions · 2022-01-31T17:12:27Z

PyArrow==3.0.0

Benchmark: benchmark_array_xd.json

metric	new / old (diff)
read_batch_formatted_as_numpy after write_array2d	0.010361 / 0.011353 (-0.000992)
read_batch_formatted_as_numpy after write_flattened_sequence	0.004780 / 0.011008 (-0.006228)
read_batch_formatted_as_numpy after write_nested_sequence	0.039831 / 0.038508 (0.001323)
read_batch_unformated after write_array2d	0.036507 / 0.023109 (0.013398)
read_batch_unformated after write_flattened_sequence	0.346391 / 0.275898 (0.070493)
read_batch_unformated after write_nested_sequence	0.375956 / 0.323480 (0.052476)
read_col_formatted_as_numpy after write_array2d	0.007648 / 0.007986 (-0.000337)
read_col_formatted_as_numpy after write_flattened_sequence	0.005414 / 0.004328 (0.001085)
read_col_formatted_as_numpy after write_nested_sequence	0.009900 / 0.004250 (0.005650)
read_col_unformated after write_array2d	0.039364 / 0.037052 (0.002312)
read_col_unformated after write_flattened_sequence	0.346798 / 0.258489 (0.088308)
read_col_unformated after write_nested_sequence	0.375626 / 0.293841 (0.081785)
read_formatted_as_numpy after write_array2d	0.043398 / 0.128546 (-0.085148)
read_formatted_as_numpy after write_flattened_sequence	0.014310 / 0.075646 (-0.061336)
read_formatted_as_numpy after write_nested_sequence	0.288432 / 0.419271 (-0.130839)
read_unformated after write_array2d	0.064930 / 0.043533 (0.021397)
read_unformated after write_flattened_sequence	0.344998 / 0.255139 (0.089859)
read_unformated after write_nested_sequence	0.348630 / 0.283200 (0.065430)
write_array2d	0.114255 / 0.141683 (-0.027428)
write_flattened_sequence	1.953525 / 1.452155 (0.501370)
write_nested_sequence	2.036390 / 1.492716 (0.543674)
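Each cell in these tables reads `new / old (diff)`, where diff is new minus old (for example 0.010361 - 0.011353 = -0.000992). The following is a minimal Python sketch for splitting a cell, checking that relationship, and turning it into a new/old ratio. The regex, the helper names `parse_cell` and `check_cell`, and the 1e-5 rounding tolerance are assumptions of this sketch, not part of the benchmark tooling.

```python
import re

# Assumed cell shape: "<new> / <old> (<diff>)", each printed with six decimals.
CELL_RE = re.compile(r"^\s*(-?\d+\.\d+)\s*/\s*(-?\d+\.\d+)\s*\((-?\d+\.\d+)\)\s*$")


def parse_cell(cell: str) -> tuple[float, float, float]:
    """Split a 'new / old (diff)' cell into its three floats."""
    match = CELL_RE.match(cell)
    if match is None:
        raise ValueError(f"not a 'new / old (diff)' cell: {cell!r}")
    new, old, diff = (float(g) for g in match.groups())
    return new, old, diff


def check_cell(cell: str, tol: float = 1e-5) -> float:
    """Return new/old after confirming diff == new - old up to rounding."""
    new, old, diff = parse_cell(cell)
    # The printed values are rounded, so new - old can differ from diff
    # in the last digit (e.g. 0.007648 - 0.007986 vs -0.000337).
    if abs((new - old) - diff) > tol:
        raise ValueError(f"diff does not match new - old in {cell!r}")
    return new / old


# Two cells copied from the PyArrow==3.0.0 benchmark_array_xd.json table.
for cell in [
    "0.010361 / 0.011353 (-0.000992)",  # read_batch_formatted_as_numpy after write_array2d
    "1.953525 / 1.452155 (0.501370)",   # write_flattened_sequence
]:
    print(f"{cell}: new/old = {check_cell(cell):.2f}")
```

A ratio below 1 means the new number is smaller than the old one; the sketch makes no claim about units.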

Benchmark: benchmark_getitem_100B.json

metric	new / old (diff)
get_batch_of_1024_random_rows	0.338470 / 0.018006 (0.320464)
get_batch_of_1024_rows	0.533627 / 0.000490 (0.533138)
get_first_row	0.048015 / 0.000200 (0.047815)
get_last_row	0.000739 / 0.000054 (0.000684)

Benchmark: benchmark_indices_mapping.json

metric	new / old (diff)
select	0.042112 / 0.037411 (0.004700)
shard	0.027665 / 0.014526 (0.013139)
shuffle	0.033448 / 0.176557 (-0.143109)
sort	0.075411 / 0.737135 (-0.661725)
train_test_split	0.046408 / 0.296338 (-0.249931)

Benchmark: benchmark_iterating.json

metric	new / old (diff)
read 5000	0.552801 / 0.215209 (0.337592)
read 50000	5.599749 / 2.077655 (3.522094)
read_batch 50000 10	2.259652 / 1.504120 (0.755532)
read_batch 50000 100	1.911626 / 1.541195 (0.370431)
read_batch 50000 1000	1.901316 / 1.468490 (0.432826)
read_formatted numpy 5000	0.691034 / 4.584777 (-3.893743)
read_formatted pandas 5000	6.638672 / 3.745712 (2.892959)
read_formatted tensorflow 5000	3.134887 / 5.269862 (-2.134975)
read_formatted torch 5000	1.611833 / 4.565676 (-2.953843)
read_formatted_batch numpy 5000 10	0.084587 / 0.424275 (-0.339688)
read_formatted_batch numpy 5000 1000	0.014531 / 0.007607 (0.006924)
shuffled read 5000	0.731351 / 0.226044 (0.505307)
shuffled read 50000	7.883939 / 2.268929 (5.615010)
shuffled read_batch 50000 10	3.047520 / 55.444624 (-52.397104)
shuffled read_batch 50000 100	2.419689 / 6.876477 (-4.456788)
shuffled read_batch 50000 1000	2.416134 / 2.142072 (0.274062)
shuffled read_formatted numpy 5000	0.890184 / 4.805227 (-3.915043)
shuffled read_formatted_batch numpy 5000 10	0.173913 / 6.500664 (-6.326751)
shuffled read_formatted_batch numpy 5000 1000	0.070673 / 0.075469 (-0.004796)

Benchmark: benchmark_map_filter.json

metric	new / old (diff)
filter	1.807622 / 1.841788 (-0.034166)
map fast-tokenizer batched	15.535543 / 8.074308 (7.461235)
map identity	44.617659 / 10.191392 (34.426267)
map identity batched	1.179021 / 0.680424 (0.498597)
map no-op batched	0.650141 / 0.534201 (0.115940)
map no-op batched numpy	0.564323 / 0.579283 (-0.014960)
map no-op batched pandas	0.700177 / 0.434364 (0.265814)
map no-op batched pytorch	0.378618 / 0.540337 (-0.161720)
map no-op batched tensorflow	0.394635 / 1.386936 (-0.992301)
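Rows in the map/filter table move in both directions. Assuming the values are durations where lower is better (the comment does not state units), a sketch like the one below lists the metrics whose new value exceeds the old one by more than a chosen factor. The 1.5x threshold and the helper names `ratios` and `flag_slower` are hypothetical choices for illustration; the cells are copied from the PyArrow==3.0.0 benchmark_map_filter.json table.

```python
import re

# Assumed cell shape: "<new> / <old> (<diff>)".
CELL_RE = re.compile(r"(-?\d+\.\d+)\s*/\s*(-?\d+\.\d+)\s*\((-?\d+\.\d+)\)")

# Hypothetical threshold: flag a metric when new is more than 1.5x old.
THRESHOLD = 1.5

# A subset of cells from the PyArrow==3.0.0 benchmark_map_filter.json table.
MAP_FILTER = {
    "filter": "1.807622 / 1.841788 (-0.034166)",
    "map fast-tokenizer batched": "15.535543 / 8.074308 (7.461235)",
    "map identity": "44.617659 / 10.191392 (34.426267)",
    "map no-op batched tensorflow": "0.394635 / 1.386936 (-0.992301)",
}


def ratios(table: dict[str, str]) -> dict[str, float]:
    """Map each metric name to its new/old ratio."""
    out = {}
    for metric, cell in table.items():
        match = CELL_RE.search(cell)
        if match is None:
            raise ValueError(f"unparseable cell for {metric!r}: {cell!r}")
        new, old = float(match.group(1)), float(match.group(2))
        out[metric] = new / old
    return out


def flag_slower(table: dict[str, str], threshold: float = THRESHOLD) -> list[str]:
    """Metrics whose new/old ratio exceeds the threshold, largest ratio first."""
    r = ratios(table)
    slow = [metric for metric, ratio in r.items() if ratio > threshold]
    return sorted(slow, key=r.get, reverse=True)


for metric in flag_slower(MAP_FILTER):
    print(f"{metric}: {ratios(MAP_FILTER)[metric]:.2f}x")
```

Only the relative size of the numbers is used here, so the result does not depend on what unit the benchmarks report.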

PyArrow==latest

Benchmark: benchmark_array_xd.json

metric	new / old (diff)
read_batch_formatted_as_numpy after write_array2d	0.009029 / 0.011353 (-0.002324)
read_batch_formatted_as_numpy after write_flattened_sequence	0.007192 / 0.011008 (-0.003816)
read_batch_formatted_as_numpy after write_nested_sequence	0.033446 / 0.038508 (-0.005062)
read_batch_unformated after write_array2d	0.033247 / 0.023109 (0.010138)
read_batch_unformated after write_flattened_sequence	0.367490 / 0.275898 (0.091592)
read_batch_unformated after write_nested_sequence	0.356706 / 0.323480 (0.033226)
read_col_formatted_as_numpy after write_array2d	0.006336 / 0.007986 (-0.001649)
read_col_formatted_as_numpy after write_flattened_sequence	0.003922 / 0.004328 (-0.000407)
read_col_formatted_as_numpy after write_nested_sequence	0.007877 / 0.004250 (0.003627)
read_col_unformated after write_array2d	0.035214 / 0.037052 (-0.001839)
read_col_unformated after write_flattened_sequence	0.330208 / 0.258489 (0.071719)
read_col_unformated after write_nested_sequence	0.365098 / 0.293841 (0.071257)
read_formatted_as_numpy after write_array2d	0.043389 / 0.128546 (-0.085158)
read_formatted_as_numpy after write_flattened_sequence	0.012936 / 0.075646 (-0.062710)
read_formatted_as_numpy after write_nested_sequence	0.280270 / 0.419271 (-0.139002)
read_unformated after write_array2d	0.070812 / 0.043533 (0.027279)
read_unformated after write_flattened_sequence	0.341617 / 0.255139 (0.086478)
read_unformated after write_nested_sequence	0.373899 / 0.283200 (0.090700)
write_array2d	0.107289 / 0.141683 (-0.034394)
write_flattened_sequence	1.967412 / 1.452155 (0.515258)
write_nested_sequence	1.992520 / 1.492716 (0.499804)

Benchmark: benchmark_getitem_100B.json

metric	new / old (diff)
get_batch_of_1024_random_rows	0.269765 / 0.018006 (0.251759)
get_batch_of_1024_rows	0.518831 / 0.000490 (0.518341)
get_first_row	0.000693 / 0.000200 (0.000493)
get_last_row	0.000091 / 0.000054 (0.000037)

Benchmark: benchmark_indices_mapping.json

metric	new / old (diff)
select	0.035021 / 0.037411 (-0.002390)
shard	0.023836 / 0.014526 (0.009310)
shuffle	0.030891 / 0.176557 (-0.145666)
sort	0.076910 / 0.737135 (-0.660226)
train_test_split	0.036604 / 0.296338 (-0.259735)

Benchmark: benchmark_iterating.json

metric	new / old (diff)
read 5000	0.598111 / 0.215209 (0.382902)
read 50000	5.884815 / 2.077655 (3.807160)
read_batch 50000 10	2.213402 / 1.504120 (0.709282)
read_batch 50000 100	1.872340 / 1.541195 (0.331145)
read_batch 50000 1000	1.890033 / 1.468490 (0.421543)
read_formatted numpy 5000	0.712668 / 4.584777 (-3.872109)
read_formatted pandas 5000	6.389394 / 3.745712 (2.643682)
read_formatted tensorflow 5000	2.861361 / 5.269862 (-2.408501)
read_formatted torch 5000	1.478328 / 4.565676 (-3.087349)
read_formatted_batch numpy 5000 10	0.083284 / 0.424275 (-0.340991)
read_formatted_batch numpy 5000 1000	0.058594 / 0.007607 (0.050987)
shuffled read 5000	0.748062 / 0.226044 (0.522017)
shuffled read 50000	7.402994 / 2.268929 (5.134066)
shuffled read_batch 50000 10	3.045604 / 55.444624 (-52.399021)
shuffled read_batch 50000 100	2.255251 / 6.876477 (-4.621226)
shuffled read_batch 50000 1000	2.228919 / 2.142072 (0.086847)
shuffled read_formatted numpy 5000	0.887659 / 4.805227 (-3.917568)
shuffled read_formatted_batch numpy 5000 10	0.172469 / 6.500664 (-6.328195)
shuffled read_formatted_batch numpy 5000 1000	0.068655 / 0.075469 (-0.006814)

Benchmark: benchmark_map_filter.json

metric	new / old (diff)
filter	1.912016 / 1.841788 (0.070229)
map fast-tokenizer batched	15.133940 / 8.074308 (7.059632)
map identity	41.568732 / 10.191392 (31.377340)
map identity batched	1.060358 / 0.680424 (0.379935)
map no-op batched	0.596030 / 0.534201 (0.061829)
map no-op batched numpy	0.548450 / 0.579283 (-0.030833)
map no-op batched pandas	0.730354 / 0.434364 (0.295990)
map no-op batched pytorch	0.378266 / 0.540337 (-0.162072)
map no-op batched tensorflow	0.394699 / 1.386936 (-0.992237)
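The PyArrow==3.0.0 and PyArrow==latest runs report identical old values (for instance 0.018006 for get_batch_of_1024_random_rows), so their new values can be compared directly. The sketch below computes the relative change from the PyArrow==3.0.0 run to the PyArrow==latest run for benchmark_getitem_100B.json. The dictionaries are copied from the two tables; the helper name `relative_change` is an assumption of this sketch.

```python
# New values for benchmark_getitem_100B.json, copied from the two runs.
PYARROW_3 = {
    "get_batch_of_1024_random_rows": 0.338470,
    "get_batch_of_1024_rows": 0.533627,
    "get_first_row": 0.048015,
    "get_last_row": 0.000739,
}
PYARROW_LATEST = {
    "get_batch_of_1024_random_rows": 0.269765,
    "get_batch_of_1024_rows": 0.518831,
    "get_first_row": 0.000693,
    "get_last_row": 0.000091,
}


def relative_change(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """(after - before) / before for every metric present in both runs."""
    return {
        metric: (after[metric] - before[metric]) / before[metric]
        for metric in before
        if metric in after
    }


for metric, change in relative_change(PYARROW_3, PYARROW_LATEST).items():
    # Negative values mean the PyArrow==latest run reported a smaller number.
    print(f"{metric}: {change:+.1%}")
```

This compares single reported numbers only; it says nothing about why the two runs differ.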

1 comment on commit `4c417d5`

github-actions bot commented on `4c417d5` Jan 31, 2022