fix parquet to arrow script failed when number of samples is small (#301)

key_format becomes negative

Co-authored-by: Romain Beaumont <[email protected]>
luke-han and rom1504 authored Jan 6, 2024
1 parent c4e6615 commit 9fe1276
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion clip_retrieval/clip_back_prepro/parquet_to_arrow.py
@@ -36,7 +36,7 @@ def parquet_to_arrow(parquet_folder, output_arrow_folder, columns_to_return):
     sink = None
     current_batch_count = 0
     batch_counter = 0
-    key_format = int(math.log10(number_samples / 10**10)) + 1
+    key_format = max(0, int(math.log10(number_samples / 10**10))) + 1
     for parquet_files in tqdm(files):
         if sink is None or current_batch_count > 10**10:
             if sink is not None:
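The effect of the one-line change can be checked in isolation. The sketch below copies the two `key_format` expressions from the diff into standalone helper functions; the names `key_format_before` and `key_format_after` and the sample counts are hypothetical, chosen only for illustration, and nothing here reproduces how the rest of `parquet_to_arrow.py` uses the value.

```python
import math


def key_format_before(number_samples):
    # Expression as it stood before the commit: for sample counts far below
    # 10**10 the logarithm is negative, so the result can drop to 0 or lower.
    return int(math.log10(number_samples / 10**10)) + 1


def key_format_after(number_samples):
    # Expression after the commit: the truncated logarithm is clamped at 0,
    # so the result is never smaller than 1.
    return max(0, int(math.log10(number_samples / 10**10))) + 1


# Illustrative sample counts (assumptions, not taken from the repository).
for n in (5_000, 5 * 10**9, 3 * 10**11):
    print(n, key_format_before(n), key_format_after(n))
```

For a small dataset such as 5,000 samples the old expression yields -5, while the patched one yields 1. For counts above 10**10 the two expressions agree, so the clamp appears to affect only the small-sample case named in the commit title.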
