Fix parquet-to-arrow script failing when the number of samples is small
With a small number of samples, key_format becomes negative.
luke-han committed Aug 7, 2023
1 parent b2461fe commit ddf3d08
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion clip_retrieval/clip_back_prepro/parquet_to_arrow.py
@@ -36,7 +36,7 @@ def parquet_to_arrow(parquet_folder, output_arrow_folder, columns_to_return):
     sink = None
     current_batch_count = 0
     batch_counter = 0
-    key_format = int(math.log10(number_samples / 10**10)) + 1
+    key_format = max(0, int(math.log10(number_samples / 10**10))) + 1
     for parquet_files in tqdm(files):
         if sink is None or current_batch_count > 10**10:
             if sink is not None:
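To make the effect of the one-line change concrete, here is a minimal sketch that evaluates the removed and the added expression side by side. The helper names `key_format_before` and `key_format_after` are hypothetical and exist only for this illustration; the two expressions are copied from the diff. How `key_format` is consumed later in the script lies outside the lines shown, so the sketch only compares the computed values.

```python
import math


def key_format_before(number_samples: int) -> int:
    # Expression removed by this commit (hypothetical wrapper name).
    return int(math.log10(number_samples / 10**10)) + 1


def key_format_after(number_samples: int) -> int:
    # Expression added by this commit: the exponent is clamped at 0,
    # so the result is always at least 1.
    return max(0, int(math.log10(number_samples / 10**10))) + 1


if __name__ == "__main__":
    for n in (1_000, 10**10, 10**12):
        print(n, key_format_before(n), key_format_after(n))
```

For sample counts well below 10**10 the old expression is negative, while the new one returns 1; from 10**10 upward the two agree. Note that `number_samples == 0` still raises `ValueError` in both versions, because `math.log10(0.0)` is undefined.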

