Hi,

I've been experimenting with smart_open and can't figure out how to ensure that files are consistent when copying data between GCS and S3 (and vice versa). Here's roughly what I'm doing:
```python
from smart_open import open

with open(f"gs://...", mode="rb", transport_params=dict(client=gcs_client)) as fin:
    with open(f"s3://...", mode="wb", transport_params=s3_tp) as fout:
        for line in fin:
            fout.write(line)
```
The ETags don't match (which I guess is expected), but the files also differ in size when copied from GCS to S3: gsutil shows 1340495 bytes, while the copy on S3 is 1291979 bytes (though the file itself seems fine).
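For what it's worth, here is roughly how I compare the metadata on each side (a sketch; the bucket and key names are placeholders, and `gcs_bucket`/`s3_client` are the usual google-cloud-storage and boto3 clients):

```python
# GCS side: blob metadata via google-cloud-storage.
blob = gcs_bucket.get_blob("some/key")  # placeholder key
print(blob.size, blob.etag)             # gsutil reports 1340495 bytes here

# S3 side: object metadata via boto3.
head = s3_client.head_object(Bucket="some-bucket", Key="some/key")
print(head["ContentLength"], head["ETag"])  # 1291979 bytes after the copy
```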
I've tried turning off S3 `multipart_upload`, but that doesn't change the behaviour.
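Concretely, I disabled it through the transport params, something like this (assuming smart_open's `multipart_upload` flag for the S3 writer):

```python
import boto3
from smart_open import open

# Assumption: multipart_upload=False makes smart_open buffer the object
# and upload it with a single PUT instead of a multipart upload.
s3_tp = dict(client=boto3.client("s3"), multipart_upload=False)

with open(f"s3://...", mode="wb", transport_params=s3_tp) as fout:
    fout.write(b"...")
```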
If I instead use the ordinary way below to read/write the files, the size of the file taken from GCS and written to S3 matches, so I can build a validation process on top of it:
```python
import io

# `blobs` is the result of listing the source GCS bucket,
# e.g. gcs_bucket.list_blobs(); `s3_client` is a boto3 S3 client.
for blob in blobs:
    buffer = io.BytesIO()
    blob.download_to_file(buffer)  # download the whole blob into memory
    buffer.seek(0)                 # rewind before handing it to boto3
    s3_client.put_object(Body=buffer, Bucket="...", Key=blob.name)
```
Which mechanism can be used to validate file consistency after a copy?
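For example, I imagine something like streaming both copies and comparing content digests (a rough sketch; the URIs are placeholders and `gcs_client`/`s3_tp` are the same objects as above):

```python
import hashlib
from smart_open import open

def stream_md5(fileobj, chunk_size=1024 * 1024):
    """Compute an MD5 digest of a file-like object, reading in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()

with open(f"gs://...", mode="rb", transport_params=dict(client=gcs_client)) as f:
    gcs_md5 = stream_md5(f)

with open(f"s3://...", mode="rb", transport_params=s3_tp) as f:
    s3_md5 = stream_md5(f)

assert gcs_md5 == s3_md5, "copied object differs from the source"
```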