[Enhancement] - Enable Unloading Redshift Tables to S3 in either JSON, PARQUET or CSV format #1052

Open: wants to merge 9 commits into base: main


NirTatcher
Contributor

@NirTatcher NirTatcher commented May 12, 2024

  • Adds a format parameter to the function that unloads from Redshift to S3, accepting CSV, JSON, or PARQUET. PARQUET is highly recommended for large tables: Parquet files are smaller than CSVs and can be read and written much faster. Currently the function can only unload as a TXT file.
  • Includes only the options each format supports in the UNLOAD query.
  • Unloading as CSV does not support ADDQUOTES.
  • Unloading as PARQUET does not support the DELIMITER, ADDQUOTES, ESCAPE, NULL AS, HEADER, or GZIP (compression) options.
  • Unloading as JSON does not support the DELIMITER, HEADER, ADDQUOTES, ESCAPE, or NULL AS options.
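A minimal sketch of the option filtering described above, in Python since Parsons is a Python library. `build_unload_statement` and all of its parameters are hypothetical illustrations, not the PR's actual code or the Parsons `Redshift.unload` signature. Authorizing through `IAM_ROLE` is also an assumption, and the per-format exclusion lists simply mirror the bullets above, so they should be checked against the AWS UNLOAD documentation before being relied on.

```python
# Hypothetical helper, NOT the Parsons API: builds a Redshift UNLOAD statement
# and drops any option that the requested FORMAT does not accept.

# Options each format rejects, mirroring the PR description (verify against AWS docs).
UNSUPPORTED_OPTIONS = {
    "TEXT": set(),  # default pipe-delimited text output
    "CSV": {"ADDQUOTES"},
    "JSON": {"DELIMITER", "HEADER", "ADDQUOTES", "ESCAPE", "NULL AS"},
    "PARQUET": {"DELIMITER", "HEADER", "ADDQUOTES", "ESCAPE", "NULL AS", "GZIP"},
}


def build_unload_statement(query, s3_prefix, iam_role, fmt="TEXT", delimiter="|",
                           header=True, add_quotes=True, escape=True,
                           null_as=None, gzip=True):
    """Return an UNLOAD statement containing only options valid for `fmt`."""
    fmt = fmt.upper()
    if fmt not in UNSUPPORTED_OPTIONS:
        raise ValueError(f"Unsupported format: {fmt!r}")
    skip = UNSUPPORTED_OPTIONS[fmt]

    # Single quotes inside the query must be doubled within UNLOAD ('...').
    escaped_query = query.replace("'", "''")
    parts = [
        f"UNLOAD ('{escaped_query}')",
        f"TO '{s3_prefix}'",
        f"IAM_ROLE '{iam_role}'",  # assumed auth method for this sketch
    ]
    if fmt != "TEXT":
        parts.append(f"FORMAT AS {fmt}")

    # (keyword used for filtering, SQL fragment, whether the caller enabled it)
    candidates = [
        ("DELIMITER", f"DELIMITER AS '{delimiter}'", delimiter is not None),
        ("HEADER", "HEADER", header),
        ("ADDQUOTES", "ADDQUOTES", add_quotes),
        ("ESCAPE", "ESCAPE", escape),
        ("NULL AS", f"NULL AS '{null_as}'", null_as is not None),
        ("GZIP", "GZIP", gzip),
    ]
    for keyword, fragment, enabled in candidates:
        if enabled and keyword not in skip:
            parts.append(fragment)
    return "\n".join(parts)


if __name__ == "__main__":
    # Hypothetical table, bucket, and role, for illustration only.
    print(build_unload_statement(
        "select * from votes where state = 'NV'",
        "s3://my-bucket/votes/",
        "arn:aws:iam::123456789012:role/MyRedshiftRole",
        fmt="parquet",
    ))
```

With `fmt="parquet"`, every delimiter, quoting, header, and GZIP option is dropped, leaving only the query, S3 prefix, role, and `FORMAT AS PARQUET`. Keeping a single format parameter backed by one lookup table keeps the compatibility rules in one place; separate per-format functions would each have to repeat their own subset.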

I hope this will be useful to others besides me and that it passes all of the live tests. @shaunagm, please let me know if any changes are needed to get this merged.
Thank you!
More on PARQUET here.
More on UNLOAD here.

@NirTatcher NirTatcher changed the title trying to add the format parameter to the unload from redshift to S3 … [Enhancement] - Enable Unloading Redshift Tables in either JSON, PARQUET or CSV format May 12, 2024
@NirTatcher NirTatcher changed the title [Enhancement] - Enable Unloading Redshift Tables in either JSON, PARQUET or CSV format [Enhancement] - Enable Unloading Redshift Tables to S3 in either JSON, PARQUET or CSV format May 12, 2024
@shaunagm
Collaborator

Thanks so much @NirTatcher! This looks good to me but I'd love it if someone with more Redshift knowledge gave it a look too. Maybe @austinweisgrau or @Jason94?

@NirTatcher
Contributor Author

Sure thing @shaunagm!
If it would make things easier, I can split this into separate functions: unload (as text by default), unload as CSV, unload as JSON, and unload as PARQUET.
Also, I believe unloading from Redshift to S3 as CSV, JSON, or PARQUET is much more powerful (at least for my use) than unloading in text format.
I'll be waiting and would be happy to receive any feedback from @Jason94 or @austinweisgrau. Thank you all in advance!

@NirTatcher
Contributor Author

@austinweisgrau, once you or anyone else gets to it, I would be glad to get feedback on this so we can let people unload tables into S3 buckets in formats other than TXT.
Thank you in advance!
