-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
download script #1
base: main
Are you sure you want to change the base?
Conversation
=== Do not change lines below === { "chain": [], "cmd": "poetry run python3 scripts/download_all_domains.py", "dsid": "934d913e-8268-4342-aa0c-3702ee516d10", "exit": 0, "extra_inputs": [], "inputs": [], "outputs": [], "pwd": "." } ^^^ Do not change lines above ^^^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good and ready to go ahead with data reading. I've added a few minor comments.
scripts/remove_downloads.py
Outdated
from faostat_data_primap.helper.definitions import downloaded_data_path | ||
|
||
|
||
def run(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the delete function good for? Why would we want to delete version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't really need this script. I've used it to test the work flow. Once I have the tests set up I can delete it
scripts/remove_downloads.py
Outdated
files_to_delete = os.listdir(path_to_files) | ||
|
||
for file in files_to_delete: | ||
path_to_file = path_to_files / file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deletion of committed files only works after unlocking them using datalad unlock
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to know
|
||
Returns | ||
------- | ||
True if the file was downloaded, False if a cached file was found |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case there is no file, but download fails, requests throws and error, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty sure it does. I can try and make a test for that
scripts/download_all_domains.py
Outdated
|
||
|
||
def download_all_domains(sources: list[tuple[str]]) -> list[str]: | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could add downloading of the metadata available with the data. Having the methodology descriptions next to the data could be quite helpful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But It's less important and can be added later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be good to keep old version of the methodology description docs though, to see if things have changed. I assume it will not always be updated, so we might want to use checksums and only store new versions of they have changed and symlink to the old version instead. That could quickly rule out methodology changes (if they are not updated yearly)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good and ready to go ahead with data reading. I've added a few minor comments.
=== Do not change lines below === { "chain": [], "cmd": "poetry run python3 scripts/download_all_domains.py", "dsid": "934d913e-8268-4342-aa0c-3702ee516d10", "exit": 0, "extra_inputs": [], "inputs": [], "outputs": [], "pwd": "." } ^^^ Do not change lines above ^^^
=== Do not change lines below === { "chain": [], "cmd": "poetry run python3 scripts/download_all_domains.py", "dsid": "934d913e-8268-4342-aa0c-3702ee516d10", "exit": 0, "extra_inputs": [], "inputs": [], "outputs": [], "pwd": "." } ^^^ Do not change lines above ^^^
@JGuetschow this is ready to be reviewed again. Since you've looked at it last time:
|
Description
Write a script that downloads all climate data from the FAOSTAT website.
We need to:
downloaded_data
directoryChecklist
Please confirm that this pull request has done the following:
changelog/
Notes
2023-11-09
.gitattributes
to consider.zip
files as well