Some possibilities for these data files that might help us scale to much larger datasets while keeping things centralized in git (at least for this more "researchy" repo):
Store raw data files with git-lfs. This gives us some configuration and control around downloading potentially large files. The current raw files aren't that big, but I imagine we'll be stepping into some larger ones soon.
Potentially add the processed directory to .gitignore, and provide instructions in the readme for running the cleaning pipeline and generating the processed data files. My reasoning here is that this will a) reduce repo size and b) ensure that new clones always have up-to-date cleaned files (i.e. forgetting to run the cleaning pipeline after changing it won't be an issue for new users).
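Here is a rough sketch of what both suggestions could look like in practice. The paths data/raw/ and data/processed/ are assumptions for illustration only, so adjust them to wherever the files actually live in this repo.

```shell
# Hypothetical layout: raw files under data/raw/, pipeline output under data/processed/.

# Route everything under data/raw/ through Git LFS. This is the same line that
# `git lfs track "data/raw/**"` would append to .gitattributes.
echo 'data/raw/** filter=lfs diff=lfs merge=lfs -text' >> .gitattributes

# Ignore the generated files; they get rebuilt by running the cleaning pipeline.
echo 'data/processed/' >> .gitignore
```

Two caveats: anyone cloning would need git-lfs installed (git lfs install) to get the real raw files instead of pointer files, and processed files that are already committed stay tracked until they are removed from the index, e.g. with git rm -r --cached on that directory.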
Anyway, these are just ideas. They're my preferences, but we definitely don't have to use them. (GitHub-hosted repos do have a maximum size, though I don't recall what it is.)
Originally posted by @azane in #14 (comment)