Drop session_path column from datasets cache table #134

Open
6 tasks done
k1o0 opened this issue Oct 7, 2024 · 0 comments
k1o0 commented Oct 7, 2024

By dropping the redundant session_path column from the datasets cache table and using joins instead, we would reduce the table's in-memory size by about 13%-23%, at the cost of a (hopefully) negligible increase in the time taken to load datasets.
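A minimal sketch of the join idea follows. It assumes the cache tables are pandas DataFrames, that the sessions table is indexed by session ID ("eid") with lab, subject, date and number columns, and that the datasets table stores only the eid. The table contents, the `session_paths` helper and the lab/Subjects/subject/date/number path layout are illustrative assumptions, not the project's actual code.

```python
import pandas as pd

# Hypothetical, minimal cache tables; column names and values are for illustration only.
sessions = pd.DataFrame(
    {
        "lab": ["cortexlab", "mainenlab"],
        "subject": ["KS005", "ZM_1743"],
        "date": ["2019-04-10", "2019-06-14"],
        "number": [1, 2],
    },
    index=pd.Index(["eid-a", "eid-b"], name="eid"),
)

# The datasets table keeps only the session ID, with no session_path column.
datasets = pd.DataFrame(
    {
        "eid": ["eid-a", "eid-a", "eid-b"],
        "rel_path": ["alf/trials.table.pqt", "alf/wheel.position.npy", "alf/trials.table.pqt"],
    }
)


def session_paths(datasets: pd.DataFrame, sessions: pd.DataFrame) -> pd.Series:
    """Derive each dataset's session path on demand by joining on the session ID.

    Assumes every eid in `datasets` is present in `sessions`.
    """
    # Left join: the 'eid' column of datasets is matched against the index of sessions.
    joined = datasets.join(sessions, on="eid")
    return (
        joined["lab"] + "/Subjects/" + joined["subject"] + "/" + joined["date"]
        + "/" + joined["number"].astype(str).str.zfill(3)  # zero-pad session number
    )
```

The trade-off described in the issue is visible here: the path strings are rebuilt on each lookup instead of being stored once per dataset row, which is where the memory saving comes from.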

  • Drop session_path column in cache loader
  • Method to get session path from Series (ab40955)
  • Remove session_path from standard column list and local cache generator functions
  • Handle missing session behaviour when loading aggregate datasets
  • Update converter methods to handle missing session path column (i.e. record2url, path2url)
  • Update download_datasets and download_aws to handle session path lookups
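For the "session path from Series" and "missing session" items, a per-record lookup might look like the sketch below. Both function names are hypothetical (the method actually added in ab40955 may differ), and the sketch assumes a sessions table indexed by eid with lab, subject, date and number columns. A dataset whose session is absent from the cache gets None instead of raising a KeyError.

```python
import pandas as pd


def session_path_from_series(record: pd.Series) -> str:
    """Build a relative session path (lab/Subjects/subject/date/number) from a sessions record."""
    return "/".join(
        [
            str(record["lab"]),
            "Subjects",
            str(record["subject"]),
            str(record["date"]),
            str(int(record["number"])).zfill(3),  # session number zero-padded, e.g. 1 -> "001"
        ]
    )


def lookup_session_path(eid: str, sessions: pd.DataFrame):
    """Return the session path for eid, or None when the session is not in the cache table."""
    if eid not in sessions.index:
        return None
    return session_path_from_series(sessions.loc[eid])


# Hypothetical one-row sessions table for demonstration.
sessions = pd.DataFrame(
    {"lab": ["cortexlab"], "subject": ["KS005"], "date": ["2019-04-10"], "number": [1]},
    index=pd.Index(["eid-a"], name="eid"),
)
print(lookup_session_path("eid-a", sessions))        # cortexlab/Subjects/KS005/2019-04-10/001
print(lookup_session_path("eid-missing", sessions))  # None
```

Returning None keeps callers such as download or URL-conversion routines in control of how to treat datasets without a session record, rather than failing inside the lookup.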
k1o0 self-assigned this Oct 7, 2024
k1o0 added a commit that referenced this issue Oct 10, 2024
k1o0 added a commit that referenced this issue Oct 10, 2024
* Resolves issue #134