Major Refactor #14

Merged
merged 8 commits into from
Oct 14, 2023


harrykeightley
Contributor

  • Make datasets library scale appropriately on larger datasets.
  • Major refactor to trainer and processing code within lib.

cause it to remove that character entirely from the vocab.
- Force absolute audio path resolution for dataset files
- Manually load in audio from paths with librosa
- More logging

* Refactor trainer code to give the user as much flexibility as possible over training options.
* Update tests
* Finally get a working wav2vec2 training run.
* Add example script for working wav2vec2 run.
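
For readers who want a concrete picture of the "absolute audio path resolution" and "load in audio from paths with librosa" items above, here is a minimal sketch. It is not the code from this PR: the function names, the `audio_path` column name, the `dataset_root` argument, and the 16 kHz default are all assumptions made for illustration. The only library call used is `librosa.load(path, sr=...)`, which returns the samples and the sampling rate.

```python
from pathlib import Path
from typing import Any, Dict


def resolve_audio_path(
    example: Dict[str, Any],
    dataset_root: Path,
    path_column: str = "audio_path",  # hypothetical column name
) -> Dict[str, Any]:
    """Return a copy of `example` whose audio path is absolute."""
    raw = Path(example[path_column])
    # Relative paths are interpreted relative to the dataset root;
    # paths that are already absolute are kept as they are.
    absolute = raw if raw.is_absolute() else dataset_root / raw
    return {**example, path_column: str(absolute.resolve())}


def load_audio(
    example: Dict[str, Any],
    path_column: str = "audio_path",  # hypothetical column name
    sampling_rate: int = 16_000,  # assumed target rate, not taken from the PR
) -> Dict[str, Any]:
    """Load the waveform for `example` from disk with librosa."""
    # Imported lazily so that path resolution works without librosa installed.
    import librosa

    samples, rate = librosa.load(example[path_column], sr=sampling_rate)
    return {**example, "audio": {"array": samples, "sampling_rate": rate}}
```

Both functions take and return a plain dict, so they could be applied row by row, for example through a dataset's `map` step. Resolving paths before loading means the loader does not depend on the current working directory.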
@harrykeightley
Contributor Author

CI tests are failing due to an open issue with torch: pytorch/pytorch#100974
However, tests are passing locally, so I'm gonna push.

@harrykeightley harrykeightley merged commit b5da7f9 into main Oct 14, 2023
1 of 2 checks passed
@harrykeightley harrykeightley deleted the dataset-upgrades branch October 14, 2023 09:24