Build custom Tokenizer and custom Processor flows for wav2vec2 models. #7

Merged: 7 commits, Sep 19, 2023

Contributor

@harrykeightley harrykeightley commented Sep 8, 2023

  • Allow custom processor building in the trainer flow.
  • Check whether the job is for a wav2vec2 model and, if so, create a custom tokenizer and feature extractor.
  • As a result, predictions are no longer filled with "unks" (unknown tokens) when fine-tuning:
[Screenshot, 2023-09-08 at 5:13:00 pm]

Obviously the prediction here isn't any good, but that might just be because the model was trained on a dataset of 1.
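
For readers unfamiliar with the idea, here is a rough sketch of what building a custom tokenizer and feature extractor for a wav2vec2 job can look like. It is not the code from this PR: `build_vocab`, `build_processor`, the `vocab.json` location, and the 16 kHz sampling rate are all assumptions made for illustration. The sketch builds a character vocabulary from the training transcripts, which is one common way to stop characters missing from a stock vocabulary from decoding as unknown tokens.

```python
import json
from pathlib import Path


def build_vocab(transcripts):
    """Hypothetical helper: map each character in the transcripts to an id."""
    chars = set("".join(transcripts))
    # wav2vec2 CTC tokenizers conventionally mark word boundaries with "|"
    # rather than a literal space.
    if " " in chars:
        chars.discard(" ")
        chars.add("|")
    vocab = {ch: i for i, ch in enumerate(sorted(chars))}
    # Special tokens go after the dataset characters so ids stay contiguous.
    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)
    return vocab


def build_processor(vocab, out_dir):
    """Hypothetical helper: wrap a custom tokenizer and feature extractor."""
    # Imported lazily so build_vocab can be used without transformers installed.
    from transformers import (
        Wav2Vec2CTCTokenizer,
        Wav2Vec2FeatureExtractor,
        Wav2Vec2Processor,
    )

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    vocab_path = out_dir / "vocab.json"
    vocab_path.write_text(json.dumps(vocab, ensure_ascii=False), encoding="utf-8")

    tokenizer = Wav2Vec2CTCTokenizer(
        str(vocab_path),
        unk_token="[UNK]",
        pad_token="[PAD]",
        word_delimiter_token="|",
    )
    # Assumed settings: mono audio at 16 kHz, the rate commonly used
    # for wav2vec2 checkpoints.
    feature_extractor = Wav2Vec2FeatureExtractor(
        feature_size=1,
        sampling_rate=16000,
        padding_value=0.0,
        do_normalize=True,
        return_attention_mask=True,
    )
    return Wav2Vec2Processor(
        feature_extractor=feature_extractor, tokenizer=tokenizer
    )
```

In a trainer flow, something like `build_processor(build_vocab(transcripts), out_dir)` would run only when the job targets a wav2vec2 model, and the resulting processor would be handed to both data preparation and decoding so that the label ids match the model's output layer.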

@harrykeightley
Contributor Author

Tests work locally, unsure what the error is here. Might just merge that badboi

@benfoley
Contributor

Merge away!

@harrykeightley harrykeightley merged commit db11fe1 into main Sep 19, 2023
1 of 2 checks passed
@harrykeightley harrykeightley deleted the processor branch September 19, 2023 08:36