
Inquiry about Continual Training with WavLM and Pre-Training Resources #54

Open · CantaoSu opened this issue Apr 28, 2024 · 0 comments
Hi,

I'm currently working on my master's thesis, which involves developing an Automatic Speech Recognition (ASR) model for Dutch dysarthric speech. My approach is to further pre-train the WavLM Large model (already pre-trained on English) on 400 hours of normal Dutch speech, then fine-tune it on one hour of Dutch dysarthric speech, and finally compare it with Wav2Vec 2.0.
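To make the intended pipeline concrete, here is a minimal sketch of the fine-tuning stage as I currently picture it, using the Hugging Face `transformers` checkpoint `microsoft/wavlm-large`. The vocabulary file, audio, and transcript below are placeholders, and this is my own sketch, not an official recipe from this repository:

```python
import torch
from transformers import (
    Wav2Vec2CTCTokenizer,
    Wav2Vec2FeatureExtractor,
    Wav2Vec2Processor,
    WavLMForCTC,
)

# "vocab.json" is a hypothetical character-level vocabulary built from the
# Dutch fine-tuning transcripts (one id per character, plus special tokens).
tokenizer = Wav2Vec2CTCTokenizer(
    "vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|"
)
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1, sampling_rate=16000, padding_value=0.0,
    do_normalize=True, return_attention_mask=True,
)
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

# Load the English-pretrained encoder and attach a fresh CTC head sized
# to the Dutch vocabulary.
model = WavLMForCTC.from_pretrained(
    "microsoft/wavlm-large",
    ctc_loss_reduction="mean",
    pad_token_id=tokenizer.pad_token_id,
    vocab_size=len(tokenizer),
)
model.freeze_feature_encoder()  # keep the CNN front end fixed (common practice)
model.train()

# One illustrative training step on a single (waveform, transcript) pair.
waveform = torch.randn(16000 * 3)  # placeholder: 3 s of 16 kHz audio
inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")
labels = tokenizer("dit is een voorbeeld", return_tensors="pt").input_ids

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
loss = model(input_values=inputs.input_values, labels=labels).loss
loss.backward()
optimizer.step()
```

This stage seems straightforward; it's the preceding continual pre-training stage that I can't find resources for.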

However, I've encountered a roadblock: I couldn't find any pre-training resources for WavLM in the S3PRL toolkit or in any related documentation. Since WavLM has only been pre-trained on English, I wanted to explore continuing its pre-training on Dutch data. The S3PRL team directed me to this repository, but I'm not sure whether it contains resources or examples for this type of task.

Would you be able to guide me on how to approach this? Specifically, I'd like to know whether it's possible to continue pre-training WavLM on a different dataset, and whether any recipes or scripts are available for this process. Any pointers to documentation, examples, or other resources would be greatly appreciated.
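To clarify what I mean by continual pre-training, below is the rough shape of the loop I have in mind, based on the HuBERT-style masked pseudo-label prediction that WavLM builds on. Everything specific here is an assumption on my part: the cluster count, the random pseudo-labels (which in a real run would be k-means cluster ids computed over features of a frozen teacher model), and the i.i.d. frame masking (WavLM masks contiguous spans and additionally simulates noisy/overlapped speech, which I omit):

```python
import torch
import torch.nn.functional as F
from transformers import WavLMModel

# Start from the English-pretrained encoder.
model = WavLMModel.from_pretrained("microsoft/wavlm-large")
model.train()

NUM_CLUSTERS = 500  # assumption: size of the k-means codebook for pseudo-labels
proj = torch.nn.Linear(model.config.hidden_size, NUM_CLUSTERS)

optimizer = torch.optim.AdamW(
    list(model.parameters()) + list(proj.parameters()), lr=1e-5
)

# Placeholder batch: two 4-second utterances of 16 kHz Dutch audio.
wave = torch.randn(2, 16000 * 4)

# Number of ~20 ms frames the convolutional front end produces at this length.
num_frames = model._get_feat_extract_output_lengths(torch.tensor(wave.shape[1])).item()

# Random stand-ins for the per-frame k-means pseudo-labels.
pseudo_labels = torch.randint(0, NUM_CLUSTERS, (2, num_frames))

# Simplified masking: ~50% of frames i.i.d. (HuBERT/WavLM mask spans instead).
mask = torch.rand(2, num_frames) < 0.5

# The model replaces masked frames with a learned embedding internally.
hidden = model(wave, mask_time_indices=mask).last_hidden_state
logits = proj(hidden)  # (batch, frames, clusters)

# Cross-entropy on the masked positions only.
loss = F.cross_entropy(logits[mask], pseudo_labels[mask])
loss.backward()
optimizer.step()
```

If the actual WavLM pre-training setup differs from this sketch, corrections would be very welcome.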

Thank you in advance for your assistance. I look forward to your response.

Best regards
