To train a new voice for English, how many hours of audio do you recommend? #194

Open
xiao1ongbao opened this issue Sep 27, 2024 · 1 comment

Comments

@xiao1ongbao

To train a new voice for English, how many hours of audio do you recommend?
Does the training script train from scratch or fine-tune the existing model?
Thanks!

@iv2985

iv2985 commented Oct 27, 2024

If you take G_0.pth (the first checkpoint saved during training) and use it for inference, it speaks English in a young female voice that doesn't match the audio clips being trained on. So the script appears to fine-tune from that starting point rather than train from scratch.

As for audio duration, I have gotten reasonable results with only 5 minutes of audio and 1,000 epochs, using 48 kHz WAV files. Most people use an hour or more, however.
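
To check where a dataset falls between those two figures, a rough sketch like the one below can total the length of the clips. It is not part of this project: the function name `total_wav_seconds` is made up for illustration, and it assumes the clips are uncompressed PCM WAV files sitting directly in one folder (Python's standard `wave` module does not read other encodings).

```python
import wave
from pathlib import Path


def total_wav_seconds(folder):
    """Return the combined duration, in seconds, of the .wav files directly inside folder.

    Hypothetical helper for illustration; assumes uncompressed PCM WAV clips.
    """
    total = 0.0
    for path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            # Duration of one clip = number of frames / frames per second.
            total += wav.getnframes() / wav.getframerate()
    return total
```

Calling `total_wav_seconds("path/to/clips") / 60` (with a placeholder path) gives the total in minutes, which can be compared against the 5 minute and 1+ hour figures above.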
