To train a new voice for English, how many hours of audio do you recommend?
Does the training script train from scratch, or does it fine-tune the existing model?
Thanks!
If you take G_0.pth (the first checkpoint saved during training) and use it for inference, it speaks English in a young female voice that doesn't match the audio clips being trained on. So it appears the script fine-tunes from that starting point rather than training from scratch.
As for the duration of audio, I have gotten reasonable results with only 5 minutes of audio and 1,000 epochs, using 48 kHz WAV files. Most people use an hour or more, however.
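If it helps to check how much audio you actually have before training, here is a minimal sketch that totals the length of the WAV clips in a folder. It uses only Python's standard `wave` module, so it assumes uncompressed PCM WAV files. The function name `total_wav_seconds` is made up for illustration and is not part of the training scripts.

```python
import wave
from pathlib import Path


def total_wav_seconds(folder):
    """Return the combined duration, in seconds, of all .wav files under `folder`."""
    total = 0.0
    for path in sorted(Path(folder).rglob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            # Duration of one clip = number of frames / frames per second.
            total += wav.getnframes() / wav.getframerate()
    return total
```

For example, `total_wav_seconds("my_clips") / 60` gives the total in minutes, where `my_clips` is a placeholder for wherever your training clips live.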