Transfer learning for another sampling rate? #164
Did you change the sampling rate in the API that saves and plays back the audio?
Yes, I did change the playback API. In the meantime I also trained my own WaveGlow network, as well as Tacotron, both on my 16 kHz German dataset. When I run inference with these, it sounds like this: … When I run inference with the pretrained … When I run inference with the pretrained … I can match either the pitch or the speed, but not both.
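Playing a waveform back at a rate other than the one it was generated for scales pitch and duration by the same ratio, which would explain why only one of them can be matched. A minimal sketch of that arithmetic, assuming the pretrained vocoder produces audio meant for 22050 Hz and it is played back at 16 kHz; `playback_shift` is a hypothetical helper, not part of WaveGlow:

```python
import math


def playback_shift(generated_sr: float, playback_sr: float) -> tuple[float, float]:
    """Return (tempo factor, pitch shift in semitones) when audio generated
    for `generated_sr` is played back, without resampling, at `playback_sr`.

    Every frequency and the total duration scale by the same ratio, so
    changing the playback rate alone cannot fix pitch and tempo independently.
    """
    ratio = playback_sr / generated_sr  # < 1 means slower and lower-pitched
    semitones = 12.0 * math.log2(ratio)
    return ratio, semitones


tempo, shift = playback_shift(22050, 16000)
print(f"tempo x{tempo:.3f}, pitch {shift:+.2f} semitones")
```

Under this assumption the result is roughly a 0.73x tempo and a pitch about 5.5 semitones lower, consistent with the "low-pitched" output described in the original post.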
When training WaveGlow, did you change the sampling rate in your config file? (Line 17 at commit d2c2511.)
When I trained from scratch, yes, I did. But I wanted to know what happens if I don't train from scratch: does a WaveGlow pretrained at a 22050 Hz sampling rate help when training further on a dataset sampled at 16000 Hz, or are the two incompatible? Would just changing the sampling rate do the trick?
Hi, @cduguet! Have you already run any experiments training from the pretrained model on a dataset with sr=16 kHz?
Have you also changed the hop length? In the WaveGlow source code, the mel spectrogram is upsampled to the same length as the audio by a transposed convolution with a fixed stride of 256.
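The length bookkeeping behind this point can be sketched with the documented output-length formula of `torch.nn.ConvTranspose1d`. This is an illustration under assumptions: the kernel size of 1024 is recalled from the WaveGlow source and may differ, the frame count assumes a centered STFT, and `upsample_report` is a hypothetical helper:

```python
def conv_transpose1d_len(l_in, stride, kernel_size, padding=0, output_padding=0, dilation=1):
    # Output-length formula documented for torch.nn.ConvTranspose1d.
    return (l_in - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1


def upsample_report(sampling_rate, hop_length, n_samples, stride=256, kernel_size=1024):
    """Compare the mel hop with the vocoder's fixed upsampling stride (hypothetical helper)."""
    n_frames = n_samples // hop_length + 1  # frame count of a centered STFT (assumption)
    up_len = conv_transpose1d_len(n_frames, stride, kernel_size)
    return {
        "n_frames": n_frames,
        "upsampled_len": up_len,
        # The upsampled mel is assumed to be trimmed to the audio length.
        "trimmed_len": min(up_len, n_samples),
        "mel_frame_ms": 1000.0 * hop_length / sampling_rate,    # time each frame describes
        "vocoder_frame_ms": 1000.0 * stride / sampling_rate,    # time the vocoder gives it
        "aligned": hop_length == stride,
    }


for sr, hop in [(22050, 256), (16000, 256), (16000, 200)]:
    print(sr, hop, upsample_report(sr, hop, n_samples=sr))  # one second of audio
```

With a hop of 200 samples at 16 kHz, the 81 frames of one second are stretched to 21504 samples and then cut to 16000, so each frame describes 12.5 ms of speech but conditions 16 ms of output. Only a hop length equal to the stride keeps frames and samples aligned.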
Hey @patrick-g-zhang |
@Shikherneo2 Yes. In the original Google paper, the STFT hop time is 12.5 ms, which corresponds to 200 samples at a 16 kHz sampling rate.
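The conversion between hop time and hop length is simply hop_samples = hop_seconds × sampling_rate. A small sketch with two hypothetical helpers; it refuses hop times that are not a whole number of samples, since an STFT hop must be an integer:

```python
def hop_samples(hop_ms: float, sampling_rate: int) -> int:
    """Convert an STFT hop in milliseconds to a whole number of samples."""
    samples = hop_ms * sampling_rate / 1000.0
    if abs(samples - round(samples)) > 1e-9:
        raise ValueError(f"{hop_ms} ms is not a whole number of samples at {sampling_rate} Hz")
    return round(samples)


def hop_ms(hop_length: int, sampling_rate: int) -> float:
    """Convert an STFT hop in samples to milliseconds."""
    return 1000.0 * hop_length / sampling_rate


print(hop_samples(12.5, 16000))      # 200
print(round(hop_ms(256, 22050), 2))  # 11.61
```

So the 256-sample hop used at 22050 Hz is about 11.6 ms, while 12.5 ms at 22050 Hz would be 275.625 samples and is rejected.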
I have done this with 16 kHz training data, starting from the provided 22 kHz pretrained model. The loss dropped quickly and the model generated audible audio.
@cduguet I am trying to train a new WaveGlow model from scratch too, with sample_rate=8000 (maybe I should increase it to 16k, since 8k sounds very bad). What do I need to change in …
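When lowering the sampling rate, one quick sanity check is whether the mel settings still make sense for the new rate. A hedged sketch, assuming a data config with `sampling_rate`, `hop_length` and `mel_fmax` keys like the repository's `config.json` (verify against the actual file) and the fixed upsampling stride of 256 mentioned above; `check_data_config` is hypothetical:

```python
def check_data_config(cfg: dict, upsample_stride: int = 256) -> list[str]:
    """Return problems found in a WaveGlow-style data config (hypothetical helper)."""
    problems = []
    nyquist = cfg["sampling_rate"] / 2
    # Mel filters above the Nyquist frequency would cover no real content.
    if cfg["mel_fmax"] > nyquist:
        problems.append(f"mel_fmax={cfg['mel_fmax']} exceeds the Nyquist frequency {nyquist} Hz")
    # The mel hop must match the stride that upsamples frames to samples.
    if cfg["hop_length"] != upsample_stride:
        problems.append(
            f"hop_length={cfg['hop_length']} differs from the upsampling stride {upsample_stride}"
        )
    return problems


# Assumed 22050 Hz-style values with only the sampling rate changed to 8 kHz.
cfg_8k = {"sampling_rate": 8000, "hop_length": 256, "mel_fmax": 8000.0}
for problem in check_data_config(cfg_8k):
    print(problem)
```

Here the only complaint is that `mel_fmax` is above 4000 Hz; changing `hop_length` as well would additionally require the upsampling stride to change with it.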
Try this: #88
Hi!
I'd like to use the pretrained WaveGlow weights to train on a dataset with a different sampling rate. When I train only Tacotron and feed its mel outputs to the pretrained WaveGlow model, the audio sounds low-pitched.
If the sampling rate is fundamentally different, does using the pretrained network bring any benefit, or would it be no more useful than training from scratch?
Does anyone have experience with this?
My dataset's sampling rate is 16000 Hz, in contrast to 22050 Hz for the original LJSpeech dataset.