Transfer learning for another sampling rate? #164

cduguet · 2019-10-25T08:29:10Z

Hi!
I'd like to use the pretrained WaveGlow weights to train on a dataset with a different sampling rate. When I train only Tacotron and feed its mel outputs to the pretrained WaveGlow model, the audio sounds low-pitched.
If the sampling rate is fundamentally different, is there any benefit to starting from the pretrained network, or would it be no better than training from scratch?
Does anyone have experience with this?
My dataset's sampling rate is 16000 Hz, in contrast to 22050 Hz for the original LJSpeech dataset.

rafaelvalle · 2019-10-25T16:01:33Z

Did you change the sampling rate in the API that saves or plays back the audio?

cduguet · 2019-11-07T17:25:09Z

Yes, I did change the playback API. In the meantime I also trained my own WaveGlow network as well as Tacotron, both on my 16 kHz German dataset.

When I do inference with these it sounds like this:
https://vocaroo.com/i/s0W76vXWCsTh

When I do inference with the pretrained waveglow_256channels.pt, at a sampling rate of 22.05 kHz:
https://vocaroo.com/i/s0G4XUSjcgSi

When I do inference with the pretrained waveglow_256channels.pt, at a sampling rate of 16 kHz:
https://vocaroo.com/i/s1lIjBBGPbRM

I can either match the pitch, or the speed, but not both.
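For reference, one general reason pitch and speed move together: when a fixed sequence of samples is played back at a different rate than the one it was generated for, duration and frequency are both scaled by the same ratio. The sketch below only illustrates that arithmetic; it is not a diagnosis of the clips above, and `playback_effect` is a hypothetical helper name.

```python
import math


def playback_effect(generated_rate_hz, playback_rate_hz):
    """Effect of playing samples meant for one rate at another rate."""
    ratio = playback_rate_hz / generated_rate_hz
    return {
        # > 1 means the audio plays faster, < 1 means slower
        "speed_factor": ratio,
        # the clip lasts this many times its intended duration
        "duration_factor": 1.0 / ratio,
        # positive means higher pitch, negative means lower pitch
        "pitch_shift_semitones": 12.0 * math.log2(ratio),
    }


# Samples generated for 22050 Hz but played back at 16000 Hz
effect = playback_effect(22050, 16000)
for name, value in effect.items():
    print(f"{name}: {value:.3f}")
```

Because a single ratio drives both quantities, changing only the playback rate cannot fix one without disturbing the other.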

rafaelvalle · 2019-11-07T21:42:10Z

When training waveglow, did you change the sampling rate in your config file?

waveglow/config.json

Line 17 in d2c2511

"sampling_rate": 22050,

cduguet · 2019-11-08T14:33:17Z

When I trained from scratch, yes, I did. But I wanted to know what happens if I do not train from scratch.

Does a WaveGlow model pretrained at a sampling rate of 22050 help when training further on a dataset with a sampling rate of 16000, or would the two be incompatible? Would just setting the sampling rate do the trick?

Islanna · 2019-11-26T12:26:24Z

Hi, @cduguet!
Your idea looks really promising. Proper training from scratch takes too much time, so transfer learning could probably solve this problem.

Have you already run any experiments training from the pretrained model on a dataset with sr=16 kHz?

patrick-g-zhang · 2019-12-10T13:27:32Z

Have you changed the hop length as well? In the WaveGlow source code, the mel spectrogram is upsampled to the same length as the audio by a transposed convolution with a fixed stride of 256.
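To make the length relationship concrete, here is a small sketch using the standard output-length formula for a 1-D transposed convolution. The stride of 256 comes from the comment above; the kernel size of 1024 and the trimming of the extra `kernel_size - stride` samples are assumptions about the implementation, and the function name is hypothetical.

```python
def conv_transpose1d_out_len(n_frames, kernel_size, stride,
                             padding=0, output_padding=0, dilation=1):
    """Output length of a 1-D transposed convolution."""
    return ((n_frames - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


STRIDE = 256    # fixed upsampling stride mentioned in the thread
KERNEL = 1024   # assumption: kernel size of the upsampling layer

n_frames = 100
raw_len = conv_transpose1d_out_len(n_frames, KERNEL, STRIDE)
# Assumption: the extra (KERNEL - STRIDE) samples are trimmed afterwards
trimmed_len = raw_len - (KERNEL - STRIDE)

print("upsampled length:", trimmed_len)             # n_frames * 256
print("audio length at hop 256:", n_frames * 256)   # matches
print("audio length at hop 200:", n_frames * 200)   # does not match
```

Under these assumptions, mel frames computed with a hop other than 256 no longer line up sample-for-sample with the upsampled conditioning signal.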

Shikherneo2 · 2019-12-10T14:58:06Z

Hey @patrick-g-zhang
Do you mean changing the hop length to something like 200 for a sampling rate of 16 kHz?

patrick-g-zhang · 2019-12-10T15:18:26Z

@Shikherneo2 Yes. In the original Google paper, the STFT hop time is 12.5 ms, which corresponds to 200 samples at a sampling rate of 16 kHz.
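The conversion between hop time and hop length is simple arithmetic; a minimal sketch (the helper names are hypothetical):

```python
def hop_samples(hop_ms, sampling_rate_hz):
    """Hop length in samples for a hop time given in milliseconds."""
    return round(hop_ms * sampling_rate_hz / 1000)


def hop_duration_ms(hop_length, sampling_rate_hz):
    """Hop time in milliseconds for a hop length given in samples."""
    return 1000.0 * hop_length / sampling_rate_hz


# 12.5 ms hop at 16 kHz, as stated above
print(hop_samples(12.5, 16000))

# For comparison: hop time of a fixed 256-sample hop at each rate
print(f"{hop_duration_ms(256, 22050):.2f} ms at 22050 Hz")
print(f"{hop_duration_ms(256, 16000):.2f} ms at 16000 Hz")
```

Keeping the hop at 256 samples while dropping to 16 kHz stretches each frame's hop from roughly 11.6 ms to 16 ms, which is why a smaller hop length is suggested here.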

patrick-g-zhang · 2019-12-14T00:49:06Z

I have done it with 16k training data, starting from the provided model pretrained at 22k. The loss dropped quickly and the model generated audible audio.

EuphoriaCelestial · 2020-06-05T01:57:04Z

> When I trained from scratch, yes, I did. But I wanted to know what happens if I do not train from scratch.
>
> Does a WaveGlow model pretrained at a sampling rate of 22050 help when training further on a dataset with a sampling rate of 16000, or would the two be incompatible? Would just setting the sampling rate do the trick?

@cduguet I am trying to train a new WaveGlow model from scratch too, with sample_rate=8000 (maybe I should increase it to 16k, since 8k sounds very bad). What do I need to change in config.json for the new sample rate?
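As a rough sketch of the kind of change involved: only the `"sampling_rate"` key is confirmed by the config line quoted earlier in this thread. The `data_config` section name and the `hop_length` and `mel_fmax` keys are assumptions about the file layout, and `retarget_data_config` is a hypothetical helper. The one firm constraint is that the highest mel frequency cannot exceed the Nyquist frequency (half the sampling rate).

```python
import copy


def retarget_data_config(config, sampling_rate, hop_length=None):
    """Return a copy of `config` with audio settings for a new sampling rate."""
    updated = copy.deepcopy(config)
    data = updated["data_config"]      # assumed section name
    data["sampling_rate"] = sampling_rate
    if hop_length is not None:
        data["hop_length"] = hop_length  # assumed key name
    # Frequencies above Nyquist do not exist in the sampled signal
    nyquist = sampling_rate / 2.0
    if data.get("mel_fmax") is not None:  # assumed key name
        data["mel_fmax"] = min(float(data["mel_fmax"]), nyquist)
    return updated


# Illustrative values only, not a copy of the real config file
example = {"data_config": {"sampling_rate": 22050,
                           "hop_length": 256,
                           "mel_fmax": 8000.0}}

cfg_8k = retarget_data_config(example, 8000)
cfg_16k = retarget_data_config(example, 16000, hop_length=200)
print(cfg_8k["data_config"])
print(cfg_16k["data_config"])
```

Whether the hop length should change, and whether the model's upsampling stride then needs to change with it, depends on the points discussed earlier in the thread.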

ashish-roopan · 2020-07-25T14:03:38Z

try this #88
