pretrained model which can resume training #35
Comments
Hi, it seems that training is very slow. How many epochs are needed before we can get a good result?
I use the following method to resume training from the officially released pre-trained model:
import os
import torch

def load_checkpoint(checkpoint_path, model, optimizer):
    assert os.path.isfile(checkpoint_path)
    checkpoint_dict = torch.load(checkpoint_path, map_location='cpu')
    # Don't restore the iteration counter or the optimizer state from the
    # released checkpoint; start counting from 1 with a fresh optimizer.
    # iteration = checkpoint_dict['iteration']
    iteration = 1
    # optimizer.load_state_dict(checkpoint_dict['optimizer'])
    model_for_loading = checkpoint_dict['model']
    model.load_state_dict(model_for_loading.state_dict())
    print("Loaded checkpoint '{}' (iteration {})".format(
        checkpoint_path, iteration))
    return model, optimizer, iteration

Training is still in progress, so I can't tell yet whether this is completely right.
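A minimal sketch of how the function above might be wired up, assuming the repo's WaveGlow class and an Adam optimizer as in train.py; the hyperparameters mirror the default config.json and should be adjusted to your own setup:

import torch
from glow import WaveGlow  # model definition from this repo

# Hyperparameters below follow the default config.json (an assumption, not from the thread).
waveglow_config = dict(n_mel_channels=80, n_flows=12, n_group=8,
                       n_early_every=4, n_early_size=2,
                       WN_config=dict(n_layers=8, n_channels=256, kernel_size=3))
model = WaveGlow(**waveglow_config).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Overwrite the freshly initialized weights with the released checkpoint;
# the optimizer state is left untouched and the iteration counter restarts at 1.
model, optimizer, iteration = load_checkpoint('waveglow_old.pt', model, optimizer)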
@candlewill I'm trying to finetune the official pretrained model and I'm getting nan's because the determinants of the Invertible1x1Conv weight matrices are negative, so the logdet operation produces nan's. Have you changed something to address that? @rafaelvalle what do you think about it? This apparently doesn't matter for inference, but I'm thinking you probably used a slightly different version of the code to train that checkpoint (i.e. using the abs of the det or something).
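For anyone who wants to check this on their own copy, a small diagnostic sketch. It assumes the released checkpoint stores the whole model under the 'model' key (as in the snippet above) and that the invertible 1x1 convolutions live in waveglow.convinv with (c, c, 1) conv weights, as in the repo's glow.py; treat those names as assumptions:

import torch
import glow  # the pickled model object needs glow.py importable

checkpoint = torch.load('waveglow_old.pt', map_location='cpu')
waveglow = checkpoint['model']

for i, convinv in enumerate(waveglow.convinv):
    W = convinv.conv.weight.data.squeeze().float()  # (c, c, 1) -> (c, c)
    # torch.logdet returns nan when the determinant is negative,
    # which is what poisons the loss during finetuning.
    print("flow {}: det={:+.4f} logdet={}".format(
        i, torch.det(W).item(), torch.logdet(W).item()))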
@candlewill Thanks for sharing! How is your retraining going? Does it work well?
@belevtsoff @candlewill @rafaelvalle I also run into this problem when I try to finetune the official pretrained model. Looking forward to your help!
@UESTCgan it seems like the problem is twofold:
I'm not sure I've spotted everything needed, but at least now my finetuned model produces speech and not just a bunch of noise.
@belevtsoff Thanks for your reply! I'll give it a try and then share my results with you. Thanks again!
@belevtsoff careful with the alterations. We initialize the determinants to be positive, and a determinant crossing from positive to negative values suggests that during optimization one is stepping over infinite error, at determinant 0, which is bad. This can be caused by a large update due to either a large learning rate or some outlier batch...
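To illustrate the point about infinite error: the training objective subtracts a log(det W) term for every invertible 1x1 conv, so the penalty grows without bound as the determinant approaches zero and, with torch.logdet, becomes nan once it goes negative. A toy illustration, not code from the repo:

import torch

for scale in [1.0, 1e-2, 1e-4, 0.0, -1e-2]:
    W = torch.eye(4)
    W[0, 0] = scale                 # det(W) == scale
    # -logdet blows up near zero determinant and is nan for negative ones.
    print(scale, (-torch.logdet(W)).item())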
@rafaelvalle Thanks for the response! But actually my point is that if you simply take the matrix W of the InvConv layer from the official pre-trained checkpoint (without any finetuning), its determinant will be negative. How is that possible?
Probably because in the old model we did not enforce the determinant to be 1 at initialization. |
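For reference, enforcing a positive determinant at initialization amounts to something like the following sketch (roughly what newer versions of glow.py do; if your copy differs, treat this as illustrative):

import torch

c = 8  # number of channels handled by the invertible conv
# A random orthogonal matrix has determinant +1 or -1 ...
W = torch.qr(torch.FloatTensor(c, c).normal_())[0]
# ... so flip one column if it came out negative, keeping det(W) positive.
if torch.det(W) < 0:
    W[:, 0] = -1 * W[:, 0]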
@rafaelvalle yeah, probably. Anyway, I've successfully finetuned the pretrained checkpoint using the recipe above, although it didn't completely remove that reverb-like effect on a male voice.
After two weeks of fine-tuning, I can also get some fairly clear voices. However, the problem @belevtsoff mentioned also occurs in my experiment. Maybe more training is needed. Here are some samples (in Chinese):
@candlewill is your wav female or male?
@hdmjdp Female. |
Hello @UESTCgan, I met the same problem. May I ask how you solved it? Thank you very much!
Hi @candlewill, I met that problem too. How can I resume training from the checkpoint? Thank you very much.
@belevtsoff can you please share your fix for this part? |
Also interested in resuming from the waveglow_old.pt checkpoint. @belevtsoff @candlewill Can you share your fix? @rafaelvalle Is there a better way? Or do you have a new checkpoint that works with the current code?
@duvtedudug @HashiamKadhim Oh, sorry guys, I forgot about this. I'll share the code as soon as I get to the computer.
Any advice on training for adaptation, where the dataset is small? Thanks.
@duvtedudug @HashiamKadhim @doctor-xiang @rafaelvalle Ok, I've submitted a pull request to add the possibility to continue training from the official checkpoint: #99. You can use my fork if the PR never gets merged. Let me know if I overlooked something.
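For those who would rather not switch branches, one possible workaround is to repackage waveglow_old.pt into the dict layout that train.py's own load_checkpoint expects. The key names below ('model', 'iteration', 'optimizer', 'learning_rate') follow the repo's save_checkpoint, but this is only a sketch of the idea, not the content of the PR, and it does not by itself address the negative-determinant issue discussed above:

import torch
import glow  # needed to unpickle the full model object

old = torch.load('waveglow_old.pt', map_location='cpu')
waveglow = old['model']

# Attach a fresh optimizer state and restart the iteration counter.
optimizer = torch.optim.Adam(waveglow.parameters(), lr=1e-4)
torch.save({'model': waveglow,
            'iteration': 1,
            'optimizer': optimizer.state_dict(),
            'learning_rate': 1e-4},
           'waveglow_resumable.pt')  # hypothetical output name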
Which Chinese dataset do you use? |
Hello @belevtsoff, your modifications are no longer working; I get errors like that whenever I try to load the model.
I have replaced
Closing due to inactivity. |
@rafaelvalle Thanks for sharing! It helps a lot.
I find training very slow (about 1 epoch/day with batch size = 1) when training on my own data (about 12 hours of audio). Could you offer a model from which training can be resumed?
Thanks a lot!