pretrained model which can resume training #35
Comments
Hi, it seems that training is very slow. How many epochs are needed before we can get a good result?
I use the following method to resume training from the officially released pre-trained model:
import os
import torch

def load_checkpoint(checkpoint_path, model, optimizer):
    assert os.path.isfile(checkpoint_path)
    checkpoint_dict = torch.load(checkpoint_path, map_location='cpu')
    # Don't restore the iteration counter or the optimizer state from the
    # released checkpoint; start counting from 1 with a fresh optimizer.
    # iteration = checkpoint_dict['iteration']
    iteration = 1
    # optimizer.load_state_dict(checkpoint_dict['optimizer'])
    model_for_loading = checkpoint_dict['model']
    model.load_state_dict(model_for_loading.state_dict())
    print("Loaded checkpoint '{}' (iteration {})".format(
        checkpoint_path, iteration))
    return model, optimizer, iteration

Training is still in progress, so I can't tell yet whether this is completely right.
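A minimal sketch of how the function above might be wired up, assuming the repo's WaveGlow class and an Adam optimizer as in train.py; the hyperparameters mirror the default config.json and should be adjusted to your own setup:

import torch
from glow import WaveGlow  # model definition from this repo

# Hyperparameters below follow the default config.json (an assumption, not from the thread).
waveglow_config = dict(n_mel_channels=80, n_flows=12, n_group=8,
                       n_early_every=4, n_early_size=2,
                       WN_config=dict(n_layers=8, n_channels=256, kernel_size=3))
model = WaveGlow(**waveglow_config).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Overwrite the freshly initialized weights with the released checkpoint;
# the optimizer state is left untouched and the iteration counter restarts at 1.
model, optimizer, iteration = load_checkpoint('waveglow_old.pt', model, optimizer)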
@candlewill I'm trying to finetune the official pretrained model and I'm getting nan's because the determinants of the Invertible1x1Conv weight matrices are negative, so the logdet operation produces nan's. Have you changed something to address that? @rafaelvalle what do you think about it? This apparently doesn't matter for inference, but I'm thinking you probably used a slightly different version of the code to train that checkpoint (i.e. using the abs of the det or something).
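For anyone who wants to check this on their own copy, a small diagnostic sketch. It assumes the released checkpoint stores the whole model under the 'model' key (as in the snippet above) and that the invertible 1x1 convolutions live in waveglow.convinv with (c, c, 1) conv weights, as in the repo's glow.py; treat those names as assumptions:

import torch
import glow  # the pickled model object needs glow.py importable

checkpoint = torch.load('waveglow_old.pt', map_location='cpu')
waveglow = checkpoint['model']

for i, convinv in enumerate(waveglow.convinv):
    W = convinv.conv.weight.data.squeeze().float()  # (c, c, 1) -> (c, c)
    # torch.logdet returns nan when the determinant is negative,
    # which is what poisons the loss during finetuning.
    print("flow {}: det={:+.4f} logdet={}".format(
        i, torch.det(W).item(), torch.logdet(W).item()))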
@candlewill Thanks for sharing! How is your retraining going? Does it work well?
@belevtsoff @candlewill @rafaelvalle I also run into this problem when I try to finetune the official pretrained model. Looking forward to your help!
@UESTCgan it seems like the problem is twofold:
I'm not sure I've spotted everything needed, but at least now my finetuned model produces speech and not just a bunch of noise.
@belevtsoff Thanks for your reply! I'll give it a try and then share my results with you. Thanks again!
@belevtsoff careful with the alterations. We initialize the determinants to be positive, and a determinant crossing from positive to negative values suggests that during optimization one is stepping over infinite error, at determinant 0, which is bad. This can be caused by a large update due to either a large learning rate or some outlier batch...
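To illustrate the point about infinite error: the training objective subtracts a log(det W) term for every invertible 1x1 conv, so the penalty grows without bound as the determinant approaches zero and, with torch.logdet, becomes nan once it goes negative. A toy illustration, not code from the repo:

import torch

for scale in [1.0, 1e-2, 1e-4, 0.0, -1e-2]:
    W = torch.eye(4)
    W[0, 0] = scale                 # det(W) == scale
    # -logdet blows up near zero determinant and is nan for negative ones.
    print(scale, (-torch.logdet(W)).item())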
@rafaelvalle Thanks for the response! But actually my point is that if you simply take the matrix W of the InvConv layer from the official pre-trained checkpoint (without any finetuning), its determinant will be negative. How is that possible?
Probably because in the old model we did not enforce the determinant to be 1 at initialization. |
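For reference, enforcing a positive determinant at initialization amounts to something like the following sketch (roughly what newer versions of glow.py do; if your copy differs, treat this as illustrative):

import torch

c = 8  # number of channels handled by the invertible conv
# A random orthogonal matrix has determinant +1 or -1 ...
W = torch.qr(torch.FloatTensor(c, c).normal_())[0]
# ... so flip one column if it came out negative, keeping det(W) positive.
if torch.det(W) < 0:
    W[:, 0] = -1 * W[:, 0]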
@rafaelvalle yeah, probably. Anyway, I've successfully finetuned the pretrained checkpoint using the recipe above, although it didn't completely remove that reverb-like effect on a male voice.
After two weeks of fine-tuning, I can also get some fairly clear voices. However, the problem @belevtsoff mentioned also occurs in my experiment. Maybe more training is needed. Here are some samples (in Chinese):
@candlewill is your wav female or male?
@hdmjdp Female. |
Hello @UESTCgan, I met the same problem. May I ask how you solved it? Thank you very much!
Hi @candlewill, I met that problem too. How can I resume training from the checkpoint? Thank you very much.
@belevtsoff can you please share your fix for this part? |
Also interested in resuming from the waveglow_old.pt checkpoint. @belevtsoff @candlewill Can you share your fix? @rafaelvalle Is there a better way? Or do you have a new checkpoint that works with the current code?
@duvtedudug @HashiamKadhim Oh, sorry guys, I forgot about this. I'll share the code as soon as I get to the computer.
Any advice on training for adaptation, where the dataset is small? Thanks.
@duvtedudug @HashiamKadhim @doctor-xiang @rafaelvalle Ok, I've submitted a pull request to add the possibility to continue training from the official checkpoint: #99. You can use my fork if the PR never gets merged. Let me know if I overlooked something.
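For those who would rather not switch branches, one possible workaround is to repackage waveglow_old.pt into the dict layout that train.py's own load_checkpoint expects. The key names below ('model', 'iteration', 'optimizer', 'learning_rate') follow the repo's save_checkpoint, but this is only a sketch of the idea, not the content of the PR, and it does not by itself address the negative-determinant issue discussed above:

import torch
import glow  # needed to unpickle the full model object

old = torch.load('waveglow_old.pt', map_location='cpu')
waveglow = old['model']

# Attach a fresh optimizer state and restart the iteration counter.
optimizer = torch.optim.Adam(waveglow.parameters(), lr=1e-4)
torch.save({'model': waveglow,
            'iteration': 1,
            'optimizer': optimizer.state_dict(),
            'learning_rate': 1e-4},
           'waveglow_resumable.pt')  # hypothetical output name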
Which Chinese dataset do you use? |
Hello @belevtsoff, your modifications are no longer working; I get errors like that whenever I try to load the model.
I have replaced
Closing due to inactivity. |
@rafaelvalle Thanks for sharing! It helps a lot.
I find training very slow (about 1 epoch/day with batch size = 1) when training on my own data (about 12 hours of audio). Could you offer a model from which training can be resumed?
Thanks a lot!