Replies: 1 comment 1 reply
-
Try now, it's fixed.
-
Hi everyone,
I've been trying to fine-tune (transfer learn) the YourTTS model on a custom dataset, and I've run into an issue that I think is related to the speaker encodings not being picked up.
I'm doing this on Colab with a modified version of the 'original recipe' by Edresson & iamkhalidbashir, but I've also run a modified version on my own machine with the same outcome.
tl;dr: at inference time, tts reports an empty list of speakers, {}.
(coqui) C:\tts>tts --text "test test" --model_path test/best_model.pth --config_path test/config2.json --list_speaker_idxs --speakers_file_path test/speakers.pth
I've modified the config to point to the new speakers.pth and get the same result. I also get the same result when I don't specify speakers_file_path on the CLI.
Potential training issue: when Trainer() is called, the config is updated, and what appears to be the wrong speakers.pth is copied to the run directory.
The speakers.pth created by compute_embeddings is 2.7 MB, while the one copied to the run directory is 431 bytes.
I've tried replacing the 431-byte file with the computed embeddings before running trainer.fit(), with the same result as above.
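One way to tell the two speakers.pth files apart is to load each and look at its structure. The sketch below is only a diagnostic aid, not part of the recipe. It assumes (and this is an assumption about the file layout, not something confirmed in this thread) that an embeddings file maps clip names to entries with "name" and "embedding" fields, while a small speaker ID file maps speaker names to integers. The helper name describe_speaker_file is hypothetical.

```python
def describe_speaker_file(data):
    """Guess whether a loaded speakers.pth holds embeddings or speaker IDs.

    Assumed layouts (not confirmed by the thread):
      embeddings: {clip_name: {"name": speaker, "embedding": [...]}, ...}
      id map:     {speaker_name: integer_id, ...}
    """
    if not isinstance(data, dict) or not data:
        return {"kind": "empty", "speakers": [], "entries": 0}

    values = list(data.values())
    if all(isinstance(v, dict) and "embedding" in v for v in values):
        # Collect the distinct speaker names stored alongside each embedding.
        speakers = sorted({v["name"] for v in values if "name" in v})
        return {"kind": "embeddings", "speakers": speakers, "entries": len(values)}

    if all(isinstance(v, int) for v in values):
        # Keys are the speaker names themselves.
        return {"kind": "id_map", "speakers": sorted(data), "entries": len(values)}

    return {"kind": "unknown", "speakers": [], "entries": len(values)}


# Usage (requires PyTorch; the path is the one from the command above):
#   import torch
#   data = torch.load("test/speakers.pth", map_location="cpu")
#   print(describe_speaker_file(data))
```

If the 431-byte file comes back as an ID map and the 2.7 MB file as embeddings, the two are different kinds of file rather than a good and a corrupted copy of the same one.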
Trainer also seems to be setting
"speakers_file": "/content/drive/MyDrive/duke-yt/traineroutput/YourTTS-EN-VCTK-January-09-2023_08+37PM-0000000/speakers.pth", no matter how many different ways I try to set it to null.
Does this override d_vector_file? The generated config.json does point to the newly created speakers file: "d_vector_file": ["/content/drive/MyDrive/duke-yt/speakers.pth"]
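To see every place these settings appear in the generated config, a small recursive search can help, since the same key may be present at more than one nesting level. This is a minimal sketch: speakers_file and d_vector_file are the keys quoted above, while use_d_vector_file and use_speaker_embedding are added on the assumption that the config contains them. The helper name find_keys is hypothetical.

```python
import json

# Keys of interest; the last two are assumed to exist in the config.
SPEAKER_KEYS = (
    "speakers_file",
    "d_vector_file",
    "use_d_vector_file",
    "use_speaker_embedding",
)


def find_keys(node, wanted=SPEAKER_KEYS, path=""):
    """Return (dotted_path, value) pairs for every wanted key in nested JSON."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            here = f"{path}.{key}" if path else key
            if key in wanted:
                hits.append((here, value))
            # Keep descending: the same key can appear at several levels.
            hits.extend(find_keys(value, wanted, here))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits.extend(find_keys(item, wanted, f"{path}[{index}]"))
    return hits


# Usage (the path is the config from the command above):
#   with open("test/config2.json", encoding="utf-8") as fh:
#       for where, value in find_keys(json.load(fh)):
#           print(where, "=", value)
```

Comparing the output for the config passed to Trainer() with the one written to the run directory should show which of these values were rewritten at startup.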
Pastebin link with config.json generated at run time: https://pastebin.com/bKrkPAyE
Pastebin link with trainer output logs: https://pastebin.com/SU9Ebnwh
I've made a dataset that mirrors the VCTK format, with 2 speaker directories named 'duke' and 'cash'. The samples I can listen to in TensorBoard sound great, but I'm not sure whether there's a way to hear both voices in TensorBoard, or whether only one is being trained. All the samples appear to be the 'duke' voice.
Pastebin link with colab code and results: https://pastebin.com/Wb1PBfi5
I've searched the other discussions about YourTTS and found others with a similar issue, but couldn't work out a solution. I'm sure it's user error on my part, but I'm in the weeds. Any help would be appreciated.