GST Prosody transfer on Tacotron 2 is not working #1723
Replies: 6 comments 14 replies
-
Hey, LJSpeech might not be the best dataset to train GST on because of the lack of prosody. In the paper they use data from the 2013 Blizzard Challenge. But you might try with audio book data as well since it tends to be more expressive.
-
You can also try manually tweaking the style token weights at inference to see if you can get variations.
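To make "tweaking the style tokens manually" concrete, here is a minimal sketch (plain numpy, not the Coqui TTS API; all names are illustrative) of what the GST layer does with those weights: the style embedding fed to the decoder is just a weighted combination of a small bank of learned token embeddings, so at inference you can bypass the reference encoder and supply the weights yourself.

```python
import numpy as np

# Illustrative stand-in for the token bank learned during training.
rng = np.random.default_rng(0)
num_tokens, token_dim = 10, 256  # typical GST sizes
tokens = rng.standard_normal((num_tokens, token_dim))

def style_embedding(weights):
    """Combine the (tanh-squashed) tokens with manual weights
    into a single style vector of shape (token_dim,)."""
    w = np.asarray(weights, dtype=float)
    return w @ np.tanh(tokens)

# All-zero weights give a zero style vector (a "neutral" reference point);
# nudging a single token's weight shifts the style along that token's axis.
neutral = style_embedding(np.zeros(num_tokens))
styled = style_embedding(np.eye(num_tokens)[3] * 0.3)  # emphasize token 3
print(neutral.shape, styled.shape)  # -> (256,) (256,)
```

In practice you would sweep one token's weight over a small range (e.g. -0.3 to 0.3) and listen for consistent prosodic changes; if every setting sounds identical, the model likely never learned to use the tokens.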
-
That makes sense. I will try to train a new model on the Blizzard dataset or audiobook data and give an update here. Can you elaborate on tweaking the style tokens manually? Do you mean passing a dict to the gst_style_input_weights parameter? Can you give an example of manual style token weights? Thanks
-
I am moving this here, as it is not a functional issue.
-
Hey @saibharani, did you get better results with another dataset?
-
Can anyone post example code for Tacotron2 + GST training and inference? (Possibly with the style tokens provided during inference, but I can probably get that in there myself.)
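Not a full training script, but the GST-relevant part of the config can be sketched. The fragment below follows the field names in Coqui TTS's GSTConfig around the version discussed in this thread; treat the exact keys and defaults as assumptions and check TTS.tts.configs against your installed version.

```json
{
  "model": "tacotron2",
  "use_gst": true,
  "gst": {
    "gst_embedding_dim": 256,
    "gst_num_heads": 4,
    "gst_num_style_tokens": 10,
    "gst_style_input_weights": null
  }
}
```

Training then proceeds with the repo's standard Tacotron2 recipe, and for inference the thread above uses TTS.utils.synthesizer.Synthesizer with a style_wav argument pointing at a reference clip in the target style.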
-
Describe the bug
I trained a Tacotron2 GST model on the LJSpeech dataset and my own emotional dataset for 100k steps, with
use_gst=True, gst=GSTConfig(),
set in the training options.

To Reproduce
During inference, the audio sounds the same with or without a style_wav. I used
from TTS.utils.synthesizer import Synthesizer
to synthesize the text. Can you suggest any config changes or corrections to improve the prosody-transfer quality?

Expected behavior
No response

Logs
No response

Environment
Additional context
Any help is appreciated, thanks.
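One way to sanity-check the "sounds the same with or without style_wav" symptom is to compare the two synthesized waveforms numerically rather than by ear. A small helper for that (plain numpy on raw sample arrays, independent of any TTS API):

```python
import numpy as np

def relative_wav_difference(a, b):
    """Relative L2 difference between two mono waveforms, in [0, 1].

    Truncates to the shorter length, since synthesized outputs rarely
    match sample-for-sample. Values near 0 mean the style input had
    essentially no effect on the output.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    denom = np.linalg.norm(a) + np.linalg.norm(b)
    if denom == 0:
        return 0.0
    return float(np.linalg.norm(a - b) / denom)

# Quick self-check: identical signals score 0, opposite signals score 1.
t = np.linspace(0.0, 1.0, 8000)
x = np.sin(2 * np.pi * 220 * t)
print(relative_wav_difference(x, x))   # -> 0.0
print(relative_wav_difference(x, -x))  # -> 1.0
```

Note the caveat: if the two outputs have noticeably different lengths, the style input is already changing the timing, and a sample-wise comparison will overstate the difference; in that case the length mismatch itself is the evidence you want.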