GST Prosody transfer on Tacotron 2 is not working #1723
Replies: 6 comments 14 replies
-
Hey, LJSpeech might not be the best dataset to train GST on because of the lack of prosody. In the paper they use data from the 2013 Blizzard Challenge. But you might try with audio book data as well since it tends to be more expressive.
-
You can also try manually tweaking the style token weights at inference to see if you can get variations.
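To make "tweaking the style tokens manually" concrete, here is a minimal sketch (plain numpy, not the Coqui TTS API; all names are illustrative) of what the GST layer does with those weights: the style embedding fed to the decoder is just a weighted combination of a small bank of learned token embeddings, so at inference you can bypass the reference encoder and supply the weights yourself.

```python
import numpy as np

# Illustrative stand-in for the token bank learned during training.
rng = np.random.default_rng(0)
num_tokens, token_dim = 10, 256  # typical GST sizes
tokens = rng.standard_normal((num_tokens, token_dim))

def style_embedding(weights):
    """Combine the (tanh-squashed) tokens with manual weights
    into a single style vector of shape (token_dim,)."""
    w = np.asarray(weights, dtype=float)
    return w @ np.tanh(tokens)

# All-zero weights give a zero style vector (a "neutral" reference point);
# nudging a single token's weight shifts the style along that token's axis.
neutral = style_embedding(np.zeros(num_tokens))
styled = style_embedding(np.eye(num_tokens)[3] * 0.3)  # emphasize token 3
print(neutral.shape, styled.shape)  # -> (256,) (256,)
```

In practice you would sweep one token's weight over a small range (e.g. -0.3 to 0.3) and listen for consistent prosodic changes; if every setting sounds identical, the model likely never learned to use the tokens.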
-
That makes sense. I will try to train a new model on the Blizzard dataset or audiobook data and give an update here. Can you elaborate on tweaking the style tokens manually? Do you mean passing a dict to the gst_style_input_weights parameter? Can you give an example of manual style token weights? Thanks
-
I am moving this here, as it is not a functional issue.
-
Hey @saibharani, did you get better results with another dataset?
-
Can anyone post example code for Tacotron2 + GST training and inference? (Possibly with the style tokens provided during inference, but I can probably get that in there myself.)
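Not a full training script, but the GST-relevant part of the config can be sketched. The fragment below follows the field names in Coqui TTS's GSTConfig around the version discussed in this thread; treat the exact keys and defaults as assumptions and check TTS.tts.configs against your installed version.

```json
{
  "model": "tacotron2",
  "use_gst": true,
  "gst": {
    "gst_embedding_dim": 256,
    "gst_num_heads": 4,
    "gst_num_style_tokens": 10,
    "gst_style_input_weights": null
  }
}
```

Training then proceeds with the repo's standard Tacotron2 recipe, and for inference the thread above uses TTS.utils.synthesizer.Synthesizer with a style_wav argument pointing at a reference clip in the target style.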
-
Describe the bug
I trained a Tacotron2 GST model on the LJSpeech dataset and my own emotional dataset for 100k steps, with
use_gst=True, gst=GSTConfig(),
set in the training options.

To Reproduce
During inference, the audio sounds the same with or without a style_wav. I used
from TTS.utils.synthesizer import Synthesizer
to synthesize the text. Can you suggest any config changes or corrections to improve the prosody-transfer quality?

Expected behavior
No response

Logs
No response

Environment
Additional context
Any help is appreciated, thanks.
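One way to sanity-check the "sounds the same with or without style_wav" symptom is to compare the two synthesized waveforms numerically rather than by ear. A small helper for that (plain numpy on raw sample arrays, independent of any TTS API):

```python
import numpy as np

def relative_wav_difference(a, b):
    """Relative L2 difference between two mono waveforms, in [0, 1].

    Truncates to the shorter length, since synthesized outputs rarely
    match sample-for-sample. Values near 0 mean the style input had
    essentially no effect on the output.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    denom = np.linalg.norm(a) + np.linalg.norm(b)
    if denom == 0:
        return 0.0
    return float(np.linalg.norm(a - b) / denom)

# Quick self-check: identical signals score 0, opposite signals score 1.
t = np.linspace(0.0, 1.0, 8000)
x = np.sin(2 * np.pi * 220 * t)
print(relative_wav_difference(x, x))   # -> 0.0
print(relative_wav_difference(x, -x))  # -> 1.0
```

Note the caveat: if the two outputs have noticeably different lengths, the style input is already changing the timing, and a sample-wise comparison will overstate the difference; in that case the length mismatch itself is the evidence you want.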