VITS sounds drunk on German #2834
Replies: 3 comments 5 replies
-
Hi @cschaefer26, I have no idea what the problem might be either, but I want to congratulate you on your great forward_taco_melgan model 👏. It sounds really good.
-
I am not sure, but you can try disabling the blank_token in the config. It might make the output more fluent. How large is your dataset?
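For context on what that switch changes: with a blank token enabled, the ID sequence fed to the model usually gets a blank ID placed between every symbol, so the input is roughly twice as long. Below is a minimal sketch of that operation. The helper name `intersperse` and the IDs are made up for illustration, and this is not the library's own code. I believe the corresponding config field in Coqui is `add_blank`, but please check that against your version.

```python
def intersperse(tokens, blank):
    """Return [blank, t0, blank, t1, ..., blank] for the given token list."""
    # Start from a list of blanks that is long enough, then fill the odd slots.
    result = [blank] * (2 * len(tokens) + 1)
    result[1::2] = tokens
    return result


# Hypothetical phoneme IDs; 0 stands in for the blank ID.
phoneme_ids = [17, 42, 5]
with_blank = intersperse(phoneme_ids, blank=0)
print(len(phoneme_ids), "->", len(with_blank))
```

Comparing a short training run with and without the blank is probably the quickest way to see whether it affects fluency on this dataset.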
-
Here is the train script, if it helps:

import os

from trainer import Trainer, TrainerArgs
from TTS.tts.configs.shared_configs import BaseDatasetConfig

output_path = os.path.dirname(os.path.abspath(__file__))
model_args = VitsArgs()
config = VitsConfig(...)
tokenizer, config = TTSTokenizer.init_from_config(config, IPAPhonemes())
ap = AudioProcessor.init_from_config(config)
train_samples, eval_samples = load_tts_samples(...)
model = Vits(config, ap, tokenizer, speaker_manager=None)
trainer = Trainer(...)
-
Hi, first of all, thanks for your hard work in challenging the proprietary TTS systems, love it. I am currently trying to train some German VITS models with Coqui, and I am finding that the prosody is really weird. Here is a model output after about 200k steps:
Sentence:
Es ist schade, dass die EU als die humanste und moralischste aller Ländergruppierungen angesehen wird, aber sie wollen die Menschenrechte nicht aufrechterhalten und den Magnitsky Act nicht nutzen.
Phonemes:
ɛs ɪst ʃaːdə, das diː eːʔuː als diː humaːnstə ʊnt moʁaːlɪʃstə alɐ lɛndɐɡʁʊpiːʁʊŋən anɡəzeːən vɪʁt, aːbɐ ziː vɔlən diː mɛnʃn̩ʁɛçtə nɪçt aʊfʁɛçtʔɛɐhaltn̩ ʊnt deːn maɡnɪt͡ski ɛkt nɪçt nʊt͡sn̩.
noise=0.8
audio_noise_0.8.mp4
noise=0
audio_noise_0.mp4
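For readers unfamiliar with the setting: assuming "noise" here refers to the inference noise scale on the VITS prior, the latent is sampled as mean + eps * exp(log_std) * noise_scale, so noise=0 collapses to the deterministic prior mean. A minimal sketch with made-up numbers (`sample_prior` is a hypothetical helper, not a library function):

```python
import math
import random


def sample_prior(means, log_stds, noise_scale, rng=random):
    """Sample z = m + eps * exp(log_std) * noise_scale for each dimension."""
    return [
        m + rng.gauss(0.0, 1.0) * math.exp(s) * noise_scale
        for m, s in zip(means, log_stds)
    ]


# Made-up prior statistics for two latent dimensions.
means = [0.5, -1.0]
log_stds = [0.0, 0.3]

deterministic = sample_prior(means, log_stds, noise_scale=0.0)  # equals the means
stochastic = sample_prior(means, log_stds, noise_scale=0.8)     # varies per call
```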
For comparison here is our trained ForwardTacotron model (100k steps) + a modified MelGAN:
forward_taco_melgan.mp4
Any idea what the problem could be? I switched off phonemization and use IPAPhonemes as the character set; the rest of the config is default. Any help would be appreciated :) If needed, I can of course post TensorBoard graphs, configs, etc.
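One check that may be worth running with phonemization switched off is whether every symbol in the pre-phonemized text is actually in the character set, since symbols the tokenizer does not know may be dropped or mishandled. The sketch below is only illustrative: `find_uncovered`, `describe`, and `toy_vocab` are hypothetical names, and the toy vocabulary is not the real contents of IPAPhonemes. Combining marks such as the tie bar in t͡s or the syllabic mark in n̩ are the kind of symbol to look out for.

```python
import unicodedata
from collections import Counter


def find_uncovered(text, vocab):
    """Count the characters in `text` that are not present in `vocab`."""
    return dict(Counter(ch for ch in text if ch not in vocab))


def describe(ch):
    """Readable label for a character, useful for invisible combining marks."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}"


# Toy vocabulary for illustration only; replace it with the tokenizer's real one.
toy_vocab = set("abdefhiklmnostuvzçŋɐɔəɛɡɪʁʃʊʔː ,.")

# Tail of the phoneme string from the post above.
phonemes = "maɡnɪt͡ski ɛkt nɪçt nʊt͡sn̩."

for ch, count in find_uncovered(phonemes, toy_vocab).items():
    print(describe(ch), count)
```

Running this over the whole training metadata, with the vocabulary taken from the actual tokenizer, would show whether any symbols are silently falling outside the character set.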