
Training vs tensorboard metrics #211

Open
smlkdev opened this issue Nov 8, 2024 · 8 comments


smlkdev commented Nov 8, 2024

Will my training yield better results over time? So far, training has taken about 9 hours.
I have 1500 wav samples, with a total audio length of approximately 2 hours.

[Screenshot: TensorBoard loss curves, 2024-11-08]

What other metrics should I pay attention to in TensorBoard?

smlkdev commented Nov 9, 2024

Update after ~34h:
A little improvement is visible, but I'm not sure whether I should keep training longer, given that the curves are flattening.

[Screenshots: TensorBoard loss curves after ~34h, 2024-11-09]

@jeremy110

We usually look at g/total, and from your graph, it seems to be decreasing pretty well. But I’m not sure if 2 hours of training data is enough; I initially used around 8 to 10 hours for training.


smlkdev commented Nov 10, 2024

> We usually look at g/total, and from your graph, it seems to be decreasing pretty well. But I’m not sure if 2 hours of training data is enough; I initially used around 8 to 10 hours for training.

@jeremy110 Thank you for your response! I’m honestly a bit hooked on watching the progress as it keeps going down, so I can’t seem to stop checking in :-)

Currently at 68 hours.

[Screenshot: TensorBoard loss curves at ~68h, 2024-11-10]

I’m planning to create an 8-10 hour audio dataset for the next training session. Could you suggest what kind of text data I should gather for it? So far, I’ve used random articles and some ChatGPT-generated data, but I’ve heard that people sometimes read books, for example. Is there perhaps a dataset of quality English sentences available that covers a variety of language phenomena? I tried searching but found nothing.

@jeremy110

@smlkdev
Basically, this training can be kept short since it’s just a fine-tuning session; no need to make it too long. Here’s my previous TensorBoard log for your reference (#120 (comment)).

I haven’t specifically researched text types. My own dataset was professionally recorded, with sentences that resemble reading books. I’m not very familiar with English datasets—are you planning to train in English?


smlkdev commented Nov 11, 2024

This is my first attempt at ML/training/voice cloning, and I decided to use English. I briefly read the Thai thread, and it was way too complex for me to start with.

Your training was 32 hours long, and to me (I'm no expert) the inferred voice matched the original :) That's really nice. Is that the voice that had 8-10 hours of audio, as you mentioned earlier?

@jeremy110

Yes, that's correct. I tried both single-speaker and multi-speaker models, and the total duration is around 8-10 hours.

If this is your first time getting into it, I recommend you try F5-TTS. There are a lot of people in the forums who have trained their own models, and some even wrote a Gradio interface, which is very convenient.


smlkdev commented Nov 12, 2024

@jeremy110 thank you for your responses.

Is F5-TTS better than MeloTTS in terms of quality?

I just realized that my cloned MeloTTS voice doesn’t add breaks between sentences. I have to add them manually: splitting the text into sentences, generating each part, and then merging everything back together with pauses in between. This could be automated, of course, but it's still a bit of work. (I had been focusing on single sentences before, and I liked the quality.)
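The split-generate-merge workaround described above can be sketched roughly as follows. This is a minimal sketch, not MeloTTS's actual API: `synthesize()` is a hypothetical stand-in for the real TTS call, and the sample rate and pause length are assumptions you'd tune.

```python
import re
import numpy as np

SAMPLE_RATE = 44100  # assumption; use your model's actual output rate


def synthesize(sentence: str) -> np.ndarray:
    """Hypothetical stand-in for the real TTS call.

    Returns 0.5 s of dummy audio so the sketch runs end to end.
    """
    return np.zeros(SAMPLE_RATE // 2, dtype=np.float32)


def tts_with_pauses(text: str, pause_s: float = 0.4) -> np.ndarray:
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    silence = np.zeros(int(SAMPLE_RATE * pause_s), dtype=np.float32)
    chunks = []
    for i, sent in enumerate(sentences):
        chunks.append(synthesize(sent))
        if i < len(sentences) - 1:
            chunks.append(silence)  # pause between sentences only
    return np.concatenate(chunks)


audio = tts_with_pauses("First sentence. Second one! Third?")
```

In a real pipeline you would replace `synthesize()` with the model call and write `audio` out with `soundfile` or similar; the regex split is deliberately naive and will mishandle abbreviations like "e.g.".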

@jeremy110

jeremy110 commented Nov 13, 2024

In terms of quality, I think F5-TTS is quite good. You can try it out on the Huggingface demo.

The pauses within sentences mainly depend on your commas (","). The program adds a space after punctuation to create a pause. However, if the audio files you trained on have very little silence before and after the speech, the generated audio will also have little silence. Of course, you can add the pauses manually, but you could also address it by adjusting the training data.
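One way to adjust the training data as suggested above is to normalize the silence at the edges of each clip: trim whatever leading/trailing silence is there and pad a consistent amount back. A minimal numpy sketch with a simple amplitude gate; the threshold, pad length, and sample rate are assumptions, and `librosa.effects.trim` is a more robust alternative to the hand-rolled gate:

```python
import numpy as np

SR = 44100  # assumed sample rate of the training clips


def normalize_edge_silence(wav: np.ndarray, pad_s: float = 0.2,
                           threshold: float = 1e-3) -> np.ndarray:
    """Trim leading/trailing near-silence, then pad both ends with
    exactly `pad_s` seconds of silence (simple amplitude-gate sketch)."""
    voiced = np.flatnonzero(np.abs(wav) > threshold)
    if voiced.size == 0:
        return wav  # clip is all silence; leave it unchanged
    core = wav[voiced[0]:voiced[-1] + 1]
    pad = np.zeros(int(SR * pad_s), dtype=wav.dtype)
    return np.concatenate([pad, core, pad])


# Toy clip: 100 silent samples, 1000 voiced samples, 5000 silent samples.
clip = np.concatenate([np.zeros(100, dtype=np.float32),
                       0.5 * np.ones(1000, dtype=np.float32),
                       np.zeros(5000, dtype=np.float32)])
out = normalize_edge_silence(clip)
```

Running this over the whole dataset before training should give the model a consistent notion of sentence-final silence, which is what the generated audio then reproduces.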
