Help with training a model #4028
Unanswered
knight3000
asked this question in
General Q&A
Replies: 1 comment
-
6 short wav files is really not enough to train a model...
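For scale: six clips of 4 to 13 seconds each add up to somewhere between 24 and 78 seconds of audio. A minimal sketch for measuring what a folder actually contains, using only the standard library; the function name `total_wav_seconds` is made up for illustration, and it assumes the clips are uncompressed PCM WAV files that the `wave` module can open:

```python
import wave
from pathlib import Path


def total_wav_seconds(folder):
    """Return (file_count, total_seconds) for the .wav files directly inside folder."""
    count = 0
    seconds = 0.0
    for path in sorted(Path(folder).glob("*.wav")):
        # Assumption: plain PCM WAV; the wave module rejects other encodings.
        with wave.open(str(path), "rb") as wav:
            seconds += wav.getnframes() / wav.getframerate()
        count += 1
    return count, seconds
```

Calling `total_wav_seconds(...)` on the folder that holds the recordings gives a quick number to compare against the amount of data a from-scratch training run would need.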
-
Hi all! I'm trying to train the model, but I'm running into a few errors that I cannot figure out. I have "good" clean wav files with associated metadata. I have 6 wav files in total, ranging from 4 to 13 seconds each. Could anyone provide some guidance? Here is my entire console output when I run the script:
`> Setting up Audio Processor...
| > sample_rate:22050
| > resample:False
| > num_mels:80
| > log_func:np.log10
| > min_level_db:-100
| > frame_shift_ms:None
| > frame_length_ms:None
| > ref_level_db:20
| > fft_size:1024
| > power:1.5
| > preemphasis:0.0
| > griffin_lim_iters:60
| > signal_norm:True
| > symmetric_norm:True
| > mel_fmin:0
| > mel_fmax:None
| > pitch_fmin:1.0
| > pitch_fmax:640.0
| > spec_gain:20.0
| > stft_pad_mode:reflect
| > max_norm:4.0
| > clip_norm:True
| > do_trim_silence:True
| > trim_db:45
| > do_sound_norm:False
| > do_amp_to_db_linear:True
| > do_amp_to_db_mel:True
| > do_rms_norm:False
| > db_level:None
| > stats_path:None
| > base:10
| > hop_length:256
| > win_length:1024
| > Found 1 files in C:\Users\Victor\Desktop\voice cloning project\voices\voice
fatal: not a git repository (or any of the parent directories): .git
fatal: not a git repository (or any of the parent directories): .git
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\Victor\Desktop\voice cloning project\training.py", line 83, in
trainer.fit()
File "C:\Users\Victor\Desktop\voice cloning project\clone_env\lib\site-packages\trainer\trainer.py", line 1860, in fit
remove_experiment_folder(self.output_path)
File "C:\Users\Victor\Desktop\voice cloning project\clone_env\lib\site-packages\trainer\generic_utils.py", line 77, in remove_experiment_folder
fs.rm(experiment_path, recursive=True)
File "C:\Users\Victor\Desktop\voice cloning project\clone_env\lib\site-packages\fsspec\implementations\local.py", line 183, in rm
shutil.rmtree(p)
File "C:\Users\Victor\AppData\Local\Programs\Python\Python39\lib\shutil.py", line 759, in rmtree
return _rmtree_unsafe(path, onerror)
File "C:\Users\Victor\AppData\Local\Programs\Python\Python39\lib\shutil.py", line 629, in _rmtree_unsafe
onerror(os.unlink, fullname, sys.exc_info())
File "C:\Users\Victor\AppData\Local\Programs\Python\Python39\lib\shutil.py", line 627, in _rmtree_unsafe
os.unlink(fullname)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:/Users/Victor/Desktop/voice cloning project/run-October-16-2024_04+24PM-0000000\trainer_0_log.txt'`
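The `During handling of the above exception, another exception occurred` line indicates that the `PermissionError` was raised while an earlier error was already being handled, so it is probably a side effect of the cleanup step rather than the underlying problem. The earlier error is missing from the pasted output. A minimal sketch for recovering it, assuming the call to `trainer.fit()` is wrapped by hand; `root_cause` and `run_and_report` are hypothetical helper names, not part of the trainer package:

```python
import traceback


def root_cause(exc):
    """Follow the exception chain back to the first exception that was raised."""
    seen = set()
    while id(exc) not in seen:
        seen.add(id(exc))
        # __cause__ is set by "raise ... from ..."; __context__ is set implicitly
        # when an exception is raised while another one is being handled.
        earlier = exc.__cause__ or exc.__context__
        if earlier is None:
            break
        exc = earlier
    return exc


def run_and_report(fit):
    """Call fit() and, on failure, print the traceback of the original error."""
    try:
        fit()
    except Exception as exc:
        first = root_cause(exc)
        traceback.print_exception(type(first), first, first.__traceback__)
        raise
```

With this in place, `run_and_report(trainer.fit)` would print the first exception in the chain before re-raising the `PermissionError`.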
And here is the training script I'm working with:
import os
# Trainer: Where the ✨️ happens.
# TrainerArgs: Defines the set of arguments of the Trainer.
from trainer import Trainer, TrainerArgs
# GlowTTSConfig: all model related values for training, validating and testing.
from TTS.tts.configs.glow_tts_config import GlowTTSConfig
# BaseDatasetConfig: defines name, formatter and path of the dataset.
from TTS.tts.configs.shared_configs import BaseDatasetConfig
from TTS.tts.datasets import load_tts_samples
from TTS.tts.models.glow_tts import GlowTTS
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.utils.audio import AudioProcessor
# Set the path to your dataset directory.
# Forward slashes also work on Windows and avoid backslash escapes inside the string.
your_dataset_directory = "voices/voice/"  # Change this to your directory
# We use the same path as this script as our training folder.
output_path = os.path.dirname(os.path.abspath(__file__))
# DEFINE DATASET CONFIG
# Set your custom dataset and define its path.
dataset_config = BaseDatasetConfig(
    formatter="ljspeech",  # Use the ljspeech formatter
    meta_file_train="metadata.txt",  # Update if your metadata file has a different name
    path=your_dataset_directory,  # Use your dataset directory
)
# INITIALIZE THE TRAINING CONFIGURATION
# Configure the model. Every config class inherits the BaseTTSConfig.
config = GlowTTSConfig(
    batch_size=4,
    eval_batch_size=2,
    num_loader_workers=1,
    num_eval_loader_workers=1,
    run_eval=False,
    test_delay_epochs=-1,
    epochs=1000,
    text_cleaner="phoneme_cleaners",
    use_phonemes=True,
    phoneme_language="en-us",
    phoneme_cache_path=os.path.join(output_path, "phoneme_cache"),
    print_step=25,
    print_eval=False,
    mixed_precision=False,
    output_path=output_path,
    datasets=[dataset_config],
)
if __name__ == "__main__":
    # INITIALIZE THE AUDIO PROCESSOR
    # Audio processor is used for feature extraction and audio I/O.
    ap = AudioProcessor.init_from_config(config)
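The console output reports `Found 1 files` even though six recordings are mentioned, which suggests the metadata file and the audio folder may not line up. A rough sketch for checking that, under the assumption that the `ljspeech` formatter expects rows of the form `file_id|text|normalized_text` and looks for `wavs/<file_id>.wav` under the dataset path; the helper name `check_ljspeech_metadata` is made up for illustration and the assumed layout should be checked against the formatter in the installed TTS version:

```python
from pathlib import Path


def check_ljspeech_metadata(dataset_dir, meta_name="metadata.txt"):
    """Return a list of (line_number, problem) for metadata rows that look unusable."""
    root = Path(dataset_dir)
    problems = []
    lines = (root / meta_name).read_text(encoding="utf-8").splitlines()
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        cols = line.split("|")
        # Assumed row layout: file_id|text|normalized_text
        if len(cols) < 3:
            problems.append((number, "expected 3 '|'-separated columns"))
            continue
        wav = root / "wavs" / (cols[0] + ".wav")
        if not wav.is_file():
            problems.append((number, f"missing audio file: {wav.name}"))
    return problems
```

An empty list from `check_ljspeech_metadata("voices/voice/")` would mean every row has three columns and a matching file; anything else points at the rows to fix first.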