A TensorFlow Implementation of DC-TTS: yet another text-to-speech model

fork of kyubong's TensorFlow Implementation of DC-TTS

from Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention.

Contact

fork maintained by [email protected]

Requirements

NumPy >= 1.11.1
TensorFlow >= 1.12.0
librosa
tqdm
matplotlib
scipy

Changes from Kyubong

data loading for python3 (remove codecs and other fixes as seen in issue #11)
absolute paths to data (preprocessed files are written to, and read from, a specified absolute path)
argparse for specifying task number and GPU (train.py and synthesize.py)
allow custom sentences, output directory when generating, and output spectrograms as images
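
A minimal sketch of how the task number and GPU flags could be parsed. Only the -n and -g flags come from this README; the long option names, the default GPU id, and the use of CUDA_VISIBLE_DEVICES are assumptions, and the actual train.py may differ.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the task number and GPU id (illustrative, not the repo's code)."""
    parser = argparse.ArgumentParser(description="DC-TTS training (sketch)")
    # -n selects the network: 1 = Text2Mel, 2 = SSRN (see Training)
    parser.add_argument("-n", "--num", type=int, choices=[1, 2], required=True,
                        help="1 trains Text2Mel, 2 trains SSRN")
    # -g selects the GPU; the long name and the default of 0 are assumptions
    parser.add_argument("-g", "--gpu", type=int, default=0,
                        help="GPU id to use")
    return parser.parse_args(argv)


def select_gpu(gpu_id):
    """Expose a single GPU to CUDA; set this before TensorFlow touches the GPU."""
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)


# Equivalent to running: train.py -n 1 -g 0
args = parse_args(["-n", "1", "-g", "0"])
select_gpu(args.gpu)
```

Passing an explicit argument list keeps the sketch testable; a real script would call parse_args() with no arguments so that it reads the command line.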

Example

rice is often served in round bowls

text2mel: ~50K steps; SSRN ~15K steps

text2mel: ~70K steps; SSRN ~20K steps

Data

LJSpeech

download LJ Speech Dataset 1.1
rename metadata.csv to transcript.csv (will update later...?)
add the absolute path to the corpus to hyperparams.py as data=<path>
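
For orientation, the LJ Speech transcript is a pipe-delimited file with one clip per line (id|text|normalized text), and the audio lives in a wavs subdirectory. The sketch below shows how one such line maps to a wav path. The function name, the sample line, and the data directory are made up for illustration; data_load.py may do this differently.

```python
import os


def parse_transcript_line(line, data_dir):
    """Split one 'id|text|normalized text' line and locate its wav file."""
    fields = line.rstrip("\n").split("|")
    file_id, text = fields[0], fields[1]
    # Fall back to the raw text if a line has no separate normalized field
    normalized = fields[-1]
    wav_path = os.path.join(data_dir, "wavs", file_id + ".wav")
    return wav_path, text, normalized


# Made-up line in the LJ Speech format (not an actual entry from the corpus)
sample = "LJ000-0000|Dr. Smith arrived at 5 pm.|Doctor Smith arrived at five p m.\n"
# Hypothetical corpus location; use the path set as data=<path> in hyperparams.py
wav_path, text, normalized = parse_transcript_line(sample, "/data/LJSpeech-1.1")
```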

Training

check corpus path in hyperparams.py
run python(3.6) prepo.py to create mel and mag subdirectories (this speeds up training a lot)
run python(3.6) train.py -n 1 to train text2mel. use -g flag to specify GPU id.
run python(3.6) train.py -n 2 to train SSRN (mel upscaling). use -g flag to specify GPU id.
note: the two training runs can be run at the same time on separate GPUs

Monitoring

tensorboard can be used to monitor training by specifying the log directory (for LJS, LJ01-1 and LJ01-2), e.g.:

tensorboard --logdir=/home/derek/PythonProjects/dc_tts/logs/LJ01-1 --port=9999

Synthesis

run python(3.6) synthesize.py, specifying GPU with -g, file with -f (format like harvard_sentences.txt), output dir with -o

the default file and output are specified in hyperparams.py

the spectrograms are also written as images

(from Kyubong master) Notes

The paper didn't mention normalization, but without normalization I couldn't get it to work. So I added layer normalization.
The paper fixed the learning rate to 0.001, but it didn't work for me. So I decayed it.
I tried to train Text2Mel and SSRN simultaneously, but it didn't work. I guess separating those two networks mitigates the burden of training.
The authors claimed that the model can be trained within a day, but unfortunately the luck was not mine. However, this is obviously much faster than Tacotron as it uses only convolution layers.
Thanks to the guided attention, the attention plot looks monotonic almost from the beginning. I guess this holds the alignment tight so it won't lose track.
The paper didn't mention dropouts. I applied them as I believe it helps for regularization.
Check also other TTS models such as Tacotron and Deep Voice 3.
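
To illustrate the guided attention note above: the paper penalizes attention that strays from the diagonal with weights W[n][t] = 1 - exp(-(n/N - t/T)^2 / (2 g^2)), with g = 0.2, and adds the mean of A * W to the loss. The following is a plain-Python sketch of that formula, not the TensorFlow code in this repository; the function names are illustrative.

```python
import math


def guided_attention_weights(n_text, n_frames, g=0.2):
    """Return W[n][t] = 1 - exp(-(n/N - t/T)^2 / (2 g^2))."""
    return [
        [1.0 - math.exp(-((n / n_text - t / n_frames) ** 2) / (2.0 * g * g))
         for t in range(n_frames)]
        for n in range(n_text)
    ]


def guided_attention_loss(attention, weights):
    """Mean of the element-wise product of attention and penalty weights."""
    total = sum(a * w
                for row_a, row_w in zip(attention, weights)
                for a, w in zip(row_a, row_w))
    return total / (len(attention) * len(attention[0]))


# A perfectly diagonal 4x4 attention matrix incurs no penalty,
# while an anti-diagonal one is penalized.
weights = guided_attention_weights(4, 4)
diagonal = [[1.0 if n == t else 0.0 for t in range(4)] for n in range(4)]
reversed_attention = [row[::-1] for row in diagonal]
diag_loss = guided_attention_loss(diagonal, weights)
rev_loss = guided_attention_loss(reversed_attention, weights)
```

Because the penalty is zero on the diagonal and grows with distance from it, the attention is pushed toward a near-monotonic alignment early in training.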

