Create voice datasets in the LJ Speech format.(1) Export your dataset and use it to train a model with Coqui 🐸 TTS.
Choose the directory where you would like to retrieve your exported dataset.
Bind mount that directory to /app/backend/user/projects with the volume option (-v), as below:
docker run -p 8000:8000 -v /path/to/your/export/dir:/app/backend/user/projects echokeeper
Running the container this way loads the default prompts. You could record a 25-minute dataset with them, but the result will probably not fit your use case. I recommend building your own prompts with this notebook.
If you want to use your own prompts...
- Choose the directory where you would like to retrieve your exported dataset. In this example, we'll call that directory 'output'.
- Create a directory structure like this:
/output
|-- projects/
|   |-- (leave this directory empty if you are starting from scratch)
|   |-- (to continue recording to an existing project, place that project folder here)
|-- prompts/
    |-- prompt.json
- Bind mount your output directory to /app/backend/user:
docker run -p 8000:8000 -v /path/to/your/export/dir:/app/backend/user echokeeper
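Before starting the container, it can help to check that the mounted directory matches the layout above. A minimal sketch, assuming only what the layout shows (a projects/ directory and a prompts/prompt.json file); `check_user_dir` is a hypothetical helper, not part of echokeeper, and it only checks that prompt.json parses as JSON because the prompt schema is not documented here.

```python
import json
import tempfile
from pathlib import Path


def check_user_dir(root) -> list[str]:
    """Return problems with the expected layout; an empty list means it looks OK."""
    base = Path(root)
    problems = []
    if not (base / "projects").is_dir():
        problems.append("missing directory: projects/")
    prompt_file = base / "prompts" / "prompt.json"
    if not prompt_file.is_file():
        problems.append("missing file: prompts/prompt.json")
    else:
        # Only checks that the file is valid JSON; no prompt schema is assumed.
        try:
            json.loads(prompt_file.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            problems.append("prompts/prompt.json is not valid JSON")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(check_user_dir(tmp))  # both entries are reported missing
        (Path(tmp) / "projects").mkdir()
        (Path(tmp) / "prompts").mkdir()
        # "[]" is a placeholder for this demo, not the real prompt format.
        (Path(tmp) / "prompts" / "prompt.json").write_text("[]", encoding="utf-8")
        print(check_user_dir(tmp))  # []
```

Point it at your 'output' directory (the one you mount to /app/backend/user) before running docker.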
Type a name for your project into the New Project input field.
Whisper is the magic behind this whole app: it transcribes your recorded audio into text. Select your language and model size. Bigger models transcribe more accurately but require more RAM and disk space.
The model is downloaded after you make your first recording. Unfortunately, this means there will be a long pause between finishing your first recording and seeing your first transcription. Every subsequent transcription will feel nearly instantaneous.
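To make the size/memory trade-off concrete, here is a small sketch that picks the largest Whisper model fitting a given memory budget. The figures are the approximate "required VRAM" values listed in the openai/whisper README (about 1 GB for tiny and base, 2 GB for small, 5 GB for medium, 10 GB for large); treat them as rough guidance. `pick_whisper_model` is a hypothetical helper and is not how echokeeper chooses a model; the app leaves that choice to you.

```python
# Approximate memory needs (GB) per Whisper model size, taken from the
# "Required VRAM" column of the openai/whisper README. Rough figures only.
WHISPER_MODEL_MEMORY_GB = {
    "tiny": 1,
    "base": 1,
    "small": 2,
    "medium": 5,
    "large": 10,
}


def pick_whisper_model(available_gb: float) -> str:
    """Return the largest model whose approximate requirement fits, else 'tiny'."""
    best = "tiny"
    # Dicts keep insertion order, so this walks from smallest to largest.
    for name, need in WHISPER_MODEL_MEMORY_GB.items():
        if need <= available_gb:
            best = name
    return best


if __name__ == "__main__":
    for gb in (0.5, 3, 16):
        print(f"{gb} GB -> {pick_whisper_model(gb)}")
```

Falling back to "tiny" when nothing fits mirrors the advice above: smaller models are less accurate but still usable.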
Run on your machine without a container
- The app requires the command-line tool ffmpeg (used by Whisper) and the PortAudio library to be installed on your system. Both are available from most package managers:
# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg
sudo apt install portaudio19-dev
# on Arch Linux
sudo pacman -S ffmpeg
sudo pacman -S portaudio
# on MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg
brew install portaudio
# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg
# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg
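After installing, a quick preflight check can confirm that both dependencies are visible. This is a best-effort sketch, not part of the project: ffmpeg is a command-line tool, so it is looked up on PATH with `shutil.which`, while PortAudio is a shared library, so it is looked up with `ctypes.util.find_library`, which can miss libraries installed in unusual locations. `check_dependencies` is a hypothetical name.

```python
import ctypes.util
import shutil


def check_dependencies() -> dict[str, bool]:
    """Report whether ffmpeg and PortAudio appear to be installed."""
    return {
        # ffmpeg is an executable, so search the PATH for it.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # PortAudio is a shared library; find_library is a heuristic lookup.
        "portaudio": ctypes.util.find_library("portaudio") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```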
- Install the backend and frontend environments
sh install_playground.sh
- Run the backend
cd backend && source venv/bin/activate && flask run --port 8000
- In a different terminal, run the React frontend
cd interface && yarn start
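If the frontend cannot reach the backend, first check that something is listening on port 8000 (flask run binds to 127.0.0.1 by default). A minimal sketch using only the standard library; `backend_is_listening` is a hypothetical helper, and it only proves that a TCP connection succeeds, not that the process on that port is the echokeeper backend.

```python
import socket


def backend_is_listening(host: str = "127.0.0.1", port: int = 8000,
                         timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # Raises OSError (e.g. ConnectionRefusedError or a timeout) if nothing answers.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    status = "up" if backend_is_listening() else "not reachable"
    print(f"backend on port 8000: {status}")
```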
- [ ] Normalize dataset audio during the export phase
- [ ] Track the duration of recordings
- [ ] In export_metadata_txt(), create the third column of the LJ Speech format (normalized transcription), per Keith Ito's spec
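The third-column item involves expanding numbers into full words. Below is a minimal sketch of the cardinal-number part only, assuming plain non-negative integers. `int_to_words` and `normalize_numbers` are hypothetical helpers, not the project's export_metadata_txt() and not Keith Ito's own normalizer. Ordinals, monetary units, decimals, and years (usually read as "eighteen sixty-two" rather than "one thousand eight hundred sixty-two") are not handled.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
SCALES = [(10**9, "billion"), (10**6, "million"), (10**3, "thousand")]


def int_to_words(n: int) -> str:
    """Spell out a non-negative integer in English, e.g. 42 -> 'forty-two'."""
    if n < 0:
        raise ValueError("negative numbers are not handled in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + (f"-{ONES[rest]}" if rest else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return f"{ONES[hundreds]} hundred" + (f" {int_to_words(rest)}" if rest else "")
    for value, name in SCALES:
        if n >= value:
            high, rest = divmod(n, value)
            return f"{int_to_words(high)} {name}" + (f" {int_to_words(rest)}" if rest else "")
    raise AssertionError("unreachable: n >= 1000 always matches 'thousand'")


def normalize_numbers(text: str) -> str:
    """Expand plain integers, including '1,250'-style ones, into words."""
    # Drop thousands separators first so "1,250" is read as one number.
    text = re.sub(r"(?<=\d),(?=\d{3}\b)", "", text)
    return re.sub(r"\d+", lambda m: int_to_words(int(m.group())), text)


if __name__ == "__main__":
    print(normalize_numbers("The ship carried 1,250 men and 42 horses."))
```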
This repository and the code and model weights of Whisper are released under the MIT License.
(1) https://keithito.com/LJ-Speech-Dataset/
Metadata is provided in transcripts.csv. This file consists of one record per line, delimited by the pipe character (0x7c). The fields are:
ID: this is the name of the corresponding .wav file
Transcription: words spoken by the reader (UTF-8)
Normalized Transcription: transcription with numbers, ordinals, and monetary units expanded into full words (UTF-8).
Note: normalized transcriptions are incomplete as of 5/6/23.
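The record layout described above can be sketched in Python, the language of the Flask backend. The helper names and the clip ID are hypothetical; the sketch assumes only what the quoted spec states: one UTF-8 record per line, with three pipe-delimited fields (ID, transcription, normalized transcription). Because the pipe is the delimiter, the writer rejects fields that contain one rather than trying to escape it.

```python
import tempfile
from pathlib import Path


def format_metadata_line(clip_id: str, transcription: str, normalized: str) -> str:
    """Build one record: ID|Transcription|Normalized Transcription."""
    fields = [clip_id, transcription, normalized]
    for field in fields:
        # A pipe would add a column and a newline would split the record.
        if "|" in field or "\n" in field:
            raise ValueError(f"field contains a delimiter or newline: {field!r}")
    return "|".join(fields)


def write_metadata(path, rows) -> None:
    """Write (ID, transcription, normalized) tuples as UTF-8, one record per line."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for row in rows:
            f.write(format_metadata_line(*row) + "\n")


def read_metadata(path) -> list[tuple[str, ...]]:
    """Read records back as tuples of fields, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [tuple(line.rstrip("\n").split("|")) for line in f if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "transcripts.csv"
        write_metadata(path, [("clip_0001", "I have 2 cats.", "I have two cats.")])
        print(read_metadata(path))
```

Until normalization is complete, copying the transcription into the third field is one possible stopgap, but numbers would then stay unexpanded.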