Add a readme for the parler-tts example. (#2434)

* Add a readme for the parler-tts example.
* Remove the python decode script.
* mp4 tweaks.
* Another readme tweak.
huggingface · Aug 19, 2024 · 14fd2d9
1 parent 31a1075

Showing 4 changed files with 24 additions and 30 deletions.
diff --git a/README.md b/README.md
@@ -120,6 +120,8 @@ We also provide a some command line based examples using state of the art models
   model using residual vector quantization.
 - [MetaVoice](./candle-examples/examples/metavoice/): foundational model for
   text-to-speech.
+- [Parler-TTS](./candle-examples/examples/parler-tts/): large text-to-speech
+  model.
 - [T5](./candle-examples/examples/t5), [Bert](./candle-examples/examples/bert/),
   [JinaBert](./candle-examples/examples/jina-bert/) : useful for sentence embeddings.
 - [DINOv2](./candle-examples/examples/dinov2/): computer vision model trained
@@ -236,6 +238,7 @@ If you have an addition to this list, please submit a pull request.
         - Whisper, multi-lingual speech-to-text.
         - EnCodec, audio compression model.
         - MetaVoice-1B, text-to-speech model.
+        - Parler-TTS, text-to-speech model.
     - Computer Vision Models.
         - DINOv2, ConvMixer, EfficientNet, ResNet, ViT, VGG, RepVGG, ConvNeXT,
           ConvNeXTv2, MobileOne, EfficientVit (MSRA), MobileNetv4, Hiera.

diff --git a/candle-examples/examples/parler-tts/README.md b/candle-examples/examples/parler-tts/README.md
@@ -0,0 +1,21 @@
+# candle-parler-tts
+
+[Parler-TTS](https://huggingface.co/parler-tts/parler-tts-large-v1) is a large
+text-to-speech model with 2.2B parameters trained on ~45K hours of audio data.
+The voice can be controlled by a text description, separate from the text to be spoken.
+
+## Run an example
+
+```bash
+cargo run --example parler-tts -r -- \
+  --prompt "Hey, how are you doing today?"
+```
+
+To control the voice, pass a text description of it with the `--description` argument.
+```bash
+cargo run --example parler-tts -r -- \
+  --prompt "Hey, how are you doing today?" \
+  --description "A female speaker delivers a slightly expressive and animated speech with a moderate speed and pitch. The recording is of very high quality, with the speaker's voice sounding clear and very close up."
+```
+
+https://github.com/huggingface/candle/raw/main/candle-examples/examples/parler-tts/hello.mp4
diff --git a/candle-examples/examples/parler-tts/decode.py b/candle-examples/examples/parler-tts/decode.py
diff --git a/candle-examples/examples/parler-tts/hello.mp4 b/candle-examples/examples/parler-tts/hello.mp4