A Rust-based text-to-speech synthesizer that uses the CMU phonetic dictionary and pre-recorded phonemes to generate funny-sounding speech in my voice. To use your own voice, replace the samples in pronouncer_lib/audio; they are compiled into the program at build time.
- Text-to-speech synthesis using CMU phonetic dictionary
- High-quality pre-recorded phonemes for natural sound
- Smooth audio transitions using advanced crossfading
- Outputs standard WAV audio files (44.1kHz, 16-bit)
- Static compilation of audio data for standalone binaries
- Ensure you have Rust installed (https://rustup.rs/)
- Clone this repository
- Build the project:
cargo build --release
Run the program with words as arguments:
cargo run --release -- "hello world"
Or run it interactively:
cargo run --release
Enter a string: hello world
The program will generate an output.wav file containing the synthesized speech.
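For reference, the layout of a 44.1kHz, 16-bit mono WAV file like output.wav can be sketched with the standard library alone. The project lists hound for WAV handling, so this is only an illustration of the format, not the project's code; the helper name write_wav_mono16 is hypothetical.

```rust
use std::io::{self, Write};

/// Writes 16-bit mono PCM samples as a canonical 44-byte-header WAV stream.
/// Hypothetical helper for illustration; the project itself depends on hound.
fn write_wav_mono16<W: Write>(out: &mut W, sample_rate: u32, samples: &[i16]) -> io::Result<()> {
    let channels: u16 = 1;
    let bits_per_sample: u16 = 16;
    let block_align: u16 = channels * bits_per_sample / 8; // bytes per frame
    let byte_rate: u32 = sample_rate * u32::from(block_align);
    let data_len: u32 = (samples.len() * 2) as u32; // 2 bytes per sample

    // RIFF container header
    out.write_all(b"RIFF")?;
    out.write_all(&(36 + data_len).to_le_bytes())?;
    out.write_all(b"WAVE")?;

    // "fmt " chunk: 16 bytes of PCM format description
    out.write_all(b"fmt ")?;
    out.write_all(&16u32.to_le_bytes())?;
    out.write_all(&1u16.to_le_bytes())?; // format tag 1 = integer PCM
    out.write_all(&channels.to_le_bytes())?;
    out.write_all(&sample_rate.to_le_bytes())?;
    out.write_all(&byte_rate.to_le_bytes())?;
    out.write_all(&block_align.to_le_bytes())?;
    out.write_all(&bits_per_sample.to_le_bytes())?;

    // "data" chunk: little-endian samples
    out.write_all(b"data")?;
    out.write_all(&data_len.to_le_bytes())?;
    for s in samples {
        out.write_all(&s.to_le_bytes())?;
    }
    Ok(())
}
```

Because the function takes any Write, it can target a std::fs::File for output.wav or a Vec<u8> when testing.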
The project is organized as a Rust workspace containing two main crates:
Core library containing the text-to-speech engine:
- src/lib.rs - Main library interface and audio processing
- src/phoneme.rs - Phoneme enum and conversion functions
- build.rs - Build script for processing dictionary and audio files
- audio/ - Pre-recorded WAV files for each phoneme
- build/ - Build-time resources including CMU dictionary
Command-line interface executable:
- src/main.rs - CLI implementation; handles argument parsing and file I/O
- Build System
  - Processes CMU dictionary at compile time
  - Serializes phoneme WAV files into binary data
  - Generates optimized lookup tables
- Phoneme System
  - 39 distinct phonemes based on CMU dictionary
  - Each phoneme has a corresponding WAV recording
  - Efficient enum-based representation
- Audio Processing
  - 44.1kHz 16-bit mono WAV output
  - Crossfading algorithm for smooth transitions
  - Fileless audio storage - phoneme WAV data is serialized and embedded directly into the binary
- Dictionary System
  - CMU dictionary-based word to phoneme conversion
  - Fallback to character-by-character pronunciation
  - Efficient hashmap-based lookups
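As a rough illustration of the enum-based phoneme representation, here is a minimal sketch. It lists only ten of the 39 phonemes, and the names Phoneme and from_arpabet are assumptions rather than the actual contents of src/phoneme.rs. It relies on one fact about the CMU dictionary: vowel symbols carry a trailing stress digit (0, 1 or 2), which has to be stripped before matching.

```rust
/// A handful of the 39 CMU phonemes; a complete enum would list all of them.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Phoneme {
    AA,
    AE,
    AH,
    EH,
    ER,
    OW,
    D,
    HH,
    L,
    W,
}

impl Phoneme {
    /// Converts a dictionary symbol such as "AH0" or "hh" into a Phoneme.
    /// Trailing stress digits are ignored and matching is case-insensitive.
    fn from_arpabet(symbol: &str) -> Option<Phoneme> {
        let base = symbol.trim_end_matches(|c: char| c.is_ascii_digit());
        match base.to_ascii_uppercase().as_str() {
            "AA" => Some(Phoneme::AA),
            "AE" => Some(Phoneme::AE),
            "AH" => Some(Phoneme::AH),
            "EH" => Some(Phoneme::EH),
            "ER" => Some(Phoneme::ER),
            "OW" => Some(Phoneme::OW),
            "D" => Some(Phoneme::D),
            "HH" => Some(Phoneme::HH),
            "L" => Some(Phoneme::L),
            "W" => Some(Phoneme::W),
            _ => None, // symbol not covered by this partial sketch
        }
    }
}
```

A Copy enum like this is cheap to store in sequences and can serve directly as a key or index into a table of recordings.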
- The build script (build.rs) processes the CMU dictionary and WAV files
- Dictionary is converted to a binary lookup table using bincode serialization
- WAV files are serialized and embedded directly into the binary
- Static initialization provides immediate access to audio data at runtime
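The dictionary half of the build steps above can be sketched as follows. This assumes the classic CMU dictionary text format (comment lines starting with ";;;", one "WORD  PH1 PH2 ..." entry per line, alternate pronunciations marked like "WORD(1)"). The function names are hypothetical, and the sketch stops before the bincode serialization step so that it stays standard-library only.

```rust
use std::collections::HashMap;

/// Parses one dictionary line into (lowercased word, phoneme symbols).
/// Returns None for blank lines, comments, and entries without phonemes.
fn parse_dict_line(line: &str) -> Option<(String, Vec<String>)> {
    let line = line.trim();
    if line.is_empty() || line.starts_with(";;;") {
        return None;
    }
    let mut parts = line.split_whitespace();
    let head = parts.next()?;
    // Drop an alternate-pronunciation suffix such as "(1)".
    let word = match head.find('(') {
        Some(idx) if idx > 0 => &head[..idx],
        _ => head,
    };
    let phonemes: Vec<String> = parts.map(str::to_string).collect();
    if phonemes.is_empty() {
        return None;
    }
    Some((word.to_lowercase(), phonemes))
}

/// Builds a word -> phonemes table, keeping the first pronunciation of each word.
fn build_table(text: &str) -> HashMap<String, Vec<String>> {
    let mut table = HashMap::new();
    for (word, phonemes) in text.lines().filter_map(parse_dict_line) {
        table.entry(word).or_insert(phonemes);
    }
    table
}
```

In a build script, a table like this would then be serialized and written to the build output directory so the library can embed it.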
- Input text is normalized and split into words
- Words are looked up in the CMU dictionary
- Unknown words fall back to character-by-character pronunciation
- Phoneme sequences are converted to audio samples
- Advanced crossfading is applied between phonemes
- Final audio is written to a WAV file
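The normalization, lookup and fallback steps above might look roughly like this. It is a sketch under assumptions: the function names are hypothetical, and the fallback treats each character as a one-letter word in the same table, which may differ from how the library actually spells out unknown words.

```rust
use std::collections::HashMap;

/// Lowercases the input, strips punctuation (keeping apostrophes),
/// and splits it into words.
fn normalize(text: &str) -> Vec<String> {
    text.split_whitespace()
        .map(|w| {
            w.chars()
                .filter(|c| c.is_alphanumeric() || *c == '\'')
                .collect::<String>()
                .to_lowercase()
        })
        .filter(|w| !w.is_empty())
        .collect()
}

/// Returns the phoneme symbols for a word, falling back to
/// character-by-character pronunciation when the word is unknown.
fn phonemes_for(word: &str, dict: &HashMap<String, Vec<String>>) -> Vec<String> {
    if let Some(phonemes) = dict.get(word) {
        return phonemes.clone();
    }
    // Assumption: each letter has its own entry; letters without one are skipped.
    word.chars()
        .filter_map(|c| dict.get(&c.to_string()))
        .flatten()
        .cloned()
        .collect()
}
```

With entries for "hello", "o" and "k", the input "Hello, OK!" normalizes to the words hello and ok; the first is found directly and the second is spelled out letter by letter.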
- Audio data is compiled directly into the binary, eliminating runtime file I/O
- Efficient bincode serialization for compact data storage
- High-performance hashmap-based dictionary lookups
- Optimized crossfading algorithm for smooth transitions
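To give an idea of what crossfading between phonemes involves, here is a simple linear crossfade over 16-bit samples. The project's actual algorithm is not described here and may well differ; the function name crossfade_append and the overlap parameter are assumptions for illustration.

```rust
/// Appends `next` to `out`, blending the last `overlap` samples of `out`
/// with the first `overlap` samples of `next` using a linear ramp.
/// The overlap is clamped to the shorter of the two segments.
fn crossfade_append(out: &mut Vec<i16>, next: &[i16], overlap: usize) {
    let n = overlap.min(out.len()).min(next.len());
    let start = out.len() - n;
    for i in 0..n {
        // Weight of the incoming segment rises from just above 0 to just below 1.
        let t = (i + 1) as f32 / (n + 1) as f32;
        let mixed = f32::from(out[start + i]) * (1.0 - t) + f32::from(next[i]) * t;
        out[start + i] = mixed.round() as i16;
    }
    // Everything after the overlapped region is copied unchanged.
    out.extend_from_slice(&next[n..]);
}
```

Each joined pair of segments ends up `overlap` samples shorter than a plain concatenation, and the ramp avoids the click that an abrupt jump between two recordings would produce.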
Core dependencies:
- bincode: Fast serialization
- hashbrown: High-performance hashmaps
- hound: WAV file handling
- lazy_static: Efficient static initialization
- serde: Serialization framework
MIT