Simple user interface for transcribing audio into text (STT).
The user interface records audio, publishes it on a Redis channel, and then displays the transcript received on a second Redis channel. The interface is therefore not tied to a particular STT engine: any engine that can communicate via Redis channels will work.
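The exchange described above can be sketched as follows. This is a minimal illustration, not the application's actual code: `request_transcript` is a hypothetical helper, `client` is assumed to be a redis-py `redis.Redis` instance (or anything with the same `pubsub()` and `publish()` methods), and the channel names are taken from the example config further down.

```python
import time


def request_transcript(client, audio, channel_out="audio",
                       channel_in="transcript", timeout=10.0):
    """Publish `audio` and return the first reply as text, or None on timeout."""
    pubsub = client.pubsub(ignore_subscribe_messages=True)
    pubsub.subscribe(channel_in)  # subscribe first so the reply is not missed
    try:
        client.publish(channel_out, audio)
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            message = pubsub.get_message(timeout=1.0)
            if message is not None:
                data = message["data"]
                # Redis returns bytes unless the client decodes responses
                return data.decode("utf-8") if isinstance(data, bytes) else data
        return None
    finally:
        pubsub.close()
```

With a running Redis server, `request_transcript(redis.Redis(host="localhost", port=6379, db=0), audio_bytes)` would send one clip and wait up to ten seconds for a transcript.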
Uses python-sounddevice under the hood for recording the audio (stumbled across the library in this post).
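A minimal sketch of how a clip could be captured with python-sounddevice and packed into WAV bytes. The function names `record_pcm` and `pcm_to_wav` are hypothetical, the default values mirror the example config further down, and sending WAV rather than raw samples is an assumption, not necessarily what the application does.

```python
import io
import wave


def record_pcm(duration=3.0, sample_rate=16000, num_channels=1, device="pulse"):
    """Record `duration` seconds and return raw 16-bit PCM bytes."""
    import sounddevice as sd  # third-party: pip install sounddevice

    frames = int(duration * sample_rate)
    data = sd.rec(frames, samplerate=sample_rate, channels=num_channels,
                  dtype="int16", device=device)
    sd.wait()  # block until the recording has finished
    return data.tobytes()


def pcm_to_wav(pcm, sample_rate=16000, num_channels=1):
    """Wrap raw 16-bit PCM bytes in an in-memory WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(num_channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()
```

Calling `pcm_to_wav(record_pcm())` would yield a byte string that can be published on the audio channel.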
Install Redis on your machine (if not already present):
sudo apt install redis-server
sudo systemctl restart redis
Set up the virtual environment and install the application:
virtualenv -p /usr/bin/python3 venv
./venv/bin/pip install "git+ssh://git@github.com/waikato-ufdl/gtk-audio-transcribe.git"
The following example uses Coqui STT via Redis and docker.
Download the English tflite model into the current directory as full.tflite, then start the Coqui container from the same directory:
docker run \
  --net=host \
  -v `pwd`:/workspace \
  -it waikatodatamining/tf_coqui_stt:1.15.2_0.10.0a10_cpu \
  stt_transcribe_redis \
  --redis_in audio \
  --redis_out transcript \
  --model /workspace/full.tflite \
  --verbose
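Because the interface only talks to Redis, the container can be swapped for any process that listens on the audio channel and publishes text on the transcript channel. The sketch below is a hypothetical stand-in engine for trying out the interface without a model. `serve`, `fake_transcribe`, and `main` are illustrative names, `client` is assumed to be a redis-py `redis.Redis` instance, and the plain-text reply format is an assumption about what the interface accepts.

```python
def serve(client, transcribe, channel_in="audio", channel_out="transcript"):
    """Pass every incoming audio message through `transcribe` and publish the text."""
    pubsub = client.pubsub(ignore_subscribe_messages=True)
    pubsub.subscribe(channel_in)
    for message in pubsub.listen():  # blocks until a message arrives
        text = transcribe(message["data"])
        client.publish(channel_out, text)


def fake_transcribe(audio):
    """Placeholder engine: reports the payload size instead of transcribing."""
    return "received %d bytes of audio" % len(audio)


def main():
    import redis  # third-party: pip install redis

    serve(redis.Redis(host="localhost", port=6379, db=0), fake_transcribe)
```

Calling `main()` would start the stand-in engine against a local Redis server. Note that the channel names are mirrored relative to the interface: the engine reads from `audio` and writes to `transcript`.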
Create a YAML config file called config.yaml in the current directory with the following content:
# the Redis connection and channels to use
redis:
  host: "localhost"
  port: 6379
  db: 0
  channel_out: "audio"
  channel_in: "transcript"
recording:
  # the device to use for recording
  device: "pulse"
  # maximum length in seconds
  max_duration: 3.0
  # the number of channels
  num_channels: 1
  # the sample rate in Hz
  sample_rate: 16000
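For illustration, a config with this layout could be read and checked as sketched below. The key names and defaults come from the file above; the classes, the `parse_config` and `load_config` helpers, and the two sanity checks are hypothetical and may differ from what the application does internally.

```python
from dataclasses import dataclass


@dataclass
class RedisSettings:
    host: str = "localhost"
    port: int = 6379
    db: int = 0
    channel_out: str = "audio"
    channel_in: str = "transcript"


@dataclass
class RecordingSettings:
    device: str = "pulse"
    max_duration: float = 3.0
    num_channels: int = 1
    sample_rate: int = 16000


def parse_config(raw):
    """Turn the parsed YAML mapping into typed settings objects."""
    # unknown keys raise TypeError, which catches typos in the file
    redis_cfg = RedisSettings(**raw.get("redis", {}))
    rec_cfg = RecordingSettings(**raw.get("recording", {}))
    # illustrative checks only, not taken from the application
    if redis_cfg.channel_out == redis_cfg.channel_in:
        raise ValueError("channel_out and channel_in must differ")
    if rec_cfg.max_duration <= 0:
        raise ValueError("max_duration must be positive")
    return redis_cfg, rec_cfg


def load_config(path="config.yaml"):
    """Read the YAML file at `path` and validate it."""
    import yaml  # third-party: pip install pyyaml

    with open(path) as f:
        return parse_config(yaml.safe_load(f))
```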
Start the application as follows:
./venv/bin/python3 -c config.yaml