log-y/sr-demo

sr-demo

Clone the repo, install the dependencies listed in requirements.txt, then run 'python manage.py runserver'.

Summary:

  • Trained on 1,000 hours of speech data from CommonVoice (ran three epochs)
  • Has ~8M parameters (~30 MB file)
  • Cost about $35 to train (using cloud GPUs)
  • Used a single A100 GPU for ~40 hours
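As a rough sanity check on the figures above, the file size and hourly cost follow from simple arithmetic. This sketch assumes the weights are stored as 32-bit floats (4 bytes per parameter), which the summary does not state; the variable names are only illustrative.

```python
# Back-of-the-envelope check of the summary figures.
# Assumption (not stated in the README): weights are 32-bit floats, 4 bytes each.
params = 8_000_000
bytes_per_param = 4
model_size_mb = params * bytes_per_param / 1_000_000

total_cost_usd = 35
gpu_hours = 40
cost_per_gpu_hour = total_cost_usd / gpu_hours

print(f"approx. model size: {model_size_mb:.0f} MB")
print(f"approx. cost per GPU hour: ${cost_per_gpu_hour:.2f}")
```

Under that assumption the size lands close to the ~30 MB quoted above, and the cost works out to a little under a dollar per GPU hour.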

Some things I learned:

  • I learned a lot about working with audio in Python: recording it, parsing it, changing the frame rate, buckets, quality, etc. I used pyaudio and torchaudio for most of the processing.

  • I gained some experience with RNNs (used to predict likely sequences of letters)

  • I also learned a lot about sending audio between the frontend and the backend (harder than it sounds, in my opinion, because you have to figure out some specific media requirements for Django)

Here are some pictures:

Click 'record' to start recording: s1

Plot the classifications over time (28 classes: 26 letters, one space token, one silent token): s3

Helpful overlays to explain project ideas: s4

And at the end, it'll display the most likely word(s) in the audio clip (after showing the raw letters at first)! s5
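To give a feel for how per-frame classifications over 28 classes can turn into raw letters, here is a minimal sketch of a greedy, CTC-style collapse: take the top class in each frame, merge consecutive repeats, and drop the silent token. This is an assumption about the general approach, not a copy of the code in this repo. The label order (0-25 for letters, 26 for space, 27 for silent) and the names `greedy_decode` and `one_hot` are hypothetical.

```python
import string

# Assumed label layout (hypothetical): indices 0-25 are 'a'-'z',
# 26 is the space token, 27 is the silent token.
ALPHABET = list(string.ascii_lowercase) + [" "]
SILENT = 27


def greedy_decode(frame_scores):
    """Collapse per-frame class scores (28 per frame) into a string."""
    # Pick the highest-scoring class for each frame.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    chars = []
    prev = None
    for idx in best:
        # Skip consecutive repeats of the same class and drop the silent token.
        if idx != prev and idx != SILENT:
            chars.append(ALPHABET[idx])
        prev = idx
    return "".join(chars)


def one_hot(idx, n=28):
    """Helper for the demo: a frame that puts all its score on one class."""
    return [1.0 if i == idx else 0.0 for i in range(n)]


# Frames for: h, h, silent, i, i
frames = [one_hot(i) for i in [7, 7, SILENT, 8, 8]]
print(greedy_decode(frames))
```

In this scheme a doubled letter only survives if a silent frame separates the two occurrences. Going from these raw letters to the most likely word(s) is a separate step that this sketch does not cover.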

Note: In order to run the demo, you need portaudio (a package that pyaudio depends on). If portaudio isn't installing correctly, try this:

pip install --global-option='build_ext' \
    --global-option='-I/usr/local/include' \
    --global-option='-L/usr/local/lib' \
    pyaudio
