sr-demo

Clone the repo, install the dependencies listed in requirements.txt, then run 'python manage.py runserver'.

Summary:

Some things I learned:

I learned a lot about working with audio in Python: recording it, parsing it, changing the frame rate, buckets, quality, etc. I used pyaudio and torchaudio (PyTorch's audio library) for most of the processing.
I gained some experience with RNNs (used to predict likely sequences of letters).
I also learned a lot about sending audio between the frontend and the backend (harder than it sounds, in my opinion, because you have to figure out some specific media requirements for Django).
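To make the "changing the frame rate" point concrete, here is a minimal sketch of resampling mono 16-bit samples with plain linear interpolation. It is not the code used in this project, and the function name `resample_linear` is made up for illustration. A real pipeline would more likely use a library resampler such as `torchaudio.transforms.Resample`, which also filters the signal to limit aliasing when downsampling.

```python
import array


def resample_linear(samples, src_rate, dst_rate):
    """Resample mono PCM samples from src_rate to dst_rate (hypothetical helper).

    Uses linear interpolation only; no low-pass filter is applied, so
    downsampling can alias. Good enough to illustrate the idea.
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    out = array.array("h")  # signed 16-bit samples
    if len(samples) == 0:
        return out

    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # input samples advanced per output sample
    last = len(samples) - 1

    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), last)   # sample at or before pos
        hi = min(lo + 1, last)     # next sample, clamped at the end
        frac = pos - lo
        out.append(round(samples[lo] * (1 - frac) + samples[hi] * frac))
    return out


if __name__ == "__main__":
    clip = array.array("h", [0, 100, 200, 300])
    print(list(resample_linear(clip, 4, 2)))  # halve the frame rate
```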

Here are some pictures:

Click 'record' to start recording:

Plot the classifications over time (28 classes: 26 letters, one space token, one silent token):

Helpful overlays to explain project ideas:

And at the end, it'll display the most likely word(s) in the audio clip (after showing the raw letters at first)!
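As a rough illustration of how per-frame classifications over the 28 classes could be turned into the raw letters, here is a sketch of greedy decoding: take the best class in each frame, merge consecutive repeats, and drop the silent token. This is an assumption about the approach, not the project's actual code. The label order (0-25 for 'a'-'z', 26 for space, 27 for silent) and the name `greedy_decode` are hypothetical.

```python
import string

# Assumed label layout (not confirmed by the project):
# indices 0-25 are 'a'-'z', 26 is the space token, 27 is the silent token.
ALPHABET = list(string.ascii_lowercase) + [" "]
SILENT = 27
NUM_CLASSES = 28


def greedy_decode(frame_scores):
    """Turn per-frame class scores into a raw letter string.

    frame_scores: iterable of 28-element score lists, one per time step.
    Repeated frames of the same class collapse to one letter, and silent
    frames are dropped. A silent frame between two identical letters
    keeps both of them.
    """
    decoded = []
    previous = None
    for scores in frame_scores:
        if len(scores) != NUM_CLASSES:
            raise ValueError("expected 28 scores per frame")
        best = max(range(NUM_CLASSES), key=scores.__getitem__)
        if best != previous and best != SILENT:
            decoded.append(ALPHABET[best])
        previous = best
    return "".join(decoded)


def one_hot(index):
    """Build a toy score row with all weight on one class."""
    row = [0.0] * NUM_CLASSES
    row[index] = 1.0
    return row


if __name__ == "__main__":
    # Frames: h, h, silent, i, i
    frames = [one_hot(i) for i in (7, 7, 27, 8, 8)]
    print(greedy_decode(frames))  # prints "hi"
```

The raw string from a decoder like this would then be the input to whatever step picks the most likely word(s).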

Note: To run the demo, you need portaudio (a library that pyaudio depends on). If pyaudio fails to install because it can't find portaudio, try pointing the build at portaudio's headers and libraries:

pip install --global-option='build_ext' \
    --global-option='-I/usr/local/include' \
    --global-option='-L/usr/local/lib' \
    pyaudio
