week 6.03 12.03.2017
Matthijs Van keirsbilck edited this page Mar 29, 2017
Goal: convert .wav files to .mfcc files using the HTK toolbox.
- If you already have an MFCC file of the whole .wav, extract specific parts from it using
  `HCopy -C config0 -s 10e7 -e 11e7 source.mfcc target.mfcc`
  (cuts 00:10 .. 00:11 from the source; HTK's `-s`/`-e` times are in 100 ns units).
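Since the `-s`/`-e` offsets are in HTK's 100 ns units, a tiny helper (a sketch, not part of the repo; `htk_time` is a hypothetical name) avoids power-of-ten mistakes:

```python
def htk_time(seconds):
    """Convert seconds to HTK time units (1 unit = 100 ns, so 1 s = 10**7 units)."""
    return int(round(seconds * 1e7))

# htk_time(10) and htk_time(11) reproduce the "-s 10e7 -e 11e7" arguments above.
```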
- Do this in batch using scripts; see `audioSR/Preprocessing`.
- Find a database (e.g. using this).
- Search folders for .wav files, save the list to a file, and prepare a script file (.scp) for the HCopy batch command; see `prepareWAV_HTK.py`.
- The WAV files had corrupted headers; fix them first, see `fixWavs.py`.
- Run prepareWAV_HTK on the fixed WAVs.
- Run HCopy using the output from prepareWAV_HTK.
- Copy the files in the 'mfc' folder back to the wav folder, so that all label files, wav files, and mfc files are together.
Using the data:
- Visualizing audio signals: a slightly modified version is in `audioSR/Preprocessing/helpFunctions/wavToPng.py`.
- MFC visualization: see visualizeMFC.m (using this Matlab library)
- Prepare audio SR: find example implementations (can't find much for Lasagne? Maybe work with Keras/TF/Torch or so and then somehow combine both models? Or write everything from scratch?)
- Important resources:
  - LAS (Listen, Attend and Spell): MFCCs -> series of characters.
  - CNN speech recognition from raw audio. Quite interesting: they apply a CNN directly to the waveform and don't generate MFCCs with HTK.
  - Hrayr's example implementation: blog post and GitHub repo. Could possibly use this.
  - Python implementation: a bidirectional LSTM with TIMIT preprocessing.
- Finish the lipreading software so we can easily choose a different network/model when evaluating. Also add Top-1 and Top-5 accuracy.
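Top-1 and Top-5 accuracy need no framework support; a plain-Python sketch (hypothetical helper, not the repo's code):

```python
def topk_accuracy(scores, labels, k):
    # scores: one list of per-class scores per example; labels: the true
    # class indices. Counts how often the true class is among the k
    # highest-scoring classes (k=1 gives Top-1, k=5 gives Top-5).
    hits = 0
    for row, label in zip(scores, labels):
        top = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top
    return hits / float(len(labels))
```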
- Look at the Microsoft paper (VGG, ResNet for speech): https://arxiv.org/abs/1610.05256