week 6.03 12.03.2017
Matthijs Van keirsbilck edited this page Mar 29, 2017
Goal: convert .wav files to .mfcc files using the HTK toolbox.
- If you already have an MFCC file of the whole .wav, extract specific parts from it using
  `HCopy -C config0 -s 10e7 -e 11e7 source.mfcc target.mfcc`
  (cuts 00:10 .. 00:11 from the source; HTK's `-s`/`-e` times are in 100 ns units).
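Since the `-s`/`-e` offsets are in HTK's 100 ns units, a tiny helper (a sketch, not part of the repo; `htk_time` is a hypothetical name) avoids power-of-ten mistakes:

```python
def htk_time(seconds):
    """Convert seconds to HTK time units (1 unit = 100 ns, so 1 s = 10**7 units)."""
    return int(round(seconds * 1e7))

# htk_time(10) and htk_time(11) reproduce the "-s 10e7 -e 11e7" arguments above.
```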
- Do this in batch using scripts; see `audioSR/Preprocessing`.
- Find a database (e.g. using this).
- Search folders for .wav files, save the list to a file, and prepare a script file (.scp) for the HCopy batch command; see `prepareWAV_HTK.py`.
- The WAV files had corrupted headers; fix them first, see `fixWavs.py`.
- Run prepareWAV_HTK on the fixed WAVs.
- Run HCopy using the output from prepareWAV_HTK.
- Copy the files in the 'mfc' folder back to the wav folder, so that all label files, wav files, and mfc files are together.
Using the data:
- Visualizing audio signals: a slightly modified version is in `audioSR/Preprocessing/helpFunctions/wavToPng.py`.
- MFC visualization: see visualizeMFC.m (using this Matlab library)
- Prepare audio SR: find example implementations (can't find much for Lasagne? Maybe work with Keras/TF/Torch or so and then somehow combine both models? Or write everything from scratch?)
- Important resources:
  - LAS (Listen, Attend and Spell): MFCCs -> series of characters.
  - CNN speech recognition from raw audio. Quite interesting: they apply a CNN directly to the waveform and don't generate MFCCs with HTK.
  - Hrayr's example implementation: blog post and GitHub repo. Could possibly use this.
  - Python implementation: a bidirectional LSTM with TIMIT preprocessing.
- Finish the lipreading software so we can easily choose a different network/model when evaluating. Also add Top-1 and Top-5 accuracy.
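Top-1 and Top-5 accuracy need no framework support; a plain-Python sketch (hypothetical helper, not the repo's code):

```python
def topk_accuracy(scores, labels, k):
    # scores: one list of per-class scores per example; labels: the true
    # class indices. Counts how often the true class is among the k
    # highest-scoring classes (k=1 gives Top-1, k=5 gives Top-5).
    hits = 0
    for row, label in zip(scores, labels):
        top = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top
    return hits / float(len(labels))
```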
- Look at the Microsoft paper (VGG, ResNet for speech): https://arxiv.org/abs/1610.05256