running with public dataset #2
Comments
It was a subset of the Massive Auditory Lexical Decision database, which the lab I work in released in 2019. The full data set is over 3 hours, I think, of isolated English words recorded by a single young male speaker, in addition to nearly 10,000 recorded fake English words. I had switched away from it to TIMIT during testing because I didn't have an a priori idea of what the accuracy level should be to determine if the CTC loss function was working correctly.
The code is actually in this repo already here. The
If you are interested in the full data set, it is available here. The transcriptions are given as TextGrid files to use with the Praat program. If you don't already have a library to process the files to extract the transcriptions, you may want to use the
At some point, I may try to update the code and possibly submit it to the Flux model zoo, but our semester started recently, so I'm low on spare time for a while. Let me know if you have any questions though!
Oh, the actual model file is missing. Well, let me see if I can track that down. I will see if I can update it now anyway.
@matthijsvk I have been able to re-create the code I was using for these demos and put it here. The data set is a bit funky for CTC because it is one-hot encoded. I am planning to make something for the model zoo, where I will re-extract the input and output values to be more appropriate for a CTC-style recognition system. Hopefully the stuff in this repo is still somewhat helpful until then.
Great, thanks!
Hi,
I saw you contributed the CTC loss function in [https://github.com/FluxML/Flux.jl/pull/1287]. Thanks for all that work :).
There you mentioned you had an example with a publicly available speech corpus.
Which one was that and would you be willing to upload the code for that?