Is there any way to control how long after the last word is spoken before Vosk closes the session? #24
Comments
Hi, same question here. Could you please confirm which parameter sets the max silence threshold? Currently it looks like it's very short.
Also looking into this.
You can change the endpointing params in model.conf. You can scale them up as needed.
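For reference, the relevant knobs are the standard Kaldi online-endpointing options that ship in a model's conf file. The values below are illustrative, not taken from this thread; check your own model's conf:

```
# Illustrative endpointing settings (Kaldi OnlineEndpointConfig options).
# Raising the min-trailing-silence values makes Vosk wait longer after
# the last word before finalizing the utterance.
--endpoint.rule2.min-trailing-silence=0.5
--endpoint.rule3.min-trailing-silence=1.0
--endpoint.rule4.min-trailing-silence=2.0
```

Scaling all three up (for example, doubling them) lengthens the pause Vosk tolerates before closing a result.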
Thanks for your response. Will using this prevent the audio stream from the Asterisk server to my websocket server from ending before the call ends?
No, the current module stops the stream after every result. This is how the Asterisk speech module works, unfortunately. It would be nice to have a long-transcription mode, though.
I see. Because of this limitation, the alternative I am working on uses a different method, this plugin: https://github.com/nadirhamid/asterisk-audiofork It provides a continuous audio stream, but of course doesn't work with Vosk out of the box. What do you think would be needed to adapt the code to use this binary audio stream?
You can just adapt the backend server https://github.com/nadirhamid/audiofork-transcribe-demo, no need to update the audiofork module itself; it should work the same way.
The audiofork transcribe demo uses Google's closed-source transcription. How would I adapt it, especially if I wanted to use an open-source option?
OK, I made a script, but I am seeing significant slowdowns. I've tried configuring the beam and the other things you suggested, but the result still lags behind. This is on a CPU. Is there anything I can try to get close to real time? https://gist.github.com/Goddard/b86c0469c42e1f4c415f37354a5f30db
What is your hardware, and how many streams are you trying to process?
In my tests I am only doing 1 stream. Architecture: x86_64 |
This is a very small delay. How much memory do you have?
64 GB. That is what the time measurement reports, but I was thinking the process would be asynchronous between transcriptions, so they wouldn't build up to take longer than the person speaking. Unless I have an issue with my script, I don't see a way to increase the speed, because that is just the transcription time for each partial, and a full result can involve several partials. If you have 20 partials each taking 0.2 to 0.9 seconds, that sometimes adds up to a 10-second delay for a full transcription. Does vosk-api use a VAD as well? Do you think that would speed it up?
Do you see this delay with the asterisk-audiofork module or with vosk-asterisk?
Even when running everything locally I get poor results, for example with https://github.com/alphacep/vosk-server/tree/master/websocket-microphone. Python claims the transcription step only took milliseconds, but it really takes a few seconds for the text to print to the screen, sometimes 4 seconds. I don't think Python itself is the bottleneck, because even the websocket-cpp Boost.Beast server appears to lag behind considerably. The vosk-asterisk plugin appears to be a bit faster, but the transcription ends before the call does, so it isn't very useful.

I installed using a Python virtual environment and pip requirements.txt on Ubuntu 22.04. My local machine is also a newer Intel CPU with 64 GB. The only thing I see in the logs is:

WARNING (VoskAPI:CheckMemoryUsage():determinize-lattice-pruned.cc:316) Did not reach requested beam in determinize-lattice: size exceeds maximum 50000000 bytes; (repo,arcs,elems) = (25158432,1108448,23744520), after rebuilding, repo size was 21053120, effective beam was 5.49789 vs. requested beam 6
For example, using the Boost.Beast websocket example provided, it takes approximately 4 seconds for the recognized speech to print. I used the websocket-microphone example connected to a remote Boost.Beast websocket server, but even locally I experience the same thing. Would a GPU be faster than that?
Is there any way to control how long after the last word is spoken before Vosk closes the session?
I am using the Python implementation and would like to limit how long the system will wait before closing the session.
Are there any parameter files I can create?
python3 ./asr_server.py /opt/vosk-model-en/model