Models (how to choose the right ones) #203
-
First of all, thank you, it seems to be a great tool!
-
Hi @ACEXAN,
It's not a stupid question -- we definitely do need to provide more information at the 'select models' phase. As you said, unless we explain the pros and cons of each model, it's hard to know what to pick.
For Whisper, transcription (and especially translation) accuracy increases with model size, but processing time gets much longer. I wouldn't necessarily install all of them -- perhaps base, small, medium, and large-v3, but even that might be overkill. It also depends on what you want to do with it. If you want to produce usable subtitles, for example, I would probably use a larger model like medium or large. On the other hand, if you want to produce a label track only to use as a 'guide' for your project, running 'base' is probably just fine.
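To make that rule of thumb concrete, here is a minimal sketch in Python. The function name, the purpose labels ("subtitles", "guide"), and the choice of large-v3 as the "large" candidate are all assumptions made for illustration; none of this is part of the tool itself.

```python
# Hypothetical helper: the purpose labels and this mapping are illustrative
# assumptions derived from the advice above, not an API of the tool.
SUGGESTED_MODELS = {
    # Usable subtitles: prefer a larger model for better accuracy.
    "subtitles": ["medium", "large-v3"],
    # Rough label track used only as a guide: a small, fast model is enough.
    "guide": ["base"],
}


def suggest_whisper_models(purpose: str) -> list[str]:
    """Return candidate Whisper model sizes for a given purpose."""
    if purpose not in SUGGESTED_MODELS:
        raise ValueError(f"unknown purpose: {purpose!r}")
    return SUGGESTED_MODELS[purpose]


if __name__ == "__main__":
    for purpose in SUGGESTED_MODELS:
        print(purpose, "->", suggest_whisper_models(purpose))
```

Treat the mapping as a starting point only; the right choice still depends on your hardware and how much time you can spend waiting.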
-
If I may chime in here: I've done some testing with different models and, quite frankly (at least for English), large-v3 is the best. Yes, it takes a bit longer (about 1 minute of processing time per minute of audio on my machine, compared to e.g. large-v2, which seems to take about half of that), but if you want to create quality subtitles for e.g. YouTube, it leaves the least editing work to fix any remaining errors. So for anything going public, my personal opinion is that large-v3 is the way to go!
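For a rough feel of what those ratios mean for a longer recording, here is a small Python sketch. The ratios are only the informal figures quoted above from one machine (about 1.0 for large-v3 and about 0.5 for large-v2), and the function and constant names are made up for this example; expect different numbers on other hardware.

```python
# Approximate processing minutes per minute of audio, taken from the informal
# single-machine observations above. These are assumptions, not benchmarks.
MINUTES_PER_AUDIO_MINUTE = {
    "large-v3": 1.0,
    "large-v2": 0.5,
}


def estimate_processing_minutes(audio_minutes: float, model: str) -> float:
    """Estimate processing time by scaling audio length with the model's ratio."""
    if model not in MINUTES_PER_AUDIO_MINUTE:
        raise ValueError(f"no timing estimate for model: {model!r}")
    return audio_minutes * MINUTES_PER_AUDIO_MINUTE[model]


if __name__ == "__main__":
    for model in MINUTES_PER_AUDIO_MINUTE:
        print(model, estimate_processing_minutes(30, model), "minutes")
```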