Models (how to choose the right ones) #203

ACEXAN · 2024-05-20T21:42:15Z

First of all, thank you, it seems to be a great tool!
This is probably going to sound stupid, but during the installation, how does one know which models to install and what each one does better? The list is quite long. For example, there's quite a list of models for Whisper (long 1, long 2, long 3, short, etc.). Does each one do something specific? Is smaller better? Is bigger better? Or should I just install all of them?
Thanks in advance!
P.S. I looked for similar discussions, I swear!

Answered by RyanMetcalfeInt8

RyanMetcalfeInt8 · 2024-05-21T00:28:42Z
Maintainer

Hi @ACEXAN,

It's not a stupid question -- we definitely need to provide more information at the 'select models' phase. As you said, unless we tell you the pros and cons of each model, it's hard to know what to pick.

For whisper -- as the model size increases, transcription (and especially translation) accuracy increases -- but processing also takes much longer. I wouldn't necessarily install all of them -- perhaps base, small, medium, and large-v3, but even that might be overkill. It also depends on what you want to do with it. If you want to produce usable subtitles, for example, I would probably use a larger model like medium or large. On the other hand, if you only want to produce a label track to use as a 'guide' for your project, running 'base' is probably just fine.
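To make the rule of thumb above concrete, here is a minimal sketch that just encodes it as a lookup. The function `suggest_whisper_models` and the use-case keys are hypothetical, made up for illustration; they are not part of the plugin. "large" is mapped to large-v3 because that is the large model suggested above.

```python
# Hypothetical helper: encodes the maintainer's rule of thumb from this thread.
# The use-case names and this function are illustrative, not part of any plugin API.

# Suggested Whisper sizes, ordered from fastest/least accurate to slowest/most accurate.
WHISPER_SIZES = ["base", "small", "medium", "large-v3"]

# Rule of thumb: usable subtitles -> medium or large; a rough "guide" label track -> base.
SUGGESTIONS = {
    "subtitles": ["medium", "large-v3"],
    "guide_labels": ["base"],
}


def suggest_whisper_models(use_case: str) -> list[str]:
    """Return suggested model sizes for a use case, smallest (fastest) first."""
    if use_case not in SUGGESTIONS:
        raise ValueError(f"unknown use case: {use_case!r}")
    # Sort by position in WHISPER_SIZES so the fastest option comes first.
    return sorted(SUGGESTIONS[use_case], key=WHISPER_SIZES.index)


if __name__ == "__main__":
    print(suggest_whisper_models("subtitles"))
    print(suggest_whisper_models("guide_labels"))
```

The point is just that you only need the models for the jobs you actually do; anything outside that set can be skipped at install time and added later.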

ACEXAN May 23, 2024
Author

I see. So since I mostly plan on transcribing my audio notes, I should stay somewhere between base and medium. And when the time comes for some serious "heavy lifting" (subtitles, etc.), I'll add larger models.
Thank you!

The3IC · 2024-05-21T07:09:04Z

If I may chime in here: I've done some testing with different models and, quite frankly (at least for English), large-v3 is the best. Yes, it takes a bit longer (about 1 minute of processing time per minute of audio on my machine, compared to e.g. large-v2, which seems to take about half that), but if you want to create quality subtitles for e.g. YouTube, it leaves you the least editing work to fix the remaining errors. So for anything going public, my personal opinion is that large-v3 is the way to go!
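To put those numbers in perspective, a rough estimate is just audio length multiplied by the real-time factor (processing minutes per audio minute). The sketch below assumes the two ratios quoted above (about 1.0 for large-v3 and about 0.5 for large-v2). Those figures come from one machine, so treat them as placeholders and measure your own. The function name is hypothetical.

```python
# Rough processing-time estimate: processing time = audio duration x real-time factor (RTF).
# RTF values are one user's rough observations on their own machine
# (large-v3 ~1 min per minute of audio, large-v2 ~half that); other hardware will differ.
OBSERVED_RTF = {
    "large-v3": 1.0,
    "large-v2": 0.5,
}


def estimate_processing_minutes(audio_minutes: float, model: str) -> float:
    """Estimate processing minutes for a clip using the observed RTF for `model`."""
    return audio_minutes * OBSERVED_RTF[model]


if __name__ == "__main__":
    for model in OBSERVED_RTF:
        minutes = estimate_processing_minutes(10, model)
        print(f"{model}: ~{minutes:g} min for a 10-minute clip")
```

For a 10-minute clip that works out to roughly 10 minutes with large-v3 versus roughly 5 with large-v2, which is the trade-off being weighed against less manual cleanup afterwards.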