Incorrect timestamps #2279

Open
wants to merge 2 commits into
base: master

@bviksoe bviksoe commented Jul 3, 2024

Fixes #2271

  • Uses the consecutive timestamp that follows the end of the last segment as the starting timestamp of the next segment
  • Adds these timestamps to the output when "print-special" is enabled
  • Fixes fflush usage in live reporting

I was not able to test this with the special "token_timestamps" option.

NB: This is my first GitHub PR, so go easy on me.

@thewh1teagle
Contributor

@bviksoe

I tested this and it works.
Good catch!
How did you find that problem, and how did you fix it?

current whisper.cpp logs
cd /tmp
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
wget "https://github.com/ggerganov/whisper.cpp/assets/61390950/bbf9d9c4-3d60-4693-832d-e48135edf379" -O audio.wav
cmake -B build .
cmake --build build
ffmpeg -i audio.wav -ar 16000 -ac 1 -c:a pcm_s16le normal.wav
./build/bin/main -f ./normal.wav -m "/Users/user/Library/Application Support/github.com.thewh1teagle.vibe/ggml-medium.bin"

# Result

# [00:00:00.000 --> 00:00:06.000]   I-I-I just wanna tell you how I'm feelin'
# [00:00:06.000 --> 00:00:08.700]   Gotta make you understand that
# [00:00:08.700 --> 00:00:18.080]   Never gonna give you up, never gonna let you down
# [00:00:18.080 --> 00:00:25.280]   Never gonna run around and
PR log
# Test new PR

cd /tmp
git clone https://github.com/bviksoe/whisper.cpp -b master whisper1.cpp
cd whisper1.cpp
cmake -B build .
cmake --build build
./build/bin/main -f ../whisper.cpp/normal.wav -m "/Users/user/Library/Application Support/github.com.thewh1teagle.vibe/ggml-medium.bin"

# Result

# [00:00:00.000 --> 00:00:06.000]   I-I-I just wanna tell you how I'm feelin'
# [00:00:06.000 --> 00:00:08.700]   Gotta make you understand that
# [00:00:14.080 --> 00:00:18.080]   Never gonna give you up, never gonna let you down
# [00:00:22.600 --> 00:00:25.280]   Never gonna run around and

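Notice that the start timestamp of the third segment is correct in the PR log (00:00:14.080 instead of 00:00:08.700). The start of the fourth segment also changes, from 00:00:18.080 to 00:00:22.600.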

@bviksoe
Author

bviksoe commented Jul 5, 2024

@thewh1teagle

How did you find that problem?

If you uncomment the line

//#define WHISPER_DEBUG

the build includes an extended debug trace. You should then be able to see that the model actually produces extra timestamp tokens that this library was ignoring.
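
As a rough illustration of the idea (not the actual whisper.cpp code), the sketch below splits a token stream into segments in two ways: ignoring a second consecutive timestamp token, and using it as the start of the next segment. The token representation, the `split_segments` name, and the example token sequence are all assumptions made for illustration; only the timestamp values are taken from the logs earlier in this thread.

```python
def split_segments(tokens, use_consecutive_ts):
    """Split (kind, value) tokens into (start, end, text) segments.

    kind is "ts" (value: seconds) or "text" (value: string).
    Hypothetical token model, for illustration only.
    """
    segments = []
    start = 0.0
    text = []
    prev_was_ts = False
    for kind, value in tokens:
        if kind == "text":
            text.append(value)
            prev_was_ts = False
            continue
        if text:
            # A timestamp that follows text closes the current segment.
            segments.append((start, value, "".join(text)))
            text = []
            # Default: the next segment starts where this one ended.
            start = value
        elif not segments:
            # Opening timestamp of the very first segment.
            start = value
        elif prev_was_ts and use_consecutive_ts:
            # A second consecutive timestamp: use it as the new start.
            start = value
        prev_was_ts = True
    return segments


# Assumed token sequence; timestamp values mirror the logs above.
tokens = [
    ("ts", 0.0), ("text", "I just wanna tell you"), ("ts", 6.0),
    ("ts", 6.0), ("text", "Gotta make you understand"), ("ts", 8.7),
    ("ts", 14.08), ("text", "Never gonna give you up"), ("ts", 18.08),
]
```

With `use_consecutive_ts=False` the third segment starts at 8.7 s, as in the current log; with `use_consecutive_ts=True` it starts at 14.08 s, as in the PR log.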

I was actually looking into why main is producing so many repetitions and hallucinations compared to similar libraries based on the same model.

@manumaan

I am unable to build it on Mac M1. It gives many errors, including stuff like mmintrin.h:14:2: error: "This header is only meant to be used on x86 and x64 architecture"

@bviksoe
Author

bviksoe commented Jul 26, 2024

@manumaan
Most likely unrelated to this PR as it doesn't affect any build procedures.
The build error previously reported by CI was unrelated to this PR.

For general build problems, open a new Issue or ask in Discussions.
Best advice is to use CMake (rather than Make) for building as outlined in the project overview. CMake worked for me on Windows.

@dkakaie

dkakaie commented Aug 14, 2024

This really helped with the timing of my sentences. Previously, a segment would start long before it was actually spoken, especially when music played between segments. However, my word-level timestamps still go out of sync.

@thewh1teagle
Contributor

@bviksoe
I found that it's accurate with the medium model, but with the tiny and small models the timestamps are still incorrect when there are silences.

@mrfragger

I'm thinking of writing a script that changes any subtitle end time that is later than the start time of the next subtitle so that it matches that start time. Here are three examples from whisper.cpp 1.6.2. Having subs stay on screen for seconds or even many minutes after new subs come on is pretty distracting. I hope to try out this PR someday, or that it gets merged.

06:22:32.254 --> 06:22:39.764
blah blah 1

06:22:39.764 --> 06:22:42.254 (<--for instance this would become 06:22:41.260)
blah blah 2

06:22:41.260 --> 06:22:46.660
blah blah 3

06:22:46.660 --> 06:22:53.380
blah blah 4

06:22:53.380 --> 06:22:58.940
blah blah 5

06:22:58.940 --> 06:23:04.580
blah blah 6


40:03:11.653 --> 40:03:16.213
more and more 1

40:03:16.213 --> 40:12:32.523 (<-- would become 40:03:24.933)
more and more 2

40:03:24.933 --> 40:03:29.573
more and more 3

40:03:30.373 --> 40:03:34.133
more and more 4

40:03:34.133 --> 40:03:34.773
more and more 5


57:02:18.560 --> 57:02:22.560
still some 1

57:02:22.560 --> 57:09:52.750 (<-- would become 57:02:30.560)
still some 2

57:02:30.560 --> 57:02:34.560
still some 3

57:02:34.560 --> 57:02:38.560
still some 4

57:02:38.560 --> 57:02:40.560
still some 5
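
A minimal sketch of such a script, assuming the subtitles are plain text with timing lines of the form `HH:MM:SS.mmm --> HH:MM:SS.mmm` as in the examples above. The function names are made up for illustration. It only shortens an end time when the next cue starts before that end and not before the current cue's own start.

```python
import re

# Matches a timing line such as "06:22:39.764 --> 06:22:42.254".
CUE = re.compile(
    r"^(\d+):(\d{2}):(\d{2})\.(\d{3}) --> (\d+):(\d{2}):(\d{2})\.(\d{3})\s*$"
)


def to_ms(h, m, s, ms):
    """Convert hour, minute, second, millisecond strings to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def fmt(total_ms):
    """Format milliseconds as HH:MM:SS.mmm (hours may exceed two digits)."""
    s, ms = divmod(total_ms, 1000)
    m, s = divmod(s, 60)
    h, m = divmod(m, 60)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def clamp_cue_ends(text):
    """Clamp each cue's end time to the start time of the following cue."""
    lines = text.splitlines()
    cues = []  # (line index, start_ms, end_ms)
    for i, line in enumerate(lines):
        match = CUE.match(line)
        if match:
            g = match.groups()
            cues.append((i, to_ms(*g[:4]), to_ms(*g[4:])))
    for (i, start, end), (_, next_start, _) in zip(cues, cues[1:]):
        # Only shorten a cue that overlaps the next one.
        if start <= next_start < end:
            lines[i] = f"{fmt(start)} --> {fmt(next_start)}"
    return "\n".join(lines)
```

For the second example above, `clamp_cue_ends` would rewrite `40:03:16.213 --> 40:12:32.523` to `40:03:16.213 --> 40:03:24.933` and leave the other timing lines unchanged. Timing lines with trailing annotations would not match the pattern and are left as they are.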

Successfully merging this pull request may close these issues.

Incorrect timetstamps