Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Seeking back to the beginning of the file in scanFileAndUpdateMetadataAndIndex #455

Open
NicolasHug opened this issue Jan 20, 2025 · 1 comment

Comments

@NicolasHug
Copy link
Member

In scanFileAndUpdateMetadataAndIndex, we first scan the entire file by receiving all packets in the container, calling av_read_frame() until the end of the video:

int ffmpegStatus = av_read_frame(formatContext_.get(), packet.get());

After this scan, we call av_seek_file to go back to the beginning of the file:

int ffmepgStatus =
avformat_seek_file(formatContext_.get(), 0, INT64_MIN, 0, 0, 0);

(docs)

Specifically we pass:

  • 0 as reference stream index
  • 0 as the desired pts location

I wonder if we should instead pass:

  • the best stream's index instead of 0. I feel like the best stream has a higher chance of being the correct reference to use when seeking.
  • minPts as the desired pts location, instead of 0. As we've seen, some videos may start at negative pts values.
@scotts
Copy link
Contributor

scotts commented Jan 22, 2025

I also noticed that we don't seek to the beginning at the start of the function, either. That is, scanFileAndUpdateMetadataAnIndex() is a public method on the C++ VideoDecoder class. In theory, someone could easily call it after decoding a bunch of frames. The Python VideoDecoder API prevents such a scenario. However, after reasoning through it a bunch, I think we can say such a sequence of calls is invalid. The scan function was clearly written assuming no one else would do any other seeking before calling it. And the functions we have internally for seeking while decoding assume we have active streams. We also have some internal callers of the C++ scan function, so we can't make it private.

I dug into the code for avformat_find_stream_info() which does a less extreme version of what we're doing. In that function, one of the first things they do is figure out where the cursor currently is, and store it in old_offset. And then whenever they read packets from the file, they make sure to call (effectively) avio_seek(file_handle, old_offset).

Curiously, because the logic above always queries where the current cursor is, and reset is, it doesn't answer our question of what we should use for the beginning. Based on all of that plus @NicolasHug's analysis, I think the most correct thing for us to do is (roughly):

  1. Assert that maybeDesiredPts_ == std::nullopt. Note that this variable is a misnomer: it's actually a double in seconds, even though its name sounds like it's an int64 in pts. This also means we can assume that the cursor is in fact at the beginning.
  2. Do the current logic.
  3. Call av_seek_file() on the best stream and minPtsFromScan. Note that we just computed that value.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants