Seeking back to the beginning of the file in `scanFileAndUpdateMetadataAndIndex` #455

NicolasHug · 2025-01-20T15:33:18Z

In scanFileAndUpdateMetadataAndIndex, we first scan the entire file by receiving all packets in the container, calling av_read_frame() until the end of the video:

torchcodec/src/torchcodec/decoders/_core/VideoDecoder.cpp

Line 551 in d26bfbc

int ffmpegStatus = av_read_frame(formatContext_.get(), packet.get());

After this scan, we call av_seek_file to go back to the beginning of the file:

torchcodec/src/torchcodec/decoders/_core/VideoDecoder.cpp

Lines 593 to 594 in d26bfbc

    
           int ffmepgStatus = 
        
               avformat_seek_file(formatContext_.get(), 0, INT64_MIN, 0, 0, 0);

(docs)

Specifically we pass:

0 as reference stream index
0 as the desired pts location

I wonder if we should instead pass:

the best stream's index instead of 0. I feel like the best stream has a higher chance of being the correct reference to use when seeking.
minPts as the desired pts location, instead of 0. As we've seen, some videos may start at negative pts values.

The text was updated successfully, but these errors were encountered:

scotts · 2025-01-22T21:21:07Z

I also noticed that we don't seek to the beginning at the start of the function, either. That is, scanFileAndUpdateMetadataAnIndex() is a public method on the C++ VideoDecoder class. In theory, someone could easily call it after decoding a bunch of frames. The Python VideoDecoder API prevents such a scenario. However, after reasoning through it a bunch, I think we can say such a sequence of calls is invalid. The scan function was clearly written assuming no one else would do any other seeking before calling it. And the functions we have internally for seeking while decoding assume we have active streams. We also have some internal callers of the C++ scan function, so we can't make it private.

I dug into the code for avformat_find_stream_info() which does a less extreme version of what we're doing. In that function, one of the first things they do is figure out where the cursor currently is, and store it in old_offset. And then whenever they read packets from the file, they make sure to call (effectively) avio_seek(file_handle, old_offset).

Curiously, because the logic above always queries where the current cursor is, and reset is, it doesn't answer our question of what we should use for the beginning. Based on all of that plus @NicolasHug's analysis, I think the most correct thing for us to do is (roughly):

Assert that maybeDesiredPts_ == std::nullopt. Note that this variable is a misnomer: it's actually a double in seconds, even though its name sounds like it's an int64 in pts. This also means we can assume that the cursor is in fact at the beginning.
Do the current logic.
Call av_seek_file() on the best stream and minPtsFromScan. Note that we just computed that value.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Seeking back to the beginning of the file in `scanFileAndUpdateMetadataAndIndex` #455

Seeking back to the beginning of the file in `scanFileAndUpdateMetadataAndIndex` #455

NicolasHug commented Jan 20, 2025

scotts commented Jan 22, 2025

Seeking back to the beginning of the file in scanFileAndUpdateMetadataAndIndex #455

Seeking back to the beginning of the file in scanFileAndUpdateMetadataAndIndex #455

Comments

NicolasHug commented Jan 20, 2025

scotts commented Jan 22, 2025

Seeking back to the beginning of the file in `scanFileAndUpdateMetadataAndIndex` #455

Seeking back to the beginning of the file in `scanFileAndUpdateMetadataAndIndex` #455