I also noticed that we don't seek to the beginning at the start of the function, either. scanFileAndUpdateMetadataAndIndex() is a public method on the C++ VideoDecoder class, so in theory someone could easily call it after decoding a bunch of frames. The Python VideoDecoder API prevents such a scenario, but the C++ API does not. After reasoning through it a bunch, I think we can say such a sequence of calls is invalid: the scan function was clearly written assuming no one would do any seeking before calling it, and the functions we have internally for seeking while decoding assume we have active streams. We also have some internal callers of the C++ scan function, so we can't make it private.
I dug into the code for avformat_find_stream_info(), which does a less extreme version of what we're doing. One of the first things it does is figure out where the cursor currently is and store it in old_offset. Then, whenever it reads packets from the file, it makes sure to call (effectively) avio_seek(file_handle, old_offset).
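The save-and-restore pattern described above can be sketched with a standard C++ stream standing in for FFmpeg's AVIOContext: std::istream::tellg() plays the role of recording old_offset, and seekg() plays the role of avio_seek(..., old_offset, SEEK_SET). This is only an analogy under that assumption; scanThenRestore() is a hypothetical helper, not FFmpeg or torchcodec code.

```cpp
#include <cassert>
#include <ios>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not FFmpeg or torchcodec code): reads everything left
// in `in` as a stand-in for scanning all packets, then restores the cursor to
// where it was before the scan. Returns the number of bytes the scan consumed.
std::streamoff scanThenRestore(std::istream& in) {
  // Roughly analogous to storing old_offset before probing.
  const std::streampos oldOffset = in.tellg();
  assert(oldOffset != std::streampos(-1) && "stream must support tellg()");

  std::streamoff consumed = 0;
  char ch;
  while (in.get(ch)) {
    ++consumed;  // A real demuxer would parse packets here.
  }

  // Hitting end of input sets eofbit and failbit; clear them so seekg() works.
  in.clear();
  // Roughly analogous to avio_seek(pb, old_offset, SEEK_SET).
  in.seekg(oldOffset);
  return consumed;
}
```

For example, on a std::istringstream holding "HEADERpacket1packet2" whose cursor has already been moved past the 6-byte header, scanThenRestore() reports 14 bytes consumed and leaves tellg() back at 6. The point of the pattern is that the helper never assumes the cursor started at 0; it restores whatever position the caller left it in.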
Curiously, because the logic above always queries where the cursor currently is and resets it to that position, it doesn't answer our question of what we should use as the beginning. Based on all of that plus @NicolasHug's analysis, I think the most correct thing for us to do is (roughly):
1. Assert that maybeDesiredPts_ == std::nullopt. Note that this variable is a misnomer: it's actually a double in seconds, even though its name suggests an int64 in pts. If the assertion holds, no seek has been requested, so we can assume that the cursor is in fact at the beginning.
2. Do the current logic.
3. Call avformat_seek_file() on the best stream with minPtsFromScan as the target. Note that we have just computed that value during the scan.
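The three steps above can be sketched against a toy in-memory demuxer. Everything here is hypothetical: Packet, ToyDemuxer, and scanThenSeekToStart() are illustrative stand-ins for av_read_frame(), avformat_seek_file(), and VideoDecoder's members, and maybeDesiredPts_ is modeled as a std::optional<double> holding seconds. The toy seek simply moves the cursor to the first packet of the given stream whose pts is at least the target, which is a simplification of what a real FFmpeg seek does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

// Hypothetical stand-in for an AVPacket: owning stream and presentation timestamp.
struct Packet {
  int streamIndex;
  std::int64_t pts;
};

// Hypothetical stand-in for an AVFormatContext: packets in file order plus a cursor.
struct ToyDemuxer {
  std::vector<Packet> packets;
  std::size_t cursor = 0;

  // Like av_read_frame(): returns false once the end of the file is reached.
  bool readPacket(Packet& out) {
    if (cursor >= packets.size()) return false;
    out = packets[cursor++];
    return true;
  }

  // Simplified seek: jump to the first packet of `streamIndex` with pts >= targetPts.
  bool seek(int streamIndex, std::int64_t targetPts) {
    for (std::size_t i = 0; i < packets.size(); ++i) {
      if (packets[i].streamIndex == streamIndex && packets[i].pts >= targetPts) {
        cursor = i;
        return true;
      }
    }
    return false;
  }
};

// Hypothetical sketch of the proposed scan. Returns the minimum pts per stream.
std::map<int, std::int64_t> scanThenSeekToStart(
    ToyDemuxer& demuxer, const std::optional<double>& maybeDesiredPts,
    int bestStreamIndex) {
  // Step 1: no seek may be pending, so the cursor should be at the beginning.
  assert(!maybeDesiredPts.has_value());

  // Step 2: the existing logic, reading every packet until the end of the file.
  std::map<int, std::int64_t> minPtsFromScan;
  Packet packet{};
  while (demuxer.readPacket(packet)) {
    auto [it, inserted] = minPtsFromScan.try_emplace(packet.streamIndex, packet.pts);
    if (!inserted) {
      it->second = std::min(it->second, packet.pts);
    }
  }

  // Step 3: seek the best stream back to the minimum pts we just computed.
  auto best = minPtsFromScan.find(bestStreamIndex);
  if (best != minPtsFromScan.end()) {
    demuxer.seek(bestStreamIndex, best->second);
  }
  return minPtsFromScan;
}
```

In a toy file that starts with an audio packet (stream 1, pts 0) before the first video packet (stream 0, pts 1024), calling this with bestStreamIndex = 0 leaves the cursor on that first video packet. Whether a real avformat_seek_file() call with minPtsFromScan lands in exactly the same place depends on the demuxer and the seek flags, so treat this only as a model of the intent.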
In scanFileAndUpdateMetadataAndIndex, we first scan the entire file by receiving all packets in the container, calling av_read_frame() until the end of the video (torchcodec/src/torchcodec/decoders/_core/VideoDecoder.cpp, line 551 in d26bfbc).
After this scan, we call avformat_seek_file() to go back to the beginning of the file (torchcodec/src/torchcodec/decoders/_core/VideoDecoder.cpp, lines 593 to 594 in d26bfbc).
Specifically, we pass:
I wonder if we should instead pass: