
Speed and storage reproduce #29

Open
lixinghe1999 opened this issue Jun 15, 2023 · 2 comments
@lixinghe1999

Dear author and other people who are interested,

I have recently become interested in ML on compressed video. However, after trying some of the code in this repo, I have the following questions:

  1. Why are the motion vectors stored as int64 by default? In my experiments, they can easily take up more space than the original video.
  2. Even if I change the motion vector type to uint8, the result is still about 4× larger than the original video. I guess there is further compression behind H.264 (I am new to video codecs); can anyone confirm this?
  3. Reading the pre-saved .npy motion vectors seems to give only a very limited advantage over directly reading the RGB frames (7 s for motion vectors vs. 8 s for OpenCV RGB reading, for a 30-minute video). I understand that reading a .npy file is not the same as reading the bytes from the video in C++ (although np.load has a C++ backend), but since the decoder is not that heavy, I still feel that reading only the motion vectors gives a limited benefit.
  4. Besides, can I use mv-extractor to get the motion vectors directly?

I would appreciate any comments or help! Thanks in advance.
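For a rough sense of scale on question 1, here is a back-of-the-envelope sketch. The figures are assumptions, not measurements: 16×16 macroblocks for a 720p frame, and 10 numeric fields per motion vector (matching the shape of mv-extractor's `(N, 10)` output arrays):

```python
# Rough storage estimate for the motion vectors of one 720p frame.
# Assumptions: 16x16 macroblocks, 10 numeric fields per motion vector.
num_mvs = (1280 // 16) * (720 // 16)   # 3600 motion vectors per frame
fields = 10

for dtype, nbytes in [("int64", 8), ("int32", 4), ("uint8", 1)]:
    size_kib = num_mvs * fields * nbytes / 1024
    print(f"{dtype}: {size_kib:.2f} KiB per frame")
# int64: 281.25 KiB, int32: 140.62 KiB, uint8: 35.16 KiB
```

At int64, the per-frame motion-vector array is already an appreciable fraction of a compressed frame's size, which is consistent with the blow-up described above.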

@LukasBommes
Owner

LukasBommes commented Oct 27, 2024

Hey @lixinghe1999,
the purpose of this library is by no means to beat the codec at storing the video in the most space-saving way. For that, you are better off storing the video fully encoded. Instead, the purpose of this library is to make the motion vectors, which are an internal detail of the codec, accessible for research and for projects that require access to them.

To answer your questions:

  1. The motion vectors are stored as int32, as defined here. It's true that one could store them in a more space-saving way, as some fields require only uint8 or int16 (see the definition of AVMotionVector here). It's too long ago to say for sure why I cast everything to int32, but I assume it was needed because the entire array is passed to numpy by reference to form a numpy array, and standard numpy arrays can only contain homogeneous data types.
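To illustrate the trade-off: the field layout below is taken from FFmpeg's AVMotionVector struct (libavutil/motion_vector.h). Tightly packed, the mixed-type fields fit in 32 bytes, while promoting every field to a single homogeneous dtype costs considerably more:

```python
import struct

# Fields of FFmpeg's AVMotionVector:
# int32 source; uint8 w, h; int16 src_x, src_y, dst_x, dst_y;
# uint64 flags; int32 motion_x, motion_y; uint16 motion_scale
packed = struct.calcsize("<i2B4hQ2iH")  # little-endian, no padding
print(packed)  # 32 bytes per motion vector

# A homogeneous numpy array must promote all 11 fields to one dtype:
num_fields = 11
print(num_fields * 4)  # int32: 44 bytes per vector
print(num_fields * 8)  # int64: 88 bytes per vector
```

A numpy structured array (record dtype) could preserve the mixed field types, but that complicates downstream consumers compared to a plain homogeneous array.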

  2. Well, uint8 won't fit the data contained in the AVMotionVector, which has some fields that are int16 or int32. Regarding the second part of your question, I can't recall how this works and will have to look it up myself. You could take a look at this book; it explains such details very well.
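A quick sketch of why the uint8 cast is unsafe (the concrete values are hypothetical, but the int16/int32 fields routinely fall outside 0–255, so a narrowing cast silently corrupts them):

```python
# uint8 can only represent 0..255.
src_x = 300     # hypothetical pixel coordinate, needs int16
motion_x = -5   # motion components can be negative

print(src_x & 0xFF)     # 44  -> value wrapped, not 300
print(motion_x & 0xFF)  # 251 -> sign information lost
```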

  3. As mentioned above, the aim of extracting the motion vectors is not to be faster than the codec. The codec also makes use of hardware acceleration, which you won't get with numpy arrays.

  4. Sure, you can use the extract_mvs command. Please refer to the README on how to do this.

@LukasBommes
Owner

LukasBommes commented Oct 31, 2024

To follow up on your second question: Why would the motion vectors be larger than the original frame? Let's assume the original frame is 1280 px × 720 px (720p) and contains three color channels (RGB). If it's an 8-bit image, it takes up 1280 × 720 × 3 bytes = 2700 KiB. If that same frame were stored as a P-frame, the associated 3600 motion vectors (each 32 bytes) of the MPEG-4 encoded frame would take up 3600 × 32 bytes = 112.5 KiB. So, the motion vectors are 24 times more compact than the frame. (Obviously, this calculation does not account for the keyframe, which is needed as a reference for the motion vectors to make sense.)
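The arithmetic above can be checked directly:

```python
# Size comparison: raw 8-bit RGB 720p frame vs. its motion vectors.
frame_bytes = 1280 * 720 * 3   # raw frame
mv_bytes = 3600 * 32           # 3600 motion vectors, 32 bytes each

print(frame_bytes / 1024)      # 2700.0 KiB
print(mv_bytes / 1024)         # 112.5 KiB
print(frame_bytes / mv_bytes)  # 24.0x more compact
```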

Let me also cite a section from the book that I recommended:

4.3.1.3 Bitstream encoding

The video coding process produces a number of values that must be encoded to form the compressed bitstream. These values include:

  • Quantized transform coefficients
  • Information to enable the decoder to re-create the prediction
  • Information about the structure of the compressed data and the compression tools used during encoding
  • Information about the complete video sequence.

These values and parameters, syntax elements, are converted into binary codes using variable length coding and/or arithmetic coding. Each of these encoding methods produces an efficient, compact binary representation of the information. The encoded bitstream can then be stored and/or transmitted.

So, the motion vectors will be compressed even further by the codec. And I am quite sure the codec won't store all the zero-valued motion vectors, of which there can be many, depending on the scene composition.
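To make that concrete, here is a toy illustration. zlib stands in for the codec's actual entropy coder (H.264 uses CAVLC/CABAC, not zlib), and the 90% static-block ratio is an assumption, but the point carries over: mostly-zero motion vectors compress dramatically under any entropy-style coding.

```python
import random
import struct
import zlib

random.seed(0)

# 3600 motion vectors for one hypothetical frame; 90% of blocks static.
mvs = []
for _ in range(3600):
    if random.random() < 0.9:
        mvs.append((0, 0))
    else:
        mvs.append((random.randint(-32, 32), random.randint(-32, 32)))

# Pack each (motion_x, motion_y) pair as two int16 values.
raw = b"".join(struct.pack("<2h", x, y) for x, y in mvs)
compressed = zlib.compress(raw, 9)
print(len(raw), len(compressed))  # compressed is far smaller than raw
```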
