Theoretically, what limits the number of frames the model can infer? #662

HWT-WalterHu · 2025-01-13T07:24:27Z

The DiT parameter matrices can operate on latents of any shape, can't they?
If we increase the frame dimension, say from 13 to 27, only the attention map gets bigger; the pre-trained parameters can still perform the matrix computations and generate a longer video, right?

So what limits the number of frames?
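To make the shape argument above concrete, here is a minimal pure-Python sketch of single-head self-attention. It is not the model's actual code: `DIM`, `TOKENS_PER_FRAME`, and the random weights are hypothetical values chosen for illustration. It only shows that the same fixed projection matrices accept 13 or 27 frames' worth of tokens, while the attention map grows with the square of the token count. It says nothing about output quality.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head self-attention; weights are (dim x dim), tokens are (n x dim)."""
    q = matmul(tokens, w_q)
    k = matmul(tokens, w_k)
    v = matmul(tokens, w_v)
    scale = 1.0 / math.sqrt(len(w_q[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose K to (dim x n)
    scores = matmul(q, k_t)               # (n x n) attention logits
    attn = [softmax([s * scale for s in row]) for row in scores]
    return matmul(attn, v), attn


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
DIM = 4               # hypothetical hidden size
TOKENS_PER_FRAME = 3  # hypothetical number of spatial tokens per latent frame

# The "pre-trained" parameters: their shapes depend only on DIM, never on length.
w_q, w_k, w_v = (random_matrix(DIM, DIM, rng) for _ in range(3))


def run(num_frames):
    """Run attention over a random token sequence for the given frame count."""
    tokens = random_matrix(num_frames * TOKENS_PER_FRAME, DIM, rng)
    return self_attention(tokens, w_q, w_k, w_v)


out_13, attn_13 = run(13)
out_27, attn_27 = run(27)

print("13 frames: attention map", len(attn_13), "x", len(attn_13[0]))
print("27 frames: attention map", len(attn_27), "x", len(attn_27[0]))
```

Both calls reuse the same `w_q`, `w_k`, and `w_v`, so the computation itself does not restrict the frame count in this toy setting; only the size of the attention map, and with it memory and compute, changes.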

yzy-thu · 2025-01-14T05:05:37Z

The max training length

HWT-WalterHu · 2025-01-14T07:51:40Z

> The max training length

Thanks for your reply.
So the answer is that the model can run on a larger input tensor, but the quality of the generated video will drop significantly because the model was not trained on that length, right?

zRzRzRzRzRzRzR assigned yzy-thu Jan 14, 2025
