Theoretically, what limits the number of frames the model can run inference on? #662

Open
HWT-WalterHu opened this issue Jan 13, 2025 · 2 comments

@HWT-WalterHu

The parameter matrices of the DiT can operate on latents of any shape, can't they?
If we increase the frame dimension, say from 13 to 27, only the attention map gets bigger. The pre-trained parameters can still carry out the matrix computations and generate a longer video, right?
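
To make the shape argument concrete, here is a minimal sketch of how the token count and the attention map size grow with the number of latent frames. The latent resolution (60x90) and the patch size (2) are assumed values chosen only for illustration, not taken from this repository, and the helper name `attention_cost` is hypothetical. It assumes full self-attention over all space-time tokens.

```python
def attention_cost(latent_frames, latent_height, latent_width, patch_size=2):
    """Return (num_tokens, attention_map_entries) for one full self-attention head.

    Assumes each latent frame is split into non-overlapping square patches
    and every patch becomes one token (an illustrative assumption).
    """
    tokens_per_frame = (latent_height // patch_size) * (latent_width // patch_size)
    num_tokens = latent_frames * tokens_per_frame
    # The attention map is num_tokens x num_tokens, so it grows quadratically.
    return num_tokens, num_tokens * num_tokens


# Hypothetical latent resolution; only the frame count changes between runs.
for frames in (13, 27):
    tokens, entries = attention_cost(frames, latent_height=60, latent_width=90)
    print(f"{frames} latent frames: {tokens} tokens, {entries} attention entries per head")
```

The projection weights act on the channel dimension of each token, so their shapes do not depend on the number of tokens. Under these assumptions, going from 13 to 27 frames multiplies the attention map by (27/13)^2, roughly 4.3x.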

So what limits the number of frames?

@yzy-thu
Contributor

yzy-thu commented Jan 14, 2025

The max training length

@HWT-WalterHu
Author

HWT-WalterHu commented Jan 14, 2025

> The max training length

Thanks for your reply.
So the answer is that the model can still compute on a larger input tensor, but the quality of the generated video will drop a lot because the model was never trained at that length, right?
