The parameter matrices of DiT can operate on latents of any shape, can't they?
If we increase the frame dimension, say from 13 to 27, only the attention map gets bigger; the pre-trained parameters can still perform the matrix computations and generate a longer video, right?
So what limits the number of frames?
Thanks for your reply.
So the answer is that the model can run on a larger input tensor, but the quality of the generated video will drop a lot because the model was not trained at that length, right?
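
To make the shape argument concrete, here is a minimal sketch in plain Python. It is not the actual DiT code: it is a single-head self-attention toy with random stand-in weights, and `DIM`, `TOKENS_PER_FRAME`, and the helper names are all assumptions made for illustration. It only shows that the same projection matrices accept 13 or 27 frames of tokens, and that the attention map is what grows. It says nothing about output quality at the untrained length.

```python
import math
import random

def matmul(a, b):
    # Plain nested-list matrix product: (n x k) @ (k x m) -> (n x m).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    # Numerically stable softmax over one row of attention scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens, w_q, w_k, w_v):
    # The weights are (DIM x DIM); nothing here depends on the token count.
    q = matmul(tokens, w_q)
    k = matmul(tokens, w_k)
    v = matmul(tokens, w_v)
    scale = 1.0 / math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)  # (N x N): this is the part that grows
    attn = [softmax([s * scale for s in row]) for row in scores]
    return matmul(attn, v), attn

random.seed(0)
DIM = 4               # hypothetical hidden size
TOKENS_PER_FRAME = 2  # hypothetical number of spatial tokens per latent frame

def rand_matrix(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# One fixed set of stand-in "pre-trained" weights, reused for both lengths.
w_q, w_k, w_v = (rand_matrix(DIM, DIM) for _ in range(3))

def run(num_frames):
    tokens = rand_matrix(num_frames * TOKENS_PER_FRAME, DIM)
    out, attn = self_attention(tokens, w_q, w_k, w_v)
    return (len(out), len(out[0])), (len(attn), len(attn[0]))

out13, attn13 = run(13)
out27, attn27 = run(27)
print("13 frames: output", out13, "attention map", attn13)
print("27 frames: output", out27, "attention map", attn27)
```

With these toy sizes the attention map goes from 26 x 26 to 54 x 54, so the compute and memory for attention scale with the square of the token count while the weights stay the same.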