process_encoder_hidden_states function is applied on each denoising step #9625
Comments
Ditto for `diffusers/src/diffusers/models/unets/unet_2d_condition.py`, lines 952 to 954 (at 31058cd).
These don't require recomputation, yes. We recently discussed this internally too, so I have some comments fresh in mind that I'd like to share. This was tested in the past, and it was found that the speedup gains existed but were quite insignificant (unless you're generating in bulk, for example as an image-generation service provider, in which case it makes sense to save as much extra overhead as you can). Since these layers exist in the UNet, performing intermediate computations in the pipeline and passing them back into the UNet might be a little confusing for newcomers. Ideally, we'd like this to be as simple as possible and behave like a black-box function where you don't have to bother preparing anything but the latents and prompt embeddings, even if that has performance downsides. I did have a design idea in mind to provide modeling-level control, where the user could pass in one or more layer identifiers/regexes to reuse computed values. Something that looked like:

During the first inference step, the values you'd like to reuse/cache would be computed, and for the remaining inference steps they would be reused. At the end of the denoising loop, the cache states would be cleared so that new prompt/image embeddings, etc. could be used. Alternatively, we could add more parameters to the UNet forward, such as …

cc @yiyixuxu too, because we were discussing this recently.
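The caching design described above could be sketched roughly as follows. This is a framework-free illustration, not diffusers API: `CachedModel`, `cache_layers`, `run_layer`, and `clear_cache` are all hypothetical names invented for this sketch.

```python
class Layer:
    """Stand-in for an expensive projection layer (e.g. an
    IP-adapter image-projection block inside the UNet)."""
    def __init__(self):
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return [v * 2 for v in x]  # pretend this is expensive

class CachedModel:
    """Caches outputs of selected layers across denoising steps.

    The first forward computes and stores the selected layers'
    outputs; subsequent forwards reuse them. clear_cache() resets
    state so new prompt/image embeddings can be used."""
    def __init__(self, layers, cache_layers=()):
        self.layers = layers              # name -> callable
        self.cache_layers = set(cache_layers)
        self._cache = {}

    def run_layer(self, name, x):
        if name in self.cache_layers:
            if name not in self._cache:   # only on the first step
                self._cache[name] = self.layers[name](x)
            return self._cache[name]
        return self.layers[name](x)

    def clear_cache(self):
        self._cache = {}

proj = Layer()
model = CachedModel({"encoder_hid_proj": proj},
                    cache_layers=["encoder_hid_proj"])

embeds = [1.0, 2.0]
for _ in range(4):                        # 4 denoising steps
    out = model.run_layer("encoder_hid_proj", embeds)

model.clear_cache()                       # ready for the next prompt
```

Despite four "denoising steps", the expensive layer only runs once; clearing the cache at the end of the loop matches the lifecycle described above.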
Thank you @a-r-r-o-w! In my tests the overhead was around 2.5% for the IP adapter, which I don't think is a big deal for most users, but it depends on the encoder's complexity. Personally, I like your caching idea, especially if you don't want to make changes at the pipeline level.
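An overhead figure like the ~2.5% above can be estimated by timing the loop with and without the per-step processing. A minimal sketch with stand-in functions (nothing here is diffusers code):

```python
import time

def expensive_projection(x):
    # stand-in for process_encoder_hidden_states work
    return [v * 0.5 for v in x]

def unet_step(x):
    # stand-in for the rest of the UNet forward
    s = 0.0
    for v in x:
        s += v * v
    return s

embeds = list(range(1000))
steps = 50

t0 = time.perf_counter()
for _ in range(steps):
    unet_step(expensive_projection(embeds))  # projection every step
per_step_total = (time.perf_counter() - t0) / steps

t0 = time.perf_counter()
proj = expensive_projection(embeds)          # projection once
for _ in range(steps):
    unet_step(proj)
per_step_cached = (time.perf_counter() - t0) / steps

# fraction of per-step time spent on the repeated projection
overhead = (per_step_total - per_step_cached) / per_step_total
```

The measured fraction depends entirely on the encoder's cost relative to the UNet forward, which is the commenter's point: for heavy encoders the saving grows.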
`diffusers/src/diffusers/models/unets/unet_2d_condition.py`, line 1004 (at 31058cd)

Hey! I noticed that `process_encoder_hidden_states` is applied on each denoising step, which affects inference performance. This function could be executed only once, before running `UNet2DConditionModel.forward`. The output of `process_encoder_hidden_states` should be combined with `encoder_hidden_states` and provided to `UNet2DConditionModel.forward`.
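The proposal amounts to hoisting the processing out of the denoising loop. A framework-free sketch of the before/after (the `preprocessed` flag and both function bodies are hypothetical, invented only to illustrate the pattern):

```python
def process_encoder_hidden_states(encoder_hidden_states):
    # stand-in for the projection done inside the UNet forward
    return tuple(v + 1 for v in encoder_hidden_states)

def unet_forward(latents, encoder_hidden_states, preprocessed=False):
    # a real forward would skip the projection when it has already
    # been applied (hypothetical `preprocessed` flag)
    if not preprocessed:
        encoder_hidden_states = process_encoder_hidden_states(
            encoder_hidden_states)
    return [l + sum(encoder_hidden_states) for l in latents]

embeds = (1.0, 2.0)

# current behavior: projection runs on every denoising step
latents = [0.0]
for _ in range(3):
    latents = unet_forward(latents, embeds)

# proposed behavior: project once, reuse across steps
latents2 = [0.0]
processed = process_encoder_hidden_states(embeds)
for _ in range(3):
    latents2 = unet_forward(latents2, processed, preprocessed=True)
```

Both paths produce identical results; only the number of times the projection runs differs, which is exactly the saving the issue asks for.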