NPU Adaption for Sanna #10409
@sayakpaul Please take a look at this PR, which makes Sana suitable for NPU training. Thank you so much!
@@ -979,10 +982,10 @@ def main(args):
     )

     # VAE should always be kept in fp32 for SANA (?)
-    vae.to(dtype=torch.float32)
+    vae.to(accelerator.device, dtype=torch.float32)
This is not needed, as we conditionally move the VAE on and off the accelerator device.
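For readers unfamiliar with the pattern mentioned in this comment, a minimal sketch of keeping a module on the accelerator only while it is needed might look like the following. `on_device` is a hypothetical helper written for illustration, not something taken from the training script; it assumes only that the module exposes a `.to(device)` method, as `torch.nn.Module` does.

```python
from contextlib import contextmanager


@contextmanager
def on_device(module, device, offload_device="cpu"):
    """Temporarily move `module` to `device`, then move it back.

    Hypothetical helper: works with any object that has a `.to(device)` method.
    """
    module.to(device)
    try:
        yield module
    finally:
        # Always offload again, even if the body raised an exception.
        module.to(offload_device)
```

With a helper like this, an unconditional `vae.to(accelerator.device, ...)` at setup time is redundant, because the move happens right where the VAE is used.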
     transformer.to(accelerator.device, dtype=weight_dtype)
     # because Gemma2 is particularly suited for bfloat16.
-    text_encoder.to(dtype=torch.bfloat16)
+    text_encoder.to(accelerator.device, dtype=torch.bfloat16)
Training related changes look straightforward to me. Could we also add a note about this support in the README?
I will leave the `models`-related changes for @yiyixuxu to review.
@sayakpaul Thanks for your help! You can definitely add this support to the README! By the way, once the SD3 LoRA function is done, I think we should update the accelerate loading and saving functions, at least in the DreamBooth LoRA training scripts.
Oh, I was wondering if you could just add a note about the NPU support in the README directly.
This is not relevant for this PR, so we can ignore it.
@sayakpaul I'm not sure what the process is, as neither the Flux nor the SD scripts have these notes. By the way, NPU training should proceed automatically if an NPU is available.
Okay, we can leave it out of this PR then and open a future PR to add that note.
Sure, thanks for your help!
Changes in the training script look good to me. Off to @yiyixuxu for the other changes.
@yiyixuxu Please take a look. Thank you!
@@ -119,6 +120,12 @@ def __init__(
         # 2. Cross Attention
         if cross_attention_dim is not None:
             self.norm2 = nn.LayerNorm(dim, elementwise_affine=norm_elementwise_affine, eps=norm_eps)

+        if is_torch_npu_available():
same comment as in the other PR - let's not update the default attn processor logic for now; we can set it manually
> same comment as in the other PR - let's not update default attn processor logic for now, we can manually set it

I've updated the new one, please take a look. This can just set up NPU FA directly.
I will let you know when the full test is complete
@yiyixuxu This still requires modifying the Sana transformer file, so I think checking in `__init__` is the best option for now.
@yiyixuxu Please take a look at this modification. If it is fine, I will update the FLUX one as well once this has been merged. Thank you!
@@ -294,6 +294,10 @@ def __init__(
             processor = (
                 AttnProcessor2_0() if hasattr(F, "scaled_dot_product_attention") and self.scale_qk else AttnProcessor()
             )
+
umm I don't think we should change the default attention processor here
let's keep this logic in SANA:)
@yiyixuxu I've changed the logic back in SANA, thanks for your help!
@@ -119,6 +120,13 @@ def __init__(
         # 2. Cross Attention
         if cross_attention_dim is not None:
             self.norm2 = nn.LayerNorm(dim, elementwise_affine=norm_elementwise_affine, eps=norm_eps)
+
@lawrence-cj let me know if it's ok with you to default to NPU attention when it's available:)
Oh, I'm not familiar with NPU training and inference. Is this NPU device very popular in the diffusers community?
thanks!
let's wait for SANA author to see if they like the default processor change, ok to me otherwise
Actually, I think it's not great if only Sana has this behavior of automatically using the NPU processor when it is available. Maybe let's not do this unless we want to change the default for everything? You can still explicitly set the NPU attention processor, no?
Hi @yiyixuxu, because the default attention processor is AttnProcessor2_0, we either have to change it inside the model or change the processor that is set in attention_processor.py.
Hi @yiyixuxu, is this ready to merge? Thanks
@leisuzz
@yiyixuxu Please take a look at this modification and let me know if it is ok. Thanks
@yiyixuxu In fact, NPU has to use AttnProcessorNPU; otherwise the loss will be NaN in Sana.
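To make the two options debated in this thread concrete, here is a minimal, library-free sketch of the selection logic. `choose_processor_name` and its `prefer_npu` flag are hypothetical names used only for illustration and are not part of the diffusers API; only the processor class names come from the thread. The real code would check `is_torch_npu_available()` and instantiate the processor classes rather than return strings.

```python
def choose_processor_name(
    has_sdpa: bool,
    scale_qk: bool,
    npu_available: bool = False,
    prefer_npu: bool = False,
) -> str:
    """Return the name of the attention processor to use (illustrative only)."""
    # NPU path: taken only when the caller asks for it AND an NPU is present.
    if prefer_npu and npu_available:
        return "AttnProcessorNPU"
    # Existing default from the diff: the SDPA-based processor when supported.
    if has_sdpa and scale_qk:
        return "AttnProcessor2_0"
    return "AttnProcessor"
```

Always passing `prefer_npu=True` corresponds to the automatic default proposed in the PR, while leaving it `False` corresponds to the reviewer's suggestion of keeping the existing default and setting the NPU processor explicitly.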
What does this PR do?
Adapts DreamBooth LoRA training for Sana to run on NPU.