Why is vae decoder so slow? Can you help me? #27

Open
radish0926 opened this issue Oct 15, 2024 · 4 comments

Comments

@radish0926

[Image: 企业微信截图_5e1116fd-06f0-496c-8927-0522bdc55615 (WeCom screenshot)]

@aredden
Owner

aredden commented Oct 15, 2024

Seems like most of the delay is actually synchronization, which implies that the slowdown comes from something earlier in the code. The way torch works, each CUDA op runs asynchronously: it gets queued on the GPU and executes at a later time, so a synchronization point absorbs the time of everything queued before it. Since synchronization is what takes the longest, try synchronizing before the decode and then timing the decode step on its own, to make sure it's not something else.
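A minimal sketch of that measurement, assuming PyTorch with CUDA: synchronize once to drain earlier work, run the call, then synchronize again so the timer covers only that call. The helper names `gpu_sync` and `timed_call` are made up for illustration and are not part of this repository.

```python
import time

try:
    import torch
    _HAS_CUDA = torch.cuda.is_available()
except ImportError:  # lets the helper run on a machine without PyTorch
    torch = None
    _HAS_CUDA = False


def gpu_sync():
    """Block until all queued CUDA work has finished (no-op without a GPU)."""
    if _HAS_CUDA:
        torch.cuda.synchronize()


def timed_call(fn, *args, **kwargs):
    """Return (result, seconds) for fn, excluding previously queued GPU work."""
    gpu_sync()  # drain kernels queued by earlier steps
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    gpu_sync()  # wait for the kernels launched by fn itself
    return result, time.perf_counter() - start
```

Usage would look like `image, seconds = timed_call(ae.decode, latents)`, where `ae` and `latents` are placeholder names for the autoencoder and its input. If the decode time drops sharply once the first synchronization is in place, the cost was coming from work queued before the decoder.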

@aredden
Owner

aredden commented Oct 15, 2024

Actually, it could be because you have autoencoder offloading set to true. In that case the slowdown could be moving the VAE to the GPU, decoding, and then moving the VAE back to the CPU.

@radish0926
Author

I tried disabling autoencoder offloading, but the decoder is still just as slow. Most of the decoding time is spent in the upsampling loop, which takes 4 to 5 seconds. My test machine is an L4.

```python
for i_level in reversed(range(self.num_resolutions)):
    for i_block in range(self.num_res_blocks + 1):
        h = self.up[i_level].block[i_block](h)
        if len(self.up[i_level].attn) > 0:
            h = self.up[i_level].attn[i_block](h)
    if i_level != 0:
        h = self.up[i_level].upsample(h)
```
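For a rough sense of why this loop dominates, the sketch below estimates the size of the activation tensor `h` at each level. The spatial side doubles after every level except the last, so the final levels process far more data. The defaults `ch=128` and `ch_mult=(1, 2, 4, 4)` are assumptions about the autoencoder config and are not stated in this thread, and the estimate covers only one batch-1 tensor in a 2-byte dtype, not convolution workspace or attention.

```python
def decoder_activation_sizes(latent_side, ch=128, ch_mult=(1, 2, 4, 4),
                             bytes_per_elem=2):
    """Estimate the size of `h` at each decoder level.

    Returns a list of (i_level, channels, side, MiB) in execution order.
    ch and ch_mult are assumed values, not read from the real model.
    """
    rows = []
    side = latent_side
    for i_level in reversed(range(len(ch_mult))):
        channels = ch * ch_mult[i_level]
        mib = channels * side * side * bytes_per_elem / 2**20
        rows.append((i_level, channels, side, mib))
        if i_level != 0:
            side *= 2  # mirrors the upsample step in the decoder loop
    return rows


for i_level, channels, side, mib in decoder_activation_sizes(128):
    print(f"level {i_level}: {channels} ch at {side}x{side} = {mib:.0f} MiB")
```

Under these assumptions a 128x128 latent grows from 16 MiB at the first level to 256 MiB at the last, so the highest-resolution blocks are the ones most sensitive to memory bandwidth and clock speed.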

@aredden
Owner

aredden commented Oct 18, 2024

I'm not entirely sure what the slowdown would be, though an L4 has a pretty low power limit, so it might be throttling because of that. I would watch the clock speeds while it's decoding and check whether they drop significantly.
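One way to watch the clocks is to poll `nvidia-smi` from a second process while the decode runs. The sketch below is only an illustration: the helper names are hypothetical, and it assumes `nvidia-smi` is on the PATH and that the driver reports the listed query fields.

```python
import shutil
import subprocess

# Fields passed to `nvidia-smi --query-gpu`; clocks in MHz, power in W.
FIELDS = ["clocks.sm", "clocks.max.sm", "power.draw", "power.limit",
          "temperature.gpu"]


def parse_sample(line):
    """Parse one CSV line from nvidia-smi into a dict of floats (or None)."""
    sample = {}
    for name, raw in zip(FIELDS, line.split(",")):
        try:
            sample[name] = float(raw.strip())
        except ValueError:  # e.g. "[N/A]" when a field is unsupported
            sample[name] = None
    return sample


def read_gpu_sample(index=0):
    """Query one GPU; returns None if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return None
    proc = subprocess.run(
        ["nvidia-smi", f"--id={index}",
         "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True,
    )
    if proc.returncode != 0 or not proc.stdout.strip():
        return None
    return parse_sample(proc.stdout.strip().splitlines()[0])


def clock_ratio(sample):
    """Current SM clock as a fraction of the maximum, or None if unknown."""
    cur, top = sample.get("clocks.sm"), sample.get("clocks.max.sm")
    if cur is None or not top:
        return None
    return cur / top
```

Calling `read_gpu_sample()` a few times per second during the decode and comparing `clock_ratio` against an idle reading would show whether the SM clock falls while `power.draw` sits at `power.limit`, which is the pattern to expect from power throttling.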


2 participants