
[Proposal] Optionally use flash attention. #378

Open
tbenthompson opened this issue Sep 8, 2023 · 4 comments · May be fixed by #501
Comments

tbenthompson commented Sep 8, 2023

It would be nice to have a flag to enable flash attention in models where it makes sense. This helps both performance and memory usage in larger models. In my case, working with Pythia 12B, I get ~50% better performance and ~4x larger batch sizes when using flash attention. I also find numerical stability in float16 to be better with flash attention, probably because the model was trained with it.

The downside of using flash attention in TransformerLens is that we would not have access to intermediate quantities in the attention calculation, such as the attention matrix itself. This is why I would suggest a default-off flag, so that users can choose whether they need those intermediate values to be available. In addition, when only a small subset of the attention intermediates is needed, it's much faster to cache just the input to the attention layer (or the residual stream) and recompute those intermediates when needed.
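For concreteness, here is a rough sketch of that recompute-on-demand idea, assuming the usual TransformerLens attribute names (model.blocks[l].ln1, attn.W_Q / W_K / b_Q / b_K) and the blocks.{l}.hook_resid_pre hook; exact names may differ between versions, so treat this as illustrative rather than a working patch:

```python
import torch

def recompute_attention_pattern(model, tokens, layer):
    """Cache only the residual stream entering `layer`, then rebuild that
    layer's attention pattern on demand instead of storing it during the
    forward pass."""
    hook_name = f"blocks.{layer}.hook_resid_pre"
    _, cache = model.run_with_cache(tokens, names_filter=hook_name)
    resid_pre = cache[hook_name]            # [batch, pos, d_model]

    block = model.blocks[layer]
    x = block.ln1(resid_pre)                # pre-attention LayerNorm

    # W_Q, W_K: [n_heads, d_model, d_head]; b_Q, b_K: [n_heads, d_head]
    q = torch.einsum("bpm,hmd->bphd", x, block.attn.W_Q) + block.attn.b_Q
    k = torch.einsum("bpm,hmd->bphd", x, block.attn.W_K) + block.attn.b_K

    # GPT-2-style scaling by sqrt(d_head)
    scores = torch.einsum("bqhd,bkhd->bhqk", q, k) / q.shape[-1] ** 0.5

    # Causal mask: query positions attend only to themselves and earlier keys.
    n_ctx = scores.shape[-1]
    mask = torch.triu(
        torch.ones(n_ctx, n_ctx, dtype=torch.bool, device=scores.device),
        diagonal=1,
    )
    scores = scores.masked_fill(mask, float("-inf"))
    return scores.softmax(dim=-1)           # [batch, n_heads, q_pos, k_pos]
```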

Thanks!

neelnanda-io (Collaborator) commented Sep 8, 2023 via email

alan-cooney (Collaborator) commented:

Seems very useful for sparse autoencoder training.

Docs here: https://pytorch.org/tutorials/intermediate/scaled_dot_product_attention_tutorial.html#conclusion - in case anyone wants to take this (I'll pick it up at some point if no one does).
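For anyone picking this up, a minimal sketch of the PyTorch call that tutorial covers (PyTorch 2.x; the torch.backends.cuda.sdp_kernel context manager used there is superseded by torch.nn.attention.sdpa_kernel in newer releases, and none of the TransformerLens flag wiring is shown here):

```python
import torch
import torch.nn.functional as F

# F.scaled_dot_product_attention fuses softmax(Q K^T / sqrt(d)) V and can
# dispatch to the FlashAttention kernel. The attention matrix is never
# materialised, which is exactly why pattern hooks would be unavailable
# when a flash-attention flag is turned on.
batch, n_heads, seq_len, d_head = 2, 8, 1024, 64
device, dtype = "cuda", torch.float16   # flash kernel needs fp16/bf16 on GPU

q = torch.randn(batch, n_heads, seq_len, d_head, device=device, dtype=dtype)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Restrict dispatch to the flash backend, as in the linked tutorial.
with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_math=False, enable_mem_efficient=False
):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

print(out.shape)  # torch.Size([2, 8, 1024, 64])
```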

cmathw (Contributor) commented Jan 24, 2024

I'd be quite keen to make a start on this soon, @alan-cooney have you made a start already?

alan-cooney (Collaborator) commented:

> I'd be quite keen to make a start on this soon, @alan-cooney have you made a start already?

I haven't yet so please feel free to!

cmathw linked a pull request (#501) on Jan 30, 2024 that will close this issue.