Feature/44 make flash attention configurable #47
base: develop
Conversation
* fix: change pre-commit autoupdate schedule to monthly
* fix: change the merge strategy for Changelog to Union
* fix: add .envrc to .gitignore
* ci: ignore pre-commit-config and readthedocs for changelog updates
* ci: fix to correct hpc workflow call
* fix: update precommit config
* chore: update pre-commits
* feat: add codeowners file
* chore: update dependencies
* ci: add hpc-config
* docs: changelog
* fix: respond to review comments

Co-authored-by: Jesper Dramsch <[email protected]>
* feat: add configurability to dropout in MultiHeadSelfAttention
  Co-authored-by: Rilwan (Akanni) Adewoyin <[email protected]>
* test: adjust to dropout_p
* doc: update changelog
* Feature/integrate reusable workflows (#16)
  * ci: add public pr label
  * ci: add readthedocs update check
  * ci: add downstream ci
  * ci: add ci-config
  * chore(deps): remove unused dependency
  * docs: update changelog
  * ci: switch to main
* chore: changelog 0.2.1
* Update error messages from invalid sub_graph in model instantiation (#20)
* ci: inherit pypi publish flow (#17)
  * ci: inherit pypi publish flow
  * docs: add to changelog
  * fix: typo in reusable workflow
  * fix: another typo
  * chore: bump actions/setup-python to v5
  * ci: run downstream-ci for changes in src and tests
  * docs: update changelog
  Co-authored-by: Helen Theissen <[email protected]>
* Update CHANGELOG.md to KeepChangelog format
* [pre-commit.ci] pre-commit autoupdate (#25)
  * github.com/psf/black-pre-commit-mirror: 24.4.2 → 24.8.0
  * github.com/astral-sh/ruff-pre-commit: v0.4.6 → v0.6.2
  * github.com/tox-dev/pyproject-fmt: 2.1.3 → 2.2.1
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Ci/changelog-release-updater (#26)
  * ci: add changelog release updater
  * docs: update changelog

Co-authored-by: Rilwan (Akanni) Adewoyin <[email protected]>
Co-authored-by: Gert Mertes <[email protected]>
Co-authored-by: Mario Santa Cruz <[email protected]>
Co-authored-by: Jesper Dramsch <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
xfail for MultiHeadSelfAttention
for more information, see https://pre-commit.ci
Force-pushed from a080cc5 to d4940e7.
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##           develop      #47     +/-   ##
==========================================
  Coverage    99.84%   99.84%
==========================================
  Files           23       23
  Lines         1277     1304     +27
==========================================
+ Hits          1275     1302     +27
  Misses           2        2
```

☔ View full report in Codecov by Sentry.
Where's the PR template?
….com:ecmwf/anemoi-models into feature/44-make-flash-attention-configurable
Thanks for this contribution. This should make the code much more stable in inference.
I left a few comments about implementation details that caught my eye, and the documentation needs to be updated.
Additionally, now that the attention implementation is configurable, is there a way to also change it in the config? I assume it is set through instantiation; should we add it to the config to make it explicitly available?
Once the errors are implemented correctly, please also add tests that verify they are triggered as expected, so we can catch edge cases.
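For concreteness, a hypothetical sketch of what passing the new options at instantiation might look like; the import path, argument names, and full signature here are assumptions based on the docstrings in this diff, not the actual API:

```python
# Hypothetical usage sketch; import path and full signature are assumptions.
from anemoi.models.layers.attention import MultiHeadSelfAttention

attention = MultiHeadSelfAttention(
    num_heads=16,
    embed_dim=1024,
    # string selecting the underlying attention backend (see docstring below)
    attention_implementation="flash_attention",
    # a value > 0 activates softcapping in the flash-attention backend
    softcap=15.0,
)
```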
```python
self.attention = torch.compile(self.attention)
self.is_attn_compiled = True

# TODO test how this impacts scaling at large model counts
```
Who is this a TODO for?
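For context, a minimal self-contained sketch of the lazy-compile pattern the diff appears to use; the class, method, and the `is_attn_compiled = False` initialisation are assumptions, since only two lines are visible in the diff:

```python
import torch
import torch.nn.functional as F

class LazyCompiledAttention(torch.nn.Module):
    # Hypothetical illustration of the pattern; not the PR's actual class.
    def __init__(self) -> None:
        super().__init__()
        self.attention = F.scaled_dot_product_attention
        self.is_attn_compiled = False

    def forward(self, q, k, v):
        if not self.is_attn_compiled:
            # Compile once and cache the flag so torch.compile is not
            # re-triggered on every forward pass.
            self.attention = torch.compile(self.attention)
            self.is_attn_compiled = True
        return self.attention(q, k, v)
```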
```python
Tensor
    aLiBi slopes
"""
n = 2 ** math.floor(math.log2(num_heads))
```
Since `num_heads` is an integer, we could be using bit-shifting here:

`n = 1 << (num_heads.bit_length() - 1)`

Not sure how necessary speed is here though, as a trade-off against readability. It would definitely need a comment.
Speed is not an issue as it is only calculated once. So, I would go for readability.
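To make the trade-off concrete, here is a small self-contained sketch following the ALiBi paper's reference scheme (the helper name and exact structure are assumptions, not this repository's code) that shows both variants and checks they agree for positive integers:

```python
import math

def get_alibi_slopes(num_heads: int) -> list[float]:
    # Hypothetical helper following the ALiBi paper's reference scheme.
    # Largest power of two <= num_heads; for positive ints the bit-shift
    # form is equivalent to 2 ** math.floor(math.log2(num_heads)).
    n = 1 << (num_heads.bit_length() - 1)
    # Geometric sequence of slopes for the power-of-two part.
    start = 2 ** (-(2 ** -(math.log2(n) - 3)))
    slopes = [start * start**i for i in range(n)]
    if n != num_heads:
        # Interleave slopes from the next power of two for the remainder.
        slopes += get_alibi_slopes(2 * n)[0::2][: num_heads - n]
    return slopes

# The two expressions under discussion agree for all positive integers:
assert all(
    2 ** math.floor(math.log2(h)) == 1 << (h.bit_length() - 1)
    for h in range(1, 2049)
)
```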
```python
    A predefined string which selects which underlying attention
    implementation, by default "flash_attention"
softcap : float, optional
    Anything > 0 activates softcapping flash attention, by default None
```
What does "Anything > 0" mean here? Please adjust this explanation across the docstrings to be more informative to someone who hasn't worked with the attention implementation yet.
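For reference: in implementations that support it (e.g. the softcapping popularised by Gemma 2 and available in recent flash-attention releases), softcap bounds the attention logits with a scaled tanh before the softmax, so any value greater than 0 enables the capping and the value itself sets the bound. A minimal sketch of the idea, not the library's internal code:

```python
import torch

def softcap_logits(logits: torch.Tensor, softcap: float) -> torch.Tensor:
    # Squash raw attention logits into (-softcap, softcap) with tanh;
    # small logits pass through almost unchanged, extreme ones are capped.
    return softcap * torch.tanh(logits / softcap)

# e.g. with softcap=15.0 a logit of 100 becomes ~15, while 0.5 stays ~0.5
print(softcap_logits(torch.tensor([0.5, 100.0]), softcap=15.0))
```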
Force-pushed from 4a99a5e to 2d122df.
Force-pushed from e96cfd1 to d4510f6.
….com:ecmwf/anemoi-models into feature/44-make-flash-attention-configurable
Current setup:
Now:
This PR will be accompanied by changes to the config in Anemoi-training (PR)
Todo:
📚 Documentation preview 📚: https://anemoi-models--47.org.readthedocs.build/en/47/