This repository has been archived by the owner on Dec 20, 2024. It is now read-only.
Feature/44 make flash attention configurable #47
base: develop
Changes from 22 commits
Since `num_heads` is an integer, we could be using bit-shifting here: `n = 1 << (num_heads.bit_length() - 1)`
Not sure how necessary speed is here though, as a trade-off against readability. It would definitely need a comment.
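For reference, a minimal sketch comparing the two options. The code under review is not shown here, so this assumes the line computes the largest power of two that does not exceed `num_heads`; both function names are hypothetical and only serve the comparison.

```python
import math


def largest_power_of_two_bitshift(num_heads: int) -> int:
    """Largest power of two <= num_heads, via the bit-shift suggested above."""
    if num_heads < 1:
        raise ValueError("num_heads must be a positive integer")
    # bit_length() is the position of the highest set bit (1-based), so
    # shifting 1 left by (bit_length - 1) keeps only that highest bit.
    return 1 << (num_heads.bit_length() - 1)


def largest_power_of_two_readable(num_heads: int) -> int:
    """Same result, written with log2 (assumed to be the more readable form)."""
    if num_heads < 1:
        raise ValueError("num_heads must be a positive integer")
    return 2 ** math.floor(math.log2(num_heads))


if __name__ == "__main__":
    for heads in (1, 6, 12, 16):
        print(heads, largest_power_of_two_bitshift(heads), largest_power_of_two_readable(heads))
```

Both variants agree for typical head counts. The bit-shift version stays in integer arithmetic, while the `log2` version goes through floating point, which could in principle round incorrectly for very large inputs.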
Speed is not an issue as it is only calculated once. So, I would go for readability.
What does "Anything > 0" mean here? Please adjust this explanation across the docstrings so that it is informative to someone who hasn't worked with the attention implementation yet.
nit: What is the comment for?