The displayed sparsity for the block mask is negative when the block size is bigger than the maximum size of the mask. This is similar to issue #68.
torch nightly: 2.6.0.dev20241221+cu124

Repro code:

```python
import torch
from torch.nn.attention.flex_attention import create_block_mask

# Assign each of the 100 positions to a document id.
document_id = torch.zeros(100, dtype=torch.int, device="cuda")
document_id[:10] = 0
document_id[10:20] = 1
for i in range(20, 100, 20):
    document_id[i : i + 20] = i // 20 + 1

def document_causal_mask(b, h, q_idx, kv_idx):
    causal_mask = q_idx >= kv_idx
    document_mask = document_id[q_idx] == document_id[kv_idx]
    return causal_mask & document_mask

mask = create_block_mask(document_causal_mask, 1, 1, 100, 100, "cuda")
print(mask)
```
Output:

```
BlockMask(shape=(1, 1, 100, 100), sparsity=-63.84%, (0, 0) ██ )
```
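For reference, the true sparsity of this mask can be checked independently of the `BlockMask` repr by materializing the same mask_mod densely. The snippet below is a sketch, not part of the original report; it assumes the repro above has already been run so that `document_causal_mask` and `document_id` are in scope.

```python
# Sketch: compute the actual sparsity of the 100x100 mask directly from the
# mask_mod, to compare against the (buggy) value printed by BlockMask.
import torch

idx = torch.arange(100, device="cuda")
# Broadcast q_idx (column) against kv_idx (row) to build the full boolean mask.
dense = document_causal_mask(0, 0, idx[:, None], idx[None, :])
true_sparsity = 1.0 - dense.float().mean().item()
print(f"true sparsity: {true_sparsity:.2%}")  # a positive value, unlike -63.84%
```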
Does this have an effect on the results obtained with such a mask?
This should not have any effect on the result. The sparsity display is an independent code path from the one used to actually run flex attention.
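As a quick sanity check of that claim (a sketch added here, not from the thread), one can compare `flex_attention` with this block mask against `scaled_dot_product_attention` using the equivalent dense boolean mask. The tensor shapes and tolerances below are arbitrary choices, and `mask` / `document_causal_mask` are assumed to come from the repro above.

```python
# Sketch: confirm the block mask still yields correct attention output even
# though its displayed sparsity is wrong.
import torch
import torch.nn.functional as F
from torch.nn.attention.flex_attention import flex_attention

q = torch.randn(1, 1, 100, 64, device="cuda")
k = torch.randn(1, 1, 100, 64, device="cuda")
v = torch.randn(1, 1, 100, 64, device="cuda")

out_flex = flex_attention(q, k, v, block_mask=mask)

idx = torch.arange(100, device="cuda")
dense = document_causal_mask(0, 0, idx[:, None], idx[None, :])
out_ref = F.scaled_dot_product_attention(q, k, v, attn_mask=dense)

print(torch.allclose(out_flex, out_ref, atol=1e-5, rtol=1e-5))  # expected: True
```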
I noticed this as well and I know the root cause. I am working on the fix here:
pytorch/pytorch#143534
Cc @Chillee