Doc mask returns negative sparsity #93

Open
staghado opened this issue Dec 21, 2024 · 1 comment

Comments

@staghado

staghado commented Dec 21, 2024

The displayed sparsity for the block mask is negative when the block size is bigger than the maximum size of the mask. Similar to issue #68.

Torch nightly: 2.6.0.dev20241221+cu124
Repro code:

import torch
from torch.nn.attention.flex_attention import create_block_mask

# 100 tokens split into documents: two of length 10, then four of length 20.
document_id = torch.zeros(100, dtype=torch.int, device="cuda")
document_id[:10] = 0
document_id[10:20] = 1
for i in range(20, 100, 20):
    document_id[i : i + 20] = i // 20 + 1

def document_causal_mask(b, h, q_idx, kv_idx):
    # Causal attention restricted to positions within the same document.
    causal_mask = q_idx >= kv_idx
    document_mask = document_id[q_idx] == document_id[kv_idx]
    return causal_mask & document_mask

# The 100x100 mask is smaller than the default 128x128 block size.
mask = create_block_mask(document_causal_mask, 1, 1, 100, 100, "cuda")
print(mask)

Output:

BlockMask(shape=(1, 1, 100, 100), sparsity=-63.84%, 
(0, 0)
██
)
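
For what it's worth, the -63.84% figure matches what you would get if the density were computed over the padded 128x128 block grid instead of the true 100x100 shape (just a guess at the cause, not confirmed):

# Back-of-the-envelope check (assumption: the reported sparsity is derived from
# the padded block grid rather than the actual mask shape).
padded_elems = 128 * 128      # one full block after padding to the block size
true_elems = 100 * 100        # actual mask shape
print(f"{1 - padded_elems / true_elems:.2%}")  # -63.84%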

Does this have an effect on the results obtained with such a mask?

@drisspg
Contributor

drisspg commented Dec 21, 2024

This should not have any effect on the result. The sparsity display is an independent code path from the one used to actually run flex attention.
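
If you want to double-check on your end, here is one sketch (it continues your repro above, so it reuses document_id and mask; the shapes and tolerances are just illustrative): compare flex_attention with the block mask against plain SDPA using the equivalent dense boolean mask.

import torch.nn.functional as F
from torch.nn.attention.flex_attention import flex_attention

B, H, S, D = 1, 1, 100, 64
q, k, v = (torch.randn(B, H, S, D, device="cuda") for _ in range(3))

# Dense boolean version of the same document-causal mask (True = attend).
idx = torch.arange(S, device="cuda")
dense_mask = (idx[:, None] >= idx[None, :]) & (
    document_id[:, None] == document_id[None, :]
)

out_flex = flex_attention(q, k, v, block_mask=mask)
out_ref = F.scaled_dot_product_attention(q, k, v, attn_mask=dense_mask)
print(torch.allclose(out_flex, out_ref, atol=1e-5, rtol=1e-5))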

I noticed this as well and know the root cause. I am working on the fix here:

pytorch/pytorch#143534

Cc @Chillee
