Doc mask returns negative sparsity #93

Open
staghado opened this issue Dec 21, 2024 · 1 comment

Comments

@staghado

staghado commented Dec 21, 2024

The displayed sparsity for the block mask is negative when the block size is bigger than the maximum size of the mask. Similar to issue #68.

Torch nightly: 2.6.0.dev20241221+cu124
Repro code:

import torch
from torch.nn.attention.flex_attention import create_block_mask

# 100 tokens split into documents: two of length 10, then four of length 20.
document_id = torch.zeros(100, dtype=torch.int, device="cuda")
document_id[:10] = 0
document_id[10:20] = 1
for i in range(20, 100, 20):
    document_id[i : i + 20] = i // 20 + 1

def document_causal_mask(b, h, q_idx, kv_idx):
    # Causal attention restricted to positions within the same document.
    causal_mask = q_idx >= kv_idx
    document_mask = document_id[q_idx] == document_id[kv_idx]
    return causal_mask & document_mask

# The 100x100 mask is smaller than the default 128x128 block size.
mask = create_block_mask(document_causal_mask, 1, 1, 100, 100, "cuda")
print(mask)

Output:

BlockMask(shape=(1, 1, 100, 100), sparsity=-63.84%, 
(0, 0)
██
)
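
For what it's worth, the -63.84% figure matches what you would get if the density were computed over the padded 128x128 block grid instead of the true 100x100 shape (just a guess at the cause, not confirmed):

# Back-of-the-envelope check (assumption: the reported sparsity is derived from
# the padded block grid rather than the actual mask shape).
padded_elems = 128 * 128      # one full block after padding to the block size
true_elems = 100 * 100        # actual mask shape
print(f"{1 - padded_elems / true_elems:.2%}")  # -63.84%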

Does this have an effect on the results obtained with such a mask?

@drisspg
Contributor

drisspg commented Dec 21, 2024

This should not have any effect on the result. The sparsity display is an independent code path from the one used to actually run flex attention.
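
If you want to double-check on your end, here is one sketch (it continues your repro above, so it reuses document_id and mask; the shapes and tolerances are just illustrative): compare flex_attention with the block mask against plain SDPA using the equivalent dense boolean mask.

import torch.nn.functional as F
from torch.nn.attention.flex_attention import flex_attention

B, H, S, D = 1, 1, 100, 64
q, k, v = (torch.randn(B, H, S, D, device="cuda") for _ in range(3))

# Dense boolean version of the same document-causal mask (True = attend).
idx = torch.arange(S, device="cuda")
dense_mask = (idx[:, None] >= idx[None, :]) & (
    document_id[:, None] == document_id[None, :]
)

out_flex = flex_attention(q, k, v, block_mask=mask)
out_ref = F.scaled_dot_product_attention(q, k, v, attn_mask=dense_mask)
print(torch.allclose(out_flex, out_ref, atol=1e-5, rtol=1e-5))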

I noticed this as well and know the root cause. I am working on the fix here:

pytorch/pytorch#143534

Cc @Chillee
