Commit

Fix typos: casual -> causal (#102)
awgu authored Jan 10, 2025
1 parent a710e18 commit 2e4d04a
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions examples/flex_attn.ipynb
@@ -325,7 +325,7 @@
 "The implementation using as a mask_mod:\n",
 "```Python\n",
-"def casual_mask(b,h,q_idx, kv_idx):\n",
+"def causal_mask(b, h, q_idx, kv_idx):\n",
 " return q_idx >= kv_idx\n",
 "```\n",
 "As you can see they look very similar, both return scalar tensors. The key differences\n",
@@ -449,7 +449,7 @@
 "### Sliding Window Attention\n",
 "The [Mistral paper](https://arxiv.org/abs/2310.06825) has a very nice visual of this bias and describes it. In essence you define a fixed size \"SLIDING_WINDOW\" and for autogressive decoding you only allow `torch.abs(q_tokens - kv_tokens) < SLIDING_WINDOW` to attend to each other. Typically this is also combined with causal attention. We are going to do this through a a nice pattern, mask composition. Typically masking can can conceptually be done in pieces and then composed together.\n",
 "\n",
-"We are going to write two mask_functions 1 for doing `casual-masking`, and one for doing `windowed-attention` and compose them together to produce the final mask_fn. As we know from earlier, mask_fns return boolean values where a value of `True` indicates that the element should take part in attention.\n"
+"We are going to write two mask_functions 1 for doing `causal-masking`, and one for doing `windowed-attention` and compose them together to produce the final mask_fn. As we know from earlier, mask_fns return boolean values where a value of `True` indicates that the element should take part in attention.\n"
 ]
 },
 {
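For reference, the mask composition described in the second hunk can be sketched as below. This is an illustrative example only, not part of the commit: the `SLIDING_WINDOW` value and helper names are made up, and it assumes the public `create_block_mask`/`flex_attention` entry points in `torch.nn.attention.flex_attention`.

```python
import torch
from torch.nn.attention.flex_attention import create_block_mask, flex_attention

SLIDING_WINDOW = 256  # illustrative window size, not taken from the notebook


def causal_mask(b, h, q_idx, kv_idx):
    # a query may only attend to itself and earlier positions
    return q_idx >= kv_idx


def sliding_window_mask(b, h, q_idx, kv_idx):
    # a query may only attend within a fixed-size local window
    return torch.abs(q_idx - kv_idx) < SLIDING_WINDOW


def sliding_window_causal(b, h, q_idx, kv_idx):
    # mask composition: both sub-masks must allow the pair for attention to happen
    return causal_mask(b, h, q_idx, kv_idx) & sliding_window_mask(b, h, q_idx, kv_idx)


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    B, H, S, D = 1, 4, 1024, 64
    q, k, v = (torch.randn(B, H, S, D, device=device) for _ in range(3))
    # build the sparse block mask once and reuse it across forward passes
    block_mask = create_block_mask(
        sliding_window_causal, B=None, H=None, Q_LEN=S, KV_LEN=S, device=device
    )
    out = flex_attention(q, k, v, block_mask=block_mask)
    print(out.shape)  # torch.Size([1, 4, 1024, 64])
```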
