Fix attention result projection (#666)
* Updated README to have Bryce as the maintainer

* Fix attention result projection

The current result projection for attention is incorrect: the type annotations suggest that `result` is not summed over `head_index`, but in fact it is. I've edited the function so that `result` is no longer summed over `head_index`.

Note: this bug caused the ARENA material for the first transformers chapter to fail; I've tested it and the chapter now works.
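
For illustration (not part of the commit), here is a minimal sketch of the shape bug with made-up dimensions. The old path flattens the heads and applies `F.linear`, which sums the head contributions into `d_model` before `hook_result` fires; the new `einops.einsum` keeps a separate `head_index` axis, so per-head outputs remain observable.

```python
import torch
import torch.nn.functional as F
import einops

# Hypothetical small dimensions, for demonstration only.
batch, pos, head_index, d_head, d_model = 2, 3, 4, 8, 16
z = torch.randn(batch, pos, head_index, d_head)
W_O = torch.randn(head_index, d_head, d_model)

# Old path: flatten heads, then one big matmul -- heads are summed away.
w_flat = einops.rearrange(W_O, "head_index d_head d_model -> d_model (head_index d_head)")
inp = einops.rearrange(z, "batch pos head_index d_head -> batch pos (head_index d_head)")
old_result = F.linear(inp, w_flat)  # [batch, pos, d_model] -- no head_index axis!

# New path: keep head_index so each head's contribution is visible.
w = einops.rearrange(W_O, "head_index d_head d_model -> d_model head_index d_head")
new_result = einops.einsum(
    z, w, "... head_index d_head, d_model head_index d_head -> ... head_index d_model"
)  # [batch, pos, head_index, d_model]

# Summing the new result over heads recovers the old (prematurely summed) result.
assert torch.allclose(new_result.sum(dim=-2), old_result, atol=1e-5)
```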

* fix formatting with black

---------

Co-authored-by: Bryce Meyer <[email protected]>
Co-authored-by: Neel Nanda <[email protected]>
3 people committed Jul 11, 2024
1 parent 9872334 commit 67ed0d6
Showing 1 changed file with 8 additions and 5 deletions.
13 changes: 8 additions & 5 deletions transformer_lens/components/abstract_attention.py
@@ -297,12 +297,15 @@ def forward(
         else:
             w = einops.rearrange(
                 self.W_O,
-                "head_index d_head d_model -> d_model (head_index d_head)",
+                "head_index d_head d_model -> d_model head_index d_head",
             )
-            input = einops.rearrange(
-                z, "batch pos head_index d_head -> batch pos (head_index d_head)"
-            )
-            result = self.hook_result(F.linear(input, w))  # [batch, pos, head_index, d_model]
+            result = self.hook_result(
+                einops.einsum(
+                    z,
+                    w,
+                    "... head_index d_head, d_model head_index d_head -> ... head_index d_model",
+                )
+            )  # [batch, pos, head_index, d_model]
         out = (
             einops.reduce(result, "batch position index model->batch position model", "sum")
             + self.b_O
