Replicating self-attention maps from Fig. 4a of publication #10

Open

Al-Murphy opened this issue Jul 8, 2024 · 0 comments
Hi,

I'm trying to replicate Chromoformer's self-attention map as analysed in Fig. 4a of your publication. The description given in the results is:

the attention weights produced by the Embedding transformer of Chromoformer-clf during the prediction were visualized to analyze the internal behavior of the model.

For this, two attention heads are used. However, only one head appears in the printed layers of a trained clf model from the GitHub demo for the embedding transformers (the 2000 bp resolution transformer is shown below):

# Hyperparameters and model configuration from the GitHub demo
# (ChromoformerClassifier is imported as in the demo notebook)
seed = 123
bsz = 32
i_max = 8
w_prom = 40000
w_max = 40000
n_feats = 7
d_emb = 128
embed_kws = {
    "n_layers": 1,
    "n_heads": 2,  # two heads specified for the Embedding transformer
    "d_model": 128,
    "d_ff": 128,
}
pairwise_interaction_kws = {
    "n_layers": 2,
    "n_heads": 2,
    "d_model": 128,
    "d_ff": 256,
}
regulation_kws = {
    "n_layers": 6,
    "n_heads": 8,
    "d_model": 256,
    "d_ff": 256,
}
d_head = 128
model_clf = ChromoformerClassifier(
    n_feats, d_emb, d_head, embed_kws, pairwise_interaction_kws, regulation_kws, seed=seed
)
model_clf

Output (partial, showing just the 2000 bp embedding transformer):

ChromoformerBase(
  (embed): ModuleDict(
    (2000): EmbeddingTransformer(
      (lin_proj): Linear(in_features=7, out_features=128, bias=False)
      (transformer): Transformer(
        (layers): ModuleList(
          (0): AttentionBlock(
            (self_att): MultiHeadAttention(
              (w_bias): Linear(in_features=2, out_features=2, bias=False)
              (att): Linear(in_features=128, out_features=384, bias=False)
              (ff): Linear(in_features=128, out_features=128, bias=True)
              (ln): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
            )
            (ff): FeedForward(
              (l1): Linear(in_features=128, out_features=128, bias=True)
              (l2): Linear(in_features=128, out_features=128, bias=True)
              (ln): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
            )
          )
        )
      )
    )

I've tried an approach using register_forward_hook() (a rough sketch of what I'm doing is below), but given that there appears to be only one attention head in the printed model layers, I can only capture the output of model.embed2000.transformer.layers[0].self_att or of model.embed2000.transformer.layers[0].self_att.att. How did you obtain the two attention matrices shown in the publication from this? Did you use self_att, the self_att.att matrix specifically, or something else?
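For reference, this is roughly the hook I'm registering. The attribute path just follows the printed module tree above (embed is a ModuleDict keyed by resolution), and the forward call is only a placeholder for one batch of the demo data:

import torch

captured = {}

def save_output(name):
    def hook(module, hook_inputs, output):
        # store whatever the self-attention module returns on this pass
        captured[name] = output
    return hook

# hook the 2000 bp embedding transformer's self-attention block
self_att = model_clf.embed["2000"].transformer.layers[0].self_att
handle = self_att.register_forward_hook(save_output("embed2000_self_att"))

with torch.no_grad():
    _ = model_clf(*inputs)  # placeholder: forward pass with one demo batch

handle.remove()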

Thanks!
