Replicating self-attention maps from Fig. 4a of publication #10

Open

Al-Murphy opened this issue Jul 8, 2024 · 0 comments
Hi,

I'm trying to replicate Chromoformer's self-attention map as analysed in Fig. 4a of your publication. The description given in the results is:

the attention weights produced by the Embedding transformer of Chromoformer-clf during the prediction were visualized to analyze the internal behavior of the model.

For this, two attention heads are used. However, only one head appears in the printed layers of a trained clf model from the GitHub demo for the embedding transformers (the 2000 bp resolution transformer is shown below):

# Hyperparameters and model configuration from the GitHub demo
# (ChromoformerClassifier is imported as in the demo notebook)
seed = 123
bsz = 32
i_max = 8
w_prom = 40000
w_max = 40000
n_feats = 7
d_emb = 128
embed_kws = {
    "n_layers": 1,
    "n_heads": 2,  # two heads specified for the Embedding transformer
    "d_model": 128,
    "d_ff": 128,
}
pairwise_interaction_kws = {
    "n_layers": 2,
    "n_heads": 2,
    "d_model": 128,
    "d_ff": 256,
}
regulation_kws = {
    "n_layers": 6,
    "n_heads": 8,
    "d_model": 256,
    "d_ff": 256,
}
d_head = 128
model_clf = ChromoformerClassifier(
    n_feats, d_emb, d_head, embed_kws, pairwise_interaction_kws, regulation_kws, seed=seed
)
model_clf

Output (partial, showing just the 2000 bp embedding transformer):

ChromoformerBase(
  (embed): ModuleDict(
    (2000): EmbeddingTransformer(
      (lin_proj): Linear(in_features=7, out_features=128, bias=False)
      (transformer): Transformer(
        (layers): ModuleList(
          (0): AttentionBlock(
            (self_att): MultiHeadAttention(
              (w_bias): Linear(in_features=2, out_features=2, bias=False)
              (att): Linear(in_features=128, out_features=384, bias=False)
              (ff): Linear(in_features=128, out_features=128, bias=True)
              (ln): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
            )
            (ff): FeedForward(
              (l1): Linear(in_features=128, out_features=128, bias=True)
              (l2): Linear(in_features=128, out_features=128, bias=True)
              (ln): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
            )
          )
        )
      )
    )

I've tried an approach using register_forward_hook() (a rough sketch of what I'm doing is below), but given that there appears to be only one attention head in the printed model layers, I can only capture the output of model.embed2000.transformer.layers[0].self_att or of model.embed2000.transformer.layers[0].self_att.att. How did you obtain the two attention matrices shown in the publication from this? Did you use self_att, the self_att.att matrix specifically, or something else?
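For reference, this is roughly the hook I'm registering. The attribute path just follows the printed module tree above (embed is a ModuleDict keyed by resolution), and the forward call is only a placeholder for one batch of the demo data:

import torch

captured = {}

def save_output(name):
    def hook(module, hook_inputs, output):
        # store whatever the self-attention module returns on this pass
        captured[name] = output
    return hook

# hook the 2000 bp embedding transformer's self-attention block
self_att = model_clf.embed["2000"].transformer.layers[0].self_att
handle = self_att.register_forward_hook(save_output("embed2000_self_att"))

with torch.no_grad():
    _ = model_clf(*inputs)  # placeholder: forward pass with one demo batch

handle.remove()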

Thanks!
