Is it possible to add cross attention aka encoder outputs to SRU++? #164
Hi @hadaev8, at the moment we haven't implemented an SRU++ "decoder" in which there are both self-attention and cross-attention. There are two options you could choose:
Note we are assuming all input & hidden dimensions are
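Purely for orientation (and not one of the specific options referred to above, which are not reproduced here), below is a toy sketch of the overall pattern being discussed: an encoder produces a memory, and each decoder-side layer consumes both its own input and that memory. All module names are hypothetical; this is plain PyTorch, not SRU++ code.

```python
import torch
import torch.nn as nn

# Hypothetical decoder-side layer: a recurrent block plus a cross-attention block
# that lets the layer look at the encoder outputs ("memory"). Not SRU++ code.
class ToyDecoderLayer(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.rnn = nn.GRU(dim, dim)                     # stand-in for the recurrence
        self.cross = nn.MultiheadAttention(dim, heads)  # stand-in for cross attention

    def forward(self, x: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        ctx, _ = self.cross(x, memory, memory)          # queries from x, keys/values from memory
        out, _ = self.rnn(x + ctx)
        return out

dim = 64
encoder = nn.GRU(dim, dim)
decoder = nn.ModuleList(ToyDecoderLayer(dim) for _ in range(3))

src = torch.randn(7, 2, dim)                            # (src_len, batch, dim)
tgt = torch.randn(10, 2, dim)                           # (tgt_len, batch, dim)

memory, _ = encoder(src)                                # encoder outputs act as the memory
h = tgt
for layer in decoder:                                   # every layer sees the same memory
    h = layer(h, memory)
print(h.shape)                                          # torch.Size([10, 2, 64])
```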
@taoleicn Where can I find the transform_module definition?
@hadaev8 Yes and no. Yes in the sense that within each SRU++ layer, the layer will attend to both the self outputs and the memory inputs. Re: transform_module, see how SRUpp sets transform_module as the attention sub-module in the forward method of SRUppCell:
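To make that description concrete, here is a small self-contained sketch in plain PyTorch. The class name and arguments below are made up for illustration; this is not the library's SRUppAttention/transform_module code. The idea is that queries come from the layer input while keys and values span the concatenation of the memory and the layer input, so each position attends to both the memory inputs and the self outputs.

```python
import torch
import torch.nn as nn

class MemoryAugmentedAttention(nn.Module):
    """Illustrative transform_module-style block: queries come from the layer
    input x, while keys/values come from [memory; x]."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads)

    def forward(self, x: torch.Tensor, memory: torch.Tensor = None) -> torch.Tensor:
        # With no memory this degenerates to plain self attention.
        kv = x if memory is None else torch.cat([memory, x], dim=0)
        out, _ = self.attn(x, kv, kv, need_weights=False)
        return out                                   # same shape as x

x = torch.randn(10, 2, 64)                           # (seq_len, batch, hidden)
mem = torch.randn(7, 2, 64)                          # (memory_seq_len, batch, hidden)
print(MemoryAugmentedAttention(64)(x, mem).shape)    # torch.Size([10, 2, 64])
```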
@taolei87
@hadaev8 I'm not sure I follow. Can you elaborate more on your question?
@taolei87
It is a 3-dimensional tensor of shape (memory_seq_len, batch_size, hidden_size). See an illustration below:
I updated the pseudo code in the previous reply with a correction.
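A quick shape check matching the description above. The numbers are arbitrary, and nn.MultiheadAttention merely stands in for the actual attention sub-module; the point is that the memory is (memory_seq_len, batch_size, hidden_size) and the keys/values cover memory_seq_len + seq_len positions.

```python
import torch
import torch.nn as nn

memory_seq_len, seq_len, batch_size, hidden_size = 7, 10, 2, 64

memory = torch.randn(memory_seq_len, batch_size, hidden_size)  # e.g. encoder outputs
x = torch.randn(seq_len, batch_size, hidden_size)              # layer input

attn = nn.MultiheadAttention(hidden_size, num_heads=4)
kv = torch.cat([memory, x], dim=0)            # keys/values over memory + input positions
out, weights = attn(x, kv, kv)

print(out.shape)      # torch.Size([10, 2, 64])
print(weights.shape)  # torch.Size([2, 10, 17]): each query attends over 7 + 10 positions
```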
@taolei87 Spotted this thing: