@nicholas-leonard I'm probably going to write a GRU with attention. I'm curious to get your input on the best way to do this. I'm also happy to contribute it here if you want.
The first option is to modify the current GRU implementation so that every time step takes 3 inputs rather than 2: {x_t, h_t-1, enc}, where enc is the set of encodings being attended over.
I'm not particularly satisfied with this, since it means a lot of nearly duplicated code for handling the forward / backward passes. That part isn't too bad on its own, since the same duplication already exists across GRU / LSTM. It would also be slower, though, because it wouldn't be possible to do the trick where the projection of the encoder states into the hidden space is computed only once rather than at every time step. I'm not sure how important that is, since anything genuinely speed-critical probably needs to be written at a lower level anyway.
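To make the first option concrete, here's a rough, untested sketch of what a single three-input step could look like with nngraph; the dot-product scoring and the way the context feeds every gate are placeholder choices, not settled decisions:

```lua
require 'nn'
require 'nngraph'

-- One {x_t, h_t-1, enc} step as an nn.gModule, with enc of size
-- batchSize x seqLen x encSize. Sketch only; no biases beyond nn.Linear's own.
local function buildAttentionGRUStep(inputSize, hiddenSize, encSize)
   local x, prevH, enc = nn.Identity()(), nn.Identity()(), nn.Identity()()

   -- soft attention over the encoder states, conditioned on h_t-1
   local query   = nn.View(-1, encSize, 1)(nn.Linear(hiddenSize, encSize)(prevH))
   local scores  = nn.Select(3, 1)(nn.MM()({enc, query}))   -- batch x seqLen
   local alpha   = nn.SoftMax()(scores)                     -- attention weights
   local context = nn.MixtureTable(2)({alpha, enc})         -- batch x encSize

   -- each gate sees x_t, h_t-1 and the attention context
   local function gatePreactivation()
      return nn.CAddTable()({
         nn.Linear(inputSize,  hiddenSize)(x),
         nn.Linear(hiddenSize, hiddenSize)(prevH),
         nn.Linear(encSize,    hiddenSize)(context)})
   end
   local r = nn.Sigmoid()(gatePreactivation())              -- reset gate
   local z = nn.Sigmoid()(gatePreactivation())              -- update gate
   local hCand = nn.Tanh()(nn.CAddTable()({
      nn.Linear(inputSize,  hiddenSize)(x),
      nn.Linear(hiddenSize, hiddenSize)(nn.CMulTable()({r, prevH})),
      nn.Linear(encSize,    hiddenSize)(context)}))

   -- h_t = (1 - z) . h_t-1 + z . hCand, written as h_t-1 + z . (hCand - h_t-1)
   local nextH = nn.CAddTable()({prevH,
      nn.CMulTable()({z, nn.CSubTable()({hCand, prevH})})})

   return nn.gModule({x, prevH, enc}, {nextH})
end
```

Nearly everything in that graph mirrors the existing GRU step, which is where the near-duplication comes from; only the attention lines at the top are new.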
The other option would be to write it as a single module, something like SeqGRUAttention. That seems to involve a lot of code redundancy for similar reasons, but this way it wouldn't have to worry about playing nicely with Sequencer or about repeating the boilerplate in GRU.lua.
I think the major disadvantage of this approach is that it's less transparent what's going on, since the gradients are computed by hand.
I'm slightly leaning towards the second.
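For comparison, here's an equally rough, forward-only skeleton of the second option. The parameter layout, the dot-product attention and the missing biases are all placeholder choices, the parameters aren't registered for getParameters, and the hand-written updateGradInput / accGradParameters are exactly the part omitted:

```lua
require 'nn'

-- Forward-only skeleton of a fused SeqGRUAttention module (sketch, untested).
local SeqGRUAttention, parent = torch.class('nn.SeqGRUAttention', 'nn.Module')

function SeqGRUAttention:__init(inputSize, hiddenSize, encSize)
   parent.__init(self)
   local H = hiddenSize
   self.hiddenSize, self.encSize = hiddenSize, encSize
   self.Wx  = torch.randn(inputSize, 3 * H) * 0.1  -- x_t       -> r, z, candidate
   self.Wh  = torch.randn(H, 2 * H) * 0.1          -- h_t-1     -> r, z
   self.Whc = torch.randn(H, H) * 0.1              -- r . h_t-1 -> candidate
   self.Wc  = torch.randn(encSize, 3 * H) * 0.1    -- context   -> r, z, candidate
   self.Wa  = torch.randn(H, encSize) * 0.1        -- h_t-1     -> attention query
   self.softmax = nn.SoftMax()
end

-- input = {x, enc}: x is seqLen x batchSize x inputSize,
--                   enc is batchSize x encLen x encSize
function SeqGRUAttention:updateOutput(input)
   local x, enc = input[1], input[2]
   local T, B, H, L = x:size(1), x:size(2), self.hiddenSize, enc:size(2)
   self.output = x.new(T, B, H)
   local h = x.new(B, H):zero()

   -- the "only once" trick: project every encoder state before the time loop
   local encProj = (enc:contiguous():view(B * L, self.encSize) * self.Wc):view(B, L, 3 * H)

   for t = 1, T do
      -- attention weights over enc, conditioned on the previous hidden state
      local scores = torch.bmm(enc, (h * self.Wa):view(B, self.encSize, 1)):view(B, L)
      local alpha  = self.softmax:forward(scores)
      local ctx    = torch.bmm(alpha:view(B, 1, L), encProj):view(B, 3 * H)

      -- GRU update with the attention context added to every preactivation
      local xw = x[t] * self.Wx
      local rz = torch.sigmoid(xw:narrow(2, 1, 2 * H) + h * self.Wh
                               + ctx:narrow(2, 1, 2 * H))
      local r, z = rz:narrow(2, 1, H), rz:narrow(2, H + 1, H)
      local hCand = torch.tanh(xw:narrow(2, 2 * H + 1, H)
                               + torch.cmul(r, h) * self.Whc
                               + ctx:narrow(2, 2 * H + 1, H))
      h = h + torch.cmul(z, hCand - h)   -- (1 - z) . h_t-1 + z . hCand
      self.output[t]:copy(h)
   end
   return self.output
end
```

Everything after the attention lines is plain tensor code, which is the trade-off I mean: the encoder projection really is computed only once, but the backward pass would have to mirror each of these lines by hand.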
I implemented the temporal attention model with LSTM from "Describing Videos by Exploiting Temporal Structure". I made it like SeqLSTMAttention; you can also take a look at this post.
This doesn't seem to be the same thing, unless I'm missing something. I want attention integrated into the internal dynamics of the GRU, whereas this module takes an encoding and a hidden state and gives you attention weights; it would still need to be integrated with a GRU. If there's a clean way to do that, I'd be interested.
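For what it's worth, the obvious way to bolt a weights-only module like that onto an unmodified GRU would be to form the context outside the recurrence and concatenate it with x_t, roughly as below (untested; attnModule and gruStep are stand-ins for whatever produces the weights and for an ordinary {input, h_t-1} -> h_t step). But that keeps attention out of the gates, which is the distinction I'm drawing:

```lua
require 'nn'
require 'nngraph'

-- Sketch: reuse an external {h_t-1, enc} -> weights module by forming the
-- context outside the GRU and feeding [x_t, context] to an unmodified step.
-- gruStep must accept {input, h_t-1}, with input size inputSize + encSize;
-- attnModule is assumed to return batchSize x seqLen weights.
local function wrapWithExternalAttention(gruStep, attnModule)
   local x, prevH, enc = nn.Identity()(), nn.Identity()(), nn.Identity()()
   local alpha   = attnModule({prevH, enc})
   local context = nn.MixtureTable(2)({alpha, enc})     -- weighted sum of enc
   local xAug    = nn.JoinTable(1, 1)({x, context})     -- concat along features
   return nn.gModule({x, prevH, enc}, {gruStep({xAug, prevH})})
end
```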