PyTorch Implementation of Monotonic Chunkwise Attention
- PyTorch 0.4
- Soft MoChA
- Hard MoChA
- Linear Time Decoding
- Experiments on a Real-World Dataset
It's not clear whether the authors' TensorFlow implementation supports linear-time decoding. It computes energies over all encoder outputs at every output step, instead of scanning forward from the previously attended encoder output.
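
To make the distinction concrete, here is a minimal, framework-agnostic sketch of one test-time (hard) MoChA decoding step that scans forward from the previously attended index instead of scoring every encoder output. It is not this repository's code or the authors' implementation. The function name `hard_mocha_step`, the callables `monotonic_energy` and `chunk_energy`, and the fallback of returning a zero context while leaving the index unchanged when nothing is selected are all assumptions made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hard_mocha_step(prev_t, memory, monotonic_energy, chunk_energy, chunk_size):
    """One hard-attention decoding step (illustrative sketch, hypothetical API).

    prev_t           -- encoder index attended at the previous output step
    memory           -- list of encoder output vectors (lists of floats)
    monotonic_energy -- callable j -> selection energy for encoder index j
    chunk_energy     -- callable k -> chunk (softmax) energy for encoder index k
    chunk_size       -- width w of the chunk ending at the selected index

    Returns (context, t). Energies are computed lazily, only for the
    indices actually visited, which is what keeps decoding linear overall.
    """
    dim = len(memory[0])
    # Scan forward from the previously attended position, never backward.
    for j in range(prev_t, len(memory)):
        if sigmoid(monotonic_energy(j)) >= 0.5:
            # Stop here and attend softly over the chunk ending at j.
            start = max(0, j - chunk_size + 1)
            indices = list(range(start, j + 1))
            scores = [chunk_energy(k) for k in indices]
            peak = max(scores)  # subtract the max for numerical stability
            weights = [math.exp(s - peak) for s in scores]
            total = sum(weights)
            context = [
                sum(w * memory[k][d] for w, k in zip(weights, indices)) / total
                for d in range(dim)
            ]
            return context, j
    # Assumption: if no index is selected, use a zero context and keep prev_t.
    return [0.0] * dim, prev_t
```

Because each step resumes from the index returned by the previous one, the monotonic energy is only evaluated at positions at or after the last attended index, rather than over the whole encoder sequence at every output step.
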
- Colin Raffel, Minh-Thang Luong, Peter J. Liu, Ron J. Weiss and Douglas Eck. Online and Linear-Time Attention by Enforcing Monotonic Alignments (ICML 2017)
- Chung-Cheng Chiu and Colin Raffel. Monotonic Chunkwise Attention (ICLR 2018)