Welcome to the hierarchical-attention-model wiki!

This repository implements the hierarchical attention network proposed by Zichao Yang et al. (paper: Hierarchical Attention Networks for Document Classification).

The following figure, taken from their paper, shows the hierarchical attention network architecture:

It consists of four main parts (from bottom to top): a word sequence layer, a word-level attention layer, a sentence encoder, and a sentence-level attention layer.
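
As a rough, self-contained sketch of that flow (plain tanh projections stand in for the GRU encoders; every name and dimension below is made up for illustration and is not taken from this repository's code):

```python
import numpy as np

# Toy dimensions -- illustrative only
n_sents, n_words, emb_dim, hid_dim = 3, 5, 8, 6
rng = np.random.default_rng(0)

def attention_pool(H, u):
    # Score each row of H against the context vector u, softmax the scores,
    # and return the weighted sum of the rows (for brevity this skips the
    # paper's extra one-layer projection before scoring).
    scores = H @ u
    alphas = np.exp(scores - scores.max())
    alphas /= alphas.sum()
    return alphas @ H

W_word = rng.standard_normal((emb_dim, hid_dim))   # stand-in for the word sequence layer
W_sent = rng.standard_normal((hid_dim, hid_dim))   # stand-in for the sentence encoder
u_w = rng.standard_normal(hid_dim)                 # word-level context vector
u_s = rng.standard_normal(hid_dim)                 # sentence-level context vector

doc = rng.standard_normal((n_sents, n_words, emb_dim))  # word embeddings for one toy document

# word sequence layer + word-level attention -> one vector per sentence
sent_vecs = np.stack([attention_pool(np.tanh(sent @ W_word), u_w) for sent in doc])

# sentence encoder + sentence-level attention -> one vector per document
doc_vec = attention_pool(np.tanh(sent_vecs @ W_sent), u_s)

print(doc_vec.shape)  # (6,) -- the document vector fed to the final classifier
```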

Since the task (document classification) involves no paired sequences the way machine translation does, there is no natural way to form inter-attention between two sequences. Instead, the paper proposes an intra-attention mechanism. As the figure shows, two learned context vectors, $u_w$ and $u_s$, serve this purpose: $u_w$ attends over the hidden outputs of the word sequence layer, and $u_s$ attends over the hidden outputs of the sentence encoder.
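
Concretely, the word-level attention in the paper first passes each hidden output $h_{it}$ of the word sequence layer through a one-layer MLP, scores the result against $u_w$, and forms the sentence vector $s_i$ as the weighted sum:

$$u_{it} = \tanh(W_w h_{it} + b_w), \qquad \alpha_{it} = \frac{\exp(u_{it}^\top u_w)}{\sum_t \exp(u_{it}^\top u_w)}, \qquad s_i = \sum_t \alpha_{it} h_{it}$$

The sentence-level attention has the same form, with $u_s$ scored against the hidden outputs of the sentence encoder to produce the document vector that feeds the final classifier.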
