Paper, Tags: #sarcasm-detection, #architectures
In this paper we revisit the notion of modelling contrast in order to reason with sarcasm. We propose an attention-based neural model that looks in-between instead of across, enabling it to explicitly model contrast and incongruity. We conduct extensive experiments on 6 benchmark datasets from Twitter, Reddit and the Internet Argument Corpus.
State-of-the-art sarcasm detection systems mainly rely on deep sequential NNs (GRUs or LSTMs) that parse the input document one word at a time. Shortcoming: there is no explicit interaction between word pairs, so such models cannot explicitly capture contrast, incongruity or juxtaposition of situations, and they also struggle to capture long-range dependencies.
We have the intuition that modeling intra-sentence relationships can not only improve classification performance, but also pave the way for more explainable neural sarcasm detection methods.
We want to keep the effectiveness of state-of-the-art recurrent models while harnessing the intuition of looking in-between. We propose a multi-dimensional intra-attention recurrent network (MIARN) that models intricate similarities between each word pair in the sentence.
Zhang et al. (2016) propose a recurrent model with a gated pooling mechanism. Ghosh and Veale (2016) propose a CNN-LSTM-DNN.
Our work focuses on document-only sarcasm detection, without personality information or user context. To the best of our knowledge, our model is the first attention-based model for sarcasm detection.
Input: one-hot encoded vectors, each vector representing a word from the vocabulary. These pass through a word embedding layer, which outputs a sequence of word embeddings.
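A minimal sketch of this input layer in PyTorch; the vocabulary size, embedding dimension and token ids below are illustrative assumptions, not values from the paper:

```python
import torch
import torch.nn as nn

vocab_size, embed_dim = 10_000, 100           # assumed sizes, not the paper's settings
embedding = nn.Embedding(vocab_size, embed_dim)

token_ids = torch.tensor([[4, 87, 9, 302]])   # (batch=1, seq_len=4) word indices
word_vectors = embedding(token_ids)           # (1, 4, 100): sequence of word embeddings
```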
Then we model the relationship of each word pair in the input sequence. This attention layer learns to pay attention to a word based on its largest contribution to any other word in the sequence.
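A rough sketch of such an intra-attention layer in PyTorch, assuming the pair score comes from a single learned linear layer over concatenated word-pair embeddings and that a word's attention score is its maximum pair score; the exact parametrisation in the paper may differ:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IntraAttention(nn.Module):
    def __init__(self, embed_dim):
        super().__init__()
        # Assumed scoring function: one linear layer over a concatenated word pair.
        self.pair_score = nn.Linear(2 * embed_dim, 1)

    def forward(self, x):                        # x: (batch, seq_len, embed_dim)
        b, n, d = x.shape
        # All word pairs (i, j): (batch, n, n, 2 * embed_dim)
        pairs = torch.cat([x.unsqueeze(2).expand(b, n, n, d),
                           x.unsqueeze(1).expand(b, n, n, d)], dim=-1)
        scores = self.pair_score(pairs).squeeze(-1)           # (batch, n, n)
        # Do not compare a word with itself.
        scores = scores.masked_fill(torch.eye(n, dtype=torch.bool), float('-inf'))
        # A word's attention score is its largest contribution to any other word.
        word_scores, _ = scores.max(dim=-1)                   # (batch, n)
        attn = F.softmax(word_scores, dim=-1)                 # attention weights over words
        return (attn.unsqueeze(-1) * x).sum(dim=1)            # (batch, embed_dim)
```

For example, `IntraAttention(100)(word_vectors)` would return a single intra-attentive sentence vector.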
This is the single-dimensional adaptation; since words can have many meanings, we can instead represent each word pair with a vector, giving a multi-dimensional adaptation of the intra-attention mechanism.
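A hedged sketch of the multi-dimensional variant: each word pair is first mapped to a small feature vector (so different dimensions can pick up different word senses) before being reduced to a scalar score. The layer sizes are assumptions; this module could stand in for `pair_score` in the sketch above:

```python
import torch
import torch.nn as nn

class MultiDimPairScore(nn.Module):
    def __init__(self, embed_dim, pair_dim=10):               # pair_dim is an assumed size
        super().__init__()
        self.to_vector = nn.Linear(2 * embed_dim, pair_dim)   # word pair -> feature vector
        self.to_scalar = nn.Linear(pair_dim, 1)               # feature vector -> scalar score

    def forward(self, pairs):                  # pairs: (batch, n, n, 2 * embed_dim)
        return self.to_scalar(torch.relu(self.to_vector(pairs))).squeeze(-1)  # (batch, n, n)
```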
To account for compositional information (e.g. "not happy"), we also run a separate LSTM over the sequence. Both representations are concatenated and fed to a softmax layer.
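Putting the pieces together, a rough sketch of the overall architecture: the intra-attentive sentence vector is concatenated with the LSTM's final hidden state and classified with a softmax. It reuses the `IntraAttention` module sketched above; the hidden sizes and the choice of the final hidden state are assumptions:

```python
import torch
import torch.nn as nn

class MIARNSketch(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=100, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.intra_attn = IntraAttention(embed_dim)                    # word-pair view
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)  # compositional view
        self.classifier = nn.Linear(embed_dim + hidden_dim, num_classes)

    def forward(self, token_ids):                # token_ids: (batch, seq_len)
        x = self.embedding(token_ids)
        attn_repr = self.intra_attn(x)           # (batch, embed_dim)
        _, (h_n, _) = self.lstm(x)
        lstm_repr = h_n[-1]                      # final hidden state: (batch, hidden_dim)
        combined = torch.cat([attn_repr, lstm_repr], dim=-1)
        return torch.log_softmax(self.classifier(combined), dim=-1)   # log class probabilities
```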
Datasets:
- Twitter: Ptacek (hashtag-labelled) and Riloff (hand-annotated)
- Reddit: 2 datasets
- Debates: 2 datasets from the Internet Argument Corpus
Baselines:
- NBOW, neural bag-of-words
- CNN
- LSTM
- Attention LSTM
- GRNN
- CNN-LSTM-DNN
On Twitter, MIARN works best on Ptacek and SIARN on Riloff. On Reddit the models perform quite similarly; on Debates MIARN mostly wins. The improvement is much larger on Debates (long text) than on short text, which can be attributed to the intra-attentive representations explicitly modeling long-range dependencies.
LSTM models focus heavily on punctuation, while our model attends to polarized words.
Based on the intuition of looking in-between, we proposed a new neural network architecture for sarcasm detection. Our model learns an intra-attentive representation of the sentence, enabling it to detect contrastive sentiment, situations and incongruity.