A Study on Detecting Hate Speech in Social Media
Malmasi, Shervin, and Marcos Zampieri. “Detecting Hate Speech in Social Media.” (2017): n. pag. Print.
Detecting Hate Speech in Social Media:
Aim of the study:
They aim to detect hate speech in social media by establishing lexical baselines and distinguishing hate speech from profanity.
In order to do that they apply supervised classification methods using a recently released dataset annotated for detecting hate speech.
Their system uses character n-grams, word n-grams and word skip-grams.
They use a dataset of English tweets annotated with three labels:
- Hate speech
- Offensive language but no hate speech
- No offensive content
A LEXICAL BASELINE is constructed to discriminate between hate speech and profanity, because hate speech is often confused with generally offensive content and is therefore misclassified as non-offensive.
Classification:
The task measures how well a system can distinguish hate speech from other content that is generally profane.
They applied a linear Support Vector Machine (SVM) classifier and used three groups of features extracted for these experiments: surface n-grams, word skip-grams, and Brown clusters.
Data:
14,509 English tweets annotated by a minimum of three annotators. Each instance in the dataset contains the text of a tweet along with one of the three aforementioned labels.
Classifier: They used the LIBLINEAR package; its SVM implementation has proven to be a very effective classifier for Native Language Identification, temporal text classification, and language variety identification.
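As an illustration, here is a minimal sketch of this setup using scikit-learn's LinearSVC, which is backed by the same LIBLINEAR library; the example tweets, labels, and the reduced n-gram range are invented for illustration and are not the paper's actual data or code.

```python
# Minimal sketch of the paper's setup, assuming scikit-learn's LinearSVC
# as a stand-in for the LIBLINEAR SVM. All texts/labels below are invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# 0 = hate speech, 1 = offensive but not hate speech, 2 = not offensive
texts = [
    "you are awful hateful people",
    "what a damn mess this is",
    "have a nice day everyone",
    "those hateful people again",
    "damn this traffic jam",
    "lovely weather today friends",
]
labels = [0, 1, 2, 0, 1, 2]

# Character n-grams, lowercased, extracted across word boundaries
# (order 2-4 here to keep the toy model small; the paper uses 2-8).
model = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(2, 4), lowercase=True),
    LinearSVC(),
)
model.fit(texts, labels)
print(model.predict(["have a lovely day"]))
```

On real data the pipeline would be fit on the 14,509 annotated tweets rather than this toy list.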
Features: Two groups of surface features:
- Surface n-grams: These are our most basic features, consisting of character n-grams (of order 2–8) and word n-grams (of order 1– 3). All tokens are lowercased before extraction of n-grams; character n-grams are extracted across word boundaries.
- Word skip-grams: Similar to the above features, they also extract 1-, 2- and 3-skip word bigrams. These features were chosen to approximate longer-distance dependencies between words, which would be hard to capture using bigrams alone.
Evaluation: They report their result in terms of accuracy. The results obtained are compared against a majority class baseline and an oracle classifier.
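The two reference points can be illustrated with a toy sketch: the majority class baseline predicts the most frequent label for everything, and the oracle counts a sample as correct if any of the individual classifiers labels it correctly. The gold labels and predictions below are invented, not the paper's results.

```python
# Toy sketch of the evaluation reference points; all values are invented.
from collections import Counter

gold    = [1, 1, 1, 2, 0, 1, 2, 0]  # hypothetical gold labels
preds_a = [1, 1, 0, 2, 0, 1, 1, 1]  # hypothetical classifier A
preds_b = [1, 0, 1, 2, 1, 1, 2, 2]  # hypothetical classifier B

# Majority class baseline: always predict the most frequent gold label.
majority = Counter(gold).most_common(1)[0][0]
majority_acc = sum(g == majority for g in gold) / len(gold)

# Oracle: a sample counts as correct if ANY classifier got it right,
# i.e. an upper bound on what combining these features could achieve.
oracle_acc = sum(g == a or g == b
                 for g, a, b in zip(gold, preds_a, preds_b)) / len(gold)

print(majority_acc, oracle_acc)
```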
Results: They began by investigating the efficacy of their features for the task. They first trained a separate classifier for each feature type, then trained a single model combining all features into one space. These were compared against the majority class baseline and the oracle. (see table 2)
The majority class baseline is quite high due to the class imbalance in the data. The oracle achieves an accuracy of 91.6%, showing that a portion of the samples cannot be correctly classified by any of their features. The character n-grams perform well here, with 4-grams achieving the best performance of all features. Word unigrams also perform well, while performance degrades with bigrams, trigrams and skip-grams. However, the skip-grams may be capturing longer-distance dependencies that provide complementary information to the other feature types. In tasks relying on stylistic information, skip-grams have been shown to capture information very similar to syntactic dependencies.
The combination of all features does not achieve the performance of the character 4-gram model and causes a large dimensionality increase, with a total of 5.5 million features. It is not clear whether this model correctly captures the diverse information provided by the three feature types, since it includes more character n-gram models than word-based ones.
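Combining the feature types into a single space can be sketched with scikit-learn's FeatureUnion; the two example texts are invented. On the full dataset, this kind of vocabulary concatenation is what drives the feature count into the millions.

```python
# Sketch of combining feature types into one space via FeatureUnion.
# The example texts are invented, not the paper's tweets.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import FeatureUnion

texts = ["a toy tweet", "another toy tweet here"]

union = FeatureUnion([
    ("char_ngrams", CountVectorizer(analyzer="char", ngram_range=(2, 8))),
    ("word_ngrams", CountVectorizer(analyzer="word", ngram_range=(1, 3))),
])
X = union.fit_transform(texts)

# The column count is simply the sum of the two vocabularies' sizes.
print(X.shape)
```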
Secondly, they analyzed the rate of learning for these features, via a learning curve for the classifier that yielded the best performance overall, character 4-grams. (see figure 1)
It is observed that accuracy increases as the number of training instances grows, while the standard deviation of the results across the cross-validation folds decreases. This suggests that more training data is likely to yield higher accuracy. However, accuracy increases at a much slower rate after 15,000 training instances.
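A learning curve of this kind can be produced with scikit-learn's learning_curve helper. Since the tweet data and the character 4-gram model are not available here, the digits dataset and a plain LinearSVC stand in as assumed placeholders.

```python
# Sketch of a cross-validated learning curve; the digits dataset and
# LinearSVC are stand-ins for the paper's tweet data and 4-gram model.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import learning_curve
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)
sizes, train_scores, val_scores = learning_curve(
    LinearSVC(max_iter=20000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5, scoring="accuracy",
)

# Mean accuracy and the fold-to-fold spread at each training-set size,
# the two quantities the paper reads off its learning curve.
mean = val_scores.mean(axis=1)
std = val_scores.std(axis=1)
print(list(zip(sizes.tolist(), mean.round(3), std.round(3))))
```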
Finally, they examined a confusion matrix for the character 4-gram model. It demonstrates that the greatest degree of confusion lies between hate speech and generally offensive material. A substantial amount of offensive content is also misclassified as being non-offensive. The non-offensive class achieves the best result, with the vast majority of samples being correctly classified. (see figure 2)
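Reading off such a confusion matrix can be sketched with scikit-learn; the gold labels and predictions below are invented to mimic the reported pattern, not the paper's actual figures.

```python
# Minimal sketch of a confusion matrix; all values are invented.
# 0 = hate speech, 1 = offensive but not hate speech, 2 = not offensive
from sklearn.metrics import confusion_matrix

gold = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
pred = [0, 1, 1, 1, 1, 1, 2, 2, 2, 2]

cm = confusion_matrix(gold, pred, labels=[0, 1, 2])
# Row i = true class i, column j = predicted class j; cell (0, 1) holds
# hate-speech samples misclassified as merely offensive.
print(cm)
```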