This project builds a model that predicts, for each comment, the probability of each type of toxicity.
Dataset - https://www.kaggle.com/competitions/jigsaw-toxic-comment-classification-challenge/data
Toxic comment classification - The dataset has 312,735 comments. Of these, the training set has 159,571 comments and the test set has 153,164 comments. The comments are labelled with 6 types of toxic behaviour. The classes are “Toxic, Severe toxic, Obscene, Threat, Insult, Identity hate”.
Results: I used 2 classification algorithms: a Multinomial Naive Bayes model, with 54.003% accuracy, and Logistic Regression, with 96% accuracy. The logistic regression model is the one used, because of its better accuracy.
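
The per-class probability output described above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes scikit-learn, TF-IDF features, and one logistic regression per class (one-vs-rest), none of which is stated in this README. The eight toy comments and their labels are made up so that the snippet runs without the Kaggle files; the column names in `LABELS` follow the six classes listed above.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

# The six classes listed above, written as column-style names (assumed spelling).
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Made-up toy data standing in for the real training set.
comments = [
    "thank you for the helpful edit",
    "you are a stupid idiot",
    "i will hurt you",
    "great article, well sourced",
    "this is obscene filthy garbage",
    "stupid filthy idiot i will hurt you",
    "people of that group are all idiots",
    "please see the talk page for details",
]

# One row per comment, one 0/1 column per label; a comment may carry several labels.
y = np.array([
    [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 0],
    [1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
])

# TF-IDF features feeding one independent logistic regression per label.
model = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(comments, y)

# predict_proba returns one probability per label for each comment.
new_comments = ["you stupid idiot", "thank you for the details"]
probs = model.predict_proba(new_comments)

for text, row in zip(new_comments, probs):
    scores = ", ".join(f"{name}={p:.2f}" for name, p in zip(LABELS, row))
    print(f"{text!r}: {scores}")
```

Because each label gets its own classifier, the six probabilities for a comment are independent and need not sum to 1, which matches the goal of scoring every type of toxicity separately.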