dhercher edited this page Mar 13, 2014 · 4 revisions

####Homework

  • The challenge is to detect when a comment from a conversation would be considered insulting to another participant in the conversation. Samples could be drawn from conversation streams like news commenting sites, magazine comments, message boards, blogs, text messages, etc.

  • Train a Naive Bayes spam classifier model using "../input/train-utf8.csv". Try to minimize the error rate, defined as (False Positives + False Negatives) / All Samples.

  • Use your model to predict whether each text in the new data in "../input/test-utf8.csv" is spam. Export your model results to a new file called "output/hw7.csv". Each line of the file should contain a 1 if the text is spam, or a 0 otherwise. The output file should look like this:

 0
 1
 1
  • Finish this assignment before the beginning of class on Monday 3/17. We will compile the results, and the winner will get a special prize.
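
The steps above (train, measure the error rate, write one prediction per line) can be sketched roughly as follows. This is only a minimal illustration using the Python standard library, not a reference solution: the names `tokenize`, `NaiveBayes`, `error_rate`, and `write_predictions` are hypothetical, the tokenizer and add-one smoothing are assumptions, and reading the CSV files is left out because the column layout is not described here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and keep runs of letters/apostrophes as words.
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # documents per class (for priors)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {
            c: sum(self.word_counts[c].values()) for c in self.classes
        }
        return self

    def predict_one(self, text):
        n_docs = sum(self.doc_counts.values())
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # Work in log space to avoid underflow on long texts.
            score = math.log(self.doc_counts[c] / n_docs)
            denom = self.total_words[c] + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[c][word] + 1) / denom)
            if score > best_score:
                best_class, best_score = c, score
        return best_class

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


def error_rate(y_true, y_pred):
    # (False Positives + False Negatives) / All Samples
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return (fp + fn) / len(y_true)


def write_predictions(path, predictions):
    # One label (0 or 1) per line, matching the expected output format.
    with open(path, "w") as f:
        for p in predictions:
            f.write(f"{p}\n")
```

A typical use would be to hold out part of the training data, call `fit` on the rest, compare `predict` against the held-out labels with `error_rate`, and only then call `write_predictions("output/hw7.csv", ...)` on the test predictions.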

####Links