This project reproduces the results of the paper "Detecting Symptoms of Depression on Social Media." Reddit posts are used to train a random forest classifier that predicts symptoms of depression. Features are extracted from the posts with two approaches: topic modeling with LDA and contextual embeddings with DistilRoBERTa. The classifier trained on each feature set is evaluated with 5-fold cross-validation, and AUC scores are reported.
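The LDA variant of the evaluation described above can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the notebook's actual code: the toy vocabularies, the `make_post` helper, the binary labels, the number of topics, and the number of trees are all assumptions made for the example.

```python
import random

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for the Reddit posts:
# label 1 = post showing a symptom, label 0 = control post.
rng = random.Random(0)
symptom_vocab = ["tired", "sleepless", "hopeless", "sad", "numb", "lonely"]
control_vocab = ["game", "recipe", "movie", "travel", "music", "coffee"]


def make_post(vocab, length=12):
    """Build a fake post by sampling words from a small vocabulary."""
    return " ".join(rng.choices(vocab, k=length))


posts = [make_post(symptom_vocab) for _ in range(20)]
posts += [make_post(control_vocab) for _ in range(20)]
labels = [1] * 20 + [0] * 20

# Word counts -> LDA topic proportions -> random forest.
# n_components and n_estimators are illustrative values only.
pipeline = make_pipeline(
    CountVectorizer(),
    LatentDirichletAllocation(n_components=5, random_state=0),
    RandomForestClassifier(n_estimators=100, random_state=0),
)

# 5-fold cross-validation, scored by area under the ROC curve.
# The whole pipeline is refit inside each fold, so LDA never sees test posts.
scores = cross_val_score(pipeline, posts, labels, cv=5, scoring="roc_auc")
print("AUC per fold:", scores)
print("Mean AUC:", scores.mean())
```

For the DistilRoBERTa variant, the vectorizer and LDA steps would be replaced by precomputed embedding vectors passed directly to the random forest, while the cross-validation call stays the same.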
Run all the cells in the notebook sequentially.
https://drive.google.com/file/d/1zLAhWo9zmdJt-VzH7k3u5jO3XqUr1JlE/view?usp=sharing