Grantland.com was a long-form sports and pop culture journalism site that Disney/ESPN closed in 2015. This project uses unsupervised learning and natural language processing to cluster a selection of Grantland articles, and supervised learning to predict the contributor of each article.
Start with the PDF here, which gives an easy visual overview of the project and its outcomes.
The GLscraper notebook shows how the data was collected. The article texts are stored in data.zip.
If you'd like to dig deeper, I recommend viewing the main notebook on NBViewer here, which enables some fun extras like interactive 3D graphs of the clustering solutions.
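For a quick feel of the two-stage approach described at the top, here is a minimal sketch. It is not the code from the main notebook: it assumes scikit-learn, and the toy texts, the `writer_a`/`writer_b` labels, and the choice of TF-IDF features, k-means, and logistic regression are all illustrative assumptions. The real project works on the article texts in data.zip and may use different models.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy corpus standing in for the article texts in data.zip.
texts = [
    "The point guard ran the pick and roll all night in the playoffs.",
    "The quarterback threw three touchdowns in the fourth quarter.",
    "The finale of the drama series left viewers stunned.",
    "The director's new film is a slow and strange thriller.",
    "The center grabbed rebounds and blocked shots in the playoffs.",
    "The album mixes pop hooks with a strange film-score mood.",
]
# Hypothetical contributor labels, one per text.
authors = ["writer_a", "writer_a", "writer_b", "writer_b", "writer_a", "writer_b"]

# NLP step: turn raw text into TF-IDF feature vectors.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(texts)

# Unsupervised step: group the articles into clusters without using the labels.
# n_clusters=2 is an assumption chosen for this toy example.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)

# Supervised step: learn to predict the contributor from the same features.
clf = LogisticRegression(max_iter=1000)
clf.fit(X, authors)

# Predict the contributor of an unseen (made-up) article.
new_X = vectorizer.transform(["The guard scored forty points in the playoffs."])
predicted_author = clf.predict(new_X)[0]

print("cluster labels:", list(cluster_labels))
print("predicted contributor:", predicted_author)
```

With a real corpus you would hold out a test set before fitting the classifier, so that the contributor predictions are evaluated on articles the model has not seen.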