Discovering Higher-level Patterns - Challenge #51
Comments
Intuitions: |
People speaking about Latin American politicians that ran for president (2005-2015):
Corpus del Español: This corpus contains about two billion words of Spanish, taken from about two million web pages from 21 different Spanish-speaking countries. It was web-scraped in 2015. Class dataset: Corpus del Español ("SPAN"). |
Comparison of Airbnb reviews from different places
Dataset/Download: Inside Airbnb (http://insideairbnb.com/get-the-data.html) |
Dataset: Twitter 'likes' of persons with differing self-identified personalities.
Data |
Subject: Topics and rhetoric change in the Marx-Engels Collected Works (MECW), 1835-1895
Data: |
*The understanding of immigrants' benefit differs between academia and the general public. Data: |
Data: Music lyrics dataset. Intuition: |
Intuitions: Dataset: http://www.crazy-internet-people.com/site/gilmoregirls/scripts.html |
Intuitions: Data: Davies TV Corpus and fanfiction scraped using AO3 scraper script (https://github.com/radiolarian/AO3Scraper) |
Intuitions:
I didn't collect this data because it is unrelated to my project, but it can be scraped from the Munk Debates and Intelligence Squared websites. |
First, write down three intuitions you have about broad content patterns you will discover in your data. Place an asterisk next to the one you expect most firmly, and a plus next to the one that, if true, would be the biggest or most important surprise to others (especially the research community to whom you might communicate it, if robustly supported). Second, describe the dataset(s) on which you will build an unsupervised model to explore these intuitions. Then provide (a) a link to the data, (b) a script to download and clean it, (c) a reference to a class dataset, (d) an invitation for a TA to contact you about it, or (e) a brief explanation of why the data cannot be made available. Please do NOT spend time/space explaining the precise unsupervised strategy you will use to explore your intuitions. (Then upvote the 5 most interesting, relevant and challenging challenge responses from others.)