Find better data examples #46
Need to replace
Questionable datasets
Draw examples from Kieran Healy's
Deadest names - see #115
Still working on this, specifically with the Movies and Snapchat data. (The repo is very minimal right now.) I will also look into two other options and would love to hear what you think: Palmer penguins instead of
EDIT: Also flagging Damon Jones' scrape of UCPD stops as a potential alternative to the WaPo police shootings dataset.
Palmer penguins is supposed to be a good drop-in replacement for Chicago flights data would be nice to replace
I have looked at the penguin dataset and the lecture notes. I think the penguin data is viable for most of the operations we need; for instance, it could be used to practice the pipe and writing functions. However, one potentially significant problem is that it contains only 344 observations, while diamonds has more than 20k. In the exercises we use characteristics like color and cut, each of which has at least five levels (cut has five, color has seven). By contrast, the qualitative variables in penguins, species and island, each have only three possible values. This suggests limited variability in the penguin data, which might cause some problems in modeling and make the visualizations less diverse than the figures produced with diamonds.
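To make the point about categorical variability concrete, here is a minimal sketch in Python that compares the number of levels per grouping variable and the maximum number of groups you get when crossing them (e.g. mapping one variable to color and faceting by the other). The level lists are typed in by hand from the datasets' documented factor levels rather than loaded from the data, and the helper names `level_counts` and `max_groups` are hypothetical.

```python
import math

# Factor levels typed in by hand (assumption: not loaded from the actual datasets).
DIAMONDS_LEVELS = {
    "cut": ["Fair", "Good", "Very Good", "Premium", "Ideal"],
    "color": ["D", "E", "F", "G", "H", "I", "J"],
}
PENGUINS_LEVELS = {
    "species": ["Adelie", "Chinstrap", "Gentoo"],
    "island": ["Biscoe", "Dream", "Torgersen"],
}


def level_counts(levels):
    """Return the number of distinct levels for each categorical variable."""
    return {name: len(values) for name, values in levels.items()}


def max_groups(levels):
    """Upper bound on the number of groups when all variables are crossed."""
    return math.prod(len(values) for values in levels.values())


if __name__ == "__main__":
    for name, levels in [("diamonds", DIAMONDS_LEVELS), ("penguins", PENGUINS_LEVELS)]:
        print(name, level_counts(levels), "max groups:", max_groups(levels))
```

This gives 5 and 7 levels (up to 35 crossed groups) for diamonds versus 3 and 3 (up to 9) for penguins. For penguins the crossed count is only an upper bound, since not every species was sampled on every island.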
I don't think we use
This is the website I looked at, which has a few uses of scatterplots. I wasn't sure whether my concern was significant, but I decided to bring it up anyway :)
@YinsuH I think we'd be okay with the number of observations. But I agree with your concerns about having an appropriate number of categorical variables for some of the examples. I am thinking especially of computer programming as problem solving. Could you take a stab at rewriting the examples in the I think the easiest workflow will be to fork the
@bensoltoff I have created a pull request for the course site. However, this is my first attempt at updating the website, so some of the work may still have problems. I will keep checking it over the next few days. I have also listed a few questions in the pull request description. Please have a look.
Household Pulse Survey: assess the impact of COVID-19 on households
Needs more social scientific data examples to practice skills, not just datasets of convenience.