If 80% of data science work is data wrangling, 80% of your impact is through visualization.
Hans Rosling is one of the most popular data scientists on the web. His original TED talk went viral among my friends when it came out. We are going to create some graphics using his formatted data as our weekly case study. Note that we need to remove Kuwait from the data (discussion on this).
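As a minimal sketch of the Kuwait removal mentioned above, assuming the data comes from the `gapminder` package and that dplyr is installed, the rows can be dropped with `filter()`. The object name `gap_no_kuwait` is only illustrative.

```r
library(dplyr)
library(gapminder)

# Keep every row except those for Kuwait.
# `gap_no_kuwait` is a hypothetical name chosen for this example.
gap_no_kuwait <- gapminder %>%
  filter(country != "Kuwait")

# Quick check: TRUE means no Kuwait rows remain.
!("Kuwait" %in% gap_no_kuwait$country)
```

Filtering once at the top and reusing the filtered data keeps every later plot and summary consistent.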
- Complete a review of 2-3 different data visualizations used to answer specific questions. Some fun websites are pudding.cool, Wonkblog, FiveThirtyEight, and Priceonomics (but you can use any website, blog, or article with a good visualization).
• Does living in a black neighborhood mean paying more for insurance than living in a predominantly white neighborhood? The scatterplot shows a weak positive correlation between renters insurance prices and the share of black residents, suggesting that policies in black neighborhoods cost somewhat more than average. A second visualization, which plots average insurance price against the percentage of a city that is black for the top ten insurers in California, shows a much weaker correlation. A bar chart shows that the insurer Goodcover charges about the same in both kinds of neighborhoods. Extraneous factors such as the risk of home invasion, fire, or liability could also influence the correlation. Together, the visualizations help explain the study and show whether a correlation exists, giving a fuller picture of housing insurance pricing. https://priceonomics.com/is-insurance-more-expensive-in-black-neighborhoods/
• Shopping for women's clothes can be frustrating. Women's pockets are often too small or sewn shut, so finding a dress or pants with real pockets is a pleasure, while men's pockets are deep and spacious. The study compares men's and women's pockets through illustrations that change as you scroll, supported by facts, color, and size comparisons. Pocket size differs significantly between genders and across brands. https://pudding.cool/2018/08/pockets/
- Make sure you are in our Slack workspace.
- Finish setting up VS Code for programming in R and Python.
- Finish setting up RStudio.
- Finish installing Git.
- Finish creating your GitHub account and connecting to our organization.
- Recreate the two graphics in this repo using the `gapminder` dataset from `library(gapminder)` (get them to match as closely as you can).
  - Use `library(tidyverse)` to load ggplot2 and dplyr, and use `theme_bw()` to duplicate the first plot.
  - Use `scale_y_continuous(trans = "sqrt")` to get the correct scale on the y-axis.
  - Build a weighted average data set using `weighted.mean()` and GDP with `summarise()` and `group_by()`. This will be the black continent average line on the second plot.
  - Use `theme_bw()` to duplicate the second plot. You will need to use the new data to make the black lines and dots showing the continent average.
  - Use `ggsave()` and save each plot as a .png with a width of 15 inches.
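The steps above can be sketched roughly as follows. This is a starting point under stated assumptions, not a reproduction of the target graphics: the axis choices, facets, point sizing, the use of population as the weight in `weighted.mean()`, and the file names `plot1.png` and `plot2.png` are all guesses that should be adjusted until the output matches the plots in the repo.

```r
library(tidyverse)
library(gapminder)

# Drop Kuwait first, as the case-study note asks.
gap <- gapminder %>%
  filter(country != "Kuwait")

# Plot 1 (assumed layout): GDP per capita against life expectancy,
# one panel per year, with a square-root y-axis and theme_bw().
plot1 <- ggplot(gap, aes(x = lifeExp, y = gdpPercap,
                         color = continent, size = pop / 100000)) +
  geom_point() +
  facet_wrap(~year, nrow = 1) +
  scale_y_continuous(trans = "sqrt") +
  theme_bw()

# Continent averages per year. Weighting GDP per capita by population
# is an assumption; as.numeric() avoids integer overflow when summing pop.
continent_avg <- gap %>%
  group_by(continent, year) %>%
  summarise(
    gdpPercap = weighted.mean(gdpPercap, pop),
    pop = sum(as.numeric(pop)),
    .groups = "drop"
  )

# Plot 2 (assumed layout): country lines and points over time, with the
# continent averages drawn on top in black from the summarised data.
plot2 <- ggplot(gap, aes(x = year, y = gdpPercap, color = continent)) +
  geom_line(aes(group = country)) +
  geom_point(aes(size = pop / 100000)) +
  geom_line(data = continent_avg, color = "black") +
  geom_point(data = continent_avg, aes(size = pop / 100000), color = "black") +
  facet_wrap(~continent, nrow = 1) +
  theme_bw()

# Save each plot as a .png that is 15 inches wide (file names are hypothetical).
ggsave("plot1.png", plot1, width = 15, units = "in")
ggsave("plot2.png", plot2, width = 15, units = "in")
```

Passing `data = continent_avg` to the extra layers is what lets the black lines and dots come from the summarised data while the rest of the plot still uses the country-level rows.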
- Slack quick start guide
- VS Code and interactive Python in VS Code
- Git, Python, and R installation
- RStudio and using Git within RStudio
- GitHub (Please think carefully about your GitHub username. It is for business use.)