Move all example headings back a level

andrewheiss · May 30, 2023 · 69e00fa
1 parent fbf752d

Showing 10 changed files with 68 additions and 88 deletions.
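Every file in this commit gets the same two changes. First, the `## Complete code` heading and the blank line after it are deleted. Second, each top-level `###` section heading becomes `##`. Any `###` heading inside a `:::` callout block, such as "Slight differences from the video" or "2023 update!", is left unchanged. The commit doesn't say how the edit was made, so the following is only a rough sketch under stated assumptions. `promote_headings()` is a hypothetical helper written for illustration. It treats a bare `:::` line as closing a fenced div and any other line starting with `:::` as opening one. It also skips fenced code chunks so that `###` comments in R code are not touched.

```r
# Hypothetical helper (not from the commit): promote top-level "###"
# headings to "##" in a Quarto document and drop a "## Complete code"
# heading, leaving callout blocks and code chunks untouched.
promote_headings <- function(lines) {
  keep <- rep(TRUE, length(lines))  # which lines survive
  in_code <- FALSE                  # inside a fenced code chunk?
  div_depth <- 0L                   # nesting depth of ::: fenced divs

  for (i in seq_along(lines)) {
    line <- lines[i]

    # Code chunk fences toggle code mode; "###" inside R code is a comment
    if (grepl("^```", line)) {
      in_code <- !in_code
      next
    }
    if (in_code) next

    # A bare ":::" closes a fenced div; any other ":::" line opens one
    if (grepl("^:::+\\s*$", line)) {
      if (div_depth > 0L) div_depth <- div_depth - 1L
      next
    }
    if (grepl("^:::", line)) {
      div_depth <- div_depth + 1L
      next
    }
    # Headings inside callouts stay at their current level
    if (div_depth > 0L) next

    if (line == "## Complete code") {
      keep[i] <- FALSE
      # Also drop the blank line that followed the removed heading
      if (i < length(lines) && lines[i + 1] == "") keep[i + 1] <- FALSE
      next
    }

    # Promote level-3 headings to level 2
    if (grepl("^### ", line)) lines[i] <- sub("^###", "##", line)
  }
  lines[keep]
}

# Small demo mirroring the structure of the files in this commit
doc <- c(
  "## Complete code",
  "",
  "::: {.callout-important}",
  "### Slight differences from the video",
  ":::",
  "",
  "```{r}",
  "### not a heading",
  "```",
  "",
  "### Load and clean data"
)
writeLines(promote_headings(doc))
```

To apply it to a real file, something like `writeLines(promote_headings(readLines(path)), path)` would overwrite `path` in place. Run it on a version-controlled copy and review the resulting diff, because the simple `:::` and code-fence checks are assumptions that won't cover every Quarto construct.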
diff --git a/example/03-example.qmd b/example/03-example.qmd
@@ -28,8 +28,6 @@ And I *promise* future examples will not be this long!
 </div>
 
 
-## Complete code
-
 ::: {.callout-important}
 ### Slight differences from the video
 
@@ -44,7 +42,7 @@ set.seed(1234)
 options(dplyr.summarise.inform = FALSE)
 ```
 
-### Load and clean data
+## Load and clean data
 
 First, we need to load a few libraries: {tidyverse} (as always) and {readxl} for reading Excel files:
 
@@ -95,7 +93,7 @@ bbc <- bbc_raw %>%
   mutate(grant_year_category = factor(grant_year))
 ```
 
-### Histograms
+## Histograms
 
 First let's look at the distribution of grant amounts with a histogram. Map `grant_amount` to the x-axis and don't map anything to the y-axis, since `geom_histogram()` will calculate the y-axis values for us:
 
@@ -142,7 +140,7 @@ ggplot(bbc, aes(x = grant_amount, fill = grant_year_category)) +
 
 Neat!
 
-### Points
+## Points
 
 Next let's look at the data using points, mapping year to the x-axis and grant amount to the y-axis:
 
@@ -181,7 +179,7 @@ ggplot(bbc, aes(x = grant_year_category, y = grant_amount, color = grant_program
 
 It does! We appear to have two different distributions of grants: small grants have a limit of £30,000, while regular grants have a much higher average amount.
 
-### Boxplots
+## Boxplots
 
 We can add summary information to the plot by only changing the `geom` we're using. Switch from `geom_point()` to `geom_boxplot()`:
 
@@ -190,7 +188,7 @@ ggplot(bbc, aes(x = grant_year_category, y = grant_amount, color = grant_program
   geom_boxplot()
 ```
 
-### Summaries
+## Summaries
 
 We can also make smaller summarized datasets with {dplyr} functions like `group_by()` and `summarize()` and plot those. First let's look at grant totals, averages, and counts over time:
 

diff --git a/example/04-example.qmd b/example/04-example.qmd
@@ -19,8 +19,6 @@ If you want to follow along with this example, you can download the data directl
 </div>
 
 
-## Complete code
-
 ::: {.callout-important}
 ### Slight differences from the video
 
@@ -33,7 +31,7 @@ set.seed(1234)
 options(dplyr.summarise.inform = FALSE)
 ```
 
-### Load data
+## Load data
 
 There are two CSV files:
 
@@ -76,7 +74,7 @@ births_2000_2014 <- read_csv(here::here(
 births_combined <- bind_rows(births_1994_1999, births_2000_2014)
 ```
 
-### Wrangle data
+## Wrangle data
 
 Let's look at the first few rows of the data to see what we're working with:
 
@@ -111,7 +109,7 @@ If you look at the data now, you can see the columns are changed and have differ
 
 Our `births` data is now clean and ready to go!
 
-### Bar plot
+## Bar plot
 
 First we can look at a bar chart showing the total number of births each day. We need to make a smaller summarized dataset and then we'll plot it:
 
@@ -156,7 +154,7 @@ ggplot(data = total_births_weekday,
        x = NULL, y = "Total births")
 ```
 
-### Lollipop chart
+## Lollipop chart
 
 Since the ends of the bars are often the most important part of the graph, we can use a lollipop chart to emphasize them. We'll keep all the same code from our bar chart and make a few changes:
 
@@ -181,7 +179,7 @@ ggplot(data = total_births_weekday,
 ```
 
 
-### Strip plot
+## Strip plot
 
 However, we want to \#barbarplots! (Though they're arguably okay here, since they show totals and not averages). Let's show all the data with points. We'll use the full dataset now, map x to weekday, y to births, and change `geom_col()` to `geom_point()`. We'll tell `geom_point()` to jitter the points randomly.
 
@@ -195,7 +193,7 @@ ggplot(data = births,
 
 There are some interesting points in the low ends, likely because of holidays like Labor Day and Memorial Day (for the Mondays) and Thanksgiving (for the Thursday). If we had a column that indicated whether a day was a holiday, we could color by that and it would probably explain most of those low numbers. Unfortunately we don't have that column, and it'd be hard to make. Some holidays are constant (Halloween is always October 31), but some aren't (Thanksgiving is the fourth Thursday in November, so we'd need to find out which November 20-somethingth each year is the fourth Thursday, and good luck doing that at scale).
 
-### Beeswarm plot
+## Beeswarm plot
 
 We can add some structure to these points if we use the [{ggbeeswarm} package](https://github.com/eclarke/ggbeeswarm), with either `geom_beeswarm()` or `geom_quasirandom()`. `geom_quasirandom()` actually works better here since there are so many points—`geom_beeswarm()` makes the clusters of points way too wide.
 
@@ -210,7 +208,7 @@ ggplot(data = births,
   guides(color = "none")
 ```
 
-### Heatmap
+## Heatmap
 
 Finally, let's use something non-traditional to show the average births by day in a somewhat proportional way. We can calculate the average number of births every day and then make a heatmap that fills each square by that average, thus showing the relative differences in births per day.
 

diff --git a/example/06-example.qmd b/example/06-example.qmd
@@ -18,8 +18,6 @@ If you want to follow along with this example, you can download the data below (
 </div>
 
 
-## Complete code
-
 ::: {.callout-important}
 ### Slight differences from the video
 
@@ -31,7 +29,7 @@ knitr::opts_chunk$set(fig.width = 6, fig.height = 3.6, fig.align = "center", col
 set.seed(1234)
 ```
 
-### Load and clean data
+## Load and clean data
 
 First, we load the libraries we'll be using:
 
@@ -61,7 +59,7 @@ weather_atl <- weather_atl_raw %>%
 
 Now we're ready to go!
 
-### Histograms
+## Histograms
 
 We can first make a histogram of wind speed. We'll use a bin width of 1 and color the edges of the bars white:
 
@@ -98,7 +96,7 @@ ggplot(weather_atl, aes(x = windSpeed, fill = Month)) +
 
 Neat! January, March, and April appear to have the most variation in windy days, with a few wind-less days and a few very-windy days, while August was very wind-less.
 
-### Density plots
+## Density plots
 
 The code to create a density plot is nearly identical to what we used for the histogram—the only thing we change is the `geom` layer:
 
@@ -195,7 +193,7 @@ ggplot(weather_atl_long, aes(x = temp, y = fct_rev(Month),
 Super neat! We can see much wider temperature disparities during the summer, with large gaps between high and low, and relatively equal high/low temperatures during the winter.
 
 
-### Box, violin, and rain cloud plots
+## Box, violin, and rain cloud plots
 
 Finally, we can look at the distribution of variables with box plots, violin plots, and other similar graphs. First, we'll make a box plot of windspeed, filled by the `Day` variable we made indicating weekday:
 

diff --git a/example/07-example.qmd b/example/07-example.qmd
@@ -17,8 +17,6 @@ If you want to follow along with this example, you can download the data below (
 </div>
 
 
-## Complete code
-
 ::: {.callout-important}
 ### Slight differences from the video
 
@@ -31,7 +29,7 @@ set.seed(1234)
 options("digits" = 2, "width" = 150)
 ```
 
-### Load and clean data
+## Load and clean data
 
 First, we load the libraries we'll be using:
 
@@ -52,7 +50,7 @@ weather_atl <- read_csv("data/atl-weather-2019.csv")
 weather_atl <- read_csv(here::here("files", "data", "external_data", "atl-weather-2019.csv"))
 ```
 
-### Legal dual y-axes
+## Legal dual y-axes
 
 It is fine (and often helpful!) to use two y-axes if the two different scales measure the same thing, like counts and percentages, Fahrenheit and Celsius, pounds and kilograms, inches and centimeters, etc.
 
@@ -96,7 +94,7 @@ ggplot(weather_atl, aes(x = time, y = temperatureHigh)) +
   theme_minimal()
 ```
 
-### Combining plots
+## Combining plots
 
 A good alternative to using two y-axes is to use two plots instead. The [{patchwork} package](https://github.com/thomasp85/patchwork) makes this *really* easy to do with R. There are other similar packages that do this, like {cowplot} and {gridExtra}, but I've found that {patchwork} is the easiest to use *and* it actually aligns the different plot elements like axis lines and legends (yay alignment in CRAP!). The [documentation for {patchwork}](https://patchwork.data-imaginist.com/articles/guides/assembly.html) is really great and full of examples—you should check it out to see all the things you can do with it!
 
@@ -146,7 +144,7 @@ temp_plot + humidity_plot +
   plot_layout(ncol = 1, heights = c(0.7, 0.3))
 ```
 
-### Scatterplot matrices
+## Scatterplot matrices
 
 We can visualize the correlations between pairs of variables with the `ggpairs()` function in the {GGally} package. For instance, how correlated are high and low temperatures, humidity, wind speed, and the chance of precipitation? We first make a smaller dataset with just those columns, and then we feed that dataset into `ggpairs()` to see all the correlation information:
 
@@ -170,7 +168,7 @@ ggpairs(weather_correlations) +
 ```
 
 
-### Correlograms
+## Correlograms
 
 Scatterplot matrices typically include way too much information to be used in actual publications. I use them when doing my own analysis just to see how different variables are related, but I rarely polish them up for public consumption. In the readings for today, Claus Wilke showed a type of plot called a [*correlogram*](https://clauswilke.com/dataviz/visualizing-associations.html#associations-correlograms) which *is* more appropriate for publication. 
 
@@ -255,7 +253,7 @@ ggplot(things_to_correlate_long,
 ```
 
 
-### Simple regression
+## Simple regression
 
 We can also visualize the relationships between variables using regression. Simple regression is easy to visualize, since you're only working with an X and a Y. For instance, what's the relationship between humidity and high temperatures during the summer?
 
@@ -295,7 +293,7 @@ ggplot(weather_atl_summer,
 
 And indeed, as humidity increases, temperatures decrease.
 
-### Coefficient plots
+## Coefficient plots
 
 But if we use multiple variables in the model, it gets really hard to visualize the results since we're working with multiple dimensions. Instead, we can use coefficient plots to see the individual coefficients in the model.
 
@@ -334,7 +332,7 @@ ggplot(model_tidied,
 
 Neat! Now we can see how big these different coefficients are and how close they are to zero. Wind speed has a big significant effect on temperature. The others are all very close to zero.
 
-### Marginal effects plots
+## Marginal effects plots
 
 ::: {.callout-tip}
 ### 2023 update!
@@ -455,7 +453,7 @@ ggplot(predicted_values_fancy, aes(x = windSpeed, y = .fitted)) +
 That's so neat! Temperatures go down slightly as cloud cover increases. If we wanted to improve the model, we'd add an interaction term between cloud cover and windspeed so that each line would have a different slope in addition to a different intercept, but that's beyond the scope of this class.
 
 
-### Predicted values and marginal effects in 2023
+## Predicted values and marginal effects in 2023
 
 Instead of using `expand_grid()` and `augment()` to create and plug in a mini dataset of variables to move up and down, we can use [the {marginaleffects} package](https://vincentarelbundock.github.io/marginaleffects/) to simplify life!
 

diff --git a/example/08-example.qmd b/example/08-example.qmd
@@ -24,8 +24,6 @@ If you want to skip the data downloading, you can download the data below (you'l
 </div>
 
 
-## Complete code
-
 ::: {.callout-important}
 ### Slight differences from the video
 
@@ -38,7 +36,7 @@ set.seed(1234)
 options("digits" = 2, "width" = 150)
 ```
 
-### Load and clean data
+## Load and clean data
 
 First, we load the libraries we'll be using:
 
@@ -106,7 +104,7 @@ wdi_clean <- wdi_raw %>%
 head(wdi_clean)
 ```
 
-### Small multiples
+## Small multiples
 
 First we can make some small multiples plots and show life expectancy over time for a handful of countries. We'll make a list of some countries chosen at random while I scrolled through the data, and then filter our data to include only those rows. We then plot life expectancy, faceting by country.
 
@@ -167,7 +165,7 @@ ggplot(life_expectancy_eu, aes(x = year, y = life_expectancy)) +
 
 Neat!
 
-### Sparklines
+## Sparklines
 
 Sparklines are just line charts (or bar charts) that are really really small.
 
@@ -206,7 +204,7 @@ You can then use those saved tiny plots in your text.
 > Both India <img class="img-inline" src="/example/08-example_files/figure-html/india-spark-1.png" width = "100"/> and China <img class="img-inline" src="/example/08-example_files/figure-html/china-spark-1.png" width = "100"/> have seen increased CO<sub>2</sub> emissions over the past 20 years.
 
 
-### Slopegraphs
+## Slopegraphs
 
 We can make a slopegraph to show changes in GDP per capita between two time periods. We need to first filter our WDI to include only the start and end years (here 1995 and 2015). Then, to make sure that we're using complete data, we'll get rid of any country that has missing data for either 1995 or 2015. The `group_by(...) %>% filter(...) %>% ungroup()` pipeline does this, with the `!any(is.na(gdp_per_cap))` test keeping any rows where any of the `gdp_per_cap` values are not missing for the whole country.
 
@@ -289,7 +287,7 @@ ggplot(gdp_south_asia, aes(x = year, y = gdp_per_cap, group = country, color = c
 ```
 
 
-### Bump charts
+## Bump charts
 
 Finally, we can make a bump chart that shows changes in rankings over time. We'll look at CO<sub>2</sub> emissions in South Asia. First we need to calculate a new variable that shows the rank of each country within each year. We can do this if we group by year and then use the `rank()` function to rank countries by the `co2_emissions` column.
 

diff --git a/example/09-example.qmd b/example/09-example.qmd
@@ -24,8 +24,6 @@ If you want to skip the data downloading, you can download the data below (you'l
 </div>
 
 
-## Complete code
-
 ::: {.callout-important}
 ### Slight differences from the video
 
@@ -39,7 +37,7 @@ options("digits" = 2, "width" = 150)
 ```
 
 
-### Load data
+## Load data
 
 First, we load the libraries we'll be using:
 
@@ -70,7 +68,7 @@ wdi_clean <- wdi_co2_raw %>%
   filter(region != "Aggregates")
 ```
 
-### Clean and reshape data
+## Clean and reshape data
 
 Next we'll do some substantial filtering and reshaping so that we can end up with the rankings of CO~2~ emissions in 1995 and 2014. I annotate as much as possible below so you can see what's happening in each step.
 
@@ -133,7 +131,7 @@ And here's what it looks like now:
 head(co2_rankings)
 ```
 
-### Plot the data and annotate
+## Plot the data and annotate
 
 I use IBM Plex Sans in this plot. You can [download it from Google Fonts](https://fonts.google.com/specimen/IBM+Plex+Sans).