For these exercises we will use the `us_states` and `us_states_df` datasets from the **spData** package.
You must have attached the package, along with the other packages used in the attribute operations chapter (**sf**, **dplyr**, **terra**), with commands such as `library(spData)` before attempting these exercises:
```{r 03-ex-e0, include=TRUE, message=FALSE}
library(sf)
library(dplyr)
library(terra)
library(spData)
data(us_states)
data(us_states_df)
```
`us_states` is a spatial object (of class `sf`), containing geometry and a few attributes (including name, region, area, and population) of states within the contiguous United States.
`us_states_df` is a data frame (of class `data.frame`) containing the name and additional variables (including median income and poverty level, for the years 2010 and 2015) of US states, including Alaska, Hawaii and Puerto Rico.
The data comes from the United States Census Bureau, and is documented in `?us_states` and `?us_states_df`.
E1. Create a new object called `us_states_name` that contains only the `NAME` column from the `us_states` object using either base R (`[`) or tidyverse (`select()`) syntax.
What is the class of the new object and what makes it geographic?
```{r 03-ex-e1}
us_states_name = us_states["NAME"]
class(us_states_name)
attributes(us_states_name)
attributes(us_states_name$geometry)
```
```{asis 03-ex-e1-asis}
- It is of class `sf` and `data.frame`: it has 2 classes.
- It is the `sf` class that makes it geographic.
- More specifically, it is the attributes of the object (`sf_column`) and of the geometry column (such as `bbox` and `crs`) that make it geographic; note that the geometry column is kept even though only `NAME` was selected.
```
E2. Select columns from the `us_states` object which contain population data.
Obtain the same result using a different command (bonus: try to find three ways of obtaining the same result).
Hint: try to use helper functions, such as `contains` or `matches` from **dplyr** (see `?contains`).
```{r 03-ex-e2}
us_states |> select(total_pop_10, total_pop_15)
# or
us_states |> select(starts_with("total_pop"))
# or
us_states |> select(contains("total_pop"))
# or
us_states |> select(matches("tal_p"))
```
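These helpers select columns by matching their *names*. As a rough base R analogue (not how **tidyselect** implements them), here is a sketch on a small hypothetical data frame whose column names mimic those of `us_states`; the values are made up. One difference to keep in mind: the **tidyselect** helpers ignore case by default, while these base R calls do not.

```r
# Hypothetical toy data frame with column names mimicking us_states
states = data.frame(
  NAME = c("A", "B"),
  REGION = c("West", "South"),
  total_pop_10 = c(100, 200),
  total_pop_15 = c(110, 190)
)
# like starts_with("total_pop"): names beginning with a fixed prefix
by_prefix = states[, startsWith(names(states), "total_pop")]
# like contains("total_pop"): names containing a fixed string anywhere
by_substring = states[, grepl("total_pop", names(states), fixed = TRUE)]
# like matches("tal_p"): names matching a regular expression
by_regex = states[, grepl("tal_p", names(states))]
names(by_regex)
```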
E3. Find all states with the following characteristics (bonus: find *and* plot them):

- Belong to the Midwest region.
- Belong to the West region, have an area below 250,000 km^2^ *and* had a population greater than 5,000,000 residents in 2015 (hint: you may need to use the function `units::set_units()` or `as.numeric()`).
- Belong to the South region, have an area larger than 150,000 km^2^ and had a total population larger than 7,000,000 residents in 2015.
```{r 03-ex-e3}
us_states |>
  filter(REGION == "Midwest")
us_states |>
  filter(REGION == "West", AREA < units::set_units(250000, km^2), total_pop_15 > 5000000)
# or
us_states |>
  filter(REGION == "West", as.numeric(AREA) < 250000, total_pop_15 > 5000000)
us_states |>
  filter(REGION == "South", AREA > units::set_units(150000, km^2), total_pop_15 > 7000000)
# or
us_states |>
  filter(REGION == "South", as.numeric(AREA) > 150000, total_pop_15 > 7000000)
```
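`filter()` combines its comma-separated conditions with a logical AND. A base R sketch of the same idea, on hypothetical toy data where `AREA` is a plain number assumed to be in km^2^ (in `us_states` it carries units, which is why the hint mentions `units::set_units()` and `as.numeric()`):

```r
# Hypothetical toy data; AREA is a plain number assumed to be in km^2
states = data.frame(
  NAME = c("A", "B", "C"),
  REGION = c("West", "West", "South"),
  AREA = c(200000, 300000, 160000),
  total_pop_15 = c(6000000, 9000000, 8000000)
)
# Conditions joined with &, as filter() does with its comma-separated arguments
west_sel = states[states$REGION == "West" &
                    states$AREA < 250000 &
                    states$total_pop_15 > 5000000, ]
# subset() drops rows where a condition is NA, like filter();
# plain `[` indexing would instead return all-NA rows for them
south_sel = subset(states, REGION == "South" & AREA > 150000 & total_pop_15 > 7000000)
west_sel$NAME
south_sel$NAME
```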
E4. What was the total population in 2015 in the `us_states` dataset?
What were the minimum and maximum total populations in 2015?
```{r 03-ex-e4}
us_states |>
  summarize(total_pop = sum(total_pop_15),
            min_pop = min(total_pop_15),
            max_pop = max(total_pop_15))
```
E5. How many states are there in each region?
```{r 03-ex-e5}
us_states |>
  group_by(REGION) |>
  summarize(nr_of_states = n())
```
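Counting rows per group amounts to tabulating the grouping variable (on an `sf` object, `summarize()` additionally merges the geometries of each group). A base R sketch with `table()`, on a hypothetical vector of region labels:

```r
# Hypothetical region labels, one per state
region = c("West", "South", "West", "Midwest", "South", "West")
# table() counts each distinct value, like group_by() + summarize(n())
nr_of_states = table(region)
nr_of_states
```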
E6. What were the minimum and maximum total populations in 2015 in each region?
What was the total population in 2015 in each region?
```{r 03-ex-e6}
us_states |>
  group_by(REGION) |>
  summarize(min_pop = min(total_pop_15),
            max_pop = max(total_pop_15),
            tot_pop = sum(total_pop_15))
```
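`group_by()` followed by `summarize()` is an instance of the split-apply-combine pattern. A base R sketch of that pattern, on hypothetical toy data: split the rows by region, summarize each piece, then bind the summary rows together.

```r
# Hypothetical toy data: one row per state
states = data.frame(
  REGION = c("West", "West", "South", "South", "South"),
  total_pop_15 = c(5, 20, 7, 3, 10)
)
# split: one data frame per region (regions end up in sorted order)
pieces = split(states, states$REGION)
# apply: one summary row per piece
summaries = lapply(pieces, function(d) {
  data.frame(REGION = d$REGION[1],
             min_pop = min(d$total_pop_15),
             max_pop = max(d$total_pop_15),
             tot_pop = sum(d$total_pop_15))
})
# combine: stack the summary rows into one data frame
pop_by_region = do.call(rbind, summaries)
pop_by_region
```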
E7. Add variables from `us_states_df` to `us_states`, and create a new object called `us_states_stats`.
What function did you use and why?
Which variable is the key in both datasets?
What is the class of the new object?
```{r 03-ex-e7}
us_states_stats = us_states |>
  left_join(us_states_df, by = c("NAME" = "state"))
class(us_states_stats)
```
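`left_join()` keeps every row of the left-hand table and adds the matching columns of the right-hand one, using the key given in `by`. A base R sketch with `merge()` on hypothetical toy tables (the names and values are made up); note that, unlike `left_join()`, `merge()` sorts the result by the key by default.

```r
# Hypothetical toy tables sharing a key stored under different column names
states = data.frame(NAME = c("Beta", "Alpha"), total_pop_15 = c(20, 10))
stats = data.frame(state = c("Alpha", "Beta", "Gamma"),
                   median_income_15 = c(40, 50, 60))
# all.x = TRUE keeps unmatched rows of `states` (filled with NA), as left_join() does;
# rows of `stats` without a match ("Gamma") are dropped
joined = merge(states, stats, by.x = "NAME", by.y = "state", all.x = TRUE)
joined
```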
E8. `us_states_df` has two more rows than `us_states`.
How can you find them? (Hint: try to use the `dplyr::anti_join()` function.)
```{r 03-ex-e8}
us_states_df |>
  anti_join(st_drop_geometry(us_states), by = c("state" = "NAME"))
```
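`anti_join()` returns the rows of the first table that have no match in the second. In base R the same filter can be sketched with `%in%`; the tables below are hypothetical toy data:

```r
# Hypothetical toy tables: `stats` has two keys that `states` lacks
states = data.frame(NAME = c("Alpha", "Beta"))
stats = data.frame(state = c("Beta", "Alpha", "Gamma", "Delta"),
                   median_income_15 = c(50, 40, 60, 70))
# keep rows of `stats` whose key does not appear among the keys of `states`
unmatched = stats[!stats$state %in% states$NAME, ]
unmatched
```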
E9. What was the population density in 2015 in each state?
What was the population density in 2010 in each state?
```{r 03-ex-e9}
us_states2 = us_states |>
  mutate(pop_dens_15 = total_pop_15 / AREA,
         pop_dens_10 = total_pop_10 / AREA)
```
E10. How much has population density changed between 2010 and 2015 in each state?
Calculate the change in percentages and map them.
```{r 03-ex-e10}
us_popdens_change = us_states2 |>
  mutate(pop_dens_diff_10_15 = pop_dens_15 - pop_dens_10,
         pop_dens_diff_10_15p = (pop_dens_diff_10_15 / pop_dens_10) * 100)
plot(us_popdens_change["pop_dens_diff_10_15p"])
```
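The percentage column in this solution is the relative change, `(new - old) / old * 100`. Because each state's `AREA` is the same in both years, the percentage change in population density equals the percentage change in population. A small sketch checking this with hypothetical numbers:

```r
# Relative change from `old` to `new`, in percent
pct_change = function(old, new) {
  (new - old) / old * 100
}
# Hypothetical densities (people per km^2) and areas (km^2) of three states
dens_10 = c(50, 200, 80)
dens_15 = c(55, 190, 80)
area = c(2, 4, 10)
pct_change(dens_10, dens_15)
# the same percentages follow from the populations, since the area cancels out
pct_change(dens_10 * area, dens_15 * area)
```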
E11. Change the column names in `us_states` to lowercase. (Hint: the helper functions `tolower()` and `colnames()` may help.)
```{r 03-ex-e11}
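# with the base R pipe, refer to the object by name
us_states |>
  setNames(tolower(colnames(us_states)))
# or with the magrittr pipe (re-exported by dplyr), whose `.` placeholder stands for the piped object
us_states %>%
  setNames(tolower(colnames(.)))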
```
E12. Using `us_states` and `us_states_df`, create a new object called `us_states_sel`.
The new object should have only two variables: `median_income_15` and `geometry`.
Change the name of the `median_income_15` column to `Income`.
```{r 03-ex-e12}
us_states_sel = us_states |>
  left_join(us_states_df, by = c("NAME" = "state")) |>
  select(Income = median_income_15)
```
E13. Calculate the change in the number of residents living below the poverty level between 2010 and 2015 for each state. (Hint: see `?us_states_df` for documentation on the poverty level columns.)
Bonus: Calculate the change in the *percentage* of residents living below the poverty level in each state.
```{r 03-ex-e13}
us_pov_change = us_states |>
  left_join(us_states_df, by = c("NAME" = "state")) |>
  mutate(pov_change = poverty_level_15 - poverty_level_10)
# Bonus
us_pov_pct_change = us_states |>
  left_join(us_states_df, by = c("NAME" = "state")) |>
  mutate(pov_pct_10 = (poverty_level_10 / total_pop_10) * 100,
         pov_pct_15 = (poverty_level_15 / total_pop_15) * 100) |>
  mutate(pov_pct_change = pov_pct_15 - pov_pct_10)
```
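The bonus answer, `pov_pct_15 - pov_pct_10`, is a change in percentage points, and it can differ from the change in counts. A hypothetical single-state example (the numbers are made up) in which the count stays the same while the share falls because the population grew:

```r
# Hypothetical numbers for a single state
total_pop_10 = 1000
total_pop_15 = 1200
poverty_level_10 = 150
poverty_level_15 = 150
# change in the number of residents below the poverty level
pov_change = poverty_level_15 - poverty_level_10
# share of residents below the poverty level, in percent
pov_pct_10 = poverty_level_10 / total_pop_10 * 100
pov_pct_15 = poverty_level_15 / total_pop_15 * 100
# change in percentage points
pov_pct_change = pov_pct_15 - pov_pct_10
c(pov_change = pov_change, pov_pct_change = pov_pct_change)
```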
E14. What were the minimum, average and maximum numbers of people living below the poverty line in 2015 among the states of each region?
Bonus: What is the region with the largest increase in people living below the poverty line?
```{r 03-ex-e14}
us_pov_change_reg = us_pov_change |>
  group_by(REGION) |>
  summarize(min_state_pov_15 = min(poverty_level_15),
            mean_state_pov_15 = mean(poverty_level_15),
            max_state_pov_15 = max(poverty_level_15))
# Bonus
us_pov_change |>
  group_by(REGION) |>
  summarize(region_pov_change = sum(pov_change)) |>
  filter(region_pov_change == max(region_pov_change)) |>
  pull(REGION) |>
  as.character()
```
E15. Create a raster from scratch, with nine rows and columns and a resolution of 0.5 decimal degrees (WGS84).
Fill it with random numbers.
Extract the values of the four corner cells.
```{r 03-ex-e15}
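r = rast(nrows = 9, ncols = 9, resolution = 0.5,
         xmin = 0, xmax = 4.5, ymin = 0, ymax = 4.5,
         crs = "EPSG:4326", vals = rnorm(81))
# using cell IDs (cells are numbered row by row from the top-left corner)
r[c(1, 9, 81 - 9 + 1, 81)]
# or using row and column indices
r[c(1, nrow(r)), c(1, ncol(r))]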
```
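The cell IDs in the first solution follow from **terra**'s numbering, which starts at 1 in the top-left cell and proceeds row by row. A small helper (hypothetical, not part of **terra**) generalizing the four corner IDs to any number of rows and columns:

```r
# Corner cell IDs of a raster with `nrow` rows and `ncol` columns,
# assuming cells are numbered row by row from 1 at the top-left corner
corner_cells = function(nrow, ncol) {
  c(top_left = 1,
    top_right = ncol,
    bottom_left = (nrow - 1) * ncol + 1,
    bottom_right = nrow * ncol)
}
corner_cells(9, 9)
corner_cells(3, 4)
```

**terra** can also compute such IDs itself with `cellFromRowCol()`, given vectors of row and column indices.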
E16. What is the most common class of our example raster `grain`?
```{r 03-ex-e16}
grain = rast(system.file("raster/grain.tif", package = "spData"))
freq(grain) |>
  arrange(desc(count)) # the most common classes are silt and sand (13 cells)
```
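As the comment in this solution notes, two classes can share the highest count. A base R sketch on hypothetical class labels shows how to keep all tied classes, whereas `which.max()` returns only the first one:

```r
# Hypothetical class labels, one per raster cell
cls = c("clay", "silt", "sand", "silt", "sand", "clay", "silt", "sand")
counts = table(cls)
# which.max(counts) would report only "sand"; this keeps every tied class
most_common = names(counts)[counts == max(counts)]
most_common
```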
E17. Plot the histogram and the boxplot of the `dem.tif` file from the **spDataLarge** package (`system.file("raster/dem.tif", package = "spDataLarge")`).
```{r 03-ex-e17}
dem = rast(system.file("raster/dem.tif", package = "spDataLarge"))
hist(dem)
boxplot(dem)
# we can also use ggplot2 after converting SpatRaster to a data frame
library(ggplot2)
ggplot(as.data.frame(dem), aes(dem)) + geom_histogram()
ggplot(as.data.frame(dem), aes(dem)) + geom_boxplot()
```