-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy path0103_r-scripts.Rmd
344 lines (215 loc) · 8.23 KB
/
0103_r-scripts.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
# R Scripts {#r_scipts}
```{r include = FALSE}
source("common.R")
library(tweetrmd) #... embedding tweets
ds4p_urls <- read.csv("./admin/csv/ds4p_urls.csv")
```
Usually, we want to save the analyses we run so we can re-run them later or refer back to them. We do that with **scripts**.
Let's make an R script. Click File -> New File -> R Script.
A new part of the RStudio window appears. The Script pane.
This is where you can write a series of lines of code than you can then **execute** (send to the Console to run) later.
## Assign an Object
Pick a number, assign it to an object called `favorite_number`, and print it.
```{r}
favorite_number <- 42
favorite_number
```
Run these lines of code by placing your cursor on the line and either clicking the **Run** button or typing Ctrl/Cmd + Shift + Enter/Return.
## Types of Objects and Operations
Now let's explore some basic types of objects and operations in R.
### Vectors
*Vectors* store multiple entries of one data type, like numbers or characters.
You'll discover that they show up just about everywhere in R.
Let's collect some data and store this in a vector called `times`.
**How many hours did you sleep last night?**
Drop your answer in the chat.
Here's starter code:
```{r, eval=FALSE}
times <- c()
```
```{r, include=FALSE}
times <- round(runif(20, 4, 10))
```
The `c()` function is how we make a vector in R.
The "c" stands for "concatenate".
Operations happen component-wise.
Change those times to minutes.
How can we "save" the results?
All parts of a vector have the same **type**.
There are many types of variables in R.
The most common types are:
1. numeric (numbers) 1. double (numbers with decimal values; `2`, `3.4`, `1000`) 1. integer (`1L`, `2L`, `100L`)
2. character (words or strings; `"a"`, `"foo"`, `"lastname"`)
3. logical (`TRUE` or `FALSE`)
4. factor (categorical variable; `factor(c("control", "experiment"))`)
### Functions
R comes with many many **functions**.
Functions take one or more *inputs* and return one or more *outputs*.
You can think of functions as prewritten R programs.
Functions all take the general form:
```{r, eval = FALSE, tidy = FALSE}
functionName(arg1 = val1, arg2 = val2, and so on)
```
To call a function, type its name, then parentheses `()`.
Inside the `()`, type the arguments and values to use as input.
What's the average sleep time?
Let's compute that using the `mean()` function.
```{r}
mean(times)
```
To learn how a function works, we can look at its **help file**.
Open the help file for `mean()` by typing `?mean` or `help(mean)` in the console.
The help file will includes the following:
1. A brief description of the function
2. A list of the arguments and how to call the function
3. A detailed description of the arguments
4. (Optionally) Other usage details
5. Coded examples
We can see that `mean()` has 4 **arguments**:
1. `x`: A vector to compute the mean of
2. `trim`: the fraction (0 to 0.5) of observations to be trim from each end
3. `na.rm`: `TRUE` or `FALSE`--should missing values be removed before computing?
4. `...`: Other arguments. More on that later.
**Default values** for arugments are given in the Usage section by `=`.
If an argument has no default (like `x` in `mean()`), it usually means it's `required`.
Let's compute the trimmed mean of `times`, dropping 10% of values from each end.
```{r, eval=FALSE}
```
You can either enter arguments **in order**:
```{r}
mean(times, .1)
```
Or **by name**:
```{r}
mean(times, trim = .1)
```
It's good practice to *name all arguments after the first (or maybe second if its clear)*.
Try out some other functions, such as `sd()`, `range()`, and `length()`.
Much of R is about becoming familiar with R's "vocabulary".
A nice list can be found in [Advanced R - Vocabulary](http://adv-r.had.co.nz/Vocabulary.html).
### Comparisons
How many people slept less than 6 hours?
Let's answer that using *comparisons*.
We can compare the values of `times` to another value using `<`.
```{r}
times < 6
```
Comparisons return a vector of **logical** values.
We can do other logical comparisons:
```{r}
times > 6
times == 5
times <= 7
times != 2
```
We can combine multiple comparisons using `&` (AND), `|` (OR), and `!` (NOT).
```{r}
(times < 4) | (times > 9)
```
Try out these functions that work with logical values.
```{r}
any(times < 6)
all(times < 6)
which(times < 6)
```
### Subsetting
Use `[]` to subset values from a vector.
You can subset with an integer (by position) or with logicals.
```{r}
times[4]
times[c(2, 5)]
times[-6]
times[-c(2, 3)]
times[4:8]
times[times >= 7]
```
Subset `times`:
1. Extract the third entry.
2. Extract everything except the third entry.
3. Extract the second and fourth entry. The fourth and second entry.
4. Extract the second through fifth entry.
5. Extract all entries that are less than 4 hours Why does this work? Logical subsetting!
### Modifying a vector
You can change the vector by combining `[]` with `<-`.
Let's say that we learned that the second time was incorrect and we wanted to replace it with missing data.
In R, missing data is noted by `NA`.
Replace the second entry in `times` with `NA`.
```{r}
```
Now, let's "cap" all entries that are bigger than 8 at 8 hours.
```{r}
```
If this is more than one value, why don't we need to match the number of values?
Recycling!
Be careful of recycling!
Let's compute the mean of these new times:
```{r}
mean(times)
```
What happened?
How do we compute the mean of the non-missing values?
```{r}
```
### Data frames
We usually work with more than one variable at a time.
When we do that, we will work with **data frames**.
A data frame is a **list** of vectors, all of the same length.
R has some data frames "built in".
For example, some car data is attached to the variable name `mtcars`.
Print `mtcars` to screen.
Notice the tabular format.
**Your turn**: Finish the exercises of this section:
1. Use some of these built-in R functions to explore `mtcars`
- `head()`
- `tail()`
- `str()`
- `nrow()`
- `ncol()`
- `summary()`
- `row.names()`
- `names()`
2. What's the first column name in the `mtcars` dataset?
3. Which column number is named `"wt"`?
With data frames,each column is its own vector.
You can extract a vector by name using `$`.
For example, we can extract the `cyl` column with `mtcars$cyl`.
You can also extract columns using `[[]]`.
For example, try `mtcars[["cyl"]]` or `mtcars[[2]]`.
4. Extract the vector of `mpg` values. What's the mean `mpg` of all cars in the dataset?
### R packages
Often, the suite of functions that "come with" R are not enough to do an analysis.
Sometimes, the suite of functions that "come with" R are not very convenient.
In these cases, R *packages* come to the rescue.
These are "add ons", each coming with their own suite of functions and objects, usually designed to do one type of task.
[CRAN](https://cran.r-project.org/) stores many R packages that.
It's easy to install packages from CRAN using the `install.packages()` function.
Run the following lines of code to install the `tibble` and `gapminder` packages.
(But don't include `install.packages()` lines in your scripts---it's not very nice to others!)
```{r, eval=FALSE}
install.packages("tibble")
install.packages("palmerpenguins")
```
- `tibble`: a data frame with some useful "bells and whistles"
- `palmerpenguins`: a package with the penguins dataset (as a `tibble`!)
After you install a package, you need to *load* it to use it.
Use the `library()` function to load a package.
(Note: Do not use the similar function `require()` to load packages.
This has some different, undesirable, behavior for normal usage.)
Run the following lines of code to load the packages.
(Put these in your scripts, and near the top.)
```{r}
library(tibble)
library(palmerpenguins)
```
You can explore the objects in a package in the Environment pane.
Try the following two approaches to access information about the `tibble` package.
Run the lines one-at-a-time.
Vignettes are your friend, but do not always exist.
```{r, eval=FALSE}
?tibble
browseVignettes(package = "tibble")
```
Print out the `penguins` object to screen.
It's a tibble---how does it differ from a data frame in terms of how it's printed?
```{r links, child="admin/md/courselinks.md"}
```