diff --git a/.nojekyll b/.nojekyll index 71e5030..ac3ca98 100644 --- a/.nojekyll +++ b/.nojekyll @@ -1 +1 @@ -10166c3e \ No newline at end of file +466e1c70 \ No newline at end of file diff --git a/lectures.html b/lectures.html index 20e0e14..9c414f4 100644 --- a/lectures.html +++ b/lectures.html @@ -208,7 +208,7 @@

Lectures

-
+
@@ -246,7 +246,7 @@

-
+
@@ -290,7 +290,7 @@

-
+
@@ -337,7 +337,7 @@

-
+
@@ -378,7 +378,7 @@

-
+
@@ -419,7 +419,7 @@

-
+
@@ -460,7 +460,7 @@

-
+
@@ -510,7 +510,7 @@

-
+
@@ -563,7 +563,7 @@

-
+
@@ -613,7 +613,7 @@

-
+
@@ -663,7 +663,7 @@

-
+
@@ -710,7 +710,7 @@

-
+
@@ -757,7 +757,7 @@

-
+
@@ -804,7 +804,7 @@

-
+
@@ -845,7 +845,7 @@

-
+
@@ -886,7 +886,7 @@

-
+
@@ -930,7 +930,7 @@

-
+
@@ -974,7 +974,7 @@

-
+
@@ -1015,7 +1015,7 @@

-
+
@@ -1056,7 +1056,7 @@

-
+
@@ -1103,7 +1103,7 @@

-
+
@@ -1150,7 +1150,7 @@

-
+
@@ -1194,7 +1194,7 @@

-
+
@@ -1238,7 +1238,7 @@

-
+
@@ -1276,7 +1276,7 @@

-
+
diff --git a/lectures.xml b/lectures.xml index 3341134..b1e19cf 100644 --- a/lectures.xml +++ b/lectures.xml @@ -10,7 +10,7 @@ Course website for Statistical Computing (BSPH 140.776) in Fall 2023 quarto-1.3.450 -Fri, 18 Aug 2023 01:10:34 GMT +Fri, 18 Aug 2023 01:51:49 GMT 01 - Welcome! Leonardo Collado Torres @@ -20,7 +20,7 @@ -

This lecture as the rest of the course is adapted from the version Stephanie C. Hicks designed and maintained in 2021 - 2022. Check the recent changes to this file through the GitHub history.

+

This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

Welcome! I am very excited to have you in our one-term (i.e., half a semester) course on Statistical Computing (course number 140.776), offered by the Department of Biostatistics at the Johns Hopkins Bloomberg School of Public Health.

This course is designed for ScM and PhD students at Johns Hopkins Bloomberg School of Public Health. I am pretty flexible about permitting outside students, but I want everyone to be aware of the goals and assumptions so no one feels like they are surprised by how the class works.

@@ -366,8 +366,9 @@ Important

Typos and corrections

Feel free to submit typos, errors, and other corrections via the GitHub repository associated with the class: https://github.com/lcolladotor/jhustatcomputing2023. You will have the thanks of your grateful instructor!

-
-

R session information

+
+
+

R session information

session_info()
-
]]> @@ -428,7 +428,7 @@ font-style: inherit;">session_info()
module 1 week 1 https://lcolladotor.github.io/jhustatcomputing2023/posts/01-welcome/index.html - Fri, 18 Aug 2023 01:10:34 GMT + Fri, 18 Aug 2023 01:51:49 GMT
02 - Introduction to R and RStudio! @@ -439,6 +439,7 @@ font-style: inherit;">session_info()
+

This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

There are only two kinds of languages: the ones people complain about and the ones nobody uses. —Bjarne Stroustrup

@@ -756,7 +757,7 @@ font-style: inherit;">install_github()
background-color: null; font-style: inherit;">library('ggplot2')
+font-style: inherit;">"ggplot2")

You may or may not see a short message on the screen. Some packages show messages when you load them, and others do not.

This was a quick overview of R packages. We will use a lot of them, so you will get used to them rather quickly.
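
As a minimal sketch of the loading step described above, the snippet below checks whether a package is available before attaching it. The variable names `pkg` and `is_installed` are illustrative only, and checking before loading is one possible pattern rather than something the lecture prescribes.

```r
# Name of the package to load; "ggplot2" matches the example above.
pkg <- "ggplot2"

# requireNamespace() returns TRUE if the package can be loaded and FALSE
# otherwise, without attaching it to the search path.
is_installed <- requireNamespace(pkg, quietly = TRUE)

if (is_installed) {
    # character.only = TRUE lets library() accept a name stored in a variable.
    library(pkg, character.only = TRUE)
} else {
    message(
        "Package '", pkg, "' is not installed; ",
        "try install.packages(\"", pkg, "\")."
    )
}
```

If the package is missing, the script reports it instead of stopping with an error, which can be easier to follow when you are still setting up your R installation.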

@@ -835,9 +836,62 @@ Tip

[‘Water Colours’ from Danielle Navarro https://art.djnavarro.net]

+ + +
+

R session information

+
+
options(width = 120)
+sessioninfo::session_info()
+
+
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
+ setting  value
+ version  R version 4.3.1 (2023-06-16)
+ os       macOS Ventura 13.5
+ system   aarch64, darwin20
+ ui       X11
+ language (EN)
+ collate  en_US.UTF-8
+ ctype    en_US.UTF-8
+ tz       America/New_York
+ date     2023-08-17
+ pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
+
+─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
+ package     * version date (UTC) lib source
+ cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
+ colorout      1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
+ digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
+ evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.0)
+ fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
+ htmltools     0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
+ htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
+ jsonlite      1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
+ knitr         1.43    2023-05-25 [1] CRAN (R 4.3.0)
+ rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
+ rmarkdown     2.24    2023-08-14 [1] CRAN (R 4.3.1)
+ rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
+ sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
+ xfun          0.40    2023-08-09 [1] CRAN (R 4.3.0)
+ yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
+
+ [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
+
+──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
+
+
-
]]> @@ -847,7 +901,7 @@ Tip programming RStudio https://lcolladotor.github.io/jhustatcomputing2023/posts/02-introduction-to-r-and-rstudio/index.html - Fri, 18 Aug 2023 01:10:34 GMT + Fri, 18 Aug 2023 01:51:49 GMT @@ -859,6 +913,7 @@ Tip +

This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

Pre-lecture materials

@@ -1043,9 +1098,62 @@ Tip

[‘Flametree’ from Danielle Navarro https://art.djnavarro.net]

+
+ +
+

R session information

+
+
options(width = 120)
+sessioninfo::session_info()
+
+
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
+ setting  value
+ version  R version 4.3.1 (2023-06-16)
+ os       macOS Ventura 13.5
+ system   aarch64, darwin20
+ ui       X11
+ language (EN)
+ collate  en_US.UTF-8
+ ctype    en_US.UTF-8
+ tz       America/New_York
+ date     2023-08-17
+ pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
+
+─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
+ package     * version date (UTC) lib source
+ cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
+ colorout      1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
+ digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
+ evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.0)
+ fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
+ htmltools     0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
+ htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
+ jsonlite      1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
+ knitr         1.43    2023-05-25 [1] CRAN (R 4.3.0)
+ rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
+ rmarkdown     2.24    2023-08-14 [1] CRAN (R 4.3.1)
+ rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
+ sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
+ xfun          0.40    2023-08-09 [1] CRAN (R 4.3.0)
+ yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
+
+ [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
+
+──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
+
+
-
]]> @@ -1056,7 +1164,7 @@ Tip git GitHub https://lcolladotor.github.io/jhustatcomputing2023/posts/03-introduction-to-gitgithub/index.html - Fri, 18 Aug 2023 01:10:34 GMT + Fri, 18 Aug 2023 01:51:49 GMT
@@ -1068,6 +1176,7 @@ Tip +

This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

An article about computational results is advertising, not scholarship. The actual scholarship is the full software environment, code and data, that produced the result. —Claerbout and Karrenbach (1992)

@@ -1328,9 +1437,87 @@ Questions + + +
+

R session information

+
+
options(width = 120)
+sessioninfo::session_info()
+
+
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
+ setting  value
+ version  R version 4.3.1 (2023-06-16)
+ os       macOS Ventura 13.5
+ system   aarch64, darwin20
+ ui       X11
+ language (EN)
+ collate  en_US.UTF-8
+ ctype    en_US.UTF-8
+ tz       America/New_York
+ date     2023-08-17
+ pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
+
+─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
+ package     * version date (UTC) lib source
+ cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
+ colorout      1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
+ colorspace    2.1-0   2023-01-23 [1] CRAN (R 4.3.0)
+ digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
+ dplyr         1.1.2   2023-04-20 [1] CRAN (R 4.3.0)
+ emojifont     0.5.5   2021-04-20 [1] CRAN (R 4.3.0)
+ evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.0)
+ fansi         1.0.4   2023-01-22 [1] CRAN (R 4.3.0)
+ fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
+ generics      0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
+ ggplot2       3.4.3   2023-08-14 [1] CRAN (R 4.3.1)
+ glue          1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
+ gtable        0.3.3   2023-03-21 [1] CRAN (R 4.3.0)
+ here        * 1.0.1   2020-12-13 [1] CRAN (R 4.3.0)
+ htmltools     0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
+ htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
+ jsonlite      1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
+ knitr         1.43    2023-05-25 [1] CRAN (R 4.3.0)
+ lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
+ magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
+ munsell       0.5.0   2018-06-12 [1] CRAN (R 4.3.0)
+ pillar        1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
+ pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
+ proto         1.0.0   2016-10-29 [1] CRAN (R 4.3.0)
+ R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
+ rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
+ rmarkdown     2.24    2023-08-14 [1] CRAN (R 4.3.1)
+ rprojroot     2.0.3   2022-04-02 [1] CRAN (R 4.3.0)
+ rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
+ scales        1.2.1   2022-08-20 [1] CRAN (R 4.3.0)
+ sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
+ showtext      0.9-6   2023-05-03 [1] CRAN (R 4.3.0)
+ showtextdb    3.0     2020-06-04 [1] CRAN (R 4.3.0)
+ sysfonts      0.8.8   2022-03-13 [1] CRAN (R 4.3.0)
+ tibble        3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
+ tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
+ utf8          1.2.3   2023-01-31 [1] CRAN (R 4.3.0)
+ vctrs         0.6.3   2023-06-14 [1] CRAN (R 4.3.0)
+ xfun          0.40    2023-08-09 [1] CRAN (R 4.3.0)
+ yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
+
+ [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
+
+──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
+
+
-
]]> @@ -1339,7 +1526,7 @@ Questions R reproducibility https://lcolladotor.github.io/jhustatcomputing2023/posts/04-reproducible-research/index.html - Fri, 18 Aug 2023 01:10:34 GMT + Fri, 18 Aug 2023 01:51:49 GMT
05 - Literate Statistical Programming @@ -1350,6 +1537,7 @@ Questions +

This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

Pre-lecture materials

@@ -1794,22 +1982,26 @@ font-style: inherit;">ncol = cols)
if (condition) {
-  
-}
-
-    else {
-  
-}
-
-## Case 1
+} else if (condition) {
-  
-}
+ ## Case 2 +} else if (condition) { + ## Case 3 +}
  • fun to create a function
  • @@ -1820,7 +2012,7 @@ background-color: null; font-style: inherit;"><- function(variables) { - + }
      @@ -1832,7 +2024,7 @@ background-color: null; font-style: inherit;">for (variable in vector) { - + }
        @@ -1973,10 +2165,63 @@ Tip
      +
+ +
+

R session information

+
+
options(width = 120)
+sessioninfo::session_info()
+
+
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
+ setting  value
+ version  R version 4.3.1 (2023-06-16)
+ os       macOS Ventura 13.5
+ system   aarch64, darwin20
+ ui       X11
+ language (EN)
+ collate  en_US.UTF-8
+ ctype    en_US.UTF-8
+ tz       America/New_York
+ date     2023-08-17
+ pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
+
+─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
+ package     * version date (UTC) lib source
+ cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
+ colorout      1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
+ digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
+ evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.0)
+ fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
+ htmltools     0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
+ htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
+ jsonlite      1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
+ knitr         1.43    2023-05-25 [1] CRAN (R 4.3.0)
+ rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
+ rmarkdown     2.24    2023-08-14 [1] CRAN (R 4.3.1)
+ rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
+ sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
+ xfun          0.40    2023-08-09 [1] CRAN (R 4.3.0)
+ yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
+
+ [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
+
+──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
+
+
-
@@ -1995,7 +2240,7 @@ Knuth, Donald E. 1984. “Literate Programming.” Comput. J.R Markdown programming https://lcolladotor.github.io/jhustatcomputing2023/posts/05-literate-programming/index.html - Fri, 18 Aug 2023 01:10:34 GMT + Fri, 18 Aug 2023 01:51:49 GMT
@@ -2007,6 +2252,7 @@ Knuth, Donald E. 1984. “Literate Programming.” Comput. J.This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

Pre-lecture materials

@@ -2366,7 +2612,7 @@ font-style: inherit;">c("bibtex", "RefManageR") +font-style: inherit;">"RefManageR"))

What do they do? How might they be helpful to you in terms of reference management?

    @@ -2400,10 +2646,63 @@ Tip

    [Add here.]

    +
+ +
+

R session information

+
+
options(width = 120)
+sessioninfo::session_info()
+
+
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
+ setting  value
+ version  R version 4.3.1 (2023-06-16)
+ os       macOS Ventura 13.5
+ system   aarch64, darwin20
+ ui       X11
+ language (EN)
+ collate  en_US.UTF-8
+ ctype    en_US.UTF-8
+ tz       America/New_York
+ date     2023-08-17
+ pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
+
+─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
+ package     * version date (UTC) lib source
+ cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
+ colorout      1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
+ digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
+ evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.0)
+ fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
+ htmltools     0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
+ htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
+ jsonlite      1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
+ knitr         1.43    2023-05-25 [1] CRAN (R 4.3.0)
+ rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
+ rmarkdown     2.24    2023-08-14 [1] CRAN (R 4.3.1)
+ rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
+ sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
+ xfun          0.40    2023-08-09 [1] CRAN (R 4.3.0)
+ yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
+
+ [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
+
+──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
+
+
-

References

@@ -2422,7 +2721,7 @@ Xie, Yihui, Christophe Dervieux, and Emily Riederer. 2020. R Markdown Cookbo R Markdown programming https://lcolladotor.github.io/jhustatcomputing2023/posts/06-reference-management/index.html - Fri, 18 Aug 2023 01:10:34 GMT + Fri, 18 Aug 2023 01:51:49 GMT @@ -2434,6 +2733,7 @@ Xie, Yihui, Christophe Dervieux, and Emily Riederer. 2020. R Markdown Cookbo +

This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

“When writing code, you’re always collaborating with future-you; and past-you doesn’t respond to emails”. —Hadley Wickham

@@ -2521,13 +2821,37 @@ font-style: inherit;">setwd()
background-color: null; font-style: inherit;">setwd("C:\Users\Brian\path\only"C:\t\\hat\Brian\has")
+font-style: inherit;">Users\\Brian\\path\\only\\that\\Brian\\has") -

The problem is, if I want to use his code, I will need to go and hand-edit every single one of those paths (C:\Users\Brian\path\only\that\Brian\has) to the path that I want to use on my computer or wherever I saved the folder on my computer (e.g. /Users/Stephanie/Documents/path/only/I/have).

+

The problem is, if I want to use his code, I will need to go and hand-edit every single one of those paths (C:\Users\Brian\path\only\that\Brian\has) to the path that I want to use on my computer or wherever I saved the folder on my computer (e.g. /Users/leocollado/Documents/path/only/I/have).

  1. This is an unsustainable practice.
  2. I can go in and manually edit the path, but this assumes I know how to set a working directory. Not everyone does.
  3. @@ -2692,7 +3016,7 @@ font-style: inherit;">"data"))
    if(if (!"my", "relative", "path"))){
    -  "path"))) {
    +    dir.create(::here('functions.R'))
    +font-style: inherit;">"functions.R"))
    @@ -2940,7 +3264,7 @@ font-style: inherit;">5 background-color: null; font-style: inherit;">save(x, file=file = here("x.Rda")) background-color: null; font-style: inherit;">saveRDS(x, file=file = here("x.Rds")) background-color: null; font-style: inherit;">list.files(path=path = here(2 save(x,y, save(x, y, file=file = here( x,y,z 1,2,3", - skip = 2) +font-style: inherit;">2 +)
    Rows: 1 Columns: 3
     ── Column specification ────────────────────────────────────────────────────────
    @@ -3441,11 +3766,12 @@ font-style: inherit;">  x,y,z
       1,2,3",
    -      comment = "#")
    +font-style: inherit;">"#" +)
    Rows: 1 Columns: 3
     ── Column specification ────────────────────────────────────────────────────────
    @@ -3494,12 +3820,13 @@ font-style: inherit;">here("data", "team_standings.csv"), 
    -                  "team_standings.csv"),
    +    col_types = "cc")
    +font-style: inherit;">"cc" +)

    Note that the col_types argument accepts a compact representation. Here "cc" indicates that the first column is character and the second column is character (there are only two columns). Using the col_types argument is useful because often it is not easy to automatically figure out the type of a column by looking at a few rows (especially if a column has many missing values).
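
To make the compact `col_types` notation concrete, here is a small sketch that reads inline data rather than a file. The column names and values are made up for illustration and are not the contents of `team_standings.csv`. It assumes readr 2.0 or later, where literal data can be wrapped in `I()`.

```r
library(readr)

# Hypothetical two-column CSV supplied as literal text, so no file is needed.
csv_text <- I("team,standing\nSpain,1\nNetherlands,2\n")

# One letter per column: "c" = character, "i" = integer.
teams <- read_csv(csv_text, col_types = "ci")

# Inspect the column types that were actually assigned.
str(teams)
```

Because the types are stated up front, `read_csv()` does not have to guess them from the first rows, and a value that cannot be parsed as the requested type is reported as a parsing problem rather than silently changing the column type.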

    @@ -3528,12 +3855,13 @@ font-style: inherit;">here("data", "2016-07-19.csv.bz2"), - "2016-07-19.csv.bz2"), + n_max = 10)
    +font-style: inherit;">10 +)
    Rows: 10 Columns: 10
     ── Column specification ────────────────────────────────────────────────────────
    @@ -3559,18 +3887,19 @@ font-style: inherit;">here("data", "2016-07-19.csv.bz2"), 
    -                 "2016-07-19.csv.bz2"),
    +    col_types = "ccicccccci", 
    -                 "ccicccccci",
    +    n_max = 10)
    -logs
    +font-style: inherit;">10 +) +logs
    # A tibble: 10 × 10
        date       time     size r_version r_arch r_os  package version country ip_id
    @@ -3601,8 +3930,8 @@ font-style: inherit;">here("data", "2016-07-19.csv.bz2"), 
    -                     "2016-07-19.csv.bz2"),
    +    col_types = date = col_date()),
    -                         n_max = 10)
    -logdates
    +font-style: inherit;">10 +) +logdates
    # A tibble: 10 × 1
        date      
    @@ -3694,9 +4024,83 @@ Tip
     
     
    +
    + +
    +

    R session information

    +
    +
    options(width = 120)
    +sessioninfo::session_info()
    +
    +
    ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
    + setting  value
    + version  R version 4.3.1 (2023-06-16)
    + os       macOS Ventura 13.5
    + system   aarch64, darwin20
    + ui       X11
    + language (EN)
    + collate  en_US.UTF-8
    + ctype    en_US.UTF-8
    + tz       America/New_York
    + date     2023-08-17
    + pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
    +
    +─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
    + package     * version date (UTC) lib source
    + bit           4.0.5   2022-11-15 [1] CRAN (R 4.3.0)
    + bit64         4.0.5   2020-08-30 [1] CRAN (R 4.3.0)
    + cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
    + colorout      1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
    + crayon        1.5.2   2022-09-29 [1] CRAN (R 4.3.0)
    + digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
    + evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.0)
    + fansi         1.0.4   2023-01-22 [1] CRAN (R 4.3.0)
    + fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
    + glue          1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
    + here        * 1.0.1   2020-12-13 [1] CRAN (R 4.3.0)
    + hms           1.1.3   2023-03-21 [1] CRAN (R 4.3.0)
    + htmltools     0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
    + htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
    + jsonlite      1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
    + knitr         1.43    2023-05-25 [1] CRAN (R 4.3.0)
    + lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
    + magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
    + pillar        1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
    + pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
    + R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
    + readr       * 2.1.4   2023-02-10 [1] CRAN (R 4.3.0)
    + rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
    + rmarkdown     2.24    2023-08-14 [1] CRAN (R 4.3.1)
    + rprojroot     2.0.3   2022-04-02 [1] CRAN (R 4.3.0)
    + rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
    + sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
    + tibble        3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
    + tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
    + tzdb          0.4.0   2023-05-12 [1] CRAN (R 4.3.0)
    + utf8          1.2.3   2023-01-31 [1] CRAN (R 4.3.0)
    + vctrs         0.6.3   2023-06-14 [1] CRAN (R 4.3.0)
    + vroom         1.6.3   2023-04-28 [1] CRAN (R 4.3.0)
    + withr         2.5.0   2022-03-03 [1] CRAN (R 4.3.0)
    + xfun          0.40    2023-08-09 [1] CRAN (R 4.3.0)
    + yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
    +
    + [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
    +
    +──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    +
    +
    -
    ]]> @@ -3708,7 +4112,7 @@ Tip here tidyverse https://lcolladotor.github.io/jhustatcomputing2023/posts/07-reading-and-writing-data/index.html - Fri, 18 Aug 2023 01:10:34 GMT + Fri, 18 Aug 2023 01:51:49 GMT 08 - Managing data frames with the Tidyverse @@ -3719,6 +4123,7 @@ Tip +

    This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

    Pre-lecture materials

    @@ -3920,8 +4325,8 @@ Note background-color: null; font-style: inherit;">as_tibble(chicago) %>% - %>% + print(
    tibble(
    -      a = :5,
    -      b = :10,
    -      c = 1,
    -      z = (a 
    tibble(
    -      `:5,
    -      `= "numeric",
    -      `<- tibble(
    -      a = :5,
    -      b = :10,
    -      c = 1,
    -      z = (a head(chicago)
    background-color: null; font-style: inherit;">head
    (transmute(chicago,
    - transmute(chicago, + pm10detrend = pm10tmean2 na.rm = TRUE), - o3detrend = o3tmean2 mean(o3tmean2, na.rm = TRUE)))
    +font-style: inherit;">TRUE
    )
    +))
    # A tibble: 6 × 2
       pm10detrend o3detrend
    @@ -4909,7 +5315,8 @@ font-style: inherit;">group_by(chicago, year)
    summarize(years, summarize(years,
    +    pm25 = mean(pm25, na.rm = TRUE), 
    -          TRUE),
    +    o3 = max(o3tmean2, na.rm = TRUE), 
    -          TRUE),
    +    no2 = median(no2tmean2, na.rm = TRUE))
    +font-style: inherit;">TRUE) +)
    # A tibble: 19 × 4
         year  pm25    o3   no2
    @@ -5018,7 +5426,8 @@ font-style: inherit;">group_by(chicago, pm25.quint)
    summarize(quint, summarize(quint,
    +    o3 = mean(o3tmean2, na.rm = TRUE), 
    -          TRUE),
    +    no2 = mean(no2tmean2, na.rm = TRUE))
    +font-style: inherit;">TRUE) +)
    # A tibble: 6 × 3
       pm25.quint     o3   no2
    @@ -5073,9 +5483,15 @@ font-style: inherit;">first(x)))
    background-color: null; font-style: inherit;">first(x) %>% second %>% + second() %>% third +font-style: inherit;">%>% + third()
    @@ -5092,8 +5508,8 @@ Example
    chicago %>% 
    -  %>%
    +    mutate(+ 1900) %>%    
    -  %>%
    +    group_by(year) %>% 
    -  %>%
    +    summarize(summarize(
    +        pm25 = mean(pm25, na.rm = TRUE), 
    -            TRUE),
    +        o3 = max(o3tmean2, na.rm = TRUE), 
    -            TRUE),
    +        no2 = median(no2tmean2, na.rm = TRUE))
    +font-style: inherit;">TRUE) + )
    # A tibble: 19 × 4
         year  pm25    o3   no2
    @@ -5200,15 +5618,16 @@ font-style: inherit;">+ 1) %>% 
    -        %>%
    +    group_by(month) %>% 
    -        %>%
    +    summarize(summarize(
    +        pm25 = mean(pm25, na.rm = TRUE), 
    -                  TRUE),
    +        o3 = max(o3tmean2, na.rm = TRUE), 
    -                  TRUE),
    +        no2 = median(no2tmean2, na.rm = TRUE))
    +font-style: inherit;">TRUE) + )
    # A tibble: 12 × 4
        month  pm25    o3   no2
    @@ -5285,16 +5705,16 @@ font-style: inherit;">10)
    # A tibble: 10 × 11
        city   tmpd dewpoint date        pm25 pm10tmean2 o3tmean2 no2tmean2
        <chr> <dbl>    <dbl> <date>     <dbl>      <dbl>    <dbl>     <dbl>
    - 1 chic   62       45.3 2001-05-08   7.3       51.5    26.5       27.6
    - 2 chic   36       36.8 1991-11-28  NA         10      11.7       16.6
    - 3 chic   29       19.6 2005-03-14  19.6       51       9.93      39.9
    - 4 chic   20       11.2 2004-02-13  24.5       17.5    21.8       23.3
    - 5 chic   32.5     20.4 1997-03-23  NA         14.2    25.4       19.0
    - 6 chic   68.5     64.1 1996-07-27  NA         21      19.6       22.4
    - 7 chic   28.5     18.2 1997-11-11  NA         24.5     3.94      28.1
    - 8 chic   45.5     44.1 1991-04-13  NA         25      13.0       15.4
    - 9 chic   67       49.3 2000-10-14  19.4       54.5    24.9       31.0
    -10 chic   71       48   1994-09-21  NA         82      30.5       48.5
    + 1 chic   49       40.2 2000-09-25   6.6        7      17.2       15.5
    + 2 chic   35       24.1 1989-11-02  NA         25       8.83      17.3
    + 3 chic   63.5     54.4 1996-04-18  NA         54      30.5       26.7
    + 4 chic   70       65.9 1997-06-19  NA         60.5    32.4       39.9
    + 5 chic   54       50.6 2005-11-05  27.2       32      11.5       18.2
    + 6 chic   86.5     73.4 1990-07-04  NA         60.6    52.2       12.8
    + 7 chic   74       74.6 1987-08-14  NA         49.5    24.2       18.6
    + 8 chic   34.5     29.1 1995-11-27  NA         25       6.57      29.3
    + 9 chic   73       61.2 1995-09-13  NA         46      25.3       26.5
    +10 chic   79       64.6 2005-07-31  20.8       29.5    40.8       20.2
     # ℹ 3 more variables: pm25detrend <dbl>, year <dbl>, pm25.quint <fct>
    @@ -5411,9 +5831,94 @@ Tip +
    + +
    +

    R session information

    +
    +
    options(width = 120)
    +sessioninfo::session_info()
    +
    +
    ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
    + setting  value
    + version  R version 4.3.1 (2023-06-16)
    + os       macOS Ventura 13.5
    + system   aarch64, darwin20
    + ui       X11
    + language (EN)
    + collate  en_US.UTF-8
    + ctype    en_US.UTF-8
    + tz       America/New_York
    + date     2023-08-17
    + pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
    +
    +─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
    + package     * version date (UTC) lib source
    + cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
    + colorout      1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
    + colorspace    2.1-0   2023-01-23 [1] CRAN (R 4.3.0)
    + digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
    + dplyr       * 1.1.2   2023-04-20 [1] CRAN (R 4.3.0)
    + evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.0)
    + fansi         1.0.4   2023-01-22 [1] CRAN (R 4.3.0)
    + fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
    + forcats     * 1.0.0   2023-01-29 [1] CRAN (R 4.3.0)
    + generics      0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
    + ggplot2     * 3.4.3   2023-08-14 [1] CRAN (R 4.3.1)
    + glue          1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
    + gtable        0.3.3   2023-03-21 [1] CRAN (R 4.3.0)
    + here        * 1.0.1   2020-12-13 [1] CRAN (R 4.3.0)
    + hms           1.1.3   2023-03-21 [1] CRAN (R 4.3.0)
    + htmltools     0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
    + htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
    + jsonlite      1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
    + knitr         1.43    2023-05-25 [1] CRAN (R 4.3.0)
    + lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
    + lubridate   * 1.9.2   2023-02-10 [1] CRAN (R 4.3.0)
    + magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
    + munsell       0.5.0   2018-06-12 [1] CRAN (R 4.3.0)
    + pillar        1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
    + pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
    + purrr       * 1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
    + R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
    + readr       * 2.1.4   2023-02-10 [1] CRAN (R 4.3.0)
    + rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
    + rmarkdown     2.24    2023-08-14 [1] CRAN (R 4.3.1)
    + rprojroot     2.0.3   2022-04-02 [1] CRAN (R 4.3.0)
    + rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
    + scales        1.2.1   2022-08-20 [1] CRAN (R 4.3.0)
    + sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
    + stringi       1.7.12  2023-01-11 [1] CRAN (R 4.3.0)
    + stringr     * 1.5.0   2022-12-02 [1] CRAN (R 4.3.0)
    + tibble      * 3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
    + tidyr       * 1.3.0   2023-01-24 [1] CRAN (R 4.3.0)
    + tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
    + tidyverse   * 2.0.0   2023-02-22 [1] CRAN (R 4.3.0)
    + timechange    0.2.0   2023-01-11 [1] CRAN (R 4.3.0)
    + tzdb          0.4.0   2023-05-12 [1] CRAN (R 4.3.0)
    + utf8          1.2.3   2023-01-31 [1] CRAN (R 4.3.0)
    + vctrs         0.6.3   2023-06-14 [1] CRAN (R 4.3.0)
    + withr         2.5.0   2022-03-03 [1] CRAN (R 4.3.0)
    + xfun          0.40    2023-08-09 [1] CRAN (R 4.3.0)
    + yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
    +
    + [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
    +
    +──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    +
    +
    -
    ]]> @@ -5426,7 +5931,7 @@ Tip tibble tidyverse https://lcolladotor.github.io/jhustatcomputing2023/posts/08-managing-data-frames-with-tidyverse/index.html - Fri, 18 Aug 2023 01:10:34 GMT + Fri, 18 Aug 2023 01:51:49 GMT 09 - Tidy data and the Tidyverse @@ -5437,6 +5942,7 @@ Tip +

    This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

    “Happy families are all alike; every unhappy family is unhappy in its own way.” —- Leo Tolstoy

    @@ -5579,7 +6085,7 @@ font-style: inherit;">library(tidyverse) relig_income %>% - pivot_longer("respondents") %>% - mutate(# Gather everything EXCEPT religion to tidy datarelig_income %>% - pivot_longer(
    relig_income %>%
    -      pivot_longer("respondents") %>%
    -      mutate(income = factor(income)) %>% 
    -  %>%
    +    group_by(income) %>% 
    -  %>%
    +    summarize(sum(respondents)) %>%
    -      pivot_wider(pivot_wider(
    +        names_from = "income", 
    -              "income",
    +        values_from = "total_respondents") "total_respondents"
    +    ) %>%
    -  knitr    knitr::<- tibble(
    -      "company" :3, each=each = 4), 
    -  4),
    +    "year"  "year" = 2009, 3),
    -      "Q1"    "Q1" = size = 12),
    -      "Q2"    "Q2" = size = 12),
    -      "Q3"    "Q3" = size = 12),
    -      "Q4"    "Q4" = 12),
     
    # A tibble: 12 × 6
        company  year    Q1    Q2    Q3    Q4
          <int> <int> <int> <int> <int> <int>
    - 1       1  2006    34     7    70     7
    - 2       1  2007    72    26    96    64
    - 3       1  2008    62    68    45    98
    - 4       1  2009    45    48    42    92
    - 5       2  2006    51    13    75    36
    - 6       2  2007    49    71    34    93
    - 7       2  2008   100    83    22    71
    - 8       2  2009    91    67    28    80
    - 9       3  2006    19    28    85     1
    -10       3  2007    61    38    65    75
    -11       3  2008    32    57    47    51
    -12       3  2009     4    58    63     0
    + 1       1  2006    99     6    54    47
    + 2       1  2007    28    79    90     9
    + 3       1  2008     7    72    69    24
    + 4       1  2009    16    56     6   100
    + 5       2  2006    42    58    75    25
    + 6       2  2007    64     1   100     6
    + 7       2  2008    43    88    37    77
    + 8       2  2009    95    74    17    44
    + 9       3  2006    34    47    77    38
    +10       3  2007    73    31    31    54
    +11       3  2008     4    49    93     0
    +12       3  2009    57     4    45    96
    # try it yourself 
    +font-style: inherit;"># try it yourself
    @@ -6082,22 +6590,24 @@ font-style: inherit;"># try it yourself
    gapminder %>%
        unite(
            col = "country_continent_year",
            country:year,
            sep = "_"
        )
    # A tibble: 1,704 × 4
        country_continent_year lifeExp      pop gdpPercap
    @@ -6119,34 +6629,37 @@ font-style: inherit;">"_")
    gapminder %>%
        unite(
            col = "country_continent_year",
            country:year,
            sep = "_"
        ) %>%
        separate(
            col = "country_continent_year",
            into = c("country", "continent", "year"),
            sep = "_"
        )
    # A tibble: 1,704 × 6
        country     continent year  lifeExp      pop gdpPercap
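The `unite()` / `separate()` round trip above can be sketched on a small table. The three-row `toy` tibble below is a hypothetical stand-in for `gapminder`, and `convert = TRUE` is an optional addition not used in the lecture code: by default `separate()` returns character columns, so without it `year` would come back as text.

```r
library(tidyr)
library(tibble)

# Hypothetical three-row stand-in for gapminder (illustration only)
toy <- tibble(
    country = c("Afghanistan", "Brazil", "China"),
    continent = c("Asia", "Americas", "Asia"),
    year = c(1952L, 1977L, 2007L)
)

# Paste the three columns into one, separated by "_"
united <- unite(toy, col = "country_continent_year", country:year, sep = "_")

# Split the combined column back into its parts;
# convert = TRUE re-guesses column types, so year becomes an integer again
restored <- separate(
    united,
    col = "country_continent_year",
    into = c("country", "continent", "year"),
    sep = "_",
    convert = TRUE
)

restored
```

This only round-trips cleanly when none of the values contain the separator themselves.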
    @@ -6213,8 +6727,8 @@ font-style: inherit;">"d,e,f,g", "h,i,j")) %>% 
    -  %>%
    +    separate(x, "d,e", "f,g,i")) %>% 
    -  %>%
    +    separate(x, 
    +

    R session information

    +
    +
    options(width = 120)
    +sessioninfo::session_info()
    +
    +
    ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
    + setting  value
    + version  R version 4.3.1 (2023-06-16)
    + os       macOS Ventura 13.5
    + system   aarch64, darwin20
    + ui       X11
    + language (EN)
    + collate  en_US.UTF-8
    + ctype    en_US.UTF-8
    + tz       America/New_York
    + date     2023-08-17
    + pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
    +
    +─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
    + package     * version date (UTC) lib source
    + cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
    + colorout      1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
    + colorspace    2.1-0   2023-01-23 [1] CRAN (R 4.3.0)
    + digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
    + dplyr       * 1.1.2   2023-04-20 [1] CRAN (R 4.3.0)
    + evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.0)
    + fansi         1.0.4   2023-01-22 [1] CRAN (R 4.3.0)
    + fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
    + forcats     * 1.0.0   2023-01-29 [1] CRAN (R 4.3.0)
    + gapminder   * 1.0.0   2023-03-10 [1] CRAN (R 4.3.0)
    + generics      0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
    + ggplot2     * 3.4.3   2023-08-14 [1] CRAN (R 4.3.1)
    + glue          1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
    + gtable        0.3.3   2023-03-21 [1] CRAN (R 4.3.0)
    + hms           1.1.3   2023-03-21 [1] CRAN (R 4.3.0)
    + htmltools     0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
    + htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
    + jsonlite      1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
    + knitr         1.43    2023-05-25 [1] CRAN (R 4.3.0)
    + lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
    + lubridate   * 1.9.2   2023-02-10 [1] CRAN (R 4.3.0)
    + magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
    + munsell       0.5.0   2018-06-12 [1] CRAN (R 4.3.0)
    + pillar        1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
    + pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
    + purrr       * 1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
    + R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
    + readr       * 2.1.4   2023-02-10 [1] CRAN (R 4.3.0)
    + rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
    + rmarkdown     2.24    2023-08-14 [1] CRAN (R 4.3.1)
    + rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
    + scales        1.2.1   2022-08-20 [1] CRAN (R 4.3.0)
    + sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
    + stringi       1.7.12  2023-01-11 [1] CRAN (R 4.3.0)
    + stringr     * 1.5.0   2022-12-02 [1] CRAN (R 4.3.0)
    + tibble      * 3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
    + tidyr       * 1.3.0   2023-01-24 [1] CRAN (R 4.3.0)
    + tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
    + tidyverse   * 2.0.0   2023-02-22 [1] CRAN (R 4.3.0)
    + timechange    0.2.0   2023-01-11 [1] CRAN (R 4.3.0)
    + tzdb          0.4.0   2023-05-12 [1] CRAN (R 4.3.0)
    + utf8          1.2.3   2023-01-31 [1] CRAN (R 4.3.0)
    + vctrs         0.6.3   2023-06-14 [1] CRAN (R 4.3.0)
    + withr         2.5.0   2022-03-03 [1] CRAN (R 4.3.0)
    + xfun          0.40    2023-08-09 [1] CRAN (R 4.3.0)
    + yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
    +
    + [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
    +
    +──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    +
    +
    - ]]> @@ -6293,7 +6891,7 @@ Tip here tidyverse https://lcolladotor.github.io/jhustatcomputing2023/posts/09-tidy-data-and-the-tidyverse/index.html - Fri, 18 Aug 2023 01:10:34 GMT + Fri, 18 Aug 2023 01:51:49 GMT 10 - Joining data in R @@ -6304,6 +6902,7 @@ Tip +

    This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

    Pre-lecture materials

    @@ -6432,7 +7031,7 @@ background-color: null; font-style: inherit;"><- tibble( - id = each = 3), - visit = 2, 3), - outcome = print(outcomes)
    # A tibble: 9 × 3
       id    visit outcome
       <chr> <int>   <dbl>
    -1 a         0   1.54 
    -2 a         1   3.39 
    -3 a         2   3.03 
    -4 b         0   0.309
    -5 b         1   2.52 
    -6 b         2   3.03 
    -7 c         0   2.13 
    -8 c         1   3.12 
    -9 c         2   3.99 
    +1 a         0    3.07
    +2 a         1    3.25
    +3 a         2    3.93
    +4 b         0    2.18
    +5 b         1    2.91
    +6 b         2    2.83
    +7 c         0    1.49
    +8 c         1    2.56
    +9 c         2    1.46

    Note that subjects are labeled by a unique identifier in the id column.

    @@ -6506,7 +7105,7 @@ background-color: null; font-style: inherit;"><-
    tibble(
    - id = "b", "c"), - house =
    subjects
    @@ -6654,15 +7253,15 @@ font-style: inherit;">"id")
    # A tibble: 9 × 4
       id    visit outcome house   
       <chr> <int>   <dbl> <chr>   
    -1 a         0   1.54  detached
    -2 a         1   3.39  detached
    -3 a         2   3.03  detached
    -4 b         0   0.309 rowhouse
    -5 b         1   2.52  rowhouse
    -6 b         2   3.03  rowhouse
    -7 c         0   2.13  rowhouse
    -8 c         1   3.12  rowhouse
    -9 c         2   3.99  rowhouse
    +1 a         0    3.07 detached
    +2 a         1    3.25 detached
    +3 a         2    3.93 detached
    +4 b         0    2.18 rowhouse
    +5 b         1    2.91 rowhouse
    +6 b         2    2.83 rowhouse
    +7 c         0    1.49 rowhouse
    +8 c         1    2.56 rowhouse
    +9 c         2    1.46 rowhouse
    @@ -6687,7 +7286,7 @@ background-color: null; font-style: inherit;"><- tibble( - id = "b", "c"), - visit = 1, 0), - house = "visit"))
    # A tibble: 9 × 4
       id    visit outcome house   
       <chr> <dbl>   <dbl> <chr>   
    -1 a         0   1.54  detached
    -2 a         1   3.39  <NA>    
    -3 a         2   3.03  <NA>    
    -4 b         0   0.309 <NA>    
    -5 b         1   2.52  rowhouse
    -6 b         2   3.03  <NA>    
    -7 c         0   2.13  rowhouse
    -8 c         1   3.12  <NA>    
    -9 c         2   3.99  <NA>    
    +1 a         0    3.07 detached
    +2 a         1    3.25 <NA>
    +3 a         2    3.93 <NA>
    +4 b         0    2.18 <NA>
    +5 b         1    2.91 rowhouse
    +6 b         2    2.83 <NA>
    +7 c         0    1.49 rowhouse
    +8 c         1    2.56 <NA>
    +9 c         2    1.46 <NA>
    @@ -6786,7 +7385,7 @@ background-color: null; font-style: inherit;"><- tibble( - id = "b", "c"), - visit = 1, 0), - house = "visit"))
    # A tibble: 9 × 4
       id    visit outcome house   
       <chr> <dbl>   <dbl> <chr>   
    -1 a         0   1.54  <NA>    
    -2 a         1   3.39  <NA>    
    -3 a         2   3.03  <NA>    
    -4 b         0   0.309 <NA>    
    -5 b         1   2.52  rowhouse
    -6 b         2   3.03  <NA>    
    -7 c         0   2.13  rowhouse
    -8 c         1   3.12  <NA>    
    -9 c         2   3.99  <NA>    
    +1 a         0    3.07 <NA>
    +2 a         1    3.25 <NA>
    +3 a         2    3.93 <NA>
    +4 b         0    2.18 <NA>
    +5 b         1    2.91 rowhouse
    +6 b         2    2.83 <NA>
    +7 c         0    1.49 rowhouse
    +8 c         1    2.56 <NA>
    +9 c         2    1.46 <NA>
    @@ -6897,8 +7496,8 @@ font-style: inherit;">"visit"))
    # A tibble: 2 × 4
       id    visit outcome house   
       <chr> <dbl>   <dbl> <chr>   
    -1 b         1    2.52 rowhouse
    -2 c         0    2.13 rowhouse
    +1 b         1    2.91 rowhouse
    +2 c         0    1.49 rowhouse
    @@ -6925,8 +7524,8 @@ font-style: inherit;">"visit"
    ))
    # A tibble: 2 × 4
       id    visit outcome house   
       <chr> <dbl>   <dbl> <chr>   
    -1 b         1    2.52 rowhouse
    -2 c         0    2.13 rowhouse
    +1 b         1    2.91 rowhouse
    +2 c         0    1.49 rowhouse
    @@ -6966,7 +7565,8 @@ font-style: inherit;"># Create first example data frame
    df1 <- data.frame(
        ID = 1:3,
        X1 = c("a1", "a2", "a3")
    )
    # Create second example data frame
    df2 <- data.frame(
        ID = 2:4,
        X2 = c("b1", "b2", "b3")
    )
    1. Try changing the order of the tables in the joins above, e.g. inner_join(df2, df1), semi_join(df2, df1), and anti_join(df2, df1). What changed? What did not change?
    2. @@ -7039,9 +7642,92 @@ Tip + + +
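One way to explore the first question is to store each join in both orders and compare the results side by side. This is a minimal sketch: it redefines `df1` and `df2` so that it runs on its own, and the `by = "ID"` argument and the result names (`inner_12`, `inner_21`, and so on) are choices made here for illustration, not part of the exercise.

```r
library(dplyr)

# Same two example data frames as in the exercise
df1 <- data.frame(ID = 1:3, X1 = c("a1", "a2", "a3"))
df2 <- data.frame(ID = 2:4, X2 = c("b1", "b2", "b3"))

# inner_join(): keeps rows whose ID appears in both tables
inner_12 <- inner_join(df1, df2, by = "ID")
inner_21 <- inner_join(df2, df1, by = "ID")

# semi_join(): keeps rows of the first table that have a match in the second
semi_12 <- semi_join(df1, df2, by = "ID")
semi_21 <- semi_join(df2, df1, by = "ID")

# anti_join(): keeps rows of the first table that have no match in the second
anti_12 <- anti_join(df1, df2, by = "ID")
anti_21 <- anti_join(df2, df1, by = "ID")

# Print each pair and compare rows and columns
inner_12
inner_21
anti_12
anti_21
```

Comparing `names()` and `nrow()` of each pair is a quick way to see which joins are sensitive to argument order in their rows, in their columns, or in both.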
      +

      R session information

      +
      +
      options(width = 120)
      +sessioninfo::session_info()
      +
      +
      ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
      + setting  value
      + version  R version 4.3.1 (2023-06-16)
      + os       macOS Ventura 13.5
      + system   aarch64, darwin20
      + ui       X11
      + language (EN)
      + collate  en_US.UTF-8
      + ctype    en_US.UTF-8
      + tz       America/New_York
      + date     2023-08-17
      + pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
      +
      +─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
      + package     * version date (UTC) lib source
      + cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
      + colorout      1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
      + colorspace    2.1-0   2023-01-23 [1] CRAN (R 4.3.0)
      + digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
      + dplyr       * 1.1.2   2023-04-20 [1] CRAN (R 4.3.0)
      + evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.0)
      + fansi         1.0.4   2023-01-22 [1] CRAN (R 4.3.0)
      + fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
      + forcats     * 1.0.0   2023-01-29 [1] CRAN (R 4.3.0)
      + generics      0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
      + ggplot2     * 3.4.3   2023-08-14 [1] CRAN (R 4.3.1)
      + glue          1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
      + gtable        0.3.3   2023-03-21 [1] CRAN (R 4.3.0)
      + hms           1.1.3   2023-03-21 [1] CRAN (R 4.3.0)
      + htmltools     0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
      + htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
      + jsonlite      1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
      + knitr       * 1.43    2023-05-25 [1] CRAN (R 4.3.0)
      + lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
      + lubridate   * 1.9.2   2023-02-10 [1] CRAN (R 4.3.0)
      + magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
      + munsell       0.5.0   2018-06-12 [1] CRAN (R 4.3.0)
      + pillar        1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
      + pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
      + purrr       * 1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
      + R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
      + readr       * 2.1.4   2023-02-10 [1] CRAN (R 4.3.0)
      + rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
      + rmarkdown     2.24    2023-08-14 [1] CRAN (R 4.3.1)
      + rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
      + scales        1.2.1   2022-08-20 [1] CRAN (R 4.3.0)
      + sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
      + stringi       1.7.12  2023-01-11 [1] CRAN (R 4.3.0)
      + stringr     * 1.5.0   2022-12-02 [1] CRAN (R 4.3.0)
      + tibble      * 3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
      + tidyr       * 1.3.0   2023-01-24 [1] CRAN (R 4.3.0)
      + tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
      + tidyverse   * 2.0.0   2023-02-22 [1] CRAN (R 4.3.0)
      + timechange    0.2.0   2023-01-11 [1] CRAN (R 4.3.0)
      + tzdb          0.4.0   2023-05-12 [1] CRAN (R 4.3.0)
      + utf8          1.2.3   2023-01-31 [1] CRAN (R 4.3.0)
      + vctrs         0.6.3   2023-06-14 [1] CRAN (R 4.3.0)
      + withr         2.5.0   2022-03-03 [1] CRAN (R 4.3.0)
      + xfun          0.40    2023-08-09 [1] CRAN (R 4.3.0)
      + yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
      +
      + [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
      +
      +──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
      +
      +
      -
      ]]> @@ -7053,7 +7739,7 @@ Tip here tidyverse https://lcolladotor.github.io/jhustatcomputing2023/posts/10-joining-data-in-r/index.html - Fri, 18 Aug 2023 01:10:34 GMT + Fri, 18 Aug 2023 01:51:49 GMT 11 - Plotting Systems @@ -7064,6 +7750,7 @@ Tip +

      This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

      The data may not contain the answer. And, if you torture the data long enough, it will tell you anything. —John W. Tukey

      @@ -7165,10 +7852,10 @@ font-style: inherit;">data(airquality) with(airquality, { - plot(Temp, Ozone) - lines(data(airquality) with(airquality, { - plot(Temp, Ozone, main = "my plot") - lines(data(mpg) mpg %>% - ggplot(aes(displ, hwy)) + - + + geom_point()
      @@ -7458,11 +8145,97 @@ font-style: inherit;">geom_point()
      -

      There are additional functions in ggplot2 that allow you to make arbitrarily sophisticated plots.

      -

      We will discuss more about this in the next lecture.

      +

      There are additional functions in ggplot2 that allow you to make arbitrarily sophisticated plots.

      +

      We will discuss more about this in the next lecture.
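As a small preview of what those additional functions look like, the sketch below extends the `mpg` scatterplot with a few more layers. The particular choices here (coloring by `drv`, a linear smoother, `facet_wrap()`, the axis labels, and `theme_bw()`) are illustrative assumptions, not the plot built in the lecture.

```r
library(ggplot2)

# mpg ships with ggplot2; every layer below is optional and can be swapped out
p <- ggplot(mpg, aes(displ, hwy)) +
    geom_point(aes(color = drv), alpha = 1 / 2) + # points colored by drive train
    geom_smooth(method = "lm", se = FALSE) + # linear fit, no confidence band
    facet_wrap(~drv) + # one panel per drive train
    labs(
        x = "Engine displacement (L)",
        y = "Highway miles per gallon"
    ) +
    theme_bw()

p
```

Each `+` adds one component to the plot object, so layers can be added or removed one at a time while exploring.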

      + + +
      +

      R session information

      +
      +
      options(width = 120)
      +sessioninfo::session_info()
      +
      +
      ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
      + setting  value
      + version  R version 4.3.1 (2023-06-16)
      + os       macOS Ventura 13.5
      + system   aarch64, darwin20
      + ui       X11
      + language (EN)
      + collate  en_US.UTF-8
      + ctype    en_US.UTF-8
      + tz       America/New_York
      + date     2023-08-17
      + pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
      +
      +─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
      + package     * version date (UTC) lib source
      + cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
      + colorout      1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
      + colorspace    2.1-0   2023-01-23 [1] CRAN (R 4.3.0)
      + digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
      + dplyr       * 1.1.2   2023-04-20 [1] CRAN (R 4.3.0)
      + evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.0)
      + fansi         1.0.4   2023-01-22 [1] CRAN (R 4.3.0)
      + farver        2.1.1   2022-07-06 [1] CRAN (R 4.3.0)
      + fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
      + forcats     * 1.0.0   2023-01-29 [1] CRAN (R 4.3.0)
      + generics      0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
      + ggplot2     * 3.4.3   2023-08-14 [1] CRAN (R 4.3.1)
      + glue          1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
      + gtable        0.3.3   2023-03-21 [1] CRAN (R 4.3.0)
      + hms           1.1.3   2023-03-21 [1] CRAN (R 4.3.0)
      + htmltools     0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
      + htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
      + jsonlite      1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
      + knitr         1.43    2023-05-25 [1] CRAN (R 4.3.0)
      + labeling      0.4.2   2020-10-20 [1] CRAN (R 4.3.0)
      + lattice     * 0.21-8  2023-04-05 [1] CRAN (R 4.3.1)
      + lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
      + lubridate   * 1.9.2   2023-02-10 [1] CRAN (R 4.3.0)
      + magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
      + munsell       0.5.0   2018-06-12 [1] CRAN (R 4.3.0)
      + pillar        1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
      + pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
      + purrr       * 1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
      + R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
      + readr       * 2.1.4   2023-02-10 [1] CRAN (R 4.3.0)
      + rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
      + rmarkdown     2.24    2023-08-14 [1] CRAN (R 4.3.1)
      + rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
      + scales        1.2.1   2022-08-20 [1] CRAN (R 4.3.0)
      + sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
      + stringi       1.7.12  2023-01-11 [1] CRAN (R 4.3.0)
      + stringr     * 1.5.0   2022-12-02 [1] CRAN (R 4.3.0)
      + tibble      * 3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
      + tidyr       * 1.3.0   2023-01-24 [1] CRAN (R 4.3.0)
      + tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
      + tidyverse   * 2.0.0   2023-02-22 [1] CRAN (R 4.3.0)
      + timechange    0.2.0   2023-01-11 [1] CRAN (R 4.3.0)
      + tzdb          0.4.0   2023-05-12 [1] CRAN (R 4.3.0)
      + utf8          1.2.3   2023-01-31 [1] CRAN (R 4.3.0)
      + vctrs         0.6.3   2023-06-14 [1] CRAN (R 4.3.0)
      + withr         2.5.0   2022-03-03 [1] CRAN (R 4.3.0)
      + xfun          0.40    2023-08-09 [1] CRAN (R 4.3.0)
      + yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
      +
      + [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
      +
      +──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
      +
      +
      -
      ]]> @@ -7473,7 +8246,7 @@ font-style: inherit;">geom_point
      ()
      ggplot2 data viz https://lcolladotor.github.io/jhustatcomputing2023/posts/11-plotting-systems/index.html - Fri, 18 Aug 2023 01:10:34 GMT + Fri, 18 Aug 2023 01:51:49 GMT
      12 - The ggplot2 plotting system: qplot() @@ -7484,6 +8257,7 @@ font-style: inherit;">geom_point() +

      This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

      “The greatest value of a picture is when it forces us to notice what we never expected to see.” —John Tukey

      @@ -7583,11 +8357,11 @@ Example
      with(airquality, { 
      -        with(airquality, {
      +    plot(Temp, Ozone)
      -            lines(library(tidyverse)
      airquality %>%
          ggplot(aes(Temp, Ozone)) +
          geom_point() +
          geom_smooth(
              method = "loess",
              se = FALSE
          ) +
          theme_minimal()
      @@ -7976,8 +8752,8 @@ $ year <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007,
      penguins %>%
          count(species)
      @@ -8290,8 +9066,8 @@ font-style: inherit;">facets = . ~ drv) + - + + geom_smooth(data = maacs, color = mopos) + - + + geom_smooth(facets = . ~ mopos) + - + + geom_smooth( +

      R session information

      +
      +
      options(width = 120)
      +sessioninfo::session_info()
      +
      +
      ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
      + setting  value
      + version  R version 4.3.1 (2023-06-16)
      + os       macOS Ventura 13.5
      + system   aarch64, darwin20
      + ui       X11
      + language (EN)
      + collate  en_US.UTF-8
      + ctype    en_US.UTF-8
      + tz       America/New_York
      + date     2023-08-17
      + pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
      +
      +─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
      + package        * version date (UTC) lib source
      + bit              4.0.5   2022-11-15 [1] CRAN (R 4.3.0)
      + bit64            4.0.5   2020-08-30 [1] CRAN (R 4.3.0)
      + cli              3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
      + colorout         1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
      + colorspace       2.1-0   2023-01-23 [1] CRAN (R 4.3.0)
      + crayon           1.5.2   2022-09-29 [1] CRAN (R 4.3.0)
      + digest           0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
      + dplyr          * 1.1.2   2023-04-20 [1] CRAN (R 4.3.0)
      + evaluate         0.21    2023-05-05 [1] CRAN (R 4.3.0)
      + fansi            1.0.4   2023-01-22 [1] CRAN (R 4.3.0)
      + farver           2.1.1   2022-07-06 [1] CRAN (R 4.3.0)
      + fastmap          1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
      + forcats        * 1.0.0   2023-01-29 [1] CRAN (R 4.3.0)
      + generics         0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
      + ggplot2        * 3.4.3   2023-08-14 [1] CRAN (R 4.3.1)
      + glue             1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
      + gtable           0.3.3   2023-03-21 [1] CRAN (R 4.3.0)
      + here           * 1.0.1   2020-12-13 [1] CRAN (R 4.3.0)
      + hms              1.1.3   2023-03-21 [1] CRAN (R 4.3.0)
      + htmltools        0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
      + htmlwidgets      1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
      + jsonlite         1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
      + knitr            1.43    2023-05-25 [1] CRAN (R 4.3.0)
      + labeling         0.4.2   2020-10-20 [1] CRAN (R 4.3.0)
      + lattice          0.21-8  2023-04-05 [1] CRAN (R 4.3.1)
      + lifecycle        1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
      + lubridate      * 1.9.2   2023-02-10 [1] CRAN (R 4.3.0)
      + magrittr         2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
      + Matrix           1.6-1   2023-08-14 [1] CRAN (R 4.3.0)
      + mgcv             1.9-0   2023-07-11 [1] CRAN (R 4.3.0)
      + munsell          0.5.0   2018-06-12 [1] CRAN (R 4.3.0)
      + nlme             3.1-163 2023-08-09 [1] CRAN (R 4.3.0)
      + palmerpenguins * 0.1.1   2022-08-15 [1] CRAN (R 4.3.0)
      + pillar           1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
      + pkgconfig        2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
      + purrr          * 1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
      + R6               2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
      + readr          * 2.1.4   2023-02-10 [1] CRAN (R 4.3.0)
      + rlang            1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
      + rmarkdown        2.24    2023-08-14 [1] CRAN (R 4.3.1)
      + rprojroot        2.0.3   2022-04-02 [1] CRAN (R 4.3.0)
      + rstudioapi       0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
      + scales           1.2.1   2022-08-20 [1] CRAN (R 4.3.0)
      + sessioninfo      1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
      + stringi          1.7.12  2023-01-11 [1] CRAN (R 4.3.0)
      + stringr        * 1.5.0   2022-12-02 [1] CRAN (R 4.3.0)
      + tibble         * 3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
      + tidyr          * 1.3.0   2023-01-24 [1] CRAN (R 4.3.0)
      + tidyselect       1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
      + tidyverse      * 2.0.0   2023-02-22 [1] CRAN (R 4.3.0)
      + timechange       0.2.0   2023-01-11 [1] CRAN (R 4.3.0)
      + tzdb             0.4.0   2023-05-12 [1] CRAN (R 4.3.0)
      + utf8             1.2.3   2023-01-31 [1] CRAN (R 4.3.0)
      + vctrs            0.6.3   2023-06-14 [1] CRAN (R 4.3.0)
      + vroom            1.6.3   2023-04-28 [1] CRAN (R 4.3.0)
      + withr            2.5.0   2022-03-03 [1] CRAN (R 4.3.0)
      + xfun             0.40    2023-08-09 [1] CRAN (R 4.3.0)
      + yaml             2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
      +
      + [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
      +
      +──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
      +
      +
      - ]]> @@ -8719,7 +9591,7 @@ Tip ggplot2 data viz https://lcolladotor.github.io/jhustatcomputing2023/posts/12-ggplot2-plotting-system-part-1/index.html - Fri, 18 Aug 2023 01:10:34 GMT + Fri, 18 Aug 2023 01:51:49 GMT 13 - The ggplot2 plotting system: ggplot() @@ -8730,6 +9602,7 @@ Tip +

      This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

      Pre-lecture materials

      @@ -8888,12 +9761,13 @@ background-color: null; font-style: inherit;">"data", "bmi_pm25_no2_sim.csv"), - col_types = "nnci") -maacs
      +font-style: inherit;">"nnci" +) +maacs
      # A tibble: 517 × 4
          logpm25 logno2_new bmicat        NocturnalSympt
      @@ -8943,13 +9817,15 @@ font-style: inherit;"><- ggplot(maacs, aes(aes(
      +    x = logpm25, 
      -                       x = logpm25,
      +    y = NocturnalSympt))
      -y = NocturnalSympt
      +))
      +summary(g)
      @@ -8989,7 +9865,7 @@ background-color: null; font-style: inherit;"><- maacs %>% - ggplot(<- maacs %>% - ggplot(geom_point()
      g +
          geom_point() +
          geom_smooth()
      @@ -9067,13 +9943,13 @@ font-style: inherit;">geom_smooth()
      g + 
      -  +
      +    geom_point() + 
      -  +
      +    geom_smooth(# try it yourself
       library(palmerpenguins)
      -penguins 
      +penguins
      # A tibble: 344 × 8
          species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
      @@ -9149,13 +10025,13 @@ Example
       
      g +
          geom_point() +
          geom_smooth(method = "lm") +
          facet_grid(. ~ bmicat)
      @@ -9218,9 +10094,9 @@ font-style: inherit;">4, alpha = 11 // 2)
      @@ -9257,9 +10133,9 @@ font-style: inherit;">4, alpha = 11 // 2)
      @@ -9280,52 +10156,55 @@ font-style: inherit;">2)
      g +
          geom_point(aes(color = bmicat),
              size = 2,
              alpha = 1 / 2
          ) +
          geom_smooth(
              size = 4,
              linetype = 3,
              method = "lm",
              se = FALSE
          )
      Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
       ℹ Please use `linewidth` instead.
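The warning above can be avoided by passing `linewidth` instead of `size` to line-based geoms. The sketch below assumes ggplot2 3.4.0 or later and uses the built-in `mpg` data as a stand-in for the lecture's data set; `size` remains the right argument for points.

```r
library(ggplot2)

# mpg ships with ggplot2 and stands in for the lecture's data here
p <- ggplot(mpg, aes(displ, hwy)) +
    geom_point(size = 2, alpha = 1 / 2) + # size still controls point size
    geom_smooth(
        linewidth = 4, # replaces size = 4 for lines in ggplot2 >= 3.4.0
        linetype = 3,
        method = "lm",
        se = FALSE
    )

p
```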
      @@ -9368,8 +10247,8 @@ Note
      g + 
      -  +
      +    geom_point(aes(color = bmicat)) + 
      -  +
      +    theme_bw(# try it yourself
       library(palmerpenguins)
      -penguins 
      +penguins
      # A tibble: 344 × 8
          species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
      @@ -9460,8 +10339,8 @@ Note
       
      g +
          geom_point(aes(color = bmicat)) +
          labs(title = "MAACS Cohort") +
          labs(
              x = expression("log " * PM[2.5]),
              y = "Nocturnal Symptoms"
          )
      @@ -9517,7 +10398,8 @@ font-style: inherit;">"Nocturnal Symptoms")
      testdat <- data.frame(
          x = 1:100,
          y = rnorm(100)
      )
      testdat[50, 2] <- 100 ## Outlier!
      plot(testdat$x,
          testdat$y,
          type = "l",
          ylim = c(-3, 3)
      )
      @@ -9619,13 +10503,13 @@ font-style: inherit;">geom_line()
      g + 
      -  +
      +    geom_line() + 
      -  +
      +    ylim(3)
      g + 
      -  +
      +    geom_line() + 
      -  +
      +    coord_cartesian(<- maacs %>%
      -            ggplot(geom_point(alpha = 11 // 3) + 
      -        +
      +    facet_grid(bmicat ~ no2tert) + 
      -        +
      +    geom_smooth(method=method = "lm", se=se = FALSE, col=col = "steelblue") + 
      -        +
      +    theme_bw(base_size = 10) + 
      -        +
      +    labs(* PM[2.5])) + 
      -        +
      +    labs(y = "Nocturnal Symptoms") + 
      -        +
      +    labs(
      +

      R session information

      +
      +
      options(width = 120)
      +sessioninfo::session_info()
      +
      +
      ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
      + setting  value
      + version  R version 4.3.1 (2023-06-16)
      + os       macOS Ventura 13.5
      + system   aarch64, darwin20
      + ui       X11
      + language (EN)
      + collate  en_US.UTF-8
      + ctype    en_US.UTF-8
      + tz       America/New_York
      + date     2023-08-17
      + pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
      +
      +─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
      + package        * version date (UTC) lib source
      + bit              4.0.5   2022-11-15 [1] CRAN (R 4.3.0)
      + bit64            4.0.5   2020-08-30 [1] CRAN (R 4.3.0)
      + cli              3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
      + colorout         1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
      + colorspace       2.1-0   2023-01-23 [1] CRAN (R 4.3.0)
      + crayon           1.5.2   2022-09-29 [1] CRAN (R 4.3.0)
      + digest           0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
      + dplyr          * 1.1.2   2023-04-20 [1] CRAN (R 4.3.0)
      + evaluate         0.21    2023-05-05 [1] CRAN (R 4.3.0)
      + fansi            1.0.4   2023-01-22 [1] CRAN (R 4.3.0)
      + farver           2.1.1   2022-07-06 [1] CRAN (R 4.3.0)
      + fastmap          1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
      + forcats        * 1.0.0   2023-01-29 [1] CRAN (R 4.3.0)
      + generics         0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
      + ggplot2        * 3.4.3   2023-08-14 [1] CRAN (R 4.3.1)
      + glue             1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
      + gtable           0.3.3   2023-03-21 [1] CRAN (R 4.3.0)
      + here           * 1.0.1   2020-12-13 [1] CRAN (R 4.3.0)
      + hms              1.1.3   2023-03-21 [1] CRAN (R 4.3.0)
      + htmltools        0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
      + htmlwidgets      1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
      + jsonlite         1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
      + knitr            1.43    2023-05-25 [1] CRAN (R 4.3.0)
      + labeling         0.4.2   2020-10-20 [1] CRAN (R 4.3.0)
      + lattice          0.21-8  2023-04-05 [1] CRAN (R 4.3.1)
      + lifecycle        1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
      + lubridate      * 1.9.2   2023-02-10 [1] CRAN (R 4.3.0)
      + magrittr         2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
      + Matrix           1.6-1   2023-08-14 [1] CRAN (R 4.3.0)
      + mgcv             1.9-0   2023-07-11 [1] CRAN (R 4.3.0)
      + munsell          0.5.0   2018-06-12 [1] CRAN (R 4.3.0)
      + nlme             3.1-163 2023-08-09 [1] CRAN (R 4.3.0)
      + palmerpenguins * 0.1.1   2022-08-15 [1] CRAN (R 4.3.0)
      + pillar           1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
      + pkgconfig        2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
      + purrr          * 1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
      + R6               2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
      + readr          * 2.1.4   2023-02-10 [1] CRAN (R 4.3.0)
      + rlang            1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
      + rmarkdown        2.24    2023-08-14 [1] CRAN (R 4.3.1)
      + rprojroot        2.0.3   2022-04-02 [1] CRAN (R 4.3.0)
      + rstudioapi       0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
      + scales           1.2.1   2022-08-20 [1] CRAN (R 4.3.0)
      + sessioninfo      1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
      + stringi          1.7.12  2023-01-11 [1] CRAN (R 4.3.0)
      + stringr        * 1.5.0   2022-12-02 [1] CRAN (R 4.3.0)
      + tibble         * 3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
      + tidyr          * 1.3.0   2023-01-24 [1] CRAN (R 4.3.0)
      + tidyselect       1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
      + tidyverse      * 2.0.0   2023-02-22 [1] CRAN (R 4.3.0)
      + timechange       0.2.0   2023-01-11 [1] CRAN (R 4.3.0)
      + tzdb             0.4.0   2023-05-12 [1] CRAN (R 4.3.0)
      + utf8             1.2.3   2023-01-31 [1] CRAN (R 4.3.0)
      + vctrs            0.6.3   2023-06-14 [1] CRAN (R 4.3.0)
      + vroom            1.6.3   2023-04-28 [1] CRAN (R 4.3.0)
      + withr            2.5.0   2022-03-03 [1] CRAN (R 4.3.0)
      + xfun             0.40    2023-08-09 [1] CRAN (R 4.3.0)
      + yaml             2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
      +
      + [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
      +
      +──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
      +
      +
      - ]]> @@ -9922,7 +10902,7 @@ Tip ggplot2 data viz https://lcolladotor.github.io/jhustatcomputing2023/posts/13-ggplot2-plotting-system-part-2/index.html - Fri, 18 Aug 2023 01:10:34 GMT + Fri, 18 Aug 2023 01:51:49 GMT 14 - R Nuts and Bolts @@ -9933,6 +10913,7 @@ Tip +

      This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

      Pre-lecture materials

      @@ -11572,9 +12553,93 @@ Tip
      + + +
      +

      R session information

      +
      +
      options(width = 120)
      +sessioninfo::session_info()
      +
      +
      ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
      + setting  value
      + version  R version 4.3.1 (2023-06-16)
      + os       macOS Ventura 13.5
      + system   aarch64, darwin20
      + ui       X11
      + language (EN)
      + collate  en_US.UTF-8
      + ctype    en_US.UTF-8
      + tz       America/New_York
      + date     2023-08-17
      + pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
      +
      +─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
      + package        * version date (UTC) lib source
      + cli              3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
      + colorout         1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
      + colorspace       2.1-0   2023-01-23 [1] CRAN (R 4.3.0)
      + digest           0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
      + dplyr          * 1.1.2   2023-04-20 [1] CRAN (R 4.3.0)
      + evaluate         0.21    2023-05-05 [1] CRAN (R 4.3.0)
      + fansi            1.0.4   2023-01-22 [1] CRAN (R 4.3.0)
      + fastmap          1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
      + forcats        * 1.0.0   2023-01-29 [1] CRAN (R 4.3.0)
      + generics         0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
      + ggplot2        * 3.4.3   2023-08-14 [1] CRAN (R 4.3.1)
      + glue             1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
      + gtable           0.3.3   2023-03-21 [1] CRAN (R 4.3.0)
      + hms              1.1.3   2023-03-21 [1] CRAN (R 4.3.0)
      + htmltools        0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
      + htmlwidgets      1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
      + jsonlite         1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
      + knitr            1.43    2023-05-25 [1] CRAN (R 4.3.0)
      + lifecycle        1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
      + lubridate      * 1.9.2   2023-02-10 [1] CRAN (R 4.3.0)
      + magrittr         2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
      + munsell          0.5.0   2018-06-12 [1] CRAN (R 4.3.0)
      + palmerpenguins * 0.1.1   2022-08-15 [1] CRAN (R 4.3.0)
      + pillar           1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
      + pkgconfig        2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
      + purrr          * 1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
      + R6               2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
      + readr          * 2.1.4   2023-02-10 [1] CRAN (R 4.3.0)
      + rlang            1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
      + rmarkdown        2.24    2023-08-14 [1] CRAN (R 4.3.1)
      + rstudioapi       0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
      + scales           1.2.1   2022-08-20 [1] CRAN (R 4.3.0)
      + sessioninfo      1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
      + stringi          1.7.12  2023-01-11 [1] CRAN (R 4.3.0)
      + stringr        * 1.5.0   2022-12-02 [1] CRAN (R 4.3.0)
      + tibble         * 3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
      + tidyr          * 1.3.0   2023-01-24 [1] CRAN (R 4.3.0)
      + tidyselect       1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
      + tidyverse      * 2.0.0   2023-02-22 [1] CRAN (R 4.3.0)
      + timechange       0.2.0   2023-01-11 [1] CRAN (R 4.3.0)
      + tzdb             0.4.0   2023-05-12 [1] CRAN (R 4.3.0)
      + utf8             1.2.3   2023-01-31 [1] CRAN (R 4.3.0)
      + vctrs            0.6.3   2023-06-14 [1] CRAN (R 4.3.0)
      + withr            2.5.0   2022-03-03 [1] CRAN (R 4.3.0)
      + xfun             0.40    2023-08-09 [1] CRAN (R 4.3.0)
      + yaml             2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
      +
      + [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
      +
      +──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
      +
      +
      -
      ]]> @@ -11583,7 +12648,7 @@ Tip R programming https://lcolladotor.github.io/jhustatcomputing2023/posts/14-r-nuts-and-bolts/index.html - Fri, 18 Aug 2023 01:10:34 GMT + Fri, 18 Aug 2023 01:51:49 GMT 15 - Control Structures @@ -11594,6 +12659,7 @@ Tip +

      This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

      Pre-lecture materials

      @@ -11751,20 +12817,20 @@ font-style: inherit;"><- runif(n=n = 1, min=min = 0, max=max = 10) +font-style: inherit;">10) x
      -
      [1] 1.907048
      +
      [1] 3.521267

      Then, we can write an if-else statement that tests whether x is greater than 3 or not.

      @@ -11775,14 +12841,14 @@ font-style: inherit;">> 3
      -
      [1] FALSE
      +
      [1] TRUE

      If x is greater than 3, then the first condition occurs. If x is not greater than 3, then the second condition occurs.

      if(x if (x > <- 10
      -  } } else {
           y <- 0
      -  }
      +}

      Finally, we can auto print y to see what the value is.

      y
      -
      [1] 0
      +
      [1] 10

      This expression can also be written in a different (but equivalent!) way in R.

      @@ -11815,7 +12881,7 @@ font-style: inherit;">0 background-color: null; font-style: inherit;"><- if(x if (x > 3) { 10 - } } else { +font-style: inherit;">else { 0 - } +} y
      -
      [1] 0
      +
      [1] 10
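
      As a minimal sketch of the two equivalent forms described above, the chunk below writes the same test once as a statement and once as an expression. The seed and the names `y_statement` and `y_expression` are assumptions added for illustration; they are not part of the lecture.

```r
set.seed(1) # hypothetical seed, only so the sketch is reproducible
x <- runif(n = 1, min = 0, max = 10)

## Statement form: assign inside each branch
if (x > 3) {
    y_statement <- 10
} else {
    y_statement <- 0
}

## Expression form: if/else returns a value, which is assigned once
y_expression <- if (x > 3) {
    10
} else {
    0
}

## Both forms give the same result
identical(y_statement, y_expression)
```

      The expression form avoids repeating the assignment in each branch, which can help when the branches grow.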
      @@ -11898,7 +12964,7 @@ font-style: inherit;">library(tidyverse) library(palmerpenguins) -penguins
      +penguins
      # A tibble: 344 × 8
          species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
      @@ -11928,7 +12994,7 @@ font-style: inherit;">library(palmerpenguins)
       
      for(i for (i in :10) {
      -            print(i)
       }
      @@ -11979,7 +13045,7 @@ background-color: null; font-style: italic;">## create for loop for(i for (i in :4) { - ## Print out each element of 'x' - print(x[i]) +font-style: inherit;">print(x[i]) }
      [1] "a"
      @@ -12026,7 +13092,7 @@ background-color: null;
       font-style: italic;">## create for loop
       for(i for (i in :4) {
      -            ## Print out just 'i'
      -            print(i)
       }
      @@ -12072,12 +13138,12 @@ background-color: null; font-style: italic;">## Generate a sequence based on length of 'x' for(i for (i in seq_along(x)) { - seq_along(x)) { + print(x[i]) } @@ -12092,10 +13158,10 @@ font-style: inherit;">print(x[i])
      for(babyshark for (babyshark in x) {
      -            print(babyshark)
       }
      @@ -12109,10 +13175,10 @@ font-style: inherit;">print(babyshark)
      for(candyisgreat for (candyisgreat in x) {
      -            print(candyisgreat)
       }
      @@ -12126,10 +13192,10 @@ font-style: inherit;">print(candyisgreat)
      for(RememberToVote for (RememberToVote in x) {
      -            print(RememberToVote)
       }
      @@ -12144,28 +13210,28 @@ font-style: inherit;">print(RememberToVote)
      for(for (1999 in x) {
      -            print(1999)
       }
      -
      Error: <text>:1:5: unexpected numeric constant
      -1: for(1999
      -        ^
      +
      Error: <text>:1:6: unexpected numeric constant
      +1: for (1999
      +         ^

      For one-line loops, the curly braces are not strictly necessary.

      for(i for (i in 3)
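
      To make the point about braces concrete, here is a small sketch of the same loop written both ways. The vector `x` is assumed to be a four-element character vector, as the earlier loop output suggests.

```r
x <- c("a", "b", "c", "d") # assumed example vector

## With braces
for (i in seq_along(x)) {
    print(x[i])
}

## Without braces: the loop body is the single expression that follows
for (i in seq_along(x)) print(x[i])
```

      Keeping the braces is usually the safer habit, since adding a second line to a brace-free loop silently moves that line outside the loop.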
       
      for(i for (i in seq_len(nrow(x))) {
      -            for(j for (j in seq_len(ncol(x))) {
      -                        print(x[i, j])
      -        }   
      +    }
       }
      [1] 1
      @@ -12304,15 +13370,15 @@ background-color: null;
       font-style: inherit;">0
       while(count while (count < 10) {
      -            print(count)
      -        count     count <- count 1)
       
       while(z while (z >= <= 10) {
      -        coin     coin <- 1, 0.5)
      -        
      -        
      +    if(coin if (coin == 1) {  1) { ## random walk
      -                z         z <- z + 1
      -        }     } else {
      -                z         z <- z - 1
      -        } 
      +    }
       }
       1e-8
       repeat {
      -        x1     x1 <- computeEstimate()
      -        
      -        
      +    if(if (abs(x1 - x0) < tol) {  < tol) { ## Close enough?
      -                        break
      -        }     } else {
      -                x0         x0 <- x1
      -        } 
      +    }
       }
      @@ -12608,7 +13674,7 @@ Pro-tip
      for(i for (i in :100) {
      -            if(i if (i <= 20) {
      -                        ## Skip the first 20 iterations
      -                        next                 
      -        }
      -        next
      +    }
      +    ## Do something here
       }
      @@ -12640,7 +13706,7 @@ font-style: italic;">## Do something here
      for(i for (i in :100) {
      -          print(i)
       
      -          if(i if (i > 20) {
      -                      ## Stop loop after 20 iterations
      -                      break  
      -      }     
      +font-style: inherit;">break
      +    }
       }
      @@ -12721,9 +13787,93 @@ Tip
      + + +
      +

      R session information

      +
      +
      options(width = 120)
      +sessioninfo::session_info()
      +
      +
      ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
      + setting  value
      + version  R version 4.3.1 (2023-06-16)
      + os       macOS Ventura 13.5
      + system   aarch64, darwin20
      + ui       X11
      + language (EN)
      + collate  en_US.UTF-8
      + ctype    en_US.UTF-8
      + tz       America/New_York
      + date     2023-08-17
      + pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
      +
      +─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
      + package        * version date (UTC) lib source
      + cli              3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
      + colorout         1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
      + colorspace       2.1-0   2023-01-23 [1] CRAN (R 4.3.0)
      + digest           0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
      + dplyr          * 1.1.2   2023-04-20 [1] CRAN (R 4.3.0)
      + evaluate         0.21    2023-05-05 [1] CRAN (R 4.3.0)
      + fansi            1.0.4   2023-01-22 [1] CRAN (R 4.3.0)
      + fastmap          1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
      + forcats        * 1.0.0   2023-01-29 [1] CRAN (R 4.3.0)
      + generics         0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
      + ggplot2        * 3.4.3   2023-08-14 [1] CRAN (R 4.3.1)
      + glue             1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
      + gtable           0.3.3   2023-03-21 [1] CRAN (R 4.3.0)
      + hms              1.1.3   2023-03-21 [1] CRAN (R 4.3.0)
      + htmltools        0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
      + htmlwidgets      1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
      + jsonlite         1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
      + knitr            1.43    2023-05-25 [1] CRAN (R 4.3.0)
      + lifecycle        1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
      + lubridate      * 1.9.2   2023-02-10 [1] CRAN (R 4.3.0)
      + magrittr         2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
      + munsell          0.5.0   2018-06-12 [1] CRAN (R 4.3.0)
      + palmerpenguins * 0.1.1   2022-08-15 [1] CRAN (R 4.3.0)
      + pillar           1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
      + pkgconfig        2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
      + purrr          * 1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
      + R6               2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
      + readr          * 2.1.4   2023-02-10 [1] CRAN (R 4.3.0)
      + rlang            1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
      + rmarkdown        2.24    2023-08-14 [1] CRAN (R 4.3.1)
      + rstudioapi       0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
      + scales           1.2.1   2022-08-20 [1] CRAN (R 4.3.0)
      + sessioninfo      1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
      + stringi          1.7.12  2023-01-11 [1] CRAN (R 4.3.0)
      + stringr        * 1.5.0   2022-12-02 [1] CRAN (R 4.3.0)
      + tibble         * 3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
      + tidyr          * 1.3.0   2023-01-24 [1] CRAN (R 4.3.0)
      + tidyselect       1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
      + tidyverse      * 2.0.0   2023-02-22 [1] CRAN (R 4.3.0)
      + timechange       0.2.0   2023-01-11 [1] CRAN (R 4.3.0)
      + tzdb             0.4.0   2023-05-12 [1] CRAN (R 4.3.0)
      + utf8             1.2.3   2023-01-31 [1] CRAN (R 4.3.0)
      + vctrs            0.6.3   2023-06-14 [1] CRAN (R 4.3.0)
      + withr            2.5.0   2022-03-03 [1] CRAN (R 4.3.0)
      + xfun             0.40    2023-08-09 [1] CRAN (R 4.3.0)
      + yaml             2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
      +
      + [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
      +
      +──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
      +
      +
      -
      ]]> @@ -12732,7 +13882,7 @@ Tip R programming https://lcolladotor.github.io/jhustatcomputing2023/posts/15-control-structures/index.html - Fri, 18 Aug 2023 01:10:34 GMT + Fri, 18 Aug 2023 01:51:49 GMT 16 - Functions @@ -12743,6 +13893,7 @@ Tip +

      This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

      Pre-lecture materials

      @@ -12854,7 +14005,7 @@ background-color: null; font-style: inherit;"><- function() { - ## This is an empty function } @@ -12863,7 +14014,7 @@ background-color: null; font-style: italic;">## Functions have their own class class(f)
      +font-style: inherit;">class(f)
      [1] "function"
      @@ -12872,7 +14023,7 @@ background-color: null; font-style: italic;">## Execute this function f()
      +font-style: inherit;">f()
      NULL
      @@ -12887,10 +14038,10 @@ background-color: null; font-style: inherit;"><- function() { - # this is the function body - hello hello <- \n" - cat(hello) +font-style: inherit;">cat(hello) } <- function(num) { - for(i for (i in seq_len(num)) { - hello hello <- \n" - cat(hello) - } +font-style: inherit;">cat(hello) + } } <- function(num) { - hello hello <- \n" - for(i for (i in seq_len(num)) { - cat(hello) - } - chars } + chars <- nchar(hello) * num - chars + chars } meaningoflife num = 1) { - hello hello <- \n" - for(i for (i in seq_len(num)) { - cat(hello) - } - chars } + chars <- nchar(hello) * num - chars + chars } f() f() ## Use default value for 'num'
      @@ -13146,7 +14297,7 @@ font-style: italic;">## Use default value for 'num'f(2) 2) ## Use user-specified value
      @@ -13241,7 +14392,7 @@ font-style: inherit;">100, 2, 1) 1) ## Generate some data
      @@ -13268,9 +14419,9 @@ background-color: null; font-style: italic;">## Positional match first argument, default for 'na.rm' sd(mydata) +font-style: inherit;">sd(mydata)
      -
      [1] 1.110707
      +
      [1] 1.014286
      ## Specify 'x' argument by name, default for 'na.rm'
       background-color: null;
       font-style: inherit;">sd(x = mydata)                 
      +font-style: inherit;">x = mydata)
      -
      [1] 1.110707
      +
      [1] 1.014286
      x = mydata, na.rm = FALSE) 
      +font-style: inherit;">FALSE)
      -
      [1] 1.110707
      +
      [1] 1.014286
      @@ -13315,9 +14466,9 @@ font-style: inherit;">na.rm = FALSE, x = mydata) +font-style: inherit;">x = mydata)
      -
      [1] 1.110707
      +
      [1] 1.014286

      You can mix positional matching with matching by name.

      @@ -13331,7 +14482,7 @@ font-style: inherit;">na.rm = FALSE, mydata)
      -
      [1] 1.110707
      +
      [1] 1.014286

      Here, the mydata object is assigned to the x argument, because it’s the only argument not yet specified.
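
      The matching rules above can be checked directly. This is a sketch under the assumption that `mydata` is a numeric vector like the one generated earlier; the seed and the result names are hypothetical.

```r
set.seed(2023) # hypothetical seed for reproducibility
mydata <- rnorm(100, 2, 1)

by_position <- sd(mydata)                    # positional match for 'x'
by_name <- sd(x = mydata, na.rm = FALSE)     # both arguments named
reordered <- sd(na.rm = FALSE, x = mydata)   # named arguments, swapped order
mixed <- sd(na.rm = FALSE, mydata)           # 'mydata' fills the remaining 'x'

## All four calls compute the same standard deviation
c(by_position, by_name, reordered, mixed)
```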

      @@ -13452,12 +14603,12 @@ background-color: null; font-style: inherit;"><- function(a, b) { - a a^2 -} +} f(<- function(a, b) { - print(a) - print(b) } @@ -13526,7 +14677,7 @@ font-style: italic;">## Pass '...' to 'plot' function
      function (x, ...) 
       UseMethod("mean")
      -<bytecode: 0x138e33de8>
      +<bytecode: 0x1075ea1e8>
       <environment: namespace:base>
      @@ -13558,7 +14709,7 @@ font-style: inherit;">"four"
      , "five", sep=sep = "_")
      @@ -13774,10 +14925,10 @@ background-color: null; font-style: inherit;"><- function(x) { - x x + y -}
      +}

      In many programming languages, this would be an error, because y is not defined inside the function.

      In R, this is valid code because R uses rules called lexical scoping to find the value associated with a name.
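
      A short sketch of what lexical scoping means in practice, using the function `f` from above. The helper `g` and the values assigned to `y` are assumptions added for illustration.

```r
f <- function(x) {
    x + y # 'y' is a free variable: not an argument, not defined locally
}

y <- 10 # defined in the global environment, where 'f' was created
f(1) # 11

## 'g' has its own local 'y', but 'f' does not see it:
## 'f' looks up 'y' where 'f' was defined, not where it is called
g <- function() {
    y <- 100
    f(1)
}
g() # still 11
```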

      @@ -13828,7 +14979,7 @@ background-color: null; font-style: inherit;"><- function(x, y) { - if (< 0.1) { - sum(x, y) - } } else { - sum(x, y) * 1.1 - } + } } 2))
      
         3 3.3 
      - 95 905 
      + 82 918
      @@ -13977,9 +15128,62 @@ Tip
      + + +
      +

      R session information

      +
      +
      options(width = 120)
      +sessioninfo::session_info()
      +
      +
      ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
      + setting  value
      + version  R version 4.3.1 (2023-06-16)
      + os       macOS Ventura 13.5
      + system   aarch64, darwin20
      + ui       X11
      + language (EN)
      + collate  en_US.UTF-8
      + ctype    en_US.UTF-8
      + tz       America/New_York
      + date     2023-08-17
      + pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
      +
      +─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
      + package     * version date (UTC) lib source
      + cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
      + colorout      1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
      + digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
      + evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.0)
      + fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
      + htmltools     0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
      + htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
      + jsonlite      1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
      + knitr         1.43    2023-05-25 [1] CRAN (R 4.3.0)
      + rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
      + rmarkdown     2.24    2023-08-14 [1] CRAN (R 4.3.1)
      + rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
      + sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
      + xfun          0.40    2023-08-09 [1] CRAN (R 4.3.0)
      + yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
      +
      + [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
      +
      +──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
      +
      +
      -
      ]]> @@ -13989,7 +15193,7 @@ Tip programming functions https://lcolladotor.github.io/jhustatcomputing2023/posts/16-functions/index.html - Fri, 18 Aug 2023 01:10:34 GMT + Fri, 18 Aug 2023 01:51:49 GMT
      17 - Vectorization and loop functionals @@ -14000,6 +15204,7 @@ Tip +

      This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

      Pre-lecture materials

      @@ -14133,7 +15338,7 @@ font-style: inherit;">1:10 +font-style: inherit;">10 x + y @@ -14170,9 +15375,9 @@ background-color: null; font-style: inherit;">:10 -xx *y +font-style: inherit;">* y
       [1]   1   4   9  16  25  36  49  64  81 100
      @@ -14212,7 +15417,7 @@ font-style: inherit;">*y X <- as.list(X) .Internal(lapply(X, FUN)) } -<bytecode: 0x12d12f9d0> +<bytecode: 0x12d9335d0> <environment: namespace:base> @@ -14267,8 +15472,8 @@ font-style: inherit;">10)) [1] 1 2 3 4 5 $b - [1] 0.9398820 0.6808533 -0.5230355 -1.4199458 -0.9806165 0.2871580 - [7] 1.2836726 -1.1063673 1.4649872 0.4810928 + [1] -0.6113707 0.5950531 0.6319343 0.5595441 0.3188799 -0.4400711 + [7] 1.6687028 0.4501791 1.4356856 -0.3858270
      lapply(x, mean)
      [1] 3 $b -[1] 0.1107681 +[1] 0.422271

      Notice that here we are passing the mean() function as an argument to the lapply() function.
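
      As a small illustration of passing a function as an argument, the sketch below calls `lapply()` once with `mean` itself and once with an equivalent anonymous function. The list `x` here is a hypothetical example with fixed values, not the one used in the lecture.

```r
x <- list(a = 1:5, b = c(2, 4, 6)) # hypothetical input list

## 'mean' is passed as an object: no parentheses after its name
res_named <- lapply(x, mean)

## Equivalent call with an anonymous function wrapping mean()
res_anon <- lapply(x, function(elt) mean(elt))

res_named
```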

      @@ -14341,13 +15546,13 @@ font-style: inherit;">lapply(x, mean) [1] 2.5 $b -[1] -0.3599091 +[1] 0.1655327 $c -[1] 1.715792 +[1] 0.9767504 $d -[1] 5.062643 +[1] 4.951283 @@ -14369,16 +15574,16 @@ background-color: null; font-style: inherit;">lapply(x, runif)
      [[1]]
      -[1] 0.4687761
      +[1] 0.5924944
       
       [[2]]
      -[1] 0.9249996 0.3011933
      +[1] 0.8660588 0.3277243
       
       [[3]]
      -[1] 0.5811661 0.1755092 0.5232761
      +[1] 0.5009080 0.2951163 0.6264905
       
       [[4]]
      -[1] 0.6459540 0.3708483 0.6723211 0.7998949
      +[1] 0.04282267 0.14951908 0.82034538 0.64614463
      @@ -14437,16 +15642,16 @@ background-color: null; font-style: inherit;">10)
      [[1]]
      -[1] 8.291326
      +[1] 5.653385
       
       [[2]]
      -[1] 8.893872 9.878169
      +[1] 8.325503 7.234466
       
       [[3]]
      -[1] 5.5325986 0.4374242 7.2026176
      +[1] 5.968981 9.174316 7.920678
       
       [[4]]
      -[1] 1.6807689 0.2755822 8.5226424 9.5019399
      +[1] 9.491500 3.023649 2.990945 8.757496

      So now, instead of the random numbers being between 0 and 1 (the default), they are all between 0 and 10.
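
      The mechanism is that extra named arguments given to `lapply()` are forwarded to the function through `...`. A minimal sketch, with a hypothetical seed and the assumed name `draws`:

```r
set.seed(42) # hypothetical seed for reproducibility

## Each element of 1:4 becomes the first argument of runif() (the count 'n');
## 'min' and 'max' are forwarded unchanged to every call
draws <- lapply(1:4, runif, min = 0, max = 10)

lengths(draws)       # 1 2 3 4
range(unlist(draws)) # all values fall within [0, 10]
```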

      @@ -14495,7 +15700,7 @@ font-style: inherit;">6, 3, 2)) +font-style: inherit;">2)) x
      $a
      @@ -14516,9 +15721,11 @@ $b
       background-color: null;
       font-style: inherit;">lapply(x, function(elt) { elt[,function(elt) {
      +    elt[, 1] })
      +font-style: inherit;">1] +})
      $a
       [1] 1 2
      @@ -14538,7 +15745,7 @@ background-color: null;
       font-style: inherit;"><- function(elt) {
      -        elt[,     elt[, 1]
       }
      @@ -14621,13 +15828,13 @@ font-style: inherit;">lapply(x, mean)
      [1] 2.5 $b -[1] -0.3561419 +[1] -0.1478465 $c -[1] 1.078816 +[1] 0.819794 $d -[1] 5.020936 +[1] 4.954484

      Notice that lapply() returns a list (as usual), but that each element of the list has length 1.

      @@ -14635,10 +15842,10 @@ $d
      sapply(x, mean) 
      +font-style: inherit;">sapply(x, mean)
               a          b          c          d 
      - 2.5000000 -0.3561419  1.0788156  5.0209365 
      + 2.5000000 -0.1478465 0.8197940 4.9544836

      Because the result of lapply() was a list where each element had length 1, sapply() collapsed the output into a numeric vector, which is often more useful than a list.
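
      The difference in return type can be seen with a small fixed input. The list below is a hypothetical example chosen so the means are easy to verify by hand.

```r
x <- list(a = 1:4, b = c(10, 20), c = 5) # hypothetical input list

as_list <- lapply(x, mean)   # always a list
as_vector <- sapply(x, mean) # simplified to a named numeric vector

is.list(as_list)      # TRUE
is.numeric(as_vector) # TRUE
as_vector             # means are 2.5, 15 and 5, with names a, b, c
```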

      @@ -14706,16 +15913,16 @@ background-color: null; font-style: inherit;">split(x, f)
      $`1`
      - [1] -0.88306749 -1.86719488  0.63289913  1.05916422 -0.55471433  0.14180641
      - [7]  0.07777047 -0.09623353  0.80288817 -0.07352678
      + [1]  0.78541247 -0.06267966 -0.89713180  0.11796725  0.66689447 -0.02523006
      + [7] -0.19081948  0.44974528 -0.51005146 -0.08103298
       
       $`2`
      - [1] 0.52710414 0.78458044 0.85538500 0.11115802 0.43938934 0.30846324
      - [7] 0.12611702 0.92352094 0.07062165 0.61957181
      + [1] 0.29977033 0.31873253 0.53182993 0.85507540 0.21585775 0.89867742
      + [7] 0.78109747 0.06887742 0.79661568 0.60022565
       
       $`3`
      - [1] -0.67639542  0.72492785  0.10007215  0.29327660  0.85127149  0.50446636
      - [7]  0.05115469  2.29881193 -0.63035160  2.09792647
      + [1] -0.38262045 0.06294368 0.41768485 1.57972821 1.17555228 1.47374130 + [7] 1.79199913 2.25569283 1.55226509 -1.51811384

      A common idiom is split followed by an lapply.

      @@ -14727,13 +15934,13 @@ background-color: null; font-style: inherit;">split(x, f), mean)
      $`1`
      -[1] -0.07602086
      +[1] 0.0253074
       
       $`2`
      -[1] 0.4765912
      +[1] 0.536676
       
       $`3`
      -[1] 0.5615161
      +[1] 0.8408873
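
      For reference, the split-then-lapply idiom can be sketched end to end and compared with `tapply()`, which does the grouping and the summary in one call. The construction of `x`, the seed, and the result names are assumptions made for this sketch.

```r
set.seed(10) # hypothetical seed for reproducibility
x <- c(rnorm(10), runif(10), rnorm(10, 1)) # assumed: three groups of 10 values
f <- gl(3, 10) # factor with levels 1, 2, 3, each repeated 10 times

## Split 'x' by the levels of 'f', then summarize each piece
group_means <- lapply(split(x, f), mean)

## Same group means computed in a single step
via_tapply <- tapply(x, f, mean)

unlist(group_means)
via_tapply
```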
      @@ -14813,7 +16020,7 @@ background-color: null; font-style: inherit;">lapply(s, function(x) { - colMeans(x[, sapply(s, function(x) { - colMeans(x[, sapply(s, function(x) { - colMeans(x[, "Ozone", "Solar.R", "Wind")], - "Wind")], + na.rm = TRUE) -}) +font-style: inherit;">TRUE + ) +})
                      5         6          7          8         9
       Ozone    23.61538  29.44444  59.115385  59.961538  31.44828
      @@ -14970,7 +16178,7 @@ font-style: inherit;">gl(3, 10)   
      +font-style: inherit;">10)
       f
       [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3
      @@ -14980,8 +16188,8 @@ Levels: 1 2 3
      background-color: null; font-style: inherit;">tapply(x, f, mean)
      -
               1          2          3 
      -0.03546858 0.50033323 1.23684289 
      +
              1         2         3 
      +0.3554738 0.5195466 0.6764006 
      @@ -14993,13 +16201,13 @@ background-color: null; font-style: inherit;">tapply
      (x, f, range)
      $`1`
      -[1] -1.597023  1.582242
      +[1] -1.431912  2.695089
       
       $`2`
      -[1] 0.01799498 0.98731564
      +[1] 0.1263379 0.8959040
       
       $`3`
      -[1] -0.1673642  2.8815083
      +[1] -1.207741 1.696309
      @@ -15050,31 +16258,31 @@ font-style: inherit;">10) background-color: null; font-style: inherit;">head(x)
      -
                  [,1]        [,2]        [,3]         [,4]       [,5]       [,6]
      -[1,] -0.01270296  0.12521307 -0.35347017 -0.288597192  0.4754956 -1.4952687
      -[2,] -1.76025729 -0.36661801  1.57260727  0.909927684 -0.8722067  2.4145309
      -[3,] -0.04541822 -0.08756584  0.09477815  0.587649433 -0.2839712 -0.3948512
      -[4,] -0.79873007  2.33988787  0.04433525 -0.043574962  1.8351096 -1.4161750
      -[5,]  0.57385840  0.22221005 -1.15025884  0.002239365 -1.1274753  0.2699411
      -[6,] -0.79337310  0.15304664  0.05230485  2.088306453 -2.5307486  1.0901328
      -            [,7]         [,8]       [,9]       [,10]
      -[1,] -0.06995917 -0.970955222 -0.6081838  0.36135088
      -[2,]  0.98219144  1.226671950  0.7388203  0.99107134
      -[3,]  0.36028126  1.080908318 -1.4657096 -0.83599160
      -[4,] -0.46741177 -0.341382567  0.6639626  0.90447006
      -[5,] -0.63266831 -0.828562584 -0.5595121 -0.51470923
      -[6,]  0.44488488 -0.005120275 -1.2554960 -0.09944684
      +
                [,1]       [,2]       [,3]        [,4]       [,5]       [,6]
      +[1,]  1.589728  0.7733454 -1.3311072 -0.77084025 -0.1947478  0.1748546
      +[2,]  2.395088  0.3243910 -1.5133366  0.09199955  0.3850993  0.1851718
      +[3,]  1.039643 -2.1721402 -0.9933217 -1.89261272  0.1748050  1.0563987
      +[4,] -1.580978 -0.9884235 -1.4976744 -0.51011200 -2.7512079  0.5547477
      +[5,]  1.264799 -2.0551874  0.4483417 -3.08561764 -0.1549359 -0.8384706
      +[6,]  1.756973  0.9244522  0.2740854 -0.61441465 -1.0661350  1.4497808
      +           [,7]        [,8]       [,9]      [,10]
      +[1,]  0.7163086 -0.01817166  0.2193225 -0.3346788
      +[2,]  0.7606851  0.42082416  0.1099027  0.2834439
      +[3,] -1.1218204 -1.17000278  0.4302792 -0.5684986
      +[4,]  0.6082452  0.46763465 -0.3481830 -0.1765517
      +[5,] -0.7460224 -0.01123782  1.8116342 -0.1033175
      +[6,]  1.0160202 -0.82361401 -0.1616471 -0.1628032
      apply(x, 2, mean) ## Take the mean of each column
      -
       [1] -0.24958041  0.14629702 -0.14633652 -0.26691102 -0.15595976  0.07473874
      - [7]  0.05314485  0.07476061 -0.30001733  0.14398756
      +
       [1]  0.083759441 -0.134507982 -0.246473461 -0.371270102 -0.078433882
      + [6] -0.101665531 -0.007126106 -0.003193726  0.114767264  0.070612124
      @@ -15095,14 +16303,14 @@ Example apply(x, 1, sum) ## Take the sum of each row
      -
       [1] -2.8370777  5.8367390 -0.9898905  2.7204911 -3.7449375 -0.8555091
      - [7]  2.4826554  0.9494142 -3.9096827  0.2117756  0.3672752 -2.7321397
      -[13]  2.4937133 -2.7042877 -4.6029774 -6.2231452 -1.9386089  0.5097158
      -[19] -2.2691720  4.7181237
      +
       [1]  0.82401382  3.44326903 -5.21727094 -6.22250299 -3.47001414  2.59269751
      + [7] -1.76049948 -0.54534465  1.26993157 -0.05660623  1.89101638  2.60154094
      +[13] -0.80804188  1.96321614 -2.68869045  0.56525640  0.44214056 -4.25890694
      +[19] -3.02509115 -1.01075274
      @@ -15195,20 +16403,20 @@ font-style: inherit;">10) background-color: null; font-style: inherit;">head(x)
      -
                  [,1]        [,2]       [,3]      [,4]       [,5]       [,6]
      -[1,] -1.09759334 -0.58191082 -0.6190918 0.7545051 -1.6708063 -1.2382435
      -[2,] -0.04952269  0.50872978  1.6895949 0.1657323  1.7746160  1.7427081
      -[3,]  0.45414643  1.22539326  0.6284307 0.2973018  1.0887260  0.4581224
      -[4,] -0.03995540  0.23679937 -0.7905091 0.6370128  0.7911886 -0.2637556
      -[5,]  0.12208387 -1.41751608  1.2769118 0.8510867 -0.4888010 -0.1692706
      -[6,] -1.31501439 -0.08597665 -0.7616683 0.7553028  1.1584617 -2.0701933
      -           [,7]        [,8]        [,9]       [,10]
      -[1,] -1.1974074  1.22719350 -0.32231319  1.16291606
      -[2,] -0.6335309  0.95729514 -0.84747657  0.91182060
      -[3,] -0.7138229 -1.88743158  0.07026544 -2.01649459
      -[4,] -0.2273346  1.76161541 -1.26793435 -1.89014826
      -[5,]  0.3346429 -0.75236320  0.31607231  0.09632038
      -[6,] -1.0845780  0.02416961  0.50295930  1.93484470
      +
                  [,1]         [,2]      [,3]       [,4]        [,5]         [,6]
      +[1,]  0.58654399 -0.502546440 1.1493478  0.6257709 -0.02866237  1.490139530
      +[2,] -0.14969248  0.327632870 0.0202589  0.2889600 -0.16552218 -0.829703298
      +[3,]  1.12561766  0.707836011 0.6038607 -0.6722613  0.85092968  0.550785886
      +[4,] -1.71719604  0.554424755 0.4229181  0.1484968  0.22134369  0.258853355
      +[5,]  0.31827641  1.555568589 0.8971850 -0.7742244  0.45459793 -0.043814576
      +[6,] -0.08429415  0.001737282 0.1906608  1.1145869  0.54156791 -0.004889302
      +           [,7]        [,8]       [,9]      [,10]
      +[1,] -0.7879713  1.02206400 -1.0420765 -1.2779945
      +[2,]  1.7217146  0.06728039  0.6408182 -0.3551929
      +[3,] -0.2439192 -0.71553120 -0.8273868  0.2559954
      +[4,] -0.1085818 -0.28763268  1.9010457  1.7950971
      +[5,] -1.4082747 -1.07621679  0.5428189  0.4538626
      +[6,] -1.0644006 -0.04186614 -0.8150566  1.0490749
      c(0.25, 0.75))    
      +font-style: inherit;">0.75))
      -
                [,1]        [,2]       [,3]       [,4]       [,5]       [,6]
      -25% -1.1724539 0.004291043 -0.5178008 -0.6588207 -0.4089184 -1.0038506
      -75%  0.4853005 1.506519993  0.5858536  0.5369595  0.3300002  0.6922169
      +
                [,1]       [,2]       [,3]        [,4]       [,5]        [,6]
      +25% -0.7166151 -0.1615648 -0.5651758 -0.04431213 -0.5916219 -0.07368714
      +75%  0.9229907  0.3179646  0.6818422  0.52154809  0.5207637  0.45384114
                 [,7]       [,8]       [,9]      [,10]      [,11]      [,12]
      -25% -0.9842272 -1.0220842 -0.7082846 -0.8992771 -0.3444137 -0.4086714
      -75%  0.2951763  0.6737552  0.1853825  1.0853115  0.6014494  0.3695608
      -         [,13]      [,14]        [,15]      [,16]       [,17]      [,18]
      -25% -1.1790230 -0.7932644 -0.002708936 -0.5149016 -0.83974314 -0.7881085
      -75%  0.1577916  0.9562642  1.100022074  0.4498309 -0.04954139  0.2352183
      +25% -0.4355993 -0.1313015 -0.8149658 -0.9260982 0.02077709 -0.1343613
      +75%  1.5985929  0.8889319  0.2213238  0.3661333 0.82424899  0.4156328
      +         [,13]      [,14]      [,15]      [,16]      [,17]      [,18]
      +25% -0.1281593 -0.6691927 -0.2824997 -0.6574923 0.06421797 -0.7905708
      +75%  1.3073689  1.2450340  0.5072401  0.5023885 1.08294108  0.4653062
                [,19]      [,20]
      -25% 0.03656589 -0.7393304
      -75% 0.35820288  0.5060296
      +25% -0.5826196 -0.6965163
      +75%  0.1313324  0.6849689

      Notice that I had to pass the probs = c(0.25, 0.75) argument to quantile() via the ... argument to apply().
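
The forwarding through `...` described here can be sketched as follows. This is a minimal illustration, assuming a 20 x 10 matrix of standard normal draws; the seed and the names `q_dots` and `q_anon` are hypothetical and not part of the lecture.

```r
set.seed(1) # hypothetical seed, only so the sketch is reproducible
x <- matrix(rnorm(200), nrow = 20, ncol = 10)

# Any named argument placed after FUN is forwarded to FUN through `...`,
# so quantile() receives probs = c(0.25, 0.75) for every row.
q_dots <- apply(x, 1, quantile, probs = c(0.25, 0.75))

# The same computation with an explicit anonymous function, for comparison.
q_anon <- apply(x, 1, function(row) quantile(row, probs = c(0.25, 0.75)))

dim(q_dots) # one column per row of x, one row per requested quantile
```

Both calls should give the same matrix; the `...` form is simply shorter when the extra arguments are constants.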

      @@ -15268,7 +16476,7 @@ background-color: null; font-style: inherit;"><- function(mu, sigma, x) { - sum(((x <- rnorm(100) 100) ## Generate some data sumsq(mu=mu = 1, sigma=sigma = 1, x) 1, x) ## This works (returns one value)
      -
      [1] 201.5111
      +
      [1] 248.8765

      However, passing a vector of mus or sigmas won’t work with this function because it’s not vectorized.

      @@ -15326,11 +16534,11 @@ font-style: inherit;">1
      :10, x) 10, x) ## This is not what we want
      -
      [1] 121.9851
      +
      [1] 119.3071
      @@ -15365,8 +16573,8 @@ font-style: inherit;">:10, x)
      -
       [1] 201.5111 127.6611 113.3086 108.0569 105.5217 104.0882 103.1900 102.5851
      - [9] 102.1553 101.8371
      +
       [1] 248.8765 146.5055 124.7964 116.2695 111.8983 109.2945 107.5867 106.3890
      + [9] 105.5067 104.8318

      Pretty cool, right?
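
As a self-contained sketch of the vectorization idea, the code below assumes that `sumsq()` computes the sum of squared standardized deviations. The function body is truncated in the diff above, so that definition is an assumption, as are the seed and the names `by_mapply` and `by_vectorize`.

```r
# Assumed definition: the body is truncated in the diff, so this is a guess
# at its intent (sum of squared standardized deviations).
sumsq <- function(mu, sigma, x) {
    sum(((x - mu) / sigma)^2)
}

set.seed(1) # hypothetical seed
x <- rnorm(100)

# mapply() calls sumsq() once per (mu, sigma) pair; x is held fixed via MoreArgs.
by_mapply <- mapply(sumsq, 1:10, 1:10, MoreArgs = list(x = x))

# Vectorize() wraps the same mapply() pattern into a reusable function.
# Only mu and sigma are vectorized; x is passed through unchanged.
vsumsq <- Vectorize(sumsq, c("mu", "sigma"))
by_vectorize <- vsumsq(1:10, 1:10, x)

length(by_vectorize) # one value per (mu, sigma) pair
```

`Vectorize()` is convenient when the function will be reused; a single `mapply()` call is enough for a one-off.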

      @@ -15427,9 +16635,62 @@ Tip +
      + +
      +

      R session information

      +
      +
      options(width = 120)
      +sessioninfo::session_info()
      +
      +
      ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
      + setting  value
      + version  R version 4.3.1 (2023-06-16)
      + os       macOS Ventura 13.5
      + system   aarch64, darwin20
      + ui       X11
      + language (EN)
      + collate  en_US.UTF-8
      + ctype    en_US.UTF-8
      + tz       America/New_York
      + date     2023-08-17
      + pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
      +
      +─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
      + package     * version date (UTC) lib source
      + cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
      + colorout      1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
      + digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
      + evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.0)
      + fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
      + htmltools     0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
      + htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
      + jsonlite      1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
      + knitr         1.43    2023-05-25 [1] CRAN (R 4.3.0)
      + rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
      + rmarkdown     2.24    2023-08-14 [1] CRAN (R 4.3.1)
      + rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
      + sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
      + xfun          0.40    2023-08-09 [1] CRAN (R 4.3.0)
      + yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
      +
      + [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
      +
      +──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
      +
      +
      -
      ]]> @@ -15439,7 +16700,7 @@ Tip programming functions https://lcolladotor.github.io/jhustatcomputing2023/posts/17-loop-functions/index.html - Fri, 18 Aug 2023 01:10:34 GMT + Fri, 18 Aug 2023 01:51:49 GMT
      18 - Debugging R Code @@ -15450,6 +16711,7 @@ Tip +

      This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

      Pre-lecture materials

      @@ -15666,30 +16928,30 @@ background-color: null; font-style: inherit;"><- function(x) { - if(x if (x > 0) { - print("x is greater than zero") - } } else { - print("x is less than or equal to zero") - } - } + invisible(x) +font-style: inherit;">invisible(x) }

      This function is simple:

      @@ -15736,42 +16998,43 @@ background-color: null; font-style: inherit;"><-
      function(x) {
      - if(if (is.na(x)) - is.na(x)) { + print("x is a missing value!") - } else if(x if (x > 0) - 0) { + print("x is greater than zero") - } else - else { + print("x is less than or equal to zero") - } + invisible(x) -} +}

      Now we can run the following.

      @@ -15807,7 +17070,7 @@ font-style: inherit;">2))
      background-color: null; font-style: inherit;">print_message2
      (x)
      -
      Error in if (is.na(x)) print("x is a missing value!") else if (x > 0) print("x is greater than zero") else print("x is less than or equal to zero"): the condition has length > 1
      +
      Error in if (is.na(x)) {: the condition has length > 1

      Now what? Why are we getting this error?

      @@ -15828,54 +17091,56 @@ background-color: null; font-style: inherit;"><-
      function(x) {
      - if(if (length(x) > 1L) - > 1L) { + stop("'x' has length > 1") - } + if(if (is.na(x)) - is.na(x)) { + print("x is a missing value!") - } else if(x if (x > 0) - 0) { + print("x is greater than zero") - } else - else { + print("x is less than or equal to zero") - } + invisible(x) -} +}

      Now when we pass print_message3() a vector, we should get an error.
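
A runnable sketch of that check is shown below. The function body follows the `print_message3()` definition in the diff above; the `tryCatch()` wrapper and the name `result` are additions for illustration, so the error message can be inspected without halting a script.

```r
print_message3 <- function(x) {
    # Fail early with a clear message instead of letting if() complain
    if (length(x) > 1L) {
        stop("'x' has length > 1")
    }
    if (is.na(x)) {
        print("x is a missing value!")
    } else if (x > 0) {
        print("x is greater than zero")
    } else {
        print("x is less than or equal to zero")
    }
    invisible(x)
}

# Capture the error message rather than stopping execution
result <- tryCatch(
    print_message3(1:2),
    error = function(e) conditionMessage(e)
)
result
```

The explicit `stop()` replaces the generic "the condition has length > 1" error with a message that names the offending argument.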

      @@ -16436,14 +17701,14 @@ background-color: null; font-style: inherit;"><- function(d) { - if (!is.numeric(d)) { - stop(call. = FALSE) - } - d } + d + +

      R session information

      +
      +
      options(width = 120)
      +sessioninfo::session_info()
      +
      +
      ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
      + setting  value
      + version  R version 4.3.1 (2023-06-16)
      + os       macOS Ventura 13.5
      + system   aarch64, darwin20
      + ui       X11
      + language (EN)
      + collate  en_US.UTF-8
      + ctype    en_US.UTF-8
      + tz       America/New_York
      + date     2023-08-17
      + pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
      +
      +─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
      + package     * version date (UTC) lib source
      + cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
      + colorout      1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
      + digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
      + evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.0)
      + fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
      + fs            1.6.3   2023-07-20 [1] CRAN (R 4.3.0)
      + glue          1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
      + htmltools     0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
      + htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
      + jsonlite      1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
      + knitr         1.43    2023-05-25 [1] CRAN (R 4.3.0)
      + lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
      + reprex      * 2.0.2   2022-08-17 [1] CRAN (R 4.3.0)
      + rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
      + rmarkdown     2.24    2023-08-14 [1] CRAN (R 4.3.1)
      + rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
      + sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
      + withr         2.5.0   2022-03-03 [1] CRAN (R 4.3.0)
      + xfun          0.40    2023-08-09 [1] CRAN (R 4.3.0)
      + yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
      +
      + [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
      +
      +──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
      +
      +
      -
      ]]> @@ -16502,7 +17825,7 @@ Tip programming debugging https://lcolladotor.github.io/jhustatcomputing2023/posts/18-debugging-r-code/index.html - Fri, 18 Aug 2023 01:10:34 GMT + Fri, 18 Aug 2023 01:51:49 GMT
      19 - Error Handling and Generation @@ -16513,6 +17836,7 @@ Tip +

      This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

      Pre-lecture materials

      @@ -16653,8 +17977,8 @@ Example background-color: null; font-style: inherit;"><- function(){ - function() { + message(! background-color: null; font-style: inherit;"><- function(){ - function() { + stop(<- function(n){ - function(n) { + stopifnot(n <= 0) - n + n } "Consider yourself warned!")<- function(x){ - function(x) { + warning("Generating an NA.") - NA } @@ -16891,16 +18215,16 @@ font-style: inherit;">"seven")) background-color: null; font-style: inherit;"><- function(expr){ - function(expr) { + tryCatch(expr, - error = function(e){ - function(e) { + message(\n", e) - }, - }, + warning = function(w){ - function(w) { + message(\n", w) - }, - }, + finally = { - message("Finally done!") - }) -} + } + ) +}

      This function takes an expression as an argument and tries to evaluate it. If the expression can be evaluated without any errors or warnings, then the result of the expression is returned and the message Finally done! is printed to the R console. If an error or warning is generated, then the function supplied to the error or warning argument is called with the condition, and Finally done! is still printed afterwards. Let’s try this function out with a few examples.
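
The behaviour described above can be sketched with a small variant of the function. The name `beera_sketch` is hypothetical, and it uses `conditionMessage()` to extract the message text, which is a choice made here and not necessarily what the lecture's version does.

```r
beera_sketch <- function(expr) {
    # expr is evaluated lazily inside tryCatch(), so its errors and
    # warnings are caught by the handlers below
    tryCatch(expr,
        error = function(e) {
            message("An error occurred:\n", conditionMessage(e))
        },
        warning = function(w) {
            message("A warning occurred:\n", conditionMessage(w))
        },
        finally = {
            # Runs whether or not a condition was raised
            message("Finally done!")
        }
    )
}

ok <- beera_sketch(2 + 2)                         # no condition: value is returned
err <- beera_sketch("two" + 2)                    # error handler runs
warn <- beera_sketch(as.numeric(c(1, "two", 3)))  # warning handler runs
```

Because each handler ends with `message()`, which returns `NULL` invisibly, the calls that hit a handler return `NULL`; a handler could return a fallback value instead.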

      beera({
      -      2 2
       
      beera({
      -      "two" 
      beera({
      -      as.numeric(<- function(n){
      -  n function(n) {
      +    n %% "two")
      background-color: null; font-style: inherit;"><-
      function(n){ - function(n) { + tryCatch(n == 0, - error = function(e){ - function(e) { + FALSE - }) -} - - } + ) +} + +is_even_error("eight")
      background-color: null; font-style: inherit;"><- function(n){ - function(n) { + is.numeric(n) +

      R session information

      +
      +
      options(width = 120)
      +sessioninfo::session_info()
      +
      +
      ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
      + setting  value
      + version  R version 4.3.1 (2023-06-16)
      + os       macOS Ventura 13.5
      + system   aarch64, darwin20
      + ui       X11
      + language (EN)
      + collate  en_US.UTF-8
      + ctype    en_US.UTF-8
      + tz       America/New_York
      + date     2023-08-17
      + pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
      +
      +─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
      + package     * version date (UTC) lib source
      + cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
      + colorout      1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
      + digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
      + evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.0)
      + fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
      + htmltools     0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
      + htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
      + jsonlite      1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
      + knitr         1.43    2023-05-25 [1] CRAN (R 4.3.0)
      + rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
      + rmarkdown     2.24    2023-08-14 [1] CRAN (R 4.3.1)
      + rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
      + sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
      + xfun          0.40    2023-08-09 [1] CRAN (R 4.3.0)
      + yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
      +
      + [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
      +
      +──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
      +
      +
      -
      ]]> @@ -17197,7 +18576,7 @@ Tip programming debugging https://lcolladotor.github.io/jhustatcomputing2023/posts/19-error-handling-and-generation/index.html - Fri, 18 Aug 2023 01:10:34 GMT + Fri, 18 Aug 2023 01:51:49 GMT
      20 - Working with dates and times @@ -17208,6 +18587,7 @@ Tip +

      This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

      Pre-lecture materials

      @@ -17337,7 +18717,7 @@ Pro-tip background-color: null; font-style: inherit;">install.packages("lubridate")
      +font-style: inherit;">"lubridate")
      library(tidyverse)
       library(lubridate) 
      +font-style: inherit;">library(lubridate)
      @@ -17386,7 +18766,7 @@ font-style: inherit;">today() background-color: null; font-style: inherit;">now()
      -
      [1] "2023-08-17 17:32:07 EDT"
      +
      [1] "2023-08-17 21:47:51 EDT"

      Otherwise, there are three ways you are likely to create a date/time:

      @@ -17496,7 +18876,7 @@ font-style: inherit;">20170131) background-color: null; font-style: inherit;">ymd("2016-09-13") "2016-09-13") ## International standard
      @@ -17506,7 +18886,7 @@ font-style: italic;">## International standard
      background-color: null; font-style: inherit;">ymd("2016/09/13") "2016/09/13") ## Just figure it out
      @@ -17516,7 +18896,7 @@ font-style: italic;">## Just figure it out
      background-color: null; font-style: inherit;">mdy("09-13-2016") "09-13-2016") ## Mostly U.S.
      @@ -17526,7 +18906,7 @@ font-style: italic;">## Mostly U.S.
      background-color: null; font-style: inherit;">dmy("13-09-2016") "13-09-2016") ## Europe
      @@ -17540,16 +18920,18 @@ font-style: italic;">## Europe
      background-color: null; font-style: inherit;"><- c(c( + "2016-04-05", - "2016-04-05", + "2016/05/06", - "2016,10,4") -"2016,10,4" +) +ymd(x)
      @@ -17569,8 +18951,8 @@ font-style: inherit;">library(nycflights13) flights %>% - %>% + select(year, month, day)
      @@ -17599,13 +18981,13 @@ font-style: inherit;">select(year, month, day)
      flights %>% 
      -  %>%
      +    select(year, month, day) %>% 
      -  %>%
      +    mutate(
       
      flights %>% 
      -  %>%
      +    select(year, month, day, hour, minute)
      @@ -17696,7 +19078,7 @@ font-style: inherit;">today())
      background-color: null; font-style: inherit;">now()
      -
      [1] "2023-08-17 17:32:08 EDT"
      +
      [1] "2023-08-17 21:47:52 EDT"
      "1970-01-01 01:00")
       class(x) 
      +font-style: inherit;">class
      (x)
      [1] "POSIXct" "POSIXt" 
      @@ -17975,7 +19357,7 @@ font-style: inherit;">"2012-01-01", tz = "") "") ## Midnight y ## this works background-color: null; font-style: inherit;">+ y ## what??? why does this not work? +font-style: italic;">## what??? why does this not work?
      Error in `+.POSIXt`(x, y): binary '+' is not defined for "POSIXt" objects
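
One way to see why subtraction works but addition does not is the base R sketch below. The specific timestamps and the names `elapsed`, `later`, and `attempt` are hypothetical and chosen only for illustration.

```r
x <- as.POSIXct("2012-10-25 01:00:00", tz = "UTC")
y <- as.POSIXct("2012-01-01 00:00:00", tz = "UTC")

# Subtracting two date-times is meaningful: the result is an elapsed time
elapsed <- x - y
class(elapsed)

# Adding a number of seconds to a date-time is also meaningful
later <- x + 3 * 60 * 60 # three hours after x

# Adding two date-times has no sensible interpretation, so R raises an error;
# tryCatch() captures the message instead of stopping
attempt <- tryCatch(x + y, error = function(e) conditionMessage(e))
attempt
```

A date-time is a point on the timeline, while a `difftime` is a length of time, which is why only some combinations of the two are defined.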
      @@ -18059,13 +19441,13 @@ font-style: inherit;">class(x) background-color: null; font-style: inherit;">+ 33 ** 6060 ** 60 date(y) background-color: null; font-style: inherit;">+ 1 +font-style: inherit;">1
      [1] "2011-01-10"
      @@ -18208,17 +19590,19 @@ font-style: inherit;"><- ymd_hms(c(c( + "2012-10-25 01:13:46", - "2015-04-23 15:11:23"), "2015-04-23 15:11:23" +), tz = "") -year(x)
      @@ -18253,17 +19637,19 @@ font-style: inherit;"><- ymd_hms(c(c( + "2012-10-25 01:13:46", - "2015-04-23 15:11:23"), "2015-04-23 15:11:23" +), tz = "") -minute(x)
      @@ -18424,7 +19810,7 @@ font-style: inherit;">library(ggplot2) storm_sub %>% - ggplot(aes(x = begin)) + - + + geom_histogram(bins = 20) + - + + theme_bw()
      @@ -18457,7 +19843,7 @@ font-style: inherit;">library(ggplot2) storm_sub %>% - ggplot(aes(x = begin)) + - + + facet_wrap(~ type) ~type) + - + + geom_histogram(bins = 20) + - + + theme_bw() + - + + theme(90))
      storm_sub %>%
      -      ggplot(x = begin, y = deaths)) + 
      -  +
      +    geom_point()
      @@ -18532,7 +19918,7 @@ font-style: inherit;">geom_point()
      storm_sub %>%
      -      filter(6) %>%
      -      ggplot(aes(begin, deaths)) + 
      -  +
      +    geom_point()
      @@ -18562,7 +19948,7 @@ font-style: inherit;">geom_point()
      storm_sub %>%
      -      filter(16) %>%
      -      ggplot(aes(begin, deaths)) + 
      -  +
      +    geom_point()
      @@ -18711,9 +20097,106 @@ Tip
      + + +
      +

      R session information

      +
      +
      options(width = 120)
      +sessioninfo::session_info()
      +
      +
      ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
      + setting  value
      + version  R version 4.3.1 (2023-06-16)
      + os       macOS Ventura 13.5
      + system   aarch64, darwin20
      + ui       X11
      + language (EN)
      + collate  en_US.UTF-8
      + ctype    en_US.UTF-8
      + tz       America/New_York
      + date     2023-08-17
      + pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
      +
      +─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
      + package      * version date (UTC) lib source
      + bit            4.0.5   2022-11-15 [1] CRAN (R 4.3.0)
      + bit64          4.0.5   2020-08-30 [1] CRAN (R 4.3.0)
      + cli            3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
      + colorout       1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
      + colorspace     2.1-0   2023-01-23 [1] CRAN (R 4.3.0)
      + crayon         1.5.2   2022-09-29 [1] CRAN (R 4.3.0)
      + digest         0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
      + dplyr        * 1.1.2   2023-04-20 [1] CRAN (R 4.3.0)
      + emojifont      0.5.5   2021-04-20 [1] CRAN (R 4.3.0)
      + evaluate       0.21    2023-05-05 [1] CRAN (R 4.3.0)
      + fansi          1.0.4   2023-01-22 [1] CRAN (R 4.3.0)
      + farver         2.1.1   2022-07-06 [1] CRAN (R 4.3.0)
      + fastmap        1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
      + forcats      * 1.0.0   2023-01-29 [1] CRAN (R 4.3.0)
      + generics       0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
      + ggplot2      * 3.4.3   2023-08-14 [1] CRAN (R 4.3.1)
      + glue           1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
      + gtable         0.3.3   2023-03-21 [1] CRAN (R 4.3.0)
      + here         * 1.0.1   2020-12-13 [1] CRAN (R 4.3.0)
      + hms            1.1.3   2023-03-21 [1] CRAN (R 4.3.0)
      + htmltools      0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
      + htmlwidgets    1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
      + jsonlite       1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
      + knitr          1.43    2023-05-25 [1] CRAN (R 4.3.0)
      + labeling       0.4.2   2020-10-20 [1] CRAN (R 4.3.0)
      + lifecycle      1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
      + lubridate    * 1.9.2   2023-02-10 [1] CRAN (R 4.3.0)
      + magrittr       2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
      + munsell        0.5.0   2018-06-12 [1] CRAN (R 4.3.0)
      + nycflights13 * 1.0.2   2021-04-12 [1] CRAN (R 4.3.0)
      + pillar         1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
      + pkgconfig      2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
      + proto          1.0.0   2016-10-29 [1] CRAN (R 4.3.0)
      + purrr        * 1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
      + R6             2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
      + readr        * 2.1.4   2023-02-10 [1] CRAN (R 4.3.0)
      + rlang          1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
      + rmarkdown      2.24    2023-08-14 [1] CRAN (R 4.3.1)
      + rprojroot      2.0.3   2022-04-02 [1] CRAN (R 4.3.0)
      + rstudioapi     0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
      + scales         1.2.1   2022-08-20 [1] CRAN (R 4.3.0)
      + sessioninfo    1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
      + showtext       0.9-6   2023-05-03 [1] CRAN (R 4.3.0)
      + showtextdb     3.0     2020-06-04 [1] CRAN (R 4.3.0)
      + stringi        1.7.12  2023-01-11 [1] CRAN (R 4.3.0)
      + stringr      * 1.5.0   2022-12-02 [1] CRAN (R 4.3.0)
      + sysfonts       0.8.8   2022-03-13 [1] CRAN (R 4.3.0)
      + tibble       * 3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
      + tidyr        * 1.3.0   2023-01-24 [1] CRAN (R 4.3.0)
      + tidyselect     1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
      + tidyverse    * 2.0.0   2023-02-22 [1] CRAN (R 4.3.0)
      + timechange     0.2.0   2023-01-11 [1] CRAN (R 4.3.0)
      + tzdb           0.4.0   2023-05-12 [1] CRAN (R 4.3.0)
      + utf8           1.2.3   2023-01-31 [1] CRAN (R 4.3.0)
      + vctrs          0.6.3   2023-06-14 [1] CRAN (R 4.3.0)
      + vroom          1.6.3   2023-04-28 [1] CRAN (R 4.3.0)
      + withr          2.5.0   2022-03-03 [1] CRAN (R 4.3.0)
      + xfun           0.40    2023-08-09 [1] CRAN (R 4.3.0)
      + yaml           2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
      +
      + [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
      +
      +──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
      +
      +
      -
      ]]> @@ -18724,7 +20207,7 @@ Tip programming dates and times https://lcolladotor.github.io/jhustatcomputing2023/posts/20-working-with-dates-and-times/index.html - Fri, 18 Aug 2023 01:10:34 GMT + Fri, 18 Aug 2023 01:51:49 GMT diff --git a/posts/01-welcome/index.html b/posts/01-welcome/index.html index 6de4fff..07c228c 100644 --- a/posts/01-welcome/index.html +++ b/posts/01-welcome/index.html @@ -251,8 +251,8 @@

      Table of contents

    3. Disability Support Service
    4. Previous versions of the class
    5. Typos and corrections
    6. -
    7. R session information
    8. +
    9. R session information
    10. @@ -266,7 +266,7 @@

      Table of contents

      -

      This lecture as the rest of the course is adapted from the version Stephanie C. Hicks designed and maintained in 2021 - 2022. Check the recent changes to this file through the GitHub history.

      +

      This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

      Welcome! I am very excited to have you in our one-term (i.e. half a semester) course on Statistical Computing (course number 140.776), offered by the Department of Biostatistics at the Johns Hopkins Bloomberg School of Public Health.

      This course is designed for ScM and PhD students at Johns Hopkins Bloomberg School of Public Health. I am pretty flexible about permitting outside students, but I want everyone to be aware of the goals and assumptions so no one feels like they are surprised by how the class works.

      @@ -608,8 +608,9 @@

      Previous ve

      Typos and corrections

      Feel free to submit typos, errors, etc. via the GitHub repository associated with the class: https://github.com/lcolladotor/jhustatcomputing2023. You will have the thanks of your grateful instructor!

      -
      -

      R session information

      +
      +
      +

      R session information

      options(width = 120)
       sessioninfo::session_info()
      @@ -652,7 +653,6 @@

      R session informatio

      -
      diff --git a/posts/02-introduction-to-r-and-rstudio/index.html b/posts/02-introduction-to-r-and-rstudio/index.html index b7feb18..87f5bfd 100644 --- a/posts/02-introduction-to-r-and-rstudio/index.html +++ b/posts/02-introduction-to-r-and-rstudio/index.html @@ -263,6 +263,7 @@

      Table of contents

    11. Additional Resources
    12. rtistry
    13. +
    14. R session information
    15. @@ -276,6 +277,7 @@

      Table of contents

      +

      This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

      There are only two kinds of languages: the ones people complain about and the ones nobody uses. —Bjarne Stroustrup

      @@ -575,7 +577,7 @@

      Installi

      We will not do that now, but it is quite likely that at one point later in this course we will.

      You only need to install a package once, unless you upgrade or re-install R. Once installed, you still need to load the package before you can use it, and that has to happen every time you start a new R session. You do that using the library() command. For instance, to load the ggplot2 package, type

      -
      library('ggplot2')
      +
      library("ggplot2")

      You may or may not see a short message on the screen. Some packages show messages when you load them, and others do not.
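
The install-once, load-every-session pattern can be sketched defensively as below. This is only an illustration: the variable `has_ggplot2` is hypothetical, and the sketch deliberately does not install anything on its own.

```r
# Check whether the package is installed without attaching it
has_ggplot2 <- requireNamespace("ggplot2", quietly = TRUE)

if (has_ggplot2) {
    # Attach the package for this session; suppressPackageStartupMessages()
    # hides any startup message the package would otherwise print
    suppressPackageStartupMessages(library("ggplot2"))
} else {
    message("ggplot2 is not installed; run install.packages(\"ggplot2\") once first")
}
```

`requireNamespace()` returns `TRUE` or `FALSE` instead of failing, which makes it a convenient guard before `library()`.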

      This was a quick overview of R packages. We will use a lot of them, so you will get used to them rather quickly.

      @@ -654,9 +656,52 @@

      rtistry

      [‘Water Colours’ from Danielle Navarro https://art.djnavarro.net]

      + + +
      +

      R session information

      +
      +
      options(width = 120)
      +sessioninfo::session_info()
      +
      +
      ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
      + setting  value
      + version  R version 4.3.1 (2023-06-16)
      + os       macOS Ventura 13.5
      + system   aarch64, darwin20
      + ui       X11
      + language (EN)
      + collate  en_US.UTF-8
      + ctype    en_US.UTF-8
      + tz       America/New_York
      + date     2023-08-17
      + pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
      +
      +─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
      + package     * version date (UTC) lib source
      + cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
      + colorout      1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
      + digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
      + evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.0)
      + fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
      + htmltools     0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
      + htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
      + jsonlite      1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
      + knitr         1.43    2023-05-25 [1] CRAN (R 4.3.0)
      + rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
      + rmarkdown     2.24    2023-08-14 [1] CRAN (R 4.3.1)
      + rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
      + sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
      + xfun          0.40    2023-08-09 [1] CRAN (R 4.3.0)
      + yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
      +
      + [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
      +
      +──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
      +
      +
      -
      diff --git a/posts/03-introduction-to-gitgithub/index.html b/posts/03-introduction-to-gitgithub/index.html index 5f4cb8c..fb22c41 100644 --- a/posts/03-introduction-to-gitgithub/index.html +++ b/posts/03-introduction-to-gitgithub/index.html @@ -22,6 +22,40 @@ margin: 0 0.8em 0.2em -1em; /* quarto-specific, see https://github.com/quarto-dev/quarto-cli/issues/4556 */ vertical-align: middle; } +/* CSS for syntax highlighting */ +pre > code.sourceCode { white-space: pre; position: relative; } +pre > code.sourceCode > span { display: inline-block; line-height: 1.25; } +pre > code.sourceCode > span:empty { height: 1.2em; } +.sourceCode { overflow: visible; } +code.sourceCode > span { color: inherit; text-decoration: inherit; } +div.sourceCode { margin: 1em 0; } +pre.sourceCode { margin: 0; } +@media screen { +div.sourceCode { overflow: auto; } +} +@media print { +pre > code.sourceCode { white-space: pre-wrap; } +pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; } +} +pre.numberSource code + { counter-reset: source-line 0; } +pre.numberSource code > span + { position: relative; left: -4em; counter-increment: source-line; } +pre.numberSource code > span > a:first-child::before + { content: counter(source-line); + position: relative; left: -1em; text-align: right; vertical-align: baseline; + border: none; display: inline-block; + -webkit-touch-callout: none; -webkit-user-select: none; + -khtml-user-select: none; -moz-user-select: none; + -ms-user-select: none; user-select: none; + padding: 0 4px; width: 4em; + } +pre.numberSource { margin-left: 3em; padding-left: 4px; } +div.sourceCode + { } +@media screen { +pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; } +} @@ -223,6 +257,7 @@

      Table of contents

    16. Additional Resources
    17. rtistry
    18. +
    19. R session information
    20. @@ -236,6 +271,7 @@

      Table of contents

      +

      This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

      Pre-lecture materials

      @@ -420,9 +456,52 @@

      rtistry

      [‘Flametree’ from Danielle Navarro https://art.djnavarro.net]

      +
      + +
      +

      R session information

      +
      +
      options(width = 120)
      +sessioninfo::session_info()
      +
      +
      ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
      + setting  value
      + version  R version 4.3.1 (2023-06-16)
      + os       macOS Ventura 13.5
      + system   aarch64, darwin20
      + ui       X11
      + language (EN)
      + collate  en_US.UTF-8
      + ctype    en_US.UTF-8
      + tz       America/New_York
      + date     2023-08-17
      + pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
      +
      +─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
      + package     * version date (UTC) lib source
      + cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
      + colorout      1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
      + digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
      + evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.0)
      + fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
      + htmltools     0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
      + htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
      + jsonlite      1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
      + knitr         1.43    2023-05-25 [1] CRAN (R 4.3.0)
      + rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
      + rmarkdown     2.24    2023-08-14 [1] CRAN (R 4.3.1)
      + rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
      + sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
      + xfun          0.40    2023-08-09 [1] CRAN (R 4.3.0)
      + yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
      +
      + [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
      +
      +──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
      +
      +
      -
      diff --git a/posts/04-reproducible-research/index.html b/posts/04-reproducible-research/index.html index 3800985..eef7357 100644 --- a/posts/04-reproducible-research/index.html +++ b/posts/04-reproducible-research/index.html @@ -22,6 +22,40 @@ margin: 0 0.8em 0.2em -1em; /* quarto-specific, see https://github.com/quarto-dev/quarto-cli/issues/4556 */ vertical-align: middle; } +/* CSS for syntax highlighting */ +pre > code.sourceCode { white-space: pre; position: relative; } +pre > code.sourceCode > span { display: inline-block; line-height: 1.25; } +pre > code.sourceCode > span:empty { height: 1.2em; } +.sourceCode { overflow: visible; } +code.sourceCode > span { color: inherit; text-decoration: inherit; } +div.sourceCode { margin: 1em 0; } +pre.sourceCode { margin: 0; } +@media screen { +div.sourceCode { overflow: auto; } +} +@media print { +pre > code.sourceCode { white-space: pre-wrap; } +pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; } +} +pre.numberSource code + { counter-reset: source-line 0; } +pre.numberSource code > span + { position: relative; left: -4em; counter-increment: source-line; } +pre.numberSource code > span > a:first-child::before + { content: counter(source-line); + position: relative; left: -1em; text-align: right; vertical-align: baseline; + border: none; display: inline-block; + -webkit-touch-callout: none; -webkit-user-select: none; + -khtml-user-select: none; -moz-user-select: none; + -ms-user-select: none; user-select: none; + padding: 0 4px; width: 4em; + } +pre.numberSource { margin-left: 3em; padding-left: 4px; } +div.sourceCode + { } +@media screen { +pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; } +} @@ -227,6 +261,7 @@

      Table of contents

      +
    21. R session information
    22. @@ -240,6 +275,7 @@

      Table of contents

      +

      This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

      An article about computational results is advertising, not scholarship. The actual scholarship is the full software environment, code and data, that produced the result. —Claerbout and Karrenbach (1992)

      @@ -500,9 +536,77 @@

      Final Questions

    + + +
    +

    R session information

    +
    +
    options(width = 120)
    +sessioninfo::session_info()
    +
    +
    ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
    + setting  value
    + version  R version 4.3.1 (2023-06-16)
    + os       macOS Ventura 13.5
    + system   aarch64, darwin20
    + ui       X11
    + language (EN)
    + collate  en_US.UTF-8
    + ctype    en_US.UTF-8
    + tz       America/New_York
    + date     2023-08-17
    + pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
    +
    +─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
    + package     * version date (UTC) lib source
    + cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
    + colorout      1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
    + colorspace    2.1-0   2023-01-23 [1] CRAN (R 4.3.0)
    + digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
    + dplyr         1.1.2   2023-04-20 [1] CRAN (R 4.3.0)
    + emojifont     0.5.5   2021-04-20 [1] CRAN (R 4.3.0)
    + evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.0)
    + fansi         1.0.4   2023-01-22 [1] CRAN (R 4.3.0)
    + fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
    + generics      0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
    + ggplot2       3.4.3   2023-08-14 [1] CRAN (R 4.3.1)
    + glue          1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
    + gtable        0.3.3   2023-03-21 [1] CRAN (R 4.3.0)
    + here        * 1.0.1   2020-12-13 [1] CRAN (R 4.3.0)
    + htmltools     0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
    + htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
    + jsonlite      1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
    + knitr         1.43    2023-05-25 [1] CRAN (R 4.3.0)
    + lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
    + magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
    + munsell       0.5.0   2018-06-12 [1] CRAN (R 4.3.0)
    + pillar        1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
    + pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
    + proto         1.0.0   2016-10-29 [1] CRAN (R 4.3.0)
    + R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
    + rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
    + rmarkdown     2.24    2023-08-14 [1] CRAN (R 4.3.1)
    + rprojroot     2.0.3   2022-04-02 [1] CRAN (R 4.3.0)
    + rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
    + scales        1.2.1   2022-08-20 [1] CRAN (R 4.3.0)
    + sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
    + showtext      0.9-6   2023-05-03 [1] CRAN (R 4.3.0)
    + showtextdb    3.0     2020-06-04 [1] CRAN (R 4.3.0)
    + sysfonts      0.8.8   2022-03-13 [1] CRAN (R 4.3.0)
    + tibble        3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
    + tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
    + utf8          1.2.3   2023-01-31 [1] CRAN (R 4.3.0)
    + vctrs         0.6.3   2023-06-14 [1] CRAN (R 4.3.0)
    + xfun          0.40    2023-08-09 [1] CRAN (R 4.3.0)
    + yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
    +
    + [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
    +
    +──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    +
    +
    -
    diff --git a/posts/05-literate-programming/index.html b/posts/05-literate-programming/index.html index 1ff6a41..d5dc738 100644 --- a/posts/05-literate-programming/index.html +++ b/posts/05-literate-programming/index.html @@ -290,6 +290,7 @@

    Table of contents

  4. Questions
  5. Additional Resources
  6. +
  7. R session information
  8. @@ -303,6 +304,7 @@

    Table of contents

    +

    This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

    Pre-lecture materials

    @@ -727,23 +729,19 @@

    Code snippets

    if (condition) {
    -  
    -}
    -
    -else {
    -  
    -}
    -
    -else if (condition) {
    -  
    -}
    +  ## Case 1
    +} else if (condition) {
    +  ## Case 2
    +} else if (condition) {
    +  ## Case 3
    +}
    • fun to create a function
    name <- function(variables) {
    -  
    +
     }
      @@ -751,7 +749,7 @@

      Code snippets

    for (variable in vector) {
    -  
    +
     }
      @@ -890,10 +888,53 @@

      Additional Resources

    +
    + +
    +

    R session information

    +
    +
    options(width = 120)
    +sessioninfo::session_info()
    +
    +
    ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
    + setting  value
    + version  R version 4.3.1 (2023-06-16)
    + os       macOS Ventura 13.5
    + system   aarch64, darwin20
    + ui       X11
    + language (EN)
    + collate  en_US.UTF-8
    + ctype    en_US.UTF-8
    + tz       America/New_York
    + date     2023-08-17
    + pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
    +
    +─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
    + package     * version date (UTC) lib source
    + cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
    + colorout      1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
    + digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
    + evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.0)
    + fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
    + htmltools     0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
    + htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
    + jsonlite      1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
    + knitr         1.43    2023-05-25 [1] CRAN (R 4.3.0)
    + rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
    + rmarkdown     2.24    2023-08-14 [1] CRAN (R 4.3.1)
    + rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
    + sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
    + xfun          0.40    2023-08-09 [1] CRAN (R 4.3.0)
    + yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
    +
    + [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
    +
    +──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    +
    +
    -
    diff --git a/posts/06-reference-management/index.html b/posts/06-reference-management/index.html index aab8e29..ad54699 100644 --- a/posts/06-reference-management/index.html +++ b/posts/06-reference-management/index.html @@ -273,6 +273,7 @@

    Table of contents

  9. Additional Resources
  10. rtistry
  11. +
  12. R session information
  13. @@ -286,6 +287,7 @@

    Table of contents

    +

    This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

    Pre-lecture materials

    @@ -523,7 +525,7 @@

    Practice

  14. Install the following packages:

-
install.packages(c("bibtex", "RefManageR")
+
install.packages(c("bibtex", "RefManageR"))

What do they do? How might they be helpful to you in terms of reference management?

    @@ -557,10 +559,53 @@

    rtistry

    [Add here.]

    + + +
    +

    R session information

    +
    +
    options(width = 120)
    +sessioninfo::session_info()
    +
    +
    ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
    + setting  value
    + version  R version 4.3.1 (2023-06-16)
    + os       macOS Ventura 13.5
    + system   aarch64, darwin20
    + ui       X11
    + language (EN)
    + collate  en_US.UTF-8
    + ctype    en_US.UTF-8
    + tz       America/New_York
    + date     2023-08-17
    + pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
    +
    +─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
    + package     * version date (UTC) lib source
    + cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
    + colorout      1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
    + digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
    + evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.0)
    + fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
    + htmltools     0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
    + htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
    + jsonlite      1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
    + knitr         1.43    2023-05-25 [1] CRAN (R 4.3.0)
    + rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
    + rmarkdown     2.24    2023-08-14 [1] CRAN (R 4.3.1)
    + rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
    + sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
    + xfun          0.40    2023-08-09 [1] CRAN (R 4.3.0)
    + yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
    +
    + [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
    +
    +──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    +
    +
    -

    References

    diff --git a/posts/07-reading-and-writing-data/index.html b/posts/07-reading-and-writing-data/index.html index 1ce6214..4b79d85 100644 --- a/posts/07-reading-and-writing-data/index.html +++ b/posts/07-reading-and-writing-data/index.html @@ -273,6 +273,7 @@

    Table of contents

  1. Final Questions
  2. Additional Resources
  3. +
  4. R session information
  5. @@ -286,6 +287,7 @@

    Table of contents

    +

    This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

    “When writing code, you’re always collaborating with future-you; and past-you doesn’t respond to emails”. —Hadley Wickham

    @@ -365,9 +367,9 @@

    Relative versus absolute paths

    function being used which explicitly tells R to change the absolute path or absolute location of which directory to move into.

    For example, say I want to clone a GitHub repo from my colleague Brian, which has 100 R script files, and in every one of those files at the top is:

    -
    setwd("C:\Users\Brian\path\only\that\Brian\has")
    +
    setwd("C:\\Users\\Brian\\path\\only\\that\\Brian\\has")
    -

    The problem is, if I want to use his code, I will need to go and hand-edit every single one of those paths (C:\Users\Brian\path\only\that\Brian\has) to the path that I want to use on my computer or wherever I saved the folder on my computer (e.g. /Users/Stephanie/Documents/path/only/I/have).

    +

    The problem is, if I want to use his code, I will need to go and hand-edit every single one of those paths (C:\Users\Brian\path\only\that\Brian\has) to the path that I want to use on my computer or wherever I saved the folder on my computer (e.g. /Users/leocollado/Documents/path/only/I/have).

    1. This is an unsustainable practice.
    2. I can go in and manually edit the path, but this assumes I know how to set a working directory. Not everyone does.
    3. @@ -472,8 +474,8 @@

      Finding
    4. List the files in the path.
    -
    if(!file.exists(here("my", "relative", "path"))){
    -  dir.create(here("my", "relative", "path"))
    +
    if (!file.exists(here("my", "relative", "path"))) {
    +    dir.create(here("my", "relative", "path"))
     }
     list.files(here("my", "relative", "path"))
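The check-create-list pattern above can be tried without touching a real project. The sketch below is a minimal, hedged variant that uses base R only: `demo_dir` and `example.txt` are hypothetical names, and `tempdir()` stands in for the project root that `here()` would normally supply. It also passes `recursive = TRUE`, because `dir.create()` does not create missing parent directories by default.

```r
# Hypothetical nested directory under the session's temporary folder;
# in a real project this path would come from here("my", "relative", "path").
demo_dir <- file.path(tempdir(), "my", "relative", "path")

# dir.exists() checks specifically for a directory.
# recursive = TRUE also creates the intermediate "my" and "my/relative" folders.
if (!dir.exists(demo_dir)) {
    dir.create(demo_dir, recursive = TRUE)
}

# Create an empty placeholder file so that list.files() has something to show.
file.create(file.path(demo_dir, "example.txt"))

list.files(demo_dir)
```

Using `dir.exists()` rather than `file.exists()` is a small design choice: both return `TRUE` for an existing directory, but the former makes the intent explicit.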
    @@ -584,7 +586,7 @@

    R code

    For example, it might be something like this:

    -
    source(here::here('functions.R'))
    +
    source(here::here("functions.R"))
    @@ -650,9 +652,9 @@

    Example

    Let’s try an example. Let’s save a vector of length 5 into the two file formats.

    x <- 1:5
    -save(x, file=here("data", "x.Rda"))
    -saveRDS(x, file=here("data", "x.Rds"))
    -list.files(path=here("data"))
    +save(x, file = here("data", "x.Rda"))
    +saveRDS(x, file = here("data", "x.Rds"))
    +list.files(path = here("data"))
     [1] "2016-07-19.csv.bz2"       "b_lyrics.RDS"            
      [3] "bmi_pm25_no2_sim.csv"     "chicago.rds"             
    @@ -720,7 +722,7 @@ 

    Example

    x <- 1:5
     y <- x^2
    -save(x,y, file=here("data", "x.Rda"))
    +save(x, y, file = here("data", "x.Rda"))
     new_x2 <- load(here("data", "x.Rda"))
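It may help to see what `load()` actually returns, since `new_x2` above holds the names of the restored objects rather than the data. The following is a small sketch under stated assumptions: it uses base R only, and the `tempfile()` paths are stand-ins for `here("data", ...)` so that nothing is written into the project.

```r
x <- 1:5
y <- x^2

# Temporary files stand in for here("data", "x.Rda") and here("data", "x.Rds").
rda_path <- tempfile(fileext = ".Rda")
rds_path <- tempfile(fileext = ".Rds")

save(x, y, file = rda_path)  # stores several objects together with their names
saveRDS(x, file = rds_path)  # stores one object, without its name

rm(x, y)

# load() recreates x and y in the current environment and
# (invisibly) returns a character vector with their names.
loaded_names <- load(rda_path)
loaded_names

# readRDS() returns the object itself, so it can be assigned to any name.
x_copy <- readRDS(rds_path)
x_copy
```

This difference is one reason `saveRDS()` and `readRDS()` are often preferred for single objects: the reader chooses the variable name, so nothing in the workspace is overwritten silently.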

    When you are done:

    @@ -963,7 +965,8 @@

    Example

    The second line of metadata
    x,y,z
    1,2,3",
    -  skip = 2)
    +  skip = 2
    +)
    Rows: 1 Columns: 3
     ── Column specification ────────────────────────────────────────────────────────
    @@ -985,7 +988,8 @@ 

    Example

    read_csv("# A comment I want to skip
       x,y,z
       1,2,3",
    -  comment = "#")
    +  comment = "#"
    +)
    Rows: 1 Columns: 3
     ── Column specification ────────────────────────────────────────────────────────
    @@ -1024,8 +1028,9 @@ 

    Example

    Here is an example of how to specify the column types explicitly:

    -
    teams <- read_csv(here("data", "team_standings.csv"), 
    -                  col_types = "cc")
    +
    teams <- read_csv(here("data", "team_standings.csv"),
    +    col_types = "cc"
    +)

    Note that the col_types argument accepts a compact representation. Here "cc" indicates that the first column is character and the second column is character (there are only two columns). Using the col_types argument is useful because often it is not easy to automatically figure out the type of a column by looking at a few rows (especially if a column has many missing values).
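To make the compact notation concrete, here is a minimal sketch that does not depend on the course data. The two-column CSV text is made up for illustration, and `I()` is used to mark the string as literal data (this requires readr 2.0 or later). In the compact string, `"c"` stands for character and `"i"` for integer.

```r
library(readr)

# Hypothetical inline data: one text column and one whole-number column.
csv_text <- "team,wins\nSpain,6\nGermany,5"

# "ci" = first column character, second column integer.
teams_demo <- read_csv(I(csv_text), col_types = "ci")

# Inspect the column specification that was actually applied.
spec(teams_demo)
```

Without `col_types`, readr would guess `wins` to be a double; stating the types up front removes the guess and the accompanying message.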

    @@ -1044,8 +1049,9 @@

    Example

    The following call reads a bzip2-compressed CSV file containing download logs from the RStudio CRAN mirror.

    -
    logs <- read_csv(here("data", "2016-07-19.csv.bz2"), 
    -                 n_max = 10)
    +
    logs <- read_csv(here("data", "2016-07-19.csv.bz2"),
    +    n_max = 10
    +)
    Rows: 10 Columns: 10
     ── Column specification ────────────────────────────────────────────────────────
    @@ -1061,10 +1067,11 @@ 

    Example

    Note that the warnings indicate that read_csv() may have had some difficulty identifying the type of each column. This can be solved by using the col_types argument.

    -
    logs <- read_csv(here("data", "2016-07-19.csv.bz2"), 
    -                 col_types = "ccicccccci", 
    -                 n_max = 10)
    -logs
    +
    logs <- read_csv(here("data", "2016-07-19.csv.bz2"),
    +    col_types = "ccicccccci",
    +    n_max = 10
    +)
    +logs
    # A tibble: 10 × 10
        date       time     size r_version r_arch r_os  package version country ip_id
    @@ -1085,10 +1092,11 @@ 

    Example

    For example, in the log data above, the first column is actually a date, so it might make more sense to read it in as a Date object.

    If we wanted to just read in that first column, we could do

    -
    logdates <- read_csv(here("data", "2016-07-19.csv.bz2"), 
    -                     col_types = cols_only(date = col_date()),
    -                     n_max = 10)
    -logdates
    +
    logdates <- read_csv(here("data", "2016-07-19.csv.bz2"),
    +    col_types = cols_only(date = col_date()),
    +    n_max = 10
    +)
    +logdates
    # A tibble: 10 × 1
        date      
    @@ -1166,9 +1174,73 @@ 

    Additional Resources

    + + +
    +

    R session information

    +
    +
    options(width = 120)
    +sessioninfo::session_info()
    +
    +
    ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
    + setting  value
    + version  R version 4.3.1 (2023-06-16)
    + os       macOS Ventura 13.5
    + system   aarch64, darwin20
    + ui       X11
    + language (EN)
    + collate  en_US.UTF-8
    + ctype    en_US.UTF-8
    + tz       America/New_York
    + date     2023-08-17
    + pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
    +
    +─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
    + package     * version date (UTC) lib source
    + bit           4.0.5   2022-11-15 [1] CRAN (R 4.3.0)
    + bit64         4.0.5   2020-08-30 [1] CRAN (R 4.3.0)
    + cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
    + colorout      1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
    + crayon        1.5.2   2022-09-29 [1] CRAN (R 4.3.0)
    + digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
    + evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.0)
    + fansi         1.0.4   2023-01-22 [1] CRAN (R 4.3.0)
    + fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
    + glue          1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
    + here        * 1.0.1   2020-12-13 [1] CRAN (R 4.3.0)
    + hms           1.1.3   2023-03-21 [1] CRAN (R 4.3.0)
    + htmltools     0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
    + htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
    + jsonlite      1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
    + knitr         1.43    2023-05-25 [1] CRAN (R 4.3.0)
    + lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
    + magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
    + pillar        1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
    + pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
    + R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
    + readr       * 2.1.4   2023-02-10 [1] CRAN (R 4.3.0)
    + rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
    + rmarkdown     2.24    2023-08-14 [1] CRAN (R 4.3.1)
    + rprojroot     2.0.3   2022-04-02 [1] CRAN (R 4.3.0)
    + rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
    + sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
    + tibble        3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
    + tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
    + tzdb          0.4.0   2023-05-12 [1] CRAN (R 4.3.0)
    + utf8          1.2.3   2023-01-31 [1] CRAN (R 4.3.0)
    + vctrs         0.6.3   2023-06-14 [1] CRAN (R 4.3.0)
    + vroom         1.6.3   2023-04-28 [1] CRAN (R 4.3.0)
    + withr         2.5.0   2022-03-03 [1] CRAN (R 4.3.0)
    + xfun          0.40    2023-08-09 [1] CRAN (R 4.3.0)
    + yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
    +
    + [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
    +
    +──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    +
    +
    -
    diff --git a/posts/08-managing-data-frames-with-tidyverse/index.html b/posts/08-managing-data-frames-with-tidyverse/index.html index 8c08a31..284f7fb 100644 --- a/posts/08-managing-data-frames-with-tidyverse/index.html +++ b/posts/08-managing-data-frames-with-tidyverse/index.html @@ -276,6 +276,7 @@

    Table of contents

  6. Final Questions
  7. Additional Resources
  8. +
  9. R session information
  10. @@ -289,6 +290,7 @@

    Table of contents

    +

    This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

    Pre-lecture materials

    @@ -464,8 +466,8 @@

    Want to see

Here, we again display the chicago data.frame as a tibble but specify that we would only like to see 5 rows. The width = Inf argument specifies that we would like to see all the possible columns. Here, there are only 8, but for larger datasets, this can be helpful to specify.

-
as_tibble(chicago) %>% 
-  print(n = 5, width = Inf)
+
as_tibble(chicago) %>%
+    print(n = 5, width = Inf)
# A tibble: 6,940 × 8
   city   tmpd  dptp date       pm25tmean2 pm10tmean2 o3tmean2 no2tmean2
@@ -497,10 +499,10 @@ 

tibble()

In the example here, we see that the column c will contain the value ‘1’ across all rows.

tibble(
-  a = 1:5,
-  b = 6:10,
-  c = 1,
-  z = (a + b)^2 + c
+    a = 1:5,
+    b = 6:10,
+    c = 1,
+    z = (a + b)^2 + c
 )
# A tibble: 5 × 4
@@ -531,9 +533,9 @@ 

tibble()

Note that to refer to such columns in other tidyverse packages, you will continue to use backticks surrounding the variable name.

tibble(
-  `two words` = 1:5,
-  `12` = "numeric",
-  `:)` = "smile",
+    `two words` = 1:5,
+    `12` = "numeric",
+    `:)` = "smile",
 )
# A tibble: 5 × 3
@@ -561,10 +563,10 @@ 

Subsetting tibbles

For example:

df <- tibble(
-  a = 1:5,
-  b = 6:10,
-  c = 1,
-  z = (a + b)^2 + c
+    a = 1:5,
+    b = 6:10,
+    c = 1,
+    z = (a + b)^2 + c
 )
 
 # Extract by name using $ or [[]]
@@ -1043,9 +1045,10 @@ 

mutate()

There is also the related transmute() function, which does the same thing as mutate() but then drops all non-transformed variables.

Here, we de-trend the PM10 and ozone (O3) variables.

-
head(transmute(chicago, 
-               pm10detrend = pm10tmean2 - mean(pm10tmean2, na.rm = TRUE),
-               o3detrend = o3tmean2 - mean(o3tmean2, na.rm = TRUE)))
+
head(transmute(chicago,
+    pm10detrend = pm10tmean2 - mean(pm10tmean2, na.rm = TRUE),
+    o3detrend = o3tmean2 - mean(o3tmean2, na.rm = TRUE)
+))
# A tibble: 6 × 2
   pm10detrend o3detrend
@@ -1103,9 +1106,11 @@ 

group_by()

Finally, we compute summary statistics for each year in the data frame with the summarize() function.

-
summarize(years, pm25 = mean(pm25, na.rm = TRUE), 
-          o3 = max(o3tmean2, na.rm = TRUE), 
-          no2 = median(no2tmean2, na.rm = TRUE))
+
summarize(years,
+    pm25 = mean(pm25, na.rm = TRUE),
+    o3 = max(o3tmean2, na.rm = TRUE),
+    no2 = median(no2tmean2, na.rm = TRUE)
+)
# A tibble: 19 × 4
     year  pm25    o3   no2
@@ -1156,8 +1161,10 @@ 

group_by()

Finally, we can compute the mean of o3 and no2 within quintiles of pm25.

-
summarize(quint, o3 = mean(o3tmean2, na.rm = TRUE), 
-          no2 = mean(no2tmean2, na.rm = TRUE))
+
summarize(quint,
+    o3 = mean(o3tmean2, na.rm = TRUE),
+    no2 = mean(no2tmean2, na.rm = TRUE)
+)
# A tibble: 6 × 3
   pm25.quint     o3   no2
@@ -1185,7 +1192,9 @@ 

%>%

This nesting is not a natural way to think about a sequence of operations.

The %>% operator allows you to string operations in a left-to-right fashion, i.e.

-
first(x) %>% second %>% third
+
first(x) %>%
+    second() %>%
+    third()
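As a runnable illustration of the schematic `first(x) %>% second() %>% third()`, the sketch below compares a nested call with its piped equivalent. The vector `x` and the choice of `sqrt()`, `mean()`, and `round()` are arbitrary stand-ins, and `%>%` is loaded from magrittr (dplyr re-exports the same operator).

```r
library(magrittr)

x <- c(1, 4, 9, 16)

# Nested form: read from the inside out.
nested <- round(mean(sqrt(x)), 1)

# Piped form: read left to right, in the order the steps are applied.
piped <- x %>%
    sqrt() %>%
    mean() %>%
    round(1)

identical(nested, piped) # TRUE
```

Each step receives the result of the previous one as its first argument, which is why `round(1)` in the pipeline corresponds to `round(<previous result>, 1)`.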
@@ -1200,12 +1209,14 @@

%>%

Take the example that we just did in the last section.

That can be done with the following sequence in a single R expression.

-
chicago %>% 
-  mutate(year = as.POSIXlt(date)$year + 1900) %>%    
-  group_by(year) %>% 
-  summarize(pm25 = mean(pm25, na.rm = TRUE), 
-            o3 = max(o3tmean2, na.rm = TRUE), 
-            no2 = median(no2tmean2, na.rm = TRUE))
+
chicago %>%
+    mutate(year = as.POSIXlt(date)$year + 1900) %>%
+    group_by(year) %>%
+    summarize(
+        pm25 = mean(pm25, na.rm = TRUE),
+        o3 = max(o3tmean2, na.rm = TRUE),
+        no2 = median(no2tmean2, na.rm = TRUE)
+    )
# A tibble: 19 × 4
     year  pm25    o3   no2
@@ -1250,11 +1261,13 @@ 

%>%

Another example might be computing the average pollutant level by month. This could be useful to see if there are any seasonal trends in the data.

-
mutate(chicago, month = as.POSIXlt(date)$mon + 1) %>% 
-        group_by(month) %>% 
-        summarize(pm25 = mean(pm25, na.rm = TRUE), 
-                  o3 = max(o3tmean2, na.rm = TRUE), 
-                  no2 = median(no2tmean2, na.rm = TRUE))
+
mutate(chicago, month = as.POSIXlt(date)$mon + 1) %>%
+    group_by(month) %>%
+    summarize(
+        pm25 = mean(pm25, na.rm = TRUE),
+        o3 = max(o3tmean2, na.rm = TRUE),
+        no2 = median(no2tmean2, na.rm = TRUE)
+    )
# A tibble: 12 × 4
    month  pm25    o3   no2
@@ -1299,16 +1312,16 @@ 

slice_*()

# A tibble: 10 × 11
    city   tmpd dewpoint date        pm25 pm10tmean2 o3tmean2 no2tmean2
    <chr> <dbl>    <dbl> <date>     <dbl>      <dbl>    <dbl>     <dbl>
- 1 chic   62       45.3 2001-05-08   7.3       51.5    26.5       27.6
- 2 chic   36       36.8 1991-11-28  NA         10      11.7       16.6
- 3 chic   29       19.6 2005-03-14  19.6       51       9.93      39.9
- 4 chic   20       11.2 2004-02-13  24.5       17.5    21.8       23.3
- 5 chic   32.5     20.4 1997-03-23  NA         14.2    25.4       19.0
- 6 chic   68.5     64.1 1996-07-27  NA         21      19.6       22.4
- 7 chic   28.5     18.2 1997-11-11  NA         24.5     3.94      28.1
- 8 chic   45.5     44.1 1991-04-13  NA         25      13.0       15.4
- 9 chic   67       49.3 2000-10-14  19.4       54.5    24.9       31.0
-10 chic   71       48   1994-09-21  NA         82      30.5       48.5
+ 1 chic   49       40.2 2000-09-25   6.6        7      17.2       15.5
+ 2 chic   35       24.1 1989-11-02  NA         25       8.83      17.3
+ 3 chic   63.5     54.4 1996-04-18  NA         54      30.5       26.7
+ 4 chic   70       65.9 1997-06-19  NA         60.5    32.4       39.9
+ 5 chic   54       50.6 2005-11-05  27.2       32      11.5       18.2
+ 6 chic   86.5     73.4 1990-07-04  NA         60.6    52.2       12.8
+ 7 chic   74       74.6 1987-08-14  NA         49.5    24.2       18.6
+ 8 chic   34.5     29.1 1995-11-27  NA         25       6.57      29.3
+ 9 chic   73       61.2 1995-09-13  NA         46      25.3       26.5
+10 chic   79       64.6 2005-07-31  20.8       29.5    40.8       20.2
 # ℹ 3 more variables: pm25detrend <dbl>, year <dbl>, pm25.quint <fct>
@@ -1411,9 +1424,84 @@

Additional Resources

+ + +
+

R session information

+
+
options(width = 120)
+sessioninfo::session_info()
+
+
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
+ setting  value
+ version  R version 4.3.1 (2023-06-16)
+ os       macOS Ventura 13.5
+ system   aarch64, darwin20
+ ui       X11
+ language (EN)
+ collate  en_US.UTF-8
+ ctype    en_US.UTF-8
+ tz       America/New_York
+ date     2023-08-17
+ pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
+
+─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
+ package     * version date (UTC) lib source
+ cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
+ colorout      1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
+ colorspace    2.1-0   2023-01-23 [1] CRAN (R 4.3.0)
+ digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
+ dplyr       * 1.1.2   2023-04-20 [1] CRAN (R 4.3.0)
+ evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.0)
+ fansi         1.0.4   2023-01-22 [1] CRAN (R 4.3.0)
+ fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
+ forcats     * 1.0.0   2023-01-29 [1] CRAN (R 4.3.0)
+ generics      0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
+ ggplot2     * 3.4.3   2023-08-14 [1] CRAN (R 4.3.1)
+ glue          1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
+ gtable        0.3.3   2023-03-21 [1] CRAN (R 4.3.0)
+ here        * 1.0.1   2020-12-13 [1] CRAN (R 4.3.0)
+ hms           1.1.3   2023-03-21 [1] CRAN (R 4.3.0)
+ htmltools     0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
+ htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
+ jsonlite      1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
+ knitr         1.43    2023-05-25 [1] CRAN (R 4.3.0)
+ lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
+ lubridate   * 1.9.2   2023-02-10 [1] CRAN (R 4.3.0)
+ magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
+ munsell       0.5.0   2018-06-12 [1] CRAN (R 4.3.0)
+ pillar        1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
+ pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
+ purrr       * 1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
+ R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
+ readr       * 2.1.4   2023-02-10 [1] CRAN (R 4.3.0)
+ rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
+ rmarkdown     2.24    2023-08-14 [1] CRAN (R 4.3.1)
+ rprojroot     2.0.3   2022-04-02 [1] CRAN (R 4.3.0)
+ rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
+ scales        1.2.1   2022-08-20 [1] CRAN (R 4.3.0)
+ sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
+ stringi       1.7.12  2023-01-11 [1] CRAN (R 4.3.0)
+ stringr     * 1.5.0   2022-12-02 [1] CRAN (R 4.3.0)
+ tibble      * 3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
+ tidyr       * 1.3.0   2023-01-24 [1] CRAN (R 4.3.0)
+ tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
+ tidyverse   * 2.0.0   2023-02-22 [1] CRAN (R 4.3.0)
+ timechange    0.2.0   2023-01-11 [1] CRAN (R 4.3.0)
+ tzdb          0.4.0   2023-05-12 [1] CRAN (R 4.3.0)
+ utf8          1.2.3   2023-01-31 [1] CRAN (R 4.3.0)
+ vctrs         0.6.3   2023-06-14 [1] CRAN (R 4.3.0)
+ withr         2.5.0   2022-03-03 [1] CRAN (R 4.3.0)
+ xfun          0.40    2023-08-09 [1] CRAN (R 4.3.0)
+ yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
+
+ [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
+
+──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
+
+
-
diff --git a/posts/09-tidy-data-and-the-tidyverse/index.html b/posts/09-tidy-data-and-the-tidyverse/index.html
index 9ca8928..a68d32d 100644
--- a/posts/09-tidy-data-and-the-tidyverse/index.html
+++ b/posts/09-tidy-data-and-the-tidyverse/index.html
@@ -254,6 +254,7 @@

Table of contents

  • Final Questions
  • Additional Resources
+ • R session information
@@ -267,6 +268,7 @@

    Table of contents

    +

    This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

    “Happy families are all alike; every unhappy family is unhappy in its own way.” — Leo Tolstoy

    @@ -403,8 +405,8 @@

    Tidy data

    library(tidyverse)
     
     relig_income %>%
    -  pivot_longer(-religion, names_to = "income", values_to = "respondents") %>%
    -  mutate(religion = factor(religion), income = factor(income))
    +    pivot_longer(-religion, names_to = "income", values_to = "respondents") %>%
    +    mutate(religion = factor(religion), income = factor(income))
    # A tibble: 180 × 3
        religion income             respondents
    @@ -487,7 +489,7 @@ 

    pivot_longer()
    # Gather everything EXCEPT religion to tidy data
     relig_income %>%
    -  pivot_longer(-religion, names_to = "income", values_to = "respondents")
    +    pivot_longer(-religion, names_to = "income", values_to = "respondents")

    # A tibble: 180 × 3
        religion income             respondents
    @@ -525,13 +527,15 @@ 

    pivot_wider()

    You use the summarize() function in dplyr to summarize the total number of respondents per income category.

    relig_income %>%
    -  pivot_longer(-religion, names_to = "income", values_to = "respondents") %>%
    -  mutate(religion = factor(religion), income = factor(income)) %>% 
    -  group_by(income) %>% 
    -  summarize(total_respondents = sum(respondents)) %>%
    -  pivot_wider(names_from = "income", 
    -              values_from = "total_respondents") %>%
    -  knitr::kable()
    +    pivot_longer(-religion, names_to = "income", values_to = "respondents") %>%
    +    mutate(religion = factor(religion), income = factor(income)) %>%
    +    group_by(income) %>%
    +    summarize(total_respondents = sum(respondents)) %>%
    +    pivot_wider(
    +        names_from = "income",
    +        values_from = "total_respondents"
    +    ) %>%
    +    knitr::kable()
    @@ -645,34 +649,34 @@

    pivot_wider()

    Bonus: Calculate a mean revenue for each company AND each year (averaged across all 4 quarters).

    df <- tibble(
    -  "company" = rep(1:3, each=4), 
    -  "year"  = rep(2006:2009, 3),
    -  "Q1"    = sample(x = 0:100, size = 12),
    -  "Q2"    = sample(x = 0:100, size = 12),
    -  "Q3"    = sample(x = 0:100, size = 12),
    -  "Q4"    = sample(x = 0:100, size = 12),
    +    "company" = rep(1:3, each = 4),
    +    "year" = rep(2006:2009, 3),
    +    "Q1" = sample(x = 0:100, size = 12),
    +    "Q2" = sample(x = 0:100, size = 12),
    +    "Q3" = sample(x = 0:100, size = 12),
    +    "Q4" = sample(x = 0:100, size = 12),
     )
     df
    # A tibble: 12 × 6
        company  year    Q1    Q2    Q3    Q4
          <int> <int> <int> <int> <int> <int>
    - 1       1  2006    34     7    70     7
    - 2       1  2007    72    26    96    64
    - 3       1  2008    62    68    45    98
    - 4       1  2009    45    48    42    92
    - 5       2  2006    51    13    75    36
    - 6       2  2007    49    71    34    93
    - 7       2  2008   100    83    22    71
    - 8       2  2009    91    67    28    80
    - 9       3  2006    19    28    85     1
    -10       3  2007    61    38    65    75
    -11       3  2008    32    57    47    51
    -12       3  2009     4    58    63     0
    + 1       1  2006    99     6    54    47
    + 2       1  2007    28    79    90     9
    + 3       1  2008     7    72    69    24
    + 4       1  2009    16    56     6   100
    + 5       2  2006    42    58    75    25
    + 6       2  2007    64     1   100     6
    + 7       2  2008    43    88    37    77
    + 8       2  2009    95    74    17    44
    + 9       3  2006    34    47    77    38
    +10       3  2007    73    31    31    54
    +11       3  2008     4    49    93     0
    +12       3  2009    57     4    45    96
    -
    # try it yourself 
    +
    # try it yourself
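One possible approach to the bonus question, offered as a sketch rather than the intended solution: reshape the four quarter columns into long format, then average within each company and year. The seed, the object name `revenue_by_year`, and the column names `quarter`, `revenue` and `mean_revenue` are assumptions introduced here for illustration.

```r
library(tidyverse)

set.seed(2023) # assumed seed, only so the example is reproducible

df <- tibble(
    "company" = rep(1:3, each = 4),
    "year" = rep(2006:2009, 3),
    "Q1" = sample(x = 0:100, size = 12),
    "Q2" = sample(x = 0:100, size = 12),
    "Q3" = sample(x = 0:100, size = 12),
    "Q4" = sample(x = 0:100, size = 12)
)

revenue_by_year <- df %>%
    # one row per company, year and quarter
    pivot_longer(Q1:Q4, names_to = "quarter", values_to = "revenue") %>%
    # average the four quarters within each company and year
    group_by(company, year) %>%
    summarize(mean_revenue = mean(revenue), .groups = "drop")

revenue_by_year
```

Because each company and year pair occupies a single row in `df`, the result has the same 12 rows as the input, with the four quarter columns collapsed into one mean.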
    @@ -686,10 +690,12 @@

    separate()

    First, we combine the first three columns into one new column using unite().

    -
    gapminder %>% 
    -  unite(col="country_continent_year", 
    -        country:year, 
    -        sep="_")
    +
    gapminder %>%
    +    unite(
    +        col = "country_continent_year",
    +        country:year,
    +        sep = "_"
    +    )
    # A tibble: 1,704 × 4
        country_continent_year lifeExp      pop gdpPercap
    @@ -709,13 +715,17 @@ 

    separate()

    Next, we show how to split that column back into three columns with separate(), using the col, into and sep arguments.

    -
    gapminder %>% 
    -  unite(col="country_continent_year", 
    -        country:year, 
    -        sep="_") %>% 
    -  separate(col="country_continent_year", 
    -           into=c("country", "continent", "year"), 
    -           sep="_")
    +
    gapminder %>%
    +    unite(
    +        col = "country_continent_year",
    +        country:year,
    +        sep = "_"
    +    ) %>%
    +    separate(
    +        col = "country_continent_year",
    +        into = c("country", "continent", "year"),
    +        sep = "_"
    +    )
    # A tibble: 1,704 × 6
        country     continent year  lifeExp      pop gdpPercap
    @@ -755,11 +765,11 @@ 

    Final Questions

  • What do the extra and fill arguments do in separate()? Experiment with the various options for the following two toy datasets.

  • -
    tibble(x = c("a,b,c", "d,e,f,g", "h,i,j")) %>% 
    -  separate(x, c("one", "two", "three"))
    +
    tibble(x = c("a,b,c", "d,e,f,g", "h,i,j")) %>%
    +    separate(x, c("one", "two", "three"))
     
    -tibble(x = c("a,b,c", "d,e", "f,g,i")) %>% 
    -  separate(x, c("one", "two", "three"))
    +tibble(x = c("a,b,c", "d,e", "f,g,i")) %>%
    +    separate(x, c("one", "two", "three"))
    1. Both unite() and separate() have a remove argument. What does it do? Why would you set it to FALSE?
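As a starting point for experimenting with these questions, here is a minimal sketch of three of the available options on the toy datasets. The object names (`too_many`, `too_few`, `merged`, `filled`, `kept`) are made up for illustration; the other values of `extra` and `fill` are left for you to try.

```r
library(tidyverse)

too_many <- tibble(x = c("a,b,c", "d,e,f,g", "h,i,j"))
too_few <- tibble(x = c("a,b,c", "d,e", "f,g,i"))

# extra = "merge": surplus pieces stay in the last column
# instead of being dropped with a warning
merged <- too_many %>%
    separate(x, c("one", "two", "three"), extra = "merge")

# fill = "right": missing pieces become NA on the right, without a warning
filled <- too_few %>%
    separate(x, c("one", "two", "three"), fill = "right")

# remove = FALSE: the original column x is kept next to the new columns
kept <- too_many %>%
    separate(x, c("one", "two", "three"), extra = "drop", remove = FALSE)

merged
filled
kept
```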

    @@ -787,9 +797,83 @@

      Additional Resources

    + + +
    +

    R session information

    +
    +
    options(width = 120)
    +sessioninfo::session_info()
    +
    +
    ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
    + setting  value
    + version  R version 4.3.1 (2023-06-16)
    + os       macOS Ventura 13.5
    + system   aarch64, darwin20
    + ui       X11
    + language (EN)
    + collate  en_US.UTF-8
    + ctype    en_US.UTF-8
    + tz       America/New_York
    + date     2023-08-17
    + pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
    +
    +─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
    + package     * version date (UTC) lib source
    + cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
    + colorout      1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
    + colorspace    2.1-0   2023-01-23 [1] CRAN (R 4.3.0)
    + digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
    + dplyr       * 1.1.2   2023-04-20 [1] CRAN (R 4.3.0)
    + evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.0)
    + fansi         1.0.4   2023-01-22 [1] CRAN (R 4.3.0)
    + fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
    + forcats     * 1.0.0   2023-01-29 [1] CRAN (R 4.3.0)
    + gapminder   * 1.0.0   2023-03-10 [1] CRAN (R 4.3.0)
    + generics      0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
    + ggplot2     * 3.4.3   2023-08-14 [1] CRAN (R 4.3.1)
    + glue          1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
    + gtable        0.3.3   2023-03-21 [1] CRAN (R 4.3.0)
    + hms           1.1.3   2023-03-21 [1] CRAN (R 4.3.0)
    + htmltools     0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
    + htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
    + jsonlite      1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
    + knitr         1.43    2023-05-25 [1] CRAN (R 4.3.0)
    + lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
    + lubridate   * 1.9.2   2023-02-10 [1] CRAN (R 4.3.0)
    + magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
    + munsell       0.5.0   2018-06-12 [1] CRAN (R 4.3.0)
    + pillar        1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
    + pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
    + purrr       * 1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
    + R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
    + readr       * 2.1.4   2023-02-10 [1] CRAN (R 4.3.0)
    + rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
    + rmarkdown     2.24    2023-08-14 [1] CRAN (R 4.3.1)
    + rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
    + scales        1.2.1   2022-08-20 [1] CRAN (R 4.3.0)
    + sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
    + stringi       1.7.12  2023-01-11 [1] CRAN (R 4.3.0)
    + stringr     * 1.5.0   2022-12-02 [1] CRAN (R 4.3.0)
    + tibble      * 3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
    + tidyr       * 1.3.0   2023-01-24 [1] CRAN (R 4.3.0)
    + tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
    + tidyverse   * 2.0.0   2023-02-22 [1] CRAN (R 4.3.0)
    + timechange    0.2.0   2023-01-11 [1] CRAN (R 4.3.0)
    + tzdb          0.4.0   2023-05-12 [1] CRAN (R 4.3.0)
    + utf8          1.2.3   2023-01-31 [1] CRAN (R 4.3.0)
    + vctrs         0.6.3   2023-06-14 [1] CRAN (R 4.3.0)
    + withr         2.5.0   2022-03-03 [1] CRAN (R 4.3.0)
    + xfun          0.40    2023-08-09 [1] CRAN (R 4.3.0)
    + yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
    +
    + [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
    +
    +──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    +
    +
    -
    diff --git a/posts/10-joining-data-in-r/index.html b/posts/10-joining-data-in-r/index.html
    index b492a82..017229f 100644
    --- a/posts/10-joining-data-in-r/index.html
    +++ b/posts/10-joining-data-in-r/index.html
    @@ -266,6 +266,7 @@

    Table of contents

  • Final Questions
  • Additional Resources
+ • R session information
@@ -279,6 +280,7 @@

    Table of contents

    +

    This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

    Pre-lecture materials

    @@ -401,9 +403,9 @@

    The first table

    library(tidyverse)
     
     outcomes <- tibble(
    -        id = rep(c("a", "b", "c"), each = 3),
    -        visit = rep(0:2, 3),
    -        outcome = rnorm(3 * 3, 3)
    +    id = rep(c("a", "b", "c"), each = 3),
    +    visit = rep(0:2, 3),
    +    outcome = rnorm(3 * 3, 3)
     )
     
     print(outcomes)
    @@ -411,15 +413,15 @@

    The first table

    # A tibble: 9 × 3
       id    visit outcome
       <chr> <int>   <dbl>
    -1 a         0   1.54 
    -2 a         1   3.39 
    -3 a         2   3.03 
    -4 b         0   0.309
    -5 b         1   2.52 
    -6 b         2   3.03 
    -7 c         0   2.13 
    -8 c         1   3.12 
    -9 c         2   3.99 
    +1 a         0    3.07
    +2 a         1    3.25
    +3 a         2    3.93
    +4 b         0    2.18
    +5 b         1    2.91
    +6 b         2    2.83
    +7 c         0    1.49
    +8 c         1    2.56
    +9 c         2    1.46

    Note that subjects are labeled by a unique identifier in the id column.

    @@ -429,8 +431,8 @@

    A second table

    Here is some code to create a second table (we will be joining the first and second tables shortly). This table contains some data about the hypothetical subjects’ housing situation by recording the type of house they live in.

    subjects <- tibble(
    -        id = c("a", "b", "c"),
    -        house = c("detached", "rowhouse", "rowhouse")
    +    id = c("a", "b", "c"),
    +    house = c("detached", "rowhouse", "rowhouse")
     )
     
     print(subjects)
    @@ -517,15 +519,15 @@

    Left Join

    # A tibble: 9 × 3
       id    visit outcome
       <chr> <int>   <dbl>
    -1 a         0   1.54 
    -2 a         1   3.39 
    -3 a         2   3.03 
    -4 b         0   0.309
    -5 b         1   2.52 
    -6 b         2   3.03 
    -7 c         0   2.13 
    -8 c         1   3.12 
    -9 c         2   3.99 
    +1 a         0    3.07
    +2 a         1    3.25
    +3 a         2    3.93
    +4 b         0    2.18
    +5 b         1    2.91
    +6 b         2    2.83
    +7 c         0    1.49
    +8 c         1    2.56
    +9 c         2    1.46
    subjects
    @@ -545,15 +547,15 @@

    Left Join

    # A tibble: 9 × 4
       id    visit outcome house   
       <chr> <int>   <dbl> <chr>   
    -1 a         0   1.54  detached
    -2 a         1   3.39  detached
    -3 a         2   3.03  detached
    -4 b         0   0.309 rowhouse
    -5 b         1   2.52  rowhouse
    -6 b         2   3.03  rowhouse
    -7 c         0   2.13  rowhouse
    -8 c         1   3.12  rowhouse
    -9 c         2   3.99  rowhouse
    +1 a         0    3.07 detached
    +2 a         1    3.25 detached
    +3 a         2    3.93 detached
    +4 b         0    2.18 rowhouse
    +5 b         1    2.91 rowhouse
    +6 b         2    2.83 rowhouse
    +7 c         0    1.49 rowhouse
    +8 c         1    2.56 rowhouse
    +9 c         2    1.46 rowhouse
    @@ -574,9 +576,9 @@

    Left Join w

    In the previous examples, the subjects table didn’t have a visit column. But suppose it did. Maybe people move around during the study. We could imagine a table like this one.

    subjects <- tibble(
    -        id = c("a", "b", "c"),
    -        visit = c(0, 1, 0),
    -        house = c("detached", "rowhouse", "rowhouse"),
    +    id = c("a", "b", "c"),
    +    visit = c(0, 1, 0),
    +    house = c("detached", "rowhouse", "rowhouse"),
     )
     
     print(subjects)
    @@ -596,15 +598,15 @@

    Left Join w
    # A tibble: 9 × 4
       id    visit outcome house   
       <chr> <dbl>   <dbl> <chr>   
    -1 a         0   1.54  detached
    -2 a         1   3.39  <NA>    
    -3 a         2   3.03  <NA>    
    -4 b         0   0.309 <NA>    
    -5 b         1   2.52  rowhouse
    -6 b         2   3.03  <NA>    
    -7 c         0   2.13  rowhouse
    -8 c         1   3.12  <NA>    
    -9 c         2   3.99  <NA>    
    +1 a         0    3.07 detached
    +2 a         1    3.25 <NA>    
    +3 a         2    3.93 <NA>    
    +4 b         0    2.18 <NA>    
    +5 b         1    2.91 rowhouse
    +6 b         2    2.83 <NA>    
    +7 c         0    1.49 rowhouse
    +8 c         1    2.56 <NA>    
    +9 c         2    1.46 <NA>    
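The pattern of `<NA>` values in the output above comes from matching on both `id` and `visit`. The join call itself is not visible in this excerpt, so the `by = c("id", "visit")` form below is an assumption, as are the seed and the object name `joined`. It is a small self-contained sketch of a two-key left join:

```r
library(tidyverse)

set.seed(2023) # assumed seed; the outcome values will differ from the lecture

outcomes <- tibble(
    id = rep(c("a", "b", "c"), each = 3),
    visit = rep(0:2, 3),
    outcome = rnorm(3 * 3, 3)
)

subjects <- tibble(
    id = c("a", "b", "c"),
    visit = c(0, 1, 0),
    house = c("detached", "rowhouse", "rowhouse")
)

# Every row of outcomes is kept; house is filled in only where
# BOTH id and visit match a row of subjects, and is NA otherwise
joined <- left_join(outcomes, subjects, by = c("id", "visit"))

joined
```

All nine rows of `outcomes` survive, but only the three (id, visit) pairs recorded in `subjects` receive a `house` value.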

    @@ -627,9 +629,9 @@

    Left Join w

    We may even have a situation where we are missing housing data for a subject completely. The following table has no information about subject a.

    subjects <- tibble(
    -        id = c("b", "c"),
    -        visit = c(1, 0),
    -        house = c("rowhouse", "rowhouse"),
    +    id = c("b", "c"),
    +    visit = c(1, 0),
    +    house = c("rowhouse", "rowhouse"),
     )
     
     subjects
    @@ -648,15 +650,15 @@

    Left Join w
    # A tibble: 9 × 4
       id    visit outcome house   
       <chr> <dbl>   <dbl> <chr>   
    -1 a         0   1.54  <NA>    
    -2 a         1   3.39  <NA>    
    -3 a         2   3.03  <NA>    
    -4 b         0   0.309 <NA>    
    -5 b         1   2.52  rowhouse
    -6 b         2   3.03  <NA>    
    -7 c         0   2.13  rowhouse
    -8 c         1   3.12  <NA>    
    -9 c         2   3.99  <NA>    
    +1 a         0    3.07 <NA>    
    +2 a         1    3.25 <NA>    
    +3 a         2    3.93 <NA>    
    +4 b         0    2.18 <NA>    
    +5 b         1    2.91 rowhouse
    +6 b         2    2.83 <NA>    
    +7 c         0    1.49 rowhouse
    +8 c         1    2.56 <NA>    
    +9 c         2    1.46 <NA>    

    @@ -686,8 +688,8 @@

    Inner Join

    # A tibble: 2 × 4
       id    visit outcome house   
       <chr> <dbl>   <dbl> <chr>   
    -1 b         1    2.52 rowhouse
    -2 c         0    2.13 rowhouse
    +1 b         1    2.91 rowhouse
    +2 c         0    1.49 rowhouse
    @@ -700,8 +702,8 @@

    Right Join

    # A tibble: 2 × 4
       id    visit outcome house   
       <chr> <dbl>   <dbl> <chr>   
    -1 b         1    2.52 rowhouse
    -2 c         0    2.13 rowhouse
    +1 b         1    2.91 rowhouse
    +2 c         0    1.49 rowhouse
    @@ -735,11 +737,15 @@

    Final Questions

    # Create first example data frame
    -df1 <- data.frame(ID = 1:3,
    -                  X1 = c("a1", "a2", "a3"))
    -# Create second example data frame
    -df2 <- data.frame(ID = 2:4, 
    -                  X2 = c("b1", "b2", "b3"))
    +df1 <- data.frame(
    +    ID = 1:3,
    +    X1 = c("a1", "a2", "a3")
    +)
    +# Create second example data frame
    +df2 <- data.frame(
    +    ID = 2:4,
    +    X2 = c("b1", "b2", "b3")
    +)
    1. Try changing the order of the arguments from the above, e.g. inner_join(df2, df1), semi_join(df2, df1) and anti_join(df2, df1). What changed? What did not change?
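To explore this question, the following sketch runs each join in both argument orders on the two example data frames. The result names (`inner_12`, `semi_21`, and so on) are made up for illustration, and `by = "ID"` is spelled out only to silence the message about the join column.

```r
library(dplyr)

df1 <- data.frame(
    ID = 1:3,
    X1 = c("a1", "a2", "a3")
)
df2 <- data.frame(
    ID = 2:4,
    X2 = c("b1", "b2", "b3")
)

# Inner join: same matching rows either way, but the column order follows the first table
inner_12 <- inner_join(df1, df2, by = "ID")
inner_21 <- inner_join(df2, df1, by = "ID")

# Semi join: rows of the FIRST table that have a match, with only its own columns
semi_12 <- semi_join(df1, df2, by = "ID")
semi_21 <- semi_join(df2, df1, by = "ID")

# Anti join: rows of the FIRST table that have no match
anti_12 <- anti_join(df1, df2, by = "ID")
anti_21 <- anti_join(df2, df1, by = "ID")
```

Printing each object side by side makes the asymmetry of the semi and anti joins easy to see.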
    @@ -766,9 +772,82 @@

      Additional Resources

      + + +
      +

      R session information

      +
      +
      options(width = 120)
      +sessioninfo::session_info()
      +
      +
      ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
      + setting  value
      + version  R version 4.3.1 (2023-06-16)
      + os       macOS Ventura 13.5
      + system   aarch64, darwin20
      + ui       X11
      + language (EN)
      + collate  en_US.UTF-8
      + ctype    en_US.UTF-8
      + tz       America/New_York
      + date     2023-08-17
      + pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
      +
      +─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
      + package     * version date (UTC) lib source
      + cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
      + colorout      1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
      + colorspace    2.1-0   2023-01-23 [1] CRAN (R 4.3.0)
      + digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
      + dplyr       * 1.1.2   2023-04-20 [1] CRAN (R 4.3.0)
      + evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.0)
      + fansi         1.0.4   2023-01-22 [1] CRAN (R 4.3.0)
      + fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
      + forcats     * 1.0.0   2023-01-29 [1] CRAN (R 4.3.0)
      + generics      0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
      + ggplot2     * 3.4.3   2023-08-14 [1] CRAN (R 4.3.1)
      + glue          1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
      + gtable        0.3.3   2023-03-21 [1] CRAN (R 4.3.0)
      + hms           1.1.3   2023-03-21 [1] CRAN (R 4.3.0)
      + htmltools     0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
      + htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
      + jsonlite      1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
      + knitr       * 1.43    2023-05-25 [1] CRAN (R 4.3.0)
      + lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
      + lubridate   * 1.9.2   2023-02-10 [1] CRAN (R 4.3.0)
      + magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
      + munsell       0.5.0   2018-06-12 [1] CRAN (R 4.3.0)
      + pillar        1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
      + pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
      + purrr       * 1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
      + R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
      + readr       * 2.1.4   2023-02-10 [1] CRAN (R 4.3.0)
      + rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
      + rmarkdown     2.24    2023-08-14 [1] CRAN (R 4.3.1)
      + rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
      + scales        1.2.1   2022-08-20 [1] CRAN (R 4.3.0)
      + sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
      + stringi       1.7.12  2023-01-11 [1] CRAN (R 4.3.0)
      + stringr     * 1.5.0   2022-12-02 [1] CRAN (R 4.3.0)
      + tibble      * 3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
      + tidyr       * 1.3.0   2023-01-24 [1] CRAN (R 4.3.0)
      + tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
      + tidyverse   * 2.0.0   2023-02-22 [1] CRAN (R 4.3.0)
      + timechange    0.2.0   2023-01-11 [1] CRAN (R 4.3.0)
      + tzdb          0.4.0   2023-05-12 [1] CRAN (R 4.3.0)
      + utf8          1.2.3   2023-01-31 [1] CRAN (R 4.3.0)
      + vctrs         0.6.3   2023-06-14 [1] CRAN (R 4.3.0)
      + withr         2.5.0   2022-03-03 [1] CRAN (R 4.3.0)
      + xfun          0.40    2023-08-09 [1] CRAN (R 4.3.0)
      + yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
      +
      + [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
      +
      +──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
      +
      +
      -
      diff --git a/posts/11-plotting-systems/index.html b/posts/11-plotting-systems/index.html
      index fb0d71d..ae0389a 100644
      --- a/posts/11-plotting-systems/index.html
      +++ b/posts/11-plotting-systems/index.html
      @@ -247,6 +247,7 @@

      Table of contents

    3. The Lattice System
    4. The ggplot2 System
    +5. R session information
    @@ -260,6 +261,7 @@

      Table of contents

      +

      This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

      The data may not contain the answer. And, if you torture the data long enough, it will tell you anything. —John W. Tukey

      @@ -357,8 +359,8 @@

      The Base Plotting
      data(airquality)
       with(airquality, {
      -        plot(Temp, Ozone)
      -        lines(loess.smooth(Temp, Ozone))
      +    plot(Temp, Ozone)
      +    lines(loess.smooth(Temp, Ozone))
       })
      @@ -380,8 +382,8 @@

      The Base Plotting
      data(airquality)
       with(airquality, {
      -        plot(Temp, Ozone, main = "my plot")
      -        lines(loess.smooth(Temp, Ozone))
      +    plot(Temp, Ozone, main = "my plot")
      +    lines(loess.smooth(Temp, Ozone))
       })
      @@ -559,8 +561,8 @@

      The ggplot2 System

      library(tidyverse)
       data(mpg)
       mpg %>%
      -  ggplot(aes(displ, hwy)) + 
      -  geom_point()
      +    ggplot(aes(displ, hwy)) +
      +    geom_point()
      @@ -574,9 +576,85 @@

      The ggplot2 System

      There are additional functions in ggplot2 that allow you to make arbitrarily sophisticated plots.

      We will discuss more about this in the next lecture.

      + + +
      +

      R session information

      +
      +
      options(width = 120)
      +sessioninfo::session_info()
      +
      +
      ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
      + setting  value
      + version  R version 4.3.1 (2023-06-16)
      + os       macOS Ventura 13.5
      + system   aarch64, darwin20
      + ui       X11
      + language (EN)
      + collate  en_US.UTF-8
      + ctype    en_US.UTF-8
      + tz       America/New_York
      + date     2023-08-17
      + pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
      +
      +─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
      + package     * version date (UTC) lib source
      + cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
      + colorout      1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
      + colorspace    2.1-0   2023-01-23 [1] CRAN (R 4.3.0)
      + digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
      + dplyr       * 1.1.2   2023-04-20 [1] CRAN (R 4.3.0)
      + evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.0)
      + fansi         1.0.4   2023-01-22 [1] CRAN (R 4.3.0)
      + farver        2.1.1   2022-07-06 [1] CRAN (R 4.3.0)
      + fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
      + forcats     * 1.0.0   2023-01-29 [1] CRAN (R 4.3.0)
      + generics      0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
      + ggplot2     * 3.4.3   2023-08-14 [1] CRAN (R 4.3.1)
      + glue          1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
      + gtable        0.3.3   2023-03-21 [1] CRAN (R 4.3.0)
      + hms           1.1.3   2023-03-21 [1] CRAN (R 4.3.0)
      + htmltools     0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
      + htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
      + jsonlite      1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
      + knitr         1.43    2023-05-25 [1] CRAN (R 4.3.0)
      + labeling      0.4.2   2020-10-20 [1] CRAN (R 4.3.0)
      + lattice     * 0.21-8  2023-04-05 [1] CRAN (R 4.3.1)
      + lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
      + lubridate   * 1.9.2   2023-02-10 [1] CRAN (R 4.3.0)
      + magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
      + munsell       0.5.0   2018-06-12 [1] CRAN (R 4.3.0)
      + pillar        1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
      + pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
      + purrr       * 1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
      + R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
      + readr       * 2.1.4   2023-02-10 [1] CRAN (R 4.3.0)
      + rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
      + rmarkdown     2.24    2023-08-14 [1] CRAN (R 4.3.1)
      + rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
      + scales        1.2.1   2022-08-20 [1] CRAN (R 4.3.0)
      + sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
      + stringi       1.7.12  2023-01-11 [1] CRAN (R 4.3.0)
      + stringr     * 1.5.0   2022-12-02 [1] CRAN (R 4.3.0)
      + tibble      * 3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
      + tidyr       * 1.3.0   2023-01-24 [1] CRAN (R 4.3.0)
      + tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
      + tidyverse   * 2.0.0   2023-02-22 [1] CRAN (R 4.3.0)
      + timechange    0.2.0   2023-01-11 [1] CRAN (R 4.3.0)
      + tzdb          0.4.0   2023-05-12 [1] CRAN (R 4.3.0)
      + utf8          1.2.3   2023-01-31 [1] CRAN (R 4.3.0)
      + vctrs         0.6.3   2023-06-14 [1] CRAN (R 4.3.0)
      + withr         2.5.0   2022-03-03 [1] CRAN (R 4.3.0)
      + xfun          0.40    2023-08-09 [1] CRAN (R 4.3.0)
      + yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
      +
      + [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
      +
      +──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
      +
      +
      -
      diff --git a/posts/12-ggplot2-plotting-system-part-1/index.html b/posts/12-ggplot2-plotting-system-part-1/index.html
      index 9e0296d..c2cab99 100644
      --- a/posts/12-ggplot2-plotting-system-part-1/index.html
      +++ b/posts/12-ggplot2-plotting-system-part-1/index.html
      @@ -258,6 +258,7 @@

      Table of contents

    8. Final Questions
    9. Additional Resources
    +10. R session information
    @@ -271,6 +272,7 @@

      Table of contents

      +

      This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

      “The greatest value of a picture is when it forces us to notice what we never expected to see.” —John Tukey

      @@ -368,9 +370,9 @@

      The ggplot2 Plotting System

      Consider the following plot made using base graphics previously.

      -
      with(airquality, { 
      -        plot(Temp, Ozone)
      -        lines(loess.smooth(Temp, Ozone))
      +
      with(airquality, {
      +    plot(Temp, Ozone)
      +    lines(loess.smooth(Temp, Ozone))
       })
      @@ -394,11 +396,13 @@

      The ggplot2 Plotting System

      library(tidyverse)
       airquality %>%
      -        ggplot(aes(Temp, Ozone)) + 
      -        geom_point() + 
      -        geom_smooth(method = "loess", 
      -                    se = FALSE) + 
      -        theme_minimal()
      +    ggplot(aes(Temp, Ozone)) +
      +    geom_point() +
      +    geom_smooth(
      +        method = "loess",
      +        se = FALSE
      +    ) +
      +    theme_minimal()
      @@ -667,8 +671,8 @@

      Modifying aesthetics

      If we want to count the number of penguins for each of the three species, we can use the count() function in dplyr:

      -
      penguins %>% 
      -  count(species)
      +
      penguins %>%
      +    count(species)
      # A tibble: 3 × 2
         species       n
      @@ -901,8 +905,8 @@ 

      Facets

      What if you wanted to add a smoother to each one of those panels? Simple, you literally just add the smoother as another geom.

      -
      qplot(displ, hwy, data = mpg, facets = . ~ drv) + 
      -  geom_smooth(method = "lm")
      +
      qplot(displ, hwy, data = mpg, facets = . ~ drv) +
      +    geom_smooth(method = "lm")
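Since `qplot()` is deprecated as of ggplot2 3.4.0, the same faceted plot can also be written with `ggplot()` directly. The following is a sketch of an equivalent call, not the lecture's own code; the object name `p` is made up for illustration.

```r
library(ggplot2)

# One panel per drive type (drv), each with its own points
# and its own linear regression line
p <- ggplot(mpg, aes(displ, hwy)) +
    geom_point() +
    geom_smooth(method = "lm") +
    facet_grid(. ~ drv)

p
```

Here `facet_grid(. ~ drv)` plays the role of the `facets = . ~ drv` argument, and the smoother is still added as just another geom.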
      @@ -1099,8 +1103,8 @@

      Case Study: MAACS

      This is slightly better but the substantial overlap makes it difficult to discern any trends in the data. For this we need to add a smoother of some sort. Here we add a linear regression line (a type of smoother) to each group to see if there’s any difference.

      -
      qplot(log(pm25), log(eno), data = maacs, color = mopos) + 
      -        geom_smooth(method = "lm")
      +
      qplot(log(pm25), log(eno), data = maacs, color = mopos) +
      +    geom_smooth(method = "lm")
      `geom_smooth()` using formula = 'y ~ x'
      @@ -1111,8 +1115,8 @@

      Case Study: MAACS

      Here we see quite clearly that the red group and the green group exhibit rather different relationships between PM2.5 and eNO. For the non-allergic individuals, there appears to be a slightly negative relationship between PM2.5 and eNO and for the allergic individuals, there is a positive relationship. This suggests a strong interaction between PM2.5 and allergic status, a hypothesis perhaps worth following up on in greater detail than this brief exploratory analysis.

      Another, and perhaps clearer, way to visualize this interaction is to use separate panels for the non-allergic and allergic individuals using the facets argument to qplot().

      -
      qplot(log(pm25), log(eno), data = maacs, facets = . ~ mopos) + 
      -        geom_smooth(method = "lm")
      +
      qplot(log(pm25), log(eno), data = maacs, facets = . ~ mopos) +
      +    geom_smooth(method = "lm")
      `geom_smooth()` using formula = 'y ~ x'
      @@ -1169,9 +1173,95 @@

      Additional Resources

      + + +
      +

      R session information

      +
      +
      options(width = 120)
      +sessioninfo::session_info()
      +
      +
      ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
      + setting  value
      + version  R version 4.3.1 (2023-06-16)
      + os       macOS Ventura 13.5
      + system   aarch64, darwin20
      + ui       X11
      + language (EN)
      + collate  en_US.UTF-8
      + ctype    en_US.UTF-8
      + tz       America/New_York
      + date     2023-08-17
      + pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
      +
      +─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
      + package        * version date (UTC) lib source
      + bit              4.0.5   2022-11-15 [1] CRAN (R 4.3.0)
      + bit64            4.0.5   2020-08-30 [1] CRAN (R 4.3.0)
      + cli              3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
      + colorout         1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
      + colorspace       2.1-0   2023-01-23 [1] CRAN (R 4.3.0)
      + crayon           1.5.2   2022-09-29 [1] CRAN (R 4.3.0)
      + digest           0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
      + dplyr          * 1.1.2   2023-04-20 [1] CRAN (R 4.3.0)
      + evaluate         0.21    2023-05-05 [1] CRAN (R 4.3.0)
      + fansi            1.0.4   2023-01-22 [1] CRAN (R 4.3.0)
      + farver           2.1.1   2022-07-06 [1] CRAN (R 4.3.0)
      + fastmap          1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
      + forcats        * 1.0.0   2023-01-29 [1] CRAN (R 4.3.0)
      + generics         0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
      + ggplot2        * 3.4.3   2023-08-14 [1] CRAN (R 4.3.1)
      + glue             1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
      + gtable           0.3.3   2023-03-21 [1] CRAN (R 4.3.0)
      + here           * 1.0.1   2020-12-13 [1] CRAN (R 4.3.0)
      + hms              1.1.3   2023-03-21 [1] CRAN (R 4.3.0)
      + htmltools        0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
      + htmlwidgets      1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
      + jsonlite         1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
      + knitr            1.43    2023-05-25 [1] CRAN (R 4.3.0)
      + labeling         0.4.2   2020-10-20 [1] CRAN (R 4.3.0)
      + lattice          0.21-8  2023-04-05 [1] CRAN (R 4.3.1)
      + lifecycle        1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
      + lubridate      * 1.9.2   2023-02-10 [1] CRAN (R 4.3.0)
      + magrittr         2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
      + Matrix           1.6-1   2023-08-14 [1] CRAN (R 4.3.0)
      + mgcv             1.9-0   2023-07-11 [1] CRAN (R 4.3.0)
      + munsell          0.5.0   2018-06-12 [1] CRAN (R 4.3.0)
      + nlme             3.1-163 2023-08-09 [1] CRAN (R 4.3.0)
      + palmerpenguins * 0.1.1   2022-08-15 [1] CRAN (R 4.3.0)
      + pillar           1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
      + pkgconfig        2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
      + purrr          * 1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
      + R6               2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
      + readr          * 2.1.4   2023-02-10 [1] CRAN (R 4.3.0)
      + rlang            1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
      + rmarkdown        2.24    2023-08-14 [1] CRAN (R 4.3.1)
      + rprojroot        2.0.3   2022-04-02 [1] CRAN (R 4.3.0)
      + rstudioapi       0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
      + scales           1.2.1   2022-08-20 [1] CRAN (R 4.3.0)
      + sessioninfo      1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
      + stringi          1.7.12  2023-01-11 [1] CRAN (R 4.3.0)
      + stringr        * 1.5.0   2022-12-02 [1] CRAN (R 4.3.0)
      + tibble         * 3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
      + tidyr          * 1.3.0   2023-01-24 [1] CRAN (R 4.3.0)
      + tidyselect       1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
      + tidyverse      * 2.0.0   2023-02-22 [1] CRAN (R 4.3.0)
      + timechange       0.2.0   2023-01-11 [1] CRAN (R 4.3.0)
      + tzdb             0.4.0   2023-05-12 [1] CRAN (R 4.3.0)
      + utf8             1.2.3   2023-01-31 [1] CRAN (R 4.3.0)
      + vctrs            0.6.3   2023-06-14 [1] CRAN (R 4.3.0)
      + vroom            1.6.3   2023-04-28 [1] CRAN (R 4.3.0)
      + withr            2.5.0   2022-03-03 [1] CRAN (R 4.3.0)
      + xfun             0.40    2023-08-09 [1] CRAN (R 4.3.0)
      + yaml             2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
      +
      + [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
      +
      +──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
      +
      +
      -
      diff --git a/posts/13-ggplot2-plotting-system-part-2/index.html b/posts/13-ggplot2-plotting-system-part-2/index.html index bee9aaa..bda08da 100644 --- a/posts/13-ggplot2-plotting-system-part-2/index.html +++ b/posts/13-ggplot2-plotting-system-part-2/index.html @@ -274,6 +274,7 @@

      Table of contents

    13. Final Questions
    14. Additional Resources
    15. +
    16. R session information
    17. @@ -287,6 +288,7 @@

      Table of contents

      +

      This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

      Pre-lecture materials

      @@ -431,8 +433,9 @@

      Example: BMI, PM2
      library(tidyverse)
       library(here)
       maacs <- read_csv(here("data", "bmi_pm25_no2_sim.csv"),
      -                  col_types = "nnci")
      -maacs
      +    col_types = "nnci"
      +)
      +maacs

      # A tibble: 517 × 4
          logpm25 logno2_new bmicat        NocturnalSympt
      @@ -476,9 +479,11 @@ 

      Building up in layers

      Here, we will eventually be plotting the log of PM2.5 and the NocturnalSympt variable.

      -
      g <- ggplot(maacs, aes(x = logpm25, 
      -                       y = NocturnalSympt))
      -summary(g)
      +
      g <- ggplot(maacs, aes(
      +    x = logpm25,
      +    y = NocturnalSympt
      +))
      +summary(g)
      data: logpm25, logno2_new, bmicat, NocturnalSympt [517x4]
       mapping:  x = ~logpm25, y = ~NocturnalSympt
      @@ -510,7 +515,7 @@ 

      Building up in layers

      Now, normally if you were to print() a ggplot object, a plot would appear on the plot device. However, our object g does not yet contain enough information to make a plot.

      g <- maacs %>%
      -        ggplot(aes(logpm25, NocturnalSympt))
      +    ggplot(aes(logpm25, NocturnalSympt))
       print(g)
      @@ -527,7 +532,7 @@

      First plot wit

      Here, we add the geom_point() function to create a traditional scatter plot.

      g <- maacs %>%
      -        ggplot(aes(logpm25, NocturnalSympt))
      +    ggplot(aes(logpm25, NocturnalSympt))
       g + geom_point()
      @@ -546,9 +551,9 @@

      Adding more layers

      smooth

      Because the data appear rather noisy, it might be better if we added a smoother on top of the points to see if there is a trend in the data with PM2.5.

      -
      g + 
      -  geom_point() + 
      -  geom_smooth()
      +
      g +
      +    geom_point() +
      +    geom_smooth()
      @@ -560,9 +565,9 @@

      smooth

      The default smoother is a loess smoother, which is flexible and nonparametric but might be too flexible for our purposes. Perhaps we’d prefer a simple linear regression line to highlight any first order trends. We can do this by specifying method = "lm" to geom_smooth().

      -
      g + 
      -  geom_point() + 
      -  geom_smooth(method = "lm")
      +
      g +
      +    geom_point() +
      +    geom_smooth(method = "lm")
      @@ -588,7 +593,7 @@

      smooth

      # try it yourself
       
       library(palmerpenguins)
      -penguins 
      +penguins

      # A tibble: 344 × 8
          species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
      @@ -626,10 +631,10 @@ 

      facets

      We want one row and two columns, one column for each weight category. So we specify bmicat on the right hand side of the formula passed to facet_grid().

      -
      g + 
      -  geom_point() + 
      -  geom_smooth(method = "lm") +
      -  facet_grid(. ~ bmicat) 
      +
      g +
      +    geom_point() +
      +    geom_smooth(method = "lm") +
      +    facet_grid(. ~ bmicat)
      @@ -663,7 +668,7 @@

      map aesthetics

      For example, here we modify the points in the scatterplot to make the color “steelblue”, the size larger, and the alpha transparency greater.

      -
      g + geom_point(color = "steelblue", size = 4, alpha = 1/2)
      +
      g + geom_point(color = "steelblue", size = 4, alpha = 1 / 2)
      @@ -682,7 +687,7 @@

      map aesthetics

      For example, we can map the aesthetic color to the variable bmicat, so the points will be colored according to the levels of bmicat.

      We use the aes() function to indicate this difference from the plot above.

      -
      g + geom_point(aes(color = bmicat), size = 4, alpha = 1/2)
      +
      g + geom_point(aes(color = bmicat), size = 4, alpha = 1 / 2)
      @@ -699,14 +704,17 @@

      Customizing the smo

      For example, we can customize the smoother that we overlay on the points with geom_smooth().

      Here we change the line type and increase the size from the default. We also remove the shaded standard error from the line.

      -
      g + 
      -  geom_point(aes(color = bmicat), 
      -             size = 2, 
      -             alpha = 1/2) + 
      -  geom_smooth(size = 4, 
      -              linetype = 3, 
      -              method = "lm", 
      -              se = FALSE)
      +
      g +
      +    geom_point(aes(color = bmicat),
      +        size = 2,
      +        alpha = 1 / 2
      +    ) +
      +    geom_smooth(
      +        size = 4,
      +        linetype = 3,
      +        method = "lm",
      +        se = FALSE
      +    )
      Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
       ℹ Please use `linewidth` instead.
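
The deprecation warning above suggests its own fix: for line-based geoms, pass linewidth instead of size. A minimal sketch follows, assuming ggplot2 3.4.0 or later. The toy data frame is a simulated, hypothetical stand-in for maacs, so that the block runs on its own.

```r
library(ggplot2)

## Hypothetical stand-in for the lecture's `maacs` data (simulated values)
set.seed(1)
toy <- data.frame(
    logpm25 = rnorm(50),
    NocturnalSympt = rpois(50, lambda = 2),
    bmicat = rep(c("normal weight", "overweight"), each = 25)
)

p <- ggplot(toy, aes(logpm25, NocturnalSympt)) +
    geom_point(aes(color = bmicat),
        size = 2, ## `size` is still the right argument for points
        alpha = 1 / 2
    ) +
    geom_smooth(
        linewidth = 4, ## replaces `size = 4` for lines
        linetype = 3,
        method = "lm",
        se = FALSE
    )
p
```

Only the smoother changes here. Points continue to use size, so the geom_point() call is the same as in the lecture.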
      @@ -747,9 +755,9 @@

      Changing the theme

      -
      g + 
      -  geom_point(aes(color = bmicat)) + 
      -  theme_bw(base_family = "Times")
      +
      g +
      +    geom_point(aes(color = bmicat)) +
      +    theme_bw(base_family = "Times")
      @@ -774,7 +782,7 @@

      Changing the theme

      # try it yourself
       
       library(palmerpenguins)
      -penguins 
      +penguins

      # A tibble: 344 × 8
          species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
      @@ -819,11 +827,13 @@ 

      Modifying labels

      Here is an example of modifying the title and the x and y labels to make the plot a bit more informative.

      -
      g + 
      -  geom_point(aes(color = bmicat)) + 
      -  labs(title = "MAACS Cohort") + 
      -  labs(x = expression("log " * PM[2.5]), 
      -       y = "Nocturnal Symptoms")
      +
      g +
      +    geom_point(aes(color = bmicat)) +
      +    labs(title = "MAACS Cohort") +
      +    labs(
      +        x = expression("log " * PM[2.5]),
      +        y = "Nocturnal Symptoms"
      +    )
      @@ -840,13 +850,16 @@

      A quick as

      If you make a lot of time series plots, you often want to restrict the range of the y-axis while still plotting all the data.

      In the base graphics system you can do that as follows.

      -
      testdat <- data.frame(x = 1:100, 
      -                      y = rnorm(100))
      -testdat[50,2] <- 100  ## Outlier!
      -plot(testdat$x, 
      -     testdat$y,
      -     type = "l", 
      -     ylim = c(-3,3))
      +
      testdat <- data.frame(
      +    x = 1:100,
      +    y = rnorm(100)
      +)
      +testdat[50, 2] <- 100 ## Outlier!
      +plot(testdat$x,
      +    testdat$y,
      +    type = "l",
      +    ylim = c(-3, 3)
      +)
      @@ -882,9 +895,9 @@

      A quick as

      One might think that modifying the ylim() attribute would give you the same thing as the base plot, but it doesn’t.

      -
      g + 
      -  geom_line() + 
      -  ylim(-3, 3)
      +
      g +
      +    geom_line() +
      +    ylim(-3, 3)
      @@ -899,9 +912,9 @@

      A quick as

      Effectively, what this does is subset the data so that only observations between -3 and 3 are included, then plot the data.

      To plot the data without subsetting it first and still get the restricted range, you have to do the following.

      -
      g + 
      -  geom_line() + 
      -  coord_cartesian(ylim = c(-3, 3))
      +
      g +
      +    geom_line() +
      +    coord_cartesian(ylim = c(-3, 3))
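
To compare the two approaches side by side, here is a self-contained sketch. It assumes that g was rebuilt from testdat, a step that is not shown in this excerpt. The seed and the object names p_scale and p_zoom are illustrative choices, not part of the lecture.

```r
library(ggplot2)

set.seed(2023) ## arbitrary seed, only for reproducibility
testdat <- data.frame(
    x = 1:100,
    y = rnorm(100)
)
testdat[50, 2] <- 100 ## Outlier!

## Assumed setup: a ggplot object built from `testdat`
g <- ggplot(testdat, aes(x = x, y = y))

## ylim() sets the limits of the y scale: values outside [-3, 3]
## are treated as missing before the line is drawn
p_scale <- g +
    geom_line() +
    ylim(-3, 3)

## coord_cartesian() only zooms the view: every observation is kept,
## so the line still heads towards the outlier at x = 50
p_zoom <- g +
    geom_line() +
    coord_cartesian(ylim = c(-3, 3))

p_scale
p_zoom
```

Printing p_scale should produce a warning about removed or missing values, while p_zoom should not. That warning is a quick way to tell which behaviour you are getting.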
      @@ -956,16 +969,16 @@

      More com
      ## Setup ggplot with data frame
       g <- maacs %>%
      -        ggplot(aes(logpm25, NocturnalSympt))
      +    ggplot(aes(logpm25, NocturnalSympt))
       
       ## Add layers
      -g + geom_point(alpha = 1/3) + 
      -        facet_grid(bmicat ~ no2tert) + 
      -        geom_smooth(method="lm", se=FALSE, col="steelblue") + 
      -        theme_bw(base_family = "Avenir", base_size = 10) + 
      -        labs(x = expression("log " * PM[2.5])) + 
      -        labs(y = "Nocturnal Symptoms") + 
      -        labs(title = "MAACS Cohort")
      +g + geom_point(alpha = 1 / 3) +
      +    facet_grid(bmicat ~ no2tert) +
      +    geom_smooth(method = "lm", se = FALSE, col = "steelblue") +
      +    theme_bw(base_family = "Avenir", base_size = 10) +
      +    labs(x = expression("log " * PM[2.5])) +
      +    labs(y = "Nocturnal Symptoms") +
      +    labs(title = "MAACS Cohort")

      `geom_smooth()` using formula = 'y ~ x'
      @@ -1020,9 +1033,95 @@

      Additional Resources<

      + + +
      +

      R session information

      +
      +
      options(width = 120)
      +sessioninfo::session_info()
      +
      +
      ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
      + setting  value
      + version  R version 4.3.1 (2023-06-16)
      + os       macOS Ventura 13.5
      + system   aarch64, darwin20
      + ui       X11
      + language (EN)
      + collate  en_US.UTF-8
      + ctype    en_US.UTF-8
      + tz       America/New_York
      + date     2023-08-17
      + pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
      +
      +─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
      + package        * version date (UTC) lib source
      + bit              4.0.5   2022-11-15 [1] CRAN (R 4.3.0)
      + bit64            4.0.5   2020-08-30 [1] CRAN (R 4.3.0)
      + cli              3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
      + colorout         1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
      + colorspace       2.1-0   2023-01-23 [1] CRAN (R 4.3.0)
      + crayon           1.5.2   2022-09-29 [1] CRAN (R 4.3.0)
      + digest           0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
      + dplyr          * 1.1.2   2023-04-20 [1] CRAN (R 4.3.0)
      + evaluate         0.21    2023-05-05 [1] CRAN (R 4.3.0)
      + fansi            1.0.4   2023-01-22 [1] CRAN (R 4.3.0)
      + farver           2.1.1   2022-07-06 [1] CRAN (R 4.3.0)
      + fastmap          1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
      + forcats        * 1.0.0   2023-01-29 [1] CRAN (R 4.3.0)
      + generics         0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
      + ggplot2        * 3.4.3   2023-08-14 [1] CRAN (R 4.3.1)
      + glue             1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
      + gtable           0.3.3   2023-03-21 [1] CRAN (R 4.3.0)
      + here           * 1.0.1   2020-12-13 [1] CRAN (R 4.3.0)
      + hms              1.1.3   2023-03-21 [1] CRAN (R 4.3.0)
      + htmltools        0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
      + htmlwidgets      1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
      + jsonlite         1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
      + knitr            1.43    2023-05-25 [1] CRAN (R 4.3.0)
      + labeling         0.4.2   2020-10-20 [1] CRAN (R 4.3.0)
      + lattice          0.21-8  2023-04-05 [1] CRAN (R 4.3.1)
      + lifecycle        1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
      + lubridate      * 1.9.2   2023-02-10 [1] CRAN (R 4.3.0)
      + magrittr         2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
      + Matrix           1.6-1   2023-08-14 [1] CRAN (R 4.3.0)
      + mgcv             1.9-0   2023-07-11 [1] CRAN (R 4.3.0)
      + munsell          0.5.0   2018-06-12 [1] CRAN (R 4.3.0)
      + nlme             3.1-163 2023-08-09 [1] CRAN (R 4.3.0)
      + palmerpenguins * 0.1.1   2022-08-15 [1] CRAN (R 4.3.0)
      + pillar           1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
      + pkgconfig        2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
      + purrr          * 1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
      + R6               2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
      + readr          * 2.1.4   2023-02-10 [1] CRAN (R 4.3.0)
      + rlang            1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
      + rmarkdown        2.24    2023-08-14 [1] CRAN (R 4.3.1)
      + rprojroot        2.0.3   2022-04-02 [1] CRAN (R 4.3.0)
      + rstudioapi       0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
      + scales           1.2.1   2022-08-20 [1] CRAN (R 4.3.0)
      + sessioninfo      1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
      + stringi          1.7.12  2023-01-11 [1] CRAN (R 4.3.0)
      + stringr        * 1.5.0   2022-12-02 [1] CRAN (R 4.3.0)
      + tibble         * 3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
      + tidyr          * 1.3.0   2023-01-24 [1] CRAN (R 4.3.0)
      + tidyselect       1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
      + tidyverse      * 2.0.0   2023-02-22 [1] CRAN (R 4.3.0)
      + timechange       0.2.0   2023-01-11 [1] CRAN (R 4.3.0)
      + tzdb             0.4.0   2023-05-12 [1] CRAN (R 4.3.0)
      + utf8             1.2.3   2023-01-31 [1] CRAN (R 4.3.0)
      + vctrs            0.6.3   2023-06-14 [1] CRAN (R 4.3.0)
      + vroom            1.6.3   2023-04-28 [1] CRAN (R 4.3.0)
      + withr            2.5.0   2022-03-03 [1] CRAN (R 4.3.0)
      + xfun             0.40    2023-08-09 [1] CRAN (R 4.3.0)
      + yaml             2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
      +
      + [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
      +
      +──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
      +
      +
      -
      diff --git a/posts/13-ggplot2-plotting-system-part-2/index_files/figure-html/unnamed-chunk-15-1.png b/posts/13-ggplot2-plotting-system-part-2/index_files/figure-html/unnamed-chunk-15-1.png index 8128ff4..807f126 100644 Binary files a/posts/13-ggplot2-plotting-system-part-2/index_files/figure-html/unnamed-chunk-15-1.png and b/posts/13-ggplot2-plotting-system-part-2/index_files/figure-html/unnamed-chunk-15-1.png differ diff --git a/posts/13-ggplot2-plotting-system-part-2/index_files/figure-html/unnamed-chunk-16-1.png b/posts/13-ggplot2-plotting-system-part-2/index_files/figure-html/unnamed-chunk-16-1.png index cac4ee5..ed79c8a 100644 Binary files a/posts/13-ggplot2-plotting-system-part-2/index_files/figure-html/unnamed-chunk-16-1.png and b/posts/13-ggplot2-plotting-system-part-2/index_files/figure-html/unnamed-chunk-16-1.png differ diff --git a/posts/13-ggplot2-plotting-system-part-2/index_files/figure-html/unnamed-chunk-17-1.png b/posts/13-ggplot2-plotting-system-part-2/index_files/figure-html/unnamed-chunk-17-1.png index 9cc2c8f..126a383 100644 Binary files a/posts/13-ggplot2-plotting-system-part-2/index_files/figure-html/unnamed-chunk-17-1.png and b/posts/13-ggplot2-plotting-system-part-2/index_files/figure-html/unnamed-chunk-17-1.png differ diff --git a/posts/13-ggplot2-plotting-system-part-2/index_files/figure-html/unnamed-chunk-18-1.png b/posts/13-ggplot2-plotting-system-part-2/index_files/figure-html/unnamed-chunk-18-1.png index e30aa4b..1f3afa1 100644 Binary files a/posts/13-ggplot2-plotting-system-part-2/index_files/figure-html/unnamed-chunk-18-1.png and b/posts/13-ggplot2-plotting-system-part-2/index_files/figure-html/unnamed-chunk-18-1.png differ diff --git a/posts/14-r-nuts-and-bolts/index.html b/posts/14-r-nuts-and-bolts/index.html index 9aa996d..b9adf58 100644 --- a/posts/14-r-nuts-and-bolts/index.html +++ b/posts/14-r-nuts-and-bolts/index.html @@ -270,6 +270,7 @@

      Table of contents

    18. Final Questions
    19. Additional Resources
    20. +
    21. R session information
    22. @@ -283,6 +284,7 @@

      Table of contents

      +

      This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

      Pre-lecture materials

      @@ -1326,9 +1328,83 @@

      Additional Resources<

      + + +
      +

      R session information

      +
      +
      options(width = 120)
      +sessioninfo::session_info()
      +
      +
      ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
      + setting  value
      + version  R version 4.3.1 (2023-06-16)
      + os       macOS Ventura 13.5
      + system   aarch64, darwin20
      + ui       X11
      + language (EN)
      + collate  en_US.UTF-8
      + ctype    en_US.UTF-8
      + tz       America/New_York
      + date     2023-08-17
      + pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
      +
      +─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
      + package        * version date (UTC) lib source
      + cli              3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
      + colorout         1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
      + colorspace       2.1-0   2023-01-23 [1] CRAN (R 4.3.0)
      + digest           0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
      + dplyr          * 1.1.2   2023-04-20 [1] CRAN (R 4.3.0)
      + evaluate         0.21    2023-05-05 [1] CRAN (R 4.3.0)
      + fansi            1.0.4   2023-01-22 [1] CRAN (R 4.3.0)
      + fastmap          1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
      + forcats        * 1.0.0   2023-01-29 [1] CRAN (R 4.3.0)
      + generics         0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
      + ggplot2        * 3.4.3   2023-08-14 [1] CRAN (R 4.3.1)
      + glue             1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
      + gtable           0.3.3   2023-03-21 [1] CRAN (R 4.3.0)
      + hms              1.1.3   2023-03-21 [1] CRAN (R 4.3.0)
      + htmltools        0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
      + htmlwidgets      1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
      + jsonlite         1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
      + knitr            1.43    2023-05-25 [1] CRAN (R 4.3.0)
      + lifecycle        1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
      + lubridate      * 1.9.2   2023-02-10 [1] CRAN (R 4.3.0)
      + magrittr         2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
      + munsell          0.5.0   2018-06-12 [1] CRAN (R 4.3.0)
      + palmerpenguins * 0.1.1   2022-08-15 [1] CRAN (R 4.3.0)
      + pillar           1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
      + pkgconfig        2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
      + purrr          * 1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
      + R6               2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
      + readr          * 2.1.4   2023-02-10 [1] CRAN (R 4.3.0)
      + rlang            1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
      + rmarkdown        2.24    2023-08-14 [1] CRAN (R 4.3.1)
      + rstudioapi       0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
      + scales           1.2.1   2022-08-20 [1] CRAN (R 4.3.0)
      + sessioninfo      1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
      + stringi          1.7.12  2023-01-11 [1] CRAN (R 4.3.0)
      + stringr        * 1.5.0   2022-12-02 [1] CRAN (R 4.3.0)
      + tibble         * 3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
      + tidyr          * 1.3.0   2023-01-24 [1] CRAN (R 4.3.0)
      + tidyselect       1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
      + tidyverse      * 2.0.0   2023-02-22 [1] CRAN (R 4.3.0)
      + timechange       0.2.0   2023-01-11 [1] CRAN (R 4.3.0)
      + tzdb             0.4.0   2023-05-12 [1] CRAN (R 4.3.0)
      + utf8             1.2.3   2023-01-31 [1] CRAN (R 4.3.0)
      + vctrs            0.6.3   2023-06-14 [1] CRAN (R 4.3.0)
      + withr            2.5.0   2022-03-03 [1] CRAN (R 4.3.0)
      + xfun             0.40    2023-08-09 [1] CRAN (R 4.3.0)
      + yaml             2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
      +
      + [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
      +
      +──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
      +
      +
      -
      diff --git a/posts/15-control-structures/index.html b/posts/15-control-structures/index.html index 6f6cc87..aa950e0 100644 --- a/posts/15-control-structures/index.html +++ b/posts/15-control-structures/index.html @@ -257,6 +257,7 @@

      Table of contents

    23. Final Questions
    24. Additional Resources
    25. +
    26. R session information
    27. @@ -270,6 +271,7 @@

      Table of contents

      +

      This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

      Pre-lecture materials

      @@ -377,45 +379,45 @@

      if-else<

      Here is an example of a valid if/else structure.

      Let’s use the runif(n, min=0, max=1) function, which draws n random values from a uniform distribution between min and max (by default, between 0 and 1).

      -
      x <- runif(n=1, min=0, max=10)  
      +
      x <- runif(n = 1, min = 0, max = 10)
       x
      -
      [1] 1.907048
      +
      [1] 3.521267

      Then, we can write an if-else statement that tests whether x is greater than 3 or not.

      x > 3
      -
      [1] FALSE
      +
      [1] TRUE

      If x is greater than 3, then the first condition occurs. If x is not greater than 3, then the second condition occurs.

      -
      if(x > 3) {
      +
      if (x > 3) {
           y <- 10
      -  } else {
      +} else {
           y <- 0
      -  }
      +}

      Finally, we can auto print y to see what the value is.

      y
      -
      [1] 0
      +
      [1] 10

      This expression can also be written a different (but equivalent!) way in R.

      -
      y <- if(x > 3) {
      +
      y <- if (x > 3) {
           10
      -  } else { 
      +} else {
           0
      -  }
      +}
       
       y
      -
      [1] 0
      +
      [1] 10
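
Because if/else returns a value, the same logic also fits on one line. When the test has to be applied to every element of a vector, the vectorized ifelse() is the usual tool. The following is a small sketch; the seed and the vector xs are made up for illustration.

```r
set.seed(42) ## arbitrary seed so the draw is reproducible
x <- runif(n = 1, min = 0, max = 10)

## if/else is an expression, so its value can be assigned directly
y <- if (x > 3) 10 else 0
y

## if () only tests a single value; ifelse() tests each element
xs <- c(1.5, 3, 7.2)
ys <- ifelse(xs > 3, 10, 0)
ys
## [1]  0  0 10
```

Note that 3 itself is not greater than 3, so the second element maps to 0.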
      @@ -462,7 +464,7 @@

      if-else<

      library(tidyverse)
      library(palmerpenguins)
      -penguins

      +penguins
      # A tibble: 344 × 8
          species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
      @@ -490,8 +492,8 @@ 

      for Loops

      In R, for loops take an iterator variable and assign it successive values from a sequence or vector.

      For loops are most commonly used for iterating over the elements of an object (list, vector, etc.)

      -
      for(i in 1:10) {
      -        print(i)
      +
      for (i in 1:10) {
      +    print(i)
       }
      [1] 1
      @@ -513,9 +515,9 @@ 

      for Loops

      x <- c("a", "b", "c", "d")
      ## create for loop
      -for(i in 1:4) {
      -        ## Print out each element of 'x'
      -        print(x[i])
      +for (i in 1:4) {
      +    ## Print out each element of 'x'
      +    print(x[i])
       }
      [1] "a"
      @@ -530,9 +532,9 @@ 

      for Loops

      x <- c("a", "b", "c", "d")
      ## create for loop
      -for(i in 1:4) {
      -        ## Print out just 'i'
      -        print(i)
      +for (i in 1:4) {
      +    ## Print out just 'i'
      +    print(i)
       }
      [1] 1
      @@ -558,8 +560,8 @@ 

      seq_along()

      Let’s put seq_along() and for loops together.

      ## Generate a sequence based on length of 'x'
      -for(i in seq_along(x)) {   
      -        print(x[i])
      +for (i in seq_along(x)) {
      +    print(x[i])
       }
      [1] "a"
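
One reason to prefer seq_along(x) over 1:length(x) is how each behaves when the vector is empty. The sketch below uses two illustrative counters, n_bad and n_good, to count how many times each loop body runs.

```r
x <- c("a", "b", "c", "d")
seq_along(x)
## [1] 1 2 3 4

empty <- character(0)

## 1:length(empty) is 1:0, i.e. c(1, 0), so the body runs twice
n_bad <- 0
for (i in 1:length(empty)) {
    n_bad <- n_bad + 1
}

## seq_along(empty) is integer(0), so the body never runs
n_good <- 0
for (i in seq_along(empty)) {
    n_good <- n_good + 1
}

c(n_bad, n_good)
## [1] 2 0
```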
      @@ -570,8 +572,8 @@ 

      seq_along()

      It is not necessary to use an index-type variable (i.e. i).

      -
      for(babyshark in x) {
      -        print(babyshark)
      +
      for (babyshark in x) {
      +    print(babyshark)
       }
      [1] "a"
      @@ -581,8 +583,8 @@ 

      seq_along()

      -
      for(candyisgreat in x) {
      -        print(candyisgreat)
      +
      for (candyisgreat in x) {
      +    print(candyisgreat)
       }
      [1] "a"
      @@ -592,8 +594,8 @@ 

      seq_along()

      -
      for(RememberToVote in x) {
      -        print(RememberToVote)
      +
      for (RememberToVote in x) {
      +    print(RememberToVote)
       }
      [1] "a"
      @@ -604,18 +606,18 @@ 

      seq_along()

      You can use any valid variable name as the index (but not symbols or numbers).

      -
      for(1999 in x) {
      -        print(1999)
      +
      for (1999 in x) {
      +    print(1999)
       }
      -
      Error: <text>:1:5: unexpected numeric constant
      -1: for(1999
      -        ^
      +
      Error: <text>:1:6: unexpected numeric constant
      +1: for (1999
      +         ^

      For one line loops, the curly braces are not strictly necessary.

      -
      for(i in 1:4) print(x[i])
      +
      for (i in 1:4) print(x[i])
      [1] "a"
       [1] "b"
      @@ -661,10 +663,10 @@ 

      Nested for l

      -
      for(i in seq_len(nrow(x))) {
      -        for(j in seq_len(ncol(x))) {
      -                print(x[i, j])
      -        }   
      +
      for (i in seq_len(nrow(x))) {
      +    for (j in seq_len(ncol(x))) {
      +        print(x[i, j])
      +    }
       }
      [1] 1
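
Nested loops are more useful when the inner loop accumulates something. As a sketch, the loop below computes row totals of a small matrix and checks them against rowSums(). The matrix is an assumed example and may differ from the x used in the lecture.

```r
## Assumed example matrix (filled column by column)
x <- matrix(1:6, nrow = 2, ncol = 3)

row_totals <- numeric(nrow(x))
for (i in seq_len(nrow(x))) {
    for (j in seq_len(ncol(x))) {
        ## Add each element of row i to that row's running total
        row_totals[i] <- row_totals[i] + x[i, j]
    }
}
row_totals
## [1]  9 12

## The vectorized equivalent gives the same answer
all(row_totals == rowSums(x))
## [1] TRUE
```

In practice rowSums() is both shorter and faster. The loop version is shown only to illustrate how the two indices move.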
      @@ -700,9 +702,9 @@ 

      while Loops

      Once the loop body is executed, the condition is tested again, and so forth, until the condition is false, after which the loop exits.

      count <- 0
      -while(count < 10) {
      -        print(count)
      -        count <- count + 1
      +while (count < 10) {
      +    print(count)
      +    count <- count + 1
       }
      [1] 0
      @@ -723,14 +725,14 @@ 

      while Loops

      z <- 5
       set.seed(1)
       
      -while(z >= 3 && z <= 10) {
      -        coin <- rbinom(1, 1, 0.5)
      -        
      -        if(coin == 1) {  ## random walk
      -                z <- z + 1
      -        } else {
      -                z <- z - 1
      -        } 
      +while (z >= 3 && z <= 10) {
      +    coin <- rbinom(1, 1, 0.5)
      +
      +    if (coin == 1) { ## random walk
      +        z <- z + 1
      +    } else {
      +        z <- z - 1
      +    }
       }
       print(z)
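
A while loop such as this random walk has no fixed number of iterations, so it can help to count the steps and cap them. The variant below is a sketch of that idea. The steps counter and the max_steps cap are additions for illustration and are not part of the lecture code.

```r
z <- 5
steps <- 0
max_steps <- 10000 ## hypothetical safety cap on the number of iterations
set.seed(1)

while (z >= 3 && z <= 10 && steps < max_steps) {
    coin <- rbinom(1, 1, 0.5)

    ## Same random walk as above, written with if/else as an expression
    z <- if (coin == 1) z + 1 else z - 1
    steps <- steps + 1
}

print(z)
print(steps)
```

When the loop stops before reaching the cap, z must be either 2 or 11, because those are the first values outside the range from 3 to 10.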
      @@ -796,13 +798,13 @@

      repeat Loops

      tol <- 1e-8
      repeat {
      -        x1 <- computeEstimate()
      -
      -        if(abs(x1 - x0) < tol) { ## Close enough?
      -                break
      -        } else {
      -                x0 <- x1
      -        }
      +    x1 <- computeEstimate()
      +
      +    if (abs(x1 - x0) < tol) { ## Close enough?
      +        break
      +    } else {
      +        x0 <- x1
      +    }
       }
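
The loop above cannot be run as written because computeEstimate() is not a real function. Here is a runnable sketch of the same pattern. The helper compute_estimate() is hypothetical and performs one Newton step towards the square root of 2. The max_iter cap is an added safeguard in case the estimates never converge.

```r
## Hypothetical update rule: one Newton step for solving x^2 = 2
compute_estimate <- function(x) {
    (x + 2 / x) / 2
}

x0 <- 1
tol <- 1e-8
max_iter <- 100 ## assumed cap so the loop cannot run forever
iter <- 0

repeat {
    iter <- iter + 1
    x1 <- compute_estimate(x0)

    if (abs(x1 - x0) < tol || iter >= max_iter) { ## Close enough, or give up
        break
    } else {
        x0 <- x1
    }
}

x1
iter
```

Here x1 should agree with sqrt(2) to well within the tolerance after only a handful of iterations.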
      @@ -838,23 +840,23 @@

      repeat Loops

      next, break

      next is used to skip an iteration of a loop.

      -
      for(i in 1:100) {
      -        if(i <= 20) {
      -                ## Skip the first 20 iterations
      -                next                 
      -        }
      -        ## Do something here
      +
      for (i in 1:100) {
      +    if (i <= 20) {
      +        ## Skip the first 20 iterations
      +        next
      +    }
      +    ## Do something here
       }

      break is used to exit a loop immediately, regardless of what iteration the loop may be on.

      -
      for(i in 1:100) {
      -      print(i)
      +
      for (i in 1:100) {
      +    print(i)
       
      -      if(i > 20) {
      -              ## Stop loop after 20 iterations
      -              break  
      -      }     
      +    if (i > 20) {
      +        ## Stop loop after 20 iterations
      +        break
      +    }
       }
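
The two statements can be combined in one loop. In this sketch, next skips the odd numbers and break stops the loop once i exceeds 10. The vector evens is an illustrative name.

```r
evens <- integer(0)

for (i in 1:100) {
    if (i %% 2 == 1) {
        ## Skip odd numbers
        next
    }
    if (i > 10) {
        ## Stop the loop at the first even number above 10
        break
    }
    evens <- c(evens, i)
}

evens
## [1]  2  4  6  8 10
```

The loop stops at i = 12. The value 11 is odd, so next skips it before the break condition is ever checked.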
      @@ -909,9 +911,83 @@

      Additional Resources<

      + + +
      +

      R session information

      +
      +
      options(width = 120)
      +sessioninfo::session_info()
      +
      +
      ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
      + setting  value
      + version  R version 4.3.1 (2023-06-16)
      + os       macOS Ventura 13.5
      + system   aarch64, darwin20
      + ui       X11
      + language (EN)
      + collate  en_US.UTF-8
      + ctype    en_US.UTF-8
      + tz       America/New_York
      + date     2023-08-17
      + pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
      +
      +─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
      + package        * version date (UTC) lib source
      + cli              3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
      + colorout         1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
      + colorspace       2.1-0   2023-01-23 [1] CRAN (R 4.3.0)
      + digest           0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
      + dplyr          * 1.1.2   2023-04-20 [1] CRAN (R 4.3.0)
      + evaluate         0.21    2023-05-05 [1] CRAN (R 4.3.0)
      + fansi            1.0.4   2023-01-22 [1] CRAN (R 4.3.0)
      + fastmap          1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
      + forcats        * 1.0.0   2023-01-29 [1] CRAN (R 4.3.0)
      + generics         0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
      + ggplot2        * 3.4.3   2023-08-14 [1] CRAN (R 4.3.1)
      + glue             1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
      + gtable           0.3.3   2023-03-21 [1] CRAN (R 4.3.0)
      + hms              1.1.3   2023-03-21 [1] CRAN (R 4.3.0)
      + htmltools        0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
      + htmlwidgets      1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
      + jsonlite         1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
      + knitr            1.43    2023-05-25 [1] CRAN (R 4.3.0)
      + lifecycle        1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
      + lubridate      * 1.9.2   2023-02-10 [1] CRAN (R 4.3.0)
      + magrittr         2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
      + munsell          0.5.0   2018-06-12 [1] CRAN (R 4.3.0)
      + palmerpenguins * 0.1.1   2022-08-15 [1] CRAN (R 4.3.0)
      + pillar           1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
      + pkgconfig        2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
      + purrr          * 1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
      + R6               2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
      + readr          * 2.1.4   2023-02-10 [1] CRAN (R 4.3.0)
      + rlang            1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
      + rmarkdown        2.24    2023-08-14 [1] CRAN (R 4.3.1)
      + rstudioapi       0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
      + scales           1.2.1   2022-08-20 [1] CRAN (R 4.3.0)
      + sessioninfo      1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
      + stringi          1.7.12  2023-01-11 [1] CRAN (R 4.3.0)
      + stringr        * 1.5.0   2022-12-02 [1] CRAN (R 4.3.0)
      + tibble         * 3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
      + tidyr          * 1.3.0   2023-01-24 [1] CRAN (R 4.3.0)
      + tidyselect       1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
      + tidyverse      * 2.0.0   2023-02-22 [1] CRAN (R 4.3.0)
      + timechange       0.2.0   2023-01-11 [1] CRAN (R 4.3.0)
      + tzdb             0.4.0   2023-05-12 [1] CRAN (R 4.3.0)
      + utf8             1.2.3   2023-01-31 [1] CRAN (R 4.3.0)
      + vctrs            0.6.3   2023-06-14 [1] CRAN (R 4.3.0)
      + withr            2.5.0   2022-03-03 [1] CRAN (R 4.3.0)
      + xfun             0.40    2023-08-09 [1] CRAN (R 4.3.0)
      + yaml             2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
      +
      + [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
      +
      +──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
      +
      +
      -
      diff --git a/posts/16-functions/index.html b/posts/16-functions/index.html index 483bd1f..968a4d1 100644 --- a/posts/16-functions/index.html +++ b/posts/16-functions/index.html @@ -269,6 +269,7 @@

      Table of contents

    28. Final Questions
    29. Additional Resources
    30. +
    31. R session information
    32. @@ -282,6 +283,7 @@

      Table of contents

      +

      This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

      Pre-lecture materials

      @@ -389,15 +391,15 @@

      Your First Function

      Here’s a simple function that takes no arguments and does nothing.

      f <- function() {
      -        ## This is an empty function
      +    ## This is an empty function
       }
       ## Functions have their own class
      -class(f)  
      +class(f)
      [1] "function"
      ## Execute this function
      -f()       
      +f()

      NULL
      @@ -408,9 +410,9 @@

      Your First Function

      The next thing we can do is create a function that actually has a non-trivial function body.

      f <- function() {
      -        # this is the function body
      -        hello <- "Hello, world!\n"
      -        cat(hello) 
      +    # this is the function body
      +    hello <- "Hello, world!\n"
      +    cat(hello)
       }
       f()
      @@ -447,10 +449,10 @@

      Your First Function

      For this basic function, we can add an argument that determines how many times “Hello, world!” is printed to the console.

      f <- function(num) {
      -        for(i in seq_len(num)) {
      -                hello <- "Hello, world!\n"
      -                cat(hello) 
      -        }
      +    for (i in seq_len(num)) {
      +        hello <- "Hello, world!\n"
      +        cat(hello)
      +    }
       }
       f(3)
      @@ -480,12 +482,12 @@

      Your First Function

      This next function returns the total number of characters printed to the console.

      f <- function(num) {
      -        hello <- "Hello, world!\n"
      -        for(i in seq_len(num)) {
      -                 cat(hello)
      -        }
      -        chars <- nchar(hello) * num
      -        chars
      +    hello <- "Hello, world!\n"
      +    for (i in seq_len(num)) {
      +        cat(hello)
      +    }
      +    chars <- nchar(hello) * num
      +    chars
       }
       meaningoflife <- f(3)
      @@ -527,23 +529,23 @@

      Your First Function

      Here, for example, we could set the default value for num to be 1, so that if the function is called without the num argument being explicitly specified, then it will print “Hello, world!” to the console once.

      f <- function(num = 1) {
      -        hello <- "Hello, world!\n"
      -        for(i in seq_len(num)) {
      -                cat(hello)
      -        }
      -        chars <- nchar(hello) * num
      -        chars
      +    hello <- "Hello, world!\n"
      +    for (i in seq_len(num)) {
      +        cat(hello)
      +    }
      +    chars <- nchar(hello) * num
      +    chars
       }
       
       
      -f()    ## Use default value for 'num'
      +f() ## Use default value for 'num'
      Hello, world!
      [1] 14
      -
      f(2)   ## Use user-specified value
      +
      f(2) ## Use user-specified value
      Hello, world!
       Hello, world!
      @@ -616,7 +618,7 @@

      Argument matching

      function (n, mean = 0, sd = 1)  
      -
      mydata <- rnorm(100, 2, 1)              ## Generate some data
      +
      mydata <- rnorm(100, 2, 1) ## Generate some data

      100 is assigned to the n argument, 2 is assigned to the mean argument, and 1 is assigned to the sd argument, all by positional matching.

      The following calls to the sd() function (which computes the empirical standard deviation of a vector of numbers) are all equivalent.

      @@ -637,19 +639,19 @@

      Argument matching

      ## Positional match first argument, default for 'na.rm'
      -sd(mydata)                     
      +sd(mydata)
      -
      [1] 1.110707
      +
      [1] 1.014286
      ## Specify 'x' argument by name, default for 'na.rm'
      -sd(x = mydata)                 
      +sd(x = mydata)
      -
      [1] 1.110707
      +
      [1] 1.014286
      ## Specify both arguments by name
      -sd(x = mydata, na.rm = FALSE) 
      +sd(x = mydata, na.rm = FALSE)
      -
      [1] 1.110707
      +
      [1] 1.014286

      @@ -658,9 +660,9 @@

      Argument matching

      In the example below, we specify the na.rm argument first, followed by x, even though x is the first argument defined in the function definition.

      ## Specify both arguments by name
      -sd(na.rm = FALSE, x = mydata)     
      +sd(na.rm = FALSE, x = mydata)
      -
      [1] 1.110707
      +
      [1] 1.014286

      You can mix positional matching with matching by name.

      @@ -668,7 +670,7 @@

      Argument matching

      sd(na.rm = FALSE, mydata)
      -
      [1] 1.110707
      +
      [1] 1.014286

      Here, the mydata object is assigned to the x argument, because it’s the only argument not yet specified.
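
      To check that the different calling styles above really are interchangeable, here is a small sketch that collects all five calls and counts the distinct results. The seed is an arbitrary choice of mine, used only so the sketch is reproducible.

```r
set.seed(1) ## Arbitrary seed so the sketch is reproducible
mydata <- rnorm(100, 2, 1)

calls <- c(
    sd(mydata), ## Positional match
    sd(x = mydata), ## Match by name
    sd(x = mydata, na.rm = FALSE), ## Both arguments by name
    sd(na.rm = FALSE, x = mydata), ## Named arguments in a different order
    sd(na.rm = FALSE, mydata) ## Mix of named and positional
)

## All five calls resolve to the same arguments, so only one distinct value remains
length(unique(calls))
```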

      @@ -751,8 +753,8 @@

      Lazy Evaluation

      In this example, the function f() has two arguments: a and b.

      f <- function(a, b) {
      -        a^2
      -} 
      +    a^2
      +}
       f(2)
      [1] 4
      @@ -763,8 +765,8 @@

      Lazy Evaluation

      This example also shows lazy evaluation at work, but does eventually result in an error.

      f <- function(a, b) {
      -        print(a)
      -        print(b)
      +    print(a)
      +    print(b)
       }
       f(45)
      @@ -791,7 +793,7 @@

      The ... Argu
      function (x, ...) 
       UseMethod("mean")
      -<bytecode: 0x138e33de8>
      +<bytecode: 0x1075ea1e8>
       <environment: namespace:base>

      @@ -801,7 +803,7 @@

      The ... Argu
      [1] "one two three"
      -
      paste("one", "two", "three", "four", "five", sep="_")
      +
      paste("one", "two", "three", "four", "five", sep = "_")
      [1] "one_two_three_four_five"
      @@ -931,8 +933,8 @@

      Environment

      For example, take this function:

      f <- function(x) {
      -  x + y
      -} 
      +    x + y
      +}

      In many programming languages, this would be an error, because y is not defined inside the function.

      In R, this is valid code because R uses rules called lexical scoping to find the value associated with a name.
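
      A minimal sketch of that lookup, assuming y is defined in the global environment where f() was created (the values 10 and 100 are made up for illustration):

```r
f <- function(x) {
    x + y ## 'y' is a free variable: it is not defined inside f()
}

y <- 10 ## Defined in the environment where f() was created
f(1) ## 11

y <- 100 ## The value is looked up when f() runs, so the new value is used
f(1) ## 101
```

      Because the value of y is looked up each time f() is called, changing y between calls changes the result.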

      @@ -957,17 +959,17 @@

      Environment

      For f() that includes the behavior of two things that you might not expect: { and +. This allows you to do devious things like:

      `+` <- function(x, y) {
      -  if (runif(1) < 0.1) {
      -    sum(x, y)
      -  } else {
      -    sum(x, y) * 1.1
      -  }
      +    if (runif(1) < 0.1) {
      +        sum(x, y)
      +    } else {
      +        sum(x, y) * 1.1
      +    }
       }
       table(replicate(1000, 1 + 2))
      
         3 3.3 
      - 95 905 
      + 82 918
      @@ -1054,9 +1056,52 @@

      Additional Resources<

      + + +
      +

      R session information

      +
      +
      options(width = 120)
      +sessioninfo::session_info()
      +
      +
      ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
      + setting  value
      + version  R version 4.3.1 (2023-06-16)
      + os       macOS Ventura 13.5
      + system   aarch64, darwin20
      + ui       X11
      + language (EN)
      + collate  en_US.UTF-8
      + ctype    en_US.UTF-8
      + tz       America/New_York
      + date     2023-08-17
      + pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
      +
      +─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
      + package     * version date (UTC) lib source
      + cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
      + colorout      1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
      + digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
      + evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.0)
      + fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
      + htmltools     0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
      + htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
      + jsonlite      1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
      + knitr         1.43    2023-05-25 [1] CRAN (R 4.3.0)
      + rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
      + rmarkdown     2.24    2023-08-14 [1] CRAN (R 4.3.1)
      + rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
      + sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
      + xfun          0.40    2023-08-09 [1] CRAN (R 4.3.0)
      + yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
      +
      + [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
      +
      +──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
      +
      +
      -
      diff --git a/posts/17-loop-functions/index.html b/posts/17-loop-functions/index.html index 0c13e6b..c483b98 100644 --- a/posts/17-loop-functions/index.html +++ b/posts/17-loop-functions/index.html @@ -272,6 +272,7 @@

      Table of contents

    33. Final Questions
    34. Additional Resources
    35. +
    36. R session information
    37. @@ -285,6 +286,7 @@

      Table of contents

      +

      This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

      Pre-lecture materials

      @@ -370,7 +372,7 @@

      Two vectors

      If we have two vectors of the same length, and we sum them in R, they will be added entry by entry as follows:

      x <- 1:10
      -y <- 1:10 
      +y <- 1:10
       x + y
       [1]  2  4  6  8 10 12 14 16 18 20
      @@ -387,7 +389,7 @@

      Two vectors

      y <- 1:10
      -x*y
      +x * y
       [1]   1   4   9  16  25  36  49  64  81 100
      @@ -427,7 +429,7 @@

      lapply()

              X <- as.list(X)
           .Internal(lapply(X, FUN))
       }
      -<bytecode: 0x12d12f9d0>
      +<bytecode: 0x12d9335d0>
       <environment: namespace:base>
      @@ -464,8 +466,8 @@

      lapply()

      [1] 1 2 3 4 5
       
       $b
      - [1]  0.9398820  0.6808533 -0.5230355 -1.4199458 -0.9806165  0.2871580
      - [7]  1.2836726 -1.1063673  1.4649872  0.4810928
      + [1] -0.6113707  0.5950531  0.6319343  0.5595441  0.3188799 -0.4400711
      + [7]  1.6687028  0.4501791  1.4356856 -0.3858270
      lapply(x, mean)
      @@ -473,7 +475,7 @@

      lapply()

      [1] 3
       
       $b
      -[1] 0.1107681
      +[1] 0.422271

      Notice that here we are passing the mean() function as an argument to the lapply() function.

      @@ -500,13 +502,13 @@

      lapply()

      [1] 2.5
       
       $b
      -[1] -0.3599091
      +[1] 0.1655327
       
       $c
      -[1] 1.715792
      +[1] 0.9767504
       
       $d
      -[1] 5.062643
      +[1] 4.951283
      @@ -518,16 +520,16 @@

      lapply()

      lapply(x, runif)
      [[1]]
      -[1] 0.4687761
      +[1] 0.5924944
       
       [[2]]
      -[1] 0.9249996 0.3011933
      +[1] 0.8660588 0.3277243
       
       [[3]]
      -[1] 0.5811661 0.1755092 0.5232761
      +[1] 0.5009080 0.2951163 0.6264905
       
       [[4]]
      -[1] 0.6459540 0.3708483 0.6723211 0.7998949
      +[1] 0.04282267 0.14951908 0.82034538 0.64614463
      @@ -568,16 +570,16 @@

      lapply()

      lapply(x, runif, min = 0, max = 10)
      [[1]]
      -[1] 8.291326
      +[1] 5.653385
       
       [[2]]
      -[1] 8.893872 9.878169
      +[1] 8.325503 7.234466
       
       [[3]]
      -[1] 5.5325986 0.4374242 7.2026176
      +[1] 5.968981 9.174316 7.920678
       
       [[4]]
      -[1] 1.6807689 0.2755822 8.5226424 9.5019399
      +[1] 9.491500 3.023649 2.990945 8.757496

      So now, instead of the random numbers being between 0 and 1 (the default), they are all between 0 and 10.
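
      Here is a small sketch that checks both claims, assuming x is 1:4 as the output lengths above suggest. The seed is arbitrary and only there for reproducibility.

```r
set.seed(42) ## Arbitrary seed, only for reproducibility
x <- 1:4
draws <- lapply(x, runif, min = 0, max = 10)

## Element i has length i, since each element of x becomes runif()'s first argument (n)
lengths(draws)

## Every draw falls inside the requested interval
all(unlist(draws) >= 0 & unlist(draws) <= 10)
```

      The min and max arguments are not arguments of lapply() itself. They are passed along through ... to every call of runif().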

      @@ -594,7 +596,7 @@

      lapply()

      Here I am creating a list that contains two matrices.

      -
      x <- list(a = matrix(1:4, 2, 2), b = matrix(1:6, 3, 2)) 
      +
      x <- list(a = matrix(1:4, 2, 2), b = matrix(1:6, 3, 2))
       x
      $a
      @@ -611,7 +613,9 @@ 

      lapply()

      Suppose I wanted to extract the first column of each matrix in the list. I could write an anonymous function for extracting the first column of each matrix.

      -
      lapply(x, function(elt) { elt[,1] })
      +
      lapply(x, function(elt) {
      +    elt[, 1]
      +})
      $a
       [1] 1 2
      @@ -627,7 +631,7 @@ 

      lapply()

      For example, I could have done the following.

      f <- function(elt) {
      -        elt[, 1]
      +    elt[, 1]
       }
       lapply(x, f)
      @@ -670,22 +674,22 @@

      sapply()

      [1] 2.5
       
       $b
      -[1] -0.3561419
      +[1] -0.1478465
       
       $c
      -[1] 1.078816
      +[1] 0.819794
       
       $d
      -[1] 5.020936
      +[1] 4.954484

      Notice that lapply() returns a list (as usual), but that each element of the list has length 1.

      Here’s the result of calling sapply() on the same list.

      -
      sapply(x, mean) 
      +
      sapply(x, mean)
               a          b          c          d 
      - 2.5000000 -0.3561419  1.0788156  5.0209365 
      + 2.5000000 -0.1478465  0.8197940  4.9544836

      Because the result of lapply() was a list where each element had length 1, sapply() collapsed the output into a numeric vector, which is often more useful than a list.
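
      The same contrast can be seen with a small deterministic list, so the numbers do not change between renders. The list below is made up for illustration.

```r
x <- list(a = 1:4, b = c(2, 4, 6), c = 10) ## Small made-up list

as_list <- lapply(x, mean)
as_vector <- sapply(x, mean)

is.list(as_list) ## TRUE: one list element per input element
is.numeric(as_vector) ## TRUE: simplified to a named numeric vector
as_vector
```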

      @@ -721,16 +725,16 @@

      split()

      split(x, f)
      $`1`
      - [1] -0.88306749 -1.86719488  0.63289913  1.05916422 -0.55471433  0.14180641
      - [7]  0.07777047 -0.09623353  0.80288817 -0.07352678
      + [1]  0.78541247 -0.06267966 -0.89713180  0.11796725  0.66689447 -0.02523006
      + [7] -0.19081948  0.44974528 -0.51005146 -0.08103298
       
       $`2`
      - [1] 0.52710414 0.78458044 0.85538500 0.11115802 0.43938934 0.30846324
      - [7] 0.12611702 0.92352094 0.07062165 0.61957181
      + [1] 0.29977033 0.31873253 0.53182993 0.85507540 0.21585775 0.89867742
      + [7] 0.78109747 0.06887742 0.79661568 0.60022565
       
       $`3`
      - [1] -0.67639542  0.72492785  0.10007215  0.29327660  0.85127149  0.50446636
      - [7]  0.05115469  2.29881193 -0.63035160  2.09792647
      + [1] -0.38262045  0.06294368  0.41768485  1.57972821  1.17555228  1.47374130
      + [7]  1.79199913  2.25569283  1.55226509 -1.51811384

      A common idiom is split followed by an lapply.

      @@ -738,13 +742,13 @@

      split()

      lapply(split(x, f), mean)
      $`1`
      -[1] -0.07602086
      +[1] 0.0253074
       
       $`2`
      -[1] 0.4765912
      +[1] 0.536676
       
       $`3`
      -[1] 0.5615161
      +[1] 0.8408873
      @@ -808,7 +812,7 @@

      Splitting a Data Fr

      Then we can take the column means for Ozone, Solar.R, and Wind for each sub-data frame.

      lapply(s, function(x) {
      -        colMeans(x[, c("Ozone", "Solar.R", "Wind")])
      +    colMeans(x[, c("Ozone", "Solar.R", "Wind")])
       })
      $`5`
      @@ -835,7 +839,7 @@ 

      Splitting a Data Fr

      Using sapply() might be better here for a more readable output.

      sapply(s, function(x) {
      -        colMeans(x[, c("Ozone", "Solar.R", "Wind")])
      +    colMeans(x[, c("Ozone", "Solar.R", "Wind")])
       })
                     5         6          7        8        9
      @@ -847,9 +851,10 @@ 

      Splitting a Data Fr

      Unfortunately, there are NAs in the data so we cannot simply take the means of those variables. However, we can tell the colMeans function to remove the NAs before computing the mean.

      sapply(s, function(x) {
      -        colMeans(x[, c("Ozone", "Solar.R", "Wind")], 
      -                 na.rm = TRUE)
      -})
      +    colMeans(x[, c("Ozone", "Solar.R", "Wind")],
      +        na.rm = TRUE
      +    )
      +})

                      5         6          7          8         9
       Ozone    23.61538  29.44444  59.115385  59.961538  31.44828
      @@ -891,7 +896,7 @@ 

      tapply

      ## Simulate some data
       x <- c(rnorm(10), runif(10), rnorm(10, 1))
       ## Define some groups with a factor variable
      -f <- gl(3, 10)   
      +f <- gl(3, 10)
       f
       [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3
      @@ -899,8 +904,8 @@ 

      tapply

      tapply(x, f, mean)
      -
               1          2          3 
      -0.03546858 0.50033323 1.23684289 
      +
              1         2         3 
      +0.3554738 0.5195466 0.6764006 
      @@ -910,13 +915,13 @@

      tapply

      tapply(x, f, range)
      $`1`
      -[1] -1.597023  1.582242
      +[1] -1.431912  2.695089
       
       $`2`
      -[1] 0.01799498 0.98731564
      +[1] 0.1263379 0.8959040
       
       $`3`
      -[1] -0.1673642  2.8815083
      +[1] -1.207741 1.696309

      @@ -951,25 +956,25 @@

      apply()

      x <- matrix(rnorm(200), 20, 10)
       head(x)
      -
                  [,1]        [,2]        [,3]         [,4]       [,5]       [,6]
      -[1,] -0.01270296  0.12521307 -0.35347017 -0.288597192  0.4754956 -1.4952687
      -[2,] -1.76025729 -0.36661801  1.57260727  0.909927684 -0.8722067  2.4145309
      -[3,] -0.04541822 -0.08756584  0.09477815  0.587649433 -0.2839712 -0.3948512
      -[4,] -0.79873007  2.33988787  0.04433525 -0.043574962  1.8351096 -1.4161750
      -[5,]  0.57385840  0.22221005 -1.15025884  0.002239365 -1.1274753  0.2699411
      -[6,] -0.79337310  0.15304664  0.05230485  2.088306453 -2.5307486  1.0901328
      -            [,7]         [,8]       [,9]       [,10]
      -[1,] -0.06995917 -0.970955222 -0.6081838  0.36135088
      -[2,]  0.98219144  1.226671950  0.7388203  0.99107134
      -[3,]  0.36028126  1.080908318 -1.4657096 -0.83599160
      -[4,] -0.46741177 -0.341382567  0.6639626  0.90447006
      -[5,] -0.63266831 -0.828562584 -0.5595121 -0.51470923
      -[6,]  0.44488488 -0.005120275 -1.2554960 -0.09944684
      -
      -
      apply(x, 2, mean)  ## Take the mean of each column
      +
                [,1]       [,2]       [,3]        [,4]       [,5]       [,6]
      +[1,]  1.589728  0.7733454 -1.3311072 -0.77084025 -0.1947478  0.1748546
      +[2,]  2.395088  0.3243910 -1.5133366  0.09199955  0.3850993  0.1851718
      +[3,]  1.039643 -2.1721402 -0.9933217 -1.89261272  0.1748050  1.0563987
      +[4,] -1.580978 -0.9884235 -1.4976744 -0.51011200 -2.7512079  0.5547477
      +[5,]  1.264799 -2.0551874  0.4483417 -3.08561764 -0.1549359 -0.8384706
      +[6,]  1.756973  0.9244522  0.2740854 -0.61441465 -1.0661350  1.4497808
      +           [,7]        [,8]       [,9]      [,10]
      +[1,]  0.7163086 -0.01817166  0.2193225 -0.3346788
      +[2,]  0.7606851  0.42082416  0.1099027  0.2834439
      +[3,] -1.1218204 -1.17000278  0.4302792 -0.5684986
      +[4,]  0.6082452  0.46763465 -0.3481830 -0.1765517
      +[5,] -0.7460224 -0.01123782  1.8116342 -0.1033175
      +[6,]  1.0160202 -0.82361401 -0.1616471 -0.1628032
      + +
      apply(x, 2, mean) ## Take the mean of each column
      -
       [1] -0.24958041  0.14629702 -0.14633652 -0.26691102 -0.15595976  0.07473874
      - [7]  0.05314485  0.07476061 -0.30001733  0.14398756
      +
       [1]  0.083759441 -0.134507982 -0.246473461 -0.371270102 -0.078433882
      + [6] -0.101665531 -0.007126106 -0.003193726  0.114767264  0.070612124
      @@ -986,12 +991,12 @@

      apply()

      I can also compute the sum of each row.

      -
      apply(x, 1, sum)   ## Take the mean of each row
      +
      apply(x, 1, sum) ## Take the sum of each row
      -
       [1] -2.8370777  5.8367390 -0.9898905  2.7204911 -3.7449375 -0.8555091
      - [7]  2.4826554  0.9494142 -3.9096827  0.2117756  0.3672752 -2.7321397
      -[13]  2.4937133 -2.7042877 -4.6029774 -6.2231452 -1.9386089  0.5097158
      -[19] -2.2691720  4.7181237
      +
       [1]  0.82401382  3.44326903 -5.21727094 -6.22250299 -3.47001414  2.59269751
      + [7] -1.76049948 -0.54534465  1.26993157 -0.05660623  1.89101638  2.60154094
      +[13] -0.80804188  1.96321614 -2.68869045  0.56525640  0.44214056 -4.25890694
      +[19] -3.02509115 -1.01075274
      @@ -1062,36 +1067,36 @@

      Other Ways to Apply
      x <- matrix(rnorm(200), 20, 10)
       head(x)
      -
                  [,1]        [,2]       [,3]      [,4]       [,5]       [,6]
      -[1,] -1.09759334 -0.58191082 -0.6190918 0.7545051 -1.6708063 -1.2382435
      -[2,] -0.04952269  0.50872978  1.6895949 0.1657323  1.7746160  1.7427081
      -[3,]  0.45414643  1.22539326  0.6284307 0.2973018  1.0887260  0.4581224
      -[4,] -0.03995540  0.23679937 -0.7905091 0.6370128  0.7911886 -0.2637556
      -[5,]  0.12208387 -1.41751608  1.2769118 0.8510867 -0.4888010 -0.1692706
      -[6,] -1.31501439 -0.08597665 -0.7616683 0.7553028  1.1584617 -2.0701933
      -           [,7]        [,8]        [,9]       [,10]
      -[1,] -1.1974074  1.22719350 -0.32231319  1.16291606
      -[2,] -0.6335309  0.95729514 -0.84747657  0.91182060
      -[3,] -0.7138229 -1.88743158  0.07026544 -2.01649459
      -[4,] -0.2273346  1.76161541 -1.26793435 -1.89014826
      -[5,]  0.3346429 -0.75236320  0.31607231  0.09632038
      -[6,] -1.0845780  0.02416961  0.50295930  1.93484470
      +
                  [,1]         [,2]      [,3]       [,4]        [,5]         [,6]
      +[1,]  0.58654399 -0.502546440 1.1493478  0.6257709 -0.02866237  1.490139530
      +[2,] -0.14969248  0.327632870 0.0202589  0.2889600 -0.16552218 -0.829703298
      +[3,]  1.12561766  0.707836011 0.6038607 -0.6722613  0.85092968  0.550785886
      +[4,] -1.71719604  0.554424755 0.4229181  0.1484968  0.22134369  0.258853355
      +[5,]  0.31827641  1.555568589 0.8971850 -0.7742244  0.45459793 -0.043814576
      +[6,] -0.08429415  0.001737282 0.1906608  1.1145869  0.54156791 -0.004889302
      +           [,7]        [,8]       [,9]      [,10]
      +[1,] -0.7879713  1.02206400 -1.0420765 -1.2779945
      +[2,]  1.7217146  0.06728039  0.6408182 -0.3551929
      +[3,] -0.2439192 -0.71553120 -0.8273868  0.2559954
      +[4,] -0.1085818 -0.28763268  1.9010457  1.7950971
      +[5,] -1.4082747 -1.07621679  0.5428189  0.4538626
      +[6,] -1.0644006 -0.04186614 -0.8150566  1.0490749
      ## Get row quantiles
      -apply(x, 1, quantile, probs = c(0.25, 0.75))    
      +apply(x, 1, quantile, probs = c(0.25, 0.75))
      -
                [,1]        [,2]       [,3]       [,4]       [,5]       [,6]
      -25% -1.1724539 0.004291043 -0.5178008 -0.6588207 -0.4089184 -1.0038506
      -75%  0.4853005 1.506519993  0.5858536  0.5369595  0.3300002  0.6922169
      +
                [,1]       [,2]       [,3]        [,4]       [,5]        [,6]
      +25% -0.7166151 -0.1615648 -0.5651758 -0.04431213 -0.5916219 -0.07368714
      +75%  0.9229907  0.3179646  0.6818422  0.52154809  0.5207637  0.45384114
                 [,7]       [,8]       [,9]      [,10]      [,11]      [,12]
      -25% -0.9842272 -1.0220842 -0.7082846 -0.8992771 -0.3444137 -0.4086714
      -75%  0.2951763  0.6737552  0.1853825  1.0853115  0.6014494  0.3695608
      -         [,13]      [,14]        [,15]      [,16]       [,17]      [,18]
      -25% -1.1790230 -0.7932644 -0.002708936 -0.5149016 -0.83974314 -0.7881085
      -75%  0.1577916  0.9562642  1.100022074  0.4498309 -0.04954139  0.2352183
      +25% -0.4355993 -0.1313015 -0.8149658 -0.9260982 0.02077709 -0.1343613
      +75%  1.5985929  0.8889319  0.2213238  0.3661333 0.82424899  0.4156328
      +         [,13]      [,14]      [,15]      [,16]      [,17]      [,18]
      +25% -0.1281593 -0.6691927 -0.2824997 -0.6574923 0.06421797 -0.7905708
      +75%  1.3073689  1.2450340  0.5072401  0.5023885 1.08294108  0.4653062
                [,19]      [,20]
      -25% 0.03656589 -0.7393304
      -75% 0.35820288  0.5060296
      +25% -0.5826196 -0.6965163
      +75%  0.1313324  0.6849689

      Notice that I had to pass the probs = c(0.25, 0.75) argument to quantile() via the ... argument to apply().
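
      A smaller sketch of the same idea, using a made-up deterministic matrix so the shape of the result is easy to verify by hand:

```r
x <- matrix(1:20, nrow = 4, ncol = 5) ## Small deterministic matrix

## 'probs' is not an argument of apply(); it travels through '...' to quantile()
q <- apply(x, 1, quantile, probs = c(0.25, 0.75))

dim(q) ## 2 rows (one per quantile), 4 columns (one per row of x)
rownames(q)
```

      Each row of x becomes a column of the result, which is why the output above has 20 columns for a matrix with 20 rows.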

      @@ -1117,23 +1122,23 @@

      Vectorizing a Funct

      Here’s an example of a function that computes the sum of squares given some data, a mean parameter and a standard deviation. The formula is \(\sum_{i=1}^n(x_i-\mu)^2/\sigma^2\).

      sumsq <- function(mu, sigma, x) {
      -        sum(((x - mu) / sigma)^2)
      +    sum(((x - mu) / sigma)^2)
       }

      This function takes a mean mu, a standard deviation sigma, and some data in a vector x.

      In many statistical applications, we want to minimize the sum of squares to find the optimal mu and sigma. Before we do that, we may want to evaluate or plot the function for many different values of mu or sigma.

      -
      x <- rnorm(100)       ## Generate some data
      -sumsq(mu=1, sigma=1, x)  ## This works (returns one value)
      +
      x <- rnorm(100) ## Generate some data
      +sumsq(mu = 1, sigma = 1, x) ## This works (returns one value)
      -
      [1] 201.5111
      +
      [1] 248.8765

      However, passing a vector of mus or sigmas won’t work with this function because it’s not vectorized.

      -
      sumsq(1:10, 1:10, x)  ## This is not what we want
      +
      sumsq(1:10, 1:10, x) ## This is not what we want
      -
      [1] 121.9851
      +
      [1] 119.3071
      @@ -1144,8 +1149,8 @@

      Vectorizing a Funct
      vsumsq <- Vectorize(sumsq, c("mu", "sigma"))
       vsumsq(1:10, 1:10, x)
      -
       [1] 201.5111 127.6611 113.3086 108.0569 105.5217 104.0882 103.1900 102.5851
      - [9] 102.1553 101.8371
      +
       [1] 248.8765 146.5055 124.7964 116.2695 111.8983 109.2945 107.5867 106.3890
      + [9] 105.5067 104.8318

      Pretty cool, right?
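
      One way to see what Vectorize() is doing is to compare it with an explicit mapply() call. This is a sketch on a small made-up data vector, not a description of how you have to write it.

```r
sumsq <- function(mu, sigma, x) {
    sum(((x - mu) / sigma)^2)
}

x <- c(-1, 0, 1, 2) ## Small made-up data vector
vsumsq <- Vectorize(sumsq, c("mu", "sigma"))

## One call per (mu, sigma) pair, with the full data vector reused each time
by_hand <- mapply(function(m, s) sumsq(m, s, x), 1:3, 1:3)

all.equal(vsumsq(1:3, 1:3, x), by_hand)
```

      Only mu and sigma are looped over in pairs. The data x is passed whole to every call, which is what we want when evaluating the function over a set of parameter values.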

      @@ -1208,9 +1213,52 @@

      Additional Resources< + + +
      +

      R session information

      +
      +
      options(width = 120)
      +sessioninfo::session_info()
      +
      +
      ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
      + setting  value
      + version  R version 4.3.1 (2023-06-16)
      + os       macOS Ventura 13.5
      + system   aarch64, darwin20
      + ui       X11
      + language (EN)
      + collate  en_US.UTF-8
      + ctype    en_US.UTF-8
      + tz       America/New_York
      + date     2023-08-17
      + pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
      +
      +─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
      + package     * version date (UTC) lib source
      + cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
      + colorout      1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
      + digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
      + evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.0)
      + fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
      + htmltools     0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
      + htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
      + jsonlite      1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
      + knitr         1.43    2023-05-25 [1] CRAN (R 4.3.0)
      + rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
      + rmarkdown     2.24    2023-08-14 [1] CRAN (R 4.3.1)
      + rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
      + sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
      + xfun          0.40    2023-08-09 [1] CRAN (R 4.3.0)
      + yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
      +
      + [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
      +
      +──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
      +
      +
      -
      diff --git a/posts/18-debugging-r-code/index.html b/posts/18-debugging-r-code/index.html index 7d43d9b..2e4aca6 100644 --- a/posts/18-debugging-r-code/index.html +++ b/posts/18-debugging-r-code/index.html @@ -262,6 +262,7 @@

      Table of contents

    38. Final Questions
    39. Additional Resources
    40. +
    41. R session information
    42. @@ -275,6 +276,7 @@

      Table of contents

      +

      This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

      Pre-lecture materials

      @@ -469,12 +471,12 @@

      Something’s Wrong!

      Here is another function that is designed to print a message to the console depending on the nature of its input.

      print_message <- function(x) {
      -        if(x > 0) {
      -                print("x is greater than zero")
      -        } else {
      -                print("x is less than or equal to zero")
      -        }  
      -        invisible(x)        
      +    if (x > 0) {
      +        print("x is greater than zero")
      +    } else {
      +        print("x is less than or equal to zero")
      +    }
      +    invisible(x)
       }

      This function is simple:

      @@ -509,14 +511,15 @@

      Something’s Wrong!

      We can fix this problem by anticipating the possibility of NA values and checking to see if the input is NA with the is.na() function.

      print_message2 <- function(x) {
      -        if(is.na(x))
      -                print("x is a missing value!")
      -        else if(x > 0)
      -                print("x is greater than zero")
      -        else
      -                print("x is less than or equal to zero")
      -        invisible(x)
      -}
      +    if (is.na(x)) {
      +        print("x is a missing value!")
      +    } else if (x > 0) {
      +        print("x is greater than zero")
      +    } else {
      +        print("x is less than or equal to zero")
      +    }
      +    invisible(x)
      +}

      Now we can run the following.

      @@ -534,7 +537,7 @@

      Something’s Wrong!

      print_message2(x)
      -
      Error in if (is.na(x)) print("x is a missing value!") else if (x > 0) print("x is greater than zero") else print("x is less than or equal to zero"): the condition has length > 1
      +
      Error in if (is.na(x)) {: the condition has length > 1

      Now what? Why are we getting this error?

      @@ -551,16 +554,18 @@

      Something’s Wrong!

      For the first way, we simply need to check the length of the input.

      print_message3 <- function(x) {
      -        if(length(x) > 1L)
      -                stop("'x' has length > 1")
      -        if(is.na(x))
      -                print("x is a missing value!")
      -        else if(x > 0)
      -                print("x is greater than zero")
      -        else
      -                print("x is less than or equal to zero")
      -        invisible(x)
      -}
      +    if (length(x) > 1L) {
      +        stop("'x' has length > 1")
      +    }
      +    if (is.na(x)) {
      +        print("x is a missing value!")
      +    } else if (x > 0) {
      +        print("x is greater than zero")
      +    } else {
      +        print("x is less than or equal to zero")
      +    }
      +    invisible(x)
      +}

      Now when we pass print_message3() a vector, we should get an error.
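
      As a quick check, the error can be captured with tryCatch() and its message inspected instead of halting the session. This is a minimal sketch: print_message3() is repeated from above so the block is self-contained, and the name result is illustrative only.

```r
print_message3 <- function(x) {
    if (length(x) > 1L) {
        stop("'x' has length > 1")
    }
    if (is.na(x)) {
        print("x is a missing value!")
    } else if (x > 0) {
        print("x is greater than zero")
    } else {
        print("x is less than or equal to zero")
    }
    invisible(x)
}

## Capture the error message rather than stopping execution
## (`result` is a hypothetical name used only in this sketch)
result <- tryCatch(
    print_message3(1:2),
    error = function(e) conditionMessage(e)
)
result
```

      Here result holds the text passed to stop(), which confirms that the length check fires before is.na() is ever evaluated.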

      @@ -787,10 +792,10 @@

      Final Questions

       g <- function(b) h(b)
       h <- function(c) i(c)
       i <- function(d) {
      -  if (!is.numeric(d)) {
      -    stop("`d` must be numeric", call. = FALSE)
      -  }
      -  d + 10
      +    if (!is.numeric(d)) {
      +        stop("`d` must be numeric", call. = FALSE)
      +    }
      +    d + 10
       }
       f("a")
      @@ -820,9 +825,57 @@

      Additional Resources<

      +
      + +
      +

      R session information

      +
      +
      options(width = 120)
      +sessioninfo::session_info()
      +
      +
      ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
      + setting  value
      + version  R version 4.3.1 (2023-06-16)
      + os       macOS Ventura 13.5
      + system   aarch64, darwin20
      + ui       X11
      + language (EN)
      + collate  en_US.UTF-8
      + ctype    en_US.UTF-8
      + tz       America/New_York
      + date     2023-08-17
      + pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
      +
      +─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
      + package     * version date (UTC) lib source
      + cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
      + colorout      1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
      + digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
      + evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.0)
      + fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
      + fs            1.6.3   2023-07-20 [1] CRAN (R 4.3.0)
      + glue          1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
      + htmltools     0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
      + htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
      + jsonlite      1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
      + knitr         1.43    2023-05-25 [1] CRAN (R 4.3.0)
      + lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
      + reprex      * 2.0.2   2022-08-17 [1] CRAN (R 4.3.0)
      + rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
      + rmarkdown     2.24    2023-08-14 [1] CRAN (R 4.3.1)
      + rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
      + sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
      + withr         2.5.0   2022-03-03 [1] CRAN (R 4.3.0)
      + xfun          0.40    2023-08-09 [1] CRAN (R 4.3.0)
      + yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
      +
      + [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
      +
      +──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
      +
      +
      -
      diff --git a/posts/19-error-handling-and-generation/index.html b/posts/19-error-handling-and-generation/index.html index 9ae94b9..f7360c6 100644 --- a/posts/19-error-handling-and-generation/index.html +++ b/posts/19-error-handling-and-generation/index.html @@ -251,6 +251,7 @@

      Table of contents

      +
    43. R session information
    44. @@ -264,6 +265,7 @@

      Table of contents

      +

      This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

      Pre-lecture materials

      @@ -384,8 +386,8 @@

      What is an error?

      Here’s a small function that will generate a message:

      -
      f <- function(){
      -  message("This is a message.")
      +
      f <- function() {
      +    message("This is a message.")
       }
       
       f()
      @@ -419,8 +421,8 @@

      Generating Errors

      If an error occurs inside of a function, then the name of that function will appear in the error message:

      -
      name_of_function <- function(){
      -  stop("Something bad happened.")
      +
      name_of_function <- function() {
      +    stop("Something bad happened.")
       }
       
       name_of_function()
      @@ -441,9 +443,9 @@

      Generating Errors

      Let’s take a look at an example:

      -
      error_if_n_is_greater_than_zero <- function(n){
      -  stopifnot(n <= 0)
      -  n
      +
      error_if_n_is_greater_than_zero <- function(n) {
      +    stopifnot(n <= 0)
      +    n
       }
       
       error_if_n_is_greater_than_zero(5)
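
      As a possible refinement, stopifnot() in R 4.0.0 and later accepts named arguments, where the name is used as the error message. The sketch below assumes such an R version; the message text and the name msg are illustrative, not part of the original lecture.

```r
error_if_n_is_greater_than_zero <- function(n) {
    ## The argument name becomes the error message (R >= 4.0.0)
    stopifnot("`n` must be less than or equal to zero" = n <= 0)
    n
}

## Capture the message instead of stopping (`msg` is a hypothetical name)
msg <- tryCatch(
    error_if_n_is_greater_than_zero(5),
    error = function(e) conditionMessage(e)
)
msg
```

      A descriptive message is usually easier to act on than the default, which only reports that the expression was not TRUE.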
      @@ -474,9 +476,9 @@

      Generating Errors

      Just like errors, a warning generated inside of a function will include the name of the function in which it was generated:

      -
      make_NA <- function(x){
      -  warning("Generating an NA.")
      -  NA
      +
      make_NA <- function(x) {
      +    warning("Generating an NA.")
      +    NA
       }
       
       make_NA("Sodium")
      @@ -548,23 +550,24 @@

      How should er

      The tryCatch() function is the workhorse of handling errors and warnings in R. The first argument of this function is any R expression, followed by conditions which specify how to handle an error or a warning. The last argument, finally, specifies a function or expression that will be executed after the expression no matter what, even in the event of an error or a warning.

      Let’s construct a simple function I’m going to call beera that catches errors and warnings gracefully.

      -
      beera <- function(expr){
      -  tryCatch(expr,
      -         error = function(e){
      -           message("An error occurred:\n", e)
      -         },
      -         warning = function(w){
      -           message("A warning occured:\n", w)
      -         },
      -         finally = {
      -           message("Finally done!")
      -         })
      -}
      +
      beera <- function(expr) {
      +    tryCatch(expr,
      +        error = function(e) {
      +            message("An error occurred:\n", e)
      +        },
      +        warning = function(w) {
      +            message("A warning occured:\n", w)
      +        },
      +        finally = {
      +            message("Finally done!")
      +        }
      +    )
      +}

      This function takes an expression as an argument and tries to evaluate it. If the expression can be evaluated without any errors or warnings, then the result of the expression is returned and the message Finally done! is printed to the R console. If an error or warning is generated, then the function provided to the error or warning argument is called with the condition object, and Finally done! is still printed afterwards. Let’s try this function out with a few examples.

      beera({
      -  2 + 2
      +    2 + 2
       })
      Finally done!
      @@ -573,7 +576,7 @@

      How should er
      [1] 4

      beera({
      -  "two" + 2
      +    "two" + 2
       })
      An error occurred:
      @@ -582,7 +585,7 @@ 

      How should er Finally done!

      beera({
      -  as.numeric(c(1, "two", 3))
      +    as.numeric(c(1, "two", 3))
       })
      A warning occured:
      @@ -594,8 +597,8 @@ 

      How should er

      Notice that we’ve effectively transformed errors and warnings into messages.
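
      Handlers do not have to print anything: whatever a handler returns becomes the value of the tryCatch() call. The following is a minimal sketch of that idea, using a hypothetical helper called safe_log() that is not part of the lecture.

```r
## `safe_log` is a hypothetical helper used only for illustration
safe_log <- function(x) {
    tryCatch(
        log(x),
        ## The handler's return value replaces the result of log(x)
        error = function(e) NA_real_,
        warning = function(w) NA_real_
    )
}

safe_log(10) # evaluates normally
safe_log(-1) # log(-1) signals a warning, so the warning handler returns NA
safe_log("a") # log("a") signals an error, so the error handler returns NA
```

      This pattern is convenient when a fallback value is more useful to the caller than a message.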

      Now that you know the basics of generating and catching errors you’ll need to decide when your program should generate an error. My advice to you is to limit the number of errors your program generates as much as possible. Even if you design your program so that it’s able to catch and handle errors, the error handling process slows down your program by orders of magnitude. Imagine you wanted to write a simple function that checks if an argument is an even number. You might write the following:

      -
      is_even <- function(n){
      -  n %% 2 == 0
      +
      is_even <- function(n) {
      +    n %% 2 == 0
       }
       
       is_even(768)
      @@ -609,14 +612,15 @@

      How should er

      You can see that providing a string causes this function to raise an error. You could imagine though that you want to use this function across a list of different data types, and you only want to know which elements of that list are even numbers. You might think to write the following:

      -
      is_even_error <- function(n){
      -  tryCatch(n %% 2 == 0,
      -           error = function(e){
      -             FALSE
      -           })
      -}
      -
      -is_even_error(714)
      +
      is_even_error <- function(n) {
      +    tryCatch(n %% 2 == 0,
      +        error = function(e) {
      +            FALSE
      +        }
      +    )
      +}
      +
      +is_even_error(714)
      [1] TRUE
      @@ -627,8 +631,8 @@

      How should er

      This appears to be working the way you intended; however, when applied to more data, this function will be seriously slow compared to alternatives. For example, I could check that n is numeric before treating n like a number:

      -
      is_even_check <- function(n){
      -  is.numeric(n) && n %% 2 == 0
      +
      is_even_check <- function(n) {
      +    is.numeric(n) && n %% 2 == 0
       }
       
       is_even_check(1876)
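
      One rough way to see the difference is to time both versions on the same mixed input with base R's system.time(). This is only a sketch: the inputs list and the variable names are made up for illustration, and the timings will vary by machine, so none are shown here.

```r
is_even_error <- function(n) {
    tryCatch(n %% 2 == 0, error = function(e) FALSE)
}

is_even_check <- function(n) {
    is.numeric(n) && n %% 2 == 0
}

## Hypothetical mixed input: half numbers, half strings
inputs <- rep(list(2, "a", 3, "b"), times = 2500)

time_error <- system.time(
    res_error <- vapply(inputs, is_even_error, logical(1))
)
time_check <- system.time(
    res_check <- vapply(inputs, is_even_check, logical(1))
)

identical(res_error, res_check) # both approaches give the same answers
time_error["elapsed"]
time_check["elapsed"]
```

      Both functions agree on every element, so the only thing that differs is how much work is spent signalling and catching conditions.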
      @@ -701,9 +705,52 @@

      Additional Resources<

      +

      +
      +
      +

      R session information

      +
      +
      options(width = 120)
      +sessioninfo::session_info()
      +
      +
      ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
      + setting  value
      + version  R version 4.3.1 (2023-06-16)
      + os       macOS Ventura 13.5
      + system   aarch64, darwin20
      + ui       X11
      + language (EN)
      + collate  en_US.UTF-8
      + ctype    en_US.UTF-8
      + tz       America/New_York
      + date     2023-08-17
      + pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
      +
      +─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
      + package     * version date (UTC) lib source
      + cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
      + colorout      1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
      + digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
      + evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.0)
      + fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
      + htmltools     0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
      + htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
      + jsonlite      1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
      + knitr         1.43    2023-05-25 [1] CRAN (R 4.3.0)
      + rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
      + rmarkdown     2.24    2023-08-14 [1] CRAN (R 4.3.1)
      + rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
      + sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
      + xfun          0.40    2023-08-09 [1] CRAN (R 4.3.0)
      + yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
      +
      + [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
      +
      +──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
      +
      +
      -
      diff --git a/posts/20-working-with-dates-and-times/index.html b/posts/20-working-with-dates-and-times/index.html index 32c1377..84d57be 100644 --- a/posts/20-working-with-dates-and-times/index.html +++ b/posts/20-working-with-dates-and-times/index.html @@ -286,6 +286,7 @@

      Table of contents

    45. Final Questions
    46. Additional Resources
    47. +
    48. R session information
    49. @@ -299,6 +300,7 @@

      Table of contents

      +

      This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

      Pre-lecture materials

      @@ -424,11 +426,11 @@

      The lubridate<

      Artwork by Allison Horst on the dplyr package [Source: Artwork by Allison Horst]

      lubridate is installed when you install tidyverse. As of tidyverse 2.0.0 it is also attached by library(tidyverse); with earlier versions it must be loaded separately. Alternatively, you can install it on its own.

      -
      install.packages("lubridate") 
      +
      install.packages("lubridate")
      library(tidyverse)
      -library(lubridate) 
      +library(lubridate)

      @@ -465,7 +467,7 @@

      Creating date/times

      now()
      -
      [1] "2023-08-17 17:32:07 EDT"
      +
      [1] "2023-08-17 21:47:51 EDT"

      Otherwise, there are three ways you are likely to create a date/time:

      @@ -545,19 +547,19 @@

      1. From a string

      Alternate Formulations

      Different locales have different ways of formatting dates

      -
      ymd("2016-09-13")  ## International standard
      +
      ymd("2016-09-13") ## International standard
      [1] "2016-09-13"
      -
      ymd("2016/09/13")  ## Just figure it out
      +
      ymd("2016/09/13") ## Just figure it out
      [1] "2016-09-13"
      -
      mdy("09-13-2016")  ## Mostly U.S.
      +
      mdy("09-13-2016") ## Mostly U.S.
      [1] "2016-09-13"
      -
      dmy("13-09-2016")  ## Europe
      +
      dmy("13-09-2016") ## Europe
      [1] "2016-09-13"
      @@ -565,10 +567,12 @@

      Alternate Formulati

      All of the above are valid and lead to the exact same object.

      Even if the individual dates are formatted differently, ymd() can usually figure it out.

      -
      x <- c("2016-04-05", 
      -       "2016/05/06",
      -       "2016,10,4")
      -ymd(x)
      +
      x <- c(
      +    "2016-04-05",
      +    "2016/05/06",
      +    "2016,10,4"
      +)
      +ymd(x)
      [1] "2016-04-05" "2016-05-06" "2016-10-04"
      @@ -582,8 +586,8 @@

      2. Fr
      library(nycflights13)
       
      -flights %>% 
      -  select(year, month, day)
      +flights %>%
      +    select(year, month, day)
      # A tibble: 336,776 × 3
           year month   day
      @@ -608,9 +612,9 @@ 

      2. Fr

      We combine these functions inside of mutate to add a new column to our dataset:

      -
      flights %>% 
      -  select(year, month, day) %>% 
      -  mutate(departure = make_date(year, month, day))
      +
      flights %>%
      +    select(year, month, day) %>%
      +    mutate(departure = make_date(year, month, day))
      # A tibble: 336,776 × 4
           year month   day departure 
      @@ -640,8 +644,8 @@ 

      2. Fr

      The flights dataset also contains hour and minute columns.

      -
      flights %>% 
      -  select(year, month, day, hour, minute)
      +
      flights %>%
      +    select(year, month, day, hour, minute)
      # A tibble: 336,776 × 5
           year month   day  hour minute
      @@ -681,7 +685,7 @@ 

      3. From other types

      now()
      -
      [1] "2023-08-17 17:32:08 EDT"
      +
      [1] "2023-08-17 21:47:52 EDT"
      as_date(now())
      @@ -732,7 +736,7 @@

      POSIXct

      Technically, the POSIXct class represents the number of seconds since 1 January 1970. (In case you were wondering, “POSIXct” stands for “Portable Operating System Interface”, calendar time.)

      x <- ymd_hm("1970-01-01 01:00")
      -class(x) 
      +class(x)

      [1] "POSIXct" "POSIXt" 
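
      The "seconds since 1 January 1970" representation can be inspected directly. The sketch below uses base R's as.POSIXct() with an explicit UTC time zone rather than ymd_hm(), so it runs without lubridate; that substitution is an assumption made for portability.

```r
## One hour after the UNIX epoch, in UTC
x <- as.POSIXct("1970-01-01 01:00:00", tz = "UTC")

class(x)
as.numeric(x) # the underlying number of seconds since the epoch
```

      Stripping the class in this way shows that a POSIXct object is just a number with attributes attached.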
      @@ -868,7 +872,7 @@

      Operations on Dates and Times

      Arithmetic

      You can add and subtract dates and times.

      -
      x <- ymd("2012-01-01", tz = "")  ## Midnight
      +
      x <- ymd("2012-01-01", tz = "") ## Midnight
       y <- dmy_hms("9 Jan 2011 11:34:21", tz = "")
       x - y ## this works
      @@ -889,7 +893,7 @@

      Arithmetic

      [1] FALSE
      -
      x + y ## what??? why does this not work? 
      +
      x + y ## what??? why does this not work?
      Error in `+.POSIXt`(x, y): binary '+' is not defined for "POSIXt" objects
      @@ -914,7 +918,7 @@

      Arithmetic

      POSIXct objects are a measure of seconds from an origin, usually the UNIX epoch (1st Jan 1970).

      Just add the requisite number of seconds to the object:

      -
      x + 3*60*60 # add 3 hours
      +
      x + 3 * 60 * 60 # add 3 hours
      [1] "2012-01-01 03:00:00 EST"
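
      The same arithmetic can be sketched in base R with a fixed UTC time zone, which avoids any dependence on the local time zone of the machine. The names later and elapsed are illustrative only.

```r
x <- as.POSIXct("2012-01-01 00:00:00", tz = "UTC")

## Adding a number to a POSIXct object adds that many seconds
later <- x + 3 * 60 * 60
format(later, "%Y-%m-%d %H:%M:%S")

## Subtracting two date-times gives a difftime; here we ask for hours
elapsed <- difftime(later, x, units = "hours")
as.numeric(elapsed)
```

      Adding a number to a date-time is well defined, while adding two date-times is not, which is why x + y fails but x - y works.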
      @@ -931,7 +935,7 @@

      Arithmetic

      And then add a number to the date (in this case 1 day)

      -
      y + 1  
      +
      y + 1
      [1] "2011-01-10"
      @@ -989,9 +993,11 @@

      Extracting Elements of Dates/Times

      Date Elements

      -
      x <- ymd_hms(c("2012-10-25 01:13:46",
      -               "2015-04-23 15:11:23"), tz = "")
      -year(x)
      +
      x <- ymd_hms(c(
      +    "2012-10-25 01:13:46",
      +    "2015-04-23 15:11:23"
      +), tz = "")
      +year(x)
      [1] 2012 2015
      @@ -1012,9 +1018,11 @@

      Date Elements

      Time Elements

      -
      x <- ymd_hms(c("2012-10-25 01:13:46",
      -               "2015-04-23 15:11:23"), tz = "")
      -minute(x)
      +
      x <- ymd_hms(c(
      +    "2012-10-25 01:13:46",
      +    "2015-04-23 15:11:23"
      +), tz = "")
      +minute(x)
      [1] 13 11
      @@ -1139,9 +1147,9 @@

      Histograms of Dat
      library(ggplot2)
       storm_sub %>%
      -  ggplot(aes(x = begin)) + 
      -  geom_histogram(bins = 20) + 
      -  theme_bw()
      +    ggplot(aes(x = begin)) +
      +    geom_histogram(bins = 20) +
      +    theme_bw()

      @@ -1150,11 +1158,11 @@

      Histograms of Dat
      library(ggplot2)
       storm_sub %>%
      -  ggplot(aes(x = begin)) + 
      -  facet_wrap(~ type) + 
      -  geom_histogram(bins = 20) + 
      -  theme_bw() + 
      -  theme(axis.text.x.bottom = element_text(angle = 90))
      +    ggplot(aes(x = begin)) +
      +    facet_wrap(~type) +
      +    geom_histogram(bins = 20) +
      +    theme_bw() +
      +    theme(axis.text.x.bottom = element_text(angle = 90))

      @@ -1164,8 +1172,8 @@

      Histograms of Dat

      Scatterplots of Dates/Times

      storm_sub %>%
      -  ggplot(aes(x = begin, y = deaths)) + 
      -  geom_point()
      +    ggplot(aes(x = begin, y = deaths)) +
      +    geom_point()

      @@ -1173,9 +1181,9 @@

      Scatterplots of

      If we focus on a single month, the x-axis adapts.

      storm_sub %>%
      -  filter(month(begin) == 6) %>%
      -  ggplot(aes(begin, deaths)) + 
      -  geom_point()
      +    filter(month(begin) == 6) %>%
      +    ggplot(aes(begin, deaths)) +
      +    geom_point()

      @@ -1183,9 +1191,9 @@

      Scatterplots of

      Similarly, we can focus on a single day.

      storm_sub %>%
      -  filter(month(begin) == 6, day(begin) == 16) %>%
      -  ggplot(aes(begin, deaths)) + 
      -  geom_point()
      +    filter(month(begin) == 6, day(begin) == 16) %>%
      +    ggplot(aes(begin, deaths)) +
      +    geom_point()

      @@ -1270,9 +1278,96 @@

      Additional Resources<

      + + +
      +

      R session information

      +
      +
      options(width = 120)
      +sessioninfo::session_info()
      +
      +
      ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
      + setting  value
      + version  R version 4.3.1 (2023-06-16)
      + os       macOS Ventura 13.5
      + system   aarch64, darwin20
      + ui       X11
      + language (EN)
      + collate  en_US.UTF-8
      + ctype    en_US.UTF-8
      + tz       America/New_York
      + date     2023-08-17
      + pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
      +
      +─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
      + package      * version date (UTC) lib source
      + bit            4.0.5   2022-11-15 [1] CRAN (R 4.3.0)
      + bit64          4.0.5   2020-08-30 [1] CRAN (R 4.3.0)
      + cli            3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
      + colorout       1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
      + colorspace     2.1-0   2023-01-23 [1] CRAN (R 4.3.0)
      + crayon         1.5.2   2022-09-29 [1] CRAN (R 4.3.0)
      + digest         0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
      + dplyr        * 1.1.2   2023-04-20 [1] CRAN (R 4.3.0)
      + emojifont      0.5.5   2021-04-20 [1] CRAN (R 4.3.0)
      + evaluate       0.21    2023-05-05 [1] CRAN (R 4.3.0)
      + fansi          1.0.4   2023-01-22 [1] CRAN (R 4.3.0)
      + farver         2.1.1   2022-07-06 [1] CRAN (R 4.3.0)
      + fastmap        1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
      + forcats      * 1.0.0   2023-01-29 [1] CRAN (R 4.3.0)
      + generics       0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
      + ggplot2      * 3.4.3   2023-08-14 [1] CRAN (R 4.3.1)
      + glue           1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
      + gtable         0.3.3   2023-03-21 [1] CRAN (R 4.3.0)
      + here         * 1.0.1   2020-12-13 [1] CRAN (R 4.3.0)
      + hms            1.1.3   2023-03-21 [1] CRAN (R 4.3.0)
      + htmltools      0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
      + htmlwidgets    1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
      + jsonlite       1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
      + knitr          1.43    2023-05-25 [1] CRAN (R 4.3.0)
      + labeling       0.4.2   2020-10-20 [1] CRAN (R 4.3.0)
      + lifecycle      1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
      + lubridate    * 1.9.2   2023-02-10 [1] CRAN (R 4.3.0)
      + magrittr       2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
      + munsell        0.5.0   2018-06-12 [1] CRAN (R 4.3.0)
      + nycflights13 * 1.0.2   2021-04-12 [1] CRAN (R 4.3.0)
      + pillar         1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
      + pkgconfig      2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
      + proto          1.0.0   2016-10-29 [1] CRAN (R 4.3.0)
      + purrr        * 1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
      + R6             2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
      + readr        * 2.1.4   2023-02-10 [1] CRAN (R 4.3.0)
      + rlang          1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
      + rmarkdown      2.24    2023-08-14 [1] CRAN (R 4.3.1)
      + rprojroot      2.0.3   2022-04-02 [1] CRAN (R 4.3.0)
      + rstudioapi     0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
      + scales         1.2.1   2022-08-20 [1] CRAN (R 4.3.0)
      + sessioninfo    1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
      + showtext       0.9-6   2023-05-03 [1] CRAN (R 4.3.0)
      + showtextdb     3.0     2020-06-04 [1] CRAN (R 4.3.0)
      + stringi        1.7.12  2023-01-11 [1] CRAN (R 4.3.0)
      + stringr      * 1.5.0   2022-12-02 [1] CRAN (R 4.3.0)
      + sysfonts       0.8.8   2022-03-13 [1] CRAN (R 4.3.0)
      + tibble       * 3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
      + tidyr        * 1.3.0   2023-01-24 [1] CRAN (R 4.3.0)
      + tidyselect     1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
      + tidyverse    * 2.0.0   2023-02-22 [1] CRAN (R 4.3.0)
      + timechange     0.2.0   2023-01-11 [1] CRAN (R 4.3.0)
      + tzdb           0.4.0   2023-05-12 [1] CRAN (R 4.3.0)
      + utf8           1.2.3   2023-01-31 [1] CRAN (R 4.3.0)
      + vctrs          0.6.3   2023-06-14 [1] CRAN (R 4.3.0)
      + vroom          1.6.3   2023-04-28 [1] CRAN (R 4.3.0)
      + withr          2.5.0   2022-03-03 [1] CRAN (R 4.3.0)
      + xfun           0.40    2023-08-09 [1] CRAN (R 4.3.0)
      + yaml           2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
      +
      + [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
      +
      +──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
      +
      +
      -
      diff --git a/posts/21-regular-expressions/index.html b/posts/21-regular-expressions/index.html index 320ca61..5d01d20 100644 --- a/posts/21-regular-expressions/index.html +++ b/posts/21-regular-expressions/index.html @@ -281,6 +281,7 @@

      Table of contents

    50. Final Questions
    51. Additional Resources
    52. +
    53. R session information
    54. @@ -294,6 +295,7 @@

      Table of contents

      +

      This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

      Pre-lecture materials

      @@ -675,7 +677,7 @@

      repetition

      [1] FALSE
      -
      # Does "Mississippi" contain the pattern of an "i" followed by 
      +
      # Does "Mississippi" contain the pattern of an "i" followed by
       # 2 of any character, with that pattern repeated three times adjacently?
       grepl("(i.{2}){3}", "Mississippi")
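
      To see which part of the string satisfies the pattern, the match can be extracted with base R's regexpr() and regmatches(). This is a small sketch; the names pattern and matched are illustrative.

```r
pattern <- "(i.{2}){3}"

## Is there a match at all?
grepl(pattern, "Mississippi")

## Extract the matching substring: "iss", "iss", "ipp" back to back
matched <- regmatches("Mississippi", regexpr(pattern, "Mississippi"))
matched
```

      The match starts at the first "i" and spans three consecutive three-character groups, each beginning with "i".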
      @@ -746,12 +748,12 @@

      character sets

      So, for example, to include a literal single or double quote in a string, you can use \ to “escape” the quote so that it is treated as part of the string rather than as its end:

      -
      double_quote <- "\"" 
      +
      double_quote <- "\""
       double_quote
      [1] "\""
      -
      single_quote <- '\''
      +
      single_quote <- "'"
       single_quote
      [1] "'"
      @@ -1109,7 +1111,7 @@

      grep()

      sub()

      The sub(pattern, replacement, x) function takes as arguments a regex, a “replacement,” and a vector of strings. This function will replace the first instance of that regex found in each string.

      -
      sub(pattern = "[Ii]", replacement = "1", x= c("Hawaii", "Illinois", "Kentucky"))
      +
      sub(pattern = "[Ii]", replacement = "1", x = c("Hawaii", "Illinois", "Kentucky"))
      [1] "Hawa1i"   "1llinois" "Kentucky"
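
      For comparison, base R's gsub() takes the same arguments but replaces every match rather than only the first. The sketch below runs both on the same input; the variable names are illustrative.

```r
states <- c("Hawaii", "Illinois", "Kentucky")

## sub() replaces only the first match in each string
first_only <- sub(pattern = "[Ii]", replacement = "1", x = states)
first_only

## gsub() replaces all matches in each string
all_matches <- gsub(pattern = "[Ii]", replacement = "1", x = states)
all_matches
```

      Strings without a match, such as "Kentucky", are returned unchanged by both functions.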
      @@ -1248,7 +1250,7 @@

      str_replace

      [45] "Vermont" "VirginiB" "WBshington" "West VirginiB" [49] "Wisconsin" "Wyoming"
      -
      sub(pattern = "[Aa]", replacement = "B", x= state.name)
      +
      sub(pattern = "[Aa]", replacement = "B", x = state.name)
       [1] "Blabama"        "Blaska"         "Brizona"        "Brkansas"      
        [5] "CBlifornia"     "ColorBdo"       "Connecticut"    "DelBware"      
      @@ -1413,9 +1415,58 @@ 

      Additional Resources<

      +
      + +
      +

      R session information

      +
      +
      options(width = 120)
      +sessioninfo::session_info()
      +
      +
      ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
      + setting  value
      + version  R version 4.3.1 (2023-06-16)
      + os       macOS Ventura 13.5
      + system   aarch64, darwin20
      + ui       X11
      + language (EN)
      + collate  en_US.UTF-8
      + ctype    en_US.UTF-8
      + tz       America/New_York
      + date     2023-08-17
      + pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
      +
      +─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
      + package     * version date (UTC) lib source
      + cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
      + colorout      1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
      + digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
      + evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.0)
      + fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
      + glue          1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
      + htmltools     0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
      + htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
      + jsonlite      1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
      + knitr       * 1.43    2023-05-25 [1] CRAN (R 4.3.0)
      + lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
      + magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
      + rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
      + rmarkdown     2.24    2023-08-14 [1] CRAN (R 4.3.1)
      + rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
      + sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
      + stringi       1.7.12  2023-01-11 [1] CRAN (R 4.3.0)
      + stringr     * 1.5.0   2022-12-02 [1] CRAN (R 4.3.0)
      + vctrs         0.6.3   2023-06-14 [1] CRAN (R 4.3.0)
      + xfun          0.40    2023-08-09 [1] CRAN (R 4.3.0)
      + yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
      +
      + [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
      +
      +──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
      +
      +
      -
      diff --git a/posts/22-working-with-factors/index.html b/posts/22-working-with-factors/index.html index a6a666a..3264a73 100644 --- a/posts/22-working-with-factors/index.html +++ b/posts/22-working-with-factors/index.html @@ -270,6 +270,7 @@

      Table of contents

    55. Final Questions
    56. Additional Resources
    57. +
    58. R session information
    59. @@ -283,6 +284,7 @@

      Table of contents

      +

      This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

      Pre-lecture materials

      @@ -382,8 +384,8 @@

      Factor basics

      To create a factor you must start by creating a list of the valid levels:

      month_levels <- c(
      -  "Jan", "Feb", "Mar", "Apr", "May", "Jun", 
      -  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
      +    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
      +    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
       )

      Now we can create a factor with the factor() function defining the levels argument:

      @@ -482,8 +484,8 @@

      C $class [1] "factor" -
      tibble(x1_original, x1_factor) %>% 
      -  mutate(x1_numeric = as.numeric(x1_factor))
      +
      tibble(x1_original, x1_factor) %>%
      +    mutate(x1_numeric = as.numeric(x1_factor))
      # A tibble: 8 × 3
         x1_original x1_factor x1_numeric
      @@ -539,7 +541,7 @@ 

      C

      This behavior of the factor() function feels unexpected at best.

      Another example of unexpected behavior is how the function will silently make a missing value because the values in the data and the levels do not match.

      -
      factor("a", levels="c")
      +
      factor("a", levels = "c")
      [1] <NA>
       Levels: c
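One way to guard against this, sketched here with hypothetical objects `values_demo` and `valid_levels`, is to compare the data against the levels before converting and to count the missing values afterwards:

```r
# Hypothetical data and levels, for illustration only
values_demo <- c("a", "c", "c")
valid_levels <- c("c")

# Values present in the data but absent from the levels
unmatched <- setdiff(unique(values_demo), valid_levels)
unmatched

# factor() converts those values to NA without a warning
f_demo <- factor(values_demo, levels = valid_levels)

# Count how many missing values were created
sum(is.na(f_demo))
```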
      @@ -567,10 +569,12 @@

      Factors when mo

      Consider a vector of character strings with three income levels:

      -
      income_level <- c(rep("low",10), 
      -                  rep("medium",10), 
      -                  rep("high",10))
      -income_level
      +
      income_level <- c(
      +    rep("low", 10),
      +    rep("medium", 10),
      +    rep("high", 10)
      +)
      +income_level
       [1] "low"    "low"    "low"    "low"    "low"    "low"    "low"    "low"   
        [9] "low"    "low"    "medium" "medium" "medium" "medium" "medium" "medium"
      @@ -598,7 +602,7 @@ 

      Factors when mo

      Coefficients:
      (Intercept)         xlow      xmedium
      -     0.4194      -0.2668      -0.6707
      +    -0.5621       0.5728       0.4219

      @@ -608,18 +612,20 @@

      Factors when mo

      Memory req for factors and character strings

      Consider a large character string such as income_level corresponding to a categorical variable.

      -
      income_level <- c(rep("low",10000), 
      -                  rep("medium",10000), 
      -                  rep("high",10000))
      +
      income_level <- c(
      +    rep("low", 10000),
      +    rep("medium", 10000),
      +    rep("high", 10000)
      +)

      In early versions of R, storing categorical data as a factor variable was considerably more efficient than storing the same data as strings, because factor variables only store the factor labels once.

      However, R now uses a global string pool, so each unique string is only stored once, which means storage is now less of an issue.

      -
      format(object.size(income_level), units="Kb") # size of the character string
      +
      format(object.size(income_level), units = "Kb") # size of the character string
      [1] "234.6 Kb"
      -
      format(object.size(factor(income_level)), units="Kb") # size of the factor
      +
      format(object.size(factor(income_level)), units = "Kb") # size of the factor
      [1] "117.8 Kb"
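The size difference comes from how a factor is stored: an integer code per element plus a single copy of the levels. A minimal sketch, using a hypothetical short vector `income_demo` rather than the large one above:

```r
# Hypothetical short vector, for illustration only
income_demo <- c("low", "medium", "high", "low")
income_factor <- factor(income_demo)

# A factor is stored as integer codes
typeof(income_factor) # "integer"

# The levels are stored once; by default they are sorted alphabetically
levels(income_factor) # "high" "low" "medium"

# Each element is an index into the levels
as.integer(income_factor) # 2 3 1 2
```

Note that the default alphabetical order puts "high" first, which is also why it becomes the reference level when the factor is used in a model.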
      @@ -678,8 +684,8 @@

      General Social Surve

      When factors are stored in a tibble, you cannot see their levels so easily. One way to view them is with count():

      -
      gss_cat %>% 
      -  count(race)
      +
      gss_cat %>%
      +    count(race)
      # A tibble: 3 × 2
         race      n
      @@ -691,8 +697,8 @@ 

      General Social Surve

      Or with a bar chart using the geom_bar() geom:

      -
      gss_cat %>% 
      -  ggplot(aes(x=race)) +
      +
      gss_cat %>%
      +    ggplot(aes(x = race)) +
           geom_bar()

      A bar chart showing the distribution of race. There are ~2000 records with race "Other", 3000 with race "Black", and over 15,000 with race "White".

      @@ -722,8 +728,8 @@

      Modifying factor or

      It’s often useful to change the order of the factor levels in a visualization.

      Let’s explore the relig (religion) factor:

      -
      gss_cat %>% 
      -  count(relig)
      +
      gss_cat %>%
      +    count(relig)
      # A tibble: 15 × 2
          relig                       n
      @@ -766,14 +772,16 @@ 

      Modifying factor or

      The first level is “No answer” followed by “Don’t know”, and so on.

      Imagine you want to explore the average number of hours spent watching TV (tvhours) per day across religions (relig):

      -
      relig_summary <- gss_cat %>% 
      -  group_by(relig) %>% 
      -  summarise(tvhours = mean(tvhours, na.rm = TRUE),
      -            n = n())
      -
      -relig_summary %>% 
      -  ggplot(aes(x = tvhours, y = relig)) + 
      -  geom_point()
      +
      relig_summary <- gss_cat %>%
      +    group_by(relig) %>%
      +    summarise(
      +        tvhours = mean(tvhours, na.rm = TRUE),
      +        n = n()
      +    )
      +
      +relig_summary %>%
      +    ggplot(aes(x = tvhours, y = relig)) +
      +    geom_point()

      A scatterplot with tvhours on the x-axis and religion on the y-axis. The y-axis is ordered seemingly arbitrarily, making it hard to get any sense of the overall pattern.

      @@ -789,10 +797,12 @@

      fct_reorder

    60. Optionally, .fun, a function that’s used if there are multiple values of x for each value of f. The default value is median.
    61. -
      relig_summary %>% 
      -  ggplot(aes(x = tvhours, 
      -             y = fct_reorder(.f = relig, .x = tvhours))) +
      -    geom_point()
      +
      relig_summary %>%
      +    ggplot(aes(
      +        x = tvhours,
      +        y = fct_reorder(.f = relig, .x = tvhours)
      +    )) +
      +    geom_point()

      The same scatterplot as above, but now the religion is displayed in increasing order of tvhours. "Other eastern" has the fewest tvhours under 2, and "Don't know" has the highest (over 5).
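To see what `.fun` changes, here is a small sketch with hypothetical toy data (`f_demo` and `x_demo` are not from `gss_cat`). Group "a" contains one large value, so its median and mean disagree about the ordering:

```r
library(forcats)

# Hypothetical toy data: group "a" has one large outlier
f_demo <- factor(c("a", "a", "a", "b", "b", "b"))
x_demo <- c(1, 2, 30, 5, 6, 7)

# Default .fun = median: a (median 2) comes before b (median 6)
by_median <- levels(fct_reorder(f_demo, x_demo))
by_median

# With .fun = mean: b (mean 6) comes before a (mean 11)
by_mean <- levels(fct_reorder(f_demo, x_demo, .fun = mean))
by_mean
```

In `relig_summary` there is only one value of `tvhours` per religion, so the choice of `.fun` does not matter there.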

      @@ -811,9 +821,9 @@

      fct_reorder

      You could rewrite the plot above as:

      -
      relig_summary %>% 
      -  mutate(relig = fct_reorder(relig, tvhours)) %>% 
      -  ggplot(aes(x = tvhours, y = relig)) +
      +
      relig_summary %>%
      +    mutate(relig = fct_reorder(relig, tvhours)) %>%
      +    ggplot(aes(x = tvhours, y = relig)) +
           geom_point()

      @@ -833,15 +843,17 @@

      fct_reorder

      What if we create a similar plot looking at how average age varies across reported income level?

      -
      rincome_summary <- 
      -  gss_cat %>% 
      -  group_by(rincome) %>% 
      -  summarise(age = mean(age, na.rm = TRUE),
      -            n = n())
      -
      -rincome_summary %>% 
      -  ggplot(aes(x = age, y = fct_reorder(.f = rincome, .x = age))) + 
      -    geom_point()
      +
      rincome_summary <-
      +    gss_cat %>%
      +    group_by(rincome) %>%
      +    summarise(
      +        age = mean(age, na.rm = TRUE),
      +        n = n()
      +    )
      +
      +rincome_summary %>%
      +    ggplot(aes(x = age, y = fct_reorder(.f = rincome, .x = age))) +
      +    geom_point()

      A scatterplot with age on the x-axis and income on the y-axis. Income has been reordered in order of average age which doesn't make much sense. One section of the y-axis goes from $6000-6999, then <$1000, then $8000-9999.

      @@ -882,7 +894,7 @@

      fct_reorder

    library(palmerpenguins)
    -penguins 
    +penguins
    # A tibble: 344 × 8
        species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
    @@ -911,8 +923,8 @@ 

    fct_relevel

    You can use fct_relevel().

    It takes a factor, f, and then any number of levels that you want to move to the front of the line.

    -
    rincome_summary %>% 
    -  ggplot(aes(age, fct_relevel(rincome, "Not applicable"))) +
    +
    rincome_summary %>%
    +    ggplot(aes(age, fct_relevel(rincome, "Not applicable"))) +
         geom_point()

    The same scatterplot but now "Not Applicable" is displayed at the bottom of the y-axis. Generally there is a positive association between income and age, and the income band with the highest average age is "Not applicable".
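A minimal sketch of how `fct_relevel()` moves levels, using a hypothetical factor `income_demo` with made-up level names:

```r
library(forcats)

# Hypothetical factor with an explicit level order
income_demo <- factor(
    c("Low", "Middle", "High", "Not applicable"),
    levels = c("Low", "Middle", "High", "Not applicable")
)

# Move one level to the front; the others keep their relative order
one_moved <- levels(fct_relevel(income_demo, "Not applicable"))
one_moved

# Move several levels to the front, in the order given
two_moved <- levels(fct_relevel(income_demo, "High", "Middle"))
two_moved
```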

    @@ -934,20 +946,20 @@

    fct_relevel

    Another type of reordering is useful when you are coloring the lines on a plot. fct_reorder2(f, x, y) reorders the factor f by the y values associated with the largest x values.

    This makes the plot easier to read because the colors of the line at the far right of the plot will line up with the legend.

    -
    by_age <- 
    -  gss_cat %>% 
    -  filter(!is.na(age)) %>% 
    -  count(age, marital) %>% 
    -  group_by(age) %>% 
    -  mutate(prop = n / sum(n))
    +
    by_age <-
    +    gss_cat %>%
    +    filter(!is.na(age)) %>%
    +    count(age, marital) %>%
    +    group_by(age) %>%
    +    mutate(prop = n / sum(n))
     
    -by_age %>% 
    -  ggplot(aes(age, prop, colour = marital)) +
    +by_age %>%
    +    ggplot(aes(age, prop, colour = marital)) +
         geom_line(na.rm = TRUE)
    -by_age %>% 
    -  ggplot(aes(age, prop, colour = fct_reorder2(marital, age, prop))) +
    +by_age %>%
    +    ggplot(aes(age, prop, colour = fct_reorder2(marital, age, prop))) +
         geom_line() +
    -  labs(colour = "marital")
    +    labs(colour = "marital")
    @@ -964,9 +976,9 @@

    fct_relevel

    fct_infreq

    Finally, for bar plots, you can use fct_infreq() to order levels in decreasing frequency: this is the simplest type of reordering because it doesn’t need any extra variables. Combine it with fct_rev() if you want them in increasing frequency so that in the bar plot largest values are on the right, not the left.

    -
    gss_cat %>% 
    -  mutate(marital = marital %>% fct_infreq() %>% fct_rev()) %>%  
    -  ggplot(aes(marital)) +
    +
    gss_cat %>%
    +    mutate(marital = marital %>% fct_infreq() %>% fct_rev()) %>%
    +    ggplot(aes(marital)) +
         geom_bar()

    A bar chart of marital status ordered from least to most common: no answer (~0), separated (~1,000), widowed (~2,000), divorced (~3,000), never married (~5,000), married (~10,000).
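The effect of combining the two functions is easiest to see on the levels themselves. A minimal sketch with a hypothetical factor `status_demo`:

```r
library(forcats)

# Hypothetical factor: "c" appears 3 times, "b" twice, "a" once
status_demo <- factor(c("a", "b", "b", "c", "c", "c"))

# fct_infreq(): most frequent level first
most_first <- levels(fct_infreq(status_demo))
most_first # "c" "b" "a"

# fct_rev() reverses that order: least frequent level first
least_first <- levels(fct_rev(fct_infreq(status_demo)))
least_first # "a" "b" "c"
```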

    @@ -981,8 +993,8 @@

    Modifying factor l

    fct_recode

    The most general and powerful tool is fct_recode(). It allows you to recode, or change, the value of each level. For example, take the gss_cat$partyid:

    -
    gss_cat %>% 
    -  count(partyid)
    +
    gss_cat %>%
    +    count(partyid)
    # A tibble: 10 × 2
        partyid                n
    @@ -1007,15 +1019,16 @@ 

    fct_recode

  • the old values go on the right
  • -
    gss_cat %>% 
    -  mutate(partyid = fct_recode(partyid,
    -      "Republican, strong"    = "Strong republican",
    -      "Republican, weak"      = "Not str republican",
    -      "Independent, near rep" = "Ind,near rep",
    -      "Independent, near dem" = "Ind,near dem",
    -      "Democrat, weak"        = "Not str democrat",
    -      "Democrat, strong"      = "Strong democrat")) %>% 
    -  count(partyid)
    +
    gss_cat %>%
    +    mutate(partyid = fct_recode(partyid,
    +        "Republican, strong"    = "Strong republican",
    +        "Republican, weak"      = "Not str republican",
    +        "Independent, near rep" = "Ind,near rep",
    +        "Independent, near dem" = "Ind,near dem",
    +        "Democrat, weak"        = "Not str democrat",
    +        "Democrat, strong"      = "Strong democrat"
    +    )) %>%
    +    count(partyid)
    # A tibble: 10 × 2
        partyid                   n
    @@ -1047,18 +1060,19 @@ 

    fct_recode

    To combine groups, you can assign multiple old levels to the same new level:

    -
    gss_cat %>% 
    -  mutate(partyid = fct_recode(partyid,
    -      "Republican, strong"    = "Strong republican",
    -      "Republican, weak"      = "Not str republican",
    -      "Independent, near rep" = "Ind,near rep",
    -      "Independent, near dem" = "Ind,near dem",
    -      "Democrat, weak"        = "Not str democrat",
    -      "Democrat, strong"      = "Strong democrat",
    -      "Other"                 = "No answer",
    -      "Other"                 = "Don't know",
    -      "Other"                 = "Other party")) %>% 
    -  count(partyid)
    +
    gss_cat %>%
    +    mutate(partyid = fct_recode(partyid,
    +        "Republican, strong"    = "Strong republican",
    +        "Republican, weak"      = "Not str republican",
    +        "Independent, near rep" = "Ind,near rep",
    +        "Independent, near dem" = "Ind,near dem",
    +        "Democrat, weak"        = "Not str democrat",
    +        "Democrat, strong"      = "Strong democrat",
    +        "Other"                 = "No answer",
    +        "Other"                 = "Don't know",
    +        "Other"                 = "Other party"
    +    )) %>%
    +    count(partyid)
    # A tibble: 8 × 2
       partyid                   n
    @@ -1080,13 +1094,14 @@ 

    fct_collapse

    If you want to collapse a lot of levels, fct_collapse() is a useful variant of fct_recode().

    For each new level, you can provide a vector of old levels:

    -
    gss_cat %>% 
    -  mutate(partyid = fct_collapse(partyid,
    -      "other" = c("No answer", "Don't know", "Other party"),
    -      "rep" = c("Strong republican", "Not str republican"),
    -      "ind" = c("Ind,near rep", "Independent", "Ind,near dem"),
    -      "dem" = c("Not str democrat", "Strong democrat"))) %>% 
    -  count(partyid)
    +
    gss_cat %>%
    +    mutate(partyid = fct_collapse(partyid,
    +        "other" = c("No answer", "Don't know", "Other party"),
    +        "rep" = c("Strong republican", "Not str republican"),
    +        "ind" = c("Ind,near rep", "Independent", "Ind,near dem"),
    +        "dem" = c("Not str democrat", "Strong democrat")
    +    )) %>%
    +    count(partyid)
    # A tibble: 4 × 2
       partyid     n
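The two functions express the same mapping in different shapes. A small sketch, using a hypothetical factor `party_demo` that contains only four of the `partyid` levels:

```r
library(forcats)

# Hypothetical factor with a subset of the partyid levels
party_demo <- factor(c(
    "No answer", "Strong republican", "Not str republican", "Other party"
))

# fct_recode(): one "new = old" pair per old level
recoded <- fct_recode(party_demo,
    "other" = "No answer",
    "other" = "Other party",
    "rep"   = "Strong republican",
    "rep"   = "Not str republican"
)

# fct_collapse(): one vector of old levels per new level
collapsed <- fct_collapse(party_demo,
    other = c("No answer", "Other party"),
    rep = c("Strong republican", "Not str republican")
)

# Both approaches assign the same new value to every element
as.character(recoded)
as.character(collapsed)
```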
    @@ -1104,9 +1119,9 @@ 

    fct_lump_*

    That’s the job of the fct_lump_*() family of functions.

    fct_lump_lowfreq() is a simple starting point that progressively lumps the smallest categories into “Other”, always keeping “Other” as the smallest category.

    -
    gss_cat %>% 
    -  mutate(relig = fct_lump_lowfreq(relig)) %>% 
    -  count(relig)
    +
    gss_cat %>%
    +    mutate(relig = fct_lump_lowfreq(relig)) %>%
    +    count(relig)
    # A tibble: 2 × 2
       relig          n
    @@ -1118,10 +1133,10 @@ 

    fct_lump_*

    In this case it’s not very helpful: it is true that the majority of Americans in this survey are Protestant, but we’d probably like to see some more details!

    Instead, we can use fct_lump_n() to specify that we want exactly 10 groups:

    -
    gss_cat %>% 
    -  mutate(relig = fct_lump_n(relig, n = 10)) %>% 
    -  count(relig, sort = TRUE) %>% 
    -  print(n = Inf)
    +
    gss_cat %>%
    +    mutate(relig = fct_lump_n(relig, n = 10)) %>%
    +    count(relig, sort = TRUE) %>%
    +    print(n = Inf)
    # A tibble: 10 × 2
        relig                       n
    @@ -1206,9 +1221,85 @@ 

    Additional Resources<

    + + +
    +

    R session information

    +
    +
    options(width = 120)
    +sessioninfo::session_info()
    +
    +
    ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
    + setting  value
    + version  R version 4.3.1 (2023-06-16)
    + os       macOS Ventura 13.5
    + system   aarch64, darwin20
    + ui       X11
    + language (EN)
    + collate  en_US.UTF-8
    + ctype    en_US.UTF-8
    + tz       America/New_York
    + date     2023-08-17
    + pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
    +
    +─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
    + package        * version date (UTC) lib source
    + cli              3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
    + colorout         1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
    + colorspace       2.1-0   2023-01-23 [1] CRAN (R 4.3.0)
    + digest           0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
    + dplyr          * 1.1.2   2023-04-20 [1] CRAN (R 4.3.0)
    + evaluate         0.21    2023-05-05 [1] CRAN (R 4.3.0)
    + fansi            1.0.4   2023-01-22 [1] CRAN (R 4.3.0)
    + farver           2.1.1   2022-07-06 [1] CRAN (R 4.3.0)
    + fastmap          1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
    + forcats        * 1.0.0   2023-01-29 [1] CRAN (R 4.3.0)
    + generics         0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
    + ggplot2        * 3.4.3   2023-08-14 [1] CRAN (R 4.3.1)
    + glue             1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
    + gtable           0.3.3   2023-03-21 [1] CRAN (R 4.3.0)
    + hms              1.1.3   2023-03-21 [1] CRAN (R 4.3.0)
    + htmltools        0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
    + htmlwidgets      1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
    + jsonlite         1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
    + knitr            1.43    2023-05-25 [1] CRAN (R 4.3.0)
    + labeling         0.4.2   2020-10-20 [1] CRAN (R 4.3.0)
    + lifecycle        1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
    + lubridate      * 1.9.2   2023-02-10 [1] CRAN (R 4.3.0)
    + magrittr         2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
    + munsell          0.5.0   2018-06-12 [1] CRAN (R 4.3.0)
    + palmerpenguins * 0.1.1   2022-08-15 [1] CRAN (R 4.3.0)
    + pillar           1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
    + pkgconfig        2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
    + purrr          * 1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
    + R6               2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
    + readr          * 2.1.4   2023-02-10 [1] CRAN (R 4.3.0)
    + rlang            1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
    + rmarkdown        2.24    2023-08-14 [1] CRAN (R 4.3.1)
    + rstudioapi       0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
    + scales           1.2.1   2022-08-20 [1] CRAN (R 4.3.0)
    + sessioninfo      1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
    + stringi          1.7.12  2023-01-11 [1] CRAN (R 4.3.0)
    + stringr        * 1.5.0   2022-12-02 [1] CRAN (R 4.3.0)
    + tibble         * 3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
    + tidyr          * 1.3.0   2023-01-24 [1] CRAN (R 4.3.0)
    + tidyselect       1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
    + tidyverse      * 2.0.0   2023-02-22 [1] CRAN (R 4.3.0)
    + timechange       0.2.0   2023-01-11 [1] CRAN (R 4.3.0)
    + tzdb             0.4.0   2023-05-12 [1] CRAN (R 4.3.0)
    + utf8             1.2.3   2023-01-31 [1] CRAN (R 4.3.0)
    + vctrs            0.6.3   2023-06-14 [1] CRAN (R 4.3.0)
    + withr            2.5.0   2022-03-03 [1] CRAN (R 4.3.0)
    + xfun             0.40    2023-08-09 [1] CRAN (R 4.3.0)
    + yaml             2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
    +
    + [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
    +
    +──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    +
    +
    -
    diff --git a/posts/23-working-with-text-sentiment-analysis/index.html b/posts/23-working-with-text-sentiment-analysis/index.html index bc7a42f..936fa10 100644 --- a/posts/23-working-with-text-sentiment-analysis/index.html +++ b/posts/23-working-with-text-sentiment-analysis/index.html @@ -263,6 +263,7 @@

    Table of contents

  • Document-term matrix
  • Creating DocumentTermMatrix objects
  • +
  • R session information
  • @@ -276,6 +277,7 @@

    Table of contents

    +

    This lecture, as the rest of the course, is adapted from the version Stephanie C. Hicks designed and maintained in 2021 and 2022. Check the recent changes to this file through the GitHub history.

    Pre-lecture materials

    @@ -413,16 +415,18 @@

    How does it work?

    To make this easier, I typed this text into a vector of character strings: one string per sentence.

    -
    peng_preface <- 
    -  c("I started using R in 1998 when I was a college undergraduate working on my senior thesis.", 
    -    "The version was 0.63.",  
    -    "I was an applied mathematics major with a statistics concentration and I was working with Dr. Nicolas Hengartner on an analysis of word frequencies in classic texts (Shakespeare, Milton, etc.).", 
    -    "The idea was to see if we could identify the authorship of each of the texts based on how frequently they used certain words.", 
    -    "We downloaded the data from Project Gutenberg and used some basic linear discriminant analysis for the modeling.",
    -    "The work was eventually published and was my first ever peer-reviewed publication.", 
    -    "I guess you could argue it was my first real 'data science' experience.")
    -
    -peng_preface
    +
    peng_preface <-
    +    c(
    +        "I started using R in 1998 when I was a college undergraduate working on my senior thesis.",
    +        "The version was 0.63.",
    +        "I was an applied mathematics major with a statistics concentration and I was working with Dr. Nicolas Hengartner on an analysis of word frequencies in classic texts (Shakespeare, Milton, etc.).",
    +        "The idea was to see if we could identify the authorship of each of the texts based on how frequently they used certain words.",
    +        "We downloaded the data from Project Gutenberg and used some basic linear discriminant analysis for the modeling.",
    +        "The work was eventually published and was my first ever peer-reviewed publication.",
    +        "I guess you could argue it was my first real 'data science' experience."
    +    )
    +
    +peng_preface
    [1] "I started using R in 1998 when I was a college undergraduate working on my senior thesis."                                                                                                        
     [2] "The version was 0.63."                                                                                                                                                                            
    @@ -444,9 +448,11 @@ 

    How does it work?

    Then, we use the tibble() function to construct a data frame with two columns: one counting the line number and one from the character strings in peng_preface.

    -
    peng_preface_df <- tibble(line=1:7, 
    -                          text=peng_preface)
    -peng_preface_df
    +
    peng_preface_df <- tibble(
    +    line = 1:7,
    +    text = peng_preface
    +)
    +peng_preface_df
    # A tibble: 7 × 2
        line text                                                                    
    @@ -465,14 +471,16 @@ 

    How does it work?

    Text Mining and Tokens

    Next, we use the unnest_tokens() function, naming the output column to be created word and taking the input column text from peng_preface_df.

    -
    peng_token <- 
    -  peng_preface_df %>% 
    -  unnest_tokens(output = word, 
    -                input = text, 
    -                token = "words")
    -
    -peng_token %>% 
    -  head()
    +
    peng_token <-
    +    peng_preface_df %>%
    +    unnest_tokens(
    +        output = word,
    +        input = text,
    +        token = "words"
    +    )
    +
    +peng_token %>%
    +    head()
    # A tibble: 6 × 2
        line word   
    @@ -484,8 +492,8 @@ 

    Text Mining and Tok 5 1 in 6 1 1998

    -
    peng_token %>% 
    -  tail()
    +
    peng_token %>%
    +    tail()
    # A tibble: 6 × 2
        line word      
    @@ -512,11 +520,12 @@ 

    Text Mining and Tok

    We could tokenize by "characters":

    -
    peng_preface_df %>% 
    -  unnest_tokens(word, 
    -                text, 
    -                token = "characters") %>% 
    -  head()
    +
    peng_preface_df %>%
    +    unnest_tokens(word,
    +        text,
    +        token = "characters"
    +    ) %>%
    +    head()
    # A tibble: 6 × 2
        line word 
    @@ -533,12 +542,13 @@ 

    Text Mining and Tok

    or something called ngrams, which is defined by Wikipedia as a “contiguous sequence of n items from a given sample of text or speech”

    -
    peng_preface_df %>% 
    -  unnest_tokens(word,
    -                text, 
    -                token = "ngrams", 
    -                n=3) %>% 
    -  head()
    +
    peng_preface_df %>%
    +    unnest_tokens(word,
    +        text,
    +        token = "ngrams",
    +        n = 3
    +    ) %>%
    +    head()
    # A tibble: 6 × 2
        line word           
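To make the definition concrete, here is a base R sketch of word n-grams. The helper `word_ngrams()` is hypothetical and is not how tidytext implements `token = "ngrams"`; in particular, it only lowercases and splits on whitespace, and does not strip punctuation:

```r
# Hypothetical helper, not part of tidytext: word n-grams in base R
word_ngrams <- function(text, n = 3) {
    # Lowercase and split on whitespace
    words <- strsplit(tolower(text), "\\s+")[[1]]
    # Texts shorter than n words have no n-grams
    if (length(words) < n) {
        return(character(0))
    }
    # Each n-gram starts at position i and spans n consecutive words
    starts <- seq_len(length(words) - n + 1)
    vapply(
        starts,
        function(i) paste(words[i:(i + n - 1)], collapse = " "),
        character(1)
    )
}

# A 4-word text yields two overlapping 3-grams
word_ngrams("The version was new", n = 3)
```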
    @@ -553,12 +563,13 @@ 

    Text Mining and Tok

    Another option is to use the character_shingles option, which is similar to tokenizing by ngrams, except the units are characters instead of words.

    -
    peng_preface_df %>% 
    -  unnest_tokens(word, 
    -                text, 
    -                token = "character_shingles",
    -                n = 4) %>% 
    -  head()
    +
    peng_preface_df %>%
    +    unnest_tokens(word,
    +        text,
    +        token = "character_shingles",
    +        n = 4
    +    ) %>%
    +    head()
    # A tibble: 6 × 2
        line word 
    @@ -573,12 +584,13 @@ 

    Text Mining and Tok

    You can also create custom functions for tokenization.

    -
    peng_preface_df %>% 
    -  unnest_tokens(word, 
    -                text, 
    -                token = stringr::str_split,
    -                pattern = " ") %>% 
    -  head()
    +
    peng_preface_df %>%
    +    unnest_tokens(word,
    +        text,
    +        token = stringr::str_split,
    +        pattern = " "
    +    ) %>%
    +    head()
    # A tibble: 6 × 2
        line word   
    @@ -603,15 +615,19 @@ 

    Text Mining and Tok

    Let’s tokenize the first four sentences of Amanda Gorman’s The Hill We Climb by words.

    -
    gorman_hill_we_climb <- 
    -  c("When day comes we ask ourselves, where can we find light in this neverending shade?",
    -    "The loss we carry, a sea we must wade.", 
    -    "We’ve braved the belly of the beast, we’ve learned that quiet isn’t always peace and the norms and notions of what just is, isn’t always justice.",
    -    "And yet the dawn is ours before we knew it, somehow we do it, somehow we’ve weathered and witnessed a nation that isn’t broken but simply unfinished.")
    -
    -hill_df <- tibble(line=seq_along(gorman_hill_we_climb), 
    -                  text=gorman_hill_we_climb)
    -hill_df 
    +
    gorman_hill_we_climb <-
    +    c(
    +        "When day comes we ask ourselves, where can we find light in this neverending shade?",
    +        "The loss we carry, a sea we must wade.",
    +        "We’ve braved the belly of the beast, we’ve learned that quiet isn’t always peace and the norms and notions of what just is, isn’t always justice.",
    +        "And yet the dawn is ours before we knew it, somehow we do it, somehow we’ve weathered and witnessed a nation that isn’t broken but simply unfinished."
    +    )
    +
    +hill_df <- tibble(
    +    line = seq_along(gorman_hill_we_climb),
    +    text = gorman_hill_we_climb
    +)
    +hill_df
    # A tibble: 4 × 2
        line text                                                                    
    @@ -623,10 +639,12 @@ 

    Text Mining and Tok

    ### try it out
     
    -hill_df %>% 
    -  unnest_tokens(output = wordsforfun, 
    -                input = text, 
    -                token = "words")
    +hill_df %>%
    +    unnest_tokens(
    +        output = wordsforfun,
    +        input = text,
    +        token = "words"
    +    )

    # A tibble: 77 × 2
         line wordsforfun
    @@ -698,12 +716,14 @@ 

    Example: text from works of Jane Austen

  • Convert it into a one-token-per-row data frame using the unnest_tokens() function
  • -
    pp_book_df <- tibble(text = prideprejudice) 
    -  
    -pp_book_df %>% 
    -  unnest_tokens(output = word, 
    -                input = text, 
    -                token="words")
    +
    pp_book_df <- tibble(text = prideprejudice)
    +
    +pp_book_df %>%
    +    unnest_tokens(
    +        output = word,
    +        input = text,
    +        token = "words"
    +    )
    # A tibble: 122,204 × 1
        word     
    @@ -723,11 +743,13 @@ 

    Example: text from works of Jane Austen

    We can also divide it by paragraphs:

    -
    tmp <- pp_book_df %>% 
    -  unnest_tokens(output = paragraph, 
    -                input = text, 
    -                token ="paragraphs") 
    -tmp
    +
    tmp <- pp_book_df %>%
    +    unnest_tokens(
    +        output = paragraph,
    +        input = text,
    +        token = "paragraphs"
    +    )
    +tmp
    # A tibble: 10,721 × 1
        paragraph                                                                    
    @@ -747,7 +769,7 @@ 

    Example: text from works of Jane Austen

    We can extract a particular element from the tibble

    -
    tmp[3,1]
    +
    tmp[3, 1]
    # A tibble: 1 × 1
       paragraph
    @@ -771,9 +793,11 @@ 

    Example: text from works of Jane Austen

    We could also divide it by sentence:

    pp_book_df %>%
    -    unnest_tokens(output = sentence,
    -                  input = text, 
    -                  token = "sentences") 
    +    unnest_tokens(
    +        output = sentence,
    +        input = text,
    +        token = "sentences"
    +    )
    # A tibble: 15,545 × 1
        sentence                                                                  
    @@ -811,14 +835,16 @@ 

    Example: text from works of Jane Austen

    This lets us keep track of which paragraph is which.

    -
    paragraphs <- 
    -  pp_book_df %>%
    -    unnest_tokens(output = paragraph, 
    -                  input = text, 
    -                  token = "paragraphs") %>%
    -    mutate(paragraph_number = row_number()) 
    -
    -paragraphs
    +
    paragraphs <-
    +    pp_book_df %>%
    +    unnest_tokens(
    +        output = paragraph,
    +        input = text,
    +        token = "paragraphs"
    +    ) %>%
    +    mutate(paragraph_number = row_number())
    +
    +paragraphs
    # A tibble: 10,721 × 2
        paragraph                                                    paragraph_number
    @@ -852,8 +878,10 @@ 

    Example: text from works of Jane Austen

    After tokenizing by paragraph, we can then tokenize by word:

    paragraphs %>%
    -    unnest_tokens(output = word, 
    -                  input = paragraph)
    +    unnest_tokens(
    +        output = word,
    +        input = paragraph
    +    )
    # A tibble: 122,204 × 2
        paragraph_number word     
    @@ -882,8 +910,8 @@ 

    Example: text from works of Jane Austen

    onix SMART snowball 404 571 174
    -
    stop_words %>% 
    -  head(n=10)
    +
    stop_words %>%
    +    head(n = 10)
    # A tibble: 10 × 2
        word        lexicon
    @@ -902,15 +930,17 @@ 

    Example: text from works of Jane Austen

    We can remove stop words (kept in the tidytext dataset stop_words) with anti_join(x, y), which returns all rows from x without a match in y.

    -
    words_by_paragraph <- 
    -  paragraphs %>%
    -    unnest_tokens(output = word, 
    -                  input = paragraph) %>%
    -    anti_join(stop_words)
    +
    words_by_paragraph <-
    +    paragraphs %>%
    +    unnest_tokens(
    +        output = word,
    +        input = paragraph
    +    ) %>%
    +    anti_join(stop_words)
    Joining with `by = join_by(word)`
    -
    words_by_paragraph 
    +
    words_by_paragraph
    # A tibble: 37,246 × 2
        paragraph_number word        
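A minimal sketch of the same idea on toy data, where `words_demo` and `stop_demo` are hypothetical stand-ins for the tokenized book and the `stop_words` dataset:

```r
library(dplyr)

# Hypothetical stand-ins for the tokenized text and the stop word list
words_demo <- tibble(
    word = c("it", "is", "a", "truth", "universally", "acknowledged")
)
stop_demo <- tibble(word = c("it", "is", "a"))

# Keep only the rows of words_demo with no matching word in stop_demo
kept <- anti_join(words_demo, stop_demo, by = "word")
kept$word
```

Passing `by = "word"` explicitly also avoids the "Joining with" message printed above.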
    @@ -932,8 +962,8 @@ 

    Example: text from works of Jane Austen

    For example, here we use dplyr’s count() function to find the most common words in the book

    words_by_paragraph %>%
    -  count(word, sort = TRUE) %>% 
    -  head()
    +    count(word, sort = TRUE) %>%
    +    head()
    # A tibble: 6 × 2
       word          n
    @@ -949,10 +979,10 @@ 

    Example: text from works of Jane Austen

    Then use ggplot2 to plot the most commonly used words from the book.

    words_by_paragraph %>%
    -  count(word, sort = TRUE) %>%
    -  filter(n > 150) %>%
    -  mutate(word = fct_reorder(word, n)) %>%
    -  ggplot(aes(word, n)) +
    +    count(word, sort = TRUE) %>%
    +    filter(n > 150) %>%
    +    mutate(word = fct_reorder(word, n)) %>%
    +    ggplot(aes(word, n)) +
         geom_col() +
         xlab(NULL) +
         coord_flip()
    @@ -962,8 +992,8 @@

    Example: text from works of Jane Austen

    We can also do this for all of her books using the austen_books() object

    -
    austen_books() %>% 
    -  head()
    +
    austen_books() %>%
    +    head()
    # A tibble: 6 × 2
       text                    book               
    @@ -978,19 +1008,23 @@ 

    Example: text from works of Jane Austen

    We can do some data wrangling that keeps track of the line number and chapter (using a regex) to find where all the chapters are.

    -
    original_books <- 
    -  austen_books() %>%
    -  group_by(book) %>%
    -  mutate(linenumber = row_number(),
    -         chapter = cumsum(
    -                    str_detect(text, 
    -                               pattern = regex(pattern = "^chapter [\\divxlc]",
    -                                               ignore_case = TRUE))
    -                              )
    -                          ) %>%
    -  ungroup()
    -
    -original_books
    +
    original_books <-
    +    austen_books() %>%
    +    group_by(book) %>%
    +    mutate(
    +        linenumber = row_number(),
    +        chapter = cumsum(
    +            str_detect(text,
    +                pattern = regex(
    +                    pattern = "^chapter [\\divxlc]",
    +                    ignore_case = TRUE
    +                )
    +            )
    +        )
    +    ) %>%
    +    ungroup()
    +
    +original_books
    # A tibble: 73,422 × 4
        text                    book                linenumber chapter
    @@ -1011,8 +1045,8 @@ 

    Example: text from works of Jane Austen

    Finally, we can restructure it to a one-token-per-row format using the unnest_tokens() function and remove stop words using the anti_join() function in dplyr.

    tidy_books <- original_books %>%
    -  unnest_tokens(word, text) %>% 
    -  anti_join(stop_words)
    +    unnest_tokens(word, text) %>%
    +    anti_join(stop_words)
    Joining with `by = join_by(word)`
    @@ -1037,10 +1071,10 @@

    Example: text from works of Jane Austen

    Here are the most commonly used words across all of Jane Austen’s books.

    tidy_books %>%
    -  count(word, sort = TRUE) %>%
    -  filter(n > 600) %>%
    -  mutate(word = fct_reorder(word, n)) %>%
    -  ggplot(aes(word, n)) +
    +    count(word, sort = TRUE) %>%
    +    filter(n > 600) %>%
    +    mutate(word = fct_reorder(word, n)) %>%
    +    ggplot(aes(word, n)) +
         geom_col() +
         xlab(NULL) +
         coord_flip()
    @@ -1167,13 +1201,13 @@

    -
    nrc_joy <- get_sentiments("nrc") %>% 
    -  filter(sentiment == "joy")
    +
    nrc_joy <- get_sentiments("nrc") %>%
    +    filter(sentiment == "joy")
     
     tidy_books %>%
    -  filter(book == "Emma") %>%
    -  inner_join(nrc_joy) %>%
    -  count(word, sort = TRUE)
    +    filter(book == "Emma") %>%
    +    inner_join(nrc_joy) %>%
    +    count(word, sort = TRUE)
    Joining with `by = join_by(word)`
    @@ -1200,7 +1234,7 @@

    tidy_books %>%
    -  inner_join(get_sentiments("bing"))
    +    inner_join(get_sentiments("bing"))

    Joining with `by = join_by(word)`
    @@ -1232,10 +1266,11 @@

    tidy_books %>%
    -  inner_join(get_sentiments("bing")) %>%
    -  count(book, 
    -        index = linenumber %/% 80, 
    -        sentiment) 
    +    inner_join(get_sentiments("bing")) %>%
    +    count(book,
    +        index = linenumber %/% 80,
    +        sentiment
    +    )

    Joining with `by = join_by(word)`
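The `index = linenumber %/% 80` step above uses integer division to group lines into consecutive 80-line sections. A small base R sketch, with hypothetical line numbers chosen to sit on the section boundaries:

```r
# Hypothetical line numbers around the 80-line boundaries
linenumber <- c(1, 79, 80, 159, 160)

# Integer division drops the remainder, so lines 0-79 map to section 0,
# lines 80-159 to section 1, and so on
index <- linenumber %/% 80
index
```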
    @@ -1278,16 +1313,19 @@

    -
    jane_austen_sentiment <- 
    -  tidy_books %>%
    -  inner_join(get_sentiments("bing")) %>%
    -  count(book, 
    -        index = linenumber %/% 80, 
    -        sentiment) %>%
    -  pivot_wider(names_from = sentiment, 
    -              values_from = n, 
    -              values_fill = 0) %>%
    -  mutate(sentiment = positive - negative)
    +
    jane_austen_sentiment <-
    +    tidy_books %>%
    +    inner_join(get_sentiments("bing")) %>%
    +    count(book,
    +        index = linenumber %/% 80,
    +        sentiment
    +    ) %>%
    +    pivot_wider(
    +        names_from = sentiment,
    +        values_from = n,
    +        values_fill = 0
    +    ) %>%
    +    mutate(sentiment = positive - negative)
    Joining with `by = join_by(word)`
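The `pivot_wider()` and `mutate()` steps above can be traced on a toy table. This is only a sketch: the `counts` tibble is hypothetical and stands in for the per-section sentiment counts.

```r
library(dplyr)
library(tidyr)

# Hypothetical counts: section 1 has no "negative" row at all
counts <- tibble(
    index = c(0, 0, 1),
    sentiment = c("negative", "positive", "positive"),
    n = c(3, 5, 2)
)

wide <- counts %>%
    # One column per sentiment; missing combinations become 0 instead of NA
    pivot_wider(
        names_from = sentiment,
        values_from = n,
        values_fill = 0
    ) %>%
    # Net sentiment per section
    mutate(sentiment = positive - negative)

wide
```

Without `values_fill = 0`, the missing `negative` count for section 1 would be `NA`, and so would the net sentiment for that section.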
    @@ -1318,8 +1356,8 @@

    -
    jane_austen_sentiment %>% 
    -  ggplot(aes(x = index, y = sentiment, fill = book)) +
    +
    jane_austen_sentiment %>%
    +    ggplot(aes(x = index, y = sentiment, fill = book)) +
         geom_col(show.legend = FALSE) +
         facet_wrap(. ~ book, ncol = 2, scales = "free_x")
    @@ -1337,12 +1375,16 @@

    Word clouds

    Loading required package: RColorBrewer
    tidy_books %>%
    -  anti_join(stop_words) %>%
    -  count(word) %>%
    -  with(wordcloud(word, n, max.words = 100))
    +    anti_join(stop_words) %>%
    +    count(word) %>%
    +    with(wordcloud(word, n, max.words = 100))
    Joining with `by = join_by(word)`
    +
    +
    Warning in wordcloud(word, n, max.words = 100): miss could not be fit on page.
    +It will not be plotted.
    +

    @@ -1392,15 +1434,15 @@

    Creati

    Perhaps the most widely used implementation of DTMs in R is the DocumentTermMatrix class in the tm package. Many available text mining datasets are provided in this format.

    Let’s create a sparse matrix with the cast_sparse() function and then a DTM with the cast_dtm() function:

    -
    tidy_austen <- 
    -  austen_books() %>%
    -    mutate(line = row_number()) %>%
    -    unnest_tokens(word, text) %>%
    -    anti_join(stop_words)
    +
    tidy_austen <-
    +    austen_books() %>%
    +    mutate(line = row_number()) %>%
    +    unnest_tokens(word, text) %>%
    +    anti_join(stop_words)
    Joining with `by = join_by(word)`
    -
    tidy_austen
    +
    tidy_austen
    # A tibble: 217,609 × 3
        book                 line word       
    @@ -1420,11 +1462,11 @@ 

    Creati

    First, we’ll make a sparse matrix with cast_sparse(data, row, column, value):
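As a minimal sketch of what `cast_sparse()` produces, here is the same call on a tiny table. The `toy_counts` tibble is hypothetical; it plays the role of the output of `count(line, word)`.

```r
library(dplyr)
library(tidytext)

# Hypothetical counts: one row per (line, word) pair
toy_counts <- tibble(
    line = c(1, 1, 2),
    word = c("sense", "family", "family"),
    n = c(1, 2, 1)
)

# Rows are lines, columns are words, cells hold the counts;
# (line, word) pairs that never occur are stored implicitly as 0
toy_sparse <- cast_sparse(toy_counts, row = line, column = word, value = n)

dim(toy_sparse)
toy_sparse["1", "family"]
toy_sparse["2", "sense"]
```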

    -
    austen_sparse <- tidy_austen %>%
    -    count(line, word) %>%
    -    cast_sparse(row = line, column = word, value = n)
    -
    -austen_sparse[1:10, 1:10]
    +
    austen_sparse <- tidy_austen %>%
    +    count(line, word) %>%
    +    cast_sparse(row = line, column = word, value = n)
    +
    +austen_sparse[1:10, 1:10]
    10 x 10 sparse Matrix of class "dgCMatrix"
    @@ -1447,11 +1489,11 @@

    Creati

    Next, we’ll make a dtm object with cast_dtm(data, document, term, value):

    -
    austen_dtm <- tidy_austen %>%
    -    count(line, word) %>%
    -    cast_dtm(document = line, term = word, value = n)
    -
    -austen_dtm
    +
    austen_dtm <- tidy_austen %>%
    +    count(line, word) %>%
    +    cast_dtm(document = line, term = word, value = n)
    +
    +austen_dtm
    <<DocumentTermMatrix (documents: 61010, terms: 13914)>>
     Non-/sparse entries: 216128/848677012
    @@ -1461,15 +1503,15 @@ 

    Creati

    -
    class(austen_dtm)  
    +
    class(austen_dtm)
    [1] "DocumentTermMatrix"    "simple_triplet_matrix"
    -
    dim(austen_dtm)
    +
    dim(austen_dtm)
    [1] 61010 13914
    -
    as.matrix(austen_dtm[1:20, 1:10])
    +
    as.matrix(austen_dtm[1:20, 1:10])
        Terms
     Docs sense sensibility austen jane 1811 1 chapter dashwood estate family
    @@ -1501,10 +1543,101 @@ 

    Creati

    Latent Dirichlet allocation (LDA) is a particularly popular method for fitting a topic model. It treats each document as a mixture of topics, and each topic as a mixture of words. This allows documents to “overlap” each other in terms of content, rather than being separated into discrete groups, in a way that mirrors typical use of natural language.
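A minimal sketch of fitting such a model, assuming the topicmodels package is installed (it is not loaded elsewhere in this lecture). The `toy_counts` tibble and the choice of `k = 2` topics are hypothetical and only illustrate the mechanics; a real analysis would fit the model to a full DTM.

```r
library(dplyr)
library(tidytext)
library(topicmodels) # assumption: an extra package, not used elsewhere here

# Hypothetical word counts for three tiny "documents"
toy_counts <- tibble(
    document = c("a", "a", "b", "b", "c", "c"),
    word = c("ball", "dance", "estate", "income", "ball", "income"),
    n = c(4, 3, 5, 2, 1, 1)
)

toy_dtm <- cast_dtm(toy_counts, document = document, term = word, value = n)

# Fit a two-topic model; the seed makes the fit reproducible
toy_lda <- LDA(toy_dtm, k = 2, control = list(seed = 1234))

# Per-topic word probabilities (each topic is a mixture of words)
topic_words <- tidy(toy_lda, matrix = "beta")

# Per-document topic proportions (each document is a mixture of topics)
doc_topics <- tidy(toy_lda, matrix = "gamma")
```

The `beta` and `gamma` tables correspond to the two mixtures described above: words within topics, and topics within documents.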

    We can also perform supervised analyses to build a classifier to classify lines of text from our austen_sparse or austen_dtm objects.

    + +
    +

    R session information

    +
    +
    options(width = 120)
    +sessioninfo::session_info()
    +
    +
    ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
    + setting  value
    + version  R version 4.3.1 (2023-06-16)
    + os       macOS Ventura 13.5
    + system   aarch64, darwin20
    + ui       X11
    + language (EN)
    + collate  en_US.UTF-8
    + ctype    en_US.UTF-8
    + tz       America/New_York
    + date     2023-08-17
    + pandoc   3.1.5 @ /opt/homebrew/bin/ (via rmarkdown)
     
    -
    +─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
    + package      * version date (UTC) lib source
    + cli            3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
    + colorout       1.2-2   2023-05-06 [1] Github (jalvesaq/colorout@79931fd)
    + colorspace     2.1-0   2023-01-23 [1] CRAN (R 4.3.0)
    + digest         0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
    + dplyr        * 1.1.2   2023-04-20 [1] CRAN (R 4.3.0)
    + evaluate       0.21    2023-05-05 [1] CRAN (R 4.3.0)
    + fansi          1.0.4   2023-01-22 [1] CRAN (R 4.3.0)
    + farver         2.1.1   2022-07-06 [1] CRAN (R 4.3.0)
    + fastmap        1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
    + forcats      * 1.0.0   2023-01-29 [1] CRAN (R 4.3.0)
    + fs             1.6.3   2023-07-20 [1] CRAN (R 4.3.0)
    + generics       0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
    + ggplot2      * 3.4.3   2023-08-14 [1] CRAN (R 4.3.1)
    + glue           1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
    + gtable         0.3.3   2023-03-21 [1] CRAN (R 4.3.0)
    + hms            1.1.3   2023-03-21 [1] CRAN (R 4.3.0)
    + htmltools      0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
    + htmlwidgets    1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
    + janeaustenr  * 1.0.0   2022-08-26 [1] CRAN (R 4.3.0)
    + jsonlite       1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
    + knitr          1.43    2023-05-25 [1] CRAN (R 4.3.0)
    + labeling       0.4.2   2020-10-20 [1] CRAN (R 4.3.0)
    + lattice        0.21-8  2023-04-05 [1] CRAN (R 4.3.1)
    + lifecycle      1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
    + lubridate    * 1.9.2   2023-02-10 [1] CRAN (R 4.3.0)
    + magrittr       2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
    + Matrix         1.6-1   2023-08-14 [1] CRAN (R 4.3.0)
    + munsell        0.5.0   2018-06-12 [1] CRAN (R 4.3.0)
    + NLP            0.2-1   2020-10-14 [1] CRAN (R 4.3.0)
    + pillar         1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
    + pkgconfig      2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
    + purrr        * 1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
    + R6             2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
    + rappdirs       0.3.3   2021-01-31 [1] CRAN (R 4.3.0)
    + RColorBrewer * 1.1-3   2022-04-03 [1] CRAN (R 4.3.0)
    + Rcpp           1.0.11  2023-07-06 [1] CRAN (R 4.3.0)
    + readr        * 2.1.4   2023-02-10 [1] CRAN (R 4.3.0)
    + rlang          1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
    + rmarkdown      2.24    2023-08-14 [1] CRAN (R 4.3.1)
    + rstudioapi     0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
    + scales         1.2.1   2022-08-20 [1] CRAN (R 4.3.0)
    + sessioninfo    1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
    + slam           0.1-50  2022-01-08 [1] CRAN (R 4.3.0)
    + SnowballC      0.7.1   2023-04-25 [1] CRAN (R 4.3.0)
    + stringi        1.7.12  2023-01-11 [1] CRAN (R 4.3.0)
    + stringr      * 1.5.0   2022-12-02 [1] CRAN (R 4.3.0)
    + textdata       0.4.4   2022-09-02 [1] CRAN (R 4.3.0)
    + tibble       * 3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
    + tidyr        * 1.3.0   2023-01-24 [1] CRAN (R 4.3.0)
    + tidyselect     1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
    + tidytext     * 0.4.1   2023-01-07 [1] CRAN (R 4.3.0)
    + tidyverse    * 2.0.0   2023-02-22 [1] CRAN (R 4.3.0)
    + timechange     0.2.0   2023-01-11 [1] CRAN (R 4.3.0)
    + tm             0.7-11  2023-02-05 [1] CRAN (R 4.3.0)
    + tokenizers     0.3.0   2022-12-22 [1] CRAN (R 4.3.0)
    + tzdb           0.4.0   2023-05-12 [1] CRAN (R 4.3.0)
    + utf8           1.2.3   2023-01-31 [1] CRAN (R 4.3.0)
    + vctrs          0.6.3   2023-06-14 [1] CRAN (R 4.3.0)
    + withr          2.5.0   2022-03-03 [1] CRAN (R 4.3.0)
    + wordcloud    * 2.6     2018-08-24 [1] CRAN (R 4.3.0)
    + xfun           0.40    2023-08-09 [1] CRAN (R 4.3.0)
    + xml2           1.3.5   2023-07-06 [1] CRAN (R 4.3.0)
    + yaml           2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
    +
    + [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
    +
    +──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

    +
    +
    + + +