From ddd015f0b7b8e5c2a7c5a317fc75855f82344956 Mon Sep 17 00:00:00 2001 From: Martin van Rongen Date: Tue, 18 Jun 2024 09:57:45 +0100 Subject: [PATCH] 1806 ch4 ex --- .DS_Store | Bin 14340 -> 14340 bytes .../data-wrangling/execute-results/html.json | 4 +- .../execute-results/html.json | 4 +- materials/.DS_Store | Bin 12292 -> 12292 bytes materials/_no-render/template.qmd | 6 +- materials/data-wrangling.qmd | 1 + materials/data/.DS_Store | Bin 0 -> 6148 bytes materials/data/finches.csv | 360 +++++++++--------- materials/intro-to-programming.qmd | 121 ++++++ 9 files changed, 311 insertions(+), 185 deletions(-) create mode 100644 materials/data/.DS_Store diff --git a/.DS_Store b/.DS_Store index d97cd6524d68583041dc89cf17c141ac65bd9d1c..2c8613932181bc2a5a542cb2f9bae260dd65e0d6 100644 GIT binary patch delta 20 bcmZoEXernrC^$KvJ$dpx1=r0x1-;b(RT>AY delta 16 XcmZoEXernrC^$J*WyR*5g5GKXIlcyE diff --git a/_freeze/materials/data-wrangling/execute-results/html.json b/_freeze/materials/data-wrangling/execute-results/html.json index dfc1bc0..c2397a9 100644 --- a/_freeze/materials/data-wrangling/execute-results/html.json +++ b/_freeze/materials/data-wrangling/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "8e89a1ddd7de06abe96834b33bc78637", + "hash": "4ee0dbc716a5c66090af7d6afff82f6c", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Data wrangling\"\n---\n\n::: {.cell}\n\n:::\n\n::: {.cell}\n\n:::\n\n\n::: {.callout-tip}\n## Learning outcomes\n\n- Be able to make changes to variables (columns).\n- Be able to make changes to observations (rows).\n- Implement changes on a grouped basis.\n- Export a data set to file.\n\n:::\n\n## Libraries and functions\n\n::: {.callout-note collapse=\"true\"}\n## Click to expand\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n### Libraries\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\n```\n:::\n\n\n### Functions\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# create / change columns\ndplyr::mutate()\n\n# move columns\ndplyr::relocate()\n\n# group values by one or more variables\ndplyr::group_by()\n\n# count number of unique observations\ndplyr::count()\n\n# summarises data; specify the type of summary within the function\ndplyr::summarise()\n\n# reshapes the data into a wide format\ntidyr::pivot_wider()\n\n# reshapes the data into a long format\ntidyr::pivot_longer()\n```\n:::\n\n\n\n:::\n:::\n\n## Purpose and aim\n\nOften, there is not one single data format that allows you to do all of your analysis. Getting comfortable with making changes to the way your data are organised is an important skill. This is sometimes referred to as 'data wrangling'. In this section we'll learn how we can change the organisation of columns, how to add new columns, manipulate rows and perform these operations on subgroups of the data.\n\n## Reading in data\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nWe'll keep using our data set on Darwin's finches. If you haven't read these data in, please do so with the following:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfinches <- read_csv(\"data/finches.csv\")\n```\n:::\n\n:::\n\n## Creating new columns\n\nSometimes you'll have to create new columns in your data set. For example, you might have a column that records something in kilograms, but you need it in milligrams. You'd then have to either convert the original column or create a new one with the new data.\n\nLet's see how to do this using the `weight` column from the `finches` data.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nWe'll use pipes to do this, so we can see what R is doing without immediately updating the data. This is generally a useful technique: check each step one-by-one and after you're happy with the changes, *then* update the table.\n\nTo add a column, we use the `mutate()` function. We first define the name of the *new column*, then tell it what needs to go in it.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfinches %>% \n mutate(weight_kg = weight / 1000)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 180 × 13\n species group weight wing tarsus blength bdepth bwidth pc1_body pc1_beak\n \n 1 G. fortis Early … 15.8 67.1 19.6 10.3 8.95 8.32 0.382 -0.431\n 2 G. fortis Early … 15.2 66 18.3 10.4 8.7 8.4 -1.06 -0.452\n 3 G. fortis Early … 18.0 68 18.9 11.2 9.6 8.83 0.839 0.955\n 4 G. fortis Early … 18.5 70.3 19.7 11 9.7 8.73 2.16 0.824\n 5 G. fortis Early … 15.7 69 18.9 10.9 9.8 9 0.332 1.08 \n 6 G. fortis Early … 17.8 70.1 19.2 12.7 10.9 9.79 1.50 3.55 \n 7 G. fortis Early … 17.2 69 20.3 11.9 9.8 9 1.86 1.67 \n 8 G. fortis Early … 17.2 68.5 19.2 11.4 9.8 8.6 0.879 1.00 \n 9 G. fortis Early … 16.5 66.3 18.7 9.04 8.42 7.98 -0.227 -1.81 \n10 G. fortis Early … 19.4 69 18.7 11.3 9.6 8.8 1.39 1.00 \n# ℹ 170 more rows\n# ℹ 3 more variables: pc2_beak , is_early , weight_kg \n```\n\n\n:::\n:::\n\n\nYou'll probably notice that our new column isn't visible on screen. This is because we have quite a few columns in our table. We can move the new column to directly after the `weight` column. We use the `relocate()` function for this.\n\nWe tell `relocate()` which column we want to move, then use the `.after =` argument to specify where we want to insert the column.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfinches %>% \n mutate(weight_kg = weight / 1000) %>% \n relocate(weight_kg, .after = weight)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 180 × 13\n species group weight weight_kg wing tarsus blength bdepth bwidth pc1_body\n \n 1 G. fortis Early… 15.8 0.0158 67.1 19.6 10.3 8.95 8.32 0.382\n 2 G. fortis Early… 15.2 0.0152 66 18.3 10.4 8.7 8.4 -1.06 \n 3 G. fortis Early… 18.0 0.0180 68 18.9 11.2 9.6 8.83 0.839\n 4 G. fortis Early… 18.5 0.0185 70.3 19.7 11 9.7 8.73 2.16 \n 5 G. fortis Early… 15.7 0.0157 69 18.9 10.9 9.8 9 0.332\n 6 G. fortis Early… 17.8 0.0178 70.1 19.2 12.7 10.9 9.79 1.50 \n 7 G. fortis Early… 17.2 0.0172 69 20.3 11.9 9.8 9 1.86 \n 8 G. fortis Early… 17.2 0.0172 68.5 19.2 11.4 9.8 8.6 0.879\n 9 G. fortis Early… 16.5 0.0165 66.3 18.7 9.04 8.42 7.98 -0.227\n10 G. fortis Early… 19.4 0.0194 69 18.7 11.3 9.6 8.8 1.39 \n# ℹ 170 more rows\n# ℹ 3 more variables: pc1_beak , pc2_beak , is_early \n```\n\n\n:::\n:::\n\n\n:::\n\nWe can see that the new column indeed contains the new weight measurements, composed of the original `weight` values divided by 1,000.\n\nNow that we know this gives us the result we want, we can update the original table:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfinches <- finches %>% \n mutate(weight_kg = weight / 1000) %>% \n relocate(weight_kg, .after = weight)\n```\n:::\n\n\n:::\n\n## Grouping and summarising\n\nA very common technique used in data analysis is the \"split-apply-combine\". This is a three-step process, where we:\n\n1. Split the data into subgroups.\n2. Apply a set of transformations / calculations / ... to each subgroup.\n3. Combine the result into a single table.\n\n### Groups\n\nI happen to know that there are two distinct species in this data set. Let's say we're interested in finding out how many observations we have for each species.\n\nThere are two steps to this process:\n\n1. We need to split the data by `species`.\n2. We need to count the number of rows (= observations) in each subgroup.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nWe can use the `group_by()` function to group data by a given variable. Here, we will group the data by `species`:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfinches %>% \n group_by(species)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 180 × 13\n# Groups: species [2]\n species group weight weight_kg wing tarsus blength bdepth bwidth pc1_body\n \n 1 G. fortis Early… 15.8 0.0158 67.1 19.6 10.3 8.95 8.32 0.382\n 2 G. fortis Early… 15.2 0.0152 66 18.3 10.4 8.7 8.4 -1.06 \n 3 G. fortis Early… 18.0 0.0180 68 18.9 11.2 9.6 8.83 0.839\n 4 G. fortis Early… 18.5 0.0185 70.3 19.7 11 9.7 8.73 2.16 \n 5 G. fortis Early… 15.7 0.0157 69 18.9 10.9 9.8 9 0.332\n 6 G. fortis Early… 17.8 0.0178 70.1 19.2 12.7 10.9 9.79 1.50 \n 7 G. fortis Early… 17.2 0.0172 69 20.3 11.9 9.8 9 1.86 \n 8 G. fortis Early… 17.2 0.0172 68.5 19.2 11.4 9.8 8.6 0.879\n 9 G. fortis Early… 16.5 0.0165 66.3 18.7 9.04 8.42 7.98 -0.227\n10 G. fortis Early… 19.4 0.0194 69 18.7 11.3 9.6 8.8 1.39 \n# ℹ 170 more rows\n# ℹ 3 more variables: pc1_beak , pc2_beak , is_early \n```\n\n\n:::\n:::\n\n\nThis doesn't seem to make much difference, since it's still outputting all of the data. However, if you look closely, you will notice that next to the `A tibble: 180 x 13` text in the top-left corner there is now a `Groups: species [2]` designation. What this means is that, behind the scenes, the table is now also split by the `species` variable and that there are two distinct groups in there.\n\nSo, if we want to see how many observations we have in each group we can use the very useful `count()` function. We don't have to specify anything - in this case it just counts the number of rows.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfinches %>% \n group_by(species) %>% \n count()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 2 × 2\n# Groups: species [2]\n species n\n \n1 G. fortis 89\n2 G. scandens 91\n```\n\n\n:::\n:::\n\n:::\n\nThere we are, we have two distinct species of finch in these data and they more or less have an equal number of observations.\n\n### Summarising data\n\nQuite often you might find yourself in a situation where you want to get some summary statistics, based on subgroups within the data. Let's see how that works with our data.\n\nWe now know there are two species in our data. Let's imagine we wanted to know the average `weight` for each species.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nWe can use the `summarise()` function to, well, *summarise* data. The first bit indicates the name of the new column that will contain the summarised values. The part after it determines what goes into this column.\n\nHere we want the average weight, so we use `mean(weight)` to calculate this. Let's store this in a column called `avg_weight`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfinches %>% \n group_by(species) %>% \n summarise(avg_weight = mean(weight))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 2 × 2\n species avg_weight\n \n1 G. fortis 15.8\n2 G. scandens 19.5\n```\n\n\n:::\n:::\n\n\n:::\n\nThis gives us a table where we have the average weight for each species. We can simply expand this for any other variables, for example:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# calculate mean, median, minimum and maximum weight per group\nfinches %>% \n group_by(species) %>% \n summarise(avg_weight = mean(weight),\n median_weight = median(weight),\n min_weight = min(weight),\n max_weight = max(weight))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 2 × 5\n species avg_weight median_weight min_weight max_weight\n \n1 G. fortis 15.8 15.5 11.6 19.9\n2 G. scandens 19.5 19 15.4 24.4\n```\n\n\n:::\n:::\n\n\n:::\n\n## Reshaping data\n\nWhen you're analysing your data, you'll often find that you will need to structure your data in different ways, for different purposes.\n\nIdeally, you always have the same starting point where:\n\n1. Each column contains a single variable (something you're measuring).\n2. Each row is a single observation (all the measurements belonging to a single unit/person/plant etc).\n\nEven though you might still need to have your data in a different shape, having it like this as a starting point means you can always rework your data.\n\nLet's illustrate this with the following example:\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 6 × 3\n species group n\n \n1 G. fortis Early blunt 30\n2 G. fortis Late blunt 30\n3 G. fortis Late pointed 29\n4 G. scandens Early pointed 31\n5 G. scandens Late blunt 30\n6 G. scandens Late pointed 30\n```\n\n\n:::\n:::\n\n\nHere we have count data (number of observations) for each species and group. It's quite a list and you can imagine that if you had many more species then it would become tricky to interpret. So, instead we're going to reshape the this table and have a column for each unique `group` and a row for each `species`.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nWe can obtain the data set above by using the `count()` function. Here we are counting by two variables: `species` and `group`.\n\nIf we want to reshape the data, we can use the `pivot_*` functions. There are two main ones:\n\n1. `pivot_longer()` creates a 'long' format data set; here each observation is a single row and data is repeated in the first column.\n2. `pivot_wider()` creates a 'wide' format data set; here data is not repeated in the first column.\n\nSo, here we are using the `pivot_wider()` function. We need to tell it where the new column names are going to come from (`names_from =`). We also need to specify where the values are coming from that are going to be used to populate the new table (`values_from =`):\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfinches_wide <- finches %>% \n count(species, group) %>% \n pivot_wider(names_from = group, values_from = n)\n\nfinches_wide\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 2 × 5\n species `Early blunt` `Late blunt` `Late pointed` `Early pointed`\n \n1 G. fortis 30 30 29 NA\n2 G. scandens NA 30 30 31\n```\n\n\n:::\n:::\n\n\n:::\n\nThis gives us a 'wide' table, where the original data are split by the type of `group`. We have 4 distinct groups, so we end up with one column for each group plus the original one for `species`.\n\n::: {.callout-note}\n## Long or wide?\n\nDeciding which format to use can sometimes feel a bit tricky. Relating it to plotting can be helpful. Ask yourself the question: \"what is going on the x and y axis?\". Each variable that you want to plot on either the x or y axis needs to be in its own column.\n\n:::\n## Exporting data\n\nIt can be useful to save data sets you create throughout your analysis.\n\n::: {.panel-tabset group=\"language\"}\n## R\nWe can do this using the `write_csv()` function. This will write a table to a `.csv` file (comma-separated values). The first part tells it which data set we're saving. We'll use the `finches_wide` as an example. The `file =` argument specifies where the file needs to be stored. Here, we are saving it in the `data` folder, under the name `finches_wide.csv`.\n\nNote: the filename needs to be in quotes *and* needs to have a file extension.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwrite_csv(finches_wide, file = \"data/finches_wide.csv\")\n```\n:::\n\n\n:::\n\n## Summary\n\n::: {.callout-tip}\n#### Key points\n\n- A 3-step process (split - apply - combine) allows you to apply transformations on subgroups of your data.\n- The result can be combined in a single table.\n- We reshape our data based on our type of analysis.\n- Organise your data so that each variable has its own column and each observation is a row.\n\n:::\n", + "markdown": "---\ntitle: \"Data wrangling\"\n---\n\n::: {.cell}\n\n:::\n\n::: {.cell}\n\n:::\n\n\n::: {.callout-tip}\n## Learning outcomes\n\n- Be able to make changes to variables (columns).\n- Be able to make changes to observations (rows).\n- Implement changes on a grouped basis.\n- Export a data set to file.\n\n:::\n\n## Libraries and functions\n\n::: {.callout-note collapse=\"true\"}\n## Click to expand\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n### Libraries\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\n```\n:::\n\n\n### Functions\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# create / change columns\ndplyr::mutate()\n\n# move columns\ndplyr::relocate()\n\n# group values by one or more variables\ndplyr::group_by()\n\n# count number of unique observations\ndplyr::count()\n\n# summarises data; specify the type of summary within the function\ndplyr::summarise()\n\n# reshapes the data into a wide format\ntidyr::pivot_wider()\n\n# reshapes the data into a long format\ntidyr::pivot_longer()\n```\n:::\n\n\n\n:::\n:::\n\n## Purpose and aim\n\nOften, there is not one single data format that allows you to do all of your analysis. Getting comfortable with making changes to the way your data are organised is an important skill. This is sometimes referred to as 'data wrangling'. In this section we'll learn how we can change the organisation of columns, how to add new columns, manipulate rows and perform these operations on subgroups of the data.\n\n## Reading in data\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nWe'll keep using our data set on Darwin's finches. If you haven't read these data in, please do so with the following:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfinches <- read_csv(\"data/finches.csv\")\n```\n:::\n\n:::\n\n## Creating new columns\n\nSometimes you'll have to create new columns in your data set. For example, you might have a column that records something in kilograms, but you need it in milligrams. You'd then have to either convert the original column or create a new one with the new data.\n\nLet's see how to do this using the `weight` column from the `finches` data.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nWe'll use pipes to do this, so we can see what R is doing without immediately updating the data. This is generally a useful technique: check each step one-by-one and after you're happy with the changes, *then* update the table.\n\nTo add a column, we use the `mutate()` function. We first define the name of the *new column*, then tell it what needs to go in it.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfinches %>% \n mutate(weight_kg = weight / 1000)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 180 × 13\n species group weight wing tarsus blength bdepth bwidth pc1_body pc1_beak\n \n 1 G. fortis early_… 15.8 67.1 19.6 10.3 8.95 8.32 0.382 -0.431\n 2 G. fortis early_… 15.2 66 18.3 10.4 8.7 8.4 -1.06 -0.452\n 3 G. fortis early_… 18.0 68 18.9 11.2 9.6 8.83 0.839 0.955\n 4 G. fortis early_… 18.5 70.3 19.7 11 9.7 8.73 2.16 0.824\n 5 G. fortis early_… 15.7 69 18.9 10.9 9.8 9 0.332 1.08 \n 6 G. fortis early_… 17.8 70.1 19.2 12.7 10.9 9.79 1.50 3.55 \n 7 G. fortis early_… 17.2 69 20.3 11.9 9.8 9 1.86 1.67 \n 8 G. fortis early_… 17.2 68.5 19.2 11.4 9.8 8.6 0.879 1.00 \n 9 G. fortis early_… 16.5 66.3 18.7 9.04 8.42 7.98 -0.227 -1.81 \n10 G. fortis early_… 19.4 69 18.7 11.3 9.6 8.8 1.39 1.00 \n# ℹ 170 more rows\n# ℹ 3 more variables: pc2_beak , is_early , weight_kg \n```\n\n\n:::\n:::\n\n\nYou'll probably notice that our new column isn't visible on screen. This is because we have quite a few columns in our table. We can move the new column to directly after the `weight` column. We use the `relocate()` function for this.\n\nWe tell `relocate()` which column we want to move, then use the `.after =` argument to specify where we want to insert the column.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfinches %>% \n mutate(weight_kg = weight / 1000) %>% \n relocate(weight_kg, .after = weight)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 180 × 13\n species group weight weight_kg wing tarsus blength bdepth bwidth pc1_body\n \n 1 G. fortis early… 15.8 0.0158 67.1 19.6 10.3 8.95 8.32 0.382\n 2 G. fortis early… 15.2 0.0152 66 18.3 10.4 8.7 8.4 -1.06 \n 3 G. fortis early… 18.0 0.0180 68 18.9 11.2 9.6 8.83 0.839\n 4 G. fortis early… 18.5 0.0185 70.3 19.7 11 9.7 8.73 2.16 \n 5 G. fortis early… 15.7 0.0157 69 18.9 10.9 9.8 9 0.332\n 6 G. fortis early… 17.8 0.0178 70.1 19.2 12.7 10.9 9.79 1.50 \n 7 G. fortis early… 17.2 0.0172 69 20.3 11.9 9.8 9 1.86 \n 8 G. fortis early… 17.2 0.0172 68.5 19.2 11.4 9.8 8.6 0.879\n 9 G. fortis early… 16.5 0.0165 66.3 18.7 9.04 8.42 7.98 -0.227\n10 G. fortis early… 19.4 0.0194 69 18.7 11.3 9.6 8.8 1.39 \n# ℹ 170 more rows\n# ℹ 3 more variables: pc1_beak , pc2_beak , is_early \n```\n\n\n:::\n:::\n\n\n:::\n\nWe can see that the new column indeed contains the new weight measurements, composed of the original `weight` values divided by 1,000.\n\nNow that we know this gives us the result we want, we can update the original table:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfinches <- finches %>% \n mutate(weight_kg = weight / 1000) %>% \n relocate(weight_kg, .after = weight)\n```\n:::\n\n\n:::\n\n## Grouping and summarising\n\nA very common technique used in data analysis is the \"split-apply-combine\". This is a three-step process, where we:\n\n1. Split the data into subgroups.\n2. Apply a set of transformations / calculations / ... to each subgroup.\n3. Combine the result into a single table.\n\n### Groups\n\nI happen to know that there are two distinct species in this data set. Let's say we're interested in finding out how many observations we have for each species.\n\nThere are two steps to this process:\n\n1. We need to split the data by `species`.\n2. We need to count the number of rows (= observations) in each subgroup.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nWe can use the `group_by()` function to group data by a given variable. Here, we will group the data by `species`:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfinches %>% \n group_by(species)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 180 × 13\n# Groups: species [2]\n species group weight weight_kg wing tarsus blength bdepth bwidth pc1_body\n \n 1 G. fortis early… 15.8 0.0158 67.1 19.6 10.3 8.95 8.32 0.382\n 2 G. fortis early… 15.2 0.0152 66 18.3 10.4 8.7 8.4 -1.06 \n 3 G. fortis early… 18.0 0.0180 68 18.9 11.2 9.6 8.83 0.839\n 4 G. fortis early… 18.5 0.0185 70.3 19.7 11 9.7 8.73 2.16 \n 5 G. fortis early… 15.7 0.0157 69 18.9 10.9 9.8 9 0.332\n 6 G. fortis early… 17.8 0.0178 70.1 19.2 12.7 10.9 9.79 1.50 \n 7 G. fortis early… 17.2 0.0172 69 20.3 11.9 9.8 9 1.86 \n 8 G. fortis early… 17.2 0.0172 68.5 19.2 11.4 9.8 8.6 0.879\n 9 G. fortis early… 16.5 0.0165 66.3 18.7 9.04 8.42 7.98 -0.227\n10 G. fortis early… 19.4 0.0194 69 18.7 11.3 9.6 8.8 1.39 \n# ℹ 170 more rows\n# ℹ 3 more variables: pc1_beak , pc2_beak , is_early \n```\n\n\n:::\n:::\n\n\nThis doesn't seem to make much difference, since it's still outputting all of the data. However, if you look closely, you will notice that next to the `A tibble: 180 x 13` text in the top-left corner there is now a `Groups: species [2]` designation. What this means is that, behind the scenes, the table is now also split by the `species` variable and that there are two distinct groups in there.\n\nSo, if we want to see how many observations we have in each group we can use the very useful `count()` function. We don't have to specify anything - in this case it just counts the number of rows.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfinches %>% \n group_by(species) %>% \n count()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 2 × 2\n# Groups: species [2]\n species n\n \n1 G. fortis 89\n2 G. scandens 91\n```\n\n\n:::\n:::\n\n:::\n\nThere we are, we have two distinct species of finch in these data and they more or less have an equal number of observations.\n\n### Summarising data\n\nQuite often you might find yourself in a situation where you want to get some summary statistics, based on subgroups within the data. Let's see how that works with our data.\n\nWe now know there are two species in our data. Let's imagine we wanted to know the average `weight` for each species.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nWe can use the `summarise()` function to, well, *summarise* data. The first bit indicates the name of the new column that will contain the summarised values. The part after it determines what goes into this column.\n\nHere we want the average weight, so we use `mean(weight)` to calculate this. Let's store this in a column called `avg_weight`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfinches %>% \n group_by(species) %>% \n summarise(avg_weight = mean(weight))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 2 × 2\n species avg_weight\n \n1 G. fortis 15.8\n2 G. scandens 19.5\n```\n\n\n:::\n:::\n\n\n:::\n\nThis gives us a table where we have the average weight for each species. We can simply expand this for any other variables, for example:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# calculate mean, median, minimum and maximum weight per group\nfinches %>% \n group_by(species) %>% \n summarise(avg_weight = mean(weight),\n median_weight = median(weight),\n min_weight = min(weight),\n max_weight = max(weight))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 2 × 5\n species avg_weight median_weight min_weight max_weight\n \n1 G. fortis 15.8 15.5 11.6 19.9\n2 G. scandens 19.5 19 15.4 24.4\n```\n\n\n:::\n:::\n\n\n:::\n\n## Reshaping data\n\nWhen you're analysing your data, you'll often find that you will need to structure your data in different ways, for different purposes.\n\nIdeally, you always have the same starting point where:\n\n1. Each column contains a single variable (something you're measuring).\n2. Each row is a single observation (all the measurements belonging to a single unit/person/plant etc).\n\nEven though you might still need to have your data in a different shape, having it like this as a starting point means you can always rework your data.\n\nLet's illustrate this with the following example:\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 6 × 3\n species group n\n \n1 G. fortis early_blunt 30\n2 G. fortis late_blunt 30\n3 G. fortis late_pointed 29\n4 G. scandens early_pointed 31\n5 G. scandens late_blunt 30\n6 G. scandens late_pointed 30\n```\n\n\n:::\n:::\n\n\nHere we have count data (number of observations) for each species and group. It's quite a list and you can imagine that if you had many more species then it would become tricky to interpret. So, instead we're going to reshape the this table and have a column for each unique `group` and a row for each `species`.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nWe can obtain the data set above by using the `count()` function. Here we are counting by two variables: `species` and `group`.\n\nIf we want to reshape the data, we can use the `pivot_*` functions. There are two main ones:\n\n1. `pivot_longer()` creates a 'long' format data set; here each observation is a single row and data is repeated in the first column.\n2. `pivot_wider()` creates a 'wide' format data set; here data is not repeated in the first column.\n\nSo, here we are using the `pivot_wider()` function. We need to tell it where the new column names are going to come from (`names_from =`). We also need to specify where the values are coming from that are going to be used to populate the new table (`values_from =`):\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfinches_wide <- finches %>% \n count(species, group) %>% \n pivot_wider(names_from = group, values_from = n)\n\nfinches_wide\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 2 × 5\n species early_blunt late_blunt late_pointed early_pointed\n \n1 G. fortis 30 30 29 NA\n2 G. scandens NA 30 30 31\n```\n\n\n:::\n:::\n\n\n:::\n\nThis gives us a 'wide' table, where the original data are split by the type of `group`. We have 4 distinct groups, so we end up with one column for each group plus the original one for `species`.\n\n::: {.callout-note}\n## Long or wide?\n\nDeciding which format to use can sometimes feel a bit tricky. Relating it to plotting can be helpful. Ask yourself the question: \"what is going on the x and y axis?\". Each variable that you want to plot on either the x or y axis needs to be in its own column.\n\n:::\n\n## Exporting data\n\nIt can be useful to save data sets you create throughout your analysis.\n\n::: {.panel-tabset group=\"language\"}\n## R\nWe can do this using the `write_csv()` function. This will write a table to a `.csv` file (comma-separated values). The first part tells it which data set we're saving. We'll use the `finches_wide` as an example. The `file =` argument specifies where the file needs to be stored. Here, we are saving it in the `data` folder, under the name `finches_wide.csv`.\n\nNote: the filename needs to be in quotes *and* needs to have a file extension.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwrite_csv(finches_wide, file = \"data/finches_wide.csv\")\n```\n:::\n\n\n:::\n\n## Summary\n\n::: {.callout-tip}\n#### Key points\n\n- A 3-step process (split - apply - combine) allows you to apply transformations on subgroups of your data.\n- The result can be combined in a single table.\n- We reshape our data based on our type of analysis.\n- Organise your data so that each variable has its own column and each observation is a row.\n\n:::\n", "supporting": [ "data-wrangling_files" ], diff --git a/_freeze/materials/intro-to-programming/execute-results/html.json b/_freeze/materials/intro-to-programming/execute-results/html.json index f3ddc15..0b265cb 100644 --- a/_freeze/materials/intro-to-programming/execute-results/html.json +++ b/_freeze/materials/intro-to-programming/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "f50e7486c44ec63e9d0020df8ea69770", + "hash": "c3158d91cdc031a8ceb7402c5d18216e", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Getting started\"\n---\n\n::: {.cell}\n\n:::\n\n::: {.cell}\n\n:::\n\n\n::: {.callout-tip}\n## Learning outcomes\n\n- Learn basic programming techniques\n\n:::\n\n## Libraries and functions\n\n::: {.callout-note collapse=\"true\"}\n## Click to expand\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n### Libraries\n### Functions\n:::\n:::\n\n## Purpose and aim\n\nUsing a programming language to analyse, visualise and communicate your data has many advantages over point-and-click programmes.\n\n* it documents analysis steps with code, aiding reproducibility\n* allows scaling to large data\n* generates high quality graphics that can be adjusted\n\n## Introduction\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nThe term \"**R**\" is used to refer to both the programming language and the\nsoftware that interprets the scripts written using it.\n\n**RStudio** is an additional software that makes it easier to \ninteract with R by providing tools that make programming easier. \nTo function correctly, RStudio needs R and therefore both need to be installed on your computer.\n\nSome advantages of using R for your data analysis include:\n\n- Analysis steps are documented with code, allowing for greater reproducibility. \n- There are thousands of packages (extensions) available, making R a very flexible \n and powerful tool, for a wide range of applications. \n- Analysis can be scaled to large data.\n- Can generate a wide range of high-quality graphics for data visualisation.\n- There is a large community of contributors.\n- It's free and open source.\n\n\n### The RStudio Interface \n\nRStudio is divided into four \"panes\", illustrated below. \nThe default layout is:\n\n- Top Left - **Source**: this is where you edit your R scripts \n (this panel might not appear until you create a script, which we demonstrate below).\n- Bottom Left - **Console**: where R will execute commands and print results.\n- Top Right - **Environment**: this will show you which objects you create \n while working with R.\n- Bottom Right - **Files**/**Plots**/**Packages**/**Help**: several tabs that allow \n you to navigate your files, view plots, view installed packages and search help files. \n\n\n![](images/00-RStudio_screen.png)\n\n:::\n\n## Getting set up\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nIt is good practice to keep a set of related data, analyses, and text\nself-contained in a single folder called the **working directory** (usually a folder \non your computer where you have all the files relating to a project you're working on). \n\nThe working directory is an important concept to understand. It is the place\nwhere R will look for and save files.\n\nAll of the scripts within this folder can then use *relative paths* to files. \nRelative paths indicate where inside the project a file is located (as opposed to \nabsolute paths, which point to where a file is on a specific computer). \nWorking this way makes it a lot easier to move your project around on your computer \nand share it with others without having to directly modify file paths in the individual \nscripts.\n\nRStudio provides a helpful set of tools to do this through its **Projects**\ninterface, which not only creates a working directory for you but also remembers\nits location (allowing you to quickly navigate to it). The interface also \npreserves custom settings and open files to make it easier to resume work after \na break. \n\n\n### Creating a new project\n\nUsually, you will already have a folder on your computer for your project, for \nexample with some data you collected or downloaded from the web. \n\nTo create an _R Project_ within the `r-workshop` directory:\n\n- From the upper menu on RStudio click: File > New project > Existing directory.\n- Click the browse... button and navigate and open your `r-workshop` folder. \n- Click on Create project. This will initiate a fresh R session.\n\nFrom now on, whenever you want to work on this project, open the the `Rproj` file \nthat was created in your `r-workshop` folder.\n\nThis will ensure your working directory is automatically set correctly. This also means \nthat you can move the project folder to a different location or even different \ncomputer. As long as you open the `Rproj` file, your working directory will be set correctly. \n\nIf you need to check your working directory, you can run `getwd()` on the console. \nIf for some reason your working directory is not what it should be, you can change it in the RStudio interface by navigating in the file browser (bottom-right panel) to where your working directory should be, clicking on the blue gear icon \nMore > Set As Working Directory.\n\nAlternatively, you can run `setwd(\"/path/to/working/directory\")` on the console to \nreset your working directory. However, your scripts should not include this line, \nbecause it will fail on someone else's computer.\n\n:::\n\n## Writing code\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nNow that we have a project, let's run our first commands in R.\n\nOn the _console_ panel, type:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n1 + 100\n```\n:::\n\n\nAnd R will print out the answer, with a preceding `[1]`. Don't worry about\nthis for now, we'll explain that later. For now think of it as indicating\noutput.\n\nAny time you hit return and the console shows a \"`+`\" instead of a \"`>`\", it\nmeans it's waiting for you to complete the command. If you want to cancel a\ncommand you can hit Esc and RStudio will give you back the `>` prompt.\n\n:::\n\n## Creating scripts\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nSo far, we've been typing these commands directly in the R console. However, if we \nclosed RStudio and later wanted to recreate these operations, there would be no \nrecord of them anywhere. \n\nIn practice, we should always write our code in a **script**, which is a plain text \ndocument with our commands written in it. \nTo create a new R script go to File > New File > R Script.\n\nThis will open a panel on the top-left. This is a text editor, which in RStudio \ndoes some syntax highlighting (it colours the code) to help read the code. \n\nAs you're adding code to the script, you can run it interactively on the console \nby pressing the shortcut Ctrl+Enter. \n\n:::\n\n## Installing and loading packages\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nAdditional packages can be installed to extend the functionality of R. \nMost packages are available in a central repository called CRAN and can be \ninstalled from within R using the `install.packages()` function.\n\nFor example, to install (or update) the `tidyverse` package, you would run the \nfollowing command on the console:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ninstall.packages(\"tidyverse\")\n```\n:::\n\n\nBecause the install process accesses the CRAN repository, you will need an Internet \nconnection to install packages.\n\nAfter this, you can then load the package to use it in your analysis. For the example above, we would do that as follows with the `library()` function:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\n```\n:::\n\n\n:::\n\n\n## Summary\n\n::: {.callout-tip}\n#### Key points\n\n- We use a working directory to organise our projects\n- Using scripts we're able to keep a record of our code\n- Packages or libraries give additional functionality\n:::\n", + "markdown": "---\ntitle: \"Getting started\"\n---\n\n::: {.cell}\n\n:::\n\n::: {.cell}\n\n:::\n\n\n::: {.callout-tip}\n## Learning outcomes\n\n- Learn basic programming techniques\n\n:::\n\n## Libraries and functions\n\n::: {.callout-note collapse=\"true\"}\n## Click to expand\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n### Libraries\n### Functions\n:::\n:::\n\n## Purpose and aim\n\nUsing a programming language to analyse, visualise and communicate your data has many advantages over point-and-click programmes.\n\n* it documents analysis steps with code, aiding reproducibility\n* allows scaling to large data\n* generates high quality graphics that can be adjusted\n\n## Introduction\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nThe term \"**R**\" is used to refer to both the programming language and the\nsoftware that interprets the scripts written using it.\n\n**RStudio** is an additional software that makes it easier to \ninteract with R by providing tools that make programming easier. \nTo function correctly, RStudio needs R and therefore both need to be installed on your computer.\n\nSome advantages of using R for your data analysis include:\n\n- Analysis steps are documented with code, allowing for greater reproducibility. \n- There are thousands of packages (extensions) available, making R a very flexible \n and powerful tool, for a wide range of applications. \n- Analysis can be scaled to large data.\n- Can generate a wide range of high-quality graphics for data visualisation.\n- There is a large community of contributors.\n- It's free and open source.\n\n\n### The RStudio Interface \n\nRStudio is divided into four \"panes\", illustrated below. \nThe default layout is:\n\n- Top Left - **Source**: this is where you edit your R scripts \n (this panel might not appear until you create a script, which we demonstrate below).\n- Bottom Left - **Console**: where R will execute commands and print results.\n- Top Right - **Environment**: this will show you which objects you create \n while working with R.\n- Bottom Right - **Files**/**Plots**/**Packages**/**Help**: several tabs that allow \n you to navigate your files, view plots, view installed packages and search help files. \n\n\n![](images/00-RStudio_screen.png)\n\n:::\n\n## Getting set up\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nIt is good practice to keep a set of related data, analyses, and text\nself-contained in a single folder called the **working directory** (usually a folder \non your computer where you have all the files relating to a project you're working on). \n\nThe working directory is an important concept to understand. It is the place\nwhere R will look for and save files.\n\nAll of the scripts within this folder can then use *relative paths* to files. \nRelative paths indicate where inside the project a file is located (as opposed to \nabsolute paths, which point to where a file is on a specific computer). \nWorking this way makes it a lot easier to move your project around on your computer \nand share it with others without having to directly modify file paths in the individual \nscripts.\n\nRStudio provides a helpful set of tools to do this through its **Projects**\ninterface, which not only creates a working directory for you but also remembers\nits location (allowing you to quickly navigate to it). The interface also \npreserves custom settings and open files to make it easier to resume work after \na break. \n\n\n### Creating a new project\n\nUsually, you will already have a folder on your computer for your project, for \nexample with some data you collected or downloaded from the web. \n\nTo create an _R Project_ within the `r-workshop` directory:\n\n- From the upper menu on RStudio click: File > New project > Existing directory.\n- Click the browse... button and navigate and open your `r-workshop` folder. \n- Click on Create project. This will initiate a fresh R session.\n\nFrom now on, whenever you want to work on this project, open the the `Rproj` file \nthat was created in your `r-workshop` folder.\n\nThis will ensure your working directory is automatically set correctly. This also means \nthat you can move the project folder to a different location or even different \ncomputer. As long as you open the `Rproj` file, your working directory will be set correctly. \n\nIf you need to check your working directory, you can run `getwd()` on the console. \nIf for some reason your working directory is not what it should be, you can change it in the RStudio interface by navigating in the file browser (bottom-right panel) to where your working directory should be, clicking on the blue gear icon \nMore > Set As Working Directory.\n\nAlternatively, you can run `setwd(\"/path/to/working/directory\")` on the console to \nreset your working directory. However, your scripts should not include this line, \nbecause it will fail on someone else's computer.\n\n:::\n\n:::{.callout-important}\nComplete [Exercise -@sec-exr_project].\n:::\n\n## Writing code\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nNow that we have a project, let's run our first commands in R.\n\nOn the _console_ panel, type:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n1 + 100\n```\n:::\n\n\nAnd R will print out the answer, with a preceding `[1]`. Don't worry about\nthis for now, we'll explain that later. For now think of it as indicating\noutput.\n\nAny time you hit return and the console shows a \"`+`\" instead of a \"`>`\", it\nmeans it's waiting for you to complete the command. If you want to cancel a\ncommand you can hit Esc and RStudio will give you back the `>` prompt.\n\n:::\n\n:::{.callout-important}\nComplete [Exercise -@sec-exr_calculations].\n:::\n\n## Creating scripts\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nSo far, we've been typing these commands directly in the R console. However, if we \nclosed RStudio and later wanted to recreate these operations, there would be no \nrecord of them anywhere. \n\nIn practice, we should always write our code in a **script**, which is a plain text \ndocument with our commands written in it. \nTo create a new R script go to File > New File > R Script.\n\nThis will open a panel on the top-left. This is a text editor, which in RStudio \ndoes some syntax highlighting (it colours the code) to help read the code. \n\nAs you're adding code to the script, you can run it interactively on the console \nby pressing the shortcut Ctrl+Enter. \n\n:::\n\n:::{.callout-important}\nComplete [Exercise -@sec-exr_scripts].\n:::\n\n## Installing and loading packages\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nAdditional packages can be installed to extend the functionality of R. \nMost packages are available in a central repository called CRAN and can be \ninstalled from within R using the `install.packages()` function.\n\nFor example, to install (or update) the `tidyverse` package, you would run the \nfollowing command on the console:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ninstall.packages(\"tidyverse\")\n```\n:::\n\n\nBecause the install process accesses the CRAN repository, you will need an Internet \nconnection to install packages.\n\nAfter this, you can then load the package to use it in your analysis. For the example above, we would do that as follows with the `library()` function:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\n```\n:::\n\n\n:::\n\n:::{.callout-important}\nComplete [Exercise -@sec-exr_packages].\n:::\n\n:::{.callout-important}\nRemember, we only need to **install** a library/package *once*. However, we need to **load** it every time we start an analysis.\n:::\n\n\n\n## Exercises\n\n### Setting up a project {#sec-exr_project}\n\n:::{.callout-exercise}\n\n\n{{< level 1 >}}\n\n\n\nSet up a project and make sure it's set as your working directory.\n\n:::\n\n### Calculations {#sec-exr_calculations}\n\n:::{.callout-exercise}\n\n\n{{< level 1 >}}\n\n\n\nRun the following calculations:\n\n* `2 + 23`\n* `23 * 4`\n* `314 - 82`\n* `(12 - 4) * (6 + 2)`\n* `3 ^ 2`\n\n::: {.callout-answer collapse=true}\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\n2 + 23\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 25\n```\n\n\n:::\n\n```{.r .cell-code}\n23 * 4\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 92\n```\n\n\n:::\n\n```{.r .cell-code}\n314 - 82\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 232\n```\n\n\n:::\n\n```{.r .cell-code}\n(12 - 4) * (6 + 2)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 64\n```\n\n\n:::\n\n```{.r .cell-code}\n3 ^ 2\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 9\n```\n\n\n:::\n:::\n\n\n:::\n:::\n:::\n\n### Creating scripts {#sec-exr_scripts}\n\n:::{.callout-exercise}\n\n\n{{< level 1 >}}\n\n\n\nPlease do the following:\n\n1. Create a script called `session_01` in your working directory.\n2. Re-run the calculations from [Exercise -@sec-exr_calculations].\n3. Save the changes to the script.\n\n:::\n\n### Adding functionality {#sec-exr_packages}\n\n:::{.callout-exercise}\n\n\n{{< level 1 >}}\n\n\n\nIt's important that you are comfortable with adding functionality.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nPlease install the `tidyverse` package *using the console*.\n\nThen, in the script you created in [Exercise -@sec-exr_scripts] load it into R.\n:::\n\n::: {.callout-answer collapse=true}\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nWe can install the package as follows:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ninstall.packages(\"tidyverse\")\n```\n:::\n\n\nNote that the title of the package needs to be in quotes (`\" \"`).\n\nWe load the package by running the following line of code from our script:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\n```\n:::\n\n\nNote that, rather inconsistently, we do *not* use quotes around the package name when loading it.\n:::\n:::\n:::\n\n\n## Summary\n\n::: {.callout-tip}\n#### Key points\n\n- We use a working directory to organise our projects\n- Using scripts we're able to keep a record of our code\n- Packages or libraries give additional functionality\n:::\n", "supporting": [ "intro-to-programming_files" ], diff --git a/materials/.DS_Store b/materials/.DS_Store index 2454046a29312d4f3b830d25a9d759fc3e674296..6ba38676c279390518c342ce0418d1185c05e933 100644 GIT binary patch delta 255 zcmW;Gzbk}s9Ki9<`|GZU=V3TK_dFfLT~gE?w^$6sVlfbt?ec3>78J6ax^O4HlKdDf z20t$KL?kKv1e#53jxF=cs6*A* z@&l*hN?eN@aVx^&UObBz@haZL2Ou$sz)1?Jq>)bvZb~VmidyQZ$3ru%w9!E)UVQY? zkDn372r$7U(=4#a63c9}$36!fa>5zsToC4gN1k|h^C_dHL&iv_jFqV}Qx*v8B)==# z|MqWWq0h{=eMhR|#3bqYcTK2E$DMYmP)vq;%c z6D>Nw-OBHxn;v>KVvu1?7}bPvO$ai@471Fkm}iAm)>vnQO?EWsNR1_DnM=;0DAS%1 OvZ-B1lWDjPt>*_#H1@V-^m;4Wg<&0T*E43hX&L&p$$qDprKhvt+--jT7}7np#A3 zem<@ulZcFPQ@L2!n>{z**++&mCkOWA81W14cNZlEfg7;MkzE(HCqgga^y>{tEnwC%0;vJ&^%eQ zLs35+`xjp>T0Esc and RStudio will give you back the `>` prompt ::: +:::{.callout-important} +Complete [Exercise -@sec-exr_calculations]. +::: + ## Creating scripts ::: {.panel-tabset group="language"} @@ -184,6 +192,10 @@ by pressing the shortcut Ctrl+Enter. ::: +:::{.callout-important} +Complete [Exercise -@sec-exr_scripts]. +::: + ## Installing and loading packages ::: {.panel-tabset group="language"} @@ -213,6 +225,115 @@ library(tidyverse) ::: +:::{.callout-important} +Complete [Exercise -@sec-exr_packages]. +::: + +:::{.callout-important} +Remember, we only need to **install** a library/package *once*. However, we need to **load** it every time we start an analysis. +::: + + + +## Exercises + +### Setting up a project {#sec-exr_project} + +:::{.callout-exercise} + +{{< level 1 >}} + +Set up a project and make sure it's set as your working directory. + +::: + +### Calculations {#sec-exr_calculations} + +:::{.callout-exercise} + +{{< level 1 >}} + +Run the following calculations: + +* `2 + 23` +* `23 * 4` +* `314 - 82` +* `(12 - 4) * (6 + 2)` +* `3 ^ 2` + +::: {.callout-answer collapse=true} + +::: {.panel-tabset group="language"} +## R + +```{r} +2 + 23 +23 * 4 +314 - 82 +(12 - 4) * (6 + 2) +3 ^ 2 +``` + +::: +::: +::: + +### Creating scripts {#sec-exr_scripts} + +:::{.callout-exercise} + +{{< level 1 >}} + +Please do the following: + +1. Create a script called `session_01` in your working directory. +2. Re-run the calculations from [Exercise -@sec-exr_calculations]. +3. Save the changes to the script. + +::: + +### Adding functionality {#sec-exr_packages} + +:::{.callout-exercise} + +{{< level 1 >}} + +It's important that you are comfortable with adding functionality. + +::: {.panel-tabset group="language"} +## R + +Please install the `tidyverse` package *using the console*. + +Then, in the script you created in [Exercise -@sec-exr_scripts] load it into R. +::: + +::: {.callout-answer collapse=true} + +::: {.panel-tabset group="language"} +## R + +We can install the package as follows: + +```{r} +#| eval: false +install.packages("tidyverse") +``` + +Note that the title of the package needs to be in quotes (`" "`). + +We load the package by running the following line of code from our script: + +```{r} +#| eval: false +library(tidyverse) +``` + +Note that, rather inconsistently, we do *not* use quotes around the package name when loading it. +::: +::: +::: + ## Summary