diff --git a/_freeze/materials/resampling-practical-permutation-single/execute-results/html.json b/_freeze/materials/resampling-practical-permutation-single/execute-results/html.json index 7470f8a..0562d6b 100644 --- a/_freeze/materials/resampling-practical-permutation-single/execute-results/html.json +++ b/_freeze/materials/resampling-practical-permutation-single/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "504bdd7aa5890270857b7c83688d6a38", + "hash": "f195307e49276edb94cdde194b35780e", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Single permutation tests\"\n---\n\n::: {.cell}\n\n:::\n\n::: {.cell}\n\n:::\n\n\n::: {.callout-tip}\n## Learning outcomes\n\n**Questions**\n\n- How do we analyse data without distributional assumptions?\n\n**Objectives**\n\nPerform Monte Carlo permutation tests for:\n\n- Single predictors\n:::\n\n## Libraries and functions\n\n::: {.callout-note collapse=\"true\"}\n## Click to expand\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n### Libraries\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# A collection of R packages designed for data science\nlibrary(tidyverse)\n```\n:::\n\n\n### Functions\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Takes a random sample with or without replacement (default)\nbase::sample()\n```\n:::\n\n\n## Python\n\n### Libraries\n\n\n::: {.cell}\n\n```{.python .cell-code}\n# A Python data analysis and manipulation tool\nimport pandas as pd\n\n# A Python package for scientific computing\nimport numpy as np\n\n# Python equivalent of `ggplot2`\nfrom plotnine import *\n\n# Python module providing statistical functionality\nfrom scipy import stats\n```\n:::\n\n\n### Functions\n\n\n::: {.cell}\n\n```{.python .cell-code}\n# Calculates the difference between two elements\npandas.DataFrame.diff()\n\n# Randomly permutes a sequence\nnumpy.random.permutation()\n\n# Calculates the interquartile range\nscipy.stats.iqr()\n```\n:::\n\n:::\n:::\n\n## Purpose and aim\n\nIf we wish to test for a difference between two groups in the case where the assumptions of a two-sample t-test just aren’t met, then a two-sample permutation test procedure is appropriate. It is also appropriate even if the assumptions of a t-test are met, but in that case, it would be easier to just do the t-test.\n\nOne of the additional benefits of permutation test is that we aren’t just restricted to testing hypotheses about the means of the two groups. We can test hypotheses about absolutely anything we want! So, we could see if the ranges of the two groups differed significantly etc.\n\n## Data and hypotheses\n\nLet’s consider an experimental data set where we have measured the weights of two groups of 12 female mice (so 24 mice in total). One group of mice was given a perfectly normal diet (`control`) and the other group of mice was given a high fat diet for several months (`highfat`).\n\nWe want to test whether there is any difference in the mean weight of the two groups. 
We still need to specify the hypotheses:\n\n$H_0$: there is no difference in the means of the two groups\n\n$H_1$: there is a difference in the means of the two groups Let’s read in the data and look at it:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmice_weight <- read_csv(\"data/mice_weight.csv\")\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(mice_weight, aes(x = diet, y = weight)) +\n geom_boxplot() +\n geom_jitter(width = 0.1)\n```\n\n::: {.cell-output-display}\n![](resampling-practical-permutation-single_files/figure-html/unnamed-chunk-8-1.png){width=672}\n:::\n:::\n\n\n## Python\n\n\n::: {.cell}\n\n```{.python .cell-code}\nmice_weight_py = pd.read_csv(\"data/mice_weight.csv\")\n```\n:::\n\n::: {.cell}\n\n```{.python .cell-code}\n(ggplot(mice_weight_py,\n aes(x = \"diet\",\n y = \"weight\")) +\n geom_boxplot() +\n geom_jitter(width = 0.1))\n```\n\n::: {.cell-output-display}\n![](resampling-practical-permutation-single_files/figure-html/unnamed-chunk-10-1.png){width=614}\n:::\n:::\n\n\n:::\n\nLooking at the data, it appears that there might be a difference between the mean weight of the two groups. The weights of the mice on the `highfat` diet appears somewhat higher than on `control`, although there is quite some overlap between the data.\n\nThe medians (the horizontal lines in the boxes) are shifted - as are the boxes. So, the first thing we probably want to do is to calculate the exact difference in means:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmice_weight %>% \n group_by(diet) %>% \n summarise(mean_weight = mean(weight)) %>% \n ungroup()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 2 × 2\n diet mean_weight\n \n1 control 23.8\n2 highfat 26.8\n```\n\n\n:::\n:::\n\n\nWe'll want to use this difference later on, so we store it in a variable:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nobs_diff_weight <- mice_weight %>% \n group_by(diet) %>% \n summarise(mean_weight = mean(weight)) %>% \n ungroup() %>% \n # calculate the difference in weight\n # (there are many ways that you can do this)\n pull(mean_weight) %>% \n diff()\n\nobs_diff_weight\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 3.020833\n```\n\n\n:::\n:::\n\n\n## Python\n\n\n::: {.cell}\n\n```{.python .cell-code}\nobs_diff_weight = (mice_weight_py\n .groupby('diet')['weight']\n .mean()\n .diff()\n .iloc[-1])\n```\n:::\n\n\nThis gives us an observed difference of:\n\n\n::: {.cell}\n\n```{.python .cell-code}\nobs_diff_weight\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n3.020833333333332\n```\n\n\n:::\n:::\n\n\n\n:::\n\nWhat we want to know is: is this difference unusual/big/statistically significant? Specifically, how likely would it be to get a difference this big if there were no difference between the two groups?\n\n## Permutation theory\n\nThe key idea behind permutation techniques is that if the null hypothesis is true, and there is no difference between the two groups then if I were to switch some of the mice from one group to the next then this wouldn’t change the difference between the groups too much. 
If on the other hand there actually is a difference between the groups (with one group having much higher weights than the other), then if I were to switch some mice between the groups then this should average out the two groups leading to a smaller difference in group means.\n\nSo, what we do is we shuffle the mice weights around lots and lots of times, calculating the difference between the group means each time. Once we have done this shuffling hundreds or thousands of times, we will have loads of possible values for the difference in the two group means. At this stage we can look at our actual difference (the one we calculated from our original data) and see how this compares to all of the simulated differences.\n\nWe can calculate how many of the simulated differences are bigger than our real difference and this proportion is exactly the p-value that we’re looking for!\n\n## Permutation example\n\nLet look at how to carry this out in practice.\n\nWe need to approach this a bit logically, since we are going to iterate a process multiple times. We can break down the steps into the following:\n\n1. Define the number of permutations.\n2. Permute (randomly assign) the `diet` labels, without replacing.\n3. Calculate the *new difference* in means between the groups.\n4. Store the difference and repeat.\n\n### Permute the data\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nWe will be randomly shuffling our data around. So we set the `seed`, to aid reproducibility for the example.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nseed <- 2602\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nset.seed(seed)\n\n# Set the number of permutations\nreps <- 1000\n\n# Create a place to store the values\npermuted_stats <- tibble(permuted_diff = numeric(reps))\n\n# Loop through process\nfor(i in 1:reps){\n # Get the data \n permuted_data <- mice_weight\n \n # Permute (reshuffle) the group labels\n permuted_data$diet <- sample(permuted_data$diet)\n \n # Calculate the new group differences\n permuted_diff <- permuted_data %>% \n group_by(diet) %>% \n summarise(mean_weight = mean(weight)) %>% \n ungroup() %>% \n pull(mean_weight) %>% \n diff()\n \n # Store the calculated difference\n permuted_stats$permuted_diff[i] <- permuted_diff\n}\n```\n:::\n\n\n## Python\n\n\n::: {.cell}\n\n```{.python .cell-code}\nseed = 2602\n```\n:::\n\n::: {.cell}\n\n```{.python .cell-code}\nnp.random.seed(seed)\n\n# Set the number of permutations\nreps = 1000\n\n# Create a place to store the values\npermuted_stats = pd.DataFrame({'permuted_diff': [0] * reps})\n\n# Loop through process\nfor i in range(reps):\n # Get the data\n permuted_data_py = mice_weight_py\n \n # Permute the group labels\n permuted_data_py['diet'] = (np\n .random\n .permutation(permuted_data_py['diet']\n .values))\n \n # Calculate the new group difference\n permuted_diff = (permuted_data_py \n .groupby('diet')['weight']\n .mean()\n .diff()\n .iloc[-1])\n\n # Store the calculated difference\n permuted_stats['permuted_diff'][i] = permuted_diff\n```\n:::\n\n\n:::\n\n### Comparing permuted values\n\nWe can visualise the difference as a histogram:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(permuted_stats, aes(permuted_diff)) +\n geom_histogram() +\n geom_vline(xintercept = obs_diff_weight, colour = \"blue\", linewidth = 1)\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n`stat_bin()` using `bins = 30`. 
Pick better value with `binwidth`.\n```\n\n\n:::\n\n::: {.cell-output-display}\n![](resampling-practical-permutation-single_files/figure-html/unnamed-chunk-19-1.png){width=672}\n:::\n:::\n\n\n## Python\n\n\n::: {.cell}\n\n```{.python .cell-code}\n(ggplot(permuted_stats,\n aes(x = \"permuted_diff\")) +\n geom_histogram() +\n geom_vline(xintercept = obs_diff_weight, colour = \"blue\", size = 1))\n```\n\n::: {.cell-output-display}\n![](resampling-practical-permutation-single_files/figure-html/unnamed-chunk-20-1.png){width=614}\n:::\n:::\n\n\n:::\n\nThe histogram is centred around zero, as we would expect: under the null hypothesis in this analysis there shouldn’t be any difference between the groups.\n\nThe blue vertical line shows us the value of our actual observed difference.\n\nWe can see that our observed difference is unlikely (because it’s out in the tails of the distribution rather than in the middle), but we want to be able to say exactly how unlikely. To do that we need to calculate the proportion of simulated differences that were bigger than our observed value. And, because we’re interested in a two-tailed test, we also need to include any simulated values that were less than -3.02 (so the lower tail).\n\n### Calculating statistical significance\n\nIn essence, we are calculating the *statistical significance* by comparing our original data against the null hypothesis.\n\nWe do this in the following steps:\n\n1. Count the number of occurrences where the permuted difference (`permuted_diff`) is larger than the observed weight difference (`obs_diff_weight`).\n2. Divide this by the number of times we permuted the data (`reps`)\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\npermuted_stats %>% \n filter(abs(permuted_diff) > obs_diff_weight) %>% \n nrow()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 51\n```\n\n\n:::\n:::\n\n\nIf we divide this number by 1000 (the number of permutations), we get a value of 0.051.\n\nIn this case we have fixed the random number generator. You might not have done that and normally you probably don't want to either. In that case you will get a slightly different value every time you run this. In order to get more precision on this p-value we will need to run more than 1,000 replicates.\n\nA good rule of thumb is that, for 1,000 replicates, a permutation test will return a p-value to within 1% of the actual value (so in this case the p-value is probably between 0.041 and 0.061). If we go to 10,000 replicates, then the error in the estimated p-value reduces to about 0.1%.\n\n## Python\n\n\n::: {.cell}\n\n```{.python .cell-code}\nlarger_diff = len(permuted_stats[permuted_stats['permuted_diff'] \\\n .abs() > obs_diff_weight])\n\nlarger_diff\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n50\n```\n\n\n:::\n:::\n\n\nIf we divide this number by 1000 (the number of permutations), we get a value of 0.05.\n\nIn this case we have fixed the random number generator. You might not have done that and normally you probably don't want to either. In that case you will get a slightly different value every time you run this. In order to get more precision on this p-value we will need to run more than 1,000 replicates.\n\nA good rule of thumb is that, for 1,000 replicates, a permutation test will return a p-value to within 1% of the actual value (so in this case the p-value is probably between 0.04 and 0.06). 
If we go to 10,000 replicates, then the error in the estimated p-value reduces to about 0.1%.\n:::\n\n## Exercises\n\n### Permuting IQR {#sec-exr_iqr}\n\n:::{.callout-exercise}\n\n\n{{< level 2 >}}\n\n\n\nFor this exercise we'll again use the `mice_weight` data from `data/mice_weight.csv`.\n\nOne of the advantages of using permutation tests is that we're not limited to just exploring the mean or median from our data. To practice this, we'll explore differences in the interquartile range (IQR) between the `control` and `highfat` groups.\n\nQuestion: is there a significant difference in the IQR of `weight` between `control` and `highfat` mice?\n\n::: {.callout-answer collapse=\"true\"}\n\nTo address the question, we do the following:\n\n1. Load and visualise the data\n2. Calculate the observed IQR for both groups\n3. Permute the data\n4. Calculate how often the permuted IQR is larger than the observed\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n#### Load and visualise the data\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmice_weight <- read_csv(\"data/mice_weight.csv\")\n```\n:::\n\n\nWe have visualised the data previously.\n\n#### Observed statistic\n\nTo calculate the interquartile range, we use the `IQR()` function. The IQR for each group is as follows:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmice_weight %>% \n group_by(diet) %>% \n summarise(iqr_weight = IQR(weight))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 2 × 2\n diet iqr_weight\n \n1 control 5.04\n2 highfat 5.49\n```\n\n\n:::\n:::\n\n\nThe observed difference in IQR is:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nobs_diff_iqr <- mice_weight %>% \n group_by(diet) %>% \n summarise(iqr_weight = IQR(weight)) %>% \n pull(iqr_weight) %>% \n diff()\n\nobs_diff_iqr\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 0.45\n```\n\n\n:::\n:::\n\n\n#### Permute the data\n\n\n::: {.cell}\n\n```{.r .cell-code}\nseed <- 2602\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nset.seed(seed)\n\n# Set the number of permutations\nreps <- 1000\n\n# Create a place to store the values\npermuted_stats <- tibble(permuted_diff = numeric(reps))\n\n# Loop through process\nfor(i in 1:reps){\n # Get the data \n permuted_data <- mice_weight\n \n # Permute (reshuffle) the group labels\n permuted_data$diet <- sample(permuted_data$diet)\n \n # Calculate the new group differences\n permuted_diff <- permuted_data %>% \n group_by(diet) %>% \n summarise(iqr_weight = IQR(weight)) %>% \n ungroup() %>% \n pull(iqr_weight) %>% \n diff()\n \n # Store the calculated difference\n permuted_stats$permuted_diff[i] <- permuted_diff\n}\n```\n:::\n\n\nWe visualise these as follows:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(permuted_stats, aes(permuted_diff)) +\n geom_histogram() +\n geom_vline(xintercept = obs_diff_iqr, colour = \"blue\", linewidth = 1)\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n`stat_bin()` using `bins = 30`. 
Pick better value with `binwidth`.\n```\n\n\n:::\n\n::: {.cell-output-display}\n![](resampling-practical-permutation-single_files/figure-html/unnamed-chunk-28-1.png){width=672}\n:::\n:::\n\n#### Calculate statistical significance\n\n\n::: {.cell}\n\n```{.r .cell-code}\npermuted_stats %>% \n filter(abs(permuted_diff) > obs_diff_iqr) %>% \n nrow()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 803\n```\n\n\n:::\n:::\n\n\nDividing this by the number of permutations (1000) gives us a p-value of 0.803.\n\n## Python\n\n#### Load and visualise the data\n\n\n::: {.cell}\n\n```{.python .cell-code}\nmice_weight_py = pd.read_csv(\"data/mice_weight.csv\")\n```\n:::\n\n\nWe have visualised the data previously.\n\n#### Observed statistic\n\nWe use the `iqr()` function from `scipy.stats`:\n\n\n::: {.cell}\n\n```{.python .cell-code}\nfrom scipy.stats import iqr\n\nobs_iqr = (mice_weight_py\n .groupby('diet')['weight']\n .agg(iqr))\n\nobs_iqr\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\ndiet\ncontrol 5.0375\nhighfat 5.4875\nName: weight, dtype: float64\n```\n\n\n:::\n:::\n\n\nThis gives us an observed difference of:\n\n\n::: {.cell}\n\n```{.python .cell-code}\nobs_diff_iqr = obs_iqr.diff().iloc[-1]\n\nobs_diff_iqr\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n0.45000000000000284\n```\n\n\n:::\n:::\n\n\n#### Permute the data\n\n\n::: {.cell}\n\n```{.python .cell-code}\nseed = 2602\n```\n:::\n\n::: {.cell}\n\n```{.python .cell-code}\nnp.random.seed(seed)\n\n# Set the number of permutations\nreps = 1000\n\n# Create a place to store the values\npermuted_stats = pd.DataFrame({'permuted_diff': [0] * reps})\n\n# Loop through process\nfor i in range(reps):\n # Get the data\n permuted_data_py = mice_weight_py\n \n # Permute the group labels\n permuted_data_py['diet'] = (np\n .random\n .permutation(permuted_data_py['diet']\n .values))\n \n # Calculate the new group difference\n permuted_iqr = (permuted_data_py\n .groupby('diet')['weight']\n .agg(iqr))\n\n permuted_diff = permuted_iqr.diff().iloc[-1]\n\n # Store the calculated difference\n permuted_stats['permuted_diff'][i] = permuted_diff\n```\n:::\n\n::: {.cell}\n\n```{.python .cell-code}\n(ggplot(permuted_stats,\n aes(x = \"permuted_diff\")) +\n geom_histogram() +\n geom_vline(xintercept = obs_diff_iqr, colour = \"blue\", size = 1))\n```\n\n::: {.cell-output-display}\n![](resampling-practical-permutation-single_files/figure-html/unnamed-chunk-35-1.png){width=614}\n:::\n:::\n\n\n#### Calculate the statistical significance\n\nHere we need to find all the values where the permuted IQR is *smaller* than -0.45 or *larger* than 0.45:\n\n\n::: {.cell}\n\n```{.python .cell-code}\nlarger_diff = len(permuted_stats[permuted_stats['permuted_diff'] \\\n .abs() > obs_diff_iqr])\n\nlarger_diff\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n808\n```\n\n\n:::\n:::\n\n\n\nIf we divide this number by 1000 (the number of permutations), we get a value of 0.808.\n\n:::\n\nThis analysis shows that there is no statistically significant difference between the interquartile range of weight for the two different diets.\n\n\n:::{.callout-note collapse=true}\n## Differences in IQR in R vs Python: would you like to know more?\n\nThe eagle-eyed amongst you might have noticed that the values calculated between R and Python are slightly different. 
Part of this is caused by the difference in how the random number generators work between the two languages (which we're not going to go into) and part of it by the difference in how the IQR is calculated.\n\nIn R, the `IQR()` function uses a default method to calculate the quartiles (\"type-7\"), which excludes the smallest and largest 25% of the data when calculating the quartiles.\n\nIn Python, the `scipy.stats.iqr()` function calculates the interquartile range simply as the difference between the 75th and 25th percentiles.\n\nHence, some slight differences. If you've *really* got your mind set on making them more equivalent you can specify an extra argument in Python: `rng`. You can set it to include the middle 50% of the data: `iqr(data, rng = (25, 75))`.\n:::\n\n:::\n:::\n\n\n\n\n\n\n## Summary\n\n::: {.callout-tip}\n#### Key points\n\n- We can use resampled data to draw conclusions around how likely it is our original data would occur, given the null hypothesis.\n- We can calculate statistical significance using this approach.\n:::\n", + "markdown": "---\ntitle: \"Single permutation tests\"\n---\n\n::: {.cell}\n\n:::\n\n::: {.cell}\n\n:::\n\n\n::: {.callout-tip}\n## Learning outcomes\n\n**Questions**\n\n- How do we analyse data without distributional assumptions?\n\n**Objectives**\n\nPerform Monte Carlo permutation tests for:\n\n- Single predictors\n:::\n\n## Libraries and functions\n\n::: {.callout-note collapse=\"true\"}\n## Click to expand\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n### Libraries\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# A collection of R packages designed for data science\nlibrary(tidyverse)\n```\n:::\n\n\n### Functions\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Takes a random sample with or without replacement (default)\nbase::sample()\n```\n:::\n\n\n## Python\n\n### Libraries\n\n\n::: {.cell}\n\n```{.python .cell-code}\n# A Python data analysis and manipulation tool\nimport pandas as pd\n\n# A Python package for scientific computing\nimport numpy as np\n\n# Python equivalent of `ggplot2`\nfrom plotnine import *\n\n# Python module providing statistical functionality\nfrom scipy import stats\n```\n:::\n\n\n### Functions\n\n\n::: {.cell}\n\n```{.python .cell-code}\n# Calculates the difference between two elements\npandas.DataFrame.diff()\n\n# Randomly permutes a sequence\nnumpy.random.permutation()\n\n# Calculates the interquartile range\nscipy.stats.iqr()\n```\n:::\n\n:::\n:::\n\n## Purpose and aim\n\nIf we wish to test for a difference between two groups in the case where the assumptions of a two-sample t-test just aren’t met, then a two-sample permutation test procedure is appropriate. It is also appropriate even if the assumptions of a t-test are met, but in that case, it would be easier to just do the t-test.\n\nOne of the additional benefits of permutation test is that we aren’t just restricted to testing hypotheses about the means of the two groups. We can test hypotheses about absolutely anything we want! So, we could see if the ranges of the two groups differed significantly etc.\n\n## Data and hypotheses\n\nLet’s consider an experimental data set where we have measured the weights of two groups of 12 female mice (so 24 mice in total). One group of mice was given a perfectly normal diet (`control`) and the other group of mice was given a high fat diet for several months (`highfat`).\n\nWe want to test whether there is any difference in the mean weight of the two groups. 
We still need to specify the hypotheses:\n\n$H_0$: there is no difference in the means of the two groups\n\n$H_1$: there is a difference in the means of the two groups Let’s read in the data and look at it:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmice_weight <- read_csv(\"data/mice_weight.csv\")\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(mice_weight, aes(x = diet, y = weight)) +\n geom_boxplot() +\n geom_jitter(width = 0.1)\n```\n\n::: {.cell-output-display}\n![](resampling-practical-permutation-single_files/figure-html/unnamed-chunk-8-1.png){width=672}\n:::\n:::\n\n\n## Python\n\n\n::: {.cell}\n\n```{.python .cell-code}\nmice_weight_py = pd.read_csv(\"data/mice_weight.csv\")\n```\n:::\n\n::: {.cell}\n\n```{.python .cell-code}\n(ggplot(mice_weight_py,\n aes(x = \"diet\",\n y = \"weight\")) +\n geom_boxplot() +\n geom_jitter(width = 0.1))\n```\n\n::: {.cell-output-display}\n![](resampling-practical-permutation-single_files/figure-html/unnamed-chunk-10-1.png){width=614}\n:::\n:::\n\n\n:::\n\nLooking at the data, it appears that there might be a difference between the mean weight of the two groups. The weights of the mice on the `highfat` diet appears somewhat higher than on `control`, although there is quite some overlap between the data.\n\nThe medians (the horizontal lines in the boxes) are shifted - as are the boxes. So, the first thing we probably want to do is to calculate the exact difference in means:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmice_weight %>% \n group_by(diet) %>% \n summarise(mean_weight = mean(weight)) %>% \n ungroup()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 2 × 2\n diet mean_weight\n \n1 control 23.8\n2 highfat 26.8\n```\n\n\n:::\n:::\n\n\nWe'll want to use this difference later on, so we store it in a variable:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nobs_diff_weight <- mice_weight %>% \n group_by(diet) %>% \n summarise(mean_weight = mean(weight)) %>% \n ungroup() %>% \n # calculate the difference in weight\n # (there are many ways that you can do this)\n pull(mean_weight) %>% \n diff()\n\nobs_diff_weight\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 3.020833\n```\n\n\n:::\n:::\n\n\n## Python\n\n\n::: {.cell}\n\n```{.python .cell-code}\nobs_diff_weight = (mice_weight_py\n .groupby('diet')['weight']\n .mean()\n .diff()\n .iloc[-1])\n```\n:::\n\n\nThis gives us an observed difference of:\n\n\n::: {.cell}\n\n```{.python .cell-code}\nobs_diff_weight\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n3.020833333333332\n```\n\n\n:::\n:::\n\n\n\n:::\n\nWhat we want to know is: is this difference unusual/big/statistically significant? Specifically, how likely would it be to get a difference this big if there were no difference between the two groups?\n\n## Permutation theory\n\nThe key idea behind permutation techniques is that if the null hypothesis is true, and there is no difference between the two groups then if I were to switch some of the mice from one group to the next then this wouldn’t change the difference between the groups too much. 
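\n\nAs a toy illustration of that idea, here is what a single label shuffle looks like. This is a minimal sketch with made-up numbers, not the mice data:\n\n```python\nimport numpy as np\n\nrng = np.random.default_rng(2602)\n\n# made-up weights: first three \"control\", last three \"highfat\"\nweights = np.array([20.0, 21.0, 22.0, 25.0, 26.0, 27.0])\nlabels = np.array([\"control\"] * 3 + [\"highfat\"] * 3)\n\n# one permutation: reshuffle the labels, keep the weights fixed\nshuffled = rng.permutation(labels)\n\n# difference in group means under the permuted labels\nperm_diff = (weights[shuffled == \"highfat\"].mean() -\n             weights[shuffled == \"control\"].mean())\n```\n\nUnder the null hypothesis the reshuffled labels are just as plausible as the real ones, so `perm_diff` will typically sit close to zero. 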
If on the other hand there actually is a difference between the groups (with one group having much higher weights than the other), then if I were to switch some mice between the groups then this should average out the two groups leading to a smaller difference in group means.\n\nSo, what we do is we shuffle the mice weights around lots and lots of times, calculating the difference between the group means each time. Once we have done this shuffling hundreds or thousands of times, we will have loads of possible values for the difference in the two group means. At this stage we can look at our actual difference (the one we calculated from our original data) and see how this compares to all of the simulated differences.\n\nWe can calculate how many of the simulated differences are bigger than our real difference and this proportion is exactly the p-value that we’re looking for!\n\n## Permutation example\n\nLet look at how to carry this out in practice.\n\nWe need to approach this a bit logically, since we are going to iterate a process multiple times. We can break down the steps into the following:\n\n1. Define the number of permutations.\n2. Permute (randomly assign) the `diet` labels, without replacing.\n3. Calculate the *new difference* in means between the groups.\n4. Store the difference and repeat.\n\n### Permute the data\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nWe will be randomly shuffling our data around. So we set the `seed`, to aid reproducibility for the example.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nseed <- 2602\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nset.seed(seed)\n\n# Set the number of permutations\nreps <- 1000\n\n# Create a place to store the values\npermuted_stats <- tibble(permuted_diff = numeric(reps))\n\n# Loop through process\nfor(i in 1:reps){\n # Get the data \n permuted_data <- mice_weight\n \n # Permute (reshuffle) the group labels\n permuted_data$diet <- sample(permuted_data$diet)\n \n # Calculate the new group differences\n permuted_diff <- permuted_data %>% \n group_by(diet) %>% \n summarise(mean_weight = mean(weight)) %>% \n ungroup() %>% \n pull(mean_weight) %>% \n diff()\n \n # Store the calculated difference\n permuted_stats$permuted_diff[i] <- permuted_diff\n}\n```\n:::\n\n\n## Python\n\n\n::: {.cell}\n\n```{.python .cell-code}\nseed = 2602\n```\n:::\n\n::: {.cell}\n\n```{.python .cell-code}\nnp.random.seed(seed)\n\n# Set the number of permutations\nreps = 1000\n\n# Create a place to store the values\npermuted_stats = pd.DataFrame({'permuted_diff': [0] * reps})\n\n# Loop through process\nfor i in range(reps):\n # Get the data\n permuted_data_py = mice_weight_py\n \n # Permute the group labels\n permuted_data_py['diet'] = (np\n .random\n .permutation(permuted_data_py['diet']\n .values))\n \n # Calculate the new group difference\n permuted_diff = (permuted_data_py \n .groupby('diet')['weight']\n .mean()\n .diff()\n .iloc[-1])\n\n # Store the calculated difference\n permuted_stats['permuted_diff'][i] = permuted_diff\n```\n:::\n\n\n:::\n\n### Comparing permuted values\n\nWe can visualise the difference as a histogram:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(permuted_stats, aes(permuted_diff)) +\n geom_histogram() +\n geom_vline(xintercept = obs_diff_weight, colour = \"blue\", linewidth = 1)\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n`stat_bin()` using `bins = 30`. 
Pick better value with `binwidth`.\n```\n\n\n:::\n\n::: {.cell-output-display}\n![](resampling-practical-permutation-single_files/figure-html/unnamed-chunk-19-1.png){width=672}\n:::\n:::\n\n\n## Python\n\n\n::: {.cell}\n\n```{.python .cell-code}\n(ggplot(permuted_stats,\n aes(x = \"permuted_diff\")) +\n geom_histogram() +\n geom_vline(xintercept = obs_diff_weight, colour = \"blue\", size = 1))\n```\n\n::: {.cell-output-display}\n![](resampling-practical-permutation-single_files/figure-html/unnamed-chunk-20-1.png){width=614}\n:::\n:::\n\n\n:::\n\nThe histogram is centred around zero, as we would expect: under the null hypothesis in this analysis there shouldn’t be any difference between the groups.\n\nThe blue vertical line shows us the value of our actual observed difference.\n\nWe can see that our observed difference is unlikely (because it’s out in the tails of the distribution rather than in the middle), but we want to be able to say exactly how unlikely. To do that we need to calculate the proportion of simulated differences that were bigger than our observed value. And, because we’re interested in a two-tailed test, we also need to include any simulated values that were less than -3.02 (so the lower tail).\n\n### Calculating statistical significance\n\nIn essence, we are calculating the *statistical significance* by comparing our original data against the null hypothesis.\n\nWe do this in the following steps:\n\n1. Count the number of occurrences where the permuted difference (`permuted_diff`) is larger than the observed weight difference (`obs_diff_weight`).\n2. Divide this by the number of times we permuted the data (`reps`)\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\npermuted_stats %>% \n filter(abs(permuted_diff) > obs_diff_weight) %>% \n nrow()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 51\n```\n\n\n:::\n:::\n\n\nIf we divide this number by 1000 (the number of permutations), we get a value of 0.051.\n\nIn this case we have fixed the random number generator. You might not have done that and normally you probably don't want to either. In that case you will get a slightly different value every time you run this. In order to get more precision on this p-value we will need to run more than 1,000 replicates.\n\nA good rule of thumb is that, for 1,000 replicates, a permutation test will return a p-value to within 1% of the actual value (so in this case the p-value is probably between 0.041 and 0.061). If we go to 10,000 replicates, then the error in the estimated p-value reduces to about 0.1%.\n\n## Python\n\n\n::: {.cell}\n\n```{.python .cell-code}\nlarger_diff = len(permuted_stats[permuted_stats['permuted_diff'] \\\n .abs() > obs_diff_weight])\n\nlarger_diff\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n50\n```\n\n\n:::\n:::\n\n\nIf we divide this number by 1000 (the number of permutations), we get a value of 0.05.\n\nIn this case we have fixed the random number generator. You might not have done that and normally you probably don't want to either. In that case you will get a slightly different value every time you run this. In order to get more precision on this p-value we will need to run more than 1,000 replicates.\n\nA good rule of thumb is that, for 1,000 replicates, a permutation test will return a p-value to within 1% of the actual value (so in this case the p-value is probably between 0.04 and 0.06). 
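\n\nThis rule of thumb follows from the fact that an estimated p-value is just a proportion, so its Monte Carlo standard error is roughly `sqrt(p * (1 - p) / reps)`. A minimal sketch of that calculation (plain Python, using the numbers from above):\n\n```python\nimport math\n\np_hat = 0.05   # estimated p-value\nreps = 1000    # number of permutations\n\n# standard error of a proportion estimated from `reps` random draws\nmc_se = math.sqrt(p_hat * (1 - p_hat) / reps)  # about 0.007\n```\n\nTwo standard errors is about 0.014, which is in the same ballpark as the 1% quoted above. 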
If we go to 10,000 replicates, then the error in the estimated p-value reduces to about 0.1%.\n:::\n\n## Exercises\n\n### Permuting IQR {#sec-exr_iqr}\n\n:::{.callout-exercise}\n\n\n{{< level 2 >}}\n\n\n\nFor this exercise we'll again use the `mice_weight` data from `data/mice_weight.csv`.\n\nOne of the advantages of using permutation tests is that we're not limited to just exploring the mean or median from our data. To practice this, we'll explore differences in the interquartile range (IQR) between the `control` and `highfat` groups.\n\nQuestion: is there a significant difference in the IQR of `weight` between `control` and `highfat` mice?\n\n::: {.callout-answer collapse=\"true\"}\n\nTo address the question, we do the following:\n\n1. Load and visualise the data\n2. Calculate the observed IQR for both groups\n3. Permute the data\n4. Calculate how often the permuted IQR is larger than the observed\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n#### Load and visualise the data\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmice_weight <- read_csv(\"data/mice_weight.csv\")\n```\n:::\n\n\nWe have visualised the data previously.\n\n#### Observed statistic\n\nTo calculate the interquartile range, we use the `IQR()` function. The IQR for each group is as follows:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmice_weight %>% \n group_by(diet) %>% \n summarise(iqr_weight = IQR(weight))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 2 × 2\n diet iqr_weight\n \n1 control 5.04\n2 highfat 5.49\n```\n\n\n:::\n:::\n\n\nThe observed difference in IQR is:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nobs_diff_iqr <- mice_weight %>% \n group_by(diet) %>% \n summarise(iqr_weight = IQR(weight)) %>% \n pull(iqr_weight) %>% \n diff()\n\nobs_diff_iqr\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 0.45\n```\n\n\n:::\n:::\n\n\n#### Permute the data\n\n\n::: {.cell}\n\n```{.r .cell-code}\nseed <- 2602\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nset.seed(seed)\n\n# Set the number of permutations\nreps <- 1000\n\n# Create a place to store the values\npermuted_stats <- tibble(permuted_diff = numeric(reps))\n\n# Loop through process\nfor(i in 1:reps){\n # Get the data \n permuted_data <- mice_weight\n \n # Permute (reshuffle) the group labels\n permuted_data$diet <- sample(permuted_data$diet)\n \n # Calculate the new group differences\n permuted_diff <- permuted_data %>% \n group_by(diet) %>% \n summarise(iqr_weight = IQR(weight)) %>% \n ungroup() %>% \n pull(iqr_weight) %>% \n diff()\n \n # Store the calculated difference\n permuted_stats$permuted_diff[i] <- permuted_diff\n}\n```\n:::\n\n\nWe visualise these as follows:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(permuted_stats, aes(permuted_diff)) +\n geom_histogram() +\n geom_vline(xintercept = obs_diff_iqr, colour = \"blue\", linewidth = 1)\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n`stat_bin()` using `bins = 30`. 
Pick better value with `binwidth`.\n```\n\n\n:::\n\n::: {.cell-output-display}\n![](resampling-practical-permutation-single_files/figure-html/unnamed-chunk-28-1.png){width=672}\n:::\n:::\n\n#### Calculate statistical significance\n\n\n::: {.cell}\n\n```{.r .cell-code}\npermuted_stats %>% \n filter(abs(permuted_diff) > obs_diff_iqr) %>% \n nrow()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 803\n```\n\n\n:::\n:::\n\n\nDividing this by the number of permutations (1000) gives us a p-value of 0.803.\n\n## Python\n\n#### Load and visualise the data\n\n\n::: {.cell}\n\n```{.python .cell-code}\nmice_weight_py = pd.read_csv(\"data/mice_weight.csv\")\n```\n:::\n\n\nWe have visualised the data previously.\n\n#### Observed statistic\n\nWe use the `iqr()` function from `scipy.stats`:\n\n\n::: {.cell}\n\n```{.python .cell-code}\nfrom scipy.stats import iqr\n\nobs_iqr = (mice_weight_py\n .groupby('diet')['weight']\n .agg(iqr))\n\nobs_iqr\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\ndiet\ncontrol 5.0375\nhighfat 5.4875\nName: weight, dtype: float64\n```\n\n\n:::\n:::\n\n\nThis gives us an observed difference of:\n\n\n::: {.cell}\n\n```{.python .cell-code}\nobs_diff_iqr = obs_iqr.diff().iloc[-1]\n\nobs_diff_iqr\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n0.45000000000000284\n```\n\n\n:::\n:::\n\n\n#### Permute the data\n\n\n::: {.cell}\n\n```{.python .cell-code}\nseed = 2602\n```\n:::\n\n::: {.cell}\n\n```{.python .cell-code}\nnp.random.seed(seed)\n\n# Set the number of permutations\nreps = 1000\n\n# Create a place to store the values\npermuted_stats = pd.DataFrame({'permuted_diff': [0] * reps})\n\n# Loop through process\nfor i in range(reps):\n # Get the data\n permuted_data_py = mice_weight_py\n \n # Permute the group labels\n permuted_data_py['diet'] = (np\n .random\n .permutation(permuted_data_py['diet']\n .values))\n \n # Calculate the new group difference\n permuted_iqr = (permuted_data_py\n .groupby('diet')['weight']\n .agg(iqr))\n\n permuted_diff = permuted_iqr.diff().iloc[-1]\n\n # Store the calculated difference\n permuted_stats['permuted_diff'][i] = permuted_diff\n```\n:::\n\n::: {.cell}\n\n```{.python .cell-code}\n(ggplot(permuted_stats,\n aes(x = \"permuted_diff\")) +\n geom_histogram() +\n geom_vline(xintercept = obs_diff_iqr, colour = \"blue\", size = 1))\n```\n\n::: {.cell-output-display}\n![](resampling-practical-permutation-single_files/figure-html/unnamed-chunk-35-1.png){width=614}\n:::\n:::\n\n\n#### Calculate the statistical significance\n\nHere we need to find all the values where the permuted IQR is *smaller* than -0.45 or *larger* than 0.45:\n\n\n::: {.cell}\n\n```{.python .cell-code}\nlarger_diff = len(permuted_stats[permuted_stats['permuted_diff'] \\\n .abs() > obs_diff_iqr])\n\nlarger_diff\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n808\n```\n\n\n:::\n:::\n\n\n\nIf we divide this number by 1000 (the number of permutations), we get a value of 0.808.\n\n:::\n\nThis analysis shows that there is no statistically significant difference between the interquartile range of weight for the two different diets.\n\n\n:::{.callout-note collapse=true}\n## Differences in IQR in R vs Python: would you like to know more?\n\nThe eagle-eyed amongst you might have noticed that the values calculated between R and Python are slightly different. 
This is caused by the difference in how the random number generators work between the two languages (which we're not going to go into), so the permuted counts (803 in R, 808 in Python) will not match exactly.\n\nThe interquartile ranges themselves actually agree: R's `IQR()` uses the default \"type 7\" quantile method, which linearly interpolates between observations, and `scipy.stats.iqr()` applies the same linear interpolation by default. The R output only looks different (5.04 vs 5.0375) because tibbles print to three significant figures.\n\nIf you do want to tweak how the quartiles are calculated, both functions are configurable: `IQR()` takes a `type` argument in R, and `scipy.stats.iqr()` takes an `rng` argument, which already defaults to the middle 50% of the data: `iqr(data, rng = (25, 75))`.\n:::\n\n:::\n:::\n\n### Rats - strange metrics {#sec-exr_ratswheel}\n\n:::{.callout-exercise}\n\n\n{{< level 2 >}}\n\n\n\nFor this exercise we'll be using the data from `data/rats_wheel.csv`.\n\nThis data set contains information on the length of time that 24 rats were able to stay balanced on a rotating wheel. Half of the rats were assigned to the control group and the other half were given a dose of a centrally acting muscle relaxant. The animals were placed on a rotating cylinder and the length of time that each rat remained on the cylinder was measured, up to a maximum of 300 seconds.\n\nThe data set contains two variables: `time` and `group`.\n\nWhilst you could explore differences in means between these two groups, in this case an alternative statistic presents itself. When you look at the data you should notice that, in the control group, all 12 rats manage to stay on the wheel for the maximum 300 seconds, whereas in the treated group 5 out of the 12 fall off earlier.\n\nFor this exercise, instead of calculating the mean length of time for each group, you should calculate the proportion of rats that make it to 300s in each group and find the difference. This will be your statistic.\n\nAnswer the following questions:\n\n1. Is the proportion of rats that remain on the wheel for the entire duration of the experiment the same in each group? Use a permutation test to explore this.\n2. 
Why would the difference in medians be a particularly useless statistic in this case?\n\n::: {.callout-answer collapse=\"true\"}\n\nLoad and visualise the data.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\nrats_wheel <- read_csv(\"data/rats_wheel.csv\")\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\nRows: 24 Columns: 2\n── Column specification ────────────────────────────────────────────────────────\nDelimiter: \",\"\nchr (1): group\ndbl (1): time\n\nℹ Use `spec()` to retrieve the full column specification for this data.\nℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.\n```\n\n\n:::\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(rats_wheel, aes(x = group, y = time)) +\n geom_boxplot() +\n geom_jitter(width = 0.1)\n```\n\n::: {.cell-output-display}\n![](resampling-practical-permutation-single_files/figure-html/unnamed-chunk-38-1.png){width=672}\n:::\n:::\n\n\n\n## Python\n\n\n::: {.cell}\n\n```{.python .cell-code}\nrats_wheel_py = pd.read_csv(\"data/rats_wheel.csv\")\n```\n:::\n\n::: {.cell}\n\n```{.python .cell-code}\n(ggplot(rats_wheel_py,\n aes(x = \"group\",\n y = \"time\")) +\n geom_boxplot() +\n geom_jitter(width = 0.1))\n```\n\n::: {.cell-output-display}\n![](resampling-practical-permutation-single_files/figure-html/unnamed-chunk-40-1.png){width=614}\n:::\n:::\n\n\n:::\n\nWe can immediately see what the issue is with these data. All of the `control` subjects stayed on the wheel for 300s. If we would check the diagnostic plots for these data then it would not look very promising. For example, I am confident that the equality of variance assumption will not hold here!\n\nSo, let's do what we're asked to do and calculate the proportion of rats that make it to 300s, for each group.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nThere are many ways of doing this, but here is one:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nprop_rats <- rats_wheel %>% \n group_by(group, time) %>% \n count() %>% \n group_by(group) %>% \n mutate(group_n = sum(n),\n prop_rats = n / group_n) %>% \n filter(time == 300)\n\nprop_rats\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 2 × 5\n# Groups: group [2]\n group time n group_n prop_rats\n \n1 control 300 12 12 1 \n2 treatment 300 7 12 0.583\n```\n\n\n:::\n:::\n\nNext, we calculate the *difference* in the proportion of rats that make it to 300s, between the two groups.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nobs_diff_prop <- prop_rats %>% \n pull(prop_rats) %>% \n diff()\n\nobs_diff_prop\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] -0.4166667\n```\n\n\n:::\n:::\n\n\n## Python\n\nThere are many ways of doing this, but here's one:\n\n\n::: {.cell}\n\n```{.python .cell-code}\n# Calculate the total number of observations in each group\nrats_wheel_py['group_n'] = rats_wheel_py.groupby('group')['group'].transform('size')\n\n# Count the number of occurrences for each unique time point\n# and keep the total group count\nprop_rats_py = rats_wheel_py.groupby(['group', 'group_n', 'time']).size().reset_index(name = 'n')\n\n# Calculate the proportion of rats that make it to each time point\nprop_rats_py['prop_rats'] = prop_rats_py['n'] / prop_rats_py['group_n']\n\n# Filter for the 300s time point\nprop_rats_py = prop_rats_py[prop_rats_py['time'] == 300]\n\nprop_rats_py\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n group group_n time n prop_rats\n0 control 12 300 12 1.000000\n6 treatment 12 300 7 0.583333\n```\n\n\n:::\n:::\n\n\nNext, we 
calculate the *difference* in the proportion of rats that make it to 300s, between the two groups.\n\n\n::: {.cell}\n\n```{.python .cell-code}\nobs_diff_prop = prop_rats_py['prop_rats'].diff().iloc[-1]\n\nobs_diff_prop\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n-0.41666666666666663\n```\n\n\n:::\n:::\n\n:::\n\nNow we've got that out of the way, we can permute our data. We're reshuffling the `group` labels randomly, then recalculating the permuted proportional difference at time point 300s.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\nseed <- 2602\n```\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nset.seed(seed)\n\n# Set the number of permutations\nreps <- 1000\n\n# Create a place to store the values\npermuted_stats <- tibble(permuted_diff = numeric(reps))\n\n# Loop through process\nfor(i in 1:reps){\n # Get the data \n permuted_data <- rats_wheel\n \n # Permute (reshuffle) the group labels\n permuted_data$group <- sample(permuted_data$group)\n \n # Calculate the new proportional differences\n \n permuted_diff <- permuted_data %>% \n group_by(group, time) %>% \n count() %>% \n group_by(group) %>% \n mutate(group_n = sum(n),\n prop_rats = n / group_n) %>% \n filter(time == 300) %>% \n pull(prop_rats) %>% \n diff()\n \n # Store the calculated difference\n permuted_stats$permuted_diff[i] <- permuted_diff\n}\n```\n:::\n\n\n## Python\n\n:::\n\nWe can now compare the permuted values:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(permuted_stats, aes(permuted_diff)) +\n geom_histogram() +\n geom_vline(xintercept = obs_diff_prop, colour = \"blue\", linewidth = 1) +\n geom_vline(xintercept = abs(obs_diff_prop), colour = \"blue\", linewidth = 1)\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.\n```\n\n\n:::\n\n::: {.cell-output-display}\n![](resampling-practical-permutation-single_files/figure-html/unnamed-chunk-47-1.png){width=672}\n:::\n:::\n\n\n\n## Python\n\n:::\n\nWe can now answer the question if the proportion of rats that make it to full time is different between the groups. We do this by comparing the number of occurrences in the resampled data against the original data. How many times is the difference in proportion larger than the observed difference in proportion? Remember, we are doing a two-tailed test, so we need to get the values on either side of the observed proportion.\n\nIn this case the observed difference in proportion between `control` and `treatment` is negative, which we need to take into account.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\npermuted_stats %>% \n filter(permuted_diff < obs_diff_prop |\n permuted_diff > abs(obs_diff_prop))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 0 × 1\n# ℹ 1 variable: permuted_diff \n```\n\n\n:::\n:::\n\n\n\n## Python\n\n:::\n\nWe find that none of the permuted differences in proportion between `control` and `treatment` that make it to 300s is outside the one we observed. This means that it is extremely unlikely that we'd find these data, if there is indeed no difference between the two groups.\n\nTo answer the second question: the median is a particularly useless statistic to use with these data because there is no variation in the measurements for the `control` group. 
All of the values are 300s, meaning you can't find a value where 50% of the data is on one side of it and 50% of the data is on the other!\n:::\n:::\n\n\n\n\n\n\n## Summary\n\n::: {.callout-tip}\n#### Key points\n\n- We can use resampled data to draw conclusions around how likely it is our original data would occur, given the null hypothesis.\n- We can calculate statistical significance using this approach.\n:::\n", "supporting": [ "resampling-practical-permutation-single_files" ], diff --git a/_freeze/materials/resampling-practical-permutation-single/figure-html/unnamed-chunk-10-1.png b/_freeze/materials/resampling-practical-permutation-single/figure-html/unnamed-chunk-10-1.png index 0fbece2..0162f60 100644 Binary files a/_freeze/materials/resampling-practical-permutation-single/figure-html/unnamed-chunk-10-1.png and b/_freeze/materials/resampling-practical-permutation-single/figure-html/unnamed-chunk-10-1.png differ diff --git a/_freeze/materials/resampling-practical-permutation-single/figure-html/unnamed-chunk-38-1.png b/_freeze/materials/resampling-practical-permutation-single/figure-html/unnamed-chunk-38-1.png new file mode 100644 index 0000000..5b7d709 Binary files /dev/null and b/_freeze/materials/resampling-practical-permutation-single/figure-html/unnamed-chunk-38-1.png differ diff --git a/_freeze/materials/resampling-practical-permutation-single/figure-html/unnamed-chunk-40-1.png b/_freeze/materials/resampling-practical-permutation-single/figure-html/unnamed-chunk-40-1.png new file mode 100644 index 0000000..8e08e61 Binary files /dev/null and b/_freeze/materials/resampling-practical-permutation-single/figure-html/unnamed-chunk-40-1.png differ diff --git a/_freeze/materials/resampling-practical-permutation-single/figure-html/unnamed-chunk-47-1.png b/_freeze/materials/resampling-practical-permutation-single/figure-html/unnamed-chunk-47-1.png new file mode 100644 index 0000000..60c2b15 Binary files /dev/null and b/_freeze/materials/resampling-practical-permutation-single/figure-html/unnamed-chunk-47-1.png differ diff --git a/_freeze/materials/resampling-practical-permutation-single/figure-html/unnamed-chunk-8-1.png b/_freeze/materials/resampling-practical-permutation-single/figure-html/unnamed-chunk-8-1.png index e40e5d5..2c0ee2a 100644 Binary files a/_freeze/materials/resampling-practical-permutation-single/figure-html/unnamed-chunk-8-1.png and b/_freeze/materials/resampling-practical-permutation-single/figure-html/unnamed-chunk-8-1.png differ diff --git a/materials/data/rats_wheel.csv b/materials/data/rats_wheel.csv new file mode 100644 index 0000000..9310420 --- /dev/null +++ b/materials/data/rats_wheel.csv @@ -0,0 +1,25 @@ +"time","group" +300,"control" +300,"control" +300,"control" +300,"control" +300,"control" +300,"control" +300,"control" +300,"control" +300,"control" +300,"control" +300,"control" +300,"control" +22,"treatment" +300,"treatment" +75,"treatment" +271,"treatment" +300,"treatment" +18,"treatment" +300,"treatment" +300,"treatment" +163,"treatment" +300,"treatment" +300,"treatment" +300,"treatment" diff --git a/materials/resampling-practical-permutation-single.qmd b/materials/resampling-practical-permutation-single.qmd index 630f54b..6878e4b 100644 --- a/materials/resampling-practical-permutation-single.qmd +++ b/materials/resampling-practical-permutation-single.qmd @@ -579,6 +579,210 @@ Hence, some slight differences. 
If you've *really* got your mind set on making t
 :::
 :::
+### Rats - strange metrics {#sec-exr_ratswheel}
+
+:::{.callout-exercise}
+
+{{< level 2 >}}
+
+For this exercise we'll be using the data from `data/rats_wheel.csv`.
+
+This data set contains information on the length of time that 24 rats were able to stay balanced on a rotating wheel. Half of the rats were assigned to the control group and the other half were given a dose of a centrally acting muscle relaxant. The animals were placed on a rotating cylinder and the length of time that each rat remained on the cylinder was measured, up to a maximum of 300 seconds.
+
+The data set contains two variables: `time` and `group`.
+
+Whilst you could explore differences in means between these two groups, in this case an alternative statistic presents itself. When you look at the data you should notice that, in the control group, all 12 rats manage to stay on the wheel for the maximum 300 seconds, whereas in the treated group 5 out of the 12 fall off earlier.
+
+For this exercise, instead of calculating the mean length of time for each group, you should calculate the proportion of rats that make it to 300s in each group and find the difference. This will be your statistic.
+
+Answer the following questions:
+
+1. Is the proportion of rats that remain on the wheel for the entire duration of the experiment the same in each group? Use a permutation test to explore this.
+2. Why would the difference in medians be a particularly useless statistic in this case?
+
+::: {.callout-answer collapse="true"}
+
+Load and visualise the data.
+
+::: {.panel-tabset group="language"}
+## R
+
+```{r}
+rats_wheel <- read_csv("data/rats_wheel.csv")
+```
+
+```{r}
+ggplot(rats_wheel, aes(x = group, y = time)) +
+  geom_boxplot() +
+  geom_jitter(width = 0.1)
+```
+
+
+## Python
+
+```{python}
+rats_wheel_py = pd.read_csv("data/rats_wheel.csv")
+```
+
+```{python}
+#| results: hide
+(ggplot(rats_wheel_py,
+        aes(x = "group",
+            y = "time")) +
+     geom_boxplot() +
+     geom_jitter(width = 0.1))
+```
+
+:::
+
+We can immediately see what the issue is with these data. All of the `control` subjects stayed on the wheel for 300s. If we were to check the diagnostic plots for these data, they would not look very promising. For example, I am confident that the equality of variance assumption will not hold here!
+
+So, let's do what we're asked to do and calculate the proportion of rats that make it to 300s, for each group.
+
+::: {.panel-tabset group="language"}
+## R
+
+There are many ways of doing this, but here is one:
+
+```{r}
+prop_rats <- rats_wheel %>% 
+  group_by(group, time) %>% 
+  count() %>% 
+  group_by(group) %>% 
+  mutate(group_n = sum(n),
+         prop_rats = n / group_n) %>% 
+  filter(time == 300)
+
+prop_rats
+```
+
+Next, we calculate the *difference* in the proportion of rats that make it to 300s, between the two groups. 
+ 
+```{r}
+obs_diff_prop <- prop_rats %>% 
+  pull(prop_rats) %>% 
+  diff()
+
+obs_diff_prop
+```
+
+## Python
+
+There are many ways of doing this, but here's one:
+
+```{python}
+# Calculate the total number of observations in each group
+rats_wheel_py['group_n'] = rats_wheel_py.groupby('group')['group'].transform('size')
+
+# Count the number of occurrences for each unique time point
+# and keep the total group count
+prop_rats_py = rats_wheel_py.groupby(['group', 'group_n', 'time']).size().reset_index(name = 'n')
+
+# Calculate the proportion of rats that make it to each time point
+prop_rats_py['prop_rats'] = prop_rats_py['n'] / prop_rats_py['group_n']
+
+# Filter for the 300s time point
+prop_rats_py = prop_rats_py[prop_rats_py['time'] == 300]
+
+prop_rats_py
+```
+
+Next, we calculate the *difference* in the proportion of rats that make it to 300s, between the two groups.
+
+```{python}
+obs_diff_prop = prop_rats_py['prop_rats'].diff().iloc[-1]
+
+obs_diff_prop
+```
+:::
+
+Now we've got that out of the way, we can permute our data. We're reshuffling the `group` labels randomly, then recalculating the permuted proportional difference at time point 300s.
+
+::: {.panel-tabset group="language"}
+## R
+
+```{r}
+seed <- 2602
+```
+
+```{r}
+set.seed(seed)
+
+# Set the number of permutations
+reps <- 1000
+
+# Create a place to store the values
+permuted_stats <- tibble(permuted_diff = numeric(reps))
+
+# Loop through process
+for(i in 1:reps){
+  # Get the data 
+  permuted_data <- rats_wheel
+  
+  # Permute (reshuffle) the group labels
+  permuted_data$group <- sample(permuted_data$group)
+  
+  # Calculate the new proportional differences
+  permuted_diff <- permuted_data %>% 
+    group_by(group, time) %>% 
+    count() %>% 
+    group_by(group) %>% 
+    mutate(group_n = sum(n),
+           prop_rats = n / group_n) %>% 
+    filter(time == 300) %>% 
+    pull(prop_rats) %>% 
+    diff()
+  
+  # Store the calculated difference
+  permuted_stats$permuted_diff[i] <- permuted_diff
+}
+```
+
+## Python
+
+:::
+
+We can now compare the permuted values:
+
+::: {.panel-tabset group="language"}
+## R
+
+```{r}
+ggplot(permuted_stats, aes(permuted_diff)) +
+  geom_histogram() +
+  geom_vline(xintercept = obs_diff_prop, colour = "blue", linewidth = 1) +
+  geom_vline(xintercept = abs(obs_diff_prop), colour = "blue", linewidth = 1)
+```
+
+
+## Python
+
+:::
+
+We can now answer the question of whether the proportion of rats that make it to full time differs between the groups. We do this by comparing the number of occurrences in the resampled data against the original data. How many times is the difference in proportion larger than the observed difference in proportion? Remember, we are doing a two-tailed test, so we need to get the values on either side of the observed proportion.
+
+In this case the observed difference in proportion between `control` and `treatment` is negative (`diff()` subtracts the alphabetically first group, `control`, from `treatment`), which we need to take into account.
+
+::: {.panel-tabset group="language"}
+## R
+
+```{r}
+permuted_stats %>% 
+  filter(permuted_diff < obs_diff_prop |
+         permuted_diff > abs(obs_diff_prop))
+```
+
+
+## Python
+
+:::
+
+We find that none of the permuted differences in the proportion of rats that make it to 300s falls outside the observed value. This means that it is extremely unlikely that we would find these data if there were indeed no difference between the two groups.
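+
+Because zero out of 1,000 permuted differences were at least as extreme as the observed one, the raw proportion would give a p-value of exactly zero. A common convention for Monte Carlo p-values is to count the observed labelling itself as one of the permutations, giving `(k + 1) / (reps + 1)`. A minimal sketch of that calculation (plain Python, not an executed cell):
+
+```python
+k = 0        # permuted differences at least as extreme as the observed one
+reps = 1000  # number of permutations
+
+# add-one Monte Carlo estimate: the observed labelling counts as one draw
+p_value = (k + 1) / (reps + 1)  # roughly 0.001
+```
+
+So we would report p ≈ 0.001 rather than p = 0.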
+
+To answer the second question: the median is a particularly useless statistic to use with these data because there is no variation in the measurements for the `control` group. All of the values are 300s, so the `control` median is stuck at 300. More than half of the `treatment` group also lasts the full 300s, so its median is 300 as well. The difference in medians is therefore zero, and it stays zero for every possible permutation, because at most 5 of the 12 values in either group can fall below 300!
+:::
+:::