Philipp Masur 2022-12
This tutorial outlines power analyses for simple t-tests (mean differences). It is the second part of an ongoing series on power analyses. The first part, which covers power analyses for bivariate regression analyses, can be found here. We recommend going through that tutorial before working on this one.
This tutorial is going to be comparatively short. It shows how Monte Carlo simulations can be used to explore mean differences in more detail, particularly if one has no clear idea about the standardized effect size of a mean difference.
First, let’s assume we have an idea about the strength of the mean difference of interest in terms of Cohen’s d. In this case, we can simply calculate the necessary sample size like so:
# Load package
library(pwr)

# Compute necessary sample size
pwr.t.test(d = .2,
           sig.level = .05,
           power = .80,
           alternative = "two.sided")
## 
##      Two-sample t test power calculation 
## 
##               n = 393.4057
##               d = 0.2
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
In this scenario, we would need at least n = 394 participants per group (rounding 393.41 up), i.e., n = 788 participants overall for our study.
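Because the required sample size grows roughly with 1/d², small changes in the assumed effect size matter a lot. As a quick illustrative sketch (the specific d values are our own choice here, following Cohen's conventional small/medium/large benchmarks):

# Required per-group sample size for several assumed effect sizes
ds <- c(.2, .5, .8)
sapply(ds, function(d) {
  ceiling(pwr.t.test(d = d, sig.level = .05, power = .80,
                     alternative = "two.sided")$n)
})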
When we do not have clear assumptions about the standardized effect size, we can use simulations to produce data that correspond to our assumptions about the variables' distributions.
Let’s assume we have a simple experiment in which participants are either exposed to a stimulus or not. We call the groups control and treatment, respectively. Using the function rnorm(), we can create two normally distributed vectors. Here, we assume that the control group on average scores 3 on the outcome variable and the treatment group on average 4. We further assume that both variables have a standard deviation of 1.
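As an aside, under these particular assumptions we could still compute Cohen's d analytically: with equal standard deviations, d is simply the mean difference divided by the standard deviation, i.e., d = (4 − 3)/1 = 1. A minimal cross-check with pwr (our own addition; the simulation approach pays off once the assumptions get more complicated):

# With equal SDs, Cohen's d = mean difference / SD = (4 - 3) / 1 = 1
pwr.t.test(d = (4 - 3) / 1,
           sig.level = .05,
           power = .80,
           alternative = "two.sided")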
We can plot these differences quickly and also run a t-test to check whether such an effect becomes significant with 100 people (per group!).
library(tidyverse)

# To ensure reproducibility
set.seed(42)

# Simulate data
n <- 100
control   <- rnorm(n = n, mean = 3, sd = 1)
treatment <- rnorm(n = n, mean = 4, sd = 1)

# Plot differences
ggplot(NULL) +
  geom_density(aes(x = control), alpha = .4, fill = "blue") +
  geom_density(aes(x = treatment), alpha = .4, fill = "red") +
  theme_classic() +
  labs(x = "Distributions of control vs. treatment")
# Combine into data set
d <- tibble(group = rep(c("control", "treatment"), each = n),
            outcome = c(control, treatment))

# Run the t-test
t.test(outcome ~ group, d)
## 
##  Welch Two Sample t-test
## 
## data:  outcome by group
## t = -6.3809, df = 194.18, p-value = 1.263e-09
## alternative hypothesis: true difference in means between group control and group treatment is not equal to 0
## 95 percent confidence interval:
##  -1.1519980 -0.6080049
## sample estimates:
##    mean in group control mean in group treatment 
##                 3.032515                3.912516
Indeed, this particular draw suggests that a total sample of 200 would be enough to detect this effect. But a single simulation is just one random draw. What is the power in this case?
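A first answer comes from simply repeating the simulation above many times and computing the proportion of significant t-tests. A minimal sketch for this one combination (the number of repetitions, 1,000, is an arbitrary choice):

# Repeat the simulation 1,000 times and store the p-values
set.seed(42)
p_values <- replicate(1000, {
  control <- rnorm(n = 100, mean = 3, sd = 1)
  treatment <- rnorm(n = 100, mean = 4, sd = 1)
  t.test(control, treatment)$p.value
})

# Proportion of significant results = simulated power
mean(p_values < .05)

This is exactly the logic we now generalize to a whole grid of parameter combinations.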
We can wrap the simulation code in a function that allows us to vary all parameters of interest (including the sample size, the means, and the standard deviations) and that automatically extracts the p-value from the t-test. Then, we “map” this function over the grid of possible combinations: the function sim_ttest, which we create below, thereby takes the respective columns of the grid as input. The number of simulations per combination will be 1,000 in this case, but should be higher to arrive at stable solutions (e.g., > 10,000).
sim_ttest <- function(n = 100, control = 3, treatment = 4, sd1 = 1, sd2 = NULL) {
  
  # If no second standard deviation is provided, assume equal variances
  if (is.null(sd2)) {
    sd2 <- sd1
  }
  
  # Simulate both groups and combine them into one data set
  control <- rnorm(n = n, mean = control, sd = sd1)
  treatment <- rnorm(n = n, mean = treatment, sd = sd2)
  d <- tibble(group = rep(c("control", "treatment"), each = n),
              outcome = c(control, treatment))
  
  # Run the t-test and return the p-value
  value <- t.test(outcome ~ group, d)$p.value
  names(value) <- "value"
  return(value)
}
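To check that the function behaves as intended, we can call it once with arbitrary parameter values (the values below are purely illustrative):

# A single simulated p-value; results vary from run to run
set.seed(1)
sim_ttest(n = 50, control = 3, treatment = 3.5, sd1 = 1.5)

With the function in place, we can now map it over the full grid: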
results <- expand.grid(n = seq(20, 400, by = 20),   # Sample sizes
                       control = 3,                 # Mean of the control group (baseline)
                       treatment = c(3.5, 4, 4.5),  # Investigated means of the treatment group
                       sd1 = c(1, 1.5, 2),          # Standard deviations for both variables
                       times = 1:1000) %>%          # Number of simulations per combination
  dplyr::select(-times) %>%
  mutate(pmap_df(., sim_ttest))

head(results)
| n   | control | treatment | sd1 | value     |
|-----|---------|-----------|-----|-----------|
| 20  | 3       | 3.5       | 1   | 0.0796459 |
| 40  | 3       | 3.5       | 1   | 0.1209476 |
| 60  | 3       | 3.5       | 1   | 0.0040551 |
| 80  | 3       | 3.5       | 1   | 0.0035720 |
| 100 | 3       | 3.5       | 1   | 0.0017552 |
| 120 | 3       | 3.5       | 1   | 0.0006338 |
As we can see, the code above produces a results table with one p-value per simulation run (180 parameter combinations × 1,000 runs = 180,000 rows!). We can now summarize the results by computing, for each combination, the percentage of p-values below .05, and plot these estimates as power curves.
results %>%
  mutate(sig = ifelse(value < .05, TRUE, FALSE)) %>%
  group_by(n, treatment, sd1) %>%
  summarize(power = (sum(sig)/length(sig))*100) %>%
  ggplot(aes(x = n, y = power, color = factor(sd1))) +
  geom_point() +
  geom_line() +
  facet_wrap(~treatment) +
  geom_hline(yintercept = 80, linetype = "dotted", color = "red") +
  scale_color_brewer(palette = "Dark2") +
  theme_bw() +
  labs(x = "Sample size (n)", y = "Power (1 - beta)",
       color = "Standard deviations")
As we can see, we need many more participants (a) when the mean difference is small (0.5 vs. 1 vs. 1.5 across the panels) and (b) when the standard deviations are larger!
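If we want concrete numbers rather than curves, we can also extract, for each combination, the smallest simulated sample size that reached 80% power. A small follow-up sketch (our own addition; it reads a rough threshold off the 20-step grid rather than solving for n exactly):

# For each combination of treatment mean and standard deviation,
# find the smallest simulated per-group n that reached 80% power
results %>%
  mutate(sig = value < .05) %>%
  group_by(n, treatment, sd1) %>%
  summarize(power = mean(sig) * 100, .groups = "drop") %>%
  filter(power >= 80) %>%
  group_by(treatment, sd1) %>%
  summarize(min_n = min(n), .groups = "drop")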