diff --git a/.nojekyll b/.nojekyll
index 5e18935..3539e69 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-66339ad3
\ No newline at end of file
+905f3538
\ No newline at end of file
diff --git a/index.html b/index.html
index 4f13ac7..253b38a 100644
--- a/index.html
+++ b/index.html
@@ -298,7 +298,7 @@
Tutorial
Setting the seed – How can you generate the same random numbers?
Sample size n – How many values should you generate within a simulation?
Simulate to check alpha – Write your first simulation and check the rate of false-positive findings.
Simulate to check power – Simulate data to perform a power analysis.
diff --git a/search.json b/search.json
index ef08926..998577d 100644
--- a/search.json
+++ b/search.json
@@ -26,7 +26,7 @@
"href": "tutorial_pages/repeat.html",
"title": "Repetition",
"section": "",
- "text": "Repetition\nThe function\n\nreplicate(nrep, expression) repeats the expression provided nrep times.\n\nFor example, replicate(10, mean(rnorm(100))) reads: ‘Draw 100 values from a normal distribution with a mean of 0 and a standard deviation of 1 (the default values of rnorm(n, mean, sd)), calculate the mean of these 100 values, and do all that 10 times.’\n\nYOUR TURN:\nIn your local exercise script:\n1. Repeat 1000 times the calculation of the mean of 10 values drawn from a uniform distribution between 0 and 10.\n2. Repeat 100 times the calculation of the mean of 50 values drawn from a normal distribution with a mean of 10 and a standard deviation of 5.\n3. Make a histogram of your results for each task. Are the distributions looking as expected?\n\n\n\n\n\n Back to top",
+ "text": "Repetition\nThe function replicate(nrep, expression) repeats the expression provided nrep times.\nFor example, replicate(10, mean(rnorm(100))) reads: ‘Draw 100 values from a normal distribution with a mean of 0 and a standard deviation of 1 (the default values of rnorm(n, mean, sd)), calculate the mean of these 100 values, and do all that 10 times.’\n\nYOUR TURN:\nIn your local exercise script:\n1. Repeat 1000 times the calculation of the mean of 10 values drawn from a uniform distribution between 0 and 10.\n2. Repeat 100 times the calculation of the mean of 50 values drawn from a normal distribution with a mean of 10 and a standard deviation of 5.\n3. Make a histogram of your results for each task. Are the distributions looking as expected?\n\n\n\n\n\n Back to top",
"crumbs": [
"Tutorial",
"Repeat"
@@ -37,7 +37,7 @@
"href": "tutorial_pages/random-numbers-generators.html",
"title": "Random number generators",
"section": "",
- "text": "Random number generators\nR contains several functions to generate random numbers.\nType ?function in your console to get information on the function’s arguments (i.e. the values that must be provided to obtain the function’s result).\nThe function sample(x, n, replace = FALSE) draws n values from a given vector x without replacement (by default).\nSampling without replacement means that when you repeatedly draw e.g. one item at a time from a pool of items, any item selected during the first draw is not available for selection during the second draw, and the first and second selected items are not in the pool to select from during the third draw, etc. Sampling with replacement means that all the original options are available at each draw.\n\nYOUR TURN:\nSample 100 values between 3 and 103 with replacement. For this, open the file ./exercise_script.R from the root of your local repository (with or without answers), review the examples if needed, complete the exercise, and check out the proposed answer.\n\nThe following functions draw n values from distributions with the specified parameters:\n\nrunif(n, min, max) draws n values from a uniform distribution with the specified min and max.\n\nrpois(n, lambda) draws n values from a Poisson distribution with the specified lambda.\n\nrnorm(n, mean, sd) draws n values from a normal distribution with the specified mean and standard deviation sd.\n\nrbinom(n, prob) draws n values from a binomial distribution with the specified probability prob.\n\n\nYOUR TURN:\n1. Draw 100 values from a normal distribution with a mean of 0 and a standard deviation of 1.\n2. Draw 50 values from a normal distribution with a mean of 10 and a standard deviation of 5.\n3. Draw 1000 values from a Poisson distribution with a lambda of 50.\n4. Draw 30 values from a uniform distribution between 0 and 10.\nTry it out in your local exercise script.\n\n\n\n\n\n Back to top",
+ "text": "Random number generators\nR contains several functions to generate random numbers.\nType ?function in your console to get information on the function’s arguments (i.e. the values that must be provided to obtain the function’s result).\nThe function sample(x, n, replace = FALSE) draws n values from a given vector x without replacement (by default).\nSampling without replacement means that when you repeatedly draw e.g. one item at a time from a pool of items, any item selected during the first draw is not available for selection during the second draw, and the first and second selected items are not in the pool to select from during the third draw, etc. Sampling with replacement means that all the original options are available at each draw.\n\nYOUR TURN:\nSample 100 values between 3 and 103 with replacement. For this, open the R script(s) with the exercises (./exercise_script_with_solutions.R and/or ./exercise_script_without_solutions.R) from the root of your local repository, review the examples if needed, complete the exercise, and check out the proposed answer.\n\nThe following functions draw n values from distributions with the specified parameters:\n\nrunif(n, min, max) draws n values from a uniform distribution with the specified min and max.\n\nrpois(n, lambda) draws n values from a Poisson distribution with the specified lambda.\n\nrnorm(n, mean, sd) draws n values from a normal distribution with the specified mean and standard deviation sd.\n\nrbinom(n, size, prob) draws n values from a binomial distribution with the specified number of trials size and probability of success prob.\n\n\nYOUR TURN:\n1. Draw 100 values from a normal distribution with a mean of 0 and a standard deviation of 1.\n2. Draw 50 values from a normal distribution with a mean of 10 and a standard deviation of 5.\n3. Draw 1000 values from a Poisson distribution with a lambda of 50.\n4. Draw 30 values from a uniform distribution between 0 and 10.\nTry it out in your local exercise script.\n\n\n\n\n\n Back to top",
"crumbs": [
"Tutorial",
"Random number generators"
@@ -103,7 +103,7 @@
"href": "tutorial_pages/check-power.html",
"title": "Checking power through simulations",
"section": "",
- "text": "Checking power through simulations\nThe power of a statistical test tells us the probability that the test correctly rejects the null hypothesis. In other words, if we only examine true effects, the power is the proportion of tests that will (correctly) reject the null hypothesis. Often, the power is set to 80%, though, as with alpha = 0.05, this is an arbitrary choice.\nGenerally, we want to do power analysis before collecting data, to work out the sample size we need to detect some effect. If we are calculating a required sample size, the power analysis can also be called a sample size calculation.\nTaking the example of a t-test, we need to understand a few parameters:\n\nn, the sample size.\ndelta, the difference in means that you want to be able to detect. Deciding what this value should be is tricky. You might rely on estimates from the literature (though bear in mind they are likely to be inflated), or you can use a minimally important difference, which is the threshold below which you do not consider a difference interesting enough to be worth detecting. In a clinical trial, for example, this might be the smallest difference that a patient would care about.\nsd, the standard deviation. Usually, this needs to be estimated from the literature or pilot studies.\nsig.level, the alpha, as discussed previously.\npower, the power as defined above.\n\nYou can calculate any one of these parameters, given all of the others. We usually want to specify, delta, sd, sig.level and power and calculate the required sample size.\nWe can calculate the required sample size for a t-test using:\npower.t.test(n = NULL, delta = 0.5, sd = 1, sig.level = 0.05, power = 0.8)\nNotice that n = NULL, so this parameter is calculated.\nThe sample size n we need, given this set of parameters, is 64 per group.\nJust as we can check the alpha of our test by sampling from the same distribution (i.e. 
simulating data without an effect), we can check the power by sampling from different distributions (i.e. simulating data with an effect).\nIf we sample values from two normal distributions with different means (e.g. N(0,1) and N(0.5,1)), what is the minimum sample size we need to detect a significant difference in means with a t-test 80% of the time?\n\nYOUR TURN:\n1. Use your simulation skills to work out the power through simulation. Write a function that does the following:\n\nDraws n values from a random normal distribution with mean1 and another n values from a normal distribution with mean2.\nCompares the means of these two samples with a t-test and extracts the p-value.\n\n\nReplicate the function 1000 times using the parameters used in the power calculation above (that used the power.t.test() function).\nCalculate the proportion of p-values that are smaller than 0.05.\n\n\np-values of t-tests comparing means from 1000 simulations of N(0,1) and N(0.5,1) with n = 64:\n \n\nThe proportion of correctly rejected null hypotheses in the simulation is close to 0.8, which is what we would expect.\nUsing simulations for power analysis is not really necessary for simple examples like a t-test, though it is useful to check your understanding.\nWhen analyses become complex and it is hard or impossible to determine a sample size analytically (i.e. you can’t calculate it, or there’s no suitable function to use), then simulations are an indispensable tool.\nA simple example of a power analysis like the one you’ve just done can be found in the “Power analysis” section of this paper:\n\nBlanco, D., Schroter, S., Aldcroft, A., Moher, D., Boutron, I., Kirkham, J. J., & Cobo, E. (2020). Effect of an editorial intervention to improve the completeness of reporting of randomised trials: a randomised controlled trial. BMJ Open, 10(5), e036799. 
https://doi.org/10.1136/bmjopen-2020-036799\n\nA complete self-paced tutorial to simulate data for power analysis of complex statistical designs can be found here:\n\nhttps://lmu-osc.github.io/Simulations-for-Advanced-Power-Analyses/\n\n\n\n\n\n\n Back to top",
+ "text": "Checking power through simulations\nThe power of a statistical test tells us the probability that the test correctly rejects the null hypothesis when it is false. In other words, if we only examine true effects, the power is the proportion of tests that will (correctly) reject the null hypothesis. Often, the power is set to 80%, though, as with alpha = 0.05, this is an arbitrary choice.\nGenerally, we want to do power analysis before collecting data, to work out the sample size we need to detect some effect. If we are calculating a required sample size, the power analysis can also be called a sample size calculation.\nTaking the example of a t-test, we need to understand a few parameters:\n\nn, the sample size.\ndelta, the difference in means that you want to be able to detect. Deciding what this value should be is tricky. You might rely on estimates from the literature (though bear in mind they are likely to be inflated), or you can use a minimally important difference, which is the threshold below which you do not consider a difference interesting enough to be worth detecting. In a clinical trial, for example, this might be the smallest difference that a patient would care about.\nsd, the standard deviation. Usually, this needs to be estimated from the literature or pilot studies.\nsig.level, the alpha, as discussed previously.\npower, the power as defined above.\n\nYou can calculate any one of these parameters, given all of the others. We usually want to specify delta, sd, sig.level and power, and calculate the required sample size.\nWe can calculate the required sample size for a t-test using:\npower.t.test(n = NULL, delta = 0.5, sd = 1, sig.level = 0.05, power = 0.8)\nNotice that n = NULL, so this parameter is calculated.\nThe sample size n we need, given this set of parameters, is 64 per group.\nJust as we can check the alpha of our test by sampling from the same distribution (i.e. 
simulating data without an effect), we can check the power by sampling from different distributions (i.e. simulating data with an effect).\nIf we sample values from two normal distributions with different means (e.g. N(0,1) and N(0.5,1)), what is the minimum sample size we need to detect a significant difference in means with a t-test 80% of the time?\n\nYOUR TURN:\n1. Use your simulation skills to work out the power through simulation. Write a function that does the following: i) Draws n values from a random normal distribution with mean1 and another n values from a normal distribution with mean2. ii) Compares the means of these two samples with a t-test and extracts the p-value. 2. Replicate the function 1000 times using the parameters used in the power calculation above (that used the power.t.test() function). 3. Calculate the proportion of p-values that are smaller than 0.05.\n\np-values of t-tests comparing means from 1000 simulations of N(0,1) and N(0.5,1) with n = 64:\n \n\nThe proportion of correctly rejected null hypotheses in the simulation is close to 0.8, which is what we would expect.\nUsing simulations for power analysis is not really necessary for simple examples like a t-test, though it is useful to check your understanding.\nWhen analyses become complex and it is hard or impossible to determine a sample size analytically (i.e. you can’t calculate it, or there’s no suitable function to use), then simulations are an indispensable tool.\nA simple example of a power analysis like the one you’ve just done can be found in the “Power analysis” section of this paper:\n\nBlanco, D., Schroter, S., Aldcroft, A., Moher, D., Boutron, I., Kirkham, J. J., & Cobo, E. (2020). Effect of an editorial intervention to improve the completeness of reporting of randomised trials: a randomised controlled trial. BMJ Open, 10(5), e036799. 
https://doi.org/10.1136/bmjopen-2020-036799\n\nA complete self-paced tutorial to simulate data for power analysis of complex statistical designs can be found here:\n\nhttps://lmu-osc.github.io/Simulations-for-Advanced-Power-Analyses/\n\n\n\n\n\n\n Back to top",
"crumbs": [
"Tutorial",
"Simulate to check power"
@@ -125,7 +125,7 @@
"href": "index.html",
"title": "Introduction to Simulations in R",
"section": "",
- "text": "This tutorial was created by Malika Ihle based on materials from Joel Pick, Hadley Wickham, and Kevin Hallgren, with contributions from James Smith.\nIt is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.\n\n\n\n\nHave R and RStudio installed. If you don’t, follow these instructions.\n\nKnow some R basics (e.g. how to select a value in a data frame, how to create a vector). If you don’t, visit the following tutorial: https://lmu-osc.github.io/introduction-to-R/.\n\n\n\n\n\nWatch this 30-minute introduction to credible research, which contextualises the importance of simulations for reliable research.\nRead Hallgren, A. K. (2013). Conducting simulation studies in the R programming environment. Tutorials in Quantitative Methods for Psychology, 9(2), 43–60.\n\n\n\n\n\n\nThe self-paced tutorial (pages linked below) will alternate presentation of concepts and simple exercises for you to try to apply them in R. Each time you see written YOUR TURN, switch to your local copy of the exercise script (you can choose between a file with or without the solutions depending on e.g. your level of familiarity with R), review the examples if needed, complete the exercise, and check out the proposed answer (which often contains additional tips). Come back to the online tutorial and after finishing one page, you can navigate to the next page linked at the bottom to continue. The exercise script contains code for all the exercises and code that generates the plots that appear in the online tutorial, all in order of appearance in the tutorial.\nIt is necessary that you work through the sections of the tutorial in order. Please read the blurbs of each section below to get an overview of this workshop. Then click on the first page ‘Download the material’ and follow along by navigating to the next page linked at the bottom of each page! 
You can get back to this overview at any time by clicking on the title ‘Introduction-Simulations-in-R’ at the top of each page.\n\n\n\n\nDownload the material – Get this tutorial onto your machine.\nDefinition – What are simulations?\nPurpose – What can we use simulations for?\nBasic principles – What do we need to create a simulation?\nRandom number generators – How to generate random numbers in R?\nRepeat – How to repeat the generation of random numbers multiple times?\nSetting the seed – How can you generate the same random numbers?\nSample size n – How many values should you generate within a simulation?\nNumber of simulations nrep – How many repeats of a simulation should you run?\nDry rule – How to write your own functions?\nSimulate to check alpha – Write your first simulation and check the rate of false-positive findings.\n\nSimulate to check power – Simulate data to perform a power analysis.\n\nSimulate to prepare a preregistration – Simulate data to test statistical analyses before preregistering them.\n\nGeneral structure – What is the general structure of a simulation?\nLimitations – What are the limitations of simulations?\nReal-life example – What are real-life examples of simulations?\nAdditional resources – What resources can help you write your own simulation?",
+ "text": "This tutorial was created by Malika Ihle based on materials from Joel Pick, Hadley Wickham, and Kevin Hallgren, with contributions from James Smith.\nIt is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.\n\n\n\n\nHave R and RStudio installed. If you don’t, follow these instructions.\n\nKnow some R basics (e.g. how to select a value in a data frame, how to create a vector). If you don’t, visit the following tutorial: https://lmu-osc.github.io/introduction-to-R/.\n\n\n\n\n\nWatch this 30-minute introduction to credible research, which contextualises the importance of simulations for reliable research.\nRead Hallgren, A. K. (2013). Conducting simulation studies in the R programming environment. Tutorials in Quantitative Methods for Psychology, 9(2), 43–60.\n\n\n\n\n\n\nThe self-paced tutorial (pages linked below) will alternate presentation of concepts and simple exercises for you to try to apply them in R. Each time you see written YOUR TURN, switch to your local copy of the exercise script (you can choose between a file with or without the solutions depending on e.g. your level of familiarity with R), review the examples if needed, complete the exercise, and check out the proposed answer (which often contains additional tips). Come back to the online tutorial and after finishing one page, you can navigate to the next page linked at the bottom to continue. The exercise script contains code for all the exercises and code that generates the plots that appear in the online tutorial, all in order of appearance in the tutorial.\nIt is necessary that you work through the sections of the tutorial in order. Please read the blurbs of each section below to get an overview of this workshop. Then click on the first page ‘Download the material’ and follow along by navigating to the next page linked at the bottom of each page! 
You can get back to this overview at any time by clicking on the title ‘Introduction-Simulations-in-R’ at the top of each page.\n\n\n\n\nDownload the material – Get this tutorial onto your machine.\nDefinition – What are simulations?\nPurpose – What can we use simulations for?\nBasic principles – What do we need to create a simulation?\nRandom number generators – How to generate random numbers in R?\nRepeat – How to repeat the generation of random numbers multiple times?\nSetting the seed – How can you generate the same random numbers?\nSample size n – How many values should you generate within a simulation?\nNumber of simulations nrep – How many repeats of a simulation should you run?\nDRY rule – How to write your own functions?\nSimulate to check alpha – Write your first simulation and check the rate of false-positive findings.\n\nSimulate to check power – Simulate data to perform a power analysis.\n\nSimulate to prepare a preregistration – Simulate data to test statistical analyses before preregistering them.\n\nGeneral structure – What is the general structure of a simulation?\nLimitations – What are the limitations of simulations?\nReal-life example – What are real-life examples of simulations?\nAdditional resources – What resources can help you write your own simulation?",
"crumbs": [
"Home"
]
@@ -165,7 +165,7 @@
"href": "index.html#self-paced-workshop",
"title": "Introduction to Simulations in R",
"section": "",
- "text": "The self-paced tutorial (pages linked below) will alternate presentation of concepts and simple exercises for you to try to apply them in R. Each time you see written YOUR TURN, switch to your local copy of the exercise script (you can choose between a file with or without the solutions depending on e.g. your level of familiarity with R), review the examples if needed, complete the exercise, and check out the proposed answer (which often contains additional tips). Come back to the online tutorial and after finishing one page, you can navigate to the next page linked at the bottom to continue. The exercise script contains code for all the exercises and code that generates the plots that appear in the online tutorial, all in order of appearance in the tutorial.\nIt is necessary that you work through the sections of the tutorial in order. Please read the blurbs of each section below to get an overview of this workshop. Then click on the first page ‘Download the material’ and follow along by navigating to the next page linked at the bottom of each page! 
You can get back to this overview at any time by clicking on the title ‘Introduction-Simulations-in-R’ at the top of each page.\n\n\n\n\nDownload the material – Get this tutorial onto your machine.\nDefinition – What are simulations?\nPurpose – What can we use simulations for?\nBasic principles – What do we need to create a simulation?\nRandom number generators – How to generate random numbers in R?\nRepeat – How to repeat the generation of random numbers multiple times?\nSetting the seed – How can you generate the same random numbers?\nSample size n – How many values should you generate within a simulation?\nNumber of simulations nrep – How many repeats of a simulation should you run?\nDry rule – How to write your own functions?\nSimulate to check alpha – Write your first simulation and check the rate of false-positive findings.\n\nSimulate to check power – Simulate data to perform a power analysis.\n\nSimulate to prepare a preregistration – Simulate data to test statistical analyses before preregistering them.\n\nGeneral structure – What is the general structure of a simulation?\nLimitations – What are the limitations of simulations?\nReal-life example – What are real-life examples of simulations?\nAdditional resources – What resources can help you write your own simulation?",
+ "text": "The self-paced tutorial (pages linked below) will alternate presentation of concepts and simple exercises for you to try to apply them in R. Each time you see written YOUR TURN, switch to your local copy of the exercise script (you can choose between a file with or without the solutions depending on e.g. your level of familiarity with R), review the examples if needed, complete the exercise, and check out the proposed answer (which often contains additional tips). Come back to the online tutorial and after finishing one page, you can navigate to the next page linked at the bottom to continue. The exercise script contains code for all the exercises and code that generates the plots that appear in the online tutorial, all in order of appearance in the tutorial.\nIt is necessary that you work through the sections of the tutorial in order. Please read the blurbs of each section below to get an overview of this workshop. Then click on the first page ‘Download the material’ and follow along by navigating to the next page linked at the bottom of each page! 
You can get back to this overview at any time by clicking on the title ‘Introduction-Simulations-in-R’ at the top of each page.\n\n\n\n\nDownload the material – Get this tutorial onto your machine.\nDefinition – What are simulations?\nPurpose – What can we use simulations for?\nBasic principles – What do we need to create a simulation?\nRandom number generators – How to generate random numbers in R?\nRepeat – How to repeat the generation of random numbers multiple times?\nSetting the seed – How can you generate the same random numbers?\nSample size n – How many values should you generate within a simulation?\nNumber of simulations nrep – How many repeats of a simulation should you run?\nDRY rule – How to write your own functions?\nSimulate to check alpha – Write your first simulation and check the rate of false-positive findings.\n\nSimulate to check power – Simulate data to perform a power analysis.\n\nSimulate to prepare a preregistration – Simulate data to test statistical analyses before preregistering them.\n\nGeneral structure – What is the general structure of a simulation?\nLimitations – What are the limitations of simulations?\nReal-life example – What are real-life examples of simulations?\nAdditional resources – What resources can help you write your own simulation?",
"crumbs": [
"Home"
]
@@ -175,7 +175,7 @@
"href": "tutorial_pages/check-alpha.html",
"title": "Using simulations to check alpha",
"section": "",
- "text": "Using simulations to check alpha\nIn most quantitative sciences, we accept a type I error rate of 0.05, which is often called the alpha or significance level. This value tells us the probability of rejecting the null hypothesis (i.e. of finding an effect) given that the null hypothesis is true.\nIn other words, if there is no true effect (e.g. no difference between two groups), we would expect our null hypothesis of no effect to be rejected (incorrectly) (alpha * 100)% of the time.\nIf you draw from the same normal distribution twice, will the mean of the two samples differ significantly in 5% of the cases?\n\nYOUR TURN:\n1. Figure out how to do a t-test in R.\n2. Generate two vectors of 10 values drawn from N(0,1) and compare them with a t-test.\n3. Figure out how to extract the p-value from that object (explore your R object with the functions str() or names()).\n4. Write a function simT() that generates two vectors of n values drawn from a standard normal distribution (N(0,1)), compares them with a t-test, and returns the p-value.\n5. Test your function by calling it for n = 50.\n6. For n = 10, generate nrep = 20 repetitions and draw a histogram.\n7. Repeat the previous task with nrep = 100.\n\np-values of t-tests comparing means from 20 or 100 simulations of N(0,1) with n = 10:\n \n\nIn the first case, where nrep = 20, we expect 1 out of the 20 tests to be significant (5%). In my case, I did! How many did you get?\nIn the second case, where nrep = 100, we expect 5 out of the 100 tests to be significant. In my case, I got 6. How many did you get?\nAre those deviations meaningful? Are they significant?\n\nYOUR TURN:\n1. Plot a histogram of nrep = 1000 outputs of the function simT with n = 10.\n2. Plot a histogram of nrep = 1000 outputs of the function simT with n = 100.\n\np-values of t-tests comparing means from 1000 simulations of N(0,1) with n=10 or n=100:\n \n\nIn both cases, we expect 50 out of the 1000 tests to be significant by chance (i.e. 
with a p-value under 0.05). In my simulations, I get 40 and 45 false positive results, for n = 10 and n = 100, respectively. How many did you get?\nThese proportions are not significantly different from 5%.\nprop.test(45, 1000, p = 0.05, alternative = \"two.sided\", correct = TRUE)\n 1-sample proportions test with continuity correction\n data: 45 out of 1000, null probability 0.05\n X-squared = 0.42632, df = 1, p-value = 0.5138\nIt is important to note that, although alpha = 0.05 is commonly used, this is an arbitrary choice and you should consider what is an appropriate type 1 error rate for your particular investigation.\nAlthough it isn’t necessary to check that a statistical analysis as simple as a t-test does not yield more than 5% false-positive results, in situations where the structure of the data is complex and analysed with more advanced models (e.g. when explanatory variables are mathematically linked to each other or are combined in a mixed-effect model), this may allow to compare different modelling approaches and select one that does not produce more than 5% false-positive results.\nSuch complex example, where simulation is the only viable approach to construct a statistical model that does not lead to spurious effects, can be found in this paper:\n\nIhle, M., Pick, J. L., Winney, I. S., Nakagawa, S., & Burke, T. (2019). Measuring Up to Reality: Null Models and Analysis Simulations to Study Parental Coordination Over Provisioning Offspring. Frontiers in Ecology and Evolution, 7, 142. https://doi.org/10.3389/fevo.2019.00142\n\n\n\n\n\n\n Back to top",
+ "text": "Using simulations to check alpha\nIn most quantitative sciences, we accept a type I error rate of 0.05, which is often called the alpha or significance level. This value tells us the probability of rejecting the null hypothesis (i.e. of finding an effect) given that the null hypothesis is true.\nIn other words, if there is no true effect (e.g. no difference between two groups), we would expect our null hypothesis of no effect to be rejected (incorrectly) (alpha * 100)% of the time.\nIf you draw from the same normal distribution twice, will the mean of the two samples differ significantly in 5% of the cases?\n\nYOUR TURN:\n1. Figure out how to do a t-test in R.\n2. Generate two vectors of 10 values drawn from N(0,1) and compare them with a t-test.\n3. Figure out how to extract the p-value from that object (explore your R object with the functions str() or names()).\n4. Write a function simT() that generates two vectors of n values drawn from a standard normal distribution (N(0,1)), compares them with a t-test, and returns the p-value.\n5. Test your function by calling it for n = 50.\n6. For n = 10, generate nrep = 20 repetitions and draw a histogram.\n7. Repeat the previous task with nrep = 100.\n\np-values of t-tests comparing means from 20 or 100 simulations of N(0,1) with n = 10:\n \n\nIn the first case, where nrep = 20, we expect 1 out of the 20 tests to be significant (5%). In my case, I did! How many did you get?\nIn the second case, where nrep = 100, we expect 5 out of the 100 tests to be significant. In my case, I got 6. How many did you get?\nAre those deviations meaningful? Are they significant?\n\nYOUR TURN:\n1. Plot a histogram of nrep = 1000 outputs of the function simT with n = 10.\n2. Plot a histogram of nrep = 1000 outputs of the function simT with n = 100.\n\np-values of t-tests comparing means from 1000 simulations of N(0,1) with n=10 or n=100:\n \n\nIn both cases, we expect 50 out of the 1000 tests to be significant by chance (i.e. 
with a p-value under 0.05). In my simulations, I get 40 and 45 false-positive results, for n = 10 and n = 100, respectively. How many did you get?\nThese proportions are not significantly different from 5%.\nprop.test(45, 1000, p = 0.05, alternative = \"two.sided\", correct = TRUE)\n\n1-sample proportions test with continuity correction\ndata: 45 out of 1000, null probability 0.05\nX-squared = 0.42632, df = 1, p-value = 0.5138\n\nIt is important to note that, although alpha = 0.05 is commonly used, this is an arbitrary choice and you should consider what is an appropriate type I error rate for your particular investigation.\nAlthough it isn’t necessary to check that a statistical analysis as simple as a t-test does not yield more than 5% false-positive results, in situations where the structure of the data is complex and analysed with more advanced models (e.g. when explanatory variables are mathematically linked to each other or are combined in a mixed-effect model), such a check makes it possible to compare different modelling approaches and select one that does not produce more than 5% false-positive results.\nSuch a complex example, where simulation is the only viable approach to construct a statistical model that does not lead to spurious effects, can be found in this paper:\n\nIhle, M., Pick, J. L., Winney, I. S., Nakagawa, S., & Burke, T. (2019). Measuring Up to Reality: Null Models and Analysis Simulations to Study Parental Coordination Over Provisioning Offspring. Frontiers in Ecology and Evolution, 7, 142. https://doi.org/10.3389/fevo.2019.00142\n\n\n\n\n\n\n Back to top",
"crumbs": [
"Tutorial",
"Simulate to check alpha"
@@ -241,7 +241,7 @@
"href": "tutorial_pages/real-life-example.html",
"title": "Real-life example",
"section": "",
- "text": "Real-life example\nThis is a walk through one relatively simple simulation written to check whether the following two models would provide the same results:\n\nA generalized linear model based on a contingency table of counts (Poisson distribution).\n\nA generalized linear model with one line per observation and the occurrence of the variable of interest coded as ‘Yes’ or ‘No’ (binomial distribution).\n\nI created this code while preparing my preregistration for a simple behavioural ecology experiment about methods for independently manipulating palatability and colour in small insect prey (article, OSF preregistration).\nThe R script screenshot below, glm_Freq_vs_YN.R, can be found in the folder Ihle2020.\n\nThis walkthrough will use the steps as defined on the page ‘General structure’.\n\nDefine sample sizes (within a dataset and number of replicates), experimental design (fixed dataset structure, e.g. treatment groups, factors), and parameters that will need to vary (here, the strength of the effect).\n\n\nGenerate data (here, using sample() and the probabilities defined in step 1) and format it in two different ways to accommodate the two statistical tests to be compared.\n\n\nRun the statistical test and save the parameter estimate of interest for that iteration. Here, this is done for both statistical tests to be compared.\n\n\nReplicate steps 2 (data simulation) and 3 (data analyses) to get the distribution of the parameter estimates by wrapping these steps in a function.\nDefinition of the function at the beginning: \n Output returned from the function at the end: \n Replicate the function nrep times. Here, pbreplicate() is used to provide a bar of progress for R to run this command. \n\nExplore the parameter space. Here, vary the probabilities of sampling between 0 and 1 depending on the treatment group category.\n\n\nAnalyse and interpret the combined results of many simulations. 
In this case, the results of the two models were qualitatively the same (comparison of results for a few simulations), and both models gave the same expected 5% false positive results when no effect was simulated. Varying the effect (the probability of sampling 0 or 1 depending on the experimental treatment) allowed us to find the minimum effect size for which the number of positive results of the tests is over 80%.\n\n\n\n\n\n\n\n\n Back to top",
+ "text": "Real-life example\nThis is a walk through one relatively simple simulation written to check whether the following two models would provide the same results:\n\nA generalized linear model based on a contingency table of counts (Poisson distribution).\n\nA generalized linear model with one line per observation and the occurrence of the variable of interest coded as ‘Yes’ or ‘No’ (binomial distribution).\n\nI created this code while preparing my preregistration for a simple behavioural ecology experiment about methods for independently manipulating palatability and colour in small insect prey (article, OSF preregistration).\nThe R script screenshot below, glm_Freq_vs_YN.R, can be found in the folder Ihle2020.\nThis walkthrough will use the steps as defined on the page ‘General structure’.\n\nDefine sample sizes (within a dataset and number of replicates), experimental design (fixed dataset structure, e.g. treatment groups, factors), and parameters that will need to vary (here, the strength of the effect).\n\n\nGenerate data (here, using sample() and the probabilities defined in step 1) and format it in two different ways to accommodate the two statistical tests to be compared.\n\n\nRun the statistical test and save the parameter estimate of interest for that iteration. Here, this is done for both statistical tests to be compared.\n\n\nReplicate steps 2 (data simulation) and 3 (data analyses) to get the distribution of the parameter estimates by wrapping these steps in a function.\nDefinition of the function at the beginning: \n Output returned from the function at the end: \n Replicate the function nrep times. Here, pbreplicate() is used to provide a bar of progress for R to run this command. \n\nExplore the parameter space. Here, vary the probabilities of sampling between 0 and 1 depending on the treatment group category.\n\n\nAnalyse and interpret the combined results of many simulations. 
In this case, the results of the two models were qualitatively the same (comparison of results for a few simulations), and both models gave the same expected 5% false positive results when no effect was simulated. Varying the effect (the probability of sampling 0 or 1 depending on the experimental treatment) allowed us to find the minimum effect size for which the number of positive results of the tests is over 80%.\n\n\n\n\n\n\n\n\n Back to top",
"crumbs": [
"Tutorial",
"Real-life example"
diff --git a/sitemap.xml b/sitemap.xml
index 0bd5db1..23080f0 100644
--- a/sitemap.xml
+++ b/sitemap.xml
@@ -2,74 +2,74 @@
https://lmu-osc.github.io/Introduction-Simulations-in-R/tutorial_pages/simulate-for-preregistration.html
- 2024-09-08T09:03:58.202Z
+ 2024-09-09T00:45:59.740Z
https://lmu-osc.github.io/Introduction-Simulations-in-R/tutorial_pages/sample-size-n.html
- 2024-09-08T09:03:58.202Z
+ 2024-09-09T00:45:59.740Z
https://lmu-osc.github.io/Introduction-Simulations-in-R/tutorial_pages/repeat.html
- 2024-09-08T09:03:58.202Z
+ 2024-09-09T00:45:59.740Z
https://lmu-osc.github.io/Introduction-Simulations-in-R/tutorial_pages/random-numbers-generators.html
- 2024-09-08T09:03:58.202Z
+ 2024-09-09T00:45:59.740Z
https://lmu-osc.github.io/Introduction-Simulations-in-R/tutorial_pages/number-of-simulations-nrep.html
- 2024-09-08T09:03:58.202Z
+ 2024-09-09T00:45:59.740Z
https://lmu-osc.github.io/Introduction-Simulations-in-R/tutorial_pages/general-structure.html
- 2024-09-08T09:03:58.202Z
+ 2024-09-09T00:45:59.740Z
https://lmu-osc.github.io/Introduction-Simulations-in-R/tutorial_pages/download-repo.html
- 2024-09-08T09:03:58.202Z
+ 2024-09-09T00:45:59.740Z
https://lmu-osc.github.io/Introduction-Simulations-in-R/tutorial_pages/check-power.html
- 2024-09-08T09:03:58.202Z
+ 2024-09-09T00:45:59.740Z
https://lmu-osc.github.io/Introduction-Simulations-in-R/tutorial_pages/basic-principles.html
- 2024-09-08T09:03:58.202Z
+ 2024-09-09T00:45:59.740Z
https://lmu-osc.github.io/Introduction-Simulations-in-R/index.html
- 2024-09-08T09:03:58.202Z
+ 2024-09-09T00:45:59.736Z
https://lmu-osc.github.io/Introduction-Simulations-in-R/tutorial_pages/check-alpha.html
- 2024-09-08T09:03:58.202Z
+ 2024-09-09T00:45:59.740Z
https://lmu-osc.github.io/Introduction-Simulations-in-R/tutorial_pages/definition.html
- 2024-09-08T09:03:58.202Z
+ 2024-09-09T00:45:59.740Z
https://lmu-osc.github.io/Introduction-Simulations-in-R/tutorial_pages/dry-rule.html
- 2024-09-08T09:03:58.202Z
+ 2024-09-09T00:45:59.740Z
https://lmu-osc.github.io/Introduction-Simulations-in-R/tutorial_pages/limitations.html
- 2024-09-08T09:03:58.202Z
+ 2024-09-09T00:45:59.740Z
https://lmu-osc.github.io/Introduction-Simulations-in-R/tutorial_pages/purpose.html
- 2024-09-08T09:03:58.202Z
+ 2024-09-09T00:45:59.740Z
https://lmu-osc.github.io/Introduction-Simulations-in-R/tutorial_pages/real-life-example.html
- 2024-09-08T09:03:58.202Z
+ 2024-09-09T00:45:59.740Z
https://lmu-osc.github.io/Introduction-Simulations-in-R/tutorial_pages/resources.html
- 2024-09-08T09:03:58.202Z
+ 2024-09-09T00:45:59.740Z
https://lmu-osc.github.io/Introduction-Simulations-in-R/tutorial_pages/seed.html
- 2024-09-08T09:03:58.202Z
+ 2024-09-09T00:45:59.740Z
diff --git a/tutorial_pages/check-alpha.html b/tutorial_pages/check-alpha.html
index f3a3673..af7aadc 100644
--- a/tutorial_pages/check-alpha.html
+++ b/tutorial_pages/check-alpha.html
@@ -314,9 +314,11 @@
Using simulations to check alpha
In both cases, we expect 50 out of the 1000 tests to be significant by chance (i.e. with a p-value under 0.05). In my simulations, I get 40 and 45 false positive results, for n = 10 and n = 100, respectively. How many did you get?
These proportions are not significantly different from 5%.
prop.test(45, 1000, p = 0.05, alternative = "two.sided", correct = TRUE)
-
1-sample proportions test with continuity correction
- data: 45 out of 1000, null probability 0.05
- X-squared = 0.42632, df = 1, p-value = 0.5138
+
+
1-sample proportions test with continuity correction
+data: 45 out of 1000, null probability 0.05
+X-squared = 0.42632, df = 1, p-value = 0.5138
+
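The whole check can be sketched end to end as below. This is only one possible version of `simT()`, not necessarily the one in the exercise script; the seed value is an arbitrary assumption, so your counts will differ from the 40 and 45 reported above.

```r
# Sketch of the alpha check; the seed and nrep are arbitrary choices
set.seed(42)

# Draw two samples of size n from N(0, 1), compare them with a t-test,
# and return the p-value
simT <- function(n) {
  x <- rnorm(n)
  y <- rnorm(n)
  t.test(x, y)$p.value
}

nrep <- 1000
p_values <- replicate(nrep, simT(10))

# Number of false positives at alpha = 0.05
n_sig <- sum(p_values < 0.05)

# Test whether the observed proportion differs from the nominal 5%
prop.test(n_sig, nrep, p = 0.05, alternative = "two.sided", correct = TRUE)
```

Passing `n_sig` rather than a hard-coded count keeps the proportion test tied to whatever the simulation actually produced.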
It is important to note that, although alpha = 0.05 is commonly used, this is an arbitrary choice and you should consider what is an appropriate type I error rate for your particular investigation.
Although it isn’t necessary to check that a statistical analysis as simple as a t-test does not yield more than 5% false-positive results, in situations where the structure of the data is complex and analysed with more advanced models (e.g. when explanatory variables are mathematically linked to each other or are combined in a mixed-effect model), simulation makes it possible to compare different modelling approaches and select one that does not produce more than 5% false-positive results.
One such complex example, where simulation is the only viable approach to construct a statistical model that does not lead to spurious effects, can be found in this paper:
If we sample values from two normal distributions with different means (e.g. N(0,1) and N(0.5,1)), what is the minimum sample size we need to detect a significant difference in means with a t-test 80% of the time?
YOUR TURN:
-1. Use your simulation skills to work out the power through simulation. Write a function that does the following:
-
-
Draws n values from a random normal distribution with mean1 and another n values from a normal distribution with mean2.
-
Compares the means of these two samples with a t-test and extracts the p-value.
-
-
-
Replicate the function 1000 times using the parameters used in the power calculation above (that used the power.t.test() function).
-
Calculate the proportion of p-values that are smaller than 0.05.
-
+1. Use your simulation skills to work out the power through simulation. Write a function that does the following: i) Draws n values from a random normal distribution with mean1 and another n values from a normal distribution with mean2. ii) Compares the means of these two samples with a t-test and extracts the p-value. 2. Replicate the function 1000 times using the parameters used in the power calculation above (that used the power.t.test() function). 3. Calculate the proportion of p-values that are smaller than 0.05.
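One way to approach this exercise is sketched below. The helper name `sim_power_p()` and the seed are hypothetical. The design (N(0,1) versus N(0.5,1) with n = 64) is taken from the figure caption on this page and is assumed to match the earlier `power.t.test()` call.

```r
set.seed(123)  # arbitrary seed, for reproducibility only

# Hypothetical helper: draw n values from N(mean1, 1) and n values from
# N(mean2, 1), compare them with a t-test, and return the p-value
sim_power_p <- function(n, mean1, mean2) {
  x <- rnorm(n, mean = mean1, sd = 1)
  y <- rnorm(n, mean = mean2, sd = 1)
  t.test(x, y)$p.value
}

p_values <- replicate(1000, sim_power_p(n = 64, mean1 = 0, mean2 = 0.5))

# Proportion of significant tests, i.e. the simulated power
sim_power <- mean(p_values < 0.05)

# Analytical power for the same design, for comparison
analytic_power <- power.t.test(n = 64, delta = 0.5, sd = 1,
                               sig.level = 0.05)$power
```

With 1000 repetitions the simulated power should land close to the analytical value, though not exactly on it.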
p-values of t-tests comparing means from 1000 simulations of N(0,1) and N(0.5,1) with n = 64:
Sampling without replacement means that when you repeatedly draw e.g. one item at a time from a pool of items, any item selected during the first draw is not available for selection during the second draw, and the first and second selected items are not in the pool to select from during the third draw, etc. Sampling with replacement means that all the original options are available at each draw.
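The difference can be illustrated with a small sketch. The pool `1:10` and the seed are arbitrary assumptions chosen for illustration, not values from the exercise script.

```r
set.seed(1)  # arbitrary seed

pool <- 1:10  # hypothetical pool of items

# Without replacement: each item can be drawn at most once
without_repl <- sample(pool, size = 10, replace = FALSE)

# With replacement: every item is available at each draw, so repeats can occur
with_repl <- sample(pool, size = 20, replace = TRUE)

anyDuplicated(without_repl)  # 0: no value appears twice
anyDuplicated(with_repl)     # non-zero: 20 draws from 10 items must repeat
```

Note that drawing more values than the pool contains is only possible with `replace = TRUE`; `sample(pool, size = 20, replace = FALSE)` would raise an error.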
YOUR TURN:
-Sample 100 values between 3 and 103 with replacement. For this, open the file ./exercise_script.R from the root of your local repository (with or without answers), review the examples if needed, complete the exercise, and check out the proposed answer.
+Sample 100 values between 3 and 103 with replacement. For this, open the R script(s) with the exercises (./exercise_script_with_solutions.R and/or ./exercise_script_without_solutions.R) from the root of your local repository, review the examples if needed, complete the exercise, and check out the proposed answer.
The following functions draw n values from distributions with the specified parameters:
I created this code while preparing my preregistration for a simple behavioural ecology experiment about methods for independently manipulating palatability and colour in small insect prey (article, OSF preregistration).
The R script screenshot below, glm_Freq_vs_YN.R, can be found in the folder Ihle2020.
-
This walkthrough will use the steps as defined on the page ‘General structure’.
Define sample sizes (within a dataset and number of replicates), experimental design (fixed dataset structure, e.g. treatment groups, factors), and parameters that will need to vary (here, the strength of the effect).
replicate(nrep, expression) repeats the expression provided nrep times.
-
+
The function replicate(nrep, expression) repeats the expression provided nrep times.
For example, replicate(10, mean(rnorm(100))) reads: ‘Draw 100 values from a normal distribution with a mean of 0 and a standard deviation of 1 (the default values of rnorm(n, mean, sd)), calculate the mean of these 100 values, and do all that 10 times.’
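As a minimal sketch, the call can be run and stored like this (the seed and the name `means` are arbitrary choices):

```r
set.seed(2024)  # arbitrary seed

# Mean of 100 draws from N(0, 1), repeated 10 times
means <- replicate(10, mean(rnorm(100)))

length(means)  # 10: one mean per repetition
```

The expression is re-evaluated on every repetition, so each of the 10 means comes from a fresh set of 100 draws.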