From 4309eb017653814ea7fef807720b6110dad70b0c Mon Sep 17 00:00:00 2001 From: "Song, Xiao Ping" Date: Wed, 19 Oct 2022 17:01:49 +0800 Subject: [PATCH] first draft for additional notes on basic stats --- notes/3_Basic_statistics.Rmd | 408 ++++++ notes/3_Basic_statistics.html | 2177 +++++++++++++++++++++++++++++++++ output/sd-vs-se.png | Bin 0 -> 15762 bytes 3 files changed, 2585 insertions(+) create mode 100644 notes/3_Basic_statistics.Rmd create mode 100644 notes/3_Basic_statistics.html create mode 100644 output/sd-vs-se.png diff --git a/notes/3_Basic_statistics.Rmd b/notes/3_Basic_statistics.Rmd new file mode 100644 index 0000000..9c8b529 --- /dev/null +++ b/notes/3_Basic_statistics.Rmd @@ -0,0 +1,408 @@ +--- +title: "Basic statistics" +subtitle: "[Intro2R](https://github.com/xp-song/Intro2R) crash course: Additional material" +author: "Author: [Song, Xiao Ping](https://xp-song.github.io)" +date: "Last updated on `r Sys.Date()`" +output: + html_document: + toc: true + toc_depth: 2 + toc_float: true + number_sections: true + theme: paper +--- + +--- + +This worksheet provides a brief overview of basic statistics that can be computed using the R programming language. + +# Data {.tabset .tabset-fade .tabset-pills} + +## Import + +First, load the example data `LifeCycleSavings`, which contains the savings ratio (population's personal savings divided by disposable income) aggregated for selected countries from 1960 to 1970. Five variables (columns) are reported for `r nrow(LifeCycleSavings)` countries in the world: + +- `sr`: aggregate personal savings ratio (in percentage) +- `pop15`: percentage of population under 15 +- `pop75`: percentage of population over 75 +- `dpi`: real per-capita disposable income +- `ddpi`: percentage growth rate of `dpi` + +```{r load example data} +data(LifeCycleSavings) + +# convert rownames to new column +LifeCycleSavings <- cbind(country = rownames(LifeCycleSavings), LifeCycleSavings) +rownames(LifeCycleSavings) <- NULL +``` + +```{r quick look} +nrow(LifeCycleSavings) # number of countries +head(LifeCycleSavings) # see first few rows +``` + + +View more details by running `?LifeCycleSavings` or `help(LifeCycleSavings)`. + +--- + +## Distribution + +Examine the frequency distribution of the personal savings ratio for all `r nrow(LifeCycleSavings)` countries: + +```{r summary statistics - histogram} +sr <- LifeCycleSavings$sr # assign to variable + +hist(sr) +``` + +Examine the minimum (`min()`) and maximum (`max()`) savings ratio across all countries. + +```{r min and max} +min(sr) +max(sr) +range(sr) # calculate both min & max at once +``` + +Examine the savings ratio at different quantiles across the distribution: + +```{r quantiles} +quantile(sr, 0.5) # 50% percentile (0.5 quantile) + +quantile(sr, c(0.025, 0.25, 0.75, 0.975)) # examine multiple quantiles +``` + + +
+ +--- + +# Summary statistics {.tabset .tabset-fade .tabset-pills} + +## Central tendency + +Explore summary statistics such as the average (`mean()`) and median (`median()`) savings ratio across all countries. + +```{r summary statistics - general} +sum(sr)/length(sr) # manually calculate the mean +mean(sr) # function for arithmetic mean + +median(sr) # 50% quantile, not sensitive to outliers +``` + +Alternatively, you can view basic summary statistics for all columns using the function `summary()`: + +```{r summary stats} +summary(LifeCycleSavings) # calculate for all variables at once +``` + +
+ +--- + +## Spread + +Calculate the standard deviation (σ), or dispersion around the mean. It is expressed in the same units as the original values: + +```{r sd} +sqrt(sum((sr - mean(sr))^2) / (length(sr) - 1)) # manually calculate sd, by taking the 'sum of squares' divided by the 'degrees of freedom' +sd(sr) # function to calculate sd +``` + +Calculate the variance (σ2): + +```{r variance} +sd(sr)^2 # calculate manually from the sd +var(sr) +``` + +Estimate the standard error: + +```{r se} +sd(sr) / sqrt(length(sr)) # manually calculate +``` + +Note: In sampling, we measure the **standard deviation (SD)** of a sample as a representation of the spread for the entire population (which is unknown). On the other hand, the **standard error (SE)** estimates the spread across _multiple_ samples of a population, and gives us an indication of how well the sampled data represents the whole population. A higher sample size will ensure a better representation (accuracy) of the actual spread within the population. + +```{r out.width = "50%", fig.align='center', dpi = 300, echo = FALSE, fig.cap="_**Figure: Difference between SD and SE.** Source: [Scribbr.com](https://www.scribbr.com/statistics/standard-error/)_"} +knitr::include_graphics("../output/sd-vs-se.png", dpi = 300) +``` + +The SE is often reported as confidence intervals (CIs), which tell us the range of values that an unknown population parameter (e.g., the mean) is expected to be within. It is common practice to report the CI for which 95% of all sample means are expected to be within, i.e., a 95% confidence level: + +```{r ci} +sample_se <- sd(sr) / sqrt(length(sr)) # formula to calculate SE from SD + + +# use t-distribution to estimate population SE +# assumes that the data were randomly sampled from whole population, and that the variable follows a normal distribution! + +deg_freedom <- length(sr)-1 +# 'degrees of freedom' is the max no. of values that are free to vary, in order to arrive at the estimated parameter + +t_score <- qt(p = 0.975, df = deg_freedom) +# t-score describes how far away (no. of SD) a data point is from mean of the t-distribution. We want the value for the 97.5% percentile (0.975 quantile), because a two-tailed distribution has 2.5% of data at both ends, which sums up to 5% of data being outside the CI + +margin_error <- t_score * sample_se + +lower_limit <- mean(sr) - margin_error +upper_limit <- mean(sr) + margin_error +c(lower_limit, upper_limit) +``` + +Alternatively, you can use the function `t.test()` to calculate the 95% CI: + +```{r t-test} +t.test(sr, conf.level = 0.95) +``` + +
+ +--- + +# Study design {.tabset .tabset-fade .tabset-pills} + +## Sampling + +In experiments or observational studies, random sampling is often required to avoid bias and ensure that the sampled data accurately represents the whole population. This section demonstrates various functions that may be used to for sampling. + +First, let's 'set the seed' for R's random number generator. This will allow us to reproduce the same random sequence whenever the code is re-run in future sessions. + +```{r} +set.seed(1) +``` + +Simple random sampling may be performed using the function `sample()`. For example, we can randomly sample 5 rows of `LifeCycleSavings` without replacement: + +```{r random sampling} +rows <- nrow(LifeCycleSavings) +rows_sampled <- sample(rows, 5, replace = FALSE) +LifeCycleSavings[rows_sampled, ] # subset to sampled rows + +``` + +Stratified random sampling can help ensure that the sampled dataset retains the same statistical information of the population. For example, we might want to choose 10 countries for further analysis, while ensuring that the savings ratio among the countries chosen retain the same distribution as the entire population: + +```{r stratified sampling with distribution retained} +# categorise sr into 5 categories based on quantiles (see section 1.2) +sr_quantiles <- quantile(sr, c(0, 0.025, 0.25, 0.75, 0.975, 1)) +sr_intervals <- cut(sr, breaks = sr_quantiles, include.lowest = TRUE) +LifeCycleSavings$sr_intervals <- sr_intervals # add intervals as new column +sr_frequencies <- table(sr_intervals) # count frequency of each category +LifeCycleSavings$sr_frequencies <- sr_frequencies[LifeCycleSavings$sr_intervals] # add frequency as new column + +# stratified random sampling +rows_sampled <- + sample(rows, 10, # choose 10 rows + prob = LifeCycleSavings$sr_frequencies, # higher frequencies have higher probability of being chosen, retain distribution + replace = FALSE) +LifeCycleSavings[rows_sampled, ] # subset to sampled rows + +``` +On the other hand, we may want _equal_ representation among categories/strata, to ensure sufficient variability in the sample. For example, we can ensure a uniform distribution in the savings ratio among the 10 countries chosen: + +```{r stratified sampling with uniform distribution, message = FALSE} +library(dplyr) # load required package (install if necessary) + +slice_sample( + group_by(LifeCycleSavings, sr_intervals), # group by categories + n = 2) # 2 countries per category (total of 10) +``` + +
+ +--- + +## Power analysis + +Power analysis is typically used to understand the sample size required to detect an effect. For any of the following tests, given any three of the four factors below, we can determine the fourth: + +- **Sample size**: Number of samples collected +- **Effect size**: Magnitude of the effect of interest +- **Significance level**: Probability of finding an effect that is not there (_Type I error_)^[_Type I Error_ (alpha): Rejecting the null hypothesis when actually true (scientific consensus = 0.05)] +- **Power**: Probability of finding an effect that is there, i.e., rejecting null hypothesis when it is false (_1 - Type II error_)^[_Type II error_ (beta): Accepting the null hypothesis when it is false (scientific consensus = 0.2)] + +```{r} +library(pwr) # load required package +``` + +There are many permutations of power calculations, depending on the factor of interest, or the type of test. Some examples to calculate the significance level are shown below: + +**T-test of means between two groups:** + +```{r power for t-tests} +pwr.t.test(n = 30, # equal no. of samples between 2 groups + d = 0.5, # effect size + sig.level = NULL, + power = 0.8) +pwr.t2n.test(n1 = 20, n2 = 30, # if sample sizes are unequal + d =0.5, + sig.level = NULL, power = 0.8) +``` + +**ANOVA test for means between multiple groups:** + +```{r power for anova tests} +pwr.anova.test(k = 3, # 3 groups + n = 30, # no. of samples per grp + f = 0.5, # effect size + sig.level = NULL, + power = 0.8) +``` + + +**Test of proportions between two groups:** + +```{r power for tests of proportions} +pwr.2p.test(n = 30, # comparing + h = 0.5, # effect size + sig.level = NULL, power = 0.8) +``` + +Refer to [statsmethods.net](https://www.statmethods.net/stats/power.html) more examples and tests (e.g., Chi-square tests, linear models, correlations). + + +
+ +--- + +# Statstical tests {.tabset .tabset-fade .tabset-pills} + +Some of the following sections involve comparisons between multiple groups (factors). Let's create a new column assigning each country to a continent (group), using the package `countrycode`: + +```{r add continent to data, warning = FALSE} +library(countrycode) # load required package + +LifeCycleSavings$continent <- + countrycode(sourcevar = LifeCycleSavings$country, + origin = "country.name", + destination = "continent") + +head(LifeCycleSavings) +``` + + +## Correlation + +Examine basic relationships between pairs of variables using the function `cor()`. Note that you can specify the type of test within the argument '`method`' (e.g. Pearson, Spearman, or Kendall correlation). + +```{r basic relationships} +dpi <- LifeCycleSavings$dpi # assign disposable income to new variable +cor(sr, dpi) # correlation coefficient 'r' is weakly positive (range of -1 to 1) +cor(sr, dpi)^2 # coefficient of determination 'R squared' - percentage variation in one variable explained by other variable (range of 0-1) + +cov(sr, dpi) # covariance - measure of joint variability between two numeric variables +``` + +The relationships between all variables (columns) can be examined using the function `plot()`: + +```{r plot relationships} +plot(LifeCycleSavings[,2:5]) # only plot for main variables +``` + +
+ +--- + +## Two-group means + +To compare the differences in means between two groups/samples, t-tests can be used. The following assumptions are made about the data: + +- **Independence**: Observations in one sample are independent of observations in the other sample (same data point should not appear in both samples) +- **Normality**: Both samples are approximately normally distributed +- **Homogeneity of Variances**: Both samples have approximately the same variance +- **Random Sampling**: Both samples were obtained using random sampling + + +```{r t-tests} +# compare sr between Europe & Asia +sr_europe <- LifeCycleSavings$sr[LifeCycleSavings$continent == "Europe"] +sr_asia <- LifeCycleSavings$sr[LifeCycleSavings$continent == "Asia"] + +t.test(sr_europe, sr_asia, var.equal = TRUE) # use 'var.equal = FALSE' to perform Welch's t-test, which does assume that the 2 groups have equal variances +``` + +We can check if assumptions violated. For example, we can check the normality of values using the function `shapiro.test()`, or graphically using the function `qqnorm()`; points should align well with the diagonal line: + +```{r check normality} +shapiro.test(sr_europe) # if p-value <0.05, data is non-normal + +qqnorm(sr_europe) +qqline(sr_europe) +``` + +Non-parametric tests of group differences can be used if assumptions are violated. For example, these include the Mann-Whitney U, Wilcoxon Signed Rank (for paired samples), and Friedman tests (for multiple test attempts). + +```{r non-parametric tests of group differences} +wilcox.test(sr_europe, sr_asia) # Mann-Whitney U test - does not assume that the two samples are normally distributed +``` + + +
+ +--- + +## ANOVA/ANCOVA + +We can compare the differences in means between multiple (≥ 2) groups. For example, the savings ratio between the five continents can be compared: + +1. Based on an analysis of variance (ANOVA) between groups. Note that the coefficient estimate and significance level of each group is compared against the intercept, which is defined by alphabetical order (in this example, Africa) + +2. Based on the analysis of covariance (ANCOVA) between groups. ANCOVA is a blend of ANOVA and linear regression, which statistically controls for the effects of other continuous variable (e.g. `dpi`) that are not of primary interest. + +```{r} +model <- aov(sr ~ continent, data = LifeCycleSavings) # ANOVA +summary.lm(model) +``` + + + +```{r} +model <- aov(sr ~ continent + dpi, data = LifeCycleSavings) # ANCOVA +summary.lm(model) +``` + +Importantly, ANOVA/ANCOVA also makes assumptions about the data (see Section 4.2). If assumptions are violated, for example, the (non-parametric) Kruskal–Wallis test may be used instead of ANOVA (it is an extension of the Mann–Whitney U test, which only allows for comparison between two groups). See [statsmethods.net](https://www.statmethods.net/stats/anovaAssumptions.html) for more details on how to test assumptions for ANOVA/ANCOVA. + + +Diagnostic plots may be used to examine whether assumptions are met. Click [here](https://arc.lib.montana.edu/book/statistics-with-r-textbook/item/57) to read more about interpreting these plots. + +```{r} +par(mfrow=c(2,2)) # combine four plots into one +plot(model) +``` + +
+ +--- + +# Further resources + +
+ +- [Understanding p-values through simulation](https://rpsychologist.com/pvalue/) + +- [Understanding maximum likelihood](https://rpsychologist.com/likelihood/) + +- [Statistical power and significance testing](https://rpsychologist.com/d3/nhst/) + +- [Confidence intervals](https://rpsychologist.com/d3/ci/) + +- [Correlations](https://rpsychologist.com/correlation/) + +- [t-distribution](https://rpsychologist.com/d3/tdist/) + +- [P-value distribution](https://rpsychologist.com/d3/pdist/) + +- [Cohen's d](https://rpsychologist.com/cohend/) + +- [Linear regression](https://mlu-explain.github.io/linear-regression/) + + +
+ +--- + +Creative Commons Licence + +Copyright © `r format(Sys.Date(), "%Y")` Song, Xiao Ping diff --git a/notes/3_Basic_statistics.html b/notes/3_Basic_statistics.html new file mode 100644 index 0000000..0a55244 --- /dev/null +++ b/notes/3_Basic_statistics.html @@ -0,0 +1,2177 @@ + + + + + + + + + + + + + + +Basic statistics + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + +
+
+
+
+
+ +
+ + + + + + + +
+

This worksheet provides a brief overview of basic statistics that can +be computed using the R programming language.

+
+

1 Data

+
+

1.1 Import

+

First, load the example data LifeCycleSavings, which +contains the savings ratio (population’s personal savings divided by +disposable income) aggregated for selected countries from 1960 to 1970. +Five variables (columns) are reported for 50 countries in the world:

+
    +
  • sr: aggregate personal savings ratio (in +percentage)
  • +
  • pop15: percentage of population under 15
  • +
  • pop75: percentage of population over 75
  • +
  • dpi: real per-capita disposable income
  • +
  • ddpi: percentage growth rate of dpi
  • +
+
data(LifeCycleSavings)
+
+# convert rownames to new column
+LifeCycleSavings <- cbind(country = rownames(LifeCycleSavings), LifeCycleSavings)
+rownames(LifeCycleSavings) <- NULL
+
nrow(LifeCycleSavings) # number of countries
+
## [1] 50
+
head(LifeCycleSavings) # see first few rows
+
##     country    sr pop15 pop75     dpi ddpi
+## 1 Australia 11.43 29.35  2.87 2329.68 2.87
+## 2   Austria 12.07 23.32  4.41 1507.99 3.93
+## 3   Belgium 13.17 23.80  4.43 2108.47 3.82
+## 4   Bolivia  5.75 41.89  1.67  189.13 0.22
+## 5    Brazil 12.88 42.19  0.83  728.47 4.56
+## 6    Canada  8.79 31.72  2.85 2982.88 2.43
+

View more details by running ?LifeCycleSavings or +help(LifeCycleSavings).

+
+
+
+

1.2 Distribution

+

Examine the frequency distribution of the personal savings ratio for +all 50 countries:

+
sr <- LifeCycleSavings$sr # assign to variable
+
+hist(sr)
+

+

Examine the minimum (min()) and maximum +(max()) savings ratio across all countries.

+
min(sr)
+
## [1] 0.6
+
max(sr)
+
## [1] 21.1
+
range(sr) # calculate both min & max at once
+
## [1]  0.6 21.1
+

Examine the savings ratio at different quantiles across the +distribution:

+
quantile(sr, 0.5) # 50% percentile (0.5 quantile)
+
##   50% 
+## 10.51
+
quantile(sr, c(0.025, 0.25, 0.75, 0.975)) # examine multiple quantiles
+
##     2.5%      25%      75%    97.5% 
+##  1.43875  6.97000 12.61750 18.17525
+


+
+
+
+
+

2 Summary statistics

+
+

2.1 Central tendency

+

Explore summary statistics such as the average (mean()) +and median (median()) savings ratio across all +countries.

+
sum(sr)/length(sr) # manually calculate the mean
+
## [1] 9.671
+
mean(sr) # function for arithmetic mean
+
## [1] 9.671
+
median(sr) # 50% quantile, not sensitive to outliers
+
## [1] 10.51
+

Alternatively, you can view basic summary statistics for all columns +using the function summary():

+
summary(LifeCycleSavings) # calculate for all variables at once
+
##    country                sr             pop15           pop75      
+##  Length:50          Min.   : 0.600   Min.   :21.44   Min.   :0.560  
+##  Class :character   1st Qu.: 6.970   1st Qu.:26.21   1st Qu.:1.125  
+##  Mode  :character   Median :10.510   Median :32.58   Median :2.175  
+##                     Mean   : 9.671   Mean   :35.09   Mean   :2.293  
+##                     3rd Qu.:12.617   3rd Qu.:44.06   3rd Qu.:3.325  
+##                     Max.   :21.100   Max.   :47.64   Max.   :4.700  
+##       dpi               ddpi       
+##  Min.   :  88.94   Min.   : 0.220  
+##  1st Qu.: 288.21   1st Qu.: 2.002  
+##  Median : 695.66   Median : 3.000  
+##  Mean   :1106.76   Mean   : 3.758  
+##  3rd Qu.:1795.62   3rd Qu.: 4.478  
+##  Max.   :4001.89   Max.   :16.710
+


+
+
+
+

2.2 Spread

+

Calculate the standard deviation (σ), or dispersion around the mean. +It is expressed in the same units as the original values:

+
sqrt(sum((sr - mean(sr))^2) / (length(sr) - 1)) # manually calculate sd, by taking the 'sum of squares' divided by the 'degrees of freedom'
+
## [1] 4.480407
+
sd(sr) # function to calculate sd
+
## [1] 4.480407
+

Calculate the variance (σ2):

+
sd(sr)^2 # calculate manually from the sd
+
## [1] 20.07405
+
var(sr)
+
## [1] 20.07405
+

Estimate the standard error:

+
sd(sr) / sqrt(length(sr)) # manually calculate
+
## [1] 0.6336252
+

Note: In sampling, we measure the standard deviation +(SD) of a sample as a representation of the spread for the +entire population (which is unknown). On the other hand, the +standard error (SE) estimates the spread across +multiple samples of a population, and gives us an indication of +how well the sampled data represents the whole population. A higher +sample size will ensure a better representation (accuracy) of the actual +spread within the population.

+
+_**Figure: Difference between SD and SE.** Source: [Scribbr.com](https://www.scribbr.com/statistics/standard-error/)_ +

+Figure: Difference between SD and SE. Source: Scribbr.com +

+
+

The SE is often reported as confidence intervals (CIs), which tell us +the range of values that an unknown population parameter (e.g., the +mean) is expected to be within. It is common practice to report the CI +for which 95% of all sample means are expected to be within, i.e., a 95% +confidence level:

+
sample_se <- sd(sr) / sqrt(length(sr)) # formula to calculate SE from SD
+
+
+# use t-distribution to estimate population SE
+# assumes that the data were randomly sampled from whole population, and that the variable follows a normal distribution!
+
+deg_freedom <- length(sr)-1 
+# 'degrees of freedom' is the max no. of values that are free to vary, in order to arrive at the estimated parameter
+
+t_score <- qt(p = 0.975, df = deg_freedom) 
+# t-score describes how far away (no. of SD) a data point is from mean of the t-distribution. We want the value for the 97.5% percentile (0.975 quantile), because a two-tailed distribution has 2.5% of data at both ends, which sums up to 5% of data being outside the CI
+
+margin_error <- t_score * sample_se
+
+lower_limit <- mean(sr) - margin_error
+upper_limit <- mean(sr) + margin_error
+c(lower_limit, upper_limit)
+
## [1]  8.397682 10.944318
+

Alternatively, you can use the function t.test() to +calculate the 95% CI:

+
t.test(sr, conf.level = 0.95)
+
## 
+##  One Sample t-test
+## 
+## data:  sr
+## t = 15.263, df = 49, p-value < 2.2e-16
+## alternative hypothesis: true mean is not equal to 0
+## 95 percent confidence interval:
+##   8.397682 10.944318
+## sample estimates:
+## mean of x 
+##     9.671
+


+
+
+
+
+

3 Study design

+
+

3.1 Sampling

+

In experiments or observational studies, random sampling is often +required to avoid bias and ensure that the sampled data accurately +represents the whole population. This section demonstrates various +functions that may be used to for sampling.

+

First, let’s ‘set the seed’ for R’s random number generator. This +will allow us to reproduce the same random sequence whenever the code is +re-run in future sessions.

+
set.seed(1)
+

Simple random sampling may be performed using the function +sample(). For example, we can randomly sample 5 rows of +LifeCycleSavings without replacement:

+
rows <- nrow(LifeCycleSavings)
+rows_sampled <- sample(rows, 5, replace = FALSE)
+LifeCycleSavings[rows_sampled, ] # subset to sampled rows
+
##        country    sr pop15 pop75     dpi ddpi
+## 4      Bolivia  5.75 41.89  1.67  189.13 0.22
+## 39      Sweden  6.86 21.44  4.54 3299.49 3.01
+## 1    Australia 11.43 29.35  2.87 2329.68 2.87
+## 34 Philippines 12.78 46.26  1.12  152.01 2.00
+## 23       Japan 21.10 27.01  1.91 1257.28 8.21
+

Stratified random sampling can help ensure that the sampled dataset +retains the same statistical information of the population. For example, +we might want to choose 10 countries for further analysis, while +ensuring that the savings ratio among the countries chosen retain the +same distribution as the entire population:

+
# categorise sr into 5 categories based on quantiles (see section 1.2)
+sr_quantiles <- quantile(sr, c(0, 0.025, 0.25, 0.75, 0.975, 1))
+sr_intervals <- cut(sr, breaks = sr_quantiles, include.lowest = TRUE)
+LifeCycleSavings$sr_intervals <- sr_intervals # add intervals as new column
+sr_frequencies <- table(sr_intervals) # count frequency of each category 
+LifeCycleSavings$sr_frequencies <- sr_frequencies[LifeCycleSavings$sr_intervals] # add frequency as new column
+
+# stratified random sampling
+rows_sampled <- 
+  sample(rows, 10, # choose 10 rows
+         prob = LifeCycleSavings$sr_frequencies, # higher frequencies have higher probability of being chosen, retain distribution
+         replace = FALSE)
+LifeCycleSavings[rows_sampled, ] # subset to sampled rows
+
##       country    sr pop15 pop75     dpi ddpi sr_intervals sr_frequencies
+## 6      Canada  8.79 31.72  2.85 2982.88 2.43  (6.97,12.6]             24
+## 18   Honduras  7.70 47.32  0.58  232.44 3.19  (6.97,12.6]             24
+## 20      India  9.00 41.31  0.96   88.94 1.54  (6.97,12.6]             24
+## 10 Costa Rica 10.78 47.64  1.14  471.24 2.80  (6.97,12.6]             24
+## 45  Venezuela  9.22 46.40  0.90  813.39 0.53  (6.97,12.6]             24
+## 14     France 12.64 25.06  4.70 2213.82 4.52  (12.6,18.2]             11
+## 2     Austria 12.07 23.32  4.41 1507.99 3.93  (6.97,12.6]             24
+## 5      Brazil 12.88 42.19  0.83  728.47 4.56  (12.6,18.2]             11
+## 21    Ireland 11.34 31.16  4.19 1139.95 2.99  (6.97,12.6]             24
+## 50   Malaysia  4.71 47.20  0.66  242.69 5.08  (1.44,6.97]             11
+

On the other hand, we may want equal representation among +categories/strata, to ensure sufficient variability in the sample. For +example, we can ensure a uniform distribution in the savings ratio among +the 10 countries chosen:

+
library(dplyr) # load required package (install if necessary)
+
+slice_sample(
+  group_by(LifeCycleSavings, sr_intervals), # group by categories
+  n = 2) # 2 countries per category (total of 10)
+
## # A tibble: 10 × 8
+## # Groups:   sr_intervals [5]
+##    country        sr pop15 pop75   dpi  ddpi sr_intervals sr_frequencies
+##    <chr>       <dbl> <dbl> <dbl> <dbl> <dbl> <fct>        <table>       
+##  1 Iceland      1.27  34.0  3.08 1900.  1.12 [0.6,1.44]    2            
+##  2 Chile        0.6   39.7  1.34  663.  2.67 [0.6,1.44]    2            
+##  3 Paraguay     2.02  41.2  1.05  221.  1.03 (1.44,6.97]  11            
+##  4 Turkey       5.13  43.4  1.08  390.  2.96 (1.44,6.97]  11            
+##  5 Nicaragua    7.3   45.0  1.21  326.  2.48 (6.97,12.6]  24            
+##  6 Venezuela    9.22  46.4  0.9   813.  0.53 (6.97,12.6]  24            
+##  7 Italy       14.3   24.5  3.48 1390   3.54 (12.6,18.2]  11            
+##  8 Philippines 12.8   46.3  1.12  152.  2    (12.6,18.2]  11            
+##  9 Japan       21.1   27.0  1.91 1257.  8.21 (18.2,21.1]   2            
+## 10 Zambia      18.6   45.2  0.56  138.  5.14 (18.2,21.1]   2
+


+
+
+
+

3.2 Power analysis

+

Power analysis is typically used to understand the sample size +required to detect an effect. For any of the following tests, given any +three of the four factors below, we can determine the fourth:

+
    +
  • Sample size: Number of samples collected
  • +
  • Effect size: Magnitude of the effect of +interest
  • +
  • Significance level: Probability of finding an +effect that is not there (Type I error)1
  • +
  • Power: Probability of finding an effect that is +there, i.e., rejecting null hypothesis when it is false (1 - Type II +error)2
  • +
+
library(pwr) # load required package
+

There are many permutations of power calculations, depending on the +factor of interest, or the type of test. Some examples to calculate the +significance level are shown below:

+

T-test of means between two groups:

+
pwr.t.test(n = 30, # equal no. of samples between 2 groups
+           d = 0.5, # effect size
+           sig.level = NULL, 
+           power = 0.8) 
+
## 
+##      Two-sample t test power calculation 
+## 
+##               n = 30
+##               d = 0.5
+##       sig.level = 0.2759254
+##           power = 0.8
+##     alternative = two.sided
+## 
+## NOTE: n is number in *each* group
+
pwr.t2n.test(n1 = 20, n2 = 30, # if sample sizes are unequal
+             d =0.5, 
+             sig.level = NULL, power = 0.8) 
+
## 
+##      t test power calculation 
+## 
+##              n1 = 20
+##              n2 = 30
+##               d = 0.5
+##       sig.level = 0.368904
+##           power = 0.8
+##     alternative = two.sided
+

ANOVA test for means between multiple groups:

+
pwr.anova.test(k = 3, # 3 groups
+               n = 30, # no. of samples per grp
+               f = 0.5, # effect size
+               sig.level = NULL, 
+               power = 0.8)
+
## 
+##      Balanced one-way analysis of variance power calculation 
+## 
+##               k = 3
+##               n = 30
+##               f = 0.5
+##       sig.level = 0.0006734242
+##           power = 0.8
+## 
+## NOTE: n is number in each group
+

Test of proportions between two groups:

+
pwr.2p.test(n = 30, # comparing
+            h = 0.5, # effect size
+            sig.level = NULL, power = 0.8)
+
## 
+##      Difference of proportion power calculation for binomial distribution (arcsine transformation) 
+## 
+##               h = 0.5
+##               n = 30
+##       sig.level = 0.2716841
+##           power = 0.8
+##     alternative = two.sided
+## 
+## NOTE: same sample sizes
+

Refer to statsmethods.net +more examples and tests (e.g., Chi-square tests, linear models, +correlations).

+


+
+
+
+
+

4 Statstical tests

+

Some of the following sections involve comparisons between multiple +groups (factors). Let’s create a new column assigning each country to a +continent (group), using the package countrycode:

+
library(countrycode) # load required package
+
+LifeCycleSavings$continent <- 
+  countrycode(sourcevar = LifeCycleSavings$country,
+              origin = "country.name",
+              destination = "continent")
+
+head(LifeCycleSavings)
+
##     country    sr pop15 pop75     dpi ddpi sr_intervals sr_frequencies
+## 1 Australia 11.43 29.35  2.87 2329.68 2.87  (6.97,12.6]             24
+## 2   Austria 12.07 23.32  4.41 1507.99 3.93  (6.97,12.6]             24
+## 3   Belgium 13.17 23.80  4.43 2108.47 3.82  (12.6,18.2]             11
+## 4   Bolivia  5.75 41.89  1.67  189.13 0.22  (1.44,6.97]             11
+## 5    Brazil 12.88 42.19  0.83  728.47 4.56  (12.6,18.2]             11
+## 6    Canada  8.79 31.72  2.85 2982.88 2.43  (6.97,12.6]             24
+##   continent
+## 1   Oceania
+## 2    Europe
+## 3    Europe
+## 4  Americas
+## 5  Americas
+## 6  Americas
+
+

4.1 Correlation

+

Examine basic relationships between pairs of variables using the +function cor(). Note that you can specify the type of test +within the argument ‘method’ (e.g. Pearson, Spearman, or +Kendall correlation).

+
dpi <- LifeCycleSavings$dpi # assign disposable income to new variable
+cor(sr, dpi) # correlation coefficient 'r' is weakly positive (range of -1 to 1)
+
## [1] 0.2203589
+
cor(sr, dpi)^2 # coefficient of determination 'R squared' - percentage variation in one variable explained by other variable (range of 0-1)
+
## [1] 0.04855805
+
cov(sr, dpi) # covariance - measure of joint variability between two numeric variables
+
## [1] 978.2825
+

The relationships between all variables (columns) can be examined +using the function plot():

+
plot(LifeCycleSavings[,2:5]) # only plot for main variables
+

+


+
+
+
+

4.2 Two-group means

+

To compare the differences in means between two groups/samples, +t-tests can be used. The following assumptions are made about the +data:

+
    +
  • Independence: Observations in one sample are +independent of observations in the other sample (same data point should +not appear in both samples)
  • +
  • Normality: Both samples are approximately normally +distributed
  • +
  • Homogeneity of Variances: Both samples have +approximately the same variance
  • +
  • Random Sampling: Both samples were obtained using +random sampling
  • +
+
# compare sr between Europe & Asia
+sr_europe <- LifeCycleSavings$sr[LifeCycleSavings$continent == "Europe"]
+sr_asia <- LifeCycleSavings$sr[LifeCycleSavings$continent == "Asia"]
+
+t.test(sr_europe, sr_asia, var.equal = TRUE) # use 'var.equal = FALSE' to perform Welch's t-test, which does assume that the 2 groups have equal variances
+
## 
+##  Two Sample t-test
+## 
+## data:  sr_europe and sr_asia
+## t = 0.93376, df = 24, p-value = 0.3597
+## alternative hypothesis: true difference in means is not equal to 0
+## 95 percent confidence interval:
+##  -2.144784  5.688995
+## sample estimates:
+## mean of x mean of y 
+##  11.57211   9.80000
+

We can check if assumptions violated. For example, we can check the +normality of values using the function shapiro.test(), or +graphically using the function qqnorm(); points should +align well with the diagonal line:

+
shapiro.test(sr_europe) # if p-value <0.05, data is non-normal
+
## 
+##  Shapiro-Wilk normality test
+## 
+## data:  sr_europe
+## W = 0.8979, p-value = 0.04458
+
qqnorm(sr_europe)
+qqline(sr_europe)
+

+

Non-parametric tests of group differences can be used if assumptions +are violated. For example, these include the Mann-Whitney U, Wilcoxon +Signed Rank (for paired samples), and Friedman tests (for multiple test +attempts).

+
wilcox.test(sr_europe, sr_asia) # Mann-Whitney U test - does not assume that the two samples are normally distributed
+
## 
+##  Wilcoxon rank sum exact test
+## 
+## data:  sr_europe and sr_asia
+## W = 86, p-value = 0.2792
+## alternative hypothesis: true location shift is not equal to 0
+


+
+
+
+

4.3 ANOVA/ANCOVA

+

We can compare the differences in means between multiple (≥ 2) +groups. For example, the savings ratio between the five continents can +be compared:

+
    +
  1. Based on an analysis of variance (ANOVA) between groups. Note +that the coefficient estimate and significance level of each group is +compared against the intercept, which is defined by alphabetical order +(in this example, Africa)

  2. +
  3. Based on the analysis of covariance (ANCOVA) between groups. +ANCOVA is a blend of ANOVA and linear regression, which statistically +controls for the effects of other continuous variable +(e.g. dpi) that are not of primary interest.

  4. +
+
model <- aov(sr ~ continent, data = LifeCycleSavings) # ANOVA
+summary.lm(model) 
+
## 
+## Call:
+## aov(formula = sr ~ continent, data = LifeCycleSavings)
+## 
+## Residuals:
+##      Min       1Q   Median       3Q      Max 
+## -10.3021  -2.0500   0.3556   2.1000  11.3000 
+## 
+## Coefficients:
+##                   Estimate Std. Error t value Pr(>|t|)    
+## (Intercept)        10.9400     1.8602   5.881 5.04e-07 ***
+## continentAmericas  -3.7356     2.1311  -1.753   0.0866 .  
+## continentAsia      -1.1400     2.4356  -0.468   0.6420    
+## continentEurope     0.6321     2.0907   0.302   0.7638    
+## continentOceania    0.1100     3.4801   0.032   0.9749    
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+## 
+## Residual standard error: 4.16 on 44 degrees of freedom
+##   (1 observation deleted due to missingness)
+## Multiple R-squared:  0.1887, Adjusted R-squared:  0.115 
+## F-statistic: 2.559 on 4 and 44 DF,  p-value: 0.05173
+
model <- aov(sr ~ continent + dpi, data = LifeCycleSavings) # ANCOVA
+summary.lm(model)
+
## 
+## Call:
+## aov(formula = sr ~ continent + dpi, data = LifeCycleSavings)
+## 
+## Residuals:
+##      Min       1Q   Median       3Q      Max 
+## -10.3458  -1.9905   0.2849   2.1321  10.9703 
+## 
+## Coefficients:
+##                     Estimate Std. Error t value Pr(>|t|)    
+## (Intercept)       10.8342824  1.8895049   5.734  8.9e-07 ***
+## continentAmericas -3.9408796  2.1919761  -1.798   0.0792 .  
+## continentAsia     -1.1746300  2.4581846  -0.478   0.6352    
+## continentEurope    0.0711454  2.4104787   0.030   0.9766    
+## continentOceania  -0.4978399  3.7316721  -0.133   0.8945    
+## dpi                0.0003739  0.0007777   0.481   0.6332    
+## ---
+## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
+## 
+## Residual standard error: 4.196 on 43 degrees of freedom
+##   (1 observation deleted due to missingness)
+## Multiple R-squared:  0.1931, Adjusted R-squared:  0.09922 
+## F-statistic: 2.057 on 5 and 43 DF,  p-value: 0.08953
+

Importantly, ANOVA/ANCOVA also makes assumptions about the data (see +Section 4.2). If assumptions are violated, for example, the +(non-parametric) Kruskal–Wallis test may be used instead of ANOVA (it is +an extension of the Mann–Whitney U test, which only allows for +comparison between two groups). See statsmethods.net +for more details on how to test assumptions for ANOVA/ANCOVA.

+

Diagnostic plots may be used to examine whether assumptions are met. +Click here +to read more about interpreting these plots.

+
par(mfrow=c(2,2)) # combine four plots into one
+plot(model)
+

+


+
+
+
+ +
+
+
    +
  1. Type I Error (alpha): Rejecting the null +hypothesis when actually true (scientific consensus = 0.05)↩︎

  2. +
  3. Type II error (beta): Accepting the null +hypothesis when it is false (scientific consensus = 0.2)↩︎

  4. +
+
+ + + +
+
+ +
+ + + + + + + + + + + + + + + + diff --git a/output/sd-vs-se.png b/output/sd-vs-se.png new file mode 100644 index 0000000000000000000000000000000000000000..4901d3a5a5682b959566056b72706133e6a94a1d GIT binary patch literal 15762 zcmdVBbx>R3*De|e4NwRg+@-WgA*GbyR@~i-6096MA zi2R1Je$pO!Li;SLAPN9fMq=F?KSPdDoK&U60A-_;yT}2iy|j)K0DwjK?~MXTNhL*o zxvea(F7fpA#5PdRInn-G8Nfc&$T8CVW~>!ynm?>Z}}*m z=;-K^i=)GiW}eJ z(ju07yRx#fjEv0a$!TkAYePeW_p}kcH_(EN~JX8XU0r%#E z^W3+XcK}~*8%ueVI+T0sR}e<NJF7KQYAIi2q_NVFr$-D`Sna-BLj}3l)(d&s+w!RObeb%%rA9IYd%H~#! zuVse1(?d$#;q9vZtMFG&76gwY&;zbJAPlknr`!zME&)$EIGC%2hL*7!?FtHVDYD7ow4cRr~lhf~Sk~LW7g?3GkI7K@SM(ciz z9vksJYvO!I`*7b>l~($_{zWXd;D*;a!Z_de_HwyDC?6^i+& z%I2s!MS&dGpk9I1D}JZ5vdcO-{l$zc8)x^WPx*9$%bN-x-1c#0qSHc0Ehsizl>1jp zFG!qDm$$xHp{X@(9_@Z5!}?)2YrnN1O6J-jm}}_3<$j9Z3oHiRF<-{XHSY=hk4TiI9pYn51 z+EZQXo7HxE2EzAL%1@wxLkV$?oECu(*=>-$Ip-FJ337t^s{)_9nEbdz)2PYsl(o5Q zqlW6U1!l^w0$KOj0~w$%*a&aMP03GN+=GX%zRo0?e8llnqwr$34^_ckjXj^FgsR?d zgsDv(X8*XHWOn1^{5Ci+P{vL6y_T2p@WokdtyW^bR?Orjr&q-HLKbFqmo^+@i~&21 z5PU7n0&R_W`IoWGWI>AFt+i=mNbIKNjfG|hww&V}gyO4_E5&gTT4S>PR zjr$oDHY~*i=@l~F=lEpIeI)bVZnF%*wGS>UM~o=EJhcN_CehiJqKn-c0mWczsoUjE z&6cD_KAh)dEv-C8k?K0(jxEL9Cxen{Jr#nwmBU`dfaTh{sLdVnJ&tt#v29Q{q3NC`%$y8M%X$;?oM+u zxL5=_bFk&2X5M?tVy>c=WfI%#>9!^Hn_XBD6r2X3Oqwbn@ECce-ekW)lQ72AV-e?m zRO7pPl9gK<-yguv&>y!;tyZq-C*_*nbk|!rWqR9)?GRK;GM-ZBRP=R#8YejJSgD*Z z4=-l6IJQrk*-_o*NJGc|_^ev}Z-zaRl(F@xX|>@IcvJr)CkeGxMzQ(Q5NFIE_*lQJtt zHc59fD%?qxO5#j99iHlK=_}txo~qvp@bV`6-1rUNQj5~mdQ~zf<><<}^T@y1eRJRiJG#l$^v@CXg2y03$Ra5MA8q)P& z%l1p#L^3`&%GTpPC-YXzVjEZ>CgZ}%lTl$~aN}Q5^p@2i&KXC~f0-7i<>hV*CH|z( zoa)HWGArbH8TSL*U~FQld6x@^VsgMVEL?5DWRQn%&}fFUK}ZVtu+%}Xqk`Zpk39gM z1QNgX{yZ}(rTWOJm&qt=6(N*S~D@SIr9(!@|?8YH-w$1H@DYx{-CPXZ~nEZs((sR_(G=emw|x!Je4H)0^K4aX-|7dVK%W>yeeM=D&Q1bO&GfEEcf=!qlT$V#%a zA#IH$7;<+G@nP7C*`k?sN(0mJxFioeuS`#7V(ES4{Wj2o&g^AV-ydGWht^z+&5dsM zc0T3G-qa8-8wvTHroKMUr=5C-dvkqc>0;K;o;XdE{^*FpyVhA?&E~v0WV&z@SzD}> z2$5FEAG*mC7ZQg{oBjD&n!T>-Cpfr(ZhskQE;rLr#o?f=1dIcJ{l!$*O1pa1r(uwh zg909FYb&jjSXS@4K2bjWhe^n`_C3z465i6C`G{_1=`$Y%5-Qu5QotkeQOlpd19m^Y zG)vCC_$uFbsMVsRn;%@oib6=D;8&RcCxVyEOi%Y zG%Z>sfvnsmJqGDhg%T4dx8tl<>Hjvl;e#JBp) z>~E2L)cm_k0T8i|$&QIAKR&&iu7r%qwECLhYjC|Qpx3kMP+g!WQYcKloHw;3<%H@G z#l@7w(yQhD*;APtlEK&Ugc$BfC@o)2Z}9jxwXrb071T3PmFQiDg@v@OuY^fwydd*x zcBrz8obY#LiWrd=Hn>w=;M(vkG;jY8Jb9;$ubGa@QdGMnWL@DcGy_@9J$#iobyMC- ze^oV|`XqfOWnLgSN0&XH%$`4xa^DVK87#^aq$%52+$vg725WPc*ZTgBbVFW(zG~}Y zM7C)pEYFHlSzFa6#4jg5T#yJPUK9-*O7BunCEEbHHwsp((z~i(Tz&W{%%!e(z_xXP z@U4=i0=MLG7dl<>meqeKFK)UCD#Tp0T*T^M)hmWGVQA7o5>U3!Tvrw)&DC)57x=;q ze1m)}c5p}eDvLes_FHh0ul62htZv$l41#}cd_|lN)zv=+PwblxLBSj{wG)OlzQ<5R zH%W^@`Nj=?uaJM+w zCj7VgcU0$!7ngqTic|Ar=SpcX_TvIXx&PMdF@Q50alrn_b*zkUGV|NZc9 z{{S-9S)aM@=E?(;-D{%FZ`nN>;&j*zfF9e?XyJxow4At4p;-5D&c8HX+Y50Gb!g-2 zBVVV@yWXQ2iik*~(tYd7)Z9g%o>bX>ddg^aLlru5Nc9h>sc0JIeb<{DV>4-0ex!&Y zI7NAF1N{p@ox-l@^!&IgN%L6Kxk`x9VFa|s6O`2p#b-1=K@IrTL8miA1AHSH1@EN5 zkgwLM|8@y(zje$vBNcrPI;N%NW@aQ6#R2shRTPn4bgqIi+7ZMOye2+51}-l|X-g%9 zd~Kq48D<+LsKWw95FHm1;ZtM+^bRFp8~G9G##|~XvLb1SZ=Ndv9T{6tE&mqOZG~Bu2cxz$l5@D;ceQ< z#_QIbbMS_)jk5~&t$URD7|&lp0+*}1TU)2L`VbErWde9T#`ca{GZ6WRu2>BXqX=7Z zdV)bEek}6NiLEFWRhBgt+=N3xtEY7QY=7uv{#P~kJN|iHV7z5V1(J@sVEOtjE0lP z{KV$7CDN}xX{nR-zh)fQ+?uV zkK}a+z}?WdWZv~D>EPnl;9hN$UGoqrh`!k{v!aRnlcIWPrWg<>(cdLv)06X%_EU!| zVsJ4hhZm&b$-qwZR@k#iI7v#zQ$6+}Q}WKp4Hp{$1I0q`T;`OVYSDWqXJ(vWcgKg$ zvVMOoa1Xr54tG1VzPa>wiX3)r@uTe9kJiQ`dy{x^7!hWl5^Qgv6mr-_(B`bAw~GM5 zVVv%K_^6w&Dsi$mV;$b8$4kIu3i8pvHWxFIoGwp2?>`8xTZi(HzjocS#KdQzkKGg5 zHqiT3<>HYL*WX%A^Vy3t;UV(}1p4hguG}@%dyJT^Lz$|(vYhGID(f$%UCjrnUSzH{ zvI-%CSgj~SI<7A)V%dy=zU535*y^b}pm{t(w_+JZN9K8En=LuVgwiZ|=aSu4e@A^t zoEUW6tUoQ4&?&t_T+OWiIFu4Q7&GzQ{#5}}s9ep@I~O|AZz%bHqUP zv@+O>9@`m_HmGwYdZqrnYgK9aY0hC z@@)Y2<~;41CyNIrEU`_KBwjdVCe~-6)sUN1y4NTU8!ns;g|@u}&8;>s0>DH1pnoW!4F#`&$QfNAi`sr9YE8qikMPSwaEJdtIsu(;pnfoT>5Gq~rstaTk<+$+Xe|PO3%*_3Nxs9?!ih!h6fkJ8;?J6J2-2?-`j~D4b z0->46rMt!*x&QR~-$-?U=0pm?xeB4~oX<$Do4`qs0Hm0)+3_F$=hQL!&qheMkbfL8 zY6EIs>b_-01$qLl0U(HkeCU4!C`0X{=A%qbZw?Tny|EWF7~RdF;_J zMKZ<~3QG<6k7~#OVqx?aN;RKU(f+GEUUSA~6aCY1|KHdM&Ss5VPw^klq$kUt{{vDh zx)OyHnc=D?TO-`BVWH77_@ucB@{&!l^vyHvl%ozYtp?j??d8N@YuBIw=8)i*zrQ@$yZ9JF~6jnGTL=#784-+ty}piWC0{xtN{fYhMYhYjXedm_-vxwU~_SjS(CW#Oms2tOx zWfGsE@1D)uokHQJ$IPrB<5>%_DWXQd1~z@-qSMC=#`m~~X0D?5r{j%S$*$s9PfaNegz%$3_%Y4(uyQXl$CL%fQ){`- zj(XNP0(PHJPP$Y3)6yly;5c^Ra*?EbAoto>_yAqh&n1>VO&V;NrxKJc2z@P8TJ4rK zn>}y)>~^eiC;n>gHBb80oQuZGL|A(5`5r9f+@T>*CR)ursncbe)ccpM^F0Sv0I?_+ zb5j{;wl+hw+cBSCB&VZ;>vupA%yq*w1stJ1iccM}>Px*Y05x*NQvk@X>Kk*K^ z`f7D-T)?37h1GLiftI7;m;p?y*|hoK@{)%J4|UWCR`uGAKL>@dq!+Okt>4+XUOv0l z@-pKd2eGOvwC`MaO+*OU=kHrTJBC(l1UD6W0WVqQyFF*=>jRZwa~Eaw^$+<)uw>?D z4vIDtEqEc0j~{pJ(Y{mG>~VNWqi?&gc|@xPif$_^oGUCF11(KH>&>u+i|0KU_&G%$ zAvLh6ZCh^dbEx&?;;Y(L&_CMPXA|F5|3y=9^i^-*MCjYStKXo#<6drM)^bCG!jmo% z?2A$IV#h^;HZc7YBE`~}Uyrz3NfoAe)QRtXYkj6j8qAJ7V8u1qMnal6tRNbU!Q{H*)F5KJeX$!A+sq%Y& zsgz_&@uXc9I6cLO6ec(T70P6Mk{j*X%=)9K@#h&86`6A#$YZRT1@ zvDrs*SNORlJ~?4U1niQ|$bGm2}>Qoqr!@ z)Kq+Nt~K&Uqh4q?cLS8Oo;Y4((A25ym0Wy@h4?un9V@f|~9(MRz zn61=g)u*V1x09) z;%Qzl8GTtE6)6z59Mz&x`R>m0@okuwaf7y>^6%90!KP|fsB{NdGq6b4>F!;8d7Cpl z+sYD1D8B$4{*m=OU&akzJ}g0H&ZfOpg|<>L+V9Oo7rDoS}vFQ%bxf$YhHJD#|KkQ8}@h7{w~&6PfM>?(r54}Tiie)bo$$^ zkJe(f$9co{Rr9ipPx|v4tvx%?; zLV-yBPy+gn^;~~PZsYLhs{ut7i^{F$TsK)SzS-0KaA~_!Kfxa$hV!c95g}?#c#>By zVKEpLZq!c7yCU>f9_xn_|1wL?{%gU9_DY8l%25F{urpQST;l9T~XP zqPzU`BSSW)!%(mmQFwa?xIghp=%h86Yb^-Wlg?j@x^3-$s#f7;2NfZFuKc{_KZYr) zv{cA_R;*nE^~$QmP>8E;D_x_PhNIX3 zYCSk=NVB~*tm5BX>Ci&3Juu6bwp2~r2JqirD{Z4Vk;Wa!8mJ+lyD8uqEK@CrLKz80 zY}sFNLihu+kFxbgcBJD3$%-n)q~pvcL}aiIFy$3Q#~umCC`vu#JB7dKZ6lBE$@L7# zNTdJ**`s}ZsrJ9{?Sh)`ugH+-oI^WA4h~q^nX{4oSGxdn7qZ>W#ehkZARYaOOhjMxY5!E0h zU>W?h>^P(V;z%q=s7DfY{|gLKtaNjQ|Blh&uc$JJBz4>Jf)h#eKM(=P@K9+1%(eQl z$K8d0BV>5WbmLHpac0LkhlqKIe8DRX z>>Y>-a%Mxo`-1*JCRP($ZSMPz0Jkb3+p zsIj(5J_&(jRBUNv-9R(N^VQ=1rT!m`r4Hk-MgNuhe~^{28GEN<9pM;LW;dT*`60mp zB(D#s8I7Y*fUlnvjqTYn;rd6+BI)emFbO{=!iYB_8>4ao^04&!dcpt2|Utg<`na z{Soi~GT~EVE}!76$!KO}&Fk3tcrxN?_lY@{@Wu!_68ejm`Sp~&@WT2$8*5Y1AZeW> z=U&PcA|ffFs36L_rZtof1dRW-nQnSeF#gPO8cpFP{UYZd4Kq;mt@tO9FXSUSpJ+9r zjsPA|3Fl`PkwmO+TJi5lle+#*4X^Cn6=E>mT@?u=!{}P|7Jg~+zIPy&8z`)JA!APL zv1Lt3O+)>KD@@&!$q4)&2h^mKQEm95k2-iz!q;Z8JZ4DvH}u|p?X&RfNx_m~7%q0g z?Jd{nA@xEVr`xQ9f{a49aH(6EF6@oMfSN`0gN62wm>}H`cp$#l^E)=Nsqy=shj@Ybh|WE z8f*j8&|sifeO4Of(VrQe;W8E+HiX|sQIk>lYhNagmHBGTmr9X-944nK5Ku5&KUGL! zU1r`BebzUs)ncB^_2izl8IgcP^)i$iVazDe*!X5I~w_d1(PlEOxy<=60hNP8u02+TC`ta@ywo`^bYkTI1YwP0NA^GRc&i z>3-N>b|6R7H>64TQ($4h7+i$4PtqFg|7p}$V7fbdfVbXF)hP4$EP0&-J^P`LEzjD0 zs?-vT`!F^u|6<}f=}eK7QAL7=i` zI*c6SydSrh4=Tx&c2M*3`F+?F^?9KtGn&x}HCi2X4kKuyVy%fAYQJa&&!jbP-uK7);K%i>^UL$$>+msaQGbQ0k= zm9qe2P{0cqH1rVTEr!0FEiERRDCtizBfU4G+9F1T@T7YAfT06etDfE8Xn7r5)5uHfH8iRqWv zpzcscZ*&#V5zX~NAn3yYx>3dRu8At{3QhXHK#`G>L_$$I4Q9#ELKOqN=F6N2T#z5A z)1=@P4jL#Zj0g|Z`)P1*W<$(K{$~YR)}(^haBn&z23i@#uxLe{EgnfANrX|QcmzrS zit!r86`zFd!_RKqt~QjwJdI5kS|~gnZUlxW;r-+bEOh~rDbhlKK(Xb)^lTS8ZIn&| zkd;VNFf6*>(P1%R^FRB(({w!ZCBIj8Lnrean2bFS*4mBt8YiYc0xY!Gr ze}NS_?)!f2cf~n^^@rcKRFT?W{6NvO=VigQYi1XpC7uf95iRSYBO~qCgP{>GyC-Pc z^P#+oe^#!~!iuHB*N?wuw4M1egm9kBuH2}waF6O5MKz@O_>{NksgbZZ6nd&aNn1iX z3-3604BM;XguOjJUsyvD1?!{wM;OZ63O(zv=Xcli_`bCUE;fJnBp?Q6_}pK*`BH+U zjG|il6qkG-`^JY)dOcrvb^k3#bkdelOdAL&S#pQc8aB)S0vq_MToxMKKb%ZyMMPdm zXQu`Ac6_Gsdq`{O{M@xF)w}eT`o0hG`zpWK+IaCK1u$_=_ACxXqJc5j*6L7^aO&yl z$C0j;L6WH3Q6ukBY-zHY-fJ8y>R^wVMOt<`mXzwtv*QDYh8$RfL;4eOs2{B)z^QiY z>A--iedcD7DL$|B7|Je>XttWu*Nz>glQW0;e6Z7%wH@m|nU`AUH%vGeVv$fb<`lBs_qgu!N`%ZV21Cd5G=YfBK18@0$;jknmjGe79xG(}?=N}yq zk)^O3pe`hUbI!MfDZyY+vvi{9{Jn_CdCZ2l4YEvAG6p8QSR2mxtRY^1R{BTkG|pmV zg(UYaWKMb;?KK@$X9!qB{o4KS6di66SqlY@BO$86cLZ(8AD7A61km*v$fE|Y6Xy@c zW1ax)ezcY-xMlbG6DvJWvcoqRC36a;MTJcAj5+S{#i`?$#gsT^@*TY@+abG~YI7^e zXz`n0+20lSmESh?+Cf+$kn*uAw}p@o0VoZ6={I`|9mg1gJcf08Enfv;(|m|`R?13< zdK1l4roU&ycI?hFVrd7EoWw7Pr}S0q!%4PK8ikbp)$*Kb6dTymNaXR zy6XKi=5<^t*WYk|0F{i96?3*7W zpXO^pVajmC8;m!s=UTC{b!Y}PzvrRwW)YiD9cRb37AuG#-YN{(_#MI(UVOgz+Sz5} zEl7zO{%GO9--w3I?3h|n-$>d~K;P;DxEDtm4;sI+6U#Go6HeRkK^@A`_k$7b<(9I(I& z6+1B_YDsQtl*}D=HG|KlH9mX>g@JS0jl~ z4$j}NOtKUBG0D2u6Us_p#FX>h7C^LyrZ}GVRX6V&dA!vO$4+ik8pW?AkrmWa6~^6X z2B?2@Jvv|OZowCATCY%@0&m$??ir^#ce1UfWJR-`a=zu@$oBG@>TltksN!@v4B=tL zZUIbB2_9DkyVt2pq2>A!=2F(G+X^-0sn9PN4!URrJA9+W7!Qa7|16dMA;@Mt8~8|r z4~0H=yHdJ}e>Tx(ko^Tmp_+Bna4}ZBRkkV@cg#!fK?}E1b~HWN#yq~i63=5kqIv(a z|0Z?q&245@vjBV6%cOglN3h-4yDC2Y2Xi<#sHA4s6Rlkwe|s-;Jl}0vF97ELT_hu}QOVE3HwP}Lw%%jt<#_yJRF3VgxhUya|CgZoi z@xEN2X=wtmD~#0aXc@jqjO;Ckhvc16WGPJLO!&uqw`%#U2(L}2C_L(1wdIEnY*)q# z?A3j}p##IfV}P1JT@Y#Vp9)y?oF?E@`<}nAnfeh{&cD--ZlyoS@H-{6j1UMbS+L9r%7Fb#4S1!Nr<@-cMM`xRk5|RAy2?O-sKxRLWtu=f2MQH zBD!CLO+=LLf#MqE6i^;1A$&*#mb3|`u3VDsv&XX2K8S1|1e6{7a)E8|8=m`p2patSRF_&hG$@j8}l&jVw4U5`k4KNW1 z3#`z?VBk-aRf@hDUG1)?yqys>gwBFFFUb>5MuWCVY*i<}kBO6y0*OlChJ^Mn`G+RN zW}1I6$a&XcL1aa{aq8J0iS8&K2>LGBcD43Rnpe^ssnKvGu2Jwp;VQ35^#CVZOOG~Q zQxbwUOxtsGA-7|d+B(mY@#CRoWc>FHU$&$@y{cDA?=fmf`nKek;2EEI>TR_(t?yEm zbHCAV+;ASC4e;mAinOW%fhy;|rJOE#BjY*Spg*EFcqV4skRfed$}zW9ICO(VZ}$r+ z5BB}jH^ZCdA{RE3CXqYg7SSF_d+{1uLZ_J(Qb&A9$Wq4`3Y7g6qXT*`KwM(odBHw` zC!Z(%{Yr-DxGm9hfp|vG+DL=FlZahjJ^bVe3X4`mS#EzzXLs5b;d$ zM&RJ(K1eMhVRdJcT1@3^7hv3;!5!5(_LlH_oo%MY)FmmRqi0|3yW*D>`#tbq+LKB4 z@`l0jwyVm`=I(;LGZ1LO&>gr{EopDE%+YG?lu!ecRLkw}qFCkB7X^-?bQ`wM(U8}b zBQMq}@8B)fQK5OLb#cmoxof`sazshSO`*eUDA}TyT-ZciX;fhNOC~7t3YO zVC&FD_tyl9`;NqZfxEA7iC97%ma&xWOVvls?%01Qba?MM1Wf1e)Ah0#ygA_>NV)m$ zugtm&wnu-1(f_()FhQ8L5{2pQQ6oFD0!^Ii@j31XT!%FG5p$y0s6h{U zIXJPszl^?*;AcDrD@uT?aW}{14pue*MpBob9GC1jQd`5@U()tO(ztf% zcvzw!f7rD66QcMjqds;$J8d~G&k%n6C^a-#J+-P)L5l$aO6%Xc2==)sbv)xY(M(^U zuy?~Fr4%7(xDcsdZH%Ds_R(h1_Ru9)W~Y_GbJuldd_d`yk_MeT=XR?*``H*^8u$my z<+pmKMTaAB`v=yReDkw)RQ+5Bl*C935EHHk12&}Hm6)X`5n54ld@9^~hfmG0@uV>K0}1?~v{hKbt4> zI5}JQUS#YXByo(?q>L34)#UQ`jqY6lJyq zY<7jcMn>|s%gpQd79@)W4uij;M`*HiP0#h?5p&YAYI^{YIFyybed-hAO-n zNFTU;Qo_olf+&kpidI!kbViS;^fC-T}D)feY~Fn#jx>v|CUj%RqQB)$Qw~!xP>;OyAf4jd>=Tp zC-St@`wVEptmAtRvXZ!cp29B^YZGj=6>R~<4E>I~axyIP2P0y8Ntb{1(VuW_kq?a( z_XoXIJzMtpE4&PTd*A~qe{qVZmc?0E0!_=@JG6iN3ccnUpKOjTJ@@MZG8v=LoS@44 z)H|!0A^}%aj(B4ELF~6r?L`Y)1&n)*_V`+8pHk<1ZM}A<+}*A|YJQ~}UNzE^gy-Vm ztkZ(f7$rqwxp)dB$W&gEG^*f$RMRCK$NW`6DLJBX{gCG-E3}Z%h*Ob2sXI$zVLn{Q z>0w-RUc`r`&Fc#R5SEJxacgQyCwir4V`u?A80Kuue2q0bQMkR#YAbvvza$o62FzaJ z`h)}N==S9R@-Ps;u-c*2ff&PlJ?^CPm#a%U9jXdBdNNcLb7hJHK^FJvxIq6|A-oL^heM{ss&o5ns|6Ee!Jy-vv$eyWr z$}Q*(|MPtqGqrl%xFarD0HFIFq(c*Au^&aKqH4Ot^SC>6b|YM#)!Qb@95*2X&K8f=qa?k+}dcNjwiS-4IT>oqBc z-kAnyT%=&x=z~j?8x5s6+yQq6Zc|C8Lb5OBv~H&&-?9Th)PNBtiThsR6D&6@c|M#H zMmq}rShj;v{(Px7al+di3w=eW{ABb3djK`ShEXB}yGxrGz{jiCZ2RGG zPJtVNWw={Z_jLnyXE_moPJW&*La{&_3eVgBB{AsDJTD-*M>8ixBb&dzyFl|&p(Gq$ z;dc@W1DH$DPi2IIxEEbj!e2@2Tg7DUGLhzQL2y)Alg-Z3!62_>5keUiXi?@f*_gi0 zg~Vd*rkW;d3%LC|OP=Y>KG1H1}%Ff{*81<-XZFKXH`VQFZf0&&qS`a>oLk zF+hlF=`*FntDSPrMLoms`m?5-gAyVP3$)n8GSnp$Bvd1%jnQo8ilg6*R2}|^)*Tfm zh8di&1KbKk=OrlO)>oEoUlTAq?Jr3th7OKIa_PCHzU{3+7Q4723CgxD!ZCAT4-W?$ zHz^_ws}<NTv0dceMPbN>P!rj0JmAt4QYyM_f>})sr1*>KVSIznaL0C zL)C2_@};f;=4$jLYyR0NLm%aQb8+`*OT^(mB6zj#_SA(Ux@cVZ;K4QeSwZi3>2EG? zCUF;&(KSNdNfa;|`6F(oJK5ixG+;FNsKzy$3z(23Tznjf`@=TFNnE(F&^h%JLql!# zY9iC#l1^#Q;Wd*&e69~At0@8cZPyr=4b!YSEmR$4Nsw7qKskdpbwA zq_6R|_IZ8TBw)leGh8-WDLl8H>A8?=oAVy*ymh%bPKN6KE{3^SLIgLnX~5aBiYt_mK9mWYvXvHdYminV^jk7Hi;xG`c&-+uQpm#jvbPBP`rolUcoUy{Q z{o7hxPxKkzzW~A_L3g54g7{)QCsPpFgfw6@@)mXO4$B)|sO9q|0p^8ev8iXc_n&1$ zQStAKfEvR1ck)z5qBIQ@V9k%gmbwtzES+YSTFfntzZ2Pq6D#V~n8hNP4$en=STFm+ zdN5i|R~5OS5ub#l>SqWTkPr5A9DSvun0&z+q_r@j<^4(%x%G*^G1MwR83m%kvPFH# zOUd^}fVW?C9)Rt8_|iDltggvs;(7DH`1XsQz{8?Q zE@hMUudb$@u_pLkDE1`eJs&n4?tBad!v_lN`U`rc;O08wTu~Cjjdej+in>AefTb@W z_4dyiQQAfcy}9gF`dUOOpPgW@CrP{U19AeOR-~01ozJUNJkJ4M10q1K3JGr0#`1S| z!}c3IF`|j8w&|Y}`fWGovS1-l#l;efb+rKs=3f)9r%YL~l9j6HgMh1&GAb}VwKm2q zBSwC0x0)#VVW=g0O-*)gKdghP`3sQmxy0MT=AopAW~^ik^bQ3^$GIpswy2v%Z)G26 zu?jD9}je)88s^uo7It(Rsz9nGjqfgrvcJRQNGk0o_-*w`~k^yKoR z(u1)~3H@nP)>lGu6Q>#?Ik*Mm2rzO-2VQ3eP+Lp8B?SBt`jAY+TSig&J%22M!&s;= zQgXpR|J6?Xnz_;Ka^s0NYH@o6dJ-Gzktp04P6wa-N|A~SN(RYgi|$a2vqdv#?MF9= z{9OIRxf?blH!!w|0#XS0Ar3d5qep$(t2dkfW+74yS+D|<)Sqt`G03e%$Q8Avty zXWY+W6ADUcDV3qXZv6Q^Y=E}}rurv$$D5h=D=;91*?!2}XtG)`C2O^lhccGFGCN_Dbi+!w57lcm5H zX6X%OBPj-`aEXp5LuSvF3FmKYuZBxET58YgaZKyp#@BWy3C`g7Yg#BTxW-cx-59uY zS%z>G9<2!Q-v<1|Xb=R;GEuUz1Zv4)pnEgv{1!jE`z%Zw6VNOLzkm^lmh=R4_`kvt zKSY~$7*?lcMNJ6206(gcJ6~v z1Z0<1_wZGwh$dQ1A^7?GDY!sX=Qx)EpE3!$$sL!2&odrkl=^}IA#>BVz#O#!F4C5- zQ%NuIH*a38_F7MbzEN96#Ag?GzVmy@<@Z-?qO({c{d;rz)wum30J$AI+jA`og+<{G zSg#P3txOAsA*$INkyYQC)5*>eT9+>$K#R9NfNKaH)KJqPV{Q6|&WfN~oiaa}D#Yyt z)>b=y4~fQDpv`DI{q@1J-q9Y)7U)NH(!M@cf5dUV_Op;N0EZDHUTp@Qjx}aY{qL?p zgF87`ne&+t=8B*5N?RfBCyA|IU}Jgm*+^s*Bt)N0_0B})X$i3)MhVCWq%mwxY!83r z_$a~m1vrArD{bn>)R-@9}lUT?|)z-~P$jGn$nlt~4hna_o-`Dy)ym z@FdQ10^FNuoII3gtc1-P`nCAn#Q(j+0xaPO`+1+EUeH{KLB>&KDD{y4iU!C?D2kVf H83q0i;^