diff --git a/foundations-errors.qmd b/foundations-errors.qmd index f93feecd..8e67b4bd 100644 --- a/foundations-errors.qmd +++ b/foundations-errors.qmd @@ -383,7 +383,7 @@ text(2.08, 0.21, "5%", cex = 1.2) ``` First, suppose the sample difference was larger than 0. -In a one-sided test, we would set $H_A:$ difference $> 0.$ If the observed difference falls in the upper 5% of the distribution, we would reject $H_0$ since the p-value would just be a the single tail. +In a one-sided test, we would set $H_A:$ difference $> 0.$ If the observed difference falls in the upper 5% of the distribution, we would reject $H_0$ since the p-value would just be the single tail. Thus, if $H_0$ is true, we incorrectly reject $H_0$ about 5% of the time when the sample mean is above the null value, as shown above. Then, suppose the sample difference was smaller than 0. diff --git a/foundations-mathematical.qmd b/foundations-mathematical.qmd index 10708bf5..569079e2 100644 --- a/foundations-mathematical.qmd +++ b/foundations-mathematical.qmd @@ -546,7 +546,7 @@ openintro::normTail(m = 0, s = 1, L = 0.43) We can also find the Z score associated with a percentile. For example, to identify Z for the $80^{th}$ percentile, we use `qnorm()` which identifies the **quantile** for a given percentage. The quantile represents the cutoff value. -(To remember the function `qnorm()` as providing a cutozff, notice that both `qnorm()` and "cutoff" start with the sound "kuh". +(To remember the function `qnorm()` as providing a cutoff, notice that both `qnorm()` and "cutoff" start with the sound "kuh". To remember the `pnorm()` function as providing a probability from a given cutoff, notice that both `pnorm()` and probability start with the sound "puh".) We determine the Z score for the $80^{th}$ percentile using `qnorm()`: 0.84. 
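As a cross-check on the `qnorm()`/`pnorm()` pairing described above, the same quantile-then-probability round trip can be sketched outside of R with Python's standard library (an editorial aside, not the book's code; the R chunks nearby remain the canonical computation):

```python
from statistics import NormalDist  # standard-library normal distribution

std_normal = NormalDist()          # mean 0, standard deviation 1
z80 = std_normal.inv_cdf(0.80)     # the "qnorm" direction: percentage in, cutoff out
p = std_normal.cdf(z80)            # the "pnorm" direction: cutoff in, probability out

print(round(z80, 2))  # 0.84, matching the Z score for the 80th percentile
```

Feeding the quantile back into the cdf recovers the original percentage, which is exactly the inverse relationship the "kuh"/"puh" mnemonic is pointing at.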
```{r} @@ -1058,7 +1058,7 @@ When the sample size is sufficiently large, the normal approximation generally p ### Observed data -In Section @sec-caseStudyOpportunityCost we were introduced to the opportunity cost study, which found that students became thriftier when they were reminded that not spending money now means the money can be spent on other things in the future. +In @sec-caseStudyOpportunityCost we were introduced to the opportunity cost study, which found that students became thriftier when they were reminded that not spending money now means the money can be spent on other things in the future. Let's re-analyze the data in the context of the normal distribution and compare the results. ::: {.data data-latex=""} @@ -1144,8 +1144,8 @@ Next, let's turn our attention to the medical consultant case study. ### Observed data -In Section @sec-case-study-med-consult we learned about a medical consultant who reported that only 3 of their 62 clients who underwent a liver transplant had complications, which is less than the more common complication rate of 0.10. -In that work, we did not model a null scenario, but we will discuss a simulation method for a one proportion null distribution in Section sec-one-prop-null-boot, such a distribution is provided in @fig-MedConsNullSim-w-normal. +In @sec-case-study-med-consult we learned about a medical consultant who reported that only 3 of their 62 clients who underwent a liver transplant had complications, which is less than the more common complication rate of 0.10. +In that work, we did not model a null scenario, but we will discuss a simulation method for a one proportion null distribution in @sec-one-prop-null-boot; such a distribution is provided in @fig-MedConsNullSim-w-normal. We have added the best-fitting normal curve to the figure, which has a mean of 0.10. 
Borrowing a formula that we'll encounter in [Chapter -@sec-inference-one-prop], the standard error of this distribution was also computed: $SE = 0.038.$ diff --git a/foundations-randomization.qmd b/foundations-randomization.qmd index ce54ce48..fd6ba548 100644 --- a/foundations-randomization.qmd +++ b/foundations-randomization.qmd @@ -28,7 +28,7 @@ You may agree that there is almost always variability in data -- one dataset wil However, quantifying the variability in the data is neither obvious nor easy to do, i.e., answering the question "*how* different is one dataset from another?" is not trivial. First, a note on notation. -We generally use $p$ to denote a population proportion and $\hat{p}$ to a sample proportion. +We generally use $p$ to denote a population proportion and $\hat{p}$ to denote a sample proportion. Similarly, we generally use $\mu$ to denote a population mean and $\bar{x}$ to denote a sample mean. ::: {.workedexample data-latex=""} diff --git a/inf-model-applications.qmd b/inf-model-applications.qmd index c8071ddf..517d8a44 100644 --- a/inf-model-applications.qmd +++ b/inf-model-applications.qmd @@ -293,7 +293,7 @@ Interpret the interval in context.[^27-inf-model-applications-8] ::: [^27-inf-model-applications-8]: Because there were 1,000 bootstrap resamples, we look for the cutoffs which provide 50 bootstrap slopes on the left, 900 in the middle, and 50 on the right. - Looking at the bootstrap histogram, the rough 95% confidence interval is \$9 to \$13.10. + Looking at the bootstrap histogram, the rough 90% confidence interval is \$9 to \$13.10. For games that are new, the average price is higher by between \$9.00 and \$13.10 than games that are used, with 90% confidence. 
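The counting argument in the footnote above (50 bootstrapped slopes on the left, 900 in the middle, 50 on the right, out of 1,000) can be sketched directly. The slopes below are synthetic stand-ins drawn to roughly echo the \$9 to \$13.10 interval, not the book's resamples:

```python
import random

random.seed(1)
# hypothetical bootstrap slopes; the center and spread are made up for illustration
boot_slopes = sorted(random.gauss(11.0, 1.2) for _ in range(1000))

lower = boot_slopes[49]   # 50 resampled slopes fall at or below this cutoff
upper = boot_slopes[950]  # 50 resampled slopes fall at or above this cutoff
# (lower, upper) is the rough 90% percentile interval: 900 slopes sit between
```

The same indexing logic explains why cutting off 5% in each tail of the bootstrap histogram yields a 90% (not 95%) interval, as the corrected footnote states.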
### Cross-validation diff --git a/inf-model-logistic.qmd b/inf-model-logistic.qmd index bd5e14da..845218d4 100644 --- a/inf-model-logistic.qmd +++ b/inf-model-logistic.qmd @@ -124,7 +124,7 @@ ggplot(spam_pred, aes(x = .pred_1, y = spam)) + ``` We'd like to assess the quality of the model. -For example, we might ask: if we look at emails that we modeled as having 10% chance of being spam, do we find out 10% of the actually are spam? +For example, we might ask: if we look at emails that we modeled as having 10% chance of being spam, do we find out 10% of them actually are spam? We can check this for groups of the data by constructing a plot as follows: 1. Bucket the observations into groups based on their predicted probabilities. @@ -320,7 +320,7 @@ Using the example above and focusing on each of the variable p-values (here we w - $H_0: \beta_1 = 0$ given `cc`, `dollar`, and `urgent_subj` are included in the model - $H_0: \beta_2 = 0$ given `to_multiple`, `dollar`, and `urgent_subj` are included in the model - $H_0: \beta_3 = 0$ given `to_multiple`, `cc`, and `urgent_subj` are included in the model -- $H_0: \beta_4 = 0$ given `to_multiple`, `dollar`, and `dollar` are included in the model +- $H_0: \beta_4 = 0$ given `to_multiple`, `cc`, and `dollar` are included in the model The very low p-values from the software output tell us that three of the variables (that is, not `cc`) act as statistically discernible predictors in the model at the discernibility level of 0.05, despite the inclusion of any of the other variables. Consider the p-value on $H_0: \beta_1 = 0$. @@ -346,7 +346,7 @@ A full treatment of cross-validation and logistic regression models is beyond th Using $k$-fold cross-validation, we can build $k$ different models which are used to predict the observations in each of the $k$ holdout samples. As with linear regression (see @sec-inf-mult-reg-cv), we compare a smaller logistic regression model to a larger logistic regression model. 
The smaller model uses only the `to_multiple` variable, see the complete dataset (not cross-validated) model output in @tbl-emaillogmodel1. -The logistic regression model can be written as, where $\hat{p}$ is the estimated probability of being a spam email message: +The logistic regression model can be written as follows, where $\hat{p}$ is the estimated probability of being a spam email message. ```{r} #| include: false diff --git a/inf-model-mlr.qmd b/inf-model-mlr.qmd index f04a8d73..c7fb389c 100644 --- a/inf-model-mlr.qmd +++ b/inf-model-mlr.qmd @@ -260,7 +260,7 @@ What is the difference in total amount? ------------------------------------------------------------------------ -Two samples of coins with the same number of low coins (3), but a different number of total coins (4 vs 5) and a different total total amount (\$0.41 vs \$0.66). +Two samples of coins with the same number of low coins (3), but a different number of total coins (4 vs 5) and a different total amount (\$0.41 vs \$0.66). ```{r} #| label: lowsame @@ -283,7 +283,7 @@ What is the difference in total amount? ------------------------------------------------------------------------ -Two samples of coins with the same total number of coins (4), but a different number of low coins (3 vs 4) and a different total total amount (\$0.41 vs \$0.17). +Two samples of coins with the same total number of coins (4), but a different number of low coins (3 vs 4) and a different total amount (\$0.41 vs \$0.17). ```{r} #| label: totalsame diff --git a/inf-model-slr.qmd b/inf-model-slr.qmd index 181c8c6c..70eb650f 100644 --- a/inf-model-slr.qmd +++ b/inf-model-slr.qmd @@ -149,7 +149,7 @@ ggplot(sandwich3, aes(x = ad, y = rev)) + \vspace{-5mm} -@fig-sand-samp12 shows the two samples and the least squares regressions from fig-sand-samp on the same plot. +@fig-sand-samp12 shows the two samples and the least squares regressions from @fig-sand-samp on the same plot. We can see that the two lines are different. 
That is, there is **variability** in the regression line from sample to sample. The concept of the sampling variability is something you've seen before, but in this lesson, you will focus on the variability of the line often measured through the variability of a single statistic: **the slope of the line**. @@ -723,7 +723,7 @@ In America's two-party system (the vast majority of House members through histor In 2020 there were 232 Democrats, 198 Republicans, and 1 Libertarian in the House. To assess the validity of the claim related to unemployment and voting patterns, we can compile historical data and look for a connection. -We consider every midterm election from 1898 to 2018, with the exception of those elections during the Great Depression. +We consider every midterm election from 1898 to 2018, with the exception of the elections during the Great Depression. The House of Representatives is made up of 435 voting members. ::: {.data data-latex=""} diff --git a/inference-applications.qmd b/inference-applications.qmd index e570a09f..d61516c5 100644 --- a/inference-applications.qmd +++ b/inference-applications.qmd @@ -175,7 +175,7 @@ tsim_table |> - One-sample or differences from paired data: the observations (or differences) must be independent and nearly normal. For larger sample sizes, we can relax the nearly normal requirement, e.g., slight skew is okay for sample sizes of 15, moderate skew for sample sizes of 30, and strong skew for sample sizes of 60. - For a difference of means when the data are not paired: each sample mean must separately satisfy the one-sample conditions for the $t$-distribution, and the data in the groups must also be independent. -- Compute the point estimate of interest, the standard error, and the degrees of freedom For $df,$ use $n-1$ for one sample, and for two samples use either statistical software or the smaller of $n_1 - 1$ and $n_2 - 1.$ +- Compute the point estimate of interest, the standard error, and the degrees of freedom. 
For $df,$ use $n-1$ for one sample, and for two samples use either statistical software or the smaller of $n_1 - 1$ and $n_2 - 1.$ - Compute the T score and p-value. @@ -307,7 +307,7 @@ Remember that there are a total of 44 subjects in the study (22 English and 22 S There are two rows in the dataset for each of the subjects: one representing data from when they were shown an image with 4 items on it and the other with 16 items on it. Each subject was asked 10 questions for each type of image (with a different layout of items on the image for each question). The variable of interest to us is `redundant_perc`, which gives the percentage of questions the subject used a redundant adjective to identify "the blue triangle". -Note that the variable in "percentage", and we are interested in the average percentage. +Note that the variable is a percentage, and we are interested in the average percentage. Therefore, we will use methods for means. If the variable had been "success or failure" (e.g., "used redundant or didn't"), we would have used methods for proportions. diff --git a/inference-one-mean.qmd b/inference-one-mean.qmd index b8f61049..7e54a59f 100644 --- a/inference-one-mean.qmd +++ b/inference-one-mean.qmd @@ -679,7 +679,7 @@ pt(-2.10, df = 18) \vspace{-5mm} ::: {.workedexample data-latex=""} -What proportion of the𝑡-distribution with 20 degrees of freedom falls above 1.65? +What proportion of the $t$-distribution with 20 degrees of freedom falls above 1.65? ------------------------------------------------------------------------ diff --git a/inference-one-prop.qmd b/inference-one-prop.qmd index 6a646ea9..e7eaa1cb 100644 --- a/inference-one-prop.qmd +++ b/inference-one-prop.qmd @@ -127,7 +127,7 @@ The proportions that are equal to or less than $\hat{p} = 0.0484$ are shaded. The shaded areas represent sample proportions under the null distribution that provide at least as much evidence as $\hat{p}$ favoring the alternative hypothesis. 
There were `r medical_consultant_n_sim` simulated sample proportions with $\hat{p}_{sim} \leq 0.0484.$ We use these to construct the null distribution's left-tail area and find the p-value: -$$\text{left tail area} = \frac{\text{Number of observed simulations with }\hat{p}_{sim} \leq \text{ 00.0484}}{10000}$$ +$$\text{left tail area} = \frac{\text{Number of observed simulations with }\hat{p}_{sim} \leq \text{ 0.0484}}{10000}$$ Of the 10,000 simulated $\hat{p}_{sim},$ `r medical_consultant_n_sim` were equal to or smaller than $\hat{p}.$ Since the hypothesis test is one-sided, the estimated p-value is equal to this tail area: `r medical_consultant_p_val`. @@ -554,7 +554,7 @@ The single tail area which represents the p-value is 0.2776. Because the p-value is larger than 0.05, we do not reject $H_0.$ The poll does not provide convincing evidence that a majority of payday loan borrowers support regulations around credit checks and evaluation of debt payments. In @sec-two-prop-errors we discuss two-sided hypothesis tests of which the payday example may have been better structured. -That is, we might have wanted to ask whether the borrows **support or oppose** the regulations (to study opinion in either direction away from the 50% benchmark). +That is, we might have wanted to ask whether the borrowers **support or oppose** the regulations (to study opinion in either direction away from the 50% benchmark). In that case, the p-value would have been doubled to 0.5552 (again, we would not reject $H_0).$ In the two-sided hypothesis setting, the appropriate conclusion would be to claim that the poll does not provide convincing evidence that a majority of payday loan borrowers support or oppose regulations around credit checks and evaluation of debt payments. In both the one-sided or two-sided setting, the conclusion is somewhat unsatisfactory because there is no conclusion. 
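The left-tail area above comes from simulating many samples under $H_0: p = 0.10$ and counting how often $\hat{p}_{sim} \leq 0.0484$. A rough Python sketch of that procedure (the book's simulations are in R; the seed here is arbitrary):

```python
import random

random.seed(2024)
n, p_null = 62, 0.10  # 62 clients, null complication rate 0.10
p_hat = 3 / 62        # observed proportion, about 0.0484

p_sims = []
for _ in range(10_000):
    # one simulated sample of 62 clients under the null hypothesis
    complications = sum(random.random() < p_null for _ in range(n))
    p_sims.append(complications / n)

# one-sided p-value: share of simulations at least as extreme as the data
p_value = sum(s <= p_hat for s in p_sims) / len(p_sims)
```

Since the binomial probability of 3 or fewer complications out of 62 at rate 0.10 is roughly 0.12, the simulated p-value should land near that value, consistent with not rejecting $H_0$ at the 0.05 level.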
diff --git a/inference-paired-means.qmd b/inference-paired-means.qmd index f5e86911..817b4fdb 100644 --- a/inference-paired-means.qmd +++ b/inference-paired-means.qmd @@ -637,7 +637,7 @@ That is, the original dataset would include the list of 68 price differences, an The bootstrap procedure for paired differences is quite similar to the procedure applied to the one-sample statistic case in @sec-boot1mean. In @fig-pairboot, two 99% confidence intervals for the difference in the cost of a new book at the UCLA bookstore compared with Amazon have been calculated. -The bootstrap percentile confidence interval is computing using the 0.5 percentile and 99.5 percentile bootstrapped differences and is found to be (\$0.25, \$7.87). +The bootstrap percentile confidence interval is computed using the 0.5 percentile and 99.5 percentile bootstrapped differences and is found to be (\$0.25, \$7.87). ::: {.guidedpractice data-latex=""} Using the histogram of bootstrapped difference in means, estimate the standard error of the mean of the sample differences, $\bar{x}_{diff}.$[^21-inference-paired-means-1] @@ -924,7 +924,7 @@ Create a 95% confidence interval for the average price difference between books ------------------------------------------------------------------------ -Conditions have already verified and the standard error computed in a previous Example.\ +Conditions have already been verified and the standard error computed in a previous Example.\ To find the confidence interval, identify $t^{\star}_{67}$ using statistical software or the $t$-table $(t^{\star}_{67} = 2.00),$ and plug it, the point estimate, and the standard error into the confidence interval formula: $$ diff --git a/inference-tables.qmd b/inference-tables.qmd index 4741e596..50c24dec 100644 --- a/inference-tables.qmd +++ b/inference-tables.qmd @@ -105,7 +105,7 @@ Obviously we observed fewer than this, though it is not yet clear if that is due If the questions were actually equally effective, meaning about 
27.85% of respondents would disclose the freezing issue regardless of what question they were asked, about how many sellers would we expect to *hide* the freezing problem from the Positive Assumption group?[^18-inference-tables-2] ::: -[^18-inference-tables-2]: We would expect $(1 - 0.2785) \times 73 = 52.67.$ It is okay that this result, like the result from Example \ref{iPodExComputeExpAA}, is a fraction. +[^18-inference-tables-2]: We would expect $(1 - 0.2785) \times 73 = 52.67.$ It is okay that this result, like the result from the Example above, is a fraction. We can compute the expected number of sellers who we would expect to disclose or hide the freezing issue for all groups, if the questions had no impact on what they disclosed, using the same strategies employed in the previous Example and Guided Practice to compute expected counts. These expected counts were used to construct @tbl-ipod-ask-data-summary-expected, which is the same as @tbl-ipod-ask-data-summary, except now the expected counts have been added in parentheses. @@ -521,7 +521,7 @@ To get a sense for the statistic used in the chi-squared test, first compute the [^18-inference-tables-3]: The expected count for row one / column one is found by multiplying the row one total (234) and column one total (319), then dividing by the table total (699): $\frac{234\times 319}{699} = 106.8.$ Similarly for the second column and the first row: $\frac{234\times 380}{699} = 127.2.$ Row 2: 105.9 and 126.1. Row 3: 106.3 and 126.7. -Note, when analyzing 2-by-2 contingency tables (that is, when both variables only have two possible options), one guideline is to use the two-proportion methods introduced in Chapter \@ref(inference-two-props). +Note, when analyzing 2-by-2 contingency tables (that is, when both variables only have two possible options), one guideline is to use the two-proportion methods introduced in [Chapter -@sec-inference-two-props]. 
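The expected counts quoted in the footnote above all come from one formula, (row total × column total) / table total; a quick sketch confirming the first row's values:

```python
def expected_count(row_total, col_total, table_total):
    # expected cell count if the row and column variables were independent
    return row_total * col_total / table_total

e_r1c1 = expected_count(234, 319, 699)  # row one, column one
e_r1c2 = expected_count(234, 380, 699)  # row one, column two

print(round(e_r1c1, 1), round(e_r1c2, 1))  # 106.8 127.2, matching the footnote
```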
\clearpage diff --git a/inference-two-means.qmd b/inference-two-means.qmd index 270165bd..49ff01e2 100644 --- a/inference-two-means.qmd +++ b/inference-two-means.qmd @@ -396,7 +396,7 @@ Choose one of the bootstrap confidence intervals for the true difference in aver ------------------------------------------------------------------------ -Because neither of the 90% intervals (either percentile or SE) above overlap zero (note that zero is never one of the bootstrapped differences so 95% and 99% intervals would have given the same conclusion!), we conclude that the ESC treatment is substantially better with respect to heart pumping capacity than the treatment. +Because neither of the 90% intervals (either percentile or SE) above overlap zero (note that zero is never one of the bootstrapped differences so 95% and 99% intervals would have given the same conclusion!), we conclude that the ESC treatment is substantially better with respect to heart pumping capacity than the control. Because the study is a randomized controlled experiment, we can conclude that it is the treatment (ESC) which is causing the change in pumping capacity. ::: @@ -419,7 +419,7 @@ The `weight` variable represents the weights of the newborns and the `smoke` var ```{r} #| label: tbl-babySmokeDF -#| tbl-cap: Four cases from the `births14` dataset. The emoty cells indicate missing data. +#| tbl-cap: Four cases from the `births14` dataset. The empty cells indicate missing data. #| tbl-pos: H births14 |> select(-premie, -mature, -lowbirthweight, -whitemom, -marital) |> @@ -508,12 +508,12 @@ Since both conditions are satisfied, the difference in sample means may be model ```{r} #| label: fig-babySmokePlotOfTwoGroupsToExamineSkew #| fig-cap: | -#| The top panel represents birth weights for infants whose mothers smoked during -#| pregnancy. 
The bottom panel represents the birth weights for infants whose mothers +#| The left panel represents birth weights for infants whose mothers smoked during +#| pregnancy. The right panel represents the birth weights for infants whose mothers -#| who did not smoke during pregnancy. +#| did not smoke during pregnancy. #| fig-alt: | -#| The top panel represents birth weights for infants whose mothers smoked during -#| pregnancy. The bottom panel represents the birth weights for infants whose mothers +#| The left panel represents birth weights for infants whose mothers smoked during +#| pregnancy. The right panel represents the birth weights for infants whose mothers -#| who did not smoke during pregnancy. +#| did not smoke during pregnancy. #| fig-asp: 0.35 #| out-width: 100% @@ -602,7 +602,7 @@ Furthermore, health differences between babies born to mothers who smoke and tho [^20-inference-two-means-6]: You can watch an episode of John Oliver on [*Last Week Tonight*](youtu.be/6UsHHOCH4q8) to explore the present day offenses of the tobacco industry. Please be aware that there is some adult language. -A small note on the power of the independent t-test (recall the discussion of power in @sec-pow). It turns out that the independent t-test given here is often less powerful than the paired t-test discussed in @sec-mathpaired. That said, depending on how the data are collected, we don't always have mechanism for pairing the data and reducing the inherent variability across observations. +A small note on the power of the independent t-test (recall the discussion of power in @sec-pow). It turns out that the independent t-test given here is often less powerful than the paired t-test discussed in @sec-mathpaired. That said, depending on how the data are collected, we don't always have a mechanism for pairing the data and reducing the inherent variability across observations. 
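The p-values behind t-tests like the one above are tail areas of the t-distribution, computed in R with `pt()`. For intuition only, such a tail area can be approximated by numerically integrating the t density; this sketch is not how the book (or R) computes it:

```python
import math

def t_density(t, df):
    # density of the t-distribution with df degrees of freedom
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(x, df, steps=10_000):
    # P(T > x) for x >= 0: by symmetry, 0.5 minus the area from 0 to x,
    # approximated here with a midpoint rule
    h = x / steps
    return 0.5 - sum(t_density((i + 0.5) * h, df) * h for i in range(steps))

p = t_upper_tail(1.65, df=20)  # about 0.057
```

This should agree closely with R's `pt(1.65, df = 20, lower.tail = FALSE)`, and it makes visible why t tails are heavier than normal tails for small degrees of freedom.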
\clearpage @@ -728,7 +728,7 @@ We are 95% confident that the heart pumping function in sheep that received embr ### Summary In this chapter we extended the single mean inferential methods to questions of differences in means. -You may have seen parallels from the chapters that extended a single proportion (Chapter \@ref(inference-one-prop)) to differences in proportions (Chapter \@ref(inference-two-props)). +You may have seen parallels from the chapters that extended a single proportion ([Chapter -@sec-inference-one-prop]) to differences in proportions ([Chapter -@sec-inference-two-props]). When considering differences in sample means (indeed, when considering many quantitative statistics), we use the t-distribution to describe the sampling distribution of the T score (the standardized difference in sample means). Ideas of confidence level and type of error which might occur from a hypothesis test conclusion are similar to those seen in other chapters (see [Chapter -@sec-foundations-decision-errors]). diff --git a/inference-two-props.qmd b/inference-two-props.qmd index 3c42adcb..00d30e6b 100644 --- a/inference-two-props.qmd +++ b/inference-two-props.qmd @@ -162,7 +162,7 @@ Here, the value of the statistic is: $\hat{p}_T - \hat{p}_C = 0.35 - 0.22 = 0.13 The bootstrap method applied to two samples is an extension of the method described in [Chapter -@sec-foundations-bootstrapping]. Now, we have two samples, so each sample estimates the population from which they came. -In the CPR setting, the `treatment` sample estimates the population of all individuals who have gotten (or will get) the treatment; the `control` sample estimate the population of all individuals who do not get the treatment and are controls. +In the CPR setting, the `treatment` sample estimates the population of all individuals who have gotten (or will get) the treatment; the `control` sample estimates the population of all individuals who do not get the treatment and are controls. 
@fig-boot2proppops extends @fig-boot1 to show the bootstrapping process from two samples simultaneously. ```{r} diff --git a/model-logistic.qmd b/model-logistic.qmd index 21f73cf6..d102102b 100644 --- a/model-logistic.qmd +++ b/model-logistic.qmd @@ -184,7 +184,7 @@ transformation(p_i) = \beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i} + \cdots + \be $$ We want to choose a **transformation**\index{transformation} in the equation that makes practical and mathematical sense. -For example, we want a transformation that makes the range of possibilities on the left hand side of the equation equal to the range of possibilities for the right hand side; if there was no transformation in the equation, the left hand side could only take values between 0 and 1, but the right hand side could take values outside well outside of the range from 0 to 1. +For example, we want a transformation that makes the range of possibilities on the left hand side of the equation equal to the range of possibilities for the right hand side; if there was no transformation in the equation, the left hand side could only take values between 0 and 1, but the right hand side could take values well outside of the range from 0 to 1. ```{r} #| include: false
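The transformation this passage is motivating is the logit, $\text{logit}(p) = \log(p / (1 - p))$, which maps probabilities in $(0, 1)$ onto the whole real line; its inverse maps any real number back into $(0, 1)$. A minimal sketch:

```python
import math

def logit(p):
    # stretches a probability in (0, 1) across the whole real line
    return math.log(p / (1 - p))

def inv_logit(x):
    # squeezes any real number back into (0, 1)
    return 1 / (1 + math.exp(-x))

print(logit(0.5))        # 0.0: a 50-50 probability sits at the middle of the line
print(inv_logit(-20.0))  # even an extreme right-hand side yields a valid probability
```

Because `inv_logit` can never leave $(0, 1)$, the fitted right-hand side $\beta_0 + \beta_1 x_{1,i} + \cdots$ can roam freely without producing impossible probabilities.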