From 05a613cb482e5fccdc5c3c71df1a20153e3b5b3a Mon Sep 17 00:00:00 2001
From: George Ho <19851673+eigenfoo@users.noreply.github.com>
Date: Wed, 26 Jun 2019 07:21:03 +0000
Subject: [PATCH] BLD: finish ANOVA chapter (#13)

* MAINT: remove patsy and republish
* BUG: fix all links
* BLD: one way anova needs work?
* DOC: document plot code
* BLD: working on anova tables...
* MAINT: finish one-way anova
---
 index.html            | 330 ++++++++++++++++++++++++++++---------
 plots.py              |   7 +
 tests-as-linear.ipynb | 369 ++++++++++++++++++++++++++++++------------
 3 files changed, 526 insertions(+), 180 deletions(-)

diff --git a/index.html b/index.html
index 6e4e239..4c7bc03 100644
--- a/index.html
+++ b/index.html
@@ -13226,11 +13226,9 @@

Contents

  • 8 Sources and further equivalences
  • 9 Teaching materials and a course outline
  • 10 Limitations
  • 11 Computing Environment
    @@ -13279,6 +13277,7 @@

    2 Settings and toy data
    import numpy as np
     import pandas as pd
     import scipy
    +import statsmodels.api as sm
     import statsmodels.formula.api as smf
     import matplotlib.pyplot as plt
     import plots
    @@ -14221,7 +14220,7 @@ 

    4.2.3 Python code: Wilcoxon matched pairs
    @@ -14785,32 +14784,17 @@

    6.1.2 Example data
    num_points = 20
    -df = pd.DataFrame()
    -df["y"] = np.concatenate([
    -    np.random.normal(0.0, 1, num_points),
    -    np.random.normal(1.0, 1, num_points),
    -    np.random.normal(0.5, 1, num_points),
    -])
     
    +a = np.random.normal(0.0, 1, num_points)
    +b = np.random.normal(1.0, 1, num_points)
    +c = np.random.normal(0.5, 1, num_points)
    +
    +df = pd.DataFrame()
    +df["y"] = np.concatenate([a, b, c])
     df["group"] = list("".join([num_points * char for char in "abc"]))
     df = df.join(pd.get_dummies(df.group, prefix="group", drop_first=True).astype(np.float64))
    df.head()
    +df.head()
     
    @@ -14900,7 +14884,7 @@

    6.1.2 Example data
    -

    With group a's intercept omni-present, see how exactly one other parameter is added to predict value for group b and c in a given row (scroll to he end). Thus data points in group b never affect the estimates in group c.

    +

    With group a's intercept omni-present, see how exactly one other parameter is added to predict value for group b and c in a given row. Thus data points in group b never affect the estimates in group c.
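
    Concretely, with the fitted model $y = \beta_0 + \beta_1 \cdot \mathrm{group\_b} + \beta_2 \cdot \mathrm{group\_c}$, a row in group a is predicted as just $\hat{y} = \beta_0$, a row in group b as $\hat{y} = \beta_0 + \beta_1$, and a row in group c as $\hat{y} = \beta_0 + \beta_2$.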

    @@ -14921,10 +14905,7 @@

    6.1.3 Python code: one-way ANOVA
     _, p = scipy.stats.kruskal(a, b, c)
    +res = smf.ols("y ~ 1 + group_b + group_c", signed_rank_df).fit()

     table = pd.DataFrame(index=["p value", "df"])
    +table["scipy.stats.kruskal"] = [p, None]
    +table["ols (y ~ 1 + group_b + group_c, signed rank)"] = [res.f_pvalue, res.df_model]

    -res = smf.ols("y ~ 1 + group_b + group_c", df).fit()  # TODO rank
    +table.T

                                                   p value   df
    scipy.stats.kruskal                           0.019324  NaN
    ols (y ~ 1 + group_b + group_c, signed rank)  0.018058  2.0
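
    For reference, the rank transformation that produces signed_rank_df above is added as its own cell in the notebook, using the signed_rank helper already available there:

    signed_rank_df = df.copy()
    signed_rank_df["y"] = signed_rank(signed_rank_df["y"])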

    6.2 Two-way ANOVA

    6.2.1 Theory: As linear models

    Model: one mean per group (main effects) plus these means multiplied across factors (interaction effects). The main effects are the one-way ANOVAs above, though in the context of a larger model. The interaction effect is harder to explain in the abstract even though it's just a few numbers multiplied with each other. I will leave that to the teachers to keep focus on equivalences here :-)

    +

    6.2 Two-way ANOVA

    6.2.1 Theory: As linear models

    Model: one mean per group (main effects) plus these means multiplied across factors (interaction effects). The main effects are the one-way ANOVAs above, though in the context of a larger model. The interaction effect is harder to explain in the abstract even though it's just a few numbers multiplied with each other. I will leave that to the teachers to keep focus on equivalences here :-)

    Switching to matrix notation:

    $y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 \qquad \mathcal{H}_0: \beta_3 = 0$

    Here $\beta_i$ are vectors of betas of which only one is selected by the indicator vector $X_i$. The $\mathcal{H}_0$ shown here is the interaction effect. Note that the intercept $\beta_0$, to which all other $\beta$s are relative, is now the mean for the first level of all factors.

    @@ -15072,10 +15136,111 @@

    6.2 Two-way ANOVA
    -
     
    +
    df["mood"] = (df.shape[0] // 2) * ["happy", "sad"]
    +df = df.join(pd.get_dummies(df.mood, prefix="mood").astype(np.float64))
    +
    +df.head()
     
              y group  group_b  group_c   mood  mood_happy  mood_sad
    0  1.080830     a      0.0      0.0  happy         1.0       0.0
    1 -2.316629     a      0.0      0.0    sad         0.0       1.0
    2 -0.138365     a      0.0      0.0  happy         1.0       0.0
    3  2.003775     a      0.0      0.0    sad         0.0       1.0
    4  2.051200     a      0.0      0.0  happy         1.0       0.0
    @@ -15096,7 +15261,7 @@

    6.2 Two-way ANOVA
    -
     
    +
    # TODO
     
    @@ -15419,6 +15584,12 @@

    8 Sources and further equivalences
  • This article by Kristoffer Magnusson on RM-ANOVA and growth models using lme4::lmer mixed models.
  • This post by Thom Baguley on the Friedman test. That post was actually the one that initiated my exploration of linear equivalences to "non-parametric" tests which ultimately pushed me over the edge to write up the present article.

    9 Teaching materials and a course outline

    Most advanced stats books (and some intro-books) take the "everything is GLMM" approach as well. However, the "linear model" part often stays at the conceptual level, rather than being made explicit. I wanted to make linear models the tool in a concise way. Luckily, more beginner-friendly materials have emerged lately:

    • Russ Poldrack's open-source book "Statistical Thinking for the 21st century" (start at chapter 5 on modeling)

      @@ -15442,55 +15613,55 @@

      9 Teaching materials and a course outline

    • Confidence/credible intervals on the parameters. Stress that the Maximum-Likelihood estimate is extremely unlikely, so intervals are more important.

    • -
    • Briefly introduce $R^2$ for the simple regression models above. Mention in passing that this is called the Pearson and Spearman correlation coefficients.

      +
    • Briefly introduce $R^2$ for the simple regression models above. Mention in passing that this is called the Pearson and Spearman correlation coefficients.

    • - -
      1. Special case #1: One or two means (t-tests, Wilcoxon, Mann-Whitney):

          -
        1. One mean: When there is only one x-value, the regression model simplifies to $y = b$. If $y$ is non-metric, you can rank-transform it. Apply the assumptions (homoscedasticity doesn't apply since there is only one $x$). Mention in passing that these intercept-only models are called one-sample t-test and Wilcoxon Signed Rank test respectively.

          +
        2. One mean: When there is only one x-value, the regression model simplifies to $y = b$. If $y$ is non-metric, you can rank-transform it. Apply the assumptions (homoscedasticity doesn't apply since there is only one $x$). Mention in passing that these intercept-only models are called one-sample t-test and Wilcoxon Signed Rank test respectively.

        3. -
        4. Two means: If we put two variables 1 apart on the x-axis, the difference between the means is the slope. Great! It is accessible to our Swiss army knife called linear modeling. Apply the assumption checks to see that homoscedasticity reduces to equal variance between groups. This is called an independent t-test. Do a few worked examples and exercises, maybe adding Welch's test, and do the rank-transformed version, called Mann-Whitney U.

          +
        5. Two means: If we put two variables 1 apart on the x-axis, the difference between the means is the slope. Great! It is accessible to our Swiss army knife called linear modeling. Apply the assumption checks to see that homoscedasticity reduces to equal variance between groups. This is called an independent t-test. Do a few worked examples and exercises, maybe adding Welch's test, and do the rank-transformed version, called Mann-Whitney U.

        6. -
        7. Paired samples: Violates the independence assumption. After computing pairwise differences, this is equivalent to 2.1 (one intercept), though it is called the paired t-test and Wilcoxon's matched pairs.

          +
        8. Paired samples: Violates the independence assumption. After computing pairwise differences, this is equivalent to 2.1 (one intercept), though it is called the paired t-test and Wilcoxon's matched pairs.

      2. Special case #2: Three or more means (ANOVAs)

          -
        1. Dummy coding of categories: How one regression coefficient for each level of a factor models an intercept for each level when multiplied by a binary indicator. This is just extending what we did in 2.1. to make this data accessible to linear modeling.

          +
        2. Dummy coding of categories: How one regression coefficient for each level of a factor models an intercept for each level when multiplied by a binary indicator. This is just extending what we did in 2.1. to make this data accessible to linear modeling.

        3. -
        4. Means of one variable: One-way ANOVA.

          +
        5. Means of one variable: One-way ANOVA.

        6. -
        7. Means of two variables: Two-way ANOVA.

          +
        8. Means of two variables: Two-way ANOVA.

      3. -
      -
      1. Special case #3: Three or more proportions (Chi-Square)

        1. Logarithmic transformation: Making multiplicative models linear using logarithms, thus modeling proportions. See this excellent introduction to the equivalence of log-linear models and Chi-Square tests as models of proportions. Also needs to introduce (log-)odds ratios. When the multiplicative model is made summative using logarithms, we just add the dummy-coding trick from 3.1, and see that the models are identical to the ANOVA models in 3.2 and 3.3, only the interpretation of the coefficients has changed.

        2. -
        3. Proportions of one variable: Goodness of fit.

          +
        4. Proportions of one variable: Goodness of fit.

        5. -
        6. Proportions of two variables: Contingency tables.

          +
        7. Proportions of two variables: Contingency tables.

      2. -
      -
      1. Hypothesis testing:

          -
        1. Hypothesis testing as model comparisons: Hypothesis testing is the act of choosing between a full model and one where a parameter is fixed to a particular value (often zero, i.e., effectively excluded from the model) instead of being estimated. For example, when fixing one of the two means to zero in the t-test, we study how well a single mean (a one-sample t-test) explains all the data from both groups. If it does a good job, we prefer this model over the two-mean model because it is simpler. So hypothesis testing is just comparing linear models to make more qualitative statements than the truly quantitative statements which were covered in bullets 1-4 above. As tests of single parameters, hypothesis testing is therefore less informative. However, when testing multiple parameters at the same time (e.g., a factor in ANOVA), model comparison becomes invaluable.

          +
        2. Hypothesis testing as model comparisons: Hypothesis testing is the act of choosing between a full model and one where a parameter is fixed to a particular value (often zero, i.e., effectively excluded from the model) instead of being estimated. For example, when fixing one of the two means to zero in the t-test, we study how well a single mean (a one-sample t-test) explains all the data from both groups. If it does a good job, we prefer this model over the two-mean model because it is simpler. So hypothesis testing is just comparing linear models to make more qualitative statements than the truly quantitative statements which were covered in bullets 1-4 above. As tests of single parameters, hypothesis testing is therefore less informative. However, when testing multiple parameters at the same time (e.g., a factor in ANOVA), model comparison becomes invaluable.

        3. Likelihood ratios: Likelihood ratios are the Swiss army knife which will do model comparison all the way from the one-sample t-test to GLMMs. BIC penalizes model complexity. Moreover, add priors and you've got Bayes Factors. One tool, and you're done. I've used LRTs in the ANOVAs above.
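
    A minimal code sketch of such a comparison (hypothetical, reusing the one-way ANOVA df from section 6.1): fit the full and the intercept-only model, then compare them with an F test and with a likelihood ratio:

    full = smf.ols("y ~ 1 + group_b + group_c", df).fit()
    restricted = smf.ols("y ~ 1", df).fit()

    # F-based model comparison (what the ANOVA tables above report)
    sm.stats.anova_lm(restricted, full)

    # Likelihood ratio test of the same comparison
    lr = 2 * (full.llf - restricted.llf)
    p = scipy.stats.chi2.sf(lr, df=full.df_model - restricted.df_model)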


    10 Limitations

    I have made a few simplifications for clarity:

    1. I have not covered assumptions in the examples. This will be another post! But all assumptions of all tests come down to the usual three: a) independence of data points, b) normally distributed residuals, and c) homoscedasticity. (A small code sketch of such checks follows this list.)

      @@ -15499,7 +15670,7 @@

      10 Limitations
    robust models would be preferable, but fail to show the equivalences.

    2. -
    3. Several named tests are still missing from the list and may be added at a later time. This includes the Sign test (requires large N to be reasonably approximated by a linear model), Friedman as RM-ANOVA on rank(y), McNemar, and Binomial/Multinomial. See stuff on these in the section on links to further equivalences. If you think that they should be included here, feel free to submit "solutions" to the github repo of this doc!

      +
    4. Several named tests are still missing from the list and may be added at a later time. This includes the Sign test (requires large N to be reasonably approximated by a linear model), Friedman as RM-ANOVA on rank(y), McNemar, and Binomial/Multinomial. See stuff on these in the section on links to further equivalences. If you think that they should be included here, feel free to submit "solutions" to the github repo of this doc!
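
    Though assumption checks are deferred to that future post, a minimal, hypothetical sketch of what checks b) and c) from point 1 could look like for any fitted smf.ols result res from the sections above:

    from statsmodels.stats.diagnostic import het_breuschpagan

    # b) normally distributed residuals
    _, p_normality = scipy.stats.shapiro(res.resid)

    # c) homoscedasticity (Breusch-Pagan LM test)
    _, p_homoscedasticity, _, _ = het_breuschpagan(res.resid, res.model.exog)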

    @@ -15508,7 +15679,7 @@

    10 Limitations
    -

    11 Computing Environment

    +

    11 Computing Environment

    @@ -15541,7 +15712,6 @@

    11 Computing Environment diff --git a/plots.py b/plots.py index a9f0aba..c16e655 100644 --- a/plots.py +++ b/plots.py @@ -185,6 +185,7 @@ def dummy_coding_plot(): def one_way_anova_plot(): + # Construct data as a pd.DataFrame a = np.random.normal(0, 1, 20) b = np.random.normal(-2, 1, 20) c = np.random.normal(3, 1, 20) @@ -200,17 +201,21 @@ def one_way_anova_plot(): ) df["group_4"] = np.concatenate(3 * [np.zeros_like(d)] + [np.ones_like(d)]) + # ANOVA equivalent linear model res = smf.ols("y ~ 1 + group_2 + group_3 + group_4", df).fit() beta0, beta1, beta2, beta3 = res.params + # Plot fig, ax = plt.subplots(figsize=[10, 8]) ax.scatter(0 * np.ones_like(a), a, color="k") ax.scatter(1 * np.ones_like(b), b, color="k") ax.scatter(2 * np.ones_like(c), c, color="k") ax.scatter(3 * np.ones_like(d), d, color="k") + # Group 1 (baseline) ax.axhline(beta0, color="b", label=r"$\beta_0$ (group 1 mean)") + # Group 2 ax.plot([0.7, 1.3], 2 * [beta0 + beta1], color="navy") ax.plot( [0, 1], @@ -219,6 +224,7 @@ def one_way_anova_plot(): label=r"$\beta_1, \beta_2, ...$ (slopes/differences to $\beta_0$)", ) + # Group 3 ax.plot( [1.7, 2.3], 2 * [beta0 + beta2], @@ -227,6 +233,7 @@ def one_way_anova_plot(): ) ax.plot([1, 2], [beta0, beta0 + beta2], color="r") + # Group 4 ax.plot([2.7, 3.3], 2 * [beta0 + beta3], color="navy") ax.plot([2, 3], [beta0, beta0 + beta3], color="r") diff --git a/tests-as-linear.ipynb b/tests-as-linear.ipynb index 2a184c6..2c6147b 100644 --- a/tests-as-linear.ipynb +++ b/tests-as-linear.ipynb @@ -49,7 +49,7 @@ "- [8 Sources and further equivalences](#8-Sources-and-further-equivalences)\n", "- [9 Teaching materials and a course outline](#9-Teaching-materials-and-a-course-outline)\n", "- [10 Limitations](#10-Limitations)\n", - " - [11 Computing Environment](#11-Computing-Environment)" + "- [11 Computing Environment](#11-Computing-Environment)" ], "text/plain": [ "" @@ -113,6 +113,7 @@ "import numpy as np\n", "import pandas as pd\n", "import scipy\n", + "import statsmodels.api as sm\n", "import statsmodels.formula.api as smf\n", "import matplotlib.pyplot as plt\n", "import plots\n", @@ -961,7 +962,7 @@ " NaN\n", " \n", " \n", - " smf.ols (y_sub_y2 ~ 1)\n", + " smf.ols (y_sub_y2 ~ 1, signed rank)\n", " -9.82\n", " 0.016212\n", " -2.490077\n", @@ -973,9 +974,13 @@ "

    " ], "text/plain": [ - " value p-values t-values 0.025 CI 0.975 CI\n", - "scipy.stats.wilcoxon NaN 0.017794 NaN NaN NaN\n", - "smf.ols (y_sub_y2 ~ 1) -9.82 0.016212 -2.490077 -17.745068 -1.894932" + " value p-values t-values 0.025 CI \\\n", + "scipy.stats.wilcoxon NaN 0.017794 NaN NaN \n", + "smf.ols (y_sub_y2 ~ 1, signed rank) -9.82 0.016212 -2.490077 -17.745068 \n", + "\n", + " 0.975 CI \n", + "scipy.stats.wilcoxon NaN \n", + "smf.ols (y_sub_y2 ~ 1, signed rank) -1.894932 " ] }, "execution_count": 20, @@ -986,7 +991,7 @@ "source": [ "utils.tabulate_results([None, p, None, None, None],\n", " res,\n", - " [\"scipy.stats.wilcoxon\", \"smf.ols (y_sub_y2 ~ 1)\"],\n", + " [\"scipy.stats.wilcoxon\", \"smf.ols (y_sub_y2 ~ 1, signed rank)\"],\n", " coeff=\"Intercept\")" ] }, @@ -1314,7 +1319,7 @@ " NaN\n", " \n", " \n", - " smf.ols (y ~ 1 + group)\n", + " smf.ols (y ~ 1 + group, signed rank)\n", " 25.68\n", " 0.016957\n", " 2.429124\n", @@ -1326,9 +1331,13 @@ "
    " ], "text/plain": [ - " value p-values t-values 0.025 CI 0.975 CI\n", - "scipy.stats.mannwhitneyu NaN 0.011351 NaN NaN NaN\n", - "smf.ols (y ~ 1 + group) 25.68 0.016957 2.429124 4.700777 46.659223" + " value p-values t-values 0.025 CI \\\n", + "scipy.stats.mannwhitneyu NaN 0.011351 NaN NaN \n", + "smf.ols (y ~ 1 + group, signed rank) 25.68 0.016957 2.429124 4.700777 \n", + "\n", + " 0.975 CI \n", + "scipy.stats.mannwhitneyu NaN \n", + "smf.ols (y ~ 1 + group, signed rank) 46.659223 " ] }, "execution_count": 26, @@ -1339,7 +1348,7 @@ "source": [ "utils.tabulate_results([None, p, None, None, None],\n", " res,\n", - " [\"scipy.stats.mannwhitneyu\", \"smf.ols (y ~ 1 + group)\"],\n", + " [\"scipy.stats.mannwhitneyu\", \"smf.ols (y ~ 1 + group, signed rank)\"],\n", " coeff=\"group\")" ] }, @@ -1361,7 +1370,7 @@ "t, p = scipy.stats.ttest_ind(data.y, data.y2, equal_var=False)\n", "\n", "# TODO: linear model with per-group variances\n", - "# See https://lindeloev.github.io/tests-as-linear/#52_welch%E2%80%99s_t-test" + "# See https://github.com/eigenfoo/tests-as-linear/issues/12" ] }, { @@ -1439,24 +1448,6 @@ "cell_type": "code", "execution_count": 29, "metadata": {}, - "outputs": [], - "source": [ - "num_points = 20\n", - "df = pd.DataFrame()\n", - "df[\"y\"] = np.concatenate([\n", - " np.random.normal(0.0, 1, num_points),\n", - " np.random.normal(1.0, 1, num_points),\n", - " np.random.normal(0.5, 1, num_points),\n", - "])\n", - "\n", - "df[\"group\"] = list(\"\".join([num_points * char for char in \"abc\"]))\n", - "df = df.join(pd.get_dummies(df.group, prefix=\"group\", drop_first=True).astype(np.float64))" - ] - }, - { - "cell_type": "code", - "execution_count": 30, - "metadata": {}, "outputs": [ { "data": { @@ -1534,12 +1525,23 @@ "4 2.051200 a 0.0 0.0" ] }, - "execution_count": 30, + "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ + "num_points = 20\n", + "\n", + "a = np.random.normal(0.0, 1, num_points)\n", + "b = np.random.normal(1.0, 1, num_points)\n", + "c = np.random.normal(0.5, 1, num_points)\n", + "\n", + "df = pd.DataFrame()\n", + "df[\"y\"] = np.concatenate([a, b, c])\n", + "df[\"group\"] = list(\"\".join([num_points * char for char in \"abc\"]))\n", + "df = df.join(pd.get_dummies(df.group, prefix=\"group\", drop_first=True).astype(np.float64))\n", + "\n", "df.head()" ] }, @@ -1547,7 +1549,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "With group a's intercept omni-present, see how exactly one other parameter is added to predict `value` for group b and c in a given row (scroll to he end). Thus data points in group b never affect the estimates in group c." + "With group a's intercept omni-present, see how exactly one other parameter is added to predict `value` for group b and c in a given row. Thus data points in group b never affect the estimates in group c." 
] }, { @@ -1561,20 +1563,17 @@ }, { "cell_type": "code", - "execution_count": 31, + "execution_count": 30, "metadata": {}, "outputs": [], "source": [ - "F, p = scipy.stats.f_oneway(df[df[\"group\"] == \"a\"].y,\n", - " df[df[\"group\"] == \"b\"].y,\n", - " df[df[\"group\"] == \"c\"].y)\n", - "\n", + "F, p = scipy.stats.f_oneway(a, b, c)\n", "res = smf.ols(\"y ~ 1 + group_b + group_c\", df).fit()" ] }, { "cell_type": "code", - "execution_count": 32, + "execution_count": 31, "metadata": {}, "outputs": [ { @@ -1598,55 +1597,45 @@ " \n", " \n", " \n", - " value\n", - " p-values\n", - " t-values\n", - " 0.025 CI\n", - " 0.975 CI\n", + " F statistic\n", + " p value\n", + " df\n", " \n", " \n", " \n", " \n", " scipy.stats.f_oneway\n", - " NaN\n", + " 4.512932\n", " 0.015156\n", " NaN\n", - " NaN\n", - " NaN\n", " \n", " \n", - " smf.ols (y ~ 1 + group_b + group_c)\n", - " 0.801863\n", - " 0.012126\n", - " 2.591178\n", - " 0.182182\n", - " 1.421545\n", + " ols (y ~ 1 + group_b + group_c)\n", + " 4.512932\n", + " 0.015156\n", + " 2.0\n", " \n", " \n", "\n", "
    " ], "text/plain": [ - " value p-values t-values 0.025 CI \\\n", - "scipy.stats.f_oneway NaN 0.015156 NaN NaN \n", - "smf.ols (y ~ 1 + group_b + group_c) 0.801863 0.012126 2.591178 0.182182 \n", - "\n", - " 0.975 CI \n", - "scipy.stats.f_oneway NaN \n", - "smf.ols (y ~ 1 + group_b + group_c) 1.421545 " + " F statistic p value df\n", + "scipy.stats.f_oneway 4.512932 0.015156 NaN\n", + "ols (y ~ 1 + group_b + group_c) 4.512932 0.015156 2.0" ] }, - "execution_count": 32, + "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "# FIXME what to tabulate here?\n", - "utils.tabulate_results([None, p, None, None, None],\n", - " res,\n", - " [\"scipy.stats.f_oneway\", \"smf.ols (y ~ 1 + group_b + group_c)\"],\n", - " coeff=\"group_b\")" + "table = pd.DataFrame(index=[\"F statistic\", \"p value\", \"df\"])\n", + "table[\"scipy.stats.f_oneway\"] = [F, p, None]\n", + "table[\"ols (y ~ 1 + group_b + group_c)\"] = [res.fvalue, res.f_pvalue, res.df_model]\n", + "\n", + "table.T" ] }, { @@ -1667,17 +1656,88 @@ "### 6.1.4 Python code: Kruskal-Wallis" ] }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [], + "source": [ + "signed_rank_df = df.copy()\n", + "signed_rank_df[\"y\"] = signed_rank(signed_rank_df[\"y\"])" + ] + }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ - "_, p = scipy.stats.kruskal(df[df[\"group\"] == \"a\"].y,\n", - " df[df[\"group\"] == \"b\"].y,\n", - " df[df[\"group\"] == \"c\"].y)\n", + "_, p = scipy.stats.kruskal(a, b, c)\n", + "res = smf.ols(\"y ~ 1 + group_b + group_c\", signed_rank_df).fit()" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
    \n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
    p valuedf
    scipy.stats.kruskal0.019324NaN
    ols (y ~ 1 + group_b + group_c, signed rank)0.0180582.0
    \n", + "
    " + ], + "text/plain": [ + " p value df\n", + "scipy.stats.kruskal 0.019324 NaN\n", + "ols (y ~ 1 + group_b + group_c, signed rank) 0.018058 2.0" + ] + }, + "execution_count": 34, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "table = pd.DataFrame(index=[\"p value\", \"df\"])\n", + "table[\"scipy.stats.kruskal\"] = [p, None]\n", + "table[\"ols (y ~ 1 + group_b + group_c, signed rank)\"] = [res.f_pvalue, res.df_model]\n", "\n", - "res = smf.ols(\"y ~ 1 + group_b + group_c\", df).fit() # TODO rank" + "table.T" ] }, { @@ -1688,7 +1748,7 @@ "\n", "### 6.2.1 Theory: As linear models\n", "\n", - "Model: one mean per group (main effects) plus these means multiplied across factors (interaction effects). The main effects are the [one-way ANOVA](#anova1)s above, though in the context of a larger model. The interaction effect is harder to explain in the abstract even though it's just a few numbers multiplied with each other. I will leave that to the teachers to keep focus on equivalences here :-)\n", + "Model: one mean per group (main effects) plus these means multiplied across factors (interaction effects). The main effects are the [one-way ANOVAs](#6.1-One-way-ANOVA-and-Kruskal-Wallis) above, though in the context of a larger model. The interaction effect is harder to explain in the abstract even though it's just a few numbers multiplied with each other. I will leave that to the teachers to keep focus on equivalences here :-)\n", "\n", "Switching to matrix notation:\n", "\n", @@ -1701,10 +1761,114 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 35, "metadata": {}, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + "text/html": [ + "
    \n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
    ygroupgroup_bgroup_cmoodmood_happymood_sad
    01.080830a0.00.0happy1.00.0
    1-2.316629a0.00.0sad0.01.0
    2-0.138365a0.00.0happy1.00.0
    32.003775a0.00.0sad0.01.0
    42.051200a0.00.0happy1.00.0
    \n", + "
    " + ], + "text/plain": [ + " y group group_b group_c mood mood_happy mood_sad\n", + "0 1.080830 a 0.0 0.0 happy 1.0 0.0\n", + "1 -2.316629 a 0.0 0.0 sad 0.0 1.0\n", + "2 -0.138365 a 0.0 0.0 happy 1.0 0.0\n", + "3 2.003775 a 0.0 0.0 sad 0.0 1.0\n", + "4 2.051200 a 0.0 0.0 happy 1.0 0.0" + ] + }, + "execution_count": 35, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[\"mood\"] = (df.shape[0] // 2) * [\"happy\", \"sad\"]\n", + "df = df.join(pd.get_dummies(df.mood, prefix=\"mood\").astype(np.float64))\n", + "\n", + "df.head()" + ] }, { "cell_type": "markdown", @@ -1715,10 +1879,12 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 36, "metadata": {}, "outputs": [], - "source": [] + "source": [ + "# TODO" + ] }, { "cell_type": "markdown", @@ -1954,9 +2120,13 @@ " * [These slides by Christoph Scheepers](https://www.uni-tuebingen.de/fileadmin/Uni_Tuebingen/SFB/SFB_833/A_Bereich/A1/Christoph_Scheepers_-_Statistikworkshop.pdf) on Chi-Square as log-linear models.\n", " * [This notebook by Philip M. Alday](https://rpubs.com/palday/glm-test) on Chi-square, binomial, multinomial, and poisson tests as log-linear and logistic models. These \"equivalences\" are less exact than what I presented above, and were therefore not included here. They are still great for a conceptual understanding of these tests, though!\n", " * [This article by Kristoffer Magnusson](https://rpsychologist.com/r-guide-longitudinal-lme-lmer) on RM-ANOVA and growth models using `lme4::lmer` mixed models.\n", - " * [This post by Thom Baguley](https://seriousstats.wordpress.com/2012/02/14/friedman/) on the Friedman test. That post was actually the one that inititated my exploration of linear equivalences to \"non-parametric\"\" tests which ultimately pushed me over the edge to write up the present article.\n", - "\n", - "\n", + " * [This post by Thom Baguley](https://seriousstats.wordpress.com/2012/02/14/friedman/) on the Friedman test. That post was actually the one that inititated my exploration of linear equivalences to \"non-parametric\"\" tests which ultimately pushed me over the edge to write up the present article." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ "# 9 Teaching materials and a course outline\n", "\n", "Most advanced stats books (and some intro-books) take the \"everything is GLMM\" approach as well. However, the \"linear model\" part often stays at the conceptual level, rather than being made explicit. I wanted to make linear models the *tool* in a concise way. Luckily, more beginner-friendly materials have emerged lately:\n", @@ -1972,7 +2142,6 @@ "Note that whereas the understanding of sampling and hypothesis testing is usually the first focus of mainstream stats courses, it is saved for later here to build upon students' prior knowledge, rather than throwing a lot of conceptually novel material at them.\n", "\n", "1. **Fundamentals of regression:**\n", - "\n", " 1. Recall from high-school: $y = a \\cdot x + b$, and getting a really good intuition about slopes and intercepts. Understanding that this can be written using all variable names, e.g., `money = profit * time + starting_money` or $y = \\beta_1x + \\beta_2*1$ or, suppressing the coefficients, as `y ~ x + 1`. If the audience is receptive, convey the idea of these models [as a solution to differential equations](https://magesblog.com/post/modelling-change), specifying how $y$ *changes* with $x$.\n", " \n", " 2. Extend to a few multiple regression as models. 
Make sure to include plenty of real-life examples and exercises at this point to make all of this really intuitive. Marvel at how briefly these models allow us to represent large datasets.\n", @@ -1983,42 +2152,43 @@ " \n", " 5. Confidence/credible intervals on the parameters. Stress that the Maximum-Likelihood estimate is extremely unlikely, so intervals are more important.\n", " \n", - " 6. Briefly introduce $R^2$ for the simple regression models above. Mention in passing that this is called [the Pearson and Spearman correlation coefficients](#correlation). \n", + " 6. Briefly introduce $R^2$ for the simple regression models above. Mention in passing that this is called [the Pearson and Spearman correlation coefficients](#3-Pearson-and-Spearman-correlation). \n", "\n", - " \n", "2. **Special case #1: One or two means (t-tests, Wilcoxon, Mann-Whitney):**\n", "\n", - " 1. **One mean:** When there is only one x-value, the regression model simplifies to $y = b$. If $y$ is non-metric, you can rank-transform it. Apply the assumptions (homoscedasticity doesn't apply since there is only one $x$). Mention in passing that these intercept-only models are called [one-sample t-test and Wilcoxon Signed Rank test respectively](#t1).\n", + " 1. **One mean:** When there is only one x-value, the regression model simplifies to $y = b$. If $y$ is non-metric, you can rank-transform it. Apply the assumptions (homoscedasticity doesn't apply since there is only one $x$). Mention in passing that these intercept-only models are called [one-sample t-test and Wilcoxon Signed Rank test respectively](#4.1-One-sample-t-test-and-Wilcoxon-signed-rank).\n", " \n", - " 2. **Two means:** If we put two variables 1 apart on the x-axis, the difference between the means is the slope. Great! It is accessible to our swizz army knife called linear modeling. Apply the assumption checks to see that homoscedasticity reduces to equal variance between groups. This is called an [independent t-test](#t2). Do a few worked examples and exercises, maybe adding Welch's test, and do the rank-transformed version, called Mann-Whitney U.\n", + " 2. **Two means:** If we put two variables 1 apart on the x-axis, the difference between the means is the slope. Great! It is accessible to our swizz army knife called linear modeling. Apply the assumption checks to see that homoscedasticity reduces to equal variance between groups. This is called an [independent t-test](#5.1-Independent-t-test-and-Mann-Whitney-U). Do a few worked examples and exercises, maybe adding Welch's test, and do the rank-transformed version, called Mann-Whitney U.\n", " \n", - " 3. *Paired samples:* Violates the independence assumption. After computing pairwise differences, this is equivalent to 2.1 (one intercept), though it is called the [paired t-test and Wilcoxon's matched pairs](#tpair).\n", + " 3. *Paired samples:* Violates the independence assumption. After computing pairwise differences, this is equivalent to 2.1 (one intercept), though it is called the [paired t-test and Wilcoxon's matched pairs](#4.2-Paired-samples-t-test-and-Wilcoxon-matched-pairs).\n", " \n", "3. **Special case #2: Three or more means (ANOVAs)**\n", "\n", - " 1. *[Dummy coding](#dummy) of categories:* How one regression coefficient for each level of a factor models an intercept for each level when multiplied by a binary indicator. This is just extending what we did in 2.1. to make this data accessible to linear modeling.\n", - " \n", - " 2. 
*Means of one variable:* [One-way ANOVA](#anova1).\n", - " \n", - " 3. *Means of two variables:* [Two-way ANOVA](#anova2).\n", + " 1. *[Dummy coding](#5.1.2-Theory:-Dummy-coding) of categories:* How one regression coefficient for each level of a factor models an intercept for each level when multiplied by a binary indicator. This is just extending what we did in 2.1. to make this data accessible to linear modeling.\n", " \n", + " 2. *Means of one variable:* [One-way ANOVA](#6.1-One-way-ANOVA-and-Kruskal-Wallis).\n", " \n", + " 3. *Means of two variables:* [Two-way ANOVA](#6.2-Two-way-ANOVA).\n", + "\n", "4. **Special case #3: Three or more proportions (Chi-Square)**\n", "\n", " 1. *Logarithmic transformation:* Making multiplicative models linear using logarithms, thus modeling proportions. See [this excellent introduction](https://www.uni-tuebingen.de/fileadmin/Uni_Tuebingen/SFB/SFB_833/A_Bereich/A1/Christoph_Scheepers_-_Statistikworkshop.pdf) to the equivalence of log-linear models and Chi-Square tests as models of proportions. Also needs to introduce (log-)odds ratios. When the multiplicative model is made summative using logarithms, we just add the dummy-coding trick from 3.1, and see that the models are identical to the ANOVA models in 3.2 and 3.3, only the interpretation of the coefficients have changed.\n", " \n", - " 2. *Proportions of one variable:* [Goodness of fit](#goodness).\n", + " 2. *Proportions of one variable:* [Goodness of fit](#7.1-Goodness-of-fit).\n", " \n", - " 3. *Proportions of two variables:* [Contingency tables](#contingency).\n", - "\n", + " 3. *Proportions of two variables:* [Contingency tables](#7.2-Contingency-tables).\n", "\n", "5. **Hypothesis testing:** \n", "\n", - " 1. *Hypothesis testing as model comparisons:* Hypothesis testing is the act of choosing between a full model and one where a parameter is fixed to a particular value (often zero, i.e., effectively excluded from the model) instead of being estimated. For example, when fixing one of the two means to zero in the [t-test](#t2), we study how well a single mean (a [one-sample t-test](#t1)) explains all the data from both groups. If it does a good job, we prefer this model over the two-mean model because it is simpler. So hypothesis testing is just comparing linear models to make more qualitative statements than the truly quantitative statements which were covered in bullets 1-4 above. As tests of single parameters, hypothesis testing is therefore less informative However, when testing multiple parameters at the same time (e.g., a factor in ANOVA), model comparison becomes invaluable.\n", + " 1. *Hypothesis testing as model comparisons:* Hypothesis testing is the act of choosing between a full model and one where a parameter is fixed to a particular value (often zero, i.e., effectively excluded from the model) instead of being estimated. For example, when fixing one of the two means to zero in the [t-test](#5.1-Independent-t-test-and-Mann-Whitney-U), we study how well a single mean (a [one-sample t-test](#4.1-One-sample-t-test-and-Wilcoxon-signed-rank)) explains all the data from both groups. If it does a good job, we prefer this model over the two-mean model because it is simpler. So hypothesis testing is just comparing linear models to make more qualitative statements than the truly quantitative statements which were covered in bullets 1-4 above. 
As tests of single parameters, hypothesis testing is therefore less informative However, when testing multiple parameters at the same time (e.g., a factor in ANOVA), model comparison becomes invaluable.\n", " \n", - " 2. *Likelihood ratios:* Likelihood ratios are the swizz army knife which will do model comparison all the way from the one-sample t-test to GLMMs. BIC penalizes model complexity. Moreover, add priors and you've got Bayes Factors. One tool, and you're done. I've used LRTs in the ANOVAs above.\n", - "\n", - "\n", + " 2. *Likelihood ratios:* Likelihood ratios are the swizz army knife which will do model comparison all the way from the one-sample t-test to GLMMs. BIC penalizes model complexity. Moreover, add priors and you've got Bayes Factors. One tool, and you're done. I've used LRTs in the ANOVAs above." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ "# 10 Limitations\n", "\n", "I have made a few simplifications for clarity:\n", @@ -2029,19 +2199,19 @@ "\n", "3. I have not discussed inference. I am only including p-values in the comparisons as a crude way to show the equivalences between the underlying models since people care about p-values. Parameter estimates will show the same equivalence. How to do *inference* is another matter. Personally, I'm a Bayesian, but going Bayesian here would render it less accessible to the wider audience. Also, doing [robust models](https://en.wikipedia.org/wiki/Robust_statistics) would be preferable, but fail to show the equivalences.\n", "\n", - "4. Several named tests are still missing from the list and may be added at a later time. This includes the Sign test (require large N to be reasonably approximated by a linear model), Friedman as RM-ANOVA on `rank(y)`, McNemar, and Binomial/Multinomial. See stuff on these in [the section on links to further equivalences](#links). If you think that they should be included here, feel free to submit \"solutions\" to [the github repo](https://github.com/lindeloev/tests-as-linear/) of this doc!\n" + "4. Several named tests are still missing from the list and may be added at a later time. This includes the Sign test (require large N to be reasonably approximated by a linear model), Friedman as RM-ANOVA on `rank(y)`, McNemar, and Binomial/Multinomial. See stuff on these in [the section on links to further equivalences](#8-Sources-and-further-equivalences). If you think that they should be included here, feel free to submit \"solutions\" to [the github repo](https://github.com/lindeloev/tests-as-linear/) of this doc!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## 11 Computing Environment" + "# 11 Computing Environment" ] }, { "cell_type": "code", - "execution_count": 34, + "execution_count": 37, "metadata": {}, "outputs": [ { @@ -2052,7 +2222,6 @@ "matplotlib==3.1.0\r\n", "numpy==1.16.4\r\n", "pandas==0.24.2\r\n", - "patsy==0.5.1\r\n", "scipy==1.3.0\r\n", "statsmodels==0.10.0\r\n" ]