CI docs about 1-sided tests + default 90% CI for phi/V #366
Merged
a463197
CI docs about 1-sided tests + default 90% CI for phi/V
mattansb fd8e2cc
fix tests
mattansb d602ec8
New alternative arg (part 1)
mattansb 6b6a833
new CIs for rankES
mattansb 0817b2b
CI docs + ANOVA EF get new CI
mattansb ac5591b
new CIs for htests
mattansb 2ccd0f7
Revise effectsize_CIs for clarity
bwiernik 3d82e22
Update NEWS.md
bwiernik 06f7f8d
Fix HTML syntax
bwiernik b1cca46
remove one-sided-ci docs from individual docs.
mattansb a66c86e
minor edits ro CI docs
mattansb f02eba6
fix tests
mattansb 1eea82b
equi-test
mattansb 9bcc220
fix bugs in build
mattansb f6de72c
test one sided CI
mattansb 0aafc49
ref fix in docs
mattansb 510ab59
link to CIs
mattansb 2ba3852
cleanup examples in xtab
mattansb 693d1ed
show one sided CIs in vignettes
mattansb 1305743
version bump
mattansb 8ee89de
remove ref to master branch
mattansb
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,87 +1,156 @@ | ||
#' Confidence Intervals | ||
#' Confidence (Compatibility) Intervals | ||
#' | ||
#' More information regarding Confidence Intervals and how they are computed in | ||
#' `effectsize`. | ||
#' More information regarding Confidence (Compatibility) Intervals and how | |
#' they are computed in *effectsize*. | ||
#' | ||
#' @section Confidence Intervals: | ||
#' Unless stated otherwise, confidence intervals are estimated using the | ||
#' Noncentrality parameter method; These methods searches for a the best | ||
#' non-central parameters (`ncp`s) of the noncentral t-, F- or Chi-squared | ||
#' distribution for the desired tail-probabilities, and then convert these | ||
#' `ncp`s to the corresponding effect sizes. (See full [effectsize-CIs] for | ||
#' more.) | ||
#' @section Confidence (Compatibility) Intervals (CIs): | ||
#' Unless stated otherwise, confidence (compatibility) intervals (CIs) are | ||
#' estimated using the noncentrality parameter method (also called the | ||
#' "pivot method"). This method finds the noncentrality parameter ("*ncp*") of | ||
#' a noncentral *t*, *F*, or chi-squared distribution that places the observed | ||
#' *t*, *F*, or chi-squared test statistic at the desired probability point of | ||
#' the distribution. For example, if the observed *t* statistic is 2.0, with 50 | ||
#' degrees of freedom, for which cumulative noncentral *t* distribution is | ||
#' *t* = 2.0 the .025 quantile (answer: the noncentral *t* distribution with | ||
#' *ncp* = .04)? After estimating these confidence bounds on the *ncp*, they are | ||
#' converted into the effect size metric to obtain a confidence interval for the | ||
#' effect size (Steiger, 2004). | ||
#' \cr\cr | ||
#' For additional details on estimation and troubleshooting, see [effectsize_CIs]. | ||
#' | ||
#' @section CIs and Significance Tests: | ||
#' "Confidence intervals on measures of effect size convey all the information | ||
#' in a hypothesis test, and more." (Steiger, 2004). Confidence (compatibility) | ||
#' intervals and p values are complementary summaries of parameter uncertainty | ||
#' given the observed data. A dichotomous hypothesis test could be performed | ||
#' with either a CI or a p value. The | ||
#' 100(\ifelse{latex}{\out{$1 - \alpha$}}{\ifelse{html}{\out{1 − α}}{1 - alpha}})% | ||
#' confidence interval contains all of the parameter values for which | ||
#' \ifelse{latex}{\out{$p > \alpha$}}{\ifelse{html}{\out{p > α}}{p > alpha}} | ||
#' for the current data and model. For example, a 95% confidence interval | ||
#' contains all of the values for which p > .05. | ||
#' \cr\cr | ||
#' Note that a confidence interval including 0 *does not* indicate no effect. | ||
#' Rather, it suggests that the observed data and background model assumptions | ||
#' combined do not clearly indicate against a parameter value of 0 (or any other | ||
#' value in the interval), with the level of this evidence defined by the chosen | ||
#' \ifelse{latex}{\out{$\alpha$}}{\ifelse{html}{\out{α}}{alpha}} level | ||
#' (Rafi & Greenland, 2020; Schweder & Hjort, 2016; Xie & Singh, 2013). To | ||
#' infer no effect, additional judgments about what parameter values are "close | ||
#' enough" to 0 to be negligible are needed ("equivalence testing"; | ||
#' Bauer & Kieser, 1996). | |
#' | ||
#' @section CI Contains Zero: | ||
#' Keep in mind that `ncp` confidence intervals are inverted significance tests, | ||
#' and only inform us about which values are not significantly different than | ||
#' our sample estimate. (They do *not* inform us about which values are | ||
#' plausible, likely or compatible with our data.) Thus, when CIs contain the | ||
#' value 0, this should *not* be taken to mean that a null effect size is | ||
#' supported by the data; Instead this merely reflects a non-significant test | ||
#' statistic - i.e. the *p*-value is greater than alpha (Morey et al., 2016). | ||
#' @section One-Sided CIs: | ||
#' Typically, CIs are constructed as two-tailed intervals, with an equal | ||
#' proportion of the cumulative probability distribution above and below the | ||
#' interval. CIs can also be constructed as *one-sided* intervals, | ||
#' giving only a lower bound or upper bound. This is analogous to computing a | ||
#' 1-tailed *p* value or conducting a 1-tailed hypothesis test. | ||
#' \cr\cr | ||
#' Significance tests conducted using CIs (whether a value is inside the interval) | ||
#' and using *p* values (whether p < alpha for that value) are only guaranteed | ||
#' to agree when both are constructed using the same number of sides/tails. | ||
#' \cr\cr | ||
#' Most effect sizes are not bounded by zero (e.g., *r*, *d*, *g*). These | ||
#' typically involve *t*- or *z*-statistics and are generally tested using | ||
#' 2-tailed tests and 2-sided CIs. | ||
#' \cr\cr | ||
#' Some effect sizes are strictly positive--they have a minimum value of 0. | ||
#' For example, | ||
#' \ifelse{latex}{\out{$R^2$}}{\ifelse{html}{\out{*R*<sup>2</sup>}}{R^2}}, | ||
#' \ifelse{latex}{\out{$\eta^2$}}{\ifelse{html}{\out{η<sup>2</sup>}}{eta^2}}, | ||
#' and other variance-accounted-for effect sizes, as well as Cramer's *V* and | ||
#' multiple *R*, range from 0 to 1. These typically involve *F*- or | ||
#' \ifelse{latex}{\out{$\chi^2$}}{\ifelse{html}{\out{&chi;<sup>2</sup>}}{chi-squared}}-statistics | |
#' and are generally tested using *1-tailed* tests. These tests examine whether the | |
#' estimated effect size is *larger* than the hypothesized value (e.g., 0). The | ||
#' corresponding CI that yields the same significance decision is a *1-sided* CI | ||
#' estimating only a lower bound. This is the default CI computed by *effectsize* | ||
#' for these effect sizes, called by setting `alternative = "greater"`. | ||
#' \cr\cr | ||
#' For positive only effect sizes (Eta squared, Cramer's V, etc.; Effect sizes | ||
#' associated with Chi-squared and F distributions), and for one-sided CIs in | ||
#' general, this applies also to cases where the lower bound of the CI is equal | ||
#' to 0. For example: | ||
#' This lower bound interval indicates the smallest effect size that is not | ||
#' significantly different from the observed effect size. That is, it is the | ||
#' minimum effect size compatible with the observed data, background model | |
#' assumptions, and \ifelse{latex}{\out{$\alpha$}}{\ifelse{html}{\out{α}}{alpha}} | ||
#' level. The interval does not indicate a maximum effect size value; anything | ||
#' up to the maximum possible value of the effect size (e.g., 1) is in the interval. | ||
#' \cr\cr | ||
#' An alternative 1-sided CI that can be used to test against the maximum effect | ||
#' size value (e.g., is | ||
#' \ifelse{latex}{\out{$R^2$}}{\ifelse{html}{\out{*R*<sup>2</sup>}}{R^2}} | ||
#' significantly different from a perfect correlation of 1.0?) can be obtained by setting | |
#' `alternative = "less"`. This estimates a CI with only an *upper* bound; | ||
#' anything from the minimum possible value of the effect size (e.g., 0) up to | ||
#' this upper bound is in the interval. | ||
#' \cr\cr | ||
#' To obtain a 2-sided interval with equal probability proportions above and | ||
#' below the interval, set `alternative = "two.sided"`. These intervals can | |
#' be interpreted in the same way as other 2-sided intervals, such as those | ||
#' for *r*, *d*, or *g*. | ||
#' \cr\cr | ||
#' An alternative approach to aligning significance tests using CIs and 1-tailed | ||
#' *p* values that can often be found in the literature is to | ||
#' construct a 2-sided CI at a lower confidence level ( | ||
#' 100(\ifelse{latex}{\out{$1 - 2\alpha$}}{\ifelse{html}{\out{1 − 2α}}{1 - 2*alpha}})% | ||
#' = \ifelse{latex}{\out{$100 - 2 \times 5\% = 90\%$}}{\ifelse{html}{\out{100 − 2 &times; 5% = 90%}}{100 - 2*5% = 90%}}), | |
#' which estimates the lower bound and upper bound for the above 1-sided intervals | |
#' simultaneously. These intervals are commonly reported when conducting equivalence | ||
#' tests. For example, a 90% 2-sided interval gives the bounds for an equivalence | ||
#' test with \ifelse{latex}{\out{$\alpha = .05$}}{\ifelse{html}{\out{α = .05}}{alpha = .05}}. | ||
#' However, be aware that this interval does not give 95% coverage for the | ||
#' underlying effect size parameter value. For that, construct a 95% 2-sided CI. | ||
#' For example: | ||
#' | ||
#' ```{r} | ||
#' fit <- aov(mpg ~ factor(gear) + factor(cyl), mtcars[1:6, ]) | ||
#' eta_squared(fit) | ||
#' data("hardlyworking") | ||
#' fit <- lm(salary ~ n_comps + age, data = hardlyworking) | ||
#' eta_squared(fit, ci = 0.95, alternative = "greater") # default, lower 1-sided bound | |
#' eta_squared(fit, ci = 0.95, alternative = "less") # upper 1-sided bound | |
#' eta_squared(fit, ci = 0.9, alternative = "two.sided") # both 1-sided bounds for alpha = .05 | ||
#' eta_squared(fit, ci = 0.95, alternative = "two.sided") # 2-sided bounds for alpha = .05 | ||
#' ``` | ||
#' Even more care should be taken when the *upper* bound is equal to 0 - this | ||
#' occurs when *p*-value is greater than 1-alpha/2 making, the upper bound | ||
#' cannot be estimated, and the upper bound is arbitrarily set to 0 (Steiger, | ||
#' 2004). | ||
#' | ||
#' @section CI Does Not Contain the Estimate: | ||
#' For very large sample sizes, the width of the CI can be smaller than the | ||
#' tolerance of the optimizer, resulting in CIs of width 0. This can also, | ||
#' result in the estimated CIs excluding the point estimate. For example: | ||
#' For very large sample sizes or effect sizes, the width of the CI can be | ||
#' smaller than the tolerance of the optimizer, resulting in CIs of width 0. | ||
#' This can also result in the estimated CIs excluding the point estimate. | ||
#' | ||
#' For example: | ||
#' ```{r} | ||
#' t_to_d(80, df_error = 4555555) | ||
#' ``` | ||
#' | ||
#' @section One-Sided CIs: | ||
#' "Confidence intervals on measures of effect size convey all the information | ||
#' in a hypothesis test, and more" (Steiger, 2004). Essentially, a hypothesis | ||
#' test can be preformed by inspecting the CI - if it excludes the null | ||
#' hypothesized value, then we can conclude that the effect size is | ||
#' significantly different from this value. For 2-sided tests, such as those | ||
#' typically involving *t*- or *z*-statistics, this is done by estimating an | ||
#' upper bound, which indicates values the effect size is significantly smaller | ||
#' than, and a lower bound, which indicates values the effect size is | ||
#' significantly larger than. | ||
#' \cr\cr | ||
#' However, one-sided hypothesis tests, such as those involving *F*- or | ||
#' \eqn{\chi^2}-statistics (or one-tailed *t*- or *z*-tests), test if the | ||
#' estimated effect size is *larger* than the null hypothesized value. | ||
#' Accordingly, the constructed CI is constructed by estimating only a *lower | ||
#' bound*, which indicates values the effect size is significantly larger than, | ||
#' whereas the *upper bound* is fixed at the maximal possible value of the | ||
#' effect size, since there is no value *above* the estimated effect size that | ||
#' is significantly *smaller* than it. (And vice versa for one-sided tests of | ||
#' inferiority.) This is why across `effectsize`, effect sizes associated with | ||
#' *F*- or \eqn{\chi^2}-statistics (Cramer's *V*, \eqn{\eta^2}, ...) default to | ||
#' a 95% CI with `alternative = "greater"`, to construct one sided CIs. | ||
#' \cr\cr | ||
#' An alternative solution that can often be found in the literature is to | ||
#' construct a two-sided CI at a lower confidence level (1-2\eqn{\alpha} = | ||
#' 1-2*5% = 90%), which gives the same lower bound, but also estimates an upper | ||
#' bound. Although this can be useful for equivalence testing, it should be | ||
#' noted that this solution doesn't actually give 95% coverage on the estimated | ||
#' effect size. For example: | ||
#' ```{r} | ||
#' data("hardlyworking") | ||
#' fit <- lm(salary ~ n_comps + age, data = hardlyworking) | ||
#' eta_squared(fit) | ||
#' eta_squared(fit, ci = 0.9, alternative = "two.sided") | ||
#' ``` | ||
#' In these cases, consider an alternative optimizer, such as those used in | |
#' the \pkg{MBESS} package or an alternative method for computing CIs, | ||
#' such as the bootstrap. | ||
#' | ||
#' | ||
#' @references | ||
#' - Morey, R. D., Hoekstra, R., Rouder, J. N., Lee, M. D., & Wagenmakers, E. J. (2016). The fallacy of placing confidence in confidence intervals. Psychonomic bulletin & review, 23(1), 103-123. | ||
#' - Steiger, J. H. (2004). Beyond the F test: Effect size confidence intervals and tests of close fit in the analysis of variance and contrast analysis. Psychological Methods, 9, 164-182. | ||
#' Bauer, P., & Kieser, M. (1996). | ||
#' A unifying approach for confidence intervals and testing of equivalence and difference. | ||
#' _Biometrika, 83_(4), 934--937. | |
#' \doi{10.1093/biomet/83.4.934} | ||
#' | ||
#' Rafi, Z., & Greenland, S. (2020). | ||
#' Semantic and cognitive tools to aid statistical science: Replace confidence and significance by compatibility and surprise. | ||
#' _BMC Medical Research Methodology, 20_(1), Article 244. | ||
#' \doi{10.1186/s12874-020-01105-9} | ||
#' | ||
#' Schweder, T., & Hjort, N. L. (2016). | ||
#' _Confidence, likelihood, probability: Statistical inference with confidence distributions._ | ||
#' Cambridge University Press. | ||
#' \doi{10.1017/CBO9781139046671} | ||
#' | ||
#' Steiger, J. H. (2004). | ||
#' Beyond the _F_ test: Effect size confidence intervals and tests of close fit in the analysis of variance and contrast analysis. | ||
#' _Psychological Methods, 9_(2), 164--182. | ||
#' \doi{10.1037/1082-989x.9.2.164} | ||
#' | ||
#' Xie, M., & Singh, K. (2013). | ||
#' Confidence distribution, the frequentist distribution estimator of a parameter: A review. | ||
#' _International Statistical Review, 81_(1), 3--39. | |
#' \doi{10.1111/insr.12000} | ||
#' | ||
#' @rdname effectsize-CIs | ||
#' @name effectsize-CIs | ||
#' @rdname effectsize_CIs | ||
#' @name effectsize_CIs | ||
NULL |
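Editorial note: the noncentrality parameter ("pivot") method described in the new docs can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the code *effectsize* actually runs: `ncp_bound()` is a hypothetical helper, the search interval passed to `uniroot()` is an arbitrary choice, and the example reuses the *t* = 2.0, df = 50 values from the docs. It also shows why the lower bound of a 95% 1-sided CI coincides with the lower bound of a 90% 2-sided CI.

```r
# Hypothetical helper (not part of effectsize): find the noncentrality
# parameter for which the observed t statistic sits at cumulative
# probability `prob` of the noncentral t distribution.
ncp_bound <- function(t_obs, df, prob) {
  # pt() is decreasing in ncp, so the root is unique within the interval.
  f <- function(ncp) pt(t_obs, df = df, ncp = ncp) - prob
  # Assumption: the bound lies within 10 units of the observed statistic.
  uniroot(f, interval = c(t_obs - 10, t_obs + 10), tol = 1e-10)$root
}

t_obs <- 2.0
df <- 50

# 95% 2-sided CI on the ncp: alpha / 2 = .025 in each tail.
lower_two95 <- ncp_bound(t_obs, df, prob = 0.975)
upper_two95 <- ncp_bound(t_obs, df, prob = 0.025)

# 95% 1-sided CI (lower bound only): all of alpha = .05 in one tail.
lower_one95 <- ncp_bound(t_obs, df, prob = 0.95)

# 90% 2-sided CI: alpha / 2 = .05 in each tail, so the same lower bound.
lower_two90 <- ncp_bound(t_obs, df, prob = 1 - 0.10 / 2)

c(lower_two95 = lower_two95, upper_two95 = upper_two95,
  lower_one95 = lower_one95, lower_two90 = lower_two90)
```

The bounds are on the *ncp* scale; converting them to an effect size metric (for example a standardized difference) is a separate step that depends on the design, so it is left out here.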
@bwiernik I didn't originally add these, as this seemed to make the individual docs way too long for my liking. We can reference this and the troubleshooting in the same line.
As I have placed, say, here:
https://github.com/easystats/effectsize/pull/366/files?file-filters%5B%5D=.Rd&hide-deleted-files=true#diff-e3faf4c1fa910d1c629afa2d7d19afa68c4a54b02237e0f21cd792a15d177c53R89
Fine by me.