Calculate balanced precision #333

Merged: 34 commits merged on Apr 13, 2023.

Commits:
41451c7
Quick implementation & examples for balanced precision
kelly-sovacool Mar 29, 2023
759601c
Demo balanced precision plot
kelly-sovacool Mar 29, 2023
1ca4a7b
Update bprc examples
kelly-sovacool Mar 29, 2023
1411cfd
Write unit tests for balanced precision
kelly-sovacool Mar 29, 2023
1967f41
Merge 1411cfd06ffe05a81fe010640313ba5586256d0c into 5f77e12816424b14b…
kelly-sovacool Mar 29, 2023
1b5afbf
📚 Render Roxygen documentation
github-actions[bot] Mar 29, 2023
7f860a3
🎨 Style R code
github-actions[bot] Mar 29, 2023
e36e8ef
Add calc_balanced_precision() to reference
kelly-sovacool Mar 29, 2023
6be75f4
Update News
kelly-sovacool Mar 29, 2023
d9dda89
Merge branch 'aubprc' of https://github.com/SchlossLab/mikropml into …
kelly-sovacool Mar 29, 2023
ef47385
Merge d9dda89d04a7b605f7c6f4a4779a7cc28f1d99fa into 5f77e12816424b14b…
kelly-sovacool Mar 29, 2023
fe9eded
📚 Render Roxygen documentation
github-actions[bot] Mar 29, 2023
636fe72
Document ycol for plot_mean_prc()
kelly-sovacool Mar 29, 2023
d43d449
ycol is now an argument instead of hard-coding mean_precision
kelly-sovacool Mar 29, 2023
a2f2b13
📑 Build docs site
github-actions[bot] Mar 29, 2023
60e0b00
Export calc_mean_perf
kelly-sovacool Mar 29, 2023
beb16ee
Merge branch 'aubprc' of https://github.com/SchlossLab/mikropml into …
kelly-sovacool Mar 29, 2023
b5f4f43
Merge beb16eeab329347495ea1e23a44e288c7163827c into 5f77e12816424b14b…
kelly-sovacool Mar 29, 2023
64d2751
Select most relevant columns for example
kelly-sovacool Mar 29, 2023
d153000
📑 Build docs site
github-actions[bot] Mar 29, 2023
e2e45e5
Merge branch 'aubprc' of https://github.com/SchlossLab/mikropml into …
kelly-sovacool Mar 29, 2023
b84b3ab
Add my email
kelly-sovacool Mar 29, 2023
120d4ec
Merge b84b3ab92432fa0faf452e9c779e3fc1096da18a into 5f77e12816424b14b…
kelly-sovacool Mar 29, 2023
d750c35
📑 Build docs site
github-actions[bot] Mar 29, 2023
ac20136
Merge branch 'main' into aubprc
kelly-sovacool Apr 2, 2023
2fad8fe
Name args in example
kelly-sovacool Apr 2, 2023
7ff353e
Merge 2fad8fea6af51dacd5732d02e9602f50d0e0b722 into 654c3a082afccffca…
kelly-sovacool Apr 2, 2023
9c56cd3
📚 Render Roxygen documentation
github-actions[bot] Apr 2, 2023
eb451f1
🎨 Style R code
github-actions[bot] Apr 2, 2023
66d3d2c
📑 Build docs site
github-actions[bot] Apr 2, 2023
1f7be32
Merge branch 'main' into aubprc
kelly-sovacool Apr 12, 2023
d764ae6
Merge 1f7be32ab788c0c3ca07031c8a36875ff04199bb into 1c5fa414a9576ed3e…
kelly-sovacool Apr 12, 2023
60b2c2b
📚 Render Roxygen documentation
github-actions[bot] Apr 12, 2023
bfa2345
📑 Build docs site
github-actions[bot] Apr 12, 2023
2 changes: 2 additions & 0 deletions NAMESPACE
@@ -5,7 +5,9 @@ export("%>%")
export(":=")
export(.data)
export(bootstrap_performance)
+export(calc_balanced_precision)
export(calc_baseline_precision)
+export(calc_mean_perf)
export(calc_mean_prc)
export(calc_mean_roc)
export(calc_model_sensspec)
2 changes: 2 additions & 0 deletions NEWS.md
@@ -3,6 +3,8 @@
- New function `bootstrap_performance()` allows you to calculate confidence
intervals for the model performance from a single train/test split by
bootstrapping the test set (#329, @kelly-sovacool).
+- New function `calc_balanced_precision()` allows you to calculate balanced
+precision and balanced area under the precision-recall curve (#333, @kelly-sovacool).
- Improved output from `find_feature_importance()` (#326, @kelly-sovacool).
- Renamed the column `names` to `feat` to represent each feature or group of correlated features.
- New column `lower` and `upper` to report the bounds of the empirical 95% confidence interval from the permutation test.
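For reference, this is the expression that `calc_balanced_precision()` computes in R/performance.R of this PR (attributed there to Equation 1 of Wu et al. 2021), writing $p$ for the model's precision and $\pi$ for the prior, i.e. the baseline precision. The numbers in the worked line are illustrative, not package output. Per the function's documentation, substituting AUPRC for $p$ gives the balanced AUPRC mentioned in this NEWS entry.

```latex
\mathrm{BP}(p, \pi) = \frac{p\,(1 - \pi)}{p\,(1 - \pi) + (1 - p)\,\pi}

\mathrm{BP}(0.8, 0.2) = \frac{0.8 \cdot 0.8}{0.8 \cdot 0.8 + 0.2 \cdot 0.2} = \frac{0.64}{0.68} = \frac{16}{17} \approx 0.941
```

Setting $\pi = 0.5$ makes both weights equal, so $\mathrm{BP}(p, 0.5) = p$. The adjustment only changes values when the classes are imbalanced.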
74 changes: 71 additions & 3 deletions R/performance.R
@@ -356,13 +356,15 @@ calc_model_sensspec <- function(trained_model, test_data, outcome_colname = NULL

#' Generic function to calculate mean performance curves for multiple models
#'
#' Used by `calc_mean_roc()` and `calc_mean_prc()`.
#'
#' @param sensspec_dat data frame created by concatenating results of
#' `calc_model_sensspec()` for multiple models.
#' @param group_var variable to group by (e.g. specificity or recall).
#' @param sum_var variable to summarize (e.g. sensitivity or precision).
#'
#' @return data frame with mean & standard deviation of `sum_var` summarized over `group_var`
#' @keywords internal
#' @export
#'
#' @author Courtney Armour
#' @author Kelly Sovacool
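A minimal base-R sketch of the summary this roxygen block describes: the mean and standard deviation of `sum_var` within each value of `group_var`. It is not the package's implementation. `mean_perf_sketch()` is a hypothetical name, it takes column names as strings rather than the unquoted names passed to `calc_mean_perf()`, and the `lower`/`upper` bounds (mean -/+ sd, clipped to [0, 1]) are an assumption. The `mean_<sum_var>` column naming mirrors the `mean_precision` and `mean_balanced_precision` columns that `plot_mean_prc()` reads elsewhere in this diff.

```r
# Hypothetical sketch, not mikropml's code: summarise `sum_var` within each
# distinct value of `group_var` (e.g. precision within each recall value).
mean_perf_sketch <- function(dat, group_var, sum_var) {
  keys <- sort(unique(dat[[group_var]]))
  # one numeric vector of `sum_var` values per group
  vals <- lapply(keys, function(k) dat[[sum_var]][dat[[group_var]] == k])
  m <- vapply(vals, mean, numeric(1))
  s <- vapply(vals, stats::sd, numeric(1))
  # Assumed ribbon bounds: mean -/+ sd, clipped to the [0, 1] range
  out <- data.frame(keys, m, s, lower = pmax(m - s, 0), upper = pmin(m + s, 1))
  names(out)[1:3] <- c(group_var, paste0("mean_", sum_var), paste0("sd_", sum_var))
  out
}

# Toy input: two models (seeds) evaluated at the same two recall values
sensspec_toy <- data.frame(
  seed = c(100, 100, 101, 101),
  recall = c(0.5, 1, 0.5, 1),
  precision = c(0.8, 0.6, 0.6, 0.4)
)
mean_perf_sketch(sensspec_toy, "recall", "precision")
```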
@@ -452,7 +454,7 @@ calc_mean_prc <- function(sensspec_dat) {
#' ml_result$test_data,
#' "dx"
#' ) %>%
-#' mutate(seed = seed)
+#' dplyr::mutate(seed = seed)
#' return(sensspec)
#' }
#' sensspec_dat <- purrr::map_dfr(seq(100, 102), get_sensspec_seed)
@@ -470,6 +472,17 @@ calc_mean_prc <- function(sensspec_dat) {
#' baseline_prec <- calc_baseline_precision(otu_mini_bin, "dx", "cancer")
#' prc_dat %>%
#' plot_mean_prc(baseline_precision = baseline_prec)
+#'
+#' # balanced precision
+#' prior <- calc_baseline_precision(otu_mini_bin,
+#' outcome_colname = "dx",
+#' pos_outcome = "cancer"
+#' )
+#' bprc_dat <- sensspec_dat %>%
+#' dplyr::mutate(balanced_precision = calc_balanced_precision(precision, prior)) %>%
+#' dplyr::rename(recall = sensitivity) %>%
+#' calc_mean_perf(group_var = recall, sum_var = balanced_precision)
+#' bprc_dat %>% plot_mean_prc(ycol = mean_balanced_precision) + ylab("Mean Bal. Precision")
#' }
NULL
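In the balanced-precision example above, `calc_balanced_precision()` is applied to every row of `sensspec_dat` before `calc_mean_perf()` averages over `recall`. Because the conversion is nonlinear in precision, that order generally gives a different value than averaging the raw precision first and converting once. A small sketch with made-up numbers; `balanced_precision()` is a standalone copy of the formula from this PR so the code runs without mikropml.

```r
# Standalone copy of the formula in calc_balanced_precision() from this PR
balanced_precision <- function(precision, prior) {
  precision * (1 - prior) / (precision * (1 - prior) + (1 - precision) * prior)
}

prior <- 0.2                      # illustrative frequency of positives
precision_by_seed <- c(0.3, 0.9)  # illustrative precision of two models at one recall

# Order used in the example: convert each model's precision, then average
transform_then_mean <- mean(balanced_precision(precision_by_seed, prior))
# Alternative order: average the raw precision first, then convert once
mean_then_transform <- balanced_precision(mean(precision_by_seed), prior)

round(c(transform_then_mean, mean_then_transform), 3)  # 0.802 0.857
```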

@@ -488,7 +501,10 @@ NULL
#' @examples
#' # calculate the baseline precision
#' data.frame(y = c("a", "b", "a", "b")) %>%
-#' calc_baseline_precision("y", "a")
+#' calc_baseline_precision(
+#' outcome_colname = "y",
+#' pos_outcome = "a"
+#' )
#'
#'
#' calc_baseline_precision(otu_mini_bin,
@@ -515,3 +531,55 @@ calc_baseline_precision <- function(dataset,
baseline_prec <- npos / ntot
return(baseline_prec)
}
+
+#' Calculate balanced precision given actual and baseline precision
+#'
+#' Implements Equation 1 from Wu _et al._ 2021 \doi{10.1016/j.ajhg.2021.08.012}.
+#' It is the same as Equation 7 if `AUPRC` (aka `prAUC`) is used in place of `precision`.
+#'
+#' @param precision actual precision of the model.
+#' @param prior baseline precision, aka frequency of positives.
+#' Can be calculated with [calc_baseline_precision]
+#'
+#' @return the expected precision if the data were balanced
+#' @export
+#' @author Kelly Sovacool \email{sovacool@@umich.edu}
+#'
+#' @examples
+#' prior <- calc_baseline_precision(otu_mini_bin,
+#' outcome_colname = "dx",
+#' pos_outcome = "cancer"
+#' )
+#' calc_balanced_precision(otu_mini_bin_results_rf$performance$Precision, prior)
+#'
+#' otu_mini_bin_results_rf$performance %>%
+#' dplyr::mutate(
+#' balanced_precision = calc_balanced_precision(Precision, prior),
+#' aubprc = calc_balanced_precision(prAUC, prior)
+#' ) %>%
+#' dplyr::select(AUC, Precision, balanced_precision, aubprc)
+#'
+#' # cumulative performance for a single model
+#' sensspec_1 <- calc_model_sensspec(
+#' otu_mini_bin_results_glmnet$trained_model,
+#' otu_mini_bin_results_glmnet$test_data,
+#' "dx"
+#' )
+#' head(sensspec_1)
+#' prior <- calc_baseline_precision(otu_mini_bin,
+#' outcome_colname = "dx",
+#' pos_outcome = "cancer"
+#' )
+#' sensspec_1 %>%
+#' dplyr::mutate(balanced_precision = calc_balanced_precision(precision, prior)) %>%
+#' dplyr::rename(recall = sensitivity) %>%
+#' calc_mean_perf(group_var = recall, sum_var = balanced_precision) %>%
+#' plot_mean_prc(ycol = mean_balanced_precision)
+calc_balanced_precision <-
+function(precision, prior) {
+return(
+precision * (1 - prior) / (
+precision * (1 - prior) + (1 - precision) * prior
+)
+)
+}
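The function above uses only vectorised arithmetic, so it works on a whole column inside `dplyr::mutate()` as its examples do. A dependency-free sketch on toy data: it computes the prior as the fraction of positives (mirroring `npos / ntot` in `calc_baseline_precision()`), then checks two properties of the formula. A precision equal to the prior maps to 0.5, and a prior of 0.5 leaves precision unchanged. The outcome vector and precision values are made up, and `balanced_precision()` is a local copy rather than the exported function.

```r
# Local copy of the formula in calc_balanced_precision() so this runs without mikropml
balanced_precision <- function(precision, prior) {
  precision * (1 - prior) / (precision * (1 - prior) + (1 - precision) * prior)
}

# Toy outcome column: 2 of 5 samples are positive ("cancer")
y <- c("cancer", "normal", "normal", "normal", "cancer")
prior <- mean(y == "cancer")  # fraction of positives, 0.4

# Vectorised over precision; the first value equals the prior
precisions <- c(0.4, 0.6, 0.9)
bp <- balanced_precision(precisions, prior)
round(bp, 3)  # 0.500 0.692 0.931

# With balanced classes (prior = 0.5) the conversion is the identity
all.equal(balanced_precision(precisions, 0.5), precisions)
```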
5 changes: 3 additions & 2 deletions R/plot.R
@@ -274,15 +274,16 @@ plot_mean_roc <- function(dat,
#' @inheritParams shared_ggprotos
#' @inheritParams plot_mean_roc
#' @param baseline_precision baseline precision from `calc_baseline_precision()`
+#' @param ycol column for the y axis (Default: `mean_precision`)
#'
#' @export
-plot_mean_prc <- function(dat, baseline_precision = NULL,
+plot_mean_prc <- function(dat, baseline_precision = NULL, ycol = mean_precision,
ribbon_fill = "#C7E9C0", line_color = "#00441B") {
recall <- mean_precision <- lower <- upper <- NULL
abort_packages_not_installed("ggplot2")
prc_plot <- dat %>%
ggplot2::ggplot(ggplot2::aes(
-x = recall, y = mean_precision,
+x = recall, y = {{ ycol }},
ymin = lower, ymax = upper
)) +
shared_ggprotos(ribbon_fill = ribbon_fill, line_color = line_color) +
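The new `ycol` argument takes a bare column name with a default of `mean_precision`, and `{{ ycol }}` (rlang's embrace operator, which `ggplot2::aes()` supports) forwards it into the plot mapping. The sketch below illustrates the same idea with base R's `substitute()` and `eval()` instead of rlang, purely as an analogy; `pick_y()` and the toy data frame are hypothetical.

```r
# Hypothetical helper: base-R analogue (not rlang) of a bare-column-name
# argument with a default, like `ycol = mean_precision` in plot_mean_prc().
pick_y <- function(dat, ycol = mean_precision) {
  col_expr <- substitute(ycol)          # unevaluated symbol, or the default one
  eval(col_expr, dat, parent.frame())   # look the symbol up among dat's columns
}

prc_toy <- data.frame(
  recall = c(0.5, 1),
  mean_precision = c(0.7, 0.5),
  mean_balanced_precision = c(0.9, 0.8)
)
pick_y(prc_toy)                                  # default column: 0.7 0.5
pick_y(prc_toy, ycol = mean_balanced_precision)  # caller's column: 0.9 0.8
```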
4 changes: 3 additions & 1 deletion _pkgdown.yml
@@ -55,6 +55,9 @@ reference:
- get_feature_importance
- get_performance_tbl
- sensspec
+- calc_mean_perf
+- calc_baseline_precision
+- calc_balanced_precision
- compare_models
- permute_p_value
- bootstrap_performance
@@ -63,7 +66,6 @@ reference:
Visualize results to help you tune hyperparameters and choose model methods.
contents:
- starts_with('plot')
-- calc_baseline_precision
- tidy_perf_data
- get_hp_performance
- combine_hp_performance
28 changes: 14 additions & 14 deletions docs/dev/articles/parallel.html

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion docs/dev/pkgdown.yml
@@ -7,7 +7,7 @@ articles:
parallel: parallel.html
preprocess: preprocess.html
tuning: tuning.html
-last_built: 2023-04-02T20:50Z
+last_built: 2023-04-12T17:15Z
urls:
reference: http://www.schlosslab.org/mikropml/reference
article: http://www.schlosslab.org/mikropml/articles
34 changes: 17 additions & 17 deletions docs/dev/reference/bootstrap_performance.html

Some generated files are not rendered by default.

Binary file added docs/dev/reference/calc_balanced_precision-1.png