diff --git a/_freeze/content/lectures/13-cs01-analysis/execute-results/html.json b/_freeze/content/lectures/13-cs01-analysis/execute-results/html.json
index fd9a281..3295f04 100644
--- a/_freeze/content/lectures/13-cs01-analysis/execute-results/html.json
+++ b/_freeze/content/lectures/13-cs01-analysis/execute-results/html.json
@@ -1,7 +1,7 @@
{
- "hash": "a8036478fb7d853228624bab66cdc6d0",
+ "hash": "7dd6d09087241c7f1d863d30e5715f66",
"result": {
- "markdown": "---\ntitle: \"13-cs01-analysis\"\nauthor: \"Professor Shannon Ellis\"\ndate: \"2023-11-14\"\n\nformat:\n html: \n output-file: 13-cs01-analysis.html\n embed-resources: true\n revealjs:\n output-file: 13-cs01-analysis-slides.html\n slide-number: true\n chalkboard: false \n preview-links: auto\n logo: images/cogs137-logo-hex.png\n css: slides.css\n footer: \n scrollable: true\n embed-resources: true\n execute:\n echo: true\n eval: true\n---\n\n::: {.cell}\n\n:::\n\n\n\n# CS01: Biomarkers of Recent Use (Analysis) {background-color=\"#92A86A\"}\n\n## Q&A {.smaller}\n\n> Q: How extensive does our extension component need to be?\\\n> A: A bit hard to answer in certain terms. We'll discuss some examples today to hopefully set expectaions well. To explain in writing here, the most typical extension is students using the data provided to ask and answer a question not directly presented in class. Thus, simply generating a visualization not presented in class would NOT be sufficient. At the other end, finding external data on the topic and analyzing that data, while certainly allowed, would far exceed expectations. In between those extremes is what we expect: add significantly to the analysis, beyond what was presented in class.\n\n\n## Course Announcements\n\nDue Dates:\n\n- **HW03** (MLR) due Mon 11/20\n- **Project Proposal** (it will be a Google Form) due 11/20\n- **CS01** Deadlines:\n - **Lab06** due Friday - cs01-focused\n - Report & \"General Communication\" due 11/27\n - survey about how working with group went - due 11/28\n\n. . .\n\nNotes:\n\nMidterm scores & Feedback posted\n\n- overall, did very well\n - avg: 13.85/15 (92%)\n - 6 perfect scores\n- answer key on course website\n\nI am behind on emails and Piazza posts.\n\n## Mid-course Survey Summary \n\n- N=73 (~75%)\n- Pacing workload (so far) about right\n- Course notes most helpful in the course overall\n- Also helpful: completing labs, doing homework,\n- Many are not checking labs against answer keys; most are not doing suggested readings\n- Of those that attend lecture, most find it helpful\n\n## Mid-course: Time Spent\n\n\n\n::: {.cell}\n::: {.cell-output-display}\n![](13-cs01-analysis_files/figure-revealjs/unnamed-chunk-1-1.png){width=3000}\n:::\n:::\n\n\n\n## Mid-course: What would you change?\n \n\n\n::: {.cell}\n::: {.cell-output-display}\n```{=html}\n
\n\n```\n:::\n:::\n\n\n \n## Agenda\n\n- Debugging/Understanding Code Strategies\n- Sensitivity & Specificity\n- Cross-compound correlations\n- Extensions\n\n## Summary: Figuring out what's going on in code\n\nSuggestions (as discussed in class):\n\n:::incremental\n1. Look up documentation (i.e. `?...`) / Google the function\n2. Run it on different input; see how output changing\n3. Run the code line-by-line, understanding output at each step \n4. Ask ChatGPT\n:::\n\n# Question {background-color=\"#92A86A\"}\n\nWhich compound, in which matrix, and at what cutoff is the best biomarker of recent use?\n\n. . .\n\n::: callout-message\nEvery group will answer this question.\n:::\n\n# Data & Files {background-color=\"#92A86A\"}\n\n## Packages\n\nThree additional packages required for these notes:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\nlibrary(purrr)\nlibrary(rstatix)\nlibrary(cowplot)\n```\n:::\n\n\n\n## The Data\n\nReading in the data from the end of data wrangling notes: \n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nload(\"data/compounds.RData\")\nload(\"data/timepoints.RData\")\nload(\"data/data_clean.RData\")\n```\n:::\n\n::: {.cell}\n\n:::\n\n\n\nAnd the functions...\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsource(\"src/cs01_functions.R\")\n```\n:::\n\n\n\n# Analysis {background-color=\"#92A86A\"}\n\n\n## Sensitivity & Specificity\n\n**Sensitivity** | the ability of a test to correctly identify patients with a disease/trait/condition. $$TP/(TP + FN)$$\n\n. . .\n\n**Specificity** | the ability of a test to correctly identify people without the disease/trait/condition. $$TN/(TN + FP)$$\n\n. . . \n\n[❓ For this analysis, do you care more about sensitivity? about specificity? equally about both?]{style=\"background-color: #ADD8E6\"}\n\n## What is a TP here? TN? FP? FN? \n\n**Post-smoking** (cutoff > 0)\n\n:::incremental\n- TP = THC group, value >= cutoff\n- FN = THC group, value < cutoff\n- FP = Placebo group, value >= cutoff\n- TN = Placebo group, value < cutoff\n:::\n\n. . .\n\n**Post-smoking** (cutoff == 0)\n\nCannot be a TP or FP if zero...\n\n- TP = THC group, value > cutoff),\n- FN = THC group, value <= cutoff),\n- FP = Placebo group, value > cutoff),\n- TN = Placebo group, value < cutoff)\n\n. . . 
\n\n**Pre-smoking** \n\nCannot be a TP or FN before consuming...\n\n- TP = 0\n- FN = 0\n- FP = value >= cutoff\n- TN = value < cutoff\n\n## ROC\n\nReceiver-Operator Characteristic (ROC) Curve: TPR (Sensitivity) vs FPR (1-Specificity)\n\n![](images/13/Roc_curve.svg.png)\n\nImage Credit: By cmglee, MartinThoma - Roc-draft-xkcd-style.svg, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=109730045\n\n\n\n\n## Calculating Sensitivity and Specificity\n\n:::panel-tabset\n\n### Calculate \n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmake_calculations <- function(dataset, dataset_removedups, split, compound, \n start = start, stop = stop, tpt_use = tpt_use){\n ## remove NAs\n df <- dataset_removedups %>% \n dplyr::select(treatment, compound, timepoint_use) %>%\n rename(compound = 2) %>%\n filter(complete.cases(.))\n if(nrow(df)>0){\n if(stop <= 0){\n output <- df %>% \n summarise(TP = 0,\n FN = 0,\n FP = sum(compound >= split),\n TN = sum(compound < split)) \n }else{\n if(split == 0){\n output_pre <- df %>% \n filter(tpt_use == \"pre-smoking\") %>%\n summarise(TP = 0,\n FN = 0,\n FP = sum(compound >= split),\n TN = sum(compound < split)) \n \n output <- df %>% \n filter(tpt_use != \"pre-smoking\") %>%\n summarise(TP = sum(treatment != \"Placebo\" & compound > split),\n FN = sum(treatment != \"Placebo\" & compound <= split),\n FP = sum(treatment == \"Placebo\" & compound > split),\n TN = sum(treatment == \"Placebo\" & compound < split))\n \n output <- output + output_pre\n }else{\n ## calculate values if pre-smoking\n output_pre <- df %>% \n filter(tpt_use == \"pre-smoking\") %>%\n summarise(TP = 0,\n FN = 0,\n FP = sum(compound >= split),\n TN = sum(compound < split)) \n \n output <- df %>% \n filter(tpt_use != \"pre-smoking\") %>%\n summarise(TP = sum(treatment != \"Placebo\" & compound >= split),\n FN = sum(treatment != \"Placebo\" & compound < split),\n FP = sum(treatment == \"Placebo\" & compound >= split),\n TN = sum(treatment == \"Placebo\" & compound < split))\n \n output <- output + output_pre\n }\n }\n }\n # clean things up; make calculations on above values\n output <- output %>%\n mutate(detection_limit = split,\n compound = compound,\n time_start = start,\n time_stop = stop,\n time_window = tpt_use,\n NAs = nrow(dataset) - nrow(df),\n N = nrow(dataset_removedups),\n N_removed = nrow(dataset) - nrow(dataset_removedups),\n Sensitivity = (TP/(TP + FN)), \n Specificity = (TN /(TN + FP)),\n PPV = (TP/(TP+FP)),\n NPV = (TN/(TN + FN)),\n Efficiency = ((TP + TN)/(TP + TN + FP + FN))*100\n )\n \n return(output)\n}\n```\n:::\n\n\n\n### Run\n\n- determine what cutoff values to try\n- carry out above function on those cutoffs\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsens_spec <- function(dataset, compound, start, stop, tpt_use, \n lowest_value = 0.5, splits = NULL, ...){\n # if it's not all NAs...\n if(sum(is.na(dataset[,compound])) != nrow(dataset)){\n # specify what splits should be used for calculations\n if(is.null(splits)){\n limits <- dataset[is.finite(rowSums(dataset[,compound])),compound]\n ## define lower and upper limits\n lower = min(limits, na.rm=TRUE)\n upper = max(limits, na.rm=TRUE)\n ## determine splits to use for calculations\n tosplit = pull(limits[,1])[limits[,1]>0]\n ## only split if there are detectable limits:\n if(length(tosplit)>=1){\n splits = c(lowest_value, quantile(tosplit, probs=seq(0, 1, by = 0.01), na.rm=TRUE))\n splits = unique(splits)\n }else{\n splits = 0\n }\n }else{\n splits = splits\n }\n # filter to include timepoint of interest\n dataset <- dataset 
%>% \n filter(time_from_start > start & time_from_start <= stop & !is.na(timepoint_use))\n dataset_removedups <- dataset %>%\n filter(!is.na(timepoint_use)) %>% \n group_by(timepoint_use) %>% \n distinct(id, .keep_all = TRUE) %>% \n ungroup()\n\n ## create empty output variable which we'll fill in\n ## iterate through each possible dose and calculate\n output <- map_dfr(as.list(splits), ~make_calculations(dataset, \n dataset_removedups, \n split = .x,\n compound,\n start = start,\n stop = stop, \n tpt_use = tpt_use))\n }\n \n return(output)\n}\n```\n:::\n\n\n\n### Apply\n\nMap the above for each matrix...\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsens_spec_cpd <- function(dataset, cpd, timepoints, splits = NULL){\n args2 <- list(start = timepoints$start, \n stop = timepoints$stop, \n tpt_use = timepoints$timepoint)\n out <- args2 %>% \n pmap_dfr(sens_spec, dataset, compound = cpd, splits = splits)\n return(out)\n}\n```\n:::\n\n\n\n### Do it!\n\nThis takes a few minutes to run... (reminder: `cache=TRUE`)\n\n\n\n::: {.cell hash='13-cs01-analysis_cache/revealjs/unnamed-chunk-10_9dac3684d755322598037474bd630929'}\n\n```{.r .cell-code}\noutput_WB <- map_dfr(compounds_WB, \n ~sens_spec_cpd(dataset = WB, cpd = all_of(.x), \n timepoints = timepoints_WB)) %>% clean_gluc()\n\noutput_BR <- map_dfr(compounds_BR, \n ~sens_spec_cpd(dataset = BR, cpd = all_of(.x),\n timepoints = timepoints_BR)) %>% clean_gluc()\n\noutput_OF <- map_dfr(compounds_OF, \n ~sens_spec_cpd(dataset = OF, cpd = all_of(.x),\n timepoints = timepoints_OF)) %>% clean_gluc()\n```\n:::\n\n\n\n::: \n\n## ROC \n\n:::panel-tabset\n\n### Code\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nss_plot <- function(output, tpts=8, tissue){\n to_include = output %>%\n group_by(compound) %>% \n summarize(mean_detection = mean(detection_limit)) %>% \n filter(mean_detection > 0)\n \n output <- output %>% \n mutate(iszero = ifelse(time_start<0,TRUE,FALSE),\n Sensitivity = round(Sensitivity*100,0),\n Specificity = round(Specificity*100,0)) %>%\n filter(compound %in% to_include$compound,\n time_window != \"pre-smoking\") %>%\n clean_gluc() %>% \n mutate(compound = fct_relevel(as.factor(compound), \"THC\"))\n \n output <- output %>% mutate(\n legend = paste0(time_window,' (N=', N,')'))\n \n blue_colors = c('#C2F8FF', '#A2DDED', '#86BEDC', '#6C9FCA', \n '#547EB9', '#3F5EA8', '#2D4096', '#1E2385',\n '#181173', '#180762', '#180051')\n values = c(blue_colors[1:tpts])\n \n print(ggplot(output, aes(x = detection_limit, y = Sensitivity, group = fct_inorder(legend))) + \n geom_point(aes(color=fct_inorder(legend)), size = 0.9, show.legend = FALSE) +\n geom_path(aes(color=fct_inorder(legend)), size=1.2) + \n facet_grid(~compound, scales = \"free_x\") +\n labs(x = 'Detection Limit',\n y = 'Sensitivity') +\n ylim(0,1) +\n scale_color_manual(values = values, name = 'Time Window') +\n theme_classic(base_size = 12) + \n theme(axis.title = element_text(size=16), \n panel.grid = element_blank(),\n strip.background = element_blank(),\n strip.text.x = element_text(size = 12)) \n )\n print(\n ggplot(output, aes(x = detection_limit, y = Specificity, group = fct_inorder(legend))) + \n geom_point(aes(color=fct_inorder(legend)), size = 0.9, show.legend = FALSE) +\n geom_path(aes(color=fct_inorder(legend)), size=1.2) + \n facet_grid(~compound, scales = \"free_x\") +\n ylim(0,100) +\n labs(title = tissue,\n x = 'Detection Limit',\n y = 'Specificity') +\n scale_color_manual(values = values, name = 'Time Window') +\n theme_classic(base_size = 12) + \n theme(axis.title = 
element_text(size=16),\n panel.grid = element_blank(),\n strip.background = element_blank(),\n strip.text.x = element_text(size = 12))\n )\n print(\n ggplot(output, aes(x=(100-Specificity), y = Sensitivity, group = fct_inorder(legend))) +\n geom_point(aes(color=fct_inorder(legend)), size = 0.9, show.legend = FALSE) +\n geom_path(aes(color=fct_inorder(legend)), size=1.2) + \n facet_grid(~compound) +\n xlim(0, 100) +\n ylim(0, 100) +\n labs(title = tissue,\n x = '(100-Specificity)',\n y = 'Sensitivity') +\n scale_color_manual(values = values, name = 'Time Window') +\n theme_classic(base_size = 12) + \n theme(axis.title = element_text(size=16),\n panel.grid = element_blank(),\n strip.background = element_blank(),\n strip.text.x = element_text(size = 12),\n axis.text = element_text(size=12))\n )\n}\n\nroc_plot <- function(output, tpts=8, tissue){\n to_include = output %>%\n group_by(compound) %>% \n summarize(mean_detection = mean(detection_limit)) %>% \n filter(mean_detection > 0)\n \n output <- output %>% \n mutate(iszero = ifelse(time_start<0,TRUE,FALSE),\n Sensitivity = round(Sensitivity*100,0),\n Specificity = round(Specificity*100,0)) %>%\n filter(compound %in% to_include$compound,\n time_window != \"pre-smoking\") %>%\n clean_gluc() %>% \n mutate(compound = fct_relevel(as.factor(compound), \"THC\"))\n \n output <- output %>% mutate(\n legend = paste0(time_window,' (N=', N,')'))\n \n blue_colors = c('#C2F8FF', '#86BEDC', \n '#547EB9', '#2D4096',\n '#181173', '#180051')\n values = c(blue_colors[1:tpts])\n print(\n ggplot(output, aes(x=(100-Specificity), y = Sensitivity, group = fct_inorder(legend))) +\n geom_point(aes(color=fct_inorder(legend)), size = 0.9, show.legend = FALSE) +\n geom_path(aes(color=fct_inorder(legend)), size=1.2) + \n facet_grid(~compound) +\n xlim(0, 100) +\n ylim(0, 100) +\n labs(title = tissue,\n x = '(100-Specificity)',\n y = 'Sensitivity') +\n scale_color_manual(values = values, name = 'Time Window') +\n theme_classic(base_size = 12) + \n theme(axis.title = element_text(size=16),\n panel.grid = element_blank(),\n strip.background = element_blank(),\n strip.text.x = element_text(size = 12),\n axis.text = element_text(size=12) )\n )\n}\n```\n:::\n\n\n\n### Calculate\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nss1_a <- ss_plot(output_WB, tpts = length(unique(output_WB$time_start)), tissue = \"Blood\")\n```\n\n::: {.cell-output-display}\n![](13-cs01-analysis_files/figure-revealjs/unnamed-chunk-12-1.png){width=3000}\n:::\n\n::: {.cell-output-display}\n![](13-cs01-analysis_files/figure-revealjs/unnamed-chunk-12-2.png){width=3000}\n:::\n\n::: {.cell-output-display}\n![](13-cs01-analysis_files/figure-revealjs/unnamed-chunk-12-3.png){width=3000}\n:::\n\n```{.r .cell-code}\nss2_a <- ss_plot(output_OF, tpts = length(unique(output_OF$time_start)), tissue = \"Oral Fluid\")\n```\n\n::: {.cell-output-display}\n![](13-cs01-analysis_files/figure-revealjs/unnamed-chunk-12-4.png){width=3000}\n:::\n\n::: {.cell-output-display}\n![](13-cs01-analysis_files/figure-revealjs/unnamed-chunk-12-5.png){width=3000}\n:::\n\n::: {.cell-output-display}\n![](13-cs01-analysis_files/figure-revealjs/unnamed-chunk-12-6.png){width=3000}\n:::\n\n```{.r .cell-code}\nss3_a <- roc_plot(output_BR, tpts = length(unique(output_BR$time_start)), tissue = \"Breath\")\n```\n\n::: {.cell-output-display}\n![](13-cs01-analysis_files/figure-revealjs/unnamed-chunk-12-7.png){width=3000}\n:::\n:::\n\n\n\n### Plot\n\n\n::: {.cell}\n\n```{.r .cell-code}\nbottom_row <- plot_grid(ss2_a, ss3_a, labels = c('B', 'C'), label_size 
= 12, ncol = 2, rel_widths = c(0.66, .33))\nplot_grid(ss1_a, bottom_row, labels = c('A', ''), label_size = 12, ncol = 1)\n```\n\n::: {.cell-output-display}\n![](13-cs01-analysis_files/figure-revealjs/unnamed-chunk-13-1.png){width=3000}\n:::\n:::\n\n\n\n:::\n\n## Calculate: THC\n\nReminder: Currently, states have laws on the books from zero tolerance (detection of any level) to 5ng/mL\n\n:::panel-tabset\n\n### WB\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncutoffs = c(0.5, 1, 2, 5, 10)\nWB_THC <- sens_spec_cpd(dataset = WB, cpd = 'thc',\n timepoints = timepoints_WB,\n splits = cutoffs) %>% clean_gluc()\n\nWB_THC\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n# A tibble: 50 × 17\n TP FN FP TN detection_limit compound time_start time_stop\n \n 1 0 0 81 108 0.5 THC -400 0\n 2 0 0 61 128 1 THC -400 0\n 3 0 0 45 144 2 THC -400 0\n 4 0 0 10 179 5 THC -400 0\n 5 0 0 1 188 10 THC -400 0\n 6 124 2 28 33 0.5 THC 0 30\n 7 123 3 22 39 1 THC 0 30\n 8 119 7 15 46 2 THC 0 30\n 9 106 20 4 57 5 THC 0 30\n10 101 25 0 61 10 THC 0 30\n# ℹ 40 more rows\n# ℹ 9 more variables: time_window , NAs , N , N_removed ,\n# Sensitivity , Specificity , PPV , NPV ,\n# Efficiency \n```\n:::\n:::\n\n\n\n### OF\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nOF_THC <- sens_spec_cpd(dataset = OF, cpd = 'thc',\n timepoints = timepoints_OF,\n splits = cutoffs) %>% clean_gluc()\n\nOF_THC\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n# A tibble: 40 × 17\n TP FN FP TN detection_limit compound time_start time_stop\n \n 1 0 0 35 157 0.5 THC -400 0\n 2 0 0 20 172 1 THC -400 0\n 3 0 0 9 183 2 THC -400 0\n 4 0 0 0 192 5 THC -400 0\n 5 0 0 0 192 10 THC -400 0\n 6 129 0 39 24 0.5 THC 0 30\n 7 129 0 30 33 1 THC 0 30\n 8 128 1 19 44 2 THC 0 30\n 9 128 1 3 60 5 THC 0 30\n10 125 4 1 62 10 THC 0 30\n# ℹ 30 more rows\n# ℹ 9 more variables: time_window , NAs , N , N_removed ,\n# Sensitivity , Specificity , PPV , NPV ,\n# Efficiency \n```\n:::\n:::\n\n\n\n### BR\n\nWhy is there no calculation for breath?\n\n\n:::\n\n## Cutoffs\n\n:::panel-tabset\n\n### Code\n\n\n::: {.cell}\n\n```{.r .cell-code}\nplot_cutoffs <- function(dataset, timepoint_use_variable, tissue, labels = c(\"A\", \"B\"), vertline, cpd, x_labels){\n col_val = c(\"#D9D9D9\", \"#BDBDBD\", \"#969696\", \"#636363\", \"#252525\")\n lines = rep(\"solid\", 5)\n \n df_ss <- dataset %>% \n mutate(time_window = fct_relevel(as.factor(time_window), \n levels(timepoint_use_variable)),\n detection_limit = as.factor(detection_limit),\n Sensitivity = round(Sensitivity*100,0),\n Specificity = round(Specificity*100,0),\n my_label = paste0(time_window, ' N=', N),\n my_label = gsub(\" \", \"\\n\", my_label),\n my_label = fct_relevel(as.factor(my_label), x_labels)) #%>% \n \n p1 <- df_ss %>% \n ggplot(aes(x = my_label, y = Sensitivity, \n colour = detection_limit)) + \n geom_line(size = 1.2, aes(group = detection_limit, \n linetype = detection_limit)) + \n geom_vline(xintercept=vertline, linetype = 'dotted') +\n geom_point(show.legend=FALSE) + \n ylim(0,100) +\n scale_x_discrete(labels = function(x) str_wrap(x, width = 5)) +\n scale_linetype_manual(values=lines) +\n scale_color_manual(values = col_val, name = \"Cutoff \\n (ng/mL)\",\n guide = guide_legend(override.aes = list(linetype = c(1),\n shape = rep(NA, length(lines))) )) +\n theme_classic() +\n theme( axis.title = element_text(size=16),\n axis.text = element_text(size=10),\n legend.position = c(0.08, 0.4),\n panel.grid = element_blank(),\n strip.background = element_blank()\n ) +\n guides(linetype = FALSE) +\n labs(x = \"Time Window\", \n y 
= \"Sensitivity\", \n title = paste0(tissue,\": \", cpd) )\n \n p2 <- df_ss %>% \n ggplot(aes(x = my_label, y = Specificity,\n group = detection_limit, \n colour = detection_limit, \n linetype = detection_limit)) + \n geom_line(size = 1.2) +\n geom_vline(xintercept=vertline, linetype = 'dotted') +\n geom_point() + \n ylim(0,100) +\n scale_color_manual(values = col_val) +\n scale_x_discrete(labels = function(x) str_wrap(x, width = 5)) +\n scale_linetype_manual(values = lines, \n guide = guide_legend(override.aes = list(linetype = \"solid\",\n shape = rep(NA, length(lines))) )) +\n theme_classic() +\n theme(axis.title = element_text(size=16),\n axis.text = element_text(size=10),\n legend.position = \"none\", \n panel.grid = element_blank(),\n strip.background = element_blank()) +\n labs(x = \"Time Window\", \n y = \"Specificity\",\n title = \"\" )\n \n title <- ggdraw() + \n draw_label(\n tissue,\n x = 0.05,\n hjust = 0\n )\n \n plot_row <- plot_grid(p1, p2, labels = labels, label_size = 12)\n \n plot_grid(\n title, plot_row,\n ncol = 1,\n # rel_heights values control vertical title margins\n rel_heights = c(0.1, 1)\n )\n \n return(list(plot_row, df_ss))\n\n}\n```\n:::\n\n\n\n### WB\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nblood_levels <- c(\"pre-smoking\\nN=189\", \"0-30\\nmin\\nN=187\", \"31-70\\nmin\\nN=165\",\n \"71-100\\nmin\\nN=157\", \"101-180\\nmin\\nN=168\", \"181-210\\nmin\\nN=103\",\n \"211-240\\nmin\\nN=127\", \"241-270\\nmin\\nN=137\", \"271-300\\nmin\\nN=120\",\n \"301+\\nmin\\nN=88\")\n\nplot_cutoffs(dataset=WB_THC, \n timepoint_use_variable=WB$timepoint_use, \n tissue=\"Blood\", \n vertline=levels(WB$timepoint_use)[5], \n cpd=\"THC\", \n x_labels=blood_levels)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[[1]]\n```\n:::\n\n::: {.cell-output-display}\n![](13-cs01-analysis_files/figure-revealjs/unnamed-chunk-17-1.png){width=3000}\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\n\n[[2]]\n# A tibble: 50 × 18\n TP FN FP TN detection_limit compound time_start time_stop\n \n 1 0 0 81 108 0.5 THC -400 0\n 2 0 0 61 128 1 THC -400 0\n 3 0 0 45 144 2 THC -400 0\n 4 0 0 10 179 5 THC -400 0\n 5 0 0 1 188 10 THC -400 0\n 6 124 2 28 33 0.5 THC 0 30\n 7 123 3 22 39 1 THC 0 30\n 8 119 7 15 46 2 THC 0 30\n 9 106 20 4 57 5 THC 0 30\n10 101 25 0 61 10 THC 0 30\n# ℹ 40 more rows\n# ℹ 10 more variables: time_window , NAs , N , N_removed ,\n# Sensitivity , Specificity , PPV , NPV ,\n# Efficiency , my_label \n```\n:::\n:::\n\n\n\n### OF\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nof_levels <- c(\"pre-smoking\\nN=192\", \"0-30\\nmin\\nN=192\", \"31-90\\nmin\\nN=117\",\n \"91-180\\nmin\\nN=99\", \"181-210\\nmin\\nN=102\", \"211-240\\nmin\\nN=83\",\n \"241-270\\nmin\\nN=90\", \"271+\\nmin\\nN=76\")\n\nplot_cutoffs(OF_THC, OF$timepoint_use, tissue = \"Oral Fluid\", labels = c(\"A\", \"B\"), vertline=levels(OF$timepoint_use)[4], cpd=\"THC\", x_labels=of_levels)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[[1]]\n```\n:::\n\n::: {.cell-output-display}\n![](13-cs01-analysis_files/figure-revealjs/unnamed-chunk-18-1.png){width=3000}\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\n\n[[2]]\n# A tibble: 40 × 18\n TP FN FP TN detection_limit compound time_start time_stop\n \n 1 0 0 35 157 0.5 THC -400 0\n 2 0 0 20 172 1 THC -400 0\n 3 0 0 9 183 2 THC -400 0\n 4 0 0 0 192 5 THC -400 0\n 5 0 0 0 192 10 THC -400 0\n 6 129 0 39 24 0.5 THC 0 30\n 7 129 0 30 33 1 THC 0 30\n 8 128 1 19 44 2 THC 0 30\n 9 128 1 3 60 5 THC 0 30\n10 125 4 1 62 10 THC 0 30\n# ℹ 30 more rows\n# ℹ 10 more variables: 
time_window , NAs , N , N_removed ,\n# Sensitivity , Specificity , PPV , NPV ,\n# Efficiency , my_label \n```\n:::\n:::\n\n\n\n:::\n\n\n## Calculate: CBN\nReminder: Currently, states have laws on the books from zero tolerance (detection of any level) to 5ng/mL\n\n:::panel-tabset\n\n### WB\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nWB_CBN = sens_spec_cpd(dataset = WB, cpd = 'cbn',\n timepoints = timepoints_WB,\n splits = cutoffs) %>% clean_gluc()\n\nblood_levels <- c(\"pre-smoking\\nN=189\", \"0-30\\nmin\\nN=187\", \"31-70\\nmin\\nN=165\",\n \"71-100\\nmin\\nN=157\", \"101-180\\nmin\\nN=168\", \"181-210\\nmin\\nN=103\",\n \"211-240\\nmin\\nN=127\", \"241-270\\nmin\\nN=137\", \"271-300\\nmin\\nN=120\",\n \"301+\\nmin\\nN=88\")\n\nplot_cutoffs(WB_CBN, WB$timepoint_use, tissue = \"Blood\", vertline=levels(WB$timepoint_use)[5], cpd=\"CBN\", x_labels=blood_levels)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[[1]]\n```\n:::\n\n::: {.cell-output-display}\n![](13-cs01-analysis_files/figure-revealjs/unnamed-chunk-19-1.png){width=3000}\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\n\n[[2]]\n# A tibble: 50 × 18\n TP FN FP TN detection_limit compound time_start time_stop\n \n 1 0 0 1 188 0.5 CBN -400 0\n 2 0 0 0 189 1 CBN -400 0\n 3 0 0 0 189 2 CBN -400 0\n 4 0 0 0 189 5 CBN -400 0\n 5 0 0 0 189 10 CBN -400 0\n 6 106 20 7 54 0.5 CBN 0 30\n 7 97 29 0 61 1 CBN 0 30\n 8 82 44 0 61 2 CBN 0 30\n 9 40 86 0 61 5 CBN 0 30\n10 9 117 0 61 10 CBN 0 30\n# ℹ 40 more rows\n# ℹ 10 more variables: time_window , NAs , N , N_removed ,\n# Sensitivity , Specificity , PPV , NPV ,\n# Efficiency , my_label \n```\n:::\n:::\n\n\n\n### OF\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nOF_CBN = sens_spec_cpd(dataset = OF, cpd = 'cbn',\n timepoints = timepoints_OF,\n splits = cutoffs) %>% clean_gluc()\n\nof_levels <- c(\"pre-smoking\\nN=192\", \"0-30\\nmin\\nN=192\", \"31-90\\nmin\\nN=117\",\n \"91-180\\nmin\\nN=99\", \"181-210\\nmin\\nN=102\", \"211-240\\nmin\\nN=83\",\n \"241-270\\nmin\\nN=90\", \"271+\\nmin\\nN=76\")\n\nplot_cutoffs(OF_CBN, OF$timepoint_use, tissue = \"Oral Fluid\", labels = c(\"A\", \"B\"), vertline=levels(OF$timepoint_use)[4], cpd=\"CBN\", x_labels=of_levels)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[[1]]\n```\n:::\n\n::: {.cell-output-display}\n![](13-cs01-analysis_files/figure-revealjs/unnamed-chunk-20-1.png){width=3000}\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\n\n[[2]]\n# A tibble: 40 × 18\n TP FN FP TN detection_limit compound time_start time_stop\n \n 1 0 0 5 187 0.5 CBN -400 0\n 2 0 0 1 191 1 CBN -400 0\n 3 0 0 1 191 2 CBN -400 0\n 4 0 0 1 191 5 CBN -400 0\n 5 0 0 0 192 10 CBN -400 0\n 6 127 2 41 22 0.5 CBN 0 30\n 7 125 4 32 31 1 CBN 0 30\n 8 122 7 18 45 2 CBN 0 30\n 9 116 13 7 56 5 CBN 0 30\n10 107 22 3 60 10 CBN 0 30\n# ℹ 30 more rows\n# ℹ 10 more variables: time_window , NAs , N , N_removed ,\n# Sensitivity , Specificity , PPV , NPV ,\n# Efficiency , my_label \n```\n:::\n:::\n\n\n\n:::\n\n\n## Compound Correlations\n\n:::panel-tabset\n\n### Code\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplotRegression <- function (x, y, xlab, ylab, x_text, y_text, y_text2, title) {\n fit <- lm(y ~ x)\n if(max(fit$model[,1],na.rm=TRUE)!=0){\n ggplot(fit$model, aes_string(x = names(fit$model)[2], \n y = names(fit$model)[1])) + \n geom_point() +\n stat_smooth(method = \"lm\", col = \"#B73239\", size = 1.5, se = FALSE) +\n annotate(\"text\", x=x_text, y=y_text, \n label = paste(\"R^2 == \", format(signif(summary(fit)$adj.r.squared, 5), \n digits=2)),\n vjust=1, hjust=0, 
parse=TRUE,size=4.5) +\n labs(x = xlab, \n y = ylab, \n title = title ) +\n annotate(\"text\", x=x_text, y=y_text2, label = paste(\n \"y = \", format(signif(fit$coef[[2]], 5),digits=2),\n \"x + \",\n format(signif(fit$coef[[1]],5 ),digits=2),\n paste0(\"\\nN = \", length(x))),\n vjust=1, hjust=0, size=4.5) + \n theme_minimal(base_size=14) +\n theme(panel.grid = element_blank(),\n axis.line = element_line(size = 0.5, linetype = \"solid\",\n colour = \"black\"),\n legend.position=\"none\") \n } else{\n ggplot(fit$model, aes_string(x = names(fit$model)[2], \n y = names(fit$model)[1])) + \n geom_point() +\n scale_y_continuous(limits = c(0,3)) +\n stat_smooth(method = \"lm\", col = \"#B73239\", size = 1.5, se = FALSE) +\n annotate(\"text\", x=x_text, y=y_text, \n label = paste(\"R^2 == \", format(signif(summary(fit)$adj.r.squared, 5), digits=2)),vjust=1, hjust=1, parse=TRUE,size=4.5) +\n labs(x = xlab, \n y = ylab, \n title = title ) +\n annotate(\"text\", x=x_text, y=y_text2, label = paste(\n \"y = \", format(signif(fit$coef[[2]], 5),digits=2),\n \"x + \",\n format(signif(fit$coef[[1]],5 ),digits=2),\n paste0(\"\\nN = \", length(x))), vjust=1, hjust=1,size=4.5) + \n theme_minimal(base_size = 14) +\n theme(panel.grid = element_blank(),\n axis.line = element_line(size = 0.5, linetype = \"solid\",\n colour = \"black\"),\n legend.position=\"none\") \n \n \n }\n}\n```\n:::\n\n\n\n### Plot\n\n\n::: {.cell}\n\n```{.r .cell-code}\nwb_reg <- ggplotRegression(WB$thc, WB$cbn, xlab = 'THC (ng/mL)', ylab = 'CBN (ng/mL)', x_text= 150, y_text = 7, y_text2 = 5, title = \"Blood\")\n\nof_reg <- ggplotRegression(OF$thc, OF$cbn, xlab = 'THC (ng/mL)', ylab = 'CBN (ng/mL)', x_text= 12500, y_text = 750, y_text2 = 500, title = \"Oral Fluid\")\n\nplot_grid(wb_reg, of_reg, labels = 'AUTO', label_size = 12, ncol = 2, scale = 1)\n```\n\n::: {.cell-output-display}\n![](13-cs01-analysis_files/figure-revealjs/unnamed-chunk-22-1.png){width=3000}\n:::\n:::\n\n\n\n:::\n\n## Possible Extensions\n\nOur current question asks for a *single* compound...and you'll need to decide that.\n\n. . . \n\n...but you could imagine a world where more than one compound or more than one matrix could be measured at the roadside.\n\n. . . \n\nSo:\n\n:::incremental\n- combination of the oral fluid and blood that would better predict recent use? (For example if an officer stopped a driver and got a high oral fluid, but could not get a blood sample for a couple of hours and got a relatively low result would this predict recent use better than blood (or OF) alone? \n- Is there a ratio of OF/blood that predicts recent use?\n- Machine learning model to determine optimal combination of measurements/cutoffs to detect recent use?\n:::\n\n. . . \n\nThings to keep in mind:\n\n- some matrices are easier to get at the roadside\n- time from use matters (trying to detect *recent* use)\n- we may not care equally about sensitivity and specificity\n\n\n## Recap {.smaller background-color=\"#92A86A\"}\n\n- Can you describe sensitivity? Specificity?\n- Can you explain how TP, TN, FP, and FN were calculated/defined *in this experiment*?\n- Can you describe the code used to carry out the calculations?\n- Can you interpret the results from these data?\n",
+ "markdown": "---\ntitle: \"13-cs01-analysis\"\nauthor: \"Professor Shannon Ellis\"\ndate: \"2023-11-14\"\n\nformat:\n html: \n output-file: 13-cs01-analysis.html\n embed-resources: true\n revealjs:\n output-file: 13-cs01-analysis-slides.html\n slide-number: true\n chalkboard: false \n preview-links: auto\n logo: images/cogs137-logo-hex.png\n css: slides.css\n footer: \n scrollable: true\n embed-resources: true\n execute:\n echo: true\n eval: true\n---\n\n::: {.cell}\n\n:::\n\n\n# CS01: Biomarkers of Recent Use (Analysis) {background-color=\"#92A86A\"}\n\n## Q&A {.smaller}\n\n> Q: How extensive does our extension component need to be?\\\n> A: A bit hard to answer in certain terms. We'll discuss some examples today to hopefully set expectaions well. To explain in writing here, the most typical extension is students using the data provided to ask and answer a question not directly presented in class. Thus, simply generating a visualization not presented in class would NOT be sufficient. At the other end, finding external data on the topic and analyzing that data, while certainly allowed, would far exceed expectations. In between those extremes is what we expect: add significantly to the analysis, beyond what was presented in class.\n\n\n## Course Announcements\n\nDue Dates:\n\n- **HW03** (MLR) due Mon 11/20\n- **Project Proposal** (it will be a Google Form) due 11/20\n- **CS01** Deadlines:\n - **Lab06** due Friday - cs01-focused\n - Report & \"General Communication\" due 11/27\n - survey about how working with group went - due 11/28\n\n. . .\n\nNotes:\n\nMidterm scores & Feedback posted\n\n- overall, did very well\n - avg: 13.85/15 (92%)\n - 6 perfect scores\n- answer key on course website\n\nI am behind on emails and Piazza posts.\n\n## Mid-course Survey Summary \n\n- N=73 (~75%)\n- Pacing workload (so far) about right\n- Course notes most helpful in the course overall\n- Also helpful: completing labs, doing homework,\n- Many are not checking labs against answer keys; most are not doing suggested readings\n- Of those that attend lecture, most find it helpful\n\n## Mid-course: Time Spent\n\n\n::: {.cell}\n::: {.cell-output-display}\n![](13-cs01-analysis_files/figure-html/unnamed-chunk-1-1.png){width=2100}\n:::\n:::\n\n\n## Mid-course: What would you change?\n \n\n::: {.cell}\n::: {.cell-output-display}\n```{=html}\n\n\n```\n:::\n:::\n\n \n## Agenda\n\n- Debugging/Understanding Code Strategies\n- Sensitivity & Specificity\n- Cross-compound correlations\n- Extensions\n\n## Summary: Figuring out what's going on in code\n\nSuggestions (as discussed in class):\n\n:::incremental\n1. Look up documentation (i.e. `?...`) / Google the function\n2. Run it on different input; see how output changing\n3. Run the code line-by-line, understanding output at each step \n4. Ask ChatGPT\n:::\n\n# Question {background-color=\"#92A86A\"}\n\nWhich compound, in which matrix, and at what cutoff is the best biomarker of recent use?\n\n. . 
.\n\n::: callout-message\nEvery group will answer this question.\n:::\n\n# Data & Files {background-color=\"#92A86A\"}\n\n## Packages\n\nThree additional packages required for these notes:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\nlibrary(purrr)\nlibrary(rstatix)\nlibrary(cowplot)\n```\n:::\n\n\n## The Data\n\nReading in the data from the end of data wrangling notes: \n\n\n::: {.cell}\n\n```{.r .cell-code}\nload(\"data/compounds.RData\")\nload(\"data/timepoints.RData\")\nload(\"data/data_clean.RData\")\n```\n:::\n\n::: {.cell}\n\n:::\n\n\nAnd the functions...\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsource(\"src/cs01_functions.R\")\n```\n:::\n\n\n# Analysis {background-color=\"#92A86A\"}\n\n\n## Sensitivity & Specificity\n\n**Sensitivity** | the ability of a test to correctly identify patients with a disease/trait/condition. $$TP/(TP + FN)$$\n\n. . .\n\n**Specificity** | the ability of a test to correctly identify people without the disease/trait/condition. $$TN/(TN + FP)$$\n\n. . . \n\n[❓ For this analysis, do you care more about sensitivity? about specificity? equally about both?]{style=\"background-color: #ADD8E6\"}\n\n## What is a TP here? TN? FP? FN? \n\n**Post-smoking** (cutoff > 0)\n\n:::incremental\n- TP = THC group, value >= cutoff\n- FN = THC group, value < cutoff\n- FP = Placebo group, value >= cutoff\n- TN = Placebo group, value < cutoff\n:::\n\n. . .\n\n**Post-smoking** (cutoff == 0)\n\nCannot be a TP or FP if zero...\n\n- TP = THC group, value > cutoff),\n- FN = THC group, value <= cutoff),\n- FP = Placebo group, value > cutoff),\n- TN = Placebo group, value < cutoff)\n\n. . . \n\n**Pre-smoking** \n\nCannot be a TP or FN before consuming...\n\n- TP = 0\n- FN = 0\n- FP = value >= cutoff\n- TN = value < cutoff\n\n## ROC\n\nReceiver-Operator Characteristic (ROC) Curve: TPR (Sensitivity) vs FPR (1-Specificity)\n\n![](images/13/Roc_curve.svg.png)\n\nImage Credit: By cmglee, MartinThoma - Roc-draft-xkcd-style.svg, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=109730045\n\n\n\n\n## Calculating Sensitivity and Specificity\n\n:::panel-tabset\n\n### Calculate \n\n\n::: {.cell}\n\n```{.r .cell-code}\nmake_calculations <- function(dataset, dataset_removedups, split, compound, \n start = start, stop = stop, tpt_use = tpt_use){\n ## remove NAs\n df <- dataset_removedups %>% \n dplyr::select(treatment, compound, timepoint_use) %>%\n rename(compound = 2) %>%\n filter(complete.cases(.))\n if(nrow(df)>0){\n if(stop <= 0){\n output <- df %>% \n summarise(TP = 0,\n FN = 0,\n FP = sum(compound >= split),\n TN = sum(compound < split)) \n }else{\n if(split == 0){\n output_pre <- df %>% \n filter(tpt_use == \"pre-smoking\") %>%\n summarise(TP = 0,\n FN = 0,\n FP = sum(compound >= split),\n TN = sum(compound < split)) \n \n output <- df %>% \n filter(tpt_use != \"pre-smoking\") %>%\n summarise(TP = sum(treatment != \"Placebo\" & compound > split),\n FN = sum(treatment != \"Placebo\" & compound <= split),\n FP = sum(treatment == \"Placebo\" & compound > split),\n TN = sum(treatment == \"Placebo\" & compound < split))\n \n output <- output + output_pre\n }else{\n ## calculate values if pre-smoking\n output_pre <- df %>% \n filter(tpt_use == \"pre-smoking\") %>%\n summarise(TP = 0,\n FN = 0,\n FP = sum(compound >= split),\n TN = sum(compound < split)) \n \n output <- df %>% \n filter(tpt_use != \"pre-smoking\") %>%\n summarise(TP = sum(treatment != \"Placebo\" & compound >= split),\n FN = sum(treatment != \"Placebo\" & compound < split),\n FP = sum(treatment == 
\"Placebo\" & compound >= split),\n TN = sum(treatment == \"Placebo\" & compound < split))\n \n output <- output + output_pre\n }\n }\n }\n # clean things up; make calculations on above values\n output <- output %>%\n mutate(detection_limit = split,\n compound = compound,\n time_start = start,\n time_stop = stop,\n time_window = tpt_use,\n NAs = nrow(dataset) - nrow(df),\n N = nrow(dataset_removedups),\n N_removed = nrow(dataset) - nrow(dataset_removedups),\n Sensitivity = (TP/(TP + FN)), \n Specificity = (TN /(TN + FP)),\n PPV = (TP/(TP+FP)),\n NPV = (TN/(TN + FN)),\n Efficiency = ((TP + TN)/(TP + TN + FP + FN))*100\n )\n \n return(output)\n}\n```\n:::\n\n\n### Run\n\n- determine what cutoff values to try\n- carry out above function on those cutoffs\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsens_spec <- function(dataset, compound, start, stop, tpt_use, \n lowest_value = 0.5, splits = NULL, ...){\n # if it's not all NAs...\n if(sum(is.na(dataset[,compound])) != nrow(dataset)){\n # specify what splits should be used for calculations\n if(is.null(splits)){\n limits <- dataset[is.finite(rowSums(dataset[,compound])),compound]\n ## define lower and upper limits\n lower = min(limits, na.rm=TRUE)\n upper = max(limits, na.rm=TRUE)\n ## determine splits to use for calculations\n tosplit = pull(limits[,1])[limits[,1]>0]\n ## only split if there are detectable limits:\n if(length(tosplit)>=1){\n splits = c(lowest_value, quantile(tosplit, probs=seq(0, 1, by = 0.01), na.rm=TRUE))\n splits = unique(splits)\n }else{\n splits = 0\n }\n }else{\n splits = splits\n }\n # filter to include timepoint of interest\n dataset <- dataset %>% \n filter(time_from_start > start & time_from_start <= stop & !is.na(timepoint_use))\n dataset_removedups <- dataset %>%\n filter(!is.na(timepoint_use)) %>% \n group_by(timepoint_use) %>% \n distinct(id, .keep_all = TRUE) %>% \n ungroup()\n\n ## create empty output variable which we'll fill in\n ## iterate through each possible dose and calculate\n output <- map_dfr(as.list(splits), ~make_calculations(dataset, \n dataset_removedups, \n split = .x,\n compound,\n start = start,\n stop = stop, \n tpt_use = tpt_use))\n }\n \n return(output)\n}\n```\n:::\n\n\n### Apply\n\nMap the above for each matrix...\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsens_spec_cpd <- function(dataset, cpd, timepoints, splits = NULL){\n args2 <- list(start = timepoints$start, \n stop = timepoints$stop, \n tpt_use = timepoints$timepoint)\n out <- args2 %>% \n pmap_dfr(sens_spec, dataset, compound = cpd, splits = splits)\n return(out)\n}\n```\n:::\n\n\n### Do it!\n\nThis takes a few minutes to run... 
(reminder: `cache=TRUE`)\n\n\n::: {.cell hash='13-cs01-analysis_cache/html/unnamed-chunk-10_e432cbdfc0d26929ce6a1bb9fd220e35'}\n\n```{.r .cell-code}\noutput_WB <- map_dfr(compounds_WB, \n ~sens_spec_cpd(dataset = WB, cpd = all_of(.x), \n timepoints = timepoints_WB)) %>% clean_gluc()\n\noutput_BR <- map_dfr(compounds_BR, \n ~sens_spec_cpd(dataset = BR, cpd = all_of(.x),\n timepoints = timepoints_BR)) %>% clean_gluc()\n\noutput_OF <- map_dfr(compounds_OF, \n ~sens_spec_cpd(dataset = OF, cpd = all_of(.x),\n timepoints = timepoints_OF)) %>% clean_gluc()\n```\n:::\n\n\n::: \n\n## ROC \n\n:::panel-tabset\n\n### Code\n\n\n::: {.cell}\n\n```{.r .cell-code}\nss_plot <- function(output, tpts=8, tissue){\n to_include = output %>%\n group_by(compound) %>% \n summarize(mean_detection = mean(detection_limit)) %>% \n filter(mean_detection > 0)\n \n output <- output %>% \n mutate(iszero = ifelse(time_start<0,TRUE,FALSE),\n Sensitivity = round(Sensitivity*100,0),\n Specificity = round(Specificity*100,0)) %>%\n filter(compound %in% to_include$compound,\n time_window != \"pre-smoking\") %>%\n clean_gluc() %>% \n mutate(compound = fct_relevel(as.factor(compound), \"THC\"))\n \n output <- output %>% mutate(\n legend = paste0(time_window,' (N=', N,')'))\n \n blue_colors = c('#C2F8FF', '#A2DDED', '#86BEDC', '#6C9FCA', \n '#547EB9', '#3F5EA8', '#2D4096', '#1E2385',\n '#181173', '#180762', '#180051')\n values = c(blue_colors[1:tpts])\n \n print(ggplot(output, aes(x = detection_limit, y = Sensitivity, group = fct_inorder(legend))) + \n geom_point(aes(color=fct_inorder(legend)), size = 0.9, show.legend = FALSE) +\n geom_path(aes(color=fct_inorder(legend)), size=1.2) + \n facet_grid(~compound, scales = \"free_x\") +\n labs(x = 'Detection Limit',\n y = 'Sensitivity') +\n ylim(0,1) +\n scale_color_manual(values = values, name = 'Time Window') +\n theme_classic(base_size = 12) + \n theme(axis.title = element_text(size=16), \n panel.grid = element_blank(),\n strip.background = element_blank(),\n strip.text.x = element_text(size = 12)) \n )\n print(\n ggplot(output, aes(x = detection_limit, y = Specificity, group = fct_inorder(legend))) + \n geom_point(aes(color=fct_inorder(legend)), size = 0.9, show.legend = FALSE) +\n geom_path(aes(color=fct_inorder(legend)), size=1.2) + \n facet_grid(~compound, scales = \"free_x\") +\n ylim(0,100) +\n labs(title = tissue,\n x = 'Detection Limit',\n y = 'Specificity') +\n scale_color_manual(values = values, name = 'Time Window') +\n theme_classic(base_size = 12) + \n theme(axis.title = element_text(size=16),\n panel.grid = element_blank(),\n strip.background = element_blank(),\n strip.text.x = element_text(size = 12))\n )\n print(\n ggplot(output, aes(x=(100-Specificity), y = Sensitivity, group = fct_inorder(legend))) +\n geom_point(aes(color=fct_inorder(legend)), size = 0.9, show.legend = FALSE) +\n geom_path(aes(color=fct_inorder(legend)), size=1.2) + \n facet_grid(~compound) +\n xlim(0, 100) +\n ylim(0, 100) +\n labs(title = tissue,\n x = '(100-Specificity)',\n y = 'Sensitivity') +\n scale_color_manual(values = values, name = 'Time Window') +\n theme_classic(base_size = 12) + \n theme(axis.title = element_text(size=16),\n panel.grid = element_blank(),\n strip.background = element_blank(),\n strip.text.x = element_text(size = 12),\n axis.text = element_text(size=12))\n )\n}\n\nroc_plot <- function(output, tpts=8, tissue){\n to_include = output %>%\n group_by(compound) %>% \n summarize(mean_detection = mean(detection_limit)) %>% \n filter(mean_detection > 0)\n \n output <- output %>% 
\n mutate(iszero = ifelse(time_start<0,TRUE,FALSE),\n Sensitivity = round(Sensitivity*100,0),\n Specificity = round(Specificity*100,0)) %>%\n filter(compound %in% to_include$compound,\n time_window != \"pre-smoking\") %>%\n clean_gluc() %>% \n mutate(compound = fct_relevel(as.factor(compound), \"THC\"))\n \n output <- output %>% mutate(\n legend = paste0(time_window,' (N=', N,')'))\n \n blue_colors = c('#C2F8FF', '#86BEDC', \n '#547EB9', '#2D4096',\n '#181173', '#180051')\n values = c(blue_colors[1:tpts])\n print(\n ggplot(output, aes(x=(100-Specificity), y = Sensitivity, group = fct_inorder(legend))) +\n geom_point(aes(color=fct_inorder(legend)), size = 0.9, show.legend = FALSE) +\n geom_path(aes(color=fct_inorder(legend)), size=1.2) + \n facet_grid(~compound) +\n xlim(0, 100) +\n ylim(0, 100) +\n labs(title = tissue,\n x = '(100-Specificity)',\n y = 'Sensitivity') +\n scale_color_manual(values = values, name = 'Time Window') +\n theme_classic(base_size = 12) + \n theme(axis.title = element_text(size=16),\n panel.grid = element_blank(),\n strip.background = element_blank(),\n strip.text.x = element_text(size = 12),\n axis.text = element_text(size=12) )\n )\n}\n```\n:::\n\n\n### Calculate\n\n\n::: {.cell}\n\n```{.r .cell-code}\nss1_a <- ss_plot(output_WB, tpts = length(unique(output_WB$time_start)), tissue = \"Blood\")\n```\n\n::: {.cell-output-display}\n![](13-cs01-analysis_files/figure-html/unnamed-chunk-12-1.png){width=2100}\n:::\n\n::: {.cell-output-display}\n![](13-cs01-analysis_files/figure-html/unnamed-chunk-12-2.png){width=2100}\n:::\n\n::: {.cell-output-display}\n![](13-cs01-analysis_files/figure-html/unnamed-chunk-12-3.png){width=2100}\n:::\n\n```{.r .cell-code}\nss2_a <- ss_plot(output_OF, tpts = length(unique(output_OF$time_start)), tissue = \"Oral Fluid\")\n```\n\n::: {.cell-output-display}\n![](13-cs01-analysis_files/figure-html/unnamed-chunk-12-4.png){width=2100}\n:::\n\n::: {.cell-output-display}\n![](13-cs01-analysis_files/figure-html/unnamed-chunk-12-5.png){width=2100}\n:::\n\n::: {.cell-output-display}\n![](13-cs01-analysis_files/figure-html/unnamed-chunk-12-6.png){width=2100}\n:::\n\n```{.r .cell-code}\nss3_a <- roc_plot(output_BR, tpts = length(unique(output_BR$time_start)), tissue = \"Breath\")\n```\n\n::: {.cell-output-display}\n![](13-cs01-analysis_files/figure-html/unnamed-chunk-12-7.png){width=2100}\n:::\n:::\n\n\n### Plot\n\n::: {.cell}\n\n```{.r .cell-code}\nbottom_row <- plot_grid(ss2_a, ss3_a, labels = c('B', 'C'), label_size = 12, ncol = 2, rel_widths = c(0.66, .33))\nplot_grid(ss1_a, bottom_row, labels = c('A', ''), label_size = 12, ncol = 1)\n```\n\n::: {.cell-output-display}\n![](13-cs01-analysis_files/figure-html/unnamed-chunk-13-1.png){width=2100}\n:::\n:::\n\n\n:::\n\n## Calculate: THC\n\nReminder: Currently, states have laws on the books from zero tolerance (detection of any level) to 5ng/mL\n\n:::panel-tabset\n\n### WB\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncutoffs = c(0.5, 1, 2, 5, 10)\nWB_THC <- sens_spec_cpd(dataset = WB, cpd = 'thc',\n timepoints = timepoints_WB,\n splits = cutoffs) %>% clean_gluc()\n\nWB_THC\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n# A tibble: 50 × 17\n TP FN FP TN detection_limit compound time_start time_stop\n \n 1 0 0 81 108 0.5 THC -400 0\n 2 0 0 61 128 1 THC -400 0\n 3 0 0 45 144 2 THC -400 0\n 4 0 0 10 179 5 THC -400 0\n 5 0 0 1 188 10 THC -400 0\n 6 124 2 28 33 0.5 THC 0 30\n 7 123 3 22 39 1 THC 0 30\n 8 119 7 15 46 2 THC 0 30\n 9 106 20 4 57 5 THC 0 30\n10 101 25 0 61 10 THC 0 30\n# ℹ 40 more rows\n# ℹ 9 more 
variables: time_window , NAs , N , N_removed ,\n# Sensitivity , Specificity , PPV , NPV ,\n# Efficiency \n```\n:::\n:::\n\n\n### OF\n\n\n::: {.cell}\n\n```{.r .cell-code}\nOF_THC <- sens_spec_cpd(dataset = OF, cpd = 'thc',\n timepoints = timepoints_OF,\n splits = cutoffs) %>% clean_gluc()\n\nOF_THC\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n# A tibble: 40 × 17\n TP FN FP TN detection_limit compound time_start time_stop\n \n 1 0 0 35 157 0.5 THC -400 0\n 2 0 0 20 172 1 THC -400 0\n 3 0 0 9 183 2 THC -400 0\n 4 0 0 0 192 5 THC -400 0\n 5 0 0 0 192 10 THC -400 0\n 6 129 0 39 24 0.5 THC 0 30\n 7 129 0 30 33 1 THC 0 30\n 8 128 1 19 44 2 THC 0 30\n 9 128 1 3 60 5 THC 0 30\n10 125 4 1 62 10 THC 0 30\n# ℹ 30 more rows\n# ℹ 9 more variables: time_window , NAs , N , N_removed ,\n# Sensitivity , Specificity , PPV , NPV ,\n# Efficiency \n```\n:::\n:::\n\n\n### BR\n\nWhy is there no calculation for breath?\n\n\n:::\n\n## Cutoffs\n\n:::panel-tabset\n\n### Code\n\n::: {.cell}\n\n```{.r .cell-code}\nplot_cutoffs <- function(dataset, timepoint_use_variable, tissue, labels = c(\"A\", \"B\"), vertline, cpd, x_labels){\n col_val = c(\"#D9D9D9\", \"#BDBDBD\", \"#969696\", \"#636363\", \"#252525\")\n lines = rep(\"solid\", 5)\n \n df_ss <- dataset %>% \n mutate(time_window = fct_relevel(as.factor(time_window), \n levels(timepoint_use_variable)),\n detection_limit = as.factor(detection_limit),\n Sensitivity = round(Sensitivity*100,0),\n Specificity = round(Specificity*100,0),\n my_label = paste0(time_window, ' N=', N),\n my_label = gsub(\" \", \"\\n\", my_label),\n my_label = fct_relevel(as.factor(my_label), x_labels)) #%>% \n \n p1 <- df_ss %>% \n ggplot(aes(x = my_label, y = Sensitivity, \n colour = detection_limit)) + \n geom_line(size = 1.2, aes(group = detection_limit, \n linetype = detection_limit)) + \n geom_vline(xintercept=vertline, linetype = 'dotted') +\n geom_point(show.legend=FALSE) + \n ylim(0,100) +\n scale_x_discrete(labels = function(x) str_wrap(x, width = 5)) +\n scale_linetype_manual(values=lines) +\n scale_color_manual(values = col_val, name = \"Cutoff \\n (ng/mL)\",\n guide = guide_legend(override.aes = list(linetype = c(1),\n shape = rep(NA, length(lines))) )) +\n theme_classic() +\n theme( axis.title = element_text(size=16),\n axis.text = element_text(size=10),\n legend.position = c(0.08, 0.4),\n panel.grid = element_blank(),\n strip.background = element_blank()\n ) +\n guides(linetype = FALSE) +\n labs(x = \"Time Window\", \n y = \"Sensitivity\", \n title = paste0(tissue,\": \", cpd) )\n \n p2 <- df_ss %>% \n ggplot(aes(x = my_label, y = Specificity,\n group = detection_limit, \n colour = detection_limit, \n linetype = detection_limit)) + \n geom_line(size = 1.2) +\n geom_vline(xintercept=vertline, linetype = 'dotted') +\n geom_point() + \n ylim(0,100) +\n scale_color_manual(values = col_val) +\n scale_x_discrete(labels = function(x) str_wrap(x, width = 5)) +\n scale_linetype_manual(values = lines, \n guide = guide_legend(override.aes = list(linetype = \"solid\",\n shape = rep(NA, length(lines))) )) +\n theme_classic() +\n theme(axis.title = element_text(size=16),\n axis.text = element_text(size=10),\n legend.position = \"none\", \n panel.grid = element_blank(),\n strip.background = element_blank()) +\n labs(x = \"Time Window\", \n y = \"Specificity\",\n title = \"\" )\n \n title <- ggdraw() + \n draw_label(\n tissue,\n x = 0.05,\n hjust = 0\n )\n \n plot_row <- plot_grid(p1, p2, labels = labels, label_size = 12)\n \n plot_grid(\n title, plot_row,\n ncol = 1,\n # rel_heights 
values control vertical title margins\n rel_heights = c(0.1, 1)\n )\n \n return(list(plot_row, df_ss))\n\n}\n```\n:::\n\n\n### WB\n\n\n::: {.cell}\n\n```{.r .cell-code}\nblood_levels <- c(\"pre-smoking\\nN=189\", \"0-30\\nmin\\nN=187\", \"31-70\\nmin\\nN=165\",\n \"71-100\\nmin\\nN=157\", \"101-180\\nmin\\nN=168\", \"181-210\\nmin\\nN=103\",\n \"211-240\\nmin\\nN=127\", \"241-270\\nmin\\nN=137\", \"271-300\\nmin\\nN=120\",\n \"301+\\nmin\\nN=88\")\n\nplot_cutoffs(dataset=WB_THC, \n timepoint_use_variable=WB$timepoint_use, \n tissue=\"Blood\", \n vertline=levels(WB$timepoint_use)[5], \n cpd=\"THC\", \n x_labels=blood_levels)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[[1]]\n```\n:::\n\n::: {.cell-output-display}\n![](13-cs01-analysis_files/figure-html/unnamed-chunk-17-1.png){width=2100}\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\n\n[[2]]\n# A tibble: 50 × 18\n TP FN FP TN detection_limit compound time_start time_stop\n \n 1 0 0 81 108 0.5 THC -400 0\n 2 0 0 61 128 1 THC -400 0\n 3 0 0 45 144 2 THC -400 0\n 4 0 0 10 179 5 THC -400 0\n 5 0 0 1 188 10 THC -400 0\n 6 124 2 28 33 0.5 THC 0 30\n 7 123 3 22 39 1 THC 0 30\n 8 119 7 15 46 2 THC 0 30\n 9 106 20 4 57 5 THC 0 30\n10 101 25 0 61 10 THC 0 30\n# ℹ 40 more rows\n# ℹ 10 more variables: time_window , NAs , N , N_removed ,\n# Sensitivity , Specificity , PPV , NPV ,\n# Efficiency , my_label \n```\n:::\n:::\n\n\n### OF\n\n\n::: {.cell}\n\n```{.r .cell-code}\nof_levels <- c(\"pre-smoking\\nN=192\", \"0-30\\nmin\\nN=192\", \"31-90\\nmin\\nN=117\",\n \"91-180\\nmin\\nN=99\", \"181-210\\nmin\\nN=102\", \"211-240\\nmin\\nN=83\",\n \"241-270\\nmin\\nN=90\", \"271+\\nmin\\nN=76\")\n\nplot_cutoffs(OF_THC, OF$timepoint_use, tissue = \"Oral Fluid\", labels = c(\"A\", \"B\"), vertline=levels(OF$timepoint_use)[4], cpd=\"THC\", x_labels=of_levels)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[[1]]\n```\n:::\n\n::: {.cell-output-display}\n![](13-cs01-analysis_files/figure-html/unnamed-chunk-18-1.png){width=2100}\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\n\n[[2]]\n# A tibble: 40 × 18\n TP FN FP TN detection_limit compound time_start time_stop\n \n 1 0 0 35 157 0.5 THC -400 0\n 2 0 0 20 172 1 THC -400 0\n 3 0 0 9 183 2 THC -400 0\n 4 0 0 0 192 5 THC -400 0\n 5 0 0 0 192 10 THC -400 0\n 6 129 0 39 24 0.5 THC 0 30\n 7 129 0 30 33 1 THC 0 30\n 8 128 1 19 44 2 THC 0 30\n 9 128 1 3 60 5 THC 0 30\n10 125 4 1 62 10 THC 0 30\n# ℹ 30 more rows\n# ℹ 10 more variables: time_window , NAs , N , N_removed ,\n# Sensitivity , Specificity , PPV , NPV ,\n# Efficiency , my_label \n```\n:::\n:::\n\n\n:::\n\n\n## Calculate: CBN\nReminder: Currently, states have laws on the books from zero tolerance (detection of any level) to 5ng/mL\n\n:::panel-tabset\n\n### WB\n\n\n::: {.cell}\n\n```{.r .cell-code}\nWB_CBN = sens_spec_cpd(dataset = WB, cpd = 'cbn',\n timepoints = timepoints_WB,\n splits = cutoffs) %>% clean_gluc()\n\nblood_levels <- c(\"pre-smoking\\nN=189\", \"0-30\\nmin\\nN=187\", \"31-70\\nmin\\nN=165\",\n \"71-100\\nmin\\nN=157\", \"101-180\\nmin\\nN=168\", \"181-210\\nmin\\nN=103\",\n \"211-240\\nmin\\nN=127\", \"241-270\\nmin\\nN=137\", \"271-300\\nmin\\nN=120\",\n \"301+\\nmin\\nN=88\")\n\nplot_cutoffs(WB_CBN, WB$timepoint_use, tissue = \"Blood\", vertline=levels(WB$timepoint_use)[5], cpd=\"CBN\", x_labels=blood_levels)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[[1]]\n```\n:::\n\n::: {.cell-output-display}\n![](13-cs01-analysis_files/figure-html/unnamed-chunk-19-1.png){width=2100}\n:::\n\n::: {.cell-output 
.cell-output-stdout}\n```\n\n[[2]]\n# A tibble: 50 × 18\n TP FN FP TN detection_limit compound time_start time_stop\n \n 1 0 0 1 188 0.5 CBN -400 0\n 2 0 0 0 189 1 CBN -400 0\n 3 0 0 0 189 2 CBN -400 0\n 4 0 0 0 189 5 CBN -400 0\n 5 0 0 0 189 10 CBN -400 0\n 6 106 20 7 54 0.5 CBN 0 30\n 7 97 29 0 61 1 CBN 0 30\n 8 82 44 0 61 2 CBN 0 30\n 9 40 86 0 61 5 CBN 0 30\n10 9 117 0 61 10 CBN 0 30\n# ℹ 40 more rows\n# ℹ 10 more variables: time_window , NAs , N , N_removed ,\n# Sensitivity , Specificity , PPV , NPV ,\n# Efficiency , my_label \n```\n:::\n:::\n\n\n### OF\n\n\n::: {.cell}\n\n```{.r .cell-code}\nOF_CBN = sens_spec_cpd(dataset = OF, cpd = 'cbn',\n timepoints = timepoints_OF,\n splits = cutoffs) %>% clean_gluc()\n\nof_levels <- c(\"pre-smoking\\nN=192\", \"0-30\\nmin\\nN=192\", \"31-90\\nmin\\nN=117\",\n \"91-180\\nmin\\nN=99\", \"181-210\\nmin\\nN=102\", \"211-240\\nmin\\nN=83\",\n \"241-270\\nmin\\nN=90\", \"271+\\nmin\\nN=76\")\n\nplot_cutoffs(OF_CBN, OF$timepoint_use, tissue = \"Oral Fluid\", labels = c(\"A\", \"B\"), vertline=levels(OF$timepoint_use)[4], cpd=\"CBN\", x_labels=of_levels)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[[1]]\n```\n:::\n\n::: {.cell-output-display}\n![](13-cs01-analysis_files/figure-html/unnamed-chunk-20-1.png){width=2100}\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\n\n[[2]]\n# A tibble: 40 × 18\n TP FN FP TN detection_limit compound time_start time_stop\n \n 1 0 0 5 187 0.5 CBN -400 0\n 2 0 0 1 191 1 CBN -400 0\n 3 0 0 1 191 2 CBN -400 0\n 4 0 0 1 191 5 CBN -400 0\n 5 0 0 0 192 10 CBN -400 0\n 6 127 2 41 22 0.5 CBN 0 30\n 7 125 4 32 31 1 CBN 0 30\n 8 122 7 18 45 2 CBN 0 30\n 9 116 13 7 56 5 CBN 0 30\n10 107 22 3 60 10 CBN 0 30\n# ℹ 30 more rows\n# ℹ 10 more variables: time_window , NAs , N , N_removed ,\n# Sensitivity , Specificity , PPV , NPV ,\n# Efficiency , my_label \n```\n:::\n:::\n\n\n:::\n\n\n## Compound Correlations\n\n:::panel-tabset\n\n### Code\n\n::: {.cell}\n\n```{.r .cell-code}\nggplotRegression <- function (x, y, xlab, ylab, x_text, y_text, y_text2, title) {\n fit <- lm(y ~ x)\n if(max(fit$model[,1],na.rm=TRUE)!=0){\n ggplot(fit$model, aes_string(x = names(fit$model)[2], \n y = names(fit$model)[1])) + \n geom_point() +\n stat_smooth(method = \"lm\", col = \"#B73239\", size = 1.5, se = FALSE) +\n annotate(\"text\", x=x_text, y=y_text, \n label = paste(\"R^2 == \", format(signif(summary(fit)$adj.r.squared, 5), \n digits=2)),\n vjust=1, hjust=0, parse=TRUE,size=4.5) +\n labs(x = xlab, \n y = ylab, \n title = title ) +\n annotate(\"text\", x=x_text, y=y_text2, label = paste(\n \"y = \", format(signif(fit$coef[[2]], 5),digits=2),\n \"x + \",\n format(signif(fit$coef[[1]],5 ),digits=2),\n paste0(\"\\nN = \", length(x))),\n vjust=1, hjust=0, size=4.5) + \n theme_minimal(base_size=14) +\n theme(panel.grid = element_blank(),\n axis.line = element_line(size = 0.5, linetype = \"solid\",\n colour = \"black\"),\n legend.position=\"none\") \n } else{\n ggplot(fit$model, aes_string(x = names(fit$model)[2], \n y = names(fit$model)[1])) + \n geom_point() +\n scale_y_continuous(limits = c(0,3)) +\n stat_smooth(method = \"lm\", col = \"#B73239\", size = 1.5, se = FALSE) +\n annotate(\"text\", x=x_text, y=y_text, \n label = paste(\"R^2 == \", format(signif(summary(fit)$adj.r.squared, 5), digits=2)),vjust=1, hjust=1, parse=TRUE,size=4.5) +\n labs(x = xlab, \n y = ylab, \n title = title ) +\n annotate(\"text\", x=x_text, y=y_text2, label = paste(\n \"y = \", format(signif(fit$coef[[2]], 5),digits=2),\n \"x + \",\n format(signif(fit$coef[[1]],5 
),digits=2),\n paste0(\"\\nN = \", length(x))), vjust=1, hjust=1,size=4.5) + \n theme_minimal(base_size = 14) +\n theme(panel.grid = element_blank(),\n axis.line = element_line(size = 0.5, linetype = \"solid\",\n colour = \"black\"),\n legend.position=\"none\") \n \n \n }\n}\n```\n:::\n\n\n### Plot\n\n::: {.cell}\n\n```{.r .cell-code}\nwb_reg <- ggplotRegression(WB$thc, WB$cbn, xlab = 'THC (ng/mL)', ylab = 'CBN (ng/mL)', x_text= 150, y_text = 7, y_text2 = 5, title = \"Blood\")\n\nof_reg <- ggplotRegression(OF$thc, OF$cbn, xlab = 'THC (ng/mL)', ylab = 'CBN (ng/mL)', x_text= 12500, y_text = 750, y_text2 = 500, title = \"Oral Fluid\")\n\nplot_grid(wb_reg, of_reg, labels = 'AUTO', label_size = 12, ncol = 2, scale = 1)\n```\n\n::: {.cell-output-display}\n![](13-cs01-analysis_files/figure-html/unnamed-chunk-22-1.png){width=2100}\n:::\n:::\n\n\n:::\n\n## Possible Extensions\n\nOur current question asks for a *single* compound...and you'll need to decide that.\n\n. . . \n\n...but you could imagine a world where more than one compound or more than one matrix could be measured at the roadside.\n\n. . . \n\nSo:\n\n:::incremental\n- combination of the oral fluid and blood that would better predict recent use? (For example if an officer stopped a driver and got a high oral fluid, but could not get a blood sample for a couple of hours and got a relatively low result would this predict recent use better than blood (or OF) alone? \n- Is there a ratio of OF/blood that predicts recent use?\n- Machine learning model to determine optimal combination of measurements/cutoffs to detect recent use?\n:::\n\n. . . \n\nThings to keep in mind:\n\n- some matrices are easier to get at the roadside\n- time from use matters (trying to detect *recent* use)\n- we may not care equally about sensitivity and specificity\n\n## cs01: what to do now?\n\n1. Communicate with your group!\n2. Discuss possible extensions\n3. Make a plan; figure out who's doing what; set deadlines\n4. Implement the plan!\n\n\n## What has to be done:\n\n:::incremental\n- Question | include in Rmd; add extension if applicable\n- Background | summarize and add to what was discussed in classed\n- Data\n - Describe data & variables\n - Data wrangling | likely copy + paste from notes; add explanation as you go\n- Analysis\n - EDA | likely borrowing parts from notes and adding more in; be sure to include interpretations of output & guide the reader\n - Analysis | likely borrowing *most/all* from class; interpretations/guiding reader/contextualizing is essential\n - Extension | must be completed\n- Conclusion | summarize\n- Proofread | ensure it makes sense from top to bottom\n- General Audience communication (submit on Canvas; 1 submission per group)\n:::\n\n## Collaborating on GitHub\n\n- Be sure to pull changes every time you sit down to work\n- Avoid working on the same part of the same file as another teammate OR work in separate files and combine at the end\n- push your changes once you're ready to add them to the group\n\n## Recap {.smaller background-color=\"#92A86A\"}\n\n- Can you describe sensitivity? Specificity?\n- Can you explain how TP, TN, FP, and FN were calculated/defined *in this experiment*?\n- Can you describe the code used to carry out the calculations?\n- Can you interpret the results from these data?\n",
"supporting": [],
"filters": [
"rmarkdown/pagebreak.lua"
@@ -9,9 +9,6 @@
"includes": {
"include-in-header": [
"\n\n\n\n\n\n\n\n\n"
- ],
- "include-after-body": [
- "\n\n\n"
]
},
"engineDependencies": {},
diff --git a/content/lectures/13-cs01-analysis.qmd b/content/lectures/13-cs01-analysis.qmd
index 9cc0a78..741c6c2 100644
--- a/content/lectures/13-cs01-analysis.qmd
+++ b/content/lectures/13-cs01-analysis.qmd
@@ -834,6 +834,36 @@ Things to keep in mind:
- time from use matters (trying to detect *recent* use)
 - we may not care equally about sensitivity and specificity (a quick numeric sketch follows)
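+
+To make that last point concrete, here is a quick numeric sketch using one row of the CBN counts shown earlier (TP = 106, FN = 20, FP = 7, TN = 54). The 2:1 weighting is purely illustrative, not a metric endorsed in this course:
+
+```{r}
+tp <- 106; fn <- 20; fp <- 7; tn <- 54  # one row of the CBN summaries above
+
+sens <- tp / (tp + fn)  # ~0.84: recent users correctly flagged
+spec <- tn / (tn + fp)  # ~0.89: non-users correctly cleared
+
+# If, say, we valued specificity twice as much as sensitivity
+# (an arbitrary weighting, just for illustration):
+(sens + 2 * spec) / 3
+```
+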
+## cs01: what to do now?
+
+1. Communicate with your group!
+2. Discuss possible extensions
+3. Make a plan; figure out who's doing what; set deadlines
+4. Implement the plan!
+
+
+## What has to be done:
+
+:::incremental
+- Question | include in Rmd (see the skeleton after this list); add your extension question if applicable
+- Background | summarize and add to what was discussed in class
+- Data
+ - Describe data & variables
+ - Data wrangling | likely copy + paste from notes; add explanation as you go
+- Analysis
+ - EDA | likely borrowing parts from the notes and adding more; be sure to interpret the output & guide the reader
+ - Analysis | likely borrowing *most/all* from class; interpreting, guiding the reader, and contextualizing are essential
+ - Extension | must be completed
+- Conclusion | summarize
+- Proofread | ensure it makes sense from top to bottom
+- General Audience communication (submit on Canvas; 1 submission per group)
+:::
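+
+To make that checklist concrete, here is a minimal report skeleton (a sketch only; the section names and chunk label below are suggestions, not a required template):
+
+````markdown
+---
+title: "CS01: Biomarkers of Recent Use"
+author: "Your group members"
+output: html_document
+---
+
+## Question
+## Background
+## Data
+```{r wrangling}
+# packages, data loading, & wrangling (adapted from the class notes)
+```
+## Analysis
+### EDA
+### Sensitivity & Specificity
+### Extension
+## Conclusion
+````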
+
+## Collaborating on GitHub
+
+- Be sure to pull changes every time you sit down to work
+- Avoid working on the same part of the same file as another teammate OR work in separate files and combine at the end
+- Push your changes once you're ready to share them with the group (see the sketch below)
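+
+A minimal sketch of that pull/commit/push rhythm from within R, assuming the gert package (an assumption; the terminal or RStudio's Git pane works just as well and is not tied to this course):
+
+```{r}
+#| eval: false
+# install.packages("gert")          # assumption: gert is not installed by default
+library(gert)
+
+git_pull()                          # start of a session: grab teammates' changes
+# ... edit your piece of the report ...
+git_add("extension-analysis.Rmd")   # hypothetical file name; stage what you changed
+git_commit("Add extension analysis")
+git_push()                          # share your work with the group
+```
+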
## Recap {.smaller background-color="#92A86A"}
diff --git a/content/lectures/images/14/LastLecture.png b/content/lectures/images/14/LastLecture.png
deleted file mode 100644
index 416d2a9..0000000
Binary files a/content/lectures/images/14/LastLecture.png and /dev/null differ
diff --git a/content/lectures/images/14/ex2.png b/content/lectures/images/14/ex2.png
deleted file mode 100644
index f8cbf94..0000000
Binary files a/content/lectures/images/14/ex2.png and /dev/null differ
diff --git a/content/lectures/images/14/sebastianalgharaballi.png b/content/lectures/images/14/sebastianalgharaballi.png
deleted file mode 100644
index 19125b7..0000000
Binary files a/content/lectures/images/14/sebastianalgharaballi.png and /dev/null differ
diff --git a/content/lectures/images/14/shenovad.png b/content/lectures/images/14/shenovad.png
deleted file mode 100644
index 8071ae3..0000000
Binary files a/content/lectures/images/14/shenovad.png and /dev/null differ
diff --git a/docs/content/lectures/08-effective-communication-slides.html b/docs/content/lectures/08-effective-communication-slides.html
index be8f7b6..ec60768 100644
--- a/docs/content/lectures/08-effective-communication-slides.html
+++ b/docs/content/lectures/08-effective-communication-slides.html
@@ -556,37 +556,7 @@
code span.wa { color: #5e5e5e; font-style: italic; }
-
+
+
+
+
@@ -4046,7 +4046,7 @@
Calculating Sensitivity and Specificity
This takes a few minutes to run… (reminder: cache=TRUE)
diff --git a/docs/search.json b/docs/search.json
index 9736a84..fd81157 100644
--- a/docs/search.json
+++ b/docs/search.json
@@ -7403,6 +7403,321 @@
"href": "schedule.html",
"title": "COGS 137",
"section": "",
- "text": "Week\nDate\nTitle\nType\n\n\n\n\n0\nTh Sep 28\nWelcome & Tooling\nLecture\n\n\n1\nTu Oct 3\nIntro to R\nLecture\n\n\n1\nTh Oct 5\nData Wrangling: dplyr\nLecture\n\n\n1\nFri Oct 6\nLab 01: Intro to R\nLab\n\n\n2\nTu Oct 10\nData Wrangling: tidyr\nLecture\n\n\n2\nTh Oct 12\nData Visualization: ggplot2 (day 1)\nLecture\n\n\n2\nFri Oct 13\nLab 02: Data Wrangling\nLab\n\n\n3\nMon Oct 16\nHW01 due (11:59 PM)\nHW\n\n\n3\nTu Oct 17\nData Visualization: ggplot2 (day 2)\nLecture\n\n\n3\nTh Oct 19\nData Analysis & Modeling\nLecture\n\n\n3\nFri Oct 20\nLab 03: Data Visualization\nLab\n\n\n4\nTu Oct 24\nLinear Models Review\nLecture\n\n\n4\nTh Oct 26\nEffective Communication\nLecture\n\n\n4\nFri Oct 27\nLab 04: Modeling\nLab\n\n\n5\nMon Oct 30\nHW02 due (11:59 PM)\nHW\n\n\n5\nTu Oct 31\nMultiple Linear Regression*\nLecture\n\n\n5\nTh Nov 2\nCase Study & Final Project Info\nLecture\n\n\n5\nFri Nov 3\nLab used for midterm review\nLab\n\n\n6\nMon Nov 6\nMIDTERM EXAM (due 11:59 PM) \nExam\n\n\n6\nTu Nov 7\nCase Study 01: THC Biomarkers (day 1)\nLecture\n\n\n6\nTh Nov 9\nCase Study 01: THC Biomarkers (day 2)\nLecture\n\n\n6\nFri Nov 10\nLab 05: Multiple Linear Regression\nLab\n\n\n7\nTu Nov 14\nCase Study 01: THC Biomarkers (day 3)\nLecture\n\n\n7\nTh Nov 16\nLogistic Regression\nLecture\n\n\n7\nFri Nov 17\nLab 06: CS01\nLab\n\n\n8\nMon Nov 20\nHW03 due (11:59 PM)\nHW\n\n\n8\nMon Nov 20\nFinal Project Proposal Due\nProject\n\n\n8\nTu Nov 21\nCase Study 02: Vaping Behaviors (day 1)\nLecture\n\n\n8\nTh Nov 23\nNo Class (Thanksgiving)\n--\n\n\n9\nMon Nov 27\nCS01 Due (11:59 PM)\nCase Study\n\n\n9\nTu Nov 28\nCase Study 02: Vaping Behaviors (day 2)\nLecture\n\n\n9\nTh Nov 30\nCase Study 02: Vaping Behaviors (day 3)\nLecture\n\n\n9\nFri Dec 1\nLab 07: Logistic Regression & CS02\nLab\n\n\n10\nMon Dec 4\nCS02 Due\nCase Study\n\n\n10\nTu Dec 5\nFinal Project Brainstorming\nLecture\n\n\n10\nTh Dec 7\nNext Steps\nLecture\n\n\n10\nFri Dec 8\nLab 08: Final Project\nLab\n\n\nFinals\nTu Dec 12\nFinal Project Due (11:59 PM)\nProject"
+ "text": "Week\nDate\nTitle\nType\n\n\n\n\n0\nTh Sep 28\nWelcome & Tooling\nLecture\n\n\n1\nTu Oct 3\nIntro to R\nLecture\n\n\n1\nTh Oct 5\nData Wrangling: dplyr\nLecture\n\n\n1\nFri Oct 6\nLab 01: Intro to R\nLab\n\n\n2\nTu Oct 10\nData Wrangling: tidyr\nLecture\n\n\n2\nTh Oct 12\nData Visualization: ggplot2 (day 1)\nLecture\n\n\n2\nFri Oct 13\nLab 02: Data Wrangling\nLab\n\n\n3\nMon Oct 16\nHW01 due (11:59 PM)\nHW\n\n\n3\nTu Oct 17\nData Visualization: ggplot2 (day 2)\nLecture\n\n\n3\nTh Oct 19\nData Analysis & Modeling\nLecture\n\n\n3\nFri Oct 20\nLab 03: Data Visualization\nLab\n\n\n4\nTu Oct 24\nLinear Models Review\nLecture\n\n\n4\nTh Oct 26\nEffective Communication\nLecture\n\n\n4\nFri Oct 27\nLab 04: Modeling\nLab\n\n\n5\nMon Oct 30\nHW02 due (11:59 PM)\nHW\n\n\n5\nTu Oct 31\nMultiple Linear Regression*\nLecture\n\n\n5\nTh Nov 2\nCase Study & Final Project Info\nLecture\n\n\n5\nFri Nov 3\nLab used for midterm review\nLab\n\n\n6\nMon Nov 6\nMIDTERM EXAM (due 11:59 PM) \nExam\n\n\n6\nTu Nov 7\nCase Study 01: THC Biomarkers (day 1)\nLecture\n\n\n6\nTh Nov 9\nCase Study 01: THC Biomarkers (day 2)\nLecture\n\n\n6\nFri Nov 10\nLab 05: Multiple Linear Regression\nLab\n\n\n7\nTu Nov 14\nCase Study 01: THC Biomarkers (day 3)\nLecture\n\n\n7\nTh Nov 16\ntidymodels\nLecture\n\n\n7\nFri Nov 17\nLab 06: CS01\nLab\n\n\n8\nMon Nov 20\nHW03 due (11:59 PM)\nHW\n\n\n8\nMon Nov 20\nFinal Project Proposal Due\nProject\n\n\n8\nTu Nov 21\nCase Study 02: Air Pollution (day 1)\nLecture\n\n\n8\nTh Nov 23\nNo Class (Thanksgiving)\n--\n\n\n9\nMon Nov 27\nCS01 Due (11:59 PM)\nCase Study\n\n\n9\nTu Nov 28\nCase Study 02: Air Pollution (day 2)\nLecture\n\n\n9\nTh Nov 30\nCase Study 02: Air Pollution (day 3)\nLecture\n\n\n9\nFri Dec 1\nLab 07: CS02\nLab\n\n\n10\nMon Dec 4\nCS02 Due\nCase Study\n\n\n10\nTu Dec 5\nFinal Project Brainstorming\nLecture\n\n\n10\nTh Dec 7\nNext Steps\nLecture\n\n\n10\nFri Dec 8\nLab 08: Final Project\nLab\n\n\nFinals\nTu Dec 12\nFinal Project Due (11:59 PM)\nProject"
+ },
+ {
+ "objectID": "content/lectures/13-cs01-analysis.html#cs01-what-to-do-now",
+ "href": "content/lectures/13-cs01-analysis.html#cs01-what-to-do-now",
+ "title": "13-cs01-analysis",
+ "section": "cs01: what to do now?",
+ "text": "cs01: what to do now?\n\nCommunicate with your group!\nDiscuss possible extensions\nMake a plan; figure out who’s doing what; set deadlines\nImplement the plan!"
+ },
+ {
+ "objectID": "content/lectures/13-cs01-analysis.html#what-has-to-be-done",
+ "href": "content/lectures/13-cs01-analysis.html#what-has-to-be-done",
+ "title": "13-cs01-analysis",
+ "section": "What has to be done:",
+ "text": "What has to be done:\n\n\nQuestion | include in Rmd; add extension if applicable\nBackground | summarize and add to what was discussed in classed\nData\n\nDescribe data & variables\nData wrangling | likely copy + paste from notes; add explanation as you go\n\nAnalysis\n\nEDA | likely borrowing parts from notes and adding more in; be sure to include interpretations of output & guide the reader\nAnalysis | likely borrowing most/all from class; interpretations/guiding reader/contextualizing is essential\nExtension | must be completed\n\nConclusion | summarize\nProofread | ensure it makes sense from top to bottom\nGeneral Audience communication (submit on Canvas; 1 submission per group)"
+ },
+ {
+ "objectID": "content/lectures/13-cs01-analysis.html#collaborating-on-github",
+ "href": "content/lectures/13-cs01-analysis.html#collaborating-on-github",
+ "title": "13-cs01-analysis",
+ "section": "Collaborating on GitHub",
+ "text": "Collaborating on GitHub\n\nBe sure to pull changes every time you sit down to work\nAvoid working on the same part of the same file as another teammate OR work in separate files and combine at the end\npush your changes once you’re ready to add them to the group"
+ },
+ {
+ "objectID": "content/lectures/14-tidymodels.html",
+ "href": "content/lectures/14-tidymodels.html",
+ "title": "14-tidymodels",
+ "section": "",
+ "text": "Q: I had a question about the presentations for the final projects; since it is due during finals week, is it a live presentation in class or do we submit a video? If it is a live presentation, do we present during our designated final day/time on webreg?\nA: Video submission!\n\n\nQ: I also wanted to mention that the mid/pre-course extra credit surveys doesn’t reflect a change in grade on canvas. (For ex. if i put a 0 or 100 for E.C my grade stays the same).\nA: Correct - I add these in at the end. Canvas can do many things, but it doesn’t handle EC well (from what I can tell).\n\n\nQ: I’m overwhelmed/confused by “the code :’) it’s quite a bit to take in”\nA: Yes! It’s a lot! This is why we have group mates on the case study. I encourage everyone to sit with the code after class and then work through it together as you complete the case study!\n\n\nQ: For oral fluid you mentioned looking more into why there’s that big dip in specificity and that we should look more into that on Friday with eda but would that be slightly guided because I have no idea where to start with that.\nA: I would make some plots that specifically look at the data/numbers there to figure out what could be leading to that drop at that particular time window.\n\n\nQ: Why are specificity graphs so high? A: Good question - this is generally b/c people who didn’t smoke have values very close to zero across compounds…so they will rarely be above the cutoff, making this very effective at identifying individuals who did not smoke\n\n\nQ: What is the dplyr::select notation, like is it a way to use select from dplyr without librarying first?\nA: Yes!\n\n\nQ: Also separate topic, but do we have information on impairment so we can account for that with recent use? A: Great question - impairment is very hard to define here. We (the researchers) have data on self-reported high and what the police officers determined, but y’all don’t have that data. So, we’re using knowledge from other studies (see 11-cs01-data notes) to understand what we know on impoairment but only focusing on detecting recent use here.\n\n\nQ: I am unable to locate where to sign up for groups for the final project\nA: This form was just released (sorry for delay). link to survey\n\n\nQ: I think I need more time to digest how the code works together to produce the visuals that we saw.\nA: I agree. I think I could balance and give more time in class…but I will say this is an exercise I want groups to work through together!\n\n\n\n\nDue Dates:\n\nLab06 due Friday - cs01-focused\nHW03 (MLR) due Mon 11/20\nProject Proposal due Mon 11/20 link\nCS01 due 11/27:\n\nReport (submitted via GitHub)\n“General Communication” (submit on Canvas; one submission per group)\nsurvey about how working with group went - due 11/28\n\n\n\n\n\n\nmachine learning intro\n(re)introduce tidymodels\nworked example: ML in tidymodels\n\n\n\n\nThe package itself has some worked examples: https://www.tidymodels.org/start/models/\nThere’s a whole book (written by the developer of tidymodels) that covers the tidymodels package: https://www.tmwr.org/\nThis may be a helpful reference for this lecture, although, of course, the book goes into way more detail than we will in a single lecture.\n\n\n\n\n“Other packages, such as caret and mlr, help to solve the R model API issue. These packages do a lot of other things too: pre-processing, model tuning, resampling, feature selection, ensembling, and so on. 
In the tidyverse, we strive to make our packages modular and parsnip is designed only to solve the interface issue. It is not designed to be a drop-in replacement for caret. The tidymodels package collection, which includes parsnip, has other packages for many of these tasks, and they are designed to work together. We are working towards higher-level APIs that can replicate and extend what the current model packages can do.” - Max Kuhn (tidymodels developer)\n\n\nBenefits:\n\nStandardized workflow/format/notation across different types of machine learning algorithms\nCan easily modify pre-processing, algorithm choice, and hyper-parameter tuning, making optimization easy\n\n\n\n\n\nThe main packages (and their roles):\n ## Machine Learning: intro\nIn intro stats, you should have learned the central dogma of statistics: we sample from a population\n . . .\nThe data from the sample are used to make an inference about the population:\n . . .\nFor prediction, we have a similar sampling problem:\n . . .\nBut now we are trying to build a rule that can be used to predict a single observation’s value of some characteristic using characteristics of the other observations.\n ## ML: the goal\nThe goal is to:\nbuild a machine learning algorithm\n\nthat uses the features as input\n\n\nand predicts an outcome variable\n\n\nin the situation where we do not know the outcome variable.\n\n\n\n\nTypically, you use data where you have both the input and output data to train a machine learning algorithm.\nWhat you need:\n\nA data set to train from.\nAn algorithm or set of algorithms you can use to try values of \(f\).\nA distance metric \(d\) for measuring how close \(Y\) is to \(\hat{Y}\).\nA definition of what a “good” distance is.\n\n\n\n\n\n\n\nHow these packages fit together for carrying out machine learning:\n\n\n\n\n ## Worked example: Hotel Stays\n\nGoal: predict which hotel stays included children and/or babies, based on the other characteristics of the stays such as which hotel the guests stay at, how much they pay, etc.\n\nImage Source: https://www.tidymodels.org/start/case-study/\nCase Study: https://www.tidymodels.org/start/case-study/\n\n\n\n\n\nglmnet\nrf\n\n\n\n\n\n# install.packages(\"glmnet\")\n# install.packages(\"ranger\")\n# install.packages(\"vip\")\n\nlibrary(tidymodels)\n\n\n\n\nHotel Bookings Data from: Antonio, Almeida, and Nunes (2019)\n\nData Dictionary\n\n\n\nhotels <- \n read_csv(\"https://tidymodels.org/start/case-study/hotels.csv\") |>\n mutate(across(where(is.character), as.factor))\n\n\n\n\nglimpse(hotels)\n\nRows: 50,000\nColumns: 23\n$ hotel City_Hotel, City_Hotel, Resort_Hotel, R…\n$ lead_time 217, 2, 95, 143, 136, 67, 47, 56, 80, 6…\n$ stays_in_weekend_nights 1, 0, 2, 2, 1, 2, 0, 0, 0, 2, 1, 0, 1, …\n$ stays_in_week_nights 3, 1, 5, 6, 4, 2, 2, 3, 4, 2, 2, 1, 2, …\n$ adults 2, 2, 2, 2, 2, 2, 2, 0, 2, 2, 2, 1, 2, …\n$ children none, none, none, none, none, none, chi…\n$ meal BB, BB, BB, HB, HB, SC, BB, BB, BB, BB,…\n$ country DEU, PRT, GBR, ROU, PRT, GBR, ESP, ESP,…\n$ market_segment Offline_TA/TO, Direct, Online_TA, Onlin…\n$ distribution_channel TA/TO, Direct, TA/TO, TA/TO, Direct, TA…\n$ is_repeated_guest 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …\n$ previous_cancellations 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …\n$ previous_bookings_not_canceled 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …\n$ reserved_room_type A, D, A, A, F, A, C, B, D, A, A, D, A, …\n$ assigned_room_type A, K, A, A, F, A, C, A, D, A, D, D, A, …\n$ booking_changes 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, …\n$ deposit_type No_Deposit, No_Deposit, No_Deposit, No_…\n$ days_in_waiting_list 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …\n$ customer_type Transient-Party, Transient, Transient, …\n$ average_daily_rate 80.75, 170.00, 8.00, 81.00, 157.60, 49.…\n$ required_car_parking_spaces none, none, none, none, none, none, non…\n$ total_of_special_requests 1, 3, 2, 1, 4, 1, 1, 1, 1, 1, 0, 1, 0, …\n$ arrival_date 2016-09-01, 2017-08-25, 2016-11-19, 20…\n\n\n\n\nOutcome variable: children (two levels):\n\nhotels %>% \n count(children) %>% \n mutate(prop = n/sum(n))\n\n# A tibble: 2 × 3\n children n prop\n \n1 children 4038 0.0808\n2 none 45962 0.919 \n\n\n\n\nNotes: - Data are imbalanced - There are methods for combating this issue using recipes (search for steps to upsample or downsample) or other more specialized packages like themis - For demo purposes, we’ll model as-is\n\n\n\n\nTypically, data are split into training and testing datasets\n\n\nTraining | data used to build & tune the model; model “learns” from these data\nTesting | data withheld from training to evaluate model performance (can it generalize?)\n\n\n\n\n\n\n\n\nReminder: children is pretty imbalanced so we’ll use a stratified random sample (to ensure a similar proportion of children in training and testing)\n\nset.seed(123)\nsplits <- initial_split(hotels, strata = children)\nsplits\n\n\n<37500/12500/50000>\n\n\n\n\nhotel_other <- training(splits)\nhotel_test <- testing(splits)\n\n\n\n\n# training set proportions by children\nhotel_other %>% \n count(children) %>% \n mutate(prop = n/sum(n))\n\n# A tibble: 2 × 3\n children n prop\n \n1 children 3027 0.0807\n2 none 34473 0.919 \n\n\n\n\n\n# test set proportions by children\nhotel_test %>% \n count(children) %>% \n mutate(prop = n/sum(n))\n\n# A tibble: 2 × 3\n children n prop\n \n1 children 1011 0.0809\n2 none 11489 0.919 \n\n\n\n\n\n\n\nmultiple approaches (k-fold CV)\nhere: validation set\n\n Image Source: https://www.tidymodels.org/start/case-study/\n\n\n\n\nTraining: 30,000\nValidation: 7,500\nTesting: 12,500\n\n\n\nset.seed(234)\nval_set <- validation_split(hotel_other, \n strata = children, \n prop = c(0.8))\n\nval_set\n\n# Validation Set Split (0.8/0.2) using stratification \n# A tibble: 1 × 2\n splits id \n \n1 validation\n\n\n\n\n\n\nList ingredients | specify which variables we will be using as our outcome and predictors (assign roles)\n\n\n\n\nSpecify model (parsnip)\n\nlr_mod <- \n logistic_reg(penalty = tune(), mixture = 1) %>% \n set_engine(\"glmnet\")\n\n\nBuild recipe:\n\n\nstep_date() creates predictors for the year, month, and day of the week.\nstep_holiday() generates a set of indicator variables for specific holidays. Although we don’t know where these two hotels are located, we do know that the countries of origin for most stays are based in Europe.\nstep_rm() removes variables; here we’ll use it to remove the original date variable since we no longer want it in the model.\nstep_dummy() converts characters or factors (i.e., nominal variables) into one or more numeric binary model terms for the levels of the original data.\nstep_zv() removes indicator variables that only contain a single unique value (e.g. all zeros). 
This is important because, for penalized models, the predictors should be centered and scaled.\nstep_normalize() centers and scales numeric variables.\n\n\n\nholidays <- c(\"AllSouls\", \"AshWednesday\", \"ChristmasEve\", \"Easter\", \n \"ChristmasDay\", \"GoodFriday\", \"NewYearsDay\", \"PalmSunday\")\n\nlr_recipe <- \n recipe(children ~ ., data = hotel_other) %>% \n step_date(arrival_date) %>% \n step_holiday(arrival_date, holidays = holidays) %>% \n step_rm(arrival_date) %>% \n step_dummy(all_nominal_predictors()) %>% \n step_zv(all_predictors()) %>% \n step_normalize(all_predictors())\n\n\n\n\n\n\nlr_workflow <- \n workflow() %>% \n add_model(lr_mod) %>% \n add_recipe(lr_recipe)\n\n\n\n\n\nlr_reg_grid <- tibble(penalty = 10^seq(-4, -1, length.out = 30))\n\n. . . Lowest penalty values:\n\nlr_reg_grid %>% top_n(-5)\n\n# A tibble: 5 × 1\n penalty\n \n1 0.0001 \n2 0.000127\n3 0.000161\n4 0.000204\n5 0.000259\n\n\nHighest penalty values:\n\nlr_reg_grid %>% top_n(5)\n\n# A tibble: 5 × 1\n penalty\n