add 17
ShanEllis committed Nov 30, 2023
1 parent c829837 commit 5929420
Showing 205 changed files with 15,599 additions and 7,386 deletions.

10 changes: 3 additions & 7 deletions _freeze/content/lectures/14-tidymodels/execute-results/html.json
@@ -1,16 +1,12 @@
{
"hash": "877b12ac78d2dc03fae8cfd51e76edec",
"hash": "8e36f36a39c719c350ab623d5c02bbff",
"result": {
"markdown": "---\ntitle: \"14-tidymodels\"\nauthor: \"Professor Shannon Ellis\"\ndate: \"2023-11-21\"\n\nformat:\n html: \n output-file: 14-tidymodels.html\n embed-resources: true\n revealjs:\n output-file: 14-tidymodels-slides.html\n slide-number: true\n chalkboard: false \n preview-links: auto\n logo: images/cogs137-logo-hex.png\n css: slides.css\n footer: <https://cogs137.github.io/website/>\n scrollable: true\n embed-resources: true\n execute:\n echo: true\n eval: true\n---\n\n::: {.cell}\n\n:::\n\n\n\n# `tidymodels` {background-color=\"#92A86A\"}\n\n\n## Q&A {.smaller}\n\n> Q: I had a question about the presentations for the final projects; since it is due during finals week, is it a live presentation in class or do we submit a video? If it is a live presentation, do we present during our designated final day/time on webreg? \\\n> A: Video submission!\n\n> Q: I also wanted to mention that the mid/pre-course extra credit surveys doesn’t reflect a change in grade on canvas. (For ex. if i put a 0 or 100 for E.C my grade stays the same).\\\n> A: Correct - I add these in at the end. Canvas can do many things, but it doesn't handle EC well (from what I can tell).\n\n> Q: I'm overwhelmed/confused by \"the code :') it's quite a bit to take in\"\\\n> A: Yes! It's a lot! This is why we have group mates on the case study. 
I encourage everyone to sit with the code after class and then work through it together as you complete the case study!\n\n> Q: For oral fluid you mentioned looking more into why there's that big dip in specificity and that we should look more into that on Friday with eda but would that be slightly guided because I have no idea where to start with that.\\\n> A: I would make some plots that specifically look at the data/numbers there to figure out what could be leading to that drop at that particular time window.\n\n> Q: Why are specificity graphs so high?\\ \n> A: Good question - this is generally b/c people who didn't smoke have values very close to zero across compounds...so they will rarely be above the cutoff, making this very effective at identifying individuals who did not smoke\n\n> Q: What is the dplyr::select notation, like is it a way to use select from dplyr without librarying first? \\\n> A: Yes!\n\n> Q: Also separate topic, but do we have information on impairment so we can account for that with recent use?\\ \n> A: Great question - impairment is *very* hard to define here. We (the researchers) have data on self-reported high and what the police officers determined, but y'all don't have that data. So, we're using knowledge from other studies (see 11-cs01-data notes) to understand what we know on impoairment but only focusing on detecting recent use here. \n\n> Q: I am unable to locate where to sign up for groups for the final project\\\n> A: This form was just released (sorry for delay). [link to survey](https://forms.gle/EhrwnqAjRHMA6kKa9)\n\n> Q: I think I need more time to digest how the code works together to produce the visuals that we saw. \\\n> A: I agree. 
I think I could balance and give more time in class...but I will say this is an exercise I want groups to work through together!\n\n## Course Announcements\n\nDue Dates:\n\n- No class This Th; No Lab this Fri (Happy Thanksgiving!)\n- **CS01** due Monday 11/27\n - group work survey due Tues 11/28\n\n. . .\n\nNotes: \n\n- Be sure you watch the video from last Thursday on Canvas\n- Any questions about CS01?\n\n## Agenda\n\n- machine learning intro\n- (re)introduce `tidymodels`\n- <s>worked example: ML in `tidymodels`</s>\n\n## Suggested Resources\n\n- The package itself has some worked examples: https://www.tidymodels.org/start/models/\n- There's a whole book (written by the developer of `tidymodels`) that covers the `tidymodels` package: [https://www.tmwr.org/](https://www.tmwr.org/) \n\n## `tidymodels`: philosophy\n\n> “Other packages, such as caret and mlr, help to solve the R model API issue. These packages do a lot of other things too: pre-processing, model tuning, resampling, feature selection, ensembling, and so on. In the tidyverse, we strive to make our packages modular and parsnip is designed only to solve the interface issue. It is not designed to be a drop-in replacement for caret. The tidymodels package collection, which includes parsnip, has other packages for many of these tasks, and they are designed to work together. We are working towards higher-level APIs that can replicate and extend what the current model packages can do.” - Max Kuhn (`tidymodels` developer)\n\n. . .\n\nBenefits: \n\n1. Standardized workflow/format/notation across different types of machine learning algorithms\n2. 
Can easily modify pre-processing, algorithm choice, and hyper-parameter tuning making optimization easy\n\n## `tidymodels`: ecosystem\n\nThe main packages (and their roles):\n\n<p align=\"center\">\n <img width=\"800\" src=\"images/14/simpletidymodels.png\">\n</p>\n\n## Machine Learning: intro\n\nIn intro stats, you should have learned the central dogma of statistics: we sample from a population\n\n![](images/14/cdi1.png)\n\n. . . \n\nThe data from the sample are used to make an inference about the population:\n\n![](images/14/cdi2.png)\n\n. . .\n\nFor prediction, we have a similar sampling problem:\n\n![](images/14/cdp1.png)\n\n. . .\n\nBut now we are trying to build a rule that can be used to predict a single observation's value of some characteristic using characteristics of the other observations. \n\n![](images/14/cdp2.png)\n\n## ML: the goal\n\nThe goal is to:\n\nbuild a machine learning algorithm \n\n. . .\n\nthat uses features as input\n\n. . . \n\nand predicts an outcome variable \n\n. . .\n\nin the situation where we do not know the outcome variable.\n\n\n## Classic ML\n\nTypically, you use data where you have both the input and output data to **train** a machine learning algorithm.\n\n. . . \n\nWhat you need: \n\n:::incremental\n1. A data set to train from. \n2. An algorithm or set of algorithms you can use to try values of $f$.\n3. A distance metric $d$ for measuring how close $Y$ is to $\\hat{Y}$.\n4. A definition of what a \"good\" distance is.\n:::\n\n## `tidymodels` for ML\n\nHow these packages fit together for carrying out machine learning:\n\n![](images/14/MachineLearning.png) \n\n## `tidymodels`: steps\n\n![](images/14/Updated_tidymodels_basics.png)\n\n## Recap {.smaller background-color=\"#92A86A\"}\n\n- Can you describe the basics of machine learning?\n- Can you describe the goals of and general steps in `tidymodels`?\n",
"markdown": "---\ntitle: \"14-tidymodels\"\nauthor: \"Professor Shannon Ellis\"\ndate: \"2023-11-21\"\n\nformat:\n html: \n output-file: 14-tidymodels.html\n embed-resources: true\n revealjs:\n output-file: 14-tidymodels-slides.html\n slide-number: true\n chalkboard: false \n preview-links: auto\n logo: images/cogs137-logo-hex.png\n css: slides.css\n footer: <https://cogs137.github.io/website/>\n scrollable: true\n embed-resources: true\n execute:\n echo: true\n eval: true\n---\n\n::: {.cell}\n\n:::\n\n\n# `tidymodels` {background-color=\"#92A86A\"}\n\n## Q&A {.smaller}\n\n> Q: I had a question about the presentations for the final projects; since it is due during finals week, is it a live presentation in class or do we submit a video? If it is a live presentation, do we present during our designated final day/time on webreg?\\\n> A: Video submission!\n\n> Q: I also wanted to mention that the mid/pre-course extra credit surveys doesn't reflect a change in grade on canvas. (For ex. if i put a 0 or 100 for E.C my grade stays the same).\\\n> A: Correct - I add these in at the end. Canvas can do many things, but it doesn't handle EC well (from what I can tell).\n\n> Q: I'm overwhelmed/confused by \"the code :') it's quite a bit to take in\"\\\n> A: Yes! It's a lot! This is why we have group mates on the case study. I encourage everyone to sit with the code after class and then work through it together as you complete the case study!\n\n> Q: For oral fluid you mentioned looking more into why there's that big dip in specificity and that we should look more into that on Friday with eda but would that be slightly guided because I have no idea where to start with that.\\\n> A: I would make some plots that specifically look at the data/numbers there to figure out what could be leading to that drop at that particular time window.\n\n> Q: Why are specificity graphs so high?  
A: Good question - this is generally b/c people who didn't smoke have values very close to zero across compounds...so they will rarely be above the cutoff, making this very effective at identifying individuals who did not smoke\n\n> Q: What is the dplyr::select notation, like is it a way to use select from dplyr without librarying first?\\\n> A: Yes!\n\n> Q: Also separate topic, but do we have information on impairment so we can account for that with recent use?  A: Great question - impairment is *very* hard to define here. We (the researchers) have data on self-reported high and what the police officers determined, but y'all don't have that data. So, we're using knowledge from other studies (see 11-cs01-data notes) to understand what we know on impoairment but only focusing on detecting recent use here.\n\n> Q: I am unable to locate where to sign up for groups for the final project\\\n> A: This form was just released (sorry for delay). [link to survey](https://forms.gle/EhrwnqAjRHMA6kKa9)\n\n> Q: I think I need more time to digest how the code works together to produce the visuals that we saw.\\\n> A: I agree. I think I could balance and give more time in class...but I will say this is an exercise I want groups to work through together!\n\n## Course Announcements\n\nDue Dates:\n\n- No class This Th; No Lab this Fri (Happy Thanksgiving!)\n- **CS01** due Monday 11/27\n - group work survey due Tues 11/28\n\n. . 
.\n\nNotes:\n\n- Be sure you watch the video from last Thursday on Canvas\n- Any questions about CS01?\n\n## Agenda\n\n- machine learning intro\n- (re)introduce `tidymodels`\n- <s>worked example: ML in `tidymodels`</s>\n\n## Suggested Resources\n\n- The package itself has some worked examples: https://www.tidymodels.org/start/models/\n- There's a whole book (written by the developer of `tidymodels`) that covers the `tidymodels` package: <https://www.tmwr.org/>\n\n## `tidymodels`: philosophy\n\n> \"Other packages, such as caret and mlr, help to solve the R model API issue. These packages do a lot of other things too: pre-processing, model tuning, resampling, feature selection, ensembling, and so on. In the tidyverse, we strive to make our packages modular and parsnip is designed only to solve the interface issue. It is not designed to be a drop-in replacement for caret. The tidymodels package collection, which includes parsnip, has other packages for many of these tasks, and they are designed to work together. We are working towards higher-level APIs that can replicate and extend what the current model packages can do.\" - Max Kuhn (`tidymodels` developer)\n\n. . .\n\nBenefits:\n\n1. Standardized workflow/format/notation across different types of machine learning algorithms\n2. Can easily modify pre-processing, algorithm choice, and hyper-parameter tuning making optimization easy\n\n## `tidymodels`: ecosystem\n\nThe main packages (and their roles):\n\n<p align=\"center\">\n\n<img src=\"images/14/simpletidymodels.png\" width=\"800\"/>\n\n</p>\n\n## Machine Learning: intro\n\nIn intro stats, you should have learned the central dogma of statistics: we sample from a population\n\n![](images/14/cdi1.png)\n\n. . .\n\nThe data from the sample are used to make an inference about the population:\n\n![](images/14/cdi2.png)\n\n. . .\n\nFor prediction, we have a similar sampling problem:\n\n![](images/14/cdp1.png)\n\n. . 
.\n\nBut now we are trying to build a rule that can be used to predict a single observation's value of some characteristic using characteristics of the other observations.\n\n![](images/14/cdp2.png)\n\n## ML: the goal\n\nThe goal is to:\n\nbuild a machine learning algorithm\n\n. . .\n\nthat uses features as input\n\n. . .\n\nand predicts an outcome variable\n\n. . .\n\nin the situation where we do not know the outcome variable.\n\n## Classic ML\n\nTypically, you use data where you have both the input and output data to **train** a machine learning algorithm.\n\n. . .\n\nWhat you need:\n\n::: incremental\n1. A data set to train from.\n2. An algorithm or set of algorithms you can use to try values of $f$.\n3. A distance metric $d$ for measuring how close $Y$ is to $\\hat{Y}$.\n4. A definition of what a \"good\" distance is.\n:::\n\n## `tidymodels` for ML\n\nHow these packages fit together for carrying out machine learning:\n\n![](images/14/MachineLearning.png)\n\n## `tidymodels`: steps\n\n![](images/14/Updated_tidymodels_basics.png)\n\n## Recap {.smaller background-color=\"#92A86A\"}\n\n- Can you describe the basics of machine learning?\n- Can you describe the goals of and general steps in `tidymodels`?\n",
"supporting": [],
"filters": [
"rmarkdown/pagebreak.lua"
],
"includes": {
"include-after-body": [
"\n<script>\n // htmlwidgets need to know to resize themselves when slides are shown/hidden.\n // Fire the \"slideenter\" event (handled by htmlwidgets.js) when the current\n // slide changes (different for each slide format).\n (function () {\n // dispatch for htmlwidgets\n function fireSlideEnter() {\n const event = window.document.createEvent(\"Event\");\n event.initEvent(\"slideenter\", true, true);\n window.document.dispatchEvent(event);\n }\n\n function fireSlideChanged(previousSlide, currentSlide) {\n fireSlideEnter();\n\n // dispatch for shiny\n if (window.jQuery) {\n if (previousSlide) {\n window.jQuery(previousSlide).trigger(\"hidden\");\n }\n if (currentSlide) {\n window.jQuery(currentSlide).trigger(\"shown\");\n }\n }\n }\n\n // hookup for slidy\n if (window.w3c_slidy) {\n window.w3c_slidy.add_observer(function (slide_num) {\n // slide_num starts at position 1\n fireSlideChanged(null, w3c_slidy.slides[slide_num - 1]);\n });\n }\n\n })();\n</script>\n\n"
]
},
"includes": {},
"engineDependencies": {},
"preserve": {},
"postProcess": true
10 changes: 3 additions & 7 deletions _freeze/content/lectures/15-cs02-data/execute-results/html.json

Large diffs are not rendered by default.

8 changes: 4 additions & 4 deletions _quarto.yml
@@ -77,8 +77,8 @@ website:
href: content/lectures/15-cs02-data-slides.html
- text: "16-cs02-eda"
href: content/lectures/16-cs02-eda-slides.html
# - text: "17-cs02-analysis"
# href: content/lectures/17-cs02-analysis-slides.html
- text: "17-cs02-analysis"
href: content/lectures/17-cs02-analysis-slides.html
# - text: "18-brainstorming"
# href: content/lectures/18-brainstorming-slides.html
# - text: "19-wrap-up"
@@ -119,8 +119,8 @@ website:
href: content/lectures/15-cs02-data.html
- text: "16-cs02-eda"
href: content/lectures/16-cs02-eda.html
# - text: "17-cs02-analysis"
# href: content/lectures/17-cs02-analysis.html
- text: "17-cs02-analysis"
href: content/lectures/17-cs02-analysis.html
# - text: "18-brainstorming"
# href: content/lectures/18-brainstorming.html
# - text: "19-wrap-up"
4 changes: 4 additions & 0 deletions content/cs/cs02.qmd
@@ -9,6 +9,10 @@ output:
eval: false
---

::: callout-important
CS02 is not required for the fa23 quarter. Students have the option to complete CS02 in lieu of the typical final project. This will be completed in your **final project groups** and will require use of some outside source of data.
:::

This is your second case study report, so you get to incorporate the general feedback from cs01 and carry out another complete data science project! This report will include your analysis from top (the background and question) to bottom (your analysis, interpretation, and conclusions.)

We'll be grading to see that you have: 1) all necessary code for each section of the project; 2) explanatory text that guides the reader from start to finish; 3) polished visualizations that allow the reader to understand both the data you're working with and your conclusions.
12 changes: 10 additions & 2 deletions content/final/final.qmd
@@ -35,7 +35,7 @@ This presentation must teach the details of the R package, the statistical topic

### Deliverable: Presentation

Students must also present their slides in a presentation that is **10-15min long**. This presentation can either be pre-recorded and submitted on Canvas *or* groups can sign up for a presentation slot to present in-person Thursday of finals week. For this option, **all students must participate in the presentation**. There is no grade difference for those who choose to pre-record or present in person.
Students must also present their slides in a presentation that is **10-15min long**. This presentation will be pre-recorded and submitted on Canvas. For this option, **all students must participate in the presentation**.

### Deliverable: General Communication

@@ -57,12 +57,20 @@ This will likely *not* be quite as long as a case study in this course, but will

### Deliverable: Presentation

Students must present their case study in a presentation that is **3-5min long**. What you use to visually support this presentation (slides, or something else) is up to you but should follow the effective communication aspects discussed in class. This presentation can either be pre-recorded and submitted on Canvas *or* groups can sign up for a presentation slot to present in-person Thursday of finals week. For this option, **at least one group member must present the project** (in other words, not everyone has to "speak" but everyone in the group is responsible for the contents). There is no grade difference for those who choose to pre-record or present in person.
Students must present their case study in a presentation that is **3-5min long**. What you use to visually support this presentation (slides, or something else) is up to you but should follow the effective communication aspects discussed in class. This presentation will be pre-recorded and submitted on Canvas. For this option, **at least one group member must present the project** (in other words, not everyone has to "speak" but everyone in the group is responsible for the contents).

### Deliverable: General Communication

This will be a communication targeted to the general public (non-technical, non-data scientists) conveying the most important finding(s) from your project.

## Option 3: CS02 + Additional Data

Students can choose to carry out CS02 for their final project; however, students will have to find an additional dataset on a related topic (pollution, climate change, etc.) and incorporate that into their final report. See [CS02](https://cogs137.github.io/website/content/cs/cs02.html) documentation for details on the report and general communication deliverables.

### Deliverable: Presentation

Students must also present their project in a presentation that is **10-15min long**. This presentation will be pre-recorded and submitted on Canvas. For this option, **all students must participate in the presentation**.

## Group Feedback

There will be a form to submit upon submission of the final project to provide feedback about working with your group mates. As with the case studies, this is meant to motivate not scare. Most groups work out really really well and everyone contributes to the best of their ability. However, if and when that doesn't happen, I want to be sure I'm aware of the circumstances and follow up as necessary.