markdown source builds
Auto-generated via {sandpaper}
Source  : eb73856
Branch  : main
Author  : Sarah Kaspar <[email protected]>
Time    : 2023-02-07 19:23:14 +0000
Message : Merge pull request #1 from zkamvar/znk-update-cache

Update Package Cache
actions-user committed Feb 7, 2023
1 parent d92ef89 commit 4d269f7
Showing 41 changed files with 1,007 additions and 529 deletions.
15 changes: 8 additions & 7 deletions 01-sampling.md
@@ -1,30 +1,31 @@
---
title: " What is sampling"
teaching: 10
exercises: 2
teaching: 5
exercises: 0
---

:::::::::::::::::::::::::::::::::::::: questions

- How do you write a lesson using R Markdown and `{sandpaper}`?
- What is sampling?
- What requirements should a good sample fulfill?


::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: objectives

- Explain how to use markdown with the new lesson template
- Demonstrate how to include pieces of code, figures, and nested challenge blocks
- Introduce the concept of sampling.

::::::::::::::::::::::::::::::::::::::::::::::::

##



<p align="center">
<img src="/fig/sampling-frogs.png" width="500"/>
</p>

Let's start with an example, and thereby define some terminology. We have a lake with frogs in it, and there are light and dark green frogs. There’s a sunny side of the lake, and a shadowy area by the trees. Now imagine you want to estimate the fraction of light green frogs in the lake. There are too many frogs to count them all, so you catch a few and count how many of them are light coloured. This is a sample. A sample is a set of randomly and independently drawn observations from a population of interest. The population of interest, in this case, is all the frogs in that lake. How can we draw randomly and independently? One obvious thing you could randomize in this experiment is the location at which you catch the frogs, because from the above picture you could get the impression that light-coloured frogs gather more in the shadows, while the dark-green frogs like the sun. Therefore, if we caught all the frogs in the same area, like in sample 1, this would probably overrepresent light frogs, thus not representing the population well. When randomizing the locations, this is less likely to be the case (see for example sample 2). You get similar problems if the observations are not independent. One example of dependent observations would be if you start with one frog, then catch the one right next to it, and so on. This is also likely to overrepresent one colour of frogs, which is why observations shouldn’t depend on each other. The sample size is the number of frogs in one sample. And the distribution is a set of rules that the random frog catches follow.
Let's start with an example, and thereby define some terminology. We have a lake with frogs in it, and there are light and dark green frogs. There’s a sunny side of the lake, and a shadowy area by the trees. Now imagine you want to estimate the fraction of light green frogs in the lake. There are too many frogs to count them all, so you catch a few and count how many of them are light coloured. This is a sample. A **sample** is a set of randomly and independently drawn observations from a **population of interest**. The population of interest, in this case, is all the frogs in that lake. How can we draw **randomly and independently**? One obvious thing you could randomize in this experiment is the location at which you catch the frogs, because from the above picture you could get the impression that light-coloured frogs gather more in the shadows, while the dark-green frogs like the sun. Therefore, if we caught all the frogs in the same area, like in sample 1, this would probably over-represent light frogs, thus not representing the population well. When randomizing the locations, this is less likely to be the case (see for example sample 2). You get similar problems if the observations are not independent. One example of dependent observations would be if you start with one frog, then catch the one right next to it, and so on. This is also likely to over-represent one colour of frogs, which is why observations shouldn’t depend on each other. The **sample size** is the number of frogs in one sample.
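
The effect of where the frogs are caught can be sketched with a small simulation. Everything in it is an assumption made for illustration: the lake size, the split between shade and sun, and the variable names are invented and do not come from the figure.

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible

# Hypothetical lake: 300 frogs, one third of them light green.
# Light frogs are assumed to prefer the shade (made-up proportions).
lake <- data.frame(
  colour = rep(c("light", "dark"), times = c(100, 200)),
  area   = c(rep(c("shade", "sun"), times = c(80, 20)),    # light frogs
             rep(c("shade", "sun"), times = c(40, 160)))   # dark frogs
)

# True fraction of light frogs in the population of interest
true_fraction <- mean(lake$colour == "light")

# Like sample 1: all 10 frogs are caught in the shade
shade_frogs <- lake[lake$area == "shade", ]
sample_1 <- shade_frogs[sample(nrow(shade_frogs), 10), ]

# Like sample 2: 10 frogs are caught at random across the whole lake
sample_2 <- lake[sample(nrow(lake), 10), ]

true_fraction
mean(sample_1$colour == "light")  # tends to over-represent light frogs
mean(sample_2$colour == "light")  # varies around the true fraction
```

In this made-up lake, two thirds of the frogs in the shade are light, so samples taken only there are drawn from a subpopulation that differs from the lake as a whole.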

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: instructor

52 changes: 34 additions & 18 deletions 02-distributions.md
@@ -1,12 +1,12 @@
---
title: "What is a probability distribution?"
teaching: 10
exercises: 2
teaching: 8
exercises: 4
---

:::::::::::::::::::::::::::::::::::::: questions

- What is a probability distribution`?
- What is a probability distribution?

::::::::::::::::::::::::::::::::::::::::::::::::

@@ -19,39 +19,57 @@ exercises: 2

## Overview probability distributions

CONTENT STILL TO COME FROM VIDEO
Most data analyses assume that data comes from some distribution. A probability distribution assigns probabilities to possible outcomes of an experiment. In terms of sampling, even if the sampling is supposed to be random, it doesn't mean that there are no rules -- so one could say that the probability distribution defines the rules for randomness.

![](https://vimeo.com/647705308)
Let's get back to our example of the lake of frogs, which is inhabited by light and dark green frogs. Let's say the true fraction of light green frogs is $1/3$, and you decide to catch a sample of $n=10$ frogs from that lake at random.
Then, if you count the number of light-colored frogs in that sample, there are 11 possible outcomes: the number can be anywhere from 0 to 10.


:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: instructor
The plot below shows these outcomes, each with its probability. A suitable distribution for describing this scenario is the *binomial distribution*, which assumes a fixed number of trials (frog catches), each of which can have one of two outcomes (light, dark).

Inline instructor notes can help inform instructors of timing challenges
associated with the lessons. They appear in the "Instructor View"
<p align="center">
<img src="/fig/sampling-frogs-2.png" width="500"/>
</p>

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: challenge
If the true fraction of light frogs is one third, then the most likely outcome is catching 3 light frogs. Seeing 10 light frogs is rather unlikely: The probability is close to zero, and we would consider this a rare event.
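
As a rough check of these statements, the probabilities of all 11 outcomes can be computed with base R's `dbinom()`. This is a minimal sketch using the values from the example ($n=10$, true fraction $1/3$); the variable names are chosen here for illustration.

```r
n <- 10    # number of frogs caught
p <- 1/3   # true fraction of light green frogs
k <- 0:n   # possible counts of light frogs

# Probability of each possible count under the binomial distribution
probs <- dbinom(k, size = n, prob = p)
names(probs) <- k

round(probs, 4)
sum(probs)            # probabilities over all outcomes add up to 1
k[which.max(probs)]   # the most likely count
probs["10"]           # probability of catching 10 light frogs
```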

<p align="center">
<img src="/fig/sampling-frogs-3.png" width="500"/>
</p>

## Challenge 1: Which of the following statements are true?



::::::::::::::::::::::::::::::::::::: challenge
## Challenge: Which of the following statements are true?

1. A probability distribution assigns probabilities to possible outcomes of an experiment.
2. The probabilities in a statistical distribution sum/integrate up to 1.
3. If the experiments are not randomized, the results don't follow a statistical distribution.


:::::::::::::::::::::::: solution

## Solution

Answers 1 and 2 are correct. Regarding 3: if experiments are not randomized, the results still follow some distribution, but they are unlikely to represent reality well.
::::::::::::::::::::::::::::::::::::
::::::::::::::::::::::::::::::::::::::::::

:::::::::::::::::::::::::::::::::

<p align="center">
<img src="/fig/distributions-1.png" width="800"/>
</p>


## Challenge 2: Discrete distributions
There are two types of probability distributions: discrete and continuous.

A **discrete distribution** is what we have just seen. In this case, the observations can only take integer values, in our example the counts from 0 to 10. In between those values, the probabilities are zero. That’s why the function describing it is called a probability mass function: the probability mass sits on defined points.

In a **continuous distribution**, we have a probability density function. An example is the Gaussian distribution. It would be suitable if we measured the sizes of frogs and they were well described by a mean size and a variance. On a continuous scale, there are infinitely many values, so the probability of any specific value, for example a frog size of exactly 9 cm, is zero.
What makes sense instead is to ask for the probability of an observation to fall into a certain interval, for example between 8 and 10 cm, and we get this probability from integrating over the probability density function.
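
The interval probability can be sketched in R. The mean of 9 cm and standard deviation of 1 cm below are assumptions made only for this illustration; the lesson does not state the parameters of the frog sizes.

```r
frog_mean <- 9   # assumed mean frog size in cm (hypothetical)
frog_sd   <- 1   # assumed standard deviation in cm (hypothetical)

# P(8 <= X <= 10) as a difference of two cumulative probabilities
p_interval <- pnorm(10, mean = frog_mean, sd = frog_sd) -
  pnorm(8, mean = frog_mean, sd = frog_sd)

# The same probability by numerically integrating the density
p_integrated <- integrate(dnorm, lower = 8, upper = 10,
                          mean = frog_mean, sd = frog_sd)$value

p_interval
p_integrated
```

Both routes should agree up to numerical error, since the cumulative distribution function is the integral of the density.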


::::::::::::::::::::::::::::::: challenge
## Challenge: Discrete distributions

What is the probability of an outcome of X=1.5 in a discrete distribution?

@@ -60,9 +78,7 @@ What is the probability of an outcome of X=1.5 in a discrete distribution?
- 0.15

:::::::::::::::::::::::: solution

The value $1.5$ is not discrete, and can therefore not occur in a discrete distribution. Its probability is zero.

:::::::::::::::::::::::::::::::::
::::::::::::::::::::::::::::::::::::::::::::::::

65 changes: 26 additions & 39 deletions 03-binomial.md
@@ -1,68 +1,55 @@
---
title: "The binomial distribution"
teaching: 10
exercises: 2
teaching: 5
exercises: 0
---

:::::::::::::::::::::::::::::::::::::: questions

- What is the binomial distribution and?
- What is the binomial distribution?
- What kind of data is it used on?

::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: objectives


- Explain how the binomial distribution describes outcomes of counting.

::::::::::::::::::::::::::::::::::::::::::::::::

## Overview probability distributions

The binomial distribution is what we have just seen in the example: We use it when we have a fixed sample size and count the number of "successes" in that sample -- for example mutations in a genome, or the number of cells within a sample that show a certain phenotype.

TRANSLATE VIDEO


:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: instructor

Inline instructor notes can help inform instructors of timing challenges
associated with the lessons. They appear in the "Instructor View"
The binomial distribution is what we have just seen in the example: We use it when we have a fixed sample size and count the number of "successes" in that sample. Examples are:

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
- how many locations in the genome carry a mutation
- the number of cells within a sample that show a certain phenotype
- how many patients out of 100 have a certain disease
- how many out of 10 frogs are light green

::::::::::::::::::::::::::::::::::::: challenge
<p align="center">
<img src="/fig/sampling-frogs-2.png" width="500"/>
</p>

## Challenge 1: Which of the following statements are true?
The binomial model has two parameters, which means the probabilities for the individual outcomes depend on two things:

- $n$ is the number of trials, or frogs, or patients, and it’s fixed.
- $p$ is the success probability.

We are in a diagnostic laboratory that gets blood samples from incoming hospital patients and tests them for some disease. Which of these experiments can be modeled with a binomial distribution?
Then the probability of observing $k$ successes out of $n$ draws (for example $k=4$ light coloured frogs out of $n=10$) can be described by this formula:

1. Counting the total number of samples that get tested over one day.
2. Counting the number of positive samples out of 50 samples that get tested successively.
3. Measuring all the blood sample's volumes (in mL).
$$P(X=k) = {n\choose k}p^k(1-p)^{n-k}$$

You don't have to remember this piece of math -- it's just to make the point that you can calculate the probability of an event that is modeled with the binomial distribution, if you know the success probability $p$ and the number of trials $n$, i.e. the parameters.
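
To illustrate, the probability can be computed by hand from the binomial coefficient and the two powers (exponent $k$ for successes, $n-k$ for failures) and compared with base R's `dbinom()`. This is a sketch for $k=4$ and $n=10$; the success probability $p=0.3$ is taken from the plot description in this episode, and the variable names are chosen for illustration.

```r
n <- 10    # number of trials (frogs caught)
p <- 0.3   # success probability (a frog is light green)
k <- 4     # number of successes we ask about

# "n choose k" ways to place k successes among n trials,
# each with probability p^k * (1 - p)^(n - k)
by_hand  <- choose(n, k) * p^k * (1 - p)^(n - k)

# The same probability from the built-in binomial mass function
built_in <- dbinom(k, size = n, prob = p)

by_hand
built_in
```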

:::::::::::::::::::::::: solution

## Solution
Counting the number of positive samples out of 50 samples that get tested successively.
::::::::::::::::::: callout
In the binomial model, we simply define one particular outcome as a success, for example a light-coloured frog or a patient with the disease, even though that may not be a favourable outcome.
:::::::::::::::::::::::::::::

:::::::::::::::::::::::::::::::::
Here is what the distribution looks like for a success probability of 0.3 and a sample size of 10.


## Challenge 2: Discrete distributions
<p align="center">
<img src="/fig/binomial.png" width="500"/>
</p>

What is the probability of an outcome of X=1.5 in a discrete distribution?

- 0
- 0.5
- 0.15

:::::::::::::::::::::::: solution

The value $1.5$ is not discrete, and can therefore not occur in a discrete distribution. Its probability is zero.

:::::::::::::::::::::::::::::::::
::::::::::::::::::::::::::::::::::::::::::::::::

The expected value of the binomial is $n \times p$, which is quite intuitive: If we catch 10 frogs and the probability of being light-green is 0.3, then we expect to catch 3 light-green frogs on average.
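
The expected value can be checked in two ways: as a probability-weighted sum over all outcomes, and as the average of many simulated catches. This is a minimal sketch; the seed and the number of simulated samples are arbitrary choices.

```r
n <- 10
p <- 0.3
k <- 0:n

# Expected value as the probability-weighted sum of all outcomes
expected <- sum(k * dbinom(k, size = n, prob = p))
expected   # equals n * p

# Average number of light frogs over many simulated samples of 10 frogs
set.seed(42)   # arbitrary seed for reproducibility
catches <- rbinom(10000, size = n, prob = p)
mean(catches)  # close to n * p, but not exactly equal
```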
45 changes: 20 additions & 25 deletions 04-distributions-R.md
@@ -1,19 +1,19 @@
---
title: "Probability distributions in R"
teaching: 10
exercises: 2
teaching: 5
exercises: 7
---

:::::::::::::::::::::::::::::::::::::: questions

- What is the binomial distribution and?
- What kind of data is it used on?
- How can I calculate probabilities in R?

::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: objectives


- Demonstrate and practice the use of the base R functions for calculating probabilities.
- Explain the concept of cumulative distribution.

::::::::::::::::::::::::::::::::::::::::::::::::

@@ -31,9 +31,15 @@ The first letter specifies if we want to look at the density, probability distri

The arguments depend on the distribution we are looking at, but always include the parameters of that function.

**Calculating probabilities:** Let's use the example where we caught 10 frogs and count how many of them are light-colored.
## Calculating probabilities


Let's use the example where we caught 10 frogs and count how many of them are light-colored.


![](../images/binomial_frogs.png)
<p align="center">
<img src="/fig/sampling-frogs-2.png" width="500"/>
</p>

For known parameters, we can calculate the chance of counting exactly 5 light-colored frogs:

@@ -47,7 +53,7 @@ dbinom(x=5, size=n, prob=p)
[1] 0.1029193
```

We can ask for the probability of catching at most (or at least) 5 light frogs. In this case, we need the cumulative probability distribution starting with `p`:
We can ask for the probability of catching at most 5 light frogs. In this case, we need the cumulative probability distribution, whose function name starts with `p`:


```r
@@ -58,6 +64,8 @@ pbinom(q=5, size=n, prob=p) # at most
[1] 0.952651
```

Similarly, we can ask for the probability of catching more than 5 light frogs:

```r
pbinom(q=5, size=n,prob=p, lower.tail=FALSE) # larger than
```
@@ -66,12 +74,14 @@ pbinom(q=5, size=n,prob=p, lower.tail=FALSE) # larger than
[1] 0.04734899
```



Catching more than 5 light frogs is a rare event.
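
Note that "more than 5" and "at least 5" are different events, because `lower.tail=FALSE` returns $P(X > q)$ and excludes $q$ itself. The sketch below assumes `n = 10` and `p = 0.3`, which are the values consistent with the outputs shown above; the variable names are chosen for illustration.

```r
n <- 10    # assumed sample size, consistent with the outputs above
p <- 0.3   # assumed success probability, consistent with the outputs above

# P(X > 5): lower.tail = FALSE excludes q itself
more_than_5 <- pbinom(q = 5, size = n, prob = p, lower.tail = FALSE)

# P(X >= 5) is the same as P(X > 4), so q is shifted down by one
at_least_5 <- pbinom(q = 4, size = n, prob = p, lower.tail = FALSE)

more_than_5
at_least_5

# The two differ by exactly the probability of catching 5 light frogs
at_least_5 - more_than_5
dbinom(x = 5, size = n, prob = p)
```

Under these assumptions, catching at least 5 light frogs is noticeably more likely than catching more than 5, so the choice of `q` matters when an event is described in words.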


::::::::::::::::::::::::::::::::::::: challenge

## Challenge 1: Disease prevalence
## Challenge: Disease prevalence

There is a disease with a known prevalence of 4%. You have a group of 100 randomly selected persons. Use the above functions to calculate

@@ -81,8 +91,6 @@ There is a disease with a known prevalence of 4%. You have a group of 100 random

:::::::::::::::::::::::: solution

## Solution

1. Exactly 7 persons:

```r
@@ -105,20 +113,7 @@ pbinom(q=6, size=100, prob=0.04, lower.tail=FALSE)


:::::::::::::::::::::::::::::::::
:::::::::::::::::::::::::::::::::::


## Challenge 2: Discrete distributions

What is the probability of an outcome of X=1.5 in a discrete distribution?

- 0
- 0.5
- 0.15

:::::::::::::::::::::::: solution

The value $1.5$ is not discrete, and can therefore not occur in a discrete distribution. Its probability is zero.

:::::::::::::::::::::::::::::::::
::::::::::::::::::::::::::::::::::::::::::::::::
