---
title: "Statistical Rethinking - Chapter 3"
author: "Lucy Liu"
output:
  html_document:
    df_print: paged
---

```{r, echo=FALSE, message=FALSE}
library(knitr)
```

Questions from 'Statistical Rethinking' by Richard McElreath, Chapter 3.

# Easy

```{r}
# code to generate samples from the posterior distribution
p_grid <- seq( from=0 , to=1 , length.out=1000 )
prior <- rep( 1 , 1000 )
likelihood <- dbinom( 6 , size=9 , prob=p_grid )
posterior <- likelihood * prior
posterior <- posterior / sum(posterior)
set.seed(100)
samples <- sample( p_grid , prob=posterior , size=1e4 , replace=TRUE )

# load rethinking library
library(rethinking)
```
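
As a quick sanity check on the grid approximation: with a flat prior and 6 successes in 9 trials, the posterior is proportional to p^6 (1 - p)^3, which is a Beta(7, 4) density. The sketch below rebuilds the grid under new names (`grid_b`, `post_b` and `beta_b` are illustrative and not part of the original solutions) and compares the two after normalising both over the grid.

```{r}
# Grid posterior for 6 water in 9 tosses with a flat prior
grid_b <- seq(from = 0, to = 1, length.out = 1000)
post_b <- dbinom(6, size = 9, prob = grid_b)
post_b <- post_b / sum(post_b)

# Analytic counterpart: Beta(7, 4), normalised over the same grid
beta_b <- dbeta(grid_b, shape1 = 7, shape2 = 4)
beta_b <- beta_b / sum(beta_b)

# Largest pointwise difference; should be at floating-point level
max(abs(post_b - beta_b))
```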

First visualise samples using a histogram:
```{r}
hist(samples)
```

Questions:

1. Below 0.2
```{r}
sum(samples < 0.2) / 1e4
```

2. Above 0.8
```{r}
sum(samples > 0.8) / 1e4
```

3. Between 0.2 & 0.8
```{r}
sum(samples > 0.2 & samples < 0.8) / 1e4
```

4. Value of p below which 20% of the posterior probability lies
```{r}
quantile(samples, 0.2)
```

5. Value of p above which 20% of the posterior probability lies
```{r}
quantile(samples, 0.8)
```

6. Narrowest interval containing 66% of the posterior probability (HPDI)
```{r}
# note this function is from the package rethinking
HPDI(samples, prob = 0.66)
```

7. 66% percentile interval (PI), with equal posterior probability in each tail
```{r}
PI(samples, prob = 0.66)
```
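
The sample-based answers above can be cross-checked against the grid posterior itself, with no sampling noise. This is a minimal sketch, assuming the same 6-in-9 data and flat prior; the names `grid_e`, `post_e`, `cdf_e` and `easy_chk` are illustrative only.

```{r}
# Rebuild the grid posterior under separate names
grid_e <- seq(from = 0, to = 1, length.out = 1000)
post_e <- dbinom(6, size = 9, prob = grid_e)
post_e <- post_e / sum(post_e)

# Posterior mass in each region: sum the grid probabilities directly
below_02 <- sum(post_e[grid_e < 0.2])
above_08 <- sum(post_e[grid_e > 0.8])
between  <- sum(post_e[grid_e > 0.2 & grid_e < 0.8])

# Quantiles: first grid value at which the cumulative mass reaches the target
cdf_e <- cumsum(post_e)
q20 <- grid_e[which(cdf_e >= 0.2)[1]]
q80 <- grid_e[which(cdf_e >= 0.8)[1]]

easy_chk <- c(below_02 = below_02, above_08 = above_08,
              between = between, q20 = q20, q80 = q80)
easy_chk
```

The values should agree with the sample-based ones to roughly two decimal places; larger differences would point to a mistake rather than to Monte Carlo error.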

# Medium

1. Grid-approximate posterior for 8 water in 15 tosses, using a flat prior
```{r}
p_grid <- seq( from=0 , to=1 , length.out=1000 )
prior <- rep( 1 , 1000 )
likelihood <- dbinom( 8 , size=15 , prob=p_grid )
posterior <- likelihood * prior
posterior <- posterior / sum(posterior)
```

2. Draw 10,000 samples from the posterior and compute the 90% HPDI.
```{r}
set.seed(100)
samples <- sample( p_grid , prob=posterior , size=1e4 , replace=TRUE )
```

```{r}
HPDI(samples, prob = 0.9)
```

3. Posterior predictive check.
```{r}
w <- rbinom(1e4, size = 15, prob = samples)
hist(w)
```

Probability of observing 8 water:
```{r}
sum(w == 8)/1e4
```


4. Using the posterior distribution from the 8/15 data, calculate the probability of observing 6 water in 9 tosses.

```{r}
w <- rbinom(1e4, size = 9, prob = samples)
```

```{r}
sum(w == 6)/1e4
```

5. New prior: zero for p < 0.5 and constant for p >= 0.5.

```{r}
p_grid <- seq( from=0 , to=1 , length.out=1000 )
prior <- ifelse(p_grid<0.5, 0 , 1)
```

  + Construct posterior distribution using grid approximation
```{r}
likelihood <- dbinom( 8 , size=15 , prob=p_grid )
posterior <- likelihood * prior
posterior <- posterior / sum(posterior)
```

  + Draw 1e4 samples and compute the 90% HPDI. Note that the lowest p is now 0.5, because the new prior is zero below that value.
```{r}
set.seed(100)
samples <- sample( p_grid , prob=posterior , size=1e4 , replace=TRUE )
hist(samples)
```

```{r}
HPDI(samples, prob = 0.9)
```

  + Construct a posterior predictive check. What is the probability of observing 8 water in 15 tosses?
```{r}
w <- rbinom(1e4, size = 15, prob = samples)
sum(w == 8) / 1e4
```

```{r}
hist(w)
```

  + Use posterior distribution to calculate probability of 6/9.
```{r}
w <- rbinom(1e4, size = 9, prob = samples)
sum(w == 6) / 1e4
```
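
The posterior predictive probabilities in this section can also be computed without simulation, by averaging the binomial probability over the grid posterior. The sketch below does this for both priors; `normalise`, `pred_prob` and the `_m` names are hypothetical helpers introduced here, not part of the original solutions.

```{r}
grid_m <- seq(from = 0, to = 1, length.out = 1000)
lik_m  <- dbinom(8, size = 15, prob = grid_m)

# Hypothetical helper: rescale a vector so it sums to 1
normalise <- function(x) x / sum(x)

post_flat <- normalise(lik_m * rep(1, 1000))                 # flat prior
post_step <- normalise(lik_m * ifelse(grid_m < 0.5, 0, 1))   # zero below 0.5

# P(k successes in n trials), averaged over the posterior on the grid
pred_prob <- function(k, n, post) {
  sum(dbinom(k, size = n, prob = grid_m) * post)
}

pred_m <- c(
  flat_8_of_15 = pred_prob(8, 15, post_flat),
  step_8_of_15 = pred_prob(8, 15, post_step),
  flat_6_of_9  = pred_prob(6, 9, post_flat),
  step_6_of_9  = pred_prob(6, 9, post_step)
)
round(pred_m, 3)
```

These values should sit close to the simulated proportions above, which makes them a convenient check on the `rbinom` results.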

# Hard
Load data.

```{r}
data(homeworkch3)
# sanity check: total number of boys across first and second births
sum(birth1) + sum(birth2)
```

## Question 1
Compute the posterior distribution. The probability of a birth being a boy ranges from 0 to 1, and each birth has only two possible outcomes (boy or girl), so the number of boys is modelled with a binomial distribution.
```{r}
# total births
total <- length(birth1) + length(birth2)

# total boy births
boys <- sum(birth1) + sum(birth2)

p_grid <- seq( from=0 , to=1 , length.out=1000 )
# uniform prior
prior <- rep( 1 , 1000 )

likelihood <- dbinom(boys , size=total , prob=p_grid )
posterior <- likelihood * prior
posterior <- posterior / sum(posterior)

plot(p_grid, posterior, type = "l")
```

```{r}
p_grid[ which.max(posterior)]
```
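
With a flat prior, the posterior mode has a closed form: the observed proportion of boys. The sketch below compares it with the grid maximum, assuming the counts used elsewhere in this document (111 boys in 200 births); the `_chk` and `_h` names are illustrative.

```{r}
boys_chk  <- 111   # assumed count of boys, as referenced in this document
total_chk <- 200   # assumed total number of births

grid_h <- seq(from = 0, to = 1, length.out = 1000)
post_h <- dbinom(boys_chk, size = total_chk, prob = grid_h)
post_h <- post_h / sum(post_h)

map_grid  <- grid_h[which.max(post_h)]   # mode found on the grid
map_exact <- boys_chk / total_chk        # analytic mode under a flat prior

c(grid = map_grid, exact = map_exact)
```

The two can differ by at most one grid step (1/999 here), since the grid maximum is always one of the grid points adjacent to the true mode.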

## Question 2
Draw 1e4 samples from the posterior distribution.
```{r}
set.seed(100)
samples <- sample(p_grid , prob=posterior , size=1e4 , replace=TRUE )
```

Estimate the 50%, 89% and 97% HPDIs.
```{r}
HPDI(samples, prob = 0.5)
HPDI(samples, prob = 0.89)
HPDI(samples, prob = 0.97)
```
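
To see what a highest-density interval means, it can be built directly from the grid: keep the grid points with the highest posterior probability until their total mass reaches the target, then report their range. This is a sketch of that idea for a unimodal posterior, not the algorithm that `rethinking::HPDI` runs on samples; `grid_hpdi` is a hypothetical helper and the counts (111 boys in 200 births) are assumed from this document.

```{r}
# Hypothetical helper: highest-density interval computed on a grid.
# Assumes a unimodal posterior, so the retained points are contiguous.
grid_hpdi <- function(grid, post, prob) {
  ord <- order(post, decreasing = TRUE)          # densest grid points first
  n_keep <- which(cumsum(post[ord]) >= prob)[1]  # smallest set reaching prob
  range(grid[ord[seq_len(n_keep)]])
}

grid_q <- seq(from = 0, to = 1, length.out = 1000)
post_q <- dbinom(111, size = 200, prob = grid_q)
post_q <- post_q / sum(post_q)

probs_q <- c("50%" = 0.5, "89%" = 0.89, "97%" = 0.97)
hpdi_q <- sapply(probs_q, function(pr) grid_hpdi(grid_q, post_q, pr))
hpdi_q   # row 1: lower bounds, row 2: upper bounds
```

The intervals are nested by construction, and their bounds should be close to the sample-based `HPDI()` output above.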

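## Question 3
Simulate 10,000 replicates of 200 births using the posterior samples, and compare the simulated counts of boys with the observed count.
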
```{r}
w <- rbinom(1e4, size = 200, prob = samples)
hist(w)
```

The observed count of 111 boys falls within the most likely bin of the histogram.
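
Histogram bins depend on the chosen breaks, so a complementary check is to compute the posterior predictive distribution over all possible counts and look at its mode. The sketch below assumes 111 boys in 200 births, as in this document; the `_p` names are illustrative.

```{r}
grid_p <- seq(from = 0, to = 1, length.out = 1000)
post_p <- dbinom(111, size = 200, prob = grid_p)
post_p <- post_p / sum(post_p)

# Predictive probability of each count 0..200, averaged over the posterior
counts_p <- 0:200
pmf_p <- sapply(counts_p, function(k) {
  sum(dbinom(k, size = 200, prob = grid_p) * post_p)
})

mode_p <- counts_p[which.max(pmf_p)]   # most probable simulated count
mode_p
```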

## Question 4
Unsure exactly what was meant by this question. I have simulated the number of boys in 100 births using the samples from the posterior distribution.

```{r}
w <- rbinom(1e4, size = 100, prob = samples)
hist(w)
```

```{r}
sum(birth1)
```

The observed 51 first-born boys sits towards the lower edge of the histogram peak.
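
One way to put a number on "towards the edge" is the posterior predictive probability of seeing 51 or fewer boys in 100 births. The sketch below computes it on the grid with `pbinom`, assuming the posterior from 111 boys in 200 births; the `_t` names are illustrative.

```{r}
grid_t <- seq(from = 0, to = 1, length.out = 1000)
post_t <- dbinom(111, size = 200, prob = grid_t)
post_t <- post_t / sum(post_t)

# P(boys <= 51 in 100 births), averaged over the posterior
tail_t <- sum(pbinom(51, size = 100, prob = grid_t) * post_t)
tail_t
```

A value near 0.5 would mean the observation is typical of the model; a value near 0 would put it far in the lower tail.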

## Question 5
Count the number of first-borns who were girls.
```{r}
girl1 <- sum(birth1 == 0)
```

```{r}
# simulate second births that follow a first-born girl (girl1 = 49 such births)
w <- rbinom(1e4, size = girl1, prob = samples)
hist(w)
```

Number of second-born boys whose first-born sibling was a girl.
```{r}
sum(birth2[birth1 == 0])
```

This is far in the upper tail of the simulated distribution. It suggests that births are NOT independent: a second-born boy following a first-born girl is more common in the data than the model predicts.
```{r}
sum(birth1)
sum(birth2)
```

Boys appear to be more likely on the second birth?

Number of boy first births.
```{r}
sum(birth1)
```

Boy then boy.
```{r}
sum(birth2[birth1 == 1])
```
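
To make the dependence between first and second births explicit, one option is to compare the proportion of second-born boys by the sex of the first-born. The sketch below uses a hypothetical helper, `second_boy_rate`, and a small made-up toy dataset purely to show the mechanics; the toy vectors are not the homework data.

```{r}
# Hypothetical helper: proportion of second-born boys, split by the
# sex of the first-born (0 = girl, 1 = boy)
second_boy_rate <- function(first, second) {
  tapply(second, first, mean)
}

# Toy data for illustration only (NOT the homeworkch3 data)
first_toy  <- c(0, 0, 0, 1, 1, 1)
second_toy <- c(1, 1, 0, 0, 1, 0)

rates_toy <- second_boy_rate(first_toy, second_toy)
rates_toy
```

With the real data loaded, `second_boy_rate(birth1, birth2)` gives the two conditional proportions side by side; under independence they should be similar.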