Chapter2.Rmd

---
title: "Statistical Rethinking - Chapter 2"
author: "Lucy Liu"
output:
  html_document:
    df_print: paged
---

```{r, echo=FALSE, message=FALSE}
library(knitr)
```

Questions from 'Statistical Rethinking' by Richard McElreath, Chapter 2.

# Easy

1. 2&4 - $Pr(rain|Monday)$ & $Pr(rain,Monday) / Pr(Monday)$
2. 3 - The probability that it is Monday, given that it is raining. 
3. 1&4 - $Pr(Monday|rain)$ & $Pr(rain|Monday) Pr(Monday) / Pr(rain)$
4. We are inferring the % of water on the globe and relating this to the probability of a finger landing on water after tossing the globe. This probability has not objective reality as it either ends up on water or land, it cannot end up on 0.7 water.

# Medium
## Question 1
Calculate posterior distributions:
a) WWW 
```{r}
#define grid
pGrid <- seq(from = 0, to = 1, length.out = 20)
#sequence of 20 evenly spaced numbers from 0 to 1

#define prior
prior <- rep(1,20)
#number 1 x 20 times

# compute likelihood 
likelihood <- dbinom(3, size = 3, prob = pGrid)

# likelihood x prior
unstdPosterior <- likelihood*prior

#standardise
post <- unstdPosterior/sum(unstdPosterior)

plot(post)
```

b) WWWL
```{r}
#define grid
pGrid <- seq(from = 0, to = 1, length.out = 20)
#sequence of 20 evenly spaced numbers from 0 to 1

#define prior
prior <- rep(1,20)
#number 1 x 20 times

#compute likelihood 
likelihood <- dbinom(3, size = 4, prob = pGrid)

#likelihood x prior
unstdPosterior <- likelihood*prior

#standardise
post <- unstdPosterior/sum(unstdPosterior)

plot(post)
```

c) LWWLWWW
```{r}
#define grid
pGrid <- seq(from = 0, to = 1, length.out = 20)
#sequence of 20 evenly spaced numbers from 0 to 1

#define prior
prior <- rep(1,20)
#number 1 x 20 times

#compute likelihood
likelihood <- dbinom(5, size = 7, prob = pGrid)

#likelihood x prior
unstdPosterior <- likelihood*prior

#standardise
post <- unstdPosterior/sum(unstdPosterior)

plot(post)
```

## Question 2
Prior at <0.5 is 0 and >=0 is 1
a) WWW 
```{r}
#define grid
pGrid <- seq(from = 0, to = 1, length.out = 20)
#sequence of 20 evenly spaced numbers from 0 to 1

#define prior
prior <- ifelse(pGrid<0.5,0,1)
#number 1 x 20 times

# compute likelihood 
likelihood <- dbinom(3, size = 3, prob = pGrid)

# likelihood x prior
unstdPosterior <- likelihood*prior

#standardise
post <- unstdPosterior/sum(unstdPosterior)

plot(post)
```

b) WWWL
```{r}
#define grid
pGrid <- seq(from = 0, to = 1, length.out = 20)
#sequence of 20 evenly spaced numbers from 0 to 1

#use above prior

#compute likelihood 
likelihood <- dbinom(3, size = 4, prob = pGrid)

#likelihood x prior
unstdPosterior <- likelihood*prior

#standardise
post <- unstdPosterior/sum(unstdPosterior)

plot(post)
```

c) LWWLWWW
```{r}
#define grid
pGrid <- seq(from = 0, to = 1, length.out = 20)
#sequence of 20 evenly spaced numbers from 0 to 1

#use above prior

#compute likelihood
likelihood <- dbinom(5, size = 7, prob = pGrid)

#likelihood x prior
unstdPosterior <- likelihood*prior

#standardise
post <- unstdPosterior/sum(unstdPosterior)

plot(post)
```

## Question 3
Prove $Pr(Earth|land)=0.23$

$Pr(Earth|land)=Pr(land|Earth)Pr(Earth)/Pr(land)$

$Pr(Earth|land)=(0.30*0.5)/0.65=0.23$

Note that: $Pr(land)=(0.5*0.3)+(0.5*1)$ or $Pr(Earth,land)+Pr(Mars,land)$ 

## Question 4

* Card 1: 2
* Card 2: 1
* Card 3: 0

$Pr(card1|data)=2/3$

## Question 5

* Card 1: 2
* Card 2: 1
* Card 3: 0
* Card 4: 2

$Pr(card1\cup card4 |data)=4/5$

## Question 6

* Card 1: black side-2, weight-1 = 2
* Card 2: black side-1, weight-2 = 2
* Card 3: black side-0, weight-3 = 0

$Pr(card1|data) = 2/4=0.5$

Using likelihood

## Question 7

* BB:2
  +BW:1 - 2
  +WW:2 - 4
* BW:1
  +WW:2 - 2
  +BB:0 - 0

$Pr(BB) = 4+2/8=0.75$

![](images/chap2-1.jpg)

Using priors to calculate this:

```{r}
##ORDER:
# BB,BW,WW

#Data 1: card 1 is black side up
#Prior = all cards equally likely to be drawn. 
prior <- c(1,1,1)
#likelihood of each card, given data
likelihood <- c(2,1,0)

unstdPosterior <- prior * likelihood

posterior <- unstdPosterior/sum(unstdPosterior)

#Data 2: card 2 is white side up
prior <- posterior

likelihood <- c(3,2,1)

unstdPosterior <- prior * likelihood

posterior <- unstdPosterior/sum(unstdPosterior)

posterior
```

# Hard
## Question 1

```{r}
#Both species equally likely
prior <- c(0.5,0.5)
#Given data (twins), plausibility of species A and species B
likelihood <- c(0.1,0.2)

unstdPosterior <- prior * likelihood

posterior <- unstdPosterior/sum(unstdPosterior)

#Likelihood of next twins?

posterior[1]*0.1 + posterior[2]*0.2
```

Alternative way of computing:
Think of the species A prior as the plausibility of species A given the 1st bit of data, twins:
$$Pr(A|twins)=\frac{Pr(twins|A)Pr(A)}{Pr(twins)}$$

$$Pr(A|twins)=\frac{0.1*0.5}{0.15}=0.33$$

The species B prior is the plausibility of species B given the 1st bit of data, twins:
$$Pr(B|twins)=\frac{Pr(twins|B)Pr(B)}{Pr(twins)}$$

$$Pr(A|twins)=\frac{0.2*0.5}{0.15}=0.66$$

The likelihood that the next birth will also be twins, using the law of total probability, is:

$$Pr(A|twins)*Pr(twins|A)+Pr(B|twins)*Pr(twins|B) \\= 0.33*0.1+0.66*0.2=0.166$$

From [link](http://www.rpubs.com/andersgs/my_solutions_chapter2_statrethink)
The result of 0.167 makes sense. If the species was species B, then the probability of twins would be 0.2, if species A, it would be 0.1. As we don’t know the species, it makes sense that the probability should be something in between those two values. Where exactly, will depend on how plausible species A and B are. Without any additional information, we see that that the probability of the twins is 0.15, or exactly in the middle. When observing a single twin birth, we should update the relative plausibility of each species accordingly. In doing so, the probability of observing twins in the next birth moves a little towards 0.2, reflecting that now species B is more plausible than species A.

## Question 2
$Pr(speciesA|twins) = Pr(twins|speciesA)Pr(speciesA)/Pr(twins)$

$Pr(speciesA|twins) = (0.1*0.5)/(0.5*0.1+0.5*0.2)=0.33$

## Question 3

```{r}
#prior = Pr(A|twins), Pr(B|twins)
prior <- c(0.33,0.66)
#Pr(A|single)
#(0.9*0.5)/((0.9+0.8)/2)
#Pr(B|single)
#(0.8*0.5)/((0.9+0.8)/2)
likelihood <- c(0.53,0.47)

unstdPosterior <- prior * likelihood

posterior <- unstdPosterior/sum(unstdPosterior)
posterior
```

You can also calculate:
$$Pr(speciesA|single) = Pr(single|speciesA)Pr(speciesA)/Pr(single)$$
Note however, that $Pr(speciesA)$ should be the updated one, given the 1st twins birth. We calculated this above:
$$Pr(speciesA|single) = (0.9*0.33)/(0.9*0.33)/(0.33*0.9+0.66*0.8) = 0.36$$

## Question 4
Prior = both species equally plausible.
Information about the test:
```{r}
df <- data.frame(ActualA=c(0.8,0.35), ActualB=c(0.2,0.65))
row.names(df) <- c("TestedA","TestedB")
kable(df)
```

Using only test result:
```{r}
prior <- c(0.5,0.5)
likelihood <- c(0.8,0.35)

unstdPosterior <- prior * likelihood

posterior <- unstdPosterior/sum(unstdPosterior)
posterior
```

Using the birthing data (twins then singleton):

```{r}
#use test data above as new prior
prior <- posterior

#calcualted above
likelihood <- c(0.3605442,0.6394558)

unstdPosterior <- prior * likelihood

posterior <- unstdPosterior/sum(unstdPosterior)
posterior
```