forked from microbiome/course_2021_radboud
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path06-alpha.Rmd
191 lines (138 loc) · 6.44 KB
/
06-alpha.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
```{r, echo=FALSE}
library(mia)
library(miaViz)
library(dplyr)
se <- readRDS("data/se.rds")
tse <- readRDS("data/tse.rds")
```
# Alpha diversity
This section demonstrates the analysis of alpha diversity. This
quantity measures microbial diversity within each sample. Higher
numbers of unique taxa, and more even abundance distributions within a
sample yield larger values for alpha diversity.
Alpha diversity is a key quantity in a microbiome research. The [_mia_
package](https://microbiome.github.io/mia/) provides access to a wide
variety of alpha diversity indices. For more background information
and examples with various alpha diversity indices, see the [online
book](https://microbiome.github.io/OMA/microbiome-diversity.html#alpha-diversity).
Let us show how to calculate to different diversity indices, Shannon
and Faith. Shannon index reflects how many different taxa there are
and how evenly they are distributed within a sample. Faith index
additionally takes into account the phylogenetic relations into
account.
```{r}
# Indices to be calculated.
# Every index is calculated by default if we don't specify indices.
indices <- c("shannon", "faith")
# Indices are stored in colData (i.e., sample metadata). We can specify the name
# of column, or we can use the default name which is the name of index
# (i.e., "shannon" and "faith").
names <- c("Shannon_diversity", "Faith_diversity")
# Calculates indices
tse <- estimateDiversity(tse, index = indices, name = names)
# Shows the calculated indices
knitr::kable(head(colData(tse)[names])) %>%
kableExtra::kable_styling("striped",
latex_options="scale_down") %>%
kableExtra::scroll_box(width = "100%")
```
Next we can visualize Shannon index with histogram.
```{r}
# ggplot needs data.frame format as input.
# Here, colData is DataFrame, therefore it needs to be converted to data.frame
shannon_hist <- ggplot(as.data.frame(colData(tse)),
aes(x = Shannon_diversity)) +
geom_histogram(bins = 20, fill = "gray", color = "black") +
labs(x = "Shannon index", y = "Sample frequency")
shannon_hist
```
```{r, echo=FALSE}
# # Same thing but done differently
# # Creates histogram. With "break", number of bins can be specified. However, the
# # value is taken as a suggestion, because hist() uses pretty() to calculate breakpoints.
# hist(colData(tse)$Shannon_diversity, col = "green", breaks = 20,
# xlab = "Shannon index",
# ylab = "Sample frequency",
# main = "Histogram of Shannon index")
```
Next, let us compare the indices based on a scatter-plot.
```{r}
cross_plot <- ggplot2::ggplot(as.data.frame(colData(tse)),
aes(x = Shannon_diversity, y = Faith_diversity)) +
geom_point() + # Adds points
geom_smooth(method=lm) + # Adds regression line
labs(x = "Shannon index", y = "Faith diversity")
cross_plot
```
```{r, echo=FALSE}
# # Does the same thing but differently
# plot(colData(tse)$Shannon_diversity, colData(tse)$Faith_diversity,
# xlab = "Shannon index",
# ylab = "Faith diversity index",
# main = "plot()") +
# # Adds regression line
# abline(lm(colData(tse)$Faith_diversity ~ colData(tse)$Shannon_diversity))
```
## Visualization
Next let us compare indices between different patient status and
cohorts. Boxplot is suitable for that purpose.
```{r}
# Creates Shannon boxplot
shannon_box <- ggplot(as.data.frame(colData(tse)),
aes(x = patient_status,
y = Shannon_diversity,
fill = cohort)) +
geom_boxplot() +
theme(title = element_text(size = 12)) # makes titles smaller
# Creates Faith boxplot
faith_box <- ggplot(as.data.frame(colData(tse)), aes(x = patient_status,
y = Faith_diversity,
fill = cohort)) +
geom_boxplot() +
theme(title = element_text(size = 12)) # makes titles smaller
# Puts them into same picture
gridExtra::grid.arrange(shannon_box, faith_box, nrow = 2)
```
For an alternative visualization, see examples with [scater::plotColData](https://microbiome.github.io/OMA/microbiome-diversity.html#alpha-diversity).
## Statistical testing and comparisons
To further investigate if patient status could explain the variation
of Shannon index, let's do a Wilcoxon test. This is a non-parametric
test that doesn't make specific assumptions about the distribution,
unlike popular parametric tests, such as the t test, which assumes
normally distributed observations.
Wilcoxon test can be used to estimate whether the differences between
two groups is statistically significant. Here the ADHD and control
groups are not significantly different between groups (p-value is over
0.05).
```{r}
# Wilcoxon test, where Shannon index is the variable that we are comparing.
# Patient status - ADHD or control - is the factor that we use for grouping.
wilcoxon_shannon <- wilcox.test(Shannon_diversity ~ patient_status, data = colData(tse))
wilcoxon_shannon
```
Another test that we can make is to test if ADHD samples differs between different
cohorts. From boxplot that we made in previous step, we can see that there might
be statistically significant difference between different cohorts.
Let's compare Shannon index of ADHD samples between cohort 2 and cohort 3.
As we can see, there is statistically significant difference between the cohorts.
```{r}
# Takes subset of colData. Takes only ADHD samples
ADHD_shannon <- colData(tse)[ colData(tse)[, "patient_status"] == "ADHD" , ]
# Takes subset of colData. Takes only samples that are in cohort 2 or cohort 3.
ADHD_shannon <- ADHD_shannon[ ADHD_shannon[, "cohort"] %in% c("Cohort_2", "Cohort_3") , ]
# Wilcoxon test, where Shannon index is the variable that we are comparing.
# Cohort - 2 or 3 - is the factor that we use for grouping.
wilcoxon_shannon_ADHD_cohorts <- wilcox.test(Shannon_diversity ~ cohort, data = ADHD_shannon)
wilcoxon_shannon_ADHD_cohorts
```
For more examples, see a dedicated section on alpha diversity in the
[online book](https://microbiome.github.io/OMA/microbiome-diversity.html#alpha-diversity).
## Exercises
Add the following in the reproducible summary report.
* Estimate alpha diversity for each sample and draw a histogram. Tip:
estimateDiversity
* Compare the results between two or more
alpha diversity indices (visually and/or statistically).
* See [online book](https://microbiome.github.io/OMA/microbiome-diversity.html#alpha-diversity)
for further examples.
* Example [Solutions](07-3-ex-sol-ADHD.html)