Misc.Rmd

---
title: "Miscellaneous analyses"
author: "Florian Huber, Leonard Dubois"
date: "`r Sys.Date()`"
output:
  html_document:
    fig_caption: yes
    fig_height: 6
    fig_width: 6
    number_sections: yes
    toc: yes
    toc_depth: 2
    toc_float: yes
editor_options: 
  chunk_output_type: console
---

# Setup

```{r setup, message = FALSE}
source("./setup.R")
```

# Miscellaneous analyses

Chunks in this notebook are (should be) self-contained.

## Chemical similarities

```{r}
m <- readRDS("./data/programmatic_output/m_all.rds")
all_inchi <- read_tsv("./data/all_inchi.tsv")

# read fingerprints
(drugs_RDkit_fingerprint = read_csv2(file = "./data/RDkit_fingerprint.csv"))
(drugs_RDkit_fingerprint = left_join(all_inchi, drugs_RDkit_fingerprint, 
  by = "data_stdinchi"))
colnames(drugs_RDkit_fingerprint) = c("drugname_typaslab", "data_stdinchi", 
  "RDkit_fingerprint")
drugs_RDkit_fingerprint <- semi_join(drugs_RDkit_fingerprint, m, 
  by = "drugname_typaslab")
# could also get morgan and MACCS fingerprints, see ./data
```

Plot dendrogram of similarities. 

```{r}
(cluster_matrix <- as_tibble(select(drugs_RDkit_fingerprint, -data_stdinchi)))
cluster_matrix <- cluster_matrix[complete.cases(cluster_matrix), ]
row.names(cluster_matrix) <- cluster_matrix$drugname_typaslab
cluster_matrix$drugname_typaslab <- NULL
process_lut <- structure(m$process_broad, names = m$drugname_typaslab)

cluster_matrix <- as.matrix(cluster_matrix)
cluster_matrix_splitbits <- t(apply(cluster_matrix, 1, function(x) {
   as.numeric(unlist(strsplit(x["RDkit_fingerprint"], split = "")))
}))
cluster_matrix_splitbits_dist <- dist(cluster_matrix_splitbits, 
  method = "binary")

rowcols <- moa_cols[process_lut[rownames(cluster_matrix)]]

pdf("./plots/drug_drug_similarities_RDKit.pdf", width = 14, height = 14)
heatmap.2(cluster_matrix_splitbits, 
          trace = "none", 
          Rowv = as.dendrogram(hclust(cluster_matrix_splitbits_dist)), 
          RowSideColors = rowcols, 
          margins = c(12, 9), 
          dendrogram = "row", 
          labCol = "", 
          breaks = c(0, 0.5, 1), 
          col = rev(heat.colors(n = 2)), 
          main = "Binary distances of drugs (RDKit fingerprint)")
legend("topright", 
       legend = names(moa_cols), 
       col = moa_cols, 
       lty = 1, 
       lwd = 5, 
       cex = 0.8)
dev.off()
```


## Further investigations on chemical features

First thing to note is that there is a considerable correlation between the 
chemical featues we use for the drugs in our data set. High correlation between 
features is problematic for a number of reasons: first, they indicate redundant 
information. Second, it can lead to model instability (cf. pages 12, 46 in 
Kuhn & Johnson, 2013). 

To show how strongly the chemical features are correlated:

```{r, eval = FALSE}
tmp <- readRDS("./data/programmatic_output/m_one_chem.rds")
tmp_m <- select(tmp, SlogP:data_effective_rotor_count)
tmp_m <- as.matrix(tmp_m)
row.names(tmp_m) <- tmp$drugname_typaslab
tmp_m_cor <- cor(tmp_m)
corrplot::corrplot(tmp_m_cor, method = "color", order = "hclust", tl.cex = 0.7)
```

As can also be seen in the plot "drug_drug_similarities_RDKit.pdf" (notebook 
Misc) some of our chemical features are biased towards distinguishing 
protein_synthesis from the other features. However, number of amide bonds 
favours cell wall, number of aromatic rings might favour DNA. 

```{r, eval = FALSE}
select(tmp, drugname_typaslab, process_broad, (ncol(tmp) - 12):ncol(tmp)) %>%
   filter(process_broad %in% c("cell_wall", "dna", "protein_synthesis", "membrane_stress")) %>%
   gather(-drugname_typaslab, -process_broad, key = "chemical_feature", value = "value") %>%
   ggplot(aes(x = process_broad, y = value, colour = process_broad)) + 
      ggbeeswarm::geom_beeswarm(groupOnX = TRUE, cex = 2, shape = 1) + 
      facet_wrap( ~ chemical_feature, ncol = 5, scales = "free_y") + 
      theme(axis.text.x = element_text(angle = 45, hjust = 1)) + 
      labs(title = "Distribution of chemical features across MoAs")
```

Check which drugs have high number of aliphatic rings and of saturated rings. 
This seems to bias the models using chemical features towards protein synthesis. 

```{r, eval = FALSE}
tmp$drugname_typaslab[tmp$NumAliphaticRings > 2.5]
tmp$drugname_typaslab[tmp$NumSaturatedRings > 2.5]
```


## PCA: first 4 PCs of most interactions, top 10% variance, no chemical features

```{r}
m <- readRDS("./data/programmatic_output/m_all.rds")
m_matrix <- as.matrix(m[, c(4:ncol(m))])
rownames(m_matrix) <- paste0(m$drugname_typaslab, "_", m$conc)

pr.out <- prcomp(m_matrix, scale = TRUE)

# screeplot:
pve <- pr.out$sdev ^ 2
pve <- pve / sum(pve)
names(pve) <- paste0("PC", seq_along(pve))

pdf("./plots/PCA.pdf", width = 8, height = 5)
par(mfrow = c(1, 2))
plot(pve, xlab = "Principal component index", 
  ylab = "Percentage of variance explained", pch = )
plot(cumsum(pve), type = "l", pch = 19, xlab = "Principal component index", 
  ylab = "Cumulative proportion of variance")
dev.off()

prcomps <- as.data.frame(pr.out$x)
prcomps$drug_conc <- row.names(prcomps)
prcomps <- separate(prcomps, col = drug_conc, 
  into = c("drugname_typaslab", "conc"), sep = "_")
prcomps <- left_join(prcomps, unique(m[, c("drugname_typaslab", "process_broad")]))
prcomps <- select(prcomps, drugname_typaslab, process_broad, everything())

my_legend <- scale_colour_manual("Target process", labels = names(moa_cols), 
  values = moa_cols)

# there are some interesting dots in the PC1-PC2 component that I would like to 
# label

p1 <- ggplot(prcomps, aes(x = PC1, y = PC2)) + 
  geom_point(aes(colour = process_broad), alpha = 0.75) + 
  theme_bw() + my_legend + 
  ggtitle("PCA (Nichols, all dosages, all features)")

p2 <- ggplot(prcomps, aes(x = PC1, y = PC3)) + 
  geom_point(aes(colour = process_broad), alpha = 0.75) + 
  theme_bw() + ggtitle("") + 
  my_legend

p3 <- ggplot(prcomps, aes(x = PC2, y = PC3)) + 
  geom_point(aes(colour = process_broad), alpha = 0.75) + 
  theme_bw() + my_legend

p4 <- ggplot(prcomps, aes(x = PC3, y = PC4)) + 
  geom_point(aes(colour = process_broad), alpha = 0.75) + 
  theme_bw() + my_legend

p <- grid.arrange(p1, p2, p3, p4, nrow = 2)
p
ggsave(p, file = "./plots/PCA_plots.pdf", width = 9, height = 6)
```

PCA with fingerprint

```{r}
my_fingerprint_mcl <- readRDS("./data/programmatic_output/fingerprint_mcl.rds")
m <- readRDS("./data/programmatic_output/m_all.rds")

m_matrix <- as.matrix(m[, c(4:ncol(m))])
m_matrix <- m_matrix[, my_fingerprint_mcl]

rownames(m_matrix) <- paste0(m$drugname_typaslab, "_", m$conc)

pr.out <- prcomp(m_matrix, scale = TRUE)

# screeplot:
pve <- pr.out$sdev ^ 2
pve <- pve / sum(pve)
names(pve) <- paste0("PC", seq_along(pve))

pdf("./plots/PCA_fingerprint.pdf", width = 8, height = 5)
par(mfrow = c(1, 2))
plot(pve, xlab = "Principal component index", 
  ylab = "Percentage of variance explained", pch = )
plot(cumsum(pve), type = "l", pch = 19, xlab = "Principal component index", 
  ylab = "Cumulative proportion of variance")
dev.off()

prcomps <- as.data.frame(pr.out$x)
prcomps$drug_conc <- row.names(prcomps)
prcomps <- separate(prcomps, col = drug_conc, 
  into = c("drugname_typaslab", "conc"), sep = "_")
prcomps <- left_join(prcomps, unique(m[, c("drugname_typaslab", "process_broad")]))
prcomps <- select(prcomps, drugname_typaslab, process_broad, everything())

my_legend <- scale_colour_manual("Target process", labels = names(moa_cols), 
  values = moa_cols)

# there are some interesting dots in the PC1-PC2 component that I would like to 
# label

p1 <- ggplot(prcomps, aes(x = PC1, y = PC2)) + 
  geom_point(aes(colour = process_broad), alpha = 0.75) + 
  theme_bw() + my_legend + 
  ggtitle("PCA (Nichols, all dosages, all features)")

p2 <- ggplot(prcomps, aes(x = PC1, y = PC3)) + 
  geom_point(aes(colour = process_broad), alpha = 0.75) + 
  theme_bw() + ggtitle("") + 
  my_legend

p3 <- ggplot(prcomps, aes(x = PC2, y = PC3)) + 
  geom_point(aes(colour = process_broad), alpha = 0.75) + 
  theme_bw() + my_legend

p4 <- ggplot(prcomps, aes(x = PC3, y = PC4)) + 
  geom_point(aes(colour = process_broad), alpha = 0.75) + 
  theme_bw() + my_legend

p <- grid.arrange(p1, p2, p3, p4, nrow = 2)
p
ggsave(p, file = "./plots/PCA_plots_fingerprint.pdf", width = 9, height = 6)
```


## An example of an s-score distribution

```{r}
tmp <- readRDS("./data/programmatic_output/kept_datasets_long_featsndrugs_removed.rds")
tmp <- tmp$nichols_2011
moa <- read_csv("./data/programmatic_output/drug_moa_userfriendly.csv")

tmp <- left_join(tmp, moa[, c("drugname_typaslab", "process_broad")]) %>%
  filter(gene_synonym == "RECA") %>%
  mutate(is_dna = process_broad == "dna") %>%
  arrange(sscore)

ggplot(tmp, aes(x = sscore)) + 
   geom_histogram(aes(fill = is_dna), binwidth = 0.3) + 
   theme_bw() + 
   labs(x = expression(paste("Relative fitness of ", Delta, italic(recA))), 
     y = "Number of conditions", 
     title = expression(paste("Fitness of ", Delta, italic(recA), 
       " across >300 conditions"))) + 
   scale_fill_manual("Compound target:", values = c("#bababa", "#fd8d3c"), 
     labels = c("DNA", "Other"), breaks = c(1, 0)) + 
   theme(text = element_text(size = 16), legend.position = c(0.1, 0.75), 
     legend.justification = c("left", "bottom"), legend.background = 
       element_blank())

ggsave(filename = "./plots/Sscores_recA.pdf", width = 5.5, height = 4.5)
```


## Specificity depending on dosage

Are responses to lower drug dosages less specific than to higher dosages? And 
are reactions to higher dosages qualitatively different? 

If so, then we should be able to observe some things such as: 

1. Overlap between drugs of different mode of action higher at lower dosages than at higher dosages. Check also same MoA. 
2. Interaction set at higher dosage could be (i) different from low dosage or (ii) a superset of the low dosage. 
3. If interactions are more specific at higher dosages, then the overlap with the fingerprint should be higher. Correct for number of interactions in general. 
4. Likewise, specificity could become lower again at higher dosages. 

Instead of looking at the number of interactions, may want to consider strength of s-scores as well? 

```{r}
m <- readRDS("./data/programmatic_output/kept_datasets_long_featsndrugs_removed.rds")
nic <- m$nichols_2011
mics <- read_delim("./data/programmatic_output/MICs.csv", delim = ";")
moa <- read_csv("./data/programmatic_output/drug_moa_userfriendly.csv")

nic <- left_join(nic, mics) %>%
  left_join(moa[, c("drugname_typaslab", "process_broad")])

nic_prc <- filter(nic, !(is.na(mic_curated)), !resistant, 
  process_broad %in% c("cell_wall", "dna", "membrane_stress", "protein_synthesis")) %>%
  dplyr::select(gene_synonym, drugname_typaslab, conc, sscore, qvalue, significant, mic_curated, process_broad)

nunique(nic_prc$drugname_typaslab)

# to get an idea where in the MIC range we're usually located
nic_prc$frc_mic <- nic_prc$conc / nic_prc$mic_curated
nic_prc$log2_frc_mic <- log2(nic_prc$frc_mic)

mic_dist <- dplyr::select(nic_prc, drugname_typaslab, conc, mic_curated, frc_mic, log2_frc_mic) %>%
  distinct()
mic_dist
# but ...
nunique(mic_dist$frc_mic)

ggplot(mic_dist, aes(x = "", y = log2_frc_mic)) + 
  geom_boxplot(outlier.shape = NA, width = 0.5) + 
  geom_point(shape = 1, position = position_jitter(height = 0, width = 0.2))

# cut the space up such that each region contains same number of observations
mic_dist$log2_frc_mic_bin <- cut_number(mic_dist$log2_frc_mic, n = 5)

ggplot(mic_dist, aes(x = "", y = log2_frc_mic, colour = log2_frc_mic_bin)) + 
  geom_boxplot(outlier.shape = NA) + 
  geom_point(shape = 1, position = position_jitterdodge()) + 
  labs(x = "", y = "Fraction of MIC (log2 scale)", 
    title = "Number of conditions falling into a certain MIC interval")

# next, for each group/bin count number of interactions that overlap between 
# pairs of drugs
nic_prc <- left_join(nic_prc, mic_dist)

# for each drug-concentration combination: determine interacting genes, save 
# result in a list column
nic_prc2 <- group_by(nic_prc, drugname_typaslab, conc, mic_curated, 
  process_broad, frc_mic, log2_frc_mic, log2_frc_mic_bin) %>%
  summarise(interactors = list(gene_synonym[significant])) %>%
  ungroup()

# plot number of interactions depending on mic_bin
nic_prc2$n_interactors <- lengths(nic_prc2$interactors)

ggplot(nic_prc2, aes(x = log2_frc_mic_bin, y = n_interactors, colour = log2_frc_mic_bin)) + 
  geom_boxplot(outlier.shape = NA) + 
  geom_point(position = position_jitterdodge(), shape = 1) + 
  labs(title = "Number of interacting genes by condition, grouped by MIC interval")

# now we need some proxy of specificity: the lower the specificity, the more 
# overlap (or less?) I would expect between drugs of different MoA than within 
# the same MoA

# for each interval: generate all possible pairs of distinct drugs and count 
# the size of the overlap
mic_bins <- split(nic_prc2, nic_prc2$log2_frc_mic_bin)

overlap_counter <- function(dfr) {
  combos <- combn(nrow(dfr), m = 2)
  dfr <- map2(combos[1, ], combos[2, ], function(row1, row2) {
    l <- list(row1 = row1, row2 = row2)
    drugA <- dfr$drugname_typaslab[row1]
    drugB <- dfr$drugname_typaslab[row2]
    l$drugA <- drugA
    l$drugB <- drugB
    l$bin <- dfr$log2_frc_mic_bin[row1]
    l$same_moa <- dfr$process_broad[row1] == dfr$process_broad[row2]
    l$same_drug <- drugA == drugB
    l$intrctrs_A <- dfr$interactors[row1]
    l$intrctrs_B <- dfr$interactors[row2]
    l$overlap <- list(dplyr::intersect(l$intrctrs_A[[1]], l$intrctrs_B[[1]]))
    return(l)
  })
  dfr <- bind_rows(dfr)
  return(dfr)
}

res <- map(mic_bins, overlap_counter) %>%
  bind_rows()

res <- filter(res, !same_drug)
res$overlap_len <- lengths(res$overlap)

ggplot(res, aes(x = same_moa, y = overlap_len)) + 
  geom_violin() + 
  geom_boxplot(width = 0.3, outlier.shape = NA, colour = "red") + 
  geom_point(shape = 1, position = position_jitter(width = 0.1), alpha = 0.35) + 
  facet_wrap( ~ bin)

ggsave("./plots/Specificity_check_I.pdf")

ggplot(res[res$same_moa, ], aes(x = bin, y = overlap_len)) + 
  geom_violin() + 
  geom_boxplot(width = 0.3, outlier.shape = NA, colour = "red") + 
  geom_point(shape = 1, position = position_jitter(width = 0.1), alpha = 0.35) + 

ggsave("./plots/Specificity_check_II.pdf")

# need to check shape of distributions
ggplot(res, aes(x = overlap_len, fill = same_moa)) + 
  geom_histogram(aes(y = ..density..), position = "identity", alpha = 0.5, bins = 30) + 
  facet_wrap( ~ bin) + 
  scale_x_continuous(limits = c(0, 30))

ggsave("./plots/Specificity_check_III.pdf")

by(data = res, INDICES = res$bin, FUN = function(x) {
  list(median_same_moa = median(x$overlap_len[x$same_moa]), 
    median_difft_moa = median(x$overlap_len[!x$same_moa]))
})

by(data = res, INDICES = res$bin, FUN = function(x) {
  wilcox.test(x$overlap_len[x$same_moa], x$overlap_len[!x$same_moa])
})
```


## Correlation chemogenomic profiles for shared conditions Nichols/Newsize

```{r}
split_sets <- readRDS("./data/programmatic_output/split_sets.rds")

newsize <- split_sets$control_set
nichols <- split_sets$training_set

newsize <- newsize %>% 
  filter(origin == "newsize") %>% 
  filter(paste(drugname_typaslab, conc) %in% paste(nichols$drugname_typaslab, nichols$conc)) %>% 
  select(-one_of("origin"))

nichols <- nichols %>% 
  filter(paste(drugname_typaslab, conc) %in% paste(newsize$drugname_typaslab, newsize$conc))

newsize <- gather(newsize, AAEX:last_col(), key = "gene", value = "sscore")
nichols <- gather(nichols, AAEX:last_col(), key = "gene", value = "sscore")

joined <- full_join(newsize, nichols, by = c("drugname_typaslab", "conc", 
  "gene", "process_broad"), suffix = c(".new", ".nic"))

joined <- group_by(joined, drugname_typaslab, conc, process_broad) %>% 
  nest()

joined$cor <- map_dbl(joined$data, ~ cor(.x$sscore.new, .x$sscore.nic, method = "spearman"))
joined$label <- paste0(joined$drugname_typaslab, " (", joined$conc, ")")
joined$label <- fct_reorder(joined$label, joined$cor, .desc = FALSE)

ggplot(joined, aes(x = "", label = label, y = cor)) + 
  geom_point(shape = 1) + 
  geom_text_repel(size = 1.5, nudge_x = rep(c(0.25, -0.25), length.out = nrow(joined)), 
    segment.size = 0.2, force = 0.5) + 
  labs(x = "", y = "Spearman correlation", 
    title = "Correlation of chemogenomic profiles") + 
  paper_theme + 
  theme(axis.text.x = element_blank())

ggsave("./plots/paper/Corr_shared_conds.pdf", width = 75, height = 90, 
  units = "mm")
```


## GO enrichment analysis

The chemogenomic fingerprint was mapped to Eco Gene IDs and UniProt IDs in file 
`./data/programmatic_output/fpt_ecocyc_ids.txt`. This was then used as an input 
to Pantherdb. The output of the analysis + all settings are in the file 
`GO_analysis_2019-08-08.txt`. 

```{r}
tab <- read_delim("./GO_analysis_2019-08-08.txt", delim = "\t", skip = 6)
colnames(tab)

colnames(tab) <- c("GO_biol_proc", "N_in_ref", "N_in_input", "N_in_input_expected", 
  "over_or_under", "fold_enrich", "raw_pval")

tab$GO_biol_proc <- str_replace(string = tab$GO_biol_proc, pattern = " \\(GO:\\d*\\)", 
  replacement = "")

# filter according to a few criteria
# remove if:
# there are fewer than 5 genes in the reference
# there's just a single gene with a particular GO term in our set 
# stuff that is too broad: if there are more than 100 in the reference it's 
# usually something generic (e.g. "metabolic process")

# tabfilt <- filter(tab, N_in_input > 1 & N_in_ref > 1)

# keep only the most specific GO term according to the hierarchy
# these are the most specific ones:
hierarch_filt <- c("cell division", "response to ionizing radiation", 
  "peptidoglycan biosynthetic process", "SOS response", 
  "N-acetylneuraminate catabolic process", "D-alanine biosynthetic process", 
  "regulation of DNA recombination", "lipoprotein localization to outer membrane", 
  "double-strand break repair via homologous recombination", 
  "N-acetylglucosamine catabolic process", "lipoprotein metabolic process", 
  "maturation of SSU-rRNA", "phosphatidic acid biosynthetic process", 
  "clearance of foreign intracellular DNA", "localization within membrane", 
  "regulation of protein targeting to membrane", "glutamyl-tRNA aminoacylation", 
  "geranyl diphosphate biosynthetic process", "farnesyl diphosphate biosynthetic process", 
  "UV protection")
hierarch_filt <- paste0(hierarch_filt, collapse = "|")

tabfilt <- tab[grepl(x = tab$GO_biol_proc, pattern = hierarch_filt), ]
tabfilt <- filter(tabfilt, N_in_ref > 1 & N_in_input > 1)

# ggplot(tabfilt, aes(x = GO_biol_proc, y = N_in_input)) + 
#   geom_bar(stat = "identity") + 
#   coord_flip()

ggplot(tabfilt, aes(x = GO_biol_proc, y = -log10(raw_pval))) + 
  geom_bar(stat = "identity", width = 0.75) + 
  labs(y = "-Log10(p-value)", x = "") + 
  coord_flip() + 
  paper_theme

ggsave(filename = "./plots/paper/GO_analysis.pdf", width = 80, height = 50, 
  units = "mm")
```


## TPP Results

Pentamidine

```{r}
library(readxl)
tpp_pent <- read_xlsx("./data/2019-07-05_results_2D_TPP-pentamidine.xlsx", 
  sheet = 2)

tpp_pent %>% 
  filter(gene_name %in% c("PRS", "PURB", "MENB", "ARGP"), 
    temperature > 42) %>% 
  dplyr::select(gene_name, temperature, matches("norm_rel_fc_.*unmodified")) %>% 
  gather(key = "key", value = "fc", -gene_name, -temperature) %>% 
  mutate(conc = as.numeric(str_extract(string = key, 
    pattern = "\\d{1,3}")), temperature = factor(temperature), 
    conc = factor(conc)) -> 
  prs

ggplot(prs, aes(conc, temperature)) +
  facet_wrap(~gene_name, ncol = 6) +
  geom_tile(aes(fill = fc)) +
  scale_fill_gradient2("FC", low = viridis(3)[1], mid = viridis(3)[2], 
    high = viridis(3)[3], midpoint = 1, limits = c(0.5, 4), 
    na.value = viridis(3)[3]) + 
  labs(x = "Concentration", y = "Temperature") + 
  guides(fill = guide_legend(keywidth = 0.25, keyheight = 0.5)) +
  paper_theme + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggsave("./plots/paper/TPP_pent.pdf", width = 130, height = 55, units = "mm")
```

Thiolutin

```{r}
library(readxl)
tpp_thio <- read_xlsx("./data/2019-07-01_results_2D_TPP-thiolutin.xlsx", 
  sheet = 2)

tpp_thio %>% 
  filter(gene_name %in% c("MURE", "MREB"), 
    temperature > 42) %>% 
  dplyr::select(gene_name, temperature, matches("norm_rel_fc_.*unmodified")) %>% 
  gather(key = "key", value = "fc", -gene_name, -temperature) %>% 
  mutate(conc = as.numeric(str_extract(string = key, 
    pattern = "\\d{1,3}")), temperature = factor(temperature), 
    conc = factor(conc)) -> 
  thio

ggplot(thio, aes(conc, temperature)) +
  facet_wrap(~gene_name, ncol = 6) +
  geom_tile(aes(fill = fc)) +
  scale_fill_gradient2("FC", low = viridis(3)[1], mid = viridis(3)[2], 
    high = viridis(3)[3], midpoint = 1, limits = c(0.5, 2), 
    na.value = viridis(3)[1]) + 
  labs(x = "Concentration", y = "Temperature") + 
  guides(fill = guide_legend(keywidth = 0.25, keyheight = 0.5)) +
  paper_theme + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggsave("./plots/paper/TPP_pent.pdf", width = 95, height = 45, units = "mm")
```


## Growth curves

```{r}
library(readxl)

thio_growth <- read_xlsx("./data/2019-07-17_Thiolutin_MICs.xlsx", 
  sheet = 4, range = "A64:M67")
colnames(thio_growth)[1] <- "Strain"
thio_growth$Strain <- c("WT", "mrcB", "slt")

(thio_growth_m <- gather(thio_growth, `1`:`12`, key = "Dilution", value = "OD"))
thio_growth_m$Dilution <- as.numeric(thio_growth_m$Dilution) - 1

thio_growth_m$Dilution <- factor(thio_growth_m$Dilution, levels = 0:11)

ggplot(thio_growth_m, aes(x = Dilution, y = OD, colour = Strain)) + 
  geom_line(aes(group = Strain)) + 
  labs(x = "-log2(dilution factor)", y = "OD endpoint") + 
  paper_theme + 
  scale_colour_manual("Strain", labels = c("ΔmrcB", 'Δslt', 'wt'), 
    values = c("green", "blue", "black")) + 
  theme(legend.position = c(0.2, 0.55), legend.background = element_blank())

ggsave("./plots/paper/Thiolutin_growth.pdf", width = 60, height = 45, 
  device = cairo_pdf, units = "mm")
```


## Chemogenomic fingerprint follow-ups

```{r}
m_all_alldrugs <- rd(m_all_alldrugs)
fpt <- rd(fpt)

thio_pent <- 
  filter(m_all_alldrugs, drugname_typaslab %in% c("THIOLUTIN", "PENTAMIDINE"))

plot_heatmap(dfm = thio_pent, mics = mics, moa = mode_of_action, feats = fpt, 
  cluster_rows = FALSE, cluster_columns = FALSE, plot_width = 7, 
  use_global_fpt_order = TRUE, plot_height = 1.5, save = TRUE, 
  file = "./plots/paper/Thio_pent_heatmap.pdf")
```


## Problem with SPA-P6 strains

These are hypomorphic strains which I thought I excluded but I did not. Check which 
ones are affected. 

```{r}
indir <- "/Volumes/typas/Florian/dbsetup_tables_new"

(nichols_2011 <- read_delim(file.path(indir, "nichols_2011.csv"), delim = ";"))

strains_distinct <- select(nichols_2011, strain, ugi, gene_synonym) %>% 
  distinct()

spa_p6 <- select(nichols_2011, strain, ugi, gene_synonym) %>% 
  distinct() %>% 
  filter(grepl(x = strain, pattern = "SPA")) %>% 
  arrange(gene_synonym)
# print(spa_p6, n = 200)

# Of the fingerprint it affects ispA and lolB
```


# Session info

```{r}
R.version
sessionInfo()
```