Update README and UMAP file

sriram-lab · Jan 27, 2022 · 508e2ea
1 parent 3be4339
commit 508e2ea

Showing 3 changed files with 117 additions and 35 deletions.
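
The density-plot changes in `03_MAGIC_UMAP_99.Rmd` rely on two rescalings: the `minmax` helper, which maps values linearly onto [0, 1], and ggplot2's `..scaled..`, which divides each density estimate by its maximum so every curve peaks at 1. The sketch below illustrates both in plain Python under stated assumptions; it is not the notebook's code. The function `scaled_density`, its fixed `bandwidth` argument, and the zero-range fallback in `minmax` are hypothetical choices made for illustration (ggplot2 selects its own bandwidth).

```python
import math


def minmax(values):
    """Rescale values linearly to [0, 1], mirroring the R helper in the diff."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: return zeros for constant input. The R one-liner
        # would produce NaN here because it evaluates 0/0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def scaled_density(values, grid, bandwidth):
    """Gaussian kernel density on `grid`, divided by its maximum.

    Hypothetical helper: the bandwidth is passed in explicitly rather
    than estimated from the data.
    """
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    dens = [
        norm * sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
        for g in grid
    ]
    peak = max(dens)
    # After this division the tallest point of the curve equals 1.
    return [d / peak for d in dens]
```

For instance, `minmax([2, 4, 6])` returns `[0.0, 0.5, 1.0]`. Because each group's curve is divided by its own peak, scaled densities compare shapes across time points rather than absolute density values.
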
diff --git a/README.md b/README.md
@@ -1,31 +1,34 @@
-# Modeling the metabolic changes of the epithelial-to-mesenchymal transition 
+# Constraint-based modeling identifies cell-state specific metabolic vulnerabilities during the epithelial to mesenchymal transition  
 
 ## Summary
-This repository identifies metabolic enzymes that are essential to the epithelial-to-mesenchymal transition (EMT) in the context of lung adenocarcinoma. 
 
-**Three analyses are performed:**
+This repository contains the code from the paper *Constraint-based modeling identifies cell-state specific metabolic vulnerabilities during the epithelial to mesenchymal transition* by Campit, S.E., Keshamouni, V.G., and Chandrasekaran, S.
 
-  1. Enrichment and differential expression of multiple lung adenocarcinoma omics datasets (Bulk RNA-Seq, single-cell RNA-Seq, Proteomics, and more)
-  2. Constraint-based metabolic reconstruction and analysis for metabolic flux analysis and fitness evaluation from gene and reaction knockouts.
-  3. Hypothesis generation from differential flux and growth sensitivity analysis  
+**Key analyses contained in notebooks:**
+
+  1. Data preprocessing for transcriptomics, proteomics, single-cell transcriptomics, CERES Score data, and other -omics datasets.
+  2. Constraint-based metabolic reconstruction and analysis code for simulating metabolic fluxes and growth resulting from gene and reaction knockouts.
+  3. Statistical analyses for assessing differences between groups. 
 
 ## Programming languages used in this analysis
 
   * MATLAB version R2020b Update 4
   * R version 4.0.3
   * Python version 3.8.6
 
-## Getting Started
-
-
-## TO DO:
-- [ ] Create Docker container for dependencies
-- [ ] Clean up all code base further
-- [ ] Edit all notebooks
-- [ ] Update static website for short and graphical representation of paper
-
 ## Usage
-COMING SOON
+Three programming languages (Python, R, and MATLAB) were used, chosen for the availability of scientific libraries and their strengths in specific tasks. We recommend the following workflow to run the entire analysis end to end. The directories and scripts below are numbered in order of use.
+
+  1. Exploratory data analysis and general understanding of data distributions: `notebooks/r/01_EDA/*.Rmd`
+  2. Preprocessing bulk -omics data for COBRA: `notebooks/r/02_DifferentialExpression/*.Rmd`
+  3. Preprocessing single-cell -omics data for COBRA: `notebooks/r/03_Preprocess/*.Rmd`
+  4. Performing MAGIC data imputation for single-cell COBRA analysis: `notebooks/python/magic.ipynb`
+  5. Constraint-based reconstruction and analysis for bulk -omics data: `notebooks/matlab/01_bulk_analysis/RECON1/*.mlx`
+  6. Constraint-based reconstruction and analysis for single-cell -omics data: `notebooks/matlab/02_single_cell_analysis/recon1_scCOBRA.mlx`
+  7. Generating FBA-UMAP profiles: `notebooks/r/05_Embeddings/*.Rmd`
+  8. Statistical analyses: Google Colab notebooks can be found [here](https://drive.google.com/drive/folders/1kCNsrULvzgaTEH3387mAx7KbB_dJSO4p).
+
+Note that there are additional QA/QC scripts and notebooks available as well.
 
 ## Contributing
 Contributions to make this analysis better, more robust, and easier to follow are greatly appreciated. Here are the steps we ask of you:
@@ -40,4 +43,4 @@ Contributions to make this analysis better, more robust, and easier to follow ar
 Distributed under the GNU License. See `LICENSE` for more information.
 
 ## Contact
-For questions regarding the code deposited in this repository, please reach out to Scott Campit via email at: scampit [at] umich [dot] edu or via Twitter at @secampit.
+For questions regarding the code deposited in this repository, please reach out to Scott Campit via email at: scampit [at] umich [dot] edu or via Twitter at [at] secampit.
diff --git a/notebooks/r/05_Embeddings/03_MAGIC_UMAP_99.Rmd b/notebooks/r/05_Embeddings/03_MAGIC_UMAP_99.Rmd
@@ -188,22 +188,42 @@ to_use = names(mnorm_ko) %in% c("GLCt1", "HEX1", "PGI", "PFK", "FBA", "GAPD", "P
 glycolysis_ko = mnorm_ko[, to_use]
 merged_meanko = cbind(time, glycolysis_ko)
 tmp = melt(merged_meanko, id=c('a549.meta.data.Time'))
-tmp2 = tmp[tmp$variable == "ENO", ]
+
+minmax <- function(x){(x-min(x))/(max(x)-min(x))}
+
+tmp$density = minmax(tmp$value)
 
 library(dplyr)
 library(tidyr)
 library(ggplot2)
 
 tmp %>%
   ggplot(aes(x=value, color=a549.meta.data.Time, fill=a549.meta.data.Time)) +
-  geom_density(alpha=0.4) + 
+  geom_density(aes(y=..scaled..), alpha=0.4) + 
   facet_wrap(~variable, ncol=4) + 
-  #scale_y_log10() 
-
   labs(x="KO Growth Score", y="Density")
 
 ```
 
+```{r}
+set.seed(1234)
+df = data.frame(value =round(c(rnorm(200,
+                                      mean=100,
+                                      sd=7))))
+  
+# load the ggplot2 library
+library(ggplot2)  
+  
+# create density plot
+ggplot(df, aes(x=value)) + geom_density()
+```
+
+```{r}
+ggplot(tmp, aes(x=value, color=a549.meta.data.Time, fill=a549.meta.data.Time)) +
+geom_density(aes(y=..scaled..), alpha=0.4) + 
+labs(x="KO Growth Score", y="Density")
+```
+
 ### C. Merge with UMAP embedding
 This combines the reaction ko data with the UMAP embedding.
 ```{r}
@@ -791,12 +811,16 @@ library(ggplot2)
 
 tmp %>%
   ggplot(aes(x=value, color=a549.meta.data.Time, fill=a549.meta.data.Time)) +
-  geom_density(alpha=0.4) + 
+  geom_density(aes(y=..scaled..), alpha=0.4) + 
   facet_wrap(~variable, ncol=4) + 
-  #scale_y_log10() 
+  labs(x="Glycolysis flux profile (individual reactions)", y="Density")
 
-  labs(x="Flux profiles", y="Density")
+```
 
+```{r}
+ggplot(tmp, aes(x=value, color=a549.meta.data.Time, fill=a549.meta.data.Time)) +
+geom_density(aes(y=..scaled..), alpha=0.4) + 
+labs(x="Glycolysis flux profile (all)", y="Density")
 ```
 
 ### C. Merge with UMAP embedding