From 36363bc782d77d846ef53f6bf1e0a95195629620 Mon Sep 17 00:00:00 2001
From: daianna21
Date: Wed, 7 Feb 2024 17:54:58 -0600
Subject: [PATCH] Update fig captions

---
 README.Rmd |  38 +++++++++-----
 README.md  | 152 +++++++++++++++++++++++++++--------------------------
 2 files changed, 104 insertions(+), 86 deletions(-)

diff --git a/README.Rmd b/README.Rmd
index c1a8dbf..66c49ea 100644
--- a/README.Rmd
+++ b/README.Rmd
@@ -35,34 +35,48 @@ Welcome to the `smokingMouse` project! Here you'll be able to access the mouse e
 
 ## Overview
 
-This bulk RNA-sequencing project consisted of a differential expression analysis (DEA) involving 4 data types: genes, exons, transcripts and exon-exon junctions. The main goal of this study was to explore the effects of prenatal exposure to maternal smoking and nicotine exposures on the developing mouse brain. As secondary objectives, this work evaluated: 1) the affected genes by each exposure on the adult female brain in order to compare offspring and adult results and 2) the effects of smoking on adult blood and brain to search for overlapping biomarkers in both tissues. Finally, DEGs identified in mice were compared against previously published results in human (Semick, S.A. et al. (2018) and Toikumo, S. et al. (2023)).
+This bulk RNA-sequencing project consisted of a differential expression analysis (DEA) involving 4 data types: genes, exons, transcripts and exon-exon junctions. The main goal of this study was to explore the effects of prenatal exposure to maternal smoking and nicotine exposures on the developing mouse brain. As secondary objectives, this work evaluated: 1) the affected genes by each exposure on the adult female brain in order to compare offspring and adult results and 2) the effects of smoking on adult blood and brain to search for overlapping biomarkers in both tissues. Finally, DEGs identified in mice were compared against previously published results in human (Semick et al. 2020 and Toikumo et al. 2023).
-The next table summarizes the analyses done at each level. +## Study design
- -
Summary of analysis steps across gene expression feature levels : - - 1. Data preparation: in this first step, counts of genes, exons and junctions were normalized to CPM and scaled; transcript expression values were only scaled since they were already in TMP. Then, low-expression features were removed using the indicated methods and samples were separated by tissue and age in order to create subsets of the data for downstream analyses. - - 2. Exploratory Data Analysis: QC metrics of the samples were examined and used to filter them; sample level effects were explored through dimensionality reduction methods and rare samples in PCA plots were manually removed from the datasets; gene level effects were evaluated with analyses of explanatory variables and variance partition. 3. Differential Expression Analysis: with the relevant variables identified in the previous steps, the DEA was performed at the gene level for nicotine and smoking, adult and pup, and blood and brain samples, and for 3 models: the naive one modeled ~Group + batch effects, the adjusted model modeled ~Group + Pregnancy + batch effects for adults and ~Group + Sex + batch effects for pups, and the interaction model ~Group\*Pregnancy + batch effects for adults and ~Group*Sex + batch effects for pups; DEA on the rest of the levels was performed for pups only and using the adjusted model. After that, signals of the features in nicotine and smoking were compared, as well as the signals of exons and txs vs the effects of their genes, and genes’ signals were additionally compared in the different tissues, ages, models and species (vs human data of a previous study). All resultant DEG and DE features (and their genes) were quantified and compared based on their experiment (nic/smo) and direction of regulation (up/down); DEG were further compared against genes of DE exons and txs; mouse genes were also compared with human genes affected by cigarette smoke or associated with TUD. 4. 
Gene Ontology and KEGG: taking the DEG and the genes of DE txs and exons, GO & KEGG analyses were done and the expression levels of genes that participate in brain development related processes were explored. 5. DE feature visualization: DEG counts were represented in heatmaps in order to distinguish the groups of up and down-regulated genes. 6. Junction annotation: for novel DE jxns of unknown gene, their nearest, preceding and following genes were determined. + +
Figure 1: Experimental design of the study. A) 21 pregnant mice and 26 nonpregnant female adults were either administered nicotine (n=12), exposed to cigarette smoke (n=12), or used as controls (n=23; 11 nicotine controls and 12 smoking controls). A total of 137 pups were born to pregnant mice: 19 were born to mice that were administered nicotine, 46 to mice exposed to smoking, and the remaining 72 to control mice (23 to nicotine controls and 49 to smoking controls). Frontal cortex samples of all P0 pups (n=137: 42 of nicotine and 95 of the smoking experiment) and adults (n=47: 23 of nicotine and 24 of the smoking experiment) were obtained, as well as blood samples from the smoking-exposed and smoking control adults (n=24), totaling 208 samples. Number of donors and samples are indicated in the figure. B) RNA was extracted from such samples and bulk RNA-seq experiments were performed, obtaining expression counts for genes, exons, transcripts and exon-exon junctions. -Abbreviations: Jxn: junction; Tx: transcript; CPM: counts per million; TPM: transcripts per million; TMM: Trimmed Mean of M-Values; TMMwsp: TMM with singleton pairing; EDA: exploratory data analysis; QC: quality control; ribo: ribosomal; mt: mitochondrial; PCA: Principal Component Analysis; PC: principal component; MDS: Multidimensional Scaling; DEA: differential expression analysis; DE: differential expression/differentially expressed; DEG: differentially expressed genes; GO: Gene Ontology; KEGG: Kyoto Encyclopedia of Genes and Genomes; TUD: tobacco use disorder. +
+## Workflow -## Study design +The next table summarizes the analyses done at each level.
- -
Figure 1: Experimental design of the study. A) 36 pregnant dams and 35 non-pregnant female adult mice were either administered nicotine by intraperitoneal injection (IP; n=12), exposed to cigarette smoke in smoking chambers (n=24), or controls (n=35; 11 nicotine controls and 24 smoking controls). A total of 137 pups were born to pregnant dams: 19 were born to mice that were administered nicotine, 46 to mice exposed to cigarette smoke and the remaining 72 to control mice (23 to nicotine controls and 49 to smoking controls). Samples from frontal cortices of P0 pups and adults were obtained, as well as blood samples from smoking-exposed and smoking control adults. B) RNA was extracted, RNA-seq libraries were prepared and sequenced to obtain expression counts for genes, exons, transcripts and exon-exon junctions. + +
Summary of analysis steps across gene expression feature levels : + + 1. Data processing: counts of genes, exons, and exon-exon junctions were normalized to CPM and log2-transformed; transcript expression values were only log2-scaled since they were already in TPM. Lowly-expressed features were removed using the indicated functions and samples were separated by tissue and age in order to create subsets of the data for downstream analyses. + + 2. Exploratory Data Analysis (EDA): QC metrics of the samples were examined and used to filter the poor quality ones. Sample level effects were explored through dimensionality reduction methods and segregated samples in PCA plots were removed from the datasets. Gene level effects were evaluated with analyses of variance partition. + + 3. Differential Expression Analysis (DEA): with the relevant variables identified in the previous steps, the DEA was performed at the gene level for nicotine and smoking experiments in adult and pup brain samples, and for smoking in adult blood samples; DEA at the rest of the levels was performed for both exposures in pup brain only. DE signals of the genes in the different conditions, ages, tissues and species (human results from $^1$: [Semick et al. 2020](https://www.nature.com/articles/s41380-018-0223-1)) were contrasted, as well as the DE signals of exons and transcripts vs those of their genes. We also analyzed the mean expression of significant and non-significant genes with and without DE features. Then, all resultant DEGs and DE features (and their genes) were compared by direction of regulation (up or down) between and within experiments (nicotine/smoking); mouse DEGs were also compared against human genes associated with TUD from $^2$: [Toikumo et al. 2023](https://www.medrxiv.org/content/10.1101/2023.03.27.23287713v2). + + 4. Functional Enrichment Analysis: we obtained the GO & KEGG terms significantly enriched in our clusters of DEGs and genes of DE transcripts and exons. + + 5. 
DGE visualization: the log2-normalized expression of DEGs was represented in heatmaps in order to distinguish the groups of up and downregulated genes. + + 6. Novel junction gene annotation: for uncharacterized DE junctions with no annotated gene, their nearest, preceding and following genes were determined. + + +Abbreviations: Jxn: junction; Tx(s): transcript(s); CPM: counts per million; TPM: transcripts per million; TMM: Trimmed Mean of M-Values; TMMwsp: TMM with singleton pairing; QC: quality control; PC: principal component; DEA: differential expression analysis; DE: differential expression/differentially expressed; FC: fold-change; FDR: false discovery rate; DEGs: differentially expressed genes; TUD: tobacco use disorder; DGE: differential gene expression.
+ + ## smoking Mouse datasets The mouse datasets contain the following data in a single object for each feature (genes, exons, transcripts and exon-exon junctions): diff --git a/README.md b/README.md index 8c8c5a0..6bbbf3f 100644 --- a/README.md +++ b/README.md @@ -23,7 +23,7 @@ support](https://bioconductor.org/shields/posts/smokingMouse.svg)](https://suppo commit](https://bioconductor.org/shields/lastcommit/devel/data-experiment/smokingMouse.svg)](http://bioconductor.org/checkResults/devel/data-experiment-LATEST/smokingMouse/) [![Bioc dependencies](https://bioconductor.org/shields/dependencies/release/smokingMouse.svg)](https://bioconductor.org/packages/release/bioc/html/smokingMouse.html#since) -[![check-bioc](https://github.com/LieberInstitute/smokingMouse/actions/workflows/check-bioc.yaml/badge.svg)](https://github.com/LieberInstitute/smokingMouse/actions/workflows/check-bioc.yaml) +[![R-CMD-check-bioc](https://github.com/LieberInstitute/smokingMouse/actions/workflows/R-CMD-check-bioc.yaml/badge.svg)](https://github.com/LieberInstitute/smokingMouse/actions/workflows/R-CMD-check-bioc.yaml) @@ -42,8 +42,37 @@ evaluated: 1) the affected genes by each exposure on the adult female brain in order to compare offspring and adult results and 2) the effects of smoking on adult blood and brain to search for overlapping biomarkers in both tissues. Finally, DEGs identified in mice were compared against -previously published results in human (Semick, S.A. et al. (2018) and -Toikumo, S. et al. (2023)). +previously published results in human (Semick et al. 2020 and Toikumo et +al. 2023). + +## Study design + +
+ +
+ +Figure 1: Experimental design of the +study. A) 21 pregnant mice and 26 nonpregnant female adults +were either administered nicotine (n=12), exposed to cigarette smoke +(n=12), or used as controls (n=23; 11 nicotine controls and 12 smoking +controls). A total of 137 pups were born to pregnant mice: 19 were born +to mice that were administered nicotine, 46 to mice exposed to smoking, +and the remaining 72 to control mice (23 to nicotine controls and 49 to +smoking controls). Frontal cortex samples of all P0 pups (n=137: 42 of +nicotine and 95 of the smoking experiment) and adults (n=47: 23 of +nicotine and 24 of the smoking experiment) were obtained, as well as +blood samples from the smoking-exposed and smoking control adults +(n=24), totaling 208 samples. Number of donors and samples are indicated +in the figure. B) RNA was extracted from such samples and bulk +RNA-seq experiments were performed, obtaining expression counts for +genes, exons, transcripts and exon-exon junctions. + + + +
+
+ +## Workflow The next table summarizes the analyses done at each level. @@ -54,80 +83,55 @@ The next table summarizes the analyses done at each level. Summary of analysis steps across gene expression feature levels : -1. Data preparation: in this first step, counts of genes, exons -and junctions were normalized to CPM and scaled; transcript expression -values were only scaled since they were already in TMP. Then, -low-expression features were removed using the indicated methods and +1. Data processing: counts of genes, exons, and exon-exon +junctions were normalized to CPM and log2-transformed; transcript +expression values were only log2-scaled since they were already in TPM. +Lowly-expressed features were removed using the indicated functions and samples were separated by tissue and age in order to create subsets of the data for downstream analyses. -2. Exploratory Data Analysis: QC metrics of the samples were -examined and used to filter them; sample level effects were explored -through dimensionality reduction methods and rare samples in PCA plots -were manually removed from the datasets; gene level effects were -evaluated with analyses of explanatory variables and variance partition. -3. Differential Expression Analysis: with the relevant variables -identified in the previous steps, the DEA was performed at the gene -level for nicotine and smoking, adult and pup, and blood and brain -samples, and for 3 models: the naive one modeled -~Group + batch effects, the -adjusted model modeled ~Group + -Pregnancy + batch effects for adults and -~Group + Sex + batch effects -for pups, and the interaction model -~Group\*Pregnancy + batch -effects for adults and -~Group\*Sex + batch effects -for pups; DEA on the rest of the levels was performed for pups only and -using the adjusted model. 
After that, signals of the features in -nicotine and smoking were compared, as well as the signals of exons and -txs vs the effects of their genes, and genes’ signals were additionally -compared in the different tissues, ages, models and species (vs human -data of a previous study). All resultant DEG and DE features (and their -genes) were quantified and compared based on their experiment (nic/smo) -and direction of regulation (up/down); DEG were further compared against -genes of DE exons and txs; mouse genes were also compared with human -genes affected by cigarette smoke or associated with TUD. 4. Gene -Ontology and KEGG: taking the DEG and the genes of DE txs and exons, -GO & KEGG analyses were done and the expression levels of genes that -participate in brain development related processes were explored. 5. -DE feature visualization: DEG counts were represented in heatmaps in -order to distinguish the groups of up and down-regulated genes. 6. -Junction annotation: for novel DE jxns of unknown gene, their -nearest, preceding and following genes were determined. - -Abbreviations: Jxn: junction; Tx: transcript; CPM: -counts per million; TPM: transcripts per million; TMM: Trimmed Mean of -M-Values; TMMwsp: TMM with singleton pairing; EDA: exploratory data -analysis; QC: quality control; ribo: ribosomal; mt: mitochondrial; PCA: -Principal Component Analysis; PC: principal component; MDS: -Multidimensional Scaling; DEA: differential expression analysis; DE: -differential expression/differentially expressed; DEG: differentially -expressed genes; GO: Gene Ontology; KEGG: Kyoto Encyclopedia of Genes -and Genomes; TUD: tobacco use disorder. - - - - -## Study design - -
- -
- -Figure 1: Experimental design of the -study. A) 36 pregnant dams and 35 non-pregnant female adult -mice were either administered nicotine by intraperitoneal injection (IP; -n=12), exposed to cigarette smoke in smoking chambers (n=24), or -controls (n=35; 11 nicotine controls and 24 smoking controls). A total -of 137 pups were born to pregnant dams: 19 were born to mice that were -administered nicotine, 46 to mice exposed to cigarette smoke and the -remaining 72 to control mice (23 to nicotine controls and 49 to smoking -controls). Samples from frontal cortices of P0 pups and adults were -obtained, as well as blood samples from smoking-exposed and smoking -control adults. B) RNA was extracted, RNA-seq libraries were -prepared and sequenced to obtain expression counts for genes, exons, -transcripts and exon-exon junctions. +2. Exploratory Data Analysis (EDA): QC metrics of the samples +were examined and used to filter the poor quality ones. Sample level +effects were explored through dimensionality reduction methods and +segregated samples in PCA plots were removed from the datasets. Gene +level effects were evaluated with analyses of variance partition. + +3. Differential Expression Analysis (DEA): with the relevant +variables identified in the previous steps, the DEA was performed at the +gene level for nicotine and smoking experiments in adult and pup brain +samples, and for smoking in adult blood samples; DEA at the rest of the +levels was performed for both exposures in pup brain only. DE signals of +the genes in the different conditions, ages, tissues and species (human +results from $^1$: [Semick et +al. 2020](https://www.nature.com/articles/s41380-018-0223-1)) were +contrasted, as well as the DE signals of exons and transcripts vs those +of their genes. We also analyzed the mean expression of significant and +non-significant genes with and without DE features. 
Then, all resultant +DEGs and DE features (and their genes) were compared by direction of +regulation (up or down) between and within experiments +(nicotine/smoking); mouse DEGs were also compared against human genes +associated with TUD from $^2$: [Toikumo et +al. 2023](https://www.medrxiv.org/content/10.1101/2023.03.27.23287713v2). + +4. Functional Enrichment Analysis: we obtained the GO & KEGG +terms significantly enriched in our clusters of DEGs and genes of DE +transcripts and exons. + +5. DGE visualization: the log2-normalized expression of DEGs was +represented in heatmaps in order to distinguish the groups of up and +downregulated genes. + +6. Novel junction gene annotation: for uncharacterized DE +junctions with no annotated gene, their nearest, preceding and following +genes were determined. + +Abbreviations: Jxn: junction; Tx(s): transcript(s); +CPM: counts per million; TPM: transcripts per million; TMM: Trimmed Mean +of M-Values; TMMwsp: TMM with singleton pairing; QC: quality control; +PC: principal component; DEA: differential expression analysis; DE: +differential expression/differentially expressed; FC: fold-change; FDR: +false discovery rate; DEGs: differentially expressed genes; TUD: tobacco +use disorder; DGE: differential gene expression.