Updating links from GLDS to OSDR and making files published on OSDR bold
asaravia-butler authored Nov 8, 2023
1 parent f9244c0 commit 73ab267
Showing 1 changed file with 44 additions and 45 deletions.
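Nearly every hunk below applies the same mechanical edit to the file's output lists. The trailing `\#` marker (the old "published in GLDS" flag) is dropped, and the file name is set in bold (the new "included in OSDR" convention). A leading `*` wildcard is escaped as `\*` so Markdown does not read it as emphasis. The ERCC-normalized outputs and the ERCC analysis note are handled differently: there the marker is dropped without bolding, or the sentence is reworded. The hypothetical Python helper `bold_published_item` below is a minimal sketch of the common case only. It assumes the `- name\# (description)` line shape used in this file, and nothing in the commit indicates that it was generated this way.

```python
import re

# A Markdown list item whose file name ends with the old "\#" marker,
# e.g. "- *fastq.gz\# (trimmed reads)".
# Group 1: bullet, group 2: file name, group 3: rest of the line.
_MARKED_ITEM = re.compile(r"^(\s*-\s+)(\S+?)\\#(.*)$")


def bold_published_item(line: str) -> str:
    """Bold a marker-tagged file name and drop the marker; other lines pass through."""
    match = _MARKED_ITEM.match(line)
    if match is None:
        return line
    bullet, name, rest = match.groups()
    # Escape a leading '*' wildcard so Markdown does not treat it as emphasis.
    if name.startswith("*"):
        name = "\\" + name
    return f"{bullet}**{name}**{rest}"


if __name__ == "__main__":
    print(bold_published_item(r"- *fastq.gz\# (trimmed reads)"))
    # prints: - **\*fastq.gz** (trimmed reads)
```

Lines without the marker, such as `- *Log.out (...)`, are returned unchanged.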
89 changes: 44 additions & 45 deletions RNAseq/Pipeline_GL-DPPD-7101_Versions/GL-DPPD-7101-F.md
@@ -135,9 +135,7 @@ The DESeq2 Normalization and DGE step, [step 9](#9-normalize-read-counts-perform

 # General processing overview with example commands

-> Exact processing commands for specific datasets are provided in the [GLDS_Processing_Scripts](../GLDS_Processing_Scripts) directory.
->
-> All output files marked with a \# are published for each RNAseq processed dataset in the [GLDS repository](https://genelab-data.ndc.nasa.gov/genelab/projects).
+> Exact processing commands and output files listed in **bold** below are included with each RNAseq processed dataset in the [Open Science Data Repository (OSDR)](https://osdr.nasa.gov/bio/repo/).
 ---

@@ -186,8 +184,8 @@ multiqc --interactive -n raw_multiqc -o /path/to/raw_multiqc/output/directory /p

 **Output Data:**

-- raw_multiqc.html\# (multiqc report)
-- /raw_multiqc_data\# (directory containing multiqc data)
+- **raw_multiqc.html** (multiqc report)
+- **/raw_multiqc_data** (directory containing multiqc data)

 <br>

@@ -229,8 +227,8 @@ trim_galore --gzip \

 **Output Data:**

-- *fastq.gz\# (trimmed reads)
-- *trimming_report.txt\# (trimming report)
+- **\*fastq.gz** (trimmed reads)
+- **\*trimming_report.txt** (trimming report)

 <br>

@@ -275,8 +273,8 @@ multiqc --interactive -n trimmed_multiqc -o /path/to/trimmed_multiqc/output/dire

 **Output Data:**

-- trimmed_multiqc.html\# (multiqc report)
-- /trimmed_multiqc_data\# (directory containing multiqc data)
+- **trimmed_multiqc.html** (multiqc report)
+- **/trimmed_multiqc_data** (directory containing multiqc data)

 <br>

@@ -402,12 +400,12 @@ STAR --twopassMode Basic \
 **Output Data:**

 - *Aligned.sortedByCoord.out.bam (sorted mapping to genome)
-- *Aligned.toTranscriptome.out.bam\# (sorted mapping to transcriptome)
-- *Log.final.out\# (log file containing alignment info/stats such as reads mapped, etc)
+- **\*Aligned.toTranscriptome.out.bam** (sorted mapping to transcriptome)
+- **\*Log.final.out** (log file containing alignment info/stats such as reads mapped, etc)
 - *ReadsPerGene.out.tab (tab delimitated file containing STAR read counts per gene with 4 columns that correspond to different strandedness options: column 1 = gene ID, column 2 = counts for unstranded RNAseq, column 3 = counts for 1st read strand aligned with RNA, column 4 = counts for 2nd read strand aligned with RNA)
 - *Log.out (main log file containing detailed info about the STAR run)
 - *Log.progress.out (minute-by-minute report containing job progress statistics, such as the number of processed reads, % of mapped reads etc.)
-- *SJ.out.tab\# (high confidence collapsed splice junctions in tab-delimited format)
+- **\*SJ.out.tab** (high confidence collapsed splice junctions in tab-delimited format)
 - *_STARgenome (directory containing the following:)
 - sjdbInfo.txt
 - sjdbList.out.tab
@@ -438,8 +436,8 @@ multiqc --interactive -n align_multiqc -o /path/to/aligned_multiqc/output/direct

 **Output Data:**

-- align_multiqc.html\# (multiqc report)
-- /align_multiqc_data\# (directory containing multiqc data)
+- **align_multiqc.html** (multiqc report)
+- **/align_multiqc_data** (directory containing multiqc data)

 <br>

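The STAR output list in the `@@ -402,12 +400,12 @@` hunk describes `*ReadsPerGene.out.tab` as a gene-ID column plus three count columns, one per strandedness option. The Python sketch below illustrates how those columns relate: it totals columns 2 to 4 over the gene rows and compares the two stranded columns. It assumes STAR's usual layout, in which the first four rows hold the `N_unmapped`, `N_multimapping`, `N_noFeature`, and `N_ambiguous` summaries. The names `column_totals` and `guess_strandedness` and the 2-fold `ratio` cut-off are hypothetical choices for illustration. This is not the pipeline's strandedness check.

```python
import io
from typing import Dict, TextIO

# STAR's ReadsPerGene.out.tab starts with four summary rows
# (N_unmapped, N_multimapping, N_noFeature, N_ambiguous) before the gene rows.
N_SUMMARY_ROWS = 4


def column_totals(handle: TextIO) -> Dict[str, int]:
    """Sum count columns 2-4 over the gene rows of one ReadsPerGene.out.tab."""
    totals = {"unstranded": 0, "read1_sense": 0, "read2_sense": 0}
    for index, line in enumerate(handle):
        if index < N_SUMMARY_ROWS or not line.strip():
            continue
        _gene_id, col2, col3, col4 = line.rstrip("\n").split("\t")[:4]
        totals["unstranded"] += int(col2)
        totals["read1_sense"] += int(col3)  # 1st read strand aligned with RNA
        totals["read2_sense"] += int(col4)  # 2nd read strand aligned with RNA
    return totals


def guess_strandedness(totals: Dict[str, int], ratio: float = 2.0) -> str:
    """Hypothetical heuristic: call the library stranded when one strand column
    exceeds the other by at least `ratio`-fold, otherwise call it unstranded."""
    read1, read2 = totals["read1_sense"], totals["read2_sense"]
    if read1 >= ratio * max(read2, 1):
        return "stranded (1st read sense)"
    if read2 >= ratio * max(read1, 1):
        return "stranded (2nd read sense)"
    return "unstranded"


if __name__ == "__main__":
    # Tiny made-up file in which most gene-assigned reads land in column 4.
    demo = io.StringIO(
        "N_unmapped\t10\t10\t10\nN_multimapping\t5\t5\t5\n"
        "N_noFeature\t1\t50\t3\nN_ambiguous\t2\t2\t2\n"
        "geneA\t100\t3\t97\ngeneB\t50\t1\t49\n"
    )
    print(guess_strandedness(column_totals(demo)))  # stranded (2nd read sense)
```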
@@ -492,7 +490,7 @@ sessionInfo()

 **Output Data:**

-- STAR_Unnormalized_Counts.csv\# (Table containing raw STAR counts for each sample)
+- **STAR_Unnormalized_Counts.csv** (Table containing raw STAR counts for each sample)

 <br>

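`STAR_Unnormalized_Counts.csv` is described only as a table of raw STAR counts for each sample, and the R code that writes it lies outside this diff. The Python sketch below shows a minimal version of the idea. It assumes one `*ReadsPerGene.out.tab` per sample and a single count column chosen for all samples, passed as the 1-based `column`. The gene rows are merged into a gene-by-sample CSV. `read_star_counts`, `merge_star_counts`, and the output layout are hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO

N_SUMMARY_ROWS = 4  # N_unmapped, N_multimapping, N_noFeature, N_ambiguous


def read_star_counts(handle: TextIO, column: int) -> Dict[str, int]:
    """Return {gene_id: count} from one ReadsPerGene.out.tab.

    `column` is the 1-based column to keep: 2 (unstranded), 3 or 4 (stranded).
    """
    counts: Dict[str, int] = {}
    for index, line in enumerate(handle):
        if index < N_SUMMARY_ROWS or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        counts[fields[0]] = int(fields[column - 1])
    return counts


def merge_star_counts(samples: Dict[str, TextIO], column: int, out: TextIO) -> None:
    """Write a gene-by-sample CSV of raw counts; genes missing from a sample get 0."""
    per_sample = {name: read_star_counts(h, column) for name, h in samples.items()}
    names: List[str] = list(per_sample)
    genes = sorted(set().union(*(c.keys() for c in per_sample.values())))
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([""] + names)  # blank corner cell, then one column per sample
    for gene in genes:
        writer.writerow([gene] + [per_sample[n].get(gene, 0) for n in names])


if __name__ == "__main__":
    header = "".join(
        f"N_{k}\t0\t0\t0\n" for k in ("unmapped", "multimapping", "noFeature", "ambiguous")
    )
    files = {
        "S1": io.StringIO(header + "geneA\t10\t1\t9\ngeneB\t4\t0\t4\n"),
        "S2": io.StringIO(header + "geneA\t7\t2\t5\n"),
    }
    buffer = io.StringIO()
    merge_star_counts(files, column=4, out=buffer)
    print(buffer.getvalue(), end="")
```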
@@ -517,7 +515,7 @@ samtools sort -m 3G \

 **Output Data:**

-- *Aligned.sortedByCoord_sorted.out.bam\# (samtools sorted genome aligned bam file)
+- **\*Aligned.sortedByCoord_sorted.out.bam** (samtools sorted genome aligned bam file)

 <br>

@@ -538,7 +536,7 @@ samtools index -@ NumberOfThreads /path/to/*Aligned.sortedByCoord_sorted.out.bam

 **Output Data:**

-- *Aligned.sortedByCoord_sorted.out.bam.bai\# (index of sorted mapping to genome file)
+- **\*Aligned.sortedByCoord_sorted.out.bam.bai** (index of sorted mapping to genome file)

 <br>

@@ -646,8 +644,8 @@ multiqc --interactive -n infer_exp_multiqc -o /path/to/infer_exp_multiqc/output/

 **Output Data:**

-- infer_exp_multiqc.html\# (multiqc report)
-- /infer_exp_multiqc_data\# (directory containing multiqc data)
+- **infer_exp_multiqc.html** (multiqc report)
+- **/infer_exp_multiqc_data** (directory containing multiqc data)

 <br>

@@ -699,8 +697,8 @@ multiqc --interactive -n genebody_cov_multiqc -o /path/to/geneBody_coverage_mult

 **Output Data:**

-- geneBody_cov_multiqc.html\# (multiqc report)
-- /geneBody_cov_multiqc_data\# (directory containing multiqc data)
+- **geneBody_cov_multiqc.html** (multiqc report)
+- **/geneBody_cov_multiqc_data** (directory containing multiqc data)

 <br>

@@ -758,8 +756,8 @@ multiqc --interactive -n inner_dist_multiqc /path/to/inner_dist_multiqc/output/d

 **Output Data:**

-- inner_distance_multiqc.html\# (multiqc report)
-- /inner_distance_multiqc_data\# (directory containing multiqc data)
+- **inner_distance_multiqc.html** (multiqc report)
+- **/inner_distance_multiqc_data** (directory containing multiqc data)

 <br>

@@ -808,8 +806,8 @@ multiqc --interactive -n read_dist_multiqc -o /path/to/read_dist_multiqc/output/

 **Output Data:**

-- read_dist_multiqc.html\# (multiqc report)
-- /read_dist_multiqc_data\# (directory containing multiqc data)
+- **read_dist_multiqc.html** (multiqc report)
+- **/read_dist_multiqc_data** (directory containing multiqc data)

 <br>

@@ -894,8 +892,8 @@ rsem-calculate-expression --num-threads NumberOfThreads \

 **Output Data:**

-- *genes.results\# (counts per gene)
-- *isoforms.results\# (counts per isoform)
+- **\*genes.results** (counts per gene)
+- **\*isoforms.results** (counts per isoform)
 - *stat (directory containing the following stats files)
 - *cnt
 - *model
@@ -925,8 +923,8 @@ multiqc --interactive -n RSEM_count_multiqc -o /path/to/RSEM_count_multiqc/outpu

 **Output Data:**

-- RSEM_count_multiqc.html\# (multiqc report)
-- /RSEM_count_multiqc_data\# (directory containing multiqc data)
+- **RSEM_count_multiqc.html** (multiqc report)
+- **/RSEM_count_multiqc_data** (directory containing multiqc data)

 <br>

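The RSEM output list in the `@@ -894,8 +892,8 @@` hunk names `*genes.results` (counts per gene). RSEM writes these as tab-separated files whose header row includes `gene_id`, `transcript_id(s)`, `length`, `effective_length`, `expected_count`, `TPM`, and `FPKM`. The sketch below reads the `expected_count` column by name. Expected counts can be fractional because RSEM apportions multi-mapping reads across genes. `read_rsem_expected_counts` is a hypothetical helper. This diff does not show how the pipeline turns these files into `RSEM_Unnormalized_Counts.csv`.

```python
import csv
import io
from typing import Dict, TextIO


def read_rsem_expected_counts(handle: TextIO) -> Dict[str, float]:
    """Map gene_id -> expected_count from one RSEM *genes.results file.

    Columns are looked up by header name, so their order does not matter.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["gene_id"]: float(row["expected_count"]) for row in reader}


if __name__ == "__main__":
    # Made-up two-gene file using RSEM's column names.
    demo = io.StringIO(
        "gene_id\ttranscript_id(s)\tlength\teffective_length\texpected_count\tTPM\tFPKM\n"
        "G1\tT1,T2\t1500.00\t1350.00\t42.00\t10.50\t8.20\n"
        "G2\tT3\t900.00\t750.00\t0.50\t0.00\t0.00\n"
    )
    counts = read_rsem_expected_counts(demo)
    print(counts)
    print(sum(counts.values()))  # total expected reads assigned to these genes
```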
@@ -1019,7 +1017,7 @@ dpt-isa-to-runsheet --accession GLDS-### \

 - *ISA.zip (compressed ISA directory containing Investigation, Study, and Assay (ISA) metadata files for the respective GLDS dataset, used to define sample groups - the *ISA.zip file is located in the [GLDS repository](https://genelab-data.ndc.nasa.gov/genelab/projects) under 'Study Files' -> 'metadata')

-- {GLDS-Accession-ID}_bulkRNASeq_v{version}_runsheet.csv\# (table containing metadata required for processing, version denotes the dp_tools schema used to specify the metadata to extract from the ISA archive)
+- **{GLDS-Accession-ID}_bulkRNASeq_v{version}_runsheet.csv** (table containing metadata required for processing, version denotes the dp_tools schema used to specify the metadata to extract from the ISA archive)

 <br>

@@ -1871,35 +1869,36 @@ sessionInfo()

 Output data without considering ERCC spike-in genes:

-- RSEM_Unnormalized_Counts.csv\# (table containing raw RSEM gene counts for each sample)
-- Normalized_Counts.csv\# (table containing normalized gene counts for each sample)
-- SampleTable.csv\# (table containing samples and their respective groups)
+- **RSEM_Unnormalized_Counts.csv** (table containing raw RSEM gene counts for each sample)
+- **Normalized_Counts.csv** (table containing normalized gene counts for each sample)
+- **SampleTable.csv** (table containing samples and their respective groups)
 - visualization_output_table.csv (file used to generate GeneLab DGE visualizations)
 - visualization_PCA_table.csv (file used to generate GeneLab PCA plots)
-- differential_expression.csv\# (table containing normalized counts for each sample, group statistics, DESeq2 DGE results for each pairwise comparison, and gene annotations)
-- contrasts.csv\# (table containing all pairwise comparisons)
+- **differential_expression.csv** (table containing normalized counts for each sample, group statistics, DESeq2 DGE results for each pairwise comparison, and gene annotations)
+- **contrasts.csv** (table containing all pairwise comparisons)

 Output data with considering ERCC spike-in genes:
 *Note: ERCC-normalized data are only available upon request. GeneLab encourages users to use the normalized and DGE data without considering ERCC spike-in genes.*

 - ERCC_rawCounts_unfiltered.csv (table containing raw ERCC unfiltered counts)
 - ERCC_rawCounts_filtered.csv (ERCC counts table after removing ERCC genes with low counts)
-- ERCC_Normalized_Counts.csv\# (table containing ERCC-normalized gene counts for each sample)
-- ERCCnorm_SampleTable.csv\# (table containing samples with detectable ERCC group B genes and their respective groups)
+- ERCC_Normalized_Counts.csv (table containing ERCC-normalized gene counts for each sample)
+- ERCCnorm_SampleTable.csv (table containing samples with detectable ERCC group B genes and their respective groups)
 - visualization_output_table_ERCCnorm.csv (file used to generate GeneLab DGE visualizations for ERCC-normalized data)
 - visualization_PCA_table_ERCCnorm.csv (file used to generate GeneLab PCA plots for ERCC-normalized data)
-- ERCCnorm_differential_expression.csv\# (table containing ERCC-normalized counts for each sample, group statistics, DESeq2 DGE results for each pairwise comparison, and gene annotations)
-- ERCCnorm_contrasts.csv\# (table containing all pairwise comparisons for samples containing ERCC spike-in)
+- ERCCnorm_differential_expression.csv (table containing ERCC-normalized counts for each sample, group statistics, DESeq2 DGE results for each pairwise comparison, and gene annotations)
+- ERCCnorm_contrasts.csv (table containing all pairwise comparisons for samples containing ERCC spike-in)

+
 **Output Data for Datasets without ERCC Spike-In:**

-- RSEM_Unnormalized_Counts.csv\# (table containing raw RSEM gene counts for each sample)
-- Normalized_Counts.csv\# (table containing normalized gene counts for each sample)
-- SampleTable.csv\# (table containing samples and their respective groups)
+- **RSEM_Unnormalized_Counts.csv** (table containing raw RSEM gene counts for each sample)
+- **Normalized_Counts.csv** (table containing normalized gene counts for each sample)
+- **SampleTable.csv** (table containing samples and their respective groups)
 - visualization_output_table.csv (file used to generate GeneLab DGE visualizations)
 - visualization_PCA_table.csv (file used to generate GeneLab PCA plots)
-- differential_expression.csv\# (table containing normalized counts for each sample, group statistics, DESeq2 DGE results for each pairwise comparison, and gene annotations)
-- contrasts.csv\# (table containing all pairwise comparisons)
+- **differential_expression.csv** (table containing normalized counts for each sample, group statistics, DESeq2 DGE results for each pairwise comparison, and gene annotations)
+- **contrasts.csv** (table containing all pairwise comparisons)

> Note: RNAseq processed data interactive tables and plots are found in the [GLDS visualization portal](https://visualization.genelab.nasa.gov/data/studies).
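`Normalized_Counts.csv` and the DGE tables listed above come from the DESeq2 normalization and DGE step named in the first hunk header. By default, DESeq2 estimates per-sample size factors with the median-of-ratios method. The pure-Python sketch below re-implements that calculation for illustration only. It is not DESeq2: it omits `poscounts`, control genes, and normalization factors. This diff also does not show whether the pipeline relies on the default estimator. The function names are hypothetical.

```python
import math
from statistics import median
from typing import List, Sequence


def size_factors(counts: Sequence[Sequence[float]]) -> List[float]:
    """Median-of-ratios size factors for a genes x samples count matrix."""
    # Genes with a zero in any sample have a geometric mean of 0, so they are excluded.
    usable = [row for row in counts if row and all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has a non-zero count in every sample")
    log_geo_means = [sum(math.log(c) for c in row) / len(row) for row in usable]
    factors = []
    for j in range(len(usable[0])):
        # Ratio of each gene's count to its geometric mean, on the log scale.
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts: Sequence[Sequence[float]], factors: Sequence[float]) -> List[List[float]]:
    """Divide each sample's counts by that sample's size factor."""
    return [[c / s for c, s in zip(row, factors)] for row in counts]


if __name__ == "__main__":
    # Made-up counts in which sample 2 was sequenced twice as deeply as sample 1.
    raw = [[10, 20], [5, 10], [0, 3], [8, 16]]
    sf = size_factors(raw)
    print(sf)                     # approximately [0.7071, 1.4142]
    print(normalize(raw, sf)[0])  # both values approximately 14.14
```

Working on the log scale mirrors how DESeq2 takes the median of log ratios before exponentiating, and it avoids overflow when computing geometric means over many samples.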
@@ -2651,4 +2650,4 @@ ax.set_yscale("log");

 - ERCC_analysis/ERCC_lodr_*.csv (ERCC Gene Table including mean counts, adjusted p-value and p-value, and filtered to genes with both adj. p-value and p-value < 0.001)

-> All steps of the ERCC Spike-In Data Analysis are performed in a Jupyter Notebook (JN) and the completed JN is exported as an html file and published in the [GLDS repository](https://genelab-data.ndc.nasa.gov/genelab/projects) for the respective dataset.
+> All steps of the ERCC Spike-In Data Analysis are performed in a Jupyter Notebook (JN) and the completed JN is exported as an html file (**ERCC_analysis.html**) and published in the [Open Science Data Repository (OSDR)](https://osdr.nasa.gov/bio/repo/) for the respective dataset.

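The last hunk keeps the `ERCC_analysis/ERCC_lodr_*.csv` description, which says the table is filtered to genes whose p-value and adjusted p-value are both below 0.001. The Python below sketches that filter only. It assumes a comma-separated table with columns named `pvalue` and `padj`. These are DESeq2's results column names, assumed here rather than confirmed for this file. Non-numeric entries such as R's `NA` are treated as failing the filter. `filter_lodr_rows` is hypothetical; the actual analysis runs in the Jupyter Notebook described in that hunk.

```python
import csv
import io
from typing import Dict, List, TextIO


def filter_lodr_rows(
    handle: TextIO,
    p_col: str = "pvalue",
    padj_col: str = "padj",
    threshold: float = 0.001,
) -> List[Dict[str, str]]:
    """Keep rows whose p-value and adjusted p-value are both below `threshold`."""
    kept = []
    for row in csv.DictReader(handle):
        try:
            p_value, p_adj = float(row[p_col]), float(row[padj_col])
        except ValueError:  # e.g. "NA" written by R for genes without a test result
            continue
        # NaN compares False against the threshold, so NaN rows are dropped as well.
        if p_value < threshold and p_adj < threshold:
            kept.append(row)
    return kept


if __name__ == "__main__":
    # Made-up rows; column names other than pvalue/padj are illustrative.
    demo = io.StringIO(
        "gene,mean_counts,pvalue,padj\n"
        "ERCC-00002,120.5,1e-6,1e-5\n"
        "ERCC-00003,15.0,0.0005,0.002\n"
        "ERCC-00004,3.0,NA,NA\n"
    )
    print([row["gene"] for row in filter_lodr_rows(demo)])  # ['ERCC-00002']
```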