DGE2 is a Nextflow pipeline built using code and infrastructure developed and maintained by the nf-core initiative. It performs differential gene expression analysis on data that have been preprocessed with the nf-core/rnaseq pipeline (v3+) using the default star_salmon alignment.
- Takes salmon quantification files and a metadata file as input
- Performs differential gene expression analysis over a specific design or, if none is specified, over all possible designs from the metadata file
- Generates summary plots (PCA, volcano, heatmap) and txt files, as well as a summary HTML report
- Runs gene set enrichment analysis on the pre-ranked list of genes from the DGE results
Note
If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.
If you have run the nf-core/rnaseq pipeline with the default aligner (star_salmon), you should have a results/star_salmon/ folder containing several folders and files, including a quant.sf file for each sample and a tx2gene.tsv file that maps transcript identifiers to gene identifiers:
results/star_salmon/SAMPLE_1/quant.sf
results/star_salmon/SAMPLE_2/quant.sf
results/star_salmon/SAMPLE_3/quant.sf
results/star_salmon/SAMPLE_4/quant.sf
results/star_salmon/SAMPLE_5/quant.sf
results/star_salmon/SAMPLE_6/quant.sf
results/star_salmon/tx2gene.tsv
[... other files and folders...]
In the above example, you would pass the results/ folder to the DGE2 pipeline using the --inputdir argument.
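Before launching the pipeline, it can help to confirm that the folder you plan to pass as --inputdir has the layout described above. The following is a minimal Python sketch, not part of DGE2: the function name find_quantified_samples is hypothetical, and it assumes only the structure shown in the example (one quant.sf per sample folder plus tx2gene.tsv under star_salmon/).

```python
from pathlib import Path


def find_quantified_samples(results_dir):
    """Return the sorted names of sample folders that contain a quant.sf
    file under <results_dir>/star_salmon/.

    Hypothetical helper: raises FileNotFoundError if the star_salmon
    folder or its tx2gene.tsv file is missing.
    """
    salmon_dir = Path(results_dir) / "star_salmon"
    if not salmon_dir.is_dir():
        raise FileNotFoundError(f"missing folder: {salmon_dir}")

    tx2gene = salmon_dir / "tx2gene.tsv"
    if not tx2gene.is_file():
        raise FileNotFoundError(f"missing file: {tx2gene}")

    # Each sample is a sub-folder holding its own quant.sf;
    # other sub-folders (without quant.sf) are ignored.
    return sorted(p.parent.name for p in salmon_dir.glob("*/quant.sf"))
```

Calling find_quantified_samples("results") on the example layout would list SAMPLE_1 through SAMPLE_6; an empty list suggests the wrong folder was given.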
Additionally, you will need to prepare a metadata.txt file that looks as follows:
SampleID Levels Status
SAMPLE_1 high ctr
SAMPLE_2 high ctr
SAMPLE_3 med ctr
SAMPLE_4 low case
SAMPLE_5 low case
SAMPLE_6 low case
This should be a text file in which the first column contains the sample IDs and the remaining columns (one or more) give the conditions for each sample. The sample IDs must match the sample folder names under results/star_salmon/ in the input directory.
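A mismatch between the metadata and the quantification folders is easy to catch before a run. The sketch below is illustrative only and is not how DGE2 itself validates its input: the function name check_metadata is hypothetical, and it assumes columns are separated by tabs or spaces and that condition values contain no whitespace.

```python
from pathlib import Path


def check_metadata(metadata_path, results_dir):
    """Compare the sample IDs in a metadata file with the sample folders
    under <results_dir>/star_salmon/.

    Hypothetical helper. Returns (missing, unused): IDs listed in the
    metadata without a quant.sf, and quantified samples not listed.
    """
    with open(metadata_path) as handle:
        # Assumption: whitespace-delimited columns, no spaces inside values
        rows = [line.split() for line in handle if line.strip()]
    if not rows:
        raise ValueError("metadata file is empty")

    header, records = rows[0], rows[1:]
    if len(header) < 2:
        raise ValueError("expected a sample ID column plus at least one condition column")
    for record in records:
        if len(record) != len(header):
            raise ValueError(f"row has {len(record)} fields, expected {len(header)}: {record}")

    sample_ids = [record[0] for record in records]
    if len(set(sample_ids)) != len(sample_ids):
        raise ValueError("duplicate sample IDs in metadata")

    salmon_dir = Path(results_dir) / "star_salmon"
    quantified = {p.parent.name for p in salmon_dir.glob("*/quant.sf")}

    missing = sorted(set(sample_ids) - quantified)
    unused = sorted(quantified - set(sample_ids))
    return missing, unused
```

Both returned lists being empty means every sample in the metadata has a quant.sf and vice versa.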
Now, you can run the pipeline using:
nextflow run lconde-ucl/DGE2 \
-profile <docker/singularity/.../institute> \
--inputdir <PATH/TO/INPUTDIR/> \
--metadata <PATH/TO/METADATA> \
--outdir <OUTDIR>
For more details and further functionality, please refer to the usage documentation.
The pipeline produces text files and plots with the DGE and GSEA results, as well as an HTML report that contains a summary of the DGE results. For more details about the output files and reports, please refer to the output documentation.
DGE2 was developed by Lucia Conde in 2024. It is a DSL2 version of an older (DSL1) DGE pipeline developed in 2019.
If you would like to contribute to this pipeline, please see the contributing guidelines.
This pipeline uses code and infrastructure developed and maintained by the nf-core initiative, and reused here under the MIT license.
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
Additional references for the tools and data used in this pipeline are listed in CITATIONS.