- Ambroise BERTIN
- Jules DUPONT
- Julien GIOVANAZZI
- Matthieu VERLYNDE
The workflow was tested on a machine with the following specifications:
- Linux (tested on Ubuntu 22.04)
- 200 GB of free disk space for latest versions of tools
- 400 GB of free disk space for paper versions of tools (less might be enough, but more than 200 GB is required)
- 16 GB of RAM
- 16 cores
Free disk space is the most critical requirement. The workflow can run with less RAM and fewer cores, but it will take correspondingly longer to complete. With too little free disk space, the workflow will freeze and crash.
The files generated by the workflow are kept on disk so that the results of each step can be analysed.
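The requirements above can be checked before launching a run. The following is a minimal sketch, not part of the repository: the `meets_minimum` helper and the threshold variables are hypothetical names, and the thresholds simply mirror the tested configuration (raise the disk value to 400 for the paper versions of the tools). It assumes a Linux system with `df`, `awk`, `nproc` and `/proc/meminfo` available.

```shell
#!/usr/bin/env bash
# Pre-flight resource check (illustrative sketch, not shipped with the project).

# Thresholds taken from the tested configuration listed above.
REQUIRED_DISK_GB=200   # assumption: use 400 for the paper versions of the tools
REQUIRED_RAM_GB=16
REQUIRED_CORES=16

# meets_minimum LABEL ACTUAL REQUIRED
# Prints a one-line verdict and returns 1 when ACTUAL is below REQUIRED.
meets_minimum() {
    local label=$1 actual=$2 required=$3
    if [ "$actual" -ge "$required" ]; then
        echo "OK: $label $actual >= $required"
    else
        echo "LOW: $label $actual < $required"
        return 1
    fi
}

# Free space in GB on the filesystem that holds the current directory
# (df -Pk reports the available space in 1 KiB blocks in column 4).
free_disk_gb=$(df -Pk . | awk 'NR == 2 {print int($4 / 1048576)}')

# Total RAM in GB, rounded up because the kernel reserves part of the
# physical memory, so a 16 GB machine reports slightly less than 16.
total_ram_gb=$(awk '/^MemTotal:/ {print int(($2 + 1048575) / 1048576)}' /proc/meminfo)

# Number of processing units available to this process.
cores=$(nproc)

status=0
meets_minimum "free disk (GB)" "$free_disk_gb" "$REQUIRED_DISK_GB" || status=1
meets_minimum "RAM (GB)" "$total_ram_gb" "$REQUIRED_RAM_GB" || status=1
meets_minimum "CPU cores" "$cores" "$REQUIRED_CORES" || status=1

if [ "$status" -ne 0 ]; then
    echo "Some resources are below the tested configuration."
fi
```

A LOW verdict for RAM or cores only means a slower run. A LOW verdict for disk space is the one to act on, since that is the case in which the workflow is reported to freeze and crash.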
Check that you have Apptainer installed with the same version as the one used here (see Tools).
apptainer --version
If not, run the following command to install it:
./utilities/install_apptainer.sh
Check that you have Snakemake installed with the same version as the one used here (see Tools).
snakemake --version
If not, see the Snakemake documentation to install it.
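Both version checks can be combined into one step. The sketch below is an illustration only: `check_version` is a hypothetical helper, and the expected versions are the ones listed under Tools. It uses a plain substring match, so it would also accept a longer version string that happens to contain the expected one.

```shell
#!/usr/bin/env bash
# Compare installed tool versions with the ones listed under Tools
# (illustrative helper, not part of the repository).

# check_version NAME EXPECTED ACTUAL
# Succeeds when the ACTUAL version string contains EXPECTED.
check_version() {
    local name=$1 expected=$2 actual=$3
    case "$actual" in
        *"$expected"*)
            echo "$name: $expected found"
            ;;
        *)
            echo "$name: expected $expected, got '${actual:-nothing}'"
            return 1
            ;;
    esac
}

# A missing tool yields an empty string, which is reported as 'nothing'.
check_version Apptainer 1.2.4 "$(apptainer --version 2>/dev/null)" || true
check_version Snakemake 7.32.4 "$(snakemake --version 2>/dev/null)" || true
```

If either line reports a mismatch, install the matching version as described above before running the workflow.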
- Clone the repository
git clone https://github.com/juldpnt/reprohackaton.git
or download the zip file and unzip it.
- Go to the project directory
cd reprohackaton
- Run the workflow with default parameters
./run_workflow.sh
- Run the workflow with custom parameters
To see the list of available parameters, run
./run_workflow.sh -h
Default parameters are set in the config.yaml file. You can change them by editing this file or by using the command-line arguments of run_workflow.sh.
For instance:
./run_workflow.sh -v latest -t true -c 8
will run the workflow with the latest version of the tools, trim the reads and use 8 cores.
- Docker (version 24.0.7)
- Snakemake (version 7.32.4)
- Apptainer (version 1.2.4)
- SRA Toolkit (version 3.0.7)
- Bowtie (latest version 1.2.2 | paper version 0.12.7)
- TrimGalore (latest version 0.6.10 | paper version unknown)
- FastQC (latest version 0.12.1 | paper version 0.11.7)
- Cutadapt (latest version 4.6 | paper version 4.2)
- Subread (latest version 2.0.6 | paper version 1.4.6)
- DESeq2 (latest version 1.42.0 | paper version unknown)
- R (latest version 4.3.2 | paper version unknown)
- dockerfiles: This directory contains the Dockerfiles for the tools used in the project. Each subdirectory represents a different tool, and the Dockerfile within it was used to build the Docker image for that tool.
- resources: This directory contains the resource files used in the project: annotation files (annots.gff), a text file with KEGG pathway information (kegg.txt), gene names (names_genes), and a reference genome sequence (reference.fasta).
- utilities: This directory contains utility scripts. get_graphs.sh plots graphs of Snakemake's DAG and requires running install_graphviz.sh beforehand. install_apptainer.sh must be run if Apptainer is not installed yet.
- workflow: This directory contains the main workflow scripts for the project. The Snakefile is the main script that coordinates the execution of the workflow. The rules directory contains the individual rules of the Snakemake workflow, and the scripts directory contains additional R scripts used in the workflow.
root
├── README.md
├── config.yaml
├── dockerfiles
│ ├── alpine
│ ├── bowtie
│ │ ├── latest
│ │ └── paper
│ ├── deseq2
│ ├── sratoolkit
│ ├── subread
│ │ ├── latest
│ │ └── paper
│ └── trimgalore
├── resources
├── run_workflow.sh
├── utilities
└── workflow
    ├── Snakefile
    ├── rules
    └── scripts