diff --git a/README.md b/README.md
index e6fbfbe..6a5847e 100644
--- a/README.md
+++ b/README.md
@@ -10,38 +10,102 @@ SPeW is a framework for taking a NextGen Seq pipeline (such as RNA-seq, ChIP-seq
 This project was part of the September 2017 Pitt-NCBI Hackathon in Pittsburgh PA
 
 ## Dependencies
-Docker https://www.docker.com/
+Docker
+samtools
+FastQC
+cutadapt
+tophat2
+bowtie2
+cufflinks
+R
+Nextflow
 
-samtools http://www.htslib.org/
+## Methods
 
-FastQC https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
+![ScreenShot](SPeW_workflow.jpg)
 
-cutadapt http://cutadapt.readthedocs.io/en/stable/guide.html
+To create the proof-of-principle RNA-seq pipeline, we started by writing a simple Bash script for each step of the analysis. These steps were then chained together in Nextflow. To make the pipeline run on any workstation, the Nextflow code was wrapped in a Docker container. Because the container bundles the dependencies required by each step, they do not have to be installed separately on the user's workstation. Docker images can also be run with Singularity, which allows the pipeline to be used on a high-performance computing (HPC) cluster.
 
-tophat2 http://ccb.jhu.edu/software/tophat/index.shtml
+### Using Nextflow to String Together Bash Scripts
+Nextflow is easily installed by using curl:
 
-bowtie2 https://github.com/BenLangmead/bowtie
+```
+curl -fsSL get.nextflow.io | bash
+```
 
-cufflinks http://cole-trapnell-lab.github.io/cufflinks/
+After Nextflow is installed, all the Bash scripts are moved into a `bin` folder and made executable. Nextflow adds the `bin` folder of the pipeline directory to the `PATH` of every process, so the scripts can then be called by name.
 
-R https://www.r-project.org/
+```
+mkdir -p bin
+mv *.sh bin
+chmod +x bin/*.sh
+```
+With all the Bash scripts in `bin`, the Nextflow script can now be written. First, the shebang line selects the Nextflow interpreter and the input files are declared as a parameter:
-NextFlow https://www.nextflow.io/
+```
+#!/usr/bin/env nextflow
 
-## Methods
+/*
+* input parameters for the pipeline
+*/
+params.in = "$baseDir/data/*.fastq.gz"
+```
 
-![ScreenShot](SPeW_workflow.jpg)
+A file object is then created from the string parameter:
 
-To create the proof-of-principle simple RNA-seq pipeline, we started with writing simple Bash shell scripts for each step required in the analysis. These individual steps were then combined together by integrating them into Nextflow. In order to allow for seamless running on any workstation, Docker was then used to wrap the Nextflow code. By wrapping into a container such as Docker, all dependencies required for each step are automatically on the users workstation. Docker has the ability to be used by Singularity, allowing the code to be utilized on a High Computing Cluster(HPC).
+```
+inFiles = file(params.in)
+```
 
-### Using Nextflow to String together Bash Scripts
-Nextflow is easily installed by using curl:
+The file object can now be used as the input of the first process, for example:
 
 ```
-curl -fsSL get.nextflow.io | bash
+process trimming {
+
+    input:
+    file reads from inFiles
+}
+```
+
+The output block declares the files matching `trimmed_*` and sends them into a channel (`trimmedFiles`) that the next process reads from.
+
+```
+process trimming {
+
+    input:
+    file reads from inFiles
+
+    output:
+    file 'trimmed_*' into trimmedFiles
+}
 ```
 
-After installing Nexflow,
+Finally, the script block calls the Bash script. The script block can also contain conditional statements, for example to choose between single-end and paired-end trimming:
+
+```
+process trimming {
+
+    input:
+    file reads from inFiles
+
+    output:
+    file 'trimmed_*' into trimmedFiles
+
+    script:
+    def singleEnd = true
+    def adapter1 = "ADAPTER_FWD"
+    def adapter2 = "ADAPTER_REV"
+
+    if (singleEnd)
+    """
+    trimming.sh -i ${reads} -s -a1 ${adapter1} -a2 ${adapter2}
+    """
+    else
+    """
+    trimming.sh -i ${reads} -a1 ${adapter1} -a2 ${adapter2}
+    """
+}
+```
 
 ## Discussion Notes
 ### Overview
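The README calls `trimming.sh` with the flags `-i`, `-s`, `-a1` and `-a2`, but the script itself is not part of this change. The sketch below is a hypothetical `bin/trimming.sh`, assuming it wraps cutadapt (listed under Dependencies). Only those four flags come from the README. The cutadapt options, the `trimmed_` output prefix (chosen to match the `trimmed_*` output pattern), the `TRIM_CMD` array, the `RUN` switch, and the assumption that `-i` takes one file in single-end mode and two in paired-end mode are all illustrative. By default it only prints the cutadapt command it would run.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL sketch of bin/trimming.sh. Only the flags -i, -s, -a1 and -a2
# come from the README; the cutadapt options, the trimmed_ prefix and the
# RUN switch are illustrative assumptions.
set -euo pipefail

TRIM_CMD=()   # filled by build_trim_cmd with the cutadapt command line

# Parse the README's flags and store the matching cutadapt call in TRIM_CMD.
build_trim_cmd() {
  local single_end=false adapter1="" adapter2=""
  local -a inputs=()
  while [[ $# -gt 0 ]]; do
    case "$1" in
      -i)  shift
           # assumed: one file for single-end, two files for paired-end
           while [[ $# -gt 0 && "$1" != -* ]]; do inputs+=("$1"); shift; done ;;
      -s)  single_end=true; shift ;;
      -a1) adapter1="${2:?option -a1 needs a value}"; shift 2 ;;
      -a2) adapter2="${2:?option -a2 needs a value}"; shift 2 ;;
      *)   echo "unknown option: $1" >&2; return 1 ;;
    esac
  done

  if [[ ${#inputs[@]} -eq 0 ]]; then
    echo "no input file given with -i" >&2; return 1
  fi
  local r1="${inputs[0]}"

  if [[ "$single_end" == true ]]; then
    # single-end: only the forward adapter (-a1) is trimmed
    TRIM_CMD=(cutadapt -a "$adapter1" -o "trimmed_$(basename "$r1")" "$r1")
  else
    if [[ ${#inputs[@]} -ne 2 ]]; then
      echo "paired-end mode expects two files after -i" >&2; return 1
    fi
    local r2="${inputs[1]}"
    # paired-end: -a/-A trim read 1/read 2, -o/-p name the two outputs
    TRIM_CMD=(cutadapt -a "$adapter1" -A "$adapter2"
              -o "trimmed_$(basename "$r1")" -p "trimmed_$(basename "$r2")"
              "$r1" "$r2")
  fi
}

# Act only when given arguments, so the file can also be sourced for testing.
if [[ $# -gt 0 ]]; then
  build_trim_cmd "$@"
  echo "${TRIM_CMD[*]}"               # always show the command
  if [[ "${RUN:-0}" == 1 ]]; then     # opt in to actually running cutadapt
    "${TRIM_CMD[@]}"
  fi
fi
```

For example, `bin/trimming.sh -i sample.fastq.gz -s -a1 ADAPTER_FWD -a2 ADAPTER_REV` prints `cutadapt -a ADAPTER_FWD -o trimmed_sample.fastq.gz sample.fastq.gz`, and prefixing the call with `RUN=1` executes it. The flags are parsed by hand with `case` rather than `getopts` because `getopts` only handles single-letter options and would read `-a1` as `-a` with the argument `1`.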