This is bioinformatics sequence-analysis pipelining software.
Currently, it is written in Python, with a single R script to handle 'edgeR' differential expression.
The code is located at
- Make a new directory where you're going to run everything.
- Download the test data (FASTQ reads) from
- There are six of these files.
- Put that test FASTQ data into a new folder named 'test_data'
- You can now run one of the commands in the 'Makefile' for this project, which at the moment has two options:
- test_2groups
- test_3groups
- If you run "make test_2groups", will be run and will generate an output file named ''
- You can then invoke that script by running "bash". That is how you actually run all the bioinformatics tools.
- Finally, if you ran one of the 'make test_2groups' or 'make test_3groups' commands above, you can compare your output to the reference output in the "test_data" directory, for example: 2-group edgeR output.txt
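The comparison in the last step can be done with any diff tool. As a minimal sketch (not part of this project), the helper below reports line-level differences between your output and a reference file. The function name `diff_outputs` and the file names in the usage comment are hypothetical, and the sketch assumes Python 3 and plain-text output. Exact text equality may be too strict if tool versions differ, so treat small numeric differences with care.

```python
import difflib
from pathlib import Path


def diff_outputs(mine, reference):
    """Return unified-diff lines between two text files.

    An empty list means the two files have identical lines.
    """
    mine_lines = Path(mine).read_text().splitlines()
    ref_lines = Path(reference).read_text().splitlines()
    return list(difflib.unified_diff(
        ref_lines, mine_lines,
        fromfile=str(reference), tofile=str(mine),
        lineterm=""))


# Usage (hypothetical file names; substitute your own paths):
#   for line in diff_outputs("my_edgeR_output.txt",
#                            "test_data/reference_edgeR_output.txt"):
#       print(line)
```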
python2 ./ --basedir="/data/projects/kp-600-b2b-osono-data-pipeline-run-feb-16/B-2016-11-November/test_data/" \
--outdir="/data/projects/kp-600-b2b-osono-data-pipeline-run-feb-16/B-2016-11-November/" \
--experiment-id="Test_3_Compare" \
--sample-ids="X1,X2,Y1,Y2,Z3A,Z3B" \
--groups="1,1,2,2,3,3" \
--rna-samples=a1.mm9.chr19.fq.gz,a2.mm9.chr19.fq.gz,b1.mm9.chr19.fq.gz,b2.mm9.chr19.fq.gz,a3.mm9.chr19.fq.gz,b3.mm9.chr19.fq.gz \
--species=mm9 --script=""
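In the command above, `--sample-ids`, `--groups`, and `--rna-samples` each list six comma-separated values. A reasonable reading, stated here as an assumption rather than confirmed behavior, is that the three lists are matched by position. The sketch below (Python 3, not part of the pipeline) pairs them up and checks that the FASTQ files exist under the base directory before a run. The helper names `pair_samples` and `missing_files` are hypothetical.

```python
import os


def pair_samples(sample_ids, groups, rna_samples):
    """Split three comma-separated option values and zip them by position."""
    ids = sample_ids.split(",")
    grps = groups.split(",")
    files = rna_samples.split(",")
    if not (len(ids) == len(grps) == len(files)):
        raise ValueError(
            "expected equal lengths, got %d ids, %d groups, %d files"
            % (len(ids), len(grps), len(files)))
    return list(zip(ids, grps, files))


def missing_files(basedir, filenames):
    """Return the file names that are not present under basedir."""
    return [f for f in filenames
            if not os.path.isfile(os.path.join(basedir, f))]


# Values copied from the example command above.
rows = pair_samples(
    "X1,X2,Y1,Y2,Z3A,Z3B",
    "1,1,2,2,3,3",
    "a1.mm9.chr19.fq.gz,a2.mm9.chr19.fq.gz,b1.mm9.chr19.fq.gz,"
    "b2.mm9.chr19.fq.gz,a3.mm9.chr19.fq.gz,b3.mm9.chr19.fq.gz")
for sample_id, group, fastq in rows:
    print(sample_id, group, fastq)

# "test_data" is a relative path used for illustration; use your own --basedir.
print("missing:", missing_files("test_data", [fastq for _, _, fastq in rows]))
```

Catching a length mismatch or a missing file this way is cheaper than discovering it partway through an alignment run.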
Currently, only RNA-seq has been properly debugged in the updated '' program.
Tophat (splice-aware aligner)
Executable name: tophat (version 2.1.1)
To install: See details at:
Bowtie (non-spliced aligner)
Executable name: bowtie2 (version 2.2.4)
To install: See details at:
BCP (ChIP-seq peak caller for everything except TF binding)
Executable name: BCP_HM (version 1.1)
To install: Available at (Paper: )
bam2bed (Part of the 'bedops' suite. Converts BAM regions to BED files.)
Executable name: bam2bed (version 2.4.20)
Note: This program is required only for running BCP in the ChIP-seq pipeline; nothing else uses it.
To install: Available here: (Paper: )
GEM (motif-aware ChIP-seq peak caller for TF binding sites)
Executable name: gem.jar (version 2.5)
To install: See instructions at:
(We were using version 2.5, but version 2.7+ is available now.)
htseq-count (reads -> gene-level counts)
Executable name: htseq-count (version 0.6.0)
To install: pip2.7 install HTSeq
(Note: htseq-count must be installed via pip (or another package manager). Do not just copy the binaries; that will not work.)
samtools (utilities for working with SAM/BAM alignment files)
Executable name: samtools (version 1.3)
To install: yum install samtools (on Red Hat/CentOS)
(Available through your package manager. Other examples: brew install samtools (Mac Homebrew), apt-get install samtools (Ubuntu))
edgeR (R library for differential expression)
Executable name: NA (version 3.14.0)
To install: source(''); biocLite('edgeR') (Available through Bioconductor.)
Java (runtime required to run 'gem.jar' and 'MarkDuplicates.jar')
Executable name: java (version 1.8.0)
To install: Install via your package manager (e.g. 'yum install java') or from Oracle's web site:
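Before a first run, it can help to confirm that the executables listed above are on your PATH. The following is a minimal sketch, assuming Python 3 (for `shutil.which`) and run separately from the pipeline itself. The list name `REQUIRED` and the function `missing_executables` are hypothetical. It does not check versions, and it does not cover 'gem.jar' or the edgeR R library, which are not PATH executables.

```python
import shutil

# Executable names taken from the list above.
REQUIRED = [
    "tophat",
    "bowtie2",
    "BCP_HM",
    "bam2bed",
    "htseq-count",
    "samtools",
    "java",
]


def missing_executables(names=REQUIRED):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


missing = missing_executables()
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("All listed executables were found on PATH.")
```

If you only run the RNA-seq path, you could pass a shorter list, since the ChIP-seq tools (BCP_HM, bam2bed) are only needed for that pipeline.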