seqpipeline (pipeline.py)

This is bioinformatics sequence-analysis pipeline software.

Currently, it is written in Python, with a single R script to handle 'edgeR' differential expression.

The code is located at https://github.com/gladstone-institutes/seqpipeline

How to run it:

Make a new directory in which to run everything.
Download the test data (FASTQ reads) from http://gb.ucsf.edu/bio/public/kp-600/test_data/
- There are six of these files.
Put the test FASTQ files into a new folder named 'test_data'.
You can now run one of the targets in this project's 'Makefile', which at the moment has two options:
- test_2groups
- test_3groups
If you run "make test_2groups", pipeline.py will run and generate an output script named 'script_test.sh'.
You can then invoke that script by running "bash script_test.sh". This is the step that actually runs all the bioinformatics tools.
Finally, if you ran 'make test_2groups' or 'make test_3groups' above, you can compare your output to the reference output in the "test_data" directory, for example: 2-group edgeR output.txt
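
Before running a make target, it can help to confirm that the test data landed where the pipeline expects it. The following is a minimal sketch, not part of seqpipeline itself. It assumes the six downloaded files are the ones named in the `--rna-samples` argument of the example command; the function name `missing_test_files` is hypothetical, and the sketch is written for Python 3 even though pipeline.py is invoked with python2.

```python
from pathlib import Path

# Assumption: the six test files match the names used in the example
# command's --rna-samples argument. Adjust the list if the download differs.
EXPECTED_FASTQ = [
    "a1.mm9.chr19.fq.gz",
    "a2.mm9.chr19.fq.gz",
    "b1.mm9.chr19.fq.gz",
    "b2.mm9.chr19.fq.gz",
    "a3.mm9.chr19.fq.gz",
    "b3.mm9.chr19.fq.gz",
]


def missing_test_files(test_dir="test_data", expected=EXPECTED_FASTQ):
    """Return the expected file names that are not present in test_dir."""
    directory = Path(test_dir)
    return [name for name in expected if not (directory / name).is_file()]


if __name__ == "__main__":
    missing = missing_test_files()
    if missing:
        print("Missing test files: " + ", ".join(missing))
    else:
        print("All expected test FASTQ files are present.")
```

Run it from the directory that contains 'test_data'; an empty result means every expected file was found.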

Example command:

python2 ./pipeline.py --basedir="/data/projects/kp-600-b2b-osono-data-pipeline-run-feb-16/B-2016-11-November/test_data/" \
    --outdir="/data/projects/kp-600-b2b-osono-data-pipeline-run-feb-16/B-2016-11-November/" \
    --experiment-id="Test_3_Compare" \
    --sample-ids="X1,X2,Y1,Y2,Z3A,Z3B" \
    --groups="1,1,2,2,3,3" \
    --rna-samples=a1.mm9.chr19.fq.gz,a2.mm9.chr19.fq.gz,b1.mm9.chr19.fq.gz,b2.mm9.chr19.fq.gz,a3.mm9.chr19.fq.gz,b3.mm9.chr19.fq.gz \
    --species=mm9 --script="script_3_compare_test.sh"
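
In the example command, `--sample-ids`, `--groups`, and `--rna-samples` each carry six comma-separated entries, which suggests they are matched up position by position. The sketch below assembles the same command from Python lists and checks that the three lists have equal length before anything runs. The helper `build_pipeline_command` is hypothetical, the one-to-one pairing is an assumption inferred from the example rather than documented behavior, and only the flags shown above are used.

```python
import shlex


def build_pipeline_command(basedir, outdir, experiment_id, sample_ids,
                           groups, rna_samples, species, script):
    """Assemble the pipeline.py argument list shown in the example command.

    Assumption: sample IDs, groups, and RNA sample files correspond
    one-to-one, so mismatched list lengths are rejected up front.
    """
    if not (len(sample_ids) == len(groups) == len(rna_samples)):
        raise ValueError(
            "sample_ids, groups, and rna_samples must have the same length"
        )
    return [
        "python2", "./pipeline.py",
        "--basedir=" + basedir,
        "--outdir=" + outdir,
        "--experiment-id=" + experiment_id,
        "--sample-ids=" + ",".join(sample_ids),
        "--groups=" + ",".join(str(g) for g in groups),
        "--rna-samples=" + ",".join(rna_samples),
        "--species=" + species,
        "--script=" + script,
    ]


if __name__ == "__main__":
    # Placeholder paths; substitute your own base and output directories.
    cmd = build_pipeline_command(
        basedir="test_data/",
        outdir="./",
        experiment_id="Test_3_Compare",
        sample_ids=["X1", "X2", "Y1", "Y2", "Z3A", "Z3B"],
        groups=[1, 1, 2, 2, 3, 3],
        rna_samples=[
            "a1.mm9.chr19.fq.gz", "a2.mm9.chr19.fq.gz",
            "b1.mm9.chr19.fq.gz", "b2.mm9.chr19.fq.gz",
            "a3.mm9.chr19.fq.gz", "b3.mm9.chr19.fq.gz",
        ],
        species="mm9",
        script="script_3_compare_test.sh",
    )
    # Print a shell-safe version of the command for inspection.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Returning a list rather than a single string keeps each argument intact if the command is later passed to `subprocess.run`.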

To-do:

Currently, only RNA-seq has been properly debugged in the updated 'pipeline.py' program.

=========================================

Required software:

Tophat (splice-aware aligner)

Executable name: tophat (version 2.1.1)

To install: See details at: http://ccb.jhu.edu/software/tophat/index.shtml

Bowtie (non-spliced aligner)

Executable name: bowtie2 (version 2.2.4)

To install: See details at: http://bowtie-bio.sourceforge.net/index.shtml

BCP (ChIP-seq peak caller for everything except TF binding)

Executable name: BCP_HM (version 1.1)

To install: Available at https://cb.utdallas.edu/BCP/
           (Paper: http://journals.plos.org/ploscompbiol/article?id=10.1371%2Fjournal.pcbi.1002613 )

bam2bed (Part of the 'bedops' suite. Converts BAM regions to BED files.)

Executable name: bam2bed (version 2.4.20)

Note: This program is required only for running BCP in the ChIP-seq pipeline; nothing else uses it.

To install: Available here: http://bedops.readthedocs.io/en/latest/content/installation.html
           (Paper: http://bioinformatics.oxfordjournals.org/content/28/14/1919.abstract )

GEM (motif-aware ChIP-seq peak caller for TF binding sites)

Executable name: gem.jar (version 2.5)

To install: See instructions at: http://groups.csail.mit.edu/cgs/gem/

           (We were using version 2.5, but version 2.7+ is available now.)

htseq-count (reads -> gene-level counts)

Executable name: htseq-count (version 0.6.0)

```
To install: pip2.7 install HTSeq
```

           (Note: htseq-count must be installed via pip or another package manager. Copying the binaries alone will not work.)

samtools

Executable name: samtools (version 1.3)

To install: yum install samtools (on Redhat/CentOS)

           (Available through your package manager. Other examples: brew install samtools (Mac Homebrew), apt-get install samtools (Ubuntu))

edgeR (R library for differential expression)

Executable name: NA (version 3.14.0)

To install: source('https://bioconductor.org/biocLite.R'); biocLite('edgeR')
           (Available through Bioconductor.)

java

Executable name: java (version 1.8.0)

To install: Install via your package manager (e.g. 'yum install java') or from Oracle's web site: https://java.com/en/
           (Java is required to run 'gem.jar' and 'MarkDuplicates.jar'.)
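
With this many external tools, a quick check that each one is reachable can save a failed run. The sketch below looks up the executable names listed in this section on the PATH using Python 3's `shutil.which`. It is an illustration, not part of seqpipeline: `find_missing_executables` is a hypothetical helper, 'gem.jar' and edgeR are left out because they are not PATH commands, and the check does not verify versions.

```python
import shutil

# Executable names taken from the "Required software" list above.
# gem.jar (a Java archive) and edgeR (an R library) are not PATH commands,
# so they are not checked here.
REQUIRED_EXECUTABLES = [
    "tophat",
    "bowtie2",
    "BCP_HM",
    "bam2bed",      # only needed for BCP in the ChIP-seq pipeline
    "htseq-count",
    "samtools",
    "java",
]


def find_missing_executables(names=REQUIRED_EXECUTABLES, which=shutil.which):
    """Return the names that cannot be found on the PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = find_missing_executables()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All listed executables were found on PATH.")
```

Since only RNA-seq is currently debugged, a missing 'BCP_HM' or 'bam2bed' may not matter for an RNA-seq run.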
