-
Notifications
You must be signed in to change notification settings - Fork 1
CGATOxford/Pipeline_iCLIP
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Pipline_iCLIP -------------- This repository contains the code for the iCLIP pipeline developed for project 28. It should contain only that code required for the running of a generic iCLIP analysis. It starts with un-demultiplexed read files with barcodes and UMIs, processes and demuxes them, maps them using a specified mapper, by default STAR. BAMs are then deduplicated using the submodule UMI-Tools, and clusters called using a bionomial distribution within indevidual transcripts. Metagene plots are plotted and simple motif calling done using MEME and DREME. See pipeline documentation: =========================== Pipeline iCLIP =========================== :Author: Ian Sudbery :Release: $Id$ :Date: |today| :Tags: Python A pipeline template. Overview ======== Usage ===== See :ref:`PipelineSettingUp` and :ref:`PipelineRunning` on general information how to use CGAT pipelines. Configuration ------------- The pipeline requires a configured :file:`pipeline.ini` file. The sphinxreport report requires a :file:`conf.py` and :file:`sphinxreport.ini` file (see :ref:`PipelineDocumenation`). To start with, use the files supplied with the :ref:`Example` data. Input ----- The inputs should be a fastq file. The pipeline expects these to be raw fastq files. That is that they contain the UMIs and the barcodes still on the 5' end of the reads. It is expected that these will not be demultiplexed, although demultiplex fastqs will also work. In addition to the fastq files, a table of barcodes and samples is required as sample_table.tsv. It has four columns: The first contains the barcode including UMI bases, marked as Xs. The second contains the barcode sequence without the UMI bases. The third contains the sample name you'd like to use The fourth contains the fastq files that contain reads from this sample e.g. NNNGGTTNN GGTT FlipIn-FLAG-R1 hiseq7,hiseq8,miseq Means that the sample FlipIn-FLAG-R1 should have reads in the fastq files hiseq7, hiseq8 and miseq, is marked by the barcode GGTT and is embeded in the UMI as NNNGGTTNN. Optional inputs +++++++++++++++ Requirements ------------ The pipeline requires the results from :doc:`pipeline_annotations`. Set the configuration variable :py:data:`annotations_database` and :py:data:`annotations_dir`. On top of the default CGAT setup, the pipeline requires the following software to be in the path: +--------------------+-------------------+------------------------------------------------+ |*Program* |*Version* |*Purpose* | +--------------------+-------------------+------------------------------------------------+ |CGAPipelines | |Pipelining infrastructure, mapping pipeline | +--------------------+-------------------+------------------------------------------------+ |CGAT | >=0.2.4 |Various | +--------------------+-------------------+------------------------------------------------+ |Bowtie | >=1.1.2 |Filtering out PhiX reads | +--------------------+-------------------+------------------------------------------------+ |FastQC | >=0.11.2 |Quality Control of demuxed reads | +--------------------+-------------------+------------------------------------------------+ |STAR or HiSat | |Spliced mapping of reads | +--------------------+-------------------+------------------------------------------------+ |bedtools | |Interval manipulation | +--------------------+-------------------+------------------------------------------------+ |samtools | |Read manipulation | +--------------------+-------------------+------------------------------------------------+ |subread | |FeatureCounts read quantifiacation | +--------------------+-------------------+------------------------------------------------+ |MEME |>=4.9.1 |Motif finding | +--------------------+-------------------+------------------------------------------------+ |DREME |>=4.9.1 |Motif finding | +--------------------+-------------------+------------------------------------------------+ |bedGraphToBigWig | |Converstion of results to BigWig | +--------------------+-------------------+------------------------------------------------+ |reaper | 13-100 |Used for demuxing and clipping reads | +--------------------+-------------------+------------------------------------------------+ Pipeline output =============== As well as the report, clusters, as BED files are in the clusters.dir directory, and traces as bigWig files are in the bigwig directory. Both of these are exported as a UCSU genome browser track hub in the export directory. Example ======= Example data and configuration is avaiable in example_data.tar.gz
About
iCLIP analysis pipeline created for project 028
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published