Skip to content

crukci-bioinformatics/BamHeaderToRefflat

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Create a refFLat file from the header of a SAM/BAM file

This repository contains a simple python script that takes the header of a BAM (or SAM) file created by Salmon (Patro et al., 2017) and generates a refFlat file based on the target contigs (ignores "decoy" sequences if present). This refFlat file can then be used in conjunction with the Picard tool CollectRnaSeqMetricx in order to gather data required to a plot a normalised gene body coverage plot.

Prerequisites

You will need Python and samtools installed and on the PATH.

Usage

create_refflat_from_sam_header.py -b <BamFile> -o <OutputFile>

There are two example bam files in example_data. One is from a good quality library and the other from a library with severe RNA degradation.

To test the script:

create_refflat_from_sam_header.py \
    -b test_data/Good_Sample.bam \
    -o refFlat.txt

We can then use this refFlat file in conjunction with Picard's CollectRnaSeqMetrics to generate data for a gene body coverage plot:

java -jar picard.jar CollectRnaSeqMetrics \
    INPUT=test_data/Good_Sample.bam \
    OUTPUT=Good_Sample.RNAseq_metrics.txt \
    REF_FLAT=refFlat.txt \
    STRAND=NONE
java -jar picard.jar CollectRnaSeqMetrics \
    INPUT=test_data/Degraded_Sample.bam \
    OUTPUT=Degraded_Sample.RNAseq_metrics.txt \
    REF_FLAT=refFlat.txt \
    STRAND=NONE

From these we can generate coverage plots for our two samples using an R script.

CoveragePlots.R

This should generate the file "images/Coverage_plots.png" with the following plot:

The coverage traces generated are not as smooth as we normally expect as the two example bam files have been downsampled to just 1.5-1.8 million reads, so that they are small enough to include in the repository.

References

Patro, R., Duggal, G., Love, M. I., Irizarry, R. A., & Kingsford, C. (2017). Salmon provides fast and bias-aware quantification of transcript expression. Nature Methods.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published