A tool set to assess the quality of the per read phasing and reduce the errors.

sinamajidian/phaseme

PhaseME

PhaseME is a tool set for assessing the quality of per-read phasing information and for reducing the errors made during phasing.

Quickstart

1- Your VCF file must be read-based phased. Such a file can be generated by, e.g., WhatsHap.

2- Run PhaseME with Python 3 on Linux to obtain statistics and improve the quality of the phase blocks. The only requirement is NumPy.

tar -xzf precomputed/pairlist.tar.gz
python phaseme.py improver my.vcf output_prefix

If you only want the quality assessment report, use quality instead of improver.
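Before running either command, it can help to confirm that the input VCF really carries phase information. The following is a minimal sketch, not part of PhaseME: it relies only on the VCF convention that phased genotypes use `|` and unphased ones use `/`, and it looks at the first sample column. The function name `phasing_summary` and the demo records are made up for illustration.

```python
def phasing_summary(vcf_lines):
    """Count phased and unphased heterozygous genotypes of the first sample.

    Hypothetical helper (not part of PhaseME). Returns (phased, unphased).
    """
    phased = unphased = 0
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # record has no sample column
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue
        genotype = fields[9].split(":")[format_keys.index("GT")]
        is_phased = "|" in genotype
        alleles = genotype.split("|") if is_phased else genotype.split("/")
        # Only diploid heterozygous calls carry phase information.
        if len(alleles) != 2 or "." in alleles or alleles[0] == alleles[1]:
            continue
        if is_phased:
            phased += 1
        else:
            unphased += 1
    return phased, unphased


# Tiny in-memory demo with made-up records.
demo_lines = [
    "##fileformat=VCFv4.2",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT", "sample1"]),
    "\t".join(["chr1", "100", ".", "A", "G", "50", "PASS", ".", "GT:PS", "0|1:100"]),
    "\t".join(["chr1", "200", ".", "C", "T", "50", "PASS", ".", "GT", "0/1"]),
    "\t".join(["chr1", "300", ".", "G", "A", "50", "PASS", ".", "GT", "1/1"]),
]
print(phasing_summary(demo_lines))
```

For a real file, pass an open file handle instead of the list, e.g. `phasing_summary(open("my.vcf"))`. A result with zero phased genotypes suggests the file has not been through a read-based phasing tool yet.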

Installation Test

To verify that the pipeline is installed correctly, please run it on our sample data, which can be found in the folder example.

python phaseme.py improver example/my.vcf example/out

The output will be a quality assessment report, example/out/quality.csv, as well as an improved version of the input phased VCF, example/out/improved.vcf.
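The installation test can be followed by a quick check that both files were written. This is a small sketch under one assumption: that the output prefix is used as a directory containing `quality.csv` and `improved.vcf`, as in the example paths above. The helper name `missing_outputs` is hypothetical.

```python
from pathlib import Path


def missing_outputs(prefix):
    """Return the expected PhaseME output files that do not exist under prefix.

    Assumes the layout described for the sample run:
    <prefix>/quality.csv and <prefix>/improved.vcf.
    """
    expected = [Path(prefix) / "quality.csv", Path(prefix) / "improved.vcf"]
    return [str(path) for path in expected if not path.is_file()]


missing = missing_outputs("example/out")
if missing:
    print("Missing output files:", ", ".join(missing))
else:
    print("Both output files are present.")
```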

Complete usage

The Quickstart section uses precomputed linkage information. Here, individual-specific linkage information is computed instead. To take full advantage of PhaseME, a few steps are needed before running it.

1- Download the 1000 Genomes reference panel haplotypes.

wget https://mathgen.stats.ox.ac.uk/impute/1000GP_Phase3.tgz

Warning: This download is larger than 10 GB.

2- Download SHAPEIT.

3- Now run PhaseME as follows.

python phaseme.py improver my.vcf output_prefix /path/to/shapeit /path/to/1000G/dataset
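When PhaseME is called from a script, the two linkage-based invocations (precomputed versus individual-specific) differ only in the two trailing paths. The sketch below assembles the argument list accordingly. The function `build_phaseme_command` and its parameter names are illustrative and not part of PhaseME, and using quality together with the SHAPEIT and panel paths is an assumption that the README does not show explicitly.

```python
def build_phaseme_command(mode, vcf, prefix, shapeit=None, panel=None, python="python"):
    """Assemble a phaseme.py command line as a list of arguments.

    Hypothetical wrapper: without shapeit/panel it mirrors the Quickstart call,
    with both it mirrors the call with individual-specific linkage information.
    """
    if mode not in ("quality", "improver"):
        raise ValueError("mode must be 'quality' or 'improver'")
    if (shapeit is None) != (panel is None):
        raise ValueError("shapeit and panel must be given together")
    command = [python, "phaseme.py", mode, vcf, prefix]
    if shapeit is not None:
        command += [shapeit, panel]
    return command


command = build_phaseme_command(
    "improver", "my.vcf", "output_prefix",
    shapeit="/path/to/shapeit", panel="/path/to/1000G/dataset",
)
print(" ".join(command))
# To actually run it: import subprocess; subprocess.run(command, check=True)
```

Building a list rather than a single string avoids shell quoting problems when paths contain spaces.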

Parental mode

PhaseME can also assess and improve the phasing results using parental data instead of linkage information. The user should prepare a three-sample VCF containing the SNVs of the son, mother and father, in this order. This can be done using, e.g., bcftools merge. Before merging, you may need to run bgzip, tabix and bcftools index on all three samples.

To obtain quality insights:

python phaseme.py quality example/trio.vcf example/out_trio_q trio

To improve the phasing results:

python phaseme.py improver example/trio.vcf example/out_trio trio
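Because the trio mode depends on the sample order, it may be worth inspecting the header of the merged VCF before running the commands above. The following sketch reads the `#CHROM` header line, whose columns from the tenth onward are the sample names, and maps them to the expected roles. The function name `trio_samples` and the sample names in the demo are made up.

```python
def trio_samples(vcf_lines):
    """Map the three sample columns of a VCF header to son, mother and father.

    Hypothetical helper: it only reports the column order, it cannot tell
    whether the samples were merged in the right order.
    """
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            samples = line.rstrip("\n").split("\t")[9:]
            if len(samples) != 3:
                raise ValueError(f"expected 3 samples, found {len(samples)}")
            return dict(zip(("son", "mother", "father"), samples))
    raise ValueError("no #CHROM header line found")


# Demo header with made-up sample names.
demo_header = "\t".join(
    ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT",
     "kid", "mom", "dad"]
)
print(trio_samples(["##fileformat=VCFv4.2", demo_header]))
```

If the printed mapping does not match the actual family roles, the samples need to be reordered before the file is passed to PhaseME.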

To use PhaseME on a Mac, please check the folder mac.

Citation:

Please see and cite our manuscript: "PhaseME: automatic assessment of phasing quality and phasing improvement", GigaScience, 2020.

PhaseME has been registered in BioTools.
