VirGenA is a reference-guided assembler of highly variable viral genomes. It is based on iterative mapping and de novo reassembly of highly variable regions, and its specially designed read mapper lets it handle distant reference sequences. VirGenA can separate mixtures of strains belonging to different intraspecies genetic groups (genotypes, subtypes, clades, etc.) and assemble a separate consensus sequence for each group in the mixture.
If provided with a multiple sequence alignment (MSA) of target references, VirGenA selects an optimal reference set, sorts reads to the selected references and outputs consensus sequences corresponding to these references. For each consensus sequence, the multiple sequence alignment of its constituent reads is written in BAM format.
If no MSA is provided, VirGenA works in single-reference mode and uses the user-provided reference.
Multi-fragment references are supported in single-reference mode.
You can use VirGenA for full-genome assembly or just to find the optimal reference set for given FASTQ files with Illumina paired-end reads.
Complete documentation is provided in wiki format.
VirGenA is a Java application: it runs on any platform that supports the JVM. Simply download the latest release file and run it according to the usage instructions.
The following are required to run VirGenA:
-Java version 8 or higher
-VSEARCH binary in any location. The path to the binary is set in the configuration file. The recommended version is included in the distribution.
-BLAST installed locally
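The requirements above can be checked before a first run with a small shell helper. This is only a sketch: the function name missing_tools is made up for illustration, and checking for blastn assumes that the local BLAST installation is BLAST+ and is on PATH. VSEARCH is not checked because its location is taken from the configuration file.

```shell
# Print every named command that is not found on PATH.
# Returns 0 only when all commands were found.
missing_tools() {
    n_missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            n_missing=$((n_missing + 1))
        fi
    done
    [ "$n_missing" -eq 0 ]
}

# "blastn" is an assumption about which BLAST program has to be available.
missing_tools java blastn || echo "Install the tools listed above first."
```

If java is found, running java -version shows whether the installed version is 8 or higher.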
To run VirGenA with the test data, download and unzip the release files.
On Windows:
You can set the number of threads in config_test_win.xml by changing the value of the ThreadNumber element.
Using the Windows command prompt, change directory to the unzipped folder and type:
java -jar ./VirGenA.jar assemble -c config_test_win.xml
On Linux:
You can set the number of threads in config_test_linux.xml by changing the value of the ThreadNumber element.
Make ./tools/vsearch executable by changing its permissions. Then, in a shell, change directory to the unzipped folder and type:
java -jar VirGenA.jar assemble -c config_test_linux.xml
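The thread count can also be changed from the shell instead of a text editor. The sketch below assumes that the element is written on a single line as <ThreadNumber>N</ThreadNumber>; check this against your copy of the file first. The function name set_thread_number is hypothetical and is not part of VirGenA.

```shell
# Replace the text content of the ThreadNumber element in a config file.
# Assumption: the element sits on one line, e.g. <ThreadNumber>1</ThreadNumber>.
set_thread_number() {
    config_file=$1
    threads=$2
    tmp_file="${config_file}.tmp"
    # Write to a temporary file first so the original survives a failed sed.
    sed "s|<ThreadNumber>[^<]*</ThreadNumber>|<ThreadNumber>${threads}</ThreadNumber>|" \
        "$config_file" > "$tmp_file" && mv "$tmp_file" "$config_file"
}

# Example, run from the unzipped release folder:
#   set_thread_number config_test_linux.xml 4
#   chmod +x ./tools/vsearch
#   java -jar VirGenA.jar assemble -c config_test_linux.xml
```

Writing through a temporary file avoids the in-place option of sed, which behaves differently between GNU and BSD versions.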
The test data is an artificial mixture of 100000 paired HIV reads from three different subtypes (01_AE, B and C) in equal proportions. VirGenA should detect these components and assemble a genome-length consensus sequence for each of them.
Results will be stored in the ./res/ folder. The expected output is:
- Files (fasta) with assemblies of three mixture components named after the selected references: 01_AE.TH.90.CM240.U54771_assembly.fasta, B.FR.83.HXB2_LAI_IIIB_BRU.K03455_assembly.fasta, C.BW.96.96BW0502.AF110967_assembly.fasta
- Sorted BAM files with read alignments and the corresponding index files (bai): 'reference_name'_mapped_reads.bam and 'reference_name'_mapped_reads.bai
- Log file.
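A quick way to confirm that the test run produced the files listed above is sketched below. It assumes that 'reference_name' in the BAM and index file names is the same string as the prefix of the corresponding assembly file. The log file is not checked because its name is not given here. The function name check_test_outputs is hypothetical.

```shell
# List the expected test-run files that are absent from a results folder.
# Returns 0 only when every expected file is present.
check_test_outputs() {
    res_dir=${1:-./res}
    n_absent=0
    for ref in \
        01_AE.TH.90.CM240.U54771 \
        B.FR.83.HXB2_LAI_IIIB_BRU.K03455 \
        C.BW.96.96BW0502.AF110967
    do
        # Assumption: BAM and index files share the assembly's reference name.
        for suffix in _assembly.fasta _mapped_reads.bam _mapped_reads.bai; do
            if [ ! -f "$res_dir/$ref$suffix" ]; then
                echo "absent: $ref$suffix"
                n_absent=$((n_absent + 1))
            fi
        done
    done
    [ "$n_absent" -eq 0 ]
}
```

Calling check_test_outputs ./res after the run prints one line per absent file and nothing when all nine files are present.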
Fedonin GG, Fantin YS, Favorov AV, Shipulin GA, Neverov AD. VirGenA: a reference-based assembler for variable viral genomes. Brief Bioinform, 2017 Jul 28. doi: 10.1093/bib/bbx079.