VirSearch - searching viral sequences in metagenomes

Snakemake workflow to detect and classify viruses in metagenome assemblies.

It first detects viral sequences in assemblies (.fa files) with VirSorter2, VIBRANT and DeepVirFinder. Predictions are strictly quality controlled with CheckV, followed by clustering with CD-HIT and taxonomic classification with Demovir.

Installation

Install conda, snakemake (tested v6.3.0) and USEARCH.
Clone the repository (--recursive also fetches the Git submodules)

git clone --recursive https://github.com/alexmsalmeida/virsearch.git

Download and extract necessary databases (uncompressed directory will require a total of 30 GB).

wget http://ftp.ebi.ac.uk/pub/databases/metagenomics/genome_sets/virsearch_db.tar.gz
tar -xzvf virsearch_db.tar.gz
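Because the extracted databases need 30 GB, it can help to confirm there is enough free disk space before downloading. The helper below is a minimal sketch, not part of VirSearch: the function name has_free_gb is hypothetical, and it assumes a POSIX-style df whose fourth column reports available 1 KB blocks.

```shell
# Hypothetical helper (not shipped with VirSearch).
# Succeeds if the filesystem holding DIR has at least MIN_GB gigabytes free.
has_free_gb() {
    local dir="$1" min_gb="$2"
    local free_kb
    # -P forces the portable one-line-per-filesystem layout, -k reports 1 KB blocks;
    # row 2, column 4 is the available space.
    free_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$free_kb" -ge $((min_gb * 1024 * 1024)) ]
}

# Example: check the current directory before running wget.
# has_free_gb . 30 || echo "Less than 30 GB free here" >&2
```

The compressed archive takes additional space until it is deleted, so some headroom above 30 GB is advisable.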

How to run

Edit the config.yml file to point to the input, output and database directories, as well as the location of the USEARCH binary (usearch_binary). The input directory should contain the .fa assemblies to analyse.
Install the necessary conda environments through snakemake

snakemake --use-conda --conda-create-envs-only --cores 1

(option 1) Run the pipeline locally (adjust -j based on the number of available cores)

snakemake --use-conda -k -j 4

(option 2) Run the pipeline on a cluster (e.g., SLURM; replace the account passed to -A and the partition passed to -p with those of your own cluster)

snakemake --use-conda -k -j 100 --cluster-config cluster.yml --cluster 'sbatch -A ALMEIDA-SL3-CPU -p icelake-himem --time=12:00:00 --ntasks={cluster.nCPU} --mem={cluster.mem} -o {cluster.output}'
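Before launching either option, it may be worth confirming that the input directory actually contains .fa assemblies, since files with other extensions (for example .fasta) would not match what the README describes. The following is a minimal sketch: count_fa_assemblies is a hypothetical helper, and it assumes the assemblies sit directly inside the input directory rather than in subfolders.

```shell
# Hypothetical helper (not shipped with VirSearch).
# Prints the number of files ending in .fa directly inside a directory.
count_fa_assemblies() {
    local input_dir="$1"
    # -maxdepth 1 keeps the search non-recursive; tr strips the padding
    # that some wc implementations add around the count.
    find "$input_dir" -maxdepth 1 -type f -name '*.fa' | wc -l | tr -d ' '
}

# Example: use the same input path that is set in config.yml.
# count_fa_assemblies /path/to/input
```

A count of 0 suggests that the input path in config.yml or the file extensions need to be checked before running the workflow.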

Output

The main output files generated per input FASTA are final_predictions.fa and final_predictions_tax.tsv, which contain the viral sequences in FASTA format and their taxonomic annotation, respectively. If these files are empty, it likely means that no high-confidence viral sequences were detected (check the logs of the individual tools to confirm that no other issues arose).
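A quick way to see whether any viral sequences were recovered is to count the FASTA headers in final_predictions.fa. The snippet below is a minimal sketch: count_fasta_records is a hypothetical helper, and the path in the usage comment is a placeholder, because the exact layout of the output directory is not described here.

```shell
# Hypothetical helper (not shipped with VirSearch).
# Prints the number of FASTA records (lines starting with '>') in a file.
count_fasta_records() {
    local fasta="$1"
    if [ ! -f "$fasta" ]; then
        echo "missing file: $fasta" >&2
        return 1
    fi
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so "|| true" keeps an empty file from being reported as an error.
    grep -c '^>' "$fasta" || true
}

# Example (placeholder path, adjust to your output directory):
# count_fasta_records /path/to/output/final_predictions.fa
```

A count of 0 corresponds to the empty-file case described above, in which case the logs of the individual tools are the next place to look.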
