This tool reads a DIAMOND tabular output file (produced with --outfmt 6,
with staxids among the output fields)
and appends columns for the 7 'standard' taxonomy ranks, using information from the NCBI taxonomy database.
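As a rough illustration of what "adding columns" means, the sketch below appends seven rank columns to each row of a tab-separated file. It is not the tool's actual code: the function name `add_rank_columns`, the `toy_lookup` table and the choice to use only the first taxid when staxids holds several values separated by `;` are all assumptions made for this example.

```python
import csv
import io

STANDARD_RANKS = ["superkingdom", "phylum", "class", "order",
                  "family", "genus", "species"]


def add_rank_columns(rows, staxids_index, lookup):
    """Yield each row with 7 rank columns appended on the right.

    `lookup` is a hypothetical callable mapping an int taxid to a
    {rank: name} dict; ranks it does not supply become empty strings.
    """
    for row in rows:
        field = row[staxids_index]
        # staxids can list several taxids separated by ';'.
        # Assumption: this sketch only uses the first one.
        first = field.split(";")[0].strip()
        ranks = lookup(int(first)) if first else {}
        yield list(row) + [ranks.get(rank, "") for rank in STANDARD_RANKS]


def toy_lookup(taxid):
    """Stand-in for a real NCBI taxonomy lookup (hand-written data)."""
    table = {
        9606: {"superkingdom": "Eukaryota", "phylum": "Chordata",
               "class": "Mammalia", "order": "Primates",
               "family": "Hominidae", "genus": "Homo",
               "species": "Homo sapiens"},
    }
    return table.get(taxid, {})


# The format string names the columns; the leading "6" is the format code.
outfmt = "6 qseqid sseqid pident evalue staxids"
staxids_index = outfmt.split()[1:].index("staxids")

data = "q1\ts1\t98.5\t1e-50\t9606\nq2\ts2\t80.0\t1e-10\t\n"
reader = csv.reader(io.StringIO(data), delimiter="\t")
out = io.StringIO()
writer = csv.writer(out, delimiter="\t", lineterminator="\n")
writer.writerows(add_rank_columns(reader, staxids_index, toy_lookup))
print(out.getvalue(), end="")
```

Rows without a taxid keep their original fields and receive seven empty columns, so every output row has the same width.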
Simple usage: diamond_add_taxonomy [OPTIONS] DIAMOND_OUTPUT_FILE
Full usage:
Usage: diamond_add_taxonomy [OPTIONS] DIAMOND_OUTPUT_FILE
annotate_diamond - add lineage info to DIAMOND output file that includes
staxids
A new output file is created with 7 extra columns on the right hand side
that contain the standard ranks superkingdom, phylum, class, order,
family, genus and species corresponding to the NCBI taxid in the staxids
column.
The taxonomy lookup is performed using the NCBI taxonomy database via ete3
NCBITaxa. If either a saved copy of the taxdump.tar.gz file or the sqlite3
db generated by NCBITaxa is available these can be provided to reduce
network usage and speed up processing.
Args:
  diamond_output_file(file) - file containing output from DIAMOND
  diamond_output_format(str) - format used for --outfmt with DIAMOND, must contain staxids field
  output_file(file) - file to write output to (default is sys.stdout)
  taxdump_filename(str) - path to NCBI taxdump.tar.gz file for the taxonomy resolver (optional)
  taxdb_filename(str) - path to a sqlite3 db created from NCBI taxdump.tar.gz by ete3 NCBITaxa
Options:
  --taxdump_filename PATH       Path to local copy of NCBI taxdump.tar.gz file
  --taxdb_filename PATH         Name for the processed database, will be loaded if it exists
  --diamond_output_format TEXT  Output format used by DIAMOND (must include staxids)
  --output_file FILENAME        Output file to write output with expanded taxonomy information (TSV format)
  --help                        Show this message and exit.
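The help text above says the lookup goes through ete3 NCBITaxa. The sketch below shows one way the seven standard ranks could be picked out of a full lineage. It is an assumption about the approach, not the tool's source: `ranks_from_lineage` and `lookup_with_ete3` are hypothetical names, and the demo data is a hand-written, abbreviated human lineage. The ete3 calls used (`get_lineage`, `get_rank`, `get_taxid_translator`) are part of the NCBITaxa API; the import is done lazily so the pure helper can be tried without ete3 or a taxonomy database.

```python
STANDARD_RANKS = ["superkingdom", "phylum", "class", "order",
                  "family", "genus", "species"]


def ranks_from_lineage(lineage, rank_of, name_of):
    """Reduce a full lineage to the 7 standard ranks.

    lineage: taxids ordered from the root down to the taxon
    rank_of: {taxid: rank}, name_of: {taxid: scientific name}
    Ranks absent from the lineage are returned as empty strings.
    """
    found = {}
    for taxid in lineage:
        rank = rank_of.get(taxid)
        if rank in STANDARD_RANKS:
            found[rank] = name_of.get(taxid, "")
    return [found.get(rank, "") for rank in STANDARD_RANKS]


def lookup_with_ete3(taxid, taxdb_filename=None, taxdump_filename=None):
    """Hypothetical wrapper: resolve one taxid through ete3 NCBITaxa."""
    from ete3 import NCBITaxa  # lazy import, needs ete3 installed

    # Assumption: the two paths correspond to the CLI options of the
    # same names. Without a usable local database, NCBITaxa fetches
    # taxdump.tar.gz over the network.
    ncbi = NCBITaxa(dbfile=taxdb_filename, taxdump_file=taxdump_filename)
    lineage = ncbi.get_lineage(taxid)
    return ranks_from_lineage(lineage,
                              ncbi.get_rank(lineage),
                              ncbi.get_taxid_translator(lineage))


# Offline demo with a hand-written excerpt of the lineage of taxid 9606.
lineage = [1, 2759, 33208, 7711, 40674, 9443, 9604, 9605, 9606]
rank_of = {1: "no rank", 2759: "superkingdom", 33208: "kingdom",
           7711: "phylum", 40674: "class", 9443: "order",
           9604: "family", 9605: "genus", 9606: "species"}
name_of = {1: "root", 2759: "Eukaryota", 33208: "Metazoa",
           7711: "Chordata", 40674: "Mammalia", 9443: "Primates",
           9604: "Hominidae", 9605: "Homo", 9606: "Homo sapiens"}
print(ranks_from_lineage(lineage, rank_of, name_of))
```

Intermediate ranks such as kingdom are skipped, which is why the output always has exactly seven entries regardless of how deep the lineage is.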
Install with pip:
pip install diamond_add_taxonomy
Alternatively, use the Docker image on quay.io or use Singularity (an image is available
on the SANBI cluster).
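Once installed, the command can also be driven from a Python script. The sketch below only assembles the argument list from the options documented in the help text; the file names are placeholders, and the assumption that --diamond_output_format takes the same string passed to DIAMOND's --outfmt comes from the Args description rather than from testing.

```python
import subprocess


def build_command(diamond_output_file, output_file, output_format=None,
                  taxdb_filename=None, taxdump_filename=None):
    """Assemble a diamond_add_taxonomy command line as a list of strings."""
    cmd = ["diamond_add_taxonomy"]
    if taxdump_filename:
        cmd += ["--taxdump_filename", taxdump_filename]
    if taxdb_filename:
        cmd += ["--taxdb_filename", taxdb_filename]
    if output_format:
        cmd += ["--diamond_output_format", output_format]
    cmd += ["--output_file", output_file, diamond_output_file]
    return cmd


def annotate(*args, **kwargs):
    """Run the command; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(build_command(*args, **kwargs), check=True)


# Placeholder file names, for illustration only.
cmd = build_command(
    "hits.tsv",
    "hits_with_taxonomy.tsv",
    output_format="6 qseqid sseqid pident evalue staxids",
    taxdb_filename="taxa.sqlite",
)
print(cmd)
```

Passing the arguments as a list avoids shell quoting problems with the format string, which contains spaces.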