Tools for working with NCBI taxonomy database (and DIAMOND output)

pvanheus/diamond_add_taxonomy

diamond_add_taxonomy

This tool reads a DIAMOND output file (produced with --outfmt 6 and a field list that includes staxids) and adds columns for the 7 'standard' taxonomy ranks, using information from the NCBI taxonomy database.
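As a rough illustration of what "adding columns for the 7 standard ranks" involves, the sketch below reduces a full lineage to one value per standard rank. It is not taken from the tool's source: `STANDARD_RANKS`, `lineage_columns` and the hand-written `example` lineage are hypothetical names and data used only for this example. With ete3, the (rank, name) pairs could plausibly be assembled from `NCBITaxa.get_lineage`, `NCBITaxa.get_rank` and `NCBITaxa.get_taxid_translator`, but that step is left out here so the sketch runs without a taxonomy database.

```python
# The seven standard ranks, in the order the extra columns are described.
STANDARD_RANKS = (
    "superkingdom", "phylum", "class", "order", "family", "genus", "species",
)


def lineage_columns(lineage, missing=""):
    """Reduce a lineage to one name per standard rank.

    lineage: iterable of (rank, name) pairs, root first.
    missing: value used when the lineage has no entry for a rank
             (an assumption; the real tool may use a different placeholder).
    """
    by_rank = {rank: name for rank, name in lineage}
    return [by_rank.get(rank, missing) for rank in STANDARD_RANKS]


# Hand-written, illustrative lineage (not fetched from NCBI).
example = [
    ("no rank", "root"),
    ("superkingdom", "Eukaryota"),
    ("kingdom", "Metazoa"),
    ("phylum", "Chordata"),
    ("class", "Mammalia"),
    ("order", "Primates"),
    ("family", "Hominidae"),
    ("genus", "Homo"),
    ("species", "Homo sapiens"),
]

columns = lineage_columns(example)
```

Ranks outside the standard seven (such as kingdom above) are dropped, and ranks absent from a lineage are filled with the placeholder so every row gains exactly 7 columns.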

Simple usage: diamond_add_taxonomy [OPTIONS] DIAMOND_OUTPUT_FILE

Full usage:

Usage: diamond_add_taxonomy [OPTIONS] DIAMOND_OUTPUT_FILE

  annotate_diamond - add lineage info to DIAMOND output file that includes
  staxids

  A new output file is created with 7 extra columns on the right hand side
  that contain the standard ranks superkingdom, phylum, class, order,
  family, genus and species corresponding to the NCBI taxid in the staxids
  column.

  The taxonomy lookup is performed using the NCBI taxonomy database via ete3
  NCBITaxa. If either a saved copy of the taxdump.tar.gz file or the sqlite3
  db generated by NCBITaxa is available these can be provided to reduce
  network usage and speed up processing.

  Args:
    diamond_output_file(file) - file containing output from DIAMOND
    diamond_output_format(str) - format used for --outfmt with DIAMOND, must
      contain staxids field
    output_file(file) - file to write output to (default is sys.stdout)
    taxdump_filename(str) - path to NCBI taxdump.tar.gz file for the
      taxonomy resolver (optional)
    taxdb_filename(str) - path to a sqlite3 db created from NCBI
      taxdump.tar.gz by ete3 NCBITaxa

Options:
  --taxdump_filename PATH       Path to local copy of NCBI taxdump.tar.gz file
  --taxdb_filename PATH         Name for the processed database, will be
                                loaded if it exists
  --diamond_output_format TEXT  Output format used by DIAMOND (must include
                                staxids)
  --output_file FILENAME        Output file to write output with expanded
                                taxonomy information (TSV format)
  --help                        Show this message and exit.
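The requirement that the output format include staxids can be pictured with the following sketch, which finds the staxids column from a format string and appends extra columns to one tab-separated line. It is an illustration under stated assumptions, not the tool's actual code: `staxids_index`, `annotate_line` and the `lookup` callable are hypothetical, the format string is assumed to be a space-separated field list with an optional leading `6`, and when staxids holds several semicolon-separated taxids only the first is used (the real tool may handle this differently).

```python
def staxids_index(outfmt):
    """Return the 0-based column index of staxids in a DIAMOND field list.

    Assumes a space-separated field list, optionally starting with "6".
    """
    fields = outfmt.split()
    if fields and fields[0] == "6":
        fields = fields[1:]  # drop the format number, keep the field names
    if "staxids" not in fields:
        raise ValueError("output format must include the staxids field")
    return fields.index("staxids")


def annotate_line(line, index, lookup):
    """Append taxonomy columns to one TSV line.

    lookup: hypothetical callable mapping a taxid string to a list of
            rank names (for example, the 7 standard ranks).
    """
    cols = line.rstrip("\n").split("\t")
    # Assumption: use only the first taxid when several are listed.
    taxid = cols[index].split(";")[0]
    return "\t".join(cols + list(lookup(taxid)))
```

In this sketch the original columns are preserved unchanged and the new ones are added on the right hand side, matching the layout described in the help text.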

installation

Install with pip install diamond_add_taxonomy, use the Docker image on quay.io, or use Singularity (an image is available on the SANBI cluster).

build status

CircleCI
