
GTDB taxonomy for the mOTUs

Alessio Milanese edited this page Sep 12, 2022 · 4 revisions

We annotated all genomes used for mOTUs with GTDB-Tk, then merged the per-genome annotations at the mOTU level. Each mOTU represents a cluster of genomes (see below for details).

Here is the taxonomy:

| mOTUs version | File | Annotation tool |
| --- | --- | --- |
| mOTUs 3.0.0 - 3.0.3 | mOTUs_3.0.0_GTDB_tax.tsv | GTDB-Tk version 2.1 on database release 207 |

Annotation of mOTUs

Each mOTU cluster is composed of 1 or more genomes, and for each genome we have a GTDB annotation that looks like:

GUT_GENOME002602	d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__Bacteroides fragilis
GUT_GENOME002402	d__Bacteria;p__Firmicutes;c__Bacilli;o__RF39;f__UBA660;g__;s__

Note that for each taxonomic level there is either an annotation (example: s__Bacteroides fragilis) or a missing annotation (example: s__). For the evaluation that we are doing here we consider s__ (or g__, etc.) as NA.
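The NA convention above can be illustrated with a minimal Python sketch. The function name `parse_lineage` is hypothetical and not part of mOTUs or GTDB-Tk; the sketch only assumes the tab-separated `genome<TAB>lineage` layout shown in the example lines.

```python
def parse_lineage(lineage):
    """Split a GTDB lineage string into {rank prefix: annotation or None}."""
    parsed = {}
    for field in lineage.split(";"):
        field = field.strip()
        prefix, _, name = field.partition("__")
        # An empty name (e.g. "s__" or "g__") is a missing annotation, i.e. NA.
        parsed[prefix] = field if name else None
    return parsed


line = "GUT_GENOME002402\td__Bacteria;p__Firmicutes;c__Bacilli;o__RF39;f__UBA660;g__;s__"
genome_id, lineage = line.split("\t")
print(genome_id, parse_lineage(lineage))
```

Here `None` plays the role of NA, so the genus and species entries of this genome are `None` while the family entry keeps its full `f__UBA660` label.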

Each taxonomic level in a mOTU receives one of three annotation types:

Agreeing

If at least 80% of the genomes agree on one annotation, that annotation is selected. Note that we consider only annotations that are not NA. For example, if a mOTU has 20 genomes, each of the following cases yields an "agreeing" annotation:

Species:       s__Bacteroides fragilis
# of genomes:                       20

Annotated as s__Bacteroides fragilis as 100% of the genomes (20/20) agree at species level.

Species:       s__Bacteroides fragilis     NA
# of genomes:                       11      9

Annotated as s__Bacteroides fragilis, as 100% of the annotated genomes (11/11) agree at species level; the 9 NA genomes are not counted.

Species:       s__Bacteroides fragilis   s__Bacteroides vulgatus     NA
# of genomes:                       11                         1      8

Annotated as s__Bacteroides fragilis, as 91.7% of the annotated genomes (11/12) agree at species level; the 8 NA genomes are not counted.
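The three cases above can be reproduced with a short Python sketch of the 80% rule. The function name `agreeing_annotation` and the use of `None` as the NA key are illustrative assumptions, not the actual mOTUs implementation.

```python
def agreeing_annotation(counts, threshold=0.8):
    """Return (annotation, fraction) if one annotation reaches the threshold, else None.

    `counts` maps annotation -> number of genomes; the key None stands for NA.
    """
    # NA genomes are excluded from both the numerator and the denominator.
    annotated = {name: n for name, n in counts.items() if name is not None}
    total = sum(annotated.values())
    if total == 0:
        return None  # no genome is annotated at this level
    best = max(annotated, key=annotated.get)
    fraction = annotated[best] / total
    return (best, fraction) if fraction >= threshold else None


examples = [
    {"s__Bacteroides fragilis": 20},
    {"s__Bacteroides fragilis": 11, None: 9},
    {"s__Bacteroides fragilis": 11, "s__Bacteroides vulgatus": 1, None: 8},
]
for counts in examples:
    print(agreeing_annotation(counts))
```

All three examples return s__Bacteroides fragilis, with fractions 20/20, 11/11 and 11/12 respectively.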

Not annotated

If none of the genomes has an annotation at that taxonomic level, the level is not annotated. Example:

Species:           NA
# of genomes:      20

Note that in the mOTUs taxonomy we report this as Not_annotated [<last annotated level>]. For example, if a mOTU is composed of 3 genomes with the following annotations:

GUT_GENOME002402	d__Bacteria;p__Firmicutes;c__Bacilli;o__RF39;         ;g__;s__
GUT_GENOME002403	d__Bacteria;p__Firmicutes;c__Bacilli;o__RF39;f__UBA660;g__;s__
GUT_GENOME002404	d__Bacteria;p__Firmicutes;c__Bacilli;o__RF39;f__UBA660;g__;s__

The mOTU annotation will be:

ref_mOTU_v3_00002	d__Bacteria;p__Firmicutes;c__Bacilli;o__RF39;f__UBA660;Not_annotated [f__UBA660];Not_annotated [f__UBA660]
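The placeholder format can be sketched in Python as follows. The sketch assumes the per-level merge has already been done and that unannotated levels are represented by `None`; the function name `fill_not_annotated` is hypothetical.

```python
def fill_not_annotated(levels):
    """Replace None entries with 'Not_annotated [<last annotated level>]'."""
    filled = []
    last_annotated = None
    for level in levels:
        if level is None:
            # Point back to the most recent level that did get an annotation.
            # A missing domain (no earlier level) is not covered by this sketch.
            filled.append(f"Not_annotated [{last_annotated}]")
        else:
            filled.append(level)
            last_annotated = level
    return ";".join(filled)


merged = ["d__Bacteria", "p__Firmicutes", "c__Bacilli", "o__RF39", "f__UBA660", None, None]
print(fill_not_annotated(merged))
```

For the three genomes above, family resolves to f__UBA660 (2 of the 2 annotated genomes agree), so both genus and species point back to that level.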

Incongruent

If the genomes do not agree at a given taxonomic level (less than 80% agreement among the annotated genomes), the level is incongruent. Example:

Species:       s__Bacteroides fragilis   s__Bacteroides vulgatus     NA
# of genomes:                       11                         7      2

Here the most common annotation is s__Bacteroides fragilis, but only 11 of the 18 annotated genomes (11+7; the NA are not counted) agree, which is 61% and below the 80% threshold. Hence this level is annotated as Incongruent [<last annotated level>].

Here is an example of a mOTU with 5 genomes:

GUT_GENOME002602  d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__Bacteroides fragilis
GUT_GENOME002603  d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__Bacteroides fragilis
GUT_GENOME002604  d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__Bacteroides vulgatus
GUT_GENOME002605  d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__Bacteroides vulgatus
GUT_GENOME002606  d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__Bacteroides vulgatus

At species level there are 3 s__Bacteroides vulgatus and 2 s__Bacteroides fragilis, so the best agreement is 60% (3/5), below the 80% threshold. The mOTU annotation is:

d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;Incongruent [g__Bacteroides]
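Putting the three outcomes together, the merge can be sketched in Python as below. This is an illustration under stated assumptions, not the code used to build the taxonomy file: the function name `merge_lineages` is hypothetical, every lineage is assumed to have the same number of levels, and each level is evaluated independently, with the bracket always naming the most recent agreeing level. How levels below an incongruent level are handled is not described on this page, so that part in particular is an assumption.

```python
from collections import Counter


def merge_lineages(lineages, threshold=0.8):
    """Merge per-genome GTDB lineages into one mOTU-level lineage string."""
    # One column per taxonomic level, one entry per genome.
    columns = zip(*(lineage.split(";") for lineage in lineages))
    merged, last_annotated = [], None
    for column in columns:
        # Keep only real annotations; "s__", "g__" or blank fields count as NA.
        names = [f.strip() for f in column if f.strip().partition("__")[2]]
        if not names:
            merged.append(f"Not_annotated [{last_annotated}]")
            continue
        best, n_best = Counter(names).most_common(1)[0]
        if n_best / len(names) >= threshold:
            merged.append(best)  # agreeing
            last_annotated = best
        else:
            merged.append(f"Incongruent [{last_annotated}]")
    return ";".join(merged)


prefix = "d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;"
lineages = [prefix + "s__Bacteroides fragilis"] * 2 + [prefix + "s__Bacteroides vulgatus"] * 3
print(merge_lineages(lineages))
```

On the 5-genome example this prints the lineage shown above, ending in Incongruent [g__Bacteroides].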