GTDB taxonomy for the mOTUs

Alessio Milanese edited this page Sep 12, 2022 · 4 revisions

We annotated all genomes used for mOTUs with GTDB-Tk. We then merged the genome annotations at the mOTU level, where each mOTU represents a cluster of genomes (see below for details).

Here is the taxonomy:

mOTUs version          File                        Annotation tool
mOTUs 3.0.0 - 3.0.3    mOTUs_3.0.0_GTDB_tax.tsv    GTDB-Tk version 2.1 on database release 207

Annotation of mOTUs

Each mOTU cluster is composed of 1 or more genomes, and for each genome we have a GTDB annotation that looks like:

GUT_GENOME002602	d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__Bacteroides fragilis
GUT_GENOME002402	d__Bacteria;p__Firmicutes;c__Bacilli;o__RF39;f__UBA660;g__;s__

Note that for each taxonomic level there is either an annotation (example: s__Bacteroides fragilis) or a missing annotation (example: s__). For the evaluation that we are doing here we consider s__ (or g__, etc.) as NA.
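As an illustration of the NA convention above, here is a minimal Python sketch that splits one annotation line into its seven levels and maps bare prefixes such as `s__` to `None`. The names `RANK_PREFIXES` and `parse_lineage` are hypothetical and not part of the mOTUs code base.

```python
# Rank prefixes in GTDB order: domain, phylum, class, order, family, genus, species.
RANK_PREFIXES = ("d__", "p__", "c__", "o__", "f__", "g__", "s__")

def parse_lineage(lineage):
    """Split a GTDB lineage string into 7 entries; missing annotations become None (NA)."""
    fields = [field.strip() for field in lineage.split(";")]
    parsed = []
    for prefix, field in zip(RANK_PREFIXES, fields):
        # An empty field or a bare prefix (e.g. "s__") carries no taxon name.
        parsed.append(None if field in ("", prefix) else field)
    return parsed

line = "GUT_GENOME002402\td__Bacteria;p__Firmicutes;c__Bacilli;o__RF39;f__UBA660;g__;s__"
genome_id, lineage = line.split("\t")
print(genome_id, parse_lineage(lineage))
```

Here the genus and species entries come back as `None`, which is how NA is represented in this sketch.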

Each taxonomic level in a mOTU receives one of three annotation types:

Agreeing

If at least 80% of the genomes agree on one annotation, then that annotation is selected. Note that we consider only annotations that are not NA. For example, if we have 20 genomes in a mOTU, all of the following cases yield an "agreeing" annotation:

Species:       s__Bacteroides fragilis
# of genomes:                       20

Annotated as s__Bacteroides fragilis as 100% of the genomes (20/20) agree at species level.

Species:       s__Bacteroides fragilis     NA
# of genomes:                       11      9

Annotated as s__Bacteroides fragilis as 100% of the genomes (11/11) agree at species level.

Species:       s__Bacteroides fragilis   s__Bacteroides vulgatus     NA
# of genomes:                       11                         1      8

Annotated as s__Bacteroides fragilis as 91.7% of the genomes (11/12) agree at species level.
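The arithmetic in these three cases can be reproduced with a short Python sketch. It assumes NA is represented as `None`; the function names `top_agreement` and `is_agreeing` are hypothetical. The threshold check uses integer arithmetic so that exactly 80% counts as agreeing.

```python
from collections import Counter

def top_agreement(annotations):
    """Return (name, votes, annotated) for the most common non-NA annotation."""
    counts = Counter(a for a in annotations if a is not None)
    annotated = sum(counts.values())  # genomes with an annotation; NA is excluded
    if annotated == 0:
        return None, 0, 0
    name, votes = counts.most_common(1)[0]
    return name, votes, annotated

def is_agreeing(votes, annotated, threshold_pct=80):
    """True if at least threshold_pct percent of the annotated genomes agree."""
    return annotated > 0 and votes * 100 >= threshold_pct * annotated

# Third case above: 11 fragilis, 1 vulgatus, 8 NA.
species = ["s__Bacteroides fragilis"] * 11 + ["s__Bacteroides vulgatus"] + [None] * 8
name, votes, annotated = top_agreement(species)
print(name, f"{votes}/{annotated}", is_agreeing(votes, annotated))
```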

Not annotated

If none of the genomes has an annotation at that taxonomic level. Example:

Species:           NA
# of genomes:      20

Note that in the mOTUs taxonomy we report it as Not_annotated [<last annotated level>]. For example, if a mOTU is composed of 3 genomes with these annotations:

GUT_GENOME002402	d__Bacteria;p__Firmicutes;c__Bacilli;o__RF39;         ;g__;s__
GUT_GENOME002403	d__Bacteria;p__Firmicutes;c__Bacilli;o__RF39;f__UBA660;g__;s__
GUT_GENOME002404	d__Bacteria;p__Firmicutes;c__Bacilli;o__RF39;f__UBA660;g__;s__

The mOTU annotation will be:

ref_mOTU_v3_00002	d__Bacteria;p__Firmicutes;c__Bacilli;o__RF39;f__UBA660;Not_annotated [f__UBA660];Not_annotated [f__UBA660]
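A minimal Python sketch of how the Not_annotated label can carry the last annotated level down the lineage. It assumes the per-level consensus has already been computed, with `None` where no genome is annotated; `label_not_annotated` is a hypothetical name. In the example above, two of the three genomes carry f__UBA660 and the third is NA at family level, so the family consensus is f__UBA660.

```python
def label_not_annotated(consensus):
    """consensus: agreed names from domain to species, None where all genomes are NA."""
    labels = []
    last_annotated = None  # most recent level that has a name
    for name in consensus:
        if name is not None:
            last_annotated = name
            labels.append(name)
        else:
            labels.append(f"Not_annotated [{last_annotated}]")
    return labels

consensus = ["d__Bacteria", "p__Firmicutes", "c__Bacilli", "o__RF39", "f__UBA660", None, None]
print(";".join(label_not_annotated(consensus)))
```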

Incongruent

If the genomes do not agree at a given taxonomic level, that is, no annotation reaches 80% agreement. Example:

Species:       s__Bacteroides fragilis   s__Bacteroides vulgatus     NA
# of genomes:                       11                         7      2

Here the most common annotation is s__Bacteroides fragilis, but only 11 of the 18 annotated genomes (11+7; the NA are not counted) agree, which is 61% and below the 80% threshold. Hence this level is annotated as Incongruent [<last annotated level>].

Here is an example of a mOTU with 5 genomes:

GUT_GENOME002602  d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__Bacteroides fragilis
GUT_GENOME002603  d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__Bacteroides fragilis
GUT_GENOME002604  d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__Bacteroides vulgatus
GUT_GENOME002605  d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__Bacteroides vulgatus
GUT_GENOME002606  d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__Bacteroides vulgatus

At species level we have 3 s__Bacteroides vulgatus and 2 s__Bacteroides fragilis, so the best agreement is 60% (3/5). The mOTU annotation is:

d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;Incongruent [g__Bacteroides]

Note: when a level is incongruent, all levels below it are also set to incongruent. Otherwise we could have a situation where the levels from phylum to genus are incongruent but the species level is not (if some annotations are NA at species level). Example of a mOTU with two genomes:

{'d__Bacteria': 2}
{'p__Bacteroidota': 1, 'p__Riflebacteria': 1}
{'c__Bacteroidia': 1, 'c__Ozemobacteria': 1}
{'o__Bacteroidales': 1, 'o__Ozemobacterales': 1}
{'f__Bacteroidaceae': 1, 'f__Ozemobacteraceae': 1}
{'g__Prevotella': 1, 'g__RUG334': 1}
{'NA': 1, 's__RUG334': 1}

One of the two genomes is not annotated at species level, hence s__RUG334 would have a 100% agreement and it would not be "Incongruent" like the genus level. But we prevent this, hence the mOTUs annotation is:

ext_mOTU_v3_22969	d__Bacteria	Incongruent [d__Bacteria]	Incongruent [d__Bacteria]	Incongruent [d__Bacteria]	Incongruent [d__Bacteria]	Incongruent [d__Bacteria]	Incongruent [d__Bacteria]
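The three rules and the downward propagation of Incongruent can be put together in one Python sketch. It takes per-level counts in the same dictionary form as the example above, with the key 'NA' for missing annotations. The name `annotate_levels` and the "unknown" placeholder are assumptions; the text above does not describe the case where the domain level itself has no agreed annotation.

```python
def annotate_levels(level_counts, threshold_pct=80):
    """level_counts: one {annotation: number of genomes} dict per level, domain first."""
    labels = []
    last_annotated = "unknown"  # assumption: placeholder if even the domain is unresolved
    incongruent = False         # once True, every lower level is forced to Incongruent
    for counts in level_counts:
        votes = {name: n for name, n in counts.items() if name != "NA"}
        annotated = sum(votes.values())  # NA genomes are not counted
        if not incongruent and annotated > 0:
            best = max(votes, key=votes.get)
            if votes[best] * 100 >= threshold_pct * annotated:
                labels.append(best)      # agreeing
                last_annotated = best
                continue
            incongruent = True           # below threshold: propagate downwards
        if incongruent:
            labels.append(f"Incongruent [{last_annotated}]")
        else:
            labels.append(f"Not_annotated [{last_annotated}]")
    return labels

two_genomes = [
    {'d__Bacteria': 2},
    {'p__Bacteroidota': 1, 'p__Riflebacteria': 1},
    {'c__Bacteroidia': 1, 'c__Ozemobacteria': 1},
    {'o__Bacteroidales': 1, 'o__Ozemobacterales': 1},
    {'f__Bacteroidaceae': 1, 'f__Ozemobacteraceae': 1},
    {'g__Prevotella': 1, 'g__RUG334': 1},
    {'NA': 1, 's__RUG334': 1},
]
print("\t".join(annotate_levels(two_genomes)))
```

With the two-genome input, the species level is labelled Incongruent [d__Bacteria] even though s__RUG334 alone would reach 100% agreement, because the flag set at phylum level stays on for every level below.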