GTDB taxonomy for the mOTUs
We annotated all genomes used for mOTUs with GTDB-Tk, then merged the genome annotations at the mOTU level. Each mOTU represents a cluster of genomes (see below for details).
Here is the taxonomy:
mOTUs version | File | Annotation tool |
---|---|---|
mOTUs 3.0.0 - 3.0.3 | mOTUs_3.0.0_GTDB_tax.tsv | GTDB-Tk version 2.1 on database release 207 |
Each mOTU cluster is composed of 1 or more genomes, and for each genome we have a GTDB annotation that looks like:
GUT_GENOME002602 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__Bacteroides fragilis
GUT_GENOME002402 d__Bacteria;p__Firmicutes;c__Bacilli;o__RF39;f__UBA660;g__;s__
Note that for each taxonomic level there is either an annotation (example: `s__Bacteroides fragilis`) or a missing annotation (example: `s__`). For the evaluation described here, we treat `s__` (or `g__`, etc.) as `NA`.
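As a minimal sketch of this convention (the function name `parse_lineage` is hypothetical and not part of mOTUs or GTDB-Tk), a lineage string can be split on `;` and every bare prefix mapped to `None`, which stands in for `NA`:

```python
def parse_lineage(lineage):
    """Split a GTDB lineage string into one entry per taxonomic level.

    A bare rank prefix such as "g__" or "s__" (or an empty field) carries
    no taxon name, so it is returned as None, standing in for NA.
    """
    levels = []
    for field in lineage.split(";"):
        field = field.strip()
        # Rank prefixes are 3 characters long ("d__", "p__", ...);
        # anything longer includes an actual taxon name.
        levels.append(field if len(field) > 3 else None)
    return levels


example = "d__Bacteria;p__Firmicutes;c__Bacilli;o__RF39;f__UBA660;g__;s__"
print(parse_lineage(example))
# ['d__Bacteria', 'p__Firmicutes', 'c__Bacilli', 'o__RF39', 'f__UBA660', None, None]
```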
Each taxonomic level in a mOTU receives one of three annotations:
If at least 80% of the annotated genomes agree on one annotation, then that annotation is selected. Note that only annotations that are not `NA` are counted. So for example, if we have 20 genomes in a mOTU, each of the following cases has an "agreeing" annotation:
Species: s__Bacteroides fragilis
# of genomes: 20
Annotated as `s__Bacteroides fragilis`, as 100% of the genomes (20/20) agree at species level.
Species: s__Bacteroides fragilis NA
# of genomes: 11 9
Annotated as `s__Bacteroides fragilis`, as 100% of the annotated genomes (11/11) agree at species level; the 9 `NA` genomes are not counted.
Species: s__Bacteroides fragilis s__Bacteroides vulgatus NA
# of genomes: 11 1 8
Annotated as `s__Bacteroides fragilis`, as 91.7% of the annotated genomes (11/12) agree at species level.
If no genome has an annotation at that taxonomic level, the level is reported as not annotated. Example:
Species: NA
# of genomes: 20
Note that in the mOTUs taxonomy we report it as `Not_annotated [<last annotated level>]`. For example, if a mOTU is composed of 3 genomes with these annotations:
GUT_GENOME002402 d__Bacteria;p__Firmicutes;c__Bacilli;o__RF39;f__;g__;s__
GUT_GENOME002403 d__Bacteria;p__Firmicutes;c__Bacilli;o__RF39;f__UBA660;g__;s__
GUT_GENOME002404 d__Bacteria;p__Firmicutes;c__Bacilli;o__RF39;f__UBA660;g__;s__
The mOTU annotation will be:
ref_mOTU_v3_00002 d__Bacteria;p__Firmicutes;c__Bacilli;o__RF39;f__UBA660;Not_annotated [f__UBA660];Not_annotated [f__UBA660]
If the genomes do not agree (<80% agreement) at a specific taxonomic level, the level is reported as incongruent. Example:
Species: s__Bacteroides fragilis s__Bacteroides vulgatus NA
# of genomes: 11 7 2
Here the annotation with the highest agreement is `s__Bacteroides fragilis`, but only 11 of the 18 annotated genomes (11+7; the `NA` genomes are not counted) agree, which is 61% (below 80%). Hence this level is annotated as `Incongruent [<last annotated level>]`.
Here is an example of a mOTU with 5 genomes:
GUT_GENOME002602 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__Bacteroides fragilis
GUT_GENOME002603 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__Bacteroides fragilis
GUT_GENOME002604 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__Bacteroides vulgatus
GUT_GENOME002605 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__Bacteroides vulgatus
GUT_GENOME002606 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__Bacteroides vulgatus
At species level we have 3 `s__Bacteroides vulgatus` and 2 `s__Bacteroides fragilis`, so the most common annotation reaches only 60% agreement (3/5). The mOTU annotation is:
d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;Incongruent [g__Bacteroides]
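The three outcomes described above can be sketched for a single taxonomic level as follows. This is an illustration under stated assumptions, not the code used to build the released file: the function name `classify_level`, the status strings, and the use of `None` for `NA` are all hypothetical.

```python
from collections import Counter


def classify_level(annotations, threshold=0.8):
    """Classify one taxonomic level of one mOTU.

    annotations: one entry per genome; None stands for NA.
    Returns a (status, name) pair, where status is "agree",
    "not_annotated" or "incongruent" (labels chosen for this sketch).
    """
    # NA entries are ignored when computing the agreement.
    counts = Counter(a for a in annotations if a is not None)
    annotated = sum(counts.values())
    if annotated == 0:
        return "not_annotated", None
    name, votes = counts.most_common(1)[0]
    if votes / annotated >= threshold:
        return "agree", name
    return "incongruent", None


frag, vulg = "s__Bacteroides fragilis", "s__Bacteroides vulgatus"
print(classify_level([frag] * 11 + [vulg] * 1 + [None] * 8))  # 11/12 agree
print(classify_level([None] * 20))                            # all NA
print(classify_level([frag] * 11 + [vulg] * 7 + [None] * 2))  # 11/18 agree
```

The first call yields `("agree", "s__Bacteroides fragilis")`, the second `("not_annotated", None)`, and the third `("incongruent", None)`, matching the worked examples in the text.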
Note: when a level is incongruent, all levels below it are also set to incongruent. Otherwise we could have a situation where phylum through genus are incongruent but the species level is congruent (if some annotations are `NA` at species level). Example of a mOTU with two genomes:
{'d__Bacteria': 2}
{'p__Bacteroidota': 1, 'p__Riflebacteria': 1}
{'c__Bacteroidia': 1, 'c__Ozemobacteria': 1}
{'o__Bacteroidales': 1, 'o__Ozemobacterales': 1}
{'f__Bacteroidaceae': 1, 'f__Ozemobacteraceae': 1}
{'g__Prevotella': 1, 'g__RUG334': 1}
{'NA': 1, 's__RUG334': 1}
One of the two genomes is not annotated at species level, hence `s__RUG334` would have 100% agreement and would not be "Incongruent" like the genus level. We prevent this, so the mOTU annotation is:
ext_mOTU_v3_22969 d__Bacteria Incongruent [d__Bacteria] Incongruent [d__Bacteria] Incongruent [d__Bacteria] Incongruent [d__Bacteria] Incongruent [d__Bacteria] Incongruent [d__Bacteria]
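Putting the rules together, the following sketch walks the levels from domain to species and forces every level below the first incongruent one to `Incongruent`. The function name `annotate_motu` is hypothetical, and the behaviour for cases the text does not cover (for example a domain level that is itself unannotated, where the label would read `[None]`) is an assumption of this sketch rather than documented mOTUs behaviour.

```python
from collections import Counter


def annotate_motu(lineages, threshold=0.8):
    """Merge per-genome GTDB lineage strings into per-level mOTU labels."""
    # One row per genome, one column per level; a bare prefix such as
    # "s__" (or an empty field) becomes None, standing in for NA.
    rows = [
        [f.strip() if len(f.strip()) > 3 else None for f in lineage.split(";")]
        for lineage in lineages
    ]
    labels = []
    last = None          # most recent level that received a real annotation
    incongruent = False  # once True, all deeper levels stay incongruent
    for column in zip(*rows):
        counts = Counter(a for a in column if a is not None)
        annotated = sum(counts.values())
        if incongruent:
            labels.append(f"Incongruent [{last}]")
        elif annotated == 0:
            labels.append(f"Not_annotated [{last}]")
        else:
            name, votes = counts.most_common(1)[0]
            if votes / annotated >= threshold:
                labels.append(name)
                last = name
            else:
                incongruent = True
                labels.append(f"Incongruent [{last}]")
    return labels


# Lineages reconstructed from the per-level counts in the example above;
# which of the two genomes lacks the species annotation is an assumption.
genomes = [
    "d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Prevotella;s__",
    "d__Bacteria;p__Riflebacteria;c__Ozemobacteria;o__Ozemobacterales;f__Ozemobacteraceae;g__RUG334;s__RUG334",
]
print(annotate_motu(genomes))
```

For these two genomes the sketch returns `d__Bacteria` followed by six `Incongruent [d__Bacteria]` entries, including the species level, even though `s__RUG334` alone would have reached 100% agreement.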