GTDB taxonomy for the mOTUs
We annotated all genomes used for mOTUs with GTDB-Tk. We then merged the genome annotations at the mOTU level, where each mOTU represents a cluster of genomes (see below for details).
Here is the taxonomy:
mOTUs version | File | Annotation tool |
---|---|---|
mOTUs 3.0.0 - 3.0.3 | mOTUs_3.0.0_GTDB_tax.tsv | GTDB-Tk version 2.1 on database release 207 |
Each mOTU cluster is composed of 1 or more genomes, and for each genome we have a GTDB annotation that looks like:
GUT_GENOME002602 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__Bacteroides fragilis
GUT_GENOME002402 d__Bacteria;p__Firmicutes;c__Bacilli;o__RF39;f__UBA660;g__;s__
Note that for each taxonomic level there is either an annotation (example: `s__Bacteroides fragilis`) or a missing annotation (example: `s__`). For the evaluation described here, we treat `s__` (or `g__`, etc.) as `NA`.
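The NA convention above can be sketched in a few lines of Python. This is only an illustration, not the code used to build the mOTUs taxonomy: the function name `parse_lineage` is hypothetical, and we assume the lineage is a `;`-separated string whose fields start with a rank prefix such as `s__`.

```python
def parse_lineage(lineage):
    """Map each rank prefix (d, p, c, o, f, g, s) to its annotation, or None for NA.

    Hypothetical helper: a field such as "s__" (prefix only) is treated as NA.
    """
    levels = {}
    for field in lineage.split(";"):
        field = field.strip()
        prefix, sep, name = field.partition("__")
        if not sep:
            continue  # skip fields without a rank prefix
        levels[prefix] = field if name else None
    return levels


example = "d__Bacteria;p__Firmicutes;c__Bacilli;o__RF39;f__UBA660;g__;s__"
levels = parse_lineage(example)
print(levels["f"], levels["g"], levels["s"])
```

Here `levels["g"]` and `levels["s"]` are `None`, which stands for the `NA` described above.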
Each taxonomic level in a mOTU receives one of three annotations:
If at least 80% of the genomes agree on one annotation, then that annotation is selected. Note that we consider only annotations that are not `NA`. For example, if a mOTU has 20 genomes, each of the following cases yields an "agreeing" annotation:

Species | s__Bacteroides fragilis |
---|---|
Number of genomes | 20 |

Annotated as `s__Bacteroides fragilis`, as 100% of the genomes (20/20) agree at species level.

Species | s__Bacteroides fragilis | NA |
---|---|---|
Number of genomes | 11 | 9 |

Annotated as `s__Bacteroides fragilis`, as 100% of the annotated genomes (11/11) agree at species level.

Species | s__Bacteroides fragilis | s__Bacteroides vulgatus | NA |
---|---|---|---|
Number of genomes | 11 | 1 | 8 |

Annotated as `s__Bacteroides fragilis`, as 91.7% of the annotated genomes (11/12) agree at species level.

If none of the genomes has an annotation at that taxonomic level, the level is reported as not annotated. Example:

Species | NA |
---|---|
Number of genomes | 20 |

Note that in the mOTUs taxonomy we report it as `Not_annotated [<last annotated level>]`. For example, if a mOTU is composed of 3 genomes with the annotations:
GUT_GENOME002402 d__Bacteria;p__Firmicutes;c__Bacilli;o__RF39; ;g__;s__
GUT_GENOME002403 d__Bacteria;p__Firmicutes;c__Bacilli;o__RF39;f__UBA660;g__;s__
GUT_GENOME002404 d__Bacteria;p__Firmicutes;c__Bacilli;o__RF39;f__UBA660;g__;s__
The mOTU annotation will be:
ref_mOTU_v3_00002 d__Bacteria;p__Firmicutes;c__Bacilli;o__RF39;f__UBA660;Not_annotated [f__UBA660];Not_annotated [f__UBA660]
If the genomes do not agree (<80% agreement) at a given taxonomic level, the level is reported as incongruent. Example:

Species | s__Bacteroides fragilis | s__Bacteroides vulgatus | NA |
---|---|---|---|
Number of genomes | 11 | 7 | 2 |

Here the annotation with the highest agreement is `s__Bacteroides fragilis`, but only 11 out of 18 genomes agree (11+7; note that we do not count the `NA`), which is 61% (below 80%). Hence this level will be annotated as `Incongruent [<last annotated level>]`.
Here is an example of a mOTU with 5 genomes:
GUT_GENOME002602 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__Bacteroides fragilis
GUT_GENOME002603 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__Bacteroides fragilis
GUT_GENOME002604 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__Bacteroides vulgatus
GUT_GENOME002605 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__Bacteroides vulgatus
GUT_GENOME002606 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__Bacteroides vulgatus
At species level we have 3 `s__Bacteroides vulgatus` and 2 `s__Bacteroides fragilis`, so the highest agreement is 60% (3/5), which is below 80%. The mOTU annotation is:
d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;Incongruent [g__Bacteroides]
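
The three cases can be combined into a short Python sketch. It is only an illustration of the rules described on this page, not the actual mOTUs code: the names `consensus_level` and `motu_taxonomy` are hypothetical, and we assume that every level is evaluated independently and that `<last annotated level>` is the most recent level with an agreeing annotation.

```python
from collections import Counter

RANKS = ("d__", "p__", "c__", "o__", "f__", "g__", "s__")
THRESHOLD = 0.8  # minimum fraction of annotated genomes that must agree


def consensus_level(annotations):
    """Return ("agree", name), ("na", None) or ("incongruent", None) for one level."""
    called = [a for a in annotations if a is not None]  # NA is not counted
    if not called:
        return ("na", None)
    name, count = Counter(called).most_common(1)[0]
    if count / len(called) >= THRESHOLD:
        return ("agree", name)
    return ("incongruent", None)


def motu_taxonomy(lineages):
    """Merge the GTDB lineages of all genomes in one mOTU (hypothetical helper)."""
    split = [lineage.split(";") for lineage in lineages]
    last = None  # assumption: last level with an agreeing annotation
    out = []
    for i, prefix in enumerate(RANKS):
        column = []
        for fields in split:
            field = fields[i].strip() if i < len(fields) else ""
            # a bare prefix such as "s__" (or an empty field) counts as NA
            annotated = field.startswith(prefix) and field != prefix
            column.append(field if annotated else None)
        status, name = consensus_level(column)
        if status == "agree":
            out.append(name)
            last = name
        elif status == "na":
            out.append(f"Not_annotated [{last}]")
        else:
            out.append(f"Incongruent [{last}]")
    return ";".join(out)


base = "d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;"
genomes = [base + "s__Bacteroides fragilis"] * 2 + [base + "s__Bacteroides vulgatus"] * 3
print(motu_taxonomy(genomes))
```

For the 5-genome example above, this prints the same lineage ending in `Incongruent [g__Bacteroides]`. How the real pipeline handles a mOTU where even the domain level is not agreed is not covered here; in this sketch the bracket would simply contain `None`.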