Is it common to obtain only about 30 high-quality MAGs from one metagenome sample? #37
-
Dear Francisco, … Best,
Replies: 24 comments 1 reply
-
Dear Hongzhong, That sounds good actually; I generally got similar results from my human gut microbiome samples. If I recall correctly, the largest gut community of GEMs we simulated in the metaGEM paper had around 60 members (all reconstructed from a single sample), but most samples had ~30 GEMs. Of course, the results will vary depending on the microbiome environment, sample complexity, sequencing depth, etc. However, bear in mind that you can also use the medium-quality MAGs to generate GEMs for simulation. In the paper (Fig. 2b) we showed that although GEMs from HQ MAGs tend to have more genes than GEMs from MQ MAGs, they show a very similar distribution in the number of reactions and metabolites, suggesting that GEM reconstruction with CarveMe is robust to varying genome completeness (likely due to its top-down approach). Hope it helps, and let me know if you have further questions! Best wishes,
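In case you want to reproduce that comparison on your own models, here is a minimal sketch using COBRApy (not the paper's actual analysis code; `hq_gems` and `mq_gems` are placeholder folders of CarveMe SBML output):

```python
# Minimal sketch using COBRApy (pip install cobra); not the paper's analysis code.
# "hq_gems" and "mq_gems" are placeholder folders of CarveMe SBML output.
from pathlib import Path

import cobra

def gem_stats(directory):
    """Yield (model name, #genes, #reactions, #metabolites) for each SBML file."""
    for sbml in sorted(Path(directory).glob("*.xml")):
        model = cobra.io.read_sbml_model(str(sbml))
        yield sbml.stem, len(model.genes), len(model.reactions), len(model.metabolites)

for label, folder in [("HQ", "hq_gems"), ("MQ", "mq_gems")]:
    for name, n_genes, n_rxns, n_mets in gem_stats(folder):
        print(f"{label}\t{name}\tgenes={n_genes}\treactions={n_rxns}\tmetabolites={n_mets}")
```

Plotting the reaction and metabolite counts per group should show the same overlap we report in Fig. 2b.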
-
Dear Francisco, … Best,
-
By the way, this tool is also a nice way to do taxonomy analysis.
Best,
-
Dear Hongzhong, If you are primarily interested in generating a list of species that are present in a metagenome, then you may be better off using tools like mOTUs2, MetaPhlAn, or Kraken, which work directly on short read data (i.e. no assembly involved). These short-read-based tools are generally more sensitive at detecting low-abundance species compared to assembly-based approaches like metaGEM, although they offer less resolution at the genome level. If I recall correctly, for the human gut microbiome samples we mapped the short reads from each sample to their corresponding MAG-ome (i.e. a single fasta file of all MAGs generated from a single metagenome) and found that ~60-80% of reads mapped in each sample. This suggests that, even though we are not recovering hundreds of species per sample, we are capturing the species with the highest abundances. Indeed, if you look at the distribution of relative abundances across samples, you will see that the majority of species detected with these short-read-based methods have very low relative abundances (0.1%-0.01%), so they are unlikely to be contributing very much in terms of metabolic interactions. Please let me know if you have further questions or suggestions. Best wishes,
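As a rough sketch of that mapping check (not our exact commands; it assumes you have already aligned each sample's reads to its MAG-ome, e.g. with bwa mem, and kept unmapped reads in a sorted BAM; the file name is a placeholder):

```python
# Sketch of computing the fraction of reads that map to the MAG-ome.
# Assumes unmapped reads were NOT filtered out of the BAM during sorting.
import pysam  # pip install pysam

def mapped_fraction(bam_path):
    """Fraction of primary reads in the BAM that aligned to the MAG-ome."""
    mapped = unmapped = 0
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        for read in bam.fetch(until_eof=True):
            if read.is_secondary or read.is_supplementary:
                continue  # count each read only once
            if read.is_unmapped:
                unmapped += 1
            else:
                mapped += 1
    total = mapped + unmapped
    return mapped / total if total else 0.0

print(f"sample1: {mapped_fraction('sample1_vs_magome.bam'):.1%} of reads mapped")
```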
-
Dear Francisco, … Best,
-
Dear Hongzhong, Indeed, low-abundance species may undoubtedly play an important role in the microbiome. However, the metabolic fluxes through networks of species with low relative abundance are likely less significant than those of higher-abundance species when studying the metabolism of metagenomes via flux balance analysis based methods such as SMETANA. For example, consider a 3-species system with relative abundances of 0.1%, 49.9%, and 50% respectively; in such a case it is easy to see that the metabolic fluxes through the last two species would likely dominate the function/phenotype of the microbiome, since those species would have ~500x more biomass than the low-abundance species. Of course, in real life the low-abundance species may be dominating the higher-abundance species through signaling or secretion of toxins (e.g. Salmonella), but these effects would not necessarily be captured through FBA-based methods. Please also bear in mind that amplicon-based approaches like the one you mentioned (https://www.medrxiv.org/content/10.1101/2020.09.02.20187013v1) necessarily make use of reference-genome-based models (i.e. AGORA), which fail to capture and model the vast pangenomic variation present within species. In fact, we highlight this point in the manuscript by showing pangenome curves for the top 10 most commonly reconstructed models (based on presence/absence of EC numbers in GEMs) in Fig. 2d. As you can see, the core genomes of these species only account for 40-60% of the diversity found in their pangenomes. Relying on reference-based GEMs completely ignores this context-specific variability. As a final comment, I wanted to mention that in the upcoming revision of the manuscript we show that many of the predicted metabolic interactions in the IGT/T2D communities are well documented in the literature, suggesting that the reconstructed communities of high-abundance species can be used to successfully model the phenotype of gut microbiomes. Best wishes,
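Just to make the abundance arithmetic explicit (a toy illustration, no actual FBA involved):

```python
# Toy arithmetic only -- no real FBA. Community flux contributions scale with
# biomass, so these ratios bound how much weight each species carries.
abundances = {"species_A": 0.001, "species_B": 0.499, "species_C": 0.500}

for name, frac in abundances.items():
    ratio = frac / abundances["species_A"]
    print(f"{name}: biomass share {frac:.1%}, {ratio:.0f}x the low-abundance species")
```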
-
Thanks a lot! Very nice job! Looking forward to the new version of the metaGEM paper.
-
I forgot to ask, how did you carry out the binning? You can get more and higher-quality MAGs by using more samples (~100) and cross-mapping each set of paired reads to each assembly for CONCOCT, and then using metaWRAP for refining and reassembly, as shown in this figure here and in the sketch below.
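For reference, a minimal sketch of the standard CONCOCT multi-sample workflow (roughly what metaGEM automates; file names are placeholders, and see the CONCOCT docs for the authoritative commands):

```python
# Sketch of the standard CONCOCT multi-sample workflow. Assumes contigs.fa plus
# one sorted+indexed BAM per sample from cross-mapping all read sets to this assembly.
import glob
import subprocess

def run(cmd, stdout_path=None):
    """Run a CLI step, optionally capturing stdout to a file."""
    if stdout_path:
        with open(stdout_path, "w") as out:
            subprocess.run(cmd, check=True, stdout=out)
    else:
        subprocess.run(cmd, check=True)

bams = sorted(glob.glob("mapping/*.sorted.bam"))  # one BAM per sample vs this assembly

# 1. Chop contigs into ~10 kb chunks so coverage is comparable across contig lengths.
run(["cut_up_fasta.py", "contigs.fa", "-c", "10000", "-o", "0", "--merge_last",
     "-b", "contigs_10K.bed"], stdout_path="contigs_10K.fa")

# 2. Build the per-sample coverage table -- this is where the extra samples help.
run(["concoct_coverage_table.py", "contigs_10K.bed", *bams],
    stdout_path="coverage_table.tsv")

# 3. Bin using composition plus differential coverage across samples.
run(["concoct", "--composition_file", "contigs_10K.fa",
     "--coverage_file", "coverage_table.tsv", "-b", "concoct_out/"])

# 4. Merge chunk-level clusters back to whole contigs and extract the bins.
run(["merge_cutup_clustering.py", "concoct_out/clustering_gt1000.csv"],
    stdout_path="concoct_out/clustering_merged.csv")
run(["extract_fasta_bins.py", "contigs.fa", "concoct_out/clustering_merged.csv",
     "--output_path", "concoct_out/bins"])
```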
-
So far I have only tested vamb (https://github.com/RasmussenLab/vamb) using one sample. So some of the MAGs you mentioned may only exist in some samples, even if the total number of high-quality MAGs is higher when using more samples?
-
Although it is a bit lengthy, I think that this discussion does a good job of explaining why using more samples can help you get better MAGs, even if they are coming from a single sample. It is a counter-intuitive concept, but contig coverage across samples gives CONCOCT more information for binning contigs in a single sample. I have not tried out vamb myself, but I am very interested in testing it and perhaps integrating it into metaGEM. Have you compared vamb to the binners used by metaGEM?
-
It is a really nice discussion with you. Currently, I have not compared vamb to the binners used in metaGEM, as I want to find a simple procedure (or a short pipeline) for the binning step at the start. I plan to do the comparison later when I am free.
-
I see, unfortunately there is no easy answer, as I do not think that there is a gold standard for binning MAGs. In this Twitter thread you can see that there are many differing opinions regarding the best binning software/procedures. By the way, did you see the tutorial? Using two samples and the entire …
-
Thanks for sharing! I saw your nice tutorial. By the way, what do you think of strain profiling based on MetaPhlAn 3.0 and mOTUs_v2? As an example, with mOTUs_v2 I can find many more annotated species. mOTUs can also calculate the relative abundances of each species. I am considering using these tools together with the binning strategy to overcome the limited number of species genomes from the current binning strategy.
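For example, something like this hypothetical sketch could pull out the species above an abundance cutoff from a mOTUs-style profile (I am assuming a simple TSV of taxon name and relative abundance with '#' comment lines; the real output format may differ, so adjust accordingly):

```python
# Hypothetical parser for a mOTUs-style profile: taxon name in the first column,
# relative abundance in the last, '#' lines treated as comments.
import csv

def abundant_species(profile_tsv, min_abundance=0.001):
    species = []
    with open(profile_tsv) as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row or row[0].startswith("#"):
                continue  # skip headers/comments
            name, abundance = row[0], float(row[-1])
            if abundance >= min_abundance:
                species.append((name, abundance))
    return sorted(species, key=lambda item: -item[1])

for name, abundance in abundant_species("sample1.motus.tsv"):
    print(f"{abundance:.4f}\t{name}")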
-
Hi Hongzhong, sorry for the late response! I have not tried MetaPhlAn 3 myself, so I cannot give any insights regarding how its performance compares to mOTUs2. However, I think it is a good complementary strategy to use short-read-based methods for strains/genomes that are too low in abundance for MAG reconstruction.
-
Thanks Francisco! I checked your tutorial and found that CONCOCT performs better than MaxBin2 in your case. However, when I check the MaxBin2 paper (https://academic.oup.com/bioinformatics/article/32/4/605/1744462), it shows MaxBin2 performing better than CONCOCT. Is there anything I misunderstood?
-
Hi Francisco, the comparison of different tools is a little confusing, as in the MetaBAT2 paper (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6662567/) they show MetaBAT2 coming first and MaxBin2 second in most cases. But anyway, we should believe in what we can get. 😀
-
Yes, unfortunately each paper claims that its binner is superior to the state of the art in some way. The CAMI challenge papers seem to be the most unbiased and objective benchmark; here is the latest paper. In summary, I think each tool has strengths and weaknesses, which is why using multiple binners plus dereplication/refinement strategies is common in state-of-the-art papers like this one, which follows a MAG reconstruction protocol very similar to metaGEM's.
-
I just got the results using vamb, MaxBin2, and MetaBAT2, using only one sample as input. Using the cut-off of completeness >= 90% and contamination <= 5%, the numbers of MAGs from vamb, MaxBin2, and MetaBAT2 are 26, 24, and 20, respectively.
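For reference, this is roughly how I apply that cut-off to CheckM-style output (I am assuming the tab-table format with "Bin Id", "Completeness", and "Contamination" columns; column names may differ by version, and the file names are placeholders):

```python
# Filter CheckM tab-table output (e.g. from `checkm qa ... --tab_table -f out.tsv`)
# by the completeness/contamination cut-off used above.
import csv

def high_quality_bins(checkm_tsv, min_comp=90.0, max_cont=5.0):
    """Return the bin IDs passing the completeness/contamination thresholds."""
    with open(checkm_tsv) as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [row["Bin Id"] for row in reader
                if float(row["Completeness"]) >= min_comp
                and float(row["Contamination"]) <= max_cont]

for binner in ("vamb", "maxbin2", "metabat2"):
    hq = high_quality_bins(f"{binner}_checkm.tsv")
    print(f"{binner}: {len(hq)} HQ MAGs")
```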
-
Thanks for sharing your results, that is very interesting. Did you try using coverage across multiple samples for binning? I believe all of these tools are benchmarked using contig coverage across multiple samples to increase performance. Also, have you thought about comparing results with CONCOCT?
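As a minimal sketch of what multi-sample coverage looks like for MetaBAT2 specifically (paths are placeholders; see the MetaBAT2 docs for the authoritative usage):

```python
# Sketch of feeding multi-sample coverage to MetaBAT2. Assumes every sample's
# reads were mapped to the same assembly, giving one sorted BAM per sample.
import glob
import subprocess

bams = sorted(glob.glob("mapping/*.sorted.bam"))

# Summarize per-sample contig depths into one table (tool ships with MetaBAT2).
subprocess.run(["jgi_summarize_bam_contig_depths",
                "--outputDepth", "depth.txt", *bams], check=True)

# Bin using composition plus the differential coverage in depth.txt.
subprocess.run(["metabat2", "-i", "contigs.fa", "-a", "depth.txt",
                "-o", "metabat2_bins/bin"], check=True)
```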
-
Currently I have not tried using coverage across multiple samples for binning. I can check it later.
-
I am a bit surprised by the choice of excluding CONCOCT. Both papers cited in the post above are only a few weeks old and from leaders in the field, and they use CONCOCT. Also, from the CAMI paper: "Completeness was high for all methods and was highest for CONCOCT."
I am surprised that they didn't compare against CONCOCT in the vamb paper.
-
Hi, I just want to make life easier 😁. If one method is enough, I prefer to use only one method. As you said, CONCOCT is a very valuable toolbox. I agree with your ideas.
-
I have a collection of metagenomic samples, and I want to look at genomic microdiversity between these samples using the 'inStrain' tool. To do that I need to build a genomes db, and the documentation recommends doing de novo MGS assemblies using data from my samples, to ensure that my genomes db has the specific genomes that exist in my samples (as opposed to just the closest genomes found in the public repositories). So I've assembled each sample (using metaSPAdes & MEGAHIT), merged the resulting contigs into a single db, mapped each sample's reads against that db, and then fed these many BAM files to MetaBAT2, which produced ~8k genome bins. That seems reasonable for my data.

But I'm now having a problem understanding how this should work, and it's probably just my own lack of understanding of contig binning that I'm hoping people here can help me with. I feel like each of my genome bins generated by MetaBAT2 may have overlapping contig data. It makes sense to me that the contigs are correctly binned by genome, using the per-sample depth information and similarity. But from what I've read, I don't see anywhere that says each genome bin has been 'flattened' down to just the consensus of the assembly contigs.

So my question is: after doing contig binning with MetaBAT2, do I need to build a single consensus per genome bin? Or has that already been done? Or do people even worry about having overlapping contigs in these genome bins?
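In case it helps, here is one way I could sanity-check this myself: a minimal sketch (the bin directory path is a placeholder) that flags contigs appearing in more than one bin:

```python
# Collect fasta headers from every bin and report contig IDs that occur in
# more than one bin file. "metabat2_bins" is a placeholder directory.
from collections import defaultdict
from pathlib import Path

contig_to_bins = defaultdict(list)
for bin_fa in sorted(Path("metabat2_bins").glob("*.fa")):
    with open(bin_fa) as handle:
        for line in handle:
            if line.startswith(">"):
                contig_id = line[1:].split()[0]  # header up to first whitespace
                contig_to_bins[contig_id].append(bin_fa.name)

shared = {cid: bins for cid, bins in contig_to_bins.items() if len(bins) > 1}
print(f"{len(shared)} contigs appear in more than one bin")
```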