Is it common to obtain only about 30 high-quality MAGs from one metagenome sample? #37
-
Dear Francisco, … Best,
Replies: 24 comments 1 reply
-
Dear Hongzhong, That sounds good actually; I generally got similar results from my human gut microbiome samples. If I recall correctly, the largest gut community of GEMs we simulated in the metaGEM paper had around 60 members (all reconstructed from a single sample), but most samples had ~30 GEMs. Of course, the results will vary depending on the microbiome environment, sample complexity, sequencing depth, etc. However, bear in mind that you can also use the medium-quality MAGs to generate GEMs for simulation. In the paper (Fig. 2b) we showed that although GEMs from HQ MAGs tend to have more genes than GEMs from MQ MAGs, they show a very similar distribution in the number of reactions and metabolites, suggesting that GEM reconstruction with CarveMe is robust to varying genome completeness (likely due to its top-down approach). Hope it helps, and let me know if you have further questions! Best wishes,
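In case you want to reproduce that comparison on your own models, here is a minimal sketch using COBRApy (not the paper's actual analysis code; `hq_gems` and `mq_gems` are placeholder folders of CarveMe SBML output):

```python
# Minimal sketch using COBRApy (pip install cobra); not the paper's analysis code.
# "hq_gems" and "mq_gems" are placeholder folders of CarveMe SBML output.
from pathlib import Path

import cobra

def gem_stats(directory):
    """Yield (model name, #genes, #reactions, #metabolites) for each SBML file."""
    for sbml in sorted(Path(directory).glob("*.xml")):
        model = cobra.io.read_sbml_model(str(sbml))
        yield sbml.stem, len(model.genes), len(model.reactions), len(model.metabolites)

for label, folder in [("HQ", "hq_gems"), ("MQ", "mq_gems")]:
    for name, n_genes, n_rxns, n_mets in gem_stats(folder):
        print(f"{label}\t{name}\tgenes={n_genes}\treactions={n_rxns}\tmetabolites={n_mets}")
```

Plotting the reaction and metabolite counts per group should show the same overlap we report in Fig. 2b.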
-
Dear Francisco, … Best,
-
By the way, this tool is also a nice way to do taxonomy analysis.
Best,
-
Dear Hongzhong, If you are primarily interested in generating a list of species that are present in a metagenome, then you may be better off using tools like mOTUs2, MetaPhlAn, or Kraken, which work directly on short read data (i.e. no assembly involved). These short-read-based tools are generally more sensitive at detecting low-abundance species compared to assembly-based approaches like metaGEM, although they offer less resolution at the genome level. If I recall correctly, for the human gut microbiome samples we mapped the short reads from each sample to their corresponding MAG-ome (i.e. a single fasta file of all MAGs generated from a single metagenome) and found that ~60-80% of reads mapped in each sample. This suggests that, even though we are not recovering hundreds of species per sample, we are capturing the species with the highest abundances. Indeed, if you look at the distribution of relative abundances across samples, you will see that the majority of species detected with these short-read-based methods have very low relative abundances (0.1%-0.01%), so they are unlikely to be contributing very much in terms of metabolic interactions. Please let me know if you have further questions or suggestions. Best wishes,
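As a rough sketch of that mapping check (not our exact commands; it assumes you have already aligned each sample's reads to its MAG-ome, e.g. with bwa mem, and kept unmapped reads in a sorted BAM; the file name is a placeholder):

```python
# Sketch of computing the fraction of reads that map to the MAG-ome.
# Assumes unmapped reads were NOT filtered out of the BAM during sorting.
import pysam  # pip install pysam

def mapped_fraction(bam_path):
    """Fraction of primary reads in the BAM that aligned to the MAG-ome."""
    mapped = unmapped = 0
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        for read in bam.fetch(until_eof=True):
            if read.is_secondary or read.is_supplementary:
                continue  # count each read only once
            if read.is_unmapped:
                unmapped += 1
            else:
                mapped += 1
    total = mapped + unmapped
    return mapped / total if total else 0.0

print(f"sample1: {mapped_fraction('sample1_vs_magome.bam'):.1%} of reads mapped")
```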
-
Dear Francisco, … Best,
-
Dear Hongzhong, Indeed, low-abundance species may undoubtedly play an important role in the microbiome. However, the metabolic fluxes through networks of species with low relative abundance are likely less significant than those of higher-abundance species when studying the metabolism of metagenomes via flux balance analysis based methods such as SMETANA. For example, consider a 3-species system with relative abundances of 0.1%, 49.9%, and 50% respectively; in such a case it is easy to see that the metabolic fluxes through the last two species would likely dominate the function/phenotype of the microbiome, since those species would have ~500x more biomass than the low-abundance species. Of course, in real life the low-abundance species may be dominating the higher-abundance species through signaling or secretion of toxins (e.g. Salmonella), but these effects would not necessarily be captured through FBA-based methods. Please also bear in mind that amplicon-based approaches like the one you mentioned (https://www.medrxiv.org/content/10.1101/2020.09.02.20187013v1) necessarily make use of reference-genome-based models (i.e. AGORA), which fail to capture and model the vast pangenomic variation present within species. In fact, we highlight this point in the manuscript by showing pangenome curves for the top 10 most commonly reconstructed models (based on presence/absence of EC numbers in GEMs) in Fig. 2d. As you can see, the core genomes of these species only account for 40-60% of the diversity found in their pangenomes. Relying on reference-based GEMs completely ignores this context-specific variability. As a final comment, I wanted to mention that in the upcoming revision of the manuscript we show that many of the predicted metabolic interactions in the IGT/T2D communities are well documented in the literature, suggesting that the reconstructed communities of high-abundance species can be used to successfully model the phenotype of gut microbiomes. Best wishes,
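Just to make the abundance arithmetic explicit (a toy illustration, no actual FBA involved):

```python
# Toy arithmetic only -- no real FBA. Community flux contributions scale with
# biomass, so these ratios bound how much weight each species carries.
abundances = {"species_A": 0.001, "species_B": 0.499, "species_C": 0.500}

for name, frac in abundances.items():
    ratio = frac / abundances["species_A"]
    print(f"{name}: biomass share {frac:.1%}, {ratio:.0f}x the low-abundance species")
```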
-
Thanks a lot! Very nice job! Looking forward to the new version of the metaGEM paper.
-
I forgot to ask, how did you carry out the binning? You can get more and higher-quality MAGs by using more samples (~100) and cross-mapping each set of paired reads to each assembly for CONCOCT, and then using metaWRAP for refining and reassembly, as shown in this figure here and in the sketch below.
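For reference, a minimal sketch of the standard CONCOCT multi-sample workflow (roughly what metaGEM automates; file names are placeholders, and see the CONCOCT docs for the authoritative commands):

```python
# Sketch of the standard CONCOCT multi-sample workflow. Assumes contigs.fa plus
# one sorted+indexed BAM per sample from cross-mapping all read sets to this assembly.
import glob
import subprocess

def run(cmd, stdout_path=None):
    """Run a CLI step, optionally capturing stdout to a file."""
    if stdout_path:
        with open(stdout_path, "w") as out:
            subprocess.run(cmd, check=True, stdout=out)
    else:
        subprocess.run(cmd, check=True)

bams = sorted(glob.glob("mapping/*.sorted.bam"))  # one BAM per sample vs this assembly

# 1. Chop contigs into ~10 kb chunks so coverage is comparable across contig lengths.
run(["cut_up_fasta.py", "contigs.fa", "-c", "10000", "-o", "0", "--merge_last",
     "-b", "contigs_10K.bed"], stdout_path="contigs_10K.fa")

# 2. Build the per-sample coverage table -- this is where the extra samples help.
run(["concoct_coverage_table.py", "contigs_10K.bed", *bams],
    stdout_path="coverage_table.tsv")

# 3. Bin using composition plus differential coverage across samples.
run(["concoct", "--composition_file", "contigs_10K.fa",
     "--coverage_file", "coverage_table.tsv", "-b", "concoct_out/"])

# 4. Merge chunk-level clusters back to whole contigs and extract the bins.
run(["merge_cutup_clustering.py", "concoct_out/clustering_gt1000.csv"],
    stdout_path="concoct_out/clustering_merged.csv")
run(["extract_fasta_bins.py", "contigs.fa", "concoct_out/clustering_merged.csv",
     "--output_path", "concoct_out/bins"])
```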
-
So far I have only tested vamb (https://github.com/RasmussenLab/vamb) using one sample. So some of the MAGs you mentioned may only exist in some samples, even if the total number of high-quality MAGs is higher when using more samples?
-
Although it is a bit lengthy, I think that this discussion does a good job of explaining why using more samples can help you get better MAGs, even if they are coming from a single sample. It is a counter-intuitive concept, but contig coverage across samples gives CONCOCT more information for binning contigs in a single sample. I have not tried out vamb myself, but I am very interested in testing it and perhaps integrating it into metaGEM. Have you compared vamb to the binners used by metaGEM?
-
It is a really nice discussion with you. Currently, I have not compared vamb to the binners used in metaGEM, as I want to find a simple procedure (or a short pipeline) for the binning step at the start. I plan to do the comparison later when I am free.
-
I see, unfortunately there is no easy answer, as I do not think that there is a gold standard for binning MAGs. In this Twitter thread you can see that there are many differing opinions regarding the best binning software/procedures. By the way, did you see the tutorial? Using two samples and the entire …
-
Thanks for sharing! I saw your nice tutorial. By the way, what do you think of strain profiling based on MetaPhlAn 3.0 and mOTUs_v2? As an example, with mOTUs_v2 I can find many more annotated species. mOTUs can also calculate the relative abundances of each species. I am considering using these tools together with the binning strategy to overcome the limited number of species genomes from the current binning strategy.
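For example, something like this hypothetical sketch could pull out the species above an abundance cutoff from a mOTUs-style profile (I am assuming a simple TSV of taxon name and relative abundance with '#' comment lines; the real output format may differ, so adjust accordingly):

```python
# Hypothetical parser for a mOTUs-style profile: taxon name in the first column,
# relative abundance in the last, '#' lines treated as comments.
import csv

def abundant_species(profile_tsv, min_abundance=0.001):
    species = []
    with open(profile_tsv) as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row or row[0].startswith("#"):
                continue  # skip headers/comments
            name, abundance = row[0], float(row[-1])
            if abundance >= min_abundance:
                species.append((name, abundance))
    return sorted(species, key=lambda item: -item[1])

for name, abundance in abundant_species("sample1.motus.tsv"):
    print(f"{abundance:.4f}\t{name}")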
-
Hi Hongzhong, sorry for the late response! I have not tried MetaPhlAn 3 myself, so I cannot give any insights regarding how its performance compares to mOTUs2. However, I think it is a good complementary strategy to use short-read-based methods for strains/genomes that are too low in abundance for MAG reconstruction.
-
Thanks Francisco! I checked your tutorial and found that CONCOCT performs better than MaxBin2 in your case. However, when I check the MaxBin2 paper (https://academic.oup.com/bioinformatics/article/32/4/605/1744462), it shows MaxBin2 performing better than CONCOCT. Is there anything I misunderstood?
-
Hi Francisco, the comparison of different tools is a little confusing, as in the MetaBAT2 paper (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6662567/) they show MetaBAT2 coming first and MaxBin2 second in most cases. But anyway, we should believe in what we can get. 😀
-
Yes, unfortunately each paper claims that its binner is superior to the state of the art in some way. The CAMI challenge papers seem to be the most unbiased and objective benchmark; here is the latest paper. In summary, I think each tool has strengths and weaknesses, which is why using multiple binners plus dereplication/refinement strategies is common in state-of-the-art papers like this one, which follows a MAG reconstruction protocol very similar to metaGEM's.
-
I just got the results using vamb, MaxBin2, and MetaBAT2, using only one sample as input. Using the cut-off of completeness >= 90% and contamination <= 5%, the numbers of MAGs from vamb, MaxBin2, and MetaBAT2 are 26, 24, and 20, respectively.
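For reference, this is roughly how I apply that cut-off to CheckM-style output (I am assuming the tab-table format with "Bin Id", "Completeness", and "Contamination" columns; column names may differ by version, and the file names are placeholders):

```python
# Filter CheckM tab-table output (e.g. from `checkm qa ... --tab_table -f out.tsv`)
# by the completeness/contamination cut-off used above.
import csv

def high_quality_bins(checkm_tsv, min_comp=90.0, max_cont=5.0):
    """Return the bin IDs passing the completeness/contamination thresholds."""
    with open(checkm_tsv) as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [row["Bin Id"] for row in reader
                if float(row["Completeness"]) >= min_comp
                and float(row["Contamination"]) <= max_cont]

for binner in ("vamb", "maxbin2", "metabat2"):
    hq = high_quality_bins(f"{binner}_checkm.tsv")
    print(f"{binner}: {len(hq)} HQ MAGs")
```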
-
Thanks for sharing your results, that is very interesting. Did you try using coverage across multiple samples for binning? I believe all of these tools are benchmarked using contig coverage across multiple samples to increase performance. Also, have you thought about comparing results with CONCOCT?
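As a minimal sketch of what multi-sample coverage looks like for MetaBAT2 specifically (paths are placeholders; see the MetaBAT2 docs for the authoritative usage):

```python
# Sketch of feeding multi-sample coverage to MetaBAT2. Assumes every sample's
# reads were mapped to the same assembly, giving one sorted BAM per sample.
import glob
import subprocess

bams = sorted(glob.glob("mapping/*.sorted.bam"))

# Summarize per-sample contig depths into one table (tool ships with MetaBAT2).
subprocess.run(["jgi_summarize_bam_contig_depths",
                "--outputDepth", "depth.txt", *bams], check=True)

# Bin using composition plus the differential coverage in depth.txt.
subprocess.run(["metabat2", "-i", "contigs.fa", "-a", "depth.txt",
                "-o", "metabat2_bins/bin"], check=True)
```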
-
Currently I have not tried using coverage across multiple samples for binning. I can check it later.
-
I am a bit surprised by the choice of excluding CONCOCT. Both papers cited in the post above are only a few weeks old and from leaders in the field, and they use CONCOCT. Also, from the CAMI paper: "Completeness was high for all methods and was highest for CONCOCT."
I am surprised that they didn't compare against CONCOCT in the vamb paper.
-
Hi, I just want to make life easier 😁. If one method is enough, I prefer to use only one method. As you said, CONCOCT is a very valuable toolbox. I agree with your ideas.
-
I have a collection of metagenomic samples, and I want to look at genomic microdiversity between these samples using the 'inStrain' tool. To do that I need to build a genomes db, and the documentation recommends doing de novo MGS assemblies using data from my samples, to ensure that my genomes db has the specific genomes that exist in my samples (as opposed to just the closest genomes found in the public repositories). So I've assembled each sample (using metaSPAdes & MEGAHIT), merged the resulting contigs into a single db, mapped each sample's reads against that db, and then fed these many BAM files to MetaBAT2, which produced ~8k genome bins. That seems reasonable for my data.

But I'm now having a problem understanding how this should work, and it's probably just my own lack of understanding of contig binning that I'm hoping people here can help me with. I feel like each of my genome bins generated by MetaBAT2 may have overlapping contig data. It makes sense to me that the contigs are correctly binned by genome, using the per-sample depth information and similarity. But from what I've read, I don't see anywhere that says each genome bin has been 'flattened' down to just the consensus of the assembly contigs.

So my question is: after doing contig binning with MetaBAT2, do I need to build a single consensus per genome bin? Or has that already been done? Or do people even worry about having overlapping contigs in these genome bins?
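In case it helps, here is one way I could sanity-check this myself: a minimal sketch (the bin directory path is a placeholder) that flags contigs appearing in more than one bin:

```python
# Collect fasta headers from every bin and report contig IDs that occur in
# more than one bin file. "metabat2_bins" is a placeholder directory.
from collections import defaultdict
from pathlib import Path

contig_to_bins = defaultdict(list)
for bin_fa in sorted(Path("metabat2_bins").glob("*.fa")):
    with open(bin_fa) as handle:
        for line in handle:
            if line.startswith(">"):
                contig_id = line[1:].split()[0]  # header up to first whitespace
                contig_to_bins[contig_id].append(bin_fa.name)

shared = {cid: bins for cid, bins in contig_to_bins.items() if len(bins) > 1}
print(f"{len(shared)} contigs appear in more than one bin")
```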