
Explain what capped 8 / 16 for the kraken DBs means #33

Open
paulzierep opened this issue Aug 1, 2024 · 4 comments

@paulzierep

Could you kindly explain how the capped kraken2 DBs are produced? Is the input randomly subsampled, or the DB itself?

@ChillarAnand

The DB itself is capped at 8/16 GB. That's why the size of those DBs is limited to 8 GB / 16 GB.

https://benlangmead.github.io/aws-indexes/k2
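For context (not stated in this thread): Kraken2's own build script exposes a `--max-db-size` option, given in bytes, that downsamples minimizers until the hash table fits under the requested limit. The capped DBs can presumably be reproduced along these lines. A minimal sketch, assuming `kraken2-build` is on the PATH; the DB name and cap value below are illustrative only:

```python
import subprocess

# Hypothetical example: build a Kraken2 standard DB with the hash table
# capped at 8 GiB. --max-db-size takes the cap in bytes; if the full table
# would exceed it, kraken2-build downsamples minimizers to fit.
cap_bytes = 8 * 1024**3  # 8 GiB cap (illustrative)

subprocess.run(
    [
        "kraken2-build",
        "--standard",                # use the standard reference library
        "--db", "k2_standard_08gb",  # output DB directory (hypothetical name)
        "--threads", "16",
        "--max-db-size", str(cap_bytes),
    ],
    check=True,
)
```

Whether the published capped indices were built exactly this way is what issue #31 asks the maintainers to confirm.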

@paulzierep
Author

Thank you very much for the response, but could you explain how this is done technically? FYI, I have a student who is investigating the performance of Kraken2 DBs, and we are also looking into the effect of the capped DBs; it would be good if we could explain what the technical difference is.

@ChillarAnand

I have already requested that the scripts that are used to build these indices be shared.

#31

There has been no response yet. Once the scripts are available, we will know exactly how the DBs are capped.

I also build a wide variety of Kraken indices and created kraken-db-builder to speed up index building. You can take a look if you are interested.

https://github.com/AvilPage/kraken-db-builder

@incoherentian

Hey! The RAM-friendly DBs are indexed the same way as the other full DBs. @BenLangmead et al. then subsample the resulting k-mers from each genome until they fit within the variously size-constrained indices. So the more genomes included, the smaller the k-mer subsample for each included genome... hope that makes sense? That's my interpretation at least; hopefully it's correct.
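To make that subsampling idea concrete, here is a small Python sketch (my own illustration, not the maintainers' code): each minimizer is kept only if its hash falls below a threshold set by the target fraction, which gives an effectively uniform random downsample across all genomes. As far as I understand, this hash-threshold approach is roughly what Kraken2's `--max-db-size` reduction does; treat the numbers and function names as assumptions.

```python
import hashlib

def keep_minimizer(minimizer: str, keep_fraction: float) -> bool:
    """Keep roughly `keep_fraction` of distinct minimizers via a
    deterministic hash threshold (illustrative, not Kraken2's exact code)."""
    # Map the minimizer to a pseudo-random value in [0, 1).
    h = int.from_bytes(
        hashlib.blake2b(minimizer.encode(), digest_size=8).digest(), "big"
    )
    return (h / 2**64) < keep_fraction

# Illustrative numbers only: if the full table would need ~60 GB but must
# fit in 8 GB, only about 8/60 of the minimizers can be kept.
keep_fraction = 8 / 60

minimizers = ["ACGTACGTACGTACG", "TTGACCAGTACGTAA", "GGGCATTACAGGTCA"]
kept = [m for m in minimizers if keep_minimizer(m, keep_fraction)]
print(f"kept {len(kept)} of {len(minimizers)} example minimizers")
```

Because the decision depends only on the minimizer's hash, every genome is thinned at the same rate, which matches the point above: adding more genomes shrinks each genome's share of the capped table.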

It would probably make sense to first reduce the number of input genomes for over-represented species, but that would require some subjective choices and way, way too much manual curation. Curious what your student turns up, @paulzierep.

I originally found this page when googling who to thank for these prebuilt DBs, as they've saved me a lot of effort and high-mem node queuing over the last couple of years. ("This project is maintained by BenLangmead" in the corner of the project site did not initially clue me in, so I'm clearly not very brilliant.) I am thankful though: thanks, DB maintainers!
