
Memory Requirements for Build #496

Open
bcantarel opened this issue Jul 20, 2024 · 9 comments

@bcantarel

I am trying to build an index for NT (the NCBI non-redundant nucleotide database). I am using a machine with 96 CPUs and 756 GB of RAM. I have tried:
metagraph build -v -k 31 -o graph -p 96 --disk-swap /mnt/tmp/ --mem-cap-gb 700 nt.fasta

But the job gets killed.

I have also tried using KMC to count the k-mers prior to running metagraph build, roughly as sketched below.
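
(Something along these lines; the exact flags and memory cap are from memory, so treat this as a sketch rather than the exact command I ran:)

# count k-mers with KMC first (assumed flags: -k20 k-mer length, -m64 RAM cap in GB, -t96 threads, -fm multi-FASTA input, -ci1 to keep singleton k-mers)
kmc -k20 -m64 -t96 -fm -ci1 nt.fasta nt.kmc /mnt/tmp/
# then point metagraph at the resulting KMC database
metagraph build -v -k 20 -o graph -p 96 --disk-swap /mnt/tmp/ nt.kmc.kmc_pre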

Any advice on the command or resources needed to build a graph for ~1.5 TB of sequence data?

@karasikov
Member

Hi Brandi

The command looks fine.

Could you please also post the logs so we can see at what step it gets killed?

@bcantarel
Author

[2024-08-12 12:58:43.779] [trace] Metagraph started
[2024-08-12 12:58:43.779] [trace] Build De Bruijn Graph with k-mer size k=20
[2024-08-12 12:58:43.779] [trace] Start reading data and extracting k-mers
[2024-08-12 12:58:43.779] [trace] Reserved buffer: 1073.741824 MB, capacity: 134217728 k-mers
[2024-08-12 12:58:43.779] [trace] Parsing nt.kmc.kmc_pre
[2024-08-12 12:58:58.678] [trace] Allocated capacity exceeded, erase duplicate values...
[2024-08-12 12:59:00.235] [trace] Erasing duplicate values done. Size reduced from 134200000 to 134197287, 1073.578296 MB
[2024-08-12 12:59:08.272] [trace] Allocated capacity exceeded, erase duplicate values...
[2024-08-12 12:59:10.498] [trace] Erasing duplicate values done. Size reduced from 201322287 to 201320614, 1610.564912 MB
[2024-08-12 12:59:26.986] [trace] Allocated capacity exceeded, erase duplicate values...
[2024-08-12 12:59:30.896] [trace] Erasing duplicate values done. Size reduced from 335520614 to 335516737, 2684.133896 MB
[2024-08-12 12:59:55.056] [trace] Allocated capacity exceeded, erase duplicate values...
[2024-08-12 13:00:01.495] [trace] Erasing duplicate values done. Size reduced from 536866737 to 536855679, 4294.845432 MB
[2024-08-12 13:00:32.088] [trace] Allocated capacity exceeded, erase duplicate values...
[2024-08-12 13:00:41.516] [trace] Erasing duplicate values done. Size reduced from 805305679 to 805304842, 6442.438736 MB
[2024-08-12 13:01:47.530] [trace] Allocated capacity exceeded, erase duplicate values...
[2024-08-12 13:02:03.825] [trace] Erasing duplicate values done. Size reduced from 1342154842 to 1342153029, 10737.224232 MB
[2024-08-12 13:03:41.677] [trace] Allocated capacity exceeded, erase duplicate values...
[2024-08-12 13:04:07.456] [trace] Erasing duplicate values done. Size reduced from 2147478029 to 2147474973, 17179.799784 MB
[2024-08-12 13:06:09.002] [trace] Allocated capacity exceeded, erase duplicate values...
[2024-08-12 13:06:48.169] [trace] Erasing duplicate values done. Size reduced from 3221224973 to 3221220724, 25769.765792 MB
[2024-08-12 13:11:07.906] [trace] Allocated capacity exceeded, erase duplicate values...
[2024-08-12 13:12:15.582] [trace] Erasing duplicate values done. Size reduced from 5368695724 to 5368685772, 42949.486176 MB
[2024-08-12 13:18:50.362] [trace] Allocated capacity exceeded, erase duplicate values...
[2024-08-12 13:20:40.890] [trace] Erasing duplicate values done. Size reduced from 8589910772 to 8589895773, 68719.166184 MB
[2024-08-12 13:28:51.801] [trace] Allocated capacity exceeded, erase duplicate values...
[2024-08-12 13:31:39.630] [trace] Erasing duplicate values done. Size reduced from 12884895773 to 12884877898, 103079.023184 MB

@karasikov
Member

Hi Brandi, I was on holiday, sorry for the late response.

The logs look fine. Also, it seems like it gets killed after about 30 minutes, so relatively early in the process.

Not sure how we can troubleshoot this.

Can you upload the inputs somewhere or share the data, so I can run the same command myself?

@bcantarel
Author

Sure, it's just the nt database from NCBI: https://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nt.gz
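
(Downloaded with something like: wget https://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nt.gz)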

@karasikov
Member

Hi Brandi,

I see that it's using k=20 in the logs, and it's also constructing from a KMC file (and with a small buffer of only 1 GB).
In the first message, you wrote k=31 and constructing from fasta.
Could you please clarify which one you're using? Or do both of them fail?

@bcantarel
Author

I tried both -- k=20 was the latest attempt.

@bcantarel
Author

both failed

@karasikov
Member

karasikov commented Sep 5, 2024

Okay, I recommend using the first approach. This database contains non-redundant sequences, hence there is no need for KMC.

Can you please try rerunning it with a smaller buffer? 40 GB should always be enough.
Also, keep in mind that you can always construct it directly from the .gz file.

Next, this will need roughly 2 TB of disk swap. Can you please check how much free disk space you have in /mnt/tmp/?
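
For example, a quick check (assuming /mnt/tmp/ is the directory you pass to --disk-swap):

df -h /mnt/tmp/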

Last, the database contains roughly 1.3 trillion k-mers. I recommend building a graph with a smaller k first (say, k=20 as you've already tried) to streamline things, and trying a larger k later.
For k=20, according to my estimates, the graph will contain around 600 billion unique k-mers, which should make its construction a bit easier.

Also, you can add --inplace to reduce the RAM usage during the last stages of the graph construction.

So, the command would be:
metagraph build -v -k 20 -o graph -p 96 --disk-swap /mnt/tmp/ --inplace --mem-cap-gb 40 nt.fa.gz

@karasikov
Member

karasikov commented Sep 5, 2024

I just tried this on my laptop: ./metagraph build -k 20 -o test --mem-cap-gb 20 ~/Downloads/nt.fa.gz -v --inplace --disk-swap ./ -p 12
It processed ~80 chunks (20 GB per chunk, ~2.6 billion k-mers). This is about 16% of the whole construction, but I don't have enough disk swap on the laptop to complete it. (This only took ~50 min.)

So, with 756 GB of RAM and enough disk swap, the construction should run fine and should take around 6 hours.


UPD: I just ran ./metagraph build -k 17 -o test --mem-cap-gb 20 ~/Downloads/nt.fa.gz -v --disk-swap ./ -p 12 on my laptop. It took 4.5 hours to build the full graph. Building it with a larger k is only a question of having more RAM and disk swap, which you do have.
I suspect that your process got killed because the buffer was too large (700 GB on a 756 GB compute node, where some of that memory is already used by the system). Reducing it to 40 GB should solve the problem.
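
If you want to confirm that it was the kernel's OOM killer, the kernel log should have a record of it; on a typical Linux node something like the following will show it (these are generic commands, not metagraph-specific):

# check the kernel ring buffer for OOM kills
dmesg -T | grep -i -e 'out of memory' -e 'oom'
# or, on systemd-based systems
journalctl -k | grep -i oom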
