Memory Requirements for Build #496
Comments
Hi Brandi, the command looks fine. Could you please also post the logs so we can see at which step it gets killed?
[2024-08-12 12:58:43.779] [trace] Metagraph started
Hi Brandi, I was on holiday, sorry for the late response. The logs look fine. It also seems that it gets killed after about 30 min, so relatively early in the run. I'm not sure how we can troubleshoot this. Can you upload the inputs somewhere or share the data, so I can run the same command myself?
Sure, it's just the nt database from the NCBI: https://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nt.gz
Hi Brandi, I see that it's using
I tried both; k=20 was the latest attempt.
Both failed.
Okay, I recommend using the first approach. This database contains non-redundant sequences, so there is no need for KMC. Can you please try rerunning it with a smaller buffer? 40 GB should always be enough. Next, this will need roughly 2 TB of disk swap. Can you please check how much free disk space you have in Last, the database contains roughly 1.3 trillion k-mers. I recommend building a graph with a smaller k first (say, k=20, as you've already tried) to streamline things, and trying a larger k later. Also, you can add So, the command would be:
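For the free-disk-space check mentioned above, here is a minimal sketch, assuming a POSIX `df` and `awk`. The helper names `free_gib` and `has_free_gib` are hypothetical, and the usage line assumes that the swap directory is the `/mnt/tmp/` passed to `--disk-swap` in the original command and that "roughly 2 TB" can be approximated as 2000 GiB.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the free space, in whole GiB, on the
# filesystem that holds the directory given as $1.
free_gib() {
    # -P: portable output, one line per filesystem; -k: sizes in 1024-byte blocks.
    # Column 4 of the second line is the available space.
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Hypothetical helper: succeed only if directory $1 has at least $2 GiB free.
has_free_gib() {
    [ "$(free_gib "$1")" -ge "$2" ]
}

# Usage (assumptions: /mnt/tmp/ is the disk-swap directory, ~2 TB ~ 2000 GiB):
#   has_free_gib /mnt/tmp/ 2000 || echo "not enough free space for disk swap" >&2
```

Running the check before starting the build avoids finding out hours into the run that the swap directory has filled up.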
I just tried this on my laptop So, with 756 GB RAM and enough disk swap, the construction should run fine and should take around 6 hours. UPD: I just ran
I am trying to build an index for NT (the NCBI non-redundant nucleotide database). I am using a machine with 96 CPUs and 756 GB of RAM. I have tried:
metagraph build -v -k 31 -o graph -p 96 --disk-swap /mnt/tmp/ --mem-cap-gb 700 nt.fasta
But the job is "killed"
I have also tried to use KMC to count the k-mers prior to running metagraph build.
Any advice on the command or resources needed to build a graph for ~ 1.5TB of sequence data?