
resume/continue job #182

Open
xvazquezc opened this issue Feb 11, 2016 · 4 comments

@xvazquezc

Hi,
I've just installed GraftM on our cluster. I got GraftM (graft) to run, but it crashed because it wasn't allocated enough RAM.

02/11/2016 01:16:57 PM INFO: Working on 028-LFA_S1_R1
02/11/2016 01:16:57 PM INFO: Working on forward reads
02/11/2016 01:32:42 PM INFO: Found 573 read(s) that may be eukaryotic
02/11/2016 01:33:31 PM INFO: 10659 read(s) detected
02/11/2016 01:33:31 PM INFO: aligning reads to reference package database
02/11/2016 01:36:12 PM INFO: Filtered 1788 short sequences from the alignment
02/11/2016 01:36:12 PM INFO: 8871 sequences remaining
02/11/2016 01:36:12 PM INFO: Working on reverse reads
02/11/2016 01:50:51 PM INFO: Found 576 read(s) that may be eukaryotic
02/11/2016 01:51:40 PM INFO: 10606 read(s) detected
02/11/2016 01:51:40 PM INFO: aligning reads to reference package database
02/11/2016 01:54:20 PM INFO: Filtered 1782 short sequences from the alignment
02/11/2016 01:54:20 PM INFO: 8824 sequences remaining
02/11/2016 01:54:20 PM INFO: Placing reads into phylogenetic tree
=>> PBS: job killed: vmem 440318926848 exceeded limit 42949672960

I realised that there is no option for resuming or continuing a crashed job. I tried re-running the same command, but it stops to avoid overwriting the existing output directory. This is the message:

Traceback (most recent call last):
  File "/home/z3382651/bin/mypythondir/mypythonenv/mypythonenv/bin/graftM", line 345, in <module>
    Run(args).main()
  File "/home/z3382651/bin/mypythondir/mypythonenv/mypythonenv/lib/python2.7/site-packages/graftm/run.py", line 526, in main
    self.graft()
  File "/home/z3382651/bin/mypythondir/mypythonenv/mypythonenv/lib/python2.7/site-packages/graftm/run.py", line 238, in graft
    self.args.force)
  File "/home/z3382651/bin/mypythondir/mypythonenv/mypythonenv/lib/python2.7/site-packages/graftm/housekeeping.py", line 88, in make_working_directory
    raise Exception('Directory %s already exists. Exiting to prevent over-writing'% directory_path)
Exception: Directory graftm already exists. Exiting to prevent over-writing

Is there any way to resume, or at least a way to estimate the memory that will be used?

Thank you in advance,
Xabier

@geronimp
Owner

Hey Xabier

Firstly thank you for your interest in GraftM!

The step that uses the most memory in GraftM is pplacer. The more sequences within a GraftM package, the more memory is required. In the publication for pplacer they demonstrate that the memory requirements are linear with respect to the number of taxa in the tree (Fig. 3). We're working now on estimating the memory usage you can expect for each of the 16S rRNA GraftM packages - we will get back to you on this. May I ask what specific GraftM package you were using for this run?
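For a rough feel of what "linear in the number of taxa" means in practice, here's a minimal sketch (not GraftM code; the slope and intercept are hypothetical placeholders you would want to fit from a couple of your own runs before trusting the numbers):

# Rough linear model of pplacer memory use vs. reference tree size.
# NOTE: gb_per_taxon and base_gb are made-up placeholders, not measured values.
def estimate_pplacer_memory_gb(n_taxa, gb_per_taxon=0.0004, base_gb=1.0):
    """Rough memory estimate (GB) for a reference tree with n_taxa leaves."""
    return base_gb + gb_per_taxon * n_taxa

for n in (10000, 50000, 100000):
    print("{0} taxa -> ~{1:.1f} GB".format(n, estimate_pplacer_memory_gb(n)))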

Unfortunately there is currently no way of picking up a failed GraftM run. Hopefully in most instances the time to re-run graftM graft isn't too long. To overwrite the previous run you can use the --force flag.

Thanks again Xabier, you'll hear from us soon.

Joel

@xvazquezc
Author

Hi Joel,
I'm using GraftM 0.9.4.
If it helps, with 12 threads, it has gone over 450 GB of memory with the GreenGenes 97 package.
Thanks

PS: I think you have the wrong pplacer indicated in the README

@geronimp
Owner

Hey Xabier,

So we've tracked this down to an issue in the way memory usage is reported for pplacer.
The memory usage by the 97 GreenGenes package should range from 33 to 40 GB, depending on the number of threads used by the run. Unfortunately pplacer reports the memory allocated for the whole run as the memory of each individual process, meaning that the memory usage measured by PBS (and top) is the real memory usage multiplied by the number of threads. So in your case the actual amount of memory used was likely 450/12 ≈ 37.5 GB.
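If you want to sanity-check that arithmetic, here's a trivial sketch (nothing GraftM-specific):

# PBS and top count pplacer's shared allocation once per thread, so the
# reported vmem is roughly real_usage * n_threads. Recover the real usage:
def real_memory_gb(reported_gb, n_threads):
    return reported_gb / float(n_threads)

print(real_memory_gb(450, 12))  # ~37.5 GB, consistent with the expected 33-40 GB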

In the short term, a workaround would be to specify fewer threads overall to get past the memory cap. In the longer term we will raise an issue with pplacer and look at implementing a separate --pplacer_threads flag with which you could specify the number of threads used at this step.
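Until that flag exists, a quick sketch for picking a thread count that keeps the reported vmem under a scheduler cap (assuming the reported = real usage x threads behaviour described above):

import math

# Largest thread count whose *reported* vmem still fits under the PBS memory cap.
def max_threads_under_cap(vmem_cap_gb, real_usage_gb):
    return max(1, int(math.floor(vmem_cap_gb / float(real_usage_gb))))

print(max_threads_under_cap(40, 37.5))  # -> 1: a 40 GB cap leaves room for a single thread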

Apologies for the delayed reply on this one,

Joel

@xvazquezc
Author

Hi Joel,
Just following up on the memory requirements: I have quite a few samples, so I'm doing a bit of testing with the minimum number of cores, i.e. 1, keeping all other parameters the same, so that I can run more jobs in parallel (there is a cap on how many resources I can use at a time on the cluster, and 450 GB is a lot, as it has to be requested over several nodes).
In this case, with a single core, the job was requesting over 66 GB at the pplacer step (way more than expected; it crashed because of it).
I guess the memory requirements aren't very linear.
