
resume/continue job #182

Open
xvazquezc opened this issue Feb 11, 2016 · 4 comments

@xvazquezc

Hi,
I've just installed GraftM on our cluster. I got GraftM (graft) to run, but it crashed because it wasn't allocated enough RAM.

02/11/2016 01:16:57 PM INFO: Working on 028-LFA_S1_R1
02/11/2016 01:16:57 PM INFO: Working on forward reads
02/11/2016 01:32:42 PM INFO: Found 573 read(s) that may be eukaryotic
02/11/2016 01:33:31 PM INFO: 10659 read(s) detected
02/11/2016 01:33:31 PM INFO: aligning reads to reference package database
02/11/2016 01:36:12 PM INFO: Filtered 1788 short sequences from the alignment
02/11/2016 01:36:12 PM INFO: 8871 sequences remaining
02/11/2016 01:36:12 PM INFO: Working on reverse reads
02/11/2016 01:50:51 PM INFO: Found 576 read(s) that may be eukaryotic
02/11/2016 01:51:40 PM INFO: 10606 read(s) detected
02/11/2016 01:51:40 PM INFO: aligning reads to reference package database
02/11/2016 01:54:20 PM INFO: Filtered 1782 short sequences from the alignment
02/11/2016 01:54:20 PM INFO: 8824 sequences remaining
02/11/2016 01:54:20 PM INFO: Placing reads into phylogenetic tree
=>> PBS: job killed: vmem 440318926848 exceeded limit 42949672960

I realised that there is no option for resuming or continuing a crashed job. I tried re-running the same command, but it stops to avoid overwriting the existing output directory. This is the message:

Traceback (most recent call last):
  File "/home/z3382651/bin/mypythondir/mypythonenv/mypythonenv/bin/graftM", line 345, in <module>
    Run(args).main()
  File "/home/z3382651/bin/mypythondir/mypythonenv/mypythonenv/lib/python2.7/site-packages/graftm/run.py", line 526, in main
    self.graft()
  File "/home/z3382651/bin/mypythondir/mypythonenv/mypythonenv/lib/python2.7/site-packages/graftm/run.py", line 238, in graft
    self.args.force)
  File "/home/z3382651/bin/mypythondir/mypythonenv/mypythonenv/lib/python2.7/site-packages/graftm/housekeeping.py", line 88, in make_working_directory
    raise Exception('Directory %s already exists. Exiting to prevent over-writing'% directory_path)
Exception: Directory graftm already exists. Exiting to prevent over-writing

Is there any way to resume, or at least a way to estimate the memory that will be used?

Thank you in advance,
Xabier

@geronimp
Owner

Hey Xabier

Firstly thank you for your interest in GraftM!

The step that uses the most memory in GraftM is pplacer. The more sequences within a GraftM package, the more memory is required. In the publication for pplacer they demonstrate that the memory requirements are linear with respect to the number of taxa in the tree (Fig. 3). We're working now on estimating the memory usage you can expect for each of the 16S rRNA GraftM packages - we will get back to you on this. May I ask what specific GraftM package you were using for this run?
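For a rough feel of what "linear in the number of taxa" means in practice, here's a minimal sketch (not GraftM code; the slope and intercept are hypothetical placeholders you would want to fit from a couple of your own runs before trusting the numbers):

# Rough linear model of pplacer memory use vs. reference tree size.
# NOTE: gb_per_taxon and base_gb are made-up placeholders, not measured values.
def estimate_pplacer_memory_gb(n_taxa, gb_per_taxon=0.0004, base_gb=1.0):
    """Rough memory estimate (GB) for a reference tree with n_taxa leaves."""
    return base_gb + gb_per_taxon * n_taxa

for n in (10000, 50000, 100000):
    print("{0} taxa -> ~{1:.1f} GB".format(n, estimate_pplacer_memory_gb(n)))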

Unfortunately there is currently no way of picking up a failed GraftM run. Hopefully in most instances the time to re-run graftM graft isn't too long. To overwrite the previous run you can use the --force flag.

Thanks again Xabier, you'll hear from us soon.

Joel

@xvazquezc
Author

Hi Joel,
I'm using GraftM 0.9.4.
If it helps, with 12 threads, it has gone over 450 GB of memory with the GreenGenes 97 package.
Thanks

PS: I think you have the wrong pplacer indicated in the README

@geronimp
Owner

Hey Xabier,

So we've tracked this down to an issue in the way memory usage is reported for pplacer.
The memory usage by the 97 GreenGenes package should range from 33 to 40 GB, depending on the number of threads used by the run. Unfortunately pplacer reports the memory allocated for the whole run as the memory of each individual process, meaning that the memory usage measured by PBS (and top) is the real memory usage multiplied by the number of threads. So in your case the actual amount of memory used was likely 450/12 ≈ 37.5 GB.
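If you want to sanity-check that arithmetic, here's a trivial sketch (nothing GraftM-specific):

# PBS and top count pplacer's shared allocation once per thread, so the
# reported vmem is roughly real_usage * n_threads. Recover the real usage:
def real_memory_gb(reported_gb, n_threads):
    return reported_gb / float(n_threads)

print(real_memory_gb(450, 12))  # ~37.5 GB, consistent with the expected 33-40 GB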

In the short term, a workaround would be to specify fewer threads overall to get past the memory cap. In the longer term we will raise an issue with pplacer and look at implementing a separate --pplacer_threads flag with which you could specify the number of threads used at this step.
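Until that flag exists, a quick sketch for picking a thread count that keeps the reported vmem under a scheduler cap (assuming the reported = real usage x threads behaviour described above):

import math

# Largest thread count whose *reported* vmem still fits under the PBS memory cap.
def max_threads_under_cap(vmem_cap_gb, real_usage_gb):
    return max(1, int(math.floor(vmem_cap_gb / float(real_usage_gb))))

print(max_threads_under_cap(40, 37.5))  # -> 1: a 40 GB cap leaves room for a single thread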

Apologies for the delayed reply on this one,

Joel

@xvazquezc
Author

Hi Joel,
Just following up on the memory requirements: I have quite a few samples, so I'm doing a bit of testing with the minimum number of cores, i.e. 1, keeping all other parameters the same, so that I can run more jobs in parallel (there is a cap on how many resources I can use at a time on the cluster, and 450 GB is a lot, as it has to be requested over several nodes).
In this case, with a single core, the job was requesting over 66 GB at the pplacer step (way more than expected; it crashed because of it).
I guess the memory requirements aren't very linear.
