Dead kernel submits new job to queue on restart #12

Open
tdaff opened this issue Dec 3, 2015 · 2 comments

Comments

@tdaff
Owner

tdaff commented Dec 3, 2015

Original report by Scott Field (Bitbucket: sfield83).


Hi Tom,

Sometimes I'll need to wait a few minutes for my job to start (using the very awesome feature of remotely starting jobs from a batch submission system). If the kernel dies, a brand new job is submitted upon restart. This results in two jobs sitting in the queue.

So far, I've only had a problem on PBS systems.

Best,
Scott

@tdaff
Owner Author

tdaff commented Dec 11, 2015

Original comment by Tom Daff (Bitbucket: tdaff, GitHub: tdaff).


Hi Scott,

Thanks for the report, and I'm happy that you are still finding the code useful :)

I'm still thinking about how to deal with this. I think the main issue is that the kernel runs in a subprocess; upon restart the subprocess gets killed completely and a new one starts. The original PBS job probably lingers around until it times out. Does the job go away by itself eventually (after maybe 10 minutes)?

Are you wondering whether it is possible to re-use a job for subsequent kernels? That might be possible, but it would need significant re-engineering to persist an active connection between different Python processes. I know it is annoying when jobs take a while in the queue, though.
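Persisting the connection between Python processes, as described above, would roughly amount to writing the queued job's details somewhere a restarted process can find them. A minimal sketch of that idea (hypothetical helper functions and file layout, not the project's actual API):

```python
import json
import os


def save_job_state(job_id, connection_file, path):
    """Persist the batch job id and the kernel's connection file
    location so a restarted process could reattach to the existing
    job instead of submitting a new one."""
    with open(path, "w") as fh:
        json.dump({"job_id": job_id, "connection_file": connection_file}, fh)


def load_job_state(path):
    """Return previously saved job state, or None if nothing was saved."""
    if not os.path.exists(path):
        return None
    with open(path) as fh:
        return json.load(fh)
```

A restarting kernel manager could call `load_job_state` first and only submit to the queue when it returns None (or when the saved job turns out to be gone).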

@tdaff
Owner Author

tdaff commented Dec 11, 2015

Original comment by Scott Field (Bitbucket: sfield83).


Hi Tom,

I've never allowed the rogue job to linger too long, so I'm not really sure. But would the queuing system even be alerted to the fact that the kernel has died? The job might just sit in the queue until it starts running, and then run to completion (i.e. nothing happens until the requested wall time is exhausted).

Anyway, it's a very minor issue. It would be very nice to re-use the job for the restarted kernel. Or, for a full cleanup, the main process could call the system's qdel command immediately after the kernel subprocess is killed.

Scott
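The cleanup Scott suggests could be done with an exit hook in the main process: register a callback that runs qdel (PBS's job-deletion command) when the process shuts down. A minimal sketch, assuming the job id is tracked somewhere (the `qdel_cmd` parameter is only there so the hook can be exercised without a real PBS installation):

```python
import atexit
import subprocess


def schedule_cleanup(job_id, qdel_cmd="qdel"):
    """Register a cleanup hook that deletes the queued job when this
    process exits, so a dead kernel doesn't leave a job in the queue.

    Returns the callback so it can also be invoked directly, e.g. right
    after the kernel subprocess is killed.
    """
    def _cleanup():
        # Ignore the exit status: the job may already have finished
        # or been removed by the time this runs.
        return subprocess.call([qdel_cmd, str(job_id)])

    atexit.register(_cleanup)
    return _cleanup
```

This only covers clean interpreter exits; if the main process is killed with SIGKILL the atexit hook never runs, so the qdel call would ideally also be made explicitly in the kernel-restart path.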
