run a dedicated supervisor/reaper for each job #85

mheily · 2016-06-24T02:27:27Z

Each job should have a dedicated supervisor/subreaper process. In practice, this means that jobd calls fork() one more time, and the parent side of that extra fork becomes the supervisor. The resulting process tree should look like:

jobd
  ||
supervisor/reaper
  ||
actual job process, defined by Program in the manifest

The subreaper function depends on calling procctl(2) (FreeBSD) or prctl(2) (Linux) to enable subreaping. Not all kernels support this, so the supervisor is allowed to run without it. Without a subreaper, jobs might not be fully cleaned up if they spawn additional daemons.
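As a rough sketch of the "enable it if the kernel has it" idea, something like the following could work. The helper name `become_subreaper` is made up for illustration and is not part of jobd; the only real calls are `prctl(PR_SET_CHILD_SUBREAPER, ...)` on Linux and `procctl(..., PROC_REAP_ACQUIRE, ...)` on FreeBSD.

```c
#include <stddef.h>
#include <unistd.h>

#if defined(__linux__)
#include <sys/prctl.h>
#elif defined(__FreeBSD__)
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/procctl.h>
#endif

/*
 * Hypothetical helper (not from jobd): try to make the calling process
 * a subreaper. Returns 0 on success, -1 if the kernel does not offer
 * the feature or the call fails. Callers are expected to carry on
 * without subreaping when this returns -1.
 */
static int become_subreaper(void)
{
#if defined(__linux__) && defined(PR_SET_CHILD_SUBREAPER)
    /* Linux: orphaned descendants are reparented to this process. */
    return prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) == 0 ? 0 : -1;
#elif defined(__FreeBSD__) && defined(PROC_REAP_ACQUIRE)
    /* FreeBSD: acquire reaper status for the current process. */
    return procctl(P_PID, getpid(), PROC_REAP_ACQUIRE, NULL) == 0 ? 0 : -1;
#else
    /* No known subreaper mechanism on this platform. */
    return -1;
#endif
}
```

A supervisor would call this once right after it is forked and treat a -1 result as "degraded but acceptable", matching the note above that the function is optional.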

"Supervisor" is actually a bit of a misnomer when compared to things like runit/s6/daemontools. The jobd supervisor will not keep the job alive by restarting it when it dies. Instead, it is a very dumb process that will just sit in a wait() call on its child process. If the child dies, the supervisor will hang around long enough to kill all the stray children, update the job database to reflect that the job is dead, and then exit.
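To make the "dumb process sitting in wait()" idea concrete, here is a minimal sketch of what the supervisor side could look like. `supervise_job` is a hypothetical name, not jobd code. The sketch only reaps: it waits for the job and for anything reparented to the supervisor, and it leaves out killing strays (enumerating them is platform-specific) and the job database update.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from jobd): fork the job, exec argv, then
 * block in waitpid() until the job and every other child of this
 * process (including orphans reparented to a subreaper) has exited.
 *
 * Returns the job's exit code, 128 + signal number if it was killed by
 * a signal, or -1 if fork() failed.
 */
static int supervise_job(char *const argv[])
{
    pid_t job = fork();
    if (job < 0)
        return -1;
    if (job == 0) {
        execvp(argv[0], argv);
        _exit(127);             /* exec failed */
    }

    int job_status = -1;
    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, 0);
        if (pid < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, retry */
            break;              /* ECHILD: nothing left to reap */
        }
        if (pid == job) {
            if (WIFEXITED(status))
                job_status = WEXITSTATUS(status);
            else if (WIFSIGNALED(status))
                job_status = 128 + WTERMSIG(status);
        }
        /* Any other pid is a stray descendant; reaping it is enough. */
    }

    /* A real supervisor would record the job as dead here, then exit. */
    return job_status;
}
```

For example, calling it with `{"/bin/sh", "-c", "exit 3", NULL}` returns 3 once the shell has exited.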

If you send a SIGTERM to the supervisor, it will send a SIGTERM to the child and then wait around for all grandchildren to exit. After a certain amount of time, it will forcibly kill all grandchildren.
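The stop sequence could be sketched roughly as below. This is only an illustration under a loud assumption: the job is the leader of its own process group, so stragglers can be force-killed with a single kill(2) on the group. A subreaper-based supervisor would more likely walk its actual descendants instead, and processes that leave the group would escape this sketch. `stop_job` and the grace period parameter are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from jobd). ASSUMPTION: `job` is the leader
 * of its own process group. Sends SIGTERM to the job, reaps children as
 * they exit, and after `grace_seconds` sends SIGKILL to the whole group.
 *
 * Returns 0 if everything exited within the grace period, 1 if SIGKILL
 * had to be used, -1 on an unexpected waitpid() error.
 */
static int stop_job(pid_t job, int grace_seconds)
{
    const struct timespec pause_50ms = { 0, 50 * 1000 * 1000 };
    time_t deadline = time(NULL) + grace_seconds;
    int killed = 0;

    kill(job, SIGTERM);
    for (;;) {
        pid_t pid = waitpid(-1, NULL, WNOHANG);
        if (pid > 0)
            continue;                   /* reaped one, look for more */
        if (pid < 0)
            return errno == ECHILD ? killed : -1;

        /* pid == 0: something is still running. */
        if (!killed && time(NULL) >= deadline) {
            kill(-job, SIGKILL);        /* grace period over */
            killed = 1;
        }
        nanosleep(&pause_50ms, NULL);
    }
}
```

Polling with WNOHANG keeps the sketch short; a SIGCHLD handler or a timer would avoid the 50 ms wakeups.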

So in effect, the supervisor/subreaper is really just a layer between jobd and its running jobs, with the purpose of handling reaping and process monitoring independently of jobd itself. Jobd will only be notified when a job and all its related processes are fully terminated. This simplifies jobd's internal state machine and child termination handling, and should make it easier for jobd itself to crash or be restarted without losing track of jobs that are in the middle of stopping.

catern · 2017-03-04T22:08:33Z

Have you considered the possibility of nesting jobs? That is, running jobd inside a job? Subreapers can be nested, unlike classical Unix process supervision mechanisms (process groups and sessions), so jobd can be nested in turn. This could allow for some nice hierarchical structure and modularity, rather than flattening the job tree to a single level with a single jobd.

mheily added the enhancement label Jun 24, 2016

mheily mentioned this issue Jun 24, 2016

libnv doesn't check malloc()'s return value #82

Closed
