run a dedicated supervisor/reaper for each job #85

Open
mheily opened this issue Jun 24, 2016 · 1 comment

Comments

@mheily
Owner

mheily commented Jun 24, 2016

Each job should have a dedicated supervisor/subreaper process. In practice, this means that jobd calls fork() one more time: the intermediate process becomes the supervisor, and it is the parent of the actual job. The process tree should look like:

jobd
  ||
supervisor/reaper
  ||
actual job process, defined by Program in the manifest

The subreaper function depends on calling procctl(2) (FreeBSD) or prctl(2) (Linux) to enable subreaping. Not all kernels support this, so the supervisor is allowed to run without it. Without a subreaper, jobs might not be fully cleaned up if they spawn additional daemons.
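As a rough sketch of what "enable subreaping if the kernel supports it" could look like, assuming Linux's PR_SET_CHILD_SUBREAPER and FreeBSD's PROC_REAP_ACQUIRE are the intended calls. The helper name enable_subreaper() is made up for illustration and is not part of jobd:

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__linux__)
#include <sys/prctl.h>
#elif defined(__FreeBSD__)
#include <sys/procctl.h>
#endif

/*
 * Hypothetical helper: ask the kernel to make the calling process a
 * subreaper, so that orphaned descendants are reparented to it
 * instead of to init.
 *
 * Returns 0 on success. Returns -1 with errno set if the request
 * failed or if this platform has no subreaper facility; the caller
 * is expected to carry on without one in that case.
 */
int enable_subreaper(void)
{
#if defined(__linux__) && defined(PR_SET_CHILD_SUBREAPER)
    return prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0);
#elif defined(__FreeBSD__) && defined(PROC_REAP_ACQUIRE)
    return procctl(P_PID, getpid(), PROC_REAP_ACQUIRE, NULL);
#else
    errno = ENOSYS;
    return -1;
#endif
}
```

The supervisor would call this once, right after it is forked and before it starts the job. A failure here is not fatal; it only means stray daemons may escape cleanup.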

The term "supervisor" is actually a bit of a misnomer when compared to things like runit/s6/daemontools. The jobd supervisor will not keep the job alive by restarting it when it dies. Instead, it is a very dumb process that just sits in a wait() call on its child process. If the child dies, the supervisor will hang around long enough to kill all the stray children, update the job database to reflect that the job is dead, and then exit.
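A minimal sketch of that wait() loop, under some assumptions: supervise_job() is a hypothetical name, the job is started with execvp(), and the "kill strays" and "update the job database" steps are left out. It only shows the blocking wait on the job followed by reaping whatever else is still parented to the supervisor:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical supervisor body. Forks, runs argv as the job, and
 * blocks until the job exits. The job's wait status is stored in
 * *job_status. Afterwards, every remaining child (including orphans
 * reparented to us when subreaping is enabled) is reaped.
 *
 * Returns 0 on success, -1 if fork() or waitpid() failed.
 */
int supervise_job(char *const argv[], int *job_status)
{
    pid_t job = fork();
    if (job < 0)
        return -1;
    if (job == 0) {
        execvp(argv[0], argv);
        _exit(127);                 /* exec failed */
    }

    /* Sit in wait() until the job itself is gone. */
    while (waitpid(job, job_status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }

    /* Reap leftovers until the kernel reports no children (ECHILD). */
    for (;;) {
        pid_t pid = waitpid(-1, NULL, 0);
        if (pid < 0 && errno != EINTR)
            break;
    }

    /* A real supervisor would update the job database here. */
    return 0;
}
```

Note that this version blocks forever if a stray process never exits; it would need to be combined with a kill step and a timeout.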

If you send a SIGTERM to the supervisor, it will send a SIGTERM to the child and then wait around for all grandchildren to exit. After a certain amount of time, it will forcibly kill all grandchildren.
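The SIGTERM-then-force-kill sequence might be sketched like this. Several things here are assumptions rather than anything stated above: the name stop_job(), the grace period being a parameter in seconds, and especially the use of the job's process group to reach the grandchildren for the final SIGKILL (a daemon that calls setsid() would escape it, so a real implementation needs another way to enumerate strays):

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Reap all exited children without blocking.
 * Returns 1 if no children remain, 0 if some are still running. */
static int reap_all_nonblocking(void)
{
    for (;;) {
        pid_t pid = waitpid(-1, NULL, WNOHANG);
        if (pid > 0)
            continue;               /* reaped one, look for more */
        if (pid == 0)
            return 0;               /* children exist, none exited yet */
        if (errno == EINTR)
            continue;
        return 1;                   /* ECHILD: nothing left to reap */
    }
}

/*
 * Hypothetical stop routine, run by the supervisor on SIGTERM.
 * Sends SIGTERM to the job, waits up to grace_seconds for all
 * children to exit, then falls back to SIGKILL.
 * ASSUMPTION: the job is the leader of its own process group, so
 * kill(-job, ...) reaches descendants that stayed in that group.
 *
 * Returns 0 if everything exited within the grace period,
 * 1 if SIGKILL was needed.
 */
int stop_job(pid_t job, unsigned grace_seconds)
{
    const struct timespec tick = { 0, 100 * 1000 * 1000 };  /* 100 ms */
    time_t deadline = time(NULL) + (time_t)grace_seconds;

    kill(job, SIGTERM);
    while (time(NULL) < deadline) {
        if (reap_all_nonblocking())
            return 0;
        nanosleep(&tick, NULL);
    }

    /* Grace period expired: forcibly kill the job and its group. */
    kill(job, SIGKILL);
    kill(-job, SIGKILL);
    while (!reap_all_nonblocking())
        nanosleep(&tick, NULL);
    return 1;
}
```

Polling with WNOHANG keeps the sketch short. A SIGCHLD handler or a timed wait would avoid the 100 ms ticks.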

So in effect, the supervisor/subreaper is really just a layer between jobd and its running jobs. Its purpose is to handle reaping and process monitoring independently of jobd itself. Jobd will only be notified when a job and all its related processes are fully terminated. This simplifies jobd's internal state machine and child termination handling, and should make it easier for jobd itself to crash or be restarted without losing track of jobs that are in the middle of stopping.

@catern

catern commented Mar 4, 2017

Have you considered the possibility of nesting jobs? That is, running jobd inside a job? Subreapers can be nested, unlike classical Unix process supervision mechanisms (process groups and sessions), so jobd can be nested in turn. This could allow for some nice hierarchical structure and modularity, rather than flattening the job tree to a single level with a single jobd.
