Each job should have a dedicated supervisor/subreaper process. In practice, this means that jobd calls fork() one more time, and the parent of the job process becomes the supervisor. The process tree should look like this:
jobd
||
supervisor/reaper
||
actual job process, defined by Program in the manifest
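Below is a minimal C sketch of the extra fork, assuming a POSIX system. `spawn_supervised` is a hypothetical name, not jobd's actual code. Subreaping, signal handling and the job database update are deliberately left out. In this sketch the supervisor simply exits with the job's exit code, or 128 plus the signal number if the job was killed.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not jobd's actual API): jobd forks a supervisor,
 * and the supervisor forks the real job. Returns the supervisor's PID to
 * the caller (jobd), or -1 if the first fork() fails. */
pid_t spawn_supervised(const char *path, char *const argv[])
{
    pid_t supervisor = fork();
    if (supervisor != 0)
        return supervisor;              /* in jobd: supervisor PID or -1 */

    /* Now running as the supervisor: fork once more for the job itself. */
    pid_t job = fork();
    if (job < 0)
        _exit(127);
    if (job == 0) {
        execv(path, argv);              /* job: run Program from the manifest */
        _exit(127);                     /* only reached if exec failed */
    }

    /* Supervisor: block until the job exits, then exit with its status. */
    int status;
    pid_t r;
    do {
        r = waitpid(job, &status, 0);
    } while (r < 0 && errno == EINTR);
    if (r < 0)
        _exit(127);
    _exit(WIFEXITED(status) ? WEXITSTATUS(status) : 128 + WTERMSIG(status));
}
```

With this split, jobd would call waitpid() on the returned supervisor PID instead of on the job itself.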
The subreaper function depends on calling procctl(2) or prctl(2) to enable subreaping. Not all kernels support this, so the subreaper function is optional. Without a subreaper, jobs that spawn additional daemons might not be fully cleaned up.
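Here is a sketch of how the optional subreaper call might look. `enable_subreaper` is a hypothetical helper. On Linux 3.4 and later it uses `prctl(PR_SET_CHILD_SUBREAPER, ...)`. On FreeBSD it uses `procctl(P_PID, getpid(), PROC_REAP_ACQUIRE, NULL)`. On other platforms it reports `ENOSYS`, so the caller can carry on without subreaping.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>
#if defined(__linux__)
#include <sys/prctl.h>
#elif defined(__FreeBSD__)
#include <sys/wait.h>
#include <sys/procctl.h>
#endif

/* Ask the kernel to make the calling process a subreaper, so orphaned
 * descendants are re-parented to it instead of to init. Returns 0 on
 * success, or -1 with errno set (e.g. EINVAL on Linux kernels older than
 * 3.4, or ENOSYS where this sketch has no mechanism at all). Failure is
 * meant to be non-fatal: the supervisor still works, it just cannot catch
 * processes that escape by double-forking. */
int enable_subreaper(void)
{
#if defined(__linux__)
    return prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0);
#elif defined(__FreeBSD__)
    return procctl(P_PID, getpid(), PROC_REAP_ACQUIRE, NULL);
#else
    errno = ENOSYS;
    return -1;
#endif
}
```

The supervisor would call this right after the first fork() and before forking the job. On failure it would log the error rather than abort.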
"Supervisor" is actually a bit of a misnomer when compared to things like runit/s6/daemontools: the jobd supervisor will not keep the job alive by restarting it when it dies. Instead, it is a very simple process that just sits in a wait() call on its child process. When the child dies, the supervisor stays around just long enough to kill any stray processes, update the job database to reflect that the job is dead, and then exit.
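The sketch below shows the supervisor's wait loop. It relies on two assumptions that the text does not spell out. First, the job calls `setpgid(0, 0)`, so its PID is also the process group ID of its descendants. Second, the supervisor has already enabled subreaping. `supervise` is a hypothetical name, and the database update is only marked with a comment.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Supervisor main loop (sketch). Assumes the job leads its own process
 * group (pgid == job PID) and that the supervisor is a subreaper, so
 * orphaned descendants are re-parented to it. Returns the job's raw wait
 * status once nothing is left to reap. */
int supervise(pid_t job)
{
    int job_status = 0, status;

    for (;;) {
        pid_t pid = waitpid(-1, &status, 0);
        if (pid < 0) {
            if (errno == EINTR)
                continue;
            break;                      /* ECHILD: nothing left to reap */
        }
        if (pid == job) {
            job_status = status;
            /* The job is dead: kill any strays left in its process group.
             * ESRCH (no such group) just means there were none. */
            kill(-job, SIGKILL);
        }
        /* Any other pid is a descendant that was re-parented to us and
         * has now been reaped. */
    }
    /* A real supervisor would update the job database here (not shown)
     * and then exit. */
    return job_status;
}
```

This sketch does not kill a stray that leaves the process group, for example via setsid(). The blocking waitpid(-1) would then wait for that process to exit on its own. A real implementation would need a more thorough way to find descendants.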
If you send SIGTERM to the supervisor, it sends SIGTERM to the child and then waits for all grandchildren to exit. If any are still running after a timeout, it forcibly kills them.
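The SIGTERM path might look roughly like the sketch below. It again assumes that the job leads its own process group, so "all grandchildren" is approximated as "every process in that group". `stop_job`, its `grace_seconds` parameter and the 100 ms polling interval are illustrative choices, not anything jobd specifies. For brevity, the sketch also discards the job's exit status.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Reap every child that has already exited, without blocking. */
static void reap_exited(void)
{
    while (waitpid(-1, NULL, WNOHANG) > 0)
        ;
}

/* True once process group `pgid` has no members left. */
static int group_is_empty(pid_t pgid)
{
    return kill(-pgid, 0) < 0 && errno == ESRCH;
}

/* What the supervisor might do on SIGTERM (sketch). Sends SIGTERM to the
 * job only, waits up to `grace_seconds` for its whole process group to
 * disappear, then SIGKILLs the group. Returns 1 if SIGKILL was needed,
 * 0 if everything exited on its own. */
int stop_job(pid_t job, unsigned grace_seconds)
{
    const struct timespec tick = { 0, 100000000L };   /* 100 ms */
    unsigned long ticks = grace_seconds * 10UL;

    kill(job, SIGTERM);                 /* ask the job itself to stop */
    for (unsigned long i = 0; i < ticks; i++) {
        reap_exited();
        if (group_is_empty(job))
            return 0;
        nanosleep(&tick, NULL);
    }

    kill(-job, SIGKILL);                /* grace period expired */
    while (!group_is_empty(job)) {
        reap_exited();
        nanosleep(&tick, NULL);
    }
    return 1;
}
```

Polling keeps the sketch short. A real supervisor could instead block in waitpid() and use a timer signal to trigger the escalation.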
So in effect, the supervisor/subreaper is just a layer between jobd and its running jobs. It handles reaping and process monitoring independently of jobd itself. Jobd will only be notified once a job and all of its related processes have fully terminated. This simplifies jobd's internal state machine and child-termination handling. It should also let jobd itself crash or be restarted without losing track of jobs that are in the middle of stopping.
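The issue does not say how the job database is stored. As a purely hypothetical illustration, suppose it is one small status file per job label (`<dir>/<label>.status`). The supervisor writes this record only after every descendant has been reaped. It writes atomically, via a temporary file and POSIX `rename()`. A jobd that crashed or restarted in the meantime can then still read the final state of a job that stopped. `record_job_exit`, `read_job_exit` and the file format are all assumptions.

```c
#include <stdio.h>

/* Hypothetical layout: "<dir>/<label>.status" holds a job's final state.
 * The supervisor calls this only after every descendant has been reaped.
 * Returns 0 on success, -1 on error. */
int record_job_exit(const char *dir, const char *label, int exit_code)
{
    char tmp[4096], path[4096];
    int n1 = snprintf(tmp, sizeof tmp, "%s/%s.status.tmp", dir, label);
    int n2 = snprintf(path, sizeof path, "%s/%s.status", dir, label);
    if (n1 < 0 || n2 < 0 || (size_t)n1 >= sizeof tmp || (size_t)n2 >= sizeof path)
        return -1;                      /* path too long */

    /* Write the record to a temporary file first... */
    FILE *f = fopen(tmp, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "state=stopped\nexit_code=%d\n", exit_code) > 0;
    if (fclose(f) != 0)
        ok = 0;
    if (!ok) {
        remove(tmp);
        return -1;
    }
    /* ...then rename it into place. POSIX rename() replaces the target
     * atomically, so a reader never sees a half-written record. */
    return rename(tmp, path) == 0 ? 0 : -1;
}

/* jobd side, e.g. after a restart: returns 0 and stores the exit code if
 * the job has a final record, or -1 if there is none (yet). */
int read_job_exit(const char *dir, const char *label, int *exit_code)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/%s.status", dir, label);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int matched = fscanf(f, "state=stopped exit_code=%d", exit_code);
    fclose(f);
    return matched == 1 ? 0 : -1;
}
```

Surviving a jobd crash only requires the data to reach the kernel, which is what happens here. Surviving an operating system crash would additionally need fsync() before the rename.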
Have you considered the possibility of nesting jobs? That is, running jobd inside a job? Subreapers can be nested, unlike classical Unix process supervision mechanisms (process groups and sessions), so jobd can be nested in turn. This could allow for some nice hierarchical structure and modularity, rather than flattening the job tree to a single level with a single jobd.
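The following Linux-only sketch shows that subreapers do nest. The calling process plays an outer jobd and its child an inner, nested jobd, and both set `PR_SET_CHILD_SUBREAPER`. A job under the inner jobd then "daemonizes" by forking and exiting. According to prctl(2), an orphan is re-parented to its nearest living subreaper ancestor. The daemon should therefore end up as a child of the inner jobd rather than the outer one. `nested_subreaper_demo` is a hypothetical name used only for this illustration.

```c
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Linux-only demo (sketch). Returns 1 if the orphaned daemon was adopted
 * by the inner subreaper, 0 if by some other process, -1 on error. */
int nested_subreaper_demo(void)
{
    int fds[2];
    if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) < 0 || pipe(fds) < 0)
        return -1;                              /* we are the outer jobd */

    pid_t inner = fork();
    if (inner < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (inner == 0) {                           /* inner (nested) jobd */
        if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) < 0)
            _exit(255);
        pid_t job = fork();
        if (job < 0)
            _exit(255);
        if (job == 0) {                         /* job that daemonizes */
            pid_t self = getpid();
            if (fork() == 0) {                  /* the "daemon" */
                const struct timespec ms = { 0, 1000000L };
                while (getppid() == self)       /* wait until orphaned */
                    nanosleep(&ms, NULL);
                pid_t adopter = getppid();      /* who adopted us? */
                if (write(fds[1], &adopter, sizeof adopter) != (ssize_t)sizeof adopter)
                    _exit(1);
                _exit(0);
            }
            _exit(0);                           /* orphan the daemon */
        }
        close(fds[1]);
        pid_t adopter = 0;
        ssize_t n = read(fds[0], &adopter, sizeof adopter);
        while (waitpid(-1, NULL, 0) > 0)        /* reap job and daemon */
            ;
        _exit(n == (ssize_t)sizeof adopter && adopter == getpid() ? 1 : 0);
    }

    close(fds[0]);
    close(fds[1]);
    int status;
    if (waitpid(inner, &status, 0) != inner || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 255 ? -1 : WEXITSTATUS(status);
}
```

As a side effect, the calling process stays marked as a subreaper after the demo returns.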