Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't terminate DL binary client subprocesses when the parent process is SIGTERM'd #77

Merged
merged 1 commit into from
Mar 11, 2024

Conversation

airhorns
Copy link
Contributor

@airhorns airhorns commented Mar 9, 2024

We were seeing failures in Gadget during deploys where a running pod would be doing DL work and get SIGTERM'd by kubernetes. Normally, unixes will only deliver that signal to the parent process, but execa takes care to propagate SIGTERMs from the parent to any spawned children. This has the effect of prematurely killing a dl binary client subprocess mid operation. We use graceful shutdown in gadget to allow currently-running operations within our workers to finish, and so to allow that, we don't want these signals propagated.

… is SIGTERM'd

We were seeing failures in Gadget during deploys where a running pod
would be doing DL work and get SIGTERM'd by kubernetes. Normally, unixes
will only deliver that signal to the parent process, but `execa` takes
care to propagate SIGTERMs from the parent to any spawned children. This
has the effect of prematurely killing a dl binary client subprocess mid
operation. We use graceful shutdown in gadget to allow currently-running
operations within our workers to finish, and so to allow that, we don't
want these signals propagated.
@airhorns airhorns requested a review from angelini March 9, 2024 21:41
@angelini angelini merged commit 457d162 into main Mar 11, 2024
4 checks passed
@angelini
Copy link
Contributor

@airhorns I'm gonna manage deploying a new DL version and getting this into Gadget.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants