SIGINT not honored by Patroni anymore #499

Open
talpa-robin opened this issue Oct 18, 2024 · 1 comment

@talpa-robin

Hey Team :)

We're using the image tag timescale/timescaledb-ha:pg13.16-ts2.15.3 with Patroni. Since the last update to that tag (which brought an upgrade to Patroni 4.0.2 and the STOPSIGNAL change from SIGTERM to SIGINT, see #492), the delete/stop commands from Kubernetes no longer lead to a graceful shutdown. Nothing happens in the pods or their processes, and the container is forcefully killed once terminationGracePeriodSeconds expires.

Here are the relevant processes running inside the pod for a replica of a three-node HA setup:

USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
postgres       1  0.0  0.2  50212 34904 ?        Ss   10:00   0:00 /usr/bin/python3 /usr/bin/patroni /etc/timescaledb/patroni.yaml
postgres      15  0.1  0.2 583492 37044 ?        Sl   10:00   0:18 /usr/bin/python3 /usr/bin/patroni /etc/timescaledb/patroni.yaml
postgres     384  0.0  0.8 3786888 129736 ?      S    10:17   0:00 postgres -D /var/lib/postgresql/data --config-file=/var/lib/postgresql/data/postgresql.conf --listen_addresses=0.0.0.0 --po
postgres     386  0.0  3.7 3787292 600200 ?      Ss   10:17   0:09 postgres: xxx-timescaledb-xxx: startup recovering 00000096000032CE00000034
postgres     393  0.0  3.6 3787032 580960 ?      Ss   10:17   0:06 postgres: xxx-timescaledb-xxx: checkpointer 
postgres     394  0.0  0.2 3786888 37240 ?       Ss   10:17   0:00 postgres: xxx-timescaledb-xxx: background writer 
postgres     395  0.0  0.0  73260  9108 ?        Ss   10:17   0:03 postgres: xxx-timescaledb-xxx: stats collector 
postgres     402  0.0  0.1 3790584 28364 ?       Ss   10:17   0:00 postgres: xxx-timescaledb-xxx: postgres postgres [local] idle
postgres     404  0.0  0.2 3791024 32060 ?       Ss   10:17   0:02 postgres: xxx-timescaledb-xxx: postgres postgres [local] idle
postgres    2337  0.0  0.1 3790236 28708 ?       Ss   12:26   0:00 postgres: xxx-timescaledb-xxx: postgres postgres [local] idle
postgres    2349  0.1  0.0 3787784 15152 ?       Ss   12:26   0:11 postgres: xxx-timescaledb-xxx: walreceiver streaming 32CE/34C2B6F8

Sending a SIGINT to PID 1 or PID 15 has no effect (we also simulated this with kill -s SIGINT <PID>). The auditd logs on the host machine show that PID 1 receives the SIGINT but PID 15 does not. Sending a SIGTERM works as expected.
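
One way to narrow this down is to check how each process treats SIGINT. On Linux, /proc/<PID>/status exposes the blocked, ignored, and caught signal sets as hexadecimal bitmasks (SigBlk, SigIgn, SigCgt), where bit N-1 stands for signal N. The sketch below decodes those masks. The helper names are made up for illustration, and nothing here was run against the affected pod.

```python
import signal


def signal_in_mask(mask_hex: str, signo: int) -> bool:
    """Return True if signal `signo` is set in a /proc status hex bitmask."""
    # Bit (signo - 1) of the mask corresponds to signal number `signo`.
    return bool((int(mask_hex, 16) >> (signo - 1)) & 1)


def sigint_disposition(status_text: str) -> dict:
    """Report whether SIGINT is blocked, ignored, or caught.

    `status_text` is the content of /proc/<PID>/status (hypothetical helper).
    """
    result = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("SigBlk", "SigIgn", "SigCgt"):
            result[key] = signal_in_mask(value.strip(), signal.SIGINT)
    return result
```

Calling sigint_disposition(open("/proc/1/status").read()) inside the pod, and the same for PID 15, would show whether either process blocks or ignores SIGINT, or has no handler installed for it.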

We first thought it might be a problem with Patroni, but the folks in the Patroni Slack couldn't reproduce it, and our internal tests with this setup https://github.com/patroni/patroni/tree/master/docker confirm that Patroni works as intended there.

Hope you can help. If you need more information, don't hesitate to ask :)

Have a great weekend

@graveland
Collaborator

That is odd for sure... Patroni's main.py has:

    signal.signal(signal.SIGINT, passtochild)

and that should be called when Patroni is running as PID 1.
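
For context, the forwarding pattern that line belongs to can be sketched roughly like this: a supervising parent installs a handler that relays SIGINT to its worker child. This is a minimal, self-contained illustration and not Patroni's actual code. The names forward_sigint_demo and pass_to_child are invented here, and the parent signals itself only as a stand-in for `kill -s SIGINT 1`.

```python
import os
import signal
import time


def forward_sigint_demo() -> int:
    """Fork a worker, relay one SIGINT to it, and return the worker's exit code."""
    ready_r, ready_w = os.pipe()
    child_pid = os.fork()
    if child_pid == 0:
        # Worker: treat SIGINT as a request for a clean shutdown.
        os.close(ready_r)
        signal.signal(signal.SIGINT, lambda signo, frame: os._exit(0))
        os.write(ready_w, b"1")  # tell the parent the handler is installed
        while True:
            time.sleep(0.05)

    # Parent (PID 1 in the container scenario): relay SIGINT to the worker.
    os.close(ready_w)

    def pass_to_child(signo, frame):
        os.kill(child_pid, signo)

    previous = signal.signal(signal.SIGINT, pass_to_child)
    try:
        os.read(ready_r, 1)                  # wait until the worker is ready
        os.close(ready_r)
        os.kill(os.getpid(), signal.SIGINT)  # stand-in for `kill -s SIGINT 1`
        _, status = os.waitpid(child_pid, 0)
    finally:
        signal.signal(signal.SIGINT, previous)  # restore the old handler

    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)
```

If the relay works, the worker exits with code 0. In the report above the signal reaches PID 1 but never PID 15, so whatever breaks the chain sits between receiving the signal and this kind of relay.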
