Lock is not released if reboot of node is blocked #792

dhedberg · 2023-07-17T14:25:43Z

We want to prevent automatic reboots of any nodes where a particular set of pods is running. To that end, we have configured a --blocking-pod-selector. The idea is that while that particular node will not be rebooted for now, any other nodes should still be allowed to reboot if they can.

The problem seems to be that the lock is taken before the check for blocking pods but is not released when a blocking pod is found. Because the lock is still held, any other node that could reboot is prevented from doing so.
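A minimal Python sketch of the ordering described above, assuming a single cluster-wide lock modelled as an in-memory object. `ClusterLock`, `reboot_pass_buggy` and the `pod_blocked` predicate (standing in for the --blocking-pod-selector check) are hypothetical names for illustration, not Kured's actual code:

```python
class ClusterLock:
    """Hypothetical stand-in for the cluster-wide reboot lock (one holder at a time)."""

    def __init__(self):
        self.holder = None

    def acquire(self, node):
        # Succeeds if the lock is free or already held by this node.
        if self.holder in (None, node):
            self.holder = node
            return True
        return False

    def release(self, node):
        if self.holder == node:
            self.holder = None


def reboot_pass_buggy(node, lock, pod_blocked):
    """Hypothetical single pass of the reported behaviour: lock first, blocker check second."""
    if not lock.acquire(node):
        return "waiting for lock"
    if pod_blocked(node):
        # The lock is NOT released here, so no other node can proceed.
        return "blocked (lock still held)"
    return "rebooting"


lock = ClusterLock()
print(reboot_pass_buggy("node-a", lock, lambda n: n == "node-a"))  # → blocked (lock still held)
print(reboot_pass_buggy("node-b", lock, lambda n: n == "node-a"))  # → waiting for lock
```

In this model node-b has no blocking pods and could reboot, but it waits indefinitely because node-a keeps the lock.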

Have I misunderstood anything? Is this a bug or the expected behavior?

ckotzbauer · 2023-07-20T16:26:51Z

Yeah, reading the code, it seems that the lock is acquired and not released when the reboot is blocked.
There are two ways a reboot can be blocked:

--alert-filter-regexp, which queries Prometheus and is independent of the node currently being processed (reboots are blocked across the whole cluster)
--blocking-pod-selector, which looks for specific pods on the node being processed

For the Prometheus query, it is okay for the lock to be held longer and not released, since reboots are blocked everywhere anyway. For the pod selector, it seems better to change the behaviour so that the lock is released again.

WDYT @jackfrancis?
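The release-on-pod-block proposal above can be sketched as follows. This is a simplified model, not Kured's code: `ClusterLock`, `reboot_pass_release_on_pod_block`, `alert_firing` (standing in for the --alert-filter-regexp check) and `pod_blocked` (standing in for the --blocking-pod-selector check) are all hypothetical names:

```python
class ClusterLock:
    """Hypothetical stand-in for the cluster-wide reboot lock (one holder at a time)."""

    def __init__(self):
        self.holder = None

    def acquire(self, node):
        if self.holder in (None, node):
            self.holder = node
            return True
        return False

    def release(self, node):
        if self.holder == node:
            self.holder = None


def reboot_pass_release_on_pod_block(node, lock, alert_firing, pod_blocked):
    """Hypothetical pass: lock first, but give it back on a node-specific blocker."""
    if not lock.acquire(node):
        return "waiting for lock"
    if alert_firing():
        # Cluster-wide blocker: every node is blocked anyway, so holding the lock is harmless.
        return "blocked by alert (lock held)"
    if pod_blocked(node):
        # Node-specific blocker: release the lock so another node can try.
        lock.release(node)
        return "blocked by pod (lock released)"
    return "rebooting"


lock = ClusterLock()
no_alert = lambda: False
print(reboot_pass_release_on_pod_block("node-a", lock, no_alert, lambda n: n == "node-a"))
# → blocked by pod (lock released)
print(reboot_pass_release_on_pod_block("node-b", lock, no_alert, lambda n: n == "node-a"))
# → rebooting
```

The two blocker types are handled asymmetrically here: only the node-specific one gives the lock back.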

jackfrancis · 2023-08-16T21:46:58Z

After re-reading the code a few times, I think the appropriate thing to do in this case is to simply check for blocking conditions prior to acquiring the lock. I've opened up a PR here that does that:

#819
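The check-before-acquire ordering described above can be sketched in the same simplified model. This is an illustration only, not the actual code of #819; `ClusterLock`, `reboot_pass_check_first`, `alert_firing` and `pod_blocked` are hypothetical names:

```python
class ClusterLock:
    """Hypothetical stand-in for the cluster-wide reboot lock (one holder at a time)."""

    def __init__(self):
        self.holder = None

    def acquire(self, node):
        if self.holder in (None, node):
            self.holder = node
            return True
        return False


def reboot_pass_check_first(node, lock, alert_firing, pod_blocked):
    """Hypothetical pass: evaluate every blocker before touching the lock."""
    if alert_firing() or pod_blocked(node):
        # A blocked node never takes the lock, so nothing needs to be released.
        return "blocked (lock not taken)"
    if not lock.acquire(node):
        return "waiting for lock"
    return "rebooting"


lock = ClusterLock()
no_alert = lambda: False
print(reboot_pass_check_first("node-a", lock, no_alert, lambda n: n == "node-a"))
# → blocked (lock not taken)
print(reboot_pass_check_first("node-b", lock, no_alert, lambda n: n == "node-a"))
# → rebooting
```

With this ordering both blocker types are treated the same way, and the lock is only ever held by a node that is actually going ahead with a reboot.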

Happy to let @ckotzbauer weigh in on this, as we now have two competing solutions for this issue. :)

ckotzbauer · 2023-08-17T04:17:22Z

I merged your PR @jackfrancis, as I really like the approach of first checking for blockers before acquiring. Thanks for your thoughts!

ckotzbauer added the enhancement label Jul 20, 2023

ckotzbauer mentioned this issue Aug 2, 2023

Release node-lock on blocking pod-selector #807

Closed

ckotzbauer added this to the 1.14.0 milestone Aug 4, 2023

jackfrancis mentioned this issue Aug 16, 2023

fix: don’t hold node lock if reboot is blocked #819

Merged

ckotzbauer closed this as completed in #819 Aug 17, 2023
