From 1877d812747788192816e8b6f82233142645cc5d Mon Sep 17 00:00:00 2001
From: Mat Kowalski
Date: Thu, 17 Oct 2024 10:13:13 +0200
Subject: [PATCH] OCPBUGS-43428: Soften haproxy timeout for kubeapi probe

This PR changes the timeouts used by haproxy when deciding whether the
master backend (i.e. the k8s api server) is dead or alive. The previous
probe was relatively strict, allowing for a very fast failover but at
the same time being very prone to temporary flakiness.

The new configuration aligns haproxy with the readiness probe used by
k8s when detecting whether a pod is dead or alive. Aligning those
configurations removes the mismatch where k8s believes the api server
is ready but haproxy sees it as dead.

A consequence of this change is a potential increase in downtime when
the api server is forcefully removed. In the worst-case scenario we may
see unavailability for 15 seconds. This should not happen often in real
setups, but for the sake of completeness it should be noted.

Fixes: OCPBUGS-43428
---
 templates/master/00-master/on-prem/files/haproxy-haproxy.yaml | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/templates/master/00-master/on-prem/files/haproxy-haproxy.yaml b/templates/master/00-master/on-prem/files/haproxy-haproxy.yaml
index b402ecb9b5..a7d0d4ae85 100644
--- a/templates/master/00-master/on-prem/files/haproxy-haproxy.yaml
+++ b/templates/master/00-master/on-prem/files/haproxy-haproxy.yaml
@@ -36,8 +36,9 @@ contents:
       stats refresh 30s
       stats auth Username:Password
     backend masters
+      timeout check 10s
       option httpchk GET /readyz HTTP/1.0
       balance roundrobin
 {{`{{- range .LBConfig.Backends }}
-      server {{ .Host }} {{ .Address }}:{{ .Port }} weight 1 verify none check check-ssl inter 1s fall 2 rise 3
+      server {{ .Host }} {{ .Address }}:{{ .Port }} weight 1 verify none check check-ssl inter 5s fall 3 rise 1
 {{- end }}`}}
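
Illustrative sketch (not part of the patch): after template expansion, the
rendered backend section would look roughly like the snippet below. The host
name, address and port are made-up example values, not taken from this change;
only the check parameters reflect the new configuration.

    backend masters
      timeout check 10s
      option httpchk GET /readyz HTTP/1.0
      balance roundrobin
      # inter 5s fall 3: the server is marked down only after 3 consecutive
      # failed /readyz checks spaced 5s apart, i.e. up to ~15s in the worst
      # case, matching the downtime noted in the commit message.
      # rise 1: a single successful check brings the server back into rotation.
      # timeout check 10s: roughly, an individual check may take up to 10s
      # before it counts as failed (assumption based on haproxy's documented
      # check timeout behaviour).
      server master-0 192.0.2.10:6443 weight 1 verify none check check-ssl inter 5s fall 3 rise 1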