Replies: 2 comments 5 replies
-
After further investigation I can see that there are zero POST requests to the Kubernetes API server until the strimzi-operator pod has been deleted and rescheduled. Immediately after it starts up there are six POST requests in quick succession (three for Kafka, three for ZooKeeper), and the pods are soon scheduled and running.
-
Some other users who provided full logs appeared to hit an issue where the informer that notifies the clients about Pod events had died. Without full logs it is not clear whether that is your case, but it sounds the same. You should try upgrading to 0.35, which updates the Kubernetes client (which had some bugs) and improves the handling around it.
-
In the past few weeks we have noticed a worrying situation wherein the strimzi-operator does not create a new pod to replace a missing pod. This just caused a crash of both ZooKeeper and, as a result, Kafka in our staging cluster. If we delete the strimzi-operator pod, the operator restarts and quickly brings the cluster back to health by creating the missing pods. In this case, the missing pods were all of the ZooKeeper pods (three) and all of the Kafka broker pods (three).
We recently upgraded from 0.27.1 to 0.34.0 and did not explicitly disable the new StrimziPodSet feature in the upgrade process. I am curious whether this new feature may be playing a part. With that said, I have found precious little in the logs to indicate the failure. As you can see below, the general flow is that a reconciliation begins, a pod is restarted (in this case due to a k8s node pool upgrade), but it is never rescheduled.
Listing the pods shows the pod missing from the list entirely, rather than listed as waiting to be scheduled.
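For anyone who wants to check for this condition automatically, here is a minimal sketch of the comparison. It assumes the usual Strimzi pod naming of `<cluster>-kafka-<n>` and `<cluster>-zookeeper-<n>`; the cluster name `my-cluster`, the function names, and the way the observed pod names are collected (for example from `kubectl get pods`) are placeholders for illustration, not something taken from our setup.

```python
def expected_pod_names(cluster, kafka_replicas, zookeeper_replicas):
    """Build the set of pod names we expect to exist for one cluster.

    Assumes pods are named <cluster>-kafka-<n> and <cluster>-zookeeper-<n>.
    """
    names = {f"{cluster}-kafka-{i}" for i in range(kafka_replicas)}
    names |= {f"{cluster}-zookeeper-{i}" for i in range(zookeeper_replicas)}
    return names


def missing_pods(cluster, kafka_replicas, zookeeper_replicas, observed_names):
    """Return the expected pods that do not appear in the observed listing.

    A pod that is Pending still appears in the listing, so it is not
    reported here; only pods that are absent altogether are returned.
    """
    expected = expected_pod_names(cluster, kafka_replicas, zookeeper_replicas)
    return sorted(expected - set(observed_names))


if __name__ == "__main__":
    # Hypothetical listing in which one ZooKeeper pod has vanished.
    observed = [
        "my-cluster-kafka-0",
        "my-cluster-kafka-1",
        "my-cluster-kafka-2",
        "my-cluster-zookeeper-0",
        "my-cluster-zookeeper-2",
    ]
    print(missing_pods("my-cluster", 3, 3, observed))
```

A non-empty result that persists across several reconciliation intervals would match what we are seeing: the pod is not pending, it simply does not exist.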
Log snippet follows: