What happened?
Running kubectl rollout restart sts redpanda -n redpanda after deploying with ephemeral storage results in an unhealthy cluster: the destroyed broker remains in the cluster and the replacement broker is assigned a new node ID, increasing the broker count by one.
What did you expect to happen?
Expected the cluster to restart and result in a healthy cluster.
How can we reproduce it (as minimally and precisely as possible)? Please include values file.
Create a kind cluster.
Create a Redpanda config with ephemeral storage (values sketch below).
Deploy Redpanda 24.1.8 via chart version 5.8.12.
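The exact commands and values file were not captured above, so here is a minimal sketch of those three steps. It assumes the official chart repository and that disabling the persistent volume is the only override that matters; the file name values.yaml is illustrative.
values.yaml:
storage:
  persistentVolume:
    enabled: false
kind create cluster
helm repo add redpanda https://charts.redpanda.com
helm install redpanda redpanda/redpanda -n redpanda --create-namespace --version 5.8.12 -f values.yaml
A health check such as the following (assuming the chart's main container is named redpanda) confirms the brokers are up:
kubectl exec redpanda-0 -n redpanda -c redpanda -- rpk cluster health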
Once the cluster is healthy, do a rolling restart:
kubectl rollout restart sts redpanda -n redpanda
Continuously run the following command until redpanda-2 is available:
kubectl logs pod/redpanda-2 -n redpanda -f
Eventually you will see the following printed constantly:
WARN 2024-07-19 18:46:48,917 [shard 0:raft] raft - [group_id:0, {redpanda/controller/0}] consensus.cc:3922 - received full heartbeat request addressed to node with different revision: {id: 2, revision: 0}, current node: {id: 3, revision: 0}, source: {id: 1, revision: 0}
I've run through this multiple times. Sometimes the rolling restart doesn't continue past redpanda-2; other times it continues as expected. Most times the cluster ends up in the following state, where redpanda-2 is assigned a new node ID and redpanda-1 never returns to the cluster.
Anything else we need to know?
We have this doc for this configuration, but it does not mention any issue with restarting. It seems that running in this state is never a good idea, since any time a broker leaves the cluster, the cluster becomes unhealthy; the brokers should be decommissioned when using ephemeral storage. We also have this doc explaining how to perform a rolling restart, but it does not mention any issues when using ephemeral storage.
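For context, a sketch of what manually decommissioning a broker before its pod goes away could look like, assuming the chart's main container is named redpanda; the node ID 1 is illustrative and would come from the brokers list:
kubectl exec redpanda-0 -n redpanda -c redpanda -- rpk redpanda admin brokers list
kubectl exec redpanda-0 -n redpanda -c redpanda -- rpk redpanda admin brokers decommission 1
kubectl exec redpanda-0 -n redpanda -c redpanda -- rpk cluster health
A plain kubectl rollout restart performs no such step before deleting the pod, which is why the old broker is left behind.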
It would be great if we could disable any changes when users run kubectl rollout restart sts redpanda -n redpanda while they also have storage.persistentVolume.enabled: false.
Which are the affected charts?
Redpanda
Chart Version(s)
This happens with all chart versions I've tested, from 5.7.24 to 5.8.12.
Cloud provider
none
JIRA Link: K8S-299
The tl;dr is that we (I) don't believe there are use cases for ephemeral storage outside of simple testing / verification of chart / Redpanda behaviors. If anyone has other use cases, please chime in!
Until then, we'll update the docs and add some red tape to both NOTES.txt and the values.yaml file indicating that the errors seen here are expected behavior.
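As a sketch of what that red tape in NOTES.txt could look like (the wording is an assumption; only the storage.persistentVolume.enabled value comes from the chart):
{{- if not .Values.storage.persistentVolume.enabled }}
WARNING: storage.persistentVolume.enabled is false, so broker data is ephemeral.
Deleting a pod (including via kubectl rollout restart) brings the broker back
with a new node ID and leaves the old broker in the cluster, so the cluster
ends up unhealthy. This configuration is only intended for throwaway testing.
{{- end }}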