Update the Kafka recovery procedure with KRaft notes (#10728)
Signed-off-by: Federico Valeri <[email protected]>
fvaleri authored Oct 21, 2024
1 parent 41372de commit f2fa81b
Showing 3 changed files with 266 additions and 21 deletions.
4 changes: 3 additions & 1 deletion documentation/assemblies/managing/assembly-cluster-recovery-volume.adoc
@@ -10,5 +10,7 @@ You can recover a Kafka cluster from persistent volumes (PVs) if they are still

//scenarios to recover from
include::../../modules/managing/con-cluster-recovery-scenarios.adoc[leveloffset=+1]
//procedure to recover a cluster from a PV
//procedure to recover a KRaft-based cluster from a PV
include::../../modules/managing/proc-cluster-recovery-volume.adoc[leveloffset=+1]
//procedure to recover a ZooKeeper-based cluster from a PV
include::../../modules/managing/proc-cluster-recovery-volume-zk.adoc[leveloffset=+1]
213 changes: 213 additions & 0 deletions documentation/modules/managing/proc-cluster-recovery-volume-zk.adoc
@@ -0,0 +1,213 @@
// Module included in the following assembly:
//
// assembly-cluster-recovery-volume.adoc

[id="proc-cluster-recovery-volume-zk-{context}"]
= Recovering a deleted ZooKeeper-based Kafka cluster

[role="_abstract"]
This procedure describes how to recover a deleted Kafka cluster operating in a ZooKeeper-based environment from persistent volumes (PVs) by recreating the original `PersistentVolumeClaim` (PVC) resources.

If the Topic Operator and User Operator are deployed, you can recover `KafkaTopic` and `KafkaUser` resources by recreating them.
It is important that you recreate the `KafkaTopic` resources with the same configurations, or the Topic Operator will try to update them in Kafka.
This procedure shows how to recreate both resources.

WARNING: If the User Operator is enabled and Kafka users are not recreated, users are deleted from the Kafka cluster immediately after recovery.

.Before you begin

In this procedure, it is essential that each PV is bound to the correct PVC to avoid data corruption.
The `volumeName` specified for each PVC must match the name of its PV.

For more information, see xref:ref-persistent-storage-{context}[Persistent storage].

.Procedure

. Check information on the PVs in the cluster:
+
[source,shell,subs="+quotes,attributes"]
----
kubectl get pv
----
+
Information is presented for PVs with data.
+
.Example PV output
[source,shell,subs="+quotes,attributes"]
----
NAME RECLAIMPOLICY CLAIM
pvc-5e9c5c7f-3317-11ea-a650-06e1eadd9a4c ... Retain ... myproject/data-my-cluster-zookeeper-1
pvc-5e9cc72d-3317-11ea-97b0-0aef8816c7ea ... Retain ... myproject/data-my-cluster-zookeeper-0
pvc-5ead43d1-3317-11ea-97b0-0aef8816c7ea ... Retain ... myproject/data-my-cluster-zookeeper-2
pvc-7e1f67f9-3317-11ea-a650-06e1eadd9a4c ... Retain ... myproject/data-0-my-cluster-kafka-0
pvc-7e21042e-3317-11ea-9786-02deaf9aa87e ... Retain ... myproject/data-0-my-cluster-kafka-1
pvc-7e226978-3317-11ea-97b0-0aef8816c7ea ... Retain ... myproject/data-0-my-cluster-kafka-2
----
+
* `NAME` is the name of each PV.
* `RECLAIMPOLICY` shows that PVs are retained, meaning that the PV is not automatically deleted when the PVC is deleted.
* `CLAIM` shows the link to the original PVCs.
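+
If a PV unexpectedly shows a `Delete` reclaim policy, you can switch it to `Retain` before recreating any PVCs. This is a precautionary sketch rather than part of the documented flow; `<pv_name>` is a placeholder:
+
[source,shell]
----
kubectl patch pv <pv_name> \
  -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
----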

. Recreate the original namespace:
+
[source,shell,subs="+quotes,attributes"]
----
kubectl create namespace myproject
----
+
Here, we recreate the `myproject` namespace.

. Recreate the original PVC resource specifications, linking the PVCs to the appropriate PV:
+
.Example PVC resource specification
[source,yaml,subs="+quotes,attributes"]
----
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-0-my-cluster-kafka-0
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  storageClassName: gp2-retain
  volumeMode: Filesystem
  volumeName: *pvc-7e1f67f9-3317-11ea-a650-06e1eadd9a4c*
----
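+
After saving the recreated specifications, you might apply them and confirm that each PVC reaches the `Bound` status against the intended PV, along these lines (`<pvc_configuration_file>` is a placeholder):
+
[source,shell]
----
kubectl apply -f <pvc_configuration_file> -n myproject
kubectl get pvc -n myproject
----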

. Edit the PV specifications to delete the `claimRef` properties that bound the original PVC.
+
.Example PV specification
[source,yaml,subs="+quotes,attributes"]
----
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    kubernetes.io/createdby: aws-ebs-dynamic-provisioner
    pv.kubernetes.io/bound-by-controller: "yes"
    pv.kubernetes.io/provisioned-by: kubernetes.io/aws-ebs
  creationTimestamp: "<date>"
  finalizers:
  - kubernetes.io/pv-protection
  labels:
    failure-domain.beta.kubernetes.io/region: eu-west-1
    failure-domain.beta.kubernetes.io/zone: eu-west-1c
  name: pvc-7e226978-3317-11ea-97b0-0aef8816c7ea
  resourceVersion: "39431"
  selfLink: /api/v1/persistentvolumes/pvc-7e226978-3317-11ea-97b0-0aef8816c7ea
  uid: 7efe6b0d-3317-11ea-a650-06e1eadd9a4c
spec:
  accessModes:
  - ReadWriteOnce
  awsElasticBlockStore:
    fsType: xfs
    volumeID: aws://eu-west-1c/vol-09db3141656d1c258
  capacity:
    storage: 100Gi
  *claimRef:*
    *apiVersion: v1*
    *kind: PersistentVolumeClaim*
    *name: data-0-my-cluster-kafka-2*
    *namespace: myproject*
    *resourceVersion: "39113"*
    *uid: 54be1c60-3319-11ea-97b0-0aef8816c7ea*
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: failure-domain.beta.kubernetes.io/zone
          operator: In
          values:
          - eu-west-1c
        - key: failure-domain.beta.kubernetes.io/region
          operator: In
          values:
          - eu-west-1
  persistentVolumeReclaimPolicy: Retain
  storageClassName: gp2-retain
  volumeMode: Filesystem
----
+
In the example, the following properties are deleted:
+
[source,yaml,subs="+quotes,attributes"]
----
claimRef:
  apiVersion: v1
  kind: PersistentVolumeClaim
  name: data-0-my-cluster-kafka-2
  namespace: myproject
  resourceVersion: "39113"
  uid: 54be1c60-3319-11ea-97b0-0aef8816c7ea
----
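+
As an alternative to editing each PV manually, the `claimRef` section can be removed with a JSON patch. This is a sketch; `<pv_name>` is a placeholder:
+
[source,shell]
----
kubectl patch pv <pv_name> --type json \
  -p '[{"op":"remove","path":"/spec/claimRef"}]'
----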

. Deploy the Cluster Operator:
+
[source,shell]
----
kubectl create -f install/cluster-operator -n myproject
----
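+
Before continuing, you can wait for the operator to become ready. This assumes the default deployment name used by the Strimzi installation files:
+
[source,shell]
----
kubectl wait deployment strimzi-cluster-operator \
  --for condition=Available --timeout=120s -n myproject
----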

. Recreate all `KafkaTopic` resources by applying the `KafkaTopic` resource configuration:
+
[source,shell]
----
kubectl apply -f <topic_configuration_file> -n myproject
----
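+
For illustration, a recreated topic must match its original specification exactly; a minimal sketch (names and values are hypothetical) might look like this:
+
[source,yaml]
----
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: my-topic-1
  labels:
    strimzi.io/cluster: my-cluster
spec:
  partitions: 10
  replicas: 3
----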

. Recreate all `KafkaUser` resources:
.. If user passwords and certificates need to be retained, recreate the user secrets before recreating the `KafkaUser` resources.
+
If the secrets are not recreated, the User Operator will generate new credentials automatically.
Ensure that the recreated secrets have exactly the same name, labels, and fields as the original secrets.

.. Apply the `KafkaUser` resource configuration:
+
[source,shell]
----
kubectl apply -f <user_configuration_file> -n myproject
----
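+
For illustration, a minimal `KafkaUser` sketch matching the TLS and simple authorization settings shown in the verification output below (names and ACLs are hypothetical):
+
[source,yaml]
----
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: my-user-1
  labels:
    strimzi.io/cluster: my-cluster
spec:
  authentication:
    type: tls
  authorization:
    type: simple
    acls:
    - resource:
        type: topic
        name: my-topic-1
      operations:
      - Read
      - Write
----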

. Deploy the Kafka cluster using the original configuration for the `Kafka` resource.
+
[source,shell]
----
kubectl apply -f <kafka_resource_configuration>.yaml -n myproject
----
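+
You can watch the ZooKeeper and Kafka pods start before verifying topics and users (a simple verification sketch):
+
[source,shell]
----
kubectl get pods -n myproject -w
----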

. Verify the recovery of the `KafkaTopic` resources:
+
[source,shell]
----
kubectl get kafkatopics -o wide -w -n myproject
----
+
.Kafka topic status
[source,shell,subs="+quotes"]
----
NAME CLUSTER PARTITIONS REPLICATION FACTOR READY
my-topic-1 my-cluster 10 3 True
my-topic-2 my-cluster 10 3 True
my-topic-3 my-cluster 10 3 True
----
+
`KafkaTopic` custom resource creation is successful when the `READY` output shows `True`.

. Verify the recovery of the `KafkaUser` resources:
+
[source,shell]
----
kubectl get kafkausers -o wide -w -n myproject
----
+
.Kafka user status
[source,shell,subs="+quotes"]
----
NAME CLUSTER AUTHENTICATION AUTHORIZATION READY
my-user-1 my-cluster tls simple True
my-user-2 my-cluster tls simple True
my-user-3 my-cluster tls simple True
----
+
`KafkaUser` custom resource creation is successful when the `READY` output shows `True`.
70 changes: 50 additions & 20 deletions documentation/modules/managing/proc-cluster-recovery-volume.adoc
@@ -3,10 +3,10 @@
// assembly-cluster-recovery-volume.adoc

[id="proc-cluster-recovery-volume-{context}"]
= Recovering a deleted Kafka cluster
= Recovering a deleted KRaft-based Kafka cluster

[role="_abstract"]
This procedure describes how to recover a deleted cluster from persistent volumes (PVs) by recreating the original `PersistentVolumeClaim` (PVC) resources.
This procedure describes how to recover a deleted Kafka cluster operating in KRaft mode from persistent volumes (PVs) by recreating the original `PersistentVolumeClaim` (PVC) resources.

If the Topic Operator and User Operator are deployed, you can recover `KafkaTopic` and `KafkaUser` resources by recreating them.
It is important that you recreate the `KafkaTopic` resources with the same configurations, or the Topic Operator will try to update them in Kafka.
@@ -36,12 +36,12 @@ Information is presented for PVs with data.
[source,shell,subs="+quotes,attributes"]
----
NAME RECLAIMPOLICY CLAIM
pvc-5e9c5c7f-3317-11ea-a650-06e1eadd9a4c ... Retain ... myproject/data-my-cluster-zookeeper-1
pvc-5e9cc72d-3317-11ea-97b0-0aef8816c7ea ... Retain ... myproject/data-my-cluster-zookeeper-0
pvc-5ead43d1-3317-11ea-97b0-0aef8816c7ea ... Retain ... myproject/data-my-cluster-zookeeper-2
pvc-7e1f67f9-3317-11ea-a650-06e1eadd9a4c ... Retain ... myproject/data-0-my-cluster-kafka-0
pvc-7e21042e-3317-11ea-9786-02deaf9aa87e ... Retain ... myproject/data-0-my-cluster-kafka-1
pvc-7e226978-3317-11ea-97b0-0aef8816c7ea ... Retain ... myproject/data-0-my-cluster-kafka-2
pvc-5e9c5c7f-3317-11ea-a650-06e1eadd9a4c ... Retain ... myproject/data-0-my-cluster-broker-0
pvc-5e9cc72d-3317-11ea-97b0-0aef8816c7ea ... Retain ... myproject/data-0-my-cluster-broker-1
pvc-5ead43d1-3317-11ea-97b0-0aef8816c7ea ... Retain ... myproject/data-0-my-cluster-broker-2
pvc-7e1f67f9-3317-11ea-a650-06e1eadd9a4c ... Retain ... myproject/data-0-my-cluster-controller-3
pvc-7e21042e-3317-11ea-9786-02deaf9aa87e ... Retain ... myproject/data-0-my-cluster-controller-4
pvc-7e226978-3317-11ea-97b0-0aef8816c7ea ... Retain ... myproject/data-0-my-cluster-controller-5
----
+
* `NAME` is the name of each PV.
@@ -52,10 +52,10 @@ pvc-7e226978-3317-11ea-97b0-0aef8816c7ea ... Retain ... myproject/data-0-my
+
[source,shell,subs="+quotes,attributes"]
----
kubectl create namespace my-project
kubectl create namespace myproject
----
+
Here, we recreate the `my-project` namespace.
Here, we recreate the `myproject` namespace.

. Recreate the original PVC resource specifications, linking the PVCs to the appropriate PV:
+
@@ -65,7 +65,7 @@ Here, we recreate the `my-project` namespace.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-0-my-cluster-kafka-0
  name: data-0-my-cluster-broker-0
spec:
  accessModes:
  - ReadWriteOnce
@@ -95,7 +95,7 @@ metadata:
  labels:
    failure-domain.beta.kubernetes.io/region: eu-west-1
    failure-domain.beta.kubernetes.io/zone: eu-west-1c
  name: pvc-7e226978-3317-11ea-97b0-0aef8816c7ea
  name: pvc-5ead43d1-3317-11ea-97b0-0aef8816c7ea
  resourceVersion: "39431"
  selfLink: /api/v1/persistentvolumes/pvc-7e226978-3317-11ea-97b0-0aef8816c7ea
  uid: 7efe6b0d-3317-11ea-a650-06e1eadd9a4c
@@ -138,7 +138,7 @@ In the example, the following properties are deleted:
claimRef:
  apiVersion: v1
  kind: PersistentVolumeClaim
  name: data-0-my-cluster-kafka-2
  name: data-0-my-cluster-broker-2
  namespace: myproject
  resourceVersion: "39113"
  uid: 54be1c60-3319-11ea-97b0-0aef8816c7ea
@@ -148,14 +148,14 @@ claimRef:
+
[source,shell]
----
kubectl create -f install/cluster-operator -n my-project
kubectl create -f install/cluster-operator -n myproject
----

. Recreate all `KafkaTopic` resources by applying the `KafkaTopic` resource configuration:
+
[source,shell]
----
kubectl apply -f <topic_configuration_file>
kubectl apply -f <topic_configuration_file> -n myproject
----

. Recreate all `KafkaUser` resources:
@@ -167,20 +167,50 @@ Ensure that the recreated secrets have exactly the same name, labels, and fields
.. Apply the `KafkaUser` resource configuration:
+
[source,shell]
kubectl apply -f <user_configuration_file>
kubectl apply -f <user_configuration_file> -n myproject

. Deploy the Kafka cluster using the original configuration for the `Kafka` resource:
. Deploy the Kafka cluster using the original configuration for the `Kafka` resource.
Add the annotation `strimzi.io/pause-reconciliation="true"` to the original configuration for the `Kafka` resource, and then deploy the Kafka cluster using the updated configuration.
+
[source,shell]
----
kubectl apply -f <kafka_resource_configuration>.yaml -n my-project
kubectl apply -f <kafka_resource_configuration>.yaml -n myproject
----

. Recover the original `clusterId` from logs or from a copy of the `Kafka` custom resource.
If neither is available, you can retrieve it from one of the volumes by spinning up a temporary pod.
+
[source,shell]
----
PVC_NAME="data-0-my-cluster-broker-0"
COMMAND="grep cluster.id /disk/kafka-log*/meta.properties | awk -F'=' '{print \$2}'"
kubectl run tmp -itq --rm --restart "Never" --image "foo" --overrides "{\"spec\":
{\"containers\":[{\"name\":\"busybox\",\"image\":\"busybox\",\"command\":[\"/bin/sh\",
\"-c\",\"$COMMAND\"],\"volumeMounts\":[{\"name\":\"disk\",\"mountPath\":\"/disk\"}]}],
\"volumes\":[{\"name\":\"disk\",\"persistentVolumeClaim\":{\"claimName\":
\"$PVC_NAME\"}}]}}" -n myproject
----

. Edit the `Kafka` resource to set the `.status.clusterId` with the recovered value:
+
[source,shell]
----
kubectl edit kafka <cluster-name> --subresource status -n myproject
----
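+
Alternatively, the status can be patched directly. A sketch, assuming kubectl 1.24 or newer for `--subresource`, with `<cluster_id>` as the recovered value:
+
[source,shell]
----
kubectl patch kafka my-cluster --subresource status --type merge \
  -p '{"status":{"clusterId":"<cluster_id>"}}' -n myproject
----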

. Unpause the `Kafka` resource reconciliation:
+
[source,shell]
----
kubectl annotate kafka my-cluster strimzi.io/pause-reconciliation=false \
--overwrite -n myproject
----
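+
Once reconciliation resumes, the operator rolls out the cluster; you can follow progress through the resource status (a verification sketch):
+
[source,shell]
----
kubectl get kafka my-cluster -n myproject \
  -o jsonpath='{.status.conditions}'
----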

. Verify the recovery of the `KafkaTopic` resources:
+
[source,shell]
----
kubectl get kafkatopics -o wide -w -n my-project
kubectl get kafkatopics -o wide -w -n myproject
----
+
.Kafka topic status
Expand All @@ -198,7 +228,7 @@ my-topic-3 my-cluster 10 3 True
+
[source,shell]
----
kubectl get kafkausers -o wide -w -n my-project
kubectl get kafkausers -o wide -w -n myproject
----
+
.Kafka user status
