Infinite loop while updating EDS #178

Closed
fbloo opened this issue May 21, 2021 · 1 comment
fbloo commented May 21, 2021

First of all, I appreciate you sharing this operator. I'm currently gaining some hands-on experience with it and, to the best of my knowledge, I'm encountering some strange behaviour. I'm trying to apply a new configuration to my EDS, but the operator is having trouble updating the individual pods. Please correct me if I'm doing anything stupid or unsupported.

Expected Behavior

  1. Apply a YAML update to the EDS.
  2. The ES operator updates all pods within the EDS by draining, deleting, and redeploying them, one pod at a time.

Actual Behavior

  1. Apply a YAML update to the EDS.
  2. The operator starts draining the pod and successfully deletes it.
  3. A new pod is scheduled. However, the ES operator immediately reports that the pod should be updated again.
  4. The ES operator starts draining again, and this continues as an infinite loop.

The logs below show all relevant logging from a single "loop". Notice how the operator reports that it deleted pod demo/es-data1-0, and then immediately reports that pod demo/es-data1-0 should be updated.

Steps to Reproduce the Problem

  1. In the EDS scaling settings, set enabled: true, minReplicas: 1, minIndexReplicas: 0 (see the example manifest below).
  2. Apply a YAML update to the EDS.
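
For context, a minimal sketch of the kind of EDS manifest involved. The apiVersion and kind match the ObjectReference in the logs below and the scaling values are the ones from step 1; everything else (replica bounds, labels, image registry path, trimmed container config) is an illustrative assumption, not my exact manifest:

apiVersion: zalando.org/v1
kind: ElasticsearchDataSet
metadata:
  name: es-data1
  namespace: demo
spec:
  replicas: 2                # assumed starting replica count
  scaling:
    enabled: true            # step 1
    minReplicas: 1           # step 1
    maxReplicas: 3           # assumed
    minIndexReplicas: 0      # step 1
    maxIndexReplicas: 1      # assumed
  template:
    metadata:
      labels:
        application: es-data1          # assumed label
    spec:
      containers:
        - name: elasticsearch
          # registry path assumed; version as listed under Specifications
          image: docker.elastic.co/elasticsearch/elasticsearch-oss:7.5.1
          # remaining Elasticsearch container config trimmed for brevity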

Specifications

  • Version: es-operator:latest; elasticsearch-oss:7.5.1
  • Platform: Azure Kubernetes
  • Subsystem: any

Logs:

time="2021-05-20T13:12:25Z" level=info msg="Ensuring cluster is in green state" endpoint="http://es-data1.demo.svc.cluster.local.:9200"
time="2021-05-20T13:12:25Z" level=info msg="Event(v1.ObjectReference{Kind:\"ElasticsearchDataSet\", Namespace:\"demo\", Name:\"es-data1\", UID:\"22b3dd79-41b6-4165-bc7f-ad78557d7959\", APIVersion:\"zalando.org/v1\", ResourceVersion:\"6271955\", FieldPath:\"\"}): type: 'Normal' reason: 'DrainingPod' Draining Pod 'demo/es-data1-0'"
time="2021-05-20T13:12:25Z" level=info msg="Disabling auto-rebalance" endpoint="http://es-data1.demo.svc.cluster.local.:9200"
time="2021-05-20T13:12:26Z" level=info msg="Excluding pod demo/es-data1-0 from shard allocation" endpoint="http://es-data1.demo.svc.cluster.local.:9200"
time="2021-05-20T13:12:26Z" level=info msg="Waiting for draining to finish" endpoint="http://es-data1.demo.svc.cluster.local.:9200"
time="2021-05-20T13:12:26Z" level=info msg="Found 0 remaining shards on demo/es-data1-0 (10.244.3.147)" endpoint="http://es-data1.demo.svc.cluster.local.:9200"
time="2021-05-20T13:12:26Z" level=info msg="Event(v1.ObjectReference{Kind:\"ElasticsearchDataSet\", Namespace:\"demo\", Name:\"es-data1\", UID:\"22b3dd79-41b6-4165-bc7f-ad78557d7959\", APIVersion:\"zalando.org/v1\", ResourceVersion:\"6271955\", FieldPath:\"\"}): type: 'Normal' reason: 'DrainedPod' Successfully drained Pod 'demo/es-data1-0'"
time="2021-05-20T13:12:26Z" level=info msg="Event(v1.ObjectReference{Kind:\"ElasticsearchDataSet\", Namespace:\"demo\", Name:\"es-data1\", UID:\"22b3dd79-41b6-4165-bc7f-ad78557d7959\", APIVersion:\"zalando.org/v1\", ResourceVersion:\"6271955\", FieldPath:\"\"}): type: 'Normal' reason: 'DeletingPod' Deleting Pod 'demo/es-data1-0'"
time="2021-05-20T13:12:42Z" level=info msg="Event(v1.ObjectReference{Kind:\"ElasticsearchDataSet\", Namespace:\"demo\", Name:\"es-data1\", UID:\"22b3dd79-41b6-4165-bc7f-ad78557d7959\", APIVersion:\"zalando.org/v1\", ResourceVersion:\"6271955\", FieldPath:\"\"}): type: 'Normal' reason: 'DeletedPod' Successfully deleted Pod 'demo/es-data1-0'"
time="2021-05-20T13:12:42Z" level=info msg="Setting exclude list to ''" endpoint="http://es-data1.demo.svc.cluster.local.:9200"
time="2021-05-20T13:12:42Z" level=info msg="Enabling auto-rebalance" endpoint="http://es-data1.demo.svc.cluster.local.:9200"
time="2021-05-20T13:12:43Z" level=info msg="Pod demo/es-data1-0 should be updated. Priority: 5 (NodeSelector,PodOldRevision,STSReplicaDiff)"
time="2021-05-20T13:12:43Z" level=info msg="Pod demo/es-data1-1 should be updated. Priority: 5 (NodeSelector,PodOldRevision,STSReplicaDiff)"
time="2021-05-20T13:12:43Z" level=info msg="Found 2 Pods on StatefulSet demo/es-data1 to update"
time="2021-05-20T13:12:43Z" level=info msg="StatefulSet demo/es-data1 has 1/2 ready replicas"

fbloo changed the title from "Infinity loop while updating EDS" to "Infinite loop while updating EDS" on May 21, 2021

fbloo commented May 25, 2021

This issue is related to #69.

My operator was running with the argument --priority-node-selector=lifecycle-status=ready while I hadn't specified a matching nodeSelector on my pods. I removed the argument and now it seems to be working fine.
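
For anyone hitting the same symptom, a rough sketch of where that flag lives, assuming a standard Deployment of the operator; the names, labels, and image reference are placeholders, and only the --priority-node-selector argument is the relevant part:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: es-operator                  # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      application: es-operator       # placeholder label
  template:
    metadata:
      labels:
        application: es-operator
    spec:
      containers:
        - name: es-operator
          image: es-operator:latest  # as listed under Specifications; registry path omitted
          args:
            # other operator flags omitted for brevity
            - --priority-node-selector=lifecycle-status=ready   # removing this argument stopped the loop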

fbloo closed this as completed on May 25, 2021.