Bug: Missing EXPECTED_CLUSTER_SIZE leads to massive load on brokers #221

pantaoran · 2023-09-08T13:09:46Z

We observed that when EXPECTED_CLUSTER_SIZE is not set (or is explicitly set to -1), the measured produce latencies become extremely high.
It seems that before (or during?) every request, Canary was trying to micro-manage the replicas and their leaders for the canary topic on the Kafka cluster, which was taking a lot of time and processing, resulting in extremely slow responses to the produce requests.

Average latencies as reported when EXPECTED_CLUSTER_SIZE is set correctly: 3-5ms
Average latencies as reported when EXPECTED_CLUSTER_SIZE is NOT set: 1000-2000ms

Somehow the things that canary does on the cluster slow everything down dramatically.
It also leads to an explosion in logs. With the correct setting, my empty brokers (2-broker cluster, no other clients running except Canary) logged around 8 lines per minute. When the cluster size setting is missing, they logged around 500 lines per minute (the Canary reconcile interval was 10 seconds, the default).

I don't know what Canary does in detail or why, but it feels like a bug to me.

The description in the README says that I should expect more partition reassignments of the topic while the Kafka cluster is starting up and the brokers are coming up one by one, but what I actually observe is that partitions get reassigned on every reconciliation (every 10 seconds), leading to redundant work on the brokers, which causes high produce latencies and increased log volume.
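
To illustrate the behavior one might expect instead, here is a minimal sketch in Go of a guard that skips the reassignment when the current replica assignment already matches the desired one. This is not Canary's actual code: the `assignment` type and the functions `sameAssignment` and `needsReassignment` are hypothetical names invented for this example, and how Canary really decides to reassign is not shown here.

```go
package main

import "fmt"

// assignment maps a partition ID to its ordered list of replica broker IDs.
// Hypothetical type for illustration only; not taken from the Canary source.
type assignment map[int32][]int32

// sameAssignment reports whether two assignments are identical,
// including replica order (the first replica is the preferred leader).
func sameAssignment(current, desired assignment) bool {
	if len(current) != len(desired) {
		return false
	}
	for partition, want := range desired {
		have, ok := current[partition]
		if !ok || len(have) != len(want) {
			return false
		}
		for i := range want {
			if have[i] != want[i] {
				return false
			}
		}
	}
	return true
}

// needsReassignment is the check a reconcile loop could run before
// asking the cluster to alter partition assignments (assumed design).
func needsReassignment(current, desired assignment) bool {
	return !sameAssignment(current, desired)
}

func main() {
	current := assignment{0: {1, 2}, 1: {2, 1}}
	desired := assignment{0: {1, 2}, 1: {2, 1}}
	fmt.Println(needsReassignment(current, desired)) // false: nothing to do

	desired[1] = []int32{1, 2} // preferred leader of partition 1 changes
	fmt.Println(needsReassignment(current, desired)) // true
}
```

With a check like this, a reconciliation on a stable cluster would send no reassignment request at all, so the brokers would not repeat the same work every interval.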


mschurenko · 2024-03-07T21:48:53Z

I'm experiencing the same thing. I get the following message on the Kafka controller every 10 seconds:

[2024-03-07 00:32:32,957] INFO [Controller id=2] Successfully updated assignment of partition __strimzi_canary-1 to ReplicaAssignment(replicas=2,3,1, addingReplicas=, removingReplicas=, observers=, targetObservers=None) (kafka.controller.KafkaController)
