This repository has been archived by the owner on Oct 18, 2024. It is now read-only.

Bug: Missing EXPECTED_CLUSTER_SIZE leads to massive load on brokers #221

Open
pantaoran opened this issue Sep 8, 2023 · 1 comment

Comments

@pantaoran

We observed that when EXPECTED_CLUSTER_SIZE is not set (or is explicitly set to -1), the measured produce latencies degrade severely.
It seems that before (or during?) every request, Canary was trying to micro-manage the replicas and their leaders for the canary topic on the Kafka cluster, which was taking a lot of time and processing, resulting in extremely slow responses to the produce requests.

Average latencies as reported when EXPECTED_CLUSTER_SIZE is set correctly: 3-5ms
Average latencies as reported when EXPECTED_CLUSTER_SIZE is NOT set: 1000-2000ms

Whatever Canary does on the cluster in this mode slows everything down dramatically.
It also leads to an explosion in logs. With the correct setting, my otherwise idle brokers (2-broker cluster, no clients running except Canary) logged around 8 lines per minute. When the cluster size setting is missing, they logged around 500 lines per minute (the Canary reconcile interval was 10 seconds, the default).

I don't know what Canary does in detail or why, but it feels like a bug to me.

The description in the README says that I should expect more partition reassignments of the topic while the Kafka cluster is starting up and the brokers are coming up one by one. What I actually observe is that partitions are reassigned on every reconciliation (every 10 seconds), even on a stable cluster. This causes redundant work on the brokers, which in turn causes high produce latencies and increased log volume.
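To illustrate the behaviour I would have expected, here is a minimal sketch of a reconcile step that only requests a reassignment when the replica assignment actually changes. This is not Canary's code; `reconcile`, `needs_reassignment`, and the `alter` callback are hypothetical names used purely for illustration.

```python
from typing import Callable, Dict, List

# Partition id -> ordered list of replica broker ids (first entry is the preferred leader).
Assignment = Dict[int, List[int]]


def needs_reassignment(current: Assignment, desired: Assignment) -> bool:
    """Return True only if at least one partition's replica list differs."""
    return current != desired


def reconcile(
    current: Assignment,
    desired: Assignment,
    alter: Callable[[Assignment], None],
) -> bool:
    """Call `alter(desired)` only when the assignment would actually change.

    `alter` is a hypothetical stand-in for whatever admin request the tool
    sends to the cluster. Returns True if a reassignment was requested.
    """
    if not needs_reassignment(current, desired):
        # Nothing changed since the last reconcile: leave the brokers alone.
        return False
    alter(desired)
    return True
```

With a guard like this, a stable cluster would see no reassignment requests between reconciles, and only a real change in the broker set would trigger work on the controller.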

@mschurenko

I'm experiencing the same thing. I get the following message on the Kafka controller every 10 seconds:

[2024-03-07 00:32:32,957] INFO [Controller id=2] Successfully updated assignment of partition __strimzi_canary-1 to ReplicaAssignment(replicas=2,3,1, addingReplicas=, removingReplicas=, observers=, targetObservers=None) (kafka.controller.KafkaController)
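For anyone who wants to check whether they are affected, a rough sketch like the following could count these controller messages per minute from a log file. The regular expression is based only on the line quoted above, so the format is an assumption for other Kafka versions, and `reassignments_per_minute` is a hypothetical helper name.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the start of the controller log line quoted above and captures
# the timestamp truncated to the minute. Assumes this exact log layout.
PATTERN = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2},\d{3}\] INFO "
    r"\[Controller id=\d+\] Successfully updated assignment of partition "
    r"__strimzi_canary-\d+"
)


def reassignments_per_minute(lines: Iterable[str]) -> Counter:
    """Count canary-topic assignment updates, keyed by 'YYYY-MM-DD HH:MM'."""
    counts: Counter = Counter()
    for line in lines:
        match = PATTERN.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding it the controller log (for example `reassignments_per_minute(open("controller.log"))`, where the file name is a placeholder) gives one count per minute. A steady non-zero count on a cluster that is not changing would point to the behaviour described in this issue.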
