This repository has been archived by the owner on Oct 18, 2024. It is now read-only.
We observed that when EXPECTED_CLUSTER_SIZE is not set (or is explicitly set to -1), the measured produce latencies degrade severely.
It seems that before (or during?) every request, Canary tries to micro-manage the replicas and their leaders for the canary topic on the Kafka cluster. This takes a lot of time and processing, and results in extremely slow responses to the produce requests.
Average latencies as reported when EXPECTED_CLUSTER_SIZE is set correctly: 3-5ms
Average latencies as reported when EXPECTED_CLUSTER_SIZE is NOT set: 1000-2000ms
Somehow, whatever Canary does on the cluster slows everything down dramatically.
It also leads to an explosion in logs. With the correct setting, my empty brokers (2-broker cluster, no other clients running except Canary) logged around 8 lines per minute. When the cluster size setting is missing, they logged around 500 lines per minute (the Canary reconcile interval was 10 seconds, the default).
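To put the figures above in proportion, here is a small back-of-the-envelope calculation. It uses only the numbers reported in this issue; the variable names are made up for the example, and the per-reconcile figure assumes the extra log lines are spread evenly across reconciliations, which has not been verified.

```python
# Figures reported in this issue (2-broker cluster, Canary as the only client).
RECONCILE_INTERVAL_S = 10          # reconcile interval during the test
LATENCY_MS_SET = (3, 5)            # avg produce latency, EXPECTED_CLUSTER_SIZE set
LATENCY_MS_UNSET = (1000, 2000)    # avg produce latency, EXPECTED_CLUSTER_SIZE not set
LOG_LINES_PER_MIN_SET = 8
LOG_LINES_PER_MIN_UNSET = 500

# Slowdown range: best case compares 1000 ms to 5 ms, worst case 2000 ms to 3 ms.
slowdown_min = LATENCY_MS_UNSET[0] / LATENCY_MS_SET[1]
slowdown_max = LATENCY_MS_UNSET[1] / LATENCY_MS_SET[0]

# Log growth, overall and per reconciliation (assuming an even spread).
log_growth = LOG_LINES_PER_MIN_UNSET / LOG_LINES_PER_MIN_SET
reconciles_per_min = 60 / RECONCILE_INTERVAL_S
extra_lines_per_reconcile = (
    LOG_LINES_PER_MIN_UNSET - LOG_LINES_PER_MIN_SET
) / reconciles_per_min

print(f"produce latency slowdown: {slowdown_min:.0f}x to {slowdown_max:.0f}x")
print(f"log volume growth: {log_growth:.1f}x")
print(f"extra broker log lines per reconcile: {extra_lines_per_reconcile:.0f}")
```

In other words, the reported latencies are at least two orders of magnitude worse, and each reconciliation accounts for roughly 80 additional broker log lines under the even-spread assumption.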
I don't know what Canary does in detail or why, but it feels like a bug to me.
The description in the README says that I should expect "more partitions reassignment of the topic while the Kafka cluster is starting up and the brokers are coming one by one", but what I actually observe is that partitions are reassigned on every reconciliation (every 10 seconds), even though the cluster is stable. This leads to redundant work on the brokers, which causes high produce latencies and increased log volume.
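For illustration, the expected behavior could look like this: compute the desired assignment, compare it with the current one, and ask the brokers to reassign only when the two differ. The sketch below is hypothetical Python, not Canary's actual code. The function names, the `Assignment` type, and the round-robin placement are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical representation: partition id -> ordered list of replica broker ids.
Assignment = Dict[int, List[int]]


def desired_assignment(
    broker_ids: List[int], partitions: int, replication_factor: int
) -> Assignment:
    """Round-robin placement (an assumption, not necessarily what Canary does)."""
    brokers = sorted(broker_ids)
    # Never ask for more replicas than there are brokers.
    rf = min(replication_factor, len(brokers))
    return {
        p: [brokers[(p + i) % len(brokers)] for i in range(rf)]
        for p in range(partitions)
    }


def needs_reassignment(current: Assignment, desired: Assignment) -> bool:
    """Reassign only when the current placement differs from the desired one."""
    return current != desired


# Stable 2-broker cluster: the topic is already placed as desired, so a
# reconciliation should not trigger any reassignment.
current = {0: [0, 1], 1: [1, 0]}
stable = desired_assignment([0, 1], partitions=2, replication_factor=2)
print(needs_reassignment(current, stable))

# A third broker joins: the desired placement changes, so one reassignment
# is justified.
grown = desired_assignment([0, 1, 2], partitions=3, replication_factor=2)
print(needs_reassignment(current, grown))
```

With a guard of this kind, a stable cluster would see no reassignment requests at all between reconciliations, which matches what the README description suggests.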