Hello. I am reporting a minor bug I encountered while using Kminion.
Bug Scenario
A topic with replication.factor set to 1, or a partition that only has one ISR left
A broker hosting the leader of the above topic-partition goes down, causing a LEADER_NOT_AVAILABLE error for the relevant topic-partition
This issue prevents the collection of metrics for not just the affected partition, but for all topic-partitions across the cluster
Suspected Cause
It seems that the ListOffsets function in the minion/list_offsets.go file is the culprit. There appears to be a slight issue in the code that sends requests and handles errors.
From what I've observed, the RequestsWith function from franz-go used here returns the first error it encounters when processing bulk requests. This means that an error returned by RequestsWith does not necessarily imply that the entire request has failed.
As a result, if ListOffsets returns as soon as RequestsWith reports an error, the error handling code for individual topic-partitions never runs, and metrics are not collected for any topic-partition.
In my case, commenting out the part where the error is returned resolved the issue and allowed for normal operations.
Please review this issue. Thank you.