Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fetching watermark when some partition leader is not available #269

Open
KimJS0328 opened this issue Aug 5, 2024 · 1 comment
Open

Fetching watermark when some partition leader is not available #269

KimJS0328 opened this issue Aug 5, 2024 · 1 comment

Comments

@KimJS0328
Copy link

Hello. I am reporting a minor bug I encountered while using Kminion.

Bug Scenario

  1. A topic with replication.factor set to 1, or a partition that only has one ISR left
  2. A broker hosting the leader of the above topic-partition goes down, causing a LEADER_NOT_AVAILABLE error for the relevant topic-partition
  3. This issue prevents the collection of metrics for not just the affected partition, but for all topic-partitions across the cluster

Suspected Cause

It seems that the ListOffsets function in the minion/list_offsets.go file is the culprit. There appears to be a slight issue in the code that sends requests and handles errors.

image

From what I've observed, the RequestsWith function from franz-go used here returns the first error it encounters when processing bulk requests. This means that an error return by RequestsWith does not necessarily imply that the entire request has failed.

Due to this, if an error is returned immediately upon encountering an error in the RequestsWith function, the error handling code for individual topic-partitions is not executed, and the metrics for all topic-partitions are not collected.

In my case, commenting out the part where the error is returned resolved the issue and allowed for normal operations.

Please review this issue. Thank you.

@weeco
Copy link
Contributor

weeco commented Nov 3, 2024

Thanks for filing this issue. I think your analysis is correct and I filed a PR for this

weeco added a commit that referenced this issue Nov 15, 2024
weeco added a commit that referenced this issue Nov 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants