Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GetTopicConfigs takes much longer to return since PR #97 #132

Open
rmb938 opened this issue Dec 3, 2021 · 5 comments
Open

GetTopicConfigs takes much longer to return since PR #97 #132

rmb938 opened this issue Dec 3, 2021 · 5 comments

Comments

@rmb938
Copy link
Contributor

rmb938 commented Dec 3, 2021

In version 2.1.0 the GetTopicConfigs was limited to just the cleanup.policy config https://github.com/cloudhut/kminion/blob/v2.1.0/minion/describe_topic_config.go#L22

However in PR #97 the limitation was removed https://github.com/cloudhut/kminion/pull/97/files#diff-ea87a33952172ffd5cd3de385e172c2398f88c1c76808a63919c6bed164fa4a8L23

In Kafka clusters with tens of thousands of topics this starts taking a very long time to return and eventually hits the 60 second timeout here https://github.com/cloudhut/kminion/blob/master/prometheus/exporter.go#L224 and/or promethues scrape timeouts.

Can the GetTopicConfigs function be modified to use the config option infoMetric.configKeys so it doesn't try to get all config items for the topics?

@rmb938
Copy link
Contributor Author

rmb938 commented Dec 3, 2021

Thinking about, maybe it makes sense to cache topic config like metadata is https://github.com/cloudhut/kminion/blob/cc3d571a471a772b9e5b2a76830c9fb234f2775f/prometheus/collect_topic_info.go#L12

Topic configs won't change often.

@weeco
Copy link
Contributor

weeco commented Dec 17, 2021

Hey @rmb938 ,
thanks for filing this issue. Unfortunately I'm very busy at the moment, but during the holidays I should have some time to look into this.

I wasn't aware that requesting more/all topic configs has such a large impact on the performance for clusters with large numbers of topics. We will figure out something! Making it configurable via a config could be an option for sure, unsure if we should reuse the config from the infoMetric because it's been created specifically for one metric series. Good idea though!

@amuraru
Copy link
Contributor

amuraru commented Dec 17, 2021

I second this! Thanks for filing this @rmb938!
When I initially developed PR #97 my local benchmarks showed no impact when querying one or all topic configurations in kafka so I opted to do the filtering based on configured infoMetrics later when the metrics are exposed. We could definetly move the filter upper and only query the selected topic config keys in Kafka

@rmb938
Copy link
Contributor Author

rmb938 commented Dec 17, 2021

Thanks @weeco , I completely understand being busy and have no issue staying on 2.1.0 until this is fixed.

@amuraru Yea it is a bit weird and I'm not 100% sure your PR caused the issue but it was the only one that seemed to make any change related to querying Kafka, however using a version from before your PR was merged in doesn't have the same performance issue.

It may make sense to "make metrics for the metrics" i.e add timer metrics to the Kakfa function calls so we could easily measure and record these types of perf issues.

@rmb938
Copy link
Contributor Author

rmb938 commented Sep 26, 2023

@weeco Any progress on fixing this? I see there was a PR opened a while ago but it was closed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants