Any way of getting estimated consumer lag in seconds in promql? #182
Comments
Hey @sebw91, I've thought about how to solve this in the past and had different ideas. There's one exporter that uses interpolation, see https://github.com/seglo/kafka-lag-exporter for more information. It's a bigger effort to implement this, and currently I don't plan to spend that amount of time on KMinion. If you are interested in trying this, I'd suggest coming up with a proposal that we can discuss here before starting with the implementation. It's not trivial to implement it so that it scales in larger clusters, though, and that would be a requirement for KMinion.
Thanks a lot for the info. I was hoping it would be possible to do something in PromQL. From what I can see, the data we need is all there to compute a very rough estimate. I would be fine without interpolation; just a lower bound on `kminion_kafka_topic_high_water_mark_sum` to get a timestamp for the consumer offset would be sufficient. This may not be possible, though.
Oh, I see what you mean. You are saying that the information about when a certain high water mark was reached in a partition is already stored in Prometheus (at least up to the retention), so the interpolation logic could somehow go into the PromQL. That's indeed a good idea! I'm not sure whether it's possible with the available PromQL functions, but definitely worth a try!
I think there is a way (kinda), using the `offset` PromQL modifier! If the high water mark of a topic 5 minutes ago is greater than the current offset sum of the consumer group, then we can determine that we are at least 5 minutes behind, for example: `kminion_kafka_topic_high_water_mark_sum offset 5m > on (topic_name) kminion_kafka_consumer_group_topic_offset_sum + 1`. I will continue exploring on my side, but this should do the trick for us.
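A minimal sketch of that idea, assuming the KMinion metric names used in this thread; the `group_right` modifier is an addition here (not in the comment above) so that a single topic series can match the offsets of several consumer groups reading the same topic:

```
# Returns a result per (group_id, topic_name) when the group is at least
# 5 minutes behind: the topic's high water mark 5 minutes ago already
# exceeds the group's current committed offset sum.
(kminion_kafka_topic_high_water_mark_sum offset 5m)
  > on (topic_name) group_right
(kminion_kafka_consumer_group_topic_offset_sum + 1)
```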
This is an interesting request/subject. As mentioned already, Kafka itself has no notion of consumer lag in time units (seconds), probably because it depends on how fast a consumer can (or does) consume a given partition, or more generally, on the consumer's current/expected consumption throughput. For this reason, we approximate the consumer lag (all-partitions mode) in seconds using the consumer rate, like this:

```
sum(kminion_kafka_consumer_group_topic_lag{job=~"$job",group_id=~"$group_id"})
  by (group_id,topic_name) / on (topic_name)
group_left sum(rate(kminion_kafka_topic_high_water_mark_sum{job=~"$job"}[$__rate_interval]))
  by (group_id,topic_name)
```

This is used in Grafana, hence the usage of the `$job`/`$group_id` template variables and `$__rate_interval`. Hope this is useful somehow :)
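For use outside Grafana, a sketch of the same ratio with fixed selectors in place of the template variables, assuming the same KMinion metric names and a plain 5m rate window; the result is only a heuristic, since it assumes consumption proceeds at roughly the topic's production rate:

```
# Estimated lag in seconds: outstanding messages for the group divided
# by the topic's current production rate (messages per second).
# Produces no result when the production rate is zero.
sum by (group_id, topic_name) (kminion_kafka_consumer_group_topic_lag)
  / on (topic_name) group_left
sum by (topic_name) (rate(kminion_kafka_topic_high_water_mark_sum[5m]))
```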
@hhromic That's a clever PromQL query, very useful. Thanks very much. I think this is accurate enough for my use case.
KMinion works great, thank you.
Anyone have a way of computing an estimated consumer time lag in PromQL?
I think we'd have to somehow join two series, `kminion_kafka_consumer_group_topic_offset_sum` and `kminion_kafka_topic_high_water_mark_sum`.
Conceptually, the query should be something along the lines of:
`time() - time_at_value(kminion_kafka_topic_high_water_mark_sum, kminion_kafka_consumer_group_topic_offset_sum + 1)`
where `time_at_value` would be a way of getting the timestamp at which a series reached a given value. Nothing like that exists in Prometheus.
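There is indeed no `time_at_value` in PromQL, but the `offset` trick from the comments above can be stacked into a stepped lower bound; a rough sketch under the same metric-name assumptions, with arbitrarily chosen probe windows:

```
# Stepped lower bound on lag in seconds: each bool term contributes its
# share of the window when the high water mark that long ago already
# exceeded the group's current offset. Result is 0, 60, 300, or 900.
  60  * ((kminion_kafka_topic_high_water_mark_sum offset 1m)
         > bool on (topic_name) group_right
         kminion_kafka_consumer_group_topic_offset_sum)
+ 240 * ((kminion_kafka_topic_high_water_mark_sum offset 5m)
         > bool on (topic_name) group_right
         kminion_kafka_consumer_group_topic_offset_sum)
+ 600 * ((kminion_kafka_topic_high_water_mark_sum offset 15m)
         > bool on (topic_name) group_right
         kminion_kafka_consumer_group_topic_offset_sum)
```

Each extra term refines the estimate at the cost of another lookback; finer resolution is what an interpolation-based exporter like kafka-lag-exporter provides.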