Feature request: metric for time of last message produced in a topic #187

Open
hhromic opened this issue Feb 7, 2023 · 2 comments

@hhromic
Contributor

hhromic commented Feb 7, 2023

Confluent's Control Center (since version 6.2.0) implemented Improved Topic Inspection via Last-Produced Timestamp.

From: https://www.confluent.io/blog/better-kafka-management-with-improved-topic-inspection-in-confluent-part-3/#how-it-works

The “Topics” overview page gives a summary—health, throughput—of all the topics for a cluster. Confluent Platform 6.2.0 introduces a new column, “Last produced,” which reports the timestamp of the latest message produced to each topic. Using this report, you can easily compare and identify the topics that have not been produced in a long time.

This feature is quite useful, and we think kminion could offer something similar: a gauge-type metric per topic partition that reports the timestamp of the last message produced to it. For example:

# HELP kminion_kafka_topic_partition_last_produced_seconds Timestamp (seconds since Unix Epoch) of the last message produced for a given partition in a topic
# TYPE kminion_kafka_topic_partition_last_produced_seconds gauge
kminion_kafka_topic_partition_last_produced_seconds{partition_id="0",topic_name="__consumer_offsets"} 1675792649

The linked Confluent blog post describes how to obtain this metric using an internal consumer, which sounds feasible to implement in kminion with its own internal consumer. However, because this metric requires continuously consuming messages from multiple topics and partitions, there are performance considerations to keep in mind.

Hope you are interested! If you are, maybe I can dedicate some time to put together a POC.

@TheMeier
Contributor

I guess the ending of the metric should be _timestamp_seconds https://prometheus.io/docs/practices/naming/
I read the docs you linked. The process you describe is quite involved and I wonder how much overhead that produces.

Because KafkaConsumer::poll() could timeout in the case of dormant topics, obtaining the last-produced timestamp is a relatively expensive API call

In any case I would secure such a feature with a feature toggle and maybe a topic black/whitelist.

@hhromic
Contributor Author

hhromic commented Mar 30, 2023

I guess the ending of the metric should be _timestamp_seconds https://prometheus.io/docs/practices/naming/

Ah yes! I follow that best-practices document often, but I forgot that there is a specific convention for timestamps. 👍

I read the docs you linked. The process you describe is quite involved and I wonder how much overhead that produces.

Yes, I have been thinking about it, and it is indeed more complex than it looks.
I also agree that users of this feature would probably want to enable topic filtering.

In any case I would secure such a feature with a feature toggle and maybe a topic black/whitelist.

Yes, agreed.

I currently lack the time to attempt a POC implementation, so if anybody else feels like giving it a try, please go ahead. Otherwise, I will try to get more familiar with kminion's codebase and see what I can put together as a POC.
