[Telemetry] Standardize and add metrics #606

Open
10 tasks
okdas opened this issue Mar 23, 2023 · 4 comments · May be fixed by #901
Labels: infra (Core infrastructure - not protocol related), telemetry (everything related to collection telemetry), triage (It requires some decision-making at team level; it can't be worked on as it stands)

Comments

@okdas
Member

okdas commented Mar 23, 2023

Objective

It appears that most of the metrics we've worked on in the past are not being populated in the Prometheus exporter. That means either we no longer need to expose that information for monitoring and troubleshooting purposes, or there is a mistake that prevents that data from being collected.

Either way, now that we have working DevNets it is a good time to revisit metrics to see what we can monitor and improve.

Goals

  • Provide meaningful metrics so node runners, including in-house operations, can gain insight into how the software is operating
  • Standardize how we add metrics for telemetry

Deliverable

  • Observe DevNet and identify 5-10 metrics that need to be tracked via telemetry
  • Create downstream tickets to document & implement the metrics above in 1 or more separate issues

TBD, but we likely need metrics that provide information about P2P usage (number of peers, messages, errors), persistence, and RPC (number of requests, typical HTTP server metrics, etc.).
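To make the kinds of metrics mentioned above concrete, here is a minimal sketch of how a peer gauge, a message counter, and an RPC request counter could look in the Prometheus text exposition format. It is written in stdlib-only Python for brevity and is not the project's implementation. The `Metric` class and every metric and label name in it (`p2p_peers`, `p2p_messages_total`, `rpc_requests_total`, `direction`, `code`) are hypothetical placeholders; a real node would more likely use an existing Prometheus client library.

```python
from collections import defaultdict


class Metric:
    """A labelled counter or gauge, rendered as Prometheus exposition text."""

    def __init__(self, name, kind, help_text):
        if kind not in ("counter", "gauge"):
            raise ValueError("kind must be 'counter' or 'gauge'")
        self.name, self.kind, self.help_text = name, kind, help_text
        self.samples = defaultdict(float)  # sorted label tuple -> value

    def inc(self, amount=1.0, **labels):
        # Counters are monotonic; gauges may move in either direction.
        if self.kind == "counter" and amount < 0:
            raise ValueError("counters can only go up")
        self.samples[tuple(sorted(labels.items()))] += amount

    def set(self, value, **labels):
        if self.kind != "gauge":
            raise TypeError("only gauges can be set")
        self.samples[tuple(sorted(labels.items()))] = float(value)

    def render(self):
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} {self.kind}",
        ]
        for labels, value in sorted(self.samples.items()):
            label_str = ",".join(f'{k}="{v}"' for k, v in labels)
            suffix = "{" + label_str + "}" if label_str else ""
            lines.append(f"{self.name}{suffix} {value:g}")
        return "\n".join(lines)


# Hypothetical metric names, chosen only for illustration.
peers = Metric("p2p_peers", "gauge", "Number of connected peers")
messages = Metric("p2p_messages_total", "counter", "P2P messages handled")
rpc = Metric("rpc_requests_total", "counter", "RPC requests by status code")

peers.set(12)
messages.inc(direction="sent")
messages.inc(direction="received")
messages.inc(direction="received")
rpc.inc(code="200")
rpc.inc(code="200")
rpc.inc(code="500")

exposition = "\n".join(m.render() for m in (peers, messages, rpc)) + "\n"
print(exposition)
```

Keeping errors as a label value (for example `code="500"`) rather than a separate metric is one common convention; whether it suits this project is part of what the downstream tickets would need to decide.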

Non-goals / Non-deliverables

  • This has nothing to do with analytics or traces

General issue deliverables

  • Update the appropriate CHANGELOG(s)
  • Update any relevant local/global README(s)
  • Update relevant source code tree explanations
  • Add or update any relevant or supporting mermaid diagrams

Testing Methodology

  • Task specific tests or benchmarks: make ...
  • New tests or benchmarks: make ...
  • All tests: make test_all
  • LocalNet: verify a LocalNet is still functioning correctly by following the instructions at docs/development/README.md

Creator: @okdas

@okdas added the infra and telemetry labels Mar 23, 2023
@okdas self-assigned this Mar 23, 2023
@Olshansk moved this to Backlog in V1 Dashboard Mar 23, 2023
@Olshansk added the triage label Mar 23, 2023
@Olshansk
Member

@okdas I've added the triage label to this ticket because I do not think it's ready yet. "Standardize and add metrics" seems like an issue with a lot of scope, and the fact that we cannot define an exact deliverable is a sign of that. What do you think of these as deliverables:

  • Observe DevNet and identify 5-10 metrics that need to be tracked via telemetry
  • Create downstream tickets to document & implement the metrics above in 1 or more separate issues

@okdas
Member Author

okdas commented Mar 23, 2023

@Olshansk this sounds great, I like the idea of limiting the scope of the ticket! Let me steal the deliverables you mentioned. :)

@jessicadaugherty
Contributor

jessicadaugherty commented Apr 3, 2023

@okdas define Metrics to be implemented (focus on HotPOKT and Raintree observability)

@jessicadaugherty moved this from Backlog to Up Next in V1 Dashboard Apr 5, 2023
@okdas linked a pull request Jul 12, 2023 that will close this issue
20 tasks
@okdas
Member Author

okdas commented Jul 14, 2023

I'm actively working on this.

@okdas moved this from Up Next to In Progress in V1 Dashboard Jul 17, 2023
Projects
Status: In Progress

3 participants