okdas opened this issue on Mar 23, 2023 · 4 comments · May be fixed by #901
Labels: infra (Core infrastructure - not protocol related), telemetry (everything related to collection telemetry), triage (It requires some decision-making at team level; it can't be worked on as it stands)
Objective
It appears that most metrics we've worked on in the past are not being populated by the Prometheus exporter. That means either we no longer need to expose that information for monitoring and troubleshooting purposes, or a bug is preventing that data from being collected.
Either way, now that we have working DevNets, it is a good time to revisit metrics to see what we can monitor and improve.
Goals
Provide meaningful metrics so node runners, including in-house operations, can gain insight into how the software is operating
Standardize how we add metrics for telemetry
Deliverable
Observe DevNet and identify 5-10 metrics that need to be tracked via telemetry
Create downstream tickets to document & implement the metrics above in 1 or more separate issues
TBD, but we likely need metrics that provide information about P2P usage (number of peers, messages, errors), persistence, and RPC (number of requests, typical HTTP server metrics, etc.).
Non-goals / Non-deliverables
This has nothing to do with analytics or traces
General issue deliverables
Update the appropriate CHANGELOG(s)
Update any relevant local/global README(s)
Update relevant source code tree explanations
Add or update any relevant or supporting mermaid diagrams
Testing Methodology
Task specific tests or benchmarks: make ...
New tests or benchmarks: make ...
All tests: make test_all
LocalNet: verify a LocalNet is still functioning correctly by following the instructions at docs/development/README.md
@okdas I've added the triage label to this ticket because I do not think it's ready yet. Standardizing and adding metrics seems like an issue with a lot of scope, and the fact that we cannot define an exact deliverable is an example of that. What do you think of these as deliverables:
Observe DevNet and identify 5-10 metrics that need to be tracked via telemetry
Create downstream tickets to document & implement the metrics above in 1 or more separate issues