-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace Sentry metrics with Prometheus metrics #918
Conversation
`shared.metrics` does not re-export `statsd` anymore.
Codecov ReportAll modified and coverable lines are covered by tests ✅
✅ All tests successful. No failed tests found. Additional details and impacted files@@ Coverage Diff @@
## main #918 +/- ##
=======================================
Coverage 96.23% 96.23%
=======================================
Files 823 824 +1
Lines 18972 18980 +8
=======================================
+ Hits 18257 18265 +8
Misses 715 715
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Codecov ReportAll modified and coverable lines are covered by tests ✅ ✅ All tests successful. No failed tests found. 📢 Thoughts on this report? Let us know! |
optional_fields = { | ||
"repo_visibility": repo_visibility, | ||
"position": position, | ||
"upload_version": upload_version, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code doesn't look like it's doing anything different from the original code, and the original code is a bit simpler. What is the reason to create an optional_fields
dict here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the original code, the optional_fields
are not created if we don't provide them in the arguments, and python's prometheus-client
errors if we initialize a metric with N labels, but call them with some empty labels. This change allows it to create these fields that are filled with None such that they are present in the labels dict.
The original code does not create the fields if the corresponding arguments are not provided. However, the prometheus-client in Python raises an error if we initialize a metric with a certain number of labels but then call it with some empty labels. I introduced optional_fields
dict here to ensure that these fields are always present in the labels dictionary, even if they are filled with None.
i.e. old code, generate_tags(..., endpoint="some_endpoint")
returns:
{
...
endpoint: "some_endpoint",
}
in the new code, it will return
{
...
endpoint: "some_endpoint",
repo_visibility: "None",
position: "None",
upload_version: "None",
}
If we still want the first dict to be returned, we can call generate tags with fill_labels=False
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. I think that the new code will actually return
{
...
endpoint: "some_endpoint",
repo_visibility: None,
position: None,
upload_version: None,
}
where the values for the three extra fields are of None type (not string). Is this still fine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked and that's fine, in prometheus it will interpret it as a string "None"
.
upload/views/bundle_analysis.py
Outdated
position="end", | ||
) | ||
BUNDLE_ANALYSIS_UPLOAD_VIEWS_COUNTER.labels(**labels).inc() | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The try
statement adds a lot of complexity in this code. I think it's good to not break the upload due to errors with metrics, but can we maybe abstract the error handling in another function? i.e. have a function called inc_counter(counter, tags)
that handles errors and logs warnings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this since this try-except block was present a few days ago to not break the upload. I agree, an inc_counter
function that handles all of these errors would be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the PR for the change! codecov/shared#407
This PR includes changes to |
❌ 4 Tests Failed:
View the top 3 failed tests by shortest run time
To view individual test run time comparison to the main branch, go to the Test Analytics Dashboard |
Test Failures Detected: Due to failing tests, we cannot provide coverage reports at this time. ❌ Failed Test Results:Completed 2638 tests with View the full list of failed testspytest
|
This PR includes changes to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple spots we could add comments, but functionality makes sense to me!
@@ -51,6 +50,17 @@ | |||
buckets=[0.05, 0.1, 0.25, 0.5, 0.75, 1, 2, 5, 10, 30], | |||
) | |||
|
|||
GQL_REQUEST_MADE_COUNTER = Counter( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't quite tell - what's the intended difference between this one (GQL_REQUEST_MADE_COUNTER
) and GQL_HIT_COUNTER
in above (line 34)? Any opportunities to combine?
Also a nit but it seems like we use "Total" in the description when it's a Histogram
not Counter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added GQL_REQUEST_MADE_COUNTER
so that the post
function in AsyncGraphqlView
class can increment this metric with the label path=req_path
, and the other metric GQL_HIT_COUNTER
is called when the actual gql API is called through the request_started
extension: https://ariadnegraphql.org/docs/extensions. If we want to combine this, then we would have to give req_path
to the QueryMetricsExtension
in some way, I was thinking of setting a self.path
attribute to the AsyncGraphqlView
class but since it's handling many async requests I don't think that would work. Do you know of a way to implement this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe path is available in the context
argument like info.context["request"].url.path
, if that's what you mean?
If this will be a duplicate of the information in GQL_HIT_COUNTER
then it seems like removing/combining makes sense. But since it sounds like it's meant to be another step in the lifecycle of a request, we can use it for debugging, for example comparing numbers against the GQL_REQUEST_MADE_COUNTER
. Actually I like the location of the GQL_REQUEST_MADE_COUNTER
better than the other one that already existed, so we can just leave it! I saw you were just converting the sentry one to prometheus anyway so we can just proceed.
GQL_ERROR_TYPE_COUNTER = Counter( | ||
"api_gql_errors", | ||
"Number of times API GQL endpoint failed with an exception by type", | ||
["error_type", "path"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we include a comment on when to use existing GQL_ERROR_COUNTER
vs. new GQL_ERROR_TYPE_COUNTER
for a particular use case? Or is this something we could potentially combine?
"position", | ||
"upload_version", | ||
], | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I like this pattern of pulling it out into a separate file!
I see the other spots in this PR are following the existing pattern of keeping it in the views.py
file because it would be a bigger lift to move those around, but this does seem like a nicer system going forward where possible!
upload/helpers.py
Outdated
action, | ||
request, | ||
is_shelter_request, | ||
endpoint: Optional[str] = None, | ||
repository: Optional[Repository] = None, | ||
position: Optional[str] = None, | ||
upload_version: Optional[str] = None, | ||
fill_labels: bool = True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: A name like fill_optional_labels
or include_empty_labels
may be more descriptive to someone who wants to decide what value to pass here.
Also we may include a comment by the optional_fields
dict on what was discussed in the convo below so someone doesn't inadvertently remove one of these optional fields and it breaks a metric somewhere else due to the missing label
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, this would be much clearer. I've opted with using include_empty_labels
.
Purpose/Motivation
Closes #460
Links to relevant tickets
What does this PR do?
Include a brief description of the changes in this PR. Bullet points are your friend.
Notes to Reviewer
Anything to note to the team? Any tips on how to review, or where to start?
Legal Boilerplate
Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. In 2022 this entity acquired Codecov and as result Sentry is going to need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.