
Replace Sentry metrics with Prometheus metrics #918

Merged: 34 commits merged into main from tony/prometheus-metrics on Oct 30, 2024

Conversation

@tony-codecov (Contributor) commented Oct 22, 2024

Purpose/Motivation

Closes #460

Links to relevant tickets

What does this PR do?

Include a brief description of the changes in this PR. Bullet points are your friend.

Notes to Reviewer

Anything to note to the team? Any tips on how to review, or where to start?

Legal Boilerplate

Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. In 2022 this entity acquired Codecov and as a result Sentry is going to need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.

@tony-codecov requested review from a team as code owners October 22, 2024 20:42

codecov bot commented Oct 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.23%. Comparing base (cdebb02) to head (c691c6e).
Report is 6 commits behind head on main.

✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #918   +/-   ##
=======================================
  Coverage   96.23%   96.23%           
=======================================
  Files         823      824    +1     
  Lines       18972    18980    +8     
=======================================
+ Hits        18257    18265    +8     
  Misses        715      715           
Flag                   Coverage Δ
unit                   92.47% <100.00%> (+<0.01%) ⬆️
unit-latest-uploader   92.47% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.



codecov-notifications bot commented Oct 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

✅ All tests successful. No failed tests found.


Comment on lines +815 to +819
optional_fields = {
"repo_visibility": repo_visibility,
"position": position,
"upload_version": upload_version,
}
Contributor:
This code doesn't look like it's doing anything different from the original code, and the original code is a bit simpler. What is the reason to create an optional_fields dict here?

@tony-codecov (Contributor, Author) commented Oct 23, 2024:
In the original code, the optional fields are not created if the corresponding arguments are not provided. However, Python's prometheus-client raises an error if we initialize a metric with N labels but then call it with some labels missing. This change creates those fields, filled with None, so that they are always present in the labels dict.

i.e. in the old code, generate_tags(..., endpoint="some_endpoint") returns:

{
   ...
   endpoint: "some_endpoint",
}

In the new code, it will return:

{
   ...
   endpoint: "some_endpoint",
   repo_visibility: "None",
   position: "None",
   upload_version: "None",
}

If we still want the first dict to be returned, we can call generate_tags with fill_labels=False.

Contributor:
Ah, I see. I think that the new code will actually return

{
   ...
   endpoint: "some_endpoint",
   repo_visibility: None,
   position: None,
   upload_version: None,
}

where the values for the three extra fields are of None type (not string). Is this still fine?

Contributor (Author):
I checked and that's fine; Prometheus will interpret it as the string "None".
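To make this concrete, here is a minimal sketch of the pattern under discussion, assuming a hypothetical counter and a simplified generate_tags; the real helper lives in this PR, and the names here are illustrative only:

from prometheus_client import Counter

# Hypothetical counter declared with a fixed label set. prometheus-client
# raises a ValueError if .labels() is later called with a different set of keys.
UPLOAD_COUNTER = Counter(
    "upload_views",
    "Number of upload requests",
    ["endpoint", "repo_visibility", "position", "upload_version"],
)

OPTIONAL_LABELS = ("repo_visibility", "position", "upload_version")

def generate_tags(endpoint: str, fill_labels: bool = True, **optional) -> dict:
    """Build the label dict, optionally padding missing optional labels."""
    labels = {"endpoint": endpoint, **optional}
    if fill_labels:
        for key in OPTIONAL_LABELS:
            # None is stringified to "None" by prometheus-client on export.
            labels.setdefault(key, None)
    return labels

# Works even though only `endpoint` was provided, because the missing
# optional labels were padded with None.
UPLOAD_COUNTER.labels(**generate_tags("some_endpoint")).inc()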

position="end",
)
BUNDLE_ANALYSIS_UPLOAD_VIEWS_COUNTER.labels(**labels).inc()
try:
Contributor:
The try statement adds a lot of complexity to this code. I think it's good not to break the upload due to errors with metrics, but could we abstract the error handling into another function? For example, a function called inc_counter(counter, tags) that handles errors and logs warnings.

Contributor (Author):
I added this because this try-except block was put in place a few days ago to keep errors from breaking the upload. I agree, an inc_counter function that handles all of these errors would be useful.

Contributor (Author):
Here's the PR for the change! codecov/shared#407
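For reference, the helper being discussed could take roughly the following shape; this is only a sketch, and the actual implementation is the one in codecov/shared#407:

import logging
from typing import Optional

from prometheus_client import Counter

log = logging.getLogger(__name__)

def inc_counter(counter: Counter, labels: Optional[dict] = None) -> None:
    """Increment a Prometheus counter, logging instead of raising on failure
    so that metrics problems never break the request being measured."""
    try:
        if labels:
            counter.labels(**labels).inc()
        else:
            counter.inc()
    except Exception:
        log.warning("Failed to increment Prometheus counter", exc_info=True)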

Contributor:
This PR includes changes to shared. Please review them here: codecov/shared@f0e213c...cde9892


codecov-qa bot commented Oct 23, 2024

❌ 4 Tests Failed:

Tests completed    Failed    Passed    Skipped
2632               4         2628      6
View the top 3 failed tests by shortest run time
api/public/v2/tests/test_test_results_view.py::TestResultsViewsetTests::test_list_filters
Stack Traces | 0.407s run time
self = <test_test_results_view.TestResultsViewsetTests testMethod=test_list_filters>

    def test_list_filters(self):
        url = reverse(
            "api-v2-tests-results-list",
            kwargs={
                "service": self.org.service,
                "owner_username": self.org.username,
                "repo_name": self.repo.name,
            },
        )
        res = self.client.get(f"{url}?commit_id={self.test_instances[0].commitid}")
        assert res.status_code == status.HTTP_200_OK
        assert res.json() == {
            "count": 1,
            "next": None,
            "previous": None,
            "results": [
                {
                    "id": self.test_instances[0].id,
                    "name": self.test_instances[0].test.name,
                    "test_id": self.test_instances[0].test_id,
                    "failure_message": self.test_instances[0].failure_message,
                    "duration_seconds": self.test_instances[0].duration_seconds,
                    "commitid": self.test_instances[0].commitid,
                    "outcome": self.test_instances[0].outcome,
                    "branch": self.test_instances[0].branch,
                    "repoid": self.test_instances[0].repoid,
                    "commits_where_fail": self.test_instances[
                        0
                    ].test.commits_where_fail,
                },
            ],
            "total_pages": 1,
>       }
E       AttributeError: 'Test' object has no attribute 'commits_where_fail'

.../v2/tests/test_test_results_view.py:111: AttributeError
api/public/v2/tests/test_test_results_view.py::TestResultsViewsetTests::test_result_with_valid_super_token
Stack Traces | 0.418s run time
self = <test_test_results_view.TestResultsViewsetTests testMethod=test_result_with_valid_super_token>
repository_artifact_permissions_has_permission = <MagicMock name='has_permission' id='140063576179104'>

    @override_settings(SUPER_API_TOKEN="testaxs3o76rdcdpfzexuccx3uatui2nw73r")
    @patch("api.shared.permissions.RepositoryArtifactPermissions.has_permission")
    def test_result_with_valid_super_token(
        self, repository_artifact_permissions_has_permission
    ):
        repository_artifact_permissions_has_permission.return_value = False
        res = self.client.get(
            reverse(
                "api-v2-tests-results-detail",
                kwargs={
                    "service": self.org.service,
                    "owner_username": self.org.username,
                    "repo_name": self.repo.name,
                    "pk": self.test_instances[0].pk,
                },
            ),
            HTTP_AUTHORIZATION="Bearer testaxs3o76rdcdpfzexuccx3uatui2nw73r",
        )
        assert res.status_code == 200
        assert res.json() == {
            "id": self.test_instances[0].id,
            "name": self.test_instances[0].test.name,
            "test_id": self.test_instances[0].test_id,
            "failure_message": self.test_instances[0].failure_message,
            "duration_seconds": self.test_instances[0].duration_seconds,
            "commitid": self.test_instances[0].commitid,
            "outcome": self.test_instances[0].outcome,
            "branch": self.test_instances[0].branch,
            "repoid": self.test_instances[0].repoid,
            "commits_where_fail": self.test_instances[0].test.commits_where_fail,
>       }
E       AttributeError: 'Test' object has no attribute 'commits_where_fail'

.../v2/tests/test_test_results_view.py:241: AttributeError
api/public/v2/tests/test_test_results_view.py::TestResultsViewsetTests::test_retrieve
Stack Traces | 0.427s run time
self = <test_test_results_view.TestResultsViewsetTests testMethod=test_retrieve>
get_repo_permissions = <MagicMock name='get_repo_permissions' id='140062866114944'>

    @patch("api.shared.repo.repository_accessors.RepoAccessors.get_repo_permissions")
    def test_retrieve(self, get_repo_permissions):
        get_repo_permissions.return_value = (True, True)
        res = self.client.get(
            reverse(
                "api-v2-tests-results-detail",
                kwargs={
                    "service": self.org.service,
                    "owner_username": self.org.username,
                    "repo_name": self.repo.name,
                    "pk": self.test_instances[0].pk,
                },
            )
        )
        assert res.status_code == status.HTTP_200_OK
        assert res.json() == {
            "id": self.test_instances[0].id,
            "name": self.test_instances[0].test.name,
            "test_id": self.test_instances[0].test_id,
            "failure_message": self.test_instances[0].failure_message,
            "duration_seconds": self.test_instances[0].duration_seconds,
            "commitid": self.test_instances[0].commitid,
            "outcome": self.test_instances[0].outcome,
            "branch": self.test_instances[0].branch,
            "repoid": self.test_instances[0].repoid,
            "commits_where_fail": self.test_instances[0].test.commits_where_fail,
>       }
E       AttributeError: 'Test' object has no attribute 'commits_where_fail'

.../v2/tests/test_test_results_view.py:139: AttributeError

To view individual test run time comparison to the main branch, go to the Test Analytics Dashboard


codecov-public-qa bot commented Oct 23, 2024

Test Failures Detected: Due to failing tests, we cannot provide coverage reports at this time.

❌ Failed Test Results:

Completed 2638 tests with 4 failed, 2628 passed and 6 skipped.

View the full list of failed tests

pytest

  • Class name: api.public.v2.tests.test_test_results_view.TestResultsViewsetTests
    Test name: test_list

    self = <test_test_results_view.TestResultsViewsetTests testMethod=test_list>

    def test_list(self):
        url = reverse(
            "api-v2-tests-results-list",
            kwargs={
                "service": self.org.service,
                "owner_username": self.org.username,
                "repo_name": self.repo.name,
            },
        )
        res = self.client.get(url)
        assert res.status_code == status.HTTP_200_OK
        assert res.json() == {
            "count": 2,
            "next": None,
            "previous": None,
            "results": [
                {
                    "id": self.test_instances[0].id,
                    "name": self.test_instances[0].test.name,
                    "test_id": self.test_instances[0].test_id,
                    "failure_message": self.test_instances[0].failure_message,
                    "duration_seconds": self.test_instances[0].duration_seconds,
                    "commitid": self.test_instances[0].commitid,
                    "outcome": self.test_instances[0].outcome,
                    "branch": self.test_instances[0].branch,
                    "repoid": self.test_instances[0].repoid,
                    "commits_where_fail": self.test_instances[
                        0
                    ].test.commits_where_fail,
                },
                {
                    "id": self.test_instances[1].id,
                    "name": self.test_instances[1].test.name,
                    "test_id": self.test_instances[1].test_id,
                    "failure_message": self.test_instances[1].failure_message,
                    "duration_seconds": self.test_instances[1].duration_seconds,
                    "commitid": self.test_instances[1].commitid,
                    "outcome": self.test_instances[1].outcome,
                    "branch": self.test_instances[1].branch,
                    "repoid": self.test_instances[1].repoid,
                    "failure_rate": self.test_instances[1].test.failure_rate,
                    "commits_where_fail": self.test_instances[
                        1
                    ].test.commits_where_fail,
                },
            ],
            "total_pages": 1,
    >       }
    E       AttributeError: 'Test' object has no attribute 'commits_where_fail'

    .../v2/tests/test_test_results_view.py:77: AttributeError
  • Class name: api.public.v2.tests.test_test_results_view.TestResultsViewsetTests
    Test name: test_list_filters

    (Traceback identical to the codecov-qa report above: AttributeError: 'Test' object has no attribute 'commits_where_fail' at .../v2/tests/test_test_results_view.py:111)

  • Class name: api.public.v2.tests.test_test_results_view.TestResultsViewsetTests
    Test name: test_result_with_valid_super_token

    (Traceback identical to the codecov-qa report above: AttributeError: 'Test' object has no attribute 'commits_where_fail' at .../v2/tests/test_test_results_view.py:241)

  • Class name: api.public.v2.tests.test_test_results_view.TestResultsViewsetTests
    Test name: test_retrieve

    (Traceback identical to the codecov-qa report above: AttributeError: 'Test' object has no attribute 'commits_where_fail' at .../v2/tests/test_test_results_view.py:139)

Contributor:
This PR includes changes to shared. Please review them here: codecov/shared@4e27927...cde9892

@tony-codecov requested review from michelletran-codecov and a team October 24, 2024 19:09
@suejung-sentry (Contributor) left a comment:
A couple spots we could add comments, but functionality makes sense to me!

requirements.in (outdated review comment, resolved)
@@ -51,6 +50,17 @@
buckets=[0.05, 0.1, 0.25, 0.5, 0.75, 1, 2, 5, 10, 30],
)

GQL_REQUEST_MADE_COUNTER = Counter(
Contributor:
Couldn't quite tell: what's the intended difference between this one (GQL_REQUEST_MADE_COUNTER) and GQL_HIT_COUNTER above (line 34)? Any opportunities to combine them?

Also, a nit: it seems like we use "Total" in the description when it's a Histogram, not a Counter.

Contributor (Author):

I added GQL_REQUEST_MADE_COUNTER so that the post function in the AsyncGraphqlView class can increment this metric with the label path=req_path. The other metric, GQL_HIT_COUNTER, is incremented when the actual GQL API is called, through the request_started extension: https://ariadnegraphql.org/docs/extensions. If we wanted to combine them, we would have to pass req_path to the QueryMetricsExtension somehow. I was thinking of setting a self.path attribute on the AsyncGraphqlView class, but since it handles many async requests, I don't think that would work. Do you know of a way to implement this?

Contributor:

I believe path is available in the context argument like info.context["request"].url.path, if that's what you mean?

If this will be a duplicate of the information in GQL_HIT_COUNTER, then removing/combining makes sense. But since it sounds like it's meant to be another step in the lifecycle of a request, we can use it for debugging, for example by comparing its numbers against GQL_REQUEST_MADE_COUNTER. Actually, I like the location of GQL_REQUEST_MADE_COUNTER better than the one that already existed, so we can just leave it! I saw you were just converting the Sentry one to Prometheus anyway, so we can proceed.
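For illustration, an Ariadne extension could pick the path up from the context roughly like this; the counter name and label set are assumptions (the existing GQL_HIT_COUNTER may not carry a path label), and the attribute holding the path depends on the request object placed in the context:

from ariadne.types import Extension
from prometheus_client import Counter

# Hypothetical counter; the real GQL_HIT_COUNTER may use a different name and labels.
GQL_HIT_COUNTER = Counter(
    "api_gql_hits", "Number of GQL requests handled", ["path"]
)

class QueryMetricsExtension(Extension):
    def request_started(self, context) -> None:
        request = context["request"]
        # Django's HttpRequest exposes request.path; Starlette-style requests
        # expose request.url.path.
        path = getattr(request, "path", None) or request.url.path
        GQL_HIT_COUNTER.labels(path=path).inc()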

GQL_ERROR_TYPE_COUNTER = Counter(
"api_gql_errors",
"Number of times API GQL endpoint failed with an exception by type",
["error_type", "path"],
Contributor:
Could we include a comment on when to use existing GQL_ERROR_COUNTER vs. new GQL_ERROR_TYPE_COUNTER for a particular use case? Or is this something we could potentially combine?
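One way to capture the distinction would be a comment on each declaration plus a single helper that increments both, along these lines; GQL_ERROR_COUNTER's exact definition is assumed here, since only GQL_ERROR_TYPE_COUNTER appears in this diff:

from prometheus_client import Counter

# Assumed definition: one increment per failed GQL request, regardless of
# cause. Useful for overall error-rate dashboards and alerting.
GQL_ERROR_COUNTER = Counter(
    "api_gql_error_count", "Number of failed API GQL requests", ["path"]
)

# One increment per exception type, so a spike can be attributed to a
# specific failure mode (error_type is the exception class name).
GQL_ERROR_TYPE_COUNTER = Counter(
    "api_gql_errors",
    "Number of times API GQL endpoint failed with an exception by type",
    ["error_type", "path"],
)

def record_gql_error(exc: Exception, path: str) -> None:
    # Hypothetical helper showing how the two counters relate in use.
    GQL_ERROR_COUNTER.labels(path=path).inc()
    GQL_ERROR_TYPE_COUNTER.labels(error_type=type(exc).__name__, path=path).inc()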

"position",
"upload_version",
],
)
Contributor:
Nice, I like this pattern of pulling it out into a separate file!
I see the other spots in this PR are following the existing pattern of keeping it in the views.py file because it would be a bigger lift to move those around, but this does seem like a nicer system going forward where possible!

action,
request,
is_shelter_request,
endpoint: Optional[str] = None,
repository: Optional[Repository] = None,
position: Optional[str] = None,
upload_version: Optional[str] = None,
fill_labels: bool = True,
Contributor:
nit: A name like fill_optional_labels or include_empty_labels may be more descriptive to someone deciding what value to pass here.
Also, we could add a comment next to the optional_fields dict summarizing the discussion below, so that no one inadvertently removes one of these optional fields and breaks a metric elsewhere because of the missing label.

Contributor (Author):
I agree, this would be much clearer. I've opted to use include_empty_labels.

@tony-codecov added this pull request to the merge queue Oct 30, 2024
Merged via the queue into main with commit 9beb3f2 Oct 30, 2024
18 of 19 checks passed
@tony-codecov deleted the tony/prometheus-metrics branch October 30, 2024 15:05