Documentation: Fix etcdHighNumberOfFailedGRPCRequests rules return NaN value #10629
@qingwave thanks for the PR. Please make sure the commit title follows the required format.
Codecov Report
Coverage Diff (master vs. #10629)
            master   #10629   +/-
Coverage    71.69    71.36    -0.33
Files       393      393
Lines       36627    36628    +1
Hits        26258    26140    -118
Misses      8532     8648     +116
Partials    1837     1840     +3
@qingwave you need to update the commit title. The build failure gives this error: e932113 fix etcdHighNumberOfFailedGRPCRequests rules... Expected commit title format '<package>{", "<package>}: <description>' Got: e932113 fix etcdHighNumberOfFailedGRPCRequests rules
Other than that, judging by the output of the fix you have provided, the changes look good to me. Thanks!
@spzala okay, the commit title has been modified.
@qingwave this should be changed as well: https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/etcd3_alert.rules#L41 Also, I might be missing something, but the example diagram you provided for NaN doesn't seem to use the exact expression that is in the current rules? Thanks!
Thanks, that has fixed the build.
/cc @xiang90
@spzala As you said, the fixed rule cannot return the exact value of the expression, only 0 or 1. Prometheus has no graceful way to avoid division by zero. We would need to check the divisor for zero and replace NaN with zero, as follows,
Is that a good approach?
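The snippet from the comment above is not reproduced here. Purely as an illustration of the idea it describes (check the divisor for zero and return zero instead of NaN), here is a minimal Go sketch. `safeFailureRatio` is a hypothetical helper, not part of etcd or Prometheus, and it works on plain float64 values rather than on a PromQL expression.

```go
package main

import "fmt"

// safeFailureRatio is a hypothetical helper: it returns the percentage of
// failed requests, or 0 when there were no requests at all, so the caller
// never sees NaN from a 0/0 division.
func safeFailureRatio(failed, total float64) float64 {
	if total == 0 {
		return 0
	}
	return 100 * failed / total
}

func main() {
	fmt.Println(safeFailureRatio(0, 0))   // 0
	fmt.Println(safeFailureRatio(5, 100)) // 5
}
```

Unlike a pure comparison, this keeps the exact ratio available whenever there is traffic.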
@qingwave thanks, and in that case it sounds good to me. Let's have @hexfusion or @xiang90 take a look too. Also, as I commented earlier, etcd3_alert.rules should be updated accordingly as well. Thanks!
I'm not sure this is the right fix. As far as I can tell, this is happening because we're dividing an empty vector by a scalar, for which NaN (not a number) is indeed the correct result. What exactly needs to be fixed here? Are alerts firing that shouldn't be firing?
We're seeing this alert fire constantly. Should this rule be removed entirely?
@servo1x for what it's worth, it's disabled in OpenShift (v4.1.0+) until we resolve this.
What changed?
The etcdHighNumberOfFailedGRPCRequests rules in etcd3_alert.rules.yml.
Why was this change made?
The original rules can return NaN for etcdHighNumberOfFailedGRPCRequests when the value should be zero, as shown in the image attached to the PR.
The denominator is sometimes zero, which makes the rule return NaN. The fix replaces the division with a comparison, as shown in the second attached image.
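One way to read "replace division with comparison" is to multiply both sides of `failed / total > percent / 100` by the non-negative total, which removes the division entirely. The Go sketch below illustrates that reading on plain numbers. It is an assumption about the approach, not the expression from the PR, and `exceedsThreshold` is a hypothetical name.

```go
package main

import "fmt"

// exceedsThreshold is a hypothetical helper. For total >= 0 it is equivalent to
// 100*failed/total > percent, but it never divides, so total == 0 gives
// false instead of NaN.
func exceedsThreshold(failed, total, percent float64) bool {
	return 100*failed > percent*total
}

func main() {
	fmt.Println(exceedsThreshold(0, 0, 1))   // false: no traffic, no alert
	fmt.Println(exceedsThreshold(5, 100, 1)) // true: 5% failures exceed 1%
	fmt.Println(exceedsThreshold(1, 100, 1)) // false: exactly 1% is not above 1%
}
```

The trade-off is the one raised in the discussion: the result is only a yes/no answer, and the exact failure ratio is no longer available.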