-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(alarm): add minSampleCountToEvaluateDatapoint #453
Conversation
58a1cc2
to
917eee0
Compare
// create primary alarm | ||
|
||
const primaryAlarm = adjustedMetric.createAlarm( | ||
this.alarmScope, | ||
const primaryAlarm = alarmMetric.createAlarm(this.alarmScope, alarmName, { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adjustedMetric changed to alarmMetric; the rest was automatically reformatted by yarn build
without any changes
917eee0
to
ccb010b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, two nitpicks
"minSampleCountToEvaluateDatapoint is not supported for MathExpressions. " + | ||
"If you already use MathExpression, you can extend your expression to evaluate " + | ||
"the sample count using IF statement, e.g. IF(sampleCount > X, mathExpression)." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it's worth abstracting that at some point anyway just for convenience?
#458) Fixes #452 Follow up to #453: * (feat) Exposing the new `minSampleCountToEvaluateDatapoint` through CustomMonitoring * (fix) Fixing the `minSampleCountToEvaluateDatapoint` MathExpression's period as it defaults to 5 minutes. This didn't come up during testing as I tested it using 5 minute period. Apparently, if we don't set the period on MathExpression explicitly, it overrides all child metrics to 5 minute, [reference](https://github.com/aws/aws-cdk/blob/db21fefc2dc76eb4ff306fa41652ab6a6cc95e42/packages/aws-cdk-lib/aws-cloudwatch/lib/metric.ts#L606). To avoid similar situations in the future, extended the unit test to cover custom periods. --- _By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license_
Fixes #452
Currently, when using
minMetricSamplesToAlarm
the number of samples is evaluated for a different period than the main alarm. This makes monitoring sensitive to false positives as not every breaching datapoint must have sufficient number of samples (see #452 for more details).Moreover, the current approach for adjusting alarms to respect
minMetricSamplesToAlarm
is to create 2 extra alarms - one forNoSamples
and one for a top-level composite. Each of these monitors incurs extra costs ($0.10 forNoSamples
monitor and $0.50 for the Composite, see https://aws.amazon.com/cloudwatch/pricing/ for reference). This means that usingminMetricSamplesToAlarm
increases the cost from $0.10 per alarm to $0.70 per alarm ($0.60 of overhead!).It's possible to use Math Expression instead. Instead of adding separate alarm for
NoSamples
, we can model it a Sample Count metric, and instead of the Composite, we can use the MathExpression that conditionally emits the data based on the number of samples. The charge for Math Expression-based alarms is per metric in the Math Expression, so that comes down to $0.20 per alarm. That's a 70% cost improvement. Additionally, it reduces the overall number of alarms, effectively making it easier to fit your alarming in the CloudWatch quota and decluttering the UI.To avoid breaking any customers that rely on
minMetricSamplesToAlarm
generating alarms (e.g. #403), deprecating it and addingminSampleCountToEvaluateDatapoint
with updated behaviour next to it.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license