feat(alarm): add minSampleCountToEvaluateDatapoint #453

miloszwatroba · 2023-11-09T15:16:46Z

Fixes #452

Currently, when using minMetricSamplesToAlarm, the number of samples is evaluated over a different period than the main alarm. This makes monitoring prone to false positives, because a breaching datapoint is not guaranteed to have a sufficient number of samples (see #452 for more details).

Moreover, the current approach for making alarms respect minMetricSamplesToAlarm is to create 2 extra alarms: one for NoSamples and one top-level composite. Each of these alarms incurs extra cost ($0.10 for the NoSamples alarm and $0.50 for the Composite, see https://aws.amazon.com/cloudwatch/pricing/ for reference). This means that using minMetricSamplesToAlarm increases the cost from $0.10 per alarm to $0.70 per alarm ($0.60 of overhead!).
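As a quick check of the arithmetic above, here is a minimal sketch that adds up the per-alarm prices quoted in this description. The constant and function names are made up for illustration, and the prices are the ones stated here, not re-verified against the pricing page.

```typescript
// Prices in US cents per alarm, as quoted in the PR description (assumed, not re-verified).
const STANDARD_ALARM_CENTS = 10; // primary alarm; the NoSamples alarm costs the same
const COMPOSITE_ALARM_CENTS = 50; // top-level composite alarm

// Hypothetical helper: total cost of one monitored condition when
// minMetricSamplesToAlarm is set (primary + NoSamples + composite).
function costWithMinMetricSamplesToAlarm(): number {
  return STANDARD_ALARM_CENTS + STANDARD_ALARM_CENTS + COMPOSITE_ALARM_CENTS;
}

const totalCents = costWithMinMetricSamplesToAlarm();
const overheadCents = totalCents - STANDARD_ALARM_CENTS;

console.log(`total: $${(totalCents / 100).toFixed(2)}`); // total: $0.70
console.log(`overhead: $${(overheadCents / 100).toFixed(2)}`); // overhead: $0.60
```

Working in integer cents avoids floating-point rounding when summing small dollar amounts.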

It's possible to use a Math Expression instead. Instead of adding a separate alarm for NoSamples, we can model it as a Sample Count metric, and instead of the Composite, we can use a MathExpression that conditionally emits data based on the number of samples. The charge for Math Expression-based alarms is per metric in the Math Expression, so that comes down to $0.20 per alarm. That's a roughly 70% cost reduction. Additionally, it reduces the overall number of alarms, making it easier to fit your alarming within the CloudWatch quota and decluttering the UI.
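To illustrate the idea of conditionally emitting data, here is a minimal, library-free sketch of per-datapoint gating. The `Datapoint` type and both function names are hypothetical, the strict `>` comparison follows the `IF(sampleCount > X, ...)` form used in this PR, and treating a missing datapoint as non-breaching is an assumption (in CloudWatch that depends on how the alarm treats missing data).

```typescript
// One aggregated datapoint for a single period (hypothetical shape for illustration).
interface Datapoint {
  value: number; // e.g. error rate for the period
  sampleCount: number; // number of samples behind that value
}

// Keep the value only when the datapoint has more than `min` samples;
// otherwise emit nothing (undefined stands for "no data").
function gateBySampleCount(points: Datapoint[], min: number): (number | undefined)[] {
  return points.map((p) => (p.sampleCount > min ? p.value : undefined));
}

// Count datapoints above the threshold. Assumption: missing datapoints do not breach.
function countBreaching(values: (number | undefined)[], threshold: number): number {
  return values.filter((v) => v !== undefined && v > threshold).length;
}

const points: Datapoint[] = [
  { value: 100, sampleCount: 1 }, // a single failed request: noisy datapoint
  { value: 2, sampleCount: 500 },
  { value: 40, sampleCount: 300 }, // high value backed by many samples
];

const ungated = points.map((p) => p.value);
const gated = gateBySampleCount(points, 10);

console.log(countBreaching(ungated, 10)); // 2
console.log(countBreaching(gated, 10)); // 1
```

Without gating, the single-sample datapoint counts as breaching; with gating, only the datapoint backed by enough samples does.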

To avoid breaking any customers that rely on minMetricSamplesToAlarm generating alarms (e.g. #403), this PR deprecates it and adds minSampleCountToEvaluateDatapoint with the updated behaviour next to it.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

miloszwatroba · 2023-11-10T10:45:39Z

lib/common/alarm/AlarmFactory.ts

    // create primary alarm

-    const primaryAlarm = adjustedMetric.createAlarm(
-      this.alarmScope,
+    const primaryAlarm = alarmMetric.createAlarm(this.alarmScope, alarmName, {


adjustedMetric changed to alarmMetric; the rest was automatically reformatted by yarn build without any changes

xendo

LGTM, two nitpicks

lib/common/alarm/AlarmFactory.ts

echeung-amzn · 2023-11-10T21:21:11Z

lib/common/alarm/AlarmFactory.ts

+          "minSampleCountToEvaluateDatapoint is not supported for MathExpressions. " +
+            "If you already use MathExpression, you can extend your expression to evaluate " +
+            "the sample count using IF statement, e.g. IF(sampleCount > X, mathExpression)."


I wonder if it's worth abstracting that at some point anyway just for convenience?
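One possible shape for such a convenience abstraction is a small string helper that wraps an existing expression in the `IF` guard from the message above. This is only a sketch: `guardWithSampleCount` is a hypothetical name, not something this PR adds, and it assumes the caller has already registered a sample-count metric under the given id.

```typescript
// Hypothetical helper: wrap an existing metric math expression so it only emits
// a value when the sample-count metric exceeds `minSampleCount`.
function guardWithSampleCount(
  expression: string,
  sampleCountId: string,
  minSampleCount: number,
): string {
  if (!Number.isInteger(minSampleCount) || minSampleCount < 0) {
    throw new Error("minSampleCount must be a non-negative integer");
  }
  return `IF(${sampleCountId} > ${minSampleCount}, ${expression})`;
}

const guarded = guardWithSampleCount("(errors / requests) * 100", "sampleCount", 20);
console.log(guarded); // IF(sampleCount > 20, (errors / requests) * 100)
```

The helper only builds the expression string; wiring `sampleCount` into the expression's metrics would still be up to the caller.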

#458)

Fixes #452

Follow up to #453:

* (feat) Exposing the new `minSampleCountToEvaluateDatapoint` through CustomMonitoring
* (fix) Fixing the period of the `minSampleCountToEvaluateDatapoint` MathExpression, as it defaults to 5 minutes. This didn't come up during testing because I tested it using a 5-minute period. Apparently, if we don't set the period on a MathExpression explicitly, it overrides the period of all child metrics to 5 minutes, [reference](https://github.com/aws/aws-cdk/blob/db21fefc2dc76eb4ff306fa41652ab6a6cc95e42/packages/aws-cdk-lib/aws-cloudwatch/lib/metric.ts#L606). To avoid similar situations in the future, extended the unit test to cover custom periods.

---

_By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license_

miloszwatroba force-pushed the fix/min-samples branch from 58a1cc2 to 917eee0 Compare November 10, 2023 10:41

miloszwatroba changed the title ~~fix(alarm): ensure each datapoint has sufficient number of samples for minMetricSamplesToAlarm~~ feat(alarm): add minSampleCountToEvaluateDatapoint Nov 10, 2023

miloszwatroba commented Nov 10, 2023

feat(alarm): add minSampleCountToEvaluateDatapoint

ccb010b

miloszwatroba force-pushed the fix/min-samples branch from 917eee0 to ccb010b Compare November 10, 2023 10:47

miloszwatroba marked this pull request as ready for review November 10, 2023 10:50

xendo reviewed Nov 10, 2023

lib/common/alarm/AlarmFactory.ts (outdated, resolved)

lib/common/alarm/AlarmFactory.ts (outdated, resolved)

update unsupported MathExpression exception message

eabe6ed

echeung-amzn approved these changes Nov 10, 2023

echeung-amzn merged commit 44fbbbb into cdklabs:main Nov 10, 2023
9 checks passed

miloszwatroba mentioned this pull request Nov 15, 2023

feat(alarm): add minSampleCountToEvaluateDatapoint to CustomMonitoring #458

Merged
