Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(alarm): add minSampleCountToEvaluateDatapoint #453

Merged
merged 2 commits into from
Nov 10, 2023

Conversation

miloszwatroba
Copy link
Contributor

@miloszwatroba miloszwatroba commented Nov 9, 2023

Fixes #452

Currently, when using minMetricSamplesToAlarm the number of samples is evaluated for a different period than the main alarm. This makes monitoring sensitive to false positives as not every breaching datapoint must have sufficient number of samples (see #452 for more details).

Moreover, the current approach for adjusting alarms to respect minMetricSamplesToAlarm is to create 2 extra alarms - one for NoSamples and one for a top-level composite. Each of these monitors incurs extra costs ($0.10 for NoSamples monitor and $0.50 for the Composite, see https://aws.amazon.com/cloudwatch/pricing/ for reference). This means that using minMetricSamplesToAlarm increases the cost from $0.10 per alarm to $0.70 per alarm ($0.60 of overhead!).

It's possible to use Math Expression instead. Instead of adding separate alarm for NoSamples, we can model it a Sample Count metric, and instead of the Composite, we can use the MathExpression that conditionally emits the data based on the number of samples. The charge for Math Expression-based alarms is per metric in the Math Expression, so that comes down to $0.20 per alarm. That's a 70% cost improvement. Additionally, it reduces the overall number of alarms, effectively making it easier to fit your alarming in the CloudWatch quota and decluttering the UI.

To avoid breaking any customers that rely on minMetricSamplesToAlarm generating alarms (e.g. #403), deprecating it and adding minSampleCountToEvaluateDatapoint with updated behaviour next to it.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@miloszwatroba miloszwatroba changed the title fix(alarm): ensure each datapoint has sufficient number of samples for minMetricSamplesToAlarm feat(alarm): add minSampleCountToEvaluateDatapoint Nov 10, 2023
// create primary alarm

const primaryAlarm = adjustedMetric.createAlarm(
this.alarmScope,
const primaryAlarm = alarmMetric.createAlarm(this.alarmScope, alarmName, {
Copy link
Contributor Author

@miloszwatroba miloszwatroba Nov 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adjustedMetric changed to alarmMetric; the rest was automatically reformatted by yarn build without any changes

@miloszwatroba miloszwatroba marked this pull request as ready for review November 10, 2023 10:50
Copy link

@xendo xendo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, two nitpicks

lib/common/alarm/AlarmFactory.ts Outdated Show resolved Hide resolved
lib/common/alarm/AlarmFactory.ts Outdated Show resolved Hide resolved
Comment on lines +576 to +578
"minSampleCountToEvaluateDatapoint is not supported for MathExpressions. " +
"If you already use MathExpression, you can extend your expression to evaluate " +
"the sample count using IF statement, e.g. IF(sampleCount > X, mathExpression)."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it's worth abstracting that at some point anyway just for convenience?

@echeung-amzn echeung-amzn merged commit 44fbbbb into cdklabs:main Nov 10, 2023
9 checks passed
echeung-amzn pushed a commit that referenced this pull request Nov 16, 2023
#458)

Fixes #452

Follow up to
#453:

* (feat) Exposing the new `minSampleCountToEvaluateDatapoint` through
CustomMonitoring
* (fix) Fixing the `minSampleCountToEvaluateDatapoint` MathExpression's
period as it defaults to 5 minutes. This didn't come up during testing
as I tested it using 5 minute period. Apparently, if we don't set the
period on MathExpression explicitly, it overrides all child metrics to 5
minute,
[reference](https://github.com/aws/aws-cdk/blob/db21fefc2dc76eb4ff306fa41652ab6a6cc95e42/packages/aws-cdk-lib/aws-cloudwatch/lib/metric.ts#L606).
To avoid similar situations in the future, extended the unit test to
cover custom periods.

---

_By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache-2.0 license_
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

minMetricSamplesToAlarm uses incorrect evaluation period
3 participants