-
Notifications
You must be signed in to change notification settings - Fork 581
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Checkable: send state notifications after suppression if and only if the state differs compared to before the suppression started #9207
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -479,7 +479,12 @@ void Checkable::ProcessCheckResult(const CheckResult::Ptr& cr, const MessageOrig | |
|
||
if (send_notification && !is_flapping) { | ||
if (!IsPaused()) { | ||
if (suppress_notification) { | ||
/* If there are still some pending suppressed state notification, keep the suppression until these are | ||
* handled by Checkable::FireSuppressedNotifications(). | ||
*/ | ||
bool pending = GetSuppressedNotifications() & (NotificationRecovery|NotificationProblem); | ||
|
||
if (suppress_notification || pending) { | ||
suppressed_types |= (recovery ? NotificationRecovery : NotificationProblem); | ||
} else { | ||
OnNotificationsRequested(this, recovery ? NotificationRecovery : NotificationProblem, cr, "", "", nullptr); | ||
|
@@ -496,12 +501,21 @@ void Checkable::ProcessCheckResult(const CheckResult::Ptr& cr, const MessageOrig | |
int suppressed_types_before (GetSuppressedNotifications()); | ||
int suppressed_types_after (suppressed_types_before | suppressed_types); | ||
|
||
for (int conflict : {NotificationProblem | NotificationRecovery, NotificationFlappingStart | NotificationFlappingEnd}) { | ||
/* E.g. problem and recovery notifications neutralize each other. */ | ||
const int conflict = NotificationFlappingStart | NotificationFlappingEnd; | ||
if ((suppressed_types_after & conflict) == conflict) { | ||
/* Flapping start and end cancel out each other. */ | ||
suppressed_types_after &= ~conflict; | ||
} | ||
|
||
if ((suppressed_types_after & conflict) == conflict) { | ||
suppressed_types_after &= ~conflict; | ||
} | ||
const int stateNotifications = NotificationRecovery | NotificationProblem; | ||
if (!(suppressed_types_before & stateNotifications) && (suppressed_types & stateNotifications)) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK, now problem|rec. don’t neutralise each other here but – I guess – a timer and the changed NotificationReasonApplies do this job. |
||
/* A state-related notification is suppressed for the first time, store the previous state. When | ||
* notifications are no longer suppressed, this can be compared with the current state to determine | ||
* if a notification must be sent. This is done differently compared to flapping notifications just above | ||
* as for state notifications, problem and recovery don't always cancel each other. For example, | ||
* WARNING -> OK -> CRITICAL generates both types once, but there should still be a notification. | ||
*/ | ||
SetStateBeforeSuppression(old_stateType == StateTypeHard ? old_state : ServiceOK); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not sure what's the ?: for, but it definitely doesn’t harm. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I need the last state before the current state that was a hard state. The TODO is there because I didn't find an existing variable or function that gives me this exact value, but I might have missed something. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Upon closer inspection, I think |
||
} | ||
|
||
if (suppressed_types_after != suppressed_types_before) { | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -129,34 +129,34 @@ void Checkable::UnregisterNotification(const Notification::Ptr& notification) | |
m_Notifications.erase(notification); | ||
} | ||
|
||
static void FireSuppressedNotifications(Checkable* checkable) | ||
void Checkable::FireSuppressedNotifications() | ||
{ | ||
if (!checkable->IsActive()) | ||
if (!IsActive()) | ||
return; | ||
|
||
if (checkable->IsPaused()) | ||
if (IsPaused()) | ||
return; | ||
|
||
if (!checkable->GetEnableNotifications()) | ||
if (!GetEnableNotifications()) | ||
return; | ||
|
||
int suppressed_types (checkable->GetSuppressedNotifications()); | ||
int suppressed_types (GetSuppressedNotifications()); | ||
if (!suppressed_types) | ||
return; | ||
|
||
int subtract = 0; | ||
|
||
{ | ||
LazyInit<bool> wasLastParentRecoveryRecent ([&checkable]() { | ||
auto cr (checkable->GetLastCheckResult()); | ||
LazyInit<bool> wasLastParentRecoveryRecent ([this]() { | ||
auto cr (GetLastCheckResult()); | ||
|
||
if (!cr) { | ||
return true; | ||
} | ||
|
||
auto threshold (cr->GetExecutionStart()); | ||
|
||
for (auto& dep : checkable->GetDependencies()) { | ||
for (auto& dep : GetDependencies()) { | ||
auto parent (dep->GetParent()); | ||
ObjectLock oLock (parent); | ||
|
||
|
@@ -168,13 +168,44 @@ static void FireSuppressedNotifications(Checkable* checkable) | |
return false; | ||
}); | ||
|
||
for (auto type : {NotificationProblem, NotificationRecovery, NotificationFlappingStart, NotificationFlappingEnd}) { | ||
if (suppressed_types & (NotificationProblem|NotificationRecovery)) { | ||
CheckResult::Ptr cr = GetLastCheckResult(); | ||
NotificationType type = cr && IsStateOK(cr->GetState()) ? NotificationRecovery : NotificationProblem; | ||
bool state_suppressed = NotificationReasonSuppressed(NotificationProblem) || NotificationReasonSuppressed(NotificationRecovery); | ||
|
||
/* Only process (i.e. send or dismiss) suppressed state notifications if the following conditions are met: | ||
* | ||
* 1. State notifications are not suppressed at the moment. State notifications must only be removed from | ||
* the suppressed notifications bitset after the reason for the suppression is gone as these bits are | ||
* used as a marker for when to set the state_before_suppression attribute. | ||
* 2. The checkable is in a hard state. Soft states represent a state where we are not certain yet about | ||
* the actual state and wait with sending notifications. If we want to immediately send a notification, | ||
* we might send a recovery notification for something that just started failing or a problem | ||
* notification which might be for an intermittent problem that would have never received a | ||
* notification if there was no suppression as it still was in a soft state. Both cases aren't ideal so | ||
* better wait until we are certain. | ||
* 3. The checkable isn't likely checked soon. For example, if a downtime ended, give the checkable a | ||
* chance to recover afterwards before sending a notification. | ||
* 4. No parent recovered recently. Similar to the previous condition, give the checkable a chance to | ||
* recover after one of its dependencies recovered before sending a notification. | ||
* | ||
* If any of these conditions is not met, processing the suppressed notification is further delayed. | ||
*/ | ||
if (!state_suppressed && GetStateType() == StateTypeHard && !IsLikelyToBeCheckedSoon() && !wasLastParentRecoveryRecent.Get()) { | ||
if (NotificationReasonApplies(type)) { | ||
Checkable::OnNotificationsRequested(this, type, cr, "", "", nullptr); | ||
} | ||
subtract |= NotificationRecovery|NotificationProblem; | ||
} | ||
} | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ... and that suppresses them. Fine. |
||
for (auto type : {NotificationFlappingStart, NotificationFlappingEnd}) { | ||
if (suppressed_types & type) { | ||
bool still_applies = checkable->NotificationReasonApplies(type); | ||
bool still_applies = NotificationReasonApplies(type); | ||
|
||
if (still_applies) { | ||
if (!checkable->NotificationReasonSuppressed(type) && !checkable->IsLikelyToBeCheckedSoon() && !wasLastParentRecoveryRecent.Get()) { | ||
Checkable::OnNotificationsRequested(checkable, type, checkable->GetLastCheckResult(), "", "", nullptr); | ||
if (!NotificationReasonSuppressed(type) && !IsLikelyToBeCheckedSoon() && !wasLastParentRecoveryRecent.Get()) { | ||
Checkable::OnNotificationsRequested(this, type, GetLastCheckResult(), "", "", nullptr); | ||
|
||
subtract |= type; | ||
} | ||
|
@@ -186,28 +217,28 @@ static void FireSuppressedNotifications(Checkable* checkable) | |
} | ||
|
||
if (subtract) { | ||
ObjectLock olock (checkable); | ||
ObjectLock olock (this); | ||
|
||
int suppressed_types_before (checkable->GetSuppressedNotifications()); | ||
int suppressed_types_before (GetSuppressedNotifications()); | ||
int suppressed_types_after (suppressed_types_before & ~subtract); | ||
|
||
if (suppressed_types_after != suppressed_types_before) { | ||
checkable->SetSuppressedNotifications(suppressed_types_after); | ||
SetSuppressedNotifications(suppressed_types_after); | ||
} | ||
} | ||
} | ||
|
||
/** | ||
* Re-sends all notifications previously suppressed by e.g. downtimes if the notification reason still applies. | ||
*/ | ||
void Checkable::FireSuppressedNotifications(const Timer * const&) | ||
void Checkable::FireSuppressedNotificationsTimer(const Timer * const&) | ||
{ | ||
for (auto& host : ConfigType::GetObjectsByType<Host>()) { | ||
::FireSuppressedNotifications(host.get()); | ||
host->FireSuppressedNotifications(); | ||
} | ||
|
||
for (auto& service : ConfigType::GetObjectsByType<Service>()) { | ||
::FireSuppressedNotifications(service.get()); | ||
service->FireSuppressedNotifications(); | ||
} | ||
} | ||
|
||
|
@@ -224,12 +255,12 @@ bool Checkable::NotificationReasonApplies(NotificationType type) | |
case NotificationProblem: | ||
{ | ||
auto cr (GetLastCheckResult()); | ||
return cr && !IsStateOK(cr->GetState()) && GetStateType() == StateTypeHard; | ||
return cr && !IsStateOK(cr->GetState()) && cr->GetState() != GetStateBeforeSuppression(); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This keeps soft states... |
||
} | ||
case NotificationRecovery: | ||
{ | ||
auto cr (GetLastCheckResult()); | ||
return cr && IsStateOK(cr->GetState()); | ||
return cr && IsStateOK(cr->GetState()) && cr->GetState() != GetStateBeforeSuppression(); | ||
Comment on lines
+258
to
+263
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This respects that there's not only one problem state, good. |
||
} | ||
case NotificationFlappingStart: | ||
return IsFlapping(); | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -175,6 +175,9 @@ abstract class Checkable : CustomVarObject | |
[state, no_user_view, no_user_modify] int suppressed_notifications { | ||
default {{{ return 0; }}} | ||
}; | ||
[state, enum, no_user_view, no_user_modify] ServiceState state_before_suppression { | ||
default {{{ return ServiceOK; }}} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This choice feels definitely right (implies #7815 a bit). |
||
}; | ||
|
||
[config, navigation] name(Endpoint) command_endpoint (CommandEndpointRaw) { | ||
navigate {{{ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, stared long enough on this line to give up all this for now. We can continue this together. On a whiteboard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've extended the PR description (the numbered list on top is new), maybe this helps to understand. This condition is the check if either of the two flags is set for the very first time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And... if none are set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What exactly do you mean? Set as in added to
suppressed_notifications
in general by this function? Or in some of the variables specifically?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean: to me this looks like intended as XOR, but implemented as NAND – one more trigger case. Or does the latter never happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's easier to read the conditions the other way around:
suppressed_types & stateNotifications
: This means that the check result would have triggered a state notification (i.e. either a problem or a recovery notification) if it wasn't suppressed and therefore is stored for later.!(suppressed_types_before & stateNotifications)
: This part checks that neitherNotificationProblem
norNotificationRecovery
were set before. So this is the very first time a state notification is generated during the current suppression period.With the new logic, it isn't really necessary to differentiate between the two types of state notifications. All the code would also work if only a single "NotificationState" flag was added to
suppressed_notifications
. It's mainly done this way as the bitset andNotificationType
enum already exist in their current form.For this PR, you could basically ignore states where only one of
NotificationProblem
/NotificationRecovery
is set and treat them as if both were set. The code inFireSuppressedNotifications
will figure out which one is appropriate in the end, if either.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, now I get it! (!BEFORE) && AFTER