Checkable: send state notifications after suppression if and only if the state differs compared to before the suppression started #9285

Merged 3 commits on Mar 29, 2022


julianbrost (Contributor) commented Mar 9, 2022

Backport of #9207

I also ran my test script from there on this version, and it looks good:

cluster-master-1-1  | [2022-03-09 17:15:59 +0100] information/Notification: Sending 'Problem' notification 'github-9207!problm-1!dummy-service-notification' for user 'dummy'
cluster-master-1-1  | [2022-03-09 17:15:59 +0100] information/Notification: Sending 'Recovery' notification 'github-9207!rec-1!dummy-service-notification' for user 'dummy'
cluster-master-2-1  | [2022-03-09 17:19:09 +0100] information/Notification: Sending 'Recovery' notification 'github-9207!rec-2!dummy-service-notification' for user 'dummy'
cluster-master-2-1  | [2022-03-09 17:19:09 +0100] information/Notification: Sending 'Problem' notification 'github-9207!problm-2!dummy-service-notification' for user 'dummy'
Full output
[2022-03-09T17:12:20+01:00] Ensure clean initial state
Network cluster_default  Creating
Network cluster_default  Created
Container cluster-master-1-1  Creating
Container cluster-master-2-1  Creating
Container cluster-master-2-1  Created
Container cluster-master-1-1  Created
Container cluster-master-2-1  Starting
Container cluster-master-1-1  Starting
Container cluster-master-2-1  Started
Container cluster-master-1-1  Started
{"results":[{"code":200,"status":"Successfully removed all downtimes for object 'github-9207!none-1' and 0 child downtimes."}]}
{"results":[{"code":200,"status":"Successfully removed all downtimes for object 'github-9207!none-2' and 0 child downtimes."}]}
{"results":[{"code":200,"status":"Successfully removed all downtimes for object 'github-9207!rec-1' and 0 child downtimes."}]}
{"results":[{"code":200,"status":"Successfully removed all downtimes for object 'github-9207!rec-2' and 0 child downtimes."}]}
{"results":[{"code":200,"status":"Successfully removed all downtimes for object 'github-9207!problm-1' and 0 child downtimes."}]}
{"results":[{"code":200,"status":"Successfully removed all downtimes for object 'github-9207!problm-2' and 0 child downtimes."}]}
{"results":[{"code":200,"status":"Successfully processed check result for object 'github-9207!none-1'."}]}
{"results":[{"code":200,"status":"Successfully processed check result for object 'github-9207!none-2'."}]}
{"results":[{"code":200,"status":"Successfully processed check result for object 'github-9207!rec-1'."}]}
{"results":[{"code":200,"status":"Successfully processed check result for object 'github-9207!rec-2'."}]}
{"results":[{"code":200,"status":"Successfully processed check result for object 'github-9207!problm-1'."}]}
{"results":[{"code":200,"status":"Successfully processed check result for object 'github-9207!problm-2'."}]}
Container cluster-master-2-1  Stopping
Container cluster-master-2-1  Stopping
Container cluster-master-1-1  Stopping
Container cluster-master-1-1  Stopping
Container cluster-master-2-1  Stopped
Container cluster-master-2-1  Removing
Container cluster-master-2-1  Removed
Container cluster-master-1-1  Stopped
Container cluster-master-1-1  Removing
Container cluster-master-1-1  Removed
Network cluster_default  Removing
Network cluster_default  Removed
[2022-03-09T17:14:31+01:00] Starting master-1
Network cluster_default  Creating
Network cluster_default  Created
Container cluster-master-1-1  Creating
Container cluster-master-1-1  Created
Container cluster-master-1-1  Starting
Container cluster-master-1-1  Started
[2022-03-09T17:15:32+01:00] master-1 version
v2.13.2-73-gccb18a04e
[2022-03-09T17:15:32+01:00] Ensure all services are CRITICAL
{"results":[{"code":200,"status":"Successfully processed check result for object 'github-9207!none-1'."}]}
{"results":[{"code":200,"status":"Successfully processed check result for object 'github-9207!none-2'."}]}
{"results":[{"code":200,"status":"Successfully processed check result for object 'github-9207!rec-1'."}]}
{"results":[{"code":200,"status":"Successfully processed check result for object 'github-9207!rec-2'."}]}
{"results":[{"code":200,"status":"Successfully processed check result for object 'github-9207!problm-1'."}]}
{"results":[{"code":200,"status":"Successfully processed check result for object 'github-9207!problm-2'."}]}
[2022-03-09T17:15:38+01:00] Schedule downtimes for all services
{"results":[{"code":200,"legacy_id":13,"name":"github-9207!none-1!ea4097d4-6d0d-43da-bc27-0197bb315f21","status":"Successfully scheduled downtime 'github-9207!none-1!ea4097d4-6d0d-43da-bc27-0197bb315f21' for object 'github-9207!none-1'."}]}
{"results":[{"code":200,"legacy_id":14,"name":"github-9207!none-2!ad18bade-2551-4895-a5c7-eb97c0f09b08","status":"Successfully scheduled downtime 'github-9207!none-2!ad18bade-2551-4895-a5c7-eb97c0f09b08' for object 'github-9207!none-2'."}]}
{"results":[{"code":200,"legacy_id":15,"name":"github-9207!rec-1!6e85f88c-abdf-433f-88e8-67855e92f8dd","status":"Successfully scheduled downtime 'github-9207!rec-1!6e85f88c-abdf-433f-88e8-67855e92f8dd' for object 'github-9207!rec-1'."}]}
{"results":[{"code":200,"legacy_id":16,"name":"github-9207!rec-2!a40baac8-eeab-48c6-8c29-1ae7a7f715ad","status":"Successfully scheduled downtime 'github-9207!rec-2!a40baac8-eeab-48c6-8c29-1ae7a7f715ad' for object 'github-9207!rec-2'."}]}
{"results":[{"code":200,"legacy_id":17,"name":"github-9207!problm-1!1976c770-e9f2-44c7-ab46-c070c4a848aa","status":"Successfully scheduled downtime 'github-9207!problm-1!1976c770-e9f2-44c7-ab46-c070c4a848aa' for object 'github-9207!problm-1'."}]}
{"results":[{"code":200,"legacy_id":18,"name":"github-9207!problm-2!a8c2a018-dbd6-4fac-9c89-398066df933f","status":"Successfully scheduled downtime 'github-9207!problm-2!a8c2a018-dbd6-4fac-9c89-398066df933f' for object 'github-9207!problm-2'."}]}
[2022-03-09T17:15:44+01:00] Make all services WARNING
{"results":[{"code":200,"status":"Successfully processed check result for object 'github-9207!none-1'."}]}
{"results":[{"code":200,"status":"Successfully processed check result for object 'github-9207!none-2'."}]}
{"results":[{"code":200,"status":"Successfully processed check result for object 'github-9207!rec-1'."}]}
{"results":[{"code":200,"status":"Successfully processed check result for object 'github-9207!rec-2'."}]}
{"results":[{"code":200,"status":"Successfully processed check result for object 'github-9207!problm-1'."}]}
{"results":[{"code":200,"status":"Successfully processed check result for object 'github-9207!problm-2'."}]}
[2022-03-09T17:15:50+01:00] Bring services *-1 into final states
{"results":[{"code":200,"status":"Successfully processed check result for object 'github-9207!none-1'."}]}
{"results":[{"code":200,"status":"Successfully processed check result for object 'github-9207!rec-1'."}]}
{"results":[{"code":200,"status":"Successfully processed check result for object 'github-9207!problm-1'."}]}
[2022-03-09T17:15:56+01:00] Cancel downtimes for services *-1
{"results":[{"code":200,"status":"Successfully removed all downtimes for object 'github-9207!none-1' and 0 child downtimes."}]}
{"results":[{"code":200,"status":"Successfully removed all downtimes for object 'github-9207!rec-1' and 0 child downtimes."}]}
{"results":[{"code":200,"status":"Successfully removed all downtimes for object 'github-9207!problm-1' and 0 child downtimes."}]}
[2022-03-09T17:16:56+01:00] Notification logs from master-1
cluster-master-1-1  | [2022-03-09 17:15:38 +0100] information/Notification: Sending 'DowntimeStart' notification 'github-9207!none-1!dummy-service-notification' for user 'dummy'
cluster-master-1-1  | [2022-03-09 17:15:38 +0100] information/Notification: Completed sending 'DowntimeStart' notification 'github-9207!none-1!dummy-service-notification' for checkable 'github-9207!none-1' and user 'dummy' using command 'dummy'.
cluster-master-1-1  | [2022-03-09 17:15:39 +0100] information/Notification: Sending 'DowntimeStart' notification 'github-9207!none-2!dummy-service-notification' for user 'dummy'
cluster-master-1-1  | [2022-03-09 17:15:39 +0100] information/Notification: Completed sending 'DowntimeStart' notification 'github-9207!none-2!dummy-service-notification' for checkable 'github-9207!none-2' and user 'dummy' using command 'dummy'.
cluster-master-1-1  | [2022-03-09 17:15:39 +0100] information/Notification: Sending 'DowntimeStart' notification 'github-9207!rec-1!dummy-service-notification' for user 'dummy'
cluster-master-1-1  | [2022-03-09 17:15:39 +0100] information/Notification: Completed sending 'DowntimeStart' notification 'github-9207!rec-1!dummy-service-notification' for checkable 'github-9207!rec-1' and user 'dummy' using command 'dummy'.
cluster-master-1-1  | [2022-03-09 17:15:39 +0100] information/Notification: Sending 'DowntimeStart' notification 'github-9207!rec-2!dummy-service-notification' for user 'dummy'
cluster-master-1-1  | [2022-03-09 17:15:39 +0100] information/Notification: Completed sending 'DowntimeStart' notification 'github-9207!rec-2!dummy-service-notification' for checkable 'github-9207!rec-2' and user 'dummy' using command 'dummy'.
cluster-master-1-1  | [2022-03-09 17:15:39 +0100] information/Notification: Sending 'DowntimeStart' notification 'github-9207!problm-1!dummy-service-notification' for user 'dummy'
cluster-master-1-1  | [2022-03-09 17:15:39 +0100] information/Notification: Completed sending 'DowntimeStart' notification 'github-9207!problm-1!dummy-service-notification' for checkable 'github-9207!problm-1' and user 'dummy' using command 'dummy'.
cluster-master-1-1  | [2022-03-09 17:15:39 +0100] information/Notification: Sending 'DowntimeStart' notification 'github-9207!problm-2!dummy-service-notification' for user 'dummy'
cluster-master-1-1  | [2022-03-09 17:15:39 +0100] information/Notification: Completed sending 'DowntimeStart' notification 'github-9207!problm-2!dummy-service-notification' for checkable 'github-9207!problm-2' and user 'dummy' using command 'dummy'.
cluster-master-1-1  | [2022-03-09 17:15:56 +0100] information/Notification: Sending 'DowntimeEnd' notification 'github-9207!none-1!dummy-service-notification' for user 'dummy'
cluster-master-1-1  | [2022-03-09 17:15:56 +0100] information/Notification: Completed sending 'DowntimeEnd' notification 'github-9207!none-1!dummy-service-notification' for checkable 'github-9207!none-1' and user 'dummy' using command 'dummy'.
cluster-master-1-1  | [2022-03-09 17:15:56 +0100] information/Notification: Sending 'DowntimeEnd' notification 'github-9207!rec-1!dummy-service-notification' for user 'dummy'
cluster-master-1-1  | [2022-03-09 17:15:56 +0100] information/Notification: Completed sending 'DowntimeEnd' notification 'github-9207!rec-1!dummy-service-notification' for checkable 'github-9207!rec-1' and user 'dummy' using command 'dummy'.
cluster-master-1-1  | [2022-03-09 17:15:56 +0100] information/Notification: Sending 'DowntimeEnd' notification 'github-9207!problm-1!dummy-service-notification' for user 'dummy'
cluster-master-1-1  | [2022-03-09 17:15:56 +0100] information/Notification: Completed sending 'DowntimeEnd' notification 'github-9207!problm-1!dummy-service-notification' for checkable 'github-9207!problm-1' and user 'dummy' using command 'dummy'.
cluster-master-1-1  | [2022-03-09 17:15:59 +0100] information/Notification: Sending 'Problem' notification 'github-9207!problm-1!dummy-service-notification' for user 'dummy'
cluster-master-1-1  | [2022-03-09 17:15:59 +0100] information/Notification: Sending 'Recovery' notification 'github-9207!rec-1!dummy-service-notification' for user 'dummy'
cluster-master-1-1  | [2022-03-09 17:15:59 +0100] information/Notification: Completed sending 'Problem' notification 'github-9207!problm-1!dummy-service-notification' for checkable 'github-9207!problm-1' and user 'dummy' using command 'dummy'.
cluster-master-1-1  | [2022-03-09 17:15:59 +0100] information/Notification: Completed sending 'Recovery' notification 'github-9207!rec-1!dummy-service-notification' for checkable 'github-9207!rec-1' and user 'dummy' using command 'dummy'.
[2022-03-09T17:16:56+01:00] Notification logs from master-1 (filtered)
cluster-master-1-1  | [2022-03-09 17:15:59 +0100] information/Notification: Sending 'Problem' notification 'github-9207!problm-1!dummy-service-notification' for user 'dummy'
cluster-master-1-1  | [2022-03-09 17:15:59 +0100] information/Notification: Sending 'Recovery' notification 'github-9207!rec-1!dummy-service-notification' for user 'dummy'
[2022-03-09T17:16:56+01:00] Starting master-2
Container cluster-master-2-1  Creating
Container cluster-master-2-1  Created
Container cluster-master-2-1  Starting
Container cluster-master-2-1  Started
[2022-03-09T17:17:57+01:00] master-2 version
v2.13.2-73-gccb18a04e
[2022-03-09T17:17:57+01:00] Stopping master-1
Container cluster-master-1-1  Stopping
Container cluster-master-1-1  Stopped
[2022-03-09T17:19:00+01:00] Bring services *-2 into final states
{"results":[{"code":200,"status":"Successfully processed check result for object 'github-9207!none-2'."}]}
{"results":[{"code":200,"status":"Successfully processed check result for object 'github-9207!rec-2'."}]}
{"results":[{"code":200,"status":"Successfully processed check result for object 'github-9207!problm-2'."}]}
[2022-03-09T17:19:05+01:00] Cancel downtimes for services *-2
{"results":[{"code":200,"status":"Successfully removed all downtimes for object 'github-9207!none-2' and 0 child downtimes."}]}
{"results":[{"code":200,"status":"Successfully removed all downtimes for object 'github-9207!rec-2' and 0 child downtimes."}]}
{"results":[{"code":200,"status":"Successfully removed all downtimes for object 'github-9207!problm-2' and 0 child downtimes."}]}
[2022-03-09T17:20:06+01:00] Notification logs from master-2
cluster-master-2-1  | [2022-03-09 17:17:02 +0100] information/Notification: Sending 'DowntimeStart' notification 'github-9207!none-2!dummy-service-notification' for user 'dummy'
cluster-master-2-1  | [2022-03-09 17:17:02 +0100] information/Notification: Completed sending 'DowntimeStart' notification 'github-9207!none-2!dummy-service-notification' for checkable 'github-9207!none-2' and user 'dummy' using command 'dummy'.
cluster-master-2-1  | [2022-03-09 17:17:02 +0100] information/Notification: Sending 'DowntimeStart' notification 'github-9207!rec-2!dummy-service-notification' for user 'dummy'
cluster-master-2-1  | [2022-03-09 17:17:02 +0100] information/Notification: Completed sending 'DowntimeStart' notification 'github-9207!rec-2!dummy-service-notification' for checkable 'github-9207!rec-2' and user 'dummy' using command 'dummy'.
cluster-master-2-1  | [2022-03-09 17:17:02 +0100] information/Notification: Sending 'DowntimeStart' notification 'github-9207!problm-2!dummy-service-notification' for user 'dummy'
cluster-master-2-1  | [2022-03-09 17:17:02 +0100] information/Notification: Completed sending 'DowntimeStart' notification 'github-9207!problm-2!dummy-service-notification' for checkable 'github-9207!problm-2' and user 'dummy' using command 'dummy'.
cluster-master-2-1  | [2022-03-09 17:19:06 +0100] information/Notification: Sending 'DowntimeEnd' notification 'github-9207!none-2!dummy-service-notification' for user 'dummy'
cluster-master-2-1  | [2022-03-09 17:19:06 +0100] information/Notification: Completed sending 'DowntimeEnd' notification 'github-9207!none-2!dummy-service-notification' for checkable 'github-9207!none-2' and user 'dummy' using command 'dummy'.
cluster-master-2-1  | [2022-03-09 17:19:06 +0100] information/Notification: Sending 'DowntimeEnd' notification 'github-9207!rec-2!dummy-service-notification' for user 'dummy'
cluster-master-2-1  | [2022-03-09 17:19:06 +0100] information/Notification: Completed sending 'DowntimeEnd' notification 'github-9207!rec-2!dummy-service-notification' for checkable 'github-9207!rec-2' and user 'dummy' using command 'dummy'.
cluster-master-2-1  | [2022-03-09 17:19:06 +0100] information/Notification: Sending 'DowntimeEnd' notification 'github-9207!problm-2!dummy-service-notification' for user 'dummy'
cluster-master-2-1  | [2022-03-09 17:19:06 +0100] information/Notification: Completed sending 'DowntimeEnd' notification 'github-9207!problm-2!dummy-service-notification' for checkable 'github-9207!problm-2' and user 'dummy' using command 'dummy'.
cluster-master-2-1  | [2022-03-09 17:19:09 +0100] information/Notification: Sending 'Recovery' notification 'github-9207!rec-2!dummy-service-notification' for user 'dummy'
cluster-master-2-1  | [2022-03-09 17:19:09 +0100] information/Notification: Sending 'Problem' notification 'github-9207!problm-2!dummy-service-notification' for user 'dummy'
cluster-master-2-1  | [2022-03-09 17:19:09 +0100] information/Notification: Completed sending 'Recovery' notification 'github-9207!rec-2!dummy-service-notification' for checkable 'github-9207!rec-2' and user 'dummy' using command 'dummy'.
cluster-master-2-1  | [2022-03-09 17:19:09 +0100] information/Notification: Completed sending 'Problem' notification 'github-9207!problm-2!dummy-service-notification' for checkable 'github-9207!problm-2' and user 'dummy' using command 'dummy'.
[2022-03-09T17:20:06+01:00] Notification logs from master-2 (filtered)
cluster-master-2-1  | [2022-03-09 17:19:09 +0100] information/Notification: Sending 'Recovery' notification 'github-9207!rec-2!dummy-service-notification' for user 'dummy'
cluster-master-2-1  | [2022-03-09 17:19:09 +0100] information/Notification: Sending 'Problem' notification 'github-9207!problm-2!dummy-service-notification' for user 'dummy'
Container cluster-master-2-1  Stopping
Container cluster-master-2-1  Stopping
Container cluster-master-1-1  Stopping
Container cluster-master-1-1  Stopping
Container cluster-master-1-1  Stopped
Container cluster-master-1-1  Removing
Container cluster-master-1-1  Removed
Container cluster-master-2-1  Stopped
Container cluster-master-2-1  Removing
Container cluster-master-2-1  Removed
Network cluster_default  Removing
Network cluster_default  Removed

So if the unit tests are happy, I'm also happy.
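
The "(filtered)" sections in the output above keep only the "Sending" lines for Problem and Recovery notifications. The test script itself is not shown here, so the following is only a minimal sketch of such a filter, assuming the log line format seen above; the names `STATE_NOTIFICATION` and `filter_state_notifications` are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: state notifications appear as
# "... information/Notification: Sending '<Type>' notification '<name>' ..."
# exactly as in the log excerpt above.
STATE_NOTIFICATION = re.compile(
    r"Notification: Sending '(Problem|Recovery)' notification '([^']+)'"
)


def filter_state_notifications(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (type, notification name) for each Problem/Recovery 'Sending' line."""
    found = []
    for line in lines:
        match = STATE_NOTIFICATION.search(line)
        if match:
            found.append((match.group(1), match.group(2)))
    return found
```

Lines for 'DowntimeStart', 'DowntimeEnd', and "Completed sending" do not match the pattern, so only the state notifications that matter for this change remain.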

This commit changes the Checkable notification suppression logic (notifications
are currently suppressed on the Checkable if it is unreachable, in a downtime,
or acknowledged) so that after the suppression reason ends, a state
notification is sent if and only if the first hard state afterwards differs
from the last hard state before the suppression. If the checkable is in a soft
state when the suppression ends, the notification is suppressed further until a
hard state is reached.

To achieve this behavior, a new attribute state_before_suppression is added to
Checkable. This attribute is set to the last hard state the first time either a
PROBLEM or a RECOVERY notification is suppressed. Unlike before, neither of
these two flags in the suppressed_notification is cleared while the suppression
is still ongoing. They are cleared only after the suppression has ended and the
current state has been compared with the old state stored in
state_before_suppression.

This ensures that in case of a failover in an HA zone, the other node can take
over properly and has the state required to send the proper notifications.
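
The rule described in the commit message can be illustrated with a small model. This is a simplified sketch, not the Icinga 2 implementation: the class layout, the single `pending` flag (standing in for the PROBLEM/RECOVERY flags), the `simulate` helper, and the chosen final states are all assumptions made for illustration. Only the attribute name `state_before_suppression` comes from the commit message.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class State(Enum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass
class Checkable:
    last_hard_state: State = State.OK
    in_hard_state: bool = True
    suppressed: bool = False   # downtime, acknowledgement, or unreachable
    pending: bool = False      # hypothetical stand-in for the PROBLEM/RECOVERY flags
    state_before_suppression: Optional[State] = None
    sent: List[str] = field(default_factory=list)

    def start_suppression(self) -> None:
        self.suppressed = True

    def end_suppression(self) -> None:
        self.suppressed = False
        # In a soft state, keep suppressing until a hard state is reached.
        if self.pending and self.in_hard_state:
            self._compare_and_notify()

    def process_check_result(self, state: State, hard: bool = True) -> None:
        self.in_hard_state = hard
        if not hard:
            return
        old = self.last_hard_state
        self.last_hard_state = state
        if self.suppressed:
            # Remember the last hard state only the first time a
            # state notification is suppressed; never clear it here.
            if state != old and not self.pending:
                self.state_before_suppression = old
                self.pending = True
        elif self.pending:
            # First hard state after the suppression ended.
            self._compare_and_notify()
        elif state != old:
            self._notify(state)

    def _compare_and_notify(self) -> None:
        if self.last_hard_state != self.state_before_suppression:
            self._notify(self.last_hard_state)
        self.pending = False
        self.state_before_suppression = None

    def _notify(self, state: State) -> None:
        self.sent.append("Recovery" if state is State.OK else "Problem")


def simulate(final: State) -> List[str]:
    """CRITICAL, then downtime, then WARNING, then `final`, then downtime ends."""
    checkable = Checkable(last_hard_state=State.CRITICAL)
    checkable.start_suppression()
    checkable.process_check_result(State.WARNING)
    checkable.process_check_result(final)
    checkable.end_suppression()
    return checkable.sent
```

In this model, `simulate(State.CRITICAL)` returns an empty list because the state matches the one before the suppression, `simulate(State.OK)` returns `["Recovery"]`, and `simulate(State.WARNING)` returns `["Problem"]`. Because the comparison depends only on stored attributes, another node that receives the same attribute values could make the same decision after a failover.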
@cla-bot cla-bot bot added the cla/signed label Mar 9, 2022
@icinga-probot icinga-probot bot added this to the 2.13.3 milestone Mar 9, 2022
@icinga-probot icinga-probot bot added the bug, ref/IP, and ref/NC labels Mar 9, 2022
@julianbrost julianbrost marked this pull request as ready for review March 9, 2022 16:23
@julianbrost julianbrost requested a review from Al2Klimov March 29, 2022 11:58
@julianbrost julianbrost merged commit f67a553 into support/2.13 Mar 29, 2022
@icinga-probot icinga-probot bot deleted the bugfix/suppressed-state-notifications-2.13 branch March 29, 2022 13:16