Downtimes on a removed object are never closed. #10303

Open
w1ll-i-code opened this issue Jan 15, 2025 · 5 comments · May be fixed by #10311

Comments

@w1ll-i-code

Describe the bug

If an object with a Downtime gets disabled (even just temporarily), the end of the associated Downtime is never written out to the IDO / Icinga DB.

To Reproduce

  1. Create a host in the Director and deploy it.
  2. Create a downtime on the host (e.g. via the API, see the sketch after this list).
  3. Use the Director to roll back to an older version.
  4. Redeploy the new version.
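
For step 2, one way to create the downtime is the schedule-downtime action of the Icinga 2 REST API. The following is a minimal sketch using libcurl; the host name, API credentials and time range are placeholders, and TLS verification is disabled only to keep the example short.

```cpp
// Sketch: schedule a fixed two-hour downtime for a host via the Icinga 2 REST
// API (POST /v1/actions/schedule-downtime). Host name, credentials and the
// time range are placeholders. Build with: g++ schedule_downtime.cpp -lcurl
#include <ctime>
#include <string>
#include <curl/curl.h>

int main()
{
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL* curl = curl_easy_init();
    if (!curl)
        return 1;

    const std::time_t now = std::time(nullptr);
    const std::string body =
        "{\"type\": \"Host\","
        " \"filter\": \"host.name==\\\"example-host\\\"\","
        " \"author\": \"icingaadmin\","
        " \"comment\": \"maintenance\","
        " \"fixed\": true,"
        " \"start_time\": " + std::to_string(now) + ","
        " \"end_time\": " + std::to_string(now + 7200) + "}";

    curl_slist* headers = nullptr;
    headers = curl_slist_append(headers, "Accept: application/json");

    curl_easy_setopt(curl, CURLOPT_URL,
        "https://localhost:5665/v1/actions/schedule-downtime");
    curl_easy_setopt(curl, CURLOPT_USERPWD, "root:icinga"); // API user:password
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body.c_str());
    curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 0L); // self-signed API cert
    curl_easy_setopt(curl, CURLOPT_SSL_VERIFYHOST, 0L);

    const CURLcode res = curl_easy_perform(curl);

    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return res == CURLE_OK ? 0 : 1;
}
```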

Expected behavior

I would expect the Downtime to be terminated once the object is deactivated (the actual_end_time set to the current time). But since the downtime is dropped without this field ever being set, the object looks in the reports as if it were in a constant downtime, which does not correspond to the internal state of Icinga 2.


@w1ll-i-code
Author

Here is my proposed solution: whenever an object gets removed, all of its currently active downtimes get closed as well.

@w1ll-i-code
Author

I am willing to implement the change myself, but I'd like to coordinate with you first to make sure my proposed solution is the right approach. Since the downtimes are dropped from the icinga2.state file afterwards, this seems like the most reasonable solution to me. I'd prefer it if the downtimes persisted through deploys, but that would be a more invasive change that I don't feel comfortable implementing myself.

@yhabteab
Member

I would expect the Downtime to be terminated once the object is deactivated (the actual_end_time set to the current time).

There is no such thing as deactivating a downtime when a new version of the configuration is deployed via Icinga Director. When the host the downtimes belong to does not exist in the newly deployed configuration, the downtimes become dangling objects that Icinga 2 cannot map to their respective host/service, and they will not even survive the config validation. However, since they are created with the ignore_on_error flag, they will not stop Icinga 2 from loading the rest of the configuration, and once Icinga 2 is done loading/validating it, it will simply erase them from disk.

Here is my proposed solution: whenever an object gets removed, all of its currently active downtimes get closed as well.

If you don't mind wasting time on something that can't be fixed, then go ahead, but bear in mind that this is simply impossible to fix right now. Once the corresponding downtime host/service object is gone, the downtime object itself becomes pretty much useless and is not even a valid object anymore. If you don't want such strange history views, I suggest manually clearing the downtimes before removing the host/service object via Icinga Director.

@w1ll-i-code
Author

If you don't mind wasting time on something that can't be fixed, then go ahead, but bear in mind that this is simply impossible to fix right now.

I already wasted that time and implemented my solution. It seems to work for MariaDB/MySQL, but I still need to test it with PostgreSQL and Icinga DB. I'll probably have to do a second pass to make it completely correct.

it will simply erase them from disk.

I am well aware of that; that's the problem we are currently facing. It happens often, but randomly enough that cleaning it up manually for all objects that may be affected is not feasible. Mostly we notice it once the SLA uptime report is generated and a host is completely out of bounds because the downtime was not handled correctly. If we trigger an OnDowntimeRemoved before the downtime gets erased from disk, that solution already works for us.

@w1ll-i-code
Author

The logic I am thinking of is this:

  1. The configuration for the object gets removed; it is no longer active.
  2. The object still exists in the icinga2.state file together with the downtime.
  3. The config gets loaded and the object gets set to inactive.
  4. The inactive object gets synced to the IDO.
    1. Here I propose to also trigger the OnDowntimeRemoved hook for each downtime associated with the host (see the sketch below).
  5. The host and downtime are now inactive and will no longer get synced to the icinga2.state file. (Or maybe just the host, I'm not sure, but the effect is the same.)

Let me know if there are any holes in my understanding here, but from what I can observe right now, this is what's happening.
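
Concretely, the change I have in mind for step 4.1 would look roughly like the sketch below. Checkable::GetDowntimes() and the Downtime::OnDowntimeRemoved signal exist in the icinga2 source tree, but the helper name and its exact call site during deactivation are my own assumptions, not necessarily what the linked PR ends up doing.

```cpp
// Rough sketch of step 4.1: when a checkable is deactivated because it
// disappeared from the deployed configuration, emit OnDowntimeRemoved for all
// of its downtimes so the IDO / Icinga DB backends can record an end for them
// before the downtime objects are erased from disk.
#include "icinga/checkable.hpp"
#include "icinga/downtime.hpp"

using namespace icinga;

// Hypothetical helper; name and call site are assumptions.
static void CloseDowntimesOfDeactivatedCheckable(const Checkable::Ptr& checkable)
{
	for (const Downtime::Ptr& downtime : checkable->GetDowntimes()) {
		// The DB backends listen on this signal and write out the
		// downtime end when it fires.
		Downtime::OnDowntimeRemoved(downtime);
	}
}
```

Whether this belongs in the checkable's deactivation path itself or somewhere closer to the DB backends is exactly the kind of thing I'd like to coordinate on.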

@w1ll-i-code w1ll-i-code linked a pull request Jan 21, 2025 that will close this issue