Report non-fatal errors/warning via the database for use in the Icinga Web health checks #608

julianbrost · 2023-06-21T13:10:11Z

Icinga DB Web already provides information for the Icinga DB daemon and Redis on the health page in Icinga Web. However, for the daemon, it's basically whether the daemon is running or not. It would be useful to provide a mechanism in the database to also report non-fatal errors to the user this way.

We already have a way to report errors to the use which is the icingadb check command and we probably don't want to duplicate it. I think the scope of errors/warnings reported by the new mechanism should be everything that might prevent a user from seeing an error reported by the icingadb check command inside Icinga DB Web. Non-exhaustive list of errors I can think of right now that would fit this description:

Object sync failed to process an object: this could mean that the icingadb check does not show up at all.
State sync failed to process an update or has a large backlog: an update for the icingadb check could have been skipped or is long-pending, meaning outdated information might be shown.
Runtime update failed to process an object or has a large backlog: theoretically could fit that description but in practice, it's unlikely that the icingadb check is created via a runtime update.

Note that such a mechanism would be very useful after #607 is implemented, so that important error condition are presented to the user and not just some log that in most cases nobody checks regularly.

Possible implementations

Add a flag to the icingadb_instance table that basically tells Icinga DB Web to instruct the user to check the logs. Probably the simplest possible but also most limited variant.
Add a free text field that can be set to any error message by Icinga DB and if that value is present, Icinga DB Web presents it to the user. This would provide maximum flexibility to Icinga DB in which errors it can show but means that Icinga DB Web doesn't really have a chance to properly translate this message. (However, the log messages the user would have to read in the end won't be translated either.)
Restrict ourselves to a fixed set of error conditions that are represented as some kind of error code in the database (numeric or text) for which Icinga DB Web can provide and translate individual messages. This would probably involve some kind of "unknown error, check logs" fallback that behaves similar to option (1).

During an internal discussion, we considered something like displaying a journalctl command that should filter the relevant log messages. For the flag/error code variants, there could be the need to convey extra information for this, either the whole command string or something like the value for a --since parameter.

Some kind of combination of the above might also be an option. Maybe we don't want a journalctl command but rather use a free text field to store the log messages directly (which could then be displayed on the health detail page). But that would need some extra consideration first, like can we easily catch the relevant log messages and could the contain information we might not want to just display in Web (i.e. the full log messages should probably not be visible to any user).

The text was updated successfully, but these errors were encountered:

julianbrost · 2023-06-21T13:16:00Z

@nilmerg @Al2Klimov I think you recently had a discussion on this topic. Did that yield anything that's missing above?

@nilmerg Maybe you can comment on that health check and potentially sensitive information part. Would that be something that can be handled by existing permissions in an unsurprising way (like is there an existing permission that would capture something like this so that without setting additional restrictions, nothing is revealed accidentally)? Or is this more like a "please don't do this" thing?

Al2Klimov · 2023-06-21T13:21:48Z

I opt for (1). Did you mention the sub-logger (like the already existing ones)? Would omit the need of --since IMAO btw. But wait, we shouldn’t need that if we filter for log level, would we?

julianbrost · 2023-06-21T13:38:07Z

I just wanted to say that just 1 bit of information might be too little to actually provide a helpful message.

There are many more open questions in regard to an implementation. I think one of the most fundamental ones is: Should it be possible to clear this flag without restarting the process? If not, it could be as simple as having a magic logger and the first time something is logged to it, it sets the flag. That would then give you the child logger and the option to filter for it basically for free.

nilmerg · 2023-06-21T13:57:07Z

The current health checks are not protected by anything. If you want to mention any object names or such information in error messages though, we should think about that again.

julianbrost added the enhancement New feature or request label Jun 21, 2023

julianbrost mentioned this issue Jul 11, 2023

Graceful handling of data from Redis that fails to be parsed #607

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Report non-fatal errors/warning via the database for use in the Icinga Web health checks #608

Report non-fatal errors/warning via the database for use in the Icinga Web health checks #608

julianbrost commented Jun 21, 2023

julianbrost commented Jun 21, 2023

Al2Klimov commented Jun 21, 2023

julianbrost commented Jun 21, 2023

nilmerg commented Jun 21, 2023

Report non-fatal errors/warning via the database for use in the Icinga Web health checks #608

Report non-fatal errors/warning via the database for use in the Icinga Web health checks #608

Comments

julianbrost commented Jun 21, 2023

Possible implementations

julianbrost commented Jun 21, 2023

Al2Klimov commented Jun 21, 2023

julianbrost commented Jun 21, 2023

nilmerg commented Jun 21, 2023