Context
Maintenance restarts MONs and OSDs.
This is a regular process and not a problem, as long as only a limited number of components are down at the same time.
Currently, the regular maintenance process triggers P1 alerts for MONs and OSDs being down.
This misleads the operator: the alert is not actionable, because the components recover automatically as maintenance proceeds.
Implementation idea
Relax the alerts so they are P3 rather than P1. This still causes noise.
Relax the time MONs and OSDs can be down before an alert fires. This increases the detection delay in a real event.
Find a way to count MON and OSD downs only when more instances are down than the service can tolerate, i.e. when fewer than the minimum number of running instances needed to cover the service remain.
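The third idea could be sketched roughly as below. This is only an illustration, not the actual alerting setup: `ServiceState`, `min_running`, and `alert_severity` are hypothetical names, and the OSD threshold in particular is an assumption, because the real limit depends on pool replication and failure domains. The MON threshold uses the fact that Ceph monitors need a strict majority to keep quorum.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServiceState:
    """Snapshot of one service type (hypothetical structure)."""
    name: str         # e.g. "mon" or "osd"
    total: int        # instances expected to exist
    down: int         # instances currently down
    min_running: int  # assumed: fewest running instances that still cover the service


def mon_min_running(total_mons: int) -> int:
    """Monitors need a strict majority of the cluster to keep quorum."""
    return total_mons // 2 + 1


def alert_severity(state: ServiceState) -> Optional[str]:
    """Return "P1" only when the service is no longer covered.

    Downs that leave at least `min_running` instances up are treated as
    expected (for example a rolling restart) and raise no alert.
    """
    running = state.total - state.down
    if running < state.min_running:
        return "P1"
    return None


# One of three MONs restarting during maintenance: quorum holds, no alert.
mons_maintenance = ServiceState("mon", total=3, down=1, min_running=mon_min_running(3))
# Two of three MONs down: quorum is lost, so this is a real P1.
mons_outage = ServiceState("mon", total=3, down=2, min_running=mon_min_running(3))

print(alert_severity(mons_maintenance))
print(alert_severity(mons_outage))
```

With this kind of gating a single restart stays silent without adding any delay to a real outage, which is the trade-off the first two ideas do not avoid.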