Insistent notifications for urgent actionable alerts #158
Comments
I've used PagerDuty, and didn't hate it. Their mobile app has good push notifications, and it also works well over other communication methods. I've also heard good things about VictorOps, but have no first-hand experience.
This was discussed once again in Rome, and we decided that alerting to Slack during "business hours" is enough. Availability incidents should rather have their root cause fixed, and going down for ~12 hours is something we can afford. The only system that is critical enough to wake people up is the system doing data ingestion. The plan is to improve its availability in two ways:
Test helpers are stateless, so their availability may be trivially improved. I'm closing the issue, as we plan no further action regarding insistent notifications.
Stuff happens: #128, #157. Non-actionable stuff happens as well: #155. @hellais and I think that insistent notifications sound like something valuable.
Basically, there are two options for insistent notifications: a separate app that annoys you until you hit [ACK], or a VoIP call from a robot (which annoys you until you pick up the phone).
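Either option boils down to the same loop: keep notifying until someone acknowledges. A minimal sketch of that loop, assuming hypothetical `notify` and `is_acked` callables (an app push, a robot call, an ACK lookup) that are not part of any tool named here:

```python
import time
from typing import Callable


def notify_until_acked(
    notify: Callable[[int], None],    # hypothetical: sends push / places call
    is_acked: Callable[[], bool],     # hypothetical: has someone hit [ACK]?
    max_attempts: int = 5,
    interval_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Repeat a notification until it is acknowledged or attempts run out.

    Returns True if the alert was acknowledged, False if we gave up.
    """
    for attempt in range(1, max_attempts + 1):
        if is_acked():
            return True
        notify(attempt)      # attempt number lets the caller escalate if wanted
        sleep(interval_s)    # wait before nagging again
    return is_acked()        # last check after the final attempt
```

The attempt limit and interval are arbitrary defaults; `sleep` is injectable only so the loop can be exercised without waiting.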
It seems our options are PagerDuty, OpsGenie, VictorOps and AlertOps if we go SaaS (three of them mention discounts for non-profits). Cabot with OpenDuty may be a good enough self-hosted solution; it has no apps and uses the phone for insistent notifications. We can also go NIH and glue a Prometheus webhook directly to a SIP dialer, dropping the escalation requirement :)
WMF also has a nice page: https://wikitech.wikimedia.org/wiki/Monitoring_package_survey#OpsView
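For the NIH route, the glue would mostly be picking out which alerts in a webhook payload deserve a call. A rough sketch, assuming the payload follows the Alertmanager webhook JSON shape (a top-level `alerts` list whose items carry `status` and `labels`) and assuming a `severity="page"` label convention that is purely illustrative; the function name is hypothetical and the actual SIP dialing is left out:

```python
from typing import Any, Dict, List


def alerts_to_page(
    payload: Dict[str, Any],
    label: str = "severity",   # assumed label name, not a project convention
    value: str = "page",       # assumed label value marking "wake someone up"
) -> List[str]:
    """Return the names of firing alerts that carry the paging label."""
    names = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        # Resolved alerts never trigger a call; only firing ones do.
        if alert.get("status") == "firing" and labels.get(label) == value:
            names.append(labels.get("alertname", "unknown"))
    return names


# Made-up payload for illustration only.
example = {
    "status": "firing",
    "alerts": [
        {"status": "firing", "labels": {"alertname": "IngestDown", "severity": "page"}},
        {"status": "firing", "labels": {"alertname": "DiskWarm", "severity": "warn"}},
        {"status": "resolved", "labels": {"alertname": "OldOutage", "severity": "page"}},
    ],
}
to_call = alerts_to_page(example)
```

Each returned name could then be handed to whatever places the call, with a repeat-until-answered loop standing in for escalation.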