Grafana OnCall engine fork — self-hosted on-call scheduler and alert router
# What this PR does

Before:

<img width="281" alt="Screenshot 2023-03-23 at 16 56 42" src="https://user-images.githubusercontent.com/20116910/227279464-c883ec05-a964-4360-bda2-3443409ca90a.png">

After:

<img width="338" alt="Screenshot 2023-03-23 at 16 57 41" src="https://user-images.githubusercontent.com/20116910/227279476-468bffba-922a-45ea-b400-5f34d6bf0534.png">

- Add scores for overloaded users, e.g. `(+25% avg)`, which means the user is scheduled to be on-call 25% more than the average for the given schedule.
- Add a score for gaps, e.g. `Schedule has gaps (29% not covered)`, which means that no one is scheduled to be on-call for 29% of the time.
- To make things easier to understand when there are gaps in the schedule, add `(see overloaded users)` text.
- Consider events for the next 52 weeks (~1 year) instead of 90 days (~3 months), so the quality report is more accurate. Also treat any balance quality >95% as perfectly balanced. These two changes (the period change and the 95% threshold) should help eliminate false positives for _most_ schedules.
- Modify backend & frontend so the backend returns all the user information needed for rendering, without using the user store.
- Move quality report generation to the `OnCallSchedule` model, add more tests.

## Which issue(s) this PR fixes

Related to https://github.com/grafana/oncall/issues/1552

## Checklist

- [x] Tests updated
- [x] `CHANGELOG.md` updated (public docs will be added in a separate PR)
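The scores described in the PR notes above can be illustrated with a short sketch. This is not the code from the `OnCallSchedule` model: the function names, the input shapes, and the exact rounding are assumptions made for illustration. Only the 52-week window, the 95% threshold, and the two label formats come from the PR description.

```python
from datetime import timedelta

PERIOD = timedelta(weeks=52)   # reporting window named in the PR notes
BALANCE_THRESHOLD = 0.95       # qualities above this count as perfectly balanced


def overload_scores(on_call: dict[str, timedelta]) -> dict[str, int]:
    """Percent above the schedule average, for users above the average only.

    `on_call` maps a user name to total scheduled on-call time (hypothetical shape).
    """
    if not on_call:
        return {}
    average = sum(on_call.values(), timedelta()) / len(on_call)
    if not average:
        return {}
    return {
        user: round((duration / average - 1) * 100)
        for user, duration in on_call.items()
        if duration > average
    }


def gap_percent(uncovered: timedelta, period: timedelta = PERIOD) -> int:
    """Percent of the period during which no one is scheduled."""
    return round(uncovered / period * 100)


def is_balanced(quality: float) -> bool:
    """Apply the >95% rule; how `quality` itself is computed is not shown here."""
    return quality > BALANCE_THRESHOLD


def overload_label(score: int) -> str:
    return f"(+{score}% avg)"


def gap_label(percent: int) -> str:
    return f"Schedule has gaps ({percent}% not covered)"
```

For example, with `{"alice": timedelta(weeks=25), "bob": timedelta(weeks=15)}` the average is 20 weeks, so `overload_scores` returns `{"alice": 25}` and the label reads `(+25% avg)`. The remaining 12 uncovered weeks out of 52 give `gap_percent(timedelta(weeks=12)) == 23`.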
Grafana OnCall
Developer-friendly incident response with brilliant Slack integration.
- Collect and analyze alerts from multiple monitoring systems
- On-call rotations based on schedules
- Automatic escalations
- Phone calls, SMS, Slack, Telegram notifications
Getting Started
We prepared multiple environments:
- production
- developer
- hobby (described in the following steps)
1. Download `docker-compose.yml`:

       curl -fsSL https://raw.githubusercontent.com/grafana/oncall/dev/docker-compose.yml -o docker-compose.yml

2. Set variables:

       echo "DOMAIN=http://localhost:8080
       COMPOSE_PROFILES=with_grafana  # Remove this line if you want to use existing grafana
       SECRET_KEY=my_random_secret_must_be_more_than_32_characters_long" > .env

3. Launch services:

       docker-compose pull && docker-compose up -d

4. Go to OnCall Plugin Configuration (or find the OnCall plugin in configuration->plugins), log in with the credentials `admin`/`admin`, and connect the OnCall plugin with the OnCall backend:

       OnCall backend URL: http://engine:8080

5. Enjoy! Check our OSS docs if you want to set up Slack, Telegram, Twilio or SMS/calls through Grafana Cloud.
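The placeholder `SECRET_KEY` in the "Set variables" step is meant to be replaced with your own value of more than 32 characters. One way to produce such a value is sketched below. The helper names `render_env` and `write_env` are hypothetical and not part of Grafana OnCall; the variable names and the default domain are the ones from the step above.

```python
import secrets
from pathlib import Path


def render_env(domain: str = "http://localhost:8080", with_grafana: bool = True) -> str:
    """Build the contents of a .env file with a freshly generated secret."""
    lines = [f"DOMAIN={domain}"]
    if with_grafana:
        # Leave this out if you want to use an existing Grafana instance.
        lines.append("COMPOSE_PROFILES=with_grafana")
    # token_hex(32) returns 64 hexadecimal characters, above the 32-character minimum.
    lines.append(f"SECRET_KEY={secrets.token_hex(32)}")
    return "\n".join(lines) + "\n"


def write_env(path: str = ".env", **options) -> None:
    """Write the rendered variables to `path`, replacing any existing file."""
    Path(path).write_text(render_env(**options))
```

Calling `write_env()` in the directory that holds `docker-compose.yml` writes a `.env` file with the same three variables as the `echo` command. Keep the generated file out of version control.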
Update version
To update your Grafana OnCall hobby environment:
# Update Docker image
docker-compose pull engine
# Re-deploy
docker-compose up -d
After updating the engine, you'll also need to click the "Update" button on the plugin version page. See Grafana docs for more info on updating Grafana plugins.
Further Reading
- Migration from PagerDuty - Migrator
- Documentation - Grafana OnCall
- Overview Webinar - YouTube
- How To Add Integration - How to Add Integration
- Blog Post - Announcing Grafana OnCall, the easiest way to do on-call management
- Presentation - Deep dive into the Grafana, Prometheus, and Alertmanager stack for alerting and on-call management


