Grafana OnCall engine fork — self-hosted on-call scheduler and alert router
Find a file
Joey Orlando 697248dc75
Add responders improvements (#3128)
# What this PR does

https://www.loom.com/share/c5e10b5ec51343d0954c6f41cfd6a5fb

## Summary of backend changes
- Add `AlertReceiveChannel.get_orgs_direct_paging_integrations` method
and `AlertReceiveChannel.is_contactable` property. These are needed to
be able to (optionally) filter down teams, in the `GET /teams` internal
API endpoint
([here](https://github.com/grafana/oncall/pull/3128/files#diff-a4bd76e557f7e11dafb28a52c1034c075028c693b3c12d702d53c07fc6f24c05R55-R63)),
to just teams that have a "contactable" Direct Paging integration
- `engine/apps/alerts/paging.py`
- update these functions to support new UX. In short `direct_paging` no
longer takes a list of `ScheduleNotifications` or an `EscalationChain`
object
  - add `user_is_oncall` helper function
- add `_construct_title` helper function. In short if no `title` is
provided, which is the case for Direct Pages originating from OnCall
(either UI or Slack), then the format is `f"{from_user.username} is
paging <team.name (if team is specified> <comma separated list of
user.usernames> to join escalation"`
- `engine/apps/api/serializers/team.py` - add
`number_of_users_currently_oncall` attribute to response schema
([code](https://github.com/grafana/oncall/pull/3128/files#diff-26af48f796c9e987a76447586dd0f92349783d6ea6a0b6039a2f0f28bd58c2ebR45-R52))
- `engine/apps/api/serializers/user.py` - add `is_currently_oncall`
attribute to response schema
([code](https://github.com/grafana/oncall/pull/3128/files#diff-6744b5544ebb120437af98a996da5ad7d48ee1139a6112c7e3904010ab98f232R157-R162))
- `engine/apps/api/views/team.py` - add support for two new optional
query params `only_include_notifiable_teams` and `include_no_team`
([code](https://github.com/grafana/oncall/pull/3128/files#diff-a4bd76e557f7e11dafb28a52c1034c075028c693b3c12d702d53c07fc6f24c05R55-R70))
- `engine/apps/api/views/user.py`
- in the `GET /users` internal API endpoint, when specifying the
`search` query param now also search on `teams__name`
([code](https://github.com/grafana/oncall/pull/3128/files#diff-30309629484ad28e6fe09816e1bd226226d652ea977b6f3b6775976c729bf4b5R223);
this is a new UX requirement)
- add support for a new optional query param, `is_currently_oncall`, to
allow filtering users based on.. whether they are currently on call or
not
([code](https://github.com/grafana/oncall/pull/3128/files#diff-30309629484ad28e6fe09816e1bd226226d652ea977b6f3b6775976c729bf4b5R272-R282))
- remove `check_availability` endpoint (no longer used with new UX; also
removed references in frontend code)
- `engine/apps/slack/scenarios/paging.py` and
`engine/apps/slack/scenarios/manage_responders.py` - update Slack
workflows to support new UX. Schedules are no longer a concept here.
When creating a new alert group via `/escalate` the user either
specifies a team and/or user(s) (they must specify at least one of the
two and validation is done here to check this). When adding responders
to an existing alert group it's simply a list of users that they can
add, no more schedules.
- add `Organization.slack_is_configured` and
`Organization.telegram_is_configured` properties. These are needed to
support [this new functionality
](https://github.com/grafana/oncall/pull/3128/files#diff-9d96504027309f2bd1e95352bac1433b09b60eb4fafb611b52a6c15ed16cbc48R271-R272)
in the `AlertReceiveChannel` model.

## Summary of frontend changes
- Refactor/rename `EscalationVariants` component to `AddResponders` +
remove `grafana-plugin/src/containers/UserWarningModal` (no longer
needed with new UX)
- Remove `grafana-plugin/src/models/user.ts` as it seemed to be a
duplicate of `grafana-plugin/src/models/user/user.types.ts`

Related to https://github.com/grafana/incident/issues/4278

- Closes #3115
- Closes #3116
- Closes #3117
- Closes #3118 
- Closes #3177 

## TODO
- [x] make frontend changes
- [x] update Slack backend functionality
- [x] update public documentation
- [x] add/update e2e tests

## Post-deploy To-dos
- [ ] update dev/ops/production Slack bots to update `/escalate` command
description (should now say "Direct page a team or user(s)")

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-10-27 12:12:07 -04:00
.github Update make docs procedure (#3131) 2023-10-06 08:30:49 +00:00
dev Tilt Docs polishing (#3090) 2023-09-30 18:17:41 +00:00
docs Add responders improvements (#3128) 2023-10-27 12:12:07 -04:00
engine Add responders improvements (#3128) 2023-10-27 12:12:07 -04:00
grafana-plugin Add responders improvements (#3128) 2023-10-27 12:12:07 -04:00
helm feature: Hardening the Helm deployment with Redis and Postgres TLS (#3029) 2023-10-03 09:25:28 -04:00
terraform Remove unnecessary team checks (#2606) 2023-07-21 15:55:57 +01:00
tools Bump Python version in CI to 3.11.4 (#2707) 2023-08-01 10:11:05 +01:00
.dockerignore WIP: Direct paging improvements (#3064) 2023-09-28 03:57:49 +00:00
.drone.yml configure yamllint pre-commit step (#2728) 2023-08-03 02:35:08 -04:00
.gitignore Use Tilt for local development (#1396) 2023-09-07 19:38:19 +08:00
.markdownlint.json don't enforce line-length rule for markdownlint for code-blocks or tables (#2145) 2023-06-09 06:57:19 +00:00
.markdownlintignore Add tracing support 2022-12-19 17:15:06 +08:00
.pre-commit-config.yaml configure yamllint pre-commit step (#2728) 2023-08-03 02:35:08 -04:00
.yamllint.yml configure yamllint pre-commit step (#2728) 2023-08-03 02:35:08 -04:00
CHANGELOG.md Add responders improvements (#3128) 2023-10-27 12:12:07 -04:00
CODE_OF_CONDUCT.md add precommit rules for markdown/json files (#915) 2022-12-01 14:26:54 +01:00
docker-compose-developer.yml Telegram long polling (#2250) 2023-08-24 09:12:24 +02:00
docker-compose-mysql-rabbitmq.yml fix make start command when using mysql/postgres as db (#2744) 2023-08-03 11:50:40 -04:00
docker-compose.yml fix make start command when using mysql/postgres as db (#2744) 2023-08-03 11:50:40 -04:00
GOVERNANCE.md add precommit rules for markdown/json files (#915) 2022-12-01 14:26:54 +01:00
LICENSE World, meet OnCall! 2022-06-03 08:09:47 -06:00
LICENSING.md add precommit rules for markdown/json files (#915) 2022-12-01 14:26:54 +01:00
MAINTAINERS.md add precommit rules for markdown/json files (#915) 2022-12-01 14:26:54 +01:00
Makefile Use Tilt for local development (#1396) 2023-09-07 19:38:19 +08:00
README.md [README]fix prometheus yml indentation error (#2327) 2023-06-26 13:44:33 +00:00
screenshot.png Merge dev to main (#54) 2022-06-13 16:39:58 -06:00
screenshot_mobile.png Readme updates 2023-04-11 15:43:52 +03:00
Tiltfile WIP: Direct paging improvements (#3064) 2023-09-28 03:57:49 +00:00

Grafana OnCall

Latest Release License Docker Pulls Slack Discussion Build Status

Developer-friendly incident response with brilliant Slack integration.

  • Collect and analyze alerts from multiple monitoring systems
  • On-call rotations based on schedules
  • Automatic escalations
  • Phone calls, SMS, Slack, Telegram notifications

Getting Started

We prepared multiple environments:

  1. Download docker-compose.yml:

    curl -fsSL https://raw.githubusercontent.com/grafana/oncall/dev/docker-compose.yml -o docker-compose.yml
    
  2. Set variables:

    echo "DOMAIN=http://localhost:8080
    # Remove 'with_grafana' below if you want to use existing grafana
    # Add 'with_prometheus' below to optionally enable a local prometheus for oncall metrics
    # e.g. COMPOSE_PROFILES=with_grafana,with_prometheus
    COMPOSE_PROFILES=with_grafana
    # to setup an auth token for prometheus exporter metrics:
    # PROMETHEUS_EXPORTER_SECRET=my_random_prometheus_secret
    # also, make sure to enable the /metrics endpoint:
    # FEATURE_PROMETHEUS_EXPORTER_ENABLED=True
    SECRET_KEY=my_random_secret_must_be_more_than_32_characters_long" > .env
    
  3. (Optional) If you want to enable/setup the prometheus metrics exporter (besides the changes above), create a prometheus.yml file (replacing my_random_prometheus_secret accordingly), next to your docker-compose.yml:

    echo "global:
      scrape_interval:     15s
      evaluation_interval: 15s
    
    scrape_configs:
      - job_name: prometheus
        metrics_path: /metrics/
        authorization:
          credentials: my_random_prometheus_secret
        static_configs:
          - targets: [\"host.docker.internal:8080\"]" > prometheus.yml
    

    NOTE: you will need to setup a Prometheus datasource using http://prometheus:9090 as the URL in the Grafana UI.

  4. Launch services:

    docker-compose pull && docker-compose up -d
    
  5. Go to OnCall Plugin Configuration, using log in credentials as defined above: admin/admin (or find OnCall plugin in configuration->plugins) and connect OnCall plugin with OnCall backend:

    OnCall backend URL: http://engine:8080
    
  6. Enjoy! Check our OSS docs if you want to set up Slack, Telegram, Twilio or SMS/calls through Grafana Cloud.

Update version

To update your Grafana OnCall hobby environment:

# Update Docker image
docker-compose pull engine

# Re-deploy
docker-compose up -d

After updating the engine, you'll also need to click the "Update" button on the plugin version page. See Grafana docs for more info on updating Grafana plugins.

Join community

Stargazers over time

Stargazers over time

Further Reading