Grafana OnCall engine fork — self-hosted on-call scheduler and alert router
Joey Orlando 014a9c2ec2
allow the POST incoming alert endpoints to queue create_alert tasks independent of the database status (#1896)
# What this PR does

https://www.loom.com/share/18cc445117de4895a10892d56c7d3699

In preparation for upgrading our cloud databases, this PR makes some minor
changes which, in local testing, allowed the `POST
/<integration_type>/<alert_channel_key>` endpoints to receive incoming
alerts and queue the celery tasks even while the database is unavailable.

I've tested all of the defined `POST
/integrations/v1/<integration_type>/<alert_channel_key>` endpoints by
sending `POST` requests to an integration's URL while the MySQL database
was down, bringing the database back up, and confirming that the alerts
were created.
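
The manual check above can be sketched roughly as follows. This is an illustration only: `build_alert_request` is a hypothetical helper (not part of the OnCall codebase), and the base URL, integration type, channel key, and payload are placeholder values.

```python
import json
import urllib.request


def build_alert_request(base_url, integration_type, alert_channel_key, payload):
    """Build (but do not send) a POST request for an integration URL.

    Hypothetical helper; the path layout follows the endpoint pattern
    quoted in this PR description.
    """
    url = f"{base_url.rstrip('/')}/integrations/v1/{integration_type}/{alert_channel_key}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values; substitute the URL shown for your own integration.
request = build_alert_request(
    "http://localhost:8080", "webhook", "example-channel-key", {"title": "test alert"}
)
print(request.get_method(), request.full_url)
# To actually send it (e.g. while the database is stopped):
# urllib.request.urlopen(request)
```

Sending the request while the database is down and checking for the alert once it is back up reproduces the test described above.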

## Some other findings
- The integration heartbeat endpoints will not work, because we interact
with the database to persist the incoming heartbeat instance.
- If the integration was created in the last 180 seconds, incoming
alerts will fail due to the way we cache the integration IDs
([code](https://github.com/grafana/oncall/blob/dev/engine/apps/integrations/mixins/alert_channel_defining_mixin.py#L47-L50)).
- The `create_alert` celery task is set to `max_retries=None` and
`retry_backoff=True`. This means that the queued tasks will continue
retrying forever with an exponential backoff, until the alerts can be
created in the database (i.e. when the database is back online).
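
To get a feel for the retry timing in the last finding, here is a minimal sketch of an exponential backoff schedule. It assumes Celery's documented defaults (a 1-second base when `retry_backoff=True`, a `retry_backoff_max` cap of 600 seconds, and optional jitter); whether `create_alert` overrides any of these is not stated in this PR. `backoff_delay` is a hypothetical stand-in, not a Celery API.

```python
import random


def backoff_delay(retries, factor=1, maximum=600, jitter=False):
    """Seconds to wait before retry number `retries` (0-based).

    Hypothetical helper mirroring capped exponential backoff:
    factor * 2**retries, limited to `maximum`.
    """
    delay = min(maximum, factor * (2 ** retries))
    if jitter:
        # Full jitter: pick a random whole-second delay in [0, delay].
        delay = random.randrange(delay + 1)
    return delay


# Delays (without jitter) for the first 12 retries.
schedule = [backoff_delay(n) for n in range(12)]
print(schedule)
```

Under these assumptions the delay doubles until it reaches the cap, after which tasks keep retrying at a fixed interval until the database is back.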

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated (N/A)
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required) (N/A)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required) (N/A)
2023-05-10 12:36:23 +00:00
.github upgrade to python 3.11.3 (#1849) 2023-05-05 15:32:40 +00:00
dev upgrade to python 3.11.3 (#1849) 2023-05-05 15:32:40 +00:00
docs Zendesk inbound integration docs (#1860) 2023-05-03 11:38:07 +01:00
engine allow the POST incoming alert endpoints to queue create_alert tasks independent of the database status (#1896) 2023-05-10 12:36:23 +00:00
examples/terraform Terraform examples 2022-08-11 14:32:39 +05:00
grafana-plugin fix-new-schedule-creation (#1902) 2023-05-09 12:37:42 +03:00
helm Merge hotfix to dev (#1911) 2023-05-09 11:17:27 -06:00
tools upgrade to python 3.11.3 (#1849) 2023-05-05 15:32:40 +00:00
.drone.yml Restore original lint backend drone step (#1904) 2023-05-09 13:04:55 +00:00
.gitignore ignore .http file extensions (#1762) 2023-04-17 10:52:03 +02:00
.markdownlint.json add precommit rules for markdown/json files (#915) 2022-12-01 14:26:54 +01:00
.markdownlintignore Add tracing support 2022-12-19 17:15:06 +08:00
.nvmrc One startup command to rule them all (#760) 2022-11-07 16:34:43 +01:00
.pre-commit-config.yaml fix failing lint github actions job due to issue w/ isort version (#1249) 2023-01-30 11:43:15 +01:00
CHANGELOG.md Update CHANGELOG.md 2023-05-09 11:18:17 -06:00
CODE_OF_CONDUCT.md add precommit rules for markdown/json files (#915) 2022-12-01 14:26:54 +01:00
docker-compose-developer.yml Add "Notifications Receiver" RBAC role (#1853) 2023-05-02 12:19:34 +00:00
docker-compose-mysql-rabbitmq.yml Add "Notifications Receiver" RBAC role (#1853) 2023-05-02 12:19:34 +00:00
docker-compose.yml Add "Notifications Receiver" RBAC role (#1853) 2023-05-02 12:19:34 +00:00
GOVERNANCE.md add precommit rules for markdown/json files (#915) 2022-12-01 14:26:54 +01:00
LICENSE World, meet OnCall! 2022-06-03 08:09:47 -06:00
LICENSING.md add precommit rules for markdown/json files (#915) 2022-12-01 14:26:54 +01:00
MAINTAINERS.md add precommit rules for markdown/json files (#915) 2022-12-01 14:26:54 +01:00
Makefile Add "make help" command (#1583) 2023-03-21 08:12:13 +00:00
README.md Make screenshots bigger in README.md (#1799) 2023-04-20 13:26:12 +08:00
screenshot.png Merge dev to main (#54) 2022-06-13 16:39:58 -06:00
screenshot_mobile.png Readme updates 2023-04-11 15:43:52 +03:00
SECURITY.md add precommit rules for markdown/json files (#915) 2022-12-01 14:26:54 +01:00

Grafana OnCall

Developer-friendly incident response with brilliant Slack integration.

  • Collect and analyze alerts from multiple monitoring systems
  • On-call rotations based on schedules
  • Automatic escalations
  • Phone calls, SMS, Slack, Telegram notifications

Getting Started

We prepared multiple environments. To set up the hobby environment with Docker Compose:

  1. Download docker-compose.yml:

    curl -fsSL https://raw.githubusercontent.com/grafana/oncall/dev/docker-compose.yml -o docker-compose.yml
    
  2. Set variables:

    echo "DOMAIN=http://localhost:8080
    COMPOSE_PROFILES=with_grafana  # Remove this line if you want to use existing grafana
    SECRET_KEY=my_random_secret_must_be_more_than_32_characters_long" > .env
    
  3. Launch services:

    docker-compose pull && docker-compose up -d
    
  4. Go to the OnCall plugin configuration page (or find the OnCall plugin under Configuration -> Plugins), log in with the default credentials admin/admin, and connect the OnCall plugin to the OnCall backend:

    OnCall backend URL: http://engine:8080
    
  5. Enjoy! Check our OSS docs if you want to set up Slack, Telegram, Twilio or SMS/calls through Grafana Cloud.
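
For step 2, the placeholder `SECRET_KEY` can be swapped for a random value. The sketch below builds the same `.env` contents with a generated key; `build_env` is a hypothetical helper, and it assumes any random string longer than 32 characters is acceptable as the secret.

```python
import secrets


def build_env(domain="http://localhost:8080", with_grafana=True):
    """Return .env contents with a freshly generated SECRET_KEY.

    Hypothetical helper; variable names are taken from step 2.
    """
    lines = [f"DOMAIN={domain}"]
    if with_grafana:
        # Omit this line to use an existing Grafana instance.
        lines.append("COMPOSE_PROFILES=with_grafana")
    # token_urlsafe(48) yields a 64-character URL-safe string.
    lines.append(f"SECRET_KEY={secrets.token_urlsafe(48)}")
    return "\n".join(lines) + "\n"


env_text = build_env()
print(env_text)
# To write it out next to docker-compose.yml:
# with open(".env", "w") as f:
#     f.write(env_text)
```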

Update version

To update your Grafana OnCall hobby environment:

    # Update Docker image
    docker-compose pull engine

    # Re-deploy
    docker-compose up -d

After updating the engine, you'll also need to click the "Update" button on the plugin version page. See Grafana docs for more info on updating Grafana plugins.

Join community

Stargazers over time

Further Reading