oncall-engine/engine/apps
Joey Orlando 014a9c2ec2
allow the POST incoming alert endpoints to queue create_alert tasks independent of the database status (#1896)
# What this PR does

https://www.loom.com/share/18cc445117de4895a10892d56c7d3699

In preparation for upgrading our cloud databases, this PR makes some minor
changes that, in local testing, allowed the `POST
/<integration_type>/<alert_channel_key>` endpoints to receive incoming
alerts and queue the celery tasks while the database is unavailable.

I've tested all of the defined `POST
/integrations/v1/<integration_type>/<alert_channel_key>` endpoints by
sending `POST` requests to an integration's URL while the MySQL database
was down, bringing the database back up, and then confirming that the
alerts were created.

## Some other findings
- the integration heartbeat endpoints will not work while the database is
down, as they write to the database to persist the incoming heartbeat
instance
- if the integration was created in the last 180 seconds, incoming
alerts will fail due to the way we cache the integration IDs
([code](https://github.com/grafana/oncall/blob/dev/engine/apps/integrations/mixins/alert_channel_defining_mixin.py#L47-L50))
- the `create_alert` celery task is set to `max_retries=None` and
`retry_backoff=True`. This means that the queued tasks will keep
retrying forever, with an exponential backoff, until the alerts can be
created in the database (i.e. once the database is back online).
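
The retry behaviour described in the last finding can be sketched in plain Python. This is only an illustration, not the actual `create_alert` task: `backoff_delay`, `run_until_created`, `make_flaky_create_alert`, and `DatabaseDown` are hypothetical names, and the doubling schedule with a 600 second cap is an assumption modelled on Celery's documented `retry_backoff` / `retry_backoff_max` defaults (Celery also adds random jitter by default, which is left out here so the delays are deterministic).

```python
import itertools


class DatabaseDown(Exception):
    """Hypothetical stand-in for the error raised while the database is offline."""


def backoff_delay(retries: int, factor: int = 1, maximum: int = 600) -> int:
    """Seconds to wait before the next attempt, doubling each time up to a cap.

    Assumption: factor=1 mirrors retry_backoff=True and maximum=600 mirrors
    Celery's default retry_backoff_max. Jitter is intentionally omitted.
    """
    return min(maximum, factor * 2 ** retries)


def run_until_created(create_alert, max_retries=None):
    """Call create_alert until it succeeds; return the delays that were scheduled.

    max_retries=None means "never give up", as in the finding above.
    No real sleeping is done; the delays are only recorded.
    """
    delays = []
    for retries in itertools.count():
        try:
            create_alert()
            return delays
        except DatabaseDown:
            if max_retries is not None and retries >= max_retries:
                raise
            delays.append(backoff_delay(retries))


def make_flaky_create_alert(failures: int):
    """Build a fake create_alert that fails `failures` times, then succeeds."""
    remaining = {"n": failures}

    def create_alert():
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise DatabaseDown("database is unavailable")

    return create_alert


# The database is "down" for the first five attempts, then comes back.
delays = run_until_created(make_flaky_create_alert(5))
print(delays)  # [1, 2, 4, 8, 16]
```

With an unbounded retry count, the cap matters more than the doubling: once the delay reaches the maximum, each queued task retries at a fixed interval until the database returns.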

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated (N/A)
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required) (N/A)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required) (N/A)
2023-05-10 12:36:23 +00:00
| Name | Last commit | Date |
| --- | --- | --- |
| alerts | Merge hotfix to dev (#1911) | 2023-05-09 11:17:27 -06:00 |
| api | Refactor upcoming shifts to use cached final schedule data (#1891) | 2023-05-08 19:01:24 +00:00 |
| api_for_grafana_incident | Include alert details in Grafana Incident alert-group endpoint (#1280) | 2023-02-03 13:43:21 +00:00 |
| auth_token | Change Organization Deleted/Moved Precedence (#1402) | 2023-02-24 11:45:21 +00:00 |
| base | Fix documentation links (#1766) | 2023-04-19 10:12:16 +01:00 |
| email | Inbound email integration (#837) | 2023-03-16 13:59:21 +08:00 |
| grafana_plugin | allow the POST incoming alert endpoints to queue create_alert tasks independent of the database status (#1896) | 2023-05-10 12:36:23 +00:00 |
| heartbeat | Add database migrations linter (#1020) | 2023-02-06 16:01:37 +08:00 |
| integrations | allow the POST incoming alert endpoints to queue create_alert tasks independent of the database status (#1896) | 2023-05-10 12:36:23 +00:00 |
| mobile_app | add important_notification_volume_override to mobile app user settings model (#1893) | 2023-05-09 14:28:47 +00:00 |
| oss_installation | Add database migrations linter (#1020) | 2023-02-06 16:01:37 +08:00 |
| public_api | upgrade to python 3.11.3 (#1849) | 2023-05-05 15:32:40 +00:00 |
| schedules | Refactor upcoming shifts to use cached final schedule data (#1891) | 2023-05-08 19:01:24 +00:00 |
| slack | Handle invitation button press (#1863) | 2023-05-03 08:19:56 +00:00 |
| social_auth | Fix insight_logs exceptions (#1757) | 2023-04-17 07:16:18 +00:00 |
| telegram | update web UI, Slack, and Telegram to allow silencing an acknowledged alert group (#1831) | 2023-04-27 14:52:35 +00:00 |
| twilioapp | Add database migrations linter (#1020) | 2023-02-06 16:01:37 +08:00 |
| user_management | Fix insight_logs exceptions (#1757) | 2023-04-17 07:16:18 +00:00 |
| webhooks | Webhook response check content length instead of header for length limit (#1900) | 2023-05-09 13:55:05 +00:00 |
| __init__.py | World, meet OnCall! | 2022-06-03 08:09:47 -06:00 |