Commit graph

70 commits

Author SHA1 Message Date
Michael Derynck
6119c60f55
Add test for caching deleted integration, fix test wrap methods (#4173)
# What this PR does
Add test case and fixes test added in #4163 

## Which issue(s) this PR closes

Closes [issue link here]

<!--
*Note*: if you have more than one GitHub issue that this PR closes, be
sure to preface
each issue link with a [closing
keyword](https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/using-keywords-in-issues-and-pull-requests#linking-a-pull-request-to-an-issue).
This ensures that the issue(s) are auto-closed once the PR has been
merged.
-->

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] Added the relevant release notes label (see labels prefixed w/
`release:`). These labels dictate how your PR will
    show up in the autogenerated release notes.
2024-04-05 20:35:38 +00:00
Michael Derynck
f5855915a2
Improve performance for deleted integration lookups (#4163)
# What this PR does
- Refactor alert receive channel lookup so it is easier to follow
- Remove the additional lookup that was taking place for alert receive
channels that belong to a deleted organization, these can be treated as
deleted for usage purposes even though the alert receive channel itself
does not have `deleted_at` populated
- Organizations that have been moved will still need to be looked up
everytime. This is not optimized in favor of not maintaining a cache of
Organizations. These are not frequent requests and can be optimized
later if necessary.

## Which issue(s) this PR closes

<!--
*Note*: if you have more than one GitHub issue that this PR closes, be
sure to preface
each issue link with a [closing
keyword](https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/using-keywords-in-issues-and-pull-requests#linking-a-pull-request-to-an-issue).
This ensures that the issue(s) are auto-closed once the PR has been
merged.
-->

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] Added the relevant release notes label (see labels prefixed w/
`release:`). These labels dictate how your PR will
    show up in the autogenerated release notes.
2024-04-05 16:16:30 +00:00
Michael Derynck
c255ec66d8
Fix bug in ratelimit logic introduced in #4137 (#4162)
# What this PR does
Fix bug introduced in #4137

## Which issue(s) this PR closes

Closes [issue link here]

<!--
*Note*: if you have more than one GitHub issue that this PR closes, be
sure to preface
each issue link with a [closing
keyword](https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/using-keywords-in-issues-and-pull-requests#linking-a-pull-request-to-an-issue).
This ensures that the issue(s) are auto-closed once the PR has been
merged.
-->

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] Added the relevant release notes label (see labels prefixed w/
`release:`). These labels dictate how your PR will
    show up in the autogenerated release notes.
2024-04-04 20:34:18 +00:00
Michael Derynck
d14b1c8e28
Improve performance for rate-limited, banned and deleted integrations (#4137)
# What this PR does
- Remove BanAlertConsumptionBasedOnSettingsMiddleware under high traffic
scenarios this causes too much DB load and the same effect can be
achieved by soft-deleting the integration
- Change lookups for integrations so that non-existent and deleted
integrations are also cached to reduce DB load

## Which issue(s) this PR closes

Closes https://github.com/grafana/oncall-private/issues/2608

<!--
*Note*: if you have more than one GitHub issue that this PR closes, be
sure to preface
each issue link with a [closing
keyword](https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/using-keywords-in-issues-and-pull-requests#linking-a-pull-request-to-an-issue).
This ensures that the issue(s) are auto-closed once the PR has been
merged.
-->

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] Added the relevant release notes label (see labels prefixed w/
`release:`). These labels dictate how your PR will
    show up in the autogenerated release notes.
2024-04-04 18:05:34 +00:00
Vadim Stepanov
b7e2dc14f8
Fix ratelimit bug (#4108)
# What this PR does

Fixes a bug in the ratelimit logic when integration-specific ratelimit
429s are still counted towards the organization-wide ratelimit.

## Which issue(s) this PR closes

Related to https://github.com/grafana/support-escalations/issues/9579

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] Added the relevant release notes label (see labels prefixed w/
`release:`). These labels dictate how your PR will
    show up in the autogenerated release notes.
2024-03-26 17:20:05 +00:00
Yulya Artyukhina
ed60b67884
Integtation backsync endpoint (#4082)
# What this PR does
Adds endpoint for integration backsync
Related to https://github.com/grafana/oncall-private/issues/2542

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] Added the relevant release notes label (see labels prefixed w/
`release:`). These labels dictate how your PR will
    show up in the autogenerated release notes.
2024-03-20 11:26:33 +00:00
Matias Bordese
eb1228e782
Update universal integrations to reject requests without payload (#4053)
Reject integration requests with a null payload
2024-03-14 15:51:46 +00:00
Vadim Stepanov
2b5554c079
Remove explicit request size limits (#3878)
# What this PR does

Remove explicit request size limits both for uwsgi & Django. After this
change, the effective request size limit will be 2.5MB as the default
Django value for
[DATA_UPLOAD_MAX_MEMORY_SIZE](https://docs.djangoproject.com/en/4.2/ref/settings/#data-upload-max-memory-size).

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-02-22 15:00:33 +00:00
Joey Orlando
aca2804502
add pytest-xdist to speed up backend tests (#3839)
# What this PR does

Speeds up `pytest` test execution by ~30%.

More specifically, adds
[`pytest-xdist`](https://pytest-xdist.readthedocs.io/en/stable/), which
according to their docs:
> plugin extends pytest with new test execution modes, the most used
being distributing tests across multiple CPUs to speed up test execution

**Before**
<img width="270" alt="Screenshot 2024-02-05 at 15 53 13"
src="https://github.com/grafana/oncall/assets/9406895/4da33299-5bd0-4dc3-86e1-32cfdf9106f7">

**After**
<img width="254" alt="Screenshot 2024-02-05 at 15 53 04"
src="https://github.com/grafana/oncall/assets/9406895/a59eeb52-291d-4cdc-82b2-55fd31e1c1c5">
2024-02-05 16:04:15 -05:00
Joey Orlando
3833d8de56
remove manual alert group (/oncall) slack slash command + force_route_id (#3790)
# What this PR does

Related to [this
discussion](https://raintank-corp.slack.com/archives/C04JCU51NF8/p1706550226831949)

Removes the `/oncall` Slack slash command + the concept of
`force_route_id` (as this Slack slash command was the last piece of code
to use this concept
[here](https://github.com/grafana/oncall/blob/dev/engine/apps/slack/scenarios/manual_incident.py#L146))

## TODO before merging
- [x] update the various env's Slack apps to remove the slash command
from the app manifests

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-30 17:28:23 -05:00
Joey Orlando
06933a696a
Support alert routing based on labels (#3778)
# What this PR does

This PR adds support for routing alerts based on labels.
https://www.loom.com/share/4401de6e3c4945d5b8961fe43ee373c9

Additionally:
- improve the typing around the `get_object` method that is inherited by
[`PublicPrimaryKeyMixin.get_object`](https://github.com/grafana/oncall/blob/dev/engine/common/api_helpers/mixins.py#L153)
in most of our models. `PublicPrimaryKeyMixin` is generic, so it can be
more strongly typed when it is being subclassed, which results in better
typing of the `get_object` method in child classes
- I decided to do this because I started looking into this task via the
[`AlertReceiveChannelView.send_demo_alert`
method/endpoint](https://github.com/grafana/oncall/blob/dev/engine/apps/api/views/alert_receive_channel.py#L242).
Within that method, `instance` is not typed because the inherited
`get_object` method is not typed.. I digress 😄
- improve typing around `Alert.create` and
`apps.integrations.tasks.create_alert` functions
- make `Alert.render_group_data` more DRY by extracting some logic out
into `Alert._apply_jinja_template_to_alert_payload_and_labels`
- deduplicate the logic of `value.strip().lower() in ["1", "true",
"ok"]` into a shared function,
`common.jinja_templater.apply_jinja_template.templated_value_is_truthy`

Closes https://github.com/grafana/oncall-private/issues/2490

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
- [x] Documentation added (or `pr:no public docs` PR label added if not
required) (will be done in #3762)
2024-01-30 13:07:19 -05:00
Ildar Iskhakov
401d279d54
Refactor create_alert task (#3759)
# What this PR does

This PR simplifies alert group/alert creation, so the alert created and
escalation started in the same task.

## Which issue(s) this PR fixes

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-30 08:39:04 +00:00
Matias Bordese
dbd5452a0b
Handle a possible outdated cached integration error (#3741)
Related to
[logs](https://ops.grafana-ops.net/explore?schemaVersion=1&panes=%7B%22hum%22:%7B%22datasource%22:%22c-R8UWvVk%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22amixr-prod%5C%22,%20job%3D%5C%22amixr-prod%2Famixr-integrations%5C%22%7D%20%7C%3D%20%5C%22django.core.serializers.base.DeserializationError%5C%22%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22c-R8UWvVk%22%7D,%22editorMode%22:%22code%22%7D%5D,%22range%22:%7B%22from%22:%221706023840486%22,%22to%22:%221706024722486%22%7D%7D%7D&orgId=1)
2024-01-23 20:46:12 +00:00
Yulya Artyukhina
0421bc472a
Fix posting slack message about ratelimits (#3582)
# What this PR does

## Which issue(s) this PR fixes
https://github.com/grafana/oncall-private/issues/2374
## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-12-19 06:05:57 +00:00
Matias Bordese
054401a214
Fix missing timestamp value, add test (#3522) 2023-12-06 16:02:54 +00:00
Matias Bordese
e053eb084d
Track alert received timestamp on alert group creation (#3513)
Keep record of the timestamp when the alert group creation task is
triggered, allowing to track the delta time between alert received
datetime and alert group creation timestamp.

Related to https://github.com/grafana/oncall-private/issues/2347
2023-12-06 12:20:03 +00:00
Matias Bordese
6e35aadc0c
Add test ensuring integration endpoints work if redis cache is down (#3445) 2023-12-01 17:45:18 +00:00
Joey Orlando
76a88bc0c1
Revert "upgrade to Python 3.12 (#3456)" and "bump uwsgi version to latest #3466" (#3483)
# What this PR does

This reverts commits 7c4b40a046 and
cdb22285db.

See https://github.com/grafana/oncall-private/pull/2361 for more
details.
2023-12-01 09:56:26 -05:00
Joey Orlando
7c4b40a046
upgrade to Python 3.12 (#3456)
# What this PR does

Upgrade to Python 3.12 + fix several invalid test assertions that lead
to test failures in the latest version of `pytest`:
```
AttributeError: 'called_once_with' is not a valid assertion. Use a spec for the mock if 'called_once_with' is meant to be an attribute.. Did you mean: 'assert_called_once_with'?
```

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-11-30 13:47:41 +00:00
Matias Bordese
aa8a904a8d
Update when slack client ratelimit retry handler is enabled (#3447) 2023-11-30 12:35:46 +00:00
Ildar Iskhakov
95a3ab3b75
Revert "Cache independent ingestion" (#3417)
Reverts grafana/oncall#3415
2023-11-23 21:38:06 +08:00
Ildar Iskhakov
a6912c96af
Merge pull request #3415 from grafana/iskhakov/cache-independent-ingestion
Cache independent ingestion
2023-11-23 16:22:18 +08:00
Ildar Iskhakov
566e8c53ba Ignore typing checks for imported library (https://mypy.readthedocs.io/en/stable/running_mypy.html\#missing-library-stubs-or-py-typed-marker) 2023-11-23 16:14:30 +08:00
Ildar Iskhakov
0d5ef785bf Make alert ingestion cache independent 2023-11-23 11:27:47 +08:00
Michael Derynck
b3583cd1a0
Add more logging info on alert creation (#3392)
# What this PR does
Add alert receive channel id when logging to make it easier to trace
grouping

## Which issue(s) this PR fixes

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)

---------

Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
2023-11-21 16:16:15 +00:00
Matias Bordese
8ddea0576e
Add test ensuring ingestion works without db access (#3322)
Handling alert payloads should work without db access (but still
requires cached integrations information)
2023-11-10 17:44:37 +00:00
Michael Derynck
ad1f63dbe9
AmazonSNS integration exception handling (#3315)
# What this PR does
Handle OrganizationMoved, OrganizationDeleted and PermissionDenied
exceptions same as other integration API views instead of converting to
BadRequest.

## Which issue(s) this PR fixes

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-11-10 13:45:12 +00:00
Matvey Kukuy
a898835eb4 Fixing ratelimit texts 2023-09-27 14:18:00 +03:00
Vadim Stepanov
8b2212c7dc
Improve Slack error handling (#3000)
# What this PR does

- Rename `SlackClientWithErrorHandling` to just `SlackClient`
- Add more error classes + improve the way errors are raised based on
the Slack error code
- Add API call retries on Slack server errors (e.g. when Slack returns
`5xx` errors)
- Refactor some methods working with Slack API + add tests

## Which issue(s) this PR fixes

- https://github.com/grafana/oncall-private/issues/1837
- https://github.com/grafana/oncall-private/issues/1840
- https://github.com/grafana/oncall-private/issues/1842

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-09-12 09:49:16 +00:00
Joey Orlando
a9155130df
update slack_sdk dependency to latest version (#2947)
# What this PR does

- update `slackclient` dependency to latest version. The version we were
using was 5 years old 😲
- first followed the v2 migration guide
[here](https://github.com/slackapi/python-slack-sdk/wiki/Migrating-to-2.x)
followed by the v3 migration guide
[here](https://slack.dev/python-slack-sdk/v3-migration/). The main
changes were:
    - The PyPI project was renamed from `slackclient` to `slack_sdk`
- it is discouraged/harder to call `api_call` and encouraged to call the
helper methods (ex. `chat_postMessage`;
[note](https://github.com/slackapi/python-slack-sdk/wiki/Migrating-to-2.x#web-client-api-changes)
in migration guide docs)
- In 1.x, a failed api call would return the error payload to you and
have you handle the error. In 2.x, a failed api call will throw an
exception. To handle this in your code, you will have to wrap api calls
with a try except block. Since we overload `WebClient.api_call` this was
an easy change and only required a one line change
- remove `apps.slack.slack_client.slack_server.SlackClientServer` class.
The new version of `slack_sdk` handles the case that we needed to
overload for in the first place.
- merged `apps/slack/slack_client/slack_client.py` and
`apps/slack/slack_client/exceptions.py` into `apps/slack/client.py`

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-09-05 11:31:59 +02:00
Matias Bordese
0dea5661c4
Reject file uploads when posting to an integration endpoint (#2958)
Related to https://github.com/grafana/oncall-private/issues/2145

---------

Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
2023-09-05 10:01:50 +02:00
Michael Derynck
e0e1f4b021
Always update last_heartbeat_time async (#2892)
# What this PR does
If the same heartbeat is requested at a high rate it can create lock
contention when updating the timestamp in the DB. Moving to always run
update in task should free up the connection on the API server faster,
although the task might still see some lock wait time.

## Which issue(s) this PR fixes

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-08-29 02:19:28 +00:00
Matias Bordese
cec5e6a284
Skip amazon_sns integration view test (#2849) 2023-08-21 17:06:31 -03:00
Joey Orlando
df21be3a50
add more integration tests for integrations api (#2845) 2023-08-21 15:40:29 +02:00
Matias Bordese
179a1db471
Add alertmanager integration for heartbeat support (#2807)
Related to https://github.com/grafana/oncall/issues/2801 and
https://github.com/grafana/support-escalations/issues/7081.

---------

Co-authored-by: Innokentii Konstantinov <innokenty.konstantinov@grafana.com>
2023-08-17 13:22:37 +00:00
Ildar Iskhakov
fd19dd422a
Use periodic task for heartbeats (#2723)
# What this PR does

## Which issue(s) this PR fixes

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)

---------

Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
Co-authored-by: Michael Derynck <michael.derynck@grafana.com>
2023-08-10 02:25:00 +00:00
Innokentii Konstantinov
5538a80ba1 Fix GrafanaAlertingAPIView 2023-08-01 16:00:50 +08:00
Innokentii Konstantinov
7c4f72c348 Fix num_firing/resolved calculation 2023-08-01 13:37:31 +08:00
Innokentii Konstantinov
abca37e621
Polish amv2 (#2701) 2023-08-01 13:13:58 +08:00
Innokentii Konstantinov
1ccb9d6979
AlertManager v2 (#2643)
Introduce AlertManager v2 integration with improved internal behaviour

it's using grouping from AlertManager, not trying to re-group alerts on
OnCall side.
Existing AlertManager and Grafana Alerting integrations are marked as
Legacy with options to migrate them manually now or be migrated
automatically after DEPRECATION DATE(TBD).
Integration urls and public api responses stay the same both for legacy
and new integrations.

---------

Co-authored-by: Rares Mardare <rares.mardare@grafana.com>
Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
2023-08-01 12:18:52 +08:00
Vadim Stepanov
f977f9faee
Minor formatting changes (#2641)
# What this PR does

- Updates `black` and `flake8` to latest
- Removes `F541` from flake8 ignore (`F541 f-string is missing
placeholders`)
- Enables ["float to top"
option](https://pycqa.github.io/isort/docs/configuration/options.html#float-to-top)
for `isort`
2023-07-26 14:45:44 +01:00
Vadim Stepanov
b2f4ffb98a
apps.get_model -> import (#2619)
# What this PR does

Remove
[`apps.get_model`](https://docs.djangoproject.com/en/3.2/ref/applications/#django.apps.apps.get_model)
invocations and use inline `import` statements in places where models
are imported within functions/methods to avoid circular imports.

I believe `import` statements are more appropriate for most use cases as
they allow for better static code analysis & formatting, and solve the
issue of circular imports without being unnecessarily dynamic as
`apps.get_model`. With `import` statements, it's possible to:

- Jump to model definitions in most IDEs
- Automatically sort inline imports with `isort`
- Find import errors faster/easier (most IDEs highlight broken imports)
- Have more consistency across regular & inline imports when importing
models

This PR also adds a flake8 rule to ban imports of `django.apps.apps`, so
it's harder to use `apps.get_model` by mistake (it's possible to ignore
this rule by using `# noqa: I251`). The rule is not enforced on
directories with migration files, because `apps.get_model` is often used
to get a historical state of a model, which is useful when writing
migrations ([see this SO answer for more
details](https://stackoverflow.com/a/37769213)). So `apps.get_model` is
considered OK in migrations (even necessary in some cases).

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-07-25 09:43:23 +00:00
Innokentii Konstantinov
6dcaf52efb
Remove integration html instructions (#2627)
Remove integration html instructions, They were migrated to the docs
2023-07-25 04:31:28 +00:00
Joey Orlando
63ac0972c5
remove deprecated heartbeat_heartbeat table/model (#2534)
# What this PR does

- Remove `heartbeat_heartbeat` table. This model/table does not seems to
be deprecated/used anywhere (no data in this in production/staging; see
more comments in the code about this).
2023-07-17 01:38:04 -04:00
Innokentii Konstantinov
ab7cd0aec2 Fix amv2 2023-06-14 14:43:00 +08:00
Innokentii Konstantinov
f0f2e7c8c6
Draft AlertManager integration v2 (#2167)
# What this PR does
Introduces AlertManagerV2 integration with better grouping and
autoresolving, not intended for production use yet.

---------

Co-authored-by: Ildar Iskhakov <Ildar.iskhakov@grafana.com>
2023-06-13 07:10:38 +00:00
Innokentii Konstantinov
056b0ddc7e
Add ratelimit for AmazonSNS (#2032)
Adds a ratelimit for AmazonSNS. 
AlertChannelDefining mixin is now injecting alert_receive_channel only
in request, not in kwargs to not to break AmazonSNS.
2023-05-26 09:57:26 +00:00
Vadim Stepanov
b8f54f1c53
Add docs & logo for AppDynamics integration (#1916)
# What this PR does
Adds docs & logo for AppDynamics integration. 
Main PR in private repo:
https://github.com/grafana/oncall-private/pull/1790.

## Which issue(s) this PR fixes
https://github.com/grafana/oncall-private/issues/1621

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- No changelog (AppDynamics integration will be only available in cloud)
2023-05-11 16:41:51 +00:00
Joey Orlando
014a9c2ec2
allow the POST incoming alert endpoints to queue create_alert tasks independent of the database status (#1896)
# What this PR does

https://www.loom.com/share/18cc445117de4895a10892d56c7d3699

In preparation to upgrade our cloud databases, this PR makes some minor
changes which, after testing locally, allowed the `POST
/<integration_type>/<alert_channel_key>` endpoints to successfully
receive incoming alerts and queue the celery tasks.

I've tested all of the defined `POST
/integrations/v1/<integration_type>/<alert_channel_key>` endpoints by
sending `POST` requests to an integrations' URL while the MySQL database
was down, bringing the database back up, and ensuring the alerts were
created.

## Some other findings
- the integration heartbeat endpoints will not work as we interact w/
the database to persist the incoming heartbeat instance
- if the integration was created in the last 180 seconds, incoming
alerts will fail due to the way we cache the integration IDs
([code](https://github.com/grafana/oncall/blob/dev/engine/apps/integrations/mixins/alert_channel_defining_mixin.py#L47-L50))
- The `create_alert` celery task is set to `max_retries=None` and
`retry_backoff=True`. This means that the queued tasks will continue
retrying forever w/ an exponential backoff, until the alerts can be
created in the database (ie. when the database is back online).

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated (N/A)
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required) (N/A)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required) (N/A)
2023-05-10 12:36:23 +00:00
Oleg Zaytsev
41f7c23c65
Fix and tidy alertmanager heartbeat template (#1865)
# What this PR does


There was an unnecessary indentation in the `rules:` key which made it
invalid YAML.

Also replaced the mentions to Amixr with Grafana OnCall, used some
`<code>` tags and reworded some sentences.

Also removed the anchor tag from the webhook link: we don't want people
to follow that in their browser, we want them to copy it

## Result screenshot


![image](https://user-images.githubusercontent.com/1511481/236173565-b5201b81-4d69-4d0b-944a-a2106f8fbab3.png)

## Which issue(s) this PR fixes

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)

---------

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
2023-05-05 00:25:05 +00:00