Commit graph

53 commits

Author SHA1 Message Date
Joey Orlando
76a88bc0c1
Revert "upgrade to Python 3.12 (#3456)" and "bump uwsgi version to latest #3466" (#3483)
# What this PR does

This reverts commits 7c4b40a046 and
cdb22285db.

See https://github.com/grafana/oncall-private/pull/2361 for more
details.
2023-12-01 09:56:26 -05:00
Joey Orlando
7c4b40a046
upgrade to Python 3.12 (#3456)
# What this PR does

Upgrade to Python 3.12 + fix several invalid test assertions that lead
to test failures in the latest version of `pytest`:
```
AttributeError: 'called_once_with' is not a valid assertion. Use a spec for the mock if 'called_once_with' is meant to be an attribute.. Did you mean: 'assert_called_once_with'?
```

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-11-30 13:47:41 +00:00
Matias Bordese
aa8a904a8d
Update when slack client ratelimit retry handler is enabled (#3447) 2023-11-30 12:35:46 +00:00
Ildar Iskhakov
95a3ab3b75
Revert "Cache independent ingestion" (#3417)
Reverts grafana/oncall#3415
2023-11-23 21:38:06 +08:00
Ildar Iskhakov
a6912c96af
Merge pull request #3415 from grafana/iskhakov/cache-independent-ingestion
Cache independent ingestion
2023-11-23 16:22:18 +08:00
Ildar Iskhakov
566e8c53ba Ignore typing checks for imported library (https://mypy.readthedocs.io/en/stable/running_mypy.html\#missing-library-stubs-or-py-typed-marker) 2023-11-23 16:14:30 +08:00
Ildar Iskhakov
0d5ef785bf Make alert ingestion cache independent 2023-11-23 11:27:47 +08:00
Michael Derynck
b3583cd1a0
Add more logging info on alert creation (#3392)
# What this PR does
Add alert receive channel id when logging to make it easier to trace
grouping

## Which issue(s) this PR fixes

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)

---------

Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
2023-11-21 16:16:15 +00:00
Matias Bordese
8ddea0576e
Add test ensuring ingestion works without db access (#3322)
Handling alert payloads should work without db access (but still
requires cached integrations information)
2023-11-10 17:44:37 +00:00
Michael Derynck
ad1f63dbe9
AmazonSNS integration exception handling (#3315)
# What this PR does
Handle OrganizationMoved, OrganizationDeleted and PermissionDenied
exceptions same as other integration API views instead of converting to
BadRequest.

## Which issue(s) this PR fixes

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-11-10 13:45:12 +00:00
Matvey Kukuy
a898835eb4 Fixing ratelimit texts 2023-09-27 14:18:00 +03:00
Vadim Stepanov
8b2212c7dc
Improve Slack error handling (#3000)
# What this PR does

- Rename `SlackClientWithErrorHandling` to just `SlackClient`
- Add more error classes + improve the way errors are raised based on
the Slack error code
- Add API call retries on Slack server errors (e.g. when Slack returns
`5xx` errors)
- Refactor some methods working with Slack API + add tests

## Which issue(s) this PR fixes

- https://github.com/grafana/oncall-private/issues/1837
- https://github.com/grafana/oncall-private/issues/1840
- https://github.com/grafana/oncall-private/issues/1842

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-09-12 09:49:16 +00:00
Joey Orlando
a9155130df
update slack_sdk dependency to latest version (#2947)
# What this PR does

- update `slackclient` dependency to latest version. The version we were
using was 5 years old 😲
- first followed the v2 migration guide
[here](https://github.com/slackapi/python-slack-sdk/wiki/Migrating-to-2.x)
followed by the v3 migration guide
[here](https://slack.dev/python-slack-sdk/v3-migration/). The main
changes were:
    - The PyPI project was renamed from `slackclient` to `slack_sdk`
- it is discouraged/harder to call `api_call` and encouraged to call the
helper methods (ex. `chat_postMessage`;
[note](https://github.com/slackapi/python-slack-sdk/wiki/Migrating-to-2.x#web-client-api-changes)
in migration guide docs)
- In 1.x, a failed api call would return the error payload to you and
have you handle the error. In 2.x, a failed api call will throw an
exception. To handle this in your code, you will have to wrap api calls
with a try except block. Since we overload `WebClient.api_call` this was
an easy change and only required a one line change
- remove `apps.slack.slack_client.slack_server.SlackClientServer` class.
The new version of `slack_sdk` handles the case that we needed to
overload for in the first place.
- merged `apps/slack/slack_client/slack_client.py` and
`apps/slack/slack_client/exceptions.py` into `apps/slack/client.py`

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-09-05 11:31:59 +02:00
Matias Bordese
0dea5661c4
Reject file uploads when posting to an integration endpoint (#2958)
Related to https://github.com/grafana/oncall-private/issues/2145

---------

Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
2023-09-05 10:01:50 +02:00
Michael Derynck
e0e1f4b021
Always update last_heartbeat_time async (#2892)
# What this PR does
If the same heartbeat is requested at a high rate it can create lock
contention when updating the timestamp in the DB. Moving to always run
update in task should free up the connection on the API server faster,
although the task might still see some lock wait time.

## Which issue(s) this PR fixes

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-08-29 02:19:28 +00:00
Matias Bordese
cec5e6a284
Skip amazon_sns integration view test (#2849) 2023-08-21 17:06:31 -03:00
Joey Orlando
df21be3a50
add more integration tests for integrations api (#2845) 2023-08-21 15:40:29 +02:00
Matias Bordese
179a1db471
Add alertmanager integration for heartbeat support (#2807)
Related to https://github.com/grafana/oncall/issues/2801 and
https://github.com/grafana/support-escalations/issues/7081.

---------

Co-authored-by: Innokentii Konstantinov <innokenty.konstantinov@grafana.com>
2023-08-17 13:22:37 +00:00
Ildar Iskhakov
fd19dd422a
Use periodic task for heartbeats (#2723)
# What this PR does

## Which issue(s) this PR fixes

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)

---------

Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
Co-authored-by: Michael Derynck <michael.derynck@grafana.com>
2023-08-10 02:25:00 +00:00
Innokentii Konstantinov
5538a80ba1 Fix GrafanaAlertingAPIView 2023-08-01 16:00:50 +08:00
Innokentii Konstantinov
7c4f72c348 Fix num_firing/resolved calculation 2023-08-01 13:37:31 +08:00
Innokentii Konstantinov
abca37e621
Polish amv2 (#2701) 2023-08-01 13:13:58 +08:00
Innokentii Konstantinov
1ccb9d6979
AlertManager v2 (#2643)
Introduce AlertManager v2 integration with improved internal behaviour

it's using grouping from AlertManager, not trying to re-group alerts on
OnCall side.
Existing AlertManager and Grafana Alerting integrations are marked as
Legacy with options to migrate them manually now or be migrated
automatically after DEPRECATION DATE(TBD).
Integration urls and public api responses stay the same both for legacy
and new integrations.

---------

Co-authored-by: Rares Mardare <rares.mardare@grafana.com>
Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
2023-08-01 12:18:52 +08:00
Vadim Stepanov
f977f9faee
Minor formatting changes (#2641)
# What this PR does

- Updates `black` and `flake8` to latest
- Removes `F541` from flake8 ignore (`F541 f-string is missing
placeholders`)
- Enables ["float to top"
option](https://pycqa.github.io/isort/docs/configuration/options.html#float-to-top)
for `isort`
2023-07-26 14:45:44 +01:00
Vadim Stepanov
b2f4ffb98a
apps.get_model -> import (#2619)
# What this PR does

Remove
[`apps.get_model`](https://docs.djangoproject.com/en/3.2/ref/applications/#django.apps.apps.get_model)
invocations and use inline `import` statements in places where models
are imported within functions/methods to avoid circular imports.

I believe `import` statements are more appropriate for most use cases as
they allow for better static code analysis & formatting, and solve the
issue of circular imports without being unnecessarily dynamic as
`apps.get_model`. With `import` statements, it's possible to:

- Jump to model definitions in most IDEs
- Automatically sort inline imports with `isort`
- Find import errors faster/easier (most IDEs highlight broken imports)
- Have more consistency across regular & inline imports when importing
models

This PR also adds a flake8 rule to ban imports of `django.apps.apps`, so
it's harder to use `apps.get_model` by mistake (it's possible to ignore
this rule by using `# noqa: I251`). The rule is not enforced on
directories with migration files, because `apps.get_model` is often used
to get a historical state of a model, which is useful when writing
migrations ([see this SO answer for more
details](https://stackoverflow.com/a/37769213)). So `apps.get_model` is
considered OK in migrations (even necessary in some cases).

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-07-25 09:43:23 +00:00
Innokentii Konstantinov
6dcaf52efb
Remove integration html instructions (#2627)
Remove integration html instructions, They were migrated to the docs
2023-07-25 04:31:28 +00:00
Joey Orlando
63ac0972c5
remove deprecated heartbeat_heartbeat table/model (#2534)
# What this PR does

- Remove `heartbeat_heartbeat` table. This model/table does not seems to
be deprecated/used anywhere (no data in this in production/staging; see
more comments in the code about this).
2023-07-17 01:38:04 -04:00
Innokentii Konstantinov
ab7cd0aec2 Fix amv2 2023-06-14 14:43:00 +08:00
Innokentii Konstantinov
f0f2e7c8c6
Draft AlertManager integration v2 (#2167)
# What this PR does
Introduces AlertManagerV2 integration with better grouping and
autoresolving, not intended for production use yet.

---------

Co-authored-by: Ildar Iskhakov <Ildar.iskhakov@grafana.com>
2023-06-13 07:10:38 +00:00
Innokentii Konstantinov
056b0ddc7e
Add ratelimit for AmazonSNS (#2032)
Adds a ratelimit for AmazonSNS. 
AlertChannelDefining mixin is now injecting alert_receive_channel only
in request, not in kwargs to not to break AmazonSNS.
2023-05-26 09:57:26 +00:00
Vadim Stepanov
b8f54f1c53
Add docs & logo for AppDynamics integration (#1916)
# What this PR does
Adds docs & logo for AppDynamics integration. 
Main PR in private repo:
https://github.com/grafana/oncall-private/pull/1790.

## Which issue(s) this PR fixes
https://github.com/grafana/oncall-private/issues/1621

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- No changelog (AppDynamics integration will be only available in cloud)
2023-05-11 16:41:51 +00:00
Joey Orlando
014a9c2ec2
allow the POST incoming alert endpoints to queue create_alert tasks independent of the database status (#1896)
# What this PR does

https://www.loom.com/share/18cc445117de4895a10892d56c7d3699

In preparation to upgrade our cloud databases, this PR makes some minor
changes which, after testing locally, allowed the `POST
/<integration_type>/<alert_channel_key>` endpoints to successfully
receive incoming alerts and queue the celery tasks.

I've tested all of the defined `POST
/integrations/v1/<integration_type>/<alert_channel_key>` endpoints by
sending `POST` requests to an integrations' URL while the MySQL database
was down, bringing the database back up, and ensuring the alerts were
created.

## Some other findings
- the integration heartbeat endpoints will not work as we interact w/
the database to persist the incoming heartbeat instance
- if the integration was created in the last 180 seconds, incoming
alerts will fail due to the way we cache the integration IDs
([code](https://github.com/grafana/oncall/blob/dev/engine/apps/integrations/mixins/alert_channel_defining_mixin.py#L47-L50))
- The `create_alert` celery task is set to `max_retries=None` and
`retry_backoff=True`. This means that the queued tasks will continue
retrying forever w/ an exponential backoff, until the alerts can be
created in the database (ie. when the database is back online).

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated (N/A)
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required) (N/A)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required) (N/A)
2023-05-10 12:36:23 +00:00
Oleg Zaytsev
41f7c23c65
Fix and tidy alertmanager heartbeat template (#1865)
# What this PR does


There was an unnecessary indentation in the `rules:` key which made it
invalid YAML.

Also replaced the mentions to Amixr with Grafana OnCall, used some
`<code>` tags and reworded some sentences.

Also removed the anchor tag from the webhook link: we don't want people
to follow that in their browser, we want them to copy it

## Result screenshot


![image](https://user-images.githubusercontent.com/1511481/236173565-b5201b81-4d69-4d0b-944a-a2106f8fbab3.png)

## Which issue(s) this PR fixes

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)

---------

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
2023-05-05 00:25:05 +00:00
Vadim Stepanov
d198b932c1
Zendesk inbound integration docs (#1860)
# What this PR does
Add docs & logo for Zendesk integration. Main PR in private repo:
https://github.com/grafana/oncall-private/pull/1772

## Which issue(s) this PR fixes
https://github.com/grafana/oncall-private/issues/1627

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] No changelog (Zendesk integration will be only available in cloud)
2023-05-03 11:38:07 +01:00
Vadim Stepanov
50eb1fed5d
Jira inbound integration docs (#1842)
# What this PR does
Add docs & logo for Jira integration. Main PR in private repo:
https://github.com/grafana/oncall-private/pull/1769

## Which issue(s) this PR fixes
https://github.com/grafana/oncall-private/issues/1620

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] No changelog (Jira integration will be only available in cloud)
2023-05-02 09:37:49 +00:00
Ildar Iskhakov
6e61643750
Limit number of alertmanager alerts in alert group to autoresolve (#1779)
# What this PR does

This PR set the limit so that workers won't attempt to autoresolve too
big alertmanager alert groups.

## Which issue(s) this PR fixes

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-04-24 05:38:21 +00:00
Vadim Stepanov
ea60c0d247
Inbound email integration (#837)
This PR add Inbound Email integration.

It designed to support some variety of ESPs, but in prod we will use
Mailgun, so locally I tested it only with mailgun ESP.

**Important:**
To make it work on different clusters I'm planning to provide different
email domains for different regions, like ....@us.oncall.grafana.net,
...@eu.oncall.grafana.net

---------

Co-authored-by: Innokentii Konstantinov <innokenty.konstantinov@grafana.com>
2023-03-16 13:59:21 +08:00
Kristian Bremberg
b6d65ebb66
Chore: add integrity hash to templates (#1473)
# What this PR does

Adds integrity hash for scripts loaded from CDN's.
2023-03-07 11:17:07 +00:00
Michael Derynck
49946e6a4e
Change Organization Deleted/Moved Precedence (#1402)
# What this PR does
When an organization is migrated to a different cluster it has it's
`migration_destination_slug` set for redirection purposes but it also
needs to be deleted so scheduled tasks for it do not run in the old
cluster. By changing the order so moved has precedence over deleted API
calls will be correctly redirected for moved organizations while the
organization is still considered deleted to suppress tasks that are no
longer needed in the old cluster.

## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-02-24 11:45:21 +00:00
Innokentii Konstantinov
c733d8b9f2
Cleanup ScenarioStep (#1213)
# What this PR does
This PR cleanup ScenarioStep. It's needed to simplify moving Slack to
the messaging backends in future.

1. Introduce AlertGroupSlackService to move logic from ScenarioStep.
Also it allowed to get rid of importing ScenarioSteps in the code not
related to processing of slack callbacks.
2. Remove tags from ScenarioSteps, they are unused.
3. Remove ScenarioStep.dispatch method. It just was calling
ScenarioStep.process_scenario.
4. Remove "action" param from process_scenario, it was unused.
5. Remove creation of SlackActionRecord on handling SlackEvents. We are
not using it, but it generates INSERT query on most of the user-slack
interactions.
6. Remove "random_prefix_for_routing" from ScenarioStep, it was unused.
## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated

---------

Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
2023-02-21 20:22:11 +01:00
Matias Bordese
90def88752
Add escalation chain option when creating a direct page alert group (#1143)
Also changes the default integration used when creating an alert group
for a direct page to a custom manual integration to avoid
conflicts/unexpected behaviors with existing manual alerts.
2023-01-18 12:58:26 -03:00
Innokentii Konstantinov
8abbcee050
Org soft-delete (#1073)
# What this PR does
It introduces soft-delete of organization, since grafana stacks are
soft-deleted too. Also, we had a problem with deleting orgs with large
amounts of alerts, so soft-deletion will fix this problem. I think, that
problem of cleaning alerts of deleted orgs should be solved as a part of
alert retention
2023-01-05 12:42:55 +08:00
Ildar Iskhakov
1ff0a7da99
1.1.5.5 -> dev (#1060)
# What this PR does

## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated

Co-authored-by: Vadim Stepanov <vadimkerr@gmail.com>
Co-authored-by: Julia <ferril.darkdiver@gmail.com>
Co-authored-by: Innokentii Konstantinov <innokenty.konstantinov@grafana.com>
Co-authored-by: Matias Bordese <mbordese@gmail.com>
2023-01-03 11:57:16 +08:00
Michael Derynck
6267e31b22 Check id instead of object to avoid unnecessary query 2022-10-28 15:45:51 -06:00
Michael Derynck
febe1b2185 Add basic organization moved exception handling and middleware 2022-10-20 15:04:58 -06:00
Vadim Stepanov
e67d3519fe
Restore email notifications (#621)
* remove email verification related code

* remove email verification related code

* remove sendgrid callback

* remove sendgrid related code

* remove sendgrid related code

* rename sendgrid app to email

* remove email from built-in channels

* remove email from built-in channels

* remove email from built-in channels

* add email backend: https://github.com/grafana/oncall/pull/50

* add email templater

* add email templater

* convert md to html

* add email settings to live settings

* use task to send email, handle some exceptions to create logs

* remove ERROR_NOTIFICATION_MAIL_DELIVERY_FAILED usage

* add email limit logic

* fix tests

* add docs

* remove old email templates

* remove old email templates

* add template_fields to messaging backend

* add messaging backends templates to public api

* add comment for deprecated fields

* fix test

* fix tests

* disable email by default

* don't retry on SMTPException and TimeoutError

* add tests

* bring email back to public api docs

* return ERROR_NOTIFICATION_MAIL_LIMIT_EXCEEDED

* make template_fields tuple

* build_subject_and_title -> build_subject_and_message

* add one more comment about template deprecation

* use 8 as backend id

* add comment about gaierror and BadHeaderError

* add comment on importing in notify_user_async

* edit oss docs
2022-10-19 12:32:56 +01:00
Michael Derynck
5f5f427c9f Add middleware to catch exception for missing integration, reduce spamminess of logs 2022-10-13 17:18:22 -06:00
Vadim Stepanov
b84b174e20
Allow multiple database and celery broker types (#582)
* add libs for celery + redis

* move redis & cache config to settings/base.py

* move rmq & celery config to settings/base.py

* BROKER -> BROKER_TYPE

* allow multiple database types

* flake8

* add sqlite db creation to dockerfile

* fix ci

* fix ci

* debug

* remove some defaults

* remove prints

* use local memory as cache on ci

* debug

* add DATABASE_DEFAULTS

* add ci test for sqlite + redis

* add ci test for sqlite + redis

* add ci test for sqlite + redis

* debug

* add redis healthcheck

* fix sqlite

* fix dev settings

* refactor dev settings

* tweak ci settings

* clear cache properly between tests

* move db and broker types to constants

* add librabbitmq deps

* use amqp instead of librabbitmq
2022-10-04 09:25:53 +01:00
Michael Derynck
ce8f4e53fa
Conform URLs (#281)
* Make any URLs build from env vars tolerant of path prefix, trailing/leading slashes

* Add comment

* Lint
2022-07-25 09:12:50 -06:00
Roman Pertl
7f6077e07f
Fix Typo (HearBeat vs HeartBeat) 2022-06-25 11:19:40 +02:00