# What this PR does
Makes organization sync create direct paging integrations for Grafana
teams that don't have one.
## Which issue(s) this PR fixes
Related to https://github.com/grafana/oncall-private/issues/2302
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Upgrade to Python 3.12 + fix several invalid test assertions that lead
to test failures in the latest version of `pytest`:
```
AttributeError: 'called_once_with' is not a valid assertion. Use a spec for the mock if 'called_once_with' is meant to be an attribute.. Did you mean: 'assert_called_once_with'?
```
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Deletes duplicate direct paging integrations (i.e. keeps only the first
direct paging integration per team).
Also adds a unique constraint that will make such duplicates impossible
at the DB level.
## Which issue(s) this PR fixes
Related to https://github.com/grafana/oncall-private/issues/2302
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Moving metrics creation into separate task to make alert ingestion more
robust
## Which issue(s) this PR fixes
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
---------
Co-authored-by: Julia <ferril.darkdiver@gmail.com>
# What this PR does
Adds an ability to extract labels from alert group payload. See
[demo](https://www.loom.com/share/cf2b746eea974547b76f44298e32a54f?sid=67ed1e58-40ed-4136-a201-6482fb7773d3).
## Which issue(s) this PR fixes
https://github.com/grafana/oncall-private/issues/2304
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
---------
Co-authored-by: Maxim Mordasov <maxim.mordasov@grafana.com>
Co-authored-by: Rares Mardare <rares.mardare@grafana.com>
# What this PR does
add more logging in `apps.alerts.tasks.notify_user.perform_notification`
task (related to https://github.com/grafana/oncall-private/issues/2318;
won't solve that issue but will help with the investigation)
# What this PR does
Fixes a bug when it's not possible to delete two or more integrations
having the same name at once.
## Which issue(s) this PR fixes
https://github.com/grafana/oncall-private/issues/2313
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
- Fix a bug when it's not possible to delete duplicate DP integrations
for "No team"
- Make so that when a team is deleted, its DP integration is deleted
automatically
## Which issue(s) this PR fixes
https://github.com/grafana/oncall-private/issues/2296
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# Which issue(s) this PR fixes
- Fix issue that was causing our openapi schema to return HTTP 500 + add
an integration test which fetches the `.yaml` schema and validates that
the endpoint returns HTTP 200 (should hopefully prevent this from
happening again).
- add a few more type hints along the way
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Allows selecting integration labels that will be passed down to alert
groups
## Which issue(s) this PR fixes
Related to https://github.com/grafana/oncall-private/issues/2179
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
---------
Co-authored-by: Maxim <maxim.mordasov@grafana.com>
# What this PR does
Fix acknowledge reminder:
- check if organization was deleted
- improve logging
## Which issue(s) this PR fixes
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
This PR adds alert groups success ratio over last 48 hours
## Which issue(s) this PR fixes
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Passes ALL integration labels down to alert groups, so it's easier to
create labels for alert groups locally.
## Which issue(s) this PR fixes
Related to https://github.com/grafana/oncall-private/issues/2179
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Second part of https://github.com/grafana/oncall/pull/3256 to remove the
deprecated `AlertGroup.is_restricted` column. That change was released
in v1.3.54.
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Add an ability to continue escalations for alert groups from the point
it was in case if it was stopped
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
This PR:
- adds a new method
`apps.schedules.ical_utils.get_cached_oncall_users_for_multiple_schedules`.
In short this method basically adds a layer of caching on top of
`apps.schedules.ical_utils.get_oncall_users_for_multiple_schedules`. We
store one cache value for each schedule. Cache results are stored for 15
minutes. To me this feels like a good balance between improving
performance + returning stale results. Cache values are stored as:
- key = `f"schedule_{schedule.public_primary_key}_oncall_users"`
- value = `[user.public_primary_key for user in
schedule.currently_oncall_users]`
- adds a `DELETE /alertgroups/<id>` internal API endpoint (needed by
Grafana Incident for the Add Responders integration)
- updates the `is_currently_oncall` query parameter for the `GET /users`
internal API endpoint to return ALL users when the query param value `==
"all"`
## Which issue(s) this PR fixes
https://github.com/grafana/oncall-private/issues/2264
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
- Improve performance of the specific `GET /users` and `GET /teams`
calls that're made by the Add Responders dropdown in the UI
- Add `GET /team/{teamId}` internal API route (needed by Grafana
Incident team for their Add Responders changes)
- Some UI improvements to the Add Responders popup (loading state +
pre-fetch users and teams when the drawer is opened)
- Re-enable django-admin only if `settings.SILK_PROFILER_ENABLED ==
True` (need to be able to log into django admin to auth to silk routes)
Closes#3231
Closes https://github.com/grafana/oncall-private/issues/2252
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
- Add a new column, `grafana_incident_id`, to the `AlertGroup` model.
For now this is really just needed to determine if the Alert Group was
created, via a Direct Page, that originated from Grafana Incident.
- I understand that these IDs may be cluster specific. For now we will
not need to make OnCall backend -> Incident API calls. Should we need to
start doing this we will likely need to start syncing the Incident
plugin's provisioned API url into the `organization` in OnCall, such
that we make the API call to the right Incident backend.
- Add two new optional request body parameters to `POST /direct_paging`,
`source_url` and `grafana_incident_id`
- `source_url` - will easily allow Grafana Incident to specify the URL
to the Incident and have this populate the "Source" button
- `grafana_incident_id` - Grafana Incident can specify this such that we
have a link back to Incident (+ we know that the Alert Group was
generated from Incident)
- Hide the "Declare Incident" button in the UI if the Alert Group was
generated from Grafana Incident
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
- add missing db migration files generated via `python manage.py
makemigrations`
- fail the `lint-migrations-backend-mysql-rabbitmq` GitHub Actions CI
job if there are missing Django database migration files
# Which issue(s) this PR fixes
- Fixes an issue where if the user does not appear in the
`UserHasNotification` query, we don't actually unpage the user and
therefore they still show up in the `paged_users` array. (unpaging ==
creating a `AlertGroupLogRecord.TYPE_UNPAGE_USER` log record)
- Fixes an issue where if a user is paged multiple times, they would
currently show up in `paged_users` > 1
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
https://www.loom.com/share/c5e10b5ec51343d0954c6f41cfd6a5fb
## Summary of backend changes
- Add `AlertReceiveChannel.get_orgs_direct_paging_integrations` method
and `AlertReceiveChannel.is_contactable` property. These are needed to
be able to (optionally) filter down teams, in the `GET /teams` internal
API endpoint
([here](https://github.com/grafana/oncall/pull/3128/files#diff-a4bd76e557f7e11dafb28a52c1034c075028c693b3c12d702d53c07fc6f24c05R55-R63)),
to just teams that have a "contactable" Direct Paging integration
- `engine/apps/alerts/paging.py`
- update these functions to support new UX. In short `direct_paging` no
longer takes a list of `ScheduleNotifications` or an `EscalationChain`
object
- add `user_is_oncall` helper function
- add `_construct_title` helper function. In short if no `title` is
provided, which is the case for Direct Pages originating from OnCall
(either UI or Slack), then the format is `f"{from_user.username} is
paging <team.name (if team is specified> <comma separated list of
user.usernames> to join escalation"`
- `engine/apps/api/serializers/team.py` - add
`number_of_users_currently_oncall` attribute to response schema
([code](https://github.com/grafana/oncall/pull/3128/files#diff-26af48f796c9e987a76447586dd0f92349783d6ea6a0b6039a2f0f28bd58c2ebR45-R52))
- `engine/apps/api/serializers/user.py` - add `is_currently_oncall`
attribute to response schema
([code](https://github.com/grafana/oncall/pull/3128/files#diff-6744b5544ebb120437af98a996da5ad7d48ee1139a6112c7e3904010ab98f232R157-R162))
- `engine/apps/api/views/team.py` - add support for two new optional
query params `only_include_notifiable_teams` and `include_no_team`
([code](https://github.com/grafana/oncall/pull/3128/files#diff-a4bd76e557f7e11dafb28a52c1034c075028c693b3c12d702d53c07fc6f24c05R55-R70))
- `engine/apps/api/views/user.py`
- in the `GET /users` internal API endpoint, when specifying the
`search` query param now also search on `teams__name`
([code](https://github.com/grafana/oncall/pull/3128/files#diff-30309629484ad28e6fe09816e1bd226226d652ea977b6f3b6775976c729bf4b5R223);
this is a new UX requirement)
- add support for a new optional query param, `is_currently_oncall`, to
allow filtering users based on.. whether they are currently on call or
not
([code](https://github.com/grafana/oncall/pull/3128/files#diff-30309629484ad28e6fe09816e1bd226226d652ea977b6f3b6775976c729bf4b5R272-R282))
- remove `check_availability` endpoint (no longer used with new UX; also
removed references in frontend code)
- `engine/apps/slack/scenarios/paging.py` and
`engine/apps/slack/scenarios/manage_responders.py` - update Slack
workflows to support new UX. Schedules are no longer a concept here.
When creating a new alert group via `/escalate` the user either
specifies a team and/or user(s) (they must specify at least one of the
two and validation is done here to check this). When adding responders
to an existing alert group it's simply a list of users that they can
add, no more schedules.
- add `Organization.slack_is_configured` and
`Organization.telegram_is_configured` properties. These are needed to
support [this new functionality
](https://github.com/grafana/oncall/pull/3128/files#diff-9d96504027309f2bd1e95352bac1433b09b60eb4fafb611b52a6c15ed16cbc48R271-R272)
in the `AlertReceiveChannel` model.
## Summary of frontend changes
- Refactor/rename `EscalationVariants` component to `AddResponders` +
remove `grafana-plugin/src/containers/UserWarningModal` (no longer
needed with new UX)
- Remove `grafana-plugin/src/models/user.ts` as it seemed to be a
duplicate of `grafana-plugin/src/models/user/user.types.ts`
Related to https://github.com/grafana/incident/issues/4278
- Closes#3115
- Closes#3116
- Closes#3117
- Closes#3118
- Closes#3177
## TODO
- [x] make frontend changes
- [x] update Slack backend functionality
- [x] update public documentation
- [x] add/update e2e tests
## Post-deploy To-dos
- [ ] update dev/ops/production Slack bots to update `/escalate` command
description (should now say "Direct page a team or user(s)")
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
## Which issue(s) this PR fixes
https://github.com/grafana/oncall-private/issues/2217
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Add filtering term length check for channel filter endpoints
## Which issue(s) this PR fixes
Closes https://github.com/grafana/oncall-private/issues/2114
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Fixes https://github.com/grafana/oncall/issues/2320
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
Prefer partial since it is the what the [docs
suggests](https://docs.djangoproject.com/en/4.2/topics/db/transactions/#performing-actions-after-commit).
Also because partial is evaluated immediately while lambda is evaluated
at runtime (which may be causing some issues):
```
>>> from functools import partial
>>> def foo(a, b, c):
... print(a, b, c)
...
>>> x = 10
>>> bar_partial = partial(foo, 1, 2, x)
>>> bar_lambda = lambda: foo(1, 2, x)
>>> x = 20
>>> bar_partial()
1 2 10
>>> bar_lambda()
1 2 20
```
# What this PR does
- Invalidate alert group cache on wipe
- Improve public API docs on alert group deletion
- Add / improve tests
## Which issue(s) this PR fixes
Related to https://github.com/grafana/oncall/issues/3051
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Makes it possible to acknowledge/unacknowledge and resolve/unresolve
alert groups via public API, and makes sure these actions are reflected
properly in the alert group timeline.
## Demo
```bash
curl --request POST \
--header "Authorization: TOKEN" \
http://localhost:8080/api/v1/alert_groups/IQMHLV8INB24N/resolve
```
<img width="651" alt="Screenshot 2023-10-04 at 16 05 27"
src="https://github.com/grafana/oncall/assets/20116910/d4e66868-0132-4b6b-95c7-8424fced7c0b">
## Which issue(s) this PR fixes
https://github.com/grafana/oncall/issues/3051
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
* Create Direct Paging integration (with default route) when team is
created with bulk_update
* Create notification policies when user is created with bulk_update
* If user notification policies are empty change it to Email
* Minor markup and wording improvements
* Add grafana queue to helm chart
* Remove disabled commands for redis helm chart
* Improve Dockerfile caching
## Which issue(s) this PR fixes
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
- gracefully retry
`apps.alerts.tasks.delete_alert_group.delete_alert_group` when hitting
Slack ratelimits
- remove Slack messages from the DB as soon as they are deleted from
Slack, so the tasks are not retrying perpetually
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
- Rename `SlackClientWithErrorHandling` to just `SlackClient`
- Add more error classes + improve the way errors are raised based on
the Slack error code
- Add API call retries on Slack server errors (e.g. when Slack returns
`5xx` errors)
- Refactor some methods working with Slack API + add tests
## Which issue(s) this PR fixes
- https://github.com/grafana/oncall-private/issues/1837
- https://github.com/grafana/oncall-private/issues/1840
- https://github.com/grafana/oncall-private/issues/1842
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Fix escalation step "Notify if num alerts in time window" when
escalation policy was deleted during the escalation
## Which issue(s) this PR fixes
https://github.com/grafana/oncall-private/issues/2017
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
- update `slackclient` dependency to latest version. The version we were
using was 5 years old 😲
- first followed the v2 migration guide
[here](https://github.com/slackapi/python-slack-sdk/wiki/Migrating-to-2.x)
followed by the v3 migration guide
[here](https://slack.dev/python-slack-sdk/v3-migration/). The main
changes were:
- The PyPI project was renamed from `slackclient` to `slack_sdk`
- it is discouraged/harder to call `api_call` and encouraged to call the
helper methods (ex. `chat_postMessage`;
[note](https://github.com/slackapi/python-slack-sdk/wiki/Migrating-to-2.x#web-client-api-changes)
in migration guide docs)
- In 1.x, a failed api call would return the error payload to you and
have you handle the error. In 2.x, a failed api call will throw an
exception. To handle this in your code, you will have to wrap api calls
with a try except block. Since we overload `WebClient.api_call` this was
an easy change and only required a one line change
- remove `apps.slack.slack_client.slack_server.SlackClientServer` class.
The new version of `slack_sdk` handles the case that we needed to
overload for in the first place.
- merged `apps/slack/slack_client/slack_client.py` and
`apps/slack/slack_client/exceptions.py` into `apps/slack/client.py`
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Refactors the foreign key relationship between `AlertGroup` and
`SlackMessage` models (see
https://github.com/grafana/oncall-private/issues/1867 for more info).
## Which issue(s) this PR fixes
https://github.com/grafana/oncall-private/issues/1867
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Fix escalation snapshot building if last notified user in escalation
step "Notify users one by one (round-robin)" was deleted
## Which issue(s) this PR fixes
https://github.com/grafana/oncall-private/issues/2148
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)