Commit graph

90 commits

Author SHA1 Message Date
Vadim Stepanov
8188dd5dd2
Create missing direct paging integrations (#3468)
# What this PR does

Makes organization sync create direct paging integrations for Grafana
teams that don't have one.

## Which issue(s) this PR fixes

Related to https://github.com/grafana/oncall-private/issues/2302

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-11-30 17:18:18 +00:00
Joey Orlando
7c4b40a046
upgrade to Python 3.12 (#3456)
# What this PR does

Upgrade to Python 3.12 + fix several invalid test assertions that lead
to test failures in the latest version of `pytest`:
```
AttributeError: 'called_once_with' is not a valid assertion. Use a spec for the mock if 'called_once_with' is meant to be an attribute.. Did you mean: 'assert_called_once_with'?
```

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-11-30 13:47:41 +00:00
Vadim Stepanov
381a9ecf54
Delete duplicate direct paging integrations (#3412)
# What this PR does

Deletes duplicate direct paging integrations (i.e. keeps only the first
direct paging integration per team).
Also adds a unique constraint that will make such duplicates impossible
at the DB level.

## Which issue(s) this PR fixes

Related to https://github.com/grafana/oncall-private/issues/2302

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-11-30 11:19:12 +00:00
Matias Bordese
7aa78f5f73
Enable flake8-bugbear, fix issues (#3454)
Enables [flake8-bugbear](https://github.com/PyCQA/flake8-bugbear),
checking for bugs/design problems, and [fixes the issues
found](https://pastebin.com/fEDBz6Ta) (some interesting ones,
particularly with mutable args).

Related to https://github.com/grafana/oncall/pull/3448
2023-11-29 15:04:48 +00:00
Matias Bordese
d730f6b2bf
Trigger distribute task after alert is committed (#3420)
Fix issue triggering task retries because alert is not yet committed to
the DB.
Similar to https://github.com/grafana/oncall/pull/3001.
2023-11-24 12:02:32 +00:00
Vadim Stepanov
cb2d4fa76b
Fix deleting integrations with duplicate names (#3397)
# What this PR does

Fixes a bug when it's not possible to delete two or more integrations
having the same name at once.

## Which issue(s) this PR fixes

https://github.com/grafana/oncall-private/issues/2313

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-11-21 12:44:21 +00:00
Yulya Artyukhina
d7d5c3aa28
Fix acknowledge reminder (#3345)
# What this PR does
Fix acknowledge reminder:
- check if organization was deleted
- improve logging

## Which issue(s) this PR fixes

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-11-14 13:39:27 +00:00
Ildar Iskhakov
784c5ee7c1
Add notifications success ratio log to auditor (#3312)
# What this PR does

This PR adds alert groups success ratio over last 48 hours

## Which issue(s) this PR fixes

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-11-10 16:39:13 +08:00
Vadim Stepanov
456829f768
Pass all integration labels down to alert groups (#3302)
Reverts grafana/oncall#3301
2023-11-08 14:04:58 +00:00
Yulya Artyukhina
7552de13e5
Add a command to continue escalations for alert groups (#3283)
# What this PR does
Add an ability to continue escalations for alert groups from the point
it was in case if it was stopped

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-11-07 13:44:23 +00:00
Matias Bordese
cc9dc66437
Move cache clear to fixtures, fix some deprecation notices (#3269) 2023-11-06 16:52:50 +00:00
Joey Orlando
2cbb20601e
Improve performance of GET /users and GET /teams endpoints used by add responders popup (#3241)
# What this PR does

- Improve performance of the specific `GET /users` and `GET /teams`
calls that're made by the Add Responders dropdown in the UI
- Add `GET /team/{teamId}` internal API route (needed by Grafana
Incident team for their Add Responders changes)
- Some UI improvements to the Add Responders popup (loading state +
pre-fetch users and teams when the drawer is opened)
- Re-enable django-admin only if `settings.SILK_PROFILER_ENABLED ==
True` (need to be able to log into django admin to auth to silk routes)

Closes #3231 
Closes https://github.com/grafana/oncall-private/issues/2252

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-11-03 12:40:54 -04:00
Joey Orlando
d49ccef8cb
address some minor direct paging backend issues (#3208)
# Which issue(s) this PR fixes

- Fixes an issue where if the user does not appear in the
`UserHasNotification` query, we don't actually unpage the user and
therefore they still show up in the `paged_users` array. (unpaging ==
creating a `AlertGroupLogRecord.TYPE_UNPAGE_USER` log record)
- Fixes an issue where if a user is paged multiple times, they would
currently show up in `paged_users` > 1

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-10-27 20:47:00 +00:00
Joey Orlando
697248dc75
Add responders improvements (#3128)
# What this PR does

https://www.loom.com/share/c5e10b5ec51343d0954c6f41cfd6a5fb

## Summary of backend changes
- Add `AlertReceiveChannel.get_orgs_direct_paging_integrations` method
and `AlertReceiveChannel.is_contactable` property. These are needed to
be able to (optionally) filter down teams, in the `GET /teams` internal
API endpoint
([here](https://github.com/grafana/oncall/pull/3128/files#diff-a4bd76e557f7e11dafb28a52c1034c075028c693b3c12d702d53c07fc6f24c05R55-R63)),
to just teams that have a "contactable" Direct Paging integration
- `engine/apps/alerts/paging.py`
- update these functions to support new UX. In short `direct_paging` no
longer takes a list of `ScheduleNotifications` or an `EscalationChain`
object
  - add `user_is_oncall` helper function
- add `_construct_title` helper function. In short if no `title` is
provided, which is the case for Direct Pages originating from OnCall
(either UI or Slack), then the format is `f"{from_user.username} is
paging <team.name (if team is specified> <comma separated list of
user.usernames> to join escalation"`
- `engine/apps/api/serializers/team.py` - add
`number_of_users_currently_oncall` attribute to response schema
([code](https://github.com/grafana/oncall/pull/3128/files#diff-26af48f796c9e987a76447586dd0f92349783d6ea6a0b6039a2f0f28bd58c2ebR45-R52))
- `engine/apps/api/serializers/user.py` - add `is_currently_oncall`
attribute to response schema
([code](https://github.com/grafana/oncall/pull/3128/files#diff-6744b5544ebb120437af98a996da5ad7d48ee1139a6112c7e3904010ab98f232R157-R162))
- `engine/apps/api/views/team.py` - add support for two new optional
query params `only_include_notifiable_teams` and `include_no_team`
([code](https://github.com/grafana/oncall/pull/3128/files#diff-a4bd76e557f7e11dafb28a52c1034c075028c693b3c12d702d53c07fc6f24c05R55-R70))
- `engine/apps/api/views/user.py`
- in the `GET /users` internal API endpoint, when specifying the
`search` query param now also search on `teams__name`
([code](https://github.com/grafana/oncall/pull/3128/files#diff-30309629484ad28e6fe09816e1bd226226d652ea977b6f3b6775976c729bf4b5R223);
this is a new UX requirement)
- add support for a new optional query param, `is_currently_oncall`, to
allow filtering users based on.. whether they are currently on call or
not
([code](https://github.com/grafana/oncall/pull/3128/files#diff-30309629484ad28e6fe09816e1bd226226d652ea977b6f3b6775976c729bf4b5R272-R282))
- remove `check_availability` endpoint (no longer used with new UX; also
removed references in frontend code)
- `engine/apps/slack/scenarios/paging.py` and
`engine/apps/slack/scenarios/manage_responders.py` - update Slack
workflows to support new UX. Schedules are no longer a concept here.
When creating a new alert group via `/escalate` the user either
specifies a team and/or user(s) (they must specify at least one of the
two and validation is done here to check this). When adding responders
to an existing alert group it's simply a list of users that they can
add, no more schedules.
- add `Organization.slack_is_configured` and
`Organization.telegram_is_configured` properties. These are needed to
support [this new functionality
](https://github.com/grafana/oncall/pull/3128/files#diff-9d96504027309f2bd1e95352bac1433b09b60eb4fafb611b52a6c15ed16cbc48R271-R272)
in the `AlertReceiveChannel` model.

## Summary of frontend changes
- Refactor/rename `EscalationVariants` component to `AddResponders` +
remove `grafana-plugin/src/containers/UserWarningModal` (no longer
needed with new UX)
- Remove `grafana-plugin/src/models/user.ts` as it seemed to be a
duplicate of `grafana-plugin/src/models/user/user.types.ts`

Related to https://github.com/grafana/incident/issues/4278

- Closes #3115
- Closes #3116
- Closes #3117
- Closes #3118 
- Closes #3177 

## TODO
- [x] make frontend changes
- [x] update Slack backend functionality
- [x] update public documentation
- [x] add/update e2e tests

## Post-deploy To-dos
- [ ] update dev/ops/production Slack bots to update `/escalate` command
description (should now say "Direct page a team or user(s)")

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-10-27 12:12:07 -04:00
Vadim Stepanov
1acb7018d0
Improve alert group deletion API (#3124)
# What this PR does

- Invalidate alert group cache on wipe
- Improve public API docs on alert group deletion
- Add / improve tests

## Which issue(s) this PR fixes

Related to https://github.com/grafana/oncall/issues/3051

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-10-05 14:32:40 +01:00
Vadim Stepanov
a727450d49
Public API: Acknowledge & Resolve actions (#3108)
# What this PR does

Makes it possible to acknowledge/unacknowledge and resolve/unresolve
alert groups via public API, and makes sure these actions are reflected
properly in the alert group timeline.

## Demo

```bash
curl --request POST \
     --header "Authorization: TOKEN" \
     http://localhost:8080/api/v1/alert_groups/IQMHLV8INB24N/resolve
```

<img width="651" alt="Screenshot 2023-10-04 at 16 05 27"
src="https://github.com/grafana/oncall/assets/20116910/d4e66868-0132-4b6b-95c7-8424fced7c0b">

## Which issue(s) this PR fixes

https://github.com/grafana/oncall/issues/3051

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-10-05 09:46:48 +01:00
Matias Bordese
29eae9b2c6
Fix slack notification for shift end affected by taken swap (#3092)
Related to https://github.com/grafana/oncall/issues/3096
2023-10-02 12:56:07 +00:00
Vadim Stepanov
6caacf4048
Handle Slack ratelimit on alert group deletion (#3038)
# What this PR does

- gracefully retry
`apps.alerts.tasks.delete_alert_group.delete_alert_group` when hitting
Slack ratelimits
- remove Slack messages from the DB as soon as they are deleted from
Slack, so the tasks are not retrying perpetually

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-09-19 08:41:47 +00:00
Vadim Stepanov
8b2212c7dc
Improve Slack error handling (#3000)
# What this PR does

- Rename `SlackClientWithErrorHandling` to just `SlackClient`
- Add more error classes + improve the way errors are raised based on
the Slack error code
- Add API call retries on Slack server errors (e.g. when Slack returns
`5xx` errors)
- Refactor some methods working with Slack API + add tests

## Which issue(s) this PR fixes

- https://github.com/grafana/oncall-private/issues/1837
- https://github.com/grafana/oncall-private/issues/1840
- https://github.com/grafana/oncall-private/issues/1842

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-09-12 09:49:16 +00:00
Matias Bordese
bf4d948449
Update slack schedule shift change notification (#2949)
Related to https://github.com/grafana/oncall/issues/2916

Updated notification:

![slack-shift-notification](https://github.com/grafana/oncall/assets/260710/825fda59-6636-44c1-9740-8976e7c109a7)
2023-09-07 13:00:12 +00:00
Yulya Artyukhina
6ff61ad172
Fix escalation step "Notify if num alerts in time window" (#2965)
# What this PR does
Fix escalation step "Notify if num alerts in time window" when
escalation policy was deleted during the escalation

## Which issue(s) this PR fixes
https://github.com/grafana/oncall-private/issues/2017

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-09-05 10:32:59 +00:00
Joey Orlando
a9155130df
update slack_sdk dependency to latest version (#2947)
# What this PR does

- update `slackclient` dependency to latest version. The version we were
using was 5 years old 😲
- first followed the v2 migration guide
[here](https://github.com/slackapi/python-slack-sdk/wiki/Migrating-to-2.x)
followed by the v3 migration guide
[here](https://slack.dev/python-slack-sdk/v3-migration/). The main
changes were:
    - The PyPI project was renamed from `slackclient` to `slack_sdk`
- it is discouraged/harder to call `api_call` and encouraged to call the
helper methods (ex. `chat_postMessage`;
[note](https://github.com/slackapi/python-slack-sdk/wiki/Migrating-to-2.x#web-client-api-changes)
in migration guide docs)
- In 1.x, a failed api call would return the error payload to you and
have you handle the error. In 2.x, a failed api call will throw an
exception. To handle this in your code, you will have to wrap api calls
with a try except block. Since we overload `WebClient.api_call` this was
an easy change and only required a one line change
- remove `apps.slack.slack_client.slack_server.SlackClientServer` class.
The new version of `slack_sdk` handles the case that we needed to
overload for in the first place.
- merged `apps/slack/slack_client/slack_client.py` and
`apps/slack/slack_client/exceptions.py` into `apps/slack/client.py`

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-09-05 11:31:59 +02:00
Yulya Artyukhina
cc92c53f84
Fix build escalation snapshot (#2954)
# What this PR does
Fix escalation snapshot building if last notified user in escalation
step "Notify users one by one (round-robin)" was deleted
 
## Which issue(s) this PR fixes
https://github.com/grafana/oncall-private/issues/2148

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-09-04 11:10:28 +00:00
Yulya Artyukhina
a1a5adf891
Fix silence for paused escalations or alert groups with empty escalation chain (#2929)
# What this PR does
Fix silence for paused escalations or alert groups with empty escalation
chain

## Which issue(s) this PR fixes
https://github.com/grafana/oncall/issues/2912

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-08-31 11:47:13 +00:00
Yulya Artyukhina
1ff6ac3380
Set next step eta to None if escalation was paused (#2901)
# What this PR does
Set next step eta to None if escalation was paused (escalation step
"Continue escalation if num alerts > X in time window"

## Which issue(s) this PR fixes
https://github.com/grafana/oncall-private/issues/2028

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-08-29 13:46:41 +00:00
Yulya Artyukhina
5bc7351671
Fix next step eta for silenced alert groups (#2887)
# What this PR does
Update `next_step_eta` in alert group escalation snapshot when alert
group is silenced for period

## Which issue(s) this PR fixes
Fixes the issue related to [this
one](https://github.com/grafana/oncall-private/issues/2028)

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)

---------

Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
Co-authored-by: Joey Orlando <joseph.t.orlando@gmail.com>
2023-08-28 12:13:01 +00:00
Yulya Artyukhina
58a9a39efe
Improve getting/updating contact points for Grafana Alerting integration (#2742)
This PR improves Grafana Alerting integration:
- get alerting contact points "on fly" instead of keeping them in db
- add ability to connect more than one contact point
- add ability to create new contact point on create Grafana Alerting
integration
- show warnings in integration settings for non-active contact points
- remove creation alerting notification policies on create Grafana
Alerting integration

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)

---------

Co-authored-by: Rares Mardare <rares.mardare@grafana.com>
2023-08-18 12:12:29 +02:00
Matias Bordese
179a1db471
Add alertmanager integration for heartbeat support (#2807)
Related to https://github.com/grafana/oncall/issues/2801 and
https://github.com/grafana/support-escalations/issues/7081.

---------

Co-authored-by: Innokentii Konstantinov <innokenty.konstantinov@grafana.com>
2023-08-17 13:22:37 +00:00
Vadim Stepanov
6f0921a3e4
Fix Slack acknowledgment reminders (#2769)
# What this PR does

Fixes a bug with Slack acknowledgment reminders not being sent (+ some
refactoring and unit tests).

## Which issue(s) this PR fixes

https://github.com/grafana/oncall/issues/2756

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-08-11 09:41:56 +00:00
Matias Bordese
bb9f647608
Filter out untaken swaps from final schedule and shift notifications (#2748)
Avoid creating (or notifying) about potential event splits resulting
from untaken swap requests.
2023-08-04 17:43:54 +00:00
Yulya Artyukhina
0494afac85
Update schedule slack notifications (#2710)
# What this PR does

Update schedule slack notifications to use schedule final events instead
of getting events from iCal

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-08-03 12:38:01 +00:00
Joey Orlando
8db1ea5235
remove some references to amixr (#2698)
# What this PR does

Update references to amixr in various spots in the docs/code + some
`.md` IDE autoformatter changes

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated (N/A)
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-08-01 14:22:42 -04:00
Innokentii Konstantinov
1ccb9d6979
AlertManager v2 (#2643)
Introduce AlertManager v2 integration with improved internal behaviour

it's using grouping from AlertManager, not trying to re-group alerts on
OnCall side.
Existing AlertManager and Grafana Alerting integrations are marked as
Legacy with options to migrate them manually now or be migrated
automatically after DEPRECATION DATE(TBD).
Integration urls and public api responses stay the same both for legacy
and new integrations.

---------

Co-authored-by: Rares Mardare <rares.mardare@grafana.com>
Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
2023-08-01 12:18:52 +08:00
Joey Orlando
f77a54b518
Shift Swap Requests in Slack + improve typing for Slack django app (#2653)
# What this PR does

**Shift Swap Requests**

https://www.loom.com/share/860c3337b338412cbd2ac4024260f3e8?sid=3d91b558-b4de-4351-8b45-8a99b7302346

**Other**
- Drastically improve the typing in the `slack` Django app, and several
other models/functions that were consumed by logic within the `slack`
Django app (ex. setting `RelatedManager` type hints on various models)
https://www.loom.com/share/da6b9984519c48d59a45d3c93c08d7dc

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-07-28 15:11:38 +00:00
Matias Bordese
3ff6e0e492
Refactoring schedule final events (#2651)
Update `list_users_to_notify_from_ical` to use schedule final events
2023-07-28 11:59:33 +00:00
Matvey Kukuy
abc1c94355
Incident -> Alert Group (#2090)
Co-authored-by: Vadim Stepanov <vadimkerr@gmail.com>
2023-07-26 15:08:07 +00:00
Vadim Stepanov
55299995f7
Fix "Continue escalation if >X alerts per Y minutes" escalation step (#2636)
# What this PR does

Fixes a faulty escalation step "Continue escalation if >X alerts per Y
minutes".

## Which issue(s) this PR fixes

https://github.com/grafana/oncall/issues/895

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-07-26 13:33:24 +01:00
Vadim Stepanov
faa7099297
Direct paging: page if acked or silenced, show warning when resolved (#2639)
# What this PR does

The current implementation of the direct paging feature doesn't page
additional responders if the alert group is acknowledged, silenced, or
resolved, and doesn't show any warnings for such cases.
This PR makes so that adding responders for silenced & acknowledged
alert groups actually pages the selected user / schedule. For resolved
alert groups, a warning message will be shown both in web UI and Slack.

## Which issue(s) this PR fixes

Related to https://github.com/grafana/oncall/issues/2442

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-07-26 13:25:26 +01:00
Matias Bordese
5341a7ea5b
Revert "Refactoring schedule final events for reusability" (#2642)
Reverts grafana/oncall#2625

Found a small issue, reverting for now until the problem is fixed not to
block other changes
2023-07-25 17:37:33 -03:00
Matias Bordese
95723583aa
Refactoring schedule final events for reusability (#2625)
First step towards reusing `schedule.final_events` where current /
upcoming schedule shifts information is needed (eg. escalations, shift
notifications, etc)
2023-07-25 18:45:50 +00:00
Vadim Stepanov
602ed535e3
Fix duplicate orders on routes and escalation policies (#2568)
# What this PR does

Fix duplicate `order` values for models `EscalationPolicy` and
`ChannelFilter` using the same approach as
https://github.com/grafana/oncall/pull/2278.

- Make internal API action `move_to_position` a part of
[OrderedModelViewSet](https://github.com/grafana/oncall/pull/2568/files#diff-eb62521ccbcb072d1bd2156adeadae3b5895bce6d0d54432a23db3948b0ada54R11-R34),
so all ordered model views use the same logic.
- Make public API serializers for ordered models inherit from
[OrderedModelSerializer](https://github.com/grafana/oncall/pull/2568/files#diff-d749f94af5a49adaf5074325cdfad10ddd5a52dbfd78b49582867ebb9c92fae5R6-R38),
so ordered model views are consistent with each other in public API.
- Remove `order` from plugin fronted, since it's not being used
anywhere. The frontend uses step indices & `move_to_position` action
instead.
- Make escalation snapshot use step indices instead of orders, since
orders are not guaranteed to be sequential (+fix a minor off-by-one bug)

## Which issue(s) this PR fixes

https://github.com/grafana/oncall-private/issues/1680

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-07-18 17:17:53 +00:00
Joey Orlando
9cc74e5b67
remove references to AlertGroup.is_archived and AlertGroup.unarchived_objects (#2524)
# What this PR does

This is a follow up to #2502 which started to remove logic to
"archiving" alert groups. This PR:
- removes all references to `AlertGroup.is_archived` and marks the
column as deprecated. We will remove it in the next release
- removes the `AlertGroup.unarchived_objects` `Manager`
- renames the `AlertGroup.all_objects` `Manager` to `AlertGroup.objects`

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-07-18 11:48:34 +00:00
Joey Orlando
d5b43b0439
minor improvements for check_escalation_finished celery task (#2554)
# What this PR does

This PR adds some enhancements to the `check_escalation_finished` celery
task. It short-circuits auditing of an alert group if it does not have
an escalation chain associated with it. In
`EscalationSnapshotMixin.start_escalation_if_needed`
we will not set `raw_escalation_snapshot`
([here](https://github.com/grafana/oncall/blob/dev/engine/apps/alerts/escalation_snapshot/escalation_snapshot_mixin.py#L262))
in this case:
```python3
def start_escalation_if_needed(self, countdown=START_ESCALATION_DELAY, eta=None):
        """
        :type self:AlertGroup
        """
        AlertGroup = apps.get_model("alerts", "AlertGroup")

        is_on_maintenace_or_debug_mode = self.channel.maintenance_mode is not None

        if (
            self.is_restricted
            or is_on_maintenace_or_debug_mode
            or self.pause_escalation
            or not self.escalation_chain_exists <-- here
        ):
            logger.debug(
                f"Not escalating alert group w/ pk: {self.pk}\n"
                f"is_restricted: {self.is_restricted}\n"
                f"is_on_maintenace_or_debug_mode: {is_on_maintenace_or_debug_mode}\n"
                f"pause_escalation: {self.pause_escalation}\n"
                f"escalation_chain_exists: {self.escalation_chain_exists}"
            )
            return

        logger.debug(f"Start escalation for alert group with pk: {self.pk}")

        # take raw escalation snapshot from db if escalation is paused
        raw_escalation_snapshot = (
            self.build_raw_escalation_snapshot() if not self.pause_escalation else self.raw_escalation_snapshot
        )
        task_id = celery_uuid()

        AlertGroup.all_objects.filter(pk=self.pk,).update(
            active_escalation_id=task_id,
            is_escalation_finished=False,
            raw_escalation_snapshot=raw_escalation_snapshot,
        )
```

`EscalationSnapshotMixin.escalation_chain_exists` is as such:
```python3
@property
    def escalation_chain_exists(self) -> bool:
        if self.pause_escalation:
            return False
        elif not self.channel_filter:
            return False
        return self.channel_filter.escalation_chain is not None
```

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required) (N/A)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required) (N/A)
2023-07-17 14:04:53 +00:00
Vadim Stepanov
69bafb61f1
Direct paging improvements (#2537)
# What this PR does

- Deprecates `/oncall` Slack command in favour of `/esalate` (direct
paging) + fixes a regression bug in both commands
- Unifies direct paging UX across Slack & Web UI (or at least makes an
attempt to make things more similar). Kudos to @iskhakov for all the
great work on this recently!
- A bunch of minor changes that hopefully make direct paging more usable
- TODO: documentation updates will be added in a separate PR

## Screenshots

### No issues scenario

Slack:

<img width="522" alt="Screenshot 2023-07-14 at 23 53 11"
src="https://github.com/grafana/oncall/assets/20116910/ec15a18f-d817-4177-b1f2-6b89d79bb361">


Web UI: 

<img width="1172" alt="Screenshot 2023-07-14 at 23 52 25"
src="https://github.com/grafana/oncall/assets/20116910/813f967c-2fdd-4868-9287-487dbfa7cea6">


### Not configured scenario

Slack:

<img width="519" alt="Screenshot 2023-07-14 at 23 45 22"
src="https://github.com/grafana/oncall/assets/20116910/932fa05c-81ea-42ca-be80-41b05f767d3e">

Web UI:

<img width="1172" alt="Screenshot 2023-07-14 at 23 47 31"
src="https://github.com/grafana/oncall/assets/20116910/6bcb07e4-2e50-4120-9fac-be8b0277e181">

### `/oncall` deprecation warning

<img width="521" alt="Screenshot 2023-07-17 at 10 31 56"
src="https://github.com/grafana/oncall/assets/20116910/4ff28337-1693-4af0-81d9-9eda90099c1b">


## Which issue(s) this PR fixes

https://github.com/grafana/oncall/issues/2442

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-07-17 14:21:56 +01:00
Innokentii Konstantinov
aa4edad4a7
Remove INTEGRATIONS_TO_REVERSE_URL_MAP (#2533)
This PR is a step for migrating to AlertManager v2. I want to simplify
config thing a little to be able to move forward.
In INTEGRATIONS_TO_REVERSE_URL_MAP key equal value and in all cases we
just don't need it.

Public API is different, since I need this mapping for migration to
AlertManager v2 to provide backward-compatibility. It will be done in
next PRs.
2023-07-17 04:43:24 +00:00
Joey Orlando
767c5352fa
augment API response pagination attributes (#2471)
# What this PR does

This PR:
- adds a few attributes to paginated API responses
- removes channel filter "send demo alert" internal API endpoint + tests
(this endpoint was marked as deprecated + not consumed by the web UI)

With the new paginated API response schema, the web UI will no longer
need to:
- hardcode `ITEMS_PER_PAGE` for each table
- manually calculate total number of pages

(these two things ☝️ will be done in
https://github.com/grafana/oncall/issues/2476)

For `GET /api/internal/v1/alertgroups` the response will now look like
this:
```diff
{
    "next": <url> | None,
    "previous": <url> | None,
    "results": [],
++  "page_size": <int>
}
```

For all other paginated API responses, the response will now look like:
```diff
{
    "count": <int>,
    "next": <url> | None,
    "previous": <url> | None,
    "results": [],
++  "page_size": <int>,
++  "current_page_number": <int>,
++  "total_pages": <int>
}
```

## TODO
- [x] update public API docs to include these new attributes

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-07-14 11:19:40 -04:00
Joey Orlando
681087117c
remove deprecated gitops views (#2530)
# What this PR does

I don't see any traffic to these endpoint URLs over the past week. The
terraform "renderers" in `engine/apps/alerts/terraform_renderer` make a
lot of references to amixr, making think these are legacy, and can be
removed.

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated (N/A)
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required) (N/A)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required) (N/A)
2023-07-14 05:13:52 -04:00
Joey Orlando
77f6dedce5
add index on started_at column in alert groups (#2516)
# What this PR does

Adds an index on the `started_at` column in the `alerts_alertgroup`
table. For the alert groups query used by the
`check_escalation_finished_task`, this resulted in a huge performance
boost, taking the query time from 89mins to 4secs (on our largest
production dataset).

## Which issue(s) this PR fixes

closes #724
closes https://github.com/grafana/oncall-private/issues/1713

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-07-13 05:22:59 -04:00
Ildar Iskhakov
0b28815d46
Unhide direct paging integration (#2483)
# What this PR does
Fixes https://github.com/grafana/oncall/issues/2442



https://github.com/grafana/oncall/assets/2262529/08bb8e5f-acc6-4f2d-9e38-717c9f37e3da





## Which issue(s) this PR fixes

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-07-13 13:41:31 +08:00
Joey Orlando
d24dc4b630
remove organization maintenance mode + fix integration maintenance mode (#2511) 2023-07-12 16:41:44 -04:00