Commit graph

79 commits

Author SHA1 Message Date
Joey Orlando
2cbb20601e
Improve performance of GET /users and GET /teams endpoints used by add responders popup (#3241)
# What this PR does

- Improve performance of the specific `GET /users` and `GET /teams`
calls that're made by the Add Responders dropdown in the UI
- Add `GET /team/{teamId}` internal API route (needed by Grafana
Incident team for their Add Responders changes)
- Some UI improvements to the Add Responders popup (loading state +
pre-fetch users and teams when the drawer is opened)
- Re-enable django-admin only if `settings.SILK_PROFILER_ENABLED ==
True` (need to be able to log into django admin to auth to silk routes)

Closes #3231 
Closes https://github.com/grafana/oncall-private/issues/2252

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-11-03 12:40:54 -04:00
Joey Orlando
d49ccef8cb
address some minor direct paging backend issues (#3208)
# Which issue(s) this PR fixes

- Fixes an issue where if the user does not appear in the
`UserHasNotification` query, we don't actually unpage the user and
therefore they still show up in the `paged_users` array. (unpaging ==
creating a `AlertGroupLogRecord.TYPE_UNPAGE_USER` log record)
- Fixes an issue where if a user is paged multiple times, they would
currently show up in `paged_users` > 1

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-10-27 20:47:00 +00:00
Joey Orlando
697248dc75
Add responders improvements (#3128)
# What this PR does

https://www.loom.com/share/c5e10b5ec51343d0954c6f41cfd6a5fb

## Summary of backend changes
- Add `AlertReceiveChannel.get_orgs_direct_paging_integrations` method
and `AlertReceiveChannel.is_contactable` property. These are needed to
be able to (optionally) filter down teams, in the `GET /teams` internal
API endpoint
([here](https://github.com/grafana/oncall/pull/3128/files#diff-a4bd76e557f7e11dafb28a52c1034c075028c693b3c12d702d53c07fc6f24c05R55-R63)),
to just teams that have a "contactable" Direct Paging integration
- `engine/apps/alerts/paging.py`
- update these functions to support new UX. In short `direct_paging` no
longer takes a list of `ScheduleNotifications` or an `EscalationChain`
object
  - add `user_is_oncall` helper function
- add `_construct_title` helper function. In short if no `title` is
provided, which is the case for Direct Pages originating from OnCall
(either UI or Slack), then the format is `f"{from_user.username} is
paging <team.name (if team is specified> <comma separated list of
user.usernames> to join escalation"`
- `engine/apps/api/serializers/team.py` - add
`number_of_users_currently_oncall` attribute to response schema
([code](https://github.com/grafana/oncall/pull/3128/files#diff-26af48f796c9e987a76447586dd0f92349783d6ea6a0b6039a2f0f28bd58c2ebR45-R52))
- `engine/apps/api/serializers/user.py` - add `is_currently_oncall`
attribute to response schema
([code](https://github.com/grafana/oncall/pull/3128/files#diff-6744b5544ebb120437af98a996da5ad7d48ee1139a6112c7e3904010ab98f232R157-R162))
- `engine/apps/api/views/team.py` - add support for two new optional
query params `only_include_notifiable_teams` and `include_no_team`
([code](https://github.com/grafana/oncall/pull/3128/files#diff-a4bd76e557f7e11dafb28a52c1034c075028c693b3c12d702d53c07fc6f24c05R55-R70))
- `engine/apps/api/views/user.py`
- in the `GET /users` internal API endpoint, when specifying the
`search` query param now also search on `teams__name`
([code](https://github.com/grafana/oncall/pull/3128/files#diff-30309629484ad28e6fe09816e1bd226226d652ea977b6f3b6775976c729bf4b5R223);
this is a new UX requirement)
- add support for a new optional query param, `is_currently_oncall`, to
allow filtering users based on.. whether they are currently on call or
not
([code](https://github.com/grafana/oncall/pull/3128/files#diff-30309629484ad28e6fe09816e1bd226226d652ea977b6f3b6775976c729bf4b5R272-R282))
- remove `check_availability` endpoint (no longer used with new UX; also
removed references in frontend code)
- `engine/apps/slack/scenarios/paging.py` and
`engine/apps/slack/scenarios/manage_responders.py` - update Slack
workflows to support new UX. Schedules are no longer a concept here.
When creating a new alert group via `/escalate` the user either
specifies a team and/or user(s) (they must specify at least one of the
two and validation is done here to check this). When adding responders
to an existing alert group it's simply a list of users that they can
add, no more schedules.
- add `Organization.slack_is_configured` and
`Organization.telegram_is_configured` properties. These are needed to
support [this new functionality
](https://github.com/grafana/oncall/pull/3128/files#diff-9d96504027309f2bd1e95352bac1433b09b60eb4fafb611b52a6c15ed16cbc48R271-R272)
in the `AlertReceiveChannel` model.

## Summary of frontend changes
- Refactor/rename `EscalationVariants` component to `AddResponders` +
remove `grafana-plugin/src/containers/UserWarningModal` (no longer
needed with new UX)
- Remove `grafana-plugin/src/models/user.ts` as it seemed to be a
duplicate of `grafana-plugin/src/models/user/user.types.ts`

Related to https://github.com/grafana/incident/issues/4278

- Closes #3115
- Closes #3116
- Closes #3117
- Closes #3118 
- Closes #3177 

## TODO
- [x] make frontend changes
- [x] update Slack backend functionality
- [x] update public documentation
- [x] add/update e2e tests

## Post-deploy To-dos
- [ ] update dev/ops/production Slack bots to update `/escalate` command
description (should now say "Direct page a team or user(s)")

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-10-27 12:12:07 -04:00
Vadim Stepanov
1acb7018d0
Improve alert group deletion API (#3124)
# What this PR does

- Invalidate alert group cache on wipe
- Improve public API docs on alert group deletion
- Add / improve tests

## Which issue(s) this PR fixes

Related to https://github.com/grafana/oncall/issues/3051

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-10-05 14:32:40 +01:00
Vadim Stepanov
a727450d49
Public API: Acknowledge & Resolve actions (#3108)
# What this PR does

Makes it possible to acknowledge/unacknowledge and resolve/unresolve
alert groups via public API, and makes sure these actions are reflected
properly in the alert group timeline.

## Demo

```bash
curl --request POST \
     --header "Authorization: TOKEN" \
     http://localhost:8080/api/v1/alert_groups/IQMHLV8INB24N/resolve
```

<img width="651" alt="Screenshot 2023-10-04 at 16 05 27"
src="https://github.com/grafana/oncall/assets/20116910/d4e66868-0132-4b6b-95c7-8424fced7c0b">

## Which issue(s) this PR fixes

https://github.com/grafana/oncall/issues/3051

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-10-05 09:46:48 +01:00
Matias Bordese
29eae9b2c6
Fix slack notification for shift end affected by taken swap (#3092)
Related to https://github.com/grafana/oncall/issues/3096
2023-10-02 12:56:07 +00:00
Vadim Stepanov
6caacf4048
Handle Slack ratelimit on alert group deletion (#3038)
# What this PR does

- gracefully retry
`apps.alerts.tasks.delete_alert_group.delete_alert_group` when hitting
Slack ratelimits
- remove Slack messages from the DB as soon as they are deleted from
Slack, so the tasks are not retrying perpetually

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-09-19 08:41:47 +00:00
Vadim Stepanov
8b2212c7dc
Improve Slack error handling (#3000)
# What this PR does

- Rename `SlackClientWithErrorHandling` to just `SlackClient`
- Add more error classes + improve the way errors are raised based on
the Slack error code
- Add API call retries on Slack server errors (e.g. when Slack returns
`5xx` errors)
- Refactor some methods working with Slack API + add tests

## Which issue(s) this PR fixes

- https://github.com/grafana/oncall-private/issues/1837
- https://github.com/grafana/oncall-private/issues/1840
- https://github.com/grafana/oncall-private/issues/1842

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-09-12 09:49:16 +00:00
Matias Bordese
bf4d948449
Update slack schedule shift change notification (#2949)
Related to https://github.com/grafana/oncall/issues/2916

Updated notification:

![slack-shift-notification](https://github.com/grafana/oncall/assets/260710/825fda59-6636-44c1-9740-8976e7c109a7)
2023-09-07 13:00:12 +00:00
Yulya Artyukhina
6ff61ad172
Fix escalation step "Notify if num alerts in time window" (#2965)
# What this PR does
Fix escalation step "Notify if num alerts in time window" when
escalation policy was deleted during the escalation

## Which issue(s) this PR fixes
https://github.com/grafana/oncall-private/issues/2017

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-09-05 10:32:59 +00:00
Joey Orlando
a9155130df
update slack_sdk dependency to latest version (#2947)
# What this PR does

- update `slackclient` dependency to latest version. The version we were
using was 5 years old 😲
- first followed the v2 migration guide
[here](https://github.com/slackapi/python-slack-sdk/wiki/Migrating-to-2.x)
followed by the v3 migration guide
[here](https://slack.dev/python-slack-sdk/v3-migration/). The main
changes were:
    - The PyPI project was renamed from `slackclient` to `slack_sdk`
- it is discouraged/harder to call `api_call` and encouraged to call the
helper methods (ex. `chat_postMessage`;
[note](https://github.com/slackapi/python-slack-sdk/wiki/Migrating-to-2.x#web-client-api-changes)
in migration guide docs)
- In 1.x, a failed api call would return the error payload to you and
have you handle the error. In 2.x, a failed api call will throw an
exception. To handle this in your code, you will have to wrap api calls
with a try except block. Since we overload `WebClient.api_call` this was
an easy change and only required a one line change
- remove `apps.slack.slack_client.slack_server.SlackClientServer` class.
The new version of `slack_sdk` handles the case that we needed to
overload for in the first place.
- merged `apps/slack/slack_client/slack_client.py` and
`apps/slack/slack_client/exceptions.py` into `apps/slack/client.py`

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-09-05 11:31:59 +02:00
Yulya Artyukhina
cc92c53f84
Fix build escalation snapshot (#2954)
# What this PR does
Fix escalation snapshot building if last notified user in escalation
step "Notify users one by one (round-robin)" was deleted
 
## Which issue(s) this PR fixes
https://github.com/grafana/oncall-private/issues/2148

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-09-04 11:10:28 +00:00
Yulya Artyukhina
a1a5adf891
Fix silence for paused escalations or alert groups with empty escalation chain (#2929)
# What this PR does
Fix silence for paused escalations or alert groups with empty escalation
chain

## Which issue(s) this PR fixes
https://github.com/grafana/oncall/issues/2912

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-08-31 11:47:13 +00:00
Yulya Artyukhina
1ff6ac3380
Set next step eta to None if escalation was paused (#2901)
# What this PR does
Set next step eta to None if escalation was paused (escalation step
"Continue escalation if num alerts > X in time window"

## Which issue(s) this PR fixes
https://github.com/grafana/oncall-private/issues/2028

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-08-29 13:46:41 +00:00
Yulya Artyukhina
5bc7351671
Fix next step eta for silenced alert groups (#2887)
# What this PR does
Update `next_step_eta` in alert group escalation snapshot when alert
group is silenced for period

## Which issue(s) this PR fixes
Fixes the issue related to [this
one](https://github.com/grafana/oncall-private/issues/2028)

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)

---------

Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
Co-authored-by: Joey Orlando <joseph.t.orlando@gmail.com>
2023-08-28 12:13:01 +00:00
Yulya Artyukhina
58a9a39efe
Improve getting/updating contact points for Grafana Alerting integration (#2742)
This PR improves Grafana Alerting integration:
- get alerting contact points "on fly" instead of keeping them in db
- add ability to connect more than one contact point
- add ability to create new contact point on create Grafana Alerting
integration
- show warnings in integration settings for non-active contact points
- remove creation alerting notification policies on create Grafana
Alerting integration

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)

---------

Co-authored-by: Rares Mardare <rares.mardare@grafana.com>
2023-08-18 12:12:29 +02:00
Matias Bordese
179a1db471
Add alertmanager integration for heartbeat support (#2807)
Related to https://github.com/grafana/oncall/issues/2801 and
https://github.com/grafana/support-escalations/issues/7081.

---------

Co-authored-by: Innokentii Konstantinov <innokenty.konstantinov@grafana.com>
2023-08-17 13:22:37 +00:00
Vadim Stepanov
6f0921a3e4
Fix Slack acknowledgment reminders (#2769)
# What this PR does

Fixes a bug with Slack acknowledgment reminders not being sent (+ some
refactoring and unit tests).

## Which issue(s) this PR fixes

https://github.com/grafana/oncall/issues/2756

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-08-11 09:41:56 +00:00
Matias Bordese
bb9f647608
Filter out untaken swaps from final schedule and shift notifications (#2748)
Avoid creating (or notifying) about potential event splits resulting
from untaken swap requests.
2023-08-04 17:43:54 +00:00
Yulya Artyukhina
0494afac85
Update schedule slack notifications (#2710)
# What this PR does

Update schedule slack notifications to use schedule final events instead
of getting events from iCal

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-08-03 12:38:01 +00:00
Joey Orlando
8db1ea5235
remove some references to amixr (#2698)
# What this PR does

Update references to amixr in various spots in the docs/code + some
`.md` IDE autoformatter changes

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated (N/A)
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-08-01 14:22:42 -04:00
Innokentii Konstantinov
1ccb9d6979
AlertManager v2 (#2643)
Introduce AlertManager v2 integration with improved internal behaviour

it's using grouping from AlertManager, not trying to re-group alerts on
OnCall side.
Existing AlertManager and Grafana Alerting integrations are marked as
Legacy with options to migrate them manually now or be migrated
automatically after DEPRECATION DATE(TBD).
Integration urls and public api responses stay the same both for legacy
and new integrations.

---------

Co-authored-by: Rares Mardare <rares.mardare@grafana.com>
Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
2023-08-01 12:18:52 +08:00
Joey Orlando
f77a54b518
Shift Swap Requests in Slack + improve typing for Slack django app (#2653)
# What this PR does

**Shift Swap Requests**

https://www.loom.com/share/860c3337b338412cbd2ac4024260f3e8?sid=3d91b558-b4de-4351-8b45-8a99b7302346

**Other**
- Drastically improve the typing in the `slack` Django app, and several
other models/functions that were consumed by logic within the `slack`
Django app (ex. setting `RelatedManager` type hints on various models)
https://www.loom.com/share/da6b9984519c48d59a45d3c93c08d7dc

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-07-28 15:11:38 +00:00
Matias Bordese
3ff6e0e492
Refactoring schedule final events (#2651)
Update `list_users_to_notify_from_ical` to use schedule final events
2023-07-28 11:59:33 +00:00
Matvey Kukuy
abc1c94355
Incident -> Alert Group (#2090)
Co-authored-by: Vadim Stepanov <vadimkerr@gmail.com>
2023-07-26 15:08:07 +00:00
Vadim Stepanov
55299995f7
Fix "Continue escalation if >X alerts per Y minutes" escalation step (#2636)
# What this PR does

Fixes a faulty escalation step "Continue escalation if >X alerts per Y
minutes".

## Which issue(s) this PR fixes

https://github.com/grafana/oncall/issues/895

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-07-26 13:33:24 +01:00
Vadim Stepanov
faa7099297
Direct paging: page if acked or silenced, show warning when resolved (#2639)
# What this PR does

The current implementation of the direct paging feature doesn't page
additional responders if the alert group is acknowledged, silenced, or
resolved, and doesn't show any warnings for such cases.
This PR makes so that adding responders for silenced & acknowledged
alert groups actually pages the selected user / schedule. For resolved
alert groups, a warning message will be shown both in web UI and Slack.

## Which issue(s) this PR fixes

Related to https://github.com/grafana/oncall/issues/2442

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-07-26 13:25:26 +01:00
Matias Bordese
5341a7ea5b
Revert "Refactoring schedule final events for reusability" (#2642)
Reverts grafana/oncall#2625

Found a small issue, reverting for now until the problem is fixed not to
block other changes
2023-07-25 17:37:33 -03:00
Matias Bordese
95723583aa
Refactoring schedule final events for reusability (#2625)
First step towards reusing `schedule.final_events` where current /
upcoming schedule shifts information is needed (eg. escalations, shift
notifications, etc)
2023-07-25 18:45:50 +00:00
Vadim Stepanov
602ed535e3
Fix duplicate orders on routes and escalation policies (#2568)
# What this PR does

Fix duplicate `order` values for models `EscalationPolicy` and
`ChannelFilter` using the same approach as
https://github.com/grafana/oncall/pull/2278.

- Make internal API action `move_to_position` a part of
[OrderedModelViewSet](https://github.com/grafana/oncall/pull/2568/files#diff-eb62521ccbcb072d1bd2156adeadae3b5895bce6d0d54432a23db3948b0ada54R11-R34),
so all ordered model views use the same logic.
- Make public API serializers for ordered models inherit from
[OrderedModelSerializer](https://github.com/grafana/oncall/pull/2568/files#diff-d749f94af5a49adaf5074325cdfad10ddd5a52dbfd78b49582867ebb9c92fae5R6-R38),
so ordered model views are consistent with each other in public API.
- Remove `order` from plugin fronted, since it's not being used
anywhere. The frontend uses step indices & `move_to_position` action
instead.
- Make escalation snapshot use step indices instead of orders, since
orders are not guaranteed to be sequential (+fix a minor off-by-one bug)

## Which issue(s) this PR fixes

https://github.com/grafana/oncall-private/issues/1680

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-07-18 17:17:53 +00:00
Joey Orlando
9cc74e5b67
remove references to AlertGroup.is_archived and AlertGroup.unarchived_objects (#2524)
# What this PR does

This is a follow up to #2502 which started to remove logic to
"archiving" alert groups. This PR:
- removes all references to `AlertGroup.is_archived` and marks the
column as deprecated. We will remove it in the next release
- removes the `AlertGroup.unarchived_objects` `Manager`
- renames the `AlertGroup.all_objects` `Manager` to `AlertGroup.objects`

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-07-18 11:48:34 +00:00
Joey Orlando
d5b43b0439
minor improvements for check_escalation_finished celery task (#2554)
# What this PR does

This PR adds some enhancements to the `check_escalation_finished` celery
task. It short-circuits auditing of an alert group if it does not have
an escalation chain associated with it. In
`EscalationSnapshotMixin.start_escalation_if_needed`
we will not set `raw_escalation_snapshot`
([here](https://github.com/grafana/oncall/blob/dev/engine/apps/alerts/escalation_snapshot/escalation_snapshot_mixin.py#L262))
in this case:
```python3
def start_escalation_if_needed(self, countdown=START_ESCALATION_DELAY, eta=None):
        """
        :type self:AlertGroup
        """
        AlertGroup = apps.get_model("alerts", "AlertGroup")

        is_on_maintenace_or_debug_mode = self.channel.maintenance_mode is not None

        if (
            self.is_restricted
            or is_on_maintenace_or_debug_mode
            or self.pause_escalation
            or not self.escalation_chain_exists <-- here
        ):
            logger.debug(
                f"Not escalating alert group w/ pk: {self.pk}\n"
                f"is_restricted: {self.is_restricted}\n"
                f"is_on_maintenace_or_debug_mode: {is_on_maintenace_or_debug_mode}\n"
                f"pause_escalation: {self.pause_escalation}\n"
                f"escalation_chain_exists: {self.escalation_chain_exists}"
            )
            return

        logger.debug(f"Start escalation for alert group with pk: {self.pk}")

        # take raw escalation snapshot from db if escalation is paused
        raw_escalation_snapshot = (
            self.build_raw_escalation_snapshot() if not self.pause_escalation else self.raw_escalation_snapshot
        )
        task_id = celery_uuid()

        AlertGroup.all_objects.filter(pk=self.pk,).update(
            active_escalation_id=task_id,
            is_escalation_finished=False,
            raw_escalation_snapshot=raw_escalation_snapshot,
        )
```

`EscalationSnapshotMixin.escalation_chain_exists` is as such:
```python3
@property
    def escalation_chain_exists(self) -> bool:
        if self.pause_escalation:
            return False
        elif not self.channel_filter:
            return False
        return self.channel_filter.escalation_chain is not None
```

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required) (N/A)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required) (N/A)
2023-07-17 14:04:53 +00:00
Vadim Stepanov
69bafb61f1
Direct paging improvements (#2537)
# What this PR does

- Deprecates `/oncall` Slack command in favour of `/esalate` (direct
paging) + fixes a regression bug in both commands
- Unifies direct paging UX across Slack & Web UI (or at least makes an
attempt to make things more similar). Kudos to @iskhakov for all the
great work on this recently!
- A bunch of minor changes that hopefully make direct paging more usable
- TODO: documentation updates will be added in a separate PR

## Screenshots

### No issues scenario

Slack:

<img width="522" alt="Screenshot 2023-07-14 at 23 53 11"
src="https://github.com/grafana/oncall/assets/20116910/ec15a18f-d817-4177-b1f2-6b89d79bb361">


Web UI: 

<img width="1172" alt="Screenshot 2023-07-14 at 23 52 25"
src="https://github.com/grafana/oncall/assets/20116910/813f967c-2fdd-4868-9287-487dbfa7cea6">


### Not configured scenario

Slack:

<img width="519" alt="Screenshot 2023-07-14 at 23 45 22"
src="https://github.com/grafana/oncall/assets/20116910/932fa05c-81ea-42ca-be80-41b05f767d3e">

Web UI:

<img width="1172" alt="Screenshot 2023-07-14 at 23 47 31"
src="https://github.com/grafana/oncall/assets/20116910/6bcb07e4-2e50-4120-9fac-be8b0277e181">

### `/oncall` deprecation warning

<img width="521" alt="Screenshot 2023-07-17 at 10 31 56"
src="https://github.com/grafana/oncall/assets/20116910/4ff28337-1693-4af0-81d9-9eda90099c1b">


## Which issue(s) this PR fixes

https://github.com/grafana/oncall/issues/2442

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-07-17 14:21:56 +01:00
Innokentii Konstantinov
aa4edad4a7
Remove INTEGRATIONS_TO_REVERSE_URL_MAP (#2533)
This PR is a step for migrating to AlertManager v2. I want to simplify
config thing a little to be able to move forward.
In INTEGRATIONS_TO_REVERSE_URL_MAP key equal value and in all cases we
just don't need it.

Public API is different, since I need this mapping for migration to
AlertManager v2 to provide backward-compatibility. It will be done in
next PRs.
2023-07-17 04:43:24 +00:00
Joey Orlando
767c5352fa
augment API response pagination attributes (#2471)
# What this PR does

This PR:
- adds a few attributes to paginated API responses
- removes channel filter "send demo alert" internal API endpoint + tests
(this endpoint was marked as deprecated + not consumed by the web UI)

With the new paginated API response schema, the web UI will no longer
need to:
- hardcode `ITEMS_PER_PAGE` for each table
- manually calculate total number of pages

(these two things ☝️ will be done in
https://github.com/grafana/oncall/issues/2476)

For `GET /api/internal/v1/alertgroups` the response will now look like
this:
```diff
{
    "next": <url> | None,
    "previous": <url> | None,
    "results": [],
++  "page_size": <int>
}
```

For all other paginated API responses, the response will now look like:
```diff
{
    "count": <int>,
    "next": <url> | None,
    "previous": <url> | None,
    "results": [],
++  "page_size": <int>,
++  "current_page_number": <int>,
++  "total_pages": <int>
}
```

## TODO
- [x] update public API docs to include these new attributes

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-07-14 11:19:40 -04:00
Joey Orlando
681087117c
remove deprecated gitops views (#2530)
# What this PR does

I don't see any traffic to these endpoint URLs over the past week. The
terraform "renderers" in `engine/apps/alerts/terraform_renderer` make a
lot of references to amixr, making think these are legacy, and can be
removed.

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated (N/A)
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required) (N/A)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required) (N/A)
2023-07-14 05:13:52 -04:00
Joey Orlando
77f6dedce5
add index on started_at column in alert groups (#2516)
# What this PR does

Adds an index on the `started_at` column in the `alerts_alertgroup`
table. For the alert groups query used by the
`check_escalation_finished_task`, this resulted in a huge performance
boost, taking the query time from 89mins to 4secs (on our largest
production dataset).

## Which issue(s) this PR fixes

closes #724
closes https://github.com/grafana/oncall-private/issues/1713

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-07-13 05:22:59 -04:00
Ildar Iskhakov
0b28815d46
Unhide direct paging integration (#2483)
# What this PR does
Fixes https://github.com/grafana/oncall/issues/2442



https://github.com/grafana/oncall/assets/2262529/08bb8e5f-acc6-4f2d-9e38-717c9f37e3da





## Which issue(s) this PR fixes

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-07-13 13:41:31 +08:00
Joey Orlando
d24dc4b630
remove organization maintenance mode + fix integration maintenance mode (#2511) 2023-07-12 16:41:44 -04:00
Vadim Stepanov
3ebfd9c8b2
Remove outdated Slack functionality to create alerts (#2383)
# What this PR does

There's an outdated Slack handler that creates alerts when a file is
shared in a channel. I can't remember why it was introduced in the first
place, but I'm pretty sure that it's deprecated a long time ago, and
this feature is not documented anywhere. The PR also removes
`AlertReceiveChannel.integration_slack_channel_id` as it's no longer
used.

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-06-29 10:09:02 +00:00
Ildar Iskhakov
c71ed490b4
Optmise autoresolve query (#2273)
# What this PR does

## Which issue(s) this PR fixes

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)

---------

Co-authored-by: Innokentii Konstantinov <innokenty.konstantinov@grafana.com>
2023-06-19 04:43:46 +00:00
Joey Orlando
9dde1805aa
add mypy static type checker to backend codebase (#2151)
# What this PR does

- Adds [`mypy` static type checking](https://mypy-lang.org/) to our CI
pipeline. Currently there is still a **ton** of errors being returned by
the tool, as we'll need to fix pre-existing errors. I think we can
slowly chip away at these errors in small PRs, doing them all in one
large PR is likely very risky.
- Also, this PR starts chipping away at one of the main type errors that
we have which is accessing the `datetime` class (from the `datetime`
library) or `timedelta` function on the `django.utils.timezone` module.
Basically we should be instead accessing these two objects from the
native `datetime` module. This makes sense because the [`__all__`
attribute](https://github.com/django/django/blob/main/django/utils/timezone.py#L14-L30)
in `django.utils.timezone` does not re-export `datetime` or `timedelta`.
- splits `engine` dependencies out into `requirements.txt` and
`requirements-dev.txt`

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated (N/A)
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required) (N/A)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required) (N/A)
2023-06-12 12:50:33 -04:00
Joey Orlando
ea9b7a6331
Fix warnings when running backend tests (#2079)
# What this PR does

- update `make test` to always use `settings.ci-test`. Right now it will
use whatever the value of `DJANGO_SETTINGS_MODULE` is in
`./dev/.env.dev`, which causes ~45 tests to fail
- Fix several Python warnings that we see when running the tests
```bash
RemovedInDjango40Warning: The providing_args argument is deprecated. As it is purely documentational, it has no replacement. If you rely on this argument as documentation, you can move the text to a code comment or docstring.
    alert_create_signal = django.dispatch.Signal(
```

```bash
PytestCollectionWarning: cannot collect test class 'TestOnlyBackend' because it has a __init__ constructor (from: apps/api/tests/test_alert_receive_channel_template.py)
    class TestOnlyBackend(BaseMessagingBackend):
```

```bash
DeprecationWarning: The parameter 'use_aliases' in emoji.emojize() is deprecated and will be removed in version 2.0.0. Use language='alias' instead.
  To hide this warning, pin/downgrade the package to 'emoji~=1.6.3'
    return emoji.emojize(self.verbal_name, use_aliases=True)
```

```bash
DateTimeField CustomOnCallShift.start received a naive datetime (2023-06-01 12:53:12) while time zone support is active.
    warnings.warn("DateTimeField %s received a naive datetime (%s)"
```

```bash
apps/twilioapp/tests/test_phone_calls.py::test_resolve_by_phone
  /etc/app/apps/twilioapp/tests/test_phone_calls.py:173: DeprecationWarning: The 'text' argument to find()-type methods is deprecated. Use 'string' instead.
    content = BeautifulSoup(content, features="html.parser").findAll(text=True)
```

```bash
apps/twilioapp/tests/test_phone_calls.py::test_resolve_by_phone
apps/twilioapp/tests/test_phone_calls.py::test_wrong_pressed_digit
  /usr/local/lib/python3.11/site-packages/bs4/builder/__init__.py:545: XMLParsedAsHTMLWarning: It looks like you're parsing an XML document using an HTML parser. If this really is an HTML document (maybe it's XHTML?), you can ignore or filter this warning. If it's XML, you should know that using an XML parser will be more reliable. To parse this document as XML, make sure you have the lxml package installed, and pass the keyword argument `features="xml"` into the BeautifulSoup constructor.
```

```bash
apps/twilioapp/tests/test_phone_calls.py::test_forbidden_requests
  /usr/local/lib/python3.11/site-packages/social_django/urls.py:15: RemovedInDjango40Warning: django.conf.urls.url() is deprecated in favor of django.urls.re_path().
    url(r'^login/(?P<backend>[^/]+){0}$'.format(extra), views.auth,
```

```bash
apps/twilioapp/tests/test_phone_calls.py: 66 warnings
  /usr/local/lib/python3.11/site-packages/debug_toolbar/utils.py:255: DeprecationWarning: currentThread() is deprecated, use current_thread() instead
    thread = threading.currentThread()
```


## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-06-06 18:38:00 +00:00
Matias Bordese
0ab5312bf1
Use X-WR-TIMEZONE as calendar tz when available, fallback to UTC (#2091)
When checking current shifts to trigger slack notifications, a stable
value for timezone should be used (to avoid triggering multiple
duplicated notifications; using the first `VTIMEZONE` value available
may change on refresh, besides being potentially incorrect).

Also, `VTIMEZONE` shouldn't be used as the calendar timezone, it just
describes a [timezone
definition](https://icalendar.org/iCalendar-RFC-5545/3-6-5-time-zone-component.html)
which can later be [used when specifying a start/end
date/datetime](https://icalendar.org/iCalendar-RFC-5545/3-2-19-time-zone-identifier.html)
("The parameter MUST be specified on properties with a DATE-TIME value
if the DATE-TIME is not either a UTC or a "floating" time.").

OTOH, Google uses the non-standard[ `X-WR-TIMEZONE`
](https://github.com/collective/icalendar/issues/343)field as fallback
TZ for date-times without a `TZID`. Try to use this when available,
otherwise fallback to `UTC`.
2023-06-02 17:28:04 +00:00
Vadim Stepanov
8c8b5e0f2d
Fix demo alert for inbound email (#2081)
# What this PR does
Fix HTTP 500 when sending demo alert for inbound integration email. 

- Make all integration configs have consistent `is_demo_alert_enabled`
and `example_payload` values
- Add tests

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-06-02 09:44:32 +00:00
Matias Bordese
5d383c7d1d
Trigger slack shift notifications on current shift change (#2080)
Before this change, a diff ical check (which happens with frequency with
imported ical), particularly with overrides in an API/terraform schedule
would trigger unexpected slack notifications because the prev vs current
ical comparison will flag a diff, but when comparing current and
previous shifts, `current_shifts` will have the shift in progress while
the `prev_shifts` calculated from the overrides-only diff will most of
the time be empty (unless you set/change an override at current time).

Simplified the checks to always compare previous current shifts (ie. the
ones in the schedule from the DB) vs the recalculated ones using the
(refreshed) ical data from the schedule.
2023-06-01 16:27:14 +00:00
Matias Bordese
bd142927b5
Update find contact point name, receiver could be missing key (#2046)
Fixes issue when syncing contact points and there are receiver configs
with no `grafana_managed_receiver_configs` key.
(eg. `{"name": "autogen-contact-point-default"}`)
2023-05-29 18:52:24 +00:00
Yulya Artyukhina
15ef692009
OnCall prometheus metrics exporter (#1605)
# What this PR does
Add OnCall prometheus metrics exporter

## Which issue(s) this PR fixes

## Checklist

- [x] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated

---------

Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
Co-authored-by: Matias Bordese <mbordese@gmail.com>
2023-05-25 18:26:13 +00:00
Innokentii Konstantinov
1f786e8d2a
Phone provider refactoring (#1713)
# What this PR does
This PR moves phone notification logic into separate object PhoneBackend
and introduces PhoneProvider interface to hide actual implementation of
external phone services provider. It should allow add new phone
providers just by implementing one class (See SimplePhoneProvider for
example).
# Why 
[Asterisk PR](https://github.com/grafana/oncall/pull/1282) showed that
our phone notification system is not flexible. However this is one of
the most frequent community questions - how to add "X" phone provider.
Also, this refactoring move us one step closer to unifying all
notification backends, since with PhoneBackend all phone notification
logic is collected in one place and independent from concrete
realisation.
# Highligts
1. PhoneBackend object - contains all phone notifications business
logic.
2. PhoneProvider - interface to  external phone services provider.
3. TwilioPhoneProvider and SimplePhoneProvider - two examples of
PhoneProvider implementation.
4. PhoneCallRecord and SMSRecord models. I introduced these models to
keep phone notification limits logic decoupled from external providers.
Existing TwilioPhoneCall and TwilioSMS objects will be migrated to the
new table to not to reset limits counter. To be able to receive status
callbacks and gather from Twilio TwilioPhoneCall and TwilioSMS still
exists, but they are linked to PhoneCallRecord and SMSRecord via fk, to
not to leat twilio logic into core code.

---------

Co-authored-by: Yulia Shanyrova <yulia.shanyrova@grafana.com>
2023-05-24 06:27:48 +00:00
Ildar Iskhakov
f18858882e
Remove prints (#1924)
# What this PR does

## Which issue(s) this PR fixes

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-05-15 09:28:01 +08:00