Related to https://github.com/grafana/oncall-private/issues/2994
- Extend gaps/empty shift checks to consider 30 days (customizable via
param, eventually make it customizable per schedule?); ie. every week
(per beat schedule), check the schedule next 30 days
- Trigger checks via async task on schedule API updates (instead of a
sync call)
- Update notifications wording / link to schedule
Related to https://github.com/grafana/oncall-private/issues/2950
- Represent missing users in schedule events (so they are displayed in
the web UI)
- Fix schedule checks for gaps/empty shifts so they send notifications
# What this PR does
- Stops writing `SlackMessage.organization` + removes references to this
field. [As we
discussed](https://raintank-corp.slack.com/archives/C083TU81TCH/p1733315887463279?thread_ts=1733311105.095309&cid=C083TU81TCH),
we do not need this field on this model/table,
`SlackMessage._slack_team_identity` is sufficient (`organization` will
be dropped in a separate PR)
- Adds a data migration script which:
- drops orphaned `SlackMessage` records; ie. ones which, even after the
[`engine/apps/slack/migrations/0007_migrate_slackmessage_channel_id.py`](https://github.com/grafana/oncall/blob/dev/engine/apps/slack/migrations/0007_migrate_slackmessage_channel_id.py)
migration, still don't have a `SlackMessage.channel` id filled in (we
discussed + agreed on dropping these records
[here](https://raintank-corp.slack.com/archives/C083TU81TCH/p1733329914516859?thread_ts=1733311105.095309&cid=C083TU81TCH))
- fills in empty `SlackMessage.slack_team_identity` values (from
`slack_message.channel.slack_team_identity`)
### Other notes
On the `organization` topic.
We store records in `SlackMessage` for two purposes (AFAICT), and in
both cases, we have references back to the `organization`:
- alert groups - `slack_message.alert_group.channel.organization`
- shift swap requests - `shift_swap_request.schedule.organization`
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] Added the relevant release notes label (see labels prefixed w/
`release:`). These labels dictate how your PR will
show up in the autogenerated release notes.
# What this PR does
Related to https://github.com/grafana/oncall-private/issues/2947
**NOTE**
This PR introduces steps 1 and 2 of the 3 part migration proposed
[here](https://raintank-corp.slack.com/archives/C06K1MQ07GS/p1732555465144099).
Step 3, swapping reads to be from the new-column and dropping
dual-writes, will be done in a future PR/release.
---
I’m tackling this work now because _ultimately_ I want to move
`AlertReceiveChannel.rate_limited_in_slack_at` to
`SlackChannel.rate_limited_at` , but first I sorta need to refactor
`SlackMessage.channel_id` from a `CHAR` field to a foreign key
relationship (because in the spots where we touch Slack rate limiting,
like
[here](https://github.com/grafana/oncall/blob/dev/engine/apps/slack/alert_group_slack_service.py#L42-L50)
for example, we only have `slack_message.channel_id`, which means I need
to do extra queries to fetch the appropriate `SlackChannel` to then be
able to get/set `SlackChannel.rate_limited_at`
Other minor stuffs:
- it also prepares us to drop `SlackMessage._slack_team_identity`. We
already have a `@property` of `SlackMessage.slack_team_identity` (which
[previously had some hacky
logic](https://github.com/grafana/oncall/blob/dev/engine/apps/slack/models/slack_message.py#L74-L84)).
I've refactored `SlackMessage.slack_team_identity` to simply point to
`self.organization.slack_team_identity` + updated our code to _stop_
setting `SlackMessage._slack_team_identity` (will drop this column in
future release)
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] Added the relevant release notes label (see labels prefixed w/
`release:`). These labels dictate how your PR will
show up in the autogenerated release notes.
# What this PR does
This older version of recurring_ical_events does not call the pytz
.normalize() function, which can cause some invalid datetime objects to
return when a DST swap happens. For example: Nov 3, 2024 9:00 AM CDT
instead of the correct 8:00 AM CST). By calling tz.normalize on the end
date and checking if the time zone information changed, we can detect
when DST starts/stops and adjust the end date accordingly.
| | DST stopping on November 3, 2024: | DST starting on March 9, 2024 |
|-|-----------------------------------------|-----------------------------------|
| Before |

|

| After |

|

|
## Which issue(s) this PR closes
Closes#5247
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] Added the relevant release notes label (see labels prefixed w/
`release:`). These labels dictate how your PR will
show up in the autogenerated release notes.
---------
Co-authored-by: Matias Bordese <mbordese@gmail.com>
# What this PR does
Closes https://github.com/grafana/irm/issues/31 (and supersedes
https://github.com/grafana/oncall/pull/4784)
Main changes:
- updates `apps.api.permissions.user_is_authorized` to check the value
of `organization.is_grafana_irm_enabled`. If it is, we check for the
presence of `grafana-irm-app` prefixed RBAC permissions rather than
`grafana-oncall-app`
- cleans-up `engine/apps/api/tests/test_permissions.py` (bulk of the
changes in the PR)
- converts `apps.user_management.models.User.build_permissions_query` to
a `UserQuerySet` method instead
- means we can now do things like this instead:
```python3
User.objects.filter_by_permission(RBACPermission.Permissions.NOTIFICATIONS_READ,
organization)
```
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] Added the relevant release notes label (see labels prefixed w/
`release:`). These labels dictate how your PR will
show up in the autogenerated release notes.
# What this PR does
Fix calculation for interval when building ical events for schedules
with end dates and masked days. Add a test which makes it easier to test
different cases that can occur.
## Which issue(s) this PR closes
Related to https://github.com/grafana/support-escalations/issues/12388
<!--
*Note*: If you want the issue to be auto-closed once the PR is merged,
change "Related to" to "Closes" in the line above.
If you have more than one GitHub issue that this PR closes, be sure to
preface
each issue link with a [closing
keyword](https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/using-keywords-in-issues-and-pull-requests#linking-a-pull-request-to-an-issue).
This ensures that the issue(s) are auto-closed once the PR has been
merged.
-->
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] Added the relevant release notes label (see labels prefixed w/
`release:`). These labels dictate how your PR will
show up in the autogenerated release notes.
---------
Co-authored-by: Matias Bordese <mbordese@gmail.com>
Related to https://github.com/grafana/oncall/issues/4936
Cached final schedule keeps a (now - 15d, now + 6m) window
representation of a schedule (this representation always use users'
username; it will un-relate users that are not anymore associated to a
schedule).
# What this PR does
Reduces number of calls to db for `/schedules`, `/alertgroups` and
`/users` endpoints.
Fixes the issue when there was an additional call to db to get
organization url to build user avatar full link.
## Which issue(s) this PR closes
Related to [issue link here]
<!--
*Note*: If you want the issue to be auto-closed once the PR is merged,
change "Related to" to "Closes" in the line above.
If you have more than one GitHub issue that this PR closes, be sure to
preface
each issue link with a [closing
keyword](https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/using-keywords-in-issues-and-pull-requests#linking-a-pull-request-to-an-issue).
This ensures that the issue(s) are auto-closed once the PR has been
merged.
-->
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] Added the relevant release notes label (see labels prefixed w/
`release:`). These labels dictate how your PR will
show up in the autogenerated release notes.
# What this PR does
Covers the case on getting cached on-call users when cached schedule was
deleted
## Which issue(s) this PR closes
Related to https://github.com/grafana/oncall-private/issues/2836
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] Added the relevant release notes label (see labels prefixed w/
`release:`). These labels dictate how your PR will
show up in the autogenerated release notes.
# What this PR does
- bumps `uwsgi` to latest version (`2.0.26`), which unblocks us from
bumping Python to 3.12
- bumps Python to 3.12.3
- refactor the Snyk GitHub Actions workflow to use the composable
actions for installed frontend and backend dependencies
- fixes several `AttributeError`s in our tests that went from a warning
to an error in Python 3.12 (see
https://github.com/python/cpython/issues/100690)
# Which issue(s) this PR closes
Closes#4358
Closes https://github.com/grafana/oncall/issues/4387
# What this PR does
## Which issue(s) this PR closes
Closes https://github.com/grafana/oncall-private/issues/2590
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required) - will be done in
https://github.com/grafana/oncall-private/issues/2591
- [x] Added the relevant release notes label (see labels prefixed w/
`release:`). These labels dictate how your PR will
show up in the autogenerated release notes. - will be done in
https://github.com/grafana/oncall-private/issues/2591
---------
Co-authored-by: Dominik <dominik.broj@grafana.com>
Co-authored-by: Maxim Mordasov <maxim.mordasov@grafana.com>
Related to https://github.com/grafana/oncall/issues/3673
Keep cache up to date on every schedule refresh task run (which should
keep cache populated every time), helping on any call using cached
information (particularly the direct paging slack dialog building).
# Which issue(s) this PR fixes
Related to https://github.com/grafana/oncall-private/issues/2363
Addresses this issue that arises when using
`cache.get_many`/`cache.set_many` operations with a Redis Cluster:
```python3
File "/usr/local/lib/python3.11/site-packages/redis/cluster.py", line 1006, in determine_slot
raise RedisClusterException(
redis.exceptions.RedisClusterException: MGET - all keys must map to the same key slot
```
From the Redis Cluster
[docs](https://redis.io/docs/reference/cluster-spec/#hash-tags), this
can be addressed with this 👇 . Basically this will ensure that keys in
multi-key operations will resolve to the same hash slot (read: node):
> Hash tags
> There is an exception for the computation of the hash slot that is
used in order to implement hash tags. Hash tags are a way to ensure that
multiple keys are allocated in the same hash slot. This is used in order
to implement multi-key operations in Redis Cluster.
>
> To implement hash tags, the hash slot for a key is computed in a
slightly different way in certain conditions. If the key contains a
"{...}" pattern only the substring between { and } is hashed in order to
obtain the hash slot. However since it is possible that there are
multiple occurrences of { or } the algorithm is well specified by the
following rules:
>
> IF the key contains a { character.
> AND IF there is a } character to the right of {.
> AND IF there are one or more characters between the first occurrence
of { and the first occurrence of }.
> Then instead of hashing the key, only what is between the first
occurrence of { and the following first occurrence of } is hashed.
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Upgrade to Python 3.12 + fix several invalid test assertions that lead
to test failures in the latest version of `pytest`:
```
AttributeError: 'called_once_with' is not a valid assertion. Use a spec for the mock if 'called_once_with' is meant to be an attribute.. Did you mean: 'assert_called_once_with'?
```
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# Which issue(s) this PR fixes
1. Enable RBAC
2. Create a schedule rotation layer which includes a user whom is Viewer
+ has role `Notifications Receiver` (this is the RBAC role we use to
filter which users show up in the user dropdown in the rotations modal
when creating a rotation)
3. The user _sorta_ shows up in the schedule but they are listed in
`missing_users`
<img width="1166" alt="Screenshot 2023-11-17 at 10 12 30"
src="https://github.com/grafana/oncall/assets/9406895/ae4d6449-3aff-4087-9b05-64645e84b40a">
<img width="1173" alt="Screenshot 2023-11-17 at 10 15 04"
src="https://github.com/grafana/oncall/assets/9406895/3ac4f0b9-49b3-4a7d-bfcf-39a8c51bbb74">
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
This PR:
- adds a new method
`apps.schedules.ical_utils.get_cached_oncall_users_for_multiple_schedules`.
In short this method basically adds a layer of caching on top of
`apps.schedules.ical_utils.get_oncall_users_for_multiple_schedules`. We
store one cache value for each schedule. Cache results are stored for 15
minutes. To me this feels like a good balance between improving
performance + returning stale results. Cache values are stored as:
- key = `f"schedule_{schedule.public_primary_key}_oncall_users"`
- value = `[user.public_primary_key for user in
schedule.currently_oncall_users]`
- adds a `DELETE /alertgroups/<id>` internal API endpoint (needed by
Grafana Incident for the Add Responders integration)
- updates the `is_currently_oncall` query parameter for the `GET /users`
internal API endpoint to return ALL users when the query param value `==
"all"`
## Which issue(s) this PR fixes
https://github.com/grafana/oncall-private/issues/2264
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
https://www.loom.com/share/c5e10b5ec51343d0954c6f41cfd6a5fb
## Summary of backend changes
- Add `AlertReceiveChannel.get_orgs_direct_paging_integrations` method
and `AlertReceiveChannel.is_contactable` property. These are needed to
be able to (optionally) filter down teams, in the `GET /teams` internal
API endpoint
([here](https://github.com/grafana/oncall/pull/3128/files#diff-a4bd76e557f7e11dafb28a52c1034c075028c693b3c12d702d53c07fc6f24c05R55-R63)),
to just teams that have a "contactable" Direct Paging integration
- `engine/apps/alerts/paging.py`
- update these functions to support new UX. In short `direct_paging` no
longer takes a list of `ScheduleNotifications` or an `EscalationChain`
object
- add `user_is_oncall` helper function
- add `_construct_title` helper function. In short if no `title` is
provided, which is the case for Direct Pages originating from OnCall
(either UI or Slack), then the format is `f"{from_user.username} is
paging <team.name (if team is specified> <comma separated list of
user.usernames> to join escalation"`
- `engine/apps/api/serializers/team.py` - add
`number_of_users_currently_oncall` attribute to response schema
([code](https://github.com/grafana/oncall/pull/3128/files#diff-26af48f796c9e987a76447586dd0f92349783d6ea6a0b6039a2f0f28bd58c2ebR45-R52))
- `engine/apps/api/serializers/user.py` - add `is_currently_oncall`
attribute to response schema
([code](https://github.com/grafana/oncall/pull/3128/files#diff-6744b5544ebb120437af98a996da5ad7d48ee1139a6112c7e3904010ab98f232R157-R162))
- `engine/apps/api/views/team.py` - add support for two new optional
query params `only_include_notifiable_teams` and `include_no_team`
([code](https://github.com/grafana/oncall/pull/3128/files#diff-a4bd76e557f7e11dafb28a52c1034c075028c693b3c12d702d53c07fc6f24c05R55-R70))
- `engine/apps/api/views/user.py`
- in the `GET /users` internal API endpoint, when specifying the
`search` query param now also search on `teams__name`
([code](https://github.com/grafana/oncall/pull/3128/files#diff-30309629484ad28e6fe09816e1bd226226d652ea977b6f3b6775976c729bf4b5R223);
this is a new UX requirement)
- add support for a new optional query param, `is_currently_oncall`, to
allow filtering users based on.. whether they are currently on call or
not
([code](https://github.com/grafana/oncall/pull/3128/files#diff-30309629484ad28e6fe09816e1bd226226d652ea977b6f3b6775976c729bf4b5R272-R282))
- remove `check_availability` endpoint (no longer used with new UX; also
removed references in frontend code)
- `engine/apps/slack/scenarios/paging.py` and
`engine/apps/slack/scenarios/manage_responders.py` - update Slack
workflows to support new UX. Schedules are no longer a concept here.
When creating a new alert group via `/escalate` the user either
specifies a team and/or user(s) (they must specify at least one of the
two and validation is done here to check this). When adding responders
to an existing alert group it's simply a list of users that they can
add, no more schedules.
- add `Organization.slack_is_configured` and
`Organization.telegram_is_configured` properties. These are needed to
support [this new functionality
](https://github.com/grafana/oncall/pull/3128/files#diff-9d96504027309f2bd1e95352bac1433b09b60eb4fafb611b52a6c15ed16cbc48R271-R272)
in the `AlertReceiveChannel` model.
## Summary of frontend changes
- Refactor/rename `EscalationVariants` component to `AddResponders` +
remove `grafana-plugin/src/containers/UserWarningModal` (no longer
needed with new UX)
- Remove `grafana-plugin/src/models/user.ts` as it seemed to be a
duplicate of `grafana-plugin/src/models/user/user.types.ts`
Related to https://github.com/grafana/incident/issues/4278
- Closes#3115
- Closes#3116
- Closes#3117
- Closes#3118
- Closes#3177
## TODO
- [x] make frontend changes
- [x] update Slack backend functionality
- [x] update public documentation
- [x] add/update e2e tests
## Post-deploy To-dos
- [ ] update dev/ops/production Slack bots to update `/escalate` command
description (should now say "Direct page a team or user(s)")
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
Related to https://github.com/grafana/oncall-private/issues/2191
This will also allow plugin to get shift name and type from the
filter_events API, without needing to get details for each involved
shift in the user's on-call summary timeline.
# What this PR does
- Rename `SlackClientWithErrorHandling` to just `SlackClient`
- Add more error classes + improve the way errors are raised based on
the Slack error code
- Add API call retries on Slack server errors (e.g. when Slack returns
`5xx` errors)
- Refactor some methods working with Slack API + add tests
## Which issue(s) this PR fixes
- https://github.com/grafana/oncall-private/issues/1837
- https://github.com/grafana/oncall-private/issues/1840
- https://github.com/grafana/oncall-private/issues/1842
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
- update `slackclient` dependency to latest version. The version we were
using was 5 years old 😲
- first followed the v2 migration guide
[here](https://github.com/slackapi/python-slack-sdk/wiki/Migrating-to-2.x)
followed by the v3 migration guide
[here](https://slack.dev/python-slack-sdk/v3-migration/). The main
changes were:
- The PyPI project was renamed from `slackclient` to `slack_sdk`
- it is discouraged/harder to call `api_call` and encouraged to call the
helper methods (ex. `chat_postMessage`;
[note](https://github.com/slackapi/python-slack-sdk/wiki/Migrating-to-2.x#web-client-api-changes)
in migration guide docs)
- In 1.x, a failed api call would return the error payload to you and
have you handle the error. In 2.x, a failed api call will throw an
exception. To handle this in your code, you will have to wrap api calls
with a try except block. Since we overload `WebClient.api_call` this was
an easy change and only required a one line change
- remove `apps.slack.slack_client.slack_server.SlackClientServer` class.
The new version of `slack_sdk` handles the case that we needed to
overload for in the first place.
- merged `apps/slack/slack_client/slack_client.py` and
`apps/slack/slack_client/exceptions.py` into `apps/slack/client.py`
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
## Which issue(s) this PR fixes
https://github.com/grafana/oncall/issues/2915
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
When an event is split because of a swap request, we were including the
original event if it was supposed to be in progress during the requested
time span.
# What this PR does
Adds SSR push notification follow-ups (similar to
https://github.com/grafana/oncall/pull/2798)
## Which issue(s) this PR fixes
https://github.com/grafana/oncall/issues/2679
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Add Slack follow-up messages for shift swap requests:
<img width="377" alt="Screenshot 2023-08-15 at 20 19 49"
src="https://github.com/grafana/oncall/assets/20116910/14053838-c8f2-49f6-81cd-383d3fbc061c">
## Which issue(s) this PR fixes
https://github.com/grafana/oncall/issues/2679
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Update schedule slack notifications to use schedule final events instead
of getting events from iCal
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Adds mobile app push notifications for shift swap requests.
## Which issue(s) this PR fixes
https://github.com/grafana/oncall/issues/2630
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
- drop `GET /api/internal/v1/shift_swaps/<id>/shifts` endpoint in favour
of adding a `shifts` property to the response schema for all shift swap
endpoints (expect `GET /api/internal/v1/shift_swaps` (ie. list all))
- Update the Slack message layout:
<img width="590" alt="Screenshot 2023-08-01 at 17 28 44"
src="https://github.com/grafana/oncall/assets/9406895/84a51614-5dd6-48ec-ae81-fef4bc32fec9">
**Note**: about the highlighted lines. This is a small issue w/ the
`ShiftSwapRequest.shifts` method. @matiasb is already helping out here 🙏
**Other stuff**
- adds some type hints related to the code I was working around with
- slightly refactor `apps.slack.utils.format_datetime_to_slack` to make
it more generic for the use case in this PR
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)