Related to https://github.com/grafana/oncall-private/issues/2950
- Represent missing users in schedule events (so they are displayed in
the web UI)
- Fix schedule checks for gaps/empty shifts so they send notifications
# What this PR does
Similar to https://github.com/grafana/oncall/pull/5199
Converts follow char fields to primary key relationships on
`SlackChannel` table:
- `ResolutionNoteSlackMessage.channel_id` ->
`ResolutionNoteSlackMessage.slack_channel`
- `ChannelFilter.slack_channel_id` -> `ChannelFilter.slack_channel`
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] Added the relevant release notes label (see labels prefixed w/
`release:`). These labels dictate how your PR will
show up in the autogenerated release notes.
# What this PR does
Closes https://github.com/grafana/irm/issues/31 (and supersedes
https://github.com/grafana/oncall/pull/4784)
Main changes:
- updates `apps.api.permissions.user_is_authorized` to check the value
of `organization.is_grafana_irm_enabled`. If it is, we check for the
presence of `grafana-irm-app` prefixed RBAC permissions rather than
`grafana-oncall-app`
- cleans-up `engine/apps/api/tests/test_permissions.py` (bulk of the
changes in the PR)
- converts `apps.user_management.models.User.build_permissions_query` to
a `UserQuerySet` method instead
- means we can now do things like this instead:
```python3
User.objects.filter_by_permission(RBACPermission.Permissions.NOTIFICATIONS_READ,
organization)
```
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] Added the relevant release notes label (see labels prefixed w/
`release:`). These labels dictate how your PR will
show up in the autogenerated release notes.
# What this PR does
Covers the case on getting cached on-call users when cached schedule was
deleted
## Which issue(s) this PR closes
Related to https://github.com/grafana/oncall-private/issues/2836
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] Added the relevant release notes label (see labels prefixed w/
`release:`). These labels dictate how your PR will
show up in the autogenerated release notes.
# What this PR does
Speed up `GET /schedules` and `GET /current_user_events` internal api
endpoints by reducing number of calls to database
## Which issue(s) this PR closes
Related to https://github.com/grafana/oncall-private/issues/1552
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] Added the relevant release notes label (see labels prefixed w/
`release:`). These labels dictate how your PR will
show up in the autogenerated release notes.
# What this PR does
Minor formatting change to the suggested REFRESH-INTERVAL of iCal
exports. DURATION units less than 1d must be prefixed by "T". Fixes
issue with Atlassian Confluence failing to subscribe to iCal URLs from
Grafana OnCall
RFC 2445 explains this a bit more clearly in section 4.3.6 on DURATION.
https://www.ietf.org/rfc/rfc2445.txt
Obviously I wish Atlassian could just be a little more forgiving in
their digestion of the iCal data since clearly Gmail and others have no
problem with it, but I doubt I'm going to get much traction with them
(we do have a case open though).
## Which issue(s) this PR fixes
I haven't logged one yet but I can if you want. Again, my main issue was
with Atlassian Confluence. Kept throwing errors of "The uploaded data
does not seem to be iCalendar content" as long as the REFRESH-INTERVAL
DURATION was less than 1 day and lacking the "T" character.
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
Related to https://github.com/grafana/oncall/issues/3673
Keep cache up to date on every schedule refresh task run (which should
keep cache populated every time), helping on any call using cached
information (particularly the direct paging slack dialog building).
# Which issue(s) this PR fixes
Related to https://github.com/grafana/oncall-private/issues/2363
Addresses this issue that arises when using
`cache.get_many`/`cache.set_many` operations with a Redis Cluster:
```python3
File "/usr/local/lib/python3.11/site-packages/redis/cluster.py", line 1006, in determine_slot
raise RedisClusterException(
redis.exceptions.RedisClusterException: MGET - all keys must map to the same key slot
```
From the Redis Cluster
[docs](https://redis.io/docs/reference/cluster-spec/#hash-tags), this
can be addressed with this 👇 . Basically this will ensure that keys in
multi-key operations will resolve to the same hash slot (read: node):
> Hash tags
> There is an exception for the computation of the hash slot that is
used in order to implement hash tags. Hash tags are a way to ensure that
multiple keys are allocated in the same hash slot. This is used in order
to implement multi-key operations in Redis Cluster.
>
> To implement hash tags, the hash slot for a key is computed in a
slightly different way in certain conditions. If the key contains a
"{...}" pattern only the substring between { and } is hashed in order to
obtain the hash slot. However since it is possible that there are
multiple occurrences of { or } the algorithm is well specified by the
following rules:
>
> IF the key contains a { character.
> AND IF there is a } character to the right of {.
> AND IF there are one or more characters between the first occurrence
of { and the first occurrence of }.
> Then instead of hashing the key, only what is between the first
occurrence of { and the following first occurrence of } is hashed.
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Upgrade to Python 3.12 + fix several invalid test assertions that lead
to test failures in the latest version of `pytest`:
```
AttributeError: 'called_once_with' is not a valid assertion. Use a spec for the mock if 'called_once_with' is meant to be an attribute.. Did you mean: 'assert_called_once_with'?
```
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# Which issue(s) this PR fixes
1. Enable RBAC
2. Create a schedule rotation layer which includes a user whom is Viewer
+ has role `Notifications Receiver` (this is the RBAC role we use to
filter which users show up in the user dropdown in the rotations modal
when creating a rotation)
3. The user _sorta_ shows up in the schedule but they are listed in
`missing_users`
<img width="1166" alt="Screenshot 2023-11-17 at 10 12 30"
src="https://github.com/grafana/oncall/assets/9406895/ae4d6449-3aff-4087-9b05-64645e84b40a">
<img width="1173" alt="Screenshot 2023-11-17 at 10 15 04"
src="https://github.com/grafana/oncall/assets/9406895/3ac4f0b9-49b3-4a7d-bfcf-39a8c51bbb74">
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
This PR:
- adds a new method
`apps.schedules.ical_utils.get_cached_oncall_users_for_multiple_schedules`.
In short this method basically adds a layer of caching on top of
`apps.schedules.ical_utils.get_oncall_users_for_multiple_schedules`. We
store one cache value for each schedule. Cache results are stored for 15
minutes. To me this feels like a good balance between improving
performance + returning stale results. Cache values are stored as:
- key = `f"schedule_{schedule.public_primary_key}_oncall_users"`
- value = `[user.public_primary_key for user in
schedule.currently_oncall_users]`
- adds a `DELETE /alertgroups/<id>` internal API endpoint (needed by
Grafana Incident for the Add Responders integration)
- updates the `is_currently_oncall` query parameter for the `GET /users`
internal API endpoint to return ALL users when the query param value `==
"all"`
## Which issue(s) this PR fixes
https://github.com/grafana/oncall-private/issues/2264
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
https://www.loom.com/share/c5e10b5ec51343d0954c6f41cfd6a5fb
## Summary of backend changes
- Add `AlertReceiveChannel.get_orgs_direct_paging_integrations` method
and `AlertReceiveChannel.is_contactable` property. These are needed to
be able to (optionally) filter down teams, in the `GET /teams` internal
API endpoint
([here](https://github.com/grafana/oncall/pull/3128/files#diff-a4bd76e557f7e11dafb28a52c1034c075028c693b3c12d702d53c07fc6f24c05R55-R63)),
to just teams that have a "contactable" Direct Paging integration
- `engine/apps/alerts/paging.py`
- update these functions to support new UX. In short `direct_paging` no
longer takes a list of `ScheduleNotifications` or an `EscalationChain`
object
- add `user_is_oncall` helper function
- add `_construct_title` helper function. In short if no `title` is
provided, which is the case for Direct Pages originating from OnCall
(either UI or Slack), then the format is `f"{from_user.username} is
paging <team.name (if team is specified> <comma separated list of
user.usernames> to join escalation"`
- `engine/apps/api/serializers/team.py` - add
`number_of_users_currently_oncall` attribute to response schema
([code](https://github.com/grafana/oncall/pull/3128/files#diff-26af48f796c9e987a76447586dd0f92349783d6ea6a0b6039a2f0f28bd58c2ebR45-R52))
- `engine/apps/api/serializers/user.py` - add `is_currently_oncall`
attribute to response schema
([code](https://github.com/grafana/oncall/pull/3128/files#diff-6744b5544ebb120437af98a996da5ad7d48ee1139a6112c7e3904010ab98f232R157-R162))
- `engine/apps/api/views/team.py` - add support for two new optional
query params `only_include_notifiable_teams` and `include_no_team`
([code](https://github.com/grafana/oncall/pull/3128/files#diff-a4bd76e557f7e11dafb28a52c1034c075028c693b3c12d702d53c07fc6f24c05R55-R70))
- `engine/apps/api/views/user.py`
- in the `GET /users` internal API endpoint, when specifying the
`search` query param now also search on `teams__name`
([code](https://github.com/grafana/oncall/pull/3128/files#diff-30309629484ad28e6fe09816e1bd226226d652ea977b6f3b6775976c729bf4b5R223);
this is a new UX requirement)
- add support for a new optional query param, `is_currently_oncall`, to
allow filtering users based on.. whether they are currently on call or
not
([code](https://github.com/grafana/oncall/pull/3128/files#diff-30309629484ad28e6fe09816e1bd226226d652ea977b6f3b6775976c729bf4b5R272-R282))
- remove `check_availability` endpoint (no longer used with new UX; also
removed references in frontend code)
- `engine/apps/slack/scenarios/paging.py` and
`engine/apps/slack/scenarios/manage_responders.py` - update Slack
workflows to support new UX. Schedules are no longer a concept here.
When creating a new alert group via `/escalate` the user either
specifies a team and/or user(s) (they must specify at least one of the
two and validation is done here to check this). When adding responders
to an existing alert group it's simply a list of users that they can
add, no more schedules.
- add `Organization.slack_is_configured` and
`Organization.telegram_is_configured` properties. These are needed to
support [this new functionality
](https://github.com/grafana/oncall/pull/3128/files#diff-9d96504027309f2bd1e95352bac1433b09b60eb4fafb611b52a6c15ed16cbc48R271-R272)
in the `AlertReceiveChannel` model.
## Summary of frontend changes
- Refactor/rename `EscalationVariants` component to `AddResponders` +
remove `grafana-plugin/src/containers/UserWarningModal` (no longer
needed with new UX)
- Remove `grafana-plugin/src/models/user.ts` as it seemed to be a
duplicate of `grafana-plugin/src/models/user/user.types.ts`
Related to https://github.com/grafana/incident/issues/4278
- Closes#3115
- Closes#3116
- Closes#3117
- Closes#3118
- Closes#3177
## TODO
- [x] make frontend changes
- [x] update Slack backend functionality
- [x] update public documentation
- [x] add/update e2e tests
## Post-deploy To-dos
- [ ] update dev/ops/production Slack bots to update `/escalate` command
description (should now say "Direct page a team or user(s)")
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
Add composite indexes based on existing queries/usage, ensuring partial
index prefixes are useful too.
- `is_active` filtering is set in the default `User` manager
- most of our user queries are per `organization`
- multiple cases filter by `username` or `email` (most notably schedule
related queries, given the low-level backend ical representation)
Also rework how users are fetched from DB when getting users from
schedules ical representation (which was particularly slow when regex
filtering by required permission).
Related to https://github.com/grafana/oncall-private/issues/2163
# What this PR does
Update schedule slack notifications to use schedule final events instead
of getting events from iCal
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Related to https://github.com/grafana/oncall/issues/2392
- Re-enable the following `mypy` rules + fix their pre-existing errors:
- `no-redef`
- `valid-type`
- `var-annotated`
- Add stronger return typing to the `GrafanaAPIClient` by use of
generics + add some links to documentation in the method docstrings
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
First step towards reusing `schedule.final_events` where current /
upcoming schedule shifts information is needed (eg. escalations, shift
notifications, etc)
# What this PR does
Remove
[`apps.get_model`](https://docs.djangoproject.com/en/3.2/ref/applications/#django.apps.apps.get_model)
invocations and use inline `import` statements in places where models
are imported within functions/methods to avoid circular imports.
I believe `import` statements are more appropriate for most use cases as
they allow for better static code analysis & formatting, and solve the
issue of circular imports without being unnecessarily dynamic as
`apps.get_model`. With `import` statements, it's possible to:
- Jump to model definitions in most IDEs
- Automatically sort inline imports with `isort`
- Find import errors faster/easier (most IDEs highlight broken imports)
- Have more consistency across regular & inline imports when importing
models
This PR also adds a flake8 rule to ban imports of `django.apps.apps`, so
it's harder to use `apps.get_model` by mistake (it's possible to ignore
this rule by using `# noqa: I251`). The rule is not enforced on
directories with migration files, because `apps.get_model` is often used
to get a historical state of a model, which is useful when writing
migrations ([see this SO answer for more
details](https://stackoverflow.com/a/37769213)). So `apps.get_model` is
considered OK in migrations (even necessary in some cases).
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
See #2173
Also, closes#2187 . All of the new files under `type_stubs/icalendar`
were autogenerated by running:
```bash
stubgen -p icalendar -o type_stubs
```
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
- Adds [`mypy` static type checking](https://mypy-lang.org/) to our CI
pipeline. Currently there is still a **ton** of errors being returned by
the tool, as we'll need to fix pre-existing errors. I think we can
slowly chip away at these errors in small PRs, doing them all in one
large PR is likely very risky.
- Also, this PR starts chipping away at one of the main type errors that
we have which is accessing the `datetime` class (from the `datetime`
library) or `timedelta` function on the `django.utils.timezone` module.
Basically we should be instead accessing these two objects from the
native `datetime` module. This makes sense because the [`__all__`
attribute](https://github.com/django/django/blob/main/django/utils/timezone.py#L14-L30)
in `django.utils.timezone` does not re-export `datetime` or `timedelta`.
- splits `engine` dependencies out into `requirements.txt` and
`requirements-dev.txt`
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated (N/A)
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required) (N/A)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required) (N/A)
When checking current shifts to trigger slack notifications, a stable
value for timezone should be used (to avoid triggering multiple
duplicated notifications; using the first `VTIMEZONE` value available
may change on refresh, besides being potentially incorrect).
Also, `VTIMEZONE` shouldn't be used as the calendar timezone, it just
describes a [timezone
definition](https://icalendar.org/iCalendar-RFC-5545/3-6-5-time-zone-component.html)
which can later be [used when specifying a start/end
date/datetime](https://icalendar.org/iCalendar-RFC-5545/3-2-19-time-zone-identifier.html)
("The parameter MUST be specified on properties with a DATE-TIME value
if the DATE-TIME is not either a UTC or a "floating" time.").
OTOH, Google uses the non-standard[ `X-WR-TIMEZONE`
](https://github.com/collective/icalendar/issues/343)field as fallback
TZ for date-times without a `TZID`. Try to use this when available,
otherwise fallback to `UTC`.
May be related to https://github.com/grafana/oncall/issues/1553.
We got feedback about that happening for Google Calendar imported icals.
Google Calendar exported ics URL was returning different VTIMEZONE
components on different requests, triggering differences in the imported
ical. Updated the comparison to only consider events (while keep
assuming the sequence will reflect if there are any particular event
change).
# What this PR does
Fixes a bug when current on-call users for web UI and Slack user group
are incorrect.
This happens when both conditions below are true:
- Using an ICal schedule
- Email of a user in Grafana has a different case than an event in ICal
(e.g. `User@gmail.com` in Grafana, `user@gmail.com` in ICal event)
The bug was introduced by https://github.com/grafana/oncall/pull/1169
## Which issue(s) this PR fixes
https://github.com/grafana/oncall/issues/1296
## Checklist
- [x] Tests updated
- [x] `CHANGELOG.md` updated
# What this PR does
Fixes slow internal`GET /schedules` endpoints. Using the fake-data
generation script in #1128, I generated 65 calendar schedules in my
local setup. This resulted in the following endpoint performance:

The responses which show ~76 queries were run on the latest `dev`
branch. Responses w/ ~26 queries were run on this branch.
Additionally:
- add typing to a few methods in `apps/schedules/ical_utils.py`
- document `apps/api/permissions/__init__.py:user_is_authorized`
function
## Which issue(s) this PR fixes
https://github.com/grafana/oncall-private/issues/1552
## Checklist
- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
Co-authored-by: Vadim Stepanov <vadimkerr@gmail.com>
* Centralize timezone validation into one spot + add serializer validation
for schedules and oncall shifts (both public and internal API)
* add engine-manage make command
* Modify plugin.json to support RBAC role registration
* defines 26 new custom roles in plugin.json. The main roles are:
- Admin: read/write access to everything in OnCall
- Reader: read access to everything in OnCall
- OnCaller : read access to everything in OnCall + edit access to Alert Groups and Schedules
- <object-type> Editor: read/write access to everything related to <object-type>
- <object-type> Reader: read access for <object-type>
- User Settings Admin: read/write access to all user's settings, not just own settings. This is in comparison to User Settings Editor which can only read/write own settings
* update changelog and documentation (#686)
* implement RBAC for OnCall backend
This commit refactors backend authorization. It trys to use RBAC authorization if the org's grafana instance supports it, otherwise it falls back to basic role authorization.
* update RBAC backend tests
* add tests for RBAC changes
- run backend tests as matrix where RBAC is enabled/disabled. When RBAC is enabled, the permissions granted are read from the role grants in the frontend's plugin.json file (instead of relying what we specify in RBACPermission.Permissions)
- remove --reuse-db --nomigrations flags from engine/tox.ini
- minor autoformatting changes to docker-compose-developer.yml
* remove --ds=settings.ci-test from pytest CI command
DJANGO_SETTINGS_MODULE is already specified as an env var so this is just unecessary duplication
* update gitignore
* update github action job name for "test"
* RBAC frontend changes
* refactors the use of basic roles (ex. Viewer, Editor, Admin) use RBAC permissions (when supported), or falling back to basic roles when RBAC is not supported.
- updates the UserAction enum in grafana-plugin/src/state/userAction.ts. Previously this was hardcoded to a list of strings that were being returned by the OnCall API. Now the values here correspond to the permissions in plugin.json (plus a fallback role)
* changes per Gabriel's comments:
- get rid of group attribute in rbac roles
- remove displayName role attribute
- remove hidden role attribute
- add back role to includes section
* don't try to update user timezone if they don't have permission
* schedule alpha major fixes
* Fix shift update for web schedules
* Fix priority level regex, fix getting shifts without duration
* Fix shift update for web schedules
* Fix tests for shift update
* Fix priority level test
* schedule alpha fixes
* add final schedule click handler
* fix date time picker
* fix utc timzeonr time picker
* fix utc time data
* dont use user timezone on start
Co-authored-by: Julia <ferril.darkdiver@gmail.com>