This PR adds new metric for Prometheus exporter
"user_was_notified_of_alert_groups" which counts how many alert groups
user was notified of.
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
---------
Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
# What this PR does
See #2173
Also, closes#2187 . All of the new files under `type_stubs/icalendar`
were autogenerated by running:
```bash
stubgen -p icalendar -o type_stubs
```
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
## Which issue(s) this PR fixes
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Fix incorrect order of arguments for phone provider method invocations
when relaying phone calls and SMS from OSS instances.
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
The `prefecth_related` data here is not being used later (since we are
applying filters not getting `.all`) and it is preloading all alert
groups for integrations into memory (which can be a lot).
# What this PR does
Fixes an issue when multiple user notification policies have duplicated
order values, leading to the following unexpected behaviours:
1. Not possible to rearrange notification policies that have duplicated
orders.
2. The notification system only executes the first policy from each
order group. For example, if there are policies with orders `[0, 0, 0,
0]`, only the first policy will be executed, and all others will be
skipped. So the user will see four policies in the UI, while only one of
them will be actually executed.
This PR fixes the issue by adding a unique index on `(user_id,
important, order)` for `UserNotificationPolicy` model. However, it's not
possible to add that unique index using the ordering library that we use
due to it's implementation details.
I added a new abstract Django model `OrderedModel` that's able to work
with such unique indices + under concurrent load.
Important info on this new `OrderedModel` abstract model:
- Orders are unique on the DB level
- Orders are allowed to be non-consecutive, for example order sequence
`[100, 150, 400]` is valid
- When deleting an instance, orders of other instances don't change.
This is a notable difference from the library we use. I think it's
better to only delete the instance without changing any other orders,
because it reduces the number of dependencies between instances (e.g.
Terraform drift will be much smaller this way if a policy is deleted via
the web UI).
## Which issue(s) this PR fixes
Related to https://github.com/grafana/oncall-private/issues/1680
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
Optimize getting response time for alert groups in calculation metrics
task
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
Updating a schedule using the web UI sometimes you don't get the change
immediately available (since the ical refresh is async).
Related to https://github.com/grafana/oncall/issues/1968
# What this PR does
## Which issue(s) this PR fixes
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Features and bugs related to [[Q1 2023] Iteration with
Schedules](https://github.com/grafana/oncall-private/issues/1660)
milestone
## Which issue(s) this PR fixes
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
---------
Co-authored-by: Innokentii Konstantinov <innokenty.konstantinov@grafana.com>
Co-authored-by: Matias Bordese <mbordese@gmail.com>
# What this PR does
- Description of Alert Manager info box was changed
- Fix for filtering Escalation chains in Integration
- Alt text for Image template
- Conditional result for Template was fixed
# What this PR does
## Which issue(s) this PR fixes
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
---------
Co-authored-by: Joey Orlando <joseph.t.orlando@gmail.com>
Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
# What this PR does
Plus mark `alert_receive_channel.restricted_at` column as deprecated.
This column will be removed in a future release.
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated (N/A)
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required) (N/A)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required) (N/A)
# What this PR does
install `requirements-dev.txt` dependencies oncall docker image dev
target to allow commands like `make test` to properly work. Otherwise
you currently get:
```bash
Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "pytest": executable file not found in $PATH: unknown
make: *** [test] Error 1
```
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
## Which issue(s) this PR fixes
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
---------
Co-authored-by: Innokentii Konstantinov <innokenty.konstantinov@grafana.com>
# What this PR does
Remove deprecated `permissions` `List[str]` from internal API user
response. These permission strings are no longer used and AFAICT are not
referenced anywhere in the UI.
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required) (N/A)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
## Which issue(s) this PR fixes
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
- add user locale field to mobile app user settings table + add a test
that sends `PATCH` requests to this endpoint
- change "you're going on call" push notification text to include
localized shift time. The general format is now:
```python
f"You're going on call in {time_until_going_oncall} for schedule
{schedule.name}, {formatted_shift}"
```
- `time_until_going_oncall` is a "human-readable" format of the time
until the start)
- `schedule.name` is self-explanatory
- `formatted_shift` this depends on the shift. If the shift starts and
ends on the same day, the format will be "HH:mm - HH:mm". Otherwise, if
the shift starts and ends on different days, the format will be
"YYYY-MM-DD HH:mm - YYYY-MM-DD HH:mm". **Note** that all datetime
related formatting will use the new `locale` field that we are now
storing in the mobile app user settings table. If no locale is yet
present we will fallback to "en"
## Which issue(s) this PR fixes
closes https://github.com/grafana/oncall/issues/2024https://github.com/grafana/oncall-mobile-app/issues/187
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
- Add descriptions for fields
- Change default value of request_data to avoid confusing situation
where if there was a error before request data is built, like in the
template for the request headers, request_data would be a python dict
resulting in hard to read text in the UI status page.
# What this PR does
polished route creation
route is now created empty by default
added placeholder text to route template
added warning for route w/o template
made text same size in warnings
changed texts a bit for more consistency with grafana alerting
## Which issue(s) this PR fixes
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
## Which issue(s) this PR fixes
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
closes#2153
## Which issue(s) this PR fixes
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
This fixes scenario described
[here](https://github.com/grafana/oncall/issues/2118#issuecomment-1580499754).
When a rotation is setup in UTC+1, and the shift starts at 00:00 with
Monday as active day and a weekly frequency, the values are translated
to UTC when submitting to the backend, so the shift data becomes
something like: shift starting at 23:00 on Sunday, but since week_start
is on Monday, the "first event" in the week belongs to the "previous
week". This can be addressed by moving the week_start, so a weekly shift
that was starting on a Monday but in UTC tz starts on Sunday,
"translates" to a UTC week_start on Sunday:

(this is with the proposed changes; otherwise you get the same issue
linked above where the first event in the week is assigned to the other
user group).
About selected week days changed when editing a rotation, see inline
comment (related to
[this](https://github.com/grafana/oncall/issues/1322#issuecomment-1521787786))
# What this PR does
Closes#2169
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Introduces AlertManagerV2 integration with better grouping and
autoresolving, not intended for production use yet.
---------
Co-authored-by: Ildar Iskhakov <Ildar.iskhakov@grafana.com>
# What this PR does
- Adds [`mypy` static type checking](https://mypy-lang.org/) to our CI
pipeline. Currently there is still a **ton** of errors being returned by
the tool, as we'll need to fix pre-existing errors. I think we can
slowly chip away at these errors in small PRs, doing them all in one
large PR is likely very risky.
- Also, this PR starts chipping away at one of the main type errors that
we have which is accessing the `datetime` class (from the `datetime`
library) or `timedelta` function on the `django.utils.timezone` module.
Basically we should be instead accessing these two objects from the
native `datetime` module. This makes sense because the [`__all__`
attribute](https://github.com/django/django/blob/main/django/utils/timezone.py#L14-L30)
in `django.utils.timezone` does not re-export `datetime` or `timedelta`.
- splits `engine` dependencies out into `requirements.txt` and
`requirements-dev.txt`
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated (N/A)
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required) (N/A)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required) (N/A)
# What this PR does
## Which issue(s) this PR fixes
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
We were noticing some discrepancies in alert groups counts.
From a real example:
```
# queryset is the alert groups initial queryset
>>> qs = queryset.filter(channel_filter__alert_receive_channel=arc)
>>> qs.filter(resolved_at__isnull=False).count()
1318
>>> qs = queryset.filter(channel=arc)
>>> qs.filter(resolved_at__isnull=False).count()
1356
>>> qs.filter(resolved_at__isnull=False, channel_filter__isnull=True).count()
38
```
# What this PR does
## Which issue(s) this PR fixes
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Create a custom non-root user and use it to start an app. So uwsgi does
not require to use `setUid` and `setGid` system calls.
It handles errors while starting in Kubernetes with `runAsNonRoot: true`
check.
## Which issue(s) this PR fixes
closes https://github.com/grafana/oncall/issues/445
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
---------
Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
Co-authored-by: Joey Orlando <joseph.t.orlando@gmail.com>
# What this PR does
Introduces BaseFailed exception for phone_notificator.
# Why
We need to somehow distinguish errors we want to be notified - like
network errors or invalid twilio credentials (I will call them "real"
errors) and errors we want to share with user, but don't want to be
paged ( I will call them "fake" errors).
To do that I added "graceful_msg" to all Failed... exceptions. If
details field is present - it mean we can return 400 code with the
message, if not - 500 code. So, "real" errors will raise Failed...
exception, while "fake" will add "graceful_msg".
# TODO
handle exceptions handled here
https://github.com/grafana/oncall/pull/2065
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
---------
Co-authored-by: Michael Derynck <michael.derynck@grafana.com>
# What this PR does
This PR does three improvements in twilio_phone_provider:
1.
[Speed-up](https://github.com/grafana/oncall/pull/2034/files#diff-7a311767169c024e60e2b4e35fd531dd6e2f1ea785cfc84263e11e7932d622af)
query which calculates amount of phone_calls/sms left.
2. Remove code which was needed only for backward compatibility during
the release of PhoneProvider refactoring and improves logging for
handling status/gather updates.
3. Add db_index to twilio_sid. We are doing lot of lookups by sid and
with increasing amount of data it became resource consuming.
# What this PR does
Add first internal api error code and add "resolution_note" param to
resolve endpoint
## Which issue(s) this PR fixes
It fixes resolving alert groups via mobile app when resolution note is
required
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Makes sure that all viewset actions with `detail=True` use
`self.get_object()` to retrieve an instance that's being acted upon.
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)