Related to https://github.com/grafana/oncall-private/issues/2831
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] Added the relevant release notes label (see labels prefixed w/
`release:`). These labels dictate how your PR will
show up in the autogenerated release notes.
---------
Co-authored-by: Matias Bordese <mbordese@gmail.com>
Co-authored-by: Dominik <dominik.broj@grafana.com>
# What this PR does
Fixes https://github.com/grafana/oncall/issues/4760 (also provides a
workaround for https://github.com/grafana/oncall/issues/4761)
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] Added the relevant release notes label (see labels prefixed w/
`release:`). These labels dictate how your PR will
show up in the autogenerated release notes.
# What this PR does
Reduces number of calls to db for `/schedules`, `/alertgroups` and
`/users` endpoints.
Fixes the issue when there was an additional call to db to get
organization url to build user avatar full link.
## Which issue(s) this PR closes
Related to [issue link here]
<!--
*Note*: If you want the issue to be auto-closed once the PR is merged,
change "Related to" to "Closes" in the line above.
If you have more than one GitHub issue that this PR closes, be sure to
preface
each issue link with a [closing
keyword](https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/using-keywords-in-issues-and-pull-requests#linking-a-pull-request-to-an-issue).
This ensures that the issue(s) are auto-closed once the PR has been
merged.
-->
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] Added the relevant release notes label (see labels prefixed w/
`release:`). These labels dictate how your PR will
show up in the autogenerated release notes.
# What this PR does
Fixes building notification plan if one or more notifications were
bundled
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] Added the relevant release notes label (see labels prefixed w/
`release:`). These labels dictate how your PR will
show up in the autogenerated release notes.
# What this PR does
Fix scheduling `perform_notification` from `send_bundled_notification`
task - leftover after resolving merge conflict
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] Added the relevant release notes label (see labels prefixed w/
`release:`). These labels dictate how your PR will
show up in the autogenerated release notes.
# What this PR does
This PR adds two new models: UserNotificationBundle and
BundledNotification (proposals for naming are welcome).
`UserNotificationBundle` manages the information about last notification
time and scheduled notification task for bundled notifications. It is
unique per user + notification_channel + notification importance.
`BundledNotification` contains notification policy and alert group, that
triggered the notification. The BundledNotification instance is created
in `notify_user_task` for every notification, that should be bundled,
and is attached to UserNotificationBundle by ForeignKey connection.
How it works:
If the user was notified recently (within the last two minutes) by the
current notification channel, and this channel is bundlable,
BundledNotification instance will be created and attached to the
UserNotificationBundle instance, and `send_bundled_notification` task
will be scheduled to execute in 2 min.
In `send_bundled_notification` task we get all BundledNotification
attached to the current UserNotificationBundle instance, check if alert
groups are still active and if there is only one notification - perform
regular notification by calling `perform_notification` task, otherwise
call "notify_by_<channel>_bundle" method for the current notification
channel.
PR with method to send notification bundle by SMS -
https://github.com/grafana/oncall/pull/4624
**This feature is disabled by default by feature flag. Public docs will
be added in a separate PR with enabling this feature.**
## Which issue(s) this PR closes
related to https://github.com/grafana/oncall-private/issues/2712
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] Added the relevant release notes label (see labels prefixed w/
`release:`). These labels dictate how your PR will
show up in the autogenerated release notes.
# What this PR does
This is a follow-up PR to https://github.com/grafana/oncall/pull/4628.
As @Ferril pointed out, there was a slight issue in
`apps.alerts.tasks.notify_user.perform_notification` method when using a
"fallback"/default user notification policy. This is because the
`log_record_pk` arg passed into `perform_notification` will fetch the
`UserNotificationPolicyLogRecord` object, but that object will have a
`notification_policy` set to `None` (because there's no persistent
`UserNotificationPolicy` object to refer to).
Instead we now pass in a second argument to `perform_notification`,
`use_default_notification_policy_fallback`. If this is true, simply grab
the transient/in-memory `UserNotificationPolicy` and use that inside of
this task
Related to https://github.com/grafana/oncall/issues/4410
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] Added the relevant release notes label (see labels prefixed w/
`release:`). These labels dictate how your PR will
show up in the autogenerated release notes.
# What this PR does
Patches a small bug noticed (locally) by @Ferril 🙏 + updates our user
notification rules public API docs to include `notify_by_msteams` as a
valid `type` value (cloud only)
<!--
*Note*: if you have more than one GitHub issue that this PR closes, be
sure to preface
each issue link with a [closing
keyword](https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/using-keywords-in-issues-and-pull-requests#linking-a-pull-request-to-an-issue).
This ensures that the issue(s) are auto-closed once the PR has been
merged.
-->
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] Added the relevant release notes label (see labels prefixed w/
`release:`). These labels dictate how your PR will
show up in the autogenerated release notes.
# What this PR does
- Since send_alert_create_signal is inside transaction on_commit we can
conclude that if it does not exist it was intentionally deleted before
the task could run and the task can exit instead of retrying
- Improve logging when send_alert_create_signal is called so both alert
and alert group are in the same line so you don't need to search the
logs as much
- Improve logging on public api delete alert group so we can know what
the alert group belonged to and the responsible user/org
- Remove distribute_alerts (Stopped using a while back, code should be
safe to remove now, no tasks running in system)
## Which issue(s) this PR closes
Closes https://github.com/grafana/oncall-private/issues/2640
<!--
*Note*: if you have more than one GitHub issue that this PR closes, be
sure to preface
each issue link with a [closing
keyword](https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/using-keywords-in-issues-and-pull-requests#linking-a-pull-request-to-an-issue).
This ensures that the issue(s) are auto-closed once the PR has been
merged.
-->
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] Added the relevant release notes label (see labels prefixed w/
`release:`). These labels dictate how your PR will
show up in the autogenerated release notes.
# What this PR does
- removes unused "custom button" backend code now that we've migrated to
outgoing webhooks
- adds new e2e test for webhooks asserting that an `ngrok`/`express`
webhook handler receives the call as expected + payload is as expected
(related to https://github.com/grafana/oncall/issues/2691) - skipped for
now, the test passes locally but fails on GitHub Actions CI, seems to be
networking related
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
---------
Co-authored-by: Michael Derynck <michael.derynck@grafana.com>
# What this PR does
Count sms with status "accepted" as delivered in notification checker
## Which issue(s) this PR fixes
https://raintank-corp.slack.com/archives/C025VMT6SPK/p1706799009342889?thread_ts=1706786822.083149&cid=C025VMT6SPK
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Add transactions around log record creation and check transaction
on_commit before sending signals passing DB id of alert group log
records. In cases for delete we can then assume any missing IDs on tasks
are from intentionally deleted alert groups and we can stop tasks from
retrying endlessly.
## Which issue(s) this PR fixes
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
This PR simplifies alert group/alert creation, so the alert created and
escalation started in the same task.
## Which issue(s) this PR fixes
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Use Telegram ratelimit countdown when retry `perform_notification` task
on `RetryAfter` error
## Which issue(s) this PR fixes
https://github.com/grafana/oncall-private/issues/2451
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Closes https://github.com/grafana/oncall-private/issues/2449
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Speed up escalation auditor
- use raw escalation snapshot instead of serialized one
## Which issue(s) this PR fixes
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Stops rescheduling of `acknowledge_reminder_task` after 2 weeks.
Assumption being if it has been sitting for that long in acknowledged
state it is likely to not need more reminders that it is still
acknowledged. Notifications for thread were probably muted a long time
ago.
## Which issue(s) this PR fixes
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Updates check if escalation was skipped in Slack before trying to notify
user by Slack.
## Which issue(s) this PR fixes
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
There is no index for the `received_at` column, and the filter isn't
really needed (aggregation will work in any case, considering only the
entries for which we have data).
# What this PR does
## Which issue(s) this PR fixes
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Moving metrics creation into separate task to make alert ingestion more
robust
## Which issue(s) this PR fixes
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
---------
Co-authored-by: Julia <ferril.darkdiver@gmail.com>
# What this PR does
add more logging in `apps.alerts.tasks.notify_user.perform_notification`
task (related to https://github.com/grafana/oncall-private/issues/2318;
won't solve that issue but will help with the investigation)
# What this PR does
Fix acknowledge reminder:
- check if organization was deleted
- improve logging
## Which issue(s) this PR fixes
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
This PR adds alert groups success ratio over last 48 hours
## Which issue(s) this PR fixes
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Add an ability to continue escalations for alert groups from the point
it was in case if it was stopped
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
Prefer partial since it is the what the [docs
suggests](https://docs.djangoproject.com/en/4.2/topics/db/transactions/#performing-actions-after-commit).
Also because partial is evaluated immediately while lambda is evaluated
at runtime (which may be causing some issues):
```
>>> from functools import partial
>>> def foo(a, b, c):
... print(a, b, c)
...
>>> x = 10
>>> bar_partial = partial(foo, 1, 2, x)
>>> bar_lambda = lambda: foo(1, 2, x)
>>> x = 20
>>> bar_partial()
1 2 10
>>> bar_lambda()
1 2 20
```
# What this PR does
- Invalidate alert group cache on wipe
- Improve public API docs on alert group deletion
- Add / improve tests
## Which issue(s) this PR fixes
Related to https://github.com/grafana/oncall/issues/3051
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
* Create Direct Paging integration (with default route) when team is
created with bulk_update
* Create notification policies when user is created with bulk_update
* If user notification policies are empty change it to Email
* Minor markup and wording improvements
* Add grafana queue to helm chart
* Remove disabled commands for redis helm chart
* Improve Dockerfile caching
## Which issue(s) this PR fixes
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
- gracefully retry
`apps.alerts.tasks.delete_alert_group.delete_alert_group` when hitting
Slack ratelimits
- remove Slack messages from the DB as soon as they are deleted from
Slack, so the tasks are not retrying perpetually
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)