Commit graph

1830 commits

Author SHA1 Message Date
Joey Orlando
4d655dff60
modify check_escalation_finished_task task (#1266)
# What this PR does

This PR:
- modifies the `check_escalation_finished_task` celery task to:
  - do stricter escalation validation based on the alert group's
    escalation snapshot (see the `audit_alert_group_escalation` method in
    `engine/apps/alerts/tasks/check_escalation_finished.py` for the
    validation logic)
  - use a read-only database for querying alert groups if one is
    configured, otherwise use the "default" one
  - ping a configurable heartbeat (new env var
    `ALERT_GROUP_ESCALATION_AUDITOR_CELERY_TASK_HEARTBEAT_URL` added)
  - run every 13 minutes instead of every 10 (this can be configured via
    an env variable)
- adds public documentation on how to configure this auditor task
- modifies the local celery startup command to properly take into
  consideration all celery-related env vars (similar to the ones we use in
  `engine/celery_with_exporter.sh`; this made it easier to enable `celery
  beat` locally for testing)
- removes the following code:
  - references to `AlertGroup.estimate_escalation_finish_time`; the model
    field is marked as deprecated using the [`django-deprecate-fields`
    library](https://pypi.org/project/django-deprecate-fields/). This field
    was only used for the previous version of this validation task
  - `EscalationSnapshotMixin.calculate_eta_for_finish_escalation`, which was
    only used to calculate the value for
    `AlertGroup.estimate_escalation_finish_time`
  - the `calculate_escalation_finish_time` celery task
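
The read-only database fallback and the heartbeat ping described in the list above could look roughly like the sketch below. This is not the code from the PR: the `readonly` alias and both function names are assumptions made for illustration; only the env var name comes from the description.

```python
import urllib.request

# Env var name taken from the PR description; everything else is hypothetical.
HEARTBEAT_ENV_VAR = "ALERT_GROUP_ESCALATION_AUDITOR_CELERY_TASK_HEARTBEAT_URL"


def pick_database_alias(configured_aliases, read_only_alias="readonly"):
    """Return the read-only alias when it is configured, else "default".

    `read_only_alias` is a made-up name for this sketch.
    """
    return read_only_alias if read_only_alias in configured_aliases else "default"


def ping_heartbeat(url, timeout=5.0):
    """GET the heartbeat URL; return False without any request if no URL is set."""
    if not url:
        return False
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return 200 <= response.status < 300
```

A caller would pass the configured database aliases (for example the keys of a Django `DATABASES` setting) to `pick_database_alias`, and `os.environ.get(HEARTBEAT_ENV_VAR)` to `ping_heartbeat` once the audit finishes.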
  

## Which issue(s) this PR fixes

https://github.com/grafana/oncall-private/issues/1558

## Checklist

- [x] Tests updated
- [x] Documentation added
- [x] `CHANGELOG.md` updated
2023-03-17 10:14:08 +00:00
Joey Orlando
515f62ab56
update wording in some Slack messages which mention 'incident' instead of 'alert group' (#1565)
# What this PR does


![image](https://user-images.githubusercontent.com/9406895/225678127-4a0bcf96-742e-4335-9958-36fa0be26b9f.png)

## Checklist

- [ ] Tests updated (N/A)
- [ ] Documentation added (N/A)
- [x] `CHANGELOG.md` updated
2023-03-16 16:43:49 +00:00
Joey Orlando
1ccd529d27
fix resolution note Slack rendering bug (#1561)
# Which issue(s) this PR fixes
Changing the block element type from `plain_text` to `mrkdwn` allows
Slack to properly render Slack usernames in the Slack UI.

Also, it seems that the `mrkdwn` context block type does not support the
`emoji` key, so this PR removes it. From the Slack
[docs](https://api.slack.com/reference/block-kit/composition-objects#text):
![Screenshot 2023-03-16 at 16 19
25](https://user-images.githubusercontent.com/9406895/225663614-b0dbbaf1-4b39-48a4-9064-a3aa43fa4f43.png)


## Before
![Screenshot 2023-03-16 at 16 12
33](https://user-images.githubusercontent.com/9406895/225663322-7e84f7a9-dc6b-4827-b01c-1643bd27b766.png)

## After
![Screenshot 2023-03-16 at 16 12
27](https://user-images.githubusercontent.com/9406895/225663343-eff3fd27-d44b-4c2a-a06d-23e0936fcd37.png)



## Checklist

- [ ] Tests updated (N/A)
- [ ] Documentation added (N/A)
- [x] `CHANGELOG.md` updated
2023-03-16 15:36:29 +00:00
Matias Bordese
3de7766389
Check for duplicated positions in TF escalation policies (#1554)
Fixes https://github.com/grafana/oncall-private/issues/1680.
Prevent multiple escalation policies in an escalation chain from having the
same order.
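
A duplicate-position check of this kind can be sketched as follows. The function name and the plain list of integers are assumptions for illustration, not the PR's actual implementation.

```python
from collections import Counter


def find_duplicate_positions(positions):
    """Return the sorted positions that occur more than once."""
    counts = Counter(positions)
    return sorted(position for position, seen in counts.items() if seen > 1)
```

A validator could reject the request whenever this returns a non-empty list, naming the offending positions in the error message.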
2023-03-16 11:48:11 +00:00
Vadim Stepanov
747fbfcb1b
Add regex_match Jinja filter (#1556)
# What this PR does
Adds a new `regex_match` filter to the Jinja environment.
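
A filter like this can be a plain Python function. The sketch below assumes `re.match` semantics (the pattern is anchored at the start of the value) and a `(value, pattern)` argument order; the real filter may differ on both points.

```python
import re


def regex_match(value, pattern):
    """Return True if `pattern` matches at the start of `value`."""
    return re.match(pattern, str(value)) is not None
```

With Jinja2 installed, such a function can be registered through `env.filters["regex_match"] = regex_match` on an `Environment` and then used in a template as `{{ title | regex_match("^firing") }}`, where `title` is a hypothetical template variable.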

## Which issue(s) this PR fixes
This should be useful on its own, and also helpful for
https://github.com/grafana/oncall/pull/1555.

## Checklist

- [x] Tests updated
- [x] Documentation added
- [x] `CHANGELOG.md` updated
2023-03-16 10:18:49 +00:00
Vadim Stepanov
bd12d38ee0
Public API: allow null escalation chain when creating routes (#1557)
# What this PR does
Allows passing `null` as a value for `escalation_chain` when creating
routes via the public API.

## Which issue(s) this PR fixes
This is needed to unblock https://github.com/grafana/oncall/pull/1555.
Creating a route without an escalation chain is also possible in the web UI,
so this PR makes the public API more consistent with the web UI.

## Checklist

- [x] Tests updated
- [x] Documentation added
- [x] `CHANGELOG.md` updated
2023-03-16 09:40:50 +00:00
Vadim Stepanov
ea60c0d247
Inbound email integration (#837)
This PR adds the Inbound Email integration.

It is designed to support a variety of ESPs, but in prod we will use
Mailgun, so I only tested it locally with the Mailgun ESP.

**Important:**
To make it work on different clusters I'm planning to provide different
email domains for different regions, like ....@us.oncall.grafana.net,
...@eu.oncall.grafana.net

---------

Co-authored-by: Innokentii Konstantinov <innokenty.konstantinov@grafana.com>
2023-03-16 13:59:21 +08:00
Matias Bordese
3ade317010
Rework webhook trigger tasks checks and payload build (#1544) 2023-03-14 17:21:46 +00:00
Vadim Stepanov
61b7c2ec48
Add alert group filter by escalation chain (#1535)
# What this PR does
Adds a new filter on the alert groups page that allows filtering alert
groups by escalation chain.

<img width="1204" alt="Screenshot 2023-03-13 at 22 42 00"
src="https://user-images.githubusercontent.com/20116910/224848730-ef753856-a050-4acb-ba36-498d2bca2b4f.png">


## Which issue(s) this PR fixes
This should be useful on its own, as it provides more filtering
capabilities, but it could also be useful for
https://github.com/grafana/oncall/issues/1300, if PD rulesets are
migrated to a single integration with multiple escalation chains.

## Checklist

- [x] Tests updated
- [x] `CHANGELOG.md` updated
2023-03-14 14:38:18 +00:00
Michael Derynck
e089e29b86
Enable new webhooks preview per org (#1534) 2023-03-14 14:31:47 +00:00
Matias Bordese
8ca82ad2cd
Webhooks trigger tasks on alert group events (#1533) 2023-03-13 21:19:22 +00:00
Matias Bordese
2db1a5a883
Add initial webhooks internal plugin API (#1524) 2023-03-10 17:00:06 +00:00
Matias Bordese
cebfec5ef9
Add support for web overrides to Terraform schedules (#1222)
Related to #828 

- Enable web UI for API/Terraform schedules to add overrides
- Refactor backend to add a flag toggling between web-based and
iCal-based overrides (these options are mutually exclusive)

Also updated read-only tooltips (related to #1483)
2023-03-10 16:21:50 +00:00
Matias Bordese
2048e783ba
Add webhooks app and initial models (#1101) 2023-03-09 19:39:25 +00:00
Matias Bordese
9709cfbc73
Add force option to delete web schedule shifts (#1519)
Related to #1505
Add force param to shift delete endpoint in plugin internal API.
2023-03-09 18:39:25 +00:00
Ildar Iskhakov
6614d50427
Rename Incident to Alert Group (#1512)
# What this PR does

## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-03-09 08:57:13 +00:00
Innokentii Konstantinov
f0ce08bd67
Check stack cluster for insight_logs (#1469)
# What this PR does
This PR modifies `is_insight_logs_enabled` to check for a stack cluster
instead of a DynamicSetting.

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-03-09 06:30:54 +00:00
Innokentii Konstantinov
fbb83daf21
Store org cluster_slug (#1480)
# What this PR does
Store org cluster slug to write insight logs
2023-03-09 04:10:19 +00:00
Innokentii Konstantinov
0a3dfeef7d
Fix ratelimit message: incident -> alert groups (#1481) 2023-03-09 03:37:16 +00:00
Vadim Stepanov
7604ed3956
Add test for newlines in email subject (#1510)
# What this PR does
Adds a simple test for https://github.com/grafana/oncall/pull/1499

## Checklist

- [x] Tests updated
2023-03-09 01:25:44 +00:00
Manu Vamadevan
3680e5b591
grafana ticketing 81244 : Emails fails if the alert title has a newline char (#1499)
# What this PR does
Fixes email sending, which fails if the alert title contains a newline
character.
## Which issue(s) this PR fixes
Grafana ticket: 81244
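
Mail libraries commonly reject header values that contain line breaks, so one way to avoid this class of failure is to normalize the title before using it as a subject. The sketch below is an assumption about how that could be done, not necessarily how this PR does it; `sanitize_subject` is a hypothetical helper.

```python
import re


def sanitize_subject(title):
    """Replace each run of CR/LF characters with one space and trim the ends."""
    return re.sub(r"[\r\n]+", " ", title).strip()
```

For example, `sanitize_subject("Disk full\non db-1")` yields a single-line subject that is safe to put in a header.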
## Checklist

- [x] Tests updated
- [x] `CHANGELOG.md` updated

---------

Co-authored-by: Vadim Stepanov <vadimkerr@gmail.com>
2023-03-08 17:08:19 +00:00
Matias Bordese
ef66cab597
Fix involved users filter, add missing test (#1500)
Fixes https://github.com/grafana/oncall-private/issues/1673
2023-03-08 15:27:03 +00:00
Innokentii Konstantinov
747a2b2bc0
Fix insight_logs for mobile app backend (#1498) 2023-03-08 13:38:59 +00:00
Joey Orlando
0f23a449c7
add unique idx on user column in mobileapp authtoken table (#1482)
# Which issue(s) this PR fixes
Solves the (rare) issue where a user could potentially have more than one
mobileapp auth token, leading to 500 errors when trying to interact with
the auth token (e.g. disconnecting a mobile app from a user's profile):
```shell
2023-03-07 10:12:13 source=engine:app google_trace_id=e14bf933d634068a48caf093ce43c7f5/5550677047491218352 logger=django.request Internal Server Error: /api/internal/v1/users/U6WJ3BRLM1TR7/unlink_backend
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.9/site-packages/django/core/handlers/base.py", line 181, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/local/lib/python3.9/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
    return view_func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/rest_framework/viewsets.py", line 125, in view
    return self.dispatch(request, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/rest_framework/views.py", line 509, in dispatch
    response = self.handle_exception(exc)
  File "/usr/local/lib/python3.9/site-packages/rest_framework/views.py", line 469, in handle_exception
    self.raise_uncaught_exception(exc)
  File "/usr/local/lib/python3.9/site-packages/rest_framework/views.py", line 480, in raise_uncaught_exception
    raise exc
  File "/usr/local/lib/python3.9/site-packages/rest_framework/views.py", line 506, in dispatch
    response = handler(request, *args, **kwargs)
  File "/etc/app/apps/api/views/user.py", line 453, in unlink_backend
    backend.unlink_user(user)
  File "/etc/app/apps/mobile_app/backend.py", line 34, in unlink_user
    token = MobileAppAuthToken.objects.get(user=user)
  File "/usr/local/lib/python3.9/site-packages/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 439, in get
    raise self.model.MultipleObjectsReturned(
apps.mobile_app.models.MobileAppAuthToken.MultipleObjectsReturned: get() returned more than one MobileAppAuthToken -- it returned 2!
```
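
The effect of a unique index on the user column can be demonstrated with an in-memory SQLite database. The table and index names below are made up for this sketch and are not the project's real schema; the point is only that a second token for the same user is rejected at the database level, so a later single-row lookup cannot return two rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mobile_app_auth_token ("
    " id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, digest TEXT NOT NULL)"
)
# The unique index guarantees at most one token row per user.
conn.execute(
    "CREATE UNIQUE INDEX auth_token_user_uniq ON mobile_app_auth_token (user_id)"
)


def create_token(user_id, digest):
    """Insert a token; return False if the user already has one."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO mobile_app_auth_token (user_id, digest) VALUES (?, ?)",
                (user_id, digest),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def count_tokens(user_id):
    """Return how many token rows exist for the given user."""
    row = conn.execute(
        "SELECT COUNT(*) FROM mobile_app_auth_token WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0]
```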

## Checklist

- [x] Tests updated
- [ ] Documentation added (N/A)
- [x] `CHANGELOG.md` updated
2023-03-08 13:50:57 +01:00
Ildar Iskhakov
c4c5953f85
Merge branch 'main' into dev 2023-03-08 20:19:34 +08:00
Joey Orlando
7c8722e714
remove mobile app feature flag (#1484)
# What this PR does

## Which issue(s) this PR fixes

## Checklist

- [x] Tests updated
- [ ] Documentation added (N/A)
- [x] `CHANGELOG.md` updated
2023-03-08 11:22:44 +01:00
Ildar Iskhakov
2e63a9ff08
Jinja2 based routes (#1319)
# What this PR does

This PR adds a new way to set up routes using the Jinja2 templating
language.

<img width="1174" alt="Screenshot 2023-03-06 at 22 11 13"
src="https://user-images.githubusercontent.com/2262529/223134053-69d43c47-bb2a-4790-a16d-767425017a76.png">
<img width="1175" alt="Screenshot 2023-03-06 at 22 11 34"
src="https://user-images.githubusercontent.com/2262529/223134070-1e5ef82f-021c-4d5d-b255-b19bb3445641.png">


## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-03-08 16:42:18 +08:00
Vadim Stepanov
98ccd3eca5
Prohibit creating & updating past overrides (#1474)
# What this PR does
Prohibits creating & updating overrides in the past when using the web
UI.

## Which issue(s) this PR fixes
https://github.com/grafana/oncall/issues/1221

## Checklist

- [x] Tests updated
- [x] `CHANGELOG.md` updated
2023-03-07 15:54:20 +00:00
Vadim Stepanov
a958fe836b
Merge pull request #1477 from grafana/dev
Dev to main
2023-03-07 13:44:16 +00:00
Matias Bordese
f5fb5d34dc
Rework alert group mine filter query (#1466)
Rework query to make it more efficient.
2023-03-07 11:38:50 +00:00
Kristian Bremberg
b6d65ebb66
Chore: add integrity hash to templates (#1473)
# What this PR does

Adds integrity hashes for scripts loaded from CDNs.
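
For context, a Subresource Integrity value is the hash algorithm name, a dash, and the base64-encoded digest of the file contents. A minimal sketch of computing one follows; the helper name is hypothetical and this is not tooling from the PR.

```python
import base64
import hashlib


def sri_hash(content: bytes, algorithm: str = "sha384") -> str:
    """Return an SRI string such as "sha384-<base64 digest>" for the content."""
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"
```

The result goes into the `integrity` attribute of the `<script>` tag, usually together with `crossorigin="anonymous"` for scripts served from another origin.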
2023-03-07 11:17:07 +00:00
Innokentii Konstantinov
7bad073626
Remove OSS_INSTALLATION env var (#881)
It's a duplicate of the LICENSE env var.

**What this PR does**:
Removes the OSS_INSTALLATION env var in favour of the LICENSE env var. Also, I
refactored the feature tests a little. From my point of view it makes
little sense to test whether all features are disabled or enabled; it is
better to test a specific use case (e.g. an OSS installation).
Also, testing that all features are disabled requires setting LICENSE to
the cloud license, which makes the test confusing.

**Checklist**
- [x] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-03-07 11:07:42 +00:00
ak0nst
44e93b6ab4
Email and phone limits now environment variable (#1219)
# What this PR does
Email and phone limits are now environment variables:
EMAIL_NOTIFICATIONS_LIMIT=200, PHONE_NOTIFICATIONS_LIMIT=200
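
Reading such limits from the environment could look like the sketch below. It assumes the values shown above (200) are the defaults; `int_from_env` is a hypothetical helper, not the project's settings code.

```python
import os


def int_from_env(name, default, environ=None):
    """Read an integer setting from the environment, falling back to `default`."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    return default if raw in (None, "") else int(raw)


# Assumed defaults, taken from the values in the description above.
EMAIL_NOTIFICATIONS_LIMIT = int_from_env("EMAIL_NOTIFICATIONS_LIMIT", 200)
PHONE_NOTIFICATIONS_LIMIT = int_from_env("PHONE_NOTIFICATIONS_LIMIT", 200)
```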

## Which issue(s) this PR fixes
#1010

## Checklist

- [ ] Tests updated
- [x] Documentation added
- [x] `CHANGELOG.md` updated

---------

Co-authored-by: Vadim Stepanov <vadimkerr@gmail.com>
2023-03-07 10:48:05 +00:00
Innokentii Konstantinov
a50ec8fed2
Refactor get_user_verbal_for_team_for_slack. (#809)
Remove unused params from signature, rename
2023-03-07 10:09:37 +00:00
Innokentii Konstantinov
249e4067c4
Remove unused def render_resolution_notes_for_csv_report 2023-03-07 13:47:49 +08:00
Vadim Stepanov
ab493def5f
Schedule quality backend improvements (#1461)
# What this PR does

Changes the schedule quality API so it also returns types of comments
(this is needed to address
https://github.com/grafana/oncall/issues/118#issuecomment-1436954708).

## Which issue(s) this PR fixes
Related to https://github.com/grafana/oncall/issues/118

## Checklist

- [x] Tests updated
2023-03-06 14:27:49 +00:00
Vadim Stepanov
c20229fefd
PD migrator: migrate overrides (#1454)
# What this PR does
Allows PD migrator to migrate overrides (the current implementation only
migrates rotation layers).
Also tweaks public API so created overrides are consistent with the web
UI.

## Checklist

- [x] Tests updated
2023-03-06 13:44:28 +00:00
Innokentii Konstantinov
4b91203eca
Add validation of hostname for reCAPTCHA (#1445)
# What this PR does

- Implement a reCAPTCHA v3 check. DRF_RECAPTCHA didn't support hostname
validation and it's too complicated to add it.
- Add validation of the verification code on the OnCall side to avoid
calling Twilio with obviously invalid codes
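
Both checks can be sketched as small pure functions. The reCAPTCHA verification response is a JSON object that includes `success`, `score` and `hostname` fields; the 0.5 score threshold, the six-digit code length and the function names below are assumptions for illustration only.

```python
def is_recaptcha_response_valid(payload, allowed_hostnames, min_score=0.5):
    """Check a parsed reCAPTCHA v3 verification response.

    `min_score` is a hypothetical threshold, not a value from the PR.
    """
    if not payload.get("success"):
        return False
    if payload.get("hostname") not in allowed_hostnames:
        return False
    return float(payload.get("score", 0.0)) >= min_score


def is_plausible_verification_code(code, length=6):
    """Cheap local check before calling the SMS provider (length is assumed)."""
    return (
        isinstance(code, str)
        and len(code) == length
        and code.isascii()
        and code.isdigit()
    )
```

Rejecting malformed codes locally means the external verification call is only made for input that could possibly be valid.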

## Checklist

- [x] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-03-06 08:59:48 +00:00
Matvey Kukuy
2a311c6289
Incident -> Alert Group wording (#1450) 2023-03-02 15:03:58 +00:00
Vadim Stepanov
8170ca491c
Fix pagination issue when searching schedules (#1437)
# What this PR does
Fixes a bug with inconsistent schedule count when searching by name.

Example (2 schedules returned, but count is incorrectly set to 12):

![image](https://user-images.githubusercontent.com/20116910/222198919-2f2124bc-52b2-4e5f-a949-79bbf89a5a26.png)

## Checklist

- [x] Tests updated
- [x] `CHANGELOG.md` updated
2023-03-01 16:28:40 +00:00
Vadim Stepanov
4c31ede558
Add "used in escalation" filter for schedules internal API (#1425)
# What this PR does
Adds a `used` filter on schedules endpoint for internal API.

Usage:
- `?used=true` returns schedules that are referenced by at least one
escalation policy
- `?used=false` returns schedules that are NOT referenced
- `?used=null` or not providing the query param at all will return all
schedules
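
The three-state behaviour of the query param listed above can be sketched like this. The parsing helper, the error handling for other values, and the `escalation_policy_count` field are assumptions made for the example, not the internal API's actual code.

```python
def parse_used_param(raw):
    """Map the raw query value to True, False, or None (meaning "no filter")."""
    if raw is None:
        return None
    value = raw.strip().lower()
    if value == "null":
        return None
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError(f"unsupported value for 'used': {raw!r}")


def filter_schedules(schedules, used):
    """Keep schedules whose "referenced by a policy" state equals `used`."""
    if used is None:
        return list(schedules)
    return [s for s in schedules if (s["escalation_policy_count"] > 0) == used]
```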
## Which issue(s) this PR fixes
https://github.com/grafana/oncall/issues/1423

## Checklist

- [x] Tests updated
2023-03-01 10:09:07 +00:00
Innokentii Konstantinov
6a5e75e083
Fix of templates api behaviour for public and private api (#1408)
# What this PR does

This PR fixes templates behaviour for the public and private APIs. It fixes
"reset to default" for templates from messaging backends and some minor
bugs. It also adds acknowledge signal and source link templates.

## Checklist

- [x] Tests updated
- [x] Documentation added
- [x] `CHANGELOG.md` updated
2023-03-01 16:32:15 +08:00
Vadim Stepanov
a25fd429da
Show 100 latest alerts on alert group page (#1417)
# What this PR does
Makes the internal API return the 100 latest alerts for an alert group.

## Which issue(s) this PR fixes
https://github.com/grafana/oncall/issues/857

## Checklist

- [x] Tests updated
- [x] `CHANGELOG.md` updated
2023-02-28 14:12:56 +00:00
Matias Bordese
04c42e2796
Matiasb/fix task refresh ical when empty value (#1401)
This should fix a task error seen in the logs, caused by trying to parse an
empty string as an iCal value:
```
Task apps.schedules.tasks.refresh_ical_files.refresh_ical_file[] raised unexpected: ValueError("Found no components where exactly one is required: ''")
```
2023-02-24 21:16:09 +00:00
Matias Bordese
721ab9fbb9
Use UTC instead of Etc/UTC when passing tz to dateutil rrule (#1414)
Fixes https://github.com/grafana/oncall-private/issues/1648
2023-02-24 20:54:20 +00:00
Michael Derynck
b3659872a7
Get reCAPTCHA site key from backend env (#1400)
# What this PR does
Moves the reCAPTCHA site key to the backend environment, which makes it
easier to manage and supports multiple environments.

## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [x] `CHANGELOG.md` updated
2023-02-24 15:53:35 +00:00
Matias Bordese
98b3b918a5
Add schedule pagination to plugin API (#1309)
Related to #1289

---------

Co-authored-by: Yulia Shanyrova <yulia.shanyrova@grafana.com>
2023-02-24 14:59:03 +00:00
Michael Derynck
49946e6a4e
Change Organization Deleted/Moved Precedence (#1402)
# What this PR does
When an organization is migrated to a different cluster, its
`migration_destination_slug` is set for redirection purposes, but the
organization also needs to be deleted so that its scheduled tasks do not
run in the old cluster. By changing the order so that moved takes
precedence over deleted, API calls are correctly redirected for moved
organizations, while the organization is still considered deleted to
suppress tasks that are no longer needed in the old cluster.
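
The precedence change can be illustrated with a small sketch. The `Org` dataclass, the `deleted` flag and the returned strings are assumptions for this example; only `migration_destination_slug` is named in the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Org:
    deleted: bool = False
    migration_destination_slug: Optional[str] = None


def resolve_status(org):
    """Check "moved" before "deleted" so moved organizations get redirected."""
    if org.migration_destination_slug:
        return "moved"
    if org.deleted:
        return "deleted"
    return "active"
```

With the checks in the opposite order, an organization that is both moved and deleted would be reported as deleted and never redirected.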

## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-02-24 11:45:21 +00:00
Matias Bordese
b6ce63e2a9
Fix/rewrite flaky schedule tests (#1397) 2023-02-23 18:20:51 +00:00
Joey Orlando
b61f2ce41f
patch minor sync issue when HTTP 302 is received from Grafana API instance (#1393)
# What this PR does

This PR refactors the `sync_organization` and
`GrafanaAPIClient.is_rbac_enabled_for_organization` methods to check the
connected response bool rather than explicitly checking for HTTP 200. This
handles the legitimate case where the Grafana instance may return an
HTTP 302 (redirect) rather than an HTTP 200.

## Which issue(s) this PR fixes

See
[this](https://grafana.slack.com/archives/C02LSUUSE2G/p1677136582890269)
Slack thread in the community channel for more context

## Checklist

- [x] Tests updated
- [ ] Documentation added (N/A)
- [x] `CHANGELOG.md` updated
2023-02-23 13:23:57 +00:00