123 commits

Author SHA1 Message Date
Ildar Iskhakov
d3c6621dae
Teams redesign (#1528)
# What this PR does

* API returns all the resources available to the user by default
* substitutes `team switcher` with `multi-select team filter`
* allows referencing between integrations - escalation chains -
[schedules, outgoing webhooks] across teams



https://user-images.githubusercontent.com/2262529/225634581-2d2e8af2-15ce-4c01-a90e-8267d98f5a23.mov



## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated

---------

Co-authored-by: Maxim <maxim.mordasov@grafana.com>
Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
2023-03-22 00:57:20 +08:00
Joey Orlando
4d655dff60
modify check_escalation_finished_task task (#1266)
# What this PR does

This PR:
- modifies the `check_escalation_finished_task` celery task to:
  - do stricter escalation validation based on the alert group's
    escalation snapshot (see the `audit_alert_group_escalation` method in
    `engine/apps/alerts/tasks/check_escalation_finished.py` for the
    validation logic)
  - use a read-only database for querying alert groups if one is
    configured, otherwise use the "default" one
  - ping a configurable heartbeat (new env var
    `ALERT_GROUP_ESCALATION_AUDITOR_CELERY_TASK_HEARTBEAT_URL` added)
  - change the task interval from every 10 minutes to every 13 minutes
    (this can be configured via an env variable)
- adds public documentation on how to configure this auditor task
- modifies the local celery startup command to properly take into
  consideration all celery-related env vars (similar to the ones we use in
  `engine/celery_with_exporter.sh`; this made it easier to enable
  `celery beat` locally for testing)
- removes the following code:
  - references to `AlertGroup.estimate_escalation_finish_time`; the model
    field is marked as deprecated using the [`django-deprecate-fields`
    library](https://pypi.org/project/django-deprecate-fields/). This field
    was only used for the previous version of this validation task
  - `EscalationSnapshotMixin.calculate_eta_for_finish_escalation`, which was
    only used to calculate the value for
    `AlertGroup.estimate_escalation_finish_time`
  - `calculate_escalation_finish_time` celery task
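
The read-only database fallback and the optional heartbeat ping listed above could look roughly like the sketch below. It is not the code from this PR: the `readonly` alias and the helper names are assumptions made for illustration, and only the env var name comes from the description.

```python
import os
import urllib.request
from typing import Callable, Mapping, Optional

# Env var name taken from the PR description.
HEARTBEAT_ENV_VAR = "ALERT_GROUP_ESCALATION_AUDITOR_CELERY_TASK_HEARTBEAT_URL"


def pick_database_alias(databases: Mapping[str, dict], read_only_alias: str = "readonly") -> str:
    """Return the read-only alias if it is configured, otherwise "default".

    `read_only_alias` is a hypothetical name; the real alias may differ.
    """
    return read_only_alias if read_only_alias in databases else "default"


def heartbeat_url_from_env(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Read the heartbeat URL from the environment; None or empty means disabled."""
    return environ.get(HEARTBEAT_ENV_VAR) or None


def ping_heartbeat(
    url: Optional[str],
    opener: Callable[..., object] = urllib.request.urlopen,
    timeout: float = 5.0,
) -> bool:
    """Ping the heartbeat URL if one is configured; return whether a ping was sent."""
    if not url:
        return False
    opener(url, timeout=timeout)
    return True
```

Passing `opener` as a parameter keeps the network call replaceable in tests; in a Django project the chosen alias would typically be handed to `QuerySet.using(...)`.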
  

## Which issue(s) this PR fixes

https://github.com/grafana/oncall-private/issues/1558

## Checklist

- [x] Tests updated
- [x] Documentation added
- [x] `CHANGELOG.md` updated
2023-03-17 10:14:08 +00:00
Vadim Stepanov
ea60c0d247
Inbound email integration (#837)
This PR adds the Inbound Email integration.

It is designed to support a variety of ESPs, but in prod we will use
Mailgun, so locally I tested it only with the Mailgun ESP.

**Important:**
To make it work on different clusters I'm planning to provide different
email domains for different regions, like ....@us.oncall.grafana.net,
...@eu.oncall.grafana.net

---------

Co-authored-by: Innokentii Konstantinov <innokenty.konstantinov@grafana.com>
2023-03-16 13:59:21 +08:00
Innokentii Konstantinov
747a2b2bc0
Fix insight_logs for mobile app backend (#1498) 2023-03-08 13:38:59 +00:00
Ildar Iskhakov
2e63a9ff08
Jinja2 based routes (#1319)
# What this PR does

This PR adds a new way to set up routes using the Jinja2 templating
language

<img width="1174" alt="Screenshot 2023-03-06 at 22 11 13"
src="https://user-images.githubusercontent.com/2262529/223134053-69d43c47-bb2a-4790-a16d-767425017a76.png">
<img width="1175" alt="Screenshot 2023-03-06 at 22 11 34"
src="https://user-images.githubusercontent.com/2262529/223134070-1e5ef82f-021c-4d5d-b255-b19bb3445641.png">


## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-03-08 16:42:18 +08:00
Innokentii Konstantinov
a50ec8fed2
Refactor get_user_verbal_for_team_for_slack. (#809)
Remove unused params from signature, rename
2023-03-07 10:09:37 +00:00
Innokentii Konstantinov
249e4067c4 Remove unused def render_resolution_notes_for_csv_report 2023-03-07 13:47:49 +08:00
Innokentii Konstantinov
6a5e75e083
Fix of templates api behaviour for public and private api (#1408)
# What this PR does

This PR fixes templates behaviour for the public and private APIs. It fixes
"reset to default" for templates from messaging backends and some minor
bugs. It also adds acknowledge signal and source link templates.

## Checklist

- [x] Tests updated
- [x] Documentation added
- [x] `CHANGELOG.md` updated
2023-03-01 16:32:15 +08:00
Matias Bordese
04c42e2796
Matiasb/fix task refresh ical when empty value (#1401)
This should fix task error as seen in logs, trying to parse an empty
string as ical value:
```
Task apps.schedules.tasks.refresh_ical_files.refresh_ical_file[] raised unexpected: ValueError("Found no components where exactly one is required: ''")
```
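
A guard of the following shape would avoid handing an empty string to the iCal parser. This is a sketch only, since the commit message does not show the actual fix; `parse_ical_or_none` and the injected `parser` callable are hypothetical names.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def parse_ical_or_none(raw: Optional[str], parser: Callable[[str], T]) -> Optional[T]:
    """Skip parsing when the stored iCal value is missing or blank.

    `parser` stands in for whatever function turns iCal text into a calendar
    object; it is injected so that this sketch stays library-independent.
    """
    if raw is None or not raw.strip():
        return None
    return parser(raw)
```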
2023-02-24 21:16:09 +00:00
Yulya Artyukhina
53af4783de
Fix the cause of retry of notify_all and notify_group tasks (#1376)
Fix the cause of retry of notify_all and notify_group tasks that was
related to an incorrect step order.
2023-02-23 09:28:13 +00:00
Innokentii Konstantinov
26a2bd9c91
Refactor maintenance (#1340)
# What this PR does
This PR simplifies the maintenance mode code.
1. Perform distribution/escalation maintenance checks in send_signal...
tasks.
2. Use usual alert distribution flow for the maintenance incident.
3. Decouple maintenance mode from Slack (everything except the
**notify_about_maintenance_action** methods; I don't want to make this
PR too big)

As a bonus from these changes, maintenance mode now mutes alert group
delivery in all chatops integrations, not only in Slack. (Before,
incidents that happened during maintenance were posted to Telegram and
MS Teams anyway.)

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-02-23 07:13:03 +00:00
Innokentii Konstantinov
c733d8b9f2
Cleanup ScenarioStep (#1213)
# What this PR does
This PR cleans up ScenarioStep. This is needed to simplify moving Slack to
the messaging backends in the future.

1. Introduce AlertGroupSlackService to move logic out of ScenarioStep.
This also makes it possible to stop importing ScenarioSteps in code not
related to processing Slack callbacks.
2. Remove tags from ScenarioSteps; they are unused.
3. Remove the ScenarioStep.dispatch method. It was just calling
ScenarioStep.process_scenario.
4. Remove the "action" param from process_scenario; it was unused.
5. Remove creation of SlackActionRecord on handling SlackEvents. We are
not using it, but it generates an INSERT query on most user-Slack
interactions.
6. Remove "random_prefix_for_routing" from ScenarioStep; it was unused.

## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated

---------

Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
2023-02-21 20:22:11 +01:00
Yulya Artyukhina
058665b8a8
Fix too long declare incident link (#1342)
# What this PR does

## Which issue(s) this PR fixes
Issue with too long declare incident link in Slack

## Checklist

- [x] `CHANGELOG.md` updated
2023-02-20 18:42:44 +08:00
Ildar Iskhakov
1b7ada4315
Add database migrations linter (#1020)
# What this PR does

This PR adds
[django-migration-linter](https://github.com/3YOURMIND/django-migration-linter)
to keep database migrations backwards compatible:

- we can automatically run migrations and they are zero-downtime, e.g.
  old code can work with the migrated database
- we can run and roll back migrations without worrying about data safety
- OnCall is deployed to multiple environments that the core team is not
  able to control

See [django-migration-linter
checklist](https://github.com/3YOURMIND/django-migration-linter/blob/main/docs/incompatibilities.md)
for the common mistakes and best practices


## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated

---------

Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
2023-02-06 16:01:37 +08:00
Matias Bordese
bc0276fb22
Keep track of direct paging schedule/importance in logs (#1269)
This will eventually allow improving responder information in the alert
group detail page
2023-02-02 09:21:31 -03:00
Vadim Stepanov
f80271a1f4
Return alert group ID in direct paging API (#1241)
# What this PR does
Make direct paging internal API endpoint return an alert group ID.

## Which issue(s) this PR fixes
Related to https://github.com/grafana/oncall/issues/823

## Checklist

- [x] Tests updated
2023-01-30 11:48:25 +00:00
Ildar Iskhakov
ae44ee5652
Cache render_for_web field for alertgroups list serializer (#1236)
# What this PR does
This PR caches the `render_for_web` field with a lifetime of 1 day. The
cache becomes invalid if it was created before:
* last alert received
* template changed
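
The invalidation rule above can be expressed as a small predicate. This is a sketch under assumptions: `CachedRender`, `is_cache_valid`, and the timestamp arguments are hypothetical names, and the real serializer may store and compare these values differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

CACHE_LIFETIME = timedelta(days=1)  # lifetime stated in the PR description


@dataclass
class CachedRender:
    value: str            # the cached render_for_web payload
    created_at: datetime  # when the cache entry was written


def is_cache_valid(
    entry: CachedRender,
    now: datetime,
    last_alert_received_at: datetime,
    template_changed_at: datetime,
) -> bool:
    """Valid only if unexpired and created after both invalidating events."""
    if now - entry.created_at > CACHE_LIFETIME:
        return False
    if entry.created_at < last_alert_received_at:
        return False
    if entry.created_at < template_changed_at:
        return False
    return True
```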


## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-01-28 12:50:41 +08:00
Matias Bordese
dd27b3f2c5
Add schedules support for slack direct paging (#1183)
Related to #823
2023-01-25 09:10:50 -03:00
Yulya Artyukhina
de5d876d27
Refactor create/update contact points for Alerting integration (#872)
**What this PR does**:
- Keep grafana version on create/update contact points to avoid multiple
requests to alerting
- Add retry limit on create contact point async
- Fix bugs related to creating contact points
- Update logs on create/update contact point to make them clearer
- Avoid unnecessary requests to Grafana Alerting
2023-01-25 09:42:42 +01:00
Ildar Iskhakov
37d25b5b31
Optimize alert group filtering queries (#1191)
# What this PR does

## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-01-23 16:07:55 +08:00
Michael Derynck
cc3fdab8fb
Fix UnboundLocalError in webhooks (#1165)
Fix error where rendered_data was being used without being defined.
2023-01-19 15:50:22 -07:00
Vadim Stepanov
ccae9d86b3
Add an ability to use an escalation chain for direct paging (#1161)
# What this PR does
Adds an ability to page an escalation chain for a newly created direct
paging alert group using the internal API. Also [adds a forgotten
migration](32fc44e744)
related to the direct paging backend.
Related to https://github.com/grafana/oncall/issues/823

## Checklist

- [x] Tests updated
- [ ] Documentation added (N/A)
- [ ] `CHANGELOG.md` updated (N/A)
2023-01-19 18:51:57 +00:00
Yulya Artyukhina
d5461866d1
Add a dummy step for declare incident button in slack (#1157)
Add a dummy step for declare incident button to prevent raising 'Step is
undefined' exception because Slack sends a POST request to the backend
upon clicking a button with a redirect link to Incident.
This PR doesn't change any functionality.
2023-01-19 14:50:02 +01:00
Matias Bordese
90def88752
Add escalation chain option when creating a direct page alert group (#1143)
Also changes the default integration used when creating an alert group
for a direct page to a custom manual integration to avoid
conflicts/unexpected behaviors with existing manual alerts.
2023-01-18 12:58:26 -03:00
Matias Bordese
d3062b56fd
Draft initial logic for user/schedule paging (#1098)
Co-authored-by: Vadim Stepanov <vadimkerr@gmail.com>
2023-01-17 12:19:08 -03:00
Yulya Artyukhina
9129a720ef
Integration with grafana incident (#1081)
Check if Grafana Incident is enabled. If it is, add a button with a link
to declare Grafana Incident from Alert group in Slack and on Web.

Co-authored-by: Yulia Shanyrova <yulia.shanyrova@grafana.com>
2023-01-17 13:04:50 +01:00
Tommy
5bd8fbdef8
Add alert groups state filter (#1133)
# What this PR does
This PR adds a new parameter (`state`) to the alert_group public API to
filter alert groups by state

## Which issue(s) this PR fixes
https://github.com/grafana/oncall/issues/684

## Checklist

- [x] Tests updated
- [x] Documentation added
- [x] `CHANGELOG.md` updated

Co-authored-by: Vadim Stepanov <vadimkerr@gmail.com>
2023-01-17 10:28:29 +00:00
Innokentii Konstantinov
fa6906a606
Simplify and speed up slack rendering (#1105)
Simplify and speed up slack rendering.
2023-01-10 15:41:38 +08:00
Joey Orlando
802e3964e9
update mobile app push notification text + make telegram alert verbiage consistent ("Firing" instead of "Alerting") (#1089) 2023-01-05 16:16:43 +01:00
Michael Derynck
7c26eb559b
Improve handling of template exceptions during group data creation (#1068)
# What this PR does
With the addition of tighter controls on jinja templates, handle
exceptions while rendering group data as follows:
- Title will cache the error message as the title and display it to the
user, and the error will be logged
- Group distinction will be left as None and the error will be logged
- Is resolve signal will be treated as False and the error will be
logged
- Is acknowledge signal will be treated as False and the error will be
logged
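
The per-field fallbacks listed above could be sketched as follows. This illustrates the described behaviour and is not the PR's code: `safe_render`, `build_group_data`, and the render callables are hypothetical, and the real implementation renders Jinja templates rather than calling plain functions.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def safe_render(render: Callable[[], Any], fallback: Callable[[Exception], Any]) -> Any:
    """Run a render callable; on failure, log the error and return the fallback."""
    try:
        return render()
    except Exception as exc:  # broad on purpose: any template error gets a fallback
        logger.warning("template rendering failed: %s", exc)
        return fallback(exc)


def build_group_data(
    render_title: Callable[[], str],
    render_distinction: Callable[[], str],
    render_resolve: Callable[[], bool],
    render_acknowledge: Callable[[], bool],
) -> dict:
    """Apply the fallbacks from the list above, one per field."""
    return {
        # the error message becomes the title shown to the user
        "title": safe_render(render_title, lambda exc: str(exc)),
        # group distinction is left as None
        "group_distinction": safe_render(render_distinction, lambda exc: None),
        # both signals are treated as False
        "is_resolve_signal": safe_render(render_resolve, lambda exc: False),
        "is_acknowledge_signal": safe_render(render_acknowledge, lambda exc: False),
    }
```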

## Which issue(s) this PR fixes
https://github.com/grafana/oncall-private/issues/1542
2023-01-03 12:30:59 -07:00
Matias Bordese
05524ab698
Merge pull request #1059 from grafana/matiasb/truncate-slack-title-block
Truncate slack alert group title block below max size
2023-01-03 08:50:57 -03:00
Innokentii Konstantinov
5e297847ae Speedup alert group search 2023-01-03 11:04:16 +08:00
Matias Bordese
75aaeef3f2 Truncate slack alert group title block below max size 2023-01-02 10:07:53 -03:00
Innokentii Konstantinov
41f886b31e Speedup search alertgroup 2022-12-17 19:34:13 +08:00
Innokentii Konstantinov
7341641b3f
Introduce org uuid (#947)
* Introduce org uuid

* Rename uuid_with_org_id to uuid_with_org_uuid

Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
2022-12-06 22:42:58 +08:00
Joey Orlando
ffda80ae34
add permalinks.web attribute to alert group internal/public api response (#953) 2022-12-06 11:06:05 +01:00
Joey Orlando
9e598385f4
Add RBAC Support (#777)
* Modify plugin.json to support RBAC role registration

* defines 26 new custom roles in plugin.json. The main roles are:

- Admin: read/write access to everything in OnCall
- Reader: read access to everything in OnCall
- OnCaller : read access to everything in OnCall + edit access to Alert Groups and Schedules
- <object-type> Editor: read/write access to everything related to <object-type>
- <object-type> Reader: read access for <object-type>
- User Settings Admin: read/write access to all users' settings, not just own settings. This is in comparison to User Settings Editor which can only read/write own settings

* update changelog and documentation (#686)

* implement RBAC for OnCall backend

This commit refactors backend authorization. It tries to use RBAC authorization if the org's Grafana instance supports it, otherwise it falls back to basic role authorization.
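
The RBAC-with-fallback check described in this commit might be sketched like this. All names (`BasicRole`, `User`, `is_authorized`), the role ordering, and the permission string are assumptions for illustration, not the backend's actual classes.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import FrozenSet


class BasicRole(IntEnum):
    """Hypothetical ordering: a higher value means more privileges."""
    VIEWER = 1
    EDITOR = 2
    ADMIN = 3


@dataclass(frozen=True)
class User:
    role: BasicRole
    permissions: FrozenSet[str] = frozenset()


def is_authorized(
    user: User,
    required_permission: str,
    fallback_role: BasicRole,
    rbac_enabled: bool,
) -> bool:
    """Use RBAC permissions when supported, otherwise compare basic roles."""
    if rbac_enabled:
        return required_permission in user.permissions
    return user.role >= fallback_role
```

Each protected action carries both a permission string and a fallback role, so the same check works whether or not the org's Grafana instance supports RBAC.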

* update RBAC backend tests

* add tests for RBAC changes
- run backend tests as a matrix where RBAC is enabled/disabled. When RBAC is enabled, the permissions granted are read from the role grants in the frontend's plugin.json file (instead of relying on what we specify in RBACPermission.Permissions)
- remove --reuse-db --nomigrations flags from engine/tox.ini
- minor autoformatting changes to docker-compose-developer.yml

* remove --ds=settings.ci-test from pytest CI command

DJANGO_SETTINGS_MODULE is already specified as an env var so this is just unnecessary duplication

* update gitignore

* update github action job name for "test"

* RBAC frontend changes

* refactors the use of basic roles (ex. Viewer, Editor, Admin) to use RBAC permissions (when supported), falling back to basic roles when RBAC is not supported.

- updates the UserAction enum in grafana-plugin/src/state/userAction.ts. Previously this was hardcoded to a list of strings that were being returned by the OnCall API. Now the values here correspond to the permissions in plugin.json (plus a fallback role)

* changes per Gabriel's comments:
- get rid of group attribute in rbac roles
- remove displayName role attribute
- remove hidden role attribute
- add back role to includes section

* don't try to update user timezone if they don't have permission
2022-11-29 09:41:56 +01:00
Michael Derynck
3582f9b08f
Improve Jinja Template feedback and error handling (#884)
* Improve feedback so template errors are given to user

* Add security error logging

* Add limits for templates, payloads, results

* Show popup error notification for webhook errors and template errors that don't have a result

* Update tests

* Split exceptions into warnings/errors to give more control when previewing, rendering, saving templates

* Limit title lengths

* Make TypeError a warning

* Adjust title length limit

* Remove length limiting on urlize since it is being done on template render

* Fix tests

* Add KeyError and ValueError to warnings

* No longer enforcing json result when saving webhook in case it is dependent on payload

* Add tests for expected exceptions coming from apply_jinja_template

* Update changelog

* Send raw post if template result is not JSON
2022-11-28 09:46:51 -07:00
Vadim Stepanov
dc6fcf5c05
Add internal API fields for the mobile app (#910)
* add permalinks list to internal API alertgroup view

* add user's name and full avatar URL to the user view

* make avatar_full_url a property

* fix tests

* fix user connection criteria
2022-11-28 15:52:31 +00:00
Vadim Stepanov
255964ceaf
Mobile app messaging backend (#874)
* move mobile notifications to a separate backend, remove critical notification

* remove outdated mobile app code

* MOBILE_APP_PUSH_NOTIFICATIONS_ENABLED -> FEATURE_MOBILE_APP_INTEGRATION_ENABLED

* create error log if no devices are set up

* move mobile auth related code to the mobile_app Django app

* fix typing

* add GCMDevice todos

* add user connection capabilities

* add user connect/disconnect to the messaging backend

* move APNS endpoint to mobile_app Django app

* restore critical notifications

* support hackathon app

* tweak migrations so mobile app auth tokens are preserved

* reuse notify_by IDs

* use mobile app template to render push notification

* add GCM/FCM (Android) support

* fix unlink user

* logger.error -> logger.info
2022-11-23 15:56:43 +00:00
Innokentii Konstantinov
0816813237 Handle 404 for get_alerting_config 2022-11-18 17:07:39 +08:00
Innokentii Konstantinov
f9a9c1d978
Cleanup on deletion/archival of slack channel (#822)
* Cleanup on deletion/archival of slack channel

* Bulk update of organizations, filter channel filters by org

* Optimize org bulk update
2022-11-16 17:56:05 +08:00
Michael Derynck
25826690a8 Use common environment for templates 2022-11-05 00:31:51 -06:00
Joey Orlando
627afe37e1
Remove references to Alert.migrator_lock attribute
This commit patches an issue related to #708.

#708 forgot to remove attributes on models outside of the migration_tool django app that were referencing model attributes from migration_tool.

The only attribute that referenced a field in migration_tool was migrator_lock on the Alert model. This commit removes any references to that attribute.
2022-10-27 13:52:03 +02:00
Matias Bordese
2c8c66a8c8 Not previously handled backends (eg. mobile) could end here without a messaging backend 2022-10-26 09:30:13 -03:00
Michael Derynck
a37df38930 Merge branch 'dev' into mderynck/add-check-notify-group-task 2022-10-25 12:50:12 -06:00
Matias Bordese
8e2bcf5274 Fix failing test related to users org caching 2022-10-25 14:27:27 -03:00
Michael Derynck
ef097fcdd9 Add check for usergroup to notify group task 2022-10-25 10:23:19 -06:00
Innokentii Konstantinov
2c6a27154f
Support multiregion telegram (#676)
* Support multiregion telegram

* Fix test_personal_message

* Fix tg verification code tests

* Simplify /start cmd handler

* Comment about link with org_id in tg msg
2022-10-25 14:53:07 +08:00
Matias Bordese
eb32fa7ba0 Handle scenario when multiple general team manual integrations are available 2022-10-21 14:23:45 -03:00