Commit graph

594 commits

Author SHA1 Message Date
Ildar Iskhakov
ae44ee5652
Cache render_for_web field for alertgroups list serializer (#1236)
# What this PR does
This PR caches the field `render_for_web` with lifetime 1 day and cache
becomes invalid if it was created before
* last alert received
* template changed


## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-01-28 12:50:41 +08:00
Matias Bordese
e0ae9919c7
Add paging for direct paging users in slack dialog (#1232)
Fixes issue when there are more than 100 users to be listed in the
direct pagination responders select. Alternatively we should consider
moving to an `external_select` block later.
2023-01-27 14:10:44 -03:00
Ildar Iskhakov
4a8011d236
Add silk setting to store .prof files in the specific folder and share it between uwsgi workers (#1228)
# What this PR does

## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-01-26 20:33:04 +08:00
Ildar Iskhakov
a6a781320d
Set SILKY_PYTHON_PROFILER_BINARY setting to False by default (#1218)
# What this PR does
Here is the example of the visualisation with `snakeviz`
<img width="1126" alt="Screenshot 2023-01-25 at 22 15 49"
src="https://user-images.githubusercontent.com/2262529/214586753-ad49a002-27e1-4e44-82f2-4ad5f4e40101.png">


## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-01-25 22:17:17 +08:00
Matias Bordese
dd27b3f2c5
Add schedules support for slack direct paging (#1183)
Related to #823
2023-01-25 09:10:50 -03:00
Joey Orlando
3cf2fcf660
optimize GET /schedules internal API endpoint (#1169)
# What this PR does

Fixes slow internal`GET /schedules` endpoints. Using the fake-data
generation script in #1128, I generated 65 calendar schedules in my
local setup. This resulted in the following endpoint performance:
![Screenshot 2023-01-24 at 12 03
16](https://user-images.githubusercontent.com/9406895/214276618-1a9848ba-eb84-49ec-a099-fdd96beac93f.png)

The responses which show ~76 queries were run on the latest `dev`
branch. Responses w/ ~26 queries were run on this branch.

Additionally:
- add typing to a few methods in `apps/schedules/ical_utils.py`
- document `apps/api/permissions/__init__.py:user_is_authorized`
function

## Which issue(s) this PR fixes

https://github.com/grafana/oncall-private/issues/1552

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated

Co-authored-by: Vadim Stepanov <vadimkerr@gmail.com>
2023-01-25 11:08:09 +01:00
Yulya Artyukhina
de5d876d27
Refactor create/update contact points for Alerting integration (#872)
**What this PR does**:
- Keep grafana version on create/update contact points to avoid multiple
requests to alerting
- Add retry limit on create contact point async
- Fix bugs related on create contact point
- Update logs on create/update contact point, make them more clear
- Avoid unnecessary requests to Grafana Alerting
2023-01-25 09:42:42 +01:00
Ildar Iskhakov
1fc3f6d301
Refactor plugin sync (#1200)
# What this PR does

This PR adds a shortcut in the plugin synchronisation process, so the
existing users will be able login without waiting for the sync task.
Every request still starts the background synchronisation task, to be
able to propagate the organisation changes faster than periodic task. It
means that we don't necessarily need "force reload" button in the
interface.
For all the other cases (user does not exist, organisation token "not
ok", etc) process remains same - plugin will show "Initialising
plugin..." until the background task in successfully completed

Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
2023-01-25 09:12:08 +08:00
Vadim Stepanov
cf1a1cd7f3
Remove DynamicSetting usage for mobile app backend on OSS (#1204)
# What this PR does
Make so there's no need to populate `mobile_app_settings` DynamicSetting
when using the OSS license to turn on the mobile app backend.
2023-01-24 13:53:54 +00:00
Ildar Iskhakov
46b39b2c87
Remove resolved and acknowledged filters as we switched to status (#1201)
# What this PR does

## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-01-24 18:13:21 +08:00
Innokentii Konstantinov
cfa7fb816c
Sync users and teams on tf requests (#1180)
# What this PR does
This PR add sync with grafana on requests from terraform 

## Which issue(s) this PR fixes
It's needed to fix case when customers want to create team via grafana
terraform provider and use it in the oncall provider without having to
log into Grafana Cloud.

Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
2023-01-24 13:44:07 +08:00
Vadim Stepanov
ae5949aa7e
Allow viewers fetch cloud connection status (#1181)
# What this PR does
Fixes the issue when users with the viewer role can't fetch the cloud
connection status, which makes the plugin fail to load for viewers. This
PR makes the cloud connection endpoint use `OTHER_SETTINGS_READ` for
fetching the cloud connection status instead of `OTHER_SETTINGS_WRITE`.

## Checklist

- [x] Tests updated
- [x] `CHANGELOG.md` updated
2023-01-23 11:17:57 +00:00
Ildar Iskhakov
37d25b5b31
Optimize alert group filtering queries (#1191)
# What this PR does

## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-01-23 16:07:55 +08:00
Dan Cech
639fd81644
Update message when user needs to connect their profile (#1190)
# What this PR does

This just tweaks the message users get when they try to interact via
slack but haven't connected their profile, it fixes a typo and
streamlines the text.
2023-01-23 08:44:33 +01:00
Ildar Iskhakov
b90fe433c9
Optimize alertgroups endpoint (#1189)
# What this PR does

Changing query to retrieve alert group in two completely different
queries instead of one with `join`

new queries
```
SELECT alerts_alertreceivechannel.id
FROM alerts_alertreceivechannel
WHERE (alerts_alertreceivechannel.deleted_at IS NULL
       AND alerts_alertreceivechannel.organization_id = 8
       AND alerts_alertreceivechannel.team_id IS NULL)


SELECT `alerts_alertgroup`.`id`
FROM `alerts_alertgroup`
WHERE (`alerts_alertgroup`.`channel_id` IN (2,33,34,35,36,40,52,59,61,62,63,70,76,89,93,94,03,08,09,10,12,13,16,18,20,22,23,24,26,27,28,30,31,33,34,35,36,40,41,42,43,45,48,53,56,57,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,86,87,88,89,91,93,23,27,29,31,32,33,55,56,57,58,65,69,72,75,81,13,17,20,22,33,34,38,39,41,44,45,46,51,52,55,56,58,59,60,63,68,70,71)
       AND NOT `alerts_alertgroup`.`is_archived`
       AND NOT `alerts_alertgroup`.`is_archived`
       AND `alerts_alertgroup`.`root_alert_group_id` IS NULL
       AND ((NOT `alerts_alertgroup`.`silenced`
             AND NOT `alerts_alertgroup`.`acknowledged`
             AND NOT `alerts_alertgroup`.`resolved`)
            OR (`alerts_alertgroup`.`acknowledged`
                AND NOT `alerts_alertgroup`.`resolved`))
       AND NOT `alerts_alertgroup`.`is_archived`)
ORDER BY `alerts_alertgroup`.`id` DESC
LIMIT 26
```

## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-01-22 00:53:11 +08:00
Ildar Iskhakov
c9b83906a0
Optimize alertgroups endpoint (#1188)
# What this PR does

Changing query to retrieve alert group in two requests instead of one
with `join`

old query:
```
SELECT `alerts_alertgroup`.`id`
FROM `alerts_alertgroup`
INNER JOIN `alerts_alertreceivechannel` ON (`alerts_alertgroup`.`channel_id` = `alerts_alertreceivechannel`.`id`)
WHERE (`alerts_alertreceivechannel`.`organization_id` = 1
       AND `alerts_alertreceivechannel`.`team_id` IS NULL
       AND NOT `alerts_alertgroup`.`is_archived`
       AND NOT `alerts_alertgroup`.`is_archived`
       AND `alerts_alertgroup`.`root_alert_group_id` IS NULL
       AND ((NOT `alerts_alertgroup`.`silenced`
             AND NOT `alerts_alertgroup`.`acknowledged`
             AND NOT `alerts_alertgroup`.`resolved`)
            OR (`alerts_alertgroup`.`acknowledged`
                AND NOT `alerts_alertgroup`.`resolved`))
       AND NOT `alerts_alertgroup`.`is_archived`)
ORDER BY `alerts_alertgroup`.`id` DESC
LIMIT 26
```

new query:
```
SELECT "alerts_alertgroup"."id"
FROM "alerts_alertgroup"
WHERE ("alerts_alertgroup"."channel_id" IN
         (SELECT U0."id"
          FROM "alerts_alertreceivechannel" U0
          WHERE (NOT (U0."integration" = maintenance)
                 AND U0."deleted_at" IS NULL
                 AND U0."organization_id" = 1
                 AND U0."team_id" IS NULL))
       AND NOT "alerts_alertgroup"."is_archived"
       AND NOT "alerts_alertgroup"."is_archived"
       AND "alerts_alertgroup"."root_alert_group_id" IS NULL
       AND ((NOT "alerts_alertgroup"."silenced"
             AND NOT "alerts_alertgroup"."acknowledged"
             AND NOT "alerts_alertgroup"."resolved")
            OR ("alerts_alertgroup"."acknowledged"
                AND NOT "alerts_alertgroup"."resolved"))
       AND NOT "alerts_alertgroup"."is_archived")
ORDER BY "alerts_alertgroup"."id" DESC
LIMIT 26
```


## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-01-22 00:14:48 +08:00
Ildar Iskhakov
83b1f069d0
Optimize alertgroups endpoint (#1186)
# What this PR does

## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-01-21 21:59:20 +08:00
Vadim Stepanov
2b0abf018c
Hide direct paging integrations (#1162)
# What this PR does
Hide direct paging integrations from the web UI. Related to
https://github.com/grafana/oncall/issues/823

## Checklist

- [x] Tests updated
- [ ] Documentation added (N/A)
- [ ] `CHANGELOG.md` updated (N/A)
2023-01-20 13:29:57 +00:00
Ildar Iskhakov
0a00d3e2c1
Update base.py 2023-01-20 20:20:51 +08:00
Matias Bordese
693b5a41c4
Add slack command to trigger direct paging (#1154)
Slash command needs to be added to slack app manifest:

```
  slash_commands:
    - command: /escalate
      url: https://<oncall-public-url>/slack/interactive_api_endpoint/
      description: Create a new alert group escalation
      should_escape: false
```
2023-01-20 09:06:27 -03:00
Ildar Iskhakov
aec54707ec
Add pyroscope integration (#1176)
# What this PR does

## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-01-20 18:47:16 +08:00
Ildar Iskhakov
c06709fdb6
Move silk profiler under env variable setting (#1175)
# What this PR does

This PR moves silk profiler under the settings flag which can be
configured with env vars. It will allow us to enable silk on the
clusters, e.g. dev

## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-01-20 18:19:31 +08:00
Joey Orlando
98241b9a10
fake-data generation script + fixes for django-silk and django-debug-toolbar (#1128)
# What this PR does

## Main stuff

- add Python script to populate local Grafana/OnCall setup w/ large
amounts of fake data. Right now the data types that can be generated
are:
- teams and Admin users via the Grafana API (must be synced manually by
going into the UI before going onto the next step)
- Calendar Schedules which have three 8h oncall-shifts, via the OnCall
public API
- fixes `django-debug-toolbar` when being run in `docker-compose`
locally

## Other stuff
- documents how to easily modify the Grafana `docker-compose` container
provisioning configuration
- document solutions for two backend setup related issues encountered
when running the engine/celery workers locally, outside of
`docker-compose`, on an Apple silicon Mac
- fixes small bug in `grafana_plugin.helpers.client.APIClient.call_api`
where it would call `response.json()` for all requests, regardless of
whether or not the response actually contained data or not
- in `engine/settings/dev.py`, properly setup `django-silk` and document
the steps to use it locally
- make it possible to log out debug SQL queries by specifying
`DEV_DEBUG_VIEW_SQL_QUERIES` env var, rather than having to uncomment
out a section of `settings/dev.py`

## Which issue(s) this PR fixes

- Some local setup issues when trying to use `django-silk` and
`django-debug-toolbar`
- Makes it much easier to populate your local setup with a lot of fake
data
- Makes it possible to easily modify your local grafana's provisioning
configuration

## Checklist

- [ ] Tests updated (N/A)
- [ ] Documentation added (N/A)
- [ ] `CHANGELOG.md` updated (N/A)
2023-01-20 09:19:41 +01:00
Michael Derynck
cc3fdab8fb
Fix UnboundLocalError in webhooks (#1165)
Fix error where rendered_data was being used without being defined.
2023-01-19 15:50:22 -07:00
Vadim Stepanov
ccae9d86b3
Add an ability to use an escalation chain for direct paging (#1161)
# What this PR does
Adds an ability to page an escalation chain for a newly created direct
paging alert group using the internal API. Also [adds a forgotten
migration](32fc44e744)
related to the direct paging backend.
Related to https://github.com/grafana/oncall/issues/823

## Checklist

- [x] Tests updated
- [ ] Documentation added (N/A)
- [ ] `CHANGELOG.md` updated (N/A)
2023-01-19 18:51:57 +00:00
Yulya Artyukhina
d5461866d1
Add a dummy step for declare incident button in slack (#1157)
Add a dummy step for declare incident button to prevent raising 'Step is
undefined' exception because Slack sends a POST request to the backend
upon clicking a button with a redirect link to Incident.
This pr doesn't change any functionality
2023-01-19 14:50:02 +01:00
Vadim Stepanov
29f67dc2f3
Fix circular import 2023-01-19 11:53:05 +00:00
Vadim Stepanov
6b87ad74e9
Enforce cloud connection to send push notifications on OSS (#1132)
This PR modifies how OSS instances send mobile app push notifications.
It also adds frontend warnings when user is trying to use the mobile app
without connecting to cloud.

- [x] Add public API authentication to `FCMRelayView` and throttle the
view to 300 push notifications per instance per minute. This is similar
to how SMS and phone call notifications work on OSS instances.
- [x] Add frontend warnings based on cloud connectivity
- [x] Fix/add frontend tests
- [x] Add tests for FCMRelayView and mobile app backend

## Screenshots

When a user tries to connect the mobile app in his settings and cloud is
not connected (clicking "Connect Cloud OnCall" redirects to the "Cloud"
tab):

<img width="1088" alt="Screenshot 2023-01-12 at 18 48 58"
src="https://user-images.githubusercontent.com/20116910/212156591-86906020-eddf-43f1-9402-7ebb7547c7e6.png">

When a user tries to use mobile push notifications as a personal
notification step and cloud is not connected:

<img width="764" alt="Screenshot 2023-01-12 at 19 01 10"
src="https://user-images.githubusercontent.com/20116910/212157580-9abb0758-79ad-4316-b8cd-15b4fff01502.png">

Now on the "Cloud" tab there's some info about the mobile app (the last
section at the bottom of the page):

<img width="1245" alt="Screenshot 2023-01-12 at 18 49 10"
src="https://user-images.githubusercontent.com/20116910/212156997-c8b70dd5-bf15-4bc7-8eb8-9decdb8ecc80.png">

After connecting to the cloud instance, everything goes back to active
and it's now possible to connect the mobile app:

<img width="1091" alt="Screenshot 2023-01-12 at 19 08 27"
src="https://user-images.githubusercontent.com/20116910/212158811-60d49888-4714-4c0e-850f-3ff6a11a117a.png">

After connecting the app the warning is gone:

<img width="764" alt="Screenshot 2023-01-12 at 19 07 00"
src="https://user-images.githubusercontent.com/20116910/212158614-677ab889-127f-4d64-bacc-0c26887f3097.png">
2023-01-19 11:15:56 +00:00
Vadim Stepanov
c93ee5c554
Send a Slack DM when user is not in channel (#1144)
# What this PR does

Currently, when a user gets mentioned in an alert group thread and the
user is not in the Slack channel, the Slack bot sends the following to
the channel:

> ⚠️ Tried to ask USER to look at incident. Unfortunately USER is
not in this channel. Please, invite.

This PR changes this behaviour to instead send a direct message to the
user. The message contains a link to the main alert group message in
Slack.

<img width="806" alt="Screenshot 2023-01-17 at 19 25 36"
src="https://user-images.githubusercontent.com/20116910/212996457-02db183f-2041-4998-b743-bd5b6c84b7b5.png">


## Checklist

- [ ] Tests updated (N/A)
- [ ] Documentation added (N/A)
- [x] `CHANGELOG.md` updated
2023-01-18 16:08:15 +00:00
Matias Bordese
90def88752
Add escalation chain option when creating a direct page alert group (#1143)
Also changes the default integration used when creating an alert group
for a direct page to a custom manual integration to avoid
conflicts/unexpected behaviors with existing manual alerts.
2023-01-18 12:58:26 -03:00
Vadim Stepanov
b8d78fd6bb
Allow messaging backends to be enabled/disabled per organization (#1151)
# What this PR does
Allows messaging backends to be enabled/disabled per organization when
getting a list of available personal notification channels.

## Checklist

- [x] Tests updated
- [ ] Documentation added (N/A)
- [x] `CHANGELOG.md` updated
2023-01-18 15:52:25 +00:00
Matias Bordese
d3062b56fd
Draft initial logic for user/schedule paging (#1098)
Co-authored-by: Vadim Stepanov <vadimkerr@gmail.com>
2023-01-17 12:19:08 -03:00
Yulya Artyukhina
9129a720ef
Integration with grafana incident (#1081)
Check if Grafana Incident is enabled. If it is, add a button with a link
to declare Grafana Incident from Alert group in Slack and on Web.

Co-authored-by: Yulia Shanyrova <yulia.shanyrova@grafana.com>
2023-01-17 13:04:50 +01:00
Tommy
5bd8fbdef8
Add alert groups state filter (#1133)
# What this PR does
This PR added a new parameter (state) into the alert_group public API to
filter the state of the alert groups

## Which issue(s) this PR fixes
https://github.com/grafana/oncall/issues/684

## Checklist

- [x] Tests updated
- [x] Documentation added
- [x] `CHANGELOG.md` updated

Co-authored-by: Vadim Stepanov <vadimkerr@gmail.com>
2023-01-17 10:28:29 +00:00
Vadim Stepanov
59f2c293e7
Move FCM relay logic into a celery task (#1137) 2023-01-13 19:28:34 +00:00
Matias Bordese
0d38fe2a7f
Web schedules overrides are the higher priority level (#1115)
Related to https://github.com/grafana/oncall-private/issues/1550
2023-01-13 08:58:35 -03:00
Innokentii Konstantinov
9a3b53ff34
Delete slack_connector on org soft-delete (#1127) 2023-01-12 17:37:05 +08:00
Joey Orlando
babacf4da8
refactor the is_rbac_permissions_enabled check to be more robust (#1099)
# What this PR does
Checks the `is_rbac_permissions_enabled` flag differently based on
whether we are dealing with an open-source, or cloud installation:
- for open-source installations, simply continue making a `HEAD` request
to the list RBAC permissions Grafana API endpoint.
- for cloud installations, use the `config` object returned from `GET
/instances/{instance_id}?config=true` and check whether
`instance_info["config"]["feature_toggles"]["accessControlOnCall"] ==
"true"`

## Which issue(s) this PR fixes
Resolves the issue in hosted grafana where when a stack is inactive, the
hosted grafana gateway, returns 200 to the `HEAD` request (which
erroneously sets the `is_rbac_permissions_enabled` flag to `true`)

## Checklist

- [x] Tests updated (N/A)
- [ ] Documentation added
- [x] `CHANGELOG.md` updated
2023-01-11 12:48:30 +01:00
Vadim Stepanov
231c0f45a3
Re-implement FCM relay after introducing firebase (#1121)
This PR changes how `FCMRelayView` handles push notifications from OSS
instances, also changing how the mobile app backend sends push
notifications.
2023-01-11 11:42:01 +00:00
Innokentii Konstantinov
fa6906a606
Simplify and speed up slack rendering (#1105)
Simplify and speed up slack rendering.
2023-01-10 15:41:38 +08:00
Matias Bordese
abbb5a8381
Update schedules query not to defer ical fields used for on-call check (#1114)
Do not defer cached ical fields which are later needed for calculating
on-call users listed in the schedules list page.
2023-01-09 14:10:23 -03:00
Matias Bordese
87fad5eec1
Add select_related to fetch schedules user group information (#1109) 2023-01-09 13:15:27 -03:00
Vadim Stepanov
a1e4f72280
remove send_link_to_channel_message_or_fallback_to_full_incident 2023-01-06 11:34:11 +00:00
Salvatore Giordano
a3dbe95d5a
fix: update android notification payload (#1090) 2023-01-05 17:36:07 +01:00
Joey Orlando
802e3964e9
update mobile app push notification text + make telegram alert verbage consistent ("Firing" instead of "Alerting") (#1089) 2023-01-05 16:16:43 +01:00
Innokentii Konstantinov
8abbcee050
Org soft-delete (#1073)
# What this PR does
It introduces soft-delete of organization, since grafana stacks are
soft-deleted too. Also, we had a problem with deleting orgs with large
amounts of alerts, so soft-deletion will fix this problem. I think, that
problem of cleaning alerts of deleted orgs should be solved as a part of
alert retention
2023-01-05 12:42:55 +08:00
Vadim Stepanov
0d4701bd81
Change wording from "incident" to "alert group" in the Telegram app (#1052)
# What this PR does
Makes Telegram integration consistent with the rest of the system so it
uses the word "alert group" instead of "incident" when referring to
alert groups.

## Checklist

- [x] Tests updated
- [ ] Documentation added (N/A)
- [x] `CHANGELOG.md` updated
2023-01-04 17:44:01 +00:00
Vadim Stepanov
7a1f176cb5
Schedule score backend (#338)
This PR adds an endpoint returning a schedule quality score, overloaded
users and comments on the existing issues (e.g. balance issues or gaps).

## Limitations
- Since working hours editor is not implemented yet, there are only two
scores taken into account: balance score and a score representing the
ratio of time when someone is on-call to the whole time period.
- Time period is now set to be constant (90 days from today), so **in
some cases the results will be inaccurate** (when rotations don't align
with the time period)
- It only takes primary rotations into account (overrides are ignored)

## Usage
`GET /api/internal/v1/schedules/<pk>/quality?date=<TOMORROW_DATE>`

Note that `date` should be tomorrow date, because we can only be sure
about changing tomorrow's shifts (some of the shifts for current day
could be "deleted" but still show up in the UI).

## Example response
```json
{
  "total_score": 90,
  "comments": ["Schedule has no gaps", "Schedule is well-balanced, but still can be improved"],
  "overloaded_users": ["USSZ5WRH2CUA9", "U74XJZSSQGBIH"]
}
```

Issue: #118
2023-01-04 16:49:58 +00:00
Ildar Iskhakov
15256bc4cf Remove local uwsgi instrumentation and local development tempo and agent 2023-01-04 22:25:17 +08:00
Ildar Iskhakov
dd6e716ea4 Add uwsgidecorators to requirements.txt 2023-01-04 20:46:52 +08:00