Commit graph

607 commits

Author SHA1 Message Date
Ildar Iskhakov
335c8fe65b
Optimize alert and alert group public api endpoints, add filter by id (#1274)
# What this PR does

## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated

---------

Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
2023-02-03 17:05:08 +08:00
Matvey Kukuy
038310829b
Mobile app documentation draft. (#1207)
# What this PR does

First draft of documentation. @alyssawada please use it as a starting
point :)

## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [x] Documentation added
- [ ] `CHANGELOG.md` updated

---------

Co-authored-by: alyssa wada <alyssa.wada@grafana.com>
Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
Co-authored-by: Alyssa Wada <101596687+alyssawada@users.noreply.github.com>
Co-authored-by: Vadim Stepanov <vadimkerr@gmail.com>
2023-02-02 15:06:28 +00:00
Vadim Stepanov
2218161069
Fix test 2023-02-02 14:28:37 +00:00
Vadim Stepanov
08dbab73d2
Remove mobile_app_settings DynamicSetting (#1268)
# What this PR does
Remove checks for `mobile_app_settings` DynamicSetting, so changing
`FEATURE_MOBILE_APP_INTEGRATION_ENABLED` is enough for toggling the
mobile app backend (aka remove per-org feature flag)

Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
2023-02-02 13:21:04 +00:00
Matias Bordese
bc0276fb22
Keep track of direct paging schedule/importance in logs (#1269)
This will eventually allow to improve responders information in an alert
group detail page
2023-02-02 09:21:31 -03:00
Vadim Stepanov
9b709e86c9
Fix local dev setup slowness (#1270)
# What this PR does
Fixes an issue when a local dev setup becomes extremely slow.

- Set `DEBUG` and `SILK_PROFILER_ENABLED` to `False` by default + add
utility make commands to toggle it
- Use `uwsgi` instead of Django's built-in `runserver` for local dev
setup
- Limit Celery concurrency to 3 for local dev setup (previously was 20,
used >1GB RAM on my machine)

---------

Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
2023-02-02 09:08:48 +00:00
Ildar Iskhakov
df1517573e
Cache web template rendered fields for alert and alertgroup endpoints (#1261)
# What this PR does
This PR adds same approach as introduced
[here](https://github.com/grafana/oncall/pull/1236) to all alert and
alertgroup endpoints

## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated

---------

Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
2023-02-02 11:37:52 +08:00
Vadim Stepanov
b7176888ed
Better FCM error handling / retries (#1267)
# What this PR does
Raise `FirebaseError` in celery tasks contacting FCM instead of just
logging it + add tests

## Checklist

- [x] Tests updated
2023-02-01 14:45:32 +00:00
Matias Bordese
3e15b8cd85
Add default slack channel info to direct paging dialog (#1263) 2023-02-01 10:03:54 -03:00
Joey Orlando
16196822de
Add utility function to get readonly db key if defined (#1264)
# What this PR does

This is a minor refactor before implementing
https://github.com/grafana/oncall-private/issues/1558.

Additionally, it cleans up a few spots where we do this:
```
# Re-take in case we are in the readonly db context.
```
We currently don't read anything from a read-only database, so this
should be not necessary.

## Checklist

- [x] Tests updated
- [ ] Documentation added (N/A)
- [ ] `CHANGELOG.md` updated (N/A)
2023-02-01 12:07:32 +01:00
Joey Orlando
94fe7979cf
add django-dbconn-retry library (#1262) 2023-01-31 20:17:54 +01:00
Matias Bordese
b1fc123d9f
Add a filter by involved users to alert groups page (#1240)
Related to #1119 

It also adds a shortcut to filter current user's related alert groups
(alert groups user was notified by, or in which user participated). Make
the filter visible by default, with a false value.
2023-01-30 14:08:18 +02:00
Vadim Stepanov
f80271a1f4
Return alert group ID in direct paging API (#1241)
# What this PR does
Make direct paging internal API endpoint return an alert group ID.

## Which issue(s) this PR fixes
Related to https://github.com/grafana/oncall/issues/823

## Checklist

- [x] Tests updated
2023-01-30 11:48:25 +00:00
Ildar Iskhakov
ae44ee5652
Cache render_for_web field for alertgroups list serializer (#1236)
# What this PR does
This PR caches the field `render_for_web` with lifetime 1 day and cache
becomes invalid if it was created before
* last alert received
* template changed


## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-01-28 12:50:41 +08:00
Matias Bordese
e0ae9919c7
Add paging for direct paging users in slack dialog (#1232)
Fixes issue when there are more than 100 users to be listed in the
direct pagination responders select. Alternatively we should consider
moving to an `external_select` block later.
2023-01-27 14:10:44 -03:00
Ildar Iskhakov
4a8011d236
Add silk setting to store .prof files in the specific folder and share it between uwsgi workers (#1228)
# What this PR does

## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-01-26 20:33:04 +08:00
Ildar Iskhakov
a6a781320d
Set SILKY_PYTHON_PROFILER_BINARY setting to False by default (#1218)
# What this PR does
Here is the example of the visualisation with `snakeviz`
<img width="1126" alt="Screenshot 2023-01-25 at 22 15 49"
src="https://user-images.githubusercontent.com/2262529/214586753-ad49a002-27e1-4e44-82f2-4ad5f4e40101.png">


## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-01-25 22:17:17 +08:00
Matias Bordese
dd27b3f2c5
Add schedules support for slack direct paging (#1183)
Related to #823
2023-01-25 09:10:50 -03:00
Joey Orlando
3cf2fcf660
optimize GET /schedules internal API endpoint (#1169)
# What this PR does

Fixes slow internal`GET /schedules` endpoints. Using the fake-data
generation script in #1128, I generated 65 calendar schedules in my
local setup. This resulted in the following endpoint performance:
![Screenshot 2023-01-24 at 12 03
16](https://user-images.githubusercontent.com/9406895/214276618-1a9848ba-eb84-49ec-a099-fdd96beac93f.png)

The responses which show ~76 queries were run on the latest `dev`
branch. Responses w/ ~26 queries were run on this branch.

Additionally:
- add typing to a few methods in `apps/schedules/ical_utils.py`
- document `apps/api/permissions/__init__.py:user_is_authorized`
function

## Which issue(s) this PR fixes

https://github.com/grafana/oncall-private/issues/1552

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated

Co-authored-by: Vadim Stepanov <vadimkerr@gmail.com>
2023-01-25 11:08:09 +01:00
Yulya Artyukhina
de5d876d27
Refactor create/update contact points for Alerting integration (#872)
**What this PR does**:
- Keep grafana version on create/update contact points to avoid multiple
requests to alerting
- Add retry limit on create contact point async
- Fix bugs related on create contact point
- Update logs on create/update contact point, make them more clear
- Avoid unnecessary requests to Grafana Alerting
2023-01-25 09:42:42 +01:00
Ildar Iskhakov
1fc3f6d301
Refactor plugin sync (#1200)
# What this PR does

This PR adds a shortcut in the plugin synchronisation process, so the
existing users will be able login without waiting for the sync task.
Every request still starts the background synchronisation task, to be
able to propagate the organisation changes faster than periodic task. It
means that we don't necessarily need "force reload" button in the
interface.
For all the other cases (user does not exist, organisation token "not
ok", etc) process remains same - plugin will show "Initialising
plugin..." until the background task in successfully completed

Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
2023-01-25 09:12:08 +08:00
Vadim Stepanov
cf1a1cd7f3
Remove DynamicSetting usage for mobile app backend on OSS (#1204)
# What this PR does
Make so there's no need to populate `mobile_app_settings` DynamicSetting
when using the OSS license to turn on the mobile app backend.
2023-01-24 13:53:54 +00:00
Ildar Iskhakov
46b39b2c87
Remove resolved and acknowledged filters as we switched to status (#1201)
# What this PR does

## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-01-24 18:13:21 +08:00
Innokentii Konstantinov
cfa7fb816c
Sync users and teams on tf requests (#1180)
# What this PR does
This PR add sync with grafana on requests from terraform 

## Which issue(s) this PR fixes
It's needed to fix case when customers want to create team via grafana
terraform provider and use it in the oncall provider without having to
log into Grafana Cloud.

Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
2023-01-24 13:44:07 +08:00
Vadim Stepanov
ae5949aa7e
Allow viewers fetch cloud connection status (#1181)
# What this PR does
Fixes the issue when users with the viewer role can't fetch the cloud
connection status, which makes the plugin fail to load for viewers. This
PR makes the cloud connection endpoint use `OTHER_SETTINGS_READ` for
fetching the cloud connection status instead of `OTHER_SETTINGS_WRITE`.

## Checklist

- [x] Tests updated
- [x] `CHANGELOG.md` updated
2023-01-23 11:17:57 +00:00
Ildar Iskhakov
37d25b5b31
Optimize alert group filtering queries (#1191)
# What this PR does

## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-01-23 16:07:55 +08:00
Dan Cech
639fd81644
Update message when user needs to connect their profile (#1190)
# What this PR does

This just tweaks the message users get when they try to interact via
slack but haven't connected their profile, it fixes a typo and
streamlines the text.
2023-01-23 08:44:33 +01:00
Ildar Iskhakov
b90fe433c9
Optimize alertgroups endpoint (#1189)
# What this PR does

Changing query to retrieve alert group in two completely different
queries instead of one with `join`

new queries
```
SELECT alerts_alertreceivechannel.id
FROM alerts_alertreceivechannel
WHERE (alerts_alertreceivechannel.deleted_at IS NULL
       AND alerts_alertreceivechannel.organization_id = 8
       AND alerts_alertreceivechannel.team_id IS NULL)


SELECT `alerts_alertgroup`.`id`
FROM `alerts_alertgroup`
WHERE (`alerts_alertgroup`.`channel_id` IN (2,33,34,35,36,40,52,59,61,62,63,70,76,89,93,94,03,08,09,10,12,13,16,18,20,22,23,24,26,27,28,30,31,33,34,35,36,40,41,42,43,45,48,53,56,57,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,86,87,88,89,91,93,23,27,29,31,32,33,55,56,57,58,65,69,72,75,81,13,17,20,22,33,34,38,39,41,44,45,46,51,52,55,56,58,59,60,63,68,70,71)
       AND NOT `alerts_alertgroup`.`is_archived`
       AND NOT `alerts_alertgroup`.`is_archived`
       AND `alerts_alertgroup`.`root_alert_group_id` IS NULL
       AND ((NOT `alerts_alertgroup`.`silenced`
             AND NOT `alerts_alertgroup`.`acknowledged`
             AND NOT `alerts_alertgroup`.`resolved`)
            OR (`alerts_alertgroup`.`acknowledged`
                AND NOT `alerts_alertgroup`.`resolved`))
       AND NOT `alerts_alertgroup`.`is_archived`)
ORDER BY `alerts_alertgroup`.`id` DESC
LIMIT 26
```

## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-01-22 00:53:11 +08:00
Ildar Iskhakov
c9b83906a0
Optimize alertgroups endpoint (#1188)
# What this PR does

Changing query to retrieve alert group in two requests instead of one
with `join`

old query:
```
SELECT `alerts_alertgroup`.`id`
FROM `alerts_alertgroup`
INNER JOIN `alerts_alertreceivechannel` ON (`alerts_alertgroup`.`channel_id` = `alerts_alertreceivechannel`.`id`)
WHERE (`alerts_alertreceivechannel`.`organization_id` = 1
       AND `alerts_alertreceivechannel`.`team_id` IS NULL
       AND NOT `alerts_alertgroup`.`is_archived`
       AND NOT `alerts_alertgroup`.`is_archived`
       AND `alerts_alertgroup`.`root_alert_group_id` IS NULL
       AND ((NOT `alerts_alertgroup`.`silenced`
             AND NOT `alerts_alertgroup`.`acknowledged`
             AND NOT `alerts_alertgroup`.`resolved`)
            OR (`alerts_alertgroup`.`acknowledged`
                AND NOT `alerts_alertgroup`.`resolved`))
       AND NOT `alerts_alertgroup`.`is_archived`)
ORDER BY `alerts_alertgroup`.`id` DESC
LIMIT 26
```

new query:
```
SELECT "alerts_alertgroup"."id"
FROM "alerts_alertgroup"
WHERE ("alerts_alertgroup"."channel_id" IN
         (SELECT U0."id"
          FROM "alerts_alertreceivechannel" U0
          WHERE (NOT (U0."integration" = maintenance)
                 AND U0."deleted_at" IS NULL
                 AND U0."organization_id" = 1
                 AND U0."team_id" IS NULL))
       AND NOT "alerts_alertgroup"."is_archived"
       AND NOT "alerts_alertgroup"."is_archived"
       AND "alerts_alertgroup"."root_alert_group_id" IS NULL
       AND ((NOT "alerts_alertgroup"."silenced"
             AND NOT "alerts_alertgroup"."acknowledged"
             AND NOT "alerts_alertgroup"."resolved")
            OR ("alerts_alertgroup"."acknowledged"
                AND NOT "alerts_alertgroup"."resolved"))
       AND NOT "alerts_alertgroup"."is_archived")
ORDER BY "alerts_alertgroup"."id" DESC
LIMIT 26
```


## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-01-22 00:14:48 +08:00
Ildar Iskhakov
83b1f069d0
Optimize alertgroups endpoint (#1186)
# What this PR does

## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-01-21 21:59:20 +08:00
Vadim Stepanov
2b0abf018c
Hide direct paging integrations (#1162)
# What this PR does
Hide direct paging integrations from the web UI. Related to
https://github.com/grafana/oncall/issues/823

## Checklist

- [x] Tests updated
- [ ] Documentation added (N/A)
- [ ] `CHANGELOG.md` updated (N/A)
2023-01-20 13:29:57 +00:00
Ildar Iskhakov
0a00d3e2c1
Update base.py 2023-01-20 20:20:51 +08:00
Matias Bordese
693b5a41c4
Add slack command to trigger direct paging (#1154)
Slash command needs to be added to slack app manifest:

```
  slash_commands:
    - command: /escalate
      url: https://<oncall-public-url>/slack/interactive_api_endpoint/
      description: Create a new alert group escalation
      should_escape: false
```
2023-01-20 09:06:27 -03:00
Ildar Iskhakov
aec54707ec
Add pyroscope integration (#1176)
# What this PR does

## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-01-20 18:47:16 +08:00
Ildar Iskhakov
c06709fdb6
Move silk profiler under env variable setting (#1175)
# What this PR does

This PR moves silk profiler under the settings flag which can be
configured with env vars. It will allow us to enable silk on the
clusters, e.g. dev

## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-01-20 18:19:31 +08:00
Joey Orlando
98241b9a10
fake-data generation script + fixes for django-silk and django-debug-toolbar (#1128)
# What this PR does

## Main stuff

- add Python script to populate local Grafana/OnCall setup w/ large
amounts of fake data. Right now the data types that can be generated
are:
- teams and Admin users via the Grafana API (must be synced manually by
going into the UI before going onto the next step)
- Calendar Schedules which have three 8h oncall-shifts, via the OnCall
public API
- fixes `django-debug-toolbar` when being run in `docker-compose`
locally

## Other stuff
- documents how to easily modify the Grafana `docker-compose` container
provisioning configuration
- document solutions for two backend setup related issues encountered
when running the engine/celery workers locally, outside of
`docker-compose`, on an Apple silicon Mac
- fixes small bug in `grafana_plugin.helpers.client.APIClient.call_api`
where it would call `response.json()` for all requests, regardless of
whether or not the response actually contained data or not
- in `engine/settings/dev.py`, properly setup `django-silk` and document
the steps to use it locally
- make it possible to log out debug SQL queries by specifying
`DEV_DEBUG_VIEW_SQL_QUERIES` env var, rather than having to uncomment
out a section of `settings/dev.py`

## Which issue(s) this PR fixes

- Some local setup issues when trying to use `django-silk` and
`django-debug-toolbar`
- Makes it much easier to populate your local setup with a lot of fake
data
- Makes it possible to easily modify your local grafana's provisioning
configuration

## Checklist

- [ ] Tests updated (N/A)
- [ ] Documentation added (N/A)
- [ ] `CHANGELOG.md` updated (N/A)
2023-01-20 09:19:41 +01:00
Michael Derynck
cc3fdab8fb
Fix UnboundLocalError in webhooks (#1165)
Fix error where rendered_data was being used without being defined.
2023-01-19 15:50:22 -07:00
Vadim Stepanov
ccae9d86b3
Add an ability to use an escalation chain for direct paging (#1161)
# What this PR does
Adds an ability to page an escalation chain for a newly created direct
paging alert group using the internal API. Also [adds a forgotten
migration](32fc44e744)
related to the direct paging backend.
Related to https://github.com/grafana/oncall/issues/823

## Checklist

- [x] Tests updated
- [ ] Documentation added (N/A)
- [ ] `CHANGELOG.md` updated (N/A)
2023-01-19 18:51:57 +00:00
Yulya Artyukhina
d5461866d1
Add a dummy step for declare incident button in slack (#1157)
Add a dummy step for declare incident button to prevent raising 'Step is
undefined' exception because Slack sends a POST request to the backend
upon clicking a button with a redirect link to Incident.
This pr doesn't change any functionality
2023-01-19 14:50:02 +01:00
Vadim Stepanov
29f67dc2f3
Fix circular import 2023-01-19 11:53:05 +00:00
Vadim Stepanov
6b87ad74e9
Enforce cloud connection to send push notifications on OSS (#1132)
This PR modifies how OSS instances send mobile app push notifications.
It also adds frontend warnings when user is trying to use the mobile app
without connecting to cloud.

- [x] Add public API authentication to `FCMRelayView` and throttle the
view to 300 push notifications per instance per minute. This is similar
to how SMS and phone call notifications work on OSS instances.
- [x] Add frontend warnings based on cloud connectivity
- [x] Fix/add frontend tests
- [x] Add tests for FCMRelayView and mobile app backend

## Screenshots

When a user tries to connect the mobile app in his settings and cloud is
not connected (clicking "Connect Cloud OnCall" redirects to the "Cloud"
tab):

<img width="1088" alt="Screenshot 2023-01-12 at 18 48 58"
src="https://user-images.githubusercontent.com/20116910/212156591-86906020-eddf-43f1-9402-7ebb7547c7e6.png">

When a user tries to use mobile push notifications as a personal
notification step and cloud is not connected:

<img width="764" alt="Screenshot 2023-01-12 at 19 01 10"
src="https://user-images.githubusercontent.com/20116910/212157580-9abb0758-79ad-4316-b8cd-15b4fff01502.png">

Now on the "Cloud" tab there's some info about the mobile app (the last
section at the bottom of the page):

<img width="1245" alt="Screenshot 2023-01-12 at 18 49 10"
src="https://user-images.githubusercontent.com/20116910/212156997-c8b70dd5-bf15-4bc7-8eb8-9decdb8ecc80.png">

After connecting to the cloud instance, everything goes back to active
and it's now possible to connect the mobile app:

<img width="1091" alt="Screenshot 2023-01-12 at 19 08 27"
src="https://user-images.githubusercontent.com/20116910/212158811-60d49888-4714-4c0e-850f-3ff6a11a117a.png">

After connecting the app the warning is gone:

<img width="764" alt="Screenshot 2023-01-12 at 19 07 00"
src="https://user-images.githubusercontent.com/20116910/212158614-677ab889-127f-4d64-bacc-0c26887f3097.png">
2023-01-19 11:15:56 +00:00
Vadim Stepanov
c93ee5c554
Send a Slack DM when user is not in channel (#1144)
# What this PR does

Currently, when a user gets mentioned in an alert group thread and the
user is not in the Slack channel, the Slack bot sends the following to
the channel:

> ⚠️ Tried to ask USER to look at incident. Unfortunately USER is
not in this channel. Please, invite.

This PR changes this behaviour to instead send a direct message to the
user. The message contains a link to the main alert group message in
Slack.

<img width="806" alt="Screenshot 2023-01-17 at 19 25 36"
src="https://user-images.githubusercontent.com/20116910/212996457-02db183f-2041-4998-b743-bd5b6c84b7b5.png">


## Checklist

- [ ] Tests updated (N/A)
- [ ] Documentation added (N/A)
- [x] `CHANGELOG.md` updated
2023-01-18 16:08:15 +00:00
Matias Bordese
90def88752
Add escalation chain option when creating a direct page alert group (#1143)
Also changes the default integration used when creating an alert group
for a direct page to a custom manual integration to avoid
conflicts/unexpected behaviors with existing manual alerts.
2023-01-18 12:58:26 -03:00
Vadim Stepanov
b8d78fd6bb
Allow messaging backends to be enabled/disabled per organization (#1151)
# What this PR does
Allows messaging backends to be enabled/disabled per organization when
getting a list of available personal notification channels.

## Checklist

- [x] Tests updated
- [ ] Documentation added (N/A)
- [x] `CHANGELOG.md` updated
2023-01-18 15:52:25 +00:00
Matias Bordese
d3062b56fd
Draft initial logic for user/schedule paging (#1098)
Co-authored-by: Vadim Stepanov <vadimkerr@gmail.com>
2023-01-17 12:19:08 -03:00
Yulya Artyukhina
9129a720ef
Integration with grafana incident (#1081)
Check if Grafana Incident is enabled. If it is, add a button with a link
to declare Grafana Incident from Alert group in Slack and on Web.

Co-authored-by: Yulia Shanyrova <yulia.shanyrova@grafana.com>
2023-01-17 13:04:50 +01:00
Tommy
5bd8fbdef8
Add alert groups state filter (#1133)
# What this PR does
This PR added a new parameter (state) into the alert_group public API to
filter the state of the alert groups

## Which issue(s) this PR fixes
https://github.com/grafana/oncall/issues/684

## Checklist

- [x] Tests updated
- [x] Documentation added
- [x] `CHANGELOG.md` updated

Co-authored-by: Vadim Stepanov <vadimkerr@gmail.com>
2023-01-17 10:28:29 +00:00
Vadim Stepanov
59f2c293e7
Move FCM relay logic into a celery task (#1137) 2023-01-13 19:28:34 +00:00
Matias Bordese
0d38fe2a7f
Web schedules overrides are the higher priority level (#1115)
Related to https://github.com/grafana/oncall-private/issues/1550
2023-01-13 08:58:35 -03:00
Innokentii Konstantinov
9a3b53ff34
Delete slack_connector on org soft-delete (#1127) 2023-01-12 17:37:05 +08:00