Commit graph

491 commits

Author SHA1 Message Date
Ildar Iskhakov
46b39b2c87
Remove resolved and acknowledged filters as we switched to status (#1201)
# What this PR does

## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-01-24 18:13:21 +08:00
Innokentii Konstantinov
cfa7fb816c
Sync users and teams on tf requests (#1180)
# What this PR does
This PR add sync with grafana on requests from terraform 

## Which issue(s) this PR fixes
It's needed to fix case when customers want to create team via grafana
terraform provider and use it in the oncall provider without having to
log into Grafana Cloud.

Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
2023-01-24 13:44:07 +08:00
Vadim Stepanov
ae5949aa7e
Allow viewers fetch cloud connection status (#1181)
# What this PR does
Fixes the issue when users with the viewer role can't fetch the cloud
connection status, which makes the plugin fail to load for viewers. This
PR makes the cloud connection endpoint use `OTHER_SETTINGS_READ` for
fetching the cloud connection status instead of `OTHER_SETTINGS_WRITE`.

## Checklist

- [x] Tests updated
- [x] `CHANGELOG.md` updated
2023-01-23 11:17:57 +00:00
Ildar Iskhakov
37d25b5b31
Optimize alert group filtering queries (#1191)
# What this PR does

## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-01-23 16:07:55 +08:00
Dan Cech
639fd81644
Update message when user needs to connect their profile (#1190)
# What this PR does

This just tweaks the message users get when they try to interact via
slack but haven't connected their profile, it fixes a typo and
streamlines the text.
2023-01-23 08:44:33 +01:00
Ildar Iskhakov
b90fe433c9
Optimize alertgroups endpoint (#1189)
# What this PR does

Changing query to retrieve alert group in two completely different
queries instead of one with `join`

new queries
```
SELECT alerts_alertreceivechannel.id
FROM alerts_alertreceivechannel
WHERE (alerts_alertreceivechannel.deleted_at IS NULL
       AND alerts_alertreceivechannel.organization_id = 8
       AND alerts_alertreceivechannel.team_id IS NULL)


SELECT `alerts_alertgroup`.`id`
FROM `alerts_alertgroup`
WHERE (`alerts_alertgroup`.`channel_id` IN (2,33,34,35,36,40,52,59,61,62,63,70,76,89,93,94,03,08,09,10,12,13,16,18,20,22,23,24,26,27,28,30,31,33,34,35,36,40,41,42,43,45,48,53,56,57,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,86,87,88,89,91,93,23,27,29,31,32,33,55,56,57,58,65,69,72,75,81,13,17,20,22,33,34,38,39,41,44,45,46,51,52,55,56,58,59,60,63,68,70,71)
       AND NOT `alerts_alertgroup`.`is_archived`
       AND NOT `alerts_alertgroup`.`is_archived`
       AND `alerts_alertgroup`.`root_alert_group_id` IS NULL
       AND ((NOT `alerts_alertgroup`.`silenced`
             AND NOT `alerts_alertgroup`.`acknowledged`
             AND NOT `alerts_alertgroup`.`resolved`)
            OR (`alerts_alertgroup`.`acknowledged`
                AND NOT `alerts_alertgroup`.`resolved`))
       AND NOT `alerts_alertgroup`.`is_archived`)
ORDER BY `alerts_alertgroup`.`id` DESC
LIMIT 26
```

## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-01-22 00:53:11 +08:00
Ildar Iskhakov
c9b83906a0
Optimize alertgroups endpoint (#1188)
# What this PR does

Changing query to retrieve alert group in two requests instead of one
with `join`

old query:
```
SELECT `alerts_alertgroup`.`id`
FROM `alerts_alertgroup`
INNER JOIN `alerts_alertreceivechannel` ON (`alerts_alertgroup`.`channel_id` = `alerts_alertreceivechannel`.`id`)
WHERE (`alerts_alertreceivechannel`.`organization_id` = 1
       AND `alerts_alertreceivechannel`.`team_id` IS NULL
       AND NOT `alerts_alertgroup`.`is_archived`
       AND NOT `alerts_alertgroup`.`is_archived`
       AND `alerts_alertgroup`.`root_alert_group_id` IS NULL
       AND ((NOT `alerts_alertgroup`.`silenced`
             AND NOT `alerts_alertgroup`.`acknowledged`
             AND NOT `alerts_alertgroup`.`resolved`)
            OR (`alerts_alertgroup`.`acknowledged`
                AND NOT `alerts_alertgroup`.`resolved`))
       AND NOT `alerts_alertgroup`.`is_archived`)
ORDER BY `alerts_alertgroup`.`id` DESC
LIMIT 26
```

new query:
```
SELECT "alerts_alertgroup"."id"
FROM "alerts_alertgroup"
WHERE ("alerts_alertgroup"."channel_id" IN
         (SELECT U0."id"
          FROM "alerts_alertreceivechannel" U0
          WHERE (NOT (U0."integration" = maintenance)
                 AND U0."deleted_at" IS NULL
                 AND U0."organization_id" = 1
                 AND U0."team_id" IS NULL))
       AND NOT "alerts_alertgroup"."is_archived"
       AND NOT "alerts_alertgroup"."is_archived"
       AND "alerts_alertgroup"."root_alert_group_id" IS NULL
       AND ((NOT "alerts_alertgroup"."silenced"
             AND NOT "alerts_alertgroup"."acknowledged"
             AND NOT "alerts_alertgroup"."resolved")
            OR ("alerts_alertgroup"."acknowledged"
                AND NOT "alerts_alertgroup"."resolved"))
       AND NOT "alerts_alertgroup"."is_archived")
ORDER BY "alerts_alertgroup"."id" DESC
LIMIT 26
```


## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-01-22 00:14:48 +08:00
Ildar Iskhakov
83b1f069d0
Optimize alertgroups endpoint (#1186)
# What this PR does

## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-01-21 21:59:20 +08:00
Vadim Stepanov
2b0abf018c
Hide direct paging integrations (#1162)
# What this PR does
Hide direct paging integrations from the web UI. Related to
https://github.com/grafana/oncall/issues/823

## Checklist

- [x] Tests updated
- [ ] Documentation added (N/A)
- [ ] `CHANGELOG.md` updated (N/A)
2023-01-20 13:29:57 +00:00
Matias Bordese
693b5a41c4
Add slack command to trigger direct paging (#1154)
Slash command needs to be added to slack app manifest:

```
  slash_commands:
    - command: /escalate
      url: https://<oncall-public-url>/slack/interactive_api_endpoint/
      description: Create a new alert group escalation
      should_escape: false
```
2023-01-20 09:06:27 -03:00
Joey Orlando
98241b9a10
fake-data generation script + fixes for django-silk and django-debug-toolbar (#1128)
# What this PR does

## Main stuff

- add Python script to populate local Grafana/OnCall setup w/ large
amounts of fake data. Right now the data types that can be generated
are:
- teams and Admin users via the Grafana API (must be synced manually by
going into the UI before going onto the next step)
- Calendar Schedules which have three 8h oncall-shifts, via the OnCall
public API
- fixes `django-debug-toolbar` when being run in `docker-compose`
locally

## Other stuff
- documents how to easily modify the Grafana `docker-compose` container
provisioning configuration
- document solutions for two backend setup related issues encountered
when running the engine/celery workers locally, outside of
`docker-compose`, on an Apple silicon Mac
- fixes small bug in `grafana_plugin.helpers.client.APIClient.call_api`
where it would call `response.json()` for all requests, regardless of
whether or not the response actually contained data or not
- in `engine/settings/dev.py`, properly setup `django-silk` and document
the steps to use it locally
- make it possible to log out debug SQL queries by specifying
`DEV_DEBUG_VIEW_SQL_QUERIES` env var, rather than having to uncomment
out a section of `settings/dev.py`

## Which issue(s) this PR fixes

- Some local setup issues when trying to use `django-silk` and
`django-debug-toolbar`
- Makes it much easier to populate your local setup with a lot of fake
data
- Makes it possible to easily modify your local grafana's provisioning
configuration

## Checklist

- [ ] Tests updated (N/A)
- [ ] Documentation added (N/A)
- [ ] `CHANGELOG.md` updated (N/A)
2023-01-20 09:19:41 +01:00
Michael Derynck
cc3fdab8fb
Fix UnboundLocalError in webhooks (#1165)
Fix error where rendered_data was being used without being defined.
2023-01-19 15:50:22 -07:00
Vadim Stepanov
ccae9d86b3
Add an ability to use an escalation chain for direct paging (#1161)
# What this PR does
Adds an ability to page an escalation chain for a newly created direct
paging alert group using the internal API. Also [adds a forgotten
migration](32fc44e744)
related to the direct paging backend.
Related to https://github.com/grafana/oncall/issues/823

## Checklist

- [x] Tests updated
- [ ] Documentation added (N/A)
- [ ] `CHANGELOG.md` updated (N/A)
2023-01-19 18:51:57 +00:00
Yulya Artyukhina
d5461866d1
Add a dummy step for declare incident button in slack (#1157)
Add a dummy step for declare incident button to prevent raising 'Step is
undefined' exception because Slack sends a POST request to the backend
upon clicking a button with a redirect link to Incident.
This pr doesn't change any functionality
2023-01-19 14:50:02 +01:00
Vadim Stepanov
29f67dc2f3
Fix circular import 2023-01-19 11:53:05 +00:00
Vadim Stepanov
6b87ad74e9
Enforce cloud connection to send push notifications on OSS (#1132)
This PR modifies how OSS instances send mobile app push notifications.
It also adds frontend warnings when user is trying to use the mobile app
without connecting to cloud.

- [x] Add public API authentication to `FCMRelayView` and throttle the
view to 300 push notifications per instance per minute. This is similar
to how SMS and phone call notifications work on OSS instances.
- [x] Add frontend warnings based on cloud connectivity
- [x] Fix/add frontend tests
- [x] Add tests for FCMRelayView and mobile app backend

## Screenshots

When a user tries to connect the mobile app in his settings and cloud is
not connected (clicking "Connect Cloud OnCall" redirects to the "Cloud"
tab):

<img width="1088" alt="Screenshot 2023-01-12 at 18 48 58"
src="https://user-images.githubusercontent.com/20116910/212156591-86906020-eddf-43f1-9402-7ebb7547c7e6.png">

When a user tries to use mobile push notifications as a personal
notification step and cloud is not connected:

<img width="764" alt="Screenshot 2023-01-12 at 19 01 10"
src="https://user-images.githubusercontent.com/20116910/212157580-9abb0758-79ad-4316-b8cd-15b4fff01502.png">

Now on the "Cloud" tab there's some info about the mobile app (the last
section at the bottom of the page):

<img width="1245" alt="Screenshot 2023-01-12 at 18 49 10"
src="https://user-images.githubusercontent.com/20116910/212156997-c8b70dd5-bf15-4bc7-8eb8-9decdb8ecc80.png">

After connecting to the cloud instance, everything goes back to active
and it's now possible to connect the mobile app:

<img width="1091" alt="Screenshot 2023-01-12 at 19 08 27"
src="https://user-images.githubusercontent.com/20116910/212158811-60d49888-4714-4c0e-850f-3ff6a11a117a.png">

After connecting the app the warning is gone:

<img width="764" alt="Screenshot 2023-01-12 at 19 07 00"
src="https://user-images.githubusercontent.com/20116910/212158614-677ab889-127f-4d64-bacc-0c26887f3097.png">
2023-01-19 11:15:56 +00:00
Vadim Stepanov
c93ee5c554
Send a Slack DM when user is not in channel (#1144)
# What this PR does

Currently, when a user gets mentioned in an alert group thread and the
user is not in the Slack channel, the Slack bot sends the following to
the channel:

> ⚠️ Tried to ask USER to look at incident. Unfortunately USER is
not in this channel. Please, invite.

This PR changes this behaviour to instead send a direct message to the
user. The message contains a link to the main alert group message in
Slack.

<img width="806" alt="Screenshot 2023-01-17 at 19 25 36"
src="https://user-images.githubusercontent.com/20116910/212996457-02db183f-2041-4998-b743-bd5b6c84b7b5.png">


## Checklist

- [ ] Tests updated (N/A)
- [ ] Documentation added (N/A)
- [x] `CHANGELOG.md` updated
2023-01-18 16:08:15 +00:00
Matias Bordese
90def88752
Add escalation chain option when creating a direct page alert group (#1143)
Also changes the default integration used when creating an alert group
for a direct page to a custom manual integration to avoid
conflicts/unexpected behaviors with existing manual alerts.
2023-01-18 12:58:26 -03:00
Vadim Stepanov
b8d78fd6bb
Allow messaging backends to be enabled/disabled per organization (#1151)
# What this PR does
Allows messaging backends to be enabled/disabled per organization when
getting a list of available personal notification channels.

## Checklist

- [x] Tests updated
- [ ] Documentation added (N/A)
- [x] `CHANGELOG.md` updated
2023-01-18 15:52:25 +00:00
Matias Bordese
d3062b56fd
Draft initial logic for user/schedule paging (#1098)
Co-authored-by: Vadim Stepanov <vadimkerr@gmail.com>
2023-01-17 12:19:08 -03:00
Yulya Artyukhina
9129a720ef
Integration with grafana incident (#1081)
Check if Grafana Incident is enabled. If it is, add a button with a link
to declare Grafana Incident from Alert group in Slack and on Web.

Co-authored-by: Yulia Shanyrova <yulia.shanyrova@grafana.com>
2023-01-17 13:04:50 +01:00
Tommy
5bd8fbdef8
Add alert groups state filter (#1133)
# What this PR does
This PR added a new parameter (state) into the alert_group public API to
filter the state of the alert groups

## Which issue(s) this PR fixes
https://github.com/grafana/oncall/issues/684

## Checklist

- [x] Tests updated
- [x] Documentation added
- [x] `CHANGELOG.md` updated

Co-authored-by: Vadim Stepanov <vadimkerr@gmail.com>
2023-01-17 10:28:29 +00:00
Vadim Stepanov
59f2c293e7
Move FCM relay logic into a celery task (#1137) 2023-01-13 19:28:34 +00:00
Matias Bordese
0d38fe2a7f
Web schedules overrides are the higher priority level (#1115)
Related to https://github.com/grafana/oncall-private/issues/1550
2023-01-13 08:58:35 -03:00
Innokentii Konstantinov
9a3b53ff34
Delete slack_connector on org soft-delete (#1127) 2023-01-12 17:37:05 +08:00
Joey Orlando
babacf4da8
refactor the is_rbac_permissions_enabled check to be more robust (#1099)
# What this PR does
Checks the `is_rbac_permissions_enabled` flag differently based on
whether we are dealing with an open-source, or cloud installation:
- for open-source installations, simply continue making a `HEAD` request
to the list RBAC permissions Grafana API endpoint.
- for cloud installations, use the `config` object returned from `GET
/instances/{instance_id}?config=true` and check whether
`instance_info["config"]["feature_toggles"]["accessControlOnCall"] ==
"true"`

## Which issue(s) this PR fixes
Resolves the issue in hosted grafana where when a stack is inactive, the
hosted grafana gateway, returns 200 to the `HEAD` request (which
erroneously sets the `is_rbac_permissions_enabled` flag to `true`)

## Checklist

- [x] Tests updated (N/A)
- [ ] Documentation added
- [x] `CHANGELOG.md` updated
2023-01-11 12:48:30 +01:00
Vadim Stepanov
231c0f45a3
Re-implement FCM relay after introducing firebase (#1121)
This PR changes how `FCMRelayView` handles push notifications from OSS
instances, also changing how the mobile app backend sends push
notifications.
2023-01-11 11:42:01 +00:00
Innokentii Konstantinov
fa6906a606
Simplify and speed up slack rendering (#1105)
Simplify and speed up slack rendering.
2023-01-10 15:41:38 +08:00
Matias Bordese
abbb5a8381
Update schedules query not to defer ical fields used for on-call check (#1114)
Do not defer cached ical fields which are later needed for calculating
on-call users listed in the schedules list page.
2023-01-09 14:10:23 -03:00
Matias Bordese
87fad5eec1
Add select_related to fetch schedules user group information (#1109) 2023-01-09 13:15:27 -03:00
Vadim Stepanov
a1e4f72280
remove send_link_to_channel_message_or_fallback_to_full_incident 2023-01-06 11:34:11 +00:00
Salvatore Giordano
a3dbe95d5a
fix: update android notification payload (#1090) 2023-01-05 17:36:07 +01:00
Joey Orlando
802e3964e9
update mobile app push notification text + make telegram alert verbage consistent ("Firing" instead of "Alerting") (#1089) 2023-01-05 16:16:43 +01:00
Innokentii Konstantinov
8abbcee050
Org soft-delete (#1073)
# What this PR does
It introduces soft-delete of organization, since grafana stacks are
soft-deleted too. Also, we had a problem with deleting orgs with large
amounts of alerts, so soft-deletion will fix this problem. I think, that
problem of cleaning alerts of deleted orgs should be solved as a part of
alert retention
2023-01-05 12:42:55 +08:00
Vadim Stepanov
0d4701bd81
Change wording from "incident" to "alert group" in the Telegram app (#1052)
# What this PR does
Makes Telegram integration consistent with the rest of the system so it
uses the word "alert group" instead of "incident" when referring to
alert groups.

## Checklist

- [x] Tests updated
- [ ] Documentation added (N/A)
- [x] `CHANGELOG.md` updated
2023-01-04 17:44:01 +00:00
Vadim Stepanov
7a1f176cb5
Schedule score backend (#338)
This PR adds an endpoint returning a schedule quality score, overloaded
users and comments on the existing issues (e.g. balance issues or gaps).

## Limitations
- Since working hours editor is not implemented yet, there are only two
scores taken into account: balance score and a score representing the
ratio of time when someone is on-call to the whole time period.
- Time period is now set to be constant (90 days from today), so **in
some cases the results will be inaccurate** (when rotations don't align
with the time period)
- It only takes primary rotations into account (overrides are ignored)

## Usage
`GET /api/internal/v1/schedules/<pk>/quality?date=<TOMORROW_DATE>`

Note that `date` should be tomorrow date, because we can only be sure
about changing tomorrow's shifts (some of the shifts for current day
could be "deleted" but still show up in the UI).

## Example response
```json
{
  "total_score": 90,
  "comments": ["Schedule has no gaps", "Schedule is well-balanced, but still can be improved"],
  "overloaded_users": ["USSZ5WRH2CUA9", "U74XJZSSQGBIH"]
}
```

Issue: #118
2023-01-04 16:49:58 +00:00
Michael Derynck
7c26eb559b
Improve handling of template exceptions during group data creation (#1068)
# What this PR does
With the addition of tighter controls on jinja templates handle
exceptions while rendering group data as follows:
- Title will cache error message as title and display to user and the
error will be logged
- Group distinction will be left as None and the error will be logged
- Is resolve signal will be treated as False and the error will be
logged
- Is acknowledge signal will be treated as False and the error will be
logged

## Which issue(s) this PR fixes
https://github.com/grafana/oncall-private/issues/1542
2023-01-03 12:30:59 -07:00
Vadim Stepanov
cd770e85ea
Catch DoesNotExist in post_slack_rate_limit_message (#1067) 2023-01-03 17:44:56 +00:00
Matias Bordese
05524ab698
Merge pull request #1059 from grafana/matiasb/truncate-slack-title-block
Truncate slack alert group title block below max size
2023-01-03 08:50:57 -03:00
Matias Bordese
0a3c96d3c3
Merge pull request #1058 from grafana/matias/fix-schedule-no-start-byday
Handle no start date when calculating by day ical shift events
2023-01-03 08:50:27 -03:00
Ildar Iskhakov
1ff0a7da99
1.1.5.5 -> dev (#1060)
# What this PR does

## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated

Co-authored-by: Vadim Stepanov <vadimkerr@gmail.com>
Co-authored-by: Julia <ferril.darkdiver@gmail.com>
Co-authored-by: Innokentii Konstantinov <innokenty.konstantinov@grafana.com>
Co-authored-by: Matias Bordese <mbordese@gmail.com>
2023-01-03 11:57:16 +08:00
Innokentii Konstantinov
5e297847ae Speedup alert group search 2023-01-03 11:04:16 +08:00
Matias Bordese
374f32f489 Handle no start date when calculating by day ical shift events 2023-01-02 11:53:49 -03:00
Matias Bordese
75aaeef3f2 Truncate slack alert group title block below max size 2023-01-02 10:07:53 -03:00
Ildar Iskhakov
282e58db7b
Don't render logs for too big telegram dm (#1051)
# What this PR does

## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2022-12-29 13:22:15 +00:00
Joey Orlando
7ebc9cbbf7
modify push notification settings + use fcm-django library (#998)
- swaps out `django-push-notifications` for
[`fcm-django`](https://github.com/grafana/fcm-django). Again.. this is a
fork of the parent repo for exactly the same reason.. the migrations
point to `auth_user` without letting us use our own user model, this has
been patched in the `grafana` fork. The reason why we are using
`fcm-django` vs `django-push-notifications` is that the latter does not
support the new FCM API, only the "legacy" API. The legacy FCM API does
not support certain push notification settings that we would like to
use.
- modifies the iOS/Android specific push notification settings
- adds a `flower` pod in the `docker-compose-developer.yml`, useful for
debugging tasks locally
- sets the mobile app verification token TTL to 5 minutes when
developing locally. The default of 1 minute makes working with device
emulators really tricky..

This PR also swaps out the base image in `engine/Dockerfile` from
`python:3.9-alpine3.16` to `python:3.9-slim-buster`.

As to why.. in short, with the introduction of the `fcm-django` library
there is now a peer-dependency on
[`grpcio`](https://github.com/grpc/grpc) (which is used by
`firebase_admin`.. which I am using in this PR to interact directly with
Firebase Cloud Messaging (FCM)). `grpcio` does not publish wheels (read:
compiled binaries) for the Alpine distro. It does publish wheels for
Debian and hence `pip install -r requirements.txt` does not need to
build this library from the source distribution.

This is a [known
"issue"](https://github.com/grpc/grpc/issues/22815#issuecomment-1107874367)
and the recommended solution in the community is to.. not use alpine.

These were the numbers, when building the image locally, in terms of
image size and build time:

| | Local image size (uncompressed | Build time (may differ based on
your network speed) |
| ------------------------- | -------------------------------------- |
---------- |
| `python:3.9-alpine3.16`   | 785MB  | 320s |
| `python:3.9-slim-buster` | 1.05GB  | 90s   |

Co-authored-by: Salvatore Giordano <salvatoregiordanoo@gmail.com>
2022-12-20 12:41:34 +01:00
Innokentii Konstantinov
7bb4fdfe43
Merge pull request #1017 from grafana/fix_ag_filtering
Speedup search alertgroup to group alert
2022-12-19 10:59:24 +08:00
Innokentii Konstantinov
41f886b31e Speedup seach alertgroup 2022-12-17 19:34:13 +08:00
Joey Orlando
66b2ed5c64
add more logging to push notification celery task (#986) 2022-12-13 14:06:56 +01:00
Joey Orlando
5967d5af63
remove apns + fix django-push-notifications migrations (#984)
- removes APNS support
- changes the `django-push-notification` library from the `iskhakov`
fork to the [`grafana`
fork](https://github.com/grafana/django-push-notifications). This new
fork basically just patches an issue which affected the database
migrations of this django app (previously the library would not respect
the `USER_MODEL` setting when creating its tables and would instead
reference the `auth_user` table.. which we don't want)
- add `--no-cache` flag to the `make build` command

**NOTE**
A migration should be applied as follows:
```bash
# remove the four push_notifications tables, which have improper foreign key references
python manage.py migrate push_notifications zero

# recreate the tables with the proper foreign key references
python manage.py migrate
```
2022-12-13 13:00:59 +01:00