centralcloud/oncall-engine

Author	SHA1	Message	Date
Ildar Iskhakov	46b39b2c87	Remove resolved and acknowledged filters as we switched to status (#1201 ) # What this PR does ## Which issue(s) this PR fixes ## Checklist - [ ] Tests updated - [ ] Documentation added - [ ] `CHANGELOG.md` updated	2023-01-24 18:13:21 +08:00
Innokentii Konstantinov	cfa7fb816c	Sync users and teams on tf requests (#1180 ) # What this PR does This PR adds syncing with Grafana on requests from Terraform ## Which issue(s) this PR fixes It's needed to fix the case where customers want to create a team via the Grafana Terraform provider and use it in the OnCall provider without having to log into Grafana Cloud. Co-authored-by: Joey Orlando <joey.orlando@grafana.com>	2023-01-24 13:44:07 +08:00
Vadim Stepanov	ae5949aa7e	Allow viewers to fetch cloud connection status (#1181 ) # What this PR does Fixes the issue where users with the viewer role can't fetch the cloud connection status, which makes the plugin fail to load for viewers. This PR makes the cloud connection endpoint use `OTHER_SETTINGS_READ` for fetching the cloud connection status instead of `OTHER_SETTINGS_WRITE`. ## Checklist - [x] Tests updated - [x] `CHANGELOG.md` updated	2023-01-23 11:17:57 +00:00
Ildar Iskhakov	37d25b5b31	Optimize alert group filtering queries (#1191 ) # What this PR does ## Which issue(s) this PR fixes ## Checklist - [ ] Tests updated - [ ] Documentation added - [ ] `CHANGELOG.md` updated	2023-01-23 16:07:55 +08:00
Dan Cech	639fd81644	Update message when user needs to connect their profile (#1190 ) # What this PR does This just tweaks the message users get when they try to interact via slack but haven't connected their profile, it fixes a typo and streamlines the text.	2023-01-23 08:44:33 +01:00
Ildar Iskhakov	b90fe433c9	Optimize alertgroups endpoint (#1189 ) # What this PR does Changing the query to retrieve alert groups with two completely separate queries instead of one with a `join`. New queries: ``` SELECT alerts_alertreceivechannel.id FROM alerts_alertreceivechannel WHERE (alerts_alertreceivechannel.deleted_at IS NULL AND alerts_alertreceivechannel.organization_id = 8 AND alerts_alertreceivechannel.team_id IS NULL) SELECT `alerts_alertgroup`.`id` FROM `alerts_alertgroup` WHERE (`alerts_alertgroup`.`channel_id` IN (2,33,34,35,36,40,52,59,61,62,63,70,76,89,93,94,03,08,09,10,12,13,16,18,20,22,23,24,26,27,28,30,31,33,34,35,36,40,41,42,43,45,48,53,56,57,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,86,87,88,89,91,93,23,27,29,31,32,33,55,56,57,58,65,69,72,75,81,13,17,20,22,33,34,38,39,41,44,45,46,51,52,55,56,58,59,60,63,68,70,71) AND NOT `alerts_alertgroup`.`is_archived` AND NOT `alerts_alertgroup`.`is_archived` AND `alerts_alertgroup`.`root_alert_group_id` IS NULL AND ((NOT `alerts_alertgroup`.`silenced` AND NOT `alerts_alertgroup`.`acknowledged` AND NOT `alerts_alertgroup`.`resolved`) OR (`alerts_alertgroup`.`acknowledged` AND NOT `alerts_alertgroup`.`resolved`)) AND NOT `alerts_alertgroup`.`is_archived`) ORDER BY `alerts_alertgroup`.`id` DESC LIMIT 26 ``` ## Which issue(s) this PR fixes ## Checklist - [ ] Tests updated - [ ] Documentation added - [ ] `CHANGELOG.md` updated	2023-01-22 00:53:11 +08:00
Ildar Iskhakov	c9b83906a0	Optimize alertgroups endpoint (#1188 ) # What this PR does Changing the query to retrieve alert groups in two requests instead of one with a `join`. Old query: ``` SELECT `alerts_alertgroup`.`id` FROM `alerts_alertgroup` INNER JOIN `alerts_alertreceivechannel` ON (`alerts_alertgroup`.`channel_id` = `alerts_alertreceivechannel`.`id`) WHERE (`alerts_alertreceivechannel`.`organization_id` = 1 AND `alerts_alertreceivechannel`.`team_id` IS NULL AND NOT `alerts_alertgroup`.`is_archived` AND NOT `alerts_alertgroup`.`is_archived` AND `alerts_alertgroup`.`root_alert_group_id` IS NULL AND ((NOT `alerts_alertgroup`.`silenced` AND NOT `alerts_alertgroup`.`acknowledged` AND NOT `alerts_alertgroup`.`resolved`) OR (`alerts_alertgroup`.`acknowledged` AND NOT `alerts_alertgroup`.`resolved`)) AND NOT `alerts_alertgroup`.`is_archived`) ORDER BY `alerts_alertgroup`.`id` DESC LIMIT 26 ``` New query: ``` SELECT "alerts_alertgroup"."id" FROM "alerts_alertgroup" WHERE ("alerts_alertgroup"."channel_id" IN (SELECT U0."id" FROM "alerts_alertreceivechannel" U0 WHERE (NOT (U0."integration" = maintenance) AND U0."deleted_at" IS NULL AND U0."organization_id" = 1 AND U0."team_id" IS NULL)) AND NOT "alerts_alertgroup"."is_archived" AND NOT "alerts_alertgroup"."is_archived" AND "alerts_alertgroup"."root_alert_group_id" IS NULL AND ((NOT "alerts_alertgroup"."silenced" AND NOT "alerts_alertgroup"."acknowledged" AND NOT "alerts_alertgroup"."resolved") OR ("alerts_alertgroup"."acknowledged" AND NOT "alerts_alertgroup"."resolved")) AND NOT "alerts_alertgroup"."is_archived") ORDER BY "alerts_alertgroup"."id" DESC LIMIT 26 ``` ## Which issue(s) this PR fixes ## Checklist - [ ] Tests updated - [ ] Documentation added - [ ] `CHANGELOG.md` updated	2023-01-22 00:14:48 +08:00
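The two query shapes quoted in #1188 (an `INNER JOIN` on the channel table versus an `IN (SELECT ...)` subquery) should return the same alert group ids. The following is a minimal, hypothetical sketch using Python's built-in `sqlite3` and a trimmed-down schema (only `id`, `organization_id`, `team_id`, `channel_id`, `is_archived`); it drops the `deleted_at` and `integration` conditions so that only the query shape differs. It is not OnCall's ORM code.

```python
import sqlite3

# Hypothetical, trimmed-down schema: only the columns both query shapes touch.
SCHEMA = """
CREATE TABLE alerts_alertreceivechannel (
    id INTEGER PRIMARY KEY,
    organization_id INTEGER NOT NULL,
    team_id INTEGER
);
CREATE TABLE alerts_alertgroup (
    id INTEGER PRIMARY KEY,
    channel_id INTEGER NOT NULL REFERENCES alerts_alertreceivechannel (id),
    is_archived INTEGER NOT NULL DEFAULT 0
);
"""

# Old shape: filter alert groups through an INNER JOIN on the channel table.
JOIN_QUERY = """
SELECT g.id FROM alerts_alertgroup g
INNER JOIN alerts_alertreceivechannel c ON g.channel_id = c.id
WHERE c.organization_id = ? AND c.team_id IS NULL AND NOT g.is_archived
ORDER BY g.id DESC LIMIT 26
"""

# New shape: select matching channel ids in a subquery and filter with IN.
SUBQUERY_QUERY = """
SELECT g.id FROM alerts_alertgroup g
WHERE g.channel_id IN (
    SELECT c.id FROM alerts_alertreceivechannel c
    WHERE c.organization_id = ? AND c.team_id IS NULL
) AND NOT g.is_archived
ORDER BY g.id DESC LIMIT 26
"""


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Rows are (id, organization_id, team_id).
    conn.executemany(
        "INSERT INTO alerts_alertreceivechannel VALUES (?, ?, ?)",
        [(1, 1, None), (2, 1, 5), (3, 2, None)],
    )
    # Rows are (id, channel_id, is_archived).
    conn.executemany(
        "INSERT INTO alerts_alertgroup VALUES (?, ?, ?)",
        [(1, 1, 0), (2, 2, 0), (3, 3, 0), (4, 1, 1), (5, 1, 0)],
    )
    return conn


def fetch_ids(conn, query, org_id):
    return [row[0] for row in conn.execute(query, (org_id,))]


conn = build_demo_db()
print(fetch_ids(conn, JOIN_QUERY, 1))      # [5, 1]
print(fetch_ids(conn, SUBQUERY_QUERY, 1))  # [5, 1]
```

The sketch only demonstrates that both shapes select the same rows; which one is faster depends on the database's query planner and indexes, not on anything shown here.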
Ildar Iskhakov	83b1f069d0	Optimize alertgroups endpoint (#1186 ) # What this PR does ## Which issue(s) this PR fixes ## Checklist - [ ] Tests updated - [ ] Documentation added - [ ] `CHANGELOG.md` updated	2023-01-21 21:59:20 +08:00
Vadim Stepanov	2b0abf018c	Hide direct paging integrations (#1162 ) # What this PR does Hide direct paging integrations from the web UI. Related to https://github.com/grafana/oncall/issues/823 ## Checklist - [x] Tests updated - [ ] Documentation added (N/A) - [ ] `CHANGELOG.md` updated (N/A)	2023-01-20 13:29:57 +00:00
Matias Bordese	693b5a41c4	Add slack command to trigger direct paging (#1154 ) Slash command needs to be added to slack app manifest: ``` slash_commands: - command: /escalate url: https://<oncall-public-url>/slack/interactive_api_endpoint/ description: Create a new alert group escalation should_escape: false ```	2023-01-20 09:06:27 -03:00
Joey Orlando	98241b9a10	fake-data generation script + fixes for django-silk and django-debug-toolbar (#1128 ) # What this PR does ## Main stuff - add Python script to populate local Grafana/OnCall setup w/ large amounts of fake data. Right now the data types that can be generated are: - teams and Admin users via the Grafana API (must be synced manually by going into the UI before going onto the next step) - Calendar Schedules which have three 8h oncall-shifts, via the OnCall public API - fixes `django-debug-toolbar` when being run in `docker-compose` locally ## Other stuff - documents how to easily modify the Grafana `docker-compose` container provisioning configuration - document solutions for two backend setup related issues encountered when running the engine/celery workers locally, outside of `docker-compose`, on an Apple silicon Mac - fixes small bug in `grafana_plugin.helpers.client.APIClient.call_api` where it would call `response.json()` for all requests, regardless of whether the response actually contained data - in `engine/settings/dev.py`, properly set up `django-silk` and document the steps to use it locally - make it possible to log debug SQL queries by specifying the `DEV_DEBUG_VIEW_SQL_QUERIES` env var, rather than having to uncomment a section of `settings/dev.py` ## Which issue(s) this PR fixes - Some local setup issues when trying to use `django-silk` and `django-debug-toolbar` - Makes it much easier to populate your local setup with a lot of fake data - Makes it possible to easily modify your local grafana's provisioning configuration ## Checklist - [ ] Tests updated (N/A) - [ ] Documentation added (N/A) - [ ] `CHANGELOG.md` updated (N/A)	2023-01-20 09:19:41 +01:00
Michael Derynck	cc3fdab8fb	Fix UnboundLocalError in webhooks (#1165 ) Fix error where rendered_data was being used without being defined.	2023-01-19 15:50:22 -07:00
Vadim Stepanov	ccae9d86b3	Add an ability to use an escalation chain for direct paging (#1161 ) # What this PR does Adds an ability to page an escalation chain for a newly created direct paging alert group using the internal API. Also adds a forgotten migration (`32fc44e744`) related to the direct paging backend. Related to https://github.com/grafana/oncall/issues/823 ## Checklist - [x] Tests updated - [ ] Documentation added (N/A) - [ ] `CHANGELOG.md` updated (N/A)	2023-01-19 18:51:57 +00:00
Yulya Artyukhina	d5461866d1	Add a dummy step for declare incident button in slack (#1157 ) Add a dummy step for the "declare incident" button to prevent raising a 'Step is undefined' exception, because Slack sends a POST request to the backend upon clicking a button with a redirect link to Incident. This PR doesn't change any functionality.	2023-01-19 14:50:02 +01:00
Vadim Stepanov	29f67dc2f3	Fix circular import	2023-01-19 11:53:05 +00:00
Vadim Stepanov	6b87ad74e9	Enforce cloud connection to send push notifications on OSS (#1132 ) This PR modifies how OSS instances send mobile app push notifications. It also adds frontend warnings when user is trying to use the mobile app without connecting to cloud. - [x] Add public API authentication to `FCMRelayView` and throttle the view to 300 push notifications per instance per minute. This is similar to how SMS and phone call notifications work on OSS instances. - [x] Add frontend warnings based on cloud connectivity - [x] Fix/add frontend tests - [x] Add tests for FCMRelayView and mobile app backend ## Screenshots When a user tries to connect the mobile app in his settings and cloud is not connected (clicking "Connect Cloud OnCall" redirects to the "Cloud" tab): <img width="1088" alt="Screenshot 2023-01-12 at 18 48 58" src="https://user-images.githubusercontent.com/20116910/212156591-86906020-eddf-43f1-9402-7ebb7547c7e6.png"> When a user tries to use mobile push notifications as a personal notification step and cloud is not connected: <img width="764" alt="Screenshot 2023-01-12 at 19 01 10" src="https://user-images.githubusercontent.com/20116910/212157580-9abb0758-79ad-4316-b8cd-15b4fff01502.png"> Now on the "Cloud" tab there's some info about the mobile app (the last section at the bottom of the page): <img width="1245" alt="Screenshot 2023-01-12 at 18 49 10" src="https://user-images.githubusercontent.com/20116910/212156997-c8b70dd5-bf15-4bc7-8eb8-9decdb8ecc80.png"> After connecting to the cloud instance, everything goes back to active and it's now possible to connect the mobile app: <img width="1091" alt="Screenshot 2023-01-12 at 19 08 27" src="https://user-images.githubusercontent.com/20116910/212158811-60d49888-4714-4c0e-850f-3ff6a11a117a.png"> After connecting the app the warning is gone: <img width="764" alt="Screenshot 2023-01-12 at 19 07 00" src="https://user-images.githubusercontent.com/20116910/212158614-677ab889-127f-4d64-bacc-0c26887f3097.png">	2023-01-19 11:15:56 +00:00
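The throttle described in #1132 (300 push notifications per instance per minute) can be pictured as a per-key fixed-window counter. This is a hypothetical Python sketch, not the code from the PR: the class name `FixedWindowThrottle`, the injectable clock, and the fixed-window strategy are all assumptions, and the actual view may use a different throttling algorithm.

```python
import time


class FixedWindowThrottle:
    """Allow at most `limit` calls per `window` seconds for each key (e.g. an instance id)."""

    def __init__(self, limit=300, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._windows = {}  # key -> (window_start, calls_in_window)

    def allow(self, key):
        now = self.clock()
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self.window:
            # The previous window has expired: start a fresh one at `now`.
            start, count = now, 0
        if count >= self.limit:
            return False
        self._windows[key] = (start, count + 1)
        return True


# Drive the throttle with a fake clock so the example is deterministic.
fake_now = [0.0]
throttle = FixedWindowThrottle(clock=lambda: fake_now[0])
results = [throttle.allow("instance-a") for _ in range(301)]
other_instance = throttle.allow("instance-b")  # separate counter per instance
fake_now[0] = 60.0
after_window = throttle.allow("instance-a")  # a new window has started
```

A fixed window is the simplest choice but allows short bursts at window boundaries (up to 2 × `limit` calls across two adjacent windows); a sliding-window log avoids that at the cost of storing timestamps.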
Vadim Stepanov	c93ee5c554	Send a Slack DM when user is not in channel (#1144 ) # What this PR does Currently, when a user gets mentioned in an alert group thread and the user is not in the Slack channel, the Slack bot sends the following to the channel: > ⚠️ Tried to ask USER to look at incident. Unfortunately USER is not in this channel. Please, invite. This PR changes this behaviour to instead send a direct message to the user. The message contains a link to the main alert group message in Slack. <img width="806" alt="Screenshot 2023-01-17 at 19 25 36" src="https://user-images.githubusercontent.com/20116910/212996457-02db183f-2041-4998-b743-bd5b6c84b7b5.png"> ## Checklist - [ ] Tests updated (N/A) - [ ] Documentation added (N/A) - [x] `CHANGELOG.md` updated	2023-01-18 16:08:15 +00:00
Matias Bordese	90def88752	Add escalation chain option when creating a direct page alert group (#1143 ) Also changes the default integration used when creating an alert group for a direct page to a custom manual integration to avoid conflicts/unexpected behaviors with existing manual alerts.	2023-01-18 12:58:26 -03:00
Vadim Stepanov	b8d78fd6bb	Allow messaging backends to be enabled/disabled per organization (#1151 ) # What this PR does Allows messaging backends to be enabled/disabled per organization when getting a list of available personal notification channels. ## Checklist - [x] Tests updated - [ ] Documentation added (N/A) - [x] `CHANGELOG.md` updated	2023-01-18 15:52:25 +00:00
Matias Bordese	d3062b56fd	Draft initial logic for user/schedule paging (#1098 ) Co-authored-by: Vadim Stepanov <vadimkerr@gmail.com>	2023-01-17 12:19:08 -03:00
Yulya Artyukhina	9129a720ef	Integration with grafana incident (#1081 ) Check if Grafana Incident is enabled. If it is, add a button with a link to declare Grafana Incident from Alert group in Slack and on Web. Co-authored-by: Yulia Shanyrova <yulia.shanyrova@grafana.com>	2023-01-17 13:04:50 +01:00
Tommy	5bd8fbdef8	Add alert groups state filter (#1133 ) # What this PR does This PR adds a new `state` parameter to the alert_group public API to filter alert groups by their state ## Which issue(s) this PR fixes https://github.com/grafana/oncall/issues/684 ## Checklist - [x] Tests updated - [x] Documentation added - [x] `CHANGELOG.md` updated Co-authored-by: Vadim Stepanov <vadimkerr@gmail.com>	2023-01-17 10:28:29 +00:00
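A state filter like the one in #1133 needs a single state per alert group, derived from the boolean columns `resolved`, `acknowledged` and `silenced` that appear in the SQL quoted for #1188 and #1189. In that SQL, "acknowledged AND NOT resolved" matches regardless of `silenced`, so the hypothetical sketch below gives acknowledgement precedence over silencing. The state names (`"firing"`, etc.) and the helper names are assumptions, not the public API's actual values.

```python
from dataclasses import dataclass


@dataclass
class AlertGroupFlags:
    # Boolean columns that appear in the SQL quoted in #1188 / #1189.
    resolved: bool = False
    acknowledged: bool = False
    silenced: bool = False


def derive_state(group):
    """Map the three flags to a single state (state names are hypothetical)."""
    if group.resolved:
        return "resolved"
    if group.acknowledged:
        # "acknowledged AND NOT resolved" matches regardless of `silenced`.
        return "acknowledged"
    if group.silenced:
        return "silenced"
    # NOT silenced AND NOT acknowledged AND NOT resolved.
    return "firing"


def filter_by_state(groups, state):
    return [g for g in groups if derive_state(g) == state]


groups = [
    AlertGroupFlags(),
    AlertGroupFlags(acknowledged=True, silenced=True),
    AlertGroupFlags(resolved=True, acknowledged=True),
    AlertGroupFlags(silenced=True),
]
states = [derive_state(g) for g in groups]
```

In a database-backed implementation the same mapping would be expressed as query conditions rather than evaluated in Python, so filtering and pagination stay in SQL.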
Vadim Stepanov	59f2c293e7	Move FCM relay logic into a celery task (#1137 )	2023-01-13 19:28:34 +00:00
Matias Bordese	0d38fe2a7f	Web schedules overrides are the higher priority level (#1115 ) Related to https://github.com/grafana/oncall-private/issues/1550	2023-01-13 08:58:35 -03:00
Innokentii Konstantinov	9a3b53ff34	Delete slack_connector on org soft-delete (#1127 )	2023-01-12 17:37:05 +08:00
Joey Orlando	babacf4da8	refactor the is_rbac_permissions_enabled check to be more robust (#1099 ) # What this PR does Checks the `is_rbac_permissions_enabled` flag differently based on whether we are dealing with an open-source or cloud installation: - for open-source installations, simply continue making a `HEAD` request to the list RBAC permissions Grafana API endpoint. - for cloud installations, use the `config` object returned from `GET /instances/{instance_id}?config=true` and check whether `instance_info["config"]["feature_toggles"]["accessControlOnCall"] == "true"` ## Which issue(s) this PR fixes Resolves the issue in hosted Grafana where, when a stack is inactive, the hosted Grafana gateway returns 200 to the `HEAD` request (which erroneously sets the `is_rbac_permissions_enabled` flag to `true`) ## Checklist - [x] Tests updated (N/A) - [ ] Documentation added - [x] `CHANGELOG.md` updated	2023-01-11 12:48:30 +01:00
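The cloud branch of the check in #1099 reads a nested feature toggle and compares it with the string `"true"`. A minimal Python sketch of that lookup follows; the dictionary shape comes from the PR description, while the function name and the choice to treat a missing config or toggle as "disabled" are assumptions.

```python
def is_rbac_enabled_for_cloud(instance_info):
    """Return True only if the instance config sets accessControlOnCall to the string "true"."""
    try:
        toggle = instance_info["config"]["feature_toggles"]["accessControlOnCall"]
    except (KeyError, TypeError):
        # Assumption: a missing config or toggle is treated as "RBAC disabled".
        return False
    # The toggle is compared as a string, as in the PR description.
    return toggle == "true"


enabled = is_rbac_enabled_for_cloud({"config": {"feature_toggles": {"accessControlOnCall": "true"}}})
toggle_missing = is_rbac_enabled_for_cloud({"config": {"feature_toggles": {}}})
config_missing = is_rbac_enabled_for_cloud({"config": None})
```

Defaulting to `False` on a malformed response mirrors the motivation of the PR: an unexpected answer from an inactive stack should not flip the flag to `true`.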
Vadim Stepanov	231c0f45a3	Re-implement FCM relay after introducing firebase (#1121 ) This PR changes how `FCMRelayView` handles push notifications from OSS instances, also changing how the mobile app backend sends push notifications.	2023-01-11 11:42:01 +00:00
Innokentii Konstantinov	fa6906a606	Simplify and speed up slack rendering (#1105 ) Simplify and speed up slack rendering.	2023-01-10 15:41:38 +08:00
Matias Bordese	abbb5a8381	Update schedules query not to defer ical fields used for on-call check (#1114 ) Do not defer cached ical fields which are later needed for calculating on-call users listed in the schedules list page.	2023-01-09 14:10:23 -03:00
Matias Bordese	87fad5eec1	Add select_related to fetch schedules user group information (#1109 )	2023-01-09 13:15:27 -03:00
Vadim Stepanov	a1e4f72280	remove send_link_to_channel_message_or_fallback_to_full_incident	2023-01-06 11:34:11 +00:00
Salvatore Giordano	a3dbe95d5a	fix: update android notification payload (#1090 )	2023-01-05 17:36:07 +01:00
Joey Orlando	802e3964e9	update mobile app push notification text + make telegram alert verbiage consistent ("Firing" instead of "Alerting") (#1089 )	2023-01-05 16:16:43 +01:00
Innokentii Konstantinov	8abbcee050	Org soft-delete (#1073 ) # What this PR does It introduces soft-deletion of organizations, since Grafana stacks are soft-deleted too. Also, we had a problem deleting orgs with large numbers of alerts, and soft-deletion fixes this problem. I think the problem of cleaning up alerts of deleted orgs should be solved as part of alert retention.	2023-01-05 12:42:55 +08:00
Vadim Stepanov	0d4701bd81	Change wording from "incident" to "alert group" in the Telegram app (#1052 ) # What this PR does Makes Telegram integration consistent with the rest of the system so it uses the word "alert group" instead of "incident" when referring to alert groups. ## Checklist - [x] Tests updated - [ ] Documentation added (N/A) - [x] `CHANGELOG.md` updated	2023-01-04 17:44:01 +00:00
Vadim Stepanov	7a1f176cb5	Schedule score backend (#338 ) This PR adds an endpoint returning a schedule quality score, overloaded users and comments on the existing issues (e.g. balance issues or gaps). ## Limitations - Since working hours editor is not implemented yet, there are only two scores taken into account: balance score and a score representing the ratio of time when someone is on-call to the whole time period. - Time period is now set to be constant (90 days from today), so in some cases the results will be inaccurate (when rotations don't align with the time period) - It only takes primary rotations into account (overrides are ignored) ## Usage `GET /api/internal/v1/schedules/<pk>/quality?date=<TOMORROW_DATE>` Note that `date` should be tomorrow date, because we can only be sure about changing tomorrow's shifts (some of the shifts for current day could be "deleted" but still show up in the UI). ## Example response ```json { "total_score": 90, "comments": ["Schedule has no gaps", "Schedule is well-balanced, but still can be improved"], "overloaded_users": ["USSZ5WRH2CUA9", "U74XJZSSQGBIH"] } ``` Issue: #118	2023-01-04 16:49:58 +00:00
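#338 mentions a score representing the ratio of on-call time to the whole time period, but does not give the formula. The following is only a hypothetical sketch of one way to compute such a coverage ratio: clip shifts to the period, merge overlaps so double coverage is not counted twice, and divide covered time by the period length. Times are plain numbers (e.g. hours) for simplicity, and `coverage_ratio` is an illustrative name, not part of the endpoint.

```python
def coverage_ratio(shifts, period_start, period_end):
    """Fraction of [period_start, period_end) covered by at least one shift.

    `shifts` is an iterable of (start, end) pairs; overlapping shifts are
    merged so that double coverage is not counted twice.
    """
    if period_end <= period_start:
        raise ValueError("period_end must be after period_start")
    # Clip every shift to the period and drop shifts entirely outside it.
    clipped = sorted(
        (max(start, period_start), min(end, period_end))
        for start, end in shifts
        if end > period_start and start < period_end
    )
    covered = 0
    current_start = current_end = None
    for start, end in clipped:
        if current_end is None or start > current_end:
            # Disjoint from the running interval: bank it and start a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or touching: extend the running interval.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / (period_end - period_start)


# Two back-to-back 8h shifts in a 24h period cover 16 of 24 hours.
ratio = coverage_ratio([(0, 8), (8, 16)], 0, 24)
```

One minus this ratio is the fraction of the period with nobody on call, which is the kind of gap the endpoint's comments report.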
Michael Derynck	7c26eb559b	Improve handling of template exceptions during group data creation (#1068 ) # What this PR does With the addition of tighter controls on jinja templates handle exceptions while rendering group data as follows: - Title will cache error message as title and display to user and the error will be logged - Group distinction will be left as None and the error will be logged - Is resolve signal will be treated as False and the error will be logged - Is acknowledge signal will be treated as False and the error will be logged ## Which issue(s) this PR fixes https://github.com/grafana/oncall-private/issues/1542	2023-01-03 12:30:59 -07:00
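The per-field fallbacks listed in #1068 follow one pattern: try to render, and on failure log the error and return a safe default. A minimal Python sketch of that pattern is below; the helper names, dictionary keys, logger name, and the `"Template error: ..."` title format are hypothetical, and catching `Exception` broadly is a simplification of handling template errors specifically.

```python
import logging

logger = logging.getLogger("alert_group_templates")  # hypothetical logger name


def render_or_fallback(render, fallback, field):
    """Call `render()`; on failure, log the error and return a fallback value.

    `fallback` may be a plain value or a callable that receives the exception.
    """
    try:
        return render()
    except Exception as exc:  # simplification: the PR targets template errors
        logger.warning("Failed to render %s template: %s", field, exc)
        return fallback(exc) if callable(fallback) else fallback


def build_group_data(render_title, render_grouping, render_resolve, render_ack):
    # Fallbacks follow the list in #1068; the key names are hypothetical.
    return {
        "title": render_or_fallback(render_title, lambda exc: f"Template error: {exc}", "title"),
        "group_distinction": render_or_fallback(render_grouping, None, "grouping id"),
        "is_resolve_signal": render_or_fallback(render_resolve, False, "resolve condition"),
        "is_acknowledge_signal": render_or_fallback(render_ack, False, "acknowledge condition"),
    }


def broken_template():
    raise ValueError("undefined variable")


data = build_group_data(broken_template, broken_template, broken_template, broken_template)
```

Falling back to `False` for the resolve and acknowledge signals means a broken template can never auto-resolve or auto-acknowledge an alert group, which is the conservative failure mode.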
Vadim Stepanov	cd770e85ea	Catch DoesNotExist in post_slack_rate_limit_message (#1067 )	2023-01-03 17:44:56 +00:00
Matias Bordese	05524ab698	Merge pull request #1059 from grafana/matiasb/truncate-slack-title-block Truncate slack alert group title block below max size	2023-01-03 08:50:57 -03:00
Matias Bordese	0a3c96d3c3	Merge pull request #1058 from grafana/matias/fix-schedule-no-start-byday Handle no start date when calculating by day ical shift events	2023-01-03 08:50:27 -03:00
Ildar Iskhakov	1ff0a7da99	1.1.5.5 -> dev (#1060 ) # What this PR does ## Which issue(s) this PR fixes ## Checklist - [ ] Tests updated - [ ] Documentation added - [ ] `CHANGELOG.md` updated Co-authored-by: Vadim Stepanov <vadimkerr@gmail.com> Co-authored-by: Julia <ferril.darkdiver@gmail.com> Co-authored-by: Innokentii Konstantinov <innokenty.konstantinov@grafana.com> Co-authored-by: Matias Bordese <mbordese@gmail.com>	2023-01-03 11:57:16 +08:00
Innokentii Konstantinov	5e297847ae	Speedup alert group search	2023-01-03 11:04:16 +08:00
Matias Bordese	374f32f489	Handle no start date when calculating by day ical shift events	2023-01-02 11:53:49 -03:00
Matias Bordese	75aaeef3f2	Truncate slack alert group title block below max size	2023-01-02 10:07:53 -03:00
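Truncating the Slack title block "below max size" amounts to cutting the text so that the text plus a suffix fits a length limit. A minimal Python sketch follows; the 3000-character default is an assumption (the documented maximum for the text of a Slack section block), and the limit and suffix OnCall actually uses may differ.

```python
# Assumption: 3000 characters, the documented maximum for a Slack section
# block's text; the limit OnCall actually applies may differ.
SLACK_TEXT_MAX_LEN = 3000


def truncate_text(text, max_len=SLACK_TEXT_MAX_LEN, suffix="..."):
    """Return `text` unchanged if it fits, else cut it so text + suffix fits in max_len."""
    if len(text) <= max_len:
        return text
    return text[: max_len - len(suffix)] + suffix


title = truncate_text("x" * 5000)
```

Reserving room for the suffix keeps the result within the limit, and the trailing `...` tells readers the title was cut rather than ending there.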
Ildar Iskhakov	282e58db7b	Don't render logs for too big telegram dm (#1051 ) # What this PR does ## Which issue(s) this PR fixes ## Checklist - [ ] Tests updated - [ ] Documentation added - [ ] `CHANGELOG.md` updated	2022-12-29 13:22:15 +00:00
Joey Orlando	7ebc9cbbf7	modify push notification settings + use fcm-django library (#998 ) - swaps out `django-push-notifications` for [`fcm-django`](https://github.com/grafana/fcm-django). Again.. this is a fork of the parent repo for exactly the same reason.. the migrations point to `auth_user` without letting us use our own user model, this has been patched in the `grafana` fork. The reason why we are using `fcm-django` vs `django-push-notifications` is that the latter does not support the new FCM API, only the "legacy" API. The legacy FCM API does not support certain push notification settings that we would like to use. - modifies the iOS/Android specific push notification settings - adds a `flower` pod in the `docker-compose-developer.yml`, useful for debugging tasks locally - sets the mobile app verification token TTL to 5 minutes when developing locally. The default of 1 minute makes working with device emulators really tricky.. This PR also swaps out the base image in `engine/Dockerfile` from `python:3.9-alpine3.16` to `python:3.9-slim-buster`. As to why.. in short, with the introduction of the `fcm-django` library there is now a peer-dependency on [`grpcio`](https://github.com/grpc/grpc) (which is used by `firebase_admin`.. which I am using in this PR to interact directly with Firebase Cloud Messaging (FCM)). `grpcio` does not publish wheels (read: compiled binaries) for the Alpine distro. It does publish wheels for Debian and hence `pip install -r requirements.txt` does not need to build this library from the source distribution. This is a [known "issue"](https://github.com/grpc/grpc/issues/22815#issuecomment-1107874367) and the recommended solution in the community is to.. not use alpine. These were the numbers when building the image locally, in terms of image size and build time: | | Local image size (uncompressed) | Build time (may differ based on your network speed) | | ------------------------- | -------------------------------------- | ---------- | | `python:3.9-alpine3.16` | 785MB | 320s | | `python:3.9-slim-buster` | 1.05GB | 90s | Co-authored-by: Salvatore Giordano <salvatoregiordanoo@gmail.com>	2022-12-20 12:41:34 +01:00
Innokentii Konstantinov	7bb4fdfe43	Merge pull request #1017 from grafana/fix_ag_filtering Speedup search alertgroup to group alert	2022-12-19 10:59:24 +08:00
Innokentii Konstantinov	41f886b31e	Speedup search alertgroup	2022-12-17 19:34:13 +08:00
Joey Orlando	66b2ed5c64	add more logging to push notification celery task (#986 )	2022-12-13 14:06:56 +01:00
Joey Orlando	5967d5af63	remove apns + fix django-push-notifications migrations (#984 ) - removes APNS support - changes the `django-push-notification` library from the `iskhakov` fork to the [`grafana` fork](https://github.com/grafana/django-push-notifications). This new fork basically just patches an issue which affected the database migrations of this django app (previously the library would not respect the `USER_MODEL` setting when creating its tables and would instead reference the `auth_user` table.. which we don't want) - add `--no-cache` flag to the `make build` command NOTE A migration should be applied as follows: ```bash # remove the four push_notifications tables, which have improper foreign key references python manage.py migrate push_notifications zero # recreate the tables with the proper foreign key references python manage.py migrate ```	2022-12-13 13:00:59 +01:00


491 commits