centralcloud/oncall-engine

Author	SHA1	Message	Date
Michael Derynck	961a9e5349	Webhooks 2 Release (#1830 ) - Enables new webhooks functionality. - Database migration will automatically convert existing webhooks to new ones. Note: Converted webhooks are considered "legacy": they will continue to work as part of your escalation chain but will no longer be editable. To make changes, use the `Make a copy` action and edit the copy; afterwards you can delete your legacy webhook. Remember to connect your escalation chain with your newly copied webhook! --------- Co-authored-by: Maxim <maxim.mordasov@grafana.com>	2023-07-13 13:53:06 -06:00
Joey Orlando	77f6dedce5	add index on started_at column in alert groups (#2516 ) # What this PR does Adds an index on the `started_at` column in the `alerts_alertgroup` table. For the alert groups query used by the `check_escalation_finished_task`, this resulted in a huge performance boost, taking the query time from 89mins to 4secs (on our largest production dataset). ## Which issue(s) this PR fixes closes #724 closes https://github.com/grafana/oncall-private/issues/1713 ## Checklist - [x] Unit, integration, and e2e (if applicable) tests updated - [x] Documentation added (or `pr:no public docs` PR label added if not required) - [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not required)	2023-07-13 05:22:59 -04:00
Joey Orlando	385e1377d6	remove deprecated backend code (#2502 ) # What this PR does See more details comments alongside the code. Regarding frontend changes, the main changes in this PR are to remove unused fields on the `Team` interface + unused methods on the `Team` model. ## Checklist - [x] Unit, integration, and e2e (if applicable) tests updated - [ ] Documentation added (or `pr:no public docs` PR label added if not required) (N/A) - [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not required) (N/A)	2023-07-12 02:07:45 -04:00
Joey Orlando	68b13aeb50	slightly tweak django-silk configuration	2023-07-05 20:44:11 +02:00
Joey Orlando	425ffbb740	address mobile device push notification delivery issue when user had > 1 registered device (#2421 ) # What this PR does Address issue where if the user had multiple registered devices w/ FCM, doing django queries like `.first()` could potentially pick the wrong device. Do this in two ways: 1. set the `DELETE_INACTIVE_DEVICES` `fcm_django` setting to `True`. According to the [docs](`20e275618b/README.rst (L127-L130)`), this works as follows: > devices to which notifications cannot be sent, are deleted upon receiving error response from FCM 2. Customizing the `FCMDevice` model provided by `fcm_django`. Add a new method, `get_active_device_for_user`, so that we can centralize the logic for this rather than duplicating `FCMDevice.objects.filter(user=user).first()` ## Which issue(s) this PR fixes https://raintank-corp.slack.com/archives/C0229FD3CE9/p1688461915752119 ## Checklist - [x] Unit, integration, and e2e (if applicable) tests updated - [ ] Documentation added (or `pr:no public docs` PR label added if not required) (N/A) - [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not required)	2023-07-05 15:14:46 +00:00
Innokentii Konstantinov	3f575b5a27	Fix phone provider initialization (#2434 )	2023-07-05 05:54:39 -04:00
Andrey Oleynik	aeb35009be	add zvonok integration (#2339 ) Added integration with [zvonok.com](https://zvonok.com) service. Features: - Phone number validation - Test calls - Selection of pre-recorded audio - Making calls - Processing call status - Acknowledgment alert group (optional) To process the call status, it is required to add a postback with the GET method on the side of the zvonok.com service with the following format ([more info here](https://zvonok.com/ru-ru/guide/guide_postback/)): ```${ONCALL_BASE_URL}/zvonok/call_status_events?campaign_id={ct_campaign_id}&call_id={ct_call_id}&status={ct_status}&user_choice={ct_user_choice}``` The names of the transmitted parameters can be redefined through environment variables. --------- Co-authored-by: Innokentii Konstantinov <innokenty.konstantinov@grafana.com>	2023-07-05 05:55:53 +00:00
Yulya Artyukhina	f1fcb41fb4	Add "user_was_notified_of_alert_groups" metric (#2334 ) This PR adds new metric for Prometheus exporter "user_was_notified_of_alert_groups" which counts how many alert groups user was notified of. ## Checklist - [x] Unit, integration, and e2e (if applicable) tests updated - [x] Documentation added (or `pr:no public docs` PR label added if not required) - [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not required) --------- Co-authored-by: Joey Orlando <joey.orlando@grafana.com>	2023-06-28 08:15:19 +00:00
Joey Orlando	75028d0427	continue addressing mypy violations (#2170 ) # What this PR does See #2173 Also, closes #2187 . All of the new files under `type_stubs/icalendar` were autogenerated by running: ```bash stubgen -p icalendar -o type_stubs ``` ## Checklist - [ ] Unit, integration, and e2e (if applicable) tests updated - [ ] Documentation added (or `pr:no public docs` PR label added if not required) - [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not required)	2023-06-27 10:23:08 +00:00
Innokentii Konstantinov	f0f2e7c8c6	Draft AlertManager integration v2 (#2167 ) # What this PR does Introduces AlertManagerV2 integration with better grouping and autoresolving, not intended for production use yet. --------- Co-authored-by: Ildar Iskhakov <Ildar.iskhakov@grafana.com>	2023-06-13 07:10:38 +00:00
Joey Orlando	8c5f6238dc	get rid of need for FEATURE_PROMETHEUS_EXPORTER_ENABLED to be present in ci-test.py settings file (#2161 )	2023-06-12 09:43:05 -04:00
Matias Bordese	cc3c18c89c	Add instructions for prometheus exporter setup (#2103 )	2023-06-12 13:04:07 +00:00
Ildar Iskhakov	1bdb54df35	Remove request reading middleware as we use post-buffering (#2094 ) # What this PR does RequestBodyReadingMiddleware is redundant because [post-buffering is enabled](https://github.com/grafana/oncall/blob/dev/engine/uwsgi.ini#L17): If an HTTP request has a body (like a POST request generated by a form), you have to read (consume) it in your application. If you do not do this, the communication socket with your webserver may be clobbered. If you are lazy you can use the post-buffering option that will automatically read data for you. For Rack applications this is automatically enabled. (https://uwsgi-docs.readthedocs.io/en/latest/ThingsToKnow.html) ## Which issue(s) this PR fixes ## Checklist - [ ] Unit, integration, and e2e (if applicable) tests updated - [ ] Documentation added (or `pr:no public docs` PR label added if not required) - [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not required)	2023-06-05 11:49:39 +08:00
Matias Bordese	eee5065e74	Add initial setup for local dev prometheus exporter (#2039 )	2023-06-01 12:31:33 +00:00
Yulya Artyukhina	15ef692009	OnCall prometheus metrics exporter (#1605 ) # What this PR does Add OnCall prometheus metrics exporter ## Which issue(s) this PR fixes ## Checklist - [x] Tests updated - [ ] Documentation added - [ ] `CHANGELOG.md` updated --------- Co-authored-by: Joey Orlando <joey.orlando@grafana.com> Co-authored-by: Matias Bordese <mbordese@gmail.com>	2023-05-25 18:26:13 +00:00
Innokentii Konstantinov	1f786e8d2a	Phone provider refactoring (#1713 ) # What this PR does This PR moves phone notification logic into a separate object, PhoneBackend, and introduces a PhoneProvider interface to hide the actual implementation of the external phone services provider. It should allow adding new phone providers just by implementing one class (see SimplePhoneProvider for an example). # Why [Asterisk PR](https://github.com/grafana/oncall/pull/1282) showed that our phone notification system is not flexible. However, how to add an "X" phone provider is one of the most frequent community questions. Also, this refactoring moves us one step closer to unifying all notification backends, since with PhoneBackend all phone notification logic is collected in one place and is independent of the concrete implementation. # Highlights 1. PhoneBackend object - contains all phone notifications business logic. 2. PhoneProvider - interface to external phone services provider. 3. TwilioPhoneProvider and SimplePhoneProvider - two examples of PhoneProvider implementation. 4. PhoneCallRecord and SMSRecord models. I introduced these models to keep phone notification limits logic decoupled from external providers. Existing TwilioPhoneCall and TwilioSMS objects will be migrated to the new table so that the limits counter is not reset. To be able to receive status callbacks and gather from Twilio, TwilioPhoneCall and TwilioSMS still exist, but they are linked to PhoneCallRecord and SMSRecord via a foreign key so that Twilio logic does not leak into core code. --------- Co-authored-by: Yulia Shanyrova <yulia.shanyrova@grafana.com>	2023-05-24 06:27:48 +00:00
Vadim Stepanov	663987c57e	Bring back FCM_PROJECT_ID env variable (#1980 ) Bring back `FCM_PROJECT_ID` env variable that was removed in https://github.com/grafana/oncall/pull/1969. I made an incorrect assumption that project ID is already specified in the credentials file, but in fact project ID can be different from the one in credentials file.	2023-05-22 14:32:21 +01:00
Vadim Stepanov	07368f3b93	Allow passing Firebase credentials via environment variable (#1969 ) # What this PR does Allow passing Google application credentials (used to send FCM messages using `fcm-django`) as an environment variable `GOOGLE_APPLICATION_CREDENTIALS_JSON_BASE64`. If the env variable is not provided, credentials will be taken from file. This change allows uWSGI workers to send messages to FCM (currently it's not possible because the uWSGI user doesn't have access to the credentials file) + makes configuration more consistent. Also removes a redundant `FCM_PROJECT_ID` env variable (Google application credentials already contain the project ID). ## Which issue(s) this PR fixes ## Checklist - [x] Unit, integration, and e2e (if applicable) tests updated - [x] Documentation added (or `pr:no public docs` PR label added if not required) - [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not required)	2023-05-22 12:20:06 +00:00
Joey Orlando	4c5c4f2014	update silk_profiler_enabled logic (#1942 )	2023-05-15 16:00:59 -04:00
Joey Orlando	dc6192fb7c	dont enable silk if maintenance mode is enabled (#1941 )	2023-05-15 19:53:31 +00:00
Joey Orlando	9be8080e51	add the ability to set/display "currently undergoing maintenance message" in the UI (#1917 ) # What this PR does add a new endpoint, `GET /maintenance-mode/`, which returns either a string message pulled from the `CURRENTLY_UNDERGOING_MAINTENANCE_MESSAGE` env var, or `None` + update the UI to conditionally show this message if it is set <img width="1321" alt="Screenshot 2023-05-10 at 11 28 16" src="https://github.com/grafana/oncall/assets/9406895/833a77fb-3a90-4f9f-88d6-dae0d98d99d4"> ## Checklist - [x] Unit, integration, and e2e (if applicable) tests updated - [x] Documentation added (or `pr:no public docs` PR label added if not required) (N/A) - [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not required)	2023-05-12 15:44:09 +00:00
Joey Orlando	620f69e409	"You're Going OnCall" mobile app push notification (#1814 ) # What this PR does https://www.loom.com/share/c5deb35309604cfdab6176c44de7b15e ## Checklist - [x] Unit, integration, and e2e (if applicable) tests updated - [ ] Documentation added (or `pr:no public docs` PR label added if not required) - [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not required)	2023-05-04 16:59:57 +00:00
Michael Derynck	3d74cbf3f5	Webhook 2 improvements and fixes (#1829 ) - Rename Firing to Alert Group Created to reduce confusion as to why the event only fires once and not when unresolve or unacknowledge returns the alert group to the firing state. - Increase password field length - Do not filter webhook execution by team, team is just for filtering ownership now - Do not log webhook triggers in alert group escalation log if the webhook does not trigger (Status/response will still be stored) - Fix formatting for response content and data fields on the Status page - Add a content length limit for responses being stored (50000 characters)	2023-04-26 15:55:08 -06:00
Shantanu Alsi	e806ad32f1	Fix documentation links (#1766 ) # What this PR does ## Which issue(s) this PR fixes ## Checklist - [x] Unit, integration, and e2e (if applicable) tests updated - [x] Documentation added (or `pr:no public docs` PR label added if not required) - [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not required) --------- Co-authored-by: Joey Orlando <joey.orlando@grafana.com>	2023-04-19 10:12:16 +01:00
Matias Bordese	017d98efad	Rework schedule ical export (#1783 ) Related to #1501. Behind a feature flag, will migrate existing exports to use the new ical export transparently.	2023-04-18 17:07:11 +00:00
Joey Orlando	0eb4bd95e6	Revert "Revert "speed up ci builds from 15 to <7 minutes"" (#1643 ) Reverts grafana/oncall#1639	2023-03-28 09:34:03 +02:00
Innokentii Konstantinov	cbb06492ae	Revert "speed up ci builds from 15 to <7 minutes" (#1639 ) Reverted due to stuck ci	2023-03-28 13:01:49 +08:00
Ildar Iskhakov	c158c8f28b	Configure pyroscope (#1638 ) # What this PR does ## Which issue(s) this PR fixes ## Checklist - [ ] Unit, integration, and e2e (if applicable) tests updated - [ ] Documentation added (or `pr:no public docs` PR label added if not required) - [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not required)	2023-03-28 11:34:37 +08:00
Joey Orlando	23cd736c30	speed up ci builds from 15 to <7 minutes (#1615 ) This PR cuts GitHub Action build times from 14-15 minutes, down to just under 7 minutes. It does this by: - caching `grafana-plugins/node_modules` and `pip` dependencies based on their respective dependency files (eg. `requirements.txt` & `yarn.lock`). This step alone saves ~3 minutes. - get rid of the "build-engine-docker-image" and "backend-integration-tests" jobs in the old "Integration Tests" workflow. This was split out this way so that we could build the backend docker image once, upload the artifact, and then reuse it across the backend and e2e tests. We no longer need these backend integration tests because we are testing the same thing in the e2e tests. This saves ~45 seconds of having to upload the image artifact. - few improvements within the integration tests themselves: - move plugin configuration to the `globalSetup.ts`. This means that every test does not need to check if the plugin has been configured because it is done once before all the tests are run. - cache the plugin frontend build. If your commit doesn't change anything to `grafana-plugin/src` or `grafana-plugin/yarn.lock` it should be safe to reuse a previously built/cached version of the plugin frontend. This saves ~3 minutes - cache playwright binaries/dependencies. Only re-install them if the version of `@playwright/test` in `grafana-plugin/yarn.lock` changes. This saves ~3 minutes. Other things to mention Once we refactor the `GSelect` component to not call the `onChange` callback on every keyDown event (#1628), this should allow us to parallelize the integration tests, and cut the time required to execute the tests themselves in half	2023-03-27 18:07:19 +02:00
Ildar Iskhakov	8d5cbcecf2	Start pyroscope only after uwsgi fork (#1607 ) # What this PR does Currently the main uwsgi process sends spans while idle, which makes graphs unreadable <img width="494" alt="Screenshot 2023-03-23 at 18 00 21" src="https://user-images.githubusercontent.com/2262529/227168746-125f2329-bfaa-4989-a391-712a230e0087.png"> ## Which issue(s) this PR fixes ## Checklist - [ ] Tests updated - [ ] Documentation added - [ ] `CHANGELOG.md` updated	2023-03-27 12:00:57 +08:00
Innokentii Konstantinov	bfe06ac888	Add SLACK_INTEGRATION_MAINTENANCE env var (#1582 ) # What this PR does Add SLACK_INTEGRATION_MAINTENANCE env var to be able to disable slack install/uninstall	2023-03-21 08:15:35 +00:00
Joey Orlando	4d655dff60	modify check_escalation_finished_task task (#1266 ) # What this PR does This PR: - modifies the `check_escalation_finished_task` celery task to: - do stricter escalation validation based on the alert group's escalation snapshot (see the `audit_alert_group_escalation` method in `engine/apps/alerts/tasks/check_escalation_finished.py` for the validation logic) - use a read-only database for querying alert-groups if one is configured, otherwise use the "default" one - ping a configurable heartbeat (new env var `ALERT_GROUP_ESCALATION_AUDITOR_CELERY_TASK_HEARTBEAT_URL` added) - increase the task frequency from every 10 to every 13 minutes (this can be configured via an env variable) - adds public documentation on how to configure this auditor task - modifies the local celery startup command to properly take into consideration all celery related env vars (similar to the ones we use in `engine/celery_with_exporter.sh`; this made it easier to enable `celery beat` locally for testing) - removes the following code: - removes references to `AlertGroup.estimate_escalation_finish_time` and marks the model field as deprecated using the [`django-deprecate-fields` library](https://pypi.org/project/django-deprecate-fields/). This field was only used for the previous version of this validation task - `EscalationSnapshotMixin.calculate_eta_for_finish_escalation` was only used to calculate the value for `AlertGroup.estimate_escalation_finish_time` - `calculate_escalation_finish_time` celery task ## Which issue(s) this PR fixes https://github.com/grafana/oncall-private/issues/1558 ## Checklist - [x] Tests updated - [x] Documentation added - [x] `CHANGELOG.md` updated	2023-03-17 10:14:08 +00:00
Vadim Stepanov	ea60c0d247	Inbound email integration (#837 ) This PR adds the Inbound Email integration. It is designed to support a variety of ESPs, but in prod we will use Mailgun, so locally I tested it only with the Mailgun ESP. Important: To make it work on different clusters I'm planning to provide different email domains for different regions, like ....@us.oncall.grafana.net, ...@eu.oncall.grafana.net --------- Co-authored-by: Innokentii Konstantinov <innokenty.konstantinov@grafana.com>	2023-03-16 13:59:21 +08:00
Matias Bordese	2048e783ba	Add webhooks app and initial models (#1101 )	2023-03-09 19:39:25 +00:00
Innokentii Konstantinov	fbb83daf21	Store org cluster_slug (#1480 ) # What this PR does Store org cluster slug to write insight logs	2023-03-09 04:10:19 +00:00
Joey Orlando	7c8722e714	remove mobile app feature flag (#1484 ) # What this PR does ## Which issue(s) this PR fixes ## Checklist - [x] Tests updated - [ ] Documentation added (N/A) - [x] `CHANGELOG.md` updated	2023-03-08 11:22:44 +01:00
Innokentii Konstantinov	7bad073626	Remove OSS_INSTALLATION env var (#881 ) It's a duplicate of LICENSE env var What this PR does: Remove OSS_INSTALLATION env var in favour of LICENSE env var. Also, I refactored the feature tests a little. From my point of view it makes little sense to test whether all features are disabled or enabled. It is better to test a specific use case (e.g. OSS installation). Also, testing that all features are disabled requires setting LICENSE to the cloud license, which makes the test confusing. Checklist - [x] Tests updated - [ ] Documentation added - [ ] `CHANGELOG.md` updated	2023-03-07 11:07:42 +00:00
ak0nst	44e93b6ab4	Email and phone limits now environment variable (#1219 ) # What this PR does Email and phone limits now environment variables: EMAIL_NOTIFICATIONS_LIMIT=200, PHONE_NOTIFICATIONS_LIMIT=200 ## Which issue(s) this PR fixes #1010 ## Checklist - [ ] Tests updated - [x] Documentation added - [x] `CHANGELOG.md` updated --------- Co-authored-by: Vadim Stepanov <vadimkerr@gmail.com>	2023-03-07 10:48:05 +00:00
Innokentii Konstantinov	4b91203eca	Add validation of hostname for reCAPTCHA (#1445 ) # What this PR does - Implement reCAPTCHA v3 check. DRF_RECAPTCHA didn't support hostname validation and it's too complicated to add it. - Add validation of verification code on the OnCall side to avoid calling Twilio with obviously invalid codes ## Checklist - [x] Tests updated - [ ] Documentation added - [ ] `CHANGELOG.md` updated	2023-03-06 08:59:48 +00:00
Innokentii Konstantinov	6a5e75e083	Fix of templates api behaviour for public and private api (#1408 ) # What this PR does This PR fixes templates behaviour for public and private api. It fixes "reset to default" for templates from messaging backends and some minor bugs. Also adds acknowledge signal and source link templates ## Checklist - [x] Tests updated - [x] Documentation added - [x] `CHANGELOG.md` updated	2023-03-01 16:32:15 +08:00
Michael Derynck	b3659872a7	Get reCAPTCHA site key from backend env (#1400 ) # What this PR does Move reCAPTCHA site key to backend environment for easier management to support multiple environments. ## Which issue(s) this PR fixes ## Checklist - [ ] Tests updated - [ ] Documentation added - [x] `CHANGELOG.md` updated	2023-02-24 15:53:35 +00:00
Joey Orlando	c55a9010f7	Add Google reCAPTCHA for mobile app phone verification (#1373 ) # What this PR does Adds reCAPTCHA validation to the get mobile verification code endpoint ## Which issue(s) this PR fixes ## Checklist - [x] Tests updated - [ ] Documentation added (N/A) - [x] `CHANGELOG.md` updated --------- Co-authored-by: Maxim <maxim.mordasov@grafana.com>	2023-02-21 20:17:06 +01:00
Ildar Iskhakov	1b7ada4315	Add database migrations linter (#1020 ) # What this PR does This PR adds [django-migration-linter](https://github.com/3YOURMIND/django-migration-linter) to keep database migrations backwards compatible - we can automatically run migrations and they are zero-downtime, e.g. old code can work with the migrated database - we can run and rollback migrations without worrying about data safety - OnCall is deployed to multiple environments that the core team is not able to control See [django-migration-linter checklist](https://github.com/3YOURMIND/django-migration-linter/blob/main/docs/incompatibilities.md) for the common mistakes and best practices ## Which issue(s) this PR fixes ## Checklist - [ ] Tests updated - [ ] Documentation added - [ ] `CHANGELOG.md` updated --------- Co-authored-by: Joey Orlando <joey.orlando@grafana.com>	2023-02-06 16:01:37 +08:00
Vadim Stepanov	070eb6e538	Enable mobile app backend by default on OSS (#1286 ) # What this PR does Enables mobile app backend by default on OSS. ## Checklist - [x] `CHANGELOG.md` updated	2023-02-03 12:44:22 +00:00
Vadim Stepanov	9b709e86c9	Fix local dev setup slowness (#1270 ) # What this PR does Fixes an issue when a local dev setup becomes extremely slow. - Set `DEBUG` and `SILK_PROFILER_ENABLED` to `False` by default + add utility make commands to toggle it - Use `uwsgi` instead of Django's built-in `runserver` for local dev setup - Limit Celery concurrency to 3 for local dev setup (previously was 20, used >1GB RAM on my machine) --------- Co-authored-by: Joey Orlando <joey.orlando@grafana.com>	2023-02-02 09:08:48 +00:00
Joey Orlando	94fe7979cf	add django-dbconn-retry library (#1262 )	2023-01-31 20:17:54 +01:00
Ildar Iskhakov	4a8011d236	Add silk setting to store .prof files in the specific folder and share it between uwsgi workers (#1228 ) # What this PR does ## Which issue(s) this PR fixes ## Checklist - [ ] Tests updated - [ ] Documentation added - [ ] `CHANGELOG.md` updated	2023-01-26 20:33:04 +08:00
Ildar Iskhakov	a6a781320d	Set SILKY_PYTHON_PROFILER_BINARY setting to False by default (#1218 ) # What this PR does Here is the example of the visualisation with `snakeviz` <img width="1126" alt="Screenshot 2023-01-25 at 22 15 49" src="https://user-images.githubusercontent.com/2262529/214586753-ad49a002-27e1-4e44-82f2-4ad5f4e40101.png"> ## Which issue(s) this PR fixes ## Checklist - [ ] Tests updated - [ ] Documentation added - [ ] `CHANGELOG.md` updated	2023-01-25 22:17:17 +08:00
Ildar Iskhakov	0a00d3e2c1	Update base.py	2023-01-20 20:20:51 +08:00
Matias Bordese	693b5a41c4	Add slack command to trigger direct paging (#1154 ) Slash command needs to be added to slack app manifest: ``` slash_commands: - command: /escalate url: https://<oncall-public-url>/slack/interactive_api_endpoint/ description: Create a new alert group escalation should_escape: false ```	2023-01-20 09:06:27 -03:00

150 commits