Commit graph

119 commits

Author SHA1 Message Date
Joey Orlando
4d655dff60
modify check_escalation_finished_task task (#1266)
# What this PR does

This PR:
- modifies the `check_escalation_finished_task` celery task to:
  - do stricter escalation validation based on the alert group's
escalation snapshot (see the `audit_alert_group_escalation` method in
`engine/apps/alerts/tasks/check_escalation_finished.py` for the
validation logic)
- use a read-only database for querying alert-groups if one is
configured, otherwise use the "default" one
- ping a configurable heartbeat (new env var
`ALERT_GROUP_ESCALATION_AUDITOR_CELERY_TASK_HEARTBEAT_URL` added)
- increase the task frequency from every 10 to every 13 minutes (this
can be configured via an env variable)
  - adds public documentation on how to configure this auditor task
- modifies the local celery startup command to properly take into
consideration all celery related env vars (similar to the ones we use in
`engine/celery_with_exporter.sh`; this made it easier to enable `celery
beat` locally for testing)
- removes the following code:
- removes references to `AlertGroup.estimate_escalation_finish_time` and
marks the model field as deprecated using the [`django-deprecate-fields`
library](https://pypi.org/project/django-deprecate-fields/). This field
was only used for the previous version of this validation task
- `EscalationSnapshotMixin.calculate_eta_for_finish_escalation` was only
used to calculate the value for
`AlertGroup.estimate_escalation_finish_time`
  - `calculate_escalation_finish_time` celery task
  

## Which issue(s) this PR fixes

https://github.com/grafana/oncall-private/issues/1558

## Checklist

- [x] Tests updated
- [x] Documentation added
- [x] `CHANGELOG.md` updated
2023-03-17 10:14:08 +00:00
Vadim Stepanov
ea60c0d247
Inbound email integration (#837)
This PR add Inbound Email integration.

It designed to support some variety of ESPs, but in prod we will use
Mailgun, so locally I tested it only with mailgun ESP.

**Important:**
To make it work on different clusters I'm planning to provide different
email domains for different regions, like ....@us.oncall.grafana.net,
...@eu.oncall.grafana.net

---------

Co-authored-by: Innokentii Konstantinov <innokenty.konstantinov@grafana.com>
2023-03-16 13:59:21 +08:00
Matias Bordese
2048e783ba
Add webhooks app and initial models (#1101) 2023-03-09 19:39:25 +00:00
Innokentii Konstantinov
fbb83daf21
Store org cluster_slug (#1480)
# What this PR does
Store org cluster slug to write insight logs
2023-03-09 04:10:19 +00:00
Joey Orlando
7c8722e714
remove mobile app feature flag (#1484)
# What this PR does

## Which issue(s) this PR fixes

## Checklist

- [x] Tests updated
- [ ] Documentation added (N/A)
- [x] `CHANGELOG.md` updated
2023-03-08 11:22:44 +01:00
Innokentii Konstantinov
7bad073626
Remove OSS_INSTALATION env var (#881)
It's a duplicate of LICENSE env var

**What this PR does**:
Remove OSS_INSTALLATION env var in favour of LICENSE env var. Also, I
refactored features tests a little. From my point of view it makes
little sense to test if all features are disabled or enabled. Better to
test specific use-case (e.g. oss installation).
Also to test that all features are disabled it is needed to set LICENSE
equals cloud license, which makes test confusing.

**Checklist**
- [x] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-03-07 11:07:42 +00:00
ak0nst
44e93b6ab4
Email and phone limits now environment variable (#1219)
# What this PR does
Email and phone limits now environment variables:
EMAIL_NOTIFICATIONS_LIMIT=200, PHONE_NOTIFICATIONS_LIMIT=200

## Which issue(s) this PR fixes
#1010

## Checklist

- [ ] Tests updated
- [x] Documentation added
- [x] `CHANGELOG.md` updated

---------

Co-authored-by: Vadim Stepanov <vadimkerr@gmail.com>
2023-03-07 10:48:05 +00:00
Innokentii Konstantinov
4b91203eca
Add validation of hostname for recapctha (#1445)
# What this PR does

- Implement recapthca v3 check. DRF_RECAPTCHA didn't support hostname
validation and it's too complicated to add it.
- Add validation of verification code on oncall side to not to call
twilio with obviously invalid codes

## Checklist

- [x] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-03-06 08:59:48 +00:00
Innokentii Konstantinov
6a5e75e083
Fix of templates api behaviour for public and private api (#1408)
# What this PR does

This PR fixes templates behaviour for public and private api. It fix
"reset to default" for templates from messaging backends and some minor
bugs. Also added acknowledge signal and source link templates

## Checklist

- [x] Tests updated
- [x] Documentation added
- [x] `CHANGELOG.md` updated
2023-03-01 16:32:15 +08:00
Michael Derynck
b3659872a7
Get reCAPTCHA site key from backend env (#1400)
# What this PR does
Move reCAPTCHA site key to backend environment for easier management to
support multiple environments.

## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [x] `CHANGELOG.md` updated
2023-02-24 15:53:35 +00:00
Joey Orlando
c55a9010f7
Add Google reCAPTCHA for mobile app phone verification (#1373)
# What this PR does

Adds reCAPTCHA validation to the get mobile verification code endpoint

## Which issue(s) this PR fixes

## Checklist

- [x] Tests updated
- [ ] Documentation added (N/A)
- [x] `CHANGELOG.md` updated

---------

Co-authored-by: Maxim <maxim.mordasov@grafana.com>
2023-02-21 20:17:06 +01:00
Ildar Iskhakov
1b7ada4315
Add database migrations linter (#1020)
# What this PR does

This PR adds
[django-migration-linter](https://github.com/3YOURMIND/django-migration-linter)
to keep database migrations
 backwards compatible

- we can automatically run migrations and they are zero-downtime, e.g.
old code can work with the migrated database
 - we can run and rollback migrations without worrying about data safety
- OnCall is deployed to the multiple environments core team is not able
to control

See [django-migration-linter
checklist](https://github.com/3YOURMIND/django-migration-linter/blob/main/docs/incompatibilities.md)
for the common mistakes and best practices


## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated

---------

Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
2023-02-06 16:01:37 +08:00
Vadim Stepanov
070eb6e538
Enable mobile app backend by default on OSS (#1286)
# What this PR does
Enables mobile app backend by default on OSS.

## Checklist
- [x] `CHANGELOG.md` updated
2023-02-03 12:44:22 +00:00
Vadim Stepanov
9b709e86c9
Fix local dev setup slowness (#1270)
# What this PR does
Fixes an issue when a local dev setup becomes extremely slow.

- Set `DEBUG` and `SILK_PROFILER_ENABLED` to `False` by default + add
utility make commands to toggle it
- Use `uwsgi` instead of Django's built-in `runserver` for local dev
setup
- Limit Celery concurrency to 3 for local dev setup (previously was 20,
used >1GB RAM on my machine)

---------

Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
2023-02-02 09:08:48 +00:00
Joey Orlando
94fe7979cf
add django-dbconn-retry library (#1262) 2023-01-31 20:17:54 +01:00
Ildar Iskhakov
4a8011d236
Add silk setting to store .prof files in the specific folder and share it between uwsgi workers (#1228)
# What this PR does

## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-01-26 20:33:04 +08:00
Ildar Iskhakov
a6a781320d
Set SILKY_PYTHON_PROFILER_BINARY setting to False by default (#1218)
# What this PR does
Here is the example of the visualisation with `snakeviz`
<img width="1126" alt="Screenshot 2023-01-25 at 22 15 49"
src="https://user-images.githubusercontent.com/2262529/214586753-ad49a002-27e1-4e44-82f2-4ad5f4e40101.png">


## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-01-25 22:17:17 +08:00
Ildar Iskhakov
0a00d3e2c1
Update base.py 2023-01-20 20:20:51 +08:00
Matias Bordese
693b5a41c4
Add slack command to trigger direct paging (#1154)
Slash command needs to be added to slack app manifest:

```
  slash_commands:
    - command: /escalate
      url: https://<oncall-public-url>/slack/interactive_api_endpoint/
      description: Create a new alert group escalation
      should_escape: false
```
2023-01-20 09:06:27 -03:00
Ildar Iskhakov
aec54707ec
Add pyroscope integration (#1176)
# What this PR does

## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-01-20 18:47:16 +08:00
Ildar Iskhakov
c06709fdb6
Move silk profiler under env variable setting (#1175)
# What this PR does

This PR moves silk profiler under the settings flag which can be
configured with env vars. It will allow us to enable silk on the
clusters, e.g. dev

## Which issue(s) this PR fixes

## Checklist

- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
2023-01-20 18:19:31 +08:00
Joey Orlando
98241b9a10
fake-data generation script + fixes for django-silk and django-debug-toolbar (#1128)
# What this PR does

## Main stuff

- add Python script to populate local Grafana/OnCall setup w/ large
amounts of fake data. Right now the data types that can be generated
are:
- teams and Admin users via the Grafana API (must be synced manually by
going into the UI before going onto the next step)
- Calendar Schedules which have three 8h oncall-shifts, via the OnCall
public API
- fixes `django-debug-toolbar` when being run in `docker-compose`
locally

## Other stuff
- documents how to easily modify the Grafana `docker-compose` container
provisioning configuration
- document solutions for two backend setup related issues encountered
when running the engine/celery workers locally, outside of
`docker-compose`, on an Apple silicon Mac
- fixes small bug in `grafana_plugin.helpers.client.APIClient.call_api`
where it would call `response.json()` for all requests, regardless of
whether or not the response actually contained data or not
- in `engine/settings/dev.py`, properly setup `django-silk` and document
the steps to use it locally
- make it possible to log out debug SQL queries by specifying
`DEV_DEBUG_VIEW_SQL_QUERIES` env var, rather than having to uncomment
out a section of `settings/dev.py`

## Which issue(s) this PR fixes

- Some local setup issues when trying to use `django-silk` and
`django-debug-toolbar`
- Makes it much easier to populate your local setup with a lot of fake
data
- Makes it possible to easily modify your local grafana's provisioning
configuration

## Checklist

- [ ] Tests updated (N/A)
- [ ] Documentation added (N/A)
- [ ] `CHANGELOG.md` updated (N/A)
2023-01-20 09:19:41 +01:00
Matias Bordese
90def88752
Add escalation chain option when creating a direct page alert group (#1143)
Also changes the default integration used when creating an alert group
for a direct page to a custom manual integration to avoid
conflicts/unexpected behaviors with existing manual alerts.
2023-01-18 12:58:26 -03:00
Vadim Stepanov
59f2c293e7
Move FCM relay logic into a celery task (#1137) 2023-01-13 19:28:34 +00:00
Vadim Stepanov
a1e4f72280
remove send_link_to_channel_message_or_fallback_to_full_incident 2023-01-06 11:34:11 +00:00
Innokentii Konstantinov
8abbcee050
Org soft-delete (#1073)
# What this PR does
It introduces soft-delete of organization, since grafana stacks are
soft-deleted too. Also, we had a problem with deleting orgs with large
amounts of alerts, so soft-deletion will fix this problem. I think, that
problem of cleaning alerts of deleted orgs should be solved as a part of
alert retention
2023-01-05 12:42:55 +08:00
Ildar Iskhakov
2b0e4e1d14 Merge branch 'dev' into iskhakov/add-tracing 2023-01-04 10:46:49 +08:00
Joey Orlando
d1a43bdf1b
specify Firebase GCP project id (#1042)
Modifies the Firebase app initialization to explicitly specify the GCP
project ID where the Firebase app is. Previously it would use the
project associated with the service account being used.
2022-12-22 21:44:53 +01:00
Joey Orlando
7ebc9cbbf7
modify push notification settings + use fcm-django library (#998)
- swaps out `django-push-notifications` for
[`fcm-django`](https://github.com/grafana/fcm-django). Again.. this is a
fork of the parent repo for exactly the same reason.. the migrations
point to `auth_user` without letting us use our own user model, this has
been patched in the `grafana` fork. The reason why we are using
`fcm-django` vs `django-push-notifications` is that the latter does not
support the new FCM API, only the "legacy" API. The legacy FCM API does
not support certain push notification settings that we would like to
use.
- modifies the iOS/Android specific push notification settings
- adds a `flower` pod in the `docker-compose-developer.yml`, useful for
debugging tasks locally
- sets the mobile app verification token TTL to 5 minutes when
developing locally. The default of 1 minute makes working with device
emulators really tricky..

This PR also swaps out the base image in `engine/Dockerfile` from
`python:3.9-alpine3.16` to `python:3.9-slim-buster`.

As to why.. in short, with the introduction of the `fcm-django` library
there is now a peer-dependency on
[`grpcio`](https://github.com/grpc/grpc) (which is used by
`firebase_admin`.. which I am using in this PR to interact directly with
Firebase Cloud Messaging (FCM)). `grpcio` does not publish wheels (read:
compiled binaries) for the Alpine distro. It does publish wheels for
Debian and hence `pip install -r requirements.txt` does not need to
build this library from the source distribution.

This is a [known
"issue"](https://github.com/grpc/grpc/issues/22815#issuecomment-1107874367)
and the recommended solution in the community is to.. not use alpine.

These were the numbers, when building the image locally, in terms of
image size and build time:

| | Local image size (uncompressed | Build time (may differ based on
your network speed) |
| ------------------------- | -------------------------------------- |
---------- |
| `python:3.9-alpine3.16`   | 785MB  | 320s |
| `python:3.9-slim-buster` | 1.05GB  | 90s   |

Co-authored-by: Salvatore Giordano <salvatoregiordanoo@gmail.com>
2022-12-20 12:41:34 +01:00
Ildar Iskhakov
fa3413d2a9 Add tracing support 2022-12-19 17:15:06 +08:00
Joey Orlando
5967d5af63
remove apns + fix django-push-notifications migrations (#984)
- removes APNS support
- changes the `django-push-notification` library from the `iskhakov`
fork to the [`grafana`
fork](https://github.com/grafana/django-push-notifications). This new
fork basically just patches an issue which affected the database
migrations of this django app (previously the library would not respect
the `USER_MODEL` setting when creating its tables and would instead
reference the `auth_user` table.. which we don't want)
- add `--no-cache` flag to the `make build` command

**NOTE**
A migration should be applied as follows:
```bash
# remove the four push_notifications tables, which have improper foreign key references
python manage.py migrate push_notifications zero

# recreate the tables with the proper foreign key references
python manage.py migrate
```
2022-12-13 13:00:59 +01:00
Vadim Stepanov
1878b7e596
Mobile app FCM support (#923)
* Add ability to configure FCM_API_KEY and FCM_POST_URL

* Delete APNSDevice and GCMDevice instances when unlinking the mobile app backend

* Add a simple FCM relay endpoint

* GCM -> FCM

* comment
2022-12-01 15:17:01 +00:00
Ildar Iskhakov
132cf1da7f
Add celery profiling (#913) 2022-11-29 16:20:41 +08:00
Michael Derynck
3582f9b08f
Improve Jinja Template feedback and error handling (#884)
* Improve feedback so template errors are given to user

* Add security error logging

* Add limits for templates, payloads, results

* Show popup error notification for webhook errors and template errors that don't have a result

* Update tests

* Split exceptions into warnings/errors to give more control when previewing, rendering, saving templates

* Limit title lengths

* Make TypeError a warning

* Adjust title length limit

* Remove length limiting on urlize since it is being done on template render

* Fix tests

* Add KeyError and ValueError to warnings

* No longer enforcing json result when saving webhook in case it is dependent on payload

* Add tests for expected exceptions coming from apply_jinja_template

* Update changelog

* Send raw post if template result is not JSON
2022-11-28 09:46:51 -07:00
Vadim Stepanov
255964ceaf
Mobile app messaging backend (#874)
* move mobile notifications to a separate backend, remove critical notification

* remove outdated mobile app code

* MOBILE_APP_PUSH_NOTIFICATIONS_ENABLED -> FEATURE_MOBILE_APP_INTEGRATION_ENABLED

* create error log if no devices are set up

* move mobile auth related code to the mobile_app Django app

* move mobile auth related code to the mobile_app Django app

* move mobile auth related code to the mobile_app Django app

* fix typing

* add GCMDevice todos

* add user connection capabilities

* add user connect/disconnect to the messaging backend

* move APNS endpoint to mobile_app Django app

* restore critical notifications

* support hackathon app

* tweak migrations so mobile app auth tokens are preserved

* reuse notify_by IDs

* use mobile app template to render push notification

* add GCM/FCM (Android) support

* fix unlink user

* logger.error -> logger.info
2022-11-23 15:56:43 +00:00
Yulya Artyukhina
381520ee13
Get rid of installation token + add a bunch of tests (#624)
* Get rid of installation token (for OSS installations)

This is done by being required to supply the grafana API URL as an
environment variable on the backend. Additionally, optionally an OnCall
API URL environment variable can be passed in to the frontend (this basically
allows completely skipping the need to configure anything).
- deduplicated a lot of the sync logic on the frontend + made
error message more useful and consistent
- Split PluginConfigPage component into several subcomponents
(making it easier to test each individual component)
- Moved RootWithLoader (from plugin/GrafanaPluginRootPage) into its own
subcomponent (making it easier to test)
- Added tests for pre-existing components that were touched:
  - PluginConfigPage component (and its new subcomponents)
  - state/plugin and state/rootBaseStore functions
  - apps.grafana_plugin django app

Helm changes:
- add GRAFANA_API_URL to oncall.env
- some yaml autoformatting changes
- remove reference to python manage.py issue_invite_for_the_frontend --override

Co-authored-by: Joey Orlando <joseph.t.orlando@gmail.com>
2022-11-21 16:26:00 +01:00
Joey Orlando
fd4877408a
remove grafana_plugin_management django app (#812)
* remove grafana_plugin_management django app

it seems to be no longer used or referenced. In addition apps.api.serializers.organization.PluginOrganizationSerializer was only
referenced from within grafana_plugin_management and is thereby safe
to remove.
2022-11-09 13:53:59 +01:00
Michael Derynck
fc78dd98da
Merge pull request #707 from grafana/add-region-to-organization
Add region info to organizations
2022-11-08 10:30:53 -07:00
Michael Derynck
f01d754851 Merge dev 2022-11-08 10:14:35 -07:00
Innokentii Konstantinov
9c550af721
Support of oncall-gw (#741)
* Draft support of oncall-gw

* Clean up

* Create oncall connector on org create in gcom

* Naming fixes

* Rework oncall-gateway package. \nMove it from apps.

* Fix typo
2022-11-08 14:43:22 +08:00
Joey Orlando
78d01df864
One startup command to rule them all (#760)
* Modify `docker-compose-developer` configuration files, and `Makefile`
to support running everything in containers for local development

- Make use of the COMPOSE_PROFILES env var that is supported by
docker-compose to allow swapping-out/turning off certain docker-compose
services.
- add makefile cleanup command. Will remove all docker resources related
to running the project locally
- The "restart grafana container" issue, where users would need
to restart their grafana container when setting up the project for the
first time, is now fixed (make command now runs yarn build:dev before docker-compose startup;
this ensures grafana-plugin/dist is available for grafana container before it starts up)
- The DEVELOPER.md has been updated as well to reflect these new changes. It
has been moved to ./dev/README.md (and references to the old file have
been updated).
- The redis image that is referenced in the docker-compose files
has been pinned to v7.0.5 (latest version as of this commit) to avoid
any surprises w/ future releases.
- remove root .dockerignore in favour of individual .dockerignore files
in ./engine and ./grafana-plugin
2022-11-07 16:34:43 +01:00
Vadim Stepanov
b7043277a4
Specify queue for apps.email.tasks.notify_user_async 2022-11-04 14:20:26 +00:00
Michael Derynck
81702ba52d Merge dev 2022-11-03 12:42:36 -06:00
Michael Derynck
369a23551b Merge branch 'dev' into add-region-to-organization 2022-11-03 12:41:07 -06:00
Vadim Stepanov
e2456315af
Allow no-auth SMTP connection for email notifications (#759)
* DEFAULT_FROM_EMAIL -> EMAIL_FROM

* add EMAIL_FROM live setting

* EMAIL_FROM -> EMAIL_FROM_ADDRESS

* merge dev

* add test for get_from_email

* allow live settings to be null on internal API

* remove redundant tests
2022-11-03 16:18:37 +00:00
Matvey Kukuy
85d94d342e
Readme about how to add Integrations and Zabbix Integration (#653)
* Readme and zabbix

* Typos

* Linking
2022-11-02 22:58:47 +08:00
Michael Derynck
2905423bad Merge dev 2022-11-01 17:56:03 -06:00
Vadim Stepanov
035584f17e
Enable email backend by default (#740)
* enable email backend by default, catch empty EMAIL_HOST

* fix tests

* fix tests

* fix tests
2022-11-01 13:48:36 +00:00
Michael Derynck
9d7c370752 Merge branch 'dev' into add-region-to-organization 2022-10-27 15:41:35 -06:00
Joey Orlando
7974f32239
Revert "Revert "Remove migration-tool Django app and UI page (#708)"" 2022-10-27 15:42:21 +02:00