# What this PR does
RequestBodyReadingMiddleware is excess as [post-buffering is
enabled](https://github.com/grafana/oncall/blob/dev/engine/uwsgi.ini#L17):
If an HTTP request has a body (like a POST request generated by a form),
you have to read (consume) it in your application. If you do not do
this, the communication socket with your webserver may be clobbered. If
you are lazy you can use the post-buffering option that will
automatically read data for you. For Rack applications this is
automatically enabled.
(https://uwsgi-docs.readthedocs.io/en/latest/ThingsToKnow.html)
## Which issue(s) this PR fixes
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
This PR moves phone notification logic into separate object PhoneBackend
and introduces PhoneProvider interface to hide actual implementation of
external phone services provider. It should allow add new phone
providers just by implementing one class (See SimplePhoneProvider for
example).
# Why
[Asterisk PR](https://github.com/grafana/oncall/pull/1282) showed that
our phone notification system is not flexible. However this is one of
the most frequent community questions - how to add "X" phone provider.
Also, this refactoring move us one step closer to unifying all
notification backends, since with PhoneBackend all phone notification
logic is collected in one place and independent from concrete
realisation.
# Highligts
1. PhoneBackend object - contains all phone notifications business
logic.
2. PhoneProvider - interface to external phone services provider.
3. TwilioPhoneProvider and SimplePhoneProvider - two examples of
PhoneProvider implementation.
4. PhoneCallRecord and SMSRecord models. I introduced these models to
keep phone notification limits logic decoupled from external providers.
Existing TwilioPhoneCall and TwilioSMS objects will be migrated to the
new table to not to reset limits counter. To be able to receive status
callbacks and gather from Twilio TwilioPhoneCall and TwilioSMS still
exists, but they are linked to PhoneCallRecord and SMSRecord via fk, to
not to leat twilio logic into core code.
---------
Co-authored-by: Yulia Shanyrova <yulia.shanyrova@grafana.com>
Bring back `FCM_PROJECT_ID` env variable that was removed in
https://github.com/grafana/oncall/pull/1969.
I made an incorrect assumption that project ID is already specified in
the credentials file, but in fact project ID can be different from the
one in credentials file.
# What this PR does
Allow passing Google application credentials (used to send FCM messages
using `fcm-django`) as an environment variable
`GOOGLE_APPLICATION_CREDENTIALS_JSON_BASE64`. If the env variable is not
provided, credentials will be taken from file. This change allows uWSGI
workers send messages to FCM (currently it's not possible because the
uWSGI user doesn't have access to the credentials file) + makes
configuration more consistent.
Also removes a redundant `FCM_PROJECT_ID` env variable (Google
application credentials already contain the project ID).
## Which issue(s) this PR fixes
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
add a new endpoint, `GET /maintenance-mode/`, which returns either a
string message pulled from the
`CURRENTLY_UNDERGOING_MAINTENANCE_MESSAGE` env var, or `None` + update
the UI to conditionally show this message if it is set
<img width="1321" alt="Screenshot 2023-05-10 at 11 28 16"
src="https://github.com/grafana/oncall/assets/9406895/833a77fb-3a90-4f9f-88d6-dae0d98d99d4">
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required) (N/A)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
https://www.loom.com/share/c5deb35309604cfdab6176c44de7b15e
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
- Rename Firing to Alert Group Created to reduce confusion as to why the
event only first once and not when unresolve or unacknowledge returns
the alert group to the firing state.
- Increase password field length
- Do not filter webhook execution by team, team is just for filtering
ownership now
- Do not log webhook triggers in alert group escalation log if the
webhook does not trigger (Status/response will still be stored)
- Fix formatting for response content and data fields on the Status page
- Add a content length limit for responses being stored (50000
characters)
# What this PR does
## Which issue(s) this PR fixes
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
---------
Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
# What this PR does
## Which issue(s) this PR fixes
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
This PR cuts GitHub Action build times from 14-15 minutes, down to just
under 7 minutes. It does this by:
- caching `grafana-plugins/node_modules` and `pip` dependencies based on
their respective dependency files (eg. `requirements.txt` &
`yarn.lock`). This step alone saves ~3 minutes.
- get rid of the "build-engine-docker-image" and
"backend-integration-tests" jobs in the old "Integration Tests"
workflow. This was split out this way so that we could build the backend
docker image once, upload the artifact, and then reuse it across the
backend and e2e tests. We no longer need these backend integration tests
because we are testing the same thing in the e2e tests. This saves ~45
seconds of having to upload the image artifact.
- few improvements within the integration tests themselves:
- move plugin configuration to the `globalSetup.ts`. This means that
every test does not need to check if the plugin has been configured
because it is done once before all the tests are run.
- cache the plugin frontend build. If your commit doesn't change
anything to `grafana-plugin/src` or `grafana-plugin/yarn.lock` it should
be safe to reuse a previously built/cached version of the plugin
frontend. This saves ~3 minutes
- cache playwright binaries/dependencies. Only re-install them if the
version of `@playwright/test` in `grafana-plugin/yarn.lock` changes.
This saves ~3 minutes.
**Other things to mention**
Once we refactor the `GSelect` component to not call the `onChange`
callback on every keyDown event (#1628), this should allow us to
parallelize the integration tests, and cut the time required to execute
the tests themselves in half
# What this PR does
This PR:
- modifies the `check_escalation_finished_task` celery task to:
- do stricter escalation validation based on the alert group's
escalation snapshot (see the `audit_alert_group_escalation` method in
`engine/apps/alerts/tasks/check_escalation_finished.py` for the
validation logic)
- use a read-only database for querying alert-groups if one is
configured, otherwise use the "default" one
- ping a configurable heartbeat (new env var
`ALERT_GROUP_ESCALATION_AUDITOR_CELERY_TASK_HEARTBEAT_URL` added)
- increase the task frequency from every 10 to every 13 minutes (this
can be configured via an env variable)
- adds public documentation on how to configure this auditor task
- modifies the local celery startup command to properly take into
consideration all celery related env vars (similar to the ones we use in
`engine/celery_with_exporter.sh`; this made it easier to enable `celery
beat` locally for testing)
- removes the following code:
- removes references to `AlertGroup.estimate_escalation_finish_time` and
marks the model field as deprecated using the [`django-deprecate-fields`
library](https://pypi.org/project/django-deprecate-fields/). This field
was only used for the previous version of this validation task
- `EscalationSnapshotMixin.calculate_eta_for_finish_escalation` was only
used to calculate the value for
`AlertGroup.estimate_escalation_finish_time`
- `calculate_escalation_finish_time` celery task
## Which issue(s) this PR fixes
https://github.com/grafana/oncall-private/issues/1558
## Checklist
- [x] Tests updated
- [x] Documentation added
- [x] `CHANGELOG.md` updated
This PR add Inbound Email integration.
It designed to support some variety of ESPs, but in prod we will use
Mailgun, so locally I tested it only with mailgun ESP.
**Important:**
To make it work on different clusters I'm planning to provide different
email domains for different regions, like ....@us.oncall.grafana.net,
...@eu.oncall.grafana.net
---------
Co-authored-by: Innokentii Konstantinov <innokenty.konstantinov@grafana.com>
It's a duplicate of LICENSE env var
**What this PR does**:
Remove OSS_INSTALLATION env var in favour of LICENSE env var. Also, I
refactored features tests a little. From my point of view it makes
little sense to test if all features are disabled or enabled. Better to
test specific use-case (e.g. oss installation).
Also to test that all features are disabled it is needed to set LICENSE
equals cloud license, which makes test confusing.
**Checklist**
- [x] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
# What this PR does
- Implement recapthca v3 check. DRF_RECAPTCHA didn't support hostname
validation and it's too complicated to add it.
- Add validation of verification code on oncall side to not to call
twilio with obviously invalid codes
## Checklist
- [x] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
# What this PR does
This PR fixes templates behaviour for public and private api. It fix
"reset to default" for templates from messaging backends and some minor
bugs. Also added acknowledge signal and source link templates
## Checklist
- [x] Tests updated
- [x] Documentation added
- [x] `CHANGELOG.md` updated
# What this PR does
Move reCAPTCHA site key to backend environment for easier management to
support multiple environments.
## Which issue(s) this PR fixes
## Checklist
- [ ] Tests updated
- [ ] Documentation added
- [x] `CHANGELOG.md` updated
# What this PR does
Adds reCAPTCHA validation to the get mobile verification code endpoint
## Which issue(s) this PR fixes
## Checklist
- [x] Tests updated
- [ ] Documentation added (N/A)
- [x] `CHANGELOG.md` updated
---------
Co-authored-by: Maxim <maxim.mordasov@grafana.com>
# What this PR does
This PR adds
[django-migration-linter](https://github.com/3YOURMIND/django-migration-linter)
to keep database migrations
backwards compatible
- we can automatically run migrations and they are zero-downtime, e.g.
old code can work with the migrated database
- we can run and rollback migrations without worrying about data safety
- OnCall is deployed to the multiple environments core team is not able
to control
See [django-migration-linter
checklist](https://github.com/3YOURMIND/django-migration-linter/blob/main/docs/incompatibilities.md)
for the common mistakes and best practices
## Which issue(s) this PR fixes
## Checklist
- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
---------
Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
# What this PR does
Fixes an issue when a local dev setup becomes extremely slow.
- Set `DEBUG` and `SILK_PROFILER_ENABLED` to `False` by default + add
utility make commands to toggle it
- Use `uwsgi` instead of Django's built-in `runserver` for local dev
setup
- Limit Celery concurrency to 3 for local dev setup (previously was 20,
used >1GB RAM on my machine)
---------
Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
Slash command needs to be added to slack app manifest:
```
slash_commands:
- command: /escalate
url: https://<oncall-public-url>/slack/interactive_api_endpoint/
description: Create a new alert group escalation
should_escape: false
```
# What this PR does
This PR moves silk profiler under the settings flag which can be
configured with env vars. It will allow us to enable silk on the
clusters, e.g. dev
## Which issue(s) this PR fixes
## Checklist
- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
# What this PR does
## Main stuff
- add Python script to populate local Grafana/OnCall setup w/ large
amounts of fake data. Right now the data types that can be generated
are:
- teams and Admin users via the Grafana API (must be synced manually by
going into the UI before going onto the next step)
- Calendar Schedules which have three 8h oncall-shifts, via the OnCall
public API
- fixes `django-debug-toolbar` when being run in `docker-compose`
locally
## Other stuff
- documents how to easily modify the Grafana `docker-compose` container
provisioning configuration
- document solutions for two backend setup related issues encountered
when running the engine/celery workers locally, outside of
`docker-compose`, on an Apple silicon Mac
- fixes small bug in `grafana_plugin.helpers.client.APIClient.call_api`
where it would call `response.json()` for all requests, regardless of
whether or not the response actually contained data or not
- in `engine/settings/dev.py`, properly setup `django-silk` and document
the steps to use it locally
- make it possible to log out debug SQL queries by specifying
`DEV_DEBUG_VIEW_SQL_QUERIES` env var, rather than having to uncomment
out a section of `settings/dev.py`
## Which issue(s) this PR fixes
- Some local setup issues when trying to use `django-silk` and
`django-debug-toolbar`
- Makes it much easier to populate your local setup with a lot of fake
data
- Makes it possible to easily modify your local grafana's provisioning
configuration
## Checklist
- [ ] Tests updated (N/A)
- [ ] Documentation added (N/A)
- [ ] `CHANGELOG.md` updated (N/A)
Also changes the default integration used when creating an alert group
for a direct page to a custom manual integration to avoid
conflicts/unexpected behaviors with existing manual alerts.
# What this PR does
It introduces soft-delete of organization, since grafana stacks are
soft-deleted too. Also, we had a problem with deleting orgs with large
amounts of alerts, so soft-deletion will fix this problem. I think, that
problem of cleaning alerts of deleted orgs should be solved as a part of
alert retention
Modifies the Firebase app initialization to explicitly specify the GCP
project ID where the Firebase app is. Previously it would use the
project associated with the service account being used.
- swaps out `django-push-notifications` for
[`fcm-django`](https://github.com/grafana/fcm-django). Again.. this is a
fork of the parent repo for exactly the same reason.. the migrations
point to `auth_user` without letting us use our own user model, this has
been patched in the `grafana` fork. The reason why we are using
`fcm-django` vs `django-push-notifications` is that the latter does not
support the new FCM API, only the "legacy" API. The legacy FCM API does
not support certain push notification settings that we would like to
use.
- modifies the iOS/Android specific push notification settings
- adds a `flower` pod in the `docker-compose-developer.yml`, useful for
debugging tasks locally
- sets the mobile app verification token TTL to 5 minutes when
developing locally. The default of 1 minute makes working with device
emulators really tricky..
This PR also swaps out the base image in `engine/Dockerfile` from
`python:3.9-alpine3.16` to `python:3.9-slim-buster`.
As to why.. in short, with the introduction of the `fcm-django` library
there is now a peer-dependency on
[`grpcio`](https://github.com/grpc/grpc) (which is used by
`firebase_admin`.. which I am using in this PR to interact directly with
Firebase Cloud Messaging (FCM)). `grpcio` does not publish wheels (read:
compiled binaries) for the Alpine distro. It does publish wheels for
Debian and hence `pip install -r requirements.txt` does not need to
build this library from the source distribution.
This is a [known
"issue"](https://github.com/grpc/grpc/issues/22815#issuecomment-1107874367)
and the recommended solution in the community is to.. not use alpine.
These were the numbers, when building the image locally, in terms of
image size and build time:
| | Local image size (uncompressed | Build time (may differ based on
your network speed) |
| ------------------------- | -------------------------------------- |
---------- |
| `python:3.9-alpine3.16` | 785MB | 320s |
| `python:3.9-slim-buster` | 1.05GB | 90s |
Co-authored-by: Salvatore Giordano <salvatoregiordanoo@gmail.com>
- removes APNS support
- changes the `django-push-notification` library from the `iskhakov`
fork to the [`grafana`
fork](https://github.com/grafana/django-push-notifications). This new
fork basically just patches an issue which affected the database
migrations of this django app (previously the library would not respect
the `USER_MODEL` setting when creating its tables and would instead
reference the `auth_user` table.. which we don't want)
- add `--no-cache` flag to the `make build` command
**NOTE**
A migration should be applied as follows:
```bash
# remove the four push_notifications tables, which have improper foreign key references
python manage.py migrate push_notifications zero
# recreate the tables with the proper foreign key references
python manage.py migrate
```