Occasionally, the Playwright global setup step (which authenticates w/
the Grafana API + configures the plugin) would fail, leading to the CI
job to instantly fail (playwright doesn't retry global setup if it
fails).
My current hypothesis as to why this is happening is because the
`oncall-engine` and `oncall-celery` pods aren't _actually_ ready in
these cases based on the way the `jupyterhub/action-k8s-await-workloads`
action await k8s workloads:
<img width="1076" alt="Screenshot 2023-05-23 at 18 24 36"
src="https://github.com/grafana/oncall/assets/9406895/68d8d2d9-4274-4749-8788-e0a9a3dbad83">
By using the `kubectl rollout status deployment/<deployment-name>
--timeout=300s` instead, we can be sure that these pods are _actually_
ready to receive traffic before we start the tests.
```bash
❯ kubectl rollout status --help
Show the status of the rollout.
By default 'rollout status' will watch the status of the latest rollout until it's done. If you don't want to wait for
the rollout to finish then you can use --watch=false. Note that if a new rollout starts in-between, then 'rollout
status' will continue watching the latest revision. If you want to pin to a specific revision and abort if it is rolled
over by another revision, use --revision=N where N is the revision you need to watch for.
```
Lastly, even despite this, sometimes the `POST
/api/internal/v1/plugin/sync` endpoint will return HTTP 500 ([example
logs](https://github.com/grafana/oncall/actions/runs/5062712137/jobs/9088529416#step:19:2536)
from failed CI job). In this case, let's setup the Playwright global
setup to retry 3 times.
#1692 is still open. This PR is not an ideal approach, but it's a quick
win while we wait for that issue to be resolved.
By retrying failing tests up to 3 times, we _should_ be fine to
re-enable these on CI. If a test is failing > 3 times, there's likely a
legitimate issue occuring.
# What this PR does
- Improvement to the local development environment for the grafana
plugin
- Run initial yarn build inside the docker container with the same
version that is later used for periodic rebuilds
- Removes the requirement for having yarn/nodejs installed locally
- Using a named volume for storing the node_modules, so they are only
stored once
- Remove the yarn install step from the Dockerfile
- Ideally we store the node_modules only once inside the named volumes.
Currently they are stored times
- on the host system outside of dockerin grafana-plugins/node_modules
- inside the docker image
- inside the anonymous docker volume created at the start of a container
- update `node` to 18.16.0 (14.17.0 has reached end-of-life as of 3
weeks ago)
## Which issue(s) this PR fixes
## Checklist
- [X] ~Unit, integration, and e2e (if applicable) tests updated~ N/A
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
---------
Co-authored-by: Joey Orlando <joseph.t.orlando@gmail.com>
# What this PR does
## Which issue(s) this PR fixes
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Upgrades the backend to Python 3.11.3 (latest stable release) + update
linting step on Drone builds to run **all** the linting steps, not just
the Python ones.
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated (N/A)
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required) (N/A)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Replaces `snyk test` with `snyk monitor` so results get pushed to out
Snyk platform and the [Snyk
Dashboards](https://ops.grafana-ops.net/d/H0w7l5NVk/snyk-overview?orgId=1)
gets updated.
## Which issue(s) this PR fixes
## Checklist
- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
This PR cuts GitHub Action build times from 14-15 minutes, down to just
under 7 minutes. It does this by:
- caching `grafana-plugins/node_modules` and `pip` dependencies based on
their respective dependency files (eg. `requirements.txt` &
`yarn.lock`). This step alone saves ~3 minutes.
- get rid of the "build-engine-docker-image" and
"backend-integration-tests" jobs in the old "Integration Tests"
workflow. This was split out this way so that we could build the backend
docker image once, upload the artifact, and then reuse it across the
backend and e2e tests. We no longer need these backend integration tests
because we are testing the same thing in the e2e tests. This saves ~45
seconds of having to upload the image artifact.
- few improvements within the integration tests themselves:
- move plugin configuration to the `globalSetup.ts`. This means that
every test does not need to check if the plugin has been configured
because it is done once before all the tests are run.
- cache the plugin frontend build. If your commit doesn't change
anything to `grafana-plugin/src` or `grafana-plugin/yarn.lock` it should
be safe to reuse a previously built/cached version of the plugin
frontend. This saves ~3 minutes
- cache playwright binaries/dependencies. Only re-install them if the
version of `@playwright/test` in `grafana-plugin/yarn.lock` changes.
This saves ~3 minutes.
**Other things to mention**
Once we refactor the `GSelect` component to not call the `onChange`
callback on every keyDown event (#1628), this should allow us to
parallelize the integration tests, and cut the time required to execute
the tests themselves in half
Related to #828
- Enable web UI for API/Terraform schedules to add overrides
- Refactor backend to add a flag toggling between web-based and
iCal-based overrides (these options are mutually exclusive)
Also updated read-only tooltips (related to #1483)
**What this PR does**:
Adds our first UI integration test using
[Playwright](https://playwright.dev/) and runs the test on CI. Right now
the test:
- logs into Grafana
- configures the plugin (if it isn't already)
- creates an OnCall schedule, where the current user will be OnCall
- creates an escalation chain to notify based on the newly created
OnCall schedule
- creates a webhook integration, attached to the created escalation
chain
- sends a demo alert for the new integration
- goes to the alert groups page and validates that the escalation step
to alert the OnCall user actually happened
Currently the Playwright tests are run against the 3 default headless
browsers, chromium, Firefox, and webkit. The CI job that runs these
tests is run as a matrix against 3 tagged versions of `grafana`; `main`,
`latest`, and `9.2.6`.
Secondly, it adds most of the logic for a second test which:
- logs into Grafana
- configures the plugin (if it isn't already)
- goes to the user's settings, verifies their phone number (using a tool
called [MailSlurp](https://www.mailslurp.com/))
- configures the current user's default escalation policy to send alerts
via SMS
- creates an escalation policy and configures it to send alerts to our
current user
- creates an integration and assigns the created escalation policy
- triggers a test alert + verifies that we receive the SMS alert text
(again, using MailSlurp)
**Which issue(s) this PR fixes**:
Closes#873
**Checklist**
- [x] Tests updated
- [ ] Documentation added (N/A)
- [ ] `CHANGELOG.md` updated (N/A)
The new tokens are managed centrally and have a longer expiry.
Administrators of the grafanabot account will be
notified of the pending expiry and the secret can be rotated centrally
without the need for a repository administrator to update their secrets.
The existing repository secrets can safely be removed. The tokens for
those secrets will be removed by the end of this week.
Signed-off-by: Jack Baldry <jack.baldry@grafana.com>
Signed-off-by: Jack Baldry <jack.baldry@grafana.com>
Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
# What this PR does
no changelog -> pr:no changelog
no public docs -> pr:no public docs
## Which issue(s) this PR fixes
## Checklist
- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
# What this PR does
This PR adds
[django-migration-linter](https://github.com/3YOURMIND/django-migration-linter)
to keep database migrations
backwards compatible
- we can automatically run migrations and they are zero-downtime, e.g.
old code can work with the migrated database
- we can run and rollback migrations without worrying about data safety
- OnCall is deployed to the multiple environments core team is not able
to control
See [django-migration-linter
checklist](https://github.com/3YOURMIND/django-migration-linter/blob/main/docs/incompatibilities.md)
for the common mistakes and best practices
## Which issue(s) this PR fixes
## Checklist
- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
---------
Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
The new token is set at an organization level so it does not require
repository administrators to rotate the token. It also has the minimal
classic PAT permissions to facilitate the workflow.
It has expiry but that expiry is reported via email to the engineering
organization and the IT Helpdesk have permissions to regenerate the
token when expiration is imminent.
Signed-off-by: Jack Baldry <jack.baldry@grafana.com>
Signed-off-by: Jack Baldry <jack.baldry@grafana.com>
Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
This PR adds a new GitHub Action which will run on PRs against `dev` and
`main`. The GitHub action will not run if the label of "no public docs"
has been applied to the PR in question:
Otherwise, it will check to see if any changes were made to either the
`engine` or `grafana-plugin` directories. If so, it will then check
whether changes were also made to the `docs` directory. If not, it will
fail the job and block the build.
# What this PR does
Add a GitHub Action to check that the `CHANGELOG.md` has been updated.
If no `CHANGELOG.md` change is required, simply add the "no changelog"
label to your PR, which will effectively skip this check.
# What this PR does
Add github action workflow to automatically bump oncall helm chart version on each release and create a PR with this change
## Which issue(s) this PR fixes
## Checklist
- [ ] Tests updated
- [ ] Documentation added
- [ ] `CHANGELOG.md` updated
Secrets have already been created by @mderynck
Signed-off-by: Jack Baldry <jack.baldry@grafana.com>
Signed-off-by: Jack Baldry <jack.baldry@grafana.com>
- swaps out `django-push-notifications` for
[`fcm-django`](https://github.com/grafana/fcm-django). Again.. this is a
fork of the parent repo for exactly the same reason.. the migrations
point to `auth_user` without letting us use our own user model, this has
been patched in the `grafana` fork. The reason why we are using
`fcm-django` vs `django-push-notifications` is that the latter does not
support the new FCM API, only the "legacy" API. The legacy FCM API does
not support certain push notification settings that we would like to
use.
- modifies the iOS/Android specific push notification settings
- adds a `flower` pod in the `docker-compose-developer.yml`, useful for
debugging tasks locally
- sets the mobile app verification token TTL to 5 minutes when
developing locally. The default of 1 minute makes working with device
emulators really tricky..
This PR also swaps out the base image in `engine/Dockerfile` from
`python:3.9-alpine3.16` to `python:3.9-slim-buster`.
As to why.. in short, with the introduction of the `fcm-django` library
there is now a peer-dependency on
[`grpcio`](https://github.com/grpc/grpc) (which is used by
`firebase_admin`.. which I am using in this PR to interact directly with
Firebase Cloud Messaging (FCM)). `grpcio` does not publish wheels (read:
compiled binaries) for the Alpine distro. It does publish wheels for
Debian and hence `pip install -r requirements.txt` does not need to
build this library from the source distribution.
This is a [known
"issue"](https://github.com/grpc/grpc/issues/22815#issuecomment-1107874367)
and the recommended solution in the community is to.. not use alpine.
These were the numbers, when building the image locally, in terms of
image size and build time:
| | Local image size (uncompressed | Build time (may differ based on
your network speed) |
| ------------------------- | -------------------------------------- |
---------- |
| `python:3.9-alpine3.16` | 785MB | 320s |
| `python:3.9-slim-buster` | 1.05GB | 90s |
Co-authored-by: Salvatore Giordano <salvatoregiordanoo@gmail.com>
* Modify plugin.json to support RBAC role registration
* defines 26 new custom roles in plugin.json. The main roles are:
- Admin: read/write access to everything in OnCall
- Reader: read access to everything in OnCall
- OnCaller : read access to everything in OnCall + edit access to Alert Groups and Schedules
- <object-type> Editor: read/write access to everything related to <object-type>
- <object-type> Reader: read access for <object-type>
- User Settings Admin: read/write access to all user's settings, not just own settings. This is in comparison to User Settings Editor which can only read/write own settings
* update changelog and documentation (#686)
* implement RBAC for OnCall backend
This commit refactors backend authorization. It trys to use RBAC authorization if the org's grafana instance supports it, otherwise it falls back to basic role authorization.
* update RBAC backend tests
* add tests for RBAC changes
- run backend tests as matrix where RBAC is enabled/disabled. When RBAC is enabled, the permissions granted are read from the role grants in the frontend's plugin.json file (instead of relying what we specify in RBACPermission.Permissions)
- remove --reuse-db --nomigrations flags from engine/tox.ini
- minor autoformatting changes to docker-compose-developer.yml
* remove --ds=settings.ci-test from pytest CI command
DJANGO_SETTINGS_MODULE is already specified as an env var so this is just unecessary duplication
* update gitignore
* update github action job name for "test"
* RBAC frontend changes
* refactors the use of basic roles (ex. Viewer, Editor, Admin) use RBAC permissions (when supported), or falling back to basic roles when RBAC is not supported.
- updates the UserAction enum in grafana-plugin/src/state/userAction.ts. Previously this was hardcoded to a list of strings that were being returned by the OnCall API. Now the values here correspond to the permissions in plugin.json (plus a fallback role)
* changes per Gabriel's comments:
- get rid of group attribute in rbac roles
- remove displayName role attribute
- remove hidden role attribute
- add back role to includes section
* don't try to update user timezone if they don't have permission
locally, docker build works as expected. When not specifying a build target, it builds the last target specified in the Dockerfile (in this case "prod"). On GitHub actions this works properly as well.
However, there seems to be something about the version of docker used on Drone that causes it to build all of the stages (and hence failing on enterprise-dev).
Let's instead just be explicit about which build target to use for both drone and GitHub actions.
* add libs for celery + redis
* move redis & cache config to settings/base.py
* move rmq & celery config to settings/base.py
* BROKER -> BROKER_TYPE
* allow multiple database types
* flake8
* add sqlite db creation to dockerfile
* fix ci
* fix ci
* debug
* remove some defaults
* remove prints
* use local memory as cache on ci
* debug
* add DATABASE_DEFAULTS
* add ci test for sqlite + redis
* add ci test for sqlite + redis
* add ci test for sqlite + redis
* debug
* add redis healthcheck
* fix sqlite
* fix dev settings
* refactor dev settings
* tweak ci settings
* clear cache properly between tests
* move db and broker types to constants
* add librabbitmq deps
* use amqp instead of librabbitmq
This is the same workflow that gates the publishing of docs to the
website in the publish-technical-documentation-*.yml workflows.
Signed-off-by: Jack Baldry <jack.baldry@grafana.com>
* Log (failed) attempt to notify a user with viewer role
* Remove old publishing workflow
Signed-off-by: Jack Baldry <jack.baldry@grafana.com>
* Add publishing workflows for next (unreleased) and released documentation
Notable features:
- Merges are blocked by strict Hugo reference checking. However, this
only works for references that resolve within the repository. Once you
have Hugo references to website pages beyond this repository, you will
want to remove this test job.
- Pushes to main are automatically published to "next" documentation
consistent with our other OSS projects.
- Pushes of release tags publish to a versioned directory in the
website. The website uses `v<MAJOR>.<MINOR>.x` versioning and the
"Determine technical documentation version" step will make sure that a
tag such as `v0.20.7` is mapped to `v0.20.x`.
- Pushes to release branches will only be published if there is an
existing corresponding release tag. For example, pushing to a new
release branch `release-0.1000` will not trigger a publish of
documentation until there is a `v0.1000.0` release tag.
> **Note:** I have used a release branch naming convention
`release-<MAJOR>-<MINOR>` which is consistent with grafana/mimir but I
see that in the old amixr repository there are long lived release
branches for patch versions. If that is required. I can update this PR
to support that but I would recommend not including patch versions in
release branch naming unless you have a good reason to do so.
Signed-off-by: Jack Baldry <jack.baldry@grafana.com>
* Add helm chaart installation
* s/mimir/oncall/
Signed-off-by: Jack Baldry <jack.baldry@grafana.com>
* Remove https:// prefix from BASE_URL docker env var
* Fix cloud heartbeat name
* Polishing telegram
* Update docker-compose.yml
* Update plugin README (#48)
* Update README and screenshot, remove plop for build info since version is now displayed prominently
* Sign build
Co-authored-by: Michael Derynck <michael.derynck@grafana.com>
* Build actions (#38)
* Drone, github action changes
* Minor version updates
* Update frontend dependencies
* Re-enable unit test
Co-authored-by: Michael Derynck <michael.derynck@grafana.com>
* Revert stylelint version (#52)
* Revert stylelint version
* Build plugin as well as lint
* Build in previous step
Co-authored-by: Michael Derynck <michael.derynck@grafana.com>
* Update screenshot (#53)
Co-authored-by: Michael Derynck <michael.derynck@grafana.com>
* oncall images for docs (#55)
* Fix chart
* Finalise helm chart
* Update README.md
* Top menu fix
* Fix db encoding
* Add api key docs
* Reverting utf8 fix
* bug fixes
* fix for link for OSS version
* Fixing utf8 and docker compose
* 8080 -> 8000 port for consistency
* Improve the helm chart
* makeReq
* Fixing images
* Fixing port
* Fixing port
* Fixing port
* Fixing port
* Fixing port
* Fixing port
* Fixing port
* Add last moment improvements
* Fixing port
* Replace symlink with file for CHANGELOG.MD (#68)
Co-authored-by: Michael Derynck <michael.derynck@grafana.com>
* Edit Chart.yaml
* Edit version
* Edit README.md
* Fixing port
* Update README.md
* Fix linting
* image: grafana/oncall
* Merge dev to main (#71)
* Merge dev to main (#54)
* Log (failed) attempt to notify a user with viewer role
* Remove https:// prefix from BASE_URL docker env var
* Fix cloud heartbeat name
* Polishing telegram
* Update docker-compose.yml
* Update plugin README (#48)
* Update README and screenshot, remove plop for build info since version is now displayed prominently
* Sign build
Co-authored-by: Michael Derynck <michael.derynck@grafana.com>
* Build actions (#38)
* Drone, github action changes
* Minor version updates
* Update frontend dependencies
* Re-enable unit test
Co-authored-by: Michael Derynck <michael.derynck@grafana.com>
* Revert stylelint version (#52)
* Revert stylelint version
* Build plugin as well as lint
* Build in previous step
Co-authored-by: Michael Derynck <michael.derynck@grafana.com>
* Update screenshot (#53)
Co-authored-by: Michael Derynck <michael.derynck@grafana.com>
Co-authored-by: Matias Bordese <mbordese@gmail.com>
Co-authored-by: Matvey Kukuy <Matvey-Kuk@users.noreply.github.com>
Co-authored-by: Innokentii Konstantinov <innokenty.konstantinov@grafana.com>
Co-authored-by: Matvey Kukuy <matvey@amixr.io>
Co-authored-by: Michael Derynck <michael.derynck@grafana.com>
* Merge dev to main (#69)
* Log (failed) attempt to notify a user with viewer role
* Remove https:// prefix from BASE_URL docker env var
* Fix cloud heartbeat name
* Polishing telegram
* Update docker-compose.yml
* Update plugin README (#48)
* Update README and screenshot, remove plop for build info since version is now displayed prominently
* Sign build
Co-authored-by: Michael Derynck <michael.derynck@grafana.com>
* Build actions (#38)
* Drone, github action changes
* Minor version updates
* Update frontend dependencies
* Re-enable unit test
Co-authored-by: Michael Derynck <michael.derynck@grafana.com>
* Revert stylelint version (#52)
* Revert stylelint version
* Build plugin as well as lint
* Build in previous step
Co-authored-by: Michael Derynck <michael.derynck@grafana.com>
* Update screenshot (#53)
Co-authored-by: Michael Derynck <michael.derynck@grafana.com>
* oncall images for docs (#55)
* Update README.md
* Top menu fix
* Fix db encoding
* Add api key docs
* Reverting utf8 fix
* bug fixes
* fix for link for OSS version
* Fixing utf8 and docker compose
* 8080 -> 8000 port for consistency
* makeReq
* Fixing images
* Fixing port
* Fixing port
* Fixing port
* Fixing port
* Fixing port
* Fixing port
* Fixing port
* Fixing port
* Replace symlink with file for CHANGELOG.MD (#68)
Co-authored-by: Michael Derynck <michael.derynck@grafana.com>
Co-authored-by: Matias Bordese <mbordese@gmail.com>
Co-authored-by: Matvey Kukuy <Matvey-Kuk@users.noreply.github.com>
Co-authored-by: Innokentii Konstantinov <innokenty.konstantinov@grafana.com>
Co-authored-by: Matvey Kukuy <matvey@amixr.io>
Co-authored-by: Michael Derynck <michael.derynck@grafana.com>
Co-authored-by: Alyssa Wada <101596687+alyssawada@users.noreply.github.com>
Co-authored-by: Yulia Shanyrova <yulia.shanyrova@grafana.com>
Co-authored-by: Ildar Iskhakov <Ildar.iskhakov@grafana.com>
Co-authored-by: Innokentii Konstantinov <innokenty.konstantinov@grafana.com>
Co-authored-by: Matias Bordese <mbordese@gmail.com>
Co-authored-by: Matvey Kukuy <Matvey-Kuk@users.noreply.github.com>
Co-authored-by: Matvey Kukuy <matvey@amixr.io>
Co-authored-by: Alyssa Wada <101596687+alyssawada@users.noreply.github.com>
Co-authored-by: Yulia Shanyrova <yulia.shanyrova@grafana.com>
Co-authored-by: Matias Bordese <mbordese@gmail.com>
Co-authored-by: Jack Baldry <jack.baldry@grafana.com>
Co-authored-by: Ildar Iskhakov <ildar.iskhakov@grafana.com>
Co-authored-by: Matvey Kukuy <Matvey-Kuk@users.noreply.github.com>
Co-authored-by: Innokentii Konstantinov <innokenty.konstantinov@grafana.com>
Co-authored-by: Matvey Kukuy <matvey@amixr.io>
Co-authored-by: Alyssa Wada <101596687+alyssawada@users.noreply.github.com>
Co-authored-by: Yulia Shanyrova <yulia.shanyrova@grafana.com>
* Log (failed) attempt to notify a user with viewer role
* Remove https:// prefix from BASE_URL docker env var
* Fix cloud heartbeat name
* Polishing telegram
* Update docker-compose.yml
* Update plugin README (#48)
* Update README and screenshot, remove plop for build info since version is now displayed prominently
* Sign build
Co-authored-by: Michael Derynck <michael.derynck@grafana.com>
* Build actions (#38)
* Drone, github action changes
* Minor version updates
* Update frontend dependencies
* Re-enable unit test
Co-authored-by: Michael Derynck <michael.derynck@grafana.com>
* Revert stylelint version (#52)
* Revert stylelint version
* Build plugin as well as lint
* Build in previous step
Co-authored-by: Michael Derynck <michael.derynck@grafana.com>
* Update screenshot (#53)
Co-authored-by: Michael Derynck <michael.derynck@grafana.com>
Co-authored-by: Matias Bordese <mbordese@gmail.com>
Co-authored-by: Matvey Kukuy <Matvey-Kuk@users.noreply.github.com>
Co-authored-by: Innokentii Konstantinov <innokenty.konstantinov@grafana.com>
Co-authored-by: Matvey Kukuy <matvey@amixr.io>
Co-authored-by: Michael Derynck <michael.derynck@grafana.com>