# What this PR does
- introduce e2e tests in Tilt
- support e2e tests commands in Makefile
- stabilize local setup
## Which issue(s) this PR fixes
https://github.com/grafana/oncall/issues/3492
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Adds an ability to extract labels from alert group payload. See
[demo](https://www.loom.com/share/cf2b746eea974547b76f44298e32a54f?sid=67ed1e58-40ed-4136-a201-6482fb7773d3).
## Which issue(s) this PR fixes
https://github.com/grafana/oncall-private/issues/2304
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
---------
Co-authored-by: Maxim Mordasov <maxim.mordasov@grafana.com>
Co-authored-by: Rares Mardare <rares.mardare@grafana.com>
# What this PR does
In PR pipelines install dependencies and run e2e tests only in Chromium.
In daily e2e workflow use Chromium, Firefox and Webkit.
## Which issue(s) this PR fixes
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Stabilize e2e tests by:
- improve usage of locators
- fix unreliable selectors
- prevent parallelism within the same test file
Additionally:
- configure eslint for e2e tests and fix existing errors/warnings
- bump Playwright version to latest stable
## Which issue(s) this PR fixes
https://github.com/grafana/oncall/issues/3217
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
- updates the GitHub Actions workflow to move the e2e tests into a
"[reusable
workflow](https://docs.github.com/en/actions/using-workflows/reusing-workflows#creating-a-reusable-workflow)"
which are run in two scenarios:
- all tests _except_ those annotated as `@expensive` are run against
`grafana/grafana:latest` on all feature branches
- all tests _including_ `@expensive` tests are run on weekdays @ 07h00
UTC, against a matrix of 6 grafana versions. Results of these builds
will be posted to `#irm-amixr-flux` Slack channel.
- local development will now be:
```bash
make build-dev-images init-k8s start-k8s
```
- `build-dev-images` - builds the engine and UI docker images (only need
to run first time)
- `init-k8s` - creates a `kind` cluster and loads the two Docker images
onto the cluster nodes (only need to run first time)
- `start-k8s` - switches `kubectl` context to the created `kind`
cluster, and uses `helm` to deploy everything as defined in
`./dev/helm-local.yml` and `./dev/helm-local.dev.yml` (that latter file
is `.gitignored` and specific to how _you_ want your setup to look like.
Hot reloading works as before. This is the _start_ of #2381. (I've
marked these `make` commands as beta, because they've not yet been
thoroughly tested for local development).
- modifies the `helm` chart to add the concept of `oncall.devMode`,
`ui`, and ability to run oncall w/ sqlite
- `oncall.devMode` will essentially just add `volumes` and
`volumeMounts` to the various engine/migrate containers +
- `ui.enabled` + `ui.env` - create a ui container (which is needed for
hot reloading locally)
- `sqlite` - this was useful for the e2e test environments where Github
runner resources are scarce. Running `mariadb` eats up precious
resources, instead lets just use sqlite here
- fixes an issue that caused sporadic HTTP 502s from the grafana
plugin-proxy, which led to flaky tests. See [this
comment](https://github.com/grafana/oncall/pull/2751/files#diff-09040e8df192699b9c5742110ebbe8d9d5c3938cb156cc1cb99fa1c3fdee4fefR72-R77)
for more context + a link to a relevant Slack conversation. **tldr;**
there is a bug with the Grafana plugin proxy in Grafana >= v10.0.3.
Let's stop using the `latest`/`main` docker tags in our test and pin to
`10.0.2` for now
- ~~re-enables the e2e test which validates a phone number via SMS, and
asserts that we can receive an alert escalation via SMS (new Mailslurp
API Key has been added as a repo secret)~~ update: this is still blocked
by procurement, will be done in a future PR
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
There are the following tests added:
- admin is allowed to edit other profiles
- editor is not allowed to edit other profiles
## Which issue(s) this PR fixes
https://github.com/grafana/oncall/issues/1586
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
---------
Co-authored-by: Rares Mardare <rares.mardare@grafana.com>
# What this PR does
Lays ground work for #1586. Adds three new fixtures, `adminRolePage`,
`editorRolePage`, and `viewerRolePage`. These fixtures can be easily
accessed in a `test` context and allow the test to be run as a user
authenticated with one of these Grafana basic roles.
The bulk of the changes in the PR are to the "global setup" step. There
is a bit of logic + communication with the Grafana instance's API, in
order to create all the necessary authentication credentials.
Lastly, adds the first basic role authorization test, asserting that
Admin/Editors can view the list of OnCall users, whereas Viewers cannot.
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
In the e2e tests, await the grafana instance if it is currently
down/unavailable. This is mostly useful for when the tests are run
against a hosted grafana instance and it is possible that the instance
is paused ([example
build](https://drone.grafana.net/grafana/oncall-private/5735/1/4)).
https://www.loom.com/share/35eb49035d454ac6ba306cddfe63f255
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated (N/A)
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required) (N/A)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required) (N/A)
Occasionally, the Playwright global setup step (which authenticates w/
the Grafana API + configures the plugin) would fail, leading to the CI
job to instantly fail (playwright doesn't retry global setup if it
fails).
My current hypothesis as to why this is happening is because the
`oncall-engine` and `oncall-celery` pods aren't _actually_ ready in
these cases based on the way the `jupyterhub/action-k8s-await-workloads`
action await k8s workloads:
<img width="1076" alt="Screenshot 2023-05-23 at 18 24 36"
src="https://github.com/grafana/oncall/assets/9406895/68d8d2d9-4274-4749-8788-e0a9a3dbad83">
By using the `kubectl rollout status deployment/<deployment-name>
--timeout=300s` instead, we can be sure that these pods are _actually_
ready to receive traffic before we start the tests.
```bash
❯ kubectl rollout status --help
Show the status of the rollout.
By default 'rollout status' will watch the status of the latest rollout until it's done. If you don't want to wait for
the rollout to finish then you can use --watch=false. Note that if a new rollout starts in-between, then 'rollout
status' will continue watching the latest revision. If you want to pin to a specific revision and abort if it is rolled
over by another revision, use --revision=N where N is the revision you need to watch for.
```
Lastly, even despite this, sometimes the `POST
/api/internal/v1/plugin/sync` endpoint will return HTTP 500 ([example
logs](https://github.com/grafana/oncall/actions/runs/5062712137/jobs/9088529416#step:19:2536)
from failed CI job). In this case, let's setup the Playwright global
setup to retry 3 times.
#1692 is still open. This PR is not an ideal approach, but it's a quick
win while we wait for that issue to be resolved.
By retrying failing tests up to 3 times, we _should_ be fine to
re-enable these on CI. If a test is failing > 3 times, there's likely a
legitimate issue occuring.
This PR cuts GitHub Action build times from 14-15 minutes, down to just
under 7 minutes. It does this by:
- caching `grafana-plugins/node_modules` and `pip` dependencies based on
their respective dependency files (eg. `requirements.txt` &
`yarn.lock`). This step alone saves ~3 minutes.
- get rid of the "build-engine-docker-image" and
"backend-integration-tests" jobs in the old "Integration Tests"
workflow. This was split out this way so that we could build the backend
docker image once, upload the artifact, and then reuse it across the
backend and e2e tests. We no longer need these backend integration tests
because we are testing the same thing in the e2e tests. This saves ~45
seconds of having to upload the image artifact.
- few improvements within the integration tests themselves:
- move plugin configuration to the `globalSetup.ts`. This means that
every test does not need to check if the plugin has been configured
because it is done once before all the tests are run.
- cache the plugin frontend build. If your commit doesn't change
anything to `grafana-plugin/src` or `grafana-plugin/yarn.lock` it should
be safe to reuse a previously built/cached version of the plugin
frontend. This saves ~3 minutes
- cache playwright binaries/dependencies. Only re-install them if the
version of `@playwright/test` in `grafana-plugin/yarn.lock` changes.
This saves ~3 minutes.
**Other things to mention**
Once we refactor the `GSelect` component to not call the `onChange`
callback on every keyDown event (#1628), this should allow us to
parallelize the integration tests, and cut the time required to execute
the tests themselves in half
# What this PR does
This PR makes the Grafana login portion of the e2e tests much
faster/more reliable.
Currently we use CSS selectors to go to the login form, input the
username/password, and proceed as such. This PR refactors to instead
make a call to `POST ${GRAFANA_API_URL}/login` and then stores that
authentication state that is then reused by subsequent browsers. This
was inspired by [how the Incident team does their playwright
authentication](https://github.com/grafana/incident/blob/main/plugin/e2e/global-setup.ts)
+ the recommendation from the [Playwright
docs](https://playwright.dev/docs/auth#basic-shared-account-in-all-tests)
## Which issue(s) this PR fixes
Slow/flaky Grafana login flow
## Checklist
- [x] Tests updated
- [ ] Documentation added (N/A)
- [ ] `CHANGELOG.md` updated (N/A)
**What this PR does**:
Adds our first UI integration test using
[Playwright](https://playwright.dev/) and runs the test on CI. Right now
the test:
- logs into Grafana
- configures the plugin (if it isn't already)
- creates an OnCall schedule, where the current user will be OnCall
- creates an escalation chain to notify based on the newly created
OnCall schedule
- creates a webhook integration, attached to the created escalation
chain
- sends a demo alert for the new integration
- goes to the alert groups page and validates that the escalation step
to alert the OnCall user actually happened
Currently the Playwright tests are run against the 3 default headless
browsers, chromium, Firefox, and webkit. The CI job that runs these
tests is run as a matrix against 3 tagged versions of `grafana`; `main`,
`latest`, and `9.2.6`.
Secondly, it adds most of the logic for a second test which:
- logs into Grafana
- configures the plugin (if it isn't already)
- goes to the user's settings, verifies their phone number (using a tool
called [MailSlurp](https://www.mailslurp.com/))
- configures the current user's default escalation policy to send alerts
via SMS
- creates an escalation policy and configures it to send alerts to our
current user
- creates an integration and assigns the created escalation policy
- triggers a test alert + verifies that we receive the SMS alert text
(again, using MailSlurp)
**Which issue(s) this PR fixes**:
Closes#873
**Checklist**
- [x] Tests updated
- [ ] Documentation added (N/A)
- [ ] `CHANGELOG.md` updated (N/A)