centralcloud/oncall-engine

Author	SHA1	Message	Date
Dominik Broj	92fa509d22	Brojd/improve e2e tests dx (#3516 ) # What this PR does - introduce e2e tests in Tilt - support e2e tests commands in Makefile - stabilize local setup ## Which issue(s) this PR fixes https://github.com/grafana/oncall/issues/3492 ## Checklist - [x] Unit, integration, and e2e (if applicable) tests updated - [x] Documentation added (or `pr:no public docs` PR label added if not required) - [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not required)	2023-12-15 08:58:25 +00:00
Vadim Stepanov	9e889403f2	Alert group payload labels (#3434 ) https://github.com/grafana/oncall/pull/3385 + handle null values	2023-11-27 17:53:54 +00:00
Vadim Stepanov	e09422a07d	Revert "Alert group payload labels" (#3433 ) Reverts grafana/oncall#3385	2023-11-27 17:28:34 +00:00
Vadim Stepanov	5fac6aeac5	Alert group payload labels (#3385 ) # What this PR does Adds an ability to extract labels from alert group payload. See [demo](https://www.loom.com/share/cf2b746eea974547b76f44298e32a54f?sid=67ed1e58-40ed-4136-a201-6482fb7773d3). ## Which issue(s) this PR fixes https://github.com/grafana/oncall-private/issues/2304 ## Checklist - [x] Unit, integration, and e2e (if applicable) tests updated - [x] Documentation added (or `pr:no public docs` PR label added if not required) - [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not required) --------- Co-authored-by: Maxim Mordasov <maxim.mordasov@grafana.com> Co-authored-by: Rares Mardare <rares.mardare@grafana.com>	2023-11-27 16:55:31 +00:00
Dominik Broj	8f13e312f7	Use chromium only in PRs e2e tests (#3374 ) # What this PR does In PR pipelines install dependencies and run e2e tests only in Chromium. In daily e2e workflow use Chromium, Firefox and Webkit. ## Which issue(s) this PR fixes ## Checklist - [x] Unit, integration, and e2e (if applicable) tests updated - [x] Documentation added (or `pr:no public docs` PR label added if not required) - [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not required)	2023-11-17 13:22:34 +00:00
Dominik Broj	45ae04088f	stabilize e2e tests (#3349 ) # What this PR does Stabilize e2e tests by: - improve usage of locators - fix unreliable selectors - prevent parallelism within the same test file Additionally: - configure eslint for e2e tests and fix existing errors/warnings - bump Playwright version to latest stable ## Which issue(s) this PR fixes https://github.com/grafana/oncall/issues/3217 ## Checklist - [x] Unit, integration, and e2e (if applicable) tests updated - [x] Documentation added (or `pr:no public docs` PR label added if not required) - [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not required)	2023-11-17 10:07:12 +00:00
Joey Orlando	3783aeab64	fix a few flaky e2e tests + allow running project locally via k8s/helm (#2751 ) # What this PR does - updates the GitHub Actions workflow to move the e2e tests into a "[reusable workflow](https://docs.github.com/en/actions/using-workflows/reusing-workflows#creating-a-reusable-workflow)" which is run in two scenarios: - all tests _except_ those annotated as `@expensive` are run against `grafana/grafana:latest` on all feature branches - all tests _including_ `@expensive` tests are run on weekdays @ 07h00 UTC, against a matrix of 6 grafana versions. Results of these builds will be posted to `#irm-amixr-flux` Slack channel. - local development will now be: ```bash make build-dev-images init-k8s start-k8s ``` - `build-dev-images` - builds the engine and UI docker images (only need to run first time) - `init-k8s` - creates a `kind` cluster and loads the two Docker images onto the cluster nodes (only need to run first time) - `start-k8s` - switches `kubectl` context to the created `kind` cluster, and uses `helm` to deploy everything as defined in `./dev/helm-local.yml` and `./dev/helm-local.dev.yml` (the latter file is `.gitignored` and specific to how _you_ want your setup to look). Hot reloading works as before. This is the _start_ of #2381. (I've marked these `make` commands as beta, because they've not yet been thoroughly tested for local development). - modifies the `helm` chart to add the concept of `oncall.devMode`, `ui`, and ability to run oncall w/ sqlite - `oncall.devMode` will essentially just add `volumes` and `volumeMounts` to the various engine/migrate containers + - `ui.enabled` + `ui.env` - create a ui container (which is needed for hot reloading locally) - `sqlite` - this was useful for the e2e test environments where GitHub runner resources are scarce. Running `mariadb` eats up precious resources, instead let's just use sqlite here - fixes an issue that caused sporadic HTTP 502s from the grafana plugin-proxy, which led to flaky tests. See [this comment](https://github.com/grafana/oncall/pull/2751/files#diff-09040e8df192699b9c5742110ebbe8d9d5c3938cb156cc1cb99fa1c3fdee4fefR72-R77) for more context + a link to a relevant Slack conversation. tldr; there is a bug with the Grafana plugin proxy in Grafana >= v10.0.3. Let's stop using the `latest`/`main` docker tags in our test and pin to `10.0.2` for now - ~~re-enables the e2e test which validates a phone number via SMS, and asserts that we can receive an alert escalation via SMS (new Mailslurp API Key has been added as a repo secret)~~ update: this is still blocked by procurement, will be done in a future PR ## Checklist - [x] Unit, integration, and e2e (if applicable) tests updated - [x] Documentation added (or `pr:no public docs` PR label added if not required) - [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not required)	2023-08-22 19:03:29 +02:00
Maxim Mordasov	36f9851003	add a couple of tests for users screen (#2612 ) # What this PR does There are the following tests added: - admin is allowed to edit other profiles - editor is not allowed to edit other profiles ## Which issue(s) this PR fixes https://github.com/grafana/oncall/issues/1586 ## Checklist - [x] Unit, integration, and e2e (if applicable) tests updated - [ ] Documentation added (or `pr:no public docs` PR label added if not required) - [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not required) --------- Co-authored-by: Rares Mardare <rares.mardare@grafana.com>	2023-08-02 15:42:48 +03:00
Joey Orlando	d24dc4b630	remove organization maintenance mode + fix integration maintenance mode (#2511 )	2023-07-12 16:41:44 -04:00
Maxim Mordasov	b951b6b6bd	add debounce for GSelect and RemoteSelect (#2466 ) # What this PR does Fix performance issue in GSelect component when searching ## Which issue(s) this PR fixes https://github.com/grafana/oncall/issues/1628 ## Checklist - [ ] Unit, integration, and e2e (if applicable) tests updated - [ ] Documentation added (or `pr:no public docs` PR label added if not required) - [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not required) --------- Co-authored-by: Joey Orlando <joseph.t.orlando@gmail.com> Co-authored-by: Joey Orlando <joey.orlando@grafana.com>	2023-07-11 13:51:22 +00:00
Joey Orlando	f5495ed702	add first multi-role e2e tests (#2417 ) # What this PR does Lays ground work for #1586. Adds three new fixtures, `adminRolePage`, `editorRolePage`, and `viewerRolePage`. These fixtures can be easily accessed in a `test` context and allow the test to be run as a user authenticated with one of these Grafana basic roles. The bulk of the changes in the PR are to the "global setup" step. There is a bit of logic + communication with the Grafana instance's API, in order to create all the necessary authentication credentials. Lastly, adds the first basic role authorization test, asserting that Admin/Editors can view the list of OnCall users, whereas Viewers cannot. ## Checklist - [x] Unit, integration, and e2e (if applicable) tests updated - [ ] Documentation added (or `pr:no public docs` PR label added if not required) - [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not required)	2023-07-04 09:19:14 +00:00
Joey Orlando	60ef4450e6	in the e2e tests, await the grafana instance if it is currently down/unavailable (#2366 ) # What this PR does In the e2e tests, await the grafana instance if it is currently down/unavailable. This is mostly useful for when the tests are run against a hosted grafana instance and it is possible that the instance is paused ([example build](https://drone.grafana.net/grafana/oncall-private/5735/1/4)). https://www.loom.com/share/35eb49035d454ac6ba306cddfe63f255 ## Checklist - [ ] Unit, integration, and e2e (if applicable) tests updated (N/A) - [ ] Documentation added (or `pr:no public docs` PR label added if not required) (N/A) - [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not required) (N/A)	2023-06-29 06:48:23 -04:00
Joey Orlando	eefe7be56a	e2e tests on CI - actually await k8s resources to be ready before starting tests (#1997 ) Occasionally, the Playwright global setup step (which authenticates w/ the Grafana API + configures the plugin) would fail, causing the CI job to fail instantly (playwright doesn't retry global setup if it fails). My current hypothesis as to why this is happening is that the `oncall-engine` and `oncall-celery` pods aren't _actually_ ready in these cases, based on the way the `jupyterhub/action-k8s-await-workloads` action awaits k8s workloads: <img width="1076" alt="Screenshot 2023-05-23 at 18 24 36" src="https://github.com/grafana/oncall/assets/9406895/68d8d2d9-4274-4749-8788-e0a9a3dbad83"> By using `kubectl rollout status deployment/<deployment-name> --timeout=300s` instead, we can be sure that these pods are _actually_ ready to receive traffic before we start the tests. ```bash ❯ kubectl rollout status --help Show the status of the rollout. By default 'rollout status' will watch the status of the latest rollout until it's done. If you don't want to wait for the rollout to finish then you can use --watch=false. Note that if a new rollout starts in-between, then 'rollout status' will continue watching the latest revision. If you want to pin to a specific revision and abort if it is rolled over by another revision, use --revision=N where N is the revision you need to watch for. ``` Lastly, even with this change, sometimes the `POST /api/internal/v1/plugin/sync` endpoint will return HTTP 500 ([example logs](https://github.com/grafana/oncall/actions/runs/5062712137/jobs/9088529416#step:19:2536) from failed CI job). In this case, let's set up the Playwright global setup to retry 3 times.	2023-05-23 20:20:46 -04:00
Joey Orlando	c793e550c6	re-enable e2e UI tests on CI (#1961 ) #1692 is still open. This PR is not an ideal approach, but it's a quick win while we wait for that issue to be resolved. By retrying failing tests up to 3 times, we _should_ be fine to re-enable these on CI. If a test is failing > 3 times, there's likely a legitimate issue occurring.	2023-05-23 17:26:12 -04:00
Maxim Mordasov	11d62245e9	fix safari scroll (#1663 ) # What this PR does Fix scroll in Safari ## Which issue(s) this PR fixes https://github.com/grafana/oncall/issues/415 ## Checklist - [ ] Unit, integration, and e2e (if applicable) tests updated - [ ] Documentation added (or `pr:no public docs` PR label added if not required) - [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not required) --------- Co-authored-by: Joey Orlando <joseph.t.orlando@gmail.com> Co-authored-by: Joey Orlando <joey.orlando@grafana.com>	2023-03-30 18:17:58 +00:00
Joey Orlando	0eb4bd95e6	Revert "Revert "speed up ci builds from 15 to <7 minutes"" (#1643 ) Reverts grafana/oncall#1639	2023-03-28 09:34:03 +02:00
Innokentii Konstantinov	cbb06492ae	Revert "speed up ci builds from 15 to <7 minutes" (#1639 ) Reverted due to stuck ci	2023-03-28 13:01:49 +08:00
Joey Orlando	23cd736c30	speed up ci builds from 15 to <7 minutes (#1615 ) This PR cuts GitHub Action build times from 14-15 minutes, down to just under 7 minutes. It does this by: - caching `grafana-plugins/node_modules` and `pip` dependencies based on their respective dependency files (eg. `requirements.txt` & `yarn.lock`). This step alone saves ~3 minutes. - get rid of the "build-engine-docker-image" and "backend-integration-tests" jobs in the old "Integration Tests" workflow. This was split out this way so that we could build the backend docker image once, upload the artifact, and then reuse it across the backend and e2e tests. We no longer need these backend integration tests because we are testing the same thing in the e2e tests. This saves ~45 seconds of having to upload the image artifact. - few improvements within the integration tests themselves: - move plugin configuration to the `globalSetup.ts`. This means that every test does not need to check if the plugin has been configured because it is done once before all the tests are run. - cache the plugin frontend build. If your commit doesn't change anything to `grafana-plugin/src` or `grafana-plugin/yarn.lock` it should be safe to reuse a previously built/cached version of the plugin frontend. This saves ~3 minutes - cache playwright binaries/dependencies. Only re-install them if the version of `@playwright/test` in `grafana-plugin/yarn.lock` changes. This saves ~3 minutes. Other things to mention Once we refactor the `GSelect` component to not call the `onChange` callback on every keyDown event (#1628), this should allow us to parallelize the integration tests, and cut the time required to execute the tests themselves in half	2023-03-27 18:07:19 +02:00
Joey Orlando	b6615c087f	improve e2e tests authentication flow (#1470 ) # What this PR does This PR makes the Grafana login portion of the e2e tests much faster/more reliable. Currently we use CSS selectors to go to the login form, input the username/password, and proceed as such. This PR refactors to instead make a call to `POST ${GRAFANA_API_URL}/login` and then stores that authentication state that is then reused by subsequent browsers. This was inspired by [how the Incident team does their playwright authentication](https://github.com/grafana/incident/blob/main/plugin/e2e/global-setup.ts) + the recommendation from the [Playwright docs](https://playwright.dev/docs/auth#basic-shared-account-in-all-tests) ## Which issue(s) this PR fixes Slow/flaky Grafana login flow ## Checklist - [x] Tests updated - [ ] Documentation added (N/A) - [ ] `CHANGELOG.md` updated (N/A)	2023-03-10 06:45:15 +01:00
Joey Orlando	8f22b2fd74	first UI integration test - phone verification + receive SMS alert flow (#900 ) What this PR does: Adds our first UI integration test using [Playwright](https://playwright.dev/) and runs the test on CI. Right now the test: - logs into Grafana - configures the plugin (if it isn't already) - creates an OnCall schedule, where the current user will be OnCall - creates an escalation chain to notify based on the newly created OnCall schedule - creates a webhook integration, attached to the created escalation chain - sends a demo alert for the new integration - goes to the alert groups page and validates that the escalation step to alert the OnCall user actually happened Currently the Playwright tests are run against the 3 default headless browsers, chromium, Firefox, and webkit. The CI job that runs these tests is run as a matrix against 3 tagged versions of `grafana`; `main`, `latest`, and `9.2.6`. Secondly, it adds most of the logic for a second test which: - logs into Grafana - configures the plugin (if it isn't already) - goes to the user's settings, verifies their phone number (using a tool called [MailSlurp](https://www.mailslurp.com/)) - configures the current user's default escalation policy to send alerts via SMS - creates an escalation policy and configures it to send alerts to our current user - creates an integration and assigns the created escalation policy - triggers a test alert + verifies that we receive the SMS alert text (again, using MailSlurp) Which issue(s) this PR fixes: Closes #873 Checklist - [x] Tests updated - [ ] Documentation added (N/A) - [ ] `CHANGELOG.md` updated (N/A)	2023-03-06 16:28:52 +00:00

20 commits
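
Commits eefe7be56a and c793e550c6 both describe retrying a flaky step up to 3 times (the global setup step and failing tests, respectively). The snippet below is only a rough sketch of that idea, not the code from either commit: `withRetries`, `maxAttempts`, and `delayMs` are hypothetical names chosen for illustration, and no Playwright or Grafana API is used.

```typescript
// Hypothetical helper (not from the repository): run an async step and
// retry it when it throws, up to `maxAttempts` attempts in total.
async function withRetries<T>(
  step: () => Promise<T>,
  maxAttempts: number = 3, // assumption: mirrors the "retry 3 times" wording
  delayMs: number = 0, // assumption: optional pause between attempts
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // Return as soon as one attempt succeeds.
      return await step();
    } catch (err) {
      lastError = err;
      // Wait before the next attempt, but not after the final one.
      if (attempt < maxAttempts && delayMs > 0) {
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // Every attempt failed: surface the most recent error to the caller.
  throw lastError;
}
```

A setup routine could wrap a call such as the plugin sync request in `withRetries(() => syncPlugin())`, where `syncPlugin` stands in for whatever function performs that request. Rethrowing the last error keeps a persistent failure visible instead of hiding it behind the retries.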