Grafana OnCall engine fork — self-hosted on-call scheduler and alert router
Joey Orlando eefe7be56a
e2e tests on CI - actually await k8s resources to be ready before starting tests (#1997)
Occasionally, the Playwright global setup step (which authenticates with
the Grafana API and configures the plugin) would fail, causing the CI
job to fail immediately (Playwright doesn't retry global setup if it
fails).

My current hypothesis is that the `oncall-engine` and `oncall-celery`
pods aren't _actually_ ready in these cases, because of the way the
`jupyterhub/action-k8s-await-workloads` action awaits k8s workloads:

<img width="1076" alt="Screenshot 2023-05-23 at 18 24 36"
src="https://github.com/grafana/oncall/assets/9406895/68d8d2d9-4274-4749-8788-e0a9a3dbad83">


By using `kubectl rollout status deployment/<deployment-name>
--timeout=300s` instead, we can be sure that these pods are _actually_
ready to receive traffic before we start the tests.
```bash
❯ kubectl rollout status --help
Show the status of the rollout.

 By default 'rollout status' will watch the status of the latest rollout until it's done. If you don't want to wait for
the rollout to finish then you can use --watch=false. Note that if a new rollout starts in-between, then 'rollout
status' will continue watching the latest revision. If you want to pin to a specific revision and abort if it is rolled
over by another revision, use --revision=N where N is the revision you need to watch for.
```
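
As a rough sketch of how that wait could be scripted (not necessarily how the CI workflow in this commit does it), the helper below runs `kubectl rollout status` for each deployment in turn and stops at the first one that does not become ready. The function name `await_deployments` and the `TIMEOUT` variable are illustrative assumptions, and the deployment names in the usage note are assumed to match the pod names mentioned above.

```bash
# Sketch: block until every listed Deployment has finished rolling out.
# `await_deployments` and TIMEOUT are hypothetical names, not part of the repo.
await_deployments() {
  local timeout="${TIMEOUT:-300s}"
  local name
  for name in "$@"; do
    echo "waiting for deployment/${name}"
    # Treat a non-zero exit from `kubectl rollout status` as "not ready".
    if ! kubectl rollout status "deployment/${name}" --timeout="${timeout}"; then
      echo "deployment/${name} was not ready within ${timeout}" >&2
      return 1
    fi
  done
}
```

It could be called as `await_deployments oncall-engine oncall-celery` before starting the tests, assuming those are the deployment names in the cluster.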

Lastly, even with this change, the `POST
/api/internal/v1/plugin/sync` endpoint sometimes returns HTTP 500 ([example
logs](https://github.com/grafana/oncall/actions/runs/5062712137/jobs/9088529416#step:19:2536)
from a failed CI job). To handle this, let's set up the Playwright global
setup to retry 3 times.
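
The retry idea can be illustrated with a small shell wrapper. This is only a sketch of the general pattern, not the Playwright change made in this commit. The `retry` function and the `RETRY_DELAY` variable are hypothetical, and the sketch assumes "3 times" means three attempts in total.

```bash
# Sketch: run a command up to MAX attempts, pausing between failures.
# `retry` and RETRY_DELAY are hypothetical names used for illustration only.
retry() {
  local max="$1"; shift
  local delay="${RETRY_DELAY:-2}"
  local attempt=1
  while true; do
    # Stop as soon as the wrapped command succeeds.
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}
```

For example, `retry 3 some_setup_command` would run the (placeholder) command at most three times and return its success or final failure.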
2023-05-23 20:20:46 -04:00
.github e2e tests on CI - actually await k8s resources to be ready before starting tests (#1997) 2023-05-23 20:20:46 -04:00
dev Feat(Dev): Improve Building of Grafana Plugin in Development Env + update node version (#1890) 2023-05-17 16:12:51 -04:00
docs Slack: use user_profile_changed event instead of user_change (#1938) 2023-05-15 16:32:06 +00:00
engine Fix MultipleObjectsReturned error on webhook endpoints (#1996) 2023-05-23 16:23:06 +00:00
examples/terraform Terraform examples 2022-08-11 14:32:39 +05:00
grafana-plugin e2e tests on CI - actually await k8s resources to be ready before starting tests (#1997) 2023-05-23 20:20:46 -04:00
helm Merge main to dev (#1974) 2023-05-18 15:58:26 -03:00
tools Bump requests from 2.27.1 to 2.31.0 in /tools/pagerduty-migrator (#1985) 2023-05-23 12:41:01 +00:00
.drone.yml Feat(Dev): Improve Building of Grafana Plugin in Development Env + update node version (#1890) 2023-05-17 16:12:51 -04:00
.gitignore ignore .http file extensions (#1762) 2023-04-17 10:52:03 +02:00
.markdownlint.json add precommit rules for markdown/json files (#915) 2022-12-01 14:26:54 +01:00
.markdownlintignore Add tracing support 2022-12-19 17:15:06 +08:00
.pre-commit-config.yaml fix failing lint github actions job due to issue w/ isort version (#1249) 2023-01-30 11:43:15 +01:00
CHANGELOG.md Fix MultipleObjectsReturned error on webhook endpoints (#1996) 2023-05-23 16:23:06 +00:00
CODE_OF_CONDUCT.md add precommit rules for markdown/json files (#915) 2022-12-01 14:26:54 +01:00
docker-compose-developer.yml Bring back FCM_PROJECT_ID env variable (#1980) 2023-05-22 14:32:21 +01:00
docker-compose-mysql-rabbitmq.yml bump mysql from 5.7 to 8.0.32 (#1790) 2023-05-10 17:53:27 +00:00
docker-compose.yml Add "Notifications Receiver" RBAC role (#1853) 2023-05-02 12:19:34 +00:00
GOVERNANCE.md add precommit rules for markdown/json files (#915) 2022-12-01 14:26:54 +01:00
LICENSE World, meet OnCall! 2022-06-03 08:09:47 -06:00
LICENSING.md add precommit rules for markdown/json files (#915) 2022-12-01 14:26:54 +01:00
MAINTAINERS.md add precommit rules for markdown/json files (#915) 2022-12-01 14:26:54 +01:00
Makefile Feat(Dev): Improve Building of Grafana Plugin in Development Env + update node version (#1890) 2023-05-17 16:12:51 -04:00
README.md Make screenshots bigger in README.md (#1799) 2023-04-20 13:26:12 +08:00
screenshot.png Merge dev to main (#54) 2022-06-13 16:39:58 -06:00
screenshot_mobile.png Readme updates 2023-04-11 15:43:52 +03:00

Grafana OnCall

Developer-friendly incident response with brilliant Slack integration.

  • Collect and analyze alerts from multiple monitoring systems
  • On-call rotations based on schedules
  • Automatic escalations
  • Phone calls, SMS, Slack, Telegram notifications

Getting Started

We prepared multiple environments. The steps below set up the hobby environment with Docker Compose:

  1. Download docker-compose.yml:

    curl -fsSL https://raw.githubusercontent.com/grafana/oncall/dev/docker-compose.yml -o docker-compose.yml
    
  2. Set variables:

    echo "DOMAIN=http://localhost:8080
    COMPOSE_PROFILES=with_grafana  # Remove this line if you want to use existing grafana
    SECRET_KEY=my_random_secret_must_be_more_than_32_characters_long" > .env
    
  3. Launch services:

    docker-compose pull && docker-compose up -d
    
  4. Go to the OnCall plugin configuration page, logging in with the credentials admin/admin (or find the OnCall plugin under Configuration -> Plugins), and connect the OnCall plugin to the OnCall backend:

    OnCall backend URL: http://engine:8080
    
  5. Enjoy! Check our OSS docs if you want to set up Slack, Telegram, Twilio or SMS/calls through Grafana Cloud.
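
The placeholder `SECRET_KEY` in step 2 is best replaced with a random value. As a minimal sketch, assuming a POSIX shell with `/dev/urandom` available, the hypothetical helper below (`write_env` is not part of the repository) writes the same three variables with a freshly generated 64-character key:

```bash
# Sketch: write the .env file from step 2 with a randomly generated SECRET_KEY.
# `write_env` is a hypothetical helper; adjust DOMAIN and profiles to your setup.
write_env() {
  local env_file="${1:-.env}"
  local domain="${2:-http://localhost:8080}"
  local secret
  # 32 random bytes rendered as 64 hex characters, above the 32-character minimum.
  secret="$(head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n')"
  {
    echo "DOMAIN=${domain}"
    # Omit this line if you want to use an existing Grafana instance.
    echo "COMPOSE_PROFILES=with_grafana"
    echo "SECRET_KEY=${secret}"
  } > "$env_file"
}
```

Running `write_env` in the directory that holds `docker-compose.yml` creates `.env` there; step 3 can then be run unchanged.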

Update version

To update your Grafana OnCall hobby environment:

    # Update Docker image
    docker-compose pull engine

    # Re-deploy
    docker-compose up -d

After updating the engine, you'll also need to click the "Update" button on the plugin version page. See Grafana docs for more info on updating Grafana plugins.

Join community

Stargazers over time

Further Reading