diff --git a/CHANGELOG.md b/CHANGELOG.md
index 5eabff18..e8b04c0d 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,11 +5,19 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## v1.3.21 (2023-08-01)
+
+### Added
+
+- [Helm] Add `extraContainers` for engine, celery and migrate-job pods to define sidecars by @lu1as ([#2650](https://github.com/grafana/oncall/pull/2650))
+– Rework of AlertManager integration ([#2643](https://github.com/grafana/oncall/pull/2643))
+
 ## v1.3.20 (2023-07-31)
 
 ### Added
 
 - Add filter_shift_swaps endpoint to schedules API ([#2684](https://github.com/grafana/oncall/pull/2684))
+- Add shifts endpoint to shift swap API ([#2697](https://github.com/grafana/oncall/pull/2697/))
 
 ### Fixed
 
diff --git a/docs/sources/get-started/_index.md b/docs/sources/get-started/_index.md
index 4040b677..fb0dc82c 100644
--- a/docs/sources/get-started/_index.md
+++ b/docs/sources/get-started/_index.md
@@ -12,7 +12,10 @@ weight: 300
 
 # Get started with Grafana OnCall
 
-Grafana OnCall was built to help DevOps and SRE teams improve their on-call management process and resolve incidents faster. With OnCall, users can create and manage on-call schedules, automate escalations, and monitor incident response from a central view, right within the Grafana UI. Teams no longer have to manage separate alerts from Grafana, Prometheus, and Alertmanager, lowering the risk of missing an important update and limiting the time spent receiving and responding to notifications.
+Grafana OnCall was built to help DevOps and SRE teams improve their on-call management process and resolve incidents faster. With OnCall,
+users can create and manage on-call schedules, automate escalations, and monitor incident response from a central view, right within
+the Grafana UI. Teams no longer have to manage separate alerts from Grafana, Prometheus, and Alertmanager, lowering the risk of
+missing an important update and limiting the time spent receiving and responding to notifications.
 
 With a centralized view of all your alerts and alert groups, automated escalations and grouping, and on-call scheduling, Grafana OnCall helps ensure that alert notifications reach the right people, at the right time using the right notification method.
 
diff --git a/docs/sources/integrations/alertmanager/index.md b/docs/sources/integrations/alertmanager/index.md
index b349b3f5..cae15096 100644
--- a/docs/sources/integrations/alertmanager/index.md
+++ b/docs/sources/integrations/alertmanager/index.md
@@ -15,7 +15,13 @@ weight: 300
 
 # Alertmanager integration for Grafana OnCall
 
-> You must have the [role of Admin][user-and-team-management] to be able to create integrations in Grafana OnCall.
+> ⚠️ A note about **(Legacy)** integrations:
+> We are changing internal behaviour of AlertManager integration.
+> Integrations that were created before version 1.3.21 are marked as **(Legacy)**.
+> These integrations are still receiving and escalating alerts but will be automatically migrated after 1 November 2023.
+>
+> To ensure a smooth transition you can migrate legacy integrations by yourself now.
+> [Here][migration] you can read more about changes and migration process.
 
 The Alertmanager integration handles alerts from [Prometheus Alertmanager](https://prometheus.io/docs/alerting/latest/alertmanager/).
 This integration is the recommended way to send alerts from Prometheus deployed in your infrastructure, to Grafana OnCall.
@@ -28,17 +34,16 @@ This integration is the recommended way to send alerts from Prometheus deployed
 2. Select **Alertmanager Prometheus** from the list of available integrations.
 3. Enter a name and description for the integration, click **Create**
 4. A new page will open with the integration details. Copy the **OnCall Integration URL** from **HTTP Endpoint** section.
-You will need it when configuring Alertmanager.
-
-
+   You will need it when configuring Alertmanager.
 
 ## Configuring Alertmanager to Send Alerts to Grafana OnCall
 
 1. Add a new [Webhook](https://prometheus.io/docs/alerting/latest/configuration/#webhook_config) receiver to `receivers`
-section of your Alertmanager configuration
+   section of your Alertmanager configuration
 2. Set `url` to the **OnCall Integration URL** from previous section
+   - **Note:** The url has a trailing slash that is required for it to work properly.
 3. Set `send_resolved` to `true`, so Grafana OnCall can autoresolve alert groups when they are resolved in Alertmanager
-4. It is recommended to set `max_alerts` to less than `300` to avoid rate-limiting issues
+4. It is recommended to set `max_alerts` to less than `100` to avoid requests that are too large.
 5. Use this receiver in your route configuration
 
 Here is the example of final configuration:
@@ -53,7 +58,7 @@ receivers:
     webhook_configs:
       - url:
         send_resolved: true
-        max_alerts: 300
+        max_alerts: 100
 ```
 
 ## Complete the Integration Configuration
@@ -71,7 +76,7 @@ Grafana OnCall will notify you about that.
 1. Go to **Integration Page**, click on three dots on top right, click **Heartbeat settings**
 2. Copy **OnCall Heartbeat URL**, you will need it when configuring Alertmanager
 3. Set up **Heartbeat Interval**, time period after which Grafana OnCall will start a new alert group if it
-doesn't receive a heartbeat request
+   doesn't receive a heartbeat request
 
 ### Configuring Alertmanager to send heartbeats to Grafana OnCall Heartbeat
 
@@ -80,43 +85,99 @@ generator to `prometheus.yaml`. It will always return true and act like always f
 Grafana OnCall once in a given period of time:
 
 ```yaml
-    groups:
-      - name: meta
-        rules:
-          - alert: heartbeat
-            expr: vector(1)
-            labels:
-              severity: none
-            annotations:
-              description: This is a heartbeat alert for Grafana OnCall
-              summary: Heartbeat for Grafana OnCall
+groups:
+  - name: meta
+    rules:
+      - alert: heartbeat
+        expr: vector(1)
+        labels:
+          severity: none
+        annotations:
+          description: This is a heartbeat alert for Grafana OnCall
+          summary: Heartbeat for Grafana OnCall
 ```
 
 Add receiver configuration to `prometheus.yaml` with the **OnCall Heartbeat URL**:
 
 ```yaml
-
-    ...
-    route:
-      ...
-      routes:
-        - match:
-            alertname: heartbeat
-          receiver: 'grafana-oncall-heartbeat'
-          group_wait: 0s
-          group_interval: 1m
-          repeat_interval: 50s
-    receivers:
-      - name: 'grafana-oncall-heartbeat'
-        webhook_configs:
-          - url: https://oncall-dev-us-central-0.grafana.net/oncall/integrations/v1/alertmanager/1234567890/heartbeat/
-            send_resolved: false
+  ...
+  route:
+    ...
+    routes:
+      - match:
+          alertname: heartbeat
+        receiver: 'grafana-oncall-heartbeat'
+        group_wait: 0s
+        group_interval: 1m
+        repeat_interval: 50s
+  receivers:
+    - name: 'grafana-oncall-heartbeat'
+      webhook_configs:
+        - url: https://oncall-dev-us-central-0.grafana.net/oncall/integrations/v1/alertmanager/1234567890/heartbeat/
+          send_resolved: false
 ```
 
+## Migrating from Legacy Integration
+
+Before we were using each alert from AlertManager group as a separate payload:
+
+```json
+{
+  "labels": {
+    "severity": "critical",
+    "alertname": "InstanceDown"
+  },
+  "annotations": {
+    "title": "Instance localhost:8081 down",
+    "description": "Node has been down for more than 1 minute"
+  },
+  ...
+}
+```
+
+This behaviour was leading to mismatch in alert state between OnCall and AlertManager and draining of rate-limits,
+since each AlertManager alert was counted separately.
+
+We decided to change this behaviour to respect AlertManager grouping by using AlertManager group as one payload.
+
+```json
+{
+  "alerts": [...],
+  "groupLabels": {
+    "alertname": "InstanceDown"
+  },
+  "commonLabels": {
+    "job": "node",
+    "alertname": "InstanceDown"
+  },
+  "commonAnnotations": {
+    "description": "Node has been down for more than 1 minute"
+  },
+  "groupKey": "{}:{alertname=\"InstanceDown\"}",
+  ...
+}
+```
+
+You can read more about AlertManager Data model [here](https://prometheus.io/docs/alerting/latest/notifications/#data).
+
+### How to migrate
+
+> Integration URL will stay the same, so no need to change AlertManager or Grafana Alerting configuration.
+> Integration templates will be reset to suit new payload.
+> It is needed to adjust routes manually to new payload.
+
+1. Go to **Integration Page**, click on three dots on top right, click **Migrate**
+2. Confirmation Modal will be shown, read it carefully and proceed with migration.
+3. Send demo alert to make sure everything went well.
+4. Adjust routes to the new shape of payload. You can use payload of the demo alert from previous step as an example.
+
 {{% docs/reference %}}
 [user-and-team-management]: "/docs/oncall/ -> /docs/oncall//user-and-team-management"
 [user-and-team-management]: "/docs/grafana-cloud/ -> /docs/grafana-cloud/alerting-and-irm/oncall/user-and-team-management"
 
 [complete-the-integration-configuration]: "/docs/oncall/ -> /docs/oncall//integrations#complete-the-integration-configuration"
 [complete-the-integration-configuration]: "/docs/grafana-cloud/ -> /docs/grafana-cloud/alerting-and-irm/oncall/integrations#complete-the-integration-configuration"
+
+[migration]: "/docs/oncall/ -> /docs/oncall//integrations/alertmanager#migrating-from-legacy-integration"
+[migration]: "/docs/grafana-cloud/ -> /docs/grafana-cloud/alerting-and-irm/oncall/integrations/alertmanager#migrating-from-legacy-integration"
 {{% /docs/reference %}}
diff --git a/docs/sources/integrations/grafana-alerting/index.md b/docs/sources/integrations/grafana-alerting/index.md
index a2493aba..42b3baee 100644
--- a/docs/sources/integrations/grafana-alerting/index.md
+++ b/docs/sources/integrations/grafana-alerting/index.md
@@ -14,6 +14,14 @@ weight: 100
 
 # Grafana Alerting integration for Grafana OnCall
 
+> ⚠️ A note about **(Legacy)** integrations:
+> We are changing internal behaviour of Grafana Alerting integration.
+> Integrations that were created before version 1.3.21 are marked as **(Legacy)**.
+> These integrations are still receiving and escalating alerts but will be automatically migrated after 1 November 2023.
+>
+> To ensure a smooth transition you can migrate them by yourself now.
+> [Here][migration] you can read more about changes and migration process.
+
 Grafana Alerting for Grafana OnCall can be set up using two methods:
 
 - Grafana Alerting: Grafana OnCall is connected to the same Grafana instance being used to manage Grafana OnCall.
@@ -53,11 +61,9 @@ Connect Grafana OnCall with alerts coming from a Grafana instance that is differ
 OnCall is being managed:
 
 1. In Grafana OnCall, navigate to the **Integrations** tab and select **New Integration to receive alerts**.
-2. Select the **Grafana (Other Grafana)** tile.
-3. Follow the configuration steps that display in the **How to connect** window to retrieve your unique integration URL
-   and complete any necessary configurations.
-4. Determine the escalation chain for the new integration by either selecting an existing one or by creating a
-   new escalation chain.
+2. Select the **Alertmanager** tile.
+3. Enter a name and description for the integration, click Create
+4. A new page will open with the integration details. Copy the OnCall Integration URL from HTTP Endpoint section.
 5. Go to the other Grafana instance to connect to Grafana OnCall and navigate to **Alerting > Contact Points**.
 6. Select **New Contact Point**.
 7. Choose the contact point type `webhook`, then paste the URL generated in step 3 into the URL field.
@@ -66,3 +72,61 @@ OnCall is being managed:
    > see [Contact points in Grafana Alerting](https://grafana.com/docs/grafana/latest/alerting/unified-alerting/contact-points/).
 
 8. Click the **Edit** (pencil) icon, then click **Test**. This will send a test alert to Grafana OnCall.
+
+## Migrating from Legacy Integration
+
+Before we were using each alert from Grafana Alerting group as a separate payload:
+
+```json
+{
+  "labels": {
+    "severity": "critical",
+    "alertname": "InstanceDown"
+  },
+  "annotations": {
+    "title": "Instance localhost:8081 down",
+    "description": "Node has been down for more than 1 minute"
+  },
+  ...
+}
+```
+
+This behaviour was leading to mismatch in alert state between OnCall and Grafana Alerting and draining of rate-limits,
+since each Grafana Alerting alert was counted separately.
+
+We decided to change this behaviour to respect Grafana Alerting grouping by using AlertManager group as one payload.
+
+```json
+{
+  "alerts": [...],
+  "groupLabels": {
+    "alertname": "InstanceDown"
+  },
+  "commonLabels": {
+    "job": "node",
+    "alertname": "InstanceDown"
+  },
+  "commonAnnotations": {
+    "description": "Node has been down for more than 1 minute"
+  },
+  "groupKey": "{}:{alertname=\"InstanceDown\"}",
+  ...
+}
+```
+
+You can read more about AlertManager Data model [here](https://prometheus.io/docs/alerting/latest/notifications/#data).
+
+### How to migrate
+
+> Integration URL will stay the same, so no need to make changes on Grafana Alerting side.
+> Integration templates will be reset to suit new payload.
+> It is needed to adjust routes manually to new payload.
+
+1. Go to **Integration Page**, click on three dots on top right, click **Migrate**
+2. Confirmation Modal will be shown, read it carefully and proceed with migration.
+3. Adjust routes to the new shape of payload.
+
+{{% docs/reference %}}
+[migration]: "/docs/oncall/ -> /docs/oncall//integrations/grafana-alerting#migrating-from-legacy-integration"
+[migration]: "/docs/grafana-cloud/ -> /docs/grafana-cloud/alerting-and-irm/oncall/integrations/grafana-alerting#migrating-from-legacy-integration"
+{{% /docs/reference %}}
diff --git a/docs/sources/integrations/zabbix/index.md b/docs/sources/integrations/zabbix/index.md
index 3f1d7f52..f281e606 100644
--- a/docs/sources/integrations/zabbix/index.md
+++ b/docs/sources/integrations/zabbix/index.md
@@ -67,7 +67,8 @@ Within Zabbix web interface, do the following:
 1. In a browser, open localhost:80.
 2. Navigate to **Adminitstration > Media Types > Create Media Type**.
 
-
+
+
 
 3. Create a Media Type with the following fields.
 
@@ -87,13 +88,16 @@ To send alerts to Grafana OnCall, the {ALERT.SEND_TO} value must be set in the [
 1. In the web UI, navigate to **Administration > Users** and open the **user properties** form.
 2. In the **Media** tab, click **Add** and copy the link from Grafana OnCall in the `Send to` field.
 
-
+
+
 
 3. Click **Test** in the last column to send a test alert to Grafana OnCall.
 
-
+
+
 
 4. Specify **Send to** OnCall using the unique integration URL from the above step in the testing window that opens. Create a test message with a body and optional subject and click **Test**.
 
+