From a04ad7049524ceb5a5eb21b463610a6618c02d8a Mon Sep 17 00:00:00 2001
From: Yulya Artyukhina
Date: Thu, 10 Oct 2024 15:29:58 +0200
Subject: [PATCH] Doc for `Declare Incident` step (#5093)

Related to https://github.com/grafana/oncall-private/issues/2831
Should be merged with enabling `Declare Incident step` in cloud

---------

Co-authored-by: Matias Bordese
Co-authored-by: Alyssa Wada
---
 .../escalation-chains-and-routes/index.md | 169 ++++++++----------
 .../escalation_policies.md | 5 +-
 2 files changed, 82 insertions(+), 92 deletions(-)

diff --git a/docs/sources/configure/escalation-chains-and-routes/index.md b/docs/sources/configure/escalation-chains-and-routes/index.md
index 001e20d1..5191f70e 100644
--- a/docs/sources/configure/escalation-chains-and-routes/index.md
+++ b/docs/sources/configure/escalation-chains-and-routes/index.md
@@ -36,118 +36,107 @@ refs:

 # Escalation Chains and Routes

-Often alerts from monitoring systems need to be sent to different escalation chains and messaging channels, based on their severity, or other alert content.
+In Grafana OnCall, configuring proper alert routing and escalation ensures that alerts are directed to the right teams and handled promptly.
+
+Alerts often need to be sent to different teams or channels depending on their severity or specific alert details.
+Set up routes and escalation chains to customize and automate escalation according to each team's workflows.

 ## Routes

-Routes are used to determine which escalation chain should be used for a specific alert
-group. A route's _[Routing Templates]_
-are evaluated for each alert and **the first matching route** is used to determine the
-escalation chain and chatops channels.
+Routes determine which escalation chain should be triggered for a specific alert group based on the details of the alert.
+A route uses [Routing Templates](ref:routing-templates) to determine the escalation chain and notification channels.
-> **Example:**
->
->
-> * trigger escalation chain called `Database Critical` for alerts with `{{ payload.severity == "critical" and payload.service == "database" }}` in the payload
-> * create a different route for alerts with the payload `{{ "synthetic-monitoring-dev-" in payload.namespace }}` and select a escalation chain called `Security`.
+When an alert is received, its details are evaluated against the route's routing template, and **the first matching route** determines how the alert will be handled.

-### Manage routes
+**Example:**

-1. Open Integration page
-1. Click **Add route** button to create a new route
-1. Click **Edit** button to edit `Routing Template`. The routing template must evaluate to `True` for it to apply
-1. Select channels in **Publish to Chatops** section
-   > **Note:** If the **Publish to Chatops** section doesn't exist, connect Chatops integrations first.
-   > For more information, refer to [Notify people].
-1. Select **Escalation Chain** from the list
-1. If **Escalation Chain** does not exist, click **Add new escalation chain** button to create a new one, it will open in a new tab.
-1. Once created, **Reload list**, and select the new escalation chain
-1. Click **Arrow Up** and **Arrow Down** on the right to change the order of routes
-1. Click **Three dots** and **Delete Route** to delete the route
+- Trigger the `Database Critical` escalation chain for alerts with `{{ payload.severity == "critical" and payload.service == "database" }}`.
+- Use a different route for alerts with the payload `{{ "synthetic-monitoring-dev-" in payload.namespace }}`, selecting the `Security` escalation chain.

-### Routing based on labels
+### Create and manage routes

-> **Note:** Labels are currently available only in cloud.
+To create or manage a route:

-In addition, there is a `labels` variable available to your routing templates, which contains all of the labels assigned
-to the Alert Group, as a `dict`.
This allows you to route based on labels (or a mix of labels and/or payload based data):
+1. Navigate to the **Integrations** page.
+1. Click **Add route** to create a new route, or **Edit** to modify an existing one.
+1. In the **Routing Template** section, define conditions that will determine which alerts this route applies to.
+The template must evaluate to `True` for the route to be selected.
+1. Select the appropriate escalation chain from the **Escalation Chain** dropdown.
+If an escalation chain doesn’t exist, click **Add new escalation chain**, which will open a new tab for chain creation.
+After creating the chain, return to the routes page and click **Reload list** to update the available options.
+1. In the **Publish to ChatOps** section, select the relevant communication channels for this route (Slack, Teams, etc.).
+Ensure ChatOps integrations are configured before using this feature.
+1. Arrange the routes by clicking the up/down arrows to prioritize the routes as needed. The order determines which route is evaluated first.
+1. To delete a route, click the three dots on the route and select **Delete Route**.

-> **Example:**
->
-> * `{{ labels.foo == "bar" or "hello" in labels.keys() or payload.severity == "critical" }}`
+### Label-based routing
+
+{{< admonition type="note" >}}
+This feature is available exclusively on Grafana Cloud.
+{{< /admonition >}}
+
+You can use the `labels` variable in your routing templates to evaluate based on alert group labels.
+This provides additional flexibility in routing alerts based on both labels and payload data.
+
+**Example:**
+
+`{{ labels.foo == "bar" or "hello" in labels.keys() or payload.severity == "critical" }}`

 ## Escalation Chains

-Once an alert group is created and assigned to the route with escalation chain, the
-escalation chain will be executed. Until user performs an action, which stops the escalation
-chain (e.g. acknowledge, resolve, silence etc), the escalation chain will continue to
-execute.
+Escalation chains define the series of actions taken when an alert is triggered.
+The chain continues until a user intervenes by acknowledging, resolving, or silencing the alert.

-Users can create escalation chains to configure different type of escalation workflows.
-For example, you can create a chain that will notify on-call users with high priority, and
-another chain that will only send a message into a Slack channel.
+You can configure different escalation chains for different workflows.
+For example, one chain might notify on-call users immediately, while another sends a low-priority message to a Slack channel.

-Escalation chains determine Who and When to notify. How to notify is set by the user, based on their own preferences.
+### Create and manage escalation chains
+
+1. Navigate to the **Escalation Chains** page.
+1. Click **New escalation chain** to create a new chain.
+1. Enter a unique name and assign the chain to a team.
+1. Click **Add escalation step** to define the steps for this chain (e.g., notifying users, waiting, escalating).
+1. To edit an existing chain, click **Edit**. To remove a chain, click **Delete**.
+
+{{< admonition type="note" >}}
+
+- The name must be unique across the organization.
+- Alert groups inherit the team from the integration, not the escalation chain.
+- Linked integrations and routes are shown in the right panel.
+Changes to the escalation chain impact all associated integrations and routes.
+{{< /admonition >}}

 ### Types of escalation steps

-* `Wait` - wait for a specified amount of time before proceeding to the next step. If you
-need a larger time interval, use multiple wait steps in a row.
-* `Notify users` - send a notification to a user or a group of users.
-* `Notify users from on-call schedule` - send a notification to a user or a group of users
-from an on-call schedule.
-* `Notify all users from a team` - send a notification to all users in a team.
-* `Resolve incident automatically` - resolve the alert group right now with status
-`Resolved automatically`.
-* `Escalate to all Slack channel members` - send a notification to the users in the slack channel. These users will be notified
-via the method configured in their user profile.
-* `Notify Slack User Group` - send a notification to each member of a slack user group. These users will be notified
-via the method configured in their user profile.
-* `Trigger outgoing webhook` - trigger an [outgoing webhook].
-* `Notify users one by one (round robin)` - notify users sequentially, cycling through users for **different alert groups**.
-Example: if users A, B, and C are in the list, the first alert group notifies A, the second alert group notifies B, and
-the third alert group notifies C. Note: users are sorted alphabetically by their username.
-To notify multiple users **within the same alert group** until someone acknowledges, instead use `Notify users` policies with
-`Wait` policies between them in the escalation chain.
-* `Continue escalation if current time is in range` - continue escalation only if current
-time is in specified range. It will wait for the specfied time to continue escalation.
-Useful when you want to get escalation only during working hours
-* `Continue escalation if >X alerts per Y minutes (beta)` - continue escalation only if it
-passes some threshold
-* `Repeat escalation from beginning (5 times max)` - loop the escalation chain
+- `Wait`: Pause for a specified time before moving to the next step. You can add multiple wait steps for longer intervals.
+- `Notify users`: Notify individual users or groups.
+- `Notify users from on-call schedule`: Send notifications to users from a defined on-call schedule.
+- `Notify all team members`: Notify all users in a team.
+- `Resolve incident automatically`: Immediately resolve the alert group with the status `Resolved automatically`.
+- `Notify Slack channel members`: Notify users in a Slack channel based on their OnCall profile preferences.
+- `Notify Slack user group`: Notify all members of a Slack user group.
+- `Trigger outgoing webhook`: Activate an [outgoing webhook](ref:outgoing-webhooks).
+- `Round robin notifications`: Notify users sequentially, with each user receiving different alert groups.
+- `Time-based escalation`: Continue escalation only if the current time falls within a specific range (e.g., during working hours).
+- `Threshold-based escalation`: Escalate only if a certain number of alerts occur within a specific time frame.
+- `Repeat escalation`: Loop the escalation chain up to five times.
+- `Declare incident (non-default routes)`: **Available only in Grafana Cloud**. Declares an incident with a specified severity.
+Limited to one incident per route at a time.
+Additional alerts are grouped into the active incident, and up to five are listed as incident context.

-> **Note:** Both "**Escalate to all Slack channel members**" and "**Notify Slack User Group**" will filter OnCall registered users
-matching the users in the Slack channel or Slack User Group with their profiles linked to their Slack accounts (ie. users
-should have linked their Slack and OnCall users). In both cases, the filtered users satisfying the criteria above are
-notified following their respective notification policies. However, to avoid **spamming** the Slack channel/thread,
-users **won't be notified** in the alert group Slack **thread** (this is how the feature is currently implemented)
-but instead notify them using their **other defined** options in
-their respective policies.
+{{< admonition type="note" >}}
+The **Notify Slack channel members** and **Notify Slack user group** steps are designed to notify OnCall-registered users via their configured notification rules.
+To avoid spamming a Slack channel with alert group notifications, notifications are not sent in the alert group Slack thread.
+{{< /admonition >}}

 ### Notification types

-Each escalation step that notifies a user, does so by triggering their personal notification steps. These are configured in the Grafana
- OnCall users page (by clicking "View my profile").
-It will be executed for each user in the escalation step
-User can configure two types of personal notification chains:
+When an escalation step notifies a user, it follows their personal notification settings, which are configured in their user profile.

-* **Default Notifications**
+Each user can have two sets of notification rules:

-* **Important Notifications**
+- **Default Notifications**: For standard alerts.
+- **Important Notifications**: For high-priority alerts.

-In the escalation step, user can select which type of notification to use.
-For more information, refer to [Notify people].
-
-### Manage Escalation Chains
-
-1. Open **Escalation Chains** page
-2. Click **New escalation chain** button to create a new escalation chain
-
-3. Enter a name and assign it to a team
-   > **Note:** Name must be unique across organization
-   > **Note:** Alert Groups inherit the team from the Integration, not the Escalation Chain
-4. Click **Add escalation step** button to add a new step
-5. Click **Delete** to delete the Escalation Chain, and **Edit** to edit the name or the team.
-
-> **Important:** Linked Integrations and Routes are displayed in the right panel. Any change in the Escalation Chain will
-affect all linked Integrations and Routes.
+Each escalation step allows you to select which set of notification rules to use.
+For more information about user notification rules, refer to the [Notifications](ref:notify-people) section.
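
The first-match rule described in the Routes section of `index.md` can be sketched in Python. This is an illustration under stated assumptions, not OnCall code: the predicates are plain Python stand-ins for the two routing templates quoted in the docs, and `ROUTES`, `select_escalation_chain`, and the `default` fallback are hypothetical names introduced here.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Payload = Dict[str, Any]
# A route pairs a predicate (stand-in for a routing template) with a chain name.
Route = Tuple[Callable[[Payload], bool], str]

# Ordered list of routes: order matters because the first match wins.
ROUTES: List[Route] = [
    (
        lambda p: p.get("severity") == "critical" and p.get("service") == "database",
        "Database Critical",
    ),
    (
        lambda p: "synthetic-monitoring-dev-" in p.get("namespace", ""),
        "Security",
    ),
]


def select_escalation_chain(
    payload: Payload,
    routes: List[Route] = ROUTES,
    default: Optional[str] = None,
) -> Optional[str]:
    """Return the chain of the first route whose predicate is true."""
    for matches, chain in routes:
        if matches(payload):
            return chain
    # No route matched: fall back to the (hypothetical) default chain.
    return default
```

Because evaluation stops at the first match, a payload that satisfies both conditions is handled by `Database Critical`, which is why the docs stress that route order determines priority.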
diff --git a/docs/sources/oncall-api-reference/escalation_policies.md b/docs/sources/oncall-api-reference/escalation_policies.md
index 584435e5..6e68b42d 100644
--- a/docs/sources/oncall-api-reference/escalation_policies.md
+++ b/docs/sources/oncall-api-reference/escalation_policies.md
@@ -42,7 +42,7 @@ The above command returns JSON structured in the following way:
 | ---------------------------------- | :--------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
 | `escalation_chain_id` | Yes | Each escalation policy is assigned to a specific escalation chain. |
 | `position` | Optional | Escalation policies execute one after another starting from `position=0`. `Position=-1` will put the escalation policy to the end of the list. A new escalation policy created with a position of an existing escalation policy will move the old one (and all following) down in the list. |
-| `type` | Yes | One of: `wait`, `notify_persons`, `notify_person_next_each_time`, `notify_on_call_from_schedule`, `notify_user_group`, `trigger_webhook`, `resolve`, `notify_whole_channel`, `notify_if_time_from_to`. |
+| `type` | Yes | One of: `wait`, `notify_persons`, `notify_person_next_each_time`, `notify_on_call_from_schedule`, `notify_user_group`, `trigger_webhook`, `resolve`, `notify_whole_channel`, `notify_if_time_from_to`, `declare_incident`. |
 | `important` | Optional | Default is `false`. Will assign "important" to personal notification rules if `true`. This can be used to distinguish alerts on which you want to be notified immediately by phone. Applicable for types `notify_persons`, `notify_team_members`, `notify_on_call_from_schedule`, and `notify_user_group`.
|
 | `duration` | If type = `wait` | The duration, in seconds, when type `wait` is chosen. Valid values are: `60`, `300`, `900`, `1800`, `3600`. |
 | `action_to_trigger` | If type = `trigger_webhook` | ID of a webhook. |
@@ -52,7 +52,8 @@ The above command returns JSON structured in the following way:
 | `notify_on_call _from_schedule` | If type = `notify_on_call_from_schedule` | ID of a Schedule. |
 | `notify_if_time_from` | If type = `notify_if_time_from_to` | UTC time represents the beginning of the time period, for example `09:00:00Z`. |
 | `notify_if_time_to` | If type = `notify_if_time_from_to` | UTC time represents the end of the time period, for example `18:00:00Z`. |
-| `team_to_notify` | If type = `notify_team_members` | ID of a team. |
+| `team_to_notify` | If type = `notify_team_members` | ID of a team. |
+| `severity` | If type = `declare_incident` | Severity of the incident. |

 **HTTP request**
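
The conditional columns in the parameter table above, including the new `severity` field for `declare_incident`, can be expressed as a small client-side check before a request body is sent. This is a minimal sketch, not part of the OnCall API: `build_escalation_policy`, `REQUIRED_FIELDS`, the chain ID, and the severity value `"critical"` are hypothetical, since the patch does not list the accepted severity values.

```python
from typing import Any, Dict, Tuple

# Fields the table marks as required for a given `type` value.
REQUIRED_FIELDS: Dict[str, Tuple[str, ...]] = {
    "wait": ("duration",),
    "trigger_webhook": ("action_to_trigger",),
    "notify_on_call_from_schedule": ("notify_on_call_from_schedule",),
    "notify_if_time_from_to": ("notify_if_time_from", "notify_if_time_to"),
    "notify_team_members": ("team_to_notify",),
    "declare_incident": ("severity",),
}

# Durations (seconds) the table lists as valid for `wait`.
VALID_WAIT_DURATIONS = {60, 300, 900, 1800, 3600}


def build_escalation_policy(
    escalation_chain_id: str, policy_type: str, **fields: Any
) -> Dict[str, Any]:
    """Build a request body, raising ValueError if a required field is missing."""
    missing = [
        name for name in REQUIRED_FIELDS.get(policy_type, ()) if name not in fields
    ]
    if missing:
        raise ValueError(f"type={policy_type!r} requires: {', '.join(missing)}")
    if policy_type == "wait" and fields["duration"] not in VALID_WAIT_DURATIONS:
        raise ValueError(f"invalid wait duration: {fields['duration']!r}")
    return {"escalation_chain_id": escalation_chain_id, "type": policy_type, **fields}


# Hypothetical chain ID and severity value, for illustration only.
policy = build_escalation_policy("CHAIN_ID", "declare_incident", severity="critical")
```

The resulting dictionary would serve as the JSON body of the create request; the server remains the authority on which severity values are accepted.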