Metrics doc (#2149)
# What this PR does

## Which issue(s) this PR fixes

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not required)

---------

Co-authored-by: Matias Bordese <mbordese@gmail.com>
parent f48d4f1f25
commit 0c46b41498
2 changed files with 99 additions and 10 deletions
@@ -15,7 +15,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Enable schedule related profile settings oncall [1508](https://github.com/grafana/oncall/issues/1508)
- Highlight user shifts oncall [1509](https://github.com/grafana/oncall/issues/1509)
- Rename or Description for Schedules Rotations [1460](https://github.com/grafana/oncall/issues/1406)
- Add dashboard for OnCall metrics
- Add documentation for OnCall metrics exporter ([#2149](https://github.com/grafana/oncall/pull/2149))
- Add dashboard for OnCall metrics ([#1973](https://github.com/grafana/oncall/pull/1973))

## Changed
@@ -6,11 +6,99 @@ keywords:
- Metrics
- Loki
- Prometheus
title: Insight Logs
title: Insight Logs and Metrics
weight: 1400
---

# Insight Logs
# Insight Logs and Metrics

## Metrics

Grafana OnCall metrics represent certain parameters, such as:

- A total count of alert groups for each integration in every state (firing, acknowledged, resolved, silenced).
  It is a gauge, and its name has the suffix `alert_groups_total`.
- Response time on alert groups for each integration (mean time between the start and first action of all alert groups
  for the last 7 days in selected period). It is a histogram, and its name has the suffix `alert_groups_response_time_seconds`
  followed by the histogram suffixes `_bucket`, `_sum` and `_count`.

You can find more information about metric types in the [Prometheus documentation](https://prometheus.io/docs/concepts/metric_types).

To retrieve Prometheus metrics, use PromQL. If you are not familiar with PromQL, check this [documentation](https://prometheus.io/docs/prometheus/latest/querying/basics/).

### For Grafana Cloud customers

OnCall application metrics are collected in the preinstalled `grafanacloud_usage` datasource and are available for every
cloud instance.

Metrics have the prefix `grafanacloud_oncall_instance`, e.g. `grafanacloud_oncall_instance_alert_groups_total` and
`grafanacloud_oncall_instance_alert_groups_response_time_seconds_bucket`.
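
To check which OnCall series are actually available, you can match on the metric name prefix. This is a minimal sketch that relies only on the `grafanacloud_oncall_instance` prefix mentioned above and assumes the query is run against the `grafanacloud_usage` datasource:

```promql
# List every series whose name starts with the Grafana Cloud OnCall prefix
{__name__=~"grafanacloud_oncall_instance_.+"}
```

The result can be large on busy stacks, so it is best used once in Explore rather than in a dashboard panel.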

### For open source customers

To collect OnCall application metrics, you need to set up Prometheus and add it to your Grafana instance as a datasource.
You can find more information about Prometheus setup in the [OSS documentation](https://github.com/grafana/oncall#readme).

Metrics will have the prefix `oncall`, e.g. `oncall_alert_groups_total` and `oncall_alert_groups_response_time_seconds_bucket`.

Your metrics may also have additional labels, such as `pod`, `instance`, `container`, depending on your Prometheus setup.
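
As a rough illustration of the `oncall` prefix in use, the query below selects the gauge for one integration and one state. The integration name "Grafana Alerting" is only a placeholder, and the `integration` and `state` labels are the ones documented for this metric in the label tables of this page:

```promql
# Alert groups currently firing for one integration in an open source setup
oncall_alert_groups_total{integration="Grafana Alerting", state="firing"}
```

If your Prometheus setup adds labels such as `pod` or `instance`, this selector may return more than one series for the same integration.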

### Metric Alert groups total

This metric has the following labels:

| Label Name    | Description                                                                   |
|---------------|:-----------------------------------------------------------------------------:|
| `id`          | ID of Grafana instance (stack)                                                |
| `slug`        | Slug of Grafana instance (stack)                                              |
| `org_id`      | ID of Grafana organization                                                    |
| `team`        | Team name                                                                     |
| `integration` | OnCall Integration name                                                       |
| `state`       | Alert groups state. May be `firing`, `acknowledged`, `resolved` or `silenced` |

**Query example:**

Get the number of alert groups in "firing" state in integration "Grafana Alerting" in Grafana stack "test_stack":

```promql
grafanacloud_oncall_instance_alert_groups_total{slug="test_stack", integration="Grafana Alerting", state="firing"}
```
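
Because `state` and `integration` are ordinary labels, the same gauge can be aggregated with standard PromQL operators. The following sketch, which reuses the hypothetical stack slug "test_stack" from the example above, sums alert groups across all integrations and returns one value per state:

```promql
# Alert group count per state, summed over every integration in the stack
sum by (state) (grafanacloud_oncall_instance_alert_groups_total{slug="test_stack"})
```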

### Metric Alert groups response time

This metric has the following labels:

| Label Name    | Description                                                                           |
|---------------|:-------------------------------------------------------------------------------------:|
| `id`          | ID of Grafana instance (stack)                                                        |
| `slug`        | Slug of Grafana instance (stack)                                                      |
| `org_id`      | ID of Grafana organization                                                            |
| `team`        | Team name                                                                             |
| `integration` | OnCall Integration name                                                               |
| `le`          | Histogram bucket upper bound in seconds. May be `60`, `300`, `600`, `3600` or `+Inf`  |

**Query example:**

Get the number of alert groups with a response time of 10 minutes (600 seconds) or less in integration "Grafana Alerting"
in Grafana stack "test_stack":

```promql
grafanacloud_oncall_instance_alert_groups_response_time_seconds_bucket{slug="test_stack", integration="Grafana Alerting", le="600"}
```
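
Prometheus histogram buckets are cumulative, so the `le="600"` bucket counts alert groups that were responded to within 600 seconds. To count alert groups that took longer than 10 minutes, one option is to subtract that bucket from the `+Inf` bucket, which holds the total. This is a sketch that reuses the placeholder slug and integration name from the example above:

```promql
# Alert groups with a response time above 600 seconds
grafanacloud_oncall_instance_alert_groups_response_time_seconds_bucket{slug="test_stack", integration="Grafana Alerting", le="+Inf"}
  - ignoring(le)
grafanacloud_oncall_instance_alert_groups_response_time_seconds_bucket{slug="test_stack", integration="Grafana Alerting", le="600"}
```

The mean response time can be estimated by dividing the histogram sum by its count. The `_sum` and `_count` series names below are assumed from the histogram suffixes listed earlier on this page and have not been checked against a live instance:

```promql
# Mean response time in seconds for one integration
grafanacloud_oncall_instance_alert_groups_response_time_seconds_sum{slug="test_stack", integration="Grafana Alerting"}
  /
grafanacloud_oncall_instance_alert_groups_response_time_seconds_count{slug="test_stack", integration="Grafana Alerting"}
```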

### Dashboard

To import the OnCall metrics dashboard, go to the `Administration` -> `Plugins` page, find OnCall in the plugins list, open
the `Dashboards` tab on the OnCall plugin settings page and click "Import" next to "OnCall metrics". After that, you can find
the "OnCall metrics" dashboard in your dashboards list. In the datasource dropdown, select your Prometheus datasource
(for Cloud customers it's `grafanacloud_usage`). You can filter data by your Grafana instances, teams and integrations.

To update the dashboard to the newest version, go to the `Dashboards` tab on the OnCall plugin settings page and click
"Re-import".
Be aware: if you have made changes to the dashboard, they will be deleted after re-importing. To save your changes, go
to the dashboard settings, click "Save as" and save a copy of the dashboard.

## Insight Logs

> **Note:** Grafana OnCall insight logs are available in Grafana Cloud only.
We're in the process of rolling out Insight Logs to all customers,
@@ -29,7 +117,7 @@ You can use this query to retrieve all logs related to your OnCall instance.
{instance_type="oncall"} | logfmt | __error__=``
```

## Resource insight logs
### Resource insight logs

Logs are created each time a user modifies any resource in Grafana OnCall.

@@ -39,7 +127,7 @@ These logs will have `action_type=resource` field and can be retrieved with foll
{instance_type="oncall"} | logfmt | __error__=`` | action_type = `resource`
```

### Format
#### Format

Logs contain the following fields, where the fields followed by * are always available, and the others depend on the logged event:

@@ -67,7 +155,7 @@ resource types are: `integration_heartbeat`, `escalation_chain`, `integration`,
`escalation_policy`, `public_api_token`, `schedule_export_token`, `user_schedule_export_token`,
`oncall_shift`, `web_schedule`, `ical_schedule`, `calendar_schedule`, `organization`, `user`, `webhook`.

## Maintenance insight logs
### Maintenance insight logs

Logs are created each time maintenance mode is started or finished for an integration.

@@ -77,7 +165,7 @@ These logs will have `action_type=maintenance` field and can be retrieved with fo
{instance_type="oncall"} | logfmt | __error__=`` | action_type = `maintenance`
```

### Format
#### Format

Logs of maintenance insights contain the following fields, where the fields followed by * are always available, and the others depend on the logged event:

@@ -93,7 +181,7 @@ Logs of maintenance insights contain the following fields, where the fields foll
| `team`* | Name of team to which integration belongs. |
| `team_id` | ID of team to which integration belongs. |

## ChatOps insight logs
### ChatOps insight logs

Logs are created when a user modifies ChatOps settings.

@@ -103,7 +191,7 @@ These log lines will have `action_type=chat_ops` field and can be retrieved with
{instance_type="oncall"} | logfmt | __error__=`` | action_type = `chat_ops`
```

### Format
#### Format

Logs of chatops insight logs contain the following fields, where the fields followed by * are always available, and the others depend on the logged event:

@@ -122,7 +210,7 @@ Logs of chatops insight logs contain the following fields, where the fields foll

chatops action names: `workspace_connected`, `workspace_disconnected`, `channel_connected`, `channel_disconnected`, `user_linked`, `used_unlinked`, `default_channel_changed`.

## Examples
### Examples

Here are some examples of practical queries to Grafana OnCall insight logs.
LogQL is used to retrieve them. If you are not familiar with LogQL, check this [documentation](https://grafana.com/docs/loki/latest/logql/).