Keep record of the timestamp when the alert group creation task is
triggered, allowing to track the delta time between alert received
datetime and alert group creation timestamp.
Related to https://github.com/grafana/oncall-private/issues/2347
# What this PR does
Changes how the `MOBILE_APP_GATEWAY_ENABLED` feature flag
enables/disables the mobile app gateway viewset.
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Closes https://github.com/grafana/oncall-private/issues/2324
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
## Which issue(s) this PR fixes
Fixes this issue we started seeing popping up because of a change
introduced in #3496:
```python3
File "/etc/app/apps/metrics_exporter/views.py", line 22, in get
result = generate_latest(application_metrics_registry).decode("utf-8")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/prometheus_client/exposition.py", line 198, in generate_latest
for metric in registry.collect():
File "/usr/local/lib/python3.11/site-packages/prometheus_client/registry.py", line 97, in collect
yield from collector.collect()
File "/etc/app/apps/metrics_exporter/metrics_collectors.py", line 56, in collect
alert_groups_total, missing_org_ids_1 = self._get_alert_groups_total_metric(org_ids)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/etc/app/apps/metrics_exporter/metrics_collectors.py", line 97, in _get_alert_groups_total_metric
org_id_from_key = RE_ALERT_GROUPS_TOTAL.match(org_key).groups()[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'groups'
```
```python3
>>> import re
>>> ALERT_GROUPS_TOTAL = "oncall_alert_groups_total"
>>> _RE_BASE_PATTERN = r"{{?{}}}?_(\d+)"
>>> RE_ALERT_GROUPS_TOTAL = re.compile(_RE_BASE_PATTERN.format(ALERT_GROUPS_TOTAL))
>>> org_key = "{oncall_alert_groups_total}_1"
>>> RE_ALERT_GROUPS_TOTAL.match(org_key).groups()[0]
'1'
>>> org_key = "oncall_alert_groups_total_1"
>>> RE_ALERT_GROUPS_TOTAL.match(org_key).groups()[0]
'1'
```
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# Which issue(s) this PR fixes
Related to https://github.com/grafana/oncall-private/issues/2363
Addresses this issue that arises when using
`cache.get_many`/`cache.set_many` operations with a Redis Cluster:
```python3
File "/usr/local/lib/python3.11/site-packages/redis/cluster.py", line 1006, in determine_slot
raise RedisClusterException(
redis.exceptions.RedisClusterException: MGET - all keys must map to the same key slot
```
From the Redis Cluster
[docs](https://redis.io/docs/reference/cluster-spec/#hash-tags), this
can be addressed with this 👇 . Basically this will ensure that keys in
multi-key operations will resolve to the same hash slot (read: node):
> Hash tags
> There is an exception for the computation of the hash slot that is
used in order to implement hash tags. Hash tags are a way to ensure that
multiple keys are allocated in the same hash slot. This is used in order
to implement multi-key operations in Redis Cluster.
>
> To implement hash tags, the hash slot for a key is computed in a
slightly different way in certain conditions. If the key contains a
"{...}" pattern only the substring between { and } is hashed in order to
obtain the hash slot. However since it is possible that there are
multiple occurrences of { or } the algorithm is well specified by the
following rules:
>
> IF the key contains a { character.
> AND IF there is a } character to the right of {.
> AND IF there are one or more characters between the first occurrence
of { and the first occurrence of }.
> Then instead of hashing the key, only what is between the first
occurrence of { and the following first occurrence of } is hashed.
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Makes sure there are no empty alert group label values.
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Disallows creating and deleting direct paging integrations via both
internal and public APIs. It also hides the direct paging option in the
UI when creating a new integration.
## Which issue(s) this PR fixes
Related to https://github.com/grafana/oncall-private/issues/2302
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
---------
Co-authored-by: Dominik <dominik.broj@grafana.com>
# What this PR does
Makes organization sync create direct paging integrations for Grafana
teams that don't have one.
## Which issue(s) this PR fixes
Related to https://github.com/grafana/oncall-private/issues/2302
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Upgrade to Python 3.12 + fix several invalid test assertions that lead
to test failures in the latest version of `pytest`:
```
AttributeError: 'called_once_with' is not a valid assertion. Use a spec for the mock if 'called_once_with' is meant to be an attribute.. Did you mean: 'assert_called_once_with'?
```
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Deletes duplicate direct paging integrations (i.e. keeps only the first
direct paging integration per team).
Also adds a unique constraint that will make such duplicates impossible
at the DB level.
## Which issue(s) this PR fixes
Related to https://github.com/grafana/oncall-private/issues/2302
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Moving metrics creation into separate task to make alert ingestion more
robust
## Which issue(s) this PR fixes
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
---------
Co-authored-by: Julia <ferril.darkdiver@gmail.com>
Rewrite LabelAPIClient to be able to return error messages from Label
Repo API.
Main features:
1. Raises LabelRepoAPIException when response code is 400 or 500 level.
2. Always return response as a second argument to further inspect it, if
necessary.
# What this PR does
Change GrafanaServiceAccountAuth to use instance ID header in cloud
instead of slugs.
## Which issue(s) this PR fixes
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Adds an ability to extract labels from alert group payload. See
[demo](https://www.loom.com/share/cf2b746eea974547b76f44298e32a54f?sid=67ed1e58-40ed-4136-a201-6482fb7773d3).
## Which issue(s) this PR fixes
https://github.com/grafana/oncall-private/issues/2304
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
---------
Co-authored-by: Maxim Mordasov <maxim.mordasov@grafana.com>
Co-authored-by: Rares Mardare <rares.mardare@grafana.com>
This reverts commit acfba47a81.
# What this PR does
## Which issue(s) this PR fixes
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Fix alert group rendering when some links were broken because of
replacing `-` to `_`.
## Which issue(s) this PR fixes
https://github.com/grafana/support-escalations/issues/8119https://github.com/grafana/support-escalations/issues/8468
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Fixed issue where `User Settings Reader` was missing permission to list
users.
## Which issue(s) this PR fixes
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Allows public OnCall API to use Grafana service accounts for
authorization. In cloud requests using a Grafana service account token
also needs to provide headers for `X-Grafana-Org-Slug` and
`X-Grafana-Instance-Slug`
This is **alpha** functionality, it may break or be removed in the
future. Going to use this on one endpoint (resolution notes) before we
consider the implications across all of public API.
## Which issue(s) this PR fixes
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
This PR catches redis unavailability exceptions to prevent errors during
alert ingestion in the following places:
StartupProbeView
AlertChannelDefiningMixin
RateLimitMixin
## Which issue(s) this PR fixes
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
This PR add labels for webhooks.
1. Make webhook "labelable" with ability to filter by labels.
2. Add labels to the webhook payload. It contain new field webhook with
it's name, id and labels. Field integration and alert_group has a
corresponding label field as well. See example of a new payload below:
```
{
"event": {
"type": "escalation"
},
"user": null,
"alert_group": {
"id": "IRFN6ZD31N31B",
"integration_id": "CTWM7U4A2QG97",
"route_id": "RUE7U7Z46SKGY",
"alerts_count": 1,
"state": "firing",
"created_at": "2023-11-22T08:54:55.178243Z",
"resolved_at": null,
"acknowledged_at": null,
"title": "Incident",
"permalinks": {
"slack": null,
"telegram": null,
"web": "http://grafana:3000/a/grafana-oncall-app/alert-groups/IRFN6ZD31N31B"
},
"labels": {
"severity": "critical"
}
},
"alert_group_id": "IRFN6ZD31N31B",
"alert_payload": {
"message": "This alert was sent by user for demonstration purposes"
},
"integration": {
"id": "CTWM7U4A2QG97",
"type": "webhook",
"name": "hi - Webhook",
"team": null,
"labels": {
"hello": "world",
"severity": "critical"
}
},
"notified_users": [],
"users_to_be_notified": [],
"webhook": {
"id": "WHAXK4BTC7TAEQ",
"name": "test",
"labels": {
"hello": "kesha"
}
}
}
```
I feel that there is an opportunity to make code cleaner - remove all
label logic from serializers, views and utils to models or dedicated
LabelerService and introduce Labelable interface with something like
label_verbal, update_labels methods. However, I don't want to tie
webhook labels with a refactoring.
---------
Co-authored-by: Dominik <dominik.broj@grafana.com>
# What this PR does
add more logging in `apps.alerts.tasks.notify_user.perform_notification`
task (related to https://github.com/grafana/oncall-private/issues/2318;
won't solve that issue but will help with the investigation)