modify check_escalation_finished_task task (#1266)

# What this PR does

This PR:

- modifies the `check_escalation_finished_task` celery task to:
  - do stricter escalation validation based on the alert group's escalation snapshot (see the `audit_alert_group_escalation` method in `engine/apps/alerts/tasks/check_escalation_finished.py` for the validation logic)
  - use a read-only database for querying alert groups if one is configured, otherwise fall back to the "default" one
  - ping a configurable heartbeat (new env var `ALERT_GROUP_ESCALATION_AUDITOR_CELERY_TASK_HEARTBEAT_URL` added)
  - run every 13 minutes instead of every 10 (this interval can be configured via an env variable)
- adds public documentation on how to configure this auditor task
- modifies the local celery startup command to properly take into consideration all celery-related env vars (similar to the ones we use in `engine/celery_with_exporter.sh`; this made it easier to enable `celery beat` locally for testing)
- removes the following code:
  - references to `AlertGroup.estimate_escalation_finish_time`; the model field is now marked as deprecated using the [`django-deprecate-fields` library](https://pypi.org/project/django-deprecate-fields/). This field was only used by the previous version of this validation task
  - `EscalationSnapshotMixin.calculate_eta_for_finish_escalation`, which was only used to calculate the value for `AlertGroup.estimate_escalation_finish_time`
  - the `calculate_escalation_finish_time` celery task
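The field-deprecation approach mentioned above can be sketched in isolation. This is a minimal, hand-rolled illustration of the idea only (not the `django-deprecate-fields` implementation): accessing the deprecated attribute emits a `DeprecationWarning` and returns `None`, so existing readers keep working while the column is phased out. The `AlertGroupSketch` class is a hypothetical stand-in.

```python
import warnings


class deprecated_attribute:
    """Descriptor that warns on access and returns None, mimicking a deprecated model field."""

    def __init__(self, name: str):
        self.name = name

    def __get__(self, instance, owner):
        warnings.warn(f"{self.name} is deprecated and will be removed", DeprecationWarning)
        return None


class AlertGroupSketch:
    # hypothetical stand-in for the real Django model
    estimate_escalation_finish_time = deprecated_attribute("estimate_escalation_finish_time")
```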
  

## Which issue(s) this PR fixes

https://github.com/grafana/oncall-private/issues/1558

## Checklist

- [x] Tests updated
- [x] Documentation added
- [x] `CHANGELOG.md` updated
Joey Orlando 2023-03-17 11:14:08 +01:00 committed by GitHub
parent f85f77dbc7
commit 4d655dff60
21 changed files with 1369 additions and 254 deletions


@@ -7,6 +7,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## Unreleased
### Added
- Modified `check_escalation_finished_task` celery task to use a read-only database for its query (if one is defined),
make the validation logic stricter, and ping a configurable heartbeat on successful completion of this task ([1266](https://github.com/grafana/oncall/pull/1266))
### Changed
- Updated wording in some Slack messages to use 'Alert Group' instead of 'Incident' ([1565](https://github.com/grafana/oncall/pull/1565))


@@ -29,10 +29,11 @@ GRAFANA_INCIDENT_STATIC_API_KEY=
GRAFANA_API_URL=http://localhost:3000
CELERY_WORKER_QUEUE="default,critical,long,slack,telegram,webhook,retry,celery"
CELERY_WORKER_CONCURRENCY=1
CELERY_WORKER_CONCURRENCY=3
CELERY_WORKER_MAX_TASKS_PER_CHILD=100
CELERY_WORKER_SHUTDOWN_INTERVAL=65m
CELERY_WORKER_BEAT_ENABLED=True
CELERY_WORKER_DEBUG_LOGS=False
RABBITMQ_USERNAME=rabbitmq
RABBITMQ_PASSWORD=rabbitmq


@@ -243,7 +243,7 @@ The limit can be changed using env variables:
## Mobile application set up
> **Note**: This application is currently in beta
Grafana OnCall OSS users can use the mobile app to receive push notifications from OnCall.
Grafana OnCall OSS relies on Grafana Cloud as a relay for push notifications.
@@ -255,3 +255,29 @@ For Grafana OnCall OSS, the mobile app QR code includes an authentication token
Your Grafana OnCall OSS instance should be reachable from the same network as your mobile device, preferably from the internet.
For more information, see [Grafana OnCall mobile app]({{< relref "../mobile-app" >}})
## Alert Group Escalation Auditor
Grafana OnCall has a periodic background task that checks that all alert group escalations have finished
properly. If configured, this feature can also ping an OnCall webhook integration's heartbeat URL so that you can be
alerted in the event that something goes wrong.

Logs originating from the celery worker for the `apps.alerts.tasks.check_escalation_finished.check_escalation_finished_task`
task that reference an `AlertGroupEscalationPolicyExecutionAuditException` exception
indicate that the auditor periodic task is failing check(s) on one or more alert groups. Logs for this task which
mention `... passed the audit checks` indicate that there were no issues with the escalation on the audited
alert groups.
To configure this feature:

1. Create a Webhook, or Formatted Webhook, integration type.
1. Under the "Heartbeat" tab in the integration modal, copy the unique heartbeat URL that is shown.
1. Set the heartbeat's expected time interval to 15 minutes (see the note below regarding `ALERT_GROUP_ESCALATION_AUDITOR_CELERY_TASK_HEARTBEAT_INTERVAL`).
1. Configure the integration's escalation chain as necessary.
1. Populate the following env variables:
   - `ALERT_GROUP_ESCALATION_AUDITOR_CELERY_TASK_HEARTBEAT_URL` - the integration's unique heartbeat URL
   - `ALERT_GROUP_ESCALATION_AUDITOR_CELERY_TASK_HEARTBEAT_INTERVAL` - how often the auditor task should run. By default the
     task runs every 13 minutes, so we recommend setting the heartbeat's expected time interval to 15 minutes. If you
     would like to modify this, we recommend setting this env variable to a value 1 or 2 minutes less than the
     integration's heartbeat expected time interval.
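For example, the heartbeat URL env variable might be set like this in an env file (the URL below is a placeholder; copy the real one from the integration's "Heartbeat" tab):

```shell
# hypothetical heartbeat URL -- copy the real one from the integration's "Heartbeat" tab
ALERT_GROUP_ESCALATION_AUDITOR_CELERY_TASK_HEARTBEAT_URL=https://oncall.example.com/integrations/v1/formatted_webhook/abc123/heartbeat/
```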


@@ -1,3 +1,4 @@
import datetime
import logging
from typing import Optional
@@ -5,19 +6,16 @@ import pytz
from celery import uuid as celery_uuid
from dateutil.parser import parse
from django.apps import apps
from django.utils import timezone
from django.utils.functional import cached_property
from rest_framework.exceptions import ValidationError
from apps.alerts.constants import NEXT_ESCALATION_DELAY
from apps.alerts.escalation_snapshot.snapshot_classes import (
ChannelFilterSnapshot,
EscalationChainSnapshot,
EscalationPolicySnapshot,
EscalationSnapshot,
)
from apps.alerts.escalation_snapshot.utils import eta_for_escalation_step_notify_if_time
from apps.alerts.tasks import calculate_escalation_finish_time, escalate_alert_group
from apps.alerts.tasks import escalate_alert_group
logger = logging.getLogger(__name__)
@@ -90,8 +88,7 @@ class EscalationSnapshotMixin:
'next_step_eta': '2021-10-18T10:28:28.890369Z'
}
"""
escalation_snapshot = None
data = {}
if self.escalation_chain_exists:
channel_filter = self.channel_filter
@@ -104,53 +101,7 @@
"escalation_policies_snapshots": escalation_policies,
"slack_channel_id": self.slack_channel_id,
}
escalation_snapshot = EscalationSnapshot.serializer(data).data
return escalation_snapshot
def calculate_eta_for_finish_escalation(self, escalation_started=False, start_time=None):
if not self.escalation_snapshot:
return
EscalationPolicy = apps.get_model("alerts", "EscalationPolicy")
TOLERANCE_SECONDS = 1
TOLERANCE_TIME = timezone.timedelta(seconds=NEXT_ESCALATION_DELAY + TOLERANCE_SECONDS)
start_time = start_time or timezone.now() # start time may be different for silenced incidents
wait_summ = timezone.timedelta()
# Get next_active_escalation_policy_order using flag `escalation_started` because this calculation can be
# started in parallel with escalation task where next_active_escalation_policy_order can be changed.
# That's why we are using `escalation_started` flag here, which means, that we want count eta from the first
# step.
next_escalation_policy_order = (
self.escalation_snapshot.next_active_escalation_policy_order if escalation_started else 0
)
escalation_policies = self.escalation_snapshot.escalation_policies_snapshots[next_escalation_policy_order:]
for escalation_policy in escalation_policies:
if escalation_policy.step == EscalationPolicy.STEP_WAIT:
if escalation_policy.wait_delay is not None:
wait_summ += escalation_policy.wait_delay
else:
wait_summ += EscalationPolicy.DEFAULT_WAIT_DELAY # Default wait in case it's not selected yet
elif escalation_policy.step == EscalationPolicy.STEP_NOTIFY_IF_TIME:
if escalation_policy.from_time and escalation_policy.to_time:
estimate_start_time = start_time + wait_summ
STEP_TOLERANCE = timezone.timedelta(minutes=1)
next_step_estimate_start_time = eta_for_escalation_step_notify_if_time(
escalation_policy.from_time,
escalation_policy.to_time,
estimate_start_time + STEP_TOLERANCE,
)
wait_summ += next_step_estimate_start_time - estimate_start_time
elif escalation_policy.step == EscalationPolicy.STEP_REPEAT_ESCALATION_N_TIMES:
# the part of escalation with repeat step will be passed six times: the first time plus five repeats
wait_summ *= EscalationPolicy.MAX_TIMES_REPEAT + 1
elif escalation_policy.step == EscalationPolicy.STEP_NOTIFY_IF_NUM_ALERTS_IN_TIME_WINDOW:
# In this case we cannot calculate finish time, so we return None
return
elif escalation_policy.step == EscalationPolicy.STEP_FINAL_RESOLVE:
break
wait_summ += TOLERANCE_TIME
escalation_finish_time = start_time + wait_summ
return escalation_finish_time
return EscalationSnapshot.serializer(data).data
@property
def channel_filter_with_respect_to_escalation_snapshot(self):
@@ -166,38 +117,52 @@ class EscalationSnapshotMixin:
@cached_property
def channel_filter_snapshot(self) -> Optional[ChannelFilterSnapshot]:
"""
in some cases we need only channel filter and don't want to serialize whole escalation
"""
escalation_snapshot = self.raw_escalation_snapshot
if not escalation_snapshot:
return None
channel_filter_snapshot = escalation_snapshot["channel_filter_snapshot"]
if not channel_filter_snapshot:
return None
channel_filter_snapshot = ChannelFilterSnapshot.serializer().to_internal_value(channel_filter_snapshot)
return ChannelFilterSnapshot(**channel_filter_snapshot)
@cached_property
def escalation_chain_snapshot(self) -> Optional[EscalationChainSnapshot]:
"""
in some cases we need only escalation chain and don't want to serialize whole escalation
"""
escalation_snapshot = self.raw_escalation_snapshot
if not escalation_snapshot:
return None
escalation_chain_snapshot = escalation_snapshot["escalation_chain_snapshot"]
if not escalation_chain_snapshot:
return None
escalation_chain_snapshot = EscalationChainSnapshot.serializer().to_internal_value(escalation_chain_snapshot)
return EscalationChainSnapshot(**escalation_chain_snapshot)
@cached_property
def escalation_snapshot(self) -> Optional[EscalationSnapshot]:
raw_escalation_snapshot = self.raw_escalation_snapshot
if raw_escalation_snapshot:
try:
return self._deserialize_escalation_snapshot(raw_escalation_snapshot)
except ValidationError as e:
logger.error(f"Error trying to deserialize raw escalation snapshot: {e}")
return None
@cached_property
def has_escalation_policies_snapshots(self) -> bool:
if not self.raw_escalation_snapshot:
return False
return len(self.raw_escalation_snapshot["escalation_policies_snapshots"]) > 0
def _deserialize_escalation_snapshot(self, raw_escalation_snapshot) -> EscalationSnapshot:
"""
@@ -225,20 +190,34 @@ class EscalationSnapshotMixin:
return escalation_snapshot_object
@property
def escalation_chain_exists(self) -> bool:
if self.pause_escalation:
return False
elif not self.channel_filter:
return False
return self.channel_filter.escalation_chain is not None
@property
def pause_escalation(self) -> bool:
"""
get pause_escalation field directly to avoid serialization overhead
"""
if not self.raw_escalation_snapshot:
return False
return self.raw_escalation_snapshot.get("pause_escalation", False)
@property
def next_step_eta(self) -> Optional[datetime.datetime]:
"""
get next_step_eta field directly to avoid serialization overhead
"""
if not self.raw_escalation_snapshot:
return None
raw_next_step_eta = self.raw_escalation_snapshot.get("next_step_eta")
if not raw_next_step_eta:
return None
return parse(raw_next_step_eta).replace(tzinfo=pytz.UTC)
@@ -272,13 +251,10 @@ class EscalationSnapshotMixin:
is_escalation_finished=False,
raw_escalation_snapshot=raw_escalation_snapshot,
)
if not self.pause_escalation:
calculate_escalation_finish_time.apply_async((self.pk,), immutable=True)
escalate_alert_group.apply_async((self.pk,), countdown=countdown, immutable=True, eta=eta, task_id=task_id)
def stop_escalation(self):
self.is_escalation_finished = True
self.estimate_escalation_finish_time = None
# change active_escalation_id to prevent alert escalation
self.active_escalation_id = "intentionally_stopped"
self.save(update_fields=["is_escalation_finished", "estimate_escalation_finish_time", "active_escalation_id"])
self.save(update_fields=["is_escalation_finished", "active_escalation_id"])


@@ -8,11 +8,11 @@ from apps.alerts.escalation_snapshot.serializers import (
class EscalationSnapshotSerializer(serializers.Serializer):
channel_filter_snapshot = ChannelFilterSnapshotSerializer()
escalation_chain_snapshot = EscalationChainSnapshotSerializer()
channel_filter_snapshot = ChannelFilterSnapshotSerializer(allow_null=True, default=None)
escalation_chain_snapshot = EscalationChainSnapshotSerializer(allow_null=True, default=None)
last_active_escalation_policy_order = serializers.IntegerField(allow_null=True, default=None)
escalation_policies_snapshots = EscalationPolicySnapshotSerializer(many=True)
slack_channel_id = serializers.CharField(allow_null=True)
escalation_policies_snapshots = EscalationPolicySnapshotSerializer(many=True, default=list)
slack_channel_id = serializers.CharField(allow_null=True, default=None)
pause_escalation = serializers.BooleanField(allow_null=True, default=False)
next_step_eta = serializers.DateTimeField(allow_null=True, default=None)


@@ -1,15 +1,23 @@
import logging
from typing import Optional
import typing
from celery.utils.log import get_task_logger
from django.utils import timezone
from apps.alerts.escalation_snapshot.serializers import EscalationSnapshotSerializer
from apps.alerts.escalation_snapshot.snapshot_classes.escalation_policy_snapshot import EscalationPolicySnapshot
from apps.alerts.models.alert_group_log_record import AlertGroupLogRecord
logger = get_task_logger(__name__)
logger.setLevel(logging.DEBUG)
if typing.TYPE_CHECKING:
from apps.alerts.escalation_snapshot.snapshot_classes import (
ChannelFilterSnapshot,
EscalationChainSnapshot,
EscalationPolicySnapshot,
)
from apps.alerts.models import AlertGroup
class EscalationSnapshot:
__slots__ = (
@@ -28,34 +36,34 @@ class EscalationSnapshot:
def __init__(
self,
alert_group,
channel_filter_snapshot,
escalation_chain_snapshot,
last_active_escalation_policy_order,
escalation_policies_snapshots,
slack_channel_id,
pause_escalation,
next_step_eta,
alert_group: "AlertGroup",
channel_filter_snapshot: "ChannelFilterSnapshot",
escalation_chain_snapshot: "EscalationChainSnapshot",
last_active_escalation_policy_order: int,
escalation_policies_snapshots: typing.List["EscalationPolicySnapshot"],
slack_channel_id: str,
pause_escalation: bool,
next_step_eta: typing.Optional[str],
):
self.alert_group = alert_group
self.channel_filter_snapshot = channel_filter_snapshot # ChannelFilterSnapshot object
self.escalation_chain_snapshot = escalation_chain_snapshot # EscalationChainSnapshot object
self.channel_filter_snapshot = channel_filter_snapshot
self.escalation_chain_snapshot = escalation_chain_snapshot
self.last_active_escalation_policy_order = last_active_escalation_policy_order
self.escalation_policies_snapshots = escalation_policies_snapshots # list of EscalationPolicySnapshot objects
self.escalation_policies_snapshots = escalation_policies_snapshots
self.slack_channel_id = slack_channel_id
self.pause_escalation = pause_escalation
self.next_step_eta = next_step_eta
self.stop_escalation = False
@property
def last_active_escalation_policy_snapshot(self) -> Optional[EscalationPolicySnapshot]:
def last_active_escalation_policy_snapshot(self) -> typing.Optional["EscalationPolicySnapshot"]:
order = self.last_active_escalation_policy_order
if order is None:
return None
return self.escalation_policies_snapshots[order]
@property
def next_active_escalation_policy_snapshot(self) -> Optional[EscalationPolicySnapshot]:
def next_active_escalation_policy_snapshot(self) -> typing.Optional["EscalationPolicySnapshot"]:
order = self.next_active_escalation_policy_order
if len(self.escalation_policies_snapshots) < order + 1:
next_link = None
@@ -71,6 +79,31 @@ class EscalationSnapshot:
next_order = self.last_active_escalation_policy_order + 1
return next_order
@property
def executed_escalation_policy_snapshots(self) -> typing.List["EscalationPolicySnapshot"]:
"""
Returns a list of escalation policy snapshots that have already been executed, according
to the value of last_active_escalation_policy_order
"""
if self.last_active_escalation_policy_order is None:
return []
elif self.last_active_escalation_policy_order == 0:
return [self.escalation_policies_snapshots[0]]
return self.escalation_policies_snapshots[: self.last_active_escalation_policy_order]
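The slicing in `executed_escalation_policy_snapshots` has one subtlety: a `last_active_escalation_policy_order` of `None` means nothing has executed yet, `0` means only the first policy has, and any other value slices everything before that index. A standalone sketch of the same logic (plain strings stand in for `EscalationPolicySnapshot` objects; the function name is ours):

```python
from typing import List, Optional


def executed_snapshots(snapshots: List[str], last_active_order: Optional[int]) -> List[str]:
    # mirrors the method above, including its special case for order 0
    if last_active_order is None:
        return []
    elif last_active_order == 0:
        return [snapshots[0]]
    return snapshots[:last_active_order]
```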
def next_step_eta_is_valid(self) -> typing.Union[None, bool]:
"""
`next_step_eta` should never be less than the current time (with a 5 minute buffer provided)
as this field should be updated as the escalation policy is executed over time. If it is, this means that
an escalation policy step has been missed, or is substantially delayed
if `next_step_eta` is `None` then `None` is returned, otherwise a boolean is returned
representing the result of the time comparision
"""
if self.next_step_eta is None:
return None
return self.next_step_eta > (timezone.now() - timezone.timedelta(minutes=5))
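The validity check above boils down to a pure time comparison; sketched standalone (the 5 minute buffer matches the method, the function name is ours):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

BUFFER = timedelta(minutes=5)


def eta_is_valid(next_step_eta: Optional[datetime], now: Optional[datetime] = None) -> Optional[bool]:
    # None -> nothing to validate; otherwise the ETA may lag "now" by at most BUFFER
    if next_step_eta is None:
        return None
    now = now or datetime.now(timezone.utc)
    return next_step_eta > (now - BUFFER)
```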
def save_to_alert_group(self) -> None:
self.alert_group.raw_escalation_snapshot = self.convert_to_dict()
self.alert_group.save(update_fields=["raw_escalation_snapshot"])
@@ -83,7 +116,6 @@ class EscalationSnapshot:
Executes actual escalation step and saves result of execution like stop_escalation param and eta,
that will be used for start next escalate_alert_group task.
Also updates self.last_active_escalation_policy_order if escalation step was executed.
:return: None
"""
escalation_policy_snapshot = self.next_active_escalation_policy_snapshot
if escalation_policy_snapshot is None:


@@ -134,7 +134,7 @@ class IncidentLogBuilder:
# check if escalation snapshot wasn't saved and channel filter was deleted.
# We cannot generate escalation plan in this case
escalation_snapshot = self.alert_group.escalation_snapshot
if escalation_snapshot is None:
if not self.alert_group.has_escalation_policies_snapshots:
return escalation_plan_dict
if self.alert_group.silenced_until:


@@ -13,6 +13,7 @@ from django.db import IntegrityError, models, transaction
from django.db.models import JSONField, Q, QuerySet
from django.utils import timezone
from django.utils.functional import cached_property
from django_deprecate_fields import deprecate_field
from apps.alerts.escalation_snapshot import EscalationSnapshotMixin
from apps.alerts.incident_appearance.renderers.constants import DEFAULT_BACKUP_TITLE
@@ -336,7 +337,9 @@ class AlertGroup(AlertGroupSlackRenderingMixin, EscalationSnapshotMixin, models.
maintenance_uuid = models.CharField(max_length=100, unique=True, null=True, default=None)
raw_escalation_snapshot = JSONField(null=True, default=None)
estimate_escalation_finish_time = models.DateTimeField(null=True, default=None)
# THIS FIELD IS DEPRECATED AND SHOULD EVENTUALLY BE REMOVED
estimate_escalation_finish_time = deprecate_field(models.DateTimeField(null=True, default=None))
# This field is used for constraints so we can use get_or_create() in concurrent calls
# https://docs.djangoproject.com/en/3.2/ref/models/querysets/#get-or-create
@@ -1464,14 +1467,7 @@ class AlertGroup(AlertGroupSlackRenderingMixin, EscalationSnapshotMixin, models.
def start_unsilence_task(self, countdown):
task_id = celery_uuid()
self.unsilence_task_uuid = task_id
# recalculate finish escalation time
escalation_start_time = timezone.now() + timezone.timedelta(seconds=countdown)
self.estimate_escalation_finish_time = self.calculate_eta_for_finish_escalation(
start_time=escalation_start_time
)
self.save(update_fields=["unsilence_task_uuid", "estimate_escalation_finish_time"])
self.save(update_fields=["unsilence_task_uuid"])
unsilence_task.apply_async((self.pk,), task_id=task_id, countdown=countdown)
@property


@@ -3,7 +3,6 @@ from .alert_group_web_title_cache import ( # noqa:F401
update_web_title_cache,
update_web_title_cache_for_alert_receive_channel,
)
from .calculcate_escalation_finish_time import calculate_escalation_finish_time # noqa
from .call_ack_url import call_ack_url # noqa: F401
from .check_escalation_finished import check_escalation_finished_task # noqa: F401
from .create_contact_points_for_datasource import create_contact_points_for_datasource # noqa: F401


@@ -1,15 +0,0 @@
from django.apps import apps
from django.conf import settings
from common.custom_celery_tasks import shared_dedicated_queue_retry_task
@shared_dedicated_queue_retry_task(
autoretry_for=(Exception,), retry_backoff=True, max_retries=1 if settings.DEBUG else None
)
def calculate_escalation_finish_time(alert_group_pk):
AlertGroup = apps.get_model("alerts", "AlertGroup")
alert_group = AlertGroup.all_objects.filter(pk=alert_group_pk)[0]
if alert_group.escalation_snapshot:
alert_group.estimate_escalation_finish_time = alert_group.calculate_eta_for_finish_escalation()
alert_group.save(update_fields=["estimate_escalation_finish_time"])


@@ -1,48 +1,148 @@
import datetime
import typing
import requests
from celery import shared_task
from django.apps import apps
from django.conf import settings
from django.db.models import Q
from django.utils import timezone
from apps.alerts.tasks.task_logger import task_logger
from common.custom_celery_tasks import shared_dedicated_queue_retry_task
from common.database import get_random_readonly_database_key_if_present_otherwise_default
if typing.TYPE_CHECKING:
from apps.alerts.models.alert_group import AlertGroup
class AlertGroupEscalationPolicyExecutionAuditException(BaseException):
"""This exception is raised when an alert group's escalation policy did not execute execute properly for some reason"""
def send_alert_group_escalation_auditor_task_heartbeat() -> None:
heartbeat_url = settings.ALERT_GROUP_ESCALATION_AUDITOR_CELERY_TASK_HEARTBEAT_URL
if heartbeat_url:
task_logger.info(f"Sending heartbeat to configured URL: {heartbeat_url}")
requests.get(heartbeat_url).raise_for_status()
task_logger.info(f"Heartbeat successfully sent to {heartbeat_url}")
else:
task_logger.info(f"Skipping sending heartbeat as no heartbeat URL is configured")
def audit_alert_group_escalation(alert_group: "AlertGroup") -> None:
escalation_snapshot = alert_group.escalation_snapshot
alert_group_id = alert_group.id
base_msg = f"Alert group {alert_group_id}"
if not escalation_snapshot:
raise AlertGroupEscalationPolicyExecutionAuditException(
f"{base_msg} does not have an escalation snapshot associated with it, this should never occur"
)
task_logger.info(f"{base_msg} has an escalation snapshot associated with it, auditing if it executed properly")
escalation_policies_snapshots = escalation_snapshot.escalation_policies_snapshots
if not escalation_policies_snapshots:
task_logger.info(
f"{base_msg}'s escalation snapshot has an empty escalation_policies_snapshots, skipping further validation"
)
return
task_logger.info(
f"{base_msg}'s escalation snapshot has a populated escalation_policies_snapshots, continuing validation"
)
if escalation_snapshot.next_step_eta_is_valid() is False:
raise AlertGroupEscalationPolicyExecutionAuditException(
f"{base_msg}'s escalation snapshot does not have a valid next_step_eta: {escalation_snapshot.next_step_eta}"
)
task_logger.info(f"{base_msg}'s escalation snapshot has a valid next_step_eta: {escalation_snapshot.next_step_eta}")
executed_escalation_policy_snapshots = escalation_snapshot.executed_escalation_policy_snapshots
num_of_executed_escalation_policy_snapshots = len(executed_escalation_policy_snapshots)
if num_of_executed_escalation_policy_snapshots == 0:
task_logger.info(
f"{base_msg}'s escalation snapshot does not have any executed escalation policies, skipping further validation"
)
else:
task_logger.info(
f"{base_msg}'s escalation snapshot has {num_of_executed_escalation_policy_snapshots} executed escalation policies"
)
# TODO: consider adding the below checks later on. This is a bit trickier to properly audit as the
# number of log records can vary if there are any STEP_NOTIFY_IF_NUM_ALERTS_IN_TIME_WINDOW or
# STEP_REPEAT_ESCALATION_N_TIMES escalation policy steps in the escalation chain
# see conversations in the original PR (https://github.com/grafana/oncall/pull/1266) for more context on this
#
# compare number of triggered/failed alert group log records to the number of executed
# escalation policy snapshot steps
# num_of_relevant_log_records = AlertGroupLogRecord.objects.filter(
# alert_group_id=alert_group_id,
# type__in=[AlertGroupLogRecord.TYPE_ESCALATION_TRIGGERED, AlertGroupLogRecord.TYPE_ESCALATION_FAILED],
# ).count()
# if num_of_relevant_log_records < num_of_executed_escalation_policy_snapshots:
# raise AlertGroupEscalationPolicyExecutionAuditException(
# f"{base_msg}'s number of triggered/failed alert group log records ({num_of_relevant_log_records}) is less "
# f"than the number of executed escalation policy snapshot steps ({num_of_executed_escalation_policy_snapshots})"
# )
# task_logger.info(
# f"{base_msg}'s number of triggered/failed alert group log records ({num_of_relevant_log_records}) is greater "
# f"than or equal to the number of executed escalation policy snapshot steps ({num_of_executed_escalation_policy_snapshots})"
# )
task_logger.info(f"{base_msg} passed the audit checks")
def get_auditable_alert_groups_started_at_range() -> typing.Tuple[datetime.datetime, datetime.datetime]:
"""
NOTE: this started_at__range is a bit of a hack.
we wanted to avoid performing a migration on the alerts_alertgroup table to update
alert groups where raw_escalation_snapshot was None. raw_escalation_snapshot being None is a legitimate case,
where the alert group's integration does not have an escalation chain associated with it.
However, we wanted a way to be able to differentiate between "actually None" and "there was an error writing to
raw_escalation_snapshot" (as this is performed async by a celery task).
This field was updated, in the commit that added this comment, to no longer be set to None by default.
As part of this celery task we do a check that this field is in fact not None, so if we were to check older
alert groups, whose integration did not have an escalation chain at the time the alert group was created,
we would raise errors.
"""
return (datetime.datetime(2023, 3, 25), timezone.now() - timezone.timedelta(days=2))
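Django's `started_at__range=(lower, upper)` lookup is an inclusive BETWEEN, so the window above keeps alert groups started on or after the cutover date and at least 2 days ago. A plain-Python sketch of the same predicate (the helper name is ours; we use a tz-aware cutover here for comparability):

```python
from datetime import datetime, timedelta, timezone


def in_audit_window(started_at: datetime, now: datetime) -> bool:
    # mirrors started_at__range: on/after the cutover date, and at least 2 days old
    lower = datetime(2023, 3, 25, tzinfo=timezone.utc)
    upper = now - timedelta(days=2)
    return lower <= started_at <= upper
```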
# don't retry this task as the AlertGroup DB query is rather expensive
@shared_task
def check_escalation_finished_task():
"""
This task periodically checks that there are no alert groups with unfinished escalations.
TODO: QA this properly, check if new type of escalations had been added
"""
AlertGroup = apps.get_model("alerts", "AlertGroup")
AlertReceiveChannel = apps.get_model("alerts", "AlertReceiveChannel")
alert_groups = AlertGroup.all_objects.using(get_random_readonly_database_key_if_present_otherwise_default()).filter(
~Q(channel__integration=AlertReceiveChannel.INTEGRATION_MAINTENANCE),
~Q(silenced=True, silenced_until__isnull=True),  # filter silenced forever alert_groups
is_escalation_finished=False,
resolved=False,
acknowledged=False,
root_alert_group=None,
started_at__range=get_auditable_alert_groups_started_at_range(),
)
if not alert_groups.exists():
task_logger.info("There are no alert groups to audit, everything is good :)")
return
alert_group_ids_that_failed_audit: typing.List[str] = []
for alert_group in alert_groups:
try:
audit_alert_group_escalation(alert_group)
except AlertGroupEscalationPolicyExecutionAuditException:
alert_group_ids_that_failed_audit.append(str(alert_group.id))
if alert_group_ids_that_failed_audit:
raise AlertGroupEscalationPolicyExecutionAuditException(
f"The following alert group id(s) failed auditing: {', '.join(alert_group_ids_that_failed_audit)}"
)
send_alert_group_escalation_auditor_task_heartbeat()
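The failure handling in the task follows a collect-then-raise pattern: every alert group is audited, failures are accumulated, and a single summary exception is raised at the end so one bad group doesn't mask the rest. A generic sketch of the pattern (all names are ours):

```python
from typing import Callable, Iterable, List


class AuditError(Exception):
    pass


def audit_all(items: Iterable[int], audit_one: Callable[[int], None]) -> None:
    # audit every item before failing, then report all failures at once
    failed: List[str] = []
    for item in items:
        try:
            audit_one(item)
        except AuditError:
            failed.append(str(item))
    if failed:
        raise AuditError(f"The following id(s) failed auditing: {', '.join(failed)}")
```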


@@ -1,6 +1,9 @@
import datetime
import pytest
from apps.alerts.incident_appearance.templaters import AlertSlackTemplater
from apps.alerts.models import EscalationPolicy
@pytest.fixture()
@@ -9,3 +12,51 @@ def mock_alert_renderer_render_for(monkeypatch):
return "invalid_render_for"
monkeypatch.setattr(AlertSlackTemplater, "_render_for", mock_render_for)
@pytest.fixture()
def escalation_snapshot_test_setup(
make_organization_and_user,
make_user_for_organization,
make_alert_receive_channel,
make_channel_filter,
make_escalation_chain,
make_escalation_policy,
make_alert_group,
):
organization, user_1 = make_organization_and_user()
user_2 = make_user_for_organization(organization)
alert_receive_channel = make_alert_receive_channel(organization)
escalation_chain = make_escalation_chain(organization)
channel_filter = make_channel_filter(
alert_receive_channel,
escalation_chain=escalation_chain,
notification_backends={"BACKEND": {"channel_id": "abc123"}},
)
notify_to_multiple_users_step = make_escalation_policy(
escalation_chain=channel_filter.escalation_chain,
escalation_policy_step=EscalationPolicy.STEP_NOTIFY_MULTIPLE_USERS,
)
notify_to_multiple_users_step.notify_to_users_queue.set([user_1, user_2])
wait_step = make_escalation_policy(
escalation_chain=channel_filter.escalation_chain,
escalation_policy_step=EscalationPolicy.STEP_WAIT,
wait_delay=EscalationPolicy.FIFTEEN_MINUTES,
)
# random time for test
from_time = datetime.time(10, 30)
to_time = datetime.time(18, 45)
notify_if_time_step = make_escalation_policy(
escalation_chain=channel_filter.escalation_chain,
escalation_policy_step=EscalationPolicy.STEP_NOTIFY_IF_TIME,
from_time=from_time,
to_time=to_time,
)
alert_group = make_alert_group(alert_receive_channel, channel_filter=channel_filter)
alert_group.raw_escalation_snapshot = alert_group.build_raw_escalation_snapshot()
alert_group.save()
return alert_group, notify_to_multiple_users_step, wait_step, notify_if_time_step


@@ -1,45 +1,329 @@
from unittest.mock import Mock, PropertyMock, patch
import pytest
import requests
from django.test import override_settings
from django.utils import timezone
from apps.alerts.models import AlertReceiveChannel
from apps.alerts.tasks import check_escalation_finished_task
from apps.alerts.models import AlertGroup
from apps.alerts.tasks.check_escalation_finished import (
AlertGroupEscalationPolicyExecutionAuditException,
audit_alert_group_escalation,
check_escalation_finished_task,
send_alert_group_escalation_auditor_task_heartbeat,
)
MOCKED_HEARTBEAT_URL = "https://hello.com/lsdjjkf"
# def _get_relevant_log_record_type() -> int:
# return random.choice([AlertGroupLogRecord.TYPE_ESCALATION_TRIGGERED, AlertGroupLogRecord.TYPE_ESCALATION_FAILED])
def test_send_alert_group_escalation_auditor_task_heartbeat_does_not_call_the_heartbeat_url_if_one_is_not_configured():
with patch("apps.alerts.tasks.check_escalation_finished.requests") as mock_requests:
send_alert_group_escalation_auditor_task_heartbeat()
mock_requests.get.assert_not_called()
@override_settings(ALERT_GROUP_ESCALATION_AUDITOR_CELERY_TASK_HEARTBEAT_URL=MOCKED_HEARTBEAT_URL)
def test_send_alert_group_escalation_auditor_task_heartbeat_calls_the_heartbeat_url_if_one_is_configured():
with patch("apps.alerts.tasks.check_escalation_finished.requests") as mock_requests:
send_alert_group_escalation_auditor_task_heartbeat()
mock_requests.get.assert_called_once_with(MOCKED_HEARTBEAT_URL)
mock_requests.get.return_value.raise_for_status.assert_called_once_with()
@override_settings(ALERT_GROUP_ESCALATION_AUDITOR_CELERY_TASK_HEARTBEAT_URL=MOCKED_HEARTBEAT_URL)
def test_send_alert_group_escalation_auditor_task_heartbeat_raises_an_exception_if_the_heartbeat_url_request_fails():
with patch("apps.alerts.tasks.check_escalation_finished.requests") as mock_requests:
mock_response = Mock()
mock_response.status_code = 500
mock_response.raise_for_status.side_effect = requests.exceptions.HTTPError
mock_requests.get.return_value = mock_response
with pytest.raises(requests.exceptions.HTTPError):
send_alert_group_escalation_auditor_task_heartbeat()
mock_requests.get.assert_called_once_with(MOCKED_HEARTBEAT_URL)
mock_requests.get.return_value.raise_for_status.assert_called_once_with()
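The three tests above pin down the heartbeat helper's behavior: do nothing when no URL is configured, otherwise GET the URL and surface HTTP errors via `raise_for_status`. A minimal dependency-injected sketch consistent with those assertions (the `ping_heartbeat` name and the injected `http_get` parameter are illustrative, not from this PR — the real helper reads the URL from Django settings and calls `requests.get` directly):

```python
def ping_heartbeat(heartbeat_url, http_get):
    """Ping the configured heartbeat URL, if any, and surface HTTP errors.

    heartbeat_url: value of ALERT_GROUP_ESCALATION_AUDITOR_CELERY_TASK_HEARTBEAT_URL
    http_get: a callable such as requests.get (injected here to keep the sketch testable)
    """
    if not heartbeat_url:
        # no heartbeat configured: silently skip, as the first test asserts
        return False
    response = http_get(heartbeat_url)
    response.raise_for_status()  # propagate e.g. requests.exceptions.HTTPError
    return True
```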
@pytest.mark.django_db
def test_check_escalation_finished_task(
def test_audit_alert_group_escalation_raises_exception_if_the_alert_group_does_not_have_an_escalation_snapshot(
escalation_snapshot_test_setup,
):
alert_group, _, _, _ = escalation_snapshot_test_setup
alert_group.escalation_snapshot = None
with pytest.raises(AlertGroupEscalationPolicyExecutionAuditException):
audit_alert_group_escalation(alert_group)
@pytest.mark.django_db
def test_audit_alert_group_escalation_skips_further_validation_if_the_escalation_policies_snapshots_is_empty(
escalation_snapshot_test_setup,
):
alert_group, _, _, _ = escalation_snapshot_test_setup
alert_group.escalation_snapshot.escalation_policies_snapshots = []
audit_alert_group_escalation(alert_group)
alert_group.escalation_snapshot.escalation_policies_snapshots = None
audit_alert_group_escalation(alert_group)
@pytest.mark.django_db
@pytest.mark.parametrize(
"next_step_eta_is_valid_return_value,raises_exception",
[
(None, False),
(True, False),
(False, True),
],
)
def test_audit_alert_group_escalation_next_step_eta_validation(
escalation_snapshot_test_setup, next_step_eta_is_valid_return_value, raises_exception
):
alert_group, _, _, _ = escalation_snapshot_test_setup
with patch(
"apps.alerts.escalation_snapshot.snapshot_classes.escalation_snapshot.EscalationSnapshot.next_step_eta_is_valid"
) as mock_next_step_eta_is_valid:
mock_next_step_eta_is_valid.return_value = next_step_eta_is_valid_return_value
if raises_exception:
with pytest.raises(AlertGroupEscalationPolicyExecutionAuditException):
audit_alert_group_escalation(alert_group)
else:
try:
audit_alert_group_escalation(alert_group)
except AlertGroupEscalationPolicyExecutionAuditException:
pytest.fail()
mock_next_step_eta_is_valid.assert_called_once_with()
@pytest.mark.django_db
def test_audit_alert_group_escalation_no_executed_escalation_policy_snapshots(escalation_snapshot_test_setup):
alert_group, _, _, _ = escalation_snapshot_test_setup
with patch(
"apps.alerts.escalation_snapshot.snapshot_classes.escalation_snapshot.EscalationSnapshot.executed_escalation_policy_snapshots",
new_callable=PropertyMock,
) as mock_executed_escalation_policy_snapshots:
mock_executed_escalation_policy_snapshots.return_value = []
audit_alert_group_escalation(alert_group)
mock_executed_escalation_policy_snapshots.assert_called_once_with()
# # see TODO: comment in engine/apps/alerts/tasks/check_escalation_finished.py
# @pytest.mark.django_db
# def test_audit_alert_group_escalation_all_executed_escalation_policy_snapshots_have_triggered_log_records(
# escalation_snapshot_test_setup,
# make_organization_and_user,
# make_alert_group_log_record,
# ):
# _, user = make_organization_and_user()
# alert_group, _, _, _ = escalation_snapshot_test_setup
# escalation_policies_snapshots = alert_group.escalation_snapshot.escalation_policies_snapshots
# for escalation_policy_snapshot in escalation_policies_snapshots:
# escalation_policy = EscalationPolicy.objects.get(id=escalation_policy_snapshot.id)
# log_record_type = _get_relevant_log_record_type()
# make_alert_group_log_record(alert_group, log_record_type, user, escalation_policy=escalation_policy)
# with patch(
# "apps.alerts.escalation_snapshot.snapshot_classes.escalation_snapshot.EscalationSnapshot.executed_escalation_policy_snapshots",
# new_callable=PropertyMock,
# ) as mock_executed_escalation_policy_snapshots:
# mock_executed_escalation_policy_snapshots.return_value = escalation_policies_snapshots
# audit_alert_group_escalation(alert_group)
# mock_executed_escalation_policy_snapshots.assert_called_once_with()
# see TODO: comment in engine/apps/alerts/tasks/check_escalation_finished.py
# @pytest.mark.django_db
# def test_audit_alert_group_escalation_one_executed_escalation_policy_snapshot_does_not_have_a_triggered_log_record(
# escalation_snapshot_test_setup,
# make_organization_and_user,
# make_alert_group_log_record,
# ):
# _, user = make_organization_and_user()
# alert_group, _, _, _ = escalation_snapshot_test_setup
# escalation_policies_snapshots = alert_group.escalation_snapshot.escalation_policies_snapshots
# # let's skip creating a relevant alert group log record for the first executed escalation policy
# for idx, escalation_policy_snapshot in enumerate(escalation_policies_snapshots):
# if idx != 0:
# escalation_policy = EscalationPolicy.objects.get(id=escalation_policy_snapshot.id)
# make_alert_group_log_record(
# alert_group, _get_relevant_log_record_type(), user, escalation_policy=escalation_policy
# )
# with patch(
# "apps.alerts.escalation_snapshot.snapshot_classes.escalation_snapshot.EscalationSnapshot.executed_escalation_policy_snapshots",
# new_callable=PropertyMock,
# ) as mock_executed_escalation_policy_snapshots:
# mock_executed_escalation_policy_snapshots.return_value = escalation_policies_snapshots
# with pytest.raises(AlertGroupEscalationPolicyExecutionAuditException):
# audit_alert_group_escalation(alert_group)
# mock_executed_escalation_policy_snapshots.assert_called_once_with()
@pytest.mark.django_db
def test_check_escalation_finished_task_queries_doesnt_grab_alert_groups_outside_of_date_range(
make_organization_and_user,
make_alert_receive_channel,
make_alert_group,
):
organization, user = make_organization_and_user()
alert_receive_channel = make_alert_receive_channel(
organization, integration=AlertReceiveChannel.INTEGRATION_GRAFANA
)
alert_group = make_alert_group(alert_receive_channel)
organization, _ = make_organization_and_user()
alert_receive_channel = make_alert_receive_channel(organization)
now = timezone.now()
two_days_ago = now - timezone.timedelta(days=2)
two_days_in_future = now + timezone.timedelta(days=2)
# no escalation finish time is set; it seems we cannot calculate it because the escalation chain snapshot
# has uncalculated steps or does not exist, so no exception is raised
check_escalation_finished_task()
# we can't simply pass started_at to the fixture because started_at is auto-set on the model
alert_group1 = make_alert_group(alert_receive_channel)
alert_group1.started_at = now
# this is an acceptable escalation finish time because there is a 5-minute tolerance from now, so no exception is raised
alert_group.estimate_escalation_finish_time = now
alert_group.save()
check_escalation_finished_task()
alert_group2 = make_alert_group(alert_receive_channel)
alert_group2.started_at = now - timezone.timedelta(days=5)
# this is an acceptable escalation finish time, so no exception is raised
alert_group.estimate_escalation_finish_time = now + timezone.timedelta(minutes=10)
alert_group.save()
check_escalation_finished_task()
alert_group3 = make_alert_group(alert_receive_channel)
alert_group3.started_at = now + timezone.timedelta(days=5)
# the escalation is not finished and more than 5 minutes have passed since the estimated time, so an exception is raised
alert_group.estimate_escalation_finish_time = now - timezone.timedelta(minutes=10)
alert_group.save()
with pytest.raises(Exception):
check_escalation_finished_task()
AlertGroup.all_objects.bulk_update([alert_group1, alert_group2, alert_group3], ["started_at"])
# the escalation is finished, so its finish time no longer matters and no exception is raised
alert_group.is_escalation_finished = True
alert_group.save()
check_escalation_finished_task()
with patch(
"apps.alerts.tasks.check_escalation_finished.get_auditable_alert_groups_started_at_range"
) as mocked_get_auditable_alert_groups_started_at_range:
with patch(
"apps.alerts.tasks.check_escalation_finished.audit_alert_group_escalation"
) as mocked_audit_alert_group_escalation:
with patch(
"apps.alerts.tasks.check_escalation_finished.send_alert_group_escalation_auditor_task_heartbeat"
) as mocked_send_alert_group_escalation_auditor_task_heartbeat:
mocked_get_auditable_alert_groups_started_at_range.return_value = (two_days_ago, two_days_in_future)
check_escalation_finished_task()
mocked_audit_alert_group_escalation.assert_called_once_with(alert_group1)
mocked_send_alert_group_escalation_auditor_task_heartbeat.assert_called_once_with()
@pytest.mark.django_db
def test_check_escalation_finished_task_calls_audit_alert_group_escalation_for_every_alert_group(
make_organization_and_user,
make_alert_receive_channel,
make_alert_group,
):
organization, _ = make_organization_and_user()
alert_receive_channel = make_alert_receive_channel(organization)
now = timezone.now()
two_days_ago = now - timezone.timedelta(days=2)
two_days_in_future = now + timezone.timedelta(days=2)
# we can't simply pass started_at to the fixture because started_at is auto-set on the model
alert_group1 = make_alert_group(alert_receive_channel)
alert_group1.started_at = now
alert_group2 = make_alert_group(alert_receive_channel)
alert_group2.started_at = now
alert_group3 = make_alert_group(alert_receive_channel)
alert_group3.started_at = now
AlertGroup.all_objects.bulk_update([alert_group1, alert_group2, alert_group3], ["started_at"])
with patch(
"apps.alerts.tasks.check_escalation_finished.get_auditable_alert_groups_started_at_range"
) as mocked_get_auditable_alert_groups_started_at_range:
with patch(
"apps.alerts.tasks.check_escalation_finished.audit_alert_group_escalation"
) as mocked_audit_alert_group_escalation:
with patch(
"apps.alerts.tasks.check_escalation_finished.send_alert_group_escalation_auditor_task_heartbeat"
) as mocked_send_alert_group_escalation_auditor_task_heartbeat:
mocked_get_auditable_alert_groups_started_at_range.return_value = (two_days_ago, two_days_in_future)
check_escalation_finished_task()
mocked_audit_alert_group_escalation.assert_any_call(alert_group1)
mocked_audit_alert_group_escalation.assert_any_call(alert_group2)
mocked_audit_alert_group_escalation.assert_any_call(alert_group3)
mocked_send_alert_group_escalation_auditor_task_heartbeat.assert_called_once_with()
@pytest.mark.django_db
def test_check_escalation_finished_task_simply_calls_heartbeat_when_no_alert_groups_found():
with patch(
"apps.alerts.tasks.check_escalation_finished.audit_alert_group_escalation"
) as mocked_audit_alert_group_escalation:
with patch(
"apps.alerts.tasks.check_escalation_finished.send_alert_group_escalation_auditor_task_heartbeat"
) as mocked_send_alert_group_escalation_auditor_task_heartbeat:
check_escalation_finished_task()
mocked_audit_alert_group_escalation.assert_not_called()
mocked_send_alert_group_escalation_auditor_task_heartbeat.assert_called_once_with()
@pytest.mark.django_db
def test_check_escalation_finished_task_calls_audit_alert_group_escalation_for_every_alert_group_even_if_one_fails(
make_organization_and_user,
make_alert_receive_channel,
make_alert_group,
):
organization, _ = make_organization_and_user()
alert_receive_channel = make_alert_receive_channel(organization)
now = timezone.now()
two_days_ago = now - timezone.timedelta(days=2)
two_days_in_future = now + timezone.timedelta(days=2)
# we can't simply pass started_at to the fixture because started_at is auto-set on the model
alert_group1 = make_alert_group(alert_receive_channel)
alert_group1.started_at = now
alert_group2 = make_alert_group(alert_receive_channel)
alert_group2.started_at = now
alert_group3 = make_alert_group(alert_receive_channel)
alert_group3.started_at = now
AlertGroup.all_objects.bulk_update([alert_group1, alert_group2, alert_group3], ["started_at"])
def _mocked_audit_alert_group_escalation(alert_group):
if alert_group.id != alert_group3.id:
raise AlertGroupEscalationPolicyExecutionAuditException("asdfasdf")
with patch(
"apps.alerts.tasks.check_escalation_finished.get_auditable_alert_groups_started_at_range"
) as mocked_get_auditable_alert_groups_started_at_range:
with patch(
"apps.alerts.tasks.check_escalation_finished.audit_alert_group_escalation"
) as mocked_audit_alert_group_escalation:
with patch(
"apps.alerts.tasks.check_escalation_finished.send_alert_group_escalation_auditor_task_heartbeat"
) as mocked_send_alert_group_escalation_auditor_task_heartbeat:
mocked_get_auditable_alert_groups_started_at_range.return_value = (two_days_ago, two_days_in_future)
mocked_audit_alert_group_escalation.side_effect = _mocked_audit_alert_group_escalation
with pytest.raises(AlertGroupEscalationPolicyExecutionAuditException) as exc:
check_escalation_finished_task()
assert (
str(exc.value)
== f"The following alert group id(s) failed auditing: {alert_group1.id}, {alert_group2.id}"
)
mocked_audit_alert_group_escalation.assert_any_call(alert_group1)
mocked_audit_alert_group_escalation.assert_any_call(alert_group2)
mocked_audit_alert_group_escalation.assert_any_call(alert_group3)
mocked_send_alert_group_escalation_auditor_task_heartbeat.assert_not_called()
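This last test pins down the task's failure-aggregation behavior: every alert group is audited even when earlier ones fail, the failing ids are collected, and one summary exception is raised at the end (skipping the heartbeat). A standalone sketch of that loop (the `run_audit` name and `AuditException` stand-in are illustrative; the real task uses `AlertGroupEscalationPolicyExecutionAuditException` and pings the heartbeat on success):

```python
class AuditException(Exception):
    """Stand-in for AlertGroupEscalationPolicyExecutionAuditException."""

def run_audit(alert_group_ids, audit_one):
    """Audit every alert group, collecting failures instead of stopping at the first one."""
    failed = []
    for alert_group_id in alert_group_ids:
        try:
            audit_one(alert_group_id)
        except AuditException:
            failed.append(str(alert_group_id))
    if failed:
        # single summary exception, matching the message asserted in the test above
        raise AuditException(
            f"The following alert group id(s) failed auditing: {', '.join(failed)}"
        )
```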


@@ -134,7 +134,7 @@ def test_escalation_step_notify_multiple_users(
escalation_step_test_setup,
make_escalation_policy,
):
organization, user, _, channel_filter, alert_group, reason = escalation_step_test_setup
_, user, _, channel_filter, alert_group, reason = escalation_step_test_setup
notify_users_step = make_escalation_policy(
escalation_chain=channel_filter.escalation_chain,
@@ -386,7 +386,7 @@ def test_escalation_step_notify_if_num_alerts_in_window(
).exists()
assert not mocked_execute_tasks.called
organization, user, _, channel_filter, alert_group, reason = escalation_step_test_setup
_, _, _, channel_filter, alert_group, reason = escalation_step_test_setup
make_alert(alert_group=alert_group, raw_request_data={})


@@ -1,5 +1,3 @@
import datetime
import pytest
from django.utils import timezone
@@ -11,54 +9,6 @@ from apps.alerts.escalation_snapshot.snapshot_classes import (
from apps.alerts.models import EscalationPolicy
@pytest.fixture()
def escalation_snapshot_test_setup(
make_organization_and_user,
make_user_for_organization,
make_alert_receive_channel,
make_channel_filter,
make_escalation_chain,
make_escalation_policy,
make_alert_group,
):
organization, user_1 = make_organization_and_user()
user_2 = make_user_for_organization(organization)
alert_receive_channel = make_alert_receive_channel(organization)
escalation_chain = make_escalation_chain(organization)
channel_filter = make_channel_filter(
alert_receive_channel,
escalation_chain=escalation_chain,
notification_backends={"BACKEND": {"channel_id": "abc123"}},
)
notify_to_multiple_users_step = make_escalation_policy(
escalation_chain=channel_filter.escalation_chain,
escalation_policy_step=EscalationPolicy.STEP_NOTIFY_MULTIPLE_USERS,
)
notify_to_multiple_users_step.notify_to_users_queue.set([user_1, user_2])
wait_step = make_escalation_policy(
escalation_chain=channel_filter.escalation_chain,
escalation_policy_step=EscalationPolicy.STEP_WAIT,
wait_delay=EscalationPolicy.FIFTEEN_MINUTES,
)
# random time for test
from_time = datetime.time(10, 30)
to_time = datetime.time(18, 45)
notify_if_time_step = make_escalation_policy(
escalation_chain=channel_filter.escalation_chain,
escalation_policy_step=EscalationPolicy.STEP_NOTIFY_IF_TIME,
from_time=from_time,
to_time=to_time,
)
alert_group = make_alert_group(alert_receive_channel, channel_filter=channel_filter)
alert_group.raw_escalation_snapshot = alert_group.build_raw_escalation_snapshot()
alert_group.save()
return alert_group, notify_to_multiple_users_step, wait_step, notify_if_time_step
@pytest.mark.django_db
def test_raw_escalation_snapshot(escalation_snapshot_test_setup):
alert_group, notify_to_multiple_users_step, wait_step, notify_if_time_step = escalation_snapshot_test_setup
@@ -142,7 +92,7 @@ def test_raw_escalation_snapshot(escalation_snapshot_test_setup):
@pytest.mark.django_db
def test_serialized_escalation_snapshot(escalation_snapshot_test_setup):
alert_group, notify_to_multiple_users_step, wait_step, notify_if_time_step = escalation_snapshot_test_setup
alert_group, _, _, _ = escalation_snapshot_test_setup
escalation_snapshot = alert_group.escalation_snapshot
assert isinstance(escalation_snapshot, EscalationSnapshot)
assert escalation_snapshot.channel_filter_snapshot is not None and isinstance(
@@ -163,7 +113,7 @@ def test_serialized_escalation_snapshot(escalation_snapshot_test_setup):
@pytest.mark.django_db
def test_escalation_snapshot_with_deleted_channel_filter(escalation_snapshot_test_setup):
alert_group, notify_to_multiple_users_step, wait_step, notify_if_time_step = escalation_snapshot_test_setup
alert_group, _, _, _ = escalation_snapshot_test_setup
alert_group.channel_filter.delete()
escalation_snapshot = alert_group.escalation_snapshot
@@ -174,7 +124,7 @@ def test_escalation_snapshot_with_deleted_channel_filter(escalation_snapshot_tes
@pytest.mark.django_db
def test_change_escalation_snapshot(escalation_snapshot_test_setup):
alert_group, notify_to_multiple_users_step, wait_step, notify_if_time_step = escalation_snapshot_test_setup
alert_group, _, _, _ = escalation_snapshot_test_setup
new_active_order = 2
now = timezone.now()
@@ -194,7 +144,7 @@ def test_change_escalation_snapshot(escalation_snapshot_test_setup):
@pytest.mark.django_db
def test_next_escalation_policy_snapshot(escalation_snapshot_test_setup):
alert_group, notify_to_multiple_users_step, wait_step, notify_if_time_step = escalation_snapshot_test_setup
alert_group, _, _, _ = escalation_snapshot_test_setup
escalation_snapshot = alert_group.escalation_snapshot
assert escalation_snapshot.last_active_escalation_policy_order is None
@@ -226,3 +176,39 @@ def test_next_escalation_policy_snapshot(escalation_snapshot_test_setup):
is escalation_snapshot.escalation_policies_snapshots[-1]
)
assert escalation_snapshot.next_active_escalation_policy_snapshot is None
@pytest.mark.django_db
@pytest.mark.parametrize(
"next_step_eta,expected",
[
(None, None),
(timezone.now() - timezone.timedelta(weeks=50), False),
(timezone.now() - timezone.timedelta(minutes=4), True),
(timezone.now() + timezone.timedelta(minutes=4), True),
],
)
def test_next_step_eta_is_valid(escalation_snapshot_test_setup, next_step_eta, expected) -> None:
alert_group, _, _, _ = escalation_snapshot_test_setup
escalation_snapshot = alert_group.escalation_snapshot
escalation_snapshot.next_step_eta = next_step_eta
assert escalation_snapshot.next_step_eta_is_valid() is expected
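The parametrization above implies the validation rule: no ETA means there is nothing to validate (`None`), and an ETA is valid unless it lies further in the past than some tolerance. A sketch consistent with those four cases (the 5-minute tolerance is an assumption — the `minutes=4` and `weeks=50` cases only bound it — and the real method is on `EscalationSnapshot`, not a free function):

```python
import datetime

# assumed tolerance for how far in the past a pending step's ETA may drift
AUDIT_TOLERANCE = datetime.timedelta(minutes=5)

def next_step_eta_is_valid(next_step_eta, now=None):
    """Return None when there is no pending step, else whether its ETA is still plausible."""
    if next_step_eta is None:
        return None
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    # valid as long as the ETA is not more than AUDIT_TOLERANCE in the past
    return next_step_eta + AUDIT_TOLERANCE > now
```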
@pytest.mark.django_db
def test_executed_escalation_policy_snapshots(escalation_snapshot_test_setup):
alert_group, _, _, _ = escalation_snapshot_test_setup
escalation_snapshot = alert_group.escalation_snapshot
escalation_snapshot.last_active_escalation_policy_order = None
assert escalation_snapshot.executed_escalation_policy_snapshots == []
escalation_snapshot.last_active_escalation_policy_order = 0
assert escalation_snapshot.executed_escalation_policy_snapshots == [
escalation_snapshot.escalation_policies_snapshots[0]
]
escalation_snapshot.last_active_escalation_policy_order = len(escalation_snapshot.escalation_policies_snapshots)
assert escalation_snapshot.executed_escalation_policy_snapshots == escalation_snapshot.escalation_policies_snapshots
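These three assertions fully determine the property: it is the prefix of `escalation_policies_snapshots` up to and including `last_active_escalation_policy_order`. A sketch as a plain function (the real code is a property on `EscalationSnapshot`; the parameter names merely mirror its attributes):

```python
def executed_escalation_policy_snapshots(policies_snapshots, last_active_order):
    """Return the snapshots of the escalation policies that have already been executed."""
    if last_active_order is None:
        # escalation has not started yet, so nothing has executed
        return []
    # the policy at last_active_order has been executed, hence the inclusive slice
    return policies_snapshots[: last_active_order + 1]
```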


@@ -0,0 +1,658 @@
from unittest.mock import PropertyMock, patch
import pytest
import pytz
from rest_framework.exceptions import ValidationError
from apps.alerts.escalation_snapshot.snapshot_classes import EscalationSnapshot
from apps.alerts.models import EscalationPolicy
MOCK_SLACK_CHANNEL_ID = "asdfljaskdf"
EMPTY_RAW_ESCALATION_SNAPSHOT = {
"channel_filter_snapshot": None,
"escalation_chain_snapshot": None,
"last_active_escalation_policy_order": None,
"escalation_policies_snapshots": [],
"slack_channel_id": None,
"pause_escalation": False,
"next_step_eta": None,
}
@patch("apps.alerts.models.alert_group.AlertGroup.slack_channel_id", new_callable=PropertyMock)
@pytest.mark.django_db
def test_build_raw_escalation_snapshot_escalation_chain_exists(
mock_alert_group_slack_channel_id,
make_organization_and_user,
make_alert_receive_channel,
make_channel_filter,
make_escalation_chain,
make_escalation_policy,
make_alert_group,
):
mock_alert_group_slack_channel_id.return_value = MOCK_SLACK_CHANNEL_ID
organization, _ = make_organization_and_user()
alert_receive_channel = make_alert_receive_channel(organization)
escalation_chain = make_escalation_chain(organization=organization)
channel_filter = make_channel_filter(alert_receive_channel, escalation_chain=escalation_chain)
make_escalation_policy(
escalation_chain=channel_filter.escalation_chain,
escalation_policy_step=EscalationPolicy.STEP_WAIT,
wait_delay=EscalationPolicy.FIFTEEN_MINUTES,
)
alert_group = make_alert_group(alert_receive_channel, channel_filter=channel_filter)
expected_snapshot = EscalationSnapshot.serializer(
{
"channel_filter_snapshot": alert_group.channel_filter,
"escalation_chain_snapshot": alert_group.channel_filter.escalation_chain,
"escalation_policies_snapshots": alert_group.channel_filter.escalation_chain.escalation_policies.all(),
"slack_channel_id": MOCK_SLACK_CHANNEL_ID,
}
)
assert alert_group.build_raw_escalation_snapshot() == expected_snapshot.data
@patch(
"apps.alerts.escalation_snapshot.escalation_snapshot_mixin.EscalationSnapshotMixin.pause_escalation",
new_callable=PropertyMock,
)
@pytest.mark.django_db
def test_build_raw_escalation_snapshot_escalation_chain_does_not_exist_escalation_paused(
mocked_pause_escalation,
make_organization_and_user,
make_alert_receive_channel,
make_channel_filter,
make_escalation_chain,
make_alert_group,
):
mocked_pause_escalation.return_value = True
organization, _ = make_organization_and_user()
alert_receive_channel = make_alert_receive_channel(organization)
escalation_chain = make_escalation_chain(organization=organization)
channel_filter = make_channel_filter(alert_receive_channel, escalation_chain=escalation_chain)
alert_group = make_alert_group(alert_receive_channel, channel_filter=channel_filter)
assert alert_group.build_raw_escalation_snapshot() == EMPTY_RAW_ESCALATION_SNAPSHOT
@pytest.mark.django_db
def test_build_raw_escalation_snapshot_escalation_chain_does_not_exist_no_channel_filter(
make_organization_and_user,
make_alert_receive_channel,
make_alert_group,
):
organization, _ = make_organization_and_user()
alert_receive_channel = make_alert_receive_channel(organization)
alert_group = make_alert_group(alert_receive_channel)
assert alert_group.build_raw_escalation_snapshot() == EMPTY_RAW_ESCALATION_SNAPSHOT
@pytest.mark.django_db
def test_build_raw_escalation_snapshot_escalation_chain_does_not_exist_no_channel_filter_escalation_chain(
make_organization_and_user,
make_alert_receive_channel,
make_channel_filter,
make_alert_group,
):
organization, _ = make_organization_and_user()
alert_receive_channel = make_alert_receive_channel(organization)
channel_filter = make_channel_filter(alert_receive_channel)
alert_group = make_alert_group(alert_receive_channel, channel_filter=channel_filter)
assert alert_group.build_raw_escalation_snapshot() == EMPTY_RAW_ESCALATION_SNAPSHOT
@patch(
"apps.alerts.escalation_snapshot.escalation_snapshot_mixin.EscalationSnapshotMixin.channel_filter_snapshot",
new_callable=PropertyMock,
)
@pytest.mark.django_db
def test_channel_filter_with_respect_to_escalation_snapshot(
mock_channel_filter_snapshot,
make_organization_and_user,
make_alert_receive_channel,
make_channel_filter,
make_alert_group,
):
channel_filter_snapshot = "asdfasdfadsfadsf"
mock_channel_filter_snapshot.return_value = channel_filter_snapshot
organization, _ = make_organization_and_user()
alert_receive_channel = make_alert_receive_channel(organization)
channel_filter = make_channel_filter(alert_receive_channel)
alert_group = make_alert_group(alert_receive_channel, channel_filter=channel_filter)
assert alert_group.channel_filter_with_respect_to_escalation_snapshot == channel_filter_snapshot
@patch(
"apps.alerts.escalation_snapshot.escalation_snapshot_mixin.EscalationSnapshotMixin.channel_filter_snapshot",
new_callable=PropertyMock,
)
@pytest.mark.django_db
def test_channel_filter_with_respect_to_escalation_snapshot_no_channel_filter_snapshot(
mock_channel_filter_snapshot,
make_organization_and_user,
make_alert_receive_channel,
make_channel_filter,
make_alert_group,
):
mock_channel_filter_snapshot.return_value = None
organization, _ = make_organization_and_user()
alert_receive_channel = make_alert_receive_channel(organization)
channel_filter = make_channel_filter(alert_receive_channel)
alert_group = make_alert_group(alert_receive_channel, channel_filter=channel_filter)
assert alert_group.channel_filter_with_respect_to_escalation_snapshot == channel_filter
@patch(
"apps.alerts.escalation_snapshot.escalation_snapshot_mixin.EscalationSnapshotMixin.escalation_chain_snapshot",
new_callable=PropertyMock,
)
@pytest.mark.django_db
def test_escalation_chain_with_respect_to_escalation_snapshot(
mock_escalation_chain_snapshot,
make_organization_and_user,
make_alert_receive_channel,
make_channel_filter,
make_alert_group,
):
escalation_chain_snapshot = "asdfasdfadsfadsf"
mock_escalation_chain_snapshot.return_value = escalation_chain_snapshot
organization, _ = make_organization_and_user()
alert_receive_channel = make_alert_receive_channel(organization)
channel_filter = make_channel_filter(alert_receive_channel)
alert_group = make_alert_group(alert_receive_channel, channel_filter=channel_filter)
assert alert_group.escalation_chain_with_respect_to_escalation_snapshot == escalation_chain_snapshot
@patch(
"apps.alerts.escalation_snapshot.escalation_snapshot_mixin.EscalationSnapshotMixin.escalation_chain_snapshot",
new_callable=PropertyMock,
)
@pytest.mark.django_db
def test_escalation_chain_with_respect_to_escalation_snapshot_no_escalation_chain_snapshot(
mock_escalation_chain_snapshot,
make_organization_and_user,
make_alert_receive_channel,
make_channel_filter,
make_escalation_chain,
make_alert_group,
):
mock_escalation_chain_snapshot.return_value = None
organization, _ = make_organization_and_user()
alert_receive_channel = make_alert_receive_channel(organization)
escalation_chain = make_escalation_chain(organization=organization)
channel_filter = make_channel_filter(alert_receive_channel, escalation_chain=escalation_chain)
alert_group = make_alert_group(alert_receive_channel, channel_filter=channel_filter)
assert alert_group.escalation_chain_with_respect_to_escalation_snapshot == escalation_chain
alert_group = make_alert_group(alert_receive_channel)
assert alert_group.channel_filter is None
assert alert_group.escalation_chain_with_respect_to_escalation_snapshot is None
@patch(
"apps.alerts.escalation_snapshot.escalation_snapshot_mixin.EscalationSnapshotMixin.escalation_chain_snapshot",
new_callable=PropertyMock,
)
@pytest.mark.django_db
def test_escalation_chain_with_respect_to_escalation_snapshot_no_escalation_chain_snapshot_and_no_channel_filter(
mock_escalation_chain_snapshot,
make_organization_and_user,
make_alert_receive_channel,
make_alert_group,
):
mock_escalation_chain_snapshot.return_value = None
organization, _ = make_organization_and_user()
alert_receive_channel = make_alert_receive_channel(organization)
alert_group = make_alert_group(alert_receive_channel)
assert alert_group.escalation_chain_with_respect_to_escalation_snapshot is None
@pytest.mark.django_db
def test_channel_filter_snapshot(
make_organization_and_user,
make_alert_receive_channel,
make_channel_filter,
make_escalation_chain,
make_escalation_policy,
make_alert_group,
):
organization, _ = make_organization_and_user()
alert_receive_channel = make_alert_receive_channel(organization)
escalation_chain = make_escalation_chain(organization=organization)
channel_filter = make_channel_filter(alert_receive_channel, escalation_chain=escalation_chain)
make_escalation_policy(
        escalation_chain=channel_filter.escalation_chain,
        escalation_policy_step=EscalationPolicy.STEP_WAIT,
        wait_delay=EscalationPolicy.FIFTEEN_MINUTES,
    )
    alert_group = make_alert_group(alert_receive_channel, channel_filter=channel_filter)
    alert_group.raw_escalation_snapshot = alert_group.build_raw_escalation_snapshot()

    assert alert_group.channel_filter_snapshot.id == channel_filter.id


@pytest.mark.django_db
def test_channel_filter_snapshot_no_escalation_chain_exists(
    make_organization_and_user,
    make_alert_receive_channel,
    make_channel_filter,
    make_alert_group,
):
    organization, _ = make_organization_and_user()
    alert_receive_channel = make_alert_receive_channel(organization)
    channel_filter = make_channel_filter(alert_receive_channel)
    alert_group = make_alert_group(alert_receive_channel, channel_filter=channel_filter)
    alert_group.raw_escalation_snapshot = alert_group.build_raw_escalation_snapshot()

    assert alert_group.raw_escalation_snapshot["channel_filter_snapshot"] is None
    assert alert_group.channel_filter_snapshot is None


@pytest.mark.django_db
def test_channel_filter_snapshot_no_alert_group_raw_escalation_snapshot(
    make_organization_and_user,
    make_alert_receive_channel,
    make_channel_filter,
    make_alert_group,
):
    organization, _ = make_organization_and_user()
    alert_receive_channel = make_alert_receive_channel(organization)
    channel_filter = make_channel_filter(alert_receive_channel)
    alert_group = make_alert_group(alert_receive_channel, channel_filter=channel_filter)

    assert alert_group.channel_filter_snapshot is None


@pytest.mark.django_db
def test_escalation_chain_snapshot(
    make_organization_and_user,
    make_alert_receive_channel,
    make_channel_filter,
    make_escalation_chain,
    make_escalation_policy,
    make_alert_group,
):
    organization, _ = make_organization_and_user()
    alert_receive_channel = make_alert_receive_channel(organization)
    escalation_chain = make_escalation_chain(organization=organization)
    channel_filter = make_channel_filter(alert_receive_channel, escalation_chain=escalation_chain)
    make_escalation_policy(
        escalation_chain=channel_filter.escalation_chain,
        escalation_policy_step=EscalationPolicy.STEP_WAIT,
        wait_delay=EscalationPolicy.FIFTEEN_MINUTES,
    )
    alert_group = make_alert_group(alert_receive_channel, channel_filter=channel_filter)
    alert_group.raw_escalation_snapshot = alert_group.build_raw_escalation_snapshot()

    assert alert_group.escalation_chain_snapshot.id == escalation_chain.id


@pytest.mark.django_db
def test_escalation_chain_snapshot_no_escalation_chain_exists(
    make_organization_and_user,
    make_alert_receive_channel,
    make_channel_filter,
    make_alert_group,
):
    organization, _ = make_organization_and_user()
    alert_receive_channel = make_alert_receive_channel(organization)
    channel_filter = make_channel_filter(alert_receive_channel)
    alert_group = make_alert_group(alert_receive_channel, channel_filter=channel_filter)
    alert_group.raw_escalation_snapshot = alert_group.build_raw_escalation_snapshot()

    assert alert_group.raw_escalation_snapshot["escalation_chain_snapshot"] is None
    assert alert_group.escalation_chain_snapshot is None


@pytest.mark.django_db
def test_escalation_chain_snapshot_no_alert_group_raw_escalation_snapshot(
    make_organization_and_user,
    make_alert_receive_channel,
    make_channel_filter,
    make_alert_group,
):
    organization, _ = make_organization_and_user()
    alert_receive_channel = make_alert_receive_channel(organization)
    channel_filter = make_channel_filter(alert_receive_channel)
    alert_group = make_alert_group(alert_receive_channel, channel_filter=channel_filter)

    assert alert_group.escalation_chain_snapshot is None


@pytest.mark.django_db
def test_escalation_snapshot(
    make_organization_and_user,
    make_alert_receive_channel,
    make_channel_filter,
    make_alert_group,
):
    organization, _ = make_organization_and_user()
    alert_receive_channel = make_alert_receive_channel(organization)
    channel_filter = make_channel_filter(alert_receive_channel)
    alert_group = make_alert_group(alert_receive_channel, channel_filter=channel_filter)
    alert_group.raw_escalation_snapshot = alert_group.build_raw_escalation_snapshot()

    return_value = "asdfasdfasdf"
    with patch(
        "apps.alerts.escalation_snapshot.escalation_snapshot_mixin.EscalationSnapshotMixin._deserialize_escalation_snapshot",
        return_value=return_value,
    ):
        assert alert_group.escalation_snapshot == return_value


@pytest.mark.django_db
def test_escalation_snapshot_validation_error(
    make_organization_and_user,
    make_alert_receive_channel,
    make_channel_filter,
    make_alert_group,
):
    organization, _ = make_organization_and_user()
    alert_receive_channel = make_alert_receive_channel(organization)
    channel_filter = make_channel_filter(alert_receive_channel)
    alert_group = make_alert_group(alert_receive_channel, channel_filter=channel_filter)
    alert_group.raw_escalation_snapshot = alert_group.build_raw_escalation_snapshot()

    with patch(
        "apps.alerts.escalation_snapshot.escalation_snapshot_mixin.EscalationSnapshotMixin._deserialize_escalation_snapshot",
        side_effect=ValidationError("asdfasdf"),
    ):
        assert alert_group.escalation_snapshot is None


@pytest.mark.django_db
def test_escalation_snapshot_no_alert_group_raw_escalation_snapshot(
    make_organization_and_user,
    make_alert_receive_channel,
    make_channel_filter,
    make_alert_group,
):
    organization, _ = make_organization_and_user()
    alert_receive_channel = make_alert_receive_channel(organization)
    channel_filter = make_channel_filter(alert_receive_channel)
    alert_group = make_alert_group(alert_receive_channel, channel_filter=channel_filter)

    assert alert_group.escalation_snapshot is None


@pytest.mark.django_db
def test_escalation_snapshot_empty_escalation_policies_snapshot(
    make_organization_and_user,
    make_alert_receive_channel,
    make_channel_filter,
    make_alert_group,
):
    organization, _ = make_organization_and_user()
    alert_receive_channel = make_alert_receive_channel(organization)
    channel_filter = make_channel_filter(alert_receive_channel)
    alert_group = make_alert_group(alert_receive_channel, channel_filter=channel_filter)
    alert_group.raw_escalation_snapshot = alert_group.build_raw_escalation_snapshot()

    assert alert_group.raw_escalation_snapshot is not None
    assert alert_group.has_escalation_policies_snapshots is False


@pytest.mark.django_db
def test_escalation_snapshot_nonempty_escalation_policies_snapshot(
    make_organization_and_user,
    make_alert_receive_channel,
    make_channel_filter,
    make_escalation_chain,
    make_escalation_policy,
    make_alert_group,
):
    organization, _ = make_organization_and_user()
    alert_receive_channel = make_alert_receive_channel(organization)
    escalation_chain = make_escalation_chain(organization=organization)
    channel_filter = make_channel_filter(alert_receive_channel, escalation_chain=escalation_chain)
    make_escalation_policy(
        escalation_chain=channel_filter.escalation_chain,
        escalation_policy_step=EscalationPolicy.STEP_WAIT,
        wait_delay=EscalationPolicy.FIFTEEN_MINUTES,
    )
    alert_group = make_alert_group(alert_receive_channel, channel_filter=channel_filter)
    alert_group.raw_escalation_snapshot = alert_group.build_raw_escalation_snapshot()

    assert alert_group.raw_escalation_snapshot is not None
    assert alert_group.has_escalation_policies_snapshots is True


@pytest.mark.django_db
def test_has_escalation_policies_snapshots_no_alert_group_raw_escalation_snapshot(
    make_organization_and_user,
    make_alert_receive_channel,
    make_channel_filter,
    make_alert_group,
):
    organization, _ = make_organization_and_user()
    alert_receive_channel = make_alert_receive_channel(organization)
    channel_filter = make_channel_filter(alert_receive_channel)
    alert_group = make_alert_group(alert_receive_channel, channel_filter=channel_filter)

    assert alert_group.raw_escalation_snapshot is None
    assert alert_group.has_escalation_policies_snapshots is False


@patch("apps.alerts.models.alert_group.AlertGroup.slack_channel_id", new_callable=PropertyMock)
@pytest.mark.django_db
def test_deserialize_escalation_snapshot(
    mock_alert_group_slack_channel_id,
    make_organization_and_user,
    make_alert_receive_channel,
    make_channel_filter,
    make_escalation_chain,
    make_escalation_policy,
    make_alert_group,
):
    mock_alert_group_slack_channel_id.return_value = MOCK_SLACK_CHANNEL_ID

    organization, _ = make_organization_and_user()
    alert_receive_channel = make_alert_receive_channel(organization)
    escalation_chain = make_escalation_chain(organization=organization)
    channel_filter = make_channel_filter(alert_receive_channel, escalation_chain=escalation_chain)
    escalation_policy = make_escalation_policy(
        escalation_chain=channel_filter.escalation_chain,
        escalation_policy_step=EscalationPolicy.STEP_WAIT,
        wait_delay=EscalationPolicy.FIFTEEN_MINUTES,
    )
    alert_group = make_alert_group(alert_receive_channel, channel_filter=channel_filter)
    alert_group.raw_escalation_snapshot = alert_group.build_raw_escalation_snapshot()

    deserialized_escalation_snapshot = alert_group._deserialize_escalation_snapshot(alert_group.raw_escalation_snapshot)

    assert deserialized_escalation_snapshot.alert_group == alert_group
    assert deserialized_escalation_snapshot.channel_filter_snapshot.id == channel_filter.id
    assert deserialized_escalation_snapshot.escalation_chain_snapshot.id == escalation_chain.id
    assert deserialized_escalation_snapshot.last_active_escalation_policy_order is None
    assert len(deserialized_escalation_snapshot.escalation_policies_snapshots) == 1
    assert deserialized_escalation_snapshot.escalation_policies_snapshots[0].id == escalation_policy.id
    assert deserialized_escalation_snapshot.slack_channel_id == MOCK_SLACK_CHANNEL_ID
    assert deserialized_escalation_snapshot.pause_escalation is False
    assert deserialized_escalation_snapshot.next_step_eta is None
    assert deserialized_escalation_snapshot.stop_escalation is False


@pytest.mark.django_db
def test_escalation_chain_exists(
    make_organization_and_user,
    make_alert_receive_channel,
    make_channel_filter,
    make_escalation_chain,
    make_alert_group,
):
    organization, _ = make_organization_and_user()
    alert_receive_channel = make_alert_receive_channel(organization)
    escalation_chain = make_escalation_chain(organization=organization)
    channel_filter = make_channel_filter(alert_receive_channel, escalation_chain=escalation_chain)
    alert_group = make_alert_group(alert_receive_channel, channel_filter=channel_filter)

    assert alert_group.pause_escalation is False
    assert alert_group.escalation_chain_exists is True


@patch(
    "apps.alerts.escalation_snapshot.escalation_snapshot_mixin.EscalationSnapshotMixin.pause_escalation",
    new_callable=PropertyMock,
)
@pytest.mark.django_db
def test_escalation_chain_exists_paused_escalation(
    mocked_pause_escalation,
    make_organization_and_user,
    make_alert_receive_channel,
    make_alert_group,
):
    mocked_pause_escalation.return_value = True

    organization, _ = make_organization_and_user()
    alert_receive_channel = make_alert_receive_channel(organization)
    alert_group = make_alert_group(alert_receive_channel)

    assert alert_group.pause_escalation is True
    assert alert_group.escalation_chain_exists is False


@pytest.mark.django_db
def test_escalation_chain_exists_no_channel_filter(
    make_organization_and_user,
    make_alert_receive_channel,
    make_alert_group,
):
    organization, _ = make_organization_and_user()
    alert_receive_channel = make_alert_receive_channel(organization)
    alert_group = make_alert_group(alert_receive_channel)

    assert alert_group.pause_escalation is False
    assert alert_group.channel_filter is None
    assert alert_group.escalation_chain_exists is False


@pytest.mark.django_db
def test_escalation_chain_exists_no_channel_filter_escalation_chain(
    make_organization_and_user,
    make_alert_receive_channel,
    make_channel_filter,
    make_alert_group,
):
    organization, _ = make_organization_and_user()
    alert_receive_channel = make_alert_receive_channel(organization)
    channel_filter = make_channel_filter(alert_receive_channel)
    alert_group = make_alert_group(alert_receive_channel, channel_filter=channel_filter)

    assert alert_group.pause_escalation is False
    assert alert_group.channel_filter == channel_filter
    assert alert_group.channel_filter.escalation_chain is None
    assert alert_group.escalation_chain_exists is False


@pytest.mark.django_db
def test_pause_escalation_no_raw_escalation_snapshot(
    make_organization_and_user,
    make_alert_receive_channel,
    make_alert_group,
):
    organization, _ = make_organization_and_user()
    alert_receive_channel = make_alert_receive_channel(organization)
    alert_group = make_alert_group(alert_receive_channel)

    assert alert_group.raw_escalation_snapshot is None
    assert alert_group.pause_escalation is False


@pytest.mark.django_db
def test_pause_escalation_raw_escalation_snapshot_exists(
    make_organization_and_user,
    make_alert_receive_channel,
    make_alert_group,
):
    organization, _ = make_organization_and_user()
    alert_receive_channel = make_alert_receive_channel(organization)
    alert_group = make_alert_group(alert_receive_channel)
    alert_group.raw_escalation_snapshot = alert_group.build_raw_escalation_snapshot()

    assert alert_group.raw_escalation_snapshot is not None
    assert alert_group.raw_escalation_snapshot["pause_escalation"] is False

    alert_group.raw_escalation_snapshot["pause_escalation"] = True
    assert alert_group.pause_escalation is True


@pytest.mark.django_db
def test_next_step_eta_no_raw_escalation_snapshot(
    make_organization_and_user,
    make_alert_receive_channel,
    make_alert_group,
):
    organization, _ = make_organization_and_user()
    alert_receive_channel = make_alert_receive_channel(organization)
    alert_group = make_alert_group(alert_receive_channel)

    assert alert_group.raw_escalation_snapshot is None
    assert alert_group.next_step_eta is None


@pytest.mark.django_db
def test_next_step_eta_no_next_step_eta(
    make_organization_and_user,
    make_alert_receive_channel,
    make_alert_group,
):
    organization, _ = make_organization_and_user()
    alert_receive_channel = make_alert_receive_channel(organization)
    alert_group = make_alert_group(alert_receive_channel)
    alert_group.raw_escalation_snapshot = alert_group.build_raw_escalation_snapshot()

    assert alert_group.raw_escalation_snapshot is not None
    assert alert_group.raw_escalation_snapshot["next_step_eta"] is None
    assert alert_group.next_step_eta is None


@patch("apps.alerts.escalation_snapshot.escalation_snapshot_mixin.parse")
@pytest.mark.django_db
def test_next_step_eta(
    mock_dateutil_parser,
    make_organization_and_user,
    make_alert_receive_channel,
    make_alert_group,
):
    mocked_raw_date = "mcvnmcvmnvc"
    mocked_parsed_date = "asdfasdfaf"
    mock_dateutil_parser.return_value.replace.return_value = mocked_parsed_date

    organization, _ = make_organization_and_user()
    alert_receive_channel = make_alert_receive_channel(organization)
    alert_group = make_alert_group(alert_receive_channel)
    alert_group.raw_escalation_snapshot = alert_group.build_raw_escalation_snapshot()
    alert_group.raw_escalation_snapshot["next_step_eta"] = mocked_raw_date

    assert alert_group.raw_escalation_snapshot is not None
    assert alert_group.raw_escalation_snapshot["next_step_eta"] is mocked_raw_date
    assert alert_group.next_step_eta == mocked_parsed_date

    mock_dateutil_parser.assert_called_once_with(mocked_raw_date)
    mock_dateutil_parser.return_value.replace.assert_called_once_with(tzinfo=pytz.UTC)
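The `pause_escalation` and `next_step_eta` properties exercised by the last few tests follow one pattern: read a key out of `raw_escalation_snapshot` when it is present, otherwise return a safe default. A minimal sketch of that behavior (field names taken from the tests above; the real mixin lives in `escalation_snapshot_mixin.py` and uses `dateutil.parser.parse` plus `pytz.UTC`, for which this sketch substitutes stdlib equivalents so it is self-contained):

```python
from datetime import datetime, timezone


class EscalationSnapshotSketch:
    """Illustrative subset of the mixin behavior tested above (not the real class)."""

    def __init__(self, raw_escalation_snapshot=None):
        self.raw_escalation_snapshot = raw_escalation_snapshot

    @property
    def pause_escalation(self) -> bool:
        # No snapshot means escalation cannot be paused
        if not self.raw_escalation_snapshot:
            return False
        return self.raw_escalation_snapshot.get("pause_escalation", False)

    @property
    def next_step_eta(self):
        # Parse the stored timestamp as UTC, mirroring parse(raw).replace(tzinfo=pytz.UTC)
        if not self.raw_escalation_snapshot:
            return None
        raw = self.raw_escalation_snapshot.get("next_step_eta")
        if raw is None:
            return None
        return datetime.fromisoformat(raw).replace(tzinfo=timezone.utc)
```

This is why the tests can flip behavior just by mutating `raw_escalation_snapshot` in place: the properties re-read the dict on every access.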


@@ -5,11 +5,11 @@ from django.conf import settings
 def get_random_readonly_database_key_if_present_otherwise_default() -> str:
     """
-    This function returns a string, representing a key in the DATABASES django settings.
-    If settings.READONLY_DATABASES is set, and non-empty, it randomly chooses one of the read-only databases,
+    This function returns a string, representing a key in the `DATABASES` django settings.
+    If `settings.READONLY_DATABASES` is set, and non-empty, it randomly chooses one of the read-only databases,
     otherwise it falls back to "default".
-    This is primarily intended to be used for django's QuerySet.using() function
+    This is primarily intended to be used for django's `QuerySet.using()` function
     """
     using_db = "default"
     if hasattr(settings, "READONLY_DATABASES") and len(settings.READONLY_DATABASES) > 0:
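The remainder of the helper is truncated in the diff; a minimal self-contained sketch of how such a function plausibly completes (using a stand-in `settings` object instead of `django.conf.settings`, which is an assumption for illustration only):

```python
import random


class _Settings:
    """Stand-in for django.conf.settings (assumed shape, for illustration)."""

    READONLY_DATABASES = {"readonly_replica_1": {}, "readonly_replica_2": {}}


settings = _Settings()


def get_random_readonly_database_key_if_present_otherwise_default() -> str:
    # Fall back to "default" unless one or more read-only databases are configured
    using_db = "default"
    if hasattr(settings, "READONLY_DATABASES") and len(settings.READONLY_DATABASES) > 0:
        # Pick one replica at random to spread read load
        using_db = random.choice(list(settings.READONLY_DATABASES.keys()))
    return using_db
```

The returned key is then handed to `QuerySet.using()` so that heavy audit queries hit a replica rather than the primary database.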


@@ -49,9 +49,6 @@ class Command(BaseCommand):
                    alert_group.unsilence_task_uuid = task_id
                    escalation_start_time = max(now, alert_group.silenced_until)
-                   alert_group.estimate_escalation_finish_time = alert_group.calculate_eta_for_finish_escalation(
-                       start_time=escalation_start_time,
-                   )
                    alert_groups_to_update.append(alert_group)
                    tasks.append(
@@ -65,9 +62,6 @@
                # otherwise start escalate_alert_group task
                else:
                    if alert_group.escalation_snapshot:
-                       alert_group.estimate_escalation_finish_time = alert_group.calculate_eta_for_finish_escalation(
-                           escalation_started=True,
-                       )
                        alert_group.active_escalation_id = task_id
                        alert_groups_to_update.append(alert_group)
@@ -82,7 +76,7 @@
        AlertGroup.all_objects.bulk_update(
            alert_groups_to_update,
-           ["estimate_escalation_finish_time", "active_escalation_id", "unsilence_task_uuid"],
+           ["active_escalation_id", "unsilence_task_uuid"],
            batch_size=5000,
        )


@@ -1,9 +1,12 @@
+import os
 import shlex
 import subprocess
 from django.core.management.base import BaseCommand
 from django.utils import autoreload
+from common.utils import getenv_boolean
 WORKER_ID = 0
@@ -11,8 +14,18 @@ def restart_celery(*args, **kwargs):
     global WORKER_ID
     kill_worker_cmd = "celery -A engine control shutdown"
     subprocess.call(shlex.split(kill_worker_cmd))
-    start_worker_cmd = "celery -A engine worker -l info --concurrency=3 -Q celery,retry -n {}".format(WORKER_ID)
-    subprocess.call(shlex.split(start_worker_cmd))
+    queues = os.environ.get("CELERY_WORKER_QUEUE", "celery,retry")
+    max_tasks_per_child = os.environ.get("CELERY_WORKER_MAX_TASKS_PER_CHILD", 100)
+    concurrency = os.environ.get("CELERY_WORKER_CONCURRENCY", 3)
+    log_level = "debug" if getenv_boolean("CELERY_WORKER_DEBUG_LOGS", False) else "info"
+    celery_args = f"-A engine worker -l {log_level} --concurrency={concurrency} -Q {queues} --max-tasks-per-child={max_tasks_per_child} -n {WORKER_ID}"
+    if getenv_boolean("CELERY_WORKER_BEAT_ENABLED", False):
+        celery_args += " --beat"
+    subprocess.call(shlex.split(f"celery {celery_args}"))
     WORKER_ID = 1 + WORKER_ID
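The argument-building part of the new startup command can be isolated into a pure function, which makes the env-var handling easy to verify. A sketch (`getenv_boolean` is OnCall's helper from `common.utils`; it is re-implemented here under the assumption that it treats `"true"`/`"1"` case-insensitively as `True`):

```python
import os


def getenv_boolean(name: str, default: bool) -> bool:
    # Assumed behavior of OnCall's common.utils.getenv_boolean
    value = os.environ.get(name)
    if value is None:
        return default
    return value.lower() in ("true", "1")


def build_celery_args(worker_id: int) -> str:
    # Mirror of the env-var handling added to the local startup command
    queues = os.environ.get("CELERY_WORKER_QUEUE", "celery,retry")
    max_tasks_per_child = os.environ.get("CELERY_WORKER_MAX_TASKS_PER_CHILD", 100)
    concurrency = os.environ.get("CELERY_WORKER_CONCURRENCY", 3)
    log_level = "debug" if getenv_boolean("CELERY_WORKER_DEBUG_LOGS", False) else "info"
    args = (
        f"-A engine worker -l {log_level} --concurrency={concurrency} "
        f"-Q {queues} --max-tasks-per-child={max_tasks_per_child} -n {worker_id}"
    )
    if getenv_boolean("CELERY_WORKER_BEAT_ENABLED", False):
        # Embedding beat in the worker is convenient for local testing only
        args += " --beat"
    return args
```

With no env vars set, this yields the same worker invocation as before the change (queues `celery,retry`, concurrency 3, info logs), so the defaults are backward compatible.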


@@ -51,3 +51,4 @@ pyroscope-io==0.8.1
 django-dbconn-retry==0.1.7
 django-ipware==4.0.2
 django-anymail==8.6
+django-deprecate-fields==0.1.1


@@ -395,6 +395,10 @@ CELERY_MAX_TASKS_PER_CHILD = 1
 CELERY_WORKER_SEND_TASK_EVENTS = True
 CELERY_TASK_SEND_SENT_EVENT = True
+ALERT_GROUP_ESCALATION_AUDITOR_CELERY_TASK_HEARTBEAT_URL = os.getenv(
+    "ALERT_GROUP_ESCALATION_AUDITOR_CELERY_TASK_HEARTBEAT_URL", None
+)
 CELERY_BEAT_SCHEDULE = {
     "restore_heartbeat_tasks": {
         "task": "apps.heartbeat.tasks.restore_heartbeat_tasks",
@@ -403,7 +407,11 @@ CELERY_BEAT_SCHEDULE = {
     },
     "check_escalations": {
         "task": "apps.alerts.tasks.check_escalation_finished.check_escalation_finished_task",
-        "schedule": 10 * 60,
+        # the task should run a minute or two more frequently than the heartbeat integration's configured interval
+        #
+        # ex. if the integration is configured to expect a heartbeat every 15 minutes, then this value should be set
+        # to something like 13 * 60 (every 13 minutes)
+        "schedule": getenv_integer("ALERT_GROUP_ESCALATION_AUDITOR_CELERY_TASK_HEARTBEAT_INTERVAL", 13 * 60),
         "args": (),
     },
     "start_refresh_ical_files": {
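For reference, the interval resolution above relies on `getenv_integer`. A hedged sketch of how that helper plausibly behaves (re-implemented here for illustration; the real one lives in OnCall's settings/utils code), showing that without an override the auditor runs every 780 seconds:

```python
import os


def getenv_integer(name: str, default: int) -> int:
    # Assumed behavior: parse the env var as an int, fall back to the default if unset
    value = os.environ.get(name)
    return int(value) if value is not None else default


# With no env override, the audit task runs every 13 minutes (780 seconds)
schedule = getenv_integer("ALERT_GROUP_ESCALATION_AUDITOR_CELERY_TASK_HEARTBEAT_INTERVAL", 13 * 60)
```

Setting the env var to, say, `600` would shorten the audit cadence to every 10 minutes without a code change.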