# oncall-engine/engine/apps/integrations/tasks.py
import logging
import random
import typing

from celery import shared_task
from celery.utils.log import get_task_logger
from django.conf import settings
from django.core.cache import cache

from apps.alerts.models.alert_group_counter import ConcurrentUpdateError
from apps.alerts.tasks import resolve_alert_group_by_source_if_needed
from apps.slack.client import SlackClient
from apps.slack.errors import SlackAPIError
from common.custom_celery_tasks import shared_dedicated_queue_retry_task
from common.custom_celery_tasks.create_alert_base_task import CreateAlertBaseTask

if typing.TYPE_CHECKING:
    from apps.alerts.models import Alert

logger = get_task_logger(__name__)
logger.setLevel(logging.DEBUG)


@shared_task(
    base=CreateAlertBaseTask,
    autoretry_for=(Exception,),
    retry_backoff=True,
    max_retries=1 if settings.DEBUG else None,
)
def create_alertmanager_alerts(alert_receive_channel_pk, alert, is_demo=False, received_at=None):
    from apps.alerts.models import Alert, AlertReceiveChannel

    alert_receive_channel = AlertReceiveChannel.objects_with_deleted.get(pk=alert_receive_channel_pk)
    if (
        alert_receive_channel.deleted_at is not None
        or alert_receive_channel.integration == AlertReceiveChannel.INTEGRATION_MAINTENANCE
    ):
        logger.info("AlertReceiveChannel alert ignored if deleted/maintenance")
        return

    try:
        alert = Alert.create(
            title=None,
            message=None,
            image_url=None,
            link_to_upstream_details=None,
            alert_receive_channel=alert_receive_channel,
            integration_unique_data=None,
            raw_request_data=alert,
            enable_autoresolve=False,
            is_demo=is_demo,
            received_at=received_at,
        )
    except ConcurrentUpdateError:
        # This error is raised when there are concurrent updates on AlertGroupCounter due to optimistic lock on it.
        # The idea is to not block the worker with a database lock and retry the task in case of concurrent updates.
        # `alert` still holds the raw payload here, because Alert.create() raised before it was reassigned.
        countdown = random.randint(1, 10)
        create_alertmanager_alerts.apply_async(
            (alert_receive_channel_pk, alert),
            # Forward the optional arguments so the retried task behaves like the original call.
            kwargs={"is_demo": is_demo, "received_at": received_at},
            countdown=countdown,
        )
        logger.warning(f"Retrying the task gracefully in {countdown} seconds due to ConcurrentUpdateError")
        return

    if alert_receive_channel.allow_source_based_resolving:
        alert_group = alert.group
        if alert_group.resolved_by != alert_group.NOT_YET_STOP_AUTORESOLVE:
            task = resolve_alert_group_by_source_if_needed.apply_async((alert_group.pk,), countdown=5)
            alert_group.active_resolve_calculation_id = task.id
            alert_group.save(update_fields=["active_resolve_calculation_id"])

    logger.debug(
        f"Created alertmanager alert alert_id={alert.pk} alert_group_id={alert.group.pk} channel_id={alert_receive_channel.pk}"
    )
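

# Illustrative sketch only (hypothetical helper, not called by any task in this
# module). The ConcurrentUpdateError branches in this module avoid holding a
# database lock: they reschedule the task with a random 1-10 second countdown so
# that competing workers spread out before touching AlertGroupCounter again.
# The helper below models the same "retry after random jitter" idea without
# Celery or Django. One deliberate difference: it sleeps in-process, whereas the
# tasks hand the delay to Celery via apply_async(countdown=...) so the worker is
# freed. _SketchConflictError stands in for ConcurrentUpdateError; all names
# here are assumptions made for illustration.


class _SketchConflictError(Exception):
    """Stand-in for an optimistic-lock conflict such as ConcurrentUpdateError."""


def _sketch_retry_with_jitter(operation, max_attempts=5, min_delay=1, max_delay=10, sleep=None, rng=None):
    """Call `operation` until it stops raising _SketchConflictError or attempts run out."""
    import random as _random
    import time as _time

    sleep = sleep or _time.sleep
    rng = rng or _random.Random()
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except _SketchConflictError:
            if attempt == max_attempts:
                raise  # give up and surface the conflict to the caller
            # Whole seconds in [min_delay, max_delay], matching the tasks' random.randint(1, 10).
            sleep(rng.randint(min_delay, max_delay))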


@shared_task(
    base=CreateAlertBaseTask,
    autoretry_for=(Exception,),
    retry_backoff=True,
    max_retries=1 if settings.DEBUG else None,
)
def create_alert(
    title: typing.Optional[str],
    message: typing.Optional[str],
    image_url: typing.Optional[str],
    link_to_upstream_details: typing.Optional[str],
    alert_receive_channel_pk: int,
    integration_unique_data: typing.Optional[typing.Dict],
    raw_request_data: "Alert.RawRequestData",
    is_demo: bool = False,
    received_at: typing.Optional[str] = None,
) -> None:
    from apps.alerts.models import Alert, AlertReceiveChannel

    try:
        alert_receive_channel = AlertReceiveChannel.objects.get(pk=alert_receive_channel_pk)
    except AlertReceiveChannel.DoesNotExist:
        return

    if image_url is not None:
        image_url = str(image_url)[:299]

    try:
        alert = Alert.create(
            title=title,
            message=message,
            image_url=image_url,
            link_to_upstream_details=link_to_upstream_details,
            alert_receive_channel=alert_receive_channel,
            integration_unique_data=integration_unique_data,
            raw_request_data=raw_request_data,
            is_demo=is_demo,
            received_at=received_at,
        )
        logger.debug(
            f"Created alert alert_id={alert.pk} alert_group_id={alert.group.pk} channel_id={alert_receive_channel.pk}"
        )
    except ConcurrentUpdateError:
        # This error is raised when there are concurrent updates on AlertGroupCounter due to optimistic lock on it.
        # The idea is to not block the worker with a database lock and retry the task in case of concurrent updates.
        countdown = random.randint(1, 10)
        create_alert.apply_async(
            (
                title,
                message,
                image_url,
                link_to_upstream_details,
                alert_receive_channel_pk,
                integration_unique_data,
                raw_request_data,
            ),
            # Forward the optional arguments so the retried task behaves like the original call.
            kwargs={
                "is_demo": is_demo,
                "received_at": received_at,
            },
            countdown=countdown,
        )
        logger.warning(
            f"Retrying the task gracefully in {countdown} seconds due to ConcurrentUpdateError for alert_receive_channel={alert_receive_channel_pk}"
        )


@shared_dedicated_queue_retry_task()
def start_notify_about_integration_ratelimit(team_id, text, **kwargs):
    notify_about_integration_ratelimit_in_slack.apply_async(
        args=(
            team_id,
            text,
        ),
        kwargs=kwargs,
        expires=60 * 5,
    )


@shared_dedicated_queue_retry_task(
    autoretry_for=(Exception,), retry_backoff=True, max_retries=1 if settings.DEBUG else 5
)
def notify_about_integration_ratelimit_in_slack(organization_id, text, **kwargs):
    # TODO: Review ratelimits
    from apps.user_management.models import Organization

    try:
        organization = Organization.objects.get(pk=organization_id)
    except Organization.DoesNotExist:
        logger.warning(f"Organization {organization_id} does not exist")
        return

    cache_key = f"notify_about_integration_ratelimit_in_slack_{organization.pk}"
    if cache.get(cache_key):
        logger.debug(f"Message was sent recently for organization {organization_id}")
        return
    else:
        cache.set(cache_key, True, 60 * 15)  # Set cache before sending message to make sure we don't ratelimit slack
        slack_team_identity = organization.slack_team_identity
        if slack_team_identity is not None:
            try:
                sc = SlackClient(slack_team_identity, enable_ratelimit_retry=True)
                sc.chat_postMessage(channel=organization.general_log_channel_id, text=text)
            except SlackAPIError as e:
                logger.warning(f"Slack exception {e} while sending message for organization {organization_id}")
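

# Illustrative sketch only (hypothetical helper, not called by any task in this
# module). notify_about_integration_ratelimit_in_slack throttles Slack messages
# per organization: it writes a cache key with a 15-minute timeout before
# posting, so further calls inside that window return early. The class below
# models that check-then-mark step with an in-memory dict and an injectable
# clock instead of Django's cache; _SketchTtlThrottle and should_send are
# assumptions made for illustration. As with cache.get() followed by
# cache.set() in the task, checking and marking are separate steps, so two
# concurrent callers could both pass; Django's cache.add(), which stores a key
# only if it is absent and returns whether it did, is one way to narrow that
# gap on backends that implement it atomically.


class _SketchTtlThrottle:
    """Allow at most one send per key within `ttl_seconds`."""

    def __init__(self, ttl_seconds=60 * 15, clock=None):
        import time as _time

        self._ttl = ttl_seconds
        self._clock = clock or _time.monotonic
        self._expires_at = {}

    def should_send(self, key):
        now = self._clock()
        expires_at = self._expires_at.get(key)
        if expires_at is not None and expires_at > now:
            return False  # sent recently for this key; suppress
        # Mark the key before the caller sends, mirroring cache.set() ahead of chat_postMessage().
        self._expires_at[key] = now + self._ttl
        return True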