Commit graph

1091 commits

Author SHA1 Message Date
Joey Orlando
76a88bc0c1
Revert "upgrade to Python 3.12 (#3456)" and "bump uwsgi version to latest #3466" (#3483)
# What this PR does

This reverts commits 7c4b40a046 and
cdb22285db.

See https://github.com/grafana/oncall-private/pull/2361 for more
details.
2023-12-01 09:56:26 -05:00
Ildar Iskhakov
30caa18f9d
Make telegram on_alert_group_action_triggered asynchronous (#3471)
# What this PR does


[send_alert_group_signal](https://github.com/grafana/oncall/blob/dev/engine/apps/alerts/tasks/send_alert_group_signal.py#L12)
task is not idempotent. It launches
[on_alert_group_action_triggered_async](a2851d3f81/engine/apps/slack/representatives/alert_group_representative.py (L158))
for slack and then might fail on
[on_alert_group_action_triggered](b2f4ffb98a/engine/apps/telegram/alert_group_representative.py (L79))
(not async) due to database DoesNotExist exception.

This PR makes the Telegram representative asynchronous.

## Which issue(s) this PR fixes

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-12-01 10:49:00 +00:00
Vadim Stepanov
8188dd5dd2
Create missing direct paging integrations (#3468)
# What this PR does

Makes organization sync create direct paging integrations for Grafana
teams that don't have one.

## Which issue(s) this PR fixes

Related to https://github.com/grafana/oncall-private/issues/2302

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-11-30 17:18:18 +00:00
Joey Orlando
7c4b40a046
upgrade to Python 3.12 (#3456)
# What this PR does

Upgrade to Python 3.12 + fix several invalid test assertions that lead
to test failures in the latest version of `pytest`:
```
AttributeError: 'called_once_with' is not a valid assertion. Use a spec for the mock if 'called_once_with' is meant to be an attribute.. Did you mean: 'assert_called_once_with'?
```
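
For context, a minimal sketch of the kind of assertion fix described above. On older Python versions a misspelled name such as `called_once_with` was silently treated as an ordinary mock attribute, so `assert m.called_once_with(...)` could never fail. The helper name `notifier_was_called_correctly` and the call arguments are illustrative only and do not come from this PR.

```python
from unittest import mock


def notifier_was_called_correctly(notifier: mock.Mock) -> bool:
    """Return True only if `notifier` was called exactly once with ("alert", 42)."""
    try:
        # assert_called_once_with raises AssertionError on any mismatch,
        # including "never called" and "called with other arguments".
        notifier.assert_called_once_with("alert", 42)
    except AssertionError:
        return False
    return True


good = mock.Mock()
good("alert", 42)

bad = mock.Mock()
bad("alert", 41)

never_called = mock.Mock()
```

Only `good` satisfies the check; `bad` and `never_called` are rejected, which is the behavior the misspelled assertion was presumably meant to verify.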

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-11-30 13:47:41 +00:00
Matias Bordese
aa8a904a8d
Update when slack client ratelimit retry handler is enabled (#3447) 2023-11-30 12:35:46 +00:00
Vadim Stepanov
381a9ecf54
Delete duplicate direct paging integrations (#3412)
# What this PR does

Deletes duplicate direct paging integrations (i.e. keeps only the first
direct paging integration per team).
Also adds a unique constraint that will make such duplicates impossible
at the DB level.
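
A rough sketch of the "keep only the first per team" rule, not the actual migration code. The `Integration` dataclass is hypothetical, and "first" is assumed here to mean the lowest `id`.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Integration:
    """Hypothetical stand-in for a direct paging integration row."""

    id: int
    team_id: Optional[int]  # None stands for "No team"


def split_duplicates(
    integrations: List[Integration],
) -> Tuple[List[Integration], List[Integration]]:
    """Return (kept, duplicates), keeping the lowest-id integration per team."""
    seen_teams = set()
    kept, duplicates = [], []
    for integration in sorted(integrations, key=lambda i: i.id):
        if integration.team_id in seen_teams:
            duplicates.append(integration)
        else:
            seen_teams.add(integration.team_id)
            kept.append(integration)
    return kept, duplicates
```

Once the duplicates are gone, a unique constraint on the team column can reject any new duplicate at insert time.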

## Which issue(s) this PR fixes

Related to https://github.com/grafana/oncall-private/issues/2302

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-11-30 11:19:12 +00:00
Matias Bordese
7aa78f5f73
Enable flake8-bugbear, fix issues (#3454)
Enables [flake8-bugbear](https://github.com/PyCQA/flake8-bugbear),
checking for bugs/design problems, and [fixes the issues
found](https://pastebin.com/fEDBz6Ta) (some interesting ones,
particularly with mutable args).
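
As an illustration of the mutable-argument class of issue (flake8-bugbear reports it as B006), here is a small sketch. The function names are made up for this example and are not taken from the codebase.

```python
def add_label_buggy(label, labels=[]):
    # The default list is created once, at function definition time,
    # so it is shared by every call that omits `labels`.
    labels.append(label)
    return labels


def add_label_fixed(label, labels=None):
    # Use None as a sentinel and build a fresh list on each call.
    if labels is None:
        labels = []
    labels.append(label)
    return labels
```

Calling `add_label_buggy` twice without a second argument returns the same, growing list, while `add_label_fixed` returns a new list each time.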

Related to https://github.com/grafana/oncall/pull/3448
2023-11-29 15:04:48 +00:00
Ildar Iskhakov
2fdd885abe
Move new alert group metric creation into async task (#3451)
# What this PR does

Moves metric creation into a separate task to make alert ingestion more
robust.

## Which issue(s) this PR fixes

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)

---------

Co-authored-by: Julia <ferril.darkdiver@gmail.com>
2023-11-29 12:45:36 +00:00
Rares Mardare
455f74560c
Alert group column/label selector (#3281)
# What this PR does

Adds the ability to choose which columns are shown on the alert group
page.


![image](https://github.com/grafana/oncall/assets/40542072/952d4004-9cd6-478c-a104-cd5d270cfd58)

---------

Co-authored-by: Julia <ferril.darkdiver@gmail.com>
2023-11-29 12:11:31 +00:00
Matias Bordese
ec1f120d9c
Update transaction.on_commit to use partial instead of lambda (#3448)
For reference,
https://adamj.eu/tech/2022/08/22/use-partial-with-djangos-transaction-on-commit/.

I think this may help with this one too:
https://github.com/grafana/oncall-private/issues/2318
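
A minimal sketch of why `partial` is safer than `lambda` for deferred callbacks. To stay self-contained it does not use Django: `on_commit` and `run_commit_hooks` below are simplified stand-ins for `transaction.on_commit` and the commit itself, and `notify` is a hypothetical task.

```python
from functools import partial

# Stand-in for Django's transaction.on_commit: callbacks are only queued
# here and executed later, once the "transaction" has committed.
pending_callbacks = []


def on_commit(callback):
    pending_callbacks.append(callback)


def run_commit_hooks():
    results = [callback() for callback in pending_callbacks]
    pending_callbacks.clear()
    return results


def notify(alert_id):
    return f"notified {alert_id}"


def schedule_with_lambda(alert_ids):
    for alert_id in alert_ids:
        # Late binding: the lambda looks up `alert_id` when it runs,
        # by which time the loop has already finished.
        on_commit(lambda: notify(alert_id))
    return run_commit_hooks()


def schedule_with_partial(alert_ids):
    for alert_id in alert_ids:
        # partial binds the current value of `alert_id` immediately.
        on_commit(partial(notify, alert_id))
    return run_commit_hooks()
```

With `[1, 2, 3]`, the lambda version notifies alert 3 three times, while the partial version notifies 1, 2 and 3 as intended.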
2023-11-29 12:01:30 +00:00
Innokentii Konstantinov
8c82dac6db
Rewrite LabelsAPIClient (#3422)
Rewrite LabelAPIClient so that it can return error messages from the Label
Repo API.
Main features:
1. Raises LabelRepoAPIException when the response code is at the 400 or 500 level.
2. Always returns the response as a second return value so that it can be
inspected further, if necessary.
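
The two behaviors listed above could look roughly like the sketch below. Only the exception name `LabelRepoAPIException` comes from this description; `FakeResponse`, `unwrap`, and the exception's `msg`/`status` attributes are assumptions made for illustration and may not match the real client.

```python
from dataclasses import dataclass
from typing import Any, Tuple


class LabelRepoAPIException(Exception):
    """Raised when the label repo answers with a 4xx/5xx status (sketch)."""

    def __init__(self, msg: str, status: int):
        super().__init__(msg)
        self.msg = msg
        self.status = status


@dataclass
class FakeResponse:
    """Stand-in for an HTTP response; not the real client's response type."""

    status_code: int
    body: Any


def unwrap(response: FakeResponse) -> Tuple[Any, FakeResponse]:
    """Return (data, response), or raise if the status is in the 400-599 range."""
    if 400 <= response.status_code < 600:
        raise LabelRepoAPIException(str(response.body), response.status_code)
    # The response is always handed back so callers can inspect it further.
    return response.body, response
```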
2023-11-29 08:56:42 +00:00
Michael Derynck
e9f2178da1
Change service account auth to use instance id instead (#3435)
# What this PR does
Change GrafanaServiceAccountAuth to use instance ID header in cloud
instead of slugs.

## Which issue(s) this PR fixes

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-11-28 15:56:29 +00:00
Innokentii Konstantinov
fb4bad21d2
Document tojson jinja filter (#3432)
# What this PR does
Document tojson jinja filter
2023-11-28 20:47:57 +08:00
Innokentii Konstantinov
ccc64e6b90 Fix 2023-11-28 13:22:18 +08:00
Innokentii Konstantinov
8f17784c49 Fix 2023-11-28 13:16:55 +08:00
Innokentii Konstantinov
22cfba0163 Remove call to bots_info in the slack message handler 2023-11-28 13:14:33 +08:00
Ildar Iskhakov
393c8e06a7
Merge branch 'main' into dev 2023-11-28 09:59:07 +08:00
Vadim Stepanov
9e889403f2
Alert group payload labels (#3434)
https://github.com/grafana/oncall/pull/3385 + handle null values
2023-11-27 17:53:54 +00:00
Vadim Stepanov
e09422a07d
Revert "Alert group payload labels" (#3433)
Reverts grafana/oncall#3385
2023-11-27 17:28:34 +00:00
Vadim Stepanov
5fac6aeac5
Alert group payload labels (#3385)
# What this PR does

Adds an ability to extract labels from alert group payload. See
[demo](https://www.loom.com/share/cf2b746eea974547b76f44298e32a54f?sid=67ed1e58-40ed-4136-a201-6482fb7773d3).

## Which issue(s) this PR fixes

https://github.com/grafana/oncall-private/issues/2304

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)

---------

Co-authored-by: Maxim Mordasov <maxim.mordasov@grafana.com>
Co-authored-by: Rares Mardare <rares.mardare@grafana.com>
2023-11-27 16:55:31 +00:00
Ildar Iskhakov
2ddb47ea29
Revert "Make alert ingestion cache independent (#3414)" (#3430)
This reverts commit acfba47a81.

# What this PR does

## Which issue(s) this PR fixes

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-11-27 20:23:57 +08:00
Ildar Iskhakov
aab49f0594
Revert "Add test to ensure integrations work when cache is down" (#3431)
Reverts grafana/oncall#3418
2023-11-27 20:10:26 +08:00
Innokentii Konstantinov
85f9b0f168
Log slack bot_id and bot_user_it (#3429)
Log slack bot id and bot user id to check if we can avoid request to
slack api
2023-11-27 18:36:53 +08:00
Yulya Artyukhina
863af25994
Fix alert group rendering (#3424)
# What this PR does
Fix alert group rendering where some links were broken because `-` was
replaced with `_`.

## Which issue(s) this PR fixes
https://github.com/grafana/support-escalations/issues/8119
https://github.com/grafana/support-escalations/issues/8468

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-11-24 15:39:37 +00:00
Matias Bordese
0816825c5d
Add test to ensure integrations work when cache is down (#3418)
Complements https://github.com/grafana/oncall/pull/3414/
2023-11-24 12:02:36 +00:00
Matias Bordese
d730f6b2bf
Trigger distribute task after alert is committed (#3420)
Fix issue triggering task retries because alert is not yet committed to
the DB.
Similar to https://github.com/grafana/oncall/pull/3001.
2023-11-24 12:02:32 +00:00
Michael Derynck
3436344f1d
Allow users with user settings read to list users (#3419)
# What this PR does
Fixed issue where `User Settings Reader` was missing permission to list
users.

## Which issue(s) this PR fixes

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-11-23 20:41:27 +00:00
Matias Bordese
55fedb25d6
Rework alert group internal API team filter (#3413)
Related to https://github.com/grafana/oncall-private/issues/2177
2023-11-23 17:28:00 +00:00
Michael Derynck
60ef4348f5
Allow OnCall API to use Grafana Service Accounts (#3189)
# What this PR does
Allows the public OnCall API to use Grafana service accounts for
authorization. In cloud, requests using a Grafana service account token
also need to provide the `X-Grafana-Org-Slug` and
`X-Grafana-Instance-Slug` headers.

This is **alpha** functionality, it may break or be removed in the
future. Going to use this on one endpoint (resolution notes) before we
consider the implications across all of public API.

## Which issue(s) this PR fixes

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-11-23 16:42:27 +00:00
Ildar Iskhakov
95a3ab3b75
Revert "Cache independent ingestion" (#3417)
Reverts grafana/oncall#3415
2023-11-23 21:38:06 +08:00
Ildar Iskhakov
acfba47a81
Make alert ingestion cache independent (#3414)
# What this PR does

This PR catches redis unavailability exceptions to prevent errors during
alert ingestion in the following places:
StartupProbeView
AlertChannelDefiningMixin
RateLimitMixin
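
A minimal sketch of the general pattern, not the code from this PR: wrap the cache lookup and fall back to a safe default when the cache cannot be reached. `CacheUnavailable`, `DictCache`, `DownCache` and `is_rate_limited` are hypothetical names, and failing open (treating the request as not rate limited) is an assumption about the desired behavior.

```python
class CacheUnavailable(Exception):
    """Hypothetical error raised when the cache backend cannot be reached."""


class DictCache:
    """In-memory cache used only to exercise the happy path."""

    def __init__(self, data=None):
        self.data = dict(data or {})

    def get(self, key, default=None):
        return self.data.get(key, default)


class DownCache:
    """Cache whose backend is unreachable."""

    def get(self, key, default=None):
        raise CacheUnavailable("cache backend is unreachable")


def is_rate_limited(cache, key, limit):
    try:
        count = cache.get(key, 0)
    except CacheUnavailable:
        # Fail open: keep ingesting alerts when the cache is down.
        return False
    return count >= limit
```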

## Which issue(s) this PR fixes

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-11-23 16:32:02 +08:00
Ildar Iskhakov
a6912c96af
Merge pull request #3415 from grafana/iskhakov/cache-independent-ingestion
Cache independent ingestion
2023-11-23 16:22:18 +08:00
Ildar Iskhakov
566e8c53ba Ignore typing checks for imported library (https://mypy.readthedocs.io/en/stable/running_mypy.html\#missing-library-stubs-or-py-typed-marker) 2023-11-23 16:14:30 +08:00
Ildar Iskhakov
0d5ef785bf Make alert ingestion cache independent 2023-11-23 11:27:47 +08:00
Matias Bordese
c0a6e69d9f
Remove unused prefetch; prefer indexed query (#3410)
This should help with slowness on the webhooks listing page: do not fetch
*all* responses for webhooks (they were not used in the serializer
anyway).
2023-11-23 02:08:04 +00:00
Innokentii Konstantinov
9628bdc51f
Webhook labels (#3383)
This PR adds labels for webhooks.
1. Make webhooks "labelable", with the ability to filter by labels.
2. Add labels to the webhook payload. It contains a new `webhook` field with
its name, id and labels. The `integration` and `alert_group` fields have a
corresponding `labels` field as well. See an example of the new payload below:
```
{
    "event": {
        "type": "escalation"
    },
    "user": null,
    "alert_group": {
        "id": "IRFN6ZD31N31B",
        "integration_id": "CTWM7U4A2QG97",
        "route_id": "RUE7U7Z46SKGY",
        "alerts_count": 1,
        "state": "firing",
        "created_at": "2023-11-22T08:54:55.178243Z",
        "resolved_at": null,
        "acknowledged_at": null,
        "title": "Incident",
        "permalinks": {
            "slack": null,
            "telegram": null,
            "web": "http://grafana:3000/a/grafana-oncall-app/alert-groups/IRFN6ZD31N31B"
        },
        "labels": {
            "severity": "critical"
        }
    },
    "alert_group_id": "IRFN6ZD31N31B",
    "alert_payload": {
        "message": "This alert was sent by user for demonstration purposes"
    },
    "integration": {
        "id": "CTWM7U4A2QG97",
        "type": "webhook",
        "name": "hi - Webhook",
        "team": null,
        "labels": {
            "hello": "world",
            "severity": "critical"
        }
    },
    "notified_users": [],
    "users_to_be_notified": [],
    "webhook": {
        "id": "WHAXK4BTC7TAEQ",
        "name": "test",
        "labels": {
            "hello": "kesha"
        }
    }
}
```
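
For a receiver of this payload, reading the labels might look like the sketch below. It uses a trimmed copy of the example payload; `collect_labels` is a hypothetical helper and not part of OnCall.

```python
import json

# Trimmed copy of the example payload, keeping only the label-bearing fields.
payload_json = """
{
    "alert_group": {"id": "IRFN6ZD31N31B", "labels": {"severity": "critical"}},
    "integration": {
        "id": "CTWM7U4A2QG97",
        "labels": {"hello": "world", "severity": "critical"}
    },
    "webhook": {"id": "WHAXK4BTC7TAEQ", "name": "test", "labels": {"hello": "kesha"}}
}
"""


def collect_labels(payload):
    """Map each label-bearing section to its labels ({} when absent or null)."""
    sections = ("alert_group", "integration", "webhook")
    return {
        section: (payload.get(section) or {}).get("labels") or {}
        for section in sections
    }


labels = collect_labels(json.loads(payload_json))
```

The `or {}` fallbacks are a defensive choice for sections or label fields that are missing or `null`.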

I feel that there is an opportunity to make code cleaner - remove all
label logic from serializers, views and utils to models or dedicated
LabelerService and introduce Labelable interface with something like
label_verbal, update_labels methods. However, I don't want to tie
webhook labels with a refactoring.

---------

Co-authored-by: Dominik <dominik.broj@grafana.com>
2023-11-22 11:17:41 +00:00
Joey Orlando
f70c439334
add more logging in apps.alerts.tasks.notify_user.perform_notification task (#3403)
# What this PR does

add more logging in `apps.alerts.tasks.notify_user.perform_notification`
task (related to https://github.com/grafana/oncall-private/issues/2318;
won't solve that issue but will help with the investigation)
2023-11-21 13:35:23 -05:00
Matias Bordese
56d1b529e9
Add builtin slack retry on ratelimited error (#3401)
Fixes https://github.com/grafana/oncall-private/issues/2293

Enable Slack client retries on `ratelimited` errors: it will check the
`Retry-After` header before trying again. After 3 attempts it will raise
the error (and we will fall back to the usual error/task retry handling).
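
The retry behavior described above can be sketched in plain Python as follows. This is not the Slack client's built-in handler: `RateLimitedError` and `call_with_ratelimit_retry` are hypothetical names, and the `sleep` parameter exists only so the wait can be replaced in tests.

```python
import time


class RateLimitedError(Exception):
    """Hypothetical error carrying the server's Retry-After value in seconds."""

    def __init__(self, retry_after):
        super().__init__(f"ratelimited, retry after {retry_after}s")
        self.retry_after = retry_after


def call_with_ratelimit_retry(call, max_attempts=3, sleep=time.sleep):
    """Run `call`, waiting `retry_after` seconds between rate-limited attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except RateLimitedError as error:
            if attempt == max_attempts:
                # Out of attempts: let the caller's usual error handling take over.
                raise
            sleep(error.retry_after)
```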
2023-11-21 17:32:29 +00:00
Michael Derynck
b3583cd1a0
Add more logging info on alert creation (#3392)
# What this PR does
Add alert receive channel id when logging to make it easier to trace
grouping

## Which issue(s) this PR fixes

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)

---------

Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
2023-11-21 16:16:15 +00:00
Vadim Stepanov
cb2d4fa76b
Fix deleting integrations with duplicate names (#3397)
# What this PR does

Fixes a bug where it's not possible to delete two or more integrations
that have the same name at once.

## Which issue(s) this PR fixes

https://github.com/grafana/oncall-private/issues/2313

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-11-21 12:44:21 +00:00
Joey Orlando
05ec0f97b5
fix issue in /escalate Slack command when selecting a team (#3381)
# Which issue(s) this PR fixes

Closes https://github.com/grafana/support-escalations/issues/8380

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-11-20 15:27:01 -05:00
Matias Bordese
3b90c6544b
Avoid msg_too_long errors when posting/updating slack resolution note (#3372) 2023-11-20 12:17:07 +00:00
Joey Orlando
6214ffbd66
fix missing users in rotations when RBAC is enabled (#3380)
# Which issue(s) this PR fixes
1. Enable RBAC
2. Create a schedule rotation layer which includes a user who is a Viewer
+ has role `Notifications Receiver` (this is the RBAC role we use to
filter which users show up in the user dropdown in the rotations modal
when creating a rotation)
3. The user _sorta_ shows up in the schedule but they are listed in
`missing_users`

<img width="1166" alt="Screenshot 2023-11-17 at 10 12 30"
src="https://github.com/grafana/oncall/assets/9406895/ae4d6449-3aff-4087-9b05-64645e84b40a">
<img width="1173" alt="Screenshot 2023-11-17 at 10 15 04"
src="https://github.com/grafana/oncall/assets/9406895/3ac4f0b9-49b3-4a7d-bfcf-39a8c51bbb74">


## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-11-20 11:31:07 +00:00
Michael Derynck
609da8044e
Handle Amazon SNS headers for moved (#3371)
# What this PR does
The test and forwarding code in the previous PR #3326 were not representative
of an actual request. This fixes forwarding of Amazon SNS headers.

## Which issue(s) this PR fixes

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-11-16 13:44:56 -07:00
Joey Orlando
5678d79927
allow specifying more than one redis server URI in the REDIS_URI env var (#3368)
# What this PR does

Modifies the Django `settings/base.py` such that `REDIS_URI` can now be
a comma (or semicolon) separated list of URIs. From [Django
docs](https://docs.djangoproject.com/en/4.2/topics/cache/#:~:text=If%20you%20have%20multiple%20Redis%20servers%20set%20up%20in%20the%20replication%20mode%2C%20you%20can%20specify%20the%20servers%20either%20as%20a%20semicolon%20or%20comma%20delimited%20string%2C%20or%20as%20a%20list):

> If you have multiple Redis servers set up in the replication mode, you
can specify the servers either as a semicolon or comma delimited string,
or as a list. While using multiple servers, write operations are
performed on the first server (leader). Read operations are performed on
the other servers (replicas) chosen at random:
> ```python3
> CACHES = {
>     "default": {
>         "BACKEND": "django.core.cache.backends.redis.RedisCache",
>         "LOCATION": [
>             "redis://127.0.0.1:6379",  # leader
>             "redis://127.0.0.1:6378",  # read-replica 1
>             "redis://127.0.0.1:6377",  # read-replica 2
>         ],
>     }
> }
> ```
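
To illustrate what a delimited `REDIS_URI` value means, here is a small sketch that splits it into a list. It is not the code from `settings/base.py` (Django also accepts the delimited string directly), and `parse_redis_uris` is a hypothetical helper.

```python
import re


def parse_redis_uris(value):
    """Split a comma- or semicolon-separated string into a list of URIs."""
    return [uri.strip() for uri in re.split(r"[,;]", value) if uri.strip()]
```

For example, `parse_redis_uris("redis://127.0.0.1:6379, redis://127.0.0.1:6378")` yields two URIs, the first of which would act as the leader.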

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated (N/A)
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-11-16 10:48:36 -05:00
Vadim Stepanov
93badbd638
Delete direct paging integration on team delete (#3367)
# What this PR does

- Fix a bug where it's not possible to delete duplicate DP integrations
for "No team"
- Make it so that when a team is deleted, its DP integration is deleted
automatically

## Which issue(s) this PR fixes

https://github.com/grafana/oncall-private/issues/2296

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-11-16 13:54:15 +00:00
Matias Bordese
eb849678a6
Update slack user group update not to retry on some errors (#3363) 2023-11-16 13:41:42 +00:00
Matias Bordese
e1e56fc414
Truncate resolution note text in slack message to satisfy block limits (#3351)
This should help with some retrying tasks.
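
A minimal sketch of this kind of truncation, not the code from this PR. The 3000-character default is an assumption about the limit for text in a Slack block and should be checked against Slack's Block Kit reference; `truncate_for_block` and the `...` suffix are hypothetical.

```python
def truncate_for_block(text, limit=3000, suffix="..."):
    """Shorten `text` to at most `limit` characters, marking the cut with `suffix`.

    Assumes `limit` is larger than len(suffix).
    """
    if len(text) <= limit:
        return text
    return text[: limit - len(suffix)] + suffix
```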
2023-11-16 13:15:04 +00:00
Ravishankar
13318577d4
fix(2989) Return users field data for web overriden shifts via public… (#3303)
# What this PR does 
The `rolling_users` field for the shift of type override created from
web UI is populated in the `users` field of the public GET Shift API
## Which issue(s) this PR fixes
  #2989
## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-11-16 10:13:47 -03:00
Joey Orlando
77cb381366
Fix broken openapi schema + add integration test (#3364)
# Which issue(s) this PR fixes

- Fix issue that was causing our openapi schema to return HTTP 500 + add
an integration test which fetches the `.yaml` schema and validates that
the endpoint returns HTTP 200 (should hopefully prevent this from
happening again).
- add a few more type hints along the way

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-11-16 12:15:05 +00:00