Commit graph

1390 commits

Author SHA1 Message Date
Joey Orlando
e477394b9c
patch occasional UnicodeEncodeError that occurs with outgoing webhooks (#3832)
# Which issue(s) this PR fixes

Closes https://github.com/grafana/oncall/issues/3831
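The PR body does not describe the root cause, so treat the following only as a hedged illustration. Python's standard `http.client` encodes `str` header values and `str` request bodies as Latin-1, so text containing characters outside that range (for example "✓") raises `UnicodeEncodeError`. If the webhook failure is of this kind, one defensive approach is to encode the body to UTF-8 bytes yourself. `is_latin1_safe` and `encode_body` are hypothetical helpers, not OnCall code.

```python
from typing import Union


def is_latin1_safe(value: str) -> bool:
    """Return True if http.client could encode `value` as Latin-1 unchanged."""
    try:
        value.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


def encode_body(body: Union[str, bytes]) -> bytes:
    """Hand the HTTP layer bytes so it never has to pick an encoding itself."""
    if isinstance(body, bytes):
        return body
    return body.encode("utf-8")


if __name__ == "__main__":
    payload = '{"title": "Disk ✓ usage"}'
    print(is_latin1_safe(payload))  # False: "✓" is outside Latin-1
    print(encode_body(payload))
```

When sending raw UTF-8 bytes, the request should also declare the charset, e.g. `Content-Type: application/json; charset=utf-8`.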

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-02-02 21:08:04 +00:00
Matias Bordese
f5065462c9
Include teams info in users API (#3817) 2024-02-01 17:16:57 -03:00
Joey Orlando
7db7b09c55
attempt to address some SlackAPIRatelimitError exceptions (#3820)
# Which issue(s) this PR fixes

Closes https://github.com/grafana/oncall-private/issues/2515

Attempts to address some `SlackAPIRatelimitError` exceptions seen in the
following tasks:
- `apps.slack.tasks.post_slack_rate_limit_message`
([logs](https://ops.grafana-ops.net/explore?schemaVersion=1&panes=%7B%22qhs%22:%7B%22datasource%22:%22000000193%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%23%20%7Bcluster%3D~%5C%22prod-%28us-central-0%7Ceu-west-0%29%5C%22,%20namespace%3D%5C%22amixr-prod%5C%22,%20job%3D~%5C%22amixr-prod%2Famixr-engine-celery-retry%2A%5C%22%7D%5Cn%7Bcluster%3D~%5C%22prod-%28us-central-0%7Ceu-west-0%29%5C%22,%20namespace%3D%5C%22amixr-prod%5C%22%7D%20%7C%3D%20%5C%22apps.slack.tasks.post_slack_rate_limit_message%5C%22%20%7C%3D%20%5C%22retry%5C%22%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22000000193%22%7D,%22editorMode%22:%22code%22%7D%5D,%22range%22:%7B%22from%22:%22now-7d%22,%22to%22:%22now%22%7D%7D%7D&orgId=1))
- `alerts.tasks.notify_user.perform_notification`
([logs](https://ops.grafana-ops.net/explore?schemaVersion=1&panes=%7B%22qhs%22:%7B%22datasource%22:%22000000193%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bcluster%3D~%5C%22prod-%28us-central-0%7Ceu-west-0%29%5C%22,%20namespace%3D%5C%22amixr-prod%5C%22%7D%20%7C%3D%20%5C%22SlackAPIRatelimitError%5C%22%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22000000193%22%7D,%22editorMode%22:%22code%22%7D%5D,%22range%22:%7B%22from%22:%22now-7d%22,%22to%22:%22now%22%7D%7D%7D&orgId=1))

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-02-01 14:47:12 -05:00
Matias Bordese
de0e5a19a6
Handle alert group does not exist on telegram button press (#3814)
Fixes https://github.com/grafana/oncall-private/issues/2364
2024-02-01 17:55:35 +00:00
Yulya Artyukhina
ba122ec6ef
Update notification checker (#3818)
# What this PR does
Count SMS messages with the status "accepted" as delivered in the notification checker
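A minimal sketch of the rule, assuming the checker compares a provider status string against a fixed set. The PR only states that "accepted" now counts; including "delivered" and the names below are assumptions.

```python
# Hypothetical status set: the PR only states that "accepted" now counts as
# delivered; "delivered" is assumed to have counted already.
DELIVERED_SMS_STATUSES = frozenset({"delivered", "accepted"})


def sms_counts_as_delivered(status: str) -> bool:
    """Return True if the notification checker should treat this SMS as delivered."""
    return status.strip().lower() in DELIVERED_SMS_STATUSES
```

Keeping the accepted statuses in one set makes adding or removing a status a one-line change.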
## Which issue(s) this PR fixes

https://raintank-corp.slack.com/archives/C025VMT6SPK/p1706799009342889?thread_ts=1706786822.083149&cid=C025VMT6SPK
## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-02-01 15:42:43 +00:00
KevinDW-Fluxys
4e3194c106
expose OUTGOING_WEBHOOK_TIMEOUT as env var (#3801)
# What this PR does
It adds functionality to be able to configure the outgoing webhook
timeout from an environment variable.

## Which issue(s) this PR fixes
When an outgoing webhook takes longer than 4 seconds (exceptional, but it
can happen), the request times out and the webhook is reported as failed,
even though it may still have succeeded on the receiving side.
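A minimal sketch of reading the timeout from the environment, assuming the value is given in seconds and parsed as a float. The helper name is hypothetical, and the actual parsing in OnCall's settings may differ.

```python
import os
from typing import Mapping

# Per the PR description, the previous hard-coded timeout was 4 seconds.
DEFAULT_OUTGOING_WEBHOOK_TIMEOUT = 4.0


def get_outgoing_webhook_timeout(environ: Mapping[str, str] = os.environ) -> float:
    """Read OUTGOING_WEBHOOK_TIMEOUT (seconds), falling back to the default."""
    raw = environ.get("OUTGOING_WEBHOOK_TIMEOUT", "").strip()
    if not raw:
        return DEFAULT_OUTGOING_WEBHOOK_TIMEOUT
    timeout = float(raw)  # raises ValueError for non-numeric input
    if timeout <= 0:
        raise ValueError("OUTGOING_WEBHOOK_TIMEOUT must be a positive number of seconds")
    return timeout
```

The value would then be passed to the HTTP client, e.g. `requests.post(url, json=payload, timeout=get_outgoing_webhook_timeout())`.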

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)

---------

Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
Co-authored-by: Joey Orlando <joseph.t.orlando@gmail.com>
2024-02-01 10:48:09 -05:00
Innokentii Konstantinov
006d64a889
Add ability to migrate one org (#3809) 2024-02-01 18:25:36 +08:00
Innokentii Konstantinov
eb3f41c80f
Enable Grafana Alerting v2 integration (#3808)
Enables the Grafana Alerting v2 feature, which adds an integration
dropdown to the OnCall contact point in the Alerting UI.
In OSS, this also works if the user's Grafana version supports the
feature. With an older Grafana/OnCall version, nothing changes.

---------

Co-authored-by: Ildar Iskhakov <Ildar.iskhakov@grafana.com>
2024-02-01 16:20:47 +08:00
Innokentii Konstantinov
dc355dbf0f Fix grafana_labels sync 2024-02-01 13:43:32 +08:00
Michael Derynck
2a466a0c4f
Add transaction on_commit before signals for alert group actions (#3731)
# What this PR does
Add transactions around log record creation, and use transaction
on_commit so that signals (which pass the DB ID of the alert group log
record) are sent only after the transaction commits. For deletions, we
can then assume that any ID missing when a task runs belongs to an
intentionally deleted alert group, and stop the task from retrying
endlessly.
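Once signals only fire after commit, the task side reduces to treating a missing row as final rather than retryable. A minimal pure-Python sketch of that idea, not OnCall's actual task: a dict stands in for the log record table, and the function name is hypothetical.

```python
from typing import Dict


def handle_log_record(log_record_id: int, log_records: Dict[int, str]) -> str:
    """Process an alert group log record that a post-commit signal pointed at.

    `log_records` stands in for the database table (hypothetical).
    """
    record = log_records.get(log_record_id)
    if record is None:
        # The signal is only sent after the transaction commits, so the row
        # must have existed; if it is gone now, it was deleted on purpose.
        # Stop here instead of raising and letting the task retry forever.
        return "skipped: record deleted"
    return f"processed: {record}"
```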

## Which issue(s) this PR fixes

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-31 15:54:50 -07:00
Joey Orlando
14feaba3d1
Merge branch 'dev' of github.com:grafana/oncall into dev 2024-01-31 15:48:05 -05:00
Joey Orlando
bc0f51c071
update tests for going_oncall_notification 2024-01-31 15:48:00 -05:00
Michael Derynck
8427953fad
Fix Incident plugin status sync (#3802)
# What this PR does
- Handle the case where a key exists in jsonData but is explicitly set to None
- Disable Incident if the plugin is disabled afterwards, or if it has been
removed completely from the Grafana instance
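The jsonData bug class is easy to hit in Python: `dict.get(key, default)` only returns the default when the key is absent, so a key present with a null value still yields `None`. A minimal sketch, assuming plugin settings arrive as a dict with `jsonData` and `enabled` keys; both helper names are hypothetical.

```python
from typing import Any, Mapping, Optional


def json_data_of(plugin_settings: Mapping[str, Any]) -> dict:
    """Return the plugin's jsonData, treating an explicit None like a missing key."""
    # plugin_settings.get("jsonData", {}) would still return None when the key
    # is present with a null value, so coalesce with `or` instead.
    return plugin_settings.get("jsonData") or {}


def incident_should_be_enabled(plugin_settings: Optional[Mapping[str, Any]]) -> bool:
    """Hypothetical rule: disable the integration if the plugin is gone or disabled."""
    if plugin_settings is None:  # plugin removed from the Grafana instance
        return False
    return bool(plugin_settings.get("enabled"))
```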

## Which issue(s) this PR fixes

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-31 11:52:20 -07:00
Joey Orlando
758c12790d
fix slack API rate limit errors in send_message_to_thread_if_bot_not_in_channel task (#3803)
# What this PR does

See [this
conversation](https://raintank-corp.slack.com/archives/C04JCU51NF8/p1706722752735009)
for more context.

Additionally, improves logging for this task + adds unit tests.

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-31 13:42:52 -05:00
Matias Bordese
3795c836d1
Add transaction block and callbacks when triggering tasks (#3779)
Related to https://github.com/grafana/oncall/issues/3729
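To illustrate the pattern (not OnCall's actual code): in Django, `transaction.on_commit(func)` defers `func` until the surrounding atomic block commits, and it is commonly combined with `functools.partial` to bind task arguments up front. The hypothetical `ToyTransaction` class below mimics those semantics in plain Python so the ordering is easy to see.

```python
from functools import partial
from typing import Callable, List


class ToyTransaction:
    """Stand-in for a DB transaction that runs callbacks only after commit.

    Mimics the idea behind Django's transaction.on_commit; it is not Django.
    """

    def __init__(self) -> None:
        self._callbacks: List[Callable[[], None]] = []

    def on_commit(self, func: Callable[[], None]) -> None:
        self._callbacks.append(func)

    def __enter__(self) -> "ToyTransaction":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        callbacks, self._callbacks = self._callbacks, []
        if exc_type is None:
            # Committed: the rows are visible, so it is safe to enqueue tasks.
            for func in callbacks:
                func()
        # On rollback the callbacks are dropped and never run.
        return False  # let any exception propagate


queued: List[str] = []

with ToyTransaction() as txn:
    # Bind arguments now, run later: the same shape as on_commit(partial(...)).
    txn.on_commit(partial(queued.append, "perform_notification:42"))

try:
    with ToyTransaction() as txn:
        txn.on_commit(partial(queued.append, "never-sent"))
        raise RuntimeError("simulated failure before commit")
except RuntimeError:
    pass

print(queued)  # ['perform_notification:42']
```

The design point is that a task triggered this way can never observe a row that a rolled-back transaction would have created.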
2024-01-31 09:26:14 -05:00
Yulya Artyukhina
16ce0136f3
Refactor gaps and empty shift checks (#3785)
Refactor gaps and empty shift checks:
- Increase the frequency of the gap and empty shift checks
- Unify the gap and empty shift checks
2024-01-31 15:25:06 +01:00
Yulya Artyukhina
801f1ad028
Fix telegram connection check (#3794)
Fix the check for whether the user has a Telegram connection in the `get_telegram_verification_code` endpoint
2024-01-31 15:23:11 +01:00
Matias Bordese
52871b08e6
Fix interval validation when creating shift via public API (#3775)
Related to https://github.com/grafana/support-escalations/issues/9142.
2024-01-31 11:06:54 -03:00
Matias Bordese
390cbb6d6f
Fix list user serializer logic (#3793) 2024-01-31 10:13:08 -03:00
Joey Orlando
3833d8de56
remove manual alert group (/oncall) slack slash command + force_route_id (#3790)
# What this PR does

Related to [this
discussion](https://raintank-corp.slack.com/archives/C04JCU51NF8/p1706550226831949)

Removes the `/oncall` Slack slash command + the concept of
`force_route_id` (as this Slack slash command was the last piece of code
to use this concept
[here](https://github.com/grafana/oncall/blob/dev/engine/apps/slack/scenarios/manual_incident.py#L146))

## TODO before merging
- [x] update the various env's Slack apps to remove the slash command
from the app manifests

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-30 17:28:23 -05:00
Michael Derynck
84a92cc9d3
Add headers for ChatopsProxyAPIClient (#3789)
# What this PR does
Add necessary headers to ChatopsProxyAPIClient for dev environment
testing.

## Which issue(s) this PR fixes

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-30 11:57:20 -07:00
Joey Orlando
06933a696a
Support alert routing based on labels (#3778)
# What this PR does

This PR adds support for routing alerts based on labels.
https://www.loom.com/share/4401de6e3c4945d5b8961fe43ee373c9

Additionally:
- improve the typing around the `get_object` method that most of our models inherit from
[`PublicPrimaryKeyMixin.get_object`](https://github.com/grafana/oncall/blob/dev/engine/common/api_helpers/mixins.py#L153).
`PublicPrimaryKeyMixin` is generic, so it can be
more strongly typed when it is being subclassed, which results in better
typing of the `get_object` method in child classes
- I decided to do this because I started looking into this task via the
[`AlertReceiveChannelView.send_demo_alert`
method/endpoint](https://github.com/grafana/oncall/blob/dev/engine/apps/api/views/alert_receive_channel.py#L242).
Within that method, `instance` is not typed because the inherited
`get_object` method is not typed. I digress 😄
- improve typing around `Alert.create` and
`apps.integrations.tasks.create_alert` functions
- make `Alert.render_group_data` more DRY by extracting some logic out
into `Alert._apply_jinja_template_to_alert_payload_and_labels`
- deduplicate the logic of `value.strip().lower() in ["1", "true",
"ok"]` into a shared function,
`common.jinja_templater.apply_jinja_template.templated_value_is_truthy`
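The shared check is small enough to restate. A sketch based only on the expression quoted above; the signature is an assumption, and `pick_route` is a hypothetical helper showing how a route template's rendered output might be consumed when routing alerts.

```python
from typing import Optional, Sequence, Tuple


def templated_value_is_truthy(value: str) -> bool:
    """Interpret rendered Jinja output as a boolean, as the PR describes."""
    return value.strip().lower() in ["1", "true", "ok"]


def pick_route(rendered_routes: Sequence[Tuple[str, str]]) -> Optional[str]:
    """Return the first route ID whose rendered template is truthy (hypothetical)."""
    for route_id, rendered in rendered_routes:
        if templated_value_is_truthy(rendered):
            return route_id
    return None
```

Normalising whitespace and case matters because template output often carries trailing newlines or capitalised values such as "True".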

Closes https://github.com/grafana/oncall-private/issues/2490

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
- [x] Documentation added (or `pr:no public docs` PR label added if not
required) (will be done in #3762)
2024-01-30 13:07:19 -05:00
Ildar Iskhakov
a6680e5ac1
Merge hotfix 1.3.94 (#3784)
# What this PR does

## Which issue(s) this PR fixes

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)

---------

Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
Co-authored-by: GitHub Actions <actions@github.com>
Co-authored-by: Joey Orlando <joseph.t.orlando@gmail.com>
Co-authored-by: Vadim Stepanov <vadimkerr@gmail.com>
2024-01-30 18:33:22 +08:00
Ildar Iskhakov
401d279d54
Refactor create_alert task (#3759)
# What this PR does

This PR simplifies alert group/alert creation so that the alert is
created and escalation is started in the same task.

## Which issue(s) this PR fixes

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-30 08:39:04 +00:00
Innokentii Konstantinov
c58a81bbdf
Enable labels feature only if labels plugin is enabled (#3769)
# What this PR does
Adds a check so that the labels feature is enabled only if the labels
plugin is provisioned. This protects against reconciliation delays and
similar issues.

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-30 07:29:16 +00:00
Matias Bordese
65cdcf93ba
Add is_currently_oncall information to internal user details API (#3765)
Related to https://github.com/grafana/oncall/issues/3164
2024-01-29 17:41:20 +00:00
Yulya Artyukhina
e17bad4cdd
Fix calculating number of oncall users per team (#3773)
# What this PR does
Fixes the calculation of the number of on-call users per team for the
`team` API endpoint

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-29 14:32:03 +00:00
Joey Orlando
c5917f4d32
pass str(org_id) to cloud auth api 2024-01-26 14:38:16 -05:00
Joey Orlando
19686a9cd6
add some more logging to mobile app proxy 2024-01-26 10:48:35 -05:00
Joey Orlando
baddc64092
remove iat and exp from auth api token claims 2024-01-25 15:14:08 -05:00
Joey Orlando
4220199a86
cast X-Realms header to jsonified string
2024-01-25 14:41:35 -05:00
Joey Orlando
add6df570f
address improperly formatted cloud auth api request headers
2024-01-25 14:19:28 -05:00
Joey Orlando
2abcc4563a
mobile app proxy - request auth token from cloud auth api (#3748)
# What this PR does

Related to https://github.com/grafana/oncall-private/issues/2071

See [this
conversation](https://raintank-corp.slack.com/archives/C064R17Q1A8/p1706125615995019)
for all the context

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-25 13:46:55 -05:00
Yulya Artyukhina
e18dafa650
Fix routes and schedules public api endpoints (#3751)
# What this PR does
Add a check for whether the organization has a Slack connection when
Slack-related fields are updated via the public API endpoints
## Which issue(s) this PR fixes
https://github.com/grafana/oncall-private/issues/1611
## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-25 12:52:55 +00:00
Yulya Artyukhina
19cae8086e
Retry perform_notification with Telegram ratelimit countdown on RetryAfter error (#3744)
# What this PR does
Use the Telegram rate-limit countdown when retrying the
`perform_notification` task on a `RetryAfter` error
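A minimal sketch of deriving the retry countdown, assuming the Telegram client raises an exception that carries the server's suggested wait as an integer `retry_after` in seconds. The `RetryAfter` class here is a stand-in, and the fallback value is an assumption.

```python
FALLBACK_COUNTDOWN_SECONDS = 60  # assumed default when no hint is available


class RetryAfter(Exception):
    """Stand-in for the Telegram client's rate-limit error (hypothetical here)."""

    def __init__(self, retry_after: int) -> None:
        super().__init__(f"rate limited; retry in {retry_after}s")
        self.retry_after = retry_after


def retry_countdown(exc: Exception, fallback: int = FALLBACK_COUNTDOWN_SECONDS) -> int:
    """Seconds to wait before retrying: honour Telegram's hint when present."""
    if isinstance(exc, RetryAfter):
        # Never retry immediately, even if the hint is zero.
        return max(int(exc.retry_after), 1)
    return fallback
```

Inside a Celery task, this value would feed `self.retry(exc=e, countdown=retry_countdown(e))`, so the retry waits exactly as long as Telegram asked instead of a fixed delay.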
## Which issue(s) this PR fixes
https://github.com/grafana/oncall-private/issues/2451

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-24 15:31:56 +00:00
Innokentii Konstantinov
a6b1ccd416 Remove unused const 2024-01-24 16:32:58 +08:00
Michael Derynck
032ced6fd0
Add more logging to plugin sync and install (#3730)
# What this PR does
Add logging to the process of syncing the OnCall backend with Grafana to
help troubleshoot issues in self-hosted setups.


## Which issue(s) this PR fixes

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-23 22:59:33 +00:00
Matias Bordese
dbd5452a0b
Handle a possible outdated cached integration error (#3741)
Related to
[logs](https://ops.grafana-ops.net/explore?schemaVersion=1&panes=%7B%22hum%22:%7B%22datasource%22:%22c-R8UWvVk%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22amixr-prod%5C%22,%20job%3D%5C%22amixr-prod%2Famixr-integrations%5C%22%7D%20%7C%3D%20%5C%22django.core.serializers.base.DeserializationError%5C%22%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22c-R8UWvVk%22%7D,%22editorMode%22:%22code%22%7D%5D,%22range%22:%7B%22from%22:%221706023840486%22,%22to%22:%221706024722486%22%7D%7D%7D&orgId=1)
2024-01-23 20:46:12 +00:00
Kevin
d7ce341b34
Minor Fix to Format of REFRESH-INTERVAL in ical_utils.py (#3732)
# What this PR does
Minor formatting change to the suggested REFRESH-INTERVAL of iCal
exports. DURATION components smaller than 1d (hours, minutes, seconds)
must be prefixed by "T". This fixes an issue with Atlassian Confluence
failing to subscribe to iCal URLs from Grafana OnCall.

RFC 2445 explains this a bit more clearly in section 4.3.6 on DURATION. 
https://www.ietf.org/rfc/rfc2445.txt
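A sketch of building the value correctly, assuming the interval is held as a `datetime.timedelta`. The helper names are hypothetical, not the ones in `ical_utils.py`. For example, ten minutes must be written `PT10M`: without the `T`, `P10M` would read as ten months in ISO 8601 and is not a valid iCalendar DURATION at all.

```python
from datetime import timedelta


def format_ical_duration(delta: timedelta) -> str:
    """Format a positive timedelta as an iCalendar DURATION value (whole seconds)."""
    total = int(delta.total_seconds())
    if total <= 0:
        raise ValueError("duration must be at least one second")
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)

    out = "P"
    if days:
        out += f"{days}D"
    if hours or minutes or seconds:
        out += "T"  # hour/minute/second components must follow the "T" designator
        if hours:
            out += f"{hours}H"
        # In the grammar an hour component can only be followed by a minute
        # component, so emit an explicit 0M between hours and seconds.
        if minutes or (hours and seconds):
            out += f"{minutes}M"
        if seconds:
            out += f"{seconds}S"
    return out


def refresh_interval_line(delta: timedelta) -> str:
    return f"REFRESH-INTERVAL;VALUE=DURATION:{format_ical_duration(delta)}"


print(refresh_interval_line(timedelta(minutes=10)))  # REFRESH-INTERVAL;VALUE=DURATION:PT10M
```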

Obviously I wish Atlassian could just be a little more forgiving in
their digestion of the iCal data since clearly Gmail and others have no
problem with it, but I doubt I'm going to get much traction with them
(we do have a case open though).

## Which issue(s) this PR fixes
I haven't logged one yet, but I can if you want. Again, my main issue was
with Atlassian Confluence, which kept throwing the error "The uploaded data
does not seem to be iCalendar content" as long as the REFRESH-INTERVAL
DURATION was less than 1 day and lacked the "T" character.

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-23 11:14:59 -03:00
Vadim Stepanov
b5aa53d3d6
Alertmanager V2 migration prep (#3722)
# What this PR does

- Adds a Django management command and database fields required for the
Alertmanager V2 migration
- Adds a post-migration warning alert

<img width="1177" alt="Screenshot 2024-01-19 at 17 41 04"
src="https://github.com/grafana/oncall/assets/20116910/512ab22e-9a00-481e-883d-3dadfc95b587">


Related to https://github.com/grafana/oncall-private/issues/2260

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-23 10:36:58 +00:00
Innokentii Konstantinov
3a2cb99ac9 Another round of fixing chatops urls 2024-01-22 15:25:47 +08:00
Innokentii Konstantinov
89b6b06879
Update v3 telegram routes namespace (#3724) 2024-01-22 14:50:13 +08:00
Innokentii Konstantinov
f7df1ad5e7
Slack and telegram routes to test chatops-proxy v3 (#3723) 2024-01-22 13:48:19 +08:00
Innokentii Konstantinov
4a02d83fd1
Chatops api v3 (#3721)
This PR makes OnCall compatible with chatops-proxy v3. When CHATOPS_V3
is enabled, OnCall uses the new API client to register tenants and Slack
installations. I also added v3 routes for Slack and Telegram so that the
new chatops-proxy can be tested.

Currently, two versions of the chatops-proxy API are deployed, and they
are not compatible: they do the same thing but use different DB models
and tables. Once only the v3 version is left in prod, I'll remove the
CHATOPS_V3 env var, all leftovers of the previous API client, and the v3
Slack and Telegram routes.
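A minimal sketch of the toggle described above, assuming the flag is read from the environment and a small factory picks the client. Both client classes and the accepted truthy spellings are hypothetical stand-ins, not OnCall's actual classes.

```python
import os
from typing import Mapping, Union


class LegacyChatopsProxyClient:
    """Hypothetical stand-in for the previous chatops-proxy API client."""

    name = "legacy"


class ChatopsProxyV3Client:
    """Hypothetical stand-in for the new chatops-proxy v3 API client."""

    name = "v3"


def env_flag(environ: Mapping[str, str], key: str) -> bool:
    # Assumed truthy spellings; the real settings parsing may differ.
    return environ.get(key, "").strip().lower() in {"1", "true", "yes"}


def get_chatops_client(
    environ: Mapping[str, str] = os.environ,
) -> Union[LegacyChatopsProxyClient, ChatopsProxyV3Client]:
    """Pick the client used to register tenants and Slack installations."""
    if env_flag(environ, "CHATOPS_V3"):
        return ChatopsProxyV3Client()
    return LegacyChatopsProxyClient()
```

Keeping the switch in one factory means that removing CHATOPS_V3 later, as planned above, touches a single function.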

---------

Co-authored-by: Vadim Stepanov <vadimkerr@gmail.com>
2024-01-20 06:56:17 +00:00
Joey Orlando
ddd71a81d7
Merge branch 'dev' of github.com:grafana/oncall into dev 2024-01-18 09:10:04 -05:00
Joey Orlando
f27aa48dcb
address typo in partial passed to transaction.on_commit 2024-01-18 09:09:48 -05:00
Yulya Artyukhina
40c964c7b7
Speed up send email notification task (#3713)
# What this PR does
Removes unnecessary filtering by organization during emails limit check
in send email notification task since there is filtering by user there,
so there is no need to check organization
## Which issue(s) this PR fixes
https://github.com/grafana/oncall-private/issues/2205
## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-18 13:54:18 +00:00
Joey Orlando
909aacd8b8
change perform_notification.apply_async transaction.on_commit back to using partial
2024-01-18 07:58:21 -05:00
Joey Orlando
16b648bd15
fix infinitely retrying apps.alerts.tasks.notify_user.perform_notification task (#3708)
# Which issue(s) this PR fixes

Closes https://github.com/grafana/oncall-private/issues/2318

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-18 07:07:01 -05:00
Joey Orlando
969c28c232
modify mobile app proxy gateway headers + request body (#3707)
# What this PR does

Some more necessary changes after local testing with the Grafana Incident team

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-17 16:29:47 -05:00