1364 commits

Author SHA1 Message Date
Yulya Artyukhina
e17bad4cdd
Fix calculating number of oncall users per team (#3773)
# What this PR does
Fixes the calculation of the number of on-call users per team for the
`team` API endpoint

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-29 14:32:03 +00:00
Joey Orlando
c5917f4d32
pass str(org_id) to cloud auth api 2024-01-26 14:38:16 -05:00
Joey Orlando
19686a9cd6
add some more logging to mobile app proxy 2024-01-26 10:48:35 -05:00
Joey Orlando
baddc64092
remove iat and exp from auth api token claims 2024-01-25 15:14:08 -05:00
Joey Orlando
4220199a86
cast X-Realms header to jsonified string 2024-01-25 14:41:35 -05:00
Joey Orlando
add6df570f
address improperly formatted cloud auth api request headers 2024-01-25 14:19:28 -05:00
Joey Orlando
2abcc4563a
mobile app proxy - request auth token from cloud auth api (#3748)
# What this PR does

Related to https://github.com/grafana/oncall-private/issues/2071

See [this
conversation](https://raintank-corp.slack.com/archives/C064R17Q1A8/p1706125615995019)
for all the context

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-25 13:46:55 -05:00
Yulya Artyukhina
e18dafa650
Fix routes and schedules public api endpoints (#3751)
# What this PR does
Adds a check for whether the organization has a Slack connection when
updating Slack-related fields through the public API endpoints
## Which issue(s) this PR fixes
https://github.com/grafana/oncall-private/issues/1611
## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-25 12:52:55 +00:00
Yulya Artyukhina
19cae8086e
Retry perform_notification with Telegram ratelimit countdown on RetryAfter error (#3744)
# What this PR does
Uses the Telegram rate-limit countdown when retrying the
`perform_notification` task on a `RetryAfter` error
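
A minimal sketch of the idea, not the code from this PR. It assumes the error exposes the server-provided wait time as `retry_after` in seconds, as `telegram.error.RetryAfter` does. The `RetryAfter` class below is a stand-in so the example runs without dependencies, and `retry_countdown` and `DEFAULT_COUNTDOWN` are hypothetical names.

```python
class RetryAfter(Exception):
    """Stand-in for telegram.error.RetryAfter; carries the wait time in seconds."""

    def __init__(self, retry_after: float):
        super().__init__(f"Flood control exceeded. Retry in {retry_after} seconds")
        self.retry_after = retry_after


DEFAULT_COUNTDOWN = 10.0  # hypothetical fallback delay, in seconds


def retry_countdown(exc: Exception, default: float = DEFAULT_COUNTDOWN) -> float:
    """Pick the delay before the next attempt.

    If the error carries a server-provided wait time, honour it;
    otherwise fall back to the default delay.
    """
    if isinstance(exc, RetryAfter):
        return max(float(exc.retry_after), 0.0)
    return default
```

In a Celery task, a value like this would typically be passed as the `countdown` argument of `self.retry(...)`.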
## Which issue(s) this PR fixes
https://github.com/grafana/oncall-private/issues/2451

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-24 15:31:56 +00:00
Innokentii Konstantinov
a6b1ccd416
Remove unused const 2024-01-24 16:32:58 +08:00
Michael Derynck
032ced6fd0
Add more logging to plugin sync and install (#3730)
# What this PR does
Adds logging to the process that syncs the OnCall backend with Grafana,
to help troubleshoot issues in self-hosted setups.


## Which issue(s) this PR fixes

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-23 22:59:33 +00:00
Matias Bordese
dbd5452a0b
Handle a possible outdated cached integration error (#3741)
Related to
[logs](https://ops.grafana-ops.net/explore?schemaVersion=1&panes=%7B%22hum%22:%7B%22datasource%22:%22c-R8UWvVk%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22amixr-prod%5C%22,%20job%3D%5C%22amixr-prod%2Famixr-integrations%5C%22%7D%20%7C%3D%20%5C%22django.core.serializers.base.DeserializationError%5C%22%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22c-R8UWvVk%22%7D,%22editorMode%22:%22code%22%7D%5D,%22range%22:%7B%22from%22:%221706023840486%22,%22to%22:%221706024722486%22%7D%7D%7D&orgId=1)
2024-01-23 20:46:12 +00:00
Kevin
d7ce341b34
Minor Fix to Format of REFRESH-INTERVAL in ical_utils.py (#3732)
# What this PR does
Minor formatting change to the suggested REFRESH-INTERVAL of iCal
exports. DURATION components smaller than one day (hours, minutes,
seconds) must be prefixed by "T". Fixes an issue with Atlassian
Confluence failing to subscribe to iCal URLs from Grafana OnCall.

RFC 2445 explains this a bit more clearly in section 4.3.6 on DURATION. 
https://www.ietf.org/rfc/rfc2445.txt

Obviously I wish Atlassian could just be a little more forgiving in
their digestion of the iCal data since clearly Gmail and others have no
problem with it, but I doubt I'm going to get much traction with them
(we do have a case open though).

## Which issue(s) this PR fixes
I haven't logged one yet but I can if you want. Again, my main issue was
with Atlassian Confluence. It kept throwing the error "The uploaded data
does not seem to be iCalendar content" as long as the REFRESH-INTERVAL
DURATION was less than 1 day and lacked the "T" character.
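
A minimal sketch of the formatting rule described above, assuming a hypothetical helper named `format_ical_duration` (this is not the function in `ical_utils.py`). Per the RFC 2445 grammar, a duration is `P`, an optional day part, then `T` before any hour, minute, or second component.

```python
from datetime import timedelta


def format_ical_duration(delta: timedelta) -> str:
    """Format a non-negative timedelta as an RFC 2445 DURATION value.

    Hypothetical helper for illustration; sub-second precision is dropped.
    """
    total_seconds = int(delta.total_seconds())
    if total_seconds < 0:
        raise ValueError("negative durations are not handled in this sketch")

    days, remainder = divmod(total_seconds, 86400)
    hours, remainder = divmod(remainder, 3600)
    minutes, seconds = divmod(remainder, 60)

    date_part = f"{days}D" if days else ""
    time_part = ""
    if hours:
        time_part += f"{hours}H"
    # The grammar nests seconds under minutes, so emit "0M" between H and S.
    if minutes or (hours and seconds):
        time_part += f"{minutes}M"
    if seconds:
        time_part += f"{seconds}S"

    if not date_part and not time_part:
        return "PT0S"  # a zero duration still needs one component
    # The "T" separator is required before any time-of-day component.
    return "P" + date_part + ("T" + time_part if time_part else "")
```

With this sketch, a ten-minute interval is written as `PT10M` rather than `P10M`, which is the distinction the PR describes.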

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-23 11:14:59 -03:00
Vadim Stepanov
b5aa53d3d6
Alertmanager V2 migration prep (#3722)
# What this PR does

- Adds a Django management command and database fields required for the
Alertmanager V2 migration
- Adds a post-migration warning alert

<img width="1177" alt="Screenshot 2024-01-19 at 17 41 04"
src="https://github.com/grafana/oncall/assets/20116910/512ab22e-9a00-481e-883d-3dadfc95b587">


Related to https://github.com/grafana/oncall-private/issues/2260

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-23 10:36:58 +00:00
Innokentii Konstantinov
3a2cb99ac9
Another round of fixing chatops urls 2024-01-22 15:25:47 +08:00
Innokentii Konstantinov
89b6b06879
Update v3 telegram routes namespace (#3724) 2024-01-22 14:50:13 +08:00
Innokentii Konstantinov
f7df1ad5e7
Slack and telegram routes to test chatops-proxy v3 (#3723) 2024-01-22 13:48:19 +08:00
Innokentii Konstantinov
4a02d83fd1
Chatops api v3 (#3721)
This PR makes OnCall compatible with chatops-proxy v3. When CHATOPS_V3
is enabled, OnCall uses a new API client to register tenants and Slack
installations. I also added v3 routes for Slack and Telegram, so it's
possible to test the new chatops-proxy.

Two versions of the chatops-proxy API are currently deployed, but they
are not compatible: they do the same thing using different DB models and
tables. Once only v3 is left in prod, I'll remove the CHATOPS_V3 env var,
all leftovers of the previous API client, and the v3 Slack and Telegram
routes.

---------

Co-authored-by: Vadim Stepanov <vadimkerr@gmail.com>
2024-01-20 06:56:17 +00:00
Joey Orlando
ddd71a81d7
Merge branch 'dev' of github.com:grafana/oncall into dev 2024-01-18 09:10:04 -05:00
Joey Orlando
f27aa48dcb
address typo in partial passed to transaction.on_commit 2024-01-18 09:09:48 -05:00
Yulya Artyukhina
40c964c7b7
Speed up send email notification task (#3713)
# What this PR does
Removes unnecessary filtering by organization during the email limit
check in the send email notification task. The check already filters by
user, so filtering by organization is redundant
## Which issue(s) this PR fixes
https://github.com/grafana/oncall-private/issues/2205
## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-18 13:54:18 +00:00
Joey Orlando
909aacd8b8
change perform_notification.apply_async transaction.on_commit back to using partial 2024-01-18 07:58:21 -05:00
Joey Orlando
16b648bd15
fix infinitely retrying apps.alerts.tasks.notify_user.perform_notification task (#3708)
# Which issue(s) this PR fixes

Closes https://github.com/grafana/oncall-private/issues/2318

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-18 07:07:01 -05:00
Joey Orlando
969c28c232
modify mobile app proxy gateway headers + request body (#3707)
# What this PR does

Some more necessary changes after local testing w/ Grafana Incident team

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-17 16:29:47 -05:00
Matias Bordese
2fd456fc77
Update alert group personal notifications checker to check sent SMS (#3698)
Sent SMS messages are considered completed for our purpose here (i.e. we
do not wait for Twilio's delivered confirmation).
2024-01-17 17:46:18 +00:00
Matias Bordese
c99788e9d2
Update schedule on-call cache on scheduled refresh tasks (#3699)
Related to https://github.com/grafana/oncall/issues/3673
Keep the cache up to date on every schedule refresh task run (which
should keep the cache populated at all times), helping any call that uses
cached information (particularly building the direct paging Slack dialog).
2024-01-17 16:30:11 +00:00
Joey Orlando
31ffa805cc
update mobile app gateway jwt to include user email (#3704)
# What this PR does

Changes as requested by Grafana Incident team

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-17 10:47:18 -05:00
Matias Bordese
0a077ccfdb
Update and refactor users API team filter (#3703)
This should hopefully fix the lint issue
[here](https://drone.grafana.net/grafana/oncall/3361/1/7)
2024-01-17 15:18:08 +00:00
Innokentii Konstantinov
36d2c3bdb7
Adds new templates cheatsheets (#3643)
Co-authored-by: Maxim Mordasov <maxim.mordasov@grafana.com>
2024-01-17 13:49:36 +00:00
Yulya Artyukhina
c7895c2308
Fix post message to slack channel (#3701)
# What this PR does
Extends the list of exceptions to ignore when posting a message to a Slack channel

## Which issue(s) this PR fixes
https://github.com/grafana/oncall/issues/3694

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-17 13:05:36 +00:00
Vadim Stepanov
6c248ed1c8
Fix posting Slack message when route is deleted (#3702)
# What this PR does

Fixes https://github.com/grafana/oncall/issues/3646

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-17 13:00:25 +00:00
Joey Orlando
f85cc6d33b
add more logging on celery task retry (#3695)
# What this PR does

This is a follow up to https://github.com/grafana/oncall/pull/3677.

It appears that when a task uses the [`autoretry_for`
kwarg](https://docs.celeryq.dev/en/stable/userguide/tasks.html#automatic-retry-for-known-exceptions)
in the task decorator, it doesn't log the exception in `on_failure` as
would be expected. Now when retrying, we log out a message + any
exception/stack trace information.

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-16 07:13:16 -05:00
Vadim Stepanov
80f85cf4b4
Fix updating a shift swap with no Slack message (#3686)
# What this PR does

Fixes https://github.com/grafana/oncall/issues/3648

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)

---------

Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
2024-01-15 17:36:01 +00:00
Joey Orlando
da7f07ffd6
Fix occasional AttributeError in apps.grafana_plugin.tasks.sync.sync_organization_async task (#3687)
# Which issue(s) this PR fixes

Fix this issue I came across in a celery task retry exception log:
![Screenshot 2024-01-15 at 11 21
13](https://github.com/grafana/oncall/assets/9406895/ed08f2f1-dc7d-4ad3-88a0-dc02cd740582)


## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-15 11:34:40 -05:00
Vadim Stepanov
cc071806f3
disable DRF_SPECTACULAR_ENABLED by default 2024-01-15 16:06:46 +00:00
Joey Orlando
4036ced9b9
add LogExceptionOnFailureTask celery task class (#3677)
# What this PR does

Closes https://github.com/grafana/oncall-private/issues/2449

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-12 21:31:01 +00:00
Vadim Stepanov
d0904ca405
Improve OpenAPI schema coverage (#3629)
# What this PR does

Improves OpenAPI schema coverage for internal API:

- Fixes/Improves `alert group` and `feature` endpoints
- Adds `integration` and `user` endpoints

## Which issue(s) this PR fixes

https://github.com/grafana/oncall/issues/3444

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-12 15:11:22 +00:00
Matias Bordese
8656404598
Fix oncall_now for a schedule in orgs with multiple entries (#3671)
Fixes https://github.com/grafana/oncall/issues/3626
2024-01-12 14:46:13 +00:00
Yulya Artyukhina
d6a232ba8b
Add missing notification log records (#3664)
Related to https://github.com/grafana/oncall-private/issues/2347
2024-01-12 14:02:44 +00:00
Michael Derynck
d49af63d75
Fix unicode character encoding in JSON for webhooks (#3670)
# What this PR does
Fixes escaping for unicode characters in webhooks.
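
As a hedged illustration of the kind of escaping involved (the actual change in this PR may differ): Python's `json.dumps` escapes non-ASCII characters as `\uXXXX` sequences by default, and `ensure_ascii=False` keeps them as written. The payload below is made up.

```python
import json

payload = {"title": "Übung ✓"}  # made-up example payload

escaped = json.dumps(payload)                       # default: ASCII-only output
readable = json.dumps(payload, ensure_ascii=False)  # keep Unicode characters

# Both strings decode back to the same data; only the wire form differs.
assert json.loads(escaped) == json.loads(readable) == payload
```

Either form is valid JSON. The difference matters mainly when a receiver displays or compares the raw body without decoding it.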

## Which issue(s) this PR fixes
#3149 

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-11 19:35:23 +00:00
Vadim Stepanov
8b7ffad598
Add team filter for users endpoint (#3666)
# What this PR does

Adds `team` filter for `users` endpoint

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-11 15:03:54 +00:00
Matias Bordese
4e2e7e0a15
Add task logging personal notifications triggered/completed counts (#3638)
Related to https://github.com/grafana/oncall-private/issues/2347
2024-01-10 18:54:27 +00:00
Yulya Artyukhina
c947f8992e
Add endpoint for alert group escalation snapshot (#3615)
# What this PR does
Adds endpoint for alert group escalation snapshot

## Which issue(s) this PR fixes
https://github.com/grafana/oncall/issues/3277

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-10 14:52:59 +00:00
Yulya Artyukhina
a7d441647e
Add stack slug to /organization endpoint response (#3644)
# What this PR does
Add stack slug to /organization endpoint response

## Which issue(s) this PR fixes
https://github.com/grafana/oncall-private/issues/2444
## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-10 12:29:43 +00:00
Joey Orlando
f20aa75869
Fix module 'apps.schedules.tasks.notify_about_empty_shifts_in_schedule' has no attribute 'apply_async' AttributeError (#3640)
# Which issue(s) this PR fixes

We've been seeing this `AttributeError` quite frequently for quite some
time
([logs](https://ops.grafana-ops.net/explore?schemaVersion=1&panes=%7B%22oPl%22:%7B%22datasource%22:%22000000193%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bcluster%3D~%5C%22prod-%28eu-west-0%7Cus-central-0%29%5C%22,%20namespace%3D%5C%22amixr-prod%5C%22%7D%20%7C%3D%20%60AttributeError%28%5C%22module%20%27apps.schedules.tasks.notify_about_empty_shifts_in_schedule%27%20has%20no%20attribute%20%27apply_async%27%5C%22%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22000000193%22%7D,%22editorMode%22:%22code%22%7D%5D,%22range%22:%7B%22from%22:%22now-7d%22,%22to%22:%22now%22%7D%7D%7D&orgId=1))

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-10 06:22:44 -05:00
Joey Orlando
006ee4b860
Decrease outgoing webhook timeouts from 10secs to 4secs (#3639)
# Which issue(s) this PR fixes

See all the context
[here](https://raintank-corp.slack.com/archives/C025VMT6SPK/p1704802171131009?thread_ts=1704762857.043879&cid=C025VMT6SPK)

<img width="690" alt="Screenshot 2024-01-09 at 15 26 33"
src="https://github.com/grafana/oncall/assets/9406895/e4c794a3-508d-4f24-af22-0f800828271d">


## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-09 19:55:39 -05:00
Joey Orlando
4cc4099710
Address Telegram HTTP 500s when receiving message from Telegram in discussion group (#3622)
# Which issue(s) this PR fixes

Closes https://github.com/grafana/oncall/issues/3621

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-09 08:31:56 -05:00
Joey Orlando
72e7224ad3
do not retry firebase.messaging.UnregisteredError exceptions for FCM relay tasks (#3637)
# What this PR does

_tldr_; we had a lengthy discussion about this
[here](https://raintank-corp.slack.com/archives/C04JCU51NF8/p1701893410542629?thread_ts=1701690117.016909&cid=C04JCU51NF8).
`firebase.messaging.UnregisteredError` errors occur because of events
outside of our control and retrying will never fix them, therefore we
should simply skip retrying in this case.
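
A minimal sketch of the skip-retry pattern described above, not the code from this PR. `UnregisteredError` here is a stand-in class so the example runs without Firebase installed, and `run_with_retries`, `send`, and `max_retries` are hypothetical names.

```python
class UnregisteredError(Exception):
    """Stand-in for the Firebase error named in the description."""


# Assumption for this sketch: only this error type is treated as permanent.
NON_RETRYABLE = (UnregisteredError,)


def run_with_retries(send, max_retries: int = 3) -> int:
    """Call `send()`, retrying on failure unless the error is permanent.

    Returns the number of attempts made. Re-raises the last error once
    `max_retries` retries have been used up.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            send()
            return attempts
        except NON_RETRYABLE:
            # Retrying can never fix this, so give up immediately.
            return attempts
        except Exception:
            if attempts > max_retries:
                raise
```

A permanent error costs one attempt instead of the full retry budget, which is where the worker savings come from.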

We retry these fairly often
([logs](https://ops.grafana-ops.net/explore?schemaVersion=1&panes=%7B%22iWZ%22:%7B%22datasource%22:%22000000193%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%23%20%7Bcluster%3D~%5C%22prod-%28eu-west-0%7Cus-central-0%29%5C%22,%20namespace%3D%5C%22amixr-prod%5C%22%7D%20%7C%3D%20%5C%22task_name%3Dapps.webhooks.tasks.trigger_webhook.execute_webhook%5C%22%20%7C%3D%20%5C%22retry%5C%22%5Cn%7Bcluster%3D~%5C%22prod-%28eu-west-0%7Cus-central-0%29%5C%22,%20namespace%3D%5C%22amixr-prod%5C%22%7D%20%7C%3D%20%5C%22apps.mobile_app.fcm_relay.fcm_relay_async%5C%22%20%7C%3D%20%5C%22UnregisteredError%5C%22%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22000000193%22%7D,%22editorMode%22:%22code%22%7D%5D,%22range%22:%7B%22from%22:%22now-7d%22,%22to%22:%22now%22%7D%7D%7D&orgId=1))
which needlessly consumes Celery worker resources.

Related to https://github.com/grafana/oncall-private/issues/1820

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-09 08:14:20 -05:00
Joey Orlando
3bcf5efc24
manually retry for requests.exceptions.Timeout exceptions when sending outgoing webhooks (#3632)
# Which issue(s) this PR fixes

Fixes https://github.com/grafana/oncall-private/issues/2439

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-01-08 19:13:15 -05:00
Matias Bordese
d57b41b758
Create log record for telegram formatting error in notification (#3628) 2024-01-08 20:12:28 +00:00