When checking current shifts to trigger slack notifications, a stable
value for timezone should be used (to avoid triggering multiple
duplicated notifications; using the first `VTIMEZONE` value available
may change on refresh, besides being potentially incorrect).
Also, `VTIMEZONE` shouldn't be used as the calendar timezone, it just
describes a [timezone
definition](https://icalendar.org/iCalendar-RFC-5545/3-6-5-time-zone-component.html)
which can later be [used when specifying a start/end
date/datetime](https://icalendar.org/iCalendar-RFC-5545/3-2-19-time-zone-identifier.html)
("The parameter MUST be specified on properties with a DATE-TIME value
if the DATE-TIME is not either a UTC or a "floating" time.").
OTOH, Google uses the non-standard[ `X-WR-TIMEZONE`
](https://github.com/collective/icalendar/issues/343)field as fallback
TZ for date-times without a `TZID`. Try to use this when available,
otherwise fallback to `UTC`.
Fixes an issue where if a sender is defined for a certain country code
and that sender belongs to a different account than the default from the
environment the status callback would get a 443 response from OnCall
back to Twilio. Checking permission on status callbacks will now use the
same account as was used to send the original message, verify, or make
the call.
This will vastly improve troubleshooting when an unexpected error
occurs.
# What this PR does
Enables logging of the exception that might be thrown by
SocialAuthExceptionMiddleware. It vastly improves the speed and accuracy
of troubleshooting requests to external applications with the aim to
authenticate, since there is usually no response to look at.
## Which issue(s) this PR fixes
Closes [#2035](https://github.com/grafana/oncall/issues/2035)
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
---------
Co-authored-by: Innokentii Konstantinov <innokenty.konstantinov@grafana.com>
# What this PR does
Fix HTTP 500 when sending demo alert for inbound integration email.
- Make all integration configs have consistent `is_demo_alert_enabled`
and `example_payload` values
- Add tests
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
Before this change, a diff ical check (which happens with frequency with
imported ical), particularly with overrides in an API/terraform schedule
would trigger unexpected slack notifications because the prev vs current
ical comparison will flag a diff, but when comparing current and
previous shifts, `current_shifts` will have the shift in progress while
the `prev_shifts` calculated from the overrides-only diff will most of
the time be empty (unless you set/change an override at current time).
Simplified the checks to always compare previous current shifts (ie. the
ones in the schedule from the DB) vs the recalculated ones using the
(refreshed) ical data from the schedule.
# What this PR does
Reworks Slack handlers for buttons and select menus for AG Slack
messages.
<img width="602" alt="Screenshot 2023-05-31 at 19 34 05"
src="https://github.com/grafana/oncall/assets/20116910/857bf096-7bdd-427b-94b6-15aad873a8ac">
## Current implementation
- It's possible to end up with orphaned Slack messages that are posted
to Slack but have no `SlackMessage` instance in the DB. For such
messages, clicking buttons will result in an exception and HTTP 500. See
private repo
[issue](https://github.com/grafana/oncall-private/issues/1841) for more
info.
- Bug in authorization system, which effectively bypasses any permission
checks. For example, it's possible to resolve an alert group while being
a Viewer.
- No tests covering most buttons.
## Changes in this PR
- Make the system more robust, don't use `SlackMessage` model to figure
out the alert group being interacted on, instead embed `alert_group_pk`
to every button and use it when receiving interaction requests from
Slack.
- Existing orphaned Slack messages will be repaired. Clicking buttons
under orphaned messages will work (and missing `SlackMessage` instance
will be created on interaction). This is possible because some buttons
already have `alert_group_pk` embedded, and it's possible to get this
data on button clicks (even if the clicked button itself doesn't have
`alert_group_pk` embedded).
- Fix authorization. Show warning window when unauthorized:
<img width="511" alt="Screenshot 2023-05-31 at 19 40 02"
src="https://github.com/grafana/oncall/assets/20116910/5abeeaa7-1b61-4a47-b3af-0e21d5cd1907">
- Added tests for all the buttons under AG message. Add tests checking
authorization, actual execution of scenario steps, orphan message
repairing, backward compatibility, etc. Also add tests on
`AlertGroupSlackRenderer` checking that correct data is embedded into
buttons.
- Cosmetic changes such as renaming `incident` to `Alert Group`.
## Which issue(s) this PR fixes
Related to https://github.com/grafana/oncall-private/issues/1841
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Reduce number of alert groups returned for grouping on slack request to
20 to avoid event trigger expiry.
## Which issue(s) this PR fixes
https://github.com/grafana/oncall-private/issues/1835
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
---------
Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
# What this PR does
## Which issue(s) this PR fixes
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
---------
Co-authored-by: Yulia Shanyrova <yulia.shanyrova@grafana.com>
Fixes issue when syncing contact points and there are receiver configs
with no `grafana_managed_receiver_configs` key.
(eg. `{"name": "autogen-contact-point-default"}`)
# What this PR does
## Which issue(s) this PR fixes
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
---------
Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
Adds a ratelimit for AmazonSNS.
AlertChannelDefining mixin is now injecting alert_receive_channel only
in request, not in kwargs to not to break AmazonSNS.
Many countries are introducing different requirements for SMS senders to
register and/or use alpha numeric ids, short codes or regional numbers
or face being blocked. The changes in this PR will give us more
flexibility by allowing us to map to different resources in Twilio based
on the phone number we are trying to reach. For this first
implementation the selection is made based on country code of the
recipient. Verification and phone calls were given the same treatment
although the immediate need is for SMS. Senders with no country code set
can be used as catch-all defaults. This also falls back to the
configured live settings/environment variables if not configured.
Possible future additions:
- Move through list of trying multiple senders before failing
notification
- Easily expanded to allow per-organization or per-user resources to let
users and tenants configure their own Twilio
- Add UI + replace live settings so users can configure their own
settings
- More selection criteria if needed
TODO:
- [x] Add+Fix Tests
- [x] Verify changes are compatible with #1713
# What this PR does
In trying to solve
https://github.com/grafana/oncall-private/issues/1836, it is very
difficult to understand the root cause without seeing the event payload.
This PR will log this out.
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated (N/A)
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required) (N/A)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required) (N/A)
Make sure the final schedule is refreshed after dropping the cached ical
representations (sometimes the refresh final task was completed before
the cached ical files were refreshed).
# What this PR does
Handle HTTP 500 error when attempting to update resolution note modal
window that was already closed by user.
## Which issue(s) this PR fixes
Related to https://github.com/grafana/oncall-private/issues/1834
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# Which issue(s) this PR fixes
when running the mobile app (emulator) + OnCall locally and trying to
trigger "You're Going OnCall" push notifications, I was seeing this in
the `celery` logs:
```bash
2023-05-24 21:39:54,032 source=engine:celery worker=ForkPoolWorker-3 task_id=cf9e5b52-a213-430a-8e3c-d6c3bed53318 task_name=apps.mobile_app.tasks.conditionally_send_going_oncall_push_notifications_for_schedule name=celery.app.trace level=INFO Task apps.mobile_app.tasks.conditionally_send_going_oncall_push_notifications_for_schedule[cf9e5b52-a213-430a-8e3c-d6c3bed53318] retry: Retry in 2s: NameError("name 'MobileAppUserSettings' is not defined")
```
This PR patches that by adding the import (inside the relevant function,
to avoid circular imports). After adding this import, I am seeing push
notifications being sent successfully:
```bash
2023-05-24 21:44:08,910 source=engine:celery worker=ForkPoolWorker-3 task_id=71a708b5-9982-4b71-b719-17ed5867dfe1 task_name=apps.mobile_app.tasks.conditionally_send_going_oncall_push_notifications_for_schedule name=apps.mobile_app.tasks level=INFO Evaluating if we should send push notification for schedule 1 for user UWZ6FR5T2KG7U
2023-05-24 21:44:08,912 source=engine:celery worker=ForkPoolWorker-3 task_id=71a708b5-9982-4b71-b719-17ed5867dfe1 task_name=apps.mobile_app.tasks.conditionally_send_going_oncall_push_notifications_for_schedule name=apps.mobile_app.tasks level=INFO timing is right to send going oncall push notification
seconds_until_shift_starts: 476
user_notification_timing_preference: 43200
timing_window_lower: 42780
timing_window_upper: 43620
shift_starts_within_users_notification_timing_preference: False
shift_starts_within_fifteen_minutes: True
2023-05-24 21:44:08,916 source=engine:celery worker=ForkPoolWorker-3 task_id=71a708b5-9982-4b71-b719-17ed5867dfe1 task_name=apps.mobile_app.tasks.conditionally_send_going_oncall_push_notifications_for_schedule name=apps.mobile_app.tasks level=DEBUG Sending push notification with message: {"android": {"priority": "high"}, "apns": {"headers": {"apns-priority": "10"}, "payload": {"aps": {"alert": {"title": "You are going on call in 7 minutes for schedule joey test"}, "interruption-level": "time-sensitive", "sound": {"name": "default_sound.aiff"}, "thread-id": "SZM7GDPI2VI3F:UWZ6FR5T2KG7U:going-oncall"}}}, "data": {"info_notification_sound_name": "default_sound.mp3", "info_notification_volume": "0.8", "info_notification_volume_override": "false", "info_notification_volume_type": "constant", "thread_id": "SZM7GDPI2VI3F:UWZ6FR5T2KG7U:going-oncall", "title": "You are going on call in 7 minutes for schedule joey test", "type": "oncall.info"}, "token": "dqWWqPS8SvOno1TEE_ZBlX:APA91bHW3hB2sXfKHxxrZ6BITyju3gzBfOHyh1drqndc1U8_b-F89JIfPEsaZvXL-uQd0vpJA8LHifEUCZKb_frk-wbTAwbgk92_0a1DvUKdgNcntK-O85MUDRuf6bWhE9NRGIv58tt5"}
```
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
## Which issue(s) this PR fixes
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
## Which issue(s) this PR fixes
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
This PR moves phone notification logic into separate object PhoneBackend
and introduces PhoneProvider interface to hide actual implementation of
external phone services provider. It should allow add new phone
providers just by implementing one class (See SimplePhoneProvider for
example).
# Why
[Asterisk PR](https://github.com/grafana/oncall/pull/1282) showed that
our phone notification system is not flexible. However this is one of
the most frequent community questions - how to add "X" phone provider.
Also, this refactoring move us one step closer to unifying all
notification backends, since with PhoneBackend all phone notification
logic is collected in one place and independent from concrete
realisation.
# Highligts
1. PhoneBackend object - contains all phone notifications business
logic.
2. PhoneProvider - interface to external phone services provider.
3. TwilioPhoneProvider and SimplePhoneProvider - two examples of
PhoneProvider implementation.
4. PhoneCallRecord and SMSRecord models. I introduced these models to
keep phone notification limits logic decoupled from external providers.
Existing TwilioPhoneCall and TwilioSMS objects will be migrated to the
new table to not to reset limits counter. To be able to receive status
callbacks and gather from Twilio TwilioPhoneCall and TwilioSMS still
exists, but they are linked to PhoneCallRecord and SMSRecord via fk, to
not to leat twilio logic into core code.
---------
Co-authored-by: Yulia Shanyrova <yulia.shanyrova@grafana.com>
# What this PR does
Sometimes `CustomButtonView` returns HTTP 500 with the following error:
```
apps.alerts.models.custom_button.CustomButton.MultipleObjectsReturned: get() returned more than one CustomButton -- it returned 3!
```
This PR fixes it by adding `.distinct()` to the `CustomButton` queryset
when retrieving an instance + does the same for `WebhooksView`.
## Which issue(s) this PR fixes
Related to https://github.com/grafana/oncall-private/issues/1828
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Handle different failing authentication scenarios (e.g. when token is
invalid or instance context is not a valid JSON) so endpoints return
appropriate response code (401 instead of 500).
## Which issue(s) this PR fixes
Related to https://github.com/grafana/oncall-private/issues/1633
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Fixes https://github.com/grafana/oncall/issues/1960.
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Fix inbound email endpoint bug when attaching files to email leads to
HTTP 500.
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Adds ability to send test push notification
---------
Co-authored-by: Vadim Stepanov <vadimkerr@gmail.com>
Co-authored-by: Rares Mardare <rares.mardare@grafana.com>
Organizations that have been deleted outside OnCall were not being
cleaned up by this task as expected.
- Use PluginAuthToken instead of GCOM token == None to determine if the
oncall organization should be matched in GCOM
- Fix how delete was being checked for the instance, the previous method
does not work.
# What this PR does
## Which issue(s) this PR fixes
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
## Which issue(s) this PR fixes
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
## Which issue(s) this PR fixes
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Adds docs & logo for AppDynamics integration.
Main PR in private repo:
https://github.com/grafana/oncall-private/pull/1790.
## Which issue(s) this PR fixes
https://github.com/grafana/oncall-private/issues/1621
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- No changelog (AppDynamics integration will be only available in cloud)
# What this PR does
## Which issue(s) this PR fixes
User reported receiving a push notification that they were going oncall
~12mins before the shift started but the notification copy instead
showed this:

## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
https://www.loom.com/share/18cc445117de4895a10892d56c7d3699
In preparation to upgrade our cloud databases, this PR makes some minor
changes which, after testing locally, allowed the `POST
/<integration_type>/<alert_channel_key>` endpoints to successfully
receive incoming alerts and queue the celery tasks.
I've tested all of the defined `POST
/integrations/v1/<integration_type>/<alert_channel_key>` endpoints by
sending `POST` requests to an integrations' URL while the MySQL database
was down, bringing the database back up, and ensuring the alerts were
created.
## Some other findings
- the integration heartbeat endpoints will not work as we interact w/
the database to persist the incoming heartbeat instance
- if the integration was created in the last 180 seconds, incoming
alerts will fail due to the way we cache the integration IDs
([code](https://github.com/grafana/oncall/blob/dev/engine/apps/integrations/mixins/alert_channel_defining_mixin.py#L47-L50))
- The `create_alert` celery task is set to `max_retries=None` and
`retry_backoff=True`. This means that the queued tasks will continue
retrying forever w/ an exponential backoff, until the alerts can be
created in the database (ie. when the database is back online).
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated (N/A)
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required) (N/A)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required) (N/A)