We are noticing some
[issues](https://ops.grafana-ops.net/goto/sQFLv4XSg?orgId=1) when
updating routes via Terraform because when receiving multiple concurrent
requests updating position, the order is updated in a transaction but it
is then tried to save again in the `.save` call, which could lead to
`IntegrityError`s, because the order may have been updated in a
concurrent request meanwhile.
Test run with the reproduced error (before the serializer update):
```
_____________________________ test_ordered_model_swap_all_to_zero_via_serializer _____________________________
@pytest.mark.django_db(transaction=True)
def test_ordered_model_swap_all_to_zero_via_serializer():
THREADS = 300
exceptions = []
TestOrderedModel.objects.all().delete() # clear table
instances = [TestOrderedModel.objects.create(test_field="test") for _ in range(THREADS)]
# generate random non-unique orders
random.seed(42)
positions = [random.randint(0, THREADS - 1) for _ in range(THREADS)]
def update_order_to_zero(idx):
try:
instance = instances[idx]
serializer = TestOrderedModelSerializer(
instance, data={"order": 0, "extra_field": idx}, partial=True
)
serializer.is_valid(raise_exception=True)
serializer.save()
instance.swap(positions[idx])
except Exception as e:
exceptions.append(e)
threads = [threading.Thread(target=update_order_to_zero, args=(0,)) for _ in range(THREADS)]
for thread in threads:
thread.start()
for thread in threads:
thread.join()
# can only check that orders are still sequential and that there are no exceptions
# can't check the exact order because it changes depending on the order of execution
> assert not exceptions
E assert not [IntegrityError(1062, "Duplicate entry 'test-0' for key 'base_testorderedmodel.unique_test_field_order'"), IntegrityEr...rder'"), IntegrityError(1062, "Duplicate entry 'test-0' for key 'base_testorderedmodel.unique_test_field_order'"), ...]
THREADS = 300
exceptions = [IntegrityError(1062, "Duplicate entry 'test-0' for key 'base_testorderedmodel.unique_test_field_order'"), IntegrityEr...rder'"), IntegrityError(1062, "Duplicate entry 'test-0' for key 'base_testorderedmodel.unique_test_field_order'"), ...]
instances = [<TestOrderedModel: TestOrderedModel object (841)>, <TestOrderedModel: TestOrderedModel object (842)>, <TestOrderedMod...ject (844)>, <TestOrderedModel: TestOrderedModel object (845)>, <TestOrderedModel: TestOrderedModel object (846)>, ...]
positions = [57, 12, 140, 125, 114, 71, ...]
thread = <Thread(Thread-302 (update_order_to_zero), stopped 139636013323840)>
threads = [<Thread(Thread-3 (update_order_to_zero), stopped 139646722418240)>, <Thread(Thread-4 (update_order_to_zero), stopped ...ate_order_to_zero), stopped 139646608590400)>, <Thread(Thread-8 (update_order_to_zero), stopped 139646600197696)>, ...]
update_order_to_zero = <function test_ordered_model_swap_all_to_zero_via_serializer.<locals>.update_order_to_zero at 0x7f02086274c0>
common/tests/test_ordered_model.py:481: AssertionError
```
# What this PR does
- bumps `uwsgi` to latest version (`2.0.26`), which unblocks us from
bumping Python to 3.12
- bumps Python to 3.12.3
- refactor the Snyk GitHub Actions workflow to use the composable
actions for installed frontend and backend dependencies
- fixes several `AttributeError`s in our tests that went from a warning
to an error in Python 3.12 (see
https://github.com/python/cpython/issues/100690)
# Which issue(s) this PR closes
Closes#4358
Closes https://github.com/grafana/oncall/issues/4387
# What this PR does
Same as https://github.com/grafana/oncall/pull/4422 but returns wait
delays as strings so it's backward-compatible with the mobile app API
calls.
## Which issue(s) this PR closes
Related to https://github.com/grafana/oncall/issues/2464
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] Added the relevant release notes label (see labels prefixed w/
`release:`). These labels dictate how your PR will
show up in the autogenerated release notes.
This PR does a bunch of changes to prepare OnCall for Unified Slack App:
1. Install Slack via Chatops-Proxy. This change contains two parts:
getting a Slack install link from chatops-proxy
([code](https://github.com/grafana/oncall/pull/4232/files#diff-437a77d49fc04b92d315651b3df5991000b1ab74cf60aabb21aa77cb2823bf52R46))
and receiving a "slack installed" event from chatops-proxy
([code](https://github.com/grafana/oncall/pull/4232/files#diff-976d106f0962be5c1de5e35582193f68435ed0c17f2defd6bd2857bf6e27f65d)).
Also it means that OnCall doesn't need to register slack_links anymore
when slack is connected/disconnected. These changes are behind
UNIFIED_SLACK_APP_ENABLED flag and should be no-op if flag is not
enabled.
2. Get rid of Multiregionatily restrictions - instrument all slack
interactions with a ProxyMeta - json data telling chatops-proxy where to
route the interaction. Note, that it doesn't apply for "Add to
resolution notes" message action - it will be handled differently in
following PR.
3. Move all chatops-proxy related stuff from common/oncall-gateway to
apps/chatops-proxy
Minor changes:
1. Remove usage of **CHATOPS_V3** flag. Chatops v3 is already released
(It's a refactoring from previous quarter)
---------
Co-authored-by: Vadim Stepanov <vadimkerr@gmail.com>
Co-authored-by: Rares Mardare <rares.mardare@grafana.com>
# What this PR does
Allows custom wait durations for:
* `Wait` escalation policy
* `>X alerts per Y minutes` escalation policy
* `Wait` user notification policy
## Which issue(s) this PR closes
Related to https://github.com/grafana/oncall/issues/2464
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] Added the relevant release notes label (see labels prefixed w/
`release:`). These labels dictate how your PR will
show up in the autogenerated release notes.
---------
Co-authored-by: Rares Mardare <rares.mardare@grafana.com>
# What this PR does
In cloud we are currently (somewhat) improperly determining whether or
not a Grafana stack had the `accessControlOnCall` feature flag enabled.
At first things worked fine. We would enable this feature toggle via the
Grafana Admin UI, and then the OnCall backend would read this value from
GCOM's `GET /instance/<stack_id>` endpoint (via
`config.feature_toggles`), and everything worked as expected.
There was a recent change made in `grafana/deployment_tools` to set this
feature flag to True for all stacks. However, for some reason, the GCOM
endpoint above doesn't return the `accessControlOnCall` feature toggle
value in `config.feature_toggles` if it is set in this manner (it only
returns the value if it is set via the Grafana Admin UI).
So what we should instead be doing is such instead of asking GCOM for
this feature toggle, infer whether RBAC is enabled on the stack by doing
a `HEAD /api/access-control/users/permissions/search` (this endpoint _is
only_ available on a Grafana stack if `accessControlOnCall` is enabled).
**Few caveats to this ☝️**
1. we first have to make sure that the cloud stack is in an `active`
state (ie. not paused). This is because, no matter if the
`accessControlOnCall` is enabled or not, if the stack is in a `paused`
state it will ALWAYS return `HTTP 200` which can be misleading and lead
to bugs (this feels like a bug on the Grafana API, will follow up with
core grafana team)
2. Once we roll out this change we will effectively **actually** be
enabling RBAC for OnCall for all orgs. The Identity Access team would
prefer a progressive rollout, which is why I decided to introduce the
concept of
[`settings.CLOUD_RBAC_ROLLOUT_PERCENTAGE`](https://github.com/grafana/oncall/pull/4279/files#diff-3383aef931e41e44d95829ad971641eeb98fe001be2f5da92217446d300ea1b3R918)
(see also [`Organization.
should_be_considered_for_rbac_permissioning`](https://github.com/grafana/oncall/pull/4279/files#diff-2ca9917f4f56349be39545ee8abd459be5076295d02ca3a7ec545152fcddccdfR348-R362))
## Which issue(s) this PR closes
Related to https://github.com/grafana/identity-access-team/issues/667
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] Added the relevant release notes label (see labels prefixed w/
`release:`). These labels dictate how your PR will
show up in the autogenerated release notes.
# What this PR does
Jinja2 filter to parse strings into datetime objects. Previously, only a
limited set of strings could be parsed into datetime objects
([datetime_re](https://docs.djangoproject.com/en/2.2/_modules/django/utils/dateparse/)).
The addition of this filter allows for strings of any format to be
converted into datetime.
## Which issue(s) this PR closes
<!--
*Note*: if you have more than one GitHub issue that this PR closes, be
sure to preface
each issue link with a [closing
keyword](https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/using-keywords-in-issues-and-pull-requests#linking-a-pull-request-to-an-issue).
This ensures that the issue(s) are auto-closed once the PR has been
merged.
-->
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] Added the relevant release notes label (see labels prefixed w/
`release:`). These labels dictate how your PR will
show up in the autogenerated release notes.
Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
# What this PR does
Display human readable time ranges in AG filters
## Which issue(s) this PR closes
Closes https://github.com/grafana/oncall/issues/4272
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] Added the relevant release notes label (see labels prefixed w/
`release:`). These labels dictate how your PR will
show up in the autogenerated release notes.
---------
Co-authored-by: Michael Derynck <michael.derynck@grafana.com>
# What this PR does
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] Added the relevant release notes label (see labels prefixed w/
`release:`). These labels dictate how your PR will
show up in the autogenerated release notes.
# What this PR does
- removes unused "custom button" backend code now that we've migrated to
outgoing webhooks
- adds new e2e test for webhooks asserting that an `ngrok`/`express`
webhook handler receives the call as expected + payload is as expected
(related to https://github.com/grafana/oncall/issues/2691) - skipped for
now, the test passes locally but fails on GitHub Actions CI, seems to be
networking related
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
---------
Co-authored-by: Michael Derynck <michael.derynck@grafana.com>
# What this PR does
Fix typos in comments
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] Added the relevant release notes label (see labels prefixed w/
`release:`). These labels dictate how your PR will
show up in the autogenerated release notes.
Signed-off-by: teslaedison <qingchengqiushuang@gmail.com>
Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
# What this PR does
Adds internal API endpoints for
`alert_receive_channels/<id>/webhooks/<webhook_id>`.
## Which issue(s) this PR fixes
Related to https://github.com/grafana/oncall-private/issues/2541
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
**Cleanup label typing:**
1. LabelParam -> two separate types LabekKey and LabelValue
2. LabelData -> renamed to LabelPair.
3. LabelKeyData -> renamed to LabelOption
Data is not giving any info about what this type represents.
4. Remove LabelsData and LabelsKeysData types. They are just list of
types listed above and with new naming it feels obsolete.
5. ValueData removed. LabelPair is used instead.
6. Rework AlertGroupCustomLabel to use LabelKey type for key to make
type system more consistent. Name model type AlertGroupCustomLabel**DB**
and api type AlertGroupCustomLabel**API** to clearly distinguish them.
**Split update_labels_cache into two tasks** update_label_option_cache
and update_label_pairs_cache.
Original task was expecting array of LabelsData (now it's LabelPair) OR
one LabelKeyData ( now it's LabelOption). I believe having one function
with two sp different argument types makes it more complicated for
understanding.
**Make OnCall backend support prescribed labels**. OnCall will sync and
store "prescribed" field for key and values, so Label dropdown able to
disable editing for certain labels.
## Which issue(s) this PR fixes
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
---------
Co-authored-by: Maxim Mordasov <maxim.mordasov@grafana.com>
Co-authored-by: Yulya Artyukhina <Ferril.darkdiver@gmail.com>
# What this PR does
Disable escaping quotes for html in template results

Alert group 6 shows the new rendering vs group 5 which has the previous
incorrect rendering.
## Which issue(s) this PR fixes
#3864
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Update's `POST /v1/sign` API call to Cloud Auth API
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
---------
Co-authored-by: Victor Cinaglia <victorcinaglia@gmail.com>
# What this PR does
Add necessary headers to ChatopsProxyAPIClient for dev environment
testing.
## Which issue(s) this PR fixes
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
This PR adds support for routing alerts based on labels.
https://www.loom.com/share/4401de6e3c4945d5b8961fe43ee373c9
Additionally:
- improve the typing around the `get_object` method that is inherited by
[`PublicPrimaryKeyMixin.get_object`](https://github.com/grafana/oncall/blob/dev/engine/common/api_helpers/mixins.py#L153)
in most of our models. `PublicPrimaryKeyMixin` is generic, so it can be
more strongly typed when it is being subclassed, which results in better
typing of the `get_object` method in child classes
- I decided to do this because I started looking into this task via the
[`AlertReceiveChannelView.send_demo_alert`
method/endpoint](https://github.com/grafana/oncall/blob/dev/engine/apps/api/views/alert_receive_channel.py#L242).
Within that method, `instance` is not typed because the inherited
`get_object` method is not typed.. I digress 😄
- improve typing around `Alert.create` and
`apps.integrations.tasks.create_alert` functions
- make `Alert.render_group_data` more DRY by extracting some logic out
into `Alert._apply_jinja_template_to_alert_payload_and_labels`
- deduplicate the logic of `value.strip().lower() in ["1", "true",
"ok"]` into a shared function,
`common.jinja_templater.apply_jinja_template.templated_value_is_truthy`
Closes https://github.com/grafana/oncall-private/issues/2490
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
- [x] Documentation added (or `pr:no public docs` PR label added if not
required) (will be done in #3762)
This PR makes OnCall compatible with chatops-proxy v3. When CHATOPS_V3
is enabled, oncall will use new api client to register tenants and slack
installations. Also I added v3 routes for slack and telegram, so it's
possible to test new chatops proxy.
Currently two versions of chatops-proxy api are deployed, but they are
not compatible. They are doing same thing, using different db model and
tables. Once only v3 version will be left in prod, I'll remove
CHATOPS_V3 env var, all leftovers of previous api client and v3 slack
and telegram routes.
---------
Co-authored-by: Vadim Stepanov <vadimkerr@gmail.com>
# What this PR does
This is a follow up to https://github.com/grafana/oncall/pull/3677.
It appears that when a task uses the [`autoretry_for`
kwarg](https://docs.celeryq.dev/en/stable/userguide/tasks.html#automatic-retry-for-known-exceptions)
in the task decorator, it doesn't log the exception in `on_failure` as
would be expected. Now when retrying, we log out a message + any
exception/stack trace information.
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Closes https://github.com/grafana/oncall-private/issues/2449
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Improves OpenAPI schema coverage for internal API:
- Fixes/Improves `alert group` and `feature` endpoints
- Adds `integration` and `user` endpoints
## Which issue(s) this PR fixes
https://github.com/grafana/oncall/issues/3444
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Adds `team` filter for `users` endpoint
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
Related to https://github.com/grafana/support-escalations/issues/8844
Queuing multiple sync_organization tasks for the same org could lead to
parallel running of the sync task for the same organization, potentially
creating duplicated entries and/or generating multiple unneeded API
calls. This prevents running an organization sync while there is a sync
for that same org in progress.
# What this PR does
Add an additional jinja2 template helper filter to convert a timezone
aware datetime to a different timezone.
## Which issue(s) this PR fixes
Alert payloads that originate from different time zones may include
timestamps having a local time offset. This filter enables
standardization of timestamp timezones.
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
---------
Co-authored-by: Joey Orlando <joey.orlando@grafana.com>
# Which issue(s) this PR fixes
Related to https://github.com/grafana/oncall-private/issues/2363
Addresses this issue that arises when using
`cache.get_many`/`cache.set_many` operations with a Redis Cluster:
```python3
File "/usr/local/lib/python3.11/site-packages/redis/cluster.py", line 1006, in determine_slot
raise RedisClusterException(
redis.exceptions.RedisClusterException: MGET - all keys must map to the same key slot
```
From the Redis Cluster
[docs](https://redis.io/docs/reference/cluster-spec/#hash-tags), this
can be addressed with this 👇 . Basically this will ensure that keys in
multi-key operations will resolve to the same hash slot (read: node):
> Hash tags
> There is an exception for the computation of the hash slot that is
used in order to implement hash tags. Hash tags are a way to ensure that
multiple keys are allocated in the same hash slot. This is used in order
to implement multi-key operations in Redis Cluster.
>
> To implement hash tags, the hash slot for a key is computed in a
slightly different way in certain conditions. If the key contains a
"{...}" pattern only the substring between { and } is hashed in order to
obtain the hash slot. However since it is possible that there are
multiple occurrences of { or } the algorithm is well specified by the
following rules:
>
> IF the key contains a { character.
> AND IF there is a } character to the right of {.
> AND IF there are one or more characters between the first occurrence
of { and the first occurrence of }.
> Then instead of hashing the key, only what is between the first
occurrence of { and the following first occurrence of } is hashed.
## Checklist
- [x] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
# What this PR does
Upgrade to Python 3.12 + fix several invalid test assertions that lead
to test failures in the latest version of `pytest`:
```
AttributeError: 'called_once_with' is not a valid assertion. Use a spec for the mock if 'called_once_with' is meant to be an attribute.. Did you mean: 'assert_called_once_with'?
```
## Checklist
- [ ] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)