Commit graph

14 commits

Author SHA1 Message Date
Joey Orlando
92ac1d884f
address occasional missing nextCursor data key from GCOM GET /instances (#4794)
# What this PR does

Addressing this task exception
([logs](https://ops.grafana-ops.net/goto/rkVAurrSR?orgId=1)):
```
2024-08-08 16:30:49,455 source=engine:celery worker=ForkPoolWorker-18 task_id=969226be-64a8-4616-ac32-3909d1f0cb60 task_name=apps.grafana_plugin.tasks.sync.start_sync_organizations name=celery.app.trace level=ERROR Task apps.grafana_plugin.tasks.sync.start_sync_organizations[969226be-64a8-4616-ac32-3909d1f0cb60] raised unexpected: KeyError('nextCursor')
```

[This
conversation](https://raintank-corp.slack.com/archives/C0K031RP1/p1723158123932529)
in `#grafana-com-dev` has more context. The _tldr;_ takeaway after
chatting w/ the GCOM team was lets simply retry requests in this case.

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] Added the relevant release notes label (see labels prefixed w/
`release:`). These labels dictate how your PR will
    show up in the autogenerated release notes.
2024-08-09 15:42:10 +00:00
Joey Orlando
2582a1b1dc
Refactor how RBAC enabled/disabled status is determined for Grafana Cloud stacks (#4279)
# What this PR does

In cloud we are currently (somewhat) improperly determining whether or
not a Grafana stack had the `accessControlOnCall` feature flag enabled.
At first things worked fine. We would enable this feature toggle via the
Grafana Admin UI, and then the OnCall backend would read this value from
GCOM's `GET /instance/<stack_id>` endpoint (via
`config.feature_toggles`), and everything worked as expected.

There was a recent change made in `grafana/deployment_tools` to set this
feature flag to True for all stacks. However, for some reason, the GCOM
endpoint above doesn't return the `accessControlOnCall` feature toggle
value in `config.feature_toggles` if it is set in this manner (it only
returns the value if it is set via the Grafana Admin UI).

So what we should instead be doing is such instead of asking GCOM for
this feature toggle, infer whether RBAC is enabled on the stack by doing
a `HEAD /api/access-control/users/permissions/search` (this endpoint _is
only_ available on a Grafana stack if `accessControlOnCall` is enabled).

**Few caveats to this ☝️**
1. we first have to make sure that the cloud stack is in an `active`
state (ie. not paused). This is because, no matter if the
`accessControlOnCall` is enabled or not, if the stack is in a `paused`
state it will ALWAYS return `HTTP 200` which can be misleading and lead
to bugs (this feels like a bug on the Grafana API, will follow up with
core grafana team)
2. Once we roll out this change we will effectively **actually** be
enabling RBAC for OnCall for all orgs. The Identity Access team would
prefer a progressive rollout, which is why I decided to introduce the
concept of
[`settings.CLOUD_RBAC_ROLLOUT_PERCENTAGE`](https://github.com/grafana/oncall/pull/4279/files#diff-3383aef931e41e44d95829ad971641eeb98fe001be2f5da92217446d300ea1b3R918)
(see also [`Organization.
should_be_considered_for_rbac_permissioning`](https://github.com/grafana/oncall/pull/4279/files#diff-2ca9917f4f56349be39545ee8abd459be5076295d02ca3a7ec545152fcddccdfR348-R362))

## Which issue(s) this PR closes

Related to https://github.com/grafana/identity-access-team/issues/667

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] Added the relevant release notes label (see labels prefixed w/
`release:`). These labels dictate how your PR will
    show up in the autogenerated release notes.
2024-05-14 16:30:16 +00:00
Michael Derynck
d938b52d80
Add scheduled task to start cleanup tasks (#3976)
# What this PR does
Add scheduled task to start cleanup tasks. Currently purpose is to run
the task every 12 hours and for all active orgs cleanup empty & deleted
integrations. For deleted orgs we can run this manually. It will also
run if an org moves from active to deleted. It is expected to add more
to cleanup_organization task over time.

## Which issue(s) this PR fixes

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2024-03-04 19:45:01 +00:00
Matias Bordese
e39baa6bbe
Revert "Refactor gcom api calls when syncing org" (#3498)
Reverts grafana/oncall#3489

Reviewing logs, it seems something broke related to [token
auth](https://ops.grafana-ops.net/explore?schemaVersion=1&panes=%7B%22ffS%22:%7B%22datasource%22:%22OP27Xzxnk%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bcluster%3D~%5C%22dev-.%2A%5C%22,%20namespace%3D%5C%22grafana-com%5C%22,%20job%3D%5C%22grafana-com%2Fgrafana-com-api%5C%22%7D%20%7C~%20%5C%22%2Finstances%2F%5Ba-z0-9%5D%2B.config%3Dtrue%5C%22%20%7C%3D%20%5C%22Grafana%20OnCall%5C%22%22,%22editorMode%22:%22code%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22OP27Xzxnk%22%7D%7D%5D,%22range%22:%7B%22from%22:%22now-1h%22,%22to%22:%22now%22%7D%7D%7D&orgId=1).
Reverting for now, will revisit in a later PR.
2023-12-04 18:02:58 +00:00
Matias Bordese
9eb09c0272
Refactor gcom api calls when syncing org (#3489)
Make one API call instead of two.
2023-12-04 13:08:59 +00:00
Joey Orlando
76a88bc0c1
Revert "upgrade to Python 3.12 (#3456)" and "bump uwsgi version to latest #3466" (#3483)
# What this PR does

This reverts commits 7c4b40a046 and
cdb22285db.

See https://github.com/grafana/oncall-private/pull/2361 for more
details.
2023-12-01 09:56:26 -05:00
Joey Orlando
7c4b40a046
upgrade to Python 3.12 (#3456)
# What this PR does

Upgrade to Python 3.12 + fix several invalid test assertions that lead
to test failures in the latest version of `pytest`:
```
AttributeError: 'called_once_with' is not a valid assertion. Use a spec for the mock if 'called_once_with' is meant to be an attribute.. Did you mean: 'assert_called_once_with'?
```

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-11-30 13:47:41 +00:00
Matias Bordese
7aa78f5f73
Enable flake8-bugbear, fix issues (#3454)
Enables [flake8-bugbear](https://github.com/PyCQA/flake8-bugbear),
checking for bugs/design problems, and [fixes the issues
found](https://pastebin.com/fEDBz6Ta) (some interesting ones,
particularly with mutable args).

Related to https://github.com/grafana/oncall/pull/3448
2023-11-29 15:04:48 +00:00
Michael Derynck
f201fd2be2
Paginate calls to get instances from gcom (#2669)
# What this PR does
GCOM now has many more instances returned than in the past. Paginate
these calls instead of getting all at once.

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-07-28 21:19:27 +00:00
Vadim Stepanov
f977f9faee
Minor formatting changes (#2641)
# What this PR does

- Updates `black` and `flake8` to latest
- Removes `F541` from flake8 ignore (`F541 f-string is missing
placeholders`)
- Enables ["float to top"
option](https://pycqa.github.io/isort/docs/configuration/options.html#float-to-top)
for `isort`
2023-07-26 14:45:44 +01:00
Joey Orlando
f16df03279
support comma and space delimited grafana feature_toggles (#2623)
## Which issue(s) this PR fixes

See the following threads for more context on the issue this PR
addresses:
- https://raintank-corp.slack.com/archives/CRUKW54N5/p1690068395450819
- https://raintank-corp.slack.com/archives/C036J5B39/p1690183217162019

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-07-25 02:39:39 -04:00
Joey Orlando
182f18de2c
fix parsing of grafana feature flags that're enabled via the feature_toggles.enabled syntax (#2477)
# What this PR does

Grafana provides two methods for enabling feature flags:
- `feature_toggles.enabled`
- `feature_toggles.<name_of_feature_flag>`

For example, to enable `accessControlOnCall` you could either do:
```json
{
  "feature_toggles": {
    "enabled": "accessControlOnCall,someOtherCoolFeatureFlag"
  }
}
```

or:
```json
{
  "feature_toggles": {
    "accessControlOnCall": true,
    "someOtherCoolFeatureFlag": true
  }
}
```

In method 1, if they're multiple feature flags present, they are _comma
separated, not space separated_. This PR fixes this parsing issue.

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required) (N/A)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-07-10 04:59:15 -04:00
Joey Orlando
2879537c30
properly parse grafana cloud feature toggles (#1880)
# What this PR does

## Which issue(s) this PR fixes

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required) (N/A)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required)
2023-05-04 16:38:26 +00:00
Joey Orlando
babacf4da8
refactor the is_rbac_permissions_enabled check to be more robust (#1099)
# What this PR does
Checks the `is_rbac_permissions_enabled` flag differently based on
whether we are dealing with an open-source, or cloud installation:
- for open-source installations, simply continue making a `HEAD` request
to the list RBAC permissions Grafana API endpoint.
- for cloud installations, use the `config` object returned from `GET
/instances/{instance_id}?config=true` and check whether
`instance_info["config"]["feature_toggles"]["accessControlOnCall"] ==
"true"`

## Which issue(s) this PR fixes
Resolves the issue in hosted grafana where when a stack is inactive, the
hosted grafana gateway, returns 200 to the `HEAD` request (which
erroneously sets the `is_rbac_permissions_enabled` flag to `true`)

## Checklist

- [x] Tests updated (N/A)
- [ ] Documentation added
- [x] `CHANGELOG.md` updated
2023-01-11 12:48:30 +01:00