singularity-forge/todo.md
2026-04-19 13:25:07 +02:00

12 KiB
Raw Blame History

Gemini CLI Core Follow-Up TODO

Last updated: 2026-04-19

Context:

  • Transport re-platform landed in commit d83a59fb1 (pi-ai/google-gemini-cli: re-platform transport on @google/gemini-cli-core).
  • Model list wiring landed in commit 8abfc98fd (pi-ai: source google-gemini-cli model list from cli-core's VALID_GEMINI_MODELS).
  • The remaining work is to exploit more of @google/gemini-cli-core instead of keeping SF-specific handwritten logic around quota, retry classification, and model naming.

Relevant files:

  1. Token counting via cli-core
  2. Quota surfacing via cli-core
  3. Structured Gemini error classification
  4. Gemini model aliasing
  5. Lower-priority follow-ups (grounding, tests, keychain, telemetry)

1. Token counting via server.countTokens(...)

Status:

Why this matters:

  • Gives an accurate preflight token count for Gemini before dispatch.
  • Lets SF warn before sending oversized prompts instead of only reacting to provider overflow errors.
  • Enables future auto-compaction logic to use provider-native counts instead of heuristics.

Concrete implementation:

  • Add a small helper in packages/pi-ai/src/providers/google-gemini-cli.ts or a neighboring helper module that builds a CodeAssistServer and exposes countTokens.
  • Thread enough auth/project context into the hook path so Gemini requests can be counted without re-implementing auth discovery.
  • In before_provider_request, when the target model/provider is google-gemini-cli, call cli-core countTokens on the request payload before send.
  • Preserve graceful fallback to current heuristics when auth is missing, cli-core throws, or the request shape is unsupported.
  • Surface the result to the UI/logging path where it is actually actionable:
    • warning banner for obviously large prompts,
    • optional metrics/session recording,
    • optional compact-context trigger later.

Likely touchpoints:

  • packages/pi-ai/src/providers/google-gemini-cli.ts
  • src/resources/extensions/sf/bootstrap/register-hooks.ts
  • src/resources/extensions/sf/token-counter.ts
  • possibly packages/pi-coding-agent/src/core/sdk.ts if the hook plumbing needs richer request/model context.

Acceptance criteria:

  • Gemini requests can report an accurate pre-dispatch token count without sending an inference request.
  • Failures in counting do not block dispatch.
  • Existing token-counter tests remain green; add focused tests for Gemini preflight behavior.

Open question:

  • Decide whether the counted value belongs in ephemeral UI only, metrics only, or both. Do not wire it into multiple places until one storage/display path is chosen.

2. Quota surfacing via server.retrieveUserQuota(...) and refreshAvailableCredits()

Status:

  • src/resources/extensions/sf-usage-bar/index.ts still manually reads OAuth tokens and POSTs directly to https://cloudcode-pa.googleapis.com/v1internal:retrieveUserQuota.
  • That code bypasses cli-core even though cli-core already knows the right request/response path.

Why this matters:

  • Replaces fragile handwritten Gemini usage fetching.
  • Avoids relying on duplicated credential discovery and raw fetch behavior.
  • Makes the usage bar consistent with the same client stack the provider now uses.
  • Gives a better foundation for distinguishing rate limit exhaustion vs billing/quota exhaustion.

Concrete implementation:

  • Extract Gemini auth/client creation into a reusable helper instead of duplicating token parsing in the usage bar.
  • Replace the raw fetchGeminiUsage() request with cli-core quota APIs.
  • Normalize cli-core quota response into the existing UsageSnapshot / RateWindow shape used by the usage bar.
  • Keep current UI semantics where possible:
    • show Pro and Flash windows if quota exists,
    • preserve non-fatal error display,
    • keep timeouts bounded.
  • If cli-core returns richer quota detail, prefer surfacing remaining counts instead of only percentages when it fits the existing footer.

Likely touchpoints:

  • src/resources/extensions/sf-usage-bar/index.ts
  • packages/pi-ai/src/providers/google-gemini-cli.ts or a new shared Gemini cli-core helper
  • auth plumbing if the usage bar should stop reading raw files directly

Acceptance criteria:

  • No direct raw POST to v1internal:retrieveUserQuota remains in sf-usage-bar.
  • Gemini usage bar continues to render successfully with valid OAuth credentials.
  • Missing/expired credentials degrade to a clear non-crashing error state.

Follow-up if this lands cleanly:

  • Remove duplicated token loading from auth.json and ~/.gemini/oauth_creds.json wherever cli-core can own it.

3. Structured Gemini error classification

Status:

  • packages/pi-coding-agent/src/core/retry-handler.ts still decides retryability through large regexes in isRetryableError() and downstream string classification.
  • Claude already identified cli-core exports such as getErrorStatus, ModelNotFoundError, and googleQuotaErrors as a better source of truth for Gemini-specific failures.

Why this matters:

  • Current retry behavior is broad and stringly-typed.
  • Regex classification is hard to audit and easy to drift from upstream wording changes.
  • Structured classification would let Gemini-specific retry/backoff behavior be smarter without perturbing other providers.

Concrete implementation:

  • Add a Gemini-specific error mapping layer that:
    • consumes cli-core error helpers when the active provider is google-gemini-cli,
    • converts those into SF retry classes (rate_limit, quota_exhausted, auth_error, model_not_found, non-retryable, etc.).
  • Keep non-Gemini providers on the existing generic path unless there is a broader retry refactor planned.
  • Update retry decisions so Gemini can distinguish:
    • temporary rate-limit/cooldown,
    • quota exhaustion / credits exhausted,
    • auth problems,
    • invalid model / model not found.
  • Reduce duplicated regex patterns once the structured path is proven.

Likely touchpoints:

  • packages/pi-coding-agent/src/core/retry-handler.ts
  • possibly packages/pi-ai/src/providers/google-gemini-cli.ts if provider-specific error wrapping is needed
  • Gemini retry tests and any SF extension tests asserting rate-limit recovery behavior

Acceptance criteria:

  • Retry behavior for google-gemini-cli is driven by cli-core-aware classification rather than raw message regex where feasible.
  • Existing non-Gemini retry behavior does not regress.
  • Tests cover at least one case each for rate limit, quota exhaustion, auth failure, and model-not-found.

Important constraint:

  • Do not over-generalize this into a cross-provider retry framework in the same change. Keep the blast radius Gemini-scoped first.

4. Gemini model aliasing (auto, pro, flash, flash-lite)

Status:

Why this matters:

  • Lets preferences stay stable even as Google revs concrete model names.
  • Reduces churn in PREFERENCES.md and docs when preview/stable IDs change.
  • Complements the static model import already added from cli-core.

Concrete implementation:

  • Define Gemini aliases accepted by SF config and resolution:
    • gemini:auto
    • gemini:pro
    • gemini:flash
    • gemini:flash-lite
    • decide whether equivalent slash or bare forms should also resolve.
  • Resolve aliases through cli-cores model alias resolution rather than SF-local hardcoding where possible.
  • Hook alias resolution into the same places where model IDs are interpreted from preferences and dynamic routing.
  • Preserve current explicit-ID behavior exactly.
  • Add user-facing documentation once semantics are stable.

Likely touchpoints:

  • src/resources/extensions/sf/auto-model-selection.ts
  • model resolution helpers in SF preference/bootstrap code
  • possibly packages/pi-coding-agent/src/core/model-resolver.ts if alias resolution belongs closer to the registry layer

Acceptance criteria:

  • A user can specify Gemini aliases in preferences and have them resolve to a valid available model.
  • Explicit full model IDs still win when provided.
  • Unknown aliases fail clearly instead of silently picking an arbitrary Gemini model.

Risk to manage:

  • Avoid leaking alias behavior into non-Gemini providers or into provider/model parsing in a way that breaks existing provider/model resolution semantics.

5. Lower-priority follow-ups

5a. Google Search grounding tool

Potential value:

  • Native Gemini grounding could replace some Brave/Tavily usage for Gemini-dispatched work.

Why this is lower priority:

  • It touches provider tools, routing, and citation UX, which is a bigger behavioral change than quota/counting.

Suggested approach:

  • Investigate only after quota/counting/error classification are stable.
  • Keep the change Gemini-scoped and opt-in first.

5b. Test harness migration

Potential value:

  • makeFakeConfig, createMockWorkspaceContext, and MockMessageBus can make provider tests more realistic and less dependent on raw network mocking.

Suggested approach:

  • Fold this into whichever of items 1-4 lands first rather than doing it standalone.

5c. MCP keychain storage

Potential value:

  • Better OAuth token storage for MCP auth flows.

Why this is separate:

  • It is an auth-storage architecture change, not a Gemini provider follow-up.

5d. IDE detection, terminal serialization, experiment flags, telemetry

Potential value:

  • Nice integrations and better fidelity with upstream CLI behavior.

Why this is last:

  • None of them unblock current Gemini provider correctness or UX.

Proposed task breakdown for delegation

If these are delegated as separate rescue tasks, use this split:

  1. gemini-token-counting
  • Implement cli-core-backed Gemini preflight counting.
  • Own hook integration and tests only.
  • Do not touch usage bar or retry logic.
  1. gemini-quota-usage-bar
  • Replace Gemini raw quota fetch with cli-core quota APIs.
  • Own sf-usage-bar integration and Gemini auth helper reuse.
  • Do not touch retry handler or model resolution.
  1. gemini-retry-classification
  • Introduce cli-core-backed Gemini error classification in retry handling.
  • Own retry tests and minimal provider-side wrapping if needed.
  • Do not change quota UI or aliasing.
  1. gemini-model-aliases
  • Add auto/pro/flash/flash-lite alias resolution for Gemini preferences.
  • Own resolution/documentation/tests only.

Suggested definition of done

The follow-up is complete when:

  • Gemini transport, quota lookup, and retry classification all rely on cli-core where upstream already provides that behavior.
  • SF no longer carries duplicated handwritten Gemini logic for areas cli-core already owns.
  • User-facing configuration for Gemini is more stable than concrete preview model IDs.
  • Each change lands with focused tests and without broad cross-provider refactors.