12 KiB
12 KiB
Gemini CLI Core Follow-Up TODO
Last updated: 2026-04-19
Context:
- Transport re-platform landed in commit
d83a59fb1(pi-ai/google-gemini-cli: re-platform transport on @google/gemini-cli-core). - Model list wiring landed in commit
8abfc98fd(pi-ai: source google-gemini-cli model list from cli-core's VALID_GEMINI_MODELS). - The remaining work is to exploit more of
@google/gemini-cli-coreinstead of keeping SF-specific handwritten logic around quota, retry classification, and model naming.
Relevant files:
- packages/pi-ai/src/providers/google-gemini-cli.ts
- packages/pi-ai/src/providers/google-gemini-cli-core-plan.md
- packages/pi-coding-agent/src/core/retry-handler.ts
- src/resources/extensions/sf-usage-bar/index.ts
- src/resources/extensions/sf/bootstrap/register-hooks.ts
- src/resources/extensions/sf/auto-model-selection.ts
- src/resources/extensions/sf/token-counter.ts
Recommended order
- Token counting via cli-core
- Quota surfacing via cli-core
- Structured Gemini error classification
- Gemini model aliasing
- Lower-priority follow-ups (grounding, tests, keychain, telemetry)
1. Token counting via server.countTokens(...)
Status:
- Not integrated.
- SF currently has generic token estimation in src/resources/extensions/sf/token-counter.ts, mostly
tiktokenorchars/4. - There is already a
before_provider_requesthook path in src/resources/extensions/sf/bootstrap/register-hooks.ts, which is the cleanest place to attach pre-dispatch counting for Gemini requests.
Why this matters:
- Gives an accurate preflight token count for Gemini before dispatch.
- Lets SF warn before sending oversized prompts instead of only reacting to provider overflow errors.
- Enables future auto-compaction logic to use provider-native counts instead of heuristics.
Concrete implementation:
- Add a small helper in
packages/pi-ai/src/providers/google-gemini-cli.tsor a neighboring helper module that builds aCodeAssistServerand exposescountTokens. - Thread enough auth/project context into the hook path so Gemini requests can be counted without re-implementing auth discovery.
- In
before_provider_request, when the target model/provider isgoogle-gemini-cli, call cli-corecountTokenson the request payload before send. - Preserve graceful fallback to current heuristics when auth is missing, cli-core throws, or the request shape is unsupported.
- Surface the result to the UI/logging path where it is actually actionable:
- warning banner for obviously large prompts,
- optional metrics/session recording,
- optional compact-context trigger later.
Likely touchpoints:
packages/pi-ai/src/providers/google-gemini-cli.tssrc/resources/extensions/sf/bootstrap/register-hooks.tssrc/resources/extensions/sf/token-counter.ts- possibly
packages/pi-coding-agent/src/core/sdk.tsif the hook plumbing needs richer request/model context.
Acceptance criteria:
- Gemini requests can report an accurate pre-dispatch token count without sending an inference request.
- Failures in counting do not block dispatch.
- Existing token-counter tests remain green; add focused tests for Gemini preflight behavior.
Open question:
- Decide whether the counted value belongs in ephemeral UI only, metrics only, or both. Do not wire it into multiple places until one storage/display path is chosen.
2. Quota surfacing via server.retrieveUserQuota(...) and refreshAvailableCredits()
Status:
src/resources/extensions/sf-usage-bar/index.tsstill manually reads OAuth tokens and POSTs directly tohttps://cloudcode-pa.googleapis.com/v1internal:retrieveUserQuota.- That code bypasses cli-core even though cli-core already knows the right request/response path.
Why this matters:
- Replaces fragile handwritten Gemini usage fetching.
- Avoids relying on duplicated credential discovery and raw fetch behavior.
- Makes the usage bar consistent with the same client stack the provider now uses.
- Gives a better foundation for distinguishing rate limit exhaustion vs billing/quota exhaustion.
Concrete implementation:
- Extract Gemini auth/client creation into a reusable helper instead of duplicating token parsing in the usage bar.
- Replace the raw
fetchGeminiUsage()request with cli-core quota APIs. - Normalize cli-core quota response into the existing
UsageSnapshot/RateWindowshape used by the usage bar. - Keep current UI semantics where possible:
- show
ProandFlashwindows if quota exists, - preserve non-fatal error display,
- keep timeouts bounded.
- show
- If cli-core returns richer quota detail, prefer surfacing remaining counts instead of only percentages when it fits the existing footer.
Likely touchpoints:
src/resources/extensions/sf-usage-bar/index.tspackages/pi-ai/src/providers/google-gemini-cli.tsor a new shared Gemini cli-core helper- auth plumbing if the usage bar should stop reading raw files directly
Acceptance criteria:
- No direct raw POST to
v1internal:retrieveUserQuotaremains insf-usage-bar. - Gemini usage bar continues to render successfully with valid OAuth credentials.
- Missing/expired credentials degrade to a clear non-crashing error state.
Follow-up if this lands cleanly:
- Remove duplicated token loading from
auth.jsonand~/.gemini/oauth_creds.jsonwherever cli-core can own it.
3. Structured Gemini error classification
Status:
packages/pi-coding-agent/src/core/retry-handler.tsstill decides retryability through large regexes inisRetryableError()and downstream string classification.- Claude already identified cli-core exports such as
getErrorStatus,ModelNotFoundError, andgoogleQuotaErrorsas a better source of truth for Gemini-specific failures.
Why this matters:
- Current retry behavior is broad and stringly-typed.
- Regex classification is hard to audit and easy to drift from upstream wording changes.
- Structured classification would let Gemini-specific retry/backoff behavior be smarter without perturbing other providers.
Concrete implementation:
- Add a Gemini-specific error mapping layer that:
- consumes cli-core error helpers when the active provider is
google-gemini-cli, - converts those into SF retry classes (
rate_limit,quota_exhausted,auth_error,model_not_found, non-retryable, etc.).
- consumes cli-core error helpers when the active provider is
- Keep non-Gemini providers on the existing generic path unless there is a broader retry refactor planned.
- Update retry decisions so Gemini can distinguish:
- temporary rate-limit/cooldown,
- quota exhaustion / credits exhausted,
- auth problems,
- invalid model / model not found.
- Reduce duplicated regex patterns once the structured path is proven.
Likely touchpoints:
packages/pi-coding-agent/src/core/retry-handler.ts- possibly
packages/pi-ai/src/providers/google-gemini-cli.tsif provider-specific error wrapping is needed - Gemini retry tests and any SF extension tests asserting rate-limit recovery behavior
Acceptance criteria:
- Retry behavior for
google-gemini-cliis driven by cli-core-aware classification rather than raw message regex where feasible. - Existing non-Gemini retry behavior does not regress.
- Tests cover at least one case each for rate limit, quota exhaustion, auth failure, and model-not-found.
Important constraint:
- Do not over-generalize this into a cross-provider retry framework in the same change. Keep the blast radius Gemini-scoped first.
4. Gemini model aliasing (auto, pro, flash, flash-lite)
Status:
- Concrete model list wiring exists via cli-core
VALID_GEMINI_MODELS. - User-facing model resolution still expects explicit IDs in places like src/resources/extensions/sf/auto-model-selection.ts.
Why this matters:
- Lets preferences stay stable even as Google revs concrete model names.
- Reduces churn in
PREFERENCES.mdand docs when preview/stable IDs change. - Complements the static model import already added from cli-core.
Concrete implementation:
- Define Gemini aliases accepted by SF config and resolution:
gemini:autogemini:progemini:flashgemini:flash-lite- decide whether equivalent slash or bare forms should also resolve.
- Resolve aliases through cli-core’s model alias resolution rather than SF-local hardcoding where possible.
- Hook alias resolution into the same places where model IDs are interpreted from preferences and dynamic routing.
- Preserve current explicit-ID behavior exactly.
- Add user-facing documentation once semantics are stable.
Likely touchpoints:
src/resources/extensions/sf/auto-model-selection.ts- model resolution helpers in SF preference/bootstrap code
- possibly
packages/pi-coding-agent/src/core/model-resolver.tsif alias resolution belongs closer to the registry layer
Acceptance criteria:
- A user can specify Gemini aliases in preferences and have them resolve to a valid available model.
- Explicit full model IDs still win when provided.
- Unknown aliases fail clearly instead of silently picking an arbitrary Gemini model.
Risk to manage:
- Avoid leaking alias behavior into non-Gemini providers or into provider/model parsing in a way that breaks existing
provider/modelresolution semantics.
5. Lower-priority follow-ups
5a. Google Search grounding tool
Potential value:
- Native Gemini grounding could replace some Brave/Tavily usage for Gemini-dispatched work.
Why this is lower priority:
- It touches provider tools, routing, and citation UX, which is a bigger behavioral change than quota/counting.
Suggested approach:
- Investigate only after quota/counting/error classification are stable.
- Keep the change Gemini-scoped and opt-in first.
5b. Test harness migration
Potential value:
makeFakeConfig,createMockWorkspaceContext, andMockMessageBuscan make provider tests more realistic and less dependent on raw network mocking.
Suggested approach:
- Fold this into whichever of items 1-4 lands first rather than doing it standalone.
5c. MCP keychain storage
Potential value:
- Better OAuth token storage for MCP auth flows.
Why this is separate:
- It is an auth-storage architecture change, not a Gemini provider follow-up.
5d. IDE detection, terminal serialization, experiment flags, telemetry
Potential value:
- Nice integrations and better fidelity with upstream CLI behavior.
Why this is last:
- None of them unblock current Gemini provider correctness or UX.
Proposed task breakdown for delegation
If these are delegated as separate rescue tasks, use this split:
gemini-token-counting
- Implement cli-core-backed Gemini preflight counting.
- Own hook integration and tests only.
- Do not touch usage bar or retry logic.
gemini-quota-usage-bar
- Replace Gemini raw quota fetch with cli-core quota APIs.
- Own
sf-usage-barintegration and Gemini auth helper reuse. - Do not touch retry handler or model resolution.
gemini-retry-classification
- Introduce cli-core-backed Gemini error classification in retry handling.
- Own retry tests and minimal provider-side wrapping if needed.
- Do not change quota UI or aliasing.
gemini-model-aliases
- Add
auto/pro/flash/flash-litealias resolution for Gemini preferences. - Own resolution/documentation/tests only.
Suggested definition of done
The follow-up is complete when:
- Gemini transport, quota lookup, and retry classification all rely on cli-core where upstream already provides that behavior.
- SF no longer carries duplicated handwritten Gemini logic for areas cli-core already owns.
- User-facing configuration for Gemini is more stable than concrete preview model IDs.
- Each change lands with focused tests and without broad cross-provider refactors.