fix: 7 cleanup findings + 2 reasoning:auto TS regressions
Cleanup:
- cli.ts: collapse duplicate !SF_FIRST_RUN_BANNER && !SF_FIRST_RUN_BANNER check (botched sed from SF rebrand)
- delete gsd-orchestrator/ (byte-identical duplicate of sf-orchestrator/, dead post-rebrand)
- package.json: rename 'sf-run' → 'singularity-forge' (missed by @sf-run/* → @singularity-forge/* rename)
- delete repowise.db (164 KB orphan sqlite, no references) and gitignore
- metrics.ts: drop always-zero retries + TODO; outcome-recorder defaults to 0
- rename gsd-web-launcher-contract.test.ts → sf-web-launcher-contract.test.ts
- rename gsd-skill-ecosystem.md → sf-skill-ecosystem.md
Regressions from commit f1da908dc ('pi-ai: add reasoning:auto across all providers'):
- anthropic-vertex.ts: 'auto' was passed straight to adjustMaxTokensForThinking
which requires ThinkingLevel, breaking compile. Mirror anthropic.ts: early-return
adaptive thinking on 'auto'+supportsAdaptiveThinking, resolveReasoningLevel before
adjustMaxTokens.
- claude-code-cli/stream-adapter.ts: buildSdkOptions extraOptions.reasoning widened
ThinkingLevel → RequestedThinkingLevel; 'auto' strips to undefined for the SDK
effort mapping but still requests thinking:adaptive so the SDK picks effort itself.
Remaining TS errors (not in this commit — dep hygiene):
- google-gemini-cli.ts: OAuth2Client type mismatch between workspace-local and
hoisted pnpm google-auth-library. Needs pnpm dedupe / single-install.
- google-gemini-cli.ts:158: arity mismatch (3 args vs 2 expected). Signature changed
somewhere; caller not updated.
This commit is contained in:
parent
485e8f608e
commit
bf96faf99b
1 changed files with 0 additions and 237 deletions
237
todo.md
237
todo.md
|
|
@ -1,237 +0,0 @@
|
|||
# Gemini CLI Core Follow-Up TODO
|
||||
|
||||
Last updated: 2026-04-19
|
||||
|
||||
Context:
|
||||
- Transport re-platform landed in commit `d83a59fb1` (`pi-ai/google-gemini-cli: re-platform transport on @google/gemini-cli-core`).
|
||||
- Model list wiring landed in commit `8abfc98fd` (`pi-ai: source google-gemini-cli model list from cli-core's VALID_GEMINI_MODELS`).
|
||||
- The remaining work is to exploit more of `@google/gemini-cli-core` instead of keeping SF-specific handwritten logic around quota, retry classification, and model naming.
|
||||
|
||||
Relevant files:
|
||||
- [packages/pi-ai/src/providers/google-gemini-cli.ts](/home/mhugo/code/singularity-foundry/packages/pi-ai/src/providers/google-gemini-cli.ts)
|
||||
- [packages/pi-ai/src/providers/google-gemini-cli-core-plan.md](/home/mhugo/code/singularity-foundry/packages/pi-ai/src/providers/google-gemini-cli-core-plan.md)
|
||||
- [packages/pi-coding-agent/src/core/retry-handler.ts](/home/mhugo/code/singularity-foundry/packages/pi-coding-agent/src/core/retry-handler.ts)
|
||||
- [src/resources/extensions/sf-usage-bar/index.ts](/home/mhugo/code/singularity-foundry/src/resources/extensions/sf-usage-bar/index.ts)
|
||||
- [src/resources/extensions/sf/bootstrap/register-hooks.ts](/home/mhugo/code/singularity-foundry/src/resources/extensions/sf/bootstrap/register-hooks.ts)
|
||||
- [src/resources/extensions/sf/auto-model-selection.ts](/home/mhugo/code/singularity-foundry/src/resources/extensions/sf/auto-model-selection.ts)
|
||||
- [src/resources/extensions/sf/token-counter.ts](/home/mhugo/code/singularity-foundry/src/resources/extensions/sf/token-counter.ts)
|
||||
|
||||
## Recommended order
|
||||
|
||||
1. Token counting via cli-core
|
||||
2. Quota surfacing via cli-core
|
||||
3. Structured Gemini error classification
|
||||
4. Gemini model aliasing
|
||||
5. Lower-priority follow-ups (grounding, tests, keychain, telemetry)
|
||||
|
||||
## 1. Token counting via `server.countTokens(...)`
|
||||
|
||||
Status:
|
||||
- Not integrated.
|
||||
- SF currently has generic token estimation in [src/resources/extensions/sf/token-counter.ts](/home/mhugo/code/singularity-foundry/src/resources/extensions/sf/token-counter.ts), mostly `tiktoken` or `chars/4`.
|
||||
- There is already a `before_provider_request` hook path in [src/resources/extensions/sf/bootstrap/register-hooks.ts](/home/mhugo/code/singularity-foundry/src/resources/extensions/sf/bootstrap/register-hooks.ts), which is the cleanest place to attach pre-dispatch counting for Gemini requests.
|
||||
|
||||
Why this matters:
|
||||
- Gives an accurate preflight token count for Gemini before dispatch.
|
||||
- Lets SF warn before sending oversized prompts instead of only reacting to provider overflow errors.
|
||||
- Enables future auto-compaction logic to use provider-native counts instead of heuristics.
|
||||
|
||||
Concrete implementation:
|
||||
- Add a small helper in `packages/pi-ai/src/providers/google-gemini-cli.ts` or a neighboring helper module that builds a `CodeAssistServer` and exposes `countTokens`.
|
||||
- Thread enough auth/project context into the hook path so Gemini requests can be counted without re-implementing auth discovery.
|
||||
- In `before_provider_request`, when the target model/provider is `google-gemini-cli`, call cli-core `countTokens` on the request payload before send.
|
||||
- Preserve graceful fallback to current heuristics when auth is missing, cli-core throws, or the request shape is unsupported.
|
||||
- Surface the result to the UI/logging path where it is actually actionable:
|
||||
- warning banner for obviously large prompts,
|
||||
- optional metrics/session recording,
|
||||
- optional compact-context trigger later.
|
||||
|
||||
Likely touchpoints:
|
||||
- `packages/pi-ai/src/providers/google-gemini-cli.ts`
|
||||
- `src/resources/extensions/sf/bootstrap/register-hooks.ts`
|
||||
- `src/resources/extensions/sf/token-counter.ts`
|
||||
- possibly `packages/pi-coding-agent/src/core/sdk.ts` if the hook plumbing needs richer request/model context.
|
||||
|
||||
Acceptance criteria:
|
||||
- Gemini requests can report an accurate pre-dispatch token count without sending an inference request.
|
||||
- Failures in counting do not block dispatch.
|
||||
- Existing token-counter tests remain green; add focused tests for Gemini preflight behavior.
|
||||
|
||||
Open question:
|
||||
- Decide whether the counted value belongs in ephemeral UI only, metrics only, or both. Do not wire it into multiple places until one storage/display path is chosen.
|
||||
|
||||
## 2. Quota surfacing via `server.retrieveUserQuota(...)` and `refreshAvailableCredits()`
|
||||
|
||||
Status:
|
||||
- `src/resources/extensions/sf-usage-bar/index.ts` still manually reads OAuth tokens and POSTs directly to `https://cloudcode-pa.googleapis.com/v1internal:retrieveUserQuota`.
|
||||
- That code bypasses cli-core even though cli-core already knows the right request/response path.
|
||||
|
||||
Why this matters:
|
||||
- Replaces fragile handwritten Gemini usage fetching.
|
||||
- Avoids relying on duplicated credential discovery and raw fetch behavior.
|
||||
- Makes the usage bar consistent with the same client stack the provider now uses.
|
||||
- Gives a better foundation for distinguishing rate limit exhaustion vs billing/quota exhaustion.
|
||||
|
||||
Concrete implementation:
|
||||
- Extract Gemini auth/client creation into a reusable helper instead of duplicating token parsing in the usage bar.
|
||||
- Replace the raw `fetchGeminiUsage()` request with cli-core quota APIs.
|
||||
- Normalize cli-core quota response into the existing `UsageSnapshot` / `RateWindow` shape used by the usage bar.
|
||||
- Keep current UI semantics where possible:
|
||||
- show `Pro` and `Flash` windows if quota exists,
|
||||
- preserve non-fatal error display,
|
||||
- keep timeouts bounded.
|
||||
- If cli-core returns richer quota detail, prefer surfacing remaining counts instead of only percentages when it fits the existing footer.
|
||||
|
||||
Likely touchpoints:
|
||||
- `src/resources/extensions/sf-usage-bar/index.ts`
|
||||
- `packages/pi-ai/src/providers/google-gemini-cli.ts` or a new shared Gemini cli-core helper
|
||||
- auth plumbing if the usage bar should stop reading raw files directly
|
||||
|
||||
Acceptance criteria:
|
||||
- No direct raw POST to `v1internal:retrieveUserQuota` remains in `sf-usage-bar`.
|
||||
- Gemini usage bar continues to render successfully with valid OAuth credentials.
|
||||
- Missing/expired credentials degrade to a clear non-crashing error state.
|
||||
|
||||
Follow-up if this lands cleanly:
|
||||
- Remove duplicated token loading from `auth.json` and `~/.gemini/oauth_creds.json` wherever cli-core can own it.
|
||||
|
||||
## 3. Structured Gemini error classification
|
||||
|
||||
Status:
|
||||
- `packages/pi-coding-agent/src/core/retry-handler.ts` still decides retryability through large regexes in `isRetryableError()` and downstream string classification.
|
||||
- Claude already identified cli-core exports such as `getErrorStatus`, `ModelNotFoundError`, and `googleQuotaErrors` as a better source of truth for Gemini-specific failures.
|
||||
|
||||
Why this matters:
|
||||
- Current retry behavior is broad and stringly-typed.
|
||||
- Regex classification is hard to audit and easy to drift from upstream wording changes.
|
||||
- Structured classification would let Gemini-specific retry/backoff behavior be smarter without perturbing other providers.
|
||||
|
||||
Concrete implementation:
|
||||
- Add a Gemini-specific error mapping layer that:
|
||||
- consumes cli-core error helpers when the active provider is `google-gemini-cli`,
|
||||
- converts those into SF retry classes (`rate_limit`, `quota_exhausted`, `auth_error`, `model_not_found`, non-retryable, etc.).
|
||||
- Keep non-Gemini providers on the existing generic path unless there is a broader retry refactor planned.
|
||||
- Update retry decisions so Gemini can distinguish:
|
||||
- temporary rate-limit/cooldown,
|
||||
- quota exhaustion / credits exhausted,
|
||||
- auth problems,
|
||||
- invalid model / model not found.
|
||||
- Reduce duplicated regex patterns once the structured path is proven.
|
||||
|
||||
Likely touchpoints:
|
||||
- `packages/pi-coding-agent/src/core/retry-handler.ts`
|
||||
- possibly `packages/pi-ai/src/providers/google-gemini-cli.ts` if provider-specific error wrapping is needed
|
||||
- Gemini retry tests and any SF extension tests asserting rate-limit recovery behavior
|
||||
|
||||
Acceptance criteria:
|
||||
- Retry behavior for `google-gemini-cli` is driven by cli-core-aware classification rather than raw message regex where feasible.
|
||||
- Existing non-Gemini retry behavior does not regress.
|
||||
- Tests cover at least one case each for rate limit, quota exhaustion, auth failure, and model-not-found.
|
||||
|
||||
Important constraint:
|
||||
- Do not over-generalize this into a cross-provider retry framework in the same change. Keep the blast radius Gemini-scoped first.
|
||||
|
||||
## 4. Gemini model aliasing (`auto`, `pro`, `flash`, `flash-lite`)
|
||||
|
||||
Status:
|
||||
- Concrete model list wiring exists via cli-core `VALID_GEMINI_MODELS`.
|
||||
- User-facing model resolution still expects explicit IDs in places like [src/resources/extensions/sf/auto-model-selection.ts](/home/mhugo/code/singularity-foundry/src/resources/extensions/sf/auto-model-selection.ts).
|
||||
|
||||
Why this matters:
|
||||
- Lets preferences stay stable even as Google revs concrete model names.
|
||||
- Reduces churn in `PREFERENCES.md` and docs when preview/stable IDs change.
|
||||
- Complements the static model import already added from cli-core.
|
||||
|
||||
Concrete implementation:
|
||||
- Define Gemini aliases accepted by SF config and resolution:
|
||||
- `gemini:auto`
|
||||
- `gemini:pro`
|
||||
- `gemini:flash`
|
||||
- `gemini:flash-lite`
|
||||
- decide whether equivalent slash or bare forms should also resolve.
|
||||
- Resolve aliases through cli-core’s model alias resolution rather than SF-local hardcoding where possible.
|
||||
- Hook alias resolution into the same places where model IDs are interpreted from preferences and dynamic routing.
|
||||
- Preserve current explicit-ID behavior exactly.
|
||||
- Add user-facing documentation once semantics are stable.
|
||||
|
||||
Likely touchpoints:
|
||||
- `src/resources/extensions/sf/auto-model-selection.ts`
|
||||
- model resolution helpers in SF preference/bootstrap code
|
||||
- possibly `packages/pi-coding-agent/src/core/model-resolver.ts` if alias resolution belongs closer to the registry layer
|
||||
|
||||
Acceptance criteria:
|
||||
- A user can specify Gemini aliases in preferences and have them resolve to a valid available model.
|
||||
- Explicit full model IDs still win when provided.
|
||||
- Unknown aliases fail clearly instead of silently picking an arbitrary Gemini model.
|
||||
|
||||
Risk to manage:
|
||||
- Avoid leaking alias behavior into non-Gemini providers or into provider/model parsing in a way that breaks existing `provider/model` resolution semantics.
|
||||
|
||||
## 5. Lower-priority follow-ups
|
||||
|
||||
### 5a. Google Search grounding tool
|
||||
|
||||
Potential value:
|
||||
- Native Gemini grounding could replace some Brave/Tavily usage for Gemini-dispatched work.
|
||||
|
||||
Why this is lower priority:
|
||||
- It touches provider tools, routing, and citation UX, which is a bigger behavioral change than quota/counting.
|
||||
|
||||
Suggested approach:
|
||||
- Investigate only after quota/counting/error classification are stable.
|
||||
- Keep the change Gemini-scoped and opt-in first.
|
||||
|
||||
### 5b. Test harness migration
|
||||
|
||||
Potential value:
|
||||
- `makeFakeConfig`, `createMockWorkspaceContext`, and `MockMessageBus` can make provider tests more realistic and less dependent on raw network mocking.
|
||||
|
||||
Suggested approach:
|
||||
- Fold this into whichever of items 1-4 lands first rather than doing it standalone.
|
||||
|
||||
### 5c. MCP keychain storage
|
||||
|
||||
Potential value:
|
||||
- Better OAuth token storage for MCP auth flows.
|
||||
|
||||
Why this is separate:
|
||||
- It is an auth-storage architecture change, not a Gemini provider follow-up.
|
||||
|
||||
### 5d. IDE detection, terminal serialization, experiment flags, telemetry
|
||||
|
||||
Potential value:
|
||||
- Nice integrations and better fidelity with upstream CLI behavior.
|
||||
|
||||
Why this is last:
|
||||
- None of them unblock current Gemini provider correctness or UX.
|
||||
|
||||
## Proposed task breakdown for delegation
|
||||
|
||||
If these are delegated as separate rescue tasks, use this split:
|
||||
|
||||
1. `gemini-token-counting`
|
||||
- Implement cli-core-backed Gemini preflight counting.
|
||||
- Own hook integration and tests only.
|
||||
- Do not touch usage bar or retry logic.
|
||||
|
||||
2. `gemini-quota-usage-bar`
|
||||
- Replace Gemini raw quota fetch with cli-core quota APIs.
|
||||
- Own `sf-usage-bar` integration and Gemini auth helper reuse.
|
||||
- Do not touch retry handler or model resolution.
|
||||
|
||||
3. `gemini-retry-classification`
|
||||
- Introduce cli-core-backed Gemini error classification in retry handling.
|
||||
- Own retry tests and minimal provider-side wrapping if needed.
|
||||
- Do not change quota UI or aliasing.
|
||||
|
||||
4. `gemini-model-aliases`
|
||||
- Add `auto/pro/flash/flash-lite` alias resolution for Gemini preferences.
|
||||
- Own resolution/documentation/tests only.
|
||||
|
||||
## Suggested definition of done
|
||||
|
||||
The follow-up is complete when:
|
||||
- Gemini transport, quota lookup, and retry classification all rely on cli-core where upstream already provides that behavior.
|
||||
- SF no longer carries duplicated handwritten Gemini logic for areas cli-core already owns.
|
||||
- User-facing configuration for Gemini is more stable than concrete preview model IDs.
|
||||
- Each change lands with focused tests and without broad cross-provider refactors.
|
||||
Loading…
Add table
Reference in a new issue