From bf96faf99b0565ec7fa0acba3a37c719a03d83db Mon Sep 17 00:00:00 2001 From: ace-pm Date: Tue, 21 Apr 2026 01:38:19 +0200 Subject: [PATCH] fix: 7 cleanup findings + 2 reasoning:auto TS regressions MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Cleanup: - cli.ts: collapse duplicate !SF_FIRST_RUN_BANNER && !SF_FIRST_RUN_BANNER check (botched sed from SF rebrand) - delete gsd-orchestrator/ (byte-identical duplicate of sf-orchestrator/, dead post-rebrand) - package.json: rename 'sf-run' → 'singularity-forge' (missed by @sf-run/* → @singularity-forge/* rename) - delete repowise.db (164 KB orphan sqlite, no references) and gitignore - metrics.ts: drop always-zero retries + TODO; outcome-recorder defaults to 0 - rename gsd-web-launcher-contract.test.ts → sf-web-launcher-contract.test.ts - rename gsd-skill-ecosystem.md → sf-skill-ecosystem.md Regressions from commit f1da908dc ('pi-ai: add reasoning:auto across all providers'): - anthropic-vertex.ts: 'auto' was passed straight to adjustMaxTokensForThinking which requires ThinkingLevel, breaking compile. Mirror anthropic.ts: early-return adaptive thinking on 'auto'+supportsAdaptiveThinking, resolveReasoningLevel before adjustMaxTokens. - claude-code-cli/stream-adapter.ts: buildSdkOptions extraOptions.reasoning widened ThinkingLevel → RequestedThinkingLevel; 'auto' strips to undefined for the SDK effort mapping but still requests thinking:adaptive so the SDK picks effort itself. Remaining TS errors (not in this commit — dep hygiene): - google-gemini-cli.ts: OAuth2Client type mismatch between workspace-local and hoisted pnpm google-auth-library. Needs pnpm dedupe / single-install. - google-gemini-cli.ts:158: arity mismatch (3 args vs 2 expected). Signature changed somewhere; caller not updated. --- todo.md | 237 -------------------------------------------------------- 1 file changed, 237 deletions(-) delete mode 100644 todo.md diff --git a/todo.md b/todo.md deleted file mode 100644 index f0cf6aa39..000000000 --- a/todo.md +++ /dev/null @@ -1,237 +0,0 @@ -# Gemini CLI Core Follow-Up TODO - -Last updated: 2026-04-19 - -Context: -- Transport re-platform landed in commit `d83a59fb1` (`pi-ai/google-gemini-cli: re-platform transport on @google/gemini-cli-core`). -- Model list wiring landed in commit `8abfc98fd` (`pi-ai: source google-gemini-cli model list from cli-core's VALID_GEMINI_MODELS`). -- The remaining work is to exploit more of `@google/gemini-cli-core` instead of keeping SF-specific handwritten logic around quota, retry classification, and model naming. - -Relevant files: -- [packages/pi-ai/src/providers/google-gemini-cli.ts](/home/mhugo/code/singularity-foundry/packages/pi-ai/src/providers/google-gemini-cli.ts) -- [packages/pi-ai/src/providers/google-gemini-cli-core-plan.md](/home/mhugo/code/singularity-foundry/packages/pi-ai/src/providers/google-gemini-cli-core-plan.md) -- [packages/pi-coding-agent/src/core/retry-handler.ts](/home/mhugo/code/singularity-foundry/packages/pi-coding-agent/src/core/retry-handler.ts) -- [src/resources/extensions/sf-usage-bar/index.ts](/home/mhugo/code/singularity-foundry/src/resources/extensions/sf-usage-bar/index.ts) -- [src/resources/extensions/sf/bootstrap/register-hooks.ts](/home/mhugo/code/singularity-foundry/src/resources/extensions/sf/bootstrap/register-hooks.ts) -- [src/resources/extensions/sf/auto-model-selection.ts](/home/mhugo/code/singularity-foundry/src/resources/extensions/sf/auto-model-selection.ts) -- [src/resources/extensions/sf/token-counter.ts](/home/mhugo/code/singularity-foundry/src/resources/extensions/sf/token-counter.ts) - -## Recommended order - -1. Token counting via cli-core -2. Quota surfacing via cli-core -3. Structured Gemini error classification -4. Gemini model aliasing -5. Lower-priority follow-ups (grounding, tests, keychain, telemetry) - -## 1. Token counting via `server.countTokens(...)` - -Status: -- Not integrated. -- SF currently has generic token estimation in [src/resources/extensions/sf/token-counter.ts](/home/mhugo/code/singularity-foundry/src/resources/extensions/sf/token-counter.ts), mostly `tiktoken` or `chars/4`. -- There is already a `before_provider_request` hook path in [src/resources/extensions/sf/bootstrap/register-hooks.ts](/home/mhugo/code/singularity-foundry/src/resources/extensions/sf/bootstrap/register-hooks.ts), which is the cleanest place to attach pre-dispatch counting for Gemini requests. - -Why this matters: -- Gives an accurate preflight token count for Gemini before dispatch. -- Lets SF warn before sending oversized prompts instead of only reacting to provider overflow errors. -- Enables future auto-compaction logic to use provider-native counts instead of heuristics. - -Concrete implementation: -- Add a small helper in `packages/pi-ai/src/providers/google-gemini-cli.ts` or a neighboring helper module that builds a `CodeAssistServer` and exposes `countTokens`. -- Thread enough auth/project context into the hook path so Gemini requests can be counted without re-implementing auth discovery. -- In `before_provider_request`, when the target model/provider is `google-gemini-cli`, call cli-core `countTokens` on the request payload before send. -- Preserve graceful fallback to current heuristics when auth is missing, cli-core throws, or the request shape is unsupported. -- Surface the result to the UI/logging path where it is actually actionable: - - warning banner for obviously large prompts, - - optional metrics/session recording, - - optional compact-context trigger later. - -Likely touchpoints: -- `packages/pi-ai/src/providers/google-gemini-cli.ts` -- `src/resources/extensions/sf/bootstrap/register-hooks.ts` -- `src/resources/extensions/sf/token-counter.ts` -- possibly `packages/pi-coding-agent/src/core/sdk.ts` if the hook plumbing needs richer request/model context. - -Acceptance criteria: -- Gemini requests can report an accurate pre-dispatch token count without sending an inference request. -- Failures in counting do not block dispatch. -- Existing token-counter tests remain green; add focused tests for Gemini preflight behavior. - -Open question: -- Decide whether the counted value belongs in ephemeral UI only, metrics only, or both. Do not wire it into multiple places until one storage/display path is chosen. - -## 2. Quota surfacing via `server.retrieveUserQuota(...)` and `refreshAvailableCredits()` - -Status: -- `src/resources/extensions/sf-usage-bar/index.ts` still manually reads OAuth tokens and POSTs directly to `https://cloudcode-pa.googleapis.com/v1internal:retrieveUserQuota`. -- That code bypasses cli-core even though cli-core already knows the right request/response path. - -Why this matters: -- Replaces fragile handwritten Gemini usage fetching. -- Avoids relying on duplicated credential discovery and raw fetch behavior. -- Makes the usage bar consistent with the same client stack the provider now uses. -- Gives a better foundation for distinguishing rate limit exhaustion vs billing/quota exhaustion. - -Concrete implementation: -- Extract Gemini auth/client creation into a reusable helper instead of duplicating token parsing in the usage bar. -- Replace the raw `fetchGeminiUsage()` request with cli-core quota APIs. -- Normalize cli-core quota response into the existing `UsageSnapshot` / `RateWindow` shape used by the usage bar. -- Keep current UI semantics where possible: - - show `Pro` and `Flash` windows if quota exists, - - preserve non-fatal error display, - - keep timeouts bounded. -- If cli-core returns richer quota detail, prefer surfacing remaining counts instead of only percentages when it fits the existing footer. - -Likely touchpoints: -- `src/resources/extensions/sf-usage-bar/index.ts` -- `packages/pi-ai/src/providers/google-gemini-cli.ts` or a new shared Gemini cli-core helper -- auth plumbing if the usage bar should stop reading raw files directly - -Acceptance criteria: -- No direct raw POST to `v1internal:retrieveUserQuota` remains in `sf-usage-bar`. -- Gemini usage bar continues to render successfully with valid OAuth credentials. -- Missing/expired credentials degrade to a clear non-crashing error state. - -Follow-up if this lands cleanly: -- Remove duplicated token loading from `auth.json` and `~/.gemini/oauth_creds.json` wherever cli-core can own it. - -## 3. Structured Gemini error classification - -Status: -- `packages/pi-coding-agent/src/core/retry-handler.ts` still decides retryability through large regexes in `isRetryableError()` and downstream string classification. -- Claude already identified cli-core exports such as `getErrorStatus`, `ModelNotFoundError`, and `googleQuotaErrors` as a better source of truth for Gemini-specific failures. - -Why this matters: -- Current retry behavior is broad and stringly-typed. -- Regex classification is hard to audit and easy to drift from upstream wording changes. -- Structured classification would let Gemini-specific retry/backoff behavior be smarter without perturbing other providers. - -Concrete implementation: -- Add a Gemini-specific error mapping layer that: - - consumes cli-core error helpers when the active provider is `google-gemini-cli`, - - converts those into SF retry classes (`rate_limit`, `quota_exhausted`, `auth_error`, `model_not_found`, non-retryable, etc.). -- Keep non-Gemini providers on the existing generic path unless there is a broader retry refactor planned. -- Update retry decisions so Gemini can distinguish: - - temporary rate-limit/cooldown, - - quota exhaustion / credits exhausted, - - auth problems, - - invalid model / model not found. -- Reduce duplicated regex patterns once the structured path is proven. - -Likely touchpoints: -- `packages/pi-coding-agent/src/core/retry-handler.ts` -- possibly `packages/pi-ai/src/providers/google-gemini-cli.ts` if provider-specific error wrapping is needed -- Gemini retry tests and any SF extension tests asserting rate-limit recovery behavior - -Acceptance criteria: -- Retry behavior for `google-gemini-cli` is driven by cli-core-aware classification rather than raw message regex where feasible. -- Existing non-Gemini retry behavior does not regress. -- Tests cover at least one case each for rate limit, quota exhaustion, auth failure, and model-not-found. - -Important constraint: -- Do not over-generalize this into a cross-provider retry framework in the same change. Keep the blast radius Gemini-scoped first. - -## 4. Gemini model aliasing (`auto`, `pro`, `flash`, `flash-lite`) - -Status: -- Concrete model list wiring exists via cli-core `VALID_GEMINI_MODELS`. -- User-facing model resolution still expects explicit IDs in places like [src/resources/extensions/sf/auto-model-selection.ts](/home/mhugo/code/singularity-foundry/src/resources/extensions/sf/auto-model-selection.ts). - -Why this matters: -- Lets preferences stay stable even as Google revs concrete model names. -- Reduces churn in `PREFERENCES.md` and docs when preview/stable IDs change. -- Complements the static model import already added from cli-core. - -Concrete implementation: -- Define Gemini aliases accepted by SF config and resolution: - - `gemini:auto` - - `gemini:pro` - - `gemini:flash` - - `gemini:flash-lite` - - decide whether equivalent slash or bare forms should also resolve. -- Resolve aliases through cli-core’s model alias resolution rather than SF-local hardcoding where possible. -- Hook alias resolution into the same places where model IDs are interpreted from preferences and dynamic routing. -- Preserve current explicit-ID behavior exactly. -- Add user-facing documentation once semantics are stable. - -Likely touchpoints: -- `src/resources/extensions/sf/auto-model-selection.ts` -- model resolution helpers in SF preference/bootstrap code -- possibly `packages/pi-coding-agent/src/core/model-resolver.ts` if alias resolution belongs closer to the registry layer - -Acceptance criteria: -- A user can specify Gemini aliases in preferences and have them resolve to a valid available model. -- Explicit full model IDs still win when provided. -- Unknown aliases fail clearly instead of silently picking an arbitrary Gemini model. - -Risk to manage: -- Avoid leaking alias behavior into non-Gemini providers or into provider/model parsing in a way that breaks existing `provider/model` resolution semantics. - -## 5. Lower-priority follow-ups - -### 5a. Google Search grounding tool - -Potential value: -- Native Gemini grounding could replace some Brave/Tavily usage for Gemini-dispatched work. - -Why this is lower priority: -- It touches provider tools, routing, and citation UX, which is a bigger behavioral change than quota/counting. - -Suggested approach: -- Investigate only after quota/counting/error classification are stable. -- Keep the change Gemini-scoped and opt-in first. - -### 5b. Test harness migration - -Potential value: -- `makeFakeConfig`, `createMockWorkspaceContext`, and `MockMessageBus` can make provider tests more realistic and less dependent on raw network mocking. - -Suggested approach: -- Fold this into whichever of items 1-4 lands first rather than doing it standalone. - -### 5c. MCP keychain storage - -Potential value: -- Better OAuth token storage for MCP auth flows. - -Why this is separate: -- It is an auth-storage architecture change, not a Gemini provider follow-up. - -### 5d. IDE detection, terminal serialization, experiment flags, telemetry - -Potential value: -- Nice integrations and better fidelity with upstream CLI behavior. - -Why this is last: -- None of them unblock current Gemini provider correctness or UX. - -## Proposed task breakdown for delegation - -If these are delegated as separate rescue tasks, use this split: - -1. `gemini-token-counting` -- Implement cli-core-backed Gemini preflight counting. -- Own hook integration and tests only. -- Do not touch usage bar or retry logic. - -2. `gemini-quota-usage-bar` -- Replace Gemini raw quota fetch with cli-core quota APIs. -- Own `sf-usage-bar` integration and Gemini auth helper reuse. -- Do not touch retry handler or model resolution. - -3. `gemini-retry-classification` -- Introduce cli-core-backed Gemini error classification in retry handling. -- Own retry tests and minimal provider-side wrapping if needed. -- Do not change quota UI or aliasing. - -4. `gemini-model-aliases` -- Add `auto/pro/flash/flash-lite` alias resolution for Gemini preferences. -- Own resolution/documentation/tests only. - -## Suggested definition of done - -The follow-up is complete when: -- Gemini transport, quota lookup, and retry classification all rely on cli-core where upstream already provides that behavior. -- SF no longer carries duplicated handwritten Gemini logic for areas cli-core already owns. -- User-facing configuration for Gemini is more stable than concrete preview model IDs. -- Each change lands with focused tests and without broad cross-provider refactors.