Commit graph

3596 commits

Author SHA1 Message Date
Mikael Hugo
7be540480e docs: add CLAUDE.md with dev guide for build pipeline and test runner
Documents the dist-vs-source distinction that caused the memoriesSection
fix to not take effect, the c8 coverage runner process leak, and the
template variable maintenance contract.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 18:56:03 +02:00
Mikael Hugo
7289933909 fix: populate memoriesSection in execute-task prompt and fix stale dist
buildExecuteTaskPrompt was not passing memoriesSection to loadPrompt,
causing headless auto to fail with a template variable error. Also
updated plan-slice-prompt.test.ts to supply the four template variables
(memoriesSection, runtimeContext, phaseAnchorSection, gatesToClose) that
were missing from the test fixture.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 18:46:55 +02:00
Mikael Hugo
a30a7692e3 fix: dist-redirect.mjs incorrectly rewrites .js→.ts for node_modules paths containing /src/
The resolver guarded on context.parentURL.includes('/src/') to identify
in-repo source files, but @google/gemini-cli-core installs to
node_modules/@google/gemini-cli-core/dist/src/ which also contains '/src/'.
Relative imports from that dist package (e.g. './config/config.js') were
incorrectly rewritten to './config/config.ts', causing ERR_MODULE_NOT_FOUND
on every test that transitively imports the google-gemini provider.

Fix: add !context.parentURL.includes('/node_modules/') guard.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 18:04:23 +02:00
Mikael Hugo
2e32c96fa0 Port gsd2 functional parity: turn-epoch, abandon-detect, reapplyThinking, exec chain, memory chain, onboarding-state
- auto/turn-epoch.ts: AsyncLocalStorage-backed stale-write dropping for timeout recovery
- journal.ts: isStaleWrite() guard drops superseded turn writes
- auto/run-unit.ts: wrap agent_end Promise.race in runWithTurnGeneration
- auto/session.ts: ThinkingLevelSnapshot type + autoModeStartThinkingLevel/originalThinkingLevel fields
- auto-model-selection.ts: reapplyThinkingLevel() called after every successful setModel()
- auto/phases.ts: pass autoModeStartThinkingLevel to selectAndApplyModel + hook override restore
- abandon-detect.ts: two-signal milestone abandon detection in rewrite-docs overrides
- auto-post-unit.ts: use detectAbandonMilestone + parkMilestone in rewrite-docs handler
- preferences-types.ts: ContextModeConfig + isContextModeEnabled
- exec-sandbox.ts: sandboxed bash/node/python subprocess with .sf/exec/ persistence
- exec-history.ts: read-side scan of .sf/exec/*.meta.json
- compaction-snapshot.ts: ≤2 KB markdown digest written before context compaction
- tools/exec-tool.ts: sf_exec MCP tool executor
- tools/exec-search-tool.ts: sf_exec_search MCP tool executor
- tools/resume-tool.ts: sf_resume MCP tool executor
- bootstrap/exec-tools.ts: registers sf_exec/sf_exec_search/sf_resume
- memory-relations.ts: knowledge-graph edges between memories (traverseGraph)
- tools/memory-tools.ts: capture_thought/memory_query/sf_graph executors
- bootstrap/memory-tools.ts: registers capture_thought/memory_query/sf_graph
- bootstrap/register-extension.ts: wire exec-tools + memory-tools into registration
- onboarding-state.ts: onboarding completion record at ~/.sf/agent/onboarding.json

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 10:58:39 +02:00
Mikael Hugo
5887ea3fd1 port gsd2: blocked-models gate, milestone-summary classifier, unsupported-model recovery
blocked-models.ts (new):
  Persistent per-project blocklist at .sf/runtime/blocked-models.json.
  loadBlockedModels / isModelBlocked / blockModel (file-lock-safe write).

milestone-summary-classifier.ts (new):
  classifyMilestoneSummaryContent → "success" | "failure" | "unknown".
  isTerminalMilestoneSummaryContent: failure summaries are NOT terminal —
  lets auto-mode re-enter a milestone after a failed recovery summary.

state.ts:
  Phase 1 (completeMilestoneIds) and Phase 2 (registry) now check
  isTerminalMilestoneSummaryContent before treating a SUMMARY as complete.
  A failure SUMMARY no longer prematurely parks a milestone.

error-classifier.ts:
  Add "unsupported-model" ErrorClass kind with regex detection
  (model + not-supported/unavailable/no-access + account/plan/tier).
  Checked before "permanent" so /account/i in PERMANENT_RE doesn't swallow it.

auto-model-selection.ts:
  Wire isModelBlocked() gate in selectAndApplyModel candidate loop:
  skips provider-rejected models and continues to fallbacks.

bootstrap/agent-end-recovery.ts:
  Handle cls.kind === "unsupported-model": blockModel(), try fallback chain
  skipping already-blocked models, pause if no usable fallback.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 10:13:27 +02:00
Mikael Hugo
6cb6de4fd2 perf: parallelize I/O, add runtime cache, extend nix devenv
- unit-context-composer: resolve artifact keys in parallel (Promise.all)
- unit-runtime: add in-memory cache to avoid repeated disk reads per dispatch
- auto-timers: share 15s idle watchdog tick with context-pressure check
- auto-prompts: 1s TTL budget cache to coalesce repeated loadEffectiveSFPreferences calls
- native-git-bridge: extend nativeHasChanges TTL 10s→30s
- auto-dashboard: remove pulsing dot animation (CPU churn, no UX value)
- flake.nix: add nodePackages.typescript to dev shell

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 10:12:32 +02:00
Mikael Hugo
12aabd863e port gsd2 #4769: worktree telemetry, slice-cadence, canonical-root fix + /sf scan
Ports commit 7fb35ca58 from gsd2 (PR #4769) covering four issues:

#4761 — resolveCanonicalMilestoneRoot in worktree-manager.ts routes
validate-milestone through the live worktree path instead of stale
project-root state when a milestone worktree is active.

#4762 — auditOrphanedMilestoneBranches in auto-start.ts now surfaces
in-progress milestone branches with unmerged commits ahead of main
(previously only complete milestones were audited). Gated on
isClosedStatus so parked/other closed statuses are unaffected.

#4764 — worktree-telemetry.ts: typed emit helpers (emitWorktreeCreated,
emitWorktreeMerged, emitWorktreeOrphaned, emitAutoExit, emitWorktreeSync,
emitCanonicalRootRedirect, emitSliceMerged, emitMilestoneResquash) plus
summarizeWorktreeTelemetry aggregator and nearest-rank percentile().
Wired in: worktree-resolver.ts (create/merge events), auto-start.ts
(orphan telemetry), auto.ts stopAuto (auto-exit with normalized reason),
worktree-manager.ts (canonical-root-redirect). Surfaced in forensics.ts
via detectWorktreeOrphans and Worktree Telemetry sections.

#4765 — slice-cadence.ts: mergeSliceToMain squash-merges each slice's
commits onto main as soon as the slice passes validation (opt-in via
git.collapse_cadence: "slice"). resquashMilestoneOnMain collapses N
per-slice commits into one milestone commit at completion. Wired in
auto-post-unit.ts (slice merge after complete-slice with stopAuto on
conflict/error) and worktree-resolver.ts (resquash at mergeAndExit).
AutoSession.milestoneStartShas tracks the pre-first-slice SHA.
GitPreferences and preferences-validation.ts extended with
collapse_cadence and milestone_resquash fields.

Also ports /sf scan command: commands-scan.ts with parseScanArgs,
resolveScanDocuments, buildScanOutputPaths, and handleScan dispatching
a focused codebase assessment prompt to .sf/codebase/.

journal.ts: 9 new JournalEventType values for the telemetry events.
All changes are additive; default behavior (cadence="milestone") unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 09:03:56 +02:00
Mikael Hugo
2911d3b93d port gsd2: reassess-roadmap opt-in (ADR-003 §4) + prefer toolDefinition.label
reassess-roadmap: flip default from true → false. Most reassess units
conclude "roadmap is fine" burning a session for no change; the
plan-slice prompt now carries a JIT preamble at zero cost. (#4778)

tool-execution: always prefer toolDefinition.label when non-empty,
even when label === name — allows tools to display their canonical
name explicitly. (#4758)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 08:33:50 +02:00
Mikael Hugo
d4cdcb582d port gsd2 #3338: ecosystem plugin loader for .sf/extensions/
Adds support for project-local SF extension plugins dropped in
.sf/extensions/. Trust-gated (requires pi trust), symlink-escape safe.

- ecosystem/sf-extension-api.ts: SFExtensionAPI wrapper exposing
  getPhase() and getActiveUnit() to third-party handlers; updateSnapshot
  refreshes state before_agent_start so handlers see current phase/unit
- ecosystem/loader.ts: discovers .sf/extensions/*.js, loads them via
  dynamic import, dispatches factory(api) for each
- register-extension.ts: initializes ecosystemHandlers array, wires loader
- register-hooks.ts: before_agent_start refreshes snapshot then dispatches
  ecosystem handlers before returning SF system prompt
- types.ts: SFActiveUnit interface (milestoneId/sliceId/taskId + titles)
- workflow-logger.ts: "ecosystem" added to LogComponent union

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 08:27:55 +02:00
Mikael Hugo
6c36d62f35 port gsd2 #4961: stop using active-tool snapshot as model-policy gate
Fixes a bug where per-unit tool narrowing poisoned the policy gate for
subsequent units, causing "Model policy denied dispatch before prompt send"
errors on complete-slice and discuss-milestone (100% Win repro).

Four-part port from gsd2@817031b2a:
- ModelPolicyDispatchBlockedError class with per-model deny reasons
- TOOL_BASELINE WeakMap + clearToolBaseline/restoreToolBaseline lifecycle
- auto-model-selection: use getRequiredWorkflowToolsForAutoUnit as requiredTools
- auto/loop: catch ModelPolicyDispatchBlockedError as non-retryable (pause)
- auto.ts: wire clearToolBaseline at startAuto (fresh only) and stopAuto

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 08:15:04 +02:00
Mikael Hugo
4fdd8700a3 port gsd2 upstream features: scope classifier, composer v2, GPT-5.5, test timeout
- milestone-scope-classifier: add getMilestonePipelineVariant + milestoneRowToScopeInput
  wired into auto-dispatch trivial-skip for research/validation phases (#4781)
- auto-prompts: rename GSD→SF identifiers, add isSummaryCleanForSkip, prefs param
  on checkNeedsReassessment, buildExtractionStepsBlock from commands-extract-learnings
- unit-context-manifest + unit-context-composer: port v2 typed computed artifacts (#4924)
- skill-manifest: per-unit-type skill filter resolver (#4788, #4792)
- escalation: stub for ADR-011 mid-execution escalation (full port deferred)
- auto-start: extract decideSurvivorAction for testability (#4832)
- models: add gpt-5.5 + gpt-5.4-mini to cost table, router, and models.generated.ts
- types: EscalationArtifact, context_window_override, skip_clean_reassess,
  mid_execution_escalation, sketch_scope on SliceRow
- tool-execution: add visibleWidth import (was undefined)
- package.json: add --test-timeout=30000 to prevent parallel tests from freezing machine

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 08:08:11 +02:00
Mikael Hugo
e2147c0694 sf snapshot: pre-dispatch, uncommitted changes after 43m inactivity 2026-04-25 06:34:49 +02:00
Mikael Hugo
7b6c9dd099 sf snapshot: pre-dispatch, uncommitted changes after 4703m inactivity 2026-04-25 05:51:29 +02:00
ace-pm
e625d20a59
fix: add self to flake outputs 2026-04-21 23:27:40 +02:00
ace-pm
c744bdf6c1
fix: atomic writes, parse radix, lossy json, silent worker spawn
8 fixes from 3rd-pass scan:

1. web/components/sf/tempCodeRunnerFile.tsx: remove orphan VS Code
   'Code Runner' artifact (850+ lines duplicated from shell-terminal.tsx).
   Unreferenced but compiled into tsc project.

2. sf/phase-anchor.ts: writePhaseAnchor used plain writeFileSync — a crash
   mid-write would corrupt the handoff checkpoint that readPhaseAnchor then
   silently returns null for, losing cross-phase context. Switched to
   atomicWriteSync (already used by sibling files).

3. sf/forensics.ts: same non-atomic writeFileSync on active-forensics.json
   marker. Race with a concurrent reader produces an empty object and the
   forensics session is lost. Switched to atomicWriteSync.

4. web/auto-dashboard-service.ts: paused-session.json existence was the
   intended signal but a corrupt body silently dropped the paused flag so
   the UI showed active. Now reports paused on file existence regardless
   of body integrity, and warns on corruption.

5. sf/visualizer-data.ts: doctor-history.jsonl parser did .map(JSON.parse)
   inside an outer catch. One corrupt line discarded 19 valid entries.
   Per-line try/catch preserves the valid rows.

6. sf/files.ts: three parseInt calls without radix (step, total_steps,
   totalSteps) — also missing || 0 fallback for NaN.

7. cli.ts: parseInt(process.versions.node) without radix. Split on '.' and
   use radix 10 explicitly.

8. sf/slice-parallel-orchestrator.ts: silent 'catch {}' around spawn()
   masked worker-spawn failures as 'no workers available'. Matches sibling
   parallel-orchestrator.ts pattern — now logs via logWarning.

Skipped from the scan (need a real lock mechanism, not safe as a one-line
fix):
- sf/auto-dispatch.ts:164 (UAT counter race)
- sf/captures.ts:107 (CAPTURES.md append race)

Deferred (low-value):
- preferences-models.ts, key-manager.ts, auto-timers.ts silent catches
- dead variable in visualizer-data.ts
- google-gemini-cli.ts maxTokens clamp interaction

tsc --noEmit green at root.
2026-04-21 02:13:10 +02:00
ace-pm
51b65fd490
fix: symlink extensions + silent catches masking real errors
Real bugs from 2nd-pass scan:

1. extension-registry.ts: discoverAllManifests skipped symlinked extension
   dirs because Dirent.isDirectory() returns false for symlinks. Dev-workflow
   symlinks under ~/.sf/agent/extensions/ were invisible to list/enable/
   disable/info. Matches the regression documented in
   symlink-extension-discovery.test.ts — the test inlines the correct logic,
   but this callsite still had the buggy form. Now accepts isDirectory() ||
   isSymbolicLink().

2. headless.ts SIGINT handler: client.stop() failures were double-silenced
   (inner .catch(()=>{}), outer try{}catch{}). Interactive mode logs stop
   errors to stderr. Restored head/headless parity — still fire-and-forget
   (exit code is forced via process.exit) but failures are observable.

3. openai-codex-responses.ts SSE parser: malformed data frames were silently
   dropped so broken streams looked identical to clean ones. Now debug-logs
   the parse error with the chunk context so broken streams are
   distinguishable in logs. Stream continues on bad chunk (one bad frame
   shouldn't kill the whole generation).

4. web/cleanup-service.ts generated script: bare 'catch {}' around four native
   git calls (nativeBranchList, nativeDetectMainBranch, nativeBranchListMerged,
   nativeForEachRef). A failed main-branch detection silently left mainBranch
   undefined-shaped, then the next native call operated on garbage. Now emits
   console.warn so failures surface in the subprocess log.

5. web/undo-service.ts generated script: git revert failure was silenced;
   when --no-commit failed, user saw commitsReverted=0 with no reason. Now
   logs the revert error before attempting --abort (abort itself remains
   best-effort silent).

False positives from the same scan (investigated and dismissed):
- auto-worktree.ts #2505: code uses ':(exclude).sf/milestones' pathspec +
  shelter-and-restore, which is a better fix than the 'drop --include-untracked'
  approach the test comment describes. Test comment is stale; source is correct.
- Lifecycle handler unhandled rejections across 5 extensions: extensions/runner.ts
  already try/catches handler invocations and routes to emitError. Wrapping the
  individual handlers would be redundant.
2026-04-21 02:01:41 +02:00
ace-pm
0f94341b43
fix(loader): fall back to src/resources when SF-WORKFLOW.md missing from dist
Build sometimes copies dist/resources/extensions/ without the top-level
markdown files (observed: SF-WORKFLOW.md absent in dist/resources/ while
extensions/ was present). existsSync(distRes) was true either way, so
SF_WORKFLOW_PATH pointed at a non-existent path and /sf failed with ENOENT.

Check for the specific file instead of the directory.
2026-04-21 01:39:18 +02:00
ace-pm
bf96faf99b
fix: 7 cleanup findings + 2 reasoning:auto TS regressions
Cleanup:
- cli.ts: collapse duplicate !SF_FIRST_RUN_BANNER && !SF_FIRST_RUN_BANNER check (botched sed from SF rebrand)
- delete gsd-orchestrator/ (byte-identical duplicate of sf-orchestrator/, dead post-rebrand)
- package.json: rename 'sf-run' → 'singularity-forge' (missed by @sf-run/* → @singularity-forge/* rename)
- delete repowise.db (164 KB orphan sqlite, no references) and gitignore
- metrics.ts: drop always-zero retries + TODO; outcome-recorder defaults to 0
- rename gsd-web-launcher-contract.test.ts → sf-web-launcher-contract.test.ts
- rename gsd-skill-ecosystem.md → sf-skill-ecosystem.md

Regressions from commit f1da908dc ('pi-ai: add reasoning:auto across all providers'):
- anthropic-vertex.ts: 'auto' was passed straight to adjustMaxTokensForThinking
  which requires ThinkingLevel, breaking compile. Mirror anthropic.ts: early-return
  adaptive thinking on 'auto'+supportsAdaptiveThinking, resolveReasoningLevel before
  adjustMaxTokens.
- claude-code-cli/stream-adapter.ts: buildSdkOptions extraOptions.reasoning widened
  ThinkingLevel → RequestedThinkingLevel; 'auto' strips to undefined for the SDK
  effort mapping but still requests thinking:adaptive so the SDK picks effort itself.

Remaining TS errors (not in this commit — dep hygiene):
- google-gemini-cli.ts: OAuth2Client type mismatch between workspace-local and
  hoisted pnpm google-auth-library. Needs pnpm dedupe / single-install.
- google-gemini-cli.ts:158: arity mismatch (3 args vs 2 expected). Signature changed
  somewhere; caller not updated.
2026-04-21 01:38:19 +02:00
ace-pm
485e8f608e
chore: init sf 2026-04-21 01:38:02 +02:00
ace-pm
e63184f91d
fix(migrations): drop press-any-key block to avoid stdin wedge
showDeprecationWarnings ran setRawMode(true)/once('data')/setRawMode(false)/
pause() right before pi-tui's own stdin setup. That handoff is fragile —
buffered bytes and mode flips between the migration prompt and the TUI's
raw-mode setup can leave stdin cooked and line-buffered, producing the
'Enter does nothing + garbled typing' symptom.

Warnings now print non-blocking. They stay visible in scrollback above
the TUI, so users still see them without a blocking acknowledge step.
2026-04-21 00:56:18 +02:00
ace-pm
e6676692fc
fix(sf-tui): remove welcome overlay that hangs on enter
The per-session branded welcome overlay was added by the SF rebrand
(9d739dfa5) as a boxed 'Press any key to continue...' splash shown once
per sf session. In practice: Enter doesn't dismiss it and typing renders
as garbled characters behind the overlay, blocking every TUI launch.

Branding was redundant with the header (installed at session_start) and
the footer (git branch + model). Shortcuts are discoverable via help.
Deleting the overlay eliminates the hang vector entirely.

Legacy-extension migration warnings (migrations.ts 'Press any key...')
are unaffected — those are vendored upstream Pi code on a different
code path and only fire when deprecated extensions are present.
2026-04-21 00:44:28 +02:00
ace-pm
6446381730
chore(nix): run deadnix + statix + alejandra
Automated formatting pass: remove dead bindings, apply statix lint
fixes, normalize formatting via alejandra.
2026-04-21 00:27:31 +02:00
ace-pm
d0925d8d31
chore(make): add 'sf' target for running from source 2026-04-21 00:18:55 +02:00
ace-pm
dff521a506
fix(git): drop orphan gitlink at mintlify-docs/docs
Removes stray submodule pointer (mode 160000, commit 5c549fdf) with no
corresponding .gitmodules entry and empty working tree. Produced
'fatal: No url found for submodule path' + exit 128 warning on every
CI checkout (visible in Pipeline 'Update CI Builder Image' runs).
2026-04-21 00:17:45 +02:00
Mikael Hugo
f1da908dcd pi-ai: add reasoning:auto across all providers + Kimi K2.6
RequestedThinkingLevel adds "auto" to the reasoning option. Each provider
handles it natively:

- Claude 4.x (anthropic/bedrock): adaptive thinking, no effort constraint
- Gemini 2.5 Pro/Flash (google/vertex/gemini-cli): THINKING_LEVEL_UNSPECIFIED
- GPT-5+ (openai-responses/azure): reasoning.effort omitted, model decides
- Kimi (kimi-coding): {"type":"enabled"} without budget_tokens via new
  capabilities.thinkingNoBudget flag — model manages reasoning depth
- GLM (zai, thinkingFormat:zai): enable_thinking:true already correct
- MiniMax (anthropic API): explicit budget_tokens required, resolves to medium

ModelCapabilities.thinkingNoBudget: new flag for Anthropic-compatible providers
that accept {"type":"enabled"} without a budget (Kimi API).

models.generated.ts: add Kimi K2.6 (id: kimi-for-coding, beta API); add
thinkingNoBudget capability to all kimi-coding models.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 21:22:25 +02:00
Mikael Hugo
38d3bd55da sf: route Gemini family models to google-gemini-cli by default
resolveModelId now prefers google-gemini-cli over google (direct API) for
bare Gemini/Gemma IDs, matching the operational default after the CLI-core
re-platform. google-vertex is still honoured when it's the current provider.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 20:33:43 +02:00
Mikael Hugo
822791fad3 sf: wire Fix 1 deferred-commit (stage-before-verify, commit-after-verify)
postUnitPreVerification now calls stageOnly() for execute-task units when
action=commit, setting stagedPendingCommit=true and capturing task context.
postUnitPostVerification commits the staged index after the gate passes,
using a conventional-commit message built from the task context. Failure is
non-fatal (logWarning + UI warning). 11 structural tests cover the full
deferral lifecycle.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 20:33:39 +02:00
Mikael Hugo
315c2c49ca sf: fail-closed verification gate + deferred-commit infrastructure
Fix 2: verification gate no longer passes when no commands are
configured. Empty-commands result now returns passed=false, skipped=true.
Updated verification-gate.test.ts; added skipped-result guard in
auto-verification.ts that warns and continues (not a hard failure).

Fix 3: split auto-verification.ts try/catch into two zones. Zone 1
(gate machinery: prefs load, task lookup, runVerificationGate,
captureRuntimeErrors, runDependencyAudit) catches → pauseAuto + return
"pause". Zone 2 (ancillary: evidence writes, UOK gate, notifications)
catches → logWarning + return "continue". Added verification-fail-
closed.test.ts with 11 structural tests.

Fix 1 (infrastructure): added stageOnly() + commitStaged() to
GitServiceImpl, added stagedPendingCommit flag to AutoSession (cleared
in reset()), marked the runTurnGitAction call site in
postUnitPreVerification with TODO(fix-1-deferral) for the final wiring.

Fix 4: timeout handler in runFinalize now captures hadStagedPending and
hadCommitted before nulling currentUnit. Clears stagedPendingCommit to
prevent orphaned deferred commits. Emits a diagnostic warning for each
case so operators know whether staged-but-uncommitted changes will be
absorbed or whether a commit landed before verification was skipped.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:32:47 +02:00
Mikael Hugo
c940ebc16f sf: unify milestone discuss dispatch + todo.md seed injection
Replace separate dispatchHeadlessBootstrap with one flow:
- dispatchNewMilestoneDiscuss({ auto }) — auto=true uses headless
  prompt + rootFiles seed, no pendingAutoStartMap; auto=false uses
  discuss prompt with preparation, sets pendingAutoStartMap
- bootstrapNewMilestone() — project setup + ID reservation, called
  directly from bootstrapAutoSession instead of the old wrapper
- injectTodoContext() — reads and deletes todo.md/TODO.md/SPEC.md at
  project root, injects content as spec into any preamble; called
  identically in auto and interactive flows

Removes dispatchHeadlessBootstrap entirely. auto-start.ts now calls
the primitives directly. All three showWorkflowEntry new-milestone
sites use dispatchNewMilestoneDiscuss({ auto: false }).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 19:04:12 +02:00
Mikael Hugo
67d25f95f2 sf: add gemini cli preflight token counting 2026-04-19 13:25:07 +02:00
Mikael Hugo
8abfc98fdc pi-ai: source google-gemini-cli model list from cli-core's VALID_GEMINI_MODELS
generate-models.ts now imports @google/gemini-cli-core's
VALID_GEMINI_MODELS set and iterates it to produce SF's google-gemini-cli
provider entries. Single source of truth: when Google ships a new Gemini
model, it lands in cli-core first, then flows into SF on
`npm update @google/gemini-cli-core` + `generate-models.ts` re-run —
no more hand-editing the generate script.

Before:  6 hardcoded entries (gemini-2.0/2.5/3 flash + pro preview, etc.)
After:   7 entries sourced dynamically, filtered to drop `-customtools`
         variants which require a different tool protocol:

  gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite,
  gemini-3-pro-preview, gemini-3-flash-preview,
  gemini-3.1-pro-preview, gemini-3.1-flash-lite-preview

Capability tagging uses cli-core's isProModel / isPreviewModel so
reasoning=true for pro + 3.x preview variants (excluding flash-lite).
Context-window / max-output-tokens kept in an SF-local override table
since cli-core doesn't publish those per-model.

Pre-existing 4 test failures (zai glm-5.1 x3, anthropic resolveBaseUrl
#4140) unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 11:44:28 +02:00
Mikael Hugo
d83a59fb14 pi-ai/google-gemini-cli: re-platform transport on @google/gemini-cli-core
Replaces the handwritten fetch() + SSE-parsing + custom retry loop in
packages/pi-ai/src/providers/google-gemini-cli.ts with direct calls into
`CodeAssistServer.generateContentStream()` from @google/gemini-cli-core.
Requests to cloudcode-pa.googleapis.com are now byte-identical to what
the real `gemini` CLI sends — same User-Agent, same Client-Metadata,
same retry semantics — which preserves Google's subsidised free-OAuth
quota treatment and eliminates third-party-bot ban risk.

File size: 798 → 511 lines (~290 lines deleted net).

What went away:
  - DEFAULT_ENDPOINT, GEMINI_CLI_HEADERS (cli-core sets these itself)
  - MAX_RETRIES, BASE_DELAY_MS, MAX_EMPTY_STREAM_RETRIES, EMPTY_STREAM_BASE_DELAY_MS
  - CLAUDE_THINKING_BETA_HEADER (was antigravity-only)
  - extractRetryDelay(), isRetryableError(), extractErrorMessage(),
    sleep() — cli-core handles 429/5xx retry with Retry-After honoured
  - needsClaudeThinkingBetaHeader() — antigravity-only stub
  - CloudCodeAssistRequest + CloudCodeAssistResponseChunk interfaces
    (replaced by @google/genai's GenerateContentParameters +
     GenerateContentResponse — already unwrapped by cli-core)
  - ~200-line SSE body-reader block (response.body.getReader() + decoder
     + 'data:' line parsing) — cli-core yields parsed objects directly
  - Empty-stream retry workaround — handled upstream now

What stayed (pure SF adapter code):
  - convertMessages() → @google/genai Content[]
  - convertTools() → functionDeclarations
  - AssistantMessageEventStream — our event shape
  - Part-by-part processing: text vs thinking blocks, function-call
    translation to ToolCall, thoughtSignature retention, usage token
    extraction

New helper:
  - buildCodeAssistServer(token, projectId) constructs an OAuth2Client
    (google-auth-library) seeded with the SF-cached access token and
    wraps it in a CodeAssistServer instance. Ready for future promotion
    to cli-core's getOauthClient() for full auto-refresh; today we
    still pass the token through from SF's auth storage (Strategy A
    from the plan doc).

Live verified end-to-end against gemini-2.5-flash using the user's
cached ~/.gemini/oauth_creds.json — got real streaming response,
correct stopReason, usage tokens accounted.

Models registry test updated from 23 → 22 providers (antigravity gone).
Remaining 4 pi-ai test failures are pre-existing and unrelated
(custom-zai glm-5.1, resolveAnthropicBaseUrl #4140).

Type note: cli-core bundles its own nested copy of @google/genai, so
TypeScript sees two structurally-identical Content types. Runtime is
fine; a single `as any` cast at the generateContentStream call site
handles the nominal split.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 11:29:56 +02:00
Mikael Hugo
a6320f6c29 package: pin gaxios override to ^6.7.1 (required by googleapis-common)
Previous override (gaxios: 7.1.4) was set in 5c64f991b to silence a
glob@10 deprecation warning. That choice is incompatible with
@google/gemini-cli-core's dependency graph: googleapis-common@7.2.0
does `require("gaxios/build/src/common")` — a deep internal path that
gaxios 6.x exposed but 7.x tightened out of its exports field.

Swapping to ^6.7.1 restores cli-core's runtime: a probe using the
installed cli-core + the user's cached ~/.gemini/oauth_creds.json now
successfully reaches https://cloudcode-pa.googleapis.com/v1internal:
streamGenerateContent and gets a real response from gemini-2.5-flash.

The glob deprecation the previous override fixed is cosmetic and
doesn't block anything. Live cli-core functionality trumps npm warning
noise.

Unblocks task #3: replacing the handwritten fetch() transport in
pi-ai/src/providers/google-gemini-cli.ts with CodeAssistServer calls.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 11:01:37 +02:00
Mikael Hugo
bae6553e67 pi-ai: remove google-antigravity provider entirely
Continues the antigravity rip-out (previous commit covered SF + pi-coding-
agent UI layer). This commit removes the code from pi-ai:

- Delete packages/pi-ai/src/utils/oauth/google-antigravity.ts (313 lines)
- Update oauth/index.ts: drop antigravityOAuthProvider, refreshAntigravityToken,
  loginAntigravity exports + registry entry. Add comment explaining why
  (no vendor core lib + Google ban risk).
- google-gemini-cli.ts: strip ANTIGRAVITY_* constants, ANTIGRAVITY_ENDPOINT_FALLBACKS,
  getAntigravityHeaders(), ANTIGRAVITY_SYSTEM_INSTRUCTION, and all
  isAntigravity branching from streamGoogleGeminiCli + buildRequest.
  File header rewritten. needsClaudeThinkingBetaHeader() collapses to
  always-false (antigravity was the only path that needed it).
- google-shared.ts: strip stale Antigravity comments (file still shared
  between google, google-gemini-cli, google-vertex).
- types.ts: drop "google-antigravity" from Api / KnownProvider union.
- models.generated.ts: remove google-antigravity provider block (~170 lines,
  4 claude-* models that were only served via Antigravity).
- models.generated.test.ts: drop from expected-providers snapshot.
- scripts/generate-models.ts: remove antigravity model emission + context-
  window override so future regenerations don't re-add it.

Reasoning (same as previous commit): Antigravity has no vendor-published
core library we can embed. Hand-rolled OAuth against
daily-cloudcode-pa.sandbox.googleapis.com was exactly the pattern
Google is banning for third-party tools. Removing it eliminates the
risk surface.

Breaking change: users with google-antigravity configured in their
models.* block will need to migrate to google-gemini-cli (OAuth via
the real `gemini` CLI), google (API key), or google-vertex (GCP auth).

Build passes. Next commit wires the google-gemini-cli provider to
@google/gemini-cli-core per the plan.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 10:45:44 +02:00
Mikael Hugo
59806f8cc5 rip out antigravity from SF + pi-coding-agent UI/config layer
Antigravity (Google's IDE sandbox product, different from Gemini CLI) is
removed from:

  src/onboarding.ts                         — drop from LLM_PROVIDER_IDS + OAuth-flow picker
  src/pi-migration.ts                       — drop from LLM_PROVIDER_IDS migration list
  src/web/onboarding-service.ts             — drop from web-UI provider list
  src/tests/integration/web-onboarding-contract.test.ts — update contract
  src/resources/extensions/sf/doctor-providers.ts — drop from CLI_AUTH_PROVIDERS
  src/resources/extensions/sf/key-manager.ts      — drop UI listing
  src/resources/extensions/sf-usage-bar/index.ts  — delete entire quota fetcher block (~200 lines)
  packages/pi-coding-agent/src/cli/args.ts        — drop PI_AI_ANTIGRAVITY_VERSION doc
  packages/pi-coding-agent/src/utils/proxy-server.ts — drop from claude provider chain

Reason: antigravity has no vendor-published core library we can embed
(unlike @google/gemini-cli-core for the Gemini CLI). Continuing to
hand-roll OAuth against daily-cloudcode-pa.sandbox.googleapis.com is
exactly the pattern Google has started banning for third-party tools.
Removing the code removes the ban risk.

pi-ai provider code, OAuth util, and models.generated entries for
google-antigravity are removed in follow-up commits (separated for
reviewability — each layer verified independently).

Build passes. Note: this is a breaking change for any user who had
google-antigravity configured — they'll need to migrate to
google-gemini-cli (OAuth), google (API key), or google-vertex.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 10:39:36 +02:00
Mikael Hugo
233432d486 model-registry: drop google-antigravity from claude family_failover (preparing rip-out) 2026-04-19 10:35:56 +02:00
Mikael Hugo
eed84a2624 pi-ai: add @google/gemini-cli-core@0.38.2 dependency + refactor plan
Installs Google's official core library that powers the `gemini` CLI
binary. This is the first step of re-platforming pi-ai's
`google-gemini-cli` provider to use cli-core's transport instead of
handwritten fetch() calls against cloudcode-pa.googleapis.com.

Why:
  - cli-core requests are byte-for-byte identical to the official
    gemini CLI — preserves Google's subsidised free-OAuth quota and
    eliminates bot-detection drift risk from our reverse-engineered
    User-Agent / Client-Metadata headers.
  - Auto-inherit upstream improvements (new tool formats, grounding,
    session caching, quota displays) on `npm update`.
  - The `genai-proxy` extension (localhost proxy for gemini-cli-format
    clients) becomes "the CLI, but programmable" — same upstream
    behavior, hookable SF routing underneath.

Auth model (unchanged for users):
  - User runs the real `gemini` CLI once to OAuth; credentials land
    in ~/.gemini/oauth_creds.json (or keychain on newer installs).
  - SF reads those credentials via cli-core's own storage helpers;
    no SF-side OAuth flow, no separate login.

Scope for this commit: dependency only. The transport refactor
(replacing the fetch() calls in google-gemini-cli.ts with
CodeAssistServer.generateContentStream()) is queued as the next
task and documented in google-gemini-cli-core-plan.md with a
detailed API map, two integration strategies (transport-only vs
full cli-core auth), and a step-by-step implementation checklist.

Note: this commit adds 66 transitive deps to pi-ai (ajv, zod,
glob, mime, open, etc.). google-antigravity provider stays on
handwritten code — different sandbox endpoints, different auth
contract, not in cli-core's scope.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 10:33:22 +02:00
Mikael Hugo
ffe86284d2 model-registry: split direct vs family_failover providers per model family
Prior PROXY_FAMILY_PRIORITY table conflated "direct provider" with
"failover provider that happens to serve this family". Observed case:
claude-* family listed anthropic, google-antigravity, and
github-copilot all as "providers" — but only anthropic is the direct
vendor. google-antigravity re-serves Claude via Google's sandbox
IDE product (same endpoint as gemini-cli, different auth contract);
github-copilot re-serves via GitHub's paid platform.

This matters for the 429 fallback chain: a broken anthropic key
should try genuinely-vendored endpoints first (none, for Claude),
then fall into family_failover (antigravity, copilot), and only then
reach the generic GLOBAL_PROVIDER_FALLBACK (opencode, opencode-go,
openrouter, ollama-cloud). The old all-flat list hid this distinction.

New shape:
  { providers: [...], family_failover?: [...] }

Corrections applied:
  claude-*: providers=[anthropic], failover=[google-antigravity, github-copilot]
  gemini-*: providers=[google-gemini-cli, google, google-vertex],
            failover=[github-copilot]
  gpt-* / o* / codex-*: providers=[openai],
            failover=[azure-openai-responses, openai-codex, github-copilot]
  mimo-*: providers=[xiaomi]  (new: was [] — Xiaomi MiMo Open Platform
          is direct API at api.xiaomimimo.com / token-plan-sgp.xiaomimimo.com)

buildCandidateOrder stitches [direct, family_failover, global_fallback]
with deduplication. User overrides via settings.proxy.providerPriority
continue to replace only the direct-provider list, keeping family
failover and global fallback intact.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 10:20:32 +02:00
Mikael Hugo
0f0dcbf8c7 benchmarks: add Gemini 2.5/3/3.1 Pro + Flash entries
Gemini had zero benchmark entries in model-benchmarks.json despite
being served by google-gemini-cli (OAuth provider, SF native), google
(API key), google-vertex, google-antigravity, openrouter, etc. Every
gemini-* model in the pi-ai catalog scored 0 in the benchmark selector
— effectively excluded from auto-selection even when allow-listed.

Published numbers from DeepMind model cards + Vellum LLM leaderboard +
Vals AI:

  gemini-3-pro-preview:    SWE-Verified 76.2, HLE 37.5, AIME25 95,
                            GPQA-D 91.9, MMLU-Pro 81.0
  gemini-3.1-pro-preview:  SWE-Verified 78, HLE 41, AIME 97,
                            GPQA-D 93, MMLU-Pro 83 (Feb 2026)
  gemini-3-flash-preview:  estimated from Pro-vs-Flash delta
  gemini-2.5-pro:          SWE-Verified 63.8, HLE 18.8, GPQA-D 84.0,
                            MMLU-Pro 86
  gemini-2.5-flash:        estimated from Pro-vs-Flash delta

Context windows reflect Gemini's 1M-2M token capability.

LiveCodeBench Pro Elo (2439 for Gemini 3 Pro) isn't in the 0-100
percent schema — skipped rather than forced. Future: add arena_elo-
style LCB Elo dimension to the schema if we start routing on it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 10:11:45 +02:00
Mikael Hugo
e413cf4a3f preferences: add provider_preference for benchmark tie-breaking
When two models score identically in the benchmark selector — typically
the same underlying weights served by different endpoints — the
previous alphabetical tiebreaker picked wrong. dr-repo example:

  zai/glm-5.1       score 84.7
  opencode-go/glm-5.1 score 84.7

Both are the exact same GLM-5.1 weights. Alphabetical comparison made
opencode-go win ("o" < "z") even though zai is the NATIVE provider.

Fix: new `provider_preference` pref, an ordered list of providers.
Listed providers rank in order, unlisted fall after alphabetically.
Applied as the tie-breaker between score and alphabetical.

Global default shipped in ~/.sf/preferences.md:
  kimi-coding, minimax, zai, mistral, ollama-cloud, opencode-go,
  opencode

Native providers ranked before re-servers. Users can override per
project.

Verified: after the change, dr-repo picks zai/glm-5.1 as primary for
execute-task and gate-evaluate (was opencode-go/glm-5.1), and
kimi-coding/k2p5 stays primary for completion phases with its direct
provider winning over opencode re-servers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 10:09:42 +02:00
Mikael Hugo
345f9586dd benchmark-selector: coverage-confidence multiplier + 12 regression tests
The original "normalise by populated weight" was too aggressive: a model
with 1 strong dimension (delta-fast: human_eval=92) outranked a model
with 4 strong dimensions (beta-coder: swe_bench=85, lcb=90, he=95,
ifeval=90) because both normalised to their own small average.

Fix: multiply normalised score by a confidence factor tied to how much
of the unit's profile the model actually populated. Confidence =
populated_weight / total_profile_weight, blended 50/50 with a flat floor
so sparse-but-strong specialists still rank when no generalist covers
the profile:

  score = (weighted_sum / weight_total) * (0.5 + 0.5 * confidence)

Net effect on dr-repo's auto-resolve:

  Before:                          After:
  plan-milestone   glm-5.1         plan-milestone   MiniMax-M2.5
  research-slice   codestral       research-slice   mistral-large-2411
  execute-task     mistral-large   execute-task     opencode-go/glm-5.1
  validate-m       magistral       validate-m       MiniMax-M2.5
  subagent         mistral-large   subagent         kimi-coding/k2p5

MiniMax's broad coverage (8 populated dimensions from the M2 README)
now correctly outranks GLM-5.1's higher but narrower scores for
reasoning-heavy units. Matches user intuition that "MiniMax is really
powerful".

Also fixes findBenchmarkKey to try "<modelId>-latest" for date-suffixed
model variants — pi-ai catalogs "devstral-medium-2507" but benchmarks
only have "devstral-medium-latest"; matcher now bridges that.

12 regression tests cover:
  - empty candidate pool
  - each profile (reasoning/coding/lightweight) picks right champion
  - swe_bench ↔ swe_bench_verified equivalence
  - models with all-null benchmarks score 0 but stay in fallbacks
  - sparse-strong beats dense-weak (confirms confidence multiplier
    doesn't over-penalise specialists)
  - provider diversification in fallback chain
  - deterministic tie-breaking
  - unknown unit types use default coding profile
  - date-suffixed model IDs match family-latest keys

Audit: 41 of 85 allow-listed models in pi-ai catalog have benchmark
data. 44 score 0 (mostly opencode Zen re-served models, ministral
small variants, pixtral vision models, legacy open-mistral). Top
picks for every dr-repo unit type DO have benchmark data — the gap
is in the long tail of fallbacks, which never matter unless the
primary and closer fallbacks all fail.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 09:58:10 +02:00
Mikael Hugo
0b8a1c246f auto-benchmark model selection: pick best-scoring per unit type
New module src/resources/extensions/sf/benchmark-selector.ts implements
benchmark-driven model selection. When models.<unit> is not pinned,
preferences-models.ts falls through to pick the highest-scoring
candidate from allowed_providers × pi-ai's model catalog, ranked
against a per-unit-type weight profile.

Weight profiles per unit type:
  plan-milestone / plan-slice  → agent-planning (swe_bench .25, lcb
                                  .20, hle .15, gpqa .15, mmlu_pro .15,
                                  aime .10)
  research-*                    → mixed (mmlu_pro, hle, human_eval,
                                  browse_comp, simple_qa, gpqa)
  execute-task                  → coding (swe_bench .35, swe_bench_v
                                  .25, lcb .20, human_eval .15)
  execution_simple / complete-* → fast+correct (human_eval .40,
                                  instruction_following .35, ruler .25)
  gate-evaluate                 → review (swe_bench .30, hle .25,
                                  gpqa .25, ifeval .20)
  validate-milestone            → validation (hle .30, gpqa .25,
                                  mmlu_pro .25, swe_bench .20)

Key design decisions:
  - Missing dimensions are dropped (normalised by populated weight),
    so a model with 2 strong populated scores isn't crushed by a peer
    with 5 mediocre ones.
  - swe_bench ↔ swe_bench_verified are fungible — some vendors publish
    one, some the other; treat as equivalent.
  - Provider diversification in fallbacks so one provider going 429
    doesn't kill the whole chain.
  - Score ties broken by coverage, then lexical — deterministic.

Also updates MiniMax-M2/M2.5/M2.7 benchmarks with real numbers from
the M2 official README (DeepWiki sourced) and MiniMax-M2.5 card
(minimax.io): swe_bench_verified 69.4→80.2, LCB 83, HLE 31.8 (w/
tools — more representative for agent work than no-tools 12.5),
AIME25 78, GPQA-D 78, MMLU-Pro 82. Context windows bumped to
weights-level: M2 400K, M2.5/M2.7 1M (endpoints may cap lower).

Verified end-to-end: with dr-repo's allow-list
(kimi-coding/minimax/zai/opencode-go/mistral) and models.* absent,
resolveModelWithFallbacksForUnit() returns:
  plan-milestone     → opencode-go/glm-5.1 (+3 fallbacks)
  research-slice     → mistral/codestral-latest
  execute-task       → mistral/mistral-large-latest
  execution_simple   → kimi-coding/k2p5
  gate-evaluate      → opencode-go/glm-5.1
  validate-milestone → mistral/magistral-medium-latest
  subagent           → mistral/mistral-large-latest

Users can still pin individual units (existing models.* behaviour
unchanged) or rely fully on auto-selection by omitting them.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 09:43:26 +02:00
Mikael Hugo
6450b37025 core + search + benchmarks: auth-error recovery, multi-provider search, M2.7-highspeed entry
Four related improvements that landed in the working tree after the
auto-hardening merge but hadn't been committed:

1. auth_error as a distinct error type (auth-storage + retry-handler).
   Previously invalid/expired API keys would retry the same failing
   credential until the retry budget exhausted. Now:
     - classifyErrorType() recognizes 401s, "invalid api key",
       "authentication error", "unauthorized" etc as "auth_error"
     - RetryHandler triggers cross-provider fallback on auth_error just
       like it does for rate_limit and quota_exhausted — switch
       providers rather than burning retries on a broken key
   Outcome: a stale OPENCODE_API_KEY in sops now fails over to kimi or
   minimax immediately instead of stalling the unit.

2. Multi-provider search-key detection (native-search.ts).
   The "Web search: Set BRAVE_API_KEY" warning fired whenever a
   non-Anthropic model lacked BRAVE_API_KEY, even when the user had
   TAVILY_API_KEY or OLLAMA_API_KEY available. Now: the warning
   suppresses if any of BRAVE/TAVILY/OLLAMA keys is present, and the
   warning text lists all three options. Matches the preferences-
   validation allow-list for search_provider.

3. MiniMax-M2.7-highspeed benchmark entry (model-benchmarks.json).
   Routes the fast-tier variant of M2.7 through the Bayesian blender
   with inherited RULER scores. Lets dynamic routing consider the
   highspeed model when speed matters more than peak quality.

No regressions: the 41 pre-existing test failures in pi-coding-agent
(FallbackResolver chain-membership + LSP integration) are unchanged
relative to the prior commit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 09:24:54 +02:00
Mikael Hugo
a4428ba1ff key-manager: surface opencode-go in LLM provider list for onboarding
opencode-go is already a first-class provider in pi-ai (models.generated.js
registers 7 models under the opencode-go namespace: glm-5, glm-5.1,
kimi-k2.5, mimo-v2-{omni,pro}, minimax-m2.{5,7}) and runs against
https://opencode.ai/zen/go/v1 with OPENCODE_API_KEY auth.

It was missing from key-manager's LLM provider registry, so the /sf
config wizard and onboarding flows didn't prompt users to supply
OPENCODE_API_KEY. Adding it here gives users a discoverable path to
subscribe and surface the 7 opencode-go models in list-models.

Research confirmed (DeepWiki sst/opencode + curl probes):
  - /zen/go/v1/chat/completions is the OpenAI-compatible endpoint
  - OPENCODE_API_KEY is the correct env var
  - No /models listing endpoint — hardcoding is correct (already done
    by the generate-models.ts pipeline)
  - Sister /zen/go/v1/messages serves Anthropic-compat minimax variants

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 09:22:48 +02:00
Mikael Hugo
58543fdae4 preferences: add allowed_providers hard allowlist + plug 6 merge gaps
New feature: allowed_providers — hard allowlist of providers that
auto-mode can dispatch to. When set, models from any other provider
are invisible to selection BEFORE models.* resolution and dynamic
routing run. This prevents routing from silently picking providers
the user doesn't have keys for — the root cause of repeated
"400 The requested model is not supported" pauses observed in
dr-repo when routing picked gpt-5.2-codex despite no GPT being
configured.

Implementation is a single filter at the top of selectAndApplyModel:
  availableModels = rawAvailable.filter(m => allowed.includes(m.provider.toLowerCase()))
If the allowlist rejects everything, throw with a clear message
pointing at the pref (fail-closed — don't dispatch to whatever's
left).

While wiring this I found mergePreferences was silently dropping
six more validated fields — same latent-bug class as service_tier:

  - allowed_providers (new)       - flat_rate_providers
  - stale_commit_threshold_minutes - widget_mode
  - modelOverrides                - safety_harness

All added to the merge function. Now: if you set it in PREFERENCES,
consumers see it.

Verified end-to-end: loadEffectiveSFPreferences() reads
allowed_providers from dr-repo's .sf/PREFERENCES.md correctly, and
auto-mode model selection honors the filter.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 09:12:33 +02:00
Mikael Hugo
9f0723a7be preferences + mcp-client: resolve from main worktree and add global MCP config
Two related fixes surfaced from a real sf headless auto run in dr-repo.

1. Project preferences now resolve from the MAIN worktree, not the
   current linked worktree. SF's auto-mode creates a git worktree per
   milestone (`.sf/worktrees/M003/`). The old code called
   `projectPreferencesPath()` which used `process.cwd()` — the
   milestone worktree — so a pref change on main (service_tier,
   dynamic_routing, model config) never reached an in-flight milestone
   until the branch merged main. Observed concretely when disabling
   dynamic_routing had no effect until we merged main into the
   milestone branch.

   New `projectPrefsRoot()` detects a linked worktree by reading
   `.git` (a FILE in worktrees, pointing to
   `/main/.git/worktrees/NAME`), follows the `commondir` pointer back
   to the main `.git` dir, and walks up one level. Falls back to cwd
   silently for non-worktree setups.

2. MCP server config now also loads from global paths
   (`~/.sf/mcp.json`, `~/.sf/agent/mcp.json`) in addition to the
   existing project-level (`.mcp.json`, `.sf/mcp.json`). First-hit
   wins, so project configs can still shadow or augment a globally-
   registered server by name. This lets the user register unauth'd
   servers like the DeepWiki remote MCP once and have every SF
   project pick it up without per-project `.mcp.json`.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 08:53:27 +02:00
Mikael Hugo
879dc63a70 prompts: add DeepWiki as the preferred docs-lookup path
All 9 research/planning/discuss prompts updated to put DeepWiki
first in the docs-lookup order. Context7 becomes the fallback for
package-registry-only libraries.

Rationale: Context7 free tier is capped at 1000 req/month — a
research-heavy auto loop can burn through that in a single session.
DeepWiki has no cap and covers any GitHub-hosted library with
AI-indexed answers, so it's strictly better as the default for the
typical SF research path.

Prompts touched:
  system.md, discuss.md, discuss-headless.md, plan-milestone.md,
  queue.md, research-milestone.md, research-slice.md,
  guided-discuss-milestone.md, guided-discuss-slice.md,
  guided-research-slice.md

Each references the three DeepWiki tools — ask_question,
read_wiki_structure, read_wiki_contents — and explicitly mentions the
Context7 1000-req/month cap so models don't spend it wastefully.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 08:47:57 +02:00
Mikael Hugo
3bea082f20 auto-dispatch: silence expected registry fallback on non-auto commands
sf headless query and sf headless status call resolveDispatch() without
going through auto-mode startup, so the rule-registry singleton is
never initialized. The previous code caught getRegistry()'s init error
and logged a warning on every call — noise on the normal path:

  [sf:dispatch] WARN: registry dispatch failed, falling back to inline
  rules: RuleRegistry not initialized — call initRegistry() or
  setRegistry() first.

Now: hasRegistry() probe first. When unset, skip straight to the inline
rule loop without warning (it's the intended behavior outside auto).
When the registry IS set and evaluateDispatch() genuinely throws, log
the warning so real bugs still surface.

Adds hasRegistry() as a public helper for any other hot-path caller
that wants to branch on init without try/catch overhead.

Verified end-to-end: sf headless query and sf headless status in
dr-repo now run clean, no false warning. All 25 rule-registry tests
pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 08:33:29 +02:00
Mikael Hugo
56130c2e39 preferences: wire 6 more latent pref fields through validation
Same class of bug as the service_tier fix: preference fields declared
in SFPreferences type and consumed by feature code, but never copied
into the validated output, so they silently become undefined when set
in PREFERENCES.md.

Found by diffing validated.<field> vs the interface declarations:

- forensics_dedup (boolean) — /sf forensics issue de-dup opt-in
- stale_commit_threshold_minutes (number) — doctor safety-commit cadence
- widget_mode ("full"|"small"|"min"|"off") — dashboard widget sizing
- slice_parallel ({ enabled?, max_workers? }) — slice-level parallelism
- modelOverrides (Record) — per-model capability patches
- safety_harness ({ enabled?, evidence_collection?, ... }) — LLM safety

Validation is kind-appropriate: primitives get type + range checks,
nested objects get object-shape guards with pass-through for now.
Consumer sites already treat missing fields as optional, so landing
shallow validation first is safe.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 08:25:59 +02:00
Mikael Hugo
63e87e8e86 preferences: wire service_tier through validation
validatePreferences() is a strict allow-list — it copies only explicitly
handled fields from input to output. service_tier was in
KNOWN_PREFERENCE_KEYS (no unknown-key warning) but was never copied into
the validated output, so users setting service_tier: priority or flex in
PREFERENCES.md silently got undefined.

This was a latent bug from before today's work: the new "off" value hit
it first because I verified end-to-end, but priority/flex had the same
issue. /sf fast on writes "priority" via writeGlobalServiceTier —
correctly — and then the next read drops it on the floor.

Now: service_tier is validated against {priority, flex, off} and copied
through. Invalid values raise an error rather than being silently lost.

Verified: dr-repo's service_tier: "off" in .sf/PREFERENCES.md now loads
correctly via loadEffectiveSFPreferences().preferences.service_tier.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-19 08:05:26 +02:00