singularity/singularity-forge

Author	SHA1	Message	Date
Mikael Hugo	0606983d97	feat(subagent): add background job manager and tests SubagentBackgroundJobManager tracks long-running subagent jobs with status, abort support, and TTL-based eviction of completed results. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-28 04:18:17 +02:00
Mikael Hugo	efd5e14e0a	feat: add FEATURES.md capability map and generator Human-oriented documentation of SF capabilities, with a script that keeps it in sync with workflow-tools.ts and extension manifests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-28 04:18:12 +02:00
Mikael Hugo	25797129e2	sf snapshot: pre-dispatch, uncommitted changes after 38m inactivity	2026-04-28 00:21:39 +02:00
Mikael Hugo	0d286b991b	sf snapshot: pre-dispatch, uncommitted changes after 2902m inactivity	2026-04-27 23:42:51 +02:00
Mikael Hugo	260d50a823	docs: warn against Python for managed-resources hash; causes resync hang Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 23:20:15 +02:00
Mikael Hugo	f0da5b6d21	fix: bind getProviderAuthMode to registry instance to avoid undefined 'this' Extracting a class method as a bare reference loses its 'this' context, causing 'Cannot read properties of undefined' when minimax (or any provider) triggers the flat-rate auth-mode lookup. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 19:22:39 +02:00
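The failure mode described in f0da5b6d21 can be reproduced in a few lines. This is a minimal sketch, not the repository's code: `ProviderRegistry` and its `modes` map are hypothetical stand-ins, and only the method name `getProviderAuthMode` comes from the commit message.

```typescript
// Hypothetical registry; only the method name is taken from the commit message.
class ProviderRegistry {
  private modes = new Map<string, string>([["minimax", "flat-rate"]]);

  getProviderAuthMode(provider: string): string | undefined {
    return this.modes.get(provider);
  }
}

const registry = new ProviderRegistry();

// Bare reference: the method is detached, so `this` is undefined when called.
const unbound = registry.getProviderAuthMode;

// Bound reference: `this` stays attached to the registry instance.
const bound = registry.getProviderAuthMode.bind(registry);

let unboundFailed = false;
try {
  unbound("minimax");
} catch (_err) {
  // Class bodies run in strict mode, so reading `this.modes` throws a TypeError.
  unboundFailed = true;
}

const boundResult = bound("minimax");
```

An arrow-function wrapper such as `(p: string) => registry.getProviderAuthMode(p)` would keep the context as well; which form the actual fix uses beyond "bind" is not stated in the log.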
Mikael Hugo	7be540480e	docs: add CLAUDE.md with dev guide for build pipeline and test runner Documents the dist-vs-source distinction that caused the memoriesSection fix to not take effect, the c8 coverage runner process leak, and the template variable maintenance contract. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 18:56:03 +02:00
Mikael Hugo	7289933909	fix: populate memoriesSection in execute-task prompt and fix stale dist buildExecuteTaskPrompt was not passing memoriesSection to loadPrompt, causing headless auto to fail with a template variable error. Also updated plan-slice-prompt.test.ts to supply the four template variables (memoriesSection, runtimeContext, phaseAnchorSection, gatesToClose) that were missing from the test fixture. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 18:46:55 +02:00
Mikael Hugo	a30a7692e3	fix: dist-redirect.mjs incorrectly rewrites .js→.ts for node_modules paths containing /src/ The resolver guarded on context.parentURL.includes('/src/') to identify in-repo source files, but @google/gemini-cli-core installs to node_modules/@google/gemini-cli-core/dist/src/ which also contains '/src/'. Relative imports from that dist package (e.g. './config/config.js') were incorrectly rewritten to './config/config.ts', causing ERR_MODULE_NOT_FOUND on every test that transitively imports the google-gemini provider. Fix: add !context.parentURL.includes('/node_modules/') guard. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 18:04:23 +02:00
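The guard added in a30a7692e3 amounts to a two-part predicate on the importing module's URL. The sketch below is an assumption about its shape: `shouldRewriteJsToTs` is a hypothetical name, and the real resolver hook in dist-redirect.mjs is not shown in the log.

```typescript
// Hypothetical predicate mirroring the guard described in the commit message.
function shouldRewriteJsToTs(parentURL: string | undefined): boolean {
  if (!parentURL) return false;
  // In-repo sources live under /src/, but so do some installed packages,
  // so anything under /node_modules/ is excluded explicitly.
  return parentURL.includes("/src/") && !parentURL.includes("/node_modules/");
}

// Example URLs (illustrative paths only).
const inRepo = "file:///repo/src/providers/index.ts";
const vendored =
  "file:///repo/node_modules/@google/gemini-cli-core/dist/src/index.js";

const rewriteInRepo = shouldRewriteJsToTs(inRepo);
const rewriteVendored = shouldRewriteJsToTs(vendored);
```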
Mikael Hugo	2e32c96fa0	Port gsd2 functional parity: turn-epoch, abandon-detect, reapplyThinking, exec chain, memory chain, onboarding-state - auto/turn-epoch.ts: AsyncLocalStorage-backed stale-write dropping for timeout recovery - journal.ts: isStaleWrite() guard drops superseded turn writes - auto/run-unit.ts: wrap agent_end Promise.race in runWithTurnGeneration - auto/session.ts: ThinkingLevelSnapshot type + autoModeStartThinkingLevel/originalThinkingLevel fields - auto-model-selection.ts: reapplyThinkingLevel() called after every successful setModel() - auto/phases.ts: pass autoModeStartThinkingLevel to selectAndApplyModel + hook override restore - abandon-detect.ts: two-signal milestone abandon detection in rewrite-docs overrides - auto-post-unit.ts: use detectAbandonMilestone + parkMilestone in rewrite-docs handler - preferences-types.ts: ContextModeConfig + isContextModeEnabled - exec-sandbox.ts: sandboxed bash/node/python subprocess with .sf/exec/ persistence - exec-history.ts: read-side scan of .sf/exec/*.meta.json - compaction-snapshot.ts: ≤2 KB markdown digest written before context compaction - tools/exec-tool.ts: sf_exec MCP tool executor - tools/exec-search-tool.ts: sf_exec_search MCP tool executor - tools/resume-tool.ts: sf_resume MCP tool executor - bootstrap/exec-tools.ts: registers sf_exec/sf_exec_search/sf_resume - memory-relations.ts: knowledge-graph edges between memories (traverseGraph) - tools/memory-tools.ts: capture_thought/memory_query/sf_graph executors - bootstrap/memory-tools.ts: registers capture_thought/memory_query/sf_graph - bootstrap/register-extension.ts: wire exec-tools + memory-tools into registration - onboarding-state.ts: onboarding completion record at ~/.sf/agent/onboarding.json Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 10:58:39 +02:00
Mikael Hugo	5887ea3fd1	port gsd2: blocked-models gate, milestone-summary classifier, unsupported-model recovery blocked-models.ts (new): Persistent per-project blocklist at .sf/runtime/blocked-models.json. loadBlockedModels / isModelBlocked / blockModel (file-lock-safe write). milestone-summary-classifier.ts (new): classifyMilestoneSummaryContent → "success" | "failure" | "unknown". isTerminalMilestoneSummaryContent: failure summaries are NOT terminal — lets auto-mode re-enter a milestone after a failed recovery summary. state.ts: Phase 1 (completeMilestoneIds) and Phase 2 (registry) now check isTerminalMilestoneSummaryContent before treating a SUMMARY as complete. A failure SUMMARY no longer prematurely parks a milestone. error-classifier.ts: Add "unsupported-model" ErrorClass kind with regex detection (model + not-supported/unavailable/no-access + account/plan/tier). Checked before "permanent" so /account/i in PERMANENT_RE doesn't swallow it. auto-model-selection.ts: Wire isModelBlocked() gate in selectAndApplyModel candidate loop: skips provider-rejected models and continues to fallbacks. bootstrap/agent-end-recovery.ts: Handle cls.kind === "unsupported-model": blockModel(), try fallback chain skipping already-blocked models, pause if no usable fallback. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 10:13:27 +02:00
Mikael Hugo	6cb6de4fd2	perf: parallelize I/O, add runtime cache, extend nix devenv - unit-context-composer: resolve artifact keys in parallel (Promise.all) - unit-runtime: add in-memory cache to avoid repeated disk reads per dispatch - auto-timers: share 15s idle watchdog tick with context-pressure check - auto-prompts: 1s TTL budget cache to coalesce repeated loadEffectiveSFPreferences calls - native-git-bridge: extend nativeHasChanges TTL 10s→30s - auto-dashboard: remove pulsing dot animation (CPU churn, no UX value) - flake.nix: add nodePackages.typescript to dev shell Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 10:12:32 +02:00
Mikael Hugo	12aabd863e	port gsd2 #4769 : worktree telemetry, slice-cadence, canonical-root fix + /sf scan Ports commit 7fb35ca58 from gsd2 (PR #4769) covering four issues: #4761 — resolveCanonicalMilestoneRoot in worktree-manager.ts routes validate-milestone through the live worktree path instead of stale project-root state when a milestone worktree is active. #4762 — auditOrphanedMilestoneBranches in auto-start.ts now surfaces in-progress milestone branches with unmerged commits ahead of main (previously only complete milestones were audited). Gated on isClosedStatus so parked/other closed statuses are unaffected. #4764 — worktree-telemetry.ts: typed emit helpers (emitWorktreeCreated, emitWorktreeMerged, emitWorktreeOrphaned, emitAutoExit, emitWorktreeSync, emitCanonicalRootRedirect, emitSliceMerged, emitMilestoneResquash) plus summarizeWorktreeTelemetry aggregator and nearest-rank percentile(). Wired in: worktree-resolver.ts (create/merge events), auto-start.ts (orphan telemetry), auto.ts stopAuto (auto-exit with normalized reason), worktree-manager.ts (canonical-root-redirect). Surfaced in forensics.ts via detectWorktreeOrphans and Worktree Telemetry sections. #4765 — slice-cadence.ts: mergeSliceToMain squash-merges each slice's commits onto main as soon as the slice passes validation (opt-in via git.collapse_cadence: "slice"). resquashMilestoneOnMain collapses N per-slice commits into one milestone commit at completion. Wired in auto-post-unit.ts (slice merge after complete-slice with stopAuto on conflict/error) and worktree-resolver.ts (resquash at mergeAndExit). AutoSession.milestoneStartShas tracks the pre-first-slice SHA. GitPreferences and preferences-validation.ts extended with collapse_cadence and milestone_resquash fields. Also ports /sf scan command: commands-scan.ts with parseScanArgs, resolveScanDocuments, buildScanOutputPaths, and handleScan dispatching a focused codebase assessment prompt to .sf/codebase/. 
journal.ts: 9 new JournalEventType values for the telemetry events. All changes are additive; default behavior (cadence="milestone") unchanged. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 09:03:56 +02:00
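Commit 12aabd863e mentions a nearest-rank `percentile()` helper in worktree-telemetry.ts. The log does not show its signature, so the following is a sketch of the nearest-rank method under assumed parameters (`values`, `p` in the range 0 to 100), not the repository's implementation.

```typescript
// Nearest-rank percentile: the value at 1-based rank ceil(p/100 * N)
// in the sorted sample. Signature is an assumption for illustration.
function percentile(values: number[], p: number): number | undefined {
  if (values.length === 0) return undefined;
  const sorted = [...values].sort((a, b) => a - b);
  // Clamp the rank to [1, N] so p = 0 and p = 100 stay in bounds.
  const rank = Math.min(
    sorted.length,
    Math.max(1, Math.ceil((p / 100) * sorted.length)),
  );
  return sorted[rank - 1];
}

const sample = [50, 15, 40, 20, 35];
const p30 = percentile(sample, 30);
const p50 = percentile(sample, 50);
const p100 = percentile(sample, 100);
```

Nearest-rank always returns a value that actually occurs in the sample, which suits telemetry summaries where an interpolated duration would be misleading.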
Mikael Hugo	2911d3b93d	port gsd2: reassess-roadmap opt-in (ADR-003 §4) + prefer toolDefinition.label reassess-roadmap: flip default from true → false. Most reassess units conclude "roadmap is fine" burning a session for no change; the plan-slice prompt now carries a JIT preamble at zero cost. (#4778) tool-execution: always prefer toolDefinition.label when non-empty, even when label === name — allows tools to display their canonical name explicitly. (#4758) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 08:33:50 +02:00
Mikael Hugo	d4cdcb582d	port gsd2 #3338 : ecosystem plugin loader for .sf/extensions/ Adds support for project-local SF extension plugins dropped in .sf/extensions/. Trust-gated (requires pi trust), symlink-escape safe. - ecosystem/sf-extension-api.ts: SFExtensionAPI wrapper exposing getPhase() and getActiveUnit() to third-party handlers; updateSnapshot refreshes state before_agent_start so handlers see current phase/unit - ecosystem/loader.ts: discovers .sf/extensions/*.js, loads them via dynamic import, dispatches factory(api) for each - register-extension.ts: initializes ecosystemHandlers array, wires loader - register-hooks.ts: before_agent_start refreshes snapshot then dispatches ecosystem handlers before returning SF system prompt - types.ts: SFActiveUnit interface (milestoneId/sliceId/taskId + titles) - workflow-logger.ts: "ecosystem" added to LogComponent union Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 08:27:55 +02:00
Mikael Hugo	6c36d62f35	port gsd2 #4961 : stop using active-tool snapshot as model-policy gate Fixes a bug where per-unit tool narrowing poisoned the policy gate for subsequent units, causing "Model policy denied dispatch before prompt send" errors on complete-slice and discuss-milestone (100% Win repro). Four-part port from gsd2@817031b2a: - ModelPolicyDispatchBlockedError class with per-model deny reasons - TOOL_BASELINE WeakMap + clearToolBaseline/restoreToolBaseline lifecycle - auto-model-selection: use getRequiredWorkflowToolsForAutoUnit as requiredTools - auto/loop: catch ModelPolicyDispatchBlockedError as non-retryable (pause) - auto.ts: wire clearToolBaseline at startAuto (fresh only) and stopAuto Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 08:15:04 +02:00
Mikael Hugo	4fdd8700a3	port gsd2 upstream features: scope classifier, composer v2, GPT-5.5, test timeout - milestone-scope-classifier: add getMilestonePipelineVariant + milestoneRowToScopeInput wired into auto-dispatch trivial-skip for research/validation phases (#4781) - auto-prompts: rename GSD→SF identifiers, add isSummaryCleanForSkip, prefs param on checkNeedsReassessment, buildExtractionStepsBlock from commands-extract-learnings - unit-context-manifest + unit-context-composer: port v2 typed computed artifacts (#4924) - skill-manifest: per-unit-type skill filter resolver (#4788, #4792) - escalation: stub for ADR-011 mid-execution escalation (full port deferred) - auto-start: extract decideSurvivorAction for testability (#4832) - models: add gpt-5.5 + gpt-5.4-mini to cost table, router, and models.generated.ts - types: EscalationArtifact, context_window_override, skip_clean_reassess, mid_execution_escalation, sketch_scope on SliceRow - tool-execution: add visibleWidth import (was undefined) - package.json: add --test-timeout=30000 to prevent parallel tests from freezing machine Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 08:08:11 +02:00
Mikael Hugo	e2147c0694	sf snapshot: pre-dispatch, uncommitted changes after 43m inactivity	2026-04-25 06:34:49 +02:00
Mikael Hugo	7b6c9dd099	sf snapshot: pre-dispatch, uncommitted changes after 4703m inactivity	2026-04-25 05:51:29 +02:00
ace-pm	e625d20a59	fix: add self to flake outputs	2026-04-21 23:27:40 +02:00
ace-pm	c744bdf6c1	fix: atomic writes, parse radix, lossy json, silent worker spawn 8 fixes from 3rd-pass scan: 1. web/components/sf/tempCodeRunnerFile.tsx: remove orphan VS Code 'Code Runner' artifact (850+ lines duplicated from shell-terminal.tsx). Unreferenced but compiled into tsc project. 2. sf/phase-anchor.ts: writePhaseAnchor used plain writeFileSync — a crash mid-write would corrupt the handoff checkpoint that readPhaseAnchor then silently returns null for, losing cross-phase context. Switched to atomicWriteSync (already used by sibling files). 3. sf/forensics.ts: same non-atomic writeFileSync on active-forensics.json marker. Race with a concurrent reader produces an empty object and the forensics session is lost. Switched to atomicWriteSync. 4. web/auto-dashboard-service.ts: paused-session.json existence was the intended signal but a corrupt body silently dropped the paused flag so the UI showed active. Now reports paused on file existence regardless of body integrity, and warns on corruption. 5. sf/visualizer-data.ts: doctor-history.jsonl parser did .map(JSON.parse) inside an outer catch. One corrupt line discarded 19 valid entries. Per-line try/catch preserves the valid rows. 6. sf/files.ts: three parseInt calls without radix (step, total_steps, totalSteps) — also missing || 0 fallback for NaN. 7. cli.ts: parseInt(process.versions.node) without radix. Split on '.' and use radix 10 explicitly. 8. sf/slice-parallel-orchestrator.ts: silent 'catch {}' around spawn() masked worker-spawn failures as 'no workers available'. Matches sibling parallel-orchestrator.ts pattern — now logs via logWarning. 
Skipped from the scan (need a real lock mechanism, not safe as a one-line fix): - sf/auto-dispatch.ts:164 (UAT counter race) - sf/captures.ts:107 (CAPTURES.md append race) Deferred (low-value): - preferences-models.ts, key-manager.ts, auto-timers.ts silent catches - dead variable in visualizer-data.ts - google-gemini-cli.ts maxTokens clamp interaction tsc --noEmit green at root.	2026-04-21 02:13:10 +02:00
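Items 5 to 7 of c744bdf6c1 describe patterns that are easy to show in isolation. The helper names below (`parseJsonLines`, `parseStep`, `nodeMajor`) are hypothetical and chosen for illustration; the actual code in sf/visualizer-data.ts, sf/files.ts, and cli.ts is not shown in the log.

```typescript
// Item 5: parse JSONL line by line so one corrupt line does not discard the rest.
function parseJsonLines(text: string): unknown[] {
  const rows: unknown[] = [];
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;
    try {
      rows.push(JSON.parse(line));
    } catch (_err) {
      // Skip only the bad line; valid rows before and after are kept.
    }
  }
  return rows;
}

// Item 6: explicit radix plus a fallback, since parseInt returns NaN on garbage.
function parseStep(raw: string): number {
  return parseInt(raw, 10) || 0;
}

// Item 7: take the major component of a version string, parsed in base 10.
function nodeMajor(version: string): number {
  return parseInt(version.split(".")[0] ?? "", 10) || 0;
}

const rows = parseJsonLines('{"a":1}\nnot json\n{"b":2}\n');
```

The `|| 0` fallback also maps a legitimate `0` to `0`, so it is safe here; it would not be appropriate where `NaN` needs to be distinguished from zero.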
ace-pm	51b65fd490	fix: symlink extensions + silent catches masking real errors Real bugs from 2nd-pass scan: 1. extension-registry.ts: discoverAllManifests skipped symlinked extension dirs because Dirent.isDirectory() returns false for symlinks. Dev-workflow symlinks under ~/.sf/agent/extensions/ were invisible to list/enable/disable/info. Matches the regression documented in symlink-extension-discovery.test.ts — the test inlines the correct logic, but this callsite still had the buggy form. Now accepts isDirectory() || isSymbolicLink(). 2. headless.ts SIGINT handler: client.stop() failures were double-silenced (inner .catch(()=>{}), outer try{}catch{}). Interactive mode logs stop errors to stderr. Restored head/headless parity — still fire-and-forget (exit code is forced via process.exit) but failures are observable. 3. openai-codex-responses.ts SSE parser: malformed data frames were silently dropped so broken streams looked identical to clean ones. Now debug-logs the parse error with the chunk context so broken streams are distinguishable in logs. Stream continues on bad chunk (one bad frame shouldn't kill the whole generation). 4. web/cleanup-service.ts generated script: bare 'catch {}' around four native git calls (nativeBranchList, nativeDetectMainBranch, nativeBranchListMerged, nativeForEachRef). A failed main-branch detection silently left mainBranch undefined-shaped, then the next native call operated on garbage. Now emits console.warn so failures surface in the subprocess log. 5. web/undo-service.ts generated script: git revert failure was silenced; when --no-commit failed, user saw commitsReverted=0 with no reason. Now logs the revert error before attempting --abort (abort itself remains best-effort silent). 
False positives from the same scan (investigated and dismissed): - auto-worktree.ts #2505: code uses ':(exclude).sf/milestones' pathspec + shelter-and-restore, which is a better fix than the 'drop --include-untracked' approach the test comment describes. Test comment is stale; source is correct. - Lifecycle handler unhandled rejections across 5 extensions: extensions/runner.ts already try/catches handler invocations and routes to emitError. Wrapping the individual handlers would be redundant.	2026-04-21 02:01:41 +02:00
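Item 1 of 51b65fd490 hinges on how directory entries report symlinks. The sketch below uses a structural `DirentLike` type (an assumption, so the example runs without touching the filesystem) and a hypothetical predicate name; only the `isDirectory() || isSymbolicLink()` condition comes from the commit message.

```typescript
// Minimal shape of the two Dirent methods the check relies on.
interface DirentLike {
  isDirectory(): boolean;
  isSymbolicLink(): boolean;
}

// Hypothetical name; accepts real directories and symlinks alike.
function isExtensionDirCandidate(entry: DirentLike): boolean {
  return entry.isDirectory() || entry.isSymbolicLink();
}

// A symlink entry reports isDirectory() === false even when it points at a
// directory, which is why the directory-only check skipped it.
const symlinkEntry: DirentLike = {
  isDirectory: () => false,
  isSymbolicLink: () => true,
};
const plainFile: DirentLike = {
  isDirectory: () => false,
  isSymbolicLink: () => false,
};

const acceptsSymlink = isExtensionDirCandidate(symlinkEntry);
const acceptsFile = isExtensionDirCandidate(plainFile);
```

Accepting every symlink also admits links to regular files, so a caller would presumably still need to confirm that a manifest exists under the resolved path; the log does not say how the real code handles that.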
ace-pm	0f94341b43	fix(loader): fall back to src/resources when SF-WORKFLOW.md missing from dist Build sometimes copies dist/resources/extensions/ without the top-level markdown files (observed: SF-WORKFLOW.md absent in dist/resources/ while extensions/ was present). existsSync(distRes) was true either way, so SF_WORKFLOW_PATH pointed at a non-existent path and /sf failed with ENOENT. Check for the specific file instead of the directory.	2026-04-21 01:39:18 +02:00
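The fix in 0f94341b43 replaces a directory-existence check with a check for the specific file. A minimal sketch follows, with the existence test injected as a parameter so the example is self-contained; `resolveWorkflowPath` and its parameters are hypothetical, and only the file name SF-WORKFLOW.md and the dist/src fallback come from the commit message.

```typescript
// Hypothetical resolver: prefer dist, but only if the file itself is present.
function resolveWorkflowPath(
  distRes: string,
  srcRes: string,
  exists: (path: string) => boolean,
): string {
  const distFile = `${distRes}/SF-WORKFLOW.md`;
  // Checking distRes alone is not enough: the directory can exist
  // while the top-level markdown file is missing from the build output.
  return exists(distFile) ? distFile : `${srcRes}/SF-WORKFLOW.md`;
}

// Simulated filesystem: dist/resources exists, but the markdown file does not.
const present = new Set<string>([
  "dist/resources",
  "dist/resources/extensions",
  "src/resources/SF-WORKFLOW.md",
]);

const resolved = resolveWorkflowPath(
  "dist/resources",
  "src/resources",
  (p) => present.has(p),
);
```

In real code the `exists` argument would typically be `existsSync` from `node:fs`.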
ace-pm	bf96faf99b	fix: 7 cleanup findings + 2 reasoning:auto TS regressions Cleanup: - cli.ts: collapse duplicate !SF_FIRST_RUN_BANNER && !SF_FIRST_RUN_BANNER check (botched sed from SF rebrand) - delete gsd-orchestrator/ (byte-identical duplicate of sf-orchestrator/, dead post-rebrand) - package.json: rename 'sf-run' → 'singularity-forge' (missed by @sf-run/* → @singularity-forge/* rename) - delete repowise.db (164 KB orphan sqlite, no references) and gitignore - metrics.ts: drop always-zero retries + TODO; outcome-recorder defaults to 0 - rename gsd-web-launcher-contract.test.ts → sf-web-launcher-contract.test.ts - rename gsd-skill-ecosystem.md → sf-skill-ecosystem.md Regressions from commit `f1da908dc` ('pi-ai: add reasoning:auto across all providers'): - anthropic-vertex.ts: 'auto' was passed straight to adjustMaxTokensForThinking which requires ThinkingLevel, breaking compile. Mirror anthropic.ts: early-return adaptive thinking on 'auto'+supportsAdaptiveThinking, resolveReasoningLevel before adjustMaxTokens. - claude-code-cli/stream-adapter.ts: buildSdkOptions extraOptions.reasoning widened ThinkingLevel → RequestedThinkingLevel; 'auto' strips to undefined for the SDK effort mapping but still requests thinking:adaptive so the SDK picks effort itself. Remaining TS errors (not in this commit — dep hygiene): - google-gemini-cli.ts: OAuth2Client type mismatch between workspace-local and hoisted pnpm google-auth-library. Needs pnpm dedupe / single-install. - google-gemini-cli.ts:158: arity mismatch (3 args vs 2 expected). Signature changed somewhere; caller not updated.	2026-04-21 01:38:19 +02:00
ace-pm	485e8f608e	chore: init sf	2026-04-21 01:38:02 +02:00
ace-pm	e63184f91d	fix(migrations): drop press-any-key block to avoid stdin wedge showDeprecationWarnings ran setRawMode(true)/once('data')/setRawMode(false)/pause() right before pi-tui's own stdin setup. That handoff is fragile — buffered bytes and mode flips between the migration prompt and the TUI's raw-mode setup can leave stdin cooked and line-buffered, producing the 'Enter does nothing + garbled typing' symptom. Warnings now print non-blocking. They stay visible in scrollback above the TUI, so users still see them without a blocking acknowledge step.	2026-04-21 00:56:18 +02:00
ace-pm	e6676692fc	fix(sf-tui): remove welcome overlay that hangs on enter The per-session branded welcome overlay was added by the SF rebrand (`9d739dfa5`) as a boxed 'Press any key to continue...' splash shown once per sf session. In practice: Enter doesn't dismiss it and typing renders as garbled characters behind the overlay, blocking every TUI launch. Branding was redundant with the header (installed at session_start) and the footer (git branch + model). Shortcuts are discoverable via help. Deleting the overlay eliminates the hang vector entirely. Legacy-extension migration warnings (migrations.ts 'Press any key...') are unaffected — those are vendored upstream Pi code on a different code path and only fire when deprecated extensions are present.	2026-04-21 00:44:28 +02:00
ace-pm	6446381730	chore(nix): run deadnix + statix + alejandra Automated formatting pass: remove dead bindings, apply statix lint fixes, normalize formatting via alejandra.	2026-04-21 00:27:31 +02:00
ace-pm	d0925d8d31	chore(make): add 'sf' target for running from source	2026-04-21 00:18:55 +02:00
ace-pm	dff521a506	fix(git): drop orphan gitlink at mintlify-docs/docs Removes stray submodule pointer (mode 160000, commit 5c549fdf) with no corresponding .gitmodules entry and empty working tree. Produced 'fatal: No url found for submodule path' + exit 128 warning on every CI checkout (visible in Pipeline 'Update CI Builder Image' runs).	2026-04-21 00:17:45 +02:00
Mikael Hugo	f1da908dcd	pi-ai: add reasoning:auto across all providers + Kimi K2.6 RequestedThinkingLevel adds "auto" to the reasoning option. Each provider handles it natively: - Claude 4.x (anthropic/bedrock): adaptive thinking, no effort constraint - Gemini 2.5 Pro/Flash (google/vertex/gemini-cli): THINKING_LEVEL_UNSPECIFIED - GPT-5+ (openai-responses/azure): reasoning.effort omitted, model decides - Kimi (kimi-coding): {"type":"enabled"} without budget_tokens via new capabilities.thinkingNoBudget flag — model manages reasoning depth - GLM (zai, thinkingFormat:zai): enable_thinking:true already correct - MiniMax (anthropic API): explicit budget_tokens required, resolves to medium ModelCapabilities.thinkingNoBudget: new flag for Anthropic-compatible providers that accept {"type":"enabled"} without a budget (Kimi API). models.generated.ts: add Kimi K2.6 (id: kimi-for-coding, beta API); add thinkingNoBudget capability to all kimi-coding models. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 21:22:25 +02:00
Mikael Hugo	38d3bd55da	sf: route Gemini family models to google-gemini-cli by default resolveModelId now prefers google-gemini-cli over google (direct API) for bare Gemini/Gemma IDs, matching the operational default after the CLI-core re-platform. google-vertex is still honoured when it's the current provider. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 20:33:43 +02:00
Mikael Hugo	822791fad3	sf: wire Fix 1 deferred-commit (stage-before-verify, commit-after-verify) postUnitPreVerification now calls stageOnly() for execute-task units when action=commit, setting stagedPendingCommit=true and capturing task context. postUnitPostVerification commits the staged index after the gate passes, using a conventional-commit message built from the task context. Failure is non-fatal (logWarning + UI warning). 11 structural tests cover the full deferral lifecycle. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 20:33:39 +02:00
Mikael Hugo	315c2c49ca	sf: fail-closed verification gate + deferred-commit infrastructure Fix 2: verification gate no longer passes when no commands are configured. Empty-commands result now returns passed=false, skipped=true. Updated verification-gate.test.ts; added skipped-result guard in auto-verification.ts that warns and continues (not a hard failure). Fix 3: split auto-verification.ts try/catch into two zones. Zone 1 (gate machinery: prefs load, task lookup, runVerificationGate, captureRuntimeErrors, runDependencyAudit) catches → pauseAuto + return "pause". Zone 2 (ancillary: evidence writes, UOK gate, notifications) catches → logWarning + return "continue". Added verification-fail-closed.test.ts with 11 structural tests. Fix 1 (infrastructure): added stageOnly() + commitStaged() to GitServiceImpl, added stagedPendingCommit flag to AutoSession (cleared in reset()), marked the runTurnGitAction call site in postUnitPreVerification with TODO(fix-1-deferral) for the final wiring. Fix 4: timeout handler in runFinalize now captures hadStagedPending and hadCommitted before nulling currentUnit. Clears stagedPendingCommit to prevent orphaned deferred commits. Emits a diagnostic warning for each case so operators know whether staged-but-uncommitted changes will be absorbed or whether a commit landed before verification was skipped. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 19:32:47 +02:00
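Fix 2 in 315c2c49ca changes what an empty command list means for the verification gate. The sketch below illustrates the fail-closed result shape only: `GateResult` and `runGate` are hypothetical names, and the real `runVerificationGate` signature is not shown in the log.

```typescript
// Hypothetical result shape using the two flags named in the commit message.
interface GateResult {
  passed: boolean;
  skipped: boolean;
}

// `run` executes one command and reports success; injected for illustration.
function runGate(
  commands: string[],
  run: (command: string) => boolean,
): GateResult {
  // Fail closed: nothing configured means nothing was verified,
  // so the gate reports skipped rather than passed.
  if (commands.length === 0) return { passed: false, skipped: true };
  return { passed: commands.every(run), skipped: false };
}

const emptyResult = runGate([], () => true);
const okResult = runGate(["npm test"], () => true);
const failResult = runGate(["npm test", "npm run lint"], (c) => c === "npm test");
```

Keeping `skipped` separate from `passed` lets the caller warn and continue on a skipped gate, as the commit describes, while still treating a real failure differently.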
Mikael Hugo	c940ebc16f	sf: unify milestone discuss dispatch + todo.md seed injection Replace separate dispatchHeadlessBootstrap with one flow: - dispatchNewMilestoneDiscuss({ auto }) — auto=true uses headless prompt + rootFiles seed, no pendingAutoStartMap; auto=false uses discuss prompt with preparation, sets pendingAutoStartMap - bootstrapNewMilestone() — project setup + ID reservation, called directly from bootstrapAutoSession instead of the old wrapper - injectTodoContext() — reads and deletes todo.md/TODO.md/SPEC.md at project root, injects content as spec into any preamble; called identically in auto and interactive flows Removes dispatchHeadlessBootstrap entirely. auto-start.ts now calls the primitives directly. All three showWorkflowEntry new-milestone sites use dispatchNewMilestoneDiscuss({ auto: false }). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 19:04:12 +02:00
Mikael Hugo	67d25f95f2	sf: add gemini cli preflight token counting	2026-04-19 13:25:07 +02:00
Mikael Hugo	8abfc98fdc	pi-ai: source google-gemini-cli model list from cli-core's VALID_GEMINI_MODELS generate-models.ts now imports @google/gemini-cli-core's VALID_GEMINI_MODELS set and iterates it to produce SF's google-gemini-cli provider entries. Single source of truth: when Google ships a new Gemini model, it lands in cli-core first, then flows into SF on `npm update @google/gemini-cli-core` + `generate-models.ts` re-run — no more hand-editing the generate script. Before: 6 hardcoded entries (gemini-2.0/2.5/3 flash + pro preview, etc.) After: 7 entries sourced dynamically, filtered to drop `-customtools` variants which require a different tool protocol: gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite, gemini-3-pro-preview, gemini-3-flash-preview, gemini-3.1-pro-preview, gemini-3.1-flash-lite-preview Capability tagging uses cli-core's isProModel / isPreviewModel so reasoning=true for pro + 3.x preview variants (excluding flash-lite). Context-window / max-output-tokens kept in an SF-local override table since cli-core doesn't publish those per-model. Pre-existing 4 test failures (zai glm-5.1 x3, anthropic resolveBaseUrl #4140) unchanged. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 11:44:28 +02:00
Mikael Hugo	d83a59fb14	pi-ai/google-gemini-cli: re-platform transport on @google/gemini-cli-core Replaces the handwritten fetch() + SSE-parsing + custom retry loop in packages/pi-ai/src/providers/google-gemini-cli.ts with direct calls into `CodeAssistServer.generateContentStream()` from @google/gemini-cli-core. Requests to cloudcode-pa.googleapis.com are now byte-identical to what the real `gemini` CLI sends — same User-Agent, same Client-Metadata, same retry semantics — which preserves Google's subsidised free-OAuth quota treatment and eliminates third-party-bot ban risk. File size: 798 → 511 lines (~290 lines deleted net). What went away: - DEFAULT_ENDPOINT, GEMINI_CLI_HEADERS (cli-core sets these itself) - MAX_RETRIES, BASE_DELAY_MS, MAX_EMPTY_STREAM_RETRIES, EMPTY_STREAM_BASE_DELAY_MS - CLAUDE_THINKING_BETA_HEADER (was antigravity-only) - extractRetryDelay(), isRetryableError(), extractErrorMessage(), sleep() — cli-core handles 429/5xx retry with Retry-After honoured - needsClaudeThinkingBetaHeader() — antigravity-only stub - CloudCodeAssistRequest + CloudCodeAssistResponseChunk interfaces (replaced by @google/genai's GenerateContentParameters + GenerateContentResponse — already unwrapped by cli-core) - ~200-line SSE body-reader block (response.body.getReader() + decoder + 'data:' line parsing) — cli-core yields parsed objects directly - Empty-stream retry workaround — handled upstream now What stayed (pure SF adapter code): - convertMessages() → @google/genai Content[] - convertTools() → functionDeclarations - AssistantMessageEventStream — our event shape - Part-by-part processing: text vs thinking blocks, function-call translation to ToolCall, thoughtSignature retention, usage token extraction New helper: - buildCodeAssistServer(token, projectId) constructs an OAuth2Client (google-auth-library) seeded with the SF-cached access token and wraps it in a CodeAssistServer instance. 
Ready for future promotion to cli-core's getOauthClient() for full auto-refresh; today we still pass the token through from SF's auth storage (Strategy A from the plan doc). Live verified end-to-end against gemini-2.5-flash using the user's cached ~/.gemini/oauth_creds.json — got real streaming response, correct stopReason, usage tokens accounted. Models registry test updated from 23 → 22 providers (antigravity gone). Remaining 4 pi-ai test failures are pre-existing and unrelated (custom-zai glm-5.1, resolveAnthropicBaseUrl #4140). Type note: cli-core bundles its own nested copy of @google/genai, so TypeScript sees two structurally-identical Content types. Runtime is fine; a single `as any` cast at the generateContentStream call site handles the nominal split. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 11:29:56 +02:00
Mikael Hugo	a6320f6c29	package: pin gaxios override to ^6.7.1 (required by googleapis-common) Previous override (gaxios: 7.1.4) was set in `5c64f991b` to silence a glob@10 deprecation warning. That choice is incompatible with @google/gemini-cli-core's dependency graph: googleapis-common@7.2.0 does `require("gaxios/build/src/common")` — a deep internal path that gaxios 6.x exposed but 7.x tightened out of its exports field. Swapping to ^6.7.1 restores cli-core's runtime: a probe using the installed cli-core + the user's cached ~/.gemini/oauth_creds.json now successfully reaches https://cloudcode-pa.googleapis.com/v1internal:streamGenerateContent and gets a real response from gemini-2.5-flash. The glob deprecation the previous override fixed is cosmetic and doesn't block anything. Live cli-core functionality trumps npm warning noise. Unblocks task #3: replacing the handwritten fetch() transport in pi-ai/src/providers/google-gemini-cli.ts with CodeAssistServer calls. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 11:01:37 +02:00
Mikael Hugo	bae6553e67	pi-ai: remove google-antigravity provider entirely Continues the antigravity rip-out (previous commit covered SF + pi-coding-agent UI layer). This commit removes the code from pi-ai: - Delete packages/pi-ai/src/utils/oauth/google-antigravity.ts (313 lines) - Update oauth/index.ts: drop antigravityOAuthProvider, refreshAntigravityToken, loginAntigravity exports + registry entry. Add comment explaining why (no vendor core lib + Google ban risk). - google-gemini-cli.ts: strip ANTIGRAVITY_* constants, ANTIGRAVITY_ENDPOINT_FALLBACKS, getAntigravityHeaders(), ANTIGRAVITY_SYSTEM_INSTRUCTION, and all isAntigravity branching from streamGoogleGeminiCli + buildRequest. File header rewritten. needsClaudeThinkingBetaHeader() collapses to always-false (antigravity was the only path that needed it). - google-shared.ts: strip stale Antigravity comments (file still shared between google, google-gemini-cli, google-vertex). - types.ts: drop "google-antigravity" from Api / KnownProvider union. - models.generated.ts: remove google-antigravity provider block (~170 lines, 4 claude-* models that were only served via Antigravity). - models.generated.test.ts: drop from expected-providers snapshot. - scripts/generate-models.ts: remove antigravity model emission + context-window override so future regenerations don't re-add it. Reasoning (same as previous commit): Antigravity has no vendor-published core library we can embed. Hand-rolled OAuth against daily-cloudcode-pa.sandbox.googleapis.com was exactly the pattern Google is banning for third-party tools. Removing it eliminates the risk surface. Breaking change: users with google-antigravity configured in their models.* block will need to migrate to google-gemini-cli (OAuth via the real `gemini` CLI), google (API key), or google-vertex (GCP auth). Build passes. Next commit wires the google-gemini-cli provider to @google/gemini-cli-core per the plan. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 10:45:44 +02:00
Mikael Hugo	59806f8cc5	rip out antigravity from SF + pi-coding-agent UI/config layer Antigravity (Google's IDE sandbox product, different from Gemini CLI) is removed from: src/onboarding.ts — drop from LLM_PROVIDER_IDS + OAuth-flow picker src/pi-migration.ts — drop from LLM_PROVIDER_IDS migration list src/web/onboarding-service.ts — drop from web-UI provider list src/tests/integration/web-onboarding-contract.test.ts — update contract src/resources/extensions/sf/doctor-providers.ts — drop from CLI_AUTH_PROVIDERS src/resources/extensions/sf/key-manager.ts — drop UI listing src/resources/extensions/sf-usage-bar/index.ts — delete entire quota fetcher block (~200 lines) packages/pi-coding-agent/src/cli/args.ts — drop PI_AI_ANTIGRAVITY_VERSION doc packages/pi-coding-agent/src/utils/proxy-server.ts — drop from claude provider chain Reason: antigravity has no vendor-published core library we can embed (unlike @google/gemini-cli-core for the Gemini CLI). Continuing to hand-roll OAuth against daily-cloudcode-pa.sandbox.googleapis.com is exactly the pattern Google has started banning for third-party tools. Removing the code removes the ban risk. pi-ai provider code, OAuth util, and models.generated entries for google-antigravity are removed in follow-up commits (separated for reviewability — each layer verified independently). Build passes. Note: this is a breaking change for any user who had google-antigravity configured — they'll need to migrate to google-gemini-cli (OAuth), google (API key), or google-vertex. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 10:39:36 +02:00
Mikael Hugo	233432d486	model-registry: drop google-antigravity from claude family_failover (preparing rip-out)	2026-04-19 10:35:56 +02:00
Mikael Hugo	eed84a2624	pi-ai: add @google/gemini-cli-core@0.38.2 dependency + refactor plan Installs Google's official core library that powers the `gemini` CLI binary. This is the first step of re-platforming pi-ai's `google-gemini-cli` provider to use cli-core's transport instead of handwritten fetch() calls against cloudcode-pa.googleapis.com. Why: - cli-core requests are byte-for-byte identical to the official gemini CLI — preserves Google's subsidised free-OAuth quota and eliminates bot-detection drift risk from our reverse-engineered User-Agent / Client-Metadata headers. - Auto-inherit upstream improvements (new tool formats, grounding, session caching, quota displays) on `npm update`. - The `genai-proxy` extension (localhost proxy for gemini-cli-format clients) becomes "the CLI, but programmable" — same upstream behavior, hookable SF routing underneath. Auth model (unchanged for users): - User runs the real `gemini` CLI once to OAuth; credentials land in ~/.gemini/oauth_creds.json (or keychain on newer installs). - SF reads those credentials via cli-core's own storage helpers; no SF-side OAuth flow, no separate login. Scope for this commit: dependency only. The transport refactor (replacing the fetch() calls in google-gemini-cli.ts with CodeAssistServer.generateContentStream()) is queued as the next task and documented in google-gemini-cli-core-plan.md with a detailed API map, two integration strategies (transport-only vs full cli-core auth), and a step-by-step implementation checklist. Note: this commit adds 66 transitive deps to pi-ai (ajv, zod, glob, mime, open, etc.). google-antigravity provider stays on handwritten code — different sandbox endpoints, different auth contract, not in cli-core's scope. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 10:33:22 +02:00
Mikael Hugo	ffe86284d2	model-registry: split direct vs family_failover providers per model family Prior PROXY_FAMILY_PRIORITY table conflated "direct provider" with "failover provider that happens to serve this family". Observed case: claude-* family listed anthropic, google-antigravity, and github-copilot all as "providers", but only anthropic is the direct vendor. google-antigravity re-serves Claude via Google's sandbox IDE product (same endpoint as gemini-cli, different auth contract); github-copilot re-serves via GitHub's paid platform. This matters for the 429 fallback chain: a broken anthropic key should try genuinely-vendored endpoints first (none, for Claude), then fall into family_failover (antigravity, copilot), and only then reach the generic GLOBAL_PROVIDER_FALLBACK (opencode, opencode-go, openrouter, ollama-cloud). The old all-flat list hid this distinction. New shape: { providers: [...], family_failover?: [...] } Corrections applied: claude-*: providers=[anthropic], failover=[google-antigravity, github-copilot] gemini-*: providers=[google-gemini-cli, google, google-vertex], failover=[github-copilot] gpt-* / o* / codex-*: providers=[openai], failover=[azure-openai-responses, openai-codex, github-copilot] mimo-*: providers=[xiaomi] (new: was []; Xiaomi MiMo Open Platform is direct API at api.xiaomimimo.com / token-plan-sgp.xiaomimimo.com) buildCandidateOrder stitches [direct, family_failover, global_fallback] with deduplication. User overrides via settings.proxy.providerPriority continue to replace only the direct-provider list, keeping family failover and global fallback intact. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 10:20:32 +02:00
Mikael Hugo	0f0dcbf8c7	benchmarks: add Gemini 2.5/3/3.1 Pro + Flash entries Gemini had zero benchmark entries in model-benchmarks.json despite being served by google-gemini-cli (OAuth provider, SF native), google (API key), google-vertex, google-antigravity, openrouter, etc. Every gemini-* model in the pi-ai catalog scored 0 in the benchmark selector, effectively excluded from auto-selection even when allow-listed. Published numbers from DeepMind model cards + Vellum LLM leaderboard + Vals AI: gemini-3-pro-preview: SWE-Verified 76.2, HLE 37.5, AIME25 95, GPQA-D 91.9, MMLU-Pro 81.0 gemini-3.1-pro-preview: SWE-Verified 78, HLE 41, AIME 97, GPQA-D 93, MMLU-Pro 83 (Feb 2026) gemini-3-flash-preview: estimated from Pro-vs-Flash delta gemini-2.5-pro: SWE-Verified 63.8, HLE 18.8, GPQA-D 84.0, MMLU-Pro 86 gemini-2.5-flash: estimated from Pro-vs-Flash delta Context windows reflect Gemini's 1M-2M token capability. LiveCodeBench Pro Elo (2439 for Gemini 3 Pro) isn't in the 0-100 percent schema; skipped rather than forced. Future: add arena_elo-style LCB Elo dimension to the schema if we start routing on it. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 10:11:45 +02:00
Mikael Hugo	e413cf4a3f	preferences: add provider_preference for benchmark tie-breaking When two models score identically in the benchmark selector — typically the same underlying weights served by different endpoints — the previous alphabetical tiebreaker picked wrong. dr-repo example: zai/glm-5.1 score 84.7 opencode-go/glm-5.1 score 84.7 Both are the exact same GLM-5.1 weights. Alphabetical comparison made opencode-go win ("o" < "z") even though zai is the NATIVE provider. Fix: new `provider_preference` pref, an ordered list of providers. Listed providers rank in order, unlisted fall after alphabetically. Applied as the tie-breaker between score and alphabetical. Global default shipped in ~/.sf/preferences.md: kimi-coding, minimax, zai, mistral, ollama-cloud, opencode-go, opencode Native providers ranked before re-servers. Users can override per project. Verified: after the change, dr-repo picks zai/glm-5.1 as primary for execute-task and gate-evaluate (was opencode-go/glm-5.1), and kimi-coding/k2p5 stays primary for completion phases with its direct provider winning over opencode re-servers. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 10:09:42 +02:00
Mikael Hugo	345f9586dd	benchmark-selector: coverage-confidence multiplier + 12 regression tests The original "normalise by populated weight" was too aggressive: a model with 1 strong dimension (delta-fast: human_eval=92) outranked a model with 4 strong dimensions (beta-coder: swe_bench=85, lcb=90, he=95, ifeval=90) because both normalised to their own small average. Fix: multiply normalised score by a confidence factor tied to how much of the unit's profile the model actually populated. Confidence = populated_weight / total_profile_weight, blended 50/50 with a flat floor so sparse-but-strong specialists still rank when no generalist covers the profile: score = (weighted_sum / weight_total) * (0.5 + 0.5 * confidence) Net effect on dr-repo's auto-resolve (before → after): plan-milestone: glm-5.1 → MiniMax-M2.5; research-slice: codestral → mistral-large-2411; execute-task: mistral-large → opencode-go/glm-5.1; validate-m: magistral → MiniMax-M2.5; subagent: mistral-large → kimi-coding/k2p5. MiniMax's broad coverage (8 populated dimensions from the M2 README) now correctly outranks GLM-5.1's higher but narrower scores for reasoning-heavy units. Matches user intuition that "MiniMax is really powerful". Also fixes findBenchmarkKey to try "<modelId>-latest" for date-suffixed model variants; pi-ai catalogs "devstral-medium-2507" but benchmarks only have "devstral-medium-latest"; matcher now bridges that. 12 regression tests cover: - empty candidate pool - each profile (reasoning/coding/lightweight) picks right champion - swe_bench ↔ swe_bench_verified equivalence - models with all-null benchmarks score 0 but stay in fallbacks - sparse-strong beats dense-weak (confirms confidence multiplier doesn't over-penalise specialists) - provider diversification in fallback chain - deterministic tie-breaking - unknown unit types use default coding profile - date-suffixed model IDs match family-latest keys Audit: 41 of 85 allow-listed models in pi-ai catalog have benchmark data. 44 score 0 (mostly opencode Zen re-served models, ministral small variants, pixtral vision models, legacy open-mistral). Top picks for every dr-repo unit type DO have benchmark data; the gap is in the long tail of fallbacks, which never matter unless the primary and closer fallbacks all fail. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 09:58:10 +02:00
Mikael Hugo	0b8a1c246f	auto-benchmark model selection: pick best-scoring per unit type New module src/resources/extensions/sf/benchmark-selector.ts implements benchmark-driven model selection. When models.<unit> is not pinned, preferences-models.ts falls through to pick the highest-scoring candidate from allowed_providers × pi-ai's model catalog, ranked against a per-unit-type weight profile. Weight profiles per unit type: plan-milestone / plan-slice → agent-planning (swe_bench .25, lcb .20, hle .15, gpqa .15, mmlu_pro .15, aime .10) research-* → mixed (mmlu_pro, hle, human_eval, browse_comp, simple_qa, gpqa) execute-task → coding (swe_bench .35, swe_bench_v .25, lcb .20, human_eval .15) execution_simple / complete-* → fast+correct (human_eval .40, instruction_following .35, ruler .25) gate-evaluate → review (swe_bench .30, hle .25, gpqa .25, ifeval .20) validate-milestone → validation (hle .30, gpqa .25, mmlu_pro .25, swe_bench .20) Key design decisions: - Missing dimensions are dropped (normalised by populated weight), so a model with 2 strong populated scores isn't crushed by a peer with 5 mediocre ones. - swe_bench ↔ swe_bench_verified are fungible: some vendors publish one, some the other; treat as equivalent. - Provider diversification in fallbacks so one provider going 429 doesn't kill the whole chain. - Score ties broken by coverage, then lexical; deterministic. Also updates MiniMax-M2/M2.5/M2.7 benchmarks with real numbers from the M2 official README (DeepWiki sourced) and MiniMax-M2.5 card (minimax.io): swe_bench_verified 69.4→80.2, LCB 83, HLE 31.8 (w/ tools; more representative for agent work than no-tools 12.5), AIME25 78, GPQA-D 78, MMLU-Pro 82. Context windows bumped to weights-level: M2 400K, M2.5/M2.7 1M (endpoints may cap lower). Verified end-to-end: with dr-repo's allow-list (kimi-coding/minimax/zai/opencode-go/mistral) and models.* absent, resolveModelWithFallbacksForUnit() returns: plan-milestone → opencode-go/glm-5.1 (+3 fallbacks) research-slice → mistral/codestral-latest execute-task → mistral/mistral-large-latest execution_simple → kimi-coding/k2p5 gate-evaluate → opencode-go/glm-5.1 validate-milestone → mistral/magistral-medium-latest subagent → mistral/mistral-large-latest Users can still pin individual units (existing models.* behaviour unchanged) or rely fully on auto-selection by omitting them. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 09:43:26 +02:00
Mikael Hugo	6450b37025	core + search + benchmarks: auth-error recovery, multi-provider search, M2.7-highspeed entry Four related improvements that landed in the working tree after the auto-hardening merge but hadn't been committed: 1. auth_error as a distinct error type (auth-storage + retry-handler). Previously invalid/expired API keys would retry the same failing credential until the retry budget exhausted. Now: - classifyErrorType() recognizes 401s, "invalid api key", "authentication error", "unauthorized" etc as "auth_error" - RetryHandler triggers cross-provider fallback on auth_error just like it does for rate_limit and quota_exhausted: switch providers rather than burning retries on a broken key Outcome: a stale OPENCODE_API_KEY in sops now fails over to kimi or minimax immediately instead of stalling the unit. 2. Multi-provider search-key detection (native-search.ts). The "Web search: Set BRAVE_API_KEY" warning fired whenever a non-Anthropic model lacked BRAVE_API_KEY, even when the user had TAVILY_API_KEY or OLLAMA_API_KEY available. Now: the warning suppresses if any of BRAVE/TAVILY/OLLAMA keys is present, and the warning text lists all three options. Matches the preferences-validation allow-list for search_provider. 3. MiniMax-M2.7-highspeed benchmark entry (model-benchmarks.json). Routes the fast-tier variant of M2.7 through the Bayesian blender with inherited RULER scores. Lets dynamic routing consider the highspeed model when speed matters more than peak quality. No regressions: the 41 pre-existing test failures in pi-coding-agent (FallbackResolver chain-membership + LSP integration) are unchanged relative to the prior commit. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 09:24:54 +02:00
Mikael Hugo	a4428ba1ff	key-manager: surface opencode-go in LLM provider list for onboarding opencode-go is already a first-class provider in pi-ai (models.generated.js registers 7 models under the opencode-go namespace: glm-5, glm-5.1, kimi-k2.5, mimo-v2-{omni,pro}, minimax-m2.{5,7}) and runs against https://opencode.ai/zen/go/v1 with OPENCODE_API_KEY auth. It was missing from key-manager's LLM provider registry, so the /sf config wizard and onboarding flows didn't prompt users to supply OPENCODE_API_KEY. Adding it here gives users a discoverable path to subscribe and surface the 7 opencode-go models in list-models. Research confirmed (DeepWiki sst/opencode + curl probes): - /zen/go/v1/chat/completions is the OpenAI-compatible endpoint - OPENCODE_API_KEY is the correct env var - No /models listing endpoint — hardcoding is correct (already done by the generate-models.ts pipeline) - Sister /zen/go/v1/messages serves Anthropic-compat minimax variants Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 09:22:48 +02:00
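
The coverage-confidence formula described in commit 345f9586dd can be illustrated with a minimal sketch. It is not the code in benchmark-selector.ts: the type names, the function name `coverageConfidenceScore`, and the equal-weight profile are assumptions made for illustration. Only the formula and the delta-fast / beta-coder scores come from the commit message.

```typescript
// Hypothetical shapes; the real benchmark-selector.ts may model these differently.
type Scores = Record<string, number | null | undefined>; // dimension -> 0-100 score
type Profile = Record<string, number>;                   // dimension -> weight

// score = (weighted_sum / populated_weight) * (0.5 + 0.5 * confidence),
// where confidence = populated_weight / total_profile_weight.
function coverageConfidenceScore(scores: Scores, profile: Profile): number {
  let weightedSum = 0;
  let populatedWeight = 0;
  let totalWeight = 0;
  for (const [dimension, weight] of Object.entries(profile)) {
    totalWeight += weight;
    const value = scores[dimension];
    if (typeof value === "number") {
      weightedSum += value * weight;
      populatedWeight += weight;
    }
  }
  // A model with no populated dimensions scores 0.
  if (populatedWeight === 0 || totalWeight === 0) return 0;
  const confidence = populatedWeight / totalWeight;
  return (weightedSum / populatedWeight) * (0.5 + 0.5 * confidence);
}

// Assumed equal-weight profile, chosen only to keep the arithmetic easy to follow.
const profile: Profile = { swe_bench: 0.25, lcb: 0.25, human_eval: 0.25, ifeval: 0.25 };
const deltaFast: Scores = { human_eval: 92 };
const betaCoder: Scores = { swe_bench: 85, lcb: 90, human_eval: 95, ifeval: 90 };

console.log(coverageConfidenceScore(deltaFast, profile)); // 57.5
console.log(coverageConfidenceScore(betaCoder, profile)); // 90
```

Under this assumed profile, plain normalisation would rank delta-fast (92) above beta-coder (90). With the confidence factor, delta-fast covers a quarter of the profile weight and is scaled by 0.625, so beta-coder ranks first.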


3602 commits
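
The candidate ordering described in commit ffe86284d2 (direct providers, then family failover, then global fallback, deduplicated) might be sketched as follows. The names `buildCandidateOrder`, `GLOBAL_PROVIDER_FALLBACK`, and the `{ providers, family_failover? }` shape appear in the commit message. The function signature, the `FamilyRoute` type name, and the `userPriority` parameter are assumptions, not the repository's actual API.

```typescript
// Shape taken from the commit message; the interface name is hypothetical.
interface FamilyRoute {
  providers: string[];
  family_failover?: string[];
}

// Values as listed in the commit message.
const GLOBAL_PROVIDER_FALLBACK = ["opencode", "opencode-go", "openrouter", "ollama-cloud"];

// Assumed signature: a user override replaces only the direct-provider list,
// leaving family failover and global fallback intact.
function buildCandidateOrder(route: FamilyRoute, userPriority?: string[]): string[] {
  const direct = userPriority ?? route.providers;
  const stitched = [...direct, ...(route.family_failover ?? []), ...GLOBAL_PROVIDER_FALLBACK];
  // A Set keeps first-insertion order, so the earliest occurrence of each provider wins.
  return [...new Set(stitched)];
}

const gptRoute: FamilyRoute = {
  providers: ["openai"],
  family_failover: ["azure-openai-responses", "openai-codex", "github-copilot"],
};

console.log(buildCandidateOrder(gptRoute));
console.log(buildCandidateOrder(gptRoute, ["openrouter", "openai"]));
```

In the second call `openrouter` moves to the front and is not repeated when the global fallback list is appended, which is the deduplication the commit message refers to.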