singularity/singularity-forge

Author	SHA1	Message	Date
Mikael Hugo	a8cf2cd941	feat(workflow): add product-audit (slim port) Milestone-end workflow that compares declared product intent (VISION.md, RUNBOOKS.md, etc.) against actual code/test/deploy/docs evidence and emits structured gaps with severity. Soft gates — adds follow-up slices but doesn't hard-block merge. Slim port (4 new files + 1 registration) — extracts only the audit feature itself, not bunker's parallel rewrite of dispatch/prompts/benchmark-selector that came with it in commit 2aa785475. Created: - prompts/product-audit.md — prompt verbatim, gsd_→sf_ and .gsd→.sf - tools/product-audit-tool.ts — slim file-write implementation, atomicWriteAsync to .sf/active/{mid}/PRODUCT-AUDIT.{json,md}; no DB deps - bootstrap/product-audit-tool.ts — pi-coding-agent tool registration, TypeBox schema for sf_product_audit - workflow-templates/product-audit.md — workflow template Modified: - bootstrap/register-extension.ts — 2 lines: import + add to nonCriticalRegistrations - workflow-templates/registry.json — registry entry - package.json — version 2.75.0 → 2.75.1 Verdict logic (no-gaps | gaps-found | contract-underspecified) is the load-bearing innovation: contract-underspecified forces the auditor to flag unverifiable docs as a real gap rather than rubber-stamping no-gaps when the product contract is silent. Out of scope: phase enum changes, dispatch hookup. Wire-up to the phase machine is a follow-up; the prompt + tool + template stand alone. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 13:55:23 +02:00
Mikael Hugo	2eebeccb93	feat(search): add MiniMax web search provider New search backend alongside tavily/brave/serper/exa/ollama. API key resolution: MINIMAX_CODE_PLAN_KEY → MINIMAX_CODING_API_KEY → MINIMAX_API_KEY (fallback order matches MiniMax's documented aliases). Wired through every existing seam: - type union: SearchProvider = 'tavily' | 'minimax' | 'brave' | 'ollama' - VALID_PREFERENCES set + selection logic in provider.ts - native-search routing (Anthropic native web_search delegates correctly) - /search-provider CLI command (tab completion, select UI, parser) - tool-search.ts: search execution path - tool-llm-context.ts: prefetch / context-builder path - preferences-types + preferences-validation - configuration.md user docs - extension-manifest description Tests not added in this commit — the bunker reference tests don't match our preferences/provider export shape (we have serper/exa/combosearch that bunker doesn't). Tests for getMiniMaxSearchApiKey priority order, resolveSearchProvider returning "minimax", /search-provider minimax CLI behavior, no-key error messages, and executeMiniMaxSearch request shape are TODO. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 13:55:04 +02:00
Mikael Hugo	dff0df5fdc	fix(headless): suppress notification spam, categorize messages, distinguish phase vs status Three small UX fixes for headless / autopilot logs: 1. Add `zz-notifications` to TUI_FOOTER_STATUS_KEYS — these are sticky notification dots from the interactive TUI footer; they have no meaning in headless and were spamming the log. 2. Categorize notification messages by prefix so headless output is scannable: [mcp] for MCP-client-ready, [search] for web search status, [parallel] for slice-parallel/subagent dispatch. Falls through to the existing important/non-important formatting for everything else. 3. Distinguish phase transitions from generic status updates: phase:/milestone:/slice:/task: prefixed keys get [phase]; everything else gets [status]. Previously both used [phase], which was misleading. Patterns based on bunker commits 14ec4d97f / c15afb45f (which were the research source) but written fresh against our existing TUI_FOOTER_STATUS_KEYS structure rather than cherry-picked. The assistant-text-preview commit (cf0274c63) is a separate, larger refactor in headless.ts and is deferred to v3.1. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 13:43:40 +02:00
Mikael Hugo	c41912ff55	fix(prompts): tell agents about Serena (repo-intelligence MCP) for code exploration We have .serena/ configured (cache, memories, project.local.yml) but no prompt mentioned Serena anywhere. Agents weren't using it for symbol lookup or cross-file architecture mapping; they fell straight to rg/find. Added a one-sentence Serena hint to the code-exploration step in: - research-slice.md - research-milestone.md - plan-slice.md - plan-milestone.md - guided-research-slice.md Phrased generically ("If a repo-intelligence MCP (e.g. Serena) is configured...") so it degrades cleanly when Serena isn't set up. Pattern based on bunker commit 4ba746888 but written fresh against our post-rename prompt structure rather than cherry-picked. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 13:41:33 +02:00
Mikael Hugo	b24f426f2b	batch: snapshot of in-flight v2 work This commit captures uncommitted modifications that accumulated in the working tree across multiple in-progress workstreams. It is a snapshot to clear the deck before sf v3 work begins; individual workstreams should land separately on top of this. Notable additions: - trace-collector.ts, traces.ts, src/tests/trace-export.test.ts — trace export plumbing - biome.json — Biome linter configuration - .gitignore — exclude native/npm/*/.node compiled binaries The bulk of the diff is across src/resources/extensions/sf/ (301 files) and src/resources/extensions/sf/tests/ (277 files), reflecting the ongoing sf extension work. Specific feature commits should follow this snapshot rather than being archaeology'd out of it. The 76MB native/npm/linux-x64-gnu/forge_engine.node compiled binary was left out of the commit — it's now gitignored and built locally. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 12:42:31 +02:00
Mikael Hugo	6eaf5926ad	sf snapshot: uncommitted changes after 248m inactivity	2026-04-28 21:10:17 +02:00
Mikael Hugo	d30d91bf2f	sf snapshot: uncommitted changes after 41m inactivity	2026-04-28 17:01:26 +02:00
Mikael Hugo	5d3c204006	fix(git-merge): no auto-flip from approved to declined; cached approval is sticky Codex-rescue output (a299c461 / bnr88iy59) — the 'Git merge approved once' followed seconds later by 'Git merge declined by user' bug we hit on M002 complete-milestone. Same gate, same agent run, opposite verdicts. Single source of truth for the merge-gate state in guardrails/index.ts. Approval is now sticky — re-asks return the cached approval until consumed or explicitly revoked, never auto-flip to decline. Timeout converts to pause+log instead of decline. Adds tests/safe-git-merge-gate.test.ts. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Co-Authored-By: OpenAI Codex <noreply@openai.com>	2026-04-28 16:20:08 +02:00
Mikael Hugo	d38e5ea092	fix(schema): auto-coerce string → [string] for sf_* list fields + provider_model_allow tests Two codex-rescue tasks landed together: 1. Auto-coerce JSON-schema validator: when a tool field declares {type:"array", items:{type:"string"}} and the model sends a single string, wrap it in [string] before validation instead of hard-rejecting. Fixes the recurring "keyDecisions: must be array" rejection on sf_complete_task that wasted retries. 2. Provider_model_allow filter (proper implementation with helpers): - resolveProviderModelAllowList / isProviderModelAllowed / filterModelsByProviderModelAllow helpers in preferences-models - Wired into model-registry and auto-model-selection - New tests/provider-model-allow.test.ts Tools coerced: sf_complete_task, sf_complete_milestone, sf_plan_milestone, sf_plan_slice, sf_replan_slice, sf_reassess_roadmap (key list fields). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Co-Authored-By: OpenAI Codex <noreply@openai.com>	2026-04-28 12:30:55 +02:00
Mikael Hugo	f98a1e360e	batch: codex-rescue session output (multiple in-flight tasks) Combined output of multiple parallel codex-rescue runs that produced working-tree edits but didn't commit. Tasks contributing: - prefs: per-provider model allow-list (provider_model_allow) — manual - TUI scroll + unresponsive (a7884d1a / bt3fpn4y2) - planningMeeting required (aa09e904 / br127l763) - Logs UX 4-pack (a5c65314 / btcplhu7f) - Gate auto-resolve + completion nudge (ae4c8b64 / bw1w1fjkp) - sf_task_complete atomic + retry (a7a079b4 / b20cy5owv) - Multi-model meeting + minimax M2.7 + draft promotion (a756faac / task-moifjknd-lwjc98) - Per-role slice prompts (a94c3e1a) - Per-role vision-meeting prompts (afd165a0 / task-moifple5-lcwtjl) - Schema sweep (ac994b1e / task-moifq7pu-83coqz) - Flow audit (ad26ecfd / bttj4vrqm) Typecheck passes. Tests not run as a full suite — spot-check after merge. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Co-Authored-By: OpenAI Codex <noreply@openai.com>	2026-04-28 11:52:42 +02:00
Mikael Hugo	66ff949c11	cherry-pick(security): harden project-controlled surfaces (PR #4755 partial) Cherry-pick of gsd-build/gsd-2 65ca5aa2e — applies the security hardening hunks that conflicted minimally: - mcp-server/env-writer: validate writes against a strict allowlist - web/api/files: enforce path containment via web/lib/secure-path - vscode-extension: read binaryPath/autoStart only from trusted global/default scopes (resolveTrustedSfStartupConfig), avoiding workspace-controlled override (renamed Gsd → Sf for sf naming) - New regression tests: mcp-client-security, vscode-startup-security, web-files-symlink Skipped hunks (drifted): mcp-server/server.ts, mcp-client/index.ts, mcp-server/README.md. Co-Authored-By: Jeremy <jeremy@fluxlabs.net> Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-28 05:37:07 +02:00
Mikael Hugo	bf727173e7	cherry-pick(file-lock): make file-lock actually lock and throw on contention Cherry-pick of gsd-build/gsd-2 a09e01640 — withFileLockSync now actually acquires a proper-lockfile (was previously a no-op when proper-lockfile wasn't required) and throws on ELOCKED contention by default. Adds onLocked: 'skip' option for best-effort callers that tolerate dropped entries (audit, journal). Modernizes import style (createRequire/join from imports rather than ad-hoc require). Path-renames preserved (gsd-pi → sf-run). Co-Authored-By: Jeremy <jeremy@fluxlabs.net> Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-28 05:28:36 +02:00
Mikael Hugo	22d4579690	cherry-pick(state): lock-wrapped appends for journal, audit, workflow-logger Cherry-pick of gsd-build/gsd-2 53babec29 — lock-wrapped append half. Wraps appends to .sf/journal/, .sf/audit/events.jsonl, and the workflow-logger error log in withFileLockSync (onLocked: skip), preserving best-effort semantics while preventing torn writes under contention. Companion to the atomic-write half landed in `3df56cb94`. Path-renames (gsdRoot→sfRoot, gsd-db→sf-db) preserved during conflict resolution. Co-Authored-By: Jeremy <jeremy@fluxlabs.net> Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-28 05:27:44 +02:00
Mikael Hugo	f1f4b840e1	cherry-pick(doctor): self-heal symlinked .sf staging to prevent silent data loss Cherry-pick of gsd-build/gsd-2 9340f1e9b (#4423) — doctor self-heal detection for symlinked staging directories that can cause silent data loss. Skips native-git-bridge.ts and git-service test (drifted). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-28 05:25:56 +02:00
Mikael Hugo	7fd4672e55	cherry-pick(auto): handle worktree context fallback + sanitize paused session paths Cherry-pick of gsd-build/gsd-2 a4f78731f — handles worktree context fallback and sanitizes paths in paused session resumption. Skips uok-plan-v2-wiring test hunk (drifted in sf). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-28 05:25:40 +02:00
Mikael Hugo	93402643f4	cherry-pick(sf-db): tolerate corrupt task arrays in milestone rows Cherry-pick of gsd-build/gsd-2 851507913 (#4056) — defensive parsing so a corrupt or non-array tasks blob in a milestone row doesn't crash sf-db reads. Test hunk skipped (sf-db.test.ts has drifted). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-28 05:25:21 +02:00
Mikael Hugo	3df56cb94f	cherry-pick(state): atomic-writes for guided-flow-queue and reports Cherry-pick of gsd-build/gsd-2 53babec29 (Jeremy <jeremy@fluxlabs.net>) — atomic-write half only. Eliminates torn-write risk on PROJECT.md queue sync and reports.json/HTML index regeneration by switching writeFileSync → atomicWriteSync (tmp+rename). The companion lock-wrapped-append changes (journal.ts, uok/audit.ts, workflow-logger.ts) are deferred — they need proper-lockfile + withFileLockSync helper introduced first. Co-Authored-By: Jeremy <jeremy@fluxlabs.net> Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-28 05:16:39 +02:00
Mikael Hugo	8e827147c9	feat(code-intelligence): add sift indexer backend alongside project-rag Generalize the code-intelligence hook to support multiple indexer backends, with sift (rupurt/sift) as a new option next to the existing project-rag MCP server. Backend is selected via CodebaseMapPreferences. - code-intelligence.ts: new abstraction + sift backend (detect, resolve, status, context-block contribution) - preferences-types.ts: codebaseIndexer field (project-rag | sift | none) - preferences-validation.ts: validate the new field - bootstrap/system-context.ts, commands-codebase.ts: dispatch on backend - tests/code-intelligence.test.ts: sift detection/resolution/status tests (19 pass, 0 fail) project-rag path unchanged and continues to work. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-28 05:05:26 +02:00
Mikael Hugo	0606983d97	feat(subagent): add background job manager and tests SubagentBackgroundJobManager tracks long-running subagent jobs with status, abort support, and TTL-based eviction of completed results. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-28 04:18:17 +02:00
Mikael Hugo	efd5e14e0a	feat: add FEATURES.md capability map and generator Human-oriented documentation of SF capabilities, with a script that keeps it in sync with workflow-tools.ts and extension manifests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-28 04:18:12 +02:00
Mikael Hugo	0d286b991b	sf snapshot: pre-dispatch, uncommitted changes after 2902m inactivity	2026-04-27 23:42:51 +02:00
Mikael Hugo	f0da5b6d21	fix: bind getProviderAuthMode to registry instance to avoid undefined 'this' Extracting a class method as a bare reference loses its 'this' context, causing 'Cannot read properties of undefined' when minimax (or any provider) triggers the flat-rate auth-mode lookup. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 19:22:39 +02:00
Mikael Hugo	7289933909	fix: populate memoriesSection in execute-task prompt and fix stale dist buildExecuteTaskPrompt was not passing memoriesSection to loadPrompt, causing headless auto to fail with a template variable error. Also updated plan-slice-prompt.test.ts to supply the four template variables (memoriesSection, runtimeContext, phaseAnchorSection, gatesToClose) that were missing from the test fixture. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 18:46:55 +02:00
Mikael Hugo	a30a7692e3	fix: dist-redirect.mjs incorrectly rewrites .js→.ts for node_modules paths containing /src/ The resolver guarded on context.parentURL.includes('/src/') to identify in-repo source files, but @google/gemini-cli-core installs to node_modules/@google/gemini-cli-core/dist/src/ which also contains '/src/'. Relative imports from that dist package (e.g. './config/config.js') were incorrectly rewritten to './config/config.ts', causing ERR_MODULE_NOT_FOUND on every test that transitively imports the google-gemini provider. Fix: add !context.parentURL.includes('/node_modules/') guard. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 18:04:23 +02:00
Mikael Hugo	2e32c96fa0	Port gsd2 functional parity: turn-epoch, abandon-detect, reapplyThinking, exec chain, memory chain, onboarding-state - auto/turn-epoch.ts: AsyncLocalStorage-backed stale-write dropping for timeout recovery - journal.ts: isStaleWrite() guard drops superseded turn writes - auto/run-unit.ts: wrap agent_end Promise.race in runWithTurnGeneration - auto/session.ts: ThinkingLevelSnapshot type + autoModeStartThinkingLevel/originalThinkingLevel fields - auto-model-selection.ts: reapplyThinkingLevel() called after every successful setModel() - auto/phases.ts: pass autoModeStartThinkingLevel to selectAndApplyModel + hook override restore - abandon-detect.ts: two-signal milestone abandon detection in rewrite-docs overrides - auto-post-unit.ts: use detectAbandonMilestone + parkMilestone in rewrite-docs handler - preferences-types.ts: ContextModeConfig + isContextModeEnabled - exec-sandbox.ts: sandboxed bash/node/python subprocess with .sf/exec/ persistence - exec-history.ts: read-side scan of .sf/exec/*.meta.json - compaction-snapshot.ts: ≤2 KB markdown digest written before context compaction - tools/exec-tool.ts: sf_exec MCP tool executor - tools/exec-search-tool.ts: sf_exec_search MCP tool executor - tools/resume-tool.ts: sf_resume MCP tool executor - bootstrap/exec-tools.ts: registers sf_exec/sf_exec_search/sf_resume - memory-relations.ts: knowledge-graph edges between memories (traverseGraph) - tools/memory-tools.ts: capture_thought/memory_query/sf_graph executors - bootstrap/memory-tools.ts: registers capture_thought/memory_query/sf_graph - bootstrap/register-extension.ts: wire exec-tools + memory-tools into registration - onboarding-state.ts: onboarding completion record at ~/.sf/agent/onboarding.json Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 10:58:39 +02:00
Mikael Hugo	5887ea3fd1	port gsd2: blocked-models gate, milestone-summary classifier, unsupported-model recovery blocked-models.ts (new): Persistent per-project blocklist at .sf/runtime/blocked-models.json. loadBlockedModels / isModelBlocked / blockModel (file-lock-safe write). milestone-summary-classifier.ts (new): classifyMilestoneSummaryContent → "success" | "failure" | "unknown". isTerminalMilestoneSummaryContent: failure summaries are NOT terminal — lets auto-mode re-enter a milestone after a failed recovery summary. state.ts: Phase 1 (completeMilestoneIds) and Phase 2 (registry) now check isTerminalMilestoneSummaryContent before treating a SUMMARY as complete. A failure SUMMARY no longer prematurely parks a milestone. error-classifier.ts: Add "unsupported-model" ErrorClass kind with regex detection (model + not-supported/unavailable/no-access + account/plan/tier). Checked before "permanent" so /account/i in PERMANENT_RE doesn't swallow it. auto-model-selection.ts: Wire isModelBlocked() gate in selectAndApplyModel candidate loop: skips provider-rejected models and continues to fallbacks. bootstrap/agent-end-recovery.ts: Handle cls.kind === "unsupported-model": blockModel(), try fallback chain skipping already-blocked models, pause if no usable fallback. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 10:13:27 +02:00
Mikael Hugo	6cb6de4fd2	perf: parallelize I/O, add runtime cache, extend nix devenv - unit-context-composer: resolve artifact keys in parallel (Promise.all) - unit-runtime: add in-memory cache to avoid repeated disk reads per dispatch - auto-timers: share 15s idle watchdog tick with context-pressure check - auto-prompts: 1s TTL budget cache to coalesce repeated loadEffectiveSFPreferences calls - native-git-bridge: extend nativeHasChanges TTL 10s→30s - auto-dashboard: remove pulsing dot animation (CPU churn, no UX value) - flake.nix: add nodePackages.typescript to dev shell Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 10:12:32 +02:00
Mikael Hugo	12aabd863e	port gsd2 #4769 : worktree telemetry, slice-cadence, canonical-root fix + /sf scan Ports commit 7fb35ca58 from gsd2 (PR #4769) covering four issues: #4761 — resolveCanonicalMilestoneRoot in worktree-manager.ts routes validate-milestone through the live worktree path instead of stale project-root state when a milestone worktree is active. #4762 — auditOrphanedMilestoneBranches in auto-start.ts now surfaces in-progress milestone branches with unmerged commits ahead of main (previously only complete milestones were audited). Gated on isClosedStatus so parked/other closed statuses are unaffected. #4764 — worktree-telemetry.ts: typed emit helpers (emitWorktreeCreated, emitWorktreeMerged, emitWorktreeOrphaned, emitAutoExit, emitWorktreeSync, emitCanonicalRootRedirect, emitSliceMerged, emitMilestoneResquash) plus summarizeWorktreeTelemetry aggregator and nearest-rank percentile(). Wired in: worktree-resolver.ts (create/merge events), auto-start.ts (orphan telemetry), auto.ts stopAuto (auto-exit with normalized reason), worktree-manager.ts (canonical-root-redirect). Surfaced in forensics.ts via detectWorktreeOrphans and Worktree Telemetry sections. #4765 — slice-cadence.ts: mergeSliceToMain squash-merges each slice's commits onto main as soon as the slice passes validation (opt-in via git.collapse_cadence: "slice"). resquashMilestoneOnMain collapses N per-slice commits into one milestone commit at completion. Wired in auto-post-unit.ts (slice merge after complete-slice with stopAuto on conflict/error) and worktree-resolver.ts (resquash at mergeAndExit). AutoSession.milestoneStartShas tracks the pre-first-slice SHA. GitPreferences and preferences-validation.ts extended with collapse_cadence and milestone_resquash fields. Also ports /sf scan command: commands-scan.ts with parseScanArgs, resolveScanDocuments, buildScanOutputPaths, and handleScan dispatching a focused codebase assessment prompt to .sf/codebase/. 
journal.ts: 9 new JournalEventType values for the telemetry events. All changes are additive; default behavior (cadence="milestone") unchanged. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 09:03:56 +02:00
Mikael Hugo	2911d3b93d	port gsd2: reassess-roadmap opt-in (ADR-003 §4) + prefer toolDefinition.label reassess-roadmap: flip default from true → false. Most reassess units conclude "roadmap is fine" burning a session for no change; the plan-slice prompt now carries a JIT preamble at zero cost. (#4778) tool-execution: always prefer toolDefinition.label when non-empty, even when label === name — allows tools to display their canonical name explicitly. (#4758) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 08:33:50 +02:00
Mikael Hugo	d4cdcb582d	port gsd2 #3338 : ecosystem plugin loader for .sf/extensions/ Adds support for project-local SF extension plugins dropped in .sf/extensions/. Trust-gated (requires pi trust), symlink-escape safe. - ecosystem/sf-extension-api.ts: SFExtensionAPI wrapper exposing getPhase() and getActiveUnit() to third-party handlers; updateSnapshot refreshes state before_agent_start so handlers see current phase/unit - ecosystem/loader.ts: discovers .sf/extensions/*.js, loads them via dynamic import, dispatches factory(api) for each - register-extension.ts: initializes ecosystemHandlers array, wires loader - register-hooks.ts: before_agent_start refreshes snapshot then dispatches ecosystem handlers before returning SF system prompt - types.ts: SFActiveUnit interface (milestoneId/sliceId/taskId + titles) - workflow-logger.ts: "ecosystem" added to LogComponent union Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 08:27:55 +02:00
Mikael Hugo	6c36d62f35	port gsd2 #4961 : stop using active-tool snapshot as model-policy gate Fixes a bug where per-unit tool narrowing poisoned the policy gate for subsequent units, causing "Model policy denied dispatch before prompt send" errors on complete-slice and discuss-milestone (100% Win repro). Four-part port from gsd2@817031b2a: - ModelPolicyDispatchBlockedError class with per-model deny reasons - TOOL_BASELINE WeakMap + clearToolBaseline/restoreToolBaseline lifecycle - auto-model-selection: use getRequiredWorkflowToolsForAutoUnit as requiredTools - auto/loop: catch ModelPolicyDispatchBlockedError as non-retryable (pause) - auto.ts: wire clearToolBaseline at startAuto (fresh only) and stopAuto Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 08:15:04 +02:00
Mikael Hugo	4fdd8700a3	port gsd2 upstream features: scope classifier, composer v2, GPT-5.5, test timeout - milestone-scope-classifier: add getMilestonePipelineVariant + milestoneRowToScopeInput wired into auto-dispatch trivial-skip for research/validation phases (#4781) - auto-prompts: rename GSD→SF identifiers, add isSummaryCleanForSkip, prefs param on checkNeedsReassessment, buildExtractionStepsBlock from commands-extract-learnings - unit-context-manifest + unit-context-composer: port v2 typed computed artifacts (#4924) - skill-manifest: per-unit-type skill filter resolver (#4788, #4792) - escalation: stub for ADR-011 mid-execution escalation (full port deferred) - auto-start: extract decideSurvivorAction for testability (#4832) - models: add gpt-5.5 + gpt-5.4-mini to cost table, router, and models.generated.ts - types: EscalationArtifact, context_window_override, skip_clean_reassess, mid_execution_escalation, sketch_scope on SliceRow - tool-execution: add visibleWidth import (was undefined) - package.json: add --test-timeout=30000 to prevent parallel tests from freezing machine Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 08:08:11 +02:00
Mikael Hugo	e2147c0694	sf snapshot: pre-dispatch, uncommitted changes after 43m inactivity	2026-04-25 06:34:49 +02:00
Mikael Hugo	7b6c9dd099	sf snapshot: pre-dispatch, uncommitted changes after 4703m inactivity	2026-04-25 05:51:29 +02:00
ace-pm	c744bdf6c1	fix: atomic writes, parse radix, lossy json, silent worker spawn 8 fixes from 3rd-pass scan: 1. web/components/sf/tempCodeRunnerFile.tsx: remove orphan VS Code 'Code Runner' artifact (850+ lines duplicated from shell-terminal.tsx). Unreferenced but compiled into tsc project. 2. sf/phase-anchor.ts: writePhaseAnchor used plain writeFileSync — a crash mid-write would corrupt the handoff checkpoint that readPhaseAnchor then silently returns null for, losing cross-phase context. Switched to atomicWriteSync (already used by sibling files). 3. sf/forensics.ts: same non-atomic writeFileSync on active-forensics.json marker. Race with a concurrent reader produces an empty object and the forensics session is lost. Switched to atomicWriteSync. 4. web/auto-dashboard-service.ts: paused-session.json existence was the intended signal but a corrupt body silently dropped the paused flag so the UI showed active. Now reports paused on file existence regardless of body integrity, and warns on corruption. 5. sf/visualizer-data.ts: doctor-history.jsonl parser did .map(JSON.parse) inside an outer catch. One corrupt line discarded 19 valid entries. Per-line try/catch preserves the valid rows. 6. sf/files.ts: three parseInt calls without radix (step, total_steps, totalSteps) — also missing || 0 fallback for NaN. 7. cli.ts: parseInt(process.versions.node) without radix. Split on '.' and use radix 10 explicitly. 8. sf/slice-parallel-orchestrator.ts: silent 'catch {}' around spawn() masked worker-spawn failures as 'no workers available'. Matches sibling parallel-orchestrator.ts pattern — now logs via logWarning.
Skipped from the scan (need a real lock mechanism, not safe as a one-line fix): - sf/auto-dispatch.ts:164 (UAT counter race) - sf/captures.ts:107 (CAPTURES.md append race) Deferred (low-value): - preferences-models.ts, key-manager.ts, auto-timers.ts silent catches - dead variable in visualizer-data.ts - google-gemini-cli.ts maxTokens clamp interaction tsc --noEmit green at root.	2026-04-21 02:13:10 +02:00
ace-pm	51b65fd490	fix: symlink extensions + silent catches masking real errors Real bugs from 2nd-pass scan: 1. extension-registry.ts: discoverAllManifests skipped symlinked extension dirs because Dirent.isDirectory() returns false for symlinks. Dev-workflow symlinks under ~/.sf/agent/extensions/ were invisible to list/enable/disable/info. Matches the regression documented in symlink-extension-discovery.test.ts — the test inlines the correct logic, but this callsite still had the buggy form. Now accepts isDirectory() || isSymbolicLink(). 2. headless.ts SIGINT handler: client.stop() failures were double-silenced (inner .catch(()=>{}), outer try{}catch{}). Interactive mode logs stop errors to stderr. Restored head/headless parity — still fire-and-forget (exit code is forced via process.exit) but failures are observable. 3. openai-codex-responses.ts SSE parser: malformed data frames were silently dropped so broken streams looked identical to clean ones. Now debug-logs the parse error with the chunk context so broken streams are distinguishable in logs. Stream continues on bad chunk (one bad frame shouldn't kill the whole generation). 4. web/cleanup-service.ts generated script: bare 'catch {}' around four native git calls (nativeBranchList, nativeDetectMainBranch, nativeBranchListMerged, nativeForEachRef). A failed main-branch detection silently left mainBranch undefined-shaped, then the next native call operated on garbage. Now emits console.warn so failures surface in the subprocess log. 5. web/undo-service.ts generated script: git revert failure was silenced; when --no-commit failed, user saw commitsReverted=0 with no reason. Now logs the revert error before attempting --abort (abort itself remains best-effort silent).
False positives from the same scan (investigated and dismissed): - auto-worktree.ts #2505: code uses ':(exclude).sf/milestones' pathspec + shelter-and-restore, which is a better fix than the 'drop --include-untracked' approach the test comment describes. Test comment is stale; source is correct. - Lifecycle handler unhandled rejections across 5 extensions: extensions/runner.ts already try/catches handler invocations and routes to emitError. Wrapping the individual handlers would be redundant.	2026-04-21 02:01:41 +02:00
ace-pm	0f94341b43	fix(loader): fall back to src/resources when SF-WORKFLOW.md missing from dist Build sometimes copies dist/resources/extensions/ without the top-level markdown files (observed: SF-WORKFLOW.md absent in dist/resources/ while extensions/ was present). existsSync(distRes) was true either way, so SF_WORKFLOW_PATH pointed at a non-existent path and /sf failed with ENOENT. Check for the specific file instead of the directory.	2026-04-21 01:39:18 +02:00
ace-pm	485e8f608e	chore: init sf	2026-04-21 01:38:02 +02:00
ace-pm	e6676692fc	fix(sf-tui): remove welcome overlay that hangs on enter The per-session branded welcome overlay was added by the SF rebrand (`9d739dfa5`) as a boxed 'Press any key to continue...' splash shown once per sf session. In practice: Enter doesn't dismiss it and typing renders as garbled characters behind the overlay, blocking every TUI launch. Branding was redundant with the header (installed at session_start) and the footer (git branch + model). Shortcuts are discoverable via help. Deleting the overlay eliminates the hang vector entirely. Legacy-extension migration warnings (migrations.ts 'Press any key...') are unaffected — those are vendored upstream Pi code on a different code path and only fire when deprecated extensions are present.	2026-04-21 00:44:28 +02:00
Mikael Hugo	38d3bd55da	sf: route Gemini family models to google-gemini-cli by default resolveModelId now prefers google-gemini-cli over google (direct API) for bare Gemini/Gemma IDs, matching the operational default after the CLI-core re-platform. google-vertex is still honoured when it's the current provider. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 20:33:43 +02:00
Mikael Hugo	822791fad3	sf: wire Fix 1 deferred-commit (stage-before-verify, commit-after-verify) postUnitPreVerification now calls stageOnly() for execute-task units when action=commit, setting stagedPendingCommit=true and capturing task context. postUnitPostVerification commits the staged index after the gate passes, using a conventional-commit message built from the task context. Failure is non-fatal (logWarning + UI warning). 11 structural tests cover the full deferral lifecycle. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 20:33:39 +02:00
Mikael Hugo	315c2c49ca	sf: fail-closed verification gate + deferred-commit infrastructure Fix 2: verification gate no longer passes when no commands are configured. Empty-commands result now returns passed=false, skipped=true. Updated verification-gate.test.ts; added skipped-result guard in auto-verification.ts that warns and continues (not a hard failure). Fix 3: split auto-verification.ts try/catch into two zones. Zone 1 (gate machinery: prefs load, task lookup, runVerificationGate, captureRuntimeErrors, runDependencyAudit) catches → pauseAuto + return "pause". Zone 2 (ancillary: evidence writes, UOK gate, notifications) catches → logWarning + return "continue". Added verification-fail-closed.test.ts with 11 structural tests. Fix 1 (infrastructure): added stageOnly() + commitStaged() to GitServiceImpl, added stagedPendingCommit flag to AutoSession (cleared in reset()), marked the runTurnGitAction call site in postUnitPreVerification with TODO(fix-1-deferral) for the final wiring. Fix 4: timeout handler in runFinalize now captures hadStagedPending and hadCommitted before nulling currentUnit. Clears stagedPendingCommit to prevent orphaned deferred commits. Emits a diagnostic warning for each case so operators know whether staged-but-uncommitted changes will be absorbed or whether a commit landed before verification was skipped. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 19:32:47 +02:00
Mikael Hugo	c940ebc16f	sf: unify milestone discuss dispatch + todo.md seed injection Replace separate dispatchHeadlessBootstrap with one flow: - dispatchNewMilestoneDiscuss({ auto }) — auto=true uses headless prompt + rootFiles seed, no pendingAutoStartMap; auto=false uses discuss prompt with preparation, sets pendingAutoStartMap - bootstrapNewMilestone() — project setup + ID reservation, called directly from bootstrapAutoSession instead of the old wrapper - injectTodoContext() — reads and deletes todo.md/TODO.md/SPEC.md at project root, injects content as spec into any preamble; called identically in auto and interactive flows Removes dispatchHeadlessBootstrap entirely. auto-start.ts now calls the primitives directly. All three showWorkflowEntry new-milestone sites use dispatchNewMilestoneDiscuss({ auto: false }). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 19:04:12 +02:00
Mikael Hugo	67d25f95f2	sf: add gemini cli preflight token counting	2026-04-19 13:25:07 +02:00
Mikael Hugo	59806f8cc5	rip out antigravity from SF + pi-coding-agent UI/config layer Antigravity (Google's IDE sandbox product, different from Gemini CLI) is removed from: src/onboarding.ts — drop from LLM_PROVIDER_IDS + OAuth-flow picker src/pi-migration.ts — drop from LLM_PROVIDER_IDS migration list src/web/onboarding-service.ts — drop from web-UI provider list src/tests/integration/web-onboarding-contract.test.ts — update contract src/resources/extensions/sf/doctor-providers.ts — drop from CLI_AUTH_PROVIDERS src/resources/extensions/sf/key-manager.ts — drop UI listing src/resources/extensions/sf-usage-bar/index.ts — delete entire quota fetcher block (~200 lines) packages/pi-coding-agent/src/cli/args.ts — drop PI_AI_ANTIGRAVITY_VERSION doc packages/pi-coding-agent/src/utils/proxy-server.ts — drop from claude provider chain Reason: antigravity has no vendor-published core library we can embed (unlike @google/gemini-cli-core for the Gemini CLI). Continuing to hand-roll OAuth against daily-cloudcode-pa.sandbox.googleapis.com is exactly the pattern Google has started banning for third-party tools. Removing the code removes the ban risk. pi-ai provider code, OAuth util, and models.generated entries for google-antigravity are removed in follow-up commits (separated for reviewability — each layer verified independently). Build passes. Note: this is a breaking change for any user who had google-antigravity configured — they'll need to migrate to google-gemini-cli (OAuth), google (API key), or google-vertex. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 10:39:36 +02:00
Mikael Hugo	0f0dcbf8c7	benchmarks: add Gemini 2.5/3/3.1 Pro + Flash entries Gemini had zero benchmark entries in model-benchmarks.json despite being served by google-gemini-cli (OAuth provider, SF native), google (API key), google-vertex, google-antigravity, openrouter, etc. Every gemini-* model in the pi-ai catalog scored 0 in the benchmark selector — effectively excluded from auto-selection even when allow-listed. Published numbers from DeepMind model cards + Vellum LLM leaderboard + Vals AI: gemini-3-pro-preview: SWE-Verified 76.2, HLE 37.5, AIME25 95, GPQA-D 91.9, MMLU-Pro 81.0 gemini-3.1-pro-preview: SWE-Verified 78, HLE 41, AIME 97, GPQA-D 93, MMLU-Pro 83 (Feb 2026) gemini-3-flash-preview: estimated from Pro-vs-Flash delta gemini-2.5-pro: SWE-Verified 63.8, HLE 18.8, GPQA-D 84.0, MMLU-Pro 86 gemini-2.5-flash: estimated from Pro-vs-Flash delta Context windows reflect Gemini's 1M-2M token capability. LiveCodeBench Pro Elo (2439 for Gemini 3 Pro) isn't in the 0-100 percent schema — skipped rather than forced. Future: add arena_elo-style LCB Elo dimension to the schema if we start routing on it. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 10:11:45 +02:00
Mikael Hugo	e413cf4a3f	preferences: add provider_preference for benchmark tie-breaking When two models score identically in the benchmark selector — typically the same underlying weights served by different endpoints — the previous alphabetical tiebreaker picked wrong. dr-repo example: zai/glm-5.1 score 84.7 opencode-go/glm-5.1 score 84.7 Both are the exact same GLM-5.1 weights. Alphabetical comparison made opencode-go win ("o" < "z") even though zai is the NATIVE provider. Fix: new `provider_preference` pref, an ordered list of providers. Listed providers rank in order, unlisted fall after alphabetically. Applied as the tie-breaker between score and alphabetical. Global default shipped in ~/.sf/preferences.md: kimi-coding, minimax, zai, mistral, ollama-cloud, opencode-go, opencode Native providers ranked before re-servers. Users can override per project. Verified: after the change, dr-repo picks zai/glm-5.1 as primary for execute-task and gate-evaluate (was opencode-go/glm-5.1), and kimi-coding/k2p5 stays primary for completion phases with its direct provider winning over opencode re-servers. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 10:09:42 +02:00
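The tie-breaking order described in e413cf4a3f (score, then `provider_preference`, then alphabetical) can be sketched roughly as below. This is a minimal illustration, not the code from the commit: `Candidate`, `providerRank`, `compareCandidates`, and `rankCandidates` are hypothetical names, and the earlier coverage-based tie-break is left out for brevity.

```typescript
// Hypothetical candidate shape; the real selector's types may differ.
interface Candidate {
  provider: string;
  modelId: string;
  score: number;
}

// Listed providers rank by their position in the preference list;
// unlisted providers all share the rank just past the end of the list.
function providerRank(provider: string, preference: readonly string[]): number {
  const index = preference.indexOf(provider);
  return index === -1 ? preference.length : index;
}

// Order: higher score first, then provider preference, then alphabetical.
function compareCandidates(
  a: Candidate,
  b: Candidate,
  preference: readonly string[],
): number {
  if (a.score !== b.score) return b.score - a.score;
  const rankDiff =
    providerRank(a.provider, preference) - providerRank(b.provider, preference);
  if (rankDiff !== 0) return rankDiff;
  const keyA = `${a.provider}/${a.modelId}`;
  const keyB = `${b.provider}/${b.modelId}`;
  return keyA < keyB ? -1 : keyA > keyB ? 1 : 0;
}

function rankCandidates(
  candidates: readonly Candidate[],
  preference: readonly string[],
): Candidate[] {
  // Copy before sorting so the caller's array is left untouched.
  return [...candidates].sort((a, b) => compareCandidates(a, b, preference));
}

// The dr-repo example from the commit message: identical scores, two providers.
const tied: Candidate[] = [
  { provider: "opencode-go", modelId: "glm-5.1", score: 84.7 },
  { provider: "zai", modelId: "glm-5.1", score: 84.7 },
];
const shippedPreference = [
  "kimi-coding", "minimax", "zai", "mistral", "ollama-cloud", "opencode-go", "opencode",
];

console.log(rankCandidates(tied, shippedPreference).map((c) => `${c.provider}/${c.modelId}`));
console.log(rankCandidates(tied, []).map((c) => `${c.provider}/${c.modelId}`));
```

With the shipped default list the native `zai` entry sorts first. With an empty list the comparison falls through to the alphabetical step, which reproduces the old behaviour where `opencode-go` won.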
Mikael Hugo	345f9586dd	benchmark-selector: coverage-confidence multiplier + 12 regression tests The original "normalise by populated weight" was too aggressive: a model with 1 strong dimension (delta-fast: human_eval=92) outranked a model with 4 strong dimensions (beta-coder: swe_bench=85, lcb=90, he=95, ifeval=90) because both normalised to their own small average. Fix: multiply normalised score by a confidence factor tied to how much of the unit's profile the model actually populated. Confidence = populated_weight / total_profile_weight, blended 50/50 with a flat floor so sparse-but-strong specialists still rank when no generalist covers the profile: score = (weighted_sum / weight_total) * (0.5 + 0.5 * confidence) Net effect on dr-repo's auto-resolve (before → after): plan-milestone: glm-5.1 → MiniMax-M2.5; research-slice: codestral → mistral-large-2411; execute-task: mistral-large → opencode-go/glm-5.1; validate-m: magistral → MiniMax-M2.5; subagent: mistral-large → kimi-coding/k2p5. MiniMax's broad coverage (8 populated dimensions from the M2 README) now correctly outranks GLM-5.1's higher but narrower scores for reasoning-heavy units. Matches user intuition that "MiniMax is really powerful". Also fixes findBenchmarkKey to try "<modelId>-latest" for date-suffixed model variants — pi-ai catalogs "devstral-medium-2507" but benchmarks only have "devstral-medium-latest"; matcher now bridges that. 
12 regression tests cover: - empty candidate pool - each profile (reasoning/coding/lightweight) picks right champion - swe_bench ↔ swe_bench_verified equivalence - models with all-null benchmarks score 0 but stay in fallbacks - sparse-strong beats dense-weak (confirms confidence multiplier doesn't over-penalise specialists) - provider diversification in fallback chain - deterministic tie-breaking - unknown unit types use default coding profile - date-suffixed model IDs match family-latest keys Audit: 41 of 85 allow-listed models in pi-ai catalog have benchmark data. 44 score 0 (mostly opencode Zen re-served models, ministral small variants, pixtral vision models, legacy open-mistral). Top picks for every dr-repo unit type DO have benchmark data — the gap is in the long tail of fallbacks, which never matter unless the primary and closer fallbacks all fail. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 09:58:10 +02:00
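The coverage-confidence formula from 345f9586dd can be worked through with the delta-fast / beta-coder example from the commit message. The sketch below is illustrative only: `coverageWeightedScore`, the `Scores` and `Profile` types, and the equal-weight four-dimension profile are assumptions made for the example, and `weight_total` is read here as the populated weight (the "normalise by populated weight" step the message describes).

```typescript
// Hypothetical shapes: benchmark scores per dimension (null = not published)
// and a per-unit weight profile. The real module's types may differ.
type Scores = Record<string, number | null | undefined>;
type Profile = Record<string, number>;

function coverageWeightedScore(scores: Scores, profile: Profile): number {
  let weightedSum = 0;
  let populatedWeight = 0;
  let totalProfileWeight = 0;

  for (const [dimension, weight] of Object.entries(profile)) {
    totalProfileWeight += weight;
    const value = scores[dimension];
    if (typeof value === "number") {
      weightedSum += value * weight;
      populatedWeight += weight;
    }
  }

  // A model with no populated dimensions scores 0.
  if (populatedWeight === 0 || totalProfileWeight === 0) return 0;

  // Confidence is the share of the profile the model actually covers.
  const confidence = populatedWeight / totalProfileWeight;
  // Normalised score, blended 50/50 between a flat floor and the confidence.
  return (weightedSum / populatedWeight) * (0.5 + 0.5 * confidence);
}

// Assumed profile for illustration: four equally weighted dimensions.
const exampleProfile: Profile = {
  swe_bench: 0.25,
  lcb: 0.25,
  human_eval: 0.25,
  ifeval: 0.25,
};

const deltaFast: Scores = { human_eval: 92 };
const betaCoder: Scores = { swe_bench: 85, lcb: 90, human_eval: 95, ifeval: 90 };

console.log(coverageWeightedScore(deltaFast, exampleProfile)); // 57.5
console.log(coverageWeightedScore(betaCoder, exampleProfile)); // 90
```

Under this assumed profile, plain normalisation would give delta-fast 92 against beta-coder's 90. With the multiplier, delta-fast covers a quarter of the profile, so its factor is 0.5 + 0.5 * 0.25 = 0.625 and its score drops to 57.5, while beta-coder keeps its full 90.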
Mikael Hugo	0b8a1c246f	auto-benchmark model selection: pick best-scoring per unit type New module src/resources/extensions/sf/benchmark-selector.ts implements benchmark-driven model selection. When models.<unit> is not pinned, preferences-models.ts falls through to pick the highest-scoring candidate from allowed_providers × pi-ai's model catalog, ranked against a per-unit-type weight profile. Weight profiles per unit type: plan-milestone / plan-slice → agent-planning (swe_bench .25, lcb .20, hle .15, gpqa .15, mmlu_pro .15, aime .10) research-* → mixed (mmlu_pro, hle, human_eval, browse_comp, simple_qa, gpqa) execute-task → coding (swe_bench .35, swe_bench_v .25, lcb .20, human_eval .15) execution_simple / complete-* → fast+correct (human_eval .40, instruction_following .35, ruler .25) gate-evaluate → review (swe_bench .30, hle .25, gpqa .25, ifeval .20) validate-milestone → validation (hle .30, gpqa .25, mmlu_pro .25, swe_bench .20) Key design decisions: - Missing dimensions are dropped (normalised by populated weight), so a model with 2 strong populated scores isn't crushed by a peer with 5 mediocre ones. - swe_bench ↔ swe_bench_verified are fungible — some vendors publish one, some the other; treat as equivalent. - Provider diversification in fallbacks so one provider going 429 doesn't kill the whole chain. - Score ties broken by coverage, then lexical — deterministic. Also updates MiniMax-M2/M2.5/M2.7 benchmarks with real numbers from the M2 official README (DeepWiki sourced) and MiniMax-M2.5 card (minimax.io): swe_bench_verified 69.4→80.2, LCB 83, HLE 31.8 (w/ tools — more representative for agent work than no-tools 12.5), AIME25 78, GPQA-D 78, MMLU-Pro 82. Context windows bumped to weights-level: M2 400K, M2.5/M2.7 1M (endpoints may cap lower). 
Verified end-to-end: with dr-repo's allow-list (kimi-coding/minimax/zai/opencode-go/mistral) and models.* absent, resolveModelWithFallbacksForUnit() returns: plan-milestone → opencode-go/glm-5.1 (+3 fallbacks) research-slice → mistral/codestral-latest execute-task → mistral/mistral-large-latest execution_simple → kimi-coding/k2p5 gate-evaluate → opencode-go/glm-5.1 validate-milestone → mistral/magistral-medium-latest subagent → mistral/mistral-large-latest Users can still pin individual units (existing models.* behaviour unchanged) or rely fully on auto-selection by omitting them. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 09:43:26 +02:00
Mikael Hugo	6450b37025	core + search + benchmarks: auth-error recovery, multi-provider search, M2.7-highspeed entry Four related improvements that landed in the working tree after the auto-hardening merge but hadn't been committed: 1. auth_error as a distinct error type (auth-storage + retry-handler). Previously invalid/expired API keys would retry the same failing credential until the retry budget exhausted. Now: - classifyErrorType() recognizes 401s, "invalid api key", "authentication error", "unauthorized" etc as "auth_error" - RetryHandler triggers cross-provider fallback on auth_error just like it does for rate_limit and quota_exhausted — switch providers rather than burning retries on a broken key Outcome: a stale OPENCODE_API_KEY in sops now fails over to kimi or minimax immediately instead of stalling the unit. 2. Multi-provider search-key detection (native-search.ts). The "Web search: Set BRAVE_API_KEY" warning fired whenever a non-Anthropic model lacked BRAVE_API_KEY, even when the user had TAVILY_API_KEY or OLLAMA_API_KEY available. Now: the warning suppresses if any of BRAVE/TAVILY/OLLAMA keys is present, and the warning text lists all three options. Matches the preferences-validation allow-list for search_provider. 3. MiniMax-M2.7-highspeed benchmark entry (model-benchmarks.json). Routes the fast-tier variant of M2.7 through the Bayesian blender with inherited RULER scores. Lets dynamic routing consider the highspeed model when speed matters more than peak quality. No regressions: the 41 pre-existing test failures in pi-coding-agent (FallbackResolver chain-membership + LSP integration) are unchanged relative to the prior commit. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-19 09:24:54 +02:00


2509 commits