Commit graph

4503 commits

Mikael Hugo
f92022730b fix(promoter): cluster by domain:family + repair upsertRequirement field-binding
Two related fixes that complete AC4 of sf-mp4rxkwt-sfthez (kind taxonomy,
commit 83c28b756):

1. Cluster by domain:family prefix instead of exact kind string.

   The promoter was clustering on the full `kind` value, which after the
   taxonomy enforcement meant entries like gap:routing:tiebreak-cost-only
   and gap:routing:agentic-axis-partial-coverage each stayed in a cluster
   of size 1. Empirical confirmation: the live ledger on 2026-05-14 had
   10 open entries with a max cluster size of 1 under exact-string
   matching, so the promoter could never fire on real, diverse data.

   New behavior: extract first two segments as the cluster key. Entries
   sharing domain:family group together; legacy single-segment kinds
   cluster as themselves. With this change, the live ledger's gap:routing
   family would include 3 entries today.
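
   The prefix-clustering rule described above can be sketched as follows
   (function names here are illustrative, not the actual promoter API):

```javascript
// Sketch of the clustering rule: take the first two colon-separated
// segments as the cluster key, so gap:routing:tiebreak-cost-only and
// gap:routing:agentic-axis-partial-coverage share the key "gap:routing",
// while a legacy single-segment kind clusters as itself.
function clusterKeyFor(kind) {
  const segments = String(kind).split(":");
  return segments.slice(0, 2).join(":");
}

// Group entries (array of { kind, ... }) by cluster key.
function clusterByKindFamily(entries) {
  const clusters = new Map();
  for (const entry of entries) {
    const key = clusterKeyFor(entry.kind);
    if (!clusters.has(key)) clusters.set(key, []);
    clusters.get(key).push(entry);
  }
  return clusters;
}
```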

2. Repair the silently-broken upsertRequirement call (LATENT BUG).

   The promoter was calling upsertRequirement with only {id, title,
   description, status, class, source} — but the schema binds every
   column positionally including {why, primary_owner, supporting_slices,
   validation, notes, full_content, superseded_by}. SQLite cannot bind
   `undefined`, so EVERY upsert attempt threw — caught silently by the
   surrounding try/catch ("non-fatal") with no log line. Result: the
   promoter has never successfully created a requirement row in this
   project's history, regardless of clustering threshold.

   Fix: pass all schema columns explicitly with null defaults for unused
   ones. Also encode the human-readable cluster title into description's
   first line since the requirements table has no title column (separate
   schema-evolution concern, out of scope here).
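
   A minimal sketch of the binding fix, assuming a named-parameter binder;
   the column list is abbreviated from the commit text and the helper name
   is hypothetical:

```javascript
// Every schema column gets an explicit value, null for unused ones, so the
// statement binder never sees `undefined` (which SQLite drivers reject by
// throwing). Column list is illustrative, not the full schema.
const REQUIREMENT_COLUMNS = [
  "id", "description", "status", "class", "source",
  "why", "primary_owner", "supporting_slices",
  "validation", "notes", "full_content", "superseded_by",
];

function bindRequirementParams(partial) {
  const params = {};
  for (const col of REQUIREMENT_COLUMNS) {
    params[col] = partial[col] === undefined ? null : partial[col];
  }
  return params;
}
```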

Tests: new tests/requirement-promoter.test.mjs (5 tests) covers
domain:family clustering when count>=5, no cross-family clustering,
legacy single-segment kinds, below-threshold returns 0, non-forge bail.
The first test would have caught both the prefix clustering miss AND
the upsertRequirement field-binding bug — runs end-to-end through
upsertRequirement → getActiveRequirements.

1601/1601 SF extension tests pass; typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 05:34:13 +02:00
Mikael Hugo
83c28b756c feat(self-feedback): enforce kind taxonomy at recordSelfFeedback
Addresses sf-mp4rxkwt-sfthez (gap:self-feedback-kind-vocabulary-unbounded).
The reflection report identified this as part of the deepest architectural
concern (4 entries clustered under data-plane isolation), and the
threshold-promoter was structurally unable to fire because every entry's
kind was a unique string (clusters by exact match).

Add a `domain:family[:specific]` taxonomy validated at recordSelfFeedback
write time:

  ALLOWED_KIND_DOMAINS  enum of allowed top-level domains (gap,
                        architecture-defect, architectural-risk,
                        inconsistency, runaway-loop, schema-drift,
                        janitor-gap, upstream-rollup, reflection,
                        copilot-parity-gaps, gap-audit-orphan-prompt,
                        gap-audit-orphan-command, flow-audit,
                        executor-refused, solver-missing-checkpoint,
                        runaway-guard-hard-pause,
                        self-feedback-resolution)

  KIND_SEGMENT_RE       /^[a-z][a-z0-9]*(?:-[a-z0-9]+)*$/  — kebab-case
                        per segment

  validateKind(kind)    accepts:
                          domain                      (1-segment legacy)
                          domain:family               (2-segment canonical)
                          domain:family:specific      (3-segment specific)
                        rejects: empty, non-string, >3 segments,
                                 unknown domain, non-kebab segments
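
A sketch of the validation described above (the domain set is abbreviated
here; the real ALLOWED_KIND_DOMAINS enum is the full list given earlier):

```javascript
// Validate a kind string against the domain:family[:specific] taxonomy:
// at most 3 segments, a known top-level domain, kebab-case per segment.
const ALLOWED_KIND_DOMAINS = new Set(["gap", "architecture-defect", "reflection"]);
const KIND_SEGMENT_RE = /^[a-z][a-z0-9]*(?:-[a-z0-9]+)*$/;

function validateKind(kind) {
  if (typeof kind !== "string" || kind.length === 0) return false;
  const segments = kind.split(":");
  if (segments.length > 3) return false;
  if (!ALLOWED_KIND_DOMAINS.has(segments[0])) return false;
  return segments.every((s) => KIND_SEGMENT_RE.test(s));
}
```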

recordSelfFeedback now returns null when validateKind fails, with a
warning logged via workflow-logger. Existing rows in the ledger are
grandfathered (validation only fires on NEW writes through this entry
point) so the migration is non-destructive.

This unblocks the threshold-promoter to cluster by domain:family
prefix once the requirement-promoter is updated to do so (separate
follow-up). Detectors and reflection passes can now reason about
domains rather than handfuls of unique strings.

Tests: 3 new (canonical-shapes / malformed-rejected / non-string-rejected).
8 existing test fixtures updated to use canonical kinds (gap:test-feedback
etc.) — they were using bare slugs that the new validation correctly
rejects.

1596/1596 SF extension tests pass; typecheck clean.

Note on prior dispatch: this work was first attempted via "sf headless -p"
to dogfood the new memory rule (drive SF work through sf headless, not
parallel Claude Code agents). The dispatch ran 49s with 8 tool calls but
landed nothing — the same fragility documented in sf-mp4rxkwb-l4baga
(triage-not-a-first-class-unit-type). Hand-coding fallback applied;
fragility data point added to the open entry's evidence trail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 05:28:19 +02:00
Mikael Hugo
e2f631901f test(sf-db-migration): bump expected schema version 62 → 63
Schema head moved to v63 in commit 21d905461 (parallel agent's
"rem-agent-inspired memory discipline + always-in-context invariants
board" track) but the migration tests still asserted v62 — flagged in
the last 2 iterations as "pre-existing migration failures, not mine."

Update both schema-version assertions to 63 + add a context_board
table-exists check after the v63 migration so future schema bumps
explicitly require updating both the version assertion AND the
matching table-presence check (catches naked-version-bump skews).

11/11 migration tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 05:19:09 +02:00
Mikael Hugo
2f8ee57256 feat(self-feedback): mirror resolutions into memory-store on success
Addresses sf-mp4rp6y2-31jfau
(architecture-defect:self-feedback-not-wired-to-memory-subsystem). The
reflection layer surfaced this as part
of the deepest architectural concern in the 2026-05-14T02-49-45Z report:
"resolutions are hidden from the memory graph, SF will continue to
forget its own triaged solutions and fail to cluster identical root
causes."

When markResolved succeeds against the DB, also call memory-store's
createMemory to mirror the closure as a memory entry that detectors
and reflection passes can consult later via getRelevantMemoriesRanked.

Memory entry shape:
  category: "self-feedback-resolution"
  content: "[<entry.kind>] <entry.summary>\n→ <evidence.kind>: <reason>"
  confidence: 0.9
  source_unit_type: "self-feedback"
  source_unit_id: <entryId>
  tags: [
    <entry.kind>,
    "evidence:<evidence.kind>",
    "commit:<sha-12-prefix>"  // when commitSha present
    "requirement:<reqId>"     // when requirementId present
  ]

Best-effort: any memory-write failure is silently swallowed. The
resolution itself already landed via DB UPDATE + JSONL audit append +
markdown regen — the memory mirror is observability + future detector
consumption, not a correctness requirement. The try/catch ensures a
broken memory subsystem cannot roll back a valid resolution.
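
The best-effort mirror can be sketched like this; createMemory is an
injected stand-in for the real memory-store call, and the helper name is
hypothetical:

```javascript
// Mirror a successful resolution into the memory store. The resolution is
// already durable (DB UPDATE + JSONL + markdown regen) before this runs,
// so any memory-write failure is swallowed and never propagated.
function mirrorResolutionToMemory(entry, evidence, createMemory) {
  try {
    createMemory({
      category: "self-feedback-resolution",
      content: `[${entry.kind}] ${entry.summary}\n\u2192 ${evidence.kind}: ${evidence.reason}`,
      confidence: 0.9,
      source_unit_type: "self-feedback",
      source_unit_id: entry.id,
      tags: [
        entry.kind,
        `evidence:${evidence.kind}`,
        ...(evidence.commitSha ? [`commit:${evidence.commitSha.slice(0, 12)}`] : []),
        ...(evidence.requirementId ? [`requirement:${evidence.requirementId}`] : []),
      ],
    });
    return true;
  } catch {
    return false; // best-effort: a broken memory subsystem cannot roll back the resolution
  }
}
```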

Tests (2 new, 13 total in self-feedback-db):
- agent-fix with commitSha → memory entry has [kind, evidence:agent-fix,
  commit:<sha-prefix>] tags + sourceUnitId pointing at the resolved entry
- human-clear without commit → memory entry has [kind,
  evidence:human-clear] tags only, no commit tag

Pre-existing migration failures in sf-db-migration.test.mjs (2 tests:
v27 spec backfill, v52 routing-history heal) are unrelated to this
commit; same failure mode as last iteration. Logged here so the
1591/1593 pass rate is auditable.

The other three siblings of the consolidating reflection entry
(sf-mp4w89mv-3ulqp4) remain open and need schema migration:
- sf-mp4rxkwt-sfthez kind vocabulary (domain:family[:specific])
- sf-mp4rxkwx-jz0soh causal links (self_feedback_relations table)
- sf-mp4rxkx0-fkt3e2 prioritization (impact_score + effort_estimate cols)
This commit lands the writer-layer-only piece (#4 in the rollup's
suggested fix), unlocking detector + reflection consumption immediately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 05:16:28 +02:00
Mikael Hugo
6a88ad2f00 refactor(reflection): route through @singularity-forge/ai, drop subprocess + gemini hardcoding
User-correctable architecture defect: runGeminiReflection shelled out to
the `gemini` CLI binary and hardcoded the gemini provider, duplicating
auth discovery and disconnecting the call from SF's metrics, cost
accounting, and provider abstraction. Should have routed through the
existing @singularity-forge/ai layer from the start.

Replace runGeminiReflection with runReflection that:

- Resolves an operator-supplied "provider/modelId" string via
  @singularity-forge/ai's getModel (the canonical accessor for the
  runtime model registry — MODELS itself isn't re-exported).
- Calls completeSimple from @singularity-forge/ai. Same provider routing
  every other SF LLM call uses (anthropic, openai, google-gemini-cli,
  openai-codex-responses, mistral, etc.). No subprocess.
- Default model is google-gemini-cli/gemini-3-pro-preview because that
  matches the operator's primary AI Ultra tier — but the default lives
  in a single named constant (DEFAULT_REFLECTION_MODEL), no provider
  hardcoding in the call path. Operators override per-call via --model.
- Returns { ok, content?, cleanFinish?, error?, provider, modelId } for
  observability into which provider actually answered.

runGeminiReflection kept as an alias for back-compat so the existing
headless-reflect.ts caller works unchanged. New code should use
runReflection directly.

Tests: switched from a fake-gemini-binary-on-PATH approach (5 tests)
to a clean dependency-injection approach via options.complete (5 tests
+ 1 new "rejects bare model strings"). Mock returns AssistantMessage
shape directly, no subprocess machinery.

Two pre-existing migration test failures in sf-db-migration.test.mjs
(openDatabase_migrates_v27, openDatabase_v52_db_heals_routing_history)
are unaffected by this commit — they fail in isolation too, likely
related to commit 7570aac4b's routing-metrics track. Logged here so the
1589/1591 pass rate is auditable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 05:11:07 +02:00
Mikael Hugo
21d9054611 feat(sf): rem-agent-inspired memory discipline + always-in-context invariants board
Two patterns lifted from Copilot CLI 1.0.47's rem-agent design.

1. add/prune-only consolidation surface (memory-store, memory-extractor)

   - applyConsolidationActions(): new export that gates the extractor path to
     two action kinds only — "add" (→ CREATE) and "prune" (→ SUPERSEDE with
     sentinel superseded_by = "pruned:<unitType>:<unitId>"). UPDATE / REINFORCE /
     SUPERSEDE actions are rejected with a descriptive error from the
     consolidation path; manual paths still use applyMemoryActions and keep
     full action surface.
   - memory-extractor.js EXTRACTION_SYSTEM prompt updated: model is told to
     emit add/prune only and to fix wrong entries by prune+readd, not edit.
   - Discipline win: every consolidation change is visible as an addition or
     removal — no silent revisions.
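
   The add/prune gate can be sketched as follows (store is an injected
   stand-in; the real module routes through applyMemoryActions):

```javascript
// Consolidation-path gate: only "add" and "prune" action kinds pass.
// Prune supersedes with the "pruned:<unitType>:<unitId>" sentinel; any
// other action kind is rejected with a descriptive error.
function applyConsolidationActionsSketch(actions, store) {
  for (const action of actions) {
    if (action.kind === "add") {
      store.create(action.memory);
    } else if (action.kind === "prune") {
      store.supersede(action.id, `pruned:${action.unitType}:${action.unitId}`);
    } else {
      throw new Error(`consolidation only accepts add/prune, got "${action.kind}"`);
    }
  }
}
```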

2. swarm member inheritance of parent memory view (swarm-dispatch)

   - SwarmDispatchLayer.dispatch() now fetches getActiveMemoriesRanked(30)
     and formatMemoriesForPrompt(memories, 2000, false) at dispatch time,
     attaches as memoryContext on both bus metadata and DispatchResult.
   - Snapshot semantics — members get the view at dispatch time, no live
     updates mid-task.
   - Resolves the TODO at swarm-dispatch.js:22.

3. always-in-context invariants board (new capability)

   - New src/resources/extensions/sf/context-board.js — SQLite-backed,
     per-repo/per-branch entries. Two ops: addBoardEntry, pruneBoardEntry
     (no update — same discipline as #1). 4 KB byte cap in
     formatBoardForPrompt with truncation marker.
   - New src/resources/extensions/sf/tools/context-board-tool.js +
     bootstrap/context-board-tool.js — registered via pi.registerTool with
     two ops: add(content, category?) and prune(id). Repository + branch
     auto-filled from git context.
   - Schema migration v62 → v63 in sf-db-schema.js adds context_board table
     + idx_context_board_repo_branch index. ensureContextBoardTable wired
     into initSchema for fresh databases.
   - System-prompt injection at auto/phases-dispatch.js runDispatch right
     after dispatchResult.prompt resolution: prepends board snapshot under
     a labeled section. Try/catch fail-open — board errors never break
     dispatch. Sidecar/custom-engine paths intentionally not covered (carry
     full unit context already + low frequency).

Why these complement existing infra rather than replace:
- memory-store remains queryable (recall on demand) for facts the agent
  references sometimes.
- context_board is always-rendered (small, prompt-injected) for invariants
  the agent should never operate without — current milestone scope,
  architectural rules, known-broken paths, in-flight migrations.

Comparison to Copilot rem-agent:
- SF now matches their consolidation surface (add/prune + board) and
  keeps what it already had: queue + drain + memory-extractor + a
  SLEEPTIME swarm topology richer than their single-agent rem-agent.

Tests: 40/40 pass across memory-consolidation-discipline.test.ts (18) and
context-board.test.ts (22). Full test:unit deferred — see follow-up.

Two parallel Sonnet 4.6 sub-agents in isolated worktrees produced the
work; integration adapted for the modular sf-db split (schema went into
sf-db/sf-db-schema.js, prompt injection into auto/phases-dispatch.js,
both of which got pulled out of their original files since the swarms
launched).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 05:08:31 +02:00
Mikael Hugo
f68ab20953 fix(ai): backfill MiniMax M2/M2.1 cacheRead pricing
2026-05-14 04:55:46 +02:00
Mikael Hugo
4af10ac1b2 fix(self-feedback): verify agent-fix commit_sha exists in repo
Partially addresses sf-mp4rxkwn-jmp039 (no-outcomes-verification): AC1
and AC3 land here. AC2 (cross-check that the cited commit's changed
files include the entry's referenced files) is filed separately as a
follow-up — different mechanism (semantic AC parsing).

Without this check, an agent could stamp ANY string as commit_sha and
markResolved would accept it under the writer-layer constraint shipped
in d477ce703. The credibility check at the reader caught the OBVIOUS
non-canonical shapes (null evidence, {file, line}) but a well-formed
{kind: "agent-fix", commitSha: "phantom-sha"} would have passed.

Implementation:

verifyCommitExists(commitSha, basePath) returns one of:
  - "verified"    — git is present and the commit is in the repo
  - "missing"     — git is present but the commit lookup failed
  - "ungrokable"  — git unavailable or basePath isn't a git repo
                    (carve-out: we can't verify, so don't punish)

markResolved policy: reject on "missing"; accept on the others. The
agent-fix-unverified kind (reserved in d477ce703) is the explicit
escape hatch for "I want to mark agent-fix but can't cite a verifiable
commit" — those resolutions remain re-includable under the credibility
check, which is what we want.

Implementation uses two shell-outs to git (rev-parse --verify, then
rev-parse --git-dir to distinguish missing from not-a-repo). Both are
guarded with 5-second timeouts and never throw — failure modes return
"ungrokable" so the carve-out kicks in.
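
The three-way verdict can be sketched like this; runGit is an injected
stand-in for the guarded, 5-second-timeout shell-out, returning
{ ok: boolean } per git invocation:

```javascript
// Verify a commit SHA against the repo: "verified" when the lookup
// succeeds, "missing" when git is present but the commit isn't, and
// "ungrokable" when git is unavailable or the path isn't a repo
// (carve-out: we can't verify, so don't punish).
function verifyCommitExistsSketch(commitSha, runGit) {
  try {
    if (runGit(["rev-parse", "--verify", `${commitSha}^{commit}`]).ok) return "verified";
    // Lookup failed: distinguish "not in repo" from "not a repo at all".
    if (runGit(["rev-parse", "--git-dir"]).ok) return "missing";
    return "ungrokable";
  } catch {
    return "ungrokable"; // git unavailable — carve-out kicks in
  }
}
```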

Tests: 2 new (11 total in self-feedback-db).
  - rejects_agent_fix_with_nonexistent_commit_sha: initializes a real
    git repo, files an entry, rejects bogus SHA, accepts real HEAD SHA
  - accepts_agent_fix_with_no_commit_sha_or_ungrokable_path: covers
    both the carve-out (no-git) and agent-fix-without-commitSha
    (testPath/summaryNarrative path)

Full SF extension suite (1549 tests) passes; typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 04:44:04 +02:00
Mikael Hugo
d477ce7039 fix(self-feedback): reject non-canonical evidence at the writer layer
Addresses sf-mp4qoby4-meiir7: the credibility check at the READER side
of self-feedback (selectInlineFixCandidates) was previously the only
gate. An agent that wrote DB rows directly via raw SQL or the wrong
tool could bypass it, landing resolutions like {file, line} or null
that the reader would then either trust (legacy carve-out) or quietly
re-open. Observed live in 2026-05-13 dogfood (5/5 sloppy resolutions
with non-canonical evidence shapes).

This commit makes the policy belt-and-suspenders: markResolved (and by
extension resolveSelfFeedbackEntry) refuse to write resolutions whose
evidence.kind is not in the accepted set:
  agent-fix, human-clear, promoted-to-requirement, auto-version-bump,
  agent-fix-unverified (reserved for outcomes-verification follow-up)

When evidence is missing, non-object, or its kind is outside the set,
markResolved returns false WITHOUT touching the DB or JSONL — caller
recovers by re-submitting with a valid kind. All existing callers
(resolve_issue tool, requirement-promoter, auto-version-bump resolver,
triage-self-feedback) already pass valid kinds; no breakage.
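
The writer-layer check reduces to a small predicate over the evidence
shape (helper name hypothetical; the accepted set is the one listed above):

```javascript
// Canonical-evidence gate applied before markResolved touches the DB or
// JSONL: evidence must be a non-null object whose kind is in the set.
const ACCEPTED_EVIDENCE_KINDS = new Set([
  "agent-fix", "human-clear", "promoted-to-requirement",
  "auto-version-bump", "agent-fix-unverified",
]);

function isCanonicalEvidence(evidence) {
  return (
    typeof evidence === "object" &&
    evidence !== null &&
    ACCEPTED_EVIDENCE_KINDS.has(evidence.kind)
  );
}
```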

Raw SQL bypass is a known limit documented in the entry — full
coverage needs a DB CHECK constraint on resolved_evidence_json (schema
migration, separate work).

Tests: 2 new (markResolved_rejects_non_canonical, accepts_each_canonical)
covering all four rejection paths (bad kind, missing kind, missing
evidence, unknown kind) and all five accepted kinds. Full SF extension
suite (1547 tests) passes; typecheck clean.

Plus inline cleanup: closed 3 stale upstream-rollup re-files
(sf-mp4qyotx, sf-mp4qyoub, sf-mp4qyouh) with human-clear evidence —
the bridge fix in 6d27cba06 now prevents recurrence.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 04:40:52 +02:00
Mikael Hugo
6d27cba067 fix(upstream-bridge): suppress re-file of recently-closed rollup kinds
Addresses sf-mp4rp6xn-hpag5h: bridgeUpstreamFeedback's idempotency
check only looked at currently-OPEN upstream-rollup entries, so any
closure (human-clear or agent-fix) would let the bridge re-file the
same cluster on the next session_start. Observed live during 2026-05-13
dogfood: closed 3 upstream-rollup entries with human-clear, bridge
re-filed all 3 on the next run.

Change: extend the idempotency set to also exclude rollup kinds that
were RESOLVED within the last 30 days (matches the existing
THIRTY_DAYS_MS upstream-source cutoff — same window, same rationale).

Closures are treated as time-limited: after the window expires, a
re-cluster CAN re-file, because the original closure was made against
then-current state and later state may legitimately surface the same
kind again. This is the right balance — operators get respite from
re-files while the closure decision was fresh, without trapping the
ledger forever if conditions actually change.
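
The extended idempotency set can be sketched as (entry shape and helper
name are illustrative):

```javascript
// Kinds the bridge must not re-file: currently open, or resolved within
// the last 30 days (matching the existing THIRTY_DAYS_MS cutoff). After
// the window expires, a re-cluster may legitimately re-file.
const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;

function kindsToSuppress(entries, now = Date.now()) {
  const suppressed = new Set();
  for (const e of entries) {
    if (e.status === "open") {
      suppressed.add(e.kind);
    } else if (e.status === "resolved" && now - e.resolvedAt < THIRTY_DAYS_MS) {
      suppressed.add(e.kind);
    }
  }
  return suppressed;
}
```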

7 new tests cover the regression (files new / skips open / skips
recently-closed / allows re-file after window / threshold guards /
non-forge-repo bail). Full SF extension suite (1545 tests) passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 04:37:10 +02:00
Mikael Hugo
62b19d7ba4 feat(reflection): wire LLM dispatch (sf headless reflect --run)
Phase 1B of the reflection layer: complete the operator-driven loop by
adding actual LLM dispatch. Phase 1A (commit e161a59e2) shipped the
corpus assembler + prompt template + the prompt-emit operator surface.
This commit wires the dispatch end so `sf headless reflect --run`
produces a real report on disk without manual model piping.

Why shell-out to the gemini CLI and not SF's provider abstraction:
reflection is a single-prompt one-shot inference. Going through SF's
full agent dispatch would require a session, model registry, tool
registration, recovery shell — overkill for "render this prompt,
capture text." The gemini CLI handles auth (~/.gemini/oauth_creds.json),
Code Assist project discovery, and protocol drift on SF's behalf.
Subprocess cost is paid once per reflection (rare).

Implementation:

- reflection.js: runGeminiReflection(prompt, options) spawns
  `gemini --yolo --model <model> -p "<directive>"` and pipes the giant
  rendered template via stdin (gemini -p reads stdin and appends).
  Returns { ok, content, cleanFinish, exitCode, error, stderr }; never
  throws. Defaults to gemini-3-pro-preview (0% used on AI Ultra,
  strongest agentic model with quota). 8-minute timeout.

  cleanFinish detected by REFLECTION_COMPLETE terminator (emitted by
  the prompt template's output contract) — operator gets a warning when
  the report is truncated.

- headless-reflect.ts: --run flag triggers dispatch + report write
  via writeReflectionReport. --model overrides the default. Errors
  surface as JSON or text per --json. Successful runs emit the report
  path on stdout; failures emit error + truncated stderr.

- help-text.ts: documents --run and --model flags.

- Tests (4 new, 13 total): use a fake `gemini` binary on PATH to
  exercise the spawn path without real OAuth/network — covers
  ok+cleanFinish, non-zero exit, hang/timeout, missing-terminator.

All 1538 SF extension tests pass; typecheck clean.

Phase 2 follow-up (still gated on sf-mp4rxkwb-l4baga
triage-not-a-first-class-unit-type landing): reflection-pass becomes a
real autonomous-loop unit type, milestone-close auto-triggers it, the
report's `Recommended new self-feedback entries` section gets parsed
and the entries auto-filed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 04:33:16 +02:00
Mikael Hugo
e161a59e2f feat(reflection): add Phase 1A reflection layer (corpus + prompt + sf headless reflect)
Addresses self-feedback entry sf-mp4uzvcd-pazg6v
(architecture-defect:no-reflection-layer-over-self-feedback-corpus): SF
detected symptoms and triaged individual entries but had no layer that
reasoned about the corpus to recognize recurring structural patterns.
The same architectural pressure expressed itself across multiple entries
with different exact-kind strings; nothing escalated the pattern to a
class. The cognitive work fell on the operator.

This commit ships Phase 1A — the data-assembly + prompt half of the
reflection layer + an operator-driven entry point. Phase 1B (LLM dispatch
via the autonomous loop as a real unit type) lands once
sf-mp4rxkwb-l4baga (triage-not-a-first-class-unit-type) is in.

Files:
- src/resources/extensions/sf/reflection.js (new)
  - assembleReflectionCorpus(basePath): bundles open + recent-resolved
    self-feedback (full json), last 50 commits via git log, milestone +
    slice + task state, all milestone validation verdicts, and prior
    reflection report into one struct. Returns null on prerequisite
    failure (DB closed) so callers downgrade gracefully.
  - renderReflectionCorpusBrief(corpus): renders the corpus into a
    markdown brief the LLM consumes in one turn.
  - writeReflectionReport(basePath, content): persists to
    .sf/reflection/<timestamp>-report.md so next pass detects "what
    changed since last reflection."

- src/resources/extensions/sf/prompts/reflection-pass.md (new)
  - {{include:working-directory}} prefix.
  - Reasoning order: cluster by structural shape (not exact kind),
    identify recurring patterns, identify commit/ledger gaps, identify
    stale validation drift, identify the deepest architectural concern,
    compare against prior report.
  - Output contract: structured markdown report with named sections,
    terminator REFLECTION_COMPLETE for clean-finish detection.
  - Constraints: don't fix anything (reflection layer not executor),
    don't resolve entries without commit-SHA evidence, don't invent IDs.

- src/headless-reflect.ts (new) — sf headless reflect [--json]
  - Pre-opens the project DB via auto-start.openProjectDbIfPresent
    (one-shot bypass path doesn't run the full SF agent bootstrap).
  - Default: emits the rendered prompt brief (template + corpus) for
    operators to pipe into any model. Lets the corpus-assembly layer
    ship and validate before the LLM-dispatch layer is wired.
  - --json: emits raw corpus snapshot for tooling.

- src/headless.ts: registers the new "reflect" command after the
  existing usage block.
- src/help-text.ts: documents it in the headless command list.

- src/resources/extensions/sf/tests/reflection.test.mjs (new, 9 tests):
  null-when-DB-closed; collects open + recent-resolved; excludes >30d
  resolutions; captures milestone/slice/task tree; captures validation
  verdicts; commits returned as array (best-effort tmpdir is ok); brief
  renders all major sections; entry IDs/severity/kind appear in brief;
  writeReflectionReport round-trips through assembleReflectionCorpus's
  previousReport read.

Live smoke verified: sf headless reflect against the real .sf/sf.db
returns 15 open + 23 recent-resolved entries, 50 commits, 2 milestones,
1 validation file (correctly surfacing M001's stale needs-attention
verdict against actual 5/5 slices done — exactly the case that
motivated this layer).

Total: +848 LOC, full SF extension suite (1534 tests) passes,
typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 04:27:29 +02:00
Mikael Hugo
7570aac4b7 feat(sf): generation-aware failover + canonical-keyed metrics
Two parallel refactors building on the model-registry consolidation:

1. Generation-aware failover (model-route-failure.js, agent-end-recovery.js)

   - resolveNextModelRoute now takes unitType so it knows whether the
     caller is solver-pinned per ADR-0079 (autonomous-solver). When pinned,
     rejects candidates whose canonicalIdFor() differs from the failed
     route's canonical id — closes the latent solver-invariant violation
     where kimi-coding/kimi-k2.6 could silently fail over to
     ollama-cloud/kimi-k2.5:cloud (different generation).
   - Cross-generation failover in non-pinned units now emits a structured
     logWarning so generation downgrades are visible in traces instead of
     looking like an equivalent route switch.
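
   The solver-pin guard can be sketched as follows; canonicalIdFor is an
   injected stand-in for the registry lookup, and the helper name is
   hypothetical:

```javascript
// When the unit type is solver-pinned, reject failover candidates whose
// canonical id differs from the failed route's canonical id, so e.g. a
// kimi-k2.6 route can't silently fail over to a kimi-k2.5 route.
function filterFailoverCandidates(failedRoute, candidates, pinned, canonicalIdFor) {
  if (!pinned) return candidates;
  const failedCanonical = canonicalIdFor(failedRoute);
  return candidates.filter((c) => canonicalIdFor(c) === failedCanonical);
}
```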

2. Canonical-keyed performance metrics (model-learner.js)

   - .sf/model-performance.json now keys by canonical_id with an
     {aggregate, by_route} sub-shape instead of fused provider/wire-model
     strings. Cross-route history per model is now coherent — kimi-k2.6
     reached via kimi-coding accumulates into the same aggregate as
     reached via openrouter.
   - Migration runs at boot: detects old shape (no 'aggregate' key in
     unit-type blob values), distributes each entry into by_route,
     recomputes aggregate, writes a backup to
     .sf/model-performance.json.pre-canonical-backup. Unmappable route
     keys land in _unmapped so nothing is dropped.
   - getRouteStats(taskType, routeKey) added for per-route failover
     ordering; existing getRankedModels emits canonical IDs for
     cross-route strength queries.

3. Tests

   - model-registry.test.ts: bundled in this commit (Swarm A's test file
     was left untracked when the registry module was committed).
   - model-route-failure.test.ts: 12 tests covering solver-pin guard,
     same-canonical multi-route failover, generation-downgrade log emit.
   - model-learner-canonical.test.ts: 17 tests covering migration
     round-trip, aggregate invariant, _unmapped bucket, and zero-default
     reads.
   - model-learner.test.ts: one existing test updated for the new
     _unmapped.by_route shape on bare model IDs.

4. Results

   - Targeted tests: 147/147 across registry, route-failure, learner,
     learner-canonical.
   - Full npm run test:unit: 4707 pass, 0 fail, 83 skipped (no new
     regressions vs pre-edit baseline of 4669).

Work parallelized across two Sonnet 4.6 sub-agents in isolated git
worktrees. Contract authored in docs/dev/drafts/model-registry-contract.md
(committed earlier in 1d753af6b) and consumed by both agents.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 04:15:08 +02:00
Mikael Hugo
09bc50f0f6 feat(openai-codex): mirror codex CLI's models_cache.json into SF catalog
The static catalog in models.generated.ts carries phantom slugs like
gpt-5-codex / gpt-5.1-codex / gpt-5.1-codex-max / gpt-5.2-codex that the
ChatGPT-account API rejects with HTTP 400 ("model is not supported when
using Codex with a ChatGPT account"). Verified live on this machine:

  ERROR: "The 'gpt-5-codex' model is not supported when using Codex with
         a ChatGPT account."

Meanwhile the actually-supported slugs for a ChatGPT subscription
(gpt-5.5 default, gpt-5.4, gpt-5.4-mini, gpt-5.3-codex, gpt-5.2) are
not in SF's view at all — so the router scores phantoms, picks one,
dispatch fails, no successful routes record, and routing silently drifts.

The codex CLI itself maintains ~/.codex/models_cache.json with the
authoritative "what THIS account can actually serve" list (visibility +
supported_in_api flags). SF reads that file directly — no duplicate
discovery, no separate API call, single source of truth.

Changes:

- src/resources/extensions/sf/openai-codex-catalog.js (new) — pure file
  reader. Resolves CODEX_HOME (or ~/.codex), parses models_cache.json,
  filters by visibility==="list" AND supported_in_api===true, mirrors the
  result into .sf/runtime/model-catalog/openai-codex.json. Same cache
  shape as the generic model-catalog-cache and gemini-catalog modules
  so getKnownModelIds picks it up transparently.

- bootstrap/register-hooks.js — wire scheduleOpenaiCodexCatalogRefresh
  into session_start, parallel to the existing gemini and generic
  catalog refreshes.

- Tests (9): cache-missing, malformed, filter correctness against the
  real shape, no-pass-through, slug validation, refresh-writes-cache,
  cache-fresh-skips-refresh, and live discovery via the smoke probe
  returns exactly ["gpt-5.5", "gpt-5.4", "gpt-5.4-mini", "gpt-5.3-codex",
  "gpt-5.2"] on this machine.
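
The cache filter described in the first change reduces to (entry shape
mirrors the fields the commit names; anything else is an assumption):

```javascript
// Keep only models_cache.json entries this account can actually serve:
// visibility must be "list" and supported_in_api must be true. Returns
// the surviving slugs.
function filterServableModels(cacheEntries) {
  return cacheEntries
    .filter((m) => m.visibility === "list" && m.supported_in_api === true)
    .map((m) => m.slug);
}
```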

Asymmetry vs gemini-cli is appropriate: codex CLI caches locally so SF
just reads the file; gemini-cli does not, so SF's gemini path calls
setupUser + retrieveUserQuota over the wire. Each provider gets the
cheapest reliable discovery path.

Follow-up filed separately: extract codex transport
(codex-app-server-client.ts, openai-codex-responses.ts, this catalog
reader) into a dedicated @singularity-forge/openai-codex-provider
package mirroring the gemini-cli-provider structure for symmetry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 03:53:34 +02:00
Mikael Hugo
383e495085 feat(headless,gemini-cli): add sf headless usage + unify gemini quota path
Adds a machine-readable headless surface for live LLM-provider usage and
unifies the gemini-cli quota fetch through one helper, removing the
duplication that existed between usage-bar.js and the new package.

1. snapshotGeminiCliAccount in @singularity-forge/google-gemini-cli-provider

   - Single source of truth for { projectId, userTierId, userTierName,
     paidTier, models[] } via setupUser + retrieveUserQuota.
   - Dedups buckets per modelId, keeping the worst (lowest remainingFraction)
     so consumers always see the most-restrictive window. Code Assist
     sometimes returns multiple buckets per model; the pessimistic choice
     is what every consumer needs.
   - discoverGeminiCliModels(cwd?) wraps it for catalog-cache callers that
     only need the IDs.
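
   The pessimistic bucket dedup above can be sketched as (helper name and
   bucket shape are illustrative):

```javascript
// Dedup quota buckets per modelId, keeping the one with the lowest
// remainingFraction so consumers always see the most-restrictive window.
function dedupeWorstBucket(buckets) {
  const byModel = new Map();
  for (const b of buckets) {
    const kept = byModel.get(b.modelId);
    if (!kept || b.remainingFraction < kept.remainingFraction) {
      byModel.set(b.modelId, b);
    }
  }
  return [...byModel.values()];
}
```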

2. sf headless usage subcommand

   - New src/headless-usage.ts handler. text (default) and --json output.
     Uses the package's snapshot directly — no RPC child, no jiti
     gymnastics — matching the shape of headless-uok-status / headless-doctor.
   - Wired into src/headless.ts after the doctor block.
   - Help text adds the command line.

3. usage-bar.js refactored to delegate

   - fetchGeminiUsage no longer imports gemini-cli-core directly. It calls
     snapshotGeminiCliAccount and reshapes the result into the existing
     { provider, displayName, windows[] } UI contract.
   - Eliminates the duplicate setupUser + retrieveUserQuota code path.
   - The fast existsSync(~/.gemini/oauth_creds.json) pre-flight stays
     so unauth'd users get a friendly message without paying for OAuth
     bootstrap.

4. Model registry refactor (separate track committed alongside)

   - src/resources/extensions/sf/model-registry.ts (new) consolidates
     canonical model identity, capability tier, and generation tags into
     one source of truth that auto-model-selection, benchmark-selector,
     and model-router now consume instead of maintaining parallel maps.

All 1487 tests pass (151 files); typecheck clean for both the package
and the SF extensions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 03:42:53 +02:00
Mikael Hugo
c6a3fa6a6a feat(gemini-cli): discover account models via gemini-cli-core + retry on capacity errors
Two related fixes for the google-gemini-cli provider, both motivated by today's
dogfood diagnosis: SF was pinned to a single model (gemini-3-flash-preview)
even though the AI Ultra account has access to seven (verified via the live
gemini-cli-core probe), and a transient "No capacity available for model X
on the server" was classified as `unknown` so SF gave up instead of retrying.

1. Account snapshot + model discovery in @singularity-forge/google-gemini-cli-provider

   - Add `snapshotGeminiCliAccount(cwd?)` returning { projectId, userTierId,
     userTierName, paidTier, models } where `models[]` carries each modelId
     with usedFraction, remainingFraction, and resetTime. Built on the same
     setupUser + CodeAssistServer.retrieveUserQuota path usage-bar.js
     already uses, but extracted to the dedicated package so any consumer
     (model picker, capacity diagnostics, catalog cache) can call one helper.
   - Add `discoverGeminiCliModels(cwd?)` as a thin "just the IDs" wrapper.
   - Both are best-effort: any failure (OAuth expired, no project, network)
     returns null silently — never throws.

2. SF-side cache writer at src/resources/extensions/sf/gemini-catalog.js

   - Delegates discovery to the package; only handles cache file path,
     6-hour TTL, and the session_start lifecycle hook.
   - Cache lands at .sf/runtime/model-catalog/google-gemini-cli.json with
     the same shape as the generic model-catalog-cache, so getKnownModelIds
     and the model picker pick it up transparently.
   - Wired into bootstrap/register-hooks.js session_start in parallel with
     the existing scheduleModelCatalogRefresh (the generic REST + API-key
     path can't reach gemini-cli's OAuth-only Code Assist endpoint).

3. Capacity error classification fix

   - error-classifier.js SERVER_RE now matches "no capacity (available|left)",
     "capacity (unavailable|exhausted)", and "no capacity ... on the server".
     Previously these fell through to kind=unknown, which is not transient,
     so agent-end-recovery never retried — even though the same handler
     already caps gemini-cli rate-limit backoff at 30s for exactly this
     class of transient. With the pattern matched as `server`, the existing
     retry-with-backoff path covers it.
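
   An illustrative approximation of the widened patterns — the real
   SERVER_RE in error-classifier.js covers more than capacity errors and
   its exact expression may differ:

```javascript
// Capacity-error patterns folded into one case-insensitive regex, so
// transient "no capacity" responses classify as kind=server (retryable)
// instead of kind=unknown (terminal).
const CAPACITY_RE =
  /no capacity (available|left)|capacity (unavailable|exhausted)|no capacity .* on the server/i;

function classifyCapacity(message) {
  return CAPACITY_RE.test(message) ? "server" : "unknown";
}
```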

The full extension test suite (1386 tests) passes. Typecheck clean for both
the package and the SF extensions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 03:32:35 +02:00
Mikael Hugo
1d753af6b6 docs(dev): draft model registry contract for upcoming refactor
Spec for consolidating the three alias tables (benchmark-selector,
auto-model-selection, model-router) into a single SF-extension registry
that reads from @singularity-forge/ai's MODELS and enriches it with
canonical_id, generation, and tier. Shared interface for parallel
Swarm A/B/C work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 02:57:27 +02:00
Mikael Hugo
f0f31989fe refactor(autonomous-solver): extract prompt strings to .md templates
Lands the prompt extraction the triage worker performed in dogfood
round 5 on entry sf-mp37p9u6-eyobzb (inconsistency:prompts-monolithic-
not-modular).

Changes:
- prompts/autonomous-solver-contract.md (new): solver loop block, with
  {{include:working-directory}} for the shared prefix.
- prompts/autonomous-executor-contract.md (new): executor loop block,
  same fragment include.
- prompts/autonomous-solver-pass.md (new): solver-pass classifier.
- autonomous-solver.js: _buildAutonomousLoopPromptPrefix renamed to
  buildAutonomousLoopVars and returns the variables for the new
  templates instead of a pre-rendered string. Net -120/+60 lines.

The {{include:fragment}} syntax is already supported by prompt-loader.js
and the working-directory fragment already exists at
prompts/fragments/working-directory.md.

All 1386 tests pass; typecheck clean.

Resolves: sf-mp37p9u6-eyobzb (inconsistency:prompts-monolithic-not-modular)
Co-resolved: sf-mp37p9u0-hebruv (architectural-risk:single-transaction-
migration) — already verified-and-closed by the triage worker via
resolve_issue with kind=agent-fix, evidence "migrateSchema already
uses per-migration BEGIN/COMMIT via runMigrationStep". JSONL audit log
captured the resolution event end-to-end through the new
appendResolutionToJsonl path (commit ce58d3223).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 02:41:46 +02:00
Mikael Hugo
79db5704bc fix(self-feedback): require structured evidence kind for trusted resolution
Dogfood of the triage worker revealed that the agent can bypass the
resolve_issue tool (which hardcodes kind=agent-fix) and write DB rows
directly with non-canonical evidence shapes (null, or {file, line}).
The earlier credibility check trusted any resolution that had a prose
resolvedReason — a "legacy narrative" carve-out meant to preserve
operator clears predating structured evidence. Brand-new sloppy agent
resolutions slipped through that carve-out: 5/5 of today's triage
resolutions had non-canonical evidence and would have been treated as
authoritative under the old check.

Replace the denylist/legacy-carve-out with an allowlist:
- isSuspectlyResolved returns true unless resolvedEvidence.kind is
  in {agent-fix, human-clear, promoted-to-requirement}.
- SUSPECT_RESOLUTION_KINDS is kept as documentation of the
  auto-version-bump case but the allowlist makes it redundant for
  the actual policy decision.
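
The allowlist policy reduces to a few lines; field names here follow
this message rather than the actual module:

```javascript
// Trust nothing unless the evidence kind is explicitly allowlisted.
// Prose-only narratives and non-canonical shapes ({file, line}, null)
// are all suspect under the new check.
const TRUSTED_RESOLUTION_KINDS = new Set([
  "agent-fix",
  "human-clear",
  "promoted-to-requirement",
]);

function isSuspectlyResolved(entry) {
  return !TRUSTED_RESOLUTION_KINDS.has(entry?.resolvedEvidence?.kind);
}
```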

Tests now cover both failure modes: prose-only resolution (no kind)
and non-canonical evidence shape ({file, line}) both re-include the
entry as a candidate. Legacy entries that genuinely lack an evidence
kind are backfilled to kind=human-clear separately so they keep their
resolution under the stricter check.

A self-feedback entry (sf-mp4qoby4-meiir7, severity=high) was filed
about the underlying bypass — markResolved should ALSO reject or
auto-tag non-canonical writes at the writer layer, since the reader
is currently the only gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 02:17:47 +02:00
Mikael Hugo
6e95c3542c fix(bootstrap): always dispatch self-feedback triage on session_start
The session_start hook only invoked dispatchSelfFeedbackInlineFixIfNeeded
when triage.stillBlocked contained at least one high/critical entry.
After the previous commit rewired the worker as a triage queue that
returns every open forge-local entry (not just high/critical), this
gate stranded medium/low backlog forever at startup — the unit was
never given a chance to triage them.

The dispatcher's own selectInlineFixCandidates is now the source of
truth for eligibility; the call site should call unconditionally.
Keep the high/critical-specific notify (still useful operator signal
when the loud ones are present) but stop using it to gate the dispatch.

The turn_end hook at the bottom of register-hooks.js already calls
the dispatcher unconditionally, so this change aligns the two paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 22:59:13 +02:00
Mikael Hugo
ce58d32231 fix(self-feedback,state): close two state-drift gaps
1. Self-feedback JSONL is now a real append-only audit log. Previously
   markResolved updated the DB row in place but never echoed the
   resolution to JSONL, so a DB rebuild via importLegacyJsonlToDb would
   re-import all entries with their original pre-resolution state and
   silently lose every resolution that had ever landed. The JSONL was a
   half event log — creations yes, resolutions no.

   - Introduce a `recordType: "resolution"` JSONL record shape. Append
     one of these to the project JSONL whenever markResolved succeeds
     against the DB. Best-effort: failure to append never blocks the
     resolution itself.
   - Extend importLegacyJsonlToDb to handle both record types. Entry
     creations go through insertSelfFeedbackEntry (ON CONFLICT DO
     NOTHING — idempotent). Resolution events go through
     resolveSelfFeedbackEntry, which is already a no-op on missing or
     already-resolved rows, so replay is idempotent.
   - Tests cover: the appended record shape; a DB rebuild correctly
     reconstructing resolved_at/resolved_evidence_json from a JSONL
     audit trail; orphan resolution events (entry never existed) are a
     silent no-op.

   Closes self-feedback entry sf-mp4ikbta-2zcbhh.
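
   The two-record replay can be sketched as follows — record shapes and
   the Map-as-DB stand-in are illustrative assumptions, not the real
   importLegacyJsonlToDb API:

```javascript
// Replay a JSONL audit trail into a fresh store. Creations are
// insert-if-absent (the ON CONFLICT DO NOTHING analogue); resolution
// events are no-ops on missing or already-resolved rows, so replay
// is idempotent.
function replayJsonl(records, db) {
  for (const r of records) {
    if (r.recordType === "resolution") {
      const row = db.get(r.id);
      if (row && !row.resolvedAt) row.resolvedAt = r.resolvedAt;
    } else if (!db.has(r.id)) {
      db.set(r.id, { ...r, resolvedAt: null });
    }
  }
}

const db = new Map();
replayJsonl(
  [
    { id: "e1", kind: "gap:x" },
    { recordType: "resolution", id: "e1", resolvedAt: "2026-05-13" },
    // Orphan resolution: entry never existed — silently ignored.
    { recordType: "resolution", id: "orphan", resolvedAt: "2026-05-13" },
  ],
  db
);
```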

2. The reconcile path at state-db.js:reconcileSliceTasks warns when an
   on-disk SUMMARY.md exists for a task whose DB row is still pending
   and refuses to silently import — a safety check so autonomous runs
   can't promote themselves to complete by writing a SUMMARY without a
   real DB transition. But operators had no remediation path when the
   drift was real (lost DB write, hand edit). They had to mutate the
   DB by hand.

   - New `state-reconcile.js` with `reconcileTaskFromSummary` exposes
     the remediation explicitly. Parses the SUMMARY via the existing
     parseSummary helper, validates via isValidTaskSummary, and writes
     status / completed_at / verification_result / blocker /
     key_files / full_summary_md into the DB row through a new
     `setTaskSummaryFields` helper in sf-db-tasks.
   - Returns structured { ok, reason, applied } outcomes — never
     throws — so operator tooling can branch on `db-unavailable`,
     `summary-missing`, `summary-invalid`, `task-not-in-db`,
     `already-done`.
   - The reconcile warning text now points at the helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 22:55:30 +02:00
Mikael Hugo
5f245b721d fix(self-feedback): rewire inline-fix worker as triage queue
The inline-fix worker was a partial repair queue — it picked only
high/critical+blocking entries plus my recent gap/architecture-defect
override and left everything else (medium inconsistencies, janitor gaps,
architectural-risks, low-severity gaps) sitting open forever. The
requirement-promoter clusters by exact `kind` string and never fires on
diverse forge-local entries (every open entry currently has a unique
kind), so there is no other sweep that ever touches these. They just
accumulate.

The point of the worker is triage, not just repair: every open entry
should get an eyes-on per session and reach one of three outcomes —
fix, promote to requirement, or close as not-of-value with reason.
Closing deliberately is a valid, expected outcome.

Changes:

- `selectInlineFixCandidates` now returns every open forge-local entry,
  modulo the existing credibility check that re-includes suspect
  resolutions. Severity and blocking filters are gone; the kind-based
  override is no longer needed because everything qualifies.
- The dispatch prompt is rewritten as a three-way triage protocol
  (Fix / Promote / Close) with explicit guidance per outcome and
  explicit prohibition on the `auto-version-bump` evidence kind (which
  would re-open under the credibility check).
- Tests collapse the three filter-coverage tests into a single
  "selects every open forge-local entry" assertion that exercises the
  full severity × blocking × kind matrix.

Upstream feedback is still excluded — those entries describe behavior
in other repos that the inline-fix unit cannot directly repair.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 22:46:24 +02:00
Mikael Hugo
085beb5199 docs(sf-ace): restore parked location + keep ADR cross-references
SF's S05/T02 executor moved the doc back to docs/dev/sf-ace-patterns.md
while completing the slice (correctly: that was the task's stated
deliverable location). The doc belongs under docs/dev/drafts/, since
ACE Coder has no active consumer for it yet; re-park it there.

Keep the ADR-019 / ADR-020 cross-references the executor added —
they are real content improvements over the previous version.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 22:24:12 +02:00
Mikael Hugo
89b52b6011 fix(self-feedback): widen inline-fix candidate selection + drop upstream
The inline-fix dispatcher had three blind spots that left forge-local
architectural debt rotting in the ledger:

1. Filter required `severity ∈ {high, critical} AND blocking`. Medium
   `gap:*` and `architecture-defect:*` entries — describing the exact
   class of debt the inline-fix unit was built to repair — were dropped
   on the floor. The forge-local queue currently has 0 high+blocking
   open entries and 3 architectural gaps, so the old filter would
   dispatch on nothing local and fall back to upstream.

2. Resolutions were trusted unconditionally. `auto-version-bump` fires
   on any sf-version bump without verifying the bump contained a fix,
   silently burying defects.

3. Upstream feedback was merged into the candidate set. Upstream entries
   describe behavior observed in OTHER repos (e.g. `flow-audit:repeated-
   milestone-failure` from /srv/infra/apps/centralcloud_ops) — the
   inline-fix unit edits forge source and cannot repair issues in those
   other repos. Including them dispatches work the unit cannot perform.

Changes to `selectInlineFixCandidates`:

- Add kind-based override: entries with `kind` starting with `gap:` or
  `architecture-defect:` qualify regardless of severity/blocking.
- Add resolution credibility check: re-include entries resolved with
  evidence kind `auto-version-bump`, or with no evidence kind AND no
  `resolvedReason` narrative at all. Legacy resolutions with a meaningful
  operator narrative (the historical format) are still trusted.
- Drop `readUpstreamSelfFeedback()` from the candidate merge. Upstream
  stays readable for SELF-FEEDBACK.md rollups and operator review, just
  not auto-dispatched to inline-fix.

Also relax the schedule-e2e readEntries timing assertion from a 100ms
threshold to 500ms — the test is a catastrophic-regression guard, not
a microbenchmark, and parallel-suite jitter on dev machines routinely
adds >100ms even when the underlying read is fast (≤ a few hundred ms).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 22:23:57 +02:00
Mikael Hugo
5a2618c05d fix(auto): re-dispatch on executor refusal instead of pausing
The autonomous solver was designed precisely to handle executor refusals
(per its own docstring: "the solver role MUST stay on a stable, agentic,
refusal-resistant model independent of any per-unit routing choices"),
but the refusal handler short-circuited past it and emitted a `blocked`
checkpoint, which assessAutonomousSolverTurn unconditionally turns into
a `pause` — defeating autonomous mode every time the router selects a
capability-mismatched executor.

The 1h model-block added in 3f2babb5d was the right primitive but had no
consumer: nothing actually re-dispatched the unit after the model was
blocked, so the block only mattered if the operator manually unpaused
and retried.

This change wires the missing consumer:

- Add per-unit `executorRefusalEscalations` counter to solver state plus
  a `recordExecutorRefusalEscalation` helper. Counter persists across
  iterations of the same unit and resets on unit change.
- On `executor-refused`: block the refusing model and slice-routing entry
  (unchanged), file self-feedback (unchanged), then synthesize a
  `continue` checkpoint and return `{ action: "continue" }` directly so
  the auto loop re-dispatches the unit. selectAndApplyModel will skip
  the now-blocked model and pick a higher-tier fallback.
- Bounded by `MAX_EXECUTOR_REFUSAL_ESCALATIONS=3`. When the budget is
  exhausted (an entire fallback chain refused on the same unit), fall
  back to the legacy blocked-and-pause path so the operator can review.
- Bypass `assessAutonomousSolverTurn` on the refusal-continue path
  because its no-op detector would (correctly) reject a continue over a
  refusal transcript — but here the "no-op" is the whole point: we are
  explicitly swapping the routed model.
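
The bounded re-dispatch loop reduces to a counter check; the real
control flow lives in the solver and this is only a shape sketch using
the names from this message:

```javascript
// Re-dispatch on refusal until the escalation budget is spent, then
// fall back to the legacy blocked-and-pause path for operator review.
const MAX_EXECUTOR_REFUSAL_ESCALATIONS = 3;

function handleExecutorRefusal(state) {
  if (state.executorRefusalEscalations >= MAX_EXECUTOR_REFUSAL_ESCALATIONS) {
    // An entire fallback chain refused on the same unit.
    return { action: "pause", reason: "executor-refusal-budget-exhausted" };
  }
  state.executorRefusalEscalations += 1;
  // The refusing model is blocked elsewhere; continuing re-dispatches
  // the unit so routing picks a higher-tier fallback.
  return { action: "continue" };
}

const state = { executorRefusalEscalations: 0 };
const outcomes = [1, 2, 3, 4].map(() => handleExecutorRefusal(state).action);
```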

Tests cover the new state field's init/persistence/reset semantics and
the constant's invariants. Full SF extension suite (1369 tests) passes.

Refs: sf-mp3bm6u0-2fskt8 (now fully addressed, not just AC1)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 21:49:51 +02:00
Mikael Hugo
288a2a5fd7 docs(sf-ace): park SF→ACE pattern reference under docs/dev/drafts/
Promotes the .draft stub into a fuller 183-line reference covering six
SF patterns (Preferences, PDD, UOK Gates, Notifications, Skills-as-
Contracts, Idempotency) with SF source paths and ACE adoption notes.

Filed under docs/dev/drafts/ with a STATUS: Draft header — no active
consumer yet. SF's own priorities take precedence until ACE Coder
maintainers pull on convergence.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 21:30:34 +02:00
Mikael Hugo
32cfb6224b test: migrate node:test imports to vitest and stabilize timing thresholds
- Three .test.mjs files now import describe/it from vitest, matching the
  harness CLAUDE.md mandates for the SF extension suite.
- schedule-e2e local readEntries threshold raised 50ms → 100ms with a
  comment noting full-suite parallelism adds scheduler/filesystem jitter
  on dev machines (CI threshold unchanged at 200ms).
- e2e-smoke "headless new-milestone without --context" timeout raised
  10s → 30s so the exit-1 assertion isn't flaky under load.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 21:30:21 +02:00
Mikael Hugo
3f2babb5d1 fix(auto): block refusing executor model temporarily to force escalation on retry
When classifyExecutorRefusal detects an executor refusal, the model is
now temporarily blocked (1-hour TTL) via the existing blocked-models
mechanism. This ensures that on retry — whether automatic or manual —
the router skips the refusing model and the tier-escalation path in
selectAndApplyModel picks a higher-tier alternative.

This satisfies AC1 of self-feedback entry sf-mp3bm6u0-2fskt8.
AC2 (refusal pattern detection) was already satisfied by the existing
apology-no-tools pattern in classifyExecutorRefusal.

Refs: sf-mp3bm6u0-2fskt8
2026-05-13 02:40:41 +02:00
Mikael Hugo
2cad6d54f4 fix(doctor): enrich flow-audit repeated-failure rollup with full diagnostic context
The flow-audit repeated-milestone-failure rollup now includes:
- Active milestone/unit and session pointer (AC1)
- Stale dispatched units (AC2)
- Runaway history (AC3)
- Over-budget child processes (AC3)

This satisfies the acceptance criteria of self-feedback entry
sf-mp3ati7u-qqxcyi so operators can use the rollup evidence to
repair stale dispatch, missing summary, runaway, or child-process
handling without needing to re-run the flow audit manually.

Refs: sf-mp3ati7u-qqxcyi
2026-05-13 02:25:29 +02:00
Mikael Hugo
65e195a9fd feat: create draft mapping of SF patterns to the ACE reference
SF-Task: S05/T01
2026-05-13 02:01:41 +02:00
Mikael Hugo
1ed505669b fix(sf-db,autonomous-solver): resolve schema-drift and checkpoint runaway loop
- sf-db-schema.js: per-migration transaction boundaries (runMigrationStep)
  so a late migration failure does not roll back earlier successful ones.
  Post-migration assertion recreates routing_history if missing.
- routing-history.js: catch missing routing_history table at init and latch
  _dbTableAvailable=false so auto-start does not crash.
- autonomous-solver.js: sticky identity guard in appendAutonomousSolverCheckpoint
  pins to orchestrator's unitType/unitId instead of trusting agent's claim.
  Emit journal event on identity mismatch. Record mismatchedIdentity diagnostic.
  Hard cap MAX_CHECKPOINTS_PER_ITERATION=5 in assessAutonomousSolverTurn.
- Tests: add v52 DB smoke test with auto-start path; add sticky identity
  tests (4 cases); add excessive-checkpoint pause test.

Fixes: sf-mp36kfqm-rjrzju, sf-mp37kjmo-1mfuru
2026-05-13 01:47:19 +02:00
Mikael Hugo
a49ea1da87 feat(sf/prompts): Phase 4 — cache_control breakpoints at static/dynamic boundary
Split reorderForCaching into a structured reorderAndSplitForCaching that
returns {before, after} at the semi-static→dynamic section boundary.

- prompt-ordering.js: export reorderAndSplitForCaching — returns null if no
  dynamic sections, otherwise {before: static+semi-static, after: dynamic}
- auto.js: import and wire reorderAndSplitForCaching into deps
- phases-unit.js: use split function; pass promptParts to runUnit when split
  succeeds; fall back to flat reorderForCaching when null
- run-unit.js: when promptParts is present, send a two-block content array
  [{type:text, text:before, cache_control:{type:ephemeral}}, {type:text, text:after}]
  so Anthropic-compatible providers cache the stable prefix
- openai-completions.ts: preserve cache_control on text parts in convertMessages;
  skip maybeAddOpenRouterAnthropicCacheControl if any part already has cache_control
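
The two-block shape can be sketched as below, assuming the
Anthropic-style cache_control content format; the variable contents are
invented for illustration:

```javascript
// Build the two-part content array: everything up to the cache_control
// breakpoint is a stable prefix providers may cache; the tail is the
// per-turn dynamic remainder and is never cached.
function buildCachedContent({ before, after }) {
  return [
    { type: "text", text: before, cache_control: { type: "ephemeral" } },
    { type: "text", text: after },
  ];
}

const content = buildCachedContent({ before: "STATIC", after: "DYNAMIC" });
```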

Tests: 5 new contract tests for reorderAndSplitForCaching; all 4502 unit tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-13 01:36:22 +02:00
Mikael Hugo
3b83d09692 feat(sf/prompts): Phase 3 v2 — migrate milestone+slice builders to composeUnitContext
Migrate buildPlanMilestonePrompt, buildValidateMilestonePrompt,
buildCompleteMilestonePrompt, buildReplanSlicePrompt,
buildResearchSlicePrompt, and renderSlicePrompt (plan-slice +
refine-slice) from imperative inlined[] push loops to the v2
composeUnitContext API (manifest-driven, prepend/computed support).

Changes:
- unit-context-manifest.js: add 7 new ARTIFACT_KEYS (slice-summaries,
  blocker-summaries, queue, verification-classes, outstanding-items,
  previous-validation, prior-milestone-summary); update 7 manifests
  with correct prepend/inline/computed declarations
- auto-prompts.js: import composeUnitContext; migrate all 6 builders;
  remove orphaned old buildValidateMilestonePrompt tail left by
  partial prior edit
- tests: add auto-prompts-phase3.test.mjs with 7 contract tests
  covering plan-milestone, replan-slice, validate-milestone, and
  research-slice prompt generation

Pre-computation pattern: complex async logic (blocker scan, slice
aggregation, verification classes, prior validation) is computed
imperatively before composeUnitContext, then returned from
resolveArtifact. This preserves parallel execution of other artifacts.

buildPlanMilestonePrompt keeps framingBlock imperative: the framing
check wraps the composed inlinedContext rather than going inside the
composer boundary.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-13 01:02:48 +02:00
Mikael Hugo
ca5d869e34 feat(prompts): fragment infrastructure + RFC #4782 stub manifests
Phase 1 — Fragment infrastructure:
- Add {{include:fragment-name}} support to prompt-loader.js
  - fragmentsDir registered alongside promptsDir/templatesDir
  - warmCache() now reads prompts/fragments/*.md with 'frg:' prefix
  - Pre-resolution pass in loadPrompt() resolves {{include:}} before
    the {{var}} validator (colon is outside validator regex [a-zA-Z0-9_],
    so unresolved includes are caught as parse errors)
  - Lazy-load fallback for fragments mirrors existing prompt lazy-load
- Create prompts/fragments/working-directory.md (Variant A: full
  contract including 'Do NOT cd to any other directory')
- Create prompts/fragments/working-directory-ops.md (Variant B:
  ops prompts, no cd restriction)
- Replace duplicated 3-line Working Directory boilerplate in 17 prompts
  with {{include:working-directory}} (12 files) or
  {{include:working-directory-ops}} (5 ops files)
- One fix to Working Directory wording now propagates to all 17 prompts
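
A toy version of the pre-resolution pass — the fragment store and the
fragment name are invented stand-ins, not the prompt-loader.js internals:

```javascript
// Expand {{include:...}} before the plain {{var}} validator runs.
// Because ':' and '-' fall outside the {{var}} character class
// ([a-zA-Z0-9_]), any include left unresolved here surfaces as a
// parse error instead of slipping through silently.
const fragments = new Map([["working-directory", "Work in {{cwd}} only."]]);

function resolveIncludes(template) {
  return template.replace(/\{\{include:([a-z0-9-]+)\}\}/g, (_, name) => {
    const body = fragments.get(name);
    if (body === undefined) throw new Error(`unknown fragment: ${name}`);
    return body;
  });
}

const resolved = resolveIncludes("{{include:working-directory}} Then build.");
```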

Phase 2 — RFC #4782 stub manifests:
- Add deploy, smoke-production, release, rollback, challenge to
  KNOWN_UNIT_TYPES and UNIT_MANIFESTS in unit-context-manifest.js
- All 5 builders already called composeInlinedContext() but returned ""
  because resolveManifest() found no entry; now they return live content
- All 26 unit types now have manifests (resolveManifest returns non-null
  for every type in KNOWN_UNIT_TYPES)

Tests:
- 5 new tests in prompt-loader-fragments.test.mjs (include resolution,
  lazy-load fallback, unknown fragment error, nested var inheritance,
  variant-B fragment)
- Full unit suite: 427 files passed, 4476 tests passed, 0 regressions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-13 00:30:19 +02:00
Mikael Hugo
55229f6604 fix(auto): split autonomous solver from executor per ADR-0079
- Lock solver model to kimi-k2.6 independent of unit-type router
- Executor prompt no longer requires checkpoint tool call
- Add dedicated solver pass that reads executor transcript and emits canonical checkpoint
- Classify executor refusals as blocker outcomes (already partially implemented)
- Classify no-op iterations (continue with zero work) as missing-checkpoint-retry
- Add tests for executor prompt block, solver pass prompt, no-op detection, and no-op assessment

Fixes sf-mp34nxb6-27zdx7
2026-05-12 23:55:02 +02:00
Mikael Hugo
e2f2cb7e2e feat: Create Command Behavior Verification Matrix across CLI, TUI, and…
SF-Task: S04/T01
2026-05-12 23:01:31 +02:00
Mikael Hugo
f789bf0f40 sf snapshot: uncommitted changes after 53m inactivity 2026-05-12 22:51:31 +02:00
Mikael Hugo
9a678f1449 sf snapshot: uncommitted changes after 270m inactivity 2026-05-12 21:58:31 +02:00
Mikael Hugo
93d547c65e fix(headless): skip Ask→Build mode gate in SF_HEADLESS mode
In headless mode the showConfirm dialog blocks forever since there is
no TUI to answer it. The user already consented by calling /next or
/autonomous explicitly — the gate adds no value and hangs the run.

Add process.env.SF_HEADLESS !== '1' to the gate condition so headless
runs bypass it and proceed directly to autonomous execution.

Verified: `sf headless --command next` now completes slice S03
(719 526 tokens, 10 tool calls, $0.027) without hanging.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-12 17:28:09 +02:00
Mikael Hugo
d22df007a7 fix(headless): correct log message to show actual command format
The log message said '/sf ${command}' but the actual command sent is
'/${command}' (without the sf namespace). Fix to match actual dispatch.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-12 17:04:11 +02:00
Mikael Hugo
16db710468 sf snapshot: uncommitted changes after 49m inactivity 2026-05-12 16:45:04 +02:00
Mikael Hugo
0426aafad2 fix(headless): drop /sf prefix so typed commands route through extension dispatch
headless.ts was sending `/sf {subcommand} {args}` to the RPC session, but
commands are registered without the sf namespace (e.g. 'todo', 'autonomous').
_tryExecuteExtensionCommand parsed commandName='sf', found no match, and the
LLM handled the request instead of the typed backend.

Fix: send `/{subcommand} {args}` directly — matches what registerSFCommands
registers and what the TUI already uses.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-12 15:55:46 +02:00
Mikael Hugo
2bb9cdbeef feat(scaffold): ADR-022 scaffold profiles (all phases)
Add profile-aware scaffold system so SF does not lay down irrelevant
templates in infra/ops/docs repos.

## What ships

Phase 1 — data model
- scaffold-versioning.js: add 'disabled' to VALID_STATES; readScaffoldManifest
  returns profile field; recordScaffoldApply preserves manifest.profile (fixes
  roundtrip bug where profile was stripped on every write).
- scaffold-constants.js: PROFILES (app/library/infra/docs/minimal as Set<string>)
  and PROFILE_NAMES exports.

Phase 2 — profile-aware drift detection
- scaffold-drift.js: disabled bucket in emptyCounts, resolveActiveProfileSet
  integration, profile param on detectScaffoldDrift/migrateLegacyScaffold.
- doc-checker.js: filter to active profile, skip disabled-state files.

Phase 3 — auto-detection on first run
- scaffold-profiles.js: detectRepoProfile() heuristics (nix→infra,
  terraform→infra, react→app, node-no-ui→library, docs-only→docs, else→app).
- agentic-docs-scaffold.js: reads profile from manifest, auto-detects on first
  run, persists to manifest, filters SCAFFOLD_FILES to active profile.

Phase 4 — migrate command
- commands-scaffold-migrate.js: sf scaffold migrate --profile <name>
  Re-enables pending files entering the new profile; stamps state=disabled
  (or prunes with --prune) files leaving it; warns on editing/completed files.
- commands/handlers/ops.js, commands/catalog.js: registered and tab-completed.

Phase 5 — custom profiles + PREFERENCES.md frontmatter
- scaffold-profiles.js: readPreferencesProfile(), loadCustomProfileSet()
  (~/.sf/profiles/<name>.yaml with extends/add/remove), resolveActiveProfileSet()
  implementing full ADR-022 §6 precedence.
- All callers updated to use resolveActiveProfileSet as the single source of truth.

Tests: 28 new tests in adr-022-scaffold-profiles.test.mjs — all passing.
Pre-existing node:test stubs (3 files) unaffected.

ADR: docs/dev/ADR-022-scaffold-profiles.md

Misc: triage TODO.md dump into BACKLOG.md (phases-helpers export error T1,
/todo triage typed-handler gap T1, structured triage tiers T2, sha-track
markdown files T2, cross-repo triage T3). Reset TODO.md to empty template.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-12 15:28:03 +02:00
Mikael Hugo
ad53b792fb docs(.agents): add AGENTS.md — directory map and override pattern
Documents every folder under .agents/, what it contains, and the
override-by-same-name pattern. Explains YOLO as a flag not a mode.

is globally ignored but the spec file under .agents/ must be tracked.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 23:48:36 +02:00
Mikael Hugo
4f04fb4c34 chore(.agents): keep lean — remove default mode files, no modes list
.agents/ is an override layer. Default modes (ask/build/autonomous)
and default skills come from SF's built-in config. Project files only
exist when overriding or adding something project-specific.

- Remove modes/ask.md, modes/build.md, modes/autonomous.md (defaults)
- Remove enabled.modes from manifest (nothing project-defined)
- Policies and skills stay: they are project-specific overrides

To override a mode or skill, add a file with the same name.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 23:47:29 +02:00
Mikael Hugo
82d629c3ee feat(.agents): add autonomous mode; clarify yolo is a flag not a mode
- Add modes/autonomous.md — third SF mode (ask/build/autonomous).
  Describes UOK dispatch loop, bash 120s timeout, fresh-context-per-unit,
  recovery/runaway-guard, and when to use vs Build.
- Add autonomous to enabled.modes in manifest.yaml.
- Update policies/yolo.yaml description: YOLO is a flag on Build or
  Autonomous, not a mode, not a Shift+Tab stop.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 23:45:24 +02:00
Mikael Hugo
8ea4b0745d fix(.agents): list all 5 skills in manifest.yaml enabled.skills
sf-wiki, forge-autonomous-runtime, forge-command-surface, nix-build,
and smoke-test are all present in .agents/skills/ and must be declared
in enabled.skills per the AGENTS-1 spec.
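
A sketch of the resulting manifest fragment (key layout assumed; only the five skill names come from this commit message):

```yaml
# Hypothetical .agents/manifest.yaml fragment.
enabled:
  skills:
    - sf-wiki
    - forge-autonomous-runtime
    - forge-command-surface
    - nix-build
    - smoke-test
```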

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 20:12:37 +02:00
Mikael Hugo
a9ebfb4442 fix(skills): move sf-wiki project override to .agents/skills/ (standard location)
.agents/skills/ is the documented standard for project-level skill overrides
(docs/user-docs/skills.md). .sf/skills/ is also searched but .agents/skills/
is the ecosystem-standard path used across all compatible agents.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 20:10:21 +02:00
Mikael Hugo
f3d84cd116 .agents: adopt agentsfolder/spec v0.1 as canonical agent configuration
Replaces the fragmented (AGENTS.md + CLAUDE.md + .github/copilot-instructions.md
+ .sf/STYLE.md + .sf/PRINCIPLES.md + .sf/NON-GOALS.md) surface with a
single canonical .agents/ tree per https://github.com/agentsfolder/spec.

Structure:
  .agents/manifest.yaml         spec metadata + defaults + project info
  .agents/prompts/
    base.md                     project-agnostic base prompt
    project.md                  SF-specific: purpose-first, DB-first,
                                build pipeline, Ask/Build/YOLO model
    snippets/{style,principles,non-goals}.md
                                short pointers into .sf/{STYLE,PRINCIPLES,
                                NON-GOALS}.md for composition
  .agents/modes/{ask,build}.md  YAML front matter + human-readable body
  .agents/policies/{default-safe,yolo}.yaml
                                conservative default + YOLO override
  .agents/skills/.gitkeep       empty per spec — SF's own skills not yet
                                migrated to agentskills.io format
  .agents/scopes/.gitkeep       single-tree, no scopes yet
  .agents/profiles/.gitkeep     no overlays yet
  .agents/schemas/.gitkeep      generated by validators
  .agents/state/.gitignore      excludes state.yaml from VCS per spec
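
The skeleton above can be scaffolded with a few commands (a sketch only — real manifest, prompt, mode, and policy contents are elided as placeholders):

```shell
# Scaffold the .agents/ tree described in this commit. Placeholder files
# only; actual file contents are not reproduced here.
mkdir -p .agents/prompts/snippets .agents/modes .agents/policies
mkdir -p .agents/skills .agents/scopes .agents/profiles .agents/schemas .agents/state
touch .agents/manifest.yaml
touch .agents/prompts/base.md .agents/prompts/project.md
touch .agents/prompts/snippets/style.md .agents/prompts/snippets/principles.md .agents/prompts/snippets/non-goals.md
touch .agents/modes/ask.md .agents/modes/build.md
touch .agents/policies/default-safe.yaml .agents/policies/yolo.yaml
touch .agents/skills/.gitkeep .agents/scopes/.gitkeep .agents/profiles/.gitkeep .agents/schemas/.gitkeep
printf 'state.yaml\n' > .agents/state/.gitignore   # excludes state.yaml from VCS per spec
```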

Status: spec is pre-1.0 (specVersion 0.1.0 pinned). No agent runtime
currently reads .agents/ — this is structural adoption ahead of
ecosystem support. Legacy files (AGENTS.md, CLAUDE.md, etc.) kept
during the transition; .agents/ is now the canonical surface and they
will eventually point here.

This is the reference template; centralcloud/infra, operations-memory,
oncall-mobile-android to follow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 20:04:35 +02:00
Mikael Hugo
edd0eb22ac feat(skills): add project-level sf-wiki skill override with UPPERCASE convention
.sf/skills/ is the project-local skill override directory. This override
inherits all sf-wiki defaults and adds one project-specific rule: wiki
pages use UPPERCASE filenames (INDEX.md, ARCHITECTURE.md, etc.) to match
the .sf/ operational file convention (DECISIONS.md, KNOWLEDGE.md, etc.).

The built-in src/resources/skills/sf-wiki/SKILL.md stays generic (lowercase).
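
The convention amounts to uppercasing the page basename while keeping the .md extension; a minimal sketch (function name hypothetical, not part of the sf-wiki skill itself):

```javascript
// Hypothetical helper applying the project's UPPERCASE wiki-filename
// convention (INDEX.md, ARCHITECTURE.md) to a lowercase page name.
function toWikiFilename(page) {
  const base = page.replace(/\.md$/i, ""); // strip extension if present
  return `${base.toUpperCase()}.md`;
}
```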

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-11 19:54:18 +02:00