Codex audit follow-up (fix A): manual-attention outcomes were counted
by getGateRunStats but dropped from the user-facing surface — they
inflated `total` invisibly with no distinct column or key, so an
operator couldn't tell a gate with 5 pass / 3 manual-attention apart
from a gate with 5 pass / 3 fail.
Adds `manualAttention: number` to GateHealthEntry and renders it as
its own column between Fail and Retry in the human table. JSON
consumers get the new key alongside pass/fail/retry.
Test count for headless-uok-status.test.mjs: 30/30 (+2 new — column
present in header, distinguishable from fail in row).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds focused unit tests for the slice-3b wiring:
- UokGateRunner.run emits surface/runControl/permissionProfile/
parentTrace on all three trace paths (normal, unknown-gate,
circuit-breaker-blocked) and omits them when absent.
- buildAutonomousUokContext pins surface=autonomous + runControl=
autonomous and derives permissionProfile from session/prefs
(YOLO → low, prefs.permissionLevel honored, "high" default).
- emitAutonomousGate forwards the schema-v2 ctx into UokGateRunner
(covers the phases-pre-dispatch / phases-guards call sites via
the new shared helper).
- handlePlanSlice options.uokContext lands on every seeded Q3-Q8
quality_gates row; without it, rows stay in the legacy null shape.
Refactors phases-pre-dispatch and phases-guards to call the new
emitAutonomousGate helper so the three sites stay in sync going
forward. phases-finalize keeps its inline UokGateRunner because the
verification gate's execute callback isn't a static verdict.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Slice 3b of "Make UOK the SF Control Plane". handlePlanSlice now
accepts an optional uokContext option and threads it into every
insertGateRow call (Q3, Q4 slice gates; Q5, Q6, Q7 per task; Q8
slice closeout).
executePlanSlice derives the ctx from the singleton autonomous session
when one is active — currentTraceId becomes the v2 traceId/parentTrace,
surface and runControl are pinned to "autonomous", permissionProfile
follows session/prefs. Tools invoked outside an autonomous loop
(interactive REPL, headless one-shot) pass uokContext=null and the
seeded rows fall through to the legacy NULL-column shape, classified
as "legacy" by status uok.
Lazy import of auto/session.js keeps headless/test code paths from
paying the session-singleton load cost when they don't need it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Slice 3b of "Make UOK the SF Control Plane". The autonomous loop's
three high-traffic gate sites (resource-version-guard,
pre-dispatch-health-gate, planning-flow-gate in phases-pre-dispatch;
plan-gate in phases-guards; unit-verification-gate in phases-finalize)
now build a schema-v2 UOK run-context per iteration and pass
surface/runControl/permissionProfile/parentTrace into the gate runner.
The gate-runner emits these onto every gate_run trace event, so the
classifier in `sf headless status uok --json` reads them as
coverageStatus: "ok" instead of "legacy".
New helper uok/auto-uok-ctx.js pins surface="autonomous" and
runControl="autonomous" for these phases and derives permissionProfile
from session/prefs: "low" under YOLO or a minimal/low permissionLevel,
"medium" for medium, "high" otherwise (the default).
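The permission-profile derivation can be sketched as below. The field names (session.yoloMode, prefs.permissionLevel) are stand-ins, not the real SF/session shapes; only the mapping rules come from this commit:

```javascript
// Illustrative sketch of the derivation in uok/auto-uok-ctx.js.
// Field names are assumptions; the precedence rules mirror the text above.
function derivePermissionProfile(session, prefs) {
  // YOLO mode always maps to the least-restrictive profile.
  if (session && session.yoloMode) return "low";
  const level = prefs && prefs.permissionLevel;
  if (level === "minimal" || level === "low") return "low";
  if (level === "medium") return "medium";
  // Anything else (including unset) falls back to the safe default.
  return "high";
}
```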
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex audit (Q4) flagged that the mutation gate landed in slice 3a but
the test suite only verified the three earlier gates. Add coverage:
- agree-path: mutation-gate fires with outcome=fail, rejectedCount=1,
resolvedCount=0 (the test fixture has no real ledger entry for the
decision id, so markResolved rejects it — the gate correctly surfaces
the partial failure)
- disagree-path: mutation-gate does NOT fire (apply phase skipped)
Pins the 4-gate contract end-to-end. Suite: 4/4 in this file.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Slice 3b of "Make UOK the SF Control Plane". UokGateRunner.run now reads
the schema-v2 run-context fields off ctx and propagates them into every
gate_run trace event (unknown-gate path, circuit-breaker-blocked path,
normal execution path). Fields are omitted when absent so legacy callers
keep the pre-v2 shape and status-uok continues to classify them as
"legacy" rather than "incomplete".
Helper buildGateRunEvent centralizes the trace shape so the three sites
stay in sync.
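The omit-when-absent behavior can be sketched roughly like this; the helper name is from the commit, the body is a guess:

```javascript
// Sketch of buildGateRunEvent: copy the base event, add schema-v2
// fields only when the ctx carries them, so legacy callers keep the
// pre-v2 trace shape (and status-uok classifies them as "legacy").
function buildGateRunEvent(base, ctx) {
  const event = { ...base };
  if (!ctx) return event;
  for (const key of ["surface", "runControl", "permissionProfile", "parentTrace"]) {
    if (ctx[key] != null) event[key] = ctx[key];
  }
  return event;
}
```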
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the missing test case that confirms the fail-closed semantics
the parallel worker shipped in slice 3a: when the trace writer
cannot persist a UOK gate record (e.g. .sf/traces is unwritable),
runTriageApply MUST abort before any subagent runs and surface the
emission failure as the run error.
This pins down the contract codex Q5 noted as soft: enrichment
failures are debug-only, but PRIMARY gate emission for the apply
flow is hard-required. Without observable gates, an apply that
mutates the ledger has no audit trail — refusing is the right call.
Test asserts: trace-dir write failure → ok=false, error contains
"UOK gate emission failed for trusted-agent-source-gate", and the
mocked agentRunner was never invoked.
Suite: 1682/1682.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First production caller of the schema-v2 writer chain. Every
`sf headless triage --apply` invocation now emits four gate_run trace
events with surface=headless, runControl=supervised, permissionProfile=
high, traceId=flowId — making the gates visible in `status uok --json`
with coverageStatus: "ok" (or fail/manual-attention on reject paths).
Gates emitted, in order:
1. trusted-agent-source-gate — fires on the trust precondition:
pass: both triage-decider and rubber-duck are SF-shipped built-ins
fail: missing-agent OR non-builtin source OR untrusted custom runner
(covers all three pre-dispatch refusal paths so operators see the
failure in status uok, not just in the journal)
2. triage-plan-validation-gate — fires on the strict-parse contract:
pass: parseTriagePlanStrict returns a valid plan covering expectedIds
fail: missing marker / bad yaml / unknown id / outcome-required field missing
3. triage-apply-review-gate — fires on the rubber-duck verdict:
pass: rubber-duck: agree → apply phase proceeds
fail: rubber-duck disagreed → clean pause, no mutations
manual-attention: rubber-duck subagent failed to complete
4. triage-apply-mutation-gate — fires after applyTriagePlan:
pass: every approved mutation landed
fail: any rejected mutation
manual-attention: zero approved mutations (all decisions were "fix")
Includes counts in extra: resolvedCount, rejectedCount, pendingFixCount.
Reader-side fixes (codex review follow-up on slice 3a):
- getDistinctGateIds (sf-db-gates.js) now UNIONs trace-event IDs with
quality_gates DB IDs instead of returning trace IDs early when any
exist. The old behavior silently hid slice-scoped DB-only gates the
moment a flow-scoped trace landed.
- getGateMeta (headless-uok-status.ts) now reads BOTH trace events and
DB row, then picks whichever has the later evaluatedAt. Tie-break
prefers trace (flow-scoped gates with no quality_gates FK row are
trace-only). Old behavior preferred trace whenever surface was set,
regardless of timestamp.
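The getGateMeta pick can be sketched as a small pure function; the row shape here is illustrative, only the precedence rule (later evaluatedAt wins, tie prefers trace) is from the commit:

```javascript
// Sketch of the trace-vs-DB metadata pick. Missing timestamps sort
// last; a tie (or unparseable dates) prefers the trace side because
// flow-scoped gates may have no quality_gates row at all.
function pickGateMeta(traceMeta, dbMeta) {
  if (!traceMeta) return dbMeta || null;
  if (!dbMeta) return traceMeta;
  const t = traceMeta.evaluatedAt ? Date.parse(traceMeta.evaluatedAt) : -Infinity;
  const d = dbMeta.evaluatedAt ? Date.parse(dbMeta.evaluatedAt) : -Infinity;
  return d > t ? dbMeta : traceMeta;
}
```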
Live verification: ran `sf headless triage --apply` 4 times against the
operator's environment (rubber-duck is a project-level override).
trusted-agent-source-gate now shows in `sf headless status uok --json`
with total: 4, fail: 4, coverageStatus: "ok" — proving the schema-v2
metadata round-trips through the trace events and reaches the
classifier.
Tests:
- headless-triage-uok-gates.test.ts (3 new tests): agree path emits
3 pass gates with v2 metadata; disagree path emits review fail;
unknown-id path emits validation fail with no review gate.
- Existing test suites adjusted for the GateMetadataRow →
GateRunContextRow rename (classifier helpers renamed consistently
across .ts source and the .mjs test mirror).
- Full SF + headless apply: 1681/1681.
Still legacy in production (slice 3b targets these next):
- phases-pre-dispatch.js gates: resource-version-guard, pre-dispatch-
health-gate, planning-flow-gate. None of these pass uokContext yet.
- phases-unit.js gates: unit-verification-gate, plan-gate.
- plan-slice.js: Q3/Q4/Q5/Q6/Q7/Q8 seed gates.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Second slice of "Make UOK the SF Control Plane". Wires the DB-level
capability for schema-v2 gate metadata so future callers can flip
quality_gates rows from "legacy" to "ok"/"stale"/"incomplete" by
passing a canonical uokContext. No production caller passes ctx yet —
slice 3 wires producers (headless triage --apply, phases-pre-dispatch,
phases-unit).
Schema migration v66 (SCHEMA_VERSION bumped 65 → 66):
- quality_gates gains 5 nullable columns: surface, run_control,
permission_profile, trace_id, parent_trace.
- Idempotent ALTERs via PRAGMA table_info probes — fresh-DB CREATE
path already includes the columns; migration only ALTERs older DBs.
- Existing rows keep NULL across the new columns, so classifyCoverage
in headless-uok-status reads them as "legacy" — no day-one warning
flood.
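The idempotent-ALTER idea can be sketched as: probe the existing columns (e.g. via PRAGMA table_info), then emit ALTERs only for the missing ones. The column names are the five from this migration; the helper itself is hypothetical:

```javascript
// Sketch of the idempotent migration step. `existingColumns` would come
// from PRAGMA table_info(quality_gates); the helper returns only the
// ALTERs still needed, so re-running the migration is a no-op.
const SCHEMA_V2_COLUMNS = ["surface", "run_control", "permission_profile", "trace_id", "parent_trace"];

function missingColumnAlters(existingColumns, table = "quality_gates") {
  const have = new Set(existingColumns);
  return SCHEMA_V2_COLUMNS
    .filter((col) => !have.has(col))
    .map((col) => `ALTER TABLE ${table} ADD COLUMN ${col} TEXT`);
}
```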
New adapter src/resources/extensions/sf/uok/run-context.js:
- buildUokRunContext(opts) validates and normalizes the canonical
camelCase shape: surface, runControl, permissionProfile, traceId
(required), plus parentTrace, unitType, unitId, milestoneId,
sliceId, taskId (optional). Frozen on success, null on any invalid
or missing required field.
- VALID_SURFACES / VALID_RUN_CONTROLS / VALID_PERMISSION_PROFILES
enums reject typos at build time so we don't get silent schema-v2
rows with garbage in the enum columns.
- uokRunContextToGateColumns(ctx) translates camelCase → snake_case
column shape used by sf-db-gates writers.
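The camelCase to snake_case translation amounts to a fixed field mapping; this is a guessed body, the real adapter may carry more fields:

```javascript
// Sketch of uokRunContextToGateColumns: canonical camelCase ctx in,
// SQL column shape out. Null ctx yields null so legacy callers keep
// NULL columns.
function uokRunContextToGateColumns(ctx) {
  if (!ctx) return null;
  return {
    surface: ctx.surface ?? null,
    run_control: ctx.runControl ?? null,
    permission_profile: ctx.permissionProfile ?? null,
    trace_id: ctx.traceId ?? null,
    parent_trace: ctx.parentTrace ?? null,
  };
}
```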
Writer chain (sf-db-gates.js):
- insertGateRow now imports uokRunContextToGateColumns and translates
g.uokContext (canonical camelCase) to the SQL column shape. Callers
pass canonical ctx, the DB writer owns translation. NULL on legacy
callers, NULL on malformed ctx.
- saveGateResult mirrors the same translation; uses COALESCE(:col,
col) so a missing ctx on a follow-up update preserves the row's
existing schema-v2 metadata instead of nulling it.
Reader chain (headless-uok-status.ts):
- getGateMeta SELECTs surface, run_control, permission_profile,
trace_id alongside scope and evaluated_at. ORDER BY uses
"evaluated_at IS NULL, evaluated_at DESC" for cross-SQLite safety
(NULLS LAST is not portable).
- classifyCoverage signature changed from (entry, metadataPresent:
bool) to (entry, meta: GateMetadataRow). Returns "incomplete" when
surface is set but runControl/permissionProfile/traceId missing —
surfaces buggy writers instead of silently classifying as "ok".
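The classification rules condense to a few guards. This sketch simplifies staleness to a runs-in-window check (the real classifier also applies the 24h last-run threshold), and the row shape is assumed:

```javascript
// Sketch of classifyCoverage after the slice-2 signature change.
// No v2 metadata at all => "legacy" (not a warning); partial v2
// metadata => "incomplete" (a buggy writer); otherwise freshness
// decides between "ok" and "stale".
function classifyCoverage(entry, meta) {
  if (!meta || !meta.surface) return "legacy";
  if (!meta.runControl || !meta.permissionProfile || !meta.traceId) return "incomplete";
  return entry.runsInWindow > 0 ? "ok" : "stale";
}
```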
Tests:
- uok-run-context.test.mjs (12 tests): adapter validation, enum
rejection, optional-field handling, frozen output, column
translation.
- uok-quality-gates-writer.test.mjs (5 tests): real DB round-trip
proving insertGateRow + saveGateResult populate schema-v2 columns
from canonical camelCase ctx, leave NULL on legacy/malformed,
and preserve existing metadata via COALESCE on no-ctx updates.
- headless-uok-status.test.mjs adjusted: classifier now takes
GateMetadataRow; added test for "incomplete" classification.
- sf-db-migration.test.mjs bumped expected version 65 → 66 and
asserts the 5 new quality_gates columns exist.
Full SF suite: 1678/1678 ✓ (+17 from slice 2, +9 from slice 1).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First slice of "Make UOK the SF Control Plane". Ships the operator-
facing visibility primitive that subsequent slices fill in. No
enforcement yet, no new gates yet — just the contract.
Changes to sf headless status uok:
- Bumps JSON output to schemaVersion: 2.
- Adds coverageStatus per gate (ok | stale | incomplete | missing
| legacy). Slice 1 only populates ok / stale / legacy:
  - legacy: row predates schema-v2 metadata (every existing row
    today). NOT a warning — operators are not paged for the
    rich history of pre-v2 records.
  - stale: schema-v2 row with no runs in window, OR last run
    older than the 24h stale threshold. Surfaces gates that
    stopped being exercised.
  - ok: schema-v2 row with recent runs in window.
  incomplete and missing wait for the schema-v2 writer adapter
  (slice 2) and the configured-gate registry (later).
- Adds the Coverage column to the human table output.
- Removes the stale "missing getDistinctGateIds import" workaround
comment from headless-uok-status.ts:104. The import exists today
(gate-runner.js:5); the comment was lying. Bypassing
UokGateRunner.getHealthSummary is still appropriate but for a
different reason — documented inline.
Tests (28 total, +9 new):
- classifyCoverage: legacy wins over freshness; ok requires
metadata + recent runs; stale fires on no-runs-in-window or
last-run > 24h.
- empty-DB does not false-positive coverage warnings (the bug
codex called out in the plan review).
- formatTable includes the Coverage column and renders each status
distinctly.
hasSchemaV2Metadata is a placeholder that returns false today; it
will read row.surface / row.run_control / row.permission_profile
when those columns ship in slice 2.
Next slice: adapter foundation — start writing schema-v2 metadata
into new gate rows from headless and autonomous paths.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three coupled changes that together complete the operator-facing
--apply surface for sf headless triage:
1. headless.ts: parse --apply from commandArgs and forward to
handleTriage. The triage option flow now distinguishes inspect
(--list, --json), one-shot (--run), and orchestrated apply
(--apply) cleanly.
2. help-text.ts: triage subcommand line + examples block now document
the --apply mode (triage-decider → rubber-duck pipeline).
3. bootstrap/db-tools.js: resolve_issue tool now accepts the full
canonical evidence-kind set instead of hardcoding "agent-fix":
- agent-fix (default; commit-based fix evidence)
- human-clear (stale, superseded, false positive, intentional close)
- promoted-to-requirement (with required requirement_id)
The tool surfaces a clear error when promoted-to-requirement is
used without requirement_id. The promptGuidelines updated to walk
callers through choosing the right kind.
self-feedback-db.test.mjs extended with coverage for all three
evidence kinds + the missing-requirement_id rejection path.
Together these make sf headless triage --apply genuinely useful: the
agent can produce a plan with any outcome, rubber-duck reviews it,
and the runner applies via resolve_issue with the right evidence
kind per decision.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New module: src/resources/extensions/sf/subagent/prompt-parts.js.
Replaces the copilot-shaped boolean include* matrix with a canonical
SF-native form:
promptParts: [aiSafety, toolInstructions, parallelToolCalling,
customAgentInstructions, environmentContext,
agentBody, ...]
Each part is a registered renderer (PROMPT_PARTS) that emits a
specific section text given context. composeAgentPrompt orders parts
deterministically, deduplicates, and concatenates with consistent
separators. validatePromptParts rejects unknown keys at agent-load
time so typos surface immediately instead of silently producing an
empty section.
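A minimal sketch of the registry/compose/validate trio; the real renderers take richer context and the registry has more parts, these are stand-ins:

```javascript
// Sketch of the prompt-parts runtime. Order is the registry's own
// (deterministic), a Set dedupes repeated keys, and unknown keys are
// rejected up front instead of silently rendering nothing.
const PROMPT_PARTS = {
  aiSafety: () => "## Safety\nFollow the safety policy.",
  toolInstructions: (ctx) => `## Tools\nAvailable: ${ctx.tools.join(", ")}`,
  agentBody: (ctx) => ctx.body,
};

function validatePromptParts(parts) {
  const unknown = parts.filter((p) => !(p in PROMPT_PARTS));
  return unknown.length ? { ok: false, unknown } : { ok: true };
}

function composeAgentPrompt(parts, ctx) {
  const wanted = new Set(parts);
  return Object.keys(PROMPT_PARTS)
    .filter((key) => wanted.has(key))
    .map((key) => PROMPT_PARTS[key](ctx))
    .join("\n\n");
}
```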
Integrated into:
- subagent/agents.js: validateAgentDefinition runs the new
validator at agent discovery; built-in agents must validate
(project/user agents with invalid promptParts get skipped).
- subagent/index.js: dispatch path uses composeAgentPrompt to
assemble the runtime system prompt.
- unit-context-manifest.js: unit-type manifests declare their
promptParts allowlist; validation runs against the same registry
so unit dispatch and agent dispatch share one canonical schema.
- agents/rubber-duck.agent.yaml: converted from the boolean
include* form to the canonical array form.
Tests:
- subagent-agent-yaml.test.mjs: validates the array shape, rejects
unknown part keys, asserts built-in agents validate cleanly,
project overrides win.
- unit-context-manifest-prompt-parts.test.mjs (new): asserts every
unit-type manifest's promptParts is valid per the registry.
The copilot boolean-include shape is intentionally NOT supported:
this is the SF-native canonical form, simpler to read and harder to
typo (no silent no-op for misspelled keys).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "Memory enrichment failed for gate test: DB error" warning in test
output was a real API mismatch, not a benign degradation. The previous
code called getRelevantMemoriesRanked(embedding, "gotcha", 2) but the
canonical signature is getRelevantMemoriesRanked(query, limit).
Replace the embedding-based call with a query-string built from
gateId + failureClass + rationale, and pass limit=2. The embedding
helper (computeGateEmbedding) is removed entirely since the memory
store does its own embedding internally.
Also switch the enrichment-failure log from logWarning to debugLog —
gate enrichment is best-effort and must not affect gates, so the
failure path should not surface as a warning to operators.
Test fixture updated to assert against the new API call shape.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The cache-split signal {before, after} was named promptParts in the
autonomous-unit dispatch path, overloading the same term that
.agent.yaml uses for declarative prompt-section composition. With the
prompt-parts runtime landing as canonical (`aiSafety`,
`toolInstructions`, ...), the overload becomes confusing —
promptParts now means "list of declarative section keys", not
"before/after cache-split tuple".
Renames in run-unit.js, phases-unit.js (call site), and
run-unit.test.mjs. No behavior change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex review follow-up (2026-05-14) addressed all three remaining
issues from the earlier rescue pass:
1. Strict plan validation. parseTriagePlanStrict refuses the WHOLE
plan on any malformed item instead of silently dropping. Enforces:
- completion marker "Self-feedback triage complete" present
- exactly one fenced ```yaml block
- every decision has non-empty id + outcome ∈ {fix, promote, close}
- outcome-specific required fields (close → reason; promote →
reason + requirement_id; fix → proposed_approach)
- duplicate ids rejected
- when expectedIds is supplied, decisions must cover the candidate
set exactly — no extras (hallucinated ids), no missing
Returns ParseTriagePlanResult with {plan, error} so the caller can
surface the specific failure reason.
2. Custom-runner trust guard. runTriageApply refuses an injected
options.agentRunner unless allowUntrustedRunner is also explicitly
set. Production callers cannot inject a runner. Without this guard
a custom runner could side-channel-mutate the ledger despite the
read-only tool override (codex Q2).
3. Per-decision failure surfacing. applyTriagePlan now returns
{resolvedIds, rejectedIds, pendingFixIds} instead of just
resolvedIds. runTriageApply reports ok=false if rejectedIds is
non-empty, with the count + ids in the error message. Mutations
still happen one-by-one (no SQL transaction wrapping) but the
failure is no longer silent (codex Q3).
Tests: src/tests/headless-triage-apply.test.ts now covers:
- agree-path runs both agents in order; apply fails on missing
ledger entry → ok=false, rejectedIds populated (the realistic
contract for a test fixture without a seeded DB)
- custom runner without allowUntrustedRunner refuses, agentRunner
never invoked
- rubber-duck disagrees → clean pause, ok=false, agreed=false
- decider fails → skip rubber-duck
- unknown id in plan rejected before review
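The whole-plan-or-nothing validation from item 1 can be sketched on already-parsed decisions (the real parseTriagePlanStrict also enforces the completion marker and the single fenced yaml block); field names follow the commit text:

```javascript
// Sketch of the strict validation: any malformed item rejects the
// WHOLE plan, and expectedIds (when given) must be covered exactly.
const REQUIRED_BY_OUTCOME = {
  close: ["reason"],
  promote: ["reason", "requirement_id"],
  fix: ["proposed_approach"],
};

function validateTriagePlan(decisions, expectedIds) {
  const seen = new Set();
  for (const d of decisions) {
    if (!d.id || !REQUIRED_BY_OUTCOME[d.outcome]) {
      return { plan: null, error: `bad decision: ${JSON.stringify(d)}` };
    }
    if (seen.has(d.id)) return { plan: null, error: `duplicate id: ${d.id}` };
    seen.add(d.id);
    for (const field of REQUIRED_BY_OUTCOME[d.outcome]) {
      if (!d[field]) return { plan: null, error: `${d.id}: missing ${field}` };
    }
  }
  if (expectedIds) {
    const expected = new Set(expectedIds);
    for (const id of seen) {
      if (!expected.has(id)) return { plan: null, error: `unknown id: ${id}` };
    }
    for (const id of expected) {
      if (!seen.has(id)) return { plan: null, error: `missing id: ${id}` };
    }
  }
  return { plan: { decisions }, error: null };
}
```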
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex review (2026-05-14) flagged the original runTriageApply design as
unsafe: triage-decider was invoked with resolve_issue in its tool list,
so it could (and would) close ledger entries during its own turn —
BEFORE rubber-duck saw the decisions. If rubber-duck disagreed, the
mutations from phase 1 had already landed with no rollback path.
Restructured to a 3-phase plan-and-review pipeline:
Phase 1 — Plan: triage-decider runs READ-ONLY (resolve_issue removed
from both the YAML and the runner's tool override) and emits a
structured YAML plan as a fenced block. The plan is the contract;
parseTriagePlan extracts it.
Phase 2 — Review: rubber-duck reads the parsed plan + the original
ledger entries and votes "rubber-duck: agree" or names concerning
decisions. Read-only tools.
Phase 3 — Apply: ONLY on agreement, this runner (not an agent) calls
markResolved for each close/promote decision. Fix decisions are
surfaced to the operator and never auto-mutate.
Other codex-flagged gaps addressed:
- Trusted-source guard: --apply refuses to run when either agent has
source != "builtin". Project/user overrides shadow built-ins (the
documented precedence), but they don't get to silently disable
rubber-duck's independence. Operators can still customize via
--review mode.
- Plan-not-emitted is a hard refuse: if the decider's output has no
parseable ```yaml decisions: block, the apply runner returns
ok=false with a clear error. We can't audit what we can't read.
- Disagreement is a clean pause, not an error: returns ok=false with
agreed=false and both outputs preserved for operator review.
- The triage-decider YAML's prompt now codifies the plan-only contract
explicitly: "You do not call resolve_issue. You produce a structured
decision plan."
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First slice of putting the triage/rubber-duck flow into SF itself
(sf-mp5lnlbc-ty5fec). Two built-in agent definitions ship with SF and
get auto-discovered alongside operator-defined ones — no setup needed.
agents/rubber-duck.agent.yaml
Devil's-advocate critic. Tools: "*". Reviews any artifact (default
consumer: triage --apply pipeline) and surfaces ONLY confidently-real
concerns. High-signal output: "rubber-duck: agree" or `## Concern N:`
sections with evidence citations. Never proposes fixes.
agents/triage-decider.agent.yaml
Self-feedback queue decider. Tools: [resolve_issue, view, grep, glob,
git_log] — read-only investigation plus the one mutating tool needed
to close/promote entries. No edit/write/bash — code fixes go to the
operator. Implements the existing buildInlineFixPrompt protocol
(Fix/Promote/Close per entry).
Both YAMLs include the copilot-style promptParts block as intent
documentation. SF's prompt-composition runtime doesn't honor those
flags yet; the day it lands, the agents pick it up without a YAML edit.
discoverAgents now loads from a built-in directory (sibling agents/
to subagent/) with source: "builtin". User and project definitions
override built-ins by name, preserving the existing precedence model.
Tests assert: (1) both built-ins discovered with source=builtin in
scope=both, (2) project override wins over built-in. Full SF suite:
1637/1637.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Operator's settings.json defaultModel is for general dispatch (typically
a cheap/flash pick — gemini-3-flash-preview in current config). Mixing
it into the triage candidate pool gave it a chance to win on cost
tie-break against agentic-better but pricier options from the explicit
enabledModels allowlist.
Triage is agentic-heavy; restrict its candidate pool to the operator's
enabledModels (kimi-coding/* + minimax/* + zai/* + …) and let the
agentic-weighted router pick. Also fixes the wildcard expansion path
which was calling a non-existent ai.getModelsByProvider — now correctly
uses ai.getModels(provider).
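The wildcard expansion can be sketched like this; `getModels` stands in for ai.getModels(provider) and the entry format is assumed to be "provider/modelId":

```javascript
// Sketch of expanding an enabledModels allowlist against the catalog.
// "provider/*" fans out to every catalog model for that provider;
// concrete entries pass through untouched.
function expandEnabledModels(enabled, getModels) {
  const out = [];
  for (const entry of enabled) {
    const [provider, modelId] = entry.split("/");
    if (modelId === "*") {
      for (const m of getModels(provider)) out.push(`${provider}/${m.id}`);
    } else {
      out.push(entry);
    }
  }
  return out;
}
```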
Dogfood confirms: router now picks kimi-coding/kimi-for-coding
(agentic 90) instead of gemini-3-flash-preview (operator default).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drops the hardcoded "google-gemini-cli/gemini-3-pro-preview" default and
routes through SF's own model-router using a new
BASE_REQUIREMENTS["self-feedback-triage"] (agentic-heavy: coding 0.4,
instruction 0.8, reasoning 0.8, agentic 0.9).
Candidate selection priority:
1. Explicit options.model override (operator --model)
2. options.candidates (test injection)
3. ~/.sf/agent/settings.json enabledModels (expanded against pi-ai
MODELS catalog) + defaultProvider/defaultModel
4. TRIAGE_FALLBACK_CANDIDATES — Chinese-provider set
(kimi + minimax + zai). Gemini intentionally NOT in the fallback
so operators who removed it from settings don't silently re-default.
Dispatch walks the router-ranked list with retry-on-credential-error so
the top pick failing on missing API keys falls through to the next
candidate (caught the openai-no-key case in dogfood today).
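The ranked-list walk with retry-on-credential-error amounts to the loop below; `isCredentialError` and the dispatch callback are placeholders for the real machinery:

```javascript
// Sketch of the fallback walk: a credential failure (missing API key)
// falls through to the next router-ranked candidate; any other error
// still surfaces immediately.
async function dispatchWithFallback(candidates, dispatch, isCredentialError) {
  let lastError = null;
  for (const model of candidates) {
    try {
      return { model, result: await dispatch(model) };
    } catch (err) {
      if (!isCredentialError(err)) throw err;
      lastError = err;
    }
  }
  throw lastError || new Error("no candidates");
}
```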
Closes part 1 of sf-mp5khix3-9beona AC1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two coupled product changes from the working tree, validated together:
1. Agent YAML loader (subagent/agents.js + subagent-agent-yaml.test.mjs)
.sf/agents/*.agent.yaml files now load as first-class agent
definitions alongside the existing .agent.md frontmatter format.
Adds `*` wildcard support for the tools field (unrestricted) and a
parseAgentModel helper for the YAML-only model selector. Mirrors
the copilot-style YAML format so SF can consume agent definitions
shared across tools without forcing the markdown wrapping.
2. Solver-pass tool scoping (run-unit.js + phases-unit.js +
run-unit.test.mjs)
New scopeActiveToolsForRunUnit honors an explicit
activeToolsAllowlist so callers can restrict a unit dispatch to a
tighter tool set than the unit-type's default SF allowlist. The
autonomous solver pass uses this to constrain the solver to just
`checkpoint` — solver should reason and persist checkpoints, not
edit files or dispatch tools. Keeps the solver inside its
authority boundary.
Tests: 7/7 in the two affected files; full SF suite stays green.
Not in this commit: the sidekick-trigger event emission in
autonomous-solver.js and the external scripts/sidekick-runner.js +
.agents/policies/proactive-sidekick.yaml — that's an experiment
that stays in the working tree pending operator direction.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds an optional wireModelId field to the Model interface and a
resolveWireModelId helper. Forge's canonical model.id stays stable for
selection, capability scoring, policy, and history; providers now send
model.wireModelId on the wire when set, model.id otherwise.
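The contract reduces to a one-line helper; sketched here for clarity, assuming wireModelId is simply an optional string field:

```javascript
// Canonical model.id stays the identity for selection, scoring, and
// history; the wire id is used only on the outgoing request.
function resolveWireModelId(model) {
  return model.wireModelId ?? model.id;
}
```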
Use cases: Azure deployment names, vendor model slugs that differ
from Forge's canonical identity, A/B routing where the operator wants
canonical history but a specific deployment.
Wired through every provider in @singularity-forge/ai (anthropic,
amazon-bedrock, azure-openai-responses, google, google-vertex,
google-gemini-cli, mistral, openai-codex-responses, openai-completions,
openai-responses) plus @singularity-forge/coding-agent's
ModelRegistry (model definitions + per-model overrides).
Tests: openai-completions wireModelId payload coverage +
model-registry-auth-mode coverage for the override + definition fields.
Full pi-ai + coding-agent suite: 956/956 ✓ (7 unrelated skipped).
This realizes the model-registry contract drafted in 1d753af6b.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Discovered via dogfood: `sf headless triage --run --json` short-
circuited to the candidate-list JSON before reaching the dispatch
path, so the run never happened.
--run is the action; --json/--list describe output format. Restructure
so --run always dispatches; --json then controls whether the run
result is JSON vs human text. Without --run, --json/--list still emit
the candidate digest as before.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five unit tests covering the bail-time queue notifier landed in
001740680: notify-with-pointer when candidates exist, plural/singular
noun agreement, silent on empty queue, silent on non-forge basePath,
no-throw when downstream notify itself crashes (bail-path safety).
Locks in the contract for the partial-AC1 slice of sf-mp4rxkwb-l4baga
(autonomous loop surfaces the queue at idle) without yet touching the
larger remaining work (real self-feedback-triage unit type with
begin/dispatch/checkpoint/complete).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codifies AC4 of sf-mp4w2dij-xm6cwj: the regex-only path is the
today-default fast mode. SF_SECURITY_FAST=1 is the explicit opt-in for
callers that want to assert "regex-only, no LLM escalation, sub-100ms"
regardless of any future tiered reviewer landing in the script.
Today the env var changes only the trailing status line so operators
can verify the contract is observable. When the LLM-backed review hook
(AC1) lands, the absence of SF_SECURITY_FAST becomes the trigger for
escalation; setting it to 1 keeps offline / pre-commit callers on the
fast path. Locked in by tests in both the .sh and .mjs scanners.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two thin slices toward sf-mp4rxkwb-l4baga:
1. Help text. The triage and reflect commands have shipped over the
last few commits but neither was discoverable via `sf headless help`.
Add both to the command list + add five usage examples covering the
piping and --run patterns.
2. Bail-time queue notifier. When the autonomous loop is about to break
for "no-active-milestone" or "milestone-complete" while open
self-feedback entries still exist, surface the queue with a clear
pointer to `sf headless triage --list` / `--run`. Best-effort wrapper
that never throws — the proper fix (triage as a real unit type with
begin/dispatch/checkpoint/complete lifecycle) is the larger remaining
slice of the parent entry; this just makes the queue VISIBLE at the
exact moment operators historically lost track of it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds runTriage to self-feedback-drain.js, mirroring runReflection in
reflection.js: provider-agnostic dispatch via @singularity-forge/ai's
completeSimple, dependency-injectable for tests, 8-minute timeout race,
clean-finish detection on the canonical "Self-feedback triage complete"
terminator.
`sf headless triage --run [--model provider/modelId]` now dispatches the
canonical triage prompt and writes the model's decision text to
.sf/triage/decisions/<ts>.md. Operators apply the decisions (resolve_issue
calls, code edits) — a tool-enabled variant that lets the model close
entries directly is follow-up work.
Default model: google-gemini-cli/gemini-3-pro-preview (matches
DEFAULT_REFLECTION_MODEL).
Continues the bounded chip away at sf-mp4rxkwb-l4baga: triage now has
both an operator-pipe path (default) and a one-shot dispatch path (--run).
The full unit-type registration that wires this into the autonomous
dispatcher's idle path is the remaining slice of that entry.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a deterministic, turn-independent path to drain the self-feedback
queue. Modes:
- default: emits the canonical buildInlineFixPrompt() output for
piping into any model (sf headless triage | sf headless -p -)
- --list: human-readable digest sorted by impact↓ effort↑ ts↑
- --json: structured candidate list for tooling
- --max N: cap candidates
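The --list ordering (impact descending, effort ascending, timestamp ascending) is a three-key comparator; the field names are assumptions:

```javascript
// Sketch of the candidate sort: highest impact first, cheapest effort
// next, oldest timestamp last as the tie-break.
function compareCandidates(a, b) {
  return (b.impact - a.impact) || (a.effort - b.effort) || (a.ts - b.ts);
}
```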
Why this matters (partial step toward sf-mp4rxkwb-l4baga): the existing
session_start drain queues triage as `triggerTurn:true,
deliverAs:"followUp"`. When autonomous mode bails at milestone
validation before any turn runs, the followUp gets dropped and the
queue stays unprocessed. This command sidesteps that by rendering the
prompt synchronously to stdout — operators can pipe it into any model
without depending on autonomous-loop turn semantics. The full
unit-type registration that fixes the underlying dispatcher gap is
larger work tracked in the parent entry.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the @singularity-forge/google-gemini-cli-provider package layout
for the codex CLI integration boundary. The new package owns:
- CodexAppServerClient (the JSON-RPC subprocess client; previously
packages/ai/src/providers/codex-app-server-client.ts, no pi-ai
internal coupling)
- snapshotCodexCliAccount / discoverCodexCliModels (reads
~/.codex/models_cache.json with visibility=list ∧ supported_in_api
filter; previously inline in src/resources/extensions/sf/openai-codex-catalog.js)
openai-codex-responses.ts (the stream-shaping provider) intentionally
stays in @singularity-forge/ai because it depends on pi-ai stream-event
internals and is not reusable outside the provider — same scope as
google-gemini-cli.ts vs google-gemini-cli-provider.
The SF extension's openai-codex-catalog.js is now a thin SF-side cache
writer that delegates to discoverCodexCliModels, mirroring how
gemini-catalog.js delegates to discoverGeminiCliModels. readCodexAvailableModels
became async to match the dynamic-import path; tests updated.
Closes sf-mp4u5fcz-wh6ac9 (with documented AC2 narrowing — see
resolution).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sweep MODEL_CAPABILITY_PROFILES so all 82 entries declare an explicit
agentic score; the agentic=50 fallback in scoreModel was silently
giving untouched profiles a generous default and letting weak agentic
models slip through execute-task routing. Anchors per the entry's
suggestedFix: coding-only ~25-40, very small/older ~30-40, older
generations ~55-70, frontier agentic ~85-95.
Adds an invariant test that asserts no profile relies on the default.
Closes sf-mp37p9u2-80f2gz.
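The invariant check reduces to "no profile omits an explicit agentic number" — a sketch of that predicate (helper name hypothetical):

```javascript
// Every profile must declare `agentic` explicitly so the scoreModel
// fallback (default 50 per the commit text) never applies.
function findProfilesMissingAgentic(profiles) {
  return Object.entries(profiles)
    .filter(([, p]) => typeof p.agentic !== 'number')
    .map(([name]) => name);
}
```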
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move the loadPrompt("reflection-pass") call site from headless-reflect.ts
into a new renderReflectionPrompt helper in reflection.js. gap-audit
greps EXTENSION_SRC for loadPrompt call sites; without a hit there it
flagged the prompt as orphan even though the headless surface was using
it (sf-mp4warqc-y1u0b3).
Side benefits: fragment composition + variable validation now run via
the canonical path instead of the prior raw fs.readFile + string
substitution.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses sf-mp4vxusa-pn2tnd. Completes the outcomes-verification chain
filed as AC2 of the original sf-mp4rxkwn-jmp039 (AC1 was commit-exists,
shipped 4af10ac1b).
When an agent-fix resolution cites a commit_sha AND the entry has
acceptanceCriteria mentioning specific file paths, verify the cited
commit actually modifies at least one of those files. Without this
check, an agent could stamp ANY existing commit (e.g. the most recent
unrelated commit on main) as the fix evidence — the SHA exists but the
commit has nothing to do with the entry.
Implementation:
extractFilesFromAcceptanceCriteria(acText)
Two extraction strategies:
1. Backticked code spans (most reliable): `src/foo.js`
2. Bare path-like tokens (only when slash + dotted extension
present, no whitespace, no http:// prefix, no leading digit)
Returns [] when AC has no extractable paths — prose-only AC skips
the check rather than rejecting (the silent-skip is the right
failure mode here; we don't want to fabricate rejections when
there's nothing to verify against).
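A sketch of the two strategies (the regexes here are illustrative, not the shipped ones):

```javascript
function extractFilesFromAcceptanceCriteria(acText) {
  if (typeof acText !== 'string' || !acText.trim()) return [];
  const files = new Set();
  // Strategy 1: backticked code spans like `src/foo.js` (most reliable).
  for (const m of acText.matchAll(/`([^`\s]+\/[^`\s]+\.[a-z0-9]+)`/gi)) {
    files.add(m[1]);
  }
  // Strategy 2: bare path-like tokens — require a slash and a dotted
  // extension, no whitespace, no http:// prefix, no leading digit.
  for (const m of acText.matchAll(
    /(?:^|\s)((?!https?:\/\/)[a-z_][\w./-]*\/[\w.-]+\.[a-z0-9]+)/gi
  )) {
    files.add(m[1]);
  }
  return [...files]; // [] → prose-only AC, caller skips the check
}
```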
getCommitTouchedFiles(commitSha, basePath)
Shell to git diff-tree --no-commit-id --name-only -r <sha>.
5-second timeout. Returns null on git failure or out-of-repo.
Matching strategy: exact-path-set OR basename-set. The basename
fallback tolerates the common operator informality where AC says
"src/types.ts" but the actual change was at
"packages/ai/src/types.ts". Exact match wins; basename match catches
the typical case without over-trusting (still requires a file with
that exact basename to be touched).
Carve-out: skip the check when getCommitTouchedFiles returns null
(git unavailable / not-a-repo) — same shape as AC1's "ungrokable"
carve-out. The agent-fix-unverified evidence kind remains the
explicit escape hatch for "I want agent-fix attribution but can't
cite a verifiable commit."
Tests (3 new, 19 total):
- rejects_agent_fix_when_commit_does_not_touch_AC_files: real git
init, commit touches src/unrelated.js, AC mentions src/expected.js
→ markResolved returns false. Then commit that DOES touch expected
→ markResolved returns true.
- skips_AC_file_check_when_AC_has_no_extractable_paths: prose-only
AC accepts any commit.
- AC_file_check_tolerates_basename_match: AC says src/types.ts but
commit touches packages/ai/src/types.ts — accepted via basename.
1619/1619 SF extension tests pass; typecheck clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses sf-mp4rxkx0-fkt3e2 (gap:no-prioritization-signal-on-open-queue)
AND closes the consolidating reflection entry sf-mp4w89mv-3ulqp4 (all
four data-plane-isolation siblings now resolved: kind taxonomy,
causal-link relations, memory mirror, prioritization).
Schema v65 adds two columns to self_feedback:
impact_score INTEGER (0-100; default by severity)
effort_estimate INTEGER (1-5; default null → treated as 3 in selector)
Severity-derived default for impact_score, set by insertSelfFeedbackEntry
when no explicit value supplied:
critical → 95
high → 80
medium → 50
low → 20
selectInlineFixCandidates now sorts by:
1. impact desc — high-impact work first
2. effort asc — quick wins ahead of multi-day work at same impact
3. ts asc — older entries break ties (FIFO within priority)
Replaces the pure-FIFO ordering. Operators can override per-entry by
setting impact_score/effort_estimate explicitly at file time, so e.g.
a "low" severity entry with a critical real-world impact gets bumped
above routine "medium" entries.
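The defaults plus the three-key sort reduce to a comparator like this (illustrative shape; in the shipped code the impact default is applied at write time by insertSelfFeedbackEntry):

```javascript
const IMPACT_BY_SEVERITY = { critical: 95, high: 80, medium: 50, low: 20 };

function compareCandidates(a, b) {
  const impact = (e) => e.impact_score ?? IMPACT_BY_SEVERITY[e.severity] ?? 50;
  const effort = (e) => e.effort_estimate ?? 3; // null → treated as 3
  return (impact(b) - impact(a))                       // 1. impact desc
    || (effort(a) - effort(b))                         // 2. effort asc
    || (a.ts < b.ts ? -1 : a.ts > b.ts ? 1 : 0);       // 3. ts asc (FIFO)
}
```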
Migration is idempotent: ensureSelfFeedbackTables (the fresh-DB CREATE
path) already includes both columns, so the v65 ALTER probes via
PRAGMA table_info before adding to avoid "duplicate column" errors on
fresh DBs. Older fixtures still get the ALTER. Two ALTER guards needed
because the columns are added independently and the second probe must
see post-first-ALTER state.
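The probe-before-ALTER guard, sketched (`db` is a better-sqlite3-style handle, assumed):

```javascript
// Probe PRAGMA table_info before each ALTER so the migration is safe on
// fresh DBs whose CREATE already includes the column.
function addColumnIfMissing(db, table, column, decl) {
  const cols = db.prepare(`PRAGMA table_info(${table})`).all();
  if (!cols.some((c) => c.name === column)) {
    db.exec(`ALTER TABLE ${table} ADD COLUMN ${column} ${decl}`);
  }
}
// Two independent guards; the second probe sees post-first-ALTER state:
//   addColumnIfMissing(db, 'self_feedback', 'impact_score', 'INTEGER');
//   addColumnIfMissing(db, 'self_feedback', 'effort_estimate', 'INTEGER');
```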
Tests:
sf-db-migration: assertion 64 → 65 + new impact_score/effort_estimate
column-exists checks
self-feedback-drain: prioritization order test (5 entries spanning
all severities + explicit-effort overrides) +
explicit-impact-overrides-default test
1616/1616 SF extension tests pass; typecheck clean.
Note: the consolidating reflection entry sf-mp4w89mv-3ulqp4 (filed by
the reflection layer's deepest-architectural-concern finding) is now
fully addressed across 4 commits today: 2f8ee5725 (memory mirror),
83c28b756 (kind taxonomy), d40a3d21d (causal links), this commit
(prioritization). Resolves both entries in one go.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses sf-mp4rxkwx-jz0soh (gap:no-causal-links-between-self-feedback-
entries). Third sibling of the consolidating reflection entry
sf-mp4w89mv-3ulqp4 (data-plane-isolation cluster).
Schema v64 adds self_feedback_relations:
from_id TEXT NOT NULL (FK → self_feedback.id)
to_id TEXT NOT NULL (FK → self_feedback.id)
relation_kind TEXT NOT NULL (CHECK: closed enum of 5 kinds)
created_at TEXT NOT NULL
PRIMARY KEY (from_id, to_id, relation_kind)
CHECK (from_id != to_id)
INDEX on (to_id, relation_kind) for inbound queries
Allowed kinds: supersedes, duplicate_of, blocks, root_cause_of,
partial_fix_of. The composite PK allows multiple kinds between the
same pair (e.g. "A supersedes B AND blocks B") but prevents exact
triple duplicates.
Helpers in sf-db-self-feedback.js:
SELF_FEEDBACK_RELATION_KINDS frozen array of allowed kinds
linkEntries(from, to, kind) inserts; returns true on new row,
false on PK collision (idempotent),
throws on FK / CHECK / unknown-kind
getRelatedEntries(id) returns [{id, relationKind,
direction: 'outbound'|'inbound'}]
— inbound + outbound in one call
Implementation note: linkEntries uses plain INSERT (NOT INSERT OR IGNORE)
so CHECK and FK violations surface as thrown errors. Idempotency for
PK collisions is implemented by catching the specific error message.
INSERT OR IGNORE would have silently swallowed self-loops and broken FKs
— exactly the kind of writer-layer bug we just fixed in 83c28b756 and
the upsertRequirement repair in f92022730.
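The plain-INSERT-with-narrow-catch shape, sketched (`db` is a better-sqlite3-style handle; the collision detection in the shipped code catches a specific error message, assumed here to mention the PK/UNIQUE constraint):

```javascript
const RELATION_KINDS = Object.freeze([
  'supersedes', 'duplicate_of', 'blocks', 'root_cause_of', 'partial_fix_of',
]);

// Plain INSERT (not INSERT OR IGNORE) so CHECK/FK violations throw;
// only the PK collision is caught and turned into idempotent `false`.
function linkEntries(db, from, to, kind) {
  if (!RELATION_KINDS.includes(kind)) {
    throw new Error(`unknown relation kind: ${kind}`);
  }
  try {
    db.prepare(
      `INSERT INTO self_feedback_relations
         (from_id, to_id, relation_kind, created_at)
       VALUES (?, ?, ?, ?)`
    ).run(from, to, kind, new Date().toISOString());
    return true;
  } catch (err) {
    if (/PRIMARY KEY|UNIQUE/i.test(String(err.message))) return false;
    throw err; // CHECK / FK violations surface to the caller
  }
}
```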
Tests:
sf-db-migration.test.mjs — 2 assertion bumps (63 → 64) + new
self_feedback_relations table-exists check
self-feedback-relations.test.mjs (new, 9 tests) —
SELF_FEEDBACK_RELATION_KINDS enum shape
linkEntries inserts new triple
linkEntries idempotent on duplicate
linkEntries allows multiple kinds same pair
linkEntries throws on unknown kind (writer-layer)
linkEntries throws on self-loop (CHECK)
linkEntries throws on missing FK
getRelatedEntries returns outbound + inbound
getRelatedEntries empty for unlinked entries
1610/1610 SF extension tests pass; typecheck clean.
Note on dispatch: this work was first attempted via "sf headless -p"
to dogfood per memory rule. The dispatch ran 99s with 19 tool calls
but went off-script — modified 10+ files in packages/ai/providers/
(adding wireModelId field across all providers, separate refactor)
and never touched sf-db-schema.js or the relations table. Hand-coded
fallback applied; off-script-dispatch pattern logged as another
data point in sf-mp4rxkwb-l4baga (triage-not-a-first-class-unit-type).
The wireModelId provider changes remain uncommitted in the working
tree for operator review — they may be valuable but were not the
requested work.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two related fixes that complete AC4 of sf-mp4rxkwt-sfthez (kind taxonomy,
commit 83c28b756):
1. Cluster by domain:family prefix instead of exact kind string.
The promoter was clustering on the full `kind` value, which after the
taxonomy enforcement means every entry like
gap:routing:tiebreak-cost-only and gap:routing:agentic-axis-partial-
coverage stayed in cluster size 1. Empirical confirmation: live ledger
2026-05-14 had 10 open entries, max cluster size 1 under exact-string
matching — promoter could never fire on real diverse data.
New behavior: extract first two segments as the cluster key. Entries
sharing domain:family group together; legacy single-segment kinds
cluster as themselves. With this change, the live ledger's gap:routing
family would include 3 entries today.
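The new cluster key is a two-segment prefix, sketched:

```javascript
// Cluster key: first two kind segments (domain:family); legacy
// single-segment kinds cluster as themselves.
function clusterKey(kind) {
  return String(kind).split(':').slice(0, 2).join(':');
}
```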
2. Repair the silently-broken upsertRequirement call (LATENT BUG).
The promoter was calling upsertRequirement with only {id, title,
description, status, class, source} — but the schema binds every
column positionally including {why, primary_owner, supporting_slices,
validation, notes, full_content, superseded_by}. SQLite cannot bind
`undefined`, so EVERY upsert attempt threw — caught silently by the
surrounding try/catch ("non-fatal") with no log line. Result: the
promoter has never successfully created a requirement row in this
project's history, regardless of clustering threshold.
Fix: pass all schema columns explicitly with null defaults for unused
ones. Also encode the human-readable cluster title into description's
first line since the requirements table has no title column (separate
schema-evolution concern, out of scope here).
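The full-column binding fix reduces to normalizing the partial object before the positional bind (sketch; column list taken from the commit text, exact shipped shape may differ — note the requirements table has no title column, so the cluster title folds into description):

```javascript
// SQLite cannot bind `undefined`: every schema column must be present,
// with null (not absence) for unused fields.
function toRequirementRow(partial) {
  return {
    id: partial.id,
    description: partial.description ?? null,
    status: partial.status ?? null,
    class: partial.class ?? null,
    source: partial.source ?? null,
    why: partial.why ?? null,
    primary_owner: partial.primary_owner ?? null,
    supporting_slices: partial.supporting_slices ?? null,
    validation: partial.validation ?? null,
    notes: partial.notes ?? null,
    full_content: partial.full_content ?? null,
    superseded_by: partial.superseded_by ?? null,
  };
}
```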
Tests: new tests/requirement-promoter.test.mjs (5 tests) covers
domain:family clustering when count>=5, no cross-family clustering,
legacy single-segment kinds, below-threshold returns 0, non-forge bail.
The first test would have caught both the prefix clustering miss AND
the upsertRequirement field-binding bug — runs end-to-end through
upsertRequirement → getActiveRequirements.
1601/1601 SF extension tests pass; typecheck clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses sf-mp4rxkwt-sfthez (gap:self-feedback-kind-vocabulary-unbounded).
The reflection report identified this as part of the deepest architectural
concern (4 entries clustered under data-plane isolation), and the
threshold-promoter was structurally unable to fire because every entry's
kind was a unique string (clusters by exact match).
Add a `domain:family[:specific]` taxonomy validated at recordSelfFeedback
write time:
ALLOWED_KIND_DOMAINS enum of allowed top-level domains (gap,
architecture-defect, architectural-risk,
inconsistency, runaway-loop, schema-drift,
janitor-gap, upstream-rollup, reflection,
copilot-parity-gaps, gap-audit-orphan-prompt,
gap-audit-orphan-command, flow-audit,
executor-refused, solver-missing-checkpoint,
runaway-guard-hard-pause,
self-feedback-resolution)
KIND_SEGMENT_RE /^[a-z][a-z0-9]*(?:-[a-z0-9]+)*$/ — kebab-case
per segment
validateKind(kind) accepts:
domain (1-segment legacy)
domain:family (2-segment canonical)
domain:family:specific (3-segment specific)
rejects: empty, non-string, >3 segments,
unknown domain, non-kebab segments
recordSelfFeedback now returns null when validateKind fails, with a
warning logged via workflow-logger. Existing rows in the ledger are
grandfathered (validation only fires on NEW writes through this entry
point) so the migration is non-destructive.
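The validation shape, sketched against the enum and regex above:

```javascript
const ALLOWED_KIND_DOMAINS = new Set([
  'gap', 'architecture-defect', 'architectural-risk', 'inconsistency',
  'runaway-loop', 'schema-drift', 'janitor-gap', 'upstream-rollup',
  'reflection', 'copilot-parity-gaps', 'gap-audit-orphan-prompt',
  'gap-audit-orphan-command', 'flow-audit', 'executor-refused',
  'solver-missing-checkpoint', 'runaway-guard-hard-pause',
  'self-feedback-resolution',
]);
const KIND_SEGMENT_RE = /^[a-z][a-z0-9]*(?:-[a-z0-9]+)*$/;

// Accepts 1-3 colon-separated kebab-case segments with a known domain.
function validateKind(kind) {
  if (typeof kind !== 'string' || !kind) return false;
  const segments = kind.split(':');
  if (segments.length > 3) return false;
  if (!ALLOWED_KIND_DOMAINS.has(segments[0])) return false;
  return segments.every((s) => KIND_SEGMENT_RE.test(s));
}
```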
This unblocks the threshold-promoter to cluster by domain:family
prefix once the requirement-promoter is updated to do so (separate
follow-up). Detectors and reflection passes can now reason about
domains rather than handfuls of unique strings.
Tests: 3 new (canonical-shapes / malformed-rejected / non-string-rejected).
8 existing test fixtures updated to use canonical kinds (gap:test-feedback
etc.) — they were using bare slugs that the new validation correctly
rejects.
1596/1596 SF extension tests pass; typecheck clean.
Note on prior dispatch: this work was first attempted via "sf headless -p"
to dogfood the new memory rule (drive SF work through sf headless, not
parallel Claude Code agents). The dispatch ran 49s with 8 tool calls but
landed nothing — the same fragility documented in sf-mp4rxkwb-l4baga
(triage-not-a-first-class-unit-type). Hand-coding fallback applied;
fragility data point added to the open entry's evidence trail.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Schema head moved to v63 in commit 21d905461 (parallel agent's
"rem-agent-inspired memory discipline + always-in-context invariants
board" track) but the migration tests still asserted v62 — flagged in
the last 2 iterations as "pre-existing migration failures, not mine."
Update both schema-version assertions to 63 + add a context_board
table-exists check after the v63 migration so future schema bumps
explicitly require updating both the version assertion AND the
matching table-presence check (catches naked-version-bump skews).
11/11 migration tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses sf-mp4rp6y2-31jfau (architecture-defect:self-feedback-not-
wired-to-memory-subsystem). The reflection layer surfaced this as part
of the deepest architectural concern in the 2026-05-14T02-49-45Z report:
"resolutions are hidden from the memory graph, SF will continue to
forget its own triaged solutions and fail to cluster identical root
causes."
When markResolved succeeds against the DB, also call memory-store's
createMemory to mirror the closure as a memory entry that detectors
and reflection passes can consult later via getRelevantMemoriesRanked.
Memory entry shape:
category: "self-feedback-resolution"
content: "[<entry.kind>] <entry.summary>\n→ <evidence.kind>: <reason>"
confidence: 0.9
source_unit_type: "self-feedback"
source_unit_id: <entryId>
tags: [
<entry.kind>,
"evidence:<evidence.kind>",
"commit:<sha-12-prefix>" // when commitSha present
"requirement:<reqId>" // when requirementId present
]
Best-effort: any memory-write failure is silently swallowed. The
resolution itself already landed via DB UPDATE + JSONL audit append +
markdown regen — the memory mirror is observability + future detector
consumption, not a correctness requirement. The try/catch ensures a
broken memory subsystem cannot roll back a valid resolution.
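The best-effort mirror, sketched (function and parameter names are assumed, not the shipped ones):

```javascript
// A broken memory subsystem must never roll back a resolution that
// already landed, so every failure is swallowed.
function mirrorResolutionToMemory(createMemory, entry, evidence) {
  try {
    const tags = [entry.kind, `evidence:${evidence.kind}`];
    if (evidence.commitSha) tags.push(`commit:${evidence.commitSha.slice(0, 12)}`);
    if (evidence.requirementId) tags.push(`requirement:${evidence.requirementId}`);
    createMemory({
      category: 'self-feedback-resolution',
      content: `[${entry.kind}] ${entry.summary}\n→ ${evidence.kind}: ${evidence.reason ?? ''}`,
      confidence: 0.9,
      source_unit_type: 'self-feedback',
      source_unit_id: entry.id,
      tags,
    });
  } catch {
    // swallowed: the mirror is observability, not correctness
  }
}
```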
Tests (2 new, 13 total in self-feedback-db):
- agent-fix with commitSha → memory entry has [kind, evidence:agent-fix,
commit:<sha-prefix>] tags + sourceUnitId pointing at the resolved entry
- human-clear without commit → memory entry has [kind, evidence:human-
clear] tags only, no commit tag
Pre-existing migration failures in sf-db-migration.test.mjs (2 tests:
v27 spec backfill, v52 routing-history heal) are unrelated to this
commit; same failure mode as last iteration. Logged here so the
1591/1593 pass rate is auditable.
The other three siblings of the consolidating reflection entry
(sf-mp4w89mv-3ulqp4) remain open and need schema migration:
- sf-mp4rxkwt-sfthez kind vocabulary (domain:family[:specific])
- sf-mp4rxkwx-jz0soh causal links (self_feedback_relations table)
- sf-mp4rxkx0-fkt3e2 prioritization (impact_score + effort_estimate cols)
This commit lands the writer-layer-only piece (#4 in the rollup's
suggested fix), unlocking detector + reflection consumption immediately.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User-flagged architecture defect: runGeminiReflection shelled out to
the `gemini` CLI binary and hardcoded the gemini provider, duplicating
auth discovery and disconnecting the call from SF's metrics, cost
accounting, and provider abstraction. Should have routed through the
existing @singularity-forge/ai layer from the start.
Replace runGeminiReflection with runReflection that:
- Resolves an operator-supplied "provider/modelId" string via
@singularity-forge/ai's getModel (the canonical accessor for the
runtime model registry — MODELS itself isn't re-exported).
- Calls completeSimple from @singularity-forge/ai. Same provider routing
every other SF LLM call uses (anthropic, openai, google-gemini-cli,
openai-codex-responses, mistral, etc.). No subprocess.
- Default model is google-gemini-cli/gemini-3-pro-preview because that
matches the operator's primary AI Ultra tier — but the default lives
in a single named constant (DEFAULT_REFLECTION_MODEL), no provider
hardcoding in the call path. Operators override per-call via --model.
- Returns { ok, content?, cleanFinish?, error?, provider, modelId } for
observability into which provider actually answered.
runGeminiReflection kept as an alias for back-compat so the existing
headless-reflect.ts caller works unchanged. New code should use
runReflection directly.
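The provider/modelId split plus the injected completion, sketched (the real code resolves via @singularity-forge/ai's getModel/completeSimple; the `complete` hook here stands in for that, matching the DI test approach):

```javascript
const DEFAULT_REFLECTION_MODEL = 'google-gemini-cli/gemini-3-pro-preview';

// Rejects bare model strings; returns provider/modelId for observability.
async function runReflection(prompt, { model = DEFAULT_REFLECTION_MODEL, complete } = {}) {
  const slash = model.indexOf('/');
  if (slash < 1) {
    return { ok: false, error: `expected "provider/modelId", got "${model}"` };
  }
  const provider = model.slice(0, slash);
  const modelId = model.slice(slash + 1);
  try {
    const content = await complete(provider, modelId, prompt);
    return {
      ok: true,
      content,
      cleanFinish: content.includes('REFLECTION_COMPLETE'),
      provider,
      modelId,
    };
  } catch (err) {
    return { ok: false, error: String(err), provider, modelId };
  }
}
```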
Tests: switched from a fake-gemini-binary-on-PATH approach (5 tests)
to a clean dependency-injection approach via options.complete (5 tests
+ 1 new "rejects bare model strings"). Mock returns AssistantMessage
shape directly, no subprocess machinery.
Two pre-existing migration test failures in sf-db-migration.test.mjs
(openDatabase_migrates_v27, openDatabase_v52_db_heals_routing_history)
are unaffected by this commit — they fail in isolation too, likely
related to commit 7570aac4b's routing-metrics track. Logged here so the
1589/1591 pass rate is auditable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three patterns lifted from Copilot CLI 1.0.47's rem-agent design.
1. add/prune-only consolidation surface (memory-store, memory-extractor)
- applyConsolidationActions(): new export that gates the extractor path to
two action kinds only — "add" (→ CREATE) and "prune" (→ SUPERSEDE with
sentinel superseded_by = "pruned:<unitType>:<unitId>"). UPDATE / REINFORCE /
SUPERSEDE actions are rejected with a descriptive error from the
consolidation path; manual paths still use applyMemoryActions and keep
full action surface.
- memory-extractor.js EXTRACTION_SYSTEM prompt updated: model is told to
emit add/prune only and to fix wrong entries by prune+readd, not edit.
- Discipline win: every consolidation change is visible as an addition or
removal — no silent revisions.
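The add/prune gate can be sketched like this (store interface assumed; the shipped code maps add → CREATE and prune → SUPERSEDE with the sentinel shown):

```javascript
// Consolidation path: only "add" and "prune" pass; everything else is
// rejected so every change is visible as an addition or a removal.
function applyConsolidationActions(actions, store) {
  for (const action of actions) {
    if (action.kind === 'add') {
      store.create(action.memory);
    } else if (action.kind === 'prune') {
      store.supersede(action.id, `pruned:${action.unitType}:${action.unitId}`);
    } else {
      throw new Error(
        `consolidation allows add/prune only, got "${action.kind}" — ` +
        'fix wrong entries by prune+readd, not edit'
      );
    }
  }
}
```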
2. swarm member inheritance of parent memory view (swarm-dispatch)
- SwarmDispatchLayer.dispatch() now fetches getActiveMemoriesRanked(30)
and formatMemoriesForPrompt(memories, 2000, false) at dispatch time,
attaches as memoryContext on both bus metadata and DispatchResult.
- Snapshot semantics — members get the view at dispatch time, no live
updates mid-task.
- Resolves the TODO at swarm-dispatch.js:22.
3. always-in-context invariants board (new capability)
- New src/resources/extensions/sf/context-board.js — SQLite-backed,
per-repo/per-branch entries. Two ops: addBoardEntry, pruneBoardEntry
(no update — same discipline as #1). 4 KB byte cap in
formatBoardForPrompt with truncation marker.
- New src/resources/extensions/sf/tools/context-board-tool.js +
bootstrap/context-board-tool.js — registered via pi.registerTool with
two ops: add(content, category?) and prune(id). Repository + branch
auto-filled from git context.
- Schema migration v62 → v63 in sf-db-schema.js adds context_board table
+ idx_context_board_repo_branch index. ensureContextBoardTable wired
into initSchema for fresh databases.
- System-prompt injection at auto/phases-dispatch.js runDispatch right
after dispatchResult.prompt resolution: prepends board snapshot under
a labeled section. Try/catch fail-open — board errors never break
dispatch. Sidecar/custom-engine paths intentionally not covered (carry
full unit context already + low frequency).
Why these complement existing infra rather than replace:
- memory-store remains queryable (recall on demand) for facts the agent
references sometimes.
- context_board is always-rendered (small, prompt-injected) for invariants
the agent should never operate without — current milestone scope,
architectural rules, known-broken paths, in-flight migrations.
Comparison to Copilot rem-agent:
- We have what they have on consolidation (add/prune + board) plus what
SF already had (queue + drain + memory-extractor + SLEEPTIME swarm
topology that's richer than their single-agent rem-agent).
Tests: 40/40 pass across memory-consolidation-discipline.test.ts (18) and
context-board.test.ts (22). Full test:unit deferred — see follow-up.
Two parallel Sonnet 4.6 sub-agents in isolated worktrees produced the
work; integration adapted for the modular sf-db split (schema went into
sf-db/sf-db-schema.js, prompt injection into auto/phases-dispatch.js,
both of which got pulled out of their original files since the swarms
launched).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Partially addresses sf-mp4rxkwn-jmp039 (no-outcomes-verification): AC1
and AC3 land here. AC2 (cross-check that the cited commit's changed
files include the entry's referenced files) is filed separately as a
follow-up — different mechanism (semantic AC parsing).
Without this check, an agent could stamp ANY string as commit_sha and
markResolved would accept it under the writer-layer constraint shipped
in d477ce703. The credibility check at the reader caught the OBVIOUS
non-canonical shapes (null evidence, {file, line}) but a well-formed
{kind: "agent-fix", commitSha: "phantom-sha"} would have passed.
Implementation:
verifyCommitExists(commitSha, basePath) returns one of:
- "verified" — git is present and the commit is in the repo
- "missing" — git is present but the commit lookup failed
- "ungrokable" — git unavailable or basePath isn't a git repo
(carve-out: we can't verify, so don't punish)
markResolved policy: reject on "missing"; accept on the others. The
agent-fix-unverified kind (reserved in d477ce703) is the explicit
escape hatch for "I want to mark agent-fix but can't cite a verifiable
commit" — those resolutions remain re-includable under the credibility
check, which is what we want.
Implementation uses two shell-outs to git (rev-parse --verify, then
rev-parse --git-dir to distinguish missing from not-a-repo). Both are
guarded with 5-second timeouts and never throw — failure modes return
"ungrokable" so the carve-out kicks in.
Tests: 2 new (11 total in self-feedback-db).
- rejects_agent_fix_with_nonexistent_commit_sha: initializes a real
git repo, files an entry, rejects bogus SHA, accepts real HEAD SHA
- accepts_agent_fix_with_no_commit_sha_or_ungrokable_path: covers
both the carve-out (no-git) and agent-fix-without-commitSha
(testPath/summaryNarrative path)
Full SF extension suite (1549 tests) passes; typecheck clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses sf-mp4qoby4-meiir7: the credibility check at the READER side
of self-feedback (selectInlineFixCandidates) was previously the only
gate. An agent that wrote DB rows directly via raw SQL or the wrong
tool could bypass it, landing resolutions like {file, line} or null
that the reader would then either trust (legacy carve-out) or quietly
re-open. Observed live in 2026-05-13 dogfood (5/5 sloppy resolutions
with non-canonical evidence shapes).
This commit makes the policy belt-and-suspenders: markResolved (and by
extension resolveSelfFeedbackEntry) refuse to write resolutions whose
evidence.kind is not in the accepted set:
agent-fix, human-clear, promoted-to-requirement, auto-version-bump,
agent-fix-unverified (reserved for outcomes-verification follow-up)
When evidence is missing, non-object, or its kind is outside the set,
markResolved returns false WITHOUT touching the DB or JSONL — caller
recovers by re-submitting with a valid kind. All existing callers
(resolve_issue tool, requirement-promoter, auto-version-bump resolver,
triage-self-feedback) already pass valid kinds; no breakage.
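The writer-side gate reduces to a membership check on evidence.kind (sketch of the policy, not the shipped function):

```javascript
const ACCEPTED_EVIDENCE_KINDS = new Set([
  'agent-fix', 'human-clear', 'promoted-to-requirement',
  'auto-version-bump', 'agent-fix-unverified',
]);

// markResolved refuses to write when this returns false — the DB and
// JSONL stay untouched and the caller re-submits with a valid kind.
function isAcceptableEvidence(evidence) {
  return !!evidence
    && typeof evidence === 'object'
    && ACCEPTED_EVIDENCE_KINDS.has(evidence.kind);
}
```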
Raw SQL bypass is a known limit documented in the entry — full
coverage needs a DB CHECK constraint on resolved_evidence_json (schema
migration, separate work).
Tests: 2 new (markResolved_rejects_non_canonical, accepts_each_canonical)
covering all four rejection paths (bad kind, missing kind, missing
evidence, unknown kind) and all five accepted kinds. Full SF extension
suite (1547 tests) passes; typecheck clean.
Plus inline cleanup: closed 3 stale upstream-rollup re-files
(sf-mp4qyotx, sf-mp4qyoub, sf-mp4qyouh) with human-clear evidence —
the bridge fix in 6d27cba06 now prevents recurrence.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses sf-mp4rp6xn-hpag5h: bridgeUpstreamFeedback's idempotency
check only looked at currently-OPEN upstream-rollup entries, so any
closure (human-clear or agent-fix) would let the bridge re-file the
same cluster on the next session_start. Observed live during 2026-05-13
dogfood: closed 3 upstream-rollup entries with human-clear, bridge
re-filed all 3 on the next run.
Change: extend the idempotency set to also exclude rollup kinds that
were RESOLVED within the last 30 days (matches the existing
THIRTY_DAYS_MS upstream-source cutoff — same window, same rationale).
Closures are treated as time-limited: after the window expires, a
re-cluster CAN re-file, because the original closure was made against
then-current state and later state may legitimately surface the same
kind again. This is the right balance — operators get respite from
re-files while the closure decision is fresh, without trapping the
ledger forever if conditions actually change.
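The extended idempotency set, sketched (row shape assumed: `{ kind, status, resolved_at }`):

```javascript
const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;

// Skip open rollup kinds AND kinds resolved within the window; older
// closures age out so a re-cluster can legitimately re-file.
function kindsToSkip(rollupEntries, now = Date.now()) {
  const skip = new Set();
  for (const e of rollupEntries) {
    if (e.status === 'open') {
      skip.add(e.kind);
    } else if (e.resolved_at && now - Date.parse(e.resolved_at) < THIRTY_DAYS_MS) {
      skip.add(e.kind);
    }
  }
  return skip;
}
```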
7 new tests cover the regression (files new / skips open / skips
recently-closed / allows re-file after window / threshold guards /
non-forge-repo bail). Full SF extension suite (1545 tests) passes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1B of the reflection layer: complete the operator-driven loop by
adding actual LLM dispatch. Phase 1A (commit e161a59e2) shipped the
corpus assembler + prompt template + the prompt-emit operator surface.
This commit wires the dispatch end so `sf headless reflect --run`
produces a real report on disk without manual model piping.
Why shell-out to the gemini CLI and not SF's provider abstraction:
reflection is a single-prompt one-shot inference. Going through SF's
full agent dispatch would require a session, model registry, tool
registration, recovery shell — overkill for "render this prompt,
capture text." The gemini CLI handles auth (~/.gemini/oauth_creds.json),
Code Assist project discovery, and protocol drift on its behalf.
Subprocess cost is paid once per reflection (rare).
Implementation:
- reflection.js: runGeminiReflection(prompt, options) spawns
`gemini --yolo --model <model> -p "<directive>"` and pipes the giant
rendered template via stdin (gemini -p reads stdin and appends).
Returns { ok, content, cleanFinish, exitCode, error, stderr }; never
throws. Defaults to gemini-3-pro-preview (0% used on AI Ultra,
strongest agentic model with quota). 8-minute timeout.
cleanFinish detected by REFLECTION_COMPLETE terminator (emitted by
the prompt template's output contract) — operator gets a warning when
the report is truncated.
- headless-reflect.ts: --run flag triggers dispatch + report write
via writeReflectionReport. --model overrides the default. Errors
surface as JSON or text per --json. Successful runs emit the report
path on stdout; failures emit error + truncated stderr.
- help-text.ts: documents --run and --model flags.
- Tests (4 new, 13 total): use a fake `gemini` binary on PATH to
exercise the spawn path without real OAuth/network — covers
ok+cleanFinish, non-zero exit, hang/timeout, missing-terminator.
All 1538 SF extension tests pass; typecheck clean.
Phase 2 follow-up (still gated on sf-mp4rxkwb-l4baga
triage-not-a-first-class-unit-type landing): reflection-pass becomes a
real autonomous-loop unit type, milestone-close auto-triggers it, the
report's `Recommended new self-feedback entries` section gets parsed
and the entries auto-filed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses self-feedback entry sf-mp4uzvcd-pazg6v
(architecture-defect:no-reflection-layer-over-self-feedback-corpus): SF
detected symptoms and triaged individual entries but had no layer that
reasoned about the corpus to recognize recurring structural patterns.
The same architectural pressure expressed itself across multiple entries
with different exact-kind strings; nothing escalated the pattern to a
class. The cognitive work fell on the operator.
This commit ships Phase 1A — the data-assembly + prompt half of the
reflection layer + an operator-driven entry point. Phase 1B (LLM dispatch
via the autonomous loop as a real unit type) lands once
sf-mp4rxkwb-l4baga (triage-not-a-first-class-unit-type) is in.
Files:
- src/resources/extensions/sf/reflection.js (new)
- assembleReflectionCorpus(basePath): bundles open + recent-resolved
self-feedback (full json), last 50 commits via git log, milestone +
slice + task state, all milestone validation verdicts, and prior
reflection report into one struct. Returns null on prerequisite
failure (DB closed) so callers downgrade gracefully.
- renderReflectionCorpusBrief(corpus): renders the corpus into a
markdown brief the LLM consumes in one turn.
- writeReflectionReport(basePath, content): persists to
.sf/reflection/<timestamp>-report.md so next pass detects "what
changed since last reflection."
- src/resources/extensions/sf/prompts/reflection-pass.md (new)
- {{include:working-directory}} prefix.
- Reasoning order: cluster by structural shape (not exact kind),
identify recurring patterns, identify commit/ledger gaps, identify
stale validation drift, identify the deepest architectural concern,
compare against prior report.
- Output contract: structured markdown report with named sections,
terminator REFLECTION_COMPLETE for clean-finish detection.
- Constraints: don't fix anything (reflection layer not executor),
don't resolve entries without commit-SHA evidence, don't invent IDs.
- src/headless-reflect.ts (new) — sf headless reflect [--json]
- Pre-opens the project DB via auto-start.openProjectDbIfPresent
(one-shot bypass path doesn't run the full SF agent bootstrap).
- Default: emits the rendered prompt brief (template + corpus) for
operators to pipe into any model. Lets the corpus-assembly layer
ship and validate before the LLM-dispatch layer is wired.
- --json: emits raw corpus snapshot for tooling.
- src/headless.ts: registers the new "reflect" command after the
existing usage block.
- src/help-text.ts: documents it in the headless command list.
- src/resources/extensions/sf/tests/reflection.test.mjs (new, 9 tests):
null-when-DB-closed; collects open + recent-resolved; excludes >30d
resolutions; captures milestone/slice/task tree; captures validation
verdicts; commits returned as array (best-effort tmpdir is ok); brief
renders all major sections; entry IDs/severity/kind appear in brief;
writeReflectionReport round-trips through assembleReflectionCorpus's
previousReport read.
Live smoke verified: sf headless reflect against the real .sf/sf.db
returns 15 open + 23 recent-resolved entries, 50 commits, 2 milestones,
1 validation file (correctly surfacing M001's stale needs-attention
verdict against actual 5/5 slices done — exactly the case that
motivated this layer).
Total: +848 LOC, full SF extension suite (1534 tests) passes,
typecheck clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>