singularity/singularity-forge

Author	SHA1	Message	Date
Mikael Hugo	46d9d45279	fix(bash): block wrong project python runtime	2026-05-15 05:33:28 +02:00
Copilot	6652462a9d	fix(self-feedback): isolate headless triage spawn from auto.lock contention Self-feedback inline fix spawns 'sf headless triage --apply' as a detached child when SF_HEADLESS=1. The child previously grabbed the same auto.lock as the parent, causing lock contention that blocked the parent's unit execution. - Pass SF_SELF_FEEDBACK_WORKER=1 to the child environment. - session-lock: effectiveLockFile() returns auto-self-feedback.lock when the env var is set. - session-lock: effectiveLockTarget() returns .sf/parallel/self-feedback/ so the OS-level lock directory is also isolated. This mirrors the existing SF_PARALLEL_WORKER / SF_MILESTONE_LOCK mechanism used for parallel milestone workers (#2184). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-15 05:28:23 +02:00
Mikael Hugo	ef2b3af7dd	feat(swarm): teach worker the checkpoint contract + executor tool suite The swarm worker now receives the autonomous executor's compact role prompt (buildSwarmWorkerSystemPrompt in auto/run-unit.js) which teaches it the checkpoint tool contract and PDD field requirements. This closes the last gap before SF_AUTONOMOUS_VIA_SWARM=1 can become default: without the contract the worker never emitted checkpoint tool calls, so workerSignaledOutcome stayed null and the loop terminated after one unit. With the contract, the worker calls checkpoint(outcome=...) and the orchestrator gets accurate completion signals. Envelope carries two new optional fields propagated through every layer: - executorSystemPrompt: overrides the swarm worker's default prompt - executorTools: optional tool name filter Flow: runUnitViaSwarm builds them → swarmDispatchAndWait reads them from envelope → forwards to runAgentTurn → runHeadlessPrompt passes them as systemPromptOverride / toolsOverride → runSubagent. No changes needed to runSubagent: createAgentSession + bindExtensions + _refreshToolRegistry already picks up extension-registered tools like `checkpoint` automatically. Tests: 61 passing across the two affected files (22+9 baseline + 30 new); 234 test files passing overall.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 05:12:55 +02:00
Mikael Hugo	54ac56d9bd	feat(swarm): honor worker checkpoint outcomes	2026-05-15 04:59:15 +02:00
Mikael Hugo	1115437cec	feat(swarm): event streaming + outcome derivation for runUnitViaSwarm - Forward onEvent through swarm-dispatch → agent-runner → runSubagent - Collect toolcall_end events in runUnitViaSwarm to build real tool-use blocks - Detect checkpoint tool outcome for accurate unit completion signal - Add headless.ts graceful shutdown (async signal handler, 2.5s timeout) - RPC client stop() now awaits flush and propagates stop to child sessions Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-15 04:54:58 +02:00
Mikael Hugo	ffcd3d1157	chore(doc-checker): allowlist intentionally-short scaffold files The doc-checker startup hook prints a "9 files need content" advisory on every autonomous bootstrap. The flagged files are intentionally terse: - AGENTS.md indices under docs/ and .sf/harness/* point at sibling directories where the real content lives - .sf/PRINCIPLES.md / STYLE.md / NON-GOALS.md are terse-by-design bullet lists; the # heading line is stripped by countContentLines so a 9-bullet file falls one short of the 10-line threshold despite being substantive Adding them to STUB_ALLOWED_PATHS so the advisory only flags genuinely unfilled scaffolds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 04:43:18 +02:00
Mikael Hugo	3faa599f9d	fix(swarm): close multi-dispatch + checkpoint parity gaps Two real bugs surfaced by SF_AUTONOMOUS_VIA_SWARM=1 dogfood (Round 4): 1. Second dispatch to the same swarm agent returned reply=null because each MessageBus instance held a 30s-stale inbox cache. runAgentTurn now accepts opts.onlyMessageId; when set it forces agent._inbox.refresh() from SQLite, processes only that message, and leaves stale messages untouched for later turns. dispatchAndWait passes the just-dispatched messageId so each call is surgical. 2. runUnitViaSwarm now writes an appendAutonomousSolverCheckpoint and synthesizes a swarm_unit_complete tool_use block alongside the text reply, so phases-unit.js stops firing claimed-checkpoint-without-tool repair loops. Outcome is conservatively "continue" — a real "complete" requires the swarm agent to emit an actual checkpoint tool call (future round wires runSubagent.onEvent through dispatchAndWait). Tests: 51 passing for the two affected files (11 swarm-dispatch + 40 run-unit-via-swarm). Full suite: 1760/1760. Known remaining gap before flipping default: synthesized outcome is always "continue", so the loop relies on iteration caps for termination rather than agent-signaled completion. Wiring real tool calls through is the next round. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 04:37:59 +02:00
Mikael Hugo	b428f1ab22	fix(headless): send terminal notification when loop exits without stopAuto Headless mode waits for 'Assisted/Autonomous mode stopped' to detect completion. When the loop exits via natural break (e.g. step-wizard in /next), stopAuto() is never called, so headless hangs forever. - Add s.stopAutoCalled flag to AutoSession - Set flag in stopAuto(), clear in cleanupAfterLoopExit() - Send terminal notification from cleanupAfterLoopExit() only when stopAuto() was bypassed - Fixes sf headless next hanging after unit completes Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-15 04:32:05 +02:00
Mikael Hugo	78d52d7967	feat(autonomous): SF_AUTONOMOUS_VIA_SWARM=1 routes unit dispatch through swarm Add runUnitViaSwarm as an opt-in path in auto/run-unit.js. When SF_AUTONOMOUS_VIA_SWARM=1 (or =true), each unit dispatch builds a DispatchEnvelope (unitType -> workMode via deriveWorkMode), calls swarmDispatchAndWait, and returns the agent reply as a synthetic {status: "completed", event.messages: [{role: "assistant", content: reply}]} matching the shape phases-unit.js / classifyExecutorRefusal already expect. Default (flag unset) is byte-identical to today — no regression in the default path, 1751/1751 tests pass. Known gap (acceptable for an experimental opt-in, must be closed before swarm becomes default): - Tool-call events from the swarm worker do NOT surface to the orchestrator UI (runAgentTurn handles them internally). - The worker emits a plain text reply, not a structured checkpoint, so phases-unit.js' checkpoint-missing repair path will not trigger and classifyExecutorRefusal will not detect refusals. This is the first concrete step toward routing autonomous unit work through swarm: role-based agent selection, memory inheritance via the envelope, and a durable bus audit trail of every unit dispatch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 04:27:00 +02:00
Mikael Hugo	bbade22388	feat(swarm): dispatchAndWait — synchronous request/response for swarm agents Add SwarmDispatchLayer.dispatchAndWait(envelope, { timeoutMs, signal }) which enqueues via _busDispatch, drives the target agent's turn via runAgentTurn (in-process runSubagent), and reads back the agent's reply from the bus. Returns DispatchResult extended with reply + replyMessageId. This is the missing piece for collapsing /delegate-style subagent calls into the swarm interface: callers that need a reply (not just delivery) can now use the swarm contract instead of the subagent extension's bespoke dispatch path. Round 4 will migrate those callers. New helper MessageBus.getReplyTo(messageId, fromAgent) queries SQLite directly via json_extract for the most recent reply to a given message. Plus 8 tests covering happy path, error paths (no reply, runner throws, runner returns {error}), the swarmDispatchAndWait convenience function, and the A2A short-circuit path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 04:15:52 +02:00
Mikael Hugo	903cdd4d9d	feat(subagent): event streaming for in-process runSubagent Add RunSubagentOptions.onEvent callback so callers (TUI live update panel for /delegate, /rubber-duck, etc.) get every session event without polling. Errors from the callback are caught so a buggy caller cannot crash the agent. Chain caller-supplied AbortSignal through a local AbortController in runSingleAgent and register it in a new liveSubagentControllers set so stopLiveSubagents aborts in-process subagents alongside the legacy spawn-based processes (cmux split, sift codebase_search). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-15 04:04:52 +02:00
Mikael Hugo	62f886430c	fix: run subagents in process by default	2026-05-15 03:59:34 +02:00
Mikael Hugo	8b0f0bbd65	fix: harden headless dogfood self-healing	2026-05-15 03:53:15 +02:00
Mikael Hugo	3ac5aede1e	fix: repair headless runtime self-healing	2026-05-15 03:33:29 +02:00
Mikael Hugo	72c3811a7b	feat(auto): auto-triage TODO.md on each autonomous cycle - Add autoTriageTodo() helper that checks root TODO.md for raw dump notes beyond the empty template before each autonomous cycle - Lazy-imports buildTodoTriageLLMCall + triageTodoDump from commands-todo.js to avoid startup overhead - Triage results written to DB backlog with clear=true + backlog=true - Best-effort: never blocks autonomous loop on triage failure - Fast-path skips when TODO.md is empty template or doesn't exist Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-15 03:19:13 +02:00
Mikael Hugo	ca7ff554c3	feat(swarm): integrate LLM runner into AgentSwarm.run() - Make AgentSwarm.run() async with optional enableLLM flag - Wire runAgentTurn from agent-runner.js into all 4 topologies (round_robin, supervisor, dynamic, sleeptime) - Update drainSleeptimeQueue to use runAgentTurn for actual LLM execution instead of passive inbox reading - Export runAgentTurn, runAgentLoop, runSwarmTurn from uok/index.js - Update PersistentAgent JSDoc to reflect runner exists - Fix test imports after extension consolidation (ttsr, google-search) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-15 03:05:01 +02:00
Mikael Hugo	f6619b792c	refactor(extensions): move cmux into sf extension as internal module cmux was a standalone extension directory with no extension-manifest.json, functioning as a utility library for the sf extension. Moving it into sf/cmux/ makes the dependency explicit and removes the orphaned extension directory. Import paths updated: - commands-cmux.js, notifications.js, auto.js: ../cmux → ./cmux - bootstrap/system-context.js: ../../cmux → ../cmux - subagent/index.js: ../../cmux → ../cmux Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-15 02:34:35 +02:00
Mikael Hugo	534ed85ee1	refactor(extensions): merge google-search into search-the-web Google Search was a standalone extension providing a single tool (google_search) that used Gemini's Google Search grounding feature. It had fallback logic to search-the-web providers (Tavily, Brave) when Google OAuth was unavailable. Merging it into search-the-web consolidates all web search capabilities into one extension and eliminates the tight coupling between the two. Changes: - Copied google-search tool logic into search-the-web/tool-google-search.js - Added registerGoogleSearchTool / resetGoogleSearchCache exports - Integrated into search-the-web/index.js deferred loading - Added google_search to search-the-web extension-manifest.json tools - Deleted google-search/ extension directory Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-15 02:33:05 +02:00
Mikael Hugo	f0c3eaf999	refactor(extensions): merge ttsr into guardrails TTSR (Time Traveling Stream Rules) monitored streaming output against regex patterns. Guardrails blocked dangerous actions and redacted secrets. Both are safety/guardrail concerns — merging them into one extension reduces surface area and simplifies the safety model. Changes: - Copied ttsr-rule-loader.js, ttsr-manager.js, ttsr-interrupt.md into guardrails/ - Updated guardrails extension-manifest.json with ttsr hooks (turn_start, message_update, turn_end, agent_end) - Integrated TTSR session_start/turn_start/message_update/turn_end/agent_end handlers into guardrails/index.js - Deleted ttsr/ extension directory Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-15 02:28:40 +02:00
Mikael Hugo	2d5a05a48b	fix(security): resolve 7 findings from full-repo code review - Create web/middleware.ts to authenticate all API routes via bearer token and origin checks (previously unauthenticated due to missing middleware file) - Fix path traversal in browse-directories: replace startsWith with realpathSync + relative + isAbsolute containment checks - Fix XSS in session HTML export: escape raw HTML blocks via marked renderer - Fix PTY process leak: destroy session on SSE stream cancellation - Fix unhandled exception in terminal sessions POST: wrap getOrCreateSession in try/catch with structured JSON error response - Fix silent child-process failure in headless dispatch: add exit handler to write failed claim when sf headless triage exits non-zero - Fix TypeError on malformed claim JSON: add Array.isArray guard before accessing claim.ids.length All changes type-check cleanly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-15 02:18:43 +02:00
Mikael Hugo	def1edefa9	sf snapshot: uncommitted changes after 268m inactivity	2026-05-15 02:08:06 +02:00
Mikael Hugo	7e1631618a	fix(self-feedback-drain): route inline-fix dispatch via 'sf headless triage --apply' when SF_HEADLESS=1 The existing dispatch used pi.sendMessage to queue a chat followUp. That works in interactive sf sessions but no chat agent is listening in 'sf headless' / autonomous flows — the message is queued and never delivered, leaving the high/critical blocker active on every iteration. When SF_HEADLESS=1, spawn the same triage-decider → review-code pipeline (via the already-shipped 'sf headless triage --apply' subprocess) instead. The autonomous loop then sees resolved entries via DB on the next gate check, no chat agent required. Forge-only: the dispatcher still only operates in the SF repo itself — `readAllSelfFeedback` for non-forge repos returns the upstream-feedback log (SF developer work), which must not be auto-dispatched from inside consumer projects. Documented that constraint inline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 21:39:47 +02:00
Mikael Hugo	b0ebe7ce18	fix: register sf stop command outside tui	2026-05-14 21:30:00 +02:00
Mikael Hugo	2e4bdd292c	fix: keep hidden sf commands callable in print mode	2026-05-14 21:25:18 +02:00
Mikael Hugo	ccdf530488	fix(auto-prompts): add missing `join` import from node:path auto-prompts.js called `join(base, ...)` in 11 places but only imported `basename` from node:path. Crashed autonomous mode every iteration with ReferenceError: join is not defined — observed in dr repo, 3 consecutive iteration failures triggered the hard stop. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 21:19:09 +02:00
Mikael Hugo	a3b68bb269	fix(env): align SF_PERMISSION_LEVEL enum with permission-profile values Schema now accepts the same five levels used elsewhere in the codebase (minimal/low/medium/high/bypassed) instead of the stale full/restricted/sandbox triple. Docs and env test updated to match. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 21:11:36 +02:00
Mikael Hugo	f88b48b0aa	fix: show print mode liveness	2026-05-14 20:59:19 +02:00
Mikael Hugo	487237a32c	fix: bound sf print mode and chat routing	2026-05-14 20:55:00 +02:00
Mikael Hugo	b19096800b	fix(triage-apply): 8-minute watchdog on agent dispatch subprocess Observed 2026-05-14: a triage --apply run hung for 33+ minutes because the spawned subagent process stalled (provider SDK call without its own timeout) and defaultAgentRunner had no watchdog — it waited indefinitely on proc.on("close"). Adds a per-dispatch watchdog (default 8 min, override via SF_TRIAGE_AGENT_TIMEOUT_MS env). On expiry: SIGTERM → 5s grace → SIGKILL. Resolves immediately with ok=false / exitCode=124 (POSIX timeout convention) so the trust / review / mutation gates surface the failure as a real outcome instead of a silent stall. Provider-agnostic: the timeout protects the orchestrator regardless of which model the router picks. Operators running long-context provider calls can bump the env var; default 8min matches runTriage / runReflection's existing completeSimple timeout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 20:28:05 +02:00
Mikael Hugo	7cb1eef948	feat: record sf chat workflow evidence	2026-05-14 20:27:53 +02:00
Mikael Hugo	47867c1236	feat: route clear sf chat commands	2026-05-14 20:21:37 +02:00
Mikael Hugo	ab1a1edcf9	refactor: tier sf slash commands	2026-05-14 20:14:09 +02:00
Mikael Hugo	587b5fa31c	refactor: narrow sf slash surface	2026-05-14 20:04:53 +02:00
Mikael Hugo	5ce9df2e37	refactor: make bundled agents internal	2026-05-14 19:54:56 +02:00
Mikael Hugo	18aa257ede	refactor: rename review gate agent	2026-05-14 19:43:01 +02:00
Mikael Hugo	62fbc5d57b	refactor: align agent resource overlays	2026-05-14 19:32:41 +02:00
Mikael Hugo	7000373e88	fix(uok-status): surface manualAttention bucket in status uok output Codex audit follow-up (fix A). manual-attention outcomes were counted by getGateRunStats but dropped from the user-facing surface — they inflated `total` invisibly with no distinct column or key, so an operator couldn't tell a gate with 5 pass / 3 manual-attention apart from a gate with 5 pass / 3 fail. Adds `manualAttention: number` to GateHealthEntry and renders it as its own column between Fail and Retry in the human table. JSON consumers get the new key alongside pass/fail/retry. Test count for headless-uok-status.test.mjs: 30/30 (+2 new — column present in header, distinguishable from fail in row). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 18:46:28 +02:00
Mikael Hugo	7794208340	test(uok,slice-3b): cover ctx propagation through gate-runner, phases, plan-slice Adds focused unit tests for the slice-3b wiring: - UokGateRunner.run emits surface/runControl/permissionProfile/parentTrace on all three trace paths (normal, unknown-gate, circuit-breaker-blocked) and omits them when absent. - buildAutonomousUokContext pins surface=autonomous + runControl=autonomous and derives permissionProfile from session/prefs (YOLO → low, prefs.permissionLevel honored, "high" default). - emitAutonomousGate forwards the schema-v2 ctx into UokGateRunner (covers the phases-pre-dispatch / phases-guards call sites via the new shared helper). - handlePlanSlice options.uokContext lands on every seeded Q3-Q8 quality_gates row; without it, rows stay in the legacy null shape. Refactors phases-pre-dispatch and phases-guards to call the new emitAutonomousGate helper so the three sites stay in sync going forward. phases-finalize keeps its inline UokGateRunner because the verification gate's execute callback isn't a static verdict. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 18:33:26 +02:00
Mikael Hugo	95ea9eecee	feat(uok,plan-slice): seed Q3-Q8 gate rows with schema-v2 ctx from autonomous session Slice 3b of "Make UOK the SF Control Plane". handlePlanSlice now accepts an optional uokContext option and threads it into every insertGateRow call (Q3, Q4 slice gates; Q5, Q6, Q7 per task; Q8 slice closeout). executePlanSlice derives the ctx from the singleton autonomous session when one is active — currentTraceId becomes the v2 traceId/parentTrace, surface and runControl are pinned to "autonomous", permissionProfile follows session/prefs. Tools invoked outside an autonomous loop (interactive REPL, headless one-shot) pass uokContext=null and the seeded rows fall through to the legacy NULL-column shape, classified as "legacy" by status uok. Lazy import of auto/session.js keeps headless/test code paths from paying the session-singleton load cost when they don't need it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 18:20:32 +02:00
Mikael Hugo	a2c55d5fde	feat(uok,autonomous-loop): wire pre-dispatch/guard/finalize gates to schema-v2 ctx Slice 3b of "Make UOK the SF Control Plane". The autonomous loop's three high-traffic gate sites (resource-version-guard, pre-dispatch-health-gate, planning-flow-gate in phases-pre-dispatch; plan-gate in phases-guards; unit-verification-gate in phases-finalize) now build a schema-v2 UOK run-context per iteration and pass surface/runControl/permissionProfile/parentTrace into the gate runner. The gate-runner emits these onto every gate_run trace event, so the classifier in `sf headless status uok --json` reads them as coverageStatus: "ok" instead of "legacy". New helper uok/auto-uok-ctx.js pins surface="autonomous" and runControl="autonomous" for these phases and derives permissionProfile from session/prefs: "low" under YOLO or a minimal/low permissionLevel, "medium" for medium, "high" otherwise (the default). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 18:18:17 +02:00
Mikael Hugo	7003da3f6a	test(uok): assert triage-apply-mutation-gate fires after agree-path Codex audit (Q4) flagged that the mutation gate landed in slice 3a but the test suite only verified the three earlier gates. Add coverage: - agree-path: mutation-gate fires with outcome=fail, rejectedCount=1, resolvedCount=0 (the test fixture has no real ledger entry for the decision id, so markResolved rejects it — the gate correctly surfaces the partial failure) - disagree-path: mutation-gate does NOT fire (apply phase skipped) Pins the 4-gate contract end-to-end. Suite: 4/4 in this file. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 18:16:04 +02:00
Mikael Hugo	cf52aceb64	feat(uok,gate-runner): extend ctx with surface/runControl/permissionProfile/parentTrace Slice 3b of "Make UOK the SF Control Plane". UokGateRunner.run now reads the schema-v2 run-context fields off ctx and propagates them into every gate_run trace event (unknown-gate path, circuit-breaker-blocked path, normal execution path). Fields are omitted when absent so legacy callers keep the pre-v2 shape and status-uok continues to classify them as "legacy" rather than "incomplete". Helper buildGateRunEvent centralizes the trace shape so the three sites stay in sync. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 18:13:45 +02:00
Mikael Hugo	61d3031007	test(uok): fail-closed contract for triage-apply gate emission Adds the missing test case that confirms the fail-closed semantics the parallel worker shipped in slice 3a: when the trace writer cannot persist a UOK gate record (e.g. .sf/traces is unwritable), runTriageApply MUST abort before any subagent runs and surface the emission failure as the run error. This pins down the contract codex Q5 noted as soft: enrichment failures are debug-only, but PRIMARY gate emission for the apply flow is hard-required. Without observable gates, an apply that mutates the ledger has no audit trail — refusing is the right call. Test asserts: trace-dir write failure → ok=false, error contains "UOK gate emission failed for trusted-agent-source-gate", and the mocked agentRunner was never invoked. Suite: 1682/1682. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 18:08:29 +02:00
Mikael Hugo	454e051aed	feat(uok): slice 3a — triage --apply emits 4 schema-v2 UOK gates First production caller of the schema-v2 writer chain. Every `sf headless triage --apply` invocation now emits four gate_run trace events with surface=headless, runControl=supervised, permissionProfile= high, traceId=flowId — making the gates visible in `status uok --json` with coverageStatus: "ok" (or fail/manual-attention on reject paths). Gates emitted, in order: 1. trusted-agent-source-gate — fires on the trust precondition: pass: both triage-decider and rubber-duck are SF-shipped built-ins fail: missing-agent OR non-builtin source OR untrusted custom runner (covers all three pre-dispatch refusal paths so operators see the failure in status uok, not just in the journal) 2. triage-plan-validation-gate — fires on the strict-parse contract: pass: parseTriagePlanStrict returns a valid plan covering expectedIds fail: missing marker / bad yaml / unknown id / outcome-required field missing 3. triage-apply-review-gate — fires on the rubber-duck verdict: pass: rubber-duck: agree → apply phase proceeds fail: rubber-duck disagreed → clean pause, no mutations manual-attention: rubber-duck subagent failed to complete 4. triage-apply-mutation-gate — fires after applyTriagePlan: pass: every approved mutation landed fail: any rejected mutation manual-attention: zero approved mutations (all decisions were "fix") Includes counts in extra: resolvedCount, rejectedCount, pendingFixCount. Reader-side fixes (codex review follow-up on slice 3a): - getDistinctGateIds (sf-db-gates.js) now UNIONs trace-event IDs with quality_gates DB IDs instead of returning trace IDs early when any exist. The old behavior silently hid slice-scoped DB-only gates the moment a flow-scoped trace landed. - getGateMeta (headless-uok-status.ts) now reads BOTH trace events and DB row, then picks whichever has the later evaluatedAt. Tie-break prefers trace (flow-scoped gates with no quality_gates FK row are trace-only). 
Old behavior preferred trace whenever surface was set, regardless of timestamp. Live verification: ran `sf headless triage --apply` 4 times against the operator's environment (rubber-duck is a project-level override). trusted-agent-source-gate now shows in `sf headless status uok --json` with total: 4, fail: 4, coverageStatus: "ok" — proving the schema-v2 metadata round-trips through the trace events and reaches the classifier. Tests: - headless-triage-uok-gates.test.ts (3 new tests): agree path emits 3 pass gates with v2 metadata; disagree path emits review fail; unknown-id path emits validation fail with no review gate. - Existing test suites adjusted for the GateMetadataRow → GateRunContextRow rename (classifier helpers renamed consistently across .ts source and the .mjs test mirror). - Full SF + headless apply: 1681/1681. Still legacy in production (slice 3b targets these next): - phases-pre-dispatch.js gates: resource-version-guard, pre-dispatch-health-gate, planning-flow-gate. None of these pass uokContext yet. - phases-unit.js gates: unit-verification-gate, plan-gate. - plan-slice.js: Q3/Q4/Q5/Q6/Q7/Q8 seed gates. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 18:04:50 +02:00
Mikael Hugo	f0c57b58c6	feat(uok): slice 2 — schema-v2 metadata adapter + writer chain Second slice of "Make UOK the SF Control Plane". Wires the DB-level capability for schema-v2 gate metadata so future callers can flip quality_gates rows from "legacy" to "ok"/"stale"/"incomplete" by passing a canonical uokContext. No production caller passes ctx yet — slice 3 wires producers (headless triage --apply, phases-pre-dispatch, phases-unit). Schema migration v66 (SCHEMA_VERSION bumped 65 → 66): - quality_gates gains 5 nullable columns: surface, run_control, permission_profile, trace_id, parent_trace. - Idempotent ALTERs via PRAGMA table_info probes — fresh-DB CREATE path already includes the columns; migration only ALTERs older DBs. - Existing rows keep NULL across the new columns, so classifyCoverage in headless-uok-status reads them as "legacy" — no day-one warning flood. New adapter src/resources/extensions/sf/uok/run-context.js: - buildUokRunContext(opts) validates and normalizes the canonical camelCase shape: surface, runControl, permissionProfile, traceId (required), plus parentTrace, unitType, unitId, milestoneId, sliceId, taskId (optional). Frozen on success, null on any invalid or missing required field. - VALID_SURFACES / VALID_RUN_CONTROLS / VALID_PERMISSION_PROFILES enums reject typos at build time so we don't get silent schema-v2 rows with garbage in the enum columns. - uokRunContextToGateColumns(ctx) translates camelCase → snake_case column shape used by sf-db-gates writers. Writer chain (sf-db-gates.js): - insertGateRow now imports uokRunContextToGateColumns and translates g.uokContext (canonical camelCase) to the SQL column shape. Callers pass canonical ctx, the DB writer owns translation. NULL on legacy callers, NULL on malformed ctx. - saveGateResult mirrors the same translation; uses COALESCE(:col, col) so a missing ctx on a follow-up update preserves the row's existing schema-v2 metadata instead of nulling it. 
Reader chain (headless-uok-status.ts): - getGateMeta SELECTs surface, run_control, permission_profile, trace_id alongside scope and evaluated_at. ORDER BY uses "evaluated_at IS NULL, evaluated_at DESC" for cross-SQLite safety (NULLS LAST is not portable). - classifyCoverage signature changed from (entry, metadataPresent: bool) to (entry, meta: GateMetadataRow). Returns "incomplete" when surface is set but runControl/permissionProfile/traceId missing — surfaces buggy writers instead of silently classifying as "ok". Tests: - uok-run-context.test.mjs (12 tests): adapter validation, enum rejection, optional-field handling, frozen output, column translation. - uok-quality-gates-writer.test.mjs (5 tests): real DB round-trip proving insertGateRow + saveGateResult populate schema-v2 columns from canonical camelCase ctx, leave NULL on legacy/malformed, and preserve existing metadata via COALESCE on no-ctx updates. - headless-uok-status.test.mjs adjusted: classifier now takes GateMetadataRow; added test for "incomplete" classification. - sf-db-migration.test.mjs bumped expected version 65 → 66 and asserts the 5 new quality_gates columns exist. Full SF suite: 1678/1678 ✓ (+17 from slice 2 + +9 from slice 1). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 17:48:05 +02:00
Mikael Hugo	c058bef26d	feat(uok-status): slice 1 — schema v2 + coverage classification + legacy tagging First slice of "Make UOK the SF Control Plane". Ships the operator-facing visibility primitive that subsequent slices fill in. No enforcement yet, no new gates yet — just the contract. Changes to sf headless status uok: - Bumps JSON output to schemaVersion: 2. - Adds coverageStatus per gate (ok | stale | incomplete | missing | legacy). Slice 1 only populates ok / stale / legacy: - legacy: row predates schema-v2 metadata (every existing row today). NOT a warning — operators are not paged for the rich history of pre-v2 records. - stale: schema-v2 row with no runs in window, OR last run older than the 24h stale threshold. Surfaces gates that stopped being exercised. - ok: schema-v2 row with recent runs in window. incomplete / missing wait for the schema-v2 writer adapter (slice 2) and the configured-gate registry (later). - Adds the Coverage column to the human table output. - Removes the stale "missing getDistinctGateIds import" workaround comment from headless-uok-status.ts:104. The import exists today (gate-runner.js:5); the comment was lying. Bypassing UokGateRunner.getHealthSummary is still appropriate but for a different reason — documented inline. Tests (28 total, +9 new): - classifyCoverage: legacy wins over freshness; ok requires metadata + recent runs; stale fires on no-runs-in-window or last-run > 24h. - empty-DB does not false-positive coverage warnings (the bug codex called out in the plan review). - formatTable includes the Coverage column and renders each status distinctly. hasSchemaV2Metadata is a placeholder that returns false today; it will read row.surface / row.run_control / row.permission_profile when those columns ship in slice 2. Next slice: adapter foundation — start writing schema-v2 metadata into new gate rows from headless and autonomous paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 17:35:52 +02:00
Mikael Hugo	12f5eb2279	feat(triage): wire --apply CLI + canonical resolve_issue evidence kinds Three coupled changes that together complete the operator-facing --apply surface for sf headless triage: 1. headless.ts: parse --apply from commandArgs and forward to handleTriage. The triage option flow now distinguishes inspect (--list, --json), one-shot (--run), and orchestrated apply (--apply) cleanly. 2. help-text.ts: triage subcommand line + examples block now document the --apply mode (triage-decider → rubber-duck pipeline). 3. bootstrap/db-tools.js: resolve_issue tool now accepts the full canonical evidence-kind set instead of hardcoding "agent-fix": - agent-fix (default; commit-based fix evidence) - human-clear (stale, superseded, false positive, intentional close) - promoted-to-requirement (with required requirement_id) The tool surfaces a clear error when promoted-to-requirement is used without requirement_id. The promptGuidelines updated to walk callers through choosing the right kind. self-feedback-db.test.mjs extended with coverage for all three evidence kinds + the missing-requirement_id rejection path. Together these make sf headless triage --apply genuinely useful: the agent can produce a plan with any outcome, rubber-duck reviews it, and the runner applies via resolve_issue with the right evidence kind per decision. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 17:23:10 +02:00
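
A rough sketch of the evidence-kind handling described in the commit above: `agent-fix` as the default, and `promoted-to-requirement` rejected when `requirement_id` is absent. The three kind names and `requirement_id` come from the commit message; the `evidence_kind` field name, the `ResolveIssueInput` and `ResolvedEvidence` types, and the error wording are hypothetical and may differ from the actual resolve_issue tool in bootstrap/db-tools.js.

```typescript
type EvidenceKind = "agent-fix" | "human-clear" | "promoted-to-requirement";

// Hypothetical input shape; only requirement_id is named in the commit message.
interface ResolveIssueInput {
  evidence_kind?: string;
  requirement_id?: string;
}

interface ResolvedEvidence {
  kind: EvidenceKind;
  requirementId?: string;
}

const EVIDENCE_KINDS: readonly string[] = [
  "agent-fix",
  "human-clear",
  "promoted-to-requirement",
];

function isEvidenceKind(value: string): value is EvidenceKind {
  return EVIDENCE_KINDS.indexOf(value) !== -1;
}

function resolveEvidence(input: ResolveIssueInput): ResolvedEvidence {
  // agent-fix is the default when no kind is supplied.
  const kind = input.evidence_kind ?? "agent-fix";
  if (!isEvidenceKind(kind)) {
    throw new Error(`Unknown evidence kind: ${kind}`);
  }
  if (kind === "promoted-to-requirement") {
    // The commit requires requirement_id for this kind and a clear error otherwise.
    if (!input.requirement_id) {
      throw new Error("promoted-to-requirement requires requirement_id");
    }
    return { kind, requirementId: input.requirement_id };
  }
  return { kind };
}
```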
Mikael Hugo	1881918ab8	feat(subagent): prompt-parts runtime — canonical named-parts composition New module: src/resources/extensions/sf/subagent/prompt-parts.js. Replaces the copilot-shaped boolean include* matrix with a canonical SF-native form: promptParts: [aiSafety, toolInstructions, parallelToolCalling, customAgentInstructions, environmentContext, agentBody, ...] Each part is a registered renderer (PROMPT_PARTS) that emits a specific section text given context. composeAgentPrompt orders parts deterministically, deduplicates, and concatenates with consistent separators. validatePromptParts rejects unknown keys at agent-load time so typos surface immediately instead of silently producing an empty section. Integrated into: - subagent/agents.js: validateAgentDefinition runs the new validator at agent discovery; built-in agents must validate (project/user agents with invalid promptParts get skipped). - subagent/index.js: dispatch path uses composeAgentPrompt to assemble the runtime system prompt. - unit-context-manifest.js: unit-type manifests declare their promptParts allowlist; validation runs against the same registry so unit dispatch and agent dispatch share one canonical schema. - agents/rubber-duck.agent.yaml: converted from the boolean include* form to the canonical array form. Tests: - subagent-agent-yaml.test.mjs: validates the array shape, rejects unknown part keys, asserts built-in agents validate cleanly, project overrides win. - unit-context-manifest-prompt-parts.test.mjs (new): asserts every unit-type manifest's promptParts is valid per the registry. The copilot boolean-include shape is intentionally NOT supported: this is the SF-native canonical form, simpler to read and harder to typo (no silent no-op for misspelled keys). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 17:22:26 +02:00
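
The composition behaviour described in the commit above (a registry of named renderers, deterministic ordering, deduplication, consistent separators, and rejection of unknown keys) could look roughly like this. It is a sketch under stated assumptions, not the contents of prompt-parts.js: the names `PROMPT_PARTS`, `composeAgentPrompt`, and `validatePromptParts` come from the commit message, while the registry shape, the `PromptContext` type, the placeholder section text, the use of registry order as the canonical order, and the blank-line separator are all invented for illustration.

```typescript
// Hypothetical render context; the real context fields are not shown in the log.
interface PromptContext {
  agentBody: string;
  cwd: string;
}

interface PromptPart {
  key: string;
  render: (ctx: PromptContext) => string;
}

// Assumption: registry order doubles as the canonical output order.
const PROMPT_PARTS: readonly PromptPart[] = [
  { key: "aiSafety", render: () => "[safety rules]" },
  { key: "toolInstructions", render: () => "[tool instructions]" },
  { key: "environmentContext", render: (ctx) => `cwd: ${ctx.cwd}` },
  { key: "agentBody", render: (ctx) => ctx.agentBody },
];

// Reject unknown keys up front so a typo fails loudly instead of
// silently producing an empty section.
function validatePromptParts(parts: readonly string[]): void {
  const unknown = parts.filter(
    (key) => !PROMPT_PARTS.some((part) => part.key === key),
  );
  if (unknown.length > 0) {
    throw new Error(`Unknown prompt parts: ${unknown.join(", ")}`);
  }
}

function composeAgentPrompt(parts: readonly string[], ctx: PromptContext): string {
  validatePromptParts(parts);
  // Walking the registry (not the request) gives a deterministic order
  // and drops duplicate keys as a side effect.
  return PROMPT_PARTS
    .filter((part) => parts.indexOf(part.key) !== -1)
    .map((part) => part.render(ctx))
    .join("\n\n");
}
```

With this sketch, requesting `["agentBody", "aiSafety", "agentBody"]` yields the safety section followed by the agent body, each exactly once, while a misspelled key such as `"aiSaftey"` throws during validation.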
Mikael Hugo	f038f2a072	fix(uok-gate-runner): use correct getRelevantMemoriesRanked API The "Memory enrichment failed for gate test: DB error" warning in test output was a real API mismatch, not a benign degradation. The previous code called getRelevantMemoriesRanked(embedding, "gotcha", 2) but the canonical signature is getRelevantMemoriesRanked(query, limit). Replace the embedding-based call with a query-string built from gateId + failureClass + rationale, and pass limit=2. The embedding helper (computeGateEmbedding) is removed entirely since the memory store does its own embedding internally. Also switch the enrichment-failure log from logWarning to debugLog — gate enrichment is best-effort and must not affect gates, so the failure path should not surface as a warning to operators. Test fixture updated to assert against the new API call shape. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 17:21:18 +02:00
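
The fix in the commit above replaces an embedding argument with a query string built from gateId, failureClass, and rationale, passes limit=2, and routes failures to a debug log. A minimal sketch of that call shape follows. The `(query, limit)` signature and the three field names come from the commit message; the `GateFailure` type, the injected `lookup` and `debugLog` parameters, the synchronous return type, and the space separator are assumptions for the example (the real store may well be asynchronous).

```typescript
// Hypothetical failure record carrying the three fields named in the commit.
interface GateFailure {
  gateId: string;
  failureClass: string;
  rationale: string;
}

// Stand-in for getRelevantMemoriesRanked(query, limit); injected so the sketch
// stays self-contained.
type MemoryLookup = (query: string, limit: number) => string[];

function buildMemoryQuery(failure: GateFailure): string {
  // Assumption: fields are joined with single spaces, skipping empty ones.
  return [failure.gateId, failure.failureClass, failure.rationale]
    .filter((field) => field.length > 0)
    .join(" ");
}

function enrichWithMemories(
  failure: GateFailure,
  lookup: MemoryLookup,
  debugLog: (message: string) => void,
): string[] {
  try {
    return lookup(buildMemoryQuery(failure), 2);
  } catch (err) {
    // Enrichment is best-effort: log at debug level and continue without memories.
    debugLog(`Memory enrichment failed for gate ${failure.gateId}: ${String(err)}`);
    return [];
  }
}
```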
Mikael Hugo	6851869c00	refactor(auto): rename promptParts → promptCacheSplit in run-unit path The cache-split signal {before, after} was named promptParts in the autonomous-unit dispatch path, overloading the same term that .agent.yaml uses for declarative prompt-section composition. With the prompt-parts runtime landing as canonical (`aiSafety`, `toolInstructions`, ...), the overload becomes confusing — promptParts now means "list of declarative section keys", not "before/after cache-split tuple". Renames in run-unit.js, phases-unit.js (call site), and run-unit.test.mjs. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 17:20:59 +02:00

4473 commits