Commit graph

4463 commits

Author SHA1 Message Date
Mikael Hugo
903cdd4d9d feat(subagent): event streaming for in-process runSubagent
Add RunSubagentOptions.onEvent callback so callers (TUI live update panel
for /delegate, /rubber-duck, etc.) get every session event without polling.
Errors from the callback are caught so a buggy caller cannot crash the agent.

Chain caller-supplied AbortSignal through a local AbortController in
runSingleAgent and register it in a new liveSubagentControllers set so
stopLiveSubagents aborts in-process subagents alongside the legacy spawn-based
processes (cmux split, sift codebase_search).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 04:04:52 +02:00
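
Sketch for 903cdd4d9d (illustrative only, not taken from the commit): one way the two mechanisms described above could look. The names `emitSafely`, `chainAbort`, `stopLiveSubagentsSketch` and the event shape are assumptions; only `onEvent` and `liveSubagentControllers` are named in the message.

```typescript
// Hypothetical event shape; the real session event type is not shown in the commit.
type SessionEventSketch = { type: string; payload?: unknown };
type OnEventSketch = (event: SessionEventSketch) => void;

// Deliver an event to the caller and swallow anything the callback throws,
// so a buggy listener cannot crash the agent.
function emitSafely(onEvent: OnEventSketch | undefined, event: SessionEventSketch): boolean {
  if (!onEvent) return false;
  try {
    onEvent(event);
    return true;
  } catch {
    return false;
  }
}

// Registry of controllers for in-process subagents (name from the commit message).
const liveSubagentControllers = new Set<AbortController>();

// Create a local controller that also aborts when the caller's signal aborts,
// and register it so a global stop can reach it.
function chainAbort(callerSignal?: AbortSignal): AbortController {
  const local = new AbortController();
  if (callerSignal) {
    if (callerSignal.aborted) {
      local.abort();
    } else {
      callerSignal.addEventListener("abort", () => local.abort(), { once: true });
    }
  }
  liveSubagentControllers.add(local);
  return local;
}

// Abort every registered in-process subagent and clear the registry.
function stopLiveSubagentsSketch(): number {
  const count = liveSubagentControllers.size;
  liveSubagentControllers.forEach((controller) => controller.abort());
  liveSubagentControllers.clear();
  return count;
}
```

A real implementation would presumably also remove a controller from the set once its subagent finishes; that bookkeeping is omitted here.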
Mikael Hugo
62f886430c fix: run subagents in process by default 2026-05-15 03:59:34 +02:00
Mikael Hugo
8b0f0bbd65 fix: harden headless dogfood self-healing 2026-05-15 03:53:15 +02:00
Mikael Hugo
3ac5aede1e fix: repair headless runtime self-healing 2026-05-15 03:33:29 +02:00
Mikael Hugo
72c3811a7b feat(auto): auto-triage TODO.md on each autonomous cycle
- Add autoTriageTodo() helper that checks root TODO.md for raw dump notes
  beyond the empty template before each autonomous cycle
- Lazy-imports buildTodoTriageLLMCall + triageTodoDump from commands-todo.js
  to avoid startup overhead
- Triage results written to DB backlog with clear=true + backlog=true
- Best-effort: never blocks autonomous loop on triage failure
- Fast-path skips when TODO.md is empty template or doesn't exist

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 03:19:13 +02:00
Mikael Hugo
ca7ff554c3 feat(swarm): integrate LLM runner into AgentSwarm.run()
- Make AgentSwarm.run() async with optional enableLLM flag
- Wire runAgentTurn from agent-runner.js into all 4 topologies
  (round_robin, supervisor, dynamic, sleeptime)
- Update drainSleeptimeQueue to use runAgentTurn for actual LLM
  execution instead of passive inbox reading
- Export runAgentTurn, runAgentLoop, runSwarmTurn from uok/index.js
- Update PersistentAgent JSDoc to reflect that the runner exists
- Fix test imports after extension consolidation (ttsr, google-search)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 03:05:01 +02:00
Mikael Hugo
f6619b792c refactor(extensions): move cmux into sf extension as internal module
cmux was a standalone extension directory with no extension-manifest.json,
functioning as a utility library for the sf extension. Moving it into sf/cmux/
makes the dependency explicit and removes the orphaned extension directory.

Import paths updated:
- commands-cmux.js, notifications.js, auto.js: ../cmux → ./cmux
- bootstrap/system-context.js: ../../cmux → ../cmux
- subagent/index.js: ../../cmux → ../cmux

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 02:34:35 +02:00
Mikael Hugo
534ed85ee1 refactor(extensions): merge google-search into search-the-web
Google Search was a standalone extension providing a single tool
(google_search) that used Gemini's Google Search grounding feature.
It had fallback logic to search-the-web providers (Tavily, Brave) when
Google OAuth was unavailable.

Merging it into search-the-web consolidates all web search capabilities
into one extension and eliminates the tight coupling between the two.

Changes:
- Copied google-search tool logic into search-the-web/tool-google-search.js
- Added registerGoogleSearchTool / resetGoogleSearchCache exports
- Integrated into search-the-web/index.js deferred loading
- Added google_search to search-the-web extension-manifest.json tools
- Deleted google-search/ extension directory

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 02:33:05 +02:00
Mikael Hugo
f0c3eaf999 refactor(extensions): merge ttsr into guardrails
TTSR (Time Traveling Stream Rules) monitored streaming output against regex
patterns. Guardrails blocked dangerous actions and redacted secrets. Both are
safety/guardrail concerns — merging them into one extension reduces surface
area and simplifies the safety model.

Changes:
- Copied ttsr-rule-loader.js, ttsr-manager.js, ttsr-interrupt.md into guardrails/
- Updated guardrails extension-manifest.json with ttsr hooks (turn_start,
  message_update, turn_end, agent_end)
- Integrated TTSR session_start/turn_start/message_update/turn_end/agent_end
  handlers into guardrails/index.js
- Deleted ttsr/ extension directory

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 02:28:40 +02:00
Mikael Hugo
2d5a05a48b fix(security): resolve 7 findings from full-repo code review
- Create web/middleware.ts to authenticate all API routes via bearer token
  and origin checks (previously unauthenticated due to missing middleware file)

- Fix path traversal in browse-directories: replace startsWith with
  realpathSync + relative + isAbsolute containment checks

- Fix XSS in session HTML export: escape raw HTML blocks via marked renderer

- Fix PTY process leak: destroy session on SSE stream cancellation

- Fix unhandled exception in terminal sessions POST: wrap getOrCreateSession
  in try/catch with structured JSON error response

- Fix silent child-process failure in headless dispatch: add exit handler
  to write failed claim when sf headless triage exits non-zero

- Fix TypeError on malformed claim JSON: add Array.isArray guard before
  accessing claim.ids.length

All changes type-check cleanly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 02:18:43 +02:00
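
Sketch for 2d5a05a48b (illustrative only): the path-traversal fix replaces a `startsWith` prefix test with `realpathSync` + `relative` + `isAbsolute`. A minimal version of that containment check might look like the following; the function names are hypothetical and the actual browse-directories code is not shown in the commit.

```typescript
import { realpathSync } from "node:fs";
import { isAbsolute, relative, sep } from "node:path";

// Pure containment check on two already-resolved absolute paths.
// A plain prefix test would wrongly accept "/srv/data-evil" for root "/srv/data".
function isWithinResolved(realRoot: string, realCandidate: string): boolean {
  const rel = relative(realRoot, realCandidate);
  if (rel === "") return true; // the root itself
  if (isAbsolute(rel)) return false; // e.g. a different drive on Windows
  return rel !== ".." && !rel.startsWith(".." + sep);
}

// Resolve symlinks first, then check containment.
// Paths that cannot be resolved fail closed.
function isInsideRoot(root: string, candidate: string): boolean {
  try {
    return isWithinResolved(realpathSync(root), realpathSync(candidate));
  } catch {
    return false;
  }
}
```

Resolving both sides with `realpathSync` before comparing means a symlink inside the root that points outside it is rejected as well.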
Mikael Hugo
def1edefa9 sf snapshot: uncommitted changes after 268m inactivity 2026-05-15 02:08:06 +02:00
Mikael Hugo
7e1631618a fix(self-feedback-drain): route inline-fix dispatch via 'sf headless triage --apply' when SF_HEADLESS=1
The existing dispatch used pi.sendMessage to queue a chat followUp.
That works in interactive sf sessions but no chat agent is listening
in 'sf headless' / autonomous flows — the message is queued and never
delivered, leaving the high/critical blocker active on every iteration.

When SF_HEADLESS=1, spawn the same triage-decider → review-code pipeline
(via the already-shipped 'sf headless triage --apply' subprocess) instead.
The autonomous loop then sees resolved entries via DB on the next gate
check, no chat agent required.

Forge-only: the dispatcher still only operates in the SF repo itself —
`readAllSelfFeedback` for non-forge repos returns the upstream-feedback
log (SF developer work), which must not be auto-dispatched from inside
consumer projects. Documented that constraint inline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 21:39:47 +02:00
Mikael Hugo
b0ebe7ce18 fix: register sf stop command outside tui 2026-05-14 21:30:00 +02:00
Mikael Hugo
2e4bdd292c fix: keep hidden sf commands callable in print mode 2026-05-14 21:25:18 +02:00
Mikael Hugo
ccdf530488 fix(auto-prompts): add missing join import from node:path
auto-prompts.js called `join(base, ...)` in 11 places but only imported
`basename` from node:path. Crashed autonomous mode every iteration with
ReferenceError: join is not defined — observed in dr repo, 3 consecutive
iteration failures triggered the hard stop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 21:19:09 +02:00
Mikael Hugo
a3b68bb269 fix(env): align SF_PERMISSION_LEVEL enum with permission-profile values
Schema now accepts the same five levels used elsewhere in the codebase
(minimal/low/medium/high/bypassed) instead of the stale full/restricted/
sandbox triple. Docs and env test updated to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 21:11:36 +02:00
Mikael Hugo
f88b48b0aa fix: show print mode liveness 2026-05-14 20:59:19 +02:00
Mikael Hugo
487237a32c fix: bound sf print mode and chat routing 2026-05-14 20:55:00 +02:00
Mikael Hugo
b19096800b fix(triage-apply): 8-minute watchdog on agent dispatch subprocess
Observed 2026-05-14: a triage --apply run hung for 33+ minutes because
the spawned subagent process stalled (provider SDK call without its own
timeout) and defaultAgentRunner had no watchdog — it waited indefinitely
on proc.on("close").

Adds a per-dispatch watchdog (default 8 min, override via
SF_TRIAGE_AGENT_TIMEOUT_MS env). On expiry: SIGTERM → 5s grace →
SIGKILL. Resolves immediately with ok=false / exitCode=124 (POSIX
timeout convention) so the trust / review / mutation gates surface
the failure as a real outcome instead of a silent stall.

Provider-agnostic: the timeout protects the orchestrator regardless of
which model the router picks. Operators running long-context provider
calls can bump the env var; default 8min matches runTriage /
runReflection's existing completeSimple timeout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 20:28:05 +02:00
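
Sketch for b19096800b (illustrative only): a per-dispatch watchdog following the SIGTERM, grace period, SIGKILL sequence described above. `runWithWatchdog`, `resolveTimeoutMs`, and the result shape are hypothetical names; the real defaultAgentRunner is not shown in the commit. Only the 8-minute default, the 5 s grace period, and exit code 124 come from the message.

```typescript
import { spawn } from "node:child_process";

type DispatchResultSketch = { ok: boolean; exitCode: number; timedOut: boolean };

const DEFAULT_TIMEOUT_MS = 8 * 60 * 1000;

// Parse an override such as SF_TRIAGE_AGENT_TIMEOUT_MS; fall back to the default
// when the value is missing, non-numeric, or not positive.
function resolveTimeoutMs(raw: string | undefined): number {
  const parsed = Number(raw);
  return isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_TIMEOUT_MS;
}

function runWithWatchdog(
  command: string,
  args: string[],
  timeoutMs: number,
  graceMs = 5000,
): Promise<DispatchResultSketch> {
  return new Promise((resolve) => {
    const proc = spawn(command, args, { stdio: "ignore" });
    let settled = false;
    let killTimer: ReturnType<typeof setTimeout> | undefined;

    const finish = (result: DispatchResultSketch) => {
      if (settled) return;
      settled = true;
      resolve(result);
    };

    // On expiry: ask politely, escalate after the grace period,
    // and report the timeout to the caller right away.
    const watchdog = setTimeout(() => {
      proc.kill("SIGTERM");
      killTimer = setTimeout(() => proc.kill("SIGKILL"), graceMs);
      finish({ ok: false, exitCode: 124, timedOut: true });
    }, timeoutMs);

    const clearTimers = () => {
      clearTimeout(watchdog);
      if (killTimer) clearTimeout(killTimer);
    };

    proc.on("error", () => {
      clearTimers();
      finish({ ok: false, exitCode: 1, timedOut: false });
    });

    proc.on("close", (code) => {
      clearTimers();
      const exitCode = code ?? 1;
      finish({ ok: exitCode === 0, exitCode, timedOut: false });
    });
  });
}
```

A caller could pass `resolveTimeoutMs(process.env.SF_TRIAGE_AGENT_TIMEOUT_MS)` as `timeoutMs`, so operators can raise the limit without code changes.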
Mikael Hugo
7cb1eef948 feat: record sf chat workflow evidence 2026-05-14 20:27:53 +02:00
Mikael Hugo
47867c1236 feat: route clear sf chat commands 2026-05-14 20:21:37 +02:00
Mikael Hugo
ab1a1edcf9 refactor: tier sf slash commands 2026-05-14 20:14:09 +02:00
Mikael Hugo
587b5fa31c refactor: narrow sf slash surface 2026-05-14 20:04:53 +02:00
Mikael Hugo
5ce9df2e37 refactor: make bundled agents internal 2026-05-14 19:54:56 +02:00
Mikael Hugo
18aa257ede refactor: rename review gate agent 2026-05-14 19:43:01 +02:00
Mikael Hugo
62fbc5d57b refactor: align agent resource overlays 2026-05-14 19:32:41 +02:00
Mikael Hugo
7000373e88 fix(uok-status): surface manualAttention bucket in status uok output
Codex audit follow-up (fix A). manual-attention outcomes were counted
by getGateRunStats but dropped from the user-facing surface — they
inflated `total` invisibly with no distinct column or key, so an
operator couldn't tell a gate with 5 pass / 3 manual-attention apart
from a gate with 5 pass / 3 fail.

Adds `manualAttention: number` to GateHealthEntry and renders it as
its own column between Fail and Retry in the human table. JSON
consumers get the new key alongside pass/fail/retry.

Test count for headless-uok-status.test.mjs: 30/30 (+2 new — column
present in header, distinguishable from fail in row).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 18:46:28 +02:00
Mikael Hugo
7794208340 test(uok,slice-3b): cover ctx propagation through gate-runner, phases, plan-slice
Adds focused unit tests for the slice-3b wiring:
  - UokGateRunner.run emits surface/runControl/permissionProfile/
    parentTrace on all three trace paths (normal, unknown-gate,
    circuit-breaker-blocked) and omits them when absent.
  - buildAutonomousUokContext pins surface=autonomous + runControl=
    autonomous and derives permissionProfile from session/prefs
    (YOLO → low, prefs.permissionLevel honored, "high" default).
  - emitAutonomousGate forwards the schema-v2 ctx into UokGateRunner
    (covers the phases-pre-dispatch / phases-guards call sites via
    the new shared helper).
  - handlePlanSlice options.uokContext lands on every seeded Q3-Q8
    quality_gates row; without it, rows stay in the legacy null shape.

Refactors phases-pre-dispatch and phases-guards to call the new
emitAutonomousGate helper so the three sites stay in sync going
forward. phases-finalize keeps its inline UokGateRunner because the
verification gate's execute callback isn't a static verdict.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 18:33:26 +02:00
Mikael Hugo
95ea9eecee feat(uok,plan-slice): seed Q3-Q8 gate rows with schema-v2 ctx from autonomous session
Slice 3b of "Make UOK the SF Control Plane". handlePlanSlice now
accepts an optional uokContext option and threads it into every
insertGateRow call (Q3, Q4 slice gates; Q5, Q6, Q7 per task; Q8
slice closeout).

executePlanSlice derives the ctx from the singleton autonomous session
when one is active — currentTraceId becomes the v2 traceId/parentTrace,
surface and runControl are pinned to "autonomous", permissionProfile
follows session/prefs. Tools invoked outside an autonomous loop
(interactive REPL, headless one-shot) pass uokContext=null and the
seeded rows fall through to the legacy NULL-column shape, classified
as "legacy" by status uok.

Lazy import of auto/session.js keeps headless/test code paths from
paying the session-singleton load cost when they don't need it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 18:20:32 +02:00
Mikael Hugo
a2c55d5fde feat(uok,autonomous-loop): wire pre-dispatch/guard/finalize gates to schema-v2 ctx
Slice 3b of "Make UOK the SF Control Plane". The autonomous loop's
three high-traffic gate sites (resource-version-guard,
pre-dispatch-health-gate, planning-flow-gate in phases-pre-dispatch;
plan-gate in phases-guards; unit-verification-gate in phases-finalize)
now build a schema-v2 UOK run-context per iteration and pass
surface/runControl/permissionProfile/parentTrace into the gate runner.

The gate-runner emits these onto every gate_run trace event, so the
classifier in `sf headless status uok --json` reads them as
coverageStatus: "ok" instead of "legacy".

New helper uok/auto-uok-ctx.js pins surface="autonomous" and
runControl="autonomous" for these phases and derives permissionProfile
from session/prefs: "low" under YOLO or a minimal/low permissionLevel,
"medium" for medium, "high" otherwise (the default).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 18:18:17 +02:00
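
Sketch for a2c55d5fde (illustrative only): the permissionProfile derivation rule stated above, written as a small pure function. The option names `yolo` and `permissionLevel` and both function names are assumptions; the actual uok/auto-uok-ctx.js helper is not shown in the commit.

```typescript
type PermissionProfileSketch = "low" | "medium" | "high";

// "low" under YOLO or a minimal/low permissionLevel,
// "medium" for medium, "high" otherwise (the default).
function derivePermissionProfileSketch(opts: {
  yolo?: boolean;
  permissionLevel?: string;
}): PermissionProfileSketch {
  if (opts.yolo) return "low";
  switch (opts.permissionLevel) {
    case "minimal":
    case "low":
      return "low";
    case "medium":
      return "medium";
    default:
      return "high";
  }
}

// Autonomous phases pin surface and runControl; only the profile varies.
function buildAutonomousCtxSketch(opts: { yolo?: boolean; permissionLevel?: string }) {
  return {
    surface: "autonomous",
    runControl: "autonomous",
    permissionProfile: derivePermissionProfileSketch(opts),
  };
}
```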
Mikael Hugo
7003da3f6a test(uok): assert triage-apply-mutation-gate fires after agree-path
Codex audit (Q4) flagged that the mutation gate landed in slice 3a but
the test suite only verified the three earlier gates. Add coverage:

- agree-path: mutation-gate fires with outcome=fail, rejectedCount=1,
  resolvedCount=0 (the test fixture has no real ledger entry for the
  decision id, so markResolved rejects it — the gate correctly surfaces
  the partial failure)
- disagree-path: mutation-gate does NOT fire (apply phase skipped)

Pins the 4-gate contract end-to-end. Suite: 4/4 in this file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 18:16:04 +02:00
Mikael Hugo
cf52aceb64 feat(uok,gate-runner): extend ctx with surface/runControl/permissionProfile/parentTrace
Slice 3b of "Make UOK the SF Control Plane". UokGateRunner.run now reads
the schema-v2 run-context fields off ctx and propagates them into every
gate_run trace event (unknown-gate path, circuit-breaker-blocked path,
normal execution path). Fields are omitted when absent so legacy callers
keep the pre-v2 shape and status-uok continues to classify them as
"legacy" rather than "incomplete".

Helper buildGateRunEvent centralizes the trace shape so the three sites
stay in sync.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 18:13:45 +02:00
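
Sketch for cf52aceb64 (illustrative only): a helper that copies the schema-v2 fields onto a trace event only when they are present, so legacy callers keep the pre-v2 shape. The base event fields (`type`, `gateId`, `outcome`) are assumptions; the real buildGateRunEvent signature is not shown in the commit.

```typescript
type GateCtxSketch = {
  surface?: string;
  runControl?: string;
  permissionProfile?: string;
  parentTrace?: string;
};

const V2_KEYS: Array<keyof GateCtxSketch> = [
  "surface",
  "runControl",
  "permissionProfile",
  "parentTrace",
];

// Build one gate_run event; v2 keys are omitted entirely when absent
// rather than being written as null or undefined.
function buildGateRunEventSketch(
  gateId: string,
  outcome: string,
  ctx: GateCtxSketch = {},
): Record<string, string> {
  const event: Record<string, string> = { type: "gate_run", gateId, outcome };
  for (const key of V2_KEYS) {
    const value = ctx[key];
    if (value !== undefined && value !== null) event[key] = value;
  }
  return event;
}
```

Omitting the keys, instead of emitting them empty, is what lets a reader tell "legacy" (no v2 fields at all) apart from "incomplete" (some v2 fields set, others missing).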
Mikael Hugo
61d3031007 test(uok): fail-closed contract for triage-apply gate emission
Adds the missing test case that confirms the fail-closed semantics
the parallel worker shipped in slice 3a: when the trace writer
cannot persist a UOK gate record (e.g. .sf/traces is unwritable),
runTriageApply MUST abort before any subagent runs and surface the
emission failure as the run error.

This pins down the contract codex Q5 noted as soft: enrichment
failures are debug-only, but PRIMARY gate emission for the apply
flow is hard-required. Without observable gates, an apply that
mutates the ledger has no audit trail — refusing is the right call.

Test asserts: trace-dir write failure → ok=false, error contains
"UOK gate emission failed for trusted-agent-source-gate", and the
mocked agentRunner was never invoked.

Suite: 1682/1682.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 18:08:29 +02:00
Mikael Hugo
454e051aed feat(uok): slice 3a — triage --apply emits 4 schema-v2 UOK gates
First production caller of the schema-v2 writer chain. Every
`sf headless triage --apply` invocation now emits four gate_run trace
events with surface=headless, runControl=supervised, permissionProfile=
high, traceId=flowId — making the gates visible in `status uok --json`
with coverageStatus: "ok" (or fail/manual-attention on reject paths).

Gates emitted, in order:

  1. trusted-agent-source-gate — fires on the trust precondition:
       pass: both triage-decider and rubber-duck are SF-shipped built-ins
       fail: missing-agent OR non-builtin source OR untrusted custom runner
       (covers all three pre-dispatch refusal paths so operators see the
       failure in status uok, not just in the journal)
  2. triage-plan-validation-gate — fires on the strict-parse contract:
       pass: parseTriagePlanStrict returns a valid plan covering expectedIds
       fail: missing marker / bad yaml / unknown id / outcome-required field missing
  3. triage-apply-review-gate — fires on the rubber-duck verdict:
       pass: rubber-duck: agree → apply phase proceeds
       fail: rubber-duck disagreed → clean pause, no mutations
       manual-attention: rubber-duck subagent failed to complete
  4. triage-apply-mutation-gate — fires after applyTriagePlan:
       pass: every approved mutation landed
       fail: any rejected mutation
       manual-attention: zero approved mutations (all decisions were "fix")
     Includes counts in extra: resolvedCount, rejectedCount, pendingFixCount.

Reader-side fixes (codex review follow-up on slice 3a):

  - getDistinctGateIds (sf-db-gates.js) now UNIONs trace-event IDs with
    quality_gates DB IDs instead of returning trace IDs early when any
    exist. The old behavior silently hid slice-scoped DB-only gates the
    moment a flow-scoped trace landed.
  - getGateMeta (headless-uok-status.ts) now reads BOTH trace events and
    DB row, then picks whichever has the later evaluatedAt. Tie-break
    prefers trace (flow-scoped gates with no quality_gates FK row are
    trace-only). Old behavior preferred trace whenever surface was set,
    regardless of timestamp.

Live verification: ran `sf headless triage --apply` 4 times against the
operator's environment (rubber-duck is a project-level override).
trusted-agent-source-gate now shows in `sf headless status uok --json`
with total: 4, fail: 4, coverageStatus: "ok" — proving the schema-v2
metadata round-trips through the trace events and reaches the
classifier.

Tests:
  - headless-triage-uok-gates.test.ts (3 new tests): agree path emits
    3 pass gates with v2 metadata; disagree path emits review fail;
    unknown-id path emits validation fail with no review gate.
  - Existing test suites adjusted for the GateMetadataRow →
    GateRunContextRow rename (classifier helpers renamed consistently
    across .ts source and the .mjs test mirror).
  - Full SF + headless apply: 1681/1681.

Still legacy in production (slice 3b targets these next):
  - phases-pre-dispatch.js gates: resource-version-guard, pre-dispatch-
    health-gate, planning-flow-gate. None of these pass uokContext yet.
  - phases-unit.js gates: unit-verification-gate, plan-gate.
  - plan-slice.js: Q3/Q4/Q5/Q6/Q7/Q8 seed gates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 18:04:50 +02:00
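
Sketch for 454e051aed (illustrative only): the two reader-side fixes above, reduced to pure functions. The record shape and function names are hypothetical; the real getGateMeta and getDistinctGateIds read from trace files and the quality_gates table, which is not modeled here.

```typescript
type GateMetaSketch = { source: "trace" | "db"; evaluatedAt: string | null };

// Pick whichever record has the later evaluatedAt; ties go to the trace record.
function pickLatestMetaSketch(
  trace: GateMetaSketch | null,
  db: GateMetaSketch | null,
): GateMetaSketch | null {
  if (!trace) return db;
  if (!db) return trace;
  const traceTime = trace.evaluatedAt ? Date.parse(trace.evaluatedAt) : -Infinity;
  const dbTime = db.evaluatedAt ? Date.parse(db.evaluatedAt) : -Infinity;
  return dbTime > traceTime ? db : trace;
}

// Union of trace-event IDs and DB IDs, deduplicated.
// Sorting is an assumption made here for stable output.
function unionGateIdsSketch(traceIds: string[], dbIds: string[]): string[] {
  return traceIds
    .concat(dbIds)
    .filter((id, index, all) => all.indexOf(id) === index)
    .sort();
}
```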
Mikael Hugo
f0c57b58c6 feat(uok): slice 2 — schema-v2 metadata adapter + writer chain
Second slice of "Make UOK the SF Control Plane". Wires the DB-level
capability for schema-v2 gate metadata so future callers can flip
quality_gates rows from "legacy" to "ok"/"stale"/"incomplete" by
passing a canonical uokContext. No production caller passes ctx yet —
slice 3 wires producers (headless triage --apply, phases-pre-dispatch,
phases-unit).

Schema migration v66 (SCHEMA_VERSION bumped 65 → 66):
  - quality_gates gains 5 nullable columns: surface, run_control,
    permission_profile, trace_id, parent_trace.
  - Idempotent ALTERs via PRAGMA table_info probes — fresh-DB CREATE
    path already includes the columns; migration only ALTERs older DBs.
  - Existing rows keep NULL across the new columns, so classifyCoverage
    in headless-uok-status reads them as "legacy" — no day-one warning
    flood.

New adapter src/resources/extensions/sf/uok/run-context.js:
  - buildUokRunContext(opts) validates and normalizes the canonical
    camelCase shape: surface, runControl, permissionProfile, traceId
    (required), plus parentTrace, unitType, unitId, milestoneId,
    sliceId, taskId (optional). Frozen on success, null on any invalid
    or missing required field.
  - VALID_SURFACES / VALID_RUN_CONTROLS / VALID_PERMISSION_PROFILES
    enums reject typos at build time so we don't get silent schema-v2
    rows with garbage in the enum columns.
  - uokRunContextToGateColumns(ctx) translates camelCase → snake_case
    column shape used by sf-db-gates writers.

Writer chain (sf-db-gates.js):
  - insertGateRow now imports uokRunContextToGateColumns and translates
    g.uokContext (canonical camelCase) to the SQL column shape. Callers
    pass canonical ctx, the DB writer owns translation. NULL on legacy
    callers, NULL on malformed ctx.
  - saveGateResult mirrors the same translation; uses COALESCE(:col,
    col) so a missing ctx on a follow-up update preserves the row's
    existing schema-v2 metadata instead of nulling it.

Reader chain (headless-uok-status.ts):
  - getGateMeta SELECTs surface, run_control, permission_profile,
    trace_id alongside scope and evaluated_at. ORDER BY uses
    "evaluated_at IS NULL, evaluated_at DESC" for cross-SQLite safety
    (NULLS LAST is not portable).
  - classifyCoverage signature changed from (entry, metadataPresent:
    bool) to (entry, meta: GateMetadataRow). Returns "incomplete" when
    surface is set but runControl/permissionProfile/traceId missing —
    surfaces buggy writers instead of silently classifying as "ok".

Tests:
  - uok-run-context.test.mjs (12 tests): adapter validation, enum
    rejection, optional-field handling, frozen output, column
    translation.
  - uok-quality-gates-writer.test.mjs (5 tests): real DB round-trip
    proving insertGateRow + saveGateResult populate schema-v2 columns
    from canonical camelCase ctx, leave NULL on legacy/malformed,
    and preserve existing metadata via COALESCE on no-ctx updates.
  - headless-uok-status.test.mjs adjusted: classifier now takes
    GateMetadataRow; added test for "incomplete" classification.
  - sf-db-migration.test.mjs bumped expected version 65 → 66 and
    asserts the 5 new quality_gates columns exist.

Full SF suite: 1678/1678 ✓ (+17 from slice 2, +9 from slice 1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 17:48:05 +02:00
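
Sketch for f0c57b58c6 (illustrative only): a validate-then-freeze adapter plus the camelCase to snake_case translation described above. The enum contents below are only the values mentioned elsewhere in this log; the real VALID_SURFACES, VALID_RUN_CONTROLS, and VALID_PERMISSION_PROFILES lists may differ, and the optional unit/milestone/slice/task fields are left out for brevity.

```typescript
// Assumed enum contents, limited to values that appear in this commit log.
const VALID_SURFACES_SKETCH = ["autonomous", "headless"];
const VALID_RUN_CONTROLS_SKETCH = ["autonomous", "supervised"];
const VALID_PERMISSION_PROFILES_SKETCH = ["low", "medium", "high"];

type UokCtxSketch = {
  surface: string;
  runControl: string;
  permissionProfile: string;
  traceId: string;
  parentTrace?: string;
};

// Returns a frozen context on success, null on any invalid or missing required field.
function buildUokRunContextSketch(opts: Partial<UokCtxSketch>): Readonly<UokCtxSketch> | null {
  const { surface, runControl, permissionProfile, traceId, parentTrace } = opts;
  if (!surface || VALID_SURFACES_SKETCH.indexOf(surface) === -1) return null;
  if (!runControl || VALID_RUN_CONTROLS_SKETCH.indexOf(runControl) === -1) return null;
  if (!permissionProfile || VALID_PERMISSION_PROFILES_SKETCH.indexOf(permissionProfile) === -1) return null;
  if (typeof traceId !== "string" || traceId.trim() === "") return null;
  const ctx: UokCtxSketch = { surface, runControl, permissionProfile, traceId };
  if (parentTrace) ctx.parentTrace = parentTrace;
  return Object.freeze(ctx);
}

// Translate the canonical camelCase shape into the SQL column shape.
// A null ctx (legacy or malformed caller) yields NULL in every column.
function toGateColumnsSketch(ctx: Readonly<UokCtxSketch> | null) {
  return {
    surface: ctx?.surface ?? null,
    run_control: ctx?.runControl ?? null,
    permission_profile: ctx?.permissionProfile ?? null,
    trace_id: ctx?.traceId ?? null,
    parent_trace: ctx?.parentTrace ?? null,
  };
}
```

Returning null instead of throwing keeps a malformed ctx from blocking the gate write; the row simply lands in the legacy NULL-column shape.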
Mikael Hugo
c058bef26d feat(uok-status): slice 1 — schema v2 + coverage classification + legacy tagging
First slice of "Make UOK the SF Control Plane". Ships the operator-
facing visibility primitive that subsequent slices fill in. No
enforcement yet, no new gates yet — just the contract.

Changes to sf headless status uok:

  - Bumps JSON output to schemaVersion: 2.
  - Adds coverageStatus per gate (ok | stale | incomplete | missing
    | legacy). Slice 1 only populates ok / stale / legacy:
      - legacy   row predates schema-v2 metadata (every existing row
                 today). NOT a warning — operators are not paged for
                 the rich history of pre-v2 records.
      - stale    schema-v2 row with no runs in window, OR last run
                 older than the 24h stale threshold. Surfaces gates
                 that stopped being exercised.
      - ok       schema-v2 row with recent runs in window.
    incomplete / missing wait for the schema-v2 writer adapter
    (slice 2) and the configured-gate registry (later).
  - Adds the Coverage column to the human table output.
  - Removes the stale "missing getDistinctGateIds import" workaround
    comment from headless-uok-status.ts:104. The import exists today
    (gate-runner.js:5); the comment was lying. Bypassing
    UokGateRunner.getHealthSummary is still appropriate but for a
    different reason — documented inline.

Tests (28 total, +9 new):
  - classifyCoverage: legacy wins over freshness; ok requires
    metadata + recent runs; stale fires on no-runs-in-window or
    last-run > 24h.
  - empty-DB does not false-positive coverage warnings (the bug
    codex called out in the plan review).
  - formatTable includes the Coverage column and renders each status
    distinctly.

hasSchemaV2Metadata is a placeholder that returns false today; it
will read row.surface / row.run_control / row.permission_profile
when those columns ship in slice 2.

Next slice: adapter foundation — start writing schema-v2 metadata
into new gate rows from headless and autonomous paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 17:35:52 +02:00
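
Sketch for c058bef26d (illustrative only): the slice-1 classification rules above as a pure function. The entry field names (`runsInWindow`, `lastRunAt`) and the explicit `now` parameter are assumptions made for testability; only the three statuses, the precedence of legacy, and the 24h threshold come from the message.

```typescript
type CoverageSketch = "ok" | "stale" | "legacy";

const STALE_THRESHOLD_MS = 24 * 60 * 60 * 1000;

function classifyCoverageSketch(
  entry: { runsInWindow: number; lastRunAt: string | null },
  metadataPresent: boolean,
  now: number,
): CoverageSketch {
  // Legacy wins over freshness: pre-v2 rows are never reported as stale.
  if (!metadataPresent) return "legacy";
  // No runs in the window means the gate stopped being exercised.
  if (entry.runsInWindow === 0 || entry.lastRunAt === null) return "stale";
  // A last run older than the threshold is also stale.
  if (now - Date.parse(entry.lastRunAt) > STALE_THRESHOLD_MS) return "stale";
  return "ok";
}
```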
Mikael Hugo
12f5eb2279 feat(triage): wire --apply CLI + canonical resolve_issue evidence kinds
Three coupled changes that together complete the operator-facing
--apply surface for sf headless triage:

1. headless.ts: parse --apply from commandArgs and forward to
   handleTriage. The triage option flow now distinguishes inspect
   (--list, --json), one-shot (--run), and orchestrated apply
   (--apply) cleanly.

2. help-text.ts: triage subcommand line + examples block now document
   the --apply mode (triage-decider → rubber-duck pipeline).

3. bootstrap/db-tools.js: resolve_issue tool now accepts the full
   canonical evidence-kind set instead of hardcoding "agent-fix":
   - agent-fix (default; commit-based fix evidence)
   - human-clear (stale, superseded, false positive, intentional close)
   - promoted-to-requirement (with required requirement_id)
   The tool surfaces a clear error when promoted-to-requirement is
   used without requirement_id. The promptGuidelines are updated to walk
   callers through choosing the right kind.

   self-feedback-db.test.mjs extended with coverage for all three
   evidence kinds + the missing-requirement_id rejection path.

Together these make sf headless triage --apply genuinely useful: the
agent can produce a plan with any outcome, rubber-duck reviews it,
and the runner applies via resolve_issue with the right evidence
kind per decision.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 17:23:10 +02:00
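
Sketch for 12f5eb2279 (illustrative only): argument validation for the three evidence kinds listed above. The parameter name `kind`, the function name, and the error wording are assumptions; `requirement_id` and the kind values come from the message.

```typescript
const EVIDENCE_KINDS_SKETCH = ["agent-fix", "human-clear", "promoted-to-requirement"];

type ResolveArgsCheckSketch = { ok: boolean; kind?: string; error?: string };

function validateResolveIssueArgsSketch(args: {
  kind?: string;
  requirement_id?: string;
}): ResolveArgsCheckSketch {
  // agent-fix is the default when no kind is supplied.
  const kind = args.kind ?? "agent-fix";
  if (EVIDENCE_KINDS_SKETCH.indexOf(kind) === -1) {
    return { ok: false, error: "unknown evidence kind: " + kind };
  }
  // promoted-to-requirement must name the requirement it was promoted to.
  if (kind === "promoted-to-requirement" && !args.requirement_id) {
    return { ok: false, error: "promoted-to-requirement requires requirement_id" };
  }
  return { ok: true, kind };
}
```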
Mikael Hugo
1881918ab8 feat(subagent): prompt-parts runtime — canonical named-parts composition
New module: src/resources/extensions/sf/subagent/prompt-parts.js.
Replaces the copilot-shaped boolean include* matrix with a canonical
SF-native form:

  promptParts: [aiSafety, toolInstructions, parallelToolCalling,
                customAgentInstructions, environmentContext,
                agentBody, ...]

Each part is a registered renderer (PROMPT_PARTS) that emits a
specific section text given context. composeAgentPrompt orders parts
deterministically, deduplicates, and concatenates with consistent
separators. validatePromptParts rejects unknown keys at agent-load
time so typos surface immediately instead of silently producing an
empty section.

Integrated into:
  - subagent/agents.js: validateAgentDefinition runs the new
    validator at agent discovery; built-in agents must validate
    (project/user agents with invalid promptParts get skipped).
  - subagent/index.js: dispatch path uses composeAgentPrompt to
    assemble the runtime system prompt.
  - unit-context-manifest.js: unit-type manifests declare their
    promptParts allowlist; validation runs against the same registry
    so unit dispatch and agent dispatch share one canonical schema.
  - agents/rubber-duck.agent.yaml: converted from the boolean
    include* form to the canonical array form.

Tests:
  - subagent-agent-yaml.test.mjs: validates the array shape, rejects
    unknown part keys, asserts built-in agents validate cleanly,
    project overrides win.
  - unit-context-manifest-prompt-parts.test.mjs (new): asserts every
    unit-type manifest's promptParts is valid per the registry.

The copilot boolean-include shape is intentionally NOT supported:
this is the SF-native canonical form, simpler to read and harder to
typo (no silent no-op for misspelled keys).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 17:22:26 +02:00
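
Sketch for 1881918ab8 (illustrative only): named-parts composition with validation, deterministic ordering, and deduplication. The renderer texts, the context shape, and the choice of registry order as the canonical order are assumptions; only the part keys and the function roles come from the message.

```typescript
type PromptCtxSketch = { agentBody: string; cwd: string };
type PartRendererSketch = (ctx: PromptCtxSketch) => string;

// Canonical order of parts (assumed here to be registry order).
const PART_ORDER_SKETCH = ["aiSafety", "toolInstructions", "environmentContext", "agentBody"];

// Placeholder renderers; real section texts are not shown in the commit.
const PROMPT_PARTS_SKETCH: Record<string, PartRendererSketch> = {
  aiSafety: () => "Follow the safety rules.",
  toolInstructions: () => "Use tools only as documented.",
  environmentContext: (ctx) => "Working directory: " + ctx.cwd,
  agentBody: (ctx) => ctx.agentBody,
};

// Return the unknown keys so a typo surfaces at load time.
function validatePromptPartsSketch(parts: string[]): string[] {
  return parts.filter(
    (key) => !Object.prototype.hasOwnProperty.call(PROMPT_PARTS_SKETCH, key),
  );
}

function composeAgentPromptSketch(parts: string[], ctx: PromptCtxSketch): string {
  const unknown = validatePromptPartsSketch(parts);
  if (unknown.length > 0) {
    throw new Error("unknown prompt parts: " + unknown.join(", "));
  }
  // Filtering the canonical order both sorts and deduplicates the request.
  return PART_ORDER_SKETCH
    .filter((key) => parts.indexOf(key) !== -1)
    .map((key) => PROMPT_PARTS_SKETCH[key](ctx))
    .filter((text) => text.length > 0)
    .join("\n\n");
}
```

Because composition walks the canonical order instead of the caller's list, two agents that declare the same parts in different orders would get the same prompt layout.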
Mikael Hugo
f038f2a072 fix(uok-gate-runner): use correct getRelevantMemoriesRanked API
The "Memory enrichment failed for gate test: DB error" warning in test
output was a real API mismatch, not a benign degradation. The previous
code called getRelevantMemoriesRanked(embedding, "gotcha", 2) but the
canonical signature is getRelevantMemoriesRanked(query, limit).

Replace the embedding-based call with a query-string built from
gateId + failureClass + rationale, and pass limit=2. The embedding
helper (computeGateEmbedding) is removed entirely since the memory
store does its own embedding internally.

Also switch the enrichment-failure log from logWarning to debugLog —
gate enrichment is best-effort and must not affect gates, so the
failure path should not surface as a warning to operators.

Test fixture updated to assert against the new API call shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 17:21:18 +02:00
Mikael Hugo
6851869c00 refactor(auto): rename promptParts → promptCacheSplit in run-unit path
The cache-split signal {before, after} was named promptParts in the
autonomous-unit dispatch path, overloading the same term that
.agent.yaml uses for declarative prompt-section composition. With the
prompt-parts runtime landing as canonical (`aiSafety`,
`toolInstructions`, ...), the overload becomes confusing —
promptParts now means "list of declarative section keys", not
"before/after cache-split tuple".

Renames in run-unit.js, phases-unit.js (call site), and
run-unit.test.mjs. No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 17:20:59 +02:00
Mikael Hugo
289bf9e264 fix(triage-apply): strict plan validation + custom-runner guard + per-decision failures
Codex review follow-up (2026-05-14) addressed all three remaining
issues from the earlier rescue pass:

1. Strict plan validation. parseTriagePlanStrict refuses the WHOLE
   plan on any malformed item instead of silently dropping. Enforces:
   - completion marker "Self-feedback triage complete" present
   - exactly one fenced ```yaml block
   - every decision has non-empty id + outcome ∈ {fix, promote, close}
   - outcome-specific required fields (close → reason; promote →
     reason + requirement_id; fix → proposed_approach)
   - duplicate ids rejected
   - when expectedIds is supplied, decisions must cover the candidate
     set exactly — no extras (hallucinated ids), no missing
   Returns ParseTriagePlanResult with {plan, error} so the caller can
   surface the specific failure reason.

2. Custom-runner trust guard. runTriageApply refuses an injected
   options.agentRunner unless allowUntrustedRunner is also explicitly
   set. Production callers cannot inject a runner. Without this guard
   a custom runner could side-channel-mutate the ledger despite the
   read-only tool override (codex Q2).

3. Per-decision failure surfacing. applyTriagePlan now returns
   {resolvedIds, rejectedIds, pendingFixIds} instead of just
   resolvedIds. runTriageApply reports ok=false if rejectedIds is
   non-empty, with the count + ids in the error message. Mutations
   still happen one-by-one (no SQL transaction wrapping) but the
   failure is no longer silent (codex Q3).

Tests: src/tests/headless-triage-apply.test.ts now covers:
   - agree-path runs both agents in order; apply fails on missing
     ledger entry → ok=false, rejectedIds populated (the realistic
     contract for a test fixture without a seeded DB)
   - custom runner without allowUntrustedRunner refuses, agentRunner
     never invoked
   - rubber-duck disagrees → clean pause, ok=false, agreed=false
   - decider fails → skip rubber-duck
   - unknown id in plan rejected before review

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 17:19:12 +02:00
Mikael Hugo
d8ce433c7a fix(triage-apply): plan-and-review pipeline, no mutations before agreement
Codex review (2026-05-14) flagged the original runTriageApply design as
unsafe: triage-decider was invoked with resolve_issue in its tool list,
so it could (and would) close ledger entries during its own turn —
BEFORE rubber-duck saw the decisions. If rubber-duck disagreed, the
mutations from phase 1 had already landed with no rollback path.

Restructured to a 3-phase plan-and-review pipeline:

  Phase 1 — Plan: triage-decider runs READ-ONLY (resolve_issue removed
    from both the YAML and the runner's tool override) and emits a
    structured YAML plan as a fenced block. The plan is the contract;
    parseTriagePlan extracts it.

  Phase 2 — Review: rubber-duck reads the parsed plan + the original
    ledger entries and votes "rubber-duck: agree" or names concerning
    decisions. Read-only tools.

  Phase 3 — Apply: ONLY on agreement, this runner (not an agent) calls
    markResolved for each close/promote decision. Fix decisions are
    surfaced to the operator and never auto-mutate.
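
The three phases can be sketched as a single control flow. This is a
minimal illustration under stated assumptions, not the real runTriageApply:
the PipelineDeps interface, the planReviewApply name and the markResolved
signature are hypothetical, and the agreement check is assumed to be a line
that reads exactly "rubber-duck: agree".

```typescript
// Hypothetical interfaces; the real runner and ledger APIs are not shown in this log.
interface PlannedDecision {
  id: string;
  outcome: "fix" | "promote" | "close";
}

interface PipelineDeps {
  runDecider: () => Promise<PlannedDecision[] | null>; // read-only; null means no parseable plan
  runReviewer: (plan: PlannedDecision[]) => Promise<string>; // read-only; returns the reviewer's text
  markResolved: (id: string) => Promise<void>; // the only mutating call (signature assumed)
}

interface PipelineResult {
  ok: boolean;
  agreed: boolean;
  resolvedIds: string[];
  pendingFixIds: string[];
  error?: string;
}

async function planReviewApply(deps: PipelineDeps): Promise<PipelineResult> {
  // Phase 1: plan. Nothing is mutated here.
  const plan = await deps.runDecider();
  if (plan === null) {
    return { ok: false, agreed: false, resolvedIds: [], pendingFixIds: [], error: "no parseable plan" };
  }

  // Phase 2: review. Disagreement pauses the run without touching the ledger.
  const verdict = await deps.runReviewer(plan);
  const agreed = verdict.split("\n").some((line) => line.trim() === "rubber-duck: agree");
  if (!agreed) return { ok: false, agreed: false, resolvedIds: [], pendingFixIds: [] };

  // Phase 3: apply. The runner mutates; fix decisions are only reported.
  const resolvedIds: string[] = [];
  const pendingFixIds: string[] = [];
  for (const d of plan) {
    if (d.outcome === "fix") {
      pendingFixIds.push(d.id);
    } else {
      await deps.markResolved(d.id);
      resolvedIds.push(d.id);
    }
  }
  return { ok: true, agreed: true, resolvedIds, pendingFixIds };
}
```

Because markResolved is reachable only after the agreement check, a
disagreeing review cannot leave partial mutations behind.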

Other codex-flagged gaps addressed:

  - Trusted-source guard: --apply refuses to run when either agent has
    source != "builtin". Project/user overrides shadow built-ins (the
    documented precedence), but they don't get to silently disable
    rubber-duck's independence. Operators can still customize via
    --review mode.

  - Plan-not-emitted is a hard refuse: if the decider's output has no
    parseable ```yaml decisions: block, the apply runner returns
    ok=false with a clear error. We can't audit what we can't read.

  - Disagreement is a clean pause, not an error: returns ok=false with
    agreed=false and both outputs preserved for operator review.

  - The triage-decider YAML's prompt now codifies the plan-only contract
    explicitly: "You do not call resolve_issue. You produce a structured
    decision plan."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 17:10:43 +02:00
Mikael Hugo
ab682ddd6e feat(subagent): built-in rubber-duck + triage-decider agent YAMLs
First slice of putting the triage/rubber-duck flow into SF itself
(sf-mp5lnlbc-ty5fec). Two built-in agent definitions ship with SF and
get auto-discovered alongside operator-defined ones — no setup needed.

agents/rubber-duck.agent.yaml
  Devil's-advocate critic. Tools: "*". Reviews any artifact (default
  consumer: triage --apply pipeline) and surfaces ONLY confidently-real
  concerns. High-signal output: "rubber-duck: agree" or `## Concern N:`
  sections with evidence citations. Never proposes fixes.

agents/triage-decider.agent.yaml
  Self-feedback queue decider. Tools: [resolve_issue, view, grep, glob,
  git_log] — read-only investigation plus the one mutating tool needed
  to close/promote entries. No edit/write/bash — code fixes go to the
  operator. Implements the existing buildInlineFixPrompt protocol
  (Fix/Promote/Close per entry).

Both YAMLs include the copilot-style promptParts block as intent
documentation. SF's prompt-composition runtime doesn't honor those
flags yet; the day it lands, the agents pick it up without a YAML edit.

discoverAgents now loads from a built-in directory (sibling agents/
to subagent/) with source: "builtin". User and project definitions
override built-ins by name, preserving the existing precedence model.
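
A minimal sketch of the override-by-name precedence, assuming three source
tiers. Only source: "builtin" appears in this message; the "user" and
"project" labels, the AgentDef shape, the mergeAgents name, and project
winning over user are assumptions made for illustration.

```typescript
// Hypothetical agent record; the real discoverAgents result type is not shown here.
interface AgentDef {
  name: string;
  source: "builtin" | "user" | "project";
}

// Later tiers replace earlier ones that share a name.
function mergeAgents(builtin: AgentDef[], user: AgentDef[], project: AgentDef[]): AgentDef[] {
  const byName = new Map<string, AgentDef>();
  for (const agent of builtin.concat(user, project)) {
    byName.set(agent.name, agent);
  }
  return Array.from(byName.values());
}
```

A project-level rubber-duck definition would therefore replace the built-in
one while untouched built-ins stay visible with source "builtin".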

Tests assert: (1) both built-ins discovered with source=builtin in
scope=both, (2) project override wins over built-in. Full SF suite:
1637/1637.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 16:53:36 +02:00
Mikael Hugo
192129a69e fix(triage): drop defaultModel from triage candidate pool
Operator's settings.json defaultModel is for general dispatch (typically
a cheap/flash pick — gemini-3-flash-preview in current config). Mixing
it into the triage candidate pool gave it a chance to win on cost
tie-break against agentic-better but pricier options from the explicit
enabledModels allowlist.

Triage is agentic-heavy; restrict its candidate pool to the operator's
enabledModels (kimi-coding/* + minimax/* + zai/* + …) and let the
agentic-weighted router pick. Also fixes the wildcard expansion path
which was calling a non-existent ai.getModelsByProvider — now correctly
uses ai.getModels(provider).

Dogfood confirms: router now picks kimi-coding/kimi-for-coding
(agentic 90) instead of gemini-3-flash-preview (operator default).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 16:40:19 +02:00
Mikael Hugo
98d1b2b258 feat(triage): route runTriage via model-router using operator allowlist
Drops the hardcoded "google-gemini-cli/gemini-3-pro-preview" default and
routes through SF's own model-router using a new
BASE_REQUIREMENTS["self-feedback-triage"] (agentic-heavy: coding 0.4,
instruction 0.8, reasoning 0.8, agentic 0.9).

Candidate selection priority:
  1. Explicit options.model override (operator --model)
  2. options.candidates (test injection)
  3. ~/.sf/agent/settings.json enabledModels (expanded against pi-ai
     MODELS catalog) + defaultProvider/defaultModel
  4. TRIAGE_FALLBACK_CANDIDATES — Chinese-provider set
     (kimi + minimax + zai). Gemini intentionally NOT in the fallback
     so operators who removed it from settings don't silently re-default.

Dispatch walks the router-ranked list with retry-on-credential-error so
the top pick failing on missing API keys falls through to the next
candidate (caught the openai-no-key case in dogfood today).
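
The priority order and the retry-on-credential-error walk might look roughly
like this. It is a sketch under assumptions, not the shipped runTriage:
selectCandidates, dispatchWithFallthrough, credentialError and the
name-based error check are hypothetical, and router ranking is left out.

```typescript
// Hypothetical option shape; only model and candidates are named in the message.
interface TriageOptions {
  model?: string;
  candidates?: string[];
}

// Priority: explicit model, injected candidates, settings allowlist, fallback set.
function selectCandidates(options: TriageOptions, settingsModels: string[], fallback: string[]): string[] {
  if (options.model) return [options.model];
  if (options.candidates && options.candidates.length > 0) return options.candidates;
  if (settingsModels.length > 0) return settingsModels;
  return fallback;
}

// Assumed convention: credential failures are tagged through the error name.
function credentialError(message: string): Error {
  return Object.assign(new Error(message), { name: "CredentialError" });
}

function isCredentialError(err: unknown): boolean {
  return err instanceof Error && err.name === "CredentialError";
}

// Walks the ranked list; only credential errors fall through to the next candidate.
async function dispatchWithFallthrough<T>(
  ranked: string[],
  run: (model: string) => Promise<T>,
): Promise<{ model: string; result: T }> {
  for (const model of ranked) {
    try {
      return { model, result: await run(model) };
    } catch (err) {
      if (!isCredentialError(err)) throw err; // real failures still surface
    }
  }
  throw new Error("no candidate had usable credentials");
}
```

Rethrowing everything except credential errors keeps a genuine dispatch
failure from being masked by a silent retry on a different model.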

Closes part 1 of sf-mp5khix3-9beona AC1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 16:29:56 +02:00
Mikael Hugo
e2dd625d7d sf snapshot: uncommitted changes after 383m inactivity 2026-05-14 16:03:35 +02:00
Mikael Hugo
2f0e5c8054 feat(subagent,run-unit): YAML agent loader + solver-pass tool scoping
Two coupled product changes from the working tree, validated together:

1. Agent YAML loader (subagent/agents.js + subagent-agent-yaml.test.mjs)
   .sf/agents/*.agent.yaml files now load as first-class agent
   definitions alongside the existing .agent.md frontmatter format.
   Adds `*` wildcard support for the tools field (unrestricted) and a
   parseAgentModel helper for the YAML-only model selector. Mirrors
   the copilot-style YAML format so SF can consume agent definitions
   shared across tools without forcing the markdown wrapping.

2. Solver-pass tool scoping (run-unit.js + phases-unit.js +
   run-unit.test.mjs)
   New scopeActiveToolsForRunUnit honors an explicit
   activeToolsAllowlist so callers can restrict a unit dispatch to a
   tighter tool set than the unit-type's default SF allowlist. The
   autonomous solver pass uses this to constrain the solver to just
   `checkpoint` — solver should reason and persist checkpoints, not
   edit files or dispatch tools. Keeps the solver inside its
   authority boundary.
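
   One way to picture the scoping is as an intersection of the unit type's
   default tools with the caller's allowlist. This is a sketch under that
   assumption; scopeTools is a hypothetical stand-in, not the real
   scopeActiveToolsForRunUnit, whose signature is not shown here.

```typescript
// Assumed semantics: an explicit allowlist can only narrow the default set.
function scopeTools(defaultTools: string[], allowlist?: string[]): string[] {
  if (allowlist === undefined) return defaultTools.slice();
  const allowed = new Set(allowlist);
  return defaultTools.filter((tool) => allowed.has(tool));
}

// Solver pass: only `checkpoint` survives, whatever the unit type normally allows.
const solverTools = scopeTools(["checkpoint", "edit", "bash"], ["checkpoint"]);
```

   Under this reading an allowlist entry that is not in the default set is
   ignored, so a caller cannot widen a unit's authority through scoping.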

Tests: 7/7 in the two affected files; full SF suite stays green.

Not in this commit: the sidekick-trigger event emission in
autonomous-solver.js and the external scripts/sidekick-runner.js +
.agents/policies/proactive-sidekick.yaml — that's an experiment
that stays in the working tree pending operator direction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 09:40:13 +02:00
Mikael Hugo
7ea41b89ae feat(ai,coding-agent): wireModelId — provider deployment alias
Adds an optional wireModelId field to the Model interface and a
resolveWireModelId helper. Forge's canonical model.id stays stable for
selection, capability scoring, policy, and history; providers now send
model.wireModelId on the wire when set, model.id otherwise.
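
The fallback rule amounts to a one-line helper. A minimal sketch, assuming a
pared-down Model with just the two relevant fields; the real interface has
more, and the shipped resolveWireModelId may treat edge cases such as an
empty string differently.

```typescript
// Pared-down model shape for illustration only.
interface WireModel {
  id: string; // canonical id: selection, scoring, policy, history
  wireModelId?: string; // optional deployment alias sent to the provider
}

// Sketch of the resolution rule: alias when set, canonical id otherwise.
function resolveWireModelIdSketch(model: WireModel): string {
  return model.wireModelId ?? model.id;
}
```

For example, a model with id "gpt-x" and wireModelId "my-azure-deployment"
(both made-up values) would be recorded as "gpt-x" but requested as
"my-azure-deployment".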

Use cases: Azure deployment names, vendor model slugs that differ
from Forge's canonical identity, A/B routing where the operator wants
canonical history but a specific deployment.

Wired through every provider in @singularity-forge/ai (anthropic,
amazon-bedrock, azure-openai-responses, google, google-vertex,
google-gemini-cli, mistral, openai-codex-responses, openai-completions,
openai-responses) plus @singularity-forge/coding-agent's
ModelRegistry (model definitions + per-model overrides).

Tests: openai-completions wireModelId payload coverage +
model-registry-auth-mode coverage for the override + definition fields.
Full pi-ai + coding-agent suite: 956/956 ✓ (7 unrelated skipped).

This realizes the model-registry contract drafted in 1d753af6b.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 09:25:21 +02:00
Mikael Hugo
a6c36a4b6b fix(headless-triage): --run takes precedence over --json/--list
Discovered via dogfood: `sf headless triage --run --json` short-circuited
to the candidate-list JSON before reaching the dispatch path, so the run
never happened.

--run is the action; --json/--list describe output format. Restructure
so --run always dispatches; --json then controls whether the run
result is JSON vs human text. Without --run, --json/--list still emit
the candidate digest as before.
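
The restructured precedence can be sketched as a small resolver. The
TriageFlags and TriageAction shapes and the resolveAction name are
hypothetical, and the behavior with no flags at all is an assumption (it is
not described here).

```typescript
// Hypothetical flag and action shapes for illustration.
interface TriageFlags {
  run: boolean;
  json: boolean;
  list: boolean;
}

interface TriageAction {
  kind: "run" | "digest";
  format: "json" | "text";
}

// --run decides WHAT happens; --json only decides HOW the result is printed.
function resolveAction(flags: TriageFlags): TriageAction {
  const format = flags.json ? "json" : "text";
  if (flags.run) return { kind: "run", format };
  return { kind: "digest", format }; // assumed default when --run is absent
}
```

Checking the action flag before the format flags is what prevents --json
from short-circuiting a requested run.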

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 08:29:11 +02:00
Mikael Hugo
65c1914b1f test(idle-triage): lock in surfaceSelfFeedbackQueueOnIdle contract
Five unit tests cover the bail-time queue notifier that landed in
001740680: notify-with-pointer when candidates exist, plural/singular
noun agreement, silent on empty queue, silent on non-forge basePath, and
no-throw when the downstream notify itself crashes (bail-path safety).

Locks in the contract for the partial-AC1 slice of sf-mp4rxkwb-l4baga
(autonomous loop surfaces the queue at idle) without yet touching the
larger remaining work (real self-feedback-triage unit type with
begin/dispatch/checkpoint/complete).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 08:11:10 +02:00