The HaltWatchdog fires when the loop goes >10s without a heartbeat. Each
iteration ends with a heartbeat, but unit execution itself can take 3+ minutes.
Without a heartbeat at the start of the unit phase, the watchdog detects idle
and emits a false-positive 'possible stuck iteration' error.
Add watchdog.heartbeat() immediately before both runUnitPhaseViaContract calls
(one in the custom-engine path, one in the dev path) so the watchdog timer is
reset before the long-running work begins.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add `adversary` to SUPPORTED_MODEL_ROLES and a new symbolic constraint
`lineage-diverse-from-worker` to SUPPORTED_MODEL_ROLE_CONSTRAINTS.
Default constraints for `adversary` and `reviewer` now include
`lineage-diverse-from-worker` so the reviewer/adversary CANNOT be a
lineage-twin of the model that produced the artifact under review —
prevents "yeah looks fine to me" rubber-stamp from same-family models.
Helpers exported alongside the policy:
- rootVendorFor(modelId) → "anthropic" | "openai" | "google" | "moonshot"
| "mistral" | "minimax" | "zhipu" | "meituan" | "unknown"
- isSameRootVendor(candidateId, workerId) → boolean (fail-open on unknown)
These are the building blocks the selector needs. The actual filter
wiring in auto-model-selection's selectAndApplyModel is left as a
documented TODO — the function doesn't currently thread role context
through, so plugging in lineage filtering needs a small refactor that
is out of scope here.
Tests: 24 pass (was 6 + 18 new). Coverage: role registration,
constraint registration, defaults, validation, rootVendor mapping
matrix, isSameRootVendor predicate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The catch block was swallowing the actual error, leaving operators with
"v2 init failed, falling back to v1 string-matching" and no diagnostic
to act on. Found out this session that the failure was build staleness
(packages/coding-agent dist was not rebuilt by copy-resources) — would
have been instant to diagnose if the reason had been logged.
Now: "[headless] Warning: v2 init failed (Timeout waiting for response
to init...), falling back to v1 string-matching"
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round 7 dogfood failed with "0 tool calls — context exhaustion" even
though the swarm worker's session DID call tools. Root cause: the
phases-unit.js zero-tool-call guard reads from the PARENT session's
message ledger via snapshotUnitMetrics. The swarm worker runs in an
ISOLATED subagent session — its tool calls never appear in the
parent's messages, so the guard always sees 0 and fires a false-
positive context-exhaustion retry.
Fix:
- runUnitViaSwarm now returns swarmToolCallCount on the UnitResult,
surfacing the real worker tool call count from the onEvent stream
(collectedToolCalls.length, accurate end-to-end).
- phases-unit.js zero-tool-call guard checks
unitResult._via === "swarm" && swarmToolCallCount > 0 and bypasses
the false-positive retry, logging "zero-tool-calls-swarm-bypass".
Also adds a debug stderr line in subagent-runner.ts printing the tool
count after bindExtensions, confirming the worker session HAS the
full tool set (checkpoint + built-ins) — Hypotheses 1 and 2 from the
Round 8 brief ruled out by direct observation.
Tests: 3 new (swarmToolCallCount = 0 / N / 1-on-checkpoint-only);
2518 tests pass total, 0 regressions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The swarm dispatch path is now automatically enabled when SF_HEADLESS=1
without requiring the operator to set SF_AUTONOMOUS_VIA_SWARM=1. This makes
headless mode use the swarm execution engine by default, which is the
intended architecture for autonomous execution.
- Explicit SF_AUTONOMOUS_VIA_SWARM=1/true still works.
- Explicit SF_AUTONOMOUS_VIA_SWARM=0/false disables it even in headless.
- When unset + SF_HEADLESS=1, swarm is used.
- When unset + SF_HEADLESS!=1, legacy path is used (unchanged).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Self-feedback inline fix spawns 'sf headless triage --apply' as a detached
child when SF_HEADLESS=1. The child previously grabbed the same auto.lock
as the parent, causing lock contention that blocked the parent's unit
execution.
- Pass SF_SELF_FEEDBACK_WORKER=1 to the child environment.
- session-lock: effectiveLockFile() returns auto-self-feedback.lock when
the env var is set.
- session-lock: effectiveLockTarget() returns .sf/parallel/self-feedback/
so the OS-level lock directory is also isolated.
This mirrors the existing SF_PARALLEL_WORKER / SF_MILESTONE_LOCK mechanism
used for parallel milestone workers (#2184).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The swarm worker now receives the autonomous executor's compact role
prompt (buildSwarmWorkerSystemPrompt in auto/run-unit.js) which teaches
it the checkpoint tool contract and PDD field requirements. This closes
the last gap before SF_AUTONOMOUS_VIA_SWARM=1 can become default:
without the contract the worker never emitted checkpoint tool calls,
so workerSignaledOutcome stayed null and the loop terminated after one
unit. With the contract, the worker calls checkpoint(outcome=...) and
the orchestrator gets accurate completion signals.
Envelope carries two new optional fields propagated through every layer:
- executorSystemPrompt: overrides the swarm worker's default prompt
- executorTools: optional tool name filter
Flow: runUnitViaSwarm builds them → swarmDispatchAndWait reads them
from envelope → forwards to runAgentTurn → runHeadlessPrompt passes
them as systemPromptOverride / toolsOverride → runSubagent.
No changes needed to runSubagent: createAgentSession + bindExtensions
+ _refreshToolRegistry already picks up extension-registered tools
like `checkpoint` automatically.
Tests: 61 passing across the two affected files (22+9 baseline + 30
new); 234 test files passing overall.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Forward onEvent through swarm-dispatch → agent-runner → runSubagent
- Collect toolcall_end events in runUnitViaSwarm to build real tool-use blocks
- Detect checkpoint tool outcome for accurate unit completion signal
- Add headless.ts graceful shutdown (async signal handler, 2.5s timeout)
- RPC client stop() now awaits flush and propagates stop to child sessions
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The doc-checker startup hook prints a "9 files need content" advisory on
every autonomous bootstrap. The flagged files are intentionally terse:
- AGENTS.md indices under docs/ and .sf/harness/* point at sibling
directories where the real content lives
- .sf/PRINCIPLES.md / STYLE.md / NON-GOALS.md are terse-by-design bullet
lists; the # heading line is stripped by countContentLines so a 9-bullet
file falls one short of the 10-line threshold despite being substantive
Adding them to STUB_ALLOWED_PATHS so the advisory only flags genuinely
unfilled scaffolds.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two real bugs surfaced by SF_AUTONOMOUS_VIA_SWARM=1 dogfood (Round 4):
1. Second dispatch to the same swarm agent returned reply=null because
each MessageBus instance held a 30s-stale inbox cache. runAgentTurn
now accepts opts.onlyMessageId; when set it forces agent._inbox.refresh()
from SQLite, processes only that message, and leaves stale messages
untouched for later turns. dispatchAndWait passes the just-dispatched
messageId so each call is surgical.
2. runUnitViaSwarm now writes an appendAutonomousSolverCheckpoint and
synthesizes a swarm_unit_complete tool_use block alongside the text
reply, so phases-unit.js stops firing claimed-checkpoint-without-tool
repair loops. Outcome is conservatively "continue" — a real "complete"
requires the swarm agent to emit an actual checkpoint tool call
(future round wires runSubagent.onEvent through dispatchAndWait).
Tests: 51 passing for the two affected files (11 swarm-dispatch +
40 run-unit-via-swarm). Full suite: 1760/1760.
Known remaining gap before flipping default: synthesized outcome is
always "continue", so the loop relies on iteration caps for
termination rather than agent-signaled completion. Wiring real tool
calls through is the next round.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Headless mode waits for 'Assisted/Autonomous mode stopped' to detect
completion. When the loop exits via natural break (e.g. step-wizard
in /next), stopAuto() is never called, so headless hangs forever.
- Add s.stopAutoCalled flag to AutoSession
- Set flag in stopAuto(), clear in cleanupAfterLoopExit()
- Send terminal notification from cleanupAfterLoopExit() only when
stopAuto() was bypassed
- Fixes sf headless next hanging after unit completes
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add runUnitViaSwarm as an opt-in path in auto/run-unit.js. When
SF_AUTONOMOUS_VIA_SWARM=1 (or =true), each unit dispatch builds a
DispatchEnvelope (unitType -> workMode via deriveWorkMode), calls
swarmDispatchAndWait, and returns the agent reply as a synthetic
{status: "completed", event.messages: [{role: "assistant", content: reply}]}
matching the shape phases-unit.js / classifyExecutorRefusal already expect.
Default (flag unset) is byte-identical to today — no regression in the
default path, 1751/1751 tests pass.
Known gap (acceptable for an experimental opt-in, must be closed before
swarm becomes default):
- Tool-call events from the swarm worker do NOT surface to the
orchestrator UI (runAgentTurn handles them internally).
- The worker emits a plain text reply, not a structured checkpoint,
so phases-unit.js' checkpoint-missing repair path will not trigger
and classifyExecutorRefusal will not detect refusals.
This is the first concrete step toward routing autonomous unit work
through swarm: role-based agent selection, memory inheritance via the
envelope, and a durable bus audit trail of every unit dispatch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add SwarmDispatchLayer.dispatchAndWait(envelope, { timeoutMs, signal })
which enqueues via _busDispatch, drives the target agent's turn via
runAgentTurn (in-process runSubagent), and reads back the agent's reply
from the bus. Returns DispatchResult extended with reply + replyMessageId.
This is the missing piece for collapsing /delegate-style subagent calls
into the swarm interface: callers that need a reply (not just delivery)
can now use the swarm contract instead of the subagent extension's
bespoke dispatch path. Round 4 will migrate those callers.
New helper MessageBus.getReplyTo(messageId, fromAgent) queries SQLite
directly via json_extract for the most recent reply to a given message.
Plus 8 tests covering happy path, error paths (no reply, runner throws,
runner returns {error}), the swarmDispatchAndWait convenience function,
and the A2A short-circuit path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add RunSubagentOptions.onEvent callback so callers (TUI live update panel
for /delegate, /rubber-duck, etc.) get every session event without polling.
Errors from the callback are caught so a buggy caller cannot crash the agent.
Chain caller-supplied AbortSignal through a local AbortController in
runSingleAgent and register it in a new liveSubagentControllers set so
stopLiveSubagents aborts in-process subagents alongside the legacy spawn-based
processes (cmux split, sift codebase_search).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add autoTriageTodo() helper that checks root TODO.md for raw dump notes
beyond the empty template before each autonomous cycle
- Lazy-imports buildTodoTriageLLMCall + triageTodoDump from commands-todo.js
to avoid startup overhead
- Triage results written to DB backlog with clear=true + backlog=true
- Best-effort: never blocks autonomous loop on triage failure
- Fast-path skips when TODO.md is empty template or doesn't exist
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Make AgentSwarm.run() async with optional enableLLM flag
- Wire runAgentTurn from agent-runner.js into all 4 topologies
(round_robin, supervisor, dynamic, sleeptime)
- Update drainSleeptimeQueue to use runAgentTurn for actual LLM
execution instead of passive inbox reading
- Export runAgentTurn, runAgentLoop, runSwarmTurn from uok/index.js
- Update PersistentAgent JSDoc to reflect runner exists
- Fix test imports after extension consolidation (ttsr, google-search)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
cmux was a standalone extension directory with no extension-manifest.json,
functioning as a utility library for the sf extension. Moving it into sf/cmux/
makes the dependency explicit and removes the orphaned extension directory.
Import paths updated:
- commands-cmux.js, notifications.js, auto.js: ../cmux → ./cmux
- bootstrap/system-context.js: ../../cmux → ../cmux
- subagent/index.js: ../../cmux → ../cmux
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Google Search was a standalone extension providing a single tool
(google_search) that used Gemini's Google Search grounding feature.
It had fallback logic to search-the-web providers (Tavily, Brave) when
Google OAuth was unavailable.
Merging it into search-the-web consolidates all web search capabilities
into one extension and eliminates the tight coupling between the two.
Changes:
- Copied google-search tool logic into search-the-web/tool-google-search.js
- Added registerGoogleSearchTool / resetGoogleSearchCache exports
- Integrated into search-the-web/index.js deferred loading
- Added google_search to search-the-web extension-manifest.json tools
- Deleted google-search/ extension directory
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Create web/middleware.ts to authenticate all API routes via bearer token
and origin checks (previously unauthenticated due to missing middleware file)
- Fix path traversal in browse-directories: replace startsWith with
realpathSync + relative + isAbsolute containment checks
- Fix XSS in session HTML export: escape raw HTML blocks via marked renderer
- Fix PTY process leak: destroy session on SSE stream cancellation
- Fix unhandled exception in terminal sessions POST: wrap getOrCreateSession
in try/catch with structured JSON error response
- Fix silent child-process failure in headless dispatch: add exit handler
to write failed claim when sf headless triage exits non-zero
- Fix TypeError on malformed claim JSON: add Array.isArray guard before
accessing claim.ids.length
All changes type-check cleanly.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The existing dispatch used pi.sendMessage to queue a chat followUp.
That works in interactive sf sessions but no chat agent is listening
in 'sf headless' / autonomous flows — the message is queued and never
delivered, leaving the high/critical blocker active on every iteration.
When SF_HEADLESS=1, spawn the same triage-decider → review-code pipeline
(via the already-shipped 'sf headless triage --apply' subprocess) instead.
The autonomous loop then sees resolved entries via DB on the next gate
check, no chat agent required.
Forge-only: the dispatcher still only operates in the SF repo itself —
`readAllSelfFeedback` for non-forge repos returns the upstream-feedback
log (SF developer work), which must not be auto-dispatched from inside
consumer projects. Documented that constraint inline.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
auto-prompts.js called `join(base, ...)` in 11 places but only imported
`basename` from node:path. Crashed autonomous mode every iteration with
ReferenceError: join is not defined — observed in dr repo, 3 consecutive
iteration failures triggered the hard stop.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Schema now accepts the same five levels used elsewhere in the codebase
(minimal/low/medium/high/bypassed) instead of the stale full/restricted/
sandbox triple. Docs and env test updated to match.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Observed 2026-05-14: a triage --apply run hung for 33+ minutes because
the spawned subagent process stalled (provider SDK call without its own
timeout) and defaultAgentRunner had no watchdog — it waited indefinitely
on proc.on("close").
Adds a per-dispatch watchdog (default 8 min, override via
SF_TRIAGE_AGENT_TIMEOUT_MS env). On expiry: SIGTERM → 5s grace →
SIGKILL. Resolves immediately with ok=false / exitCode=124 (POSIX
timeout convention) so the trust / review / mutation gates surface
the failure as a real outcome instead of a silent stall.
Provider-agnostic: the timeout protects the orchestrator regardless of
which model the router picks. Operators running long-context provider
calls can bump the env var; default 8min matches runTriage /
runReflection's existing completeSimple timeout.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex audit follow-up (fix A). manual-attention outcomes were counted
by getGateRunStats but dropped from the user-facing surface — they
inflated `total` invisibly with no distinct column or key, so an
operator couldn't tell a gate with 5 pass / 3 manual-attention apart
from a gate with 5 pass / 3 fail.
Adds `manualAttention: number` to GateHealthEntry and renders it as
its own column between Fail and Retry in the human table. JSON
consumers get the new key alongside pass/fail/retry.
Test count for headless-uok-status.test.mjs: 30/30 (+2 new — column
present in header, distinguishable from fail in row).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds focused unit tests for the slice-3b wiring:
- UokGateRunner.run emits surface/runControl/permissionProfile/
parentTrace on all three trace paths (normal, unknown-gate,
circuit-breaker-blocked) and omits them when absent.
- buildAutonomousUokContext pins surface=autonomous + runControl=
autonomous and derives permissionProfile from session/prefs
(YOLO → low, prefs.permissionLevel honored, "high" default).
- emitAutonomousGate forwards the schema-v2 ctx into UokGateRunner
(covers the phases-pre-dispatch / phases-guards call sites via
the new shared helper).
- handlePlanSlice options.uokContext lands on every seeded Q3-Q8
quality_gates row; without it, rows stay in the legacy null shape.
Refactors phases-pre-dispatch and phases-guards to call the new
emitAutonomousGate helper so the three sites stay in sync going
forward. phases-finalize keeps its inline UokGateRunner because the
verification gate's execute callback isn't a static verdict.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Slice 3b of "Make UOK the SF Control Plane". handlePlanSlice now
accepts an optional uokContext option and threads it into every
insertGateRow call (Q3, Q4 slice gates; Q5, Q6, Q7 per task; Q8
slice closeout).
executePlanSlice derives the ctx from the singleton autonomous session
when one is active — currentTraceId becomes the v2 traceId/parentTrace,
surface and runControl are pinned to "autonomous", permissionProfile
follows session/prefs. Tools invoked outside an autonomous loop
(interactive REPL, headless one-shot) pass uokContext=null and the
seeded rows fall through to the legacy NULL-column shape, classified
as "legacy" by status uok.
Lazy import of auto/session.js keeps headless/test code paths from
paying the session-singleton load cost when they don't need it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Slice 3b of "Make UOK the SF Control Plane". The autonomous loop's
three high-traffic gate sites (resource-version-guard,
pre-dispatch-health-gate, planning-flow-gate in phases-pre-dispatch;
plan-gate in phases-guards; unit-verification-gate in phases-finalize)
now build a schema-v2 UOK run-context per iteration and pass
surface/runControl/permissionProfile/parentTrace into the gate runner.
The gate-runner emits these onto every gate_run trace event, so the
classifier in `sf headless status uok --json` reads them as
coverageStatus: "ok" instead of "legacy".
New helper uok/auto-uok-ctx.js pins surface="autonomous" and
runControl="autonomous" for these phases and derives permissionProfile
from session/prefs: "low" under YOLO or a minimal/low permissionLevel,
"medium" for medium, "high" otherwise (the default).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex audit (Q4) flagged that the mutation gate landed in slice 3a but
the test suite only verified the three earlier gates. Add coverage:
- agree-path: mutation-gate fires with outcome=fail, rejectedCount=1,
resolvedCount=0 (the test fixture has no real ledger entry for the
decision id, so markResolved rejects it — the gate correctly surfaces
the partial failure)
- disagree-path: mutation-gate does NOT fire (apply phase skipped)
Pins the 4-gate contract end-to-end. Suite: 4/4 in this file.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>