sf v3 — Specification
Version: 1.0.0-draft
Status: Research / Pre-implementation
Authors: singularity-ng
Implementation target: the next major version of singularity-forge (sf, formerly Get-Shit-Done / GSD), built on the existing pi-mono SDK packages already vendored under packages/pi-*. Not a fork of charmbracelet/crush.
The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in RFC 2119.
Retarget note (v1.0): earlier draft versions (0.1–0.8) targeted a Go fork of Crush. That direction was reconsidered after recognising:
- sf already has gen-2 harness control via pi-mono (vs. gen-1 skills which proved insufficient).
- The cold-start performance argument for Go is largely moot once the daemon (packages/daemon) absorbs startup cost.
- sf already ships an MCP server (packages/mcp-server) — meaning other agent CLIs can call sf as a backend, not the inverse.
- Most of the Crush infrastructure we'd inherit (TUI, agent loop, multi-provider) is duplicated in pi-mono.
The structure of the previous spec — phase machine, schema, hook pipeline, knowledge layer, persistent agents, conformance checklist — survives this retarget. The implementation target changes from Go-on-Crush to TypeScript-on-pi-mono.
Implementation status (against singularity-foundry HEAD): This document is annotated section-by-section with current status. Three categories:
- EXISTS — already implemented in sf and matches the spec's contract (modulo minor naming).
- PARTIAL — implemented but diverges from the spec; needs alignment work.
- NEW — not yet implemented.
Conformance items (§ 26) are similarly tagged. Roughly 70% of this spec is EXISTS or PARTIAL in sf today; the remaining 30% (persistent-agent inbox model, Singularity Memory integration, SSH workers, several supervisor refinements) is genuinely new work.
Table of Contents
- Overview
- Definitions
- Data Model
- Phase State Machine
- Orchestration Loop
- Worker Attempt Lifecycle
- Prompt Contract
- Context Budget
- Supervision
- Hook Pipeline
- Workspace Management
- Worktree Isolation
- Verification Gates
- Configuration
- Model Routing
- Knowledge Layer
- Persistent Agents
- Inter-Agent Messaging
- Observability
- Failure Taxonomy
- Trust Boundary
- Distributed Execution
- Plugin Extension Points
- Secret Management
- CLI Commands
- Conformance Checklist
1. Overview
sf is an autopilot for software engineering work that the user owns end-to-end: the user states a goal (/sf plan "add OAuth") and sf decomposes, plans, executes, verifies, reviews, and merges through a structured phase pipeline without per-unit human intervention. The user watches or steers; the agent executes.
sf v3 is the next major version of singularity-forge, built directly on the pi-mono SDK packages already vendored under packages/pi-*:
| Vendored package | Role |
|---|---|
| `@singularity-forge/pi-coding-agent` | Coding agent CLI primitives (vendored from pi-mono) |
| `@singularity-forge/pi-agent-core` | General-purpose agent core |
| `@singularity-forge/pi-ai` | Unified LLM API across 20+ providers |
| `@singularity-forge/pi-tui` | TUI primitives |
sf adds the autopilot layer on top: phase state machine, persistent agent fleet, knowledge integration with Singularity Memory, gates, hooks, worktree management, blockers and dispatch scheduling. The agent harness itself (tool execution, model calls, hook plumbing) is pi-mono's; the orchestration is sf's.
1.1 Existing infrastructure
sf already ships:
- `packages/daemon` — long-lived background process that absorbs Node.js cold-start cost. The autopilot loop runs in the daemon; CLI invocations (`/sf status`, `/sf next`) talk to it via local RPC.
- `packages/mcp-server` — exposes sf orchestration tools (plan/dispatch/status) via MCP so other agent CLIs (Claude Code, Cursor) can call sf as a backend.
- `packages/native` — N-API bindings for performance-critical native (Rust) code where TypeScript would be too slow.
- `packages/rpc-client` — standalone RPC client SDK with zero internal dependencies, used by the CLI to talk to the daemon.
This spec defines the v3 contract that ties these together with the gen-2 harness control pattern that GSD established (drop into pi-mono primitives directly; do NOT layer skills on top and hope the LLM follows them).
1.2 Versioning
sf follows SemVer 2.0. For this spec:
- Patch (1.x.Y): clarifications, conformance refinements, no behavioural change.
- Minor (1.Y.0): additions to the harness API, schema, or CLI that do not break existing implementations.
- Major (X.0.0): breaking changes to schema, hook contracts, or harness API.
v1.0.0 (this spec, when finalised) freezes §§ 3 (Data Model), 4 (Phase State Machine), 6 (Worker Attempt Lifecycle), 10 (Hooks), 14 (Configuration), and 26 (Conformance) — changes to those sections post-v1 require a major bump.

sf v3 MUST NOT rebuild what pi-mono already provides:
- Agent loop via pi-coding-agent
- Multi-provider LLM (20+ providers including Anthropic, OpenAI, Gemini, Groq, Bedrock, Azure, Ollama) via pi-ai
- MCP client (modelcontextprotocol/go-sdk)
- LSP integration
- SQLite state via ncruces/go-sqlite3
- TUI primitives via pi-tui
- Tool execution (bash, file read/write, grep, web search, sourcegraph)
- Agent Skills open standard (internal/skills/)
- Permission service with pubsub, persistent grants, hook pre-approval
- PreToolUse hook system with allow/deny/halt, input rewriting, multi-hook aggregation
This specification covers only what sf v3 adds on top of pi-mono. Behaviour already provided by the pi-mono SDK packages is inherited.
Project-level conformance. sf MUST enforce JSDoc on every exported function, type, and class in its harness modules via a CI check (scripts/specs-check.ts — an AST walk, no external linter dependency). This applies to sf's own development; it is not a runtime gate against user projects.
2. Definitions
Unit — the atomic unit of work. Has a type (milestone, slice, task), a phase, and an attempt counter. Units are ephemeral — they complete or fail and are archived.
Unit IDs use the format {type}/{slug} where slug is hierarchical:
- Milestone: `milestone/m{n}` (e.g. `milestone/m2`)
- Slice: `slice/m{n}/s{n}` (e.g. `slice/m2/s3`)
- Task: `task/m{n}/s{n}/t{n}` (e.g. `task/m2/s3/t1`)
The slug encodes the parent hierarchy redundantly with units.parent_id to make trace and log lines self-describing without requiring a join.
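As an illustration, the ID format lends itself to a small parser. This is a sketch only; the function name and return shape are illustrative, not part of the spec:

```typescript
type UnitType = "milestone" | "slice" | "task";

interface ParsedUnitId {
  type: UnitType;
  milestone: number;
  slice?: number;
  task?: number;
}

// Parse "{type}/{slug}" IDs such as "task/m2/s3/t1".
// Returns null for anything that violates the hierarchical format.
function parseUnitId(id: string): ParsedUnitId | null {
  const patterns: Record<UnitType, RegExp> = {
    milestone: /^milestone\/m(\d+)$/,
    slice: /^slice\/m(\d+)\/s(\d+)$/,
    task: /^task\/m(\d+)\/s(\d+)\/t(\d+)$/,
  };
  for (const type of Object.keys(patterns) as UnitType[]) {
    const m = patterns[type].exec(id);
    if (!m) continue;
    const [, milestone, slice, task] = m;
    return {
      type,
      milestone: Number(milestone),
      ...(slice !== undefined && { slice: Number(slice) }),
      ...(task !== undefined && { task: Number(task) }),
    };
  }
  return null; // unknown type or malformed slug
}
```

Because the slug is self-describing, a log line like `task/m2/s3/t1` can be resolved to its milestone and slice without touching `units.parent_id`.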
Phase — a named stage of a unit's lifecycle. The harness owns all phase transitions; no other layer may transition a phase directly.
Attempt — one dispatch of a worker for a unit. A unit may accumulate multiple attempts across failures and retries.
Turn — one model call within an attempt. An attempt consists of one or more turns. The first turn receives the full task prompt; subsequent turns receive continuation guidance only.
Project — a directory with .sf/config.toml. The project root is the directory containing .sf/. Each project has its own SQLite DB at <project>/.sf/sf.db — ~/.sf/sf.db is the cross-project default DB used only when no project-local DB exists. Multiple projects on the same machine MUST use separate .sf/ directories and therefore separate DBs, locks, and trace files.
Session — a top-level container scoped to one project, with a stable ULID, persisting across process restarts of the same project. A session is created on the first /sf auto or /sf next invocation in a project and reused on subsequent invocations until explicitly ended (/sf session end) or until 30 days of inactivity. The session holds the running state for all units, the context budget, and the supervisor state.
Harness — the layer between pi-coding-agent's agent loop and sf's orchestration logic (milestones, phases, git, worktrees). It owns: context budget, phase transitions, unit lifecycle hooks, session contract, observability, and supervision. The planning and git layers MUST NOT reach past the harness boundary into pi-coding-agent directly.
Worker — the process (local or SSH-remote) that executes one attempt. Spawned by the orchestrator.
Orchestrator — central process. Owns the scheduling loop, in-memory state, and all SQLite writes. Always runs locally even in distributed deployments.
Singularity Memory (sm) — the durable knowledge layer. An HTTP + MCP server holding memories, learnings, and anti-patterns across sessions, projects, and tools. Originally derived from vectorize-io/hindsight (MIT) and assimilated into our codebase under singularity_memory_server/; we own the engine. Runs either embedded (in-process for single-user sf) or remote (shared service on tailnet, reachable from sf, Hermes, OpenClaw, Claude Code, Cursor, etc.). Not SQLite — knowledge lives in Singularity Memory; SQLite holds only orchestration state.
Skill — a SKILL.md file providing prompt guidance to the agent. Inspirational, not enforced.
Workflow template — a TOML file specifying the exact phase sequence the harness enforces for a class of work. Programmatic, not a suggestion to the agent.
Plan — the local source of truth for work units. Created by the user via /sf plan "..." or by editing .sf/plan.md. Decomposed by sf into milestones → slices → tasks. There is no external tracker — sf's SQLite DB is authoritative. (External visibility, e.g. mirroring to GitHub Issues for teammates, is achieved via PostUnit hook scripts, not a built-in tracker integration. See § 10.)
Claim — a soft lock recorded on a units row indicating the orchestrator is currently dispatching it. Stored as claim_holder (worker host or PID) and claim_until (UNIX ms expiry). A claim is released on terminal phase, worker exit, or claim expiry. Prevents two workers picking up the same unit simultaneously. The orchestrator MUST sweep expired claims at the start of every poll tick: any row with claim_until < now() and phase_status = 'running' is reset to phase_status = 'interrupted' and claim_holder = NULL.
Run — the unifying abstraction for one execution of the worker attempt lifecycle (§ 6). A run is either a unit attempt (driven by the phase state machine) or a persistent agent run (driven by inbox messages). The runs table (§ 3.5) records both, distinguished by run_kind. Trace, billing, and supervisor monitoring all key on run_id.
3. Data Model
Status: PARTIAL — sf has milestones/slices/tasks as 3 separate tables instead of one units table with a type discriminator, plus richer planning tables (decisions, requirements, artifacts, assessments, replan_history). Reconciliation: either migrate to a single units table or keep the 3-table shape and update the spec.
The orchestrator uses a single SQLite database per project at <project>/.sf/sf.db (or ~/.sf/sf.db for non-project sessions) for orchestration state only: sessions, units, phase transitions, blockers, gate results, benchmarks, circuit breakers, and persistent agents. Knowledge (memories, learnings, anti-patterns, codebase context) lives in Singularity Memory (§ 16), not SQLite.
All primary keys for runtime-allocated rows (sessions, units, runs, agents, agent_messages, agent_inbox, gate_results, session_blockers, pending_retain) MUST be ULIDs — sortable by creation time without a separate timestamp column. Schema-natural keys (model name, agent name) remain TEXT but are not ULIDs.
The schema MUST be managed via versioned migrations (Drizzle / Kysely) and MUST use WAL mode:
PRAGMA journal_mode=WAL;
PRAGMA synchronous=NORMAL;
3.1 Core tables
CREATE TABLE sessions (
id TEXT PRIMARY KEY,
status TEXT NOT NULL, -- idle | running | paused | interrupted | complete | failed
created_at INTEGER NOT NULL,
updated_at INTEGER NOT NULL
);
CREATE TABLE units (
id TEXT PRIMARY KEY, -- format: type/m{n}[/s{n}[/t{n}]]
session_id TEXT NOT NULL REFERENCES sessions(id),
parent_id TEXT REFERENCES units(id), -- NULL for milestone; ≤ 3 levels deep
type TEXT NOT NULL CHECK (type IN ('milestone', 'slice', 'task')),
workflow TEXT NOT NULL, -- workflow template name; pinned at first dispatch
workflow_hash TEXT NOT NULL, -- SHA-256 of pinned template content (FK workflow_pins.hash)
phase TEXT NOT NULL,
phase_status TEXT NOT NULL CHECK (phase_status IN
('pending', 'running', 'succeeded', 'failed', 'canceled', 'interrupted')),
attempt INTEGER NOT NULL DEFAULT 1, -- 1 = first try, 2 = first retry, ...
claim_holder TEXT, -- format: "{host}#{pid}" or "ssh:{host}#{pid}"
claim_until INTEGER, -- UNIX ms; claim auto-expires at this time
priority INTEGER, -- 1 (urgent) .. 4 (low); NULL sorts last
title TEXT NOT NULL,
description TEXT,
metadata TEXT, -- arbitrary JSON: gh_issue, slack_channel, custom keys
worker_host TEXT, -- "local" | SSH host name; current/last worker
workspace TEXT, -- path of latest workspace (current attempt)
archived_at INTEGER, -- soft-delete; non-NULL = archived/forgotten
created_at INTEGER NOT NULL,
updated_at INTEGER NOT NULL
);
-- Hierarchy depth is enforced in code (the harness rejects parent_id pointing to a task).
-- It would also be enforceable via a recursive trigger, but that adds write-path overhead
-- for a constraint that the planning layer already validates.
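The code-level depth check mentioned in the comment above could look like the following. A sketch under assumed names; the harness's real validation lives in the planning layer:

```typescript
type UnitType = "milestone" | "slice" | "task";

// Allowed parent type for each unit type; milestones are roots.
const PARENT_OF: Record<UnitType, UnitType | null> = {
  milestone: null,
  slice: "milestone",
  task: "slice",
};

// Rejects parent_id rows at the wrong level (e.g. a task as parent,
// or a task attached directly to a milestone).
function validateHierarchy(
  childType: UnitType,
  parentType: UnitType | null,
): { ok: true } | { ok: false; reason: string } {
  const expected = PARENT_OF[childType];
  if (expected === parentType) return { ok: true };
  return {
    ok: false,
    reason: `unit of type '${childType}' requires parent '${expected ?? "none"}', got '${parentType ?? "none"}'`,
  };
}
```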
CREATE TABLE phase_transitions (
id TEXT PRIMARY KEY,
unit_id TEXT NOT NULL REFERENCES units(id),
from_phase TEXT NOT NULL,
to_phase TEXT NOT NULL,
reason TEXT,
transitioned_at INTEGER NOT NULL
);
CREATE TABLE task_blockers (
task_id TEXT NOT NULL REFERENCES units(id) ON DELETE CASCADE,
blocked_by TEXT NOT NULL REFERENCES units(id) ON DELETE CASCADE,
PRIMARY KEY (task_id, blocked_by)
);
CREATE TABLE gate_results (
id TEXT PRIMARY KEY,
unit_id TEXT NOT NULL REFERENCES units(id),
gate_name TEXT NOT NULL,
passed INTEGER NOT NULL,
attempt INTEGER NOT NULL,
max_retries INTEGER NOT NULL,
output TEXT, -- truncated at 8KB
duration_ms INTEGER NOT NULL,
recorded_at INTEGER NOT NULL
);
CREATE TABLE session_blockers (
id TEXT PRIMARY KEY, -- ULID
session_id TEXT NOT NULL REFERENCES sessions(id),
event TEXT NOT NULL, -- GateBlocked | MergeConflict | Paused | UATPending
unit_id TEXT,
detail TEXT,
created_at INTEGER NOT NULL,
resolved_at INTEGER, -- non-NULL = resolved; see resolution rules below
resolved_by TEXT -- "user" | "auto" | command name (e.g. "/sf uat-approve")
);
-- Resolution rules:
-- GateBlocked : resolved when the gate passes on a subsequent attempt OR the unit
-- transitions to PhaseReassess; resolved_by = "auto" | "/sf force-clear"
-- MergeConflict : resolved on /sf revert, /sf merge-resolve, or git service hook;
-- resolved_by = command name
-- Paused : resolved on /sf resume; resolved_by = "user"
-- UATPending : resolved on /sf uat-approve or /sf uat-reject; resolved_by = command name
--
-- An unresolved blocker MUST be displayed in /sf status. The TUI also subscribes to
-- the corresponding pubsub event (§ 10.1) for live updates.
CREATE TABLE benchmark_results (
id TEXT PRIMARY KEY,
model TEXT NOT NULL,
tier TEXT NOT NULL,
fingerprint TEXT NOT NULL, -- phase+complexity+project hash
quality REAL NOT NULL, -- 0.0 .. 1.0
latency_p50 INTEGER NOT NULL, -- milliseconds
cost_per_1k_micro_usd INTEGER NOT NULL, -- micro-USD per 1k tokens
sample_count INTEGER NOT NULL DEFAULT 1,
recorded_at INTEGER NOT NULL
);
CREATE TABLE circuit_breakers (
model TEXT PRIMARY KEY,
tier TEXT NOT NULL,
tripped_at INTEGER NOT NULL,
resets_at INTEGER NOT NULL, -- UNIX ms; auto-reset deadline
fail_count INTEGER NOT NULL DEFAULT 3,
reason TEXT
);
CREATE TABLE schema_migrations (
version INTEGER PRIMARY KEY,
applied_at INTEGER NOT NULL,
description TEXT
);
CREATE TABLE runs (
id TEXT PRIMARY KEY, -- ULID
run_kind TEXT NOT NULL CHECK (run_kind IN ('unit_attempt', 'agent_run')),
unit_id TEXT REFERENCES units(id) ON DELETE SET NULL, -- preserve forensics
agent_id TEXT REFERENCES agents(id) ON DELETE SET NULL, -- preserve forensics
unit_id_snap TEXT, -- ID at run start; survives delete
agent_name_snap TEXT, -- name at run start; survives delete
attempt INTEGER, -- only for unit_attempt
worker_host TEXT,
workspace TEXT, -- workspace AT THIS attempt; authoritative for this run
started_at INTEGER NOT NULL,
ended_at INTEGER,
outcome TEXT CHECK (outcome IS NULL OR outcome IN
('success','failure','abandoned','canceled','interrupted',
'unit_timeout','turn_timeout','stalled')),
error_code TEXT, -- typed error from § 20.1; stores the string
-- value of the const, e.g. "turn_timeout"
input_tokens INTEGER NOT NULL DEFAULT 0,
output_tokens INTEGER NOT NULL DEFAULT 0,
cost_micro_usd INTEGER NOT NULL DEFAULT 0, -- cost in micro-USD (1e-6 USD); avoids float drift
CHECK (
(run_kind = 'unit_attempt' AND unit_id_snap IS NOT NULL AND agent_name_snap IS NULL AND attempt IS NOT NULL)
OR
(run_kind = 'agent_run' AND agent_name_snap IS NOT NULL AND unit_id_snap IS NULL AND attempt IS NULL)
)
);
-- Aggregate token/cost columns are an end-of-run rollup written once on ended_at.
-- Span data in trace.jsonl (§ 19.3) is authoritative; runs columns are the cached
-- summary used by /sf session-report and the HTTP API without re-scanning JSONL.
--
-- Soft-delete model: units and agents are NEVER hard-deleted by the harness — only
-- marked archived (units.archived_at, agents.archived_at). The *_snap columns ensure
-- run history survives even if a future operator manually drops rows.
-- Local mirror of selected Singularity Memory entries that the harness needs offline.
-- Limited to anti-patterns by default — small, high-value, MUST surface even
-- if Singularity Memory is unreachable.
CREATE TABLE local_anti_patterns (
id TEXT PRIMARY KEY,
description TEXT NOT NULL,
context TEXT NOT NULL,
correct_path TEXT NOT NULL,
source_unit TEXT,
fingerprint TEXT, -- phase + project hash, for fast filter
created_at INTEGER NOT NULL,
synced_at INTEGER -- last time confirmed against Singularity Memory
);
3.2 Persistent agent tables
Status: NEW — no agents, agent_memory_blocks, agent_messages, or agent_inbox tables in sf today.
CREATE TABLE agents (
id TEXT PRIMARY KEY,
name TEXT NOT NULL UNIQUE,
system TEXT NOT NULL, -- system prompt template
model TEXT NOT NULL,
state TEXT NOT NULL DEFAULT 'idle' CHECK (state IN ('idle','running','waiting','stopped')),
capabilities TEXT, -- JSON array of capability tags; cached in agent_capabilities
max_turns_per_run INTEGER NOT NULL DEFAULT 100,
archived_at INTEGER, -- soft-delete; non-NULL = archived
created_at INTEGER NOT NULL,
last_active INTEGER
);
-- Indexed lookup table for capability matching (handoff "capability:tag1,tag2").
-- Maintained in sync with agents.capabilities by the agent CRUD layer.
CREATE TABLE agent_capabilities (
agent_id TEXT NOT NULL REFERENCES agents(id) ON DELETE CASCADE,
capability TEXT NOT NULL,
PRIMARY KEY (agent_id, capability)
);
CREATE INDEX agent_capabilities_by_tag ON agent_capabilities(capability, agent_id);
CREATE TABLE agent_memory_blocks (
agent_id TEXT NOT NULL REFERENCES agents(id),
label TEXT NOT NULL,
value TEXT NOT NULL DEFAULT '',
char_limit INTEGER NOT NULL DEFAULT 2000,
read_only INTEGER NOT NULL DEFAULT 0,
updated_at INTEGER NOT NULL,
PRIMARY KEY (agent_id, label)
);
CREATE TABLE agent_messages (
id TEXT PRIMARY KEY,
agent_id TEXT NOT NULL REFERENCES agents(id),
seq INTEGER NOT NULL, -- monotonically increasing per agent
role TEXT NOT NULL, -- user | assistant | tool_call | tool_return | system
content TEXT NOT NULL,
tool_name TEXT,
created_at INTEGER NOT NULL
);
CREATE TABLE agent_inbox (
id TEXT PRIMARY KEY,
agent_id TEXT NOT NULL REFERENCES agents(id),
from_agent TEXT NOT NULL,
content TEXT NOT NULL,
delivered INTEGER NOT NULL DEFAULT 0,
created_at INTEGER NOT NULL
);
agent_inbox is append-only. Rows MUST NOT be deleted after insert; delivered is the only field that may be modified.
3.3 No external tracker
Status: EXISTS — sf already operates from local SQLite only; no Linear/Jira/etc. integration in core.
sf v3 does not integrate with external task trackers (Linear, GitHub Issues, Jira). Work units are entirely local — created by /sf plan "...", edited via .sf/plan.md, and stored in units (§ 3.1). The local SQLite DB is the only authoritative source of unit state.
This is a deliberate simplification from the earlier draft. Reasons:
- sf's gen-2 model is "user states a goal, sf decomposes and executes" — not "team files tickets in Linear, sf picks them up." The autopilot doesn't need an external queue.
- External tracker integration adds network dependency on the orchestrator's critical path (rate limits, outages, GraphQL pagination edge cases).
- Symphony-style reconciliation (cancel mid-run when external state changes) doesn't apply when the only source is internal.
External visibility is achieved via hooks, not core integration. A PostUnit hook can call gh issue comment, slack-cli send, or any other publishing target to broadcast progress. Read-side stays in sf's DB; write-side goes through hooks. See § 10.5.1 for an example GH Issues publishing hook.
Sources of truth for unit creation:
| Source | When |
|---|---|
| `/sf plan "<goal>"` CLI command | Adds a milestone with sf-decomposed slices and tasks |
| `.sf/plan.md` file edit | Declarative; sf re-reads and reconciles on `/sf plan reload` |
| `/sf dispatch <ad-hoc-prompt>` | One-off task, no enclosing milestone |
| `/sf agent run <name> "<message>"` | Wakes a persistent agent; not a unit (§ 17.1) |
There is no poll loop against any external API. The orchestrator's poll cycle (§ 5.1) reads only from local SQLite.
4. Phase State Machine
Status: PARTIAL — phases research/plan/execute/review/complete exist in sf prompts. The spec's tdd, verify, merge, reassess, and uat need verification — they may exist under different names (the assessments and replan_history tables hint at reassess).
4.1 Phase enum
type Phase int
const (
PhaseResearch Phase = iota // map the problem, gather context
PhasePlan // decompose into slices and tasks, get sign-off
PhaseExecute // write the code
PhaseTDD // write tests for what was just built; red → green
PhaseVerify // run full test suite + lint + type check; gates pass
PhaseReview // structured self-review: correctness, style, security
PhaseMerge // commit, push, open PR
PhaseComplete // unit done; result recorded; artifact archived
PhaseReassess // re-enter planning with failure context
PhaseUAT // human acceptance; only when workflow has require_uat = true
)
4.2 Standard flow
Research → Plan → Execute → TDD → Verify → Review → Merge → Complete
Permitted non-standard transitions:
| Trigger | Transition |
|---|---|
| Gate failure in Verify (attempt < max_retries) | Verify → Execute |
| Gate failure in Verify (attempt = max_retries) | Verify → Reassess |
| Review finds a real problem | Review → Execute |
| Merge conflict | Merge → Reassess |
| External cancellation | Any → (AttemptCanceled, no phase write) |
All other transitions are REJECTED at the harness boundary with a typed error. The harness MUST NOT silently allow invalid transitions.
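The boundary check can be expressed as a lookup of allowed targets per phase. A sketch for the standard feature workflow (workflows with require_uat splice uat between review and merge); the map and names are illustrative, derived from §§ 4.2, 4.5, and 4.6:

```typescript
// Allowed to-phases per from-phase: the standard forward step plus
// the permitted non-standard transitions from the table above.
const ALLOWED: Record<string, string[]> = {
  research: ["plan"],
  plan: ["execute"],
  execute: ["tdd"],
  tdd: ["verify"],
  verify: ["review", "execute", "reassess"], // gate failure: retry or escalate
  review: ["merge", "uat", "execute"],       // "uat" only when require_uat
  uat: ["merge", "reassess"],                // uat-approve / uat-reject
  merge: ["complete", "reassess"],           // merge conflict
  reassess: ["plan", "complete"],            // re-plan or abandon
  complete: [],                              // terminal
};

function isAllowedTransition(from: string, to: string): boolean {
  return (ALLOWED[from] ?? []).includes(to);
}
```

Anything not in the map is exactly the "REJECTED at the harness boundary with a typed error" case.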
4.3 Attempt state
Within each phase, individual dispatch attempts move through finer-grained states:
type AttemptState int
const (
AttemptPreparingWorkspace AttemptState = iota
AttemptBuildingPrompt
AttemptLaunchingAgent
AttemptInitializingSession
AttemptStreamingTurn
AttemptFinishing
AttemptSucceeded
AttemptFailed
AttemptTimedOut
AttemptStalled // stall_timeout exceeded since last agent event
AttemptCanceled // issue became non-active mid-run (reconciliation)
)
AttemptCanceled is distinct from AttemptFailed. It means the work was valid but the task was externally invalidated (deleted, moved to a terminal state, superseded). The harness MUST NOT retry a canceled attempt — it releases the slot and moves on.
4.4 Turn kind
type TurnKind int
const (
TurnFirst TurnKind = iota // full rendered task prompt
TurnContinuation // short continuation guidance, same thread
)
Turn 1 of every attempt is always TurnFirst. Turns 2+ are TurnContinuation. The harness determines TurnKind; the agent never does.
4.5 Workflow templates
A workflow template MUST be a TOML file in .sf/workflows/<name>.toml. The harness reads the template, constructs the phase sequence from it, and enforces it programmatically. The agent has no say in phase ordering or skipping.
# .sf/workflows/feature.toml
name = "feature"
phases = ["research", "plan", "execute", "tdd", "verify", "review", "merge", "complete"]
require_tdd = true # PhaseTDD is enforced; skipping is a gate violation
require_review = true
require_uat = false # if true, PhaseUAT is inserted before PhaseComplete
max_retries = 3 # per gate in PhaseVerify
max_reassess = 2
# .sf/workflows/release.toml — uses UAT
name = "release"
phases = ["research", "plan", "execute", "tdd", "verify", "review", "uat", "merge", "complete"]
require_tdd = true
require_review = true
require_uat = true # halts after UAT enters; only resumes on /sf uat-approve
max_retries = 3
# .sf/workflows/spike.toml
name = "spike"
phases = ["research", "plan", "execute", "complete"]
require_tdd = false
require_review = false
max_retries = 0
PhaseUAT halts the auto-loop with SignalPause and waits for /sf uat-approve <unit-id> (advance to PhaseMerge) or /sf uat-reject <unit-id> "reason" (advance to PhaseReassess). The harness MUST fail startup if a configured workflow template references an unknown phase or includes uat without require_uat = true.
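The startup check could be sketched as follows (function and type names are assumptions; the phase list mirrors § 4.1):

```typescript
const KNOWN_PHASES = new Set([
  "research", "plan", "execute", "tdd", "verify",
  "review", "merge", "complete", "reassess", "uat",
]);

interface WorkflowTemplate {
  name: string;
  phases: string[];
  require_uat?: boolean;
}

// Returns validation errors; a non-empty result fails startup.
function validateWorkflow(wf: WorkflowTemplate): string[] {
  const errors: string[] = [];
  for (const p of wf.phases) {
    if (!KNOWN_PHASES.has(p)) {
      errors.push(`unknown phase '${p}' in workflow '${wf.name}'`);
    }
  }
  if (wf.phases.includes("uat") && wf.require_uat !== true) {
    errors.push(`workflow '${wf.name}' includes 'uat' without require_uat = true`);
  }
  return errors;
}
```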
Workflow selection at dispatch
The workflow used for a given unit is determined in this order:
- Explicit unit metadata: `metadata.workflow = "<name>"` set at `/sf plan` time.
- Project default: `[harness] default_workflow = "feature"` in `.sf/config.toml`.
- Built-in fallback: `feature` (if available), else the first workflow in `.sf/workflows/`.
The selected workflow is recorded in units.workflow at dispatch time and never re-evaluated for that unit, even on retry — workflow stability across attempts is a hard guarantee. Additionally, the content of the chosen template is hashed (SHA-256) and stored in units.workflow_hash. If the on-disk template changes mid-session, the harness uses the pinned hash's content (cached in SQLite at workflow_pins.content) for that unit; new units pick up the new content. This prevents in-flight units from silently changing rules.
CREATE TABLE workflow_pins (
hash TEXT PRIMARY KEY, -- SHA-256 of template content
name TEXT NOT NULL,
content TEXT NOT NULL, -- frozen TOML at first pin
pinned_at INTEGER NOT NULL
);
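Pinning is just a content hash over the raw template text. A sketch using Node's crypto module (the function name is an assumption):

```typescript
import { createHash } from "node:crypto";

// SHA-256 hex digest of the template content, as stored in
// units.workflow_hash and workflow_pins.hash.
function pinHash(templateContent: string): string {
  return createHash("sha256").update(templateContent, "utf8").digest("hex");
}
```

Because the hash keys workflow_pins, re-pinning an unchanged template is idempotent: identical content always maps to the same row.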
4.6 PhaseReassess
PhaseReassess is entered when a unit cannot make progress through normal phases (gate failed max_retries times, merge conflict, supervisor halt). The Reassess agent is dispatched at the reasoning tier with Think: true and is given:
- The original task description.
- The full failure trail: gate output, the last max_retries attempt errors, last commit history.
- The unit's plan (from .sf/active/{unit-id}/plan.md).
The Reassess agent MUST output one of:
| Outcome | Effect |
|---|---|
| Re-plan | Writes a new plan.md, transitions back to PhasePlan. Counter max_reassess decrements. |
| Abandon | Writes a decision.md explaining why the task cannot succeed; transitions to PhaseComplete with verdict abandoned. Any registered visibility hook (e.g. GH Issues comment) fires from the standard PostUnit pipeline. |
| Escalate | Halts auto-loop with SignalPause; writes a human-question.md with concrete questions for the operator. Resumes on /sf reassess-resolve <unit-id>. |
If max_reassess hits zero on a Re-plan path, the next entry into PhaseReassess MUST be Abandon or Escalate; Re-plan is rejected.
4.7 Phase transition rules
- All phase transitions MUST go through a single Harness.Transition(ctx, from, to, reason) method.
- Transition MUST persist the PhaseTransition record to SQLite BEFORE the new phase begins. A crash mid-phase means on resume the harness re-enters the last committed phase cleanly (see § 4.8).
- Transition MUST emit a pubsub PhaseChange event after the SQLite write. The TUI subscribes — it MUST NOT poll phase state directly.
- The harness MUST set Think: true on the model config for Research, Plan, and Reassess phases. The agent does not control this.
- PhaseChange is non-vetoable. Hook subscribers receive a notification after the transition is committed; they cannot block or reject. Hooks that need veto semantics MUST register on PreDispatch instead, which fires before the next dispatch and IS vetoable.
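The persist-then-publish ordering can be sketched as a tiny shell around the two effects (persist and publish are stand-ins for the SQLite write and the pubsub emit; all names here are illustrative):

```typescript
interface TransitionRecord {
  unitId: string;
  from: string;
  to: string;
  reason?: string;
  transitionedAt: number; // UNIX ms
}

// Ordering contract: the SQLite write commits first, the pubsub event
// fires second, and subscribers can never veto either step.
async function transition(
  persist: (r: TransitionRecord) => Promise<void>,
  publish: (r: TransitionRecord) => void,
  r: TransitionRecord,
): Promise<void> {
  await persist(r); // crash before this resolves: phase unchanged on resume
  publish(r);       // crash between the two: event lost, TUI re-syncs from SQLite
}
```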
4.8 Crash recovery
In-memory scheduler state is intentionally not persisted (§ 20.2). On restart, the orchestrator MUST follow this exact sequence:
- Acquire the project lock at <project>/.sf/run.lock (PID file). A stale lock (PID not in /proc on Linux, kill(pid, 0) on other Unixes) is cleaned and logged. The lock is per-project; multiple projects can run auto concurrently on the same machine.
- Mark interrupted units. All units with phase_status = 'running' are updated to phase_status = 'interrupted'. This is the only schema-level recovery action.
- Run startup cleanup (§ 5.6) — move stale active artifacts to archive.
- Resume from the last committed phase boundary. Each interrupted unit is treated as eligible for fresh dispatch; the worker re-enters at unit.phase with a new attempt number (unit.attempt + 1). The agent receives a last_error of "resumed_after_crash" so the prompt can warn the agent.
- Begin polling. Resume the normal poll cycle. Operator-issued /sf abandon commands made during the outage are visible via the next poll because they are persisted in units.phase_status.
The harness MUST NOT replay tool calls. It MUST NOT attempt to "resume" a partial agent session. The crash recovery model is fresh dispatch from the last persisted phase boundary, not transparent continuation.
Side effects are not rolled back. A crash mid-Merge may have produced a partial commit, push, or PR. The agent on retry sees the existing commits and either continues from there or surfaces a conflict. This MUST be documented in the Merge phase prompt: "if you see existing commits from a previous attempt, integrate them; do not start over."
Workspace state is preserved. A crashed worker's workspace remains on disk; the next attempt reuses it (ensure_workspace returns created=false). The before_run hook is responsible for any cleanup (e.g. git stash, npm clean) appropriate for the project.
5. Orchestration Loop
Status: EXISTS — src/resources/extensions/sf/auto-loop.ts, auto-dispatch.ts, auto-supervisor.ts.
5.1 Poll cycle
The orchestrator runs a single poll loop on a configurable interval (default 1s). Each tick:
- Re-check config stamp (§ 14.3).
- Fetch eligible units from SQLite.
- Apply priority sort (§ 5.2).
- For each eligible unit (up to capacity), dispatch a worker.
- Check running workers for stalled/timed-out attempts.
- Write orchestrator snapshot to HTTP API state (§ 19.4).
The orchestrator MUST be the single authority for all in-memory scheduler state. No other component writes scheduler state.
5.2 Priority ordering
When multiple units are eligible, the orchestrator sorts them:
- Explicit priority — priority 1 (urgent) before 4 (low); NULL sorts last.
- Blocker-free first — units with no non-terminal upstream blockers before blocked units.
- Phase order — earlier phases first (Research before Execute) within the same priority bucket.
- Created-at — oldest first as tie-breaker.
- Unit ID lexicographic — final deterministic tie-breaker.
This ordering is re-evaluated fresh on every poll tick.
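The five keys compose into one comparator. A sketch with assumed field names (blocked status and phase index would be precomputed per tick from task_blockers and the unit's workflow):

```typescript
interface EligibleUnit {
  id: string;
  priority: number | null; // 1 (urgent) .. 4 (low); NULL sorts last
  blocked: boolean;        // has a non-terminal upstream blocker
  phaseIndex: number;      // position in the workflow's phase sequence
  createdAt: number;       // UNIX ms
}

function compareUnits(a: EligibleUnit, b: EligibleUnit): number {
  // 1. Explicit priority; NULL sorts last.
  const pa = a.priority ?? Number.MAX_SAFE_INTEGER;
  const pb = b.priority ?? Number.MAX_SAFE_INTEGER;
  if (pa !== pb) return pa - pb;
  // 2. Blocker-free units first.
  if (a.blocked !== b.blocked) return a.blocked ? 1 : -1;
  // 3. Earlier phases first.
  if (a.phaseIndex !== b.phaseIndex) return a.phaseIndex - b.phaseIndex;
  // 4. Oldest first.
  if (a.createdAt !== b.createdAt) return a.createdAt - b.createdAt;
  // 5. Lexicographic unit ID as the final deterministic tie-breaker.
  return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
}
```

Determinism matters here: given the same DB snapshot, every tick produces the same dispatch order.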
5.3 Blocker-aware dispatch
A unit MUST NOT be dispatched if any of its upstream dependencies (in task_blockers) are in a non-terminal state.
Terminal means PhaseComplete, PhaseReassess (resolved), or explicitly cancelled. Non-terminal means any other state, including PhaseVerify in progress.
A dependency that failed and was marked abandoned is terminal and MUST NOT block downstream dispatch.
Blocked units stay queued and are re-evaluated on the next poll tick. No backoff, no retry counter increment for a blocked wait.
5.3.1 Atomic claim acquisition
The orchestrator acquires a claim with a single conditional UPDATE:
UPDATE units
SET claim_holder = ?, claim_until = ?, phase_status = 'running', updated_at = ?
WHERE id = ?
AND (claim_holder IS NULL OR claim_until < ?); -- ? = now()
Dispatch proceeds only if rows_affected = 1. This makes the claim race-free at the DB level and supports multiple orchestrators against the same DB, even though sf normally runs as a singleton (one process per .sf/run.lock). The atomic claim is the safety net if the lock fails (e.g. shared NFS, broken filesystem semantics).
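The complementary expired-claim sweep (§ 2, Claim) runs at the start of every poll tick. A sketch mirroring the acquisition statement above, with columns from § 3.1:

```
-- Reap claims whose expiry has passed while the row still looks running.
UPDATE units
SET phase_status = 'interrupted',
    claim_holder = NULL,
    updated_at   = ?            -- now() in UNIX ms
WHERE claim_until < ?           -- now()
  AND phase_status = 'running';
```

Because acquisition already accepts rows with `claim_until < now()`, the sweep is a liveness aid (surfacing interrupted units promptly) rather than a correctness requirement.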
units.attempt is the current attempt counter (used as the attempt prompt template variable). Historical attempts live in runs (§ 3.1). Authority: units.attempt is incremented exactly when a new runs row is inserted; the two are kept in sync inside the same transaction.
5.4 Per-phase concurrency
The harness MUST NOT exceed max_agents_by_phase[phase] concurrent units in any given phase. When a phase slot is full, further dispatches for that phase wait until the next tick.
[harness.concurrency]
max_agents = 10
max_agents_by_phase.execute = 4
max_agents_by_phase.tdd = 4
max_agents_by_phase.verify = 10
5.4.1 Turn outcome signal
Between transport-level "turn ran cleanly" and phase-level "gate passed," the harness MUST capture a per-turn semantic signal. After every turn, the harness inspects the model output for an explicit terminal marker:
| Marker (in agent output) | Meaning | Effect |
|---|---|---|
| `<turn_status>complete</turn_status>` | Agent considers this turn's goal achieved | Recorded; allow continuation if `max_turns_per_attempt` not reached |
| `<turn_status>blocked</turn_status>` | Agent is stuck and needs user input or escalation | Triggers SignalPause in auto-mode |
| `<turn_status>giving_up</turn_status>` | Agent has decided the task can't be done | Ends attempt; transitions to PhaseReassess |
| (no marker) | Default success | Continue normally |
The marker is parsed from the last 200 chars of the agent's response. Markers appearing earlier are ignored (prevents partial-quote false positives). This gives the harness a checkpoint between turns without waiting for a phase boundary.
The agent prompt template (prompts/execute-task.md) instructs the agent to emit one of these markers at end-of-turn. Compliance is best-effort — absence of a marker is treated as default success.
5.5 Continuation retry and exponential backoff
After a normal (clean) exit from a worker, the orchestrator MUST schedule a 1-second continuation retry to re-poll eligibility. If the unit is still active, a new session starts. If terminal, the claim is released. This is not a failure retry.
After an abnormal exit, the orchestrator applies exponential backoff. attempt is 1-indexed (first try = 1, first retry = 2, …):
delay = min(10s × 2^(attempt - 1), max_retry_backoff)
| Attempt | Delay before next dispatch |
|---|---|
| 1 (first try) | (no retry yet) |
| 2 (first retry) | 20 s |
| 3 | 40 s |
| 4 | 80 s |
| 5 | 160 s |
| 6+ | capped at max_retry_backoff (default 5 min) |
Configurable: [harness] max_retry_backoff = "5m", [harness] max_attempts = 6.
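The backoff formula can be sketched as a small pure function (illustrative TypeScript; the function name and millisecond units are assumptions of this example):

```typescript
// delay = min(10s × 2^(attempt - 1), max_retry_backoff); attempt is 1-indexed.
// maxRetryBackoffMs mirrors [harness] max_retry_backoff (default 5m).
function retryBackoffMs(attempt: number, maxRetryBackoffMs = 5 * 60_000): number {
  if (attempt < 2) return 0; // the first try has no preceding failure to back off from
  return Math.min(10_000 * 2 ** (attempt - 1), maxRetryBackoffMs);
}
```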
5.6 Startup cleanup
On startup, the orchestrator MUST:
- Scan `.sf/active/` for unit artifacts whose tasks are in terminal states.
- Move stale active artifacts to `.sf/archive/` atomically (rename, not copy+delete).
- Mark any running/claimed units as interrupted in SQLite.
- Release all worker slots.
6. Worker Attempt Lifecycle
Status: EXISTS —
src/resources/extensions/sf/auto.tsand surrounding modules.
The exact sequence inside a single worker attempt:
run_worker_attempt(unit, attempt):
# 1. Workspace
workspace = create_or_reuse_workspace(unit.id, unit.worker_host)
if workspace failed:
fail_attempt(ErrWorkspaceCreation)
# 2. Before-run hook (fatal)
result = run_hook("before_run", workspace, unit)
if result failed:
fail_attempt(ErrHookFailed)
# 3. Session start
session = agent.start_session(cwd=workspace, model=route(unit.phase))
if session failed:
run_hook_best_effort("after_run", workspace, unit)
fail_attempt(ErrAgentStartup)
# 4. Turn loop
turn = 1
loop:
kind = TurnFirst if turn == 1 else TurnContinuation
prompt = build_prompt(unit, attempt, turn, kind)
if prompt failed:
agent.stop_session(session)
run_hook_best_effort("after_run", workspace, unit)
fail_attempt(ErrPromptRender)
result = agent.run_turn(session, prompt)
if result failed:
agent.stop_session(session)
run_hook_best_effort("after_run", workspace, unit)
fail_attempt(result.error)
# Re-check unit state between turns (local DB only — no external tracker)
current_state = db.fetch_unit_phase_status(unit.id)
if current_state in ('canceled', 'succeeded'):
break # → AttemptCanceled (e.g. operator ran /sf abandon mid-run)
if turn >= max_turns_per_attempt:
break
turn++
# 5. Teardown
agent.stop_session(session)
run_hook_best_effort("after_run", workspace, unit)
exit_normal()
Rules:
- `before_run` hook failure is fatal — the harness MUST fail the attempt without starting the session.
- `after_run` hook is always attempted, even after failure. Its failure is logged but MUST NOT change the attempt outcome.
- The unit state re-check between turns MUST happen before building the next turn prompt. A canceled unit MUST NOT receive another turn.
7. Prompt Contract
Status: PARTIAL —
auto-prompts.tsandcommands-handlers.tsload templates; strict-variable-mode behavior needs verification.
7.1 Template variables
Every prompt template MUST be rendered with a strict variable checker. An unknown variable in the template MUST cause loadPrompt to panic at startup rather than silently render an empty string.
Canonical variables for execute-task templates:
| Variable | Type | Notes |
|---|---|---|
| `unit_id` | string | Stable unit identifier |
| `unit_type` | string | "milestone" \| "slice" \| "task" |
| `phase` | string | Current phase name ("execute", "tdd", etc.) |
| `attempt` | int \| null | null on first dispatch; integer ≥ 1 on retry |
| `session_id` | string | Stable session UUID |
| `issue` | object | Full issue/task struct as flat map |
| `last_error` | string \| null | Injected automatically on attempt ≥ 2 (§ 7.3) |
When adding a new {{variable}} to any template: (1) pass it in every loadPrompt call site, (2) add a placeholder in every test that renders that template, (3) recompile. Skipping (1) or (2) causes a startup panic.
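A minimal sketch of the strict-variable behaviour (illustrative TypeScript; sf's real loadPrompt lives in auto-prompts.ts and panics at startup rather than per render, and the {{name}} syntax handling here is simplified):

```typescript
// Strict rendering: any {{name}} without a supplied value throws instead of
// silently rendering an empty string.
function renderStrict(template: string, vars: Record<string, unknown>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_match, name: string) => {
    if (!(name in vars)) {
      throw new Error(`unknown template variable: ${name}`);
    }
    return String(vars[name]);
  });
}
```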
7.2 Continuation turns
A TurnContinuation MUST receive a short guidance prompt, not the full task prompt. The full prompt is already in the thread history — resending it inflates context and degrades model reasoning. The continuation prompt MUST NOT re-state the task description; it provides only steering context for the current turn.
7.3 Attempt variable semantics
The attempt variable enables prompt templates to give different instructions to retrying agents vs. fresh starts. A retry prompt SHOULD include: "your previous attempt failed with: {{last_error}} — focus on that specifically." The harness injects last_error automatically on attempt >= 2.
last_error is only injected on TurnFirst of attempts ≥ 2. Continuation turns within the same attempt have already established context and don't need it. A turn failure within an attempt always fails the entire attempt (§ 6); there are no mid-attempt error injections to reason about.
last_error content MUST be capped at 4 KB. Larger payloads (gate output, lint dumps, traceback) are truncated head-and-tail: 2 KB from the start, marker ... [truncated, full payload at <path>] ..., then 2 KB from the end. The full payload is written to .sf/active/{unit-id}/last-error-full.txt so the agent can read_file it if the truncated context isn't enough.
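The head-and-tail truncation can be sketched as follows (illustrative TypeScript; sizing is simplified to string length rather than exact bytes, and writing the full payload to disk is elided):

```typescript
// 2 KB head + marker + 2 KB tail once the payload exceeds the 4 KB cap.
// fullPath is where the untruncated payload was written (last-error-full.txt).
function truncateLastError(payload: string, fullPath: string, cap = 4096): string {
  if (payload.length <= cap) return payload;
  const half = cap / 2;
  return (
    payload.slice(0, half) +
    `\n... [truncated, full payload at ${fullPath}] ...\n` +
    payload.slice(-half)
  );
}
```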
7.4 turn_input_required in auto-mode
When the agent raises turn_input_required during auto-mode, the harness MUST respond according to the turn_input_required config (default: "soft"):
"soft"— inject"This is a non-interactive session. Operator input is unavailable."as auserrole turn and let the session continue. The agent adapts."hard"— end the attempt immediately, recordErrTurnInputRequired, schedule failure retry.
In interactive/step mode, the harness MUST surface the request to the user via the TUI and MUST NOT auto-respond. It waits up to unit_timeout before failing.
The harness MUST NOT leave a run stalled indefinitely waiting for interactive input in any mode.
8. Context Budget
Status: EXISTS —
src/resources/extensions/sf/auto-budget.tswith threshold logic;auto-recovery.tsfor compaction.
8.1 Budget type
type Budget struct {
MaxTokens int
UsedTokens int
CompactAt float64 // fraction e.g. 0.80
HardLimitAt float64 // fraction e.g. 0.95
}
func (b *Budget) ShouldCompact() bool {
return float64(b.UsedTokens)/float64(b.MaxTokens) >= b.CompactAt
}
func (b *Budget) AtHardLimit() bool {
return float64(b.UsedTokens)/float64(b.MaxTokens) >= b.HardLimitAt
}
8.2 Rules
- The harness MUST update `UsedTokens` after every model response. The agent loop MUST NOT manage budget.
- When `ShouldCompact()` is true, the harness MUST trigger compaction before the next turn, not mid-turn.
- When `AtHardLimit()` is true, the harness MUST halt the current unit, snapshot state, and surface `ErrBudgetExhausted`. It MUST NOT let the agent proceed and hit a provider context error.
- Budget state MUST be persisted to SQLite after every turn so crash recovery can restore it.
8.3 Compaction
When compaction fires (budget at compact threshold):
- Write a `session_summary` entry to Singularity Memory via `retain`.
- Clear the hot cache (in-memory last-N turns).
- Start the next turn with a fresh context window seeded by a `recall` from Singularity Memory.
Compaction MUST NOT truncate the window — it MUST replace it with a fresh recall. A truncated window loses structure; a recalled window gains relevance.
Agent run compaction preserves the wake context. For persistent agent runs, the compacted window MUST include verbatim:
- The wake message that started this run.
- The most recent 3 inbox arrivals delivered in this run.
- The agent's full `agent_memory_blocks` (these are durable anyway, but they go above the recall block).
Compaction without this preservation can drop the originating intent and cause the agent to lose thread continuity mid-run.
8.4 Token accounting precision
Provider responses arrive as either absolute thread totals or per-turn deltas. The harness MUST prefer absolute totals (thread/tokenUsage/updated-style events) and MUST track the last-reported total to compute deltas, preventing double-counting.
Aggregate totals (input, output, cache-read, cache-write, cost-usd) MUST accumulate in orchestrator state and be included in every runtime snapshot.
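The absolute-total-to-delta rule can be sketched as a tiny accumulator (illustrative TypeScript; the class and method names are assumptions, not sf's API):

```typescript
// Prefer absolute thread totals (thread/tokenUsage/updated-style events) and
// derive the per-turn delta from the last-reported total, so that re-delivered
// or monotonically growing totals are never double-counted.
class TokenAccumulator {
  private lastReportedTotal = 0;
  total = 0;

  // Called with an absolute thread total; returns the delta that was added.
  onAbsoluteTotal(reported: number): number {
    const delta = Math.max(0, reported - this.lastReportedTotal);
    this.lastReportedTotal = reported;
    this.total += delta;
    return delta;
  }
}
```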
9. Supervision
Status: EXISTS —
abandon-detect.ts,auto-budget.ts,auto-recovery.ts,auto-timeout-recovery.ts,blocked-models.tscover most checks. Circuit breaker and ModelUnavailable specifics need verification.
9.1 Supervisor interface
The harness MUST run a supervisor goroutine alongside the agent loop. The supervisor communicates exclusively via pubsub — it MUST NOT touch agent state directly.
type SupervisorCheck interface {
Name() string
Check(ctx context.Context, state SupervisorState) SupervisorSignal
}
type SupervisorSignal int
const (
SignalOK SupervisorSignal = iota
SignalWarn // log, surface in TUI
SignalPause // pause auto-loop, wait for user
SignalAbort // stop unit, mark interrupted
)
9.2 Built-in checks
| Check | Trigger | Signal |
|---|---|---|
| `StuckLoop` | Same phase for > N turns with no successful tool calls | SignalPause |
| `BudgetWarning` | Context approaching compaction threshold | SignalWarn |
| `TimeoutCheck` | Unit running longer than `unit_timeout` | SignalAbort |
| `AbandonDetect` | Agent producing output with no tool calls | SignalPause |
| `GitDivergence` | Working branch diverged from base unexpectedly | SignalPause |
| `BlockerCheck` | Upstream dependency moved to non-terminal state mid-run | SignalPause |
| `ModelUnavailable` | Provider returns a "model not supported / not found" class error | SignalAbort immediately (not after timeout) |
| `CircuitBreaker` | Same model fails 3 consecutive times within a session | Trip circuit; SignalAbort on next dispatch to tripped model |
9.3 Circuit breaker
When the circuit trips for a model:
- Write circuit state to SQLite (`circuit_breakers` table — `model`, `tripped_at`, `resets_at`).
- Subsequent dispatches in that tier MUST skip the tripped model.
- Circuit auto-resets after 24 hours or on explicit `/sf reset-circuits`.
- The circuit state MUST survive a process restart.
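An in-memory sketch of the breaker logic (illustrative TypeScript; persistence to the `circuit_breakers` table and the `/sf reset-circuits` path are elided, and the class name is an assumption):

```typescript
// Trips after 3 consecutive failures of the same model; auto-resets after 24 h.
class ModelCircuit {
  private failures = new Map<string, number>();   // consecutive failure count
  private trippedAt = new Map<string, number>();  // epoch ms when tripped
  private readonly resetMs = 24 * 60 * 60_000;

  recordFailure(model: string, now: number): void {
    const n = (this.failures.get(model) ?? 0) + 1;
    this.failures.set(model, n);
    if (n >= 3) this.trippedAt.set(model, now);
  }

  recordSuccess(model: string): void {
    this.failures.delete(model); // a success resets the consecutive counter
  }

  isTripped(model: string, now: number): boolean {
    const t = this.trippedAt.get(model);
    if (t === undefined) return false;
    if (now - t >= this.resetMs) { // 24-hour auto-reset
      this.trippedAt.delete(model);
      this.failures.delete(model);
      return false;
    }
    return true;
  }
}
```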
9.4 Supervisor constraints
- The supervisor MUST NOT call `os.Exit` or panic.
- The supervisor MUST NOT write to agent state or SQLite unit state directly.
- The auto-loop acts on `SignalPause` and `SignalAbort`. The TUI shows warnings on `SignalWarn`.
9.5 SignalAbort and in-flight tool calls
When the harness receives SignalAbort while a tool call is in flight (e.g. a long-running bash subprocess), it MUST follow this sequence:
- Cancel the tool call's context (Go `context.CancelFunc`). Cooperative cancellation MUST be honoured by built-in tools.
- Wait up to `[harness] tool_abort_grace = "5s"` for the tool to exit cleanly.
- After the grace period, send SIGTERM to any tool subprocess.
- Wait an additional `[harness] tool_abort_kill = "3s"`.
- If the subprocess is still running, send SIGKILL.
Total worst case: 8 seconds from SignalAbort to forcible termination. The harness MUST NOT hang the orchestrator waiting on a non-cooperating tool call.
After the tool call ends (cleanly or via SIGKILL), the harness records the run as outcome = canceled with error_code = canceled_by_supervisor and emits the after_run hook before releasing the slot.
10. Hook Pipeline
Status: EXISTS —
src/resources/extensions/sf/post-unit-hooks.ts,bootstrap/register-hooks.ts. Per-hook timeouts and exact event set (PreDispatch/AutoLoop/etc.) need cross-check.
10.1 Events
The harness extends pi-coding-agent's hook system with sf-specific events:
const (
// Existing pi-coding-agent event
EventPreToolUse = "PreToolUse"
// Unit lifecycle
EventPreDispatch = "PreDispatch" // before a unit is dispatched; can block
EventPostUnit = "PostUnit" // after a unit completes
EventPhaseChange = "PhaseChange" // on phase transition
// Auto-loop
EventAutoLoop = "AutoLoop" // each iteration of the auto-loop
// Worktree
EventWorktreeCreate = "WorktreeCreate"
EventWorktreeDelete = "WorktreeDelete"
EventMergeReady = "MergeReady"
EventMergeConflict = "MergeConflict"
// Agent fleet
EventAgentWake = "AgentWake" // target agent should start/resume
EventAgentMessage = "AgentMessage" // message routed (TUI + tracing)
EventAgentIdle = "AgentIdle" // agent completed its turn, inbox empty
)
10.2 UnitResult payload
PostUnit hooks receive:
type UnitResult struct {
UnitID string
UnitType string // "milestone" | "slice" | "task"
Phase Phase
Verdict string // "success" | "failure" | "abandoned"
Duration time.Duration
InputTokens int
OutputTokens int
CacheHits int
CostUSD float64
Model string
WorkerHost string
Error error
Learnings []string
}
The payload is serialized to JSON and passed to hook subprocesses via stdin.
10.3 Hook execution rules
- PostUnit hooks run sequentially, not concurrently. The next dispatch MUST NOT begin until all PostUnit hooks have returned.
- A hook subprocess that exits non-zero for `PreDispatch` or `PostUnit` MUST trigger `SignalAbort`. The harness stops the session and marks it `SessionFailed`.
- Hook timeouts are per-hook-type. Defaults:

  | Hook | Default | Rationale |
  |---|---|---|
  | `before_run` | 120s | Cleanup, dependency install can take time |
  | `after_run` | 30s | Best-effort teardown |
  | `after_create` | 120s | First-time setup |
  | `before_remove` | 30s | Cleanup |
  | `pre_dispatch` | 15s | Should be a fast check |
  | `post_unit` | 60s | Subprocess work; longer for git push |
  | `doc_sync` (built-in) | 5m | Runs an agent dispatch over the diff |

  All overridable in config via `[harness.hooks.timeouts.<hook_name>] = "<duration>"`. A timeout kills the hook and logs. A PostUnit hook timeout MUST NOT block the next dispatch.
- The git service subscribes to PostUnit via a hook and handles commits, branch creation, and push. The harness MUST NOT call `git` directly.
- Singularity Memory feedback (retain learnings, mark anti-patterns) is emitted from a built-in PostUnit hook (not a subprocess) — it calls the Singularity Memory client directly.
- PostUnit hook results MUST be written to the trace as child spans of the unit span.
10.4 Tool response contract
Every tool call — successful or not — MUST return a response in this shape:
type ToolResponse struct {
Success bool `json:"success"`
Output string `json:"output"`
ContentItems []ContentItem `json:"contentItems"`
}
type ContentItem struct {
Type string `json:"type"` // always "inputText" for text results
Text string `json:"text"`
}
For successful calls: success = true, output = result summary. For unsupported or failed calls: success = false, output = human-readable error, contentItems lists which tools are available in the current context. The shape MUST be consistent — the agent relies on success to distinguish real failures from tool-not-found errors.
If the agent calls a tool that is not registered, the harness MUST return a structured failure response and continue the session. It MUST NOT stall, panic, or exit on an unknown tool name.
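The unknown-tool path can be sketched directly from the § 10.4 shape (illustrative TypeScript; the helper name is an assumption):

```typescript
// Field names follow the ToolResponse contract above; `as const` keeps the
// literal "inputText" type.
interface ContentItem { type: "inputText"; text: string; }
interface ToolResponse { success: boolean; output: string; contentItems: ContentItem[]; }

// Structured failure for an unregistered tool: the session continues, the
// agent sees success=false plus the tools that ARE available in this context.
function unknownToolResponse(name: string, registered: string[]): ToolResponse {
  return {
    success: false,
    output: `tool "${name}" is not registered in this context`,
    contentItems: registered.map((t) => ({ type: "inputText" as const, text: t })),
  };
}
```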
10.5.0 SF tool registration
pi-coding-agent (vendored from pi-mono under packages/pi-coding-agent/) provides the agent's tool registry. sf adds new tools (send_message, core_memory_append/replace, handoff, wait_for_reply, chapter_open, stop, plan_unit, etc.) by registering them at sf-startup via pi-coding-agent's API. There is NO parallel tool registry — sf tools live in src/resources/extensions/sf/tools/ and call into pi-coding-agent's registration during module init.
sf-specific tools MUST:
- Conform to the response shape of § 10.4 (`{success, output, contentItems}`).
- Honour pi-coding-agent's `PreToolUse` hook system — they receive the same hook pipeline as built-in tools.
- Document the auto_approve key they expect (e.g. `agent:send_message`) so projects can list them in `[harness.auto_approve.tools]`.
This means PreToolUse hooks can deny sf tool calls just like any other; the auto-approve list scopes them; permissions are uniform.
10.5.1 External visibility via PostUnit hooks (recipe)
Status: NEW — documentation-only; no
gh issue commentrecipe shipped.
If the user wants teammates to see sf's progress in GitHub Issues (or Slack, or any other system), this is done as a PostUnit hook script — not a built-in tracker integration.
Example: .sf/hooks/post-unit-gh.sh
#!/usr/bin/env bash
# Reads UnitResult JSON from stdin; posts a comment to a GitHub issue
# whose number is stored in the unit's `metadata.gh_issue` field (set at
# plan time via /sf plan --link-issue=42 "...").
set -euo pipefail
payload="$(cat)"
issue=$(jq -r '.unit.metadata.gh_issue // empty' <<< "$payload")
verdict=$(jq -r '.verdict' <<< "$payload")
phase=$(jq -r '.phase' <<< "$payload")
[ -z "$issue" ] && exit 0 # not linked, no-op
gh issue comment "$issue" --body "sf $phase: $verdict"
Wired in .sf/config.toml:
[harness.hooks]
post_unit = ["./.sf/hooks/post-unit-gh.sh"]
The unit's metadata.gh_issue field is set at plan time:
sf plan --link-issue=42 "implement OAuth"
This pattern keeps the orchestrator's critical path local (sf's DB) while still giving external visibility where the user wants it. The same pattern works for Slack, Discord, Jira, Linear, in-house dashboards — sf doesn't need to know about any of them.
10.5 Doc sync (sub-step of PhaseMerge or PhaseComplete)
Status: NEW — no doc-sync sub-step found.
Doc sync runs as the final sub-step of the last code-mutating phase before PhaseComplete:
- For workflows that include PhaseMerge: doc sync runs at the end of PhaseMerge.
- For workflows that omit PhaseMerge but include PhaseExecute (e.g. `spike`): doc sync runs at the end of the last code-mutating phase that ran. If the spike adopted a new dependency, doc sync still gets a chance to update `STACK.md`.
It is not a separate phase and not a post-merge hook; it is the final sub-step of whichever phase was last to mutate code.
The doc-sync sub-step:
- Dispatches a `fast`-tier turn against the merged diff with a short prompt asking whether project-level docs (`ARCHITECTURE.md`, `CONVENTIONS.md`, `STACK.md`) need updating.
- The agent emits a diff (possibly empty) to stdout.
- If the diff is non-empty, the harness surfaces it to the TUI for user approval. On approval, it is committed as `docs: sync after {unit_id}` on the same branch and the merge hook is re-triggered.
- On an empty diff, the sub-step is a no-op and PhaseMerge proceeds to PhaseComplete.
Configuration:
- `[harness] doc_sync = false` disables the sub-step entirely.
- `[harness] doc_sync_auto_approve = true` skips the user prompt and commits the diff directly. Off by default.
11. Workspace Management
Status: EXISTS —
auto-worktree.tsplusworktree-manager.ts. Symlink-aware path containment specifics need verification.
11.1 Naming
Workspace directories are derived from the unit identifier. The identifier MUST be sanitized: replace any character not in [a-zA-Z0-9._-] with _. This prevents path injection via issue identifiers containing slashes, .., or null bytes.
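The sanitization rule is a one-liner (illustrative TypeScript; the function name is an assumption):

```typescript
// § 11.1: any character outside [a-zA-Z0-9._-] becomes "_", neutralizing
// slashes, "..", and null bytes in unit identifiers.
function sanitizeUnitId(id: string): string {
  return id.replace(/[^a-zA-Z0-9._-]/g, "_");
}
```

Note that `.` itself is allowed, so `..` survives as literal dots; containment is enforced separately by the path checks in § 11.2, not by sanitization alone.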
11.2 Symlink-aware path containment
Workspace path validation MUST use segment-by-segment canonicalization, not filepath.EvalSymlinks or path.Clean alone. A naive call can be defeated by a symlink that resolves outside the workspace root.
Algorithm:
resolveCanonical(path):
segments = split(path)
resolved = root
for segment in segments:
candidate = join(resolved, segment)
stat = lstat(candidate)
if stat == symlink:
target = readlink(candidate)
# expand target relative to current resolved prefix
# restart segment walk from resolved target
elif stat == exists:
resolved = candidate
elif stat == ENOENT:
resolved = join(resolved, remaining segments) # path not yet created; OK
break
else:
return error
return resolved
After canonicalization, MUST assert canonical_workspace has canonical_root + "/" as a prefix. If it does not, reject with ErrWorkspaceSymlinkEscape.
For remote workers, the same check MUST be performed via a shell script that resolves each path segment before mkdir.
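The final containment assertion is worth showing explicitly, because a plain prefix test without the trailing `/` wrongly accepts sibling directories. A minimal sketch (illustrative TypeScript; assumes both paths are already canonicalized per the algorithm above):

```typescript
// canonical_workspace must have canonical_root + "/" as a prefix;
// otherwise reject with ErrWorkspaceSymlinkEscape.
function isContained(canonicalRoot: string, canonicalWorkspace: string): boolean {
  return canonicalWorkspace.startsWith(canonicalRoot + "/");
}
```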
11.3 Workspace lifecycle
- `after_create` — runs once when the workspace directory is first created.
- `before_run` — runs before every attempt. Fatal if it fails.
- `after_run` — runs after every attempt (success or failure). Best-effort.
- `before_remove` — runs before the workspace is deleted.
All hooks run in the workspace directory as the working directory.
11.4 Local workspace creation
ensure_workspace(workspace):
if directory exists:
return (workspace, created=false)
if file exists at path:
rm -rf path
mkdir -p path
return (workspace, created=true)
11.5 Remote workspace creation
For SSH workers, the orchestrator runs a shell script on the remote host that atomically creates and resolves the workspace, then echoes a tab-separated marker line:
printf '%s\t%s\t%s\n' '__SINGULARITY_WORKSPACE__' "$created" "$(pwd -P)"
The orchestrator parses this line from stdout to confirm the resolved canonical path.
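Parsing that marker line can be sketched as follows (illustrative TypeScript; it assumes `$created` is the literal string `"true"` or `"false"`, which the spec's pseudocode implies but does not pin down):

```typescript
// Scan remote stdout for the sentinel line; ignore MOTD and other shell noise.
function parseWorkspaceMarker(stdout: string): { created: boolean; path: string } | null {
  for (const line of stdout.split("\n")) {
    const parts = line.split("\t");
    if (parts[0] === "__SINGULARITY_WORKSPACE__" && parts.length === 3) {
      return { created: parts[1] === "true", path: parts[2] };
    }
  }
  return null; // no marker: treat workspace creation as failed
}
```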
12. Worktree Isolation
Status: EXISTS — rich subsystem:
worktree.ts,worktree-manager.ts,worktree-resolver.ts,worktree-health.ts,worktree-telemetry.ts,worktree-command.ts,worktree-command-bootstrap.ts. Slice merge ordering already inslice-parallel-conflict.ts.
12.1 Modes
[harness]
worktree_mode = "branch-per-slice" # or "milestone-per-worktree"
branch-per-slice (default):
- Each slice gets its own git branch (`sf/m{n}-s{n}-{slug}`) created from the current base.
- The harness emits WorktreeCreate before branch creation; the git service handles the actual `git worktree add`.
- After PostUnit hooks run, the git service merges the branch to the integration branch. The harness waits for the merge hook before marking the slice complete.
- Merge conflicts emit MergeConflict, which triggers SignalPause.
milestone-per-worktree:
- A single worktree created for the entire milestone.
- All slices share that worktree. The git service commits incrementally.
- The worktree is merged at milestone PostUnit time.
12.2 Rules
- The harness MUST emit WorktreeCreate and WorktreeDelete events. It MUST NOT call `git` directly.
- `worktree_mode` is session-immutable — changing it requires restart.
12.3 Merge ordering for parallel slices
When multiple slices in branch-per-slice mode complete concurrently, the harness MUST merge them in dependency-aware order, not completion order:
- A slice marked `code_depends_on: ["m1/s2"]` in unit metadata is held until that upstream slice's branch has merged.
- With no declared code dependency, slices merge in `created_at` order.
- The merge gate is serial: only one slice's merge runs at a time per project, even if multiple are eligible.
This is distinct from task_blockers (task-completion dependency). Code dependency means slice B's diff cannot merge cleanly before slice A's diff. Without explicit declaration, the harness assumes no code dependency and merges in creation order — accept that this can produce avoidable conflicts that the next attempt will resolve.
13. Verification Gates
Status: EXISTS —
verification-gate.ts,verification-evidence.ts,auto-verification.ts,gate_runstable. PhaseReview 3-pass chunking is NEW.
13.1 Configuration
[harness.gates]
post_slice = ["./gates/run-tests.sh", "./gates/lint.sh"]
post_milestone = ["./gates/integration-tests.sh"]
13.2 Execution rules
- Gates run as subprocesses. The UnitResult JSON is passed via stdin.
- Exit 0 = pass. Non-zero = fail (see § 13.2.1 for the full exit-code protocol, including block and skip).
- Fail increments the gate-level retry counter (separate from `units.attempt`). The gate retry counter resets on the next phase transition.
- Default max gate retries: 3. Configurable per gate via `[harness.gates.max_retries.<gate-name>]`.
- On retry, the harness re-dispatches the same unit with gate failure output appended to context. The agent MUST see what failed and why.
- After max retries, the harness transitions to PhaseReassess and emits GateBlocked on pubsub.
- Gate results MUST be stored in the `gate_results` table and written as span events on the unit span.
13.2.1 Gate script protocol
Every gate script MUST adhere to this contract. Implementations that violate any rule are rejected at startup validation.
Environment variables provided:
| Variable | Value |
|---|---|
| `SF_PROJECT_ROOT` | Absolute path to project root |
| `SF_HOME` | SF data directory (`~/.sf` or override) |
| `SF_UNIT_ID` | Active unit ID (§ 2 format) |
| `SF_RUN_ID` | Active run ULID |
| `SF_PHASE` | Phase name (e.g. `verify`) |
| `SF_ATTEMPT` | Attempt counter, 1-indexed |
| `SF_GATE_NAME` | This gate's name (script basename without extension) |
| `SF_GATE_RETRY` | Gate retry counter, 0-indexed |
| `SF_WORKSPACE` | Path of the unit's workspace |
| `SF_TRACE_FILE` | Path to the current day's trace JSONL |
Stdin: the UnitResult JSON struct (§ 10.2). UTF-8, single line, terminated with \n.
Exit code: 0 = pass; 1 = fail (retry); 2 = block (do not retry, transition straight to PhaseReassess); 3 = skip (gate is not applicable for this unit). Other codes are treated as 1.
Stdout / stderr: captured combined, truncated at 8 KB, stored in gate_results.output. Multi-line is fine. No structured output is required, but if the first line is valid JSON of the form {"summary": "...", "issues": [...]} the harness uses it for richer reporting.
Timeout: default 5 minutes per gate, configurable via [harness.gates.timeouts.<gate-name>]. Timeout = SIGTERM, then 10s grace, then SIGKILL; recorded as error_code = "gate_timeout".
Cwd: the workspace directory. Scripts MAY assume git status etc. work as expected.
type GateResult struct {
GateName string
UnitID string
Passed bool
Attempt int
MaxRetries int
Output string // combined stdout+stderr, truncated at 8KB
Duration time.Duration
}
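The exit-code protocol maps cleanly to a small lookup (illustrative TypeScript; the type and function names are assumptions):

```typescript
// 0 = pass; 1 = fail (retry); 2 = block (straight to PhaseReassess);
// 3 = skip (gate not applicable); all other codes collapse to fail.
type GateOutcome = "pass" | "fail" | "block" | "skip";

function gateOutcome(exitCode: number): GateOutcome {
  switch (exitCode) {
    case 0: return "pass";
    case 2: return "block"; // do not retry
    case 3: return "skip";
    default: return "fail"; // 1 and any unrecognized code
  }
}
```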
13.3 PhaseReview — chunked review
Large diffs MUST NOT be reviewed in a single pass. The harness MUST split the changed file list into chunks of ≤ 300 lines (ReviewChunkLines = 300) before dispatching the review agent. Files larger than ReviewChunkLines get their own chunk.
To prevent context-blind review of cross-file changes, the harness runs three passes:
- Establish-context pass (single dispatch, fast tier). The agent receives the full diff summary (file list + first/last 20 lines of each) and produces a one-paragraph "what this change does and what to watch for" summary.
- Per-chunk review pass (parallel, `standard` tier). Each chunk receives: the establish-context summary as a system-prompt prefix, then its own files. Reviewer findings are accumulated. Parallelism is bounded by `max_agents_by_phase.review`.
- Synthesis pass (single dispatch, `standard` tier). All chunk findings are merged, deduplicated, and prioritised. The synthesis agent decides whether the review should pass, request changes, or block (security/correctness issue).
The synthesis verdict is what the harness acts on — chunked passes alone never decide.
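The § 13.3 chunking rule can be sketched as greedy packing (illustrative TypeScript; the spec fixes only the ≤ 300-line cap and the oversized-file-gets-its-own-chunk rule, so the greedy strategy and the file shape here are assumptions):

```typescript
const ReviewChunkLines = 300;

interface ChangedFile { path: string; lines: number; }

// Pack files into chunks of at most ReviewChunkLines changed lines;
// any single file over the limit is isolated in its own chunk.
function chunkFiles(files: ChangedFile[]): ChangedFile[][] {
  const chunks: ChangedFile[][] = [];
  let current: ChangedFile[] = [];
  let used = 0;
  for (const f of files) {
    if (f.lines > ReviewChunkLines) {
      if (current.length) { chunks.push(current); current = []; used = 0; }
      chunks.push([f]); // oversized file: its own chunk
      continue;
    }
    if (used + f.lines > ReviewChunkLines && current.length) {
      chunks.push(current); current = []; used = 0;
    }
    current.push(f);
    used += f.lines;
  }
  if (current.length) chunks.push(current);
  return chunks;
}
```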
13.4 Unit archive
When a slice or milestone reaches PhaseComplete, the harness MUST move its artifact directory from .sf/active/ to .sf/archive/{YYYY-MM-DD}-{unit-id}/ atomically (rename, not copy+delete).
.sf/active/ holds only in-progress work. .sf/archive/ is queried by /sf history.
13.5 Reserved
(`specs.check`, godoc enforcement on the harness package, is an sf CI requirement — see § 1 — not a runtime gate against user projects.)
14. Configuration
Status: PARTIAL —
config-overlay.tsexists but does not appear to expose the spec's canonical keys (context_compact_at,max_agents_by_phase,turn_input_required,unit_timeout_by_phase). Schema needs alignment.
14.1 File locations and precedence
- `~/.sf/config.toml` — global defaults
- `.sf/config.toml` — project overrides (takes precedence)
Both files are TOML. Project overrides global on a per-key basis.
14.2 Canonical schema
[harness]
context_compact_at = 0.80
context_hard_limit = 0.95
unit_timeout = "10m" # default per-attempt cap; can override per phase
turn_timeout = "5m" # bounds one model turn
stall_timeout = "2m" # AttemptStalled when no agent event for this long
tool_abort_grace = "5s" # cooperative cancel window before SIGTERM
tool_abort_kill = "3s" # SIGTERM-to-SIGKILL window
max_turns_per_attempt = 50
max_attempts = 6 # exponential backoff before giving up
hot_cache_turns = 10 # in-memory recent-turn buffer
supervisor_interval = "10s"
max_retry_backoff = "5m"
doc_sync = true
turn_input_required = "soft" # or "hard"
worktree_mode = "branch-per-slice"
[harness.unit_timeout_by_phase]
research = "30m" # AST analysis / spec reading can take real time
plan = "20m"
execute = "15m"
tdd = "10m"
verify = "10m"
review = "15m"
merge = "5m"
reassess = "20m"
uat = "0" # 0 = no timeout (UAT can take days; advance via /sf uat-approve)
[harness.concurrency.max_agents_by_phase]
execute = 4
tdd = 4
verify = 10 # mostly reads — cheap
review = 4 # parallel chunked review (§ 13.3)
merge = 1 # serial per project (§ 12.3)
[harness.concurrency]
max_agents = 10 # global cap; per-phase caps under [harness.concurrency.max_agents_by_phase] above
[harness.auto_approve]
tools = ["bash:read", "fs:read", "git:status", "git:diff"]
[harness.hooks]
pre_dispatch = ["./hooks/pre-dispatch.sh"]
post_unit = ["./hooks/post-unit.sh"]
after_create = "./hooks/after-create.sh"
before_run = "./hooks/before-run.sh"
after_run = "./hooks/after-run.sh"
before_remove = "./hooks/before-remove.sh"
[harness.hooks.timeouts] # per-hook overrides; defaults in § 10.3
before_run = "120s"
post_unit = "60s"
doc_sync = "5m"
[providers]
# pi-ai provider settings live here. pi-ai is the multi-provider client; sf inherits all 20+ providers it supports.
# API keys MUST use vault:// (§ 24); plaintext is rejected at startup.
anthropic.api_key = "vault://secret/sf#anthropic_api_key"
openai.api_key = "vault://secret/sf#openai_api_key"
[harness.gates]
post_slice = ["./gates/run-tests.sh"]
post_milestone = ["./gates/integration-tests.sh"]
[harness.log]
path = "~/.sf/log/sf.log"
max_size = 10485760 # 10MB
max_files = 5
stderr = false
[server]
port = 7842 # 0 = ephemeral (tests)
[memory]
mode = "embedded" # "embedded" (default) | "remote"
url = "http://memory.tailnet.local:7843" # required when mode = "remote"
api_key = "vault://secret/sf#sm_api_key" # required when mode = "remote"
# Embedded mode runs the singularity_memory_server engine in-process.
# Remote mode shares the server across the fleet (Hermes, OpenClaw, sf, etc.).
[worker]
ssh_hosts = []
max_concurrent_agents_per_host = 3
ssh_auth_method = "agent" # "agent" | "key" | "key+agent"
ssh_identity_file = "~/.ssh/id_ed25519" # used for "key" or "key+agent"
ssh_known_hosts = "~/.ssh/known_hosts" # MUST verify; no auto-trust
ssh_disconnect_timeout = "30s"
host_quarantine = "5m"
[routing]
research = "reasoning"
plan = "reasoning"
execute = "standard"
tdd = "standard"
verify = "fast"
review = "standard"
merge = "fast"
complete = "fast"
reassess = "reasoning"
[tiers.fast]
models = ["claude-haiku-4-5", "gemini-flash-2.0"]
[tiers.standard]
models = ["claude-sonnet-4-6", "gemini-2.0-pro"]
[tiers.reasoning]
models = ["claude-opus-4-7", "o3"]
14.3 Dynamic reload
The harness MUST poll .sf/config.toml on every orchestrator tick using a {mtime, size, content_hash} stamp. content_hash is SHA-256 of the file bytes.
When the stamp changes:
- Re-parse and re-validate.
- On success: apply changes immediately to future dispatch, concurrency limits, and hook lists. In-flight runs are NOT interrupted.
- On failure (parse error, validation error): log error at WARNING level, keep last known good config. MUST NOT crash.
The following fields are session-immutable even with dynamic reload enabled:
- worktree_mode
- context_compact_at
- context_hard_limit
Changing session-immutable fields requires restart. If a dynamic reload detects a changed session-immutable field, the harness MUST:
- Log a warning naming the field, old value, new value.
- Continue using the in-process value for the current session.
- Display the change in /sf status as "config drift detected — restart to apply: <field>".
- NOT crash and NOT auto-restart.
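The `{mtime, size, content_hash}` stamp comparison above can be sketched as follows. This is an illustrative TypeScript sketch, not the sf implementation; the type and function names are assumptions.

```typescript
import { createHash } from "node:crypto";

// Reload stamp from § 14.3: the config is considered changed only when
// any of {mtime, size, content_hash} differs from the last known stamp.
interface ConfigStamp {
  mtimeMs: number;
  size: number;
  contentHash: string; // SHA-256 of the file bytes
}

function stampOf(mtimeMs: number, bytes: Buffer): ConfigStamp {
  return {
    mtimeMs,
    size: bytes.length,
    contentHash: createHash("sha256").update(bytes).digest("hex"),
  };
}

function stampChanged(prev: ConfigStamp, next: ConfigStamp): boolean {
  return (
    prev.mtimeMs !== next.mtimeMs ||
    prev.size !== next.size ||
    prev.contentHash !== next.contentHash
  );
}
```

Comparing all three fields means a touched-but-identical file (mtime changed, bytes identical) still triggers a re-parse only if the hash or size moved, which keeps reload cheap on no-op saves.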
14.4 Startup validation
The harness MUST validate config at startup and MUST fail fast with a descriptive error on invalid config. It MUST NOT silently ignore unknown keys or bad values. /sf doctor MUST run HarnessConfig.Validate() as one of its checks.
14.5 Plan.md format
Every active unit has a .sf/active/{unit-id}/plan.md written by PhasePlan and consumed by all subsequent phases. The format is:
---
unit_id: task/m1/s2/t3
created_at: 2026-04-29T14:22:00Z
written_by: claude-sonnet-4-6
plan_version: 1
---
# Goal
<one paragraph: what success looks like>
# Approach
<2-3 paragraphs: how the agent intends to do it>
# Deliverables
- [ ] <concrete file or behavioural change>
- [ ] <…>
# Verification
- <gate or check that proves done>
- <…>
# Notes
<context, gotchas, anti-patterns to avoid for this unit>
The frontmatter plan_version increments on each PhaseReassess→Re-plan. Subsequent phases parse the frontmatter to detect plan version changes (informational; not load-bearing).
The harness MUST validate that plan.md parses as Markdown with the required frontmatter fields before allowing a transition out of PhasePlan. Missing # Goal or # Deliverables sections fail the phase.
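A minimal sketch of that validation check, assuming the frontmatter keys and section names given above; the function name and error-list shape are illustrative, not the sf API:

```typescript
// Validates the § 14.5 plan.md contract: frontmatter must carry the four
// required keys, and the body must contain # Goal and # Deliverables.
const REQUIRED_KEYS = ["unit_id", "created_at", "written_by", "plan_version"];

function validatePlanMd(text: string): string[] {
  const errors: string[] = [];
  const m = text.match(/^---\n([\s\S]*?)\n---\n/);
  if (!m) return ["missing frontmatter block"];
  const keys = new Set(
    m[1].split("\n").map((line) => line.split(":")[0].trim()),
  );
  for (const k of REQUIRED_KEYS) {
    if (!keys.has(k)) errors.push(`frontmatter missing ${k}`);
  }
  for (const section of ["# Goal", "# Deliverables"]) {
    if (!text.includes(section)) errors.push(`missing ${section} section`);
  }
  return errors; // empty => transition out of PhasePlan allowed
}
```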
14.6 Project directory layout
Every project has a .sf/ directory with this canonical layout:
<project>/
├── .sf/
│ ├── config.toml # project config (§ 14.1)
│ ├── workflows/ # workflow templates (§ 4.5)
│ │ ├── feature.toml
│ │ └── spike.toml
│ ├── hooks/ # hook scripts referenced by config
│ ├── gates/ # gate scripts referenced by config
│ ├── sf.db # SQLite orchestration DB
│ ├── run.lock # process lock (§ 4.7)
│ ├── auto.lock # signals auto-mode active (§ 4.7)
│ ├── active/ # in-progress unit artifacts
│ │ └── {unit-id}/ # one directory per active unit
│ │ ├── plan.md # unit's plan/notes
│ │ ├── workspace -> /path # symlink to actual workspace
│ │ └── run-{run-id}.log # per-run log
│ ├── archive/ # completed work + age-rolled artifacts
│ │ ├── {YYYY-MM-DD}-{unit-id}/ # one per completed unit
│ │ ├── agents/ # rolled agent_inbox/messages
│ │ └── lost-learnings.jsonl # pending_retain ages out here (§ 16.1)
│ ├── log/
│ │ └── sf.log # rolling structured log (§ 19.2)
│ ├── runtime/
│ │ ├── paused-session.json # written when SessionPaused
│ │ ├── gate-state.json # last gate result per unit
│ │ └── server.port # actual HTTP API port (§ 14.2)
│ └── trace/
│ ├── trace-{YYYY-MM-DD}.jsonl # daily-rotated spans
│ └── _meta.json # trace schema version, file index
Layout is stable: /sf revert, /sf history, archive sweeps, and the HTTP API all assume these exact paths.
15. Model Routing
Status: EXISTS —
auto-model-selection.ts, benchmark-selector.ts, blocked-models.ts, llm_task_outcomes table.
15.1 Three tiers
The tier names are fixed: fast, standard, reasoning. Custom tier names are NOT supported — adding a tier would force changes in routing config, complexity-upgrade logic, and the rate-feedback fingerprint, with little benefit. Each tier holds multiple candidate models in [tiers.<name>]. The router picks within the tier; it does not change the tier assignment.
15.2 Phase → tier mapping
Static, config-driven (see § 14.2 [routing] table). The harness MUST apply the phase-to-tier mapping before each dispatch. The agent MUST NOT influence this mapping.
The harness MUST set Think: true on the model config for phases mapped to reasoning tier.
15.3 Complexity upgrade
A classifier at dispatch time — file count, scope breadth, cross-cutting changes → complexity score. If the score crosses a configurable threshold, the tier bumps one level (fast→standard, standard→reasoning). The fingerprint and upgrade decision MUST be stored in SQLite for future routing decisions.
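The one-level bump can be sketched as a pure function; the names and the score scale are assumptions for illustration:

```typescript
// § 15.3 tier bump: one level up when the complexity score crosses the
// threshold; reasoning is already the top tier and never bumps further.
type Tier = "fast" | "standard" | "reasoning";

const UPGRADE: Record<Tier, Tier> = {
  fast: "standard",
  standard: "reasoning",
  reasoning: "reasoning", // top tier; no further bump
};

function applyComplexityUpgrade(tier: Tier, score: number, threshold: number): Tier {
  return score >= threshold ? UPGRADE[tier] : tier;
}
```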
15.4 Within-tier selection
Within a tier, the router picks the model with the highest benchmark score:
score = quality * 0.6 + (1 - normalised_latency) * 0.2 + (1 - normalised_cost) * 0.2
Weights are configurable. If no benchmark data exists for the current fingerprint, use the tier's first model.
Models with a tripped circuit breaker (§ 9.3) MUST be skipped.
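Putting the scoring formula, the breaker skip, and the no-benchmark fallback together, selection might look like this sketch (types and field names are illustrative):

```typescript
// § 15.4 within-tier selection: highest benchmark score wins, tripped
// circuit breakers are skipped, and with no benchmark data for the
// fingerprint the tier's first model is used.
interface Candidate {
  model: string;
  quality: number;           // 0..1
  normalisedLatency: number; // 0..1
  normalisedCost: number;    // 0..1
  breakerOpen: boolean;      // § 9.3 circuit breaker state
  hasBenchmark: boolean;
}

function score(c: Candidate): number {
  return c.quality * 0.6 + (1 - c.normalisedLatency) * 0.2 + (1 - c.normalisedCost) * 0.2;
}

function pickModel(tier: Candidate[]): string | undefined {
  const eligible = tier.filter((c) => !c.breakerOpen);
  if (eligible.length === 0) return undefined;
  if (!eligible.some((c) => c.hasBenchmark)) return eligible[0].model;
  return eligible
    .filter((c) => c.hasBenchmark)
    .reduce((best, c) => (score(c) > score(best) ? c : best)).model;
}
```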
15.5 /sf rate feedback loop
Two signal sources:
- Auto-mode — the agent self-evaluates at unit close: over/ok/under relative to phase objective. No human in the loop.
- Interactive mode — human signals over/ok/under after reviewing unit output.
Both write to benchmark_results. Human ratings carry higher weight than LLM self-ratings (configurable multiplier, default 3×).
Score mappings: over=0.3 (over-resourced), ok=0.8, under=0.0 (blocks model for this fingerprint).
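As a sketch, a benchmark sample built from a rating might carry the mapped score, the source weight, and the blocking flag; the record shape here is an assumption:

```typescript
// § 15.5 score mappings: over=0.3, ok=0.8, under=0.0; human ratings
// carry a configurable weight multiplier (default 3x) over LLM ratings.
type Rating = "over" | "ok" | "under";

const RATING_SCORE: Record<Rating, number> = { over: 0.3, ok: 0.8, under: 0.0 };

function benchmarkSample(rating: Rating, source: "human" | "llm", humanMultiplier = 3) {
  return {
    score: RATING_SCORE[rating],
    weight: source === "human" ? humanMultiplier : 1,
    blocksModel: rating === "under", // under blocks the model for this fingerprint
  };
}
```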
16. Knowledge Layer
Status: PARTIAL — sf has its own local memory layer (
memory-store.ts, memory-extractor.ts, memory-relations.ts, tools/memory-tools.ts, bootstrap/memory-tools.ts, memories SQLite table). Spec's Singularity Memory integration is NEW. Decision needed: replace, layer, or drop.
16.1 Architecture
The knowledge layer is Singularity Memory (sm) — an HTTP + MCP server we own at singularity-ng/singularity-memory. The engine was derived from vectorize-io/hindsight (MIT) and assimilated into singularity_memory_server/ under our namespace; from sf's perspective there is no upstream service. The same sm server is shared across our agent fleet (Hermes, OpenClaw, Claude Code, Cursor, sf), so memories accumulate across tools.
sf uses github.com/singularity-ng/singularity-memory-client-go, auto-generated from the OpenAPI document published by the running sm server (/openapi.json). There is no local vector store, no sqlite-vec table, no FTS5 fallback — all retrieval and persistence go through sm.
Embedded vs remote deployment. sm supports both modes:
| Mode | When | Config |
|---|---|---|
| Embedded (default for single-user sf) | sm engine runs in-process; no extra service to operate | [memory] mode = "embedded" |
| Remote | sm runs as a tailnet service shared across multiple tools/users | [memory] mode = "remote", [memory] url = "http://memory.tailnet.local:7843" |
Embedded mode eliminates the network hop for the common case. Switching to remote shares context across the fleet at the cost of a network round-trip per recall.
SQLite in sf holds orchestration state only (sessions, units, blockers, gates, benchmarks, circuit breakers, agents). Memories, learnings, anti-patterns, and codebase context live in Singularity Memory.
When sm is unreachable, the harness MUST log a warning and dispatch with no recall context (plus the local local_anti_patterns mirror, § 3.1). The agent still runs; it just lacks historical memory for that session. The harness MUST NOT block dispatch on memory availability.
Retain failures queue locally. PostUnit retain calls that fail (transport error, 5xx) MUST be enqueued in pending_retain and retried with exponential backoff on every poll tick until success. This means a unit's learnings are never silently lost to an sm outage:
CREATE TABLE pending_retain (
id TEXT PRIMARY KEY, -- ULID
bank TEXT NOT NULL,
payload TEXT NOT NULL, -- serialised retain request
attempts INTEGER NOT NULL DEFAULT 0,
next_retry_at INTEGER NOT NULL,
last_error TEXT,
created_at INTEGER NOT NULL
);
pending_retain rows older than 7 days are flushed to .sf/archive/lost-learnings.jsonl and removed; at that point the operator is expected to investigate.
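The retry schedule can be sketched as follows. The 7-day flush window is spec'd above; the base delay and cap are assumptions made for illustration, since the spec only requires "exponential backoff on every poll tick":

```typescript
// pending_retain retry policy (§ 16.1): exponential backoff per attempt,
// rows older than 7 days flushed to .sf/archive/lost-learnings.jsonl.
const BASE_DELAY_MS = 30_000;                  // assumed first retry delay
const MAX_DELAY_MS = 60 * 60 * 1000;           // assumed cap: 1 hour
const FLUSH_AGE_MS = 7 * 24 * 60 * 60 * 1000;  // 7 days (spec'd)

function nextRetryAt(nowMs: number, attempts: number): number {
  const delay = Math.min(BASE_DELAY_MS * 2 ** attempts, MAX_DELAY_MS);
  return nowMs + delay;
}

function shouldFlushToArchive(nowMs: number, createdAtMs: number): boolean {
  return nowMs - createdAtMs > FLUSH_AGE_MS; // -> lost-learnings.jsonl
}
```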
16.1.1 Memory client interface
The harness uses github.com/singularity-ng/singularity-memory-client-go (auto-generated from the sm server's /openapi.json) through a thin wrapper that the rest of the codebase depends on. This wrapper is the seam between sf and Singularity Memory; tests substitute a fake.
type Memory interface {
// Recall fetches top-k entries from a bank for a query. opts.Filter
// may include {"collection": "anti_patterns"} or other tags.
Recall(ctx context.Context, bank string, query string, opts RecallOpts) ([]Entry, error)
// Retain stores a new entry in a bank. document_id is required for
// upsert-by-content-hash semantics (§ 16.3).
Retain(ctx context.Context, bank string, entry Entry) error
// Feedback signals helpfulness of an entry recalled in this dispatch.
// signal ∈ {-1, 0, +1}; +1 resets decay timer.
Feedback(ctx context.Context, entryID string, signal int) error
// Validate marks the entry as still-relevant (resets decay timer).
// Called by PostUnit when a recalled entry directly contributed to success.
Validate(ctx context.Context, entryID string) error
// Health probe. Used by /sf doctor and the retain queue.
Health(ctx context.Context) error
}
type RecallOpts struct {
TopK int
Filter map[string]string
RerankQuality string // "fast" | "accurate"
}
type Entry struct {
DocumentID string // content hash; upsert key
Content string
Tags []string
Metadata map[string]string // includes maturity, decay_factor, etc.
Score float64 // populated on Recall, ignored on Retain
}
The wrapper is responsible for:
- Translating sf's last_error and gate output into Entry.Content.
- Adding is_negative and collection tags appropriately.
- Routing transport errors through pending_retain (§ 16.1).
- Exposing the local local_anti_patterns mirror to Recall when sm is unreachable.
16.2 Memory tiers
Two tiers prevent token bloat during long-running sessions:
Hot cache — current dispatch's recent turns held in memory (never persisted to SQLite). Configurable size: [harness] hot_cache_turns = 10. Cleared on compaction.
Singularity Memory store — durable. PostUnit writes summaries, learnings, and anti-patterns. Pre-dispatch reads top-N most relevant entries. On compaction, the hot cache is summarised and written to Singularity Memory as a session_summary entry.
The harness MUST NOT mix the two tiers.
16.3 Two-bank pattern
Each session uses two Singularity Memory banks, queried separately and merged before each dispatch:
projectRecall := sm.Recall("project/"+projectHash, query)
globalRecall := sm.Recall("global/coding", query)
// merge, deduplicate, inject top-N into unit context
projectHash is derived deterministically (so the same project hits the same bank from any machine):
- If the project root is a git repository, projectHash = sha256(canonical_remote_url)[:16], where canonical_remote_url is the origin URL normalised (strip auth, lowercase host, drop trailing .git).
- If no git remote, projectHash = sha256(absolute_path_with_real_user_home)[:16].
- The resolved hash is cached in .sf/runtime/project-hash.json to ensure stability if the remote changes (a cleared cache forces re-derivation; a project move under a different remote is a deliberate re-bank).
This means a developer cloning the repo on a second machine hits the same Singularity Memory bank as their first machine. Different forks of the same project have different remotes and thus different banks — desired, because their context diverges.
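A sketch of the git-remote branch of the derivation, under the simplifying assumption that the remote is an http(s) URL (real remotes also come in scp-like `git@host:org/repo` form, which this sketch does not handle):

```typescript
import { createHash } from "node:crypto";

// § 16.3 projectHash: normalise the origin URL (strip auth, lowercase
// host, drop trailing .git), then take sha256(...)[:16].
function canonicalRemoteUrl(remote: string): string {
  const u = new URL(remote);
  u.username = "";                              // strip auth
  u.password = "";
  u.host = u.host.toLowerCase();                // lowercase host
  u.pathname = u.pathname.replace(/\.git$/, ""); // drop trailing .git
  return u.toString();
}

function projectHash(remote: string): string {
  return createHash("sha256")
    .update(canonicalRemoteUrl(remote))
    .digest("hex")
    .slice(0, 16);
}
```

With this normalisation, `https://user:pass@GitHub.com/org/repo.git` and `https://github.com/org/repo` land in the same bank, while a fork under a different org hashes to a different bank.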
Concurrent retain calls from parallel slice workers use document_id derived from content hash. Duplicate memories silently overwrite rather than accumulate.
16.4 Anti-pattern library
Anti-patterns are memories tagged collection: anti_patterns, is_negative: true. They:
- Are written explicitly when the agent makes a mistake (gate failure or user feedback).
- MUST NOT be subject to normal maturation decay — they persist at full weight until explicitly removed.
- Are retrieved at dispatch time and presented in a dedicated block: <anti_patterns>avoid these mistakes...</anti_patterns>.
- MUST also be mirrored to the local local_anti_patterns SQLite table (§ 3.1) on retain. When Singularity Memory is unreachable, the harness still injects local anti-patterns into prompt context. Anti-patterns are small, high-value, and never decay — making them the one knowledge category worth duplicating locally.
type AntiPattern struct {
ID string
Description string // what went wrong
Context string // when/where this applies
CorrectPath string // what to do instead
SourceUnit string
CreatedAt time.Time
}
16.5 Pattern maturation
| State | Condition | Retrieval weight |
|---|---|---|
| candidate | < 3 observations | 0.5× |
| established | ≥ 3 obs, harmful ratio < 30% | 1.0× |
| proven | decayed helpful score ≥ 5, harmful ratio < 15% | 1.5× |
| deprecated | harmful ratio > 30% | 0× (excluded) |
After 3 failed uses, content is prefixed AVOID: and flagged is_negative: true.
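The maturation table reads as a classifier in which deprecation takes precedence over the other states; a sketch with illustrative names:

```typescript
// § 16.5 maturation states. Ordering matters: a harmful ratio above 30%
// deprecates a pattern regardless of its other stats.
type Maturity = "candidate" | "established" | "proven" | "deprecated";

interface PatternStats {
  observations: number;
  harmfulRatio: number;       // 0..1
  decayedHelpfulScore: number;
}

function classify(s: PatternStats): Maturity {
  if (s.harmfulRatio > 0.30) return "deprecated";
  if (s.observations < 3) return "candidate";
  if (s.decayedHelpfulScore >= 5 && s.harmfulRatio < 0.15) return "proven";
  return "established";
}

const RETRIEVAL_WEIGHT: Record<Maturity, number> = {
  candidate: 0.5,
  established: 1.0,
  proven: 1.5,
  deprecated: 0, // excluded from retrieval
};
```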
16.6 Confidence decay
halfLife = 90 * (0.5 + confidence) // days; confidence ∈ [0.0, 1.0]
decayFactor = 0.5 ^ (ageInDays / halfLife)
finalScore = similarityScore * decayFactor
Memory access tiers: hot (accessed within 7 days), warm (within 30 days), cold/stale (older).
Entries with 10+ accesses gain a 7-day buffer against decay. Calling validate() when a memory directly aids task completion resets the decay timer.
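The decay formulas above transcribe directly; with confidence 0.5 the half-life is 90 days, so a 90-day-old entry scores at exactly half its similarity:

```typescript
// § 16.6 confidence decay, transcribed from the formulas above.
function halfLifeDays(confidence: number): number {
  return 90 * (0.5 + confidence); // confidence in [0.0, 1.0]
}

function decayFactor(ageInDays: number, confidence: number): number {
  return 0.5 ** (ageInDays / halfLifeDays(confidence));
}

function finalScore(similarityScore: number, ageInDays: number, confidence: number): number {
  return similarityScore * decayFactor(ageInDays, confidence);
}
```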
16.7 Retrieval pipeline
Retrieval is delegated to Singularity Memory via sm.Recall(bank, query, opts). Singularity Memory runs its own internal pipeline — fused semantic + lexical retrieval, optional reranking, and decay weighting — and returns ranked entries. The harness does not implement a retrieval pipeline of its own.
Recall options the harness uses:
| Option | Use |
|---|---|
| top_k | Number of entries to inject into prompt (default 5) |
| bank | project/{hash} or global/coding (§ 16.3) |
| filter | Tag filters (e.g. collection=anti_patterns) |
| rerank_quality | fast (routine) or accurate (pre-dispatch context injection) |
The harness applies its own maturity and anti-pattern weighting (§ 16.4, § 16.5) by tagging entries on retain and filtering / re-ordering on recall — Singularity Memory stores the metadata but does not interpret it.
16.8 sf init
Deep analysis is default, not opt-in:
- AST-level codebase scan (languages, structure, entry points, dependencies).
- Git history analysis (active areas, recent changes, contributors).
- Retain findings into the project/{hash} Singularity Memory bank.
- Establish .sf/config.toml with detected stack, workflow templates, model routing hints.
--quick flag skips Singularity Memory indexing for throwaway sessions.
17. Persistent Agents
Status: PARTIAL — sf has ephemeral subagents, including single, parallel, chain, and bounded debate batches (
subagent({ mode: "debate", rounds, tasks }); tests include subagent-agent-discovery, subagent-model-dispatch, agent-end-retry, subagent-debate-mode). The spec's persistent-identity + memory-blocks + inbox-wake model is NEW.
17.1 Agent vs unit
A unit is ephemeral work created by /sf plan (or .sf/plan.md) and driven through the phase state machine (§ 4). It is archived on completion.
A persistent agent is a named, long-lived identity: it has its own memory blocks, system prompt, and message history. It sleeps at zero cost when idle and wakes when its inbox receives a message or an explicit /sf agent run <name> is issued.
A persistent agent run is NOT a unit. Specifically:
| Aspect | Unit | Persistent agent run |
|---|---|---|
| Source of work | User goal via /sf plan (§ 3.3) | Inbox message or explicit /sf agent run |
| Phase state machine | YES | NO |
| Verification gates | YES | NO |
| Workflow templates | YES | NO |
| PostUnit hooks | YES | NO (replaced by PostAgentRun) |
| before_run / after_run workspace hooks | YES | YES (shared lifecycle) |
| Supervisor checks (StuckLoop, AbandonDetect, BudgetWarning) | YES | YES |
| Crash recovery | re-dispatch from last phase | re-deliver undelivered inbox |
| Budget instance | fresh per attempt | persistent across runs (until reset) |
What they share: the worker attempt lifecycle (§ 6) — workspace creation, before_run hook, agent session, turn loop, after_run hook — is identical. The supervisor goroutine monitors agent runs and unit attempts with the same checks. The trace records both as runs with distinct run_kind attributes.
17.2 Memory block injection
At dispatch time, the harness MUST render the agent's memory blocks into the system prompt:
<memory>
<block label="persona">{{value}}</block>
<block label="human">{{value}}</block>
<block label="task">{{value}}</block>
</memory>
17.3 Built-in memory tools
| Tool | Signature | Effect |
|---|---|---|
| core_memory_append | (label string, content string) | Appends content to block, respects char_limit |
| core_memory_replace | (label string, old string, new string) | Replaces substring in block |
Both tools MUST write to agent_memory_blocks in SQLite before the next turn is dispatched. A crash mid-session MUST preserve the updated block state.
17.4 Agent lifecycle
type AgentState int
const (
AgentIdle AgentState = iota // no pending messages, not running
AgentRunning // dispatched, consuming tokens
AgentWaiting // sent a message to another agent, awaiting reply
AgentStopped // explicitly stopped; will not wake automatically
)
The harness owns all state transitions. The agent loop MUST NOT write AgentState directly.
17.5 Agent run termination
A persistent agent run terminates when ANY of:
- Inbox drained. The agent's inbox has no delivered = 0 rows AND the agent's last turn produced no outgoing send_message requiring wait_for_reply.
- Explicit stop. The agent calls a built-in stop() tool, signalling it has no further work.
- Budget exhausted. Per-agent Budget.AtHardLimit() fires (§ 8). Compaction does NOT terminate the run; only hard-limit does.
- Turn cap. max_turns_per_run = 100 (configurable per-agent via the agents.max_turns_per_run column or [harness] agent_max_turns_per_run). Higher than the unit cap because agents are long-running.
- Supervisor signal. SignalAbort for any reason (StuckLoop, AbandonDetect; ReconciliationCancel does not apply to agents).
- Timeout. A configurable agent_run_timeout = "30m" from run start.
On termination the agent transitions to AgentIdle (or AgentStopped for case 2). On wake (next inbox message), a NEW run begins — the agent's hot cache is NOT preserved across runs; only the durable memory blocks (agent_memory_blocks) and message history (agent_messages) survive.
17.6 Agent fleet supervision
Each persistent agent has its own Budget instance (§ 8) that persists across runs and is reset only on explicit /sf agent reset <name>. Compaction fires per-agent — when one agent's budget hits the compact threshold, only its hot cache is summarised; other agents are unaffected.
Crash recovery for agents differs from unit recovery (§ 4.7): on restart, each agent's agent_inbox is rescanned for delivered = 0 rows. Any such rows trigger an immediate AgentWake — the agent resumes processing the queue. There is no phase to resume; the inbox IS the resumption state.
The trace records each agent run as a separate root span with run_kind = "agent" and agent_id = <id>. /sf session-report breaks down spend by agent.
18. Inter-Agent Messaging
Status: NEW — no
send_message tool, no agent_inbox table, no AgentWake events in sf.
18.1 send_message tool
// Tool the agent calls:
// send_message(to: string, message: string) -> void
//
// to: agent name or agent ID
// message: plain text; the receiving agent sees it as a "user" role message
When called, the harness MUST:
- Insert a row into agent_inbox for the target agent.
- Emit an AgentWake pubsub event for the target agent.
- Record the message in agent_messages for both sender and receiver.
18.2 Wake rules
- An AgentIdle agent that receives AgentWake MUST start a new dispatch cycle immediately.
- An AgentRunning agent queues the message for its next dispatch cycle.
- Undelivered inbox messages MUST be prepended to the context as user-role messages in arrival order at the start of each dispatch, then marked delivered = 1.
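The prepend-in-arrival-order rule can be sketched as a pure function over inbox rows; the row and message shapes here are illustrative (the real rows live in agent_inbox):

```typescript
// § 18.2 delivery: undelivered rows are injected as user-role messages in
// arrival order ahead of the existing context, then marked delivered = 1.
interface InboxRow {
  id: string;
  from: string;
  message: string;
  createdAt: number;
  delivered: 0 | 1;
}

interface ChatMessage { role: "user" | "assistant"; content: string }

function deliverInbox(inbox: InboxRow[], context: ChatMessage[]): ChatMessage[] {
  const pending = inbox
    .filter((r) => r.delivered === 0)
    .sort((a, b) => a.createdAt - b.createdAt); // arrival order
  const injected = pending.map<ChatMessage>((r) => ({
    role: "user",
    content: `[from ${r.from}] ${r.message}`,
  }));
  for (const r of pending) r.delivered = 1; // mark delivered
  return [...injected, ...context];
}
```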
18.3 wait_for_reply
An agent calling wait_for_reply(ticket_id) transitions to AgentWaiting. The harness suspends its dispatch loop until the target agent sends a reply or a configurable timeout elapses.
wait_for_reply has a mandatory timeout. The harness MUST NOT block indefinitely.
18.4 Agent handoff
handoff(to, context) transfers the active task to a specialist agent. to is either an agent name (exact match) or a capability tag string (e.g. "capability:go" or "capability:sql,perf"):
- Resolution. If to starts with capability:, the harness queries agents for an active agent (archived_at IS NULL, state != 'stopped') whose capabilities JSON array includes ALL listed tags. If multiple match, the one with the lowest last_active wins (round-robin). If none match, handoff returns ErrNoCapableAgent.
- Suspension. The calling agent's current run is suspended (not completed).
- Context delivery. The target agent receives the full task context (system prompt, memory blocks at handoff time, last N messages) pre-loaded as a snapshot in its inbox.
- Wait. The calling agent transitions to AgentWaiting until the specialist replies (subject to the wait_for_reply timeout).
- Fallback. If the target agent is not found or is AgentStopped, handoff returns an error and the calling agent continues.
// Tool the agent calls:
// handoff(to: string, context: string) -> HandoffTicket
// Agent calls wait_for_reply(ticket.id) to block until the specialist responds.
//
// to formats:
// "go-specialist" — exact agent name
// "capability:go" — first eligible agent with capability tag "go"
// "capability:sql,perf" — agent with both "sql" AND "perf" tags
Capability matching is the recommended form — it lets the agent fleet evolve without changing handoff call sites.
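The resolution rule can be sketched as follows; the row type is illustrative (the real query runs against the agents table), and `undefined` stands in for ErrNoCapableAgent:

```typescript
// § 18.4 resolution: "capability:a,b" must match ALL listed tags among
// active agents; ties go to the lowest last_active (round-robin).
interface AgentRow {
  name: string;
  capabilities: string[];
  state: string;             // "idle" | "running" | "waiting" | "stopped"
  archivedAt: number | null;
  lastActive: number;
}

function resolveHandoffTarget(to: string, agents: AgentRow[]): AgentRow | undefined {
  const active = agents.filter((a) => a.archivedAt === null && a.state !== "stopped");
  if (!to.startsWith("capability:")) {
    return active.find((a) => a.name === to); // exact-name form
  }
  const tags = to.slice("capability:".length).split(",");
  const eligible = active.filter((a) => tags.every((t) => a.capabilities.includes(t)));
  if (eligible.length === 0) return undefined; // ErrNoCapableAgent
  return eligible.reduce((best, a) => (a.lastActive < best.lastActive ? a : best));
}
```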
18.5 Append-only inbox log
agent_inbox MUST be append-only. Rows MUST NOT be deleted after insert. delivered is the only mutable column. This gives a complete audit trail of all inter-agent communication.
Inbox and message tables are subject to a periodic GC sweep: rows with delivered = 1 and created_at < now() - retain_window are moved to .sf/archive/agents/{agent_id}/inbox-{YYYY-MM}.jsonl and deleted from the live tables. Default retain_window = 30d, configurable via [harness] agent_inbox_retain = "30d". The archive is human-readable and queryable by /sf agent history.
18.6 Memory block concurrency
An agent's memory blocks are owned by that agent — they are NEVER shared with other agents (§ 18.7). Within a single agent, a turn's tool calls execute serially (one tool at a time), so two core_memory_* writes within a turn cannot race. Across turns, the harness commits the prior turn's writes before dispatching the next turn (§ 17.3).
handoff does NOT share blocks — the receiving agent gets its own blocks. The context argument of handoff is a snapshot, not a reference.
18.7 What not to build
- Shared memory — agents MUST NOT share memory blocks. If two agents need a common fact, one sends it as a message.
- Broadcast — there is no send_message_all. Routing MUST be explicit.
- Synchronous RPC — send_message is fire-and-forget. wait_for_reply() is explicit and has a timeout.
19. Observability
Status: EXISTS —
activity-log.ts, trace-collector.ts. HTTP API and intent chapters are NEW.
19.1 Structured log format
All harness log lines MUST use stable key=value pairs. Required context fields:
| Scope | Required fields |
|---|---|
| Any unit-related log | unit_id=, unit_type= |
| Agent session lifecycle | session_id=, turn_count= |
| Phase transitions | from=, to=, reason= |
| Gate execution | gate=, attempt=, passed= |
Include action outcome in the message: completed, failed, retrying, canceled. MUST NOT log large raw payloads — truncate hook output at 2 KB and append (truncated).
19.2 Log rotation
- Max file size: 10 MB.
- Max rotating files: 5.
- Single-line format — no multi-line log entries.
- When file logging is configured, the default stderr handler MUST be removed (logs to file only).
- Default path: ~/.sf/log/sf.log.
19.3 Spans and trace
type Span struct {
TraceID string
SpanID string
Operation string // "tool_call" | "phase_transition" | "model_request" | "hook"
StartedAt time.Time
Duration time.Duration
Attrs map[string]any
Error error
}
- Every tool call, phase transition, model request, and hook execution MUST emit a span.
- Spans MUST be written to <project>/.sf/trace/trace-{YYYY-MM-DD}.jsonl (rolls at local-midnight on first span emission after midnight).
- Span emission MUST be non-blocking — use a buffered channel with a background writer goroutine.
- MUST NOT drop spans. If the buffer is full, block briefly rather than discard.
- The first line of each daily file MUST be a _meta record: {"_meta":true,"trace_schema_version":1,"sf_version":"<semver>","created_at":"<rfc3339>"}. Readers branch on trace_schema_version. Future schema changes bump the version; no in-place migration of historical files.
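The daily-file contract can be sketched with file I/O elided, treating a trace file as a list of JSONL lines whose first line is the _meta record; function names are illustrative:

```typescript
// § 19.3 daily-file contract: first line is a _meta record, every
// subsequent line is one span, and readers branch on trace_schema_version.
function newTraceFileLines(sfVersion: string, createdAt: string): string[] {
  return [
    JSON.stringify({ _meta: true, trace_schema_version: 1, sf_version: sfVersion, created_at: createdAt }),
  ];
}

function appendSpan(lines: string[], span: object): void {
  lines.push(JSON.stringify(span)); // one span per line
}

function readTraceFile(lines: string[]): { schemaVersion: number; spans: object[] } {
  const meta = JSON.parse(lines[0]);
  if (!meta._meta) throw new Error("first line must be a _meta record");
  return {
    schemaVersion: meta.trace_schema_version,
    spans: lines.slice(1).map((l) => JSON.parse(l)),
  };
}
```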
19.3.1 Trace index for forensics
JSONL is the source of truth for spans, but /sf forensics queries demand fast access to specific runs/units/sessions. The harness MUST maintain a small SQL index alongside the JSONL:
CREATE TABLE trace_index (
run_id TEXT NOT NULL,
span_id TEXT NOT NULL,
parent_span_id TEXT,
trace_id TEXT NOT NULL,
operation TEXT NOT NULL, -- "tool_call" | "phase_transition" | "model_request" | "hook"
started_at INTEGER NOT NULL,
duration_ms INTEGER,
file_path TEXT NOT NULL, -- which JSONL file holds the full record
file_offset INTEGER NOT NULL, -- byte offset within the file
PRIMARY KEY (run_id, span_id)
);
CREATE INDEX trace_index_started_at ON trace_index(started_at);
CREATE INDEX trace_index_trace_id ON trace_index(trace_id);
The index is populated by the trace writer goroutine after a successful flush. /sf forensics <run-id> queries the index, then seeks into the JSONL files for full payloads.
JSONL files older than 30 days MAY be moved to <project>/.sf/archive/trace/ by /sf clean. The move MUST be a single transaction:
- Move the JSONL file to archive/trace/.
- UPDATE trace_index SET file_path = REPLACE(file_path, '.sf/trace/', '.sf/archive/trace/') WHERE file_path = ?.
Both steps run under a process-level lock so a concurrent forensics query never observes a half-renamed state. If /sf clean is interrupted mid-move, the next run detects that the file is in archive while the index still points to the original path, and repairs by re-running the UPDATE.
19.4 Intent chapters
Spans are grouped into named chapters by intent (not just by phase).
type Chapter struct {
ID string
UnitID string
Name string // inferred or agent-declared
Intent string // one-sentence summary written at close
OpenedAt time.Time
ClosedAt *time.Time
Outcome string // "success" | "failure" | "pivot"
SpanIDs []string
}
Chapters serve two purposes:
- Context recovery — on resume after a crash, the harness reconstructs "what the agent was doing and why" from the chapter log. The chapter summary is injected at the top of the restored context.
- Singularity Memory recall — completed chapters are stored as discrete entries. Recall queries match against chapter intent.
The agent MAY open a chapter explicitly via chapter_open(name).
19.5 HTTP observability API
The harness MUST expose a lightweight HTTP server on localhost when server.port is configured. The API is observability-only — orchestrator correctness MUST NOT depend on it.
Auth. The server binds to 127.0.0.1 only. Every request MUST include header Authorization: Bearer <token> where token is read from <project>/.sf/runtime/api.token (generated as 32 random bytes hex on first start, mode 0600). Multi-user machines need this — localhost alone is insufficient. The actual port and token are written to <project>/.sf/runtime/server.port and api.token for tools to discover.
Session filter. All endpoints accept ?session=<id> to scope the response to one session. With no parameter, responses include all active sessions in the project DB; the response body has a top-level sessions: [...] array with the snapshot per session.
GET /api/v1/state — runtime snapshot:
{
"generated_at": "2026-04-29T14:22:00Z",
"counts": { "running": 3, "retrying": 1, "queued": 5 },
"running": [
{
"unit_id": "execute-task/m1/s2/t3",
"phase": "execute",
"session_id": "sess-abc-turn-4",
"turn_count": 7,
"last_event": "tool_call",
"started_at": "2026-04-29T14:10:00Z",
"tokens": { "input": 18200, "output": 2100, "total": 20300 }
}
],
"retrying": [
{
"unit_id": "execute-task/m1/s2/t4",
"attempt": 2,
"due_at": "2026-04-29T14:24:00Z",
"error": "gate: tests failed"
}
],
"totals": {
"input_tokens": 84000,
"output_tokens": 12000,
"cost_usd": 1.24,
"seconds_running": 4820
}
}
GET /api/v1/units/<unit_id> — per-unit debug detail: recent events, workspace path, retry count, last error, log file path.
POST /api/v1/refresh — queue an immediate poll + reconciliation cycle (202 Accepted; best-effort coalescing of rapid requests).
19.6 Rate-limit tracking
The harness MUST track the latest rate-limit payload from any provider event and surface it in the TUI and HTTP API. Rate-limit data is observability-only — no retry logic is driven by it.
Why not actively throttle on rate limits? Three reasons: (a) rate limit headers vary in format and meaning across providers (Anthropic's anthropic-ratelimit-tokens-remaining vs OpenAI's x-ratelimit-remaining-tokens differ in semantics — input-only vs total), (b) the model router (§ 15) already moves between providers, so a single provider's pressure does not need to feed back into dispatch, (c) the circuit breaker (§ 9.3) handles repeated provider failures including 429. Rate-limit data is for the operator to see what's happening, not for the orchestrator to react to.
20. Failure Taxonomy
Status: PARTIAL —
src/resources/extensions/sf/errors.ts exists; full spec error code set needs cross-check against actual codes.
Every harness failure has a class. The class determines recovery behavior.
| Class | Examples | Recovery |
|---|---|---|
| config | Missing or invalid .sf/workflows/*.toml, invalid .sf/config.toml, missing API key | Block new dispatches. Keep service alive. Emit operator-visible error. |
| workspace | Directory creation failure, hook timeout, invalid path | Fail the current attempt. Orchestrator retries with backoff. |
| agent_session | Startup handshake failed, turn timeout, turn cancelled, subprocess exit, stalled session, turn_input_required (hard mode) | Fail the current attempt. Orchestrator retries with backoff. |
| observability | Snapshot timeout, dashboard render error, log sink failure | Log and ignore. MUST NOT crash the orchestrator over an observability failure. |
20.1 Typed error codes
const (
ErrMissingWorkflowFile = "missing_workflow_file" // .sf/workflows/<name>.toml not found
ErrWorkflowParseError = "workflow_parse_error"
ErrWorkspaceCreation = "workspace_creation_failed"
ErrWorkspaceSymlinkEscape = "workspace_symlink_escape"
ErrHookTimeout = "hook_timeout"
ErrHookFailed = "hook_failed"
ErrAgentStartup = "agent_session_startup"
ErrTurnTimeout = "turn_timeout"
ErrTurnFailed = "turn_failed"
ErrTurnInputRequired = "turn_input_required"
ErrPromptRender = "prompt_render_failed"
ErrBudgetExhausted = "budget_exhausted"
ErrStalled = "stalled"
ErrCanceledByOperator = "canceled_by_operator" // user ran /sf abandon
ErrModelUnavailable = "model_unavailable"
ErrCircuitOpen = "circuit_open"
ErrNoCapableAgent = "no_capable_agent"
ErrSshDisconnected = "ssh_disconnected"
ErrCanceledBySupervisor = "canceled_by_supervisor"
)
Implementations MUST match on typed error codes. Matching on error message strings is PROHIBITED.
20.2 Scheduler state
Scheduler state is intentionally in-memory. Restart recovery MUST NOT attempt to restore retry timers, live sessions, or in-flight agent state. After restart: startup terminal cleanup → fresh poll → re-dispatch eligible work. This is a design choice, not a limitation. Durable retry state is a future extension.
21. Trust Boundary
Status: PARTIAL — `auto_approve` config key not found verbatim. PreToolUse hooks exist via pi-coding-agent. Auto-approve allowlist needs to be made explicit.
Every deployment MUST document its trust posture explicitly. There is no universal safe default.
21.1 Default posture (single-user developer machine)
- Auto-approve tool execution and file changes within the workspace.
- `turn_input_required = "soft"`.
- Workspace isolation enforced (symlink-aware path containment, sanitized names).
- Secrets from Vault only — MUST NOT store secrets in config files in plaintext.
21.2 Hardening measures for less-trusted environments
- Filter which issues/tasks are eligible for dispatch — untrusted or out-of-scope tasks MUST NOT automatically reach the agent.
- Restrict the `plan_unit` client-side tool to read-only or scope-limited mutations only.
- Run the agent subprocess under a dedicated OS user with no write access outside the workspace root.
- Add container or VM isolation around each workspace (Docker, nsjail, etc.).
- Restrict network access from the workspace.
- Narrow available tools to the minimum needed for the workflow.
21.3 Auto-approval contract
In auto-mode the harness calls pi-coding-agent's existing permission API ONLY for operations listed in `[harness.auto_approve]`. Sensitive operations (`fs:write-outside-project`, `shell:exec`) MUST always prompt regardless of the auto-mode setting.

Precedence between PreToolUse hooks and auto-approve. pi-coding-agent's PreToolUse hook system already runs before any tool call. If a PreToolUse hook returns `deny` or `halt`, the tool call is rejected even if `auto_approve` lists the tool. The order is:

- PreToolUse hooks run first; their decision is final for `deny`/`halt`.
- If hooks return `allow` or no decision, the auto-approve list is consulted.
- If neither approves, the user is prompted (interactive mode) or the call fails (auto-mode for non-allowlisted tools).
This means: PreToolUse hooks MAY revoke an auto-approval; the auto-approve list MUST NOT override a hook denial. This precedence is critical for security policies that need to override per-session approvals.
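The precedence can be sketched as a small decision function. This is illustrative only — the `HookDecision` values and return labels are assumptions; the ordering (hooks final for deny/halt, hook `allow` falling through to the allowlist, prompt-or-fail as the fallback) is what this section specifies:

```typescript
// Non-normative sketch of the PreToolUse vs auto-approve precedence.
type HookDecision = "allow" | "deny" | "halt" | undefined;

function resolvePermission(
  hookDecision: HookDecision,
  autoApprove: ReadonlySet<string>,
  tool: string,
): "run" | "reject" | "prompt" {
  // 1. Hooks run first; deny/halt is final, even for allowlisted tools.
  if (hookDecision === "deny" || hookDecision === "halt") return "reject";
  // 2. Hook allow (or no decision) falls through to the auto-approve list.
  if (autoApprove.has(tool)) return "run";
  // 3. Neither approved: prompt (interactive) or fail (auto-mode).
  return "prompt";
}
```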
SF-specific permission gates:
- `git:write` — any git operation that mutates state. Requires explicit grant in auto-mode.
- `worktree:create` and `worktree:delete` — worktree lifecycle.
- `fs:write-outside-project` — ALWAYS prompt, NEVER auto-approve.
- `shell:exec` — allowlist specific commands; no blanket approval.
22. Distributed Execution
Status: NEW — no SSH worker code found in sf.
22.1 Topology
The orchestrator ALWAYS runs centrally. Workers MAY execute on remote hosts over SSH.
```toml
[worker]
ssh_hosts = ["mikki-bunker", "forge-gpu-1"]
max_concurrent_agents_per_host = 3
```
22.2 Rules
- `workspace.root` is resolved on the remote host, not the orchestrator.
- The agent subprocess is launched over SSH stdio. The orchestrator owns the session lifecycle.
- Continuation turns within one worker lifetime MUST stay on the same host and workspace.
- If a host is at capacity, dispatch MUST wait rather than silently fall back to local or another host.
- Once a run has produced side effects, moving to another host on retry is treated as a new attempt (not invisible failover).
- The run record MUST include `worker_host` so operators can see where each run executed.
- SSH workspace creation MUST use the same symlink-aware validation as local workspaces, implemented via shell script.
22.3 Disconnect and zombie handling
When the SSH connection drops mid-turn:
- The orchestrator marks the attempt `failed` with `error_code = "ssh_disconnected"` after `[worker] ssh_disconnect_timeout = "30s"` of no stdio activity.
- Before scheduling a retry, the orchestrator MUST emit a remote-cleanup script over a fresh SSH session: `pgrep -f "<workspace_marker>" | xargs -r kill -TERM`, wait 10s, then `kill -KILL`. The marker is a unique string injected into the agent process's command line (e.g. `--sf-run-id=<run_id>`).
- If the cleanup script fails (host unreachable), the host is marked `unhealthy` for `[worker] host_quarantine = "5m"`. New dispatches skip it; the host re-eligibility check runs each tick.
- The retry MUST land on a different host if `host_quarantine` is in effect for the original host; otherwise same host with a fresh workspace re-creation (the previous workspace is moved to `~/.sf/orphaned-workspaces/{timestamp}-{run-id}/` for forensics, not deleted).
Zombies are the dominant failure mode for distributed execution; ignoring them produces double-write corruption.
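As a non-normative sketch, the cleanup sequence above could be assembled like this (the exact shell text is illustrative; only the TERM → wait → KILL order and the `--sf-run-id` marker come from this section):

```typescript
// Build the remote-cleanup command keyed on the per-run marker.
function remoteCleanupScript(runId: string): string {
  const marker = `--sf-run-id=${runId}`;
  return [
    `pgrep -f "${marker}" | xargs -r kill -TERM`, // polite shutdown first
    `sleep 10`,                                   // grace period
    `pgrep -f "${marker}" | xargs -r kill -KILL`, // then force-kill survivors
  ].join("; ");
}
```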
23. Plugin Extension Points
Status: NEW — no `Shipper`/`VCS`/`Notifier` interfaces. sf has its own command/extension model that may need to be reconciled with this spec.
Plugin interfaces are TypeScript classes implementing the listed contracts. sf loads them via dynamic import at boot from .sf/plugins/. Each plugin is a Node module exporting a default class with a marker method (e.g. static readonly kind = "shipper").
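The marker check the loader applies to each module's default export can be sketched as below; `PluginClass` and `asPlugin` are illustrative names, but the `static kind` marker convention is the one described above:

```typescript
// Non-normative sketch of plugin classification at load time.
type PluginClass = (new () => unknown) & { kind: string };

function asPlugin(defaultExport: unknown): PluginClass | null {
  const cls = defaultExport as PluginClass;
  // A plugin is a constructor carrying the static string marker `kind`.
  if (typeof cls === "function" && typeof cls.kind === "string") {
    return cls;
  }
  return null;
}
```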
23.1 Interfaces
SupervisorCheck — custom supervisor checks without forking:
```ts
interface SupervisorCheck {
  name(): string;
  check(state: SupervisorState): SupervisorSignal;
}
```
Shipper — PR/MR creation. GitHub default; GitLab, Gitea, Forgejo alternatives:
```ts
interface Shipper {
  ship(opts: ShipOptions): Promise<ShipResult>;
}
```
VCS — version control backend. git default; jj (Jujutsu) first alternative:
```ts
interface VCS {
  commit(msg: string, files: string[]): Promise<void>;
  branch(name: string): Promise<void>;
  push(remote: string, branch: string): Promise<void>;
}
```
Store — storage backend. SQLite for personal use; PostgreSQL for team sessions:
```ts
interface Store {
  saveSession(s: Session): Promise<void>;
  loadSession(id: string): Promise<Session>;
  saveMemory(m: Memory): Promise<void>;
  searchMemory(q: MemoryQuery): Promise<Memory[]>;
}
```
Notifier — notification provider. Slack, Discord, webhook:
```ts
interface Notifier {
  notify(event: Event): Promise<void>;
}
```
23.2 What stays out of plugins
- Workflow templates — enforced TOML/YAML data.
- Skills — `SKILL.md` prompt guidance.
- Model routing — config + SQLite + a thin TypeScript scorer.
- Phase transitions — harness-owned, not extensible.
24. Secret Management
Status: NEW — no `vault://` resolver found in sf.
24.1 vault:// URI scheme
Secrets MUST NOT be stored in config files in plaintext. The canonical secret reference format is:
vault://secret/sf#anthropic_api_key
In config:
```json
{
  "providers": {
    "anthropic": { "api_key": "vault://secret/sf#anthropic_api_key" }
  }
}
```
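A non-normative sketch of parsing the `vault://<path>#<field>` reference form (the function name and error text are assumptions):

```typescript
// Split a vault://<path>#<field> reference into its two parts.
function parseVaultUri(uri: string): { path: string; field: string } {
  const match = /^vault:\/\/([^#]+)#([^#]+)$/.exec(uri);
  if (!match) throw new Error(`invalid vault URI: ${uri}`);
  return { path: match[1], field: match[2] };
}
```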
24.2 VaultResolver
```ts
class VaultResolver {
  constructor(private readonly client: VaultClient) {}

  async resolve(uri: string): Promise<string> {
    // parse vault://<path>#<field>
    // client.kvV2(mount).read(path) → secret.data[field]
  }
}
```
Auth chain (first that succeeds):
- `VAULT_TOKEN` env var (CI / ephemeral)
- `~/.vault-token` file (local dev)
- AppRole via `VAULT_ROLE_ID` + `VAULT_SECRET_ID` (production)
Secrets MUST be fetched once at startup and held in memory for the session lifetime. MUST NOT be written to disk or logged.
24.3 Stopgap
Until the native resolver is built, sf supports the same $(command) substitution that pi-mono inherits — embed a shell command:
```json
{ "api_key": "$(vault kv get -field=anthropic_api_key secret/sf)" }
```
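The stopgap substitution can be sketched as below. The "whole value is a single `$(...)` wrapper" matching rule is an assumption about pi-mono's convention, not normative:

```typescript
import { execSync } from "node:child_process";

// If the whole value is a $(command) wrapper, run the command and use
// its trimmed stdout; otherwise pass the value through unchanged.
function resolveConfigValue(value: string): string {
  const match = /^\$\((.+)\)$/.exec(value);
  if (!match) return value;
  return execSync(match[1], { encoding: "utf8" }).trim();
}
```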
25. CLI Commands
Status: PARTIAL — 22 `commands-*.ts` files cover most spec commands but some are named differently (e.g. `commands-do.ts` vs the spec's `/sf next`). Full mapping table needed.
/sf plan "<goal>" [--workflow=feature] [--link-issue=<n>]
Add a milestone to the project's plan. sf decomposes into slices and tasks at runtime (Plan phase) but the milestone row is created immediately so it shows up in /sf status. --link-issue=<n> writes metadata.gh_issue for use by visibility hooks (§ 10.5.1). --workflow= overrides the default workflow template.
/sf plan reload
Re-read .sf/plan.md and reconcile against current units. Adds new milestones, surfaces removed ones as archived, leaves in-flight units alone.
/sf abandon <unit-id> "reason"
Operator override to mark a unit terminal mid-flight. Sets phase_status = 'canceled', records the reason in runs.error_code = "canceled_by_operator". Mid-turn workers detect the change at the next inter-turn check (§ 6) and exit cleanly.
/sf auto
Start the autonomous loop. The harness polls units for eligible work and dispatches workers until no more eligible units exist or until stopped by /sf pause.
/sf next
Manual step mode. Dispatch one unit, wait for completion, surface result. Repeat on each invocation.
/sf dispatch <unit-id>
Force-dispatch a specific unit regardless of priority or blocker state. Surfaces a warning if blockers exist.
/sf pause
Cleanly pause auto-mode. Writes SessionPaused to SQLite. All in-flight units complete their current turn before stopping.
/sf status
Structured project health snapshot:
```text
Project:    singularity-foundry
Phase:      Execute [m2/s3/t1 — add trace export]
Next:       TDD [m2/s3/t1]
Blocker:    none
Milestones: 2 / 5  (40%)
Slices:     7 / 18 (39%)
Tasks:      14 / 42 (33%)
Session:    4h 12m | $0.83 | claude-sonnet-4-6
```
Blockers surface from the session_blockers table. /sf status MUST NOT poll pubsub — it reads SQLite directly.
/sf revert <unit-id>
Four-phase git-aware revert protocol:
- Target selection — accept an explicit unit ID, or present the top 3 in-progress + 3 most recent completed units as a numbered menu.
- Git reconciliation — find all commits belonging to the target unit. Handle ghost commits (SHA missing after rebase/squash) by searching by commit message prefix.
- Confirmation — display the exact SHA list with descriptions and dates. Warn on merge commits.
- Execution — `git revert --no-edit <sha>` in reverse order (newest first). On conflict: `SignalPause`.
After all reverts: restore .sf/active/{unit-id}/ artifacts from archive; mark unit as [ ] in the plan.
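The execution ordering can be made concrete with a small sketch (function name and input convention are illustrative):

```typescript
// Emit revert commands newest-first, given SHAs in commit order.
function revertCommands(shasOldestFirst: readonly string[]): string[] {
  return [...shasOldestFirst]
    .reverse()
    .map((sha) => `git revert --no-edit ${sha}`);
}
```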
/sf rate over|ok|under [unit-id]
Signal model quality. Without unit-id, targets the most recently completed run in the current session — specifically the latest row in runs where outcome IN ('success', 'failure') and ended_at IS NOT NULL, scoped to session_id. With unit-id, targets the latest run for that unit.
Writes to benchmark_results with the human-rating weight multiplier (default 3×). Cannot be issued against an in-flight run.
/sf benchmark
Run on-demand model benchmarks for all tiers against real task samples. Updates benchmark_results.
/sf doctor
Run health checks:
- `HarnessConfig.Validate()`
- Vault connectivity
- Singularity Memory connectivity
- SQLite schema version
- Lock file state
- Workflow template syntax
- HTTP API token presence + permissions
Exit code: 0 if all checks pass, 1 if any FAIL or WARN. Useful in CI: sf doctor || exit 1. The TUI rendering shows pass/warn/fail per check; the JSON form (/sf doctor --json) returns a structured report for automation.
/sf forensics
Inspect the trace for a specific unit or session. Shows all spans, tool calls, phase transitions, and gate results in chronological order.
/sf reset-circuits
Clear all tripped circuit breakers. Next dispatch uses benchmark scores to select within each tier normally.
/sf reassess-resolve <unit-id> "operator response"
Resume a unit that entered PhaseReassess with the Escalate outcome (§ 4.6). The operator's response is appended as the next attempt's last_error so the agent can incorporate it. The unit re-enters PhasePlan.
/sf force-clear <blocker-id>
Operator override: mark a session_blockers row resolved with resolved_by = "/sf force-clear". Used to dismiss stuck GateBlocked events that can't auto-resolve (e.g. flaky external test infrastructure).
/sf merge-resolve <unit-id>
Resume a unit halted on MergeConflict. Assumes the operator has resolved the conflict in the worktree. Triggers re-emission of MergeReady.
/sf uat-approve <unit-id> and /sf uat-reject <unit-id> "reason"
Advance a unit out of PhaseUAT (§ 4.6). Approve transitions to PhaseMerge; reject transitions to PhaseReassess with the reason as last_error.
/sf agent <subcommand>
Persistent agent management:
- `/sf agent list` — show all agents with state, last_active, capabilities.
- `/sf agent run <name> "message"` — wake an agent with an ad-hoc message (bypasses inbox routing).
- `/sf agent reset <name>` — clear hot cache and reset Budget; memory blocks and message history preserved.
- `/sf agent delete <name>` — soft-delete (sets `archived_at`); runs and messages preserved via snap_ columns.
- `/sf agent inspect <name>` — show memory blocks, recent messages, current state.
- `/sf agent history <name>` — query archived inbox in `.sf/archive/agents/{id}/`.
/sf history [filters]
Query archived units in .sf/archive/. Filter syntax:
```text
/sf history --since 2026-04-01 --phase merge --verdict success
/sf history --workflow spike
/sf history --model claude-sonnet-4-6 --limit 50
/sf history --json    # machine-readable output for automation
```
Filters are AND-combined. Without filters, returns the most recent 20 archived units. The query reads from runs table joined with archive metadata; full unit artifacts are accessible at .sf/archive/{date}-{unit-id}/.
/sf clean [--dry-run]
Garbage-collect: rotate trace JSONL older than 30 days to .sf/archive/trace/, evict pending_retain rows older than 7 days to lost-learnings.jsonl, vacuum SQLite. --dry-run shows what would be removed.
26. Conformance Checklist
Use this checklist as the definition-of-done for each build phase. An implementation is core-conformant when all core items pass. Extension-conformant when all extension items also pass.
Each item is tagged:
- [REQUIRED] — MUST be present for conformance at its tier. Absence = non-conformant.
- [STRONG] — SHOULD be present; departure requires a written rationale.
- [OPTIONAL] — MAY be present; absence is acceptable.
Default tag is [REQUIRED] unless explicitly noted.
26.1 Core (must ship)
- C-01 [EXISTS] Workflow template TOML loader with `phases`, `require_tdd`, `require_review`, `max_retries`, `max_reassess` fields; unknown fields rejected.
- C-02 [PARTIAL] Phase state machine with all 10 phases; invalid transitions rejected with typed error at harness boundary.
- C-03 [EXISTS] `Harness.Transition(ctx, from, to, reason)` persists to SQLite before the new phase begins; emits pubsub `PhaseChange` after write.
- C-04 [NEW] AttemptState enum (11 states); `AttemptCanceled` distinct from `AttemptFailed`.
- C-05 [PARTIAL] TurnKind enum; continuation turns receive a guidance-only prompt, not the full task prompt.
- C-06 [PARTIAL] Strict prompt rendering: unknown `{{variable}}` in template → startup panic.
- C-07 [EXISTS] `attempt` variable: `null` on first dispatch; integer ≥ 1 on retry; `last_error` auto-injected on retry.
- C-08 [EXISTS] `turn_input_required` configurable `soft` (inject non-interactive message) or `hard` (fail immediately); MUST NOT stall indefinitely.
- C-09 [EXISTS] Context budget: `ShouldCompact()` triggers compaction before next turn; `AtHardLimit()` halts unit; budget state persisted to SQLite after every turn.
- C-10 [EXISTS] Budget token accounting prefers absolute totals; prevents double-counting.
- C-11 [PARTIAL] Compaction: write session summary to Singularity Memory, clear hot cache, start next turn with fresh recall.
- C-12 [EXISTS] Supervisor goroutine: all 9 built-in checks; communicates only via pubsub; MUST NOT call `os.Exit`.
- C-13 [PARTIAL] Circuit breaker: 3 consecutive non-transient failures trips a model; state persisted to SQLite; resets after 24h or `/sf reset-circuits`.
- C-14 [PARTIAL] `ModelUnavailable` → `SignalAbort` immediately (not after timeout).
- C-15 [EXISTS] Hook events: `PreDispatch`, `PostUnit`, `PhaseChange`, `AutoLoop`, `WorktreeCreate`, `WorktreeDelete`, `MergeReady`, `MergeConflict`.
- C-16 [EXISTS] `UnitResult` struct passed to PostUnit hooks as JSON via stdin.
- C-17 [EXISTS] PostUnit hooks run sequentially; non-zero exit → `SignalAbort`; timeout → kill, log, continue.
- C-18 [PARTIAL] Tool response contract: `{success, output, contentItems}` shape for all tool responses.
- C-19 [EXISTS] Unknown tool call → structured failure response; session continues.
- C-20 [NEW] Doc sync hook runs after every `PhaseMerge`; MAY be disabled with `doc_sync = false`.
- C-21 [EXISTS] Workspace name sanitization: `[^a-zA-Z0-9._-]` → `_`.
- C-22 [PARTIAL] Symlink-aware workspace path containment via segment-by-segment `lstat` canonicalization; naive `EvalSymlinks` is insufficient.
- C-23 [EXISTS] Workspace lifecycle hooks: `after_create`, `before_run`, `after_run`, `before_remove`; `before_run` fatal, `after_run` best-effort.
- C-24 [EXISTS] Startup cleanup: stale active artifacts moved to archive; running units marked interrupted.
- C-25 [PARTIAL] Dynamic config reload: `{mtime, size, SHA-256}` stamp polled every tick; invalid reload keeps last known good; session-immutable fields unchanged without restart.
- C-26 [PARTIAL] Per-phase concurrency caps (`max_agents_by_phase`).
- C-27 [EXISTS] Blocker-aware dispatch: non-terminal upstream → skip, re-evaluate next tick; no backoff increment.
- C-28 [EXISTS] Priority sort: priority asc → blocker-free first → phase order → created_at asc → id lexicographic.
- C-29 [EXISTS] Continuation retry (1s) after normal worker exit.
- C-30 [EXISTS] Exponential backoff after abnormal exit; cap configurable (default 5m).
- C-31 [EXISTS] Structured log format: `key=value` pairs; required context fields per scope; truncate at 2KB.
- C-32 [EXISTS] Log rotation: 10MB max, 5 files, single-line format, stderr handler removed when file logging active.
- C-33 [EXISTS] Span-based trace to `~/.sf/trace.jsonl`; non-blocking buffered writer; MUST NOT drop spans.
- C-34 [NEW] Intent chapters: open/close with intent summary; used for crash recovery context and Singularity Memory recall.
- C-35 [PARTIAL] Typed error codes; matching on error strings PROHIBITED.
- C-36 [EXISTS] Scheduler state intentionally in-memory; restart re-dispatches from fresh poll.
- C-37 [PARTIAL] Project CI runs `specs.check`: AST-based godoc enforcement on all exported identifiers in sf's own harness packages. (Not a user-project runtime gate.)
- C-38 [NEW] Vault secret resolution: `vault://path#field` URI scheme; auth chain: `VAULT_TOKEN` → `~/.vault-token` → AppRole; secrets MUST NOT be written to disk or logged.
- C-39 [NEW] PhaseReview chunked at ≤ 300 lines per chunk.
- C-40 [EXISTS] Unit archive: `.sf/active/` → `.sf/archive/{date}-{unit-id}/` on `PhaseComplete` via atomic rename.
- C-41 [EXISTS] No external tracker integration. The orchestrator polls only `units` in local SQLite. External visibility (GH Issues, Slack, etc.) is achieved via PostUnit hook scripts, not built-in adapters.
- C-42 [EXISTS] Unit creation sources: `/sf plan "<goal>"` CLI, `.sf/plan.md` reload, `/sf dispatch <prompt>`. No background poll of any external API.
- C-43 [PARTIAL] Crash recovery: `running` units → `interrupted` on startup; re-dispatch fresh from last persisted phase boundary with `last_error = "resumed_after_crash"`; tool calls NOT replayed; agent sessions NOT resumed.
- C-44 [EXISTS] Process lock at `~/.sf/run.lock`; stale-lock cleanup via `/proc` PID check.
- C-45 [NEW] Doc-sync runs as a sub-step of `PhaseMerge` (not a separate phase, not a post-merge dispatch); empty diff is a no-op; user approval required unless `doc_sync_auto_approve = true`.
- C-46 [PARTIAL] SQLite is orchestration-only — no `memories` table, no vector index. Knowledge MUST live in Singularity Memory.
- C-47 [PARTIAL] Atomic claim acquisition: single conditional UPDATE pattern; rows_affected = 1 gates dispatch.
- C-48 [NEW] `runs` table: CHECK constraint enforces XOR between unit_attempt and agent_run; aggregate token/cost are end-of-run rollup.
- C-49 [NEW] `units.attempt` is the current counter; historical attempts in `runs`; both updated in the same transaction.
- C-50 [EXISTS] Mid-run cancellation only via `/sf abandon <unit-id>` (operator) or supervisor signal; no automated cancellation from external state changes (since there is no external state).
- C-51 [NEW] Singularity Memory retain failures queue in `pending_retain`; flush to `lost-learnings.jsonl` after 7d.
- C-52 [PARTIAL] Workflow selection priority: `metadata.workflow` set at plan time → `default_workflow` config → built-in fallback. Pinned to unit at first dispatch; never re-evaluated.
- C-53 [NEW] PhaseUAT trigger: workflow `require_uat = true`; halts auto-loop with `SignalPause`; resumes via `/sf uat-approve` or `/sf uat-reject`.
- C-54 [NEW] Agent run termination conditions defined (inbox drain, stop tool, hard budget, turn cap, supervisor abort, timeout); hot cache NOT preserved across runs; durable blocks and message history ARE.
- C-55 [PARTIAL] `last_error` injected only on `TurnFirst` of `attempt >= 2`.
- C-56 [EXISTS] Per-project lock at `<project>/.sf/run.lock`; multiple projects can run auto concurrently.
- C-57 [EXISTS] Project DB at `<project>/.sf/sf.db`; canonical directory layout (§ 14.5) MUST be honoured for `/sf revert`, `/sf history`, archive sweeps.
- C-58 [PARTIAL] All runtime ULID PKs; soft-delete via `archived_at` for units and agents (no cascade delete of runs).
- C-59 [NEW] `runs` snap_ columns survive entity deletion; FK uses `ON DELETE SET NULL`.
- C-60 [PARTIAL] Per-hook-type timeouts (table in § 10.3); not a single global value.
- C-61 [PARTIAL] PhaseReassess outcomes: Re-plan / Abandon / Escalate; `max_reassess` decrements only on Re-plan; reasoning tier with Think.
- C-62 [EXISTS] PhaseChange is non-vetoable; veto semantics live on PreDispatch.
- C-63 [NEW] PhaseReview three-pass: establish-context → parallel chunked review → synthesis.
- C-64 [NEW] SSH disconnect: `error_code = "ssh_disconnected"`; remote zombie cleanup via marker pgrep; host quarantine on cleanup failure; orphaned workspace preserved for forensics.
- C-65 [NEW] Agent compaction preserves wake message + recent 3 inbox arrivals + full memory blocks.
- C-66 [PARTIAL] PreToolUse hook decisions outrank the auto_approve list (deny wins; allow falls through to auto-approve).
- C-67 [EXISTS] Slice merge ordering: `code_depends_on` honoured; merges serialised per project.
- C-68 [NEW] Doc-sync sub-step runs at end of last code-mutating phase (Merge if present, else Execute).
- C-69 [NEW] Cost stored as `cost_micro_usd` INTEGER (1e-6 USD); float drift avoided.
- C-70 [NEW] `session_blockers.resolved_at` set per resolution-rules table; `resolved_by` records source.
- C-71 [NEW] Workflow content pinning via `workflow_pins` (hash, name, content); in-flight units use pinned content even if the template file changes.
- C-72 [NEW] `projectHash` derivation: git-remote SHA-256 → fallback path SHA-256; cached in `.sf/runtime/project-hash.json`.
- C-73 [NEW] Dynamic reload of session-immutable fields: warn, keep in-process value, surface in `/sf status` as drift; do NOT crash.
- C-74 [NEW] `last_error` capped at 4 KB head-and-tail; full payload at `.sf/active/{unit-id}/last-error-full.txt`.
- C-75 [NEW] SSH auth via agent / explicit key; `ssh_known_hosts` MUST verify; no auto-trust.
- C-76 [NEW] UAT phase has timeout = 0 (infinite); advanced via `/sf uat-approve` or `/sf uat-reject`.
- C-77 [NEW] HTTP API requires `Authorization: Bearer <token>` from `.sf/runtime/api.token` (mode 0600); `?session=<id>` filter supported.
- C-78 [PARTIAL] `/sf doctor` exit code 0 = all pass, 1 = any FAIL or WARN; `--json` returns a structured report.
- C-79 [NEW] Trace JSONL has a `_meta` first-line record with `trace_schema_version`; readers branch on version.
- C-80 [NEW] Trace SQL index (`trace_index`) populated by the trace writer; `/sf forensics` queries it for fast span lookup.
- C-81 [NEW] Turn outcome marker parsed from last 200 chars: `<turn_status>complete|blocked|giving_up</turn_status>`; blocked → SignalPause, giving_up → PhaseReassess.
- C-82 [NEW] Agent handoff supports the `capability:tag1,tag2` form; round-robin by `last_active` among matching agents; `ErrNoCapableAgent` if none.
- C-83 [NEW] Provider API keys MUST use `vault://`; plaintext rejected at startup validation.
- C-84 [PARTIAL] Gate script protocol: env vars (SF_PROJECT_ROOT, SF_UNIT_ID, SF_RUN_ID, SF_PHASE, SF_ATTEMPT, SF_GATE_NAME, SF_GATE_RETRY, SF_WORKSPACE, SF_TRACE_FILE), stdin = UnitResult JSON, exit codes 0/1/2/3, output truncated at 8 KB.
- C-85 [PARTIAL] Gate retry counter is separate from `units.attempt`; resets on phase transition.
- C-86 [PARTIAL] `plan.md` frontmatter (unit_id, created_at, written_by, plan_version) + sections (Goal, Approach, Deliverables, Verification, Notes) validated before transition out of PhasePlan.
- C-87 [PARTIAL] `Memory` interface (Recall, Retain, Feedback, Validate, Health) generated from sm's `/openapi.json`; `pending_retain` queue routes failed Retains; `local_anti_patterns` mirror exposed when sm unreachable.
- C-88 [EXISTS] sf tools registered through pi-coding-agent's tool registry; PreToolUse hooks apply uniformly; auto_approve keys documented per tool.
- C-89 [PARTIAL] All operator commands referenced elsewhere in the spec are present in § 25: reassess-resolve, force-clear, merge-resolve, uat-approve, uat-reject, agent {list,run,reset,delete,inspect,history}, history, clean.
- C-90 [NEW] `agent_capabilities` index maintained in sync with `agents.capabilities`; capability lookup is an index scan, not a full table scan.
- C-91 [NEW] Trace JSONL archive move is transactional with the `trace_index.file_path` UPDATE; recoverable if interrupted.
- C-92 [PARTIAL] Versioning policy: SemVer; v1.0 freezes §§ 3, 4, 6, 10, 14, 26.
- C-93 [PARTIAL] [STRONG] Rate-limit data is observability-only; no orchestrator retry/dispatch logic reads it.
- C-94 [NEW] Singularity Memory is the sole knowledge backend; engine assimilated into `singularity_memory_server/` (MIT-attributed, no upstream runtime dep).
- C-95 [NEW] `[memory] mode = "embedded"` is the default for single-user sf; `mode = "remote"` MUST require `url` and `api_key` (vault://).
- C-96 [NEW] Go client `github.com/singularity-ng/singularity-memory-client-go` is generated from sm's `/openapi.json`; sf imports it as a normal Go module dependency.
26.2 Knowledge layer (ship after core)
- K-01 [NEW] Memory tiers: hot cache (in-memory, last 10 turns); Singularity Memory store (durable, PostUnit writes).
- K-02 [NEW] Two-bank pattern in Singularity Memory: `project/{hash}` + `global/coding`; merged before dispatch.
- K-03 [NEW] Anti-pattern library: `collection: anti_patterns`; never decays; surfaced in a dedicated `<anti_patterns>` block.
- K-04 [NEW] Pattern maturation: 4 states (candidate → established → proven → deprecated); weights as specified.
- K-05 [NEW] Confidence decay: `halfLife = 90 * (0.5 + confidence)` days.
- K-06 [NEW] Singularity Memory is the sole knowledge backend; on sm outage, dispatch proceeds with empty recall (plus the local_anti_patterns mirror) and a logged warning.
- K-07 [EXISTS] `sf init` deep analysis default; `--quick` skips Singularity Memory indexing.
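The K-05 half-life formula can be illustrated with a short sketch. Only the half-life expression is from the spec; the exponential weight function applying it is an assumption about how decay would be used:

```typescript
// Half-life in days as specified by K-05.
function halfLifeDays(confidence: number): number {
  return 90 * (0.5 + confidence);
}

// Assumed exponential decay: weight halves every halfLifeDays(confidence).
function decayedWeight(ageDays: number, confidence: number): number {
  return Math.pow(0.5, ageDays / halfLifeDays(confidence));
}
```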
26.3 Model routing (ship after core)
- R-01 [EXISTS] Three tiers; phase → tier static mapping from config.
- R-02 [EXISTS] `Think: true` set for `reasoning` tier phases; agent cannot override.
- R-03 [EXISTS] Within-tier selection by benchmark score formula.
- R-04 [PARTIAL] Complexity upgrade: classifier at dispatch time; fingerprint stored in SQLite.
- R-05 [EXISTS] `/sf rate` writes `benchmark_results`; human ratings carry 3× weight.
26.4 Persistent agents (ship after core)
- A-01 [NEW] `agents`, `agent_memory_blocks`, `agent_messages`, `agent_inbox` SQLite tables.
- A-02 [NEW] Memory block injection as XML into system prompt at dispatch.
- A-03 [NEW] `core_memory_append` and `core_memory_replace` tools write to SQLite before next turn.
- A-04 [NEW] `AgentState` enum (4 states); harness owns all transitions.
- A-05 [NEW] `agent_inbox` append-only; `delivered` is the only mutable column.
- A-06 [NEW] `send_message` tool: inserts to inbox, emits `AgentWake`.
- A-07 [NEW] `wait_for_reply` with mandatory timeout; MUST NOT block indefinitely.
- A-08 [NEW] `handoff(to, context)`: suspends calling agent → target receives full context → calling agent transitions to `AgentWaiting`.
- A-09 [NEW] Per-agent budget tracking, supervision, and crash recovery.
- A-10 [NEW] Cost recorded per agent in trace.
26.5 Extensions (ship after core)
- E-01 [NEW] HTTP observability API: `GET /api/v1/state`, `GET /api/v1/units/<id>`, `POST /api/v1/refresh`.
- E-02 [NEW] SSH worker extension: `worker.ssh_hosts`; remote workspace creation via shell script with symlink-aware validation; per-host concurrency cap.
- E-03 [NEW] Durable retry queue across restarts (SQLite-backed).
- E-04 [NEW] `plan_unit` client-side tool: agent can refine its own plan mid-run (add/split/reorder units). Uses orchestrator auth; subject to PreToolUse hooks.
- E-05 [NEW] Plugin interfaces: `SupervisorCheck`, `Shipper`, `VCS`, `Store`, `Notifier`. (`Tracker` deliberately not in this list — see § 3.3.)
End of SPEC.md v1.0.0-draft