Dispatch Architecture Consolidation Plan
Status: Draft — for review
Author: Research synthesis from codebase analysis
Date: 2026-05-08
1. Root Cause Diagnosis — Why the Proliferation Happened
The 5 dispatch mechanisms + 1 message bus are not accidental complexity — each is a response to a genuine gap that appeared at a different time, under different constraints. The structural symptom is that dispatch, orchestration, and coordination are conflated into one system: SF grew new systems rather than extending existing ones when the use cases diverged.
The Timeline of Divergence
| Era | Mechanism Added | Gap It Filled |
|---|---|---|
| Early SF | `subagent` tool | Ad-hoc delegation: "run this agent for this task" |
| Parallel work | `parallel-orchestrator` | "Run milestone X in a worktree, independently" — required isolation at the process boundary |
| Slice-level work | `slice-parallel-orchestrator` | Same as above but at finer granularity — duplicate code, not a different concept |
| Autonomous loop | UOK kernel | "Run the full PDD loop continuously, gated by confidence/risk" |
| Multi-agent messaging | MessageBus | "Agents need to communicate across turns/sessions" (Letta-style) |
| Surface multiplexing | Cmux | "TUI needs multiple visible surfaces for parallel agents" |
Structural Root Cause
Single-process thinking drove process-per-unit. The original SF was a single-agent CLI. When parallelism was needed, the natural answer was spawn('sf headless') — a new OS process per milestone. This is correct for isolation but wrong for shared-state coordination. SQLite WAL was bolted on to let workers share a DB, which created the "shared DB with file-based locking" model that all orchestrators now use.
The UOK kernel was designed as a single-agent loop. It runs inside the headless process and manages one autonomous run. It does not know about sibling workers, does not coordinate with the parallel orchestrator, and does not have a model for "I am one of N workers running concurrently."
MessageBus was designed for persistent agents, but SF doesn't have persistent agents yet. The Letta-style inbox model is architecturally correct but premature — you need durable named agents before durable named inboxes matter. Today the MessageBus is used for UOK internal observer chains but not for real multi-agent coordination.
Subagent tool was never designed to integrate with SF's state. It spawns sf CLI which is a full TUI/CLI binary. It cannot call SF tools like complete-task or plan-slice because those are registered in the headless RPC path, not in the subagent's spawned CLI context. The 4 registered tools (subagent, scout, reviewer, reporter) are intentionally narrow to avoid dangerous nested dispatch.
The Missing Abstractions
The proliferation is a symptom of three missing abstractions:
- No unified "dispatch context" — subagent, parallel-orchestrator, and UOK each create their own notion of "what am I running and with what environment"
- No shared dispatch registry — there is no single place that tracks "what is currently running" across all parallelism dimensions
- No first-class "work unit" concept — milestone, slice, and task are different tables with different lock semantics, not different states of the same work unit
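To make the third gap concrete, here is a minimal sketch of what a first-class work unit could look like — one record shape and one claim/transition protocol shared by milestones, slices, and tasks. Everything here (`createWorkUnit`, the state names, the transition map) is illustrative, not existing SF code.

```javascript
// Hypothetical "work unit" shape: milestone, slice, and task become kinds
// of one record instead of three tables with different lock semantics.
const NEXT = {
  pending: ['claimed'],
  claimed: ['running'],
  running: ['done', 'failed'],
};

function createWorkUnit({ id, kind, parentId = null }) {
  if (!['milestone', 'slice', 'task'].includes(kind)) {
    throw new Error(`unknown kind: ${kind}`);
  }
  return { id, kind, parentId, state: 'pending', owner: null };
}

// One claim protocol for every kind — this is what would replace the
// per-table lock semantics (SF_MILESTONE_LOCK vs SF_SLICE_LOCK).
function claim(unit, workerId) {
  if (unit.state !== 'pending') {
    throw new Error(`cannot claim ${unit.id} in state ${unit.state}`);
  }
  return { ...unit, state: 'claimed', owner: workerId };
}

function transition(unit, next) {
  if (!(NEXT[unit.state] || []).includes(next)) {
    throw new Error(`illegal transition ${unit.state} -> ${next}`);
  }
  return { ...unit, state: next };
}
```

A shared dispatch registry then becomes a query over these records ("all units in state `running`") rather than three per-table scans.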
2. What Should Stay vs Merge
Keep (Genuinely Different Needs)
| Mechanism | Reason to Keep |
|---|---|
| UOK kernel | This is the autonomous loop engine. It implements the PDD gate model (confidence/risk/reversibility/blast-radius/cost). Removing it means rewriting autonomous mode from scratch. It should be the inner loop of dispatch, not replaced by it. |
| MessageBus | SQLite-backed durable inbox is the right model for cross-turn coordination when agents are long-lived. This is a genuine infrastructure primitive. However: it should be repurposed, not extended — it serves UOK diagnostics today and should serve agent handoff tomorrow. |
| Cmux | This is surface-layer multiplexing (terminal UI). It belongs in pi-tui, not in the dispatch layer. It should be decoupled from dispatch entirely — the parallel orchestrator should not know about Cmux grid layouts. |
Merge (Duplication Without Functional Difference)
| Duplicated | Problem | Resolution |
|---|---|---|
| `parallel-orchestrator.js` + `slice-parallel-orchestrator.js` | 90% identical code. The only differences are scope (milestone vs slice) and the lock env var name (`SF_MILESTONE_LOCK` vs `SF_SLICE_LOCK`). The conflict detection, worktree management, and worker lifecycle are copy-pasted. | Merge into a single `WorktreeOrchestrator` with a scope parameter. Share all file-overlap detection, worktree lifecycle, and status tracking. |
| `subagent` tool's parallel/debate/chain modes vs `parallel-orchestrator`'s milestone workers | Both implement "run multiple things at the same time." The subagent tool does an in-process `Promise.all` over spawned `sf` CLIs; the parallel orchestrator does the same over `sf headless` with worktrees. They use different IPC mechanisms and different isolation models. | The subagent tool should delegate multi-agent work to the unified orchestrator rather than managing its own concurrency pool. It keeps single-agent dispatch (its core value) but offloads parallel/debate to the orchestrator layer. |
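A sketch of the proposed merge, to show how small the real difference between the two orchestrators is. The `WorktreeOrchestrator` class and the lock-var mapping below are assumptions about the refactor, not existing code; only the two env var names come from the current files.

```javascript
// One orchestrator, parameterized by scope, instead of two copy-pasted files.
const LOCK_VARS = {
  milestone: 'SF_MILESTONE_LOCK', // from parallel-orchestrator.js
  slice: 'SF_SLICE_LOCK',         // from slice-parallel-orchestrator.js
};

class WorktreeOrchestrator {
  constructor({ scope, maxWorkers }) {
    if (!LOCK_VARS[scope]) throw new Error(`unknown scope: ${scope}`);
    this.scope = scope;
    this.maxWorkers = maxWorkers;
    this.slots = [];
  }

  // The env block each worker receives; only the lock var differs by
  // scope, which is why the two orchestrators never needed separate code
  // for conflict detection, worktree lifecycle, or status tracking.
  workerEnv(unitId) {
    return { [LOCK_VARS[this.scope]]: unitId };
  }
}
```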
Refactor (Same Need, Wrong Implementation)
| Current | Issue | Refactor |
|---|---|---|
| `subagent` spawning `sf` CLI | Spawns the full CLI binary with its TUI/headless mode switching. Subagent is a thin wrapper around a spawned binary, not a dispatch primitive. The 4-tool limitation is a workaround for not having a proper dispatch API. | Subagent should use a headless RPC client directly, not spawn `sf`. This lets it call any SF tool, not just the 4 registered ones. |
| `parallel-orchestrator` + `slice-parallel` using SQLite WAL + file IPC | Workers coordinate via `sf headless` + session status files + signal files. This is a hand-rolled IPC layer. The status files are "poll the filesystem" coordination — correct but fragile. | Replace with MessageBus-based coordination. Workers publish status to MessageBus; the coordinator subscribes. Eliminates file-based IPC and session-status polling. |
| UOK kernel owning the autonomous loop | The kernel runs inside a headless process. When the parallel orchestrator spawns `sf headless autonomous`, each worker has its own UOK kernel. Coordination between kernels requires external signals. | The UOK kernel should be the runtime environment for any autonomous dispatch, not a process-bound concept. The orchestrator manages worktree lifecycle; the kernel manages turn-level execution within each worktree. |
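The MessageBus-based coordination can be sketched minimally. SF's real MessageBus is SQLite-backed and durable; the in-memory `Bus` stub below only models the publish/subscribe contract that replaces status-file polling, and all topic and field names are illustrative.

```javascript
// Minimal pub/sub stub standing in for the SQLite-backed MessageBus.
class Bus {
  constructor() { this.subs = new Map(); }
  subscribe(topic, fn) {
    if (!this.subs.has(topic)) this.subs.set(topic, []);
    this.subs.get(topic).push(fn);
  }
  publish(topic, msg) {
    for (const fn of this.subs.get(topic) || []) fn(msg);
  }
}

const bus = new Bus();
const statuses = new Map();

// Coordinator side: no filesystem polling, just a subscription.
bus.subscribe('worker.status', ({ workerId, state }) => {
  statuses.set(workerId, state);
});

// Worker side: publish status instead of writing session status files.
bus.publish('worker.status', { workerId: 'w0', state: 'running' });
bus.publish('worker.status', { workerId: 'w1', state: 'running' });
bus.publish('worker.status', { workerId: 'w0', state: 'done' });
```

With a durable bus, a coordinator that restarts can also replay missed status messages — something the signal-file model handles poorly.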
3. Streamlined Architecture
The Unified Dispatch Layer
┌─────────────────────────────────────────────────────────────────────┐
│ Unified Dispatch API (UDA) │
├─────────────────────────────────────────────────────────────────────┤
│ dispatch.work({ unit, mode, model, tools, cwd, signal }) │
│ dispatch.batch([{ unit, ... }, { unit, ... }], { strategy }) │
│ dispatch.chain([{ unit, after }, ...]) │
│ dispatch.debate([{ unit, role }, ...], { rounds }) │
│ dispatch.subscribe(handler) // for events: start, end, error, log │
│ dispatch.cancel(workId) │
│ dispatch.status() → { active: WorkInfo[] } │
└─────────────────────────────────────────────────────────────────────┘
Modes: isolated (worktree), shared (same process), rpc (separate process via headless)
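A caller's-eye sketch of this API, assuming the three modes above. None of this exists yet — the backends are one-line stubs purely to show how a single `dispatch` facade can route modes to different isolation strategies.

```javascript
// Hypothetical UDA facade: one entry point, three execution backends.
const backends = {
  isolated: (req) => `worktree:${req.unit}`, // WorktreeOrchestrator slot
  shared:   (req) => `inproc:${req.unit}`,   // same-process execution
  rpc:      (req) => `headless:${req.unit}`, // headless RPC client
};

const dispatch = {
  work({ unit, mode = 'isolated' }) {
    const backend = backends[mode];
    if (!backend) throw new Error(`unknown mode: ${mode}`);
    return { workId: backend({ unit }), unit, mode };
  },
  batch(reqs, { strategy = 'parallel' } = {}) {
    return { strategy, items: reqs.map((r) => dispatch.work(r)) };
  },
};
```

Defaulting `mode` to `isolated` matches the document's position that worktree isolation is the primary parallelism mechanism.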
How the Existing Components Map
| Component | Role in Unified Architecture |
|---|---|
| subagent tool | Becomes a thin UDA client in the TUI: single-agent dispatch with full SF tool access. Keeps the 4-mode interface (single/parallel/debate/chain) but implements it via UDA, not a spawned CLI. |
| parallel-orchestrator + slice-parallel | Merge into WorktreeOrchestrator — a UDA backend that manages worktree lifecycle and multi-slot execution. Implements dispatch.work({ mode: 'isolated' }) for milestone/slice workers. |
| UOK kernel | Becomes UOK runtime — a UDA execution mode that wraps any dispatch with the PDD gate model. A dispatch.work({ unit, runControl: 'autonomous' }) automatically uses the UOK runtime. The kernel is not a separate process; it's the execution strategy. |
| MessageBus | Becomes the UDA event/logging backbone. All dispatch events (start, end, tool call, error, cost) are published to MessageBus. The parallel orchestrator's file-based IPC is replaced by MessageBus subscriptions. |
| Cmux | Decoupled entirely. Cmux listens to MessageBus for dispatch events and renders grid layouts accordingly. The dispatch layer does not know about Cmux. |
The Mental Model: Dispatch Is a Service, Not a Tool
The unified dispatch API is a service (backed by WorktreeOrchestrator + UOK runtime) that SF agents and tools call. It is not a tool itself and is not registered as one.
Agent/Tool                          Dispatch Service
    │                                     │
    ├── dispatch.work() ────────────────►│  Spawns worktree, runs UOK loop
    │                                     │
    │◄──── work.start event ─────────────┤
    │◄──── work.end event ───────────────┤
    │
    ├── dispatch.batch() ───────────────►│  Runs N work items in parallel
    │                                     │  (via WorktreeOrchestrator)
    │
    ├── dispatch.chain() ───────────────►│  Runs N items sequentially, passes
    │                                     │  previous output as {previous} input
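The chain semantics in the diagram can be sketched as: each item receives the previous item's output as a `previous` input. The synchronous runner here is a stand-in; in the real service it would be a `dispatch.work()` call.

```javascript
// Chain: run units in order, threading each output into the next input.
function chain(units, run) {
  const results = [];
  let previous = null;
  for (const unit of units) {
    const output = run({ unit, previous }); // first item sees previous = null
    results.push(output);
    previous = output;
  }
  return results;
}
```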
4. Multi-Dimensional Parallelism
SF needs to run multiple things concurrently at multiple levels:
| Dimension | Example | Current Implementation |
|---|---|---|
| Unit (milestone/slice) | Two milestones simultaneously | parallel-orchestrator (worktree-per-milestone) |
| Agent within unit | Two agents working on the same slice | subagent parallel mode (Promise.all over spawned CLIs) |
| Turn within agent | Agent running autonomous loop | UOK kernel (single-threaded, event loop) |
| Tool within turn | Concurrent tool executions | Not supported (single-threaded LLM dispatch) |
What Should Actually Be Parallel
The real parallelism need is at the unit level, not at the agent level. Milestones and slices are the natural parallelism boundary because:
- They have independent file scope (reduced conflict surface)
- They are tracked independently in the DB
- They have independent cost budgets
- They can recover independently from failure
Agent-level parallelism within a unit (subagent parallel/debate) is useful for review and research tasks but is not the primary parallelism mode. It should remain but as a secondary mechanism.
Proposed Multi-Dimensional Model
WorktreeOrchestrator
├── slot[0] → worktree for milestone M1
│ └── UOK kernel running autonomous loop
│ ├── turn[0]: agent dispatch
│ └── turn[1]: agent dispatch (sequential within unit)
├── slot[1] → worktree for milestone M2
│ └── UOK kernel running autonomous loop
└── slot[2] → worktree for slice S1 (within M1)
└── UOK kernel running autonomous loop
Constraints:
- Worktrees provide filesystem isolation (required for concurrent file mutations)
- Each worktree runs one UOK kernel (not multiple concurrent kernels per worktree)
- The kernel turn loop is sequential within a worktree (correct — you can't have two LLM turns modifying state simultaneously)
- Tool-level parallelism (e.g., running `grep` and `read` simultaneously) is not needed — the LLM dispatches tools serially
Concurrency Limits
| Level | Max Concurrent |
|---|---|
| Project (milestones) | parallel.max_workers config (default: CPU cores / 2) |
| Milestone (slices) | parallel.slice_max_workers config (default: 2) |
| Subagent parallel tasks | `MAX_CONCURRENCY = 4` (currently hardcoded) |
5. DB Access from Subagents
The Current Constraint
The subagent tool cannot call SF DB tools (complete-task, plan-slice, etc.) because:
- It spawns the `sf` CLI, which is a full binary with its own extension registration
- The spawned CLI does not share the parent process's RPC connection
- The 4 registered tools (subagent, scout, reviewer, reporter) are intentionally all that's available
This is deliberate security isolation, not a bug. A spawned `sf` CLI with full SF tool access, running in a user-specified cwd, is a significant attack surface.
The Right Model
Layer 1 — No direct DB access from subagents (correct, keep it)
Subagents should not have direct SQLite access. The DB is the source of truth for the primary agent's state; subagents reading it creates consistency hazards.
Layer 2 — Structured output from subagents (keep and expand)
Subagents return structured output (via --mode json + event stream). The parent agent is responsible for interpreting the output and calling the appropriate DB tools. This is the "subagent as a function" model — it returns data, not mutations.
Layer 3 — Intention declaration for later commit
For cases where a subagent needs to propose a state change (e.g., "I found this issue, mark the slice as blocked"), the subagent should return a structured intention (e.g., { intended_action: "block_slice", slice_id: "S01", reason: "..." }). The parent agent reviews and commits it via its own DB tools.
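A sketch of that handoff: the subagent returns an intention object, and the parent validates it against an allowlist before committing through its own DB tools. The action names and the `dbTools.apply` stub are illustrative, not existing SF tool names.

```javascript
// Actions the parent is willing to commit on a subagent's behalf
// (illustrative allowlist).
const ALLOWED_INTENTIONS = new Set(['block_slice', 'flag_task']);

// Parent-side review: refuse anything outside the allowlist, then commit
// via the parent's own DB tools (stubbed as dbTools.apply).
function commitIntention(intention, dbTools) {
  if (!ALLOWED_INTENTIONS.has(intention.intended_action)) {
    return { committed: false, reason: `disallowed: ${intention.intended_action}` };
  }
  dbTools.apply(intention);
  return { committed: true };
}

// What a subagent would return instead of touching the DB itself:
const intention = {
  intended_action: 'block_slice',
  slice_id: 'S01',
  reason: 'dependency on unreleased API',
};
```

The key property: the subagent stays a pure function over its inputs, and every mutation passes through the parent's review.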
Layer 4 — Shared WAL for read-your-own-writes consistency (future)
When the UDA runs subagents in the same process (not a spawned CLI), it can share the DB connection. This lets the subagent read what the parent just wrote in the same transaction. It requires the subagent to run as a headless RPC client, not a spawned CLI.
Recommendation
Keep the current constraint for spawned-CLI subagents. The 4-tool limit is a security boundary, not a limitation to be fixed.
Add a new subagent mode — dispatch.work({ mode: 'rpc' }) — where the subagent runs as an RPC client in the same process, gaining access to all SF tools. This is the headless equivalent of the subagent tool. Use this for internal SF workflows (e.g., "dispatch a review subagent that calls complete-task").
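The security requirement from Phase 3 — rpc mode must never exceed the parent's permissions — can be sketched as an intersection rule. Tool names and the function shape are illustrative; only the 4-tool spawned-CLI set comes from the document.

```javascript
// The 4 tools available to spawned-CLI subagents today.
const SPAWNED_CLI_TOOLS = ['subagent', 'scout', 'reviewer', 'reporter'];

// Effective toolset = requested ∩ (mode's base set ∩ parent's set).
// rpc mode widens the base to everything the parent has, but never beyond.
function effectiveTools({ mode, requested, parentTools }) {
  const base = mode === 'rpc' ? parentTools : SPAWNED_CLI_TOOLS;
  const allowed = new Set(base.filter((t) => parentTools.includes(t)));
  return requested.filter((t) => allowed.has(t));
}
```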
6. Naming — The Mental Model
The current names reflect implementation history, not user intent. Here is what they should be:
Current → Proposed
| Current | Problem | Proposed | Rationale |
|---|---|---|---|
| `subagent` tool | "subagent" implies a lesser agent, not a dispatch primitive | `dispatch` tool (in TUI) | The tool is the dispatch API surface |
| `parallel-orchestrator` | "orchestrator" is vague; doesn't convey worktree isolation | `worktree-pool` or `worktree-scheduler` | Conveys the resource model |
| `slice-parallel-orchestrator` | Duplicate of the above | Merge into `worktree-pool` | See section 2 |
| UOK kernel | "kernel" implies OS-level; "UOK" is jargon | `autonomous-runtime`, or keep UOK if we accept the acronym | "UOK" means "unit-of-work kernel" internally; it can stay if documented |
| MessageBus | Generic; doesn't convey durability | Keep MessageBus (`agent-inbox` was considered, but it is more than an inbox) | MessageBus is accurate — it is a bus pattern |
| Cmux | "cmux" is an implementation detail of terminal multiplexing | `surface-grid` | User-facing concept: "show agents in a grid" |
The Unified Naming Hierarchy
dispatch — The high-level API and TUI tool name
├── work() — Run a single unit (milestone/slice/task)
├── batch() — Run multiple units in parallel (worktree pool)
├── chain() — Run units sequentially, passing output
├── debate() — Run units as adversarial roles
└── subscribe() — Listen to dispatch events
worktree-pool — The backend that manages worktree lifecycle
autonomous-runtime — The PDD-gated execution loop (UOK kernel)
MessageBus — Durable inter-agent messaging
7. Implementation Priority
This is a large refactor. The work should be sequenced to avoid breaking the current system while building the new one underneath.
Phase 1 — Foundation (Weeks 1-3)
Goal: Establish the UDA backbone without changing existing behavior.
| Task | Why | Risk |
|---|---|---|
| Extract a minimal `dispatch-worktree` module from `parallel-orchestrator.js` that just manages worktree lifecycle (create/remove/heartbeat) | The worktree management is the most isolated piece and the easiest to extract first | Low |
| Add MessageBus subscriptions to `dispatch-worktree` for worker status (replacing session status file polling) | MessageBus already exists; this just redirects the existing file-based IPC | Low |
| Create a `dispatch-chain` module that takes an array of `{ unit, afterId }` and runs them sequentially, passing output | Reuses `worktree-pool`; no new parallelism semantics | Low |
| Do NOT change the subagent tool or `parallel-orchestrator` yet | These must keep working while the foundation is laid | — |
Phase 2 — Merge (Weeks 4-6)
Goal: Eliminate duplication, keep external behavior identical.
| Task | Why | Risk |
|---|---|---|
| Merge `slice-parallel-orchestrator.js` into `dispatch-worktree` behind a `scope: 'slice'` parameter | 90% code duplication; this is a pure refactor | Medium |
| Replace `parallel-orchestrator`'s file-based IPC with MessageBus subscriptions | Changes the coordination mechanism but not the external API | Medium |
| Add `dispatch.batch()`, which calls `dispatch-worktree` for N units | Reuses the same worktree pool; just adds the batch interface | Low |
| Verify all existing parallel orchestrator tests still pass | Regression protection | Low |
Phase 3 — Subagent RPC Mode (Weeks 7-8)
Goal: Subagent gains headless RPC access without spawning CLI.
| Task | Why | Risk |
|---|---|---|
| Add `dispatch.rpc()` — spawn a headless RPC client (not a CLI) for a subagent | The 4-tool limitation goes away when the subagent is an RPC client | Medium |
| Wire `subagent({ mode: 'rpc' })` to use `dispatch.rpc()` | Subagent keeps its 4-mode interface; the implementation changes | Medium |
| Ensure subagent RPC mode cannot access tools the parent mode doesn't permit | The security boundary must be preserved | Medium |
Phase 4 — UOK as Execution Mode (Weeks 9-10)
Goal: UOK kernel becomes a dispatch execution mode, not a separate process.
| Task | Why | Risk |
|---|---|---|
| Refactor `runAutoLoopWithUok` into `dispatch.autonomous()` — a UDA execution mode | The autonomous loop becomes a configuration of dispatch, not a separate entry point | Medium |
| Make `sf headless autonomous` call `dispatch.batch()` with the UOK runtime per slot | The headless binary becomes a thin launcher for the dispatch service | Medium |
| Remove the notion of the "UOK kernel" as a separate coordination entity | The kernel is an execution context; coordination is dispatch's job | Medium |
Phase 5 — Cmux Decoupling (Week 11)
Goal: Cmux becomes a MessageBus subscriber, not a dispatch-aware component.
| Task | Why | Risk |
|---|---|---|
| Make Cmux grid layout creation driven by MessageBus events, not by dispatch calling Cmux directly | Dispatch should not know about the terminal surface implementation | Low |
| Remove `cmuxSplitsEnabled` from the subagent tool | This is the concrete coupling point — dispatch knows about Cmux grid layouts | Low |
Phase 6 — Naming Cleanup (Week 12)
Goal: Rename things to match the mental model once the refactor is stable.
| Task | Why | Risk |
|---|---|---|
| Rename the `subagent` tool to `dispatch` in the TUI (keep `subagent` as an alias) | User-facing naming should match the mental model | Low |
| Rename the `parallel-orchestrator` file to `worktree-pool.js` | Internal naming | Low |
| Document the architecture in `ARCHITECTURE.md` | The current dispatch docs are scattered | Low |
Summary
The 5 dispatch mechanisms + 1 message bus represent 3 genuinely different needs (UOK autonomous loop, worktree-based isolation, durable inter-agent messaging) and 3 duplications (parallel-orchestrator + slice-parallel-orchestrator; subagent parallel mode + parallel-orchestrator; Cmux tight coupling). The root cause is that dispatch, orchestration, and coordination evolved separately rather than being designed as layers of one system.
The plan is to:
- Merge `parallel-orchestrator` + `slice-parallel-orchestrator` into a single `WorktreePool`
- Make subagent an RPC client of a unified `Dispatch` service, not a spawned CLI
- Make UOK an execution mode of the dispatch service, not a separate process
- Make MessageBus the event backbone replacing all file-based IPC
- Decouple Cmux from dispatch entirely (it subscribes to MessageBus)
- Sequence the refactor so existing behavior is preserved at each step