Dispatch/Orchestration Architecture — Consolidation Plan
Author: Research synthesis
Date: 2026-05-08
Status: Draft — for review and promotion
1. Root Cause Diagnosis
The 5 dispatch mechanisms + 1 message bus grew to fill genuine gaps at different stages, but the structural symptom is a missing abstraction layer: there is no single concept that separates "what to run" (controller) from "how to run" (mechanism).
Timeline of divergence
| Era | Mechanism | Gap filled |
|---|---|---|
| Early SF | subagent tool (extensions/subagent/index.js) | Ad-hoc delegation: "run this agent for this task" from within a session |
| Parallel work | parallel-orchestrator.js | "run milestone X in a worktree, independently" — process isolation via spawn('sf headless') |
| Slice-level work | slice-parallel-orchestrator.js | Same as above but at slice granularity — 90% copy-paste of parallel-orchestrator |
| Autonomous loop | UOK kernel (uok/kernel.js) | "run the full PDD loop continuously, gated by confidence/risk" |
| Multi-agent messaging | MessageBus (uok/message-bus.js) | "agents need to communicate across turns/sessions" (Letta-style) |
| Surface multiplexing | Cmux (cmux/index.js) | "TUI needs multiple visible surfaces for parallel agents" |
Three structural problems
1. Single-process thinking drove process-per-unit.
SF was originally a single-agent CLI. When parallelism was needed, the natural answer was spawn('sf headless') — a new OS process per milestone. This is correct for filesystem isolation but requires bolted-on coordination (SQLite WAL, file-based IPC, session status polling).
2. UOK kernel was designed as a single-agent loop.
It runs inside a headless process and manages one autonomous run. It does not know about sibling workers spawned by parallel-orchestrator, does not coordinate with it, and has no model for "I am one of N workers running concurrently."
3. MessageBus was designed for persistent agents SF doesn't have yet.
The Letta-style inbox model is architecturally correct but premature — you need durable named agents before durable named inboxes matter. Today MessageBus is used for UOK internal observer chains but not for real multi-agent coordination between workers and coordinator.
The concrete diagnosis
The proliferation is a missing abstraction problem at three levels:
- No unified "dispatch context" — subagent, parallel-orchestrator, and UOK each create their own notion of "what am I running and with what environment"
- No shared dispatch registry — no single place that tracks "what is currently running" across all parallelism dimensions
- No first-class "work unit" concept — milestone, slice, and task are different tables with different lock semantics, not different states of the same work unit
2. What Should Stay vs Merge
Stay (genuinely different needs)
| Mechanism | Reason to Keep |
|---|---|
| UOK kernel | Autonomous loop engine implementing the PDD gate model (confidence/risk/reversibility/blast-radius/cost). This is the controller — it decides what to run, not how. Removing it means rewriting autonomous mode from scratch. |
| MessageBus | SQLite-backed durable inbox is the right model for cross-turn coordination. This is genuine infrastructure. However: it should be repurposed — it serves UOK diagnostics today and should serve agent handoff when persistent agents land (v3.1 per BUILD_PLAN.md). |
| Cmux | Terminal UI surface multiplexing. Belongs in pi-tui, not the dispatch layer. Should be decoupled from dispatch entirely — parallel orchestrator should not know about Cmux grid layouts. |
| Execution graph (uok/execution-graph.js) | File-conflict DAG that computes which milestones/slices can run in parallel. This is the constraint solver — stays separate from dispatch mechanism. |
Merge (duplication without functional difference)
| Duplicated | Problem | Resolution |
|---|---|---|
| parallel-orchestrator.js + slice-parallel-orchestrator.js | ~80% identical. Only diffs: scope (milestone vs slice), lock env vars, status file naming. Slice orchestrator additionally calls slice-parallel-conflict.ts for file overlap filtering. | Merge into a single WorktreeOrchestrator parameterized by `{ scope: 'milestone' \| 'slice' }`. |
| subagent tool's parallel/debate/chain modes vs parallel-orchestrator's milestone workers | Both implement "run multiple things at the same time." Subagent tool does in-process Promise.all over spawned sf CLIs; parallel-orchestrator does the same with worktrees. Different IPC mechanisms, different isolation models. | Subagent keeps single-agent dispatch (its core value). For multi-agent work, subagent should delegate to the unified orchestrator rather than managing its own concurrency pool. |
Refactor (same need, wrong implementation)
| Current | Issue | Refactor |
|---|---|---|
| subagent spawning sf CLI | Thin wrapper spawning a full binary. Only 4 tools registered as a workaround for not having a proper dispatch API. | Subagent should use a headless RPC client directly, not spawn sf. Enables calling any SF tool, not just 4 registered ones. |
| parallel/slice orchestrator using SQLite WAL + file IPC | Hand-rolled IPC via session status files + signal files. "Poll the filesystem" coordination — correct but fragile. | Replace with MessageBus-based coordination. Workers publish status to MessageBus; coordinator subscribes. |
| UOK kernel owning the autonomous loop | Runs inside a headless process. When parallel orchestrator spawns sf headless autonomous, each worker has its own UOK kernel with no coordination between kernels. | UOK kernel should be the runtime environment for any autonomous dispatch, not a process-bound concept. |
3. Streamlined Architecture
Three-tier dispatch model
┌──────────────────────────────────────────────────────────────────┐
│ UOK Kernel (controller) │
│ Decides WHAT to run next; enforces PDD gates, policy, parity │
│ - Phase machine: Discuss → Plan → Execute → Merge → Complete │
│ - Calls WorktreeOrchestrator.dispatch() to execute │
└────────────────────────────┬──────────────────────────────────────┘
│ DispatchEnvelope { scope, unitId, ... }
▼
┌──────────────────────────────────────────────────────────────────┐
│ WorktreeOrchestrator (mechanism) │
│ Decides HOW to run: worktree lifecycle, process registry, budget │
│ - Worktree pool (git worktree per milestone/slice) │
│ - Process registry (child_process per worker) │
│ - Cost accumulator (NDJSON parsing from worker stdout) │
│ - File-intent tracker (parallel-intent.js) │
│ - MessageBus integration per worker (AgentInbox) │
└────────────────────────────┬──────────────────────────────────────┘
│ spawns
▼
┌──────────────────────────────────────────────────────────────────┐
│ Worker (execution unit) │
│ `sf headless --json autonomous` in a worktree │
│ - Owns SQLite WAL connection to project DB │
│ - Has AgentInbox for MessageBus delivery │
│ - Emits NDJSON events consumed by WorktreeOrchestrator │
└──────────────────────────────────────────────────────────────────┘
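The diagram's DispatchEnvelope is not yet defined anywhere; a minimal sketch of what it might carry follows. Only scope and unitId appear in the diagram above — every other field name is an illustrative assumption.

```ts
// Hypothetical shape of the envelope the UOK kernel hands to WorktreeOrchestrator.
// Only scope and unitId appear in the diagram; every other field is an assumption.
interface DispatchEnvelope {
  scope: 'milestone' | 'slice' | 'task';      // granularity of the work unit
  unitId: string;                             // milestone/slice/task identifier
  runControl: 'autonomous' | 'interactive';   // whether PDD gates apply to this run
  budgetCeiling?: number;                     // max spend for this unit, in USD
  workerTimeoutMs?: number;                   // per-worker wall-clock limit
  env?: Record<string, string>;               // extra environment for the spawned worker
}
```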
How existing components map
| Component | Role in Unified Architecture |
|---|---|
| subagent tool | Thin UDA client in TUI. Single-agent dispatch with full SF tool access. Keeps 4-mode interface (single/parallel/debate/chain) but implemented via UDA, not spawned CLI. |
| parallel-orchestrator + slice-parallel | Merge into WorktreeOrchestrator — a UDA backend managing worktree lifecycle and multi-slot execution. |
| UOK kernel | Becomes autonomous-runtime — a UDA execution mode wrapping any dispatch with the PDD gate model. dispatch.work({ unit, runControl: 'autonomous' }) automatically uses the autonomous-runtime. |
| MessageBus | Becomes the UDA event/logging backbone. All dispatch events (start, end, error, cost) published to MessageBus. File-based IPC replaced by MessageBus subscriptions. |
| Cmux | Decoupled entirely. Cmux listens to MessageBus for dispatch events and renders grid layouts. Dispatch layer does not know about Cmux. |
WorktreeOrchestrator interface (proposed)
// File: src/resources/extensions/sf/worktree-orchestrator.js
interface DispatchOptions {
  scope: 'milestone' | 'slice';
  milestoneId: string;
  sliceId?: string;
  basePath: string;
  maxWorkers?: number;
  budgetCeiling?: number;
  workerTimeoutMs?: number;
  shellWrapper?: string[];
  useExecutionGraph?: boolean;
}

class WorktreeOrchestrator {
  // Returns eligible units filtered by execution-graph conflicts
  async prepare(opts: DispatchOptions): Promise<PrepareResult>;
  // Start workers for given unit IDs
  async start(ids: string[], opts: DispatchOptions): Promise<StartResult>;
  // Stop all or specific workers
  async stop(ids?: string[]): Promise<void>;
  // Pause/resume workers via MessageBus
  pause(ids?: string[]): void;
  resume(ids?: string[]): void;
  // Read current state (for dashboard)
  getStatus(): DispatchStatus;
  // Shared MessageBus instance
  readonly bus: MessageBus;
  // Budget tracking
  totalCost(): number;
  isBudgetExceeded(): boolean;
}
How UOK kernel uses WorktreeOrchestrator
Today, uok/kernel.js runs the autonomous loop and calls into tools that spawn agents. The parallel orchestrator is started separately by the TUI dashboard or headless command. After unification:
- UOK kernel initializes `WorktreeOrchestrator` at autonomous loop start
- UOK calls `orchestrator.start(eligibleMilestoneIds)` for parallel milestones
- Workers emit NDJSON events → orchestrator parses cost → updates budget
- Workers emit completion → UOK kernel processes post-unit staging
- Workers receive messages via their `AgentInbox` (MessageBus integration)
- `orchestrator.stop()` called on autonomous loop exit
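A minimal sketch of that loop against the proposed interface; the prepare() result field, the MessageBus topic name, and stageCompletedUnit are assumptions, not existing code.

```ts
// Sketch: UOK autonomous loop driving WorktreeOrchestrator.
// `eligibleIds`, the topic name, and `stageCompletedUnit` are illustrative placeholders.
async function runParallelMilestones(
  orchestrator: WorktreeOrchestrator,
  opts: DispatchOptions,
  stageCompletedUnit: (unitId: string) => Promise<void>
) {
  const { eligibleIds } = await orchestrator.prepare(opts);   // conflict-free units
  await orchestrator.start(eligibleIds, opts);

  // Completion events arrive over the shared bus; the kernel stages each finished unit.
  orchestrator.bus.subscribe('dispatch.unit.completed', async (msg: { unitId: string }) => {
    await stageCompletedUnit(msg.unitId);
  });

  // Budget gate checked between kernel iterations.
  if (orchestrator.isBudgetExceeded()) {
    await orchestrator.stop();
  }
}
```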
4. Multi-Dimensional Parallelism
Current axes of parallelism
| Axis | Mechanism | Status |
|---|---|---|
| Inter-project | Multiple sf invocations | ✅ not SF's concern |
| Inter-milestone | parallel-orchestrator + worktrees | ✅ implemented |
| Inter-slice | slice-parallel-orchestrator + worktrees | ✅ implemented |
| Inter-task (in-process) | subagent parallel mode | ✅ mapWithConcurrencyLimit |
| Inter-agent (debate/chain) | subagent debate/chain mode | ✅ implemented |
| Terminal-level | Cmux grid layout for parallel agents | ✅ implemented |
What "true concurrency" means
The current architecture already achieves true process-level concurrency via worktrees and separate sf headless processes. The shared SQLite WAL allows concurrent readers with a single writer.
What is missing is coordinated dispatch, not more parallelism axes:
- The execution graph (`uok/execution-graph.js`) already computes file-conflict relationships
- `selectConflictFreeBatch` picks a conflict-free subset for parallel dispatch
- But this is only wired into parallel-orchestrator, not into slice-parallel or the UOK autonomous loop's dispatch decisions
Proposed coordination model
Execution Graph (file-conflict DAG)
│
├── selectConflictFreeBatch() ──► WorktreeOrchestrator.start()
│                                   Workers run in parallel
│                                   Each worker has AgentInbox
│
UOK kernel
│
├── reads unit readiness from DB
├── calls WorktreeOrchestrator.start(milestoneIds)
└── calls WorktreeOrchestrator.start(sliceIds) for intra-milestone parallelism
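A sketch of what wiring it in would look like for either scope, assuming selectConflictFreeBatch takes the ready unit IDs and returns the non-overlapping subset (its real signature and import path may differ).

```ts
// Sketch: one execution-graph-gated dispatch path for both milestone and slice scope.
// The import path and selectConflictFreeBatch signature are assumptions about execution-graph.js.
import { selectConflictFreeBatch } from '../uok/execution-graph.js';

async function dispatchBatch(
  orchestrator: WorktreeOrchestrator,
  readyUnitIds: string[],
  opts: DispatchOptions
) {
  // Reduce the ready set to units whose file intents do not overlap.
  const batch = opts.useExecutionGraph
    ? selectConflictFreeBatch(readyUnitIds)
    : readyUnitIds;

  // Respect the per-scope worker ceiling before starting.
  const limited = batch.slice(0, opts.maxWorkers ?? 2);
  return orchestrator.start(limited, opts);
}
```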
Debate mode (subagent tool): runs multiple agents sequentially within a single process using mapWithConcurrencyLimit. This is not true process-level parallelism but is correct for LLM-based debate where shared context and a single conversation transcript are needed.
Chain mode: purely sequential — each step's output feeds into the next step's prompt.
Concurrency limits
| Level | Max Concurrent |
|---|---|
| Project (milestones) | parallel.max_workers config (default: CPU cores / 2) |
| Milestone (slices) | parallel.slice_max_workers config (default: 2) |
| Subagent parallel tasks | MAX_CONCURRENCY = 4 (hardcoded in subagent/index.js) |
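For illustration, a sketch of resolving those three ceilings in one helper; the config accessor is hypothetical, while the key names and defaults come from the table above.

```ts
import * as os from 'node:os';

// Sketch: one place that resolves the three concurrency ceilings.
// `config.get` is a hypothetical accessor; keys and defaults are taken from the table above.
function resolveConcurrencyLimits(config: { get(key: string): number | undefined }) {
  return {
    milestoneWorkers:
      config.get('parallel.max_workers') ?? Math.max(1, Math.floor(os.cpus().length / 2)),
    sliceWorkers: config.get('parallel.slice_max_workers') ?? 2,
    subagentTasks: 4, // MAX_CONCURRENCY, currently hardcoded in subagent/index.js
  };
}
```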
5. DB Access from Subagents
The current constraint is intentional
The subagent tool cannot call complete-task or plan-slice because:
- Only 4 tools are registered in the subagent extension manifest
- The subagent is meant to be a task executor, not a state mutator
This is correct security isolation, not a bug. A spawned sf CLI with full SF tool access running in a user-specified cwd is a significant attack surface.
The right model: two-tier DB access
Coordinator (UOK kernel) ──► project .sf/sf.db (WAL mode)
                               milestone/slice state
                               task execution ledger

Subagent (sf process)    ──► ~/.sf/sf.db (global)
                               memories, preferences
                               agent-level state
                             ✗ project .sf/sf.db (write)
The subagent can read project state via prompt injection (system context assembly already does this). Writes only to global state.
If a subagent needs to record a finding
- Subagent writes to its output (stdout/file)
- Coordinator reads and processes the output
- Coordinator calls DB tools
This is the Letta pattern — agents return results, the orchestrator decides what to persist.
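A sketch of the coordinator side of that pattern, assuming the subagent returns JSON with a findings array and the coordinator owns a recordFinding helper (both names are hypothetical).

```ts
// Sketch of the Letta pattern: the subagent returns results; only the coordinator writes the project DB.
// `findings` and `recordFinding` are hypothetical names used for illustration.
interface SubagentResult {
  findings: Array<{ summary: string; filePath?: string }>;
}

async function persistSubagentFindings(
  rawOutput: string,                                               // subagent stdout (JSON)
  recordFinding: (f: { summary: string; filePath?: string }) => Promise<void>
) {
  const result: SubagentResult = JSON.parse(rawOutput);
  for (const finding of result.findings ?? []) {
    await recordFinding(finding);                                  // coordinator-owned DB write
  }
}
```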
Architectural backing for the constraint
// In subagent tool — formalize the access contract
const SUBAGENT_DB_ACCESS = {
  read: ['project_context'],  // via prompt injection only
  write: ['~/.sf/sf.db'],     // global state only
  prohibited: ['project .sf/sf.db write operations']
};
The extension manifest's tools[] array currently enforces this by omission. A more explicit model would declare the access contract formally, making it auditable.
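One way to make it auditable rather than implicit — a sketch of a guard applied at tool-registration time; the writesProjectDb flag on tool metadata is an assumption, not an existing field.

```ts
// Sketch: enforce the access contract when registering tools for a spawned subagent,
// instead of relying on omission from the manifest. `writesProjectDb` is a hypothetical flag.
interface ToolMeta {
  name: string;
  writesProjectDb?: boolean;
}

function filterSubagentTools(tools: ToolMeta[]): ToolMeta[] {
  // Prohibited by SUBAGENT_DB_ACCESS: spawned subagents never write the project .sf/sf.db.
  return tools.filter((tool) => !tool.writesProjectDb);
}
```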
Future: RPC-mode subagent
Keep the current constraint for spawned-CLI subagents.
Add a new subagent mode — dispatch.work({ mode: 'rpc' }) — where the subagent runs as an RPC client in the same process, gaining access to all SF tools. This is the headless equivalent of the subagent tool. Use this for internal SF workflows (e.g., "dispatch a review subagent that calls complete-task").
6. Naming — The Mental Model
Current → Proposed
| Current | Problem | Proposed | Rationale |
|---|---|---|---|
| subagent tool | "subagent" implies a lesser agent | dispatch tool (in TUI) | The tool is the dispatch API surface |
| parallel-orchestrator | "orchestrator" is vague; doesn't convey worktree isolation | milestone-dispatcher | Conveys scope + role |
| slice-parallel-orchestrator | Duplicate of above | slice-dispatcher (merged into WorktreeOrchestrator) | See section 2 |
| WorktreeOrchestrator (new) | — | worktree-orchestrator | Backend that manages worktree lifecycle |
| UOK kernel | "kernel" implies OS-level; "UOK" is jargon | autonomous-runtime | PDD-gated execution loop |
| MessageBus | Generic | keep as-is | It is a bus pattern. Keep it. |
| Cmux | "cmux" is implementation detail | surface-grid | User-facing: "show agents in a grid" |
The mental model hierarchy
dispatch — The high-level API and TUI tool name
├── work() — Run a single unit (milestone/slice/task)
├── batch() — Run multiple units in parallel (worktree pool)
├── chain() — Run units sequentially, passing output
├── debate() — Run units as adversarial roles
└── subscribe() — Listen to dispatch events (MessageBus)
worktree-orchestrator — Backend: worktree lifecycle + process registry
autonomous-runtime — PDD gate model (UOK kernel, renamed)
MessageBus — Durable inter-agent messaging (keeps name)
surface-grid — Cmux decoupled from dispatch
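As a type sketch, the dispatch surface at the top of that hierarchy might look like the following; parameter and result shapes are illustrative, not a settled contract.

```ts
// Sketch of the proposed `dispatch` API surface. Shapes are illustrative only.
interface WorkResult {
  unitId: string;
  status: 'completed' | 'failed';
  costUsd?: number;
}

interface DispatchApi {
  work(opts: { unit: string; runControl?: 'autonomous' | 'interactive' }): Promise<WorkResult>;
  batch(opts: { units: string[]; maxWorkers?: number }): Promise<WorkResult[]>;   // worktree pool
  chain(opts: { units: string[] }): Promise<WorkResult>;      // each output feeds the next prompt
  debate(opts: { units: string[]; roles: string[] }): Promise<WorkResult>;
  subscribe(topic: string, handler: (event: unknown) => void): () => void;        // MessageBus-backed
}
```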
Why "kernel" is the right metaphor for UOK (keep it internally):
A kernel manages resources and enforces policy; it doesn't do the work itself. The UOK kernel evaluates confidence/risk gates, manages parity reporting, and decides when to proceed — but it delegates execution to WorktreeOrchestrator. The name fits.
7. Implementation Priority
Phase 1 — Merge the two orchestrators (Lowest risk, highest clarity)
1.1 — Extract WorktreeOrchestrator from parallel-orchestrator + slice-parallel
Create new dispatch-layer.js that merges the ~80% shared logic. Parameterized by { scope: 'milestone' | 'slice' }. The slice orchestrator's conflict-filtering logic (filterConflictingSlices in slice-parallel-conflict.ts) already lives separately — call it from the merged class.
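A sketch of the scope parameterization, isolating the few places the two orchestrators actually differ; the env var and status-file names are illustrative, not the current values.

```ts
// Sketch: the scope-dependent pieces of the merged dispatch layer.
// Lock env var names and status-file prefixes are placeholders, not the real values.
type DispatchScope = 'milestone' | 'slice';

function scopeConfig(scope: DispatchScope) {
  // Slice scope additionally runs filterConflictingSlices (slice-parallel-conflict.ts)
  // before dispatch; milestone scope relies on the execution graph alone.
  return scope === 'milestone'
    ? { lockEnvVar: 'SF_MILESTONE_LOCK', statusPrefix: 'milestone-', filterSliceConflicts: false }
    : { lockEnvVar: 'SF_SLICE_LOCK', statusPrefix: 'slice-', filterSliceConflicts: true };
}
```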
Files touched:
- New: `src/resources/extensions/sf/dispatch-layer.js`
- Refactor: `parallel-orchestrator.js` → thin wrapper calling dispatch-layer
- Refactor: `slice-parallel-orchestrator.js` → thin wrapper calling dispatch-layer
Test: Both /parallel command and slice-level parallelism continue to work identically. Dashboard continues to show correct worker states.
Effort: ~1 week. Pure refactor, no behavior change.
Phase 2 — Wire MessageBus into WorktreeOrchestrator
2.1 — Add AgentInbox to each worker
Every sf headless worker opens a MessageBus inbox named after its milestone/slice ID. The coordinator can send messages to workers (pause, resume, report status).
2.2 — Replace file-based IPC with MessageBus
Replace session-status-io.js polling and sendSignal file-based signals with MessageBus send(). File-based signals remain as crash-recovery fallback.
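A sketch of the intended direction, using a minimal structural view of MessageBus; the inbox naming and method names are assumptions about its API.

```ts
// Sketch: coordinator-to-worker control messages over MessageBus instead of signal files.
// `BusLike` is a minimal structural stand-in; real MessageBus method names may differ.
interface BusLike {
  send(msg: { to: string; type: 'pause' | 'resume' }): void;
  subscribe(inbox: string, handler: (msg: { type: string }) => void): void;
}

function pauseWorker(bus: BusLike, unitId: string) {
  bus.send({ to: `worker:${unitId}`, type: 'pause' });
}

function listenForControl(bus: BusLike, unitId: string, onPause: () => void, onResume: () => void) {
  bus.subscribe(`worker:${unitId}`, (msg) => {
    if (msg.type === 'pause') onPause();
    if (msg.type === 'resume') onResume();
    // File-based signals stay in place as the crash-recovery fallback.
  });
}
```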
Files touched:
- `dispatch-layer.js` (new)
- `session-status-io.js` (add MessageBus-backed path)
- Worker bootstrap in both orchestrators
Test: Workers respond to coordinator pause/resume messages delivered via MessageBus.
Effort: ~3 days.
Phase 3 — Subagent RPC Mode
3.1 — Add dispatch.rpc() — spawn a headless RPC client (not CLI)
The 4-tool limitation goes away when subagent is an RPC client. The subagent keeps its 4-mode interface; the implementation changes.
3.2 — Ensure subagent RPC mode cannot access tools the parent mode doesn't permit
Security boundary must be preserved. This is where the access contract from section 5 gets enforced.
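A sketch of how 3.2 could be enforced when the RPC client is constructed; the factory and its allowedTools option are hypothetical.

```ts
// Sketch: the RPC-mode subagent only receives tools the parent session already permits.
// `RpcClientFactory` and its `allowedTools` option are hypothetical, shown to make 3.2 concrete.
type RpcClientFactory = (opts: { allowedTools: string[] }) => unknown;

function buildSubagentRpcClient(
  createRpcClient: RpcClientFactory,
  parentToolNames: string[],
  requestedTools: string[]
) {
  const allowed = requestedTools.filter((name) => parentToolNames.includes(name));
  const denied = requestedTools.filter((name) => !parentToolNames.includes(name));
  if (denied.length > 0) {
    console.warn(`subagent rpc: skipping tools not permitted in parent session: ${denied.join(', ')}`);
  }
  return createRpcClient({ allowedTools: allowed });
}
```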
Files touched:
- `extensions/subagent/index.js` (add RPC mode path)
- `extensions/subagent/rpc-client.js` (new)
Test: Subagent with `mode: 'rpc'` can call complete-task and other SF tools.
Effort: ~1 week.
Phase 4 — UOK Kernel Adopts WorktreeOrchestrator
4.1 — Replace direct parallel-orchestrator calls with WorktreeOrchestrator
The autonomous loop's parallel dispatch path (analyzeParallelEligibility → startParallel) goes through WorktreeOrchestrator instead of calling parallel-orchestrator directly.
4.2 — UOK reads worker status from WorktreeOrchestrator
Dashboard refresh reads from orchestrator.getStatus() instead of directly from parallel-orchestrator's state.
Files touched:
- `uok/kernel.js` (import WorktreeOrchestrator)
- `parallel-orchestrator.js` (becomes wrapper or is removed)
Test: Autonomous mode with parallel milestones works identically to current behavior.
Effort: ~3 days.
Phase 5 — Cmux Decoupling
5.1 — Make Cmux grid layout driven by MessageBus events
Dispatch should not call Cmux directly. Cmux subscribes to MessageBus dispatch events and creates/destroys grid surfaces accordingly.
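A sketch of 5.1; the event topics and the surface-management callbacks are placeholders for whatever Cmux exposes.

```ts
// Sketch: Cmux reacts to dispatch events rather than being called by the dispatch layer.
// Topic names and the surface callbacks are illustrative placeholders.
interface SurfaceManager {
  open(surfaceId: string): void;
  close(surfaceId: string): void;
}

function wireCmuxToDispatch(
  bus: { subscribe(topic: string, handler: (msg: { unitId: string }) => void): void },
  surfaces: SurfaceManager
) {
  bus.subscribe('dispatch.worker.started', (msg) => surfaces.open(msg.unitId));   // new grid surface
  bus.subscribe('dispatch.worker.finished', (msg) => surfaces.close(msg.unitId)); // tear it down
}
```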
5.2 — Remove cmuxSplitsEnabled from subagent tool
This is the concrete coupling point — dispatch knows about Cmux grid layouts. Remove it; let Cmux manage its own surface allocation based on dispatch events.
Files touched:
- `cmux/index.js` (add MessageBus subscriber)
- `extensions/subagent/index.js` (remove cmuxSplitsEnabled)
Effort: ~2 days.
Phase 6 — Naming Cleanup
6.1 — Rename dispatch-layer.js → worktree-orchestrator.js
6.2 — Rename parallel-orchestrator wrapper → milestone-dispatcher.js
6.3 — Rename slice-parallel-orchestrator wrapper → slice-dispatcher.js
6.4 — Update all import references
Effort: ~1 day.
Phase 7 — Document the Architecture
7.1 — Update ARCHITECTURE.md
Add a section on the unified dispatch architecture. Current dispatch docs are scattered across inline comments and session-status-io.
Effort: ~1 day.
Summary
The 5 dispatch mechanisms + 1 message bus represent 3 genuinely different needs (UOK autonomous loop, worktree-based isolation, durable inter-agent messaging) and 3 duplications (parallel-orchestrator + slice-parallel-orchestrator; subagent parallel mode + parallel-orchestrator; Cmux tight coupling).
The plan:
- Merge `parallel-orchestrator` + `slice-parallel-orchestrator` into `WorktreeOrchestrator`
- Wire MessageBus into WorktreeOrchestrator — workers become reachable via durable messaging
- UOK kernel becomes the controller that calls WorktreeOrchestrator, not a parallel system
- Subagent tool stays separate — it's ad-hoc in-session delegation, with an optional RPC mode for internal workflows
- Cmux becomes a MessageBus subscriber, decoupled from dispatch
- DB access model is already correct: spawned subagents cannot write to project DB; workers dispatched via WorktreeOrchestrator can
The adversarial_partner/adversarial_combatant/adversarial_architect fields already in the DB are planning ceremony fields (Letta-inspired), not dispatch mechanism fields. They belong in the PDD planning layer, not in the dispatch layer.
Total effort estimate: 3-4 weeks across 7 phases, sequenced to preserve existing behavior at each step.