# Dispatch/Orchestration Architecture — Consolidation Plan
Author: Research synthesis
Date: 2026-05-08
Status: Draft — for review and promotion
## 1. Root Cause Diagnosis — Why Did This Proliferation Happen?
The five dispatch mechanisms plus one message bus grew to fill genuine gaps, not out of poor design. But the structural symptom matches that of every system that accumulates dispatch primitives without a unifying abstraction: no single concept ties them together.
Each addition was driven by a real gap at a different time:
| Mechanism | Gap filled | Structural symptom |
|---|---|---|
| subagent tool (`extensions/subagent/index.js`) | Ad-hoc delegation from within a TUI/headless session | First-class spawning of a full CLI process via `spawn()`; only 4 tools registered; no DB tools |
| parallel-orchestrator (`parallel-orchestrator.js`) | True parallel milestone execution with git worktree isolation | Mirrors subagent's spawn pattern but at milestone scope, with session status files, cost accumulation, and file-intent tracking |
| slice-parallel-orchestrator (`slice-parallel-orchestrator.js`) | Slice-level parallelism within a milestone | Copy-paste of parallel-orchestrator with the scope changed; ~90% identical code |
| UOK kernel (`uok/kernel.js`) | Deterministic autonomous loop with gates, observability, parity reporting | Grew into the central orchestration engine but does not subsume the dispatch primitives below it |
| MessageBus (`uok/message-bus.js`) | Durable SQLite-backed inter-agent messaging for multi-agent coordination | Modeled on Letta's SQLite-backed messaging; lives in UOK but is not wired into subagent or parallel-orchestrator dispatch paths |
| Cmux (`cmux/index.js`) | RPC multiplexing and terminal surface integration | Orthogonal to dispatch — a UI/surface concern, not an orchestration concern |
### The Missing Abstractions
Three missing abstractions drove the proliferation:
- **No unified "dispatch context"** — subagent, parallel-orchestrator, and UOK each create their own notion of "what am I running and with what environment." The result is three different spawn patterns, three different ways of tracking cost, and no shared vocabulary.
- **No shared dispatch registry** — there is no single place that tracks "what is currently running" across all parallelism dimensions. The parallel orchestrator tracks milestone workers via session status files; the slice-parallel orchestrator tracks slice workers separately; subagent tracks spawned processes in a `Set`. These are not unified.
- **No first-class "work unit" concept** — milestone, slice, and task are different tables with different lock semantics, not different states of the same work unit. This is why the slice-parallel orchestrator had to be a near-total copy of the milestone orchestrator rather than a parameterization.
The UOK kernel was designed as a single-agent loop. It runs inside the headless process and manages one autonomous run. It does not know about sibling workers, does not coordinate with the parallel orchestrator, and does not have a model for "I am one of N workers running concurrently."
Subagent tool was never designed to integrate with SF's state. It spawns sf CLI which is a full binary with its own extension registration. It cannot call SF tools like complete-task or plan-slice because those are registered in the headless RPC path, not in the subagent's spawned CLI context. The 4 registered tools are intentionally narrow to avoid dangerous nested dispatch.
MessageBus was designed for persistent agents, but SF doesn't have persistent agents yet. The Letta-style inbox model is architecturally correct but premature — you need durable named agents before durable named inboxes matter. Today the MessageBus is used for UOK internal observer chains but not for real multi-agent coordination.
### The `adversarial_partner`/`combatant`/`architect` Fields
These DB fields (in slices table, sf-db.js) are planning ceremony fields, not dispatch mechanism fields. They belong in the PDD planning layer and are rendered in markdown-renderer.js and workflow-projections.js as "Partner Review", "Combatant Review", and "Architect Review" sections in slice output. They have nothing to do with the dispatch layer — they are populated by planning tools, not by dispatch.
## 2. What Should Stay vs. Merge

### Stay (genuinely different concerns)
| Mechanism | Reason to keep |
|---|---|
| subagent tool (`extensions/subagent/index.js`) | Ad-hoc in-session delegation. The 4-tool surface (`subagent`, `await_subagent`, `cancel_subagent`, plus the `/subagent` command) is the right interface for human-in-the-loop or autonomous session agents that need to spin up a helper without leaving their context. The restriction to only those 4 tools is intentional and correct. The subagent spawns `sf --mode json` (not `sf headless`), which is correct for its shorter-lived, interactive nature. |
| UOK kernel (`uok/kernel.js`, `uok/index.js`) | The deterministic autonomous loop with gate evaluation, parity reporting, audit envelopes, and run-control policy. This is the controller in the architecture sense: it decides what to run next; it does not implement how to run it. The `runAutoLoopWithUok` function is correctly scoped. |
| MessageBus (`uok/message-bus.js`) | Durable SQLite-backed inter-agent messaging. The `send`, `broadcast`, `sendOnce`, `getConversation`, and `AgentInbox` primitives are genuinely useful for multi-agent coordination. The Letta-style design is sound. The problem is that it is not wired into the dispatch path — agents spawned by subagent or parallel-orchestrator cannot use it. |
| Cmux (`cmux/index.js`) | RPC multiplexing and terminal surface integration. Orthogonal to dispatch and correctly scoped as a UI/shell concern, not an orchestration concern. |
| Execution graph (`uok/execution-graph.js`) | The file-conflict DAG that computes which milestones/slices can run in parallel. This is the constraint solver — it knows about file overlaps but not about process lifecycle. |
| CoordinationStore (`uok/coordination-store.js`) | Redis-like primitives (TTL KV, streams, lease-based queues) on SQLite. The right building block for durable background coordination without a server process. |
### Merge (duplication with no semantic difference)

| Duplicated | Problem | Resolution |
|---|---|---|
| `parallel-orchestrator.js` + `slice-parallel-orchestrator.js` | ~90% identical code. The only meaningful differences: scope (milestone vs. slice), lock env vars (`SF_MILESTONE_LOCK` vs. `SF_SLICE_LOCK` + `SF_MILESTONE_LOCK`), and status file naming (`milestoneId` vs. `milestoneId/sliceId`). The conflict detection, worktree management, worker lifecycle, NDJSON parsing, and cost tracking are copy-pasted. | Merge into a single WorktreeOrchestrator class parameterized by `{ scope: 'milestone' \| 'slice' }` |
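The scope parameterization above can be sketched as data rather than duplicated code. This is only an illustration, assuming hypothetical `SCOPE_CONFIG`, `workerEnv`, and `statusFile` names; the real merged class would also carry the shared worktree and worker-lifecycle logic:

```javascript
// Illustrative sketch: everything scope-specific becomes data,
// so the lifecycle code exists exactly once.
const SCOPE_CONFIG = {
  milestone: {
    lockEnv: (unit) => ({ SF_MILESTONE_LOCK: unit.milestoneId }),
    statusFileName: (unit) => `${unit.milestoneId}.json`,
  },
  slice: {
    // Slice workers hold both locks, mirroring the current env-var difference.
    lockEnv: (unit) => ({
      SF_SLICE_LOCK: unit.sliceId,
      SF_MILESTONE_LOCK: unit.milestoneId,
    }),
    statusFileName: (unit) => `${unit.milestoneId}/${unit.sliceId}.json`,
  },
};

class WorktreeOrchestrator {
  constructor({ scope }) {
    this.scopeConfig = SCOPE_CONFIG[scope];
  }
  workerEnv(unit) {
    return { ...this.scopeConfig.lockEnv(unit) };
  }
  statusFile(unit) {
    return this.scopeConfig.statusFileName(unit);
  }
}
```

With this shape, adding a third scope (e.g. task-level) would mean adding one config entry, not cloning a file.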
### Refactor (same need, wrong implementation)

| Current | Issue | Refactor |
|---|---|---|
| subagent spawning the `sf` CLI | The subagent tool spawns the `sf` CLI as a full binary. The 4-tool limitation is enforced by not registering other tools, not by a principled access model. | Keep spawning the `sf` CLI for security isolation, but formalize the access contract explicitly. See section 5. |
| parallel-orchestrator + slice-parallel using file-based IPC | Workers coordinate via `session-status-io.js` (filesystem polling) and `sendSignal`. This is a hand-rolled IPC layer. The filesystem polling is correct but fragile. | Replace with MessageBus-based coordination: workers publish status to the MessageBus; the coordinator subscribes. See section 3. |
## 3. Streamlined Architecture — The Unified Dispatch Layer

### Three-tier conceptual model
```
┌──────────────────────────────────────────────────────────────┐
│ UOK Kernel (controller)                                      │
│ Decides WHAT to run next; enforces gates, policy, parity     │
│ - Phase machine: Discuss → Plan → Execute → Merge → Complete │
│ - Calls DispatchLayer.dispatch() to execute                  │
└──────────────────────────┬───────────────────────────────────┘
                           │ DispatchEnvelope { scope, unitId, ... }
                           ▼
┌──────────────────────────────────────────────────────────────┐
│ DispatchLayer (mechanism)                                    │
│ Decides HOW to run: worktree? process? in-process?           │
│ - Worktree pool (git worktree per milestone/slice)           │
│ - Process registry (child_process per worker)                │
│ - Budget accumulator (cost tracking via NDJSON parsing)      │
│ - File-intent tracker (parallel-intent.js)                   │
│ - AgentInbox per worker (MessageBus integration)             │
└──────────────────────────┬───────────────────────────────────┘
                           │ spawns
                           ▼
┌──────────────────────────────────────────────────────────────┐
│ Worker (execution unit)                                      │
│ `sf headless --json autonomous` in a worktree                │
│ - Owns SQLite WAL connection to project DB                   │
│ - Has AgentInbox for MessageBus delivery                     │
│ - Emits NDJSON events consumed by DispatchLayer              │
└──────────────────────────────────────────────────────────────┘
```
### Subagent tool relationship to DispatchLayer
The subagent tool and DispatchLayer serve different dispatch scopes:
- **subagent tool**: in-session, ad-hoc, short-lived. The subagent is a separate `sf` CLI process spawned from within a running session, and its output is returned to the caller synchronously. It is not managed by the DispatchLayer's worktree pool or budget tracking. It spawns `sf --mode json` (not `sf headless`), which is correct for its interactive nature.
- **DispatchLayer**: autonomous, long-running, milestone/slice scoped. Workers are spawned and tracked by the DispatchLayer; they emit cost events back to the layer; they share the project DB via WAL.
These two paths should remain separate but can share the same worker-spawning plumbing, even though the subagent invokes `sf --mode json` and DispatchLayer workers invoke `sf headless --json autonomous`.
### DispatchLayer interface (proposed)
```ts
// lives in: src/resources/extensions/sf/dispatch-layer.js

interface DispatchOptions {
  scope: 'milestone' | 'slice';
  milestoneId: string;
  sliceId?: string;
  basePath: string;
  maxWorkers?: number;
  budgetCeiling?: number;
  workerTimeoutMs?: number;
  shellWrapper?: string[];
  useExecutionGraph?: boolean;
}

class DispatchLayer {
  // Returns eligible units filtered by execution-graph conflicts
  async prepare(opts: DispatchOptions): Promise<PrepareResult>;

  // Start workers for given unit IDs
  async start(ids: string[], opts: DispatchOptions): Promise<StartResult>;

  // Stop all or specific workers
  async stop(ids?: string[]): Promise<void>;

  // Pause/resume
  pause(ids?: string[]): void;
  resume(ids?: string[]): void;

  // Read current state (for dashboard)
  getStatus(): DispatchStatus;

  // Shared MessageBus instance
  readonly bus: MessageBus;

  // Budget
  totalCost(): number;
  isBudgetExceeded(): boolean;
}
```
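To make the budget methods concrete, here is a minimal in-memory sketch of the cost-accumulation half of the interface. The NDJSON event shape (`{"type":"cost","usd":0.12}`) and the `BudgetAccumulator` name are assumptions for illustration, not the actual worker event format:

```javascript
// Minimal sketch of the budget half of DispatchLayer.
// Assumes workers emit NDJSON lines such as {"type":"cost","usd":0.12};
// the real event schema may differ.
class BudgetAccumulator {
  constructor({ budgetCeiling = Infinity } = {}) {
    this.budgetCeiling = budgetCeiling;
    this.costs = new Map(); // workerId -> accumulated USD
  }

  // Feed one line of a worker's stdout stream.
  ingestLine(workerId, line) {
    let event;
    try { event = JSON.parse(line); } catch { return; } // ignore plain log lines
    if (event.type !== 'cost' || typeof event.usd !== 'number') return;
    this.costs.set(workerId, (this.costs.get(workerId) ?? 0) + event.usd);
  }

  totalCost() {
    let total = 0;
    for (const usd of this.costs.values()) total += usd;
    return total;
  }

  isBudgetExceeded() {
    return this.totalCost() >= this.budgetCeiling;
  }
}
```

The point of keeping this inside the DispatchLayer is that cost tracking currently lives in two copy-pasted orchestrators; one accumulator shared across all workers gives a single budget ceiling check.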
### How UOK kernel uses DispatchLayer
Today, uok/kernel.js runs the autonomous loop and calls into tools like execute_task which eventually spawn agents. The parallel orchestrator is started separately by the TUI dashboard or headless command. After unification:
- UOK kernel initializes `DispatchLayer` at autonomous loop start
- UOK calls `dispatchLayer.start(eligibleMilestoneIds)` for parallel milestones
- Workers emit NDJSON events → DispatchLayer parses cost → updates budget
- Workers emit completion → UOK kernel processes post-unit staging
- Workers can receive messages via their `AgentInbox` (MessageBus integration)
- `DispatchLayer.stop()` is called on autonomous loop exit
## 4. Multi-Dimensional Parallelism

### Axes of parallelism
| Axis | Mechanism | Status |
|---|---|---|
| Inter-project | Multiple `sf` invocations (manual or CI) | ✅ not SF's concern |
| Inter-milestone | DispatchLayer + worktrees | ✅ currently via parallel-orchestrator |
| Inter-slice | DispatchLayer + worktrees | ✅ currently via slice-parallel-orchestrator |
| Inter-task (in-process) | subagent parallel mode | ✅ implemented (`mapWithConcurrencyLimit`) |
| Inter-agent (debate/chain) | subagent debate/chain mode | ✅ implemented |
| Terminal-level | Cmux grid layout for parallel agents | ✅ implemented |
### What "true concurrency" means
The current architecture already achieves true process-level concurrency via worktrees and separate sf headless processes. The shared SQLite WAL means all workers can read the same DB concurrently — WAL allows concurrent readers with a single writer.
What is missing is not more parallelism axes but coordinated dispatch:
- The execution graph (`uok/execution-graph.js`) already computes file-conflict relationships between milestones and slices
- `selectConflictFreeBatch` picks a conflict-free subset for parallel dispatch
- But this is only wired into parallel-orchestrator, not into the slice-parallel path or the UOK autonomous loop's dispatch decisions
### Proposed coordination model
The execution graph is the source of truth for parallelism constraints. The DispatchLayer is the enforcer. The UOK kernel is the policy layer:
```
Execution Graph (file-conflict DAG)
        │
        ├── selectConflictFreeBatch() ──► DispatchLayer.start()
        │         Workers run in parallel
        │         Each worker has an AgentInbox
        │
UOK kernel
        │
        ├── reads unit readiness from DB
        ├── calls DispatchLayer.start(milestoneIds)
        └── calls DispatchLayer.start(sliceIds) for intra-milestone parallelism
```
Debate mode (subagent tool): runs multiple agents sequentially within a single process using mapWithConcurrencyLimit. This is not true process-level parallelism but is correct for LLM-based debate where shared context and a single conversation transcript are needed. The Cmux grid layout provides terminal-level parallelism for these agents via split panes.
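A concurrency-limited map in the spirit of `mapWithConcurrencyLimit` can be sketched as follows; the real helper's signature may differ. With a limit of 1 it degrades to the strictly sequential behavior debate mode relies on:

```javascript
// Sketch of a concurrency-limited async map. Runs at most `limit` calls to
// `fn` at a time and preserves input order in the results.
async function mapWithConcurrencyLimit(items, limit, fn) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item
  async function worker() {
    while (next < items.length) {
      const i = next++; // claim an index (safe: no await between check and claim)
      results[i] = await fn(items[i], i);
    }
  }
  const workers = Array.from(
    { length: Math.max(1, Math.min(limit, items.length)) },
    worker
  );
  await Promise.all(workers);
  return results;
}
```

Because JavaScript is single-threaded, the claim of an index is atomic with respect to other workers; parallelism here is only in overlapping awaited I/O (LLM calls), not CPU work.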
Chain mode: purely sequential — each step's output feeds into the next step's prompt. No parallelism needed here.
## 5. DB Access from Subagents

### The current model
Subagents spawn sf CLI as a separate process with its own environment. The inheritance envelope (subagent-inheritance.js) propagates preferences, but the subagent's sf process opens its own SQLite connection to ~/.sf/sf.db (global state) or .sf/sf.db (project state). This is correct isolation — a subagent should not write to the project DB directly.
### The constraint is intentional

The subagent tool cannot call complete-task or plan-slice. This is not an accident of registration but a deliberate boundary:
- Only 4 tools are registered in the subagent extension manifest (`subagent`, `await_subagent`, `cancel_subagent`, and the `/subagent` command)
- The subagent is meant to be a task executor, not a state mutator
If a subagent could call complete-task, it could mark tasks done without the coordinator's knowledge, corrupting the UOK state machine.
### The right model: two-tier DB access

```
Coordinator (UOK kernel) ──► project .sf/sf.db (WAL mode)
                                 milestone/slice state
                                 task execution ledger

Subagent (sf process)    ──► ~/.sf/sf.db (global)
                                 memories, preferences
                                 agent-level state
                             ✗ project .sf/sf.db
```
Exception: The sf CLI that runs as a DispatchLayer worker (sf headless --json autonomous) is a different mode — it IS the coordinator for its worktree's scope and SHOULD write to the project DB. This is already how it works (workers open .sf/sf.db in the worktree, which syncs from the project root via syncSfStateToWorktree).
### What subagents CAN do with the DB

- Read project state via prompt injection (system context assembly already does this)
- Write to global `~/.sf/sf.db` for their own memories and preferences
- NOT write to the project `.sf/sf.db`
If a subagent needs to record a finding that the coordinator should see, the right pattern is:
- Subagent writes to its output (stdout/file)
- Coordinator reads and processes the output
- Coordinator calls DB tools
This is the same pattern as Letta agents — agents return results, the orchestrator decides what to persist.
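The return-and-persist pattern above can be sketched as follows. The `finding` event shape and the `recordFinding` callback are hypothetical; the point is that only the coordinator ever touches the project DB:

```javascript
// Hypothetical sketch of the return-and-persist pattern. The subagent emits
// structured lines on stdout; the coordinator parses them and decides what
// to persist via its own DB tools. The subagent itself never writes to the DB.
function processSubagentOutput(stdout, { recordFinding }) {
  for (const line of stdout.split('\n')) {
    let msg;
    try { msg = JSON.parse(line); } catch { continue; } // skip free-form text
    if (msg.type === 'finding') {
      recordFinding(msg.payload); // coordinator-side persistence decision
    }
  }
}
```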
### Architectural backing for the constraint
The "no DB tools for subagents" constraint should be backed by a principled access model, not just "we didn't register those tools." Proposed:
```js
// In the subagent tool — formalize the access contract
const SUBAGENT_DB_ACCESS = {
  read: ['project_context'],   // via prompt injection only
  write: ['~/.sf/sf.db'],      // global state only
  prohibited: ['project .sf/sf.db write operations']
};
```
The extension manifest's tools[] array currently enforces this by omission. A more explicit model would declare the access contract formally, making it auditable.
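One hedged sketch of what a runtime-enforced, auditable version of that contract could look like. The `SUBAGENT_ACCESS_CONTRACT` shape and the `assertDbAccess` helper are hypothetical, not existing APIs:

```javascript
// Hypothetical runtime guard built from a declarative contract.
// 'global' stands for ~/.sf/sf.db, 'project' for the project .sf/sf.db.
const SUBAGENT_ACCESS_CONTRACT = {
  write: ['global'],       // may write only to global state
  prohibited: ['project'], // project DB writes are rejected outright
};

function assertDbAccess(contract, db, op) {
  if (op === 'write' && contract.prohibited.includes(db)) {
    throw new Error(`subagent may not write to the ${db} DB`);
  }
  if (op === 'write' && !contract.write.includes(db)) {
    throw new Error(`no write grant for the ${db} DB`);
  }
}
```

Enforcement-by-omission (not registering the tools) and enforcement-by-contract are not mutually exclusive; the contract makes the existing behavior visible to audits.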
## 6. Naming — What Should the Mental Model Be?
The names are confusing because they mix three different layers of abstraction. Proposed renaming:
| Current name | Proposed name | Reason |
|---|---|---|
| `parallel-orchestrator.js` | `milestone-dispatcher.js` | Describes scope + role |
| `slice-parallel-orchestrator.js` | `slice-dispatcher.js` | Scope + role; merges into the unified DispatchLayer |
| DispatchLayer (new) | `dispatch-layer.js` | The unified class |
| `uok/kernel.js` | keep as-is | Kernel is the right metaphor for the controller |
| MessageBus | keep as-is | Standard pattern name |
| Cmux | keep as-is | Product name for terminal multiplexing |
| subagent tool | keep as-is | The user-facing tool name |
Mental model:
- Controller = UOK kernel (deterministic policy, what to run)
- Dispatcher = DispatchLayer (mechanism, how to run)
- Workers = `sf headless` processes in worktrees (the doing)
- Inbox = AgentInbox per worker (message receiving)
- Bus = MessageBus (durable inter-agent messaging)
- Subagent tool = in-session ad-hoc delegation (separate from the DispatchLayer path)
The confusion arises because "orchestrator" suggests it controls both what and how. In a clean architecture, orchestrator = controller (what), and dispatcher = mechanism (how). Today, parallel-orchestrator does both, which is why it feels heavyweight and why slice-parallel-orchestrator had to be cloned to change scope.
## 7. Implementation Priority

### Phase 1: Eliminate duplication (lowest risk, highest clarity)
1.1 — Merge parallel-orchestrator + slice-parallel-orchestrator
Extract shared logic into a DispatchLayer class parameterized by scope. The slice orchestrator's conflict-filtering logic (filterConflictingSlices) already lives in slice-parallel-conflict.ts and stays there. The merged dispatch-layer.js calls it.
Test: both the /parallel command and the slice-level parallelism continue to work identically. The parallel orchestrator dashboard continues to show milestone workers; slice-level parallelism shows slice workers.
File: new src/resources/extensions/sf/dispatch-layer.js (~400 LOC merged from both orchestrators).
### Phase 2: Wire MessageBus into DispatchLayer
2.1 — Add AgentInbox to each worker
Every sf headless worker opens a MessageBus inbox named after its milestone/slice ID. The coordinator can send messages to workers (e.g., "pause", "resume", "report status").
2.2 — Use MessageBus for coordinator → worker signaling
Replace file-based IPC signals (session-status-io.js, sendSignal) with MessageBus send(). The file-based signals can remain as a crash-recovery fallback; MessageBus adds durable at-least-once delivery.
Test: workers respond to coordinator pause/resume messages delivered via MessageBus instead of or in addition to file signals.
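To illustrate the signaling shape, here is an in-memory stand-in for the inbox semantics. The real MessageBus in uok/message-bus.js is SQLite-backed and durable; `MiniBus` and its method names are purely illustrative:

```javascript
// In-memory stand-in for the SQLite-backed MessageBus, for illustration only.
// The real bus persists messages, so delivery survives coordinator crashes.
class MiniBus {
  constructor() {
    this.inboxes = new Map(); // inbox name -> pending messages
  }
  inbox(name) {
    if (!this.inboxes.has(name)) this.inboxes.set(name, []);
    return this.inboxes.get(name);
  }
  send(to, message) {
    this.inbox(to).push(message);
  }
  // Remove and return all pending messages. Draining from memory is
  // at-most-once; the durable bus acknowledges per message for at-least-once.
  drain(name) {
    return this.inbox(name).splice(0);
  }
}
```

A coordinator would address a worker by a scoped inbox name (e.g. `worker:<milestoneId>`) and send `{ type: 'pause' }`-style control messages; the worker polls or drains its inbox between loop iterations.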
### Phase 3: UOK kernel adopts DispatchLayer
3.1 — Replace direct parallel-orchestrator calls with DispatchLayer
The autonomous loop's parallel dispatch path (analyzeParallelEligibility → startParallel) goes through DispatchLayer instead of calling parallel-orchestrator directly.
3.2 — UOK reads worker status from DispatchLayer
Dashboard refresh reads from dispatchLayer.getStatus() instead of directly from parallel-orchestrator's state.
File changes: uok/kernel.js imports DispatchLayer; parallel-orchestrator.js becomes a thin wrapper (or is removed if no other callers remain).
### Phase 4: Subagent tool gets optional MessageBus inbox

4.1 — Allow subagent workers to opt in to MessageBus
A subagent spawned with useMessageBus: true in params gets an AgentInbox injected into its prompt context. This enables the subagent to receive coordinator messages during long-running tasks.
Constraint: subagent still cannot write to project DB. MessageBus read access does not change this.
Test: long-running subagent receives a pause message from the coordinator via MessageBus.
### Phase 5: Naming cleanup (cosmetic but reduces confusion)
5.1 — Rename parallel-orchestrator.js → milestone-dispatcher.js
5.2 — Rename slice-parallel-orchestrator.js → slice-dispatcher.js
Update all import references.
5.3 — Trim uok/index.js exports
Move non-orchestration exports (skills, model policy, etc.) to their own barrels or remove from the UOK public API. The uok/index.js barrel re-exports ~60 symbols from ~30 sub-modules. Some exports (e.g., skill functions, model policy functions) are used only by specific tools and do not belong in an orchestration kernel export.
## Summary

The 5 dispatch mechanisms + 1 message bus represent 3 genuinely different needs (UOK autonomous loop, worktree-based isolation, durable inter-agent messaging) and 2 duplications (parallel-orchestrator + slice-parallel-orchestrator; hand-rolled file-based IPC duplicating what MessageBus should provide). The root cause is that dispatch, orchestration, and coordination evolved separately rather than being designed as layers of one system.
The plan is to:
- Merge `parallel-orchestrator` + `slice-parallel-orchestrator` into a single `DispatchLayer` class
- Wire MessageBus into DispatchLayer so workers become reachable via durable messaging (replacing file-based IPC)
- Make the UOK kernel the controller that calls DispatchLayer, not a parallel system
- Keep the subagent tool separate — it's ad-hoc in-session delegation, not autonomous dispatch; formalize its DB access contract
- Keep Cmux orthogonal — it's surface integration, not dispatch
The DB access model is already correct: subagents run in their own process with their own DB connection and cannot write to the project state. Workers (dispatched via DispatchLayer) are the project's own agents and do have project DB write access.
The adversarial_partner/adversarial_combatant/adversarial_architect fields are planning ceremony fields (Letta-inspired) that belong in the PDD planning layer (slice/milestone planning), not in the dispatch layer. They are populated by planning tools and rendered in slice output. The dispatch layer should remain purely about "how to run" — worktree lifecycle, process management, cost tracking, and message delivery.