# Dispatch/Orchestration Architecture — Consolidation Plan

**Author:** Research synthesis
**Date:** 2026-05-08
**Status:** Draft — for review and promotion

---

## 1. Root Cause Diagnosis

The 5 dispatch mechanisms + 1 message bus grew to fill genuine gaps at different stages, but the structural symptom is a **missing abstraction layer**: there is no unified concept that separates "what to run" (controller) from "how to run" (mechanism).

### Timeline of divergence

| Era | Mechanism | Gap filled |
|-----|-----------|-----------|
| Early SF | `subagent` tool (`extensions/subagent/index.js`) | Ad-hoc delegation: "run this agent for this task" from within a session |
| Parallel work | `parallel-orchestrator.js` | "Run milestone X in a worktree, independently" — process isolation via `spawn('sf headless')` |
| Slice-level work | `slice-parallel-orchestrator.js` | Same as above but at slice granularity — **~90% copy-paste of parallel-orchestrator** |
| Autonomous loop | UOK kernel (`uok/kernel.js`) | "Run the full PDD loop continuously, gated by confidence/risk" |
| Multi-agent messaging | `MessageBus` (`uok/message-bus.js`) | "Agents need to communicate across turns/sessions" (Letta-style) |
| Surface multiplexing | Cmux (`cmux/index.js`) | "TUI needs multiple visible surfaces for parallel agents" |

### Three structural problems

**1. Single-process thinking drove process-per-unit.** SF was originally a single-agent CLI. When parallelism was needed, the natural answer was `spawn('sf headless')` — a new OS process per milestone. This is correct for filesystem isolation but requires bolted-on coordination (SQLite WAL, file-based IPC, session status polling).

**2. The UOK kernel was designed as a single-agent loop.** It runs inside a headless process and manages one autonomous run. It does not know about sibling workers spawned by parallel-orchestrator, does not coordinate with them, and has no model for "I am one of N workers running concurrently."

**3. MessageBus was designed for persistent agents SF doesn't have yet.** The Letta-style inbox model is architecturally correct but premature — you need durable named agents before durable named inboxes matter. Today MessageBus is used for UOK internal observer chains, but not for real multi-agent coordination between workers and the coordinator.

### The missing abstractions

The proliferation is a **missing abstraction problem** at three levels:

1. **No unified "dispatch context"** — subagent, parallel-orchestrator, and UOK each create their own notion of "what am I running, and with what environment?"
2. **No shared dispatch registry** — no single place tracks "what is currently running" across all parallelism dimensions
3. **No first-class "work unit" concept** — milestone, slice, and task are different tables with different lock semantics, not different states of the same work unit

---

## 2. What Should Stay vs Merge

### Stay (genuinely different needs)

| Mechanism | Reason to Keep |
|-----------|---------------|
| **UOK kernel** | Autonomous loop engine implementing the PDD gate model (confidence/risk/reversibility/blast-radius/cost). This is the **controller** — it decides what to run, not how. Removing it means rewriting autonomous mode from scratch. |
| **MessageBus** | The SQLite-backed durable inbox is the right model for cross-turn coordination. This is genuine infrastructure. However: it should be **repurposed** — it serves UOK diagnostics today and should serve agent handoff when persistent agents land (v3.1 per BUILD_PLAN.md). |
| **Cmux** | Terminal UI surface multiplexing. Belongs in `pi-tui`, not the dispatch layer. Should be **decoupled** from dispatch entirely — the parallel orchestrator should not know about Cmux grid layouts. |
| **Execution graph** (`uok/execution-graph.js`) | File-conflict DAG that computes which milestones/slices can run in parallel. This is the **constraint solver** — it stays separate from the dispatch mechanism. |

### Merge (duplication without functional difference)

| Duplicated | Problem | Resolution |
|------------|---------|-----------|
| `parallel-orchestrator.js` + `slice-parallel-orchestrator.js` | ~80% identical. Only diffs: scope (milestone vs slice), lock env vars, status-file naming. The slice orchestrator additionally calls `slice-parallel-conflict.ts` for file-overlap filtering. | **Merge into a single `WorktreeOrchestrator`** parameterized by `{ scope: 'milestone' \| 'slice', milestoneId, sliceId? }`. Conflict filtering already lives in `slice-parallel-conflict.ts` — call it from the merged class. |
| **subagent tool's parallel/debate/chain modes** vs **parallel-orchestrator's milestone workers** | Both implement "run multiple things at the same time." The subagent tool does in-process `Promise.all` over spawned `sf` CLIs; parallel-orchestrator does the same with worktrees. Different IPC mechanisms, different isolation models. | **Subagent keeps single-agent dispatch** (its core value). For multi-agent work, subagent should delegate to the unified orchestrator rather than managing its own concurrency pool. |

### Refactor (same need, wrong implementation)

| Current | Issue | Refactor |
|---------|-------|----------|
| **subagent spawning the `sf` CLI** | A thin wrapper spawning a full binary. Only 4 tools are registered, as a workaround for not having a proper dispatch API. | Subagent should use a **headless RPC client** directly, not spawn `sf`. This enables calling any SF tool, not just the 4 registered ones. |
| **parallel/slice orchestrator using SQLite WAL + file IPC** | Hand-rolled IPC via session status files + signal files. "Poll the filesystem" coordination — correct but fragile. | Replace with **MessageBus-based coordination**. Workers publish status to MessageBus; the coordinator subscribes. |
| **UOK kernel owning the autonomous loop** | Runs inside a headless process. When the parallel orchestrator spawns `sf headless autonomous`, each worker has its own UOK kernel with no coordination between kernels. | The UOK kernel should be the **runtime environment** for any autonomous dispatch, not a process-bound concept. |

---

## 3. Streamlined Architecture

### Three-tier dispatch model

```
┌──────────────────────────────────────────────────────────────────┐
│ UOK Kernel (controller)                                          │
│ Decides WHAT to run next; enforces PDD gates, policy, parity     │
│ - Phase machine: Discuss → Plan → Execute → Merge → Complete     │
│ - Calls WorktreeOrchestrator.dispatch() to execute               │
└────────────────────────────┬─────────────────────────────────────┘
                             │ DispatchEnvelope { scope, unitId, ... }
                             ▼
┌──────────────────────────────────────────────────────────────────┐
│ WorktreeOrchestrator (mechanism)                                 │
│ Decides HOW to run: worktree lifecycle, process registry, budget │
│ - Worktree pool (git worktree per milestone/slice)               │
│ - Process registry (child_process per worker)                    │
│ - Cost accumulator (NDJSON parsing from worker stdout)           │
│ - File-intent tracker (parallel-intent.js)                       │
│ - MessageBus integration per worker (AgentInbox)                 │
└────────────────────────────┬─────────────────────────────────────┘
                             │ spawns
                             ▼
┌──────────────────────────────────────────────────────────────────┐
│ Worker (execution unit)                                          │
│ `sf headless --json autonomous` in a worktree                    │
│ - Owns SQLite WAL connection to project DB                       │
│ - Has AgentInbox for MessageBus delivery                         │
│ - Emits NDJSON events consumed by WorktreeOrchestrator           │
└──────────────────────────────────────────────────────────────────┘
```

### How existing components map

| Component | Role in Unified Architecture |
|-----------|------------------------------|
| **subagent tool** | Thin UDA client in the TUI. Single-agent dispatch with full SF tool access. Keeps the 4-mode interface (single/parallel/debate/chain) but implemented via UDA, not a spawned CLI. |
| **parallel-orchestrator + slice-parallel** | Merge into **WorktreeOrchestrator** — a UDA backend managing worktree lifecycle and multi-slot execution. |
| **UOK kernel** | Becomes **autonomous-runtime** — a UDA execution mode wrapping any dispatch with the PDD gate model. `dispatch.work({ unit, runControl: 'autonomous' })` automatically uses the autonomous-runtime. |
| **MessageBus** | Becomes the **UDA event/logging backbone**. All dispatch events (start, end, error, cost) are published to MessageBus. File-based IPC is replaced by MessageBus subscriptions. |
| **Cmux** | **Decoupled entirely**. Cmux listens to MessageBus for dispatch events and renders grid layouts. The dispatch layer does not know about Cmux. |

### WorktreeOrchestrator interface (proposed)

```ts
// File: src/resources/extensions/sf/worktree-orchestrator.js

interface DispatchOptions {
  scope: 'milestone' | 'slice';
  milestoneId: string;
  sliceId?: string;
  basePath: string;
  maxWorkers?: number;
  budgetCeiling?: number;
  workerTimeoutMs?: number;
  shellWrapper?: string[];
  useExecutionGraph?: boolean;
}

class WorktreeOrchestrator {
  // Returns eligible unit IDs filtered by execution-graph conflicts
  async prepare(opts: DispatchOptions): Promise<string[]>;

  // Start workers for the given unit IDs
  async start(ids: string[], opts: DispatchOptions): Promise<void>;

  // Stop all or specific workers
  async stop(ids?: string[]): Promise<void>;

  // Pause/resume workers via MessageBus
  pause(ids?: string[]): void;
  resume(ids?: string[]): void;

  // Read current state (for the dashboard)
  getStatus(): DispatchStatus;

  // Shared MessageBus instance
  readonly bus: MessageBus;

  // Budget tracking
  totalCost(): number;
  isBudgetExceeded(): boolean;
}
```

### How the UOK kernel uses WorktreeOrchestrator

Today, `uok/kernel.js` runs the autonomous loop and calls into tools that spawn agents. The parallel orchestrator is started separately by the TUI dashboard or the headless command. After unification:

1. The UOK kernel initializes `WorktreeOrchestrator` at autonomous-loop start
2. UOK calls `orchestrator.start(eligibleMilestoneIds)` for parallel milestones
3. Workers emit NDJSON events → the orchestrator parses cost → updates the budget
4. Workers emit completion → the UOK kernel processes post-unit staging
5. Workers receive messages via their `AgentInbox` (MessageBus integration)
6. `orchestrator.stop()` is called on autonomous-loop exit

---

## 4. Multi-Dimensional Parallelism

### Current axes of parallelism

| Axis | Mechanism | Status |
|------|-----------|--------|
| **Inter-project** | Multiple `sf` invocations | ✅ not SF's concern |
| **Inter-milestone** | parallel-orchestrator + worktrees | ✅ implemented |
| **Inter-slice** | slice-parallel-orchestrator + worktrees | ✅ implemented |
| **Inter-task** (in-process) | subagent `parallel` mode | ✅ `mapWithConcurrencyLimit` |
| **Inter-agent** (debate/chain) | subagent `debate`/`chain` modes | ✅ implemented |
| **Terminal-level** | Cmux grid layout for parallel agents | ✅ implemented |

### What "true concurrency" means

The current architecture already achieves **true process-level concurrency** via worktrees and separate `sf headless` processes. The shared SQLite WAL allows concurrent readers with a single writer.
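As a concrete illustration of the cost accumulation the orchestrator performs on worker stdout (NDJSON parsing, per section 3), here is a minimal sketch of a line-oriented reducer. The event shape (`type`, `costUsd`) is an assumption for illustration, not the actual worker event schema:

```typescript
// Hypothetical shape of a worker NDJSON event; the real schema may differ.
interface WorkerEvent {
  type: string;
  costUsd?: number;
}

// Accumulate cost from a worker's NDJSON stdout buffer.
// Malformed lines are skipped rather than crashing the orchestrator.
function accumulateCost(ndjson: string): number {
  let total = 0;
  for (const line of ndjson.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed) continue;
    try {
      const event = JSON.parse(trimmed) as WorkerEvent;
      if (event.type === "cost" && typeof event.costUsd === "number") {
        total += event.costUsd;
      }
    } catch {
      // Non-JSON noise on stdout (banners, logs) is ignored.
    }
  }
  return total;
}

const sample = [
  '{"type":"start"}',
  '{"type":"cost","costUsd":0.12}',
  'not json',
  '{"type":"cost","costUsd":0.05}',
].join("\n");

console.log(accumulateCost(sample).toFixed(2)); // prints "0.17"
```

A real implementation would consume the worker's stdout stream incrementally rather than a buffered string, but the tolerance for interleaved non-JSON output is the important property either way.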
**What is missing is coordinated dispatch, not more parallelism axes:**

- The execution graph (`uok/execution-graph.js`) already computes file-conflict relationships
- `selectConflictFreeBatch` picks a conflict-free subset for parallel dispatch
- But this is only wired into parallel-orchestrator — not into slice-parallel, and not into the UOK autonomous loop's dispatch decisions

### Proposed coordination model

```
Execution Graph (file-conflict DAG)
  │
  ├── selectConflictFreeBatch() ──► WorktreeOrchestrator.start()
  │                                   Workers run in parallel
  │                                   Each worker has an AgentInbox
  │
UOK kernel
  │
  ├── reads unit readiness from DB
  ├── calls WorktreeOrchestrator.start(milestoneIds)
  └── calls WorktreeOrchestrator.start(sliceIds) for intra-milestone parallelism
```

**Debate mode** (subagent tool) runs multiple agent roles within a single process (via `mapWithConcurrencyLimit`). This is **not** true process-level parallelism, but it is correct for LLM-based debate, where shared context and a single conversation transcript are needed.

**Chain mode** is purely sequential — each step's output feeds into the next step's prompt.

### Concurrency limits

| Level | Max Concurrent |
|-------|---------------|
| Project (milestones) | `parallel.max_workers` config (default: CPU cores / 2) |
| Milestone (slices) | `parallel.slice_max_workers` config (default: 2) |
| Subagent parallel tasks | `MAX_CONCURRENCY = 4` (hardcoded in `subagent/index.js`) |

---

## 5. DB Access from Subagents

### The current constraint is intentional

The subagent tool **cannot** call `complete-task` or `plan-slice` because:

1. Only 4 tools are registered in the subagent extension manifest
2. The subagent is meant to be a **task executor**, not a **state mutator**

This is **correct security isolation**, not a bug. A spawned `sf` CLI with full SF tool access running in a user-specified `cwd` is a significant attack surface.
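The enforcement-by-omission described above can be sketched as a simple allowlist lookup. The tool names below are illustrative placeholders — they are not the actual 4 tools registered in the manifest:

```typescript
// Hypothetical allowlist mirroring the subagent extension manifest's
// tools[] array. The real manifest's tool names may differ.
const SUBAGENT_TOOLS = new Set([
  "read-file",
  "search-code",
  "run-tests",
  "write-scratch",
]);

// Enforcement by omission: resolving an unregistered tool simply fails.
function resolveSubagentTool(name: string): string | null {
  return SUBAGENT_TOOLS.has(name) ? name : null;
}

console.log(resolveSubagentTool("read-file"));     // read-file
console.log(resolveSubagentTool("complete-task")); // null — state mutators are never registered
```

The point of formalizing this as a contract (next subsection) is that the allowlist becomes auditable data rather than an accident of which tools happened to be listed.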
### The right model: two-tier DB access

```
Coordinator (UOK kernel) ──► project .sf/sf.db (WAL mode)
                               milestone/slice state
                               task execution ledger

Subagent (sf process)    ──► ~/.sf/sf.db (global)
                               memories, preferences
                               agent-level state
                         ✗   project .sf/sf.db (write)
```

The subagent can **read** project state via **prompt injection** (system-context assembly already does this). It writes only to global state.

### If a subagent needs to record a finding

1. The subagent writes to its output (stdout/file)
2. The coordinator reads and processes the output
3. The coordinator calls DB tools

This is the Letta pattern — agents return results; the orchestrator decides what to persist.

### Architectural backing for the constraint

```ts
// In the subagent tool — formalize the access contract
const SUBAGENT_DB_ACCESS = {
  read: ['project_context'],  // via prompt injection only
  write: ['~/.sf/sf.db'],     // global state only
  prohibited: ['project .sf/sf.db write operations']
};
```

The extension manifest's `tools[]` array currently enforces this by omission. A more explicit model would declare the access contract formally, making it auditable.

### Future: RPC-mode subagent

**Keep the current constraint for spawned-CLI subagents.**

**Add a new subagent mode** — `dispatch.work({ mode: 'rpc' })` — where the subagent runs as an RPC client in the same process, gaining access to all SF tools. This is the headless equivalent of the subagent tool. Use it for internal SF workflows (e.g., "dispatch a review subagent that calls `complete-task`").

---

## 6. Naming — The Mental Model

### Current → Proposed

| Current | Problem | Proposed | Rationale |
|---------|---------|----------|-----------|
| `subagent` tool | "subagent" implies a lesser agent | `dispatch` tool (in TUI) | The tool *is* the dispatch API surface |
| `parallel-orchestrator` | "orchestrator" is vague; doesn't convey worktree isolation | `milestone-dispatcher` | Conveys scope + role |
| `slice-parallel-orchestrator` | Duplicate of above | `slice-dispatcher` (merged into WorktreeOrchestrator) | See section 2 |
| `WorktreeOrchestrator` (new) | — | `worktree-orchestrator` | Backend that manages worktree lifecycle |
| `UOK kernel` | "kernel" implies OS-level; "UOK" is jargon | `autonomous-runtime` | PDD-gated execution loop |
| `MessageBus` | Generic | Keep as-is | It *is* a bus pattern. Keep it. |
| `Cmux` | "cmux" is an implementation detail | `surface-grid` | User-facing: "show agents in a grid" |

### The mental model hierarchy

```
dispatch — The high-level API and TUI tool name
├── work()      — Run a single unit (milestone/slice/task)
├── batch()     — Run multiple units in parallel (worktree pool)
├── chain()     — Run units sequentially, passing output
├── debate()    — Run units as adversarial roles
└── subscribe() — Listen to dispatch events (MessageBus)

worktree-orchestrator — Backend: worktree lifecycle + process registry
autonomous-runtime    — PDD gate model (UOK kernel, renamed)
MessageBus            — Durable inter-agent messaging (keeps name)
surface-grid          — Cmux decoupled from dispatch
```

**Why "kernel" is the right metaphor for UOK (keep it internally):** A kernel manages resources and enforces policy; it doesn't do the work itself. The UOK kernel evaluates confidence/risk gates, manages parity reporting, and decides when to proceed — but it delegates execution to WorktreeOrchestrator. The name fits.

---

## 7. Implementation Priority

### Phase 1 — Merge the two orchestrators (lowest risk, highest clarity)

**1.1 — Extract `WorktreeOrchestrator` from parallel-orchestrator + slice-parallel**

Create a new `dispatch-layer.js` that merges the ~80% shared logic, parameterized by `{ scope: 'milestone' | 'slice' }`. The slice orchestrator's conflict-filtering logic (`filterConflictingSlices` in `slice-parallel-conflict.ts`) already lives separately — call it from the merged class.

**Files touched:**
- New: `src/resources/extensions/sf/dispatch-layer.js`
- Refactor: `parallel-orchestrator.js` → thin wrapper calling dispatch-layer
- Refactor: `slice-parallel-orchestrator.js` → thin wrapper calling dispatch-layer

**Test:** Both the `/parallel` command and slice-level parallelism continue to work identically. The dashboard continues to show correct worker states.

**Effort:** ~1 week. Pure refactor, no behavior change.

### Phase 2 — Wire MessageBus into WorktreeOrchestrator

**2.1 — Add an AgentInbox to each worker**

Every `sf headless` worker opens a `MessageBus` inbox named after its milestone/slice ID. The coordinator can send messages to workers (pause, resume, report status).

**2.2 — Replace file-based IPC with MessageBus**

Replace `session-status-io.js` polling and `sendSignal` file-based signals with MessageBus `send()`. File-based signals remain as a crash-recovery fallback.

**Files touched:**
- `dispatch-layer.js` (new)
- `session-status-io.js` (add a MessageBus-backed path)
- Worker bootstrap in both orchestrators

**Test:** Workers respond to coordinator pause/resume messages delivered via MessageBus.

**Effort:** ~3 days.

### Phase 3 — Subagent RPC Mode

**3.1 — Add `dispatch.rpc()` — spawn a headless RPC client (not the CLI)**

The 4-tool limitation goes away when the subagent is an RPC client. The subagent keeps its 4-mode interface; the implementation changes.

**3.2 — Ensure subagent RPC mode cannot access tools the parent mode doesn't permit**

The security boundary must be preserved.
This is where the access contract from section 5 gets enforced.

**Files touched:**
- `extensions/subagent/index.js` (add RPC mode path)
- `extensions/subagent/rpc-client.js` (new)

**Test:** A subagent with `mode: 'rpc'` can call `complete-task` and other SF tools.

**Effort:** ~1 week.

### Phase 4 — UOK Kernel Adopts WorktreeOrchestrator

**4.1 — Replace direct parallel-orchestrator calls with WorktreeOrchestrator**

The autonomous loop's parallel dispatch path (`analyzeParallelEligibility` → `startParallel`) goes through WorktreeOrchestrator instead of calling parallel-orchestrator directly.

**4.2 — UOK reads worker status from WorktreeOrchestrator**

Dashboard refresh reads from `orchestrator.getStatus()` instead of directly from parallel-orchestrator's state.

**Files touched:**
- `uok/kernel.js` (import WorktreeOrchestrator)
- `parallel-orchestrator.js` (becomes a wrapper or is removed)

**Test:** Autonomous mode with parallel milestones works identically to current behavior.

**Effort:** ~3 days.

### Phase 5 — Cmux Decoupling

**5.1 — Make the Cmux grid layout driven by MessageBus events**

Dispatch should not call Cmux directly. Cmux subscribes to MessageBus dispatch events and creates/destroys grid surfaces accordingly.

**5.2 — Remove `cmuxSplitsEnabled` from the subagent tool**

This is the concrete coupling point — dispatch knows about Cmux grid layouts. Remove it; let Cmux manage its own surface allocation based on dispatch events.

**Files touched:**
- `cmux/index.js` (add a MessageBus subscriber)
- `extensions/subagent/index.js` (remove `cmuxSplitsEnabled`)

**Effort:** ~2 days.

### Phase 6 — Naming Cleanup

**6.1 — Rename `dispatch-layer.js` → `worktree-orchestrator.js`**
**6.2 — Rename the parallel-orchestrator wrapper → `milestone-dispatcher.js`**
**6.3 — Rename the slice-parallel-orchestrator wrapper → `slice-dispatcher.js`**
**6.4 — Update all import references**

**Effort:** ~1 day.
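Looking back at Phase 1, the `{ scope }` parameterization can be sketched as a discriminated union, so the merged class statically requires a `sliceId` for slice dispatch. The types and the status-file naming below are illustrative assumptions, not the final API:

```typescript
// Sketch: scope as a discriminated union. A slice dispatch cannot be
// constructed without a sliceId; a milestone dispatch cannot carry one.
type DispatchScope =
  | { scope: "milestone"; milestoneId: string }
  | { scope: "slice"; milestoneId: string; sliceId: string };

// Status-file naming was one of the few real differences between the
// two original orchestrators; in the merged class it collapses to a branch.
function statusFileName(unit: DispatchScope): string {
  return unit.scope === "milestone"
    ? `milestone-${unit.milestoneId}.status`
    : `slice-${unit.milestoneId}-${unit.sliceId}.status`;
}

console.log(statusFileName({ scope: "milestone", milestoneId: "m1" }));
// milestone-m1.status
console.log(statusFileName({ scope: "slice", milestoneId: "m1", sliceId: "s2" }));
// slice-m1-s2.status
```

The design benefit is that the milestone/slice split becomes a type-level distinction rather than two near-identical files, which is exactly what the Phase 1 merge is after.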
### Phase 7 — Document the Architecture

**7.1 — Update `ARCHITECTURE.md`**

Add a section on the unified dispatch architecture. Current dispatch docs are scattered across inline comments and session-status-io.

**Effort:** ~1 day.

---

## Summary

The 5 dispatch mechanisms + 1 message bus represent **3 genuinely different needs** (the UOK autonomous loop, worktree-based isolation, durable inter-agent messaging) and **3 duplications** (parallel-orchestrator + slice-parallel-orchestrator; subagent parallel mode + parallel-orchestrator; Cmux tight coupling).

**The plan:**

1. **Merge** `parallel-orchestrator` + `slice-parallel-orchestrator` into `WorktreeOrchestrator`
2. **Wire** MessageBus into WorktreeOrchestrator — workers become reachable via durable messaging
3. **The UOK kernel** becomes the controller that calls WorktreeOrchestrator, not a parallel system
4. **The subagent tool** stays separate — it is ad-hoc in-session delegation, with an optional RPC mode for internal workflows
5. **Cmux** becomes a MessageBus subscriber, decoupled from dispatch
6. **The DB access model** is already correct: spawned subagents cannot write to the project DB; workers dispatched via WorktreeOrchestrator can

The `adversarial_partner`/`adversarial_combatant`/`adversarial_architect` fields already in the DB are **planning ceremony fields** (Letta-inspired), not dispatch-mechanism fields. They belong in the PDD planning layer, not in the dispatch layer.

**Total effort estimate:** 3–4 weeks across 7 phases, sequenced to preserve existing behavior at each step.