
Dispatch/Orchestration Architecture — Consolidation Plan

Author: Research synthesis
Date: 2026-05-08
Status: Draft — for review and promotion


1. Root Cause Diagnosis

The 5 dispatch mechanisms + 1 message bus grew to fill genuine gaps at different stages, but the structural root cause is a missing abstraction layer: there is no single concept that separates "what to run" (controller) from "how to run" (mechanism).

Timeline of divergence

| Era | Mechanism | Gap filled |
| --- | --- | --- |
| Early SF | subagent tool (extensions/subagent/index.js) | Ad-hoc delegation: "run this agent for this task" from within a session |
| Parallel work | parallel-orchestrator.js | "Run milestone X in a worktree, independently" — process isolation via spawn('sf headless') |
| Slice-level work | slice-parallel-orchestrator.js | Same as above but at slice granularity — 90% copy-paste of parallel-orchestrator |
| Autonomous loop | UOK kernel (uok/kernel.js) | "Run the full PDD loop continuously, gated by confidence/risk" |
| Multi-agent messaging | MessageBus (uok/message-bus.js) | "Agents need to communicate across turns/sessions" (Letta-style) |
| Surface multiplexing | Cmux (cmux/index.js) | "TUI needs multiple visible surfaces for parallel agents" |

Three structural problems

1. Single-process thinking drove process-per-unit.
SF was originally a single-agent CLI. When parallelism was needed, the natural answer was spawn('sf headless') — a new OS process per milestone. This is correct for filesystem isolation but requires bolted-on coordination (SQLite WAL, file-based IPC, session status polling).

2. UOK kernel was designed as a single-agent loop.
It runs inside a headless process and manages one autonomous run. It does not know about sibling workers spawned by parallel-orchestrator, does not coordinate with it, and has no model for "I am one of N workers running concurrently."

3. MessageBus was designed for persistent agents SF doesn't have yet.
The Letta-style inbox model is architecturally correct but premature — you need durable named agents before durable named inboxes matter. Today MessageBus is used for UOK internal observer chains but not for real multi-agent coordination between workers and coordinator.

The core problem

The proliferation is a missing abstraction problem at three levels:

  1. No unified "dispatch context" — subagent, parallel-orchestrator, and UOK each create their own notion of "what am I running and with what environment"
  2. No shared dispatch registry — no single place that tracks "what is currently running" across all parallelism dimensions
  3. No first-class "work unit" concept — milestone, slice, and task are different tables with different lock semantics, not different states of the same work unit
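
To make the missing abstraction concrete, here is a minimal sketch of a shared dispatch envelope and registry. The field names are hypothetical and nothing below is existing SF code; it only illustrates the shape such an abstraction could take.

```js
// Hypothetical sketch only — not existing SF code. One envelope shape for
// every work unit, plus a registry answering "what is currently running".

/**
 * @typedef {Object} DispatchEnvelope
 * @property {'milestone'|'slice'|'task'} scope - granularity of the work unit
 * @property {string} unitId - milestone/slice/task identifier
 * @property {'interactive'|'autonomous'} runControl - who drives the loop
 * @property {{cwd: string, worktree?: string}} environment - where it runs
 * @property {{maxCostUsd?: number, timeoutMs?: number}} limits - budget/time caps
 */

/** Single registry of in-flight work across all parallelism dimensions. */
const activeDispatches = new Map(); // unitId -> DispatchEnvelope

function register(envelope) {
  activeDispatches.set(envelope.unitId, envelope);
}

function release(unitId) {
  activeDispatches.delete(unitId);
}
```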

2. What Should Stay vs Merge

Stay (genuinely different needs)

| Mechanism | Reason to Keep |
| --- | --- |
| UOK kernel | Autonomous loop engine implementing the PDD gate model (confidence/risk/reversibility/blast-radius/cost). This is the controller — it decides what to run, not how. Removing it means rewriting autonomous mode from scratch. |
| MessageBus | SQLite-backed durable inbox is the right model for cross-turn coordination. This is genuine infrastructure. However: it should be repurposed — it serves UOK diagnostics today and should serve agent handoff when persistent agents land (v3.1 per BUILD_PLAN.md). |
| Cmux | Terminal UI surface multiplexing. Belongs in pi-tui, not the dispatch layer. Should be decoupled from dispatch entirely — the parallel orchestrator should not know about Cmux grid layouts. |
| Execution graph (uok/execution-graph.js) | File-conflict DAG that computes which milestones/slices can run in parallel. This is the constraint solver — stays separate from the dispatch mechanism. |

Merge (duplication without functional difference)

| Duplicated | Problem | Resolution |
| --- | --- | --- |
| parallel-orchestrator.js + slice-parallel-orchestrator.js | ~80% identical. Only diffs: scope (milestone vs slice), lock env vars, status file naming. The slice orchestrator additionally calls slice-parallel-conflict.ts for file overlap filtering. | Merge into a single WorktreeOrchestrator parameterized by `{ scope: 'milestone' \| 'slice' }`. |
| subagent tool's parallel/debate/chain modes vs parallel-orchestrator's milestone workers | Both implement "run multiple things at the same time." The subagent tool does an in-process Promise.all over spawned sf CLIs; parallel-orchestrator does the same with worktrees. Different IPC mechanisms, different isolation models. | Subagent keeps single-agent dispatch (its core value). For multi-agent work, subagent should delegate to the unified orchestrator rather than managing its own concurrency pool. |

Refactor (same need, wrong implementation)

| Current | Issue | Refactor |
| --- | --- | --- |
| subagent spawning the sf CLI | Thin wrapper spawning a full binary. Only 4 tools are registered, as a workaround for not having a proper dispatch API. | Subagent should use a headless RPC client directly, not spawn sf. Enables calling any SF tool, not just the 4 registered ones. |
| parallel/slice orchestrator using SQLite WAL + file IPC | Hand-rolled IPC via session status files + signal files. "Poll the filesystem" coordination — correct but fragile. | Replace with MessageBus-based coordination. Workers publish status to MessageBus; the coordinator subscribes. |
| UOK kernel owning the autonomous loop | Runs inside a headless process. When the parallel orchestrator spawns sf headless autonomous, each worker has its own UOK kernel with no coordination between kernels. | UOK kernel should be the runtime environment for any autonomous dispatch, not a process-bound concept. |

3. Streamlined Architecture

Three-tier dispatch model

┌──────────────────────────────────────────────────────────────────┐
│  UOK Kernel (controller)                                          │
│  Decides WHAT to run next; enforces PDD gates, policy, parity     │
│  - Phase machine: Discuss → Plan → Execute → Merge → Complete      │
│  - Calls WorktreeOrchestrator.dispatch() to execute                 │
└────────────────────────────┬──────────────────────────────────────┘
                             │ DispatchEnvelope { scope, unitId, ... }
                             ▼
┌──────────────────────────────────────────────────────────────────┐
│  WorktreeOrchestrator (mechanism)                                   │
│  Decides HOW to run: worktree lifecycle, process registry, budget   │
│  - Worktree pool (git worktree per milestone/slice)                │
│  - Process registry (child_process per worker)                      │
│  - Cost accumulator (NDJSON parsing from worker stdout)             │
│  - File-intent tracker (parallel-intent.js)                        │
│  - MessageBus integration per worker (AgentInbox)                   │
└────────────────────────────┬──────────────────────────────────────┘
                             │ spawns
                             ▼
┌──────────────────────────────────────────────────────────────────┐
│  Worker (execution unit)                                           │
│  `sf headless --json autonomous` in a worktree                    │
│  - Owns SQLite WAL connection to project DB                        │
│  - Has AgentInbox for MessageBus delivery                          │
│  - Emits NDJSON events consumed by WorktreeOrchestrator            │
└──────────────────────────────────────────────────────────────────┘

How existing components map

| Component | Role in Unified Architecture |
| --- | --- |
| subagent tool | Thin UDA client in the TUI. Single-agent dispatch with full SF tool access. Keeps the 4-mode interface (single/parallel/debate/chain) but implemented via UDA, not a spawned CLI. |
| parallel-orchestrator + slice-parallel | Merge into WorktreeOrchestrator — a UDA backend managing worktree lifecycle and multi-slot execution. |
| UOK kernel | Becomes autonomous-runtime — a UDA execution mode wrapping any dispatch with the PDD gate model. dispatch.work({ unit, runControl: 'autonomous' }) automatically uses the autonomous-runtime. |
| MessageBus | Becomes the UDA event/logging backbone. All dispatch events (start, end, error, cost) are published to MessageBus. File-based IPC is replaced by MessageBus subscriptions. |
| Cmux | Decoupled entirely. Cmux listens to MessageBus for dispatch events and renders grid layouts. The dispatch layer does not know about Cmux. |

WorktreeOrchestrator interface (proposed)

// File: src/resources/extensions/sf/worktree-orchestrator.js

interface DispatchOptions {
  scope: 'milestone' | 'slice';
  milestoneId: string;
  sliceId?: string;
  basePath: string;
  maxWorkers?: number;
  budgetCeiling?: number;
  workerTimeoutMs?: number;
  shellWrapper?: string[];
  useExecutionGraph?: boolean;
}

class WorktreeOrchestrator {
  // Returns eligible units filtered by execution-graph conflicts
  async prepare(opts: DispatchOptions): Promise<PrepareResult>;

  // Start workers for given unit IDs
  async start(ids: string[], opts: DispatchOptions): Promise<StartResult>;

  // Stop all or specific workers
  async stop(ids?: string[]): Promise<void>;

  // Pause/resume workers via MessageBus
  pause(ids?: string[]): void;
  resume(ids?: string[]): void;

  // Read current state (for dashboard)
  getStatus(): DispatchStatus;

  // Shared MessageBus instance
  readonly bus: MessageBus;

  // Budget tracking
  totalCost(): number;
  isBudgetExceeded(): boolean;
}

How UOK kernel uses WorktreeOrchestrator

Today, uok/kernel.js runs the autonomous loop and calls into tools that spawn agents. The parallel orchestrator is started separately by the TUI dashboard or headless command. After unification:

  1. UOK kernel initializes WorktreeOrchestrator at autonomous loop start
  2. UOK calls orchestrator.start(eligibleMilestoneIds) for parallel milestones
  3. Workers emit NDJSON events → orchestrator parses cost → updates budget
  4. Workers emit completion → UOK kernel processes post-unit staging
  5. Workers receive messages via their AgentInbox (MessageBus integration)
  6. orchestrator.stop() called on autonomous loop exit
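
A rough sketch of steps 1-6 from the kernel's side, written against the interface proposed above. Result fields such as eligibleIds and getStatus().running are assumptions, not settled API.

```js
// Sketch of the kernel-side loop against the proposed WorktreeOrchestrator.
// `prepare()` result fields and `getStatus().running` are assumptions.
async function runParallelBatch(orchestrator, opts) {
  // 1-2. Prepare eligible units (execution-graph filtered) and start workers.
  const { eligibleIds } = await orchestrator.prepare(opts);
  if (!eligibleIds.length) return;
  await orchestrator.start(eligibleIds, opts);

  // 3-5. Workers emit NDJSON; the orchestrator accumulates cost and status.
  while (orchestrator.getStatus().running > 0) {
    if (orchestrator.isBudgetExceeded()) break; // stop dispatching past the ceiling
    await new Promise((resolve) => setTimeout(resolve, 5000));
  }

  // 6. Always tear down workers and worktrees on loop exit.
  await orchestrator.stop();
}
```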

4. Multi-Dimensional Parallelism

Current axes of parallelism

| Axis | Mechanism | Status |
| --- | --- | --- |
| Inter-project | Multiple sf invocations | not SF's concern |
| Inter-milestone | parallel-orchestrator + worktrees | implemented |
| Inter-slice | slice-parallel-orchestrator + worktrees | implemented |
| Inter-task (in-process) | subagent parallel mode (mapWithConcurrencyLimit) | implemented |
| Inter-agent (debate/chain) | subagent debate/chain mode | implemented |
| Terminal-level | Cmux grid layout for parallel agents | implemented |

What "true concurrency" means

The current architecture already achieves true process-level concurrency via worktrees and separate sf headless processes. The shared SQLite WAL allows concurrent readers with a single writer.

What is missing is coordinated dispatch, not more parallelism axes:

  • The execution graph (uok/execution-graph.js) already computes file-conflict relationships
  • selectConflictFreeBatch picks a conflict-free subset for parallel dispatch
  • But this is only wired into parallel-orchestrator, not into slice-parallel or the UOK autonomous loop's dispatch decisions

Proposed coordination model

Execution Graph (file-conflict DAG)
    │
    ├── selectConflictFreeBatch() ──► WorktreeOrchestrator.start()
    │                                  Workers run in parallel
    │                                  Each worker has AgentInbox
    │
UOK kernel
    │
    ├── reads unit readiness from DB
    ├── calls WorktreeOrchestrator.start(milestoneIds)
    └── calls WorktreeOrchestrator.start(sliceIds) for intra-milestone parallelism

Debate mode (subagent tool): runs multiple agents sequentially within a single process using mapWithConcurrencyLimit. This is not true process-level parallelism but is correct for LLM-based debate where shared context and a single conversation transcript are needed.

Chain mode: purely sequential — each step's output feeds into the next step's prompt.

Concurrency limits

| Level | Max Concurrent |
| --- | --- |
| Project (milestones) | parallel.max_workers config (default: CPU cores / 2) |
| Milestone (slices) | parallel.slice_max_workers config (default: 2) |
| Subagent parallel tasks | MAX_CONCURRENCY = 4 (hardcoded in subagent/index.js) |
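
For reference, a concurrency-limited map of the kind the name mapWithConcurrencyLimit suggests looks roughly like the following. This is a generic sketch, not the actual subagent/index.js implementation.

```js
// Generic sketch of a concurrency-limited map; not the actual
// subagent/index.js implementation.
async function mapWithConcurrencyLimit(items, limit, fn) {
  const results = new Array(items.length);
  let next = 0;

  // `limit` workers each pull the next index until the items run out.
  const workers = Array.from({ length: Math.min(limit, items.length) }, async () => {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i], i);
    }
  });

  await Promise.all(workers);
  return results;
}
```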

5. DB Access from Subagents

The current constraint is intentional

The subagent tool cannot call complete-task or plan-slice because:

  1. Only 4 tools are registered in the subagent extension manifest
  2. The subagent is meant to be a task executor, not a state mutator

This is deliberate security isolation, not a bug. A spawned sf CLI with full SF tool access running in a user-specified cwd is a significant attack surface.

The right model: two-tier DB access

Coordinator (UOK kernel)     ──►  project .sf/sf.db (WAL mode)
                                    milestone/slice state
                                    task execution ledger

Subagent (sf process)        ──►  ~/.sf/sf.db (global)
                                    memories, preferences
                                    agent-level state
                              ✗    project .sf/sf.db (write)

The subagent can read project state via prompt injection (system context assembly already does this). Writes only to global state.

If a subagent needs to record a finding

  1. Subagent writes to its output (stdout/file)
  2. Coordinator reads and processes the output
  3. Coordinator calls DB tools

This is the Letta pattern — agents return results, the orchestrator decides what to persist.
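
A minimal sketch of that handoff follows. The helper names runSubagent and callTool are illustrative, not existing SF APIs; only the shape of the flow is the point.

```js
// Sketch of the handoff: the subagent returns data, the coordinator owns all
// project-DB writes. runSubagent and callTool are illustrative names.
async function delegateAndPersist(coordinator, taskPrompt) {
  // 1. Subagent runs in isolation and returns a structured result on stdout.
  const output = await coordinator.runSubagent({ prompt: taskPrompt });

  // 2. Coordinator parses and validates the finding.
  const finding = JSON.parse(output); // e.g. { taskId, status, notes }

  // 3. Only the coordinator calls project-DB tools (.sf/sf.db writes).
  if (finding.status === 'complete') {
    await coordinator.callTool('complete-task', {
      taskId: finding.taskId,
      notes: finding.notes,
    });
  }
  return finding;
}
```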

Architectural backing for the constraint

// In subagent tool — formalize the access contract
const SUBAGENT_DB_ACCESS = {
  read: ['project_context'],    // via prompt injection only
  write: ['~/.sf/sf.db'],       // global state only
  prohibited: ['project .sf/sf.db write operations']
};

The extension manifest's tools[] array currently enforces this by omission. A more explicit model would declare the access contract formally, making it auditable.

Future: RPC-mode subagent

Keep the current constraint for spawned-CLI subagents.

Add a new subagent mode, dispatch.work({ mode: 'rpc' }), where the subagent runs as an RPC client in the same process, gaining access to all SF tools. This is the headless equivalent of the subagent tool. Use it for internal SF workflows (e.g., "dispatch a review subagent that calls complete-task").


6. Naming — The Mental Model

Current → Proposed

| Current | Problem | Proposed | Rationale |
| --- | --- | --- | --- |
| subagent tool | "subagent" implies a lesser agent | dispatch tool (in TUI) | The tool is the dispatch API surface |
| parallel-orchestrator | "orchestrator" is vague; doesn't convey worktree isolation | milestone-dispatcher | Conveys scope + role |
| slice-parallel-orchestrator | Duplicate of above | slice-dispatcher (merged into WorktreeOrchestrator) | See section 2 |
| WorktreeOrchestrator (new) |  | worktree-orchestrator | Backend that manages worktree lifecycle |
| UOK kernel | "kernel" implies OS-level; "UOK" is jargon | autonomous-runtime | PDD-gated execution loop |
| MessageBus | Generic | keep as-is | It is a bus pattern. Keep it. |
| Cmux | "cmux" is implementation detail | surface-grid | User-facing: "show agents in a grid" |

The mental model hierarchy

dispatch           — The high-level API and TUI tool name
  ├── work()       — Run a single unit (milestone/slice/task)
  ├── batch()      — Run multiple units in parallel (worktree pool)
  ├── chain()      — Run units sequentially, passing output
  ├── debate()     — Run units as adversarial roles
  └── subscribe()  — Listen to dispatch events (MessageBus)

worktree-orchestrator  — Backend: worktree lifecycle + process registry
autonomous-runtime     — PDD gate model (UOK kernel, renamed)
MessageBus             — Durable inter-agent messaging (keeps name)
surface-grid           — Cmux decoupled from dispatch
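
As a sketch of how that surface could be wired, the method names below mirror the hierarchy above, while createDispatch and the subscription topic are assumptions; chain() and debate() would keep delegating to the subagent tool's modes.

```js
// Sketch of the `dispatch` API surface. Method names mirror the hierarchy
// above; createDispatch and the subscription topic are assumptions.
function createDispatch({ orchestrator, bus }) {
  return {
    // Run a single unit through the worktree backend.
    work: (unit, opts = {}) =>
      orchestrator.start([unit.id], { ...opts, scope: unit.scope }),
    // Run a conflict-free batch in parallel worktrees.
    batch: (units, opts = {}) =>
      orchestrator.start(units.map((u) => u.id), opts),
    // Listen to dispatch events published on the MessageBus.
    subscribe: (handler) => bus.subscribe('dispatch', handler),
  };
}
```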

Why "kernel" is the right metaphor for UOK (keep it internally):
A kernel manages resources and enforces policy; it doesn't do the work itself. The UOK kernel evaluates confidence/risk gates, manages parity reporting, and decides when to proceed — but it delegates execution to WorktreeOrchestrator. The name fits.


7. Implementation Priority

Phase 1 — Merge the two orchestrators (Lowest risk, highest clarity)

1.1 — Extract WorktreeOrchestrator from parallel-orchestrator + slice-parallel

Create new dispatch-layer.js that merges the ~80% shared logic. Parameterized by { scope: 'milestone' | 'slice' }. The slice orchestrator's conflict-filtering logic (filterConflictingSlices in slice-parallel-conflict.ts) already lives separately — call it from the merged class.
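
A sketch of the scope parameterization is below. The lock env var names, status-file patterns, and the import path for the existing conflict filter are placeholders, not the real values.

```js
// Sketch of the scope parameterization. Env var names, status-file patterns,
// and the conflict-filter import path are placeholders, not the real values.
const { filterConflictingSlices } = require('./slice-parallel-conflict');

const SCOPE_CONFIG = {
  milestone: {
    lockEnvVar: 'SF_MILESTONE_LOCK',                   // placeholder name
    statusFile: (id) => `milestone-${id}.status.json`, // placeholder pattern
    filterConflicts: null,                             // execution graph handles milestone conflicts
  },
  slice: {
    lockEnvVar: 'SF_SLICE_LOCK',                       // placeholder name
    statusFile: (id) => `slice-${id}.status.json`,     // placeholder pattern
    filterConflicts: filterConflictingSlices,          // existing slice overlap filter
  },
};

class DispatchLayer {
  constructor({ scope, ...opts }) {
    this.scope = scope;
    this.cfg = SCOPE_CONFIG[scope];
    this.opts = opts;
  }

  // Slice scope filters by file overlap on top of the execution graph;
  // milestone scope relies on the execution graph alone.
  eligibleUnits(units) {
    return this.cfg.filterConflicts ? this.cfg.filterConflicts(units) : units;
  }
}

module.exports = { DispatchLayer };
```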

Files touched:

  • New: src/resources/extensions/sf/dispatch-layer.js
  • Refactor: parallel-orchestrator.js → thin wrapper calling dispatch-layer
  • Refactor: slice-parallel-orchestrator.js → thin wrapper calling dispatch-layer

Test: Both /parallel command and slice-level parallelism continue to work identically. Dashboard continues to show correct worker states.

Effort: ~1 week. Pure refactor, no behavior change.

Phase 2 — Wire MessageBus into WorktreeOrchestrator

2.1 — Add AgentInbox to each worker

Every sf headless worker opens a MessageBus inbox named after its milestone/slice ID. The coordinator can send messages to workers (pause, resume, report status).

2.2 — Replace file-based IPC with MessageBus

Replace session-status-io.js polling and sendSignal file-based signals with MessageBus send(). File-based signals remain as crash-recovery fallback.
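
A sketch of the MessageBus-backed path is below. The MessageBus method names (send, subscribe) and topic strings are assumptions, since the bus API is not spelled out in this plan; file-based signals remain the crash-recovery fallback.

```js
// Sketch of MessageBus-backed coordination. The MessageBus method names
// (send, subscribe) and topic strings are assumptions.

// Worker side: publish status instead of only writing a status file.
function publishWorkerStatus(bus, workerId, status) {
  bus.send({
    to: 'coordinator',
    from: `worker:${workerId}`,
    type: 'dispatch.status',
    body: { workerId, status, at: Date.now() },
  });
}

// Coordinator side: react to messages instead of polling the filesystem.
function watchWorkers(bus, onStatus) {
  return bus.subscribe('coordinator', (msg) => {
    if (msg.type === 'dispatch.status') onStatus(msg.body);
  });
}
```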

Files touched:

  • dispatch-layer.js (new)
  • session-status-io.js (add MessageBus-backed path)
  • Worker bootstrap in both orchestrators

Test: Workers respond to coordinator pause/resume messages delivered via MessageBus.

Effort: ~3 days.

Phase 3 — Subagent RPC Mode

3.1 — Add dispatch.rpc() — spawn a headless RPC client (not CLI)

The 4-tool limitation goes away when subagent is an RPC client. The subagent keeps its 4-mode interface; the implementation changes.

3.2 — Ensure subagent RPC mode cannot access tools the parent mode doesn't permit

Security boundary must be preserved. This is where the access contract from section 5 gets enforced.
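
A sketch of that enforcement point follows: the RPC-mode subagent's tool surface is the intersection of what it requests and what the parent session permits. The names allowedTools, callTool, and createScopedRpcClient are illustrative, not existing APIs.

```js
// Sketch of the enforcement point. Names (allowedTools, callTool,
// createScopedRpcClient) are illustrative, not existing SF APIs.
function createScopedRpcClient(parentSession, requestedTools) {
  const allowed = new Set(parentSession.allowedTools);
  const granted = requestedTools.filter((t) => allowed.has(t));

  return {
    grantedTools: granted,
    callTool(name, args) {
      if (!granted.includes(name)) {
        throw new Error(`subagent rpc: tool "${name}" not permitted by parent session`);
      }
      return parentSession.callTool(name, args);
    },
  };
}
```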

Files touched:

  • extensions/subagent/index.js (add RPC mode path)
  • extensions/subagent/rpc-client.js (new)

Test: A subagent with mode: 'rpc' can call complete-task and other SF tools.

Effort: ~1 week.

Phase 4 — UOK Kernel Adopts WorktreeOrchestrator

4.1 — Replace direct parallel-orchestrator calls with WorktreeOrchestrator

The autonomous loop's parallel dispatch path (analyzeParallelEligibility → startParallel) goes through WorktreeOrchestrator instead of calling parallel-orchestrator directly.

4.2 — UOK reads worker status from WorktreeOrchestrator

Dashboard refresh reads from orchestrator.getStatus() instead of directly from parallel-orchestrator's state.

Files touched:

  • uok/kernel.js (import WorktreeOrchestrator)
  • parallel-orchestrator.js (becomes wrapper or is removed)

Test: Autonomous mode with parallel milestones works identically to current behavior.

Effort: ~3 days.

Phase 5 — Cmux Decoupling

5.1 — Make Cmux grid layout driven by MessageBus events

Dispatch should not call Cmux directly. Cmux subscribes to MessageBus dispatch events and creates/destroys grid surfaces accordingly.

5.2 — Remove cmuxSplitsEnabled from subagent tool

This is the concrete coupling point — dispatch knows about Cmux grid layouts. Remove it; let Cmux manage its own surface allocation based on dispatch events.
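
A sketch of the subscriber side is below. The event names and the surface API (createSurface/destroySurface) are assumptions, not the current Cmux API.

```js
// Sketch of the subscriber side. Event names and the surface API
// (createSurface/destroySurface) are assumptions, not the current Cmux API.
function attachCmuxToDispatch(bus, cmux) {
  return bus.subscribe('dispatch', (msg) => {
    switch (msg.type) {
      case 'dispatch.worker.started':
        cmux.createSurface({ id: msg.body.workerId, title: msg.body.unitId });
        break;
      case 'dispatch.worker.finished':
      case 'dispatch.worker.failed':
        cmux.destroySurface(msg.body.workerId);
        break;
    }
  });
}
```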

Files touched:

  • cmux/index.js (add MessageBus subscriber)
  • extensions/subagent/index.js (remove cmuxSplitsEnabled)

Effort: ~2 days.

Phase 6 — Naming Cleanup

6.1 — Rename dispatch-layer.js → worktree-orchestrator.js
6.2 — Rename parallel-orchestrator wrapper → milestone-dispatcher.js
6.3 — Rename slice-parallel-orchestrator wrapper → slice-dispatcher.js
6.4 — Update all import references

Effort: ~1 day.

Phase 7 — Document the Architecture

7.1 — Update ARCHITECTURE.md
Add a section on the unified dispatch architecture. Current dispatch docs are scattered across inline comments and session-status-io.

Effort: ~1 day.


Summary

The 5 dispatch mechanisms + 1 message bus represent 3 genuinely different needs (UOK autonomous loop, worktree-based isolation, durable inter-agent messaging) and 3 duplications (parallel-orchestrator + slice-parallel-orchestrator; subagent parallel mode + parallel-orchestrator; Cmux tight coupling).

The plan:

  1. Merge parallel-orchestrator + slice-parallel-orchestrator into WorktreeOrchestrator
  2. Wire MessageBus into WorktreeOrchestrator — workers become reachable via durable messaging
  3. UOK kernel becomes the controller that calls WorktreeOrchestrator, not a parallel system
  4. Subagent tool stays separate — it's ad-hoc in-session delegation, with an optional RPC mode for internal workflows
  5. Cmux becomes a MessageBus subscriber, decoupled from dispatch
  6. DB access model is already correct: spawned subagents cannot write to project DB; workers dispatched via WorktreeOrchestrator can

The adversarial_partner/adversarial_combatant/adversarial_architect fields already in the DB are planning ceremony fields (Letta-inspired), not dispatch mechanism fields. They belong in the PDD planning layer, not in the dispatch layer.

Total effort estimate: 3-4 weeks across 7 phases, sequenced to preserve existing behavior at each step.