singularity-forge/docs/plans/UNIFIED_DISPATCH_V2_PLAN.md

# Unified Dispatch v2 — Qwen Plan
**Author:** Architecture research
**Date:** 2026-05-08
**Status:** Structured plan for review
**Supersedes:** `DISPATCH_ARCHITECTURE_CONSOLIDATION.md`, `dispatch-orchestration-architecture.md`, `DISPATCH_ORCHESTRATION_PLAN.md`
---
## The Unified Vision
SF should support a single dispatch system where ALL of these coexist and compose:
1. **Full-tool agents** — workers with all SF tools + full project DB access (today's parallel-orchestrator workers)
2. **Constrained subagents** — the current subagent tool (4 tools, no project DB writes)
3. **MessageBus-coordinated agents** — agents with AgentInbox, communicating via MessageBus (durable inbox, not file-based IPC)
4. **Coordinators on MessageBus too** — UOK kernel publishes to workers via MessageBus, workers reply via MessageBus
5. **All in parallel/debate/chain** — the subagent tool's 4 modes apply to ALL of the above
6. **Shared SQLite WAL** — all agents that need project state share the same DB
7. **Optional MessageBus inbox for subagents** — subagents can opt in to receive coordinator messages
The dispatch layer is **ONE system** parameterized by four dimensions:
| Dimension | Values | Meaning |
|-----------|--------|---------|
| `isolation` | `'full'` \| `'constrained'` | Full: all SF tools + project DB writes. Constrained: 4 tools, no project DB writes. |
| `coordination` | `'standalone'` \| `'managed'` | Standalone: no MessageBus. Managed: has AgentInbox, coordinator can message. |
| `scope` | `'milestone'` \| `'slice'` \| `'task'` \| `'inline'` | The work unit hierarchy. |
| `mode` | `'single'` \| `'parallel'` \| `'debate'` \| `'chain'` | How many agents run and in what relationship. |
---
## Current State Map
| Mechanism | Isolation | Coordination | Scope | Mode | Key Files |
|-----------|-----------|--------------|-------|------|-----------|
| `parallel-orchestrator.js` | `full` | `standalone` (file-based IPC) | `milestone` | `parallel` | `src/resources/extensions/sf/parallel-orchestrator.js` |
| `slice-parallel-orchestrator.js` | `full` | `standalone` (file-based IPC) | `slice` | `parallel` | `src/resources/extensions/sf/slice-parallel-orchestrator.js` |
| Subagent tool (`extensions/subagent/index.js`) | `constrained` (4 tools) | `standalone` | `inline` | `single/parallel/debate/chain` | `src/resources/extensions/subagent/index.js` |
| UOK kernel (`uok/kernel.js`) | N/A (runs in-process) | `standalone` (no MessageBus) | `milestone` | `single` (autonomous loop) | `src/resources/extensions/sf/uok/kernel.js` |
| MessageBus (`uok/message-bus.js`) | N/A | N/A | N/A | N/A | `src/resources/extensions/sf/uok/message-bus.js` |
---
## Q1: Unified `dispatch()` API
### Design Principle
One function, four parameters. Every dispatch configuration is a point in the 4D parameter space.
```ts
// File: src/resources/extensions/sf/dispatch-layer.js (proposed)
export interface DispatchOptions {
  // ── Isolation ─────────────────────────────────────────────────────────
  // 'full': all SF tools, project DB read/write (milestone workers, slice workers)
  // 'constrained': 4 tools only, no project DB writes (subagent, scout, reviewer, reporter)
  isolation: 'full' | 'constrained';

  // ── Coordination ──────────────────────────────────────────────────────
  // 'standalone': no MessageBus, no coordinator messaging
  // 'managed': AgentInbox per worker, coordinator can send pause/resume/stop/status messages
  coordination: 'standalone' | 'managed';

  // ── Scope ─────────────────────────────────────────────────────────────
  // 'milestone': git worktree per milestone, SF_MILESTONE_LOCK set
  // 'slice': git worktree per slice within a milestone, SF_SLICE_LOCK + SF_MILESTONE_LOCK set
  // 'task': no worktree, runs in same process or short-lived subprocess
  // 'inline': in-process agent call (subagent single mode, no worktree)
  scope: 'milestone' | 'slice' | 'task' | 'inline';

  // ── Mode ──────────────────────────────────────────────────────────────
  // 'single': run one agent
  // 'parallel': run N agents concurrently (up to maxConcurrency)
  // 'debate': bounded adversarial rounds, each agent sees prior rounds' output
  // 'chain': sequential, each step's output feeds into next step as {previous}
  mode: 'single' | 'parallel' | 'debate' | 'chain';

  // ── Common fields ─────────────────────────────────────────────────────
  unitId: string;              // milestoneId, sliceId, or taskId
  milestoneId?: string;        // required when scope='slice'
  agent: string | string[];    // agent name(s); array for parallel/debate/chain
  task?: string;               // task description (single mode)
  tasks?: TaskItem[];          // task list (parallel/debate mode)
  chain?: ChainItem[];         // chain steps (chain mode)
  basePath: string;
  maxConcurrency?: number;     // default: 4 for parallel/debate, 1 for chain
  budgetCeiling?: number;
  workerTimeoutMs?: number;
  shellWrapper?: string[];
  useExecutionGraph?: boolean; // use file-conflict DAG to filter parallel set
  modelOverride?: string;
  parentTrace?: string;        // audit context injected for review subagents

  // ── MessageBus routing (when coordination='managed') ──────────────────
  coordinatorId?: string;      // agentId of the coordinator (for routing replies)
}

export interface TaskItem {
  agent: string;
  task: string;
  cwd?: string;
  model?: string;
  parentTrace?: string;
}

export interface ChainItem {
  agent: string;
  task: string;                // may contain {previous} placeholder
  cwd?: string;
  model?: string;
  parentTrace?: string;
}

export class DispatchLayer {
  readonly bus: MessageBus;    // shared bus for all dispatches

  constructor(basePath: string, busOptions?: MessageBusOptions);

  // ── Core dispatch ─────────────────────────────────────────────────────
  async dispatch(opts: DispatchOptions): Promise<DispatchResult>;

  // ── Batch helpers ─────────────────────────────────────────────────────
  async dispatchMilestones(milestoneIds: string[], opts: Partial<DispatchOptions>): Promise<StartResult>;
  async dispatchSlices(milestoneId: string, sliceIds: string[], opts: Partial<DispatchOptions>): Promise<StartResult>;

  // ── Lifecycle ─────────────────────────────────────────────────────────
  async stop(workIds?: string[]): Promise<void>;
  pause(workIds?: string[]): void;   // via MessageBus when managed
  resume(workIds?: string[]): void;  // via MessageBus when managed

  // ── State ─────────────────────────────────────────────────────────────
  getStatus(): DispatchStatus;
  totalCost(): number;
  isBudgetExceeded(): boolean;

  // ── Event subscription ────────────────────────────────────────────────
  subscribe(handler: DispatchEventHandler): UnsubscribeFn;
}
```
### Parameter Matrix
| isolation | coordination | scope | mode | Current equivalent | DB access |
|-----------|---------------|-------|------|-------------------|-----------|
| `full` | `managed` | `milestone` | `parallel` | `parallel-orchestrator.js` | project DB read/write |
| `full` | `managed` | `slice` | `parallel` | `slice-parallel-orchestrator.js` | project DB read/write |
| `constrained` | `standalone` | `inline` | `single` | subagent single mode | no project DB |
| `constrained` | `standalone` | `inline` | `parallel` | subagent parallel mode | no project DB |
| `constrained` | `standalone` | `inline` | `debate` | subagent debate mode | no project DB |
| `constrained` | `standalone` | `inline` | `chain` | subagent chain mode | no project DB |
| `constrained` | `managed` | `inline` | `single` | **new: managed subagent** | no project DB |
| `full` | `managed` | `inline` | `single` | **new: headless autonomous** | project DB read/write |
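Several cross-field rules are implicit in the Q1 comments (`milestoneId` required when `scope='slice'`, `chain[]` required for chain mode, `tasks[]` for parallel/debate). A minimal validator sketch; the function name and error strings are illustrative, not from the codebase:

```typescript
// Hypothetical up-front validator for the 4D parameter space
type Isolation = 'full' | 'constrained';
type Coordination = 'standalone' | 'managed';
type Scope = 'milestone' | 'slice' | 'task' | 'inline';
type Mode = 'single' | 'parallel' | 'debate' | 'chain';

interface DispatchShape {
  isolation: Isolation;
  coordination: Coordination;
  scope: Scope;
  mode: Mode;
  milestoneId?: string;
  tasks?: unknown[];
  chain?: unknown[];
}

// Returns a list of violations; empty means the configuration is a valid point in the space
function validateDispatch(opts: DispatchShape): string[] {
  const errors: string[] = [];
  if (opts.scope === 'slice' && !opts.milestoneId) {
    errors.push("scope 'slice' requires milestoneId");
  }
  if (opts.mode === 'chain' && !(opts.chain && opts.chain.length > 0)) {
    errors.push("mode 'chain' requires a non-empty chain[]");
  }
  if ((opts.mode === 'parallel' || opts.mode === 'debate') && !(opts.tasks && opts.tasks.length > 0)) {
    errors.push(`mode '${opts.mode}' requires a non-empty tasks[]`);
  }
  return errors;
}
```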
---
## Q2: MessageBus as the Backbone
### Answer: Yes, ALL coordinator → worker and worker → coordinator communication flows through MessageBus.
The file-based IPC (`session-status-io.js`, `sendSignal`) becomes a **crash-recovery fallback only**, not the primary path.
### What MessageBus replaces today
| Current mechanism | Replaced by | Where defined |
|-------------------|-------------|---------------|
| `session-status-io.js` — write/read session status files | `MessageBus.send()` status messages to `dispatch:coordinator:<runId>` | `dispatch-layer.js` |
| `sendSignal(basePath, mid, "pause\|resume\|stop")` | `MessageBus.send(coordinatorId, workerId, "pause")` | `dispatch-layer.js` |
| `consumeSignal(basePath, mid)` | Worker polls `AgentInbox.list()` | Worker bootstrap |
| `parallel-intent.js` — CoordinationStore for file intent | `MessageBus.send()` to `coordinator/file-intent` | `dispatch-layer.js` |
### Worker inbox naming convention
```
Workers get AgentInbox named: dispatch:<scope>:<unitId>
e.g., dispatch:milestone:M01
e.g., dispatch:slice:M01/S01
Coordinator gets AgentInbox named: dispatch:coordinator:<runId>
```
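Two tiny helpers pin the convention down; these names are illustrative, not existing codebase functions:

```typescript
// Hypothetical helpers encoding the inbox naming convention above
type Scope = 'milestone' | 'slice' | 'task' | 'inline';

function workerInboxId(scope: Scope, unitId: string): string {
  return `dispatch:${scope}:${unitId}`;
}

function coordinatorInboxId(runId: string): string {
  return `dispatch:coordinator:${runId}`;
}
```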
### MessageBus event taxonomy
```ts
// Worker → Coordinator messages (via worker's AgentInbox addressed to coordinatorId)
type WorkerStatusMessage =
| { type: 'worker.started', milestoneId: string, sliceId?: string, pid: number, worktreePath: string }
| { type: 'worker.heartbeat', milestoneId: string, cost: number, currentUnit?: string }
| { type: 'worker.completed', milestoneId: string, exitCode: number, totalCost: number }
| { type: 'worker.error', milestoneId: string, error: string }
| { type: 'worker.paused', milestoneId: string }
| { type: 'worker.resumed', milestoneId: string };
// Coordinator → Worker messages (via coordinator's AgentInbox addressed to workerId)
type CoordinatorCommandMessage =
| { type: 'coordinator.pause' }
| { type: 'coordinator.resume' }
| { type: 'coordinator.stop' }
| { type: 'coordinator.status_request' };
// Broadcast messages (coordinator → all workers)
type BroadcastMessage =
| { type: 'coordinator.budget_exceeded', ceiling: number }
| { type: 'coordinator.sibling_failed', triggeringWorkerId: string };
```
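Because every message carries a `type` discriminant, the coordinator can handle the union exhaustively with a switch. A sketch, assuming the worker → coordinator shapes above (the `summarize` helper is illustrative):

```typescript
// Discriminated-union handling sketch for worker → coordinator messages
type WorkerStatusMessage =
  | { type: 'worker.started'; milestoneId: string; pid: number; worktreePath: string }
  | { type: 'worker.heartbeat'; milestoneId: string; cost: number; currentUnit?: string }
  | { type: 'worker.completed'; milestoneId: string; exitCode: number; totalCost: number }
  | { type: 'worker.error'; milestoneId: string; error: string };

function summarize(msg: WorkerStatusMessage): string {
  switch (msg.type) {
    case 'worker.started':
      return `${msg.milestoneId} started (pid ${msg.pid})`;
    case 'worker.heartbeat':
      return `${msg.milestoneId} heartbeat, cost ${msg.cost}`;
    case 'worker.completed':
      return `${msg.milestoneId} completed with exit ${msg.exitCode}`;
    case 'worker.error':
      return `${msg.milestoneId} error: ${msg.error}`;
  }
}
```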
### Worker bootstrap changes
Today, a worker (`sf headless --json autonomous` in a worktree) has no MessageBus integration. It writes status files that the coordinator polls. After Q2:
```ts
// In the worker bootstrap (inside the sf headless autonomous process)
const workerId = `dispatch:${scope}:${unitId}`;
const inbox = dispatchLayer.bus.getInbox(workerId);

// On each dispatch tick, drain unread coordinator messages
for (const msg of inbox.list(true)) { // true = unread only
  inbox.markRead(msg.id); // mark before acting so a stop is not re-processed
  if (msg.body.type === 'coordinator.stop') {
    // graceful shutdown: finish the current unit, then exit
    break;
  }
}
```
### File-based IPC as fallback
`session-status-io.js` stays for **crash recovery**: if a coordinator restarts and workers are still running, the coordinator reads `session-status.json` files to restore state. This is already implemented and correct — it just stops being the *primary* coordination path.
### Code reference for existing file-based IPC
- `session-status-io.js:writeSessionStatus()` — atomic JSON write to `.sf/parallel/<mid>.status.json`
- `session-status-io.js:sendSignal()` — atomic JSON write to `.sf/parallel/<mid>.signal.json`
- `parallel-orchestrator.js:refreshWorkerStatuses()` — polls all status files every dashboard refresh cycle
- `parallel-orchestrator.js:processWorkerLine()` — parses NDJSON from worker stdout, updates status
---
## Q3: DB Access Matrix
### The single-writer invariant
`sf-db.js` enforces that **only this file** issues write SQL (INSERT/UPDATE/DELETE) against `.sf/sf.db`. All other modules must call typed wrappers exported here. This is checked in CI by `tests/single-writer-invariant.test.ts`.
### DB access per dispatch configuration
| Dispatch config | Project DB (.sf/sf.db) | Global DB (~/.sf/sf.db) | Notes |
|----------------|------------------------|--------------------------|-------|
| `isolation:full, scope:milestone` | **read/write** | read | Workers open WAL connection to project DB via `syncSfStateToWorktree` |
| `isolation:full, scope:slice` | **read/write** | read | Same as above |
| `isolation:full, scope:inline` | **read/write** | read | `sf headless autonomous` running in same process (UOK kernel's autonomous mode) |
| `isolation:constrained, scope:inline` | **read via prompt injection only** | read/write | Subagent spawns `sf` CLI which opens `~/.sf/sf.db` only; project DB accessed via prompt context injection |
| `isolation:constrained, scope:task` | **none** | read/write | Ephemeral task dispatch (future) |
### How constrained isolation is enforced
The subagent tool (`extensions/subagent/index.js`) spawns `sf` CLI as a **separate OS process**:
```ts
// subagent/index.js — runSingleAgent()
const child = spawn(launchSpec.command, launchSpec.args, {
  cwd: cwd ?? defaultCwd,
  env: launchSpec.env, // inherits parent env but NOT the RPC connection
  shell: false,
  stdio: ["ignore", "pipe", "pipe"],
});
```
The spawned `sf` CLI opens its **own** SQLite connection — by default `~/.sf/sf.db` (global). Project DB access happens only through **prompt injection** (`system-context.js` assembles project context into the system prompt).
The 4-tool registry (`subagent`, `await_subagent`, `cancel_subagent`, background job tools) is enforced by the extension manifest's `tools[]` array, not by DB permissions.
### The access contract for constrained dispatch
```ts
// In dispatch-layer.js — formalize the access contract
const DISPATCH_DB_ACCESS = {
  full: {
    read: ['project .sf/sf.db', 'global ~/.sf/sf.db'],
    write: ['project .sf/sf.db', 'global ~/.sf/sf.db'],
  },
  constrained: {
    read: ['project context via prompt injection only'],
    write: ['global ~/.sf/sf.db only'],
  },
} as const;
```
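A runtime guard over this contract could fail fast if a constrained dispatch path ever reaches a project-DB write. `assertProjectWriteAllowed` is a hypothetical helper, not an existing function:

```typescript
// Hypothetical guard derived from the DISPATCH_DB_ACCESS contract above
const PROJECT_DB_WRITABLE = {
  full: true,         // project .sf/sf.db read/write
  constrained: false, // project context via prompt injection only
} as const;

function assertProjectWriteAllowed(isolation: keyof typeof PROJECT_DB_WRITABLE): void {
  if (!PROJECT_DB_WRITABLE[isolation]) {
    throw new Error(`isolation '${isolation}' may not write the project DB`);
  }
}
```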
### Future: RPC-mode subagent (constrained + managed)
Today: `isolation:constrained, coordination:standalone`
New mode: `isolation:constrained, coordination:managed`
A subagent with `coordination:'managed'` gets an `AgentInbox` but still cannot write to the project DB. This enables long-running subagents to receive pause/stop messages from the coordinator without gaining DB write access.
---
## Q4: Coordinator Pattern with Coordinators on MessageBus
### Answer: Coordinators ARE MessageBus agents. Debate and chain work differently.
The coordinator is not a separate process — it is a **role** that a MessageBus agent plays when it initiates dispatch and monitors replies.
### Two coordinator patterns
**Pattern A — UOK Kernel as Coordinator (full-tool, managed, milestone/slice scope)**
The UOK kernel initializes a `DispatchLayer` with `coordination:'managed'`. Each worker has an `AgentInbox` named `dispatch:milestone:<mid>`; the kernel listens on its own coordinator inbox (`dispatch:coordinator:<runId>`) via the shared `MessageBus`.
```
UOK Kernel (coordinatorId: "dispatch:coordinator:<runId>")
├── Workers report status to the coordinator inbox:
│     MessageBus.send("dispatch:milestone:M01", "dispatch:coordinator:<runId>", { type: 'worker.started', ... })
│     MessageBus.send("dispatch:milestone:M02", "dispatch:coordinator:<runId>", { type: 'worker.started', ... })
├── Workers each poll their own AgentInbox for coordinator commands:
│     AgentInbox("dispatch:milestone:M01").list()
│     AgentInbox("dispatch:milestone:M02").list()
├── UOK kernel processes milestone completions
└── Calls dispatchLayer.stop() on autonomous loop exit
```
**Pattern B — Subagent Tool as Coordinator (constrained, standalone or managed, inline scope)**
The subagent tool itself is the coordinator for its `parallel`, `debate`, and `chain` modes. It does **not** use MessageBus for coordination in `standalone` mode — it uses `Promise.all` + in-process event streaming.
In `managed` mode (new), the subagent tool would also have an `AgentInbox` so the parent TUI session can send it pause/stop messages.
### How debate mode works (subagent tool, NOT using MessageBus for agent coordination)
The subagent tool's debate mode (`subagent/index.js:executeSubagentInvocation()`, line 320) runs multiple agents **sequentially within a single process**; in-process fan-out elsewhere is bounded by `mapWithConcurrencyLimit` (`MAX_CONCURRENCY = 4`):
```ts
// subagent/index.js — debate mode
for (let round = 1; round <= rounds; round++) {
  for (let i = 0; i < batchTasks.length; i++) {
    // buildDebatePrompt() injects prior round transcripts
    const prompt = buildDebatePrompt(task, round, transcriptEntries.join("\n\n"));
    const result = await runSingleAgent(..., prompt, ...);
    debateResults[(round - 1) * batchTasks.length + i] = result;
    transcriptEntries.push(formatResult(result));
  }
}
```
This is **not** MessageBus-based because:
1. Agents in a debate share a **single conversation transcript** — they must run sequentially and pass state through the coordinator's memory
2. True process-level parallelism would require separate conversation contexts, which breaks the shared-transcript model
3. The coordinator IS the single-agent orchestrator that sequences rounds and injects transcripts
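The round-sequencing logic can be modeled in a few lines. The sketch below substitutes a synchronous stub for `runSingleAgent` and shows that each agent call sees every transcript entry produced before it:

```typescript
// Debate sequencing sketch: rounds are sequential, each call sees the prior transcript
function runDebate(
  tasks: string[],
  rounds: number,
  run: (task: string, transcript: string) => string, // stub in place of runSingleAgent
): string[] {
  const transcript: string[] = [];
  const results: string[] = [];
  for (let round = 1; round <= rounds; round++) {
    for (const task of tasks) {
      const out = run(task, transcript.join('\n'));
      results.push(out);
      transcript.push(out); // later agents see this round's output too
    }
  }
  return results;
}
```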
### How chain mode works (subagent tool)
Chain mode (`subagent/index.js:executeSubagentInvocation()`, line 220) runs steps **sequentially**, passing output as `{previous}`:
```ts
let previousOutput = "";
for (let i = 0; i < params.chain.length; i++) {
  const step = params.chain[i];
  const taskWithContext = step.task.replace(/\{previous\}/g, previousOutput);
  const result = await runSingleAgent(..., taskWithContext, ...);
  results.push(result);
  previousOutput = getFinalOutput(result.messages);
}
```
**Chain does NOT need MessageBus** — it's purely sequential, and the coordinator (subagent tool) holds state in memory.
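A self-contained version of that loop, with a synchronous stub standing in for `runSingleAgent`, shows the `{previous}` threading:

```typescript
// Chain sequencing sketch: each step's output replaces {previous} in the next task
type ChainStep = { agent: string; task: string };

function runChain(
  chain: ChainStep[],
  run: (agent: string, task: string) => string, // stub in place of runSingleAgent
): string[] {
  const outputs: string[] = [];
  let previous = '';
  for (const step of chain) {
    const taskWithContext = step.task.replace(/\{previous\}/g, previous);
    previous = run(step.agent, taskWithContext);
    outputs.push(previous);
  }
  return outputs;
}
```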
### When MessageBus IS used for coordinator ↔ workers
| Mode | MessageBus needed? | Why |
|------|---------------------|-----|
| `single` (inline) | No | One agent, no coordination needed |
| `parallel` (inline, subagent) | No | Coordinator uses `Promise.all`, in-process |
| `parallel` (milestone/slice, WorktreeOrchestrator) | **Yes** | Workers are separate processes; coordinator needs durable signaling |
| `debate` | No | Sequential rounds with in-memory transcript; process-level parallelism defeats shared context |
| `chain` | No | Purely sequential with in-memory `{previous}` injection |
| `managed` subagent | **Yes** | Parent TUI needs to send pause/stop to long-running subagent |
### The coordinator IS a MessageBus agent
```ts
// In DispatchLayer constructor
this.coordinatorInbox = this.bus.getOrCreateInbox(`dispatch:coordinator:${runId}`);
// When a worker wants to send to the coordinator:
this.bus.send(workerId, coordinatorId, { type: 'worker.completed', ... });
// When coordinator wants to send to a worker:
this.bus.send(coordinatorId, workerId, { type: 'coordinator.pause' });
```
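The send/list/markRead semantics can be modeled in memory to make the addressing concrete. The real `MessageBus` is SQLite-backed with TTL and retention; this toy model is for illustration only:

```typescript
// In-memory toy model of MessageBus addressing: send(from, to, body)
interface BusMessage { id: number; from: string; to: string; body: unknown; read: boolean }

class ToyBus {
  private messages: BusMessage[] = [];
  private nextId = 1;

  send(from: string, to: string, body: unknown): number {
    const id = this.nextId++;
    this.messages.push({ id, from, to, body, read: false });
    return id;
  }

  // List an inbox's messages, optionally unread only
  list(to: string, unreadOnly = false): BusMessage[] {
    return this.messages.filter(m => m.to === to && (!unreadOnly || !m.read));
  }

  markRead(id: number): void {
    const msg = this.messages.find(m => m.id === id);
    if (msg) msg.read = true;
  }
}
```

Once a message is marked read it drops out of the unread view, which is what keeps the worker's per-tick inbox poll idempotent.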
---
## Q5: Migration from Today to Unified System
### Principle: Never break existing workflows. Build the new system alongside the old.
### Migration strategy: Strangler Fig
The old dispatch mechanisms are replaced one at a time, with the unified `DispatchLayer` absorbing their responsibilities. External behavior (CLI flags, command handlers, dashboard output) stays identical throughout.
### Phase 1 — Extract DispatchLayer (week 1-2)
**Goal**: Create `dispatch-layer.js` without changing behavior.
```ts
// New: src/resources/extensions/sf/dispatch-layer.js
// Internal implementation: delegates to existing parallel-orchestrator.js
// Public API: the DispatchOptions interface above
// parallel-orchestrator.js becomes a thin wrapper:
export async function startParallel(basePath, milestoneIds, prefs) {
  const layer = new DispatchLayer(basePath);
  return layer.dispatchMilestones(milestoneIds, {
    isolation: 'full',
    coordination: 'standalone',
    scope: 'milestone',
    mode: 'parallel',
    ...prefs,
  });
}
```
**Files touched**:
- New: `src/resources/extensions/sf/dispatch-layer.js` (~600 LOC, merged from both orchestrators)
- `parallel-orchestrator.js` — refactored to delegate to DispatchLayer
- `slice-parallel-orchestrator.js` — refactored to delegate to DispatchLayer
**Test**: All parallel and slice-parallel tests pass. Dashboard shows same worker states.
### Phase 2 — Wire MessageBus for coordinator → worker signaling (week 2-3)
**Goal**: Workers get `AgentInbox`, coordinator sends pause/resume/stop via MessageBus.
```ts
// In dispatch-layer.js — start() method
async start(ids: string[], opts: DispatchOptions) {
  // For each worker, create an AgentInbox
  for (const id of ids) {
    const workerId = `dispatch:${opts.scope}:${id}`;
    this.bus.getOrCreateInbox(workerId);
    // ... spawn worker with MessageBus integration
  }
}

// Worker bootstrap (sf headless autonomous process)
// On each dispatch tick:
const inbox = dispatchLayer.bus.getInbox(workerId);
for (const msg of inbox.list(true)) {
  handleCoordinatorMessage(msg.body);
  inbox.markRead(msg.id);
}
```
**Files touched**:
- `dispatch-layer.js` — add MessageBus send on start/pause/resume/stop
- Worker NDJSON bootstrap in `parallel-orchestrator.js` and `slice-parallel-orchestrator.js` — add inbox polling loop
**Test**: Workers respond to MessageBus pause/resume messages. File-based IPC (`session-status-io.js`) still works as crash-recovery fallback.
### Phase 3 — Subagent gets optional MessageBus inbox (week 3)
**Goal**: Subagents can opt in to `coordination:'managed'` for long-running tasks.
```ts
// In subagent tool params — new field
const SubagentParams = Type.Object({
  // ... existing fields ...
  managed: Type.Optional(Type.Boolean({
    description: 'Give this subagent a MessageBus AgentInbox so the coordinator can send pause/stop messages.',
    default: false,
  })),
});
```
**Files touched**:
- `extensions/subagent/index.js` — add `managed` parameter
- When `managed: true`, spawn `sf headless` (not `sf` CLI) so it can receive MessageBus messages
**Test**: Long-running subagent receives coordinator pause message via MessageBus.
### Phase 4 — UOK kernel adopts DispatchLayer (week 4)
**Goal**: UOK kernel calls `DispatchLayer` instead of directly managing parallel workers.
Today: `uok/kernel.js` calls `parallel-orchestrator.js` via separate import.
After: `uok/kernel.js` calls `DispatchLayer` which owns the worktree pool.
**Files touched**:
- `uok/kernel.js` — replace `import { startParallel } from '../parallel-orchestrator.js'` with `import { DispatchLayer }`
- `dispatch-layer.js` — add `useExecutionGraph` integration so UOK kernel's dispatch decisions use the file-conflict DAG
**Test**: `sf headless autonomous` with parallel milestones works identically to current behavior.
### Phase 5 — Deprecate file-based IPC paths (week 5-6)
**Goal**: MessageBus becomes primary, file-based IPC becomes pure fallback.
After Phase 2, file-based IPC (`session-status-io.js`) is still written by workers for crash recovery. After Phase 5, coordinator **stops reading** status files on the primary path — only reads them on startup if MessageBus has no worker state.
**Files touched**:
- `dispatch-layer.js` — change `refreshWorkerStatuses()` to read from MessageBus first, fall back to `session-status-io.js` only if no MessageBus state found
- `session-status-io.js` — keep as crash-recovery-only, mark as `@deprecated`
**Test**: Crash recovery still works. Primary path uses MessageBus.
### Phase 6 — Subagent RPC mode (week 6-7)
**Goal**: Constrained subagents can gain full tool access by running as headless RPC client.
Today, constrained subagent spawns `sf` CLI (full binary). New mode: spawns `sf headless` as RPC client.
```ts
// In dispatch-layer.js
async dispatch(opts: DispatchOptions): Promise<DispatchResult> {
  if (opts.isolation === 'constrained' && opts.rpcMode) {
    return this.dispatchAsRpcClient(opts); // calls sf headless RPC
  }
  // ...
}
```
**Files touched**:
- `extensions/subagent/index.js` — add `rpcMode: true` path
- New: `extensions/subagent/rpc-client.js`
**Test**: Subagent with `rpcMode: true` can call `complete-task`.
### Phase 7 — Naming cleanup (week 7)
**Goal**: Reflect the unified model in file names.
- `dispatch-layer.js` → `worktree-orchestrator.js` (or keep `dispatch-layer.js` if preferred — the name is already clear)
- Update all import paths
---
## Q6: Implementation Order
### The correct build sequence (not the same as migration order)
The key constraint: **subagent tool must stay stable** throughout because it is the primary user-facing tool. The UOK kernel and parallel orchestrator are internal — they can change more aggressively.
### Order:
```
1. Extract DispatchLayer (week 1-2)
→ No behavior change. Parallel/slice orchestrators delegate to it.
→ Test: all existing parallel/slice tests pass.
2. Subagent RPC mode (week 3)
→ Most impactful user-facing improvement.
→ Subagent with RPC mode can call complete-task and other SF tools.
→ Isolated from the rest of the refactor — subagent is its own path.
3. MessageBus wiring in DispatchLayer (week 4)
→ Coordinator → workers via MessageBus, not file IPC.
→ Worker bootstrap gets inbox polling.
→ File-based IPC becomes fallback only.
4. UOK kernel adopts DispatchLayer (week 5)
→ UOK kernel is internal. Breaks less if changed late.
→ After this, the UOK kernel is the coordinator for autonomous mode.
5. Subagent managed mode (week 6)
→ Optional MessageBus inbox for subagents.
→ Parent TUI can pause/stop long-running subagents.
6. Deprecate file-based IPC (week 7)
→ MessageBus becomes the only primary path.
→ session-status-io.js kept for crash recovery only.
7. Naming cleanup + Cmux decoupling (week 8)
→ Remove cmuxSplitsEnabled coupling from subagent tool.
→ Cmux subscribes to MessageBus dispatch events.
```
### Why this order
1. **Subagent RPC mode early (step 2)** — highest user impact with lowest risk. Subagent is a separate dispatch path; changes don't affect the parallel orchestrator or UOK kernel.
2. **MessageBus wiring before UOK kernel adoption** — UOK kernel is the most complex consumer. We want MessageBus as the backbone *before* we hook UOK kernel to it.
3. **UOK kernel adoption late** — it's internal infrastructure. Changing it last means we've already validated the DispatchLayer API in production-like conditions (parallel orchestrator + subagent).
4. **Cmux decoupling last** — it's UI, not dispatch. It can follow the architecture once the dispatch architecture is stable.
---
## Key Code References
### Parallel Orchestrator
- `src/resources/extensions/sf/parallel-orchestrator.js` — 820 lines, manages milestone workers
- `startParallel()` — spawns `sf headless --json autonomous` in worktrees
- `spawnWorker()` — sets `SF_MILESTONE_LOCK`, `SF_PROJECT_ROOT`, `SF_PARALLEL_WORKER` env vars
- `processWorkerLine()` — parses NDJSON, extracts cost from `message_end` events
- `refreshWorkerStatuses()` — polls `session-status-io.js` for worker state
### Slice Parallel Orchestrator
- `src/resources/extensions/sf/slice-parallel-orchestrator.js` — 90% identical to parallel-orchestrator.js
- Key diff: sets `SF_SLICE_LOCK` + `SF_MILESTONE_LOCK` env vars
- Calls `filterConflictingSlices()` from `slice-parallel-conflict.ts`
### Session Status (File-based IPC)
- `src/resources/extensions/sf/session-status-io.js` — 150 lines
- `writeSessionStatus()` — atomic write to `.sf/parallel/<mid>.status.json`
- `sendSignal()` / `consumeSignal()` — pause/resume/stop via `.sf/parallel/<mid>.signal.json`
### Subagent Tool
- `src/resources/extensions/subagent/index.js` — 2700 lines
- `runSingleAgent()` — spawns `sf` CLI, parses NDJSON events
- `executeSubagentInvocation()` — handles single/parallel/debate/chain modes
- `mapWithConcurrencyLimit()` — in-process concurrency for parallel/debate modes
- No DB tools registered (only 4 tools in extension manifest)
### Subagent Inheritance (DB Access Contract)
- `src/resources/extensions/sf/subagent-inheritance.js` — 220 lines
- `buildSubagentInheritanceEnvelope()` — captures parent mode for subagent dispatch
- `validateSubagentDispatch()` — rejects subagents that bypass provider allowlists
### MessageBus
- `src/resources/extensions/sf/uok/message-bus.js` — 280 lines
- `MessageBus.send()` — SQLite-backed durable send to AgentInbox
- `AgentInbox` — per-agent durable inbox with TTL and retention
### UOK Kernel
- `src/resources/extensions/sf/uok/kernel.js` — 220 lines
- `runAutoLoopWithUok()` — the autonomous loop entry point
- Currently calls `parallel-orchestrator.js` separately, not through a unified dispatch layer
### Execution Graph (Constraint Solver)
- `src/resources/extensions/sf/uok/execution-graph.js`
- `selectConflictFreeBatch()` — picks conflict-free parallel subset from file overlap DAG
- Already used by parallel-orchestrator, should be used by slice-parallel and UOK kernel
### DB Schema
- `src/resources/extensions/sf/sf-db.js` — single-writer invariant
- `milestones` table — `id, title, status, created_at, ...`
- `slices` table — `milestone_id, id, title, status, ...`
- `tasks` table — `milestone_id, slice_id, id, status, ...`
- `milestone_specs`, `slice_specs`, `task_specs` — immutable spec records
- `milestone_evidence`, `slice_evidence`, `task_evidence` — append-only audit trail
### Parallel Intent (File Claim Registry)
- `src/resources/extensions/sf/parallel-intent.js` — 170 lines
- `declareIntent()` — worker announces file intent before editing
- Uses `UokCoordinationStore` (Redis-like on SQLite) for TTL-based claims
---
## Summary
**Q1 — Unified API**: One `dispatch()` function with four parameters: `isolation × coordination × scope × mode`. Current 5 mechanisms collapse to one `DispatchLayer` class.
**Q2 — MessageBus backbone**: YES. All coordinator ↔ worker communication flows through MessageBus. File-based IPC (`session-status-io.js`) becomes crash-recovery fallback only.
**Q3 — DB access matrix**: `isolation:full` → project DB read/write. `isolation:constrained` → no project DB writes, reads via prompt injection only. Global DB always accessible. Enforced by process boundary (spawned CLI) and extension manifest tools array.
**Q4 — Coordinator pattern**: Coordinators ARE MessageBus agents. The UOK kernel gets a `DispatchLayer` coordinator inbox. Debate/chain modes do NOT use MessageBus — they are sequential in-memory coordination with in-process `Promise.all`. Subagent parallel mode is also in-process. MessageBus is for **cross-process** coordination only.
**Q5 — Migration**: Strangler Fig pattern. Extract `DispatchLayer` first (no behavior change). Then wire MessageBus. Then UOK kernel adopts it. Subagent RPC mode is independent of that sequence and can ship early.
**Q6 — Implementation order**:
1. Extract `DispatchLayer` (foundational, no behavior change)
2. Subagent RPC mode (highest impact, lowest risk)
3. Wire MessageBus into `DispatchLayer`
4. UOK kernel adopts `DispatchLayer`
5. Subagent managed mode
6. Deprecate file-based IPC
7. Cmux decoupling + naming cleanup
**Total: ~8 weeks**, sequenced to never break existing workflows.