# Unified Dispatch v2 — Architecture Plan

**Author:** Research synthesis
**Date:** 2026-05-08
**Status:** Draft — for review
**Scope:** Answer the 6 unified-dispatch questions with specific, opinionated positions backed by code references.

---

## The Unified Vision

SF should support a single dispatch system where ALL of these coexist and compose:

1. **Full-tool agents** — workers with all SF tools + full project DB access (today's parallel-orchestrator workers)
2. **Constrained subagents** — the current subagent tool (4 tools, no project DB writes)
3. **MessageBus-coordinated agents** — agents with AgentInbox, communicating via MessageBus (durable inbox, not file-based IPC)
4. **Coordinators on MessageBus too** — the UOK kernel publishes to workers via MessageBus, workers reply via MessageBus
5. **All in parallel/debate/chain** — the subagent tool's 4 modes apply to ALL of the above
6. **Shared SQLite WAL** — all agents that need project state share the same DB
7. **Optional MessageBus inbox for subagents** — subagents can opt in to receive coordinator messages

The dispatch layer is ONE system parameterized by four dimensions:

```
dispatch(opts)
├── isolation:    'full'        ← all SF tools + project DB WAL
│                 'constrained' ← 4 tools + ~/.sf/sf.db only (subagent)
├── coordination: 'standalone'  ← no MessageBus, no coordinator messaging
│                 'managed'     ← AgentInbox + MessageBus-enabled
├── scope:        'milestone' | 'slice' | 'task' | 'inline'
└── mode:         'single' | 'parallel' | 'debate' | 'chain'
```

---

## Q1. Unified Interface — The `dispatch()` API

### Current State

Four separate dispatch mechanisms with no shared interface:

| Component | Interface | Backing |
|-----------|-----------|---------|
| `parallel-orchestrator.js` | `startParallel(basePath, milestoneIds, prefs)` | worktree pool + child_process |
| `slice-parallel-orchestrator.js` | `startSliceParallel(basePath, milestoneId, eligibleSlices, opts)` | same, different scope |
| `subagent/index.js` | `executeSubagentInvocation({defaultCwd, agents, params, signal, ...})` | spawn `sf` CLI |
| `uok/kernel.js` | `runAutoLoopWithUok(args)` — owns the autonomous loop | owns controller + mechanism |

### Proposed API

A single `DispatchService` class (wrapping the merged `WorktreeOrchestrator`) with a typed `DispatchOptions` interface:

```ts
// File: src/resources/extensions/sf/dispatch/service.js

export interface DispatchOptions {
  // ── Isolation (what tools + DB access) ──────────────────────────
  isolation: 'full' | 'constrained';

  // ── Coordination (messaging model) ──────────────────────────────
  coordination: 'standalone' | 'managed';

  // ── Scope (work unit type) ──────────────────────────────────────
  scope: 'milestone' | 'slice' | 'task' | 'inline';

  // ── Unit identity ───────────────────────────────────────────────
  milestoneId?: string;
  sliceId?: string;
  taskId?: string;             // future: task-level dispatch
  basePath: string;

  // ── Execution mode ──────────────────────────────────────────────
  mode: 'single' | 'parallel' | 'debate' | 'chain';

  // ── Capacity ────────────────────────────────────────────────────
  maxWorkers?: number;         // default: parallel.max_workers config
  budgetCeiling?: number;      // default: parallel.budget_ceiling config
  workerTimeoutMs?: number;

  // ── Execution graph (file-conflict DAG) ─────────────────────────
  useExecutionGraph?: boolean; // default: true

  // ── Subagent-specific ───────────────────────────────────────────
  // Only valid when isolation === 'constrained'
  agentScope?: 'user' | 'project' | 'both';
  parentTrace?: string;        // audit context injected into task prompts
  useMessageBus?: boolean;     // give the subagent an AgentInbox
}
```
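
The four dimensions make the legacy entry points expressible as plain options objects. A sketch of the mapping, using a trimmed copy of `DispatchOptions` and hypothetical `legacy*Options` helper names that are not part of the proposed API:

```typescript
// Trimmed copy of the proposed DispatchOptions, enough for the mapping sketch.
type Mode = 'single' | 'parallel' | 'debate' | 'chain';

interface DispatchOptions {
  isolation: 'full' | 'constrained';
  coordination: 'standalone' | 'managed';
  scope: 'milestone' | 'slice' | 'task' | 'inline';
  mode: Mode;
  basePath: string;
}

// startParallel(basePath, milestoneIds, prefs) corresponds to:
function legacyStartParallelOptions(basePath: string): DispatchOptions {
  return { isolation: 'full', coordination: 'standalone', scope: 'milestone', mode: 'parallel', basePath };
}

// executeSubagentInvocation({ params: { mode }, ... }) corresponds to:
function legacySubagentOptions(basePath: string, mode: Mode): DispatchOptions {
  return { isolation: 'constrained', coordination: 'standalone', scope: 'inline', mode, basePath };
}
```

The point of the exercise: every existing mechanism is a fixed point in the 4-dimension space, so unifying them is a matter of parameterization, not new behavior.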

### Core API Surface

```ts
class DispatchService {
  // ── Lifecycle ─────────────────────────────────────────────────
  constructor(opts: DispatchOptions);

  // Prepare: run eligibility analysis + execution graph filtering.
  // Returns { eligible, conflicts, skipped } without starting workers.
  async prepare(): Promise<PrepareResult>;

  // Start workers for the given unit IDs
  async start(unitIds: string[]): Promise<StartResult>;

  // Stop all or specific workers
  async stop(unitIds?: string[]): Promise<void>;

  // Pause/resume workers (via MessageBus when coordination === 'managed')
  pause(unitIds?: string[]): void;
  resume(unitIds?: string[]): void;

  // ── Observation ───────────────────────────────────────────────
  // Returns a current state snapshot for the dashboard
  getStatus(): DispatchStatus;

  // Subscribe to dispatch events (wraps MessageBus)
  subscribe(handler: DispatchEventHandler): UnsubscribeFn;

  // ── Budget ────────────────────────────────────────────────────
  totalCost(): number;
  isBudgetExceeded(): boolean;

  // ── Shared infrastructure ─────────────────────────────────────
  readonly bus: MessageBus; // shared bus when coordination === 'managed'
}
```
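
The intended call sequence (prepare, then start, then observe) can be illustrated with an in-memory stand-in. `StubDispatchService` and its hard-coded results are invented for this sketch; only the method names come from the proposed API:

```typescript
// Minimal stub mirroring the prepare → start → getStatus lifecycle.
class StubDispatchService {
  private running = new Set<string>();
  constructor(readonly opts: { mode: string }) {}

  // Real version: eligibility analysis + execution-graph filtering.
  async prepare() {
    return { eligible: ['M01', 'M02'], conflicts: [] as string[], skipped: [] as string[] };
  }

  // Real version: spawn one worker per unit in a worktree.
  async start(unitIds: string[]) {
    unitIds.forEach(id => this.running.add(id));
    return { started: unitIds };
  }

  getStatus() {
    return { running: [...this.running] };
  }
}

async function demo(): Promise<string[]> {
  const dispatch = new StubDispatchService({ mode: 'parallel' });
  const { eligible } = await dispatch.prepare(); // filter before spawning anything
  await dispatch.start(eligible);
  return dispatch.getStatus().running;
}
```

`prepare()` being separate from `start()` is the key design choice: callers (and the dashboard) can inspect eligibility and conflicts before committing any workers.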

### How the 4 Dimensions Compose

| isolation | coordination | scope | mode | What happens |
|-----------|-------------|-------|------|-------------|
| `'full'` | `'standalone'` | `'milestone'` | `'parallel'` | Current parallel-orchestrator behavior |
| `'full'` | `'standalone'` | `'slice'` | `'parallel'` | Current slice-parallel behavior |
| `'full'` | `'managed'` | `'milestone'` | `'parallel'` | Workers have AgentInbox; coordinator sends pause/resume via MessageBus |
| `'constrained'` | `'standalone'` | `'inline'` | `'single'` | Current subagent single mode |
| `'constrained'` | `'standalone'` | `'inline'` | `'parallel'` | Current subagent parallel mode |
| `'constrained'` | `'standalone'` | `'inline'` | `'debate'` | Current subagent debate mode |
| `'constrained'` | `'standalone'` | `'inline'` | `'chain'` | Current subagent chain mode |
| `'constrained'` | `'managed'` | `'inline'` | `'single'` | Subagent with AgentInbox (opt-in); coordinator can message it |
| `'full'` | `'managed'` | `'milestone'` | `'debate'` | Full-tool debate: multiple milestone workers with MessageBus |
| `'full'` | `'managed'` | `'milestone'` | `'chain'` | Full-tool chain: milestone workers run sequentially via MessageBus handoff |

### Subagent Tool as DispatchService Client

The `subagent` tool becomes a thin **client** of `DispatchService`:

```
subagent tool
│
├── isolation:    'constrained'
├── coordination: params.useMessageBus ? 'managed' : 'standalone'
├── scope:        'inline'
├── mode:         params.mode (single/parallel/debate/chain)
│
└── Calls DispatchService instead of managing its own spawn pool
```

This eliminates the ~1000 LOC of concurrency management in `subagent/index.js` (`mapWithConcurrencyLimit`, `runSingleAgent`, `runSingleAgentInCmuxSplit`, `spawn` boilerplate) and replaces it with a single `dispatch.start()` call.

---

## Q2. MessageBus as the Backbone

### Current State

MessageBus (`uok/message-bus.js`) is wired **only to UOK kernel internal observer chains**:

- The UOK kernel creates a `MessageBus` instance in `runAutoLoopWithUok`
- `createTurnObserver` (`uok/loop-adapter.js`) subscribes to UOK events
- The parallel orchestrator and slice-parallel orchestrator use **file-based IPC** exclusively:
  - `session-status-io.js`: poll `.sf/parallel/sessions/*.json` every refresh cycle
  - `sendSignal(basePath, mid, "pause"|"resume"|"stop")`: write signal files that workers check on the next dispatch

### The Gap

File-based IPC is correct for crash recovery (workers persist state to disk and survive coordinator restarts), but it has two weaknesses:

1. **No durable coordinator → worker messaging**: when a coordinator restarts, it re-reads session files to restore state, but workers don't know the coordinator restarted unless they poll. Workers check for signals on each dispatch turn — correct, but with ~1-2 seconds of latency.

2. **No worker → coordinator messaging**: workers emit cost via NDJSON stdout, but there's no inbox model for workers to send structured messages back to the coordinator.

### Proposed: MessageBus Replaces File-Based IPC for Live Coordination

```
Current (file-based):
  Coordinator ──signal file──────────► Worker
  Worker ──────session status file──► Coordinator (polled)

Proposed (MessageBus):
  Coordinator ──MessageBus.send()───► Worker AgentInbox
  Worker ──────MessageBus.send()────► Coordinator Inbox
  (File-based IPC stays as the crash-recovery fallback)
```

### Implementation

The `DispatchService` owns a single `MessageBus` instance per basePath. Each worker gets an `AgentInbox` named after its unit ID (e.g., `milestone:M01`, `slice:S01:02`).

**Coordinator → Worker messages** (pause, resume, stop, status request):
```ts
// In DispatchService
this.bus.send('coordinator', `worker:${unitId}`, { type: 'control', action }, metadata);
```

**Worker → Coordinator messages** (unit started, unit completed, error, cost update):
```ts
// In worker bootstrap (sf headless entry point)
const bus = new MessageBus(basePath);
const inbox = bus.getInbox(`worker:${unitId}`); // receives coordinator control messages
bus.send(`worker:${unitId}`, 'coordinator', { type: 'unit_started' });
```

**File-based fallback remains**: `session-status-io.js` is NOT removed. Workers still write session status files. The coordinator still reads them on startup for crash recovery. MessageBus adds *durable live coordination* on top.
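
A toy in-memory model of the round trip described above. The real `uok/message-bus.js` is SQLite-backed; `ToyBus` is invented here and only mirrors the `coordinator` / `worker:<unitId>` inbox naming convention:

```typescript
// Each named inbox is just an ordered list of delivered messages.
type Message = { from: string; to: string; body: unknown };

class ToyBus {
  private inboxes = new Map<string, Message[]>();

  getInbox(name: string): Message[] {
    if (!this.inboxes.has(name)) this.inboxes.set(name, []);
    return this.inboxes.get(name)!;
  }

  // send(from, to, body): deliver into the recipient's inbox.
  send(from: string, to: string, body: unknown): void {
    this.getInbox(to).push({ from, to, body });
  }
}

const bus = new ToyBus();
// Coordinator pauses a milestone worker:
bus.send('coordinator', 'worker:milestone:M01', { type: 'control', action: 'pause' });
// Worker reports back to the coordinator's inbox:
bus.send('worker:milestone:M01', 'coordinator', { type: 'unit_started' });
```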

### Should ALL Coordination Flow Through MessageBus?

**Yes, for live coordination between a running coordinator and its workers.**

The UOK kernel itself becomes a coordinator that uses MessageBus. When `runAutoLoopWithUok` initializes `DispatchService`, it passes `coordination: 'managed'`. The UOK kernel then receives worker events via the shared bus rather than polling session files.

**File-based IPC stays for crash recovery** — when a coordinator dies and restarts, it reads session status files to adopt surviving workers. MessageBus state does not survive coordinator restarts (inboxes are in-memory, even though messages are persisted to SQLite). This is the right split: MessageBus for live coordination, file-based state for durability.

**What replaces file-based IPC for subagent coordination?** Subagents spawned with `isolation: 'constrained'` and `coordination: 'standalone'` use the current model (spawn `sf` CLI, parse NDJSON stdout). When `coordination: 'managed'`, subagents get an `AgentInbox` and the coordinator can send them pause/resume messages.

---

## Q3. DB Access Matrix

### Current State

| Dispatch configuration | DB access | Mechanism |
|-----------------------|-----------|-----------|
| Parallel-orchestrator workers | Full project `.sf/sf.db` WAL | Workers open `.sf/sf.db` in worktree via `syncSfStateToWorktree` |
| Slice-parallel-orchestrator workers | Full project `.sf/sf.db` WAL | Same as above |
| Subagent (spawned `sf` CLI) | Global `~/.sf/sf.db` only; NO project DB | Spawned process has its own SQLite connection |
| UOK kernel (autonomous loop) | Full project `.sf/sf.db` WAL | Runs in project context |
| Cmux | None | Terminal surface only |

### The Constraint Is Intentional

The subagent's 4-tool limit is **correct security isolation**, not a limitation to be fixed:

- A spawned `sf` CLI with project DB write access running in a user-specified `cwd` is a significant attack surface
- Subagents should return **structured output**, not mutate state directly
- The coordinator (UOK kernel or parent agent) is responsible for interpreting subagent output and calling DB tools

### Proposed DB Access Matrix (Unified Model)

```
┌─────────────────────────────────────────────────────────────────┐
│ isolation: 'full', coordination: 'managed'                      │
│   Workers: milestone/slice agents spawned via DispatchService   │
│   DB: project .sf/sf.db (WAL) — full read/write                 │
│   AgentInbox: yes                                               │
├─────────────────────────────────────────────────────────────────┤
│ isolation: 'constrained', coordination: 'standalone'            │
│   Subagent: current subagent tool (spawned sf CLI)              │
│   DB: ~/.sf/sf.db (global) read/write; project .sf/sf.db        │
│       readable only via prompt injection                        │
│   AgentInbox: no                                                │
├─────────────────────────────────────────────────────────────────┤
│ isolation: 'constrained', coordination: 'managed'               │
│   Subagent with opt-in messaging                                │
│   DB: same as above + MessageBus inbox for coordinator messages │
│   AgentInbox: yes (injected via prompt context)                 │
├─────────────────────────────────────────────────────────────────┤
│ isolation: 'full', coordination: 'standalone'                   │
│   Workers without MessageBus (legacy standalone mode)           │
│   DB: project .sf/sf.db (WAL) — full read/write                 │
│   AgentInbox: no                                                │
└─────────────────────────────────────────────────────────────────┘
```

### Key Rule

**`isolation: 'full'` = project DB WAL access. `isolation: 'constrained'` = no project DB writes.**

DB access is determined solely by `isolation`, not by `scope` or `mode`. A slice-scope worker with `isolation: 'full'` has the same DB access as a milestone-scope worker — correct, since both represent the primary agent running project work.
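
Stated as executable documentation (the function name and option shape are invented for this sketch):

```typescript
// Project-DB access is a function of `isolation` alone; `scope` and `mode`
// are accepted but deliberately ignored.
type Isolation = 'full' | 'constrained';

function projectDbAccess(opts: {
  isolation: Isolation;
  scope?: string;
  mode?: string;
}): 'read-write' | 'none' {
  return opts.isolation === 'full' ? 'read-write' : 'none';
}
```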

### Subagent Output Contract

When a constrained subagent needs to record something in project state, the contract is:

1. The subagent returns structured output (via NDJSON `message_end` events)
2. The coordinator parses it and calls the appropriate DB tool (`complete-task`, `block-slice`, etc.)
3. The subagent never writes to the project DB directly

This mirrors the Letta agent pattern: agents return results, the orchestrator persists.
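
The contract can be sketched end to end. The NDJSON event shape and the `completeTask` callback below are assumptions for illustration, not the actual SF schemas:

```typescript
// One event per NDJSON line; only `message_end` carries the final result.
type NdjsonEvent = {
  type: string;
  payload?: { action?: string; taskId?: string };
};

// The coordinator owns persistence: it parses the subagent's stdout and
// invokes the DB tool itself. Returns true if a result was persisted.
function persistSubagentResult(
  stdout: string,
  completeTask: (taskId: string) => void,
): boolean {
  const events: NdjsonEvent[] = stdout
    .trim()
    .split('\n')
    .map(line => JSON.parse(line));
  const end = events.find(e => e.type === 'message_end');
  if (end?.payload?.action === 'complete-task' && end.payload.taskId) {
    completeTask(end.payload.taskId); // coordinator writes, never the subagent
    return true;
  }
  return false;
}
```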

---

## Q4. Coordinator Pattern — Debate and Chain on MessageBus

### Current State

**Subagent debate/chain** (`subagent/index.js`):
- Debate: `mapWithConcurrencyLimit` runs N agents per round within a single process, with rounds sequenced; each agent sees the prior round's transcript
- Chain: sequential `runSingleAgent` calls, each feeding its output into the next step's prompt
- The coordinator is **in-process** — it's the `subagent/index.js` call stack

**Parallel orchestrator** (`parallel-orchestrator.js`):
- The coordinator is **out-of-process** — it's the TUI/headless process that spawned the milestone workers
- No MessageBus — coordinator and workers communicate via session status files and NDJSON stdout
- Workers run `sf headless --json autonomous` in worktrees

### How Debate and Chain Work with Coordinators on MessageBus

The coordinator is always the **dispatching agent** (UOK kernel or subagent tool). The key question is whether the coordinator is in-process or out-of-process.

#### Debate Mode with Full-Tool Workers (Milestone-Level)

```
Coordinator (UOK kernel, coordination: 'managed')
│
├── Round 1: bus.broadcast('coordinator', [worker:M1a, worker:M1b],
│            {type: 'debate', round: 1, topic, prompt})
├── Each worker replies via its AgentInbox with its position
├── Coordinator collects all replies, builds the transcript
│
├── Round 2: bus.broadcast('coordinator', [worker:M1a, worker:M1b],
│            {type: 'debate', round: 2, transcript})
├── ...
│
└── Round N: coordinator issues the final verdict
```

This is **true process-level parallelism** — workers are separate `sf headless` processes in worktrees, each with full project DB access. The coordinator sequences rounds via MessageBus.
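
The round sequencing the coordinator performs reduces to a plain loop. In this sketch, `askWorker` stands in for a MessageBus broadcast plus reply collection, so the control flow runs without any bus:

```typescript
type Position = { worker: string; argument: string };

// Run `rounds` debate rounds over `workers`. Each round, every worker sees
// the transcript of all prior rounds; replies within a round are gathered
// concurrently and appended together.
async function runDebate(
  workers: string[],
  rounds: number,
  askWorker: (worker: string, transcript: Position[]) => Promise<string>,
): Promise<Position[]> {
  const transcript: Position[] = [];
  for (let round = 1; round <= rounds; round++) {
    const replies = await Promise.all(
      workers.map(async worker => ({
        worker,
        argument: await askWorker(worker, transcript),
      })),
    );
    transcript.push(...replies); // visible to everyone next round
  }
  return transcript;
}
```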

#### Chain Mode with Full-Tool Workers

```
Coordinator (UOK kernel, coordination: 'managed')
│
├── Step 1: bus.send('coordinator', 'worker:M1a', {type: 'chain', step: 1, after: null})
│           Worker M1a produces output
├── Coordinator collects the output
│
├── Step 2: bus.send('coordinator', 'worker:M1b', {type: 'chain', step: 2, after: output_from_M1a})
│           Worker M1b produces output
├── ...
```

The coordinator controls sequencing — it waits for each step's output before dispatching the next. Workers can run in different worktrees or the same worktree depending on file-conflict constraints.
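
The handoff is a sequential fold over the steps. Here `runStep` stands in for a MessageBus send plus await-reply, so the sequencing logic runs standalone:

```typescript
// Dispatch each step in order, feeding the previous step's output in as
// the `after` input of the next. Returns the final step's output.
async function runChain(
  steps: string[],
  runStep: (worker: string, after: string | null) => Promise<string>,
): Promise<string | null> {
  let after: string | null = null;
  for (const worker of steps) {
    after = await runStep(worker, after); // wait before dispatching the next step
  }
  return after;
}
```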
#### Debate Mode with Constrained Subagents (Current Behavior)

The current subagent debate mode runs in-process via `mapWithConcurrencyLimit`. This is correct for constrained subagents because:

- They're short-lived, spawned per debate round
- They don't need project DB access
- In-process is faster (no process-spawn overhead per round)

**This does NOT change** for constrained subagents. The coordinator stays in-process.

#### Chain Mode with Constrained Subagents (Current Behavior)

Current subagent chain mode is sequential `runSingleAgent` calls in the same process. **This does NOT change** for constrained subagents.

### When Does the Coordinator Become a MessageBus Agent?

**Only when `coordination: 'managed'` and `isolation: 'full'`** (full-tool workers).

The coordinator (UOK kernel) gets its own `AgentInbox` on the MessageBus:

```ts
// In DispatchService
const coordinatorInbox = this.bus.getInbox('coordinator');
```

Workers send messages to `coordinator`; the coordinator sends to `worker:${unitId}`.

**For constrained subagents** (`isolation: 'constrained'`), the coordinator is always in-process. They don't use MessageBus unless `coordination: 'managed'` is explicitly set — in which case the subagent tool creates an `AgentInbox` for the spawned subagent process and the coordinator (the subagent tool's process) can send it messages.

### Summary

| Mode | isolation | Coordinator location | MessageBus role |
|------|-----------|---------------------|-----------------|
| `parallel` | `'full'` | Out-of-process (UOK kernel) | Workers reachable via AgentInbox |
| `debate` | `'full'` | Out-of-process (UOK kernel) | Rounds sequenced via broadcast |
| `chain` | `'full'` | Out-of-process (UOK kernel) | Sequential handoff via send/reply |
| `single` | `'full'` | Out-of-process (UOK kernel) | Worker has AgentInbox |
| `parallel` | `'constrained'` | In-process (subagent tool) | Optional AgentInbox if opted in |
| `debate` | `'constrained'` | In-process (subagent tool) | Not on MessageBus (in-process) |
| `chain` | `'constrained'` | In-process (subagent tool) | Not on MessageBus (in-process) |
| `single` | `'constrained'` | In-process (subagent tool) | Optional AgentInbox if opted in |

---

## Q5. Migration — From Today's Siloed Mechanisms to a Unified System

### The Constraint: Don't Break Existing Workflows

SF has active users relying on:

- `sf parallel <milestone-id>` — the parallel orchestrator dashboard
- `sf headless autonomous` — the UOK kernel autonomous loop
- the `sf` subagent tool with all 4 modes — used inside TUI/headless sessions
- slice-level parallelism inside milestones

**Migration must be additive and backward-compatible at each step.**

### Migration Path: 6 Phases

#### Phase 1 — Merge Parallel + Slice Orchestrators (Week 1)
**Risk: Low | Behavior: identical**

Extract the ~80% shared logic from `parallel-orchestrator.js` and `slice-parallel-orchestrator.js` into a single `WorktreeOrchestrator` class parameterized by `{ scope: 'milestone' | 'slice' }`.

**Before:**
```
parallel-orchestrator.js        (~800 LOC)
slice-parallel-orchestrator.js  (~450 LOC)
```

**After:**
```
worktree-orchestrator.js (~900 LOC merged)
├── Both orchestrators become thin wrappers calling WorktreeOrchestrator
└── slice-parallel-conflict.js stays as the constraint solver
```

**Files touched:**

- New: `src/resources/extensions/sf/worktree-orchestrator.js`
- Refactor: `parallel-orchestrator.js` → thin wrapper
- Refactor: `slice-parallel-orchestrator.js` → thin wrapper
- All callers of `startParallel` / `startSliceParallel` continue to work

**Verification:** the parallel dashboard and slice-level parallelism work identically. Zero behavior change.
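
A sketch of the Phase 1 shape, with stub bodies standing in for the real worktree-pool and child_process logic (only the class, function, and parameter names come from the plan):

```typescript
// One scope-parameterized engine; both legacy entry points become wrappers.
class WorktreeOrchestrator {
  constructor(readonly scope: 'milestone' | 'slice') {}

  // Real version: allocate a worktree per unit and spawn `sf headless`.
  // Stubbed here to return one handle per started unit.
  start(basePath: string, unitIds: string[]): string[] {
    return unitIds.map(id => `${this.scope}:${id}`);
  }
}

// startParallel(basePath, milestoneIds, prefs) → thin wrapper
function startParallel(basePath: string, milestoneIds: string[]): string[] {
  return new WorktreeOrchestrator('milestone').start(basePath, milestoneIds);
}

// startSliceParallel(basePath, sliceIds, ...) → thin wrapper
function startSliceParallel(basePath: string, sliceIds: string[]): string[] {
  return new WorktreeOrchestrator('slice').start(basePath, sliceIds);
}
```

Because the wrappers preserve the old entry points, every existing caller keeps working while the duplicated pool logic collapses into one class.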

#### Phase 2 — Extract DispatchService API (Week 2)
**Risk: Low | Behavior: identical**

Create the `DispatchService` class with the `DispatchOptions` interface. Wrap `WorktreeOrchestrator` internally. The parallel orchestrator wrapper becomes a `DispatchService` client.

```ts
// New file: src/resources/extensions/sf/dispatch/service.js
export class DispatchService {
  constructor(opts: DispatchOptions) { ... }
  async prepare(): Promise<PrepareResult> { return this.orchestrator.prepare(...); }
  async start(unitIds: string[]): Promise<StartResult> { ... }
  ...
}
```

**Files touched:**

- New: `src/resources/extensions/sf/dispatch/service.js`
- New: `src/resources/extensions/sf/dispatch/types.js`
- Parallel orchestrator wrapper updated to call `DispatchService`
- Slice parallel orchestrator wrapper updated to call `DispatchService`

**Verification:** all existing dispatch paths (parallel, slice-parallel) work via the new API.

#### Phase 3 — Wire MessageBus into DispatchService (Week 3)
**Risk: Medium | Behavior: additive**

Add a `MessageBus` to `DispatchService` and give each worker an `AgentInbox` when `coordination: 'managed'`. File-based IPC (`session-status-io.js`) stays as the fallback.

**New behavior (opt-in):**
```ts
const dispatch = new DispatchService({
  isolation: 'full',
  coordination: 'managed', // NEW: workers get an AgentInbox
  scope: 'milestone',
  mode: 'parallel',
  ...
});
```

**Files touched:**

- `src/resources/extensions/sf/dispatch/service.js` — add MessageBus integration
- `src/resources/extensions/sf/worktree-orchestrator.js` — add worker AgentInbox creation
- Worker bootstrap in `spawnWorker` — open the MessageBus inbox after fork

**Verification:** workers respond to `dispatch.pause()` / `dispatch.resume()` via MessageBus. The file-based fallback still works.

#### Phase 4 — Subagent Tool Uses DispatchService (Week 4)
**Risk: Medium | Behavior: constrained subagent modes unchanged**

Replace the subagent tool's internal spawn pool with `DispatchService({ isolation: 'constrained', scope: 'inline' })`. For now, use `coordination: 'standalone'` — no MessageBus for subagents yet.

**Files touched:**

- `src/resources/extensions/subagent/index.js` — replace concurrency management with `DispatchService` calls
- Estimated: ~600 LOC removed (spawn management, `mapWithConcurrencyLimit`, `runSingleAgent`, etc.)

**Verification:** all 4 subagent modes (single/parallel/debate/chain) work identically. The implementation changes; the user experience doesn't.

#### Phase 5 — UOK Kernel Adopts DispatchService (Week 5)
**Risk: Medium | Behavior: UOK autonomous loop uses the unified API**

Refactor `runAutoLoopWithUok` to use `DispatchService` instead of calling `startParallel` / `startSliceParallel` directly.

```ts
// Before (in kernel.js):
const { started, errors } = await startParallel(basePath, milestoneIds, prefs);

// After:
const dispatch = new DispatchService({
  isolation: 'full',
  coordination: 'managed',
  scope: 'milestone',
  mode: 'parallel',
  basePath,
  ...
});
await dispatch.start(eligibleMilestoneIds);
```

**Files touched:**

- `src/resources/extensions/sf/uok/kernel.js` — use DispatchService
- Remove the `startParallel` / `startSliceParallel` exports (or keep them as legacy wrappers)

**Verification:** `sf headless autonomous` works identically. Workers appear in the dashboard.

#### Phase 6 — Subagent Optional MessageBus Inbox (Week 6)
**Risk: Low | Behavior: opt-in, additive**

Allow the subagent tool to pass `useMessageBus: true`, giving the spawned subagent an `AgentInbox` that the coordinator can message.

**Files touched:**

- `src/resources/extensions/subagent/index.js` — inject `useMessageBus` into the DispatchService opts
- `src/resources/extensions/sf/dispatch/service.js` — handle `isolation: 'constrained', coordination: 'managed'`

**Verification:** a subagent with `useMessageBus: true` can receive pause/resume from the coordinator.

---

## Q6. Implementation Order — Build First, Second, Third

### Priority Rationale

**Highest value first:**

1. **Phase 1 (merge)** — Eliminates the ~90% code duplication between the two orchestrators. Pure refactor, no new behavior. Clarifies the worktree pool as a single concept and sets the foundation for all subsequent changes.

2. **Phase 2 (API extraction)** — Codifies the `DispatchOptions` interface before any new dispatch paths are added. Forces the 4-dimension model to be explicit and typed. New code immediately benefits from the API.

3. **Phase 3 (MessageBus)** — Adds durable coordination on top of the merged worktree pool. This is the key differentiator for the "unified" vision — workers become reachable via durable messaging. File-based IPC stays as crash recovery.

4. **Phase 4 (subagent → DispatchService)** — Removes ~600 LOC of duplicate concurrency management from the subagent tool. Makes the subagent a client of the unified API and opens the door for subagents to opt into MessageBus coordination.

5. **Phase 5 (UOK → DispatchService)** — Makes the UOK kernel a `DispatchService` client. This is the most impactful migration: the autonomous loop and the parallel orchestrator now share the same dispatch machinery.

6. **Phase 6 (subagent MessageBus)** — The final piece of the unified vision: subagents with MessageBus inboxes. Lowest risk (opt-in, additive) but completes the composition story.

### What NOT to Build Yet

- **Task-level dispatch** (`scope: 'task'`): Not needed yet. Milestone and slice are the primary parallelism boundaries. Task dispatch would require the unit-runtime layer (`uok/unit-runtime.js`) to be more mature.

- **Nested dispatch** (subagent spawning subagent): The current security boundary (constrained isolation = no project DB writes) prevents dangerous nested dispatch. Don't remove this constraint.

- **Persistent agents** (Letta-style): MessageBus is the right primitive, but SF doesn't have persistent named agents yet. Don't build agent registry/lifecycle management until there's a concrete use case.

- **Cmux decoupling**: Lower priority. The cmux grid layout is a UI concern; the dispatch layer doesn't need to know about it.

### The Order in Summary

```
Week 1: Phase 1 — Merge parallel + slice orchestrators → WorktreeOrchestrator
Week 2: Phase 2 — Extract DispatchService API (DispatchOptions interface)
Week 3: Phase 3 — Wire MessageBus into DispatchService (coordination: 'managed')
Week 4: Phase 4 — Subagent tool becomes a DispatchService client
Week 5: Phase 5 — UOK kernel uses DispatchService
Week 6: Phase 6 — Subagent optional MessageBus inbox
```

---

## Summary of Positions

| Question | Position |
|----------|----------|
| **Unified interface** | A single `DispatchService` class with `DispatchOptions { isolation, coordination, scope, mode }`. Four typed dimensions, not separate mechanisms. |
| **MessageBus as backbone** | YES for live coordinator↔worker messaging. File-based IPC (`session-status-io.js`) stays as a crash-recovery fallback. All live coordination flows through MessageBus when `coordination: 'managed'`. |
| **DB access matrix** | `isolation: 'full'` = project DB WAL. `isolation: 'constrained'` = `~/.sf/sf.db` only, no project writes. Scope and mode don't affect DB access. |
| **Coordinator on MessageBus** | YES for `isolation: 'full', coordination: 'managed'`. The UOK kernel becomes a DispatchService client with an AgentInbox. Workers reply via MessageBus. Debate/chain run as sequential rounds over MessageBus broadcast. Constrained subagents stay in-process for debate/chain. |
| **Migration** | 6 additive phases. Merge first (lowest risk), API extraction second, MessageBus wiring third, subagent adoption fourth, UOK migration fifth, subagent MessageBus opt-in sixth. Zero behavior change through Phase 2; later phases are opt-in or implementation-only. |
| **Implementation order** | Phase 1 → Phase 2 → Phase 3 → Phase 4 → Phase 5 → Phase 6. Highest-value/lowest-risk items first. Don't build task-level dispatch, nested dispatch, persistent agents, or Cmux decoupling yet. |

---

## Key File References

| File | Role in Unified System |
|------|------------------------|
| `src/resources/extensions/sf/parallel-orchestrator.js` | Merged into `worktree-orchestrator.js` |
| `src/resources/extensions/sf/slice-parallel-orchestrator.js` | Merged into `worktree-orchestrator.js` |
| `src/resources/extensions/sf/worktree-orchestrator.js` | **NEW** — merged orchestration engine |
| `src/resources/extensions/sf/dispatch/service.js` | **NEW** — `DispatchService` class |
| `src/resources/extensions/sf/dispatch/types.js` | **NEW** — `DispatchOptions` and related types |
| `src/resources/extensions/sf/uok/message-bus.js` | MessageBus + AgentInbox (already exists) |
| `src/resources/extensions/sf/uok/kernel.js` | UOK kernel (becomes a DispatchService client) |
| `src/resources/extensions/sf/uok/execution-graph.js` | Constraint solver (stays separate) |
| `src/resources/extensions/sf/uok/dispatch-envelope.js` | What-to-dispatch contract (already exists) |
| `src/resources/extensions/sf/session-status-io.js` | File-based IPC fallback (stays) |
| `src/resources/extensions/subagent/index.js` | Subagent tool (becomes a DispatchService client) |
| `src/resources/extensions/sf/slice-parallel-conflict.js` | Slice conflict checker (stays) |