# Unified Dispatch v2 — Qwen Plan
**Author:** Architecture research
**Date:** 2026-05-08
**Status:** Structured plan for review
**Supersedes:** `DISPATCH_ARCHITECTURE_CONSOLIDATION.md`, `dispatch-orchestration-architecture.md`, `DISPATCH_ORCHESTRATION_PLAN.md`

---
## The Unified Vision
SF should support a single dispatch system where ALL of these coexist and compose:

1. **Full-tool agents** — workers with all SF tools + full project DB access (today's parallel-orchestrator workers)
2. **Constrained subagents** — the current subagent tool (4 tools, no project DB writes)
3. **MessageBus-coordinated agents** — agents with AgentInbox, communicating via MessageBus (durable inbox, not file-based IPC)
4. **Coordinators on MessageBus too** — UOK kernel publishes to workers via MessageBus, workers reply via MessageBus
5. **All in parallel/debate/chain** — the subagent tool's 4 modes apply to ALL of the above
6. **Shared SQLite WAL** — all agents that need project state share the same DB
7. **Optional MessageBus inbox for subagents** — subagents can opt in to receive coordinator messages

The dispatch layer is **ONE system** parameterized by four dimensions:

| Dimension | Values | Meaning |
|-----------|--------|---------|
| `isolation` | `'full'` \| `'constrained'` | Full: all SF tools + project DB writes. Constrained: 4 tools, no project DB writes. |
| `coordination` | `'standalone'` \| `'managed'` | Standalone: no MessageBus. Managed: has AgentInbox, coordinator can message. |
| `scope` | `'milestone'` \| `'slice'` \| `'task'` \| `'inline'` | The work unit hierarchy. |
| `mode` | `'single'` \| `'parallel'` \| `'debate'` \| `'chain'` | How many agents run and in what relationship. |

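To make the parameter space concrete, here is a minimal sketch. The string-literal unions are transcribed from the table; `DispatchPoint` and the variable name are hypothetical illustrations, not real code from the repo:

```typescript
// Illustrative only: the four dimensions as string-literal union types.
type Isolation = 'full' | 'constrained';
type Coordination = 'standalone' | 'managed';
type Scope = 'milestone' | 'slice' | 'task' | 'inline';
type Mode = 'single' | 'parallel' | 'debate' | 'chain';

// A dispatch configuration is one point in this 4D space.
interface DispatchPoint {
  isolation: Isolation;
  coordination: Coordination;
  scope: Scope;
  mode: Mode;
}

// Today's subagent debate mode, expressed as a point:
const constrainedDebate: DispatchPoint = {
  isolation: 'constrained',
  coordination: 'standalone',
  scope: 'inline',
  mode: 'debate',
};

console.log(Object.values(constrainedDebate).join('/')); // → constrained/standalone/inline/debate
```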
---

## Current State Map

| Mechanism | Isolation | Coordination | Scope | Mode | Key Files |
|-----------|-----------|--------------|-------|------|-----------|
| `parallel-orchestrator.js` | `full` | `standalone` (file-based IPC) | `milestone` | `parallel` | `src/resources/extensions/sf/parallel-orchestrator.js` |
| `slice-parallel-orchestrator.js` | `full` | `standalone` (file-based IPC) | `slice` | `parallel` | `src/resources/extensions/sf/slice-parallel-orchestrator.js` |
| Subagent tool (`extensions/subagent/index.js`) | `constrained` (4 tools) | `standalone` | `inline` | `single/parallel/debate/chain` | `src/resources/extensions/subagent/index.js` |
| UOK kernel (`uok/kernel.js`) | N/A (runs in-process) | `standalone` (no MessageBus) | `milestone` | `single` (autonomous loop) | `src/resources/extensions/sf/uok/kernel.js` |
| MessageBus (`uok/message-bus.js`) | N/A | N/A | N/A | N/A | `src/resources/extensions/sf/uok/message-bus.js` |

---

## Q1: Unified `dispatch()` API

### Design Principle

One function, four parameters. Every dispatch configuration is a point in the 4D parameter space.

```ts
// File: src/resources/extensions/sf/dispatch-layer.js (proposed)

export interface DispatchOptions {
  // ── Isolation ───────────────────────────────────────────────────────────
  // 'full': all SF tools, project DB read/write (milestone workers, slice workers)
  // 'constrained': 4 tools only, no project DB writes (subagent, scout, reviewer, reporter)
  isolation: 'full' | 'constrained';

  // ── Coordination ───────────────────────────────────────────────────────
  // 'standalone': no MessageBus, no coordinator messaging
  // 'managed': AgentInbox per worker, coordinator can send pause/resume/stop/status messages
  coordination: 'standalone' | 'managed';

  // ── Scope ──────────────────────────────────────────────────────────────
  // 'milestone': git worktree per milestone, SF_MILESTONE_LOCK set
  // 'slice': git worktree per slice within a milestone, SF_SLICE_LOCK + SF_MILESTONE_LOCK set
  // 'task': no worktree, runs in same process or short-lived subprocess
  // 'inline': in-process agent call (subagent single mode, no worktree)
  scope: 'milestone' | 'slice' | 'task' | 'inline';

  // ── Mode ───────────────────────────────────────────────────────────────
  // 'single': run one agent
  // 'parallel': run N agents concurrently (up to maxConcurrency)
  // 'debate': bounded adversarial rounds, each agent sees prior rounds' output
  // 'chain': sequential, each step's output feeds into next step as {previous}
  mode: 'single' | 'parallel' | 'debate' | 'chain';

  // ── Common fields ──────────────────────────────────────────────────────
  unitId: string;              // milestoneId, sliceId, or taskId
  milestoneId?: string;        // required when scope='slice'
  agent: string | string[];    // agent name(s); array for parallel/debate/chain
  task?: string;               // task description (single mode)
  tasks?: TaskItem[];          // task list (parallel/debate mode)
  chain?: ChainItem[];         // chain steps (chain mode)
  basePath: string;
  maxConcurrency?: number;     // default: 4 for parallel/debate, 1 for chain
  budgetCeiling?: number;
  workerTimeoutMs?: number;
  shellWrapper?: string[];
  useExecutionGraph?: boolean; // use file-conflict DAG to filter parallel set
  modelOverride?: string;
  parentTrace?: string;        // audit context injected for review subagents

  // ── MessageBus routing (when coordination='managed') ──────────────────
  coordinatorId?: string;      // agentId of the coordinator (for routing replies)
}

export interface TaskItem {
  agent: string;
  task: string;
  cwd?: string;
  model?: string;
  parentTrace?: string;
}

export interface ChainItem {
  agent: string;
  task: string; // may contain {previous} placeholder
  cwd?: string;
  model?: string;
  parentTrace?: string;
}

export class DispatchLayer {
  readonly bus: MessageBus; // shared bus for all dispatches

  constructor(basePath: string, busOptions?: MessageBusOptions);

  // ── Core dispatch ────────────────────────────────────────────────────────
  async dispatch(opts: DispatchOptions): Promise<DispatchResult>;

  // ── Batch helpers ───────────────────────────────────────────────────────
  async dispatchMilestones(milestoneIds: string[], opts: Partial<DispatchOptions>): Promise<StartResult>;
  async dispatchSlices(milestoneId: string, sliceIds: string[], opts: Partial<DispatchOptions>): Promise<StartResult>;

  // ── Lifecycle ───────────────────────────────────────────────────────────
  async stop(workIds?: string[]): Promise<void>;
  pause(workIds?: string[]): void;  // via MessageBus when managed
  resume(workIds?: string[]): void; // via MessageBus when managed

  // ── State ───────────────────────────────────────────────────────────────
  getStatus(): DispatchStatus;
  totalCost(): number;
  isBudgetExceeded(): boolean;

  // ── Event subscription ─────────────────────────────────────────────────
  subscribe(handler: DispatchEventHandler): UnsubscribeFn;
}
```

### Parameter Matrix

| isolation | coordination | scope | mode | Current equivalent | DB access |
|-----------|--------------|-------|------|--------------------|-----------|
| `full` | `managed` | `milestone` | `parallel` | `parallel-orchestrator.js` | project DB read/write |
| `full` | `managed` | `slice` | `parallel` | `slice-parallel-orchestrator.js` | project DB read/write |
| `constrained` | `standalone` | `inline` | `single` | subagent single mode | no project DB |
| `constrained` | `standalone` | `inline` | `parallel` | subagent parallel mode | no project DB |
| `constrained` | `standalone` | `inline` | `debate` | subagent debate mode | no project DB |
| `constrained` | `standalone` | `inline` | `chain` | subagent chain mode | no project DB |
| `constrained` | `managed` | `inline` | `single` | **new: managed subagent** | no project DB |
| `full` | `managed` | `inline` | `single` | **new: headless autonomous** | project DB read/write |

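One observable consequence of the matrix: `debate` and `chain` appear only at `inline` scope, so a dispatch layer could reject other combinations up front. The sketch below encodes that rule as an assumption read off the table, not a stated invariant; `validateScopeMode` is a hypothetical helper:

```typescript
// Hypothetical validation sketch: reject scope/mode combinations that the
// parameter matrix does not list. The rule is inferred from the table.
type Scope = 'milestone' | 'slice' | 'task' | 'inline';
type Mode = 'single' | 'parallel' | 'debate' | 'chain';

function validateScopeMode(scope: Scope, mode: Mode): string | null {
  if ((mode === 'debate' || mode === 'chain') && scope !== 'inline') {
    return `mode '${mode}' is only supported at scope 'inline'`;
  }
  return null; // valid combination
}

console.log(validateScopeMode('inline', 'debate'));   // → null
console.log(validateScopeMode('milestone', 'chain')); // → rejection message
```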
---

## Q2: MessageBus as the Backbone

### Answer: Yes, ALL coordinator → worker and worker → coordinator communication flows through MessageBus.

The file-based IPC (`session-status-io.js`, `sendSignal`) becomes a **crash-recovery fallback only**, not the primary path.

### What MessageBus replaces today

| Current mechanism | Replaced by | Where defined |
|-------------------|-------------|---------------|
| `session-status-io.js` — write/read session status files | `MessageBus.send()` to `worker/<id>/status` | `dispatch-layer.js` |
| `sendSignal(basePath, mid, "pause\|resume\|stop")` | `MessageBus.send(coordinatorId, workerId, "pause")` | `dispatch-layer.js` |
| `consumeSignal(basePath, mid)` | Worker polls `AgentInbox.list()` | Worker bootstrap |
| `parallel-intent.js` — CoordinationStore for file intent | `MessageBus.send()` to `coordinator/file-intent` | `dispatch-layer.js` |

### Worker inbox naming convention

```
Workers get AgentInbox named: dispatch:<scope>:<unitId>
  e.g., dispatch:milestone:M01
  e.g., dispatch:slice:M01/S01

Coordinator gets AgentInbox named: dispatch:coordinator:<runId>
```
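The convention can be captured in two small helpers — a sketch, assuming unit ids never contain a `:` (slice ids use `/`, e.g. `M01/S01`); `workerInboxName` and `parseInboxName` are hypothetical names:

```typescript
// Sketch: build and parse dispatch inbox names following the convention above.
type DispatchScope = 'milestone' | 'slice' | 'task' | 'inline';

function workerInboxName(scope: DispatchScope, unitId: string): string {
  return `dispatch:${scope}:${unitId}`;
}

function parseInboxName(name: string): { scope: DispatchScope; unitId: string } | null {
  const m = /^dispatch:(milestone|slice|task|inline):(.+)$/.exec(name);
  return m ? { scope: m[1] as DispatchScope, unitId: m[2] } : null;
}

console.log(workerInboxName('slice', 'M01/S01'));               // → dispatch:slice:M01/S01
console.log(JSON.stringify(parseInboxName('dispatch:milestone:M01'))); // → {"scope":"milestone","unitId":"M01"}
```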

### MessageBus event taxonomy

```ts
// Worker → Coordinator messages (sent from the worker, delivered to the coordinator's AgentInbox)
type WorkerStatusMessage =
  | { type: 'worker.started', milestoneId: string, sliceId?: string, pid: number, worktreePath: string }
  | { type: 'worker.heartbeat', milestoneId: string, cost: number, currentUnit?: string }
  | { type: 'worker.completed', milestoneId: string, exitCode: number, totalCost: number }
  | { type: 'worker.error', milestoneId: string, error: string }
  | { type: 'worker.paused', milestoneId: string }
  | { type: 'worker.resumed', milestoneId: string };

// Coordinator → Worker messages (sent from the coordinator, delivered to the worker's AgentInbox)
type CoordinatorCommandMessage =
  | { type: 'coordinator.pause' }
  | { type: 'coordinator.resume' }
  | { type: 'coordinator.stop' }
  | { type: 'coordinator.status_request' };

// Broadcast messages (coordinator → all workers)
type BroadcastMessage =
  | { type: 'coordinator.budget_exceeded', ceiling: number }
  | { type: 'coordinator.sibling_failed', triggeringWorkerId: string };
```
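On the worker side, the command union lends itself to an exhaustive switch. A sketch — `handleCommand` and the state shape are assumptions for illustration, not the real worker loop:

```typescript
// Sketch: a worker applying coordinator commands to its local state.
type CoordinatorCommandMessage =
  | { type: 'coordinator.pause' }
  | { type: 'coordinator.resume' }
  | { type: 'coordinator.stop' }
  | { type: 'coordinator.status_request' };

interface WorkerState { paused: boolean; running: boolean }

function handleCommand(msg: CoordinatorCommandMessage, state: WorkerState): WorkerState {
  switch (msg.type) {
    case 'coordinator.pause': state.paused = true; break;
    case 'coordinator.resume': state.paused = false; break;
    case 'coordinator.stop': state.running = false; break;
    case 'coordinator.status_request': /* reply with a worker.heartbeat */ break;
  }
  return state;
}

console.log(JSON.stringify(handleCommand({ type: 'coordinator.pause' }, { paused: false, running: true })));
// → {"paused":true,"running":true}
```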

### Worker bootstrap changes

Today, a worker (`sf headless --json autonomous` in a worktree) has no MessageBus integration. It writes status files that the coordinator polls. After Q2:

```ts
// In the worker bootstrap (inside the sf headless autonomous process)
const workerId = `dispatch:${scope}:${unitId}`;
const inbox = dispatchLayer.bus.getInbox(workerId);

// On each dispatch tick, check the inbox for coordinator messages
const messages = inbox.list(true); // unreadOnly
for (const msg of messages) {
  if (msg.body.type === 'coordinator.stop') {
    // graceful shutdown
    break;
  }
  inbox.markRead(msg.id);
}
```

### File-based IPC as fallback

`session-status-io.js` stays for **crash recovery**: if a coordinator restarts and workers are still running, the coordinator reads `session-status.json` files to restore state. This is already implemented and correct — it just stops being the *primary* coordination path.

### Code references for the existing file-based IPC

- `session-status-io.js:writeSessionStatus()` — atomic JSON write to `.sf/parallel/<mid>.status.json`
- `session-status-io.js:sendSignal()` — atomic JSON write to `.sf/parallel/<mid>.signal.json`
- `parallel-orchestrator.js:refreshWorkerStatuses()` — polls all status files every dashboard refresh cycle
- `parallel-orchestrator.js:processWorkerLine()` — parses NDJSON from worker stdout, updates status

---

## Q3: DB Access Matrix

### The single-writer invariant

`sf-db.js` enforces that **only this file** issues write SQL (INSERT/UPDATE/DELETE) against `.sf/sf.db`. All other modules must call the typed wrappers it exports. This is checked in CI by `tests/single-writer-invariant.test.ts`.

### DB access per dispatch configuration

| Dispatch config | Project DB (`.sf/sf.db`) | Global DB (`~/.sf/sf.db`) | Notes |
|-----------------|--------------------------|----------------------------|-------|
| `isolation:full, scope:milestone` | **read/write** | read | Workers open a WAL connection to the project DB via `syncSfStateToWorktree` |
| `isolation:full, scope:slice` | **read/write** | read | Same as above |
| `isolation:full, scope:inline` | **read/write** | read | `sf headless autonomous` running in the same process (the UOK kernel's autonomous mode) |
| `isolation:constrained, scope:inline` | **read via prompt injection only** | read/write | Subagent spawns the `sf` CLI, which opens `~/.sf/sf.db` only; the project DB is accessed via prompt context injection |
| `isolation:constrained, scope:task` | none | read/write | Ephemeral task dispatch (future) |

### How constrained isolation is enforced

The subagent tool (`extensions/subagent/index.js`) spawns the `sf` CLI as a **separate OS process**:

```ts
// subagent/index.js — runSingleAgent()
const child = spawn(launchSpec.command, launchSpec.args, {
  cwd: cwd ?? defaultCwd,
  env: launchSpec.env, // inherits parent env but NOT the RPC connection
  shell: false,
  stdio: ["ignore", "pipe", "pipe"],
});
```

The spawned `sf` CLI opens its **own** SQLite connection — by default `~/.sf/sf.db` (global). Project DB access happens only through **prompt injection** (`system-context.js` assembles project context into the system prompt).

The 4-tool registry (`subagent`, `await_subagent`, `cancel_subagent`, background job tools) is enforced by the extension manifest's `tools[]` array, not by DB permissions.

### The access contract for constrained dispatch

```ts
// In dispatch-layer.js — formalize the access contract
const DISPATCH_DB_ACCESS = {
  full: {
    read: ['project .sf/sf.db', 'global ~/.sf/sf.db'],
    write: ['project .sf/sf.db', 'global ~/.sf/sf.db'],
  },
  constrained: {
    read: ['project context via prompt injection only'],
    write: ['global ~/.sf/sf.db only'],
  },
} as const;
```
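A contract like this could also back a runtime guard. The sketch below reads the rule off the table; `canWriteProjectDb` is a hypothetical helper, not an existing API:

```typescript
// Sketch: derive a write-permission check from the declarative access contract.
const DISPATCH_DB_ACCESS = {
  full: { write: ['project .sf/sf.db', 'global ~/.sf/sf.db'] },
  constrained: { write: ['global ~/.sf/sf.db only'] },
} as const;

type Isolation = keyof typeof DISPATCH_DB_ACCESS;

function canWriteProjectDb(isolation: Isolation): boolean {
  // Only targets that name the project DB grant project writes.
  return DISPATCH_DB_ACCESS[isolation].write.some(t => t.startsWith('project'));
}

console.log(canWriteProjectDb('full'));        // → true
console.log(canWriteProjectDb('constrained')); // → false
```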

### Future: RPC-mode subagent (constrained + managed)

Today: `isolation:constrained, coordination:standalone`.

New mode: `isolation:constrained, coordination:managed`.

A subagent with `coordination:'managed'` gets an `AgentInbox` but still cannot write to the project DB. This enables long-running subagents to receive pause/stop messages from the coordinator without gaining DB write access.
---

## Q4: Coordinator Pattern with Coordinators on MessageBus

### Answer: Coordinators ARE MessageBus agents. Debate and chain work differently.

The coordinator is not a separate process — it is a **role** that a MessageBus agent plays when it initiates dispatch and monitors replies.

### Two coordinator patterns

**Pattern A — UOK Kernel as Coordinator (full-tool, managed, milestone/slice scope)**

The UOK kernel initializes a `DispatchLayer` with `coordination:'managed'`. Each worker has an `AgentInbox` named `dispatch:milestone:<mid>`. The kernel subscribes to all worker inboxes via the shared `MessageBus`.

```
UOK Kernel (coordinatorId: "dispatch:coordinator:<runId>")
│
├── MessageBus.send("dispatch:coordinator:<runId>", "dispatch:milestone:M01", { type: 'coordinator.status_request' })
├── MessageBus.send("dispatch:coordinator:<runId>", "dispatch:milestone:M02", { type: 'coordinator.status_request' })
│
│   Workers each poll their AgentInbox:
│     AgentInbox("dispatch:milestone:M01").list()
│     AgentInbox("dispatch:milestone:M02").list()
│   ...and reply with worker.* messages (worker.started, worker.heartbeat, worker.completed)
│   addressed to the coordinator's inbox.
│
├── UOK kernel processes milestone completions
└── Calls dispatchLayer.stop() on autonomous loop exit
```

**Pattern B — Subagent Tool as Coordinator (constrained, standalone or managed, inline scope)**

The subagent tool itself is the coordinator for its `parallel`, `debate`, and `chain` modes. It does **not** use MessageBus for coordination in `standalone` mode — it uses `Promise.all` + in-process event streaming.

In `managed` mode (new), the subagent tool would also have an `AgentInbox` so the parent TUI session can send it pause/stop messages.

### How debate mode works (subagent tool, NOT using MessageBus for agent coordination)

The subagent tool's debate mode (`subagent/index.js:executeSubagentInvocation()`, line 320) runs multiple agents **sequentially within a single process** using `mapWithConcurrencyLimit(MAX_CONCURRENCY=4)`:

```ts
// subagent/index.js — debate mode
for (let round = 1; round <= rounds; round++) {
  for (let i = 0; i < batchTasks.length; i++) {
    // buildDebatePrompt() injects prior round transcripts
    const prompt = buildDebatePrompt(task, round, transcriptEntries.join("\n\n"));
    const result = await runSingleAgent(..., prompt, ...);
    debateResults[(round - 1) * batchTasks.length + i] = result;
    transcriptEntries.push(formatResult(result));
  }
}
```
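A toy, stubbed version of that round loop (no real agents, no SF APIs) makes the sequencing constraint visible: each call sees the transcript accumulated by every earlier call, so the rounds cannot run truly in parallel:

```typescript
// Toy debate loop: stubAgent stands in for runSingleAgent(); the growing
// transcript is the shared state that forces sequential execution.
const agents = ['critic', 'advocate'];
const rounds = 2;
const transcript: string[] = [];

function stubAgent(name: string, round: number, priorTranscript: string): string {
  return `${name} r${round} (saw ${priorTranscript.length} chars)`;
}

for (let round = 1; round <= rounds; round++) {
  for (const name of agents) {
    const result = stubAgent(name, round, transcript.join('\n'));
    transcript.push(result);
  }
}

console.log(transcript.length); // → 4 (2 agents × 2 rounds)
console.log(transcript[0]);     // → critic r1 (saw 0 chars)
```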

This is **not** MessageBus-based because:

1. Agents in a debate share a **single conversation transcript** — they must run sequentially and pass state through the coordinator's memory.
2. True process-level parallelism would require separate conversation contexts, which breaks the shared-transcript model.
3. The coordinator IS the single-agent orchestrator that sequences rounds and injects transcripts.

### How chain mode works (subagent tool)

Chain mode (`subagent/index.js:executeSubagentInvocation()`, line 220) runs steps **sequentially**, passing output as `{previous}`:

```ts
let previousOutput = "";
for (let i = 0; i < params.chain.length; i++) {
  const step = params.chain[i];
  const taskWithContext = step.task.replace(/\{previous\}/g, previousOutput);
  const result = await runSingleAgent(..., taskWithContext, ...);
  results.push(result);
  previousOutput = getFinalOutput(result.messages);
}
```
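A toy, stubbed version of the same loop (no real agents) shows the `{previous}` substitution end to end:

```typescript
// Toy chain loop: each step's task template has {previous} replaced with the
// prior step's output. stubAgent stands in for runSingleAgent().
interface Step { agent: string; task: string }

const chain: Step[] = [
  { agent: 'scout', task: 'list the files' },
  { agent: 'reviewer', task: 'review this: {previous}' },
];

function stubAgent(step: Step, task: string): string {
  return `${step.agent} did: ${task}`;
}

let previousOutput = '';
const results: string[] = [];
for (const step of chain) {
  const taskWithContext = step.task.replace(/\{previous\}/g, previousOutput);
  const result = stubAgent(step, taskWithContext);
  results.push(result);
  previousOutput = result;
}

console.log(results[1]); // → reviewer did: review this: scout did: list the files
```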

**Chain does NOT need MessageBus** — it's purely sequential, and the coordinator (subagent tool) holds state in memory.

### When MessageBus IS used for coordinator ↔ workers

| Mode | MessageBus needed? | Why |
|------|--------------------|-----|
| `single` (inline) | No | One agent, no coordination needed |
| `parallel` (inline, subagent) | No | Coordinator uses `Promise.all`, in-process |
| `parallel` (milestone/slice, WorktreeOrchestrator) | **Yes** | Workers are separate processes; coordinator needs durable signaling |
| `debate` | No | Sequential rounds with in-memory transcript; process-level parallelism defeats shared context |
| `chain` | No | Purely sequential with in-memory `{previous}` injection |
| `managed` subagent | **Yes** | Parent TUI needs to send pause/stop to long-running subagent |

### The coordinator IS a MessageBus agent

```ts
// In DispatchLayer constructor
this.coordinatorInbox = this.bus.getOrCreateInbox(`dispatch:coordinator:${runId}`);

// When a worker wants to send to the coordinator:
this.bus.send(workerId, coordinatorId, { type: 'worker.completed', ... });

// When coordinator wants to send to a worker:
this.bus.send(coordinatorId, workerId, { type: 'coordinator.pause' });
```
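For intuition, here is a minimal in-memory model of the send/list/markRead semantics. The real MessageBus is SQLite-backed and durable, and none of these names are the actual API — this only illustrates the `(from, to, body)` addressing and per-agent inboxes:

```typescript
// Toy, in-memory model of the MessageBus addressing semantics.
interface Message { id: number; from: string; to: string; body: unknown; read: boolean }

class ToyBus {
  private inboxes = new Map<string, Message[]>();
  private nextId = 1;

  getOrCreateInbox(agentId: string): Message[] {
    if (!this.inboxes.has(agentId)) this.inboxes.set(agentId, []);
    return this.inboxes.get(agentId)!;
  }
  send(from: string, to: string, body: unknown): void {
    // Delivery always lands in the *recipient's* inbox.
    this.getOrCreateInbox(to).push({ id: this.nextId++, from, to, body, read: false });
  }
  list(agentId: string, unreadOnly = false): Message[] {
    return this.getOrCreateInbox(agentId).filter(m => !unreadOnly || !m.read);
  }
  markRead(agentId: string, id: number): void {
    const msg = this.getOrCreateInbox(agentId).find(m => m.id === id);
    if (msg) msg.read = true;
  }
}

const bus = new ToyBus();
bus.send('dispatch:milestone:M01', 'dispatch:coordinator:run1', { type: 'worker.completed' });
const [msg] = bus.list('dispatch:coordinator:run1', true);
console.log((msg.body as { type: string }).type); // → worker.completed
bus.markRead('dispatch:coordinator:run1', msg.id);
console.log(bus.list('dispatch:coordinator:run1', true).length); // → 0
```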
---

## Q5: Migration from Today to Unified System

### Principle: Never break existing workflows. Build the new system alongside the old.

### Migration strategy: Strangler Fig

The old dispatch mechanisms are replaced one at a time, with the unified `DispatchLayer` absorbing their responsibilities. External behavior (CLI flags, command handlers, dashboard output) stays identical throughout.

### Phase 1 — Extract DispatchLayer (week 1-2)

**Goal**: Create `dispatch-layer.js` without changing behavior.

```ts
// New: src/resources/extensions/sf/dispatch-layer.js
// Internal implementation: delegates to existing parallel-orchestrator.js
// Public API: the DispatchOptions interface above

// parallel-orchestrator.js becomes a thin wrapper:
export async function startParallel(basePath, milestoneIds, prefs) {
  const layer = new DispatchLayer(basePath);
  return layer.dispatchMilestones(milestoneIds, {
    isolation: 'full',
    coordination: 'standalone',
    scope: 'milestone',
    mode: 'parallel',
    ...prefs,
  });
}
```

**Files touched**:
- New: `src/resources/extensions/sf/dispatch-layer.js` (~600 LOC, merged from both orchestrators)
- `parallel-orchestrator.js` — refactored to delegate to DispatchLayer
- `slice-parallel-orchestrator.js` — refactored to delegate to DispatchLayer

**Test**: All parallel and slice-parallel tests pass. Dashboard shows the same worker states.

### Phase 2 — Wire MessageBus for coordinator → worker signaling (week 2-3)

**Goal**: Workers get an `AgentInbox`; the coordinator sends pause/resume/stop via MessageBus.

```ts
// In dispatch-layer.js — start() method
async start(ids: string[], opts: DispatchOptions) {
  // For each worker, create an AgentInbox
  for (const id of ids) {
    const workerId = `dispatch:${opts.scope}:${id}`;
    this.bus.getOrCreateInbox(workerId);
    // ... spawn worker with MessageBus integration
  }
}

// Worker bootstrap (sf headless autonomous process)
// On each dispatch tick:
const inbox = dispatchLayer.bus.getInbox(workerId);
for (const msg of inbox.list(true)) { // unreadOnly
  handleCoordinatorMessage(msg.body);
  inbox.markRead(msg.id);
}
```

**Files touched**:
- `dispatch-layer.js` — add MessageBus send on start/pause/resume/stop
- Worker NDJSON bootstrap in `parallel-orchestrator.js` and `slice-parallel-orchestrator.js` — add the inbox polling loop

**Test**: Workers respond to MessageBus pause/resume messages. File-based IPC (`session-status-io.js`) still works as the crash-recovery fallback.

### Phase 3 — Subagent gets optional MessageBus inbox (week 3)

**Goal**: Subagents can opt in to `coordination:'managed'` for long-running tasks.

```ts
// In subagent tool params — new field
const SubagentParams = Type.Object({
  // ... existing fields ...
  managed: Type.Optional(Type.Boolean({
    description: 'Give this subagent a MessageBus AgentInbox so the coordinator can send pause/stop messages.',
    default: false,
  })),
});
```

**Files touched**:
- `extensions/subagent/index.js` — add the `managed` parameter
- When `managed: true`, spawn `sf headless` (not the `sf` CLI) so it can receive MessageBus messages

**Test**: A long-running subagent receives a coordinator pause message via MessageBus.

### Phase 4 — UOK kernel adopts DispatchLayer (week 4)

**Goal**: UOK kernel calls `DispatchLayer` instead of directly managing parallel workers.

Today: `uok/kernel.js` calls `parallel-orchestrator.js` via a separate import.

After: `uok/kernel.js` calls `DispatchLayer`, which owns the worktree pool.

**Files touched**:
- `uok/kernel.js` — replace `import { startParallel } from '../parallel-orchestrator.js'` with `import { DispatchLayer }`
- `dispatch-layer.js` — add `useExecutionGraph` integration so the UOK kernel's dispatch decisions use the file-conflict DAG

**Test**: `sf headless autonomous` with parallel milestones works identically to current behavior.

### Phase 5 — Deprecate file-based IPC paths (week 5-6)

**Goal**: MessageBus becomes primary; file-based IPC becomes a pure fallback.

After Phase 2, file-based IPC (`session-status-io.js`) is still written by workers for crash recovery. After Phase 5, the coordinator **stops reading** status files on the primary path — it only reads them on startup if MessageBus has no worker state.

**Files touched**:
- `dispatch-layer.js` — change `refreshWorkerStatuses()` to read from MessageBus first, falling back to `session-status-io.js` only if no MessageBus state is found
- `session-status-io.js` — keep as crash-recovery-only, mark as `@deprecated`

**Test**: Crash recovery still works. The primary path uses MessageBus.

### Phase 6 — Subagent RPC mode (week 6-7)

**Goal**: Constrained subagents can gain full tool access by running as a headless RPC client.

Today, a constrained subagent spawns the `sf` CLI (the full binary). New mode: spawn `sf headless` as an RPC client.

```ts
// In dispatch-layer.js
async dispatch(opts: DispatchOptions): Promise<DispatchResult> {
  if (opts.isolation === 'constrained' && opts.rpcMode) {
    return this.dispatchAsRpcClient(opts); // calls sf headless RPC
  }
  // ...
}
```

**Files touched**:
- `extensions/subagent/index.js` — add the `rpcMode: true` path
- New: `extensions/subagent/rpc-client.js`

**Test**: A subagent with `rpcMode: true` can call `complete-task`.

### Phase 7 — Naming cleanup (week 7)

**Goal**: Reflect the unified model in file names.

- `dispatch-layer.js` → `worktree-orchestrator.js` (or keep `dispatch-layer.js` if preferred — the name is already clear)
- Update all import paths
---

## Q6: Implementation Order

### The correct build sequence (not the same as the migration order)

The key constraint: the **subagent tool must stay stable** throughout, because it is the primary user-facing tool. The UOK kernel and parallel orchestrator are internal — they can change more aggressively.

### Order:

```
1. Extract DispatchLayer (week 1-2)
   → No behavior change. Parallel/slice orchestrators delegate to it.
   → Test: all existing parallel/slice tests pass.

2. Subagent RPC mode (week 3)
   → Most impactful user-facing improvement.
   → Subagent with RPC mode can call complete-task and other SF tools.
   → Isolated from the rest of the refactor — subagent is its own path.

3. MessageBus wiring in DispatchLayer (week 4)
   → Coordinator → workers via MessageBus, not file IPC.
   → Worker bootstrap gets inbox polling.
   → File-based IPC becomes fallback only.

4. UOK kernel adopts DispatchLayer (week 5)
   → UOK kernel is internal. Breaks less if changed late.
   → After this, the UOK kernel is the coordinator for autonomous mode.

5. Subagent managed mode (week 6)
   → Optional MessageBus inbox for subagents.
   → Parent TUI can pause/stop long-running subagents.

6. Deprecate file-based IPC (week 7)
   → MessageBus becomes the only primary path.
   → session-status-io.js kept for crash recovery only.

7. Naming cleanup + Cmux decoupling (week 8)
   → Remove cmuxSplitsEnabled coupling from subagent tool.
   → Cmux subscribes to MessageBus dispatch events.
```

### Why this order

1. **Subagent RPC mode early** — highest user impact with lowest risk. The subagent is a separate dispatch path; changes don't affect the parallel orchestrator or the UOK kernel.
2. **MessageBus wiring before UOK kernel adoption** — the UOK kernel is the most complex consumer. We want MessageBus as the backbone *before* we hook the UOK kernel to it.
3. **UOK kernel adoption late** — it's internal infrastructure. Changing it last means we've already validated the DispatchLayer API in production-like conditions (parallel orchestrator + subagent).
4. **Cmux decoupling last** — it's UI, not dispatch. It can follow once the dispatch architecture is stable.
---

## Key Code References

### Parallel Orchestrator
- `src/resources/extensions/sf/parallel-orchestrator.js` — 820 lines, manages milestone workers
- `startParallel()` — spawns `sf headless --json autonomous` in worktrees
- `spawnWorker()` — sets `SF_MILESTONE_LOCK`, `SF_PROJECT_ROOT`, `SF_PARALLEL_WORKER` env vars
- `processWorkerLine()` — parses NDJSON, extracts cost from `message_end` events
- `refreshWorkerStatuses()` — polls `session-status-io.js` for worker state

### Slice Parallel Orchestrator
- `src/resources/extensions/sf/slice-parallel-orchestrator.js` — 90% identical to parallel-orchestrator.js
- Key diff: sets `SF_SLICE_LOCK` + `SF_MILESTONE_LOCK` env vars
- Calls `filterConflictingSlices()` from `slice-parallel-conflict.ts`

### Session Status (File-based IPC)
- `src/resources/extensions/sf/session-status-io.js` — 150 lines
- `writeSessionStatus()` — atomic write to `.sf/parallel/<mid>.status.json`
- `sendSignal()` / `consumeSignal()` — pause/resume/stop via `.sf/parallel/<mid>.signal.json`

### Subagent Tool
- `src/resources/extensions/subagent/index.js` — 2700 lines
- `runSingleAgent()` — spawns the `sf` CLI, parses NDJSON events
- `executeSubagentInvocation()` — handles single/parallel/debate/chain modes
- `mapWithConcurrencyLimit()` — in-process concurrency for parallel/debate modes
- No DB tools registered (only 4 tools in the extension manifest)

### Subagent Inheritance (DB Access Contract)
- `src/resources/extensions/sf/subagent-inheritance.js` — 220 lines
- `buildSubagentInheritanceEnvelope()` — captures parent mode for subagent dispatch
- `validateSubagentDispatch()` — rejects subagents that bypass provider allowlists

### MessageBus
- `src/resources/extensions/sf/uok/message-bus.js` — 280 lines
- `MessageBus.send()` — SQLite-backed durable send to an AgentInbox
- `AgentInbox` — per-agent durable inbox with TTL and retention

### UOK Kernel
- `src/resources/extensions/sf/uok/kernel.js` — 220 lines
- `runAutoLoopWithUok()` — the autonomous loop entry point
- Currently calls `parallel-orchestrator.js` separately, not through a unified dispatch layer

### Execution Graph (Constraint Solver)
- `src/resources/extensions/sf/uok/execution-graph.js`
- `selectConflictFreeBatch()` — picks a conflict-free parallel subset from the file overlap DAG
- Already used by the parallel orchestrator; should also be used by slice-parallel and the UOK kernel

### DB Schema
- `src/resources/extensions/sf/sf-db.js` — single-writer invariant
- `milestones` table — `id, title, status, created_at, ...`
- `slices` table — `milestone_id, id, title, status, ...`
- `tasks` table — `milestone_id, slice_id, id, status, ...`
- `milestone_specs`, `slice_specs`, `task_specs` — immutable spec records
- `milestone_evidence`, `slice_evidence`, `task_evidence` — append-only audit trail

### Parallel Intent (File Claim Registry)
- `src/resources/extensions/sf/parallel-intent.js` — 170 lines
- `declareIntent()` — worker announces file intent before editing
- Uses `UokCoordinationStore` (Redis-like, on SQLite) for TTL-based claims

---

## Summary

**Q1 — Unified API**: One `dispatch()` function with four parameters: `isolation × coordination × scope × mode`. The current 5 mechanisms collapse into one `DispatchLayer` class.

**Q2 — MessageBus backbone**: YES. All coordinator ↔ worker communication flows through MessageBus. File-based IPC (`session-status-io.js`) becomes a crash-recovery fallback only.

**Q3 — DB access matrix**: `isolation:full` → project DB read/write. `isolation:constrained` → no project DB writes, reads via prompt injection only. The global DB is always accessible. Enforced by the process boundary (spawned CLI) and the extension manifest's tools array.

**Q4 — Coordinator pattern**: Coordinators ARE MessageBus agents. The UOK kernel gets a `DispatchLayer` coordinator inbox. Debate/chain modes do NOT use MessageBus — they are sequential, in-memory coordination; subagent parallel mode is likewise in-process (`Promise.all`). MessageBus is for **cross-process** coordination only.

**Q5 — Migration**: Strangler Fig pattern. Extract `DispatchLayer` first (no behavior change). Then wire MessageBus. Then the UOK kernel adopts it. Subagent RPC mode is independent and can ship early.

**Q6 — Implementation order**:
1. Extract `DispatchLayer` (foundational, no behavior change)
2. Subagent RPC mode (highest impact, lowest risk)
3. Wire MessageBus into `DispatchLayer`
4. UOK kernel adopts `DispatchLayer`
5. Subagent managed mode
6. Deprecate file-based IPC
7. Cmux decoupling + naming cleanup

**Total: ~8 weeks**, sequenced to never break existing workflows.