ADR-009: Unified Orchestration Kernel Refactor
Status: Proposed
Date: 2026-04-14
Deciders: Jeremy McSpadden, SF Core Team
Related: ADR-001 (worktree architecture), ADR-003 (pipeline simplification), ADR-004 (capability-aware routing), ADR-005 (multi-provider strategy), ADR-008 (tools over MCP)
Context
SF already ships many advanced features:
- dynamic model routing and multi-provider support
- hooks (pre_dispatch_hooks, post_unit_hooks)
- subagents and parallel execution
- worktree/branch isolation and automated git flows
- per-unit metrics and cost ledgers
- activity logs and structured journal events
- verification retries and failure recovery
The current limitation is not missing capability. The limitation is distribution of control logic across large, mixed-concern modules, especially in auto-mode and related orchestration files. This raises change risk, creates duplicated policy paths, and slows the introduction of stronger guarantees.
The target requirements for the next architecture are:
- User can use any available model during any phase.
- First-class hooks, agents, sub-agents, team execution, and parallel workflows.
- Git actions on every turn with deterministic, auditable behavior.
- Logging of every action with causal traceability.
- Long upfront planning via multi-round questioning and research.
- Plan slicing and controlled dispatch through strict gate validation.
- Deterministic failure reprocessing loops.
- Automatic testing during build and gate transitions.
- Explicit token usage controls including a high-burn mode.
- Enforced compliance with provider/model terms of service.
Decision
Refactor SF into a Unified Orchestration Kernel (UOK) with explicit control planes, typed contracts, and an incremental strangler migration. This is a staged architectural replacement of orchestration internals, not a rewrite of user-facing CLI/web/MCP surfaces.
Core Architectural Model
The orchestrator is split into six control planes:
- Plan Plane
- Execution Plane
- Model Plane
- Gate Plane
- GitOps Plane
- Audit Plane
Each dispatched unit (turn) executes through a single deterministic pipeline:
Discover/Clarify/Research -> Plan Compile -> Model Select -> Execute -> Validate -> Git Transaction -> Persist Audit -> Next Unit
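The fixed stage order above can be sketched as a small runner. This is a minimal illustration, not SF's actual API: the stage identifiers, TurnContext, and runTurn are hypothetical names, and each handler is assumed to return false when the turn must stop.

```typescript
// Hypothetical stage names mirroring the pipeline above; not SF's actual API.
type Stage =
  | "discover" | "plan-compile" | "model-select" | "execute"
  | "validate" | "git-transaction" | "persist-audit";

const STAGES: readonly Stage[] = [
  "discover", "plan-compile", "model-select", "execute",
  "validate", "git-transaction", "persist-audit",
];

interface TurnContext {
  turnId: string;
  completed: Stage[];
}

// A handler returns false to stop the turn (assumed convention).
type StageHandler = (ctx: TurnContext) => boolean;

// Runs stages in a fixed order and stops at the first handler that returns false.
function runTurn(
  turnId: string,
  handlers: Partial<Record<Stage, StageHandler>>,
): TurnContext {
  const ctx: TurnContext = { turnId, completed: [] };
  for (const stage of STAGES) {
    const handler = handlers[stage];
    const ok = handler ? handler(ctx) : true;
    if (!ok) break;
    ctx.completed.push(stage);
  }
  return ctx;
}
```

Because the order lives in one constant, every unit takes the same path, and a failed stage leaves a record of exactly which stages completed.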
Detailed Design
1) Plan Plane: Multi-Round Front-Loaded Planning
Add a formal planning lifecycle:
- discover: codebase and state scan
- clarify: multi-round user questions (bounded rounds, explicit stop condition)
- research: internal and external synthesis
- draft-plan: produce full roadmap and milestones
- compile: slice into executable units with IO boundaries
- plan-gate: reject/repair invalid plans before execution starts
Required outputs:
- ROADMAP.md (complete)
- per-milestone slice graph
- per-task executable unit specs
- requirement trace matrix (requirement -> unit(s) -> verification)
- plan risk register
The plan gate fails closed on any of the following:
- missing acceptance criteria
- missing verification strategy
- cyclic task dependencies
- unowned artifacts
- missing rollback/recovery semantics for risky units
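A fail-closed plan gate along these lines might look like the sketch below. The UnitSpec shape and the planGate function are assumptions made for illustration. The unowned-artifact check is omitted because the ADR does not define artifact ownership, and the unknown-dependency check is an extra that the list above does not mention.

```typescript
// Hypothetical unit spec shape; field names are assumptions for illustration.
interface UnitSpec {
  id: string;
  dependsOn: string[];
  acceptanceCriteria: string[];
  verification: string[]; // check names or commands
  risky: boolean;
  rollback?: string; // expected when risky is true
}

// Returns the reasons the plan is rejected; an empty list means the gate passes.
function planGate(units: UnitSpec[]): string[] {
  const errors: string[] = [];
  const byId = new Map(units.map((u) => [u.id, u] as const));

  for (const u of units) {
    if (u.acceptanceCriteria.length === 0) errors.push(`${u.id}: missing acceptance criteria`);
    if (u.verification.length === 0) errors.push(`${u.id}: missing verification strategy`);
    if (u.risky && !u.rollback) errors.push(`${u.id}: missing rollback/recovery semantics`);
    for (const dep of u.dependsOn) {
      // Extra check, not listed in the ADR: dependencies must exist.
      if (!byId.has(dep)) errors.push(`${u.id}: unknown dependency ${dep}`);
    }
  }

  // Depth-first search; reaching a node that is still "visiting" means a cycle.
  const state = new Map<string, "visiting" | "done">();
  const visit = (id: string): boolean => {
    if (state.get(id) === "done") return false;
    if (state.get(id) === "visiting") return true;
    state.set(id, "visiting");
    const deps = byId.get(id)?.dependsOn ?? [];
    const cyclic = deps.some((d) => byId.has(d) && visit(d));
    state.set(id, "done");
    return cyclic;
  };
  if (units.some((u) => visit(u.id))) errors.push("cyclic task dependencies");

  return errors;
}
```

Returning a list of reasons rather than a boolean keeps the rejection explainable, which fits the reject/repair role of the plan-gate step.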
2) Execution Plane: Agents, Sub-Agents, Teams, Parallel
Unify all execution into a typed DAG scheduler.
Node kinds:
- unit (single execution task)
- hook
- subagent
- team-worker
- verification
- reprocess
Edges express:
- hard dependencies
- resource conflicts (file-level IO locks)
- ordering constraints (gate-before-merge, test-before-closeout)
Execution modes:
- single-worker deterministic mode
- multi-worker parallel mode
- team mode (shared repo, unique milestone IDs, gated merge)
This removes ad-hoc parallel behavior and makes sub-agent and team paths first-class scheduler decisions.
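One way to picture the scheduler is as a function that groups graph nodes into waves. The sketch below is an assumption-laden illustration: GraphNode, its writes field (standing in for file-level IO locks), and scheduleWaves are hypothetical, and ordering constraints are modeled as ordinary dependencies.

```typescript
// Hypothetical node shape; kinds mirror the node kinds listed above.
type NodeKind = "unit" | "hook" | "subagent" | "team-worker" | "verification" | "reprocess";

interface GraphNode {
  id: string;
  kind: NodeKind;
  dependsOn: string[];
  writes: string[]; // file paths this node locks while it runs
}

// Groups nodes into waves. Every node in a wave has all dependencies satisfied
// by earlier waves, and no two nodes in a wave write the same file.
// maxWorkers = 1 gives the single-worker deterministic mode.
function scheduleWaves(nodes: GraphNode[], maxWorkers: number): string[][] {
  const done = new Set<string>();
  // Sort by id so the schedule is deterministic for a given graph.
  const pending = [...nodes].sort((a, b) => a.id.localeCompare(b.id));
  const waves: string[][] = [];

  while (pending.length > 0) {
    const wave: GraphNode[] = [];
    const locked = new Set<string>();
    for (const node of pending) {
      if (wave.length >= maxWorkers) break;
      const ready = node.dependsOn.every((d) => done.has(d));
      const conflict = node.writes.some((f) => locked.has(f));
      if (ready && !conflict) {
        wave.push(node);
        node.writes.forEach((f) => locked.add(f));
      }
    }
    if (wave.length === 0) {
      throw new Error("unschedulable graph: cycle or missing dependency");
    }
    for (const n of wave) {
      done.add(n.id);
      pending.splice(pending.indexOf(n), 1);
    }
    waves.push(wave.map((n) => n.id));
  }
  return waves;
}
```

With this shape, the three execution modes differ only in scheduler parameters, not in separate code paths.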
3) Model Plane: Any Model in Any Phase
Replace rigid phase->model assumptions with requirement-based eligibility.
Selection pipeline:
- gather phase/unit requirements (capabilities, context size, latency profile)
- gather eligible models from configured providers
- apply hard policy filters (provider auth, TOS, tool compatibility, org rules)
- apply soft scoring (capability vectors, budget profile, historical outcomes)
- choose primary + fallback chain
Rules:
- Any model can run any phase if it passes policy and capability constraints.
- User pins remain hard ceilings only when configured explicitly.
- Unknown models are allowed with conservative default capability scores.
Add model intent profiles:
- economy (lowest cost)
- balanced
- quality
- burn-max (highest compute/token burn within policy and budget limits)
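The selection pipeline and intent profiles can be sketched as filter-then-score. Everything here is illustrative: ModelInfo, the cost weights, and the UNKNOWN_CAPABILITY default of 0.3 are assumed values, and the hard policy filters are collapsed into a single precomputed policyAllowed flag.

```typescript
type Intent = "economy" | "balanced" | "quality" | "burn-max";

// Hypothetical model descriptor; fields and scores are illustrative assumptions.
interface ModelInfo {
  id: string;
  policyAllowed: boolean; // outcome of hard policy filters (auth, TOS, tools, org rules)
  contextTokens: number;
  capability?: number; // 0..1; undefined for unknown models
  costPerMTok: number;
}

interface Requirements {
  minContextTokens: number;
  minCapability: number;
}

// Conservative default for unknown models (assumed value).
const UNKNOWN_CAPABILITY = 0.3;

// Returns model ids in preference order: primary first, then the fallback chain.
function selectModels(models: ModelInfo[], req: Requirements, intent: Intent): string[] {
  const capabilityOf = (m: ModelInfo): number => m.capability ?? UNKNOWN_CAPABILITY;

  // Hard filters run before any scoring.
  const eligible = models.filter(
    (m) =>
      m.policyAllowed &&
      m.contextTokens >= req.minContextTokens &&
      capabilityOf(m) >= req.minCapability,
  );

  // Soft scoring: the intent profile sets how strongly cost counts against a model.
  const costWeight = { economy: 1.0, balanced: 0.5, quality: 0.1, "burn-max": 0 }[intent];
  const maxCost = Math.max(1, ...eligible.map((m) => m.costPerMTok));
  const score = (m: ModelInfo): number =>
    capabilityOf(m) - costWeight * (m.costPerMTok / maxCost);

  return [...eligible].sort((a, b) => score(b) - score(a)).map((m) => m.id);
}
```

Note that no phase name appears anywhere in the function: a model is eligible because it meets the requirements and passes policy, which is the point of dropping phase->model assumptions.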
4) Gate Plane: Controlled Dispatch and Reprocessing
All units pass explicit gates:
- policy-gate (provider/tool/TOS/security checks)
- input-gate (unit contract completeness, artifact readiness)
- execution-gate (runtime guardrails, timeout strategy, tool allowlist)
- artifact-gate (expected outputs and format validation)
- verification-gate (lint/test/typecheck/security checks)
- closeout-gate (state transition safety + git transaction outcome)
Gate outcomes:
- pass
- retryable-fail
- hard-fail
- manual-attention
Failure reprocessing matrix (deterministic):
- code failure -> targeted fix prompt + bounded retry
- test failure -> impacted test fix loop
- tool failure -> alternate tool/provider fallback
- model failure -> fallback model chain
- policy failure -> immediate hard stop and explicit reason
Retry policy:
- bounded attempts per gate
- escalating strategy per attempt
- terminal state persisted with full evidence
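The reprocessing matrix and bounded retry policy can be sketched as a lookup table plus a loop. The strategy identifiers and the runGate function are hypothetical. Mapping an exhausted retry budget to manual-attention is an assumption, since the ADR does not say which terminal outcome applies, and retryable-fail is treated as the intermediate state between attempts.

```typescript
type GateOutcome = "pass" | "retryable-fail" | "hard-fail" | "manual-attention";
type FailureClass = "code" | "test" | "tool" | "model" | "policy";

// Mirrors the reprocessing matrix above; strategy names are illustrative.
const REPROCESS: Record<FailureClass, { strategy: string; retryable: boolean }> = {
  code: { strategy: "targeted-fix-prompt", retryable: true },
  test: { strategy: "impacted-test-fix-loop", retryable: true },
  tool: { strategy: "alternate-tool-or-provider", retryable: true },
  model: { strategy: "fallback-model-chain", retryable: true },
  policy: { strategy: "hard-stop", retryable: false },
};

interface GateResult {
  outcome: GateOutcome;
  strategy?: string;
  attempts: number;
  evidence: string[];
}

// attempt(n) returns null on success or the failure class observed on attempt n.
function runGate(
  attempt: (n: number) => FailureClass | null,
  maxAttempts: number,
): GateResult {
  const evidence: string[] = [];
  for (let n = 1; n <= maxAttempts; n++) {
    const failure = attempt(n);
    if (failure === null) return { outcome: "pass", attempts: n, evidence };
    const rule = REPROCESS[failure];
    evidence.push(`attempt ${n}: ${failure} failure -> ${rule.strategy}`);
    if (!rule.retryable) {
      // Policy failures stop immediately with an explicit reason.
      return { outcome: "hard-fail", strategy: rule.strategy, attempts: n, evidence };
    }
  }
  // Retry budget exhausted: terminal state with the full evidence trail (assumed mapping).
  return { outcome: "manual-attention", attempts: maxAttempts, evidence };
}
```

Because the failure class selects the strategy, the same failure sequence always produces the same sequence of repair actions, which is what makes the loop deterministic.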
5) GitOps Plane: Git Action Every Turn
Every dispatched unit is wrapped in a git transaction:
- turn-start: capture branch/worktree status and dirty-state snapshot
- turn-exec: run unit
- turn-stage: stage relevant changes
- turn-checkpoint: commit checkpoint or structured no-op record
- turn-publish: optional push per policy
- turn-record: write commit metadata into audit ledger
Defaults:
- checkpoint commit each turn in milestone branch/worktree
- squash on milestone merge to keep main history clean
Configurable strictness:
- git.turn_action: commit|snapshot|status-only
- git.turn_push: never|milestone|always
If the repository state blocks a commit (e.g., conflicts), the turn fails at the closeout gate with explicit diagnostics.
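The turn transaction can be sketched against an abstract git port, so the control flow is visible without shelling out. GitPort, TurnRecord, and runGitTurn are hypothetical names. The sketch also makes two simplifying assumptions the ADR does not state: snapshot and status-only both produce a structured no-op record, and the milestone push mode is handled elsewhere at milestone merge.

```typescript
type TurnAction = "commit" | "snapshot" | "status-only";
type TurnPush = "never" | "milestone" | "always";

interface GitTurnConfig {
  turn_action: TurnAction;
  turn_push: TurnPush;
}

// Hypothetical port over git; a real adapter would call the git CLI.
interface GitPort {
  status(): { dirty: boolean; conflicted: boolean };
  stageAll(): void;
  commit(message: string): string; // returns a commit id
  push(): void;
}

interface TurnRecord {
  turnId: string;
  dirtyAtStart: boolean;
  steps: string[];
  commitId?: string;
  noop: boolean;
}

function runGitTurn(
  turnId: string,
  cfg: GitTurnConfig,
  git: GitPort,
  exec: () => void,
): TurnRecord {
  const before = git.status(); // turn-start: dirty-state snapshot
  const rec: TurnRecord = { turnId, dirtyAtStart: before.dirty, steps: ["turn-start"], noop: false };

  exec();
  rec.steps.push("turn-exec");

  const after = git.status();
  if (after.conflicted) {
    throw new Error(`closeout-gate: turn ${turnId} blocked by conflicts`);
  }
  if (cfg.turn_action === "commit" && after.dirty) {
    git.stageAll();
    rec.steps.push("turn-stage");
    rec.commitId = git.commit(`checkpoint: ${turnId}`);
  } else {
    rec.noop = true; // structured no-op record instead of a commit
  }
  rec.steps.push("turn-checkpoint");

  if (cfg.turn_push === "always" && rec.commitId) {
    git.push();
    rec.steps.push("turn-publish");
  }
  rec.steps.push("turn-record"); // caller writes rec into the audit ledger
  return rec;
}
```

Every call returns a TurnRecord whether or not a commit happened, so each turn leaves at least one git action record.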
6) Audit Plane: Log Every Action
Promote current activity/journal into a single causal event model.
Event classes:
- orchestrator (dispatch, gate-result, state-transition)
- model (selection, fallback, provider-switch)
- tool (call, result, error)
- git (status, stage, commit, merge, push)
- test (command, result, retry)
- policy (allow, deny, warning)
- cost (tokens, cost, cache-hit, budget-pressure)
Every event includes:
- eventId
- traceId (session)
- turnId (unit)
- causedBy reference
- timestamp
- durable payload
Storage:
- append-only JSONL + indexed SQLite projection for queryability
- no destructive rewrites of source audit logs
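A possible shape for the event envelope and the append-only log is sketched below. The AuditEvent field names follow the list above, but the class and type fields, the evt-N id scheme, and the AuditLog class are assumptions. An in-memory array of JSON lines stands in for the JSONL file, and the SQLite projection is left out.

```typescript
type EventClass = "orchestrator" | "model" | "tool" | "git" | "test" | "policy" | "cost";

interface AuditEvent {
  eventId: string;
  traceId: string; // session
  turnId: string; // unit
  causedBy: string | null; // eventId of the causing event, if any
  timestamp: string; // ISO 8601
  class: EventClass;
  type: string; // e.g. "dispatch", "commit", "deny"
  payload: Record<string, unknown>;
}

// In-memory stand-in for an append-only JSONL file.
class AuditLog {
  private lines: string[] = [];
  private seq = 0;

  append(e: Omit<AuditEvent, "eventId" | "timestamp">): AuditEvent {
    const event: AuditEvent = {
      ...e,
      eventId: `evt-${++this.seq}`,
      timestamp: new Date().toISOString(),
    };
    this.lines.push(JSON.stringify(event)); // one JSON object per line, never rewritten
    return event;
  }

  // Follows causedBy links from an event back to its root cause.
  causalChain(eventId: string): string[] {
    const events = this.lines.map((l) => JSON.parse(l) as AuditEvent);
    const byId = new Map(events.map((e) => [e.eventId, e] as const));
    const chain: string[] = [];
    let cur = byId.get(eventId);
    while (cur) {
      chain.push(cur.type);
      cur = cur.causedBy ? byId.get(cur.causedBy) : undefined;
    }
    return chain;
  }

  toJsonl(): string {
    return this.lines.join("\n");
  }
}
```

Because events are only ever appended and each one points at its cause, the causal chain for any commit or denial can be rebuilt from the raw log alone; an indexed projection would only make that lookup faster.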
Compliance and TOS Enforcement
Introduce a provider policy engine as a hard dependency of the policy gate.
Provider policy definition includes:
- allowed auth modes
- prohibited token exchange paths
- tool/protocol constraints
- subscription vs API usage boundaries
- model-specific restrictions
Enforcement rules:
- deny disallowed auth/routing before dispatch
- deny model selection if provider constraints are not met
- emit policy evidence events on every allow/deny decision
This formalizes current compliance work (notably Anthropic/Claude Code boundaries) into a reusable engine rather than scattered checks.
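A minimal sketch of such an engine follows. The ProviderPolicy fields cover only two of the definition items above (auth modes and model-specific restrictions), and the provider, auth mode, and model names used with it are placeholders rather than real provider terms. Denying a provider that has no registered policy is an assumption, chosen to match the fail-closed stance of the gates.

```typescript
// Hypothetical policy shape covering a subset of the definition above.
interface ProviderPolicy {
  provider: string;
  allowedAuthModes: string[];
  deniedModels: string[];
}

interface RouteRequest {
  provider: string;
  authMode: string;
  model: string;
}

// The decision doubles as the policy evidence event payload.
interface PolicyDecision {
  decision: "allow" | "deny";
  reason: string;
}

function evaluateRoute(policies: ProviderPolicy[], req: RouteRequest): PolicyDecision {
  const policy = policies.find((p) => p.provider === req.provider);
  // Fail closed (assumption): no registered policy means deny.
  if (!policy) {
    return { decision: "deny", reason: `no policy registered for ${req.provider}` };
  }
  if (!policy.allowedAuthModes.includes(req.authMode)) {
    return { decision: "deny", reason: `auth mode ${req.authMode} not allowed for ${req.provider}` };
  }
  if (policy.deniedModels.includes(req.model)) {
    return { decision: "deny", reason: `model ${req.model} restricted by ${req.provider} policy` };
  }
  return { decision: "allow", reason: "all provider constraints met" };
}
```

Every path returns a reason, so both allow and deny decisions can be emitted as evidence events without extra bookkeeping.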
Automatic Testing Strategy
Testing becomes mandatory at three levels:
- Per-turn: impacted tests + lint/typecheck subset
- Per-slice closeout: full slice verification profile
- Per-milestone closeout: full suite (or policy-defined release profile)
Verification commands become declarative policies keyed by unit type, rather than only ad-hoc lists of shell commands.
Token Strategy and Burn-Max Mode
Existing token optimization modes remain, and an explicit high-burn profile is added alongside them.
burn-max behavior:
- maximize context inclusion
- prefer high-capability models
- enable deeper critique/review passes
- increase planning/research depth
Hard limits still apply:
- budget ceiling and enforcement rules
- provider rate limits
- TOS/policy constraints
The system must never bypass provider restrictions to increase usage.
Migration Plan (Strangler Refactor)
No big-bang rewrite. Migrate in waves with compatibility adapters.
Wave 0: Contracts and Telemetry Baseline
- define turn contract and gate result schemas
- add trace IDs/turn IDs to current paths
- keep behavior unchanged
Wave 1: Gate Plane Extraction
- extract gate runner from auto loop
- route existing checks through unified gate API
Wave 2: Model Plane Unification
- requirement-based model selection
- policy filter insertion before scoring
- preserve existing model config semantics
Wave 3: Scheduler and Execution Graph
- introduce DAG scheduler
- map existing subagent/parallel features to graph nodes
- enable graph mode behind flag
Wave 4: GitOps Transaction Layer
- enforce turn-level git actions
- add deterministic checkpoint behavior
Wave 5: Audit Plane Consolidation
- unify journal/activity/metrics events under common envelope
- add query projection
Wave 6: Plan Plane v2
- multi-round clarify/research planner
- compiled unit graph + plan gate
Wave 7: Legacy Path Retirement
- remove obsolete branches in auto.ts and related modules
- keep CLI/API compatibility
Module Extraction Targets
Primary decomposition targets:
- auto.ts -> orchestrator kernel + adapters
- auto-prompts.ts -> plan compiler + prompt renderers
- state.ts -> state query service + immutable state views
- gsd-db.ts -> data access layer + event projection store
- auto-post-unit.ts / auto-verification.ts -> closeout gate services
Acceptance Criteria
The refactor is accepted when all conditions are true:
- Any configured model can be selected in any phase when policy permits.
- Hooks, agents, sub-agents, teams, and parallel all execute under one scheduler contract.
- Every turn produces at least one git action record and auditable turn closeout.
- Every dispatch and action is traceable by traceId and turnId.
- Multi-round planning produces a full executable unit graph before execution.
- Gate outcomes are explicit, deterministic, and persisted.
- Failure reprocessing uses typed failure classes, not generic retries.
- Automatic tests run per policy on every turn/slice/milestone gate.
- Token usage is tracked at turn granularity with burn-max profile support.
- Policy engine blocks TOS-violating routes and records evidence.
Consequences
Positive
- Stronger reliability through fail-closed gates
- Faster feature delivery by isolating orchestration concerns
- Clear compliance and audit posture
- Better debuggability from causal event logs
- Controlled support for aggressive high-burn workflows
Negative
- Significant migration effort across core modules
- More configuration surface area
- Temporary complexity during dual-path migration
Neutral
- Existing user commands and workflows remain stable during migration
- Existing preferences remain supported with compatibility adapters
Alternatives Considered
A) Full rewrite in a new codebase
Rejected. Too risky for a live project with broad surface area and active releases.
B) Continue incremental patching without architecture split
Rejected. Slows delivery and increases regression risk as orchestration complexity grows.
C) Keep existing optimization-first token model only
Rejected. Does not satisfy explicit requirement for intentional high-burn workflows.
Risks and Mitigations
- Migration regressions
  - Mitigation: golden-path replay tests and shadow mode comparisons per wave.
- Audit log volume growth
  - Mitigation: append-only raw logs plus indexed projections and retention policies.
- Git noise from per-turn commits
  - Mitigation: milestone squash merge defaults and configurable checkpoint modes.
- Provider policy drift
  - Mitigation: versioned provider policy registry with test fixtures per provider.
Open Questions
- Should turn_action: commit be mandatory default for all modes or only auto-mode?
- Should burn-max be opt-in global, project-scoped, or both?
- Should policy violations always halt or allow configurable warn-only mode for local development?
Implementation Note
This ADR intentionally aligns with current architecture principles:
- extension-first where practical
- strong test contracts
- pragmatic incremental rollout
- provider-agnostic execution with explicit policy constraints