singularity-forge/ARCHITECTURE.md

# Architecture

## Purpose

Singularity Forge (SF) is the product. It runs long-horizon coding work through the Unified Operation Kernel (UOK): milestones → slices → tasks. Each dispatch unit runs in a fresh AI context, writes its output to disk, then terminates. UOK owns lifecycle, recovery, and the DB-backed run ledger; runtime files under `.sf/runtime/` are projections for query, UI, and compatibility. A deterministic controller (not an LLM) reads canonical state and decides what to dispatch next. Core changes follow purpose-driven TDD: purpose and consumer first, then failing tests, then implementation. The user is the end-gate: autonomous mode delivers work for human review and does not merge to production unattended.

## Codemap

| Path | Purpose |
|------|---------|
| `src/loader.ts` | Entry point — initializes resources, registers extension |
| `src/headless.ts` | Non-interactive (headless) mode driver — exit codes 0/1/10/11/12 |
| `src/headless-events.ts` | Transcript event parsing and notification routing |
| `src/extension-registry.ts` | Registers SF as a coding-agent extension |
| `src/resources/extensions/sf/` | All SF extension source (TypeScript) |
| `src/resources/extensions/sf/auto/` | Autonomous workflow orchestrator (UOK lifecycle, dispatch, planning) |
| `src/resources/extensions/sf/bootstrap/` | Context injection, system prompt assembly |
| `src/resources/extensions/sf/prompts/` | Prompt templates (`.md`, loaded by `prompt-loader.ts`) |
| `src/resources/extensions/sf/tests/` | Unit and integration tests |
| `dist/resources/extensions/sf/` | Compiled JS (rebuilt by `npm run copy-resources`) |
| `~/.sf/agent/extensions/sf/` | Installed copy (synced from dist on startup) |
| `docs/` | Durable product, design, plan, reliability, and security context |
| `harness/` | Specs (behavior contracts), evals (model-output tests), graders |

## State layout (`.sf/`)

`.sf/` can be a **symlink** (external state, `~/.sf/projects/<hash>/`) or a **local directory** (tracking-enabled per ADR-001).

**Tracked in git** (travel with the branch, per ADR-001):
```
.sf/milestones/     — roadmaps, plans, summaries, task plans (rendered projections from DB)
.sf/PROJECT.md      — project overview
```

**Gitignored** (runtime/ephemeral — managed by `ensureGitInfoExclude()` in `.git/info/exclude`):
```
.sf/activity/       — JSONL session dumps
.sf/audit/          — audit trail entries (primary: events.jsonl)
.sf/exec/           — in-flight execution state
.sf/forensics/      — crash forensics
.sf/journal/        — SF journal entries
.sf/model-benchmarks/ — model benchmark results
.sf/parallel/       — parallel dispatch coordination
.sf/reports/        — generated reports
.sf/runtime/        — dispatch records, timeout tracking, error spill files
.sf/traces/         — per-session trace JSONL (gate runs, git ops); latest symlink
.sf/worktrees/      — git worktree working directories
.sf/auto.lock       — crash detection sentinel
.sf/metrics.db      — token/cost metrics (dedicated DB, separate from sf.db)
.sf/sf.db*          — SQLite canonical structured state, priority order, validation/gate state, and UOK ledgers
```

The symlink case uses a blanket `.sf` gitignore pattern (git cannot traverse symlinks). The directory case uses granular patterns so planning artifacts remain trackable.
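
A minimal sketch of the two pattern strategies, assuming a pure helper that is told which layout is in use. The names `StateLayout`, `RUNTIME_ENTRIES`, and `excludePatterns` are hypothetical, and the exact pattern syntax is an assumption; only the entries themselves come from the gitignored list above.

```typescript
// Hypothetical helper: none of these identifiers come from the SF codebase.
type StateLayout = "symlink" | "directory";

// Entries copied from the gitignored list above; the trailing-slash syntax is an assumption.
const RUNTIME_ENTRIES: readonly string[] = [
  ".sf/activity/",
  ".sf/audit/",
  ".sf/exec/",
  ".sf/forensics/",
  ".sf/journal/",
  ".sf/model-benchmarks/",
  ".sf/parallel/",
  ".sf/reports/",
  ".sf/runtime/",
  ".sf/traces/",
  ".sf/worktrees/",
  ".sf/auto.lock",
  ".sf/metrics.db",
  ".sf/sf.db*",
];

// Symlink layout: one blanket pattern. Directory layout: granular patterns,
// so .sf/milestones/ and .sf/PROJECT.md stay trackable.
function excludePatterns(layout: StateLayout): string[] {
  return layout === "symlink" ? [".sf"] : RUNTIME_ENTRIES.slice();
}
```

The directory branch deliberately omits `.sf/milestones/` and `.sf/PROJECT.md`, which is what keeps planning artifacts on the branch.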

**DB-first invariant:** `sf.db` is the single source of truth for all structured state (milestones, slices, tasks, decisions, requirements, memories, self-feedback). Markdown files under `.sf/` are rendered projections or human-editable inputs — they are never the authoritative source when the DB is open. Agents write to DB via tool calls (`save_decision`, `save_knowledge`, `save_requirement`, `update_requirement`), not by appending to `.md` files.

## Key flows

**Autonomous dispatch loop** (`src/resources/extensions/sf/auto/`):
1. UOK reconciles the DB-backed ledger and runtime diagnostics into a typed state snapshot
2. Controller selects the next dispatch unit (research, plan, implement, verify, etc.) from canonical DB state
3. A fresh agent context is started with the task plan injected via `system-context.js`
4. Agent writes artifacts to disk, commits, exits
5. UOK records completion/recovery, updates projections, and repeats until milestone completes or a gate fails
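
The loop above can be sketched as a plain deterministic function. This is an illustration under stated assumptions, not the code in `auto/`: the types `Unit` and `Snapshot`, the `runUnit` callback, and the "first unfinished unit" selection rule are all hypothetical.

```typescript
// Hypothetical types: none of these names come from the SF codebase.
type UnitKind = "research" | "plan" | "implement" | "verify";

interface Unit {
  id: string;
  kind: UnitKind;
  done: boolean;
}

interface Snapshot {
  units: Unit[];
  gateFailed: boolean;
}

// Step 2: deterministic selection. The same snapshot always yields the same unit.
// Assumption: "next" simply means the first unit that is not done.
function selectNext(snapshot: Snapshot): Unit | undefined {
  return snapshot.units.find((u) => !u.done);
}

// Steps 3-4 stand-in: a real runner would start a fresh agent context here.
type RunUnit = (unit: Unit) => { ok: boolean };

// Steps 1-5: select, run, record, and repeat until done or a gate fails.
function runLoop(snapshot: Snapshot, runUnit: RunUnit): string[] {
  const log: string[] = [];
  while (!snapshot.gateFailed) {
    const unit = selectNext(snapshot);
    if (unit === undefined) {
      log.push("milestone complete");
      break;
    }
    const result = runUnit(unit);
    if (result.ok) {
      unit.done = true;
      log.push(`completed ${unit.id}`);
    } else {
      snapshot.gateFailed = true;
      log.push(`gate failed at ${unit.id}`);
    }
  }
  return log;
}
```

The point of the sketch is that selection is an ordinary function of state, so no model output influences which unit runs next.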

**System context assembly** (`bootstrap/system-context.js`):
`PREFERENCES.md` → project knowledge (DB memories table) → `ARCHITECTURE.md` → `CODEBASE.md` → code intelligence → active decisions (DB) → active requirements (DB) → self-feedback (DB) → worktree/VCS blocks
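
One way to picture the ordering is a fixed list of section keys that the assembler walks in sequence. The sketch below assumes each section arrives as an already-rendered string; the key names and `assembleContext` are hypothetical and do not describe the real `system-context.js` API.

```typescript
// Section order taken from the flow above; the identifiers themselves are hypothetical.
const SECTION_ORDER = [
  "preferences",
  "projectKnowledge",
  "architecture",
  "codebase",
  "codeIntelligence",
  "activeDecisions",
  "activeRequirements",
  "selfFeedback",
  "worktreeVcs",
] as const;

type SectionName = (typeof SECTION_ORDER)[number];

// Joins whichever sections are present, always in the fixed order above.
// Missing or blank sections are skipped rather than emitted as empty blocks.
function assembleContext(sections: Partial<Record<SectionName, string>>): string {
  const parts: string[] = [];
  for (const name of SECTION_ORDER) {
    const body = sections[name];
    if (body !== undefined && body.trim() !== "") {
      parts.push(body.trim());
    }
  }
  return parts.join("\n\n");
}
```

Because the order lives in one constant, the caller cannot accidentally place requirements ahead of preferences by passing keys in a different order.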

**Write gate** (`bootstrap/write-gate.ts`):
All file writes in autonomous mode pass through a gate. Protected files (`CLAUDE.md`, `CODEBASE.md`, certain spec files) require an explicit override.
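
A minimal sketch of such a gate check, assuming protection is decided by file name and that the override is a boolean flag on the request. The names `PROTECTED_FILES`, `WriteRequest`, and `checkWrite` are hypothetical; the real rules live in `write-gate.ts` and may differ.

```typescript
// Hypothetical protected list; spec files are left out because the text does not name them.
const PROTECTED_FILES = new Set<string>(["CLAUDE.md", "CODEBASE.md"]);

interface WriteRequest {
  path: string;
  override?: boolean; // assumption: an explicit override is a simple flag
}

type GateDecision = { allowed: true } | { allowed: false; reason: string };

// Blocks writes to protected files unless the caller explicitly overrides.
function checkWrite(req: WriteRequest): GateDecision {
  const fileName = req.path.split("/").pop() ?? req.path;
  if (PROTECTED_FILES.has(fileName) && req.override !== true) {
    return { allowed: false, reason: `${fileName} is protected; explicit override required` };
  }
  return { allowed: true };
}
```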

## UOK Dispatch State Machine (Five-Phase Loop)

UOK orchestrates work through a deterministic five-phase state machine:

```mermaid
stateDiagram-v2
    direction LR

    [*] --> PhaseDiscuss : sf start / milestone begin

    PhaseDiscuss --> PhasePlan : discussion-close gate passes
    PhaseDiscuss --> PhaseDiscuss : gate fails → gather more context

    PhasePlan --> PhaseExecute : planning-approval gate passes
    PhasePlan --> PhasePlan : gate fails → replan or add remediation slice

    PhaseExecute --> PhaseMerge : all tasks complete, code-quality + test gates pass
    PhaseExecute --> PhaseExecute : task fails → isolate + recovery slice dispatched
    PhaseExecute --> PhaseExecute : stuck-loop detected → timeout / skip recovery

    PhaseMerge --> PhaseComplete : integration gate passes
    PhaseMerge --> PhaseExecute : integration failure → add fix slice, retry

    PhaseComplete --> [*] : acceptance gate passes, summary written
    PhaseComplete --> PhaseExecute : remediation milestone added

    note right of PhaseExecute
        See Task Lifecycle diagram below.
    end note
```

**Task lifecycle** (states a single task moves through inside PhaseExecute):

```mermaid
stateDiagram-v2
    direction TB

    [*] --> todo : task created

    todo --> running : dispatch picks task
    todo --> cancelled : explicit cancel

    running --> verifying : implementation done, run checks
    running --> reviewing : needs human / agent review
    running --> done : trivial task, skip verify
    running --> blocked : dependency unresolved
    running --> paused : user interrupt
    running --> retrying : transient failure, retry
    running --> failed : unrecoverable error
    running --> cancelled : explicit cancel

    verifying --> reviewing : checks pass, review needed
    verifying --> done : checks pass, no review needed
    verifying --> blocked : check dependency missing
    verifying --> paused : user interrupt
    verifying --> retrying : check flake, retry
    verifying --> failed : checks failed
    verifying --> cancelled : explicit cancel

    reviewing --> running : feedback applied, re-implement
    reviewing --> verifying : back to verify after edits
    reviewing --> done : review approved
    reviewing --> blocked : waiting on reviewer
    reviewing --> paused : user interrupt
    reviewing --> failed : review rejected
    reviewing --> cancelled : explicit cancel

    blocked --> todo : dependency resolved, reset
    blocked --> running : unblocked, resume
    blocked --> retrying : auto-unblock retry
    blocked --> cancelled : explicit cancel

    paused --> running : resume
    paused --> retrying : auto-resume
    paused --> cancelled : explicit cancel

    retrying --> running : retry attempt starts
    retrying --> failed : retry budget exhausted
    retrying --> cancelled : explicit cancel

    failed --> retrying : manual re-queue
    failed --> cancelled : give up

    done --> [*]
    cancelled --> [*]
```
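
The task lifecycle diagram above translates directly into a transition table. The sketch below copies the edges from the diagram; the names `TaskStatus`, `TASK_TRANSITIONS`, and `canTransition` are hypothetical and are not taken from the SF source.

```typescript
// Hypothetical names; the edges themselves mirror the task lifecycle diagram.
type TaskStatus =
  | "todo"
  | "running"
  | "verifying"
  | "reviewing"
  | "blocked"
  | "paused"
  | "retrying"
  | "failed"
  | "done"
  | "cancelled";

const TASK_TRANSITIONS: Record<TaskStatus, readonly TaskStatus[]> = {
  todo: ["running", "cancelled"],
  running: ["verifying", "reviewing", "done", "blocked", "paused", "retrying", "failed", "cancelled"],
  verifying: ["reviewing", "done", "blocked", "paused", "retrying", "failed", "cancelled"],
  reviewing: ["running", "verifying", "done", "blocked", "paused", "failed", "cancelled"],
  blocked: ["todo", "running", "retrying", "cancelled"],
  paused: ["running", "retrying", "cancelled"],
  retrying: ["running", "failed", "cancelled"],
  failed: ["retrying", "cancelled"],
  done: [], // terminal
  cancelled: [], // terminal
};

// True only when the diagram contains an edge from `from` to `to`.
function canTransition(from: TaskStatus, to: TaskStatus): boolean {
  return TASK_TRANSITIONS[from].indexOf(to) !== -1;
}
```

Note that `failed` is not terminal in the diagram: a manual re-queue moves it to `retrying`, while `done` and `cancelled` have no outgoing edges.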

**Scheduler claim and lease lifecycle** (how a queued unit is claimed by exactly one worker):

```mermaid
stateDiagram-v2
    direction LR

    [*] --> queued : task_scheduler INSERT
    queued --> due : poll tick reaches due_at
    due --> claimed : atomic UPDATE (conditional, one worker wins)
    claimed --> dispatched : worker picks up claim
    dispatched --> consumed : unit completes (any terminal status)
    dispatched --> expired : lease timeout, no heartbeat
    expired --> queued : lease cleared, re-enqueued

    note right of claimed
        Lease prevents two workers
        dispatching the same unit
        (shared-NFS / parallel mode).
    end note
```
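
The claim and lease steps can be illustrated with an in-memory stand-in for the conditional update. This is a sketch under assumptions: `QueueEntry`, `tryClaim`, and `expireIfStale` are hypothetical names, heartbeats are not modeled, and a real implementation would express the claim as a single conditional `UPDATE` in the database rather than as object mutation.

```typescript
// Hypothetical shape of a scheduler row; field names are not from the SF schema.
interface QueueEntry {
  id: string;
  dueAt: number; // epoch ms
  claimedBy: string | null;
  leaseExpiresAt: number | null;
}

// Stand-in for the atomic conditional UPDATE: succeed only if the entry is due
// and currently unclaimed, so at most one worker wins.
function tryClaim(entry: QueueEntry, worker: string, now: number, leaseMs: number): boolean {
  if (entry.dueAt > now || entry.claimedBy !== null) {
    return false;
  }
  entry.claimedBy = worker;
  entry.leaseExpiresAt = now + leaseMs;
  return true;
}

// Lease timeout: clear the claim so the entry can be picked up again.
function expireIfStale(entry: QueueEntry, now: number): boolean {
  if (entry.claimedBy === null || entry.leaseExpiresAt === null || entry.leaseExpiresAt > now) {
    return false;
  }
  entry.claimedBy = null;
  entry.leaseExpiresAt = null;
  return true;
}
```

The second worker's claim fails because the condition (unclaimed) no longer holds, which is the property the lease note in the diagram describes.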

**Phase details:**

| Phase | Purpose | Exit Conditions | Failure Path |
|-------|---------|-----------------|--------------|
| **PhaseDiscuss** | Gather project context, requirements, scope | Gates pass (discussion-close gate) | Loop back for more context or escalate |
| **PhasePlan** | Create milestone/slice plans with success criteria | Gates pass (planning-approval gate) | Add remediation slices or replan |
| **PhaseExecute** | Implement tasks through the dispatch sequence | Gates pass (code-quality, test gates) | Isolate failed task, add recovery slices |
| **PhaseMerge** | Integrate slices, run end-to-end tests, merge branches | Gates pass (integration gate) | Add integration-fix slices, retry |
| **PhaseComplete** | Final validation, audit trail, summary, gate completion | Validation passes (acceptance gate) | Add remediation milestone or escalate |
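
The phase table and the five-phase diagram can be encoded as two lookup tables, one for the gates-pass path and one for the failure path. This is a sketch: `ON_PASS`, `ON_FAIL`, and `nextPhase` are hypothetical names, and escalation paths are not modeled.

```typescript
type Phase = "PhaseDiscuss" | "PhasePlan" | "PhaseExecute" | "PhaseMerge" | "PhaseComplete";
type Next = Phase | "end";

// Where each phase goes when its exit gates pass.
const ON_PASS: Record<Phase, Next> = {
  PhaseDiscuss: "PhasePlan",
  PhasePlan: "PhaseExecute",
  PhaseExecute: "PhaseMerge",
  PhaseMerge: "PhaseComplete",
  PhaseComplete: "end",
};

// Where each phase goes on failure, following the diagram's failure edges.
const ON_FAIL: Record<Phase, Phase> = {
  PhaseDiscuss: "PhaseDiscuss", // gather more context
  PhasePlan: "PhasePlan", // replan or add remediation slice
  PhaseExecute: "PhaseExecute", // isolate task, dispatch recovery slice
  PhaseMerge: "PhaseExecute", // add fix slice, retry
  PhaseComplete: "PhaseExecute", // remediation milestone added
};

function nextPhase(current: Phase, gatesPassed: boolean): Next {
  return gatesPassed ? ON_PASS[current] : ON_FAIL[current];
}
```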

**Error recovery:**

- If a gate fails, UOK records the verdict and routes through phase-specific handlers
- Failed gates can trigger automatic remediation slices (new plan → execute loop)
- Stuck-loop detection: if the same unit repeats without progress after N attempts, UOK invokes the recovery protocol (timeout, manual review, or skip)
- Crash recovery: the `.sf/auto.lock` sentinel and the `sf.db` WAL enable recovery from an agent crash mid-phase
- Run errors are capped at 4 KB in `uok_runs.error`; payloads exceeding that spill to `.sf/runtime/errors/<runId>.txt`
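
The 4 KB cap and spill rule in the last bullet can be sketched as a pure function that decides what goes in the column and what goes to the spill file. Assumptions are labeled in the code: the text does not say what remains inline after a spill, so the sketch keeps a truncated prefix, and it measures the cap in UTF-8 bytes. `prepareRunError` and `StoredError` are hypothetical names.

```typescript
const ERROR_CAP_BYTES = 4 * 1024; // 4 KB cap from the text above

interface StoredError {
  inline: string; // value intended for uok_runs.error
  spillPath: string | null; // where the full payload would be written
  spillBody: string | null; // full payload, only set when spilled
}

// Assumption: the cap is measured in UTF-8 bytes.
function byteLength(text: string): number {
  return new TextEncoder().encode(text).length;
}

function prepareRunError(runId: string, message: string): StoredError {
  if (byteLength(message) <= ERROR_CAP_BYTES) {
    return { inline: message, spillPath: null, spillBody: null };
  }
  // Assumption: the column keeps a truncated prefix of the payload.
  let prefix = message.slice(0, ERROR_CAP_BYTES);
  while (byteLength(prefix) > ERROR_CAP_BYTES) {
    prefix = prefix.slice(0, -1); // trim until the prefix fits the byte cap
  }
  return {
    inline: prefix,
    spillPath: `.sf/runtime/errors/${runId}.txt`,
    spillBody: message,
  };
}
```

Keeping the decision separate from the actual file write makes the rule easy to test without touching disk.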

## Gate Verdict Semantics

Gates run in parallel, and each returns one of three verdicts:

| Verdict | Meaning | Next Action |
|---------|---------|-------------|
| **passed** | Gate question answerable; no concern blocking this phase | Proceed to next phase |
| **failed** | Gate question answerable; concern blocks phase progression | Record failure, optionally add remediation slice(s) |
| **omitted** | Gate question not applicable to this unit (e.g., no auth work → auth gate omitted) | Proceed (gate doesn't apply) |

**Critical rule:** `omitted` must have a one-line reason (e.g., "no auth surface"). Unexplained `omitted` verdicts are treated as failures and re-dispatched with explicit instruction to pick `passed` or `failed`.
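
The verdict table and the rule for unexplained `omitted` verdicts can be expressed as a small resolver. This is a sketch: `GateResult`, `Outcome`, and `resolveVerdict` are hypothetical names, and "unexplained" is taken here to mean a missing or blank reason.

```typescript
type Verdict = "passed" | "failed" | "omitted";

interface GateResult {
  gate: string;
  verdict: Verdict;
  reason?: string; // required in practice when verdict is "omitted"
}

type Outcome = "proceed" | "record-failure" | "redispatch";

function resolveVerdict(result: GateResult): Outcome {
  switch (result.verdict) {
    case "passed":
      return "proceed";
    case "failed":
      return "record-failure";
    case "omitted": {
      // An omitted verdict without a reason is re-dispatched so the gate
      // must pick passed or failed explicitly.
      const reason = (result.reason ?? "").trim();
      return reason !== "" ? "proceed" : "redispatch";
    }
  }
}
```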

Gate run history is written to `.sf/traces/<traceId>.jsonl` (append-only JSONL, not DB). Gate circuit-breaker state lives in the `gate_circuit_breakers` table in `sf.db`.

## Outcome Learning for Model Selection

UOK tracks model success/failure per task-type using Bayesian updating:

```
P(model_i succeeds | task_type) = (successes + prior) / (total_trials + prior_weight)
```

**Mechanism:**

- After each task completes, UOK logs: `{ model, task_type, succeeded: bool, latency_ms, tokens }`
- Model scores are updated dynamically; each model carries its own confidence per phase and task type
- Prior weights prevent early abandonment (new models get benefit of the doubt)
- Used by `benchmark-selector.ts` to route future similar tasks to higher-scoring models
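
A worked instance of the formula above, assuming `prior` is a pseudo-count of successes and `prior_weight` a pseudo-count of trials. Under that reading the expression is the posterior mean of a Beta prior with `prior` successes out of `prior_weight` trials. The function name and the example values (`prior = 1`, `prior_weight = 2`) are illustrative, not values taken from SF.

```typescript
interface ModelStats {
  successes: number;
  totalTrials: number;
}

// Assumption: prior and priorWeight are pseudo-counts (successes and trials).
function successEstimate(stats: ModelStats, prior: number, priorWeight: number): number {
  return (stats.successes + prior) / (stats.totalTrials + priorWeight);
}

// Illustrative values only.
const untried = successEstimate({ successes: 0, totalTrials: 0 }, 1, 2); // 1 / 2
const oneMiss = successEstimate({ successes: 0, totalTrials: 1 }, 1, 2); // 1 / 3
const proven = successEstimate({ successes: 3, totalTrials: 4 }, 1, 2); // 4 / 6
```

With these values an untried model starts at 0.5 and a single failure only lowers it to one third, which is the "benefit of the doubt" effect the prior is meant to provide.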

## Self-Evolution Mechanisms

### Self-Report Collection
Agents and gates file issues via the `report_issue` tool during dispatch:
- Reports stored in `self_feedback` table in `sf.db`
- Triage pipeline (`triage-self-feedback.js`) runs at session start to cluster and prioritize entries
- High/critical entries surfaced in system context for the next planning round
- **Status:** Collection and triage injection are active

### Knowledge Compounding
Knowledge entries are stored in the `memories` table in `sf.db` (category: `knowledge`):
- Agents write via `save_knowledge` tool (not by appending to files)
- Injected into agent prompts via `system-context.js` (DB query, keyword-scoped, budget-capped)
- `knowledge-compounding.js` distills high-confidence judgment-log entries after each milestone close
- **Status:** Storage, injection, and compounding are all active

### Requirement Promotion
`requirement-promoter.js` sweeps `self_feedback` entries at session start:
- Clusters recurring feedback by kind (count ≥ 5 or spanning ≥ 3 milestones)
- Promotes clusters to the `requirements` table via `upsertRequirement`
- Promoted entries are marked resolved in `self_feedback`
- **Status:** Active
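
The clustering thresholds above (count ≥ 5 or spanning ≥ 3 milestones) can be sketched as a grouping pass over feedback entries. The entry shape and `kindsToPromote` are hypothetical; the real promoter also writes to the `requirements` table and marks entries resolved, which this sketch omits.

```typescript
// Hypothetical entry shape; field names are not from the SF schema.
interface FeedbackEntry {
  id: string;
  kind: string;
  milestone: string;
}

const MIN_COUNT = 5; // threshold from the text above
const MIN_MILESTONES = 3; // threshold from the text above

// Returns the feedback kinds whose clusters meet either threshold, sorted by name.
function kindsToPromote(entries: FeedbackEntry[]): string[] {
  const byKind = new Map<string, { count: number; milestones: Set<string> }>();
  for (const entry of entries) {
    let cluster = byKind.get(entry.kind);
    if (cluster === undefined) {
      cluster = { count: 0, milestones: new Set<string>() };
      byKind.set(entry.kind, cluster);
    }
    cluster.count += 1;
    cluster.milestones.add(entry.milestone);
  }
  const promoted: string[] = [];
  byKind.forEach((cluster, kind) => {
    if (cluster.count >= MIN_COUNT || cluster.milestones.size >= MIN_MILESTONES) {
      promoted.push(kind);
    }
  });
  return promoted.sort();
}
```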

### Gate-Based Pattern Detection
Gates can detect and report repeated failure patterns (e.g., "same requirement-validation failure in S01 and S03"):
- **Status:** Logic exists per gate; no automatic aggregation across gates

## Invariants

- UOK and the dispatch controller are pure TypeScript — no LLM decisions in the dispatch loop itself.
- Each dispatch unit runs in a fresh context — no cross-turn state accumulation.
- Planning artifacts are tracked in git; runtime artifacts are never committed.
- **DB-first:** `sf.db` is the only executable truth. Agents read decisions, requirements, and knowledge from DB-injected context; they write back via tool calls. `.md` projection files are rendered outputs, not inputs.
- `SF_RUNTIME_PATTERNS` in `gitignore.ts` is the canonical source of truth for runtime paths. `git-service.ts` (`RUNTIME_EXCLUSION_PATHS`) and `worktree-manager.ts` (`SKIP_*` arrays) must stay synchronized with it.
- The user is the end-gate. SF delivers for review, not to production.