feat: harness scaffold, runtime pattern sync, and ARCHITECTURE injection

- Add harness/ directory to SF repo (specs/, evals/, graders/ with AGENTS.md) and seed harness/specs/bootstrap.md (agent-legibility verification)
- Extend agentic-docs-scaffold.ts: new repos get harness/ + ADR-TEMPLATE.md and just adr / just spec / just harness-spec recipes via justfile
- Sync SF_RUNTIME_PATTERNS (gitignore.ts canonical) → git-service.ts and worktree-manager.ts: add audit/, exec/, model-benchmarks/, reports/, notifications.jsonl, routing-history.json, self-feedback.jsonl, repo-meta.json, and milestone continue-marker patterns
- Inject ARCHITECTURE.md into system prompt via loadArchitectureBlock() in system-context.ts (capped at 8 000 chars, after KNOWLEDGE block)
- Write real ARCHITECTURE.md for this repo (system map, .sf/ layout, key flows)
- Add ADR-TEMPLATE.md to docs/design-docs/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-01 22:46:28 +02:00 · 2026-05-01 22:46:28 +02:00 · 6f877b61ab
commit 6f877b61ab
parent 16ff608d80
10 changed files with 310 additions and 15 deletions
--- a/ARCHITECTURE.md
+++ b/ARCHITECTURE.md
@ -1,20 +1,81 @@
 # Architecture

-This file is the short map of the codebase. Keep it current and compact.
-
 ## Purpose

-Describe the product, its users, and the job this repository exists to do.
+Singularity Forge (SF) is an autonomous agent orchestration system. It runs long-horizon coding work as a state machine: milestones → slices → tasks. Each dispatch unit runs a fresh AI context, writes its output to disk, then terminates. A deterministic controller (not an LLM) reads disk state and decides what to dispatch next. The user is the end-gate: autonomous mode delivers work for human review and does not merge to production unattended.

 ## Codemap

- `src/`: primary implementation.
- `tests/`: behavior and regression coverage.
- `docs/`: durable product, design, plan, reliability, and security context.
+| Path | Purpose |
+|------|---------|
+| `src/loader.ts` | Entry point — initializes resources, registers extension |
+| `src/headless.ts` | Non-interactive (headless) mode driver — exit codes 0/1/10/11/12 |
+| `src/headless-events.ts` | Transcript event parsing and notification routing |
+| `src/extension-registry.ts` | Registers SF as a Pi coding-agent extension |
+| `src/resources/extensions/sf/` | All SF extension source (TypeScript) |
+| `src/resources/extensions/sf/auto/` | Autonomous workflow orchestrator (state machine, dispatch, planning) |
+| `src/resources/extensions/sf/bootstrap/` | Context injection, system prompt assembly |
+| `src/resources/extensions/sf/prompts/` | Prompt templates (`.md`, loaded by `prompt-loader.ts`) |
+| `src/resources/extensions/sf/tests/` | Unit and integration tests |
+| `dist/resources/extensions/sf/` | Compiled JS (rebuilt by `npm run copy-resources`) |
+| `~/.sf/agent/extensions/sf/` | Installed copy (synced from dist on startup) |
+| `docs/` | Durable product, design, plan, reliability, and security context |
+| `harness/` | Specs (behavior contracts), evals (model-output tests), graders |
+
+## State layout (`.sf/`)
+
+`.sf/` can be a **symlink** (external state, `~/.sf/projects/<hash>/`) or a **local directory** (tracking-enabled per ADR-001).
+
+**Tracked in git** (travel with the branch, per ADR-001):
+```
+.sf/milestones/     — roadmaps, plans, summaries, task plans
+.sf/PROJECT.md      — project overview
+.sf/DECISIONS.md    — architectural decisions register
+.sf/REQUIREMENTS.md — requirements register
+.sf/QUEUE.md        — work queue / backlog
+.sf/KNOWLEDGE.md    — project-specific rules for agents
+```
+
+**Gitignored** (runtime/ephemeral — managed by `ensureGitInfoExclude()` in `.git/info/exclude`):
+```
+.sf/activity/       — JSONL session dumps
+.sf/audit/          — audit trail entries
+.sf/exec/           — in-flight execution state
+.sf/forensics/      — crash forensics
+.sf/journal/        — SF journal entries
+.sf/model-benchmarks/ — model benchmark results
+.sf/parallel/       — parallel dispatch coordination
+.sf/reports/        — generated reports
+.sf/runtime/        — dispatch records, timeout tracking
+.sf/worktrees/      — git worktree working directories
+.sf/auto.lock       — crash detection sentinel
+.sf/metrics.json    — token/cost accumulator
+.sf/sf.db*          — SQLite cache (rebuilt from markdown by importers)
+.sf/STATE.md        — derived state cache
+.sf/notifications.jsonl, .sf/routing-history.json, .sf/self-feedback.jsonl, .sf/repo-meta.json
+```
+
+The symlink case uses a blanket `.sf` gitignore pattern (git cannot traverse symlinks). The directory case uses granular patterns so planning artifacts remain trackable.
+
+## Key flows
+
+**Autonomous dispatch loop** (`src/resources/extensions/sf/auto/`):
+1. `deriveState()` reads disk and produces a typed state snapshot
+2. Controller selects the next dispatch unit (research, plan, implement, verify, etc.)
+3. A fresh agent context is started with the task plan injected via `system-context.ts`
+4. Agent writes artifacts to disk, commits, exits
+5. Loop repeats until milestone completes or a gate fails
+
+**System context assembly** (`bootstrap/system-context.ts`):
+`PREFERENCES.md` → `KNOWLEDGE.md` → `ARCHITECTURE.md` → `CODEBASE.md` → code intelligence → memories → worktree/VCS blocks
+
+**Write gate** (`bootstrap/write-gate.ts`):
+All file writes in autonomous mode pass through a gate. Protected files (CLAUDE.md, CODEBASE.md, certain spec files) require explicit override.

 ## Invariants

- Prefer small, named modules with clear ownership.
- Behavior changes need tests or an explicit eval.
- Keep generated artifacts out of hand-written design docs.
- Update this map when new top-level concepts or directories become important.
+- The state machine (controller) is pure TypeScript — no LLM decisions in the dispatch loop itself.
+- Each dispatch unit runs in a fresh context — no cross-turn state accumulation.
+- Planning artifacts are tracked in git; runtime artifacts are never committed.
+- `SF_RUNTIME_PATTERNS` in `gitignore.ts` is the canonical source of truth for runtime paths. `git-service.ts` (`RUNTIME_EXCLUSION_PATHS`) and `worktree-manager.ts` (`SKIP_*` arrays) must stay synchronized with it.
+- The user is the end-gate. SF delivers for review, not to production.
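
The autonomous dispatch loop described under "Key flows" can be sketched as a pure controller over a disk-derived snapshot. This is a minimal sketch under stated assumptions, not SF's actual code: the `StateSnapshot` shape, the `selectNextUnit` and `runLoop` names, and the fixed four-unit pipeline are hypothetical; only `deriveState()` is named in the document.

```typescript
// Hypothetical types: the real snapshot produced by deriveState() is not shown in this commit.
type UnitKind = "research" | "plan" | "implement" | "verify";

interface StateSnapshot {
	milestoneComplete: boolean;
	gateFailed: boolean;
	// Units already recorded on disk as finished.
	completed: UnitKind[];
}

// Assumed fixed ordering, for illustration only.
const PIPELINE: readonly UnitKind[] = ["research", "plan", "implement", "verify"];

/** Deterministic selection: the same snapshot always yields the same decision. */
function selectNextUnit(state: StateSnapshot): UnitKind | null {
	if (state.milestoneComplete || state.gateFailed) return null;
	const next = PIPELINE.find((kind) => !state.completed.includes(kind));
	return next ?? null;
}

/** `dispatch` stands in for "start a fresh agent context and wait for it to exit". */
function runLoop(
	deriveState: () => StateSnapshot,
	dispatch: (unit: UnitKind) => void,
	maxIterations = 100,
): UnitKind[] {
	const dispatched: UnitKind[] = [];
	for (let i = 0; i < maxIterations; i++) {
		// State is re-read on every iteration; nothing is carried over in memory.
		const unit = selectNextUnit(deriveState());
		if (unit === null) break;
		dispatch(unit);
		dispatched.push(unit);
	}
	return dispatched;
}
```

Because the controller only sees what `deriveState()` returns, a crashed run could in principle be resumed by calling `runLoop` again against the same disk state.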
--- a/docs/design-docs/ADR-TEMPLATE.md
+++ b/docs/design-docs/ADR-TEMPLATE.md
@ -0,0 +1,29 @@
+# ADR-NNN: Title
+
+**Status:** Proposed | Accepted | Rejected | Superseded by ADR-NNN
+**Date:** YYYY-MM-DD
+**Deciders:** (names)
+
+## Context
+
+What is the problem or situation that requires a decision? Include constraints and the forces at play.
+
+## Decision
+
+What is the change being made or the approach being adopted?
+
+## Consequences
+
+What becomes easier or harder after this decision? Include positive and negative outcomes.
+
+## Alternatives Considered
+
+What other options were evaluated and why were they not chosen?
+
+## Validation
+
+What command or evidence confirms the decision is correct?
+
+```bash
+# verification command here
+```
--- a/docs/design-docs/index.md
+++ b/docs/design-docs/index.md
@ -28,5 +28,6 @@ in `docs/dev/`. Lighter design docs (problem framing, event model decisions) liv

 | Doc | Title | Status |
 |-----|-------|--------|
+| [ADR-TEMPLATE.md](./ADR-TEMPLATE.md) | ADR Template | Reference |
 | [core-beliefs.md](./core-beliefs.md) | Core Beliefs | Accepted |
 | [notification-event-model.md](./notification-event-model.md) | Notification Event Model | Draft |
--- a/harness/specs/bootstrap.md
+++ b/harness/specs/bootstrap.md
@ -0,0 +1,20 @@
+# Bootstrap Spec: Agent Legibility
+
+Verifies that the SF repo is minimally agent-legible.
+
+## Criteria
+
+- [ ] `AGENTS.md` exists at repo root and is non-empty.
+- [ ] `ARCHITECTURE.md` exists at repo root and describes the system.
+- [ ] `docs/exec-plans/active/index.md` exists.
+- [ ] `docs/exec-plans/tech-debt-tracker.md` exists.
+- [ ] `docs/design-docs/ADR-TEMPLATE.md` exists.
+- [ ] `harness/specs/` exists with at least this file.
+
+## Verification command
+
+```bash
+for f in AGENTS.md ARCHITECTURE.md docs/exec-plans/active/index.md docs/exec-plans/tech-debt-tracker.md docs/design-docs/ADR-TEMPLATE.md harness/specs/bootstrap.md; do [ -s "$f" ] && echo "OK: $f" || echo "MISSING: $f"; done
+```
+
+All lines should start with `OK:` for this spec to pass.
--- a/29
+++ b/29
@ -51,3 +51,32 @@ clean:
 # Run SF CLI from source (usage: just sf <args>)
 sf *args:
    ./bin/sf-from-source {{args}}
+
+# Create a new ADR from the template (usage: just adr "My Decision Title")
+adr title:
+    #!/usr/bin/env bash
+    set -euo pipefail
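+    # `|| true` keeps pipefail from aborting the recipe when no ADR files exist yet
+    next=$(ls docs/dev/ADR-*.md 2>/dev/null | sed 's/.*ADR-\([0-9]*\).*/\1/' | sort -n | tail -1 || true)
+    # 10# forces base 10 so zero-padded numbers such as 008 are not parsed as octal
+    num=$(printf "%03d" $(( 10#${next:-0} + 1 )))
+    slug=$(echo "{{title}}" | tr '[:upper:]' '[:lower:]' | tr ' ' '-' | tr -cd 'a-z0-9-')
+    dest="docs/dev/ADR-${num}-${slug}.md"
+    # Substitute on line 1 only so "Superseded by ADR-NNN" in the Status line stays a placeholder
+    sed "1s/ADR-NNN/ADR-${num}/; 1s/Title/{{title}}/" docs/design-docs/ADR-TEMPLATE.md > "${dest}"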
+    echo "Created: ${dest}"
+
+# Create a new product spec (usage: just spec "my-feature-name")
+spec name:
+    #!/usr/bin/env bash
+    set -euo pipefail
+    dest="docs/product-specs/{{name}}.md"
+    if [ -f "${dest}" ]; then echo "Already exists: ${dest}"; exit 1; fi
+    printf "# {{name}}\n\n## Job to be done\n\n## Workflow\n\n## Edge cases\n\n## Non-goals\n\n## Verification\n\n\`\`\`bash\n# command that proves this spec passes\n\`\`\`\n" > "${dest}"
+    echo "Created: ${dest}"
+
+# Create a new harness spec (usage: just harness-spec "behavior-name")
+harness-spec name:
+    #!/usr/bin/env bash
+    set -euo pipefail
+    dest="harness/specs/{{name}}.md"
+    if [ -f "${dest}" ]; then echo "Already exists: ${dest}"; exit 1; fi
+    printf "# Harness Spec: {{name}}\n\n## Behavior\n\n## Verification command\n\n\`\`\`bash\n\n\`\`\`\n\n## Pass criteria\n\n" > "${dest}"
+    echo "Created: ${dest}"
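
The numbering and slug rules in the `adr` recipe are easier to reason about as pure functions. The sketch below mirrors them in TypeScript so the edge cases (no existing ADRs, zero-padded numbers such as `008`, punctuation in the title) can be checked in isolation. The names `nextAdrNumber` and `slugify` are hypothetical and do not exist in the repo.

```typescript
/** Next zero-padded ADR number given existing file names such as "ADR-008-foo.md". */
function nextAdrNumber(existingFiles: readonly string[]): string {
	let max = 0;
	for (const file of existingFiles) {
		const match = /ADR-(\d+)/.exec(file);
		if (!match) continue;
		// Radix 10 is explicit here; bash arithmetic needs a `10#` prefix for the same reason,
		// because a leading zero makes it read the number as octal.
		const n = Number.parseInt(match[1] ?? "0", 10);
		if (n > max) max = n;
	}
	return String(max + 1).padStart(3, "0");
}

/** Lowercase, spaces to hyphens, then drop everything outside [a-z0-9-]. */
function slugify(title: string): string {
	return title
		.toLowerCase()
		.replace(/ /g, "-")
		.replace(/[^a-z0-9-]/g, "");
}
```

For example, `slugify("My Decision Title")` yields `my-decision-title`, and `nextAdrNumber([])` yields `001`.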
--- a/src/resources/extensions/sf/agentic-docs-scaffold.ts
+++ b/src/resources/extensions/sf/agentic-docs-scaffold.ts
@ -315,6 +315,105 @@ Document expected failure modes, recovery paths, observability, and release chec
 		content: `# Security

 Document trust boundaries, secrets handling, dependency risk, and security review requirements here.
+`,
+	},
+	{
+		path: "docs/design-docs/ADR-TEMPLATE.md",
+		content: `# ADR-NNN: Title
+
+**Status:** Proposed | Accepted | Rejected | Superseded by ADR-NNN
+**Date:** YYYY-MM-DD
+
+## Context
+
+What is the problem or situation that requires a decision? Include constraints and the forces at play.
+
+## Decision
+
+What is the change being made or the approach being adopted?
+
+## Consequences
+
+What becomes easier or harder after this decision? Include positive and negative outcomes.
+
+## Alternatives Considered
+
+What other options were evaluated and why were they not chosen?
+`,
+	},
+	{
+		path: "harness/AGENTS.md",
+		content: `# Harness Agent Notes
+
+The harness is a collection of contracts the agent can read and verify against.
+
+- \`specs/\`: behavior contracts. Each spec states what "done" looks like and the command that proves it.
+- \`evals/\`: task definitions for behaviors tests cannot cover — model output quality, multi-turn flows, agent decisions.
+- \`graders/\`: reusable grader scripts (code-based checks, LLM-judge prompts used by evals).
+
+**Rule:** Before marking a task done, run the relevant spec's verification command. Record the result in the completion summary or execution plan.
+`,
+	},
+	{
+		path: "harness/specs/AGENTS.md",
+		content: `# Harness Specs Agent Notes
+
+Each spec file in this directory:
+
+- States the behavior being specified (not the implementation).
+- Includes the exact command that proves the spec passes.
+- Is referenced by the relevant execution plan or ADR.
+
+Write the spec before implementation. Run it after. Record the result.
+`,
+	},
+	{
+		path: "harness/specs/bootstrap.md",
+		content: `# Bootstrap Spec: Agent Legibility
+
+Verifies that this repo is minimally agent-legible.
+
+## Criteria
+
+- [ ] \`AGENTS.md\` exists at repo root and is non-empty.
+- [ ] \`ARCHITECTURE.md\` exists at repo root and is non-empty.
+- [ ] \`docs/exec-plans/active/index.md\` exists.
+- [ ] \`docs/exec-plans/tech-debt-tracker.md\` exists.
+- [ ] \`docs/design-docs/ADR-TEMPLATE.md\` exists.
+
+## Verification command
+
+\`\`\`bash
+for f in AGENTS.md ARCHITECTURE.md docs/exec-plans/active/index.md docs/exec-plans/tech-debt-tracker.md docs/design-docs/ADR-TEMPLATE.md; do [ -s "$f" ] && echo "OK: $f" || echo "MISSING: $f"; done
+\`\`\`
+
+All lines should start with \`OK:\` for the bootstrap spec to pass.
+`,
+	},
+	{
+		path: "harness/evals/AGENTS.md",
+		content: `# Harness Evals Agent Notes
+
+Evals verify behavior that unit tests cannot cover — model output quality, agent decisions, multi-turn flows.
+
+Each eval should include:
+- The input fixture or prompt
+- The expected output or scoring rubric
+- The command to run it (\`promptfoo eval\`, custom script, etc.)
+
+Keep evals deterministic where possible. Log results to \`docs/records/\` at milestone close.
+`,
+	},
+	{
+		path: "harness/graders/AGENTS.md",
+		content: `# Harness Graders Agent Notes
+
+Graders are reusable scripts or prompts that score eval outputs.
+
+- Code-based graders: shell scripts or test files that check structured outputs deterministically.
+- LLM-judge graders: prompt templates that ask a model to score free-text output against a rubric.
+
+Prefer code-based graders. Add LLM-judge graders only when deterministic checking is impossible.
 `,
 	},
 ];
--- a/src/resources/extensions/sf/bootstrap/system-context.ts
+++ b/src/resources/extensions/sf/bootstrap/system-context.ts
@ -168,6 +168,7 @@ export async function buildBeforeAgentStartResult(
 		sfHome,
 		process.cwd(),
 	);
+	const architectureBlock = loadArchitectureBlock(process.cwd());
 	if (globalSizeKb > 4) {
 		ctx.ui.notify(
 			`SF: ~/.sf/agent/KNOWLEDGE.md is ${globalSizeKb.toFixed(1)}KB — consider trimming to keep system prompt lean.`,
@ -281,7 +282,7 @@ export async function buildBeforeAgentStartResult(
 		? `\n\n## Subagent Model\n\nWhen spawning subagents via the \`subagent\` tool, always pass \`model: "${subagentModelConfig.primary}"\` in the tool call parameters. Never omit this — always specify it explicitly.`
 		: "";

-	const fullSystem = `${event.systemPrompt}\n\n[SYSTEM CONTEXT — SF]\n\n${systemContent}${preferenceBlock}${knowledgeBlock}${codebaseBlock}${codeIntelligenceBlock}${memoryBlock}${newSkillsBlock}${worktreeBlock}${repositoryVcsBlock}${modelIdentityBlock}${subagentModelBlock}`;
+	const fullSystem = `${event.systemPrompt}\n\n[SYSTEM CONTEXT — SF]\n\n${systemContent}${preferenceBlock}${knowledgeBlock}${architectureBlock}${codebaseBlock}${codeIntelligenceBlock}${memoryBlock}${newSkillsBlock}${worktreeBlock}${repositoryVcsBlock}${modelIdentityBlock}${subagentModelBlock}`;

 	stopContextTimer({
 		systemPromptSize: fullSystem.length,
@ -363,6 +364,28 @@ export function loadKnowledgeBlock(
 	};
 }

+/**
+ * Load ARCHITECTURE.md from the project root into context. Capped at 8 000 chars
+ * to avoid bloating every request — full file is always readable on disk.
+ */
+function loadArchitectureBlock(cwd: string): string {
+	const architecturePath = join(cwd, "ARCHITECTURE.md");
+	if (!existsSync(architecturePath)) return "";
+	try {
+		const raw = readFileSync(architecturePath, "utf-8").trim();
+		if (!raw) return "";
+		const MAX_CHARS = 8_000;
+		const content =
+			raw.length > MAX_CHARS
+				? raw.slice(0, MAX_CHARS) +
+					"\n\n*(truncated — see ARCHITECTURE.md for full map)*"
+				: raw;
+		return `\n\n[ARCHITECTURE — System map and invariants]\n\n${content}`;
+	} catch {
+		return "";
+	}
+}
+
 function buildWorktreeContextBlock(): string {
 	const worktreeName = getActiveWorktreeName();
 	const worktreeMainCwd = getWorktreeOriginalCwd();
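
The cap applied by `loadArchitectureBlock()` can be isolated as a pure function to make its boundary behaviour explicit. This is a minimal sketch, assuming a hypothetical helper named `capText` and a simplified truncation note; the real function also reads the file and wraps the result in a block header.

```typescript
/** Returns the trimmed text unchanged when it fits, else its first `maxChars` characters plus a note. */
function capText(raw: string, maxChars: number, note = "\n\n*(truncated)*"): string {
	const trimmed = raw.trim();
	if (trimmed.length <= maxChars) return trimmed;
	// slice() counts UTF-16 code units, so the cut can land inside a surrogate pair.
	return trimmed.slice(0, maxChars) + note;
}
```

Note that the limit applies to the file content only: a truncated result is `maxChars` plus the length of the note, so the injected block can be slightly longer than 8 000 characters.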
--- a/src/resources/extensions/sf/git-service.ts
+++ b/src/resources/extensions/sf/git-service.ts
@ -236,20 +236,30 @@ export interface PreMergeCheckResult {
 */
 export const RUNTIME_EXCLUSION_PATHS: readonly string[] = [
 	".sf/activity/",
+	".sf/audit/",
+	".sf/exec/",
 	".sf/forensics/",
+	".sf/journal/",
+	".sf/model-benchmarks/",
+	".sf/parallel/",
+	".sf/reports/",
 	".sf/runtime/",
 	".sf/worktrees/",
-	".sf/parallel/",
 	".sf/auto.lock",
 	".sf/metrics.json",
 	".sf/completed-units*.json", // covers completed-units.json and archived completed-units-{MID}.json
 	".sf/state-manifest.json",
 	".sf/STATE.md",
 	".sf/sf.db*",
-	".sf/journal/",
 	".sf/doctor-history.jsonl",
 	".sf/event-log.jsonl",
+	".sf/notifications.jsonl",
+	".sf/routing-history.json",
+	".sf/self-feedback.jsonl",
+	".sf/repo-meta.json",
 	".sf/DISCUSSION-MANIFEST.json",
+	".sf/milestones/**/*-CONTINUE.md",
+	".sf/milestones/**/continue.md",
 ];

 function isPathExcluded(path: string, exclusions: readonly string[]): boolean {
--- a/src/resources/extensions/sf/gitignore.ts
+++ b/src/resources/extensions/sf/gitignore.ts
@ -28,19 +28,27 @@ import { sfRoot } from "./paths.js";
 */
 const SF_RUNTIME_PATTERNS = [
 	".sf/activity/",
+	".sf/audit/",
+	".sf/exec/",
 	".sf/forensics/",
+	".sf/journal/",
+	".sf/model-benchmarks/",
+	".sf/parallel/",
+	".sf/reports/",
 	".sf/runtime/",
 	".sf/worktrees/",
-	".sf/parallel/",
 	".sf/auto.lock",
 	".sf/metrics.json",
 	".sf/completed-units*.json", // covers completed-units.json and archived completed-units-{MID}.json
 	".sf/state-manifest.json",
 	".sf/STATE.md",
 	".sf/sf.db*",
-	".sf/journal/",
 	".sf/doctor-history.jsonl",
 	".sf/event-log.jsonl",
+	".sf/notifications.jsonl",
+	".sf/routing-history.json",
+	".sf/self-feedback.jsonl",
+	".sf/repo-meta.json",
 	".sf/DISCUSSION-MANIFEST.json",
 	".sf/milestones/**/*-CONTINUE.md",
 	".sf/milestones/**/continue.md",
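
Since `SF_RUNTIME_PATTERNS` is the canonical list and the other two files mirror it by hand, drift is possible. One way a test might detect it is sketched below. This is an illustration only, not a mechanism in the repo: `classifyPattern` and `missingFrom` are hypothetical names, and the four pattern classes are an assumption based on how `worktree-manager.ts` splits its `SKIP_*` arrays.

```typescript
type PatternClass = "directory" | "exact" | "prefix" | "glob";

/** Sorts a canonical pattern into the kind of list that would have to mirror it. */
function classifyPattern(pattern: string): PatternClass {
	if (pattern.includes("**")) return "glob"; // e.g. .sf/milestones/**/continue.md
	if (pattern.endsWith("/")) return "directory"; // e.g. .sf/audit/
	if (pattern.includes("*")) return "prefix"; // e.g. .sf/sf.db*
	return "exact"; // e.g. .sf/auto.lock
}

/** Patterns present in `canonical` that have no counterpart in `mirror`. */
function missingFrom(canonical: readonly string[], mirror: readonly string[]): string[] {
	const known = new Set(mirror);
	return canonical.filter((p) => !known.has(p));
}
```

A unit test could then assert that `missingFrom(canonicalPatterns, mirroredPatterns)` is empty, assuming both lists are exported for testing.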
--- a/src/resources/extensions/sf/worktree-manager.ts
+++ b/src/resources/extensions/sf/worktree-manager.ts
@ -765,8 +765,12 @@ const SKIP_PATHS = [
 	".sf/worktrees/",
 	".sf/runtime/",
 	".sf/activity/",
+	".sf/audit/",
+	".sf/exec/",
 	".sf/forensics/",
+	".sf/model-benchmarks/",
 	".sf/parallel/",
+	".sf/reports/",
 	".sf/journal/",
 ];
 const SKIP_EXACT = [
@ -776,6 +780,11 @@ const SKIP_EXACT = [
 	".sf/state-manifest.json",
 	".sf/doctor-history.jsonl",
 	".sf/event-log.jsonl",
+	".sf/notifications.jsonl",
+	".sf/routing-history.json",
+	".sf/self-feedback.jsonl",
+	".sf/repo-meta.json",
+	".sf/DISCUSSION-MANIFEST.json",
 ];
 /** File prefixes to skip (for wildcard patterns like completed-units*.json, sf.db*). */
 const SKIP_PREFIXES = [".sf/completed-units", ".sf/sf.db"];
@ -784,6 +793,12 @@ function shouldSkipPath(filePath: string): boolean {
 	if (SKIP_PATHS.some((p) => filePath.startsWith(p))) return true;
 	if (SKIP_EXACT.includes(filePath)) return true;
 	if (SKIP_PREFIXES.some((p) => filePath.startsWith(p))) return true;
+	// Milestone continue markers are ephemeral interruption signals, not durable artifacts.
+	if (
+		filePath.startsWith(".sf/milestones/") &&
+		(filePath.endsWith("-CONTINUE.md") || filePath.endsWith("/continue.md"))
+	)
+		return true;
 	return false;
 }
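
The continue-marker check added to `shouldSkipPath()` combines a prefix test with two suffix tests. The self-contained sketch below reproduces that logic with abbreviated lists (an assumption for brevity; the real arrays in the diff are longer) so the effect on concrete paths is visible.

```typescript
// Abbreviated copies of the arrays in the diff; the real lists contain more entries.
const SKIP_PATHS = [".sf/worktrees/", ".sf/audit/", ".sf/exec/"];
const SKIP_EXACT = [".sf/repo-meta.json", ".sf/notifications.jsonl"];
const SKIP_PREFIXES = [".sf/completed-units", ".sf/sf.db"];

function shouldSkipPath(filePath: string): boolean {
	if (SKIP_PATHS.some((p) => filePath.startsWith(p))) return true;
	if (SKIP_EXACT.includes(filePath)) return true;
	if (SKIP_PREFIXES.some((p) => filePath.startsWith(p))) return true;
	// Continue markers are skipped anywhere under .sf/milestones/, at any depth.
	if (
		filePath.startsWith(".sf/milestones/") &&
		(filePath.endsWith("-CONTINUE.md") || filePath.endsWith("/continue.md"))
	)
		return true;
	return false;
}
```

With these lists, `.sf/milestones/M001/M001-CONTINUE.md` and `.sf/sf.db-wal` are skipped, while `.sf/milestones/M001/ROADMAP.md` is kept. The milestone ID `M001` and the file names here are made up for the example.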