feat: harness scaffold, runtime pattern sync, and ARCHITECTURE injection

- Add harness/ directory to SF repo (specs/, evals/, graders/ with AGENTS.md)
  and seed harness/specs/bootstrap.md (agent-legibility verification)
- Extend agentic-docs-scaffold.ts: new repos get harness/ + ADR-TEMPLATE.md
  and just adr / just spec / just harness-spec recipes via justfile
- Sync SF_RUNTIME_PATTERNS (gitignore.ts canonical) → git-service.ts and
  worktree-manager.ts: add audit/, exec/, model-benchmarks/, reports/,
  notifications.jsonl, routing-history.json, self-feedback.jsonl, repo-meta.json,
  and milestone continue-marker patterns
- Inject ARCHITECTURE.md into system prompt via loadArchitectureBlock() in
  system-context.ts (capped at 8 000 chars, after KNOWLEDGE block)
- Write real ARCHITECTURE.md for this repo (system map, .sf/ layout, key flows)
- Add ADR-TEMPLATE.md to docs/design-docs/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Mikael Hugo 2026-05-01 22:46:28 +02:00
parent 16ff608d80
commit 6f877b61ab
10 changed files with 310 additions and 15 deletions

@@ -1,20 +1,81 @@
# Architecture
This file is the short map of the codebase. Keep it current and compact.
## Purpose
Describe the product, its users, and the job this repository exists to do.
Singularity Forge (SF) is an autonomous agent orchestration system. It runs long-horizon coding work as a state machine: milestones → slices → tasks. Each dispatch unit runs a fresh AI context, writes its output to disk, then terminates. A deterministic controller (not an LLM) reads disk state and decides what to dispatch next. The user is the end-gate — autonomous mode delivers work to human review, it does not merge to production unattended.
## Codemap
- `src/`: primary implementation.
- `tests/`: behavior and regression coverage.
- `docs/`: durable product, design, plan, reliability, and security context.
| Path | Purpose |
|------|---------|
| `src/loader.ts` | Entry point — initializes resources, registers extension |
| `src/headless.ts` | Non-interactive (headless) mode driver — exit codes 0/1/10/11/12 |
| `src/headless-events.ts` | Transcript event parsing and notification routing |
| `src/extension-registry.ts` | Registers SF as a Pi coding-agent extension |
| `src/resources/extensions/sf/` | All SF extension source (TypeScript) |
| `src/resources/extensions/sf/auto/` | Autonomous workflow orchestrator (state machine, dispatch, planning) |
| `src/resources/extensions/sf/bootstrap/` | Context injection, system prompt assembly |
| `src/resources/extensions/sf/prompts/` | Prompt templates (`.md`, loaded by `prompt-loader.ts`) |
| `src/resources/extensions/sf/tests/` | Unit and integration tests |
| `dist/resources/extensions/sf/` | Compiled JS (rebuilt by `npm run copy-resources`) |
| `~/.sf/agent/extensions/sf/` | Installed copy (synced from dist on startup) |
| `docs/` | Durable product, design, plan, reliability, and security context |
| `harness/` | Specs (behavior contracts), evals (model-output tests), graders |
## State layout (`.sf/`)
`.sf/` can be a **symlink** (external state, `~/.sf/projects/<hash>/`) or a **local directory** (tracking-enabled per ADR-001).
**Tracked in git** (travel with the branch, per ADR-001):
```
.sf/milestones/ — roadmaps, plans, summaries, task plans
.sf/PROJECT.md — project overview
.sf/DECISIONS.md — architectural decisions register
.sf/REQUIREMENTS.md — requirements register
.sf/QUEUE.md — work queue / backlog
.sf/KNOWLEDGE.md — project-specific rules for agents
```
**Gitignored** (runtime/ephemeral — managed by `ensureGitInfoExclude()` in `.git/info/exclude`):
```
.sf/activity/ — JSONL session dumps
.sf/audit/ — audit trail entries
.sf/exec/ — in-flight execution state
.sf/forensics/ — crash forensics
.sf/journal/ — SF journal entries
.sf/model-benchmarks/ — model benchmark results
.sf/parallel/ — parallel dispatch coordination
.sf/reports/ — generated reports
.sf/runtime/ — dispatch records, timeout tracking
.sf/worktrees/ — git worktree working directories
.sf/auto.lock — crash detection sentinel
.sf/metrics.json — token/cost accumulator
.sf/sf.db* — SQLite cache (rebuilt from markdown by importers)
.sf/STATE.md — derived state cache
.sf/notifications.jsonl, .sf/routing-history.json, .sf/self-feedback.jsonl, .sf/repo-meta.json
```
The symlink case uses a blanket `.sf` gitignore pattern (git cannot traverse symlinks). The directory case uses granular patterns so planning artifacts remain trackable.
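The choice between the blanket pattern and the granular patterns can be sketched as below. This is an illustrative sketch only: `SfStateKind`, `ignorePatternsFor`, and `RUNTIME_PATTERNS_SAMPLE` are hypothetical names, and the sample list is a small subset of the runtime paths listed above, not the contents of `gitignore.ts`.

```typescript
// Hypothetical sketch: these names are illustrative, not taken from gitignore.ts.
type SfStateKind = "symlink" | "directory";

// A few of the runtime paths listed above; the full list is SF_RUNTIME_PATTERNS.
const RUNTIME_PATTERNS_SAMPLE: readonly string[] = [
  ".sf/activity/",
  ".sf/runtime/",
  ".sf/auto.lock",
];

function ignorePatternsFor(kind: SfStateKind): readonly string[] {
  // Symlinked state: git cannot look inside the link, so ignore the link itself.
  if (kind === "symlink") return [".sf"];
  // Local directory: ignore only runtime paths so planning files stay trackable.
  return RUNTIME_PATTERNS_SAMPLE;
}

console.log(ignorePatternsFor("symlink"));
console.log(ignorePatternsFor("directory"));
```

In a real implementation the `kind` would presumably come from an `lstat` check on `.sf`; that detail is an assumption here.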
## Key flows
**Autonomous dispatch loop** (`src/resources/extensions/sf/auto/`):
1. `deriveState()` reads disk and produces a typed state snapshot
2. Controller selects the next dispatch unit (research, plan, implement, verify, etc.)
3. A fresh agent context is started with the task plan injected via `system-context.ts`
4. Agent writes artifacts to disk, commits, exits
5. Loop repeats until milestone completes or a gate fails
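The five steps above can be sketched as a small deterministic loop. Everything here is a hypothetical stand-in: `StateSnapshot`, `selectNextUnit`, and `runLoop` are invented for illustration, `derive` plays the role of `deriveState()`, and `dispatch` plays the role of starting a fresh agent context. The real controller under `auto/` may be structured quite differently.

```typescript
// Hypothetical types and names; the real controller lives under auto/.
type UnitKind = "research" | "plan" | "implement" | "verify";

interface StateSnapshot {
  pending: UnitKind[]; // units still to run for the milestone
  gateFailed: boolean; // true when a gate has rejected the work
}

// Pure decision function: no model call, only the snapshot.
function selectNextUnit(state: StateSnapshot): UnitKind | null {
  if (state.gateFailed) return null;
  return state.pending[0] ?? null;
}

// `derive` stands in for deriveState(); `dispatch` stands in for a fresh
// agent context that writes its output to disk and exits.
function runLoop(
  derive: () => StateSnapshot,
  dispatch: (unit: UnitKind) => void,
  maxIterations = 100,
): UnitKind[] {
  const dispatched: UnitKind[] = [];
  for (let i = 0; i < maxIterations; i++) {
    const next = selectNextUnit(derive());
    if (next === null) break; // milestone complete or gate failed
    dispatch(next);
    dispatched.push(next);
  }
  return dispatched;
}

// In-memory stand-in for disk state, for demonstration only.
const fakeDisk: StateSnapshot = {
  pending: ["research", "plan", "implement", "verify"],
  gateFailed: false,
};
const dispatchOrder = runLoop(
  () => ({ pending: [...fakeDisk.pending], gateFailed: fakeDisk.gateFailed }),
  () => {
    fakeDisk.pending.shift(); // the "agent" completes its unit
  },
);
console.log(dispatchOrder.join(" -> "));
```

The point of the shape is that the loop re-derives state on every iteration instead of carrying it in memory, which matches the fresh-context rule stated under Invariants.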
**System context assembly** (`bootstrap/system-context.ts`):
`PREFERENCES.md` → `KNOWLEDGE.md` → `ARCHITECTURE.md` → `CODEBASE.md` → code intelligence → memories → worktree/VCS blocks
**Write gate** (`bootstrap/write-gate.ts`):
All file writes in autonomous mode pass through a gate. Protected files (CLAUDE.md, CODEBASE.md, certain spec files) require explicit override.
## Invariants
- Prefer small, named modules with clear ownership.
- Behavior changes need tests or an explicit eval.
- Keep generated artifacts out of hand-written design docs.
- Update this map when new top-level concepts or directories become important.
- The state machine (controller) is pure TypeScript — no LLM decisions in the dispatch loop itself.
- Each dispatch unit runs in a fresh context — no cross-turn state accumulation.
- Planning artifacts are tracked in git; runtime artifacts are never committed.
- `SF_RUNTIME_PATTERNS` in `gitignore.ts` is the canonical source of truth for runtime paths. `git-service.ts` (`RUNTIME_EXCLUSION_PATHS`) and `worktree-manager.ts` (`SKIP_*` arrays) must stay synchronized with it.
- The user is the end-gate. SF delivers for review, not to production.
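The synchronization invariant for `SF_RUNTIME_PATTERNS` could be guarded by a drift check along these lines. This is a sketch under assumptions: `findDrift` is a hypothetical helper, and the sample arrays are shortened. Because `worktree-manager.ts` splits its entries across several `SKIP_*` arrays, a real check would first need to flatten those into one list.

```typescript
// Hypothetical helper; not part of the SF codebase.
function findDrift(
  canonical: readonly string[],
  mirror: readonly string[],
): { missing: string[]; extra: string[] } {
  const canonicalSet = new Set(canonical);
  const mirrorSet = new Set(mirror);
  return {
    // In the canonical list but absent from the mirror.
    missing: canonical.filter((p) => !mirrorSet.has(p)),
    // In the mirror but not in the canonical list.
    extra: mirror.filter((p) => !canonicalSet.has(p)),
  };
}

// Shortened sample lists, for demonstration only.
const canonicalSample = [".sf/activity/", ".sf/audit/", ".sf/exec/"];
const mirrorSample = [".sf/activity/", ".sf/exec/"];
const driftResult = findDrift(canonicalSample, mirrorSample);
console.log(driftResult);
```

A unit test that fails when `missing` or `extra` is non-empty would turn the invariant into something enforced rather than remembered.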

@@ -0,0 +1,29 @@
# ADR-NNN: Title
**Status:** Proposed | Accepted | Rejected | Superseded by ADR-NNN
**Date:** YYYY-MM-DD
**Deciders:** (names)
## Context
What is the problem or situation that requires a decision? Include constraints and the forces at play.
## Decision
What is the change being made or the approach being adopted?
## Consequences
What becomes easier or harder after this decision? Include positive and negative outcomes.
## Alternatives Considered
What other options were evaluated and why were they not chosen?
## Validation
What command or evidence confirms the decision is correct?
```bash
# verification command here
```

@@ -28,5 +28,6 @@ in `docs/dev/`. Lighter design docs (problem framing, event model decisions) liv
| Doc | Title | Status |
|-----|-------|--------|
| [ADR-TEMPLATE.md](./ADR-TEMPLATE.md) | ADR Template | Reference |
| [core-beliefs.md](./core-beliefs.md) | Core Beliefs | Accepted |
| [notification-event-model.md](./notification-event-model.md) | Notification Event Model | Draft |

@@ -0,0 +1,20 @@
# Bootstrap Spec: Agent Legibility
Verifies that the SF repo is minimally agent-legible.
## Criteria
- [ ] `AGENTS.md` exists at repo root and is non-empty.
- [ ] `ARCHITECTURE.md` exists at repo root and describes the system.
- [ ] `docs/exec-plans/active/index.md` exists.
- [ ] `docs/exec-plans/tech-debt-tracker.md` exists.
- [ ] `docs/design-docs/ADR-TEMPLATE.md` exists.
- [ ] `harness/specs/` exists with at least this file.
## Verification command
```bash
for f in AGENTS.md ARCHITECTURE.md docs/exec-plans/active/index.md docs/exec-plans/tech-debt-tracker.md docs/design-docs/ADR-TEMPLATE.md harness/specs/bootstrap.md; do [ -s "$f" ] && echo "OK: $f" || echo "MISSING: $f"; done
```
All lines should start with `OK:` for this spec to pass.
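The same check can be sketched in TypeScript for callers that want a structured result instead of shell output. This is an assumption-laden sketch: `checkBootstrap` and the injected `sizeOf` lookup are hypothetical, chosen so the logic can be exercised without touching the filesystem. Like `[ -s "$f" ]`, it treats an empty file as missing, which is stricter than the "exists" wording used for some criteria.

```typescript
// Hypothetical sketch; `sizeOf` is an injected lookup so the logic stays testable.
const REQUIRED_FILES: readonly string[] = [
  "AGENTS.md",
  "ARCHITECTURE.md",
  "docs/exec-plans/active/index.md",
  "docs/exec-plans/tech-debt-tracker.md",
  "docs/design-docs/ADR-TEMPLATE.md",
  "harness/specs/bootstrap.md",
];

function checkBootstrap(
  sizeOf: (path: string) => number | undefined,
): string[] {
  return REQUIRED_FILES.map((f) => {
    const size = sizeOf(f);
    // Mirrors `[ -s "$f" ]`: the file must exist and be non-empty.
    return size !== undefined && size > 0 ? `OK: ${f}` : `MISSING: ${f}`;
  });
}

// Fake sizes for demonstration: one good file, one empty file, the rest absent.
const fakeSizes: Record<string, number> = {
  "AGENTS.md": 120,
  "ARCHITECTURE.md": 0,
};
const bootstrapReport = checkBootstrap((p) => fakeSizes[p]);
console.log(bootstrapReport.join("\n"));
```

In real use `sizeOf` would presumably wrap a file-stat call and return `undefined` when the path does not exist.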

@@ -51,3 +51,32 @@ clean:
# Run SF CLI from source (usage: just sf <args>)
sf *args:
./bin/sf-from-source {{args}}
# Create a new ADR from the template (usage: just adr "My Decision Title")
adr title:
#!/usr/bin/env bash
set -euo pipefail
next=$(ls docs/dev/ADR-*.md 2>/dev/null | sed 's/.*ADR-\([0-9]*\).*/\1/' | sort -n | tail -1)
num=$(printf "%03d" $(( 10#${next:-0} + 1 )))
slug=$(echo "{{title}}" | tr '[:upper:]' '[:lower:]' | tr ' ' '-' | tr -cd 'a-z0-9-')
dest="docs/dev/ADR-${num}-${slug}.md"
sed "s/ADR-NNN/ADR-${num}/; s/Title/{{title}}/" docs/design-docs/ADR-TEMPLATE.md > "${dest}"
echo "Created: ${dest}"
# Create a new product spec (usage: just spec "my-feature-name")
spec name:
#!/usr/bin/env bash
set -euo pipefail
dest="docs/product-specs/{{name}}.md"
if [ -f "${dest}" ]; then echo "Already exists: ${dest}"; exit 1; fi
printf "# {{name}}\n\n## Job to be done\n\n## Workflow\n\n## Edge cases\n\n## Non-goals\n\n## Verification\n\n\`\`\`bash\n# command that proves this spec passes\n\`\`\`\n" > "${dest}"
echo "Created: ${dest}"
# Create a new harness spec (usage: just harness-spec "behavior-name")
harness-spec name:
#!/usr/bin/env bash
set -euo pipefail
dest="harness/specs/{{name}}.md"
if [ -f "${dest}" ]; then echo "Already exists: ${dest}"; exit 1; fi
printf "# Harness Spec: {{name}}\n\n## Behavior\n\n## Verification command\n\n\`\`\`bash\n\n\`\`\`\n\n## Pass criteria\n\n" > "${dest}"
echo "Created: ${dest}"
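The numbering and slug logic of the `adr` recipe can be sketched in TypeScript as below. One detail matters for zero-padded numbers: bash arithmetic treats a leading `0` as octal, so values such as `008` need explicit base-10 parsing. The sketch makes that explicit with `parseInt(..., 10)`. `nextAdrNumber` and `slugify` are hypothetical names, and the file names in the demo are invented.

```typescript
// Hypothetical helpers mirroring the `adr` recipe; not part of the repo.
function nextAdrNumber(fileNames: readonly string[]): string {
  let max = 0;
  for (const name of fileNames) {
    const match = /^ADR-(\d+)/.exec(name);
    if (match) {
      // Base 10 is explicit so "008" parses as eight, not as invalid octal.
      max = Math.max(max, parseInt(match[1] ?? "0", 10));
    }
  }
  return String(max + 1).padStart(3, "0");
}

function slugify(title: string): string {
  // Same steps as the recipe: lowercase, spaces to hyphens, drop everything else.
  return title
    .toLowerCase()
    .replace(/ /g, "-")
    .replace(/[^a-z0-9-]/g, "");
}

// Invented file names, for demonstration only.
const existingAdrs = ["ADR-001-sf-tracking.md", "ADR-008-example.md", "README.md"];
const adrFileName = `ADR-${nextAdrNumber(existingAdrs)}-${slugify("My Decision Title")}.md`;
console.log(adrFileName);
```

With no existing ADR files the helper returns `001`, so the first ADR does not need a special case.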

@@ -315,6 +315,105 @@ Document expected failure modes, recovery paths, observability, and release chec
content: `# Security
Document trust boundaries, secrets handling, dependency risk, and security review requirements here.
`,
},
{
path: "docs/design-docs/ADR-TEMPLATE.md",
content: `# ADR-NNN: Title
**Status:** Proposed | Accepted | Rejected | Superseded by ADR-NNN
**Date:** YYYY-MM-DD
## Context
What is the problem or situation that requires a decision? Include constraints and the forces at play.
## Decision
What is the change being made or the approach being adopted?
## Consequences
What becomes easier or harder after this decision? Include positive and negative outcomes.
## Alternatives Considered
What other options were evaluated and why were they not chosen?
`,
},
{
path: "harness/AGENTS.md",
content: `# Harness Agent Notes
The harness is a collection of contracts the agent can read and verify against.
- \`specs/\`: behavior contracts. Each spec states what "done" looks like and the command that proves it.
- \`evals/\`: task definitions for behaviors tests cannot cover — model output quality, multi-turn flows, agent decisions.
- \`graders/\`: reusable grader scripts (code-based checks, LLM-judge prompts used by evals).
**Rule:** Before marking a task done, run the relevant spec's verification command. Record the result in the completion summary or execution plan.
`,
},
{
path: "harness/specs/AGENTS.md",
content: `# Harness Specs Agent Notes
Each spec file in this directory:
- States the behavior being specified (not the implementation).
- Includes the exact command that proves the spec passes.
- Is referenced by the relevant execution plan or ADR.
Write the spec before implementation. Run it after. Record the result.
`,
},
{
path: "harness/specs/bootstrap.md",
content: `# Bootstrap Spec: Agent Legibility
Verifies that this repo is minimally agent-legible.
## Criteria
- [ ] \`AGENTS.md\` exists at repo root and is non-empty.
- [ ] \`ARCHITECTURE.md\` exists at repo root and is non-empty.
- [ ] \`docs/exec-plans/active/index.md\` exists.
- [ ] \`docs/exec-plans/tech-debt-tracker.md\` exists.
- [ ] \`docs/design-docs/ADR-TEMPLATE.md\` exists.
## Verification command
\`\`\`bash
for f in AGENTS.md ARCHITECTURE.md docs/exec-plans/active/index.md docs/exec-plans/tech-debt-tracker.md docs/design-docs/ADR-TEMPLATE.md; do [ -s "$f" ] && echo "OK: $f" || echo "MISSING: $f"; done
\`\`\`
All lines should start with \`OK:\` for the bootstrap spec to pass.
`,
},
{
path: "harness/evals/AGENTS.md",
content: `# Harness Evals Agent Notes
Evals verify behavior that unit tests cannot cover: model output quality, agent decisions, multi-turn flows.
Each eval should include:
- The input fixture or prompt
- The expected output or scoring rubric
- The command to run it (\`promptfoo eval\`, custom script, etc.)
Keep evals deterministic where possible. Log results to \`docs/records/\` at milestone close.
`,
},
{
path: "harness/graders/AGENTS.md",
content: `# Harness Graders Agent Notes
Graders are reusable scripts or prompts that score eval outputs.
- Code-based graders: shell scripts or test files that check structured outputs deterministically.
- LLM-judge graders: prompt templates that ask a model to score free-text output against a rubric.
Prefer code-based graders. Add LLM-judge graders only when deterministic checking is impossible.
`,
},
];

@@ -168,6 +168,7 @@ export async function buildBeforeAgentStartResult(
sfHome,
process.cwd(),
);
const architectureBlock = loadArchitectureBlock(process.cwd());
if (globalSizeKb > 4) {
ctx.ui.notify(
`SF: ~/.sf/agent/KNOWLEDGE.md is ${globalSizeKb.toFixed(1)}KB — consider trimming to keep system prompt lean.`,
@@ -281,7 +282,7 @@ export async function buildBeforeAgentStartResult(
? `\n\n## Subagent Model\n\nWhen spawning subagents via the \`subagent\` tool, always pass \`model: "${subagentModelConfig.primary}"\` in the tool call parameters. Never omit this — always specify it explicitly.`
: "";
const fullSystem = `${event.systemPrompt}\n\n[SYSTEM CONTEXT — SF]\n\n${systemContent}${preferenceBlock}${knowledgeBlock}${codebaseBlock}${codeIntelligenceBlock}${memoryBlock}${newSkillsBlock}${worktreeBlock}${repositoryVcsBlock}${modelIdentityBlock}${subagentModelBlock}`;
const fullSystem = `${event.systemPrompt}\n\n[SYSTEM CONTEXT — SF]\n\n${systemContent}${preferenceBlock}${knowledgeBlock}${architectureBlock}${codebaseBlock}${codeIntelligenceBlock}${memoryBlock}${newSkillsBlock}${worktreeBlock}${repositoryVcsBlock}${modelIdentityBlock}${subagentModelBlock}`;
stopContextTimer({
systemPromptSize: fullSystem.length,
@@ -363,6 +364,28 @@ export function loadKnowledgeBlock(
};
}
/**
* Load ARCHITECTURE.md from the project root into context. Capped at 8 000 chars
* to avoid bloating every request; the full file is always readable on disk.
*/
function loadArchitectureBlock(cwd: string): string {
const architecturePath = join(cwd, "ARCHITECTURE.md");
if (!existsSync(architecturePath)) return "";
try {
const raw = readFileSync(architecturePath, "utf-8").trim();
if (!raw) return "";
const MAX_CHARS = 8_000;
const content =
raw.length > MAX_CHARS
? raw.slice(0, MAX_CHARS) +
"\n\n*(truncated — see ARCHITECTURE.md for full map)*"
: raw;
return `\n\n[ARCHITECTURE — System map and invariants]\n\n${content}`;
} catch {
return "";
}
}
function buildWorktreeContextBlock(): string {
const worktreeName = getActiveWorktreeName();
const worktreeMainCwd = getWorktreeOriginalCwd();
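The capping behavior in `loadArchitectureBlock()` can be isolated as a pure function for testing. This is a simplified sketch, not the function from the diff: `capBlock` is a hypothetical name, the truncation note is shortened, and the block header is left out. Two properties carry over from the original logic: `slice` counts UTF-16 code units rather than bytes, and the returned string can exceed the cap by the length of the appended note.

```typescript
// Hypothetical pure helper extracted from the truncation logic; names are illustrative.
function capBlock(raw: string, maxChars: number, truncationNote: string): string {
  const trimmed = raw.trim();
  // Short inputs pass through unchanged (after trimming).
  if (trimmed.length <= maxChars) return trimmed;
  // Long inputs keep the first maxChars code units and gain a visible note.
  return trimmed.slice(0, maxChars) + truncationNote;
}

const sampleNote = "\n\n*(truncated)*";
const shortBlock = capBlock("# Architecture\n", 8000, sampleNote);
const longBlock = capBlock("x".repeat(8500), 8000, sampleNote);

console.log(shortBlock.length, longBlock.length);
```

Keeping the note outside the cap means the first 8 000 characters of the file are always preserved intact, at the cost of a slightly larger block.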

@@ -236,20 +236,30 @@ export interface PreMergeCheckResult {
*/
export const RUNTIME_EXCLUSION_PATHS: readonly string[] = [
".sf/activity/",
".sf/audit/",
".sf/exec/",
".sf/forensics/",
".sf/journal/",
".sf/model-benchmarks/",
".sf/parallel/",
".sf/reports/",
".sf/runtime/",
".sf/worktrees/",
".sf/parallel/",
".sf/auto.lock",
".sf/metrics.json",
".sf/completed-units*.json", // covers completed-units.json and archived completed-units-{MID}.json
".sf/state-manifest.json",
".sf/STATE.md",
".sf/sf.db*",
".sf/journal/",
".sf/doctor-history.jsonl",
".sf/event-log.jsonl",
".sf/notifications.jsonl",
".sf/routing-history.json",
".sf/self-feedback.jsonl",
".sf/repo-meta.json",
".sf/DISCUSSION-MANIFEST.json",
".sf/milestones/**/*-CONTINUE.md",
".sf/milestones/**/continue.md",
];
function isPathExcluded(path: string, exclusions: readonly string[]): boolean {

@@ -28,19 +28,27 @@ import { sfRoot } from "./paths.js";
*/
const SF_RUNTIME_PATTERNS = [
".sf/activity/",
".sf/audit/",
".sf/exec/",
".sf/forensics/",
".sf/journal/",
".sf/model-benchmarks/",
".sf/parallel/",
".sf/reports/",
".sf/runtime/",
".sf/worktrees/",
".sf/parallel/",
".sf/auto.lock",
".sf/metrics.json",
".sf/completed-units*.json", // covers completed-units.json and archived completed-units-{MID}.json
".sf/state-manifest.json",
".sf/STATE.md",
".sf/sf.db*",
".sf/journal/",
".sf/doctor-history.jsonl",
".sf/event-log.jsonl",
".sf/notifications.jsonl",
".sf/routing-history.json",
".sf/self-feedback.jsonl",
".sf/repo-meta.json",
".sf/DISCUSSION-MANIFEST.json",
".sf/milestones/**/*-CONTINUE.md",
".sf/milestones/**/continue.md",

@@ -765,8 +765,12 @@ const SKIP_PATHS = [
".sf/worktrees/",
".sf/runtime/",
".sf/activity/",
".sf/audit/",
".sf/exec/",
".sf/forensics/",
".sf/model-benchmarks/",
".sf/parallel/",
".sf/reports/",
".sf/journal/",
];
const SKIP_EXACT = [
@@ -776,6 +780,11 @@ const SKIP_EXACT = [
".sf/state-manifest.json",
".sf/doctor-history.jsonl",
".sf/event-log.jsonl",
".sf/notifications.jsonl",
".sf/routing-history.json",
".sf/self-feedback.jsonl",
".sf/repo-meta.json",
".sf/DISCUSSION-MANIFEST.json",
];
/** File prefixes to skip (for wildcard patterns like completed-units*.json, sf.db*). */
const SKIP_PREFIXES = [".sf/completed-units", ".sf/sf.db"];
@@ -784,6 +793,12 @@ function shouldSkipPath(filePath: string): boolean {
if (SKIP_PATHS.some((p) => filePath.startsWith(p))) return true;
if (SKIP_EXACT.includes(filePath)) return true;
if (SKIP_PREFIXES.some((p) => filePath.startsWith(p))) return true;
// Milestone continue markers are ephemeral interruption signals, not durable artifacts.
if (
filePath.startsWith(".sf/milestones/") &&
(filePath.endsWith("-CONTINUE.md") || filePath.endsWith("/continue.md"))
)
return true;
return false;
}
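The continue-marker condition added to `shouldSkipPath()` can be exercised on its own with a few sample paths. `isContinueMarker` is a hypothetical extraction of that condition, and the milestone ID `M001` and the file names are invented for illustration.

```typescript
// Hypothetical extraction of the continue-marker condition from shouldSkipPath().
function isContinueMarker(filePath: string): boolean {
  return (
    filePath.startsWith(".sf/milestones/") &&
    (filePath.endsWith("-CONTINUE.md") || filePath.endsWith("/continue.md"))
  );
}

// Invented sample paths; "M001" is an illustrative milestone ID.
const samplePaths = [
  ".sf/milestones/M001/M001-CONTINUE.md",
  ".sf/milestones/M001/continue.md",
  ".sf/milestones/M001/ROADMAP.md",
  "docs/continue.md",
];

for (const p of samplePaths) {
  console.log(`${isContinueMarker(p) ? "skip" : "keep"}: ${p}`);
}
```

The prefix check keeps the rule scoped to milestone directories, so a file named `continue.md` elsewhere in the repo is still copied.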