docs: clarify SF harness rollout boundaries
parent
d78c5ac198
commit
b32fe7acd1
4 changed files with 67 additions and 39 deletions
|
|
@ -98,7 +98,7 @@ These came up during recent ports and refactor passes — tracked here so they d
|
||||||
| **Pi-mono SDK sync** | We pull from pi-mono directly (separate from gsd-2 sync stance). Periodically check `pi-mono/main` for SDK improvements worth taking. The remote is set up; cadence is not. | 3 | recurring |
|
| **Pi-mono SDK sync** | We pull from pi-mono directly (separate from gsd-2 sync stance). Periodically check `pi-mono/main` for SDK improvements worth taking. The remote is set up; cadence is not. | 3 | recurring |
|
||||||
| **Caveman input-side compression** (manual) | Caveman skill installed (output compression, ~75% fewer agent tokens). Input side — sf's own prompts (`execute-task.md`, `discuss.md`, `plan-*.md`, etc.) — is verbose: 10-step instruction lists, `runtimeContext`, `memoriesSection`, `taskPlanInline`, `slicePlanExcerpt`. Manually rewrite the heaviest sections in caveman style (preserve intent + nuance, drop fluff). Test against current to confirm no quality regression. | 2 | 1-2 days |
|
| **Caveman input-side compression** (manual) | Caveman skill installed (output compression, ~75% fewer agent tokens). Input side — sf's own prompts (`execute-task.md`, `discuss.md`, `plan-*.md`, etc.) — is verbose: 10-step instruction lists, `runtimeContext`, `memoriesSection`, `taskPlanInline`, `slicePlanExcerpt`. Manually rewrite the heaviest sections in caveman style (preserve intent + nuance, drop fluff). Test against current to confirm no quality regression. | 2 | 1-2 days |
|
||||||
| **Runtime input preprocessor** (caveman-compress) | Add a transformation step in dispatch that pipes sf's rendered prompt through `caveman-compress` (sub-skill in juliusbrussee/caveman repo, ~46% input-token reduction) before LLM call. Only enable when a `terse_prompts: true` preference is set. Adds a layer that can drift from authored intent — needs a comparison harness. | 3 | 3-4 days |
|
| **Runtime input preprocessor** (caveman-compress) | Add a transformation step in dispatch that pipes sf's rendered prompt through `caveman-compress` (sub-skill in juliusbrussee/caveman repo, ~46% input-token reduction) before LLM call. Only enable when a `terse_prompts: true` preference is set. Adds a layer that can drift from authored intent — needs a comparison harness. | 3 | 3-4 days |
|
||||||
| **Swarm chat / debate mode** for `subagent` tool | Today `subagent({ tasks: [...] })` runs parallel fire-and-forget — adversarial reviewers never engage each other's strongest defence. Add `mode: "debate"` + `rounds: N` so each task sees prior rounds' outputs. See [ADR-011](docs/dev/ADR-011-swarm-chat-and-debate-mode.md) — Option A (round-robin debate) first, Option C (full inbox-based swarm chat) after the persistent-agent layer (SPEC §17–18) lands. | 2 | 1 week (Option A); ~3 weeks (Option C, depends on persistent-agent layer) |
|
| **Full swarm chat for `subagent` tool** | Round-robin debate mode now exists as `subagent({ mode: "debate", rounds: N, tasks: [...] })`, so adversarial reviewers can engage prior-round arguments. Remaining work is Option C from [ADR-011](docs/dev/ADR-011-swarm-chat-and-debate-mode.md): full inbox-based swarm chat after the persistent-agent layer (SPEC §17–18) lands. | 3 | ~3 weeks (depends on persistent-agent layer) |
|
||||||
| **Singularity Knowledge + Agent Platform (Go re-platform)** | Re-platform Singularity Memory from Python+FastAPI+Postgres+vchord to Go on Charm: charm-server patterns for auth/identity, fantasy as agent runtime, same Postgres+vchord for retrieval, exact wire-contract preserved. Load-bearing for cross-instance knowledge federation AND future central persistent agents (sf SPEC §17). See [ADR-014](docs/dev/ADR-014-singularity-knowledge-and-agent-platform.md) and [`singularity-memory/MIGRATION.md`](https://github.com/singularity-ng/singularity-memory/blob/main/MIGRATION.md). | 1 | ~12 weeks across phases |
|
| **Singularity Knowledge + Agent Platform (Go re-platform)** | Re-platform Singularity Memory from Python+FastAPI+Postgres+vchord to Go on Charm: charm-server patterns for auth/identity, fantasy as agent runtime, same Postgres+vchord for retrieval, exact wire-contract preserved. Load-bearing for cross-instance knowledge federation AND future central persistent agents (sf SPEC §17). See [ADR-014](docs/dev/ADR-014-singularity-knowledge-and-agent-platform.md) and [`singularity-memory/MIGRATION.md`](https://github.com/singularity-ng/singularity-memory/blob/main/MIGRATION.md). | 1 | ~12 weeks across phases |
|
||||||
| **Wire sf to Singularity Memory remote-mode** | sf-side: change `memory-store.ts` provider chain from local-SQLite-only to remote-Singularity-Memory → embedded → local-only fallback. Once wired, ~80% of the "should sf instances interlink?" question (ADR-012) is answered for free. Depends on the platform itself being live. | 1 | 1 week post-platform |
|
| **Wire sf to Singularity Memory remote-mode** | sf-side: change `memory-store.ts` provider chain from local-SQLite-only to remote-Singularity-Memory → embedded → local-only fallback. Once wired, ~80% of the "should sf instances interlink?" question (ADR-012) is answered for free. Depends on the platform itself being live. | 1 | 1 week post-platform |
|
||||||
| **sf-worker SSH host** | Build the Go-based SSH worker host for distributed execution (SPEC §22, NEW): `wish` + `xpty`/`conpty` + `promwish`. Orchestrator dispatches over SSH; worker spawns the agent in a real pty per attempt; Prometheus metrics for free. See [ADR-013](docs/dev/ADR-013-network-and-remote-execution.md). | 2 | ~3 weeks |
|
| **sf-worker SSH host** | Build the Go-based SSH worker host for distributed execution (SPEC §22, NEW): `wish` + `xpty`/`conpty` + `promwish`. Orchestrator dispatches over SSH; worker spawns the agent in a real pty per attempt; Prometheus metrics for free. See [ADR-013](docs/dev/ADR-013-network-and-remote-execution.md). | 2 | ~3 weeks |
|
||||||
|
|
|
||||||
6
SPEC.md
|
|
@ -1784,7 +1784,11 @@ Deep analysis is default, not opt-in:
|
||||||
|
|
||||||
## 17. Persistent Agents
|
## 17. Persistent Agents
|
||||||
|
|
||||||
> **Status: PARTIAL** — sf has subagents (test files: `subagent-agent-discovery`, `subagent-model-dispatch`, `agent-end-retry`; module: `bootstrap/agent-end-recovery.ts`). The spec's persistent-identity + memory-blocks + inbox-wake model is NEW.
|
> **Status: PARTIAL** — sf has ephemeral subagents, including single,
|
||||||
|
> parallel, chain, and bounded debate batches (`subagent({ mode: "debate",
|
||||||
|
> rounds, tasks })`; tests include `subagent-agent-discovery`,
|
||||||
|
> `subagent-model-dispatch`, `agent-end-retry`, `subagent-debate-mode`).
|
||||||
|
> The spec's persistent-identity + memory-blocks + inbox-wake model is NEW.
|
||||||
|
|
||||||
|
|
||||||
### 17.1 Agent vs unit
|
### 17.1 Agent vs unit
|
||||||
|
|
|
||||||
|
|
@ -1,11 +1,11 @@
|
||||||
# ADR-011: Swarm chat and debate mode for ephemeral subagents
|
# ADR-011: Swarm chat and debate mode for ephemeral subagents
|
||||||
|
|
||||||
**Date**: 2026-04-29
|
**Date**: 2026-04-29
|
||||||
**Status**: proposed (deferred — capture for future implementation)
|
**Status**: accepted (Option A implemented; full swarm chat deferred)
|
||||||
|
|
||||||
## Context
|
## Context
|
||||||
|
|
||||||
sf's `subagent` tool today dispatches one or more subagents in **parallel fire-and-forget** mode (`subagent({ tasks: [...] })`). All tasks run concurrently; none see each other; the parent collects results and synthesises.
|
sf's `subagent` tool originally dispatched one or more subagents in **parallel fire-and-forget** mode (`subagent({ tasks: [...] })`). All tasks ran concurrently; none saw each other; the parent collected results and synthesised.
|
||||||
|
|
||||||
This is sufficient for many cases (parallel research, parallel gate evaluation), but it has a structural gap for **adversarial review** and **multi-stakeholder negotiation**:
|
This is sufficient for many cases (parallel research, parallel gate evaluation), but it has a structural gap for **adversarial review** and **multi-stakeholder negotiation**:
|
||||||
|
|
||||||
|
|
@ -17,11 +17,20 @@ The user asked whether agent-to-agent communication could happen inside ephemera
|
||||||
|
|
||||||
## Decision
|
## Decision
|
||||||
|
|
||||||
**Defer.** Capture the design in this ADR and a `BUILD_PLAN.md` row. Implement after the persistent-agent layer (`agents`, `agent_messages`, `agent_inbox`, `send_message` tool) lands as a NEW tier, since 90 % of the machinery is shared. Implement Option A (debate mode) first as a forcing function — once we see how much real debate improves outcomes, the case for full swarm-chat (Option C) writes itself.
|
**Implement Option A now; defer Option C.**
|
||||||
|
|
||||||
|
Round-robin debate mode is implemented on the existing `subagent` tool as
|
||||||
|
`subagent({ mode: "debate", rounds: N, tasks: [...] })`. It gives each
|
||||||
|
participant the prior rounds' transcript and keeps the parent as synthesiser.
|
||||||
|
|
||||||
|
Full inbox-based swarm chat remains deferred until the persistent-agent layer
|
||||||
|
(`agents`, `agent_messages`, `agent_inbox`, `send_message` tool) lands. That
|
||||||
|
machinery is still shared with SPEC §17–18 and should not be rebuilt inside the
|
||||||
|
ephemeral subagent extension.
|
||||||
|
|
||||||
## Alternatives Considered
|
## Alternatives Considered
|
||||||
|
|
||||||
### Option A — Round-robin debate mode (RECOMMENDED first)
|
### Option A — Round-robin debate mode (IMPLEMENTED)
|
||||||
|
|
||||||
Add `mode: "debate"` and `rounds: N` to the `subagent` tool. Each round, every task sees the previous round's outputs.
|
Add `mode: "debate"` and `rounds: N` to the `subagent` tool. Each round, every task sees the previous round's outputs.
|
||||||
|
|
||||||
|
|
@ -30,8 +39,8 @@ subagent({
|
||||||
mode: "debate",
|
mode: "debate",
|
||||||
rounds: 3,
|
rounds: 3,
|
||||||
tasks: [
|
tasks: [
|
||||||
{ id: "advocate", model_tier: "validation", prompt: "Make case for X. ..." },
|
{ agent: "reviewer", task: "Make case for X. ..." },
|
||||||
{ id: "challenger", model_tier: "validation", prompt: "Attack X. ..." }
|
{ agent: "reviewer", task: "Attack X. ..." }
|
||||||
]
|
]
|
||||||
})
|
})
|
||||||
```
|
```
|
||||||
|
|
@ -42,7 +51,8 @@ subagent({
|
||||||
- **Why not**: doesn't support free-form many-to-many messaging. Each task speaks once per round in a fixed order.
|
- **Why not**: doesn't support free-form many-to-many messaging. Each task speaks once per round in a fixed order.
|
||||||
- **Why first**: smallest change, biggest immediate quality win, reusable as a primitive.
|
- **Why first**: smallest change, biggest immediate quality win, reusable as a primitive.
|
||||||
|
|
||||||
**Effort**: ~1 dev-week. Touches: `subagent` tool definition, dispatch path in pi-coding-agent, new test cases, `dispatching-subagents` skill section, possibly `advisory-partner` skill update.
|
**Implementation:** `src/resources/extensions/subagent/index.ts`.
|
||||||
|
**Regression test:** `src/tests/subagent-debate-mode.test.ts`.
|
||||||
|
|
||||||
### Option B — Shared scratchpad
|
### Option B — Shared scratchpad
|
||||||
|
|
||||||
|
|
@ -78,12 +88,12 @@ swarm({
|
||||||
**Positive**
|
**Positive**
|
||||||
|
|
||||||
- **Higher-quality adversarial review** — the challenger actually engages the advocate's strongest defence, instead of issuing a parallel monologue.
|
- **Higher-quality adversarial review** — the challenger actually engages the advocate's strongest defence, instead of issuing a parallel monologue.
|
||||||
- **Multi-stakeholder negotiation** — the Vision Alignment Meeting becomes a real meeting, not a parallel survey.
|
- **Multi-stakeholder pressure testing** — the Vision Alignment Meeting can use bounded debate rounds instead of only a parallel survey.
|
||||||
- **Reusable primitive** — debate mode can be invoked from any skill that today does `subagent({ tasks: [advocate, challenger] })` (currently `advisory-partner`, `brainstorming`, `requesting-code-review`).
|
- **Reusable primitive** — debate mode can be invoked from any skill that today does `subagent({ tasks: [advocate, challenger] })` (currently `advisory-partner`, `brainstorming`, `requesting-code-review`).
|
||||||
|
|
||||||
**Negative**
|
**Negative**
|
||||||
|
|
||||||
- **Cost grows linearly with rounds.** A 3-round debate is 3× the tokens. Budget gates need updating in `auto-budget.ts` so debate dispatches don't silently blow past the per-unit ceiling.
|
- **Cost grows linearly with rounds.** A 3-round debate is roughly 3× the tokens. Callers should reserve budget accordingly.
|
||||||
- **Determinism drops.** A fire-and-collect batch is reproducible from prompts alone; a debate is path-dependent. Trace recording becomes more important — `.sf/traces/` must capture each round.
|
- **Determinism drops.** A fire-and-collect batch is reproducible from prompts alone; a debate is path-dependent. Trace recording becomes more important — `.sf/traces/` must capture each round.
|
||||||
- **Synthesis complexity rises** — the parent must summarise a debate transcript, not just collect verdicts. The synthesis prompt itself becomes a tunable artefact.
|
- **Synthesis complexity rises** — the parent must summarise a debate transcript, not just collect verdicts. The synthesis prompt itself becomes a tunable artefact.
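The linear cost point can be made concrete with a small budget check. This is a sketch only: `debateBudget`, its parameters, and the idea of a single per-round estimate are assumptions for illustration, not part of sf's budget code. It also ignores the extra input tokens that later rounds spend on the transcript, so it should be read as a lower bound.

```typescript
// Hypothetical helper: the name, parameters, and flat per-round estimate
// are assumptions made for this example.
interface BudgetCheck {
  projected: number;      // estimated tokens for the whole debate
  withinCeiling: boolean; // true when the projection fits the ceiling
}

function debateBudget(
  perRoundTokens: number, // estimated tokens for one round across all tasks
  rounds: number,
  ceiling: number,        // caller's token ceiling for this dispatch
): BudgetCheck {
  // Cost grows linearly with rounds: N rounds cost about N times one round.
  const projected = perRoundTokens * rounds;
  return { projected, withinCeiling: projected <= ceiling };
}
```

A caller could run this check before dispatching and lower `rounds` when the projection does not fit.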
|
||||||
|
|
||||||
|
|
@ -102,39 +112,32 @@ swarm({
|
||||||
- **Cross-session swarm replay** — a swarm session, once archived, is read-only. No "fork from round 2" support in v1.
|
- **Cross-session swarm replay** — a swarm session, once archived, is read-only. No "fork from round 2" support in v1.
|
||||||
- **Human-in-the-loop debate** — swarms are agent-to-agent only. If the user wants to inject a turn, that's a different surface (the existing `discuss` flow).
|
- **Human-in-the-loop debate** — swarms are agent-to-agent only. If the user wants to inject a turn, that's a different surface (the existing `discuss` flow).
|
||||||
|
|
||||||
## Implementation Sketch (Option A first)
|
## Implementation Notes (Option A)
|
||||||
|
|
||||||
1. Extend `subagent` tool:
|
1. `subagent` accepts `mode: "parallel" | "debate"` on `tasks` batches.
|
||||||
- Add `mode` field: `"parallel"` (default, current behaviour) | `"debate"`.
|
2. `rounds` defaults to `2`, is capped at `5`, and is valid only with
|
||||||
- Add `rounds` field (required when `mode = "debate"`, default `2`, max `5`).
|
`mode: "debate"`.
|
||||||
2. In the dispatch layer (pi-coding-agent / sf adapter):
|
3. Debate requires at least two participants.
|
||||||
- For `mode = "debate"`: maintain an in-memory transcript per swarm. Each round, render `previous_rounds_transcript` as a context block and append it to each task's prompt.
|
4. Each round runs the participant tasks, then appends their outputs to an
|
||||||
- Per-round trace span: `swarm.<id>.round.<n>.task.<id>` so `.sf/traces/` reflects the structure.
|
in-memory transcript.
|
||||||
3. Synthesis prompt:
|
5. Later rounds receive the transcript under `Debate transcript so far`.
|
||||||
- When all rounds complete, the parent receives the full transcript plus a synthesis directive: "summarise the strongest claim, the strongest objection, the convergence (if any), and the residual disagreement."
|
6. The final round asks each participant to end with `FINAL_VERDICT`.
|
||||||
4. Budget gate:
|
7. The parent still owns synthesis and persistence; debate mode does not create
|
||||||
- `auto-budget.ts` needs to multiply the projected cost by `rounds` before approving the dispatch.
|
persistent agent messages.
|
||||||
5. Tests:
|
|
||||||
- Unit test: a 2-round debate produces a transcript with 4 turns (2 tasks × 2 rounds).
|
|
||||||
- Integration test: an advocate/challenger pair on a known weak design — verify the falsifier surfaces by round 3 (vs. parallel mode where it doesn't).
|
|
||||||
6. Skill updates:
|
|
||||||
- `advisory-partner` — add "for non-trivial reviews, consider `mode: 'debate'` over parallel fire".
|
|
||||||
- `brainstorming` Step 5 — same.
|
|
||||||
- `dispatching-subagents` — add a "debate mode" pattern between Pattern 2 and Pattern 3.
|
|
||||||
|
|
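The Option A notes above can be sketched roughly as follows. This is not the code in `src/resources/extensions/subagent/index.ts`: the type names, `resolveRounds`, `buildPrompt`, `runDebate`, and the synchronous `RunTask` callback are all assumptions for illustration, and the sketch assumes an over-limit `rounds` value is clamped to the cap rather than rejected.

```typescript
// Hypothetical types and helpers; the real implementation may differ.
type Mode = "parallel" | "debate";
interface Task { agent: string; task: string }
interface Turn { round: number; agent: string; output: string }
// Synchronous stand-in for the real dispatch call, kept simple for the sketch.
type RunTask = (task: Task, prompt: string) => string;

const DEFAULT_ROUNDS = 2;
const MAX_ROUNDS = 5;

function resolveRounds(mode: Mode, rounds?: number): number {
  if (mode !== "debate") {
    if (rounds !== undefined) throw new Error('rounds is valid only with mode: "debate"');
    return 1;
  }
  const n = rounds ?? DEFAULT_ROUNDS;
  if (!Number.isInteger(n) || n < 1) throw new Error("rounds must be a positive integer");
  return Math.min(n, MAX_ROUNDS); // assumption: clamp to the cap
}

function buildPrompt(task: Task, transcript: Turn[], round: number, total: number): string {
  const parts: string[] = [task.task];
  if (transcript.length > 0) {
    const lines = transcript.map((t) => `[round ${t.round}] ${t.agent}: ${t.output}`);
    parts.push(`Debate transcript so far:\n${lines.join("\n")}`);
  }
  if (round === total) parts.push("End your answer with FINAL_VERDICT.");
  return parts.join("\n\n");
}

function runDebate(tasks: Task[], run: RunTask, rounds?: number): Turn[] {
  if (tasks.length < 2) throw new Error("debate requires at least two participants");
  const total = resolveRounds("debate", rounds);
  const transcript: Turn[] = [];
  for (let round = 1; round <= total; round++) {
    // Every participant in a round sees the same snapshot of earlier rounds.
    const snapshot = [...transcript];
    const turns: Turn[] = tasks.map((t) => ({
      round,
      agent: t.agent,
      output: run(t, buildPrompt(t, snapshot, round, total)),
    }));
    transcript.push(...turns);
  }
  return transcript; // the parent still owns synthesis and persistence
}
```

With two tasks and the default of two rounds, the returned transcript holds four turns, one per task per round.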
||||||
## Sequencing
|
## Sequencing
|
||||||
|
|
||||||
| When | Why |
|
| When | Why |
|
||||||
|---|---|
|
|---|---|
|
||||||
| Persistent-agent layer scoped (`SPEC.md` §17 NEW → IN PROGRESS) | Most of Option A's machinery (transcript persistence, message scoping) overlaps. |
|
| Now | Option A is available as bounded debate mode on `subagent`. |
|
||||||
| Option A implemented | Forcing function — observe quality lift on adversarial reviews. |
|
| Six months of Option A in production | Decide whether full swarm-chat with inbox is worth the build. |
|
||||||
| Six months of Option A in production | Decide whether Option C (full swarm-chat with inbox) is worth the build. |
|
| Persistent-agent layer scoped (`SPEC.md` §17 NEW → IN PROGRESS) | Revisit Option C because inbox/message persistence machinery will exist. |
|
||||||
|
|
||||||
## References
|
## References
|
||||||
|
|
||||||
- `docs/SPEC.md` §17 (Persistent Agents) — defines `agents`, `agent_memory_blocks`, `agent_messages`, `agent_inbox`.
|
- `docs/SPEC.md` §17 (Persistent Agents) — defines `agents`, `agent_memory_blocks`, `agent_messages`, `agent_inbox`.
|
||||||
- `docs/SPEC.md` §18 (Inter-Agent Messaging) — defines `send_message` tool. Currently NEW (not implemented).
|
- `docs/SPEC.md` §18 (Inter-Agent Messaging) — defines `send_message` tool. Currently NEW (not implemented).
|
||||||
- `src/resources/extensions/sf/skills/dispatching-subagents/SKILL.md` — current parallel-only contract.
|
- `src/resources/extensions/sf/skills/dispatching-subagents/SKILL.md` — current single/parallel/debate/chain guidance.
|
||||||
- `src/resources/extensions/sf/skills/advisory-partner/SKILL.md` — primary consumer of adversarial dispatch today.
|
- `src/resources/extensions/sf/skills/advisory-partner/SKILL.md` — primary consumer of adversarial dispatch today.
|
||||||
- `src/resources/extensions/sf/prompts/gate-evaluate.md` — pre-execution Q3/Q4 gates.
|
- `src/resources/extensions/sf/prompts/gate-evaluate.md` — pre-execution Q3/Q4 gates.
|
||||||
- `src/resources/extensions/sf/prompts/validate-milestone.md` — post-execution 3-reviewer pattern.
|
- `src/resources/extensions/sf/prompts/validate-milestone.md` — post-execution 3-reviewer pattern.
|
||||||
|
|
|
||||||
|
|
@ -27,15 +27,35 @@ The system starts from template kits, then adapts them to the repository by read
|
||||||
|
|
||||||
Add the contract to markdown now. Add runtime flow behavior later behind tests.
|
Add the contract to markdown now. Add runtime flow behavior later behind tests.
|
||||||
|
|
||||||
The first implementation should not start by changing the worker prompt. It should add a pre-plan profile snapshot and a post-unit evidence retention hook, because those are observable and testable without changing every dispatch. Once those are stable, sf can inject harness/memory context into planning and verification prompts.
|
The first implementation should not start by changing the worker prompt or
|
||||||
|
writing repo-local harness files. It should add a pre-plan profile snapshot and
|
||||||
|
a post-unit evidence retention hook, because those are observable and testable
|
||||||
|
without changing every dispatch. Once those are stable, sf can inject
|
||||||
|
harness/memory context into planning and verification prompts.
|
||||||
|
|
||||||
|
Near-term repository-write boundary:
|
||||||
|
|
||||||
|
- All repositories use the same sf built-in skills and harness behavior.
|
||||||
|
- sf MUST NOT generate repo-local custom skill packs such as `.agents/skills/`
|
||||||
|
for project repos.
|
||||||
|
- sf MUST NOT create tracked `harness/`, `gates/`, CI, or repo spec files as
|
||||||
|
part of normal initialization.
|
||||||
|
- The only project-level file write allowed by this stream before the explicit
|
||||||
|
harness-writer phase is sf project preferences/config, such as
|
||||||
|
`.sf/PREFERENCES.md` or `.sf/preferences.md`, when the user asks for project
|
||||||
|
preferences.
|
||||||
|
- `.sf/sf.db` may record ignored operational state, including repo profiles and
|
||||||
|
untracked-file observations. That is not repo ownership and must not be
|
||||||
|
staged by default.
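One way to read the boundary above is as a default-deny write guard. The sketch below is illustrative only: `checkProjectWrite` and its parameters are hypothetical names, and it models only the period before the explicit harness-writer phase. The paths come from the rules above.

```typescript
// Hypothetical guard; the function name and shape are assumptions.
const PREFERENCE_FILES = [".sf/PREFERENCES.md", ".sf/preferences.md"];
const OPERATIONAL_STATE = ".sf/sf.db"; // ignored state, never staged by default

type WriteDecision = "allow" | "deny";

function checkProjectWrite(relPath: string, userAskedForPreferences: boolean): WriteDecision {
  // Normalise separators and a leading "./" so equivalent spellings compare equal.
  // This sketch does not resolve ".." segments.
  const p = relPath.replace(/\\/g, "/").replace(/^\.\//, "");
  if (p === OPERATIONAL_STATE) return "allow";
  if (PREFERENCE_FILES.includes(p)) return userAskedForPreferences ? "allow" : "deny";
  // Everything else is denied, including harness/, gates/, CI files,
  // repo spec files, and .agents/skills/.
  return "deny";
}
```

Default-deny keeps the list of forbidden locations from having to be exhaustive: anything not named as allowed is refused.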
|
||||||
|
|
||||||
| When | Flow addition | Why |
|
| When | Flow addition | Why |
|
||||||
|---|---|---|
|
|---|---|---|
|
||||||
| Now, in docs | Define repo profiling, untracked observation, harness planning, eval/judge rig, and memory retention contracts. | Gives implementation a stable target. |
|
| Now, in docs | Define repo profiling, untracked observation, harness planning, eval/judge rig, and memory retention contracts. | Gives implementation a stable target. |
|
||||||
| First code slice | Add read-only repo profile snapshot before planning. | Lets sf understand repo shape without taking ownership. |
|
| First code slice | Add read-only repo profile snapshot before planning. | Lets sf understand repo shape without taking ownership or writing tracked files. |
|
||||||
| Second code slice | Add post-unit evidence retention into `.sf/sf.db` and Singularity Memory. | Converts gate results into future guidance. |
|
| Second code slice | Add post-unit evidence retention into `.sf/sf.db` and Singularity Memory. | Converts gate results into future guidance. |
|
||||||
| Third code slice | Add harness proposal generation as a planning artifact. | Keeps generated files reviewable before write. |
|
| Third code slice | Add harness proposal generation as a planning artifact. | Produces dry-run proposals only; no tracked repo files are written. |
|
||||||
| Later | Inject harness/memory context into runtime prompts and workflow templates. | This changes agent behavior and needs regression fixtures. |
|
| Later | Inject harness/memory context into runtime prompts and workflow templates. | This changes agent behavior and needs regression fixtures. |
|
||||||
|
| Explicit opt-in later | Enable Harness Writer for reviewed diffs. | Allows tracked harness files only when a unit plan claims them and the user accepts the diff. |
|
||||||
|
|
||||||
### Files, database, and memory
|
### Files, database, and memory
|
||||||
|
|
||||||
|
|
@ -43,7 +63,7 @@ Use all three layers, with separate responsibilities:
|
||||||
|
|
||||||
| Layer | Role | Examples |
|
| Layer | Role | Examples |
|
||||||
|---|---|---|
|
|---|---|---|
|
||||||
| Tracked repo files | Durable contract and executable harness | `SPEC.md`, `ARCHITECTURE.md`, `harness/manifest.json`, `harness/evals/*.jsonl`, `gates/*.sh`, CI workflow snippets |
|
| Tracked repo files | Future durable contract and executable harness after explicit opt-in | `SPEC.md`, `ARCHITECTURE.md`, `harness/manifest.json`, `harness/evals/*.jsonl`, `gates/*.sh`, CI workflow snippets |
|
||||||
| `.sf/sf.db` | Operational state and evidence ledger | repo profile snapshots, harness inventory, eval runs, gate results, drift events, untracked-file observations |
|
| `.sf/sf.db` | Operational state and evidence ledger | repo profile snapshots, harness inventory, eval runs, gate results, drift events, untracked-file observations |
|
||||||
| Singularity Memory | Cross-session knowledge | proven patterns, anti-patterns, recurring failures, repo-specific risk notes, judge calibration lessons |
|
| Singularity Memory | Cross-session knowledge | proven patterns, anti-patterns, recurring failures, repo-specific risk notes, judge calibration lessons |
|
||||||
|
|
||||||
|
|
@ -152,10 +172,11 @@ Detailed design is in `repo-native-harness-architecture.md`.
|
||||||
| Stage | Work | Result |
|
| Stage | Work | Result |
|
||||||
|---|---|---|
|
|---|---|---|
|
||||||
| 1 | Add repo profile snapshots and untracked observation model. | sf understands repo shape without taking ownership. |
|
| 1 | Add repo profile snapshots and untracked observation model. | sf understands repo shape without taking ownership. |
|
||||||
| 2 | Add template kit registry and harness manifest format. | sf can generate reviewable harness files. |
|
| 2 | Add template kit registry and harness manifest format. | sf can generate dry-run harness proposals without writing repo files. |
|
||||||
| 3 | Add judge rig and eval suite runner. | AI and agent behavior becomes measurable. |
|
| 3 | Add judge rig and eval suite runner. | AI and agent behavior becomes measurable. |
|
||||||
| 4 | Connect evidence to Singularity Memory. | Patterns and anti-patterns improve future dispatch. |
|
| 4 | Connect evidence to Singularity Memory. | Patterns and anti-patterns improve future dispatch. |
|
||||||
| 5 | Add drift detection and automatic harness update proposals. | Harnesses evolve with the repo. |
|
| 5 | Add drift detection and automatic harness update proposals. | Harnesses evolve with the repo as proposals. |
|
||||||
|
| 6 | Add explicit opt-in Harness Writer. | Reviewed repo diffs can create tracked harness files; repo-local skills remain out of scope unless separately accepted. |
|
||||||
|
|
||||||
## References
|
## References
|
||||||
|
|
||||||
|
|
|
||||||