diff --git a/docs/SPEC_FIRST_TDD.md b/docs/SPEC_FIRST_TDD.md index f85299bdf..39a8b38c6 100644 --- a/docs/SPEC_FIRST_TDD.md +++ b/docs/SPEC_FIRST_TDD.md @@ -57,14 +57,18 @@ Every unit (milestone, slice, task) sits in one of those rows. If a piece of wor ## Purpose Gate -Every artifact (slice plan, task plan, function, test, ADR) must answer: +Every artifact (slice plan, task plan, function, test, ADR) must answer the same 8 PDD fields captured by the [`purpose-driven-development`](../src/resources/extensions/sf/skills/purpose-driven-development/SKILL.md) skill — these fields ARE the Purpose Gate: -- **why** this behaviour exists -- **what value** it creates or protects -- **who** uses it in production (real consumer, not just tests) -- **what breaks** if it returns the wrong answer +- **Purpose**: why this behaviour exists. +- **Consumer**: who depends on the outcome in production (real caller, not just tests). +- **Contract**: what observable behaviour proves success — what the consumer receives, not how the implementation works internally. +- **Failure boundary**: what *correct failure* looks like if the purpose can't be fulfilled — degrade, surface, do not swallow. +- **Evidence**: the test, metric, or repro that proves the contract. Each criterion must be machine-executable (named test, queryable metric, runnable command) OR explicitly tagged `[MANUAL: reviewer + scenario]`. Prose-only evidence is unfalsifiable and rejected. +- **Non-goals**: what this is *not* solving. +- **Invariants**: what must remain true. If the change touches async, queues, timers, or state machines, split into safety ("X never happens") + liveness ("Y eventually happens"). Pure synchronous code may use safety-only. +- **Assumptions**: conditions about the world that MUST be true for this spec to be valid — locking protocols, API stability, caller invariants, deployment context, data shape. World-side failures (assumption violated) are invisible to internal tests and are the most expensive failure class. -If any answer is missing: `BLOCKED: purpose unclear — [which field is missing]`. Do not invent a plausible purpose to proceed. Surfacing the gap is more valuable than rationalising past it. +If any field is missing: `BLOCKED: purpose unclear — [which field is missing]`. Do not invent a plausible answer to proceed. Surfacing the gap is more valuable than rationalising past it. Treat the contract as a **falsifiable hypothesis**: name the evidence that would prove it wrong before implementation locks in. A contract without a falsifier is half a contract. diff --git a/src/resources/extensions/sf/skills/receiving-code-review/SKILL.md b/src/resources/extensions/sf/skills/receiving-code-review/SKILL.md index ec432e92a..7e0c0f868 100644 --- a/src/resources/extensions/sf/skills/receiving-code-review/SKILL.md +++ b/src/resources/extensions/sf/skills/receiving-code-review/SKILL.md @@ -98,7 +98,7 @@ For non-trivial runtime/provider review items, verification must include: Before changing code based on a suggestion, verify it doesn't violate sf's invariants: - **Iron Law**: Does the suggestion ask you to skip the failing test → fix → green cycle? Push back. The bug is a missing test; write it first. -- **Purpose Gate**: Does the suggestion add an exported symbol without a Purpose / Consumer / Value-at-risk? Push back. +- **Purpose Gate**: Does the suggestion add an exported symbol without a complete PDD packet (Purpose, Consumer, Contract, Failure boundary, Evidence, Non-goals, Invariants, Assumptions — see [`purpose-driven-development`](../purpose-driven-development/SKILL.md))? Push back. A symbol that doesn't earn all 8 fields is unspecified, and "I'll fill them in later" is a scope-creep vector. Evidence must be machine-executable or `[MANUAL: reviewer + scenario]` — prose-only is rejected. - **YAGNI**: Does the suggestion add abstraction with zero current callers? Run `rg` for callers — if zero, push back: *"Nothing calls this in production. Adds dead code. Remove per YAGNI?"* - **Recent decisions**: If the suggestion contradicts a `.sf/DECISIONS.md` entry or a recent ADR (`docs/dev/ADR-*.md`), check the history before implementing or pushing back. - **Self-modification boundary**: If the suggestion edits a protected file (`SPEC.md`, `BUILD_PLAN.md`, `AGENTS.md`, etc.) without explicit human approval, push back — those need human sign-off. diff --git a/src/resources/extensions/sf/skills/requesting-code-review/SKILL.md b/src/resources/extensions/sf/skills/requesting-code-review/SKILL.md index de84b3e49..c6f81e0c0 100644 --- a/src/resources/extensions/sf/skills/requesting-code-review/SKILL.md +++ b/src/resources/extensions/sf/skills/requesting-code-review/SKILL.md @@ -64,10 +64,21 @@ sf_search_memories(query="", limit=5) - Requirement: - Decision: -## Purpose & Consumer +## PDD packet (all 8 fields) + +Restate verbatim from `.sf/active/{unit-id}/pdd.md` — do not paraphrase. Same 8 fields filled in by [`purpose-driven-development`](../purpose-driven-development/SKILL.md). If any field is missing or empty, **stop and return to PDD** — do not request review with an incomplete packet. + - **Purpose**: - **Consumer**: -- **Value at risk**: +- **Contract**: +- **Failure boundary**: +- **Evidence**: see Evidence section below; every criterion must be machine-executable or `[MANUAL: reviewer + scenario]` — prose-only is rejected +- **Non-goals**: +- **Invariants**: +- **Assumptions**: + +## Value at risk + ## Evidence - Typecheck: ✅ `npm run typecheck:extensions` @@ -83,10 +94,10 @@ sf_search_memories(query="", limit=5) ## Falsifier - + ## Scope defence - + ``` For architecture-heavy or boundary-heavy changes, append: @@ -155,5 +166,7 @@ sf_save_memory( - If blast radius is high (>10 transitive callers), flag it explicitly. - For non-trivial runtime/provider changes, include the debug/repro evidence — not just trace summaries. - For architecture-heavy changes, include disagreement evidence — what advocate strengthened, what challenger attacked. +- All 8 PDD packet fields must be present and restated from `.sf/active/{unit-id}/pdd.md`. If any of the 8 is missing, or Evidence contains prose-only criteria, **stop** and return to `purpose-driven-development` — do not request review with an incomplete packet. +- For changes touching async, queues, timers, or state machines, the Invariants field MUST split safety ("X never happens") and liveness ("Y eventually happens"). A single combined invariants list is a v1 artefact and will be rejected. - Mark major claims as `Observed`, `Inferred`, or `Proposed`. - Include the strongest reason the change could still be wrong even if tests pass.