docs(sf): verdict discipline, subagent prompt audit, read-only researcher, debugging rationalizations, trace format

Step 2 + scan-and-improve from the Piebald-AI/claude-code-system-prompts pattern analysis. Five files, prose-only edits, no code changes. - prompts/gate-evaluate.md — Verdict Discipline section: omitted is not a hedge. Each omitted verdict needs a reason; unexplained omitted is treated as failed-to-decide and re-dispatched. - skills/dispatching-subagents — Subagent Prompt Audit: before dispatch, audit for smuggled user-questions, action-class delegation, scope creep, and tool vs prompt mismatch. After return, scan for hedge words, glossed-over tool errors, and self-reports without traces. - skills/researcher — Read-only discipline block: closes the bash redirect / heredoc back-door. Researcher does not write files, DB rows, git, or packages; the report is the only output, and write-requires findings are surfaced for parent dispatch rather than performed in-skill. - skills/systematic-debugging — Recognize Your Own Rationalizations: names the debugging-specific failure modes ("error message obviously says X", "small diff can't be the cause", "test was probably flaky"). Adds Command/ Output trace format requirement to Phase 4 verification. - skills/spec-first-tdd — Adds Command/Output trace format requirement to the Evidence section. PDD packet (inline; prose-only edit, all five additions): - Purpose: harden five SF skills/prompts so loaded text catches rationalizations, closes the read-only back-door, and requires falsifiable verdicts/traces. - Consumer: every gate evaluation, subagent dispatch, research run, debugging session, and TDD slice. - Contract: SKILL/prompt text contains the new sections at predictable anchor points, grep-able by the section headings used. - Evidence: grep-confirmed presence of "Verdict Discipline", "Subagent Prompt Audit", "<read_only_discipline>", "Recognize Your Own Rationalizations", "Trace format" in their respective files; typecheck:extensions exits 0; copy-resources propagated to dist. - Non-goals: no edits to ask-gate.ts, no transport changes (parent-transcript pass-through deferred); no edits to receiving-code-review/requesting-code- review (already strong post-PDD-v2). - Invariants: existing sections preserved; only additions; frontmatter unchanged. - Assumptions: skills loaded from dist via copy-resources; section text is injected verbatim into agent context; SF voice (paraphrased patterns, not copy-pasted from Anthropic's bytes).
2026-05-02 03:15:35 +02:00 · 2026-05-02 03:15:35 +02:00 · b48e6d5dd7
commit b48e6d5dd7
parent ef325d7b49
5 changed files with 95 additions and 5 deletions
--- a/src/resources/extensions/sf/prompts/gate-evaluate.md
+++ b/src/resources/extensions/sf/prompts/gate-evaluate.md
@ -25,7 +25,17 @@ You are evaluating **quality gates in parallel** for this slice. Each gate is an
 3. **Verify each gate wrote its result** by checking that `sf_save_gate_result` was called for each gate ID.
 4. **Report the batch outcome** — which gates passed, which flagged concerns, and which were omitted as not applicable.

-Gate agents may return `verdict: "omitted"` if the gate question is not applicable to this slice (e.g., no auth surface for Q3, no existing requirements touched for Q4). This is expected for simple slices.
+## Verdict Discipline
+
+Gate agents return one of `passed`, `failed`, or `omitted`.
+
+- `passed` — the gate question was answerable for this slice and the answer is "no concern".
+- `failed` — the gate question was answerable for this slice and the answer is "concern that must be addressed before execution".
+- `omitted` — the gate question is *not applicable* to this slice (e.g., no auth surface for an auth gate; no existing requirements touched for a requirements gate).
+
+**`omitted` is not a hedge.** It exists for "the question does not apply here", not for "I am not sure". If the gate ran and the question was answerable, the agent must decide `passed` or `failed`. Each `omitted` verdict must include a one-line reason naming what is absent (e.g., "no network calls in this slice").
+
+When the batch returns, scan for `omitted` verdicts without a reason. Treat any unexplained `omitted` as failed-to-decide and re-dispatch that gate with an explicit instruction to pick `passed` or `failed`.

 ## Subagent Prompts

--- a/src/resources/extensions/sf/skills/dispatching-subagents/SKILL.md
+++ b/src/resources/extensions/sf/skills/dispatching-subagents/SKILL.md
@ -254,6 +254,23 @@ sf_save_memory(
 )
 ```

+## Subagent Prompt Audit
+
+Before dispatching, audit each subagent prompt. The subagent inherits sf's tool surface but **does not** inherit the parent's mode constraints (autonomous vs auto, ask-gate state) — the prompt is the only thing constraining its behaviour.
+
+Check the prompt for each of:
+
+- **Smuggled user-question.** In autonomous mode, the parent is forbidden from `ask_user_questions` (see `bootstrap/ask-gate.ts`). A subagent dispatched with "ask the user about X" sidesteps this. Reject prompts that delegate the user-prompt the parent itself isn't allowed to issue.
+- **Action-class delegation.** A prompt that instructs the subagent to perform a destructive action (force-push, drop schema, delete worktree) inherits that action's risk regardless of which agent runs it. If the action would be blocked for the parent, block it at dispatch — don't route around the gate by spawning a child.
+- **Scope creep.** A research subagent ("scout") with a prompt that asks it to *also* fix what it finds is two roles in one. Split: scout reports, parent decides whether to dispatch a worker. Mixing roles in one prompt produces unauditable side effects.
+- **Tool surface vs prompt mismatch.** A subagent agent definition with `disallowedTools: [Edit, Write]` (read-only role) cannot satisfy a prompt that asks for "fix the bug". Either the agent or the prompt is wrong; reject the dispatch and pick the right pairing.
+
+When the subagent returns, scan the result for these patterns before treating it as evidence:
+
+- **Hedge words** ("should be fine", "probably", "looks correct based on my reading"). The subagent rationalised instead of verifying. Re-dispatch with explicit verification requirements.
+- **Tool-result errors glossed over.** Search the subagent's reported actions for `Error:`, `failed`, non-zero exit codes that the subagent narrated past. If the subagent treated a tool failure as success, its conclusion is unreliable.
+- **Self-reports without traces.** "I ran the tests and they pass" without a Command/Output block is the agentic equivalent of a hand-wave. Trust traces, not summaries.
+
 ## Hard Rules

 - **Stay inside sf.** Do not shell out to `claude`, `codex`, or any other external coding-agent CLI. sf's own subagent fabric handles all of this with proper trace integration.
--- a/src/resources/extensions/sf/skills/researcher/SKILL.md
+++ b/src/resources/extensions/sf/skills/researcher/SKILL.md
@ -13,6 +13,25 @@ Research a topic using four complementary information sources, in priority order
 This skill is the first step before planning — it produces the evidence base that drives good decisions. Without research, agents plan from assumptions; with this skill, they plan from evidence.
 </objective>

+<read_only_discipline>
+
+This is a **read-only** skill. Research produces a report; it does not modify the repo, the database, or external state.
+
+Strictly prohibited while running this skill:
+
+- File creation, modification, or deletion (no `Write`, `Edit`, `NotebookEdit`, `touch`, `rm`, `mv`, `cp`).
+- Bash redirects or heredocs that write files: `>`, `>>`, `tee`, `cat <<EOF > file`, `python -c "open(...).write(...)"`. The shell back-door does not bypass the read-only contract.
+- DB writes: `sqlite3 .sf/sf.db "INSERT|UPDATE|DELETE|DROP|CREATE ..."`. Use `SELECT` only.
+- Git write operations: `add`, `commit`, `push`, `merge`, `rebase`, `checkout -b`, `branch -d`.
+- Package installs: `npm install`, `pip install`, `cargo add`.
+- Spawning subagents that perform any of the above on the researcher's behalf — the prohibition applies to delegated actions as well.
+
+The single allowed write is the research report itself, written via the parent agent's tools after this skill returns its findings — not by the researcher mid-skill. Communicate findings as a structured report in the response, not by writing intermediate scratch files.
+
+If a research question genuinely requires a write to answer (e.g., "does X actually work?" requires running a script), surface that as a finding labelled "requires execution" so the parent can dispatch a `worker` subagent — do not perform the write yourself.
+
+</read_only_discipline>
+
 <quick_start>

 **Serena MCP (code intelligence — USE FIRST for code exploration):**
--- a/src/resources/extensions/sf/skills/spec-first-tdd/SKILL.md
+++ b/src/resources/extensions/sf/skills/spec-first-tdd/SKILL.md
@ -39,14 +39,22 @@ Side-chain (any bug/failure during TDD): invoke `systematic-debugging`. When the

 ## Workflow

-### 1. Purpose, Consumer, Value
+### 1. PDD packet (all 8 fields)

-Before any test, answer:
+Before any test, fill the PDD packet — same 8 fields as [`purpose-driven-development`](../purpose-driven-development/SKILL.md). Save to `.sf/active/{unit-id}/pdd.md`. The reviewer (`requesting-code-review`) reads from this same packet, so what you write here is what they review.

 - **Purpose:** why this behaviour exists.
 - **Consumer:** which production code path depends on it (`rg -n "fnName" src/ packages/`). If you can't name a real caller, stop.
- **Value at risk:** what breaks (and who notices) if it returns the wrong answer.
- **Out of scope:** what this slice explicitly does *not* cover.
+- **Contract:** what observable behaviour proves success — what the consumer receives, not the internal mechanics.
+- **Failure boundary:** what *correct failure* looks like if the purpose can't be fulfilled — degrade, surface, do not swallow.
+- **Evidence:** the test or metric that proves the contract. Must be machine-executable (named test file, queryable metric, runnable command) or `[MANUAL: reviewer + scenario]`. Prose-only is rejected.
+- **Non-goals:** what this slice explicitly does *not* cover.
+- **Invariants:** what must remain true. If the change touches async / queues / timers / state machines: split safety ("X never happens") + liveness ("Y eventually happens"). Pure synchronous code may use safety-only.
+- **Assumptions:** what conditions about the world MUST be true for this spec to be valid — locking protocols, API stability, caller invariants, deployment context, data shape.
+
+Also useful to flag for downstream review (not part of the PDD packet, but used by reviewers to calibrate scrutiny):
+
+- **Value at risk:** what breaks (and who notices) if the contract returns the wrong answer.

 Label evidence:

@ -187,3 +195,14 @@ Never fix code without a failing test first. The test is the proof that the bug
 - `npm run typecheck:extensions` clean.
 - For non-trivial runtime/provider fixes: explicit repro before code, solved boundary after.
 - Updated active-slice artifact when one exists in `.sf/active/{unit-id}/`.
+
+Each evidence claim must include a Command/Output trace:
+
+```
+### Check: <what was verified>
+**Command run:** <exact command>
+**Output observed:** <stdout/stderr copy-paste — truncate if long, keep the relevant part>
+**Result: PASS** (or FAIL — with Expected vs Actual)
+```
+
+Reading test source and asserting "this should pass" is not evidence. Run it; paste the output; then claim PASS.
--- a/src/resources/extensions/sf/skills/systematic-debugging/SKILL.md
+++ b/src/resources/extensions/sf/skills/systematic-debugging/SKILL.md
@ -14,6 +14,18 @@ NO NON-TRIVIAL FIX WITHOUT AN EXPLICIT REPRO.

 If you haven't completed Phase 1, you cannot propose fixes.

+## Recognize Your Own Rationalizations
+
+Debugging is where the LLM's verification failure modes hit hardest. You will feel the urge to skip evidence collection and jump to a fix. These are the exact excuses you reach for — recognize and invert each:
+
+- "I bet I know what this is — let me just try the fix." → No. The fix you have in mind is a hypothesis. Phase 1 (Evidence) and Phase 2 (Root Cause) are not optional.
+- "The error message obviously says it's X." → Error messages name the symptom. They rarely name the root cause. Read the stack trace; find the line; verify the call site.
+- "The test was probably flaky — let me re-run it." → Re-running a test that failed is not a fix; it is hoping the bug went away. If a re-run greens, the failure was not flaky — it was timing-dependent or order-dependent, which is a real bug.
+- "The diff is small, so the change can't be the cause." → Small diffs ship the most subtle regressions. Confirm via `git log` / `git bisect`, not by ruling out by size.
+- "It works on my machine." → "Your machine" is one execution. If the failure is reproducible, reproduce it; if not, capture the environment difference *as a finding*, not as a dismissal.
+
+If you catch yourself proposing a fix before Phase 1 evidence is in writing — stop. Read the logs, the trace, the event log. Then decide.
+
 ## When to Run

 - A test fails, RED happens for the wrong reason, GREEN regresses, build fails, lint fails.
@ -109,6 +121,19 @@ Use `spec-first-tdd` for the contract test if the bug crosses a non-trivial boun
 4. For non-trivial fixes: re-run the same repro/debug path; capture the **solved** boundary output.
 5. For runtime fixes: tail the event log and confirm the error class disappears.

+### Trace format for verification
+
+Every claimed PASS in the slice artefact must be backed by a trace — paraphrase is not evidence:
+
+```
+### Check: <what was verified — repro path, regression test, etc.>
+**Command run:** <exact command>
+**Output observed:** <copy-paste of stdout/stderr — truncate if long, keep the relevant part>
+**Result: PASS** (or FAIL — with Expected vs Actual)
+```
+
+A check without a `Command run` block is a skip. "I re-ran the repro and it works now" without the command and the output is a self-report, not verification.
+
 Persist the pattern to memory so future units don't re-hit it:

 ```