chore(sf): autonomous sweep — judgment-log/knowledge-compounding/tacit-knowledge tests + PDD v2 research record
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent 7f41e61381
commit 6ee31e83f4
9 changed files with 811 additions and 8 deletions
84
docs/records/2026-05-02-pdd-v2-research.md
Normal file
@ -0,0 +1,84 @@
---
actionable: true
kind: design-research
date: 2026-05-02
---

# PDD v2 — Research Findings

Three independent lines of research identified gaps in PDD v1 and drove the v1 → v2 upgrade.

---

## Gap 1 — No explicit domain assumptions

**Source:** Jackson, M. & Zave, P. "Four Dark Corners of Requirements Engineering." *ACM Transactions on Software Engineering and Methodology* 6(1), 1997.
https://dl.acm.org/doi/10.1145/237432.237434

**Finding.** The world-machine framework formalises that a specification S is valid only relative to domain assumptions A. The machine (software) operates in a world it cannot fully control. Failure Boundary captures *machine-side* failures; it does not capture *world-side* failures, where an assumption is violated and the spec itself becomes inapplicable even when the code is perfectly correct.

**PDD gap.** PDD v1 had no Assumptions field. World-model violations (an unexpected locking protocol, a changed API contract, an invalid caller invariant, a deployment-context mismatch) fell through the Failure Boundary without a name. Code proven correct against a wrong model is the most expensive failure class, because internal tests cannot detect it.

**Resolution.** Added Assumptions as the 8th PDD field: "What conditions about the world MUST be true for this spec to be valid?"
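
A hypothetical Assumptions block (contents are illustrative, not taken from a real unit) might look like:

```
Assumptions:
- The target table is written only through this service (no out-of-band writers).
- The upstream API continues to return `createdAt` as an ISO 8601 string.
- The caller holds the scaffold file lock before invoking the stamp operation.
```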

---

## Gap 2 — Invariants not distinguished by verification class

**Sources:**

- Alpern, B. & Schneider, F.B. "Defining Liveness." *Information Processing Letters* 21(4), 1985. https://doi.org/10.1016/0020-0190(85)90056-0
- Alpern, B. & Schneider, F.B. "Recognizing Safety and Liveness." *Distributed Computing* 2(3), 1987. https://doi.org/10.1007/BF01782772

**Finding.** Every temporal property is the intersection of a safety property and a liveness property. Safety: "X never happens" — verifiable by a point-in-time assertion, like a DbC precondition. Liveness: "Y eventually happens" — requires temporal reasoning (traces, reachability). The two require different verification techniques; treating them as one category produces proof gaps.

**PDD gap.** PDD v1 listed a single "Invariants" field without distinguishing the two classes. For pure synchronous code this is harmless. For anything touching async operations, queues, timers, retries, or state machines, the absence of liveness invariants means eventual-progress guarantees go unstated and unverified.

**Resolution.** Invariants now carry a note: split into Safety ("X never happens") and Liveness ("Y eventually happens") when the change touches async, queues, timers, or state machines. Pure synchronous code may use safety-only invariants.
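
The distinction can be sketched in TypeScript (hypothetical helper names, not part of this codebase): a safety invariant is checkable with a single assertion, while a liveness invariant needs a bounded observation over time.

```typescript
// Safety invariant: "queue depth never exceeds the cap". One point-in-time
// check suffices; a single violating snapshot is a definitive counterexample.
function assertSafety(queueDepth: number, cap = 1000): void {
  if (queueDepth > cap) {
    throw new Error(`safety violated: depth ${queueDepth} > ${cap}`);
  }
}

// Liveness invariant: "the condition eventually becomes true". No single
// snapshot can prove it; we must poll a trace of states up to a deadline.
async function assertEventually(
  check: () => boolean,
  timeoutMs: number,
  intervalMs = 25,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (check()) return; // progress observed: liveness holds on this trace
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`liveness violated: condition not met within ${timeoutMs}ms`);
}
```

Note the asymmetry: a failed `assertSafety` is a definitive counterexample, while a failed `assertEventually` only shows that progress was not observed within the deadline on this run. That asymmetry is why the two classes need different verification techniques.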

---

## Gap 3 — Prose evidence enables agent self-deception

**Sources:**

- SWE-bench Pro. arXiv:2509.16941, 2025. https://arxiv.org/abs/2509.16941
- Hong, S. et al. "MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework." *ICLR 2024*. https://arxiv.org/abs/2308.00352
- Agentic Rubrics. Scale AI, 2024. (internal benchmark, cited in SWE-bench Pro §4)

**Finding.** Empirical benchmarking across agentic coding systems found two distinct failure modes correlated with evidence specification:

- *Prose evidence* ("the feature should work correctly") enabled self-deception: agents accepted their own implementation as correct without an external falsifiability check. SWE-bench Pro identified this as the primary cause of agent task failure, outweighing model capability differences.
- *Machine-executable evidence* (a named test file, a queryable metric, a runnable command with an expected exit code) enabled self-correction loops: agents could run the check, observe failure, and revise.

MetaGPT's specification protocol and Scale AI's Agentic Rubrics both independently required verifiable acceptance criteria as a structural element of agent task definitions.

**PDD gap.** The PDD v1 Evidence field accepted prose descriptions, with no distinction between falsifiable and unfalsifiable criteria.

**Resolution.** The Evidence field now requires each criterion to be either machine-executable (a named test, metric, or command) or explicitly tagged `[MANUAL: reviewer + scenario]` when automation is not feasible. Prose-only evidence is forbidden.

**Example of compliant Evidence:**

```
Evidence:
- tests/scaffold-versioning.test.ts → "stampScaffoldFile preserves body" passes
- typecheck:extensions exits 0 after change
- [MANUAL: mhugo, M001 milestone close — visually inspect .sf/CODEBASE.md regenerated correctly]
```
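
The compliant/forbidden split is mechanical enough to lint. A minimal sketch of such a classifier (heuristics and names are hypothetical, not the actual record-promoter tooling):

```typescript
type EvidenceKind = "machine" | "manual" | "forbidden-prose";

// An explicit manual tag: [MANUAL: reviewer + scenario]
const MANUAL_TAG = /^\[MANUAL:[^\]]+\]/;

// Heuristic hints that a criterion is machine-executable: a file name with an
// extension, an exit-code expectation, or a known runnable tool invocation.
const MACHINE_HINTS = [/\.\w+\b/, /exits?\s+\d+/, /\b(npm|npx|uv|pytest|node)\b/];

function classifyCriterion(criterion: string): EvidenceKind {
  const c = criterion.trim().replace(/^-\s*/, "");
  if (MANUAL_TAG.test(c)) return "manual";
  if (MACHINE_HINTS.some((re) => re.test(c))) return "machine";
  return "forbidden-prose"; // prose-only: unfalsifiable, rejected under PDD v2
}
```

Under these heuristics, "typecheck:extensions exits 0 after change" classifies as machine-executable, while "the feature should work correctly" classifies as forbidden prose.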

---

## Methodologies that disagree

For completeness, three widely used approaches take a different position:

- **XP (Extreme Programming, Beck 1999):** Evidence is "all tests pass" — no per-change evidence decomposition. This works at the test-suite level but loses granularity for agent-executed changes, where individual acceptance criteria must be self-checkable.
- **Cynefin / Lean Startup (Snowden 2002, Ries 2011):** In complex/chaotic domains, upfront assumption enumeration impedes learning; probe-sense-respond is preferred. This applies to product discovery, not to implementation-level specifications, where world-model violations must be detected before code ships.
- **BDD (Cucumber / Gherkin):** Treats evidence as prose-first (Given/When/Then in natural language), with automation as an optional layer. The PDD v2 requirement inverts this: automation is the default, and prose is the named exception.

---

## Integration work

This record carries `actionable: true`. The record-promoter will create a milestone to:

1. Audit existing PDD packets in `.sf/active/` and `.sf/completed-units/` — identify any missing Assumptions fields or prose-only Evidence fields.
2. Update the `requesting-code-review` skill to validate all 8 fields.
3. Consider linting the PDD output template for machine-executable Evidence markers.

@ -64,6 +64,11 @@ export function registerJudgmentTools(pi: ExtensionAPI): void {
           text: `Judgment logged for unit ${params.unitId}: "${params.decision}" (confidence: ${params.confidence})`,
         },
       ],
+      details: {
+        operation: "judgment_log",
+        unitId: params.unitId,
+        confidence: params.confidence,
+      } as any,
     };
   },
 });
@ -104,6 +104,7 @@ export function registerSfExtension(pi: ExtensionAPI): void {
     ["memory-tools", () => registerMemoryTools(pi)],
     ["product-audit-tool", () => registerProductAuditTool(pi)],
     ["journal-tools", () => registerJournalTools(pi)],
+    ["judgment-tools", () => registerJudgmentTools(pi)],
     ["query-tools", () => registerQueryTools(pi)],
     ["shortcuts", () => registerShortcuts(pi)],
     ["hooks", () => registerHooks(pi, ecosystemHandlers)],
@ -358,7 +358,13 @@ export async function buildBeforeAgentStartResult(
   // stronger language that forbids ask_user_questions entirely.
   const escalationPolicyBlock = buildEscalationPolicyBlock(isCanAskUser());
 
-  const fullSystem = `${event.systemPrompt}\n\n[SYSTEM CONTEXT — SF]\n\n${escalationPolicyBlock}${systemContent}${preferenceBlock}${knowledgeBlock}${architectureBlock}${tacitKnowledgeBlock}${codebaseBlock}${codeIntelligenceBlock}${memoryBlock}${newSkillsBlock}${worktreeBlock}${repositoryVcsBlock}${modelIdentityBlock}${subagentModelBlock}`;
+  // Judgment-log instruction for autonomous mode: agent is prompted to call
+  // sf_log_judgment when making non-trivial calls between alternatives.
+  const judgmentLogBlock = !isCanAskUser()
+    ? `\n\n[JUDGMENT LOG — autonomous mode]\nWhen you make a judgment call between alternatives at an ambiguous point, call sf_log_judgment with: decision, alternatives, reasoning, confidence. This lets the user review your reasoning at milestone close. It does NOT delay or block the work.`
+    : "";
+
+  const fullSystem = `${event.systemPrompt}\n\n[SYSTEM CONTEXT — SF]\n\n${escalationPolicyBlock}${systemContent}${preferenceBlock}${knowledgeBlock}${architectureBlock}${tacitKnowledgeBlock}${codebaseBlock}${codeIntelligenceBlock}${memoryBlock}${newSkillsBlock}${worktreeBlock}${repositoryVcsBlock}${modelIdentityBlock}${subagentModelBlock}${judgmentLogBlock}`;
 
   stopContextTimer({
     systemPromptSize: fullSystem.length,
@ -19,9 +19,10 @@ Before coding, name each of the following explicitly. If any one is missing, the
 | **Consumer** | Who depends on that outcome? (production caller, not a test) |
 | **Contract** | What observable behaviour proves success? |
 | **Failure boundary** | What does *correct failure* look like if the purpose can't be fulfilled? |
-| **Evidence** | What test, repro path, or smoke check proves the contract? |
+| **Evidence** | What test, metric, or repro proves the contract? Each criterion must be machine-executable (a named test file, a queryable metric, a runnable command) OR explicitly tagged `[MANUAL: reviewer + scenario]` when no automated check is feasible. Prose-only evidence is unfalsifiable and forbidden. |
 | **Non-goals** | What is this change *not* solving? |
-| **Invariants** | What must remain true while making the change? |
+| **Invariants** | What must remain true while making the change? Split into safety ("X never happens") + liveness ("Y eventually happens") when the change touches async, queues, timers, or state machines. Pure synchronous code can use safety-only invariants. |
+| **Assumptions** | What conditions about the world MUST be true for this spec to even be valid? (locking protocols, API stability, caller invariants, deployment context, data shape) |
 
 These are the same fields the **Purpose Gate** in [`docs/SPEC_FIRST_TDD.md`](../../../../../../docs/SPEC_FIRST_TDD.md) requires of every artefact. PDD is how you fill them in before you start.
 
@ -31,11 +32,12 @@ These are the same fields the **Purpose Gate** in [`docs/SPEC_FIRST_TDD.md`](../
 2. Name the **consumer** precisely — file, function, route, command. If you can't, stop.
 3. Define the observable **contract** — what the consumer receives, not what the implementation does internally.
 4. Define the **failure boundary** — degradation is not crash; surface, don't swallow.
-5. State **non-goals** and **invariants**.
-6. Choose the **evidence** before changing code — what test or repro will prove the contract is met.
-7. Write or identify the failing behaviour test or repro for non-trivial work (hand off to `spec-first-tdd`).
-8. Implement the *minimum* change that satisfies the contract.
-9. Verify using the evidence chosen in step 6 — not a different one chosen post-hoc.
+5. State **non-goals** and **invariants** (split safety/liveness when the change touches async).
+6. State **assumptions** — what must be true about the world for this spec to apply at all.
+7. Choose the **evidence** before changing code — each criterion must be machine-executable or tagged `[MANUAL: reviewer + scenario]`.
+8. Write or identify the failing behaviour test or repro for non-trivial work (hand off to `spec-first-tdd`).
+9. Implement the *minimum* change that satisfies the contract.
+10. Verify using the evidence chosen in step 7 — not a different one chosen post-hoc.
 
 ## Rules
 
@ -100,6 +102,17 @@ Persist the PDD packet to the unit's artefacts so the next phase (TDD) and the r
 - **Evidence**:
 - **Non-goals**:
 - **Invariants**:
+- **Assumptions**:
 ```
 
 Save to `.sf/active/{unit-id}/pdd.md` (or inline at the top of the slice plan). When the slice completes, this packet feeds the `Evidence` block of `requesting-code-review`.
+
+## What's new in v2
+
+Three research findings drove the v1 → v2 update:
+
+**Assumptions field (Jackson & Zave, 1997).** "Four Dark Corners of Requirements Engineering" (ACM TOSEM) formalised the world-machine framework: a specification S is only valid relative to domain assumptions A. The Failure Boundary field catches machine-side failures; the new Assumptions field catches world-side failures where the spec itself becomes invalid even when the code is correct. Violated world assumptions are the most expensive failure class because they are invisible to internal tests.
+
+**Safety/liveness split for Invariants (Alpern & Schneider, 1985/1987).** "Defining Liveness" and "Recognizing Safety and Liveness" showed that every property is the intersection of a safety property ("X never happens") and a liveness property ("Y eventually happens"), and that the two require different verification techniques. Conflating them produces proof gaps in code that touches async, queues, timers, or state machines. Pure synchronous code can continue using safety-only invariants.
+
+**Machine-executable Evidence requirement (SWE-bench Pro arXiv:2509.16941, 2025; MetaGPT ICLR 2024; Agentic Rubrics Scale AI 2024).** Empirical benchmarking across agentic coding systems found that prose-only evidence enables self-deception ("I think this is right") while machine-executable evidence enables self-correction loops. SWE-bench Pro identified under-specified evidence as the primary cause of agent failure, not model capability. Evidence criteria are now required to be either machine-executable (named test, queryable metric, runnable command) or explicitly tagged `[MANUAL: reviewer + scenario]`.
145
src/resources/extensions/sf/tests/judgment-log.test.ts
Normal file
@ -0,0 +1,145 @@
/**
 * Tests for Mechanism 3 — judgment log.
 *
 * Covers:
 * - appendJudgment creates the JSONL file with correct fields
 * - readJudgmentLog returns empty array when file doesn't exist
 * - readJudgmentLog returns all entries when no unitId filter
 * - readJudgmentLog filters by unitId prefix
 * - appendJudgment is non-fatal on bad path (does not throw)
 * - ts field is a valid ISO 8601 timestamp
 */

import assert from "node:assert/strict";
import {
  mkdirSync,
  mkdtempSync,
  readFileSync,
  rmSync,
} from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { test } from "node:test";

import { appendJudgment, readJudgmentLog, resolveJudgmentLogPath } from "../judgment-log.ts";

function makeTmpProject(): string {
  const dir = mkdtempSync(join(tmpdir(), "sf-judgment-"));
  mkdirSync(join(dir, ".sf"), { recursive: true });
  return dir;
}

test("judgment-log: appendJudgment creates JSONL file", () => {
  const dir = makeTmpProject();
  try {
    appendJudgment(dir, {
      unitId: "M001/S01/T01",
      decision: "Used uv run for verification",
      alternatives: ["python -m pytest directly", "skip verification"],
      reasoning: "Project has uv.lock; uv-managed env is canonical",
      confidence: "high",
    });
    const logPath = resolveJudgmentLogPath(dir);
    const content = readFileSync(logPath, "utf-8");
    const parsed = JSON.parse(content.trim().split("\n")[0]);
    assert.strictEqual(parsed.unitId, "M001/S01/T01");
    assert.strictEqual(parsed.decision, "Used uv run for verification");
    assert.strictEqual(parsed.confidence, "high");
    assert.ok(Array.isArray(parsed.alternatives));
    assert.ok(parsed.ts, "should have ts field");
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
});

test("judgment-log: ts field is a valid ISO 8601 timestamp", () => {
  const dir = makeTmpProject();
  try {
    appendJudgment(dir, {
      unitId: "M001/S01/T01",
      decision: "choice",
      alternatives: [],
      reasoning: "reason",
      confidence: "medium",
    });
    const entries = readJudgmentLog(dir);
    assert.ok(entries.length > 0);
    const ts = entries[0].ts;
    const d = new Date(ts);
    assert.ok(!Number.isNaN(d.getTime()), `ts should be valid date, got: ${ts}`);
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
});

test("judgment-log: readJudgmentLog returns empty array when file does not exist", () => {
  const dir = makeTmpProject();
  try {
    const entries = readJudgmentLog(dir);
    assert.deepEqual(entries, []);
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
});

test("judgment-log: readJudgmentLog returns all entries without filter", () => {
  const dir = makeTmpProject();
  try {
    appendJudgment(dir, {
      unitId: "M001/S01/T01",
      decision: "d1",
      alternatives: [],
      reasoning: "r1",
      confidence: "low",
    });
    appendJudgment(dir, {
      unitId: "M002/S01/T01",
      decision: "d2",
      alternatives: [],
      reasoning: "r2",
      confidence: "high",
    });
    const entries = readJudgmentLog(dir);
    assert.strictEqual(entries.length, 2);
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
});

test("judgment-log: readJudgmentLog filters by unitId prefix", () => {
  const dir = makeTmpProject();
  try {
    appendJudgment(dir, {
      unitId: "M001/S01/T01",
      decision: "d1",
      alternatives: [],
      reasoning: "r1",
      confidence: "low",
    });
    appendJudgment(dir, {
      unitId: "M002/S01/T01",
      decision: "d2",
      alternatives: [],
      reasoning: "r2",
      confidence: "high",
    });
    const entries = readJudgmentLog(dir, "M001");
    assert.strictEqual(entries.length, 1);
    assert.strictEqual(entries[0].unitId, "M001/S01/T01");
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
});

test("judgment-log: appendJudgment is non-fatal when base directory does not exist", () => {
  // Provide a non-existent directory — should not throw
  assert.doesNotThrow(() => {
    appendJudgment("/tmp/nonexistent-sf-test-dir-12345/project", {
      unitId: "M001/S01/T01",
      decision: "test",
      alternatives: [],
      reasoning: "test",
      confidence: "low",
    });
  });
});
190
src/resources/extensions/sf/tests/knowledge-compounding.test.ts
Normal file
@ -0,0 +1,190 @@
/**
 * Tests for Mechanism 4 — KNOWLEDGE.md compounding from judgment log.
 *
 * Covers:
 * - compoundLearningsIntoKnowledge returns {added:0, skipped:0} with no high-confidence entries
 * - high-confidence entries are appended under a section heading
 * - low/medium confidence entries are not included
 * - duplicate entries are skipped (deduplication)
 * - creates KNOWLEDGE.md from scratch if missing
 * - appends to existing KNOWLEDGE.md
 */

import assert from "node:assert/strict";
import {
  existsSync,
  mkdirSync,
  mkdtempSync,
  readFileSync,
  rmSync,
  writeFileSync,
} from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { test } from "node:test";

import { appendJudgment } from "../judgment-log.ts";
import { compoundLearningsIntoKnowledge } from "../knowledge-compounding.ts";

function makeTmpProject(): string {
  const dir = mkdtempSync(join(tmpdir(), "sf-compound-"));
  mkdirSync(join(dir, ".sf"), { recursive: true });
  return dir;
}

test("knowledge-compounding: returns {added:0, skipped:0} when no entries exist", () => {
  const dir = makeTmpProject();
  try {
    const result = compoundLearningsIntoKnowledge(dir, "M001");
    assert.deepEqual(result, { added: 0, skipped: 0 });
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
});

test("knowledge-compounding: returns {added:0, skipped:0} when only low/medium confidence entries", () => {
  const dir = makeTmpProject();
  try {
    appendJudgment(dir, {
      unitId: "M001/S01/T01",
      decision: "low decision",
      alternatives: [],
      reasoning: "low reason",
      confidence: "low",
    });
    appendJudgment(dir, {
      unitId: "M001/S01/T02",
      decision: "medium decision",
      alternatives: [],
      reasoning: "medium reason",
      confidence: "medium",
    });
    const result = compoundLearningsIntoKnowledge(dir, "M001");
    assert.deepEqual(result, { added: 0, skipped: 0 });
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
});

test("knowledge-compounding: appends high-confidence entries to KNOWLEDGE.md", () => {
  const dir = makeTmpProject();
  try {
    appendJudgment(dir, {
      unitId: "M001/S01/T01",
      decision: "Used uv run",
      alternatives: ["python directly"],
      reasoning: "uv.lock present",
      confidence: "high",
    });
    const result = compoundLearningsIntoKnowledge(dir, "M001");
    assert.strictEqual(result.added, 1);
    assert.strictEqual(result.skipped, 0);

    const knowledgePath = join(dir, ".sf", "KNOWLEDGE.md");
    assert.ok(existsSync(knowledgePath), "KNOWLEDGE.md should be created");
    const content = readFileSync(knowledgePath, "utf-8");
    assert.ok(content.includes("## Learned during M001"));
    assert.ok(content.includes("Used uv run"));
    assert.ok(content.includes("uv.lock present"));
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
});

test("knowledge-compounding: creates KNOWLEDGE.md from scratch when missing", () => {
  const dir = makeTmpProject();
  try {
    appendJudgment(dir, {
      unitId: "M001/S01/T01",
      decision: "new decision",
      alternatives: [],
      reasoning: "new reasoning",
      confidence: "high",
    });
    compoundLearningsIntoKnowledge(dir, "M001");
    const knowledgePath = join(dir, ".sf", "KNOWLEDGE.md");
    assert.ok(existsSync(knowledgePath));
    const content = readFileSync(knowledgePath, "utf-8");
    assert.ok(content.includes("# Project Knowledge"));
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
});

test("knowledge-compounding: appends to existing KNOWLEDGE.md", () => {
  const dir = makeTmpProject();
  try {
    const knowledgePath = join(dir, ".sf", "KNOWLEDGE.md");
    writeFileSync(
      knowledgePath,
      "# Project Knowledge\n\n## Existing section\n\n- K001: Old rule.\n",
      "utf-8",
    );
    appendJudgment(dir, {
      unitId: "M001/S01/T01",
      decision: "new high decision",
      alternatives: [],
      reasoning: "new reasoning",
      confidence: "high",
    });
    const result = compoundLearningsIntoKnowledge(dir, "M001");
    assert.strictEqual(result.added, 1);

    const content = readFileSync(knowledgePath, "utf-8");
    assert.ok(content.includes("K001: Old rule"), "existing content should be preserved");
    assert.ok(content.includes("## Learned during M001"));
    assert.ok(content.includes("new high decision"));
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
});

test("knowledge-compounding: deduplicates entries on second run", () => {
  const dir = makeTmpProject();
  try {
    appendJudgment(dir, {
      unitId: "M001/S01/T01",
      decision: "dedup decision",
      alternatives: [],
      reasoning: "dedup reasoning",
      confidence: "high",
    });
    const r1 = compoundLearningsIntoKnowledge(dir, "M001");
    assert.strictEqual(r1.added, 1);

    // Run again — same entry should be skipped
    const r2 = compoundLearningsIntoKnowledge(dir, "M001");
    assert.strictEqual(r2.added, 0);
    assert.strictEqual(r2.skipped, 1);
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
});

test("knowledge-compounding: only compounds entries for the specified milestoneId", () => {
  const dir = makeTmpProject();
  try {
    appendJudgment(dir, {
      unitId: "M001/S01/T01",
      decision: "M001 decision",
      alternatives: [],
      reasoning: "M001 reasoning",
      confidence: "high",
    });
    appendJudgment(dir, {
      unitId: "M002/S01/T01",
      decision: "M002 decision",
      alternatives: [],
      reasoning: "M002 reasoning",
      confidence: "high",
    });
    const result = compoundLearningsIntoKnowledge(dir, "M001");
    assert.strictEqual(result.added, 1);

    const content = readFileSync(join(dir, ".sf", "KNOWLEDGE.md"), "utf-8");
    assert.ok(content.includes("M001 decision"), "should include M001 entry");
    assert.ok(!content.includes("M002 decision"), "should not include M002 entry");
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
});
@ -0,0 +1,167 @@
/**
 * Tests for Mechanism 2 — milestone framing check.
 *
 * Covers:
 * - Returns empty array when milestone context file is missing
 * - Anti-goal keyword match produces a warning finding
 * - No anti-goal match when keywords are absent
 * - PROJECT.md vision mismatch produces an info finding
 * - formatFramingFindings returns empty string for empty findings
 * - formatFramingFindings returns formatted block with severity labels
 */

import assert from "node:assert/strict";
import {
  mkdirSync,
  mkdtempSync,
  rmSync,
  writeFileSync,
} from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { test } from "node:test";

import {
  checkMilestoneFraming,
  formatFramingFindings,
} from "../milestone-framing-check.ts";

function makeProject(antiGoals?: string, projectMd?: string, contextMd?: string): string {
  const dir = mkdtempSync(join(tmpdir(), "sf-framing-"));
  mkdirSync(join(dir, ".sf", "milestones", "M001"), { recursive: true });
  if (antiGoals !== undefined) {
    writeFileSync(join(dir, ".sf", "ANTI-GOALS.md"), antiGoals, "utf-8");
  }
  if (projectMd !== undefined) {
    writeFileSync(join(dir, "PROJECT.md"), projectMd, "utf-8");
  }
  if (contextMd !== undefined) {
    writeFileSync(
      join(dir, ".sf", "milestones", "M001", "M001-CONTEXT.md"),
      contextMd,
      "utf-8",
    );
  }
  return dir;
}

test("framing-check: returns empty array when milestone context file is missing", () => {
  const dir = makeProject("- no federation", undefined, undefined);
  try {
    const findings = checkMilestoneFraming(dir, "M001");
    assert.deepEqual(findings, []);
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
});

test("framing-check: returns empty array when no anti-goals file exists", () => {
  const dir = makeProject(
    undefined,
    undefined,
    "# Context\n\nBuild a central federation system.\n",
  );
  try {
    const findings = checkMilestoneFraming(dir, "M001");
    // No anti-goals file → no anti-goal warnings
    const antiGoalFindings = findings.filter((f) => f.source === "anti_goal");
    assert.deepEqual(antiGoalFindings, []);
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
});

test("framing-check: anti-goal keyword match produces a warning finding", () => {
  const antiGoals =
    "# Anti-goals\n\n- No central federation of repositories across projects.\n";
  const context = "# Build federated central repository\n\nThis milestone implements federation.\n";
  const dir = makeProject(antiGoals, undefined, context);
  try {
    const findings = checkMilestoneFraming(dir, "M001");
    const warnings = findings.filter(
      (f) => f.source === "anti_goal" && f.severity === "warning",
    );
    assert.ok(warnings.length > 0, "should have at least one anti-goal warning");
    assert.ok(
      warnings[0].concern.includes("ANTI-GOALS.md"),
      "concern should reference ANTI-GOALS.md",
    );
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
});

test("framing-check: no anti-goal warning when context has no matching keywords", () => {
  const antiGoals =
    "# Anti-goals\n\n- No central federation of repositories across projects.\n";
  const context = "# Add login screen\n\nBuild the user authentication UI.\n";
  const dir = makeProject(antiGoals, undefined, context);
  try {
    const findings = checkMilestoneFraming(dir, "M001");
    const antiGoalWarnings = findings.filter((f) => f.source === "anti_goal");
    assert.deepEqual(antiGoalWarnings, []);
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
});

test("framing-check: PROJECT.md vision mismatch produces an info finding", () => {
  const projectMd =
    "# Project\n\nBuild a fast payment processing pipeline for merchants.\n";
  // Context completely unrelated to payment/processing/pipeline/merchants
  const context = "# Refactor CSS classes\n\nUpdate button colors and fonts.\n";
  const dir = makeProject(undefined, projectMd, context);
  try {
    const findings = checkMilestoneFraming(dir, "M001");
    const infoFindings = findings.filter(
      (f) => f.source === "project_vision" && f.severity === "info",
    );
    assert.ok(
      infoFindings.length > 0,
      "should have a project-vision info finding when context is unrelated",
    );
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
});

test("framing-check: no project-vision finding when PROJECT.md is absent", () => {
  const context = "# Some milestone\n\nDoes something.\n";
  const dir = makeProject(undefined, undefined, context);
  try {
    const findings = checkMilestoneFraming(dir, "M001");
    const visionFindings = findings.filter((f) => f.source === "project_vision");
    assert.deepEqual(visionFindings, []);
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
});

// ─── formatFramingFindings ────────────────────────────────────────────────────

test("formatFramingFindings: returns empty string for empty findings array", () => {
  const result = formatFramingFindings("M001", []);
  assert.strictEqual(result, "");
});

test("formatFramingFindings: returns block with milestone ID and severity labels", () => {
  const findings = [
    {
      concern: "Uses federation",
      source: "anti_goal" as const,
      severity: "warning" as const,
    },
    {
      concern: "No project link",
      source: "project_vision" as const,
      severity: "info" as const,
    },
  ];
  const result = formatFramingFindings("M001", findings);
  assert.ok(result.includes("MILESTONE FRAMING CHECK"));
  assert.ok(result.includes("M001"));
  assert.ok(result.includes("WARNING"));
  assert.ok(result.includes("INFO"));
  assert.ok(result.includes("Uses federation"));
  assert.ok(result.includes("No project link"));
});
192 src/resources/extensions/sf/tests/tacit-knowledge.test.ts Normal file
@@ -0,0 +1,192 @@
/**
 * Tests for Mechanism 1 — tacit knowledge files and system-context injection.
 *
 * Covers:
 * - SCAFFOLD_FILES includes .sf/PRINCIPLES.md, .sf/TASTE.md, .sf/ANTI-GOALS.md
 * - ensureAgenticDocsScaffold creates the three tacit knowledge files
 * - loadTacitKnowledgeBlock returns empty string when files are missing
 * - loadTacitKnowledgeBlock injects present files into the block
 * - Scaffold markers are stripped from emitted block
 * - Sections exceeding 4 KB are truncated with a note
 */

import assert from "node:assert/strict";
import {
  mkdirSync,
  mkdtempSync,
  rmSync,
  writeFileSync,
  existsSync,
  readFileSync,
} from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { test } from "node:test";

import { SCAFFOLD_FILES, ensureAgenticDocsScaffold } from "../agentic-docs-scaffold.ts";
import { loadTacitKnowledgeBlock } from "../bootstrap/system-context.ts";

// ─── Scaffold file registration ───────────────────────────────────────────────

test("tacit-knowledge: SCAFFOLD_FILES includes .sf/PRINCIPLES.md", () => {
  const paths = SCAFFOLD_FILES.map((f) => f.path);
  assert.ok(
    paths.includes(".sf/PRINCIPLES.md"),
    "SCAFFOLD_FILES should include .sf/PRINCIPLES.md",
  );
});

test("tacit-knowledge: SCAFFOLD_FILES includes .sf/TASTE.md", () => {
  const paths = SCAFFOLD_FILES.map((f) => f.path);
  assert.ok(paths.includes(".sf/TASTE.md"), "SCAFFOLD_FILES should include .sf/TASTE.md");
});

test("tacit-knowledge: SCAFFOLD_FILES includes .sf/ANTI-GOALS.md", () => {
  const paths = SCAFFOLD_FILES.map((f) => f.path);
  assert.ok(
    paths.includes(".sf/ANTI-GOALS.md"),
    "SCAFFOLD_FILES should include .sf/ANTI-GOALS.md",
  );
});

test("tacit-knowledge: scaffold template content includes expected headings", () => {
  const principles = SCAFFOLD_FILES.find((f) => f.path === ".sf/PRINCIPLES.md");
  assert.ok(principles, "PRINCIPLES.md template should exist");
  assert.match(principles!.content, /# Principles/);

  const taste = SCAFFOLD_FILES.find((f) => f.path === ".sf/TASTE.md");
  assert.ok(taste, "TASTE.md template should exist");
  assert.match(taste!.content, /# Taste/);

  const antiGoals = SCAFFOLD_FILES.find((f) => f.path === ".sf/ANTI-GOALS.md");
  assert.ok(antiGoals, "ANTI-GOALS.md template should exist");
  assert.match(antiGoals!.content, /# Anti-goals/);
});

// ─── ensureAgenticDocsScaffold creates tacit knowledge files ──────────────────

test("tacit-knowledge: ensureAgenticDocsScaffold creates .sf/PRINCIPLES.md", () => {
  const dir = mkdtempSync(join(tmpdir(), "sf-tacit-"));
  try {
    ensureAgenticDocsScaffold(dir);
    const path = join(dir, ".sf", "PRINCIPLES.md");
    assert.ok(existsSync(path), ".sf/PRINCIPLES.md should be created");
    assert.match(readFileSync(path, "utf-8"), /# Principles/);
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
});

test("tacit-knowledge: ensureAgenticDocsScaffold creates .sf/TASTE.md", () => {
  const dir = mkdtempSync(join(tmpdir(), "sf-tacit-"));
  try {
    ensureAgenticDocsScaffold(dir);
    const path = join(dir, ".sf", "TASTE.md");
    assert.ok(existsSync(path), ".sf/TASTE.md should be created");
    assert.match(readFileSync(path, "utf-8"), /# Taste/);
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
});

test("tacit-knowledge: ensureAgenticDocsScaffold creates .sf/ANTI-GOALS.md", () => {
  const dir = mkdtempSync(join(tmpdir(), "sf-tacit-"));
  try {
    ensureAgenticDocsScaffold(dir);
    const path = join(dir, ".sf", "ANTI-GOALS.md");
    assert.ok(existsSync(path), ".sf/ANTI-GOALS.md should be created");
    assert.match(readFileSync(path, "utf-8"), /# Anti-goals/);
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
});

// ─── loadTacitKnowledgeBlock ──────────────────────────────────────────────────

test("loadTacitKnowledgeBlock: returns empty string when .sf/ is missing", () => {
  const dir = mkdtempSync(join(tmpdir(), "sf-tacit-"));
  try {
    const result = loadTacitKnowledgeBlock(dir);
    assert.strictEqual(result, "");
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
});

test("loadTacitKnowledgeBlock: returns empty string when all three files are missing", () => {
  const dir = mkdtempSync(join(tmpdir(), "sf-tacit-"));
  try {
    mkdirSync(join(dir, ".sf"), { recursive: true });
    const result = loadTacitKnowledgeBlock(dir);
    assert.strictEqual(result, "");
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
});

test("loadTacitKnowledgeBlock: injects PRINCIPLES.md content when present", () => {
  const dir = mkdtempSync(join(tmpdir(), "sf-tacit-"));
  try {
    mkdirSync(join(dir, ".sf"), { recursive: true });
    writeFileSync(
      join(dir, ".sf", "PRINCIPLES.md"),
      "# Principles\n\n- SF stays standalone forever.\n",
      "utf-8",
    );
    const result = loadTacitKnowledgeBlock(dir);
    assert.ok(result.includes("[TACIT KNOWLEDGE"));
    assert.ok(result.includes("## Principles"));
    assert.ok(result.includes("SF stays standalone forever"));
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
});

test("loadTacitKnowledgeBlock: injects all three sections when all present", () => {
  const dir = mkdtempSync(join(tmpdir(), "sf-tacit-"));
  try {
    mkdirSync(join(dir, ".sf"), { recursive: true });
    writeFileSync(join(dir, ".sf", "PRINCIPLES.md"), "- p1\n", "utf-8");
    writeFileSync(join(dir, ".sf", "TASTE.md"), "- t1\n", "utf-8");
    writeFileSync(join(dir, ".sf", "ANTI-GOALS.md"), "- a1\n", "utf-8");
    const result = loadTacitKnowledgeBlock(dir);
    assert.ok(result.includes("## Principles"));
    assert.ok(result.includes("## Taste"));
    assert.ok(result.includes("## Anti-goals"));
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
});

test("loadTacitKnowledgeBlock: strips scaffold markers from content", () => {
  const dir = mkdtempSync(join(tmpdir(), "sf-tacit-"));
  try {
    mkdirSync(join(dir, ".sf"), { recursive: true });
    writeFileSync(
      join(dir, ".sf", "PRINCIPLES.md"),
      "# Principles\n\n<!-- sf-scaffold: version=1 hash=abc state=pending -->\n\n- Real content.\n",
      "utf-8",
    );
    const result = loadTacitKnowledgeBlock(dir);
    assert.ok(!result.includes("sf-scaffold"), "scaffold markers should be stripped");
    assert.ok(result.includes("Real content"), "real content should be present");
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
});

test("loadTacitKnowledgeBlock: truncates section exceeding 4KB with note", () => {
  const dir = mkdtempSync(join(tmpdir(), "sf-tacit-"));
  try {
    mkdirSync(join(dir, ".sf"), { recursive: true });
    const bigContent = "x".repeat(5000);
    writeFileSync(join(dir, ".sf", "PRINCIPLES.md"), bigContent, "utf-8");
    const result = loadTacitKnowledgeBlock(dir);
    assert.ok(result.includes("truncated"), "should include truncation note");
    // The block should not contain all 5000 x's
    const xCount = (result.match(/x/g) ?? []).length;
    assert.ok(xCount < 5000, "truncated block should be shorter than 5000 chars");
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
});
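The behavior these tests pin down (missing files yield an empty string, scaffold markers are stripped, sections are capped at 4 KB) can be sketched as a pure composition helper. This is a hypothetical sketch for illustration only — `composeTacitBlock`, the marker regex, and the exact block layout are assumptions, not the real `system-context.ts` implementation:

```typescript
// Hypothetical sketch of the composition logic exercised above.
// SCAFFOLD_MARKER and SECTION_LIMIT are assumed names, not the real ones.
const SCAFFOLD_MARKER = /<!--\s*sf-scaffold:.*?-->\s*/g;
const SECTION_LIMIT = 4096; // 4 KB per section, per the truncation test

function composeTacitBlock(
  sections: Array<{ title: string; content?: string }>,
): string {
  const parts: string[] = [];
  for (const { title, content } of sections) {
    if (content === undefined) continue; // missing file → section skipped
    // Strip scaffold markers so template metadata never reaches the prompt.
    let body = content.replace(SCAFFOLD_MARKER, "").trim();
    if (body.length > SECTION_LIMIT) {
      body = body.slice(0, SECTION_LIMIT) + "\n[truncated]";
    }
    parts.push(`## ${title}\n\n${body}`);
  }
  if (parts.length === 0) return ""; // all three files missing → empty string
  return `[TACIT KNOWLEDGE]\n\n${parts.join("\n\n")}`;
}
```

Keeping the file reads at the edge and the composition pure, as sketched, is what makes the truncation and marker-stripping cases testable without touching the filesystem.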