fix(auto): align UAT artifact suffix with gsd_slice_complete output (#2592)

* fix(auto): align UAT artifact suffix with gsd_slice_complete output

The auto-mode files referenced UAT-RESULT as the artifact suffix,
but gsd_slice_complete writes files as S##-UAT.md. This mismatch
caused ENOENT errors during validate-milestone dispatch.
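
The mismatch can be pictured with a minimal sketch. sliceFileName
below is a hypothetical stand-in for the repository's
buildSliceFileName, assuming it joins the slice ID and suffix as
"<SID>-<SUFFIX>.md"; the real helper may differ.

```typescript
// Hypothetical stand-in for buildSliceFileName (assumed naming pattern).
function sliceFileName(sliceId: string, suffix: string): string {
  return `${sliceId}-${suffix}.md`;
}

// The file gsd_slice_complete writes, per this commit message.
const written = sliceFileName("S01", "UAT");

// The file auto-mode looked for before this fix.
const expectedBefore = sliceFileName("S01", "UAT-RESULT");

// The names differ, so reading the expected path fails with ENOENT.
const mismatch = written !== expectedBefore;
console.log(written, expectedBefore, mismatch);
```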

Fixes #2564

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(auto): update test and doc references from UAT-RESULT to UAT

Aligns test assertions and ADR documentation with the corrected
artifact suffix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(auto): replace separate UAT-RESULT file check with in-file verdict check

The original two-file model (UAT spec + UAT-RESULT verdict) never
worked because gsd_slice_complete only writes S##-UAT.md. The blind
string replacement then made both lookups resolve the same file, so
checkNeedsRunUat always returned null. It now checks for a
"verdict:" line inside the UAT file content to determine whether UAT
has been completed.
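
A minimal sketch of the single-file check, using the same pattern this
commit adds to checkNeedsRunUat. The hasVerdict helper and the sample
strings are illustrative and not part of the codebase.

```typescript
// Same pattern the commit adds to checkNeedsRunUat.
const VERDICT_LINE = /verdict:\s*[\w-]+/i;

// Hypothetical helper: true when the UAT file content already records a verdict.
function hasVerdict(uatContent: string): boolean {
  return VERDICT_LINE.test(uatContent);
}

// A UAT spec that has not been run yet (sample content, assumed).
const spec = "# S01 UAT\n\n## Checks\n- open the dashboard\n";
// A UAT file after a run, as in the stale replay guard test.
const completed = "---\nverdict: FAIL\n---\n";

console.log(hasVerdict(spec));      // false
console.log(hasVerdict(completed)); // true
```

Any verdict value counts as "already run", including FAIL, which is
why the stale replay guard does not re-dispatch. The pattern is not
anchored to the start of a line, so a spec whose prose happens to
contain "verdict: <word>" would also match.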

Also deduplicates a redundant resolveSliceFile call in the verdict
gate and updates tests to verify the single-file verdict model.
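
The verdict gate after deduplication can be sketched as below. The
helper names are hypothetical, and the sketch assumes the verdict has
already been lower-cased before comparison; the UAT type values and
the accepted verdicts are taken from the dispatch rule in this diff.

```typescript
// PARTIAL is acceptable only for UAT types that include human-only checks.
function acceptableVerdicts(uatType: string | undefined): string[] {
  const verdicts = ["pass", "passed"];
  if (uatType === "mixed" || uatType === "human-experience" || uatType === "live-runtime") {
    verdicts.push("partial");
  }
  return verdicts;
}

// Hypothetical wrapper: a present verdict outside the accepted set blocks progression.
function blocksProgression(verdict: string | undefined, uatType: string | undefined): boolean {
  return !!verdict && !acceptableVerdicts(uatType).includes(verdict);
}

console.log(blocksProgression("partial", "mixed"));           // false
console.log(blocksProgression("partial", "artifact-driven")); // true
```

Because the verdict and the UAT type now come from the same file
content, one resolveSliceFile call is enough.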

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
TÂCHES 2026-03-25 22:12:08 -06:00 committed by GitHub
parent 891cde201c
commit 5f8bbbc6e1
8 changed files with 48 additions and 56 deletions

View file

@ -217,18 +217,18 @@ For the same 4-slice, 3-task milestone:
#### 5. Replace validate-milestone with mechanical verification
**Current:** An LLM session re-reads the ROADMAP and all slice summaries, checks success criteria against delivery evidence, and writes a VALIDATION.md with a verdict. It also inlines UAT-RESULT artifacts from slices with `uat_dispatch` enabled.
**Current:** An LLM session re-reads the ROADMAP and all slice summaries, checks success criteria against delivery evidence, and writes a VALIDATION.md with a verdict. It also inlines UAT artifacts from slices with `uat_dispatch` enabled.
**New:** The system mechanically aggregates verification results from all tasks and slices. The canonical verification data sources are:
1. **`T##-VERIFY.json`** files (written by `writeVerificationJSON()` in `verification-evidence.ts`) — machine-readable per-task verification results with command, exit code, verdict, duration, and blocking status.
2. **`S##-UAT-RESULT.md`** files (when `uat_dispatch` is enabled) — human or artifact-driven UAT outcomes.
2. **`S##-UAT.md`** files (when `uat_dispatch` is enabled) — human or artifact-driven UAT outcomes.
3. **Task summary frontmatter** `verification_result` field — a human-readable pass/fail string (not structured, used as a secondary signal).
The aggregator reads `T##-VERIFY.json` as the primary source of truth, supplements with UAT-RESULT artifacts, and produces a deterministic VALIDATION.md.
The aggregator reads `T##-VERIFY.json` as the primary source of truth, supplements with UAT artifacts, and produces a deterministic VALIDATION.md.
**What changes:**
- A new `aggregateMilestoneVerification()` function collects `T##-VERIFY.json` files and `S##-UAT-RESULT.md` files across all slices.
- A new `aggregateMilestoneVerification()` function collects `T##-VERIFY.json` files and `S##-UAT.md` files across all slices.
- The function produces a VALIDATION.md with per-task and per-slice pass/fail status, UAT evidence, and an overall verdict.
- The LLM-driven validate-milestone session is removed from the default pipeline.
- The validate-milestone template is retained for explicit dispatch (users who want LLM-driven validation can run `/gsd dispatch validate`).
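
As a rough sketch of how an overall verdict could be derived from the per-task records: the `VerifyCheck` shape and the rule that only blocking failures fail the milestone are assumptions for illustration, not the schema written by `writeVerificationJSON()`.

```typescript
// Assumed shape of one entry in a T##-VERIFY.json file; field names are illustrative.
interface VerifyCheck {
  command: string;
  exitCode: number;
  verdict: "pass" | "fail";
  durationMs: number;
  blocking: boolean;
}

// Assumed policy: the milestone fails only if a blocking check failed.
function overallVerdict(checks: VerifyCheck[]): "pass" | "fail" {
  return checks.some((c) => c.blocking && c.verdict === "fail") ? "fail" : "pass";
}

const checks: VerifyCheck[] = [
  { command: "npm test", exitCode: 0, verdict: "pass", durationMs: 4200, blocking: true },
  { command: "npm run lint", exitCode: 1, verdict: "fail", durationMs: 900, blocking: false },
];

console.log(overallVerdict(checks)); // "pass"
```

Whatever the real rule is, keeping it a pure function over the parsed records is what would make the resulting VALIDATION.md deterministic.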
@ -254,8 +254,8 @@ async function aggregateMilestoneVerification(base: string, mid: string): Promis
}
}
// Secondary source: S##-UAT-RESULT.md (when uat_dispatch enabled)
const uatResultFile = resolveSliceFile(base, mid, slice.id, "UAT-RESULT");
// Secondary source: S##-UAT.md (when uat_dispatch enabled)
const uatResultFile = resolveSliceFile(base, mid, slice.id, "UAT");
if (uatResultFile) {
const uatContent = await loadFile(uatResultFile);
if (uatContent) uatResults.push({ sliceId: slice.id, content: uatContent });
@ -476,7 +476,7 @@ async function mechanicalSliceCompletion(base: string, mid: string, sid: string)
#### Mechanical milestone validation
See `aggregateMilestoneVerification()` above (Section 5). Reads `T##-VERIFY.json` and `S##-UAT-RESULT.md` as canonical sources.
See `aggregateMilestoneVerification()` above (Section 5). Reads `T##-VERIFY.json` and `S##-UAT.md` as canonical sources.
#### Mechanical milestone summary
@ -547,7 +547,7 @@ At current Opus pricing ($15/MTok input, $75/MTok output — as of March 2026),
| `auto-prompts.ts` — plan-milestone exploration | ~30 | Research instructions merged in |
| `auto-prompts.ts` — plan-slice reassessment + exploration | ~25 | Reassessment + exploration preamble |
| `auto-post-unit.ts` `mechanicalSliceCompletion()` | ~80 | Structured frontmatter aggregation, UAT generation, artifact writes |
| `auto-verification.ts` `aggregateMilestoneVerification()` | ~60 | T##-VERIFY.json + UAT-RESULT aggregation |
| `auto-verification.ts` `aggregateMilestoneVerification()` | ~60 | T##-VERIFY.json + UAT aggregation |
| `auto-unit-closeout.ts` `generateMilestoneSummary()` | ~60 | Mechanical summary generation |
| **Total added** | **~255** | |
@ -694,7 +694,7 @@ The mechanical summary quality might be insufficient for complex slices.
13. Implement `mechanicalRequirementsUpdate()` and `appendNewDecisions()`
### Phase 3: Mechanical milestone validation + completion
14. Implement `aggregateMilestoneVerification()` reading `T##-VERIFY.json` and `S##-UAT-RESULT.md`
14. Implement `aggregateMilestoneVerification()` reading `T##-VERIFY.json` and `S##-UAT.md`
15. Implement `generateMilestoneSummary()` from slice summary aggregation
16. Wire into post-unit processing: after last slice completion, run mechanical validation + summary
17. Make reassess-roadmap opt-in via `reassess_after_slice` preference (default: false)
@ -723,14 +723,14 @@ The mechanical summary quality might be insufficient for complex slices.
3. ✅ Token savings double-counting (eliminated sessions + re-ingestion) — **fixed**: removed overlap, noted savings are not additive
4. ✅ Context inlining change (file paths vs inline) underanalyzed — **fixed**: expanded to dedicated risk section with enforcement strategy, phased rollout, and interaction with budget engine
5. ✅ Budget engine interaction not discussed — **fixed**: addressed in context inlining section
6. ✅ `aggregateMilestoneVerification()` reads wrong data source — **fixed**: now reads `T##-VERIFY.json` as primary source, supplemented by `S##-UAT-RESULT.md`
6. ✅ `aggregateMilestoneVerification()` reads wrong data source — **fixed**: now reads `T##-VERIFY.json` as primary source, supplemented by `S##-UAT.md`
7. ✅ Phase ordering creates heavy intermediate state (Phase 1 without Phase 4) — **fixed**: Phase 1 now includes targeted inlining reduction for planning sessions
8. ✅ ADR number conflict — **fixed**: confirmed no ADR-003 exists in `docs/` (the referenced file doesn't exist in current git)
**OpenAI Codex** identified 6 issues:
1. ✅ HIGH: Folding completion into execute-task breaks verification-retry model — **fixed**: moved completion to post-gate mechanical processing instead of executor prompt. Added Alternative D explaining why.
2. ✅ HIGH: Mechanical validation reads nonexistent `verification_evidence` frontmatter — **fixed**: now reads `T##-VERIFY.json` (canonical machine-readable source from `verification-evidence.ts`)
3. ✅ HIGH: Replacement validation drops UAT evidence — **fixed**: aggregator now reads both `T##-VERIFY.json` and `S##-UAT-RESULT.md`
3. ✅ HIGH: Replacement validation drops UAT evidence — **fixed**: aggregator now reads both `T##-VERIFY.json` and `S##-UAT.md`
4. ✅ HIGH: "State derivation stays unchanged" is false — **fixed**: explicitly documented that `deriveState()` phases are preserved, mechanical processing resolves them synchronously, fallback dispatch rules handle failures
5. ✅ MEDIUM: Folded completion omits REQUIREMENTS.md and KNOWLEDGE.md updates — **fixed**: mechanical completion handles REQUIREMENTS.md and DECISIONS.md; KNOWLEDGE.md addressed in Risk 5
6. ✅ MEDIUM: Session and token math inconsistent — **fixed**: complete rederivation with per-slice breakdown, corrected to 30 baseline sessions, noted profile variations

View file

@ -53,7 +53,7 @@ export function resolveExpectedArtifactPath(
}
case "run-uat": {
const dir = resolveSlicePath(base, mid, sid!);
return dir ? join(dir, buildSliceFileName(sid!, "UAT-RESULT")) : null;
return dir ? join(dir, buildSliceFileName(sid!, "UAT")) : null;
}
case "execute-task": {
const tid = parts[2];
@ -120,7 +120,7 @@ export function diagnoseExpectedArtifact(
case "reassess-roadmap":
return `${relSliceFile(base, mid!, sid!, "ASSESSMENT")} (roadmap reassessment)`;
case "run-uat":
return `${relSliceFile(base, mid!, sid!, "UAT-RESULT")} (UAT result)`;
return `${relSliceFile(base, mid!, sid!, "UAT")} (UAT result)`;
case "validate-milestone":
return `${relMilestoneFile(base, mid!, "VALIDATION")} (milestone validation report)`;
case "complete-milestone":

View file

@ -184,7 +184,7 @@ export const DISPATCH_RULES: DispatchRule[] = [
}
for (const sliceId of completedSliceIds) {
const resultFile = resolveSliceFile(basePath, mid, sliceId, "UAT-RESULT");
const resultFile = resolveSliceFile(basePath, mid, sliceId, "UAT");
if (!resultFile) continue;
const content = await loadFile(resultFile);
if (!content) continue;
@ -196,15 +196,9 @@ export const DISPATCH_RULES: DispatchRule[] = [
// produce PARTIAL when all automatable checks pass but human-only
// checks remain — this should not block progression.
const acceptableVerdicts: string[] = ["pass", "passed"];
const uatFile = resolveSliceFile(basePath, mid, sliceId, "UAT");
if (uatFile) {
const uatContent = await loadFile(uatFile);
if (uatContent) {
const uatType = extractUatType(uatContent);
if (uatType === "mixed" || uatType === "human-experience" || uatType === "live-runtime") {
acceptableVerdicts.push("partial");
}
}
const uatType = extractUatType(content);
if (uatType === "mixed" || uatType === "human-experience" || uatType === "live-runtime") {
acceptableVerdicts.push("partial");
}
if (verdict && !acceptableVerdicts.includes(verdict)) {

View file

@ -772,11 +772,8 @@ export async function checkNeedsRunUat(
if (!uatFile) return null;
const uatContent = await loadFile(uatFile);
if (!uatContent) return null;
const uatResultFile = resolveSliceFile(base, mid, sid, "UAT-RESULT");
if (uatResultFile) {
const hasResult = !!(await loadFile(uatResultFile));
if (hasResult) return null;
}
// If the UAT file already contains a verdict, UAT has been run — skip
if (/verdict:\s*[\w-]+/i.test(uatContent)) return null;
const uatType = extractUatType(uatContent) ?? "artifact-driven";
return { sliceId: sid, uatType };
}
@ -799,11 +796,8 @@ export async function checkNeedsRunUat(
if (!uatFileFb) return null;
const uatContentFb = await loadFile(uatFileFb);
if (!uatContentFb) return null;
const uatResultFb = resolveSliceFile(base, mid, uatSid, "UAT-RESULT");
if (uatResultFb) {
const hasResultFb = !!(await loadFile(uatResultFb));
if (hasResultFb) return null;
}
// If the UAT file already contains a verdict, UAT has been run — skip
if (/verdict:\s*[\w-]+/i.test(uatContentFb)) return null;
const uatTypeFb = extractUatType(uatContentFb) ?? "artifact-driven";
return { sliceId: uatSid, uatType: uatTypeFb };
}
@ -1349,8 +1343,8 @@ export async function buildValidateMilestonePrompt(
const summaryRel = relSliceFile(base, mid, sid, "SUMMARY");
inlined.push(await inlineFile(summaryPath, summaryRel, `${sid} Summary`));
const uatPath = resolveSliceFile(base, mid, sid, "UAT-RESULT");
const uatRel = relSliceFile(base, mid, sid, "UAT-RESULT");
const uatPath = resolveSliceFile(base, mid, sid, "UAT");
const uatRel = relSliceFile(base, mid, sid, "UAT");
const uatInline = await inlineFileOptional(uatPath, uatRel, `${sid} UAT Result`);
if (uatInline) inlined.push(uatInline);
}
@ -1501,7 +1495,7 @@ export async function buildRunUatPrompt(
const inlinedContext = capPreamble(`## Inlined Context (preloaded — do not re-read these files)\n\n${inlined.join("\n\n---\n\n")}`);
const uatResultPath = join(base, relSliceFile(base, mid, sliceId, "UAT-RESULT"));
const uatResultPath = join(base, relSliceFile(base, mid, sliceId, "UAT"));
const uatType = extractUatType(uatContent) ?? "artifact-driven";
return loadPrompt("run-uat", {

View file

@ -90,7 +90,7 @@ export function resolveExpectedArtifactPath(
}
case "run-uat": {
const dir = resolveSlicePath(base, mid, sid!);
return dir ? join(dir, buildSliceFileName(sid!, "UAT-RESULT")) : null;
return dir ? join(dir, buildSliceFileName(sid!, "UAT")) : null;
}
case "execute-task": {
const tid = parts[2];
@ -503,7 +503,7 @@ export function diagnoseExpectedArtifact(
case "reassess-roadmap":
return `${relSliceFile(base, mid!, sid!, "ASSESSMENT")} (roadmap reassessment)`;
case "run-uat":
return `${relSliceFile(base, mid!, sid!, "UAT-RESULT")} (UAT result)`;
return `${relSliceFile(base, mid!, sid!, "UAT")} (UAT result)`;
case "validate-milestone":
return `${relMilestoneFile(base, mid!, "VALIDATION")} (milestone validation report)`;
case "complete-milestone":

View file

@ -46,7 +46,7 @@ GSD extension source code is at: `{{gsdSourceDir}}`
├── milestones/{ID}/ — milestone artifacts
│ ├── {ID}-ROADMAP.md, {ID}-RESEARCH.md, {ID}-CONTEXT.md, {ID}-SUMMARY.md
│ └── slices/{SID}/ — slice artifacts
│ ├── {SID}-PLAN.md, {SID}-RESEARCH.md, {SID}-UAT-RESULT.md, {SID}-SUMMARY.md
│ ├── {SID}-PLAN.md, {SID}-RESEARCH.md, {SID}-UAT.md, {SID}-SUMMARY.md
│ └── tasks/{TID}-PLAN.md, {TID}-SUMMARY.md
└── worktrees/{milestoneId}/ — per-milestone worktree with replicated .gsd/
```

View file

@ -112,7 +112,7 @@ test("resolveExpectedArtifactPath returns correct path for all slice-level types
const uatResult = resolveExpectedArtifactPath("run-uat", "M001/S01", base);
assert.ok(uatResult);
assert.ok(uatResult!.includes("UAT-RESULT"));
assert.ok(uatResult!.includes("UAT"));
});
// ─── diagnoseExpectedArtifact ─────────────────────────────────────────────

View file

@ -171,7 +171,7 @@ test('(k) run-uat prompt template', () => {
const milestoneId = 'M001';
const sliceId = 'S01';
const uatPath = '.gsd/milestones/M001/slices/S01/S01-UAT.md';
const uatResultPath = '.gsd/milestones/M001/slices/S01/S01-UAT-RESULT.md';
const uatResultPath = '.gsd/milestones/M001/slices/S01/S01-UAT.md';
const uatType = 'live-runtime';
const inlinedContext = '<!-- no context -->';
let promptResult: string | undefined;
@ -234,7 +234,7 @@ test('(k2) run-uat prompt references gsd_summary_save, not direct write', () =>
milestoneId: 'M001',
sliceId: 'S01',
uatPath: '.gsd/milestones/M001/slices/S01/S01-UAT.md',
uatResultPath: '.gsd/milestones/M001/slices/S01/S01-UAT-RESULT.md',
uatResultPath: '.gsd/milestones/M001/slices/S01/S01-UAT.md',
uatType: 'artifact-driven',
inlinedContext: '<!-- no context -->',
});
@ -265,14 +265,13 @@ test('(l) dispatch preconditions via resolveSliceFile', () => {
'resolveSliceFile(..., "UAT") returns non-null when UAT file exists (dispatch trigger state)',
);
const uatResultFilePath = resolveSliceFile(base, 'M001', 'S01', 'UAT-RESULT');
assert.deepStrictEqual(
uatResultFilePath,
null,
'resolveSliceFile(..., "UAT-RESULT") returns null when result file missing (dispatch trigger state)',
// UAT spec without a verdict line means UAT has not been run yet
const rawContent = readFileSync(uatFilePath!, 'utf-8');
assert.ok(
!/verdict:\s*[\w-]+/i.test(rawContent),
'UAT file without verdict indicates UAT has not been run (dispatch trigger state)',
);
const rawContent = readFileSync(uatFilePath!, 'utf-8');
assert.deepStrictEqual(
extractUatType(rawContent),
'artifact-driven',
@ -286,13 +285,18 @@ test('(l) dispatch preconditions via resolveSliceFile', () => {
test('test block at line 307', () => {
const base = createFixtureBase();
try {
writeSliceFile(base, 'M001', 'S01', 'UAT', makeUatContent('artifact-driven'));
writeSliceFile(base, 'M001', 'S01', 'UAT-RESULT', '# UAT Result\n\nverdict: PASS\n');
// Write UAT file with a verdict — simulates completed UAT
writeSliceFile(base, 'M001', 'S01', 'UAT', '# UAT Result\n\nverdict: PASS\n');
const uatResultFilePath = resolveSliceFile(base, 'M001', 'S01', 'UAT-RESULT');
const uatFilePath = resolveSliceFile(base, 'M001', 'S01', 'UAT');
assert.ok(
uatResultFilePath !== null,
'resolveSliceFile(..., "UAT-RESULT") returns non-null when result file exists (idempotent skip state)',
uatFilePath !== null,
'resolveSliceFile(..., "UAT") returns non-null when UAT file exists',
);
const content = readFileSync(uatFilePath!, 'utf-8');
assert.ok(
/verdict:\s*[\w-]+/i.test(content),
'UAT file with verdict indicates UAT has been completed (idempotent skip state)',
);
} finally {
cleanup(base);
@ -390,7 +394,7 @@ test('(p) run-uat prompt allows PASS when human-only checks remain as NEEDS-HUMA
milestoneId: 'M001',
sliceId: 'S01',
uatPath: '.gsd/milestones/M001/slices/S01/S01-UAT.md',
uatResultPath: '.gsd/milestones/M001/slices/S01/S01-UAT-RESULT.md',
uatResultPath: '.gsd/milestones/M001/slices/S01/S01-UAT.md',
uatType: 'mixed',
inlinedContext: '<!-- no context -->',
});
@ -432,7 +436,7 @@ test('(n) stale replay guard', async () => {
);
writeSliceFile(base, 'M001', 'S01', 'UAT', makeUatContent('artifact-driven'));
writeSliceFile(base, 'M001', 'S01', 'UAT-RESULT', '---\nverdict: FAIL\n---\n');
writeSliceFile(base, 'M001', 'S01', 'UAT', '---\nverdict: FAIL\n---\n');
const state = {
activeMilestone: { id: 'M001', title: 'Test roadmap' },
@ -449,7 +453,7 @@ test('(n) stale replay guard', async () => {
assert.deepStrictEqual(
result,
null,
'existing UAT-RESULT with FAIL verdict does not re-dispatch; verdict gate owns blocking',
'existing UAT with FAIL verdict does not re-dispatch; verdict gate owns blocking',
);
} finally {
cleanup(base);