Merge pull request #3850 from jeremymcs/fix/auto-loop-test-timeouts

fix: make gsd_complete_task the only execute-task summary path
2026-04-09 05:35:46 -05:00
commit ff54c91dd8
parent fb63ec6b8e dcc85c6d0a
4 changed files with 31 additions and 14 deletions
--- a/src/resources/extensions/gsd/prompts/complete-slice.md
+++ b/src/resources/extensions/gsd/prompts/complete-slice.md
@@ -25,11 +25,11 @@ Then:
 4. If the slice plan includes observability/diagnostic surfaces, confirm they work. Skip this for simple slices that don't have observability sections.
 5. If the slice involved runtime behavior, fill the **Operational Readiness** section (Q8) in the slice summary: health signal, failure signal, recovery procedure, and monitoring gaps. Omit entirely for simple slices with no runtime concerns.
 6. If this slice produced evidence that a requirement changed status (Active → Validated, Active → Deferred, etc.), call `gsd_requirement_update` with the requirement ID, updated `status`, and `validation` evidence. Do NOT write `.gsd/REQUIREMENTS.md` directly — the engine renders it from the database.
-7. Write `{{sliceSummaryPath}}` (compress all task summaries).
-8. Write `{{sliceUatPath}}` — a concrete UAT script with real test cases derived from the slice plan and task summaries. Include preconditions, numbered steps with expected outcomes, and edge cases. This must NOT be a placeholder or generic template — tailor every test case to what this slice actually built.
+7. Prepare the slice completion content you will pass to `gsd_complete_slice` using the camelCase fields `milestoneId`, `sliceId`, `sliceTitle`, `oneLiner`, `narrative`, `verification`, and `uatContent`. Do **not** manually write `{{sliceSummaryPath}}`. Do **not** manually write `{{sliceUatPath}}` — the DB-backed tool is the canonical write path for both artifacts.
+8. Draft the UAT content you will pass as `uatContent` — a concrete UAT script with real test cases derived from the slice plan and task summaries. Include preconditions, numbered steps with expected outcomes, and edge cases. This must NOT be a placeholder or generic template — tailor every test case to what this slice actually built.
 9. Review task summaries for `key_decisions`. Append any significant decisions to `.gsd/DECISIONS.md` if missing.
 10. Review task summaries for patterns, gotchas, or non-obvious lessons learned. If any would save future agents from repeating investigation or hitting the same issues, append them to `.gsd/KNOWLEDGE.md`. Only add entries that are genuinely useful — don't pad with obvious observations.
-11. Call `gsd_complete_slice` with milestoneId, sliceId, the slice summary, and the UAT result. Do NOT manually mark the roadmap checkbox — the tool writes to the DB and renders the ROADMAP.md projection automatically.
+11. Call `gsd_complete_slice` with the camelCase fields `milestoneId`, `sliceId`, `sliceTitle`, `oneLiner`, `narrative`, `verification`, and `uatContent`, plus any optional enrichment fields you have. Do NOT manually mark the roadmap checkbox — the tool writes to the DB, renders `{{sliceSummaryPath}}` and `{{sliceUatPath}}`, and updates the ROADMAP.md projection automatically.
 12. Do not run git commands — the system commits your changes and handles any merge after this unit succeeds.
 13. Update `.gsd/PROJECT.md` if it exists — refresh current state if needed: use the `write` tool with `path: ".gsd/PROJECT.md"` and `content` containing the full updated document reflecting current project state. Do NOT use the `edit` tool for this — PROJECT.md is a full-document refresh.

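As an illustration of the payload that steps 7 and 11 describe, here is a minimal TypeScript sketch. The field names come from the prompt text; the `CompleteSliceArgs` type name, the all-string field types, and the `missingSliceFields` helper are assumptions made for this example and are not taken from the tool's actual schema.

```typescript
// Hypothetical shape for the arguments passed to gsd_complete_slice.
// Field names are from the prompt; string types are an assumption.
interface CompleteSliceArgs {
  milestoneId: string;
  sliceId: string;
  sliceTitle: string;
  oneLiner: string;
  narrative: string;
  verification: string;
  uatContent: string;
}

const REQUIRED_SLICE_FIELDS = [
  "milestoneId",
  "sliceId",
  "sliceTitle",
  "oneLiner",
  "narrative",
  "verification",
  "uatContent",
] as const;

// Hypothetical helper: list the required fields that are absent or blank,
// in the order the prompt names them.
function missingSliceFields(args: Partial<CompleteSliceArgs>): string[] {
  return REQUIRED_SLICE_FIELDS.filter((key) => {
    const value = args[key];
    return typeof value !== "string" || value.trim() === "";
  });
}
```

A check like this could run before the tool call so that an empty `uatContent` is caught early, since the prompt no longer has the agent write the UAT file by hand.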
--- a/src/resources/extensions/gsd/prompts/execute-task.md
+++ b/src/resources/extensions/gsd/prompts/execute-task.md
@@ -69,14 +69,14 @@ Then:
 16. If you made an architectural, pattern, library, or observability decision during this task that downstream work should know about, append it to `.gsd/DECISIONS.md` (read the template at `~/.gsd/agent/extensions/gsd/templates/decisions.md` if the file doesn't exist yet). Not every task produces decisions — only append when a meaningful choice was made.
 17. If you discover a non-obvious rule, recurring gotcha, or useful pattern during execution, append it to `.gsd/KNOWLEDGE.md`. Only add entries that would save future agents from repeating your investigation. Don't add obvious things.
 18. Read the template at `~/.gsd/agent/extensions/gsd/templates/task-summary.md`
-19. Write `{{taskSummaryPath}}`
-20. Call `gsd_complete_task` with milestoneId, sliceId, taskId, and a summary of what was accomplished. This is your final required step — do NOT manually edit PLAN.md checkboxes. The tool marks the task complete, updates the DB, and renders PLAN.md automatically.
+19. Use that template to prepare the completion content you will pass to `gsd_complete_task` using the camelCase fields `milestoneId`, `sliceId`, `taskId`, `oneLiner`, `narrative`, `verification`, and `verificationEvidence`. Do **not** manually write `{{taskSummaryPath}}` — the DB-backed tool is the canonical write path and renders the summary file for you.
+20. Call `gsd_complete_task` with milestoneId, sliceId, taskId, and the completion fields derived from the template. This is your final required step — do NOT manually edit PLAN.md checkboxes. The tool marks the task complete, updates the DB, renders `{{taskSummaryPath}}`, and updates PLAN.md automatically.
 21. Do not run git commands — the system reads your task summary after completion and creates a meaningful commit from it (type inferred from title, message from your one-liner, key files from frontmatter). Write a clear, specific one-liner in the summary — it becomes the commit message.

 All work stays in your working directory: `{{workingDirectory}}`.

 **Autonomous execution:** Do not call `ask_user_questions` or `secure_env_collect`. You are running in auto-mode — there is no human available to answer questions. Make reasonable assumptions and document them in the task summary. If a decision genuinely requires human input, note it in the summary and proceed with the best available option.

-**You MUST call `gsd_complete_task` AND write `{{taskSummaryPath}}` before finishing.**
+**You MUST call `gsd_complete_task` before finishing. Do not manually write `{{taskSummaryPath}}`.**

 When done, say: "Task {{taskId}} complete."
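
To make the change to step 19 concrete, the sketch below checks a prompt for a numbered step whose entire instruction is to write `{{taskSummaryPath}}`. It is an illustration only: the `instructsManualWrite` helper name is hypothetical, and the new step text is abbreviated rather than quoted in full.

```typescript
// Illustrative pattern: a numbered step whose whole instruction is
// "Write {{taskSummaryPath}}", with optional backticks around the variable.
const manualWriteStep = /^\d+\.\s+Write `?\{\{taskSummaryPath\}\}`?\s*$/m;

// Step 19 before and after this change (new text abbreviated for the example).
const oldStep = "19. Write `{{taskSummaryPath}}`";
const newStep =
  "19. Use that template to prepare the completion content. " +
  "Do **not** manually write `{{taskSummaryPath}}`.";

// Hypothetical helper: true when the prompt still contains a bare
// "write the summary file" step on some line.
function instructsManualWrite(promptText: string): boolean {
  return manualWriteStep.test(promptText);
}
```

Under these assumptions, `instructsManualWrite(oldStep)` is `true` and `instructsManualWrite(newStep)` is `false`, because the new step mentions the path only inside a prohibition rather than as the step's instruction.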
--- a/src/resources/extensions/gsd/prompts/validate-milestone.md
+++ b/src/resources/extensions/gsd/prompts/validate-milestone.md
@@ -40,9 +40,9 @@ After all reviewers complete, aggregate their verdicts:
 - If any reviewer says NEEDS-ATTENTION → overall verdict: `needs-attention`
 - If any reviewer says FAIL → overall verdict: `needs-remediation`

-### Step 3 — Write VALIDATION File
+### Step 3 — Persist Validation

-Write to `{{validationPath}}`:
+Prepare the validation content you will pass to `gsd_validate_milestone`. Do **not** manually write `{{validationPath}}` — the DB-backed tool is the canonical write path and renders the validation file for you.

 ```markdown
 ---
@@ -69,13 +69,15 @@ reviewers: 3
 <if verdict is not pass: specific actions required>
 ```

+Call `gsd_validate_milestone` with the camelCase fields `milestoneId`, `verdict`, `remediationRound`, `successCriteriaChecklist`, `sliceDeliveryAudit`, `crossSliceIntegration`, `requirementCoverage`, `verdictRationale`, and `remediationPlan` when needed. If you include verification-class analysis, pass it in `verificationClasses`.
+
 **DB access safety:** Do NOT query `.gsd/gsd.db` directly via `sqlite3` or `node -e require('better-sqlite3')` — the engine owns the WAL connection. Use `gsd_milestone_status` to read milestone and slice state. All data you need is already inlined in the context above or accessible via the `gsd_*` tools. Direct DB access corrupts the WAL and bypasses tool-level validation.

 If verdict is `needs-remediation`:
-- Add new slices to `{{roadmapPath}}` with unchecked `[ ]` status
-- These slices will be planned and executed before validation re-runs
+- Use `gsd_reassess_roadmap` to add the remediation slices instead of editing `{{roadmapPath}}` manually
+- Those slices will be planned and executed before validation re-runs

-**You MUST write `{{validationPath}}` before finishing.**
+**You MUST call `gsd_validate_milestone` before finishing. Do not manually write `{{validationPath}}`.**

 **File system safety:** When scanning milestone directories for evidence, use `ls` or `find` to list directory contents first — never pass a directory path (e.g. `tasks/`, `slices/`) directly to the `read` tool. The `read` tool only accepts file paths, not directories.

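The verdict aggregation rules shown as context in this prompt can be sketched as a small function. This is a minimal sketch under stated assumptions: the `ReviewerVerdict` and `OverallVerdict` type names and the `aggregateVerdicts` helper are hypothetical, the `PASS` reviewer value is assumed, and the precedence of FAIL over NEEDS-ATTENTION and the all-clear `pass` result are assumptions, since only two of the rules are visible in the hunk.

```typescript
// Hypothetical types; only NEEDS-ATTENTION, FAIL, needs-attention,
// needs-remediation, and pass appear in the prompt text itself.
type ReviewerVerdict = "PASS" | "NEEDS-ATTENTION" | "FAIL";
type OverallVerdict = "pass" | "needs-attention" | "needs-remediation";

function aggregateVerdicts(verdicts: ReviewerVerdict[]): OverallVerdict {
  // Assumption: FAIL outranks NEEDS-ATTENTION when both are present.
  if (verdicts.some((v) => v === "FAIL")) {
    return "needs-remediation";
  }
  if (verdicts.some((v) => v === "NEEDS-ATTENTION")) {
    return "needs-attention";
  }
  // Assumption: with no FAIL or NEEDS-ATTENTION (including an empty list),
  // the overall verdict is pass.
  return "pass";
}
```

The resulting value would be what gets passed as `verdict` to `gsd_validate_milestone`, with `remediationPlan` supplied when the result is not `pass`.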
--- a/src/resources/extensions/gsd/tests/prompt-contracts.test.ts
+++ b/src/resources/extensions/gsd/tests/prompt-contracts.test.ts
@@ -71,11 +71,13 @@ test("execute-task prompt references gsd_complete_task tool", () => {
  assert.match(prompt, /gsd_complete_task/);
 });

-test("execute-task prompt instructs writing task summary before tool call", () => {
+test("execute-task prompt uses gsd_complete_task as canonical summary write path", () => {
  const prompt = readPrompt("execute-task");
-  // The prompt instructs writing the summary file AND calling the tool
  assert.match(prompt, /\{\{taskSummaryPath\}\}/);
  assert.match(prompt, /gsd_complete_task/);
+  assert.match(prompt, /DB-backed tool is the canonical write path/i);
+  assert.match(prompt, /Do \*\*not\*\* manually write `?\{\{taskSummaryPath\}\}`?/i);
+  assert.doesNotMatch(prompt, /^\d+\.\s+Write `?\{\{taskSummaryPath\}\}`?\s*$/m);
 });

 test("execute-task prompt does not instruct LLM to toggle checkboxes manually", () => {
@@ -119,10 +121,14 @@ test("guided-complete-slice prompt references gsd_slice_complete tool", () => {

 test("complete-slice prompt instructs writing summary and UAT files before tool call", () => {
  const prompt = readPrompt("complete-slice");
-  // The prompt instructs writing the summary AND UAT files, then calling the tool
  assert.match(prompt, /\{\{sliceSummaryPath\}\}/);
  assert.match(prompt, /\{\{sliceUatPath\}\}/);
  assert.match(prompt, /gsd_complete_slice/);
+  assert.match(prompt, /DB-backed tool is the canonical write path/i);
+  assert.match(prompt, /Do \*\*not\*\* manually write `?\{\{sliceSummaryPath\}\}`?/i);
+  assert.match(prompt, /Do \*\*not\*\* manually write `?\{\{sliceUatPath\}\}`?/i);
+  assert.doesNotMatch(prompt, /^\d+\.\s+Write `?\{\{sliceSummaryPath\}\}`?.*$/m);
+  assert.doesNotMatch(prompt, /^\d+\.\s+Write `?\{\{sliceUatPath\}\}`?.*$/m);
 });

 test("complete-slice prompt preserves decisions and knowledge review steps", () => {
@@ -131,6 +137,15 @@ test("complete-slice prompt preserves decisions and knowledge review steps", ()
  assert.match(prompt, /KNOWLEDGE\.md/);
 });

+test("validate-milestone prompt uses gsd_validate_milestone as canonical validation write path", () => {
+  const prompt = readPrompt("validate-milestone");
+  assert.match(prompt, /gsd_validate_milestone/);
+  assert.match(prompt, /\{\{validationPath\}\}/);
+  assert.match(prompt, /DB-backed tool is the canonical write path/i);
+  assert.match(prompt, /Do \*\*not\*\* manually write `?\{\{validationPath\}\}`?/i);
+  assert.doesNotMatch(prompt, /Write to `?\{\{validationPath\}\}`?:/i);
+});
+
 test("complete-slice prompt still contains template variables for context", () => {
  const prompt = readPrompt("complete-slice");
  assert.match(prompt, /\{\{sliceSummaryPath\}\}/);