Merge pull request #3850 from jeremymcs/fix/auto-loop-test-timeouts

fix: make gsd_complete_task the only execute-task summary path
Jeremy McSpadden 2026-04-09 05:35:46 -05:00 committed by GitHub
commit ff54c91dd8
4 changed files with 31 additions and 14 deletions

@@ -25,11 +25,11 @@ Then:
4. If the slice plan includes observability/diagnostic surfaces, confirm they work. Skip this for simple slices that don't have observability sections.
5. If the slice involved runtime behavior, fill the **Operational Readiness** section (Q8) in the slice summary: health signal, failure signal, recovery procedure, and monitoring gaps. Omit entirely for simple slices with no runtime concerns.
6. If this slice produced evidence that a requirement changed status (Active → Validated, Active → Deferred, etc.), call `gsd_requirement_update` with the requirement ID, updated `status`, and `validation` evidence. Do NOT write `.gsd/REQUIREMENTS.md` directly — the engine renders it from the database.
7. Write `{{sliceSummaryPath}}` (compress all task summaries).
8. Write `{{sliceUatPath}}` — a concrete UAT script with real test cases derived from the slice plan and task summaries. Include preconditions, numbered steps with expected outcomes, and edge cases. This must NOT be a placeholder or generic template — tailor every test case to what this slice actually built.
7. Prepare the slice completion content you will pass to `gsd_complete_slice` using the camelCase fields `milestoneId`, `sliceId`, `sliceTitle`, `oneLiner`, `narrative`, `verification`, and `uatContent`. Do **not** manually write `{{sliceSummaryPath}}`. Do **not** manually write `{{sliceUatPath}}` — the DB-backed tool is the canonical write path for both artifacts.
8. Draft the UAT content you will pass as `uatContent` — a concrete UAT script with real test cases derived from the slice plan and task summaries. Include preconditions, numbered steps with expected outcomes, and edge cases. This must NOT be a placeholder or generic template — tailor every test case to what this slice actually built.
9. Review task summaries for `key_decisions`. Append any significant decisions to `.gsd/DECISIONS.md` if missing.
10. Review task summaries for patterns, gotchas, or non-obvious lessons learned. If any would save future agents from repeating investigation or hitting the same issues, append them to `.gsd/KNOWLEDGE.md`. Only add entries that are genuinely useful — don't pad with obvious observations.
11. Call `gsd_complete_slice` with milestoneId, sliceId, the slice summary, and the UAT result. Do NOT manually mark the roadmap checkbox — the tool writes to the DB and renders the ROADMAP.md projection automatically.
11. Call `gsd_complete_slice` with the camelCase fields `milestoneId`, `sliceId`, `sliceTitle`, `oneLiner`, `narrative`, `verification`, and `uatContent`, plus any optional enrichment fields you have. Do NOT manually mark the roadmap checkbox — the tool writes to the DB, renders `{{sliceSummaryPath}}` and `{{sliceUatPath}}`, and updates the ROADMAP.md projection automatically.
12. Do not run git commands — the system commits your changes and handles any merge after this unit succeeds.
13. Update `.gsd/PROJECT.md` if it exists — refresh current state if needed: use the `write` tool with `path: ".gsd/PROJECT.md"` and `content` containing the full updated document reflecting current project state. Do NOT use the `edit` tool for this — PROJECT.md is a full-document refresh.
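
As a rough illustration of steps 7 and 11 above, the completion content could be assembled and sanity-checked before it is handed to `gsd_complete_slice`. This is a minimal sketch, not the tool's actual schema: the field names come from the prompt text, but treating every field as a string, the helper `missingSliceFields`, and the sample IDs are all assumptions made for the example.

```typescript
// Hypothetical shape: field names are taken from the prompt; the string types are assumed.
interface CompleteSliceArgs {
  milestoneId: string;
  sliceId: string;
  sliceTitle: string;
  oneLiner: string;
  narrative: string;
  verification: string;
  uatContent: string;
}

const REQUIRED_SLICE_FIELDS = [
  "milestoneId",
  "sliceId",
  "sliceTitle",
  "oneLiner",
  "narrative",
  "verification",
  "uatContent",
] as const;

// Returns the names of required fields that are absent or blank.
function missingSliceFields(args: Partial<CompleteSliceArgs>): string[] {
  return REQUIRED_SLICE_FIELDS.filter((name) => {
    const value = args[name];
    return typeof value !== "string" || value.trim() === "";
  });
}

// Illustrative values only; real IDs and content come from the slice being completed.
const exampleSliceArgs: CompleteSliceArgs = {
  milestoneId: "M001",
  sliceId: "S01",
  sliceTitle: "Example slice",
  oneLiner: "Summarise what the slice delivered in one sentence",
  narrative: "Compressed narrative built from the task summaries",
  verification: "What was run to verify the slice and what it showed",
  uatContent: "Preconditions, numbered steps with expected outcomes, edge cases",
};
```

A check like this only guards against an incomplete payload; writing the summary and UAT files remains the tool's job.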

@@ -69,14 +69,14 @@ Then:
16. If you made an architectural, pattern, library, or observability decision during this task that downstream work should know about, append it to `.gsd/DECISIONS.md` (read the template at `~/.gsd/agent/extensions/gsd/templates/decisions.md` if the file doesn't exist yet). Not every task produces decisions — only append when a meaningful choice was made.
17. If you discover a non-obvious rule, recurring gotcha, or useful pattern during execution, append it to `.gsd/KNOWLEDGE.md`. Only add entries that would save future agents from repeating your investigation. Don't add obvious things.
18. Read the template at `~/.gsd/agent/extensions/gsd/templates/task-summary.md`
19. Write `{{taskSummaryPath}}`
20. Call `gsd_complete_task` with milestoneId, sliceId, taskId, and a summary of what was accomplished. This is your final required step — do NOT manually edit PLAN.md checkboxes. The tool marks the task complete, updates the DB, and renders PLAN.md automatically.
19. Use that template to prepare the completion content you will pass to `gsd_complete_task` using the camelCase fields `milestoneId`, `sliceId`, `taskId`, `oneLiner`, `narrative`, `verification`, and `verificationEvidence`. Do **not** manually write `{{taskSummaryPath}}` — the DB-backed tool is the canonical write path and renders the summary file for you.
20. Call `gsd_complete_task` with milestoneId, sliceId, taskId, and the completion fields derived from the template. This is your final required step — do NOT manually edit PLAN.md checkboxes. The tool marks the task complete, updates the DB, renders `{{taskSummaryPath}}`, and updates PLAN.md automatically.
21. Do not run git commands — the system reads your task summary after completion and creates a meaningful commit from it (type inferred from title, message from your one-liner, key files from frontmatter). Write a clear, specific one-liner in the summary — it becomes the commit message.
All work stays in your working directory: `{{workingDirectory}}`.
**Autonomous execution:** Do not call `ask_user_questions` or `secure_env_collect`. You are running in auto-mode — there is no human available to answer questions. Make reasonable assumptions and document them in the task summary. If a decision genuinely requires human input, note it in the summary and proceed with the best available option.
**You MUST call `gsd_complete_task` AND write `{{taskSummaryPath}}` before finishing.**
**You MUST call `gsd_complete_task` before finishing. Do not manually write `{{taskSummaryPath}}`.**
When done, say: "Task {{taskId}} complete."
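
The prompt above insists on camelCase field names for `gsd_complete_task`. One way to catch a slip such as `milestone_id` before the call is sketched below. The seven field names are those listed in step 19; the helper `checkTaskPayload` and its report shape are hypothetical, and the real tool may accept additional optional fields that this sketch does not know about.

```typescript
// Field names as listed in the prompt; everything else here is illustrative.
const TASK_FIELDS = [
  "milestoneId",
  "sliceId",
  "taskId",
  "oneLiner",
  "narrative",
  "verification",
  "verificationEvidence",
] as const;

interface PayloadReport {
  missing: string[];      // named fields that are not present on the payload
  nonCamelCase: string[]; // keys that do not look like camelCase identifiers
}

function checkTaskPayload(payload: Record<string, unknown>): PayloadReport {
  const missing = TASK_FIELDS.filter((name) => !(name in payload));
  const nonCamelCase = Object.keys(payload).filter(
    (key) => !/^[a-z][a-zA-Z0-9]*$/.test(key),
  );
  return { missing, nonCamelCase };
}
```

For example, a payload keyed with `milestone_id` would be reported both as containing a non-camelCase key and as missing `milestoneId`.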

@@ -40,9 +40,9 @@ After all reviewers complete, aggregate their verdicts:
- If any reviewer says NEEDS-ATTENTION → overall verdict: `needs-attention`
- If any reviewer says FAIL → overall verdict: `needs-remediation`
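
The aggregation rules above can be sketched as a small function. Two points are assumptions rather than anything this hunk states: that a reviewer's passing verdict is spelled `PASS`, and that FAIL takes precedence when both FAIL and NEEDS-ATTENTION are present.

```typescript
type ReviewerVerdict = "PASS" | "NEEDS-ATTENTION" | "FAIL";
type OverallVerdict = "pass" | "needs-attention" | "needs-remediation";

// Assumed precedence: FAIL outranks NEEDS-ATTENTION, which outranks PASS.
function aggregateVerdicts(verdicts: ReviewerVerdict[]): OverallVerdict {
  if (verdicts.some((v) => v === "FAIL")) return "needs-remediation";
  if (verdicts.some((v) => v === "NEEDS-ATTENTION")) return "needs-attention";
  // Reached when every reviewer passed (or when the list is empty).
  return "pass";
}
```

Note that an empty reviewer list falls through to `pass` in this sketch; a stricter version might treat that case as an error.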
### Step 3 — Write VALIDATION File
### Step 3 — Persist Validation
Write to `{{validationPath}}`:
Prepare the validation content you will pass to `gsd_validate_milestone`. Do **not** manually write `{{validationPath}}` — the DB-backed tool is the canonical write path and renders the validation file for you.
```markdown
---
@@ -69,13 +69,15 @@ reviewers: 3
<if verdict is not pass: specific actions required>
```
Call `gsd_validate_milestone` with the camelCase fields `milestoneId`, `verdict`, `remediationRound`, `successCriteriaChecklist`, `sliceDeliveryAudit`, `crossSliceIntegration`, `requirementCoverage`, `verdictRationale`, and `remediationPlan` when needed. If you include verification-class analysis, pass it in `verificationClasses`.
**DB access safety:** Do NOT query `.gsd/gsd.db` directly via `sqlite3` or `node -e require('better-sqlite3')` — the engine owns the WAL connection. Use `gsd_milestone_status` to read milestone and slice state. All data you need is already inlined in the context above or accessible via the `gsd_*` tools. Direct DB access corrupts the WAL and bypasses tool-level validation.
If verdict is `needs-remediation`:
- Add new slices to `{{roadmapPath}}` with unchecked `[ ]` status
- These slices will be planned and executed before validation re-runs
- Use `gsd_reassess_roadmap` to add the remediation slices instead of editing `{{roadmapPath}}` manually
- Those slices will be planned and executed before validation re-runs
**You MUST write `{{validationPath}}` before finishing.**
**You MUST call `gsd_validate_milestone` before finishing. Do not manually write `{{validationPath}}`.**
**File system safety:** When scanning milestone directories for evidence, use `ls` or `find` to list directory contents first — never pass a directory path (e.g. `tasks/`, `slices/`) directly to the `read` tool. The `read` tool only accepts file paths, not directories.
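
To make the `gsd_validate_milestone` call described above more concrete, here is a hedged sketch of the argument object. The field names are the ones the prompt lists; the types, the helper `validationProblems`, and the reading of "when needed" as "whenever the verdict is not `pass`" are assumptions for illustration, not the tool's documented contract.

```typescript
type Verdict = "pass" | "needs-attention" | "needs-remediation";

// Hypothetical shape: names from the prompt, types assumed.
interface ValidateMilestoneArgs {
  milestoneId: string;
  verdict: Verdict;
  remediationRound: number;
  successCriteriaChecklist: string;
  sliceDeliveryAudit: string;
  crossSliceIntegration: string;
  requirementCoverage: string;
  verdictRationale: string;
  remediationPlan?: string;     // assumed to be expected whenever verdict is not "pass"
  verificationClasses?: string; // optional verification-class analysis
}

// Lists problems with the payload under the assumption stated above.
function validationProblems(args: ValidateMilestoneArgs): string[] {
  const problems: string[] = [];
  if (args.verdict !== "pass" && !args.remediationPlan?.trim()) {
    problems.push("remediationPlan is expected when the verdict is not pass");
  }
  return problems;
}

// Illustrative values only.
const exampleValidation: ValidateMilestoneArgs = {
  milestoneId: "M001",
  verdict: "pass",
  remediationRound: 0,
  successCriteriaChecklist: "- [x] Criterion with supporting evidence",
  sliceDeliveryAudit: "Each slice compared against what it claimed to deliver",
  crossSliceIntegration: "Boundaries between slices checked",
  requirementCoverage: "Every active requirement mapped to a slice",
  verdictRationale: "All three reviewers passed",
};
```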

@@ -71,11 +71,13 @@ test("execute-task prompt references gsd_complete_task tool", () => {
assert.match(prompt, /gsd_complete_task/);
});
test("execute-task prompt instructs writing task summary before tool call", () => {
test("execute-task prompt uses gsd_complete_task as canonical summary write path", () => {
const prompt = readPrompt("execute-task");
// The prompt instructs writing the summary file AND calling the tool
assert.match(prompt, /\{\{taskSummaryPath\}\}/);
assert.match(prompt, /gsd_complete_task/);
assert.match(prompt, /DB-backed tool is the canonical write path/i);
assert.match(prompt, /Do \*\*not\*\* manually write `?\{\{taskSummaryPath\}\}`?/i);
assert.doesNotMatch(prompt, /^\d+\.\s+Write `?\{\{taskSummaryPath\}\}`?\s*$/m);
});
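
The two new patterns in the test above are easiest to follow against sample input. The sketch below reuses the regular expressions verbatim and runs them against two abbreviated prompt excerpts; the excerpts are shortened stand-ins written for this example, not the full prompt files.

```typescript
// Same patterns as in the test; the `m` flag lets ^ and $ anchor to individual lines.
const bareWriteStep = /^\d+\.\s+Write `?\{\{taskSummaryPath\}\}`?\s*$/m;
const prohibition = /Do \*\*not\*\* manually write `?\{\{taskSummaryPath\}\}`?/i;

// Abbreviated stand-in for the prompt before this change.
const oldPrompt = [
  "18. Read the template",
  "19. Write `{{taskSummaryPath}}`",
  "20. Call `gsd_complete_task`",
].join("\n");

// Abbreviated stand-in for the prompt after this change.
const newPrompt = [
  "18. Read the template",
  "19. Prepare the completion content. Do **not** manually write `{{taskSummaryPath}}`.",
  "20. Call `gsd_complete_task`",
].join("\n");

const oldHasBareStep = bareWriteStep.test(oldPrompt);
const newHasBareStep = bareWriteStep.test(newPrompt);
const newHasProhibition = prohibition.test(newPrompt);
```

Because the bare-step pattern is anchored to a whole line, the placeholder can still appear inside the prohibition sentence without tripping `assert.doesNotMatch`.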
test("execute-task prompt does not instruct LLM to toggle checkboxes manually", () => {
@@ -119,10 +121,14 @@ test("guided-complete-slice prompt references gsd_slice_complete tool", () => {
test("complete-slice prompt instructs writing summary and UAT files before tool call", () => {
const prompt = readPrompt("complete-slice");
// The prompt instructs writing the summary AND UAT files, then calling the tool
assert.match(prompt, /\{\{sliceSummaryPath\}\}/);
assert.match(prompt, /\{\{sliceUatPath\}\}/);
assert.match(prompt, /gsd_complete_slice/);
assert.match(prompt, /DB-backed tool is the canonical write path/i);
assert.match(prompt, /Do \*\*not\*\* manually write `?\{\{sliceSummaryPath\}\}`?/i);
assert.match(prompt, /Do \*\*not\*\* manually write `?\{\{sliceUatPath\}\}`?/i);
assert.doesNotMatch(prompt, /^\d+\.\s+Write `?\{\{sliceSummaryPath\}\}`?.*$/m);
assert.doesNotMatch(prompt, /^\d+\.\s+Write `?\{\{sliceUatPath\}\}`?.*$/m);
});
test("complete-slice prompt preserves decisions and knowledge review steps", () => {
@@ -131,6 +137,15 @@ test("complete-slice prompt preserves decisions and knowledge review steps", ()
assert.match(prompt, /KNOWLEDGE\.md/);
});
test("validate-milestone prompt uses gsd_validate_milestone as canonical validation write path", () => {
const prompt = readPrompt("validate-milestone");
assert.match(prompt, /gsd_validate_milestone/);
assert.match(prompt, /\{\{validationPath\}\}/);
assert.match(prompt, /DB-backed tool is the canonical write path/i);
assert.match(prompt, /Do \*\*not\*\* manually write `?\{\{validationPath\}\}`?/i);
assert.doesNotMatch(prompt, /Write to `?\{\{validationPath\}\}`?:/i);
});
test("complete-slice prompt still contains template variables for context", () => {
const prompt = readPrompt("complete-slice");
assert.match(prompt, /\{\{sliceSummaryPath\}\}/);