Merge pull request #3988 from mastertyko/fix/3717-read-existing-artifacts-before-write

fix(gsd): read existing artifacts before write
Jeremy McSpadden 2026-04-11 22:50:53 -05:00 committed by GitHub
commit 083fc8e03d
3 changed files with 54 additions and 20 deletions

prompts/execute-task.md

@@ -32,29 +32,30 @@ Then:
0. Narrate step transitions, key implementation decisions, and verification outcomes as you work. Keep it terse — one line between tool-call clusters, not between every call — but write complete sentences in user-facing prose, not shorthand notes or scratchpad fragments.
1. {{skillActivation}} Follow any activated skills before writing code. If no skills match this task, skip this step.
2. Execute the steps in the inlined task plan, adapting to minor local mismatches when the surrounding code differs from the planner's snapshot.
-3. Build the real thing. If the task plan says "create login endpoint", build an endpoint that actually authenticates against a real store, not one that returns a hardcoded success response. If the task plan says "create dashboard page", build a page that renders real data from the API, not a component with hardcoded props. Stubs and mocks are for tests, not for the shipped feature.
-4. Write or update tests as part of execution — tests are verification, not an afterthought. If the slice plan defines test files in its Verification section and this is the first task, create them (they should initially fail).
-5. When implementing non-trivial runtime behavior (async flows, API boundaries, background processes, error paths), add or preserve agent-usable observability. Skip this for simple changes where it doesn't apply.
+3. Before any `Write` that creates an artifact or output file, check whether that path already exists. If it does, read it first and decide whether the work is already done, should be extended, or truly needs replacement. "Create" in the plan does **not** mean the file is missing — a prior session may already have started it.
+4. Build the real thing. If the task plan says "create login endpoint", build an endpoint that actually authenticates against a real store, not one that returns a hardcoded success response. If the task plan says "create dashboard page", build a page that renders real data from the API, not a component with hardcoded props. Stubs and mocks are for tests, not for the shipped feature.
+5. Write or update tests as part of execution — tests are verification, not an afterthought. If the slice plan defines test files in its Verification section and this is the first task, create them (they should initially fail).
+6. When implementing non-trivial runtime behavior (async flows, API boundaries, background processes, error paths), add or preserve agent-usable observability. Skip this for simple changes where it doesn't apply.
**Background process rule:** Never use bare `command &` to run background processes. The shell's `&` operator leaves stdout/stderr attached to the parent, which causes the Bash tool to hang indefinitely waiting for those streams to close. Always redirect output before backgrounding:
- Correct: `command > /dev/null 2>&1 &` or `nohup command > /dev/null 2>&1 &`
- Example: `python -m http.server 8080 > /dev/null 2>&1 &` (NOT `python -m http.server 8080 &`)
- Preferred: use the `bg_shell` tool if available — it manages process lifecycle correctly without stream-inheritance issues
-6. If the task plan includes a **Failure Modes** section (Q5), implement the error/timeout/malformed handling specified. Verify each dependency's failure path is handled. Skip if the section is absent.
-7. If the task plan includes a **Load Profile** section (Q6), implement protections for the identified 10x breakpoint (connection pooling, rate limiting, pagination, etc.). Skip if absent.
-8. If the task plan includes a **Negative Tests** section (Q7), write the specified negative test cases alongside the happy-path tests — malformed inputs, error paths, and boundary conditions. Skip if absent.
-9. Verify must-haves are met by running concrete checks (tests, commands, observable behaviors)
-10. Run the slice-level verification checks defined in the slice plan's Verification section. Track which pass. On the final task of the slice, all must pass before marking done. On intermediate tasks, partial passes are expected — note which ones pass in the summary.
-11. After the verification gate runs (you'll see gate results in stderr/notify output), populate the `## Verification Evidence` table in your task summary with the check results. Use the `formatEvidenceTable` format: one row per check with command, exit code, verdict (✅ pass / ❌ fail), and duration. If no verification commands were discovered, note that in the section.
-12. If the task touches UI, browser flows, DOM behavior, or user-visible web state:
+7. If the task plan includes a **Failure Modes** section (Q5), implement the error/timeout/malformed handling specified. Verify each dependency's failure path is handled. Skip if the section is absent.
+8. If the task plan includes a **Load Profile** section (Q6), implement protections for the identified 10x breakpoint (connection pooling, rate limiting, pagination, etc.). Skip if absent.
+9. If the task plan includes a **Negative Tests** section (Q7), write the specified negative test cases alongside the happy-path tests — malformed inputs, error paths, and boundary conditions. Skip if absent.
+10. Verify must-haves are met by running concrete checks (tests, commands, observable behaviors).
+11. Run the slice-level verification checks defined in the slice plan's Verification section. Track which pass. On the final task of the slice, all must pass before marking done. On intermediate tasks, partial passes are expected — note which ones pass in the summary.
+12. After the verification gate runs (you'll see gate results in stderr/notify output), populate the `## Verification Evidence` table in your task summary with the check results. Use the `formatEvidenceTable` format: one row per check with command, exit code, verdict (✅ pass / ❌ fail), and duration. If no verification commands were discovered, note that in the section.
+13. If the task touches UI, browser flows, DOM behavior, or user-visible web state:
- exercise the real flow in the browser
- prefer `browser_batch` when the next few actions are obvious and sequential
- prefer `browser_assert` for explicit pass/fail verification of the intended outcome
- use `browser_diff` when an action's effect is ambiguous
- use console/network/dialog diagnostics when validating async, stateful, or failure-prone UI
- record verification in terms of explicit checks passed/failed, not only prose interpretation
-13. If the task plan includes an Observability Impact section, verify those signals directly. Skip this step if the task plan omits the section.
-14. **If execution is running long or verification fails:**
+14. If the task plan includes an Observability Impact section, verify those signals directly. Skip this step if the task plan omits the section.
+15. **If execution is running long or verification fails:**
**Context budget:** You have approximately **{{verificationBudget}}** reserved for verification context. If you've used most of your context and haven't finished all steps, stop implementing and prioritize writing the task summary with clear notes on what's done and what remains. A partial summary that enables clean resumption is more valuable than one more half-finished step with no documentation. Never sacrifice summary quality for one more implementation step.
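
For reference, the `## Verification Evidence` table that step 12 asks for might look like the sketch below. The exact `formatEvidenceTable` columns aren't shown in this PR, so the headings and sample rows here are assumptions:

| Check | Exit code | Verdict | Duration |
| --- | --- | --- | --- |
| `npm test` | 0 | ✅ pass | 14.2s |
| `curl -f localhost:8080/health` | 22 | ❌ fail | 0.4s |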
@@ -65,13 +66,13 @@ Then:
- Distinguish "I know" from "I assume." Observable facts (the error says X) are strong evidence. Assumptions (this library should work this way) need verification.
- Know when to stop. If you've tried 3+ fixes without progress, your mental model is probably wrong. Stop. List what you know for certain. List what you've ruled out. Form fresh hypotheses from there.
- Don't fix symptoms. Understand *why* something fails before changing code. A test that passes after a change you don't understand is luck, not a fix.
-15. **Blocker discovery:** If execution reveals that the remaining slice plan is fundamentally invalid — not just a bug or minor deviation, but a plan-invalidating finding like a wrong API, missing capability, or architectural mismatch — set `blocker_discovered: true` in the task summary frontmatter and describe the blocker clearly in the summary narrative. Do NOT set `blocker_discovered: true` for ordinary debugging, minor deviations, or issues that can be fixed within the current task or the remaining plan. This flag triggers an automatic replan of the slice.
-16. If you made an architectural, pattern, library, or observability decision during this task that downstream work should know about, append it to `.gsd/DECISIONS.md` (read the template at `~/.gsd/agent/extensions/gsd/templates/decisions.md` if the file doesn't exist yet). Not every task produces decisions — only append when a meaningful choice was made.
-17. If you discover a non-obvious rule, recurring gotcha, or useful pattern during execution, append it to `.gsd/KNOWLEDGE.md`. Only add entries that would save future agents from repeating your investigation. Don't add obvious things.
-18. Read the template at `~/.gsd/agent/extensions/gsd/templates/task-summary.md`
-19. Use that template to prepare the completion content you will pass to `gsd_complete_task` using the camelCase fields `milestoneId`, `sliceId`, `taskId`, `oneLiner`, `narrative`, `verification`, and `verificationEvidence`. Do **not** manually write `{{taskSummaryPath}}` — the DB-backed tool is the canonical write path and renders the summary file for you.
-20. Call `gsd_complete_task` with milestoneId, sliceId, taskId, and the completion fields derived from the template. This is your final required step — do NOT manually edit PLAN.md checkboxes. The tool marks the task complete, updates the DB, renders `{{taskSummaryPath}}`, and updates PLAN.md automatically.
-21. Do not run git commands — the system reads your task summary after completion and creates a meaningful commit from it (type inferred from title, message from your one-liner, key files from frontmatter). Write a clear, specific one-liner in the summary — it becomes the commit message.
+16. **Blocker discovery:** If execution reveals that the remaining slice plan is fundamentally invalid — not just a bug or minor deviation, but a plan-invalidating finding like a wrong API, missing capability, or architectural mismatch — set `blocker_discovered: true` in the task summary frontmatter and describe the blocker clearly in the summary narrative. Do NOT set `blocker_discovered: true` for ordinary debugging, minor deviations, or issues that can be fixed within the current task or the remaining plan. This flag triggers an automatic replan of the slice.
+17. If you made an architectural, pattern, library, or observability decision during this task that downstream work should know about, append it to `.gsd/DECISIONS.md` (read the template at `~/.gsd/agent/extensions/gsd/templates/decisions.md` if the file doesn't exist yet). Not every task produces decisions — only append when a meaningful choice was made.
+18. If you discover a non-obvious rule, recurring gotcha, or useful pattern during execution, append it to `.gsd/KNOWLEDGE.md`. Only add entries that would save future agents from repeating your investigation. Don't add obvious things.
+19. Read the template at `~/.gsd/agent/extensions/gsd/templates/task-summary.md`.
+20. Use that template to prepare the completion content you will pass to `gsd_complete_task` using the camelCase fields `milestoneId`, `sliceId`, `taskId`, `oneLiner`, `narrative`, `verification`, and `verificationEvidence`. Do **not** manually write `{{taskSummaryPath}}` — the DB-backed tool is the canonical write path and renders the summary file for you.
+21. Call `gsd_complete_task` with milestoneId, sliceId, taskId, and the completion fields derived from the template. This is your final required step — do NOT manually edit PLAN.md checkboxes. The tool marks the task complete, updates the DB, renders `{{taskSummaryPath}}`, and updates PLAN.md automatically.
+22. Do not run git commands — the system reads your task summary after completion and creates a meaningful commit from it (type inferred from title, message from your one-liner, key files from frontmatter). Write a clear, specific one-liner in the summary — it becomes the commit message.
All work stays in your working directory: `{{workingDirectory}}`.
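
The new step 3 amounts to a small guard: stat the target path, read it if present, then choose between keeping, extending, or replacing the file. A minimal sketch of that flow in Node, where `writeArtifact` and `decide` are illustrative names rather than anything this PR ships:

```js
import { existsSync, readFileSync, writeFileSync } from "node:fs";

// Hypothetical helper sketching the read-before-write rule.
// `decide` stands in for the agent's judgment and returns
// "keep", "extend", or "replace".
function writeArtifact(path, content, decide) {
  if (existsSync(path)) {
    const existing = readFileSync(path, "utf-8");
    const verdict = decide(existing);
    if (verdict === "keep") return; // work already done by a prior session
    if (verdict === "extend") {
      // prior session started it; append rather than clobber
      writeFileSync(path, existing + "\n" + content);
      return;
    }
  }
  writeFileSync(path, content); // missing, or truly needs replacement
}
```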

prompts/guided-resume-task.md

@@ -1 +1 @@
-Resume interrupted work. Find the continue file (`{{sliceId}}-CONTINUE.md` or `continue.md`) in slice {{sliceId}} of milestone {{milestoneId}}, read it, and use it as the recovery contract for where to pick up. Do **not** delete the continue file immediately. Keep it until the task is successfully completed or you have written a newer summary/continue artifact that clearly supersedes it. If the resumed attempt fails again, update or replace the continue file so no recovery context is lost. {{skillActivation}}
+Resume interrupted work. Find the continue file (`{{sliceId}}-CONTINUE.md` or `continue.md`) in slice {{sliceId}} of milestone {{milestoneId}}, read it, and use it as the recovery contract for where to pick up. Before you create any expected artifact or output file, check whether it already exists and read it first — a prior session may already have started or completed that work. Do **not** delete the continue file immediately. Keep it until the task is successfully completed or you have written a newer summary/continue artifact that clearly supersedes it. If the resumed attempt fails again, update or replace the continue file so no recovery context is lost. {{skillActivation}}
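
The resume prompt's first move is locating the continue file under one of its two allowed names. A sketch of that lookup, where `sliceDir` and the helper name are assumptions for illustration:

```js
import { existsSync } from "node:fs";
import { join } from "node:path";

// Illustrative lookup for the continue file named in the prompt:
// prefer `<sliceId>-CONTINUE.md`, fall back to `continue.md`.
function findContinueFile(sliceDir, sliceId) {
  const candidates = [`${sliceId}-CONTINUE.md`, "continue.md"];
  return candidates.map((name) => join(sliceDir, name)).find((p) => existsSync(p)) ?? null;
}
```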

(new test file)

@@ -0,0 +1,33 @@
+import test from "node:test";
+import assert from "node:assert/strict";
+import { readFileSync } from "node:fs";
+import { dirname, join } from "node:path";
+import { fileURLToPath } from "node:url";
+
+const __dirname = dirname(fileURLToPath(import.meta.url));
+const promptsDir = join(__dirname, "..", "prompts");
+
+test("execute-task prompt requires reading existing artifacts before write", () => {
+  const prompt = readFileSync(join(promptsDir, "execute-task.md"), "utf-8");
+
+  assert.match(
+    prompt,
+    /Before any `Write` that creates an artifact or output file, check whether that path already exists\./,
+    "execute-task prompt should require an existence check before creating artifacts",
+  );
+  assert.match(
+    prompt,
+    /If it does, read it first and decide whether the work is already done, should be extended, or truly needs replacement\./,
+    "execute-task prompt should require reading existing artifacts before replacement",
+  );
+});
+
+test("guided resume prompt checks for pre-existing artifacts", () => {
+  const prompt = readFileSync(join(promptsDir, "guided-resume-task.md"), "utf-8");
+
+  assert.match(
+    prompt,
+    /Before you create any expected artifact or output file, check whether it already exists and read it first/i,
+    "guided resume prompt should guard pre-existing artifacts",
+  );
+});
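
Assuming the test file sits in a directory alongside `prompts/` (which is what `join(__dirname, "..", "prompts")` implies), Node's built-in runner picks it up with `node --test path/to/that/directory`; no third-party test framework is required. The tests pin the exact prompt wording, so any later rephrasing of the read-before-write rule will fail them and force a deliberate update.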