fix: run forensics duplicate detection before investigation (#2704) (#3260)

Move the dedup check from after the Investigation Protocol to before it,
so already-known bugs are caught before spending tokens on deep source
analysis. The DEDUP_PROMPT_SECTION now acts as a pre-investigation gate
with a decision to skip full investigation when a match is found.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Tom Boucher 2026-03-30 15:50:06 -04:00 committed by GitHub
parent 3d896eee8a
commit 0964c97931
3 changed files with 44 additions and 15 deletions
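The reordering described above turns the dedup check into a three-way gate that runs before any source reading. Below is a minimal TypeScript sketch of that decision. The `DedupMatch` and `NextStep` types and `decideNextStep` are hypothetical names used only for illustration, because the commit expresses this logic as prompt text for the agent, not as code. The sketch also assumes a merged PR takes precedence over an open issue, which matches the order of the bullets in the new Decision Gate.

```typescript
// Hypothetical shape of a search hit; in the real change the agent makes this judgment itself.
type DedupMatch =
  | { kind: "merged-pr"; number: number }
  | { kind: "open-issue"; number: number };

// Hypothetical outcomes mirroring the three Decision Gate bullets.
type NextStep =
  | { action: "report-fixed"; message: string }
  | { action: "report-existing"; message: string; offerEvidence: true }
  | { action: "investigate" };

function decideNextStep(matches: DedupMatch[], userWantsDeepDive = false): NextStep {
  // A merged PR that fixes the symptom ends the run before any deep analysis.
  const mergedPr = matches.find((m) => m.kind === "merged-pr");
  if (mergedPr) {
    return { action: "report-fixed", message: `Already fixed by PR #${mergedPr.number}` };
  }
  // An open issue also skips investigation, unless the user asks for deeper analysis.
  const openIssue = matches.find((m) => m.kind === "open-issue");
  if (openIssue && !userWantsDeepDive) {
    return {
      action: "report-existing",
      message: `Existing issue #${openIssue.number} covers this.`,
      offerEvidence: true,
    };
  }
  // No usable match: fall through to the full Investigation Protocol.
  return { action: "investigate" };
}
```

The explicit `userWantsDeepDive` flag models the prompt's wording that an open-issue match skips investigation "unless user asks for deeper analysis".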

@@ -106,13 +106,15 @@ interface ForensicReport {
 // ─── Duplicate Detection ──────────────────────────────────────────────────────
 const DEDUP_PROMPT_SECTION = `
-## Duplicate Detection (REQUIRED before issue creation)
+## Pre-Investigation: Duplicate Check (REQUIRED)
-Before offering to create a GitHub issue, you MUST search for existing issues and PRs that may already address this bug. This step uses the user's AI tokens for analysis.
+Before reading GSD source code or performing deep analysis, you MUST search for existing issues and PRs that may already address this bug. This avoids wasting tokens on already-fixed bugs.
 ### Search Steps
-1. **Search closed issues** for similar keywords from your diagnosis:
+Use keywords from the user's problem description and the anomaly summaries in the forensic report above.
+1. **Search closed issues** for similar keywords:
 \`\`\`
 gh issue list --repo gsd-build/gsd-2 --state closed --search "<keywords from root cause>" --limit 20
 \`\`\`
@@ -129,20 +131,16 @@ Before offering to create a GitHub issue, you MUST search for existing issues an
 ### Analysis
-For each result, compare it against your root-cause diagnosis:
+For each result, compare it against the user's reported symptoms and the forensic anomalies:
 - Does the issue describe the same code path or file?
-- Does the PR modify the same file:line you identified?
+- Does the PR modify the area related to the reported symptoms?
 - Is the symptom description semantically similar even if keywords differ?
-### Present Findings
+### Decision Gate
-If you find potential matches, present them to the user:
-1. **"Already fixed by PR #X — skip issue creation"** when a merged PR or closed issue clearly addresses the same root cause. Explain why you believe it matches.
-2. **"Add my findings to existing issue #Y"** when an open issue exists for the same bug. Use \`gh issue comment #Y --repo gsd-build/gsd-2\` to add forensic evidence.
-3. **"Create new issue anyway"** when existing results do not cover this specific failure.
-Only proceed to issue creation if no matches were found OR the user explicitly chooses "Create new issue anyway".
+- **Merged PR clearly fixes the described symptom** Report "Already fixed by PR #X" with brief explanation. Skip full investigation.
+- **Open issue matches** Report "Existing issue #Y covers this." Offer to add forensic evidence. Skip full investigation unless user asks for deeper analysis.
+- **No matches** Proceed to full investigation below.
 `;
 async function writeForensicsDedupPref(ctx: ExtensionCommandContext, enabled: boolean): Promise<void> {
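The updated Search Steps tell the agent to take keywords from the user's problem description and the forensic anomaly summaries, then run `gh issue list` against gsd-build/gsd-2. A sketch of that step follows, assuming a naive length-and-stopword filter for keyword choice. `STOPWORDS`, `pickKeywords`, and `buildClosedIssueSearch` are hypothetical; only the `gh issue list` flags come from the prompt.

```typescript
// Hypothetical stopword list, for illustration only; the real prompt leaves keyword choice to the agent.
const STOPWORDS = new Set(["the", "and", "was", "but", "for", "with", "when", "after", "this", "that"]);

// Collect distinct lowercase keywords from the description and anomaly summaries, in first-seen order.
function pickKeywords(description: string, anomalySummaries: string[], max = 6): string[] {
  const words = [description, ...anomalySummaries]
    .join(" ")
    .toLowerCase()
    .split(/[^a-z0-9_-]+/)
    .filter((w) => w.length > 2 && !STOPWORDS.has(w));
  return Array.from(new Set(words)).slice(0, max);
}

// Build the closed-issue search from the prompt as an argv array (no shell string involved).
function buildClosedIssueSearch(keywords: string[], repo = "gsd-build/gsd-2"): string[] {
  const query = keywords.join(" ");
  return ["gh", "issue", "list", "--repo", repo, "--state", "closed", "--search", query, "--limit", "20"];
}
```

Because the result is an argv array, it can be passed to Node's `execFileSync(args[0], args.slice(1))`, so keywords containing quotes are never reinterpreted by a shell.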

@@ -102,6 +102,8 @@ A stale lock (PID is dead) means the previous auto-mode session crashed mid-unit
 A unit dispatched more than once (`type/id` appears multiple times) indicates a stuck loop — the unit completed but artifact verification failed.
+{{dedupSection}}
 ## Investigation Protocol
 1. **Start with the pre-parsed forensic report** above. The anomaly section contains automated findings — treat these as leads, not conclusions.
@@ -133,8 +135,6 @@ Explain your findings:
 - **Code snippet** — the problematic code and what it should do instead
 - **Recovery** — what the user can do right now to get unstuck
-{{dedupSection}}
 Then **offer GitHub issue creation**: "Would you like me to create a GitHub issue for this on gsd-build/gsd-2?"
 **CRITICAL: The `github_issues` tool ONLY targets the current user's repository — it has no `repo` parameter. You MUST use `gh issue create --repo gsd-build/gsd-2` via the `bash` tool to file on the correct repo. Do NOT use the `github_issues` tool for this.**
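The prompt-template change only moves the `{{dedupSection}}` placeholder ahead of the Investigation Protocol. The sketch below shows one way the rendered prompt might be assembled. It assumes, as `writeForensicsDedupPref` and the opt-in notice test suggest, that the placeholder expands to `DEDUP_PROMPT_SECTION` only when the user has enabled dedup and to an empty string otherwise. `renderForensicsPrompt` and `dedupRunsFirst` are hypothetical helpers, not part of the commit.

```typescript
// Hypothetical renderer: substitute every {{dedupSection}} placeholder, or blank it when dedup is off.
function renderForensicsPrompt(template: string, dedupSection: string, dedupEnabled: boolean): string {
  return template.split("{{dedupSection}}").join(dedupEnabled ? dedupSection : "");
}

// The ordering guarantee the new test asserts on the raw template, applied to the rendered prompt.
function dedupRunsFirst(rendered: string, dedupHeading: string): boolean {
  const dedupIndex = rendered.indexOf(dedupHeading);
  const investigationIndex = rendered.indexOf("## Investigation Protocol");
  return dedupIndex !== -1 && investigationIndex !== -1 && dedupIndex < investigationIndex;
}
```

The commit's test checks the raw template, which is cheap and independent of the user's preference. A check like `dedupRunsFirst` on the rendered output would additionally catch a renderer that relocated the section.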

@@ -46,3 +46,34 @@ describe("forensics dedup (#2096)", () => {
       "opt-in notice must mention duplicate detection");
   });
 });
+describe("forensics dedup ordering (#2704)", () => {
+  it("{{dedupSection}} appears before Investigation Protocol in the prompt template", () => {
+    const prompt = readFileSync(join(gsdDir, "prompts", "forensics.md"), "utf-8");
+    const dedupIndex = prompt.indexOf("{{dedupSection}}");
+    const investigationIndex = prompt.indexOf("## Investigation Protocol");
+    assert.ok(dedupIndex !== -1, "prompt must contain {{dedupSection}}");
+    assert.ok(investigationIndex !== -1, "prompt must contain ## Investigation Protocol");
+    assert.ok(
+      dedupIndex < investigationIndex,
+      `{{dedupSection}} (index ${dedupIndex}) must appear before Investigation Protocol (index ${investigationIndex}) — dedup should run before expensive investigation to avoid wasting tokens on already-fixed bugs`,
+    );
+  });
+  it("DEDUP_PROMPT_SECTION contains a decision gate to skip investigation", () => {
+    const source = readFileSync(join(gsdDir, "forensics.ts"), "utf-8");
+    // The dedup section must instruct the agent to skip investigation when a match is found
+    assert.ok(
+      source.includes("Skip full investigation") || source.includes("skip full investigation") || source.includes("Skip investigation"),
+      "DEDUP_PROMPT_SECTION must contain a decision gate telling the agent to skip full investigation when a duplicate is found",
+    );
+  });
+  it("DEDUP_PROMPT_SECTION heading reflects pre-investigation role", () => {
+    const source = readFileSync(join(gsdDir, "forensics.ts"), "utf-8");
+    assert.ok(
+      source.includes("Pre-Investigation") || source.includes("pre-investigation"),
+      "DEDUP_PROMPT_SECTION heading must indicate it runs before investigation, not just before issue creation",
+    );
+  });
+});