Merge pull request #4019 from jeremymcs/fix/4018-anti-fabrication-guardrails

fix(gsd): enforce anti-fabrication turn-taking in discuss prompts
Jeremy McSpadden 2026-04-12 00:29:47 -05:00 committed by GitHub
commit 8ffec7fad0
8 changed files with 32 additions and 3 deletions

@@ -275,7 +275,7 @@ Work flows through these phases. Each phase produces a file.
**How to do it manually:**
1. Read the roadmap to understand the scope.
2. Identify 3-5 gray areas — implementation decisions the user cares about.
3. Use `ask_user_questions` to discuss each area.
3. Use `ask_user_questions` to discuss each area, one round at a time. Never fabricate user input; wait for the user's actual response before the next round.
4. Write decisions to the appropriate context file (`M###-CONTEXT.md` or `S##-CONTEXT.md`).
5. Do NOT discuss how to implement — only what the user wants.

@@ -73,6 +73,8 @@ After each round of answers, decide whether you already have enough depth to wri
You are a thinking partner, not an interviewer.
**Turn-taking contract (non-bypassable).** Never fabricate, simulate, or role-play user responses. Never generate fake transcript markers like `[User]`, `[Human]`, or `User:` to invent input. Ask one question round (1-3 questions) per turn, then stop and wait for the user's actual response before continuing. If you use `ask_user_questions`, call it at most once per turn and treat its returned response as the only valid structured user input for that round.
**Start open, follow energy.** Let the user's enthusiasm guide where you dig deeper. If they light up about a particular aspect, explore it. If they're vague about something, that's where you probe.
**Challenge vagueness, make abstract concrete.** When the user says something abstract ("it should be smart" / "it needs to handle edge cases" / "good UX"), push for specifics. What does "smart" mean in practice? Which edge cases? What does good UX look like for this specific interaction?
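
The turn-taking contract above is enforced only through prompt wording. As an illustration, a host application could also scan model output for the transcript markers the contract names. This is a minimal sketch, not part of this PR: the function name `containsFabricatedUserTurn` and the choice to match only at line starts are assumptions.

```typescript
// Hypothetical output check; not part of the GSD codebase.
// The marker list mirrors the prompt text: [User], [Human], User:
// Assumption: only markers at the start of a line count as a fake turn.
const FAKE_TURN_MARKER = /^[ \t]*(?:\[(?:User|Human)\]|User:)/im;

function containsFabricatedUserTurn(output: string): boolean {
  // The regex has no `g` flag, so test() keeps no state between calls.
  return FAKE_TURN_MARKER.test(output);
}
```

A check like this can produce false positives, for example when the model legitimately quotes a log line that begins with `User:`, so it would suit a warning better than a hard failure.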

@@ -32,6 +32,8 @@ Ask **1-3 questions per round**. Keep each question focused on one of:
- **The biggest technical unknowns / risks** — what could fail, what hasn't been proven
- **What external systems/services this touches** — APIs, databases, third-party services
**Never fabricate or simulate user input.** Never generate fake transcript markers like `[User]`, `[Human]`, or `User:`. Ask one question round, then wait for the user's actual response before continuing.
**If `{{structuredQuestionsAvailable}}` is `true`:** use `ask_user_questions` for each round. 1-3 questions per call, each as a separate question object. Keep option labels short (3-5 words). Always include a freeform "Other / let me explain" option. When the user picks that option or writes a long freeform answer, switch to plain text follow-up for that thread before resuming structured questions. **IMPORTANT: Call `ask_user_questions` exactly once per turn. Never make multiple calls with the same or overlapping questions — wait for the user's response before asking the next round.**
**If `{{structuredQuestionsAvailable}}` is `false`:** ask questions in plain text. Keep each round to 1-3 focused questions. Wait for answers before asking the next round.
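
The "exactly once per turn" and "1-3 questions" rules above could also be checked at the tool-dispatch layer. The sketch below shows one possible shape for that check. The class `QuestionTurnGuard` and its methods are hypothetical names, and nothing in this PR adds such a guard.

```typescript
// Hypothetical per-turn guard; illustrative only, not the GSD API.
class QuestionTurnGuard {
  private askedThisTurn = false;

  // Call before dispatching ask_user_questions. Throws on a contract violation.
  registerRound(questions: string[]): void {
    if (this.askedThisTurn) {
      throw new Error("ask_user_questions already called this turn; wait for the user's response");
    }
    if (questions.length < 1 || questions.length > 3) {
      throw new Error(`expected 1-3 questions per round, got ${questions.length}`);
    }
    this.askedThisTurn = true;
  }

  // Call only when an actual user response has arrived.
  onUserResponse(): void {
    this.askedThisTurn = false;
  }
}
```

A second `registerRound` call in the same turn throws until `onUserResponse` resets the guard, which mirrors the wait-for-response rule in the prompt text.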

@@ -22,6 +22,8 @@ Do **not** go deep — just enough that your questions reflect what's actually t
### Question rounds
**Never fabricate or simulate user input.** Never generate fake transcript markers like `[User]`, `[Human]`, or `User:`. Ask one question round, then wait for the user's actual response before continuing.
**If `{{structuredQuestionsAvailable}}` is `true`:** Ask **1-3 questions per round** using `ask_user_questions`. **Call `ask_user_questions` exactly once per turn — never make multiple calls with the same or overlapping questions. Wait for the user's response before asking the next round.**
**If `{{structuredQuestionsAvailable}}` is `false`:** Ask **1-3 questions per round** in plain text. Number them and wait for the user's response before asking the next round.
Keep each question focused on one of:

@@ -18,6 +18,7 @@ Say exactly: "What do you want to add?" — nothing else. Wait for the user's an
## Discussion Phase
After they describe it, your job is to understand the new work deeply enough to create context files that a future planning session can use.
Never fabricate or simulate user input during this discussion. Never generate fake transcript markers like `[User]`, `[Human]`, or `User:`. Ask one question round, then wait for the user's actual response before continuing.
**If the user provides a file path or pastes a large document** (spec, design doc, product plan, chat export), read it fully before asking questions. Use it as the starting point — don't ask them to re-explain what's already in the document. Your questions should fill gaps and resolve ambiguities the document doesn't cover.
@@ -36,11 +37,11 @@ Don't go deep — just enough that your next question reflects what's actually t
- How the new work relates to existing milestones — overlap, dependencies, prerequisites
- If `.gsd/REQUIREMENTS.md` exists: which unmet Active or Deferred requirements this queued work advances
**Then use ask_user_questions** to dig into gray areas — scope boundaries, proof expectations, integration choices, tech preferences when they materially matter, and what's in vs out. 1-3 questions per round.
**Then use ask_user_questions** to dig into gray areas — scope boundaries, proof expectations, integration choices, tech preferences when they materially matter, and what's in vs out. Ask 1-3 questions per round, then wait for the user's response before asking the next round.
If a `GSD Skill Preferences` block is present in system context, use it to decide which skills to load and follow during discuss/planning work, but do not let it override the required discuss flow or artifact requirements.
**Self-regulate:** Do **not** ask a meta "ready to queue?" question after every round. Keep going until you have enough depth to write the context well, then use a single wrap-up prompt if needed. If the user clearly keeps adding detail instead of objecting, treat that as permission to continue.
**Self-regulate:** Do **not** ask a meta "ready to queue?" question after every round. Keep going until you have enough depth to write the context well, then use a single wrap-up prompt if needed. Do not infer permission to continue from silence or from partial prior answers — each new round requires an actual user response.
## Existing Milestone Awareness

@@ -35,6 +35,7 @@ GSD ships with bundled skills. Load the relevant skill file with the `read` tool
- Read before edit.
- Reproduce before fix when possible.
- Work is not done until the relevant verification has passed.
- **Never fabricate, simulate, or role-play user responses.** Never generate markers like `[User]`, `[Human]`, `User:`, or similar to represent user input inside your own output. Ask one question round (1-3 questions), then stop and wait for the user's actual response before continuing. If `ask_user_questions` is available, treat its returned response as the only valid structured user input for that round.
- Never print, echo, log, or restate secrets or credentials. Report only key names and applied/skipped status.
- Never ask the user to edit `.env` files or set secrets manually. Use `secure_env_collect`.
- In enduring files, write current state only unless the file is explicitly historical.

@@ -42,9 +42,19 @@ test("system prompt references CODEBASE.md and /gsd codebase", () => {
  assert.match(prompt, /auto-refreshes it when tracked files change/i);
});
test("system prompt hard rules forbid fabricating user responses", () => {
  const prompt = readPrompt("system");
  assert.match(prompt, /never fabricate, simulate, or role-play user responses/i);
  assert.match(prompt, /never generate markers like `?\[User\]`?, `?\[Human\]`?, `?User:`?/i);
  assert.match(prompt, /ask one question round \(1-3 questions\), then stop and wait for the user's actual response/i);
  assert.match(prompt, /ask_user_questions.*only valid structured user input/i);
});
test("discuss prompt allows implementation questions when they materially matter", () => {
  const prompt = readPrompt("discuss");
  assert.match(prompt, /Lead with experience, but ask implementation when it materially matters/i);
  assert.match(prompt, /Never fabricate, simulate, or role-play user responses/i);
  assert.match(prompt, /Ask one question round \(1-3 questions\) per turn, then stop and wait for the user's actual response/i);
  assert.match(prompt, /one gate, not two/i);
  assert.doesNotMatch(prompt, /Questions must be about the experience, not the implementation/i);
});
@@ -56,6 +66,8 @@ test("guided discussion prompts avoid wrap-up prompts after every round", () =>
  assert.match(slicePrompt, /Do \*\*not\*\* ask a meta "ready to wrap up\?" question after every round/i);
  assert.doesNotMatch(milestonePrompt, /I think I have a solid picture of this milestone\. Ready to wrap up/i);
  assert.doesNotMatch(slicePrompt, /I think I have a solid picture of this slice\. Ready to wrap up/i);
  assert.match(milestonePrompt, /Never fabricate or simulate user input/i);
  assert.match(slicePrompt, /Never fabricate or simulate user input/i);
});
test("guided milestone discussion scopes depth verification to the milestone id", () => {
@@ -64,6 +76,13 @@ test("guided milestone discussion scopes depth verification to the milestone id"
  assert.doesNotMatch(prompt, /depth_verification_confirm" — this enables the write-gate downstream/i, "legacy global depth gate wording should be gone");
});
test("queue prompt requires waiting for user response between rounds", () => {
  const prompt = readPrompt("queue");
  assert.match(prompt, /Never fabricate or simulate user input during this discussion/i);
  assert.match(prompt, /Ask 1-3 questions per round, then wait for the user's response before asking the next round\./i);
  assert.doesNotMatch(prompt, /treat that as permission to continue/i);
});
test("guided-resume-task prompt preserves recovery state until work is superseded", () => {
  const prompt = readPrompt("guided-resume-task");
  assert.match(prompt, /Do \*\*not\*\* delete the continue file immediately/i);
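
The system prompt test in this file matches the marker rule with optional backticks (`` `? ``), so the assertion should hold whether or not the prompt wraps each marker in code formatting. The sketch below exercises that pattern in isolation. The sample strings are made up for illustration and are not read from the prompt files.

```typescript
// Same pattern as the system prompt test; the sample strings are hypothetical.
const markerRule = /never generate markers like `?\[User\]`?, `?\[Human\]`?, `?User:`?/i;

const withBackticks = "Never generate markers like `[User]`, `[Human]`, `User:`, or similar";
const withoutBackticks = "never generate markers like [User], [Human], User:, or similar";
const reworded = "never generate markers like [User] or [Human]";

// Both formatting variants satisfy the rule; a reworded list does not.
const matchesBoth = markerRule.test(withBackticks) && markerRule.test(withoutBackticks);
const matchesReworded = markerRule.test(reworded);
```

The pattern tolerates formatting changes but still pins the order and punctuation of the marker list, so rewording the rule in the prompt would fail the test.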

@@ -78,6 +78,8 @@ Based on the user's message, route directly to the appropriate workflow:
**If user intent is unclear, ask minimal clarifying questions:**
- "Create a MIDI skill" → "Task-execution skill (does MIDI tasks) or domain expertise (complete MIDI knowledge base)?"
- "Work on my skill" → "Which skill? What do you want to do with it?"
- Ask one clarifying question round at a time, then wait for the user's actual response before asking another.
- Never fabricate or simulate user responses while clarifying (for example, fake `[User]` markers or imagined answers).
Then proceed directly to the workflow.
</routing>