Tests were checking `git log --oneline` for M001, but the refactor moved
milestone IDs from commit subject scopes to git trailers in the body.
Switch to `git log` (full format) so the trailer content is visible.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds `/gsd auto --yolo <spec-file>` (or `-y`) which reads a spec/PRD/ADR
and creates all milestone artifacts without interactive Q&A gates. Uses
the existing showHeadlessMilestoneCreation path — no changes to
startAuto or bootstrapAutoSession internals.
Rewrites discuss-headless.md to match the full rigor of the interactive
discuss.md prompt: mandatory codebase investigation, focused research
(table stakes, domain standards, omissions), capability contract with
R### traceability, gsd_plan_milestone tool usage, roadmap preview in
chat, multi-milestone manifest tracking, and depth verification audit
trail. The only difference from interactive mode is that all decisions
are made autonomously with assumptions documented.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove GSD planning IDs (milestone/slice/task) from conventional commit
subject lines and place them in machine-parseable git trailers instead.
Skip auto-commits for lifecycle-only unit types that only touch .gsd/ files.
Resolvesgsd-build/gsd-2#2553
Co-authored-by: glittercowboy <186001655+glittercowboy@users.noreply.github.com>
Agent-Logs-Url: https://github.com/gsd-build/gsd-2/sessions/250b4775-2d82-4329-9ccc-504b857428da
The verdict gate in auto-dispatch.ts now reads the UAT file to determine
the UAT type. For mixed, human-experience, and live-runtime modes,
PARTIAL is accepted as a valid verdict (all automatable checks passed,
human-only checks documented as NEEDS-HUMAN).
The run-uat prompt is updated so that PASS is the correct verdict when
all automatable checks succeed, even if human-only checks remain. PARTIAL
is reserved for when automatable checks themselves are inconclusive.
Fixesgsd-build/gsd-2#1400
Co-authored-by: glittercowboy <186001655+glittercowboy@users.noreply.github.com>
Agent-Logs-Url: https://github.com/gsd-build/gsd-2/sessions/5a619137-0710-4934-949f-bae63945bf70
scanJournalForForensics() previously called queryJournal() which loaded
ALL journal entries from ALL daily files into memory. For long-running
projects this could be thousands of entries and megabytes of data.
Now:
- Only the last 3 daily files are fully JSON-parsed (event counts, flows)
- Older files are line-counted only (no JSON parsing) for totals
- Recent events use a rolling window of 20 (shift, not accumulate)
- Constants MAX_JOURNAL_RECENT_FILES and MAX_JOURNAL_RECENT_EVENTS
make limits explicit and tunable
Activity log scanning was already intelligent:
- nativeParseJsonlTail with 10MB byte cap
- Only last 5 files scanned
- extractTrace() distills raw JSONL into compact ExecutionTrace structs
- formatReportForPrompt has 30KB hard cap on total output
Co-authored-by: glittercowboy <186001655+glittercowboy@users.noreply.github.com>
Agent-Logs-Url: https://github.com/gsd-build/gsd-2/sessions/7e7f71ec-0d56-409b-930e-5dff1305ff2a
When a custom provider (e.g. claude-code-cli) registers a streamSimple
handler with the same api type as a built-in (e.g. 'anthropic-messages'),
the global API provider registry was overwritten, routing ALL models of
that api type through the custom handler.
This caused anthropic/claude-opus-4-6 requests to be dispatched through
the Claude Code SDK subprocess instead of the Anthropic API, resulting
in 'Tool not found' errors for Glob, Read, Edit, Bash (SDK tool names
not present in pi's tool registry).
Fix: wrap the registered handler with a model.provider guard so it only
fires for models from the registering provider, delegating to the
previous handler for all other providers.
Closes#2536
The insertChildBefore approach doesn't fix tool ordering because the
message component is already live-streaming text when tool_execution
events arrive. Proper fix requires T3 Code-style session-lifetime
architecture. Revert to simple addChild for now.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
extractTrace() indiscriminately counts all isError tool results as
errors, including grep/rg/find returning exit code 1 (no matches)
and user-interrupt skips. This produces false-positive error-trace
anomalies in forensics reports — in a healthy 10-unit run, 3 units
were flagged with 8 spurious 'errors'.
Add two filters before pushing to the errors array:
- Bash commands with '(no output)' + exit code 1 (normal POSIX grep)
- 'Skipped due to queued user message' (intentional user interrupt)
Real errors (non-zero exit with actual error output, non-bash tool
failures) are still counted as before.
Closes#2539
The module-level reservedMilestoneIds Set persists across /gsd
invocations within the same Node process. Each cancelled session
reserves an ID that is never claimed, permanently inflating the
next milestone number. Starting /gsd 3 times without completing
produces M011 instead of M009.
Call clearReservedMilestoneIds() at the top of showSmartEntry()
and showHeadlessMilestoneCreation() so stale reservations from
previous cancelled sessions are discarded before generating new IDs.
The function already existed but was never called outside tests.
Closes#2488
- Add insertChildBefore() to Box component for positional insertion
- In chat controller, insert tool_execution components before the last
assistant message component (instead of appending after) when tools
were executed externally
- Simplify agent-loop externalToolExecution path back to basic
tool_execution_start/end emission
- Toolcall streaming events are filtered in the Claude Code adapter
to prevent duplicate rendering via message_update
Result: externally-executed tool calls render above the text response,
matching the expected visual flow.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The pre-flight milestone queue check in auto-start warns about every
CONTEXT-DRAFT.md it finds, regardless of milestone status. A completed
milestone with a leftover CONTEXT-DRAFT.md triggers a spurious warning
on every session start — noise with no actionable meaning.
Add a status guard that skips completed and parked milestones before
checking for CONTEXT-DRAFT files. When the DB is unavailable, fall back
to the existing warn-on-all behavior (safe default).
Closes#2473
- Filter toolcall_start/delta/end events from streaming to prevent
out-of-order rendering in the TUI's accumulated message content
- Collect tool calls from intermediate SDK turns and include them
BEFORE text content in the final AssistantMessage
- The agent loop's externalToolExecution path emits proper
tool_execution_start/end events for each intermediate tool call
- Result: tool activity renders above the text response, not below
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds `externalToolExecution` flag to AgentLoopConfig. When true, the
agent loop emits tool_execution_start/end events for TUI rendering but
skips local tool dispatch. Used by providers that handle tool execution
internally (e.g., Claude Code CLI via Agent SDK).
The flag is dynamically evaluated per-loop via a callback on
AgentOptions, so model switches mid-session are handled correctly.
Providers with authMode "externalCli" automatically use this mode.
Also updates the Claude Code CLI stream adapter to preserve tool call
blocks in the final message instead of stripping them.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Mintlify docs migration renamed docs/ to docs-internal/ but
pr-risk-check.mjs still referenced the old path, causing every
PR Risk Report workflow to fail with an empty body.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix 1 (auto-start.ts): Replace nativeIsRepo(base) with existsSync(join(base, ".git"))
so bootstrap always creates .git locally even when parent repo makes git rev-parse succeed.
Fix 2 (repo-identity.ts): Start walk-up loop at dirname(normalizedBase) instead of
normalizedBase — finding .gsd at basePath itself is irrelevant to inheritance detection.
Co-authored-by: glittercowboy <186001655+glittercowboy@users.noreply.github.com>
Agent-Logs-Url: https://github.com/gsd-build/gsd-2/sessions/99fdcddc-7e44-4a64-a1ec-a536806216f6
- Add pathToClaudeCodeExecutable to SDK query options, resolving the
system `claude` binary via `which claude`. Without this, the SDK
looks for a bundled cli.js that doesn't exist when installed as a
library dependency.
- Remove env option that was replacing the subprocess environment and
stripping auth credentials, causing "Not logged in" errors.
- Update model IDs to current versions: claude-opus-4-6 (1M ctx),
claude-sonnet-4-6 (1M ctx), claude-haiku-4-5 (200K ctx).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add journalSummary to ForensicReport: flow count, event type
distribution, recent events timeline, date range
- Add activityLogMeta to ForensicReport: file count, total size,
oldest/newest files
- Add journal-based anomaly detectors: stuck-detected, guard-block,
rapid-iterations, worktree-failure events
- Update formatReportForPrompt and saveForensicReport to include
journal timeline and activity log metadata
- Update forensics prompt template with journal format docs,
investigation guidance for cross-referencing activity+journal
- Update web types (diagnostics-types.ts) and forensics-service.ts
for new fields
- Add forensics-journal.test.ts with 11 contract tests
Co-authored-by: glittercowboy <186001655+glittercowboy@users.noreply.github.com>
Agent-Logs-Url: https://github.com/gsd-build/gsd-2/sessions/d648480a-42f4-4c41-81c7-85038609c717
The old "demoable" definition was biased toward GUI/SaaS products —
it explicitly penalized terminal commands and curl as demo surfaces.
For developer tools (CLIs, APIs, frameworks), the terminal IS the
product interface and curl IS a legitimate demo.
Redefines "demoable" as audience-appropriate: the intended user
exercising the capability through its real interface. Adds a carve-out
for infrastructure-as-product slices (protocols, extension APIs,
provider interfaces) to the foundation-only rule.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements Phase 1 of the Claude Code subscription-as-provider integration
(issue #2509). Users with a Claude Code subscription (Pro/Max/Team) can
use subsidized inference through GSD's UI via the official Agent SDK.
The extension registers a provider with authMode: "externalCli" that
delegates to the user's locally-installed claude CLI. The SDK runs the
full agentic loop (multi-turn, tool execution) in one streamSimple call.
Tool calls stream in real-time for TUI visibility but are stripped from
the final AssistantMessage so the agent loop ends cleanly without local
tool dispatch.
Zero core changes — pure extension-based implementation.
Closes#2509
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>