Commit graph

2292 commits

Author SHA1 Message Date
Jeremy
ad2211b218 fix(claude-code): wrap prompt history in XML tags to stop transcript fabrication
Closes #4102.

buildPromptFromContext previously serialized multi-turn history using
literal [User] / [Assistant] / [System] bracket labels. Those tokens
are the exact pattern the anti-fabrication rule in system.md and
discuss.md forbids — the model saw its own input framed as a bracket-
labeled transcript and mirrored the format in its output, inventing
both sides of the conversation during /gsd discuss turns.

Replace the bracket labels with XML-tag structure:
  - <conversation_history> wraps the whole turn sequence
  - <user_message> / <assistant_message> per turn
  - <prior_system_context> for the system prompt (renamed from
    <system_prompt> to avoid overlap with Claude Code's reserved
    <system-reminder> convention)

Prepend a directive telling the model to respond only to the final
user message and not emit the XML tags in its own response. Keep
system.md and discuss.md in sync by documenting that prior context
is delivered in those tags.

Add regression tests asserting:
  - no literal [User]/[Assistant]/[System] substrings in the prompt
  - history wrapped in <conversation_history> with per-turn tags
  - directive leads the prompt
  - empty-history edge cases still render correctly
2026-04-13 01:23:47 -05:00
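
The tag structure described in ad2211b218 can be sketched roughly as follows. This is a minimal illustration, not the actual buildPromptFromContext: the `Turn` type, the `buildPrompt` name, and the directive wording are assumptions, and only the tag names come from the commit message.

```typescript
// Hypothetical turn shape; the real context type is not shown in the commit.
type Turn = { role: "user" | "assistant"; text: string };

// Assumed wording; the commit only says that a directive leads the prompt.
const DIRECTIVE =
  "Respond only to the final user message. Do not emit these XML tags in your response.";

function buildPrompt(systemPrompt: string | undefined, turns: Turn[]): string {
  const parts: string[] = [DIRECTIVE];
  if (systemPrompt) {
    parts.push(`<prior_system_context>\n${systemPrompt}\n</prior_system_context>`);
  }
  // Empty history renders no wrapper at all, so the prompt stays well-formed.
  if (turns.length > 0) {
    const rendered = turns.map((t) =>
      t.role === "user"
        ? `<user_message>\n${t.text}\n</user_message>`
        : `<assistant_message>\n${t.text}\n</assistant_message>`,
    );
    parts.push(`<conversation_history>\n${rendered.join("\n")}\n</conversation_history>`);
  }
  return parts.join("\n\n");
}
```

With this shape, none of the bracket labels the anti-fabrication rule forbids appear in the prompt, which is what the regression tests in the commit assert.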
Jeremy McSpadden
3a529f7a95 Merge pull request #4100 from jeremymcs/claude/cleanup-mcp-stream-output-9uCeK
Improve MCP tool rendering with name parsing and compact args
2026-04-13 00:54:38 -05:00
Claude
2d1081f1cc fix: clean up MCP tool rendering in Claude Code CLI stream
Strip the `mcp__<server>__` prefix from tool_use blocks emitted by the
Claude Agent SDK so registered GSD extension renderers (gsd_plan_milestone,
gsd_task_complete, etc.) match instead of falling through to the generic
JSON-dump fallback. The original server name is preserved on the toolCall
block under `mcpServer` for downstream rendering.

Tighten the generic ToolExecutionComponent fallback for any remaining
prefixed names (third-party MCP servers): show a muted `server·tool`
title, render primitive args as compact `key=value` pairs, and truncate
output to 10 lines when collapsed.
2026-04-13 05:46:35 +00:00
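
The prefix stripping and compact argument rendering from 2d1081f1cc might look something like the sketch below. The function names and the regular expression are illustrative assumptions; only the `mcp__<server>__` prefix, the `mcpServer` field, and the `key=value` format come from the commit message.

```typescript
// `mcpServer` is the field named in the commit; the rest of the shape is assumed.
interface ParsedToolName {
  name: string;
  mcpServer?: string;
}

// Matches `mcp__<server>__<tool>`. Assumes the server name itself contains no "__".
const MCP_PREFIX = /^mcp__(.+?)__(.+)$/;

function parseToolName(raw: string): ParsedToolName {
  const m = MCP_PREFIX.exec(raw);
  return m ? { name: m[2], mcpServer: m[1] } : { name: raw };
}

// Render primitive args as compact key=value pairs; non-primitives are skipped in this sketch.
function compactArgs(args: Record<string, unknown>): string {
  return Object.entries(args)
    .filter(([, v]) => ["string", "number", "boolean"].includes(typeof v))
    .map(([k, v]) => `${k}=${String(v)}`)
    .join(" ");
}
```

Once the prefix is stripped, a name such as `gsd_plan_milestone` can match a registered renderer directly, while the preserved server name remains available for the muted title.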
Jeremy McSpadden
c189b2152e Merge pull request #4092 from jeremymcs/fix/openrouter-credit-retry
fix(auto): recover from OpenRouter affordability 402 errors
2026-04-12 23:04:58 -05:00
Jeremy
724464c7ae fix(auto): recover from OpenRouter credit affordability errors 2026-04-12 22:48:55 -05:00
Jeremy McSpadden
cac4f8ac37 Merge pull request #4087 from jeremymcs/feat/add-specialist-agents
feat(agents): add 8 specialist subagents, slim pro agents, add GSD phase guard
2026-04-12 22:11:43 -05:00
Jeremy
0c19ca88f2 feat(agents): add GSD phase guard to prevent subagent/phase conflicts
When GSD auto-mode is running a planning phase, the planner subagent
could bypass GSD's state machine and artifact system. This adds a
shared state module and conflict check to block agents that overlap
with the active GSD phase.

- Add shared/gsd-phase-state.ts for cross-extension phase coordination
- Add conflicts_with frontmatter field to agent definitions
- Block conflicting agents with clear error directing to GSD workflow
- Tag planner agent with conflicts_with for plan/research phases
- 10 new tests for phase state and conflict parsing
2026-04-12 21:56:52 -05:00
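
A minimal sketch of the conflict check described in 0c19ca88f2, under stated assumptions: the `AgentDef` shape, the module-level `activePhase` variable, and both function names are hypothetical stand-ins for shared/gsd-phase-state.ts, whose real API is not shown in the commit.

```typescript
// Hypothetical agent definition; only the `conflicts_with` field name is from the commit.
interface AgentDef {
  name: string;
  conflicts_with?: string[];
}

// Stand-in for the shared phase state; null means no GSD phase is running.
let activePhase: string | null = null;

function setActivePhase(phase: string | null): void {
  activePhase = phase;
}

// Returns an error message when the agent overlaps with the active phase, else null.
function checkPhaseConflict(agent: AgentDef): string | null {
  if (activePhase === null) return null;
  if (!agent.conflicts_with?.includes(activePhase)) return null;
  return `Agent "${agent.name}" conflicts with the active GSD phase "${activePhase}"; use the GSD workflow instead.`;
}
```

Keeping the state in one shared module is what lets separate extensions agree on the active phase without importing each other.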
Jeremy
66f0d45a8c feat(agents): add 8 specialist subagents and slim pro agents
Add focused, token-efficient specialist agents:
- reviewer: structured code review with severity ratings
- debugger: hypothesis-driven bug investigation
- tester: test writing, fixing, and coverage gap analysis
- refactorer: safe code transformations (extract, inline, rename)
- security: OWASP security audit and secrets detection
- planner: architecture/implementation planning (no code output)
- git-ops: conflict resolution, rebase strategy, PR prep
- doc-writer: documentation generation from code

Slim typescript-pro (256→64 lines) and javascript-pro (281→69 lines):
- Remove verbose code examples (the LLM already knows these patterns)
- Remove persistent memory sections (not used in this project)
- Keep core principles, key patterns list, and verification checklist
- Total token savings ~75% per invocation of these agents
2026-04-12 21:56:40 -05:00
Jeremy
c6ba27f371 fix(gsd): cast unknown gate id in test to satisfy GateId type
The gate-registry test intentionally passes an invalid gate id "Q999"
to verify error handling, but the strict GateId union type rejects it
at compile time. Cast to GateId to fix the typecheck:extensions CI step.
2026-04-12 21:30:56 -05:00
Claude
8f58481875 fix(gsd): route quality gates through a per-turn registry
Every workflow turn that needed a quality gate either let it drop
silently or bulk-stamped it at closeout. Q8 was the worst case: seeded
as scope:"slice" by plan-slice, treated as a blocker for the
evaluating-gates phase by state.ts, then filtered out of the
gate-evaluate prompt via `if (!meta) continue;` and never closed by
complete-slice — a guaranteed auto-loop stall once slice gates were
enabled.

Introduce gate-registry.ts as the single source of truth for which
turn owns which gate (Q3/Q4 → gate-evaluate, Q5/Q6/Q7 → execute-task,
Q8 → complete-slice, MV01–MV04 → validate-milestone). Every layer of
the prompt system now consults it:

- state.ts derives pending counts by owner turn, not scope, so Q8
  never stalls evaluating-gates again.
- auto-prompts.ts builders call assertGateCoverage() and render a
  "Gates to Close" block from the registry instead of a hand-rolled
  GATE_QUESTIONS table.
- complete-slice and complete-task handlers saveGateResult for every
  gate they own, mapping gate id → params field so empty sections
  become `omitted` and populated sections become `pass`.
- milestone-validation-gates sources its MV id list from the registry.
- prompt-validation.ts adds validateSliceSummaryOutput /
  validateTaskSummaryOutput / validateMilestoneValidationOutput
  schema checks.
- gsd_save_gate_result accepts MV01–MV04 (via the registry keys) in
  the MCP server and bootstrap tool registration.

Tests: new gate-registry + prompt-system-gate-coverage +
complete-slice-gate-closure suites, plus a Q8 regression case in
gate-dispatch.test.ts. 161 related tests pass end-to-end.

https://claude.ai/code/session_019PT3EmrkMxr4TsgGGLSYK3
2026-04-12 21:13:16 -05:00
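
The ownership mapping in 8f58481875 can be illustrated with a small table keyed by gate id. The gate-to-turn assignments below are the ones listed in the commit message; `pendingForTurn` and `assertCoverage` are hypothetical helpers, not the real state.ts or assertGateCoverage() code.

```typescript
type OwnerTurn = "gate-evaluate" | "execute-task" | "complete-slice" | "validate-milestone";

// Which turn owns which gate, as listed in the commit message.
const GATE_OWNERS: Record<string, OwnerTurn> = {
  Q3: "gate-evaluate",
  Q4: "gate-evaluate",
  Q5: "execute-task",
  Q6: "execute-task",
  Q7: "execute-task",
  Q8: "complete-slice",
  MV01: "validate-milestone",
  MV02: "validate-milestone",
  MV03: "validate-milestone",
  MV04: "validate-milestone",
};

// Count pending gates by owner turn, ignoring scope entirely.
function pendingForTurn(pendingIds: string[], turn: OwnerTurn): number {
  return pendingIds.filter((id) => GATE_OWNERS[id] === turn).length;
}

// Throw if a prompt for `turn` fails to render a gate that the turn owns.
function assertCoverage(turn: OwnerTurn, renderedIds: string[]): void {
  const missing = Object.keys(GATE_OWNERS).filter(
    (id) => GATE_OWNERS[id] === turn && !renderedIds.includes(id),
  );
  if (missing.length > 0) {
    throw new Error(`${turn} is missing gates: ${missing.join(", ")}`);
  }
}
```

Because Q8 is counted only against complete-slice, a pending Q8 no longer contributes to the evaluating-gates count, which is the stall the commit describes.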
Jeremy McSpadden
da7a7e255f Merge pull request #4082 from jeremymcs/claude/review-mcp-server-tools-2Gchv
Add query filtering, abort handling, and permission mode control
2026-04-12 20:54:51 -05:00
Claude
1eb357ca46 fix(mcp): expose every registered tool and fix SDK subpath resolution
Two related fixes for `gsd --mode mcp` that the audit missed on first pass:

1. Tool inventory — session.agent.state.tools was the *active* subset, not
   the full registry. Before this change, MCP clients connected to GSD saw
   63 tools and four built-ins were silently missing: `find`, `grep`, `ls`,
   and `hashline_edit`. After: 67 tools, matching the full _toolRegistry.
   Fix: call session.getAllTools() + session.setActiveToolsByName() before
   starting the MCP transport so every registered tool is active for the
   lifetime of the MCP session.

2. SDK subpath resolution — the #3603 createRequire workaround no longer
   works with @modelcontextprotocol/sdk 1.27.x + current Node. The
   wildcard export ./* → ./dist/cjs/* does NOT auto-append `.js`, and
   _require.resolve fails with "Cannot find module .../server/stdio".
   End-to-end handshake was actually broken in src/mcp-server.ts even
   before my earlier F5 change. Fix: use explicit `.js` suffixes on
   every subpath import (server/index.js, server/stdio.js, types.js),
   matching the convention already in use by packages/mcp-server/.

The regression test is rewritten to enforce the `.js`-suffix convention
and reject any bare subpath or lingering createRequire resolution.

Verified end-to-end via raw JSON-RPC against `gsd --mode mcp --bare`:
  BEFORE_COUNT=63
  AFTER_COUNT=67
  diff: +find +grep +hashline_edit +ls

Test sweep: 76 tests pass across mcp-createRequire, stream-adapter,
mcp-server, workflow-tools.

https://claude.ai/code/session_0174sYny3VvdwYTdCNTmY4Do
2026-04-13 01:40:05 +00:00
Jeremy McSpadden
5c271e72e7 Merge pull request #3790 from salioglu/fix/3718-sessions-stdin-cleanup
fix(cli): clean up stdin after sessions command readline interface closes
2026-04-12 20:18:09 -05:00
Jeremy
dc489f0a07 fix(mcp): resolve rebase regressions in stream-adapter
Rename intermediateToolCalls → intermediateToolBlocks to match upstream
rename, and pass onElicitation via extraOptions (4th arg) instead of
overrides (3rd arg) in buildSdkOptions test.
2026-04-12 20:09:36 -05:00
Claude
1be15758ec fix(mcp): thread abort signals, restore tool fidelity, and fix subpath imports
Audit-driven fixes across the two MCP server surfaces and the Claude Code
streaming adapter:

- src/mcp-server.ts: propagate `extra.signal` into `tool.execute` so MCP
  clients can actually cancel long-running Bash/WebFetch/grep calls, and
  route the remaining `/server` subpath import through `createRequire`
  for #3603 consistency.
- src/tests/mcp-createRequire.test.ts: extend regression coverage to the
  `/server` subpath.
- claude-code-cli/stream-adapter.ts: (a) classify aborts as `aborted`
  instead of the retry-eligible `stream_exhausted_without_result`,
  (b) merge final-turn toolCall blocks from the builder into the
  AssistantMessage via the new `mergePendingToolCalls` helper so a turn
  ending in `tool_use` stop_reason no longer drops its tool calls, and
  (c) resolve the SDK permission mode via `resolveClaudePermissionMode`
  (auto-mode → bypass, interactive → acceptEdits, env override).
- packages/mcp-server/src/server.ts: make `gsd_query` actually respect
  its `query` argument with known categories + forward-compatible
  fallback, and thread `extra.signal` into `gsd_execute` so an aborted
  RPC request cancels the newly-created session instead of leaking a
  background RpcClient process.
- stream-adapter test suite: add regression tests for abort
  classification, final-turn tool-call merging, and permission mode
  resolution.

Verified via: mcp-createRequire, stream-adapter (27), partial-builder,
mcp-server package (31), workflow-tools (13) — 83 tests green.

https://claude.ai/code/session_0174sYny3VvdwYTdCNTmY4Do
2026-04-12 20:04:47 -05:00
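
Two of the stream-adapter changes in 1be15758ec are simple enough to sketch. Both functions below are illustrative assumptions rather than the real helpers: the mode strings are copied verbatim from the commit message (the SDK's actual values may differ), and `envOverride` stands in for an environment override whose variable name is not given.

```typescript
// Resolve the permission mode: an explicit override wins, otherwise auto-mode
// maps to "bypass" and interactive sessions to "acceptEdits".
function resolveMode(isAutoMode: boolean, envOverride?: string): string {
  if (envOverride) return envOverride;
  return isAutoMode ? "bypass" : "acceptEdits";
}

type EndReason = "aborted" | "stream_exhausted_without_result";

// A stream that ends without a result is retry-eligible only if nobody aborted it.
// Accepts any object with an `aborted` flag, such as an AbortSignal.
function classifyStreamEnd(signal: { aborted: boolean }): EndReason {
  return signal.aborted ? "aborted" : "stream_exhausted_without_result";
}
```

Classifying on the signal rather than on the error text keeps a user-initiated cancel from being retried as if the stream had failed.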
Jeremy
e9e2850165 test(doctor): add regression test for claude-code CLI auth provider
Verifies that claude-code provider is reported as ok without any API
key, since it uses external CLI authentication.
2026-04-12 19:24:29 -05:00
Jeremy
20f627fb67 fix(doctor): skip key check for CLI-authenticated providers
Providers like claude-code, openai-codex, google-gemini-cli use external
CLI auth — they don't need API keys. The doctor was incorrectly reporting
"claude-code key missing" for subscription users.
2026-04-12 19:16:16 -05:00
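
The doctor change in 20f627fb67 amounts to an allowlist check before the key lookup. A minimal sketch, assuming a hypothetical `checkProviderKey` function and status type; the provider names are the ones listed in the commit message.

```typescript
// Providers the commit names as using external CLI authentication.
const CLI_AUTH_PROVIDERS = new Set(["claude-code", "openai-codex", "google-gemini-cli"]);

type KeyStatus = "ok" | "missing";

function checkProviderKey(provider: string, apiKey: string | undefined): KeyStatus {
  // CLI-authenticated providers never need an API key, so skip the check.
  if (CLI_AUTH_PROVIDERS.has(provider)) return "ok";
  return apiKey ? "ok" : "missing";
}
```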
Jeremy McSpadden
79f79b617d Merge pull request #4077 from jeremymcs/fix/tui-notification-overlay-wiring
fix(tui): overlay subscription + Ctrl+Shift+P shortcut conflict
2026-04-12 18:29:28 -05:00
Jeremy
df1a7a76d0 fix(tui): overlay subscription + Ctrl+Shift+P shortcut conflict
- Replace notification overlay 3s polling with onNotificationStoreChange
  subscription for immediate updates; keep 30s safety-net for cross-process
- Remove Ctrl+Shift+P parallel fallback that conflicts with cycleModelBackward
- Add hasFallback flag to GSDShortcutDef so hint text is accurate
- Fix misleading _withLock comment; rename ownsLock → createdLock

Closes gsd-build/gsd-2#4076
2026-04-12 18:14:01 -05:00
Jeremy McSpadden
cdecbf2d68 Merge pull request #4074 from jeremymcs/fix/ollama-footer-status
fix(ollama): clear footer status when provider unavailable
2026-04-12 18:08:25 -05:00
Claude
701ab18d81 fix(models): block unconfigured models from selection surfaces
Filter models whose provider has no working API key or OAuth out of
every user-facing selection path. Previously, stale defaults and scoped
sets could leak unconfigured models into /model, /gsd model, and auto
run — the user could "pick" a model that immediately threw on use.

- model-selector: filter scopedModels via isProviderRequestReady;
  default to "all" scope when no scoped model is ready.
- model-controller: same filter for getModelCandidates, so exact-match
  resolution from /model <term> can't return an unauth'd scoped model.
- model-resolver: gate findInitialModel step 3 on provider readiness so
  a stale saved default falls through to the available-models path.
- startup-model-validation: check configuredExists against getAvailable
  instead of getAll, so a configured-but-unauth default triggers the
  fallback picker and thinking-level reset.
- auto-start: validate resolveDefaultSessionModel against the live
  registry + auth before snapshotting, and warn when PREFERENCES.md
  names an unconfigured model.

https://claude.ai/code/session_015q6b23ap9Pyqdogzz2FXGh
2026-04-12 17:25:06 -05:00
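
The model-selector rule from 701ab18d81 (filter scoped models by readiness, fall back to the "all" scope when none is ready) could be sketched as below. The `Model` shape and `selectCandidates` are assumptions, and the `isReady` callback stands in for isProviderRequestReady, whose real signature is not shown in the commit.

```typescript
// Hypothetical model record; the real registry type is not shown.
interface Model {
  provider: string;
  id: string;
}

function selectCandidates(
  scoped: Model[],
  all: Model[],
  isReady: (provider: string) => boolean,
): { scope: "scoped" | "all"; models: Model[] } {
  const readyScoped = scoped.filter((m) => isReady(m.provider));
  if (readyScoped.length > 0) return { scope: "scoped", models: readyScoped };
  // No scoped model is usable: default to every model whose provider is ready.
  return { scope: "all", models: all.filter((m) => isReady(m.provider)) };
}
```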
Jeremy
61166a6f17 fix(ollama): clear footer status when provider unavailable 2026-04-12 17:23:23 -05:00
Jeremy McSpadden
c3aa3a3bf0 Merge pull request #4067 from jeremymcs/fix/gsd-model-session-override 2026-04-12 12:50:12 -05:00
Jeremy McSpadden
3ecaf1eb35 Merge pull request #4066 from mastertyko/fix/2156-clean-merged-worktree-branch 2026-04-12 12:48:18 -05:00
Jeremy
5842e2834a fix(gsd): guard model override in minimal command contexts 2026-04-12 12:28:52 -05:00
Jeremy
e247f2fe61 fix(gsd): honor /gsd model as session override across dispatch 2026-04-12 11:48:06 -05:00
mastertyko
8a37e2ce10 fix(gsd): use milestone branch for merged worktree cleanup 2026-04-12 18:45:36 +02:00
Jeremy McSpadden
c8996c40bd Merge pull request #4040 from mastertyko/fix/3733-start-auto-fire-and-forget
fix(gsd): detach auto start from active turns
2026-04-12 09:43:16 -05:00
Jeremy McSpadden
564a71da37 Merge pull request #4053 from jeremymcs/fix/auto-session-credential-cooldown
fix(auto): survive transient 429 credential cooldown
2026-04-12 09:42:37 -05:00
Jeremy
4f2e90e1e8 test(auto): add tests for credential cooldown fix
- auth-storage.test.ts: 8 tests for getEarliestBackoffExpiry()
- sdk.test.ts: 12 tests for CredentialCooldownError class
- infra-errors-cooldown.test.ts: 35 tests for isTransientCooldownError(),
  getCooldownRetryAfterMs(), and exported constants

Required by CI lint (require-tests.sh) per CONTRIBUTING.md.

Closes #4052
2026-04-12 09:30:52 -05:00
Jeremy
d0afe018eb fix(auto): add structured cooldown error and bounded retry budget
Address Codex adversarial review findings:

- Replace string-matched cooldown detection with typed
  CredentialCooldownError (code: AUTH_COOLDOWN, retryAfterMs)
- Add MAX_COOLDOWN_RETRIES (5) cap so cooldown retries can't spin for
  hours on persistent quota exhaustion
- Auto-loop uses retryAfterMs from structured error when available,
  falls back to 35s default
- Export CredentialCooldownError from pi-coding-agent package
- Retain regex fallback for cross-process error propagation

Closes #4052
2026-04-12 09:16:05 -05:00
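
A sketch of the typed error and bounded retry budget from d0afe018eb. The class name, error code, retry cap, and 35s default come from the commit message; the constructor signature and the `nextCooldownDelay` helper are assumptions, not the code exported from pi-coding-agent.

```typescript
const MAX_COOLDOWN_RETRIES = 5;
const DEFAULT_COOLDOWN_MS = 35_000;

class CredentialCooldownError extends Error {
  readonly code = "AUTH_COOLDOWN";

  constructor(message: string, readonly retryAfterMs?: number) {
    super(message);
    this.name = "CredentialCooldownError";
    // Keeps instanceof working even when compiled to targets that break Error subclassing.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Decide how long to wait before the next attempt, or null to stop retrying.
function nextCooldownDelay(err: unknown, retriesSoFar: number): number | null {
  if (!(err instanceof CredentialCooldownError)) return null;
  if (retriesSoFar >= MAX_COOLDOWN_RETRIES) return null;
  return err.retryAfterMs ?? DEFAULT_COOLDOWN_MS;
}
```

The cap is what keeps a persistently exhausted quota from turning the cooldown wait into an unbounded loop.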
Jeremy
4d41b21fbd test(gsd): align widget assertions after tui conflict resolution 2026-04-12 09:14:41 -05:00
Jeremy
cd86e8a7d0 feat(tui): improve gsd overlays, shortcuts, and notification flows 2026-04-12 09:13:46 -05:00
Jeremy
1ae93e9822 fix(auto): survive transient 429 credential cooldown in auto sessions
getApiKey() retry loop (3 attempts, ~12s) couldn't outlast the 30s
rate-limit backoff window, causing cooldown errors to cascade through
the auto-loop and trigger a hard stop after 3 consecutive failures.

- Add AuthStorage.getEarliestBackoffExpiry() to expose when the next
  credential becomes available
- Update getApiKey() to sleep until backoff expiry (up to 60s) instead
  of fixed 2s/4s/6s delays
- Add isTransientCooldownError() detector in infra-errors.ts
- Auto-loop now waits 35s on cooldown errors without incrementing the
  consecutive error counter

Closes #4052
2026-04-12 09:04:41 -05:00
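
The wait calculation behind 1ae93e9822 can be sketched as follows. This assumes backoff expiries are available as millisecond timestamps; `earliestBackoffExpiry` is a hypothetical stand-in for AuthStorage.getEarliestBackoffExpiry(), and only the 60s cap comes from the commit message.

```typescript
const MAX_WAIT_MS = 60_000; // upper bound on a single sleep, per the commit message

// Earliest time (ms since epoch) at which any credential leaves backoff,
// or undefined when no credential is backing off.
function earliestBackoffExpiry(expiries: number[]): number | undefined {
  return expiries.length > 0 ? Math.min(...expiries) : undefined;
}

// How long to sleep before retrying, clamped to the range [0, MAX_WAIT_MS].
function waitMs(expiries: number[], now: number): number {
  const earliest = earliestBackoffExpiry(expiries);
  if (earliest === undefined) return 0;
  return Math.min(Math.max(earliest - now, 0), MAX_WAIT_MS);
}
```

Sleeping until the earliest expiry, rather than for fixed 2s/4s/6s delays, is what lets the retry loop outlast a 30s rate-limit window.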
Jeremy McSpadden
d21d3e364d Merge pull request #4041 from mastertyko/fix/3707-unpark-db-desync
fix(gsd): repair DB-only milestone unpark state
2026-04-12 09:02:42 -05:00
Jeremy McSpadden
7c45b5abf2 Merge pull request #4042 from mastertyko/fix/3760-forensics-session-aware-loops
fix(gsd): scope stuck-loop forensics to auto sessions
2026-04-12 08:46:35 -05:00
Jeremy McSpadden
343dc8a675 Merge pull request #4044 from mastertyko/fix/3776-claude-cli-error-signal
fix(claude-code-cli): surface result text for success errors
2026-04-12 08:46:18 -05:00
mastertyko
1ab3d9a04f fix(headless): keep idle timeout off during interactive tools 2026-04-12 14:04:15 +02:00
mastertyko
4189afe8a0 fix(claude-code-cli): surface result text for success errors 2026-04-12 14:03:29 +02:00
mastertyko
e987734559 fix(gsd): scope stuck-loop forensics to auto sessions 2026-04-12 14:00:01 +02:00
mastertyko
102457618d fix(gsd): repair DB-only milestone unpark state 2026-04-12 13:34:28 +02:00
mastertyko
2a1bd3a265 fix(gsd): detach auto start from active turns 2026-04-12 13:28:49 +02:00
Jeremy
488e4b5110 fix(cli): include all internal node_modules entries in pnpm merged dir
PR #3564 narrowed the internal overlay to @gsd* prefixes only, which
dropped non-hoisted optional deps like @anthropic-ai/claude-agent-sdk
from the merged ~/.gsd/agent/node_modules directory. Revert to overlaying
all non-dotfile internal entries so optional deps resolve correctly.
2026-04-12 02:12:13 -05:00
Jeremy
d5e4938320 Merge remote-tracking branch 'upstream/main' into fix/4018-anti-fabrication-guardrails
# Conflicts:
#	src/resources/extensions/gsd/prompts/discuss-prepared.md
2026-04-12 00:07:30 -05:00
Jeremy
5aa1fe0c0c fix(gsd): enforce anti-fabrication turn-taking in discuss prompts 2026-04-12 00:04:08 -05:00
Jeremy McSpadden
c900e1004a Merge pull request #3564 from Tibsfox/fix/node-modules-symlink-target
fix(cli): resolve hoisted node_modules for global installs
2026-04-12 00:00:51 -05:00
Tibsfox
a6286ac32c fix(cli): address review findings for pnpm merged node_modules
- Use content fingerprint (packageRoot + sorted entry names from both
  dirs) in .gsd-merged marker so pnpm add/remove triggers rebuild
- Restrict overlay loop to @gsd* scopes only, preventing accidental
  shadowing of hoisted deps with internal versions
- Guard marker write behind linkedCount > 0 to avoid stamping success
  on a broken/empty merged directory
- Log warnings when readdirSync fails on hoisted/internal roots

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 21:45:12 -07:00
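
The content fingerprint described in a6286ac32c might be built along these lines. The inputs (package root plus sorted entry names from both directories) follow the commit message; the function name and the JSON serialization are assumptions, since the real marker format is not shown.

```typescript
// Build a marker fingerprint that changes whenever an entry is added to or
// removed from either node_modules root, independent of directory read order.
function mergedFingerprint(packageRoot: string, hoisted: string[], internal: string[]): string {
  return JSON.stringify({
    packageRoot,
    hoisted: [...hoisted].sort(),
    internal: [...internal].sort(),
  });
}
```

Comparing the stored marker with a freshly computed fingerprint at startup then decides whether the merged directory needs rebuilding.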
Tibsfox
42d2e25e0b fix(cli): handle pnpm global installs by merging both node_modules roots
pnpm's virtual-store layout doesn't hoist @gsd/* workspace scopes to
the parent node_modules, so the simple symlink-to-hoisted approach from
the original fix (#3529) left workspace packages unresolvable.

Detect when workspace scopes are missing from the hoisted root and
create a real node_modules directory with symlinks from both the hoisted
root (external deps) and internal root (workspace packages). A .gsd-merged
marker file skips rebuild on subsequent startups.

Restores behavioral tests deleted in the original PR and adds unit tests
for the pnpm merge path and scope detection logic.

Reported-by: @moekify
Fixes: #3564 (comment)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 21:40:32 -07:00
Jeremy McSpadden
e7dc2d4bd2 Merge pull request #3655 from Tibsfox/fix/connection-error-transient
fix(gsd): classify plain 'Connection error.' as transient for auto-mode retry
2026-04-11 23:22:05 -05:00
Jeremy McSpadden
b797380786 Merge pull request #3025 from jeremymcs/worktree-fix-3023-home-dir-error
fix(commands): friendly message when /gsd runs from $HOME
2026-04-11 23:19:51 -05:00