Commit graph

2420 commits

Author SHA1 Message Date
Jeremy
2cc01c11ee fix(merge): clean stale MERGE_HEAD before squash merge (#2912)
A pre-existing MERGE_HEAD (from a failed prior merge, the libgit2 native
path, or external tooling) blocks git merge --squash. Remove stale merge
state files before starting the squash merge, not just after.
2026-03-31 17:48:45 -05:00
Jeremy
0e978d4565 fix(state): always run disk→DB reconciliation when DB is available (#2631)
When DB was available but empty, deriveState skipped deriveStateFromDb
entirely, bypassing the disk→DB sync logic. Milestones created outside
the DB write path were never discovered.
2026-03-31 17:34:05 -05:00
Jeremy
36b03890da fix(git-service): fix merge-base ancestry check and .gsd/ leakage in snapshot absorption
- Check HEAD~1 (newest snapshot) instead of resetTarget (pre-snapshot
  base) for remote ancestry. The old check gave a false positive when
  the remote was at the pre-snapshot base but snapshots were local-only.
- Re-run smartStage() after soft reset so RUNTIME_EXCLUSION_PATHS
  apply to the absorbed commit. Without this, .gsd/ state files from
  snapshot commits leaked into the real commit.
2026-03-31 17:25:29 -05:00
Jeremy
fa0651bfd6 feat(doctor): stale commit safety check with gsd snapshot and auto-cleanup
Adds a safety mechanism that detects uncommitted changes idle past a
configurable threshold (default: 30 min), auto-snapshots tracked files
using `git add -u`, and cleans up snapshot commits when real work lands.

- New `stale_uncommitted_changes` doctor issue with auto-snapshot fix
- Detection in health widget (60s), pre-dispatch gate, and /gsd doctor
- `nativeAddTracked()` stages only tracked files (no secrets/binaries)
- `absorbSnapshotCommits()` squashes `gsd snapshot:` commits into next
  real autoCommit via soft reset + re-commit
- Configurable via `stale_commit_threshold_minutes` preference (0=off)
2026-03-31 17:25:29 -05:00
Jeremy McSpadden
e0d130e682 feat(extensions): wire up topological sort and unified registry filtering (#3152)
- Add extension-manifest.ts and extension-sort.ts to pi-coding-agent
  with manifest reading and Kahn's BFS topological sort algorithm
- Add extensionPathsTransform hook to DefaultResourceLoader that runs
  between path merging and loadExtensions() — enables pre-load
  filtering and reordering without modifying pi internals
- Wire GSD's buildResourceLoader() to provide a transform that:
  1. Filters ALL extensions (including community) through the GSD registry
  2. Sorts in topological dependency order via sortExtensionPaths()
- Mark discoverAndLoadExtensions() as @deprecated (dead code path)
- Add 16 tests covering manifest reading, dependency sorting, cycles,
  missing deps, and non-array deps

Previously, dependencies.extensions in manifests was decorative (sort
existed but was never called), and gsd extensions disable only worked
for bundled extensions. Community extensions in ~/.gsd/agent/extensions/
bypassed the registry entirely.
2026-03-31 11:54:48 -06:00
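The dependency ordering described in the commit above can be sketched with Kahn's algorithm. This is a minimal illustration, not the code in extension-sort.ts: the `ExtensionManifest` shape, the `sortByDependencies` name, and the handling of missing or non-array dependencies (skipped rather than rejected) are all assumptions.

```typescript
// Hypothetical manifest shape; the real extension-manifest.ts may differ.
interface ExtensionManifest {
  id: string;
  dependencies?: { extensions?: unknown };
}

interface SortResult {
  order: string[];  // ids in dependency-first order
  cyclic: string[]; // ids that could not be ordered because of a cycle
}

function sortByDependencies(manifests: ExtensionManifest[]): SortResult {
  const known = new Set(manifests.map((m) => m.id));
  const inDegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const m of manifests) {
    inDegree.set(m.id, 0);
    dependents.set(m.id, []);
  }

  for (const m of manifests) {
    const raw = m.dependencies?.extensions;
    // Assumption: non-array values mean "no dependencies"; unknown ids are skipped.
    const deps: string[] = Array.isArray(raw)
      ? raw.filter((d) => typeof d === "string" && known.has(d))
      : [];
    const unique = deps.filter((d, i) => deps.indexOf(d) === i);
    for (const dep of unique) {
      dependents.get(dep)!.push(m.id);
      inDegree.set(m.id, inDegree.get(m.id)! + 1);
    }
  }

  // Kahn's algorithm: seed a FIFO queue with nodes that have no unmet dependencies.
  const queue = manifests.map((m) => m.id).filter((id) => inDegree.get(id) === 0);
  const order: string[] = [];
  while (queue.length > 0) {
    const id = queue.shift()!;
    order.push(id);
    for (const next of dependents.get(id)!) {
      const remaining = inDegree.get(next)! - 1;
      inDegree.set(next, remaining);
      if (remaining === 0) queue.push(next);
    }
  }

  // Anything never emitted still has an unmet dependency, so it sits on a cycle.
  const cyclic = manifests.map((m) => m.id).filter((id) => order.indexOf(id) === -1);
  return { order, cyclic };
}
```

Returning the cyclic ids separately lets a caller decide whether to skip those extensions or append them in their original order; which of the two the real loader does is not stated in the commit.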
Jeremy McSpadden
f0059a5498 fix(extensions): update provides.hooks in 7 extension manifests to match actual registrations (#3157)
Audit found that 7 bundled extensions had incomplete provides.hooks
arrays in their manifests. Updated each to match actual pi.on() calls:

- async-jobs: +session_before_switch, session_shutdown
- bg-shell: +8 hooks (session_compact, session_tree, etc.)
- browser-tools: +session_start
- context7: +session_shutdown
- google-search: +session_shutdown
- gsd: +12 hooks (bash_transform, tool_call, tool_result, etc.)
- search-the-web: +session_start

Closes #3156
2026-03-31 11:54:41 -06:00
Jeremy McSpadden
1e89090136 test(state-machine): add regression suite — 86 tests across 6 files (#3161) (#3162)
Comprehensive validation of the GSD state machine identified 7 HIGH, 14 MEDIUM,
and 16 LOW findings. This adds regression and integration tests covering:

Unit tests (49):
- Event replay idempotency (M4 lossy blocker replay, M5 duplicate evidence)
- Reconciliation edge cases (fork detection, entity keys, conflict detection)
- Completion hierarchy guards (vacuous truth, phantom parents, rollback fidelity)
- State derivation parity (ghost milestones, phase transitions, DB/FS consistency)
- Stuck detection coverage (all 3 rules + documented gap for 3-unit cycles)

Integration tests (37):
- Full happy-path lifecycle (pre-planning → complete)
- 12 completion guard edge cases with real handlers
- 7 reopen operations including H5 (no reopen-milestone exists)
- Phantom parent auto-creation (H6)
- State derivation consistency with live DB
- Event log integrity across operations
- M12: stale SUMMARY.md causes reconciler to override reopen

Closes #3161
2026-03-31 11:54:30 -06:00
Jeremy McSpadden
fbb67f15f8 feat(widget): add last commit display and dashboard layout improvements (#3226)
- Health widget: always-on last commit with relative time + message
- Dashboard: move worktree/branch info to right-aligned line under header
- Dashboard: move last commit to bottom-left with hints on right
- Dashboard: cap task titles at 45 chars, commit messages at 65 chars
- Dashboard: use … instead of ... for all truncation
2026-03-31 11:49:35 -06:00
Jeremy McSpadden
eaccf3e690 test(state): comprehensive state machine phase walkthrough (#3276) (#3277)
70 tests covering all 16 phases of the GSD state machine with both
happy-path and failure-mode verification. Exercises DB and filesystem
derivation paths, reconciliation logic, and edge cases.

Findings documented in #3276: a 0-byte SUMMARY triggers false completion,
missing DB task rows cause the wrong phase, the path cache goes stale
across derivations, and non-standard status strings are silently accepted.
2026-03-31 11:49:28 -06:00
Jeremy McSpadden
706a2f8e9f refactor(state): centralize pipeline logging through workflow logger (#3282)
* refactor(state): centralize pipeline logging through workflow logger

Route 15 raw process.stderr.write calls through the structured
workflow logger (logWarning/logError). Adds "db" and "dispatch"
as new LogComponent values. Enables auto-loop drain/summarize,
audit-log persistence, and doctor integration for reconciliation
and DB events that previously bypassed structured logging.

Files changed:
- workflow-logger.ts: add "db" and "dispatch" components
- state.ts: 3 reconciliation calls → logWarning/logError
- gsd-db.ts: 4 DB operation calls → logError
- workflow-reconcile.ts: 3 event merge calls → logWarning/logError
- auto-dispatch.ts: 1 reactive dispatch call → logError
- auto-post-unit.ts: 3 triage/rogue calls → logWarning/logError

* test(workflow-logger): add tests for db and dispatch log components

Cover the new LogComponent values added in this refactor to satisfy
the CI require-tests gate.
2026-03-31 11:49:19 -06:00
Jeremy McSpadden
17471ea280 feat(model-routing): enable dynamic routing by default (#3120)
* feat(model-routing): enable dynamic routing by default

Change defaultRoutingConfig().enabled from false to true so that
dynamic model routing (tier-based downgrading for light/standard
tasks) is active out of the box. Users can still disable it via
dynamic_routing.enabled: false in PREFERENCES.md.

This is a behavioral change: sessions that previously used the
configured model for all tasks will now automatically downgrade
to cheaper models for light and standard complexity tasks.

* test(model-routing): verify dynamic routing enabled by default

Tests that defaultRoutingConfig returns enabled: true and all
routing features are active.
2026-03-31 11:47:38 -06:00
Tom Boucher
081c5dc52f fix: surface nativeCommit errors in reconcileMergeState instead of silently swallowing (#3052)
The catch block in reconcileMergeState silently swallowed all nativeCommit
exceptions, including real failures (permissions, corrupt git state, hook
rejections). This caused auto-mode to report success and return true (dirty,
re-derive) even when the merge commit actually failed, leading to an infinite
loop where auto-mode repeatedly attempted worktree finalization.

Now the catch block logs the error via ctx.ui.notify at "error" level and
returns false to signal that reconciliation failed, allowing upstream logic
to react appropriately. The nativeCommit return value is also checked —
a null return (nothing to commit) gets its own info notification distinct
from a successful commit SHA.

Closes #2542

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:50:28 -06:00
Tom Boucher
46d798a1bf fix(parallel): scope commits to milestone boundaries in parallel mode (#3047)
When GSD_MILESTONE_LOCK is set (parallel worker mode), smartStage() now
excludes .gsd/milestones/<M>/ directories for all milestones other than the
locked one. This prevents a parallel worker (e.g., M033) from staging and
committing fabricated artifacts for a milestone it does not own (e.g., M032).

Previously, smartStage() ran `git add -A` with only runtime path exclusions,
allowing cross-milestone pollution when workers share the same .gsd/ directory
(git.isolation: "none"). The GSD_MILESTONE_LOCK env var only filtered what
deriveState() sees but did not prevent file staging.

Closes #1991

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:50:21 -06:00
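One way to express the per-milestone exclusion from the commit above is with git's `:(exclude)` pathspec magic. The sketch below only builds the pathspec list. The function name and the idea of passing the milestone ids in explicitly are assumptions, and the real smartStage() may build its exclusions differently.

```typescript
// Builds exclude pathspecs for every milestone except the locked one.
// Assumption: milestone directories live at .gsd/milestones/<id>.
function milestoneExcludePathspecs(
  milestoneIds: string[],
  lockedId: string | undefined,
): string[] {
  // Without a lock (non-parallel mode) nothing extra is excluded.
  if (!lockedId) return [];
  return milestoneIds
    .filter((id) => id !== lockedId)
    .map((id) => `:(exclude).gsd/milestones/${id}`);
}
```

For a worker locked to M033, the result could be appended to a staging command such as `git add -A -- . ':(exclude).gsd/milestones/M032'`.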
Tom Boucher
de9ba8aeb7 fix: add windowsHide to all web-mode subprocess spawns (#2628) (#3046)
On Windows, child_process.spawn() and execFile() open a visible console
window by default. The web server spawn, RPC bridge, browser opener, and
all 15 web service subprocess calls were missing windowsHide: true,
causing constant console window flashing when running gsd --web.

Closes #2628

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:50:13 -06:00
Tom Boucher
466c7dea18 fix: skip auto-mode pause on empty-content aborted messages (#2695) (#3045)
When the LLM sends an assistant message with empty content[] and
stopReason "aborted", this is a non-fatal agent stop — not a crash.
The abort handler now checks for empty content and missing errorMessage
before deciding to pause. Empty-content aborts are routed to
resolveAgentEnd instead, breaking the stuck re-dispatch loop.

Closes #2695

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:50:05 -06:00
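The check described in the commit above might look roughly like the predicate below. The `AssistantMessageLike` shape and the `isBenignAbort` name are assumptions made for illustration. Only the three conditions (stopReason "aborted", empty content, no errorMessage) come from the commit message.

```typescript
// Hypothetical minimal view of an assistant message.
interface AssistantMessageLike {
  content: unknown[];
  stopReason?: string;
  errorMessage?: string;
}

// True when an abort carries no content and no error, i.e. a non-fatal stop
// that should not pause auto-mode.
function isBenignAbort(msg: AssistantMessageLike): boolean {
  return (
    msg.stopReason === "aborted" &&
    msg.content.length === 0 &&
    !msg.errorMessage
  );
}
```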
Tom Boucher
0b36977804 fix: detect and remove nested .git dirs in worktree cleanup to prevent data loss (#3044)
Scaffolding tools (create-next-app, cargo init, etc.) create nested .git
directories inside worktrees. Git records these as gitlinks (mode 160000)
without .gitmodules, so worktree cleanup destroys the only copy of the
nested object database — causing permanent silent data loss.

Added findNestedGitDirs() helper that recursively scans worktree for nested
.git directories (skipping node_modules and other non-project dirs). The
removeWorktree() function now calls this before cleanup and removes any
nested .git dirs so files are tracked as regular content.

Closes #2616

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:49:54 -06:00
Tom Boucher
9384641b25 fix: prevent data loss when git isolation default changes (#2625) (#3043)
When the default isolation mode flipped from "worktree" to "none" between
versions, mergeAndExit() returned early for mode "none" without checking
whether the session was physically inside an active worktree. This silently
skipped the merge, orphaning committed work on the milestone branch.

The fix moves the worktree-presence check (isInAutoWorktree + originalBasePath)
before the mode-none early return. If we are inside a worktree, mergeAndExit
proceeds with the worktree merge path regardless of the configured mode.

Also fixes the misleading JSDoc on GitPreferences.isolation that claimed
"worktree" was the default when the runtime default is actually "none".

Closes #2625

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:49:03 -06:00
Tom Boucher
893c525578 fix(read-tool): clamp offset to file bounds instead of throwing (#3007) (#3042)
When an agent requests read(file, offset: 30) on a 13-line file, the
read tool threw "Offset 30 is beyond end of file" which propagated as
invalid JSON downstream during milestone completion. Now clamps the
offset to the last line and prepends a notice, allowing the agent to
continue with valid content.

Fixes both read.ts and hashline-read.ts variants.

Closes #3007

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:48:01 -06:00
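The clamping behaviour from the commit above can be illustrated with a small pure function. This is a sketch under assumptions: offsets are treated as 1-based (matching the "offset: 30 on a 13-line file" example), and the notice wording and the `readFromOffset` name are invented here, not taken from read.ts.

```typescript
// Returns the file content starting at a 1-based line offset.
// Out-of-range offsets are clamped instead of throwing, and a notice is prepended.
function readFromOffset(content: string, offset: number): string {
  const lines = content.split("\n");
  const lastLine = lines.length;
  const clamped = Math.min(Math.max(1, Math.floor(offset)), lastLine);
  const text = lines.slice(clamped - 1).join("\n");
  if (clamped !== offset) {
    // Hypothetical notice text; the real tool's wording is not shown in the commit.
    return `[offset ${offset} out of range, clamped to line ${clamped} of ${lastLine}]\n${text}`;
  }
  return text;
}
```

Returning valid content with a notice keeps the agent's tool result well-formed, which is the point of the fix: the caller sees text it can continue from rather than an exception.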
Tom Boucher
70e3d9d6c2 fix(gsd): preserve queued milestones with worktrees in ghost detection (#3041)
isGhostMilestone() now checks for DB rows and worktree directories before
falling back to content-file detection. A milestone with a DB row or a
worktree is a legitimate milestone that hasn't been populated yet, not a
ghost from a killed session.

Fixes #2921

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:47:53 -06:00
Tom Boucher
3a1cedd7de fix(compaction): add chunked fallback when messages exceed model context window (#3038)
When a session grows beyond the context window of available models,
generateSummary() now detects the overflow and falls back to chunked
summarization: split messages into context-fitting chunks, summarize
the first chunk, then iteratively merge subsequent chunks using the
existing UPDATE_SUMMARIZATION_PROMPT path.

Closes #2932

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:47:41 -06:00
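The chunked fallback in the commit above amounts to a fold over context-sized chunks. The sketch below is an assumption-heavy illustration: it uses character counts as a stand-in for token counts, a synchronous `summarize` callback in place of the real (presumably asynchronous) model call, and invented function names.

```typescript
// Callback standing in for the model call. When previousSummary is given,
// it plays the role of the "update an existing summary" prompt path.
type Summarize = (chunk: string[], previousSummary?: string) => string;

// Greedily packs messages into chunks whose total length fits the budget.
// A single message larger than the budget still gets a chunk of its own.
function splitIntoChunks(messages: string[], budget: number): string[][] {
  const chunks: string[][] = [];
  let current: string[] = [];
  let used = 0;
  for (const msg of messages) {
    if (current.length > 0 && used + msg.length > budget) {
      chunks.push(current);
      current = [];
      used = 0;
    }
    current.push(msg);
    used += msg.length;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

function summarizeWithFallback(
  messages: string[],
  budget: number,
  summarize: Summarize,
): string {
  const total = messages.reduce((n, m) => n + m.length, 0);
  // Fast path: everything fits in one request.
  if (total <= budget) return summarize(messages);

  // Overflow: summarize the first chunk, then merge each later chunk into it.
  const chunks = splitIntoChunks(messages, budget);
  let summary = summarize(chunks[0]);
  for (const chunk of chunks.slice(1)) {
    summary = summarize(chunk, summary);
  }
  return summary;
}
```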
Tom Boucher
dfb4fbecef fix: preserve interactive terminal across tab switches and project changes (#3055)
Two root causes destroyed terminal state during normal navigation:

1. The pagehide handler fired a shutdown beacon unconditionally, but on
   mobile/Safari tab switches pagehide fires with event.persisted=true
   (bfcache entry). This killed the server and all PTY sessions when the
   user merely switched browser tabs. Fix: check event.persisted and skip
   the beacon when the page is being cached, not unloaded.

2. ShellTerminal used project-agnostic session IDs ("default"), so
   switching projects and switching back either collided with the old
   session or spawned a new one, losing terminal state. Fix: scope session
   IDs by project path (e.g. "default:/path/to/project") so the server's
   getOrCreateSession returns the existing live PTY on reconnect.

Closes #2701

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:46:09 -06:00
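Both fixes in the commit above reduce to small pure decisions, sketched here. The function names are hypothetical. The `persisted` field is the standard PageTransitionEvent property, and the id format follows the "default:/path/to/project" example in the commit.

```typescript
// Mirrors the one field of PageTransitionEvent that matters here.
interface PageHideLike {
  persisted: boolean;
}

// persisted === true means the page is entering the back/forward cache,
// so the server should be left running.
function shouldSendShutdownBeacon(event: PageHideLike): boolean {
  return !event.persisted;
}

// Scopes a terminal session id to a project so that switching projects
// and back reconnects to the same live session.
function scopedSessionId(baseId: string, projectPath: string): string {
  return `${baseId}:${projectPath}`;
}
```

In a browser, the first function would be consulted inside a `pagehide` listener before calling `navigator.sendBeacon`; the beacon endpoint itself is not named in the commit.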
Tom Boucher
fb10141e9b fix: call cleanupQuickBranch on turn_end to squash-merge quick branch back (#3054)
cleanupQuickBranch() was exported from quick.ts but never called anywhere.
After a /gsd quick task completed, the user was left on the quick branch
with orphaned state in quick-return.json.

Register a turn_end hook in register-hooks.ts that calls cleanupQuickBranch()
after each agent turn. The function is already idempotent (no-op when no
quick-return state is pending), so it is safe to call on every turn.

Closes #2668

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:46:03 -06:00
Tom Boucher
dfb18c6e62 fix: align run-uat artifact path to ASSESSMENT, preventing false stuck retries (#3053)
The run-uat prompt instructs the agent to save results via gsd_summary_save
with artifact_type: "ASSESSMENT", which writes S##-ASSESSMENT.md. But
resolveExpectedArtifactPath and diagnoseExpectedArtifact expected S##-UAT.md,
causing artifact verification to fail and auto-mode to retry indefinitely.

Align all three contract points (prompt uatResultPath, artifact resolution,
and diagnostic message) to use ASSESSMENT as the canonical artifact type.

Closes #2873

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:45:57 -06:00
Tom Boucher
fb0fb5582e fix: replace invalid Discord invite links with canonical URL (#3056)
Closes #2699

The Discord badge in README.md pointed to https://discord.gg/gsd (expired
vanity URL) and the Pi ecosystem doc used an old invite code. Both now use
the canonical invite https://discord.com/invite/nKXTsAcmbT that was
established in commit 0a1dad9a.

Adds a regression test that validates all Discord invite links in
user-facing files match the canonical URL.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:45:32 -06:00
Tom Boucher
fad23944e7 fix: add Windows shell guard to remaining spawn sites (#3058)
Three spawn call sites were missing `shell: process.platform === "win32"`,
causing ENOENT/EINVAL errors on Windows where npm-installed tools are .cmd
batch scripts that require shell resolution:

- exec.ts: hardcoded `shell: false` -> platform-guarded
- lsp/index.ts: missing shell option on project-type command spawn
- lsp/lspmux.ts: missing shell option on lspmux binary spawn

Adds a structural regression test that scans all spawn sites invoking
user-facing binaries and asserts the Windows shell guard is present.

Closes #2854

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:44:20 -06:00
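The guard quoted in the commit above, `shell: process.platform === "win32"`, can be factored into a helper so every spawn site shares it. The helper name is an assumption, and the platform is passed in as a parameter here only to keep the sketch testable without Node typings.

```typescript
// Returns the shell option for child_process.spawn on the given platform.
// On Windows, npm-installed tools are .cmd shims that need shell resolution.
function shellGuard(platform: string): { shell: boolean } {
  return { shell: platform === "win32" };
}
```

A call site would then look something like `spawn(cmd, args, { ...shellGuard(process.platform) })`.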
Tom Boucher
05b7cb95cb fix: route gsd auto to headless runner to prevent hang on piped stdin/stdout (#3057)
`gsd auto` was not handled as a subcommand — it fell through to the
interactive TUI, which hangs indefinitely when stdin/stdout are piped
(non-TTY). Add `auto` as a recognized subcommand that rewrites argv
and delegates to `runHeadless(parseHeadlessArgs(...))`, matching the
existing `gsd headless auto` behavior.

Also adds `gsd auto` to TTY error hints and help text.

Closes #2732

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:44:04 -06:00
Tom Boucher
df9e06cfa5 fix: respect .gitignore for .gsd/ in rethink prompt (#3059)
* fix: respect .gitignore for .gsd/ in rethink prompt (#2570)

The rethink.md prompt template hardcoded `git add .gsd/` which caused
the executing agent to force-add .gsd/ files (via `git add -f`) when
.gsd was listed in .gitignore. This silently overrode the user's
gitignore configuration, tracking planning artifacts they explicitly
excluded.

- Add `isGsdGitignored()` utility that uses `git check-ignore` to
  detect when .gsd is covered by .gitignore rules
- Replace hardcoded `git add .gsd/` in rethink.md with the
  `{{commitInstruction}}` template variable (consistent with all
  other prompt templates)
- Pass gitignore-aware commit instruction from rethink.ts: skip
  commit when .gsd is gitignored, include git add only when it is not

Closes #2570

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* ci: re-trigger checks

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:43:56 -06:00
Tom Boucher
e71de432ab fix: migrate unit ownership from JSON to SQLite to eliminate read-modify-write race (#3061)
The JSON-based unit-claims storage had a lost-update race under concurrent
multi-agent use: two agents could both read the file as unclaimed, then both
write their claim, with the second silently overwriting the first.

Replace with a SQLite-backed store using INSERT OR IGNORE on a PRIMARY KEY
constraint for atomic first-writer-wins claim semantics. claimUnit() now
returns boolean (true = claimed, false = already claimed by another agent).

Closes #2728

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:43:44 -06:00
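The first-writer-wins semantics from the commit above are illustrated below with an in-memory Map standing in for the SQLite table. The table and column names in the comment are hypothetical, and the class is a stand-in: the atomicity in the real store comes from the PRIMARY KEY constraint inside the database, not from JavaScript.

```typescript
// Roughly the SQL the commit describes (names are assumptions):
//   CREATE TABLE unit_claims (unit_id TEXT PRIMARY KEY, agent_id TEXT NOT NULL);
//   INSERT OR IGNORE INTO unit_claims (unit_id, agent_id) VALUES (?, ?);
// The insert affects one row for the first writer and zero rows afterwards.
class InMemoryClaimStore {
  private claims = new Map<string, string>();

  // true = this call created the claim, false = the unit was already claimed.
  claimUnit(unitId: string, agentId: string): boolean {
    if (this.claims.has(unitId)) return false;
    this.claims.set(unitId, agentId);
    return true;
  }

  ownerOf(unitId: string): string | undefined {
    return this.claims.get(unitId);
  }
}
```

Unlike the JSON read-modify-write path, a second claimant can never overwrite the first. Here a repeat claim by the same agent also returns false; whether the real claimUnit() treats that case differently is not stated.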
Tom Boucher
e78dca41d4 fix(roadmap): handle numbered, bracketed, and indented prose H3 headers in slice parser (#3063)
The prose slice header fallback parser failed to extract slices when
LLMs generated common formatting variants: numbered prefixes (### 1. S01),
parenthetical numbering (### (1) S01), bracketed IDs (### [S01]), or
indented headings (  ### S01). This caused auto-mode to permanently block
with "No slice eligible" when the plan-milestone prompt produced these
formats inside a ## Slices section.

Broadened the parseProseSliceHeaders regex to accept optional leading
whitespace, numeric prefixes, parenthetical numbering, and square brackets
around slice IDs.

Closes #2567

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:43:33 -06:00
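A regex accepting the header variants listed in the commit above could look like the following. This is an illustrative pattern, not the one in parseProseSliceHeaders: the function name, the return shape, and the treatment of a trailing ":" or "-" separator are assumptions.

```typescript
interface SliceHeader {
  id: string;
  title: string;
}

// Accepts: "### S01", "### 1. S01", "### (1) S01", "### [S01]", "  ### S01".
const SLICE_HEADER =
  /^\s*###\s+(?:\d+[.)]\s+|\(\d+\)\s+)?\[?(S\d+)\]?\s*[:-]?\s*(.*)$/;

function parseSliceHeader(line: string): SliceHeader | null {
  const match = SLICE_HEADER.exec(line);
  if (!match) return null;
  return { id: match[1], title: match[2].trim() };
}
```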
Tom Boucher
8b680179e2 fix: add worktree-merge to resolveModelWithFallbacksForUnit switch and update KNOWN_UNIT_TYPES (#3066)
Closes #2900

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:43:22 -06:00
Tom Boucher
571b382075 fix: clean up MERGE_HEAD on all error paths in mergeMilestoneToMain (#2912) (#3068)
libgit2's merge implementation creates MERGE_HEAD even for squash merges,
unlike CLI git. When the merge fails with conflicts, the error paths in
mergeMilestoneToMain cleaned SQUASH_MSG and MERGE_MSG but left MERGE_HEAD
on disk. This blocked all subsequent merge attempts and caused doctor to
report corrupt merge state.

Add MERGE_HEAD cleanup (via nativeMergeAbort + explicit unlink) to:
- The code-conflict error path (before MergeConflictError throw)
- The dirty-working-tree error path (defensive)
- The success path (alongside existing SQUASH_MSG cleanup)

Closes #2912

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:43:02 -06:00
Tom Boucher
9dc6a6a97d fix: prevent LLM from confusing background task output with user input (#3069)
* fix: wrap custom messages with system notification prefix in LLM context

Background job completion notifications (delivered as custom messages via
sendMessage with deliverAs: "followUp") were converted to plain role: "user"
messages in convertToLlm(), making them indistinguishable to the LLM from
actual human input. This caused the agent to confuse background task output with
user messages, responding to job completions as if the user had typed them.

Wrap all custom messages with a clear system notification prefix that
includes the customType and an explicit instruction that the content is
an automated system event, not user input. This follows the same pattern
used by branchSummary and compactionSummary messages which already use
structured prefixes/suffixes.

Closes #3026

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve TS import extension and type errors in messages test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:42:56 -06:00
Tom Boucher
43ece11be7 fix: add openai-codex provider and modern OpenAI models to MODEL_CAPABILITY_TIER and cost tables (#3070)
Closes #2885

The MODEL_CAPABILITY_TIER map in model-router.ts and the BUNDLED_COST_TABLE
in model-cost-table.ts were missing all openai-codex provider models
(gpt-5.1, gpt-5.2, gpt-5.3-codex, gpt-5.4, etc.) and modern OpenAI models
(o4-mini, gpt-4.1, gpt-5, gpt-5-mini, gpt-5-nano, gpt-5-pro). This caused
dynamic routing to treat these models as unknown (falling back to the
isKnownModel guard) and cost comparisons to assign them 999 (the "unknown,
assume expensive" fallback).

Added 17 new model entries to MODEL_CAPABILITY_TIER across all three tiers,
matching the tier assignments from the issue. Added corresponding entries to
both MODEL_COST_PER_1K_INPUT (model-router.ts) and BUNDLED_COST_TABLE
(model-cost-table.ts). Updated the #2192 test fixture that used gpt-5.4 as
an "unknown" model since it is now known.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:41:13 -06:00
Tom Boucher
cb26d71483 fix: preserve active tab when switching projects (#3071)
Closes #2711

Two changes fix the tab-reset-to-dashboard bug:

1. Remove the forced `gsd:navigate-view` dispatch to "dashboard" in
   ProjectsPanel.handleSelectProject — this was unconditionally resetting
   the view on every project switch.

2. Add a useEffect in WorkspaceChrome that resets `viewRestored` when
   `projectPath` changes, so the per-project sessionStorage view restore
   fires for the newly-selected project.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:41:05 -06:00
Tom Boucher
501fb83606 fix: include project name in desktop notifications (#3072)
Desktop notifications now display "GSD — projectName" instead of just
"GSD", making it clear which project a notification belongs to when
multiple projects are active.

Closes #2708

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:40:58 -06:00
Tom Boucher
5f660bf3ce fix: recover from many-image dimension overflow by stripping older images (#3075)
When a session accumulates many images (screenshots, file reads), the
Anthropic API enforces a 2000px dimension limit for "many-image requests"
and returns a 400 error. Previously this error was not classified as
retryable, causing the session to get permanently stuck in an error loop
with no recovery path.

This adds automatic recovery: detect the specific "image dimensions exceed
max allowed size for many-image requests" error, strip older images from
the conversation history (keeping the 5 most recent), and auto-retry.
Also handles manual retry (continue/retry) by downsizing before retrying.

Closes #2874

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:40:35 -06:00
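The recovery step in the commit above (keep the 5 most recent images, strip the rest) can be sketched as a pass over the history from newest to oldest. The message and block shapes are simplified assumptions, and replacing a stripped image with a text placeholder is a choice made here so that no message ends up with empty content. The commit does not say whether the real code uses a placeholder.

```typescript
type Block =
  | { type: "text"; text: string }
  | { type: "image"; data: string };

interface Message {
  role: string;
  content: Block[];
}

// Returns a copy of the history in which only the `keep` newest images survive.
function stripOlderImages(messages: Message[], keep: number = 5): Message[] {
  let budget = keep;
  const result: Message[] = [];
  // Walk newest to oldest so the budget is spent on the most recent images.
  for (let i = messages.length - 1; i >= 0; i--) {
    const original = messages[i].content;
    const kept: Block[] = [];
    for (let j = original.length - 1; j >= 0; j--) {
      const block = original[j];
      if (block.type === "image" && budget > 0) {
        budget--;
        kept.unshift(block);
      } else if (block.type === "image") {
        // Hypothetical placeholder text.
        kept.unshift({ type: "text", text: "[image removed]" });
      } else {
        kept.unshift(block);
      }
    }
    result.unshift({ role: messages[i].role, content: kept });
  }
  return result;
}
```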
Tom Boucher
a3fc400c09 fix: resolve bare model IDs to anthropic over claude-code provider (#3076)
When the claude-code-cli extension is active and the session provider is
claude-code, bare model IDs in PREFERENCES.md (e.g. "claude-sonnet-4-6")
silently resolve to claude-code/* instead of anthropic/*, routing all
dispatch through the Claude Code CLI subprocess with different context,
tool visibility, and cost characteristics.

The fix introduces provider precedence in resolveModelId(): extension
providers like claude-code are deprioritized for bare ID resolution.
First-class API providers (anthropic, bedrock, openai, azure, etc.)
retain the existing current-provider preference behavior. When the
session provider is an extension, resolution falls through to prefer
anthropic, then any non-extension provider, preserving backward
compatibility for all existing provider combinations.

Closes #2905

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:40:26 -06:00
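The precedence rule in the commit above can be sketched as follows. The registry shape, the `EXTENSION_PROVIDERS` list, and the function name are assumptions. The fallthrough for a first-class session provider that has no matching model is also a guess, since the commit only specifies the extension-provider case.

```typescript
interface ModelEntry {
  provider: string;
  id: string;
}

// Hypothetical list; the commit names claude-code as one extension provider.
const EXTENSION_PROVIDERS = ["claude-code"];

function isExtensionProvider(provider: string): boolean {
  return EXTENSION_PROVIDERS.indexOf(provider) !== -1;
}

function resolveBareModelId(
  bareId: string,
  sessionProvider: string,
  registry: ModelEntry[],
): ModelEntry | undefined {
  const candidates = registry.filter((m) => m.id === bareId);
  if (candidates.length === 0) return undefined;

  // First-class providers keep the existing "prefer current provider" rule.
  if (!isExtensionProvider(sessionProvider)) {
    const same = candidates.find((m) => m.provider === sessionProvider);
    if (same) return same;
  }

  // Otherwise prefer anthropic, then any non-extension provider,
  // and only then fall back to an extension provider.
  return (
    candidates.find((m) => m.provider === "anthropic") ??
    candidates.find((m) => !isExtensionProvider(m.provider)) ??
    candidates[0]
  );
}
```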
Tom Boucher
e7351fbd75 fix(auto): move selectAndApplyModel before updateProgressWidget (#3079)
Closes #2907

selectAndApplyModel was called after updateProgressWidget and the prompt
injection block, so the dashboard showed the previous unit's model label
and a second call (if added by a future #2899 fix) would overwrite the
first result. Move the single selectAndApplyModel call to before
updateProgressWidget so the model is resolved before the widget renders
and there is exactly one call per unit dispatch.

Adds a structural regression test that asserts selectAndApplyModel
appears exactly once in runUnitPhase and before updateProgressWidget.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:40:03 -06:00
Tom Boucher
cbb9c2edd9 fix: detect project relocation and recover state without data loss (#3080)
* fix: detect project relocation and recover state without data loss

For repos with a remote URL, compute identity as SHA256(remoteUrl) only,
dropping the git root path from the hash. This makes the identity stable
across directory moves/renames -- the most common cause of silent data loss.

For local-only repos, write a .gsd-id marker file in the project root that
records the identity hash. After a move, ensureGsdSymlink reads the marker,
finds the orphaned state directory, and migrates data to the new identity
path automatically.

Also handles the upgrade migration: when an existing .gsd symlink points
to a valid state dir under the old hash format, data is transparently
migrated to the new remote-only hash path.

Closes #2750

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: handle existing symlink in project-relocation recovery test

Add defensive unlinkSync calls before symlinkSync in ensureGsdSymlinkCore
to prevent EEXIST race conditions when a dangling or residual symlink
exists at the .gsd path during project relocation recovery.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:39:50 -06:00
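The identity scheme in the commit above can be sketched with Node's built-in crypto module. Only "SHA256(remoteUrl)" and the .gsd-id marker come from the commit. The function name, the option names, the hex encoding, and the path-based fallback for a local-only repo without a marker are assumptions.

```typescript
import { createHash } from "node:crypto";

function sha256Hex(input: string): string {
  return createHash("sha256").update(input).digest("hex");
}

interface IdentityInput {
  remoteUrl?: string; // remote URL, when the repo has one
  markerId?: string;  // contents of a .gsd-id marker file, when present
  gitRoot: string;    // absolute path of the repository root
}

function projectIdentity(input: IdentityInput): string {
  // Repos with a remote: the path is deliberately left out of the hash,
  // so moving or renaming the directory keeps the same identity.
  if (input.remoteUrl) return sha256Hex(input.remoteUrl);
  // Local-only repos: the marker file carries the identity across moves.
  if (input.markerId) return input.markerId;
  // Assumed fallback for a fresh local-only repo with no marker yet.
  return sha256Hex(input.gitRoot);
}
```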
Tom Boucher
6ac30ae9d9 fix: add free-text input to ask-user-questions when "None of the above" is selected (#3081)
Fixes #2715

The ask-user-questions UI trapped users in a re-asking loop when the agent
needed a free-text explanation rather than a fixed choice. Two paths were
affected:

- RPC fallback (ctx.ui.select): selecting "None of the above" recorded
  the label but never prompted for a free-text explanation. Now follows up
  with ctx.ui.input() so the user can type their answer.

- Custom interview UI: selecting "None of the above" advanced to the next
  question without opening the notes editor. Now auto-focuses the notes
  field so the user can immediately type a free-text response.

Also updated the "None of the above" description from "Press TAB to add
optional notes." to "Select to type your own answer." for discoverability.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:39:35 -06:00
Tom Boucher
a09ae1de26 fix: block work execution during /gsd queue mode (#2545) (#3082)
When /gsd queue was active, the agent had unrestricted access to all
tools and would execute described work instead of creating milestones.
The queue prompt instructed milestone-only behavior, but the system
prompt's "execute with full commitment" directive dominated.

Add a mechanical tool gate (shouldBlockQueueExecution) that blocks
write/edit to non-.gsd/ paths and mutating bash commands when queue
phase is active. Read-only tools, discussion tools, and .gsd/ artifact
writes remain allowed. This enforces the queue contract at the tool
layer rather than relying solely on prompt compliance.

Closes #2545

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:39:26 -06:00
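A tool gate of the kind described in the commit above might be shaped like this. The name shouldBlockQueueExecution comes from the commit, but its signature, the `ToolCall` type, and the read-only command allowlist are assumptions made for illustration. The bash check is deliberately conservative and is not a complete shell parser.

```typescript
type ToolCall =
  | { tool: "read"; path: string }
  | { tool: "write" | "edit"; path: string }
  | { tool: "bash"; command: string };

// Hypothetical allowlist of commands treated as read-only.
const READ_ONLY_COMMANDS = ["ls", "cat", "grep", "git status", "git log", "git diff"];

function isGsdPath(path: string): boolean {
  const p = path.replace(/\\/g, "/");
  return p.indexOf(".gsd/") === 0 || p.indexOf("/.gsd/") !== -1;
}

function shouldBlockQueueExecution(call: ToolCall, queueActive: boolean): boolean {
  if (!queueActive) return false;
  switch (call.tool) {
    case "read":
      return false;
    case "write":
    case "edit":
      // Only .gsd/ artifact writes are allowed while queueing.
      return !isGsdPath(call.path);
    case "bash": {
      const cmd = call.command.trim();
      // Redirection, pipes, and chaining are blocked outright in this sketch.
      if (/[>|;&]/.test(cmd)) return true;
      return !READ_ONLY_COMMANDS.some(
        (ro) => cmd === ro || cmd.indexOf(ro + " ") === 0,
      );
    }
  }
}
```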
Tom Boucher
eb40f74cfe fix: detect worktree basePath in gsdRoot() to prevent escaping to project root (#3083)
When gsdRoot() is called with a basePath inside .gsd/worktrees/<name>/,
the git-root probe and walk-up logic can escape to the project root's .gsd
directory. This causes ensurePreconditions() to create slice directories
in the wrong location and deriveState() to read stale project-root state
instead of worktree-local state.

Add isInsideGsdWorktree() guard that detects the .gsd/worktrees/<name>/
pattern in the basePath before the git rev-parse probe runs. When detected,
return the worktree-local .gsd path immediately. Also check the
symlink-resolved path for the pattern (handles macOS /tmp -> /private/tmp).

Closes #2594

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:39:15 -06:00
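The path-pattern guard from the commit above can be sketched with a single regex. The function names and the choice to return the worktree root (rather than a boolean, as isInsideGsdWorktree presumably does) are assumptions. Symlink resolution, which the commit also mentions, is left out.

```typescript
// Returns the worktree root when basePath is at or below
// <project>/.gsd/worktrees/<name>/, otherwise null.
function findGsdWorktreeRoot(basePath: string): string | null {
  const normalized = basePath.replace(/\\/g, "/");
  const match = /^(.*\/\.gsd\/worktrees\/[^/]+)(?:\/|$)/.exec(normalized);
  return match ? match[1] : null;
}

// The worktree-local .gsd directory, resolved without any git probe.
function worktreeLocalGsdPath(basePath: string): string | null {
  const root = findGsdWorktreeRoot(basePath);
  return root ? `${root}/.gsd` : null;
}
```

Checking the path before running `git rev-parse` is what keeps the lookup from walking up into the project root's .gsd directory.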
Tom Boucher
a9d881ad8c fix: invalidate stale quick-task captures across milestone boundaries (#3084)
Closes #2872

Quick-task captures resolved in a prior milestone were re-executed in
subsequent sessions because loadActionableCaptures() used the Executed
flag as its sole staleness gate. Milestone completion never marked
captures as executed, so captures whose issues were fixed by planned
work remained permanently actionable.

Three changes fix this:

1. Track which milestone a capture was resolved in (new **Milestone:**
   field in CAPTURES.md, written by markCaptureResolved and the triage
   prompt). loadActionableCaptures() now accepts an optional
   currentMilestoneId and excludes captures from prior milestones.

2. Add a verification step to buildQuickTaskPrompt() instructing the
   agent to confirm the issue still exists before making changes.

3. Add stampCaptureMilestone() as a reconciliation safety net --
   executeTriageResolutions() stamps actionable captures that are
   missing the Milestone field with the current milestone ID.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:39:07 -06:00
Tom Boucher
6e22a20580 fix: defer model validation until after extensions register (#3089)
* fix: defer model validation until after extensions register (#2626)

Extension-provided models (e.g. claude-code/claude-sonnet-4-6) were
silently overwritten on every startup because the model validation ran
before createAgentSession(), which is where extensions register their
models in the ModelRegistry. At validation time, extension models did
not exist in the registry, so the user's valid choice was replaced
with a built-in fallback.

Extract validation into validateConfiguredModel() and call it after
createAgentSession() in both print-mode and interactive-mode paths.
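
The ordering problem can be reproduced in a few lines. The sketch below is illustrative only: `ModelRegistrySketch`, `validateConfiguredModelSketch`, and the fallback ID `built-in/default` are hypothetical stand-ins, not the project's actual types or model names.

```typescript
// Hypothetical registry: built-in models are known up front,
// extension models only after the extensions have registered them.
class ModelRegistrySketch {
  private readonly ids = new Set<string>();

  constructor(builtIns: string[]) {
    for (const id of builtIns) this.ids.add(id);
  }

  register(id: string): void {
    this.ids.add(id);
  }

  has(id: string): boolean {
    return this.ids.has(id);
  }
}

// Keep the configured model if the registry knows it, otherwise fall back.
function validateConfiguredModelSketch(
  registry: ModelRegistrySketch,
  configured: string,
  fallback: string,
): string {
  return registry.has(configured) ? configured : fallback;
}

const registry = new ModelRegistrySketch(["built-in/default"]);
const configured = "claude-code/claude-sonnet-4-6";

// Validating too early: the extension model is not registered yet.
const tooEarly = validateConfiguredModelSketch(registry, configured, "built-in/default");

// Session creation is where extensions register their models (simulated here).
registry.register(configured);

// Validating afterwards keeps the user's choice.
const afterRegistration = validateConfiguredModelSketch(registry, configured, "built-in/default");
```

Here `tooEarly` is the fallback and `afterRegistration` is the configured model, which is the difference the commit is addressing.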

Closes #2626

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: align MinimalSettingsManager interface with SettingsManager

The MinimalSettingsManager interface used `string` for thinking level
types, but SettingsManager uses a specific union type and returns
`undefined`. This caused TS2345 at cli.ts lines 448 and 587.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:38:10 -06:00
Tom Boucher
2f3ffbfc10 fix: repair YAML bullet lists in malformed tool-call JSON (#3090)
* fix: repair YAML bullet lists in malformed tool-call JSON (#2660)

When LLMs copy YAML template formatting into tool-call arguments, they
produce `"key": - item` instead of `"key": ["item"]`, causing JSON parse
errors that block milestone completion. Add a repairToolJson() utility
that detects and converts YAML-style bullet lists into JSON arrays before
parsing. Integrated into both the PartialMessageBuilder (claude-code-cli)
and the anthropic-shared streaming provider, with fallback in
parseStreamingJson for all other providers.
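
As an illustration of the repair idea, here is a minimal line-based sketch. It is not the project's repairToolJson(): the function name is hypothetical, and it assumes each bullet sits on its own line (the first may share a line with the key) and that the list ends at the first line that is not a bullet.

```typescript
// A line ending in `"key":`, optionally followed by a first `- item` bullet.
const KEY_LINE = /^([\s{,]*"(?:[^"\\]|\\.)*"\s*:)\s*(?:-\s+(.*))?$/;
const BULLET_LINE = /^\s*-\s+(.*)$/;

function repairYamlBulletsSketch(raw: string): string {
  const lines = raw.split("\n");
  const out: string[] = [];
  let i = 0;
  while (i < lines.length) {
    const line = lines[i] ?? "";
    const key = KEY_LINE.exec(line);
    if (!key) {
      out.push(line);
      i++;
      continue;
    }
    // Collect the inline bullet (if any) plus all following bullet lines.
    const items: string[] = [];
    if (key[2] !== undefined) items.push(key[2].trim());
    let j = i + 1;
    while (j < lines.length) {
      const bullet = BULLET_LINE.exec(lines[j] ?? "");
      if (!bullet) break;
      items.push((bullet[1] ?? "").trim());
      j++;
    }
    if (items.length === 0) {
      // `"key":` with the value on the next line is left untouched.
      out.push(line);
      i++;
      continue;
    }
    // YAML lists carry no commas, so add one if another key follows.
    const next = lines.slice(j).find((l) => l.trim() !== "");
    const needsComma = next !== undefined && next.trimStart().startsWith('"');
    out.push(`${key[1]} ${JSON.stringify(items)}${needsComma ? "," : ""}`);
    i = j;
  }
  return out.join("\n");
}
```

Input that is already valid JSON passes through unchanged, because a value such as `[1, 2]` or `-1` does not match the bullet shape (a dash followed by whitespace).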

Closes #2660

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use .js import extension in repair-tool-json test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:37:09 -06:00
Tom Boucher
c924f9f1f8 fix: unify SUMMARY.md render paths for projection fidelity (#3091)
* fix: unify SUMMARY.md render paths for projection fidelity

Closes #2720

renderSummaryMarkdown (complete-task.ts) and renderSummaryContent
(workflow-projections.ts) produced structurally different output for the
same data — different frontmatter format, different sections, different
formatting. Deleting a SUMMARY.md and regenerating it via projection
yielded a different file than the original.

Fix: make renderSummaryContent the single source of truth. complete-task
now builds a TaskRow from params and delegates to renderSummaryContent.
The projection renderer passes verification evidence from the DB so
both paths produce identical output including the Verification Evidence
table, Files Created/Modified section, and YAML-format frontmatter.

Added getVerificationEvidence() to gsd-db for projection-time evidence
retrieval, and a 22-assertion parity test that prevents future drift.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: safe type assertion for verification evidence query result

Cast through `unknown` to satisfy TS2352 — better-sqlite3's `.all()`
returns `Record<string, unknown>[]` which doesn't directly overlap with
`VerificationEvidenceRow[]`.
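
For readers unfamiliar with the pattern, a double assertion through `unknown` looks like this. The row fields and the `queryAllSketch` stand-in are hypothetical; only the cast itself reflects what the commit describes.

```typescript
// Hypothetical row shape; the real VerificationEvidenceRow may have other columns.
interface VerificationEvidenceRowSketch {
  command: string;
  exit_code: number;
  verdict: string;
}

// Stand-in for a database call that returns loosely typed rows.
function queryAllSketch(): Record<string, unknown>[] {
  return [{ command: "npm test", exit_code: 0, verdict: "pass" }];
}

// Asserting straight to the row type is what the commit reports as TS2352;
// going through `unknown` first is always accepted by the compiler.
const evidenceRows = queryAllSketch() as unknown as VerificationEvidenceRowSketch[];

// The cast performs no runtime check, so a small guard is a reasonable companion.
function isEvidenceRowSketch(row: Record<string, unknown>): boolean {
  return (
    typeof row.command === "string" &&
    typeof row.exit_code === "number" &&
    typeof row.verdict === "string"
  );
}
```

The trade-off is that `as unknown as T` silences the compiler entirely, so correctness depends on the query actually returning those columns.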

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:36:37 -06:00
Tom Boucher
3e78270cad fix: chat mode misrepresents terminal output, looks stuck, omits user messages (#3092)
Three root causes addressed:

1. PtyChatParser: user input echoed after a bare prompt line (e.g. "❯ \n"
   followed by "hello\n") was misclassified as assistant content. Added an
   _awaitingInput flag that is set at each prompt boundary, so the next
   content line is classified as role=user.

2. Chat mode "looks stuck": when the session is idle (connected, not
   streaming, has timeline content), no visual cue indicated GSD was waiting
   for input. Added a "Ready for your input" indicator with a pulsing dot.

3. Transcript overflow misalignment: chatUserMessages was not trimmed when
   liveTranscript/completedTurnSegments overflowed MAX_TRANSCRIPT_BLOCKS,
   causing index-based interleaving to pair user messages with wrong
   assistant responses.

Also exposed isAwaitingInput() on PtyChatParser so chat UIs can query
whether the session is waiting for user input, and widened the > and $
prompt marker regexes to match bare prompts after trimEnd strips trailing
whitespace.
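
The prompt-boundary classification from root cause 1 can be sketched as a small state machine. This is an assumption-laden illustration, not the PtyChatParser implementation: the class name, the `ChatLineSketch` shape, and the exact set of prompt markers are hypothetical.

```typescript
type RoleSketch = "user" | "assistant";

interface ChatLineSketch {
  role: RoleSketch;
  text: string;
}

// Bare prompt markers after trailing whitespace has been stripped (assumed set).
const BARE_PROMPT = /^[❯>$]$/;

class PromptAwareParserSketch {
  private awaitingInput = false;
  readonly lines: ChatLineSketch[] = [];

  feed(chunk: string): void {
    for (const raw of chunk.split("\n")) {
      const line = raw.trimEnd();
      if (line === "") continue;
      if (BARE_PROMPT.test(line)) {
        // Prompt boundary: the next content line is the echoed user input.
        this.awaitingInput = true;
        continue;
      }
      const role: RoleSketch = this.awaitingInput ? "user" : "assistant";
      this.awaitingInput = false;
      this.lines.push({ role, text: line });
    }
  }

  // Lets a chat UI decide whether to show a "ready for input" indicator.
  isAwaitingInput(): boolean {
    return this.awaitingInput;
  }
}
```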

Closes #2707

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:36:21 -06:00
Tom Boucher
4327b4bb3f fix: resolve 4 state corruption bugs in milestone/slice completion (#2945) (#3093)
* fix: resolve 4 state corruption bugs in milestone/slice completion workflow

Closes #2945

Bug 1 - ROADMAP corrupted by inline UAT content:
renderRoadmapContent and renderPlanContent used slice.full_uat_md as a
fallback when the demo field was empty. This injected multi-line UAT
content (preconditions, steps, expected results) into table cells,
corrupting the markdown table and making subsequent slices invisible to
the parser. Fix: use "TBD" fallback instead of full_uat_md.

Bug 2 - complete-milestone accepts pending slices via event replay:
workflow-reconcile's replayEvents blindly called updateSliceStatus("done")
for complete_slice events without validating that all tasks in the slice
were actually complete. During API overload or partial execution, this
allowed slices with pending tasks to be marked done, which then let
complete-milestone succeed. Fix: extract replaySliceComplete function that
validates task completion before updating slice status.

Bug 3 - Worktree directory not cleaned up after merge:
WorktreeResolver._mergeWorktreeMode delegated worktree cleanup to
mergeMilestoneToMain's internal best-effort removeWorktree call, which
can silently fail. Fix: add secondary teardownAutoWorktree call after
successful merge to ensure cleanup.

Bug 4 - Quality gate records not written by validate-milestone:
handleValidateMilestone wrote to the assessments table and rendered
VALIDATION.md to disk, but never persisted quality_gates records in the
DB. Fix: insert milestone-level quality gates (MV01-MV04) alongside the
assessment record. Extended GateScope and GateId types to support
milestone-level validation.
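
The Bug 2 fix amounts to validating before mutating. A minimal sketch, assuming a simplified in-memory slice shape (the types and the function name are hypothetical, and the real code works against the DB):

```typescript
type StatusSketch = "pending" | "done";

interface SliceSketch {
  id: string;
  status: StatusSketch;
  tasks: { id: string; status: StatusSketch }[];
}

interface ReplayResultSketch {
  applied: boolean;
  pendingTaskIds: string[];
}

// Replay a complete_slice event only if every task in the slice is done.
function replaySliceCompleteSketch(slice: SliceSketch): ReplayResultSketch {
  const pendingTaskIds = slice.tasks
    .filter((task) => task.status !== "done")
    .map((task) => task.id);
  if (pendingTaskIds.length > 0) {
    // Leave the slice untouched and report what blocked the replay.
    return { applied: false, pendingTaskIds };
  }
  slice.status = "done";
  return { applied: true, pendingTaskIds: [] };
}
```

Returning the blocking task IDs rather than a bare boolean is a design choice made for this sketch; it lets the caller log why an event was skipped.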

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: align test type literals with MilestoneRow, SliceRow, and AutoSession

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: update tests for removed full_summary_md fallback and FK constraints

- workflow-projections: renderPlanContent test now expects TBD fallback
  instead of full_summary_md (removed in #2945 to prevent corruption)
- validate-milestone: insert slice rows before validation so
  quality_gates FK constraint (milestone_id, slice_id) is satisfied
- worktree-resolver: update teardownAutoWorktree assertion from 0 to 1
  to account for secondary cleanup added in #2945 Bug 3

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:36:07 -06:00
Tom Boucher
a02b140f61 fix: isolate guided-flow session state and key discussion milestone queries (#2985) (#3094)
* fix: resolve 4 correctness bugs in GSD extension core (#2985)

Bug 1 — preferences.ts process.cwd() side-channel:
  loadEffectiveGSDPreferences() and loadProjectGSDPreferences() now accept
  an optional projectRoot parameter.  When provided, preferences are loaded
  from the specified project directory instead of relying on process.cwd().
  All 37+ callers continue to work unchanged (parameter defaults to cwd).

Bug 2 — state.ts DB writes inside read functions (CQS violation):
  Extracted disk-to-DB milestone reconciliation into a new exported function
  reconcileDiskMilestonesToDb().  deriveState() and deriveStateFromDb() no
  longer write to the DB as a side effect of reading state.  Callers that
  need reconciliation (auto-start.ts, guided-flow.ts, register-hooks.ts)
  now call it explicitly before reading state.

Bug 3 — guided-flow.ts module-level session state:
  Converted pendingAutoStart from a module-level singleton to a Map keyed
  by basePath.  Concurrent discuss sessions for different projects are now
  independent — the second session no longer silently overwrites the first.

Bug 4 — getDiscussionMilestoneId() unkeyed query:
  getDiscussionMilestoneId() now accepts an optional basePath parameter for
  keyed lookup.  When multiple sessions exist and no basePath is provided,
  it returns null instead of an arbitrary entry.  Single-session backward
  compatibility is preserved.
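
Bugs 3 and 4 together describe a move from a module-level singleton to keyed state. The sketch below shows the idea under assumptions: the entry shape and the function names are hypothetical, and only the keyed-lookup behaviour is taken from the description above.

```typescript
interface PendingAutoStartSketch {
  basePath: string;
  milestoneId: string;
}

// Keyed by basePath, so sessions for different projects do not overwrite each other.
const pendingAutoStartSketch = new Map<string, PendingAutoStartSketch>();

function setPendingAutoStartSketch(basePath: string, milestoneId: string): void {
  pendingAutoStartSketch.set(basePath, { basePath, milestoneId });
}

function getDiscussionMilestoneIdSketch(basePath?: string): string | null {
  if (basePath !== undefined) {
    const entry = pendingAutoStartSketch.get(basePath);
    return entry ? entry.milestoneId : null;
  }
  // Unkeyed lookup is only unambiguous when exactly one session exists.
  if (pendingAutoStartSketch.size !== 1) return null;
  const only = Array.from(pendingAutoStartSketch.values())[0];
  return only ? only.milestoneId : null;
}
```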

Closes #2985

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: narrow scope to complement igouss PRs #2986 and #2987

Revert changes to preferences.ts (Bug 1), state.ts, auto-start.ts,
register-hooks.ts (Bug 2), and their test files. Those fixes are
covered by @igouss in PRs #2986 and #2987.

This PR now only contains:
- Bug 3: guided-flow.ts pendingAutoStart singleton → Map (session isolation)
- Bug 4: getDiscussionMilestoneId() keyed by basePath
- Supporting unitType additions in preferences-models.ts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: align source code with test expectations after scope narrowing

The refactor commit (6972c97c) reverted source changes to state.ts,
preferences.ts, and auto-start.ts but left their corresponding test
assertions in place, causing 8 CI failures. This commit restores the
source changes those tests expect:

- isValidationTerminal: treat any extracted verdict as terminal (#2769)
- parseHeadingListFormat: handle raw YAML blocks under headings (#2794)
- bootstrapAutoSession: snapshot ctx.model before guided-flow (#2829)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: open project DB before initial deriveState on cold bootstrap (#2841)

When auto-mode starts cold (no prior DB handle), deriveState silently
falls back to markdown-only data for DB-backed helpers (queue-order,
task status), producing stale or incomplete state.  Add
openProjectDbIfPresent() helper that resolves the project-root DB path
and opens it before the first deriveState call, ensuring full data
visibility from the start.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:33:30 -06:00
Tom Boucher
45bd2572ac fix(guided-flow): route dispatchWorkflow through dynamic routing pipeline (#3153)
Closes #2958

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:33:23 -06:00