
929 commits

Author SHA1 Message Date
Tom Boucher
8aa71bfb55 fix: prevent ensureGitignore from adding .gsd when tracked in git (#1364) (#1367)
* rfc: GitOps branching & versioning strategy proposal

Proposes a Git-Flow Lite model with automated integration branches:

  main          ← production-ready, tagged releases only
  next          ← integration branch for next minor (PRs target here)
  release/X.Y   ← stabilization branch, only bugfixes allowed
  hotfix/X.Y.Z  ← emergency fixes cherry-picked to release

Includes:
  - RFC document with lifecycle diagrams, migration path, open questions
  - Workflow scaffolds (in docs/proposals/workflows/, NOT .github/):
    - create-release.yml: manual dispatch to cut release branch from next
    - sync-next.yml: auto-sync next branch after version tags
    - backmerge.yml: auto back-merge release fixes to next

This is an experimental proposal requesting community feedback before
any implementation. The workflow files are inert scaffolds — they do
not run in CI.

* fix: prevent ensureGitignore from adding .gsd when tracked in git (#1364)

CRITICAL DATA-LOSS FIX: ensureGitignore() unconditionally added '.gsd' to
.gitignore even when .gsd/ was a real git-tracked directory, causing git to
report ~889 tracked files as deleted.

Root cause: BASELINE_PATTERNS included '.gsd' unconditionally, and the
gitignore modification ran BEFORE migration checks in auto-start.ts.

Changes:
- Add hasGitTrackedGsdFiles() helper using nativeLsFiles to detect tracked
  .gsd/ content
- ensureGitignore() now skips the '.gsd' pattern when .gsd/ has tracked files
- untrackRuntimeFiles() now skips entirely when .gsd/ has tracked files
- migrateToExternalState() aborts when .gsd/ has tracked files
- Reorder auto-start.ts: migration runs BEFORE gitignore modification
- Add 8 regression tests covering all scenarios

Fixes #1364

* fix: break recursive dialog loop when all milestones complete (#1348)

Two interacting bugs:

1. Recursive dialog loop: When all milestones are complete, bootstrapAutoSession
   calls showSmartEntry → sets pendingAutoStart → checkAutoStartAfterDiscuss
   calls startAuto → bootstrapAutoSession → showSmartEntry → infinite loop.
   The discuss workflow completes without producing a milestone directory, so
   phase stays 'complete' and the cycle never breaks.

   Fix: Add a re-entry counter (_consecutiveCompleteBootstraps) that tracks
   how many times bootstrapAutoSession enters the 'complete' branch without
   advancing. After 2 consecutive attempts, break the loop with a warning
   message and return false.

2. Missing _releaseFunction = null in retry lock onCompromised handler:
   The retry lock path in session-lock.ts set _lockCompromised but didn't
   null out _releaseFunction, which could leave a stale reference that
   masks the compromise detection in validateSessionLock().
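
The re-entry counter from item 1 could look roughly like the sketch below. Only the name `_consecutiveCompleteBootstraps` and the limit of 2 come from this commit message; the function name, the `Phase` type, the reset-on-progress behaviour, and the reading of "after 2 attempts" as "stop on the third" are assumptions made for illustration.

```typescript
// Hypothetical sketch of the re-entry guard; not the actual bootstrapAutoSession code.
const MAX_CONSECUTIVE_COMPLETE_BOOTSTRAPS = 2;
let _consecutiveCompleteBootstraps = 0;

type Phase = "complete" | "executing";

/** Returns false once the 'complete' branch was re-entered too often without advancing. */
function shouldContinueBootstrap(phase: Phase, warn: (msg: string) => void): boolean {
  if (phase !== "complete") {
    // Any other phase means the session advanced, so the counter starts over.
    _consecutiveCompleteBootstraps = 0;
    return true;
  }
  _consecutiveCompleteBootstraps += 1;
  if (_consecutiveCompleteBootstraps > MAX_CONSECUTIVE_COMPLETE_BOOTSTRAPS) {
    _consecutiveCompleteBootstraps = 0;
    warn("All milestones are complete and no new milestone was created; stopping.");
    return false; // caller breaks the showSmartEntry -> startAuto cycle here
  }
  return true;
}
```

The point of the design is that the guard depends only on a counter, so it holds even if the discuss workflow never produces a milestone directory.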

Fixes #1348

* fix: self-heal stale roadmap checkbox for interrupted complete-slice (#1350)

When complete-slice is interrupted after writing SUMMARY.md and UAT.md but
before flipping the roadmap checkbox, auto-mode enters an infinite loop —
re-launching the same complete-slice unit because the dispatch loop uses
the roadmap checkbox as the sole 'slice done' signal.

Fix: Add a self-heal case in selfHealRuntimeRecords that detects when
SUMMARY + UAT exist but the roadmap checkbox is unchecked, and auto-fixes
the checkbox. This allows the verification to pass and the dispatch loop
to advance.

Fixes #1350

* fix: add EISDIR guard to complete/validate milestone prompts (#1343)

The LLM was passing tasks/ directory paths to the read tool during
milestone completion, causing EISDIR crashes. Added file system safety
instructions to both complete-milestone and validate-milestone prompts
telling the LLM to use ls/find for directory listing, not the read tool.

Fixes #1343

* feat: improve extension conflict messages with removal guidance (#1347)

When a user extension registers tools/commands that now ship as built-ins,
the conflict message now includes '(built-in tool supersedes — consider
removing <path>)' and the log level is downgraded from 'Extension load error'
to 'Extension conflict'.

Changes:
- resource-loader.ts: detect built-in vs user extension conflicts, add hint
- cli.ts: downgrade severity for superseded-tool conflicts

Fixes #1347

* test: fix always-skipped preferences test, add test:marketplace script

- preferences.test.ts: Replace always-skipped getIsolationMode test with
  a filesystem-independent version that validates the default through
  validatePreferences() instead of reading ~/.gsd/preferences.md.
  Reduces skipped count from 3 → 2.

- package.json: Add test:marketplace script for running marketplace
  contract tests (claude-import-tui, plugin-importer-live,
  marketplace-discovery) with GSD_TEST_CLONE_MARKETPLACES=1.
  These tests need external repos and self-skip in unit test runs.

Remaining 2 skips:
- Marketplace contract test suites (need external repos, run via test:marketplace)
- Windows-only tests in validate-directory.test.ts are platform-conditional
  and correctly skip on macOS

* fix: use execFileSync in regression tests for Windows portability

The regression tests used execSync with shell-dependent constructs:
- '&&' command chaining (works in bash/cmd but fragile)
- Single-quoted commit messages (bash-only, cmd.exe splits on spaces)

Replaced with execFileSync via a git() helper that bypasses the shell
entirely. Each git operation is a separate call with proper argument
arrays, eliminating all shell interpretation issues.
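
A shell-free helper of the kind described above might be sketched as follows. `execFileSync` is the real Node.js API; the `runFile` wrapper and the exact signature of `git()` are assumptions, since the commit message does not show the helper itself.

```typescript
import { execFileSync } from "node:child_process";

/** Runs an executable with an argument array; no shell is involved, so nothing is re-parsed. */
function runFile(file: string, args: string[], cwd: string): string {
  return execFileSync(file, args, { cwd, encoding: "utf8" }).trim();
}

/** Hypothetical git() helper: one git invocation per call, arguments passed verbatim. */
function git(args: string[], cwd: string): string {
  return runFile("git", args, cwd);
}

// Instead of execSync("git add . && git commit -m 'initial commit'"), a test would call:
//   git(["add", "."], repoDir);
//   git(["commit", "-m", "initial commit"], repoDir);
// The commit message reaches git as a single argument on every platform.
```

Because each argument is handed to the process as-is, quoting rules of bash or cmd.exe never come into play.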

Fixes windows-portability CI failure.

* fix: guard milestone completion against missing slice summaries (#1368)

Auto-mode could report a milestone as complete after executing only the
last slice, skipping earlier unexecuted slices. The milestone completion
signal fired based on roadmap checkbox state, which could be stale or
inconsistent after worktree transitions.

Changes:
- auto-dispatch.ts: Added slice SUMMARY file existence check to both
  validating-milestone and completing-milestone dispatch rules. If any
  slice lacks a SUMMARY file, dispatch stops with a diagnostic error
  instead of proceeding to validation/completion.
- validate-milestone.test.ts: Updated tests to create slice summary
  files (required by the new guard).
- file-watcher.test.ts: Fixed flaky 'auth.json change emits auth-changed
  event' test by adding watcher initialization delay and increasing event
  propagation timeout (race condition when run in full suite).
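
The SUMMARY existence guard added to auto-dispatch.ts can be illustrated with a small sketch. The function names, the `DispatchDecision` shape, and the injected `hasSummary` callback are hypothetical; the commit only states that dispatch stops with a diagnostic when any slice lacks a SUMMARY file.

```typescript
type DispatchDecision =
  | { action: "dispatch" }
  | { action: "stop"; reason: string };

/** Returns the slice IDs for which no SUMMARY file is reported. */
function slicesMissingSummary(
  sliceIds: readonly string[],
  hasSummary: (sliceId: string) => boolean,
): string[] {
  return sliceIds.filter((id) => !hasSummary(id));
}

/** Blocks milestone validation/completion unless every slice has a SUMMARY. */
function guardMilestoneCompletion(
  sliceIds: readonly string[],
  hasSummary: (sliceId: string) => boolean,
): DispatchDecision {
  const missing = slicesMissingSummary(sliceIds, hasSummary);
  if (missing.length > 0) {
    return { action: "stop", reason: `Slices without SUMMARY: ${missing.join(", ")}` };
  }
  return { action: "dispatch" };
}
```

Checking for files on disk rather than roadmap checkboxes means a stale checkbox alone can no longer mark a milestone as done.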

Fixes #1368

* fix: warn on common misspelled preference keys + verify field guidance (#1373, #1341)

#1373: Users setting 'taskIsolation.mode: none' instead of 'git.isolation: none'
got a generic 'unknown key' warning. Added KEY_MIGRATION_HINTS that map common
misspellings (taskIsolation, task_isolation, isolation, manage_gitignore, auto_push,
main_branch) to their correct git.* equivalents with actionable messages.
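
A hint table of this kind might be sketched as below. Only the name `KEY_MIGRATION_HINTS` and the `taskIsolation` to `git.isolation` mapping are stated in the commit message; mapping `task_isolation` and `isolation` to the same target is an assumption, the remaining keys are left out because their exact `git.*` targets are not given here, and the message wording is invented.

```typescript
// Hypothetical subset of the hint table; targets other than taskIsolation are assumed.
const KEY_MIGRATION_HINTS = new Map<string, string>([
  ["taskIsolation", "git.isolation"],
  ["task_isolation", "git.isolation"],
  ["isolation", "git.isolation"],
]);

/** Builds the warning for an unrecognized top-level preference key. */
function unknownKeyWarning(key: string): string {
  const target = KEY_MIGRATION_HINTS.get(key);
  if (target === undefined) {
    return `Unknown preference key "${key}".`;
  }
  return `Unknown preference key "${key}". Did you mean "${target}"?`;
}
```

A `Map` is used so that keys such as `constructor` cannot accidentally match inherited object properties.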

#1341: Planning agent writes aspirational prose in Verify fields ('Sections 3.1
and 3.2 exist with exact formulas. Zero TBD.') instead of executable commands.
Added explicit verify field rules to the plan template: must be mechanically
executable, with examples of good vs bad patterns for content tasks.

Fixes #1373, partially addresses #1341

* refactor: extract roadmap-mutations.ts + shared test-utils.ts

Consolidation:
- roadmap-mutations.ts: Extracted markSliceDoneInRoadmap() and markTaskDoneInPlan()
  from duplicated implementations in doctor.ts, mechanical-completion.ts, and
  auto-recovery.ts. All three callers used identical regex patterns.
  mechanical-completion.ts and auto-recovery.ts now import the shared utility.
  (doctor.ts deferred — touched by PR #1349)

- test-utils.ts: Shared cross-platform test utilities for GSD extension tests.
  Provides git() helper (execFileSync, no shell), makeTempRepo() with
  core.autocrlf=false, cleanup(), createFile(), safeReadFile(), and
  writeMilestoneFixture(). 12 test files currently define their own versions
  of these helpers — new tests should import from test-utils.ts instead.

Security audit: No injection vectors (sid/tid are alphanumeric from roadmap
parser), no path traversal, no secrets, no new dependencies.

* fix: port conflict false positive on non-Node projects + paused worktree resume (#1381, #1383)

projects without package.json. macOS AirPlay Receiver listens on port 5000,
causing a spurious warning on non-Node projects.
Fix: Skip port checks entirely when no package.json exists. When using
default ports, filter out 5000 on macOS.
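
The two rules in this fix can be expressed as a small pure function. Everything except the rules themselves is assumed: the function name, the input shape, and in particular the `DEFAULT_PORTS` list are placeholders, not values from the codebase.

```typescript
interface PortCheckInput {
  hasPackageJson: boolean;
  platform: string; // e.g. process.platform
  configuredPorts?: readonly number[];
}

// Placeholder defaults for illustration only.
const DEFAULT_PORTS: readonly number[] = [3000, 5000, 8080];

/** Decides which ports the conflict check should probe. */
function portsToCheck(input: PortCheckInput): number[] {
  // Rule 1: no package.json means no Node project, so no port checks at all.
  if (!input.hasPackageJson) return [];
  // Explicitly configured ports are checked as given.
  if (input.configuredPorts !== undefined && input.configuredPorts.length > 0) {
    return input.configuredPorts.slice();
  }
  // Rule 2: with default ports, drop 5000 on macOS (AirPlay Receiver listens there).
  return input.platform === "darwin"
    ? DEFAULT_PORTS.filter((port) => port !== 5000)
    : DEFAULT_PORTS.slice();
}
```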

in-memory only. Re-entering /gsd started a fresh bootstrap from the project
root instead of the active worktree.
Fix: pauseAuto() now writes paused-session.json to .gsd/runtime/ with
milestoneId, worktreePath, originalBasePath, and stepMode. startAuto()
checks for this file before bootstrap and restores the paused session
context, including worktree re-entry. stopAuto() cleans up the file.

Fixes #1381, #1383

* fix: catch spawn ENOENT in uncaught exception guard + snapshot session lock path (#1384, #1363)

uncaught exception and crashes auto-mode. The EPIPE guard now also catches
ENOENT from spawn syscalls — logs the error and continues instead of
terminating the process.

the lock path differently via gsdRoot() because basePath could be either the
project root or a worktree path. gsdRoot() produces different results for
each, so the lock was written to one path and validated against another.
Fix: Snapshot the resolved lock path (_snapshotLockPath) at acquisition time
and reuse it for all subsequent lock operations within the session.

Fixes #1384, #1363

* fix: suppress false-positive lock compromise + skip migration with active worktrees (#1362, #1337)

because the event loop stall delays the heartbeat mtime update. The handler
now checks elapsed time since acquisition — if within the 30-minute stale
window, it logs a warning and continues instead of setting _lockCompromised.
Real takeovers (past the stale window) still trigger the compromise flag.
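
The elapsed-time check can be sketched as below. The 30-minute window is taken from the commit message; the function name, the verdict strings, and treating exactly 30 minutes as a real takeover are assumptions.

```typescript
const STALE_WINDOW_MS = 30 * 60 * 1000; // 30-minute stale window

type CompromiseVerdict = "warn-and-continue" | "compromised";

/**
 * Classifies an onCompromised callback by how long ago the lock was acquired.
 * Inside the stale window the callback is treated as a delayed heartbeat.
 */
function classifyLockCompromise(acquiredAtMs: number, nowMs: number): CompromiseVerdict {
  const elapsedMs = nowMs - acquiredAtMs;
  return elapsedMs < STALE_WINDOW_MS ? "warn-and-continue" : "compromised";
}
```

In the real handler, the "compromised" verdict would correspond to setting `_lockCompromised`, while the other verdict only logs a warning.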

even when .gsd/worktrees/ contained active git worktrees with locked
directory handles. This caused EBUSY errors and destructive data loss.
Migration now checks for active worktree directories and skips entirely
if any are found.

Fixes #1362, #1337
2026-03-19 17:06:01 -06:00
TÂCHES
2ea7abcd0c refine: extensions elegance improvements (#1503)
* refine: R1 delete dead wizard-ui.ts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refine: R2 remove dead BgProcess fields (commandHistory, envKeys)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refine: R3 remove no-op acknowledgeDeliveries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refine: R4 remove unused lineDedup tracking

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refine: R5 remove unused ProcessEvent types (output, pattern_match)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refine: S1 replace duplicate formatTokens with shared formatTokenCount

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refine: R1 remove re-staged wizard-ui.ts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refine: S2 consolidate maskEditorLine into shared/sanitize

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refine: S3 add session cleanup to context7 and google-search

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 16:59:52 -06:00
Tom Boucher
8e2827646a fix: check project root .env when secrets gate runs in worktree (#1387) (#1470)
In worktree isolation mode, the secrets gate checked for .env at the
worktree path (process.cwd()), but the user's .env lives at the project
root. Keys that existed in the project root's .env were reported as
missing, causing repeated blocking key collection prompts.

Fix: getManifestStatus() now accepts an optional projectRoot parameter.
When provided (worktree mode), it checks both the worktree .env AND the
project root .env. All callers in auto.ts and auto-start.ts updated to
pass s.originalBasePath.
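
As an illustration of checking both files, the sketch below merges the keys found in the worktree .env and the project root .env before reporting missing ones. It works on file contents rather than paths, and `parseEnvKeys` and `missingKeys` are hypothetical helpers, not the real `getManifestStatus()` implementation.

```typescript
/** Collects variable names from .env-style text (KEY=value lines, # comments ignored). */
function parseEnvKeys(text: string): Set<string> {
  const keys = new Set<string>();
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq > 0) keys.add(line.slice(0, eq).trim());
  }
  return keys;
}

/** Required keys that appear in neither the worktree .env nor the project root .env. */
function missingKeys(
  required: readonly string[],
  worktreeEnv: string,
  projectRootEnv?: string, // only supplied in worktree mode
): string[] {
  const present = parseEnvKeys(worktreeEnv);
  if (projectRootEnv !== undefined) {
    parseEnvKeys(projectRootEnv).forEach((key) => present.add(key));
  }
  return required.filter((key) => !present.has(key));
}
```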

Fixes #1387
2026-03-19 16:57:59 -06:00
Tom Boucher
bc9cfb1992 fix: realign cwd before dispatch + clean stale merge state on failure (#1389) (#1400)
In worktree isolation, process.cwd() drifts when async_bash or background
jobs change directory, causing commits to land on the wrong branch.
Realign cwd to basePath before every dispatch and hook dispatch.

Also clean stale .git/SQUASH_MSG, MERGE_HEAD, MERGE_MSG after failed
squash-merges to prevent subsequent git operations from seeing phantom
merge state.

Adapted to post-M001 architecture: cwd fix in auto-loop.ts runUnit(),
hook cwd fix in auto.ts dispatchHookUnit(), merge cleanup in
worktree-resolver.ts _mergeWorktreeMode().

Co-authored-by: Lex Christopherson <lex@glittercowboy.com>
2026-03-19 16:55:18 -06:00
HyperDev1
c58bf42333 fix: create milestones/ directory in worktree when missing (#1374)
syncGsdStateToWorktree() assumed the milestones/ directory already
existed in the worktree. On a fresh worktree bootstrap this directory
is absent, causing milestone sync to silently skip all entries and
auto-mode to report "All milestones complete" immediately.

Create the directory before iterating if the main repo has milestones
but the worktree does not.

Co-authored-by: Berat Can <berat@hyperlab.games>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 16:54:19 -06:00
Tom Boucher
f069790d7d fix: inject network_idle warning into hook prompts (#1345) (#1401)
Post-unit hooks that use browser tools can hang indefinitely when the LLM
calls browser_wait_for with condition 'network_idle' against a dev server
with persistent connections (Vite HMR, WebSocket). The networkidle event
never fires because at least one connection stays open.

Fix: Inject a browser safety instruction into every hook prompt warning
against network_idle and recommending selector_visible, text_visible, or
delay as alternatives.

Fixes #1345
2026-03-19 16:54:14 -06:00
Tom Boucher
b720e7e15c fix: verify symlink after migration + fix test failures (#1377) (#1404)
migrateToExternalState() moved .gsd/ to ~/.gsd/projects/<hash>/ and
created a symlink, but never verified the symlink resolved correctly.
On Windows, junction creation can silently fail or resolve to the wrong
target. If the symlink was broken, the backup (.gsd.migrating) was
deleted anyway, losing all project state.

Changes:
- migrate-external.ts: After creating symlink, verify it resolves to the
  expected path and is readable. If verification fails, restore from backup.
- repo-identity-worktree.test.ts: Canonicalize temp dirs with realpathSync
  to fix macOS /var → /private/var mismatch in path assertions.
- resource-loader.ts: Check for agents/ subdir before using dist/resources
  as source — partial builds (tsc without copy-resources) create an
  incomplete dist/resources that's missing agents/ and skills/.

Fixes #1377
2026-03-19 16:54:11 -06:00
TÂCHES
bdeec039c0 fix: validate CWD instead of project root when running from a GSD worktree (#1317) (#1504)
When the user's home directory is a git repo, resolveProjectRoot() correctly
returns $HOME as the main tree root. assertSafeDirectory() then hard-blocks it
with the home-directory guard added in #1053, even though the session is running
from a valid GSD worktree (e.g. ~/.gsd/worktrees/M001) — not from $HOME itself.

Fix: in projectRoot(), detect when CWD diverges from the resolved root (i.e. we
are inside a git worktree) and validate the CWD instead. The worktree path is
never $HOME, so the guard no longer fires. When not in a worktree, cwd === root,
so the existing behaviour is unchanged.

Adds a regression test: validateDirectory() on a ~/.gsd/worktrees/M001 path
must return { safe: true, severity: "ok" }.

Co-authored-by: Jeremy McSpadden <jeremy@fluxlabs.net>
2026-03-19 16:53:34 -06:00
Derek Pearson
5eed57f876 fix(gsd): detect initialized health widget projects (#1432)
* fix(gsd extension): detect initialized projects in health widget

Use .gsd presence plus project-state detection for the health widget so bootstrapped projects no longer appear as unloaded before metrics exist.

* fix(gsd extension): make health widget execution-aware

Lead the health widget with current GSD execution state so it explains what the project is doing before surfacing provider and environment diagnostics. Keep issue, budget, and progress details as secondary context and cover the new output with focused widget tests.

* fix(gsd extension): address review feedback on health widget PR

- Replace em dash with ASCII hyphen in headline for terminal safety
- Reformat catch/finally to standard single-line style
- Replace computeProgressScore() status with direct phase labels so
  the status reflects the actual execution phase, not a global health
  aggregate
- Use lightweight milestone-dir scan instead of full detectProjectState()
  to avoid unnecessary filesystem work on the 60s refresh
- Add cache warm-up comment on updateSliceProgressCache call
- Add safety comment on early void refresh() call
- Update test assertions for new phase labels and ASCII separator
2026-03-19 16:50:13 -06:00
Jeremy McSpadden
96b94065ff fix: smarter .gsd root discovery — git-root anchor + walk-up replaces symlink hack (#1386)
* fix: replace symlink-follow in gsdRoot() with git-root-anchored walk-up discovery

The old implementation blindly assumed .gsd lived at basePath and only
followed symlinks as a migration escape hatch. This caused the health
widget to show "No project loaded" when:
- .gsd was moved to a non-default location
- cwd was a subdirectory of the project root
- the session started inside a worktree path

New probe chain in gsdRoot():
  1. basePath/.gsd         — fast path (common case, zero overhead)
  2. git rev-parse root    — anchors to the repo root regardless of cwd
  3. Walk up from basePath — finds .gsd in an ancestor (bounded by git root)
  4. basePath/.gsd         — creation fallback for init/new projects

Key correctness detail: basePath is normalized via realpathSync before
any comparisons, ensuring the git-root boundary check works on macOS
where /var is a symlink to /private/var. Walk-up only runs when inside
a git repo and only when basePath != gitRoot — preventing escape into
unrelated filesystem directories.

Result is cached per-basePath for the process lifetime. All 52 callers
of gsdRoot() benefit with no call-site changes.
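
The probe chain can be sketched with the filesystem and git lookups injected, which keeps the example deterministic. This is a simplified reading of the four steps, not the actual gsdRoot(): the function name, the `exists` callback, and the pre-resolved `gitRoot` argument are assumptions, POSIX paths are used throughout, and realpath normalization and caching are omitted.

```typescript
import { posix as path } from "node:path";

type Exists = (p: string) => boolean;

/** Hypothetical probe: basePath and gitRoot are assumed to be normalized already. */
function probeGsdRootSketch(basePath: string, gitRoot: string | null, exists: Exists): string {
  const local = path.join(basePath, ".gsd");
  if (exists(local)) return local; // 1. fast path

  if (gitRoot !== null) {
    const atRoot = path.join(gitRoot, ".gsd");
    if (exists(atRoot)) return atRoot; // 2. anchor to the git root

    // 3. walk up from basePath, never leaving the git root
    let dir = path.dirname(basePath);
    while (dir !== gitRoot && dir.startsWith(gitRoot + path.sep)) {
      const candidate = path.join(dir, ".gsd");
      if (exists(candidate)) return candidate;
      dir = path.dirname(dir);
    }
  }
  return local; // 4. creation fallback for new projects
}
```

Bounding the walk by the git root is what keeps the search from picking up an unrelated .gsd directory higher in the filesystem.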

Adds tests/paths.test.ts covering all 6 probe cases.

* fix: correct report() call signature in paths.test.ts — takes no arguments

* fix: normalize git output paths and use realpathSync.native for Windows compatibility

- Use path.normalize() on git rev-parse output to convert forward slashes
  to backslashes on Windows, so the git-root boundary check fires correctly
- Use realpathSync.native() instead of realpathSync() to resolve both
  symlinks (macOS /var→/private/var) and 8.3 short names (Windows RUNNER~1)
- Update test tmp() helper to use realpathSync.native so expected paths
  match the resolved paths returned by probeGsdRoot
2026-03-19 16:50:10 -06:00
HyperDev1
fecf32dc1e fix: correct GSD-WORKFLOW.md fallback path and sync to agentDir (#1375)
The fallback path for GSD-WORKFLOW.md still referenced the legacy .pi
directory (~/.pi/GSD-WORKFLOW.md) instead of the correct .gsd/agent
location. This broke workflow dispatch when GSD_WORKFLOW_PATH env var
was not set.

- Update fallback path from ~/.pi/ to ~/.gsd/agent/ in three call sites
  (dispatchWorkflow, dispatchDoctorHeal, handleTriage)
- Sync GSD-WORKFLOW.md to agentDir during initResources() as a fallback
  for alternative entry points that may not set the env var

Co-authored-by: Berat Can <berat@hyperlab.games>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 16:50:08 -06:00
TÂCHES
d3d682cf00 Merge pull request #1359 from jbrahy/fix/loadfile-eisdir-guard
fix: avoid EISDIR crash in GSD file loader
2026-03-19 16:38:11 -06:00
TÂCHES
0028ceef26 Merge pull request #1357 from jbrahy/fix/inspect-open-existing-db
fix: open existing GSD database on /gsd inspect
2026-03-19 16:30:56 -06:00
TÂCHES
671b72b684 Merge pull request #1446 from frizynn/refactor/extension-type-guards-consolidation
refactor: consolidate extension type guards and inline handler aliases
2026-03-19 15:44:17 -06:00
TÂCHES
aa6f41c256 Merge pull request #1424 from jeremymcs/fix/doctor-gap-coverage
fix: close 5 doctor coverage gaps — providers, lock dir, integration branch, orphaned worktrees
2026-03-19 15:39:52 -06:00
TÂCHES
9ac9cb06e2 Merge pull request #1390 from jeremymcs/fix/prefs-gaps
fix(prefs): close merge, validation, serialization, and docs gaps
2026-03-19 15:39:48 -06:00
TÂCHES
f029ae6f64 Merge pull request #1409 from trek-e/fix/1398-crash-lock-pid-check
fix: add PID self-check to guided-flow crash lock detection (#1398)
2026-03-19 15:39:23 -06:00
TÂCHES
364bd5b65b Merge pull request #1430 from trek-e/fix/1423-session-cost-compaction
fix: accumulate session cost independently of message array (#1423)
2026-03-19 15:39:19 -06:00
Jean-Dominique Stepek
29a1882c04 feat(gsd): add /gsd changelog command with LLM-summarized release notes (#1465)
Add a new /gsd changelog command that fetches releases from the GitHub API,
filters by version, and sends the raw changelog into the conversation for the
LLM to summarize the most important changes.

- New changelog.ts module: GitHub API fetch, semver filtering, body parsing
- Routing block in commands.ts with lazy import (same pattern as forensics)
- Tab completion in commands-bootstrap.ts TOP_LEVEL_SUBCOMMANDS
- Help text under VISIBILITY section in showHelp()
- No new npm dependencies — uses built-in fetch()
2026-03-19 15:36:43 -06:00
John Brahy
c10e42b392 fix(mcp): preserve args for mcp_call tool invocations (#1354)
2026-03-19 15:29:19 -06:00
TÂCHES
d761e45a41 M001: The Minimal Machine — linear auto-loop, sole-authority state, sidecar queue, WorktreeResolver (#1419)
* refactor: replace recursive auto-dispatch with linear autoLoop, delete ~3k lines of dead code

Replace the complex recursive dispatch system (dispatchNextUnit, reentrancy
guards, stall detection, idempotency tracking, skip-depth machinery) with a
simple linear while(s.active) loop in auto-loop.ts.

Key changes:
- New auto-loop.ts with autoLoop(), runUnit(), resolveAgentEnd()
- Deleted auto-idempotency.ts, auto-stuck-detection.ts, session-lock.ts,
  mechanical-completion.ts, progress-score.ts, auto-constants.ts, unit-id.ts
- Extracted WorktreeResolver class for worktree path resolution
- Added auto-worktree-sync.ts for worktree synchronization
- Simplified auto.ts from ~1400 lines to ~400 lines
- Fixed 9 TypeScript errors (NotifyCtx type widening, capture typing)
- Comprehensive test coverage: 32 auto-loop tests + worktree resolver/DB tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address 6 audit findings in auto-loop refactor

1. CRITICAL: Move pendingResolve to AutoSession + queue orphaned agent_end
   events instead of silently dropping them. Prevents permanent stalls when
   error-recovery sendMessage retries fire between loop iterations.

2. HIGH: Scope pendingResolve per-session via _activeSession ref, preventing
   concurrent /gsd auto sessions from corrupting each other's promises.

3. HIGH: Replace console.log in dispatchHookUnit with debugLog to prevent
   hook prompt content (potentially containing secrets) from leaking to stdout.

4. HIGH: Restore parked milestone handling in state.ts — Phase 1 skips
   parked milestones so they don't satisfy depends_on, Phase 2 registers
   them as 'parked' status. Add 'parked' to MilestoneRegistryEntry type.

5. MEDIUM: Restore queuePhaseActive parameter in shouldBlockContextWrite
   and re-export setQueuePhaseActive for guided-flow-queue.ts consumers.

6. MEDIUM: Add MAX_LOOP_ITERATIONS (500) lifetime cap to autoLoop to prevent
   runaway loops when units alternate between IDs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve build breakers, add correctness fixes, and graduated recovery

Build breakers (CRITICAL):
- Restore unit-id.ts (deleted but still imported by complexity-classifier.ts, metrics.ts)
- Restore progress-score.ts (deleted but still imported by commands.ts, dashboard-overlay.ts, doctor.ts)
- Rewrite worktree-sync-milestones.test.ts to use new syncProjectRootToWorktree API

Correctness fixes (MEDIUM):
- Cap pendingAgentEndQueue to 3 entries to prevent unbounded growth from stale events
- Add milestoneId path traversal validation in WorktreeResolver
- Clear depthVerificationDone on session_start to prevent cross-session leaks in RPC mode
- Add verification gate for non-hook sidecar units (triage, quick-tasks)
- Remove dead handleAgentEnd import from index.ts

Graduated recovery (Jeremy's feedback):
- Blanket try/catch around loop body — one bad iteration no longer kills the session
- Graduated stuck recovery: at count 3 try artifact verification + cache invalidation,
  at count 5 hard stop (was: binary stop at 5 with no recovery attempt)
- Graduated error recovery: 1st error retries, 2nd invalidates caches, 3rd stops
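
The graduated thresholds above can be summarized as two small decision functions. The thresholds (3 and 5 for stuck detection; 1st, 2nd, 3rd error) come from the commit message; the function names, the action labels, and the choice to simply continue at count 4 are assumptions.

```typescript
type StuckAction = "continue" | "recover" | "stop";

/** Decision when the loop sees the same unit `sameUnitCount` times in a row. */
function stuckAction(sameUnitCount: number): StuckAction {
  if (sameUnitCount >= 5) return "stop"; // hard stop
  if (sameUnitCount === 3) return "recover"; // artifact verification + cache invalidation
  return "continue";
}

type ErrorAction = "retry" | "invalidate-caches" | "stop";

/** Decision after `consecutiveErrors` failed iterations caught by the loop's try/catch. */
function errorAction(consecutiveErrors: number): ErrorAction {
  if (consecutiveErrors <= 1) return "retry";
  if (consecutiveErrors === 2) return "invalidate-caches";
  return "stop";
}
```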

Test results: 32/32 auto-loop, 28/28 worktree-resolver, 11/11 sidecar-queue, tsc clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: restore copyWorktreeDb/reconcileWorktreeDb exports and fix loadToolApiKeys import

Two missing exports caused ~90% of the 120 pre-existing test failures:

1. copyWorktreeDb + reconcileWorktreeDb — imported by auto-worktree.ts but
   never added to gsd-db.ts. Restored with the original implementations.
2. loadToolApiKeys — moved to commands-config.ts but index.ts still imported
   from commands.ts. Fixed the import path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: move loadToolApiKeys import to commands-config.js

loadToolApiKeys was moved to commands-config.ts but index.ts still
imported it from commands.ts, causing runtime failures in all tests
that transitively load the extension entry point.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: fix provider error assertion on windows

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 14:56:00 -06:00
Derek Pearson
f2657e1ba0 fix: release stranded bootstrap locks and handle re-entrant reacquire (#1352)
Release session locks on bootstrap abort paths and reset same-process lock state before re-acquiring so stale proper-lockfile callbacks cannot poison a fresh auto-mode session. Adds regression coverage for bootstrap cleanup and re-entrant lock acquisition.
2026-03-19 12:12:06 -06:00
frizynn
bbc180a3b6 refactor: consolidate extension type guards and inline handler type aliases
Replace 7 individual ToolResultEvent type guards (isBashToolResult,
isReadToolResult, etc.) with a unified isToolResultEventType() function,
mirroring the existing isToolCallEventType() pattern.

Inline 14 handler type aliases (SendMessageHandler, SetModelHandler, etc.)
directly into the ExtensionActions interface since they were only used there
and added no semantic value.

Update documentation examples to use the new unified guard.
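
A unified guard in this style might look like the sketch below. Only the name `isToolResultEventType` and the idea of replacing per-tool guards come from the commit message; the event shapes, field names, and argument order are invented for illustration.

```typescript
// Hypothetical event shapes; the real ToolResultEvent union is not shown in this log.
type ToolResultEvent =
  | { toolName: "bash"; exitCode: number }
  | { toolName: "read"; content: string }
  | { toolName: "write"; bytesWritten: number };

/** One generic guard instead of isBashToolResult, isReadToolResult, and so on. */
function isToolResultEventType<T extends ToolResultEvent["toolName"]>(
  toolName: T,
  event: ToolResultEvent,
): event is Extract<ToolResultEvent, { toolName: T }> {
  return event.toolName === toolName;
}

function summarize(event: ToolResultEvent): string {
  if (isToolResultEventType("bash", event)) {
    return `bash exited with ${event.exitCode}`; // narrowed to the bash variant
  }
  if (isToolResultEventType("read", event)) {
    return `read ${event.content.length} chars`; // narrowed to the read variant
  }
  return `unhandled: ${event.toolName}`;
}
```

The `Extract` type does the narrowing, so adding a new tool only requires a new union member, not a new guard function.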
2026-03-19 14:55:00 -03:00
Tom Boucher
8b0727c0e5 fix: accumulate session cost independently of message array (#1423)
getSessionStats() calculated cost by summing usage from assistant messages
in state.messages. After auto-compaction, pre-compaction messages are
replaced by a compactionSummary with no usage field — dropping the cost.

Fix: Added cumulative accumulators (_cumulativeCost, _cumulativeInputTokens,
_cumulativeOutputTokens, _cumulativeToolCalls) that are incremented on
every assistant message event, independent of the message array.
getSessionStats() now returns max(array-sum, cumulative) to ensure
monotonically non-decreasing values.
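
The effect of the accumulator is easiest to see in a reduced sketch that tracks cost only. The `_cumulativeCost` name and the `max(array-sum, cumulative)` rule are from the commit message; the class, the message shapes, and the method names are assumptions.

```typescript
type Message =
  | { role: "assistant"; usage: { cost: number } }
  | { role: "compactionSummary" }; // carries no usage field

class SessionCostSketch {
  private _cumulativeCost = 0;

  /** Called for every assistant message event, independent of the message array. */
  onAssistantMessage(cost: number): void {
    this._cumulativeCost += cost;
  }

  /** Never reports less than what has already been accumulated. */
  getCost(messages: readonly Message[]): number {
    let arraySum = 0;
    for (const m of messages) {
      if (m.role === "assistant") arraySum += m.usage.cost;
    }
    return Math.max(arraySum, this._cumulativeCost);
  }
}
```

After compaction the array sum drops, but the accumulator does not, so the reported cost stays monotonic.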

Fixes #1423
2026-03-19 12:44:11 -04:00
Jeremy McSpadden
c048aa2e7a fix: resolve CI failures — scope provider check, fix Windows path, correct severity
Three CI regressions from the initial commit:

1. doctor.test.ts "two blocking errors" assertion broke (expected 2, got 3):
   The provider check fired on any project with an active milestone, including
   CI environments with no API key. Fix: change provider_key_missing severity
   from "error" to "warning". A missing key is advisory — it blocks future
   dispatch but doesn't corrupt existing state, analogous to env_git_remote.

2. doctor-runtime.test.ts stranded_lock_directory fails on Windows:
   proper-lockfile uses advisory file locking on Windows, not the directory-based
   mechanism (.gsd.lock/). The check and tests are POSIX-specific. Fix: skip
   both stranded_lock_directory tests on Windows with process.platform guard,
   same pattern used by worktree and branch tests.

3. doctor-checks.ts used root.split("/").pop() which is not cross-platform:
   Windows paths use backslash separators. Fix: replace with basename(root)
   from node:path which is platform-aware. Also add basename to imports.
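
The difference in item 3 can be reproduced on any platform with the `win32` and `posix` variants of `node:path`. The sample paths and function names are made up; `basename` itself is the real Node.js API.

```typescript
import { posix, win32 } from "node:path";

/** The old approach: only splits on forward slashes. */
function lastSegmentBySplit(root: string): string | undefined {
  return root.split("/").pop();
}

/** The fix, using the Windows flavour explicitly so the example runs anywhere. */
function lastSegmentWindows(root: string): string {
  return win32.basename(root);
}

const windowsRoot = "C:\\Users\\ci\\project"; // hypothetical path
const posixRoot = "/home/ci/project"; // hypothetical path

// Splitting on "/" finds no separator in a Windows path and returns it whole.
const naive = lastSegmentBySplit(windowsRoot);
// basename understands the platform's separators.
const fixed = lastSegmentWindows(windowsRoot);
const fixedPosix = posix.basename(posixRoot);
```

In application code, plain `basename` from `node:path` picks the right flavour for the current platform automatically.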
2026-03-19 11:41:48 -05:00
Jeremy McSpadden
ccde39c2c8 fix: close 5 doctor coverage gaps — providers, lock dir, integration branch, orphaned worktrees
Closes the highest-impact gaps identified in the /gsd doctor deep-dive analysis.

**1. Wire provider checks into runGSDDoctor()**
doctor-providers.ts existed and worked but was never called from the main
doctor run. Units could dispatch into guaranteed API failures with no warning.
Now runProviderChecks() is called in runGSDDoctor() and converts required-provider
errors/warnings into DoctorIssue entries with codes:
  - provider_key_missing (error)
  - provider_key_backedoff (warning)

**2. Stranded lock directory detection (doctor-checks.ts)**
proper-lockfile creates a .gsd.lock/ directory as the OS-level lock mechanism.
After SIGKILL or hard crash, this directory can remain stranded, blocking all
future auto-mode sessions from acquiring the lock (#1245 pattern). Doctor now:
  - Detects .gsd.lock/ existing without a live process holding it
  - Reports as stranded_lock_directory (error, fixable)
  - Auto-fix removes the stranded directory

**3. Integration branch existence check (doctor-checks.ts + doctor-proactive.ts)**
When a milestone records an integration branch and that branch is later deleted
or renamed, merge-back will fail silently at the end of the milestone. Doctor now:
  - Checks each active milestone's stored integration branch exists in git
  - Reports as integration_branch_missing (error, not auto-fixable)
  - preDispatchHealthGate blocks dispatch if the active milestone's integration
    branch is missing, preventing work from being dispatched into a dead end

**4. Orphaned worktree directory detection (doctor-checks.ts)**
Worktree removal can fail after a branch delete, leaving a .gsd/worktrees/<name>/
directory that is no longer registered with git. Re-creating the same name fails
with "already exists". Doctor now:
  - Compares .gsd/worktrees/ entries against git worktree list
  - Reports unregistered directories as worktree_directory_orphaned (warning, fixable)
  - Auto-fix removes the orphaned directory

Tests: all new codes covered with detection + fix assertions, including
false-positive safety cases (live lock holder, registered worktrees,
existing integration branch). All 1843 existing tests still pass.
2026-03-19 11:25:20 -05:00
Tom Boucher
f0fe4b2443 fix: emit agent_end after abort during tool execution (#1414) (#1417)
* fix: sync worktree completion artifacts back to external state before merge (#1412)

When a worktree's .gsd/ was a real directory (not symlinked to external
state), milestone completion artifacts (SUMMARY, VALIDATION, updated
ROADMAP) were written locally but never synced back. The project root's
deriveState() read from external state and found no SUMMARY — reporting
the milestone as incomplete.

Changes:
- auto-worktree.ts: Added syncWorktreeStateBack() that copies milestone
  and slice .md files from worktree .gsd/ to the main external state dir
- auto.ts: Call syncWorktreeStateBack() in tryMergeMilestone before the
  git merge, ensuring artifacts are visible from the project root

Fixes #1412

* fix: emit agent_end after abort during tool execution (#1414)

When a user aborts a turn while a tool call is running, the abort RPC
succeeds but agent_end was never emitted. RPC consumers tracking turn
lifecycle via events got stuck in a 'streaming' state permanently.

Fix: After abort() + waitForIdle(), emit a synthetic agent_end if the
agent is no longer streaming. This ensures consumers always see the
turn-complete signal regardless of how the turn ended.
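
A minimal sketch of the fix described above, under stated assumptions: `AgentLike`, `TurnEvent`, and `abortTurn` are stand-in names, the `synthetic` flag is invented for illustration, and the sketch does not guard against a duplicate agent_end from the normal path.

```typescript
// Illustrative stand-ins; the real agent and event types are not shown in this log.
interface AgentLike {
  isStreaming: boolean;
  abort(): void;
  waitForIdle(): Promise<void>;
}

type TurnEvent = { type: "agent_end"; synthetic: boolean };

async function abortTurn(
  agent: AgentLike,
  emit: (event: TurnEvent) => void,
): Promise<void> {
  agent.abort();
  await agent.waitForIdle();
  // Once the agent has stopped streaming, signal turn completion so that
  // consumers tracking lifecycle events can leave their 'streaming' state.
  if (!agent.isStreaming) {
    emit({ type: "agent_end", synthetic: true });
  }
}
```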

Fixes #1414
2026-03-19 10:24:39 -06:00
Tom Boucher
2d921ecfad fix: add PID self-check to guided-flow crash lock detection (#1398)
guided-flow.ts showed 'Interrupted Session Detected' whenever auto.lock
existed, without checking if the lock was written by the current process.
This caused infinite prompt loops when the current session's own lock
triggered the crash detection.

Fix: Added crashLock.pid !== process.pid check, matching the guard in
auto-start.ts.
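
The guard can be sketched as a small predicate. Only the `pid` comparison comes from the commit message; the `CrashLock` shape and the name `isForeignCrashLock` are assumptions made for illustration.

```typescript
// Hypothetical shape; only the pid field is taken from the commit message.
interface CrashLock {
  pid: number;
}

// True only when a lock exists and was written by a different process.
function isForeignCrashLock(
  crashLock: CrashLock | null,
  currentPid: number,
): boolean {
  return crashLock !== null && crashLock.pid !== currentPid;
}
```

A caller would pass `process.pid` as `currentPid` and show the recovery prompt only when the predicate returns true.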

Also includes test fixes:
- repo-identity-worktree: macOS /var canonicalization
- resource-loader: partial-build dist/resources fallback
- file-watcher: init delay + timeout for timing stability

Fixes #1398
2026-03-19 11:01:37 -04:00
Juan Francisco Lebrero
e6bbd035ba fix: auto-discard bootstrap crash locks and clean auto.lock on exit (#1397)
Two root causes for the false "Interrupted Session Detected" prompt
that appears every time /gsd is run after a normal exit:

1. guided-flow.ts showed the crash recovery menu even for bootstrap
   crashes (unitType="starting", unitId="bootstrap", completedUnits=0)
   where no work was lost. Now these are silently discarded — the menu
   only appears when real auto-mode work was interrupted.

2. session-lock.ts exit handler cleaned the OS lock directory
   (.gsd.lock/) but not the auto.lock metadata file. On next startup,
   readCrashLock() found the stale file and triggered false recovery.
   Now the exit handler also removes auto.lock.
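
The bootstrap check in point 1 might look like the following sketch. The three field values are the ones quoted above; the `CrashLockInfo` interface and the function name are hypothetical.

```typescript
// Hypothetical record shape built from the fields named in the commit message.
interface CrashLockInfo {
  unitType: string;
  unitId: string;
  completedUnits: number;
}

// A crash during bootstrap lost no work, so it can be discarded silently.
function isBootstrapCrash(lock: CrashLockInfo): boolean {
  return (
    lock.unitType === "starting" &&
    lock.unitId === "bootstrap" &&
    lock.completedUnits === 0
  );
}
```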
2026-03-19 08:31:15 -06:00
Jeremy McSpadden
efbbcc790d fix(prefs): close merge, validation, serialization, and docs gaps
- mergePreferences(): add auto_visualize and auto_report (both were
  silently dropped when a project prefs file existed alongside global)
- preferences-validation.ts: add validation blocks for auto_visualize,
  auto_report, compression_strategy, and context_selection — all four
  were in KNOWN_PREFERENCE_KEYS and the GSDPreferences interface but
  accepted any value without type-checking
- serializePreferencesToFrontmatter orderedKeys: add skill_staleness_days,
  dynamic_routing, token_profile, phases, parallel, auto_visualize,
  auto_report, verification_commands, verification_auto_fix,
  verification_max_retries, search_provider, compression_strategy,
  context_selection — these were falling through to the arbitrary-order
  fallback loop instead of appearing in consistent positions
- preferences-reference.md: document git.auto_pr, git.pr_target_branch,
  search_provider, compression_strategy, context_selection; add
  deprecation notices for git.commit_docs and git.merge_to_main
- tests/preferences.test.ts: two new test cases covering all four newly
  validated fields (valid values pass, invalid values produce errors)
2026-03-19 08:55:25 -05:00
deseltrus
2dc804a485 fix: harden quick-task branch lifecycle — disk recovery + integration branch guard (#1342) 2026-03-19 07:39:54 -06:00
deseltrus
e6ceb8dfe8 fix: skip verification retry on spawn infra errors (ETIMEDOUT, ENOENT) (#1340) 2026-03-19 07:39:13 -06:00
Jeremy McSpadden
d7bf3d4e72 Improve startup performance with lazy extension loading (#1336) 2026-03-19 07:38:50 -06:00
dan bachelder
b67101c51b fix: keep external GSD state stable in worktrees (#1334) 2026-03-19 07:37:25 -06:00
John Brahy
76e7aec0e8 fix(gsd): avoid EISDIR crash in file loader 2026-03-19 01:40:52 -07:00
John Brahy
6fe2280363 fix(gsd): open existing database on inspect 2026-03-19 01:35:28 -07:00
Tom Boucher
d121c8e3b2 fix: stop excluding all .gsd/ from commits — only exclude runtime files (#1326) (#1328)
smartStage() was excluding the entire .gsd/ directory from git staging,
which is correct when .gsd/ is symlinked to external state. But on
Windows (junction links) or projects where .gsd/ is git-tracked (not
gitignored), this caused a mid-milestone behavioral discontinuity:

1. One-time cleanup removes runtime files from the index
2. After cleanup, nativeAddAll() + nativeResetPaths('.gsd/') causes ALL
   .gsd/ files to be unstaged — including milestone artifacts
3. autoCommit returns null (nothing staged) for the rest of the milestone
4. Work continues silently with no commits, no errors, no warnings
5. Worktree teardown loses all uncommitted .gsd/ artifacts

Fix: replace the blanket '.gsd/' exclusion with targeted RUNTIME_EXCLUSION_PATHS.
Milestone artifacts (.gsd/milestones/, preferences.md, DECISIONS.md, etc.)
are now committed normally when they're tracked. When .gsd/ is in .gitignore
(the default), git add -A already skips it — the reset is a harmless no-op.
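
As a rough sketch of targeted exclusion versus a blanket '.gsd/' reset: the entries in `EXAMPLE_RUNTIME_PATHS` are placeholders (the real RUNTIME_EXCLUSION_PATHS list is not shown in this log), and `partitionStaged` is a hypothetical helper, not part of smartStage().

```typescript
// Placeholder entries for illustration; not the real RUNTIME_EXCLUSION_PATHS.
const EXAMPLE_RUNTIME_PATHS: string[] = [".gsd/example.lock", ".gsd/example-runtime/"];

// Split staged paths into those to commit and those to unstage.
function partitionStaged(
  paths: string[],
  runtimePaths: string[] = EXAMPLE_RUNTIME_PATHS,
): { commit: string[]; exclude: string[] } {
  // Entries ending in "/" match a whole directory; others match one file.
  const isRuntime = (p: string): boolean =>
    runtimePaths.some((r) => (r.endsWith("/") ? p.startsWith(r) : p === r));
  return {
    commit: paths.filter((p) => !isRuntime(p)),
    exclude: paths.filter(isRuntime),
  };
}
```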

Updated git-service.test.ts to verify the new behavior: runtime files
excluded, milestone artifacts committed.

Fixes #1326
2026-03-18 22:06:41 -06:00
Jeremy McSpadden
fc56cdf93e fix: handle ECOMPROMISED in uncaughtException guard and align retry onCompromised (#1322) (#1332)
When a GSD session crashes hard (SIGKILL, OOM, etc.) without running its
exit handler, the proper-lockfile OS lock directory (.gsd.lock/) is left
stranded. On the next /gsd auto resume, acquireSessionLock detects the dead
PID, cleans up the stale directory, and re-acquires via the retry path.

Ten seconds later, proper-lockfile's update timer fires. Due to a subtle
interaction between the synchronous fs adapter (lockSync / toSyncOptions)
and the setTimeout boundary in Node.js v25+, the ECOMPROMISED error
propagates up through the synchronous callback chain and becomes an
uncaught exception — even though the onCompromised callback sets
_lockCompromised = true without throwing.

The _gsdEpipeGuard uncaughtException handler only handled EPIPE, so it
re-threw ECOMPROMISED, crashing the process. Each crash wrote a new
"interrupted session" record, causing an infinite crash loop on resume.

Two fixes:

1. index.ts: Handle ECOMPROMISED in _gsdEpipeGuard. Exit with code 1
   (non-zero to signal failure) so the process.once("exit") handler runs
   and removes the lock directory, allowing the next session to start clean.

2. session-lock.ts: The retry path's onCompromised was missing
   `_releaseFunction = null`, unlike the primary path. This left the
   release function pointer live after compromise, causing validateSessionLock
   to return true and preventing graceful stop detection. Now matches primary.
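
The decision made by the guard in fix 1 can be sketched as a classifier. `classifyUncaught` and `GuardAction` are hypothetical names, and treating EPIPE as "ignore" is an assumption about the existing guard rather than something stated here.

```typescript
type GuardAction = "ignore" | "exit" | "rethrow";

// Decide how an uncaughtException handler should react to an error.
function classifyUncaught(err: unknown): GuardAction {
  // Read the Node-style error code if there is one.
  const code = (err as { code?: unknown } | null | undefined)?.code;
  if (code === "EPIPE") return "ignore"; // assumed existing behaviour
  if (code === "ECOMPROMISED") return "exit"; // exit non-zero so exit handlers run
  return "rethrow";
}
```

In the "exit" case the handler would call `process.exit(1)` rather than re-throwing, so that the exit handler can remove the lock directory.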
2026-03-18 22:06:03 -06:00
Jeremy McSpadden
15a8807eb3 fix: clean up stale numbered lock files and harden signal/exit handling (#1315) (#1323) 2026-03-18 21:15:47 -06:00
Tom Boucher
7537e30815 fix: worktree sync and home-directory safety check (#1311, #1317) (#1322) 2026-03-18 21:15:36 -06:00
Tom Boucher
68e0672dda test: add regression harness for auto-mode dispatch loop (125 assertions) (#1319) 2026-03-18 21:14:59 -06:00
Jeremy McSpadden
805c7718c4 chore: remove orphaned mcporter extension manifest (#1318) 2026-03-18 21:14:50 -06:00
Tom Boucher
0418458cf9 refactor: extract tryMergeMilestone to eliminate 4 duplicate merge paths in auto.ts (#1314) 2026-03-18 20:04:10 -06:00
Tom Boucher
583e84e932 refactor: dispatch loop hardening — defensive guards, regression tests, lock alignment (#1310) 2026-03-18 20:03:59 -06:00
TÂCHES
e6ab3b6722 refactor: extract parseUnitId() to centralize unit ID parsing (#1282)
Replaces 30+ inline `unitId.split("/")` + destructuring patterns across
16 production files with a single `parseUnitId()` helper that returns
`{ milestone, slice?, task? }`. If the unit ID format ever changes,
only one function needs updating.
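
A minimal sketch of such a helper, assuming the "/"-separated format implied by the inline pattern it replaces. The `ParsedUnitId` interface name and the sample ID in the usage note are illustrative, not taken from the codebase.

```typescript
interface ParsedUnitId {
  milestone: string;
  slice?: string;
  task?: string;
}

// Split a unit ID into its parts; slice and task are set only when present.
function parseUnitId(unitId: string): ParsedUnitId {
  const [milestone = "", slice, task] = unitId.split("/");
  const parsed: ParsedUnitId = { milestone };
  if (slice !== undefined) parsed.slice = slice;
  if (task !== undefined) parsed.task = task;
  return parsed;
}
```

For a hypothetical ID such as "M001/S01/T02", callers would read `parseUnitId(id).slice` instead of destructuring the split result themselves.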

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 19:20:08 -06:00
Tom Boucher
afb438164e fix: align retry lock path with primary lock settings to prevent ECOMPROMISED (#1307)
The retry lock acquisition path (from stale lock recovery in #1251)
used a 5-minute stale threshold and no onCompromised handler, while
the primary path used 30 minutes and a graceful flag-based handler.

This mismatch meant locks acquired via the retry path would throw
ECOMPROMISED (uncaught, crashes process) if the event loop stalled
for >5 minutes — which happens during long LLM operations.

Fixed:
- Stale timeout: 300_000 → 1_800_000 (matches primary)
- Added onCompromised handler (sets _lockCompromised flag)
- Added process.on('exit') safety net (matches primary)

Also: reporter is on Node v25.6.1 which is unsupported — GSD requires
Node >=22.0.0 with 24 LTS recommended.

Fixes #1304
2026-03-18 19:15:47 -06:00
Tom Boucher
150575957d fix: skip symlinks in makeTreeWritable to prevent EPERM on NixOS/nix-darwin (#1303)
makeTreeWritable used statSync which follows symlinks. On NixOS and
nix-darwin, ~/.gsd/agent/bin/ contains symlinks to the immutable Nix
store (/run/current-system/sw/bin/). Attempting to chmod those targets
crashed GSD on startup with EPERM.

Changes:
- Use lstatSync instead of statSync — detects symlinks without
  following them
- Skip symlinks entirely (they don't carry their own permissions, and
  their targets may be immutable)
- Added try/catch around chmodSync as safety net for any remaining
  permission errors on unusual filesystems
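
The three changes can be sketched together as follows. This is not the actual makeTreeWritable: the `FsLike` interface is an assumption introduced so the sketch has no imports (Node's fs module is expected to satisfy it), and the owner-write bit `0o200` is an illustrative choice of mode.

```typescript
// Minimal file-system surface used by the sketch (assumed, for illustration).
interface EntryStat {
  isSymbolicLink(): boolean;
  isDirectory(): boolean;
  mode: number;
}
interface FsLike {
  lstatSync(path: string): EntryStat;
  readdirSync(path: string): string[];
  chmodSync(path: string, mode: number): void;
}

function makeTreeWritable(fs: FsLike, path: string): void {
  // lstat inspects the link itself instead of following it to its target.
  const stat = fs.lstatSync(path);
  if (stat.isSymbolicLink()) return; // never chmod through a symlink
  try {
    fs.chmodSync(path, stat.mode | 0o200); // add owner-write
  } catch {
    // Safety net: ignore permission errors on unusual filesystems.
  }
  if (stat.isDirectory()) {
    for (const name of fs.readdirSync(path)) {
      makeTreeWritable(fs, `${path}/${name}`);
    }
  }
}
```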

Secondary analysis: rmSync with force:true already handles symlinks
correctly (removes the link, not the target). cpSync with force:true
replaces symlinks with regular files (desired behavior for resource
sync).

Fixes #1298
2026-03-18 19:15:33 -06:00
TÂCHES
2a2056bcd7 refactor: extract getErrorMessage() helper to eliminate 65 inline duplicates (#1280)
Consolidate the repeated `err instanceof Error ? err.message : String(err)`
pattern into a single `getErrorMessage(err)` utility. Reduces visual noise in
catch blocks across 20 files in the GSD extension.
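
Based on the pattern quoted above, the helper presumably amounts to the following sketch; the exact signature in the codebase is not shown here.

```typescript
// Return a printable message for any thrown value.
function getErrorMessage(err: unknown): string {
  return err instanceof Error ? err.message : String(err);
}
```

A catch block then reduces to something like `catch (err) { log(getErrorMessage(err)); }`, where `log` stands for whatever reporting function the caller already uses.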

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 19:12:44 -06:00
TÂCHES
922826ba8a refactor: consolidate DB-fallback inline functions in auto-prompts (#1276)
* refactor: consolidate DB-fallback inline functions in auto-prompts

Extract shared inlineFromDbOrFile() helper that encapsulates the
repeated pattern of checking DB availability, dynamically importing
context-store, running a query, formatting results, and falling back
to the filesystem. The three public functions (inlineDecisionsFromDb,
inlineRequirementsFromDb, inlineProjectFromDb) become thin wrappers
that pass only the differing query/format logic as a callback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: update source-level test to match refactored DB-fallback function name

The context-compression test greps auto-prompts.ts source for
`inlineGsdRootFile(base, "project.md"` which was replaced by
`inlineProjectFromDb(base)` in the consolidation refactor.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 19:11:01 -06:00
Tom Boucher
3ae1d54759 fix: handle Windows EPERM on .gsd migration rename with copy+delete fallback (#1296) 2026-03-18 18:57:06 -06:00