Cherry-picks the worktree-sync concept from #902, adapted to the flat
auto-*.ts naming convention with fixes:
- Uses gsdVersion (not syncedAt) for resource staleness detection (#804)
- Uses ESM imports (not require()) for node:fs/node:os
- Includes both sync directions (project→worktree and worktree→project)
- Includes escapeStaleWorktree and cleanStaleRuntimeUnits
auto.ts: 3,256 → 3,099 lines (-157 lines)
Break up the 3,975-line auto.ts into focused, single-concern modules:
- auto-budget.ts: Pure budget alert level and enforcement functions
- auto-tool-tracking.ts: In-flight tool call tracking for idle detection
- auto-observability.ts: Pre-dispatch observability validation and repair
- auto-unit-closeout.ts: Consolidated metrics/activity/memory closeout helper
- auto-direct-dispatch.ts: Manual /gsd dispatch phase command handling
- auto-timeout-recovery.ts: Idle and hard timeout recovery with escalation
- auto-model-selection.ts: Model routing, complexity classification, fallback chains
auto.ts retains orchestration (start/stop/pause, handleAgentEnd, dispatchNextUnit)
and drops from 3,975 to 3,256 lines (-719 lines, -18%).
All extractions are pure moves with re-exports — no behavior changes.
All 1,092 unit tests and 30 integration tests pass.
Fixes#882 — npm install -g gsd-pi installing a broken version where
@gsd/pi-coding-agent cannot be resolved, causing ERR_MODULE_NOT_FOUND.
Root causes addressed:
1. On Windows without Developer Mode or admin rights, symlinkSync fails
even for NTFS junctions, leaving node_modules/@gsd/ empty and causing
a cryptic ERR_MODULE_NOT_FOUND instead of a usable error message.
2. If npm latest dist-tag is stale (pointing to an old version that
predates the packages/ directory), users get the same failure.
Changes:
- src/loader.ts: after symlinking, validate @gsd/pi-coding-agent exists;
emit a clear actionable error with reinstall instructions instead of
letting Node throw ERR_MODULE_NOT_FOUND deep inside cli.js. Also adds
cpSync fallback when symlinkSync fails (Windows without elevated perms).
- scripts/link-workspace-packages.cjs: same cpSync fallback — ensures
postinstall succeeds on restricted Windows environments.
- scripts/validate-pack.js: verify @gsd/* packages are resolvable after
the isolated install test, and run `gsd -v` to confirm end-to-end
resolution before declaring the pack valid.
- .github/workflows/build-native.yml: add post-publish dist-tag
verification step that confirms npm dist-tags.latest matches the
published version for stable releases, catching stale-tag regressions
in CI before users encounter them.
When the headless child process crashes or errors out, auto-restart
with exponential backoff (5s, 10s, 15s... up to 30s) instead of
exiting immediately. This enables overnight 'fire and forget' runs.
- --max-restarts N (default 3, 0 to disable): controls restart budget
- Only restarts on crashes (exit code !== 0), not on success or blocked
- SIGINT/SIGTERM bypasses restart (user intent to stop)
- Restart count shown in summary output
- Backoff prevents rapid crash loops from burning API credits
The inner loop function (runHeadlessOnce) returns exit status instead
of calling process.exit, letting the outer loop decide whether to
restart or terminate.
This is the first step toward the 'absolute autonomy' goal described
in #886 — process-level resilience for long-running sessions.
isTerminalNotification() used broad substring matching against
['complete', 'stopped', 'blocked']. Any notification containing these
words triggered early exit — including progress messages like:
'All slices are complete — nothing to discuss.'
'Override(s) resolved — rewrite-docs completed.'
'Skipped 5+ completed units. Yielding to UI before continuing.'
Fix: Replace substring matching with prefix matching against the actual
stop signals emitted by stopAuto():
'Auto-mode stopped...'
'Step-mode stopped...'
These are the ONLY notifications that indicate auto-mode has genuinely
terminated. All other notifications (slice completion, override
resolution, skip yielding) are progress events and must not trigger
exit.
Also tighten isBlockedNotification to match 'blocked:' (with colon)
instead of bare 'blocked' to avoid false positives from unrelated
messages.
Added 15 regression tests covering:
- All real terminal notification variants
- 6 false-positive cases from the issue report
- Non-notify event rejection
- Blocked detection with and without colon
On Windows, process.cwd() returns backslash paths (C:\Users\name\...).
When these paths are injected into system prompts, worktree context
blocks, or tool results, the model copies them into bash commands.
Bash interprets backslashes as escape characters, silently stripping
them — producing invalid paths like 'C:Usersnamedevelopmentapp-name'.
This is not a regex hack — it's a proper cross-platform boundary:
- Filesystem operations (fs, path.join, spawn cwd) use native paths
unchanged. Node handles both separators correctly for I/O.
- LLM-visible text (prompts, tool results, extension messages) uses
toPosixPath() to normalize to forward slashes. C:/Users/name/...
is valid in Git Bash, WSL bash, PowerShell, and Node.js.
Changes:
- utils/path-display.ts: New toPosixPath() utility in pi-coding-agent
package (for system prompt) and shared extension module (for
extensions that can't import from the compiled package at dev time)
- system-prompt.ts: Normalize resolvedCwd before injecting into the
'Current working directory' line
- gsd/index.ts: Normalize all process.cwd() and originalBase paths in
worktree context blocks injected into the system prompt
- bg-shell/index.ts: Normalize cwd in tool result text (start, env
actions) that the model reads and may reference in commands
- path-display.test.ts: 9 regression tests covering toPosixPath
behavior and system prompt output verification. Includes a scanner
that fails if any Windows absolute paths with backslashes appear in
buildSystemPrompt() output.
Audit scope: Checked all process.cwd() usage across pi-coding-agent
and all bundled extensions. Filesystem-only paths (join, readFile,
spawn cwd, existsSync) are correct and left unchanged. Only paths
entering LLM text are normalized.
When the LLM writes freeform prose roadmaps with `## Slice S01: Title`
headers instead of the machine-readable `## Slices` checklist,
parseRoadmapSlices() returned zero slices, causing deriveState() to
permanently block with 'No slice eligible'.
Add a fallback parser that detects prose-style `## Slice SXX:` headers
(and variants like `## S01:`, `## S01 —`) and extracts slice IDs,
titles, and dependencies from the prose. Also parses `Depends on:`
text patterns. All fallback slices default to risk:medium and done:false.
When merging a milestone back to main, `git checkout main` fails if
untracked .gsd/ state files (STATE.md, completed-units.json, auto.lock)
in the working tree conflict with tracked files on the branch.
Remove these known GSD-managed state files before checkout. They are
runtime artifacts regenerated by doctor/rebuildState and are not
meaningful in the main working tree — the worktree had the real state.
OAuthSelectorComponent calls its onSelect callback synchronously (no
await), but the callback was async — calling showLoginDialog which
throws 'Login cancelled' on Escape. The unhandled rejection bubbled
up to the uncaughtException handler and crashed GSD.
Wrap the async work in a named function with .catch() so cancellation
errors are swallowed gracefully. showLoginDialog already handles its
own error display internally.
Worktree initialization only copied DECISIONS.md, REQUIREMENTS.md,
PROJECT.md, and QUEUE.md. The missing STATE.md caused the pre-dispatch
health check in doctor-proactive.ts to block dispatch with
'STATE.md missing'.
Add STATE.md, KNOWLEDGE.md, and OVERRIDES.md to the copy list so
worktrees start with complete planning state.
Every new pi session writes a fresh syncedAt timestamp to
managed-resources.json, causing a running auto-mode session to falsely
detect a GSD update and stop. The actual version (gsdVersion) only
changes on real upgrades.
Switch the staleness check from syncedAt (timestamp) to gsdVersion
(semver string) so that launching a second session no longer triggers
a false positive.
When multiple tool calls (e.g. concurrent gsd_save_decision) target the
same markdown file, the deterministic .tmp suffix caused ENOENT on
rename() because one caller consumed the temp file before another could
rename it.
Replace the static `.tmp` suffix with a per-call random suffix so each
concurrent writer gets its own temp file. Also clean up orphaned temp
files on rename failure.
The .planning → .gsd migration creates roadmaps and summaries but not
VALIDATION files. deriveState() requires a terminal validation file
(verdict: pass) to consider a milestone complete. Without it, every
migrated milestone enters validating-milestone phase, blocking progress
to the actual current milestone.
For milestones where all slices are done, write a pass-through
VALIDATION.md (verdict: pass, migrated: true) and SUMMARY.md so
deriveState() skips them correctly.
Updated integration test to verify VALIDATION/SUMMARY files are written
and deriveState returns 'complete' phase with activeMilestone pointing
to the last completed entry (expected behavior).
When unique_milestone_ids is enabled, the LLM cannot generate random
suffixes itself. Previously only the first milestone got a correct ID
(pre-generated in TS), while subsequent milestones in multi-milestone
projects got bare M002/M003 without suffixes.
Added a gsd_generate_milestone_id tool that the LLM calls to get each
milestone ID. The tool scans disk for existing milestones and respects
the unique_milestone_ids preference, making it impossible to produce
wrong-format IDs.
Updated discuss, discuss-headless, and queue prompts to instruct the
LLM to use the tool instead of inventing milestone IDs.
* feat: worker NDJSON monitoring, budget enforcement, PID-based stop fallback
Closes three gaps in parallel orchestration:
1. **Worker stdout monitoring** — Workers now run with `--mode json` so
they emit NDJSON events. The coordinator parses stdout line-by-line,
extracting cost/token data from `message_end` events. This keeps
per-worker cost tracking in sync with actual API spend and updates
session status files for live dashboard visibility.
2. **Budget enforcement before spawn** — `startParallel()` now checks
`isBudgetExceeded()` before each worker spawn. When the aggregate
cost across all workers reaches the configured ceiling, no new
workers are started.
3. **PID-based stop fallback** — `stopParallel()` now falls back to
`process.kill(pid, "SIGTERM")` when the ChildProcess handle is null
(e.g., after coordinator restart when handles aren't available).
Previously, orphaned workers could not be stopped.
Includes 11 new tests covering NDJSON format validation, cost
aggregation, budget ceiling comparison, and PID-based kill patterns.
All 54 existing parallel-orchestration tests still pass.
Relates to #672
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: currentUnit type must match SessionStatus interface (object | null, not string)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: render native web search tool calls in TUI
The Anthropic streaming parser silently dropped server_tool_use and
web_search_tool_result content blocks, making native web search
invisible. Add ServerToolUseContent and WebSearchResultContent types,
handle both block types in the streaming parser and conversation replay,
and render them as ToolExecutionComponent in the interactive TUI.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add PREFER_BRAVE_SEARCH env var to bypass native web search
Set PREFER_BRAVE_SEARCH=1 to keep Brave/custom search tools active
on Anthropic models instead of injecting native server-side web search.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: skip non-toolCall blocks in Mistral provider conversation replay
The ServerToolUseContent and WebSearchResultContent types added for
native web search don't have id/name/arguments properties, causing
TypeScript errors when the Mistral provider tried to push them as
tool calls.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: meaningful commit messages from task summaries (#785, #784)
Post-task commits now derive messages from the task summary one-liner,
inferred type, and key files. Planning prompts respect commit_docs: false.
Commit type inference expanded with perf type and oneLiner parameter.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: replace invalidateStateCache with invalidateAllCaches in crash recovery
PR #799 reintroduced invalidateStateCache() calls in the phantom skip
loop crash recovery paths. These should use invalidateAllCaches() which
is the renamed function.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: resolve CI failures from main merge conflicts
- Replace invalidateStateCache() → invalidateAllCaches() in crash
recovery paths (reintroduced by PR #799 merge)
- Expand smart-entry-draft test chunk window from 3000 to 4000 chars
to accommodate commitInstruction additions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>