After the skip-loop breaker evicts a completion key, the fallback path
at the bottom of dispatchNextUnit re-persists it because the expected
artifact exists on disk. This recreates the exact loop the breaker was
trying to break:
evict key → dispatch → verifyArtifact(true) → re-persist key → skip → evict → repeat
Fix: Track recently-evicted keys in a Set. The fallback artifact-check
path skips re-persistence for keys that were just evicted by the
skip-loop breaker. Set is cleared on stopAuto.
Two fixes:
1. lsp/config.ts: Use `where.exe` instead of `which` on Windows.
MSYS's `which` returns POSIX paths (/c/Users/...) that Node's
spawn() can't execute. `where.exe` returns native Windows paths.
2. lsp/client.ts: Handle spawn ENOENT error gracefully. When the LSP
server binary doesn't exist, the error event now triggers a clean
exit instead of bubbling up and crashing auto-mode.
When a slice plan (S03-PLAN.md) was pre-created during roadmapping
but plan-slice never ran to generate per-task files (tasks/T01-PLAN.md),
deriveState returned 'executing' phase. execute-task then failed because
the task plan didn't exist, creating an infinite restart loop.
Fix: In deriveState, when the tasks directory exists but has zero .md
files and the slice plan references tasks, return 'planning' phase
instead of 'executing'. This causes plan-slice to dispatch and generate
the missing task plans.
Tests updated: 6 test files that create synthetic state fixtures now
include a stub task plan file so their 'executing' phase assertions
remain valid.
* fix: reduce CPU usage on long auto-mode sessions
Seven targeted fixes for compounding process/timer/I/O issues that cause
high CPU during multi-hour /gsd auto sessions:
1. Wrap idle watchdog and hard timeout async callbacks in try-catch to
prevent unhandled rejections from orphaning intervals
2. Cache nativeHasChanges fallback (10s TTL) to avoid spawning a new
git process every 15 seconds when native module is unavailable
3. Call clearUnitTimeout() before dispatchNextUnit() in all recovery
paths to prevent stale idle watchdog from firing alongside new timers
4. Add 10-second timeout to subagent worktree cleanup to prevent hangs
when git worktree remove blocks indefinitely
5. Prune dead bg-shell processes after each unit completion to free
retained output buffers (~500KB-1MB per dead process)
6. Throttle STATE.md rebuilds to at most once per 30 seconds (was every
unit completion at 100-400ms each)
7. Increase progress widget refresh interval from 5s to 15s to reduce
synchronous file I/O on the hot path
* fix: reset nativeHasChanges cache in worktree test
The 10s TTL cache on nativeHasChanges was causing the worktree test
to return stale "no changes" when checking a freshly dirtied repo
within the cache window. Reset the cache before the dirty-repo
assertion so the test correctly detects new changes.
Conflicts arose because main added continueHereHandle cleanup and
buildSnapshotOpts (with continueHereFired) while the PR extracted
inline closeout code into closeoutUnit(). Resolution: use closeoutUnit()
with buildSnapshotOpts() to pass all fields including continueHereFired.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
resolveModelId narrowed the return type to { id, provider } which lost
the full Model<Api> shape needed by pi.setModel(). Using a generic <T>
preserves the actual type from the registry's getAvailable() call.
Cherry-picks the worktree-sync concept from #902, adapted to the flat
auto-*.ts naming convention with fixes:
- Uses gsdVersion (not syncedAt) for resource staleness detection (#804)
- Uses ESM imports (not require()) for node:fs/node:os
- Includes both sync directions (project→worktree and worktree→project)
- Includes escapeStaleWorktree and cleanStaleRuntimeUnits
auto.ts: 3,256 → 3,099 lines (-157 lines)
Break up the 3,975-line auto.ts into focused, single-concern modules:
- auto-budget.ts: Pure budget alert level and enforcement functions
- auto-tool-tracking.ts: In-flight tool call tracking for idle detection
- auto-observability.ts: Pre-dispatch observability validation and repair
- auto-unit-closeout.ts: Consolidated metrics/activity/memory closeout helper
- auto-direct-dispatch.ts: Manual /gsd dispatch phase command handling
- auto-timeout-recovery.ts: Idle and hard timeout recovery with escalation
- auto-model-selection.ts: Model routing, complexity classification, fallback chains
auto.ts retains orchestration (start/stop/pause, handleAgentEnd, dispatchNextUnit)
and drops from 3,975 to 3,256 lines (-719 lines, -18%).
All extractions are pure moves with re-exports — no behavior changes.
All 1,092 unit tests and 30 integration tests pass.
Fixes#882 — npm install -g gsd-pi installing a broken version where
@gsd/pi-coding-agent cannot be resolved, causing ERR_MODULE_NOT_FOUND.
Root causes addressed:
1. On Windows without Developer Mode or admin rights, symlinkSync fails
even for NTFS junctions, leaving node_modules/@gsd/ empty and causing
a cryptic ERR_MODULE_NOT_FOUND instead of a usable error message.
2. If npm latest dist-tag is stale (pointing to an old version that
predates the packages/ directory), users get the same failure.
Changes:
- src/loader.ts: after symlinking, validate @gsd/pi-coding-agent exists;
emit a clear actionable error with reinstall instructions instead of
letting Node throw ERR_MODULE_NOT_FOUND deep inside cli.js. Also adds
cpSync fallback when symlinkSync fails (Windows without elevated perms).
- scripts/link-workspace-packages.cjs: same cpSync fallback — ensures
postinstall succeeds on restricted Windows environments.
- scripts/validate-pack.js: verify @gsd/* packages are resolvable after
the isolated install test, and run `gsd -v` to confirm end-to-end
resolution before declaring the pack valid.
- .github/workflows/build-native.yml: add post-publish dist-tag
verification step that confirms npm dist-tags.latest matches the
published version for stable releases, catching stale-tag regressions
in CI before users encounter them.
When the headless child process crashes or errors out, auto-restart
with exponential backoff (5s, 10s, 15s... up to 30s) instead of
exiting immediately. This enables overnight 'fire and forget' runs.
- --max-restarts N (default 3, 0 to disable): controls restart budget
- Only restarts on crashes (exit code !== 0), not on success or blocked
- SIGINT/SIGTERM bypasses restart (user intent to stop)
- Restart count shown in summary output
- Backoff prevents rapid crash loops from burning API credits
The inner loop function (runHeadlessOnce) returns exit status instead
of calling process.exit, letting the outer loop decide whether to
restart or terminate.
This is the first step toward the 'absolute autonomy' goal described
in #886 — process-level resilience for long-running sessions.
isTerminalNotification() used broad substring matching against
['complete', 'stopped', 'blocked']. Any notification containing these
words triggered early exit — including progress messages like:
'All slices are complete — nothing to discuss.'
'Override(s) resolved — rewrite-docs completed.'
'Skipped 5+ completed units. Yielding to UI before continuing.'
Fix: Replace substring matching with prefix matching against the actual
stop signals emitted by stopAuto():
'Auto-mode stopped...'
'Step-mode stopped...'
These are the ONLY notifications that indicate auto-mode has genuinely
terminated. All other notifications (slice completion, override
resolution, skip yielding) are progress events and must not trigger
exit.
Also tighten isBlockedNotification to match 'blocked:' (with colon)
instead of bare 'blocked' to avoid false positives from unrelated
messages.
Added 15 regression tests covering:
- All real terminal notification variants
- 6 false-positive cases from the issue report
- Non-notify event rejection
- Blocked detection with and without colon
On Windows, process.cwd() returns backslash paths (C:\Users\name\...).
When these paths are injected into system prompts, worktree context
blocks, or tool results, the model copies them into bash commands.
Bash interprets backslashes as escape characters, silently stripping
them — producing invalid paths like 'C:Usersnamedevelopmentapp-name'.
This is not a regex hack — it's a proper cross-platform boundary:
- Filesystem operations (fs, path.join, spawn cwd) use native paths
unchanged. Node handles both separators correctly for I/O.
- LLM-visible text (prompts, tool results, extension messages) uses
toPosixPath() to normalize to forward slashes. C:/Users/name/...
is valid in Git Bash, WSL bash, PowerShell, and Node.js.
Changes:
- utils/path-display.ts: New toPosixPath() utility in pi-coding-agent
package (for system prompt) and shared extension module (for
extensions that can't import from the compiled package at dev time)
- system-prompt.ts: Normalize resolvedCwd before injecting into the
'Current working directory' line
- gsd/index.ts: Normalize all process.cwd() and originalBase paths in
worktree context blocks injected into the system prompt
- bg-shell/index.ts: Normalize cwd in tool result text (start, env
actions) that the model reads and may reference in commands
- path-display.test.ts: 9 regression tests covering toPosixPath
behavior and system prompt output verification. Includes a scanner
that fails if any Windows absolute paths with backslashes appear in
buildSystemPrompt() output.
Audit scope: Checked all process.cwd() usage across pi-coding-agent
and all bundled extensions. Filesystem-only paths (join, readFile,
spawn cwd, existsSync) are correct and left unchanged. Only paths
entering LLM text are normalized.