Fixes #882: `npm install -g gsd-pi` could install a broken version in which
@gsd/pi-coding-agent cannot be resolved, causing ERR_MODULE_NOT_FOUND.
Root causes addressed:
1. On Windows without Developer Mode or admin rights, symlinkSync fails
even for NTFS junctions, leaving node_modules/@gsd/ empty and causing
a cryptic ERR_MODULE_NOT_FOUND instead of a usable error message.
2. If the npm `latest` dist-tag is stale (pointing to an old version that
predates the packages/ directory), users hit the same failure.
Changes:
- src/loader.ts: after symlinking, validate @gsd/pi-coding-agent exists;
emit a clear actionable error with reinstall instructions instead of
letting Node throw ERR_MODULE_NOT_FOUND deep inside cli.js. Also adds a
cpSync fallback when symlinkSync fails (Windows without elevated perms).
- scripts/link-workspace-packages.cjs: same cpSync fallback — ensures
postinstall succeeds on restricted Windows environments.
- scripts/validate-pack.js: verify @gsd/* packages are resolvable after
the isolated install test, and run `gsd -v` to confirm end-to-end
resolution before declaring the pack valid.
- .github/workflows/build-native.yml: add post-publish dist-tag
verification step that confirms npm dist-tags.latest matches the
published version for stable releases, catching stale-tag regressions
in CI before users encounter them.
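A minimal sketch of the loader-side logic, assuming a Node.js TypeScript setup. The helper names linkOrCopy and assertPackageResolvable, and the exact wording of the reinstall hint, are hypothetical; the commit only states that a cpSync fallback and a resolvability check exist.

```typescript
import { cpSync, existsSync, mkdirSync, rmSync, symlinkSync } from "node:fs";
import { dirname, join } from "node:path";

// Hypothetical helper: expose a workspace package under node_modules,
// falling back to a full copy when creating the link is not permitted.
export function linkOrCopy(src: string, dest: string): "symlink" | "copy" {
  mkdirSync(dirname(dest), { recursive: true });
  // Clear any stale link or partial copy left by a previous install.
  rmSync(dest, { recursive: true, force: true });
  try {
    // The "junction" type only matters on Windows; other platforms ignore it.
    symlinkSync(src, dest, "junction");
    return "symlink";
  } catch {
    // e.g. Windows without Developer Mode or admin rights ends up here.
    cpSync(src, dest, { recursive: true });
    return "copy";
  }
}

// Hypothetical check: fail early with an actionable message instead of
// letting Node throw ERR_MODULE_NOT_FOUND later from inside cli.js.
export function assertPackageResolvable(nodeModules: string, pkg: string): void {
  const manifest = join(nodeModules, pkg, "package.json");
  if (!existsSync(manifest)) {
    throw new Error(
      `${pkg} was not found at ${dirname(manifest)}. ` +
        "The install looks incomplete; reinstall with: npm install -g gsd-pi",
    );
  }
}
```

Copying costs disk space and detaches the installed package from later workspace edits, which is why it is only a fallback after the link attempt.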
When the headless child process crashes or errors out, auto-restart it
with an increasing backoff (5s, 10s, 15s, ... capped at 30s) instead of
exiting immediately. This enables overnight 'fire and forget' runs.
- --max-restarts N (default 3, 0 to disable): controls restart budget
- Only restarts on crashes (exit code !== 0), not on success or blocked
- SIGINT/SIGTERM bypasses restart (user intent to stop)
- Restart count shown in summary output
- Backoff prevents rapid crash loops from burning API credits
The inner loop function (runHeadlessOnce) returns exit status instead
of calling process.exit, letting the outer loop decide whether to
restart or terminate.
This is the first step toward the 'absolute autonomy' goal described
in #886 — process-level resilience for long-running sessions.
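The restart policy can be sketched as an outer loop around a single-run function. This assumes a hypothetical RunStatus shape (exitCode, blocked, interrupted) returned by something like runHeadlessOnce; the delays follow the 5s, 10s, 15s sequence capped at 30s, and the sleep function is injectable so the loop can be tested without waiting.

```typescript
// Hypothetical result of one headless run; the real runHeadlessOnce may differ.
export interface RunStatus {
  exitCode: number;
  blocked: boolean; // auto-mode stopped on a blocker: not a crash
  interrupted: boolean; // SIGINT/SIGTERM: the user asked to stop
}

// Delay before restart number `attempt` (1-based): 5s, 10s, 15s, ... capped at 30s.
export function backoffMs(attempt: number): number {
  return Math.min(5_000 * attempt, 30_000);
}

const defaultSleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

export async function runWithRestarts(
  runOnce: () => Promise<RunStatus>,
  maxRestarts = 3, // mirrors --max-restarts; 0 disables restarts
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<{ exitCode: number; restarts: number }> {
  let restarts = 0;
  for (;;) {
    const status = await runOnce();
    const crashed = status.exitCode !== 0 && !status.blocked && !status.interrupted;
    if (!crashed || restarts >= maxRestarts) {
      // The caller reports `restarts` in the summary and then calls process.exit.
      return { exitCode: status.exitCode, restarts };
    }
    restarts += 1;
    await sleep(backoffMs(restarts));
  }
}
```

Keeping process.exit out of the loop body is what lets the outer function decide between restarting and terminating.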
isTerminalNotification() used broad substring matching against
['complete', 'stopped', 'blocked']. Any notification containing these
words triggered early exit — including progress messages like:
'All slices are complete — nothing to discuss.'
'Override(s) resolved — rewrite-docs completed.'
'Skipped 5+ completed units. Yielding to UI before continuing.'
Fix: Replace substring matching with prefix matching against the actual
stop signals emitted by stopAuto():
'Auto-mode stopped...'
'Step-mode stopped...'
These are the ONLY notifications that indicate auto-mode has genuinely
terminated. All other notifications (slice completion, override
resolution, skip yielding) are progress events and must not trigger
exit.
Also tighten isBlockedNotification to match 'blocked:' (with colon)
instead of bare 'blocked' to avoid false positives from unrelated
messages.
Added 15 regression tests covering:
- All real terminal notification variants
- 6 false-positive cases from the issue report
- Non-notify event rejection
- Blocked detection with and without colon
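A sketch of the tightened matchers, assuming notifications arrive as events with a `type` of "notify" and a `message` string (both field names are assumptions). Prefix matching means only the stopAuto() messages end the run; the case-insensitive comparison in isBlockedNotification is also an assumption.

```typescript
// Assumed event shape; the real event type may carry more fields.
export interface AgentEvent {
  type: string;
  message?: string;
}

// The only messages stopAuto() emits when auto-mode genuinely terminates.
const TERMINAL_PREFIXES = ["Auto-mode stopped", "Step-mode stopped"];

export function isTerminalNotification(event: AgentEvent): boolean {
  const msg = event.message;
  if (event.type !== "notify" || typeof msg !== "string") return false;
  return TERMINAL_PREFIXES.some((prefix) => msg.startsWith(prefix));
}

export function isBlockedNotification(event: AgentEvent): boolean {
  const msg = event.message;
  if (event.type !== "notify" || typeof msg !== "string") return false;
  // Require the colon so prose such as "was blocked earlier" does not match.
  return msg.toLowerCase().includes("blocked:");
}
```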
On Windows, process.cwd() returns backslash paths (C:\Users\name\...).
When these paths are injected into system prompts, worktree context
blocks, or tool results, the model copies them into bash commands.
Bash interprets backslashes as escape characters, silently stripping
them — producing invalid paths like 'C:Usersnamedevelopmentapp-name'.
This is not a regex hack — it's a proper cross-platform boundary:
- Filesystem operations (fs, path.join, spawn cwd) use native paths
unchanged. Node handles both separators correctly for I/O.
- LLM-visible text (prompts, tool results, extension messages) uses
toPosixPath() to normalize to forward slashes. C:/Users/name/...
is valid in Git Bash, PowerShell, and Node.js.
Changes:
- utils/path-display.ts: New toPosixPath() utility in pi-coding-agent
package (for system prompt) and shared extension module (for
extensions that can't import from the compiled package at dev time)
- system-prompt.ts: Normalize resolvedCwd before injecting into the
'Current working directory' line
- gsd/index.ts: Normalize all process.cwd() and originalBase paths in
worktree context blocks injected into the system prompt
- bg-shell/index.ts: Normalize cwd in tool result text (start, env
actions) that the model reads and may reference in commands
- path-display.test.ts: 9 regression tests covering toPosixPath
behavior and system prompt output verification. Includes a scanner
that fails if any Windows absolute paths with backslashes appear in
buildSystemPrompt() output.
Audit scope: Checked all process.cwd() usage across pi-coding-agent
and all bundled extensions. Filesystem-only paths (join, readFile,
spawn cwd, existsSync) are correct and left unchanged. Only paths
entering LLM text are normalized.
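A minimal sketch of the display-path boundary, assuming toPosixPath() is a plain separator swap. hasWindowsBackslashPath() is a hypothetical stand-in for the scanner the tests run over buildSystemPrompt() output, and cwdPromptLine() only illustrates where the conversion happens.

```typescript
// For LLM-visible text only; filesystem calls keep using the native path.
// Note: on POSIX a literal backslash inside a file name would also be rewritten.
export function toPosixPath(p: string): string {
  return p.replace(/\\/g, "/");
}

// Hypothetical scanner: flags drive-letter paths that still use backslashes,
// e.g. C:\Users\name, anywhere in a block of prompt text.
export function hasWindowsBackslashPath(text: string): boolean {
  return /\b[A-Za-z]:\\/.test(text);
}

// Hypothetical example: the system prompt line built from the native cwd.
export function cwdPromptLine(cwd: string): string {
  return `Current working directory: ${toPosixPath(cwd)}`;
}
```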
When the LLM writes freeform prose roadmaps with `## Slice S01: Title`
headers instead of the machine-readable `## Slices` checklist,
parseRoadmapSlices() returned zero slices, causing deriveState() to
permanently block with 'No slice eligible'.
Add a fallback parser that detects prose-style `## Slice SXX:` headers
(and variants like `## S01:`, `## S01 —`) and extracts slice IDs,
titles, and dependencies from the prose. Also parses `Depends on:`
text patterns. All fallback slices default to risk:medium and done:false.
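A sketch of the fallback parser, assuming simple line-by-line scanning. The RoadmapSlice shape, the parseProseSlices name, and the exact regular expressions are hypothetical; they accept the header variants listed above and collect slice IDs from `Depends on:` lines, leaving a `## Slices` checklist header unmatched.

```typescript
export interface RoadmapSlice {
  id: string;
  title: string;
  risk: "low" | "medium" | "high";
  done: boolean;
  depends: string[];
}

// Matches "## Slice S01: Title", "## S01: Title", and "## S01 — Title" (or "-").
const SLICE_HEADER = /^##\s+(?:Slice\s+)?(S\d+)\s*(?::|—|-)\s*(.+?)\s*$/;
const DEPENDS_LINE = /Depends on:?\s*(.+)$/i;

export function parseProseSlices(markdown: string): RoadmapSlice[] {
  const slices: RoadmapSlice[] = [];
  let current: RoadmapSlice | undefined;
  for (const line of markdown.split(/\r?\n/)) {
    const header = SLICE_HEADER.exec(line);
    if (header) {
      // Prose roadmaps carry no risk or checkbox state, so use the stated defaults.
      current = { id: header[1], title: header[2], risk: "medium", done: false, depends: [] };
      slices.push(current);
      continue;
    }
    const deps = DEPENDS_LINE.exec(line);
    if (deps && current) {
      // "Depends on: none" yields no IDs; "S01, S02" yields both.
      current.depends.push(...(deps[1].match(/S\d+/g) ?? []));
    }
  }
  return slices;
}
```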
When merging a milestone back to main, `git checkout main` fails if
untracked .gsd/ state files (STATE.md, completed-units.json, auto.lock)
in the working tree conflict with tracked files on the branch.
Remove these known GSD-managed state files before checkout. They are
runtime artifacts regenerated by doctor/rebuildState and are not
meaningful in the main working tree — the worktree had the real state.
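A sketch of the pre-checkout cleanup, assuming the files live directly under `.gsd/` in the main working tree. removeRuntimeStateFiles and checkoutMain are hypothetical names, and the list mirrors only the files named above.

```typescript
import { execFileSync } from "node:child_process";
import { existsSync, rmSync } from "node:fs";
import { join } from "node:path";

// Runtime artifacts that doctor/rebuildState can regenerate.
export const RUNTIME_STATE_FILES = ["STATE.md", "completed-units.json", "auto.lock"];

export function removeRuntimeStateFiles(repoRoot: string): string[] {
  const removed: string[] = [];
  for (const name of RUNTIME_STATE_FILES) {
    const file = join(repoRoot, ".gsd", name);
    if (existsSync(file)) {
      rmSync(file, { force: true });
      removed.push(name);
    }
  }
  return removed;
}

// Hypothetical caller: clear the conflicting files, then switch branches.
export function checkoutMain(repoRoot: string): void {
  removeRuntimeStateFiles(repoRoot);
  execFileSync("git", ["checkout", "main"], { cwd: repoRoot, stdio: "inherit" });
}
```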
OAuthSelectorComponent calls its onSelect callback synchronously (no
await), but the callback was async and called showLoginDialog, which
throws 'Login cancelled' on Escape. The resulting unhandled rejection
bubbled up to the uncaughtException handler and crashed GSD.
Wrap the async work in a named function with .catch() so cancellation
errors are swallowed gracefully. showLoginDialog already handles its
own error display internally.
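The pattern can be sketched as below. makeOnSelect and its parameter are hypothetical stand-ins for the OAuthSelectorComponent wiring; the point is that the synchronous callback starts a named async function and attaches .catch(), so no rejection goes unhandled.

```typescript
// The selector invokes this synchronously and ignores any returned promise.
export type OnSelect = (providerId: string) => void;

export function makeOnSelect(showLoginDialog: (providerId: string) => Promise<void>): OnSelect {
  return (providerId) => {
    const runLogin = async (): Promise<void> => {
      await showLoginDialog(providerId);
    };
    // Without this handler, a "Login cancelled" rejection (Escape) would be
    // unhandled and reach the process-level crash handler. The dialog already
    // displays its own errors, so the rejection is simply swallowed here.
    runLogin().catch(() => {});
  };
}
```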
Worktree initialization only copied DECISIONS.md, REQUIREMENTS.md,
PROJECT.md, and QUEUE.md. The missing STATE.md caused the pre-dispatch
health check in doctor-proactive.ts to block dispatch with
'STATE.md missing'.
Add STATE.md, KNOWLEDGE.md, and OVERRIDES.md to the copy list so
worktrees start with complete planning state.
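A sketch of the widened copy step, assuming planning files sit under `.gsd/` in both the project root and the worktree. seedWorktreePlanning is a hypothetical name, and skipping absent source files (rather than failing) is also an assumption.

```typescript
import { copyFileSync, existsSync, mkdirSync } from "node:fs";
import { join } from "node:path";

// Planning files a new worktree starts with; the last three are the additions.
export const PLANNING_FILES = [
  "DECISIONS.md", "REQUIREMENTS.md", "PROJECT.md", "QUEUE.md",
  "STATE.md", "KNOWLEDGE.md", "OVERRIDES.md",
];

export function seedWorktreePlanning(projectRoot: string, worktreeRoot: string): string[] {
  const destDir = join(worktreeRoot, ".gsd");
  mkdirSync(destDir, { recursive: true });
  const copied: string[] = [];
  for (const name of PLANNING_FILES) {
    const from = join(projectRoot, ".gsd", name);
    if (!existsSync(from)) continue; // assumption: optional files may be absent
    copyFileSync(from, join(destDir, name));
    copied.push(name);
  }
  return copied;
}
```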
Every new pi session writes a fresh syncedAt timestamp to
managed-resources.json, causing a running auto-mode session to falsely
detect a GSD update and stop. The actual version (gsdVersion) only
changes on real upgrades.
Switch the staleness check from syncedAt (timestamp) to gsdVersion
(semver string) so that launching a second session no longer triggers
a false positive.
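A sketch of the revised check. Only the gsdVersion and syncedAt field names come from the description above; the interface and function names are hypothetical.

```typescript
import { readFileSync } from "node:fs";

// Assumed subset of managed-resources.json.
export interface ManagedResources {
  gsdVersion: string; // changes only on a real upgrade
  syncedAt: string; // rewritten by every new session
}

export function readManagedResources(path: string): ManagedResources {
  return JSON.parse(readFileSync(path, "utf8")) as ManagedResources;
}

// Compare the snapshot taken at startup with the current file; syncedAt is ignored.
export function gsdWasUpdated(atStartup: ManagedResources, now: ManagedResources): boolean {
  return atStartup.gsdVersion !== now.gsdVersion;
}
```

A second session rewriting syncedAt therefore leaves the running session undisturbed, while an actual upgrade still changes gsdVersion and is detected.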
When multiple tool calls (e.g. concurrent gsd_save_decision) target the
same markdown file, they all wrote to the same deterministic .tmp path,
so one caller's rename() moved the temp file away before another
caller's rename() ran, which then failed with ENOENT.
Replace the static `.tmp` suffix with a per-call random suffix so each
concurrent writer gets its own temp file. Also clean up orphaned temp
files on rename failure.
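A sketch of the per-call temp-file pattern using fs/promises. atomicWriteFile is a hypothetical name and the random-hex suffix format is an assumption; what matters is that no two writers share a temp path and that a failed rename removes its own temp file.

```typescript
import { randomBytes } from "node:crypto";
import { rename, rm, writeFile } from "node:fs/promises";

export async function atomicWriteFile(target: string, contents: string): Promise<void> {
  // Unique per call, so concurrent writers never consume each other's temp file.
  const tmp = `${target}.${randomBytes(6).toString("hex")}.tmp`;
  await writeFile(tmp, contents, "utf8");
  try {
    await rename(tmp, target);
  } catch (err) {
    // Do not leave an orphaned temp file next to the target.
    await rm(tmp, { force: true });
    throw err;
  }
}
```

With several concurrent calls, the last rename wins; each writer still replaces the file atomically, so readers never see a partially written file.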
The .planning → .gsd migration creates roadmaps and summaries but not
VALIDATION files. deriveState() requires a terminal validation file
(verdict: pass) to consider a milestone complete. Without it, every
migrated milestone enters validating-milestone phase, blocking progress
to the actual current milestone.
For milestones where all slices are done, write a pass-through
VALIDATION.md (verdict: pass, migrated: true) and SUMMARY.md so
deriveState() skips them correctly.
Updated integration test to verify VALIDATION/SUMMARY files are written
and deriveState returns 'complete' phase with activeMilestone pointing
to the last completed entry (expected behavior).
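A sketch of the stub generation, assuming VALIDATION.md records its verdict in YAML front matter. The function name, the front-matter layout, and skipping milestones with no slices are assumptions rather than the migration's actual code; the matching SUMMARY.md is not shown.

```typescript
// Minimal slice view; the migration's real types likely carry more fields.
export interface MigratedSlice {
  id: string;
  done: boolean;
}

// Returns VALIDATION.md content for a fully complete milestone, or null when
// the milestone still has open work (or, by assumption, no slices at all).
export function migratedValidationFor(milestoneId: string, slices: MigratedSlice[]): string | null {
  if (slices.length === 0 || !slices.every((s) => s.done)) return null;
  return [
    "---",
    "verdict: pass",
    "migrated: true",
    "---",
    "",
    `# ${milestoneId} Validation`,
    "",
    "Generated during the .planning to .gsd migration: every slice was already done.",
    "",
  ].join("\n");
}
```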
When unique_milestone_ids is enabled, the LLM cannot generate random
suffixes itself. Previously only the first milestone got a correct ID
(pre-generated in TS), while subsequent milestones in multi-milestone
projects got bare M002/M003 without suffixes.
Added a gsd_generate_milestone_id tool that the LLM calls to get each
milestone ID. The tool scans disk for existing milestones and respects
the unique_milestone_ids preference, making it impossible to produce
wrong-format IDs.
Updated discuss, discuss-headless, and queue prompts to instruct the
LLM to use the tool instead of inventing milestone IDs.
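The core of such a tool can be sketched as a pure function over the milestone IDs already found on disk. nextMilestoneId and the suffix format (a dash plus six lowercase hex characters) are hypothetical; the real tool's suffix format may differ.

```typescript
import { randomBytes } from "node:crypto";

// `existing` holds milestone directory names the caller scanned from disk.
export function nextMilestoneId(existing: string[], uniqueIds: boolean): string {
  let max = 0;
  for (const id of existing) {
    const m = /^M(\d+)/.exec(id);
    if (m) max = Math.max(max, Number(m[1]));
  }
  const base = `M${String(max + 1).padStart(3, "0")}`;
  // Hypothetical suffix; generated in code so the LLM never has to invent randomness.
  return uniqueIds ? `${base}-${randomBytes(3).toString("hex")}` : base;
}
```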