* test: parallel merge reconciliation + budget atomicity coverage (G5/G6)
27 new tests covering two gaps identified in #672:
G5 — Merge Reconciliation (parallel-merge.test.ts, 17 tests):
- determineMergeOrder: sequential, by-completion, filtering, defaults
- formatMergeResults: success, conflict, empty, mixed output
- mergeCompletedMilestone: clean merge with session cleanup, missing
roadmap error, conflict detection with structured file list
- mergeAllCompleted: sequential order, stop-on-first-conflict,
by-completion order (integration tests with real git repos)
G6 — Budget Atomicity (parallel-budget-atomicity.test.ts, 10 tests):
- Ceiling enforcement: exceeded, not exceeded, exact boundary
- Cost aggregation: correct sum, incremental updates
- No double-counting: 5 rapid refreshes produce correct total
- Budget reset: resetOrchestrator clears all state
- No ceiling: unlimited spending when budget_ceiling unset
- Worker state sync: refreshWorkerStatuses picks up disk changes
All tests use node:test + node:assert/strict. No production code changes.
Relates to #672
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: use double quotes in git commit messages for Windows compatibility
Single-quoted commit messages in test helpers fail on Windows CMD
(pathspec errors). Switch to double quotes which work cross-platform.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: TUI dashboard cleanup, dedup, and feature improvements
- Extract shared format-utils.ts: formatDuration, padRight, joinColumns,
centerLine, fitColumns, sparkline, stripAnsi — eliminating 3× duplication
across dashboard-overlay, visualizer-views, and auto-dashboard
- Use shared STATUS_GLYPH/STATUS_COLOR from ui.ts consistently across all
overlay and view files instead of hardcoded Unicode glyphs
- Fix redundant dynamic import('node:fs') in visualizer-data.ts (statSync
already imported at top level)
- Replace (entry as any) casts with proper SessionMessageEntry type narrowing
- Add mtime-based file content cache for visualizer data loader to avoid
re-parsing unchanged roadmap/plan files on every refresh
- Increase visualizer refresh interval from 2s to 5s (with mtime cache,
unchanged files are effectively free)
- Fix sparkline to use loop-based max instead of Math.max(...values) to
avoid stack overflow on large arrays
- Add ETA/time-remaining estimate to progress widget and dashboard overlay
based on average unit duration from metrics ledger
- Show warning glyph for budget-pressured units in completed units list
(continueHereFired units now show ⚠ instead of ✓)
- Add terminal resize (SIGWINCH) handling to both overlays — invalidates
cache and re-renders on window size change
- Fix dispose race in dashboard overlay close path — now calls dispose()
before onClose() to prevent timer callbacks firing after teardown
- Add 23 unit tests for format-utils.ts (including 100k-element sparkline)
- Add 2 tests for estimateTimeRemaining
- Add source-contract tests for resize handler and shared imports
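
A minimal sketch of the loop-based max fix for sparkline, for illustration only. The glyph set, scaling, and function body are assumptions; the actual sparkline() in format-utils.ts may differ.

```typescript
// Hypothetical sketch: not the actual format-utils.ts implementation.
const BARS = "▁▂▃▄▅▆▇█";

function sparkline(values: readonly number[]): string {
  if (values.length === 0) return "";
  // Loop-based max. Math.max(...values) passes every element as a call
  // argument and can throw a RangeError on very large arrays.
  let max = 0;
  for (const v of values) {
    if (v > max) max = v;
  }
  // All-zero (or all-negative) input renders as a flat baseline.
  if (max <= 0) return BARS.charAt(0).repeat(values.length);
  let out = "";
  for (const v of values) {
    const scaled = Math.round((v / max) * (BARS.length - 1));
    // Clamp so negative values map to the lowest bar.
    const idx = Math.min(BARS.length - 1, Math.max(0, scaled));
    out += BARS.charAt(idx);
  }
  return out;
}
```

With this shape, a 100k-element input is handled in two linear passes and never spreads the array into a call.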
* fix: use STATUS_GLYPH.warning instead of STATUS_GLYPH.statusWarning
STATUS_GLYPH is keyed by ProgressStatus ("warning"), not by GLYPH
property name ("statusWarning"). Fixes typecheck failure in CI.
The startup model validation overwrote the user's configured model when
it was 'not available' (API key missing, OAuth token expired, rate
limited). This silently changed the model to a fallback like
google/gemini-1.5-flash or openai/gpt-5.4.
Fix: Only trigger the fallback when the configured model doesn't exist
in the registry at all (removed/unknown). A model that exists but is
temporarily unavailable (credential issue) keeps its setting — the
session-level fallback resolver handles it at prompt time.
After the skip-loop breaker evicts a completion key, the fallback path
at the bottom of dispatchNextUnit re-persists it because the expected
artifact exists on disk. This recreates the exact loop the breaker was
trying to break:
evict key → dispatch → verifyArtifact(true) → re-persist key → skip → evict → repeat
Fix: Track recently-evicted keys in a Set. The fallback artifact-check
path skips re-persistence for keys that were just evicted by the
skip-loop breaker. Set is cleared on stopAuto.
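
The eviction guard described above can be sketched roughly as follows. All names here (completedKeys, recentlyEvicted, evictKey, persistIfArtifactExists, clearEvictions) are hypothetical stand-ins; the real logic lives inside dispatchNextUnit and stopAuto and is likely shaped differently.

```typescript
// Hypothetical sketch of the recently-evicted guard; names are illustrative.
const completedKeys = new Set<string>();
const recentlyEvicted = new Set<string>();

// Called by the skip-loop breaker when it gives up on a completion key.
function evictKey(key: string): void {
  completedKeys.delete(key);
  recentlyEvicted.add(key);
}

// Fallback path: normally re-persists a key when its artifact is on disk,
// but skips keys the breaker just evicted so the loop cannot re-form.
function persistIfArtifactExists(key: string, artifactExists: boolean): boolean {
  if (!artifactExists) return false;
  if (recentlyEvicted.has(key)) return false;
  completedKeys.add(key);
  return true;
}

// Stand-in for the cleanup performed on stopAuto.
function clearEvictions(): void {
  recentlyEvicted.clear();
}
```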
When a slice plan (S03-PLAN.md) was pre-created during roadmapping
but plan-slice never ran to generate per-task files (tasks/T01-PLAN.md),
deriveState returned 'executing' phase. execute-task then failed because
the task plan didn't exist, creating an infinite restart loop.
Fix: In deriveState, when the tasks directory exists but has zero .md
files and the slice plan references tasks, return 'planning' phase
instead of 'executing'. This causes plan-slice to dispatch and generate
the missing task plans.
Tests updated: 6 test files that create synthetic state fixtures now
include a stub task plan file so their 'executing' phase assertions
remain valid.
* fix: reduce CPU usage on long auto-mode sessions
Seven targeted fixes for compounding process/timer/I/O issues that cause
high CPU during multi-hour /gsd auto sessions:
1. Wrap idle watchdog and hard timeout async callbacks in try-catch to
prevent unhandled rejections from orphaning intervals
2. Cache nativeHasChanges fallback (10s TTL) to avoid spawning a new
git process every 15 seconds when native module is unavailable
3. Call clearUnitTimeout() before dispatchNextUnit() in all recovery
paths to prevent stale idle watchdog from firing alongside new timers
4. Add 10-second timeout to subagent worktree cleanup to prevent hangs
when git worktree remove blocks indefinitely
5. Prune dead bg-shell processes after each unit completion to free
retained output buffers (~500KB-1MB per dead process)
6. Throttle STATE.md rebuilds to at most once per 30 seconds (was every
unit completion at 100-400ms each)
7. Increase progress widget refresh interval from 5s to 15s to reduce
synchronous file I/O on the hot path
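
As an illustration of item 2, a TTL cache with an explicit reset hook might look like the sketch below. createTtlCache and its injected clock are assumptions made for this example, not the actual nativeHasChanges code.

```typescript
// Hypothetical sketch of a TTL cache around an expensive check.
function createTtlCache<T>(
  compute: () => T,
  ttlMs: number,
  now: () => number = Date.now,
) {
  let cached: { value: T; at: number } | null = null;
  return {
    // Returns the cached value while it is younger than ttlMs,
    // otherwise recomputes and stores a fresh entry.
    get(): T {
      const t = now();
      if (cached !== null && t - cached.at < ttlMs) return cached.value;
      const value = compute();
      cached = { value, at: t };
      return value;
    },
    // Lets tests (or callers that know the state changed) drop the entry.
    reset(): void {
      cached = null;
    },
  };
}
```

A reset hook of this kind is what lets a test dirty a repo and immediately observe the change without waiting out the TTL.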
* fix: reset nativeHasChanges cache in worktree test
The 10s TTL cache on nativeHasChanges was causing the worktree test
to return stale "no changes" when checking a freshly dirtied repo
within the cache window. Reset the cache before the dirty-repo
assertion so the test correctly detects new changes.
Conflicts arose because main added continueHereHandle cleanup and
buildSnapshotOpts (with continueHereFired) while the PR extracted
inline closeout code into closeoutUnit(). Resolution: use closeoutUnit()
with buildSnapshotOpts() to pass all fields including continueHereFired.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
resolveModelId narrowed the return type to { id, provider } which lost
the full Model<Api> shape needed by pi.setModel(). Using a generic <T>
preserves the actual type from the registry's getAvailable() call.
Cherry-picks the worktree-sync concept from #902, adapted to the flat
auto-*.ts naming convention with fixes:
- Uses gsdVersion (not syncedAt) for resource staleness detection (#804)
- Uses ESM imports (not require()) for node:fs/node:os
- Includes both sync directions (project→worktree and worktree→project)
- Includes escapeStaleWorktree and cleanStaleRuntimeUnits
auto.ts: 3,256 → 3,099 lines (-157 lines)
Break up the 3,975-line auto.ts into focused, single-concern modules:
- auto-budget.ts: Pure budget alert level and enforcement functions
- auto-tool-tracking.ts: In-flight tool call tracking for idle detection
- auto-observability.ts: Pre-dispatch observability validation and repair
- auto-unit-closeout.ts: Consolidated metrics/activity/memory closeout helper
- auto-direct-dispatch.ts: Manual /gsd dispatch phase command handling
- auto-timeout-recovery.ts: Idle and hard timeout recovery with escalation
- auto-model-selection.ts: Model routing, complexity classification, fallback chains
auto.ts retains orchestration (start/stop/pause, handleAgentEnd, dispatchNextUnit)
and drops from 3,975 to 3,256 lines (-719 lines, -18%).
All extractions are pure moves with re-exports — no behavior changes.
All 1,092 unit tests and 30 integration tests pass.
Fixes #882: npm install -g gsd-pi installing a broken version where
@gsd/pi-coding-agent cannot be resolved, causing ERR_MODULE_NOT_FOUND.
Root causes addressed:
1. On Windows without Developer Mode or admin rights, symlinkSync fails
even for NTFS junctions, leaving node_modules/@gsd/ empty and causing
a cryptic ERR_MODULE_NOT_FOUND instead of a usable error message.
2. If the npm "latest" dist-tag is stale (pointing to an old version that
predates the packages/ directory), users get the same failure.
Changes:
- src/loader.ts: after symlinking, validate @gsd/pi-coding-agent exists;
emit a clear actionable error with reinstall instructions instead of
letting Node throw ERR_MODULE_NOT_FOUND deep inside cli.js. Also adds
cpSync fallback when symlinkSync fails (Windows without elevated perms).
- scripts/link-workspace-packages.cjs: same cpSync fallback — ensures
postinstall succeeds on restricted Windows environments.
- scripts/validate-pack.js: verify @gsd/* packages are resolvable after
the isolated install test, and run `gsd -v` to confirm end-to-end
resolution before declaring the pack valid.
- .github/workflows/build-native.yml: add post-publish dist-tag
verification step that confirms npm dist-tags.latest matches the
published version for stable releases, catching stale-tag regressions
in CI before users encounter them.
When the headless child process crashes or errors out, auto-restart
with linearly increasing backoff (5s, 10s, 15s... up to 30s) instead of
exiting immediately. This enables overnight 'fire and forget' runs.
- --max-restarts N (default 3, 0 to disable): controls restart budget
- Only restarts on crashes (exit code !== 0), not on success or blocked
- SIGINT/SIGTERM bypasses restart (user intent to stop)
- Restart count shown in summary output
- Backoff prevents rapid crash loops from burning API credits
The inner loop function (runHeadlessOnce) returns exit status instead
of calling process.exit, letting the outer loop decide whether to
restart or terminate.
This is the first step toward the 'absolute autonomy' goal described
in #886 — process-level resilience for long-running sessions.
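
A rough sketch of the restart loop, assuming delays of 5s, 10s, 15s and so on, capped at 30s. ExitStatus, backoffMs, and runWithRestarts are hypothetical names; only runHeadlessOnce and --max-restarts come from the change itself, and how "blocked" is signalled is an assumption here.

```typescript
// Hypothetical sketch: the real headless runner may differ.
type ExitStatus = { code: number; interrupted: boolean };

// Delay before restart number `attempt` (1-based): 5s, 10s, 15s... capped at 30s.
function backoffMs(attempt: number): number {
  return Math.min(5_000 * attempt, 30_000);
}

async function runWithRestarts(
  runOnce: () => Promise<ExitStatus>, // stand-in for runHeadlessOnce
  maxRestarts = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<{ code: number; restarts: number }> {
  let restarts = 0;
  for (;;) {
    const status = await runOnce();
    // Assumption: a crash is a non-zero exit that was not caused by
    // SIGINT/SIGTERM; success and user interrupts end the loop.
    const crashed = status.code !== 0 && !status.interrupted;
    if (!crashed || restarts >= maxRestarts) {
      return { code: status.code, restarts };
    }
    restarts += 1;
    await sleep(backoffMs(restarts));
  }
}
```

Returning the status from the inner function, rather than exiting inside it, is what makes the outer loop the single place where the restart decision is taken.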
isTerminalNotification() used broad substring matching against
['complete', 'stopped', 'blocked']. Any notification containing these
words triggered early exit — including progress messages like:
'All slices are complete — nothing to discuss.'
'Override(s) resolved — rewrite-docs completed.'
'Skipped 5+ completed units. Yielding to UI before continuing.'
Fix: Replace substring matching with prefix matching against the actual
stop signals emitted by stopAuto():
'Auto-mode stopped...'
'Step-mode stopped...'
These are the ONLY notifications that indicate auto-mode has genuinely
terminated. All other notifications (slice completion, override
resolution, skip yielding) are progress events and must not trigger
exit.
Also tighten isBlockedNotification to match 'blocked:' (with colon)
instead of bare 'blocked' to avoid false positives from unrelated
messages.
Added 15 regression tests covering:
- All real terminal notification variants
- 6 false-positive cases from the issue report
- Non-notify event rejection
- Blocked detection with and without colon
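
The difference between substring and prefix matching can be sketched as below. These helpers take a plain message string and are named looksTerminal and looksBlocked to make clear they are illustrative; the real isTerminalNotification and isBlockedNotification operate on events and may apply extra checks.

```typescript
// Hypothetical sketch of prefix-based stop detection.
const TERMINAL_PREFIXES = ["Auto-mode stopped", "Step-mode stopped"] as const;

// Only messages that start with a real stop signal count as terminal.
function looksTerminal(message: string): boolean {
  return TERMINAL_PREFIXES.some((prefix) => message.startsWith(prefix));
}

// Requires the colon so words like "unblocked" do not match.
function looksBlocked(message: string): boolean {
  return message.toLowerCase().includes("blocked:");
}
```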
On Windows, process.cwd() returns backslash paths (C:\Users\name\...).
When these paths are injected into system prompts, worktree context
blocks, or tool results, the model copies them into bash commands.
Bash interprets backslashes as escape characters, silently stripping
them — producing invalid paths like 'C:Usersnamedevelopmentapp-name'.
This is not a regex hack — it's a proper cross-platform boundary:
- Filesystem operations (fs, path.join, spawn cwd) use native paths
unchanged. Node handles both separators correctly for I/O.
- LLM-visible text (prompts, tool results, extension messages) uses
toPosixPath() to normalize to forward slashes. C:/Users/name/...
is valid in Git Bash, WSL bash, PowerShell, and Node.js.
Changes:
- utils/path-display.ts: New toPosixPath() utility in pi-coding-agent
package (for system prompt) and shared extension module (for
extensions that can't import from the compiled package at dev time)
- system-prompt.ts: Normalize resolvedCwd before injecting into the
'Current working directory' line
- gsd/index.ts: Normalize all process.cwd() and originalBase paths in
worktree context blocks injected into the system prompt
- bg-shell/index.ts: Normalize cwd in tool result text (start, env
actions) that the model reads and may reference in commands
- path-display.test.ts: 9 regression tests covering toPosixPath
behavior and system prompt output verification. Includes a scanner
that fails if any Windows absolute paths with backslashes appear in
buildSystemPrompt() output.
Audit scope: Checked all process.cwd() usage across pi-coding-agent
and all bundled extensions. Filesystem-only paths (join, readFile,
spawn cwd, existsSync) are correct and left unchanged. Only paths
entering LLM text are normalized.
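
For reference, a normalizer of this kind can be as small as the sketch below. This is an assumed shape for toPosixPath(); the version in utils/path-display.ts may handle more cases.

```typescript
// Hypothetical sketch: converts backslashes to forward slashes for
// LLM-visible text only. Filesystem calls keep using the native path.
function toPosixPath(p: string): string {
  return p.replace(/\\/g, "/");
}

// Example: build a prompt line from a Windows-style cwd.
const nativeCwd = "C:\\Users\\name\\development\\app-name";
const promptLine = `Current working directory: ${toPosixPath(nativeCwd)}`;
```

POSIX paths pass through unchanged, so the same call is safe on every platform.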
When the LLM writes freeform prose roadmaps with `## Slice S01: Title`
headers instead of the machine-readable `## Slices` checklist,
parseRoadmapSlices() returned zero slices, causing deriveState() to
permanently block with 'No slice eligible'.
Add a fallback parser that detects prose-style `## Slice SXX:` headers
(and variants like `## S01:`, `## S01 —`) and extracts slice IDs,
titles, and dependencies from the prose. Also parses `Depends on:`
text patterns. All fallback slices default to risk:medium and done:false.
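
One way such a fallback parser could work is sketched below. The FallbackSlice shape, the regexes, and the parseProseSlices name are assumptions for illustration; the real fallback inside parseRoadmapSlices() may accept more header variants.

```typescript
// Hypothetical sketch of a prose-header fallback parser.
interface FallbackSlice {
  id: string;
  title: string;
  depends: string[];
  risk: "medium";
  done: false;
}

function parseProseSlices(markdown: string): FallbackSlice[] {
  // Matches "## Slice S01: Title", "## S01: Title", and "## S01 — Title".
  const header = /^##\s+(?:Slice\s+)?(S\d+)\s*(?::|—|-)\s*(.+)$/;
  const dependsOn = /Depends on:\s*(.*)$/i;
  const slices: FallbackSlice[] = [];
  let current: FallbackSlice | undefined;

  for (const line of markdown.split(/\r?\n/)) {
    const h = header.exec(line);
    if (h) {
      current = { id: h[1], title: h[2].trim(), depends: [], risk: "medium", done: false };
      slices.push(current);
      continue;
    }
    const d = dependsOn.exec(line);
    if (d && current) {
      // Collect every slice ID mentioned after "Depends on:".
      current.depends = [...(d[1].match(/S\d+/g) ?? [])];
    }
  }
  return slices;
}
```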
When merging a milestone back to main, `git checkout main` fails if
untracked .gsd/ state files (STATE.md, completed-units.json, auto.lock)
in the working tree conflict with tracked files on the branch.
Remove these known GSD-managed state files before checkout. They are
runtime artifacts regenerated by doctor/rebuildState and are not
meaningful in the main working tree — the worktree had the real state.
Worktree initialization only copied DECISIONS.md, REQUIREMENTS.md,
PROJECT.md, and QUEUE.md. The missing STATE.md caused the pre-dispatch
health check in doctor-proactive.ts to block dispatch with
'STATE.md missing'.
Add STATE.md, KNOWLEDGE.md, and OVERRIDES.md to the copy list so
worktrees start with complete planning state.
Every new pi session writes a fresh syncedAt timestamp to
managed-resources.json, causing a running auto-mode session to falsely
detect a GSD update and stop. The actual version (gsdVersion) only
changes on real upgrades.
Switch the staleness check from syncedAt (timestamp) to gsdVersion
(semver string) so that launching a second session no longer triggers
a false positive.
When multiple tool calls (e.g. concurrent gsd_save_decision) target the
same markdown file, the deterministic .tmp suffix caused ENOENT on
rename() because one caller consumed the temp file before another could
rename it.
Replace the static `.tmp` suffix with a per-call random suffix so each
concurrent writer gets its own temp file. Also clean up orphaned temp
files on rename failure.
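
A minimal sketch of the per-call temp file approach, assuming a synchronous write-then-rename helper. tempPathFor and atomicWriteSync are hypothetical names, and the suffix format is an assumption.

```typescript
import { randomBytes } from "node:crypto";
import { renameSync, rmSync, writeFileSync } from "node:fs";

// Hypothetical helper: each call gets its own temp path, so concurrent
// writers targeting the same file never share a temp file.
function tempPathFor(targetPath: string): string {
  return `${targetPath}.${randomBytes(6).toString("hex")}.tmp`;
}

// Write to a unique temp file, then rename it over the target.
function atomicWriteSync(targetPath: string, content: string): void {
  const tmpPath = tempPathFor(targetPath);
  try {
    writeFileSync(tmpPath, content, "utf8");
    renameSync(tmpPath, targetPath);
  } catch (err) {
    // Remove the orphaned temp file if the write or rename failed.
    rmSync(tmpPath, { force: true });
    throw err;
  }
}
```

With unique temp paths, the last rename wins on the target, but no writer can consume another writer's temp file.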
The .planning → .gsd migration creates roadmaps and summaries but not
VALIDATION files. deriveState() requires a terminal validation file
(verdict: pass) to consider a milestone complete. Without it, every
migrated milestone enters validating-milestone phase, blocking progress
to the actual current milestone.
For milestones where all slices are done, write a pass-through
VALIDATION.md (verdict: pass, migrated: true) and SUMMARY.md so
deriveState() skips them correctly.
Updated integration test to verify VALIDATION/SUMMARY files are written
and deriveState returns 'complete' phase with activeMilestone pointing
to the last completed entry (expected behavior).
When unique_milestone_ids is enabled, the LLM cannot generate random
suffixes itself. Previously only the first milestone got a correct ID
(pre-generated in TS), while subsequent milestones in multi-milestone
projects got bare M002/M003 without suffixes.
Added a gsd_generate_milestone_id tool that the LLM calls to get each
milestone ID. The tool scans disk for existing milestones and respects
the unique_milestone_ids preference, making it impossible to produce
wrong-format IDs.
Updated discuss, discuss-headless, and queue prompts to instruct the
LLM to use the tool instead of inventing milestone IDs.
* feat: worker NDJSON monitoring, budget enforcement, PID-based stop fallback
Closes three gaps in parallel orchestration:
1. **Worker stdout monitoring** — Workers now run with `--mode json` so
they emit NDJSON events. The coordinator parses stdout line-by-line,
extracting cost/token data from `message_end` events. This keeps
per-worker cost tracking in sync with actual API spend and updates
session status files for live dashboard visibility.
2. **Budget enforcement before spawn** — `startParallel()` now checks
`isBudgetExceeded()` before each worker spawn. When the aggregate
cost across all workers reaches the configured ceiling, no new
workers are started.
3. **PID-based stop fallback** — `stopParallel()` now falls back to
`process.kill(pid, "SIGTERM")` when the ChildProcess handle is null
(e.g., after coordinator restart when handles aren't available).
Previously, orphaned workers could not be stopped.
Includes 11 new tests covering NDJSON format validation, cost
aggregation, budget ceiling comparison, and PID-based kill patterns.
All 54 existing parallel-orchestration tests still pass.
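
The cost-tracking and ceiling check can be illustrated roughly as follows. The `cost` field on `message_end` events and both function signatures are assumptions for this sketch; the actual event schema and the real `isBudgetExceeded()` may differ.

```typescript
// Hypothetical sketch: sum cost from NDJSON worker output.
function sumCostFromNdjson(chunk: string): number {
  let total = 0;
  for (const line of chunk.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    let event: unknown;
    try {
      event = JSON.parse(trimmed);
    } catch {
      continue; // Ignore non-JSON noise on stdout.
    }
    if (typeof event !== "object" || event === null) continue;
    const record = event as { type?: unknown; cost?: unknown };
    // Assumption: message_end events carry a numeric `cost` field.
    if (record.type === "message_end" && typeof record.cost === "number" && Number.isFinite(record.cost)) {
      total += record.cost;
    }
  }
  return total;
}

// No ceiling means unlimited; otherwise stop once the total reaches it.
function budgetExceeded(totalCost: number, ceiling: number | undefined): boolean {
  return ceiling !== undefined && totalCost >= ceiling;
}
```

A coordinator would call a check like this before each spawn, so a ceiling reached mid-run blocks new workers without killing running ones.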
Relates to #672
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: currentUnit type must match SessionStatus interface (object | null, not string)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: render native web search tool calls in TUI
The Anthropic streaming parser silently dropped server_tool_use and
web_search_tool_result content blocks, making native web search
invisible. Add ServerToolUseContent and WebSearchResultContent types,
handle both block types in the streaming parser and conversation replay,
and render them as ToolExecutionComponent in the interactive TUI.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add PREFER_BRAVE_SEARCH env var to bypass native web search
Set PREFER_BRAVE_SEARCH=1 to keep Brave/custom search tools active
on Anthropic models instead of injecting native server-side web search.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: skip non-toolCall blocks in Mistral provider conversation replay
The ServerToolUseContent and WebSearchResultContent types added for
native web search don't have id/name/arguments properties, causing
TypeScript errors when the Mistral provider tried to push them as
tool calls.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>