Downgrade messages from internal recovery machinery to info or verbose-only
so users only see warnings when action is needed:
- "Dispatch gap detected" → verbose-only info (recovery is automatic)
- "Model not found, trying fallback" → verbose-only info
- "Failed to set model, trying fallback" → verbose-only info
- "Could not set any preferred model" → deleted (redundant)
- "New session cancelled" → info (user action, not error)
- "Unexpected phase" → info with doctor suggestion
- "No command context" → info with restart suggestion
Kept as warnings (user-actionable):
- Budget ceiling, blockers, prior slice incomplete, pre-flight,
no context, stub summary, model ambiguity, all fallbacks exhausted
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(auto): prevent infinite re-dispatch when completion key is missing
Root cause: When a task completed successfully on the first attempt,
the idempotency key was never persisted to completed-units.json.
The persistence logic (persistCompletedKey) only triggered at the
retry threshold (MAX_UNIT_DISPATCHES=3). After session restart, the
key was missing and auto-mode re-dispatched the same task endlessly.
Evidence: M008/S01/T01 was dispatched 15+ times over 3.5 hours.
T01-SUMMARY.md existed, S01-PLAN.md marked T01 as [x], but
completed-units.json had no execute-task/M008/S01/T01 entry.
Fix: Added fallback artifact check before dispatch. If the expected
artifact already exists on disk but the completion key is missing,
the key is repaired (persisted + added to in-memory set) and the
unit is skipped. This catches the gap between the closeout-based
persistence (which requires the NEXT dispatch to fire) and the
retry-threshold persistence (which requires MAX attempts).
Also fixes guided-flow-escape.test.ts: added missing cache
invalidation after rmSync (clearPathCache + invalidateStateCache).
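
The artifact fallback described above can be sketched roughly as follows. This is a minimal illustration, not the actual auto-mode code: `shouldSkipUnit`, `completedKeys`, and the file layout are hypothetical names chosen for the example, and the real persistence goes through persistCompletedKey.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical in-memory idempotency set mirroring completed-units.json.
const completedKeys = new Set<string>();

// Returns true when the unit should be skipped. If the artifact exists on disk
// but the key is missing, the key is repaired first (persisted + added to the set).
function shouldSkipUnit(key: string, artifactPath: string, completedFile: string): boolean {
  if (completedKeys.has(key)) return true; // normal idempotency skip
  if (!fs.existsSync(artifactPath)) return false; // not done yet: dispatch it
  completedKeys.add(key);
  fs.writeFileSync(completedFile, JSON.stringify([...completedKeys], null, 2));
  return true;
}

// Usage sketch against a throwaway directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "gsd-demo-"));
const summaryPath = path.join(demoDir, "T01-SUMMARY.md");
const completedFile = path.join(demoDir, "completed-units.json");
const unitKey = "execute-task/M008/S01/T01";

const skipBefore = shouldSkipUnit(unitKey, summaryPath, completedFile); // no artifact yet
fs.writeFileSync(summaryPath, "# T01 summary\n");
const skipAfter = shouldSkipUnit(unitKey, summaryPath, completedFile); // artifact found, key repaired
```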
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(auto): prevent TUI freeze on cascading skip-dispatches
When multiple completed tasks are skipped in sequence (T01 artifact
fallback → T02 idempotency skip → T03 dispatch), the recursive
dispatchNextUnit calls can freeze the TUI.
Fix: invalidateStateCache() after key repair so deriveState returns
the correct next task, and use setTimeout(50ms) instead of
setImmediate to yield more generously to the event loop between
cascading skips.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(auto): systematic hardening of dispatch recovery pipeline
Five fixes addressing the 20 failure modes identified in the auto-mode
dispatch loop audit:
1. Stale runtime record cleanup: selfHealRuntimeRecords now clears
records older than 1h with phase=dispatched (crash orphans), and
also persists completion keys for records with existing artifacts.
2. Recursion depth limit: _skipDepth counter prevents TUI freeze when
many completed units are skipped in cascade. After MAX_SKIP_DEPTH
(20) skips, yields 200ms to the event loop before continuing.
3. Atomic completed-units.json writes: persistCompletedKey now uses
tmp file + renameSync to prevent partial writes on crash.
4. Skip depth tracking on both skip paths (idempotency check at L1815
and artifact fallback at L1844) with setTimeout(50ms) between skips.
5. Self-heal now also repairs missing completion keys when artifact
exists, closing the gap where crash between completion and closeout
leaves the key unwritten.
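
For reference, the tmp file + renameSync pattern from item 3 looks roughly like this. It is a sketch under assumptions: `writeJsonAtomic` and the temp-file naming are illustrative, not the real persistCompletedKey.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Write to a temp file in the same directory, then rename it over the target.
// A crash mid-write leaves the old file intact instead of a truncated one.
function writeJsonAtomic(target: string, data: unknown): void {
  const tmp = `${target}.${process.pid}.tmp`; // hypothetical naming scheme
  fs.writeFileSync(tmp, JSON.stringify(data, null, 2));
  fs.renameSync(tmp, target);
}

// Usage sketch against a throwaway directory.
const atomicDir = fs.mkdtempSync(path.join(os.tmpdir(), "gsd-atomic-"));
const atomicTarget = path.join(atomicDir, "completed-units.json");
writeJsonAtomic(atomicTarget, ["execute-task/M008/S01/T01"]);
```

The temp file sits next to the target so that the rename stays on one filesystem.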
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(auto): add reentrancy guard to dispatchNextUnit itself
The _handlingAgentEnd boolean only guards calls from agent_end hooks.
Direct calls from watchdog timers, step wizard, and crash recovery
can still race with an in-progress dispatch. Added _dispatching guard
that blocks concurrent external calls while allowing recursive skip
calls (_skipDepth > 0). Cleared on stopAuto.
Audit confirmed: double watchdog (#11) already prevented by existing
clearDispatchGapWatchdog in startDispatchGapWatchdog + catch/return.
Counter cleanup (#16) already handled by unitDispatchCount.clear()
in startAuto before selfHealRuntimeRecords.
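
A simplified, synchronous sketch of the guard's behavior, assuming hypothetical names (`Unit`, `events`, the `onDispatch` callback); the real dispatchNextUnit is asynchronous and does much more:

```typescript
type Unit = { id: string; done: boolean };

let dispatching = false;
let skipDepth = 0;
const events: string[] = [];

function dispatchNextUnit(units: Unit[], index: number, onDispatch?: () => void): void {
  // Block concurrent external callers, but let recursive skip calls through.
  if (dispatching && skipDepth === 0) {
    events.push("blocked");
    return;
  }
  const outermost = !dispatching;
  dispatching = true;
  try {
    const unit = units[index];
    if (unit === undefined) return;
    if (unit.done) {
      skipDepth++;
      try {
        dispatchNextUnit(units, index + 1, onDispatch); // recursive skip
      } finally {
        skipDepth--;
      }
      return;
    }
    events.push(`dispatch ${unit.id}`);
    onDispatch?.(); // stands in for a watchdog firing mid-dispatch
  } finally {
    if (outermost) dispatching = false;
  }
}

// Cascading skips pass the guard and reach the first incomplete unit.
dispatchNextUnit(
  [{ id: "T01", done: true }, { id: "T02", done: true }, { id: "T03", done: false }],
  0,
);
// A re-entrant external call during a dispatch is rejected.
const single: Unit[] = [{ id: "T04", done: false }];
dispatchNextUnit(single, 0, () => dispatchNextUnit(single, 0));
```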
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(auto): final hardening for unattended multi-milestone runs
Three fixes from paranoid stress-test audit:
1. Git index.lock cleanup: Remove stale .git/index.lock (>60s old) at
auto-start. A crash during git commit/merge leaves this file behind,
blocking ALL subsequent git operations with no recovery.
2. Stub summary for complete-milestone: If the LLM fails to write a
milestone SUMMARY after MAX_UNIT_DISPATCHES attempts, generate a
stub summary to unblock the pipeline. Without this, auto-mode
loops forever in "completing-milestone" phase.
3. Pre-flight queue validation: At auto-start with multiple milestones,
scan for CONTEXT-DRAFT.md files (will pause for discussion) and
report milestone count. Gives the user early visibility into what
will happen during the run.
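
The stale index.lock cleanup from item 1 could look something like the sketch below. `removeStaleIndexLock` and the injectable `now` parameter are assumptions made for the example; only the 60s threshold comes from the commit.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

const STALE_LOCK_MS = 60_000; // locks older than 60s are treated as crash leftovers

// Returns true if a stale lock was removed. `now` is injectable for testing.
function removeStaleIndexLock(repoRoot: string, now: number = Date.now()): boolean {
  const lockPath = path.join(repoRoot, ".git", "index.lock");
  let stat: fs.Stats;
  try {
    stat = fs.statSync(lockPath);
  } catch {
    return false; // no lock file present
  }
  if (now - stat.mtimeMs <= STALE_LOCK_MS) return false; // may belong to a live git process
  fs.rmSync(lockPath, { force: true });
  return true;
}

// Usage sketch against a throwaway directory with a fake lock file.
const fakeRepo = fs.mkdtempSync(path.join(os.tmpdir(), "gsd-lock-"));
fs.mkdirSync(path.join(fakeRepo, ".git"));
const fakeLock = path.join(fakeRepo, ".git", "index.lock");
fs.writeFileSync(fakeLock, "");
const removedFresh = removeStaleIndexLock(fakeRepo); // lock was just created: kept
const removedStale = removeStaleIndexLock(fakeRepo, Date.now() + 61_000); // treated as stale
```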
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: deseltrus <simulacraverse@protonmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(auto): prevent nested worktree creation inside existing worktrees
When auto-mode started inside a manual worktree (e.g., /worktree memory-db),
it unconditionally created an auto-worktree for the milestone, nesting
.gsd/worktrees/M001 inside the existing worktree. This caused GSD to
chdir into the inner worktree, read state from the wrong repo, and
report "All milestones complete" or loop on artifact verification.
Add detectWorktreeName() guard to both the start and resume paths:
if already inside a worktree, skip auto-worktree creation and work
directly on the current branch.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
The multi-milestone discussion flow writes CONTEXT.md files for each
milestone but never adds depends_on YAML frontmatter. The QUEUE.md
documents the dependency chain, but the auto-mode state machine reads
dependencies from CONTEXT.md frontmatter only — not from QUEUE.md.
Without frontmatter, milestones execute in filesystem order regardless
of their actual dependency chain, causing out-of-order execution.
Fix: Added MANDATORY depends_on documentation to both discuss.md
(Phase 2, after primary milestone) and queue.md (output section).
Instructs the LLM to write frontmatter with the exact milestone IDs
from the dependency chain confirmed during the milestone split gate.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: TÂCHES <afromanguy@me.com>
* docs: add startup performance analysis and optimization plan
Profiled GSD CLI startup: 2.2s for --version and ~3.8s for
interactive mode. Identified 5 root causes with measured timings and
created a phased optimization plan targeting <0.2s for --version
and ~0.8s for interactive startup.
* perf: speed up GSD startup with lazy loading and fast paths
- Fast-path --version/-v and --help/-h in loader.ts before importing
any heavy dependencies (2.2s → 0.15s, 14x faster)
- Lazy-load undici (~200ms) only when HTTP_PROXY env vars are set
- Skip initResources cpSync when managed-resources.json version
matches current GSD version (~128ms saved per launch)
- Lazy-load Mistral SDK (~369ms) on first API call instead of startup
- Lazy-load Google GenAI SDK (~186ms) on first API call instead of
startup
- Parallelize extension loading with Promise.all() instead of
sequential for-loop
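
The two main techniques above, the fast path and lazy loading, can be sketched as follows. All names here (`fastPath`, `lazy`, `VERSION`, the usage string) are hypothetical, and node:zlib merely stands in for a heavy SDK; loader.ts itself may be structured differently.

```typescript
// Hypothetical placeholder; a real loader would read this from package metadata.
const VERSION = "0.0.0-example";

// Answer cheap flags before any heavy module is imported.
// Assumption: only the first argument is inspected.
function fastPath(argv: string[]): string | null {
  const flag = argv[0];
  if (flag === "--version" || flag === "-v") return VERSION;
  if (flag === "--help" || flag === "-h") return "usage: gsd [command] [options]";
  return null; // fall through to the full startup path
}

// Wrap a loader so the underlying import runs at most once, on first use.
function lazy<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) cached = load();
    return cached;
  };
}

// Stand-in for a heavy SDK: nothing is imported until loadHeavy() is called.
const loadHeavy = lazy(() => import("node:zlib"));

// Demonstrates that repeated calls share one load.
let loadCount = 0;
const loadCounted = lazy(async () => {
  loadCount += 1;
  return "sdk";
});
const firstCall = loadCounted();
const secondCall = loadCounted();
```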
---------
Co-authored-by: TÂCHES <afromanguy@me.com>
Eliminate slice branches — all work commits sequentially on milestone/<MID>
within auto-mode worktrees. No branch creation, switching, or merging
within a worktree. Planning artifacts (.gsd/milestones/) are now tracked in git
directly instead of being blanket-gitignored and then force-added.
Removes ~2,600 lines: ensureSliceBranch, switchToMain, mergeSliceToMain,
mergeSliceToMilestone, shouldUseWorktreeIsolation, getMergeToMainMode,
withMergeHeal, recoverCheckout, fix-merge dispatch/labels, and associated
tests. Adds a legacy_slice_branches doctor check and deprecation warnings for
the git.isolation and git.merge_to_main preferences.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add `gsd auto --debug` and `gsd next --debug` flags (or the GSD_DEBUG=1
env var) that write structured JSONL debug logs to `.gsd/debug/`.
Instrumentation points:
- deriveState() timing and result phase
- parseRoadmap/parsePlan timing with native flag
- TTSR checkDelta timing, buffer size, and peak tracking
- Unit dispatch cycle/lifetime counts
- Context injection sizing
- Debug summary with aggregated stats on stop
The logger is zero-overhead when disabled — all functions check a
boolean and return immediately. Auto-prunes to 5 most recent logs.
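
A minimal sketch of the disabled-by-default pattern and the pruning rule, assuming hypothetical names (`debugLog`, `debugTime`, `logsToPrune`) and an in-memory buffer in place of the JSONL file:

```typescript
let debugEnabled = false;
const debugBuffer: string[] = []; // stands in for the JSONL file

function enableDebug(): void {
  debugEnabled = true;
}

function debugLog(event: string, data?: Record<string, unknown>): void {
  if (!debugEnabled) return; // disabled: a single boolean check, nothing else runs
  debugBuffer.push(JSON.stringify({ ts: Date.now(), event, ...data }));
}

// Time a synchronous function and record the duration when debugging is on.
function debugTime<T>(event: string, fn: () => T): T {
  if (!debugEnabled) return fn();
  const start = Date.now();
  try {
    return fn();
  } finally {
    debugLog(event, { ms: Date.now() - start });
  }
}

// Given log file names, return the ones to delete so that `keep` remain.
// Assumption: names sort chronologically (e.g. timestamped).
function logsToPrune(fileNames: string[], keep = 5): string[] {
  const sorted = [...fileNames].sort();
  return sorted.slice(0, Math.max(0, sorted.length - keep));
}
```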
Replace [\s\S]*? regex patterns with indexOf-based string parsing in
boundary map, preferences, and skill-discovery frontmatter parsers to
eliminate catastrophic backtracking on content containing code fences.
Add 50ms throttle to TTSR JS-fallback regex path to prevent CPU spinning
when token deltas arrive faster than regex evaluation on growing buffers.
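
To illustrate the indexOf-based approach, here is a sketch of frontmatter extraction that scans for the closing fence instead of using a lazy `[\s\S]*?` pattern. `extractFrontmatter` is a hypothetical helper, not one of the parsers touched by this commit.

```typescript
// Returns the text between the opening and closing "---" fences, or null.
// Each indexOf call only moves forward, so there is no regex backtracking.
function extractFrontmatter(content: string): string | null {
  // Accept LF or CRLF after the opening fence.
  const open = content.startsWith("---\r\n") ? 5 : content.startsWith("---\n") ? 4 : -1;
  if (open === -1) return null;
  let from = open - 1; // allows an immediately following closing fence
  while (true) {
    const idx = content.indexOf("\n---", from);
    if (idx === -1) return null; // unclosed frontmatter
    const after = content.charAt(idx + 4);
    // The fence must be alone on its line ("----" or "---foo" do not count).
    if (after === "" || after === "\n" || after === "\r") {
      return content.slice(open, Math.max(open, idx)).replace(/\r$/, "");
    }
    from = idx + 1;
  }
}
```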
Closes #468
Hooks were dispatched (runtime record created with phase="dispatched") but
never properly tracked through completion. Four issues fixed:
1. Hook runtime records now finalized: handleAgentEnd writes phase="finalized"
and clears the record when a hook completes. Previously records stayed at
"dispatched" forever because verifyExpectedArtifact returned false for
hook types.
2. Supervision timer for hooks: hook dispatch now sets a hard timeout so
stuck hooks don't hang auto-mode indefinitely.
3. Hook retry removes completion key: when a hook requests retry via
retry_on, the trigger unit's completion key is removed from the
idempotency set so dispatchNextUnit will re-dispatch it.
4. Hook closeout in dispatchNextUnit: hook units are properly closed out
(pushed to completedUnits, runtime cleared) without polluting the
idempotency set. verifyExpectedArtifact returns true for hook/ types.
Fixes #140 (comment 4063396798)
Fix parsing issues that prevented OpenRouter model preferences from
being correctly picked up during auto-mode dispatching:
- Array items with colons (e.g. qwen/qwen3-coder:free) were incorrectly
parsed as objects instead of strings. Now only items matching a valid
key-value pattern (key:value where key is [A-Za-z0-9_]+) are treated
as structured objects.
- Inline YAML comments (# after whitespace) were included in parsed
values, causing model ID lookups to fail silently.
- Frontmatter regex now handles Windows CRLF line endings.
- GSDPreferences.models type updated from GSDModelConfig (legacy
string-only) to GSDModelConfig | GSDModelConfigV2 to match actual
runtime usage with extended object format.
- Explicit comment-line skipping in the parser loop for clarity.
- Added comprehensive test suite covering OpenRouter-style org/model IDs,
colon variants, inline comments, CRLF, and mixed format configs.
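
The array-item and inline-comment rules above can be sketched like this. `parseArrayItem` and `stripInlineComment` are hypothetical names, and the exact whitespace handling after the colon is an assumption; quoted values containing " #" are not handled in this simplification.

```typescript
// Only items that start with a plain identifier followed by a colon count as
// key-value pairs; "qwen/qwen3-coder:free" does not, because of "/" and "-".
const KEY_VALUE = /^([A-Za-z0-9_]+):\s*(.*)$/;

// Drop an inline comment: "#" preceded by whitespace.
function stripInlineComment(value: string): string {
  const idx = value.search(/\s#/);
  return (idx === -1 ? value : value.slice(0, idx)).trim();
}

type ArrayItem = string | Record<string, string>;

function parseArrayItem(raw: string): ArrayItem {
  const text = stripInlineComment(raw);
  const match = KEY_VALUE.exec(text);
  if (match === null) return text; // plain string, colons included
  const [, key = "", value = ""] = match;
  return { [key]: value };
}
```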
A plan file with zero tasks caused `find(t => !t.done)` to return
undefined, which was treated as "all tasks done" → summarizing phase.
Now requires `tasks.length > 0` before entering summarizing.
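
A small sketch of the guard, with hypothetical types and phase names (what phase an empty plan should map to is an assumption; "planning" is used here only for illustration):

```typescript
type Task = { id: string; done: boolean };
type Phase = "planning" | "executing" | "summarizing";

function derivePhase(tasks: Task[]): Phase {
  // An empty plan has no undone task, but that does not mean all tasks are done.
  if (tasks.length === 0) return "planning";
  const next = tasks.find((t) => !t.done);
  return next === undefined ? "summarizing" : "executing";
}
```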
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Git worktree path resolution on Windows CI uses UNC/8.3 temp dir forms
that don't survive normalization for path matching. The underlying source
logic works correctly (tested on macOS/Linux); these tests exercise git
worktree infrastructure that has inherent platform differences in path
representation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three fixes for Windows CI failures:
1. Remove single quotes from git branch --list glob patterns. On Windows,
cmd.exe passes single quotes literally to git, preventing glob expansion.
Affects shouldUseWorktreeIsolation() and stale branch detection.
2. Extend listWorktrees() branch-name fallback to cover milestone/* branches,
not just worktree/* branches. On Windows, path normalization can prevent
path-based worktree matching; the branch-name fallback is the safety net.
3. Use path.sep instead of hardcoded "/" in CWD prefix checks (doctor.ts
orphan fix guard, worktree-manager.ts removeWorktree guard).
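
The path.sep change in item 3 amounts to something like the following prefix check. `isInside` is a hypothetical helper; the separator is a parameter here only so both platforms can be exercised in one place.

```typescript
import * as path from "node:path";

// True when `child` equals `parent` or lies underneath it.
// Appending the separator avoids matching siblings such as "/repo-other".
function isInside(child: string, parent: string, sep: string = path.sep): boolean {
  const base = parent.endsWith(sep) ? parent : parent + sep;
  return child === parent || child.startsWith(base);
}
```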
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Shell string interpolation of multi-line commit messages breaks on
Windows — the closing quote gets consumed mid-message, causing the
branch name suffix to be parsed as a second argument to git merge
(producing "fatal: No remote for the current branch").
Switch to execFileSync with argument arrays for merge, commit, and
add commands that include user-generated content.
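
A sketch of the argument-array approach, assuming hypothetical helper names (`commitArgs`, `mergeArgs`, `git`); the flags actually used by the merge and commit call sites may differ:

```typescript
import { execFileSync } from "node:child_process";

// Each argument is passed to git as-is; no shell parses the message,
// so quotes and newlines inside it cannot split or merge arguments.
function commitArgs(message: string): string[] {
  return ["commit", "-m", message];
}

function mergeArgs(branch: string, message: string): string[] {
  return ["merge", "-m", message, branch];
}

function git(cwd: string, args: string[]): string {
  return execFileSync("git", args, { cwd, encoding: "utf8" });
}
```

A call would then look like `git(repoDir, commitArgs(message))`, where `repoDir` is whatever working directory the caller already has.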
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace single-quoted git commit messages with double quotes
- Replace bash redirect syntax with cross-platform alternatives
- Add git branch -M main to git-self-heal test setup for consistent branch naming
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- cli.ts: warn on stderr when terminal is narrower than 40 columns
- dashboard-overlay.ts: add disposed flag checked before scheduling
refreshes and after async loadData() to prevent rendering on a
stopped TUI
- cli.ts: use chalk.yellow/dim/bold instead of raw \x1b sequences for
version mismatch message; chalk v5 auto-respects NO_COLOR
- update-check.ts: same chalk migration for the update banner
- guided-flow.ts: log auto-start errors when GSD_DEBUG is set instead
of silently swallowing them
Kills two independent failure paths causing the recurring dispatch loop bug:
Path B: dispatchNextUnit() called clearPathCache() but not clearParseCache(),
allowing stale parsed roadmap data (with [ ] instead of [x]) to persist
through the doctor→dispatch transition.
Path A: handleAgentEnd() never verified whether the just-completed unit
produced its expected artifact before re-entering the dispatch loop.
Now persists completion key after verification, so the idempotency
check in dispatchNextUnit() skips already-completed units.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>