Two compounding bugs caused auto-mode to loop infinitely after stopping
and restarting when a worktree with committed progress existed:
Bug 1: copyPlanningArtifacts overwrites worktree state on restart
When auto-mode restarts and the milestone branch exists (worktree dir was
removed but branch preserved), createAutoWorktree re-attaches the worktree
to the existing branch — git correctly checks out the committed state with
[x] checkboxes. But then copyPlanningArtifacts unconditionally copies the
project root's .gsd/milestones/ into the worktree, overwriting the correct
[x] with stale [ ] from the root (which isn't always fully synced).
Fix: Skip copyPlanningArtifacts when branchExists is true. The branch
checkout already has the correct artifacts from committed work.
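The guard can be sketched roughly as follows. This is a minimal illustration, not the actual createAutoWorktree code: the `WorktreeDeps` interface, `setUpAutoWorktree`, and the injected `attachWorktree` helper are hypothetical names used only to keep the example self-contained.

```typescript
// Hypothetical dependencies, injected so the sketch runs without git.
interface WorktreeDeps {
  branchExists: (branch: string) => boolean;
  attachWorktree: (branch: string) => void;
  copyPlanningArtifacts: () => void;
}

// Assumed shape of the restart path: copy planning artifacts only when the
// milestone branch is brand new.
function setUpAutoWorktree(
  branch: string,
  deps: WorktreeDeps,
): "reattached" | "created" {
  const existed = deps.branchExists(branch);
  deps.attachWorktree(branch);
  if (existed) {
    // The checkout already holds the committed [x] state; copying the
    // project root's artifacts here would overwrite it with stale [ ].
    return "reattached";
  }
  deps.copyPlanningArtifacts();
  return "created";
}
```

With this shape, a restart against an existing branch never touches the copy step, so committed checkbox state survives.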
Bug 2: deriveState reads stale content from SQLite DB
deriveState had a DB-first content loading path that read artifact content
from the SQLite artifacts table. This table was populated once during
migrateFromMarkdown and never updated when files changed on disk (roadmap
checkbox updates, plan changes, etc.). Even after fixing files on disk,
deriveState returned stale DB content, keeping the state machine stuck.
Fix: Remove the DB content loading path from deriveState entirely. The
native Rust batch parser (nativeBatchParseGsdFiles) reads all .md files
in one call and is fast enough. The DB is still used for structured queries
(decisions, requirements) but no longer as a content cache for state
derivation.
Updated derive-state-db.test.ts Test 5 to write requirements to disk
instead of testing the now-removed DB-only content path.
When mergeMilestoneToMain runs from a worktree context, main is already
checked out at the project root. The unconditional git checkout main
fails with "already used by worktree" because git refuses to check out a
branch that is already checked out in another worktree.
Skip the checkout when the integration branch is already current at the
project root, which is always the case in worktree-mode merges.
Resolve conflicts between #699 (empty scaffold rejection) and #739
(task plan file verification) in auto-dispatch.ts imports and
auto-recovery.test.ts tests.
- auto-dispatch.ts: merged imports from both branches (resolveTaskFile
from #739, resolveMilestonePath/buildMilestoneFileName from main)
- auto-recovery.test.ts: included all tests from both #699 (empty
scaffold, actual tasks, completed tasks) and #739 (all task plans
exist, missing task plan, no tasks). Updated #699 tests to create
task plan files alongside slice plans to satisfy #739's verification.
Updated #739 "no tasks" test to expect false per #699's requirement
that plans must have task entries.
- auto-recovery.ts: auto-merged cleanly, both checks coexist
All 26 recovery tests pass. Full build clean.
Add a `validating-milestone` phase that runs BEFORE `completing-milestone`
to reconcile planned work against delivered work. The validator checks
success criteria, slice deliverables, cross-slice integration, and
requirement coverage before allowing milestone completion.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- bg-shell/types: add compiled union regexes (ERROR/WARNING/READINESS/BUILD/TEST)
built once at module load; add LINE_DEDUP_MAX constant (500); add
stdoutLineCount/stderrLineCount tracked fields to BgProcess; export
PORT_PATTERN_SOURCE string to avoid .source access per line
- bg-shell/output-formatter: analyzeLine uses union regexes instead of
.some(p => p.test(line)) across 5 pattern arrays; PORT_PATTERN no longer
reconstructed via new RegExp() on every line; lineDedup Map now has LRU
eviction at LINE_DEDUP_MAX entries (prevents unbounded memory growth on
long-running processes); getHighlights also uses union regexes
- bg-shell/process-manager: addOutputLine increments stdoutLineCount/
stderrLineCount in O(1) as lines arrive; getInfo uses tracked counters
instead of two O(n) .filter() passes over the output buffer
- gsd/diff-context: replace execFileSync with async execFile wrapper;
getRecentlyChangedFiles and getChangedFilesWithContext now run all
independent git queries concurrently via Promise.all (3-5 serial
subprocess spawns -> 1 parallel batch)
- gsd/workspace-index: per-slice indexing now runs concurrently via
Promise.all within each milestone; add IndexWorkspaceOptions with
validate flag (default false) — validatePlanBoundary/validateCompleteBoundary
skipped by default since they do expensive content analysis and are only
needed for explicit doctor/audit flows; getSuggestedNextCommands passes
validate:true as the sole consumer of validationIssues
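As an illustration of two of the techniques above (union regexes compiled once, and a bounded dedup Map), here is a minimal sketch. The pattern sources, `buildUnion`, and `createLineDedup` are hypothetical; only the LINE_DEDUP_MAX name and its value of 500 come from the change itself.

```typescript
// Hypothetical pattern sources; the real arrays live in bg-shell/types.
const ERROR_SOURCES = ["\\berror\\b", "\\bfailed\\b", "\\bexception\\b"];

// Combine many patterns into one regex so each line needs a single test()
// instead of one test() per pattern.
function buildUnion(sources: string[], flags: string = "i"): RegExp {
  return new RegExp(sources.map((s) => `(?:${s})`).join("|"), flags);
}

// Built once at module load, not per line.
const ERROR_UNION = buildUnion(ERROR_SOURCES);

const LINE_DEDUP_MAX = 500;

// A Map iterates in insertion order, so deleting and re-inserting a key on
// every hit keeps the least recently used key at the front.
function createLineDedup(max: number = LINE_DEDUP_MAX) {
  const counts = new Map<string, number>();
  return {
    // Records a line and returns how many times it has been seen.
    record(line: string): number {
      const seen = (counts.get(line) ?? 0) + 1;
      counts.delete(line);
      counts.set(line, seen);
      if (counts.size > max) {
        const oldest = counts.keys().next().value;
        if (oldest !== undefined) counts.delete(oldest);
      }
      return seen;
    },
    size: (): number => counts.size,
  };
}
```

The union regex deliberately omits the `g` flag so that `test()` stays stateless across lines.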
Adds everything needed to publish the extension to the VS Code Marketplace:
- README.md — full feature documentation with commands table, keyboard
shortcuts, configuration reference, quick start guide, and @gsd chat
participant usage
- CHANGELOG.md — initial 0.1.0 release notes
- .vscodeignore — excludes src/, tsconfig, maps from the .vsix package
- .gitignore — excludes dist/ and *.vsix from version control
- LICENSE — MIT license copied from repo root
- package.json — adds repository, homepage, bugs, keywords, galleryBanner
fields required by the marketplace; adds @vscode/vsce to devDependencies;
adds publish script
Verified: `npm run package` produces a clean 30KB .vsix with no warnings.
Run `npm run publish` with a VSCE_PAT token to publish.
Extensions run from ~/.gsd/agent/extensions/gsd/ at runtime, not from the
package install directory. The previous code traversed 4 levels up from
import.meta.url to find package.json, which resolves to ~/package.json at
runtime — wrong on every system.
The loader already sets process.env.GSD_VERSION at startup, which is how
every other extension reads the version. Use that instead.
- CHANGELOG: fill in [Unreleased] with gsd sessions, 10 new browser
tools, visualizer shift-tab fix, capture resolution fix, screenshot
constraint fix, auto.lock fix, and cross-platform test fix
- README: add gsd sessions to CLI reference table; expand Browser Tools
description to cover the 13 new tools shipped in #698
- docs/commands.md: add gsd sessions to CLI Flags table
- docs/getting-started.md: document gsd sessions in Resume a Session
- docs/proposals/698: mark status as Shipped, update Current State
section to reflect the 13 implemented tools
ExtensionContext in the published package does not have getActiveTools —
it lives on ExtensionAPI (pi). The local source has it on both but CI
typechecks against the installed package, which failed with:
Property 'getActiveTools' does not exist on type 'ExtensionCommandContext'
guided-discuss-milestone.md was a single-paragraph stub — the agent had
no interview protocol, no check-in round, no depth verification, and no
host-conditional behaviour. On Copilot this meant every clarification
burned a separate request with no structure.
Changes:
- guided-discuss-milestone.md: full interview protocol matching
guided-discuss-slice structure:
- mandatory investigation pass before first round
- 1–3 questions per round
- check-in after each round (wrap up vs keep going)
- depth verification checklist before wrap-up
- host-conditional: uses ask_user_questions when available (pi),
falls back to plain text when not (Copilot, Cursor, Windsurf)
- depth_verification question ID convention preserved for the
write-gate in index.ts
- guided-flow.ts: all 5 loadPrompt('guided-discuss-milestone') call
sites now pass structuredQuestionsAvailable by checking
ctx.getActiveTools().includes('ask_user_questions') at dispatch time.
Returns 'true'/'false' string so the prompt can branch conditionally.
All major LLM provider SDKs were loaded eagerly at startup, penalizing
users regardless of which provider they actually use. This change defers
SDK loading until first API call for:
- @anthropic-ai/sdk (anthropic.ts)
- openai (openai-responses.ts, openai-completions.ts, azure-openai-responses.ts)
- @google/genai (google-vertex.ts)
The Bedrock provider already used this pattern. Now all 5 remaining
providers use the same async lazy-loader pattern:
- Static import changed to `import type` (erased at compile time)
- Module-level `let _SdkClass` cache variable
- `async function getSdkClass()` loader with singleton caching
- `createClient()` made async, uses `await getSdkClass()`
- Call sites updated with `await createClient()`
For google-vertex.ts, ThinkingLevel enum usage replaced with equivalent
string literals to eliminate the runtime import entirely.
All packages build cleanly. The startup improvement is proportional to
how many providers were installed — on typical installs this eliminates
eager loading of 30-40MB of SDK code.
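The lazy-loader pattern described above can be sketched generically. This is an illustration under assumptions, not the provider code: `createLazyLoader` and the fake SDK object are hypothetical, and this variant caches the promise rather than the resolved class so that concurrent first calls share one load.

```typescript
// Generic lazy loader: `load` runs at most once, on first use.
function createLazyLoader<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined; // plays the role of `_SdkClass`
  return () => {
    if (cached === undefined) {
      cached = load();
    }
    return cached;
  };
}

// Hypothetical stand-in for an SDK module. A real provider would pass a
// dynamic import of the SDK package here instead.
let loadCount = 0;
const getFakeSdk = createLazyLoader(async () => {
  loadCount += 1;
  return { createClient: (apiKey: string) => ({ apiKey }) };
});

// createClient becomes async because it has to await the loader.
async function createClient(apiKey: string): Promise<{ apiKey: string }> {
  const sdk = await getFakeSdk();
  return sdk.createClient(apiKey);
}
```

Nothing is loaded until the first `createClient()` call, which is what moves the cost from startup to first API use. One trade-off of caching the promise is that a failed load stays cached; a production version might clear the cache on rejection.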
Four-part fix for the failure chain reported in #739:
1. **Dispatch guard** (auto-dispatch.ts): refuse to dispatch execute-task
when T{tid}-PLAN.md is missing on disk. Emits a stop action with a
clear error message instead of sending the agent in blind with a
missing plan, which was the proximate cause of the runaway session
and eventual EPIPE crash.
2. **verifyExpectedArtifact for plan-slice** (auto-recovery.ts): after
verifying S{sid}-PLAN.md exists, also check that every task listed in
the plan has a corresponding T{tid}-PLAN.md. A plan-slice that wrote
the slice plan but omitted task plans was previously considered
complete, allowing the dispatch guard above to be bypassed on
idempotency replay.
3. **EPIPE guard** (index.ts): register an uncaughtException handler at
extension load time that catches EPIPE (broken stdio pipe) and exits
cleanly instead of crashing with an unhandled exception. The crash in
#739 was triggered by process.stderr.write() calls to a closed pipe
during LSP diagnostics in the execute-task session.
4. **Prompt hardening** (prompts/research-slice.md): explicitly note that
the research template is already inlined in the prompt and must not be
read from disk. The agent in #739 hallucinated a read of
templates/SLICE-RESEARCH.md (ENOENT), causing the subagent to abort,
which left no S03-RESEARCH.md and poisoned the downstream plan-slice.
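Parts 1 and 2 can be illustrated with a short sketch. The function names, the injected `fileExists` callback, and the exact task-line regex are assumptions for the example; only the T{tid}-PLAN.md naming and the stop-instead-of-dispatch behaviour come from the description above.

```typescript
// Returns the task plan file names that the slice plan lists but that
// `fileExists` reports as absent. The task-line regex is an assumption
// about the plan format ("- [ ] **T01: ...").
function findMissingTaskPlans(
  slicePlan: string,
  fileExists: (fileName: string) => boolean,
): string[] {
  const taskLine = /^\s*- \[[ x]\] \*\*(T\d+):/gm;
  const missing: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = taskLine.exec(slicePlan)) !== null) {
    const fileName = `${match[1]}-PLAN.md`;
    if (!fileExists(fileName)) missing.push(fileName);
  }
  return missing;
}

// Dispatch guard: stop with a clear reason instead of sending the agent in
// without a plan.
function guardExecuteTask(
  taskId: string,
  fileExists: (fileName: string) => boolean,
): { action: "dispatch" } | { action: "stop"; reason: string } {
  const planFile = `${taskId}-PLAN.md`;
  if (fileExists(planFile)) return { action: "dispatch" };
  return { action: "stop", reason: `${planFile} is missing on disk` };
}
```

Running the same existence check in both places is what closes the idempotency-replay gap: a plan-slice with missing task plans is no longer treated as complete.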
The bash tool waits for stdout/stderr file descriptors to close. When the
LLM runs 'python -m http.server 8080 &', the backgrounded process inherits
stdout and keeps it open — the bash call hangs indefinitely.
The bg_shell tool exists for exactly this purpose (detached process groups,
readiness detection, lifecycle management). The system prompt already said
to use bg_shell for servers but didn't explicitly warn against bash with &.
Added:
- Explicit anti-pattern: 'Never use bash with & to background a process'
- Expanded background processes section explaining why & hangs
- Both reference bg_shell start as the correct alternative
LLMs frequently write depends:[S01-S04] as natural shorthand.
The parser split only on commas, so this produced a single literal
element "S01-S04" that never matched any real slice ID —
permanently blocking the slice with "No slice eligible".
Changes:
roadmap-slices.ts:
- Add expandDependencies() helper: after the comma-split, detect dep
tokens of the form <Prefix><N>-<Prefix><M> or <Prefix><N>..<Prefix><M>
and expand them to individual IDs. Handles S01-S04 (dash range) and
S01..S04 (dot range). Zero-padding preserved. Mismatched prefixes and
reversed ranges pass through unchanged.
- Wire into parseRoadmapSlices() after the comma-split step.
- Export for direct testing.
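A minimal sketch of such a helper, written from the description above rather than copied from roadmap-slices.ts. The signature (an array of already comma-split tokens) and the choice to drop empty tokens are assumptions.

```typescript
// Expands range tokens such as "S01-S04" or "S01..S04" into individual IDs.
// Tokens that are not ranges, or that have mismatched prefixes or a reversed
// range, pass through unchanged.
function expandDependencies(tokens: string[]): string[] {
  const range = /^([A-Za-z]+)(\d+)(?:-|\.\.)([A-Za-z]+)(\d+)$/;
  const out: string[] = [];
  for (const raw of tokens) {
    const token = raw.trim();
    if (token === "") continue; // assumption: ignore empty tokens
    const m = range.exec(token);
    if (m === null) {
      out.push(token);
      continue;
    }
    const prefixA = m[1] ?? "";
    const digitsA = m[2] ?? "";
    const prefixB = m[3] ?? "";
    const digitsB = m[4] ?? "";
    const start = parseInt(digitsA, 10);
    const end = parseInt(digitsB, 10);
    if (prefixA !== prefixB || start > end) {
      out.push(token);
      continue;
    }
    // Preserve zero-padding by reusing the width of the first number.
    for (let i = start; i <= end; i++) {
      out.push(prefixA + String(i).padStart(digitsA.length, "0"));
    }
  }
  return out;
}
```

For example, `expandDependencies("S01-S03,S05".split(","))` yields the four IDs S01, S02, S03, and S05.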
doctor.ts:
- Add "unresolvable_dependency" warning code.
- In the slice audit loop, check each dep against the set of known
slice IDs in the roadmap. Fires a warning with the bad dep name
and the correct format hint. Catches leftover range IDs on roadmaps
that were written before this fix, and catches typos.
plan-milestone.md prompt:
- Add explicit rule: use comma-separated depends:[S01,S02,S03], never
range syntax. Defense-in-depth so LLMs don't generate the problem.
Tests:
- roadmap-slices.test.ts: 10 new expandDependencies cases + 2
parseRoadmapSlices integration cases (range + comma round-trip).
- doctor.test.ts: unresolvable_dependency fires for unknown dep S99,
does not fire for valid S01 dep.
952/952 unit tests pass.
Closes #737
verifyExpectedArtifact() for plan-slice units only checked whether the
plan file existed on disk, not whether it contained actual task entries.
When a plan file was created as an empty scaffold during discussion/context
(headings but no tasks), the artifact check considered it 'complete' and
skipped the dispatch. Since deriveState still returned phase:'planning'
(no tasks found), this created an infinite skip loop until auto-mode
exhausted its retry budget and stopped silently.
Added a content check that requires at least one task entry matching
the pattern '- [ ] **T##:' or '- [x] **T##:' before considering a
plan-slice artifact valid. This mirrors the existing content-aware
check used for execute-task (which verifies checkbox state).
Added 3 regression tests covering empty scaffold, valid tasks, and
completed tasks.
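The content check can be sketched as a single multiline regex. The `planHasTaskEntries` name is hypothetical, and reading "T##" as "T followed by one or more digits" is an assumption.

```typescript
// Matches a task entry such as "- [ ] **T01:" or "- [x] **T12:" at the start
// of any line. The `m` flag makes ^ match at every line start.
const TASK_ENTRY = /^\s*- \[[ x]\] \*\*T\d+:/m;

// A plan-slice artifact only counts as valid if it lists at least one task.
function planHasTaskEntries(planContent: string): boolean {
  return TASK_ENTRY.test(planContent);
}
```

An empty scaffold with headings only fails this check, so the unit is dispatched instead of being skipped as already complete.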
Two additional layers, plus test updates, to address #733 (background command hang):
1. Stalled-tool detection in idle watchdog (auto.ts)
- Change inFlightTools from Set<string> to Map<string, number> to
track per-tool start timestamps
- Idle watchdog now compares the oldest in-flight tool's age to the
idle timeout. Tools in-flight for < idleTimeoutMs continue to
suppress recovery as before. Tools running >= idleTimeoutMs are
treated as stuck and recovery proceeds — preventing infinite hang
when the bash rewrite is bypassed or a tool hangs for other reasons.
- Export getOldestInFlightToolAgeMs() for testability
2. Prompt guidance in execute-task.md
- Add explicit "Background process rule" to step 5 explaining why
bare `command &` hangs the Bash tool and showing the correct
`command > /dev/null 2>&1 &` pattern
- Recommends bg_shell tool as the preferred approach
3. Test updates (in-flight-tool-tracking.test.ts)
- Import and verify getOldestInFlightToolAgeMs export
- Update header comment to reflect Map-with-timestamps design
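The stalled-tool check can be sketched as below. The explicit `tools` and `now` parameters, the `toolsSuppressRecovery` helper, and returning 0 for an empty map are assumptions made for testability; the exported function in auto.ts may have a different signature.

```typescript
// toolCallId -> start timestamp in milliseconds.
type InFlightTools = Map<string, number>;

// Age of the longest-running in-flight tool, or 0 when nothing is in flight.
function getOldestInFlightToolAgeMs(tools: InFlightTools, now: number): number {
  let oldestStart = Infinity;
  tools.forEach((startedAt) => {
    if (startedAt < oldestStart) oldestStart = startedAt;
  });
  return oldestStart === Infinity ? 0 : now - oldestStart;
}

// In-flight tools suppress idle recovery only while the oldest one is
// younger than the idle timeout; at or past the timeout it counts as stuck.
function toolsSuppressRecovery(
  tools: InFlightTools,
  now: number,
  idleTimeoutMs: number,
): boolean {
  if (tools.size === 0) return false;
  return getOldestInFlightToolAgeMs(tools, now) < idleTimeoutMs;
}
```

Tracking start timestamps instead of bare IDs is what lets the watchdog tell a tool that is merely slow from one that will never return.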
Root cause: when the LLM runs `cmd &`, bash forks the process and
exits immediately. The forked process inherits Node's piped stdout/
stderr FDs. Node.js waits for all holders of those FDs to close before
firing the 'close' event — so the tool hangs until the background
process exits (which for a server is never).
Fix: add rewriteBackgroundCommand() in bash.ts. Before exec, detect
commands with a trailing & background operator and inject
>/dev/null 2>&1 before the & when stdout is not already redirected.
This severs the pipe inheritance so Node gets 'close' immediately
when the shell exits.
Guards:
- Commands already redirecting stdout (>, >>, &>, |) are not rewritten
- && (logical AND) is not affected
- & inside single-quoted strings is not affected
- A brief onUpdate advisory is surfaced when rewrite happens so the
LLM knows to prefer nohup/setsid for robust detachment
Export rewriteBackgroundCommand from pi-coding-agent for testability.
Tests: bash-background.test.ts — 12 cases covering no-op paths,
rewrite paths, compound commands, and already-safe nohup patterns.
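A deliberately simplified sketch of the rewrite follows. It is not the bash.ts implementation: it only looks at a trailing `&`, strips single-quoted strings before checking for redirections, and treats any `>` or `|` in the command as "already redirected", which is more conservative than the guards listed above.

```typescript
// Injects ">/dev/null 2>&1" before a trailing "&" so the backgrounded
// process does not inherit the tool's stdout/stderr pipes.
function rewriteBackgroundCommand(command: string): string {
  const trimmed = command.trimEnd();
  // Only a single trailing & is a background operator; && is logical AND.
  if (!trimmed.endsWith("&") || trimmed.endsWith("&&")) return command;
  const body = trimmed.slice(0, -1).trimEnd();
  if (body === "") return command;
  // Ignore characters inside single-quoted strings when looking for
  // redirections or pipes.
  const unquoted = body.replace(/'[^']*'/g, "");
  if (/[>|]/.test(unquoted)) return command;
  return `${body} >/dev/null 2>&1 &`;
}
```

For instance, `python -m http.server 8080 &` becomes `python -m http.server 8080 >/dev/null 2>&1 &`, while `make && make test` and `server > log.txt &` are returned unchanged.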
Closes #733
When deriveState() keeps returning the same already-completed unit,
the idempotency skip paths in dispatchNextUnit recursively call
themselves forever. The existing MAX_SKIP_DEPTH (20) breaker yields
to the UI but then re-enters the same loop; the hard lifetime counter
(unitLifetimeDispatches) is never reached because skip paths return
before touching it.
Root cause: no per-unit counter on the skip-only path.
Fix:
- Add unitConsecutiveSkips map + MAX_CONSECUTIVE_SKIPS = 3
- Both skip paths (completedKeySet hit, and fallback artifact-exists)
increment the counter on each skip of the same idempotencyKey
- When the counter exceeds MAX_CONSECUTIVE_SKIPS, evict the key from
completedKeySet and persisted storage, invalidate state, and let
deriveState reconcile on the next real dispatch
- Counter resets to 0 for a given key whenever a real dispatch
proceeds (i.e., past both skip paths)
- Counter fully cleared at all 4 existing clear sites (stopAuto,
startAuto, crash recovery, pause/resume)
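The counter mechanics can be sketched as follows. The helper names (`recordSkip`, `recordRealDispatch`, `clearAllSkips`) are hypothetical, and dropping the counter entry at eviction time is an assumption; the map name and threshold follow the description above.

```typescript
const MAX_CONSECUTIVE_SKIPS = 3;
const unitConsecutiveSkips = new Map<string, number>();

// Records one skip for the key. Returns true when the count exceeds the
// threshold, meaning the caller should evict the key from completedKeySet
// and persisted storage and invalidate state.
function recordSkip(idempotencyKey: string): boolean {
  const count = (unitConsecutiveSkips.get(idempotencyKey) ?? 0) + 1;
  if (count > MAX_CONSECUTIVE_SKIPS) {
    unitConsecutiveSkips.delete(idempotencyKey);
    return true;
  }
  unitConsecutiveSkips.set(idempotencyKey, count);
  return false;
}

// A real dispatch (past both skip paths) resets the counter for that key.
function recordRealDispatch(idempotencyKey: string): void {
  unitConsecutiveSkips.delete(idempotencyKey);
}

// Called from the clear sites (stop, start, crash recovery, pause/resume).
function clearAllSkips(): void {
  unitConsecutiveSkips.clear();
}
```

Because the threshold test is "exceeds", three consecutive skips of the same key are tolerated and the fourth triggers eviction.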
Export _getUnitConsecutiveSkips / _resetUnitConsecutiveSkips /
MAX_CONSECUTIVE_SKIPS for testability (same pattern as
doctor-proactive.ts resetProactiveHealing).
Tests: auto-skip-loop.test.ts — counter mechanics, threshold bounds,
eviction round-trip, per-key isolation (10 assertions).
Closes #728
- Remove --verbose flag from headless (use --json for detailed output)
- Remove redundant sawToolExecution state variable
- Remove unused rejectCompletion
- Add missing build*Prompt imports in auto.ts (fixes CI typecheck:extensions)
- Document headless mode in README.md and docs/commands.md
- Simplify help text with examples instead of exhaustive command catalog
Replace --step flag with positional command routing so any /gsd
subcommand can run headlessly. Add /gsd dispatch <phase> for direct
unit-type dispatch (research, plan, execute, complete, reassess, uat,
replan) with state-aware resolution.
Quick commands (status, queue, doctor, etc.) resolve on first agent_end.
Long-running commands (auto, next, dispatch) use idle timer + terminal
notification detection.