- bg-shell/types: add compiled union regexes (ERROR/WARNING/READINESS/BUILD/TEST)
built once at module load; add LINE_DEDUP_MAX constant (500); add
stdoutLineCount/stderrLineCount tracked fields to BgProcess; export
PORT_PATTERN_SOURCE string to avoid .source access per line
- bg-shell/output-formatter: analyzeLine uses union regexes instead of
.some(p => p.test(line)) across 5 pattern arrays; PORT_PATTERN no longer
reconstructed via new RegExp() on every line; lineDedup Map now has LRU
eviction at LINE_DEDUP_MAX entries (prevents unbounded memory growth on
long-running processes); getHighlights also uses union regexes
- bg-shell/process-manager: addOutputLine increments stdoutLineCount/
stderrLineCount in O(1) as lines arrive; getInfo uses tracked counters
instead of two O(n) .filter() passes over the output buffer
- gsd/diff-context: replace execFileSync with async execFile wrapper;
getRecentlyChangedFiles and getChangedFilesWithContext now run all
independent git queries concurrently via Promise.all (3-5 serial
subprocess spawns -> 1 parallel batch)
- gsd/workspace-index: per-slice indexing now runs concurrently via
Promise.all within each milestone; add IndexWorkspaceOptions with
validate flag (default false) — validatePlanBoundary/validateCompleteBoundary
skipped by default since they do expensive content analysis and are only
needed for explicit doctor/audit flows; getSuggestedNextCommands passes
validate:true as the sole consumer of validationIssues
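The union-regex and bounded-dedup changes above can be sketched roughly as follows. This is a minimal illustration, not the bg-shell code: the pattern list, `buildUnion`, and the `LineDedup` class are hypothetical names, and only the `LINE_DEDUP_MAX` value of 500 comes from the commit message.

```typescript
// Hypothetical pattern list; the real ERROR patterns in bg-shell/types are not shown here.
const ERROR_PATTERNS: RegExp[] = [/\berror\b/, /\bfatal\b/, /\bexception\b/];

// Combine many patterns into a single alternation, compiled once at module load.
// Assumes every pattern can share the same flags (here: case-insensitive).
function buildUnion(patterns: RegExp[], flags: string = "i"): RegExp {
  return new RegExp(patterns.map((p) => `(?:${p.source})`).join("|"), flags);
}

const ERROR_UNION = buildUnion(ERROR_PATTERNS);

const LINE_DEDUP_MAX = 500;

// A Map iterates in insertion order, so the first key is the least recently
// used one as long as every hit is deleted and re-inserted.
class LineDedup {
  private counts = new Map<string, number>();
  private max: number;

  constructor(max: number = LINE_DEDUP_MAX) {
    this.max = max;
  }

  // Returns how many times this line has been seen, including this call.
  record(line: string): number {
    const seen = (this.counts.get(line) ?? 0) + 1;
    this.counts.delete(line); // move the key to the most-recent position
    this.counts.set(line, seen);
    if (this.counts.size > this.max) {
      const oldest = this.counts.keys().next().value;
      if (oldest !== undefined) this.counts.delete(oldest);
    }
    return seen;
  }

  get size(): number {
    return this.counts.size;
  }
}
```

One `ERROR_UNION.test(line)` call replaces a `.some()` pass over the pattern array, and the map never holds more than `max` entries regardless of how long the process runs.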
Adds everything needed to publish the extension to the VS Code Marketplace:
- README.md — full feature documentation with commands table, keyboard
shortcuts, configuration reference, quick start guide, and @gsd chat
participant usage
- CHANGELOG.md — initial 0.1.0 release notes
- .vscodeignore — excludes src/, tsconfig, maps from the .vsix package
- .gitignore — excludes dist/ and *.vsix from version control
- LICENSE — MIT license copied from repo root
- package.json — adds repository, homepage, bugs, keywords, galleryBanner
fields required by the marketplace; adds @vscode/vsce to devDependencies;
adds publish script
Verified: `npm run package` produces a clean 30KB .vsix with no warnings.
Run `npm run publish` with a VSCE_PAT token to publish.
Extensions run from ~/.gsd/agent/extensions/gsd/ at runtime, not from the
package install directory. The previous code traversed 4 levels up from
import.meta.url to find package.json, which resolves to ~/package.json at
runtime — wrong on every system.
The loader already sets process.env.GSD_VERSION at startup, which is how
every other extension reads the version. Use that instead.
- CHANGELOG: fill in [Unreleased] with gsd sessions, 10 new browser
tools, visualizer shift-tab fix, capture resolution fix, screenshot
constraint fix, auto.lock fix, and cross-platform test fix
- README: add gsd sessions to CLI reference table; expand Browser Tools
description to cover the 13 new tools shipped in #698
- docs/commands.md: add gsd sessions to CLI Flags table
- docs/getting-started.md: document gsd sessions in Resume a Session
- docs/proposals/698: mark status as Shipped, update Current State
section to reflect the 13 implemented tools
ExtensionContext in the published package does not have getActiveTools —
it lives on ExtensionAPI (pi). The local source has it on both but CI
typechecks against the installed package, which failed with:
  Property 'getActiveTools' does not exist on type 'ExtensionCommandContext'
guided-discuss-milestone.md was a single-paragraph stub — the agent had
no interview protocol, no check-in round, no depth verification, and no
host-conditional behaviour. On Copilot this meant every clarification
burned a separate request with no structure.
Changes:
- guided-discuss-milestone.md: full interview protocol matching
  guided-discuss-slice structure:
  - mandatory investigation pass before first round
  - 1–3 questions per round
  - check-in after each round (wrap up vs keep going)
  - depth verification checklist before wrap-up
  - host-conditional: uses ask_user_questions when available (pi),
    falls back to plain text when not (Copilot, Cursor, Windsurf)
  - depth_verification question ID convention preserved for the
    write-gate in index.ts
- guided-flow.ts: all 5 loadPrompt('guided-discuss-milestone') call
sites now pass structuredQuestionsAvailable by checking
ctx.getActiveTools().includes('ask_user_questions') at dispatch time.
Returns 'true'/'false' string so the prompt can branch conditionally.
The bash tool waits for stdout/stderr file descriptors to close. When the
LLM runs 'python -m http.server 8080 &', the backgrounded process inherits
stdout and keeps it open — the bash call hangs indefinitely.
The bg_shell tool exists for exactly this purpose (detached process groups,
readiness detection, lifecycle management). The system prompt already said
to use bg_shell for servers but didn't explicitly warn against bash with &.
Added:
- Explicit anti-pattern: 'Never use bash with & to background a process'
- Expanded background processes section explaining why & hangs
- Both reference bg_shell start as the correct alternative
LLMs frequently write depends:[S01-S04] as natural shorthand.
The parser split only on commas, so this produced a single literal
element "S01-S04" that never matched any real slice ID —
permanently blocking the slice with "No slice eligible".
Changes:
roadmap-slices.ts:
- Add expandDependencies() helper: after the comma-split, detect dep
tokens matching /^PrefixN(-|\.\.)PrefixM$/ and expand them to individual
IDs. Handles S01-S04 (dash range) and S01..S04 (dot range).
Zero-padding preserved. Mismatched prefixes and reversed ranges
pass through unchanged.
- Wire into parseRoadmapSlices() after the comma-split step.
- Export for direct testing.
doctor.ts:
- Add "unresolvable_dependency" warning code.
- In the slice audit loop, check each dep against the set of known
slice IDs in the roadmap. Fires a warning with the bad dep name
and the correct format hint. Catches leftover range IDs on roadmaps
that were written before this fix, and catches typos.
plan-milestone.md prompt:
- Add explicit rule: use comma-separated depends:[S01,S02,S03], never
range syntax. Defense-in-depth so LLMs don't generate the problem.
Tests:
- roadmap-slices.test.ts: 10 new expandDependencies cases + 2
parseRoadmapSlices integration cases (range + comma round-trip).
- doctor.test.ts: unresolvable_dependency fires for unknown dep S99,
does not fire for valid S01 dep.
952/952 unit tests pass.
Closes #737
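The range expansion described in this commit can be sketched as below. This is an illustration under assumptions, not the code in roadmap-slices.ts: the regex and the function body are reconstructed from the behaviour listed above (dash and dot ranges, zero-padding preserved, mismatched prefixes and reversed ranges passed through).

```typescript
// Matches "S01-S04" or "S01..S04": prefix letters, digits, separator, prefix letters, digits.
const RANGE = /^([A-Za-z]+)(\d+)(?:-|\.\.)([A-Za-z]+)(\d+)$/;

function expandDependencies(tokens: string[]): string[] {
  const out: string[] = [];
  for (const raw of tokens) {
    const token = raw.trim();
    const m = RANGE.exec(token);
    if (!m) {
      out.push(token); // plain ID such as "S07"
      continue;
    }
    const [, startPrefix, startNum, endPrefix, endNum] = m;
    const start = parseInt(startNum, 10);
    const end = parseInt(endNum, 10);
    // Mismatched prefixes and reversed ranges pass through unchanged.
    if (startPrefix !== endPrefix || start > end) {
      out.push(token);
      continue;
    }
    const width = startNum.length; // preserve zero-padding of the first ID
    for (let n = start; n <= end; n++) {
      out.push(startPrefix + String(n).padStart(width, "0"));
    }
  }
  return out;
}
```

Tokens that pass through unchanged (for example a reversed range) are the ones the new unresolvable_dependency doctor check is meant to surface.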
verifyExpectedArtifact() for plan-slice units only checked whether the
plan file existed on disk, not whether it contained actual task entries.
When a plan file was created as an empty scaffold during discussion/context
(headings but no tasks), the artifact check considered it 'complete' and
skipped the dispatch. Since deriveState still returned phase:'planning'
(no tasks found), this created an infinite skip loop until auto-mode
exhausted its retry budget and stopped silently.
Added a content check that requires at least one task entry matching
the pattern '- [ ] **T##:' or '- [x] **T##:' before considering a
plan-slice artifact valid. This mirrors the existing content-aware
check used for execute-task (which verifies checkbox state).
Added 3 regression tests covering empty scaffold, valid tasks, and
completed tasks.
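A minimal sketch of the content check, assuming task IDs are "T" followed by digits and that only a lowercase "x" marks completion. The names `TASK_ENTRY` and `planHasTasks` are hypothetical; the real check lives in verifyExpectedArtifact().

```typescript
// Matches "- [ ] **T01:" or "- [x] **T01:" at the start of any line.
const TASK_ENTRY = /^\s*- \[[ x]\] \*\*T\d+:/m;

// An existing plan file only counts as a valid artifact if it has at least one task entry.
function planHasTasks(planContent: string): boolean {
  return TASK_ENTRY.test(planContent);
}
```

With this check, a headings-only scaffold is no longer treated as complete, so the plan-slice unit is dispatched instead of skipped.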
Two additional layers to address #733 (background command hang), plus test updates:
1. Stalled-tool detection in idle watchdog (auto.ts)
   - Change inFlightTools from Set<string> to Map<string, number> to
     track per-tool start timestamps
   - Idle watchdog now compares the oldest in-flight tool's age to the
     idle timeout. Tools in-flight for < idleTimeoutMs continue to
     suppress recovery as before. Tools running >= idleTimeoutMs are
     treated as stuck and recovery proceeds, preventing an infinite hang
     when the bash rewrite is bypassed or a tool hangs for other reasons.
   - Export getOldestInFlightToolAgeMs() for testability
2. Prompt guidance in execute-task.md
   - Add explicit "Background process rule" to step 5 explaining why
     bare `command &` hangs the Bash tool and showing the correct
     `command > /dev/null 2>&1 &` pattern
   - Recommends bg_shell tool as the preferred approach
3. Test updates (in-flight-tool-tracking.test.ts)
   - Import and verify getOldestInFlightToolAgeMs export
   - Update header comment to reflect Map-with-timestamps design
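The Map-with-timestamps design could look roughly like this. The signatures are assumptions: `now` is passed in explicitly so the sketch is deterministic, and `markToolStart`, `markToolEnd`, and `shouldSuppressRecovery` are hypothetical helper names. Only `inFlightTools` and `getOldestInFlightToolAgeMs` are named in the commit.

```typescript
// toolCallId -> start timestamp in milliseconds
const inFlightTools = new Map<string, number>();

function markToolStart(id: string, now: number): void {
  inFlightTools.set(id, now);
}

function markToolEnd(id: string): void {
  inFlightTools.delete(id);
}

// Age of the longest-running in-flight tool, or 0 when nothing is in flight.
function getOldestInFlightToolAgeMs(now: number): number {
  let oldest = Infinity;
  inFlightTools.forEach((startedAt) => {
    if (startedAt < oldest) oldest = startedAt;
  });
  return oldest === Infinity ? 0 : now - oldest;
}

// Recovery stays suppressed only while the oldest in-flight tool is younger
// than the idle timeout; at or beyond the timeout the tool is treated as stuck.
function shouldSuppressRecovery(now: number, idleTimeoutMs: number): boolean {
  return inFlightTools.size > 0 && getOldestInFlightToolAgeMs(now) < idleTimeoutMs;
}
```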
Root cause: when the LLM runs `cmd &`, bash forks the process and
exits immediately. The forked process inherits Node's piped stdout/
stderr FDs. Node.js waits for all holders of those FDs to close before
firing the 'close' event — so the tool hangs until the background
process exits (which for a server is never).
Fix: add rewriteBackgroundCommand() in bash.ts. Before exec, detect
commands with a trailing & background operator and inject
>/dev/null 2>&1 before the & when stdout is not already redirected.
This severs the pipe inheritance so Node gets 'close' immediately
when the shell exits.
Guards:
- Commands already redirecting stdout (>, >>, &>, |) are not rewritten
- && (logical AND) is not affected
- & inside single-quoted strings is not affected
- A brief onUpdate advisory is surfaced when rewrite happens so the
LLM knows to prefer nohup/setsid for robust detachment
Export rewriteBackgroundCommand from pi-coding-agent for testability.
Tests: bash-background.test.ts — 12 cases covering no-op paths,
rewrite paths, compound commands, and already-safe nohup patterns.
Closes #733
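A deliberately simplified sketch of the rewrite, not the implementation in bash.ts. It assumes that any unquoted `>` or `|` means the command is already safe (which is more conservative than checking stdout specifically), and it ignores double quotes and backslash escapes.

```typescript
const REDIRECT = " >/dev/null 2>&1";

function rewriteBackgroundCommand(command: string): string {
  const trimmed = command.replace(/\s+$/, "");
  // Only a single trailing "&" backgrounds the command; "&&" is logical AND.
  if (!trimmed.endsWith("&") || trimmed.endsWith("&&")) return command;

  const body = trimmed.slice(0, -1).replace(/\s+$/, "");
  if (body.length === 0) return command;

  // Scan outside single quotes for an existing redirect or pipe.
  let inSingle = false;
  for (let i = 0; i < body.length; i++) {
    const ch = body[i];
    if (ch === "'") {
      inSingle = !inSingle;
      continue;
    }
    if (inSingle) continue;
    if (ch === ">" || ch === "|") return command; // treated as already redirected
  }
  if (inSingle) return command; // unterminated quote: leave the command alone

  return `${body}${REDIRECT} &`;
}
```

For example, `python -m http.server 8080 &` becomes `python -m http.server 8080 >/dev/null 2>&1 &`, while `server > out.log &` and `make && make test` are returned unchanged.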
When deriveState() keeps returning the same already-completed unit,
the idempotency skip paths in dispatchNextUnit recursively call
themselves forever. The existing MAX_SKIP_DEPTH (20) breaker yields
to the UI but then re-enters the same loop; the hard lifetime counter
(unitLifetimeDispatches) is never reached because skip paths return
before touching it.
Root cause: no per-unit counter on the skip-only path.
Fix:
- Add unitConsecutiveSkips map + MAX_CONSECUTIVE_SKIPS = 3
- Both skip paths (completedKeySet hit, and fallback artifact-exists)
increment the counter on each skip of the same idempotencyKey
- When the counter exceeds MAX_CONSECUTIVE_SKIPS, evict the key from
completedKeySet and persisted storage, invalidate state, and let
deriveState reconcile on the next real dispatch
- Counter resets to 0 for a given key whenever a real dispatch
proceeds (i.e., past both skip paths)
- Counter fully cleared at all 4 existing clear sites (stopAuto,
startAuto, crash recovery, pause/resume)
Export _getUnitConsecutiveSkips / _resetUnitConsecutiveSkips /
MAX_CONSECUTIVE_SKIPS for testability (same pattern as
doctor-proactive.ts resetProactiveHealing).
Tests: auto-skip-loop.test.ts — counter mechanics, threshold bounds,
eviction round-trip, per-key isolation (10 assertions).
Closes #728
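The counter mechanics can be sketched as follows. `recordSkip` and `recordRealDispatch` are hypothetical helper names, and clearing the counter on eviction is an assumption; the real code also evicts the key from persisted storage and invalidates state, which is only noted in a comment here.

```typescript
const MAX_CONSECUTIVE_SKIPS = 3;
const unitConsecutiveSkips = new Map<string, number>();
const completedKeySet = new Set<string>();

// Called from either skip path. Returns "skip" while under the threshold and
// "evict" once the same key has been skipped more than MAX_CONSECUTIVE_SKIPS times.
function recordSkip(idempotencyKey: string): "skip" | "evict" {
  const count = (unitConsecutiveSkips.get(idempotencyKey) ?? 0) + 1;
  if (count > MAX_CONSECUTIVE_SKIPS) {
    unitConsecutiveSkips.delete(idempotencyKey);
    completedKeySet.delete(idempotencyKey); // real code: also persisted storage + state invalidation
    return "evict";
  }
  unitConsecutiveSkips.set(idempotencyKey, count);
  return "skip";
}

// Called when a dispatch gets past both skip paths; the counter restarts from 0.
function recordRealDispatch(idempotencyKey: string): void {
  unitConsecutiveSkips.delete(idempotencyKey);
}
```

Because the counter is keyed by idempotencyKey, a loop on one unit cannot evict another unit's completion record.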
- Remove --verbose flag from headless (use --json for detailed output)
- Remove redundant sawToolExecution state variable
- Remove unused rejectCompletion
- Add missing build*Prompt imports in auto.ts (fixes CI typecheck:extensions)
- Document headless mode in README.md and docs/commands.md
- Simplify help text with examples instead of exhaustive command catalog
Replace --step flag with positional command routing so any /gsd
subcommand can run headlessly. Add /gsd dispatch <phase> for direct
unit-type dispatch (research, plan, execute, complete, reassess, uat,
replan) with state-aware resolution.
Quick commands (status, queue, doctor, etc.) resolve on first agent_end.
Long-running commands (auto, next, dispatch) use idle timer + terminal
notification detection.
End-to-end test that validates the headless CLI subcommand by:
- Creating a temp dir with a complete .gsd/ project fixture
- Spawning `node dist/loader.js headless --step --json`
- Validating exit code, JSONL stdout, stderr progress, and artifact
Supports --dry-run for fixture validation without running the agent.
Adds a first-class `gsd headless` command that runs auto-mode without a
TUI by spawning a child process in RPC mode via RpcClient. Useful for
CI/CD pipelines, scripts, and unattended execution.
CLI interface:
  gsd headless                   Run auto-mode until complete
  gsd headless --step            Run one unit only (sends /gsd next)
  gsd headless --timeout 300000  Custom timeout (default 5 min)
  gsd headless --json            Forward RPC events as JSONL to stdout
  gsd headless --verbose         Show full agent text and tool results
  gsd headless --model <id>      Override model
Exit codes: 0 = complete, 1 = error/timeout, 2 = blocked
Features:
- Extension UI auto-responder (handles select, confirm, input, editor,
notify, setStatus, setWidget, setTitle, set_editor_text)
- Completion detection via terminal notification keywords + idle timeout
- Human-readable progress output to stderr
- SIGINT/SIGTERM forwarding for clean shutdown
- Child process crash detection
- Completion summary with diagnostics on failure
PID 1 (init) exists on Unix but not on Windows, causing the
cross-process detection test to fail in CI. Use process.ppid
(parent process) which is guaranteed alive on all platforms.
When auto-mode runs in an auto-worktree, activity logs are written to
`.gsd/worktrees/<MID>/.gsd/activity/` while forensics only scanned
`.gsd/activity/` at the project root. This caused forensics to report
stale failures from the root while the worktree had already produced
the correct artifacts and advanced to execution.
Changes:
forensics.ts:
- scanActivityLogs() now accepts activeMilestone and scans both the
worktree activity dir (if an auto-worktree exists) and the root dir
- Results are merged and sorted by mtime so the most recent traces
from either source appear first
- detectMissingArtifacts() checks both root and worktree paths before
reporting a missing artifact, preventing false positives
- ForensicReport now includes activeWorktree field for visibility
- Saved report and prompt output include worktree context
session-forensics.ts:
- getDeepDiagnostic() now checks the worktree activity dir first by
reading the active milestone ID from STATE.md (synchronous, no
async deriveState dependency)
- Falls back to root activity dir when no worktree is found
- Added readActiveMilestoneId() helper for sync milestone detection
Closes #724
Three bugs caused /gsd status to show "No unit running" while auto mode
was actively executing in another terminal:
1. auto.lock was only written during unit dispatch (after newSession()),
not at auto-mode startup or resume. Any cross-process check between
startup and first dispatch would find no lock file.
2. The dashboard read only the in-memory `active` flag, which is always
false in a different process. It never checked auto.lock for
cross-process detection.
3. The triage dispatch path wrote the lock to `basePath` (worktree)
instead of `lockBase()` (project root), making it invisible to
other terminals checking the project root.
Changes:
- Write initial auto.lock immediately in startAuto() and on resume
- Add cross-process detection in getAutoDashboardData() via auto.lock
- Add remoteSession field to AutoDashboardData for cross-process info
- Update dashboard overlay to show remote session status and unit info
- Fix triage dispatch to use lockBase() instead of basePath
- Add 11 tests covering lock creation, cross-process detection, and
stale lock handling
1. Webview CSP nonce (security): Added Content-Security-Policy meta tag
with nonce-based script-src to sidebar.ts. Replaced all inline
onclick handlers with data-command attributes and a single delegated
event listener, which CSP requires over inline handlers.
2. Dead branch in chat-participant.ts: Removed the isSlashCommand
conditional that ran identical code for both paths — slash commands
and regular messages both call sendPrompt() the same way.
3. Restart loop cooldown in gsd-client.ts: Added a 60-second sliding
window that tracks crash timestamps. If the process crashes more
than 3 times within 60 seconds, auto-restart is disabled and an
error is surfaced to the user via the onError event emitter.
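The restart cooldown in item 3 can be sketched as a sliding window over crash timestamps. `RestartGuard` is a hypothetical name, and timestamps are passed in explicitly for clarity; the actual logic in gsd-client.ts may be structured differently.

```typescript
const WINDOW_MS = 60000;
const MAX_CRASHES = 3;

class RestartGuard {
  private crashes: number[] = [];

  // Records a crash at `now` and returns true if auto-restart is still allowed.
  recordCrash(now: number): boolean {
    this.crashes.push(now);
    // Drop timestamps that have slid out of the 60-second window.
    this.crashes = this.crashes.filter((t) => now - t < WINDOW_MS);
    // More than MAX_CRASHES crashes inside the window disables auto-restart.
    return this.crashes.length <= MAX_CRASHES;
  }
}
```

When `recordCrash` returns false, the caller would stop restarting and surface the error to the user instead.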
Full-page screenshots were being squished into a 1568x1568 square,
making tall pages unreadable. Now caps width at 1568px and height
at 8000px independently, preserving readability for long pages.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
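One way to read the independent width and height caps in the screenshot fix is as a 1568x8000 bounding box with the aspect ratio preserved. That interpretation, and the `constrainScreenshot` name, are assumptions rather than the shipped code.

```typescript
const MAX_WIDTH = 1568;
const MAX_HEIGHT = 8000;

// Scale down (never up) so the image fits within MAX_WIDTH x MAX_HEIGHT.
function constrainScreenshot(width: number, height: number): { width: number; height: number } {
  const scale = Math.min(1, MAX_WIDTH / width, MAX_HEIGHT / height);
  return { width: Math.round(width * scale), height: Math.round(height * scale) };
}
```

Under this reading, a 1280x6000 full-page capture is left untouched, whereas a square 1568x1568 limit would have shrunk it to roughly a quarter of its size.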