Commit graph

523 commits

Author SHA1 Message Date
Ethan Hurst
0dd0a1a4d2 fix: add missing reasonSuffix declaration in stopAuto
The reason parameter was added to stopAuto() but the reasonSuffix
variable derived from it was never declared, causing TS2304 errors.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 12:13:02 +10:00
Ethan
04721cb20e Merge branch 'main' into fix/stop-auto-reason 2026-03-17 11:48:42 +10:00
Ethan Hurst
894089cc32 fix: add stop reason to every auto-mode stop (#760)
stopAuto() now accepts an optional `reason` parameter that is included
in the session summary — every stop is self-documenting instead of
showing a generic "Auto-mode stopped" message.
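A minimal sketch of the optional-reason pattern, assuming the summary is a plain string. `formatStopMessage` is a hypothetical helper. Only `stopAuto`, `reason`, and `reasonSuffix` come from the commit messages, and the parenthesised suffix format is an assumption.

```typescript
// Hypothetical helper mirroring stopAuto(reason?): the suffix is derived from
// the optional reason and declared before it is used (the TS2304 fix above).
function formatStopMessage(reason?: string): string {
  const reasonSuffix = reason ? ` (${reason})` : "";
  return `Auto-mode stopped${reasonSuffix}`;
}
```

With no argument, the generic message is unchanged, so existing callers keep working.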

Also replaces the catch-all `!mid` check with registry-aware logic that
distinguishes "all complete" from "blocked" and "unexpected no active
milestone" (with diagnostic output). Adds midTitle recovery fallback
when title regex strips to empty string.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 11:38:54 +10:00
TÂCHES
3cf6b35b8a Merge pull request #736 from gsd-build/feat/validate-milestone-code
feat(gsd): implement validate-milestone phase and dispatch
2026-03-16 18:57:45 -06:00
Lex Christopherson
09d62e01d1 feat(gsd): implement validate-milestone phase and dispatch
Add a `validating-milestone` phase that runs BEFORE `completing-milestone`
to reconcile planned work against delivered work. The validator checks
success criteria, slice deliverables, cross-slice integration, and
requirement coverage before allowing milestone completion.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 18:46:08 -06:00
TÂCHES
e0c1cc2f9d Merge branch 'main' into feat/gsd-headless-command 2026-03-16 18:44:18 -06:00
TÂCHES
889a2ee137 Merge pull request #755 from jeremymcs/feat/vscode-marketplace
feat(vscode): marketplace-ready files for VS Code extension publishing
2026-03-16 18:43:31 -06:00
TÂCHES
1e951f9648 Merge pull request #718 from jeremymcs/fix/682-vscode-extension-rebase
feat: VS Code extension — rebased with CI + review fixes (#682)
2026-03-16 18:42:30 -06:00
TÂCHES
4d78620ff1 Merge pull request #754 from jeremymcs/fix/forensics-version-loading
fix(forensics): use GSD_VERSION env var instead of package.json path traversal
2026-03-16 18:42:09 -06:00
TÂCHES
0f13a8d59c Merge pull request #752 from jeremymcs/feat/688-guided-discuss-milestone
feat(discuss): structured question rounds in guided-discuss-milestone (#688)
2026-03-16 18:41:37 -06:00
TÂCHES
f4f998efc5 Merge pull request #747 from trek-e/fix/733-bash-ampersand-hang
fix: add anti-pattern rule against bash with & to prevent agent hangs (#733)
2026-03-16 18:38:40 -06:00
TÂCHES
8ca9725bb0 Merge pull request #745 from jeremymcs/fix/737-dependency-range-expansion
fix(roadmap): expand range syntax in depends (S01-S04 → S01,S02,S03,S04)
2026-03-16 18:37:53 -06:00
TÂCHES
a129e15759 Merge pull request #738 from jeremymcs/fix/733-background-command-hang
fix: prevent indefinite hang when LLM uses bare & to background processes (#733)
2026-03-16 18:36:29 -06:00
TÂCHES
3354f6300c Merge pull request #735 from trek-e/fix/699-plan-slice-empty-scaffold
fix: reject empty scaffold plan files in plan-slice artifact verification (#699)
2026-03-16 18:35:38 -06:00
TÂCHES
4df08ad935 Merge pull request #734 from jeremymcs/fix/728-skip-loop-breaker
fix(auto): break infinite skip loop on repeatedly-skipped completed units
2026-03-16 18:35:04 -06:00
TÂCHES
d8ebe1300a Merge pull request #729 from jeremymcs/fix/724-forensics-worktree-awareness
fix: make forensics worktree-aware to prevent stale root misdiagnosis (#724)
2026-03-16 18:34:46 -06:00
TÂCHES
f9c356bfba Merge pull request #730 from gsd-build/feat/validate-milestone-prompt
feat(gsd): add validate-milestone prompt and template
2026-03-16 18:34:29 -06:00
Jeremy McSpadden
6e90e8d83b perf: optimize bg-shell hot path, parallel git queries, lazy workspace validation
- bg-shell/types: add compiled union regexes (ERROR/WARNING/READINESS/BUILD/TEST)
  built once at module load; add LINE_DEDUP_MAX constant (500); add
  stdoutLineCount/stderrLineCount tracked fields to BgProcess; export
  PORT_PATTERN_SOURCE string to avoid .source access per line

- bg-shell/output-formatter: analyzeLine uses union regexes instead of
  .some(p => p.test(line)) across 5 pattern arrays; PORT_PATTERN no longer
  reconstructed via new RegExp() on every line; lineDedup Map now has LRU
  eviction at LINE_DEDUP_MAX entries (prevents unbounded memory growth on
  long-running processes); getHighlights also uses union regexes

- bg-shell/process-manager: addOutputLine increments stdoutLineCount/
  stderrLineCount in O(1) as lines arrive; getInfo uses tracked counters
  instead of two O(n) .filter() passes over the output buffer

- gsd/diff-context: replace execFileSync with async execFile wrapper;
  getRecentlyChangedFiles and getChangedFilesWithContext now run all
  independent git queries concurrently via Promise.all (3-5 serial
  subprocess spawns -> 1 parallel batch)

- gsd/workspace-index: per-slice indexing now runs concurrently via
  Promise.all within each milestone; add IndexWorkspaceOptions with
  validate flag (default false) — validatePlanBoundary/validateCompleteBoundary
  skipped by default since they do expensive content analysis and are only
  needed for explicit doctor/audit flows; getSuggestedNextCommands passes
  validate:true as the sole consumer of validationIssues
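The union-regex and LRU-dedup ideas from the bg-shell bullets can be sketched as follows. The pattern list is hypothetical, and `LineDedup` is an illustrative class, not the project's code. The sketch relies on JavaScript `Map` iterating in insertion order.

```typescript
// Hypothetical pattern list; the real bg-shell arrays are not shown in the commit.
const ERROR_PATTERNS: RegExp[] = [/\berror\b/i, /\bfailed\b/i, /\bexception\b/i];

// Compile several patterns into one alternation once, at module load.
// Every alternative shares the "i" flag here; patterns with differing flags
// cannot be merged this way without changing their behaviour.
function unionRegex(patterns: RegExp[]): RegExp {
  return new RegExp(patterns.map((p) => `(?:${p.source})`).join("|"), "i");
}
const ERROR_RE = unionRegex(ERROR_PATTERNS);

const LINE_DEDUP_MAX = 500;

// Insertion-ordered Map used as an LRU: re-inserting a key moves it to the
// end, so the first key is always the least recently seen line.
class LineDedup {
  private readonly seen = new Map<string, number>();
  private readonly max: number;

  constructor(max: number = LINE_DEDUP_MAX) {
    this.max = max;
  }

  // Returns how many times this line has been recorded, including this call.
  record(line: string): number {
    const count = (this.seen.get(line) ?? 0) + 1;
    this.seen.delete(line);
    this.seen.set(line, count);
    if (this.seen.size > this.max) {
      const oldest = this.seen.keys().next();
      if (!oldest.done) this.seen.delete(oldest.value);
    }
    return count;
  }

  get size(): number {
    return this.seen.size;
  }
}
```

The union regex deliberately has no `g` flag, because `RegExp.prototype.test` on a global regex carries `lastIndex` state between calls.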
2026-03-16 19:11:43 -05:00
Jeremy McSpadden
359deb6c23 fix(forensics): use GSD_VERSION env var instead of package.json path traversal
Extensions run from ~/.gsd/agent/extensions/gsd/ at runtime, not from the
package install directory. The previous code traversed 4 levels up from
import.meta.url to find package.json, which resolves to ~/package.json at
runtime — wrong on every system.

The loader already sets process.env.GSD_VERSION at startup, which is how
every other extension reads the version. Use that instead.
2026-03-16 18:54:30 -05:00
Jeremy McSpadden
67847a6547 fix(ci): use pi.getActiveTools() instead of ctx.getActiveTools()
ExtensionContext in the published package does not have getActiveTools —
it lives on ExtensionAPI (pi). The local source has it on both but CI
typechecks against the installed package, which failed with:

  Property 'getActiveTools' does not exist on type 'ExtensionCommandContext'
2026-03-16 18:44:21 -05:00
Jeremy McSpadden
18aa6b1084 feat(discuss): structured ask_user_questions rounds in guided-discuss-milestone (#688)
guided-discuss-milestone.md was a single-paragraph stub — the agent had
no interview protocol, no check-in round, no depth verification, and no
host-conditional behaviour. On Copilot this meant every clarification
burned a separate request with no structure.

Changes:

- guided-discuss-milestone.md: full interview protocol matching
  guided-discuss-slice structure:
  - mandatory investigation pass before first round
  - 1–3 questions per round
  - check-in after each round (wrap up vs keep going)
  - depth verification checklist before wrap-up
  - host-conditional: uses ask_user_questions when available (pi),
    falls back to plain text when not (Copilot, Cursor, Windsurf)
  - depth_verification question ID convention preserved for the
    write-gate in index.ts

- guided-flow.ts: all 5 loadPrompt('guided-discuss-milestone') call
  sites now pass structuredQuestionsAvailable by checking
  ctx.getActiveTools().includes('ask_user_questions') at dispatch time.
  Passes the string 'true' or 'false' so the prompt can branch conditionally.
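A sketch of the dispatch-time flag. It assumes `getActiveTools()` returns an array of tool names, which the `.includes('ask_user_questions')` call suggests. Per commit 67847a6547 the method lives on the ExtensionAPI object (`pi`). `ToolRegistry` and `structuredQuestionsFlag` are hypothetical names.

```typescript
// Hypothetical minimal view of the extension API object.
interface ToolRegistry {
  getActiveTools(): string[];
}

// The prompt template receives a string, not a boolean, so it can branch on it.
function structuredQuestionsFlag(pi: ToolRegistry): "true" | "false" {
  return pi.getActiveTools().includes("ask_user_questions") ? "true" : "false";
}
```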
2026-03-16 18:39:31 -05:00
Tom Boucher
75a5dd08ad fix: add anti-pattern rule against bash with & to prevent agent hangs (#733)
The bash tool waits for stdout/stderr file descriptors to close. When the
LLM runs 'python -m http.server 8080 &', the backgrounded process inherits
stdout and keeps it open — the bash call hangs indefinitely.

The bg_shell tool exists for exactly this purpose (detached process groups,
readiness detection, lifecycle management). The system prompt already said
to use bg_shell for servers but didn't explicitly warn against bash with &.

Added:
- Explicit anti-pattern: 'Never use bash with & to background a process'
- Expanded background processes section explaining why & hangs
- Both reference bg_shell start as the correct alternative
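To illustrate why `&` alone is not enough, here is a hypothetical heuristic that flags a trailing `&` when stdout or stderr would stay attached to the tool's pipes. It is a sketch, not a shell parser. It only inspects a single trailing `&`, ignores `&` in the middle of a command, and does not handle redirections written without surrounding whitespace.

```typescript
// Hypothetical heuristic: true when a command backgrounds a process with a
// trailing "&" while leaving stdout or stderr attached to the caller's pipes.
function isBareBackground(command: string): boolean {
  const trimmed = command.trim();
  // Only a single trailing "&" counts; "&&" is a logical AND.
  if (!trimmed.endsWith("&") || trimmed.endsWith("&&")) return false;
  const body = trimmed.slice(0, -1);
  // "&> target" redirects both streams at once (bash syntax).
  const both = /(?:^|\s)&>\s*[^&\s]/.test(body);
  // "> target" or "1> target" redirects stdout.
  const stdout = both || /(?:^|\s)1?>\s*[^&\s]/.test(body);
  // "2> target" or "2>&1" redirects stderr.
  const stderr = both || /(?:^|\s)2>/.test(body);
  return !(stdout && stderr);
}
```

Under this check, `python -m http.server 8080 &` is flagged, while the same command with `> /dev/null 2>&1 &` is not. The commit's preferred fix remains bg_shell.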
2026-03-16 19:20:09 -04:00
Jeremy McSpadden
4774a1df22 fix(roadmap): expand range syntax in depends (S01-S04 → S01,S02,S03,S04)
LLMs frequently write depends:[S01-S04] as natural shorthand.
The parser split only on commas, so this produced a single literal
element "S01-S04" that never matched any real slice ID —
permanently blocking the slice with "No slice eligible".

Changes:

roadmap-slices.ts:
- Add expandDependencies() helper — after comma-split, detect dep
  tokens matching /^PrefixN(-|\.\.)PrefixM$/ and expand them to individual
  IDs. Handles S01-S04 (dash range) and S01..S04 (dot range).
  Zero-padding preserved. Mismatched prefixes and reversed ranges
  pass through unchanged.
- Wire into parseRoadmapSlices() after the comma-split step.
- Export for direct testing.
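A sketch of range expansion that follows the rules listed above: dash and dot ranges, preserved zero-padding, and pass-through for mismatched prefixes and reversed ranges. The exact regex is an assumption. The function name matches the commit, but the body is illustrative rather than the project's code.

```typescript
// PrefixN-PrefixM or PrefixN..PrefixM, e.g. S01-S04 or S01..S04.
const RANGE_RE = /^([A-Za-z]+)(\d+)(?:-|\.\.)([A-Za-z]+)(\d+)$/;

function expandDependencies(deps: string[]): string[] {
  const out: string[] = [];
  for (const raw of deps) {
    const dep = raw.trim();
    const m = RANGE_RE.exec(dep);
    if (!m) {
      out.push(dep); // plain ID such as S07
      continue;
    }
    const [, prefixA, startStr, prefixB, endStr] = m;
    const start = Number(startStr);
    const end = Number(endStr);
    // Mismatched prefixes and reversed ranges pass through unchanged.
    if (prefixA !== prefixB || start > end) {
      out.push(dep);
      continue;
    }
    const width = startStr.length; // preserve zero-padding, e.g. "01"
    for (let n = start; n <= end; n++) {
      out.push(prefixA + String(n).padStart(width, "0"));
    }
  }
  return out;
}
```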

doctor.ts:
- Add "unresolvable_dependency" warning code.
- In the slice audit loop, check each dep against the set of known
  slice IDs in the roadmap. Fires a warning with the bad dep name
  and the correct format hint. Catches leftover range IDs on roadmaps
  that were written before this fix, and catches typos.
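A sketch of the audit check. It assumes slices are available as `{ id, depends }` records. `findUnresolvableDependencies` and the message wording are hypothetical, and only the `unresolvable_dependency` code comes from the commit.

```typescript
interface SliceEntry { id: string; depends: string[] }
interface DoctorIssue { code: "unresolvable_dependency"; sliceId: string; message: string }

// Flag every dependency that does not name a slice present in the roadmap.
function findUnresolvableDependencies(slices: SliceEntry[]): DoctorIssue[] {
  const known = new Set(slices.map((s) => s.id));
  const issues: DoctorIssue[] = [];
  for (const slice of slices) {
    for (const dep of slice.depends) {
      if (!known.has(dep)) {
        issues.push({
          code: "unresolvable_dependency",
          sliceId: slice.id,
          message: `${slice.id} depends on unknown slice "${dep}"; use comma-separated IDs, e.g. depends:[S01,S02]`,
        });
      }
    }
  }
  return issues;
}
```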

plan-milestone.md prompt:
- Add explicit rule: use comma-separated depends:[S01,S02,S03], never
  range syntax. Defense-in-depth so LLMs don't generate the problem.

Tests:
- roadmap-slices.test.ts: 10 new expandDependencies cases + 2
  parseRoadmapSlices integration cases (range + comma round-trip).
- doctor.test.ts: unresolvable_dependency fires for unknown dep S99,
  does not fire for valid S01 dep.

952/952 unit tests pass.
Closes #737
2026-03-16 18:09:06 -05:00
Tom Boucher
2756428e6e fix: reject empty scaffold plan files in plan-slice artifact verification (#699)
verifyExpectedArtifact() for plan-slice units only checked whether the
plan file existed on disk, not whether it contained actual task entries.
When a plan file was created as an empty scaffold during discussion/context
(headings but no tasks), the artifact check considered it 'complete' and
skipped the dispatch. Since deriveState still returned phase:'planning'
(no tasks found), this created an infinite skip loop until auto-mode
exhausted its retry budget and stopped silently.

Added a content check that requires at least one task entry matching
the pattern '- [ ] **T##:' or '- [x] **T##:' before considering a
plan-slice artifact valid. This mirrors the existing content-aware
check used for execute-task (which verifies checkbox state).
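The content check can be sketched as a single multiline regex over the plan file. Allowing leading indentation is an assumption, and `planHasTasks` is a hypothetical name.

```typescript
// Matches "- [ ] **T01:" or "- [x] **T01:" at the start of any line.
const TASK_ENTRY = /^[ \t]*- \[[ x]\] \*\*T\d+:/m;

// An empty scaffold (headings only) has no task entries and is rejected.
function planHasTasks(content: string): boolean {
  return TASK_ENTRY.test(content);
}
```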

Added 3 regression tests covering empty scaffold, valid tasks, and
completed tasks.
2026-03-16 19:07:37 -04:00
TÂCHES
d10412bb1e Merge pull request #727 from jeremymcs/fix/723-auto-lock-creation 2026-03-16 17:04:39 -06:00
Jeremy McSpadden
ae4ae8e8d8 fix(auto): add stalled-tool detection and background process prompt guidance
Two additional layers to address #733 (background command hang):

1. Stalled-tool detection in idle watchdog (auto.ts)
   - Change inFlightTools from Set<string> to Map<string, number> to
     track per-tool start timestamps
   - Idle watchdog now compares the oldest in-flight tool's age to the
     idle timeout. Tools in-flight for < idleTimeoutMs continue to
     suppress recovery as before. Tools running >= idleTimeoutMs are
     treated as stuck and recovery proceeds — preventing infinite hang
     when the bash rewrite is bypassed or a tool hangs for other reasons.
   - Export getOldestInFlightToolAgeMs() for testability

2. Prompt guidance in execute-task.md
   - Add explicit "Background process rule" to step 5 explaining why
     bare `command &` hangs the Bash tool and showing the correct
     `command > /dev/null 2>&1 &` pattern
   - Recommends bg_shell tool as the preferred approach

3. Test updates (in-flight-tool-tracking.test.ts)
   - Import and verify getOldestInFlightToolAgeMs export
   - Update header comment to reflect Map-with-timestamps design
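A sketch of the Map-with-timestamps design. The `now` parameters, `markToolStart`, `markToolEnd`, and `shouldSuppressRecovery` are hypothetical additions for testability. Only `inFlightTools` and `getOldestInFlightToolAgeMs` are named in the commit, and the real signatures may differ.

```typescript
// toolCallId -> start timestamp in milliseconds
const inFlightTools = new Map<string, number>();

function markToolStart(id: string, now: number = Date.now()): void {
  inFlightTools.set(id, now);
}

function markToolEnd(id: string): void {
  inFlightTools.delete(id);
}

// Age of the longest-running in-flight tool, or 0 when nothing is in flight.
function getOldestInFlightToolAgeMs(now: number = Date.now()): number {
  let oldest = Infinity;
  for (const start of inFlightTools.values()) oldest = Math.min(oldest, start);
  return oldest === Infinity ? 0 : now - oldest;
}

// Tools younger than the idle timeout suppress recovery; older ones count as stuck.
function shouldSuppressRecovery(idleTimeoutMs: number, now: number = Date.now()): boolean {
  return inFlightTools.size > 0 && getOldestInFlightToolAgeMs(now) < idleTimeoutMs;
}
```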
2026-03-16 18:04:27 -05:00
Jeremy McSpadden
742cd70c9b fix(auto): break infinite skip loop on repeatedly-skipped completed units
When deriveState() keeps returning the same already-completed unit,
the idempotency skip paths in dispatchNextUnit recursively call
themselves forever. The existing MAX_SKIP_DEPTH (20) breaker yields
to the UI but then re-enters the same loop; the hard lifetime counter
(unitLifetimeDispatches) is never reached because skip paths return
before touching it.

Root cause: no per-unit counter on the skip-only path.

Fix:
- Add unitConsecutiveSkips map + MAX_CONSECUTIVE_SKIPS = 3
- Both skip paths (completedKeySet hit, and fallback artifact-exists)
  increment the counter on each skip of the same idempotencyKey
- When the counter exceeds MAX_CONSECUTIVE_SKIPS, evict the key from
  completedKeySet and persisted storage, invalidate state, and let
  deriveState reconcile on the next real dispatch
- Counter resets to 0 for a given key whenever a real dispatch
  proceeds (i.e., past both skip paths)
- Counter fully cleared at all 4 existing clear sites (stopAuto,
  startAuto, crash recovery, pause/resume)
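A simplified sketch of the per-key breaker. It collapses both skip paths into one check against an in-memory set. `shouldSkipCompleted` is hypothetical, and the persisted-storage eviction and state invalidation described above are left out.

```typescript
const MAX_CONSECUTIVE_SKIPS = 3;
// idempotencyKey -> consecutive skips on the skip-only path
const unitConsecutiveSkips = new Map<string, number>();

// Returns true when the unit should be skipped as already complete. After more
// than MAX_CONSECUTIVE_SKIPS skips of the same key, the key is evicted so the
// next dispatch can reconcile state instead of looping forever.
function shouldSkipCompleted(key: string, completedKeySet: Set<string>): boolean {
  if (!completedKeySet.has(key)) {
    unitConsecutiveSkips.set(key, 0); // a real dispatch resets the counter
    return false;
  }
  const skips = (unitConsecutiveSkips.get(key) ?? 0) + 1;
  unitConsecutiveSkips.set(key, skips);
  if (skips > MAX_CONSECUTIVE_SKIPS) {
    // Per the commit, the actual fix also clears persisted storage and invalidates state.
    completedKeySet.delete(key);
    unitConsecutiveSkips.delete(key);
    return false;
  }
  return true;
}
```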

Export _getUnitConsecutiveSkips / _resetUnitConsecutiveSkips /
MAX_CONSECUTIVE_SKIPS for testability (same pattern as
doctor-proactive.ts resetProactiveHealing).

Tests: auto-skip-loop.test.ts — counter mechanics, threshold bounds,
eviction round-trip, per-key isolation (10 assertions).
Closes #728
2026-03-16 17:48:39 -05:00
frizynn
f56b8c69f0 fix: simplify headless flags, add missing imports, document headless mode
- Remove --verbose flag from headless (use --json for detailed output)
- Remove redundant sawToolExecution state variable
- Remove unused rejectCompletion
- Add missing build*Prompt imports in auto.ts (fixes CI typecheck:extensions)
- Document headless mode in README.md and docs/commands.md
- Simplify help text with examples instead of exhaustive command catalog
2026-03-16 19:46:56 -03:00
frizynn
8ddea154e5 feat: redesign gsd headless for full workflow orchestration
Replace --step flag with positional command routing so any /gsd
subcommand can run headlessly. Add /gsd dispatch <phase> for direct
unit-type dispatch (research, plan, execute, complete, reassess, uat,
replan) with state-aware resolution.

Quick commands (status, queue, doctor, etc.) resolve on first agent_end.
Long-running commands (auto, next, dispatch) use idle timer + terminal
notification detection.
2026-03-16 19:45:39 -03:00
frizynn
93ee6646f1 test: add integration test for gsd headless command
End-to-end test that validates the headless CLI subcommand by:
- Creating a temp dir with a complete .gsd/ project fixture
- Spawning `node dist/loader.js headless --step --json`
- Validating exit code, JSONL stdout, stderr progress, and artifact

Supports --dry-run for fixture validation without running the agent.
2026-03-16 19:45:39 -03:00
Jeremy McSpadden
1871da1fb3 fix: use process.ppid instead of PID 1 for cross-platform test
PID 1 (init) exists on Unix but not on Windows, causing the
cross-process detection test to fail in CI. Use process.ppid
(the parent process), which exists on every platform and stays alive
for as long as the test runs.
2026-03-16 17:41:36 -05:00
Lex Christopherson
138a13b620 feat(gsd): add validate-milestone prompt and template
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 16:39:06 -06:00
Jeremy McSpadden
b19357dc84 fix: make forensics worktree-aware to prevent stale root misdiagnosis
When auto-mode runs in an auto-worktree, activity logs are written to
`.gsd/worktrees/<MID>/.gsd/activity/` while forensics only scanned
`.gsd/activity/` at the project root. This caused forensics to report
stale failures from the root while the worktree had already produced
the correct artifacts and advanced to execution.

Changes:

forensics.ts:
- scanActivityLogs() now accepts activeMilestone and scans both the
  worktree activity dir (if an auto-worktree exists) and the root dir
- Results are merged and sorted by mtime so the most recent traces
  from either source appear first
- detectMissingArtifacts() checks both root and worktree paths before
  reporting a missing artifact, preventing false positives
- ForensicReport now includes activeWorktree field for visibility
- Saved report and prompt output include worktree context
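A sketch of the dual-directory scan. Directory discovery is separated from merging so it can run without a filesystem. `activityDirs`, `mergeActivityLogs`, and the `ActivityLog` shape are hypothetical. In real use `worktreeExists` would likely be `fs.existsSync`.

```typescript
import * as path from "node:path";

interface ActivityLog { file: string; mtimeMs: number }

// Worktree activity dir (when an auto-worktree exists for the active
// milestone) first, then the project-root activity dir.
function activityDirs(
  projectRoot: string,
  activeMilestone: string | null,
  worktreeExists: (dir: string) => boolean,
): string[] {
  const rootDir = path.join(projectRoot, ".gsd", "activity");
  if (!activeMilestone) return [rootDir];
  const worktreeDir = path.join(projectRoot, ".gsd", "worktrees", activeMilestone, ".gsd", "activity");
  return worktreeExists(worktreeDir) ? [worktreeDir, rootDir] : [rootDir];
}

// Merge logs from every source so the most recent traces come first.
function mergeActivityLogs(...sources: ActivityLog[][]): ActivityLog[] {
  return ([] as ActivityLog[]).concat(...sources).sort((a, b) => b.mtimeMs - a.mtimeMs);
}
```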

session-forensics.ts:
- getDeepDiagnostic() now checks the worktree activity dir first by
  reading the active milestone ID from STATE.md (synchronous, no
  async deriveState dependency)
- Falls back to root activity dir when no worktree is found
- Added readActiveMilestoneId() helper for sync milestone detection

Closes #724
2026-03-16 17:38:07 -05:00
TÂCHES
1a85853fd8 Merge pull request #725 from gsd-build/fix/screenshot-squish-constraint
fix: prevent full-page screenshots from being squished
2026-03-16 16:37:43 -06:00
Jeremy McSpadden
def96a1b6e fix: write auto.lock at startup and detect remote sessions in dashboard (#723)
Three bugs caused /gsd status to show "No unit running" while auto mode
was actively executing in another terminal:

1. auto.lock was only written during unit dispatch (after newSession()),
   not at auto-mode startup or resume. Any cross-process check between
   startup and first dispatch would find no lock file.

2. The dashboard read only the in-memory `active` flag, which is always
   false in a different process. It never checked auto.lock for
   cross-process detection.

3. The triage dispatch path wrote the lock to `basePath` (worktree)
   instead of `lockBase()` (project root), making it invisible to
   other terminals checking the project root.

Changes:
- Write initial auto.lock immediately in startAuto() and on resume
- Add cross-process detection in getAutoDashboardData() via auto.lock
- Add remoteSession field to AutoDashboardData for cross-process info
- Update dashboard overlay to show remote session status and unit info
- Fix triage dispatch to use lockBase() instead of basePath
- Add 11 tests covering lock creation, cross-process detection, and
  stale lock handling
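A sketch of cross-process lock detection. It assumes auto.lock holds JSON with at least a `pid` field, and that payload shape is hypothetical. The liveness check uses Node's `process.kill(pid, 0)`, which sends no signal and throws `ESRCH` when the process is gone or `EPERM` when it exists under another user.

```typescript
import * as fs from "node:fs";

// Assumed lock payload; the real auto.lock format may differ.
interface AutoLock { pid: number; unitType?: string; unitId?: string; startedAt?: string }

function isProcessAlive(pid: number): boolean {
  try {
    process.kill(pid, 0); // existence/permission check only
    return true;
  } catch (err) {
    // EPERM: the process exists but belongs to another user.
    return (err as { code?: string }).code === "EPERM";
  }
}

// Returns the lock when another live process holds it, or null when the lock
// is missing, unreadable, held by this process, or stale (PID no longer running).
function readRemoteSession(lockPath: string): AutoLock | null {
  let lock: AutoLock;
  try {
    lock = JSON.parse(fs.readFileSync(lockPath, "utf8")) as AutoLock;
  } catch {
    return null;
  }
  if (typeof lock.pid !== "number" || lock.pid === process.pid) return null;
  return isProcessAlive(lock.pid) ? lock : null;
}
```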
2026-03-16 17:36:04 -05:00
Lex Christopherson
5ae08c4ec5 fix: use independent width/height caps for screenshot constraining
Full-page screenshots were being squished into a 1568x1568 square,
making tall pages unreadable. Now caps width at 1568px and height
at 8000px independently, preserving readability for long pages.
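A sketch of independent caps, assuming the resize preserves aspect ratio and never upscales. `constrainScreenshot` is a hypothetical name, and only the 1568 and 8000 limits come from the commit.

```typescript
const MAX_WIDTH = 1568;
const MAX_HEIGHT = 8000;

// One uniform scale factor chosen so neither cap is exceeded; never above 1.
function constrainScreenshot(width: number, height: number): { width: number; height: number } {
  const scale = Math.min(1, MAX_WIDTH / width, MAX_HEIGHT / height);
  return { width: Math.round(width * scale), height: Math.round(height * scale) };
}
```

For a 1280x20000 page this yields 512x8000. A single 1568 cap on both sides with the same aspect-preserving scaling would shrink that page to about 100x1568.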

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 16:23:53 -06:00
TÂCHES
51cf029c96 Merge pull request #717 from domstepek/fix/visualizer-shift-tab
fix(gsd): support Shift+Tab in visualizer
2026-03-16 16:07:40 -06:00
TÂCHES
73b7b0d540 Merge pull request #714 from jeremymcs/fix/701-capture-resolution-execution
fix: execute capture resolutions after triage (#701)
2026-03-16 16:07:08 -06:00
TÂCHES
e185b9e263 Merge pull request #715 from trek-e/docs/698-browser-tools-requirements
feat(browser-tools): add 10 new browser tools (#698)
2026-03-16 16:03:52 -06:00
Dom Stepek
d2917f18b6 fix(gsd): support shift-tab in visualizer 2026-03-16 17:50:50 -04:00
Jeremy McSpadden
48feced87d feat: add VS Code extension scaffold and MCP server compiled module
- Add vscode-extension/ with full MVP scaffold:
  - GsdClient: spawns gsd --mode rpc, JSON line communication
  - @gsd Chat participant: forward messages to agent, stream responses
  - Sidebar panel: connection status, model info, start/stop controls
  - Command palette: gsd.start, gsd.stop, gsd.newSession, gsd.sendMessage
  - Extension config: gsd.binaryPath setting
- Add compiled MCP server module at src/mcp-server.ts for tsc output
- Add MCP server tests verifying module import and instantiation
2026-03-16 16:46:20 -05:00
Tom Boucher
ca299db1c6 feat(browser-tools): add 10 new browser tools (#698)
Implement all features from the browser-tools feature additions proposal:

1. browser_extract — structured data extraction with JSON Schema validation
2. browser_save_state / browser_restore_state — session state persistence
3. browser_generate_test — Playwright test code generation from session
4. browser_mock_route / browser_block_urls / browser_clear_routes — network interception
5. browser_emulate_device — device emulation with 143 Playwright device presets
6. browser_visual_diff — visual regression diffing with baseline management
7. browser_save_pdf — PDF generation (Chromium page.pdf)
8. browser_zoom_region — region capture with upscaling via sharp
9. browser_action_cache — intent→selector caching for repeat visits
10. browser_check_injection — prompt injection detection on page content

Total browser tools: 47 → 60. No new dependencies — uses existing
sharp, ajv, @sinclair/typebox, and Playwright core APIs.
2026-03-16 17:45:11 -04:00
Jeremy McSpadden
c46a4ec484 fix: execute capture resolutions after triage instead of just classifying
Captures classified as inject, replan, or quick-task were marked
"resolved" in CAPTURES.md but their resolution actions were never
executed — tasks were never injected into plans, replan triggers
were never written, and quick-tasks were never dispatched.

This wires up the existing resolution executor functions that were
defined but never called:

- After triage-captures unit completes, executeTriageResolutions()
  reads actionable captures and executes their resolutions:
  - inject: calls executeInject() to add tasks to the slice plan
  - replan: calls executeReplan() to write REPLAN-TRIGGER.md
  - quick-task: queues for dispatch as a new unit type

- Quick-task dispatch block dispatches queued captures one at a time
  using buildQuickTaskPrompt(), with proper session/timeout handling

- New markCaptureExecuted() and loadActionableCaptures() functions
  track execution state, preventing double-execution on retries

- Quick-task unit type excluded from post-unit hooks (lightweight
  one-offs don't need hook chains)
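A sketch of the execute-once flow. Handlers are injected in place of `executeInject`, `executeReplan`, and the quick-task queue, whose real signatures are not shown. The `Capture` shape and the `"note"` classification (standing in for any non-actionable class) are hypothetical.

```typescript
type Classification = "inject" | "replan" | "quick-task" | "note";
interface Capture { id: string; classification: Classification; executed: boolean }

interface ResolutionHandlers {
  inject(c: Capture): void;
  replan(c: Capture): void;
  queueQuickTask(c: Capture): void;
}

// Only resolved-but-unexecuted captures with an actionable classification.
function loadActionableCaptures(all: Capture[]): Capture[] {
  return all.filter((c) => !c.executed && c.classification !== "note");
}

// Returns how many resolutions ran; already-executed captures are skipped on retries.
function executeTriageResolutions(all: Capture[], h: ResolutionHandlers): number {
  let executed = 0;
  for (const capture of loadActionableCaptures(all)) {
    switch (capture.classification) {
      case "inject": h.inject(capture); break;
      case "replan": h.replan(capture); break;
      case "quick-task": h.queueQuickTask(capture); break;
      default: continue;
    }
    capture.executed = true; // stands in for markCaptureExecuted()
    executed++;
  }
  return executed;
}
```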

Closes #701
2026-03-16 16:28:39 -05:00
TÂCHES
da25c0b692 Merge pull request #703 from rangoc/fix/auto-mode-skill-loading
fix(prompts): make skill loading an active directive in auto-mode units
2026-03-16 15:22:50 -06:00
TÂCHES
915112ca1f Merge pull request #710 from jeremymcs/fix/707-execute-task-verification-budget
fix: pass verificationBudget to execute-task prompt template
2026-03-16 15:17:31 -06:00
TÂCHES
f550904724 Merge pull request #708 from jeremymcs/fix/gsd-cleanup-command
fix: handle bare /gsd cleanup command
2026-03-16 15:17:10 -06:00
TÂCHES
bbe665ac04 Merge pull request #702 from ryharrin/fix/gsd-bg-shell-stale-cwd
fix: stop bg-shell from persisting into stale auto-worktree paths
2026-03-16 15:16:38 -06:00
Jeremy McSpadden
b8e6294e6b fix: pass verificationBudget to execute-task prompt template
buildExecuteTaskPrompt() was missing the verificationBudget variable
that the execute-task.md template expects. The prompt-loader's strict
placeholder validator threw on every auto-mode task dispatch, blocking
all execution entirely.

Compute the budget from the executor's context window using the existing
computeBudgets() engine and pass it as a "~NNK chars" format string.
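To show why one missing variable blocked every dispatch, here is a sketch of a strict template fill next to the budget formatter. The `{{name}}` placeholder syntax and both function names are assumptions. The commit does not show the prompt-loader's syntax or computeBudgets()'s signature.

```typescript
// Hypothetical formatter for the "~NNK chars" string.
function formatVerificationBudget(chars: number): string {
  return `~${Math.round(chars / 1000)}K chars`;
}

// Hypothetical strict loader: any placeholder without a value throws.
function fillTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_match, name: string) => {
    if (!Object.prototype.hasOwnProperty.call(vars, name)) {
      throw new Error(`Missing template variable: ${name}`);
    }
    return vars[name];
  });
}
```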

Fixes #707
2026-03-16 16:07:53 -05:00
Jeremy McSpadden
d19e213010 fix: handle bare /gsd cleanup command
Previously, running `/gsd cleanup` without a subcommand (branches or
snapshots) fell through to the unknown command handler, producing a
warning. Now bare `/gsd cleanup` runs both branch and snapshot cleanup.
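A sketch of the subcommand routing, assuming the argument after `cleanup` arrives as an optional string. `resolveCleanupTargets` is a hypothetical name. A `null` result stands for the unknown-command path.

```typescript
type CleanupTarget = "branches" | "snapshots";

// Bare "/gsd cleanup" means both targets; unknown subcommands return null.
function resolveCleanupTargets(sub: string | undefined): CleanupTarget[] | null {
  const arg = sub?.trim();
  if (!arg) return ["branches", "snapshots"];
  if (arg === "branches" || arg === "snapshots") return [arg];
  return null;
}
```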
2026-03-16 16:04:00 -05:00
Ryan Harrington
f87b4938ca fix/gsd-bg-shell-stale-cwd: normalize bg-shell worktree cwd detection 2026-03-16 17:02:58 -04:00