* refactor: replace serial prefs wizard with categorized menu
The /gsd prefs wizard previously dumped 20+ prompts in sequence, which
was overwhelming. This refactors it into a category picker loop where
users select from 7 categories (Models, Timeouts, Git, Skills, Budget,
Notifications, Advanced), configure only what they need, and return to
the menu with updated summaries showing current values at a glance.
- Extract 7 category functions from monolithic handlePrefsWizard
- Add buildCategorySummaries() for current-value display in menu
- Category loop with Save & Exit / Escape to serialize and write
- No logic changes to individual prompts — pure structural refactor
* fix: narrow ctx.ui.select return type for TypeScript strict mode
ctx.ui.select returns string | string[], so startsWith is not available
without narrowing. Extract to string with typeof guard before dispatching.
When /gsd auto is called with no milestone, it delegates to the
discussion flow (showSmartEntry). Previously, if the LLM didn't follow
the discussion protocol — e.g. for simple tasks where it judged the
ceremony overkill and started editing directly — auto mode never
activated. The function returned after showSmartEntry with no retry
or notification, leaving the user in a loop.
Fix: After showSmartEntry returns in both the no-milestone and
pre-planning paths, re-derive state from disk. If the LLM produced
enough artifacts (CONTEXT.md, ROADMAP.md, or advanced the phase),
auto mode proceeds instead of returning. If not, a clear warning
tells the user what happened and what to do next.
This handles the case where the LLM writes files but doesn't follow
the exact discussion → CONTEXT.md → checkAutoStartAfterDiscuss flow.
Adds /gsd help (aliases: h, ?) that displays a grouped reference of
every available subcommand with usage, flags, and shortcuts.
Commands are organized by category: Workflow, Visibility, Course
Correction, Project Knowledge, Configuration, and Maintenance.
Also simplifies the "Unknown command" error to point users to /gsd help
instead of listing all commands inline.
In worktree contexts, the LLM received relative output paths like
`.gsd/milestones/M002/slices/S01/S01-RESEARCH.md` combined with a
working directory containing `.gsd/worktrees/M002`. The double .gsd
in the resulting path confused the LLM, which resolved the relative
path against the project root instead of the worktree — writing
artifacts to the wrong location and triggering loop detection.
All write-target path variables (outputPath, taskSummaryPath,
sliceSummaryPath, milestoneSummaryPath, replanPath, planPath,
uatResultPath, assessmentPath, secretsOutputPath) are now passed
as absolute paths via join(base, relPath), eliminating the need
for the LLM to do path arithmetic in confusing worktree layouts.
Solo developers can fire-and-forget thoughts during auto-mode execution
via /gsd capture. The system triages accumulated captures at natural seams
between tasks, classifies their impact into five types (quick-task, inject,
defer, replan, note), and proposes appropriate action with user confirmation
for plan-modifying resolutions.
Pipeline: capture → triage → confirm → resolve → resume
- /gsd capture appends to .gsd/CAPTURES.md (worktree-aware)
- Triage fires automatically between tasks in handleAgentEnd
- Five resolution types: inline quick task, inject task into plan,
defer for reassess, trigger replan with context, acknowledge as note
- Dashboard overlay shows pending capture count badge
- Capture context injected into replan-slice and reassess-roadmap prompts
- Parse failure falls back to note — pipeline never blocks
New modules: captures.ts, triage-ui.ts, triage-resolution.ts
New prompt: triage-captures.md
52 tests across 3 test files, all passing
Requirements R045-R051 validated
Closes#505
chore: pre-merge cleanup — remove dead code, single-read dashboard optimization
- Remove processTriageResults() and associated types (dead code, superseded by
inline resolution in auto.ts dispatch loop)
- Add countPendingCaptures() for single-read regex count on dashboard hot path
(replaces two-phase hasPendingCaptures + loadPendingCaptures)
- Update triage-dispatch tests to match new implementation
* feat: dynamic model routing for token consumption optimization (#575)
Add complexity-based model routing that classifies units into light/standard/heavy
tiers and routes to cheaper models when appropriate. Reduces token consumption
by 20-50% for users on capped plans.
- Complexity classifier with heuristic-based tier assignment (no LLM call)
- Model router with downgrade-only semantics (user's config is ceiling)
- Budget-pressure-aware routing (more aggressive as budget fills)
- Cross-provider cost comparison via bundled cost table
- Hook classification support
- Escalation on failure (light → standard → heavy)
- Full preference validation and merge support
- Metrics tracking with tier and downgrade fields
- 40 new tests (classifier, router, cost table)
Closes#575
* feat: phases 2-4 — dashboard, adaptive learning, task introspection
Phase 2 — Observability & Dashboard:
- Tier badge [L]/[S]/[H] displayed in progress widget next to phase label
- Dynamic routing savings summary shown in footer when units have been downgraded
- Tier and modelDowngraded fields passed through snapshotUnitMetrics
Phase 3 — Adaptive Learning:
- New routing-history.ts: tracks success/failure per tier per unit-type pattern
- Rolling window of 50 entries per pattern to prevent stale data
- User feedback support (over/under/ok) with 2x weight vs automatic
- Failure rate >20% auto-bumps tier for that pattern
- Tag-specific patterns (e.g. execute-task:docs) for granular learning
- History persists to .gsd/routing-history.json
- Classifier consults adaptive history before finalizing tier
Phase 4 — Task Plan Introspection:
- Code block counting in task plans (5+ blocks → heavy)
- Complexity keyword detection: migration, architecture, security,
performance, concurrency, compatibility
- Multiple complexity keywords (2+) → heavy, single → standard
- New codeBlockCount and complexityKeywords fields in TaskMetadata
Tests: 16 new tests (routing history + introspection), 419 total passing
* test: add feature-branch lifecycle integration test
Proves the core invariant: milestone worktrees branch from and merge
back to the feature branch, never touching main. Covers:
- Full lifecycle with unique milestone IDs (M001-xxxxxx format)
- Untracked .gsd/ planning files copied into worktree
- Multiple successive milestones on the same feature branch
- Main branch completely untouched throughout
* fix: commitCount return type (parseInt)
Update both docs/configuration.md (user-facing) and
src/resources/extensions/gsd/docs/preferences-reference.md (internal)
with complete coverage of all GSD preferences:
- Add /gsd prefs subcommands table (global, project, status, wizard, setup)
- Document token_profile (budget/balanced/quality) and phases settings
- Document context_pause_threshold field
- Document remote_questions configuration (Slack/Discord)
- Document git.merge_strategy (squash/merge) and git.isolation (worktree/branch)
- Expand post_unit_hooks with missing agent field
- Expand pre_dispatch_hooks with skip_if, unit_type, model fields
and action validation rules
- Add known unit types list for hook before/after arrays
- Add examples for pre-dispatch hooks (modify/skip/replace)
- Add examples for token profile, phases, and remote questions
- Update models to show all 6 phases (research, planning, execution,
execution_simple, completion, subagent)
- Add full example combining all major settings
In RPC mode, ctx.ui.custom() returns undefined without emitting any event.
This caused showNextAction() — and all 13+ call sites in guided-flow.ts —
to silently complete without taking action. No error thrown, no event
emitted, command handler returns normally.
Fix: After custom() returns, check for undefined/null and fall back to
ctx.ui.select() which IS implemented in RPC mode. Maps the action list
to select labels and resolves the chosen action id.
Four bugfixes for open issues:
1. Worktree created from integration branch, not main (#606)
- createAutoWorktree reads integration branch from META.json
- mergeMilestoneToMain merges to integration branch, not hardcoded main
- createWorktree accepts optional startPoint parameter
2. Resolve project root from worktree paths in all commands (#608, #602)
- Add resolveProjectRoot() to detect .gsd/worktrees/ in cwd
- All GSD commands use projectRoot() instead of raw process.cwd()
- Fixes stale cwd after milestone completion (#608)
- Fixes discuss/status basepath disagreement (#602)
3. Milestone merge skipped in branch isolation mode (#603)
- Add branch-mode fallback when isInAutoWorktree() is false
- Detects milestone/* branch and performs squash-merge
- Uses same mergeMilestoneToMain flow as worktree mode
4. Remote questions onboarding missing .js module (#592)
- Extract saveRemoteQuestionsConfig into compiled src/ helper
- Avoids cross-boundary import from compiled JS to raw .ts
After milestone completion and merge, the process cwd could remain
inside .gsd/worktrees/<MID>/, causing new milestone writes to land
in the wrong directory.
Three-layer fix:
1. escapeStaleWorktree() at startAuto entry — detects if base path
is inside .gsd/worktrees/ and chdir back to project root
2. stopAuto() unconditionally restores cwd to originalBasePath,
not just when isInAutoWorktree returns true (module state may
have been cleared by mergeMilestoneToMain already)
3. Milestone merge error handler restores cwd on partial failure
where mergeMilestoneToMain chdir'd but then threw
Closes#608
Multiple sources of unbounded memory growth caused V8 to OOM after
~50 minutes of auto-mode operation:
1. activity-log.ts: saveActivityLog serialized ALL session entries
into a single string for SHA1 dedup, allocating hundreds of MB
per unit cycle. Now uses streaming writes (writeSync per entry)
and a lightweight fingerprint (entry count + last 3 entries hash)
instead of full-content hashing.
2. activity-log.ts: activityLogState Map was never cleared between
sessions, accumulating lastSnapshotKeyByUnit entries indefinitely.
Added clearActivityLogState() export, called from stopAuto().
3. auto.ts: completedUnits array grew unbounded for dashboard
display. Now capped at 200 entries and cleared on stopAuto().
4. paths.ts: dirEntryCache and dirListCache Maps grew without bounds
between clearPathCache() calls. Added DIR_CACHE_MAX (200) eviction
— when cache exceeds limit, it's cleared before adding new entries.
Closes#611
* fix: avoid native hangs in gsd auto paths
* fix: use .js extension in edit-diff.test.ts import for tsc compatibility
* fix: prevent OOM on large file diffs and implement context-line windowing
- Add size guard (MAX_DP_CELLS=4M) to buildLineDiff that falls back to a
linear-time prefix/suffix matching algorithm for large files, preventing
the O(n*m) DP table from causing OOM crashes
- Implement contextLines parameter in generateDiffString so only lines
within N lines of a change are rendered (with "..." separators), matching
unified diff behavior — the parameter was previously accepted but ignored
- Add tests for both context windowing and large-file fallback
---------
Co-authored-by: TÂCHES <afromanguy@me.com>
Reduces auto-mode token consumption by 40-60% through coordinated
optimizations driven by a single token_profile preference.
Profile presets (budget/balanced/quality):
- One preference key coordinates model selection, phase skipping,
context compression, and subagent routing
- Balanced is the default for new projects (D046)
- Explicit user preferences always override profile defaults
Phase skipping:
- Guard clauses on research-milestone, research-slice, and
reassess-roadmap dispatch rules
- Skipped phases return null (fall-through), preserving state machine
- Budget profile skips all research + reassess; balanced skips slice
research only
Context compression:
- inlineLevel parameter (full/standard/minimal) on 6 prompt builders
- Minimal: only output template + essential context (≥30% reduction)
- Standard: skip redundant templates
- Full: current behavior unchanged
Complexity routing:
- classifyTaskComplexity() for task plans (step/file/signal heuristics)
- classifyUnitComplexity() for unit types with budget pressure
thresholds at 50/75/90% (from #579)
- execution_simple model config for cheap simple-task routing
- escalateTier() for failure recovery (light→standard→heavy)
Adaptive learning (from #579):
- routing-history.ts tracks success/failure per tier per pattern
- Rolling 50-entry window, 20% failure threshold auto-bumps tier
- User feedback weighted 2x vs automatic detection
- Persists to .gsd/routing-history.json
Budget prediction:
- getAverageCostPerUnitType() + predictRemainingCost() in metrics
- projectedRemainingCost + profileDowngraded in AutoDashboardData
- One-way auto-downgrade within a milestone (D048)
Addresses #575
95 tests across 5 test files, all passing.
* fix: allow stopping auto-mode from a different terminal (#584)
Auto-mode lock file was written to the worktree path instead of the
project root, making it invisible to other processes. Additionally,
/gsd stop only checked in-memory state which is process-local.
- Add lockBase() helper to always write auto.lock at project root
- Add stopAutoRemote() for cross-process stop via SIGTERM
- Update /gsd stop to fall back to lock-file-based remote stop
* fix: handle Windows SIGTERM behavior in stop-auto-remote test
On Windows, SIGTERM is not interceptable by Node.js processes — the
process exits with code 1 rather than running the SIGTERM handler.
Accept either exit code on Windows while still asserting clean exit (0)
on Unix platforms.
cacheKey() used length + first/last 100 chars, which collides when a
checkbox changes [ ] → [x] mid-file (same length, same endpoints).
verifyExpectedArtifact() only cleared the path cache, not the parse
cache, so parseRoadmap() returned stale data with done=false.
- Add clearParseCache() to verifyExpectedArtifact alongside clearPathCache
- Include middle 100-char sample in cacheKey to prevent interior collisions
- Add regression test for the cache collision scenario
* feat: add git.commit_docs setting to keep .gsd/ local-only (#501)
Adds a new `commit_docs` boolean to git preferences. When set to `false`:
- The entire `.gsd/` directory is added to `.gitignore`
- `smartStage()` excludes all `.gsd/` files from commits
- Bootstrap init skips the "chore: init gsd" commit
- `writeIntegrationBranch()` skips committing metadata
- The self-heal that removes blanket `.gsd/` patterns is bypassed
This allows users in corporate environments or mixed teams to use GSD
without polluting the shared git repository with planning artifacts.
Closes#501
* feat: add commit_docs toggle to preferences wizard
Adds "Track .gsd/ planning docs in git" to the /gsd prefs wizard,
allowing users to toggle commit_docs interactively alongside other
git settings like main_branch.
The guided flow's "Create roadmap" path never set pendingAutoStart,
so checkAutoStartAfterDiscuss() always returned false after planning
completed. Auto-mode stalled at "Milestone planned" instead of
proceeding to slice research/execution.
Three fixes:
- Set pendingAutoStart when choice === "plan" in showSmartEntry
- Relax Gate 1 to accept ROADMAP.md (plan path) or CONTEXT.md (discuss path)
- Add STATE.md write instruction to guided-plan-milestone prompt
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(undo): use invalidateAllCaches to prevent stale state after undo
After deleting summary files and modifying PLAN files, only
invalidateStateCache() was called. Path and parse caches remained
stale, causing deriveState() to return incorrect results — showing
undone tasks as still complete.
* perf: optimize hot-path lookups, cache clearing, and error resilience
- Replace O(n) Array.includes() with Set-based O(1) lookups in
persistCompletedKey, findCommitsForUnit, and extractCommitShas
- Skip unnecessary cache invalidation for hook units in
verifyExpectedArtifact (moved clearPathCache after hook early-return)
- Avoid redundant disk writes in removePersistedKey when key not present
- Single-pass partition for conflicted files in reconcileMergeState
instead of two separate filter passes
- Wrap undo git operations in try/finally to guarantee cache
invalidation even on partial failure
- Surface auto-start errors to user via ui.notify instead of
swallowing silently (was debug-only logging)
* fix: use execFileSync for git commands to handle paths with spaces
execSync builds a shell command string via string interpolation, so any
path containing spaces (e.g. 'Current Projects/my-repo') gets word-split
by the shell into multiple arguments. This caused 'git worktree add' to
fail with a usage error whenever the repo was in a directory with spaces.
Switch all three git runner functions to execFileSync, which takes args
as an array and bypasses the shell entirely. Paths are passed as discrete
arguments and never subject to word-splitting or other shell expansions.
Affected files:
- worktree-manager.ts: runGit()
- git-service.ts: runGit()
- native-git-bridge.ts: gitExec()
* fix: restore pre-merge check command execution
* ci: add extension type-checking to CI pipeline and prepublishOnly
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: resolve remaining extension type errors after merge
- Use cred.type === "api_key" for proper union narrowing in loadToolApiKeys
- Fix optional level parameter in provider-error-pause test
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add "success" to notify type union across ExtensionUIContext, interactive
mode, and RPC mode implementations. Fix null safety for readFileSync and
contextUsage.percent in auto.ts. Add discriminated union narrowing for
dispatch results. Add string type guards for select() return values in
commands.ts. Align ProviderErrorPauseUI notify signature. Simplify
AuthStorage return type.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds `/gsd steer <change>` command that registers user overrides in
`.gsd/OVERRIDES.md`. Active overrides are injected into all prompts.
A `rewrite-docs` dispatch unit propagates overrides across plan docs.
Addresses all review concerns from PR #409:
- resolveAllOverrides wired into handleAgentEnd
- Circuit breaker (max 3 attempts, then force-resolve)
- verifyExpectedArtifact validates OVERRIDES.md state
- Milestone-only unitId when no active slice
- Test temp dirs cleaned up
- Remove extraneous argument in auto-worktree report() call
- Replace vitest imports with node:test in integration-mixed-milestones and unique-milestone-ids
- Cast deprecated merge_to_main references as any in preferences-git tests
- Type pre-dispatch hook arrays as PreDispatchHookConfig[] in preferences-hooks tests
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add missing parameters (signal, onUpdate, ctx) to tool execute signatures
and details property to return objects to satisfy AgentToolResult<T> type.
Fix string-to-boolean type mismatch on display property in sendMessage calls.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Export TOOL_KEYS constant and add loadToolApiKeys() function to load
API keys from ~/.gsd/agent/auth.json into environment variables.
Called in session_start handler so tool-based extensions (Context7,
Brave Search, Jina, Tavily, Groq) work immediately without requiring
/gsd config.
* fix(auto): add missing import for resolveSkillDiscoveryMode
Used at line 687 but not imported, causing "resolveSkillDiscoveryMode is
not defined" crash on auto-mode startup.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(auto): add workingDirectory to all auto-mode prompt templates
Six prompt templates (reassess-roadmap, complete-milestone, replan-slice,
run-uat, research-milestone, plan-milestone) were missing the working
directory directive. Without it, the LLM infers the main repo path from
system context and cd's there instead of staying in the worktree. This
causes artifacts to be written to the wrong location, preventing the
dispatch loop from detecting completion and triggering infinite
re-dispatches of the same unit.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(auto): detect mid-session resource updates and stop gracefully
Templates are read from disk on each dispatch but extension code is
loaded once at startup. If resources are re-synced mid-session (via
/gsd:update, npm update, or dev copy-resources), templates may expect
variables the in-memory code doesn't provide, causing a crash.
Add a syncedAt timestamp to managed-resources.json. Auto-mode captures
this at startup and checks before each dispatch. If resources changed,
it stops with a clear message instead of crashing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: add workingDirectory to prompt template test fixtures
Tests that load prompt templates via loadPromptFromWorktree now pass the
workingDirectory variable, matching the updated templates that include
the {{workingDirectory}} directive.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After deleting summary files and modifying PLAN files, only
invalidateStateCache() was called. Path and parse caches remained
stale, causing deriveState() to return incorrect results — showing
undone tasks as still complete.
Four fixes to auto-recovery logic that caused silent failures or
inconsistent state:
1. skipExecuteTask: return false when checkbox regex doesn't match the
plan format, so callers fall through to other recovery strategies
instead of assuming success (lines 252-255)
2. verifyExpectedArtifact: fail verification on corrupt/unparseable
roadmap instead of silently passing. Prevents advancing past an
incomplete complete-slice when the roadmap file is malformed (line 152)
3. removePersistedKey: use atomic tmp+rename write (matching
persistCompletedKey) to prevent completed-units.json corruption
on crash mid-write (line 293)
4. selfHealRuntimeRecords: use verifyExpectedArtifact instead of bare
existsSync for execute-task healing, so tasks with summary but
unchecked plan checkbox aren't incorrectly marked complete (line 374)
Co-authored-by: TÂCHES <afromanguy@me.com>
The progress bar in the auto-mode widget was snapshot-based — only
updated at dispatch time via updateSliceProgressCache(). During
long-running units (especially after the worktree architecture in
PR #506), the bar appeared frozen even as tasks completed on disk.
Add a 5-second interval inside the widget that re-reads the roadmap
and plan files from disk, so slice/task progress reflects reality
without waiting for the next unit dispatch.
Closes#549
#536 changed git.isolation from deprecated to an active setting.
Update the test to verify it passes through correctly instead of
expecting a deprecation warning. Add separate test for the still-
deprecated git.merge_to_main.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>