Addresses state safety issues found during #1062 deep dive:
1. completed-units.json writes in auto-worktree.ts and auto-worktree-sync.ts
used plain writeFileSync which could produce truncated/corrupt files on
crash, losing completion keys and causing unit re-dispatch. Switched to
atomicWriteSync (temp file + rename) for crash safety.
2. Plan file checkbox reconciliation in auto-worktree.ts also switched to
atomicWriteSync to prevent partial PLAN.md writes on crash.
3. db-writer.ts functions (saveDecisionToDb, updateRequirementInDb,
saveArtifactToDb) wrote markdown files via saveFile() without invalidating
caches afterward. Added targeted cache invalidation (state + path + parse)
so deriveState() always sees fresh data. Uses individual invalidation
functions rather than invalidateAllCaches() to avoid clearing the artifacts
table that was just written to.
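For illustration, a minimal sketch of the temp file + rename pattern described in item 1. This is not the actual atomicWriteSync; the helper name, the temp-file naming, and the scratch directory are assumptions.

```typescript
import { mkdtempSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Hypothetical helper: write the full content to a sibling temp file first,
// then rename it over the target. A rename within one filesystem replaces the
// target in a single step, so readers see the old file or the new one, never
// a half-written file.
function atomicWriteSketch(filePath: string, content: string): void {
  const tmpPath = join(dirname(filePath), `.tmp-${process.pid}-${Date.now()}`);
  writeFileSync(tmpPath, content, "utf8");
  renameSync(tmpPath, filePath);
}

// Usage: overwrite a completion-key file twice in a scratch directory.
const scratchDir = mkdtempSync(join(tmpdir(), "atomic-write-"));
const target = join(scratchDir, "completed-units.json");
atomicWriteSketch(target, JSON.stringify(["unit-a"]));
atomicWriteSketch(target, JSON.stringify(["unit-a", "unit-b"]));
const written = readFileSync(target, "utf8");
```

The sketch does not fsync the temp file before the rename, so it covers process crashes rather than power loss.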
The verification gate's discoverCommands() was passing prose descriptions
from task plan Verify: fields through sanitizeCommand(), which only checked
for shell injection characters. English prose like "Document exists, contains
all 5 scale names..." passed the filter and was executed via spawnSync,
causing exit code 127 false negatives.
Added isLikelyCommand() heuristic that distinguishes executable commands
from prose descriptions by checking:
- Known command prefixes (npm, node, tsc, eslint, etc.)
- Path-like first tokens (./script.sh, /usr/bin/check)
- Flag-like tokens (-v, --check)
- Uppercase-initial words with 4+ tokens (prose pattern)
- Comma-space clause separators (prose pattern)
Prose Verify: fields now fall through to package.json scripts or "none"
instead of being executed. Valid commands continue to work as before.
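A rough sketch of how such a heuristic could be ordered. The prefix list, the order of the checks, and the default for unrecognized input are assumptions, not the actual isLikelyCommand() implementation.

```typescript
// Assumed subset of known command prefixes.
const KNOWN_PREFIXES = new Set(["npm", "npx", "node", "tsc", "eslint"]);

function isLikelyCommandSketch(text: string): boolean {
  const trimmed = text.trim();
  if (trimmed === "") return false;
  const tokens = trimmed.split(/\s+/);
  const first = tokens[0] ?? "";

  // Positive signals: known prefix, path-like first token, flag-like token.
  if (KNOWN_PREFIXES.has(first)) return true;
  if (first.startsWith("./") || first.startsWith("/")) return true;
  if (tokens.slice(1).some((t) => /^--?[A-Za-z]/.test(t))) return true;

  // Prose signals: capitalized first word with 4+ tokens, or ", " clauses.
  if (/^[A-Z]/.test(first) && tokens.length >= 4) return false;
  if (trimmed.includes(", ")) return false;

  // Assumption: anything else is treated as a command.
  return true;
}
```

With these rules, "npm run test" and "make lint -v" are accepted, while the prose example quoted above is rejected by the capitalized-first-word check.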
Closes #1066
When a model fails during auto-mode and the fallback chain is exhausted
(or absent), the error recovery path previously fell through to pause
without attempting to restore the session's original model. Meanwhile,
the fallback chain itself was read fresh from disk via
loadEffectiveGSDPreferences(), which could pick up models configured by
a different concurrent GSD session sharing the same global preferences
file.
This adds a session model recovery step between fallback exhaustion and
pause. After the existing fallback chain logic, we now check whether the
current model has diverged from the model captured at auto-mode start
(autoModeStartModel). If so, we restore the session model and retry
before giving up and pausing.
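The divergence check can be sketched roughly as follows. The ModelRef shape and the function name are hypothetical; only the idea of comparing the current model against the one captured at auto-mode start comes from this change.

```typescript
// Hypothetical model reference; the real type may carry more fields.
interface ModelRef {
  provider: string;
  id: string;
}

// Returns the model to restore, or null when recovery does not apply
// (no captured start model, or the current model has not diverged).
function pickRecoveryModelSketch(
  current: ModelRef | null,
  startModel: ModelRef | null,
): ModelRef | null {
  if (startModel === null) return null; // nothing was captured at start
  if (
    current !== null &&
    current.provider === startModel.provider &&
    current.id === startModel.id
  ) {
    return null; // already on the session model, fall through to pause
  }
  return startModel; // diverged: restore the session model and retry
}
```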
Changes:
- auto.ts: export getAutoModeStartModel() getter for the session's
captured start model
- index.ts: add session model recovery block after fallback chain
exhaustion, using the session-scoped model instead of re-reading
global preferences from disk
- model-isolation.test.ts: add 4 tests covering cross-session leakage
detection, divergence checks, and null safety
Two compounding bugs caused auto-mode to re-dispatch run-uat indefinitely
after UAT passed:
1. markSliceDoneInRoadmap regex required dash at line start (^-) but the
roadmap parser accepts optional leading whitespace (^\s*-). When LLMs
indented checklist items, the doctor could never mark them done.
2. After run-uat completed, handleAgentEnd ran doctor with fixLevel:"task"
which explicitly excluded slice-level completion transitions. Since
run-uat is the terminal unit for a slice, the roadmap checkbox stayed
unchecked, causing deriveState to return the same slice indefinitely.
Fix: Update markSliceDoneInRoadmap and markTaskDoneInPlan regexes to
accept leading whitespace (matching the parser), preserving indentation
in the replacement. Add run-uat to the set of unit types that use
fixLevel:"all" in handleAgentEnd closeout.
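A minimal sketch of the whitespace-tolerant replacement. The roadmap line format ("- [ ] **S01: ...**") and the function name are assumptions; the sketch uses [ \t]* rather than \s* so a match cannot span line breaks.

```typescript
// Hypothetical version of the fix: capture the (possibly indented) list
// marker and put it back unchanged, so indentation is preserved.
function markSliceDoneSketch(roadmap: string, sliceId: string): string {
  const escaped = sliceId.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(
    `^([ \\t]*-[ \\t]+)\\[ \\]([ \\t]+\\*{0,2}${escaped}\\b)`,
    "m",
  );
  return roadmap.replace(pattern, "$1[x]$2");
}

const roadmapBefore = "# Roadmap\n  - [ ] **S01: Setup**\n- [ ] **S02: Build**\n";
const roadmapAfter = markSliceDoneSketch(roadmapBefore, "S01");
```

Here the indented S01 line becomes "  - [x] **S01: Setup**" and the S02 line is left untouched.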
The /gsd update command imported compareSemver from ../../../update-check.js,
a relative path that resolves correctly in the source tree
(src/resources/extensions/gsd/ → src/update-check.js) but breaks when
extensions are synced to ~/.gsd/agent/extensions/gsd/ (where ../../../ points
to ~/.gsd/, which has no update-check.js).
This caused the error:
Extension "command:gsd" error: Cannot find module '../../../update-check.js'
Fix: inline a local compareSemverLocal() function in commands.ts, eliminating
the cross-tree import. The function is small (10 lines) and already well-tested
via update-check.test.ts.
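For reference, an inlined comparison of this size could look like the sketch below. It is an assumption about the shape of compareSemverLocal(), not a copy of it, and it only compares the numeric dotted parts.

```typescript
// Hypothetical sketch: returns 1 if a > b, -1 if a < b, 0 if equal.
// Missing or non-numeric parts are treated as 0.
function compareSemverSketch(a: string, b: string): number {
  const pa = a.split(".").map((part) => parseInt(part, 10) || 0);
  const pb = b.split(".").map((part) => parseInt(part, 10) || 0);
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff > 0 ? 1 : -1;
  }
  return 0;
}
```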
* feat(gsd): add directory safeguards to prevent running in system/home paths
GSD previously had no protection against being launched from dangerous
directories like $HOME, /, /usr, or /etc. This adds layered validation:
- Blocked system paths (hard stop): /, /usr, /etc, /var, $HOME, tmpdir, etc.
- High entry count heuristic (>200 entries triggers confirmation dialog)
- Symlink resolution via realpathSync to prevent bypass
- Integrated at three chokepoints: projectRoot(), showSmartEntry(), bootstrapGsdDirectory()
Includes 19 tests covering all blocked categories, boundary conditions, and
the assertSafeDirectory throw/return behavior.
* fix: make directory safeguard tests cross-platform (Windows CI)
- Skip Unix-specific blocked path tests on Windows (/, /usr, /etc, etc.)
- Add Windows-specific blocked path tests (C:\, C:\Windows)
- Use platform-appropriate path separator in trailing slash test
- Fix root path normalization for Windows drive letters (C:\ not C:)
* feat: enhance HTML report with derived metrics, visualizations, and interactivity
Add 13 features to the HTML report generator across 6 implementation waves:
Wave 1 - Summary enhancements:
- Executive summary paragraph with project completion %, cost, and budget context
- ETA calculation based on completion rate and remaining slices
- Cost/slice and Tokens/tool efficiency metrics in KV grid
- Cache hit ratio percentage
- Milestone scope indicator when scoped to a milestone
Wave 2 - Metrics visualizations:
- Cost over time inline SVG area chart with grid lines and axis labels
- Duration by slice bar chart (third chart using existing buildBarChart)
- Budget burndown horizontal stacked bar (spent/projected/overshoot)
- Chart row CSS changed to auto-fit for flexible multi-column layout
Wave 3 - Blockers section:
- New section with card-based layout for blocker verifications and high-risk
incomplete slices, added to sections array and TOC nav
Wave 4 - Gantt chart:
- SVG horizontal bar timeline grouped by slice with done/active/pending
coloring and time axis labels
Wave 5 - Interactive JS features:
- Timeline filter input for text-based row filtering
- Collapsible sections with toggle buttons (localStorage persisted)
- Dark/light theme toggle in header (localStorage persisted)
Wave 6 - Mobile responsiveness:
- 768px and 480px breakpoints with stacked layouts and compressed padding
All changes in a single file (export-html.ts). No data layer changes needed.
30 new tests covering all features and edge cases.
* fix: correct Phase type literal in export-html-enhancements test
Change "execution" to "executing" to match the Phase type definition.
* docs: add Node LTS pinning guide for macOS Homebrew users
New doc (docs/node-lts-macos.md) explains how to pin Node 24 LTS
via Homebrew to avoid running on odd-numbered development releases.
Covers brew install/link/pin, version managers as alternatives,
and verification steps.
Added notice banner in README linking to the guide.
* feat: auto-extract lessons to KNOWLEDGE.md on slice/milestone completion (#711)
Added knowledge extraction steps to completion prompts:
- complete-slice.md step 9: review task summaries for patterns,
gotchas, and non-obvious lessons → append to KNOWLEDGE.md
- complete-milestone.md step 9: review all slice summaries for
cross-cutting insights → append to KNOWLEDGE.md
Combined with the existing execute-task step 13 (which already
tells agents to append discoveries during execution), this creates
a three-layer extraction pipeline: task → slice → milestone.
* feat: auto-create PR on milestone completion (#687)
New git preferences:
- git.auto_pr (boolean, default false): create a PR when a
milestone completes via gh CLI
- git.pr_target_branch (string, default main branch): target
branch for auto-created PRs (e.g. develop, qa, staging)
Implementation:
- GitPreferences: added auto_pr and pr_target_branch fields
- preferences.ts: added validation for both fields
- auto-worktree.ts: after push, pushes milestone branch and
creates PR via 'gh pr create' (non-fatal on failure)
Documentation:
- configuration.md: added fields to git config block, table,
and new git.auto_pr section with requirements and flow
- git-strategy.md: added Automatic Pull Requests section with
Gitflow example config
* fix: add barrel files for remote-questions, ttsr, and shared extensions
Centralizes public API surface for three extension directories behind
index.ts barrel files. External consumers now import from the barrel
instead of reaching into internal module files, reducing coupling and
making future refactors safer.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: rename barrel files to mod.ts to avoid extension loader auto-discovery
The extension loader auto-discovers extensions by looking for index.ts files
inside extensions/*/ directories. remote-questions/ and shared/ are utility
directories, not extensions — their index.ts barrel files caused load failures.
Renamed to mod.ts which the loader ignores, and updated all import paths.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: consolidate frontmatter parsing into shared module
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: strip quotes from frontmatter scalar values
The shared parseFrontmatterMap was missing quote-stripping that the old
rule-loader had, causing 3 test failures in ttsr-rule-loader.test.ts.
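A small sketch of the quote-stripping behavior, assuming flat "key: value" lines. The function names are illustrative and this is not the shared parseFrontmatterMap itself.

```typescript
// Strip one pair of matching single or double quotes around a scalar.
function stripQuotesSketch(value: string): string {
  const v = value.trim();
  if (v.length >= 2) {
    const first = v[0];
    const last = v[v.length - 1];
    if ((first === '"' || first === "'") && first === last) {
      return v.slice(1, -1);
    }
  }
  return v;
}

// Hypothetical flat parser: one "key: value" pair per line.
function parseFrontmatterSketch(block: string): Map<string, string> {
  const result = new Map<string, string>();
  for (const line of block.split("\n")) {
    const idx = line.indexOf(":");
    if (idx <= 0) continue; // skip lines without a key
    result.set(line.slice(0, idx).trim(), stripQuotesSketch(line.slice(idx + 1)));
  }
  return result;
}
```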
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ensurePreconditions() had two branches: create-slice (which included
tasks/) and slice-exists (which conditionally created tasks/). The
conditional path could miss cases where a slice dir was created
manually or by a previous run without the tasks/ subdirectory.
Simplified to: create slice dir if missing, then always check and
create tasks/ unconditionally. Removes the branching that could
leave tasks/ missing.
9 tests that enforce the encapsulation of auto-mode state in AutoSession:
1. No module-level let declarations in auto.ts
2. No module-level var declarations in auto.ts
3. Exactly one AutoSession singleton
4. reset() covers every instance property
5. toJSON() includes key diagnostic properties
6. Module-level consts are only constants/accessors (no mutable state)
7-9. session.ts exports AutoSession with reset() and toJSON()
Added maintenance comments to auto.ts and auto/session.ts explaining
the invariant and linking to these tests. Any PR that adds a module-level
mutable variable to auto.ts will fail CI.
Move scattered timeout and cache-size constants (DEFAULT_COMMAND_TIMEOUT_MS,
DEFAULT_BASH_TIMEOUT_SECS, DIR_CACHE_MAX, CACHE_MAX) into a single
constants.ts module within the GSD extension.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Split RemotePromptRecord into a discriminated union of PendingPromptRecord
(ref is undefined) and DispatchedPromptRecord (ref is required). This
makes the type system enforce that ref is always present after dispatch.
Also removes a redundant truthiness check on dispatch.ref in manager.ts,
since RemoteDispatchResult.ref is already non-optional.
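A sketch of the union shape. Only the type names and the ref field come from this change; the status discriminant and the id field are assumptions for illustration.

```typescript
// Assumed discriminant: a literal "status" field.
interface PendingPromptRecord {
  status: "pending";
  id: string;
  ref?: undefined; // not yet dispatched, so no ref
}

interface DispatchedPromptRecord {
  status: "dispatched";
  id: string;
  ref: string; // required once dispatched
}

type RemotePromptRecord = PendingPromptRecord | DispatchedPromptRecord;

function describeRecordSketch(record: RemotePromptRecord): string {
  if (record.status === "dispatched") {
    // Narrowed to DispatchedPromptRecord: ref is a string, no check needed.
    return `${record.id} -> ${record.ref}`;
  }
  return `${record.id} (pending)`;
}
```

After narrowing on the discriminant, the compiler rejects code that reads ref as a string on a pending record.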
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add descriptive comments to all empty catch blocks explaining why the
error is intentionally swallowed. Covers networkidle timeouts, optional
screenshots, best-effort file writes, response body reads, route
cleanup, and page metadata refreshes.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes a security gap where ask-user-questions errorResult() could leak
tokens in error messages. The sanitizeError function and TOKEN_PATTERNS
are now in shared/sanitize.ts, imported by both manager.ts and
ask-user-questions.ts.
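To illustrate the idea, a sketch of pattern-based redaction. The two patterns and the replacement marker are assumptions; the real TOKEN_PATTERNS in shared/sanitize.ts may differ.

```typescript
// Hypothetical patterns: a Slack-style token and an HTTP bearer credential.
const TOKEN_PATTERNS_SKETCH: RegExp[] = [
  /xox[baprs]-[A-Za-z0-9-]+/g,
  /Bearer\s+[A-Za-z0-9._~+\/-]+=*/g,
];

// Replace every match of every pattern before the message is surfaced.
function sanitizeErrorSketch(message: string): string {
  return TOKEN_PATTERNS_SKETCH.reduce(
    (text, pattern) => text.replace(pattern, "[REDACTED]"),
    message,
  );
}
```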
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cap parallel async operations to prevent memory spikes when processing
large numbers of items:
- session-manager.ts: limit file loading to 10 concurrent reads
- pipeline.ts: limit job execution to 5 concurrent LLM calls
- discovery.ts: limit tool scanning to 5 concurrent scanners
Uses an inline pLimit utility in each file to avoid adding a dependency.
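An inline limiter of this kind can be sketched as below. It is an assumption about the general shape, not the code added to these files; a finished task hands its slot directly to the next waiter so the cap is never exceeded.

```typescript
function pLimitSketch(concurrency: number) {
  let active = 0;
  const waiting: Array<() => void> = [];

  const release = (): void => {
    const wake = waiting.shift();
    if (wake) wake(); // pass the slot to the next waiter
    else active--; // nobody waiting, free the slot
  };

  return async function limit<T>(task: () => Promise<T>): Promise<T> {
    if (active < concurrency) active++;
    else await new Promise<void>((resolve) => waiting.push(resolve));
    try {
      return await task();
    } finally {
      release();
    }
  };
}
```

Usage is along the lines of `const limit = pLimitSketch(10)` followed by `await Promise.all(paths.map((p) => limit(() => load(p))))`, where paths and load are placeholders.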
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add argument completions for /thinking command with all 6 levels
(off, minimal, low, medium, high, xhigh) and descriptions
- Add descriptions to all GSD 2nd-level subcommand completions across
14 subcommand groups (auto, mode, parallel, setup, prefs, remote,
next, history, undo, export, cleanup, knowledge, doctor, dispatch)
- Add 35 new tests for autocomplete and fuzzy matching systems
gsd_generate_milestone_id scans disk for existing milestone dirs.
When called multiple times before any artifacts are written, it
returned the same ID (e.g. M001) every time because no dirs existed
yet.
Added an in-memory reservation set that tracks IDs returned by the
tool. Subsequent calls merge reserved IDs with on-disk IDs before
computing the next sequential ID, ensuring M001, M002, M003 are
returned in sequence even without intermediate disk writes.
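The reservation logic amounts to something like the sketch below. The function name, the M### format check, and passing on-disk IDs as a parameter are assumptions.

```typescript
// IDs already handed out by the tool in this process.
const reservedIdsSketch = new Set<string>();

// Merge on-disk and reserved IDs, then return and reserve the next one.
function nextMilestoneIdSketch(onDiskIds: string[]): string {
  const all = onDiskIds.concat(Array.from(reservedIdsSketch));
  let max = 0;
  for (const id of all) {
    const match = /^M(\d+)$/.exec(id);
    if (match) max = Math.max(max, parseInt(match[1] ?? "0", 10));
  }
  const next = `M${String(max + 1).padStart(3, "0")}`;
  reservedIdsSketch.add(next);
  return next;
}

// Three calls with an empty disk listing.
const firstId = nextMilestoneIdSketch([]);
const secondId = nextMilestoneIdSketch([]);
const thirdId = nextMilestoneIdSketch([]);
```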
* fix: consolidate duplicate formatting functions
- Rename private formatDuration in verification-evidence.ts to
formatDurationSecs to clarify its distinct fixed-seconds behavior
- Replace formatUptime implementation in bg-shell/utilities.ts with
import of shared formatDuration (functionally equivalent for >=1s)
- Replace inline fmt lambda in cli.ts with shared formatTokenCount
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: revert cli.ts import from extensions path that breaks at runtime
The extensions directory uses a separate compilation/bundle path and
cannot be imported from cli.ts. Restores the inline fmt lambda.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(gsd): delete orphaned complexity.ts (superseded by complexity-classifier.ts)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(test): update complexity-routing tests for complexity-classifier.ts
The test file imported from the deleted complexity.ts. Removed tests
for the defunct classifyTaskComplexity function and updated all source
reads to reference complexity-classifier.ts with its actual exports.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(search): consolidate duplicate Brave API helper functions
getBraveApiKey() and braveHeaders() were duplicated across provider.ts,
tool-llm-context.ts, and tool-search.ts. Export both from provider.ts
and import in the tool files.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(test): update provider export count to include braveHeaders
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When the final milestone completed with no queued follow-up, stopAuto()
tore down the worktree with preserveBranch: true but never called
mergeMilestoneToMain(). All work stayed on the milestone branch,
unmerged to main.
Add merge logic to the "all milestones complete" path in
dispatchNextUnit(), mirroring the existing merge handling in the
single-milestone-complete path. Handles both worktree isolation and
branch isolation modes.
Move resolveGitHeadPath() and nudgeGitBranchCache() to worktree.ts as
the canonical shared location. Both auto-worktree.ts and
worktree-command.ts now import from worktree.ts instead of defining
their own copies.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rename browser-tools VerificationCheck/Result to BrowserVerificationCheck/Result
to eliminate name collisions with the semantically different types in gsd/types.ts.
The browser-tools types track UI element state verification (name/passed/value),
while gsd/types.ts types track command execution verification (command/exitCode/stdout).
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
gitFileExec() was missing the GIT_NO_PROMPT_ENV env overlay, which meant
git could prompt for credentials and hang the process on write operations
(add, commit, revert, checkout). Extract the shared constant into
git-constants.ts to avoid duplication between git-service.ts and
native-git-bridge.ts.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously, only rate-limit errors (429/too many requests) triggered
auto-resume with a delay. Server errors (500 Internal Server Error,
503 Service Unavailable, overloaded) paused auto-mode indefinitely,
requiring manual /gsd auto to resume. This was the core complaint:
auto-mode would sit idle until morning.
Now all transient provider errors auto-resume:
- Rate limits: auto-resume after retry-after delay (or 60s default)
- Server errors (500/502/503/overloaded/api_error): auto-resume after 30s
- Permanent errors (auth/billing/quota): still pause indefinitely
New classifyProviderError() function in provider-error-pause.ts provides
structured error classification with suggested delays. The agent_end
handler in index.ts now uses this instead of a simple regex check.
14 new tests cover all error categories and edge cases.
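A simplified sketch of the classification. The result shape, the regexes, and treating unrecognized errors as permanent are assumptions; the delays (60s default for rate limits, 30s for server errors) follow the description above.

```typescript
interface ProviderErrorClassSketch {
  kind: "rate_limit" | "server" | "permanent";
  isTransient: boolean;
  suggestedDelayMs: number | null; // null means pause indefinitely
}

function classifyProviderErrorSketch(
  message: string,
  retryAfterMs?: number,
): ProviderErrorClassSketch {
  const text = message.toLowerCase();
  if (/429|rate.?limit|too many requests/.test(text)) {
    return { kind: "rate_limit", isTransient: true, suggestedDelayMs: retryAfterMs ?? 60000 };
  }
  if (/\b50[023]\b|overloaded|api_error|internal server error|service unavailable/.test(text)) {
    return { kind: "server", isTransient: true, suggestedDelayMs: 30000 };
  }
  // Assumption: auth, billing, quota, and anything unrecognized stay paused.
  return { kind: "permanent", isTransient: false, suggestedDelayMs: null };
}
```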
Files changed:
- provider-error-pause.ts: Add classifyProviderError(), update
pauseAutoForProviderError() to support isTransient flag
- index.ts: Use classifyProviderError() for structured error handling
- tests/provider-error-classify.test.ts: 14 new tests
* Add CI/CD pipeline design spec
Three-stage promotion pipeline (Dev → Test → Prod) using npm dist-tags,
GitHub Environments, Docker images, and an LLM fixture recording system.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: replace ambiguous compound question in reflection step (#963)
The reflection prompt 'Did I get that right, or did I miss something?'
is a compound question where 'yes' maps to both possible answers.
Replaced with 'Does that capture it? If not, tell me what I missed.'
— one closed question plus an instruction, removing ambiguity.
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Allow orchestrators to filter the JSONL event stream to specific event
types, reducing stdout noise. The filter applies only to output —
internal processing (completion detection, supervised mode, answer
injection) is unaffected.
- New `--events <types>` flag (comma-separated, implies `--json`)
- Filter applied at stdout write point, all events still processed internally
- Updated help-text and SKILL.md with examples
- Tests for argument parsing and filter matching logic
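The split between internal processing and filtered output can be sketched as follows. The function names and the message_update event type are illustrative assumptions.

```typescript
// Parse the comma-separated value passed to --events.
function parseEventsFlagSketch(value: string): Set<string> {
  return new Set(
    value
      .split(",")
      .map((t) => t.trim())
      .filter((t) => t.length > 0),
  );
}

// Every event is processed; only matching events are written out.
function handleEventSketch(
  event: { type: string },
  filter: Set<string> | null,
  processed: string[],
  out: string[],
): void {
  processed.push(event.type); // stands in for internal processing
  if (filter === null || filter.has(event.type)) {
    out.push(JSON.stringify(event)); // stands in for the stdout write
  }
}

const eventFilter = parseEventsFlagSketch("agent_end, tool_execution_start");
const processedTypes: string[] = [];
const emittedLines: string[] = [];
handleEventSketch({ type: "message_update" }, eventFilter, processedTypes, emittedLines);
handleEventSketch({ type: "agent_end" }, eventFilter, processedTypes, emittedLines);
```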
* fix(gsd): clear all caches after discuss dispatch so picker sees new CONTEXT files
guided-flow.ts called only invalidateStateCache() after waitForIdle(),
leaving dirEntryCache stale. resolveSliceFile("CONTEXT") missed files
written during the discuss session, keeping the just-discussed slice
recommended and preventing the allDiscussed exit gate from firing.
Swap to invalidateAllCaches() at both call sites (discuss loop and
queue reorder), matching the pattern used throughout auto.ts.
Fixes #977
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(gsd): remove STATE.md update instructions from all prompts (#978)
STATE.md is a derived runtime file rebuilt by auto.ts after every unit.
Prompts telling LLMs to update it caused git staging errors when the
LLM ran `git add .gsd/` and hit the gitignore barrier, leaving sessions
stuck. Removed redundant STATE.md instructions from 13 prompts, made
commit instructions explicit about which files to stage, and switched
bootstrap `nativeAddPaths` to `nativeAddAll` to respect .gitignore in
the CLI fallback path.
Closes #978
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When the verification gate fails with retries remaining, handleAgentEnd
sets pendingVerificationRetry and deletes the completion key, but then
returns early without calling dispatchNextUnit. This leaves auto-mode
active but permanently stalled — no new unit is ever dispatched because
the dispatch chain is broken.
Fix: call dispatchNextUnit immediately after setting up the retry state,
with a fallback to the dispatch gap watchdog if dispatch throws.
* feat(ux): group model list by provider in /gsd prefs wizard
The model selection in configureModels() was a single flat alphabetical
list that became unwieldy with many providers. This groups models under
provider headers (e.g. "─── anthropic (24) ───") so users can quickly
scan to the right provider section.
Changes:
- ExtensionSelectorComponent: add SEPARATOR_PREFIX convention for
non-selectable group headers. Navigation skips separators, Enter
ignores them, and they render with borderAccent styling.
- configureModels: build grouped option list with provider headers
instead of flat map.
- New test: extension-selector-separator.test.ts (6 tests).
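A sketch of the grouping and separator-skipping contract. The prefix value, the model shape, and the sort order are assumptions; isSeparator and nextSelectable mirror the helper names used in the tests.

```typescript
const SEPARATOR_PREFIX_SKETCH = "───"; // assumed value

function isSeparator(option: string): boolean {
  return option.startsWith(SEPARATOR_PREFIX_SKETCH);
}

// Move from `from` in `direction`, skipping separators; stay put if none.
function nextSelectable(options: string[], from: number, direction: 1 | -1): number {
  let index = from + direction;
  while (index >= 0 && index < options.length) {
    const option = options[index];
    if (option !== undefined && !isSeparator(option)) return index;
    index += direction;
  }
  return from;
}

// Build a flat option list with one header per provider.
function groupByProviderSketch(models: Array<{ provider: string; id: string }>): string[] {
  const groups = new Map<string, string[]>();
  for (const model of models) {
    const ids = groups.get(model.provider) ?? [];
    ids.push(model.id);
    groups.set(model.provider, ids);
  }
  const options: string[] = [];
  for (const provider of Array.from(groups.keys()).sort()) {
    const ids = (groups.get(provider) ?? []).slice().sort();
    options.push(`${SEPARATOR_PREFIX_SKETCH} ${provider} (${ids.length}) ${SEPARATOR_PREFIX_SKETCH}`);
    options.push(...ids);
  }
  return options;
}
```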
* fix(ci): use node:test instead of vitest in extension-selector test
tsconfig.extensions.json does not include vitest type declarations,
causing TS2307 in CI. Rewrite test to use node:test + node:assert
to match the convention used by other extension tests.
* fix(ci): rewrite separator test to avoid TS parameter property issue
The extension-selector component transitively imports countdown-timer.ts
which uses TypeScript parameter properties (private tui: TUI). Node's
--experimental-strip-types cannot handle these, causing
ERR_UNSUPPORTED_TYPESCRIPT_SYNTAX in CI.
Rewrite the test to verify the separator detection logic and model
grouping contract without importing the component. Duplicates the
isSeparator/nextSelectable helpers and tests them directly.
Pre-supply answers and secrets for non-interactive headless runs via a
declarative JSON file. Two main use cases:
1. Provide secrets that today get lost in headless mode (secure_env_collect
returns null in RPC mode). Secrets are injected as env vars into the
RPC child process.
2. Override default auto-responses when the first option isn't desired.
Uses two-phase correlation: observe tool_execution_start events for
question metadata, then match extension_ui_request events by title to
look up pre-supplied answers. Out-of-order events are buffered with a
500ms timeout.
Coexists with --supervised: injector tries first, then supervised mode,
then auto-responder.
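The precedence order can be sketched as a simple responder chain. The Responder type and the sample answers are hypothetical; only the ordering (injector, then supervised mode, then auto-responder) comes from the description above.

```typescript
// A responder returns an answer for a question title, or undefined to pass.
type Responder = (title: string) => string | undefined;

// Try each responder in order and return the first answer found.
function resolveAnswerSketch(title: string, responders: Responder[]): string | undefined {
  for (const responder of responders) {
    const answer = responder(title);
    if (answer !== undefined) return answer;
  }
  return undefined;
}

// Hypothetical pre-supplied answers keyed by question title.
const preSupplied = new Map<string, string>([["Deploy target", "staging"]]);
const injector: Responder = (title) => preSupplied.get(title);
const supervised: Responder = () => undefined; // no supervisor attached here
const autoResponder: Responder = () => "first-option";
const chain = [injector, supervised, autoResponder];
```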