handleSteer used process.cwd() as the base path for appendOverride,
which writes to project/.gsd/OVERRIDES.md. When auto-mode runs in a
worktree, it reads from worktree/.gsd/ — so overrides written from a
second terminal were never seen by the agent.
Now checks for an active worktree via getAutoWorktreePath and writes
the override there when one exists, falling back to the project root
when no worktree is active.
Closes #3476
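The base-path decision can be sketched as a pure function. This is a minimal sketch, assuming `getAutoWorktreePath` can be modeled as a lookup that returns the active worktree path or null. `resolveOverridesPath` and `WorktreeLookup` are hypothetical names, not the actual handleSteer code.

```typescript
import { join } from "node:path";

// Hypothetical model of getAutoWorktreePath: active worktree path, or null/undefined when none.
type WorktreeLookup = (projectRoot: string) => string | null | undefined;

/** Pick the OVERRIDES.md that the running agent actually reads. */
export function resolveOverridesPath(projectRoot: string, getWorktreePath: WorktreeLookup): string {
  const worktree = getWorktreePath(projectRoot);
  // Prefer the active worktree; fall back to the project root when none is active.
  const base = worktree ? worktree : projectRoot;
  return join(base, ".gsd", "OVERRIDES.md");
}
```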
The codebase preferences block was accepted as a known key but never
validated or assigned in validatePreferences(), causing all user-configured
codebase defaults to be silently discarded. Adds validation for
exclude_patterns (string[]), max_files (positive int), and collapse_threshold
(positive int) with unknown-key warnings and 4 new tests.
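A sketch of validating the `codebase` block along the lines described above. The function name, return shape, and warning wording are assumptions for illustration; only the three keys and their types come from the commit.

```typescript
interface CodebasePrefs {
  exclude_patterns?: string[];
  max_files?: number;
  collapse_threshold?: number;
}

const KNOWN_CODEBASE_KEYS = new Set(["exclude_patterns", "max_files", "collapse_threshold"]);

function isPositiveInt(v: unknown): v is number {
  return typeof v === "number" && Number.isInteger(v) && v > 0;
}

// Hypothetical helper: keep valid fields, warn on anything unknown or mistyped.
export function validateCodebasePrefs(raw: Record<string, unknown>): { value: CodebasePrefs; warnings: string[] } {
  const value: CodebasePrefs = {};
  const warnings: string[] = [];
  for (const key of Object.keys(raw)) {
    if (!KNOWN_CODEBASE_KEYS.has(key)) warnings.push(`codebase: unknown key "${key}"`);
  }
  const patterns = raw.exclude_patterns;
  if (patterns !== undefined) {
    if (Array.isArray(patterns) && patterns.every((p) => typeof p === "string")) value.exclude_patterns = patterns;
    else warnings.push("codebase.exclude_patterns must be a string array");
  }
  for (const key of ["max_files", "collapse_threshold"] as const) {
    const v = raw[key];
    if (v === undefined) continue;
    if (isPositiveInt(v)) value[key] = v;
    else warnings.push(`codebase.${key} must be a positive integer`);
  }
  return { value, warnings };
}
```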
Add configurable codebase map options via preferences.md (exclude_patterns,
max_files, collapse_threshold), expose --collapse-threshold as a CLI flag,
and auto-generate CODEBASE.md during project init for instant agent orientation.
Closes #3509
The ensureDbOpen catch block now logs via logWarning with the error message
instead of a structured diagnostic object. Update the source-level assertion
to match the new pattern.
Update existing workflow-logger tests to use logError for audit
persistence assertions (warnings are now ephemeral). Add void
expression to empty catch blocks in detectMainBranch to satisfy
the no-empty-catch CI check.
Only persist error-severity entries to audit-log.jsonl (warnings stay
ephemeral in stderr + buffer). Sanitize persisted entries with message
truncation and context field allowlisting. Demote expected main/master
branch probe failures to silent control flow. Remove JSON.stringify of
diagnostic objects embedding cwd/paths in warning messages.
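The persistence filter can be sketched as one function applied before writing to audit-log.jsonl. The entry shape, the 500-character limit, and the allowlisted field names are hypothetical placeholders, not the values the workflow-logger actually uses.

```typescript
type Severity = "warn" | "error";

interface LogEntry {
  ts: string;
  severity: Severity;
  component: string;
  message: string;
  context?: Record<string, unknown>;
}

const MAX_MESSAGE_CHARS = 500; // hypothetical limit
const CONTEXT_ALLOWLIST = new Set(["unitId", "milestoneId", "phase"]); // hypothetical field names

/** Returns the sanitized entry to persist, or null when it should stay ephemeral. */
export function toPersistedEntry(entry: LogEntry): LogEntry | null {
  if (entry.severity !== "error") return null; // warnings stay in stderr + buffer only
  const message =
    entry.message.length > MAX_MESSAGE_CHARS ? entry.message.slice(0, MAX_MESSAGE_CHARS) + "…" : entry.message;
  let context: Record<string, unknown> | undefined;
  if (entry.context) {
    context = {};
    // Drop any field not explicitly allowed (e.g. cwd or absolute paths).
    for (const [k, v] of Object.entries(entry.context)) if (CONTEXT_ALLOWLIST.has(k)) context[k] = v;
  }
  return { ...entry, message, context };
}
```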
Addresses Codex adversarial review findings on workflow-logger migration.
workflow-events.ts: stop logging raw event line content to audit log —
log byte length only to avoid persisting potentially sensitive payload
fragments to .gsd/audit-log.jsonl.
parallel-orchestrator.ts: revert worker NDJSON parse failure to silent
drop — non-JSON lines (progress text, tool output) are expected in
worker stdout and logging each one creates I/O pressure and audit log
bloat in the parallel execution hot path.
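A sketch of the silent-drop behavior for worker stdout. `parseWorkerStdout` is a hypothetical name, and it assumes one JSON event per line; the real orchestrator may buffer partial lines across chunks.

```typescript
/** Parse NDJSON worker output, silently skipping non-JSON lines. */
export function parseWorkerStdout(chunk: string): unknown[] {
  const events: unknown[] = [];
  for (const line of chunk.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed) continue;
    try {
      events.push(JSON.parse(trimmed));
    } catch {
      // Progress text and tool output are expected here; logging each one would bloat the audit log.
      continue;
    }
  }
  return events;
}
```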
Replace raw process.stderr.write(), console.error(), and empty catch
blocks across 50 GSD files with structured logWarning/logError calls
from the centralized workflow-logger system.
Add 13 new LogComponent types to cover all subsystems: recovery,
session, prompt, dashboard, timer, worktree, command, parallel, fs,
bootstrap, guided, registry, renderer.
Every migrated catch block now automatically:
- Shows in terminal (stderr) with component tag
- Gets buffered for auto-loop stuck-detection summary
- Persists to .gsd/audit-log.jsonl for post-mortem analysis
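The three behaviors above can be sketched as a small logger with injectable sinks. `createWorkflowLogger`, the `[gsd:component]` tag format, and the `drain()` method are assumptions for illustration; only logWarning, logError, and the component names come from the commits.

```typescript
// Illustrative subset of LogComponent values.
type LogComponent = "recovery" | "session" | "worktree" | "db" | "dispatch";

interface Entry {
  severity: "warn" | "error";
  component: LogComponent;
  message: string;
}

export function createWorkflowLogger(sinks: { stderr: (line: string) => void; persist: (e: Entry) => void }) {
  const buffer: Entry[] = []; // read by the auto-loop stuck-detection summary
  const log = (severity: Entry["severity"], component: LogComponent, message: string) => {
    const entry: Entry = { severity, component, message };
    sinks.stderr(`[gsd:${component}] ${message}\n`); // terminal, tagged by component
    buffer.push(entry);
    sinks.persist(entry); // audit-log.jsonl in the real system
  };
  return {
    logWarning: (c: LogComponent, m: string) => log("warn", c, m),
    logError: (c: LogComponent, m: string) => log("error", c, m),
    // Return and clear everything buffered since the last drain.
    drain: (): Entry[] => buffer.splice(0, buffer.length),
  };
}
```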
Update regression test to verify catch blocks use workflow-logger
instead of raw stderr/console, covering auto-mode files and all
explicitly migrated infrastructure files.
Closes #3506
Supersedes the approach in #3496
- Stop/backtrack guard now calls pauseAuto before marking captures executed,
and returns break on any exception to prevent silently dropping user halt intent
- Backtrack target parsing excludes current milestone ID and rejects ambiguous
multi-target strings instead of guessing first match
- Fixed gsd_skip_slice parameter names in rethink prompt (milestone_id → milestoneId)
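The stricter backtrack-target parsing can be sketched as follows. The `M` plus three-or-more digits ID format is an assumption based on IDs like M032 elsewhere in this log, and `parseBacktrackTarget` is a hypothetical name.

```typescript
const MILESTONE_ID = /\bM\d{3,}\b/g; // assumed milestone ID format, e.g. M032

/** Return the single milestone to backtrack to, or null when none or several are named. */
export function parseBacktrackTarget(text: string, currentMilestoneId: string): string | null {
  const candidates = new Set(
    Array.from(text.matchAll(MILESTONE_ID), (m) => m[0]).filter((id) => id !== currentMilestoneId),
  );
  // Exactly one distinct target; otherwise refuse to guess.
  return candidates.size === 1 ? [...candidates][0] : null;
}
```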
Auto-mode has empty catch blocks across 11 files that silently
swallow errors. When these operations fail (DB writes, git commands,
file sync, worktree operations), the error is lost and downstream
systems see stale or inconsistent state — leading to stuck loops,
phantom milestones, and silent data loss.
Replace every empty catch with a process.stderr.write() call that
logs the operation context and error message. Format:
gsd [filename]: <operation> failed: <error.message>
For catches already annotated with /* non-fatal */ or /* best-effort */
comments, the logging is added alongside the annotation to preserve
the original intent while making failures observable.
Adds a regression test that scans all auto-mode source files and
asserts no empty catch blocks remain.
Files modified (11):
auto-worktree.ts, auto.ts, auto-recovery.ts, auto-prompts.ts,
auto-dashboard.ts, auto-start.ts, auto-timers.ts, auto-post-unit.ts,
auto-dispatch.ts, auto-unit-closeout.ts, auto/phases.ts
No behavioral changes — only diagnostic output added.
Addresses #3348, addresses #3345
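The regression check for empty catch blocks can be approximated with a regex scan. This is a heuristic sketch, not the actual test: it treats comment-only bodies such as `/* non-fatal */` as empty, and it can be fooled by `catch` appearing inside strings.

```typescript
// Matches `catch {}` / `catch (e) {}` whose body holds only whitespace or comments.
const EMPTY_CATCH = /catch\s*(?:\([^)]*\))?\s*\{(?:\s|\/\*[\s\S]*?\*\/|\/\/[^\n]*)*\}/g;

/** Return the 1-based line numbers of empty catch blocks in a source file. */
export function findEmptyCatches(source: string): number[] {
  const lines: number[] = [];
  for (const m of source.matchAll(EMPTY_CATCH)) {
    // Convert the match offset to a line number for a readable failure message.
    lines.push(source.slice(0, m.index ?? 0).split("\n").length);
  }
  return lines;
}
```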
Exercises the goNextOrSubmit → notes auto-open path to ensure:
- Enter after typing a note advances instead of looping
- Empty notes still trigger the auto-open
- Normal option selection is unaffected
Fixes #3502
goNextOrSubmit() unconditionally reopened the notes field whenever the
cursor sat on the "None of the above" slot, even after the user had
already typed a note and pressed Enter. This trapped users in an
endless loop where Enter always bounced back to notes mode.
Add a `!states[currentIdx].notes` guard so the auto-open only fires
when notes are still empty.
Fixes #3502
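The guard reduces to a small predicate. The `QuestionState` shape and `shouldAutoOpenNotes` name are hypothetical; only the `!states[currentIdx].notes` condition comes from the commit.

```typescript
interface QuestionState {
  selectedIndex: number;
  notes: string;
} // hypothetical shape

/** Auto-open notes only on "None of the above" and only while notes are still empty. */
export function shouldAutoOpenNotes(states: QuestionState[], currentIdx: number, noneOfTheAboveIndex: number): boolean {
  const state = states[currentIdx];
  return state.selectedIndex === noneOfTheAboveIndex && !state.notes;
}
```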
Add 102 integration tests across two new files covering state machine
edge cases, runtime failures, and boundary conditions not exercised
by the existing live-validation suite.
Closes #3498
* feat: add stop/backtrack capture classifications for milestone regression (#3487)
Adds 4-layer methodology for halting auto-mode and backtracking to
previous milestones when captures indicate the user wants to stop or
that a milestone missed critical features:
1. Type layer: "stop" and "backtrack" classification types in captures.ts
2. Guard layer: pre-dispatch stop check in runGuards() pauses auto-mode
before the next unit dispatches
3. Resolution layer: executeBacktrack() writes BACKTRACK-TRIGGER.md and
milestone regression markers for state machine detection
4. Protection layer: revertExecutorResolvedCaptures() detects and reverts
captures silenced by non-triage agents (resolved without classification)
Also adds fast-path stop detection in auto-post-unit.ts that pattern-matches
pending capture text for stop keywords without waiting for triage.
Closes #3487
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add slice-level skip with gsd_skip_slice tool (#3477)
Adds "skipped" as a closed status alongside "complete" and "done":
- status-guards.ts: isClosedStatus() recognizes "skipped"
- state.ts: isStatusDone() recognizes "skipped"
- gsd-db.ts: getActiveSliceFromDb() skips slices with status "skipped"
- db-tools.ts: new gsd_skip_slice tool for rethink and manual use
- rethink.md: added "Skip a slice" operation to rethink prompt
- rethink.ts: buildRethinkData shows skipped slice counts
Skipped slices satisfy dependencies for downstream slices, allowing
auto-mode to advance past them. Slice data is preserved for reference.
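A sketch of how treating "skipped" as closed lets dependents proceed. The `Slice` shape and `nextActiveSlice` are hypothetical; the real logic lives in status-guards.ts and gsd-db.ts.

```typescript
const CLOSED_STATUSES = new Set(["complete", "done", "skipped"]);

export function isClosedStatus(status: string): boolean {
  return CLOSED_STATUSES.has(status);
}

interface Slice {
  id: string;
  status: string;
  dependsOn: string[];
} // hypothetical shape

/** First slice that is still open and whose dependencies are all closed (skipped counts as closed). */
export function nextActiveSlice(slices: Slice[]): Slice | undefined {
  const closed = new Set(slices.filter((s) => isClosedStatus(s.status)).map((s) => s.id));
  return slices.find((s) => !closed.has(s.id) && s.dependsOn.every((d) => closed.has(d)));
}
```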
Relates to #3477
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: resolve 4 issues found in adversarial review of PR #3488
1. triage-ui.ts: Restore stop/backtrack entries in CLASSIFICATION_LABELS
and ALL_CLASSIFICATIONS — the Record<Classification, ...> type requires
all union members, and runtime lookups would crash on stop/backtrack.
Also auto-confirm stop/backtrack in the triage confirmation flow
(matching the triage-captures.md prompt directive).
2. triage-resolution.ts: Replace require("node:fs") in clearBacktrackTrigger
with ESM import of unlinkSync — consistent with the rest of the codebase.
3. auto-post-unit.ts: Anchor STOP_PATTERN regex to start-of-string (^) to
prevent false positives on captures like "add a pause button" or "stop
the timer from re-rendering" which are feature descriptions, not halt
directives.
4. status-guards.test.ts: Add missing test case for isClosedStatus("skipped")
to cover the new status value.
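Item 1 relies on `Record<K, V>` requiring every key of the union. A minimal illustration with a hypothetical subset of the Classification union; only "stop" and "backtrack" are taken from the commit.

```typescript
// Hypothetical subset of the Classification union.
type Classification = "quick-task" | "defer" | "note" | "stop" | "backtrack";

// tsc rejects this object if any union member is missing, so stop/backtrack
// cannot be dropped from the labels without a build error.
const CLASSIFICATION_LABELS: Record<Classification, string> = {
  "quick-task": "Quick task",
  defer: "Defer",
  note: "Note",
  stop: "Stop auto-mode",
  backtrack: "Backtrack to milestone",
};

const ALL_CLASSIFICATIONS = Object.keys(CLASSIFICATION_LABELS) as Classification[];

// stop/backtrack are auto-confirmed and skip the confirmation prompt.
export function needsConfirmation(c: Classification): boolean {
  return c !== "stop" && c !== "backtrack";
}
```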
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: update tool-naming test count for gsd_skip_slice
The new gsd_skip_slice tool (no alias) brings the total from 29 to 30.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* docs: add context optimization design spec, implementation plan, and pi-layer research
- Spec: 6-change design for GSD extension context optimization
- Plan: 9-task TDD implementation plan with exact file paths and code
- Pi-layer doc: 10 infrastructure opportunities (research only, not planned)
Part of #3171, #3406, #3452, #3433.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat(context): add observation masking for auto-mode sessions
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(context): add phase handoff anchors for auto-mode
Introduces PhaseAnchor read/write utilities so downstream agents can
inherit decisions, blockers, and intent written at phase boundaries
without re-inferring from conversation history.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(context): add capability-aware model routing and context management preferences
Implement ADR-004 Phase 2 capability scoring with 7-dimension model
profiles, task requirement vectors, and weighted scoring. Add
ContextManagementConfig preferences for observation masking thresholds.
Wire capability scoring into auto-model-selection dispatch path.
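Weighted capability scoring can be sketched as a dot product between a model profile and a task requirement vector. The seven dimension names below are hypothetical placeholders; ADR-004's actual dimensions and weighting are not given here.

```typescript
// Hypothetical 7 dimensions; the real ADR-004 names may differ.
const DIMENSIONS = ["reasoning", "coding", "speed", "context", "toolUse", "instruction", "cost"] as const;
type Dimension = (typeof DIMENSIONS)[number];
type Vector = Record<Dimension, number>;

/** Requirement values act as weights on the model's capability profile. */
export function scoreModel(profile: Vector, requirements: Vector): number {
  let score = 0;
  for (const d of DIMENSIONS) score += profile[d] * requirements[d];
  return score;
}

export function pickModel(profiles: Record<string, Vector>, requirements: Vector): string | undefined {
  let best: string | undefined;
  let bestScore = -Infinity;
  for (const [id, profile] of Object.entries(profiles)) {
    const s = scoreModel(profile, requirements);
    if (s > bestScore) {
      best = id;
      bestScore = s;
    }
  }
  return best;
}
```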
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat(context): wire observation masking, phase anchors, and tool truncation
Register observation masker in before_provider_request hook to replace
old tool results with placeholders during auto-mode. Add tool result
truncation (configurable via context_management.tool_result_max_chars).
Inject phase handoff anchors into prompt builders so downstream phases
inherit decisions from research/planning. Write anchors after successful
phase completion. Update ADR-004 status to Implemented.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: remove internal planning artifacts from PR
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: add capability routing, observation masking, and context management
Update dynamic-model-routing.md with capability-aware scoring section.
Update token-optimization.md with observation masking, tool truncation,
and phase handoff anchor documentation. Update configuration.md with
context_management preference block and capability_routing flag.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Merge branch 'main' into feat/gsd-context-optimization
* fix: add context_management to known keys and prevent tool truncation state corruption
- Add missing 'context_management' to KNOWN_PREFERENCE_KEYS set so users
don't get spurious unknown-key warnings when configuring it.
- Replace in-place mutation of tool result content with immutable spread
to prevent corrupting shared conversation message objects.
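The immutable truncation can be sketched as follows, using the `role: "toolResult"` / `TextContent[]` message shape described for the pi-ai payload. The simplified `Message` type and the truncation marker text are assumptions.

```typescript
interface TextContent {
  type: "text";
  text: string;
}
interface Message {
  role: string;
  content: string | TextContent[];
} // simplified, hypothetical

export function truncateToolResults(messages: Message[], maxChars: number): Message[] {
  return messages.map((m) => {
    if (m.role !== "toolResult" || !Array.isArray(m.content)) return m;
    // Build new objects instead of mutating: the originals may be shared with the live conversation.
    const content = m.content.map((block) =>
      block.text.length > maxChars
        ? { ...block, text: block.text.slice(0, maxChars) + `\n[truncated ${block.text.length - maxChars} chars]` }
        : block,
    );
    return { ...m, content };
  });
}
```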
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add stop and backtrack to triage-ui classification labels
The Classification type gained stop and backtrack variants from main
but triage-ui.ts was not updated, causing a TypeScript build failure.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: context masker and tool truncation operate on correct pi-ai message format
The observation masker and tool result truncation in before_provider_request
were checking m.type === "toolResult" but the actual pi-ai payload uses
m.role === "toolResult" with content as TextContent[] arrays (not strings).
bashExecution messages are converted to {role:"user"} by convertToLlm before
the hook fires, so checking m.type === "bashExecution" was a no-op.
- Fix context-masker to match on role, handle array content, detect bash
results by their "Ran `" prefix
- Fix register-hooks truncation to operate on role:"toolResult" with
array content blocks
- Update tests to use correct pi-ai LLM payload format
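A sketch of the corrected masker matching: `role: "toolResult"` messages plus user messages that start with the "Ran `" bash prefix. The `keepRecent` policy and placeholder text are hypothetical; the real masker is driven by context_management thresholds.

```typescript
interface TextContent {
  type: "text";
  text: string;
}
interface Message {
  role: string;
  content: string | TextContent[];
}

const PLACEHOLDER: TextContent = { type: "text", text: "[older tool output masked]" }; // hypothetical text

function isToolOutput(m: Message): boolean {
  if (m.role === "toolResult") return true;
  // bashExecution arrives already converted to role "user"; detect it by its "Ran `" prefix.
  if (m.role === "user") {
    const first = typeof m.content === "string" ? m.content : m.content[0]?.text ?? "";
    return first.startsWith("Ran `");
  }
  return false;
}

/** Replace all but the most recent `keepRecent` tool outputs with a placeholder, without mutation. */
export function maskOldObservations(messages: Message[], keepRecent: number): Message[] {
  const toolIdx = messages.flatMap((m, i) => (isToolOutput(m) ? [i] : []));
  const masked = new Set(toolIdx.slice(0, Math.max(0, toolIdx.length - keepRecent)));
  return messages.map((m, i) => (masked.has(i) ? { ...m, content: [PLACEHOLDER] } : m));
}
```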
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Move regression tests and override tests from standalone files into
the existing test files introduced by PR #666:
- resolve-config-value.test.ts: add REGRESSION #666 describe block
and setAllowedCommandPrefixes override tests
- url-utils.test.ts: add REGRESSION #666 describe block and
setFetchAllowedUrls override tests
- Delete: regression-666.test.ts, resolve-config-value-override.test.ts,
url-utils-override.test.ts
Same 59 tests, fewer files, tests live next to the code they test.
Two regression tests that prove the bug introduced by PR #666:
1. Non-default credential tool (sops) is silently blocked by the
hardcoded SAFE_COMMAND_PREFIXES with no way to override.
2. Private IP URL is silently blocked by isBlockedUrl() with no
way to allowlist.
Both tests use dynamic import to check for the override functions,
so they run cleanly on both main (where they fail) and this branch
(where they pass). Verified in a git worktree of main.
PR #666 introduced hardcoded SAFE_COMMAND_PREFIXES and SSRF URL
blocklists with no override mechanism. Users with non-standard
credential tools (sops, doppler, age, infisical) or needing to fetch
from internal URLs (self-hosted docs, VPN services) were silently
blocked with no recourse.
Add two global-only settings (ignored in project-level settings.json
to preserve the security property against malicious repos):
- allowedCommandPrefixes: replaces the built-in command allowlist
- fetchAllowedUrls: exempts hostnames from SSRF blocking
Both also support env var overrides (GSD_ALLOWED_COMMAND_PREFIXES,
GSD_FETCH_ALLOWED_URLS) for CI/container environments. Env vars
take precedence over settings.json.
Security model: global-only keys are stripped from project settings
at load time via stripGlobalOnlyKeys(), applied at all three
assignment points for this.projectSettings. The merge function
stays untouched — no future caller can accidentally skip stripping.
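A sketch of the stripping and precedence rules. The `Settings` shape and `resolveAllowedCommandPrefixes` are hypothetical, and the comma-separated env var format is an assumption; only the key names, env var names, and precedence order come from the commit.

```typescript
interface Settings {
  allowedCommandPrefixes?: string[];
  fetchAllowedUrls?: string[];
  [key: string]: unknown;
}

const GLOBAL_ONLY_KEYS = ["allowedCommandPrefixes", "fetchAllowedUrls"] as const;

/** Remove keys a project-level settings.json must not be able to set. */
export function stripGlobalOnlyKeys(project: Settings): Settings {
  const copy: Settings = { ...project };
  for (const key of GLOBAL_ONLY_KEYS) delete copy[key];
  return copy;
}

// Env var beats global settings, which beat the built-in list. Comma-separated format is assumed.
export function resolveAllowedCommandPrefixes(
  env: Record<string, string | undefined>,
  global: Settings,
  builtIn: string[],
): string[] {
  const fromEnv = env.GSD_ALLOWED_COMMAND_PREFIXES;
  if (fromEnv !== undefined) return fromEnv.split(",").map((s) => s.trim()).filter(Boolean);
  return global.allowedCommandPrefixes ?? builtIn;
}
```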
15 new tests covering override behavior, cache invalidation,
allowlist exemptions, and global-only enforcement.
Keep catch-all STREAM_RE from PR; upstream's 5-variant whack-a-mole is
superseded by the /in JSON at position \d+/ pattern. Also drop the now-
stale comment about checking stream before server/connection (no longer
needed since catch-all avoids those false-positive overlaps).
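The catch-all is a single regex test against the error message. `isStreamParseError` is a hypothetical wrapper; note that V8's JSON.parse error wording varies across versions, so not every parse failure message contains "in JSON at position".

```typescript
// Catch-all for truncated or garbled streamed JSON.
const STREAM_RE = /in JSON at position \d+/;

export function isStreamParseError(message: string): boolean {
  return STREAM_RE.test(message);
}
```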
Satisfies CI require-tests gate by adding a test that verifies the
comprehensive pre-merge cleanup (step 7b) removes stale SQUASH_MSG and
MERGE_MSG files — the enhancement over the prior MERGE_HEAD-only cleanup.
https://claude.ai/code/session_01SSHD9RNwVGNxAJZEgNZpgZ
Main already had a simpler step 7c (removing only MERGE_HEAD). The PR's
step 7b is more thorough: it also removes SQUASH_MSG and MERGE_MSG,
matching the existing post-merge cleanup pattern. Replace 7c with 7b.
https://claude.ai/code/session_01SSHD9RNwVGNxAJZEgNZpgZ
Self-contained extension at src/resources/extensions/ollama/ that
auto-detects a running Ollama instance, discovers locally pulled models,
and registers them as a first-class provider with zero configuration.
Features:
- Auto-discovery of local models via /api/tags on session_start
- Capability detection (vision, reasoning, context window) for 40+ model families
- /ollama slash command with status, list, pull, remove, ps subcommands
- ollama_manage LLM-callable tool for agent-driven model operations
- Onboarding flow with auto-detect (no API key required)
- Non-blocking async probe — doesn't delay TUI paint
- Respects OLLAMA_HOST env var for non-default endpoints
Core changes (minimal):
- Add "ollama" to KnownProvider in pi-ai types
- Add "ollama" key resolution in env-api-keys.ts
- Add "ollama" default model in model-resolver.ts
- Add "Ollama (Local)" to onboarding wizard with probe flow
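A sketch of endpoint resolution and discovery against Ollama's `/api/tags` endpoint, which lists local models under `models[].name`. The function names and the `FetchLike` type are hypothetical; adding an `http://` scheme to bare `host:port` values is an assumption about how OLLAMA_HOST is normalized.

```typescript
// Ollama's default listen address; OLLAMA_HOST overrides it.
const DEFAULT_OLLAMA_URL = "http://localhost:11434";

type FetchLike = (url: string) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

export function resolveOllamaBaseUrl(env: Record<string, string | undefined>): string {
  const host = env.OLLAMA_HOST?.trim();
  if (!host) return DEFAULT_OLLAMA_URL;
  // Assumption: accept bare "host:port" by defaulting to http, and drop trailing slashes.
  const withScheme = /^https?:\/\//.test(host) ? host : `http://${host}`;
  return withScheme.replace(/\/+$/, "");
}

/** List locally pulled model names; callers can pass the global fetch. */
export async function discoverModels(baseUrl: string, fetchImpl: FetchLike): Promise<string[]> {
  const res = await fetchImpl(`${baseUrl}/api/tags`);
  if (!res.ok) return [];
  const body = (await res.json()) as { models?: { name: string }[] };
  return (body.models ?? []).map((m) => m.name);
}
```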
A pre-existing MERGE_HEAD (from a failed prior merge, the libgit2 native path,
or external tooling) blocks git merge --squash. Remove stale merge state
files before starting the squash merge, not just after.
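A sketch of the pre-merge cleanup described in the step 7b and MERGE_HEAD commits above. `clearStaleMergeState` is a hypothetical name; it assumes `gitDir` is the per-worktree git directory (as reported by `git rev-parse --git-dir`), since merge state files live there rather than in the shared common directory.

```typescript
import { existsSync, rmSync } from "node:fs";
import { join } from "node:path";

// Files an interrupted merge or squash merge can leave behind.
const STALE_MERGE_FILES = ["MERGE_HEAD", "SQUASH_MSG", "MERGE_MSG"];

/** Remove stale merge state before `git merge --squash`; returns the names removed. */
export function clearStaleMergeState(gitDir: string): string[] {
  const removed: string[] = [];
  for (const name of STALE_MERGE_FILES) {
    const file = join(gitDir, name);
    if (existsSync(file)) {
      rmSync(file, { force: true });
      removed.push(name);
    }
  }
  return removed;
}
```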
When DB was available but empty, deriveState skipped deriveStateFromDb
entirely, bypassing the disk→DB sync logic. Milestones created outside
the DB write path were never discovered.
- Check HEAD~1 (newest snapshot) instead of resetTarget (pre-snapshot
base) for remote ancestry. The old check false-positived when the
remote was at the pre-snapshot base but snapshots were local-only.
- Re-run smartStage() after soft reset so RUNTIME_EXCLUSION_PATHS
apply to the absorbed commit. Without this, .gsd/ state files from
snapshot commits leaked into the real commit.
Adds a safety mechanism that detects uncommitted changes idle past a
configurable threshold (default: 30 min), auto-snapshots tracked files
using `git add -u`, and cleans up snapshot commits when real work lands.
- New `stale_uncommitted_changes` doctor issue with auto-snapshot fix
- Detection in health widget (60s), pre-dispatch gate, and /gsd doctor
- `nativeAddTracked()` stages only tracked files (no secrets/binaries)
- `absorbSnapshotCommits()` squashes `gsd snapshot:` commits into next
real autoCommit via soft reset + re-commit
- Configurable via `stale_commit_threshold_minutes` preference (0=off)
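The two pure decisions behind this feature can be sketched separately from git. `isStale` and `countTipSnapshots` are hypothetical helpers; the count is how many commits a `git reset --soft HEAD~n` would need to absorb before re-committing, assuming subjects come newest-first (e.g. from `git log --format=%s`).

```typescript
const SNAPSHOT_PREFIX = "gsd snapshot:";

/** True when uncommitted changes have been idle past the threshold; 0 disables the check. */
export function isStale(lastChangeMs: number, nowMs: number, thresholdMinutes = 30): boolean {
  if (thresholdMinutes <= 0) return false;
  return nowMs - lastChangeMs > thresholdMinutes * 60_000;
}

/** Count consecutive snapshot commits at the tip, given commit subjects newest-first. */
export function countTipSnapshots(subjectsNewestFirst: string[]): number {
  let n = 0;
  for (const subject of subjectsNewestFirst) {
    if (!subject.startsWith(SNAPSHOT_PREFIX)) break;
    n++;
  }
  return n;
}
```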
- Add extension-manifest.ts and extension-sort.ts to pi-coding-agent
with manifest reading and Kahn's BFS topological sort algorithm
- Add extensionPathsTransform hook to DefaultResourceLoader that runs
between path merging and loadExtensions() — enables pre-load
filtering and reordering without modifying pi internals
- Wire GSD's buildResourceLoader() to provide a transform that:
1. Filters ALL extensions (including community) through the GSD registry
2. Sorts in topological dependency order via sortExtensionPaths()
- Mark discoverAndLoadExtensions() as @deprecated (dead code path)
- Add 16 tests covering manifest reading, dependency sorting, cycles,
missing deps, and non-array deps
Previously, dependencies.extensions in manifests was decorative (sort
existed but was never called), and gsd extensions disable only worked
for bundled extensions. Community extensions in ~/.gsd/agent/extensions/
bypassed the registry entirely.
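Kahn's algorithm over manifest dependencies can be sketched as below. The manifest shape and `sortExtensions` are hypothetical, not the actual sortExtensionPaths(). This version skips unknown dependencies, ignores non-array `dependencies.extensions`, and reports extensions on or behind a cycle separately.

```typescript
interface ExtensionManifest {
  id: string;
  dependencies?: { extensions?: unknown };
} // hypothetical shape

export function sortExtensions(manifests: ExtensionManifest[]): { order: string[]; cyclic: string[] } {
  const ids = new Set(manifests.map((m) => m.id));
  const inDegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const m of manifests) {
    inDegree.set(m.id, 0);
    dependents.set(m.id, []);
  }
  for (const m of manifests) {
    const raw = m.dependencies?.extensions;
    // Non-array deps are ignored; deps on extensions that are not present are skipped.
    const deps = Array.isArray(raw) ? raw.filter((d): d is string => typeof d === "string" && ids.has(d)) : [];
    for (const dep of new Set(deps)) {
      dependents.get(dep)!.push(m.id);
      inDegree.set(m.id, inDegree.get(m.id)! + 1);
    }
  }
  // BFS queue of extensions with no unmet dependencies, in input order.
  const queue = manifests.map((m) => m.id).filter((id) => inDegree.get(id) === 0);
  const order: string[] = [];
  while (queue.length > 0) {
    const id = queue.shift()!;
    order.push(id);
    for (const next of dependents.get(id)!) {
      const d = inDegree.get(next)! - 1;
      inDegree.set(next, d);
      if (d === 0) queue.push(next);
    }
  }
  // Whatever never reached in-degree 0 sits on (or behind) a cycle.
  const cyclic = manifests.map((m) => m.id).filter((id) => !order.includes(id));
  return { order, cyclic };
}
```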
- Health widget: always-on last commit with relative time + message
- Dashboard: move worktree/branch info to right-aligned line under header
- Dashboard: move last commit to bottom-left with hints on right
- Dashboard: cap task titles at 45 chars, commit messages at 65 chars
- Dashboard: use … instead of ... for all truncation
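A sketch of the truncation helper, assuming the cap (45 or 65) includes the single-character ellipsis. Lengths here are UTF-16 code units, so strings with emoji or other astral characters may be cut mid-character.

```typescript
/** Cap `text` at `max` characters, with "…" as the last character when truncated. */
export function truncate(text: string, max: number): string {
  if (text.length <= max) return text;
  return text.slice(0, Math.max(0, max - 1)) + "…";
}
```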
70 tests covering all 16 phases of the GSD state machine with both
happy-path and failure-mode verification. Exercises DB and filesystem
derivation paths, reconciliation logic, and edge cases.
Findings documented in #3276: 0-byte SUMMARY triggers false completion,
DB task rows missing causes wrong phase, stale path cache across
derivations, non-standard status strings silently accepted.
* refactor(state): centralize pipeline logging through workflow logger
Route 15 raw process.stderr.write calls through the structured
workflow logger (logWarning/logError). Adds "db" and "dispatch"
as new LogComponent values. Enables auto-loop drain/summarize,
audit-log persistence, and doctor integration for reconciliation
and DB events that previously bypassed structured logging.
Files changed:
- workflow-logger.ts: add "db" and "dispatch" components
- state.ts: 3 reconciliation calls → logWarning/logError
- gsd-db.ts: 4 DB operation calls → logError
- workflow-reconcile.ts: 3 event merge calls → logWarning/logError
- auto-dispatch.ts: 1 reactive dispatch call → logError
- auto-post-unit.ts: 3 triage/rogue calls → logWarning/logError
* test(workflow-logger): add tests for db and dispatch log components
Cover the new LogComponent values added in this refactor to satisfy
the CI require-tests gate.
* feat(model-routing): enable dynamic routing by default
Change defaultRoutingConfig().enabled from false to true so that
dynamic model routing (tier-based downgrading for light/standard
tasks) is active out of the box. Users can still disable it via
dynamic_routing.enabled: false in PREFERENCES.md.
This is a behavioral change: sessions that previously used the
configured model for all tasks will now automatically downgrade
to cheaper models for light and standard complexity tasks.
* test(model-routing): verify dynamic routing enabled by default
Tests that defaultRoutingConfig returns enabled: true and all
routing features are active.
The catch block in reconcileMergeState silently swallowed all nativeCommit
exceptions, including real failures (permissions, corrupt git state, hook
rejections). This caused auto-mode to report success and return true (dirty,
re-derive) even when the merge commit actually failed, leading to an infinite
loop where auto-mode repeatedly attempted worktree finalization.
Now the catch block logs the error via ctx.ui.notify at "error" level and
returns false to signal that reconciliation failed, allowing upstream logic
to react appropriately. The nativeCommit return value is also checked —
a null return (nothing to commit) gets its own info notification distinct
from a successful commit SHA.
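The three outcomes can be sketched with an injected commit function. `runMergeCommit`, `CommitFn`, and the outcome strings are hypothetical; the commit only states that a thrown error maps to an error notification and `false`, and that a null return gets its own info notification.

```typescript
type Notify = (message: string, level: "info" | "error") => void;
// Hypothetical signature: new commit SHA, or null when there is nothing to commit.
type CommitFn = (message: string) => string | null;
type CommitOutcome = "committed" | "nothing-to-commit" | "failed";

export function runMergeCommit(commit: CommitFn, notify: Notify, message: string): CommitOutcome {
  let sha: string | null;
  try {
    sha = commit(message);
  } catch (err) {
    // Permissions, corrupt git state, or hook rejections: surface the failure instead of reporting success.
    notify(`Merge commit failed: ${err instanceof Error ? err.message : String(err)}`, "error");
    return "failed";
  }
  if (sha === null) {
    notify("Merge state reconciled; nothing to commit", "info");
    return "nothing-to-commit";
  }
  notify(`Merge committed (${sha.slice(0, 7)})`, "info");
  return "committed";
}
```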
Closes #2542
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
When GSD_MILESTONE_LOCK is set (parallel worker mode), smartStage() now
excludes .gsd/milestones/<M>/ directories for all milestones other than the
locked one. This prevents a parallel worker (e.g., M033) from staging and
committing fabricated artifacts for a milestone it does not own (e.g., M032).
Previously, smartStage() ran `git add -A` with only runtime path exclusions,
allowing cross-milestone pollution when workers share the same .gsd/ directory
(git.isolation: "none"). The GSD_MILESTONE_LOCK env var only filtered what
deriveState() sees but did not prevent file staging.
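The exclusion list can be built as git `:(exclude)` pathspecs. `milestoneExclusionPathspecs` is a hypothetical helper; the lock would come from `process.env.GSD_MILESTONE_LOCK`, and the result could be appended after a positive pathspec, e.g. `git add -A -- . <specs...>`.

```typescript
/** Pathspecs excluding every milestone directory except the locked one; none when unlocked. */
export function milestoneExclusionPathspecs(allMilestones: string[], lockedMilestone: string | undefined): string[] {
  if (!lockedMilestone) return [];
  return allMilestones
    .filter((id) => id !== lockedMilestone)
    .map((id) => `:(exclude).gsd/milestones/${id}`);
}
```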
Closes #1991
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
On Windows, child_process.spawn() and execFile() open a visible console
window by default. The web server spawn, RPC bridge, browser opener, and
all 15 web service subprocess calls were missing windowsHide: true,
causing constant console window flashing when running gsd --web.
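The fix amounts to passing `windowsHide: true` in the child_process options. The wrapper names below are hypothetical; the option itself is part of Node's `spawn` and `execFileSync` APIs and only has an effect on Windows.

```typescript
import { execFileSync, spawn } from "node:child_process";

// windowsHide: true suppresses the console window Windows would otherwise open.
export function runHidden(file: string, args: string[]): string {
  return execFileSync(file, args, { encoding: "utf8", windowsHide: true });
}

export function spawnHidden(file: string, args: string[]) {
  return spawn(file, args, { stdio: "pipe", windowsHide: true });
}
```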
Closes #2628
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>