Kills two independent failure paths causing the recurring dispatch loop bug:
Path B: dispatchNextUnit() called clearPathCache() but not clearParseCache(),
allowing stale parsed roadmap data (with [ ] instead of [x]) to persist
through the doctor→dispatch transition.
Path A: handleAgentEnd() never verified whether the just-completed unit
produced its expected artifact before re-entering the dispatch loop.
handleAgentEnd() now verifies the artifact and then persists the completion
key, so the idempotency check in dispatchNextUnit() skips already-completed units.
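A minimal sketch of how the two fixes fit together. It assumes simplified in-memory caches. `UnitDispatcher`, `pathCache`, `parseCache`, `unitKey`, and the `artifactExists` callback are hypothetical stand-ins for the real auto-mode state, not the actual implementation.

```typescript
// Hypothetical stand-ins for the real caches and completion store.
type Unit = { type: string; id: string };

class UnitDispatcher {
  readonly pathCache = new Map<string, string[]>(); // cached directory listings
  readonly parseCache = new Map<string, unknown>(); // cached parsed roadmap files
  readonly completedKeySet = new Set<string>(); // mirrors completed-units.json
  readonly dispatched: string[] = [];
  private readonly deriveNext: () => Unit | null;

  constructor(deriveNext: () => Unit | null) {
    this.deriveNext = deriveNext;
  }

  private unitKey(u: Unit): string {
    return `${u.type}/${u.id}`;
  }

  dispatchNextUnit(): Unit | null {
    // Path B: clear BOTH caches so a stale "[ ]" parse cannot survive.
    this.pathCache.clear();
    this.parseCache.clear();
    const unit = this.deriveNext();
    if (unit === null) return null;
    // Idempotency check: returning null stands in for skipping the unit.
    if (this.completedKeySet.has(this.unitKey(unit))) return null;
    this.dispatched.push(this.unitKey(unit));
    return unit;
  }

  // Path A: persist the completion key only after the artifact is verified.
  handleAgentEnd(unit: Unit, artifactExists: (u: Unit) => boolean): boolean {
    if (!artifactExists(unit)) return false;
    this.completedKeySet.add(this.unitKey(unit));
    return true;
  }
}
```

Even if state derivation keeps returning the same unit, the persisted key stops a second dispatch.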
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The loop detection in dispatchNextUnit stops auto-mode when a unit has
been dispatched MAX_UNIT_DISPATCHES (3) times. Previously, only
execute-task had reconciliation logic to check whether the artifact
actually exists on disk before bailing. All other unit types
(complete-slice, plan-slice, research-slice, etc.) would immediately
stop — even if the Nth attempt successfully produced the artifact.
This is a race between the dispatch counter and disk verification:
the counter increments at dispatch time, but artifact verification
only runs during closeout of the NEXT unit. If the last allowed
attempt succeeds, the counter is already at the limit when the next
dispatch tries to run, and nobody checks disk state.
Reproduction scenario:
1. complete-slice dispatched 3 times (LLM missed writing UAT on
attempts 1-2, succeeded on attempt 3)
2. Attempt 3 produces both SUMMARY and UAT — auto-committed to disk
3. Dispatch 4 fires: prevCount (3) >= MAX_UNIT_DISPATCHES (3)
4. No disk check for complete-slice → pipeline stops with
'Expected artifact not found' despite artifacts existing
Fix: add a general verifyExpectedArtifact() check after the
execute-task-specific reconciliation and before the final bail-out.
If artifacts exist on disk, clear the counter and advance. If not,
same error as before — no behavior change for genuinely stuck units.
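A sketch of the guard ordering described above. It assumes a per-unit count map and injects `verifyExpectedArtifact` as a callback so the example stays self-contained. `checkLoopGuard` and the `Decision` values are hypothetical names, and the execute-task-specific reconciliation step is omitted.

```typescript
// MAX_UNIT_DISPATCHES matches the value quoted above; everything else is illustrative.
const MAX_UNIT_DISPATCHES = 3;

type Decision = "dispatch" | "advance" | "stop";

function checkLoopGuard(
  unitKey: string,
  dispatchCounts: Map<string, number>,
  verifyExpectedArtifact: (key: string) => boolean,
): Decision {
  const prevCount = dispatchCounts.get(unitKey) ?? 0;
  if (prevCount < MAX_UNIT_DISPATCHES) {
    dispatchCounts.set(unitKey, prevCount + 1);
    return "dispatch";
  }
  // The last allowed attempt may have succeeded: check disk before bailing.
  if (verifyExpectedArtifact(unitKey)) {
    dispatchCounts.delete(unitKey); // clear the counter and advance
    return "advance";
  }
  return "stop"; // genuinely stuck: same error as before
}
```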
When loop-recovery or self-repair reconciliation succeeds (artifacts exist on
disk), the dispatch counter is reset but the unit is never marked complete in
completed-units.json. If deriveState() continues returning the same unit, the
cycle repeats indefinitely: 3 dispatches → stuck detection → reconciliation
→ counter reset → 3 more dispatches...
This was observed in production burning $93.87 on 103 dispatches of a single
already-completed task over 4.9 hours.
Changes:
1. Persist completed key (persistCompletedKey + completedKeySet.add) in both
the loop-recovery and self-repair success paths, so the idempotency check
at the top of dispatchNextUnit prevents re-dispatch.
2. Add invalidateStateCache() after reconciliation writes to ensure the next
deriveState() call sees fresh disk state.
3. Add a hard lifetime dispatch counter (unitLifetimeDispatches) that survives
counter resets from reconciliation paths. Caps any single unit at 6 total
dispatches across all reconciliation cycles.
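The interaction between the resettable per-cycle counter and the lifetime cap can be sketched as follows. The `unitLifetimeDispatches` name and the limits of 3 and 6 come from the message above. The `DispatchCounters` class and its methods are hypothetical.

```typescript
const MAX_UNIT_DISPATCHES = 3; // per reconciliation cycle
const MAX_LIFETIME_DISPATCHES = 6; // hard cap across all cycles

class DispatchCounters {
  private readonly perCycle = new Map<string, number>();
  private readonly unitLifetimeDispatches = new Map<string, number>();

  // Returns false when either cap has been reached.
  tryDispatch(key: string): boolean {
    const lifetime = this.unitLifetimeDispatches.get(key) ?? 0;
    if (lifetime >= MAX_LIFETIME_DISPATCHES) return false;
    const cycle = this.perCycle.get(key) ?? 0;
    if (cycle >= MAX_UNIT_DISPATCHES) return false;
    this.perCycle.set(key, cycle + 1);
    this.unitLifetimeDispatches.set(key, lifetime + 1);
    return true;
  }

  // Reconciliation resets only the per-cycle count; the lifetime count survives.
  resetAfterReconciliation(key: string): void {
    this.perCycle.delete(key);
  }
}
```

Even if reconciliation keeps resetting the per-cycle counter, a unit can run at most six times in total.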
Fixes #462
The static assertion searched the first 1200 chars of checkAutoStartAfterDiscuss
for CONTEXT-DRAFT and unlinkSync references, but the function grew to 4164 chars
after adding Gates 2-4 (STATE.md, PROJECT.md, manifest validation). The search
window now extends to the next export statement.
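A sketch of the widened search window, assuming the test reads the source file as a string. `functionWindow` is a hypothetical helper that slices from the function declaration to the next top-level `export`. The old approach used a fixed 1200-character slice instead.

```typescript
// Return the text of `fnName` from its declaration up to (not including) the next
// line that starts with `export `, or to the end of the file if none follows.
function functionWindow(source: string, fnName: string): string {
  const start = source.indexOf(`function ${fnName}`);
  if (start === -1) throw new Error(`${fnName} not found`);
  const next = source.indexOf("\nexport ", start + 1);
  return source.slice(start, next === -1 ? source.length : next);
}
```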
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes the remaining gap in multi-milestone enforcement: the code
previously validated only the END STATE (files exist) but not the
PROCESS (each gate was presented to the user).
New mechanism:
- discuss.md instructs the LLM to write .gsd/DISCUSSION-MANIFEST.json
after EACH Phase 3 gate decision, tracking gates_completed vs total
- checkAutoStartAfterDiscuss() Gate 4: BLOCKS auto-start if
gates_completed < total (not just a warning)
- Manifest is deleted after auto-start (only needed during discussion)
- Single-milestone discussions don't use manifest (backward-compatible)
- DISCUSSION-MANIFEST.json added to baseline gitignore patterns
This creates a three-layer enforcement:
Layer 1 (Prompt): ask_user_questions calls at each gate
Layer 2 (Files): CONTEXT.md/DRAFT/directory existence check
Layer 3 (Manifest): gates_completed == total process verification
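A sketch of the Gate 4 check, assuming the manifest is a JSON object with the `gates_completed` and `total` fields named above. `checkManifestGate` is hypothetical. Treating a missing or malformed manifest as blocking in a multi-milestone discussion is an assumption, because the message does not say how that case is handled.

```typescript
interface DiscussionManifest {
  gates_completed: number;
  total: number;
}

type GateResult = { ok: true } | { ok: false; reason: string };

// `raw` is the file contents of DISCUSSION-MANIFEST.json, or null if the file is absent.
function checkManifestGate(raw: string | null, multiMilestone: boolean): GateResult {
  if (!multiMilestone) return { ok: true }; // single-milestone: no manifest needed
  // Assumption: a missing manifest blocks auto-start.
  if (raw === null) return { ok: false, reason: "DISCUSSION-MANIFEST.json missing" };
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, reason: "manifest is not valid JSON" };
  }
  const m = parsed as Partial<DiscussionManifest> | null;
  if (
    m === null ||
    typeof m !== "object" ||
    typeof m.gates_completed !== "number" ||
    typeof m.total !== "number"
  ) {
    return { ok: false, reason: "manifest lacks gates_completed/total" };
  }
  if (m.gates_completed < m.total) {
    // Block, not just warn: some Phase 3 gates were never presented.
    return { ok: false, reason: `only ${m.gates_completed}/${m.total} gates completed` };
  }
  return { ok: true };
}
```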
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three fixes for config/setup UX:
1. cli.ts: Add missing loadStoredEnvKeys() call in gsd config flow.
Previously, gsd config showed keys as "not configured" even when
they existed in auth.json because env vars weren't hydrated first.
2. commands.ts: New /gsd config slash command that lets users configure
API keys (Tavily, Brave, Context7, Jina, Groq) from within a running
session. Keys are saved to auth.json and activated immediately.
No need to exit the session and run gsd config externally.
3. command-search-provider.ts: Show native Anthropic web search status
when using Claude models, so users know search works even without
Brave/Tavily keys.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The preferences wizard now shows available models from the model
registry as a selectable list instead of requiring users to manually
type model IDs. Falls back to text input when no authenticated
models are available.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Input component had no placeholder text support — when empty, it
showed only "> " with a blinking cursor and no hint of expected input.
The ExtensionInputComponent received a placeholder parameter but
discarded it (it was named _placeholder; the leading underscore marks
it as intentionally unused).
Fix: Input now has a public placeholder property. When value is empty,
renders the placeholder in dim text. ExtensionInputComponent passes
the placeholder through to Input.
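A pared-down sketch of the rendering rule. It assumes dim text is produced with the ANSI SGR codes 2 (faint) and 22 (normal intensity). The real component may use a theme helper instead, and this `Input` class keeps only the placeholder logic.

```typescript
// ANSI SGR 2 = faint/dim, SGR 22 = back to normal intensity.
const dim = (s: string): string => `\x1b[2m${s}\x1b[22m`;

class Input {
  value = "";
  placeholder = "";

  render(): string {
    const prompt = "> ";
    // Show the hint only while nothing has been typed.
    if (this.value === "" && this.placeholder !== "") {
      return prompt + dim(this.placeholder);
    }
    return prompt + this.value;
  }
}
```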
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The preferences parser treated [] and {} as strings instead of empty
array/object. On next serialize, yamlSafeString quoted them as "[]"
and "{}", permanently corrupting the preferences file. This caused
the wizard to show empty fields (models, auto_supervisor, etc.).
Fix: parseScalar now recognizes [] and {} (quoted or unquoted) as
empty array/object. Serializer omits empty values entirely instead
of writing key: [] or key: {}.
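A simplified sketch of both halves of the fix. `parseScalar` here handles only the empty-collection case, and `serializeEntries` is a hypothetical stand-in for the real serializer, which would quote values with yamlSafeString.

```typescript
type Scalar = string | unknown[] | Record<string, unknown>;

// Recognize [] and {} whether or not they are wrapped in single or double quotes.
function parseScalar(raw: string): Scalar {
  const v = raw.trim();
  const quoted =
    v.length >= 2 &&
    ((v.startsWith('"') && v.endsWith('"')) || (v.startsWith("'") && v.endsWith("'")));
  const inner = quoted ? v.slice(1, -1) : v;
  if (inner === "[]") return [];
  if (inner === "{}") return {};
  return inner; // other scalar handling omitted
}

function isEmptyCollection(value: unknown): boolean {
  if (Array.isArray(value)) return value.length === 0;
  return typeof value === "object" && value !== null && Object.keys(value).length === 0;
}

// Omit empty arrays/objects entirely instead of writing `key: []` or `key: {}`.
function serializeEntries(prefs: Record<string, unknown>): string {
  return Object.entries(prefs)
    .filter(([, value]) => !isEmptyCollection(value))
    .map(([key, value]) => `${key}: ${String(value)}`)
    .join("\n");
}
```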
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instead of just showing "Edit file" notification, /gsd prefs now
ensures the preferences file exists and immediately launches the
interactive wizard. This matches user expectation — typing "prefs"
should let you edit preferences, not just show a file path.
/gsd prefs status still available for file path info without wizard.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Split skill diagnostics into [Skill conflicts] (actual collisions) and
[Skill issues] (validation warnings like missing description) so users
aren't misled by the label. Add wizard hint to /gsd prefs output.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The synchronous fuzzyFind() native call blocks the event loop during
@ file autocomplete. On large codebases (e.g. Java projects with deep
directory trees), each call can take seconds. Since updateAutocomplete()
was called on every keystroke while autocomplete was active, rapid typing
would cascade into dozens of blocking searches — freezing the TUI for
minutes. This made it appear that arrow keys caused the freeze, when
the actual cause was accumulated backlog from processing buffered input.
Debounce all @ file reference autocomplete paths (character input,
backspace, forward delete, and re-trigger after cancellation) with a
150ms delay so only the final keystroke triggers the expensive search.
Slash command autocomplete remains synchronous since it's cheap.
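A sketch of a trailing-edge debounce using the 150ms delay mentioned above. The `Scheduler` interface is a hypothetical seam that allows testing without real timers. In the editor, each @ autocomplete path would call the debounced function instead of calling updateAutocomplete() directly.

```typescript
interface Scheduler {
  set(fn: () => void, ms: number): unknown;
  clear(handle: unknown): void;
}

const realScheduler: Scheduler = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (h) => clearTimeout(h as ReturnType<typeof setTimeout>),
};

// Postpone `fn` until `delayMs` has passed with no further calls; only the last call runs.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  delayMs = 150,
  scheduler: Scheduler = realScheduler,
): (...args: A) => void {
  let handle: unknown = undefined;
  let pending = false;
  return (...args: A) => {
    if (pending) scheduler.clear(handle); // cancel the previous, now-stale search
    pending = true;
    handle = scheduler.set(() => {
      pending = false;
      fn(...args);
    }, delayMs);
  };
}
```

With this in place, three quick keystrokes trigger one fuzzyFind() call for the final query rather than three blocking calls.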
Co-authored-by: TÂCHES <afromanguy@me.com>
GitHub Copilot users with Claude models got 400 errors because the native
Anthropic web_search_20250305 tool was injected into requests to Copilot's
API proxy, which doesn't support it. The root cause was that model_select
never fires before the first API request on new sessions, so the fallback
heuristic (model name starts with "claude-") couldn't distinguish direct
Anthropic from proxied providers.
Fix: pass the resolved Model object through to the before_provider_request
event so extensions can check model.provider directly instead of relying on
model name heuristics.
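A sketch of an extension handler that checks the provider directly. The `Model` and event shapes are simplified assumptions, as are the provider strings `"anthropic"` and `"github-copilot"`. The tool object follows Anthropic's documented `web_search_20250305` tool definition. Skipping injection when no model is present is a design choice in this sketch and is not taken from the source.

```typescript
interface Model {
  id: string;
  provider: string;
}

interface ProviderRequestEvent {
  model?: Model;
  tools: { type: string; name: string }[];
}

// Inject the native web search tool only for direct Anthropic requests.
function maybeAddWebSearch(event: ProviderRequestEvent): ProviderRequestEvent {
  // No model info: skip injection instead of guessing from the model name.
  const isDirectAnthropic = event.model !== undefined && event.model.provider === "anthropic";
  if (!isDirectAnthropic) return event;
  return {
    ...event,
    tools: [...event.tools, { type: "web_search_20250305", name: "web_search" }],
  };
}
```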
When a milestone is in 'needs-discussion' phase (has CONTEXT-DRAFT.md
but no ROADMAP.md yet), /gsd discuss was incorrectly hitting the
'No roadmap yet' guard and returning early — creating the deadlock
reported in #440.
Fix: add an early check in showDiscuss() for the needs-discussion phase.
When detected, it shows the same draft discussion menu that the main
/gsd wizard shows (discuss_draft / discuss_fresh / skip_milestone),
bypassing the roadmap guard entirely. The discussion IS how the roadmap
gets created, so requiring it to already exist was wrong.
Fixes #440
Moves extension tool_call/tool_result interception from wrapToolsWithExtensions
(which fires inside the agent loop, bypassing event settlement) to
beforeToolCall/afterToolCall hooks that await _agentEventQueue. This ensures
extensions always see settled state — including the appended assistant message —
even when tools execute in parallel.
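A sketch of the settlement idea. It assumes agent events are processed one at a time on a promise chain. `AgentSession`, `emit`, and the `seenByExtension` log are hypothetical; only the `beforeToolCall` and `_agentEventQueue` concepts come from the message above. Awaiting the queue before a hook runs means parallel tool calls all observe the appended assistant message.

```typescript
class AgentSession {
  private agentEventQueue: Promise<void> = Promise.resolve();
  readonly messages: string[] = [];
  readonly seenByExtension: string[] = [];

  // Enqueue an event handler; handlers run sequentially in emission order.
  emit(handler: () => Promise<void> | void): void {
    this.agentEventQueue = this.agentEventQueue.then(handler);
  }

  // Wait for all queued events to settle before the extension inspects state.
  async beforeToolCall(toolName: string): Promise<void> {
    await this.agentEventQueue;
    this.seenByExtension.push(`${toolName}:${this.messages.length}`);
  }
}
```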
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Initial plan
* chore: establish baseline before implementing em-dash fix
Co-authored-by: glittercowboy <186001655+glittercowboy@users.noreply.github.com>
* fix: validate milestone titles against delimiter characters (em dash, slash) that break state management
- Changed STATE.md separator from em dash to colon in buildStateMarkdown and state.md template
- Removed ambiguous '— Context' suffix from context.md H1 template
- Added validateTitle() function to detect problematic delimiter characters
- Added delimiter_in_title doctor issue code for milestone/slice title validation
- Added tests for validateTitle() and doctor delimiter detection
- Added em-dash-in-title cases to regex-hardening test
Fixes: milestone titles containing '—' caused state corruption when the LLM
misread the ambiguous STATE.md separator format and wrote incorrect planning files.
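A sketch of title validation covering the two delimiters named above: the em dash (written here as `\u2014`) and the slash. The return shape, a list of offending delimiter names, is an assumption. The real validateTitle() and the doctor's delimiter_in_title issue may report problems differently.

```typescript
const DELIMITERS: ReadonlyArray<{ ch: string; name: string }> = [
  { ch: "\u2014", name: "em dash" },
  { ch: "/", name: "slash" },
];

// Returns the names of the problematic delimiters found in the title (empty if valid).
function validateTitle(title: string): string[] {
  return DELIMITERS.filter((d) => title.includes(d.ch)).map((d) => d.name);
}
```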
Co-authored-by: glittercowboy <186001655+glittercowboy@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: glittercowboy <186001655+glittercowboy@users.noreply.github.com>
Add Ollama Cloud (ollama.com) as a built-in provider with both model
hosting and web search/fetch capabilities.
Model provider:
- 13 curated models via OpenAI-compatible API (Llama 3.1, Qwen 3,
DeepSeek R1, Gemma 3, Mistral, Phi-4, GPT-OSS)
- Auth via OLLAMA_API_KEY environment variable
- Registered in onboarding, env hydration, and model resolver
Web tool provider:
- Search via POST ollama.com/api/web_search
- Page fetch via POST ollama.com/api/web_fetch (fallback after Jina)
- Added as third search provider option alongside Tavily and Brave
- /search-provider command updated with ollama option
Closes #430
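A sketch that builds the arguments for the search request. The endpoint comes from the message above. The Bearer header and the `query`/`max_results` body fields follow Ollama's web search API documentation as we understand it and should be checked against it. `buildOllamaWebSearch` is hypothetical, and the default of 5 results is arbitrary.

```typescript
interface FetchArgs {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildOllamaWebSearch(query: string, apiKey: string, maxResults = 5): FetchArgs {
  if (!apiKey) throw new Error("OLLAMA_API_KEY is not set");
  return {
    url: "https://ollama.com/api/web_search",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`, // key taken from OLLAMA_API_KEY
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ query, max_results: maxResults }),
    },
  };
}
```

A caller would pass the result directly to `fetch(url, init)`.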
auto.ts has selfHealRuntimeRecords() which cleans up stale .gsd/runtime/units/
records when /gsd auto starts. However, guided-flow.ts (used by /gsd manual
mode) had zero awareness of runtime records — it only checked auto.lock.
This means if auto-mode crashes mid-unit, the stale runtime records persist
until the next /gsd auto run. Users who alternate between manual and auto
mode, or who only use manual mode after a crash, would accumulate stale
records that could cause spurious re-dispatch or confusing state.
Add selfHealRuntimeRecords() to guided-flow.ts that:
- Clears records where the expected artifact already exists (completed but
closeout didn't finish)
- Clears records stuck in dispatched or timeout phase (process died mid-unit)
- Notifies the user how many stale records were cleaned
Called in showSmartEntry() before the crash lock check so the wizard always
starts from a clean state regardless of how the previous session ended.
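The staleness rules above can be written as a pure filter. This sketch assumes a simplified record shape and injects the disk check. `RuntimeRecord`, `findStaleRecords`, and `staleNotice` are hypothetical, and the real records in .gsd/runtime/units/ may carry more fields.

```typescript
interface RuntimeRecord {
  unitKey: string;
  phase: string; // e.g. "dispatched", "timeout"; other phase names are not specified
}

function findStaleRecords(
  records: RuntimeRecord[],
  artifactExists: (unitKey: string) => boolean,
): RuntimeRecord[] {
  return records.filter(
    (r) =>
      artifactExists(r.unitKey) || // completed, but closeout never finished
      r.phase === "dispatched" || // process died mid-unit
      r.phase === "timeout",
  );
}

function staleNotice(stale: RuntimeRecord[]): string | null {
  return stale.length === 0 ? null : `Cleaned up ${stale.length} stale runtime record(s).`;
}
```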
Co-authored-by: Thomas <twilliams1234@gmail.com>
When a server fails to bind to the configured ready_port, the process
would stay in "starting" status indefinitely after the probing interval
cleared, with no error surfaced to the agent. This fixes the hang by:
- Transitioning process to "error" status when port probing times out
- Detecting process exit during port polling and reporting stderr context
- Adding ready_timeout parameter for custom timeout values
- Including stderr output in waitForReady timeout/error responses
- Registering SIGTERM/SIGINT handlers to clean up bg processes on exit
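One step of the readiness check can be sketched as a pure transition, so a process can never stay in "starting". The `ProbeInput` fields and the "ready" status are assumptions; only "starting", "error", ready_timeout, and the stderr context come from the list above.

```typescript
type ProcStatus = "starting" | "ready" | "error";

interface ProbeInput {
  startedAt: number; // ms timestamp when probing began
  now: number; // current ms timestamp
  readyTimeoutMs: number; // from ready_timeout, or a default
  portOpen: boolean; // latest connection attempt to ready_port succeeded
  exitCode: number | null; // non-null once the process has exited
  stderrTail: string; // recent stderr output for context
}

interface ProbeResult {
  status: ProcStatus;
  error?: string;
}

function evaluateProbe(p: ProbeInput): ProbeResult {
  if (p.portOpen) return { status: "ready" };
  if (p.exitCode !== null) {
    return { status: "error", error: `exited with code ${p.exitCode} before binding\n${p.stderrTail}` };
  }
  if (p.now - p.startedAt >= p.readyTimeoutMs) {
    return { status: "error", error: `port not ready after ${p.readyTimeoutMs}ms\n${p.stderrTail}` };
  }
  return { status: "starting" }; // keep polling
}
```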
Closes#428
The directory listing cache in paths.ts has no TTL and was never cleared
in production, causing dispatchNextUnit to re-dispatch the same unit
when files written by the previous unit weren't visible to deriveState.
Add clearPathCache() calls at the top of dispatchNextUnit (before
deriveState) and verifyExpectedArtifact so each dispatch cycle and
artifact check sees fresh disk state.
Closes#431
Add loadTemplate() and inlineTemplate() to prompt-loader.ts, then use
them in all 7 auto.ts builder functions and ~9 guided-flow.ts callsites
to inject template content at prompt-build time. Update 16 prompt .md
files to reference inlined templates instead of instructing agents to
read them from disk.
Over a typical 3-slice/15-task milestone run, this eliminates ~44
unnecessary READ tool calls (~45-90s latency, ~5-9k wasted tokens).
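A sketch of prompt-build-time inlining. The in-memory `TEMPLATES` map stands in for reading template files from disk. The `<template name="...">` wrapper and `buildPrompt` are illustrative assumptions, not the actual prompt-loader.ts format.

```typescript
const TEMPLATES: Record<string, string> = {
  summary: "# {{sliceId}} Summary\n\n## What changed\n",
};

function loadTemplate(name: string): string {
  const body = TEMPLATES[name];
  if (body === undefined) throw new Error(`unknown template: ${name}`);
  return body;
}

// Embed the template text so the agent never needs a READ call to fetch it.
function inlineTemplate(name: string): string {
  return [`<template name="${name}">`, loadTemplate(name).trimEnd(), "</template>"].join("\n");
}

function buildPrompt(instructions: string, templateNames: string[]): string {
  return [instructions, ...templateNames.map((n) => inlineTemplate(n))].join("\n\n");
}
```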
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ensures auto-mode reads fresh file data after unit completion,
slice merges, and self-healing — prevents stale cached parses
from the memoized deriveState pipeline.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The --version flag outputs a banner with ANSI escape codes. The smoke
test compared the entire multi-line output against the bare version
string, causing false failures on every release.
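A sketch of a more tolerant comparison. It strips ANSI CSI sequences and then looks for the version as a whitespace-delimited token, optionally prefixed with "v". `stripAnsi` and `versionOutputMatches` are hypothetical helpers, and the banner in the test is made-up data.

```typescript
// Matches ANSI CSI escape sequences such as "\x1b[1m" or "\x1b[38;5;208m".
const ANSI_PATTERN = /\x1b\[[0-9;]*[A-Za-z]/g;

function stripAnsi(text: string): string {
  return text.replace(ANSI_PATTERN, "");
}

// True if any whitespace-separated token (minus a leading "v") equals the version.
function versionOutputMatches(output: string, version: string): boolean {
  return stripAnsi(output)
    .split("\n")
    .some((line) => line.split(/\s+/).some((tok) => tok.replace(/^v/, "") === version));
}
```

Exact token matching avoids accepting "12.1.0" when the expected version is "2.1.0".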
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>