- Move new coercion tests to standalone file using node:test +
node:assert/strict (per CONTRIBUTING testing standards)
- Remove tests from legacy complete-slice.test.ts to avoid mixing
test frameworks in the same file
LLMs sometimes pass plain strings instead of the expected object shape
for array fields like filesModified and requires, causing TypeBox
validation to reject the input before the execute function runs. This
adds Type.Union schemas to accept both formats and normalizes strings
to objects with sensible defaults in the execute functions.
Closes #3565
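A minimal sketch of the normalization step, assuming a hypothetical entry shape with `path` and `description` fields (the real field names and defaults may differ). The TypeBox schema side is omitted; the TypeScript union type below only mirrors the idea of accepting both formats.

```typescript
// Hypothetical shape for one filesModified entry; actual fields are an assumption.
interface FileModified {
  path: string;
  description: string;
}

// Either a plain string or the full object, mirroring a string | object union schema.
type FileModifiedInput = string | FileModified;

// Coerce every entry to the object shape, filling an assumed empty default description.
function normalizeFilesModified(input: FileModifiedInput[]): FileModified[] {
  return input.map((entry) =>
    typeof entry === "string" ? { path: entry, description: "" } : entry,
  );
}
```

Normalizing inside the execute function keeps downstream code on a single shape regardless of which format the LLM sent.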
The flat-rate provider guard from #3552 can fail open in two scenarios:
1. Provider alias mismatch — isFlatRateProvider only matched the exact
string "github-copilot", but "copilot" appears as a provider alias
in the codebase. Case variations could also bypass the check.
Fix: add "copilot" alias and lowercase input before set membership.
2. Unresolved primary model — when resolveModelId returns undefined
(stale model ID, registry mismatch), the guard was skipped entirely,
allowing dynamic routing to downgrade models on a flat-rate backend.
Fix: fall back to autoModeStartModel.provider and ctx.model.provider
when primary resolution fails, disabling routing if either indicates
a flat-rate provider.
Ref: #3453
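The two fixes above can be sketched roughly as follows. `shouldDisableRouting` is a hypothetical helper name, and the provider set contains only the two names mentioned in this message; the real guard may be structured differently.

```typescript
// Assumed membership: only the two names mentioned in this commit message.
const FLAT_RATE_PROVIDERS: ReadonlySet<string> = new Set(["github-copilot", "copilot"]);

// Lowercase before the set lookup so case variations cannot bypass the check.
function isFlatRateProvider(provider: string | undefined): boolean {
  if (!provider) return false;
  return FLAT_RATE_PROVIDERS.has(provider.toLowerCase());
}

// Hypothetical helper: when the primary model could not be resolved,
// fall back to the start-model and context-model providers.
function shouldDisableRouting(
  resolvedProvider: string | undefined,
  startModelProvider: string | undefined,
  ctxModelProvider: string | undefined,
): boolean {
  if (resolvedProvider !== undefined) return isFlatRateProvider(resolvedProvider);
  return isFlatRateProvider(startModelProvider) || isFlatRateProvider(ctxModelProvider);
}
```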
Three related bugs cause auto-mode to permanently stop on transient provider
errors (overloaded_error, rate limits, 503s) that should be recoverable:
1. **Layer 1/Layer 2 race condition:** The extension's handleAgentEnd processes
agent_end events BEFORE the Core RetryHandler in _processAgentEvent. For
transient errors, both layers reacted simultaneously — the extension called
pauseAuto (tearing down the session) while the Core called agent.continue()
(in-context retry). This ripped the agent out of its context window mid-
recovery. Fix: handleAgentEnd now returns immediately for transient errors,
letting the Core retry in-context with full conversation preservation.
2. **retryState accumulation across resume cycles:** consecutiveTransientCount
in agent-end-recovery.ts accumulated across pause/resume cycles without
resetting, permanently locking out auto-resume after MAX_TRANSIENT_AUTO_RESUMES
total (not per-cycle) errors. Fix: resetTransientRetryState() is called before
startAuto() in the resume path. MAX_TRANSIENT_AUTO_RESUMES raised from 3 to 8
to cover ~30 minutes of sustained provider overload.
3. **ExtensionContext lacks newSession():** The provider-error resume callback
receives an ExtensionContext (from the agent_end hook), not an
ExtensionCommandContext. startAuto() overwrote s.cmdCtx with this incomplete
context, causing 'newSession is not a function' on every subsequent runUnit()
call. Fix: startAuto() now checks for newSession before overwriting — on
provider-error resume, it preserves the original ExtensionCommandContext.
Bonus: Session creation timeout (category=timeout) now calls pauseAuto instead
of stopAuto, matching the provider-pause path. Structural errors (TypeError)
still hard-stop to prevent infinite retry loops.
Fixes #2492
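The context check from item 3 might look like the sketch below. The interfaces are simplified stand-ins (the `label` field and the `adoptContext` helper are hypothetical); only the `newSession` check reflects what this message describes.

```typescript
// Simplified stand-ins for the real context types; `label` is illustrative only.
interface ExtensionContext {
  label: string;
}
interface ExtensionCommandContext extends ExtensionContext {
  newSession(): void;
}

interface AutoState {
  cmdCtx: ExtensionCommandContext | null;
}

// Type guard: only a full command context carries newSession().
function hasNewSession(ctx: ExtensionContext): ctx is ExtensionCommandContext {
  return typeof (ctx as Partial<ExtensionCommandContext>).newSession === "function";
}

// Hypothetical helper: overwrite the stored context only when the incoming one is complete.
function adoptContext(s: AutoState, ctx: ExtensionContext): void {
  if (hasNewSession(ctx)) {
    s.cmdCtx = ctx;
  }
}
```

On a provider-error resume the incoming context fails the guard, so the original command context stays in place.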
The prompt injection scan flags "You are now responsible" in
doctor-heal.md as role injection (matches "you are now [a-z]").
This is a pre-existing legitimate prompt instruction, not injection.
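To illustrate the false positive, here is a small sketch of such a pattern check. The regex form is taken from the description above; the case-insensitive flag and the `flagsRoleInjection` name are assumptions, not the scanner's actual code.

```typescript
// Assumed form of the role-injection pattern; the `i` flag is a guess.
const ROLE_INJECTION = /you are now [a-z]/i;

// Returns true when the text contains the "you are now <word>" phrasing.
function flagsRoleInjection(text: string): boolean {
  return ROLE_INJECTION.test(text);
}
```

A legitimate instruction such as "You are now responsible for healing the project" matches this pattern, which is why the scan reports it.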
promptGuidelines from every registered tool are injected into the system
prompt on every API call. The return shape details were redundant (the
JSON response is self-describing). Keep only the sqlite3 prohibition.
1. Replace ensureDbOpen() with isDbAvailable() in gsd_milestone_status
so the read-only tool cannot create/migrate the DB as a side effect
2. Wrap all reads in a BEGIN/COMMIT transaction for snapshot consistency
under concurrent WAL writes
3. Broaden negative regex in guardrail tests to catch sqlite3 with
flags, relative paths, absolute paths, and quoted paths
Add 4-layer defense-in-depth to enforce single-writer WAL discipline:
1. Global anti-pattern in system.md protecting all 35+ auto-mode units
2. DB access safety blocks in 5 high-risk prompts (validate-milestone,
complete-milestone, doctor-heal, forensics, reassess-roadmap)
3. New gsd_milestone_status read-only query tool giving the LLM a
sanctioned path to inspect milestone/slice/task state
4. 14 regression tests (8 prompt guardrails + 6 tool coverage)
Closes #3541
Replace the OpenAI-compat shim with a native Ollama /api/chat streaming
provider that exposes all commonly-used Ollama options and surfaces
inference performance metrics.
Key changes:
- Native NDJSON streaming from /api/chat (no more OpenAI shim)
- Known models send num_ctx from capability table; unknown models defer
to Ollama's default to avoid OOM on constrained hosts
- Exposes: temperature, top_p, top_k, repeat_penalty, seed, num_gpu,
keep_alive, num_predict via per-model providerOptions
- Extracts <think>...</think> blocks for reasoning models (deepseek-r1, qwq)
- Surfaces InferenceMetrics (tokens/sec, durations) on AssistantMessage
- Adds remove and show actions to ollama_manage LLM tool
- Adds "ollama-chat" to KnownApi, providerOptions to Model<TApi>
- NDJSON parser uses strict mode for chat (fails on malformed frames)
- Mixed content+tool_call chunks handled independently
Closes #3544
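A minimal sketch of NDJSON parsing with a strict flag, as one way the behavior above could work. The function name, return shape, and lenient fallback are assumptions; the provider's actual parser is not shown in this message.

```typescript
// Parse a buffer of newline-delimited JSON.
// Returns the parsed frames plus any trailing partial line to carry into the next chunk.
function parseNdjson(buffer: string, strict: boolean): { frames: unknown[]; rest: string } {
  const lines = buffer.split("\n");
  // The last element is an incomplete line, or "" when the buffer ends with a newline.
  const rest = lines.pop() ?? "";
  const frames: unknown[] = [];
  for (const line of lines) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // ignore blank lines between frames
    try {
      frames.push(JSON.parse(trimmed));
    } catch {
      // Strict mode fails on malformed frames; lenient mode skips them.
      if (strict) throw new Error(`Malformed NDJSON frame: ${trimmed}`);
    }
  }
  return { frames, rest };
}
```

Carrying `rest` forward lets the caller prepend it to the next network chunk, so a frame split across chunks is parsed once it is complete.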
When requirements are authored in REQUIREMENTS.md during the discussion
phase (the standard workflow), the DB requirements table stays empty.
gsd_requirement_update then fails with not_found for every requirement
at milestone completion, burning tokens on retries.
When updateRequirementInDb encounters a requirement ID not in the DB,
it now parses REQUIREMENTS.md via parseRequirementsSections() and seeds
all requirements into the DB before retrying the lookup. This preserves
the original content (class, description, why, source, validation)
instead of creating an empty skeleton.
The seeding is:
- Lazy: only runs on first miss, not on every update
- Collision-safe: skips IDs already in the DB
- Non-blocking: falls through to skeleton if REQUIREMENTS.md is
missing or unparseable
Adds 1 regression test verifying that updating R005 when the DB is
empty seeds all 3 requirements from REQUIREMENTS.md with their
original content preserved.
Closes #3346
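The seeding behavior can be sketched with an in-memory map standing in for the DB table. The `Requirement` fields, the `loadRequirements` callback, and the function name here are hypothetical; the real code uses updateRequirementInDb and parseRequirementsSections().

```typescript
// Reduced stand-in for a requirement row; real rows carry more fields.
interface Requirement {
  id: string;
  description: string;
  status: string;
}

// Hypothetical update path: `db` stands in for the requirements table and
// `loadRequirements` for parsing REQUIREMENTS.md (null when missing or unparseable).
function updateRequirement(
  db: Map<string, Requirement>,
  id: string,
  patch: Partial<Requirement>,
  loadRequirements: () => Requirement[] | null,
): Requirement {
  if (!db.has(id)) {
    // Lazy: seeding runs only when the lookup misses.
    const parsed = loadRequirements() ?? [];
    for (const req of parsed) {
      // Collision-safe: never overwrite rows already in the DB.
      if (!db.has(req.id)) db.set(req.id, req);
    }
  }
  // Non-blocking: fall through to an empty skeleton if seeding did not supply the row.
  const existing = db.get(id) ?? { id, description: "", status: "" };
  const updated: Requirement = { ...existing, ...patch, id };
  db.set(id, updated);
  return updated;
}
```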
S##-CONTEXT.md files produced by /gsd discuss (require_slice_discussion)
are never injected into downstream prompt builders. Discussed
requirements, acceptance criteria, and design decisions are silently
dropped — the researcher, planner, completer, replanner, and
reassessor never see them.
Add resolveSliceFile(base, mid, sid, "CONTEXT") + inlineFileOptional()
to all 5 affected builders:
1. buildResearchSlicePrompt
2. buildPlanSlicePrompt
3. buildCompleteSlicePrompt
4. buildReplanSlicePrompt
5. buildReassessRoadmapPrompt
The slice CONTEXT is placed immediately after the roadmap and before
other context (research, decisions, requirements) so the discussed
scope is visible before detailed planning artifacts.
Uses the existing inlineFileOptional() pattern — if no S##-CONTEXT.md
exists, nothing is injected (zero cost for projects not using slice
discussion).
Adds 5 regression tests verifying each builder resolves and inlines
the slice CONTEXT file.
Closes #3452
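A rough sketch of the optional-inline pattern and the section ordering described above. The `read` callback, the heading format, and both function names are assumptions for illustration; the real inlineFileOptional() signature may differ.

```typescript
// Hypothetical reader: returns file contents, or null when the file does not exist.
type ReadFile = (path: string) => string | null;

// Return a labeled section when the file exists, or "" so nothing is injected.
function inlineOptional(path: string | null, label: string, read: ReadFile): string {
  if (path === null) return "";
  const content = read(path);
  if (content === null) return "";
  return `## ${label}\n\n${content.trim()}\n`;
}

// Place the slice context right after the roadmap and before other context.
function assembleSections(roadmap: string, sliceContext: string, research: string): string {
  return [roadmap, sliceContext, research].filter((s) => s !== "").join("\n");
}
```

Because a missing file yields an empty string that is filtered out, projects without slice discussion get the same prompt as before.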
Update What's New section from v2.52 to v2.63, expand native engine
docs to cover all 20+ modules, add missing extensions and ADRs to
indexes, update version references and Node.js requirements.
Address Codex adversarial review findings:
1. Only re-apply the validated model when createAgentSession() signals
a fallback (modelFallbackMessage is truthy). This prevents silently
overriding the persisted model of resumed conversations.
2. Use modelRegistry.getAvailable() instead of find() to ensure the
model's provider is request-ready before calling setModel().
3. Await session.setModel() and wrap in try/catch so provider auth
failures don't surface as unhandled promise rejections at startup.
Applies to both print-mode and interactive-mode startup paths.
Extension-provided models (e.g. claude-code/*) were unavailable during
findInitialModel() because pendingProviderRegistrations had not been
flushed yet, causing the fallback chain to select Google Gemini even
when the user explicitly configured claude-code as their default.
Three compounding issues fixed:
(A) Flush pendingProviderRegistrations in createAgentSession() before
findInitialModel() runs, so extension models are in the registry
when initial model selection happens.
(B) Re-apply the validated model to the session after
validateConfiguredModel() in both print and interactive CLI paths.
Previously, validation updated settingsManager but never called
session.setModel(), leaving the session on the wrong model.
(C) Update defaultModelPerProvider.anthropic from "claude-opus-4-6[1m]"
to "claude-opus-4-6" — the [1m] variant was removed from the model
registry when the base model was upgraded to 1M context, causing the
Anthropic fallback to silently fail and skip to Google.
Closes #3534
* fix: detect Xcode bundles by suffix scan in worktree health check (#1882)
Xcode project directories have project-specific names (e.g. Sudokuxyz.xcodeproj)
that cannot be matched by the exact-filename PROJECT_FILES list. Add a
readdirSync suffix scan for *.xcodeproj and *.xcworkspace so iOS/macOS projects
are not incorrectly treated as greenfield when the health check runs.
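A minimal sketch of the suffix scan, assuming the directory entries have already been listed (for example by readdirSync). The `hasXcodeBundle` name is hypothetical.

```typescript
// Bundle suffixes named in this commit message.
const XCODE_SUFFIXES = [".xcodeproj", ".xcworkspace"];

// True when any directory entry ends with an Xcode bundle suffix.
function hasXcodeBundle(entries: readonly string[]): boolean {
  return entries.some((name) => XCODE_SUFFIXES.some((suffix) => name.endsWith(suffix)));
}
```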
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: replace empty catch with debugLog in Xcode bundle scan
The silent-catch-diagnostics test (#3348) bans empty catch blocks in
migrated auto-mode files. Replace the bare `catch { /* best-effort */ }`
with a debugLog call to satisfy the workflow-logger requirement.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(perf): share jiti module cache across extension loads (#2108)
Each extension was creating a new jiti instance with moduleCache: false,
causing shared dependencies to be recompiled for every extension. Use a
shared singleton with moduleCache: true so shared modules are compiled once.
Export resetExtensionLoaderCache() for test teardown and explicit reload.
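The shared-instance idea can be sketched as below. The `Loader` interface and `getSharedLoader` are stand-ins (jiti itself is not imported here), so this only illustrates the singleton-plus-reset pattern, not the actual loader code.

```typescript
// Stand-in for a module loader instance; the real code creates a jiti instance.
interface Loader {
  load(id: string): unknown;
}

let sharedLoader: Loader | null = null;

// Create the loader on first use and reuse it for every later extension load.
function getSharedLoader(create: () => Loader): Loader {
  if (sharedLoader === null) {
    sharedLoader = create();
  }
  return sharedLoader;
}

// Drop the shared instance so the next load starts from a fresh cache.
function resetExtensionLoaderCache(): void {
  sharedLoader = null;
}
```

Tests can call the reset function in teardown so one test's cached modules do not leak into the next.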
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: correct loader path in extension-load-perf test (4 → 3 levels up)
The test file is at src/tests/ (2 levels deep from repo root), so
fileURLToPath(import.meta.url) + 3x'..' reaches the repo root.
Using 4 levels exits the repo into the GitHub Actions workspace parent,
causing ERR_MODULE_NOT_FOUND for loader.js in dist/.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: use process.cwd() for loader path in perf test (source/compiled portability)
import.meta.url resolves to different depths in source (src/tests/) vs compiled
(dist-test/src/tests/), so relative '../' navigation produces the wrong path in
the build phase. process.cwd() is always the repo root in CI regardless of
where the test file is compiled to.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: trek-e <trek-e@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>