- Move new coercion tests to standalone file using node:test +
node:assert/strict (per CONTRIBUTING testing standards)
- Remove tests from legacy complete-slice.test.ts to avoid mixing
test frameworks in the same file
LLMs sometimes pass plain strings instead of the expected object shape
for array fields like filesModified and requires, causing TypeBox
validation to reject the input before the execute function runs. This
adds Type.Union schemas to accept both formats and normalizes strings
to objects with sensible defaults in the execute functions.
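A minimal sketch of the normalization described above, independent of the TypeBox schema layer. The field shape (`path` plus a reason string) and the default value are assumptions for illustration, not the actual tool contract:

```typescript
// Hypothetical normalizer mirroring the described fix: accept either a
// plain string or the full object shape for array fields like filesModified.
type FileEntry = { path: string; reason: string };

function normalizeFileEntry(input: string | FileEntry): FileEntry {
  // LLMs sometimes send "src/foo.ts" instead of { path, reason };
  // coerce strings to objects with a sensible default.
  if (typeof input === "string") {
    return { path: input, reason: "unspecified" };
  }
  return input;
}
```

In the real schema, the commit's `Type.Union` accepts both input shapes at validation time, and a helper like this runs inside the execute function before the values are used.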
Closes #3565
The flat-rate provider guard from #3552 can fail open in two scenarios:
1. Provider alias mismatch — isFlatRateProvider only matched the exact
string "github-copilot", but "copilot" appears as a provider alias
in the codebase. Case variations could also bypass the check.
Fix: add "copilot" alias and lowercase input before set membership.
2. Unresolved primary model — when resolveModelId returns undefined
(stale model ID, registry mismatch), the guard was skipped entirely,
allowing dynamic routing to downgrade models on a flat-rate backend.
Fix: fall back to autoModeStartModel.provider and ctx.model.provider
when primary resolution fails, disabling routing if either indicates
a flat-rate provider.
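The two fixes can be sketched as follows; function and field names (`isFlatRateProvider`, the fallback providers) are taken from the commit text, but the exact signatures are assumptions:

```typescript
// Fix 1: alias set plus lowercasing before membership check.
const FLAT_RATE_PROVIDERS = new Set(["github-copilot", "copilot"]);

function isFlatRateProvider(provider: string | undefined): boolean {
  // Lowercase first so case variations like "GitHub-Copilot" cannot bypass.
  return provider !== undefined && FLAT_RATE_PROVIDERS.has(provider.toLowerCase());
}

// Fix 2: fail closed when primary model resolution returns undefined.
function shouldDisableRouting(
  primaryProvider: string | undefined, // from resolveModelId; may be undefined
  startProvider: string | undefined,   // autoModeStartModel.provider fallback
  ctxProvider: string | undefined,     // ctx.model.provider fallback
): boolean {
  if (primaryProvider !== undefined) return isFlatRateProvider(primaryProvider);
  // Primary resolution failed: routing is disabled if either fallback
  // indicates a flat-rate backend.
  return isFlatRateProvider(startProvider) || isFlatRateProvider(ctxProvider);
}
```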
Ref: #3453
Three related bugs cause auto-mode to permanently stop on transient provider
errors (overloaded_error, rate limits, 503s) that should be recoverable:
1. **Layer 1/Layer 2 race condition:** The extension's handleAgentEnd processes
agent_end events BEFORE the Core RetryHandler in _processAgentEvent. For
transient errors, both layers reacted simultaneously — the extension called
pauseAuto (tearing down the session) while the Core called agent.continue()
(in-context retry). This ripped the agent out of its context window
mid-recovery. Fix: handleAgentEnd now returns immediately for transient errors,
letting the Core retry in-context with full conversation preservation.
2. **retryState accumulation across resume cycles:** consecutiveTransientCount
in agent-end-recovery.ts accumulated across pause/resume cycles without
resetting, permanently locking out auto-resume after MAX_TRANSIENT_AUTO_RESUMES
total (not per-cycle) errors. Fix: resetTransientRetryState() is called before
startAuto() in the resume path. MAX_TRANSIENT_AUTO_RESUMES raised from 3 to 8
to cover ~30 minutes of sustained provider overload.
3. **ExtensionContext lacks newSession():** The provider-error resume callback
receives an ExtensionContext (from the agent_end hook), not an
ExtensionCommandContext. startAuto() overwrote s.cmdCtx with this incomplete
context, causing 'newSession is not a function' on every subsequent runUnit()
call. Fix: startAuto() now checks for newSession before overwriting — on
provider-error resume, it preserves the original ExtensionCommandContext.
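Fixes 2 and 3 reduce to a counter reset and a context guard. A minimal sketch, using names from the commit text (`resetTransientRetryState`, `startAuto`, `s.cmdCtx`, `newSession`) with simplified assumed types:

```typescript
// Assumed shapes: the full ExtensionCommandContext has newSession;
// the plain ExtensionContext from the agent_end hook does not.
interface CommandContext { newSession?: () => void }
interface Session { cmdCtx?: CommandContext }

let consecutiveTransientCount = 0;

function resetTransientRetryState(): void {
  // Fix 2: the transient counter is per resume cycle, not cumulative
  // across the whole run.
  consecutiveTransientCount = 0;
}

function startAuto(s: Session, ctx: CommandContext): void {
  // Fix 3: only adopt the incoming context if it can create sessions;
  // otherwise preserve the original ExtensionCommandContext so later
  // runUnit() calls can still invoke newSession().
  if (typeof ctx.newSession === "function" || s.cmdCtx === undefined) {
    s.cmdCtx = ctx;
  }
}
```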
Bonus: Session creation timeout (category=timeout) now calls pauseAuto instead
of stopAuto, matching the provider-pause path. Structural errors (TypeError)
still hard-stop to prevent infinite retry loops.
Fixes #2492
The prompt injection scan flags "You are now responsible" in
doctor-heal.md as role injection (matches "you are now [a-z]").
This is a pre-existing legitimate prompt instruction, not injection.
promptGuidelines from every registered tool are injected into the system
prompt on every API call. The return shape details were redundant (the
JSON response is self-describing). Keep only the sqlite3 prohibition.
1. Replace ensureDbOpen() with isDbAvailable() in gsd_milestone_status
so the read-only tool cannot create/migrate the DB as a side effect
2. Wrap all reads in a BEGIN/COMMIT transaction for snapshot consistency
under concurrent WAL writes
3. Broaden negative regex in guardrail tests to catch sqlite3 with
flags, relative paths, absolute paths, and quoted paths
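A hypothetical broadened pattern for item 3; the exact regex in the guardrail tests may differ, but the idea is to anchor on the `sqlite3` token rather than a specific invocation form:

```typescript
// Match sqlite3 invoked bare, with flags, or with relative/absolute/
// quoted DB paths, at line start or after typical shell separators.
const SQLITE3_INVOCATION = /(^|[\s;|&'"`])sqlite3\b/m;

const blocked = [
  "sqlite3 app.db",
  "sqlite3 -readonly ./data/app.db",
  '  sqlite3 "/var/lib/app.db" .tables',
];
```

The word boundary keeps substrings like `libsqlite3` from triggering the guardrail.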
Add 4-layer defense-in-depth to enforce single-writer WAL discipline:
1. Global anti-pattern in system.md protecting all 35+ auto-mode units
2. DB access safety blocks in 5 high-risk prompts (validate-milestone,
complete-milestone, doctor-heal, forensics, reassess-roadmap)
3. New gsd_milestone_status read-only query tool giving the LLM a
sanctioned path to inspect milestone/slice/task state
4. 14 regression tests (8 prompt guardrails + 6 tool coverage)
Closes #3541
Replace the OpenAI-compat shim with a native Ollama /api/chat streaming
provider that exposes all commonly-used Ollama options and surfaces
inference performance metrics.
Key changes:
- Native NDJSON streaming from /api/chat (no more OpenAI shim)
- Known models send num_ctx from capability table; unknown models defer
to Ollama's default to avoid OOM on constrained hosts
- Exposes: temperature, top_p, top_k, repeat_penalty, seed, num_gpu,
keep_alive, num_predict via per-model providerOptions
- Extracts <think>...</think> blocks for reasoning models (deepseek-r1, qwq)
- Surfaces InferenceMetrics (tokens/sec, durations) on AssistantMessage
- Adds remove and show actions to ollama_manage LLM tool
- Adds "ollama-chat" to KnownApi, providerOptions to Model<TApi>
- NDJSON parser uses strict mode for chat (fails on malformed frames)
- Mixed content+tool_call chunks handled independently
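Two of the changes above are easy to sketch in isolation. This is illustrative only; the real provider parses the stream incrementally and the function names are invented:

```typescript
// Strict-mode NDJSON parsing: a malformed frame fails the stream
// rather than being silently dropped.
function parseNdjsonStrict(buffer: string): unknown[] {
  const frames: unknown[] = [];
  for (const line of buffer.split("\n")) {
    if (line.trim() === "") continue; // tolerate blank separators
    try {
      frames.push(JSON.parse(line));
    } catch {
      throw new Error(`malformed NDJSON frame: ${line.slice(0, 80)}`);
    }
  }
  return frames;
}

// Extract <think>...</think> reasoning emitted by models like
// deepseek-r1 and qwq, leaving the remaining text as content.
function splitThinking(text: string): { thinking: string; content: string } {
  const m = text.match(/<think>([\s\S]*?)<\/think>/);
  if (!m) return { thinking: "", content: text };
  return { thinking: (m[1] ?? "").trim(), content: text.replace(m[0], "").trim() };
}
```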
Closes #3544