Commit graph

3806 commits

Author SHA1 Message Date
Jeremy
0d3ef6b545 feat(gsd): add LLM safety harness for auto-mode damage control
Unified safety layer that monitors, validates, and constrains LLM behavior
during auto-mode execution. All components use warn-and-continue policy by
default (log violations, notify user, keep going).

Components:
- Evidence collector: real-time bash/write/edit tool call tracking
- Destructive command guard: classifies 10 dangerous patterns (rm -rf, force push, etc.)
- File change validator: compares git diff against task plan's expected output
- Evidence cross-reference: detects tasks marked complete with zero bash calls
- Git checkpoint: pre-unit refs/gsd/checkpoints/ for optional rollback
- Content validator: minimum quality checks on plans and summaries
- Timeout scale cap: limits timeout multiplier to 6x (was unlimited)

New preference: safety_harness with per-component toggles.
Enabled by default, auto_rollback off by default.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 15:00:06 -05:00
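
Note on 0d3ef6b545: a minimal sketch of how a destructive command guard and the timeout scale cap might behave under the warn-and-continue policy. The commit message names only `rm -rf` and force push among its 10 patterns and does not show the implementation, so the pattern list, the function names (`classifyCommand`, `guardCommand`, `capTimeoutScale`) and the result shape below are illustrative assumptions.

```typescript
// Hypothetical sketch; names and patterns are assumptions, not the project's code.
interface GuardResult {
  destructive: boolean;
  label?: string;
}

// Only the two patterns named in the commit message are sketched here.
const DESTRUCTIVE_PATTERNS: Array<{ label: string; pattern: RegExp }> = [
  { label: "rm -rf", pattern: /\brm\s+-[a-zA-Z]*r[a-zA-Z]*f|\brm\s+-[a-zA-Z]*f[a-zA-Z]*r/ },
  { label: "force push", pattern: /\bgit\s+push\b.*(--force\b|\s-f\b)/ },
];

// Return the first matching pattern label, if any.
function classifyCommand(command: string): GuardResult {
  for (const { label, pattern } of DESTRUCTIVE_PATTERNS) {
    if (pattern.test(command)) return { destructive: true, label };
  }
  return { destructive: false };
}

// Warn-and-continue: record the violation but never block execution.
function guardCommand(command: string, warnings: string[]): boolean {
  const result = classifyCommand(command);
  if (result.destructive) {
    warnings.push(`destructive command (${result.label}): ${command}`);
  }
  return true; // always continue
}

// The commit caps the timeout multiplier at 6x.
const MAX_TIMEOUT_SCALE = 6;

function capTimeoutScale(multiplier: number): number {
  return Math.min(multiplier, MAX_TIMEOUT_SCALE);
}
```

Keeping classification separate from the policy decision would let a stricter mode reuse `classifyCommand` while changing only what `guardCommand` returns.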
Tibsfox
857c45bd6a fix(gsd): replace remaining empty catch with logWarning 2026-04-05 12:21:36 -07:00
Tibsfox
f4ecfd1a56 fix(gsd): use logWarning instead of raw stderr in catch blocks 2026-04-05 12:14:23 -07:00
Tibsfox
5a51631941 fix(gsd): log error instead of empty catch in STATE.md rebuild 2026-04-05 12:08:19 -07:00
Tibsfox
114bde1788 fix(gsd): log error instead of empty catch in skip_slice 2026-04-05 12:06:11 -07:00
Tibsfox
db90607378 fix(gsd): cast milestone classification to string for type safety 2026-04-05 12:05:02 -07:00
Tibsfox
4b68e8c9d9 test(headless): add extension path alignment test 2026-04-05 11:57:40 -07:00
Tibsfox
469acf53af test(cli): add update diagnostics regression test 2026-04-05 11:57:25 -07:00
Tibsfox
a3f2de828e test(cli): add node_modules symlink regression test 2026-04-05 11:56:58 -07:00
Tibsfox
3213cd8c80 test(headless): add multi-turn command classification test 2026-04-05 11:56:44 -07:00
Tibsfox
d4b6eb714c test(pi-coding-agent): add custom provider registration test 2026-04-05 11:56:29 -07:00
Tibsfox
dbe6f9d292 test(ollama): add authMode regression test 2026-04-05 11:56:15 -07:00
Tibsfox
5b6d7784c2 test(gsd): add zero-slice roadmap guided flow test 2026-04-05 11:55:49 -07:00
Tibsfox
824e8e12a8 test(gsd): add skip-slice STATE.md rebuild regression test 2026-04-05 11:55:35 -07:00
Tibsfox
107efc5bff test(gsd): add worktree main_branch preference test 2026-04-05 11:55:21 -07:00
Tibsfox
5cb04f54ca test(gsd): add defer capture stamp regression test 2026-04-05 11:55:07 -07:00
Tibsfox
48dc32eeb5 test(tui): add Image dimension caching regression test 2026-04-05 11:54:47 -07:00
Jeremy McSpadden
3e09184493 Merge pull request #3566 from jeremymcs/fix/complete-slice-string-coercion
fix(gsd): coerce string arrays to objects in complete-slice/task tools
2026-04-05 13:44:40 -05:00
Tibsfox
523fcd89a8 fix(headless): sync resources and use agent dir for query 2026-04-05 11:35:11 -07:00
Jeremy
0b7764349c chore(gsd): remove copyright line from test file 2026-04-05 13:33:13 -05:00
Tibsfox
3bcd55ccfd fix(cli): show latest version and bypass npm cache in update check 2026-04-05 11:33:03 -07:00
Jeremy
e210b7efdf fix(gsd): follow CONTRIBUTING standards for #3565
- Move new coercion tests to standalone file using node:test +
  node:assert/strict (per CONTRIBUTING testing standards)
- Remove tests from legacy complete-slice.test.ts to avoid mixing
  test frameworks in the same file
2026-04-05 13:32:56 -05:00
Jeremy
6046a31c6f fix(gsd): address Codex adversarial review findings for #3565
- verificationEvidence coercion now uses sentinel values (exitCode: -1,
  verdict: "unknown") instead of fabricating passing results
- String coercion for requirements fields now parses "ID — detail"
  delimiter format to preserve semantic payload
- Added regression tests for sentinel values and delimiter parsing

Closes #3565
2026-04-05 13:30:09 -05:00
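
Note on 6046a31c6f: the two coercion rules in this commit can be sketched roughly as follows. The sentinel values (`exitCode: -1`, `verdict: "unknown"`) come from the commit message, and the delimiter is an ID, then an em dash surrounded by spaces, then the detail. Everything else is an assumption for illustration: the `command`, `id` and `detail` field names, the function names, and the handling of strings with no delimiter.

```typescript
// Hypothetical shapes; only exitCode and verdict are named in the commit message.
interface RequirementRef {
  id: string;
  detail: string;
}

interface VerificationEvidence {
  command: string;
  exitCode: number;
  verdict: string;
}

// Em dash (U+2014) with surrounding spaces.
const DELIMITER = " \u2014 ";

// Split a plain string at the first delimiter so the detail is preserved.
function coerceRequirement(input: string | RequirementRef): RequirementRef {
  if (typeof input !== "string") return input;
  const idx = input.indexOf(DELIMITER);
  if (idx === -1) return { id: input.trim(), detail: "" };
  return {
    id: input.slice(0, idx).trim(),
    detail: input.slice(idx + DELIMITER.length).trim(),
  };
}

// A bare string carries no outcome, so mark it unknown instead of passing.
function coerceEvidence(input: string | VerificationEvidence): VerificationEvidence {
  if (typeof input !== "string") return input;
  return { command: input, exitCode: -1, verdict: "unknown" };
}
```

Using sentinels keeps a coerced entry distinguishable from a real passing result, which appears to be the concern the review raised.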
Jeremy
0742cf3493 fix(gsd): coerce string arrays to objects in complete-slice/task tools (#3565)
LLMs sometimes pass plain strings instead of the expected object shape
for array fields like filesModified and requires, causing TypeBox
validation to reject the input before the execute function runs. This
adds Type.Union schemas to accept both formats and normalizes strings
to objects with sensible defaults in the execute functions.

Closes #3565
2026-04-05 13:23:30 -05:00
Tibsfox
5d75705650 fix(cli): resolve hoisted node_modules for global installs 2026-04-05 11:16:10 -07:00
Tibsfox
21a14e32fc fix(headless): treat discuss and plan as multi-turn commands 2026-04-05 11:14:24 -07:00
Jeremy
3a1e9e3416 fix(gsd): harden flat-rate routing guard against alias/resolution gaps
The flat-rate provider guard from #3552 can fail open in two scenarios:

1. Provider alias mismatch — isFlatRateProvider only matched the exact
   string "github-copilot", but "copilot" appears as a provider alias
   in the codebase. Case variations could also bypass the check.
   Fix: add "copilot" alias and lowercase input before set membership.

2. Unresolved primary model — when resolveModelId returns undefined
   (stale model ID, registry mismatch), the guard was skipped entirely,
   allowing dynamic routing to downgrade models on a flat-rate backend.
   Fix: fall back to autoModeStartModel.provider and ctx.model.provider
   when primary resolution fails, disabling routing if either indicates
   a flat-rate provider.

Ref: #3453
2026-04-05 13:09:44 -05:00
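
Note on 3a1e9e3416: a minimal sketch of the two fixes described above, assuming providers are plain strings. `isFlatRateProvider`, `"github-copilot"` and `"copilot"` are named in the commit message; the set name, the `shouldDisableRouting` helper and its parameter names are hypothetical stand-ins for the resolved primary provider, `autoModeStartModel.provider` and `ctx.model.provider`.

```typescript
// Hypothetical sketch; only isFlatRateProvider and the two provider strings
// come from the commit message.
const FLAT_RATE_PROVIDERS = new Set(["github-copilot", "copilot"]);

// Lowercase before the set lookup so case variations cannot bypass the guard.
function isFlatRateProvider(provider: string | undefined): boolean {
  return provider !== undefined && FLAT_RATE_PROVIDERS.has(provider.toLowerCase());
}

// When the primary model resolves, trust its provider. Otherwise fall back to
// the other known providers and disable routing if either one is flat-rate.
function shouldDisableRouting(
  primaryProvider: string | undefined,
  startModelProvider: string | undefined,
  ctxModelProvider: string | undefined,
): boolean {
  if (primaryProvider !== undefined) return isFlatRateProvider(primaryProvider);
  return isFlatRateProvider(startModelProvider) || isFlatRateProvider(ctxModelProvider);
}
```

The fallback branch is what turns the guard from fail-open into fail-closed when resolution returns undefined.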
Tibsfox
935cc9a464 fix(pi-coding-agent): register models.json providers and await Ollama probe in headless mode 2026-04-05 11:09:08 -07:00
Tibsfox
352dd17e76 fix(ollama): use apiKey auth mode to avoid streamSimple crash 2026-04-05 11:06:38 -07:00
Tibsfox
cd87c9937d fix(gsd): treat zero-slice roadmap as pre-planning in guided flow 2026-04-05 11:00:09 -07:00
Tibsfox
8ccab86aac fix(gsd): rebuild STATE.md after skip-slice and strengthen rethink prompt 2026-04-05 10:55:30 -07:00
Tibsfox
f93dee3733 fix(gsd): use main_branch preference in worktree creation 2026-04-05 10:43:31 -07:00
Tibsfox
b46b113360 fix(gsd): stamp defer and milestone captures as executed after triage 2026-04-05 10:38:29 -07:00
Tibsfox
31e20e0fe3 fix(tui): treat absolute file paths as plain text, not commands 2026-04-05 10:34:34 -07:00
Tibsfox
11239140db fix(tui): break infinite re-render loop for images in cmux 2026-04-05 10:30:13 -07:00
Tibsfox
9ab675a843 fix(gsd): disable dynamic model routing for flat-rate providers 2026-04-05 10:24:52 -07:00
Tibsfox
a2f7274a82 fix(gsd): rebuild STATE.md before guided-flow dispatch 2026-04-05 10:13:25 -07:00
Tibsfox
2363c94da6 fix(gsd): defer queued shells in active milestone selection 2026-04-05 10:06:05 -07:00
Tibsfox
fde2aafa64 fix(retry): prevent 429 quota cascade and 30-min lockout 2026-04-05 09:37:28 -07:00
deseltrus
58b58d7290 ci: retrigger — previous failures in unrelated tests (DB reconciliation, state-machine) 2026-04-05 18:15:21 +02:00
deseltrus
44aecc9a3f fix(auto): resilient transient error recovery — defer to Core RetryHandler and fix cmdCtx race
Three related bugs cause auto-mode to permanently stop on transient provider
errors (overloaded_error, rate limits, 503s) that should be recoverable:

1. **Layer 1/Layer 2 race condition:** The extension's handleAgentEnd processes
   agent_end events BEFORE the Core RetryHandler in _processAgentEvent. For
   transient errors, both layers reacted simultaneously — the extension called
   pauseAuto (tearing down the session) while the Core called agent.continue()
   (in-context retry). This ripped the agent out of its context window mid-
   recovery. Fix: handleAgentEnd now returns immediately for transient errors,
   letting the Core retry in-context with full conversation preservation.

2. **retryState accumulation across resume cycles:** consecutiveTransientCount
   in agent-end-recovery.ts accumulated across pause/resume cycles without
   resetting, permanently locking out auto-resume after MAX_TRANSIENT_AUTO_RESUMES
   total (not per-cycle) errors. Fix: resetTransientRetryState() is called before
   startAuto() in the resume path. MAX_TRANSIENT_AUTO_RESUMES raised from 3 to 8
   to cover ~30 minutes of sustained provider overload.

3. **ExtensionContext lacks newSession():** The provider-error resume callback
   receives an ExtensionContext (from the agent_end hook), not an
   ExtensionCommandContext. startAuto() overwrote s.cmdCtx with this incomplete
   context, causing 'newSession is not a function' on every subsequent runUnit()
   call. Fix: startAuto() now checks for newSession before overwriting — on
   provider-error resume, it preserves the original ExtensionCommandContext.

Bonus: Session creation timeout (category=timeout) now calls pauseAuto instead
of stopAuto, matching the provider-pause path. Structural errors (TypeError)
still hard-stop to prevent infinite retry loops.

Fixes #2492
2026-04-05 18:15:21 +02:00
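
Note on 44aecc9a3f: fixes 2 and 3 can be sketched in a few lines. `MAX_TRANSIENT_AUTO_RESUMES`, `consecutiveTransientCount`, `resetTransientRetryState` and `newSession` are named in the commit message; the interfaces, the `recordTransientError` and `adoptContext` helpers, and the exact boundary comparison are assumptions made for illustration.

```typescript
// Hypothetical sketch; helper names and the boundary check are assumptions.
const MAX_TRANSIENT_AUTO_RESUMES = 8;
let consecutiveTransientCount = 0;

// Count one transient error; report whether auto-resume is still allowed.
function recordTransientError(): boolean {
  consecutiveTransientCount += 1;
  return consecutiveTransientCount <= MAX_TRANSIENT_AUTO_RESUMES;
}

// Called on the resume path so the limit applies per cycle, not in total.
function resetTransientRetryState(): void {
  consecutiveTransientCount = 0;
}

interface ExtensionContext {
  model?: string;
}

interface ExtensionCommandContext extends ExtensionContext {
  newSession(): void;
}

interface AutoState {
  cmdCtx?: ExtensionCommandContext;
}

// Type guard: only a full command context provides newSession().
function hasNewSession(ctx: ExtensionContext): ctx is ExtensionCommandContext {
  return typeof (ctx as Partial<ExtensionCommandContext>).newSession === "function";
}

// Overwrite the stored context only when the incoming one is complete.
function adoptContext(state: AutoState, ctx: ExtensionContext): void {
  if (hasNewSession(ctx)) state.cmdCtx = ctx;
}
```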
Jeremy McSpadden
a6b7febc5e Merge pull request #3545 from jeremymcs/feat/ollama-native-chat-provider
feat(ollama): native /api/chat provider with full option exposure
2026-04-05 11:05:47 -05:00
Jeremy McSpadden
092d1c0a9e Merge pull request #3546 from jeremymcs/worktree-issue-3541-ollama-native
fix(gsd): prevent LLM from querying gsd.db directly via bash
2026-04-05 10:51:01 -05:00
Jeremy
563fdae8e2 ci: add scanignore for doctor-heal.md false positive
The prompt injection scan flags "You are now responsible" in
doctor-heal.md as role injection (matches "you are now [a-z]").
This is a pre-existing legitimate prompt instruction, not injection.
2026-04-05 10:22:03 -05:00
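
Note on 563fdae8e2: the false positive is easy to reproduce. The sketch below assumes the scan applies the quoted pattern case-insensitively (the flagged text starts with a capital "You") and models the ignore list as a set of file names; the real scanignore format and the `scanForRoleInjection` helper are not shown in the commit and are hypothetical here.

```typescript
// Pattern quoted in the commit message; the "i" flag is an assumption.
const ROLE_INJECTION = /you are now [a-z]/i;

// Return the names of files that match and are not on the ignore list.
function scanForRoleInjection(
  files: Record<string, string>,
  ignore: Set<string>,
): string[] {
  return Object.entries(files)
    .filter(([name, text]) => !ignore.has(name) && ROLE_INJECTION.test(text))
    .map(([name]) => name);
}
```

Ignoring by file keeps the pattern itself strict for every other prompt.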
Jeremy
bc20104a44 perf(gsd): trim promptGuidelines to 1 line to reduce per-turn token cost
promptGuidelines from every registered tool are injected into the system
prompt on every API call. The return shape details were redundant (the
JSON response is self-describing). Keep only the sqlite3 prohibition.
2026-04-05 10:11:15 -05:00
Jeremy
7d74081434 fix(gsd): address Codex adversarial review findings
1. Replace ensureDbOpen() with isDbAvailable() in gsd_milestone_status
   so the read-only tool cannot create/migrate the DB as a side effect
2. Wrap all reads in a BEGIN/COMMIT transaction for snapshot consistency
   under concurrent WAL writes
3. Broaden negative regex in guardrail tests to catch sqlite3 with
   flags, relative paths, absolute paths, and quoted paths
2026-04-05 09:56:19 -05:00
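
Note on 7d74081434: item 3 describes a regex broad enough to catch `sqlite3` invoked with flags and with relative, absolute, or quoted paths. One possible shape for such a pattern is sketched below; the actual regex used in the guardrail tests is not shown in the commit, so `DIRECT_DB_ACCESS` and `mentionsDirectDbAccess` are hypothetical.

```typescript
// Hypothetical pattern: "sqlite3", then anything up to a shell separator,
// then the database file name. Quotes, flags and path prefixes are allowed.
const DIRECT_DB_ACCESS = /\bsqlite3\b[^\n|;&]*gsd\.db/;

function mentionsDirectDbAccess(text: string): boolean {
  return DIRECT_DB_ACCESS.test(text);
}
```

A guardrail test would then assert that no prompt text satisfies `mentionsDirectDbAccess`.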
Jeremy
4d9eb9ead0 fix(gsd): prevent LLM from querying gsd.db directly via bash (#3541)
Add 4-layer defense-in-depth to enforce single-writer WAL discipline:

1. Global anti-pattern in system.md protecting all 35+ auto-mode units
2. DB access safety blocks in 5 high-risk prompts (validate-milestone,
   complete-milestone, doctor-heal, forensics, reassess-roadmap)
3. New gsd_milestone_status read-only query tool giving the LLM a
   sanctioned path to inspect milestone/slice/task state
4. 14 regression tests (8 prompt guardrails + 6 tool coverage)

Closes #3541
2026-04-05 09:43:56 -05:00
Jeremy
4ba2d5a219 feat(ollama): native /api/chat provider with full option exposure
Replace the OpenAI-compat shim with a native Ollama /api/chat streaming
provider that exposes all commonly-used Ollama options and surfaces
inference performance metrics.

Key changes:
- Native NDJSON streaming from /api/chat (no more OpenAI shim)
- Known models send num_ctx from capability table; unknown models defer
  to Ollama's default to avoid OOM on constrained hosts
- Exposes: temperature, top_p, top_k, repeat_penalty, seed, num_gpu,
  keep_alive, num_predict via per-model providerOptions
- Extracts <think>...</think> blocks for reasoning models (deepseek-r1, qwq)
- Surfaces InferenceMetrics (tokens/sec, durations) on AssistantMessage
- Adds remove and show actions to ollama_manage LLM tool
- Adds "ollama-chat" to KnownApi, providerOptions to Model<TApi>
- NDJSON parser uses strict mode for chat (fails on malformed frames)
- Mixed content+tool_call chunks handled independently

Closes #3544
2026-04-05 09:01:40 -05:00
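
Note on 4ba2d5a219: a simplified sketch of strict NDJSON parsing and `<think>` extraction as described in the key changes. It works on complete text rather than a live stream, so it does not handle frames or tags split across network chunks; `parseNdjsonStrict` and `splitThinking` are hypothetical names and the return shapes are assumptions.

```typescript
// Strict mode: any malformed line throws instead of being skipped.
function parseNdjsonStrict(text: string): unknown[] {
  const frames: unknown[] = [];
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue; // blank lines between frames are allowed
    frames.push(JSON.parse(line)); // throws SyntaxError on a malformed frame
  }
  return frames;
}

// Separate <think>...</think> blocks from the visible answer text.
function splitThinking(text: string): { thinking: string; content: string } {
  const thinking: string[] = [];
  const content = text.replace(
    /<think>([\s\S]*?)<\/think>/g,
    (_match: string, inner: string) => {
      thinking.push(inner.trim());
      return "";
    },
  );
  return { thinking: thinking.join("\n"), content: content.trim() };
}
```

A streaming version would additionally buffer the trailing partial line between reads before handing complete lines to the parser.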
Jeremy McSpadden
dcf41154b8 Merge pull request #3540 from Tibsfox/fix/seed-requirements-from-markdown
fix(gsd): seed requirements table from REQUIREMENTS.md on first update
2026-04-05 08:11:59 -05:00
Jeremy McSpadden
5c7e5efcf4 Merge pull request #3539 from Tibsfox/fix/inject-slice-context-into-prompts
fix(gsd): inject S##-CONTEXT.md from slice discussion into all prompt builders
2026-04-05 08:07:19 -05:00