Commit graph

2613 commits

Author SHA1 Message Date
Jeremy
8078755e4b feat(gsd): persistent notification panel with TUI overlay, widget, and web API
Notifications from ctx.ui.notify() and workflow-logger now persist to
.gsd/notifications.jsonl instead of evaporating as transient toasts.

- notification-store: JSONL persistence with 500-entry rotation, atomic
  temp+rename rewrites, ref-counted suppress API, disk-synced counters
- notify-interceptor: WeakSet-guarded monkey-patch on ctx.ui.notify
  installed at session_start and session_switch
- notification-widget: always-on belowEditor strip showing unread count
- notification-overlay: scrollable Ctrl+Alt+N panel with severity filter
- /gsd notifications command: clear, tail, filter subcommands
- workflow-logger: warnings now also persist to notification store
- web API: GET/DELETE /api/notifications with ?countOnly support
- 16 unit tests covering store, suppress, project isolation, resync
2026-04-05 22:13:28 -05:00
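The 500-entry rotation described for the JSONL store in 8078755e4b might look roughly like the sketch below. It is pure string logic with no file I/O. `appendWithRotation` and the `NotificationEntry` shape are hypothetical names, not taken from the commit.

```typescript
// Hypothetical entry shape; the real notification-store schema is not shown in the commit.
interface NotificationEntry {
  id: number;
  severity: "info" | "warning" | "error";
  message: string;
}

const MAX_ENTRIES = 500; // rotation limit named in the commit message

// Append one entry to JSONL text and keep only the newest `max` entries.
function appendWithRotation(
  jsonl: string,
  entry: NotificationEntry,
  max: number = MAX_ENTRIES,
): string {
  // One JSON document per line; ignore blank lines left by trailing newlines.
  const lines = jsonl.split("\n").filter((line) => line.trim() !== "");
  lines.push(JSON.stringify(entry));
  // Drop the oldest lines once the limit is exceeded.
  const kept = lines.slice(Math.max(0, lines.length - max));
  return kept.join("\n") + "\n";
}
```

In a store that follows the commit's "atomic temp+rename" note, the returned text would presumably be written to a temporary file and renamed over the target, so readers never observe a half-written log.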
github-actions[bot]
f6a1549edd release: v2.64.0 2026-04-06 02:11:42 +00:00
Jeremy McSpadden
24e0856950 Merge pull request #3583 from jeremymcs/fix/welcome-screen-full-width
fix(ui): remove 200-column cap on welcome screen width
2026-04-05 18:43:02 -05:00
Jeremy
0d92f2fbba test(ui): add regression test for full-width separator lines
Verifies separator lines extend to the full terminal width when
the terminal is wider than 200 columns.
2026-04-05 17:57:23 -05:00
Jeremy
efb4e21205 fix(ui): remove 200-column cap on welcome screen width
The welcome screen lines stopped short on wide terminals because
termWidth was capped at 200 columns. Remove the cap so separator
lines extend to the full terminal width.
2026-04-05 17:41:21 -05:00
Jeremy McSpadden
d1b7f6f85c Merge pull request #3570 from Tibsfox/fix/headless-query-extension-drift
fix(headless): sync resources and use agent dir for query
2026-04-05 16:05:56 -05:00
Jeremy McSpadden
2298b9acab Merge pull request #3576 from jeremymcs/feat/llm-safety-harness
feat(gsd): LLM safety harness for auto-mode damage control
2026-04-05 16:03:16 -05:00
Jeremy McSpadden
46b18818a6 Merge pull request #3561 from Tibsfox/fix/ollama-fallback-provider-ready
fix(pi-coding-agent): make Ollama visible to fallback resolver
2026-04-05 15:59:20 -05:00
Jeremy McSpadden
d666fea3a9 Merge pull request #3569 from Tibsfox/fix/update-diagnostics
fix(cli): show latest version and bypass npm cache in update check
2026-04-05 15:57:41 -05:00
Jeremy McSpadden
e17b50b8a4 Merge pull request #3577 from jeremymcs/fix/hardcoded-agent-paths-3575
fix(gsd): replace hardcoded agent skill paths with dynamic resolution
2026-04-05 15:56:49 -05:00
Jeremy
8d11e5d507 test: add regression tests for adversarial review fixes (#3576)
- git-checkpoint: rollback on checked-out branch, detached HEAD, ref cleanup
- ollama streaming: terminal done:true chunk content preservation
- provider registration: preflush clears queue to prevent double registration
2026-04-05 15:52:26 -05:00
Jeremy
ac20eab501 fix: address adversarial review findings for #3576
- Use `git reset --hard <sha>` for rollback instead of `git branch -f`
  which fails on checked-out branches and worktrees
- Clear pendingProviderRegistrations after preflush to prevent duplicate
  registration when bindCore() runs
- Process Ollama stream content on terminal `done:true` chunks to avoid
  truncating trailing assistant text
2026-04-05 15:48:25 -05:00
Jeremy McSpadden
369303f82f Merge pull request #3560 from Tibsfox/fix/ollama-stream-simple-crash
fix(ollama): use apiKey auth mode to avoid streamSimple crash
2026-04-05 15:22:38 -05:00
Jeremy McSpadden
adeedef328 Merge pull request #3562 from jeremymcs/fix/harden-flat-rate-guard
fix(gsd): harden flat-rate routing guard against alias/resolution gaps
2026-04-05 15:16:01 -05:00
Jeremy McSpadden
d3a38bb771 Merge pull request #3552 from Tibsfox/fix/disable-routing-copilot
fix(gsd): disable dynamic model routing for flat-rate providers
2026-04-05 15:15:50 -05:00
Jeremy
6d19cd6baf fix(gsd): replace hardcoded agent skill paths with dynamic resolution (#3575)
The system prompt hardcoded ~/.gsd/agent/skills/ paths for bundled skills,
causing ENOENT loops when skills weren't installed at those locations. The
auto-mode loop treated ENOENT as transient and retried indefinitely.

- Replace hardcoded skill paths in system.md with {{bundledSkillsTable}} template
  variable, resolved dynamically via resolveSkillReference() at runtime
- Replace hardcoded templates dir path with {{templatesDir}} variable
- Add buildBundledSkillsTable() to system-context.ts — only includes skills
  that actually exist on disk
- Export getTemplatesDir() from prompt-loader.ts
- Add Rule 4 to detect-stuck.ts: same ENOENT path seen twice in the sliding
  window triggers immediate stuck detection (missing files don't self-heal)
- Add 4 tests for Rule 4 coverage

Closes #3575
2026-04-05 15:12:19 -05:00
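Rule 4 from 6d19cd6baf (the same ENOENT path seen twice in the sliding window) can be sketched as a single pass over recent errors. The `ToolError` shape and the function name `findRepeatedEnoent` are assumptions for illustration; the actual detect-stuck.ts types are not shown in the commit.

```typescript
// Hypothetical record of one tool failure inside the sliding window.
interface ToolError {
  code: string; // e.g. "ENOENT"
  path: string;
}

// Returns the first path that produced ENOENT at least twice in the window,
// or null when no path repeats.
function findRepeatedEnoent(window: ToolError[]): string | null {
  const seen = new Set<string>();
  for (const err of window) {
    if (err.code !== "ENOENT") continue; // other error codes are not covered by this rule
    if (seen.has(err.path)) return err.path; // second sighting: treat as stuck
    seen.add(err.path);
  }
  return null;
}
```

Two sightings are enough because, as the commit notes, a missing file does not reappear on its own, so retrying only burns turns.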
Jeremy
0d3ef6b545 feat(gsd): add LLM safety harness for auto-mode damage control
Unified safety layer that monitors, validates, and constrains LLM behavior
during auto-mode execution. All components use warn-and-continue policy by
default (log violations, notify user, keep going).

Components:
- Evidence collector: real-time bash/write/edit tool call tracking
- Destructive command guard: classifies 10 dangerous patterns (rm -rf, force push, etc.)
- File change validator: compares git diff against task plan's expected output
- Evidence cross-reference: detects tasks marked complete with zero bash calls
- Git checkpoint: pre-unit refs/gsd/checkpoints/ for optional rollback
- Content validator: minimum quality checks on plans and summaries
- Timeout scale cap: limits timeout multiplier to 6x (was unlimited)

New preference: safety_harness with per-component toggles.
Enabled by default, auto_rollback off by default.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 15:00:06 -05:00
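A minimal sketch of two harness components from 0d3ef6b545, the destructive command guard and the timeout scale cap, under the warn-and-continue policy. Only the two patterns named in the commit (rm -rf, force push) are included; the other eight are not listed there, and the regular expressions, type names, and function names below are assumptions.

```typescript
interface Violation {
  label: string;
  command: string;
}

// Illustrative patterns only; the commit does not show the real expressions.
const DANGEROUS_PATTERNS: { label: string; pattern: RegExp }[] = [
  {
    label: "recursive force delete",
    pattern: /\brm\s+-[a-zA-Z]*r[a-zA-Z]*f|\brm\s+-[a-zA-Z]*f[a-zA-Z]*r/,
  },
  {
    label: "force push",
    pattern: /\bgit\s+push\b.*(--force\b|\s-f\b)/,
  },
];

// Warn-and-continue: report what matched, but never throw or block the command.
function classifyCommand(command: string): Violation[] {
  return DANGEROUS_PATTERNS
    .filter(({ pattern }) => pattern.test(command))
    .map(({ label }) => ({ label, command }));
}

const MAX_TIMEOUT_SCALE = 6; // cap named in the commit message

// Clamp a requested timeout multiplier to the cap.
function capTimeoutScale(requested: number): number {
  return Math.min(requested, MAX_TIMEOUT_SCALE);
}
```

Returning a list of violations instead of throwing leaves the policy decision (log, notify, or roll back) to the caller, which fits the per-component toggles the commit describes.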
Tibsfox
4b68e8c9d9 test(headless): add extension path alignment test 2026-04-05 11:57:40 -07:00
Tibsfox
469acf53af test(cli): add update diagnostics regression test 2026-04-05 11:57:25 -07:00
Tibsfox
d4b6eb714c test(pi-coding-agent): add custom provider registration test 2026-04-05 11:56:29 -07:00
Tibsfox
dbe6f9d292 test(ollama): add authMode regression test 2026-04-05 11:56:15 -07:00
Jeremy McSpadden
3e09184493 Merge pull request #3566 from jeremymcs/fix/complete-slice-string-coercion
fix(gsd): coerce string arrays to objects in complete-slice/task tools
2026-04-05 13:44:40 -05:00
Tibsfox
523fcd89a8 fix(headless): sync resources and use agent dir for query 2026-04-05 11:35:11 -07:00
Jeremy
0b7764349c chore(gsd): remove copyright line from test file 2026-04-05 13:33:13 -05:00
Tibsfox
3bcd55ccfd fix(cli): show latest version and bypass npm cache in update check 2026-04-05 11:33:03 -07:00
Jeremy
e210b7efdf fix(gsd): follow CONTRIBUTING standards for #3565
- Move new coercion tests to standalone file using node:test +
  node:assert/strict (per CONTRIBUTING testing standards)
- Remove tests from legacy complete-slice.test.ts to avoid mixing
  test frameworks in the same file
2026-04-05 13:32:56 -05:00
Jeremy
6046a31c6f fix(gsd): address Codex adversarial review findings for #3565
- verificationEvidence coercion now uses sentinel values (exitCode: -1,
  verdict: "unknown") instead of fabricating passing results
- String coercion for requirements fields now parses "ID — detail"
  delimiter format to preserve semantic payload
- Added regression tests for sentinel values and delimiter parsing

Closes #3565
2026-04-05 13:30:09 -05:00
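The two coercion fixes in 6046a31c6f can be sketched as follows. The sentinel values (exitCode: -1, verdict: "unknown") and the "ID — detail" delimiter come from the commit message; the interface shapes, the `command` field, and the function names are hypothetical.

```typescript
// Hypothetical object shapes; only exitCode and verdict are named in the commit.
interface RequirementRef {
  id: string;
  detail: string;
}

interface VerificationEvidence {
  command: string;
  exitCode: number;
  verdict: string;
}

const DELIMITER = " — "; // "ID — detail" format from the commit message

// Accept either the object shape or a plain string and normalize to an object.
function coerceRequirement(input: string | RequirementRef): RequirementRef {
  if (typeof input !== "string") return input;
  const idx = input.indexOf(DELIMITER);
  if (idx === -1) return { id: input.trim(), detail: "" };
  return {
    id: input.slice(0, idx).trim(),
    detail: input.slice(idx + DELIMITER.length).trim(),
  };
}

// A bare string carries no result, so mark it unknown rather than passing.
function coerceEvidence(input: string | VerificationEvidence): VerificationEvidence {
  if (typeof input !== "string") return input;
  return { command: input, exitCode: -1, verdict: "unknown" };
}
```

Using sentinels keeps coerced evidence distinguishable from evidence that was actually observed, which is the point of the review finding.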
Jeremy
0742cf3493 fix(gsd): coerce string arrays to objects in complete-slice/task tools (#3565)
LLMs sometimes pass plain strings instead of the expected object shape
for array fields like filesModified and requires, causing TypeBox
validation to reject the input before the execute function runs. This
adds Type.Union schemas to accept both formats and normalizes strings
to objects with sensible defaults in the execute functions.

Closes #3565
2026-04-05 13:23:30 -05:00
Jeremy
3a1e9e3416 fix(gsd): harden flat-rate routing guard against alias/resolution gaps
The flat-rate provider guard from #3552 can fail open in two scenarios:

1. Provider alias mismatch — isFlatRateProvider only matched the exact
   string "github-copilot", but "copilot" appears as a provider alias
   in the codebase. Case variations could also bypass the check.
   Fix: add "copilot" alias and lowercase input before set membership.

2. Unresolved primary model — when resolveModelId returns undefined
   (stale model ID, registry mismatch), the guard was skipped entirely,
   allowing dynamic routing to downgrade models on a flat-rate backend.
   Fix: fall back to autoModeStartModel.provider and ctx.model.provider
   when primary resolution fails, disabling routing if either indicates
   a flat-rate provider.

Ref: #3453
2026-04-05 13:09:44 -05:00
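Both fixes in 3a1e9e3416 can be illustrated with a short sketch. `isFlatRateProvider` is named in the commit, but its signature here is assumed, and `shouldDisableRouting` with its three parameters is a hypothetical stand-in for the guard that consults the resolved model, autoModeStartModel.provider, and ctx.model.provider.

```typescript
// Alias set from the commit: the canonical name plus the "copilot" alias.
const FLAT_RATE_PROVIDERS = new Set(["github-copilot", "copilot"]);

// Lowercase before the set lookup so case variations cannot bypass the check.
function isFlatRateProvider(provider: string | undefined): boolean {
  if (!provider) return false;
  return FLAT_RATE_PROVIDERS.has(provider.toLowerCase());
}

// Hypothetical guard: when the primary model did not resolve, fall back to the
// start-model and session providers instead of skipping the check.
function shouldDisableRouting(
  resolvedProvider: string | undefined,
  startModelProvider: string | undefined,
  sessionProvider: string | undefined,
): boolean {
  if (resolvedProvider !== undefined) return isFlatRateProvider(resolvedProvider);
  return isFlatRateProvider(startModelProvider) || isFlatRateProvider(sessionProvider);
}
```

The fallback branch is what turns the guard from fail-open into fail-closed: an unresolved model no longer implies that routing is allowed.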
Tibsfox
935cc9a464 fix(pi-coding-agent): register models.json providers and await Ollama probe in headless mode 2026-04-05 11:09:08 -07:00
Tibsfox
352dd17e76 fix(ollama): use apiKey auth mode to avoid streamSimple crash 2026-04-05 11:06:38 -07:00
Tibsfox
9ab675a843 fix(gsd): disable dynamic model routing for flat-rate providers 2026-04-05 10:24:52 -07:00
Jeremy McSpadden
a6b7febc5e Merge pull request #3545 from jeremymcs/feat/ollama-native-chat-provider
feat(ollama): native /api/chat provider with full option exposure
2026-04-05 11:05:47 -05:00
Jeremy McSpadden
092d1c0a9e Merge pull request #3546 from jeremymcs/worktree-issue-3541-ollama-native
fix(gsd): prevent LLM from querying gsd.db directly via bash
2026-04-05 10:51:01 -05:00
Jeremy
563fdae8e2 ci: add scanignore for doctor-heal.md false positive
The prompt injection scan flags "You are now responsible" in
doctor-heal.md as role injection (matches "you are now [a-z]").
This is a pre-existing legitimate prompt instruction, not injection.
2026-04-05 10:22:03 -05:00
Jeremy
bc20104a44 perf(gsd): trim promptGuidelines to 1 line to reduce per-turn token cost
promptGuidelines from every registered tool are injected into the system
prompt on every API call. The return shape details were redundant (the
JSON response is self-describing). Keep only the sqlite3 prohibition.
2026-04-05 10:11:15 -05:00
Jeremy
7d74081434 fix(gsd): address Codex adversarial review findings
1. Replace ensureDbOpen() with isDbAvailable() in gsd_milestone_status
   so the read-only tool cannot create/migrate the DB as a side effect
2. Wrap all reads in a BEGIN/COMMIT transaction for snapshot consistency
   under concurrent WAL writes
3. Broaden negative regex in guardrail tests to catch sqlite3 with
   flags, relative paths, absolute paths, and quoted paths
2026-04-05 09:56:19 -05:00
Jeremy
4d9eb9ead0 fix(gsd): prevent LLM from querying gsd.db directly via bash (#3541)
Add 4-layer defense-in-depth to enforce single-writer WAL discipline:

1. Global anti-pattern in system.md protecting all 35+ auto-mode units
2. DB access safety blocks in 5 high-risk prompts (validate-milestone,
   complete-milestone, doctor-heal, forensics, reassess-roadmap)
3. New gsd_milestone_status read-only query tool giving the LLM a
   sanctioned path to inspect milestone/slice/task state
4. 14 regression tests (8 prompt guardrails + 6 tool coverage)

Closes #3541
2026-04-05 09:43:56 -05:00
Jeremy
4ba2d5a219 feat(ollama): native /api/chat provider with full option exposure
Replace the OpenAI-compat shim with a native Ollama /api/chat streaming
provider that exposes all commonly-used Ollama options and surfaces
inference performance metrics.

Key changes:
- Native NDJSON streaming from /api/chat (no more OpenAI shim)
- Known models send num_ctx from capability table; unknown models defer
  to Ollama's default to avoid OOM on constrained hosts
- Exposes: temperature, top_p, top_k, repeat_penalty, seed, num_gpu,
  keep_alive, num_predict via per-model providerOptions
- Extracts <think>...</think> blocks for reasoning models (deepseek-r1, qwq)
- Surfaces InferenceMetrics (tokens/sec, durations) on AssistantMessage
- Adds remove and show actions to ollama_manage LLM tool
- Adds "ollama-chat" to KnownApi, providerOptions to Model<TApi>
- NDJSON parser uses strict mode for chat (fails on malformed frames)
- Mixed content+tool_call chunks handled independently

Closes #3544
2026-04-05 09:01:40 -05:00
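As a rough illustration of two items in 4ba2d5a219, the sketch below parses /api/chat NDJSON frames in strict mode and separates `<think>...</think>` blocks from the answer text. It works on an already-buffered string and does no network I/O; `ChatFrame` models only the `message.content` and `done` fields, and both function names are hypothetical.

```typescript
// Minimal subset of an Ollama /api/chat stream frame used by this sketch.
interface ChatFrame {
  message?: { content?: string };
  done: boolean;
}

// Strict mode: a malformed line raises instead of being skipped.
function collectContent(ndjson: string): string {
  let text = "";
  for (const rawLine of ndjson.split("\n")) {
    const line = rawLine.trim();
    if (line === "") continue;
    let frame: ChatFrame;
    try {
      frame = JSON.parse(line) as ChatFrame;
    } catch {
      throw new Error(`Malformed NDJSON frame: ${line}`);
    }
    // Append content before checking `done`, so text carried on the terminal
    // frame is not truncated.
    text += frame.message?.content ?? "";
    if (frame.done) break;
  }
  return text;
}

// Split reasoning-model output into thinking text and the visible answer.
function splitThinking(text: string): { thinking: string; answer: string } {
  const parts: string[] = [];
  const answer = text.replace(
    /<think>([\s\S]*?)<\/think>/g,
    (_match: string, inner: string) => {
      parts.push(inner.trim());
      return "";
    },
  );
  return { thinking: parts.join("\n"), answer: answer.trim() };
}
```

Reading content from the `done: true` frame mirrors the later fix in ac20eab501, which processes stream content on terminal chunks.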
Jeremy McSpadden
dcf41154b8 Merge pull request #3540 from Tibsfox/fix/seed-requirements-from-markdown
fix(gsd): seed requirements table from REQUIREMENTS.md on first update
2026-04-05 08:11:59 -05:00
Jeremy McSpadden
5c7e5efcf4 Merge pull request #3539 from Tibsfox/fix/inject-slice-context-into-prompts
fix(gsd): inject S##-CONTEXT.md from slice discussion into all prompt builders
2026-04-05 08:07:19 -05:00
Tibsfox
58c19ed48d fix(gsd): seed requirements table from REQUIREMENTS.md on first update
When requirements are authored in REQUIREMENTS.md during the discussion
phase (the standard workflow), the DB requirements table stays empty.
gsd_requirement_update then fails with not_found for every requirement
at milestone completion, burning tokens on retries.

When updateRequirementInDb encounters a requirement ID not in the DB,
it now parses REQUIREMENTS.md via parseRequirementsSections() and seeds
all requirements into the DB before retrying the lookup. This preserves
the original content (class, description, why, source, validation)
instead of creating an empty skeleton.

The seeding is:
- Lazy: only runs on first miss, not on every update
- Collision-safe: skips IDs already in the DB
- Non-blocking: falls through to skeleton if REQUIREMENTS.md is
  missing or unparseable

Adds 1 regression test verifying that updating R005 when the DB is
empty seeds all 3 requirements from REQUIREMENTS.md with their
original content preserved.

Closes #3346
2026-04-05 05:44:06 -07:00
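The lazy, collision-safe, non-blocking seeding described in 58c19ed48d might be sketched like this, with a `Map` standing in for the DB table and a callback standing in for parseRequirementsSections(). The `Requirement` fields, the `updateRequirement` name, and the callback signature are assumptions.

```typescript
// Hypothetical row shape; the real table has more columns (class, why, source, ...).
interface Requirement {
  id: string;
  description: string;
  status: string;
}

function updateRequirement(
  db: Map<string, Requirement>,
  id: string,
  patch: { status?: string; description?: string },
  parseMarkdown: () => Requirement[] | null, // returns null when the file is missing or unparseable
): Requirement {
  // Lazy: seed only when the lookup misses.
  if (!db.has(id)) {
    const parsed = parseMarkdown();
    if (parsed !== null) {
      for (const req of parsed) {
        // Collision-safe: never overwrite rows that already exist.
        if (!db.has(req.id)) db.set(req.id, req);
      }
    }
  }
  // Non-blocking: fall through to an empty skeleton if seeding found nothing.
  const existing = db.get(id) ?? { id, description: "", status: "active" };
  const updated: Requirement = {
    id,
    description: patch.description ?? existing.description,
    status: patch.status ?? existing.status,
  };
  db.set(id, updated);
  return updated;
}
```

The usage mirrors the regression test in the commit: updating R005 against an empty table seeds all parsed requirements and keeps their original descriptions.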
Tibsfox
01a1295e4d fix(gsd): inject S##-CONTEXT.md from slice discussion into all prompt builders
S##-CONTEXT.md files produced by /gsd discuss (require_slice_discussion)
are never injected into downstream prompt builders. Discussed
requirements, acceptance criteria, and design decisions are silently
dropped — the researcher, planner, completer, replanner, and
reassessor never see them.

Add resolveSliceFile(base, mid, sid, "CONTEXT") + inlineFileOptional()
to all 5 affected builders:

  1. buildResearchSlicePrompt
  2. buildPlanSlicePrompt
  3. buildCompleteSlicePrompt
  4. buildReplanSlicePrompt
  5. buildReassessRoadmapPrompt

The slice CONTEXT is placed immediately after the roadmap and before
other context (research, decisions, requirements) so the discussed
scope is visible before detailed planning artifacts.

Uses the existing inlineFileOptional() pattern — if no S##-CONTEXT.md
exists, nothing is injected (zero cost for projects not using slice
discussion).

Adds 5 regression tests verifying each builder resolves and inlines
the slice CONTEXT file.

Closes #3452
2026-04-05 05:41:14 -07:00
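The optional-inline pattern that 01a1295e4d relies on can be sketched without touching the filesystem by passing in a reader function. `inlineFileOptional` is named in the commit, but this signature, the `FileReader` type, the `buildPromptSections` helper, and the section heading format are all assumptions.

```typescript
// Reader abstraction: returns file contents, or null when the file does not exist.
type FileReader = (path: string) => string | null;

// Emit a labelled section when the file exists; emit nothing otherwise.
function inlineFileOptional(read: FileReader, path: string, label: string): string {
  const content = read(path);
  if (content === null) return "";
  return `## ${label}\n\n${content.trim()}\n\n`;
}

// Hypothetical builder: slice CONTEXT goes right after the roadmap and before
// the other context, matching the ordering described in the commit.
function buildPromptSections(
  read: FileReader,
  paths: { roadmap: string; sliceContext: string; research: string },
): string {
  return (
    inlineFileOptional(read, paths.roadmap, "Roadmap") +
    inlineFileOptional(read, paths.sliceContext, "Slice Context") +
    inlineFileOptional(read, paths.research, "Research")
  );
}
```

Because a missing file contributes an empty string, projects that never run slice discussion get the same prompt as before.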
Jeremy McSpadden
f64a7c517d Merge pull request #3538 from jeremymcs/docs/documentation-refresh
docs: refresh documentation for v2.63.0
2026-04-05 07:40:18 -05:00
Jeremy McSpadden
8eba02f59f Merge pull request #3537 from jeremymcs/fix/model-fallback-race
fix(pi-coding-agent): resolve model fallback race that ignores configured provider
2026-04-05 07:38:21 -05:00
Jeremy
f4b87bf940 docs: refresh documentation for v2.63.0
Update What's New section from v2.52 to v2.63, expand native engine
docs to cover all 20+ modules, add missing extensions and ADRs to
indexes, update version references and Node.js requirements.
2026-04-05 07:37:31 -05:00
Jeremy
e3cd354d58 fix(cli): guard model re-apply against session restore and async rejection
Address Codex adversarial review findings:

1. Only re-apply the validated model when createAgentSession() signals
   a fallback (modelFallbackMessage is truthy). This prevents silently
   overriding the persisted model of resumed conversations.

2. Use modelRegistry.getAvailable() instead of find() to ensure the
   model's provider is request-ready before calling setModel().

3. Await session.setModel() and wrap in try/catch so provider auth
   failures don't surface as unhandled promise rejections at startup.

Applies to both print-mode and interactive-mode startup paths.
2026-04-05 07:27:26 -05:00
Jeremy
9fe13da3f2 fix(pi-coding-agent): resolve model fallback race that ignores configured provider (#3534)
Extension-provided models (e.g. claude-code/*) were unavailable during
findInitialModel() because pendingProviderRegistrations had not been
flushed yet, causing the fallback chain to select Google Gemini even
when the user explicitly configured claude-code as their default.

Three compounding issues fixed:

(A) Flush pendingProviderRegistrations in createAgentSession() before
    findInitialModel() runs, so extension models are in the registry
    when initial model selection happens.

(B) Re-apply the validated model to the session after
    validateConfiguredModel() in both print and interactive CLI paths.
    Previously, validation updated settingsManager but never called
    session.setModel(), leaving the session on the wrong model.

(C) Update defaultModelPerProvider.anthropic from "claude-opus-4-6[1m]"
    to "claude-opus-4-6" — the [1m] variant was removed from the model
    registry when the base model was upgraded to 1M context, causing the
    Anthropic fallback to silently fail and skip to Google.

Closes #3534
2026-04-05 07:14:24 -05:00
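Fix (A) in 9fe13da3f2 is an ordering problem, which the following sketch tries to make concrete. The `ModelRegistry` class, its methods, and this `findInitialModel` signature are simplified stand-ins and do not reproduce the real pi-coding-agent API.

```typescript
interface Model {
  provider: string;
  id: string;
}

// Simplified registry with a queue of not-yet-registered extension models.
class ModelRegistry {
  private models: Model[] = [];
  private pending: Model[] = [];

  register(model: Model): void {
    this.models.push(model);
  }

  queue(model: Model): void {
    this.pending.push(model);
  }

  // Move queued models into the registry, then clear the queue so a later
  // flush cannot register the same model twice.
  flushPending(): void {
    this.models.push(...this.pending);
    this.pending = [];
  }

  find(provider: string): Model | undefined {
    return this.models.find((m) => m.provider === provider);
  }

  get size(): number {
    return this.models.length;
  }
}

// Pick the configured provider if available, otherwise walk the fallback chain.
function findInitialModel(
  registry: ModelRegistry,
  configured: string,
  fallbacks: string[],
): Model | undefined {
  return (
    registry.find(configured) ??
    fallbacks.map((p) => registry.find(p)).find((m) => m !== undefined)
  );
}
```

Calling `findInitialModel` before `flushPending` reproduces the bug (the fallback wins even though the configured provider was queued); flushing first selects the configured model.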
Tom Boucher
261e2a6d5f fix(detection): add xcodegen and Xcode bundle support to project detection (#1882)
* fix: detect Xcode bundles by suffix scan in worktree health check (#1882)

Xcode project directories have project-specific names (e.g. Sudokuxyz.xcodeproj)
that cannot be matched by the exact-filename PROJECT_FILES list. Add a
readdirSync suffix scan for *.xcodeproj and *.xcworkspace so iOS/macOS projects
are not incorrectly treated as greenfield when the health check runs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: replace empty catch with debugLog in Xcode bundle scan

The silent-catch-diagnostics test (#3348) bans empty catch blocks in
migrated auto-mode files. Replace the bare `catch { /* best-effort */ }`
with a debugLog call to satisfy the workflow-logger requirement.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 08:13:42 -04:00
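The suffix scan from 261e2a6d5f reduces to a check over directory entry names. In the sketch below the names are passed in as an array (the commit obtains them with readdirSync); the function name `hasXcodeBundle` is hypothetical.

```typescript
// Bundle suffixes named in the commit message.
const XCODE_SUFFIXES = [".xcodeproj", ".xcworkspace"];

// True when any entry name ends with an Xcode bundle suffix. Exact-filename
// lists cannot match these because the prefix is project-specific.
function hasXcodeBundle(entries: string[]): boolean {
  return entries.some((name) =>
    XCODE_SUFFIXES.some((suffix) => name.endsWith(suffix)),
  );
}
```

For example, a directory listing containing `Sudokuxyz.xcodeproj` would count as an existing project rather than a greenfield one.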
Tom Boucher
4cb6252b9b fix(perf): share jiti module cache across extension loads (#3308)
* fix(perf): share jiti module cache across extension loads (#2108)

Each extension was creating a new jiti instance with moduleCache: false,
causing shared dependencies to be recompiled for every extension. Use a
shared singleton with moduleCache: true so shared modules are compiled once.

Export resetExtensionLoaderCache() for test teardown and explicit reload.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: correct loader path in extension-load-perf test (4 → 3 levels up)

The test file is at src/tests/ (2 levels deep from repo root), so
fileURLToPath(import.meta.url) + 3x'..' reaches the repo root.
Using 4 levels exits the repo into the GitHub Actions workspace parent,
causing ERR_MODULE_NOT_FOUND for loader.js in dist/.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: use process.cwd() for loader path in perf test (source/compiled portability)

import.meta.url resolves to different depths in source (src/tests/) vs compiled
(dist-test/src/tests/), so relative '../' navigation produces the wrong path in
the build phase. process.cwd() is always the repo root in CI regardless of
where the test file is compiled to.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: trek-e <trek-e@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 07:59:17 -04:00