Commit graph

2853 commits

Author SHA1 Message Date
Jeremy McSpadden
7b7cad0de9 Merge pull request #3586 from jeremymcs/fix/elapsed-timer-lost-on-resume
fix(gsd): persist elapsed timer across session resume
2026-04-05 21:21:37 -05:00
github-actions[bot]
f6a1549edd release: v2.64.0
2026-04-06 02:11:42 +00:00
Jeremy
b6794956f8 test(gsd): add regression tests for autoStartTime persistence (#3585)
Source-code guards verifying autoStartTime is saved in paused-session.json,
restored on both resume paths, and falls back to Date.now() when missing.
2026-04-05 21:07:58 -05:00
Jeremy
a0a20599a0 fix(gsd): persist autoStartTime across session resume so elapsed timer survives /exit
autoStartTime was never saved to paused-session.json, so cross-session
resume always started with autoStartTime=0 and the widget showed no
elapsed timer. Now saved on pause, restored on resume with Date.now()
fallback for old files.

Also fixes widget layout: elapsed/ETA stays on the header line above
the milestone/branch info line.
2026-04-05 21:04:05 -05:00
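The save-and-restore behaviour described in a0a20599a0 can be sketched roughly as follows. This is a minimal illustration, not the project's code: the `PausedSessionFile` interface and both function names are hypothetical, and only the `autoStartTime` field and the `Date.now()` fallback come from the commit message.

```typescript
// Hypothetical shape of paused-session.json; only autoStartTime is named in the commit.
interface PausedSessionFile {
  autoStartTime?: number; // epoch milliseconds; absent in files written before the fix
}

// On pause: write the timer start into the serialized session.
function serializePausedSession(autoStartTime: number): string {
  const file: PausedSessionFile = { autoStartTime };
  return JSON.stringify(file);
}

// On resume: reuse the saved start time, or fall back to "now" for old files.
// The clock is injectable (assumption) so the fallback can be tested.
function restoreAutoStartTime(json: string, now: () => number = Date.now): number {
  const parsed = JSON.parse(json) as PausedSessionFile;
  const saved = parsed.autoStartTime;
  const usable = typeof saved === "number" && Number.isFinite(saved) && saved > 0;
  return usable ? saved : now();
}
```

With this shape, a file that lacks the field restarts the elapsed timer at resume time instead of leaving it at zero.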
Alan Alwakeel
9711ac3efa test(gsd): add pause wiring and integration tests for enhanced verification
- pre-execution-pause-wiring.test.ts: Tests blocking check → pause control flow
- enhanced-verification-integration.test.ts: End-to-end integration coverage

Verifies that blocking pre-execution failures trigger auto-mode pause and
that the enhanced verification pipeline integrates correctly with existing
verification infrastructure.
2026-04-05 20:25:27 -04:00
Alan Alwakeel
8f2c544a91 fix(gsd): add enhanced_verification preferences to mergePreferences
The enhanced_verification_* preferences were validated and typed but not
included in mergePreferences(), causing project-level overrides to be
silently ignored. This fix ensures project preferences properly merge
with user-level defaults.
2026-04-05 19:47:34 -04:00
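The class of bug fixed in 8f2c544a91 can be illustrated with a field-by-field merge, where any key that is not listed is silently dropped from project-level overrides. This is a sketch under assumptions: `VerificationPreferences` and `enhanced_verification_pre` are hypothetical, `enhanced_verification_strict` is the one key named in this log, and the real `mergePreferences()` may be structured differently.

```typescript
// Hypothetical preference shape for illustration only.
interface VerificationPreferences {
  enhanced_verification_pre?: boolean;    // hypothetical key
  enhanced_verification_strict?: boolean; // named in the commit log
}

// Project-level values override user-level defaults, but only for keys
// that are explicitly handled here. Forgetting a key means its project
// override never takes effect.
function mergePreferences(
  user: VerificationPreferences,
  project: VerificationPreferences,
): VerificationPreferences {
  const merged: VerificationPreferences = { ...user };
  if (project.enhanced_verification_pre !== undefined) {
    merged.enhanced_verification_pre = project.enhanced_verification_pre;
  }
  if (project.enhanced_verification_strict !== undefined) {
    merged.enhanced_verification_strict = project.enhanced_verification_strict;
  }
  return merged;
}
```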
Alan Alwakeel
6ee0f40083 feat(gsd): wire blocking behavior and strict mode for enhanced verification
Integrates pre/post-execution checks into auto-mode:
- auto-verification.ts: runEnhancedPreChecks/runEnhancedPostChecks integration
- auto-post-unit.ts: pause control flow when blocking checks fail
- Respects enhanced_verification_strict preference for blocking vs warning

Control flow: blocking failures trigger auto-mode pause for user review.
2026-04-05 19:47:34 -04:00
Alan Alwakeel
a3d08f7125 feat(gsd): add post-execution cross-task consistency checks
Adds 3 post-execution checks that run after task completion:
- Import resolution: verifies relative imports resolve to existing files
- Export verification: confirms exported symbols are defined
- Type consistency: validates function return types match declarations

All checks follow the permissive-by-default pattern (R012) - warnings don't block.
2026-04-05 19:46:31 -04:00
Alan Alwakeel
992b321b63 feat(gsd): add pre-execution plan verification checks
Adds 4 pre-execution checks that run before each task:
- File ops review: surfaces create/edit/delete intent for manual review
- Read-before-create guard: fails when plan reads a file before creating it
- Package existence: verifies npm packages exist before install attempts
- Interface contract: warns on mismatched function signatures

Includes preference types and validation for enhanced_verification settings.
2026-04-05 19:46:31 -04:00
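One way to picture the read-before-create guard from 992b321b63 is a single pass over the planned file operations. The `PlannedFileOp` type and the function name are hypothetical; the commit only states that the guard fails when a plan reads a file before creating it.

```typescript
// Hypothetical representation of one step in a task plan.
type PlannedFileOp = { kind: "read" | "create"; path: string };

// Return every path that the plan creates after having already read it.
function findReadBeforeCreate(ops: PlannedFileOp[]): string[] {
  const readSoFar = new Set<string>();
  const violations: string[] = [];
  for (const op of ops) {
    if (op.kind === "read") {
      readSoFar.add(op.path);
    } else if (readSoFar.has(op.path)) {
      // The file did not exist yet when the plan tried to read it.
      violations.push(op.path);
    }
  }
  return violations;
}
```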
Jeremy McSpadden
24e0856950 Merge pull request #3583 from jeremymcs/fix/welcome-screen-full-width
fix(ui): remove 200-column cap on welcome screen width
2026-04-05 18:43:02 -05:00
Jeremy
0d92f2fbba test(ui): add regression test for full-width separator lines
Verifies separator lines extend to the full terminal width when
the terminal is wider than 200 columns.
2026-04-05 17:57:23 -05:00
Jeremy
efb4e21205 fix(ui): remove 200-column cap on welcome screen width
The welcome screen lines stopped short on wide terminals because
termWidth was capped at 200 columns. Remove the cap so separator
lines extend to the full terminal width.
2026-04-05 17:41:21 -05:00
Nils Reeh
3cf0094559 test(browser-tools): add regression tests for optional sharp lazy-load
Satisfies the CI test requirement for the capture.ts source change.

Two describe blocks:
- Static: verifies the lazy-load pattern is structurally correct in
  source (no top-level import, getSharp helper present, null guard present)
- Behavioral: verifies constrainScreenshot returns the raw buffer
  unchanged when sharp is null (unavailable platform / bunx)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 23:45:01 +02:00
Nils Reeh
701eccb588 fix(browser-tools): make sharp an optional lazy dependency
sharp requires platform-specific native binaries and is unavailable
when running via bunx or on platforms like Raspberry Pi (ARM) where
the prebuilt binary may not exist.

The previous top-level static import caused the browser-tools extension
to crash at load time before any tool was ever called.

Replace the static import with a lazy getSharp() helper that catches
import failures and caches the result. constrainScreenshot returns the
raw buffer unchanged when sharp is unavailable — screenshots remain
functional, just without resizing.

The core bunx extension-loading fix (routing bunx through virtualModules
in loader.ts) belongs upstream in pi-mono and will be submitted there
once the OSS weekend freeze lifts on 2026-04-13.

Related: #3504

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 23:40:24 +02:00
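The lazy-load pattern described in 701eccb588 might look roughly like the sketch below. Several things here are assumptions: the loader is injected and synchronous to keep the example self-contained (the real helper catches import failures), `ScreenshotResizer` is a hypothetical stand-in for the sharp API, and this `constrainScreenshot` signature is illustrative rather than the one in capture.ts.

```typescript
// Hypothetical stand-in for the image library; the real sharp API is much larger.
type ScreenshotResizer = (image: Uint8Array, maxWidth: number) => Uint8Array;

// Wrap a loader so it runs at most once. A failed load is cached as null,
// so later calls do not retry or throw.
function createLazyLoader<T>(load: () => T): () => T | null {
  let attempted = false;
  let cached: T | null = null;
  return () => {
    if (!attempted) {
      attempted = true;
      try {
        cached = load();
      } catch {
        cached = null; // dependency unavailable on this platform
      }
    }
    return cached;
  };
}

// Resize when the dependency loaded; otherwise return the raw buffer unchanged.
function constrainScreenshot(
  image: Uint8Array,
  maxWidth: number,
  getResizer: () => ScreenshotResizer | null,
): Uint8Array {
  const resize = getResizer();
  if (resize === null) return image;
  return resize(image, maxWidth);
}
```

Because nothing is loaded at module scope, a missing native binary only degrades resizing and cannot crash the extension at load time.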
Jeremy McSpadden
d1b7f6f85c Merge pull request #3570 from Tibsfox/fix/headless-query-extension-drift
fix(headless): sync resources and use agent dir for query
2026-04-05 16:05:56 -05:00
Jeremy McSpadden
2298b9acab Merge pull request #3576 from jeremymcs/feat/llm-safety-harness
feat(gsd): LLM safety harness for auto-mode damage control
2026-04-05 16:03:16 -05:00
Jeremy McSpadden
46b18818a6 Merge pull request #3561 from Tibsfox/fix/ollama-fallback-provider-ready
fix(pi-coding-agent): make Ollama visible to fallback resolver
2026-04-05 15:59:20 -05:00
Jeremy McSpadden
d666fea3a9 Merge pull request #3569 from Tibsfox/fix/update-diagnostics
fix(cli): show latest version and bypass npm cache in update check
2026-04-05 15:57:41 -05:00
Jeremy McSpadden
e17b50b8a4 Merge pull request #3577 from jeremymcs/fix/hardcoded-agent-paths-3575
fix(gsd): replace hardcoded agent skill paths with dynamic resolution
2026-04-05 15:56:49 -05:00
Jeremy
8d11e5d507 test: add regression tests for adversarial review fixes (#3576)
- git-checkpoint: rollback on checked-out branch, detached HEAD, ref cleanup
- ollama streaming: terminal done:true chunk content preservation
- provider registration: preflush clears queue to prevent double registration
2026-04-05 15:52:26 -05:00
Jeremy
ac20eab501 fix: address adversarial review findings for #3576
- Use `git reset --hard <sha>` for rollback instead of `git branch -f`
  which fails on checked-out branches and worktrees
- Clear pendingProviderRegistrations after preflush to prevent duplicate
  registration when bindCore() runs
- Process Ollama stream content on terminal `done:true` chunks to avoid
  truncating trailing assistant text
2026-04-05 15:48:25 -05:00
Jeremy McSpadden
369303f82f Merge pull request #3560 from Tibsfox/fix/ollama-stream-simple-crash
fix(ollama): use apiKey auth mode to avoid streamSimple crash
2026-04-05 15:22:38 -05:00
Jeremy McSpadden
adeedef328 Merge pull request #3562 from jeremymcs/fix/harden-flat-rate-guard
fix(gsd): harden flat-rate routing guard against alias/resolution gaps
2026-04-05 15:16:01 -05:00
Jeremy McSpadden
d3a38bb771 Merge pull request #3552 from Tibsfox/fix/disable-routing-copilot
fix(gsd): disable dynamic model routing for flat-rate providers
2026-04-05 15:15:50 -05:00
Tibsfox
f953e5d9c7 fix(gsd): pass required arguments in defer-milestone-stamp test
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 13:14:37 -07:00
Jeremy
6d19cd6baf fix(gsd): replace hardcoded agent skill paths with dynamic resolution (#3575)
The system prompt hardcoded ~/.gsd/agent/skills/ paths for bundled skills,
causing ENOENT loops when skills weren't installed at those locations. The
auto-mode loop treated ENOENT as transient and retried indefinitely.

- Replace hardcoded skill paths in system.md with {{bundledSkillsTable}} template
  variable, resolved dynamically via resolveSkillReference() at runtime
- Replace hardcoded templates dir path with {{templatesDir}} variable
- Add buildBundledSkillsTable() to system-context.ts — only includes skills
  that actually exist on disk
- Export getTemplatesDir() from prompt-loader.ts
- Add Rule 4 to detect-stuck.ts: same ENOENT path seen twice in the sliding
  window triggers immediate stuck detection (missing files don't self-heal)
- Add 4 tests for Rule 4 coverage

Closes #3575
2026-04-05 15:12:19 -05:00
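Rule 4 from 6d19cd6baf (the same ENOENT path seen twice in the sliding window) could be sketched as below. The function names and the default window size of 5 are assumptions, and the path extraction relies on the usual Node.js message form `ENOENT: no such file or directory, open '<path>'`; the actual detect-stuck.ts may track errors differently.

```typescript
// Pull the quoted path out of a Node-style ENOENT message, or null if none.
function extractEnoentPath(message: string): string | null {
  const match = /ENOENT[^']*'([^']+)'/.exec(message);
  return match?.[1] ?? null;
}

// True when any ENOENT path occurs at least twice among the most recent
// `windowSize` error messages. Missing files do not self-heal, so a repeat
// is treated as stuck rather than transient.
function hasRepeatedEnoent(recentErrors: string[], windowSize = 5): boolean {
  const seen = new Set<string>();
  for (const message of recentErrors.slice(-windowSize)) {
    const path = extractEnoentPath(message);
    if (path === null) continue;
    if (seen.has(path)) return true;
    seen.add(path);
  }
  return false;
}
```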
Jeremy
0d3ef6b545 feat(gsd): add LLM safety harness for auto-mode damage control
Unified safety layer that monitors, validates, and constrains LLM behavior
during auto-mode execution. All components use warn-and-continue policy by
default (log violations, notify user, keep going).

Components:
- Evidence collector: real-time bash/write/edit tool call tracking
- Destructive command guard: classifies 10 dangerous patterns (rm -rf, force push, etc.)
- File change validator: compares git diff against task plan's expected output
- Evidence cross-reference: detects tasks marked complete with zero bash calls
- Git checkpoint: pre-unit refs/gsd/checkpoints/ for optional rollback
- Content validator: minimum quality checks on plans and summaries
- Timeout scale cap: limits timeout multiplier to 6x (was unlimited)

New preference: safety_harness with per-component toggles.
Enabled by default, auto_rollback off by default.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 15:00:06 -05:00
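Two of the harness components listed in 0d3ef6b545 lend themselves to a small sketch: the destructive command guard and the timeout scale cap. Only `rm -rf`, force push, and the 6x limit come from the commit message; the regular expressions, labels, and function names are hypothetical, and the other eight patterns are not listed in the source, so they are left out.

```typescript
// Two of the ten patterns, as named in the commit; the regexes are illustrative.
const DESTRUCTIVE_PATTERNS: Array<{ label: string; pattern: RegExp }> = [
  { label: "rm-rf", pattern: /\brm\s+(-\w*r\w*f\w*|-\w*f\w*r\w*)\b/ },
  { label: "force-push", pattern: /\bgit\s+push\b.*(--force|-f)\b/ },
];

// Warn-and-continue: return a label for the caller to log, never throw.
function classifyDestructive(command: string): string | null {
  for (const { label, pattern } of DESTRUCTIVE_PATTERNS) {
    if (pattern.test(command)) return label;
  }
  return null;
}

// Limit the timeout multiplier to 6x, per the commit message.
const MAX_TIMEOUT_SCALE = 6;
function capTimeoutScale(scale: number): number {
  return Math.min(scale, MAX_TIMEOUT_SCALE);
}
```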
Tibsfox
857c45bd6a fix(gsd): replace remaining empty catch with logWarning
2026-04-05 12:21:36 -07:00
Tibsfox
f4ecfd1a56 fix(gsd): use logWarning instead of raw stderr in catch blocks
2026-04-05 12:14:23 -07:00
Tibsfox
5a51631941 fix(gsd): log error instead of empty catch in STATE.md rebuild
2026-04-05 12:08:19 -07:00
Tibsfox
114bde1788 fix(gsd): log error instead of empty catch in skip_slice
2026-04-05 12:06:11 -07:00
Tibsfox
db90607378 fix(gsd): cast milestone classification to string for type safety
2026-04-05 12:05:02 -07:00
Tibsfox
4b68e8c9d9 test(headless): add extension path alignment test
2026-04-05 11:57:40 -07:00
Tibsfox
469acf53af test(cli): add update diagnostics regression test
2026-04-05 11:57:25 -07:00
Tibsfox
3213cd8c80 test(headless): add multi-turn command classification test
2026-04-05 11:56:44 -07:00
Tibsfox
d4b6eb714c test(pi-coding-agent): add custom provider registration test
2026-04-05 11:56:29 -07:00
Tibsfox
dbe6f9d292 test(ollama): add authMode regression test
2026-04-05 11:56:15 -07:00
Tibsfox
5b6d7784c2 test(gsd): add zero-slice roadmap guided flow test
2026-04-05 11:55:49 -07:00
Tibsfox
824e8e12a8 test(gsd): add skip-slice STATE.md rebuild regression test
2026-04-05 11:55:35 -07:00
Tibsfox
107efc5bff test(gsd): add worktree main_branch preference test
2026-04-05 11:55:21 -07:00
Tibsfox
5cb04f54ca test(gsd): add defer capture stamp regression test
2026-04-05 11:55:07 -07:00
Tibsfox
48dc32eeb5 test(tui): add Image dimension caching regression test
2026-04-05 11:54:47 -07:00
Jeremy McSpadden
3e09184493 Merge pull request #3566 from jeremymcs/fix/complete-slice-string-coercion
fix(gsd): coerce string arrays to objects in complete-slice/task tools
2026-04-05 13:44:40 -05:00
Tibsfox
523fcd89a8 fix(headless): sync resources and use agent dir for query
2026-04-05 11:35:11 -07:00
Jeremy
0b7764349c chore(gsd): remove copyright line from test file
2026-04-05 13:33:13 -05:00
Tibsfox
3bcd55ccfd fix(cli): show latest version and bypass npm cache in update check
2026-04-05 11:33:03 -07:00
Jeremy
e210b7efdf fix(gsd): follow CONTRIBUTING standards for #3565
- Move new coercion tests to standalone file using node:test +
  node:assert/strict (per CONTRIBUTING testing standards)
- Remove tests from legacy complete-slice.test.ts to avoid mixing
  test frameworks in the same file
2026-04-05 13:32:56 -05:00
Jeremy
6046a31c6f fix(gsd): address Codex adversarial review findings for #3565
- verificationEvidence coercion now uses sentinel values (exitCode: -1,
  verdict: "unknown") instead of fabricating passing results
- String coercion for requirements fields now parses "ID — detail"
  delimiter format to preserve semantic payload
- Added regression tests for sentinel values and delimiter parsing

Closes #3565
2026-04-05 13:30:09 -05:00
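The coercion rules from 6046a31c6f can be sketched as two small normalizers. The sentinel values (`exitCode: -1`, `verdict: "unknown"`) and the "ID, em dash, detail" delimiter format come from the commit message; the interface names, the `command` field, and the behaviour when no delimiter is present are assumptions made for illustration.

```typescript
// Hypothetical object shapes; only exitCode and verdict are named in the commit.
interface RequirementRef { id: string; detail: string }
interface VerificationEvidence { command: string; exitCode: number; verdict: string }

// " \u2014 " is an em dash surrounded by spaces, the delimiter the commit describes.
const REQUIREMENT_DELIMITER = " \u2014 ";

// Accept either the object form or an "ID <em dash> detail" string.
function coerceRequirement(input: string | RequirementRef): RequirementRef {
  if (typeof input !== "string") return input;
  const index = input.indexOf(REQUIREMENT_DELIMITER);
  if (index === -1) return { id: input.trim(), detail: "" }; // assumption: bare ID
  return {
    id: input.slice(0, index).trim(),
    detail: input.slice(index + REQUIREMENT_DELIMITER.length).trim(),
  };
}

// A bare string carries no real result, so mark it with sentinels
// instead of fabricating a passing exit code.
function coerceEvidence(input: string | VerificationEvidence): VerificationEvidence {
  if (typeof input !== "string") return input;
  return { command: input, exitCode: -1, verdict: "unknown" };
}
```

Using sentinels keeps a coerced entry distinguishable from evidence that was actually produced by running a command.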
Jeremy
0742cf3493 fix(gsd): coerce string arrays to objects in complete-slice/task tools (#3565)
LLMs sometimes pass plain strings instead of the expected object shape
for array fields like filesModified and requires, causing TypeBox
validation to reject the input before the execute function runs. This
adds Type.Union schemas to accept both formats and normalizes strings
to objects with sensible defaults in the execute functions.

Closes #3565
2026-04-05 13:23:30 -05:00
Tibsfox
21a14e32fc fix(headless): treat discuss and plan as multi-turn commands
2026-04-05 11:14:24 -07:00