3535 commits

Author SHA1 Message Date
Tibsfox
8f2e120a29 fix(gsd): tighten verifyExpectedArtifact to prevent rogue-write false positives
Three fixes to fail-closed when gsd_complete_task didn't actually run:

1. Legacy branch: require checked checkbox (- [x] **T01:) instead of
   accepting heading-style matches that only prove the task was planned
2. No plan file: return false instead of falling through
3. DB available but task row missing: return false instead of treating
   as verified — if the DB is up and the task isn't there, the
   completion tool never ran

Closes #3607

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 19:04:25 -07:00
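The legacy-branch rule in the commit above (accept only a checked checkbox, not a heading) can be sketched as follows. This is a minimal illustration, not the actual verifyExpectedArtifact code: the helper name `isTaskChecked` and its signature are hypothetical, and only the `- [x] **T01:` shape comes from the commit message.

```typescript
// Hypothetical helper: the name and signature are illustrative only.
function isTaskChecked(planText: string, taskId: string): boolean {
  // Escape regex metacharacters so the task ID is matched literally.
  const escaped = taskId.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Match only a checked checkbox line such as "- [x] **T01:".
  const checked = new RegExp(`^- \\[x\\] \\*\\*${escaped}:`, "m");
  return checked.test(planText);
}

const plan = [
  "### T01: Add parser",       // heading only proves the task was planned
  "- [ ] **T02: Write docs**", // unchecked checkbox
  "- [x] **T01: Add parser**", // checked checkbox
].join("\n");

console.log(isTaskChecked(plan, "T01")); // true
console.log(isTaskChecked(plan, "T02")); // false
```

A heading-only plan such as `### T01: Add parser` returns false here, which is the fail-closed behavior the commit describes.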
Tibsfox
b9baf42a47 fix(gsd): add verification gate to complete-slice tool
complete-slice had no check on the provided verification/UAT content,
allowing agents to mark slices complete even when verification clearly
failed. The prompt told agents to always call the tool, but the tool
blindly accepted.

Now rejects completion when verification or UAT content contains
blocked/failed signals (status: blocked, verification_result: failed,
etc.), forcing agents to address blockers before advancing.

Closes #3580

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 19:02:53 -07:00
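A rough sketch of the gate described in the commit above, assuming the check is a simple pattern scan over the submitted text. Only the two signals named in the message are included; the function names, the return shape, and the exact patterns are assumptions, and the real tool may recognize more signals.

```typescript
// Assumed signal patterns: the commit names only these two and says "etc.".
const BLOCKING_SIGNALS: RegExp[] = [
  /\bstatus:\s*blocked\b/i,
  /\bverification_result:\s*failed\b/i,
];

function hasBlockingSignal(content: string): boolean {
  return BLOCKING_SIGNALS.some((re) => re.test(content));
}

// Hypothetical gate: reject completion when either field reports a blocker.
function checkCompletion(
  verification: string,
  uat: string,
): { ok: boolean; reason?: string } {
  if (hasBlockingSignal(verification) || hasBlockingSignal(uat)) {
    return { ok: false, reason: "verification or UAT content reports a blocker" };
  }
  return { ok: true };
}
```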
Jeremy McSpadden
bd574d412e Merge pull request #3666 from jeremymcs/fix/notification-overlay-backdrop
fix(gsd): notification overlay backdrop and truncation fixes
2026-04-06 21:01:17 -05:00
Tibsfox
08a79875bb fix(gsd): fix pre-execution-checks false positives from backticks and task.files
Three fixes:
1. Strip backtick wrapping in normalizeFilePath — LLM-generated paths
   like `src/foo.ts` resolve to nonexistent paths, causing false blocks
2. Exclude task.files from existence checks — it includes files the task
   will create, so they legitimately don't pre-exist
3. Lower minimum task count from 2 to 1 — single-task slices are valid
   per the planning prompt

Closes #3649
Closes #3626

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 19:00:47 -07:00
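The backtick stripping from fix 1 in the commit above might look like the sketch below. The helper name `stripBackticks` is hypothetical; the real normalizeFilePath presumably does more than this single step.

```typescript
// Hypothetical helper showing only the backtick-stripping step.
function stripBackticks(rawPath: string): string {
  // Trim whitespace, then drop any run of backticks at either end.
  return rawPath.trim().replace(/^`+|`+$/g, "");
}

console.log(stripBackticks("`src/foo.ts`")); // src/foo.ts
console.log(stripBackticks("src/foo.ts"));   // src/foo.ts
```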
Tibsfox
bf0e3fb0e4 fix(gsd): stop renderAllProjections from overwriting authoritative PLAN.md
renderAllProjections called renderPlanProjection which overwrote the
complete PLAN.md (from markdown-renderer.js) with a simplified projection
missing Must-Haves, Verification, Files Likely Touched sections and
corrupting multi-line task descriptions.

Remove the plan projection call from renderAllProjections — the
authoritative renderer in plan-slice/replan-slice tools is the sole
writer. The renderIfMissing recovery path is preserved for when the
file is actually missing.

Closes #3651

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 18:59:00 -07:00
Tibsfox
432cb79097 fix(gsd): auto-checkout to main when isolation:none finds stale milestone branch
When switching from isolation:branch/worktree to isolation:none, HEAD
could remain on a milestone/<MID> branch from the prior session. All
subsequent auto-mode commits would silently land on the wrong branch.

Now auto-start checks for stale milestone branches when isolation:none
and auto-checks out to the integration branch (main/master).

Closes #3613

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 18:55:33 -07:00
Tibsfox
89fe6c3bdb fix(gsd): auto-remediate stale slice DB status when SUMMARY exists on disk
When complete-slice unit fails after writing SUMMARY.md but before
calling updateSliceStatus(), the DB stays out of sync. The post-unit
check previously reported this as a "rogue" artifact, leading to
infinite re-dispatch of the same complete-slice unit.

Now auto-remediates by calling updateSliceStatus() to sync the DB when
SUMMARY exists on disk but status != "complete". Falls back to rogue
detection if the DB update fails.

Closes #3633

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 18:51:49 -07:00
Tibsfox
745865f1c6 fix(gsd): open DB on demand in gsd_milestone_status for non-auto sessions
gsd_milestone_status checked isDbAvailable() but never called
ensureDbOpen(), making it always fail outside auto-mode sessions where
the DB is pre-opened during bootstrap.

Replace with ensureDbOpen() which safely opens existing DB files without
side effects when .gsd/ content exists.

Closes #3644

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 18:49:50 -07:00
Tibsfox
cbd705202b fix(gsd): detect phantom milestones from abandoned gsd_milestone_generate_id
gsd_milestone_generate_id inserts a DB row with status "queued" as a
side effect. If the milestone is never planned, this phantom row blocks
the state machine — isGhostMilestone returned false for any milestone
with a DB row, regardless of status.

Now isGhostMilestone treats a "queued" DB row with no disk artifacts
(CONTEXT, ROADMAP, SUMMARY) as a ghost, allowing the state machine to
skip phantom milestones.

Closes #3645

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 18:48:22 -07:00
Tibsfox
3e5bce5dbd fix(gsd): force re-validation when verdict is needs-remediation
When validation returns needs-remediation, remediation slices are added
and executed. But the state machine treated any terminal verdict as ready
for completing-milestone, while dispatch correctly blocked completion for
needs-remediation — creating a permanent deadlock.

Now all three derivation paths (deriveStateFromDb, _deriveStateImpl
registry loop, and _deriveStateImpl completion check) treat
needs-remediation as requiring re-validation, routing back to
validating-milestone instead of completing-milestone.

Closes #3596

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 18:46:42 -07:00
Tibsfox
236e9f1367 fix(gsd): exclude closed slices from findMissingSummaries check
Skip slices with status skipped/complete/done when checking for missing
SUMMARY files. Skipped slices never produce SUMMARYs by design, and
legacy-complete slices may lack them after worktree merge failures.
The DB status is authoritative — missing SUMMARY is a cosmetic gap,
not evidence the slice was incomplete.

Fixes #3620

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 18:35:59 -07:00
Tibsfox
7790808a29 fix(gsd): recover from stale lockfile after crash or SIGKILL
Add pre-flight stale lock cleanup before proper-lockfile acquisition:
if the .lock/ directory exists but no auto.lock metadata is present
(or the owning PID is dead), remove it proactively instead of waiting
for the 30-min stale window. Also improve the error message when
recovery fails to include the rm command for manual cleanup.

Fixes #3218

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 18:33:27 -07:00
Tibsfox
7e9434dec1 fix(gsd): add createdAt timestamp and 30s age guard to staleness check
Prevent race where a freshly-set pending entry (before LLM writes
artifacts) could be falsely detected as stale. Only clear entries
older than 30 seconds with no manifest or CONTEXT.md on disk.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 18:30:29 -07:00
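The age guard in the commit above can be read as a two-part condition: the entry must be older than 30 seconds and no artifacts may exist on disk. A minimal sketch, assuming `createdAt` is a millisecond timestamp; the type and function names are illustrative, and the disk check is reduced to a boolean parameter.

```typescript
const MIN_STALE_AGE_MS = 30_000; // 30-second guard from the commit message

// Hypothetical shape of a pending auto-start entry.
interface PendingEntry {
  createdAt: number; // assumed to be milliseconds since the epoch
}

function isStalePending(
  entry: PendingEntry,
  now: number,
  artifactsOnDisk: boolean, // manifest or CONTEXT.md present
): boolean {
  const oldEnough = now - entry.createdAt > MIN_STALE_AGE_MS;
  return oldEnough && !artifactsOnDisk;
}
```

A freshly set entry is never cleared, even when the LLM has not yet written any artifacts, which closes the race the commit describes.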
Tibsfox
af2bd4d45f fix(gsd): clear stale pendingAutoStart after /clear interrupts discussion
When the pending auto-start guard fires, check if the discussion is
actually still in progress by verifying the discussion manifest or
milestone context exists on disk. If neither exists, the entry is stale
from an interrupted session — clear it and allow re-entry.

Fixes #3274

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 18:27:45 -07:00
Tibsfox
f93123cdbb fix(gsd): suppress misleading warnings for expected ENOENT/EISDIR conditions
Skip ENOENT warnings in clearProjectRootStateFiles and untracked file
cleanup since missing files are expected. Check if .git is a directory
before attempting readFileSync in resolveGitDir to avoid EISDIR warning
in normal (non-worktree) repos.

Fixes #3597

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 18:23:54 -07:00
Tibsfox
0b6dfd0bbf fix(gsd): extract real error from message content when errorMessage is useless
When errorMessage is uninformative (e.g. "success", "ok"), fall back
to the assistant message text content for display while keeping
rawErrorMsg for classification to avoid prose false-positives.

Fixes #3588

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 18:22:02 -07:00
Tibsfox
eaeced4774 fix(gsd): extract real error from message content when errorMessage is useless
When errorMessage is uninformative (e.g. "success", "ok", "error"),
fall back to the assistant message text content to surface the real
provider error like "Invalid API key · Please run /login".

Fixes #3588

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 18:20:11 -07:00
Tibsfox
c159844b05 fix(gsd): show accurate pause message for queued-user-message skip
Distinguish between malformed-JSON pauses and queued-user-message
pauses in the notification so operators see the correct root cause.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 18:18:44 -07:00
Tibsfox
1859cb0d1a fix(gsd): treat queued-user-message skip as non-retryable interruption
Add isQueuedUserMessageSkip() predicate and extend recordToolInvocationError
to catch "Skipped due to queued user message." so auto-mode pauses instead
of retrying the same unit until the provider aborts with 3 consecutive
validation failures.

Fixes #3595

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 18:16:32 -07:00
Tibsfox
c2c5f80f79 fix(gsd): recognize "Not provided." default in isVerificationNotApplicable
Strip trailing punctuation before matching and add "provided" to the
alternation so the plan-milestone default value no longer deadlocks
completing-milestone dispatch. Also change plan-milestone verification
defaults from "Not provided." to empty string to prevent recurrence.
Update JSDoc comments to reflect new defaults.

Fixes #3634

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 18:13:25 -07:00
Iouri Goussev
4b07f24d86 fix(gsd): discoverManifests skips symlinked extension directories
Dirent.isDirectory() returns false for symbolic links, so extensions
installed as directory symlinks under ~/.gsd/agent/extensions/ were
invisible to all management commands (list, enable, disable, info).

Apply the same guard already used in loader.ts discoverExtensionsInDir:
  entry.isDirectory() || entry.isSymbolicLink()

Closes igouss/gsd-2#20
2026-04-06 21:13:08 -04:00
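The guard quoted in the commit above can be exercised without touching the filesystem. The sketch below uses a hypothetical `DirentLike` interface that mirrors the two `fs.Dirent` methods involved; `extensionDirNames` is an illustrative name, not the project's function.

```typescript
// Minimal stand-in for the two fs.Dirent methods the guard relies on.
interface DirentLike {
  name: string;
  isDirectory(): boolean;
  isSymbolicLink(): boolean;
}

// Keep real directories and symlinks (which may point at directories).
function extensionDirNames(entries: DirentLike[]): string[] {
  return entries
    .filter((entry) => entry.isDirectory() || entry.isSymbolicLink())
    .map((entry) => entry.name);
}

const entries: DirentLike[] = [
  { name: "local-ext", isDirectory: () => true, isSymbolicLink: () => false },
  { name: "linked-ext", isDirectory: () => false, isSymbolicLink: () => true },
  { name: "README.md", isDirectory: () => false, isSymbolicLink: () => false },
];

console.log(extensionDirNames(entries)); // [ 'local-ext', 'linked-ext' ]
```

This guard also admits symlinks that point at files. A stricter variant would resolve the link target before accepting it, though the commit does not say whether the project does so.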
Tibsfox
c138cca078 fix(gsd): recognize "Not provided." default in isVerificationNotApplicable
Strip trailing punctuation before matching and add "provided" to the
alternation so the plan-milestone default value no longer deadlocks
completing-milestone dispatch. Also change plan-milestone verification
defaults from "Not provided." to empty string to prevent recurrence.

Fixes #3634

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 18:12:06 -07:00
Jeremy
af158235eb fix(gsd): remove background color from backdrop, fix message truncation
Backdrop was painting empty lines with dark gray background (48;5;233),
making the entire screen go black. Now uses dim + gray foreground only.

Message truncation now measures actual prefix width with visibleWidth()
instead of hardcoded 20-char estimate, and uses truncateToWidth() for
proper Unicode handling.
2026-04-06 20:11:07 -05:00
Tibsfox
b1d9798e30 fix(gsd): reconcile plan-file tasks into DB when planner skips persistence (#3600)
When the planning agent writes S##-PLAN.md with task entries but never
calls the gsd_plan_slice persistence tool, the DB has zero task rows
even though the plan file on disk contains valid tasks. This causes
deriveState to return phase='planning' forever — the auto-mode
dispatcher re-dispatches plan-slice in an infinite loop.

Add a reconciliation step in deriveStateFromDb: when the DB returns zero
tasks but the plan file exists and contains parsed tasks, import them
into the DB so the state machine can advance past planning into
execution. This mirrors the existing #2514 reconciliation pattern for
stale task status.

Fixes #3600

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 17:51:33 -07:00
Tibsfox
77788a1b7e fix(gsd): classify plain connection-error as transient
Fixes #3594: CONNECTION_RE required a specific suffix after "connection",
so a plain "Connection error" fell through to the unknown class and
paused auto-mode indefinitely.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 17:51:19 -07:00
Tibsfox
e1801f967f fix(gsd): use isClosedStatus() in dispatch guard instead of raw complete check
Replaces `r.status === "complete"` with `isClosedStatus(r.status)` in
dispatch-guard.ts so slices completed via the reconciliation replay path
(which writes "done") or skipped slices are correctly recognized as
closed. This was causing auto-mode to block on dependencies that were
actually complete.

Fixes #3601

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 17:50:02 -07:00
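The difference between the raw comparison and the predicate in the commit above can be sketched as below. The set of closed statuses is an assumption pieced together from the statuses this log mentions ("complete", "done", "skipped"); the `SliceRow` type and `unmetDependencies` helper are hypothetical.

```typescript
// Assumed closed statuses, based on the values named in this log.
const CLOSED_STATUSES: ReadonlySet<string> = new Set(["complete", "done", "skipped"]);

function isClosedStatus(status: string): boolean {
  return CLOSED_STATUSES.has(status);
}

// Hypothetical dependency row.
interface SliceRow {
  id: string;
  status: string;
}

// Return the IDs of dependencies that still block dispatch.
function unmetDependencies(deps: SliceRow[]): string[] {
  return deps.filter((r) => !isClosedStatus(r.status)).map((r) => r.id);
}

const deps: SliceRow[] = [
  { id: "S01", status: "complete" },
  { id: "S02", status: "done" },    // written by the reconciliation replay path
  { id: "S03", status: "skipped" },
  { id: "S04", status: "pending" },
];

console.log(unmetDependencies(deps)); // [ 'S04' ]
```

With `r.status === "complete"` alone, S02 and S03 would also be reported as blocking.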
Jeremy
41d5189c4c fix(gsd): restore consistent overlay height to prevent ghost artifacts
Differential renderer can't clear old overlay positions when height
changes between filter cycles. Pad to maxVisibleRows so the overlay
stays the same size regardless of filter state.
2026-04-06 19:31:32 -05:00
Jeremy McSpadden
b4c6229360 Merge pull request #3646 from jeremymcs/fix/notification-overlay-backdrop
fix(gsd): notification overlay backdrop and sizing
2026-04-06 19:09:22 -05:00
Jeremy
2c91b8c6d8 test(tui): add test for 256-color backdrop codes
2026-04-06 18:57:02 -05:00
Jeremy
c35385fe53 fix(gsd): improve notification overlay backdrop and content-fit sizing
Use dark gray background + dim foreground for visible backdrop effect
instead of barely-perceptible SGR dim. Size overlay box to content
instead of padding to fill the entire viewport.
2026-04-06 18:53:26 -05:00
Jeremy McSpadden
6d2345e939 Merge pull request #3638 from jeremymcs/fix/notification-overlay-backdrop
fix(gsd): notification overlay backdrop dimming and viewport padding
2026-04-06 18:27:52 -05:00
Jeremy
9d1e343e41 test(gsd): add overlay backdrop and notification lock safety tests
- Overlay layout: verify backdrop dims base lines, no dim without flag,
  overlay composites on top of dimmed background
- Notification store: verify markAllRead and clearNotifications do not
  delete a foreign process's lock file
2026-04-06 17:44:34 -05:00
Jeremy
d553455732 fix(gsd): only unlink notification lock when owned, prevent foreign lock deletion
_withLock() was unconditionally unlinking the lock file in finally,
even when lock acquisition failed. This could delete another process's
lock and allow unlocked concurrent writes. Now tracks ownership and
only cleans up locks we created.
2026-04-06 17:39:44 -05:00
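The ownership tracking described in the commit above can be illustrated with an in-memory stand-in for the lock file. This is a sketch only: a `Set` plays the role of the filesystem, and `withLock` is a simplified, hypothetical version of `_withLock()`.

```typescript
// In-memory sketch: `held` stands in for lock files on disk.
function withLock<T>(held: Set<string>, name: string, fn: () => T): T {
  let owned = false;
  try {
    if (held.has(name)) {
      throw new Error(`lock "${name}" is held by another owner`);
    }
    held.add(name);
    owned = true; // set only after acquisition succeeds
    return fn();
  } finally {
    // Release only a lock this call created; never touch a foreign one.
    if (owned) held.delete(name);
  }
}
```

If the `finally` block deleted the lock unconditionally, a failed acquisition would remove the other owner's lock, which is the bug the commit fixes.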
Jeremy
2c4ac844f1 fix(gsd): add backdrop dimming and viewport padding to notification overlay
The notification overlay was rendering too small with few entries, allowing
underlying content to bleed through. Added viewport padding to fill the
overlay box and a new `backdrop` option to OverlayOptions that dims the
background behind modal overlays.
2026-04-06 17:34:45 -05:00
Jeremy McSpadden
42caabdd0d Merge pull request #3563 from Tibsfox/fix/headless-discuss-multi-turn
fix(headless): treat discuss and plan as multi-turn commands
2026-04-06 15:58:04 -05:00
Jeremy McSpadden
d26efa47bc Merge pull request #3591 from jeremymcs/fix/complete-slice-provides-string-coercion
fix(gsd): coerce plain-string provides field to array in complete-slice
2026-04-06 15:57:16 -05:00
Jeremy McSpadden
c53f8ab471 Merge pull request #3608 from deseltrus/perf/session-memory-cpu-leaks
perf: fix CPU/memory leaks in long-running sessions
2026-04-06 15:56:22 -05:00
Jeremy McSpadden
d877e6e152 Merge pull request #3627 from jeremymcs/fix/3615-continue-context-injection
fix(gsd): inject task context for unstructured resume prompts (#3615)
2026-04-06 12:27:43 -05:00
Jeremy
04dc4a988b fix(gsd): add intent + phase guards to resume context fallback (#3615)
Tighten the deriveState fallback per adversarial review:
- Intent-gated: only fire for low-entropy resume prompts via
  RESUME_INTENT_PATTERNS (continue, ok, go ahead, resume, etc.)
- Phase-gated: only during state.phase === "executing"
- Non-resume prompts (help, status, abort, diagnostics) are not
  hijacked with execution context

Add behavioral tests: 24 positive matches + 17 negative rejections
for the intent pattern, alongside the 5 structural tests.
2026-04-06 12:14:36 -05:00
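The two guards in the commit above combine into a single predicate. The sketch below is an assumption-heavy illustration: the pattern list contains only the four examples named in the message (the real RESUME_INTENT_PATTERNS is larger), and `shouldInjectTaskContext` is a hypothetical name.

```typescript
// Only the examples named in the commit; the real list is longer.
const RESUME_INTENT_PATTERNS: RegExp[] = [
  /^continue$/i,
  /^ok$/i,
  /^go ahead$/i,
  /^resume$/i,
];

function shouldInjectTaskContext(
  prompt: string,
  phase: string,
  hasActiveTask: boolean,
): boolean {
  // Phase gate: only while a task is actually executing.
  if (phase !== "executing" || !hasActiveTask) return false;
  // Intent gate: only for low-entropy resume prompts.
  const text = prompt.trim();
  return RESUME_INTENT_PATTERNS.some((re) => re.test(text));
}

console.log(shouldInjectTaskContext("continue", "executing", true)); // true
console.log(shouldInjectTaskContext("status", "executing", true));   // false
console.log(shouldInjectTaskContext("continue", "planning", true));  // false
```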
Jeremy
5b104897c8 fix(gsd): inject task context for unstructured resume prompts (#3615)
When a user types "continue" or bare text to resume an in-progress
session, buildGuidedExecuteContextInjection() only matched two
hardcoded regex patterns and returned null for anything else — causing
the agent to rebuild everything from scratch and burn ~86k tokens.

Add a phase-gated deriveState fallback that injects task execution
context when state.phase === "executing" and an active task exists.
The phase guard prevents misrouting during replanning, gate evaluation,
or other non-execution phases.
2026-04-06 12:10:08 -05:00
Jeremy McSpadden
0d87df94be Merge pull request #3619 from frizynn/fix/3618-schema-overload-bash-exit-code
fix(agent-loop): schema overload cap ignores bash execution errors (#3618)
2026-04-06 10:13:03 -05:00
Jeremy McSpadden
4aa7fe3940 Merge pull request #3621 from jeremymcs/fix/3616-db-tools-missing-subagent
fix(pi-coding-agent): restore extension tools after session switch (#3616)
2026-04-06 10:12:32 -05:00
Jeremy
90feebeccf fix(pi-coding-agent): restore extension tools after session switch (#3616)
newSession() only rebuilt the tool registry when cwd changed. When cwd
stayed the same (e.g., discuss → plan-slice in the same worktree), any
tool narrowing from setActiveTools() persisted — stripping gsd_plan_slice
and other DB tools from auto-mode subagent sessions.

Add an else-branch that calls _refreshToolRegistry with
includeAllExtensionTools:true on every session switch, regardless of cwd.

Also call resetExtensionLoaderCache() in DefaultResourceLoader.reload()
so hot-updated extension code on disk is re-compiled instead of served
from the stale jiti module cache.

Closes #3616
2026-04-06 09:51:58 -05:00
frizynn
cd14a4c765 fix(agent-loop): schema overload cap ignores bash execution errors (#3618)
The schema overload detector counted ALL isError tool results toward the
consecutive-failure cap, including bash commands that returned non-zero exit
codes (e.g. rg/grep exit 1 = 'no matches'). Three consecutive exploratory
searches with no matches would trigger the cap and abort the session.

Root cause: the allToolsFailed check used toolResults.every(r => r.isError)
which conflates preparation-phase errors (schema validation, tool-not-found,
tool-blocked) with execution-phase errors (the tool ran successfully but
returned a non-zero exit code).

Fix: track preparationErrorCount alongside tool results. Only preparation
errors (schema/validation failures) increment the consecutive failure
counter. Tool execution errors — like bash exit code 1 — are valid usage
and do not count toward the cap.

Also fixes pre-existing StopReason type mismatches in agent-loop tests
(end_turn → stop, tool_use → toolUse).
2026-04-06 11:35:41 -03:00
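The distinction the commit above draws between preparation-phase and execution-phase errors can be sketched as a counter update. The `phase` field, the type names, and the reset-to-zero behavior are assumptions made for illustration; the actual agent loop may track this differently.

```typescript
type ToolPhase = "preparation" | "execution";

// Hypothetical result shape: `phase` records where the error occurred.
interface ToolResult {
  isError: boolean;
  phase: ToolPhase;
}

// Return the updated consecutive-failure count for one turn.
function nextConsecutiveFailures(previous: number, results: ToolResult[]): number {
  const preparationErrorCount = results.filter(
    (r) => r.isError && r.phase === "preparation",
  ).length;
  // Count the turn as a failure only if every call failed before running.
  const allFailedInPreparation =
    results.length > 0 && preparationErrorCount === results.length;
  return allFailedInPreparation ? previous + 1 : 0; // assumed reset on any other turn
}
```

Under this sketch, three `rg` searches that exit with code 1 leave the counter at zero, while three schema-validation failures in a row raise it to three.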
deseltrus
4744e86c8f test: structural regression tests for session memory/CPU leak fixes
Verifies that defensive guards (render-skip, chat cap, dispose, signal
handler cleanup, alert cap, orphan kill) are present in source. These
are structural tests because the leaks manifest over hours of real
usage, not in unit test timescales.
2026-04-06 09:57:40 +02:00
deseltrus
886c5837ff fix(bg-shell): prevent signal handler accumulation + cap alert queue
Signal handlers (SIGTERM, SIGINT, beforeExit) were registered on every
session_start but never removed. Over multiple sessions within the same
process, handlers accumulated — each adding another cleanupAll() call
and descendant kill sweep on exit.

Fix: session_shutdown now calls process.off() for each handler before
cleanupAll(), preventing accumulation.

Also: signalCleanup now kills ALL descendant processes (not just those
tracked by bg-shell) to catch bash-tool spawned children.

Alert queue: pendingAlerts is capped at 50 entries to prevent unbounded
growth when background processes generate rapid alerts faster than the
agent consumes them.

pushAlert signature updated to accept null bg parameter for system-level
alerts that don't originate from a tracked process.
2026-04-06 09:52:36 +02:00
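The alert-queue cap from the commit above can be sketched as a bounded push. The commit does not say which entries are discarded when the cap is hit; dropping the oldest is an assumption here, and `pushCappedAlert` is a hypothetical name that does not reflect the real pushAlert signature.

```typescript
const MAX_PENDING_ALERTS = 50; // cap stated in the commit message

// Append an alert, then trim from the front if the cap is exceeded.
function pushCappedAlert(queue: string[], alert: string): void {
  queue.push(alert);
  if (queue.length > MAX_PENDING_ALERTS) {
    // Assumption: the oldest alerts are the ones discarded.
    queue.splice(0, queue.length - MAX_PENDING_ALERTS);
  }
}

const pendingAlerts: string[] = [];
for (let i = 0; i < 60; i++) {
  pushCappedAlert(pendingAlerts, `alert-${i}`);
}

console.log(pendingAlerts.length); // 50
console.log(pendingAlerts[0]);     // alert-10
```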
deseltrus
0b40d39b0e perf(interactive): cap rendered chat components + kill orphan descendants
Chat component cap: After 100 rendered components, oldest are removed
from the container (session transcript persists on disk via
SessionManager). Prevents unbounded memory growth in long sessions
where thousands of tool calls accumulate DOM-like component trees.

Orphan process prevention: On shutdown, listDescendants(process.pid)
finds ALL child processes (including those spawned by the Bash tool
that bg-shell doesn't track) and kills them with SIGTERM + 500ms
grace + SIGKILL. Prevents orphaned dev servers, build processes, etc.
from persisting after session exit.
2026-04-06 09:52:20 +02:00
deseltrus
c5227f7570 perf(tui): render-skip, frame isolation, Text cache guard, dispose
Container.render() now returns a stable array reference when output is
unchanged — TUI.doRender() skips ALL post-processing (isImageLine scans,
applyLineResets, differential diffs) when the reference matches.

Loader decouples spinner frame rotation from Text content updates.
Previously every 80ms tick called setText() which invalidated Text's
wrapTextWithAnsi/visibleWidth caches. Now the frame is prepended in
render() while Text caches the message separately.

Text.setText() returns early when text is unchanged, avoiding cache
invalidation on redundant updates.

ToolExecutionComponent.dispose() clears heavy references (image maps,
diff previews, result data) so GC can reclaim memory when components
are removed from the chat history.
2026-04-06 09:52:08 +02:00
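The render-skip idea in the commit above depends on reference equality: if the container hands back the same array object, the caller can skip all post-processing. A minimal sketch under that assumption follows; `CachedContainer` and `doRender` are simplified, hypothetical stand-ins for Container.render() and TUI.doRender().

```typescript
class CachedContainer {
  private lastLines: string[] = [];
  private produce: () => string[];

  constructor(produce: () => string[]) {
    this.produce = produce;
  }

  // Return the previous array object when the content is unchanged.
  render(): string[] {
    const next = this.produce();
    const same =
      next.length === this.lastLines.length &&
      next.every((line, i) => line === this.lastLines[i]);
    if (!same) this.lastLines = next;
    return this.lastLines;
  }
}

interface RenderState {
  previous: string[] | null;
  postProcessRuns: number;
}

function doRender(container: CachedContainer, state: RenderState): void {
  const lines = container.render();
  if (lines === state.previous) return; // same reference: skip post-processing
  state.previous = lines;
  state.postProcessRuns++; // stands in for scans, resets, and diffing
}
```

The comparison inside `render()` still costs one pass over the lines, but it replaces the heavier post-processing passes on every unchanged frame.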
Jeremy McSpadden
6dfc422990 Merge pull request #3587 from jeremymcs/feat/persistent-notification-panel
feat(gsd): persistent notification panel
2026-04-05 22:53:30 -05:00
Jeremy
9616b02c58 fix(gsd): coerce plain-string provides field to array in complete-slice (#3585)
LLMs sometimes pass simple string-array fields (provides, keyFiles, etc.)
as a plain string instead of a single-element array, causing TypeBox schema
validation to reject the call before the execute function's coercion logic
can run. Fix by accepting Union([Array, String]) in the schema and adding
wrapArray() coercion for all 8 simple array fields in the execute function.
2026-04-05 22:15:13 -05:00
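The coercion step from the commit above is small enough to sketch directly. The name `wrapArray` comes from the commit message, but this body and its handling of `undefined` are assumptions, and the TypeBox schema change is not shown.

```typescript
// Sketch of the coercion: accept a plain string or an array of strings.
function wrapArray(value: string | string[] | undefined): string[] {
  if (value === undefined) return []; // assumed default for a missing field
  return Array.isArray(value) ? value : [value];
}

console.log(wrapArray("auth-api"));        // [ 'auth-api' ]
console.log(wrapArray(["a.ts", "b.ts"])); // [ 'a.ts', 'b.ts' ]
```

The coercion only helps once the schema accepts both shapes, since validation otherwise rejects the call before the execute function runs, as the commit notes.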