Three fixes that fail closed when gsd_complete_task did not actually run:
1. Legacy branch: require checked checkbox (- [x] **T01:) instead of
accepting heading-style matches that only prove the task was planned
2. No plan file: return false instead of falling through
3. DB available but task row missing: return false instead of treating
as verified — if the DB is up and the task isn't there, the
completion tool never ran
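A minimal sketch of the stricter legacy check, assuming the plan text is already loaded (or null when no plan file exists). The helper name `isTaskCheckedInPlan` is hypothetical and only illustrates fixes 1 and 2:

```typescript
// Hypothetical helper: true only when the plan marks the task with a checked box.
function isTaskCheckedInPlan(planText: string | null, taskId: string): boolean {
  // No plan file: fail closed instead of falling through.
  if (planText === null) return false;
  // Escape regex metacharacters in the task id before building the pattern.
  const id = taskId.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Match "- [x] **T01:" at the start of a line. A heading such as
  // "### T01:" or an unchecked "- [ ]" only proves the task was planned.
  const checked = new RegExp(`^\\s*- \\[[xX]\\] \\*\\*${id}:`, "m");
  return checked.test(planText);
}
```

With this shape, a heading-style match no longer counts as evidence that the completion tool ran.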
Closes #3607
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use a dark gray background plus dim foreground for a visible backdrop effect
instead of the barely perceptible SGR dim. Size the overlay box to its content
instead of padding it to fill the entire viewport.
- Overlay layout: verify backdrop dims base lines, no dim without flag,
overlay composites on top of dimmed background
- Notification store: verify markAllRead and clearNotifications do not
delete a foreign process's lock file
_withLock() was unconditionally unlinking the lock file in finally,
even when lock acquisition failed. This could delete another process's
lock and allow unlocked concurrent writes. Now tracks ownership and
only cleans up locks we created.
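The ownership tracking can be sketched as follows. This is an illustrative stand-in, not the actual _withLock(): the name `withLock`, the use of the "wx" open flag, and the choice to skip the callback when the lock is held elsewhere are all assumptions.

```typescript
import * as fs from "node:fs";

// Hypothetical stand-in for _withLock(): the lock file is removed in
// finally only when this call created it.
function withLock<T>(lockPath: string, fn: () => T): T | undefined {
  let owned = false;
  try {
    // "wx" creates the file exclusively and throws if it already exists.
    fs.closeSync(fs.openSync(lockPath, "wx"));
    owned = true;
  } catch {
    // Acquisition failed: the lock is not ours. Assumption: skip the work.
    return undefined;
  }
  try {
    return fn();
  } finally {
    // Only clean up a lock this call created.
    if (owned) fs.rmSync(lockPath, { force: true });
  }
}
```

The important part is that the cleanup is conditional on `owned`, so a failed acquisition can never delete a foreign lock.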
The notification overlay was rendering too small with few entries, allowing
underlying content to bleed through. Added viewport padding to fill the
overlay box and a new `backdrop` option to OverlayOptions that dims the
background behind modal overlays.
Tighten the deriveState fallback per adversarial review:
- Intent-gated: only fire for low-entropy resume prompts via
RESUME_INTENT_PATTERNS (continue, ok, go ahead, resume, etc.)
- Phase-gated: only during state.phase === "executing"
- Non-resume prompts (help, status, abort, diagnostics) are not
hijacked with execution context
Add behavioral tests: 24 positive matches + 17 negative rejections
for the intent pattern, alongside the 5 structural tests.
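A rough sketch of the two gates working together. The patterns below are illustrative only (the real RESUME_INTENT_PATTERNS list may differ), and `shouldInjectExecuteContext` and the `State` shape are hypothetical names:

```typescript
// Illustrative patterns only; not the project's actual list.
const RESUME_INTENT_PATTERNS: RegExp[] = [
  /^(continue|resume|proceed)[.!]?$/i,
  /^(ok|okay|go ahead|keep going)[.!]?$/i,
];

// Hypothetical minimal state shape for the sketch.
interface State {
  phase: string;
  activeTask: string | null;
}

// Fire the fallback only for a resume-like prompt during the executing phase.
function shouldInjectExecuteContext(prompt: string, state: State): boolean {
  // Phase gate: never inject during planning, gate evaluation, and so on.
  if (state.phase !== "executing" || state.activeTask === null) return false;
  // Intent gate: the whole prompt must be a low-entropy resume phrase.
  const text = prompt.trim();
  return RESUME_INTENT_PATTERNS.some((pattern) => pattern.test(text));
}
```

Anchoring each pattern with `^` and `$` is what keeps prompts such as "status" or "help me continue debugging" from being hijacked.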
When a user types "continue" or bare text to resume an in-progress
session, buildGuidedExecuteContextInjection() only matched two
hardcoded regex patterns and returned null for anything else — causing
the agent to rebuild everything from scratch and burn ~86k tokens.
Add a phase-gated deriveState fallback that injects task execution
context when state.phase === "executing" and an active task exists.
The phase guard prevents misrouting during replanning, gate evaluation,
or other non-execution phases.
newSession() only rebuilt the tool registry when cwd changed. When cwd
stayed the same (e.g., discuss → plan-slice in the same worktree), any
tool narrowing from setActiveTools() persisted — stripping gsd_plan_slice
and other DB tools from auto-mode subagent sessions.
Add an else-branch that calls _refreshToolRegistry with
includeAllExtensionTools:true on every session switch, regardless of cwd.
Also call resetExtensionLoaderCache() in DefaultResourceLoader.reload()
so hot-updated extension code on disk is re-compiled instead of served
from the stale jiti module cache.
Closes #3616
Verifies that defensive guards (render-skip, chat cap, dispose, signal
handler cleanup, alert cap, orphan kill) are present in source. These
are structural tests because the leaks manifest over hours of real
usage, not in unit test timescales.
Signal handlers (SIGTERM, SIGINT, beforeExit) were registered on every
session_start but never removed. Over multiple sessions within the same
process, handlers accumulated — each adding another cleanupAll() call
and descendant kill sweep on exit.
Fix: session_shutdown now calls process.off() for each handler before
cleanupAll(), preventing accumulation.
Also: signalCleanup now kills ALL descendant processes (not just those
tracked by bg-shell) to catch bash-tool spawned children.
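The register/unregister pairing can be sketched like this. The function names are hypothetical, and only the two signals are shown; a beforeExit handler would be paired the same way.

```typescript
// Hypothetical handler; in the real code this is where cleanupAll() runs.
const onSignal = (): void => {
  /* cleanup would happen here */
};

const SIGNALS = ["SIGTERM", "SIGINT"] as const;

function sessionStart(): void {
  for (const signal of SIGNALS) process.on(signal, onSignal);
}

function sessionShutdown(): void {
  // process.off() needs the same function reference that was registered,
  // which is why the handler is kept in a module-level constant.
  for (const signal of SIGNALS) process.off(signal, onSignal);
}
```

Without the shutdown step, every start adds one more listener, so N sessions in one process would run the cleanup N times on exit.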
Alert queue: pendingAlerts is capped at 50 entries to prevent unbounded
growth when background processes generate rapid alerts faster than the
agent consumes them.
pushAlert signature updated to accept null bg parameter for system-level
alerts that don't originate from a tracked process.
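A minimal sketch of the capped queue. The `Alert` shape and the name `pushAlertSketch` are hypothetical, and dropping the oldest entry when the queue is full is an assumption; the source states only the cap of 50 and the nullable bg parameter.

```typescript
const MAX_PENDING_ALERTS = 50;

// Hypothetical alert shape; bg is null for system-level alerts.
interface Alert {
  bg: string | null;
  message: string;
}

const pendingAlerts: Alert[] = [];

// Assumption: when the queue is full, the oldest alerts are dropped.
function pushAlertSketch(bg: string | null, message: string): void {
  pendingAlerts.push({ bg, message });
  const overflow = pendingAlerts.length - MAX_PENDING_ALERTS;
  if (overflow > 0) pendingAlerts.splice(0, overflow);
}
```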
LLMs sometimes pass simple string-array fields (provides, keyFiles, etc.)
as a plain string instead of a single-element array, causing TypeBox schema
validation to reject the call before the execute function's coercion logic
can run. Fix by accepting Union([Array, String]) in the schema and adding
wrapArray() coercion for all 8 simple array fields in the execute function.
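The coercion step might look roughly like this. The real wrapArray() may differ, and treating a missing value as an empty array is an assumption of the sketch. On the schema side, the matching TypeBox shape would be along the lines of Type.Union([Type.Array(Type.String()), Type.String()]).

```typescript
// Hypothetical shape of the helper: accept either form and always
// hand an array to the rest of the execute function.
function wrapArray(value: string | string[] | undefined): string[] {
  if (value === undefined) return []; // assumption: missing means empty
  return Array.isArray(value) ? value : [value];
}
```

For example, `wrapArray("src/index.ts")` yields a single-element array, while an existing array passes through unchanged.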
Notifications from ctx.ui.notify() and workflow-logger now persist to
.gsd/notifications.jsonl instead of evaporating as transient toasts.
- notification-store: JSONL persistence with 500-entry rotation, atomic
temp+rename rewrites, ref-counted suppress API, disk-synced counters
- notify-interceptor: WeakSet-guarded monkey-patch on ctx.ui.notify
installed at session_start and session_switch
- notification-widget: always-on belowEditor strip showing unread count
- notification-overlay: scrollable Ctrl+Alt+N panel with severity filter
- /gsd notifications command: clear, tail, filter subcommands
- workflow-logger: warnings now also persist to notification store
- web API: GET/DELETE /api/notifications with ?countOnly support
- 16 unit tests covering store, suppress, project isolation, resync
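The rotation and atomic rewrite in the notification store can be sketched as below. This is a simplification under stated assumptions: the function name `appendWithRotation` is hypothetical, the whole file is rewritten on every append, and locking, suppression, and counters are left out.

```typescript
import * as fs from "node:fs";

// Hypothetical helper: append one entry, keep only the newest maxEntries,
// and replace the file via temp + rename so readers never see a partial file.
function appendWithRotation(
  file: string,
  entry: Record<string, unknown>,
  maxEntries = 500,
): void {
  const lines = fs.existsSync(file)
    ? fs.readFileSync(file, "utf8").split("\n").filter((l) => l.length > 0)
    : [];
  lines.push(JSON.stringify(entry));
  const kept = lines.slice(-maxEntries); // drop the oldest entries
  const tmp = `${file}.${process.pid}.tmp`;
  fs.writeFileSync(tmp, kept.join("\n") + "\n");
  // rename replaces the target in one step when both paths are on the
  // same filesystem.
  fs.renameSync(tmp, file);
}
```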
1. Post-execution retry bypass (auto-verification.ts)
- When postExecBlockingFailure is true, skip retry and pause immediately
- Post-exec failures are cross-task consistency issues that retrying won't fix
- Added test in post-exec-retry-bypass.test.ts
2. File path normalization (pre-execution-checks.ts)
- Added normalizeFilePath() to handle ./path vs path equivalence
- Normalizes backslashes, removes duplicate slashes, strips leading ./
- Applied to checkFilePathConsistency() and checkTaskOrdering()
- Added tests for path normalization in pre-execution-checks.test.ts
3. Pre-exec fail-closed (auto-post-unit.ts)
- Added try/catch around runPreExecutionChecks() inside runSafely block
- If runPreExecutionChecks throws, set preExecPauseNeeded = true
- Used logError from workflow-logger (not raw stderr)
- Added test in pre-execution-fail-closed.test.ts
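The three normalization steps listed under item 2 can be sketched as a chain of replacements. This is a hypothetical re-implementation for illustration, not the code in pre-execution-checks.ts:

```typescript
// Sketch of the three steps: unify separators, collapse duplicate
// slashes, then strip any leading "./".
function normalizeFilePath(p: string): string {
  return p
    .replace(/\\/g, "/") // backslashes to forward slashes
    .replace(/\/{2,}/g, "/") // "a//b" to "a/b"
    .replace(/^(\.\/)+/, ""); // "./a" to "a"
}
```

After normalization, `./src/a.ts` and `src/a.ts` compare equal, which is what the consistency and ordering checks need.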
autoStartTime was never saved to paused-session.json, so cross-session
resume always started with autoStartTime=0 and the widget showed no
elapsed timer. Now saved on pause, restored on resume with Date.now()
fallback for old files.
Also fixes widget layout: elapsed/ETA stays on the header line above
the milestone/branch info line.
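The restore-with-fallback logic, sketched under the assumption that paused-session.json is parsed into an object whose autoStartTime field is absent in old files. The names `PausedSession` and `restoreAutoStartTime` are hypothetical:

```typescript
// Hypothetical shape of the parsed paused-session.json.
interface PausedSession {
  autoStartTime?: number;
}

// Use the saved start time when present; otherwise fall back to "now"
// so the elapsed timer still has a starting point.
function restoreAutoStartTime(saved: PausedSession, now: number = Date.now()): number {
  const t = saved.autoStartTime;
  return typeof t === "number" && t > 0 ? t : now;
}
```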
The enhanced_verification_* preferences were validated and typed but not
included in mergePreferences(), causing project-level overrides to be
silently ignored. This fix ensures project preferences properly merge
with user-level defaults.
Integrates pre/post-execution checks into auto-mode:
- auto-verification.ts: runEnhancedPreChecks/runEnhancedPostChecks integration
- auto-post-unit.ts: pause control flow when blocking checks fail
- Respects enhanced_verification_strict preference for blocking vs warning
Control flow: blocking failures trigger auto-mode pause for user review.
Adds 3 post-execution checks that run after task completion:
- Import resolution: verifies relative imports resolve to existing files
- Export verification: confirms exported symbols are defined
- Type consistency: validates function return types match declarations
All checks follow the permissive-by-default pattern (R012) - warnings don't block.
Adds 4 pre-execution checks that run before each task:
- File ops review: surfaces create/edit/delete intent for manual review
- Read-before-create guard: fails when plan reads a file before creating it
- Package existence: verifies npm packages exist before install attempts
- Interface contract: warns on mismatched function signatures
Includes preference types and validation for enhanced_verification settings.
The welcome screen lines stopped short on wide terminals because
termWidth was capped at 200 columns. Remove the cap so separator
lines extend to the full terminal width.
- Use `git reset --hard <sha>` for rollback instead of `git branch -f`
which fails on checked-out branches and worktrees
- Clear pendingProviderRegistrations after preflush to prevent duplicate
registration when bindCore() runs
- Process Ollama stream content on terminal `done:true` chunks to avoid
truncating trailing assistant text
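For the Ollama fix, the ordering issue can be sketched as follows. The `Chunk` shape mirrors only the fields the sketch needs, and `collectText` is a hypothetical name; the actual stream handling is not shown in the source.

```typescript
// Minimal chunk shape for the sketch: optional message content plus done flag.
interface Chunk {
  message?: { content?: string };
  done: boolean;
}

// Read the content of every chunk before checking done, so text that
// arrives on the terminal chunk is not dropped.
function collectText(chunks: Chunk[]): string {
  let text = "";
  for (const chunk of chunks) {
    text += chunk.message?.content ?? "";
    if (chunk.done) break;
  }
  return text;
}
```

Checking `done` first and breaking early is the pattern that would truncate a trailing fragment.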
The system prompt hardcoded ~/.gsd/agent/skills/ paths for bundled skills,
causing ENOENT loops when skills weren't installed at those locations. The
auto-mode loop treated ENOENT as transient and retried indefinitely.
- Replace hardcoded skill paths in system.md with {{bundledSkillsTable}} template
variable, resolved dynamically via resolveSkillReference() at runtime
- Replace hardcoded templates dir path with {{templatesDir}} variable
- Add buildBundledSkillsTable() to system-context.ts — only includes skills
that actually exist on disk
- Export getTemplatesDir() from prompt-loader.ts
- Add Rule 4 to detect-stuck.ts: same ENOENT path seen twice in the sliding
window triggers immediate stuck detection (missing files don't self-heal)
- Add 4 tests for Rule 4 coverage
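Rule 4 can be sketched roughly as below, assuming the sliding window is available as a list of recent error strings in Node's usual "ENOENT: no such file or directory, open '/path'" form. Both function names are hypothetical, and the real detect-stuck.ts may represent the window differently.

```typescript
// Extract the quoted path from an ENOENT error message, if there is one.
function enoentPath(errorText: string): string | null {
  const match = /ENOENT[^']*'([^']+)'/.exec(errorText);
  return match?.[1] ?? null;
}

// Rule 4 sketch: report stuck as soon as the same missing path shows up
// twice in the window, because a missing file does not heal on retry.
function isStuckOnMissingFile(windowErrors: string[]): boolean {
  const seen = new Set<string>();
  for (const text of windowErrors) {
    const path = enoentPath(text);
    if (path === null) continue; // not an ENOENT error
    if (seen.has(path)) return true;
    seen.add(path);
  }
  return false;
}
```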
Closes #3575