Commit graph

1667 commits

Author SHA1 Message Date
Iouri Goussev
562e8eb164 fix: add @gsd/pi-tui to test module resolver in dist-redirect (#1811) 2026-03-21 12:04:13 -06:00
Iouri Goussev
b486177066 refactor: split shared/mod.ts into pure and TUI-dependent barrels (#1807) 2026-03-21 11:48:32 -06:00
TÂCHES
28a3387e2b fix: surface unmapped active requirements when all milestones complete (#1805) 2026-03-21 11:45:44 -06:00
TÂCHES
8fecf6a3ef fix: normalize paths in tests to handle Windows 8.3 short-path forms (#1804) 2026-03-21 11:45:36 -06:00
TÂCHES
280653d925 fix: share milestone ID reservation between preview and tool (#1569) (#1802) 2026-03-21 11:41:52 -06:00
deseltrus
272963569d fix(tui,gsd): tool-call loop guard + TUI stack overflow prevention (#1801) 2026-03-21 11:41:37 -06:00
TÂCHES
48c25dc853 fix: validate paused-session milestone before restoring it (#1664) (#1800) 2026-03-21 11:40:40 -06:00
TÂCHES
3e4be6babf fix: detect REPLAN-TRIGGER.md in deriveState for triage-initiated replans (#1798) 2026-03-21 11:40:22 -06:00
TÂCHES
ad85995108 fix: dispatch uat targets last completed slice instead of activeSlice (#1693) (#1796) 2026-03-21 11:40:05 -06:00
TÂCHES
d587c91305 fix: read depends_on from CONTEXT-DRAFT.md when CONTEXT.md absent (#1795) 2026-03-21 11:39:48 -06:00
Jeremy McSpadden
afe5f58ea6 fix(worktree): sync root-level files and all milestone dirs on worktree teardown (#1794) 2026-03-21 11:39:37 -06:00
TÂCHES
79de78750f fix: dashboard highlights UAT target slice instead of advanced activeSlice (#1793) 2026-03-21 11:39:05 -06:00
TÂCHES
c550f2231e fix: dispatch guard skips completed milestones with SUMMARY file (#1791) 2026-03-21 11:38:54 -06:00
deseltrus
afd3e3bd96 fix: ensureDbOpen creates DB + migrates Markdown in interactive sessions (#1790) 2026-03-21 11:38:43 -06:00
Lex Christopherson
f628f71843 fix: add require condition to pi-tui exports for CJS resolution
createRequire() in shared/ui.ts uses CJS resolution which needs a
"require" condition in package.json exports. Without it, Node throws
ERR_PACKAGE_PATH_NOT_EXPORTED.

Verified locally: build, typecheck:extensions, test:unit (0 fail),
test:integration (0 fail), validate-pack all pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 10:44:45 -06:00
Lex Christopherson
82868e1648 fix: update integration test to match dependency-aware dispatch guard wording
Slices with declared depends:[S01] now get "dependency slice" message
instead of "earlier slice" from the dispatch guard (PR #1770).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 10:36:47 -06:00
Lex Christopherson
050f260f7b fix: use createRequire instead of bare require for lazy pi-tui import
ESM modules don't have require(). Use createRequire(import.meta.url)
which works in both jiti-loaded and native ESM contexts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 10:25:28 -06:00
Lex Christopherson
a3f5e87cb7 fix: update doctor-git test to match PR #1633 behavior change
integration_branch_missing is now auto-fixable (fallback to main branch
detection) with warning severity, not error. Test expectations updated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 10:20:05 -06:00
Lex Christopherson
69c0e84b24 fix: increase resolveProjectRootFromGitFile walk-up limit from 10 to 30
The test creates a worktree path 11 levels deep, exceeding the 10-level
walk-up limit. In CI, the deep resolved path through a symlinked .gsd
fails to find the .git file and returns the raw path instead of the
project root.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 10:15:29 -06:00
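
Note on 69c0e84b24: the effect of the walk-up limit can be illustrated with a minimal sketch. The function name, its signature, and the injected `hasGit` predicate are assumptions made so the example runs without a filesystem; the real resolveProjectRootFromGitFile may count levels differently.

```typescript
// Hypothetical sketch of a bounded walk-up search for the project root.
// `hasGit` stands in for "this directory contains a .git file".
function resolveProjectRoot(
  start: string,
  hasGit: (dir: string) => boolean,
  maxLevels = 30,
): string {
  let dir = start;
  for (let i = 0; i < maxLevels; i++) {
    if (hasGit(dir)) return dir;
    const cut = dir.lastIndexOf("/");
    if (cut <= 0) break; // reached the filesystem root
    dir = dir.slice(0, cut);
  }
  // Not found within the limit: fall back to the raw input path.
  return start;
}
```

With a path 11 levels below the root, a limit of 10 gives up one step early and returns the raw path, which matches the failure the commit describes.
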
Lex Christopherson
1c696c6f56 fix: include ensure-workspace-builds.cjs in npm package files
The postinstall script references this file but it wasn't listed in
the package.json files array, causing npm install to fail in CI's
validate-pack step.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 10:07:52 -06:00
Lex Christopherson
243293e5b0 fix: resolve extension typecheck errors in test files
- await-tool.test.ts: widen getTextFromResult param to accept ImageContent (text optional)
- auto-loop.test.ts: add missing rebuildState and resolveModelId to LoopDeps mock

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 09:59:38 -06:00
Lex Christopherson
2d5628b51c fix: resolve CI build errors from Wave 4+5 merges
- Remove duplicate `selfHealRuntimeRecords` import in auto.ts (PRs #1772 and #1773 both added it)
- Add missing `model?: string` to runPreDispatchHooks return type in loop-deps.ts (PR #1781 referenced it)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 09:55:37 -06:00
TÂCHES
2da1ecfd20 fix: return retry from postUnitPreVerification when artifact verification fails (#1571) (#1782)
When verifyExpectedArtifact returns false for a unit type with a known
expected artifact, postUnitPreVerification now returns "retry" instead
of "continue". This sets pendingVerificationRetry on the session so the
next loop iteration re-dispatches with failure context, preventing
13+ blind re-dispatches of the same failed unit before the stuck-loop
detector kicks in.

Closes #1571

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 09:46:27 -06:00
TÂCHES
d14e67cad8 fix: hook model field uses model-router resolution instead of Claude-only registry (#1720) (#1781)
dispatchHookUnit() previously used a simplistic lookup that only matched
exact model IDs or "provider/id" against ctx.modelRegistry.getAvailable().
Non-Claude models (e.g. openrouter/openai/gpt-5.4-codex) silently fell
back to the session model with no warning.

- Replace simplistic lookup in dispatchHookUnit with resolveModelId() which
  handles provider/model, bare-id, and OpenRouter org/model formats
- Add warning notification when a hook model can't be resolved instead of
  silent fallback
- Wire sidecar hook model override through runUnitPhase so post-unit hook
  model fields are applied after standard model selection
- Consume pre-dispatch hook model field (hookModelOverride) in IterationData
  so it reaches the dispatch phase
- Export resolveModelId from auto-model-selection.ts for reuse
- Add resolveModelId to LoopDeps interface for testability

Closes #1720

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 09:46:25 -06:00
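
Note on d14e67cad8: a rough sketch of resolving a hook model from the three formats the message lists (provider/model, bare ID, OpenRouter org/model), with a warning instead of a silent fallback. The `Model` shape, both function signatures, and the `local`/`model-a` entries are illustrative assumptions, not the project's actual API.

```typescript
interface Model {
  provider: string;
  id: string;
}

// Hypothetical resolver: tries "provider/id" first, then the whole string as an ID.
function resolveModelId(requested: string, available: Model[]): Model | undefined {
  const slash = requested.indexOf("/");
  if (slash > 0) {
    // Split at the first slash only, so "openrouter/openai/gpt-5.4-codex"
    // becomes provider "openrouter" and id "openai/gpt-5.4-codex".
    const provider = requested.slice(0, slash);
    const id = requested.slice(slash + 1);
    const exact = available.find((m) => m.provider === provider && m.id === id);
    if (exact) return exact;
  }
  // Bare ID, or an org/model ID given without a provider prefix.
  return available.find((m) => m.id === requested);
}

// Hypothetical caller: falls back to the session model but reports why.
function pickHookModel(
  requested: string,
  available: Model[],
  sessionModel: Model,
): { model: Model; warning?: string } {
  const resolved = resolveModelId(requested, available);
  if (resolved) return { model: resolved };
  return {
    model: sessionModel,
    warning: `hook model "${requested}" could not be resolved; using session model`,
  };
}
```
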
TÂCHES
27916344df fix: stop auto-mode immediately on infrastructure errors (ENOSPC, ENOMEM, etc.) (#1780)
The blanket catch in auto/loop.ts treated all errors as transient and retried
up to 3 times, burning ~$20 per retry on guaranteed failures like disk-full.
Infrastructure errors (ENOSPC, ENOMEM, EROFS, EDQUOT, EMFILE, ENFILE) are now
detected before the retry logic and trigger an immediate stop with a clear
error message. Also adds a pre-dispatch disk space check to the health gate
so low-disk conditions are caught before dispatching a unit.

Closes #1694

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 09:46:22 -06:00
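
Note on 27916344df: the check-before-retry ordering could look roughly like the sketch below. The error codes come from the commit message; the helper names, the `LoopOutcome` type, and the synchronous shape are assumptions chosen to keep the example small (the real loop is presumably asynchronous).

```typescript
// Error codes the commit message lists as infrastructure failures.
const INFRA_ERROR_CODES = new Set([
  "ENOSPC", "ENOMEM", "EROFS", "EDQUOT", "EMFILE", "ENFILE",
]);

// Hypothetical helper: true when the error carries one of the codes above.
function isInfrastructureError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const code = (err as { code?: unknown }).code;
  return typeof code === "string" && INFRA_ERROR_CODES.has(code);
}

type LoopOutcome =
  | { kind: "ok" }
  | { kind: "stopped"; reason: string }
  | { kind: "gave-up"; attempts: number };

// Hypothetical wrapper: infrastructure errors stop immediately,
// any other error is retried until maxAttempts is exhausted.
function runWithRetry(step: () => void, maxAttempts = 3): LoopOutcome {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      step();
      return { kind: "ok" };
    } catch (err) {
      if (isInfrastructureError(err)) {
        const code = (err as { code: string }).code;
        return { kind: "stopped", reason: `infrastructure error: ${code}` };
      }
      // Treated as transient: fall through to the next attempt.
    }
  }
  return { kind: "gave-up", attempts: maxAttempts };
}
```

The point of the ordering is that a disk-full error is classified before the retry counter is consulted, so no further attempts are spent on it.
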
TÂCHES
33caef89d0 fix: add missing milestones/ segment in resolveHookArtifactPath (#1779)
resolveHookArtifactPath() built paths as .gsd/<MID>/slices/... instead
of .gsd/milestones/<MID>/slices/..., causing artifact idempotency checks,
retry_on detection, and skip_if in pre-dispatch hooks to all fail silently.

Closes #1721

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 09:46:20 -06:00
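
Note on 33caef89d0: the corrected path layout can be shown in a few lines. The signature below is hypothetical, and everything after `slices/` is passed through unchanged because the commit message elides it.

```typescript
// Hypothetical signature; the real resolveHookArtifactPath may take other arguments.
function resolveHookArtifactPath(milestoneId: string, ...rest: string[]): string {
  // The fix: include the "milestones" segment between .gsd and the milestone ID.
  return [".gsd", "milestones", milestoneId, "slices", ...rest].join("/");
}
```
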
TÂCHES
dda01fa648 fix: break needs-discussion infinite loop when survivor branch exists (#1726) (#1778)
When a milestone has only CONTEXT-DRAFT.md, the survivor branch check
sets hasSurvivorBranch=true and skips all showSmartEntry calls. Auto-mode
then dispatches needs-discussion->stop, creating an infinite loop on
every /gsd run. Add a pre-check: when hasSurvivorBranch is true AND
phase is needs-discussion, route to the interactive discussion handler.

Closes #1726

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 09:46:17 -06:00
TÂCHES
a95d420972 fix: tear down browser sessions at unit boundaries and in stopAuto (#1733) (#1777)
Auto-mode launches Playwright/Chrome for browser-based verification but
never closes browsers between units or during stopAuto teardown. Over
retries and re-dispatches, Chrome processes accumulate and spike RAM.

Add closeBrowser() calls in two locations:
- stopAuto() finally block: ensures browser cleanup on any exit path
- postUnitPreVerification(): tears down browser between unit completions

Both use a getBrowser() guard to skip the import when no browser is active,
keeping the lazy-load pattern intact.

Closes #1733

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 09:46:14 -06:00
TÂCHES
accb327552 fix: rebuild STATE.md and reset completed-units on milestone transition (#1576) (#1775)
After milestone transitions in auto-mode, STATE.md remained stale because
rebuildState() was never called. Additionally, completed-units.json retained
entries from the previous milestone, causing dispatch to skip units in the
new milestone context. This adds rebuildState() to the milestone transition
block (bypassing the 30-second throttle) and resets completed-units tracking
when the active milestone changes.

Closes #1576

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 09:33:07 -06:00
TÂCHES
605fa6803a fix: resolve pending unit promise on all exit paths to prevent orphaned auto-loop (#1774)
handleAgentEnd, pauseAuto, and supervision timer catch blocks could
leave the unitPromise unresolved, causing autoLoop to hang permanently
on `await unitPromise`. Add resolveAgentEndCancelled() and call it on
every exit path that previously skipped resolution.

Closes #1666

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 09:33:05 -06:00
TÂCHES
4c3fafd6a6 fix: closeout unit on pause and heal runtime records on resume (#1625) (#1773)
pauseAuto now calls closeoutUnit() and clearUnitRuntimeRecord() for the
current unit before setting s.active = false, preventing stale
"dispatched" runtime records from accumulating on disk.

The resume path in startAuto now calls selfHealRuntimeRecords() before
entering autoLoop to clean any stale records that survived from prior
sessions (e.g. if clearUnitRuntimeRecord failed silently during pause).

Closes #1625

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 09:33:02 -06:00
TÂCHES
b609c3b30b fix: call selfHealRuntimeRecords before autoLoop to clear orphaned dispatched records (#1772)
When auto-mode dies after a subagent completes but before agent_end is
processed, the runtime record stays permanently at "phase": "dispatched"
with no recovery path. selfHealRuntimeRecords was only called from the
manual guided-flow wizard, never from auto-loop startup.

Add selfHealRuntimeRecords(basePath, ctx) before both autoLoop call
sites in startAuto (resume path and fresh-start path) so stale
dispatched records are cleared on every auto-mode entry.

Closes #1727

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 09:32:59 -06:00
TÂCHES
fe63ccad10 fix: dispatch guard uses dependency declarations instead of positional ordering (#1638) (#1770)
The dispatch guard checked slices linearly by position, creating deadlocks
when a positionally-earlier slice depended on a positionally-later one
(e.g. S05 depends_on S06). Now checks declared dependencies for slices
that have them, falling back to positional ordering for backward compat.

Closes #1638

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 09:32:55 -06:00
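
Note on fe63ccad10: a minimal sketch of a guard that prefers declared dependencies and falls back to positional ordering. The `Slice` shape and the function name are assumptions for illustration; the real dispatch guard likely carries more state.

```typescript
interface Slice {
  id: string;
  done: boolean;
  dependsOn?: string[]; // declared dependencies, if any
}

// Returns the ID of a slice blocking `targetId`, or null if dispatch may proceed.
function findBlockingSlice(slices: Slice[], targetId: string): string | null {
  const target = slices.find((s) => s.id === targetId);
  if (!target) return null;
  const byId = new Map(slices.map((s) => [s.id, s] as const));

  if (target.dependsOn && target.dependsOn.length > 0) {
    // Declared dependencies: only those slices must be complete.
    for (const depId of target.dependsOn) {
      const dep = byId.get(depId);
      if (dep && !dep.done) return depId;
    }
    return null;
  }

  // Fallback for slices without declarations: every earlier slice must be complete.
  const index = slices.indexOf(target);
  for (const earlier of slices.slice(0, index)) {
    if (!earlier.done) return earlier.id;
  }
  return null;
}
```

In this sketch a slice with declared dependencies is never blocked by an unrelated earlier slice, which is the behavior change the commit describes.
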
TÂCHES
21b2f8223d fix: add configurable timeout to await_job to prevent indefinite session blocking (#1769)
The await_job tool previously blocked the entire agent session with no
escape hatch. This adds a configurable timeout parameter (default 120s)
that races against the job promises. On timeout, jobs continue running
in the background and the agent regains control.

Closes #1690

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 09:32:52 -06:00
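
Note on 21b2f8223d: racing job promises against a timer might be sketched as follows. The helper name and result type are hypothetical, and waiting for all jobs (rather than the first to finish) is an assumption; the commit message only says the timeout races against the job promises.

```typescript
type AwaitResult<T> =
  | { timedOut: false; results: T[] }
  | { timedOut: true };

// Hypothetical helper: settles when all jobs finish or the timeout fires,
// whichever happens first. The default mirrors the 120s from the commit message.
async function awaitJobs<T>(
  jobs: Promise<T>[],
  timeoutMs = 120000,
): Promise<AwaitResult<T>> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<AwaitResult<T>>((resolve) => {
    timer = setTimeout(() => resolve({ timedOut: true }), timeoutMs);
  });
  const done = Promise.all(jobs).then(
    (results): AwaitResult<T> => ({ timedOut: false, results }),
  );
  try {
    return await Promise.race([done, timeout]);
  } finally {
    // Clear the timer so it does not keep the process alive.
    // The jobs themselves are not cancelled and keep running.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

On timeout the caller regains control while the underlying promises stay pending, which corresponds to jobs continuing in the background.
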
mastertyko
27ef4fcc40 fix(parallel): restore orchestrator state from session files and add worker stderr logging (#1748)
When the coordinator process restarts after a crash, the in-memory
orchestrator state is lost even though workers may still be running.
restoreState() only reads orchestrator.json, which can be missing or
corrupt. This adds restoreRuntimeState() as a fallback that rebuilds
coordinator state from live session status files under .gsd/parallel/.

Also adds:
- Worker stderr logging to per-milestone .stderr.log files for
  post-mortem diagnostics
- refreshWorkerStatuses(restoreIfNeeded) option for lazy state recovery
  from the /gsd parallel status command path
- getWorkerStatuses(basePath) auto-refreshes before returning
- Dead workers with no session file are marked stopped/error instead
  of staying permanently 'running'

Builds on #873 (crash recovery) and #932 (PID tracking).
2026-03-21 09:28:11 -06:00
TÂCHES
d1b6a8a6b1 fix: prevent getLoadedSkills crash and auto-build workspace packages (#1767)
Add defensive fallback in auto-prompts.ts so a missing getLoadedSkills
export degrades gracefully (empty skill list) instead of crashing every
auto-mode dispatch iteration.

Add ensure-workspace-builds.cjs postinstall script that detects missing
dist/ directories in workspace packages and rebuilds them automatically.
This prevents stale-build issues after fresh clones where dist/ is
gitignored but required at runtime by jiti-loaded extensions.

Closes #1734

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 09:19:48 -06:00
TÂCHES
050d51475b fix: session lock multi-path cleanup and false positive hardening (#1578) (#1765)
Three fixes for the session lock false positive loop:

1. Multi-path cleanup: Lock files accumulate across main project .gsd/,
   worktree .gsd/, and projects registry paths, but cleanup only targeted
   the current gsdRoot(). Added a _lockDirRegistry Set that tracks all
   paths where locks are created. Both the exit handler and
   releaseSessionLock() now clean all registered paths.

2. onCompromised hardening: When proper-lockfile fires onCompromised past
   the stale window, check if the lock file metadata still contains our
   PID before declaring compromise. Long subagent executions can stall
   the event loop beyond the 30-min stale window without actual takeover.

3. Error messages: Include the lock file path and PID in error messages,
   and suggest `gsd doctor --fix` as the recovery path.

Closes #1578

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 09:19:46 -06:00
TÂCHES
8c228b8dbb fix: robust node_modules symlink handling to prevent extension loading failures (#1762)
The ensureNodeModulesSymlink function silently failed when: a real
directory existed instead of a symlink, the symlink target moved after
npm upgrade, or the symlink pointed to a deleted location. All three
cases left extensions unable to resolve @gsd/* packages, making GSD
completely non-functional.

Four fixes:
1. Use lstatSync to detect real directories vs symlinks and handle each
2. Verify the symlink target actually exists before considering it valid
3. Log a warning on symlinkSync failure instead of silently swallowing
4. Move ensureNodeModulesSymlink before the early-return version check
   so it runs on EVERY launch, not just during resource syncs

Closes #1688

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 09:19:43 -06:00
TÂCHES
182e4a5f85 fix: lazy-load @gsd/pi-tui in shared/ui.ts to prevent /exit crash (#1761)
The eager top-level import of @gsd/pi-tui in shared/ui.ts caused any
command that transitively loaded the shared/mod barrel (including /exit)
to fail when extensions were loaded from ~/.gsd/agent/extensions/ where
@gsd/pi-tui has no node_modules resolution path.

Replaced the static import with a lazy require() accessor that defers
resolution to the first makeUI() call, so modules that import shared/mod
for non-TUI exports (constants, format utils, etc.) no longer trigger
the unresolvable dependency.

Closes #1640

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 09:19:41 -06:00
TÂCHES
305b426f5f fix: validate worktree .git file and fix metrics toolCall casing (#1713) (#1754)
Closes #1713
2026-03-21 09:06:25 -06:00
TÂCHES
049d432c3c fix: verify implementation artifacts before milestone completion (#1703) (#1760)
Milestones were being marked complete with only .gsd/ plan files and
zero implementation code. Add hasImplementationArtifacts() that checks
git diff against the main branch to verify non-.gsd/ files exist.
Applied in both verifyExpectedArtifact (post-unit gate) and the
completing-milestone dispatch rule (pre-dispatch guard).

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 09:04:57 -06:00
TÂCHES
c68c6331ad fix: make task closeout crash-safe by unchecking orphaned checkboxes (#1650) (#1759)
When the process crashes between marking a task [x] in PLAN.md and
writing SUMMARY.md, the task appears done but has no summary. The doctor
previously papered over this by creating a stub summary, silently losing
the task. Now it unchecks the task so it re-executes on next run.

- Add markTaskUndoneInPlan to roadmap-mutations.ts
- Change doctor task_done_missing_summary fix: uncheck instead of stub
- Add markTaskUndoneInPlan helper to doctor.ts for async file ops
- Add test coverage for both the mutation and doctor behavior

Closes #1650

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 09:04:55 -06:00
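
Note on c68c6331ad: unchecking a task in plan text could be sketched like this. The body is a hypothetical implementation operating on a string, and it assumes PLAN.md lists tasks as checkbox lines of the form `- [x] T01: ...`; the real markTaskUndoneInPlan and the real plan format may differ.

```typescript
// Hypothetical sketch: flips "[x]" back to "[ ]" on the line for `taskId`.
// Returns the content unchanged when no matching checked line exists.
function markTaskUndoneInPlan(planContent: string, taskId: string): string {
  // Escape regex metacharacters in the task ID before embedding it.
  const escaped = taskId.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(`^(\\s*[-*] )\\[[xX]\\]( ${escaped}\\b)`, "m");
  return planContent.replace(pattern, "$1[ ]$2");
}
```
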
TÂCHES
4367ea36c4 fix: preserve milestone branch on merge-back during transitions (#1573) (#1758)
When mergeAndExit cannot find the roadmap at the project root, it now
tries the worktree path as a fallback. If neither location has a roadmap,
the teardown preserves the branch (preserveBranch: true) so commits are
not orphaned when the worktree is pruned.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 09:04:52 -06:00
TÂCHES
0483363a33 fix: write crash lock after newSession so it records correct session path (#1757)
The crash lock was written with the session file path from before
runUnit() called newSession(), causing crash recovery to look up the
previous unit's session file instead of the current one. This meant
recovery reported "No session data recovered" even when 261KB of session
data was on disk.

Split the lock write into two phases: a preliminary lock (unit info only,
no session path) before runUnit for crash identification, then a full
lock update with the correct session file path after runUnit returns.

Closes #1710

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 09:04:50 -06:00
TÂCHES
1fb59ecb71 fix: handle symlinked .gsd in git add pathspec exclusions (#1712) (#1756)
When .gsd is a symlink, git rejects `:!.gsd/...` pathspecs with
"beyond a symbolic link". nativeAddAllWithExclusions now catches this
error and falls back to plain `git add -A` (which respects .gitignore).

Auto-commit failures in postUnit are elevated from debug-only to a
visible warning notification so silent work loss is surfaced.

Closes #1712

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 09:04:46 -06:00
TÂCHES
dc20078ad9 fix: guard worktree teardown on empty merge to prevent data loss (#1672) (#1755)
When nativeCommit returns null (nothing to commit), the worktree directory
and milestone branch are now preserved instead of unconditionally deleted.
This prevents data loss on WSL where git's stat cache can cause
autoCommitCurrentBranch to skip commits.

Additionally, nativeMergeSquash now re-throws non-conflict git failures
(bad ref, corrupt repo) instead of masking them as { success: true }.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 09:04:44 -06:00
TÂCHES
5d8e8c04b6 fix: resolve symlinks in doctor orphaned-worktree check (#1715) (#1753)
When .gsd is a symlink, `worktreesDir()` returns the symlink path while
`nativeWorktreeList()` returns the resolved real path. The Set membership
check always fails, causing all worktrees to be flagged as orphaned and
deleted. Apply `realpathSync` and path separator normalization to both
sides of the comparison.

Closes #1715

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 09:04:41 -06:00
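
Note on 5d8e8c04b6: normalizing both sides before the Set membership check might look like the sketch below. The resolver is injected so the example runs without touching the disk (in Node it would typically be `fs.realpathSync`); the helper names are assumptions.

```typescript
type Realpath = (p: string) => string;

// Resolve symlinks where possible, then normalize separators and trailing slashes.
function normalizeForCompare(p: string, realpath: Realpath): string {
  let resolved: string;
  try {
    resolved = realpath(p);
  } catch {
    resolved = p; // the path may no longer exist; compare the raw value
  }
  return resolved.replace(/\\/g, "/").replace(/\/+$/, "");
}

// Hypothetical helper: worktree directories on disk that git does not list.
function findOrphanedWorktrees(
  onDisk: string[],
  registered: string[],
  realpath: Realpath,
): string[] {
  const known = new Set(registered.map((p) => normalizeForCompare(p, realpath)));
  return onDisk.filter((p) => !known.has(normalizeForCompare(p, realpath)));
}
```

Applying the same normalization to both lists is what keeps a symlinked `.gsd` path and its resolved real path from comparing as different worktrees.
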
Tom Boucher
63a61196e8 fix: silence spurious extension load error for non-extension libraries (#1709) (#1747)
The extension loader emits "Extension does not export a valid factory
function" for shared libraries like cmux that live in the extensions/
directory but are not extensions. Previous fixes (#1537, #1545) added
pi manifest opt-out checks in the three discovery layers, but a
defense-in-depth gap remained: if any discovery path fails to filter
a library, loadExtension() reports it as a broken extension.

Add isNonExtensionLibrary() check in loadExtension() itself. When a
module does not export a factory function, the loader now checks the
nearest package.json for a "pi" manifest with no declared extensions
before reporting an error. Libraries with "pi": {} are silently
skipped instead of producing a spurious error on every startup.

Fixes #1709

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 08:54:19 -06:00
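
Note on 63a61196e8: the manifest check could be sketched as below. It takes an already-parsed package.json, and the `extensions` key inside the "pi" manifest is an assumed name; the commit message only says the manifest declares no extensions.

```typescript
interface PackageJson {
  pi?: { extensions?: unknown } | null;
}

// Hypothetical sketch: true when the package opts in to "pi" but declares no extensions.
function isNonExtensionLibrary(pkg: PackageJson | null | undefined): boolean {
  const pi = pkg?.pi;
  if (pi === undefined || pi === null || typeof pi !== "object") return false;
  const declared = pi.extensions;
  // "pi": {} or an empty extensions list marks a plain library.
  return declared === undefined || (Array.isArray(declared) && declared.length === 0);
}
```

A loader could call this only after finding no factory export, so real extensions that are broken still produce an error.
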
Tom Boucher
973846cdc6 fix: reset completion state when post_unit_hooks retry_on signal is consumed (#1746)
consumeRetryTrigger() cleared the in-memory retry flag but did not undo
the doctor's [x] checkbox, delete SUMMARY.md, remove from completedUnits,
or delete the retry artifact. On the next loop iteration, deriveState()
saw the task as done and advanced past it — silently losing the retry.

When consumeRetryTrigger() returns a trigger, the code now:
1. Unchecks [x] → [ ] for the task in PLAN.md
2. Deletes SUMMARY.md for the task
3. Removes the unit from s.completedUnits and flushes to completed-units.json
4. Deletes the retry_on artifact (e.g. NEEDS-REWORK.md)
5. Invalidates caches so deriveState reads fresh disk state

Also extends the retry trigger type to include retryArtifact so the
consumer knows which artifact to clean up.

Fixes #1714

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 08:54:03 -06:00
Tom Boucher
f94ef56727 fix: route needs-discussion phase to showSmartEntry, preventing infinite /gsd loop (#1745)
Fixes #1726

Two bugs in bootstrapAutoSession():

1. The survivor branch check (Milestone branch recovery #601) included
   needs-discussion in its phase filter. A branch created by a prior failed
   bootstrap would set hasSurvivorBranch=true, skipping all showSmartEntry
   calls and sending the session straight to auto-mode dispatch.

2. The !hasSurvivorBranch block only handled phase==="complete" and
   phase==="pre-planning" with showSmartEntry calls. needs-discussion fell
   through with no handler, reaching auto-mode which dispatched
   "needs-discussion -> stop" immediately. Next /gsd run repeated the cycle.

Fix: Remove needs-discussion from the survivor branch phase filter (only
check pre-planning). Add an explicit needs-discussion handler that routes
to showSmartEntry and aborts if the discussion does not promote the draft.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 08:53:48 -06:00