Commit graph

2050 commits

Author SHA1 Message Date
Jeremy
722dfc96cb fix(gsd): align prompt contracts and validation flow 2026-04-08 20:13:35 -05:00
Jeremy
55cde8549a fix(ui): display 'anthropic-api' in GSD preferences wizard provider list
Applies the same provider display name mapping to the /gsd prefs model
picker so both /model and /gsd prefs show 'anthropic-api' consistently.
2026-04-08 17:36:33 -05:00
Jeremy McSpadden
75a3795e9a Merge pull request #3811 from jeremymcs/worktree-fix+race-remote-local-questions
fix(remote-questions): race local TUI against remote channel
2026-04-08 16:46:22 -05:00
Jeremy
c1a68a48a9 test(remote-questions): add regression tests for race model (#3810)
Validates the race routing logic: raceRemoteAndLocal helper exists,
routing checks both hasRemote and ctx.hasUI, remote timeouts are
treated as non-wins, AbortController cancels the loser, and
isRemoteConfigured is exported from manager.ts.
2026-04-08 16:20:13 -05:00
Jeremy
dadb0b136e fix(remote-questions): race local TUI against remote channel instead of remote-only routing
When a remote channel (Discord/Slack/Telegram) is configured, ask_user_questions
now races the local TUI against the remote dispatch. The first response wins and
the loser is cancelled. Previously, remote completely preempted the local TUI,
meaning terminal users never saw the question prompt when remote was configured.

Closes #3801
2026-04-08 15:58:06 -05:00
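
The race described in dadb0b136e (first valid response wins, timeouts are non-wins, the loser is cancelled) can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the `Answer` and `Asker` types and the body of `raceRemoteAndLocal` are assumptions; only the helper name and the use of `AbortController` come from the commit messages.

```typescript
// Hypothetical shapes: the real ask_user_questions plumbing is not shown in this log.
type Answer = { source: "local" | "remote"; text: string };
// An asker resolves to null when it times out or is cancelled (a "non-win").
type Asker = (signal: AbortSignal) => Promise<Answer | null>;

function raceRemoteAndLocal(local: Asker, remote: Asker): Promise<Answer | null> {
  const askers = [local, remote];
  const controllers = askers.map(() => new AbortController());

  return new Promise((resolve) => {
    let pending = askers.length;
    let settled = false;

    askers.forEach((ask, i) => {
      ask(controllers[i].signal)
        .catch(() => null) // an erroring channel counts as a non-win
        .then((answer) => {
          pending -= 1;
          if (settled) return;
          if (answer !== null) {
            settled = true;
            // Cancel every other channel before reporting the winner.
            controllers.forEach((c, j) => {
              if (j !== i) c.abort();
            });
            resolve(answer);
          } else if (pending === 0) {
            // Nobody answered: report that instead of hanging.
            settled = true;
            resolve(null);
          }
        });
    });
  });
}
```

Treating `null` as "keep waiting" rather than as a result is what prevents a remote timeout from pre-empting a terminal user who is still typing.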
Jeremy McSpadden
ecfd4fbc90 Merge pull request #3802 from jeremymcs/fix/harden-prompt-gates
fix(prompts): harden non-bypassable gates and exclude dot-folders from scanning
2026-04-08 14:19:07 -05:00
Jeremy
d84778336c fix(gates): add mechanical enforcement for discussion question gates
When ask_user_questions fails, errors, or is cancelled during a
discussion flow, the model is now mechanically blocked from all
non-read-only tool calls until it re-asks and gets a valid response.

Previously, gate enforcement was prompt-only — the model could
rationalize past failed ask_user_questions calls ("auth issue, I'll
continue") and generate an entire plan without any user interaction.

The pending gate mechanism:
- tool_call hook: any ask_user_questions during discussion sets pending
- tool_result hook: valid user response clears pending; failure keeps it
- tool_call hook: blocks write/edit/gsd/mutating-bash while pending
- read-only tools and ask_user_questions itself always allowed
2026-04-08 14:00:16 -05:00
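
The pending-gate mechanism listed in d84778336c could look something like the sketch below. The class name `DiscussionGate`, the hook method names, and the read-only tool names are all hypothetical; only `ask_user_questions` and the set/clear/block rules are taken from the commit message.

```typescript
// Assumed tool names for illustration; the real read-only allowlist is not shown in the log.
const READ_ONLY_TOOLS = new Set(["read", "grep", "ls"]);

interface GateDecision {
  allowed: boolean;
  reason?: string;
}

class DiscussionGate {
  private pending = false;

  // tool_call hook: asking sets the gate; mutating tools are blocked while it is set.
  onToolCall(tool: string): GateDecision {
    if (tool === "ask_user_questions") {
      this.pending = true;
      return { allowed: true };
    }
    if (this.pending && !READ_ONLY_TOOLS.has(tool)) {
      return { allowed: false, reason: "re-ask the user and wait for a valid response" };
    }
    return { allowed: true };
  }

  // tool_result hook: only a valid user response clears the gate; failures keep it set.
  onToolResult(tool: string, validResponse: boolean): void {
    if (tool === "ask_user_questions" && validResponse) {
      this.pending = false;
    }
  }
}
```

Because the block is enforced in the hook rather than in the prompt, a failed or cancelled question leaves the gate set no matter how the model explains the failure.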
Jeremy
2c1e4b695e test(scanning): add regression tests for dot-folder exclusions
- codebase-generator: verify .claude/, .plans/, .cursor/, .vscode/ excluded
- detection: verify scanProjectFiles skips .claude, .gsd, .planning, .plans, .cursor, .vscode
2026-04-08 13:08:12 -05:00
Jeremy
757ce56594 fix(prompts): harden non-bypassable gates and exclude dot-folders from scanning
Regression: discuss-prepared.md was created without the non-bypassable gate
language from discuss.md, allowing the LLM to fabricate excuses ("auth issues")
and bypass user approval gates. Also, codebase scanning included .claude/ and
other tool directories, tainting project analysis with LLM metadata.

Gate hardening (9 prompts):
- discuss-prepared.md: all 4 layer gates + requirements/roadmap preview
- discuss.md: requirements, roadmap preview, Phase 3 multi-milestone gate
- guided-discuss-slice.md: added missing write-gate before CONTEXT.md write
- guided-discuss-milestone.md: plain-text fallback path now matches structured
- worktree-merge.md: merge confirmation gate
- system.md: outward-facing actions gate
- rethink.md: skip-slice gate added, discard confirmation hardened
- complete-milestone.md: tool-failure handling for verification gate
- triage-captures.md: quick-task/inject/replan confirmation gate

Dot-folder exclusions (2 files):
- codebase-generator.ts: added .claude/, .plans/, .cursor/, .vscode/
- detection.ts: added .gsd, .planning, .plans, .claude, .cursor, .vscode
2026-04-08 12:58:13 -05:00
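
A dot-folder exclusion like the one 757ce56594 adds to detection.ts might be expressed as a path-segment check. The function name `shouldScan` is an assumption; the directory names are the ones listed in the commit message.

```typescript
// Directory names taken from the detection.ts list in the commit message.
const EXCLUDED_DIRS = new Set([".gsd", ".planning", ".plans", ".claude", ".cursor", ".vscode"]);

// Hypothetical helper: returns false when any path segment is an excluded tool directory.
function shouldScan(relativePath: string): boolean {
  return !relativePath.split(/[\\/]/).some((segment) => EXCLUDED_DIRS.has(segment));
}
```

Matching whole segments rather than prefixes keeps a file such as `src/.claude-notes.md` scannable while still skipping everything under `.claude/`.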
Jeremy
d98456cad7 test(pi-tui): add regression tests for slash command TUI interactions
Add 3 new tests covering editor↔selector/input component swaps that
happen during /gsd prefs, /gsd migrate, and /gsd setup:

- editor-to-selector swap: verifies cursor tracking when editor with
  CURSOR_MARKER is replaced by a selector without one
- selector-to-editor swap: verifies cursor restores to CURSOR_MARKER
  position when editor returns after selector dismissal
- input component swap: verifies typing in prefs wizard text input
  produces correct cursor movement without jumps

All tests confirm hardwareCursorRow baseline computes correct movement
deltas for these interactive component transitions.
2026-04-08 12:09:21 -05:00
Jeremy
c07ecc1028 fix(providers): match 'out of extra usage' error and respect claude-code provider in model resolution (#3772)
Two bugs prevented subscription users from routing through Claude Code CLI:
1. Retry handler regex only matched "third-party" errors but actual error is
   "You're out of extra usage" — fallback never triggered
2. auto-model-selection actively rerouted bare model IDs back to anthropic
   even after startup migration set claude-code as the session provider
2026-04-08 10:47:35 -05:00
Jeremy McSpadden
e3d69ed01a Merge pull request #3766 from OfficialDelta/feat/tiered-context-injection
feat(gsd): tiered context injection with scoped decisions, knowledge, and roadmap excerpts
2026-04-08 08:09:29 -05:00
Jeremy McSpadden
695cab8b63 Merge pull request #3774 from mastertyko/fix/3759-preferences-section-warn-once
fix(gsd): suppress repeated preferences section warnings
2026-04-08 08:07:41 -05:00
Jeremy McSpadden
7d9e9a5585 Merge pull request #3784 from jeremymcs/fix/claude-code-default-provider
fix(providers): route Anthropic subscription users through Claude Code CLI
2026-04-08 08:07:09 -05:00
Jeremy McSpadden
393700a649 Merge pull request #3374 from deseltrus/fix/auto-mode-transient-error-resilience
fix(auto): resilient transient error recovery — defer to Core RetryHandler and fix cmdCtx race
2026-04-08 07:30:43 -05:00
Jeremy
ea456d4cdb fix(providers): route Anthropic subscription users through Claude Code CLI (#3772)
Anthropic now blocks third-party apps from using Pro/Max subscription
quotas via direct API calls. This change makes the claude-code provider
(which delegates to the local claude CLI binary) the default path for
Anthropic subscription users — TOS-compliant because requests flow
through Anthropic's own infrastructure.

Changes:
- Enhanced readiness check to verify CLI auth status (not just binary)
- Startup migration: auto-switch anthropic → claude-code when CLI ready
- Error recovery: auto-switch on third-party 400 block error
- Onboarding: removed Anthropic from OAuth, added Claude CLI option
- Added claude-code to flat-rate providers (no dynamic routing benefit)

Closes #3772
2026-04-08 07:20:20 -05:00
mastertyko
6fce97977d fix(gsd): suppress repeated preferences section warnings 2026-04-08 09:08:13 +02:00
Jeremy
c0236254a2 fix(pi-tui): revert contentCursorRow, use hardwareCursorRow as movement baseline
PR #3744 and #3765 introduced contentCursorRow which diverges from the
actual terminal cursor position after IME repositioning. computeLineDiff
computes ANSI escape movements which are relative to where the cursor
physically is — that must be hardwareCursorRow, not a phantom position.

Remove contentCursorRow entirely and revert computeLineDiff baseline to
hardwareCursorRow. The ghost-line test was asserting wrong movement
direction (UP from phantom position vs DOWN from actual cursor).

Closes #3764
2026-04-07 23:37:52 -05:00
OfficialDelta
aeef1b7d55 test: Add edge case tests for deriveSliceScope unit IDs and process words
- Tests for S01/M001/T03 unit ID filtering
- Tests for hardening/validation/verification/optimization/enhancement/infrastructure
2026-04-07 23:49:02 -04:00
OfficialDelta
c3ba64d56b M005: Tiered Context Injection - Scoped knowledge queries & roadmap excerpts
- Added queryKnowledge() for keyword-based KNOWLEDGE.md section filtering
- Added formatRoadmapExcerpt() for slice-scoped roadmap excerpts
- Added inlineKnowledgeScoped() and inlineRoadmapExcerpt() to auto-prompts
- Context reduction: ~65% for knowledge, ~67% for roadmap excerpts
- Also includes deriveSliceScope() fix for unit IDs and process words

Squash merge of milestone/M005
2026-04-07 23:48:06 -04:00
Jeremy
a28a56b3e4 test(pi-tui): add regression tests for contentCursorRow tracking
Verify that contentCursorRow is correctly maintained across renders
and that IME repositioning does not cause spurious cursor jumps
during normal typing or content shrinking.

Refs #3764
2026-04-07 22:39:00 -05:00
OfficialDelta
e4f94fa5fb feat(context): implement R005 decision scope cascade and derive scope from slice metadata
Fix 1 - Fallback cascade:
- inlineDecisionsFromDb() now cascades: milestone+scope → milestone only → null
- When scoped query returns empty AND scope was provided, retries without scope
- Falls back to milestone-level decisions when no scope-specific ones exist

Fix 2 - Derive scope from slice metadata:
- Added deriveSliceScope(title, description?) helper function
- Extracts first meaningful noun (filters stopwords and generic words)
- Examples: 'Auth Middleware & Protected Route' → 'auth'
           'Integration Testing' → undefined (too generic)
- Wired into buildPlanSlicePrompt and buildResearchSlicePrompt

Added comprehensive test suite (13 tests) covering:
- Keyword extraction from slice titles
- Generic title detection
- Cascade fallback behavior
- Integration between scope derivation and cascade
2026-04-07 23:23:55 -04:00
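
The two fixes in e4f94fa5fb can be illustrated together. This is a simplified sketch under stated assumptions: the word lists are illustrative (the generic words are borrowed from the test description in aeef1b7d55), "first meaningful noun" is approximated as "first word that survives filtering", and `inlineDecisionsWithCascade` plus its `query` parameter are hypothetical stand-ins for `inlineDecisionsFromDb()` and its database access.

```typescript
// Assumed filter lists; the real ones are not shown in the log.
const STOPWORDS = new Set(["a", "an", "and", "the", "of", "for", "to", "in", "on", "with"]);
const GENERIC_WORDS = new Set([
  "integration", "testing", "hardening", "validation",
  "verification", "optimization", "enhancement", "infrastructure",
]);
const UNIT_ID = /^[smt]\d+$/; // e.g. s01, m001, t03 after lowercasing

function deriveSliceScope(title: string, description?: string): string | undefined {
  const words = `${title} ${description ?? ""}`.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  return words.find((w) => !STOPWORDS.has(w) && !GENERIC_WORDS.has(w) && !UNIT_ID.test(w));
}

// Hypothetical query signature standing in for the database lookup.
type DecisionQuery = (milestone: string, scope?: string) => string[];

// Cascade: milestone+scope, then milestone only, then null.
function inlineDecisionsWithCascade(
  query: DecisionQuery,
  milestone: string,
  scope?: string,
): string[] | null {
  const scoped = query(milestone, scope);
  if (scoped.length > 0) return scoped;
  if (scope !== undefined) {
    const milestoneOnly = query(milestone);
    if (milestoneOnly.length > 0) return milestoneOnly;
  }
  return null;
}
```

With these lists, 'Auth Middleware & Protected Route' yields 'auth' and 'Integration Testing' yields `undefined`, matching the examples in the commit message.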
Jeremy McSpadden
2d58b3fc1b Merge pull request #3739 from jeremymcs/fix/orphaned-worktree-audit
fix(gsd): add orphaned milestone branch audit at auto-mode bootstrap
2026-04-07 21:54:57 -05:00
Jeremy McSpadden
ff9ad68526 Merge pull request #3743 from mastertyko/fix/3742-pre-exec-annotated-paths
fix(gsd): parse annotated pre-exec file paths
2026-04-07 21:47:30 -05:00
OfficialDelta
4214252eaa feat(M005): Tiered Context Injection - relevance-scoped context with 65%+ reduction
- Added milestone-scoped queryDecisions/queryRequirements with D020 fallback cascade
- Added queryKnowledge() for keyword-based KNOWLEDGE.md section filtering
- Added formatRoadmapExcerpt() for minimal roadmap table extraction
- Wired inlineKnowledgeScoped() and inlineRoadmapExcerpt() into slice prompts
- 39 new tests (31 context-store + 8 measurement)
- Measured 65.7% combined context reduction (exceeds 40% target)
2026-04-07 22:44:09 -04:00
Jeremy
b7d7c69b9e fix(gsd): add logWarning to empty catch block in orphaned worktree cleanup
The workflow-logger coverage test (#3348) requires all catch blocks in
migrated files to include logging. Add logWarning for the expected
failure case when nativeWorktreeRemove fails on orphaned directories.

Refs #3739
2026-04-07 21:41:08 -05:00
Jeremy McSpadden
f9a6cac958 Merge pull request #3744 from mastertyko/fix/3721-tui-autocomplete-ghost-lines
fix(pi-tui): clear autocomplete rows from content bottom
2026-04-07 21:32:14 -05:00
Jeremy McSpadden
ae5e4af61e Merge pull request #3745 from mastertyko/fix/3705-subagent-tools-frontmatter
fix(subagent): support list-style tools frontmatter
2026-04-07 21:31:38 -05:00
Jeremy McSpadden
0c62fada0e Merge pull request #3758 from jeremymcs/fix/pre-verification-timeout-guard
fix(gsd): add timeout guard around postUnitPreVerification
2026-04-07 21:28:21 -05:00
Jeremy
5972e4d809 fix(gsd): add consecutiveFinalizeTimeouts to LoopState in journal tests
Update all LoopState object literals in journal-integration.test.ts
to include the new consecutiveFinalizeTimeouts property.

Refs #3757
2026-04-07 21:15:37 -05:00
Jeremy
0ca76f6813 fix(gsd): add escalation and unit-detach guards to finalize timeout handlers
Address adversarial review findings:

1. Timed-out pre/post verification continues running in background and
   can mutate s.currentUnit for the wrong unit. Fix: null out
   s.currentUnit on timeout so late async completions are harmless
   (all side effects in postUnitPreVerification guard on s.currentUnit).

2. Finalize timeouts were treated as successful iterations, resetting
   consecutiveErrors and enabling silent infinite churn. Fix: add
   consecutiveFinalizeTimeouts counter to LoopState, increment on each
   timeout, hard-stop auto-mode after MAX_FINALIZE_TIMEOUTS (3)
   consecutive timeouts. Reset to 0 on successful finalize.

Both fixes apply symmetrically to pre and post verification timeouts.

Refs #3757
2026-04-07 21:10:55 -05:00
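
The escalation rule from 0ca76f6813 amounts to a small counter on the loop state. The sketch below assumes a reduced `LoopState` with only the two fields the commit mentions, and a hypothetical `recordFinalizeOutcome` helper; whether the real check is `>=` or `>` is not stated in the log.

```typescript
const MAX_FINALIZE_TIMEOUTS = 3;

// Reduced, assumed shape: the real LoopState has more fields.
interface LoopState {
  currentUnit: string | null;
  consecutiveFinalizeTimeouts: number;
}

// Hypothetical helper: update state after a finalize step and say whether to keep looping.
function recordFinalizeOutcome(s: LoopState, timedOut: boolean): "continue" | "stop" {
  if (!timedOut) {
    s.consecutiveFinalizeTimeouts = 0; // a successful finalize resets the streak
    return "continue";
  }
  // Detach the unit so a late completion of the timed-out work cannot touch it.
  s.currentUnit = null;
  s.consecutiveFinalizeTimeouts += 1;
  return s.consecutiveFinalizeTimeouts >= MAX_FINALIZE_TIMEOUTS ? "stop" : "continue";
}
```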
Jeremy McSpadden
b628bfda4f Merge pull request #3754 from jeremymcs/fix/os-specific-keyboard-shortcuts
fix(gsd): OS-specific keyboard shortcut hints via formatShortcut helper
2026-04-07 21:08:26 -05:00
Jeremy
32a7af0513 fix(gsd): add timeout guard around postUnitPreVerification to prevent auto-loop hang
postUnitPostVerification already has a 60s timeout guard (#2344) but
postUnitPreVerification was called with bare await — if any async
operation inside it never resolves (browser teardown, worktree sync,
safety harness validation), the auto-loop freezes permanently with no
error, notification, or recovery.

Wrap postUnitPreVerification in the same withTimeout() pattern with a
dedicated FINALIZE_PRE_TIMEOUT_MS constant. On timeout, log a warning
and force-continue to the next iteration.

Closes #3757
2026-04-07 20:57:30 -05:00
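
A timeout guard in the spirit of the `withTimeout()` pattern named in 32a7af0513 might look like this. The signature and the `TimeoutResult` type are assumptions; the log does not show the repository's actual helper.

```typescript
// Assumed result shape so callers can tell a timeout from a real value.
type TimeoutResult<T> = { timedOut: false; value: T } | { timedOut: true };

function withTimeout<T>(work: Promise<T>, ms: number): Promise<TimeoutResult<T>> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => resolve({ timedOut: true }), ms);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve({ timedOut: false, value });
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}
```

Note that the guarded work is not cancelled; it keeps running in the background after the timeout fires, which is the situation the follow-up commit 0ca76f6813 addresses.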
Jeremy
e52be4fc09 fix(gsd): OS-specific keyboard shortcut hints via formatShortcut helper
Keyboard shortcut hints were hardcoded as Ctrl+Alt+X everywhere except
auto-dashboard.ts which had an inline platform check. On macOS these
should render as ⌃⌥X.

- Add formatShortcut() to files.ts — converts Ctrl/Alt/Shift/Cmd
  modifiers to macOS symbols (⌃/⌥/⇧/⌘) when process.platform is darwin
- Replace all inline platform checks and hardcoded hints with
  formatShortcut() calls
- Use template variables in system.md for shortcut hints
- Update comments in overlay files for consistency
- Add 7 tests covering all modifier conversions and passthrough

Closes #3753
2026-04-07 20:53:04 -05:00
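
A helper along the lines of the `formatShortcut()` described in e52be4fc09 could be sketched as below. The explicit `platform` parameter is an assumption made so the example is testable; per the commit, the real helper checks `process.platform` itself.

```typescript
// Modifier-to-symbol mapping as listed in the commit message.
const MAC_SYMBOLS: Record<string, string> = {
  Ctrl: "⌃",
  Alt: "⌥",
  Shift: "⇧",
  Cmd: "⌘",
};

// Assumed signature: callers would pass process.platform as the second argument.
function formatShortcut(combo: string, platform: string): string {
  if (platform !== "darwin") return combo; // passthrough on other platforms
  return combo
    .split("+")
    .map((part) => MAC_SYMBOLS[part] ?? part)
    .join("");
}
```

For example, `formatShortcut("Ctrl+Alt+X", "darwin")` gives `⌃⌥X`, while any other platform gets the original `Ctrl+Alt+X` back.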
Jeremy
dd08f501e5 fix(gsd): validate depth verification answer before unlocking write-gate
The tool_result handler called markDepthVerified() whenever
ask_user_questions returned any response with a depth_verification
question ID — without checking what the user actually selected.
Selecting "Not quite", "None of the above", or garbage input all
unlocked the gate.

- Extract isDepthConfirmationAnswer() into write-gate.ts with structural
  validation: cross-references selected answer against the question's
  defined options, only accepting an exact match of the first option
  (confirmation by convention). Rejects free-form "Other" text and
  decouples from any specific label substring.
- Harden block message with explicit anti-bypass language
- Add anti-bypass instructions to all three discuss prompts
- Add 8 new tests covering: structural validation, free-form bypass
  rejection, label-drift resilience, fallback behavior, edge cases

Closes #3749
2026-04-07 19:45:00 -05:00
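
The structural validation described in dd08f501e5 (accept only an exact match of the question's first option) can be sketched as follows. The `DepthQuestion` shape and the two-argument signature are assumptions; only the function name and the first-option convention come from the commit message.

```typescript
// Assumed minimal question shape for illustration.
interface DepthQuestion {
  id: string;
  options: string[];
}

function isDepthConfirmationAnswer(question: DepthQuestion, selected: string): boolean {
  const confirmation = question.options[0]; // first option is the confirmation by convention
  return confirmation !== undefined && selected === confirmation;
}
```

Comparing against the defined option instead of searching for a label substring means the check keeps working if the confirmation label is reworded, and free-form "Other" text can never match.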
mastertyko
fc38dd6f91 fix(subagent): support list-style tools frontmatter 2026-04-08 01:56:38 +02:00
Jeremy McSpadden
a3ea23bda1 Merge pull request #3727 from jeremymcs/fix/state-machine-wave5-consistency
fix(gsd): consistency and cleanup (wave 5/5)
2026-04-07 18:27:42 -05:00
mastertyko
9f4666be61 fix: clear autocomplete rows from content bottom
Use the rendered content row as the shrink diff baseline instead of
reusing the IME hardware cursor row. Add a focused TUI regression test
that reproduces the ghost-line cleanup path when autocomplete shrinks.

Closes #3721
2026-04-08 01:25:31 +02:00
mastertyko
12b6a01dae fix: parse annotated pre-exec file paths
Strip planner-style path annotations before pre-execution checks compare
inputs and expected outputs. This keeps existing files, prior outputs,
and ordering checks aligned even when task-plan entries include inline
descriptions.

Closes #3742
2026-04-08 01:25:24 +02:00
Jeremy
c038f68898 Merge upstream/main into fix/state-machine-wave5-consistency
Resolve conflict in workflow-events.ts: keep HEAD's v? schema version
field addition alongside the cmd field.
2026-04-07 18:06:40 -05:00
Jeremy McSpadden
2ff938596b Merge pull request #3726 from jeremymcs/fix/state-machine-wave4-writes
fix(gsd): write safety — atomic writes and randomized tmp (wave 4/5)
2026-04-07 18:04:43 -05:00
Jeremy McSpadden
00f401ab98 Merge pull request #3725 from jeremymcs/fix/state-machine-wave3-session
fix(gsd): session and recovery robustness (wave 3/5)
2026-04-07 18:04:27 -05:00
Jeremy
ae8d3b6983 Merge upstream/main into fix/state-machine-wave2-events
Resolve conflict in workflow-reconcile.ts: keep upstream's
complete_milestone invariant check (getMilestoneSlices guard) and
HEAD's plan_milestone/plan_slice/plan_task replay handlers.
2026-04-07 17:48:24 -05:00
Jeremy
4fdce5d179 test(gsd): align hasImplementationArtifacts tests with string return type
The function signature changed from boolean to "present" | "absent" |
"unknown" but three test assertions still compared against true/false.

Update assertions to match the new return type.
2026-04-07 17:20:37 -05:00
Jeremy
fe8d67579e fix(gsd): add orphaned milestone branch audit at auto-mode bootstrap
When a milestone completes but the session ends before teardown runs,
the milestone branch and worktree directory are orphaned — the DB says
complete so auto-mode won't re-enter, and the teardown is never retried.

Adds auditOrphanedMilestoneBranches() that runs after DB open during
bootstrap. For each milestone/* branch where the DB status is complete:
- If already merged into main → deletes the branch + cleans worktree dir
- If NOT merged → preserves the branch and warns the user

Includes 9 regression tests covering merged/unmerged/active/none-mode
scenarios.
2026-04-07 16:33:03 -05:00
Jeremy
5cfc865040 fix(gsd): revert unknown artifact check to warn-and-proceed
Blocking on "unknown" from hasImplementationArtifacts broke real-world
auto-mode in projects without clean git merge-bases (single-branch,
fresh repos, detached HEAD). The auto-loop silently stopped at
completing-milestone with no visible error.

Reverted to warn-and-proceed for "unknown" — only "absent" (confirmed
no implementation files) blocks completion. This matches the original
fail-open behavior for inconclusive git checks.
2026-04-07 15:34:46 -05:00
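
The fail-open policy that 5cfc865040 restores maps the three return values of `hasImplementationArtifacts` onto a completion decision. The `completionDecision` helper and its result shape are hypothetical; the three status strings come from 4fdce5d179.

```typescript
type ArtifactStatus = "present" | "absent" | "unknown";

// Hypothetical helper: only a confirmed "absent" blocks milestone completion.
function completionDecision(status: ArtifactStatus): { proceed: boolean; warn: boolean } {
  switch (status) {
    case "present":
      return { proceed: true, warn: false };
    case "unknown":
      return { proceed: true, warn: true }; // inconclusive git check: warn and proceed
    case "absent":
      return { proceed: false, warn: false };
  }
}
```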
Jeremy
01f5557520 test(gsd): update audit tests for expanded SAFE_KEYS allowlist 2026-04-07 14:04:14 -05:00
Jeremy
9b998def03 fix(gsd): add missing cmd field to test base WorkflowEvent 2026-04-07 13:53:42 -05:00
Jeremy
a9c62adf22 fix(gsd): address remaining adversarial review findings for wave 3
1. hasImplementationArtifacts "unknown" now blocks completion instead of
   warn-and-proceed. Both auto-dispatch.ts and auto-recovery.ts updated
   to treat "unknown" as a stop condition, preventing milestone completion
   when git status cannot be verified.

2. Audit log SAFE_KEYS allowlist expanded to include "id", "error", and
   "count" fields. SPLIT BRAIN logError entries now persist the entity ID
   and rollback error details to audit-log.jsonl for triage/repair.
2026-04-07 13:50:49 -05:00
Jeremy
6d9b7f91b2 fix(gsd): detect concurrent event log growth during reconcile
Adds a pre-write guard in reconcileWorktreeLogs: re-reads the event
log before overwriting and retries if it grew since the initial read.
Prevents appendEvent calls between read and rewrite from being silently
dropped by the atomic overwrite.
2026-04-07 13:46:37 -05:00
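
The pre-write guard from 6d9b7f91b2 can be illustrated with an in-memory log. Everything here is an assumption made for the sketch: the `LogStore` interface, the `rewriteWithGrowthGuard` name, and the retry limit; the real `reconcileWorktreeLogs` works on files.

```typescript
// Hypothetical storage abstraction standing in for the event log file.
interface LogStore {
  read(): string[];
  write(lines: string[]): void;
}

// Re-read before overwriting; if the log grew since the snapshot, start over
// so the newly appended events are included in the rewrite.
function rewriteWithGrowthGuard(
  store: LogStore,
  transform: (lines: string[]) => string[],
  maxRetries = 3,
): boolean {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const snapshot = store.read();
    const next = transform(snapshot);
    if (store.read().length === snapshot.length) {
      store.write(next);
      return true;
    }
  }
  return false; // the log kept growing; the caller decides what to do
}
```

A small window remains between the final re-read and the write, so a guard like this narrows the race rather than eliminating it.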