Uses the existing SessionManager.continueRecent() from the Pi SDK
to load the most recent session for the current working directory.
Mirrors the --continue flag already available in the base Pi CLI.
mergeSliceToMain now runs git reset --hard if git merge --squash fails,
restoring a clean working tree instead of leaving conflict markers.
The merge guard catch block in auto.ts now:
1. Detects leftover conflicted state (UU/AA/UD in porcelain status)
2. Resets the working tree if conflicts remain
3. Stops auto-mode with a clear error instead of continuing with
   corrupted .gsd/ state files that cause an infinite dispatch loop
Also fixes conflict markers in loader.ts, logo.ts, and postinstall.js
that were baked into main from a prior bad merge resolution.
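A minimal sketch of the conflict detection in step 1, assuming the guard inspects the text output of `git status --porcelain`. The function name `findConflictedPaths` and the sample paths are hypothetical; only the UU/AA/UD codes come from this change.

```typescript
// XY status codes named in this change. Git defines further unmerged codes
// (DD, AU, UA, DU) that a fuller check might also include.
const CONFLICT_CODES = ["UU", "AA", "UD"];

// Hypothetical helper: returns the paths still marked as conflicted.
// Porcelain v1 lines have the form "XY <path>": two status characters,
// one space, then the path.
function findConflictedPaths(porcelain: string): string[] {
  return porcelain
    .split("\n")
    .filter((line) => line.length > 3 && CONFLICT_CODES.indexOf(line.slice(0, 2)) !== -1)
    .map((line) => line.slice(3));
}

// Example input with two conflicted files, one modified file, one untracked file.
const sampleStatus = ["UU src/loader.ts", " M README.md", "AA src/logo.ts", "?? notes.txt"].join("\n");
const conflicted = findConflictedPaths(sampleStatus);
console.log(conflicted.length > 0 ? "conflicts remain: " + conflicted.join(", ") : "clean");
```

A non-empty result is the signal to reset the working tree and stop auto-mode rather than dispatch the next unit.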
Remove task/project/product classification taxonomy from discuss prompt.
The LLM now sizes work based on judgment, not labels.
Key changes:
- discuss.md: Replace 3-tier classification with judgment-based sizing.
  Remove hard minimum question rounds (2 for task, 4 for project).
  Questioning depth now matches actual scope.
- plan-milestone.md: Add right-sizing doctrine. Single-slice milestones
  now write the S01 plan + task plans inline, eliminating separate
  research-slice and plan-slice sessions.
- plan-slice.md: Add right-sizing guidance. Make Proof Level,
  Integration Closure, and Observability sections conditional:
  omit entirely for simple slices instead of filling with 'none'.
  Consolidate self-audit from 10 items to 7 (remove duplicates).
- auto.ts: Skip research-slice for S01 when milestone research exists.
  Update peekNext label for plan-milestone.
- complete-slice.md: Add effort-matching guidance. Lighten observability
  verification for simple slices.
- execute-task.md: Make observability steps conditional on task plan
  content rather than always required.
- templates (plan.md, task-plan.md): Add comments making heavyweight
  sections explicitly optional for simple work.
Pipeline reduction for simple 1-slice milestone:
  Before: 9-10 sessions (research-M, plan-M, research-S, plan-S,
          execute×N, complete-S, reassess, complete-M)
  After:  5-6 sessions (research-M, plan-M [+S01 inline],
          execute×N, complete-S, complete-M)
Tests cover VALID_BRANCH_NAME regex validation, correct return of a
configured preference, fallback to auto-detection, and rejection of
injection attempts.
Closes #108
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
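A rough sketch of the behavior those tests describe. The pattern below is a hypothetical stand-in for VALID_BRANCH_NAME (the real regex is not shown here), `resolveMainBranch` is an illustrative name, and falling back to auto-detection on an invalid value is an assumption; the real code may reject it differently.

```typescript
// Hypothetical pattern: must start with a letter or digit, then letters,
// digits, dot, underscore, slash, or hyphen. A leading "-" is excluded so
// the value cannot be read as a command-line option.
const VALID_BRANCH_NAME = /^[A-Za-z0-9][A-Za-z0-9._\/-]*$/;

// Illustrative resolver: prefer a valid configured branch, otherwise
// auto-detect. `detect` stands in for whatever auto-detection the project uses.
function resolveMainBranch(configured: string | undefined, detect: () => string): string {
  if (configured !== undefined && VALID_BRANCH_NAME.test(configured)) {
    return configured;
  }
  // Assumption: unset or invalid preferences fall back to auto-detection.
  return detect();
}

console.log(resolveMainBranch("develop", () => "main"));
console.log(resolveMainBranch("main; rm -rf /", () => "main"));
```

Validating before the name reaches a shell command is what makes the injection case safe: a value containing `;`, spaces, or a leading `-` never leaves the resolver.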
Three fixes to the dispatch loop:
1. Don't mark a unit complete when the next dispatch is the same unit
   (retry scenario); let the retry mechanism handle it instead of
   persisting a false completion.
2. Verify expected artifact exists on disk before marking a unit
   complete. Uses resolveExpectedArtifactPath + existsSync to gate
   persistCompletedKey calls.
3. Cross-validate idempotency: when skipping a "completed" unit, verify
   the artifact actually exists. If missing, remove the stale record
   from completed-units.json and re-run the unit.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
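The three rules can be sketched as two pure functions. This is an illustration under stated assumptions, not the code in auto.ts: `CompletionDeps`, `shouldPersistCompletion`, and `pruneStaleCompletions` are hypothetical names, and the two injected callbacks stand in for resolveExpectedArtifactPath and existsSync, whose real signatures are not shown here.

```typescript
// Injected stand-ins so the logic can be exercised without touching disk.
interface CompletionDeps {
  resolveArtifact: (unitId: string) => string; // stand-in for resolveExpectedArtifactPath
  exists: (path: string) => boolean;           // stand-in for existsSync
}

// Rules 1 and 2: only persist a completion when the next dispatch is a
// different unit AND the expected artifact is present.
function shouldPersistCompletion(unitId: string, nextUnitId: string | null, deps: CompletionDeps): boolean {
  if (nextUnitId === unitId) {
    return false; // retry scenario: leave it to the retry mechanism
  }
  return deps.exists(deps.resolveArtifact(unitId));
}

// Rule 3: drop "completed" records whose artifact is missing, so those
// units are dispatched again instead of being skipped.
function pruneStaleCompletions(completed: string[], deps: CompletionDeps): string[] {
  return completed.filter((id) => deps.exists(deps.resolveArtifact(id)));
}
```

In a Node environment the `exists` callback would typically be `existsSync` from `node:fs`; it is injected here only to keep the sketch testable.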
Cut 48% of system prompt token cost while preserving all load-bearing content:
- Remove Activity Logs section (agent never interacts with these)
- Remove Investigation escalation ladder (redundant with tool-routing)
- Remove Context economy section (obvious/redundant)
- Remove Web research vs browser execution (compressed into playbooks)
- Compress tool-routing to non-obvious entries only (scout, bg_shell, Context7)
- Compress Ask vs infer to core rule
- Compress Code structure to 5 key principles
- Compress Verification to inline task-type table
- Compress Agent-First Observability (character block carries the why)
- Compress Background processes playbook from 30 to 5 lines
- Compress Web behavior playbook from 25 to 6 lines
- Compress Libraries and Current facts into single section
- Remove BRAVE_API_KEY config (user-facing, not agent-facing)
Three additions to the GSD character block:
- Security/performance/elegance as craft instinct, not checkbox compliance
- Anti-laziness: finish what you start, no stubs, no 80% features, no skipped error handling
- Self-debugging awareness: you write code you will debug later with no memory of writing it
Replace the generic agent intro with a craftsman-engineer character
definition: curious about problems, warm but terse, co-owner during
planning, committed executor during auto-mode. Consolidate the
scattered Communication and Writing Style + Work Narration sections
into a single focused Communication section that preserves all
calibration signals (pushback triggers, narration examples, uncertainty
handling).
- Prevent duplicate Brave tool entries when toggling providers repeatedly
  by filtering already-active tools before re-adding (BUG-1)
- Remove single quotes from test glob patterns in package.json so Windows
  shell expands them correctly (BUG-2)
- Fix test mock fire() to call all handlers instead of short-circuiting
  on first match, matching real framework behavior (BUG-3)
- Suppress "Native Anthropic web search active" notification on session
  restore (source: "restore") to reduce UX noise (BUG-4)
- Add regression tests for all 4 bugs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
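BUG-1 and BUG-3 can be illustrated with a short sketch. All names here (`activateTools`, `createMockBus`) are hypothetical and the event bus is a simplified stand-in for the real test mock.

```typescript
// BUG-1: re-adding tools must not create duplicates. Filter out names that
// are already active (or repeated in the request) before appending.
function activateTools(active: string[], toAdd: string[]): string[] {
  const missing = toAdd.filter(
    (name, i) => active.indexOf(name) === -1 && toAdd.indexOf(name) === i
  );
  return active.concat(missing);
}

type Handler = (event: unknown) => void;

// BUG-3: a mock fire() should invoke every registered handler for the event,
// not stop after the first one.
function createMockBus() {
  const handlers: { [name: string]: Handler[] } = {};
  return {
    on(name: string, handler: Handler): void {
      (handlers[name] = handlers[name] || []).push(handler);
    },
    fire(name: string, event: unknown): void {
      (handlers[name] || []).forEach((handler) => handler(event));
    },
  };
}
```

Toggling providers any number of times then leaves the active list unchanged after the first activation.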
The model_select event doesn't reliably fire on startup, so Brave tools
remained visible to Claude even without a key. Now before_provider_request
filters search-the-web and search_and_read from the payload directly,
ensuring Claude only sees the native web_search tool.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
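A sketch of the payload filter, assuming the request payload carries a `tools` array of objects with a `name` field. The `ProviderPayload` shape and the function name `stripBraveTools` are assumptions; the two tool names are the ones listed above.

```typescript
// Assumed, simplified payload shape; the real request has more fields.
interface ToolDef { name: string; [key: string]: unknown }
interface ProviderPayload { tools?: ToolDef[]; [key: string]: unknown }

// Tool names taken from this change.
const BRAVE_TOOL_NAMES = ["search-the-web", "search_and_read"];

// Returns a copy of the payload without the Brave-backed tools.
// The input object is not mutated.
function stripBraveTools(payload: ProviderPayload): ProviderPayload {
  if (!payload.tools) {
    return payload;
  }
  const tools = payload.tools.filter((tool) => BRAVE_TOOL_NAMES.indexOf(tool.name) === -1);
  return { ...payload, tools };
}
```

Filtering at request time does not depend on any startup event having fired, which is the point of moving the check here.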
The Pi SDK's streaming parser drops server_tool_use and
web_search_tool_result content blocks. When the conversation is replayed,
assistant messages are incomplete, causing the Anthropic API to reject
requests with "thinking blocks cannot be modified."
Fix: stripThinkingFromHistory() removes thinking/redacted_thinking blocks
from all assistant messages before sending, since they're all from stored
history. The model generates fresh thinking for each new turn.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
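Roughly, the stripping step looks like the sketch below. The `Message` and `ContentBlock` types are simplified assumptions about the stored history format, not the SDK's actual types; only the function name and the two block types come from this change.

```typescript
// Simplified, assumed message shape.
interface ContentBlock { type: string; [key: string]: unknown }
interface Message { role: "user" | "assistant"; content: string | ContentBlock[] }

// Removes thinking and redacted_thinking blocks from assistant messages.
// User messages and plain-string content pass through unchanged.
function stripThinkingFromHistory(messages: Message[]): Message[] {
  return messages.map((msg) => {
    if (msg.role !== "assistant" || typeof msg.content === "string") {
      return msg;
    }
    const kept = msg.content.filter(
      (block) => block.type !== "thinking" && block.type !== "redacted_thinking"
    );
    return { ...msg, content: kept };
  });
}
```

This sketch does not handle an assistant message that consisted only of thinking blocks, which would end up with empty content; a real implementation may need to account for that case.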
12 tests covering: tool injection for claude models, non-claude passthrough,
double-injection prevention, tool deactivation/reactivation on model switch,
and session_start diagnostics.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Inject the web_search_20250305 server-side tool into Anthropic API
requests, eliminating the BRAVE_API_KEY requirement for Anthropic models.
When the provider is Anthropic and no Brave key is set, the custom search
tools are disabled so the LLM is not offered tools that cannot work.
fetch_page (Jina) is unaffected.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
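A minimal sketch of the injection, with a guard against adding the tool twice. The payload shape, the function name `injectNativeWebSearch`, and the model-name check are assumptions; the tool entry uses the `web_search_20250305` type named above.

```typescript
// Assumed, simplified request shape.
interface ToolEntry { name: string; type?: string; [key: string]: unknown }
interface RequestPayload { model: string; tools?: ToolEntry[]; [key: string]: unknown }

// Server-side web search tool definition.
const WEB_SEARCH_TOOL: ToolEntry = { type: "web_search_20250305", name: "web_search" };

function injectNativeWebSearch(payload: RequestPayload): RequestPayload {
  // Assumption: Anthropic models are recognized by "claude" in the model id.
  if (payload.model.indexOf("claude") === -1) {
    return payload; // non-Claude requests pass through untouched
  }
  const tools = payload.tools || [];
  const alreadyInjected = tools.some((tool) => tool.type === WEB_SEARCH_TOOL.type);
  if (alreadyInjected) {
    return payload; // avoid double injection
  }
  return { ...payload, tools: tools.concat([WEB_SEARCH_TOOL]) };
}
```

Because the search runs server-side, no client-side API key is involved in this path.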
The theme proxy throws when accessed in RPC mode because initTheme() is
only called when a TUI is present. Wrap header rendering in try/catch so
the GSD extension loads cleanly in both TUI and RPC modes.
Closes #121
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
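The guard amounts to something like the sketch below. `renderHeaderSafely` and its `fallback` parameter are hypothetical; the actual header renderer and theme proxy are not shown here.

```typescript
// Hypothetical wrapper: run the header renderer, and if it throws (for
// example because the theme was never initialized in RPC mode), return a
// fallback instead of letting the extension fail to load.
function renderHeaderSafely(render: () => string, fallback: string = ""): string {
  try {
    return render();
  } catch (err) {
    return fallback;
  }
}

// In TUI mode the renderer succeeds; in RPC mode it may throw.
console.log(renderHeaderSafely(() => "GSD"));
console.log(renderHeaderSafely(() => { throw new Error("theme not initialized"); }, "(no header)"));
```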
Adds a Work Narration section to system.md and per-phase hints to
research, plan, and execute prompts. Instructs the LLM to emit brief
status messages between tool calls covering decisions, discoveries,
phase transitions, and verification results — without narrating
routine reads or trivial commands.