Initial commit

Lex Christopherson 2026-03-10 22:28:37 -06:00
commit 3bd2f8cb63
137 changed files with 41602 additions and 0 deletions

1
.bg-shell/manifest.json Normal file

@@ -0,0 +1 @@
[]

40
.gitignore vendored Normal file

@@ -0,0 +1,40 @@
# ── GSD baseline (auto-generated) ──
.gsd/activity/
.gsd/runtime/
.gsd/auto.lock
.gsd/metrics.json
.gsd/STATE.md
.DS_Store
Thumbs.db
*.swp
*.swo
*~
.idea/
.vscode/
*.code-workspace
.env
.env.*
!.env.example
node_modules/
.next/
/dist/
!/pkg/dist/
build/
__pycache__/
*.pyc
.venv/
venv/
target/
vendor/
*.log
coverage/
.cache/
tmp/
# ── GSD baseline (auto-generated) ──
dist/
.bg_shell
.gsd*.tgz
.gsd
.artifacts/

39
LICENSE Normal file

@@ -0,0 +1,39 @@
Business Source License 1.1
Licensor: Lex Christopherson
Licensed Work: GSD (gsd-pi)
Additional Use Grant: None
Change Date: 2029-03-10
Change License: MIT
Parameters
Licensor: Lex Christopherson
Licensed Work: The Licensed Work is (c) 2026 Lex Christopherson
Use Limitation: You may not use the Licensed Work for a Commercial Purpose.
A "Commercial Purpose" means use in a commercial product or service, or use
on behalf of a for-profit entity, unless you have received a separate
commercial license from the Licensor.
On the Change Date, or the fourth anniversary of the first publicly available
distribution of a specific version of the Licensed Work under this License,
whichever comes first, the Licensor grants you rights under the terms of the
Change License, and the rights granted in the paragraph above terminate.
For purposes of this License:
"Commercial Purpose" means use intended for or directed toward commercial
advantage or monetary compensation.
The Licensor may make additional grants beyond those described above. Any
additional grants will be described in the Additional Use Grant above.
Notice
The Business Source License (this document, or the "License") is not an
Open Source license. However, the Licensed Work will eventually be made
available under an Open Source License, as stated in this License.
License text copyright (c) 2017 MariaDB Corporation Ab, All Rights Reserved.
"Business Source License" is a trademark of MariaDB Corporation Ab.

346
README.md Normal file

@@ -0,0 +1,346 @@
<div align="center">
# GSD
**The evolution of [Get Shit Done](https://github.com/glittercowboy/get-shit-done) — now a real coding agent.**
[![npm version](https://img.shields.io/npm/v/gsd-pi?style=for-the-badge&logo=npm&logoColor=white&color=CB3837)](https://www.npmjs.com/package/gsd-pi)
[![npm downloads](https://img.shields.io/npm/dm/gsd-pi?style=for-the-badge&logo=npm&logoColor=white&color=CB3837)](https://www.npmjs.com/package/gsd-pi)
[![GitHub stars](https://img.shields.io/github/stars/glittercowboy/gsd-2?style=for-the-badge&logo=github&color=181717)](https://github.com/glittercowboy/gsd-2)
[![License](https://img.shields.io/badge/license-BSL%201.1-blue?style=for-the-badge)](LICENSE)
The original GSD went viral as a prompt framework for Claude Code. It worked, but it was fighting the tool — injecting prompts through slash commands, hoping the LLM would follow instructions, with no actual control over context windows, sessions, or execution.
This version is different. GSD is now a standalone CLI built on the [Pi SDK](https://github.com/nicholasgasior/pi-coding-agent), which gives it direct TypeScript access to the agent harness itself. That means GSD can actually *do* what v1 could only *ask* the LLM to do: clear context between tasks, inject exactly the right files at dispatch time, manage git branches, track cost and tokens, detect stuck loops, recover from crashes, and auto-advance through an entire milestone without human intervention.
One command. Walk away. Come back to a built project with clean git history.
```bash
npm install -g gsd-pi
gsd
```
</div>
---
## What Changed From v1
The original GSD was a collection of markdown prompts installed into `~/.claude/commands/`. It relied entirely on the LLM reading those prompts and doing the right thing. That worked surprisingly well — but it had hard limits:
- **No context control.** The LLM accumulated garbage over a long session. Quality degraded.
- **No real automation.** "Auto mode" was the LLM calling itself in a loop, burning context on orchestration overhead.
- **No crash recovery.** If the session died mid-task, you started over.
- **No observability.** No cost tracking, no progress dashboard, no stuck detection.
GSD v2 solves all of these because it's not a prompt framework anymore — it's a TypeScript application that *controls* the agent session.
| | v1 (Prompt Framework) | v2 (Agent Application) |
|---|---|---|
| Runtime | Claude Code slash commands | Standalone CLI via Pi SDK |
| Context management | Hope the LLM doesn't fill up | Fresh session per task, programmatic |
| Auto mode | LLM self-loop | State machine reading `.gsd/` files |
| Crash recovery | None | Lock files + session forensics |
| Git strategy | LLM writes git commands | Programmatic branch-per-slice, squash merge |
| Cost tracking | None | Per-unit token/cost ledger with dashboard |
| Stuck detection | None | Retry once, then stop with diagnostics |
| Timeout supervision | None | Soft/idle/hard timeouts with recovery steering |
| Context injection | "Read this file" | Pre-inlined into dispatch prompt |
| Roadmap reassessment | Manual | Automatic after each slice completes |
| Skill discovery | None | Auto-detect and install relevant skills during research |
---
## How It Works
GSD structures work into a hierarchy:
```
Milestone → a shippable version (4-10 slices)
Slice → one demoable vertical capability (1-7 tasks)
Task → one context-window-sized unit of work
```
The iron rule: **a task must fit in one context window.** If it can't, it's two tasks.
### The Loop
Each slice flows through phases automatically:
```
Research → Plan → Execute (per task) → Complete → Reassess Roadmap → Next Slice
```
**Research** scouts the codebase and relevant docs. **Plan** decomposes the slice into tasks with must-haves (mechanically verifiable outcomes). **Execute** runs each task in a fresh context window with only the relevant files pre-loaded. **Complete** writes the summary and UAT script, checks the slice off the roadmap, and commits. **Reassess** checks whether the roadmap still makes sense given what was learned.
### `/gsd auto` — The Main Event
This is what makes GSD different. Run it, walk away, come back to built software.
```
/gsd auto
```
Auto mode is a state machine driven by files on disk. It reads `.gsd/STATE.md`, determines the next unit of work, creates a fresh agent session, injects a focused prompt with all relevant context pre-inlined, and lets the LLM execute. When the LLM finishes, auto mode reads disk state again and dispatches the next unit.
**What happens under the hood:**
1. **Fresh session per unit** — Every task, every research phase, every planning step gets a clean 200k-token context window. No accumulated garbage. No "I'll be more concise now."
2. **Context pre-loading** — The dispatch prompt includes inlined task plans, slice plans, prior task summaries, dependency summaries, roadmap excerpts, and decisions register. The LLM starts with everything it needs instead of spending tool calls reading files.
3. **Git branch-per-slice** — Each slice gets its own branch (`gsd/M001/S01`). Tasks commit atomically on the branch. When the slice completes, it's squash-merged to main as one clean commit.
4. **Crash recovery** — A lock file tracks the current unit. If the session dies, the next `/gsd auto` reads the surviving session file, synthesizes a recovery briefing from every tool call that made it to disk, and resumes with full context.
5. **Stuck detection** — If the same unit dispatches twice (the LLM didn't produce the expected artifact), it retries once with a deep diagnostic. If it fails again, auto mode stops with the exact file it expected.
6. **Timeout supervision** — Soft timeout warns the LLM to wrap up. Idle watchdog detects stalls. Hard timeout pauses auto mode. Recovery steering nudges the LLM to finish durable output before giving up.
7. **Cost tracking** — Every unit's token usage and cost is captured, broken down by phase, slice, and model. The dashboard shows running totals and projections. Budget ceilings can pause auto mode before overspending.
8. **Adaptive replanning** — After each slice completes, the roadmap is reassessed. If the work revealed new information that changes the plan, slices are reordered, added, or removed before continuing.
9. **Escape hatch** — Press Escape to pause. The conversation is preserved. Interact with the agent, inspect what happened, or just `/gsd auto` to resume from disk state.
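Under stated assumptions, the dispatch loop can be sketched in a few lines. `Disk`, `nextUnit`, and `dispatch` are illustrative stand-ins, not GSD's actual internals; the sketch shows only the retry-once stuck detection described above:

```typescript
// Hypothetical sketch of the auto-mode loop: read disk state, dispatch a
// fresh session per unit, retry a stuck unit once with a deep diagnostic,
// then stop with the expected artifact named.
type Unit = { id: string; expectedArtifact: string };

interface Disk {
  // Derived from .gsd/STATE.md; keeps returning the same unit until the
  // unit's expected artifact appears on disk.
  nextUnit(): Unit | null;
}

export async function autoLoop(
  disk: Disk,
  dispatch: (unit: Unit, deepDiagnostic: boolean) => Promise<void>,
): Promise<string> {
  const attempts = new Map<string, number>();
  for (let unit = disk.nextUnit(); unit !== null; unit = disk.nextUnit()) {
    const n = (attempts.get(unit.id) ?? 0) + 1;
    attempts.set(unit.id, n);
    if (n > 2) {
      // Same unit dispatched twice without its artifact: stop with diagnostics.
      return `stuck: expected ${unit.expectedArtifact}`;
    }
    await dispatch(unit, n === 2); // second attempt carries a deep diagnostic
  }
  return 'milestone complete';
}
```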
### The `/gsd` Wizard
When you're not in auto mode, `/gsd` reads disk state and shows contextual options:
- **No `.gsd/` directory** → Start a new project. Discussion flow captures your vision, constraints, and preferences.
- **Milestone exists, no roadmap** → Discuss or research the milestone.
- **Roadmap exists, slices pending** → Plan the next slice, or jump straight to auto.
- **Mid-task** → Resume from where you left off.
The wizard is the on-ramp. Auto mode is the highway.
---
## Getting Started
### Install
```bash
npm install -g gsd-pi
```
Requires Node.js ≥ 20.6.0. Installs Chromium via Playwright for browser-based verification (non-fatal if it fails).
### First Run
```bash
cd your-project
gsd
```
On first launch, GSD prompts for optional API keys:
- **Brave Search** — for web research during planning
- **Context7** — for up-to-date library documentation
- **Jina** — for web page content extraction
All optional. Press Enter to skip any. Keys are stored in `~/.gsd/agent/auth.json` and loaded automatically on subsequent launches.
### Start Building
The wizard walks you through describing what you want to build. Once you approve the roadmap:
```
/gsd auto
```
Walk away. GSD will research, plan, execute, verify, commit, and advance through every slice until the milestone is complete.
### Commands
| Command | What it does |
|---------|-------------|
| `/gsd` | Contextual wizard — reads state, shows what's next |
| `/gsd auto` | Start auto mode (fresh session per unit, loops until done) |
| `/gsd stop` | Stop auto mode gracefully |
| `/gsd status` | Progress dashboard overlay |
| `/gsd queue` | Queue future milestones (safe during auto mode) |
| `/gsd discuss` | Discuss implementation decisions before planning |
| `/gsd prefs` | Manage skill preferences (global/project) |
| `/gsd doctor` | Validate `.gsd/` integrity, find and fix issues |
| `Ctrl+Alt+G` | Toggle dashboard overlay |
---
## What GSD Manages For You
### Context Engineering
Every dispatch is carefully constructed. The LLM never wastes tool calls on orientation.
| Artifact | Purpose |
|----------|---------|
| `PROJECT.md` | Living doc — what the project is right now |
| `DECISIONS.md` | Append-only register of architectural decisions |
| `STATE.md` | Quick-glance dashboard — always read first |
| `M001-ROADMAP.md` | Milestone plan with slice checkboxes, risk levels, dependencies |
| `M001-CONTEXT.md` | User decisions from the discuss phase |
| `M001-RESEARCH.md` | Codebase and ecosystem research |
| `S01-PLAN.md` | Slice task decomposition with must-haves |
| `T01-PLAN.md` | Individual task plan with verification criteria |
| `T01-SUMMARY.md` | What happened — YAML frontmatter + narrative |
| `S01-UAT.md` | Human test script derived from slice outcomes |
### Git Strategy
Branch-per-slice with squash merge. Fully automated.
```
main:
feat(M001/S03): auth and session management
feat(M001/S02): API endpoints and middleware
feat(M001/S01): data model and type system
gsd/M001/S01 (preserved):
feat(S01/T03): file writer with round-trip fidelity
feat(S01/T02): markdown parser for plan files
feat(S01/T01): core types and interfaces
```
One commit per slice on main. Per-task history preserved on branches. Git bisect works. Individual slices are revertable.
### Verification
Every task has must-haves — mechanically checkable outcomes:
- **Truths** — Observable behaviors ("User can sign up with email")
- **Artifacts** — Files that must exist with real implementation, not stubs
- **Key Links** — Imports and wiring between artifacts
The verification ladder: static checks → command execution → behavioral testing → human review (only when the agent genuinely can't verify itself).
### Dashboard
`Ctrl+Alt+G` or `/gsd status` opens a real-time overlay showing:
- Current milestone, slice, and task progress
- Auto mode elapsed time and phase
- Per-unit cost and token breakdown by phase, slice, and model
- Cost projections based on completed work
- Completed and in-progress units
---
## Configuration
### Preferences
GSD preferences live in `~/.gsd/preferences.md` (global) or `.gsd/preferences.md` (project). Manage with `/gsd prefs`.
```yaml
---
version: 1
models:
research: claude-sonnet-4-6
planning: claude-opus-4-6
execution: claude-sonnet-4-6
completion: claude-sonnet-4-6
skill_discovery: suggest
auto_supervisor:
soft_timeout_minutes: 20
idle_timeout_minutes: 10
hard_timeout_minutes: 30
budget_ceiling: 50.00
---
```
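One plausible rule for combining the two preference files is project-over-global, with nested sections like `models` and `auto_supervisor` merged key-by-key. That precedence is an assumption for illustration, not documented behavior:

```typescript
// Hypothetical merge of parsed preference frontmatter (project wins).
type Prefs = Record<string, unknown>;

export function mergePrefs(global: Prefs, project: Prefs): Prefs {
  const out: Prefs = { ...global };
  for (const [key, value] of Object.entries(project)) {
    const base = out[key];
    if (
      value && typeof value === 'object' && !Array.isArray(value) &&
      base && typeof base === 'object' && !Array.isArray(base)
    ) {
      // Nested section: keep global keys the project doesn't override.
      out[key] = { ...(base as Prefs), ...(value as Prefs) };
    } else if (value !== undefined) {
      out[key] = value;
    }
  }
  return out;
}
```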
**Key settings:**
| Setting | What it controls |
|---------|-----------------|
| `models.*` | Per-phase model selection (Opus for planning, Sonnet for execution, etc.) |
| `skill_discovery` | `auto` / `suggest` / `off` — how GSD finds and applies skills |
| `auto_supervisor.*` | Timeout thresholds for auto mode supervision |
| `budget_ceiling` | USD ceiling — auto mode pauses when reached |
| `uat_dispatch` | Enable automatic UAT runs after slice completion |
| `always_use_skills` | Skills to always load when relevant |
| `skill_rules` | Situational rules for skill routing |
### Bundled Tools
GSD ships with 11 extensions, all loaded automatically:
| Extension | What it provides |
|-----------|-----------------|
| **GSD** | Core workflow engine, auto mode, commands, dashboard |
| **Browser Tools** | Playwright-based browser for UI verification |
| **Search the Web** | Brave Search + Jina page extraction |
| **Context7** | Up-to-date library/framework documentation |
| **Background Shell** | Long-running process management with readiness detection |
| **Subagent** | Delegated tasks with isolated context windows |
| **Plan Mode** | Structured planning before execution |
| **Slash Commands** | Custom command creation |
| **Worktree** | Git worktree management |
| **Ask User Questions** | Structured user input with single/multi-select |
| **Secure Env Collect** | Masked secret collection without manual .env editing |
### Bundled Agents
Three specialized subagents for delegated work:
| Agent | Role |
|-------|------|
| **Scout** | Fast codebase recon — returns compressed context for handoff |
| **Researcher** | Web research — finds and synthesizes current information |
| **Worker** | General-purpose execution in an isolated context window |
---
## Architecture
GSD is a TypeScript application that embeds the Pi coding agent SDK.
```
gsd (CLI binary)
└─ loader.ts Sets PI_PACKAGE_DIR, GSD env vars, dynamic-imports cli.ts
└─ cli.ts Wires SDK managers, loads extensions, starts InteractiveMode
├─ wizard.ts First-run API key collection (Brave/Context7/Jina)
├─ app-paths.ts ~/.gsd/agent/, ~/.gsd/sessions/, auth.json
├─ resource-loader.ts Syncs bundled extensions + agents to ~/.gsd/agent/
└─ src/resources/
├─ extensions/gsd/ Core GSD extension (auto, state, commands, ...)
├─ extensions/... 10 supporting extensions
├─ agents/ scout, researcher, worker
├─ AGENTS.md Agent routing instructions
└─ GSD-WORKFLOW.md Manual bootstrap protocol
```
**Key design decisions:**
- **`pkg/` shim directory** — `PI_PACKAGE_DIR` points here (not project root) to avoid Pi's theme resolution collision with our `src/` directory. Contains only `piConfig` and theme assets.
- **Two-file loader pattern** — `loader.ts` sets all env vars with zero SDK imports, then dynamic-imports `cli.ts`, which does static SDK imports. This ensures `PI_PACKAGE_DIR` is set before any SDK code evaluates.
- **Always-overwrite sync** — `npm update -g` takes effect immediately. Bundled extensions and agents are synced to `~/.gsd/agent/` on every launch, not just first run.
- **State lives on disk** — `.gsd/` is the source of truth. Auto mode reads it, writes it, and advances based on what it finds. No in-memory state survives across sessions.
---
## Requirements
- **Node.js** ≥ 20.6.0
- **Anthropic API key** — handled by Pi's built-in auth flow on first launch
- **Git** — initialized automatically if missing
Optional:
- Brave Search API key (web research)
- Context7 API key (library docs)
- Jina API key (page extraction)
---
## License
Business Source License 1.1. The Licensed Work converts to MIT on the Change Date (2029-03-10); see [LICENSE](LICENSE).
---
<div align="center">
**The original GSD showed what was possible. This version delivers it.**
**`npm install -g gsd-pi && gsd`**
</div>

3985
package-lock.json generated Normal file

File diff suppressed because it is too large

41
package.json Normal file

@@ -0,0 +1,41 @@
{
"name": "gsd-pi",
"version": "0.1.0",
"description": "GSD — Get Stuff Done coding agent",
"license": "BUSL-1.1",
"type": "module",
"bin": {
"gsd": "dist/loader.js"
},
"files": [
"dist",
"pkg",
"src/resources",
"scripts/postinstall.js",
"package.json",
"README.md"
],
"piConfig": {
"name": "gsd",
"configDir": ".gsd"
},
"engines": {
"node": ">=20.6.0"
},
"scripts": {
"build": "tsc && npm run copy-themes",
"copy-themes": "node -e \"const{mkdirSync,cpSync}=require('fs');const{resolve}=require('path');const src=resolve(__dirname,'node_modules/@mariozechner/pi-coding-agent/dist/modes/interactive/theme');mkdirSync('pkg/dist/modes/interactive/theme',{recursive:true});cpSync(src,'pkg/dist/modes/interactive/theme',{recursive:true})\"",
"test": "node --import ./src/resources/extensions/gsd/tests/resolve-ts.mjs --experimental-strip-types --test 'src/resources/extensions/gsd/tests/*.test.ts' 'src/resources/extensions/gsd/tests/*.test.mjs' 'src/tests/*.test.ts'",
"dev": "tsc --watch",
"postinstall": "node scripts/postinstall.js",
"prepublishOnly": "npm run build"
},
"dependencies": {
"@mariozechner/pi-coding-agent": "^0.57.1",
"playwright": "^1.58.2"
},
"devDependencies": {
"@types/node": "^22.0.0",
"typescript": "^5.4.0"
}
}

8
pkg/package.json Normal file

@@ -0,0 +1,8 @@
{
"name": "@glittercowboy/gsd",
"version": "0.1.0",
"piConfig": {
"name": "gsd",
"configDir": ".gsd"
}
}

10
scripts/postinstall.js Normal file

@@ -0,0 +1,10 @@
#!/usr/bin/env node
import { execSync } from 'child_process'
import os from 'os'
const args = os.platform() === 'linux' ? '--with-deps' : ''
try {
execSync(`npx playwright install chromium ${args}`, { stdio: 'inherit' })
} catch {
// Non-fatal — browser tools will show a clear error if playwright is missing
}

164
scripts/verify-s03.sh Executable file

@@ -0,0 +1,164 @@
#!/usr/bin/env bash
# S03 verification — first-run optional tool key wizard
FAIL=0
pass() { echo " PASS: $1"; }
fail() { echo " FAIL: $1"; FAIL=1; }
# Run node with a timeout using background kill (macOS has no GNU timeout)
run_bg() {
local secs="$1"; shift
local tmp; tmp=$(mktemp)
local exit_tmp; exit_tmp=$(mktemp)
echo "" > "$exit_tmp"
( "$@" > "$tmp" 2>&1; echo "$?" > "$exit_tmp" ) &
local pid=$!
sleep "$secs"
kill "$pid" 2>/dev/null || true
wait "$pid" 2>/dev/null || true
local code; code=$(cat "$exit_tmp")
cat "$tmp"
rm -f "$tmp" "$exit_tmp"
# Return the actual exit code if the process finished, else 0 (still running = ok)
[ -n "$code" ] && return "$code" || return 0
}
echo "=== S03 Verification ==="
echo ""
# ----------------------------------------------------------------
# Check 1 — Build: dist outputs exist
# ----------------------------------------------------------------
echo "--- Build ---"
if [ -f "dist/wizard.js" ] && [ -f "dist/cli.js" ] && [ -f "dist/loader.js" ]; then
pass "1 — dist/wizard.js, dist/cli.js, dist/loader.js exist"
else
echo " (building...)"
npm run build --silent 2>&1
if [ -f "dist/wizard.js" ] && [ -f "dist/cli.js" ] && [ -f "dist/loader.js" ]; then
pass "1 — build succeeded"
else
fail "1 — build failed or dist files missing"
fi
fi
echo ""
echo "--- Non-TTY optional-key warning path ---"
# ----------------------------------------------------------------
# Check 2 — Non-TTY with all optional keys unset → warning on stderr, no exit 1
# Uses a clean env with only ANTHROPIC_API_KEY set so the TUI can start,
# then kills after 3s. The warning is emitted before the TUI launches.
# ----------------------------------------------------------------
tmp2=$(mktemp)
(
env -i HOME="$HOME" PATH="$PATH" ANTHROPIC_API_KEY="${ANTHROPIC_API_KEY:-}" \
node dist/loader.js < /dev/null > "$tmp2" 2>&1
echo "$?" >> "$tmp2"
) &
pid2=$!
sleep 3
kill "$pid2" 2>/dev/null || true
wait "$pid2" 2>/dev/null || true
if grep -q "Warning.*optional" "$tmp2" 2>/dev/null; then
pass "2 — Non-TTY missing optional keys → stderr warning emitted"
else
fail "2 — Non-TTY missing optional keys → stderr warning emitted"
echo " Output: $(head -3 "$tmp2")"
fi
# Check it does NOT exit 1 for missing optional keys (last line if process exited)
last_line=$(tail -1 "$tmp2")
if [ "$last_line" = "1" ]; then
fail "3 — Non-TTY missing optional keys → does NOT exit 1 (got exit 1)"
else
pass "3 — Non-TTY missing optional keys → does NOT exit 1"
fi
rm -f "$tmp2"
echo ""
echo "--- Wizard skip when all keys present ---"
# ----------------------------------------------------------------
# Check 4 — All optional keys in env → wizard does not fire (no prompt text)
# ----------------------------------------------------------------
tmp4=$(mktemp)
(
env -i HOME="$HOME" PATH="$PATH" \
ANTHROPIC_API_KEY="${ANTHROPIC_API_KEY:-}" \
BRAVE_API_KEY="test-brave" \
CONTEXT7_API_KEY="test-ctx7" \
JINA_API_KEY="test-jina" \
node dist/loader.js < /dev/null > "$tmp4" 2>&1
) &
pid4=$!
sleep 3
kill "$pid4" 2>/dev/null || true
wait "$pid4" 2>/dev/null || true
if grep -qiE "optional tool|Some optional|Press Enter to skip" "$tmp4" 2>/dev/null; then
fail "4 — All optional keys in env → wizard does not fire"
echo " Output contained wizard text: $(grep -iE 'optional|Press Enter' "$tmp4" | head -2)"
else
pass "4 — All optional keys in env → wizard does not fire"
fi
rm -f "$tmp4"
echo ""
echo "--- loadStoredEnvKeys hydration ---"
# ----------------------------------------------------------------
# Check 5 — Structural: env var names compiled into dist/wizard.js
# ----------------------------------------------------------------
if grep -q "BRAVE_API_KEY" dist/wizard.js && grep -q "CONTEXT7_API_KEY" dist/wizard.js && grep -q "JINA_API_KEY" dist/wizard.js; then
pass "5 — dist/wizard.js contains all three optional key env var names"
else
fail "5 — dist/wizard.js missing one or more optional key env var names"
fi
# ----------------------------------------------------------------
# Check 6 — loadStoredEnvKeys: stored brave key is set into process.env
# Write a test auth.json with a brave key, run loader, confirm no crash
# ----------------------------------------------------------------
tmp_auth=$(mktemp)
cat > "$tmp_auth" <<'EOF'
{"brave":{"type":"api_key","key":"test-brave-stored"}}
EOF
tmp6=$(mktemp)
(
env -i HOME="$HOME" PATH="$PATH" \
ANTHROPIC_API_KEY="${ANTHROPIC_API_KEY:-}" \
GSD_TEST_AUTH_PATH="$tmp_auth" \
node -e "
import('./dist/app-paths.js').then(async (paths) => {
// Override authFilePath for test
const { AuthStorage } = await import('@mariozechner/pi-coding-agent');
const { loadStoredEnvKeys } = await import('./dist/wizard.js');
const auth = AuthStorage.create('$tmp_auth');
loadStoredEnvKeys(auth);
const val = process.env.BRAVE_API_KEY;
process.stdout.write('BRAVE_API_KEY=' + (val || '') + '\n');
process.exit(0);
});
" > "$tmp6" 2>&1
) || true
if grep -q "BRAVE_API_KEY=test-brave-stored" "$tmp6" 2>/dev/null; then
pass "6 — loadStoredEnvKeys hydrates BRAVE_API_KEY from auth.json"
else
fail "6 — loadStoredEnvKeys hydrates BRAVE_API_KEY from auth.json"
echo " Output: $(cat "$tmp6")"
fi
rm -f "$tmp_auth" "$tmp6"
echo ""
echo "=== Results ==="
if [ "$FAIL" -eq 0 ]; then
echo "All checks passed."
exit 0
else
echo "One or more checks FAILED."
exit 1
fi

247
scripts/verify-s04.sh Executable file

@@ -0,0 +1,247 @@
#!/usr/bin/env bash
# S04 verification — npm pack tarball install smoke test
# Checks: dist integrity, GSD_BUNDLED_EXTENSION_PATHS, prepublishOnly,
# npm pack dry-run, tarball install, binary exists, launch (no extension
# errors, "gsd" branding), ~/.pi/ untouched, non-TTY warning/no exit 1.
set -uo pipefail
FAIL=0
pass() { echo " PASS: $1"; }
fail() { echo " FAIL: $1"; FAIL=1; }
SMOKE_PREFIX=/tmp/gsd-smoke-prefix
TARBALL=""
# Capture ~/.pi/agent/sessions/ count before any smoke runs (for Check 9)
PI_SESSIONS_BEFORE=$(ls ~/.pi/agent/sessions/ 2>/dev/null | wc -l | tr -d ' ')
cleanup() {
rm -rf "$SMOKE_PREFIX"
if [ -n "$TARBALL" ] && [ -f "$TARBALL" ]; then
rm -f "$TARBALL"
fi
}
trap cleanup EXIT
echo "=== S04 Verification ==="
echo ""
# ----------------------------------------------------------------
# Check 1 — dist/loader.js exists and has NODE_PATH block
# ----------------------------------------------------------------
echo "--- Dist integrity ---"
if [ -f "dist/loader.js" ] && grep -q "NODE_PATH" dist/loader.js; then
pass "1 — dist/loader.js exists and contains NODE_PATH block"
else
fail "1 — dist/loader.js missing or NODE_PATH block absent"
fi
# ----------------------------------------------------------------
# Check 2 — GSD_BUNDLED_EXTENSION_PATHS does NOT reference src/resources
# ----------------------------------------------------------------
# The variable must be present and must use agentDir-based paths only.
paths_line=$(grep "GSD_BUNDLED_EXTENSION_PATHS" dist/loader.js | grep -v "src/resources" | head -1)
if [ -n "$paths_line" ]; then
# Double-check: none of the actual join() lines (not comments) reference src/resources.
# We look only at lines containing join( to avoid matching comment lines like "NOT src/resources".
if grep -A 15 "GSD_BUNDLED_EXTENSION_PATHS" dist/loader.js | grep "join(" | grep -q "src/resources"; then
fail "2 — GSD_BUNDLED_EXTENSION_PATHS still references src/resources path(s)"
else
pass "2 — GSD_BUNDLED_EXTENSION_PATHS uses agentDir-based paths (no src/resources)"
fi
else
fail "2 — GSD_BUNDLED_EXTENSION_PATHS line not found or still references src/resources"
fi
echo ""
echo "--- package.json hooks ---"
# ----------------------------------------------------------------
# Check 3 — prepublishOnly present in package.json
# ----------------------------------------------------------------
if node -e "const p=JSON.parse(require('fs').readFileSync('package.json','utf8')); process.exit(p.scripts?.prepublishOnly ? 0 : 1)" 2>/dev/null; then
pass "3 — prepublishOnly hook present in package.json"
else
fail "3 — prepublishOnly hook missing from package.json"
fi
echo ""
echo "--- npm pack dry-run ---"
# ----------------------------------------------------------------
# Check 4 — npm pack --dry-run lists expected files
# ----------------------------------------------------------------
dry_out=$(npm pack --dry-run 2>&1)
file_count=$(echo "$dry_out" | grep -c "npm notice" || true)
has_src=$(echo "$dry_out" | grep -q "src/resources" && echo "yes" || echo "no")
has_dist=$(echo "$dry_out" | grep -q "dist/" && echo "yes" || echo "no")
has_pkg=$(echo "$dry_out" | grep -q "pkg/" && echo "yes" || echo "no")
# Count actual files listed (lines with a path, not summary lines)
file_lines=$(echo "$dry_out" | grep "npm notice" | grep -v "=== Tarball" | grep -v "filename\|package size\|unpacked size\|shasum\|integrity\|total files" | wc -l | tr -d ' ')
if [ "$file_lines" -ge 100 ] && [ "$has_dist" = "yes" ] && [ "$has_pkg" = "yes" ]; then
# src/resources check — warn but don't fail if absent (it's in "files" array but may not produce 100+ files on its own)
if [ "$has_src" = "yes" ]; then
pass "4 — dry-run: ${file_lines} files listed, dist/ present, pkg/ present, src/resources present"
else
fail "4 — dry-run: ${file_lines} files listed but src/resources NOT in pack output"
echo " (dry-run output tail:)"
echo "$dry_out" | tail -10 | sed 's/^/ /'
fi
elif [ "$file_lines" -lt 100 ]; then
fail "4 — dry-run: only ${file_lines} files listed (expected >=100)"
echo "$dry_out" | tail -10 | sed 's/^/ /'
else
fail "4 — dry-run: dist/=${has_dist} pkg/=${has_pkg}"
echo "$dry_out" | tail -10 | sed 's/^/ /'
fi
echo ""
echo "--- tarball pack ---"
# ----------------------------------------------------------------
# Check 5 — npm pack produces a tarball
# ----------------------------------------------------------------
# Note: prepublishOnly triggers a build here (expected).
npm pack --silent 2>/dev/null || npm pack 2>&1 | tail -5
TARBALL=$(ls gsd-pi-*.tgz 2>/dev/null | head -1 || true)  # root package name is "gsd-pi"
if [ -n "$TARBALL" ] && [ -f "$TARBALL" ]; then
pass "5 — tarball produced: $TARBALL"
else
fail "5 — npm pack did not produce a tarball"
echo " Aborting remaining checks — no tarball available."
echo ""
echo "=== Results ==="
echo "One or more checks FAILED."
exit 1
fi
echo ""
echo "--- tarball install ---"
# ----------------------------------------------------------------
# Check 6 — tarball installs cleanly to temp prefix
# ----------------------------------------------------------------
rm -rf "$SMOKE_PREFIX"
if npm install -g --prefix "$SMOKE_PREFIX" "./$TARBALL" 2>&1 | tail -5; then
pass "6 — tarball installed to $SMOKE_PREFIX (exit 0)"
else
fail "6 — tarball install failed"
fi
# ----------------------------------------------------------------
# Check 7 — binary exists at expected path after install
# ----------------------------------------------------------------
if [ -f "$SMOKE_PREFIX/bin/gsd" ] || [ -L "$SMOKE_PREFIX/bin/gsd" ]; then
pass "7 — $SMOKE_PREFIX/bin/gsd exists after install"
else
fail "7 — $SMOKE_PREFIX/bin/gsd not found after install"
ls -la "$SMOKE_PREFIX/bin/" 2>/dev/null || echo " (bin/ dir does not exist)"
fi
echo ""
echo "--- launch smoke ---"
# ----------------------------------------------------------------
# Check 8 — launch: "gsd" branding + zero extension load errors
# Use background kill pattern (macOS has no GNU timeout).
# Allow 8s for extensions to load.
# ----------------------------------------------------------------
smoke_out=$(mktemp)
(
env -i HOME="$HOME" PATH="$PATH" \
"$SMOKE_PREFIX/bin/gsd" < /dev/null > "$smoke_out" 2>&1
) &
smoke_pid=$!
sleep 8
kill "$smoke_pid" 2>/dev/null || true
wait "$smoke_pid" 2>/dev/null || true
ext_errors=$(grep "Extension load error" "$smoke_out" 2>/dev/null | wc -l | tr -d ' ')
# Strip ANSI escape codes for branding check
plain_out=$(sed 's/\x1b\[[0-9;]*m//g' "$smoke_out" 2>/dev/null || cat "$smoke_out")
has_gsd=$(echo "$plain_out" | grep -qi "gsd\|get stuff done" && echo "yes" || echo "no")
if [ "$ext_errors" -eq 0 ]; then
pass "8a — zero Extension load errors on launch"
else
fail "8a — ${ext_errors} Extension load error(s) on launch"
grep "Extension load error" "$smoke_out" | head -5 | sed 's/^/ /'
fi
if [ "$has_gsd" = "yes" ]; then
pass "8b — \"gsd\" / \"get stuff done\" branding found in launch output"
else
# Fallback: check if binary self-identifies differently (not "pi")
has_pi_only=$(echo "$plain_out" | grep -qi "^pi\b" && echo "yes" || echo "no")
if [ "$has_pi_only" = "no" ]; then
pass "8b — output does not show \"pi\" branding (gsd branding likely in ANSI sequences)"
else
fail "8b — output shows \"pi\" branding instead of \"gsd\""
head -5 "$smoke_out" | sed 's/^/ /'
fi
fi
rm -f "$smoke_out"
echo ""
echo "--- ~/.pi/ isolation ---"
# ----------------------------------------------------------------
# Check 9 — ~/.pi/ session count unchanged before/after smoke run
# PI_SESSIONS_BEFORE captured at script start (before any binary invocation).
# ----------------------------------------------------------------
pi_after=$(ls ~/.pi/agent/sessions/ 2>/dev/null | wc -l | tr -d ' ')
if [ "$PI_SESSIONS_BEFORE" = "$pi_after" ]; then
pass "9 — ~/.pi/agent/sessions/ count unchanged (${pi_after} sessions before and after)"
else
fail "9 — ~/.pi/agent/sessions/ count changed: was ${PI_SESSIONS_BEFORE}, now ${pi_after}"
fi
echo ""
echo "--- non-TTY warning path ---"
# ----------------------------------------------------------------
# Check 10 — non-TTY missing optional keys → warning, no exit 1
# Run installed binary with minimal env (HOME + PATH only), piped from /dev/null.
# ----------------------------------------------------------------
tmp10=$(mktemp)
exit10_tmp=$(mktemp)
echo "" > "$exit10_tmp"
(
env -i HOME="$HOME" PATH="$PATH" \
"$SMOKE_PREFIX/bin/gsd" < /dev/null > "$tmp10" 2>&1
echo "$?" > "$exit10_tmp"
) &
pid10=$!
sleep 5
kill "$pid10" 2>/dev/null || true
wait "$pid10" 2>/dev/null || true
if grep -Eqi "warning|optional" "$tmp10" 2>/dev/null; then
pass "10a — non-TTY missing optional keys → warning emitted"
else
fail "10a — non-TTY missing optional keys → no warning found in output"
echo " Output (first 5 lines):"
head -5 "$tmp10" | sed 's/^/ /'
fi
exit10_code=$(cat "$exit10_tmp")
if [ "$exit10_code" = "1" ]; then
fail "10b — non-TTY missing optional keys → exited with code 1 (should not)"
echo " Output: $(head -3 "$tmp10")"
else
pass "10b — non-TTY missing optional keys → did NOT exit 1 (code: ${exit10_code:-killed})"
fi
rm -f "$tmp10" "$exit10_tmp"
echo ""
echo "=== Results ==="
if [ "$FAIL" -eq 0 ]; then
echo "All checks passed."
exit 0
else
echo "One or more checks FAILED."
exit 1
fi

7
src/app-paths.ts Normal file
View file

@ -0,0 +1,7 @@
import { homedir } from 'os'
import { join } from 'path'
export const appRoot = join(homedir(), '.gsd')
export const agentDir = join(appRoot, 'agent')
export const sessionsDir = join(appRoot, 'sessions')
export const authFilePath = join(agentDir, 'auth.json')

51
src/cli.ts Normal file
View file

@ -0,0 +1,51 @@
import {
AuthStorage,
ModelRegistry,
SettingsManager,
SessionManager,
createAgentSession,
InteractiveMode,
} from '@mariozechner/pi-coding-agent'
import { agentDir, sessionsDir, authFilePath } from './app-paths.js'
import { buildResourceLoader, initResources } from './resource-loader.js'
import { loadStoredEnvKeys, runWizardIfNeeded } from './wizard.js'
const authStorage = AuthStorage.create(authFilePath)
loadStoredEnvKeys(authStorage)
await runWizardIfNeeded(authStorage)
const modelRegistry = new ModelRegistry(authStorage)
const settingsManager = SettingsManager.create(agentDir)
// GSD always uses quiet startup — the gsd extension renders its own branded header
if (!settingsManager.getQuietStartup()) {
settingsManager.setQuietStartup(true)
}
// Collapse changelog by default — avoid wall of text on updates
if (!settingsManager.getCollapseChangelog()) {
settingsManager.setCollapseChangelog(true)
}
const sessionManager = SessionManager.create(process.cwd(), sessionsDir)
initResources(agentDir)
const resourceLoader = buildResourceLoader(agentDir)
await resourceLoader.reload()
const { session, extensionsResult } = await createAgentSession({
authStorage,
modelRegistry,
settingsManager,
sessionManager,
resourceLoader,
})
if (extensionsResult.errors.length > 0) {
for (const err of extensionsResult.errors) {
process.stderr.write(`[gsd] Extension load error: ${err.error}\n`)
}
}
const interactiveMode = new InteractiveMode(session)
await interactiveMode.run()

77
src/loader.ts Normal file
View file

@ -0,0 +1,77 @@
#!/usr/bin/env node
import { fileURLToPath } from 'url'
import { dirname, resolve, join } from 'path'
import { readFileSync } from 'fs'
import { agentDir } from './app-paths.js'
// pkg/ is a shim directory: contains gsd's piConfig (package.json) and pi's
// theme assets (dist/modes/interactive/theme/) without a src/ directory.
// This allows config.js to:
// 1. Read piConfig.name → "gsd" (branding)
// 2. Resolve themes via dist/ (no src/ present → uses dist path)
const pkgDir = resolve(dirname(fileURLToPath(import.meta.url)), '..', 'pkg')
// MUST be set before any dynamic import of pi SDK fires — this is what config.js
// reads to determine APP_NAME and CONFIG_DIR_NAME
process.env.PI_PACKAGE_DIR = pkgDir
process.title = 'gsd'
// GSD_CODING_AGENT_DIR — tells pi's getAgentDir() to return ~/.gsd/agent/ instead of ~/.pi/agent/
process.env.GSD_CODING_AGENT_DIR = agentDir
// NODE_PATH — make gsd's own node_modules available to extensions loaded via jiti.
// Without this, extensions (e.g. browser-tools) can't resolve dependencies like
// `playwright` because jiti resolves modules from pi-coding-agent's location, not gsd's.
// Prepending gsd's node_modules to NODE_PATH fixes this for all extensions.
const gsdRoot = resolve(dirname(fileURLToPath(import.meta.url)), '..')
const gsdNodeModules = join(gsdRoot, 'node_modules')
process.env.NODE_PATH = process.env.NODE_PATH
? `${gsdNodeModules}:${process.env.NODE_PATH}`
: gsdNodeModules
// Force Node to re-evaluate module search paths with the updated NODE_PATH.
// Must happen synchronously before cli.js imports → extension loading.
const { Module } = await import('module');
(Module as any)._initPaths?.()
// GSD_VERSION — expose package version so extensions can display it
try {
const gsdPkg = JSON.parse(readFileSync(join(gsdRoot, 'package.json'), 'utf-8'))
process.env.GSD_VERSION = gsdPkg.version || '0.0.0'
} catch {
process.env.GSD_VERSION = '0.0.0'
}
// GSD_BIN_PATH — absolute path to this loader (dist/loader.js), used by patched subagent
// to spawn gsd instead of pi when dispatching workflow tasks
process.env.GSD_BIN_PATH = process.argv[1]
// GSD_WORKFLOW_PATH — absolute path to bundled GSD-WORKFLOW.md, used by patched gsd extension
// when dispatching workflow prompts (dist/loader.js → ../src/resources/GSD-WORKFLOW.md)
const resourcesDir = resolve(dirname(fileURLToPath(import.meta.url)), '..', 'src', 'resources')
process.env.GSD_WORKFLOW_PATH = join(resourcesDir, 'GSD-WORKFLOW.md')
// GSD_BUNDLED_EXTENSION_PATHS — colon-joined list of all bundled extension entry point absolute
// paths, used by patched subagent to pass --extension <path> to spawned gsd processes.
// IMPORTANT: paths point to agentDir (~/.gsd/agent/extensions/) NOT src/resources/extensions/.
// initResources() syncs bundled extensions to agentDir before any extension loading occurs,
// so these paths are always valid at runtime. Using agentDir paths matches what buildResourceLoader
// discovers (it scans agentDir), so pi's deduplication works correctly and extensions are not
// double-loaded in subagent child processes.
// Note: shared/ is NOT included — it's a library imported by gsd and ask-user-questions, not an entry point.
process.env.GSD_BUNDLED_EXTENSION_PATHS = [
join(agentDir, 'extensions', 'gsd', 'index.ts'),
join(agentDir, 'extensions', 'bg-shell', 'index.ts'),
join(agentDir, 'extensions', 'browser-tools', 'index.ts'),
join(agentDir, 'extensions', 'context7', 'index.ts'),
join(agentDir, 'extensions', 'search-the-web', 'index.ts'),
join(agentDir, 'extensions', 'slash-commands', 'index.ts'),
join(agentDir, 'extensions', 'subagent', 'index.ts'),
join(agentDir, 'extensions', 'worktree', 'index.ts'),
join(agentDir, 'extensions', 'plan-mode', 'index.ts'),
join(agentDir, 'extensions', 'ask-user-questions.ts'),
join(agentDir, 'extensions', 'get-secrets-from-user.ts'),
].join(':')
// Dynamic import defers ESM evaluation — config.js will see PI_PACKAGE_DIR above
await import('./cli.js')

54
src/resource-loader.ts Normal file
View file

@ -0,0 +1,54 @@
import { DefaultResourceLoader } from '@mariozechner/pi-coding-agent'
import { cpSync, existsSync, mkdirSync, readFileSync, writeFileSync } from 'node:fs'
import { dirname, join, resolve } from 'node:path'
import { fileURLToPath } from 'node:url'
// Resolves to the bundled src/resources/ inside the npm package at runtime:
// dist/resource-loader.js → .. → package root → src/resources/
const resourcesDir = resolve(dirname(fileURLToPath(import.meta.url)), '..', 'src', 'resources')
const bundledExtensionsDir = join(resourcesDir, 'extensions')
/**
* Syncs all bundled resources to agentDir (~/.gsd/agent/) on every launch.
*
 * - extensions/ → ~/.gsd/agent/extensions/ (always overwrite, so updates ship on next launch)
 * - agents/ → ~/.gsd/agent/agents/ (always overwrite)
 * - AGENTS.md → ~/.gsd/agent/AGENTS.md (always overwrite)
* - GSD-WORKFLOW.md is read directly from bundled path via GSD_WORKFLOW_PATH env var
*
* Always-overwrite ensures `npm update -g @glittercowboy/gsd` takes effect immediately.
* User customizations should go in ~/.gsd/agent/extensions/ subdirs with unique names,
* not by editing the gsd-managed files.
*
* Inspectable: `ls ~/.gsd/agent/extensions/`
*/
export function initResources(agentDir: string): void {
mkdirSync(agentDir, { recursive: true })
// Sync extensions — always overwrite so updates land on next launch
const destExtensions = join(agentDir, 'extensions')
cpSync(bundledExtensionsDir, destExtensions, { recursive: true, force: true })
// Sync agents
const destAgents = join(agentDir, 'agents')
const srcAgents = join(resourcesDir, 'agents')
if (existsSync(srcAgents)) {
cpSync(srcAgents, destAgents, { recursive: true, force: true })
}
// Sync AGENTS.md
const srcAgentsMd = join(resourcesDir, 'AGENTS.md')
const destAgentsMd = join(agentDir, 'AGENTS.md')
if (existsSync(srcAgentsMd)) {
writeFileSync(destAgentsMd, readFileSync(srcAgentsMd))
}
}
/**
* Constructs a DefaultResourceLoader with no additionalExtensionPaths.
* Extensions are synced to agentDir by initResources() and pi auto-discovers
* them from ~/.gsd/agent/extensions/ via its normal agentDir scan.
*/
export function buildResourceLoader(agentDir: string): DefaultResourceLoader {
return new DefaultResourceLoader({ agentDir })
}

204
src/resources/AGENTS.md Normal file
View file

@ -0,0 +1,204 @@
## Hard Rules
- Never ask the user to do work the agent can execute or verify itself.
- Use the lightest sufficient tool first.
- Read before edit.
- Reproduce before fix when possible.
- Work is not done until the relevant verification has passed.
- Never print, echo, log, or restate secrets or credentials. Report only key names and applied/skipped status.
- Never ask the user to edit `.env` files or set secrets manually. Use `secure_env_collect`.
- For nontrivial work inside `~/.pi`, use a worktree by default.
- In enduring files, write current state only unless the file is explicitly historical.
## Execution Heuristics
### Tool-routing hierarchy
Use the lightest sufficient tool first.
- Known file path, need contents -> `read`
- Search repo text or symbols -> `bash` with `rg`
- Search by filename or path -> `bash` with `find` or `rg --files`
- Precise existing-file change -> `read` then `edit`
- New file or full rewrite -> `write`
- Broad unfamiliar subsystem mapping -> `subagent` with `scout`
- Library, package, or framework truth -> `resolve_library` then `get_library_docs`
- Current external facts -> `search-the-web`, then `fetch_page` for full page content
- Long-running or indefinite shell commands (servers, watchers, builds) -> `bg_shell` with `start` + `wait_for_ready`
- Background process status check -> `bg_shell` with `digest` (not `output`)
- Background process debugging -> `bg_shell` with `highlights`, then `output` with `filter`
- UI behavior verification -> browser tools
- Secrets -> `secure_env_collect`
### Web research vs browser execution
Treat these as different jobs.
- Use `search-the-web` + `fetch_page` for current external knowledge: release notes, product changes, pricing, news, public docs, and fast-moving ecosystem facts.
- Use browser tools for interactive execution and verification: local app flows, reproducing browser bugs, DOM behavior, navigation, auth flows, and user-visible UI outcomes.
- Do not use browser tools as a substitute for web research.
- Do not use web search as a substitute for exercising a real browser flow.
### Investigation escalation ladder
Escalate in this order:
1. Direct action if the target is explicit and the change is low-risk
2. Targeted search with `rg` or `find`
3. Minimal file reads
4. `scout` when direct exploration would require reading many files or building a broad mental map
5. Multi-agent chains for large, architectural, or multi-stage work
### Ask vs infer
Use `ask_user_questions` when the answer is intent-driven and materially affects the result.
Ask only when the answer:
- materially affects behavior, architecture, data shape, or user-visible outcomes
- cannot be derived from repo evidence, docs, runtime behavior, tests, browser inspection, or command output
- is needed to avoid an irreversible or high-cost mistake
Do not ask when:
- the answer is discoverable
- the ambiguity is minor and the next step is safe and reversible
- the user already asked for direct execution and the path is clear enough
If multiple reasonable interpretations exist, choose the smallest safe reversible action that advances the task.
### Context economy
- Prefer minimum sufficient context over broad exploration.
- Do not read extra files just in case.
- Stop investigating once there is enough evidence to make a safe, testable change.
- Use `scout` to compress broad unfamiliar exploration instead of manually reading many files.
- When gathering independent facts from known files, read them in parallel when useful.
### Code structure and abstraction
- Build with future reuse in mind, especially for code likely to be consumed across tools, extensions, hooks, UI surfaces, or shared subsystems.
- Prefer small, composable primitives with clear responsibilities over large monolithic modules.
- Extract around real seams: parsing, normalization, validation, formatting, side-effect boundaries, transport, persistence, orchestration, and rendering.
- Separate orchestration from implementation details. High-level flows should read clearly; low-level helpers should stay focused.
- Prefer boring, standard abstractions over clever custom frameworks or one-off indirection layers.
- Do not abstract for its own sake. If the interface is unclear or the shape is still changing, keep code local until the seam stabilizes.
- When a small primitive is obviously reusable and cheap to extract, do it early rather than duplicating logic.
- Optimize for code that is easy to recombine, test, and consume later — not just code that solves the immediate task.
- Preserve local consistency with the surrounding codebase unless the task explicitly includes broader refactoring.
### Verification and definition of done
Verify according to task type.
- Bug fix -> rerun the exact repro
- Script or CLI fix -> rerun the exact command
- UI or web fix -> verify in the browser and check console or network logs when relevant
- Env or secrets fix -> rerun the blocked workflow after applying secrets
- Refactor -> run tests or build plus a targeted smoke check
- File delete, move, or rename -> confirm filesystem state
- Docs or config change -> verify referenced paths, commands, and settings match reality
If a command or workflow fails, continue the loop: inspect the error, fix it, rerun it, and repeat until it passes or a real blocker requires user input.
### Root-cause-first debugging
- Fix the root cause, not just the visible symptom, unless the user explicitly wants a temporary workaround.
- Prefer changes that remove the failure mode over changes that merely mask it.
- When applying a temporary mitigation, label it clearly and preserve a path to the real fix.
## Situational Playbooks
### Background processes
Use `bg_shell` instead of `bash` for any command that runs indefinitely or takes a long time.
**Starting processes:**
- Set `type:'server'` and `ready_port:<port>` for dev servers so readiness detection is automatic.
- Set `group:'<name>'` on related processes (e.g. frontend + backend) to manage them together.
- Use `ready_pattern:'<regex>'` for processes with non-standard readiness signals.
- The tool auto-classifies commands as server/build/test/watcher/generic and applies smart defaults.
**After starting — use `wait_for_ready` instead of polling:**
- `wait_for_ready` blocks until the process signals readiness (pattern match or port open) or times out.
- This replaces the old pattern of `start``sleep``output` → check → repeat. One tool call instead of many.
**Checking status — use `digest` instead of `output`:**
- `digest` returns a structured ~30-token summary (status, ports, URLs, error count, change summary) instead of ~2000 tokens of raw output. Use this by default.
- `highlights` returns only significant lines (errors, URLs, results) — typically 5-15 lines instead of hundreds.
- `output` returns raw incremental lines — use only when debugging and you need full text. Add `filter:'error|warning'` to narrow results.
- Token budget hierarchy: `digest` (~30 tokens) < `highlights` (~100 tokens) < `output` (~2000 tokens). Always start with the lightest.
**Lifecycle awareness:**
- Process crashes and errors are automatically surfaced as alerts at the start of your next turn — you don't need to poll for failures.
- Use `group_status` to check health of related processes as a unit.
- Use `restart` to kill and relaunch with the same config — preserves restart count.
**Interactive processes:**
- Use `send_and_wait` for interactive CLIs: send input and wait for an expected output pattern. Replaces manual `send``sleep``output` polling.
**Cleanup:**
- Kill processes when done with them — do not leave orphans.
- Use `list` to see all running background processes.
### Web behavior
When the task involves frontend behavior, DOM interactions, navigation, or user flows, verify with browser tools against a running app before marking the work complete.
Use browser tools with this operating order unless there is a clear reason not to:
1. Cheap discovery first — use `browser_find` or `browser_snapshot_refs` to locate likely targets
2. Deterministic targeting — prefer refs or explicit selectors over coordinates
3. Batch obvious sequences — if the next 2-5 browser actions are clear and low-risk, use `browser_batch`
4. Assert outcomes explicitly — prefer `browser_assert` over inferring success from prose summaries
5. Diff ambiguous outcomes — use `browser_diff` when the effect of an action is unclear
6. Inspect diagnostics only when needed — use console/network/dialog logs when assertions or diffs suggest failure
7. Escalate inspection gradually — use `browser_get_accessibility_tree` only when targeted discovery is insufficient; use `browser_get_page_source` and `browser_evaluate` as escape hatches, not defaults
8. Use screenshots as supporting evidence — do not default to screenshot-first browsing when semantic tools are sufficient
For browser or UI work, “verified” means the flow was exercised and the expected outcome was checked explicitly with `browser_assert` or an equally structured browser signal whenever possible.
For browser failures, debug in this order:
1. inspect the failing assertion or explicit success signal
2. inspect `browser_diff`
3. inspect recent console/network/dialog diagnostics
4. inspect targeted element or accessibility state
5. only then escalate to broader page inspection
Retry only with a new hypothesis. Do not thrash.
### Libraries, packages, and frameworks
When a task depends on a library or framework API, use Context7 before coding.
- Call `resolve_library` first
- Choose the highest-trust, highest-benchmark match
- Call `get_library_docs` with a specific topic query
- Start with `tokens=5000`
- Increase to `10000` only if the first result lacks needed detail
### Current external facts
When a task involves current events, release notes, pricing, or facts likely to have changed after training, use `search-the-web` before answering.
- Use `freshness` to scope results by recency: `day`, `week`, `month`, `year`. Auto-detection applies when the query contains recency signals like year numbers or "latest".
- Use `domain` to limit results to a specific site when you know where the answer lives (e.g., `domain: "docs.python.org"`).
- Use `fetch_page` to read the full content of promising URLs from search results. Search snippets are a table of contents — `fetch_page` gets the actual content as clean markdown.
- Start `fetch_page` with the default `maxChars` (8000). Use smaller values for quick checks, larger (up to 30000) for thorough reading. Token-conscious: prefer reading one good page over skimming five.
- The search→read pattern is: `search-the-web` to find URLs, then `fetch_page` on the most promising 1-2 results. Don't fetch everything — be selective.
## Communication and Writing Style
- Be direct, professional, and focused on the work.
- Skip filler, false enthusiasm, and empty agreement.
- Challenge bad patterns, unnecessary complexity, security issues, and performance problems with concrete reasoning.
- The user makes the final call.
- All plans are for the agent's own execution, not an imaginary team's.
- Avoid enterprise patterns unless the user explicitly asks for them.

661
src/resources/GSD-WORKFLOW.md Normal file
View file

@ -0,0 +1,661 @@
# GSD Workflow — Manual Bootstrap Protocol
> This document teaches you how to operate the GSD planning methodology manually using files on disk.
>
> **When to read this:** At the start of any session working on GSD-managed work, or when told `read @GSD-WORKFLOW.md`.
>
> **After reading this, always read `.gsd/state.md` to find out what's next.**
> If the milestone has a `context.md`, read that too — it contains project-specific decisions, reference paths, and implementation guidance that this generic methodology doc does not.
---
## Quick Start: "What's next?"
Read these files in order and act on what they say:
1. **`.gsd/state.md`** — Where are we? What's the next action?
2. **`.gsd/milestones/<active>/roadmap.md`** — What's the plan? Which slices are done? (state.md tells you which milestone is active)
3. **`.gsd/milestones/<active>/context.md`** — Project-specific decisions, reference paths, constraints. Read this before doing implementation work.
4. If a slice is active, read its **`plan.md`** — Which tasks exist? Which are done?
5. If a task was interrupted, check for **`continue.md`** in the active slice directory — Resume from there.
Then do the thing `state.md` says to do next.
---
## The Hierarchy
```
Milestone → a shippable version (4-10 slices)
Slice → one demoable vertical capability (1-7 tasks)
Task → one context-window-sized unit of work (fits in one session)
```
**The iron rule:** A task MUST fit in one context window. If it can't, it's two tasks.
---
## File Locations
All artifacts live in `.gsd/` at the project root:
```
.gsd/
state.md # Dashboard — always read first
decisions.md # Append-only decisions register
milestones/
M001/
roadmap.md # Milestone plan (checkboxes = state)
context.md # Optional: user decisions from discuss phase
research.md # Optional: codebase/tech research
summary.md # Milestone rollup (updated as slices complete)
slices/
S01/
plan.md # Task decomposition for this slice
context.md # Optional: slice-level user decisions
research.md # Optional: slice-level research
summary.md # Slice summary (written on completion)
uat.md # Non-blocking human test script (written on completion)
continue.md # Ephemeral: resume point if interrupted
tasks/
T01-plan.md # Individual task plan
T01-summary.md # Task summary with frontmatter
```
---
## File Format Reference
### `roadmap.md`
```markdown
# M001: Title of the Milestone
**Vision:** One paragraph describing what this milestone delivers.
**Success Criteria:**
- Observable outcome 1
- Observable outcome 2
---
## Slices
- [ ] **S01: Slice Title** `risk:low` `depends:[]`
> After this: what the user can demo when this slice is done.
- [ ] **S02: Another Slice** `risk:medium` `depends:[S01]`
> After this: demo sentence.
- [x] **S03: Completed Slice** `risk:low` `depends:[S01]`
> After this: demo sentence.
```
**Parsing rules:** `- [x]` = done, `- [ ]` = not done. The `risk:` and `depends:[]` tags are inline metadata parsed from the line. `depends:[]` lists slice IDs this slice requires to be complete first.
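The parsing rules above can be applied mechanically. As an illustrative sketch only (the `RoadmapSlice` and `parseRoadmap` names are not part of GSD), here is how the checkbox state and the inline `risk:` / `depends:[]` metadata parse out of a slice line:

```typescript
// Hypothetical sketch of roadmap.md slice-line parsing — not a GSD API.
interface RoadmapSlice {
  id: string
  title: string
  done: boolean
  risk: string
  depends: string[]
}

export function parseRoadmap(markdown: string): RoadmapSlice[] {
  const slices: RoadmapSlice[] = []
  // Matches: - [x] **S01: Title** `risk:low` `depends:[S02,S03]`
  const line = /^- \[( |x)\] \*\*(S\d+): (.+?)\*\* `risk:(\w+)` `depends:\[([^\]]*)\]`/
  for (const raw of markdown.split('\n')) {
    const m = line.exec(raw.trim())
    if (!m) continue // demo sentences ("> After this: ...") and prose are skipped
    slices.push({
      done: m[1] === 'x',
      id: m[2],
      title: m[3],
      risk: m[4],
      depends: m[5] ? m[5].split(',').map((s) => s.trim()) : [],
    })
  }
  return slices
}
```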
**Boundary Map** (required section in roadmap.md):
After the slices section, include a `## Boundary Map` that shows what each slice produces and consumes:
```markdown
## Boundary Map
### S01 → S02
Produces:
types.ts → User, Session, AuthToken (interfaces)
auth.ts → generateToken(), verifyToken(), refreshToken()
Consumes: nothing (leaf node)
### S02 → S03
Produces:
api/auth/login.ts → POST handler
api/auth/signup.ts → POST handler
middleware.ts → authMiddleware()
Consumes from S01:
auth.ts → generateToken(), verifyToken()
```
The boundary map is a **planning artifact** — not runnable code. It:
- Forces upfront thinking about slice boundaries before implementation
- Gives downstream slices a concrete target to code against
- Enables deterministic verification that slices actually connect
- Gets updated during slice planning if new interfaces emerge
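As a sketch of what "deterministic verification" can mean here, the following illustrative TypeScript checks whether a consumer file actually imports the symbols the boundary map says it consumes. `checkConsumes` is a hypothetical helper, and the textual import scan is a simplification; a real checker would parse the module with a TS AST:

```typescript
// Hypothetical boundary-map consumption check — not a GSD API.
import { readFileSync } from 'node:fs'

export function checkConsumes(consumerFile: string, expectedSymbols: string[]): string[] {
  const source = readFileSync(consumerFile, 'utf-8')
  // Collect every named import: import { a, b } from '...'
  const imported = new Set(
    [...source.matchAll(/import\s*\{([^}]+)\}/g)].flatMap((m) =>
      m[1].split(',').map((s) => s.trim()),
    ),
  )
  // Return the expected symbols that are NOT imported (empty array = slice connects).
  return expectedSymbols.filter((sym) => !imported.has(sym))
}
```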
### `plan.md` (slice-level)
```markdown
# S01: Slice Title
**Goal:** What this slice achieves.
**Demo:** What the user can see/do when this is done.
## Must-Haves
- Observable outcome 1 (used for verification)
- Observable outcome 2
## Tasks
- [ ] **T01: Task Title**
Description of what this task does.
- [ ] **T02: Another Task**
Description.
## Files Likely Touched
- path/to/file.ts
- path/to/another.ts
```
### `TNN-plan.md` (task-level)
```markdown
# T01: Task Title
**Slice:** S01
**Milestone:** M001
## Goal
What this task accomplishes in one sentence.
## Must-Haves
### Truths
Observable behaviors that must be true when this task is done:
- "User can sign up with email and password"
- "Login returns a JWT token"
### Artifacts
Files that must exist with real implementation (not stubs):
- `src/lib/auth.ts` — JWT helpers (min 30 lines, exports: generateToken, verifyToken)
- `src/app/api/auth/login/route.ts` — Login endpoint (exports: POST)
### Key Links
Critical wiring between artifacts:
- `login/route.ts``auth.ts` via import of `generateToken`
- `middleware.ts``auth.ts` via import of `verifyToken`
## Steps
1. First thing to do
2. Second thing to do
3. Third thing to do
## Context
- Relevant prior decisions or patterns to follow
- Key files to read before starting
```
**Must-haves are what make verification mechanically checkable.** Truths are checked by running commands or reading output. Artifacts are checked by confirming files exist with real content. Key links are checked by confirming imports/references actually connect the pieces.
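An artifact check like "min 30 lines, exports: generateToken" can be made concrete. The sketch below is illustrative only (`verifyArtifact` and `ArtifactSpec` are hypothetical names, and the export check is textual rather than AST-based):

```typescript
// Hypothetical static artifact verification — not a GSD API.
import { existsSync, readFileSync } from 'node:fs'

interface ArtifactSpec {
  path: string
  minLines: number
  exports: string[]
}

export function verifyArtifact(spec: ArtifactSpec): { ok: boolean; reason: string } {
  if (!existsSync(spec.path)) return { ok: false, reason: 'missing' }
  const source = readFileSync(spec.path, 'utf-8')
  const lines = source.split('\n').length
  // Too short usually means a stub rather than a real implementation.
  if (lines < spec.minLines) return { ok: false, reason: `stub (${lines} lines)` }
  for (const name of spec.exports) {
    // Naive textual check; a real checker would parse the module.
    if (!new RegExp(`export (const|function|class) ${name}\\b`).test(source)) {
      return { ok: false, reason: `missing export: ${name}` }
    }
  }
  return { ok: true, reason: `substantive (${lines} lines)` }
}
```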
### `state.md`
```markdown
# GSD State
**Active Milestone:** M001 — Title
**Active Slice:** S02 — Slice Title
**Active Task:** T01 — Task Title
**Phase:** Executing
## Recent Decisions
- Decision 1
- Decision 2
## Blockers
- None (or list blockers)
## Next Action
Exact next thing to do.
```
### `context.md` (from discuss phase)
```markdown
# S01: Slice Title — Context
**Gathered:** 2026-03-07
**Status:** Ready for planning
## Implementation Decisions
- Decision on gray area 1
- Decision on gray area 2
## Agent's Discretion
- Areas where the user said "you decide"
## Deferred Ideas
- Ideas that came up but belong in other slices
```
### `decisions.md` (append-only register)
```markdown
# Decisions Register
<!-- Append-only. Never edit or remove existing rows.
To reverse a decision, add a new row that supersedes it.
Read this file at the start of any planning or research phase. -->
| # | When | Scope | Decision | Choice | Rationale | Revisable? |
|---|------|-------|----------|--------|-----------|------------|
| D001 | M001/S01 | library | Validation library | Zod | Type inference, already in deps | No |
| D002 | M001/S01 | arch | Session storage | HTTP-only cookies | Security, SSR compat | Yes — if mobile added |
| D003 | M001/S02 | api | API versioning | URL prefix /v1 | Simple, fits scale | Yes |
| D004 | M001/S03 | convention | Error format | RFC 7807 | Standard, client-friendly | No |
| D005 | M002/S01 | arch | Session storage | JWT in Authorization header | Mobile client needs it (supersedes D002) | No |
```
**Rules:**
- **Append-only** — rows are never edited or removed. To reverse a decision, add a new row that supersedes it (reference the old ID).
- **#** — Sequential ID (`D001`, `D002`, ...), never reused.
- **When** — Where the decision was made: `M001`, `M001/S01`, or `M001/S01/T02`.
- **Scope** — Category tag: `arch`, `pattern`, `library`, `data`, `api`, `scope`, `convention`.
- **Revisable?**`No`, or `Yes — trigger condition`.
**When to read:** At the start of any planning or research phase.
**When to write:** During discussion (seed from context), during planning (structural choices), during task execution (if an architectural choice was made), and during slice completion (catch-all for missed decisions).
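Because the register is append-only, the currently effective choice for any decision topic is simply the last row written for it. A minimal illustrative sketch of that supersede rule (`effectiveDecisions` and `DecisionRow` are hypothetical names, not GSD helpers):

```typescript
// Hypothetical supersede resolution over the append-only register — not a GSD API.
interface DecisionRow {
  id: string
  topic: string
  choice: string
}

export function effectiveDecisions(rows: DecisionRow[]): Map<string, DecisionRow> {
  const latest = new Map<string, DecisionRow>()
  // Rows arrive in append order, so the last write for a topic wins.
  for (const row of rows) latest.set(row.topic, row)
  return latest
}
```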
---
## The Phases
Work flows through these phases. Each phase produces a file.
### Phase 1: Discuss (Optional)
**Purpose:** Capture user decisions on gray areas before planning.
**Produces:** `context.md` at milestone or slice level.
**When to use:** When the scope has ambiguities the user should weigh in on.
**When to skip:** When the user already knows exactly what they want, or told you to just go.
**How to do it manually:**
1. Read the roadmap to understand the scope.
2. Identify 3-5 gray areas — implementation decisions the user cares about.
3. Use `ask_user_questions` to discuss each area.
4. Write decisions to `context.md`.
5. Do NOT discuss how to implement — only what the user wants.
### Phase 2: Research (Optional)
**Purpose:** Scout the codebase and relevant docs before planning.
**Produces:** `research.md` at milestone or slice level.
**When to use:** When working in unfamiliar code, with unfamiliar libraries, or on complex integrations.
**When to skip:** When the codebase is familiar and the work is straightforward.
**How to do it manually:**
1. Read `context.md` if it exists — know what decisions are locked.
2. Scout relevant code: `rg`, `find`, read key files.
3. Use `resolve_library` / `get_library_docs` if needed.
4. Write findings to `research.md` with these sections:
```markdown
# S01: Slice Title — Research
**Researched:** 2026-03-07
**Domain:** Primary technology/problem domain
**Confidence:** HIGH/MEDIUM/LOW
## Summary
2-3 paragraph executive summary. Primary recommendation.
## Don't Hand-Roll
| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
Problems that look simple but have existing solutions.
## Common Pitfalls
### Pitfall 1: Name
**What goes wrong:** ...
**Why it happens:** ...
**How to avoid:** ...
**Warning signs:** ...
## Relevant Code
Existing files, patterns, reusable assets, integration points.
## Sources
- Context7: /library/id — topics fetched (HIGH confidence)
- WebSearch: finding — verified against docs (MEDIUM confidence)
```
The **Don't Hand-Roll** and **Common Pitfalls** sections prevent the most expensive mistakes.
### Phase 3: Plan
**Purpose:** Decompose work into context-window-sized tasks with must-haves.
**Produces:** `plan.md` + individual `T01-plan.md` files.
**For a milestone (roadmap):**
1. Read `context.md`, `research.md`, and `.gsd/decisions.md` if they exist.
2. Decompose the vision into 4-10 demoable vertical slices.
3. Order by risk (high-risk first to validate feasibility early).
4. Write `roadmap.md` with checkboxes, risk levels, dependencies, demo sentences.
5. **Write the boundary map** — for each slice, specify what it produces (functions, types, interfaces, endpoints) and what it consumes from upstream slices. This forces interface thinking before implementation and enables deterministic verification that slices actually connect.
**For a slice (task decomposition):**
1. Read the slice's entry in `roadmap.md` **and its boundary map section** — know what interfaces this slice must produce and consume.
2. Read `context.md`, `research.md`, and `.gsd/decisions.md` if they exist for this slice.
3. Read summaries from dependency slices (check `depends:[]` in roadmap).
4. Verify that upstream slices' actual outputs match what the boundary map says this slice consumes. If they diverge, update the boundary map.
5. Decompose into 1-7 tasks, each fitting one context window.
6. Each task needs: title, description, steps (3-10), must-haves (observable verification criteria).
7. Must-haves should reference boundary map contracts — e.g. "exports `generateToken()` as specified in boundary map S01→S02".
8. Write `plan.md` and individual `TNN-plan.md` files.
### Phase 4: Execute
**Purpose:** Do the work for one task.
**Produces:** Code changes + `[DONE:n]` markers.
**How to do it manually:**
1. Read the task's `TNN-plan.md`.
2. Read relevant summaries from prior tasks (for context on what's already built).
3. Execute each step. Mark progress with `[DONE:n]` in responses.
4. If you made an architectural, pattern, or library decision, append it to `.gsd/decisions.md`.
5. If interrupted or context is getting full, write `continue.md` (see below).
### Phase 5: Verify
**Purpose:** Check that the task's must-haves are actually met.
**Produces:** Pass/fail determination.
**Verification ladder — use the strongest tier you can reach:**
1. **Static:** Files exist, exports present, wiring connected, not stubs.
2. **Command:** Tests pass, build succeeds, lint clean, blocked command works.
3. **Behavioral:** Browser flows work, API responses correct.
4. **Human:** Ask the user only when you genuinely can't verify yourself.
**The rule:** "All steps done" is NOT verification. Check the actual outcomes.
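A minimal sketch of the static tier, assuming simple heuristics for export presence and stub detection (the line threshold and the substring check are illustrative, not a spec):

```typescript
import { existsSync, readFileSync } from "node:fs";

// Static-tier check: file exists, expected identifiers appear, and the
// file is not an obvious stub. Heuristic only; thresholds are assumptions.
function staticCheck(
  path: string,
  expectedExports: string[],
  minLines = 30,
): { pass: boolean; evidence: string } {
  if (!existsSync(path)) {
    return { pass: false, evidence: `${path} missing` };
  }
  const src = readFileSync(path, "utf8");
  const lines = src.split("\n").length;
  const missing = expectedExports.filter((name) => !src.includes(name));
  if (missing.length > 0) {
    return { pass: false, evidence: `missing exports: ${missing.join(", ")}` };
  }
  if (lines < minLines) {
    return { pass: false, evidence: `only ${lines} lines, likely a stub` };
  }
  return { pass: true, evidence: `${lines} lines, all expected exports present` };
}
```

The `evidence` string is written so it can feed straight into a verification report's Artifacts table.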
**Verification report format** (written into the summary or surfaced on failure):
```
### Observable Truths
| # | Truth | Status | Evidence |
|---|-------|--------|----------|
| 1 | User can sign up | ✓ PASS | POST /api/auth/signup returns 201 |
| 2 | Login returns JWT | ✗ FAIL | Returns 500 — missing env var |
### Artifacts
| File | Expected | Status | Evidence |
|------|----------|--------|---------|
| src/lib/auth.ts | JWT helpers, min 30 lines | ✓ SUBSTANTIVE | 87 lines, exports generateTokens |
| src/lib/email.ts | Email sending | ✗ STUB | 8 lines, console.log instead of sending |
### Key Links
| From | To | Via | Status |
|------|----|----|--------|
| login/route.ts | auth.ts | import generateTokens | ✓ WIRED |
| email.ts | Resend API | resend.emails.send() | ✗ NOT WIRED |
### Anti-Patterns Found
| File | Line | Pattern | Severity |
|------|------|---------|----------|
| src/lib/email.ts | 5 | console.log stub | 🛑 Blocker |
```
When verification finds gaps, include a **Gaps** section with what's missing, impact, and suggested fix.
### Phase 6: Summarize
**Purpose:** Record what happened for downstream tasks.
**Produces:** `TNN-summary.md`, and when slice completes, `summary.md`.
**Task summary format:**
```markdown
---
id: T01
parent: S01
milestone: M001
provides:
- What this task built (~5 items)
requires:
- slice: S00
provides: What that prior slice built that this task used
affects: [S02, S03]
key_files:
- path/to/important/file.ts
key_decisions:
- "Decision made: reasoning"
patterns_established:
- "Pattern name and where it lives"
drill_down_paths:
- .gsd/milestones/M001/slices/S01/tasks/T01-plan.md
duration: 15min
verification_result: pass
completed_at: 2026-03-07T16:00:00Z
---
# T01: Task Title
**Substantive one-liner — NOT "task complete" but what actually shipped**
## What Happened
Concise prose narrative of what was built, why key decisions were made,
and what matters for future work.
## Deviations
What differed from the plan and why (or "None").
## Files Created/Modified
- `path/to/file.ts` — What it does
```
The one-liner must be substantive: "JWT auth with refresh rotation using jose" not "Authentication implemented."
**Slice summary:** Written when all tasks in a slice complete. Compresses all task summaries. Includes `drill_down_paths` to each task summary. During slice completion, review task summaries for `key_decisions` and ensure any significant ones are captured in `.gsd/decisions.md`.
**Milestone summary:** Updated each time a slice completes. Compresses all slice summaries. This is what gets injected into later slice planning instead of loading many individual summaries.
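The frontmatter in these summaries is machine-readable. A minimal sketch of pulling scalar fields out of one (a real implementation would use a YAML parser; nested fields like `requires` are skipped here):

```typescript
// Minimal frontmatter splitter for TNN-summary.md files. Scalar fields
// only, which is enough for routing decisions such as checking
// verification_result. Nested YAML is out of scope for this sketch.
function parseSummary(md: string): { meta: Record<string, string>; body: string } {
  const m = md.match(/^---\n([\s\S]*?)\n---\n([\s\S]*)$/);
  if (!m) return { meta: {}, body: md };
  const meta: Record<string, string> = {};
  for (const line of m[1].split("\n")) {
    const kv = line.match(/^(\w+):\s*(.+)$/);
    if (kv) meta[kv[1]] = kv[2];
  }
  return { meta, body: m[2] };
}
```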
### Phase 7: Advance
**Purpose:** Mark work done and move to the next thing.
**After a task completes:**
1. Mark the task done in `plan.md` (checkbox).
2. Check if there's a next task in the slice → execute it.
3. If slice is complete → write slice summary, mark slice done in `roadmap.md`.
**After a slice completes:**
1. Write slice `summary.md` (compresses all task summaries).
2. Write slice `uat.md` — a non-blocking human test script derived from the slice's must-haves and demo sentence. The agent does NOT wait for UAT results.
3. Mark the slice checkbox in `roadmap.md` as `[x]`.
4. Update `state.md` with new position.
5. Update milestone `summary.md` with the completed slice's contributions.
6. Continue to next slice immediately. The user tests the UAT whenever convenient.
7. If the user reports UAT failures later, create fix tasks in the current or a new slice.
8. If all slices done → milestone complete.
---
## Continue-Here Protocol
**When to write `continue.md`:**
- You're about to lose context (compaction, session end, Ctrl+C).
- The current task isn't done yet.
- You want to pause and come back later.
**What to capture:**
```markdown
---
milestone: M001
slice: S01
task: T02
step: 3
total_steps: 7
saved_at: 2026-03-07T15:30:00Z
---
## Completed Work
- What's already done in this task and prior tasks in the slice.
## Remaining Work
- What steps remain, with enough detail to resume.
## Decisions Made
- Key decisions and WHY (so next session doesn't re-debate).
## Context
The "vibe" — what you were thinking, what's tricky, what to watch out for.
## Next Action
The EXACT first thing to do when resuming. Not vague. Specific.
```
**How to resume:**
1. Read `continue.md`.
2. Delete `continue.md` (it's consumed, not permanent).
3. Pick up from "Next Action".
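The read-then-delete step can be sketched as one helper, so a resume can never accidentally reuse a stale continue file:

```typescript
import { existsSync, readFileSync, unlinkSync } from "node:fs";

// Consume continue.md: return its contents and delete it in one step.
// Returns null when there is no interrupted work to resume.
function consumeContinueFile(path: string): string | null {
  if (!existsSync(path)) return null;
  const content = readFileSync(path, "utf8");
  unlinkSync(path); // consumed, not permanent
  return content;
}
```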
---
## State Management
### `state.md` is a derived cache
It is NOT the source of truth. It's a convenience dashboard.
**Sources of truth:**
- `roadmap.md` → which slices exist and which are done
- `plan.md` → which tasks exist within a slice
- `TNN-summary.md` → what happened during a task
- `summary.md` (slice/milestone) → compressed outcomes
**Update `state.md`** after every significant action:
- Active milestone/slice/task
- Recent decisions (last 3-5)
- Blockers
- Next action (most important — this is what a fresh session reads first)
### Reconciliation
If files disagree, **pause and surface to the user**:
- Roadmap says slice done but task summaries missing → inconsistency
- Task marked done but no summary → treat as incomplete
- Continue file exists for completed task → delete continue file
- State points to nonexistent slice/task → rebuild state from files
---
## Git Strategy: Branch-Per-Slice with Squash Merge
**Principle:** Main is always clean and working. Each slice gets an isolated branch. The user never runs a git command — the agent handles everything.
### Branch Lifecycle
1. **Slice starts** → create branch `gsd/M001/S01` from main
2. **Per-task commits** on the branch — atomic, descriptive, bisectable
3. **Slice completes** → squash merge to main as one clean commit
4. **Branch kept** — not deleted, available for per-task history
### What Main Looks Like
```
feat(M001/S03): milestone and slice discuss commands
feat(M001/S02): extension scaffold and command routing
feat(M001/S01): file I/O foundation
```
One commit per slice. Individually revertible. Reads like a changelog.
### What the Branch Looks Like
```
gsd/M001/S01:
test(S01): round-trip tests passing
feat(S01/T03): file writer with round-trip fidelity
checkpoint(S01/T03): pre-task
feat(S01/T02): markdown parser for plan files
checkpoint(S01/T02): pre-task
feat(S01/T01): core types and interfaces
checkpoint(S01/T01): pre-task
```
### Commit Conventions
| When | Format | Example |
|------|--------|---------|
| Before each task | `checkpoint(S01/T02): pre-task` | Safety net for `git reset` |
| After task verified | `feat(S01/T02): <what was built>` | The real work |
| Plan/docs committed | `docs(S01): add slice plan` | Bundled with first task |
| Slice squash to main | `feat(M001/S01): <slice title>` | Clean one-liner on main |
Commit types: `feat`, `fix`, `test`, `refactor`, `docs`, `chore`
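These conventions are pure string formats, so they can be captured as small helpers (the helper names are illustrative) and kept consistent across the agent:

```typescript
// Naming helpers matching the conventions above. Pure string builders,
// so they are testable without touching a real repository.
const branchName = (milestone: string, slice: string) => `gsd/${milestone}/${slice}`;

const checkpointMsg = (slice: string, task: string) => `checkpoint(${slice}/${task}): pre-task`;

const taskMsg = (slice: string, task: string, what: string) => `feat(${slice}/${task}): ${what}`;

const squashMsg = (milestone: string, slice: string, title: string, tasks: string[]) =>
  [`feat(${milestone}/${slice}): ${title}`, "", "Tasks completed:", ...tasks.map((t) => `- ${t}`)].join("\n");
```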
### Squash Merge Message
```
feat(M001/S01): file I/O foundation
Agent can parse, format, load, and save all GSD file types with round-trip fidelity.
Tasks completed:
- T01: core types and interfaces
- T02: markdown parser for plan files
- T03: file writer with round-trip fidelity
```
### Rollback
| Problem | Fix |
|---------|-----|
| Bad task | `git reset --hard` to checkpoint on the branch |
| Bad slice | `git revert <squash commit>` on main |
| UAT failure after merge | Fix tasks on `gsd/M001/S01-fix` branch, squash as `fix(M001/S01): <fix>` |
---
## Summary Injection for Downstream Tasks
When planning or executing a task, load relevant prior context:
1. Check the current slice's `depends:[]` in `roadmap.md`.
2. Load summaries from those dependency slices.
3. Start with the **highest available level** — milestone `summary.md` first.
4. Only drill down to slice/task summaries if you need specific detail.
5. Stay within **~2500 tokens** of total injected summary context.
6. If the dependency chain is too large, drop the oldest/least-relevant summaries first.
**Aim for:**
- ~5 provides per summary
- ~10 key_files per summary
- ~5 key_decisions per summary
- ~3 patterns_established per summary
These are soft caps — exceed them when genuinely needed, but don't let summaries become essays.
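A sketch of the budgeting rule, assuming a rough 4-characters-per-token estimate rather than a real tokenizer:

```typescript
// Budgeted injection: keep the most relevant summaries and stop once the
// estimated token count would exceed the budget. The 4-chars-per-token
// estimate is a rough heuristic, not a tokenizer.
interface SummaryDoc {
  id: string;
  text: string;
}

function selectSummaries(ordered: SummaryDoc[], budgetTokens = 2500): SummaryDoc[] {
  const estimate = (s: string) => Math.ceil(s.length / 4);
  const kept: SummaryDoc[] = [];
  let used = 0;
  // `ordered` is most-relevant-first, so the oldest/least-relevant
  // summaries fall off the end when the budget runs out.
  for (const doc of ordered) {
    const cost = estimate(doc.text);
    if (used + cost > budgetTokens) break;
    kept.push(doc);
    used += cost;
  }
  return kept;
}
```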
---
## Project-Specific Context
This methodology doc is generic. Project-specific guidance belongs in the milestone's `context.md`:
- **`.gsd/milestones/<active>/context.md`** — Architecture decisions, reference file paths, per-slice doc reading guides, implementation constraints, and any project-specific protocols (worktrees, testing, etc.)
**Always read the active milestone's `context.md` before starting implementation work.** It tells you what decisions are locked, what files to reference, and how to verify your work in this specific project.
---
## Checklist for a Fresh Session
1. Read `.gsd/state.md` — what's the next action?
2. Check for `continue.md` in the active slice — is there interrupted work?
3. If resuming: read `continue.md`, delete it, pick up from "Next Action".
4. If starting fresh: read the active slice's `plan.md`, find the next incomplete task.
5. If in a planning or research phase, read `.gsd/decisions.md` — respect existing decisions.
6. Read relevant summaries from prior tasks/slices for context.
7. Do the work.
8. Verify the must-haves.
9. Write the summary.
10. Mark done, update `state.md`, advance.
11. If context is getting full or you're done for now: write `continue.md` if mid-task, or update `state.md` with next action if between tasks.
## When Context Gets Large
If you sense context pressure (many files read, long execution, lots of tool output):
1. **If mid-task:** Write `continue.md` with exact resume state. Tell the user: "Context is getting full. I've saved progress to continue.md. Start a new session and say `read @GSD-WORKFLOW.md - what's next?`"
2. **If between tasks:** Just update `state.md` with the next action. No continue file needed — the next session will read state.md and pick up the next task cleanly.
3. **Don't fight it.** The whole system is designed for this. A fresh session with the right files loaded is better than a stale session with degraded reasoning.


@ -0,0 +1,29 @@
---
name: researcher
description: Web researcher that finds and synthesizes current information using Brave Search
tools: web_search, bash
---
You are a web researcher. You find current, accurate information using web search and synthesize it into a clear, well-structured report.
## Strategy
1. Search for the topic with 2-3 targeted queries to get breadth
2. Synthesize findings into a coherent summary
3. Cite sources with URLs
## Output format
## Summary
Brief 2-3 sentence overview.
## Key Findings
Bullet points of the most important information, each with a source URL.
## Sources
Numbered list of sources used with titles and URLs.
Be factual. Do not speculate beyond what the sources say. If results conflict, note it.


@ -0,0 +1,56 @@
---
name: scout
description: Fast codebase recon that returns compressed context for handoff to other agents
tools: read, grep, find, ls, bash
---
You are a scout. Quickly investigate a codebase and return structured findings that another agent can use without re-reading everything.
Your output will be passed to an agent who has NOT seen the files you explored.
Thoroughness (infer from task, default medium):
- Quick: Targeted lookups, key files only
- Medium: Follow imports, read critical sections
- Thorough: Trace all dependencies, check tests/types
Strategy:
1. grep/find to locate relevant code
2. Read key sections (not entire files)
3. Identify types, interfaces, key functions
4. Note dependencies between files
Output format:
## Files Retrieved
List with exact line ranges:
1. `path/to/file.ts` (lines 10-50) - Description of what's here
2. `path/to/other.ts` (lines 100-150) - Description
3. ...
## Key Code
Critical types, interfaces, or functions:
```typescript
interface Example {
// actual code from the files
}
```
```typescript
function keyFunction() {
// actual implementation
}
```
## Architecture
Brief explanation of how the pieces connect.
## Start Here
Which file to look at first and why.


@ -0,0 +1,31 @@
---
name: worker
description: General-purpose subagent with full capabilities, isolated context
---
You are a worker agent with full capabilities. You operate in an isolated context window to handle delegated tasks without polluting the main conversation.
Work autonomously to complete the assigned task. Use all available tools as needed, with one important restriction:
- Do **not** spawn subagents or act as an orchestrator unless the parent task explicitly instructs you to do so.
- If the task looks like GSD orchestration, planning, scouting, parallel dispatch, or review routing, stop and report that the caller should use the appropriate specialist agent instead (for example: `gsd-worker`, `gsd-scout`, `gsd-reviewer`, or the top-level orchestrator).
- In particular, do **not** call `gsd_scout`, `subagent`, `launch_parallel_view`, or `gsd_execute_parallel` on your own initiative.
Output format when finished:
## Completed
What was done.
## Files Changed
- `path/to/file.ts` - what changed
## Notes (if any)
Anything the main agent should know.
If handing off to another agent (e.g. reviewer), include:
- Exact file paths changed
- Key functions/types touched (short list)


@ -0,0 +1,200 @@
/**
* Request User Input LLM tool for asking the user questions
*
* Thin wrapper around the shared interview-ui. The LLM presents 1-3
* questions with 2-3 options each. Each question can be single-select (default)
* or multi-select (allowMultiple: true). A free-form "None of the above" option
* is added automatically to single-select questions.
*
* Based on: https://github.com/openai/codex (codex-rs/core/src/tools/handlers/ask_user_questions.rs)
*/
import type { ExtensionAPI } from "@mariozechner/pi-coding-agent";
import { Text } from "@mariozechner/pi-tui";
import { Type } from "@sinclair/typebox";
import {
showInterviewRound,
type Question,
type QuestionOption,
type RoundResult,
} from "./shared/interview-ui.js";
// ─── Types ────────────────────────────────────────────────────────────────────
interface AskUserQuestionsDetails {
questions: Question[];
response: RoundResult | null;
cancelled: boolean;
}
// ─── Schema ───────────────────────────────────────────────────────────────────
const OptionSchema = Type.Object({
label: Type.String({ description: "User-facing label (1-5 words)" }),
description: Type.String({ description: "One short sentence explaining impact/tradeoff if selected" }),
});
const QuestionSchema = Type.Object({
id: Type.String({ description: "Stable identifier for mapping answers (snake_case)" }),
header: Type.String({ description: "Short header label shown in the UI (12 or fewer chars)" }),
question: Type.String({ description: "Single-sentence prompt shown to the user" }),
options: Type.Array(OptionSchema, {
description:
'Provide 2-3 mutually exclusive choices for single-select, or any number for multi-select. Put the recommended option first and suffix its label with "(Recommended)". Do not include an "Other" option for single-select; the client adds a free-form "None of the above" option automatically.',
}),
allowMultiple: Type.Optional(
Type.Boolean({
description:
"If true, the user can select multiple options using SPACE to toggle and ENTER to confirm. No 'None of the above' option is added. Default: false.",
}),
),
});
const AskUserQuestionsParams = Type.Object({
questions: Type.Array(QuestionSchema, {
description: "Questions to show the user. Prefer 1 and do not exceed 3.",
}),
});
// ─── Helpers ──────────────────────────────────────────────────────────────────
const OTHER_OPTION_LABEL = "None of the above";
function errorResult(
message: string,
questions: Question[] = [],
): { content: { type: "text"; text: string }[]; details: AskUserQuestionsDetails } {
return {
content: [{ type: "text", text: message }],
details: { questions, response: null, cancelled: true },
};
}
/** Convert the shared RoundResult into the JSON the LLM expects. */
function formatForLLM(result: RoundResult): string {
const answers: Record<string, { answers: string[] }> = {};
for (const [id, answer] of Object.entries(result.answers)) {
const list: string[] = [];
if (Array.isArray(answer.selected)) {
list.push(...answer.selected);
} else {
list.push(answer.selected);
}
if (answer.notes) {
list.push(`user_note: ${answer.notes}`);
}
answers[id] = { answers: list };
}
return JSON.stringify({ answers });
}
// ─── Extension ────────────────────────────────────────────────────────────────
export default function AskUserQuestions(pi: ExtensionAPI) {
pi.registerTool({
name: "ask_user_questions",
label: "Request User Input",
description:
"Request user input for one to three short questions and wait for the response. Single-select questions have 2-3 mutually exclusive options with a free-form 'None of the above' added automatically. Multi-select questions (allowMultiple: true) let the user toggle multiple options with SPACE and confirm with ENTER.",
promptGuidelines: [
"Use ask_user_questions when you need the user to choose between concrete alternatives before proceeding.",
"Keep questions to 1 when possible; never exceed 3.",
"For single-select: each question must have 2-3 options. Put the recommended option first with '(Recommended)' suffix. Do not include an 'Other' or 'None of the above' option - the client adds one automatically.",
"For multi-select: set allowMultiple: true. The user can pick any number of options. No 'None of the above' is added.",
],
parameters: AskUserQuestionsParams,
async execute(_toolCallId, params, _signal, _onUpdate, ctx) {
// Validation
if (params.questions.length === 0 || params.questions.length > 3) {
return errorResult("Error: questions must contain 1-3 items", params.questions);
}
for (const q of params.questions) {
if (!q.options || q.options.length === 0) {
return errorResult(
`Error: ask_user_questions requires non-empty options for every question (question "${q.id}" has none)`,
params.questions,
);
}
}
if (!ctx.hasUI) {
return errorResult("Error: UI not available (non-interactive mode)", params.questions);
}
// Delegate to shared interview UI
const result = await showInterviewRound(params.questions, {}, ctx);
// Check if cancelled (empty answers = user exited)
const hasAnswers = Object.keys(result.answers).length > 0;
if (!hasAnswers) {
return {
content: [{ type: "text", text: "ask_user_questions was cancelled before receiving a response" }],
details: { questions: params.questions, response: null, cancelled: true } as AskUserQuestionsDetails,
};
}
return {
content: [{ type: "text", text: formatForLLM(result) }],
details: { questions: params.questions, response: result, cancelled: false } as AskUserQuestionsDetails,
};
},
// ─── Rendering ────────────────────────────────────────────────────────
renderCall(args, theme) {
const qs = (args.questions as Question[]) || [];
let text = theme.fg("toolTitle", theme.bold("ask_user_questions "));
text += theme.fg("muted", `${qs.length} question${qs.length !== 1 ? "s" : ""}`);
if (qs.length > 0) {
const headers = qs.map((q) => q.header).join(", ");
text += theme.fg("dim", ` (${headers})`);
}
for (const q of qs) {
const multiSel = !!q.allowMultiple;
text += `\n ${theme.fg("text", q.question)}`;
const optLabels = multiSel
? (q.options || []).map((o: QuestionOption) => o.label)
: [...(q.options || []).map((o: QuestionOption) => o.label), OTHER_OPTION_LABEL];
const prefix = multiSel ? "☐" : "";
const numbered = optLabels.map((l, i) => `${prefix}${i + 1}. ${l}`).join(", ");
text += `\n ${theme.fg("dim", numbered)}`;
}
return new Text(text, 0, 0);
},
renderResult(result, _options, theme) {
const details = result.details as AskUserQuestionsDetails | undefined;
if (!details) {
const text = result.content[0];
return new Text(text?.type === "text" ? text.text : "", 0, 0);
}
if (details.cancelled || !details.response) {
return new Text(theme.fg("warning", "Cancelled"), 0, 0);
}
const lines: string[] = [];
for (const q of details.questions) {
const answer = details.response.answers[q.id];
if (!answer) {
lines.push(`${theme.fg("accent", q.header)}: ${theme.fg("dim", "(no answer)")}`);
continue;
}
const selected = answer.selected;
const notes = answer.notes;
const multiSel = !!q.allowMultiple;
const answerText = multiSel && Array.isArray(selected)
? selected.join(", ")
: (Array.isArray(selected) ? selected[0] : selected) ?? "(no answer)";
let line = `${theme.fg("success", "✓ ")}${theme.fg("accent", q.header)}: ${answerText}`;
if (notes) {
line += ` ${theme.fg("muted", `[note: ${notes}]`)}`;
}
lines.push(line);
}
return new Text(lines.join("\n"), 0, 0);
},
});
}

File diff suppressed because it is too large
File diff suppressed because it is too large
File diff suppressed because it is too large
File diff suppressed because it is too large

@ -0,0 +1,20 @@
{
"name": "pi-browser-tools",
"private": true,
"version": "1.0.0",
"type": "module",
"scripts": {
"test": "node --test tests/*.test.mjs"
},
"pi": {
"extensions": ["./index.ts"]
},
"peerDependencies": {
"playwright": ">=1.40.0"
},
"peerDependenciesMeta": {
"playwright": {
"optional": true
}
}
}


@ -0,0 +1,428 @@
/**
* Context7 Documentation Extension
*
* Replaces the context7 MCP server with a native pi extension.
* Provides two tools for the LLM:
*
* resolve_library - Search for a library by name, returns candidates with metadata
* get_library_docs - Fetch docs for a library ID, scoped to an optional query/topic
*
* API contract (verified against live API 2026-03-04):
* Search: GET /api/v2/libs/search?libraryName=&query= { results: C7Library[] }
* Context: GET /api/v2/context?libraryId=&query=&tokens= text/plain (markdown)
*
* Features:
* - Bearer auth via CONTEXT7_API_KEY env var (optional, increases rate limits)
* - In-session caching of search results and doc pages
* - Smart token budgeting (default 5000, configurable per call, max 10000)
* - Proper truncation guard so context is never overwhelmed
* - Custom TUI rendering for clean display in pi
*
* Setup:
* export CONTEXT7_API_KEY=your_key (get one at context7.com/dashboard)
*/
import type { ExtensionAPI } from "@mariozechner/pi-coding-agent";
import {
DEFAULT_MAX_BYTES,
DEFAULT_MAX_LINES,
formatSize,
truncateHead,
} from "@mariozechner/pi-coding-agent";
import { Text } from "@mariozechner/pi-tui";
import { Type } from "@sinclair/typebox";
// ─── API types ────────────────────────────────────────────────────────────────
/** Shape returned by GET /api/v2/libs/search */
interface C7SearchResponse {
results: C7Library[];
}
interface C7Library {
id: string;
title: string;
description?: string;
branch?: string;
lastUpdateDate?: string;
state?: string;
totalTokens?: number;
totalSnippets?: number;
stars?: number;
trustScore?: number;
benchmarkScore?: number;
versions?: string[];
}
// ─── In-session cache ─────────────────────────────────────────────────────────
// Keyed by lowercased query string
const searchCache = new Map<string, C7Library[]>();
// Keyed by `${libraryId}::${query ?? ""}::${tokens}`
const docCache = new Map<string, string>();
// ─── Helpers ─────────────────────────────────────────────────────────────────
const BASE_URL = "https://context7.com/api/v2";
function getApiKey(): string | undefined {
return process.env.CONTEXT7_API_KEY;
}
function buildHeaders(): Record<string, string> {
const headers: Record<string, string> = {
"User-Agent": "pi-coding-agent/context7-extension",
};
const key = getApiKey();
if (key) headers["Authorization"] = `Bearer ${key}`;
return headers;
}
async function apiFetchJson(url: string, signal?: AbortSignal): Promise<unknown> {
const res = await fetch(url, { headers: { ...buildHeaders(), Accept: "application/json" }, signal });
if (!res.ok) {
const body = await res.text().catch(() => "");
throw new Error(`Context7 API ${res.status}: ${body.slice(0, 300)}`);
}
return res.json();
}
async function apiFetchText(url: string, signal?: AbortSignal): Promise<string> {
const res = await fetch(url, { headers: { ...buildHeaders(), Accept: "text/plain" }, signal });
if (!res.ok) {
const body = await res.text().catch(() => "");
throw new Error(`Context7 API ${res.status}: ${body.slice(0, 300)}`);
}
return res.text();
}
/**
* Format library search results into a compact, LLM-readable string.
* Each library gets a block with the key signals for picking the best match.
*/
function formatLibraryList(libs: C7Library[], query: string): string {
if (libs.length === 0) {
return `No libraries found for "${query}". Try a different name or spelling.`;
}
const lines: string[] = [
`Found ${libs.length} ${libs.length === 1 ? "library" : "libraries"} matching "${query}":\n`,
];
for (const lib of libs) {
let line = `${lib.title} (ID: ${lib.id})`;
if (lib.description) line += `\n ${lib.description}`;
const meta: string[] = [];
if (lib.trustScore !== undefined) meta.push(`trust: ${lib.trustScore}/10`);
if (lib.benchmarkScore !== undefined) meta.push(`benchmark: ${lib.benchmarkScore.toFixed(1)}`);
if (lib.totalSnippets !== undefined) meta.push(`${lib.totalSnippets.toLocaleString()} snippets`);
if (lib.totalTokens !== undefined) meta.push(`${(lib.totalTokens / 1000).toFixed(0)}k tokens`);
if (lib.lastUpdateDate) meta.push(`updated: ${lib.lastUpdateDate.split("T")[0]}`);
if (meta.length > 0) line += `\n ${meta.join(" · ")}`;
lines.push(line);
}
lines.push(
"\nUse the ID (e.g. /websites/react_dev) with get_library_docs to fetch documentation.",
);
return lines.join("\n");
}
// ─── Tool details types ───────────────────────────────────────────────────────
interface ResolveDetails {
query: string;
resultCount: number;
cached: boolean;
error?: string;
}
interface DocsDetails {
libraryId: string;
query?: string;
tokens: number;
cached: boolean;
truncated: boolean;
charCount: number;
error?: string;
}
// ─── Extension ───────────────────────────────────────────────────────────────
export default function (pi: ExtensionAPI) {
// ── resolve_library ──────────────────────────────────────────────────────
pi.registerTool({
name: "resolve_library",
label: "Resolve Library",
description:
"Search the Context7 library catalogue by name and return matching libraries with metadata. " +
"Use this to find the correct library ID before fetching documentation. " +
"Results are ranked by trustScore (0-10) and benchmarkScore — prefer the highest. " +
"If you already have a library ID (e.g. /vercel/next.js), skip this and call get_library_docs directly.",
promptSnippet: "Search Context7 for a library by name to get its ID for documentation lookup",
promptGuidelines: [
"Call resolve_library first when the user asks about a library, package, or framework you need current docs for.",
"Choose the result with the highest trustScore and benchmarkScore when multiple matches appear.",
"Pass the user's question as the query parameter — it improves result ranking.",
],
parameters: Type.Object({
libraryName: Type.String({
description:
"Library or framework name to search for, e.g. 'react', 'next.js', 'tailwindcss', 'prisma', 'langchain'",
}),
query: Type.Optional(
Type.String({
description:
"Optional: the user's question or topic. Improves search ranking. E.g. 'how do I use server actions?'",
}),
),
}),
async execute(_toolCallId, params, signal, _onUpdate, _ctx) {
const cacheKey = params.libraryName.toLowerCase().trim();
if (searchCache.has(cacheKey)) {
const cached = searchCache.get(cacheKey)!;
return {
content: [{ type: "text", text: formatLibraryList(cached, params.libraryName) }],
details: {
query: params.libraryName,
resultCount: cached.length,
cached: true,
} as ResolveDetails,
};
}
const url = new URL(`${BASE_URL}/libs/search`);
url.searchParams.set("libraryName", params.libraryName);
if (params.query) url.searchParams.set("query", params.query);
let libs: C7Library[];
try {
const data = (await apiFetchJson(url.toString(), signal)) as C7SearchResponse;
libs = Array.isArray(data?.results) ? data.results : [];
} catch (err: unknown) {
const msg = err instanceof Error ? err.message : String(err);
return {
content: [{ type: "text", text: `Context7 search failed: ${msg}` }],
isError: true,
details: { query: params.libraryName, resultCount: 0, cached: false, error: msg } as ResolveDetails,
};
}
searchCache.set(cacheKey, libs);
return {
content: [{ type: "text", text: formatLibraryList(libs, params.libraryName) }],
details: { query: params.libraryName, resultCount: libs.length, cached: false } as ResolveDetails,
};
},
renderCall(args, theme) {
let text = theme.fg("toolTitle", theme.bold("resolve_library "));
text += theme.fg("accent", `"${args.libraryName}"`);
if (args.query) text += theme.fg("muted", ` — "${args.query}"`);
return new Text(text, 0, 0);
},
renderResult(result, { isPartial }, theme) {
const d = result.details as ResolveDetails | undefined;
if (isPartial) return new Text(theme.fg("warning", "Searching Context7..."), 0, 0);
if (result.isError || d?.error) {
return new Text(theme.fg("error", `Error: ${d?.error ?? "unknown"}`), 0, 0);
}
let text = theme.fg("success", `${d?.resultCount ?? 0} ${d?.resultCount === 1 ? "library" : "libraries"} found`);
if (d?.cached) text += theme.fg("dim", " (cached)");
text += theme.fg("dim", ` for "${d?.query}"`);
return new Text(text, 0, 0);
},
});
// ── get_library_docs ─────────────────────────────────────────────────────
pi.registerTool({
name: "get_library_docs",
label: "Get Library Docs",
description:
"Fetch up-to-date documentation from Context7 for a specific library. " +
"Pass the library ID from resolve_library (e.g. /websites/react_dev) and a focused topic query " +
"to get the most relevant snippets. " +
"The tokens parameter controls how much documentation to retrieve (default 5000, max 10000). " +
"A specific query (e.g. 'server actions form submission') returns better results than a broad one.",
promptSnippet: "Fetch up-to-date, version-specific documentation for a library from Context7",
promptGuidelines: [
"Use a specific topic query for best results — e.g. 'useEffect cleanup' not just 'hooks'.",
"Start with tokens=5000. Increase to 10000 only if the first response lacks the detail you need.",
"Results are cached per-session — repeated calls for the same library+query have no API cost.",
],
parameters: Type.Object({
libraryId: Type.String({
description:
"Context7 library ID from resolve_library, e.g. /websites/react_dev or /vercel/next.js",
}),
query: Type.Optional(
Type.String({
description:
"Specific topic to focus the docs on, e.g. 'server actions', 'useEffect cleanup', 'authentication middleware'. More specific = better results.",
}),
),
tokens: Type.Optional(
Type.Number({
description: "Max tokens of documentation to return (default 5000, max 10000).",
minimum: 500,
maximum: 10000,
}),
),
}),
async execute(_toolCallId, params, signal, _onUpdate, _ctx) {
const tokens = Math.min(Math.max(params.tokens ?? 5000, 500), 10000);
// Strip accidental leading @ that some models inject
const libraryId = params.libraryId.startsWith("@")
? params.libraryId.slice(1)
: params.libraryId;
const query = params.query?.trim() || undefined;
const cacheKey = `${libraryId}::${query ?? ""}::${tokens}`;
if (docCache.has(cacheKey)) {
const cached = docCache.get(cacheKey)!;
return {
content: [{ type: "text", text: cached }],
details: {
libraryId,
query,
tokens,
cached: true,
truncated: false,
charCount: cached.length,
} as DocsDetails,
};
}
const url = new URL(`${BASE_URL}/context`);
url.searchParams.set("libraryId", libraryId);
if (query) url.searchParams.set("query", query);
url.searchParams.set("tokens", String(tokens));
let rawText: string;
try {
rawText = await apiFetchText(url.toString(), signal);
} catch (err: unknown) {
const msg = err instanceof Error ? err.message : String(err);
return {
content: [{ type: "text", text: `Context7 doc fetch failed: ${msg}` }],
isError: true,
details: {
libraryId,
query,
tokens,
cached: false,
truncated: false,
charCount: 0,
error: msg,
} as DocsDetails,
};
}
if (!rawText.trim()) {
const notFound = query
? `No documentation found for "${query}" in ${libraryId}. Try a broader query or different library ID.`
: `No documentation found for ${libraryId}. Try resolve_library to verify the library ID.`;
return {
content: [{ type: "text", text: notFound }],
details: {
libraryId,
query,
tokens,
cached: false,
truncated: false,
charCount: 0,
} as DocsDetails,
};
}
// Truncation guard — Context7 already respects the token budget, but be defensive
const truncation = truncateHead(rawText, {
maxLines: DEFAULT_MAX_LINES,
maxBytes: DEFAULT_MAX_BYTES,
});
let finalText = truncation.content;
if (truncation.truncated) {
finalText +=
`\n\n[Truncated: showing ${truncation.outputLines}/${truncation.totalLines} lines` +
` (${formatSize(truncation.outputBytes)} of ${formatSize(truncation.totalBytes)}).` +
` Use a more specific query to reduce output size.]`;
}
docCache.set(cacheKey, finalText);
return {
content: [{ type: "text", text: finalText }],
details: {
libraryId,
query,
tokens,
cached: false,
truncated: truncation.truncated,
charCount: finalText.length,
} as DocsDetails,
};
},
renderCall(args, theme) {
let text = theme.fg("toolTitle", theme.bold("get_library_docs "));
text += theme.fg("accent", args.libraryId);
if (args.query) text += theme.fg("muted", ` — "${args.query}"`);
if (args.tokens && args.tokens !== 5000) text += theme.fg("dim", ` (${args.tokens} tokens)`);
return new Text(text, 0, 0);
},
renderResult(result, { isPartial, expanded }, theme) {
const d = result.details as DocsDetails | undefined;
if (isPartial) return new Text(theme.fg("warning", "Fetching documentation..."), 0, 0);
if (result.isError || d?.error) {
return new Text(theme.fg("error", `Error: ${d?.error ?? "unknown"}`), 0, 0);
}
let text = theme.fg("success", `${(d?.charCount ?? 0).toLocaleString()} chars`);
text += theme.fg("dim", ` · ${d?.tokens ?? 5000} token budget`);
if (d?.cached) text += theme.fg("dim", " · cached");
if (d?.truncated) text += theme.fg("warning", " · truncated");
text += theme.fg("dim", ` · ${d?.libraryId}`);
if (d?.query) text += theme.fg("dim", ` — "${d.query}"`);
if (expanded) {
const content = result.content[0];
if (content?.type === "text") {
const preview = content.text.split("\n").slice(0, 12).join("\n");
text += "\n\n" + theme.fg("dim", preview);
if (content.text.split("\n").length > 12) {
text += "\n" + theme.fg("muted", "… (Ctrl+O to collapse)");
}
}
}
return new Text(text, 0, 0);
},
});
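For illustration, the input normalization that `execute()` applies above (token clamp, leading-`@` strip, session cache key) can be isolated as a pure function. This is a sketch mirroring that logic; the helper name `normalizeDocsParams` is illustrative and not part of the extension:

```typescript
// Mirrors get_library_docs' normalization: clamp tokens to [500, 10000],
// strip an accidental leading "@" from the library ID, and build the
// session cache key used for deduplication.
function normalizeDocsParams(libraryId: string, query?: string, tokens?: number) {
  const clamped = Math.min(Math.max(tokens ?? 5000, 500), 10000);
  const id = libraryId.startsWith("@") ? libraryId.slice(1) : libraryId;
  const q = query?.trim() || undefined;
  return { libraryId: id, query: q, tokens: clamped, cacheKey: `${id}::${q ?? ""}::${clamped}` };
}
```

A whitespace-only query normalizes to `undefined`, so it shares a cache entry with an omitted query.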
// ── Startup notification ─────────────────────────────────────────────────
pi.on("session_start", async (_event, ctx) => {
if (!getApiKey()) {
ctx.ui.notify(
"Context7: No CONTEXT7_API_KEY set. Using free tier (1000 req/month limit). " +
"Set CONTEXT7_API_KEY for higher limits.",
"warning",
);
}
});
}


@@ -0,0 +1,11 @@
{
"name": "pi-extension-context7",
"private": true,
"version": "1.0.0",
"type": "module",
"pi": {
"extensions": [
"./index.ts"
]
}
}


@@ -0,0 +1,352 @@
/**
 * get-secrets-from-user: paged secure env var collection + apply
*
* Collects secrets one-per-page via masked TUI input, then writes them
* to .env (local), Vercel, or Convex. No ctx.callTool, no external deps.
* Uses Node fs/promises for file I/O and pi.exec() for CLI sinks.
*/
import { readFile, writeFile } from "node:fs/promises";
import { resolve } from "node:path";
import type { ExtensionAPI } from "@mariozechner/pi-coding-agent";
import { CURSOR_MARKER, Editor, type EditorTheme, Key, matchesKey, Text, truncateToWidth } from "@mariozechner/pi-tui";
import { Type } from "@sinclair/typebox";
// ─── Types ────────────────────────────────────────────────────────────────────
interface CollectedSecret {
key: string;
value: string | null; // null = skipped
}
interface ToolResultDetails {
destination: string;
environment?: string;
applied: string[];
skipped: string[];
}
// ─── Helpers ──────────────────────────────────────────────────────────────────
function maskPreview(value: string): string {
if (!value) return "";
if (value.length <= 8) return "*".repeat(value.length);
return `${value.slice(0, 4)}${"*".repeat(Math.max(4, value.length - 8))}${value.slice(-4)}`;
}
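A worked example of the masking rule above (function reproduced verbatim so the snippet stands alone): values of 8 characters or fewer are fully starred; longer values keep the first and last 4 characters with at least 4 stars between them.

```typescript
// Reproduced from maskPreview above so this snippet is self-contained.
function maskPreview(value: string): string {
  if (!value) return "";
  if (value.length <= 8) return "*".repeat(value.length);
  return `${value.slice(0, 4)}${"*".repeat(Math.max(4, value.length - 8))}${value.slice(-4)}`;
}

maskPreview("sk-ir8f2kqzdgdh"); // "sk-i*******dgdh" (7 stars, ends preserved)
maskPreview("short");           // "*****" (fully masked)
```

Note that values just over the 8-character threshold still reveal their first and last 4 characters, so the preview is least protective for 9- to 12-character secrets.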
/**
 * Replace the editor's visible text with masked characters while preserving
 * the cursor marker and ANSI escape sequences.
 */
function maskEditorLine(line: string): string {
// Keep border / metadata lines readable.
if (line.startsWith("─")) {
return line;
}
let output = "";
let i = 0;
while (i < line.length) {
if (line.startsWith(CURSOR_MARKER, i)) {
output += CURSOR_MARKER;
i += CURSOR_MARKER.length;
continue;
}
const ansiMatch = /^\x1b\[[0-9;]*m/.exec(line.slice(i));
if (ansiMatch) {
output += ansiMatch[0];
i += ansiMatch[0].length;
continue;
}
const ch = line[i] as string;
output += ch === " " ? " " : "*";
i += 1;
}
return output;
}
function shellEscapeSingle(value: string): string {
return `'${value.replace(/'/g, `'\\''`)}'`;
}
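The POSIX single-quote escaping above is easy to misread, so here is a worked example (function reproduced so the snippet runs standalone):

```typescript
// Reproduced from shellEscapeSingle above. Each embedded ' becomes '\''
// (close quote, backslash-escaped quote, reopen quote), and the whole
// value is wrapped in single quotes.
function shellEscapeSingle(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}
```

So `it's` becomes `'it'\''s'`, which a POSIX shell reads back as the literal string `it's`.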
async function writeEnvKey(filePath: string, key: string, value: string): Promise<void> {
let content = "";
try {
content = await readFile(filePath, "utf8");
} catch {
content = "";
}
const escaped = value.replace(/\\/g, "\\\\").replace(/\n/g, "\\n").replace(/\r/g, "");
const line = `${key}=${escaped}`;
const regex = new RegExp(`^${key.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")}\\s*=.*$`, "m");
if (regex.test(content)) {
content = content.replace(regex, line);
} else {
if (content.length > 0 && !content.endsWith("\n")) content += "\n";
content += `${line}\n`;
}
await writeFile(filePath, content, "utf8");
}
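The read-modify-write above reduces to a pure upsert on the file's text. The sketch below isolates that step with the same escaping and key-anchored regex as `writeEnvKey`; the helper name `upsertEnvLine` is hypothetical, for illustration only:

```typescript
// Same escaping and regex as writeEnvKey, applied to an in-memory string
// instead of a file: replace an existing KEY=... line, else append one.
function upsertEnvLine(content: string, key: string, value: string): string {
  const escaped = value.replace(/\\/g, "\\\\").replace(/\n/g, "\\n").replace(/\r/g, "");
  const line = `${key}=${escaped}`;
  const regex = new RegExp(`^${key.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")}\\s*=.*$`, "m");
  if (regex.test(content)) return content.replace(regex, line);
  const sep = content.length > 0 && !content.endsWith("\n") ? "\n" : "";
  return `${content}${sep}${line}\n`;
}
```

The `m` flag anchors `^`/`$` per line, so only a whole-line `KEY=` match is replaced and other keys are left untouched.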
// ─── Paged secure input UI ────────────────────────────────────────────────────
/**
* Show a single-key masked input page via ctx.ui.custom().
* Returns the entered value, or null if skipped/cancelled.
*/
async function collectOneSecret(
ctx: { ui: any; hasUI: boolean },
pageIndex: number,
totalPages: number,
keyName: string,
hint: string | undefined,
): Promise<string | null> {
if (!ctx.hasUI) return null;
return ctx.ui.custom<string | null>((tui: any, theme: any, _kb: any, done: (r: string | null) => void) => {
let value = "";
let cachedLines: string[] | undefined;
const editorTheme: EditorTheme = {
borderColor: (s: string) => theme.fg("accent", s),
selectList: {
selectedPrefix: (t: string) => theme.fg("accent", t),
selectedText: (t: string) => theme.fg("accent", t),
description: (t: string) => theme.fg("muted", t),
scrollInfo: (t: string) => theme.fg("dim", t),
noMatch: (t: string) => theme.fg("warning", t),
},
};
const editor = new Editor(tui, editorTheme, { paddingX: 1 });
function refresh() {
cachedLines = undefined;
tui.requestRender();
}
function handleInput(data: string) {
if (matchesKey(data, Key.enter)) {
value = editor.getText().trim();
done(value.length > 0 ? value : null);
return;
}
if (matchesKey(data, Key.escape)) {
done(null);
return;
}
// ctrl+s = skip this key
if (data === "\x13") {
done(null);
return;
}
editor.handleInput(data);
refresh();
}
function render(width: number): string[] {
if (cachedLines) return cachedLines;
const lines: string[] = [];
const add = (s: string) => lines.push(truncateToWidth(s, width));
add(theme.fg("accent", "─".repeat(width)));
add(theme.fg("dim", ` Page ${pageIndex + 1}/${totalPages} · Secure Env Setup`));
lines.push("");
// Key name as big header
add(theme.fg("accent", theme.bold(` ${keyName}`)));
if (hint) {
add(theme.fg("muted", ` ${hint}`));
}
lines.push("");
// Masked preview
const raw = editor.getText();
const preview = raw.length > 0 ? maskPreview(raw) : theme.fg("dim", "(empty — press enter to skip)");
add(theme.fg("text", ` Preview: ${preview}`));
lines.push("");
// Editor
add(theme.fg("muted", " Enter value:"));
for (const line of editor.render(width - 2)) {
add(theme.fg("text", maskEditorLine(line)));
}
lines.push("");
      add(theme.fg("dim", ` enter to confirm | ctrl+s or esc to skip`));
add(theme.fg("accent", "─".repeat(width)));
cachedLines = lines;
return lines;
}
return {
render,
invalidate: () => { cachedLines = undefined; },
handleInput,
};
});
}
// ─── Extension ────────────────────────────────────────────────────────────────
export default function secureEnv(pi: ExtensionAPI) {
pi.registerTool({
name: "secure_env_collect",
label: "Secure Env Collect",
description:
"Collect one or more env vars through a paged masked-input UI, then write them to .env, Vercel, or Convex. " +
"Values are shown masked to the user (e.g. sk-ir***dgdh) and never echoed in tool output.",
promptSnippet: "Collect and apply env vars securely without asking user to edit files manually.",
promptGuidelines: [
"NEVER ask the user to manually edit .env files, copy-paste into a terminal, or open a dashboard to set env vars. Always use secure_env_collect instead.",
"When a command fails due to a missing env var (e.g. 'OPENAI_API_KEY is not set', 'Missing required environment variable', 'Invalid API key', 'authentication required'), immediately call secure_env_collect with the missing keys before retrying.",
"When starting a new project or running setup steps that require secrets (API keys, tokens, database URLs), proactively call secure_env_collect before the first command that needs them.",
"Detect the right destination: use 'dotenv' for local dev, 'vercel' when deploying to Vercel, 'convex' when using Convex backend.",
"After secure_env_collect completes, re-run the originally blocked command to verify the fix worked.",
"Never echo, log, or repeat secret values in your responses. Only report key names and applied/skipped status.",
],
parameters: Type.Object({
destination: Type.Union([
Type.Literal("dotenv"),
Type.Literal("vercel"),
Type.Literal("convex"),
], { description: "Where to write the collected secrets" }),
keys: Type.Array(
Type.Object({
key: Type.String({ description: "Env var name, e.g. OPENAI_API_KEY" }),
hint: Type.Optional(Type.String({ description: "Format hint shown to user, e.g. 'starts with sk-'" })),
required: Type.Optional(Type.Boolean()),
}),
{ minItems: 1 },
),
envFilePath: Type.Optional(Type.String({ description: "Path to .env file (dotenv only). Defaults to .env in cwd." })),
environment: Type.Optional(
Type.Union([
Type.Literal("development"),
Type.Literal("preview"),
Type.Literal("production"),
], { description: "Target environment (vercel only)" }),
),
}),
async execute(_toolCallId, params, _signal, _onUpdate, ctx) {
if (!ctx.hasUI) {
return {
content: [{ type: "text", text: "Error: UI not available (interactive mode required for secure env collection)." }],
isError: true,
};
}
      // Keys are interpolated into shell commands below; reject anything that
      // is not a plain env-var identifier to prevent shell injection.
      const badKeys = params.keys.filter((k) => !/^[A-Za-z_][A-Za-z0-9_]*$/.test(k.key));
      if (badKeys.length > 0) {
        return {
          content: [{ type: "text", text: `Error: invalid env var name(s): ${badKeys.map((k) => k.key).join(", ")}` }],
          isError: true,
        };
      }
      const collected: CollectedSecret[] = [];
      // Collect one key per page
for (let i = 0; i < params.keys.length; i++) {
const item = params.keys[i];
const value = await collectOneSecret(ctx, i, params.keys.length, item.key, item.hint);
collected.push({ key: item.key, value });
}
const provided = collected.filter((c) => c.value !== null) as Array<{ key: string; value: string }>;
const skipped = collected.filter((c) => c.value === null).map((c) => c.key);
const applied: string[] = [];
const errors: string[] = [];
// Apply to destination
if (params.destination === "dotenv") {
const filePath = resolve(ctx.cwd, params.envFilePath ?? ".env");
for (const { key, value } of provided) {
try {
await writeEnvKey(filePath, key, value);
applied.push(key);
} catch (err: any) {
errors.push(`${key}: ${err.message}`);
}
}
}
if (params.destination === "vercel") {
const env = params.environment ?? "development";
for (const { key, value } of provided) {
try {
const result = await pi.exec("sh", [
"-c",
`printf %s ${shellEscapeSingle(value)} | vercel env add ${key} ${env}`,
]);
if (result.code !== 0) {
errors.push(`${key}: ${result.stderr.slice(0, 200)}`);
} else {
applied.push(key);
}
} catch (err: any) {
errors.push(`${key}: ${err.message}`);
}
}
}
if (params.destination === "convex") {
for (const { key, value } of provided) {
try {
const result = await pi.exec("sh", [
"-c",
`npx convex env set ${key} ${shellEscapeSingle(value)}`,
]);
if (result.code !== 0) {
errors.push(`${key}: ${result.stderr.slice(0, 200)}`);
} else {
applied.push(key);
}
} catch (err: any) {
errors.push(`${key}: ${err.message}`);
}
}
}
const details: ToolResultDetails = {
destination: params.destination,
environment: params.environment,
applied,
skipped,
};
const lines = [
`destination: ${params.destination}${params.environment ? ` (${params.environment})` : ""}`,
...applied.map((k) => `${k}: applied`),
...skipped.map((k) => `${k}: skipped`),
...errors.map((e) => `${e}`),
];
return {
content: [{ type: "text", text: lines.join("\n") }],
details,
isError: errors.length > 0 && applied.length === 0,
};
},
renderCall(args, theme) {
const count = Array.isArray(args.keys) ? args.keys.length : 0;
return new Text(
theme.fg("toolTitle", theme.bold("secure_env_collect ")) +
theme.fg("muted", `${args.destination}`) +
theme.fg("dim", ` ${count} key${count !== 1 ? "s" : ""}`),
0, 0,
);
},
renderResult(result, _options, theme) {
const details = result.details as ToolResultDetails | undefined;
if (!details) {
const t = result.content[0];
return new Text(t?.type === "text" ? t.text : "", 0, 0);
}
const lines = [
`${theme.fg("success", "✓")} ${details.destination}${details.environment ? ` (${details.environment})` : ""}`,
...details.applied.map((k) => ` ${theme.fg("success", "✓")} ${k}: applied`),
...details.skipped.map((k) => ` ${theme.fg("warning", "•")} ${k}: skipped`),
];
return new Text(lines.join("\n"), 0, 0);
},
});
}


@@ -0,0 +1,69 @@
/**
 * GSD Activity Log: save raw chat sessions to .gsd/activity/
*
* Before each context wipe in auto-mode, dumps the full session
* as JSONL. No formatting, no truncation, no information loss.
* These are debug artifacts only read when summaries aren't enough.
*
* Diagnostic extraction is handled by session-forensics.ts.
*/
import { writeFileSync, mkdirSync, readdirSync, unlinkSync, statSync } from "node:fs";
import { join } from "node:path";
import type { ExtensionContext } from "@mariozechner/pi-coding-agent";
import { gsdRoot } from "./paths.js";
export function saveActivityLog(
ctx: ExtensionContext,
basePath: string,
unitType: string,
unitId: string,
): void {
try {
const entries = ctx.sessionManager.getEntries();
if (!entries || entries.length === 0) return;
const activityDir = join(gsdRoot(basePath), "activity");
mkdirSync(activityDir, { recursive: true });
// Next sequence number
let maxSeq = 0;
try {
for (const f of readdirSync(activityDir)) {
const match = f.match(/^(\d+)-/);
if (match) maxSeq = Math.max(maxSeq, parseInt(match[1], 10));
}
} catch { /* empty dir */ }
const seq = String(maxSeq + 1).padStart(3, "0");
const safeUnitId = unitId.replace(/\//g, "-");
const fileName = `${seq}-${unitType}-${safeUnitId}.jsonl`;
const filePath = join(activityDir, fileName);
const lines = entries.map(entry => JSON.stringify(entry));
writeFileSync(filePath, lines.join("\n") + "\n", "utf-8");
} catch {
// Don't let logging failures break auto-mode
}
}
export function pruneActivityLogs(activityDir: string, retentionDays: number): void {
try {
const files = readdirSync(activityDir);
const entries: { seq: number; filePath: string }[] = [];
for (const f of files) {
const match = f.match(/^(\d+)-/);
if (match) entries.push({ seq: parseInt(match[1], 10), filePath: join(activityDir, f) });
}
if (entries.length === 0) return;
const maxSeq = Math.max(...entries.map(e => e.seq));
const cutoff = Date.now() - retentionDays * 86_400_000;
for (const entry of entries) {
if (entry.seq === maxSeq) continue; // always preserve highest-seq
try {
const mtime = statSync(entry.filePath).mtimeMs;
if (Math.floor(mtime) <= cutoff) unlinkSync(entry.filePath);
} catch { /* file vanished or stat failed — skip */ }
}
} catch { /* empty dir or readdirSync failure — skip */ }
}
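The retention rule above can be stated as pure logic: a log is deleted only when it is not the newest (highest sequence number) and its mtime is at or before the cutoff. The helper below is a hypothetical restatement for illustration; the real function does the filesystem walk:

```typescript
// Pure form of pruneActivityLogs' deletion decision. The newest log always
// survives regardless of age, so at least one debug artifact remains.
function shouldPrune(seq: number, maxSeq: number, mtimeMs: number, cutoffMs: number): boolean {
  if (seq === maxSeq) return false; // newest log always survives
  return Math.floor(mtimeMs) <= cutoffMs;
}
```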

File diff suppressed because it is too large


@@ -0,0 +1,292 @@
/**
 * GSD Command: /gsd
*
* One command, one wizard. Routes to smart entry or status.
*/
import type { ExtensionAPI, ExtensionCommandContext } from "@mariozechner/pi-coding-agent";
import { existsSync, readFileSync } from "node:fs";
import { join, dirname } from "node:path";
import { fileURLToPath } from "node:url";
import { deriveState } from "./state.js";
import { GSDDashboardOverlay } from "./dashboard-overlay.js";
import { showSmartEntry, showQueue, showDiscuss } from "./guided-flow.js";
import { startAuto, stopAuto, isAutoActive, isAutoPaused } from "./auto.js";
import {
getGlobalGSDPreferencesPath,
getLegacyGlobalGSDPreferencesPath,
getProjectGSDPreferencesPath,
loadGlobalGSDPreferences,
loadProjectGSDPreferences,
loadEffectiveGSDPreferences,
resolveAllSkillReferences,
} from "./preferences.js";
import { loadFile, saveFile } from "./files.js";
import {
formatDoctorIssuesForPrompt,
formatDoctorReport,
runGSDDoctor,
selectDoctorScope,
filterDoctorIssues,
} from "./doctor.js";
import { loadPrompt } from "./prompt-loader.js";
import { getSuggestedNextCommands } from "./workspace-index.js";
function dispatchDoctorHeal(pi: ExtensionAPI, scope: string | undefined, reportText: string, structuredIssues: string): void {
const workflowPath = process.env.GSD_WORKFLOW_PATH ?? join(process.env.HOME ?? "~", ".pi", "GSD-WORKFLOW.md");
const workflow = readFileSync(workflowPath, "utf-8");
const prompt = loadPrompt("doctor-heal", {
doctorSummary: reportText,
structuredIssues,
scopeLabel: scope ?? "active milestone / blocking scope",
doctorCommandSuffix: scope ? ` ${scope}` : "",
});
const content = `Read the following GSD workflow protocol and execute exactly.\n\n${workflow}\n\n## Your Task\n\n${prompt}`;
pi.sendMessage(
{ customType: "gsd-doctor-heal", content, display: false },
{ triggerTurn: true },
);
}
export function registerGSDCommand(pi: ExtensionAPI): void {
pi.registerCommand("gsd", {
    description: "GSD — Get Stuff Done: /gsd auto|stop|status|queue|discuss|prefs|doctor",
getArgumentCompletions: (prefix: string) => {
const subcommands = ["auto", "stop", "status", "queue", "discuss", "prefs", "doctor"];
const parts = prefix.trim().split(/\s+/);
if (parts.length <= 1) {
return subcommands
.filter((cmd) => cmd.startsWith(parts[0] ?? ""))
.map((cmd) => ({ value: cmd, label: cmd }));
}
if (parts[0] === "auto" && parts.length <= 2) {
const flagPrefix = parts[1] ?? "";
return ["--verbose"]
.filter((f) => f.startsWith(flagPrefix))
.map((f) => ({ value: `auto ${f}`, label: f }));
}
if (parts[0] === "prefs" && parts.length <= 2) {
const subPrefix = parts[1] ?? "";
return ["global", "project", "status"]
.filter((cmd) => cmd.startsWith(subPrefix))
.map((cmd) => ({ value: `prefs ${cmd}`, label: cmd }));
}
if (parts[0] === "doctor") {
const modePrefix = parts[1] ?? "";
const modes = ["fix", "heal", "audit"];
if (parts.length <= 2) {
return modes
.filter((cmd) => cmd.startsWith(modePrefix))
.map((cmd) => ({ value: `doctor ${cmd}`, label: cmd }));
}
return [];
}
return [];
},
async handler(args: string, ctx: ExtensionCommandContext) {
const trimmed = (typeof args === "string" ? args : "").trim();
if (trimmed === "status") {
await handleStatus(ctx);
return;
}
if (trimmed === "prefs" || trimmed.startsWith("prefs ")) {
await handlePrefs(trimmed.replace(/^prefs\s*/, "").trim(), ctx);
return;
}
if (trimmed === "doctor" || trimmed.startsWith("doctor ")) {
await handleDoctor(trimmed.replace(/^doctor\s*/, "").trim(), ctx, pi);
return;
}
if (trimmed === "auto" || trimmed.startsWith("auto ")) {
const verboseMode = trimmed.includes("--verbose");
await startAuto(ctx, pi, process.cwd(), verboseMode);
return;
}
if (trimmed === "stop") {
if (!isAutoActive() && !isAutoPaused()) {
ctx.ui.notify("Auto-mode is not running.", "info");
return;
}
await stopAuto(ctx, pi);
return;
}
if (trimmed === "queue") {
await showQueue(ctx, pi, process.cwd());
return;
}
if (trimmed === "discuss") {
await showDiscuss(ctx, pi, process.cwd());
return;
}
if (trimmed === "") {
await showSmartEntry(ctx, pi, process.cwd());
const next = await getSuggestedNextCommands(process.cwd());
if (next.length > 0) {
ctx.ui.notify(`Likely next: ${next.join(" · ")}`, "info");
}
return;
}
ctx.ui.notify(
`Unknown: /gsd ${trimmed}. Use /gsd, /gsd auto, /gsd stop, /gsd status, /gsd queue, /gsd discuss, /gsd prefs [global|project|status], or /gsd doctor [audit|fix|heal] [M###/S##].`,
"warning",
);
},
});
}
async function handleStatus(ctx: ExtensionCommandContext): Promise<void> {
const basePath = process.cwd();
const state = await deriveState(basePath);
if (state.registry.length === 0) {
ctx.ui.notify("No GSD milestones found. Run /gsd to start.", "info");
return;
}
await ctx.ui.custom<void>(
(tui, theme, _kb, done) => {
return new GSDDashboardOverlay(tui, theme, () => done());
},
{
overlay: true,
overlayOptions: {
width: "70%",
minWidth: 60,
maxHeight: "90%",
anchor: "center",
},
},
);
}
export async function fireStatusViaCommand(
ctx: import("@mariozechner/pi-coding-agent").ExtensionContext,
): Promise<void> {
await handleStatus(ctx as ExtensionCommandContext);
}
async function handlePrefs(args: string, ctx: ExtensionCommandContext): Promise<void> {
const trimmed = args.trim();
if (trimmed === "" || trimmed === "global") {
await ensurePreferencesFile(getGlobalGSDPreferencesPath(), ctx, "global");
return;
}
if (trimmed === "project") {
await ensurePreferencesFile(getProjectGSDPreferencesPath(), ctx, "project");
return;
}
if (trimmed === "status") {
const globalPrefs = loadGlobalGSDPreferences();
const projectPrefs = loadProjectGSDPreferences();
const canonicalGlobal = getGlobalGSDPreferencesPath();
const legacyGlobal = getLegacyGlobalGSDPreferencesPath();
const globalStatus = globalPrefs
? `present: ${globalPrefs.path}${globalPrefs.path === legacyGlobal ? " (legacy fallback)" : ""}`
: `missing: ${canonicalGlobal}`;
const projectStatus = projectPrefs ? `present: ${projectPrefs.path}` : `missing: ${getProjectGSDPreferencesPath()}`;
const lines = [`GSD skill prefs — global ${globalStatus}; project ${projectStatus}`];
const effective = loadEffectiveGSDPreferences();
let hasUnresolved = false;
if (effective) {
const report = resolveAllSkillReferences(effective.preferences, process.cwd());
const resolved = [...report.resolutions.values()].filter(r => r.method !== "unresolved");
hasUnresolved = report.warnings.length > 0;
if (resolved.length > 0 || hasUnresolved) {
lines.push(`Skills: ${resolved.length} resolved, ${report.warnings.length} unresolved`);
}
if (hasUnresolved) {
lines.push(`Unresolved: ${report.warnings.join(", ")}`);
}
}
ctx.ui.notify(lines.join("\n"), hasUnresolved ? "warning" : "info");
return;
}
ctx.ui.notify("Usage: /gsd prefs [global|project|status]", "info");
}
async function handleDoctor(args: string, ctx: ExtensionCommandContext, pi: ExtensionAPI): Promise<void> {
const trimmed = args.trim();
const parts = trimmed ? trimmed.split(/\s+/) : [];
const mode = parts[0] === "fix" || parts[0] === "heal" || parts[0] === "audit" ? parts[0] : "doctor";
const requestedScope = mode === "doctor" ? parts[0] : parts[1];
const scope = await selectDoctorScope(process.cwd(), requestedScope);
const effectiveScope = mode === "audit" ? requestedScope : scope;
const report = await runGSDDoctor(process.cwd(), {
fix: mode === "fix" || mode === "heal",
scope: effectiveScope,
});
const reportText = formatDoctorReport(report, {
scope: effectiveScope,
includeWarnings: mode === "audit",
maxIssues: mode === "audit" ? 50 : 12,
title: mode === "audit" ? "GSD doctor audit." : mode === "heal" ? "GSD doctor heal prep." : undefined,
});
ctx.ui.notify(reportText, report.ok ? "info" : "warning");
if (mode === "heal") {
const unresolved = filterDoctorIssues(report.issues, {
scope: effectiveScope,
includeWarnings: true,
});
const actionable = unresolved.filter(issue => issue.severity === "error" || issue.code === "all_tasks_done_missing_slice_uat" || issue.code === "slice_checked_missing_uat");
if (actionable.length === 0) {
ctx.ui.notify("Doctor heal found nothing actionable to hand off to the LLM.", "info");
return;
}
const structuredIssues = formatDoctorIssuesForPrompt(actionable);
dispatchDoctorHeal(pi, effectiveScope, reportText, structuredIssues);
ctx.ui.notify(`Doctor heal dispatched ${actionable.length} issue(s) to the LLM.`, "info");
}
}
async function ensurePreferencesFile(
path: string,
ctx: ExtensionCommandContext,
scope: "global" | "project",
): Promise<void> {
if (!existsSync(path)) {
const template = await loadFile(join(dirname(fileURLToPath(import.meta.url)), "templates", "preferences.md"));
if (!template) {
ctx.ui.notify("Could not load GSD preferences template.", "error");
return;
}
await saveFile(path, template);
ctx.ui.notify(`Created ${scope} GSD skill preferences at ${path}`, "info");
} else {
ctx.ui.notify(`Using existing ${scope} GSD skill preferences at ${path}`, "info");
}
await ctx.waitForIdle();
await ctx.reload();
ctx.ui.notify(`Edit ${path} to update ${scope} GSD skill preferences.`, "info");
}


@@ -0,0 +1,85 @@
/**
* GSD Crash Recovery
*
* Detects interrupted auto-mode sessions via a lock file.
* Written on auto-start, updated on each unit dispatch, deleted on clean stop.
* If the lock file exists on next startup, the previous session crashed.
*
* The lock records the pi session file path so crash recovery can read the
* surviving JSONL (pi appends entries incrementally via appendFileSync,
* so the file on disk reflects every tool call up to the crash point).
*/
import { writeFileSync, readFileSync, unlinkSync, existsSync } from "node:fs";
import { join } from "node:path";
import { gsdRoot } from "./paths.js";
const LOCK_FILE = "auto.lock";
export interface LockData {
pid: number;
startedAt: string;
unitType: string;
unitId: string;
unitStartedAt: string;
completedUnits: number;
/** Path to the pi session JSONL file that was active when this unit started. */
sessionFile?: string;
}
function lockPath(basePath: string): string {
return join(gsdRoot(basePath), LOCK_FILE);
}
/** Write or update the lock file with current auto-mode state. */
export function writeLock(
basePath: string,
unitType: string,
unitId: string,
completedUnits: number,
sessionFile?: string,
): void {
try {
    // Preserve the session's original startedAt across per-unit updates;
    // only unitStartedAt should move forward with each dispatch.
    const prev = readCrashLock(basePath);
    const data: LockData = {
      pid: process.pid,
      startedAt: prev?.startedAt ?? new Date().toISOString(),
      unitType,
      unitId,
      unitStartedAt: new Date().toISOString(),
      completedUnits,
      sessionFile,
    };
writeFileSync(lockPath(basePath), JSON.stringify(data, null, 2), "utf-8");
} catch { /* non-fatal */ }
}
/** Remove the lock file on clean stop. */
export function clearLock(basePath: string): void {
try {
const p = lockPath(basePath);
if (existsSync(p)) unlinkSync(p);
} catch { /* non-fatal */ }
}
/** Check if a crash lock exists and return its data. */
export function readCrashLock(basePath: string): LockData | null {
try {
const p = lockPath(basePath);
if (!existsSync(p)) return null;
const raw = readFileSync(p, "utf-8");
return JSON.parse(raw) as LockData;
} catch {
return null;
}
}
/** Format crash info for display or injection into a prompt. */
export function formatCrashInfo(lock: LockData): string {
return [
`Previous auto-mode session was interrupted.`,
` Was executing: ${lock.unitType} (${lock.unitId})`,
` Started at: ${lock.unitStartedAt}`,
` Units completed before crash: ${lock.completedUnits}`,
` PID: ${lock.pid}`,
].join("\n");
}
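At startup, crash detection reduces to: a surviving lock means the last run did not stop cleanly. A minimal sketch of that decision; `describeCrash` is a hypothetical helper mirroring the `readCrashLock` plus `formatCrashInfo` flow above:

```typescript
interface CrashLock { unitType: string; unitId: string; completedUnits: number; }

// A null lock means the previous session stopped cleanly (clearLock ran);
// otherwise summarize what was in flight when the process died.
function describeCrash(lock: CrashLock | null): string | null {
  if (lock === null) return null;
  return `Interrupted while executing ${lock.unitType} (${lock.unitId}) after ${lock.completedUnits} completed unit(s).`;
}
```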


@@ -0,0 +1,516 @@
/**
* GSD Dashboard Overlay
*
* Full-screen overlay showing auto-mode progress: milestone/slice/task
* breakdown, current unit, completed units, timing, and activity log.
* Toggled with Ctrl+Alt+G or opened from /gsd status.
*/
import type { Theme } from "@mariozechner/pi-coding-agent";
import { truncateToWidth, visibleWidth, matchesKey, Key } from "@mariozechner/pi-tui";
import { deriveState } from "./state.js";
import { loadFile, parseRoadmap, parsePlan } from "./files.js";
import { resolveMilestoneFile, resolveSliceFile } from "./paths.js";
import { getAutoDashboardData, type AutoDashboardData } from "./auto.js";
import {
getLedger, getProjectTotals, aggregateByPhase, aggregateBySlice,
aggregateByModel, formatCost, formatTokenCount, formatCostProjection,
} from "./metrics.js";
import { loadEffectiveGSDPreferences } from "./preferences.js";
function formatDuration(ms: number): string {
const s = Math.floor(ms / 1000);
if (s < 60) return `${s}s`;
const m = Math.floor(s / 60);
const rs = s % 60;
if (m < 60) return `${m}m ${rs}s`;
const h = Math.floor(m / 60);
const rm = m % 60;
return `${h}h ${rm}m`;
}
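A few worked examples of the duration formatting above (function reproduced verbatim so the calls run standalone); note that seconds are dropped once the duration reaches the hour scale:

```typescript
// Reproduced from formatDuration above.
function formatDuration(ms: number): string {
  const s = Math.floor(ms / 1000);
  if (s < 60) return `${s}s`;
  const m = Math.floor(s / 60);
  const rs = s % 60;
  if (m < 60) return `${m}m ${rs}s`;
  const h = Math.floor(m / 60);
  const rm = m % 60;
  return `${h}h ${rm}m`;
}

formatDuration(59_000);    // "59s"
formatDuration(125_000);   // "2m 5s"
formatDuration(3_723_000); // "1h 2m" (seconds dropped at the hour scale)
```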
function unitLabel(type: string): string {
switch (type) {
case "research-milestone": return "Research";
case "plan-milestone": return "Plan";
case "research-slice": return "Research";
case "plan-slice": return "Plan";
case "execute-task": return "Execute";
case "complete-slice": return "Complete";
case "reassess-roadmap": return "Reassess";
default: return type;
}
}
function centerLine(content: string, width: number): string {
const vis = visibleWidth(content);
if (vis >= width) return truncateToWidth(content, width);
const leftPad = Math.floor((width - vis) / 2);
return " ".repeat(leftPad) + content;
}
function padRight(content: string, width: number): string {
const vis = visibleWidth(content);
return content + " ".repeat(Math.max(0, width - vis));
}
function joinColumns(left: string, right: string, width: number): string {
const leftW = visibleWidth(left);
const rightW = visibleWidth(right);
if (leftW + rightW + 2 > width) {
return truncateToWidth(`${left} ${right}`, width);
}
return left + " ".repeat(width - leftW - rightW) + right;
}
function fitColumns(parts: string[], width: number, separator = " "): string {
const filtered = parts.filter(Boolean);
if (filtered.length === 0) return "";
let result = filtered[0];
for (let i = 1; i < filtered.length; i++) {
const candidate = `${result}${separator}${filtered[i]}`;
if (visibleWidth(candidate) > width) break;
result = candidate;
}
return truncateToWidth(result, width);
}
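The greedy join above can be sketched with plain string length standing in for pi-tui's `visibleWidth`/`truncateToWidth` (an assumption for this sketch; the real functions are ANSI-escape-aware, so this simplification is only valid for unstyled strings):

```typescript
// fitColumns' greedy join: keep appending parts until the next one would
// overflow the width, then hard-truncate the result.
function fitColumnsPlain(parts: string[], width: number, separator = " "): string {
  const filtered = parts.filter(Boolean);
  if (filtered.length === 0) return "";
  let result = filtered[0] as string;
  for (let i = 1; i < filtered.length; i++) {
    const candidate = `${result}${separator}${filtered[i]}`;
    if (candidate.length > width) break; // next part would overflow: stop joining
    result = candidate;
  }
  return result.slice(0, width);
}
```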
export class GSDDashboardOverlay {
private tui: { requestRender: () => void };
private theme: Theme;
private onClose: () => void;
private cachedWidth?: number;
private cachedLines?: string[];
private refreshTimer: ReturnType<typeof setInterval>;
private scrollOffset = 0;
private dashData: AutoDashboardData;
private milestoneData: MilestoneView | null = null;
private loading = true;
constructor(
tui: { requestRender: () => void },
theme: Theme,
onClose: () => void,
) {
this.tui = tui;
this.theme = theme;
this.onClose = onClose;
this.dashData = getAutoDashboardData();
this.loadData().then(() => {
this.loading = false;
this.invalidate();
this.tui.requestRender();
});
this.refreshTimer = setInterval(() => {
this.dashData = getAutoDashboardData();
this.loadData().then(() => {
this.invalidate();
this.tui.requestRender();
});
}, 2000);
}
private async loadData(): Promise<void> {
const base = this.dashData.basePath || process.cwd();
try {
const state = await deriveState(base);
if (!state.activeMilestone) {
this.milestoneData = null;
return;
}
const mid = state.activeMilestone.id;
const view: MilestoneView = {
id: mid,
title: state.activeMilestone.title,
slices: [],
phase: state.phase,
progress: {
milestones: {
total: state.progress?.milestones.total ?? state.registry.length,
done: state.progress?.milestones.done ?? state.registry.filter(entry => entry.status === "complete").length,
},
},
};
const roadmapFile = resolveMilestoneFile(base, mid, "ROADMAP");
const roadmapContent = roadmapFile ? await loadFile(roadmapFile) : null;
if (roadmapContent) {
const roadmap = parseRoadmap(roadmapContent);
for (const s of roadmap.slices) {
const sliceView: SliceView = {
id: s.id,
title: s.title,
done: s.done,
risk: s.risk,
active: state.activeSlice?.id === s.id,
tasks: [],
};
if (sliceView.active) {
const planFile = resolveSliceFile(base, mid, s.id, "PLAN");
const planContent = planFile ? await loadFile(planFile) : null;
if (planContent) {
const plan = parsePlan(planContent);
sliceView.taskProgress = {
done: plan.tasks.filter(t => t.done).length,
total: plan.tasks.length,
};
for (const t of plan.tasks) {
sliceView.tasks.push({
id: t.id,
title: t.title,
done: t.done,
active: state.activeTask?.id === t.id,
});
}
}
}
view.slices.push(sliceView);
}
}
this.milestoneData = view;
} catch {
// Swallow errors from a partial or transient state read; keep showing the last good data instead of crashing the overlay
}
}
handleInput(data: string): void {
if (matchesKey(data, Key.escape) || matchesKey(data, Key.ctrl("c")) || matchesKey(data, Key.ctrlAlt("g"))) {
clearInterval(this.refreshTimer);
this.onClose();
return;
}
if (matchesKey(data, Key.down) || matchesKey(data, "j")) {
this.scrollOffset++;
this.invalidate();
this.tui.requestRender();
return;
}
if (matchesKey(data, Key.up) || matchesKey(data, "k")) {
this.scrollOffset = Math.max(0, this.scrollOffset - 1);
this.invalidate();
this.tui.requestRender();
return;
}
if (data === "g") {
this.scrollOffset = 0;
this.invalidate();
this.tui.requestRender();
return;
}
if (data === "G") {
this.scrollOffset = Number.MAX_SAFE_INTEGER; // clamped to maxScroll in render()
this.invalidate();
this.tui.requestRender();
return;
}
}
render(width: number): string[] {
if (this.cachedLines && this.cachedWidth === width) {
return this.cachedLines;
}
const content = this.buildContentLines(width);
const viewportHeight = Math.max(5, process.stdout.rows ? process.stdout.rows - 8 : 24);
const chromeHeight = 2;
const visibleContentRows = Math.max(1, viewportHeight - chromeHeight);
const maxScroll = Math.max(0, content.length - visibleContentRows);
this.scrollOffset = Math.min(this.scrollOffset, maxScroll);
const visibleContent = content.slice(this.scrollOffset, this.scrollOffset + visibleContentRows);
const lines = this.wrapInBox(visibleContent, width);
this.cachedWidth = width;
this.cachedLines = lines;
return lines;
}
private wrapInBox(inner: string[], width: number): string[] {
const th = this.theme;
const border = (s: string) => th.fg("borderAccent", s);
const innerWidth = width - 4;
const lines: string[] = [];
lines.push(border("╭" + "─".repeat(width - 2) + "╮"));
for (const line of inner) {
const truncated = truncateToWidth(line, innerWidth);
const padWidth = Math.max(0, innerWidth - visibleWidth(truncated));
lines.push(border("│") + " " + truncated + " ".repeat(padWidth) + " " + border("│"));
}
lines.push(border("╰" + "─".repeat(width - 2) + "╯"));
return lines;
}
private buildContentLines(width: number): string[] {
const th = this.theme;
const shellWidth = width - 4;
const contentWidth = Math.min(shellWidth, 128);
const sidePad = Math.max(0, Math.floor((shellWidth - contentWidth) / 2));
const leftMargin = " ".repeat(sidePad);
const lines: string[] = [];
const row = (content = ""): string => {
const truncated = truncateToWidth(content, contentWidth);
return leftMargin + padRight(truncated, contentWidth);
};
const blank = () => row("");
const hr = () => row(th.fg("dim", "─".repeat(contentWidth)));
const centered = (content: string) => row(centerLine(content, contentWidth));
const title = th.fg("accent", th.bold("GSD Dashboard"));
const status = this.dashData.active
? `${Date.now() % 2000 < 1000 ? th.fg("success", "●") : th.fg("dim", "○")} ${th.fg("success", "AUTO")}`
: this.dashData.paused
? th.fg("warning", "⏸ PAUSED")
: th.fg("dim", "idle");
const elapsed = th.fg("dim", formatDuration(this.dashData.elapsed));
lines.push(row(joinColumns(`${title} ${status}`, elapsed, contentWidth)));
lines.push(blank());
if (this.dashData.currentUnit) {
const cu = this.dashData.currentUnit;
const currentElapsed = th.fg("dim", formatDuration(Date.now() - cu.startedAt));
lines.push(row(joinColumns(
`${th.fg("text", "Now")}: ${th.fg("accent", unitLabel(cu.type))} ${th.fg("text", cu.id)}`,
currentElapsed,
contentWidth,
)));
lines.push(blank());
} else if (this.dashData.paused) {
lines.push(row(th.fg("dim", "/gsd auto to resume")));
lines.push(blank());
} else {
lines.push(row(th.fg("dim", "No unit running · /gsd auto to start")));
lines.push(blank());
}
if (this.loading) {
lines.push(centered(th.fg("dim", "Loading dashboard…")));
return lines;
}
if (this.milestoneData) {
const mv = this.milestoneData;
lines.push(row(th.fg("text", th.bold(`${mv.id}: ${mv.title}`))));
lines.push(blank());
const totalSlices = mv.slices.length;
const doneSlices = mv.slices.filter(s => s.done).length;
const totalMilestones = mv.progress.milestones.total;
const doneMilestones = mv.progress.milestones.done;
const activeSlice = mv.slices.find(s => s.active);
lines.push(blank());
if (activeSlice?.taskProgress) {
lines.push(row(this.renderProgressRow("Tasks", activeSlice.taskProgress.done, activeSlice.taskProgress.total, "accent", contentWidth)));
}
lines.push(row(this.renderProgressRow("Slices", doneSlices, totalSlices, "success", contentWidth)));
lines.push(row(this.renderProgressRow("Milestones", doneMilestones, totalMilestones, "warning", contentWidth)));
lines.push(blank());
for (const s of mv.slices) {
const icon = s.done ? th.fg("success", "✓")
: s.active ? th.fg("accent", "▸")
: th.fg("dim", "○");
const titleText = s.active ? th.fg("accent", `${s.id}: ${s.title}`)
: s.done ? th.fg("muted", `${s.id}: ${s.title}`)
: th.fg("dim", `${s.id}: ${s.title}`);
const risk = th.fg("dim", s.risk);
lines.push(row(joinColumns(` ${icon} ${titleText}`, risk, contentWidth)));
if (s.active && s.tasks.length > 0) {
for (const t of s.tasks) {
const tIcon = t.done ? th.fg("success", "✓")
: t.active ? th.fg("warning", "▸")
: th.fg("dim", "·");
const tTitle = t.active ? th.fg("warning", `${t.id}: ${t.title}`)
: t.done ? th.fg("muted", `${t.id}: ${t.title}`)
: th.fg("dim", `${t.id}: ${t.title}`);
lines.push(row(` ${tIcon} ${truncateToWidth(tTitle, contentWidth - 6)}`));
}
}
}
} else {
lines.push(centered(th.fg("dim", "No active milestone.")));
}
if (this.dashData.completedUnits.length > 0) {
lines.push(blank());
lines.push(hr());
lines.push(row(th.fg("text", th.bold("Completed"))));
lines.push(blank());
const recent = [...this.dashData.completedUnits].reverse().slice(0, 10);
for (const u of recent) {
const left = ` ${th.fg("success", "✓")} ${th.fg("muted", unitLabel(u.type))} ${th.fg("muted", u.id)}`;
const right = th.fg("dim", formatDuration(u.finishedAt - u.startedAt));
lines.push(row(joinColumns(left, right, contentWidth)));
}
if (this.dashData.completedUnits.length > 10) {
lines.push(row(th.fg("dim", ` ...and ${this.dashData.completedUnits.length - 10} more`)));
}
}
const ledger = getLedger();
if (ledger && ledger.units.length > 0) {
const totals = getProjectTotals(ledger.units);
lines.push(blank());
lines.push(hr());
lines.push(row(th.fg("text", th.bold("Cost & Usage"))));
lines.push(blank());
lines.push(row(fitColumns([
`${th.fg("warning", formatCost(totals.cost))} total`,
`${th.fg("text", formatTokenCount(totals.tokens.total))} tokens`,
`${th.fg("text", String(totals.toolCalls))} tools`,
`${th.fg("text", String(totals.units))} units`,
], contentWidth, ` ${th.fg("dim", "·")} `)));
lines.push(row(fitColumns([
`${th.fg("dim", "in:")} ${th.fg("text", formatTokenCount(totals.tokens.input))}`,
`${th.fg("dim", "out:")} ${th.fg("text", formatTokenCount(totals.tokens.output))}`,
`${th.fg("dim", "cache-r:")} ${th.fg("text", formatTokenCount(totals.tokens.cacheRead))}`,
`${th.fg("dim", "cache-w:")} ${th.fg("text", formatTokenCount(totals.tokens.cacheWrite))}`,
], contentWidth, " ")));
const phases = aggregateByPhase(ledger.units);
if (phases.length > 0) {
lines.push(blank());
lines.push(row(th.fg("dim", "By Phase")));
for (const p of phases) {
const pct = totals.cost > 0 ? Math.round((p.cost / totals.cost) * 100) : 0;
const left = ` ${th.fg("text", p.phase.padEnd(14))}${th.fg("warning", formatCost(p.cost).padStart(8))}`;
const right = th.fg("dim", `${String(pct).padStart(3)}% ${formatTokenCount(p.tokens.total)} tok ${p.units} units`);
lines.push(row(joinColumns(left, right, contentWidth)));
}
}
const slices = aggregateBySlice(ledger.units);
if (slices.length > 0) {
lines.push(blank());
lines.push(row(th.fg("dim", "By Slice")));
for (const s of slices) {
const pct = totals.cost > 0 ? Math.round((s.cost / totals.cost) * 100) : 0;
const left = ` ${th.fg("text", s.sliceId.padEnd(14))}${th.fg("warning", formatCost(s.cost).padStart(8))}`;
const right = th.fg("dim", `${String(pct).padStart(3)}% ${formatTokenCount(s.tokens.total)} tok ${formatDuration(s.duration)}`);
lines.push(row(joinColumns(left, right, contentWidth)));
}
}
// Cost projection — only when active milestone data is available
if (this.milestoneData) {
const mv = this.milestoneData;
const msTotalSlices = mv.slices.length;
const msDoneSlices = mv.slices.filter(s => s.done).length;
const remainingCount = msTotalSlices - msDoneSlices;
const overlayPrefs = loadEffectiveGSDPreferences()?.preferences;
const projLines = formatCostProjection(slices, remainingCount, overlayPrefs?.budget_ceiling);
if (projLines.length > 0) {
lines.push(blank());
for (const line of projLines) {
const colored = line.toLowerCase().includes("ceiling")
? th.fg("warning", line)
: th.fg("dim", line);
lines.push(row(colored));
}
}
}
const models = aggregateByModel(ledger.units);
if (models.length > 1) {
lines.push(blank());
lines.push(row(th.fg("dim", "By Model")));
for (const m of models) {
const pct = totals.cost > 0 ? Math.round((m.cost / totals.cost) * 100) : 0;
const modelName = truncateToWidth(m.model, 38);
const left = ` ${th.fg("text", modelName.padEnd(38))}${th.fg("warning", formatCost(m.cost).padStart(8))}`;
const right = th.fg("dim", `${String(pct).padStart(3)}% ${m.units} units`);
lines.push(row(joinColumns(left, right, contentWidth)));
}
}
lines.push(blank());
lines.push(row(`${th.fg("dim", "avg/unit:")} ${th.fg("text", formatCost(totals.cost / totals.units))} ${th.fg("dim", "·")} ${th.fg("text", formatTokenCount(Math.round(totals.tokens.total / totals.units)))} tokens`));
}
lines.push(blank());
lines.push(hr());
lines.push(centered(th.fg("dim", "↑↓ scroll · g/G top/end · esc close")));
return lines;
}
private renderProgressRow(
label: string,
done: number,
total: number,
color: "success" | "accent" | "warning",
width: number,
): string {
const th = this.theme;
const pct = total > 0 ? Math.round((done / total) * 100) : 0;
const labelWidth = 12;
const rightWidth = 14;
const gap = 2;
const labelText = truncateToWidth(label, labelWidth, "").padEnd(labelWidth);
const ratioText = `${done}/${total}`;
const rightText = `${String(pct).padStart(3)}% ${ratioText.padStart(rightWidth - 5)}`;
const barWidth = Math.max(12, width - labelWidth - rightWidth - gap * 2);
const filled = total > 0 ? Math.round((done / total) * barWidth) : 0;
const bar = th.fg(color, "█".repeat(filled)) + th.fg("dim", "░".repeat(Math.max(0, barWidth - filled)));
return `${th.fg("dim", labelText)}${" ".repeat(gap)}${bar}${" ".repeat(gap)}${th.fg("dim", rightText)}`;
}
invalidate(): void {
this.cachedWidth = undefined;
this.cachedLines = undefined;
}
dispose(): void {
clearInterval(this.refreshTimer);
}
}
interface MilestoneView {
id: string;
title: string;
slices: SliceView[];
phase: string;
progress: {
milestones: {
total: number;
done: number;
};
};
}
interface SliceView {
id: string;
title: string;
done: boolean;
risk: string;
active: boolean;
tasks: TaskView[];
taskProgress?: { done: number; total: number };
}
interface TaskView {
id: string;
title: string;
done: boolean;
active: boolean;
}
# GSD Preferences Reference
Full documentation for `~/.gsd/preferences.md` (global) and `.gsd/preferences.md` (project).
---
## Notes
- Keep this file skill-first: focused on skill selection and routing.
- Prefer explicit skill names or absolute paths.
- Use absolute paths for personal/local skills when you want zero ambiguity.
- These preferences guide which skills GSD should load and follow; they do not override higher-priority instructions in the current conversation.
---
## Field Guide
- `version`: schema version. Start at `1`.
- `always_use_skills`: skills GSD should use whenever they are relevant.
- `prefer_skills`: soft defaults GSD should prefer when relevant.
- `avoid_skills`: skills GSD should avoid unless clearly needed.
- `skill_rules`: situational rules with a human-readable `when` trigger and one or more of `use`, `prefer`, or `avoid`.
- `custom_instructions`: extra durable instructions related to skill use.
- `models`: per-stage model selection for auto-mode. Keys: `research`, `planning`, `execution`, `completion`. Values: model IDs (e.g. `claude-sonnet-4-6`, `claude-opus-4-6`). Omit a key to use whatever model is currently active.
- `skill_discovery`: controls how GSD discovers and applies skills during auto-mode. Valid values:
- `auto` — skills are found and applied automatically without prompting.
- `suggest` — (default) skills are identified during research but not installed automatically.
- `off` — skill discovery is disabled entirely.
- `auto_supervisor`: configures the auto-mode supervisor that monitors agent progress and enforces timeouts. Keys:
- `model`: model ID to use for the supervisor process (defaults to the currently active model).
- `soft_timeout_minutes`: minutes before the supervisor issues a soft warning (default: 20).
- `idle_timeout_minutes`: minutes of inactivity before the supervisor intervenes (default: 10).
- `hard_timeout_minutes`: minutes before the supervisor forces termination (default: 30).
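Putting the two auto-mode fields above together, a frontmatter block might look like the sketch below. The model ID is illustrative, and the timeout values shown are the documented defaults, so in practice you would only set the keys you want to change:

```yaml
---
version: 1
skill_discovery: auto
auto_supervisor:
  model: claude-sonnet-4-6
  soft_timeout_minutes: 20
  idle_timeout_minutes: 10
  hard_timeout_minutes: 30
---
```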
---
## Best Practices
- Keep `always_use_skills` short.
- Use `skill_rules` for situational routing, not broad personality preferences.
- Prefer skill names for stable built-in skills.
- Prefer absolute paths for local personal skills.
---
## Models Example
```yaml
---
version: 1
models:
research: claude-sonnet-4-6
planning: claude-opus-4-6
execution: claude-sonnet-4-6
completion: claude-sonnet-4-6
---
```
This uses Opus for planning (where architectural decisions matter most) and Sonnet for everything else (faster and cheaper). Omit any key to use the currently selected model.
---
## Example Variations
**Minimal — always load a UAT skill and route Clerk tasks:**
```yaml
---
version: 1
always_use_skills:
- /Users/you/.claude/skills/verify-uat
skill_rules:
- when: finishing implementation and human judgment matters
use:
- /Users/you/.claude/skills/verify-uat
---
```
**Richer routing — prefer cleanup and authentication skills:**
```yaml
---
version: 1
prefer_skills:
- commit-ignore
skill_rules:
- when: task involves Clerk authentication
use:
- clerk
- clerk-setup
- when: the user is looking for installable capability rather than implementation
prefer:
- find-skills
---
```
import { existsSync, mkdirSync } from "node:fs";
import { join } from "node:path";
import { loadFile, parsePlan, parseRoadmap, parseSummary, saveFile, parseTaskPlanMustHaves, countMustHavesMentionedInSummary } from "./files.js";
import { resolveMilestoneFile, resolveMilestonePath, resolveSliceFile, resolveSlicePath, resolveTaskFile, resolveTaskFiles, resolveTasksDir, milestonesDir, gsdRoot, relMilestoneFile, relSliceFile, relTaskFile, relSlicePath, relGsdRootFile, resolveGsdRootFile } from "./paths.js";
import { deriveState, isMilestoneComplete } from "./state.js";
import { loadEffectiveGSDPreferences, type GSDPreferences } from "./preferences.js";
export type DoctorSeverity = "info" | "warning" | "error";
export type DoctorIssueCode =
| "invalid_preferences"
| "missing_tasks_dir"
| "missing_slice_plan"
| "task_done_missing_summary"
| "task_summary_without_done_checkbox"
| "all_tasks_done_missing_slice_summary"
| "all_tasks_done_missing_slice_uat"
| "all_tasks_done_roadmap_not_checked"
| "slice_checked_missing_summary"
| "slice_checked_missing_uat"
| "all_slices_done_missing_milestone_summary"
| "task_done_must_haves_not_verified"
| "active_requirement_missing_owner"
| "blocked_requirement_missing_reason"
| "blocker_discovered_no_replan";
export interface DoctorIssue {
severity: DoctorSeverity;
code: DoctorIssueCode;
scope: "project" | "milestone" | "slice" | "task";
unitId: string;
message: string;
file?: string;
fixable: boolean;
}
export interface DoctorReport {
ok: boolean;
basePath: string;
issues: DoctorIssue[];
fixesApplied: string[];
}
export interface DoctorSummary {
total: number;
errors: number;
warnings: number;
infos: number;
fixable: number;
byCode: Array<{ code: DoctorIssueCode; count: number }>;
}
function normalizeStringArray(value: unknown): string[] | undefined {
if (!Array.isArray(value)) return undefined;
const items = value.filter((item): item is string => typeof item === "string").map(item => item.trim()).filter(Boolean);
return items.length > 0 ? Array.from(new Set(items)) : undefined;
}
function validatePreferenceShape(preferences: GSDPreferences): string[] {
const issues: string[] = [];
const listFields = ["always_use_skills", "prefer_skills", "avoid_skills", "custom_instructions"] as const;
for (const field of listFields) {
const value = preferences[field];
if (value !== undefined && !Array.isArray(value)) {
issues.push(`${field} must be a list`);
}
}
if (preferences.skill_rules !== undefined) {
if (!Array.isArray(preferences.skill_rules)) {
issues.push("skill_rules must be a list");
} else {
for (const [index, rule] of preferences.skill_rules.entries()) {
if (!rule || typeof rule !== "object") {
issues.push(`skill_rules[${index}] must be an object`);
continue;
}
if (typeof rule.when !== "string") {
issues.push(`skill_rules[${index}].when must be a string`);
}
for (const key of ["use", "prefer", "avoid"] as const) {
const value = (rule as Record<string, unknown>)[key];
if (value !== undefined && !Array.isArray(value)) {
issues.push(`skill_rules[${index}].${key} must be a list`);
}
}
}
}
}
return issues;
}
function buildStateMarkdown(state: Awaited<ReturnType<typeof deriveState>>): string {
const lines: string[] = [];
lines.push("# GSD State", "");
const activeMilestone = state.activeMilestone
? `${state.activeMilestone.id}: ${state.activeMilestone.title}`
: "None";
const activeSlice = state.activeSlice
? `${state.activeSlice.id}: ${state.activeSlice.title}`
: "None";
lines.push(`**Active Milestone:** ${activeMilestone}`);
lines.push(`**Active Slice:** ${activeSlice}`);
lines.push(`**Phase:** ${state.phase}`);
if (state.requirements) {
lines.push(`**Requirements Status:** ${state.requirements.active} active · ${state.requirements.validated} validated · ${state.requirements.deferred} deferred · ${state.requirements.outOfScope} out of scope`);
}
lines.push("");
lines.push("## Milestone Registry");
for (const entry of state.registry) {
const glyph = entry.status === "complete" ? "✅" : entry.status === "active" ? "🔄" : "⬜";
lines.push(`- ${glyph} **${entry.id}:** ${entry.title}`);
}
lines.push("");
lines.push("## Recent Decisions");
if (state.recentDecisions.length > 0) {
for (const decision of state.recentDecisions) lines.push(`- ${decision}`);
} else {
lines.push("- None recorded");
}
lines.push("");
lines.push("## Blockers");
if (state.blockers.length > 0) {
for (const blocker of state.blockers) lines.push(`- ${blocker}`);
} else {
lines.push("- None");
}
lines.push("");
lines.push("## Next Action");
lines.push(state.nextAction || "None");
lines.push("");
return lines.join("\n");
}
async function updateStateFile(basePath: string, fixesApplied: string[]): Promise<void> {
const state = await deriveState(basePath);
const path = resolveGsdRootFile(basePath, "STATE");
await saveFile(path, buildStateMarkdown(state));
fixesApplied.push(`updated ${path}`);
}
async function ensureSliceSummaryStub(basePath: string, milestoneId: string, sliceId: string, fixesApplied: string[]): Promise<void> {
const sliceDir = resolveSlicePath(basePath, milestoneId, sliceId);
if (!sliceDir) return;
const absolute = resolveSliceFile(basePath, milestoneId, sliceId, "SUMMARY") ?? join(sliceDir, `${sliceId}-SUMMARY.md`);
const content = [
"---",
`id: ${sliceId}`,
`parent: ${milestoneId}`,
`milestone: ${milestoneId}`,
"provides: []",
"requires: []",
"affects: []",
"key_files: []",
"key_decisions: []",
"patterns_established: []",
"observability_surfaces:",
" - none yet — doctor created placeholder summary; replace with real diagnostics before treating as complete",
"drill_down_paths: []",
"duration: unknown",
"verification_result: unknown",
`completed_at: ${new Date().toISOString()}`,
"---",
"",
`# ${sliceId}: Recovery placeholder summary`,
"",
"**Doctor-created placeholder.**",
"",
"## What Happened",
"Doctor detected that all tasks were complete but the slice summary was missing. Replace this with a real compressed slice summary before relying on it.",
"",
"## Verification",
"Not re-run by doctor.",
"",
"## Deviations",
"Recovery placeholder created to restore required artifact shape.",
"",
"## Known Limitations",
"This file is intentionally incomplete and should be replaced by a real summary.",
"",
"## Follow-ups",
"- Regenerate this summary from task summaries.",
"",
"## Files Created/Modified",
`- \`${relSliceFile(basePath, milestoneId, sliceId, "SUMMARY")}\` — doctor-created placeholder summary`,
"",
"## Forward Intelligence",
"",
"### What the next slice should know",
"- Doctor had to reconstruct completion artifacts; inspect task summaries before continuing.",
"",
"### What's fragile",
"- Placeholder summary exists solely to unblock invariant checks.",
"",
"### Authoritative diagnostics",
"- Task summaries in the slice tasks/ directory — they are the actual authoritative source until this summary is rewritten.",
"",
"### What assumptions changed",
"- The system assumed completion would always write a slice summary; in practice doctor may need to restore missing artifacts.",
"",
].join("\n");
await saveFile(absolute, content);
fixesApplied.push(`created placeholder ${absolute}`);
}
async function ensureSliceUatStub(basePath: string, milestoneId: string, sliceId: string, fixesApplied: string[]): Promise<void> {
const sDir = resolveSlicePath(basePath, milestoneId, sliceId);
if (!sDir) return;
const absolute = join(sDir, `${sliceId}-UAT.md`);
const content = [
`# ${sliceId}: Recovery placeholder UAT`,
"",
`**Milestone:** ${milestoneId}`,
`**Written:** ${new Date().toISOString()}`,
"",
"## Preconditions",
"- Doctor created this placeholder because the expected UAT file was missing.",
"",
"## Smoke Test",
"- Re-run the slice verification from the slice plan before shipping.",
"",
"## Test Cases",
"### 1. Replace this placeholder",
"1. Read the slice plan and task summaries.",
"2. Write a real UAT script.",
"3. **Expected:** This placeholder is replaced with meaningful human checks.",
"",
"## Edge Cases",
"### Missing completion artifacts",
"1. Confirm the summary, roadmap checkbox, and state file are coherent.",
"2. **Expected:** GSD doctor reports no remaining completion drift for this slice.",
"",
"## Failure Signals",
"- Placeholder content still present when treating the slice as done",
"",
"## Notes for Tester",
"Doctor created this file only to restore the required artifact shape. Replace it with a real UAT script.",
"",
].join("\n");
await saveFile(absolute, content);
fixesApplied.push(`created placeholder ${absolute}`);
}
async function markTaskDoneInPlan(basePath: string, milestoneId: string, sliceId: string, taskId: string, fixesApplied: string[]): Promise<void> {
const planPath = resolveSliceFile(basePath, milestoneId, sliceId, "PLAN");
if (!planPath) return;
const content = await loadFile(planPath);
if (!content) return;
const updated = content.replace(new RegExp(`^-\\s+\\[ \\]\\s+\\*\\*${taskId}:`, "m"), `- [x] **${taskId}:`);
if (updated !== content) {
await saveFile(planPath, updated);
fixesApplied.push(`marked ${taskId} done in ${planPath}`);
}
}
async function markSliceDoneInRoadmap(basePath: string, milestoneId: string, sliceId: string, fixesApplied: string[]): Promise<void> {
const roadmapPath = resolveMilestoneFile(basePath, milestoneId, "ROADMAP");
if (!roadmapPath) return;
const content = await loadFile(roadmapPath);
if (!content) return;
const updated = content.replace(new RegExp(`^-\\s+\\[ \\]\\s+\\*\\*${sliceId}:`, "m"), `- [x] **${sliceId}:`);
if (updated !== content) {
await saveFile(roadmapPath, updated);
fixesApplied.push(`marked ${sliceId} done in ${roadmapPath}`);
}
}
function matchesScope(unitId: string, scope?: string): boolean {
if (!scope) return true;
// Exact match or a child of the scope; require the trailing slash so "M1" does not match "M10".
return unitId === scope || unitId.startsWith(`${scope}/`);
}
function auditRequirements(content: string | null): DoctorIssue[] {
if (!content) return [];
const issues: DoctorIssue[] = [];
const blocks = content.split(/^###\s+/m).slice(1);
for (const block of blocks) {
const idMatch = block.match(/^(R\d+)/);
if (!idMatch) continue;
const requirementId = idMatch[1];
const status = block.match(/^-\s+Status:\s+(.+)$/m)?.[1]?.trim().toLowerCase() ?? "";
const owner = block.match(/^-\s+Primary owning slice:\s+(.+)$/m)?.[1]?.trim().toLowerCase() ?? "";
const notes = block.match(/^-\s+Notes:\s+(.+)$/m)?.[1]?.trim().toLowerCase() ?? "";
if (status === "active" && (!owner || owner === "none" || owner === "none yet")) {
issues.push({
severity: "error",
code: "active_requirement_missing_owner",
scope: "project",
unitId: requirementId,
message: `${requirementId} is Active but has no primary owning slice`,
file: relGsdRootFile("REQUIREMENTS"),
fixable: false,
});
}
if (status === "blocked" && !notes) {
issues.push({
severity: "warning",
code: "blocked_requirement_missing_reason",
scope: "project",
unitId: requirementId,
message: `${requirementId} is Blocked but has no reason in Notes`,
file: relGsdRootFile("REQUIREMENTS"),
fixable: false,
});
}
}
return issues;
}
export function summarizeDoctorIssues(issues: DoctorIssue[]): DoctorSummary {
const errors = issues.filter(issue => issue.severity === "error").length;
const warnings = issues.filter(issue => issue.severity === "warning").length;
const infos = issues.filter(issue => issue.severity === "info").length;
const fixable = issues.filter(issue => issue.fixable).length;
const byCodeMap = new Map<DoctorIssueCode, number>();
for (const issue of issues) {
byCodeMap.set(issue.code, (byCodeMap.get(issue.code) ?? 0) + 1);
}
const byCode = [...byCodeMap.entries()]
.map(([code, count]) => ({ code, count }))
.sort((a, b) => b.count - a.count || a.code.localeCompare(b.code));
return { total: issues.length, errors, warnings, infos, fixable, byCode };
}
export async function selectDoctorScope(basePath: string, requestedScope?: string): Promise<string | undefined> {
if (requestedScope) return requestedScope;
const state = await deriveState(basePath);
if (state.activeMilestone?.id && state.activeSlice?.id) {
return `${state.activeMilestone.id}/${state.activeSlice.id}`;
}
if (state.activeMilestone?.id) {
return state.activeMilestone.id;
}
const milestonesPath = milestonesDir(basePath);
if (!existsSync(milestonesPath)) return undefined;
for (const milestone of state.registry) {
const roadmapPath = resolveMilestoneFile(basePath, milestone.id, "ROADMAP");
const roadmapContent = roadmapPath ? await loadFile(roadmapPath) : null;
if (!roadmapContent) continue;
const roadmap = parseRoadmap(roadmapContent);
if (!isMilestoneComplete(roadmap)) return milestone.id;
}
return state.registry[0]?.id;
}
export function filterDoctorIssues(issues: DoctorIssue[], options?: { scope?: string; includeWarnings?: boolean; includeHistorical?: boolean }): DoctorIssue[] {
let filtered = issues;
if (options?.scope) filtered = filtered.filter(issue => matchesScope(issue.unitId, options.scope));
if (!options?.includeWarnings) filtered = filtered.filter(issue => issue.severity === "error");
return filtered;
}
export function formatDoctorReport(
report: DoctorReport,
options?: { scope?: string; includeWarnings?: boolean; maxIssues?: number; title?: string },
): string {
const scopedIssues = filterDoctorIssues(report.issues, {
scope: options?.scope,
includeWarnings: options?.includeWarnings ?? true,
});
const summary = summarizeDoctorIssues(scopedIssues);
const maxIssues = options?.maxIssues ?? 12;
const lines: string[] = [];
lines.push(options?.title ?? (summary.errors > 0 ? "GSD doctor found blocking issues." : "GSD doctor report."));
lines.push(`Scope: ${options?.scope ?? "all milestones"}`);
lines.push(`Issues: ${summary.total} total · ${summary.errors} error(s) · ${summary.warnings} warning(s) · ${summary.fixable} fixable`);
if (summary.byCode.length > 0) {
lines.push("Top issue types:");
for (const item of summary.byCode.slice(0, 5)) {
lines.push(`- ${item.code}: ${item.count}`);
}
}
if (scopedIssues.length > 0) {
lines.push("Priority issues:");
for (const issue of scopedIssues.slice(0, maxIssues)) {
const prefix = issue.severity === "error" ? "ERROR" : issue.severity === "warning" ? "WARN" : "INFO";
lines.push(`- [${prefix}] ${issue.unitId}: ${issue.message}${issue.file ? ` (${issue.file})` : ""}`);
}
if (scopedIssues.length > maxIssues) {
lines.push(`- ...and ${scopedIssues.length - maxIssues} more in scope`);
}
}
if (report.fixesApplied.length > 0) {
lines.push("Fixes applied:");
for (const fix of report.fixesApplied.slice(0, maxIssues)) lines.push(`- ${fix}`);
if (report.fixesApplied.length > maxIssues) lines.push(`- ...and ${report.fixesApplied.length - maxIssues} more`);
}
return lines.join("\n");
}
export function formatDoctorIssuesForPrompt(issues: DoctorIssue[]): string {
if (issues.length === 0) return "- No remaining issues in scope.";
return issues.map(issue => {
const prefix = issue.severity === "error" ? "ERROR" : issue.severity === "warning" ? "WARN" : "INFO";
return `- [${prefix}] ${issue.unitId} | ${issue.code} | ${issue.message}${issue.file ? ` | file: ${issue.file}` : ""} | fixable: ${issue.fixable ? "yes" : "no"}`;
}).join("\n");
}
export async function runGSDDoctor(basePath: string, options?: { fix?: boolean; scope?: string }): Promise<DoctorReport> {
const issues: DoctorIssue[] = [];
const fixesApplied: string[] = [];
const fix = options?.fix === true;
const prefs = loadEffectiveGSDPreferences();
if (prefs) {
const prefIssues = validatePreferenceShape(prefs.preferences);
for (const issue of prefIssues) {
issues.push({
severity: "warning",
code: "invalid_preferences",
scope: "project",
unitId: "project",
message: `GSD preferences invalid: ${issue}`,
file: prefs.path,
fixable: false,
});
}
}
const milestonesPath = milestonesDir(basePath);
if (!existsSync(milestonesPath)) {
return { ok: issues.every(issue => issue.severity !== "error"), basePath, issues, fixesApplied };
}
const requirementsPath = resolveGsdRootFile(basePath, "REQUIREMENTS");
const requirementsContent = await loadFile(requirementsPath);
issues.push(...auditRequirements(requirementsContent));
const state = await deriveState(basePath);
for (const milestone of state.registry) {
const milestoneId = milestone.id;
const milestonePath = resolveMilestonePath(basePath, milestoneId);
if (!milestonePath) continue;
const roadmapPath = resolveMilestoneFile(basePath, milestoneId, "ROADMAP");
const roadmapContent = roadmapPath ? await loadFile(roadmapPath) : null;
if (!roadmapContent) continue;
const roadmap = parseRoadmap(roadmapContent);
for (const slice of roadmap.slices) {
const unitId = `${milestoneId}/${slice.id}`;
if (options?.scope && !matchesScope(unitId, options.scope) && options.scope !== milestoneId) continue;
const slicePath = resolveSlicePath(basePath, milestoneId, slice.id);
if (!slicePath) continue;
const tasksDir = resolveTasksDir(basePath, milestoneId, slice.id);
if (!tasksDir) {
issues.push({
severity: "error",
code: "missing_tasks_dir",
scope: "slice",
unitId,
message: `Missing tasks directory for ${unitId}`,
file: relSlicePath(basePath, milestoneId, slice.id),
fixable: true,
});
if (fix) {
mkdirSync(join(slicePath, "tasks"), { recursive: true });
fixesApplied.push(`created ${join(slicePath, "tasks")}`);
}
}
const planPath = resolveSliceFile(basePath, milestoneId, slice.id, "PLAN");
const planContent = planPath ? await loadFile(planPath) : null;
const plan = planContent ? parsePlan(planContent) : null;
if (!plan) {
issues.push({
severity: "warning",
code: "missing_slice_plan",
scope: "slice",
unitId,
message: `Slice ${unitId} has no plan file`,
file: relSliceFile(basePath, milestoneId, slice.id, "PLAN"),
fixable: false,
});
continue;
}
let allTasksDone = plan.tasks.length > 0;
for (const task of plan.tasks) {
const taskUnitId = `${unitId}/${task.id}`;
const summaryPath = resolveTaskFile(basePath, milestoneId, slice.id, task.id, "SUMMARY");
const hasSummary = !!(summaryPath && await loadFile(summaryPath));
if (task.done && !hasSummary) {
issues.push({
severity: "error",
code: "task_done_missing_summary",
scope: "task",
unitId: taskUnitId,
message: `Task ${task.id} is marked done but summary is missing`,
file: relTaskFile(basePath, milestoneId, slice.id, task.id, "SUMMARY"),
fixable: false,
});
}
if (!task.done && hasSummary) {
issues.push({
severity: "warning",
code: "task_summary_without_done_checkbox",
scope: "task",
unitId: taskUnitId,
message: `Task ${task.id} has a summary but is not marked done in the slice plan`,
file: relSliceFile(basePath, milestoneId, slice.id, "PLAN"),
fixable: true,
});
if (fix) await markTaskDoneInPlan(basePath, milestoneId, slice.id, task.id, fixesApplied);
}
// Must-have verification: done task with summary — check if must-haves are addressed
if (task.done && hasSummary) {
const taskPlanPath = resolveTaskFile(basePath, milestoneId, slice.id, task.id, "PLAN");
if (taskPlanPath) {
const taskPlanContent = await loadFile(taskPlanPath);
if (taskPlanContent) {
const mustHaves = parseTaskPlanMustHaves(taskPlanContent);
if (mustHaves.length > 0) {
const summaryContent = await loadFile(summaryPath!);
const mentionedCount = summaryContent
? countMustHavesMentionedInSummary(mustHaves, summaryContent)
: 0;
if (mentionedCount < mustHaves.length) {
issues.push({
severity: "warning",
code: "task_done_must_haves_not_verified",
scope: "task",
unitId: taskUnitId,
message: `Task ${task.id} has ${mustHaves.length} must-haves but summary addresses only ${mentionedCount}`,
file: relTaskFile(basePath, milestoneId, slice.id, task.id, "SUMMARY"),
fixable: false,
});
}
}
}
}
}
allTasksDone = allTasksDone && task.done;
}
// Blocker-without-replan detection: a completed task reported blocker_discovered
// but no REPLAN.md exists yet — the slice is stuck
const replanPath = resolveSliceFile(basePath, milestoneId, slice.id, "REPLAN");
if (!replanPath) {
for (const task of plan.tasks) {
if (!task.done) continue;
const summaryPath = resolveTaskFile(basePath, milestoneId, slice.id, task.id, "SUMMARY");
if (!summaryPath) continue;
const summaryContent = await loadFile(summaryPath);
if (!summaryContent) continue;
const summary = parseSummary(summaryContent);
if (summary.frontmatter.blocker_discovered) {
issues.push({
severity: "warning",
code: "blocker_discovered_no_replan",
scope: "slice",
unitId,
message: `Task ${task.id} reported blocker_discovered but no REPLAN.md exists for ${slice.id} — slice may be stuck`,
file: relSliceFile(basePath, milestoneId, slice.id, "REPLAN"),
fixable: false,
});
break; // one issue per slice is sufficient
}
}
}
const sliceSummaryPath = resolveSliceFile(basePath, milestoneId, slice.id, "SUMMARY");
const sliceUatPath = join(slicePath, `${slice.id}-UAT.md`);
const hasSliceSummary = !!(sliceSummaryPath && await loadFile(sliceSummaryPath));
const hasSliceUat = existsSync(sliceUatPath);
if (allTasksDone && !hasSliceSummary) {
issues.push({
severity: "error",
code: "all_tasks_done_missing_slice_summary",
scope: "slice",
unitId,
message: `All tasks are done but ${slice.id}-SUMMARY.md is missing`,
file: relSliceFile(basePath, milestoneId, slice.id, "SUMMARY"),
fixable: true,
});
if (fix) await ensureSliceSummaryStub(basePath, milestoneId, slice.id, fixesApplied);
}
if (allTasksDone && !hasSliceUat) {
issues.push({
severity: "warning",
code: "all_tasks_done_missing_slice_uat",
scope: "slice",
unitId,
message: `All tasks are done but ${slice.id}-UAT.md is missing`,
file: `${relSlicePath(basePath, milestoneId, slice.id)}/${slice.id}-UAT.md`,
fixable: true,
});
if (fix) await ensureSliceUatStub(basePath, milestoneId, slice.id, fixesApplied);
}
if (allTasksDone && !slice.done) {
issues.push({
severity: "error",
code: "all_tasks_done_roadmap_not_checked",
scope: "slice",
unitId,
message: `All tasks are done but roadmap still shows ${slice.id} as incomplete`,
file: relMilestoneFile(basePath, milestoneId, "ROADMAP"),
fixable: true,
});
if (fix && (hasSliceSummary || issues.some(issue => issue.code === "all_tasks_done_missing_slice_summary" && issue.unitId === unitId))) {
await markSliceDoneInRoadmap(basePath, milestoneId, slice.id, fixesApplied);
}
}
if (slice.done && !hasSliceSummary) {
issues.push({
severity: "error",
code: "slice_checked_missing_summary",
scope: "slice",
unitId,
message: `Roadmap marks ${slice.id} complete but slice summary is missing`,
file: relSliceFile(basePath, milestoneId, slice.id, "SUMMARY"),
fixable: true,
});
}
if (slice.done && !hasSliceUat) {
issues.push({
severity: "warning",
code: "slice_checked_missing_uat",
scope: "slice",
unitId,
message: `Roadmap marks ${slice.id} complete but UAT file is missing`,
file: `${relSlicePath(basePath, milestoneId, slice.id)}/${slice.id}-UAT.md`,
fixable: true,
});
}
}
// Milestone-level check: all slices done but no milestone summary
if (isMilestoneComplete(roadmap) && !resolveMilestoneFile(basePath, milestoneId, "SUMMARY")) {
issues.push({
severity: "warning",
code: "all_slices_done_missing_milestone_summary",
scope: "milestone",
unitId: milestoneId,
message: `All slices are done but ${milestoneId}-SUMMARY.md is missing — milestone is stuck in completing-milestone phase`,
file: relMilestoneFile(basePath, milestoneId, "SUMMARY"),
fixable: false,
});
}
}
if (fix && fixesApplied.length > 0) {
await updateStateFile(basePath, fixesApplied);
}
return {
ok: issues.every(issue => issue.severity !== "error"),
basePath,
issues,
fixesApplied,
};
}

// GSD Extension — File Parsing and I/O
// Parsers for roadmap, plan, summary, and continue files.
// Used by state derivation and the status widget.
// Pure functions, zero Pi dependencies — uses only Node built-ins.
import { promises as fs, readdirSync } from 'node:fs';
import { dirname } from 'node:path';
import { milestonesDir, resolveMilestoneFile, relMilestoneFile } from './paths.js';
import type {
Roadmap, RoadmapSliceEntry, BoundaryMapEntry, RiskLevel,
SlicePlan, TaskPlanEntry,
Summary, SummaryFrontmatter, SummaryRequires, FileModified,
Continue, ContinueFrontmatter, ContinueStatus,
RequirementCounts,
} from './types.js';
// ─── Helpers ───────────────────────────────────────────────────────────────
/**
* Split markdown content into frontmatter (YAML-like) and body.
* Returns [frontmatterLines, body] where frontmatterLines is null if no frontmatter.
*/
function splitFrontmatter(content: string): [string[] | null, string] {
const trimmed = content.trimStart();
if (!trimmed.startsWith('---')) return [null, content];
const afterFirst = trimmed.indexOf('\n');
if (afterFirst === -1) return [null, content];
const rest = trimmed.slice(afterFirst + 1);
const endIdx = rest.indexOf('\n---');
if (endIdx === -1) return [null, content];
const fmLines = rest.slice(0, endIdx).split('\n');
const body = rest.slice(endIdx + 4).replace(/^\n+/, '');
return [fmLines, body];
}
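An inline trace of the fence-scanning steps above, to make the frontmatter/body contract concrete; the sample document content is illustrative:

```typescript
// Frontmatter is the text between the leading `---` fences; the body is
// everything after the closing fence with leading newlines stripped.
const content = "---\nid: T01\nstatus: done\n---\n\n# Title\nBody text\n";
const afterFirst = content.trimStart().indexOf("\n");
const rest = content.trimStart().slice(afterFirst + 1);
const endIdx = rest.indexOf("\n---");
const fmLines = rest.slice(0, endIdx).split("\n");        // ["id: T01", "status: done"]
const body = rest.slice(endIdx + 4).replace(/^\n+/, "");  // "# Title\nBody text\n"
```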
/**
* Parse YAML-like frontmatter lines into a flat key-value map.
* Handles simple scalars and arrays (lines starting with " - ").
* Handles nested objects like requires (lines with " key: value").
*/
function parseFrontmatterMap(lines: string[]): Record<string, unknown> {
const result: Record<string, unknown> = {};
let currentKey: string | null = null;
let currentArray: unknown[] | null = null;
let currentObj: Record<string, string> | null = null;
for (const line of lines) {
// Nested object property (4-space indent with key: value)
const nestedMatch = line.match(/^    (\w[\w_]*)\s*:\s*(.*)$/);
if (nestedMatch && currentArray && currentObj) {
currentObj[nestedMatch[1]] = nestedMatch[2].trim();
continue;
}
// Array item (2-space indent)
const arrayMatch = line.match(/^  - (.*)$/);
if (arrayMatch && currentKey) {
// If there's a pending nested object, push it
if (currentObj && Object.keys(currentObj).length > 0) {
currentArray!.push(currentObj);
}
currentObj = null;
const val = arrayMatch[1].trim();
if (!currentArray) currentArray = [];
// Check if this array item starts a nested object (e.g. "- slice: S00")
const nestedStart = val.match(/^(\w[\w_]*)\s*:\s*(.*)$/);
if (nestedStart) {
currentObj = { [nestedStart[1]]: nestedStart[2].trim() };
} else {
currentArray.push(val);
}
continue;
}
// Flush previous key
if (currentKey) {
if (currentObj && Object.keys(currentObj).length > 0 && currentArray) {
currentArray.push(currentObj);
currentObj = null;
}
if (currentArray) {
result[currentKey] = currentArray;
}
currentArray = null;
}
// Top-level key: value
const kvMatch = line.match(/^(\w[\w_]*)\s*:\s*(.*)$/);
if (kvMatch) {
currentKey = kvMatch[1];
const val = kvMatch[2].trim();
if (val === '' || val === '[]') {
currentArray = [];
} else if (val.startsWith('[') && val.endsWith(']')) {
const inner = val.slice(1, -1).trim();
result[currentKey] = inner ? inner.split(',').map(s => s.trim()) : [];
currentKey = null;
} else {
result[currentKey] = val;
currentKey = null;
}
}
}
// Flush final key
if (currentKey) {
if (currentObj && Object.keys(currentObj).length > 0 && currentArray) {
currentArray.push(currentObj);
currentObj = null;
}
if (currentArray) {
result[currentKey] = currentArray;
}
}
return result;
}
/** Extract the text after a heading at a given level, up to the next heading of same or higher level. */
function extractSection(body: string, heading: string, level: number = 2): string | null {
const prefix = '#'.repeat(level) + ' ';
const regex = new RegExp(`^${prefix}${escapeRegex(heading)}\\s*$`, 'm');
const match = regex.exec(body);
if (!match) return null;
const start = match.index + match[0].length;
const rest = body.slice(start);
const nextHeading = rest.match(new RegExp(`^#{1,${level}} `, 'm'));
const end = nextHeading ? nextHeading.index! : rest.length;
return rest.slice(0, end).trim();
}
/** Extract all sections at a given level, returning heading → content map. */
function extractAllSections(body: string, level: number = 2): Map<string, string> {
const prefix = '#'.repeat(level) + ' ';
const regex = new RegExp(`^${prefix}(.+)$`, 'gm');
const sections = new Map<string, string>();
const matches = [...body.matchAll(regex)];
for (let i = 0; i < matches.length; i++) {
const heading = matches[i][1].trim();
const start = matches[i].index! + matches[i][0].length;
const end = i + 1 < matches.length ? matches[i + 1].index! : body.length;
sections.set(heading, body.slice(start, end).trim());
}
return sections;
}
function escapeRegex(s: string): string {
return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}
/** Parse bullet list items from a text block. */
function parseBullets(text: string): string[] {
return text.split('\n')
.map(l => l.replace(/^\s*[-*]\s+/, '').trim())
.filter(l => l.length > 0 && !l.startsWith('#'));
}
/** Extract key: value from bold-prefixed lines like "**Key:** Value" */
function extractBoldField(text: string, key: string): string | null {
const regex = new RegExp(`^\\*\\*${escapeRegex(key)}:\\*\\*\\s*(.+)$`, 'm');
const match = regex.exec(text);
return match ? match[1].trim() : null;
}
// ─── Roadmap Parser ────────────────────────────────────────────────────────
export function parseRoadmap(content: string): Roadmap {
const lines = content.split('\n');
const h1 = lines.find(l => l.startsWith('# '));
const title = h1 ? h1.slice(2).trim() : '';
const vision = extractBoldField(content, 'Vision') || '';
const scSection = extractSection(content, 'Success Criteria', 2) ||
(() => {
const idx = content.indexOf('**Success Criteria:**');
if (idx === -1) return '';
const rest = content.slice(idx);
const nextSection = rest.indexOf('\n---');
const block = rest.slice(0, nextSection === -1 ? undefined : nextSection);
const firstNewline = block.indexOf('\n');
return firstNewline === -1 ? '' : block.slice(firstNewline + 1);
})();
const successCriteria = scSection ? parseBullets(scSection) : [];
// Slices
const slicesSection = extractSection(content, 'Slices');
const slices: RoadmapSliceEntry[] = [];
if (slicesSection) {
const checkboxItems = slicesSection.split('\n');
let currentSlice: RoadmapSliceEntry | null = null;
for (const line of checkboxItems) {
const cbMatch = line.match(/^-\s+\[([ xX])\]\s+\*\*(\w+):\s+(.+?)\*\*\s*(.*)/);
if (cbMatch) {
if (currentSlice) slices.push(currentSlice);
const done = cbMatch[1].toLowerCase() === 'x';
const id = cbMatch[2];
const sliceTitle = cbMatch[3];
const rest = cbMatch[4];
const riskMatch = rest.match(/`risk:(\w+)`/);
const risk = (riskMatch ? riskMatch[1] : 'low') as RiskLevel;
const depsMatch = rest.match(/`depends:\[([^\]]*)\]`/);
const depends = depsMatch && depsMatch[1].trim()
? depsMatch[1].split(',').map(s => s.trim())
: [];
currentSlice = { id, title: sliceTitle, risk, depends, done, demo: '' };
} else if (currentSlice && line.trim().startsWith('>')) {
const demoText = line.trim().replace(/^>\s*/, '').replace(/^After this:\s*/i, '');
currentSlice.demo = demoText;
}
}
if (currentSlice) slices.push(currentSlice);
}
// Boundary map
const boundaryMap: BoundaryMapEntry[] = [];
const bmSection = extractSection(content, 'Boundary Map');
if (bmSection) {
const h3Sections = extractAllSections(bmSection, 3);
for (const [heading, sectionContent] of h3Sections) {
const arrowMatch = heading.match(/^(\S+)\s*→\s*(\S+)/);
if (!arrowMatch) continue;
const fromSlice = arrowMatch[1];
const toSlice = arrowMatch[2];
let produces = '';
let consumes = '';
const prodMatch = sectionContent.match(/^Produces:\s*\n([\s\S]*?)(?=^Consumes|$)/m);
if (prodMatch) produces = prodMatch[1].trim();
const consMatch = sectionContent.match(/^Consumes[^:]*:\s*\n?([\s\S]*?)$/m);
if (consMatch) consumes = consMatch[1].trim();
if (!consumes) {
const singleCons = sectionContent.match(/^Consumes[^:]*:\s*(.+)$/m);
if (singleCons) consumes = singleCons[1].trim();
}
boundaryMap.push({ fromSlice, toSlice, produces, consumes });
}
}
return { title, vision, successCriteria, slices, boundaryMap };
}
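The slice-entry grammar that the checkbox regex above expects, traced on one sample ROADMAP line; the slice ID, title, and tags are illustrative:

```typescript
// "- [x] **<id>: <title>** `risk:<level>` `depends:[<ids>]`" plus an
// optional "> After this: ..." demo line underneath.
const sliceLine = "- [x] **S01: Status widget** `risk:high` `depends:[S00]`";
const cb = sliceLine.match(/^-\s+\[([ xX])\]\s+\*\*(\w+):\s+(.+?)\*\*\s*(.*)/)!;
const done = cb[1].toLowerCase() === "x";                  // true
const rest = cb[4];
const risk = rest.match(/`risk:(\w+)`/)?.[1];              // "high"
const depends = rest.match(/`depends:\[([^\]]*)\]`/)![1]
  .split(",").map(s => s.trim());                          // ["S00"]
```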
// ─── Slice Plan Parser ─────────────────────────────────────────────────────
export function parsePlan(content: string): SlicePlan {
const lines = content.split('\n');
const h1 = lines.find(l => l.startsWith('# '));
let id = '';
let title = '';
if (h1) {
const match = h1.match(/^#\s+(\w+):\s+(.+)/);
if (match) {
id = match[1];
title = match[2].trim();
} else {
title = h1.slice(2).trim();
}
}
const goal = extractBoldField(content, 'Goal') || '';
const demo = extractBoldField(content, 'Demo') || '';
const mhSection = extractSection(content, 'Must-Haves');
const mustHaves = mhSection ? parseBullets(mhSection) : [];
const tasksSection = extractSection(content, 'Tasks');
const tasks: TaskPlanEntry[] = [];
if (tasksSection) {
const taskLines = tasksSection.split('\n');
let currentTask: TaskPlanEntry | null = null;
for (const line of taskLines) {
const cbMatch = line.match(/^-\s+\[([ xX])\]\s+\*\*(\w+):\s+(.+?)\*\*\s*(.*)/);
if (cbMatch) {
if (currentTask) tasks.push(currentTask);
const rest = cbMatch[4] || '';
const estMatch = rest.match(/`est:([^`]+)`/);
const estimate = estMatch ? estMatch[1] : '';
currentTask = {
id: cbMatch[2],
title: cbMatch[3],
description: '',
done: cbMatch[1].toLowerCase() === 'x',
estimate,
};
} else if (currentTask && line.match(/^\s*-\s+Files:\s*(.*)/)) {
const filesMatch = line.match(/^\s*-\s+Files:\s*(.*)/);
if (filesMatch) {
currentTask.files = filesMatch[1]
.split(',')
.map(f => f.replace(/`/g, '').trim())
.filter(f => f.length > 0);
}
} else if (currentTask && line.match(/^\s*-\s+Verify:\s*(.*)/)) {
const verifyMatch = line.match(/^\s*-\s+Verify:\s*(.*)/);
if (verifyMatch) {
currentTask.verify = verifyMatch[1].trim();
}
} else if (currentTask && line.trim() && !line.startsWith('#')) {
const desc = line.trim();
if (desc) {
currentTask.description = currentTask.description
? currentTask.description + ' ' + desc
: desc;
}
}
}
if (currentTask) tasks.push(currentTask);
}
const filesSection = extractSection(content, 'Files Likely Touched');
const filesLikelyTouched = filesSection ? parseBullets(filesSection) : [];
return { id, title, goal, demo, mustHaves, tasks, filesLikelyTouched };
}
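A sample task line and `Files:` sub-bullet in the shape parsePlan consumes, run through the same regexes used above; the task ID, title, and paths are illustrative:

```typescript
const taskLine = "- [ ] **T02: Wire the login form** `est:2h`";
const cb = taskLine.match(/^-\s+\[([ xX])\]\s+\*\*(\w+):\s+(.+?)\*\*\s*(.*)/)!;
const estimate = cb[4].match(/`est:([^`]+)`/)?.[1];   // "2h"

const filesLine = "  - Files: `src/login.ts`, `src/form.ts`";
const files = filesLine.match(/^\s*-\s+Files:\s*(.*)/)![1]
  .split(",").map(f => f.replace(/`/g, "").trim());   // ["src/login.ts", "src/form.ts"]
```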
// ─── Summary Parser ────────────────────────────────────────────────────────
export function parseSummary(content: string): Summary {
const [fmLines, body] = splitFrontmatter(content);
const fm = fmLines ? parseFrontmatterMap(fmLines) : {};
const frontmatter: SummaryFrontmatter = {
id: (fm.id as string) || '',
parent: (fm.parent as string) || '',
milestone: (fm.milestone as string) || '',
provides: (fm.provides as string[]) || [],
requires: ((fm.requires as Array<Record<string, string>>) || []).map(r => ({
slice: r.slice || '',
provides: r.provides || '',
})),
affects: (fm.affects as string[]) || [],
key_files: (fm.key_files as string[]) || [],
key_decisions: (fm.key_decisions as string[]) || [],
patterns_established: (fm.patterns_established as string[]) || [],
drill_down_paths: (fm.drill_down_paths as string[]) || [],
observability_surfaces: (fm.observability_surfaces as string[]) || [],
duration: (fm.duration as string) || '',
verification_result: (fm.verification_result as string) || 'untested',
completed_at: (fm.completed_at as string) || '',
blocker_discovered: fm.blocker_discovered === 'true' || fm.blocker_discovered === true,
};
const bodyLines = body.split('\n');
const h1 = bodyLines.find(l => l.startsWith('# '));
const title = h1 ? h1.slice(2).trim() : '';
const h1Idx = bodyLines.indexOf(h1 || '');
let oneLiner = '';
for (let i = h1Idx + 1; i < bodyLines.length; i++) {
const line = bodyLines[i].trim();
if (!line) continue;
if (line.startsWith('**') && line.endsWith('**')) {
oneLiner = line.slice(2, -2);
}
break;
}
const whatHappened = extractSection(body, 'What Happened') || '';
const deviations = extractSection(body, 'Deviations') || '';
const filesSection = extractSection(body, 'Files Created/Modified') || extractSection(body, 'Files Modified');
const filesModified: FileModified[] = [];
if (filesSection) {
for (const line of filesSection.split('\n')) {
const trimmed = line.replace(/^\s*[-*]\s+/, '').trim();
if (!trimmed || trimmed.startsWith('#')) continue;
const fileMatch = trimmed.match(/^`([^`]+)`\s*[—–-]\s*(.+)/);
if (fileMatch) {
filesModified.push({ path: fileMatch[1], description: fileMatch[2].trim() });
}
}
}
return { frontmatter, title, oneLiner, whatHappened, deviations, filesModified };
}
// ─── Continue Parser ───────────────────────────────────────────────────────
export function parseContinue(content: string): Continue {
const [fmLines, body] = splitFrontmatter(content);
const fm = fmLines ? parseFrontmatterMap(fmLines) : {};
const frontmatter: ContinueFrontmatter = {
milestone: (fm.milestone as string) || '',
slice: (fm.slice as string) || '',
task: (fm.task as string) || '',
step: typeof fm.step === 'string' ? parseInt(fm.step) : (fm.step as number) || 0,
totalSteps: typeof fm.total_steps === 'string' ? parseInt(fm.total_steps) : (fm.total_steps as number) ||
(typeof fm.totalSteps === 'string' ? parseInt(fm.totalSteps) : (fm.totalSteps as number) || 0),
status: ((fm.status as string) || 'in_progress') as ContinueStatus,
savedAt: (fm.saved_at as string) || (fm.savedAt as string) || '',
};
const completedWork = extractSection(body, 'Completed Work') || '';
const remainingWork = extractSection(body, 'Remaining Work') || '';
const decisions = extractSection(body, 'Decisions Made') || '';
const context = extractSection(body, 'Context') || '';
const nextAction = extractSection(body, 'Next Action') || '';
return { frontmatter, completedWork, remainingWork, decisions, context, nextAction };
}
// ─── Continue Formatter ────────────────────────────────────────────────────
function formatFrontmatter(data: Record<string, unknown>): string {
const lines: string[] = ['---'];
for (const [key, value] of Object.entries(data)) {
if (value === undefined || value === null) continue;
if (Array.isArray(value)) {
if (value.length === 0) {
lines.push(`${key}: []`);
} else if (typeof value[0] === 'object' && value[0] !== null) {
lines.push(`${key}:`);
for (const obj of value) {
const entries = Object.entries(obj as Record<string, unknown>);
if (entries.length > 0) {
lines.push(`  - ${entries[0][0]}: ${entries[0][1]}`);
for (let i = 1; i < entries.length; i++) {
lines.push(`    ${entries[i][0]}: ${entries[i][1]}`);
}
}
}
} else {
lines.push(`${key}:`);
for (const item of value) {
lines.push(`  - ${item}`);
}
}
} else {
lines.push(`${key}: ${value}`);
}
}
lines.push('---');
return lines.join('\n');
}
export function formatContinue(cont: Continue): string {
const fm = cont.frontmatter;
const fmData: Record<string, unknown> = {
milestone: fm.milestone,
slice: fm.slice,
task: fm.task,
step: fm.step,
total_steps: fm.totalSteps,
status: fm.status,
saved_at: fm.savedAt,
};
const lines: string[] = [];
lines.push(formatFrontmatter(fmData));
lines.push('');
lines.push('## Completed Work');
lines.push(cont.completedWork);
lines.push('');
lines.push('## Remaining Work');
lines.push(cont.remainingWork);
lines.push('');
lines.push('## Decisions Made');
lines.push(cont.decisions);
lines.push('');
lines.push('## Context');
lines.push(cont.context);
lines.push('');
lines.push('## Next Action');
lines.push(cont.nextAction);
return lines.join('\n');
}
// ─── File I/O ──────────────────────────────────────────────────────────────
/**
* Load a file from disk. Returns content string or null if file doesn't exist.
*/
export async function loadFile(path: string): Promise<string | null> {
try {
return await fs.readFile(path, 'utf-8');
} catch (err: unknown) {
if ((err as NodeJS.ErrnoException).code === 'ENOENT') return null;
throw err;
}
}
/**
* Save content to a file atomically (write to temp, then rename).
* Creates parent directories if needed.
*/
export async function saveFile(path: string, content: string): Promise<void> {
const dir = dirname(path);
await fs.mkdir(dir, { recursive: true });
const tmpPath = path + '.tmp';
await fs.writeFile(tmpPath, content, 'utf-8');
await fs.rename(tmpPath, path);
}
export function parseRequirementCounts(content: string | null): RequirementCounts {
const counts: RequirementCounts = {
active: 0,
validated: 0,
deferred: 0,
outOfScope: 0,
blocked: 0,
total: 0,
};
if (!content) return counts;
const sections = [
{ key: 'active', heading: 'Active' },
{ key: 'validated', heading: 'Validated' },
{ key: 'deferred', heading: 'Deferred' },
{ key: 'outOfScope', heading: 'Out of Scope' },
] as const;
for (const section of sections) {
const text = extractSection(content, section.heading, 2);
if (!text) continue;
const matches = text.match(/^###\s+R\d+\s+—/gm);
counts[section.key] = matches ? matches.length : 0;
}
const blockedMatches = content.match(/^-\s+Status:\s+blocked\s*$/gim);
counts.blocked = blockedMatches ? blockedMatches.length : 0;
counts.total = counts.active + counts.validated + counts.deferred + counts.outOfScope;
return counts;
}
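The per-section heading pattern counted above, applied to a sample "Active" section body; the requirement titles are illustrative:

```typescript
// Each requirement heading has the shape "### R<n> — <title>".
const section = "### R1 — Login works\nDetails...\n\n### R2 — Sessions persist\nDetails...";
const matches = section.match(/^###\s+R\d+\s+—/gm);
const active = matches ? matches.length : 0; // 2
```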
// ─── Task Plan Must-Haves Parser ───────────────────────────────────────────
/**
* Parse must-have items from a task plan's `## Must-Haves` section.
* Returns structured items with checkbox state. Handles YAML frontmatter,
* all common checkbox variants (`[ ]`, `[x]`, `[X]`), plain bullets (no checkbox),
* and indented variants. Returns empty array when the section is missing or empty.
*/
export function parseTaskPlanMustHaves(content: string): Array<{ text: string; checked: boolean }> {
const [, body] = splitFrontmatter(content);
const sectionText = extractSection(body, 'Must-Haves');
if (!sectionText) return [];
const bullets = parseBullets(sectionText);
if (bullets.length === 0) return [];
return bullets.map(line => {
const cbMatch = line.match(/^\[([xX ])\]\s+(.+)/);
if (cbMatch) {
return {
text: cbMatch[2].trim(),
checked: cbMatch[1].toLowerCase() === 'x',
};
}
// No checkbox — treat as unchecked with full line as text
return { text: line.trim(), checked: false };
});
}
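One Must-Haves bullet through the two stages above: parseBullets strips the list marker, then the checkbox regex separates state from text. The bullet text is illustrative:

```typescript
const bullet = "- [x] Doctor reports missing summaries";
const stripped = bullet.replace(/^\s*[-*]\s+/, "").trim(); // "[x] Doctor reports missing summaries"
const cb = stripped.match(/^\[([xX ])\]\s+(.+)/)!;
const item = { text: cb[2].trim(), checked: cb[1].toLowerCase() === "x" };
// item → { text: "Doctor reports missing summaries", checked: true }
```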
// ─── Must-Have Summary Matching ────────────────────────────────────────────
/** Common short words to exclude from substring matching. */
const COMMON_WORDS = new Set([
'the', 'and', 'for', 'are', 'but', 'not', 'you', 'all', 'can', 'had', 'her',
'was', 'one', 'our', 'out', 'has', 'its', 'let', 'say', 'she', 'too', 'use',
'with', 'have', 'from', 'this', 'that', 'they', 'been', 'each', 'when', 'will',
'does', 'into', 'also', 'than', 'them', 'then', 'some', 'what', 'only', 'just',
'more', 'make', 'like', 'made', 'over', 'such', 'take', 'most', 'very', 'must',
'file', 'test', 'tests', 'task', 'new', 'add', 'added', 'existing',
]);
/**
* Count how many must-have items are mentioned in a summary.
*
* Matching heuristic per must-have:
* 1. Extract all backtick-enclosed code tokens (e.g. `inspectFoo`).
* If any code token appears case-insensitively in the summary, count as mentioned.
* 2. If no code tokens exist, check if any significant word (≥4 chars, not a common word)
* from the must-have text appears in the summary (case-insensitive).
*
* Returns the count of must-haves that had at least one match.
*/
export function countMustHavesMentionedInSummary(
mustHaves: Array<{ text: string; checked: boolean }>,
summaryContent: string,
): number {
if (!summaryContent || mustHaves.length === 0) return 0;
const summaryLower = summaryContent.toLowerCase();
let count = 0;
for (const mh of mustHaves) {
// Extract backtick-enclosed code tokens
const codeTokens: string[] = [];
const codeRegex = /`([^`]+)`/g;
let match: RegExpExecArray | null;
while ((match = codeRegex.exec(mh.text)) !== null) {
codeTokens.push(match[1]);
}
if (codeTokens.length > 0) {
// Strategy 1: any code token found in summary (case-insensitive)
const found = codeTokens.some(token => summaryLower.includes(token.toLowerCase()));
if (found) count++;
} else {
// Strategy 2: significant substring matching
// Split into words, keep words ≥4 chars that aren't common
const words = mh.text.replace(/[^\w\s]/g, ' ').split(/\s+/).filter(w =>
w.length >= 4 && !COMMON_WORDS.has(w.toLowerCase())
);
const found = words.some(word => summaryLower.includes(word.toLowerCase()));
if (found) count++;
}
}
return count;
}
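Both matching strategies above on tiny samples; the must-have and summary texts are illustrative, and the common-word exclusion is omitted here for brevity:

```typescript
const summaryLower = "added extractuattype and wired it into auto mode.";
// Strategy 1: a backtick code token appears case-insensitively in the summary.
const withToken = "Expose `extractUatType` from files.ts";
const tokens = [...withToken.matchAll(/`([^`]+)`/g)].map(m => m[1]);
const hit1 = tokens.some(t => summaryLower.includes(t.toLowerCase())); // true
// Strategy 2: no code tokens, so fall back to significant words (≥4 chars).
const noToken = "Wired doctor output into auto mode";
const words = noToken.replace(/[^\w\s]/g, " ").split(/\s+/).filter(w => w.length >= 4);
const hit2 = words.some(w => summaryLower.includes(w.toLowerCase())); // true ("wired")
```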
// ─── UAT Type Extractor ────────────────────────────────────────────────────
/**
* The four UAT classification types recognised by GSD auto-mode.
* `undefined` is returned (not this union) when no type can be determined.
*/
export type UatType = 'artifact-driven' | 'live-runtime' | 'human-experience' | 'mixed';
/**
* Extract the UAT type from a UAT file's raw content.
*
* UAT files have no YAML frontmatter; pass raw file content directly.
* Classification is leading-keyword-only: e.g. `mixed (artifact-driven + live-runtime)` → `'mixed'`.
*
* Returns `undefined` when:
* - the `## UAT Type` section is absent
* - no `UAT mode:` bullet is found in the section
* - the value does not start with a recognised keyword
*/
export function extractUatType(content: string): UatType | undefined {
const sectionText = extractSection(content, 'UAT Type');
if (!sectionText) return undefined;
const bullets = parseBullets(sectionText);
const modeBullet = bullets.find(b => b.startsWith('UAT mode:'));
if (!modeBullet) return undefined;
const rawValue = modeBullet.slice('UAT mode:'.length).trim().toLowerCase();
if (rawValue.startsWith('artifact-driven')) return 'artifact-driven';
if (rawValue.startsWith('live-runtime')) return 'live-runtime';
if (rawValue.startsWith('human-experience')) return 'human-experience';
if (rawValue.startsWith('mixed')) return 'mixed';
return undefined;
}
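The leading-keyword classification applied above, traced on one sample `UAT mode:` value (the value is illustrative):

```typescript
const rawValue = "Mixed (artifact-driven + live-runtime)".trim().toLowerCase();
const type = rawValue.startsWith("artifact-driven") ? "artifact-driven"
  : rawValue.startsWith("live-runtime") ? "live-runtime"
  : rawValue.startsWith("human-experience") ? "human-experience"
  : rawValue.startsWith("mixed") ? "mixed"
  : undefined; // → "mixed"
```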
/**
* Extract the `depends_on` list from M00x-CONTEXT.md YAML frontmatter.
* Returns [] when: content is null, no frontmatter block, field absent, or field is empty.
* Normalizes each dep ID to uppercase (e.g. 'm001' → 'M001').
*/
export function parseContextDependsOn(content: string | null): string[] {
if (!content) return [];
const [fmLines] = splitFrontmatter(content);
if (!fmLines) return [];
const fm = parseFrontmatterMap(fmLines);
const raw = fm['depends_on'];
if (!Array.isArray(raw) || raw.length === 0) return [];
return (raw as string[]).map(s => String(s).toUpperCase().trim()).filter(Boolean);
}
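The normalization step applied above to a raw `depends_on` list; the IDs are illustrative:

```typescript
const raw = ["m001", " M002 ", ""];
const deps = raw.map(s => String(s).toUpperCase().trim()).filter(Boolean);
// deps → ["M001", "M002"]
```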
/**
* Inline the prior milestone's SUMMARY.md as context for the current milestone's planning prompt.
* Returns null when: (1) `mid` is the first milestone, (2) prior milestone has no SUMMARY file.
*
* Scans the milestones directory using the same readdirSync + sort + M\d+ match pattern
* as findMilestoneIds in state.ts.
*/
export async function inlinePriorMilestoneSummary(mid: string, base: string): Promise<string | null> {
const dir = milestonesDir(base);
let sorted: string[];
try {
sorted = readdirSync(dir, { withFileTypes: true })
.filter(d => d.isDirectory())
.map(d => {
const match = d.name.match(/^(M\d+)/);
return match ? match[1] : d.name;
})
.sort();
} catch {
return null;
}
const idx = sorted.indexOf(mid);
if (idx <= 0) return null;
const prevMid = sorted[idx - 1];
const absPath = resolveMilestoneFile(base, prevMid, "SUMMARY");
const relPath = relMilestoneFile(base, prevMid, "SUMMARY");
const content = absPath ? await loadFile(absPath) : null;
if (!content) return null;
return `### Prior Milestone Summary\nSource: \`${relPath}\`\n\n${content.trim()}`;
}

/**
* GSD .gitignore bootstrapper
*
* Ensures a baseline .gitignore exists with universally-correct patterns.
* Idempotent: only appends entries that are missing.
*/
import { join } from "node:path";
import { existsSync, readFileSync, writeFileSync } from "node:fs";
/**
* Patterns that are always correct regardless of project type.
* No one ever wants these tracked.
*/
const BASELINE_PATTERNS = [
// ── GSD runtime (not source artifacts) ──
".gsd/activity/",
".gsd/runtime/",
".gsd/auto.lock",
".gsd/metrics.json",
".gsd/STATE.md",
// ── OS junk ──
".DS_Store",
"Thumbs.db",
// ── Editor / IDE ──
"*.swp",
"*.swo",
"*~",
".idea/",
".vscode/",
"*.code-workspace",
// ── Environment / secrets ──
".env",
".env.*",
"!.env.example",
// ── Node / JS / TS ──
"node_modules/",
".next/",
"dist/",
"build/",
// ── Python ──
"__pycache__/",
"*.pyc",
".venv/",
"venv/",
// ── Rust ──
"target/",
// ── Go ──
"vendor/",
// ── Misc build artifacts ──
"*.log",
"coverage/",
".cache/",
"tmp/",
];
/**
* Ensure basePath/.gitignore contains all baseline patterns.
* Creates the file if missing; appends only missing lines if it exists.
* Returns true if the file was created or modified, false if already complete.
*/
export function ensureGitignore(basePath: string): boolean {
const gitignorePath = join(basePath, ".gitignore");
let existing = "";
if (existsSync(gitignorePath)) {
existing = readFileSync(gitignorePath, "utf-8");
}
// Parse existing lines (trimmed, ignoring comments and blanks)
const existingLines = new Set(
existing
.split("\n")
.map((l) => l.trim())
.filter((l) => l && !l.startsWith("#")),
);
// Find patterns not yet present
const missing = BASELINE_PATTERNS.filter((p) => !existingLines.has(p));
if (missing.length === 0) return false;
// Build the block to append
const block = [
"",
"# ── GSD baseline (auto-generated) ──",
...missing,
"",
].join("\n");
// Ensure existing content ends with a newline before appending
const prefix = existing && !existing.endsWith("\n") ? "\n" : "";
writeFileSync(gitignorePath, existing + prefix + block, "utf-8");
return true;
}
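The dedup step above can be isolated into a pure helper for testing. This is an illustrative sketch, not part of the module; the helper name `computeMissingPatterns` is hypothetical.

```typescript
// Hypothetical pure helper mirroring ensureGitignore's dedup step:
// given the existing .gitignore text and the baseline patterns, return
// the patterns that still need to be appended. Comments and blank
// lines in the existing file are ignored, as in the function above.
function computeMissingPatterns(existing: string, baseline: string[]): string[] {
  const present = new Set(
    existing
      .split("\n")
      .map((l) => l.trim())
      .filter((l) => l && !l.startsWith("#")),
  );
  return baseline.filter((p) => !present.has(p));
}
```

Keeping the set-difference pure makes the idempotency guarantee trivially testable without touching the filesystem.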


@@ -0,0 +1,800 @@
/**
* GSD Guided Flow: Smart Entry Wizard
*
* One function: showSmartEntry(). Reads state from disk, shows a contextual
* wizard via showNextAction(), and dispatches through GSD-WORKFLOW.md.
* No execution state, no hooks, no tools; the LLM does the rest.
*/
import type { ExtensionAPI, ExtensionContext, ExtensionCommandContext } from "@mariozechner/pi-coding-agent";
import { showNextAction } from "../shared/next-action-ui.js";
import { loadFile, parseRoadmap } from "./files.js";
import { loadPrompt } from "./prompt-loader.js";
import { deriveState } from "./state.js";
import { startAuto } from "./auto.js";
import { readCrashLock, clearLock, formatCrashInfo } from "./crash-recovery.js";
import {
gsdRoot, milestonesDir, resolveMilestoneFile,
resolveSliceFile, resolveSlicePath, resolveGsdRootFile, relGsdRootFile,
relMilestoneFile, relSliceFile, relSlicePath,
} from "./paths.js";
import { join } from "node:path";
import { readFileSync, existsSync, mkdirSync, readdirSync } from "node:fs";
import { execSync } from "node:child_process";
import { ensureGitignore } from "./gitignore.js";
// ─── Auto-start after discuss ─────────────────────────────────────────────────
/** Stashed context + flag for auto-starting after discuss phase completes */
let pendingAutoStart: {
ctx: ExtensionCommandContext;
pi: ExtensionAPI;
basePath: string;
milestoneId: string; // the milestone being discussed
} | null = null;
/** Called from agent_end to check if auto-mode should start after discuss */
export function checkAutoStartAfterDiscuss(): boolean {
if (!pendingAutoStart) return false;
const { ctx, pi, basePath, milestoneId } = pendingAutoStart;
// Don't fire until the discuss phase has actually produced a context file
// for the milestone being discussed. agent_end fires after every LLM turn,
// including the initial "What do you want to build?" response — we need to
// wait for the full conversation to complete and the LLM to write CONTEXT.md.
const contextFile = resolveMilestoneFile(basePath, milestoneId, "CONTEXT");
if (!contextFile) return false; // no context yet — keep waiting
pendingAutoStart = null;
startAuto(ctx, pi, basePath, false).catch(() => {});
return true;
}
// ─── Types ────────────────────────────────────────────────────────────────────
type UIContext = ExtensionContext;
// ─── Helpers ──────────────────────────────────────────────────────────────────
/**
* Read GSD-WORKFLOW.md and dispatch it to the LLM with a contextual note.
* This is the only way the wizard triggers work; everything else is the LLM's job.
*/
function dispatchWorkflow(pi: ExtensionAPI, note: string, customType = "gsd-run"): void {
const workflowPath = process.env.GSD_WORKFLOW_PATH ?? join(process.env.HOME ?? "~", ".pi", "GSD-WORKFLOW.md");
const workflow = readFileSync(workflowPath, "utf-8");
pi.sendMessage(
{
customType,
content: `Read the following GSD workflow protocol and execute exactly.\n\n${workflow}\n\n## Your Task\n\n${note}`,
display: false,
},
{ triggerTurn: true },
);
}
/**
* Build the discuss-and-plan prompt for a new milestone.
* Used by all three "new milestone" paths (first ever, no active, all complete).
*/
function buildDiscussPrompt(nextId: string, preamble: string, basePath: string): string {
const milestoneDirAbs = join(basePath, ".gsd", "milestones", nextId);
return loadPrompt("discuss", {
milestoneId: nextId,
preamble,
contextAbsPath: join(milestoneDirAbs, `${nextId}-CONTEXT.md`),
roadmapAbsPath: join(milestoneDirAbs, `${nextId}-ROADMAP.md`),
});
}
function findMilestoneIds(basePath: string): string[] {
const dir = milestonesDir(basePath);
try {
return readdirSync(dir, { withFileTypes: true })
.filter((d) => d.isDirectory())
.map((d) => {
const match = d.name.match(/^(M\d+)/);
return match ? match[1] : d.name;
})
.sort();
} catch {
return [];
}
}
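The directory-name normalization above can be sketched in isolation. The helper name `normalizeMilestoneDirs` is illustrative only; it is not part of the module.

```typescript
// Illustrative sketch of how findMilestoneIds normalizes directory
// names: an "M<digits>" prefix wins, any other name is kept verbatim,
// and the result is sorted lexicographically.
function normalizeMilestoneDirs(dirNames: string[]): string[] {
  return dirNames
    .map((name) => {
      const match = name.match(/^(M\d+)/);
      return match ? match[1] : name;
    })
    .sort();
}
```

Note that the lexicographic sort only orders IDs correctly because they are zero-padded to a fixed width (M001, M002, ...).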
// ─── Queue ─────────────────────────────────────────────────────────────────────
/**
* Queue future milestones via conversational intake.
*
* Safe to run while auto-mode is executing: it only writes to future milestone
* directories (which auto-mode won't touch until it reaches them) and appends
* to project.md / queue.md.
*
* The flow:
* 1. Build context about all existing milestones (complete, active, pending)
* 2. Dispatch the queue prompt; the LLM discusses with the user and assesses scope
* 3. LLM writes CONTEXT.md files for new milestones (no roadmaps; those are planned JIT)
* 4. Auto-mode picks them up naturally when it advances past current work
*
* Root durable artifacts use uppercase names like PROJECT.md and QUEUE.md.
*/
export async function showQueue(
ctx: ExtensionCommandContext,
pi: ExtensionAPI,
basePath: string,
): Promise<void> {
// ── Ensure .gsd/ exists ─────────────────────────────────────────────
const gsd = gsdRoot(basePath);
if (!existsSync(gsd)) {
ctx.ui.notify("No GSD project found. Run /gsd to start one first.", "warning");
return;
}
const state = await deriveState(basePath);
const milestoneIds = findMilestoneIds(basePath);
if (milestoneIds.length === 0) {
ctx.ui.notify("No milestones exist yet. Run /gsd to create the first one.", "warning");
return;
}
// ── Build existing milestones context for the prompt ────────────────
const existingContext = await buildExistingMilestonesContext(basePath, milestoneIds, state);
// ── Determine next milestone ID ─────────────────────────────────────
const maxNum = milestoneIds.reduce((max, id) => {
const num = parseInt(id.replace(/^M/, ""), 10);
return num > max ? num : max;
}, 0);
const nextId = `M${String(maxNum + 1).padStart(3, "0")}`;
const nextIdPlus1 = `M${String(maxNum + 2).padStart(3, "0")}`;
// ── Build preamble ──────────────────────────────────────────────────
const activePart = state.activeMilestone
? `Currently executing: ${state.activeMilestone.id}: ${state.activeMilestone.title} (phase: ${state.phase}).`
: "No milestone currently active.";
const pendingCount = state.registry.filter(m => m.status === "pending").length;
const completeCount = state.registry.filter(m => m.status === "complete").length;
const preamble = [
`Queuing new work onto an existing GSD project.`,
activePart,
`${completeCount} milestone(s) complete, ${pendingCount} pending.`,
`Next available milestone ID: ${nextId}.`,
].join(" ");
// ── Dispatch the queue prompt ───────────────────────────────────────
const prompt = loadPrompt("queue", {
preamble,
nextId,
nextIdPlus1,
existingMilestonesContext: existingContext,
});
pi.sendMessage(
{
customType: "gsd-queue",
content: prompt,
display: false,
},
{ triggerTurn: true },
);
}
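The next-ID derivation used above (highest numeric suffix, zero-padded to three digits) can be condensed into a standalone sketch. The helper name `nextMilestoneId` is illustrative, not part of the module.

```typescript
// Standalone sketch of the next-milestone-ID derivation: take the
// highest numeric suffix across existing IDs and zero-pad to three
// digits. Non-numeric entries are ignored rather than poisoning the max.
function nextMilestoneId(ids: string[]): string {
  const maxNum = ids.reduce((max, id) => {
    const num = parseInt(id.replace(/^M/, ""), 10);
    return Number.isNaN(num) ? max : Math.max(max, num);
  }, 0);
  return `M${String(maxNum + 1).padStart(3, "0")}`;
}
```

Using the max rather than the count keeps IDs collision-free even when the sequence has gaps (e.g. a deleted milestone directory).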
/**
* Build a context block describing all existing milestones for the queue prompt.
* Gives the LLM enough information to dedup, sequence, and dependency-check.
*/
async function buildExistingMilestonesContext(
basePath: string,
milestoneIds: string[],
state: import("./types.js").GSDState,
): Promise<string> {
const sections: string[] = [];
// Include PROJECT.md if it exists — it has the milestone sequence and project description
const projectPath = resolveGsdRootFile(basePath, "PROJECT");
if (existsSync(projectPath)) {
const projectContent = await loadFile(projectPath);
if (projectContent) {
sections.push(`### Project Overview\nSource: \`${relGsdRootFile("PROJECT")}\`\n\n${projectContent.trim()}`);
}
}
// Include DECISIONS.md if it exists — architectural decisions inform new milestone scoping
const decisionsPath = resolveGsdRootFile(basePath, "DECISIONS");
if (existsSync(decisionsPath)) {
const decisionsContent = await loadFile(decisionsPath);
if (decisionsContent) {
sections.push(`### Decisions Register\nSource: \`${relGsdRootFile("DECISIONS")}\`\n\n${decisionsContent.trim()}`);
}
}
// For each milestone, include context and status
for (const mid of milestoneIds) {
const registryEntry = state.registry.find(m => m.id === mid);
const status = registryEntry?.status ?? "unknown";
const title = registryEntry?.title ?? mid;
const parts: string[] = [];
parts.push(`### ${mid}: ${title}\n**Status:** ${status}`);
// Include context file — this is the primary content for understanding scope
const contextFile = resolveMilestoneFile(basePath, mid, "CONTEXT");
if (contextFile) {
const content = await loadFile(contextFile);
if (content) {
parts.push(`\n**Context:**\n${content.trim()}`);
}
}
// For completed milestones, include the summary if it exists
if (status === "complete") {
const summaryFile = resolveMilestoneFile(basePath, mid, "SUMMARY");
if (summaryFile) {
const content = await loadFile(summaryFile);
if (content) {
parts.push(`\n**Summary:**\n${content.trim()}`);
}
}
}
// For active/pending milestones, include the roadmap if it exists
// (shows what's planned but not yet built)
if (status === "active" || status === "pending") {
const roadmapFile = resolveMilestoneFile(basePath, mid, "ROADMAP");
if (roadmapFile) {
const content = await loadFile(roadmapFile);
if (content) {
parts.push(`\n**Roadmap:**\n${content.trim()}`);
}
}
}
sections.push(parts.join("\n"));
}
// Include queue log if it exists — shows what's been queued before
const queuePath = resolveGsdRootFile(basePath, "QUEUE");
if (existsSync(queuePath)) {
const queueContent = await loadFile(queuePath);
if (queueContent) {
sections.push(`### Previous Queue Entries\nSource: \`${relGsdRootFile("QUEUE")}\`\n\n${queueContent.trim()}`);
}
}
return sections.join("\n\n---\n\n");
}
// ─── Discuss Flow ─────────────────────────────────────────────────────────────
/**
* Build a rich inlined-context prompt for discussing a specific slice.
* Preloads roadmap, milestone context, research, decisions, and completed
* slice summaries so the agent can ask grounded UX/behaviour questions
* without wasting a turn reading files.
*/
async function buildDiscussSlicePrompt(
mid: string,
sid: string,
sTitle: string,
base: string,
): Promise<string> {
const inlined: string[] = [];
// Roadmap — always included so the agent sees surrounding slices
const roadmapPath = resolveMilestoneFile(base, mid, "ROADMAP");
const roadmapRel = relMilestoneFile(base, mid, "ROADMAP");
const roadmapContent = roadmapPath ? await loadFile(roadmapPath) : null;
if (roadmapContent) {
inlined.push(`### Milestone Roadmap\nSource: \`${roadmapRel}\`\n\n${roadmapContent.trim()}`);
}
// Milestone context — understanding the full milestone intent
const contextPath = resolveMilestoneFile(base, mid, "CONTEXT");
const contextRel = relMilestoneFile(base, mid, "CONTEXT");
const contextContent = contextPath ? await loadFile(contextPath) : null;
if (contextContent) {
inlined.push(`### Milestone Context\nSource: \`${contextRel}\`\n\n${contextContent.trim()}`);
}
// Milestone research — technical grounding
const researchPath = resolveMilestoneFile(base, mid, "RESEARCH");
const researchRel = relMilestoneFile(base, mid, "RESEARCH");
const researchContent = researchPath ? await loadFile(researchPath) : null;
if (researchContent) {
inlined.push(`### Milestone Research\nSource: \`${researchRel}\`\n\n${researchContent.trim()}`);
}
// Decisions — architectural context that constrains this slice
const decisionsPath = resolveGsdRootFile(base, "DECISIONS");
if (existsSync(decisionsPath)) {
const decisionsContent = await loadFile(decisionsPath);
if (decisionsContent) {
inlined.push(`### Decisions Register\nSource: \`${relGsdRootFile("DECISIONS")}\`\n\n${decisionsContent.trim()}`);
}
}
// Completed slice summaries — what was already built that this slice builds on
if (roadmapContent) {
const roadmap = parseRoadmap(roadmapContent);
for (const s of roadmap.slices) {
if (!s.done || s.id === sid) continue;
const summaryPath = resolveSliceFile(base, mid, s.id, "SUMMARY");
const summaryRel = relSliceFile(base, mid, s.id, "SUMMARY");
const summaryContent = summaryPath ? await loadFile(summaryPath) : null;
if (summaryContent) {
inlined.push(`### ${s.id} Summary (completed)\nSource: \`${summaryRel}\`\n\n${summaryContent.trim()}`);
}
}
}
const inlinedContext = inlined.length > 0
? `## Inlined Context (preloaded — do not re-read these files)\n\n${inlined.join("\n\n---\n\n")}`
: `## Inlined Context\n\n_(no context files found yet — go in blind and ask broad questions)_`;
const sliceDirAbsPath = join(base, ".gsd", "milestones", mid, "slices", sid);
const contextAbsPath = join(sliceDirAbsPath, `${sid}-CONTEXT.md`);
return loadPrompt("guided-discuss-slice", {
milestoneId: mid,
sliceId: sid,
sliceTitle: sTitle,
inlinedContext,
sliceDirAbsPath,
contextAbsPath,
projectRoot: base,
});
}
/**
* /gsd discuss: show a picker of non-done slices and run a slice interview.
* Loops back to the picker after each discussion so the user can chain
* multiple slice interviews in one session.
*/
export async function showDiscuss(
ctx: ExtensionCommandContext,
pi: ExtensionAPI,
basePath: string,
): Promise<void> {
// Guard: no .gsd/ project
if (!existsSync(join(basePath, ".gsd"))) {
ctx.ui.notify("No GSD project found. Run /gsd to start one first.", "warning");
return;
}
const state = await deriveState(basePath);
// Guard: no active milestone
if (!state.activeMilestone) {
ctx.ui.notify("No active milestone. Run /gsd to create one first.", "warning");
return;
}
const mid = state.activeMilestone.id;
const milestoneTitle = state.activeMilestone.title;
// Guard: no roadmap yet
const roadmapFile = resolveMilestoneFile(basePath, mid, "ROADMAP");
const roadmapContent = roadmapFile ? await loadFile(roadmapFile) : null;
if (!roadmapContent) {
ctx.ui.notify("No roadmap yet for this milestone. Run /gsd to plan first.", "warning");
return;
}
const roadmap = parseRoadmap(roadmapContent);
const pendingSlices = roadmap.slices.filter(s => !s.done);
if (pendingSlices.length === 0) {
ctx.ui.notify("All slices are complete — nothing to discuss.", "info");
return;
}
// Loop: show picker, dispatch discuss, repeat until "not_yet"
while (true) {
const actions = pendingSlices.map((s, i) => ({
id: s.id,
label: `${s.id}: ${s.title}`,
description: state.activeSlice?.id === s.id ? "active slice" : "upcoming",
recommended: i === 0,
}));
const choice = await showNextAction(ctx as any, {
title: "GSD — Discuss a slice",
summary: [
`${mid}: ${milestoneTitle}`,
"Pick a slice to interview. Context file will be written when done.",
],
actions,
notYetMessage: "Run /gsd discuss when ready.",
});
if (choice === "not_yet") return;
const chosen = pendingSlices.find(s => s.id === choice);
if (!chosen) return;
const prompt = await buildDiscussSlicePrompt(mid, chosen.id, chosen.title, basePath);
dispatchWorkflow(pi, prompt, "gsd-discuss");
// Wait for the discuss session to finish, then loop back to the picker
await ctx.waitForIdle();
}
}
// ─── Smart Entry Point ────────────────────────────────────────────────────────
/**
* The one wizard. Reads state, shows contextual options, dispatches into the workflow doc.
*/
export async function showSmartEntry(
ctx: ExtensionCommandContext,
pi: ExtensionAPI,
basePath: string,
): Promise<void> {
// ── Ensure git repo exists — GSD needs it for branch-per-slice ──────
try {
execSync("git rev-parse --git-dir", { cwd: basePath, stdio: "pipe" });
} catch {
execSync("git init", { cwd: basePath, stdio: "pipe" });
}
// ── Ensure .gitignore has baseline patterns ──────────────────────────
ensureGitignore(basePath);
// ── No GSD project OR no milestone → Create first/next milestone ────
if (!existsSync(join(basePath, ".gsd"))) {
// Bootstrap .gsd/ silently — the user wants a milestone, not to "init"
const gsd = gsdRoot(basePath);
mkdirSync(join(gsd, "milestones"), { recursive: true });
try {
execSync("git add -A .gsd .gitignore && git commit -m 'chore: init gsd'", {
cwd: basePath,
stdio: "pipe",
});
} catch {
// nothing to commit — that's fine
}
}
// Check for crash from previous auto-mode session
const crashLock = readCrashLock(basePath);
if (crashLock) {
clearLock(basePath);
const resume = await showNextAction(ctx as any, {
title: "GSD — Interrupted Session Detected",
summary: [formatCrashInfo(crashLock)],
actions: [
{ id: "resume", label: "Resume with /gsd auto", description: "Pick up where it left off", recommended: true },
{ id: "continue", label: "Continue manually", description: "Open the wizard as normal" },
],
});
if (resume === "resume") {
await startAuto(ctx, pi, basePath, false);
return;
}
}
const state = await deriveState(basePath);
if (!state.activeMilestone) {
const milestoneIds = findMilestoneIds(basePath);
const maxNum = milestoneIds.reduce((max, id) => {
const num = parseInt(id.replace(/^M/, ""), 10);
return Number.isNaN(num) ? max : Math.max(max, num);
}, 0);
const nextId = `M${String(maxNum + 1).padStart(3, "0")}`;
const isFirst = milestoneIds.length === 0;
if (isFirst) {
// First ever — skip wizard, just ask directly
pendingAutoStart = { ctx, pi, basePath, milestoneId: nextId };
dispatchWorkflow(pi, buildDiscussPrompt(nextId,
`New project, milestone ${nextId}. Do NOT read or explore .gsd/ — it's empty scaffolding.`,
basePath
));
} else {
const choice = await showNextAction(ctx as any, {
title: "GSD — Get Stuff Done",
summary: ["No active milestone."],
actions: [
{
id: "new_milestone",
label: "Create next milestone",
description: "Define what to build next.",
recommended: true,
},
],
notYetMessage: "Run /gsd when ready.",
});
if (choice === "new_milestone") {
pendingAutoStart = { ctx, pi, basePath, milestoneId: nextId };
dispatchWorkflow(pi, buildDiscussPrompt(nextId,
`New milestone ${nextId}.`,
basePath
));
}
}
return;
}
const milestoneId = state.activeMilestone.id;
const milestoneTitle = state.activeMilestone.title;
// ── All milestones complete → New milestone ──────────────────────────
if (state.phase === "complete") {
const choice = await showNextAction(ctx as any, {
title: `GSD — ${milestoneId}: ${milestoneTitle}`,
summary: ["All milestones complete."],
actions: [
{
id: "new_milestone",
label: "Start new milestone",
description: "Define and plan the next milestone.",
recommended: true,
},
{
id: "status",
label: "View status",
description: "Review what was built.",
},
],
notYetMessage: "Run /gsd when ready.",
});
if (choice === "new_milestone") {
const milestoneIds = findMilestoneIds(basePath);
const maxNum = milestoneIds.reduce((max, id) => {
const num = parseInt(id.replace(/^M/, ""), 10);
return Number.isNaN(num) ? max : Math.max(max, num);
}, 0);
const nextId = `M${String(maxNum + 1).padStart(3, "0")}`;
pendingAutoStart = { ctx, pi, basePath, milestoneId: nextId };
dispatchWorkflow(pi, buildDiscussPrompt(nextId,
`New milestone ${nextId}.`,
basePath
));
} else if (choice === "status") {
const { fireStatusViaCommand } = await import("./commands.js");
await fireStatusViaCommand(ctx);
}
return;
}
// ── No active slice ──────────────────────────────────────────────────
if (!state.activeSlice) {
const roadmapFile = resolveMilestoneFile(basePath, milestoneId, "ROADMAP");
const hasRoadmap = !!(roadmapFile && await loadFile(roadmapFile));
if (!hasRoadmap) {
// No roadmap → discuss or plan
const contextFile = resolveMilestoneFile(basePath, milestoneId, "CONTEXT");
const hasContext = !!(contextFile && await loadFile(contextFile));
const actions = [
{
id: "plan",
label: "Create roadmap",
description: hasContext
? "Context captured. Decompose into slices with a boundary map."
: "Decompose the milestone into slices with a boundary map.",
recommended: true,
},
...(!hasContext ? [{
id: "discuss",
label: "Discuss first",
description: "Capture decisions on gray areas before planning.",
}] : []),
];
const choice = await showNextAction(ctx as any, {
title: `GSD — ${milestoneId}: ${milestoneTitle}`,
summary: [hasContext ? "Context captured. Ready to create roadmap." : "New milestone — no roadmap yet."],
actions,
notYetMessage: "Run /gsd when ready.",
});
if (choice === "plan") {
dispatchWorkflow(pi, loadPrompt("guided-plan-milestone", {
milestoneId, milestoneTitle,
}));
} else if (choice === "discuss") {
dispatchWorkflow(pi, loadPrompt("guided-discuss-milestone", {
milestoneId, milestoneTitle,
}));
}
} else {
// Roadmap exists — either blocked or ready for auto
const actions = [
{
id: "auto",
label: "Go auto",
description: "Execute everything automatically until milestone complete.",
recommended: true,
},
{
id: "status",
label: "View status",
description: "See milestone progress and blockers.",
},
];
const choice = await showNextAction(ctx as any, {
title: `GSD — ${milestoneId}: ${milestoneTitle}`,
summary: ["Roadmap exists. Ready to execute."],
actions,
notYetMessage: "Run /gsd status for details.",
});
if (choice === "auto") {
await startAuto(ctx, pi, basePath, false);
} else if (choice === "status") {
const { fireStatusViaCommand } = await import("./commands.js");
await fireStatusViaCommand(ctx);
}
}
return;
}
const sliceId = state.activeSlice.id;
const sliceTitle = state.activeSlice.title;
// ── Slice needs planning ─────────────────────────────────────────────
if (state.phase === "planning") {
const contextFile = resolveSliceFile(basePath, milestoneId, sliceId, "CONTEXT");
const researchFile = resolveSliceFile(basePath, milestoneId, sliceId, "RESEARCH");
const hasContext = !!(contextFile && await loadFile(contextFile));
const hasResearch = !!(researchFile && await loadFile(researchFile));
const actions = [
{
id: "plan",
label: `Plan ${sliceId}`,
description: `Decompose "${sliceTitle}" into tasks with must-haves.`,
recommended: true,
},
...(!hasContext ? [{
id: "discuss",
label: `Discuss ${sliceId} first`,
description: "Capture context and decisions for this slice.",
}] : []),
...(!hasResearch ? [{
id: "research",
label: `Research ${sliceId} first`,
description: "Scout codebase and relevant docs.",
}] : []),
{
id: "status",
label: "View status",
description: "See milestone progress.",
},
];
const summaryParts: string[] = [];
if (hasContext) summaryParts.push("context ✓");
if (hasResearch) summaryParts.push("research ✓");
const summaryLine = summaryParts.length > 0
? `${sliceId}: ${sliceTitle} (${summaryParts.join(", ")})`
: `${sliceId}: ${sliceTitle} — ready for planning.`;
const choice = await showNextAction(ctx as any, {
title: `GSD — ${milestoneId} / ${sliceId}: ${sliceTitle}`,
summary: [summaryLine],
actions,
notYetMessage: "Run /gsd when ready.",
});
if (choice === "plan") {
dispatchWorkflow(pi, loadPrompt("guided-plan-slice", {
milestoneId, sliceId, sliceTitle,
}));
} else if (choice === "discuss") {
dispatchWorkflow(pi, await buildDiscussSlicePrompt(milestoneId, sliceId, sliceTitle, basePath));
} else if (choice === "research") {
dispatchWorkflow(pi, loadPrompt("guided-research-slice", {
milestoneId, sliceId, sliceTitle,
}));
} else if (choice === "status") {
const { fireStatusViaCommand } = await import("./commands.js");
await fireStatusViaCommand(ctx);
}
return;
}
// ── All tasks done → Complete slice ──────────────────────────────────
if (state.phase === "summarizing") {
const choice = await showNextAction(ctx as any, {
title: `GSD — ${milestoneId} / ${sliceId}: ${sliceTitle}`,
summary: ["All tasks complete. Ready for slice summary."],
actions: [
{
id: "complete",
label: `Complete ${sliceId}`,
description: "Write slice summary, UAT, mark done, and squash-merge to main.",
recommended: true,
},
{
id: "status",
label: "View status",
description: "Review tasks before completing.",
},
],
notYetMessage: "Run /gsd when ready.",
});
if (choice === "complete") {
dispatchWorkflow(pi, loadPrompt("guided-complete-slice", {
milestoneId, sliceId, sliceTitle,
}));
} else if (choice === "status") {
const { fireStatusViaCommand } = await import("./commands.js");
await fireStatusViaCommand(ctx);
}
return;
}
// ── Active task → Execute ────────────────────────────────────────────
if (state.activeTask) {
const taskId = state.activeTask.id;
const taskTitle = state.activeTask.title;
const continueFile = resolveSliceFile(basePath, milestoneId, sliceId, "CONTINUE");
const sDir = resolveSlicePath(basePath, milestoneId, sliceId);
const hasInterrupted = !!(continueFile && await loadFile(continueFile)) ||
!!(sDir && await loadFile(join(sDir, "continue.md")));
const choice = await showNextAction(ctx as any, {
title: `GSD — ${milestoneId} / ${sliceId}: ${sliceTitle}`,
summary: [
hasInterrupted
? `Resuming: ${taskId} (${taskTitle})`
: `Next: ${taskId} (${taskTitle})`,
],
actions: [
{
id: "execute",
label: hasInterrupted ? `Resume ${taskId}` : `Execute ${taskId}`,
description: hasInterrupted
? "Continue from where you left off."
: `Start working on "${taskTitle}".`,
recommended: true,
},
{
id: "auto",
label: "Go auto",
description: "Execute this and all remaining tasks automatically.",
},
{
id: "status",
label: "View status",
description: "See slice progress before starting.",
},
],
notYetMessage: "Run /gsd when ready.",
});
if (choice === "auto") {
await startAuto(ctx, pi, basePath, false);
return;
}
if (choice === "execute") {
if (hasInterrupted) {
dispatchWorkflow(pi, loadPrompt("guided-resume-task", {
milestoneId, sliceId,
}));
} else {
dispatchWorkflow(pi, loadPrompt("guided-execute-task", {
milestoneId, sliceId, taskId, taskTitle,
}));
}
} else if (choice === "status") {
const { fireStatusViaCommand } = await import("./commands.js");
await fireStatusViaCommand(ctx);
}
return;
}
// ── Fallback: show status ────────────────────────────────────────────
const { fireStatusViaCommand } = await import("./commands.js");
await fireStatusViaCommand(ctx);
}
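The branch order in showSmartEntry amounts to a phase-to-recommended-action table. A condensed, illustrative mapping (names hypothetical; the real logic also inspects roadmap and context files on disk):

```typescript
// Condensed sketch of which action the wizard recommends first for each
// derived phase. Not used by the module; for orientation only.
type WizardPhase = "no_milestone" | "complete" | "no_slice" | "planning" | "summarizing" | "executing";

function recommendedAction(phase: WizardPhase): string {
  switch (phase) {
    case "no_milestone": return "new_milestone"; // discuss + plan a first/next milestone
    case "complete": return "new_milestone";     // all milestones done
    case "no_slice": return "plan_or_auto";      // roadmap missing → plan; present → auto
    case "planning": return "plan";              // decompose the active slice into tasks
    case "summarizing": return "complete";       // all tasks done → write slice summary
    case "executing": return "execute";          // run (or resume) the active task
  }
}
```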


@@ -0,0 +1,418 @@
/**
* GSD Extension: /gsd
*
* One command, one wizard. Reads state from disk, shows contextual options,
* dispatches through GSD-WORKFLOW.md. The LLM does the rest.
*
* Auto-mode: /gsd auto loops fresh sessions until milestone complete.
*
* Commands:
* /gsd          contextual wizard (smart entry point)
* /gsd auto     start auto-mode (fresh session per unit)
* /gsd stop     stop auto-mode gracefully
* /gsd status   progress dashboard
*
* Hooks:
* before_agent_start      inject GSD system context for GSD projects
* agent_end               auto-mode advancement
* session_before_compact  save continue.md OR block during auto
*/
import type {
ExtensionAPI,
ExtensionContext,
} from "@mariozechner/pi-coding-agent";
import { registerGSDCommand } from "./commands.js";
import { saveFile, formatContinue, loadFile, parseContinue, parseSummary } from "./files.js";
import { loadPrompt } from "./prompt-loader.js";
import { deriveState } from "./state.js";
import { isAutoActive, isAutoPaused, handleAgentEnd, pauseAuto, getAutoDashboardData } from "./auto.js";
import { saveActivityLog } from "./activity-log.js";
import { checkAutoStartAfterDiscuss } from "./guided-flow.js";
import { GSDDashboardOverlay } from "./dashboard-overlay.js";
import {
loadEffectiveGSDPreferences,
renderPreferencesForSystemPrompt,
resolveAllSkillReferences,
} from "./preferences.js";
import { hasSkillSnapshot, detectNewSkills, formatSkillsXml } from "./skill-discovery.js";
import {
resolveSlicePath, resolveSliceFile, resolveTaskFile, resolveTaskFiles, resolveTasksDir,
relSliceFile, relSlicePath, relTaskFile,
buildSliceFileName, gsdRoot,
} from "./paths.js";
import { Key } from "@mariozechner/pi-tui";
import { join } from "node:path";
import { existsSync } from "node:fs";
import { Text } from "@mariozechner/pi-tui";
// ── ASCII logo ────────────────────────────────────────────────────────────
const GSD_LOGO_LINES = [
" ██████╗ ███████╗██████╗ ",
" ██╔════╝ ██╔════╝██╔══██╗",
" ██║ ███╗███████╗██║ ██║",
" ██║ ██║╚════██║██║ ██║",
" ╚██████╔╝███████║██████╔╝",
" ╚═════╝ ╚══════╝╚═════╝ ",
];
export default function (pi: ExtensionAPI) {
registerGSDCommand(pi);
// ── session_start: render branded GSD header ───────────────────────────
pi.on("session_start", async (_event, ctx) => {
const theme = ctx.ui.theme;
const version = process.env.GSD_VERSION || "0.0.0";
const logoText = GSD_LOGO_LINES.map((line) => theme.fg("accent", line)).join("\n");
const titleLine = ` ${theme.bold("Get Shit Done")} ${theme.fg("dim", `v${version}`)}`;
const headerContent = `${logoText}\n${titleLine}`;
ctx.ui.setHeader((_ui, _theme) => new Text(headerContent, 1, 0));
});
// ── Ctrl+Alt+G shortcut — GSD dashboard overlay ────────────────────────
pi.registerShortcut(Key.ctrlAlt("g"), {
description: "Open GSD dashboard",
handler: async (ctx) => {
// Only show if .gsd/ exists
if (!existsSync(join(process.cwd(), ".gsd"))) {
ctx.ui.notify("No .gsd/ directory found. Run /gsd to start.", "info");
return;
}
await ctx.ui.custom<void>(
(tui, theme, _kb, done) => {
return new GSDDashboardOverlay(tui, theme, () => done());
},
{
overlay: true,
overlayOptions: {
width: "90%",
minWidth: 80,
maxHeight: "92%",
anchor: "center",
},
},
);
},
});
// ── before_agent_start: inject GSD contract into true system prompt ─────
pi.on("before_agent_start", async (event, ctx: ExtensionContext) => {
if (!existsSync(join(process.cwd(), ".gsd"))) return;
const systemContent = loadPrompt("system");
const loadedPreferences = loadEffectiveGSDPreferences();
let preferenceBlock = "";
if (loadedPreferences) {
const cwd = process.cwd();
const report = resolveAllSkillReferences(loadedPreferences.preferences, cwd);
preferenceBlock = `\n\n${renderPreferencesForSystemPrompt(loadedPreferences.preferences, report.resolutions)}`;
// Emit warnings for unresolved skill references
if (report.warnings.length > 0) {
ctx.ui.notify(
`GSD skill preferences: ${report.warnings.length} unresolved skill${report.warnings.length === 1 ? "" : "s"}: ${report.warnings.join(", ")}`,
"warning",
);
}
}
// Detect skills installed during this auto-mode session
let newSkillsBlock = "";
if (hasSkillSnapshot()) {
const newSkills = detectNewSkills();
if (newSkills.length > 0) {
newSkillsBlock = formatSkillsXml(newSkills);
}
}
const injection = await buildGuidedExecuteContextInjection(event.prompt, process.cwd());
return {
systemPrompt: `${event.systemPrompt}\n\n[SYSTEM CONTEXT — GSD]\n\n${systemContent}${preferenceBlock}${newSkillsBlock}`,
...(injection
? {
message: {
customType: "gsd-guided-context",
content: injection,
display: false,
},
}
: {}),
};
});
// ── agent_end: auto-mode advancement or auto-start after discuss ───────────
pi.on("agent_end", async (event, ctx: ExtensionContext) => {
// If discuss phase just finished, start auto-mode
if (checkAutoStartAfterDiscuss()) return;
// If auto-mode is already running, advance to next unit
if (!isAutoActive()) return;
// If the agent was aborted (user pressed Escape), pause auto-mode
// instead of advancing. This preserves the conversation so the user
// can inspect what happened, interact with the agent, or resume.
const lastMsg = event.messages[event.messages.length - 1];
if (lastMsg && "stopReason" in lastMsg && lastMsg.stopReason === "aborted") {
await pauseAuto(ctx, pi);
return;
}
await handleAgentEnd(ctx, pi);
});
// ── session_before_compact ────────────────────────────────────────────────
pi.on("session_before_compact", async (_event, _ctx: ExtensionContext) => {
// Block compaction during auto-mode — each unit is a fresh session
// Also block during paused state — context is valuable for the user
if (isAutoActive() || isAutoPaused()) {
return { cancel: true };
}
const basePath = process.cwd();
const state = await deriveState(basePath);
// Only save continue.md if we're actively executing a task
if (!state.activeMilestone || !state.activeSlice || !state.activeTask) return;
if (state.phase !== "executing") return;
const sDir = resolveSlicePath(basePath, state.activeMilestone.id, state.activeSlice.id);
if (!sDir) return;
// Check for existing continue file (new naming or legacy)
const existingFile = resolveSliceFile(basePath, state.activeMilestone.id, state.activeSlice.id, "CONTINUE");
if (existingFile && await loadFile(existingFile)) return;
const legacyContinue = join(sDir, "continue.md");
if (await loadFile(legacyContinue)) return;
const continuePath = join(sDir, buildSliceFileName(state.activeSlice.id, "CONTINUE"));
const continueData = {
frontmatter: {
milestone: state.activeMilestone.id,
slice: state.activeSlice.id,
task: state.activeTask.id,
step: 0,
totalSteps: 0,
status: "compacted" as const,
savedAt: new Date().toISOString(),
},
completedWork: `Task ${state.activeTask.id} (${state.activeTask.title}) was in progress when compaction occurred.`,
remainingWork: "Check the task plan for remaining steps.",
decisions: "Check task summary files for prior decisions.",
context: "Session was auto-compacted by Pi. Resume with /gsd.",
nextAction: `Resume task ${state.activeTask.id}: ${state.activeTask.title}.`,
};
await saveFile(continuePath, formatContinue(continueData));
});
// ── session_shutdown: save activity log on Ctrl+C / SIGTERM ─────────────
pi.on("session_shutdown", async (_event, ctx: ExtensionContext) => {
if (!isAutoActive() && !isAutoPaused()) return;
// Save the current session — the lock file stays on disk
// so the next /gsd auto knows it was interrupted
const dash = getAutoDashboardData();
if (dash.currentUnit) {
saveActivityLog(ctx, dash.basePath, dash.currentUnit.type, dash.currentUnit.id);
}
});
}
async function buildGuidedExecuteContextInjection(prompt: string, basePath: string): Promise<string | null> {
const executeMatch = prompt.match(/Execute the next task:\s+(T\d+)\s+\("([^"]+)"\)\s+in slice\s+(S\d+)\s+of milestone\s+(M\d+)/i);
if (executeMatch) {
const [, taskId, taskTitle, sliceId, milestoneId] = executeMatch;
return buildTaskExecutionContextInjection(basePath, milestoneId, sliceId, taskId, taskTitle);
}
const resumeMatch = prompt.match(/Resume interrupted work\.[\s\S]*?slice\s+(S\d+)\s+of milestone\s+(M\d+)/i);
if (resumeMatch) {
const [, sliceId, milestoneId] = resumeMatch;
const state = await deriveState(basePath);
if (
state.activeMilestone?.id === milestoneId &&
state.activeSlice?.id === sliceId &&
state.activeTask
) {
return buildTaskExecutionContextInjection(
basePath,
milestoneId,
sliceId,
state.activeTask.id,
state.activeTask.title,
);
}
}
return null;
}
async function buildTaskExecutionContextInjection(
basePath: string,
milestoneId: string,
sliceId: string,
taskId: string,
taskTitle: string,
): Promise<string> {
const taskPlanPath = resolveTaskFile(basePath, milestoneId, sliceId, taskId, "PLAN");
const taskPlanRelPath = relTaskFile(basePath, milestoneId, sliceId, taskId, "PLAN");
const taskPlanContent = taskPlanPath ? await loadFile(taskPlanPath) : null;
const taskPlanInline = taskPlanContent
? [
"## Inlined Task Plan (authoritative local execution contract)",
`Source: \`${taskPlanRelPath}\``,
"",
taskPlanContent.trim(),
].join("\n")
: [
"## Inlined Task Plan (authoritative local execution contract)",
`Task plan not found at dispatch time. Read \`${taskPlanRelPath}\` before executing.`,
].join("\n");
const slicePlanPath = resolveSliceFile(basePath, milestoneId, sliceId, "PLAN");
const slicePlanRelPath = relSliceFile(basePath, milestoneId, sliceId, "PLAN");
const slicePlanContent = slicePlanPath ? await loadFile(slicePlanPath) : null;
const slicePlanExcerpt = extractSliceExecutionExcerpt(slicePlanContent, slicePlanRelPath);
const priorTaskLines = await buildCarryForwardLines(basePath, milestoneId, sliceId, taskId);
const resumeSection = await buildResumeSection(basePath, milestoneId, sliceId);
return [
"[GSD Guided Execute Context]",
"Use this injected context as startup context for guided task execution. Treat the inlined task plan as the authoritative local execution contract. Use source artifacts to verify details and run checks.",
"",
resumeSection,
"",
"## Carry-Forward Context",
...priorTaskLines,
"",
taskPlanInline,
"",
slicePlanExcerpt,
"",
"## Backing Source Artifacts",
`- Slice plan: \`${slicePlanRelPath}\``,
`- Task plan source: \`${taskPlanRelPath}\``,
].join("\n");
}
async function buildCarryForwardLines(
basePath: string,
milestoneId: string,
sliceId: string,
taskId: string,
): Promise<string[]> {
const tDir = resolveTasksDir(basePath, milestoneId, sliceId);
if (!tDir) return ["- No prior task summaries in this slice."];
const currentNum = parseInt(taskId.replace(/^T/, ""), 10);
const sRel = relSlicePath(basePath, milestoneId, sliceId);
const summaryFiles = resolveTaskFiles(tDir, "SUMMARY")
.filter((file) => parseInt(file.replace(/^T/, ""), 10) < currentNum)
.sort();
if (summaryFiles.length === 0) return ["- No prior task summaries in this slice."];
const lines = await Promise.all(summaryFiles.map(async (file) => {
const absPath = join(tDir, file);
const content = await loadFile(absPath);
const relPath = `${sRel}/tasks/${file}`;
if (!content) return `- \`${relPath}\``;
const summary = parseSummary(content);
const provided = summary.frontmatter.provides.slice(0, 2).join("; ");
const decisions = summary.frontmatter.key_decisions.slice(0, 2).join("; ");
const patterns = summary.frontmatter.patterns_established.slice(0, 2).join("; ");
const diagnostics = extractMarkdownSection(content, "Diagnostics");
const parts: string[] = [];
if (summary.title) parts.push(summary.title);
if (summary.oneLiner) parts.push(summary.oneLiner);
if (provided) parts.push(`provides: ${provided}`);
if (decisions) parts.push(`decisions: ${decisions}`);
if (patterns) parts.push(`patterns: ${patterns}`);
if (diagnostics) parts.push(`diagnostics: ${oneLine(diagnostics)}`);
if (parts.length === 0) return `- \`${relPath}\``;
return `- \`${relPath}\`: ${parts.join(" | ")}`;
}));
return lines;
}
async function buildResumeSection(basePath: string, milestoneId: string, sliceId: string): Promise<string> {
const continueFile = resolveSliceFile(basePath, milestoneId, sliceId, "CONTINUE");
const legacyDir = resolveSlicePath(basePath, milestoneId, sliceId);
const legacyPath = legacyDir ? join(legacyDir, "continue.md") : null;
const continueContent = continueFile ? await loadFile(continueFile) : null;
const legacyContent = !continueContent && legacyPath ? await loadFile(legacyPath) : null;
const resolvedContent = continueContent ?? legacyContent;
const resolvedRelPath = continueContent
? relSliceFile(basePath, milestoneId, sliceId, "CONTINUE")
: (legacyPath ? `${relSlicePath(basePath, milestoneId, sliceId)}/continue.md` : null);
if (!resolvedContent || !resolvedRelPath) {
return ["## Resume State", "- No continue file present. Start from the top of the task plan."].join("\n");
}
const cont = parseContinue(resolvedContent);
const lines = [
"## Resume State",
`Source: \`${resolvedRelPath}\``,
`- Status: ${cont.frontmatter.status || "in_progress"}`,
];
if (cont.frontmatter.step && cont.frontmatter.totalSteps) {
lines.push(`- Progress: step ${cont.frontmatter.step} of ${cont.frontmatter.totalSteps}`);
}
if (cont.completedWork) lines.push(`- Completed: ${oneLine(cont.completedWork)}`);
if (cont.remainingWork) lines.push(`- Remaining: ${oneLine(cont.remainingWork)}`);
if (cont.decisions) lines.push(`- Decisions: ${oneLine(cont.decisions)}`);
if (cont.nextAction) lines.push(`- Next action: ${oneLine(cont.nextAction)}`);
return lines.join("\n");
}
function extractSliceExecutionExcerpt(content: string | null, relPath: string): string {
if (!content) {
return [
"## Slice Plan Excerpt",
`Slice plan not found at dispatch time. Read \`${relPath}\` before running slice-level verification.`,
].join("\n");
}
const lines = content.split("\n");
const goalLine = lines.find((line) => line.startsWith("**Goal:**"))?.trim();
const demoLine = lines.find((line) => line.startsWith("**Demo:**"))?.trim();
const verification = extractMarkdownSection(content, "Verification");
const observability = extractMarkdownSection(content, "Observability / Diagnostics");
const parts = ["## Slice Plan Excerpt", `Source: \`${relPath}\``];
if (goalLine) parts.push(goalLine);
if (demoLine) parts.push(demoLine);
if (verification) parts.push("", "### Slice Verification", verification.trim());
if (observability) parts.push("", "### Slice Observability / Diagnostics", observability.trim());
return parts.join("\n");
}
function extractMarkdownSection(content: string, heading: string): string | null {
const match = new RegExp(`^## ${escapeRegExp(heading)}\\s*$`, "m").exec(content);
if (!match) return null;
const start = match.index + match[0].length;
const rest = content.slice(start);
const nextHeading = rest.match(/^##\s+/m);
const end = nextHeading?.index ?? rest.length;
return rest.slice(0, end).trim();
}
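// For example (hypothetical content):
//   extractMarkdownSection("## Diagnostics\ncheck logs\n## Next steps\n...", "Diagnostics")
// returns "check logs": the section runs until the next "## " heading
// or the end of the document.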
function escapeRegExp(value: string): string {
return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}
function oneLine(text: string): string {
return text.replace(/\s+/g, " ").trim();
}

/**
 * GSD Metrics: Token & Cost Tracking
*
* Accumulates per-unit usage data across auto-mode sessions.
* Data is extracted from session entries before each context wipe,
* written to .gsd/metrics.json, and surfaced in the dashboard.
*
* Data flow:
* 1. Before newSession() wipes context, snapshotUnitMetrics() scans
* session entries for AssistantMessage usage data
* 2. The unit record is appended to the in-memory ledger and flushed to disk
* 3. The dashboard overlay and progress widget read from the in-memory ledger
* 4. On crash recovery or fresh start, the ledger is loaded from disk
*/
import { readFileSync, writeFileSync, mkdirSync } from "node:fs";
import { join } from "node:path";
import type { ExtensionContext } from "@mariozechner/pi-coding-agent";
import { gsdRoot } from "./paths.js";
// ─── Types ────────────────────────────────────────────────────────────────────
export interface TokenCounts {
input: number;
output: number;
cacheRead: number;
cacheWrite: number;
total: number;
}
export interface UnitMetrics {
type: string; // e.g. "research-milestone", "execute-task"
id: string; // e.g. "M001/S01/T01"
model: string; // model ID used
startedAt: number; // ms timestamp
finishedAt: number; // ms timestamp
tokens: TokenCounts;
cost: number; // total USD cost
toolCalls: number;
assistantMessages: number;
userMessages: number;
}
export interface MetricsLedger {
version: 1;
projectStartedAt: number;
units: UnitMetrics[];
}
// ─── Phase classification ─────────────────────────────────────────────────────
export type MetricsPhase = "research" | "planning" | "execution" | "completion" | "reassessment";
export function classifyUnitPhase(unitType: string): MetricsPhase {
switch (unitType) {
case "research-milestone":
case "research-slice":
return "research";
case "plan-milestone":
case "plan-slice":
return "planning";
case "execute-task":
return "execution";
case "complete-slice":
return "completion";
case "reassess-roadmap":
return "reassessment";
default:
return "execution";
}
}
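// For example, the switch above yields:
//   classifyUnitPhase("plan-slice")    → "planning"
//   classifyUnitPhase("execute-task")  → "execution"
//   classifyUnitPhase("unknown-type")  → "execution" (default fallback)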
// ─── In-memory state ──────────────────────────────────────────────────────────
let ledger: MetricsLedger | null = null;
let basePath: string = "";
// ─── Public API ───────────────────────────────────────────────────────────────
/**
* Initialize the metrics system for a given project.
* Loads existing ledger from disk if present.
*/
export function initMetrics(base: string): void {
basePath = base;
ledger = loadLedger(base);
}
/**
* Reset in-memory state. Called when auto-mode stops.
*/
export function resetMetrics(): void {
ledger = null;
basePath = "";
}
/**
* Snapshot usage metrics from the current session before it's wiped.
* Scans session entries for AssistantMessage usage data.
*/
export function snapshotUnitMetrics(
ctx: ExtensionContext,
unitType: string,
unitId: string,
startedAt: number,
model: string,
): UnitMetrics | null {
if (!ledger) return null;
const entries = ctx.sessionManager.getEntries();
if (!entries || entries.length === 0) return null;
const tokens: TokenCounts = { input: 0, output: 0, cacheRead: 0, cacheWrite: 0, total: 0 };
let cost = 0;
let toolCalls = 0;
let assistantMessages = 0;
let userMessages = 0;
for (const entry of entries) {
if (entry.type !== "message") continue;
const msg = (entry as any).message;
if (!msg) continue;
if (msg.role === "assistant") {
assistantMessages++;
if (msg.usage) {
tokens.input += msg.usage.input ?? 0;
tokens.output += msg.usage.output ?? 0;
tokens.cacheRead += msg.usage.cacheRead ?? 0;
tokens.cacheWrite += msg.usage.cacheWrite ?? 0;
tokens.total += msg.usage.totalTokens ?? 0;
if (msg.usage.cost) {
cost += msg.usage.cost.total ?? 0;
}
}
// Count tool calls in this message
if (msg.content && Array.isArray(msg.content)) {
for (const block of msg.content) {
if (block.type === "tool_call") toolCalls++;
}
}
} else if (msg.role === "user") {
userMessages++;
}
}
const unit: UnitMetrics = {
type: unitType,
id: unitId,
model,
startedAt,
finishedAt: Date.now(),
tokens,
cost,
toolCalls,
assistantMessages,
userMessages,
};
ledger.units.push(unit);
saveLedger(basePath, ledger);
return unit;
}
/**
* Get the current ledger (read-only).
*/
export function getLedger(): MetricsLedger | null {
return ledger;
}
// ─── Aggregation helpers ──────────────────────────────────────────────────────
export interface PhaseAggregate {
phase: MetricsPhase;
units: number;
tokens: TokenCounts;
cost: number;
duration: number; // ms
}
export interface SliceAggregate {
sliceId: string;
units: number;
tokens: TokenCounts;
cost: number;
duration: number;
}
export interface ModelAggregate {
model: string;
units: number;
tokens: TokenCounts;
cost: number;
}
export interface ProjectTotals {
units: number;
tokens: TokenCounts;
cost: number;
duration: number;
toolCalls: number;
assistantMessages: number;
userMessages: number;
}
function emptyTokens(): TokenCounts {
return { input: 0, output: 0, cacheRead: 0, cacheWrite: 0, total: 0 };
}
function addTokens(a: TokenCounts, b: TokenCounts): TokenCounts {
return {
input: a.input + b.input,
output: a.output + b.output,
cacheRead: a.cacheRead + b.cacheRead,
cacheWrite: a.cacheWrite + b.cacheWrite,
total: a.total + b.total,
};
}
export function aggregateByPhase(units: UnitMetrics[]): PhaseAggregate[] {
const map = new Map<MetricsPhase, PhaseAggregate>();
for (const u of units) {
const phase = classifyUnitPhase(u.type);
let agg = map.get(phase);
if (!agg) {
agg = { phase, units: 0, tokens: emptyTokens(), cost: 0, duration: 0 };
map.set(phase, agg);
}
agg.units++;
agg.tokens = addTokens(agg.tokens, u.tokens);
agg.cost += u.cost;
agg.duration += u.finishedAt - u.startedAt;
}
// Return in a stable order
const order: MetricsPhase[] = ["research", "planning", "execution", "completion", "reassessment"];
return order.map(p => map.get(p)).filter((a): a is PhaseAggregate => !!a);
}
export function aggregateBySlice(units: UnitMetrics[]): SliceAggregate[] {
const map = new Map<string, SliceAggregate>();
for (const u of units) {
const parts = u.id.split("/");
// Unit IDs look like "M001/S01/T01": the slice key is parts[0]/parts[1],
// or just parts[0] for milestone-level units
const sliceId = parts.length >= 2 ? `${parts[0]}/${parts[1]}` : parts[0];
let agg = map.get(sliceId);
if (!agg) {
agg = { sliceId, units: 0, tokens: emptyTokens(), cost: 0, duration: 0 };
map.set(sliceId, agg);
}
agg.units++;
agg.tokens = addTokens(agg.tokens, u.tokens);
agg.cost += u.cost;
agg.duration += u.finishedAt - u.startedAt;
}
return Array.from(map.values()).sort((a, b) => a.sliceId.localeCompare(b.sliceId));
}
export function aggregateByModel(units: UnitMetrics[]): ModelAggregate[] {
const map = new Map<string, ModelAggregate>();
for (const u of units) {
let agg = map.get(u.model);
if (!agg) {
agg = { model: u.model, units: 0, tokens: emptyTokens(), cost: 0 };
map.set(u.model, agg);
}
agg.units++;
agg.tokens = addTokens(agg.tokens, u.tokens);
agg.cost += u.cost;
}
return Array.from(map.values()).sort((a, b) => b.cost - a.cost);
}
export function getProjectTotals(units: UnitMetrics[]): ProjectTotals {
const totals: ProjectTotals = {
units: units.length,
tokens: emptyTokens(),
cost: 0,
duration: 0,
toolCalls: 0,
assistantMessages: 0,
userMessages: 0,
};
for (const u of units) {
totals.tokens = addTokens(totals.tokens, u.tokens);
totals.cost += u.cost;
totals.duration += u.finishedAt - u.startedAt;
totals.toolCalls += u.toolCalls;
totals.assistantMessages += u.assistantMessages;
totals.userMessages += u.userMessages;
}
return totals;
}
// ─── Formatting helpers ───────────────────────────────────────────────────────
export function formatCost(cost: number): string {
if (cost < 0.01) return `$${cost.toFixed(4)}`;
if (cost < 1) return `$${cost.toFixed(3)}`;
return `$${cost.toFixed(2)}`;
}
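// Tiered-precision examples:
//   formatCost(0.0042) → "$0.0042"
//   formatCost(0.25)   → "$0.250"
//   formatCost(12.3)   → "$12.30"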
}
/**
* Compute a projected remaining cost based on completed slice averages.
*
* Filters to slice-level entries (sliceId contains "/") to exclude bare milestone
* aggregates from the average. Returns [] when fewer than 2 slice-level entries
* exist (insufficient data for a reliable projection).
*
* If `budgetCeiling` is provided and `totalCost >= budgetCeiling`, a warning line
* is appended to the result.
*/
export function formatCostProjection(
completedSlices: SliceAggregate[],
remainingCount: number,
budgetCeiling?: number,
): string[] {
const sliceLevel = completedSlices.filter(s => s.sliceId.includes("/"));
if (sliceLevel.length < 2) return [];
const totalCost = sliceLevel.reduce((sum, s) => sum + s.cost, 0);
const avgCost = totalCost / sliceLevel.length;
const projected = avgCost * remainingCount;
const projLine = `Projected remaining: ${formatCost(projected)} (${formatCost(avgCost)}/slice avg × ${remainingCount} remaining)`;
const result: string[] = [projLine];
if (budgetCeiling !== undefined && totalCost >= budgetCeiling) {
result.push(`Budget ceiling ${formatCost(budgetCeiling)} reached (spent ${formatCost(totalCost)})`);
}
return result;
}
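// Example with hypothetical numbers: two completed slices costing $1.00 and
// $2.00 average to $1.50/slice, so 3 remaining slices project to $4.50.
// A budgetCeiling of 2.5 would also append the warning line, since the
// $3.00 already spent meets the ceiling.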
export function formatTokenCount(count: number): string {
if (count < 1000) return `${count}`;
if (count < 1_000_000) return `${(count / 1000).toFixed(1)}k`;
return `${(count / 1_000_000).toFixed(2)}M`;
}
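// Examples:
//   formatTokenCount(842)       → "842"
//   formatTokenCount(15_300)    → "15.3k"
//   formatTokenCount(2_340_000) → "2.34M"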
// ─── Disk I/O ─────────────────────────────────────────────────────────────────
function metricsPath(base: string): string {
return join(gsdRoot(base), "metrics.json");
}
function loadLedger(base: string): MetricsLedger {
try {
const raw = readFileSync(metricsPath(base), "utf-8");
const parsed = JSON.parse(raw);
if (parsed.version === 1 && Array.isArray(parsed.units)) {
return parsed as MetricsLedger;
}
} catch {
// File doesn't exist or is corrupt — start fresh
}
return {
version: 1,
projectStartedAt: Date.now(),
units: [],
};
}
function saveLedger(base: string, data: MetricsLedger): void {
try {
mkdirSync(gsdRoot(base), { recursive: true });
writeFileSync(metricsPath(base), JSON.stringify(data, null, 2) + "\n", "utf-8");
} catch {
// Don't let metrics failures break auto-mode
}
}

import { loadFile } from "./files.js";
import { resolveSliceFile, resolveTaskFile, resolveTasksDir, resolveTaskFiles } from "./paths.js";
export interface ValidationIssue {
severity: "info" | "warning" | "error";
scope: "slice-plan" | "task-plan" | "task-summary" | "slice-summary";
file: string;
ruleId: string;
message: string;
suggestion?: string;
}
function getSection(content: string, heading: string, level: number = 2): string | null {
const prefix = "#".repeat(level) + " ";
const escaped = heading.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
const regex = new RegExp(`^${prefix}${escaped}\\s*$`, "m");
const match = regex.exec(content);
if (!match) return null;
const start = match.index + match[0].length;
const rest = content.slice(start);
const nextHeading = rest.match(new RegExp(`^#{1,${level}} `, "m"));
const end = nextHeading ? nextHeading.index! : rest.length;
return rest.slice(0, end).trim();
}
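// For example (hypothetical content):
//   getSection("## Steps\n1. do X\n\n## Verification\nrun tests", "Steps")
// returns "1. do X": the section ends at the next heading of the same
// level or shallower.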
function getFrontmatter(content: string): string | null {
const trimmed = content.trimStart();
if (!trimmed.startsWith("---")) return null;
const afterFirst = trimmed.indexOf("\n");
if (afterFirst === -1) return null;
const rest = trimmed.slice(afterFirst + 1);
const endIdx = rest.indexOf("\n---");
if (endIdx === -1) return null;
return rest.slice(0, endIdx);
}
function hasFrontmatterKey(content: string, key: string): boolean {
const fm = getFrontmatter(content);
if (!fm) return false;
return new RegExp(`^${key}:`, "m").test(fm);
}
function normalizeMeaningfulLines(text: string): string[] {
return text
.split("\n")
.map(line => line.trim())
.filter(line => line.length > 0)
.filter(line => !line.startsWith("<!--"))
.filter(line => !line.endsWith("-->"))
.filter(line => !/^[-*]\s*\{\{.+\}\}$/.test(line))
.filter(line => !/^\{\{.+\}\}$/.test(line));
}
function sectionLooksPlaceholderOnly(text: string | null): boolean {
if (!text) return true;
const lines = normalizeMeaningfulLines(text)
.map(line => line.replace(/^[-*]\s+/, "").trim())
.filter(line => line.length > 0);
if (lines.length === 0) return true;
return lines.every(line => {
const lower = line.toLowerCase();
return lower === "none" ||
lower.endsWith(": none") ||
lower.includes("{{") ||
lower.includes("}}") ||
lower.startsWith("required for non-trivial") ||
lower.startsWith("describe how a future agent") ||
lower.startsWith("prefer:") ||
lower.startsWith("keep this section concise");
});
}
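// For example, a section body of "- None" or "{{signals}}" is treated as
// placeholder-only, while "- Check /healthz returns 200" is not.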
function textSuggestsObservabilityRelevant(content: string): boolean {
const lower = content.toLowerCase();
const needles = [
" api", "route", "server", "worker", "queue", "job", "sync", "import",
"webhook", "auth", "db", "database", "migration", "cache", "background",
"polling", "realtime", "socket", "stateful", "integration", "ui", "form",
"submit", "status", "service", "pipeline", "health endpoint", "error path"
];
return needles.some(needle => lower.includes(needle));
}
function verificationMentionsDiagnostics(section: string | null): boolean {
if (!section) return false;
const lower = section.toLowerCase();
const needles = [
"error", "failure", "diagnostic", "status", "health", "inspect", "log",
"network", "console", "retry", "last error", "correlation", "readiness"
];
return needles.some(needle => lower.includes(needle));
}
export function validateSlicePlanContent(file: string, content: string): ValidationIssue[] {
const issues: ValidationIssue[] = [];
// ── Plan quality rules (always run, not gated by runtime relevance) ──
const tasksSection = getSection(content, "Tasks", 2);
if (tasksSection) {
const lines = tasksSection.split("\n");
const taskLinePattern = /^- \[[ x]\] \*\*T\d+:/;
const taskLineIndices: number[] = [];
for (let i = 0; i < lines.length; i++) {
if (taskLinePattern.test(lines[i])) taskLineIndices.push(i);
}
for (let t = 0; t < taskLineIndices.length; t++) {
const start = taskLineIndices[t];
const end = t + 1 < taskLineIndices.length ? taskLineIndices[t + 1] : lines.length;
// Check lines between this task header and the next (or section end)
const bodyLines = lines.slice(start + 1, end);
const meaningful = bodyLines.filter(l => l.trim().length > 0);
if (meaningful.length === 0) {
issues.push({
severity: "warning",
scope: "slice-plan",
file,
ruleId: "empty_task_entry",
message: "Inline task entry has no description content beneath the checkbox line.",
suggestion: "Add at least a Why/Files/Do/Verify summary so the task is self-describing.",
});
}
}
}
// ── Observability rules (gated by runtime relevance) ──
const relevant = textSuggestsObservabilityRelevant(content);
if (!relevant) return issues;
const obs = getSection(content, "Observability / Diagnostics", 2);
const verification = getSection(content, "Verification", 2);
if (!obs) {
issues.push({
severity: "warning",
scope: "slice-plan",
file,
ruleId: "missing_observability_section",
message: "Slice plan appears non-trivial but is missing `## Observability / Diagnostics`.",
suggestion: "Add runtime signals, inspection surfaces, failure visibility, and redaction constraints.",
});
} else if (sectionLooksPlaceholderOnly(obs)) {
issues.push({
severity: "warning",
scope: "slice-plan",
file,
ruleId: "observability_section_placeholder_only",
message: "Slice plan has `## Observability / Diagnostics` but it still looks like placeholder text.",
suggestion: "Replace placeholders with concrete signals and inspection surfaces a future agent should trust.",
});
}
if (!verificationMentionsDiagnostics(verification)) {
issues.push({
severity: "warning",
scope: "slice-plan",
file,
ruleId: "verification_missing_diagnostic_check",
message: "Slice verification does not appear to include any diagnostic or failure-path check.",
suggestion: "Add at least one verification step for inspectable failure state, structured error output, status surface, or equivalent.",
});
}
return issues;
}
export function validateTaskPlanContent(file: string, content: string): ValidationIssue[] {
const issues: ValidationIssue[] = [];
// ── Plan quality rules (always run, not gated by runtime relevance) ──
// Rule: empty or missing Steps section
const stepsSection = getSection(content, "Steps", 2);
if (stepsSection === null || sectionLooksPlaceholderOnly(stepsSection)) {
issues.push({
severity: "warning",
scope: "task-plan",
file,
ruleId: "empty_steps_section",
message: "Task plan has an empty or missing `## Steps` section.",
suggestion: "Add concrete numbered implementation steps so execution has a clear sequence.",
});
}
// Rule: placeholder-only Verification section
const verificationSection = getSection(content, "Verification", 2);
if (verificationSection !== null && sectionLooksPlaceholderOnly(verificationSection)) {
issues.push({
severity: "warning",
scope: "task-plan",
file,
ruleId: "placeholder_verification",
message: "Task plan has `## Verification` but it still looks like placeholder text.",
suggestion: "Replace placeholders with concrete verification commands, test runs, or observable checks.",
});
}
// Rule: scope estimate thresholds
const fm = getFrontmatter(content);
if (fm) {
const stepsMatch = fm.match(/^estimated_steps:\s*(\d+)/m);
const filesMatch = fm.match(/^estimated_files:\s*(\d+)/m);
if (stepsMatch) {
const estimatedSteps = parseInt(stepsMatch[1], 10);
if (estimatedSteps >= 10) {
issues.push({
severity: "warning",
scope: "task-plan",
file,
ruleId: "scope_estimate_steps_high",
message: `Task plan estimates ${estimatedSteps} steps (threshold: 10). Consider splitting into smaller tasks.`,
suggestion: "Break the task into sub-tasks or reduce scope so each task stays focused and completable in one pass.",
});
}
}
if (filesMatch) {
const estimatedFiles = parseInt(filesMatch[1], 10);
if (estimatedFiles >= 12) {
issues.push({
severity: "warning",
scope: "task-plan",
file,
ruleId: "scope_estimate_files_high",
message: `Task plan estimates ${estimatedFiles} files (threshold: 12). Consider splitting into smaller tasks.`,
suggestion: "Break the task into sub-tasks or reduce scope to keep the change footprint manageable.",
});
}
}
}
// ── Observability rules (gated by runtime relevance) ──
const relevant = textSuggestsObservabilityRelevant(content);
if (!relevant) return issues;
const obs = getSection(content, "Observability Impact", 2);
if (!obs) {
issues.push({
severity: "warning",
scope: "task-plan",
file,
ruleId: "missing_observability_impact",
message: "Task plan appears runtime-relevant but is missing `## Observability Impact`.",
suggestion: "Explain what signals change, how a future agent inspects this task, and what failure state becomes visible.",
});
} else if (sectionLooksPlaceholderOnly(obs)) {
issues.push({
severity: "warning",
scope: "task-plan",
file,
ruleId: "observability_impact_placeholder_only",
message: "Task plan has `## Observability Impact` but it still looks empty or placeholder-only.",
suggestion: "Fill in concrete inspection surfaces or explicitly justify why observability is not applicable.",
});
}
return issues;
}
export function validateTaskSummaryContent(file: string, content: string): ValidationIssue[] {
const issues: ValidationIssue[] = [];
if (!hasFrontmatterKey(content, "observability_surfaces")) {
issues.push({
severity: "warning",
scope: "task-summary",
file,
ruleId: "missing_observability_frontmatter",
message: "Task summary is missing `observability_surfaces` in frontmatter.",
suggestion: "List the durable status/log/error surfaces a future agent should use.",
});
}
const diagnostics = getSection(content, "Diagnostics", 2);
if (!diagnostics) {
issues.push({
severity: "warning",
scope: "task-summary",
file,
ruleId: "missing_diagnostics_section",
message: "Task summary is missing `## Diagnostics`.",
suggestion: "Document how to inspect what this task built later.",
});
} else if (sectionLooksPlaceholderOnly(diagnostics)) {
issues.push({
severity: "warning",
scope: "task-summary",
file,
ruleId: "diagnostics_placeholder_only",
message: "Task summary diagnostics section still looks like placeholder text.",
suggestion: "Replace placeholders with concrete commands, endpoints, logs, error shapes, or failure artifacts.",
});
}
return issues;
}
export function validateSliceSummaryContent(file: string, content: string): ValidationIssue[] {
const issues: ValidationIssue[] = [];
if (!hasFrontmatterKey(content, "observability_surfaces")) {
issues.push({
severity: "warning",
scope: "slice-summary",
file,
ruleId: "missing_observability_frontmatter",
message: "Slice summary is missing `observability_surfaces` in frontmatter.",
suggestion: "List the authoritative diagnostics and durable inspection surfaces for this slice.",
});
}
const diagnostics = getSection(content, "Authoritative diagnostics", 3);
if (!diagnostics) {
issues.push({
severity: "warning",
scope: "slice-summary",
file,
ruleId: "missing_authoritative_diagnostics",
message: "Slice summary is missing `### Authoritative diagnostics` in Forward Intelligence.",
suggestion: "Tell future agents where to look first and why that signal is trustworthy.",
});
} else if (sectionLooksPlaceholderOnly(diagnostics)) {
issues.push({
severity: "warning",
scope: "slice-summary",
file,
ruleId: "authoritative_diagnostics_placeholder_only",
message: "Slice summary includes authoritative diagnostics but it still looks like placeholder text.",
suggestion: "Replace placeholders with the real first-stop diagnostic surface for this slice.",
});
}
return issues;
}
export async function validatePlanBoundary(basePath: string, milestoneId: string, sliceId: string): Promise<ValidationIssue[]> {
const issues: ValidationIssue[] = [];
const slicePlan = resolveSliceFile(basePath, milestoneId, sliceId, "PLAN");
if (slicePlan) {
const content = await loadFile(slicePlan);
if (content) issues.push(...validateSlicePlanContent(slicePlan, content));
}
const tasksDir = resolveTasksDir(basePath, milestoneId, sliceId);
const taskPlans = tasksDir ? resolveTaskFiles(tasksDir, "PLAN") : [];
for (const file of taskPlans) {
const taskId = file.split("-")[0];
const taskPlan = resolveTaskFile(basePath, milestoneId, sliceId, taskId, "PLAN");
if (!taskPlan) continue;
const content = await loadFile(taskPlan);
if (content) issues.push(...validateTaskPlanContent(taskPlan, content));
}
return issues;
}
export async function validateExecuteBoundary(basePath: string, milestoneId: string, sliceId: string, taskId: string): Promise<ValidationIssue[]> {
const issues: ValidationIssue[] = [];
const slicePlan = resolveSliceFile(basePath, milestoneId, sliceId, "PLAN");
if (slicePlan) {
const content = await loadFile(slicePlan);
if (content) issues.push(...validateSlicePlanContent(slicePlan, content));
}
const taskPlan = resolveTaskFile(basePath, milestoneId, sliceId, taskId, "PLAN");
if (taskPlan) {
const content = await loadFile(taskPlan);
if (content) issues.push(...validateTaskPlanContent(taskPlan, content));
}
return issues;
}
export async function validateCompleteBoundary(basePath: string, milestoneId: string, sliceId: string): Promise<ValidationIssue[]> {
const issues: ValidationIssue[] = [];
const tasksDir = resolveTasksDir(basePath, milestoneId, sliceId);
const taskSummaries = tasksDir ? resolveTaskFiles(tasksDir, "SUMMARY") : [];
for (const file of taskSummaries) {
const taskId = file.split("-")[0];
const taskSummary = resolveTaskFile(basePath, milestoneId, sliceId, taskId, "SUMMARY");
if (!taskSummary) continue;
const content = await loadFile(taskSummary);
if (content) issues.push(...validateTaskSummaryContent(taskSummary, content));
}
const sliceSummary = resolveSliceFile(basePath, milestoneId, sliceId, "SUMMARY");
if (sliceSummary) {
const content = await loadFile(sliceSummary);
if (content) issues.push(...validateSliceSummaryContent(sliceSummary, content));
}
return issues;
}
export function formatValidationIssues(issues: ValidationIssue[], limit: number = 4): string {
if (issues.length === 0) return "";
const lines = issues.slice(0, limit).map(issue => {
const fileName = issue.file.split("/").pop() || issue.file;
return `- ${fileName}: ${issue.message}`;
});
if (issues.length > limit) lines.push(`- ...and ${issues.length - limit} more`);
return lines.join("\n");
}

{
"name": "pi-extension-gsd",
"private": true,
"version": "1.0.0",
"type": "module",
"pi": {
"extensions": [
"./index.ts"
]
}
}


@@ -0,0 +1,308 @@
/**
* GSD Paths: ID-based path resolution
*
* Directories use bare IDs: M001/, S01/, etc.
* Files use ID-SUFFIX: M001-ROADMAP.md, S01-PLAN.md, T01-PLAN.md
*
* Resolvers still handle legacy descriptor-suffixed names
* (e.g. M001-FLIGHT-SIMULATOR/, T03-INSTALL-PACKAGES-PLAN.md)
* via prefix matching, so existing projects work without migration.
*/
import { readdirSync, existsSync } from "node:fs";
import { join } from "node:path";
// ─── Name Builders ─────────────────────────────────────────────────────────
/**
* Build a directory name from an ID.
* ("M001") → "M001"
*/
export function buildDirName(id: string): string {
return id;
}
/**
* Build a milestone-level file name.
* ("M001", "CONTEXT") → "M001-CONTEXT.md"
*/
export function buildMilestoneFileName(milestoneId: string, suffix: string): string {
return `${milestoneId}-${suffix}.md`;
}
/**
* Build a slice-level file name.
* ("S01", "PLAN") → "S01-PLAN.md"
*/
export function buildSliceFileName(sliceId: string, suffix: string): string {
return `${sliceId}-${suffix}.md`;
}
/**
* Build a task file name.
* ("T03", "PLAN") → "T03-PLAN.md"
* ("T03", "SUMMARY") → "T03-SUMMARY.md"
*/
export function buildTaskFileName(taskId: string, suffix: string): string {
return `${taskId}-${suffix}.md`;
}
// ─── Resolvers ─────────────────────────────────────────────────────────────
/**
* Find a directory entry by ID prefix within a parent directory.
* Exact match first (M001), then prefix match (M001-SOMETHING) for
* backward compatibility with legacy descriptor directories.
* Returns the full directory name or null.
*/
export function resolveDir(parentDir: string, idPrefix: string): string | null {
if (!existsSync(parentDir)) return null;
try {
const entries = readdirSync(parentDir, { withFileTypes: true });
// Exact match first (current convention: bare ID)
const exact = entries.find(e => e.isDirectory() && e.name === idPrefix);
if (exact) return exact.name;
// Prefix match for legacy descriptor dirs: M001-SOMETHING
const prefixed = entries.find(
e => e.isDirectory() && e.name.startsWith(idPrefix + "-")
);
return prefixed ? prefixed.name : null;
} catch {
return null;
}
}
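The exact-then-prefix ordering matters: a bare-ID directory must win even when a legacy descriptor directory for the same ID is also present. A sketch over an in-memory name list (the directory names are hypothetical):

```typescript
// resolveDir's matching order, minus the filesystem: exact bare-ID match
// first, then legacy "ID-DESCRIPTOR" prefix match.
function pickDir(names: string[], idPrefix: string): string | null {
  const exact = names.find(n => n === idPrefix);
  if (exact) return exact;
  const prefixed = names.find(n => n.startsWith(idPrefix + "-"));
  return prefixed ?? null;
}

const entries = ["M001", "M001-FLIGHT-SIMULATOR", "M002-LEGACY"];
pickDir(entries, "M001"); // the bare-ID dir wins over the descriptor dir
pickDir(entries, "M002"); // no bare dir, so the legacy descriptor dir is used
```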
/**
* Find a file by ID prefix and suffix within a directory.
* Checks in order:
* 1. Direct: ID-SUFFIX.md (e.g. M001-ROADMAP.md, T03-PLAN.md)
* 2. Legacy descriptor: ID-DESCRIPTOR-SUFFIX.md (e.g. T03-INSTALL-PACKAGES-PLAN.md)
* 3. Legacy bare: suffix.md (e.g. roadmap.md)
*/
export function resolveFile(dir: string, idPrefix: string, suffix: string): string | null {
if (!existsSync(dir)) return null;
const target = `${idPrefix}-${suffix}.md`.toUpperCase();
try {
const entries = readdirSync(dir);
// Direct match: ID-SUFFIX.md
const direct = entries.find(e => e.toUpperCase() === target);
if (direct) return direct;
// Legacy pattern match: ID-DESCRIPTOR-SUFFIX.md
const pattern = new RegExp(
`^${idPrefix}-.*-${suffix}\\.md$`, "i"
);
const match = entries.find(e => pattern.test(e));
if (match) return match;
// Legacy fallback: suffix.md
const legacy = entries.find(e => e.toLowerCase() === `${suffix.toLowerCase()}.md`);
if (legacy) return legacy;
return null;
} catch {
return null;
}
}
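The same three-tier lookup can be exercised without touching the filesystem (the entry lists below are hypothetical):

```typescript
// resolveFile's lookup order over a plain string list: direct ID-SUFFIX.md,
// then legacy ID-DESCRIPTOR-SUFFIX.md, then legacy bare suffix.md.
function pickFile(entries: string[], idPrefix: string, suffix: string): string | null {
  const target = `${idPrefix}-${suffix}.md`.toUpperCase();
  const direct = entries.find(e => e.toUpperCase() === target);
  if (direct) return direct;
  const pattern = new RegExp(`^${idPrefix}-.*-${suffix}\\.md$`, "i");
  const descriptor = entries.find(e => pattern.test(e));
  if (descriptor) return descriptor;
  const bare = entries.find(e => e.toLowerCase() === `${suffix.toLowerCase()}.md`);
  return bare ?? null;
}

pickFile(["T03-PLAN.md", "T03-SUMMARY.md"], "T03", "PLAN"); // direct match
pickFile(["T03-INSTALL-PACKAGES-PLAN.md"], "T03", "PLAN");  // legacy descriptor
pickFile(["roadmap.md"], "M001", "ROADMAP");                // legacy bare name
```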
/**
* Find all task files matching a pattern in a tasks directory.
* Returns sorted file names matching T##-SUFFIX.md or legacy T##-*-SUFFIX.md
*/
export function resolveTaskFiles(tasksDir: string, suffix: string): string[] {
if (!existsSync(tasksDir)) return [];
try {
// Current convention: T01-PLAN.md
const currentPattern = new RegExp(`^T\\d+-${suffix}\\.md$`, "i");
// Legacy convention: T01-INSTALL-PACKAGES-PLAN.md
const legacyPattern = new RegExp(`^T\\d+-.*-${suffix}\\.md$`, "i");
return readdirSync(tasksDir)
.filter(f => currentPattern.test(f) || legacyPattern.test(f))
.sort();
} catch {
return [];
}
}
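A quick check of the two task-file patterns against mixed directory contents (file names are hypothetical):

```typescript
// Both naming conventions are accepted; other suffixes and unrelated files drop out.
const suffix = "PLAN";
const currentPattern = new RegExp(`^T\\d+-${suffix}\\.md$`, "i");
const legacyPattern = new RegExp(`^T\\d+-.*-${suffix}\\.md$`, "i");

const contents = ["T02-PLAN.md", "T01-INSTALL-PACKAGES-PLAN.md", "T01-SUMMARY.md", "notes.md"];
const plans = contents
  .filter(f => currentPattern.test(f) || legacyPattern.test(f))
  .sort();
// → ["T01-INSTALL-PACKAGES-PLAN.md", "T02-PLAN.md"]
```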
// ─── Full Path Builders ────────────────────────────────────────────────────
export const GSD_ROOT_FILES = {
PROJECT: "PROJECT.md",
DECISIONS: "DECISIONS.md",
QUEUE: "QUEUE.md",
STATE: "STATE.md",
REQUIREMENTS: "REQUIREMENTS.md",
} as const;
export type GSDRootFileKey = keyof typeof GSD_ROOT_FILES;
const LEGACY_GSD_ROOT_FILES: Record<GSDRootFileKey, string> = {
PROJECT: "project.md",
DECISIONS: "decisions.md",
QUEUE: "queue.md",
STATE: "state.md",
REQUIREMENTS: "requirements.md",
};
export function gsdRoot(basePath: string): string {
return join(basePath, ".gsd");
}
export function milestonesDir(basePath: string): string {
return join(gsdRoot(basePath), "milestones");
}
export function resolveGsdRootFile(basePath: string, key: GSDRootFileKey): string {
const root = gsdRoot(basePath);
const canonical = join(root, GSD_ROOT_FILES[key]);
if (existsSync(canonical)) return canonical;
const legacy = join(root, LEGACY_GSD_ROOT_FILES[key]);
if (existsSync(legacy)) return legacy;
return canonical;
}
export function relGsdRootFile(key: GSDRootFileKey): string {
return `.gsd/${GSD_ROOT_FILES[key]}`;
}
/**
* Resolve the full path to a milestone directory.
* Returns null if the milestone doesn't exist.
*/
export function resolveMilestonePath(basePath: string, milestoneId: string): string | null {
const dir = resolveDir(milestonesDir(basePath), milestoneId);
return dir ? join(milestonesDir(basePath), dir) : null;
}
/**
* Resolve the full path to a milestone file (e.g. ROADMAP, CONTEXT, RESEARCH).
*/
export function resolveMilestoneFile(
basePath: string, milestoneId: string, suffix: string
): string | null {
const mDir = resolveMilestonePath(basePath, milestoneId);
if (!mDir) return null;
const file = resolveFile(mDir, milestoneId, suffix);
return file ? join(mDir, file) : null;
}
/**
* Resolve the full path to a slice directory within a milestone.
*/
export function resolveSlicePath(
basePath: string, milestoneId: string, sliceId: string
): string | null {
const mDir = resolveMilestonePath(basePath, milestoneId);
if (!mDir) return null;
const slicesDir = join(mDir, "slices");
const dir = resolveDir(slicesDir, sliceId);
return dir ? join(slicesDir, dir) : null;
}
/**
* Resolve the full path to a slice file (e.g. PLAN, RESEARCH, CONTEXT, SUMMARY).
*/
export function resolveSliceFile(
basePath: string, milestoneId: string, sliceId: string, suffix: string
): string | null {
const sDir = resolveSlicePath(basePath, milestoneId, sliceId);
if (!sDir) return null;
const file = resolveFile(sDir, sliceId, suffix);
return file ? join(sDir, file) : null;
}
/**
* Resolve the tasks directory within a slice.
*/
export function resolveTasksDir(
basePath: string, milestoneId: string, sliceId: string
): string | null {
const sDir = resolveSlicePath(basePath, milestoneId, sliceId);
if (!sDir) return null;
const tDir = join(sDir, "tasks");
return existsSync(tDir) ? tDir : null;
}
/**
* Resolve a specific task file.
*/
export function resolveTaskFile(
basePath: string, milestoneId: string, sliceId: string,
taskId: string, suffix: string
): string | null {
const tDir = resolveTasksDir(basePath, milestoneId, sliceId);
if (!tDir) return null;
const file = resolveFile(tDir, taskId, suffix);
return file ? join(tDir, file) : null;
}
// ─── Relative Path Builders (for prompts — .gsd/milestones/...) ────────────
/**
* Build relative .gsd/ path to a milestone directory.
* Uses the actual directory name on disk if it exists, otherwise bare ID.
*/
export function relMilestonePath(basePath: string, milestoneId: string): string {
const dir = resolveDir(milestonesDir(basePath), milestoneId);
if (dir) return `.gsd/milestones/${dir}`;
return `.gsd/milestones/${milestoneId}`;
}
/**
* Build relative .gsd/ path to a milestone file.
*/
export function relMilestoneFile(
basePath: string, milestoneId: string, suffix: string
): string {
const mRel = relMilestonePath(basePath, milestoneId);
const mDir = resolveMilestonePath(basePath, milestoneId);
if (mDir) {
const file = resolveFile(mDir, milestoneId, suffix);
if (file) return `${mRel}/${file}`;
}
return `${mRel}/${buildMilestoneFileName(milestoneId, suffix)}`;
}
/**
* Build relative .gsd/ path to a slice directory.
*/
export function relSlicePath(
basePath: string, milestoneId: string, sliceId: string
): string {
const mRel = relMilestonePath(basePath, milestoneId);
const mDir = resolveMilestonePath(basePath, milestoneId);
if (mDir) {
const slicesDir = join(mDir, "slices");
const dir = resolveDir(slicesDir, sliceId);
if (dir) return `${mRel}/slices/${dir}`;
}
return `${mRel}/slices/${sliceId}`;
}
/**
* Build relative .gsd/ path to a slice file.
*/
export function relSliceFile(
basePath: string, milestoneId: string, sliceId: string, suffix: string
): string {
const sRel = relSlicePath(basePath, milestoneId, sliceId);
const sDir = resolveSlicePath(basePath, milestoneId, sliceId);
if (sDir) {
const file = resolveFile(sDir, sliceId, suffix);
if (file) return `${sRel}/${file}`;
}
return `${sRel}/${buildSliceFileName(sliceId, suffix)}`;
}
/**
* Build relative .gsd/ path to a task file.
*/
export function relTaskFile(
basePath: string, milestoneId: string, sliceId: string,
taskId: string, suffix: string
): string {
const sRel = relSlicePath(basePath, milestoneId, sliceId);
const tDir = resolveTasksDir(basePath, milestoneId, sliceId);
if (tDir) {
const file = resolveFile(tDir, taskId, suffix);
if (file) return `${sRel}/tasks/${file}`;
}
return `${sRel}/tasks/${buildTaskFileName(taskId, suffix)}`;
}


@@ -0,0 +1,600 @@
import { existsSync, readdirSync, readFileSync, statSync } from "node:fs";
import { homedir } from "node:os";
import { isAbsolute, join } from "node:path";
import { getAgentDir } from "@mariozechner/pi-coding-agent";
const GLOBAL_PREFERENCES_PATH = join(homedir(), ".gsd", "preferences.md");
const LEGACY_GLOBAL_PREFERENCES_PATH = join(homedir(), ".pi", "agent", "gsd-preferences.md");
const PROJECT_PREFERENCES_PATH = join(process.cwd(), ".gsd", "preferences.md");
const SKILL_ACTIONS = new Set(["use", "prefer", "avoid"]);
export interface GSDSkillRule {
when: string;
use?: string[];
prefer?: string[];
avoid?: string[];
}
export interface GSDModelConfig {
research?: string; // e.g. "claude-sonnet-4-6"
planning?: string; // e.g. "claude-opus-4-6"
execution?: string; // e.g. "claude-sonnet-4-6"
completion?: string; // e.g. "claude-sonnet-4-6"
}
export type SkillDiscoveryMode = "auto" | "suggest" | "off";
export interface AutoSupervisorConfig {
model?: string;
soft_timeout_minutes?: number;
idle_timeout_minutes?: number;
hard_timeout_minutes?: number;
}
export interface GSDPreferences {
version?: number;
always_use_skills?: string[];
prefer_skills?: string[];
avoid_skills?: string[];
skill_rules?: GSDSkillRule[];
custom_instructions?: string[];
models?: GSDModelConfig;
skill_discovery?: SkillDiscoveryMode;
auto_supervisor?: AutoSupervisorConfig;
uat_dispatch?: boolean;
budget_ceiling?: number;
}
export interface LoadedGSDPreferences {
path: string;
scope: "global" | "project";
preferences: GSDPreferences;
}
export function getGlobalGSDPreferencesPath(): string {
return GLOBAL_PREFERENCES_PATH;
}
export function getLegacyGlobalGSDPreferencesPath(): string {
return LEGACY_GLOBAL_PREFERENCES_PATH;
}
export function getProjectGSDPreferencesPath(): string {
return PROJECT_PREFERENCES_PATH;
}
export function loadGlobalGSDPreferences(): LoadedGSDPreferences | null {
return loadPreferencesFile(GLOBAL_PREFERENCES_PATH, "global")
?? loadPreferencesFile(LEGACY_GLOBAL_PREFERENCES_PATH, "global");
}
export function loadProjectGSDPreferences(): LoadedGSDPreferences | null {
return loadPreferencesFile(PROJECT_PREFERENCES_PATH, "project");
}
export function loadEffectiveGSDPreferences(): LoadedGSDPreferences | null {
const globalPreferences = loadGlobalGSDPreferences();
const projectPreferences = loadProjectGSDPreferences();
if (!globalPreferences && !projectPreferences) return null;
if (!globalPreferences) return projectPreferences;
if (!projectPreferences) return globalPreferences;
return {
path: projectPreferences.path,
scope: "project",
preferences: mergePreferences(globalPreferences.preferences, projectPreferences.preferences),
};
}
// ─── Skill Reference Resolution ───────────────────────────────────────────────
export interface SkillResolution {
/** The original reference from preferences (bare name or path). */
original: string;
/** The resolved absolute path to the SKILL.md file, or null if unresolved. */
resolvedPath: string | null;
/** How it was resolved. */
method: "absolute-path" | "absolute-dir" | "user-skill" | "project-skill" | "unresolved";
}
export interface SkillResolutionReport {
/** All resolution results, keyed by original reference. */
resolutions: Map<string, SkillResolution>;
/** References that could not be resolved. */
warnings: string[];
}
/**
* Known skill directories, in priority order.
* User skills (~/.pi/agent/skills/) take precedence over project skills.
*/
function getSkillSearchDirs(cwd: string): Array<{ dir: string; method: SkillResolution["method"] }> {
return [
{ dir: join(getAgentDir(), "skills"), method: "user-skill" },
{ dir: join(cwd, ".pi", "agent", "skills"), method: "project-skill" },
];
}
/**
* Resolve a single skill reference to an absolute path.
*
* Resolution order:
* 1. Absolute path to a file → check existsSync
* 2. Absolute path to a directory → check for SKILL.md inside
* 3. Bare name → scan known skill directories for <name>/SKILL.md
*/
function resolveSkillReference(ref: string, cwd: string): SkillResolution {
const trimmed = ref.trim();
// Expand tilde
const expanded = trimmed.startsWith("~/")
? join(homedir(), trimmed.slice(2))
: trimmed;
// Absolute path
if (isAbsolute(expanded)) {
// Direct file reference
if (existsSync(expanded)) {
// Check if it's a directory — look for SKILL.md inside
try {
const stat = statSync(expanded);
if (stat.isDirectory()) {
const skillFile = join(expanded, "SKILL.md");
if (existsSync(skillFile)) {
return { original: ref, resolvedPath: skillFile, method: "absolute-dir" };
}
return { original: ref, resolvedPath: null, method: "unresolved" };
}
} catch { /* fall through */ }
return { original: ref, resolvedPath: expanded, method: "absolute-path" };
}
// Maybe it's a directory path without SKILL.md suffix
const withSkillMd = join(expanded, "SKILL.md");
if (existsSync(withSkillMd)) {
return { original: ref, resolvedPath: withSkillMd, method: "absolute-dir" };
}
return { original: ref, resolvedPath: null, method: "unresolved" };
}
// Bare name — scan known skill directories
for (const { dir, method } of getSkillSearchDirs(cwd)) {
if (!existsSync(dir)) continue;
try {
const entries = readdirSync(dir, { withFileTypes: true });
for (const entry of entries) {
if (!entry.isDirectory()) continue;
if (entry.name === expanded) {
const skillFile = join(dir, entry.name, "SKILL.md");
if (existsSync(skillFile)) {
return { original: ref, resolvedPath: skillFile, method };
}
}
}
} catch { /* directory not readable — skip */ }
}
return { original: ref, resolvedPath: null, method: "unresolved" };
}
/**
* Resolve all skill references in a preferences object.
* Caches resolution per reference string to avoid redundant filesystem scans.
*/
export function resolveAllSkillReferences(preferences: GSDPreferences, cwd: string): SkillResolutionReport {
const validated = validatePreferences(preferences).preferences;
preferences = validated;
const resolutions = new Map<string, SkillResolution>();
const warnings: string[] = [];
function resolve(ref: string): SkillResolution {
const existing = resolutions.get(ref);
if (existing) return existing;
const result = resolveSkillReference(ref, cwd);
resolutions.set(ref, result);
if (result.method === "unresolved") {
warnings.push(ref);
}
return result;
}
// Resolve all skill lists
for (const skill of preferences.always_use_skills ?? []) resolve(skill);
for (const skill of preferences.prefer_skills ?? []) resolve(skill);
for (const skill of preferences.avoid_skills ?? []) resolve(skill);
// Resolve skill rules
for (const rule of preferences.skill_rules ?? []) {
for (const skill of rule.use ?? []) resolve(skill);
for (const skill of rule.prefer ?? []) resolve(skill);
for (const skill of rule.avoid ?? []) resolve(skill);
}
return { resolutions, warnings };
}
/**
* Format a skill reference for the system prompt.
* If resolved, shows the path so the agent knows exactly where to read.
* If unresolved, marks it clearly.
*/
function formatSkillRef(ref: string, resolutions: Map<string, SkillResolution>): string {
const resolution = resolutions.get(ref);
if (!resolution || resolution.method === "unresolved") {
return `${ref} (⚠ not found — check skill name or path)`;
}
// For absolute paths where SKILL.md is just appended, don't clutter the output
if (resolution.method === "absolute-path" || resolution.method === "absolute-dir") {
return ref;
}
// For bare names resolved from skill directories, show the resolved path
return `${ref} → \`${resolution.resolvedPath}\``;
}
// ─── System Prompt Rendering ──────────────────────────────────────────────────
export function renderPreferencesForSystemPrompt(preferences: GSDPreferences, resolutions?: Map<string, SkillResolution>): string {
const validated = validatePreferences(preferences);
const lines: string[] = ["## GSD Skill Preferences"];
if (validated.errors.length > 0) {
lines.push("- Validation: some preference values were ignored because they were invalid.");
}
preferences = validated.preferences;
lines.push(
"- Treat these as explicit skill-selection policy for GSD work.",
"- If a listed skill exists and is relevant, load and follow it instead of treating it as a vague suggestion.",
"- Current user instructions still override these defaults.",
);
const fmt = (ref: string) => resolutions ? formatSkillRef(ref, resolutions) : ref;
if (preferences.always_use_skills && preferences.always_use_skills.length > 0) {
lines.push("- Always use these skills when relevant:");
for (const skill of preferences.always_use_skills) {
lines.push(` - ${fmt(skill)}`);
}
}
if (preferences.prefer_skills && preferences.prefer_skills.length > 0) {
lines.push("- Prefer these skills when relevant:");
for (const skill of preferences.prefer_skills) {
lines.push(` - ${fmt(skill)}`);
}
}
if (preferences.avoid_skills && preferences.avoid_skills.length > 0) {
lines.push("- Avoid these skills unless clearly needed:");
for (const skill of preferences.avoid_skills) {
lines.push(` - ${fmt(skill)}`);
}
}
if (preferences.skill_rules && preferences.skill_rules.length > 0) {
lines.push("- Situational rules:");
for (const rule of preferences.skill_rules) {
lines.push(` - When ${rule.when}:`);
if (rule.use && rule.use.length > 0) {
lines.push(` - use: ${rule.use.map(fmt).join(", ")}`);
}
if (rule.prefer && rule.prefer.length > 0) {
lines.push(` - prefer: ${rule.prefer.map(fmt).join(", ")}`);
}
if (rule.avoid && rule.avoid.length > 0) {
lines.push(` - avoid: ${rule.avoid.map(fmt).join(", ")}`);
}
}
}
if (preferences.custom_instructions && preferences.custom_instructions.length > 0) {
lines.push("- Additional instructions:");
for (const instruction of preferences.custom_instructions) {
lines.push(` - ${instruction}`);
}
}
return lines.join("\n");
}
function loadPreferencesFile(path: string, scope: "global" | "project"): LoadedGSDPreferences | null {
if (!existsSync(path)) return null;
const raw = readFileSync(path, "utf-8");
const preferences = parsePreferencesMarkdown(raw);
if (!preferences) return null;
return {
path,
scope,
preferences,
};
}
function parsePreferencesMarkdown(content: string): GSDPreferences | null {
const match = content.match(/^---\n([\s\S]*?)\n---/);
if (!match) return null;
return parseFrontmatterBlock(match[1]);
}
function parseFrontmatterBlock(frontmatter: string): GSDPreferences {
const root: Record<string, unknown> = {};
const stack: Array<{ indent: number; value: Record<string, unknown> }> = [{ indent: -1, value: root }];
const lines = frontmatter.split(/\r?\n/);
for (let i = 0; i < lines.length; i++) {
const line = lines[i];
if (!line.trim()) continue;
const indent = line.match(/^\s*/)?.[0].length ?? 0;
const trimmed = line.trim();
while (stack.length > 1 && indent <= stack[stack.length - 1].indent) {
stack.pop();
}
const current = stack[stack.length - 1].value;
const keyMatch = trimmed.match(/^([A-Za-z0-9_]+):(.*)$/);
if (!keyMatch) continue;
const [, key, remainder] = keyMatch;
const valuePart = remainder.trim();
if (valuePart === "") {
const nextLine = lines[i + 1] ?? "";
const nextTrimmed = nextLine.trim();
if (nextTrimmed.startsWith("- ")) {
const items: unknown[] = [];
let j = i + 1;
while (j < lines.length) {
const candidate = lines[j];
const candidateIndent = candidate.match(/^\s*/)?.[0].length ?? 0;
const candidateTrimmed = candidate.trim();
if (!candidateTrimmed) {
j++;
continue;
}
if (candidateIndent <= indent || !candidateTrimmed.startsWith("- ")) break;
const itemText = candidateTrimmed.slice(2).trim();
const nextCandidate = lines[j + 1] ?? "";
const nextCandidateIndent = nextCandidate.match(/^\s*/)?.[0].length ?? 0;
const nextCandidateTrimmed = nextCandidate.trim();
if (itemText.includes(":") || (nextCandidateTrimmed && nextCandidateIndent > candidateIndent)) {
const obj: Record<string, unknown> = {};
const firstMatch = itemText.match(/^([A-Za-z0-9_]+):(.*)$/);
if (firstMatch) {
obj[firstMatch[1]] = parseScalar(firstMatch[2].trim());
}
j++;
while (j < lines.length) {
const nested = lines[j];
const nestedIndent = nested.match(/^\s*/)?.[0].length ?? 0;
const nestedTrimmed = nested.trim();
if (!nestedTrimmed) {
j++;
continue;
}
if (nestedIndent <= candidateIndent) break;
const nestedMatch = nestedTrimmed.match(/^([A-Za-z0-9_]+):(.*)$/);
if (nestedMatch) {
const nestedValue = nestedMatch[2].trim();
if (nestedValue === "") {
const nestedItems: string[] = [];
j++;
while (j < lines.length) {
const nestedArrayLine = lines[j];
const nestedArrayIndent = nestedArrayLine.match(/^\s*/)?.[0].length ?? 0;
const nestedArrayTrimmed = nestedArrayLine.trim();
if (!nestedArrayTrimmed) {
j++;
continue;
}
if (nestedArrayIndent <= nestedIndent || !nestedArrayTrimmed.startsWith("- ")) break;
nestedItems.push(String(parseScalar(nestedArrayTrimmed.slice(2).trim())));
j++;
}
obj[nestedMatch[1]] = nestedItems;
continue;
}
obj[nestedMatch[1]] = parseScalar(nestedValue);
}
j++;
}
items.push(obj);
continue;
}
items.push(parseScalar(itemText));
j++;
}
current[key] = items;
i = j - 1;
} else {
const obj: Record<string, unknown> = {};
current[key] = obj;
stack.push({ indent, value: obj });
}
continue;
}
current[key] = parseScalar(valuePart);
}
return root as GSDPreferences;
}
function parseScalar(value: string): string | number | boolean {
if (value === "true") return true;
if (value === "false") return false;
if (/^-?\d+$/.test(value)) return Number(value);
return value.replace(/^['\"]|['\"]$/g, "");
}
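`parseScalar` handles only three shapes: booleans, signed integers, and quote-stripped strings. Anything else, including floats, passes through as a string. Reproduced inline for a runnable check:

```typescript
// Inline copy of parseScalar, for illustration only.
function scalar(value: string): string | number | boolean {
  if (value === "true") return true;
  if (value === "false") return false;
  if (/^-?\d+$/.test(value)) return Number(value);
  return value.replace(/^['"]|['"]$/g, "");
}

scalar("true");     // boolean true
scalar("-42");      // number -42
scalar('"quoted"'); // "quoted", with the surrounding quotes stripped
scalar("3.5");      // "3.5" stays a string: only integers match the numeric regex
```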
/**
* Resolve the skill discovery mode from effective preferences.
* Defaults to "suggest": skills are identified during research but not installed automatically.
*/
export function resolveSkillDiscoveryMode(): SkillDiscoveryMode {
const prefs = loadEffectiveGSDPreferences();
return prefs?.preferences.skill_discovery ?? "suggest";
}
/**
* Resolve which model ID to use for a given auto-mode unit type.
* Returns undefined if no model preference is set for this unit type.
*/
export function resolveModelForUnit(unitType: string): string | undefined {
const prefs = loadEffectiveGSDPreferences();
if (!prefs?.preferences.models) return undefined;
const m = prefs.preferences.models;
switch (unitType) {
case "research-milestone":
case "research-slice":
return m.research;
case "plan-milestone":
case "plan-slice":
case "replan-slice":
return m.planning;
case "execute-task":
return m.execution;
case "complete-slice":
case "run-uat":
return m.completion;
default:
return undefined;
}
}
export function resolveAutoSupervisorConfig(): AutoSupervisorConfig {
const prefs = loadEffectiveGSDPreferences();
const configured = prefs?.preferences.auto_supervisor ?? {};
return {
soft_timeout_minutes: configured.soft_timeout_minutes ?? 20,
idle_timeout_minutes: configured.idle_timeout_minutes ?? 10,
hard_timeout_minutes: configured.hard_timeout_minutes ?? 30,
...(configured.model ? { model: configured.model } : {}),
};
}
function mergePreferences(base: GSDPreferences, override: GSDPreferences): GSDPreferences {
return {
version: override.version ?? base.version,
always_use_skills: mergeStringLists(base.always_use_skills, override.always_use_skills),
prefer_skills: mergeStringLists(base.prefer_skills, override.prefer_skills),
avoid_skills: mergeStringLists(base.avoid_skills, override.avoid_skills),
skill_rules: [...(base.skill_rules ?? []), ...(override.skill_rules ?? [])],
custom_instructions: mergeStringLists(base.custom_instructions, override.custom_instructions),
models: { ...(base.models ?? {}), ...(override.models ?? {}) },
skill_discovery: override.skill_discovery ?? base.skill_discovery,
auto_supervisor: { ...(base.auto_supervisor ?? {}), ...(override.auto_supervisor ?? {}) },
uat_dispatch: override.uat_dispatch ?? base.uat_dispatch,
budget_ceiling: override.budget_ceiling ?? base.budget_ceiling,
};
}
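The merge rules in `mergePreferences` split by shape: list fields append and dedupe, scalar fields are project-wins. A sketch with hypothetical values:

```typescript
// List fields: global entries first, project entries appended, duplicates removed.
// Scalar fields: the project value wins when present.
function mergeLists(base: string[] = [], override: string[] = []): string[] {
  return Array.from(new Set([...base, ...override].map(s => s.trim()).filter(Boolean)));
}

const globalPrefs = { prefer_skills: ["tdd", "docs"], skill_discovery: "auto" };
const projectPrefs = { prefer_skills: ["docs", "frontend"], skill_discovery: "suggest" };

const merged = {
  prefer_skills: mergeLists(globalPrefs.prefer_skills, projectPrefs.prefer_skills),
  skill_discovery: projectPrefs.skill_discovery ?? globalPrefs.skill_discovery,
};
// merged.prefer_skills → ["tdd", "docs", "frontend"], merged.skill_discovery → "suggest"
```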
function validatePreferences(preferences: GSDPreferences): {
preferences: GSDPreferences;
errors: string[];
} {
const errors: string[] = [];
const validated: GSDPreferences = {};
if (preferences.version !== undefined) {
if (preferences.version === 1) {
validated.version = 1;
} else {
errors.push(`unsupported version ${preferences.version}`);
}
}
const validDiscoveryModes = new Set(["auto", "suggest", "off"]);
if (preferences.skill_discovery) {
if (validDiscoveryModes.has(preferences.skill_discovery)) {
validated.skill_discovery = preferences.skill_discovery;
} else {
errors.push(`invalid skill_discovery value: ${preferences.skill_discovery}`);
}
}
validated.always_use_skills = normalizeStringList(preferences.always_use_skills);
validated.prefer_skills = normalizeStringList(preferences.prefer_skills);
validated.avoid_skills = normalizeStringList(preferences.avoid_skills);
validated.custom_instructions = normalizeStringList(preferences.custom_instructions);
if (preferences.skill_rules) {
const validRules: GSDSkillRule[] = [];
for (const rule of preferences.skill_rules) {
if (!rule || typeof rule !== "object") {
errors.push("invalid skill_rules entry");
continue;
}
const when = typeof rule.when === "string" ? rule.when.trim() : "";
if (!when) {
errors.push("skill_rules entry missing when");
continue;
}
const validatedRule: GSDSkillRule = { when };
for (const action of SKILL_ACTIONS) {
const values = normalizeStringList((rule as Record<string, unknown>)[action]);
if (values.length > 0) {
validatedRule[action as keyof GSDSkillRule] = values as never;
}
}
if (!validatedRule.use && !validatedRule.prefer && !validatedRule.avoid) {
errors.push(`skill rule has no actions: ${when}`);
continue;
}
validRules.push(validatedRule);
}
if (validRules.length > 0) {
validated.skill_rules = validRules;
}
}
for (const key of ["always_use_skills", "prefer_skills", "avoid_skills", "custom_instructions"] as const) {
if (validated[key] && validated[key]!.length === 0) {
delete validated[key];
}
}
if (preferences.uat_dispatch !== undefined) {
validated.uat_dispatch = !!preferences.uat_dispatch;
}
if (preferences.budget_ceiling !== undefined) {
const raw = preferences.budget_ceiling;
if (typeof raw === "number" && Number.isFinite(raw)) {
validated.budget_ceiling = raw;
} else if (typeof raw === "string" && Number.isFinite(Number(raw))) {
validated.budget_ceiling = Number(raw);
} else {
errors.push("budget_ceiling must be a finite number");
}
}
return { preferences: validated, errors };
}
function mergeStringLists(base?: unknown, override?: unknown): string[] | undefined {
const merged = [
...normalizeStringList(base),
...normalizeStringList(override),
]
.map((item) => item.trim())
.filter(Boolean);
return merged.length > 0 ? Array.from(new Set(merged)) : undefined;
}
function normalizeStringList(value: unknown): string[] {
if (!Array.isArray(value)) return [];
return value
.filter((item): item is string => typeof item === "string")
.map((item) => item.trim())
.filter(Boolean);
}


@@ -0,0 +1,50 @@
/**
* GSD Prompt Loader
*
* Reads .md prompt templates from the prompts/ directory and substitutes
* {{variable}} placeholders with provided values.
*
* Templates live at prompts/ relative to this module's directory.
* They use {{variableName}} syntax for substitution.
*/
import { readFileSync } from "node:fs";
import { join, dirname } from "node:path";
import { fileURLToPath } from "node:url";
const promptsDir = join(dirname(fileURLToPath(import.meta.url)), "prompts");
/**
* Load a prompt template and substitute variables.
*
* @param name - Template filename without .md extension (e.g. "execute-task")
* @param vars - Key-value pairs to substitute for {{key}} placeholders
*/
export function loadPrompt(name: string, vars: Record<string, string> = {}): string {
const path = join(promptsDir, `${name}.md`);
let content = readFileSync(path, "utf-8");
// Check BEFORE substitution: find all {{varName}} placeholders the template
// declares and verify every one has a value in vars. Checking after substitution
// would also flag {{...}} patterns injected by inlined content (e.g. template
// files embedded in {{inlinedContext}}), producing false positives.
const declared = content.match(/\{\{[a-zA-Z][a-zA-Z0-9_]*\}\}/g);
if (declared) {
const missing = [...new Set(declared)]
.map(m => m.slice(2, -2))
.filter(key => !(key in vars));
if (missing.length > 0) {
throw new Error(
`loadPrompt("${name}"): template declares {{${missing.join("}}, {{")}}} but no value was provided. ` +
`This usually means the extension code in memory is older than the template on disk. ` +
`Restart pi to reload the extension.`
);
}
}
for (const [key, value] of Object.entries(vars)) {
content = content.replaceAll(`{{${key}}}`, value);
}
return content.trim();
}


@@ -0,0 +1,25 @@
You are executing GSD auto-mode.
## UNIT: Complete Milestone {{milestoneId}} ("{{milestoneTitle}}")
All relevant context has been preloaded below — the roadmap, all slice summaries, requirements, decisions, and project context are inlined. Start working immediately without re-reading these files.
{{inlinedContext}}
Then:
1. Read the milestone-summary template at `~/.pi/agent/extensions/gsd/templates/milestone-summary.md`
2. If a `GSD Skill Preferences` block is present in system context, use it to decide which skills to load and follow during completion, without relaxing required verification or artifact rules
3. Verify each **success criterion** from the milestone definition in `{{roadmapPath}}`. For each criterion, confirm it was met with specific evidence from slice summaries, test results, or observable behavior. List any criterion that was NOT met.
4. Verify the milestone's **definition of done** — all slices are `[x]`, all slice summaries exist, and any cross-slice integration points work correctly.
5. Validate **requirement status transitions**. For each requirement that changed status during this milestone, confirm the transition is supported by evidence. Requirements can move between Active, Validated, Deferred, Blocked, or Out of Scope — but only with proof.
6. Write `{{milestoneSummaryAbsPath}}` using the milestone-summary template. Fill all frontmatter fields and narrative sections. The `requirement_outcomes` field must list every requirement that changed status with `from_status`, `to_status`, and `proof`.
7. Update `.gsd/REQUIREMENTS.md` if any requirement status transitions were validated in step 5.
8. Update `.gsd/PROJECT.md` to reflect milestone completion and current project state.
9. Commit all changes: `git add -A && git commit -m 'feat(gsd): complete {{milestoneId}}'`
10. Update `.gsd/STATE.md`
**Important:** Do NOT skip the success criteria and definition of done verification (steps 3-4). The milestone summary must reflect actual verified outcomes, not assumed success. If any criterion was not met, document it clearly in the summary and do not mark the milestone as passing verification.
**You MUST write `{{milestoneSummaryAbsPath}}` AND update PROJECT.md before finishing.**
When done, say: "Milestone {{milestoneId}} complete."


@ -0,0 +1,27 @@
You are executing GSD auto-mode.
## UNIT: Complete Slice {{sliceId}} ("{{sliceTitle}}") — Milestone {{milestoneId}}
All relevant context has been preloaded below — the slice plan, all task summaries, and the milestone roadmap are inlined. Start working immediately without re-reading these files.
{{inlinedContext}}
Then:
1. Read the templates:
- `~/.pi/agent/extensions/gsd/templates/slice-summary.md`
- `~/.pi/agent/extensions/gsd/templates/uat.md`
2. If a `GSD Skill Preferences` block is present in system context, use it to decide which skills to load and follow during completion, without relaxing required verification or artifact rules
3. Run all slice-level verification checks defined in the slice plan. All must pass before marking the slice done. If any fail, fix them first.
4. Confirm the slice's observability/diagnostic surfaces are real and useful where relevant: status inspection works, failure state is externally visible, structured errors/logs are actionable, and hidden failures are not being mistaken for success.
5. If `.gsd/REQUIREMENTS.md` exists, update it based on what this slice actually proved. Move requirements between Active, Validated, Deferred, Blocked, or Out of Scope only when the evidence from execution supports that change. Surface any new candidate requirements discovered during execution instead of silently dropping them.
6. Write `{{sliceSummaryAbsPath}}` (compress all task summaries). Fill the requirement-related sections explicitly.
7. Write `{{sliceUatAbsPath}}`. Fill the new `UAT Type`, `Requirements Proved By This UAT`, and `Not Proven By This UAT` sections explicitly.
8. Review task summaries for `key_decisions`. Ensure any significant architectural, pattern, or observability decisions are in `.gsd/DECISIONS.md`. If any are missing, append them now.
9. Mark {{sliceId}} done in `{{roadmapPath}}` (change `[ ]` to `[x]`)
10. Commit all remaining slice changes: `git add -A && git commit -m 'feat(gsd): complete {{sliceId}}'`. Do not squash-merge manually; the extension will merge the slice branch back to main after this unit succeeds.
11. Update `.gsd/PROJECT.md` if it exists — refresh current state if needed.
12. Update `.gsd/STATE.md`
**You MUST mark {{sliceId}} as `[x]` in `{{roadmapPath}}` AND write `{{sliceSummaryAbsPath}}` before finishing.**
When done, say: "Slice {{sliceId}} complete."


@ -0,0 +1,151 @@
{{preamble}}
Say exactly: "What's the vision?" — nothing else. Wait for the user's answer.
## Discussion Phase
After they describe it, your job is to understand the project deeply enough to define its capability contract before planning slices.
## Vision Mapping
Before diving into detailed Q&A, read the user's description and classify its scale:
- **Task** — a focused piece of work (single milestone, few slices)
- **Project** — a coherent product with multiple major capabilities (multi-milestone likely)
- **Product/Platform** — a large vision with distinct phases, audiences, or systems (definitely multi-milestone)
**For Project or Product/Platform scale:** Before drilling into details, map the full landscape:
1. Propose a milestone sequence — names, one-line intents, rough dependencies
2. Present this to the user for confirmation or adjustment
3. Only then begin the deep Q&A — and scope the Q&A to the full vision, not just M001
**For Task scale:** Proceed directly to the discussion flow below (single milestone).
**Anti-reduction rule:** If the user describes a big vision, plan the big vision. Do not ask "what's the minimum viable version?" or try to reduce scope unless the user explicitly asks for an MVP or minimal version. When something is complex or risky, phase it into a later milestone — do not cut it. The user's ambition is the target, and your job is to sequence it intelligently, not shrink it.
---
**If the user provides a file path or pastes a large document** (spec, design doc, product plan, chat export), read it fully before asking questions. Use it as the starting point — don't ask them to re-explain what's already in the document. Your questions should fill gaps and resolve ambiguities the document doesn't cover.
**Investigate between question rounds to make your questions smarter.** Before each round of questions, do enough lightweight research that your questions are grounded in reality — not guesses about what exists or what's possible.
- Check library docs (`resolve_library` / `get_library_docs`) when the user mentions tech you need current facts about — capabilities, constraints, API shapes, version-specific behavior
- Do web searches (`search-the-web`) to verify the landscape — what solutions exist, what's changed recently, what's the current best practice. Use `freshness` for recency-sensitive queries, `domain` to target specific sites. Use `fetch_page` to read the full content of promising URLs when snippets aren't enough.
- Scout the codebase (`ls`, `find`, `rg`, or `scout` for broad unfamiliar areas) to understand what already exists, what patterns are established, what constraints current code imposes
Don't go deep — just enough that your next question reflects what's actually true rather than what you assume.
**Use this to actively surface:**
- The biggest technical unknowns — what could fail, what hasn't been proven, what might invalidate the plan
- Integration surfaces — external systems, APIs, libraries, or internal modules this work touches
- What needs to be proven before committing — the things that, if they don't work, mean the plan is wrong
- Product reality requirements: primary user loop, launchability expectations, continuity expectations, and failure visibility expectations
- Items that are complex, risky, or lower priority — phase these into later milestones rather than deferring or cutting them. Only truly unwanted capabilities become anti-features.
**Then use ask_user_questions** to dig into gray areas — architecture choices, scope boundaries, tech preferences, what's in vs out. 1-3 questions per round.
If a `GSD Skill Preferences` block is present in system context, use it to decide which skills to load and follow during discuss/planning work, but do not let it override the required discuss flow or artifact requirements.
**Self-regulate depth by scale:**
- **Task scale:** After about 5-10 questions total (2-3 rounds), or when you feel you have a solid understanding, offer to proceed.
- **Project/Product scale:** After about 15-25 questions total (5-8 rounds), or when you feel you have a solid understanding, offer to proceed.
Include a question like:
"I think I have a good picture. Ready to confirm requirements and milestone plan, or are there more things to discuss?"
with options: "Ready to confirm requirements and milestone plan (Recommended)", "I have more to discuss"
If the user wants to keep going, keep asking. If they're ready, proceed.
## Focused Research
For a new project or any project that does not yet have `.gsd/REQUIREMENTS.md`, do a focused research pass before roadmap creation.
Research is advisory, not auto-binding. Use the discussion output to identify:
- table stakes the product space usually expects
- domain-standard behaviors the user may or may not want
- likely omissions that would make the product feel incomplete
- plausible anti-features or scope traps
- differentiators worth preserving
If the research suggests requirements the user did not explicitly ask for, present them as candidate requirements to confirm, defer, or reject. Do not silently turn research into scope.
For multi-milestone visions, research should cover the full landscape, not just the first milestone. Research findings may affect milestone sequencing, not just slice ordering within M001.
## Capability Contract
Before writing a roadmap, produce or update `.gsd/REQUIREMENTS.md`.
Use it as the project's explicit capability contract.
Requirements must be organized into:
- Active
- Validated
- Deferred
- Out of Scope
- Traceability
Each requirement should include:
- stable ID (`R###`)
- title
- class
- status
- description
- why it matters
- source (`user`, `inferred`, `research`, or `execution`)
- primary owning slice
- supporting slices
- validation status
- notes
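A minimal sketch of one entry carrying these fields (the ID, slice references, and wording here are hypothetical placeholders — the template at `~/.pi/agent/extensions/gsd/templates/requirements.md` remains authoritative for exact formatting):

```markdown
### R001 — User can sign in
- class: product
- status: Active
- description: A user can authenticate with email/password and reach the dashboard.
- why it matters: The primary user loop cannot start without authentication.
- source: user
- primary owning slice: S02
- supporting slices: S05
- validation status: not yet validated
- notes: Session continuity is owned by S05.
```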
Rules:
- Keep requirements capability-oriented, not a giant feature inventory
- Every Active requirement must either be mapped to a roadmap owner, explicitly deferred, blocked with reason, or moved out of scope
- Product-facing work should capture launchability, primary user loop, continuity, and failure visibility when relevant
- Later milestones may have provisional ownership, but the first planned milestone should map requirements to concrete slices wherever possible
For multi-milestone projects, requirements should span the full vision. Requirements owned by later milestones get provisional ownership. The full requirement set captures the user's complete vision — milestones are the sequencing strategy, not the scope boundary.
If the project is new or has no `REQUIREMENTS.md`, confirm candidate requirements with the user before writing the roadmap. Keep the confirmation lightweight: confirm, defer, reject, or add.
## Scope Assessment
Confirm the scale assessment from Vision Mapping still holds after discussion. If the scope grew or shrank significantly during Q&A, adjust the milestone count accordingly.
If Vision Mapping classified the work as Task but discussion revealed Project-scale complexity, upgrade to multi-milestone and propose the split. If Vision Mapping classified it as Project but the scope narrowed to a single coherent body of work (roughly 2-12 slices), downgrade to single-milestone.
## Output Phase
### Naming Convention
Directories use bare IDs. Files use ID-SUFFIX format. Titles live inside file content, not in names.
- Milestone dir: `.gsd/milestones/{{milestoneId}}/`
- Milestone files: `{{milestoneId}}-CONTEXT.md`, `{{milestoneId}}-ROADMAP.md`
- Slice dirs: `S01/`, `S02/`, etc.
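Assuming milestone `M001`, the convention above produces a layout like this (slice count is illustrative):

```
.gsd/milestones/M001/
├── M001-CONTEXT.md
├── M001-ROADMAP.md
└── slices/
    ├── S01/
    └── S02/
```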
### Single Milestone
Once the user is satisfied, in a single pass:
1. `mkdir -p .gsd/milestones/{{milestoneId}}/slices`
2. Write or update `.gsd/PROJECT.md` — read the template at `~/.pi/agent/extensions/gsd/templates/project.md` first. Describe what the project is, its current state, and list the milestone sequence.
3. Write or update `.gsd/REQUIREMENTS.md` — read the template at `~/.pi/agent/extensions/gsd/templates/requirements.md` first. Confirm requirement states, ownership, and traceability before roadmap creation.
4. Write `{{contextAbsPath}}` — read the template at `~/.pi/agent/extensions/gsd/templates/context.md` first. Preserve key risks, unknowns, existing codebase constraints, integration points, and relevant requirements surfaced during discussion.
5. Write `{{roadmapAbsPath}}` — read the template at `~/.pi/agent/extensions/gsd/templates/roadmap.md` first. Decompose into demoable vertical slices with checkboxes, risk, depends, demo sentences, proof strategy, verification classes, milestone definition of done, requirement coverage, and a boundary map. If the milestone crosses multiple runtime boundaries, include an explicit final integration slice that proves the assembled system works end-to-end in a real environment.
6. Seed `.gsd/DECISIONS.md` — read the template at `~/.pi/agent/extensions/gsd/templates/decisions.md` first. Append rows for any architectural or pattern decisions made during discussion.
7. Update `.gsd/STATE.md`
8. Commit: `docs({{milestoneId}}): context, requirements, and roadmap`
After writing the files and committing, say exactly: "Milestone {{milestoneId}} ready." — nothing else. Auto-mode will start automatically.
### Multi-Milestone
Once the user confirms the milestone split, in a single pass:
1. `mkdir -p .gsd/milestones/{{milestoneId}}/slices` for each milestone
2. Write `.gsd/PROJECT.md` — read the template at `~/.pi/agent/extensions/gsd/templates/project.md` first.
3. Write `.gsd/REQUIREMENTS.md` — read the template at `~/.pi/agent/extensions/gsd/templates/requirements.md` first. Capture Active, Deferred, Out of Scope, and any already Validated requirements. Later milestones may have provisional ownership where slice plans do not exist yet.
4. Write a `CONTEXT.md` for **every** milestone — capture the intent, scope, risks, constraints, user-visible outcome, completion class, final integrated acceptance, and relevant requirements for each. Each future milestone's CONTEXT.md should be rich enough that a planning agent encountering it fresh — with no memory of this conversation — can understand the intent, constraints, dependencies, what this milestone unlocks, and what "done" looks like.
5. Write a `ROADMAP.md` for **only the first milestone** — detail-planning later milestones now is waste because the codebase will change. Include requirement coverage and a milestone definition of done.
6. Seed `.gsd/DECISIONS.md`.
7. Update `.gsd/STATE.md`
8. Commit: `docs: project plan — N milestones` (replace N with the actual milestone count)
After writing the files and committing, say exactly: "Milestone M001 ready." — nothing else. Auto-mode will start automatically.


@ -0,0 +1,29 @@
You are executing GSD doctor heal mode.
The doctor has already scanned the repo and optionally applied deterministic fixes. You are now responsible for resolving the remaining issues using the smallest safe set of changes.
Rules:
1. Prioritize the active milestone or the explicitly requested scope. Do not fan out across unrelated historical milestones unless the report explicitly scopes you there.
2. Read before edit.
3. Prefer fixing authoritative artifacts over masking warnings.
4. For missing summaries or UAT files, generate the real artifact from existing slice/task context when possible — do not leave placeholders if you can reconstruct the real content.
5. After each repair cluster, verify the relevant invariant directly from disk.
6. When done, mentally rerun `/gsd doctor {{doctorCommandSuffix}}` by confirming the remaining issue set for this scope is reduced or cleared.
## Doctor Summary
{{doctorSummary}}
## Structured Issues
{{structuredIssues}}
## Requested Scope
{{scopeLabel}}
Then:
- Repair the unresolved issues in scope
- Keep changes minimal and targeted
- If unresolved issues remain outside scope, leave them untouched and mention them briefly
- End with: "GSD doctor heal complete."


@ -0,0 +1,64 @@
You are executing GSD auto-mode.
## UNIT: Execute Task {{taskId}} ("{{taskTitle}}") — Slice {{sliceId}} ("{{sliceTitle}}"), Milestone {{milestoneId}}
Start with the inlined context below. Treat the inlined task plan as the authoritative local execution contract for this unit. Use the referenced source artifacts to verify details, resolve ambiguity, and run the required checks — do not waste time reconstructing context that is already provided here.
{{resumeSection}}
{{carryForwardSection}}
{{taskPlanInline}}
{{slicePlanExcerpt}}
## Backing Source Artifacts
- Slice plan: `{{planPath}}`
- Task plan source: `{{taskPlanPath}}`
- Prior task summaries in this slice:
{{priorTaskLines}}
Then:
1. If a `GSD Skill Preferences` block is present in system context, use it to decide which skills to load and follow during execution, without relaxing required verification or artifact rules
2. Execute the steps in the inlined task plan
3. Build the real thing. If the task plan says "create login endpoint", build an endpoint that actually authenticates against a real store, not one that returns a hardcoded success response. If the task plan says "create dashboard page", build a page that renders real data from the API, not a component with hardcoded props. Stubs and mocks are for tests, not for the shipped feature.
4. Write or update tests as part of execution — tests are verification, not an afterthought. If the slice plan defines test files in its Verification section and this is the first task, create them (they should initially fail).
5. When implementing non-trivial runtime behavior, add or preserve agent-usable observability:
- Prefer structured logs/events, stable error codes/types, and explicit status surfaces over ad hoc console text
- Ensure failures are externally inspectable rather than swallowed or hidden
- Persist high-value failure state when it materially improves retries, recovery, or later debugging
- Never log secrets, tokens, or sensitive raw payloads unnecessarily
6. Verify must-haves are met by running concrete checks (tests, commands, observable behaviors)
7. Run the slice-level verification checks defined in the slice plan's Verification section. Track which pass. On the final task of the slice, all must pass before marking done. On intermediate tasks, partial passes are expected — note which ones pass in the summary.
8. If the task touches UI, browser flows, DOM behavior, or user-visible web state:
- exercise the real flow in the browser
- prefer `browser_batch` when the next few actions are obvious and sequential
- prefer `browser_assert` for explicit pass/fail verification of the intended outcome
- use `browser_diff` when an action's effect is ambiguous
- use console/network/dialog diagnostics when validating async, stateful, or failure-prone UI
- record verification in terms of explicit checks passed/failed, not only prose interpretation
9. If observability or diagnostics were part of this task's scope, verify them directly — e.g. structured errors, status inspection, health endpoints, persisted failure state, browser/network diagnostics, or equivalent.
10. **If execution is running long or verification fails:**
**Context budget:** If you've used most of your context and haven't finished all steps, stop implementing and prioritize writing the task summary with clear notes on what's done and what remains. A partial summary that enables clean resumption is more valuable than one more half-finished step with no documentation. Never sacrifice summary quality for one more implementation step.
**Debugging discipline:** If a verification check fails or implementation hits unexpected behavior:
- Form a hypothesis first. State what you think is wrong and why, then test that specific theory. Don't shotgun-fix.
- Change one variable at a time. Make one change, test, observe. Multiple simultaneous changes mean you can't attribute what worked.
- Read completely. When investigating, read entire functions and their imports, not just the line that looks relevant.
- Distinguish "I know" from "I assume." Observable facts (the error says X) are strong evidence. Assumptions (this library should work this way) need verification.
- Know when to stop. If you've tried 3+ fixes without progress, your mental model is probably wrong. Stop. List what you know for certain. List what you've ruled out. Form fresh hypotheses from there.
- Don't fix symptoms. Understand *why* something fails before changing code. A test that passes after a change you don't understand is luck, not a fix.
11. **Blocker discovery:** If execution reveals that the remaining slice plan is fundamentally invalid — not just a bug or minor deviation, but a plan-invalidating finding like a wrong API, missing capability, or architectural mismatch — set `blocker_discovered: true` in the task summary frontmatter and describe the blocker clearly in the summary narrative. Do NOT set `blocker_discovered: true` for ordinary debugging, minor deviations, or issues that can be fixed within the current task or the remaining plan. This flag triggers an automatic replan of the slice.
12. If you made an architectural, pattern, library, or observability decision during this task that downstream work should know about, append it to `.gsd/DECISIONS.md` (read the template at `~/.pi/agent/extensions/gsd/templates/decisions.md` if the file doesn't exist yet). Not every task produces decisions — only append when a meaningful choice was made.
13. Read the template at `~/.pi/agent/extensions/gsd/templates/task-summary.md`
14. Write `{{taskSummaryAbsPath}}`
15. Mark {{taskId}} done in `{{planPath}}` (change `[ ]` to `[x]`)
16. Commit your work: `git add -A && git commit -m 'feat({{sliceId}}/{{taskId}}): <what was built>'`. If `git add` silently fails to stage files (a known git worktree stat-cache bug), use this workaround per file: `git update-index --cacheinfo 100644,$(git hash-object -w <file>),<file>` then commit. If that also fails, move on — the system will auto-commit remaining changes after your session ends.
17. Update `.gsd/STATE.md`
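The per-file staging fallback in step 16 can be sketched as follows, run in a throwaway repo so it is safe to execute. The file name and commit message are placeholders; note that `--add` is required here because the demo file is not yet tracked (for an already-tracked file, plain `--cacheinfo` suffices, as in step 16):

```shell
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
echo "demo" > file.txt

# Write the blob into the object store and capture its hash.
hash=$(git hash-object -w file.txt)

# Stage the blob directly in the index, bypassing the worktree stat cache.
git update-index --add --cacheinfo "100644,$hash,file.txt"

# Commit with throwaway identity (placeholder scope/task IDs in the message).
git -c user.email=gsd@example.com -c user.name=gsd \
    commit -qm 'feat(S01/T01): demo'
```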
You are on the slice branch. All work stays here.
**You MUST mark {{taskId}} as `[x]` in `{{planPath}}` AND write `{{taskSummaryAbsPath}}` before finishing.**
When done, say: "Task {{taskId}} complete."


@ -0,0 +1 @@
Complete slice {{sliceId}} ("{{sliceTitle}}") of milestone {{milestoneId}}. All tasks are done. Read the templates at `~/.pi/agent/extensions/gsd/templates/slice-summary.md` and `uat.md`. If a `GSD Skill Preferences` block is present in system context, use it to decide which skills to load and follow during completion, without relaxing required verification or artifact rules. Write `{{sliceId}}-SUMMARY.md` (compress task summaries), write `{{sliceId}}-UAT.md`, and fill the `UAT Type` plus `Not Proven By This UAT` sections explicitly so the artifact states what class of acceptance it covers and what still remains unproven. Review task summaries for `key_decisions` and ensure any significant ones are in `.gsd/DECISIONS.md`. Mark the slice checkbox done in the roadmap, update STATE.md, update milestone summary, and leave the slice branch clean for the extension to squash-merge back to main automatically.


@ -0,0 +1,3 @@
Discuss milestone {{milestoneId}} ("{{milestoneTitle}}"). Identify gray areas, ask the user about them, and write `{{milestoneId}}-CONTEXT.md` in the milestone directory with the decisions. Read the template at `~/.pi/agent/extensions/gsd/templates/context.md` first. If a `GSD Skill Preferences` block is present in system context, use it to decide which skills to load and follow; do not override required artifact rules.
**Investigate between question rounds to make your questions smarter.** Before each round of questions, do enough lightweight research that your questions are grounded in reality — not guesses about what exists or what's possible. Check library docs (`resolve_library`/`get_library_docs`) when tech choices are relevant, search the web (`search-the-web` with `freshness`/`domain` filters, then `fetch_page` for full content) to verify the landscape, scout the codebase (`rg`, `find`, `scout`) to understand what already exists. Don't go deep — just enough that your next question reflects what's actually true. The goal is to ask questions the user can't answer by saying "did you check the docs?" or "look at the code."


@ -0,0 +1,59 @@
You are interviewing the user to surface behavioural, UX, and usage grey areas for slice **{{sliceId}}: {{sliceTitle}}** of milestone **{{milestoneId}}**.
Your goal is **not** to settle tech stack, naming conventions, or architecture — that happens during research and planning. Your goal is to produce a context file that captures the human decisions: what this slice should feel like, how it should behave, what edge cases matter, where scope begins and ends, and what the user cares about that won't be obvious from the roadmap entry alone.
{{inlinedContext}}
---
## Interview Protocol
### Before your first question round
Do a lightweight targeted investigation so your questions are grounded in reality:
- Scout the codebase (`rg`, `find`, or `scout` for broad unfamiliar areas) to understand what already exists that this slice touches or builds on
- Check the roadmap context above to understand what surrounds this slice — what comes before, what depends on it
- Identify the 3-5 biggest behavioural unknowns: things where the user's answer will materially change what gets built
Do **not** go deep — just enough that your questions reflect what's actually true rather than what you assume.
### Question rounds
Ask **1-3 questions per round** using `ask_user_questions`. Keep each question focused on one of:
- **UX and user-facing behaviour** — what does the user see, click, trigger, or experience?
- **Edge cases and failure states** — what happens when things go wrong or are in unusual states?
- **Scope boundaries** — what is explicitly in vs out for this slice? What deferred to later?
- **Feel and experience** — tone, responsiveness, feedback, transitions, what "done" feels like to the user
After the user answers, investigate further if any answer opens a new unknown, then ask the next round.
### Check-in after each round
After each round of answers, use `ask_user_questions` to ask:
> "I think I have a solid picture of this slice. Ready to wrap up and write the context file, or is there more to cover?"
Options:
- "Wrap up — write the context file" *(recommended after ~2-3 rounds)*
- "Keep going — more to discuss"
If the user wants to keep going, keep asking. Stop when they say wrap up.
---
## Output
Once the user is ready to wrap up:
1. Read the slice context template at `~/.pi/agent/extensions/gsd/templates/slice-context.md`
2. `mkdir -p {{sliceDirAbsPath}}`
3. Write `{{contextAbsPath}}` — use the template structure, filling in:
- **Goal** — one sentence: what this slice delivers
- **Why this Slice** — why now, what it unblocks
- **Scope / In Scope** — what was confirmed in scope during the interview
- **Scope / Out of Scope** — what was explicitly deferred or excluded
- **Constraints** — anything the user flagged as a hard constraint
- **Integration Points** — what this slice consumes and produces
- **Open Questions** — anything still unresolved, with current thinking
4. Commit: `git -C {{projectRoot}} add {{contextAbsPath}} && git -C {{projectRoot}} commit -m "docs({{milestoneId}}/{{sliceId}}): slice context from discuss"`
5. Say exactly: `"{{sliceId}} context written."` — nothing else.


@ -0,0 +1 @@
Execute the next task: {{taskId}} ("{{taskTitle}}") in slice {{sliceId}} of milestone {{milestoneId}}. Read the task plan (`{{taskId}}-PLAN.md`), load relevant summaries from prior tasks, and execute each step. Verify must-haves when done. If the task touches UI, browser flows, DOM behavior, or user-visible web state, exercise the real flow in the browser, prefer `browser_batch` for obvious sequences, prefer `browser_assert` for explicit pass/fail verification, use `browser_diff` when an action's effect is ambiguous, and use browser diagnostics when validating async or failure-prone UI. If you made an architectural, pattern, or library decision, append it to `.gsd/DECISIONS.md`. Read the template at `~/.pi/agent/extensions/gsd/templates/task-summary.md`. Write `{{taskId}}-SUMMARY.md`, mark it done, commit, and advance. If a `GSD Skill Preferences` block is present in system context, use it to decide which skills to load and follow during execution, without relaxing required verification or artifact rules. If running long and not all steps are finished, stop implementing and prioritize writing a clean partial summary over attempting one more step — a recoverable handoff is more valuable than a half-finished step with no documentation. If verification fails, debug methodically: form a hypothesis and test that specific theory before changing anything, change one variable at a time, read entire functions not just the suspect line, distinguish observable facts from assumptions, and if 3+ fixes fail without progress stop and reassess your mental model — list what you know for certain, what you've ruled out, and form fresh hypotheses. Don't fix symptoms — understand why something fails before changing code.


@ -0,0 +1,23 @@
Plan milestone {{milestoneId}} ("{{milestoneTitle}}"). Read `.gsd/DECISIONS.md` if it exists — respect existing decisions. Read `.gsd/REQUIREMENTS.md` if it exists and treat Active requirements as the capability contract. If `REQUIREMENTS.md` is missing, continue in legacy compatibility mode but explicitly note missing requirement coverage. Read the template at `~/.pi/agent/extensions/gsd/templates/roadmap.md`. Create `{{milestoneId}}-ROADMAP.md` in the milestone directory with slices, risk levels, dependencies, demo sentences, verification classes, milestone definition of done, requirement coverage, and a boundary map. Write success criteria as observable truths, not implementation tasks. If the milestone crosses multiple runtime boundaries, include an explicit final integration slice that proves the assembled system works end-to-end in a real environment. If planning produces structural decisions, append them to `.gsd/DECISIONS.md`. If a `GSD Skill Preferences` block is present in system context, use it to decide which skills to load and follow during planning, without overriding required roadmap formatting.
## Requirement Rules
- Every relevant Active requirement must be mapped to a slice, deferred, blocked with reason, or moved out of scope.
- Each requirement gets one primary owner and may have supporting slices.
- Surface orphaned Active requirements instead of silently ignoring them.
- Product-facing milestones should cover launchability, primary user loop, continuity, and failure visibility when relevant.
## Planning Doctrine
- **Risk-first means proof-first.** The earliest slices should prove the hardest thing works by shipping the real feature through the uncertain path. If auth is the risk, the first slice ships a real login page with real session handling that a user can actually use — not a CLI command that returns "authenticated: true". Proof is the shipped feature working. There is no separate "proof" artifact. Do not plan spikes, proof-of-concept slices, or validation-only slices — the proof is the real feature, built through the risky path.
- **Every slice is vertical, demoable, and shippable.** Every slice ships real, user-facing functionality. "Demoable" means you could show a stakeholder and they'd see real product progress — not a developer showing a terminal command. If the only way to demonstrate the slice is through a test runner or a curl command, the slice is missing its UI/UX surface. Add it. A slice that only proves something but doesn't ship real working code is not a slice — restructure it.
- **Brownfield bias.** When planning against an existing codebase, ground slices in existing modules, conventions, and seams. Prefer extending real patterns over inventing new ones.
- **Each slice should establish something downstream slices can depend on.** Think about what stable surface this slice creates for later work — an API, a data shape, a proven integration path.
- **Avoid foundation-only slices.** If a slice doesn't produce something demoable end-to-end, it's probably a layer, not a vertical slice. Restructure it.
- **Verification-first.** When planning slices, know what "done" looks like before detailing implementation. Each slice's demo line should describe concrete, verifiable evidence — not vague "it works" claims.
- **Plan for integrated reality, not just local proof.** Distinguish contract proof from live integration proof. If the milestone involves multiple runtime boundaries, one slice must explicitly prove the assembled system through the real entrypoint or runtime path.
- **Truthful demo lines only.** If a slice is proven by fixtures or tests only, say so. Do not phrase harness-level proof as if the user can already perform the live end-to-end behavior unless that has actually been exercised.
- **Completion must imply capability.** If every slice in this roadmap were completed exactly as written, the milestone's promised outcome should actually work at the proof level claimed. Do not write slices that can all be checked off while the user-visible capability still does not exist.
- **Don't invent risks.** If the project is straightforward, skip the proof strategy and just ship value in smart order. Not everything has major unknowns.
- **Ship features, not proofs.** A completed slice should leave the product in a state where the new capability is actually usable through its real interface. A login flow slice ends with a working login page, not a middleware function. An API slice ends with endpoints that return real data from a real store, not hardcoded fixtures. A dashboard slice ends with a real dashboard rendering real data, not a component that renders mock props. If a slice can't ship the real thing yet because a dependency isn't built, it should ship with realistic stubs that are clearly marked for replacement — but the user-facing surface must be real.
- **Ambition matches the milestone.** The number and depth of slices should match the milestone's ambition. A milestone promising "core platform with auth, data model, and primary user loop" should have enough slices to actually deliver all three as working features — not two proof-of-concept slices and a note that "the rest will come in the next milestone." If the milestone's context promises an outcome, the roadmap must deliver it.


@@ -0,0 +1 @@
Plan slice {{sliceId}} ("{{sliceTitle}}") of milestone {{milestoneId}}. Read `.gsd/DECISIONS.md` if it exists — respect existing decisions. Read `.gsd/REQUIREMENTS.md` if it exists — identify which Active requirements the roadmap says this slice owns or supports, and ensure the plan delivers them. Read the roadmap boundary map, any existing context/research files, and dependency summaries. Read the templates at `~/.pi/agent/extensions/gsd/templates/plan.md` and `task-plan.md`. Decompose into tasks with must-haves. Fill the `Proof Level` and `Integration Closure` sections truthfully so the plan says what class of proof this slice really delivers and what end-to-end wiring still remains. Write `{{sliceId}}-PLAN.md` and individual `T##-PLAN.md` files in the `tasks/` subdirectory. If planning produces structural decisions, append them to `.gsd/DECISIONS.md`. If a `GSD Skill Preferences` block is present in system context, use it to decide which skills to load and follow during planning, without overriding required plan formatting.
Before committing, self-audit the plan: every must-have maps to at least one task, every task has complete sections (steps, must-haves, verification, observability impact, inputs, and expected output), task ordering is consistent with no circular references, every pair of artifacts that must connect has an explicit wiring step, task scope targets 2-5 steps and 3-8 files (6-8 steps or 8-10 files — consider splitting; 10+ steps or 12+ files — must split), the plan honors locked decisions from context/research/decisions artifacts, the proof-level wording does not overclaim live integration if only fixture/contract proof is planned, every Active requirement this slice owns has at least one task with verification that proves it is met, and every task produces real user-facing progress — if the slice has a UI surface at least one task builds the real UI, if it has an API at least one task connects it to a real data source, and showing the completed result to a non-technical stakeholder would demonstrate real product progress rather than developer artifacts.


@@ -0,0 +1,11 @@
Research slice {{sliceId}} ("{{sliceTitle}}") of milestone {{milestoneId}}. Read `.gsd/DECISIONS.md` if it exists — respect existing decisions, don't contradict them. Read `.gsd/REQUIREMENTS.md` if it exists — identify which Active requirements this slice owns or supports and target research toward risks, unknowns, and constraints that could affect delivery of those requirements. If a `GSD Skill Preferences` block is present in system context, use it to decide which skills to load and follow during research, without relaxing required verification or artifact rules. Explore the relevant code — use `rg`/`find` for targeted reads, or `scout` if the area is broad or unfamiliar. Check libraries with `resolve_library`/`get_library_docs`. Read the template at `~/.pi/agent/extensions/gsd/templates/research.md`. Write `{{sliceId}}-RESEARCH.md` in the slice directory with summary, don't-hand-roll, common pitfalls, and relevant code sections.
## Strategic Questions to Answer
Research should drive planning decisions, not just collect facts. Explicitly address:
- **What should be proven first?** What's the riskiest assumption — the thing that, if wrong, invalidates downstream work?
- **What existing patterns should be reused?** What modules, conventions, or infrastructure already exist that the plan should build on rather than reinvent?
- **What boundary contracts matter?** What interfaces, data shapes, event formats, or invariants will slices need to agree on?
- **What constraints does the existing codebase impose?** What can't be changed, what's expensive to change, what patterns must be respected?
- **Are there known failure modes that should shape slice ordering?** Pitfalls that mean certain work should come before or after other work?


@@ -0,0 +1 @@
Resume interrupted work. Find the continue file (`{{sliceId}}-CONTINUE.md` or `continue.md`) in slice {{sliceId}} of milestone {{milestoneId}}, then pick up from where you left off. Delete the continue file after reading it. If a `GSD Skill Preferences` block is present in system context, use it to decide which skills to load and follow during execution, without relaxing required verification or artifact rules.


@@ -0,0 +1,47 @@
You are executing GSD auto-mode.
## UNIT: Plan Milestone {{milestoneId}} ("{{milestoneTitle}}")
All relevant context has been preloaded below — start working immediately without re-reading these files.
{{inlinedContext}}
Then:
1. Read the template at `~/.pi/agent/extensions/gsd/templates/roadmap.md`
2. Read `.gsd/REQUIREMENTS.md` if it exists. Treat **Active** requirements as the capability contract for planning. If it does not exist, continue in legacy compatibility mode but explicitly note that requirement coverage is operating without a contract.
3. If a `GSD Skill Preferences` block is present in system context, use it to decide which skills to load and follow during planning, without overriding required roadmap formatting
4. Create the roadmap: decompose into demoable vertical slices — as many as the work needs, no more
5. Order by risk (high-risk first)
6. Write `{{outputPath}}` with checkboxes, risk, depends, demo sentences, proof strategy, verification classes, milestone definition of done, **requirement coverage**, and a boundary map. Write success criteria as observable truths, not implementation tasks. If the milestone crosses multiple runtime boundaries, include an explicit final integration slice that proves the assembled system works end-to-end in a real environment
7. If planning produced structural decisions (e.g. slice ordering rationale, technology choices, scope exclusions), append them to `.gsd/DECISIONS.md` (read the template at `~/.pi/agent/extensions/gsd/templates/decisions.md` if the file doesn't exist yet)
8. Update `.gsd/STATE.md`
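A minimal sketch of one roadmap slice entry in this shape (the slice ID, risk call, and demo line are hypothetical, not prescribed by GSD):

```markdown
- [ ] S01 — Real login page with session handling
  - Risk: high (auth provider integration unproven)
  - Depends: none
  - Demo: a user signs in at /login and lands on a session-scoped dashboard
  - Verification: integration test through the real auth route, not a mocked handler
```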
## Requirement Mapping Rules
- Every Active requirement relevant to this milestone must be in one of these states by the end of planning: mapped to a slice, explicitly deferred, blocked with reason, or moved out of scope.
- Each requirement should have one accountable primary owner and may have supporting slices.
- Product-facing milestones should cover launchability, primary user loop, continuity, and failure visibility when relevant.
- A slice may support multiple requirements, but should not exist with no requirement justification unless it is clearly enabling work for a mapped requirement.
- Include a compact coverage summary in the roadmap so omissions are mechanically visible.
- If `.gsd/REQUIREMENTS.md` exists and an Active requirement has no credible path, surface that clearly. Do not silently ignore orphaned Active requirements.
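A compact coverage summary might look like this (requirement and slice IDs are hypothetical; the point is that every Active requirement has a visible state and owner):

```markdown
| Requirement           | State               | Primary owner | Supporting |
|-----------------------|---------------------|---------------|------------|
| R1 Launchability      | Mapped              | S01           | S04        |
| R2 Primary user loop  | Mapped              | S02           | S03        |
| R3 Offline continuity | Deferred (post-MVP) | n/a           | n/a        |
```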
## Planning Doctrine
Apply these when decomposing and ordering slices:
- **Risk-first means proof-first.** The earliest slices should prove the hardest thing works by shipping the real feature through the uncertain path. If auth is the risk, the first slice ships a real login page with real session handling that a user can actually use — not a CLI command that returns "authenticated: true". Proof is the shipped feature working. There is no separate "proof" artifact. Do not plan spikes, proof-of-concept slices, or validation-only slices — the proof is the real feature, built through the risky path.
- **Every slice is vertical, demoable, and shippable.** Every slice ships real, user-facing functionality. "Demoable" means you could show a stakeholder and they'd see real product progress — not a developer showing a terminal command. If the only way to demonstrate the slice is through a test runner or a curl command, the slice is missing its UI/UX surface. Add it. A slice that only proves something but doesn't ship real working code is not a slice — restructure it.
- **Brownfield bias.** When planning against an existing codebase, ground slices in existing modules, conventions, and seams. Prefer extending real patterns over inventing new ones.
- **Each slice should establish something downstream slices can depend on.** Think about what stable surface this slice creates for later work — an API, a data shape, a proven integration path.
- **Avoid foundation-only slices.** If a slice doesn't produce something demoable end-to-end, it's probably a layer, not a vertical slice. Restructure it.
- **Verification-first.** When planning slices, know what "done" looks like before detailing implementation. Each slice's demo line should describe concrete, verifiable evidence — not vague "it works" claims.
- **Plan for integrated reality, not just local proof.** Distinguish contract proof from live integration proof. If the milestone involves multiple runtime boundaries, one slice must explicitly prove the assembled system through the real entrypoint or runtime path.
- **Truthful demo lines only.** If a slice is proven by fixtures or tests only, say so. Do not phrase harness-level proof as if the user can already perform the live end-to-end behavior unless that has actually been exercised.
- **Completion must imply capability.** If every slice in this roadmap were completed exactly as written, the milestone's promised outcome should actually work at the proof level claimed. Do not write slices that can all be checked off while the user-visible capability still does not exist.
- **Don't invent risks.** If the project is straightforward, skip the proof strategy and just ship value in smart order. Not everything has major unknowns.
- **Ship features, not proofs.** A completed slice should leave the product in a state where the new capability is actually usable through its real interface. A login flow slice ends with a working login page, not a middleware function. An API slice ends with endpoints that return real data from a real store, not hardcoded fixtures. A dashboard slice ends with a real dashboard rendering real data, not a component that renders mock props. If a slice can't ship the real thing yet because a dependency isn't built, it should ship with realistic stubs that are clearly marked for replacement — but the user-facing surface must be real.
- **Ambition matches the milestone.** The number and depth of slices should match the milestone's ambition. A milestone promising "core platform with auth, data model, and primary user loop" should have enough slices to actually deliver all three as working features — not two proof-of-concept slices and a note that "the rest will come in the next milestone." If the milestone's context promises an outcome, the roadmap must deliver it.
**You MUST write the file `{{outputAbsPath}}` before finishing.**
When done, say: "Milestone {{milestoneId}} planned."


@@ -0,0 +1,63 @@
You are executing GSD auto-mode.
## UNIT: Plan Slice {{sliceId}} ("{{sliceTitle}}") — Milestone {{milestoneId}}
All relevant context has been preloaded below — start working immediately without re-reading these files.
{{inlinedContext}}
### Dependency Slice Summaries
Pay particular attention to **Forward Intelligence** sections — they contain hard-won knowledge about what's fragile, what assumptions changed, and what this slice should watch out for.
{{dependencySummaries}}
Then:
0. If `REQUIREMENTS.md` was preloaded above, identify which Active requirements the roadmap says this slice owns or supports. These are the requirements this plan must deliver — every owned requirement needs at least one task that directly advances it, and verification must prove the requirement is met.
1. Read the templates:
- `~/.pi/agent/extensions/gsd/templates/plan.md`
- `~/.pi/agent/extensions/gsd/templates/task-plan.md`
2. If a `GSD Skill Preferences` block is present in system context, use it to decide which skills to load and follow during planning, without overriding required plan formatting
3. Define slice-level verification first — the objective stopping condition for this slice:
- For non-trivial slices: plan actual test files with real assertions. Name the files. The first task creates them (initially failing). Remaining tasks make them pass.
- For simple slices: executable commands or script assertions are fine.
- If the project is non-trivial and has no test framework, the first task should set one up.
- If this slice establishes a boundary contract, verification must exercise that contract.
4. Plan observability and diagnostics explicitly:
- For non-trivial backend, integration, async, stateful, or UI slices, include an `Observability / Diagnostics` section in the slice plan.
- Define how a future agent will inspect state, detect failure, and localize the problem.
- Prefer structured logs/events, stable error codes/types, status surfaces, and persisted failure state over ad hoc debug text.
- Include at least one verification check for a diagnostic or failure-path signal when relevant.
5. Fill the `Proof Level` and `Integration Closure` sections truthfully:
- State whether the slice proves contract, integration, operational, or final-assembly behavior.
- Say whether real runtime or human/UAT is required.
- Name the wiring introduced in this slice and what still remains before the milestone is truly usable end-to-end.
6. Decompose the slice into tasks, each fitting one context window
7. Every task in the slice plan should be written as an executable increment with:
- a concrete, action-oriented title
- the inline task entry fields defined in the plan.md template (Why / Files / Do / Verify / Done when)
- a matching task plan containing description, steps, must-haves, verification, observability impact, inputs, and expected output
8. Each task needs: title, description, steps, must-haves, verification, observability impact, inputs, and expected output
9. If verification includes test files, ensure the first task includes creating them with expected assertions (they should fail initially — that's correct)
10. Write `{{outputPath}}`
11. Write individual task plans in `{{sliceAbsPath}}/tasks/`: `T01-PLAN.md`, `T02-PLAN.md`, etc.
12. **Self-audit the plan before continuing.** Walk through each check — if any fail, fix the plan files before moving on:
- **Completion semantics:** If every task were completed exactly as written, the slice goal/demo should actually be true at the claimed proof level. Do not allow a task plan that only scaffolds toward a future working state.
- **Must-have coverage:** Every must-have in the slice maps to at least one task. No must-have is orphaned.
- **Task completeness:** Every task has steps, must-haves, verification, observability impact, inputs, and expected output — none are blank or vague.
- **Dependency correctness:** Task ordering is consistent. No task references work from a later task.
- **Key links planned:** For every pair of artifacts that must connect (component → API, API → database, form → handler), there is an explicit step that wires them — not just "create X" and "create Y" in separate tasks with no connection step.
- **Scope sanity:** Target 2-5 steps and 3-8 files per task. 6-8 steps or 8-10 files is a warning — consider splitting. 10+ steps or 12+ files — must split. Each task must be completable in a single fresh context window.
- **Context compliance:** If context/research artifacts or `.gsd/DECISIONS.md` exist, the plan honors locked decisions and doesn't include deferred or out-of-scope items.
- **Requirement coverage:** If `REQUIREMENTS.md` exists, every Active requirement this slice owns (per the roadmap) maps to at least one task with verification that proves the requirement is met. No owned requirement is left without a task. No task claims to satisfy a requirement that is Deferred or Out of Scope.
- **Proof honesty:** The `Proof Level` and `Integration Closure` sections match what this slice will actually prove, and they do not imply live end-to-end completion if only fixture or contract proof is planned.
- **Feature completeness:** Every task produces real, user-facing progress — not just internal scaffolding. If the slice has a UI surface, at least one task builds the real UI (not a placeholder). If the slice has an API, at least one task connects it to a real data source (not hardcoded returns). If every task were completed and you showed the result to a non-technical stakeholder, they should see real product progress, not developer artifacts.
13. If planning produced structural decisions (e.g. verification strategy, observability strategy, technology choices, patterns to follow), append them to `.gsd/DECISIONS.md`
14. Commit: `docs({{sliceId}}): add slice plan`
15. Update `.gsd/STATE.md`
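The scope-sanity thresholds above can be sketched as a small check. This is illustrative only — the function name and the exact boundary handling are assumptions, not part of GSD:

```python
# Hypothetical helper mirroring the scope-sanity thresholds:
# target 2-5 steps / 3-8 files; 6-8 steps or 8-10 files is a
# warning; 10+ steps or 12+ files means the task must be split.
def split_verdict(steps: int, files: int) -> str:
    """Return 'ok', 'consider-split', or 'must-split' for a planned task."""
    if steps >= 10 or files >= 12:
        return "must-split"
    if steps > 5 or files > 8:
        return "consider-split"
    return "ok"
```

A task at 4 steps and 5 files passes; one at 7 steps should prompt a second look; one touching 12 files must be split before execution.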
The slice directory and tasks/ subdirectory already exist. Do NOT mkdir. You are on the slice branch; all work stays here.
**You MUST write the file `{{outputAbsPath}}` before finishing.**
When done, say: "Slice {{sliceId}} planned."


@@ -0,0 +1,85 @@
{{preamble}}
Say exactly: "What do you want to add?" — nothing else. Wait for the user's answer.
## Discussion Phase
After they describe it, your job is to understand the new work deeply enough to create context files that a future planning session can use.
**If the user provides a file path or pastes a large document** (spec, design doc, product plan, chat export), read it fully before asking questions. Use it as the starting point — don't ask them to re-explain what's already in the document. Your questions should fill gaps and resolve ambiguities the document doesn't cover.
**Investigate between question rounds to make your questions smarter.** Before each round of questions, do enough lightweight research that your questions are grounded in reality — not guesses about what exists or what's possible.
- Check library docs (`resolve_library` / `get_library_docs`) when the user mentions tech you need current facts about — capabilities, constraints, API shapes, version-specific behavior
- Do web searches (`search-the-web`) to verify the landscape — what solutions exist, what's changed recently, what's the current best practice. Use `freshness` for recency-sensitive queries, `domain` to target specific sites. Use `fetch_page` to read the full content of promising URLs when snippets aren't enough.
- Scout the codebase (`ls`, `find`, `rg`, or `scout` for broad unfamiliar areas) to understand what already exists, what patterns are established, what constraints current code imposes
Don't go deep — just enough that your next question reflects what's actually true rather than what you assume.
**Use this to actively surface:**
- The biggest technical unknowns — what could fail, what hasn't been proven, what might invalidate the plan
- Integration surfaces — external systems, APIs, libraries, or internal modules this work touches
- What needs to be proven before committing — the things that, if they don't work, mean the plan is wrong
- How the new work relates to existing milestones — overlap, dependencies, prerequisites
- If `.gsd/REQUIREMENTS.md` exists: which unmet Active or Deferred requirements this queued work advances
**Then use ask_user_questions** to dig into gray areas — architecture choices, scope boundaries, tech preferences, what's in vs out. 1-3 questions per round.
If a `GSD Skill Preferences` block is present in system context, use it to decide which skills to load and follow during discuss/planning work, but do not let it override the required discuss flow or artifact requirements.
**Self-regulate:** After about 10-15 questions total (3-5 rounds), or when you feel you have a solid understanding, include a question like:
"I think I have a good picture. Ready to queue this, or are there more things to discuss?"
with options: "Ready to queue (Recommended)", "I have more to discuss"
If the user wants to keep going, keep asking. If they're ready, proceed.
## Existing Milestone Awareness
{{existingMilestonesContext}}
Before writing anything, assess the new work against what already exists:
1. **Dedup check** — Is this already covered (fully or partially) by an existing milestone? If so, tell the user and explain what's already planned. Don't create duplicate milestones.
2. **Extension check** — Should this be added to an existing *pending* (not yet started) milestone rather than creating a new one? If the scope naturally belongs with existing pending work, propose extending that milestone's context instead.
3. **Dependency check** — Does the new work depend on something that's currently in progress or planned? Note the dependency so context files capture it.
4. **Requirement check** — If `.gsd/REQUIREMENTS.md` exists, identify whether this queued work advances unmet Active requirements, promotes Deferred work, or introduces entirely new scope that should also update the requirement contract.
If the new work is already fully covered, say so and stop — don't create anything.
## Scope Assessment
Before writing artifacts, assess whether this is **single-milestone** or **multi-milestone** scope.
**Single milestone** if the work is one coherent body of deliverables that fits in roughly 2-12 slices.
**Multi-milestone** if:
- The work has natural phase boundaries
- Different parts could ship independently on different timelines
- The full scope is too large for one milestone to stay focused
- The document/spec describes what is clearly multiple major efforts
If multi-milestone: propose the split to the user before writing artifacts.
## Sequencing
Determine where the new milestones should go in the overall sequence. Consider dependencies, prerequisites, and independence.
## Output Phase
Once the user is satisfied, in a single pass for **each** new milestone (starting from {{nextId}}):
1. `mkdir -p .gsd/milestones/<ID>/slices`
2. Write `.gsd/milestones/<ID>/<ID>-CONTEXT.md` — read the template at `~/.pi/agent/extensions/gsd/templates/context.md` first. Capture intent, scope, risks, constraints, integration points, and relevant requirements. Mark the status as "Queued — pending auto-mode execution."
Then, after all milestone directories and context files are written:
3. Update `.gsd/PROJECT.md` — add the new milestones to the Milestone Sequence. Keep existing entries exactly as they are. Only add new lines.
4. If `.gsd/REQUIREMENTS.md` exists and the queued work introduces new in-scope capabilities or promotes Deferred items, update it.
5. If discussion produced decisions relevant to existing work, append to `.gsd/DECISIONS.md`.
6. Append to `.gsd/QUEUE.md`.
7. Commit: `docs: queue <milestone list>`
**Do NOT write roadmaps for queued milestones.**
**Do NOT update `.gsd/STATE.md`.**
After writing the files and committing, say exactly: "Queued N milestone(s). Auto-mode will pick them up after current work completes." — nothing else.


@@ -0,0 +1,48 @@
You are executing GSD auto-mode.
## UNIT: Reassess Roadmap — Milestone {{milestoneId}} after {{completedSliceId}}
All relevant context has been preloaded below — the current roadmap, completed slice summary, project state, and decisions are inlined. Start working immediately without re-reading these files.
{{inlinedContext}}
If a `GSD Skill Preferences` block is present in system context, use it to decide which skills to load and follow during reassessment, without relaxing required verification or artifact rules.
Then assess whether the remaining roadmap still makes sense given what was just built.
**Bias strongly toward "roadmap is fine."** Most of the time, the plan is still good. Only rewrite if you have concrete evidence that remaining slices need to change. Don't rewrite for cosmetic reasons, minor optimization, or theoretical improvements.
Ask yourself:
- Did this slice retire the risk it was supposed to? If not, does a remaining slice need to address it?
- Did new risks or unknowns emerge that should change slice ordering?
- Are the boundary contracts in the boundary map still accurate given what was actually built?
- Should any remaining slices be reordered, merged, split, or adjusted based on concrete evidence?
- Did assumptions in remaining slice descriptions turn out wrong?
- If `.gsd/REQUIREMENTS.md` exists: did this slice validate, invalidate, defer, block, or newly surface requirements?
- If `.gsd/REQUIREMENTS.md` exists: does the remaining roadmap still provide credible coverage for Active requirements, including launchability, primary user loop, continuity, and failure visibility where relevant?
### Success-Criterion Coverage Check
Before deciding whether changes are needed, enumerate each success criterion from the roadmap's `## Success Criteria` section and map it to the remaining (unchecked) slice(s) that prove it. Each criterion must have at least one remaining owning slice. If any criterion has no remaining owner after the proposed changes, flag it as a **blocking issue** — do not accept changes that leave a criterion unproved.
Format each criterion as a single line:
- `Criterion text → S02, S03` (covered by at least one remaining slice)
- `Criterion text → ⚠ no remaining owner — BLOCKING` (no slice proves this criterion)
If all criteria have at least one remaining owning slice, the coverage check passes. If any criterion has no remaining owner, resolve it before finalizing the assessment — either by keeping a slice that was going to be removed, adding coverage to another slice, or explaining why the criterion is no longer relevant.
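A worked mix of passing and blocking lines in that format might read (criteria and slice IDs are hypothetical):

```markdown
- Users can sign in with email → S02
- Dashboard renders live data → S03, S05
- Failed syncs are visible to the user → ⚠ no remaining owner — BLOCKING
```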
**If the roadmap is still good:**
Write `{{assessmentAbsPath}}` with a brief confirmation that roadmap coverage still holds after {{completedSliceId}}. If requirements exist, explicitly note whether requirement coverage remains sound.
**If changes are needed:**
1. Rewrite the remaining (unchecked) slices in `{{roadmapPath}}`. Keep completed slices exactly as they are (`[x]`). Update the boundary map for changed slices. Update the proof strategy if risks changed. Update requirement coverage if ownership or scope changed.
2. Write `{{assessmentAbsPath}}` explaining what changed and why — keep it brief and concrete.
3. If `.gsd/REQUIREMENTS.md` exists and requirement ownership or status changed, update it.
4. Commit: `docs({{milestoneId}}): reassess roadmap after {{completedSliceId}}`
**You MUST write the file `{{assessmentAbsPath}}` before finishing.**
When done, say: "Roadmap reassessed."


@@ -0,0 +1,39 @@
You are executing GSD auto-mode.
## UNIT: Replan Slice {{sliceId}} ("{{sliceTitle}}") — Milestone {{milestoneId}}
A completed task reported `blocker_discovered: true`, meaning the current slice plan cannot be executed as-is. Your job is to rewrite the remaining tasks in the slice plan to address the blocker while preserving all completed work.
All relevant context has been preloaded below — the roadmap, current slice plan, the blocker task summary, and decisions are inlined. Start working immediately without re-reading these files.
{{inlinedContext}}
## Hard Constraints
- **Do NOT renumber or remove completed tasks.** All `[x]` tasks and their IDs must remain exactly as they are in the plan.
- **Do NOT change completed task descriptions, estimates, or metadata.** They are historical records.
- **Preserve completed task summaries.** Do not modify any `T0x-SUMMARY.md` files for completed tasks.
- Only modify `[ ]` (incomplete) tasks. You may rewrite, reorder, add, or remove incomplete tasks as needed to address the blocker.
- New tasks must follow the existing ID numbering sequence (e.g., if T01-T03 exist, new tasks start at T04 or continue from the highest existing ID).
## Instructions
1. Read the blocker task summary carefully. Understand exactly what was discovered and why it blocks the current plan.
2. Analyze the remaining `[ ]` tasks in the slice plan. Determine which are still valid, which need modification, and which should be replaced.
3. Write `{{replanAbsPath}}` documenting:
- What blocker was discovered and in which task
- What changed in the plan and why
- Which incomplete tasks were modified, added, or removed
- Any new risks or considerations introduced by the replan
4. Rewrite `{{planPath}}` with the updated slice plan:
- Keep all `[x]` tasks exactly as they were (same IDs, same descriptions, same checkmarks)
- Update the `[ ]` tasks to address the blocker
- Ensure the slice Goal and Demo sections are still achievable with the new tasks, or update them if the blocker fundamentally changes what the slice can deliver
- Update the Files Likely Touched section if the replan changes which files are affected
5. If any incomplete task had a `T0x-PLAN.md`, remove or rewrite it to match the new task description.
6. Commit all changes: `git add -A && git commit -m 'refactor({{sliceId}}): replan after blocker in {{blockerTaskId}}'`
7. Update `.gsd/STATE.md`
**You MUST write `{{replanAbsPath}}` and the updated slice plan before finishing.**
When done, say: "Slice {{sliceId}} replanned."


@@ -0,0 +1,37 @@
You are executing GSD auto-mode.
## UNIT: Research Milestone {{milestoneId}} ("{{milestoneTitle}}")
All relevant context has been preloaded below — start working immediately without re-reading these files.
{{inlinedContext}}
Then research the codebase and relevant technologies:
1. If a `GSD Skill Preferences` block is present in system context, use it to decide which skills to load and follow during research, without relaxing required verification or artifact rules
2. **Skill Discovery ({{skillDiscoveryMode}}):**{{skillDiscoveryInstructions}}
3. Explore relevant code. For small/familiar codebases, use `rg`, `find`, and targeted reads. For large or unfamiliar codebases, use `scout` to build a broad map efficiently before diving in.
4. Use `resolve_library` / `get_library_docs` for unfamiliar libraries
5. Read the template at `~/.pi/agent/extensions/gsd/templates/research.md`
6. If `.gsd/REQUIREMENTS.md` exists, research against it. Identify which Active requirements are table stakes, likely omissions, overbuilt risks, or domain-standard behaviors the user may or may not want.
7. Write `{{outputPath}}` with:
- Summary (2-3 paragraphs, primary recommendation)
- Don't Hand-Roll table (problems with existing solutions)
- Common Pitfalls (what goes wrong, how to avoid)
- Relevant Code (existing files, patterns, integration points)
- Sources
## Strategic Questions to Answer
- What should be proven first?
- What existing patterns should be reused?
- What boundary contracts matter?
- What constraints does the existing codebase impose?
- Are there known failure modes that should shape slice ordering?
- If requirements exist: what table stakes, expected behaviors, continuity expectations, launchability expectations, or failure-visibility expectations are missing, optional, or clearly out of scope?
- Which research findings should become candidate requirements versus remaining advisory only?
**Research is advisory, not auto-binding.** Surface candidate requirements clearly instead of silently expanding scope.
**You MUST write the file `{{outputAbsPath}}` before finishing.**
When done, say: "Milestone {{milestoneId}} researched."

View file

@ -0,0 +1,28 @@
You are executing GSD auto-mode.
## UNIT: Research Slice {{sliceId}} ("{{sliceTitle}}") — Milestone {{milestoneId}}
All relevant context has been preloaded below — start working immediately without re-reading these files.
{{inlinedContext}}
### Dependency Slice Summaries
Pay particular attention to **Forward Intelligence** sections — they contain hard-won knowledge about what's fragile, what assumptions changed, and what to watch out for.
{{dependencySummaries}}
Then research what this slice needs:
0. If `REQUIREMENTS.md` was preloaded above, identify which Active requirements this slice owns or supports. Research should target these requirements — surfacing risks, unknowns, and implementation constraints that could affect whether the slice actually delivers them.
1. If a `GSD Skill Preferences` block is present in system context, use it to decide which skills to load and follow during research, without relaxing required verification or artifact rules
2. **Skill Discovery ({{skillDiscoveryMode}}):**{{skillDiscoveryInstructions}}
3. Explore relevant code for this slice's scope. For targeted exploration, use `rg`, `find`, and reads. For broad or unfamiliar subsystems, use `scout` to map the relevant area first.
4. Use `resolve_library` / `get_library_docs` for unfamiliar libraries
5. Read the template at `~/.pi/agent/extensions/gsd/templates/research.md`
6. Write `{{outputPath}}`
The slice directory already exists at `{{slicePath}}/`. Do NOT mkdir — just write the file.
**You MUST write the file `{{outputAbsPath}}` before finishing.**
When done, say: "Slice {{sliceId}} researched."

View file

@ -0,0 +1,109 @@
You are executing GSD auto-mode.
## UNIT: Run UAT — {{milestoneId}}/{{sliceId}}
All relevant context has been preloaded below. Start working immediately without re-reading these files.
{{inlinedContext}}
If a `GSD Skill Preferences` block is present in system context, use it to decide which skills to load and follow during UAT execution, without relaxing required verification or artifact rules.
---
## UAT Instructions
**UAT file:** `{{uatPath}}`
**UAT type:** `{{uatType}}`
**Result file to write:** `{{uatResultAbsPath}}` (relative: `{{uatResultPath}}`)
### If UAT type is `artifact-driven`
You are the test runner. Execute every check defined in `{{uatPath}}` directly:
- Run shell commands with `bash`
- Run `grep` / `rg` checks against files
- Run `node` / script invocations
- Read files and verify their contents
- Check that expected artifacts exist and have correct structure
For each check, record:
- The check description (from the UAT file)
- The command or action taken
- The actual result observed
- PASS or FAIL verdict
After running all checks, compute the **overall verdict**:
- `PASS` — all checks passed
- `FAIL` — all checks failed
- `PARTIAL` — a mix: some checks passed while others failed or were skipped
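One reasonable reading of the verdict rules, treating `PARTIAL` as any mix of outcomes, can be sketched as follows. The type and function names are illustrative, not part of GSD:

```typescript
// Illustrative sketch: derive the overall UAT verdict from per-check results.
type CheckResult = "PASS" | "FAIL" | "SKIP";

function overallVerdict(results: CheckResult[]): "PASS" | "FAIL" | "PARTIAL" {
  const passed = results.filter(r => r === "PASS").length;
  const failed = results.filter(r => r === "FAIL").length;
  // All checks passed -> PASS; none passed but some failed -> FAIL;
  // anything else (mix of pass/fail/skip) -> PARTIAL.
  if (results.length > 0 && passed === results.length) return "PASS";
  if (passed === 0 && failed > 0) return "FAIL";
  return "PARTIAL";
}
```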
Write `{{uatResultAbsPath}}` with:
```markdown
---
sliceId: {{sliceId}}
uatType: {{uatType}}
verdict: PASS | FAIL | PARTIAL
date: <ISO 8601 timestamp>
---
# UAT Result — {{sliceId}}
## Checks
| Check | Result | Notes |
|-------|--------|-------|
| <check description> | PASS / FAIL | <observed output or reason> |
## Overall Verdict
<PASS / FAIL / PARTIAL> <one-sentence summary>
## Notes
<any additional context, errors encountered, or follow-up items>
```
### If UAT type is NOT `artifact-driven` (type is `{{uatType}}`)
This UAT type requires human execution or live-runtime observation that you cannot perform mechanically. Your role is to surface it clearly for review.
Write `{{uatResultAbsPath}}` with:
```markdown
---
sliceId: {{sliceId}}
uatType: {{uatType}}
verdict: surfaced-for-human-review
date: <ISO 8601 timestamp>
---
# UAT Result — {{sliceId}}
## UAT Type
`{{uatType}}` — requires human execution or live-runtime verification.
## Status
Surfaced for human review. Auto-mode will pause after this unit so the UAT can be performed manually.
## UAT File
See `{{uatPath}}` for the full UAT specification and acceptance criteria.
## Instructions for Human Reviewer
Review `{{uatPath}}`, perform the described UAT steps, then update this file with:
- The actual verdict (PASS / FAIL / PARTIAL)
- Results for each check
- Date completed
Once updated, run `/gsd auto` to resume auto-mode.
```
---
**You MUST write `{{uatResultAbsPath}}` before finishing.**
When done, say: "UAT {{sliceId}} complete."

View file

@ -0,0 +1,220 @@
## GSD — Get Stuff Done
You are **GSD** — a coding agent that gets shit done.
Be direct. Execute the work. Verify results. Fix root causes. Keep momentum. Leave the project in a state where the next agent can immediately understand what happened and continue.
This project uses GSD for structured planning and execution. Artifacts live in `.gsd/`.
If a `GSD Skill Preferences` block is present below this contract, treat it as explicit durable guidance for which skills to use, prefer, or avoid during GSD work. Follow it where it does not conflict with required GSD artifact rules, verification requirements, or higher-priority system/developer instructions.
### Naming Convention
Directories use bare IDs. Files use ID-SUFFIX format:
- Milestone dirs: `M001/`
- Milestone files: `M001-CONTEXT.md`, `M001-ROADMAP.md`, `M001-RESEARCH.md`
- Slice dirs: `S01/`
- Slice files: `S01-PLAN.md`, `S01-RESEARCH.md`, `S01-SUMMARY.md`, `S01-UAT.md`
- Task files: `T01-PLAN.md`, `T01-SUMMARY.md`
Titles live inside file content (headings, frontmatter), not in file or directory names.
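The ID-SUFFIX convention above is regular enough to validate mechanically. A minimal sketch (helper name and regex are assumptions, not the real GSD parser):

```typescript
// Illustrative: validate/parse artifact filenames like "M001-ROADMAP.md",
// "S01-PLAN.md", or "T01-SUMMARY.md" per the ID-SUFFIX convention.
const ARTIFACT_NAME = /^(M\d{3}|S\d{2}|T\d{2})-([A-Z]+)\.md$/;

function parseArtifactName(file: string): { id: string; suffix: string } | null {
  const m = ARTIFACT_NAME.exec(file);
  return m ? { id: m[1]!, suffix: m[2]! } : null;
}
```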
### Directory Structure
```
.gsd/
PROJECT.md (living doc — what the project is right now)
DECISIONS.md (append-only register of architectural and pattern decisions)
QUEUE.md (append-only log of queued milestones via /gsd queue)
STATE.md
milestones/
M001/
M001-CONTEXT.md
M001-RESEARCH.md
M001-ROADMAP.md
M001-SUMMARY.md
slices/
S01/
S01-CONTEXT.md (optional)
S01-RESEARCH.md (optional)
S01-PLAN.md
S01-SUMMARY.md
S01-UAT.md
tasks/
T01-PLAN.md
T01-SUMMARY.md
```
### Conventions
- **PROJECT.md** is a living document describing what the project is right now — current state only, updated at slice completion when stale
- **DECISIONS.md** is an append-only register of architectural and pattern decisions — read it during planning/research, append to it during execution when a meaningful decision is made
- **Milestones** are major project phases (M001, M002, ...)
- **Slices** are demoable vertical increments (S01, S02, ...) ordered by risk. After each slice completes, the roadmap is reassessed before the next slice begins.
- **Tasks** are single-context-window units of work (T01, T02, ...)
- Checkboxes in roadmap and plan files track completion (`[ ]``[x]`)
- Each slice gets its own git branch: `gsd/M001/S01`
- Slices are squash-merged to main when complete
- Summaries compress prior work — read them instead of re-reading all task details
- `STATE.md` is the quick-glance status file — keep it updated after changes
### Artifact Templates
Templates showing the expected format for each artifact type are in:
`~/.pi/agent/extensions/gsd/templates/`
**Always read the relevant template before writing an artifact** to match the expected structure exactly. The parsers that read these files depend on specific formatting:
- Roadmap slices: `- [ ] **S01: Title** \`risk:level\` \`depends:[]\``
- Plan tasks: `- [ ] **T01: Title** \`est:estimate\``
- Summaries use YAML frontmatter
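A roadmap slice line in the format above could be parsed roughly like this. The regex and field names are a sketch of the expected shape, not the actual GSD parser:

```typescript
// Illustrative: parse one roadmap slice line of the form
//   - [ ] **S01: Title** `risk:low` `depends:[]`
const SLICE_LINE = /^- \[([ x])\] \*\*(S\d{2}): (.+?)\*\* `risk:(\w+)` `depends:\[(.*?)\]`$/;

function parseSliceLine(line: string) {
  const m = SLICE_LINE.exec(line.trim());
  if (!m) return null;
  return {
    done: m[1] === "x",
    id: m[2]!,
    title: m[3]!,
    risk: m[4]!,
    depends: m[5] ? m[5].split(",").map(s => s.trim()) : [],
  };
}
```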
### Activity Logs
Auto-mode saves session logs to `.gsd/activity/` before each context wipe.
Files are sequentially numbered: `001-execute-task-M001-S01-T01.jsonl`, etc.
These are raw JSONL debug artifacts — used automatically for retry diagnostics.
`.gsd/activity/` is automatically added to `.gitignore` during bootstrap.
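The sequential numbering scheme can be sketched as below; the helper name and unit-label argument are hypothetical, shown only to illustrate the filename pattern:

```typescript
// Illustrative: compute the next activity-log filename, following the
// "001-execute-task-M001-S01-T01.jsonl" pattern described above.
function nextActivityLogName(existing: string[], unitLabel: string): string {
  const max = existing
    .map(f => parseInt(f.slice(0, 3), 10)) // leading 3-digit sequence
    .filter(n => !Number.isNaN(n))
    .reduce((a, b) => Math.max(a, b), 0);
  const seq = String(max + 1).padStart(3, "0");
  return `${seq}-${unitLabel}.jsonl`;
}
```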
### Commands
- `/gsd` — contextual wizard
- `/gsd auto` — auto-execute (fresh context per task)
- `/gsd stop` — stop auto-mode
- `/gsd status` — progress dashboard overlay
- `/gsd queue` — queue future milestones (safe while auto-mode is running)
- `Ctrl+Alt+G` — toggle dashboard overlay
### Tool-routing hierarchy
Use the lightest sufficient tool first.
- Known file path, need contents -> `read`
- Search repo text or symbols -> `bash` with `rg`
- Search by filename or path -> `bash` with `find` or `rg --files`
- Precise existing-file change -> `read` then `edit`
- New file or full rewrite -> `write`
- Broad unfamiliar subsystem mapping -> `subagent` with `scout`
- Library, package, or framework truth -> `resolve_library` then `get_library_docs`
- Current external facts -> `search-the-web`, then `fetch_page` for full page content
- Long-running or indefinite shell commands (servers, watchers, builds) -> `bg_shell` with `start` + `wait_for_ready`
- Background process status check -> `bg_shell` with `digest` (not `output`)
- Background process debugging -> `bg_shell` with `highlights`, then `output` with `filter`
- UI behavior verification -> browser tools
- Secrets -> `secure_env_collect`
### Web research vs browser execution
Treat these as different jobs.
- Use `search-the-web` + `fetch_page` for current external knowledge: release notes, product changes, pricing, news, public docs, and fast-moving ecosystem facts.
- Use browser tools for interactive execution and verification: local app flows, reproducing browser bugs, DOM behavior, navigation, auth flows, and user-visible UI outcomes.
- Do not use browser tools as a substitute for web research.
- Do not use web search as a substitute for exercising a real browser flow.
### Verification and definition of done
Verify according to task type.
- Bug fix -> rerun the exact repro
- Script or CLI fix -> rerun the exact command
- UI or web fix -> verify in the browser and check console or network logs when relevant
- Env or secrets fix -> rerun the blocked workflow after applying secrets
- Refactor -> run tests or build plus a targeted smoke check
- File delete, move, or rename -> confirm filesystem state
- Docs or config change -> verify referenced paths, commands, and settings match reality
For non-trivial backend, async, stateful, integration, or UI work, verification must cover both behavior and observability.
- Verify the feature works
- Verify the failure path or diagnostic surface is inspectable
- Verify the chosen status/log/error surface exposes enough information for a future agent to localize problems quickly
If a command or workflow fails, continue the loop: inspect the error, fix it, rerun it, and repeat until it passes or a real blocker requires user input.
### Agent-First Observability
GSD is optimized for agent autonomy. Build systems so a future agent can inspect current state, localize failures, and continue work without relying on human intuition.
Prefer:
- Structured, machine-readable logs or events over ad hoc prose logs
- Stable error types/codes and preserved causal context over vague failures
- Explicit state transitions and status inspection surfaces over implicit behavior
- Durable diagnostics that survive the current run when they materially improve recovery
- High-signal summaries and status endpoints over log spam
For relevant work, plan and implement:
- Health/readiness/status surfaces for services, jobs, pipelines, and long-running work
- Observable failure state: last error, phase, timestamp, identifiers, retry count, or equivalent
- Deterministic verification of both happy path and at least one diagnostic/failure-path signal
- Safe redaction boundaries: never log secrets, tokens, or sensitive raw payloads unnecessarily
Temporary instrumentation is allowed during debugging. Remove noisy one-off instrumentation before finishing unless it provides durable diagnostic value.
### Root-cause-first debugging
- Fix the root cause, not just the visible symptom, unless the user explicitly wants a temporary workaround.
- Prefer changes that remove the failure mode over changes that merely mask it.
- When applying a temporary mitigation, label it clearly and preserve a path to the real fix.
## Situational Playbooks
### Background processes
Use `bg_shell` instead of `bash` for any command that runs indefinitely or takes a long time.
**Starting processes:**
- Set `type:'server'` and `ready_port:<port>` for dev servers so readiness detection is automatic.
- Set `group:'<name>'` on related processes (e.g. frontend + backend) to manage them together.
- Use `ready_pattern:'<regex>'` for processes with non-standard readiness signals.
- The tool auto-classifies commands as server/build/test/watcher/generic and applies smart defaults.
**After starting — use `wait_for_ready` instead of polling:**
- `wait_for_ready` blocks until the process signals readiness (pattern match or port open) or times out.
- This replaces the old pattern of `start``sleep``output` → check → repeat. One tool call instead of many.
**Checking status — use `digest` instead of `output`:**
- `digest` returns a structured ~30-token summary (status, ports, URLs, error count, change summary) instead of ~2000 tokens of raw output. Use this by default.
- `highlights` returns only significant lines (errors, URLs, results) — typically 5-15 lines instead of hundreds.
- `output` returns raw incremental lines — use only when debugging and you need full text. Add `filter:'error|warning'` to narrow results.
- Token budget hierarchy: `digest` (~30 tokens) < `highlights` (~100 tokens) < `output` (~2000 tokens). Always start with the lightest.
**Lifecycle awareness:**
- Process crashes and errors are automatically surfaced as alerts at the start of your next turn — you don't need to poll for failures.
- Use `group_status` to check health of related processes as a unit.
- Use `restart` to kill and relaunch with the same config — preserves restart count.
**Interactive processes:**
- Use `send_and_wait` for interactive CLIs: send input and wait for an expected output pattern. Replaces manual `send``sleep``output` polling.
**Cleanup:**
- Kill processes when done with them — do not leave orphans.
- Use `list` to see all running background processes.
### Web behavior
When the task involves frontend behavior, DOM interactions, navigation, or user flows, verify with browser tools against a running app before marking the work complete.
Use browser tools with this operating order unless there is a clear reason not to:
1. Cheap discovery first — use `browser_find` or `browser_snapshot_refs` to locate likely targets
2. Deterministic targeting — prefer refs or explicit selectors over coordinates
3. Batch obvious sequences — if the next 2-5 browser actions are clear and low-risk, use `browser_batch`
4. Assert outcomes explicitly — prefer `browser_assert` over inferring success from prose summaries
5. Diff ambiguous outcomes — use `browser_diff` when the effect of an action is unclear
6. Inspect diagnostics only when needed — use console/network/dialog logs when assertions or diffs suggest failure
7. Escalate inspection gradually — use `browser_get_accessibility_tree` only when targeted discovery is insufficient; use `browser_get_page_source` and `browser_evaluate` as escape hatches, not defaults
8. Use screenshots as supporting evidence — do not default to screenshot-first browsing when semantic tools are sufficient
For browser or UI work, "verified" means the flow was exercised and the expected outcome was checked explicitly with `browser_assert` or an equally structured browser signal whenever possible.
For browser failures, debug in this order:
1. inspect the failing assertion or explicit success signal
2. inspect `browser_diff`
3. inspect recent console/network/dialog diagnostics
4. inspect targeted element or accessibility state
5. only then escalate to broader page inspection
Retry only with a new hypothesis. Do not thrash.

View file

@ -0,0 +1,487 @@
/**
* GSD Session Forensics: deep analysis of pi session JSONL files
*
* Pi's SessionManager persists every entry to disk via appendFileSync as it
* happens. When a crash occurs, the session JSONL on disk contains every tool
* call, every assistant response, and every error up to the moment of death.
*
* This module reads that file and reconstructs a structured execution trace
* that tells the recovering agent exactly what happened, what changed, and
* where to resume.
*
* Used by:
* - Crash recovery (reading the surviving pi session file)
* - Stuck-retry diagnostics (reading GSD activity log copies)
*
* Entry format (verified against real pi session files):
* - Tool calls: { type: "toolCall", name: "bash", id: "toolu_...", arguments: { command: "..." } }
* - Tool results: { role: "toolResult", toolCallId: "toolu_...", toolName: "bash", isError: bool, content: ... }
*/
import { readFileSync, readdirSync, existsSync } from "node:fs";
import { execSync } from "node:child_process";
import { basename, join } from "node:path";
// ─── Types ────────────────────────────────────────────────────────────────────
export interface ToolCall {
name: string;
input: Record<string, unknown>;
result?: string;
isError: boolean;
}
export interface ExecutionTrace {
/** Ordered list of tool calls with results */
toolCalls: ToolCall[];
/** Files written or edited (deduplicated, ordered by first occurrence) */
filesWritten: string[];
/** Files read (deduplicated) */
filesRead: string[];
/** Shell commands executed with exit status */
commandsRun: { command: string; failed: boolean }[];
/** Tool errors encountered */
errors: string[];
/** The agent's last reasoning / text output before crash */
lastReasoning: string;
/** Total tool calls completed (have matching results) */
toolCallCount: number;
}
export interface RecoveryBriefing {
/** What the agent was doing */
unitType: string;
unitId: string;
/** Structured execution trace */
trace: ExecutionTrace;
/** Git state: files modified/added/deleted since unit started */
gitChanges: string | null;
/** Formatted prompt section ready for injection */
prompt: string;
}
// ─── JSONL Parsing ────────────────────────────────────────────────────────────
function parseJSONL(raw: string): unknown[] {
return raw.trim().split("\n").map(line => {
try { return JSON.parse(line); }
catch { return null; }
}).filter(Boolean) as unknown[];
}
/**
* Find the entries belonging to the last session in a JSONL file.
* Auto-mode creates a new session per unit, so the last session header
* marks the start of the crashed unit's entries.
*/
function extractLastSession(entries: unknown[]): unknown[] {
let lastSessionIdx = -1;
for (let i = entries.length - 1; i >= 0; i--) {
const entry = entries[i] as Record<string, unknown>;
if (entry.type === "session") {
lastSessionIdx = i;
break;
}
}
return lastSessionIdx >= 0 ? entries.slice(lastSessionIdx) : entries;
}
// ─── Trace Extraction ─────────────────────────────────────────────────────────
/**
* Extract a structured execution trace from raw session entries.
* Works with both pi session JSONL and GSD activity log JSONL.
*/
export function extractTrace(entries: unknown[]): ExecutionTrace {
const toolCalls: ToolCall[] = [];
const filesWritten: string[] = [];
const filesRead: string[] = [];
const commandsRun: { command: string; failed: boolean }[] = [];
const errors: string[] = [];
let lastReasoning = "";
// Track pending tool calls by ID for matching with results
const pendingTools = new Map<string, { name: string; input: Record<string, unknown> }>();
const seenWritten = new Set<string>();
const seenRead = new Set<string>();
for (const raw of entries) {
const entry = raw as Record<string, unknown>;
if (entry.type !== "message" || !entry.message) continue;
const msg = entry.message as Record<string, unknown>;
// ── Assistant messages: tool calls + reasoning ──
if (msg.role === "assistant" && Array.isArray(msg.content)) {
for (const part of msg.content as Record<string, unknown>[]) {
// Text reasoning
if (part.type === "text" && part.text) {
lastReasoning = String(part.text);
}
// Tool call initiation
// Pi format: { type: "toolCall", name: "bash", id: "toolu_...", arguments: { command: "..." } }
if (part.type === "toolCall") {
const name = String(part.name || "unknown").toLowerCase();
const input = (part.arguments || part.input || {}) as Record<string, unknown>;
const id = String(part.id || "");
if (id) {
pendingTools.set(id, { name, input });
}
// Track file operations
const path = input.path ? String(input.path) : null;
if (path) {
if (name === "write" || name === "edit") {
if (!seenWritten.has(path)) { seenWritten.add(path); filesWritten.push(path); }
} else if (name === "read") {
if (!seenRead.has(path)) { seenRead.add(path); filesRead.push(path); }
}
}
// Track shell commands
if ((name === "bash" || name === "bg_shell") && input.command) {
commandsRun.push({ command: String(input.command), failed: false });
}
}
}
}
// ── Tool results: match with pending calls ──
// Pi format: { role: "toolResult", toolCallId: "toolu_...", toolName: "bash", isError: bool, content: ... }
if (msg.role === "toolResult") {
const id = String(msg.toolCallId || "");
const isError = !!msg.isError;
const resultText = extractResultText(msg);
const pending = pendingTools.get(id);
if (pending) {
toolCalls.push({
name: pending.name,
input: redactInput(pending.name, pending.input),
result: resultText.slice(0, 500),
isError,
});
pendingTools.delete(id);
// Mark failed commands
if (isError && (pending.name === "bash" || pending.name === "bg_shell")) {
const lastCmd = findLast(commandsRun, c => c.command === String(pending.input.command));
if (lastCmd) lastCmd.failed = true;
}
}
if (isError && resultText) {
errors.push(resultText.slice(0, 300));
}
}
}
// Flush any pending tool calls that never got results (crash mid-tool)
for (const [, pending] of pendingTools) {
toolCalls.push({
name: pending.name,
input: redactInput(pending.name, pending.input),
isError: false,
});
}
return {
toolCalls,
filesWritten,
filesRead,
commandsRun,
errors,
lastReasoning: lastReasoning.slice(-600).trim(),
toolCallCount: toolCalls.length,
};
}
// ─── Git State ────────────────────────────────────────────────────────────────
function getGitChanges(basePath: string): string | null {
try {
const status = execSync("git status --porcelain", { cwd: basePath, stdio: "pipe" }).toString().trim();
if (!status) return null;
const diffStat = execSync("git diff --stat HEAD 2>/dev/null || true", { cwd: basePath, stdio: "pipe" }).toString().trim();
const stagedStat = execSync("git diff --stat --cached HEAD 2>/dev/null || true", { cwd: basePath, stdio: "pipe" }).toString().trim();
const parts: string[] = [];
if (status) parts.push(`Status:\n${status}`);
if (stagedStat) parts.push(`Staged:\n${stagedStat}`);
if (diffStat) parts.push(`Unstaged:\n${diffStat}`);
return parts.join("\n\n");
} catch {
return null;
}
}
// ─── Recovery Briefing ────────────────────────────────────────────────────────
/**
* Synthesize a full crash recovery briefing.
*
* Reads the surviving pi session file (or falls back to the last GSD activity
* log), deep-parses it into an execution trace, combines with git state, and
* formats a structured prompt section ready for injection.
*/
export function synthesizeCrashRecovery(
basePath: string,
unitType: string,
unitId: string,
sessionFile?: string,
activityDir?: string,
): RecoveryBriefing | null {
try {
let trace: ExecutionTrace | null = null;
// Primary source: surviving pi session file
if (sessionFile && existsSync(sessionFile)) {
const raw = readFileSync(sessionFile, "utf-8");
const allEntries = parseJSONL(raw);
const sessionEntries = extractLastSession(allEntries);
trace = extractTrace(sessionEntries);
}
// Fallback: last GSD activity log
if (!trace || trace.toolCallCount === 0) {
const fallbackTrace = readLastActivityLog(activityDir);
if (fallbackTrace && fallbackTrace.toolCallCount > 0) {
trace = fallbackTrace;
}
}
// If no trace from either source, still provide git state
if (!trace) {
trace = {
toolCalls: [], filesWritten: [], filesRead: [],
commandsRun: [], errors: [], lastReasoning: "", toolCallCount: 0,
};
}
const gitChanges = getGitChanges(basePath);
const prompt = formatRecoveryPrompt(unitType, unitId, trace, gitChanges);
return { unitType, unitId, trace, gitChanges, prompt };
} catch {
return null;
}
}
/**
* Deep diagnostic from any JSONL source (activity log or session file).
* Replaces the old shallow getLastActivityDiagnostic().
*/
export function getDeepDiagnostic(basePath: string): string | null {
const activityDir = join(basePath, ".gsd", "activity");
const trace = readLastActivityLog(activityDir);
if (!trace || trace.toolCallCount === 0) return null;
return formatTraceSummary(trace);
}
// ─── Formatting ───────────────────────────────────────────────────────────────
function formatRecoveryPrompt(
unitType: string,
unitId: string,
trace: ExecutionTrace,
gitChanges: string | null,
): string {
const sections: string[] = [];
sections.push(
"## Crash Recovery Briefing",
"",
`You are resuming \`${unitType}\` for \`${unitId}\` after a crash.`,
`The previous session completed **${trace.toolCallCount} tool calls** before dying.`,
"Use this briefing to pick up exactly where it left off. Do NOT redo completed work.",
);
// Tool call trace — compact summary
if (trace.toolCalls.length > 0) {
sections.push("", "### Completed Tool Calls");
const summary = compressToolCallTrace(trace.toolCalls);
sections.push(summary);
}
// Files written
if (trace.filesWritten.length > 0) {
sections.push(
"", "### Files Already Written/Edited",
...trace.filesWritten.map(f => `- \`${f}\``),
"",
"These files exist on disk from the previous run. Verify they look correct before continuing.",
);
}
// Commands run
const significantCommands = trace.commandsRun.filter(c =>
!c.command.startsWith("git ") || c.failed,
);
if (significantCommands.length > 0) {
sections.push("", "### Commands Already Run");
for (const c of significantCommands.slice(-10)) {
const status = c.failed ? " ❌" : " ✓";
sections.push(`- \`${truncate(c.command, 120)}\`${status}`);
}
}
// Errors
if (trace.errors.length > 0) {
sections.push(
"", "### Errors Before Crash",
...trace.errors.slice(-3).map(e => `- ${truncate(e, 200)}`),
);
}
// Git state
if (gitChanges) {
sections.push(
"", "### Current Git State (filesystem truth)",
"```", gitChanges, "```",
);
}
// Last reasoning
if (trace.lastReasoning) {
sections.push(
"", "### Last Agent Reasoning Before Crash",
`> ${trace.lastReasoning.replace(/\n/g, "\n> ")}`,
);
}
sections.push(
"",
"### Resume Instructions",
"1. Check the task plan for remaining work",
"2. Verify files listed above exist and look correct on disk",
"3. Continue from where the previous session left off",
"4. Do NOT re-read files or re-run commands that already succeeded above",
);
return sections.join("\n");
}
/**
* Compress a tool call trace into a readable summary.
* Groups consecutive reads, shows write/edit/bash individually.
*/
function compressToolCallTrace(calls: ToolCall[]): string {
const lines: string[] = [];
let readBatch: string[] = [];
function flushReads() {
if (readBatch.length === 0) return;
if (readBatch.length <= 2) {
for (const path of readBatch) lines.push(` read \`${path}\``);
} else {
lines.push(` read ${readBatch.length} files: ${readBatch.map(p => `\`${basename(p)}\``).join(", ")}`);
}
readBatch = [];
}
for (let i = 0; i < calls.length; i++) {
const call = calls[i]!;
const num = i + 1;
if (call.name === "read" && call.input.path) {
readBatch.push(String(call.input.path));
continue;
}
flushReads();
const err = call.isError ? " ❌" : "";
if (call.name === "write" || call.name === "edit") {
lines.push(`${num}. ${call.name} \`${call.input.path || "?"}\`${err}`);
} else if (call.name === "bash" || call.name === "bg_shell") {
const cmd = truncate(String(call.input.command || ""), 80);
lines.push(`${num}. ${call.name}: \`${cmd}\`${err}`);
} else {
lines.push(`${num}. ${call.name}${err}`);
}
}
flushReads();
return lines.join("\n");
}
function formatTraceSummary(trace: ExecutionTrace): string {
const parts: string[] = [];
parts.push(`Tool calls completed: ${trace.toolCallCount}`);
if (trace.filesWritten.length > 0) {
parts.push(`Files written: ${trace.filesWritten.map(f => `\`${f}\``).join(", ")}`);
}
if (trace.commandsRun.length > 0) {
const cmds = trace.commandsRun.slice(-5).map(c => `\`${truncate(c.command, 80)}\`${c.failed ? " ❌" : ""}`);
parts.push(`Commands run: ${cmds.join(", ")}`);
}
if (trace.errors.length > 0) {
parts.push(`Errors: ${trace.errors.slice(-3).join("; ")}`);
}
if (trace.lastReasoning) {
parts.push(`Last reasoning: "${trace.lastReasoning}"`);
}
return parts.join("\n");
}
// ─── Helpers ──────────────────────────────────────────────────────────────────
function readLastActivityLog(activityDir?: string): ExecutionTrace | null {
if (!activityDir) return null;
try {
if (!existsSync(activityDir)) return null;
const files = readdirSync(activityDir).filter(f => f.endsWith(".jsonl")).sort();
if (files.length === 0) return null;
const lastFile = files[files.length - 1]!;
const raw = readFileSync(join(activityDir, lastFile), "utf-8");
return extractTrace(parseJSONL(raw));
} catch {
return null;
}
}
function extractResultText(msg: Record<string, unknown>): string {
const content = msg.content;
if (typeof content === "string") return content;
if (Array.isArray(content)) {
return content
.filter((p: Record<string, unknown>) => p.type === "text")
.map((p: Record<string, unknown>) => String(p.text || ""))
.join(" ");
}
return "";
}
/**
* Redact sensitive fields from tool inputs.
* Keep paths and commands, drop large content bodies.
*/
function redactInput(name: string, input: Record<string, unknown>): Record<string, unknown> {
const safe: Record<string, unknown> = {};
for (const [key, value] of Object.entries(input)) {
if (key === "content" || key === "oldText" || key === "newText") {
safe[key] = typeof value === "string" ? truncate(value, 100) : "[redacted]";
} else {
safe[key] = value;
}
}
return safe;
}
/** Array.findLast polyfill for older Node versions */
function findLast<T>(arr: T[], predicate: (item: T) => boolean): T | undefined {
for (let i = arr.length - 1; i >= 0; i--) {
if (predicate(arr[i]!)) return arr[i];
}
return undefined;
}
function truncate(s: string, max: number): string {
return s.length > max ? s.slice(0, max) + "…" : s;
}

View file

@ -0,0 +1,137 @@
/**
* GSD Skill Discovery
*
* Detects skills installed during auto-mode by comparing the current
* skills directory against a snapshot taken at auto-mode start.
*
* New skills are injected into the system prompt via before_agent_start,
* making them visible to all subsequent units without requiring a reload.
*/
import { existsSync, readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";
import { getAgentDir } from "@mariozechner/pi-coding-agent";
const SKILLS_DIR = join(getAgentDir(), "skills");
export interface DiscoveredSkill {
name: string;
description: string;
location: string;
}
/** Snapshot of skill names at auto-mode start */
let baselineSkills: Set<string> | null = null;
/**
* Snapshot the current skills directory. Call at auto-mode start.
*/
export function snapshotSkills(): void {
baselineSkills = new Set(listSkillDirs());
}
/**
* Clear the snapshot. Call when auto-mode stops.
*/
export function clearSkillSnapshot(): void {
baselineSkills = null;
}
/**
* Check if a snapshot is active (auto-mode is running with discovery).
*/
export function hasSkillSnapshot(): boolean {
return baselineSkills !== null;
}
/**
* Detect skills installed since the snapshot was taken.
* Returns skill metadata for any new skills found.
*/
export function detectNewSkills(): DiscoveredSkill[] {
if (!baselineSkills) return [];
const current = listSkillDirs();
const newSkills: DiscoveredSkill[] = [];
for (const dir of current) {
if (baselineSkills.has(dir)) continue;
const skillMdPath = join(SKILLS_DIR, dir, "SKILL.md");
if (!existsSync(skillMdPath)) continue;
const meta = parseSkillFrontmatter(skillMdPath);
if (meta) {
newSkills.push({
name: meta.name || dir,
description: meta.description || `Skill: ${dir}`,
location: skillMdPath,
});
}
}
return newSkills;
}
/**
* Format discovered skills as an XML block matching pi's <available_skills> format.
* This can be appended to the system prompt so the LLM sees them naturally.
*/
export function formatSkillsXml(skills: DiscoveredSkill[]): string {
if (skills.length === 0) return "";
const entries = skills.map(s => ` <skill>
<name>${escapeXml(s.name)}</name>
<description>${escapeXml(s.description)}</description>
<location>${escapeXml(s.location)}</location>
</skill>`).join("\n");
return `\n<newly_discovered_skills>
The following skills were installed during this auto-mode session.
Use the read tool to load a skill's file when the task matches its description.
${entries}
</newly_discovered_skills>`;
}
// ─── Internals ────────────────────────────────────────────────────────────────
function listSkillDirs(): string[] {
if (!existsSync(SKILLS_DIR)) return [];
try {
return readdirSync(SKILLS_DIR, { withFileTypes: true })
.filter(d => d.isDirectory())
.map(d => d.name);
} catch {
return [];
}
}
function parseSkillFrontmatter(path: string): { name?: string; description?: string } | null {
try {
const content = readFileSync(path, "utf-8");
const match = content.match(/^---\n([\s\S]*?)\n---/);
if (!match) return null;
const fm = match[1];
const result: { name?: string; description?: string } = {};
const nameMatch = fm.match(/^name:\s*(.+)$/m);
if (nameMatch) result.name = nameMatch[1].trim();
const descMatch = fm.match(/^description:\s*(.+)$/m);
if (descMatch) result.description = descMatch[1].trim();
return result;
} catch {
return null;
}
}
function escapeXml(text: string): string {
return text
.replace(/&/g, "&amp;")
.replace(/</g, "&lt;")
.replace(/>/g, "&gt;")
.replace(/"/g, "&quot;");
}
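The frontmatter regexes above can be exercised in isolation against an in-memory string. This is a minimal sketch, not a test of the module itself; the sample skill content is hypothetical:

```typescript
// Standalone sketch of the SKILL.md frontmatter parsing used above,
// applied to an in-memory string instead of a file on disk.
const sample = [
  "---",
  "name: web-scraper",
  "description: Fetch and parse HTML pages",
  "---",
  "# Skill body...",
].join("\n");

function parseFrontmatter(content: string): { name?: string; description?: string } | null {
  // Capture everything between the opening and closing --- delimiters.
  const match = content.match(/^---\n([\s\S]*?)\n---/);
  if (!match) return null;
  const fm = match[1];
  const result: { name?: string; description?: string } = {};
  const nameMatch = fm.match(/^name:\s*(.+)$/m);
  if (nameMatch) result.name = nameMatch[1].trim();
  const descMatch = fm.match(/^description:\s*(.+)$/m);
  if (descMatch) result.description = descMatch[1].trim();
  return result;
}

console.log(parseFrontmatter(sample));
// { name: "web-scraper", description: "Fetch and parse HTML pages" }
```

Note this line-oriented matching deliberately avoids a YAML dependency; it only handles single-line `name:` and `description:` fields, which is all the discovery path needs.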

@@ -0,0 +1,439 @@
// GSD Extension — State Derivation
// Reads roadmap + plan files to determine current position.
// Pure TypeScript, zero Pi dependencies.
import type {
GSDState,
ActiveRef,
Roadmap,
RoadmapSliceEntry,
SlicePlan,
MilestoneRegistryEntry,
} from './types.ts';
import {
parseRoadmap,
parsePlan,
parseSummary,
loadFile,
parseRequirementCounts,
parseContextDependsOn,
} from './files.ts';
import {
milestonesDir,
resolveMilestonePath,
resolveMilestoneFile,
resolveSlicePath,
resolveSliceFile,
resolveTaskFile,
resolveGsdRootFile,
} from './paths.ts';
import { getActiveSliceBranch } from './worktree.ts';
import { readdirSync } from 'fs';
import { join } from 'path';
// ─── Query Functions ───────────────────────────────────────────────────────
/**
* Check if all tasks in a slice plan are done.
*/
export function isSliceComplete(plan: SlicePlan): boolean {
return plan.tasks.length > 0 && plan.tasks.every(t => t.done);
}
/**
* Check if all slices in a roadmap are done.
*/
export function isMilestoneComplete(roadmap: Roadmap): boolean {
return roadmap.slices.length > 0 && roadmap.slices.every(s => s.done);
}
// ─── State Derivation ──────────────────────────────────────────────────────
/**
* Find all milestone directory IDs by scanning .gsd/milestones/.
* Extracts the ID prefix (e.g. "M001") from directory names like "M001-PAYMENT-INTEGRATIONS".
*/
function findMilestoneIds(basePath: string): string[] {
const dir = milestonesDir(basePath);
try {
return readdirSync(dir, { withFileTypes: true })
.filter(d => d.isDirectory())
.map(d => {
const match = d.name.match(/^(M\d+)/);
return match ? match[1] : d.name;
})
.sort();
} catch {
return [];
}
}
/**
* Returns the ID of the first incomplete milestone, or null if all are complete.
*/
export async function getActiveMilestoneId(basePath: string): Promise<string | null> {
const milestoneIds = findMilestoneIds(basePath);
for (const mid of milestoneIds) {
const roadmapFile = resolveMilestoneFile(basePath, mid, "ROADMAP");
const content = roadmapFile ? await loadFile(roadmapFile) : null;
if (!content) return mid; // No roadmap yet — milestone is incomplete
const roadmap = parseRoadmap(content);
if (!isMilestoneComplete(roadmap)) return mid;
}
return null;
}
/**
* Reconstruct GSD state from files on disk.
* This is the source of truth; STATE.md is just a cache of this output.
*/
export async function deriveState(basePath: string): Promise<GSDState> {
const milestoneIds = findMilestoneIds(basePath);
const requirements = parseRequirementCounts(await loadFile(resolveGsdRootFile(basePath, "REQUIREMENTS")));
if (milestoneIds.length === 0) {
return {
activeMilestone: null,
activeSlice: null,
activeTask: null,
phase: 'pre-planning',
recentDecisions: [],
blockers: [],
nextAction: 'No milestones found. Run /gsd to create one.',
registry: [],
requirements,
progress: {
milestones: { done: 0, total: 0 },
},
};
}
// Pre-compute the set of complete milestone IDs for dependency checking.
// This allows forward references (M002 depending on M003) to resolve correctly.
const completeMilestoneIds = new Set<string>();
for (const mid of milestoneIds) {
const rf = resolveMilestoneFile(basePath, mid, "ROADMAP");
const rc = rf ? await loadFile(rf) : null;
if (!rc) continue;
const rmap = parseRoadmap(rc);
if (!isMilestoneComplete(rmap)) continue;
const sf = resolveMilestoneFile(basePath, mid, "SUMMARY");
if (sf) completeMilestoneIds.add(mid);
}
// Build the registry and locate the active milestone in a single pass.
const registry: MilestoneRegistryEntry[] = [];
let activeMilestone: ActiveRef | null = null;
let activeRoadmap: Roadmap | null = null;
let activeMilestoneFound = false;
for (const mid of milestoneIds) {
const roadmapFile = resolveMilestoneFile(basePath, mid, "ROADMAP");
const content = roadmapFile ? await loadFile(roadmapFile) : null;
if (!content) {
// No roadmap yet — treat as incomplete/active
if (!activeMilestoneFound) {
activeMilestone = { id: mid, title: mid };
activeMilestoneFound = true;
registry.push({ id: mid, title: mid, status: 'active' });
} else {
registry.push({ id: mid, title: mid, status: 'pending' });
}
continue;
}
const roadmap = parseRoadmap(content);
const title = roadmap.title.replace(/^M\d+[^:]*:\s*/, '');
const complete = isMilestoneComplete(roadmap);
if (complete) {
// All slices done — check if milestone summary exists
const summaryFile = resolveMilestoneFile(basePath, mid, "SUMMARY");
if (!summaryFile && !activeMilestoneFound) {
// All slices complete but no summary written yet → completing-milestone
activeMilestone = { id: mid, title };
activeRoadmap = roadmap;
activeMilestoneFound = true;
registry.push({ id: mid, title, status: 'active' });
} else {
registry.push({ id: mid, title, status: 'complete' });
}
} else if (!activeMilestoneFound) {
// Check milestone-level dependencies before promoting to active
const contextFile = resolveMilestoneFile(basePath, mid, "CONTEXT");
const contextContent = contextFile ? await loadFile(contextFile) : null;
const deps = parseContextDependsOn(contextContent);
const depsUnmet = deps.some(dep => !completeMilestoneIds.has(dep));
if (depsUnmet) {
registry.push({ id: mid, title, status: 'pending', dependsOn: deps });
// Do NOT set activeMilestoneFound — let the loop continue to the next milestone
} else {
activeMilestone = { id: mid, title };
activeRoadmap = roadmap;
activeMilestoneFound = true;
registry.push({ id: mid, title, status: 'active', ...(deps.length > 0 ? { dependsOn: deps } : {}) });
}
} else {
const contextFile2 = resolveMilestoneFile(basePath, mid, "CONTEXT");
const contextContent2 = contextFile2 ? await loadFile(contextFile2) : null;
const deps2 = parseContextDependsOn(contextContent2);
registry.push({ id: mid, title, status: 'pending', ...(deps2.length > 0 ? { dependsOn: deps2 } : {}) });
}
}
const milestoneProgress = {
done: registry.filter(entry => entry.status === 'complete').length,
total: registry.length,
};
if (!activeMilestone) {
// Check whether any milestones are pending (dep-blocked) vs all complete
const pendingEntries = registry.filter(entry => entry.status === 'pending');
if (pendingEntries.length > 0) {
// All incomplete milestones are dep-blocked — no progress possible
const blockerDetails = pendingEntries
.filter(entry => entry.dependsOn && entry.dependsOn.length > 0)
.map(entry => `${entry.id} is waiting on unmet deps: ${entry.dependsOn!.join(', ')}`);
return {
activeMilestone: null,
activeSlice: null,
activeTask: null,
phase: 'blocked',
recentDecisions: [],
blockers: blockerDetails.length > 0
? blockerDetails
: ['All remaining milestones are dep-blocked but no deps listed — check CONTEXT.md files'],
nextAction: 'Resolve milestone dependencies before proceeding.',
registry,
requirements,
progress: {
milestones: milestoneProgress,
},
};
}
// All milestones complete
const lastEntry = registry[registry.length - 1];
return {
activeMilestone: lastEntry ? { id: lastEntry.id, title: lastEntry.title } : null,
activeSlice: null,
activeTask: null,
phase: 'complete',
recentDecisions: [],
blockers: [],
nextAction: 'All milestones complete.',
registry,
requirements,
progress: {
milestones: milestoneProgress,
},
};
}
if (!activeRoadmap) {
// Active milestone exists but has no roadmap yet — needs planning
return {
activeMilestone,
activeSlice: null,
activeTask: null,
phase: 'pre-planning',
recentDecisions: [],
blockers: [],
nextAction: `Plan milestone ${activeMilestone.id}.`,
registry,
requirements,
progress: {
milestones: milestoneProgress,
},
};
}
// Check if active milestone needs completion (all slices done, no summary)
if (isMilestoneComplete(activeRoadmap)) {
const sliceProgress = {
done: activeRoadmap.slices.length,
total: activeRoadmap.slices.length,
};
return {
activeMilestone,
activeSlice: null,
activeTask: null,
phase: 'completing-milestone',
recentDecisions: [],
blockers: [],
nextAction: `All slices complete in ${activeMilestone.id}. Write milestone summary.`,
registry,
requirements,
progress: {
milestones: milestoneProgress,
slices: sliceProgress,
},
};
}
const sliceProgress = {
done: activeRoadmap.slices.filter(s => s.done).length,
total: activeRoadmap.slices.length,
};
// Find the active slice (first incomplete with deps satisfied)
const doneSliceIds = new Set(activeRoadmap.slices.filter(s => s.done).map(s => s.id));
let activeSlice: ActiveRef | null = null;
for (const s of activeRoadmap.slices) {
if (s.done) continue;
if (s.depends.every(dep => doneSliceIds.has(dep))) {
activeSlice = { id: s.id, title: s.title };
break;
}
}
if (!activeSlice) {
return {
activeMilestone,
activeSlice: null,
activeTask: null,
phase: 'blocked',
recentDecisions: [],
blockers: ['No slice eligible — check dependency ordering'],
nextAction: 'Resolve dependency blockers or plan next slice.',
registry,
requirements,
progress: {
milestones: milestoneProgress,
slices: sliceProgress,
},
};
}
const activeBranch = getActiveSliceBranch(basePath);
// Check if the slice has a plan
const planFile = resolveSliceFile(basePath, activeMilestone.id, activeSlice.id, "PLAN");
const slicePlanContent = planFile ? await loadFile(planFile) : null;
if (!slicePlanContent) {
return {
activeMilestone,
activeSlice,
activeTask: null,
phase: 'planning',
recentDecisions: [],
blockers: [],
nextAction: `Plan slice ${activeSlice.id} (${activeSlice.title}).`,
activeBranch: activeBranch ?? undefined,
registry,
requirements,
progress: {
milestones: milestoneProgress,
slices: sliceProgress,
},
};
}
const slicePlan = parsePlan(slicePlanContent);
const taskProgress = {
done: slicePlan.tasks.filter(t => t.done).length,
total: slicePlan.tasks.length,
};
const activeTaskEntry = slicePlan.tasks.find(t => !t.done);
if (!activeTaskEntry) {
// All tasks done but slice not marked complete
return {
activeMilestone,
activeSlice,
activeTask: null,
phase: 'summarizing',
recentDecisions: [],
blockers: [],
nextAction: `All tasks done in ${activeSlice.id}. Write slice summary and complete slice.`,
activeBranch: activeBranch ?? undefined,
registry,
requirements,
progress: {
milestones: milestoneProgress,
slices: sliceProgress,
tasks: taskProgress,
},
};
}
const activeTask: ActiveRef = {
id: activeTaskEntry.id,
title: activeTaskEntry.title,
};
// ── Blocker detection: scan completed task summaries ──────────────────
// If any completed task has blocker_discovered: true and no REPLAN.md
// exists yet, transition to replanning-slice instead of executing.
const completedTasks = slicePlan.tasks.filter(t => t.done);
let blockerTaskId: string | null = null;
for (const ct of completedTasks) {
const summaryFile = resolveTaskFile(basePath, activeMilestone.id, activeSlice.id, ct.id, "SUMMARY");
if (!summaryFile) continue;
const summaryContent = await loadFile(summaryFile);
if (!summaryContent) continue;
const summary = parseSummary(summaryContent);
if (summary.frontmatter.blocker_discovered) {
blockerTaskId = ct.id;
break;
}
}
if (blockerTaskId) {
// Loop protection: if REPLAN.md already exists, a replan was already
// performed for this slice — skip further replanning and continue executing.
const replanFile = resolveSliceFile(basePath, activeMilestone.id, activeSlice.id, "REPLAN");
if (!replanFile) {
return {
activeMilestone,
activeSlice,
activeTask,
phase: 'replanning-slice',
recentDecisions: [],
blockers: [`Task ${blockerTaskId} discovered a blocker requiring slice replan`],
nextAction: `Task ${blockerTaskId} reported blocker_discovered. Replan slice ${activeSlice.id} before continuing.`,
activeBranch: activeBranch ?? undefined,
activeWorkspace: undefined,
registry,
requirements,
progress: {
milestones: milestoneProgress,
slices: sliceProgress,
tasks: taskProgress,
},
};
}
// REPLAN.md exists — loop protection: fall through to normal executing
}
// Check for interrupted work
const sDir = resolveSlicePath(basePath, activeMilestone.id, activeSlice.id);
const continueFile = sDir ? resolveSliceFile(basePath, activeMilestone.id, activeSlice.id, "CONTINUE") : null;
// Also check legacy continue.md
const hasInterrupted = !!(continueFile && await loadFile(continueFile)) ||
!!(sDir && await loadFile(join(sDir, "continue.md")));
return {
activeMilestone,
activeSlice,
activeTask,
phase: 'executing',
recentDecisions: [],
blockers: [],
nextAction: hasInterrupted
? `Resume interrupted work on ${activeTask.id}: ${activeTask.title} in slice ${activeSlice.id}. Read continue.md first.`
: `Execute ${activeTask.id}: ${activeTask.title} in slice ${activeSlice.id}.`,
activeBranch: activeBranch ?? undefined,
registry,
requirements,
progress: {
milestones: milestoneProgress,
slices: sliceProgress,
tasks: taskProgress,
},
};
}
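The dependency-gated slice selection inside `deriveState` — first incomplete slice whose dependencies are all done — can be sketched in isolation. The slice data below is hypothetical:

```typescript
interface SliceEntry {
  id: string;
  title: string;
  done: boolean;
  depends: string[];
}

// Same selection rule deriveState applies to roadmap.slices:
// pick the first incomplete slice whose dependencies are all complete.
function pickActiveSlice(slices: SliceEntry[]): SliceEntry | null {
  const doneIds = new Set(slices.filter(s => s.done).map(s => s.id));
  for (const s of slices) {
    if (s.done) continue;
    if (s.depends.every(dep => doneIds.has(dep))) return s;
  }
  return null; // every remaining slice is dep-blocked → phase: 'blocked'
}

const slices: SliceEntry[] = [
  { id: "S01", title: "Auth", done: true, depends: [] },
  { id: "S02", title: "Dashboard", done: false, depends: ["S01"] },
  { id: "S03", title: "Billing", done: false, depends: ["S02"] },
];
console.log(pickActiveSlice(slices)?.id); // S02
```

Because the loop walks slices in roadmap order, two eligible slices never race: the earlier one always wins, which keeps state derivation deterministic across runs.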

@@ -0,0 +1,76 @@
# {{milestoneId}}: {{milestoneTitle}} — Context
**Gathered:** {{date}}
**Status:** Ready for planning
## Project Description
{{description}}
## Why This Milestone
{{whatProblemThisSolves_AND_whyNow}}
## User-Visible Outcome
### When this milestone is complete, the user can:
- {{literalUserActionInRealEnvironment}}
- {{literalUserActionInRealEnvironment}}
### Entry point / environment
- Entry point: {{CLI command / URL / bot / extension / service / workflow}}
- Environment: {{local dev / browser / mobile / launchd / CI / production-like}}
- Live dependencies involved: {{telegram / database / webhook / rpc subprocess / none}}
## Completion Class
- Contract complete means: {{what can be proven by tests / fixtures / artifacts}}
- Integration complete means: {{what must work across real subsystems}}
- Operational complete means: {{what must work under real lifecycle conditions, or none}}
## Final Integrated Acceptance
To call this milestone complete, we must prove:
- {{one real end-to-end scenario}}
- {{one real end-to-end scenario}}
- {{what cannot be simulated if this milestone is to be considered truly done}}
## Risks and Unknowns
- {{riskOrUnknown}} — {{whyItMatters}}
## Existing Codebase / Prior Art
- `{{fileOrModule}}` — {{howItRelates}}
- `{{fileOrModule}}` — {{howItRelates}}
> See `.gsd/DECISIONS.md` for all architectural and pattern decisions — it is an append-only register; read it during planning, append to it during execution.
## Relevant Requirements
- {{requirementId}} — {{howThisMilestoneAdvancesIt}}
## Scope
### In Scope
- {{inScopeItem}}
### Out of Scope / Non-Goals
- {{outOfScopeItem}}
## Technical Constraints
- {{constraint}}
## Integration Points
- {{systemOrService}} — {{howThisMilestoneInteractsWithIt}}
## Open Questions
- {{question}} — {{currentThinking}}

@@ -0,0 +1,8 @@
# Decisions Register
<!-- Append-only. Never edit or remove existing rows.
To reverse a decision, add a new row that supersedes it.
Read this file at the start of any planning or research phase. -->
| # | When | Scope | Decision | Choice | Rationale | Revisable? |
|---|------|-------|----------|--------|-----------|------------|

@@ -0,0 +1,73 @@
---
id: {{milestoneId}}
provides:
- {{whatThisMilestoneProvides}}
key_decisions:
- {{decision}}
patterns_established:
- {{pattern}}
observability_surfaces:
- {{status endpoint, structured log, persisted failure state, diagnostic command, or none}}
requirement_outcomes:
- id: {{requirementId}}
from_status: {{active|blocked|deferred}}
to_status: {{validated|deferred|blocked|out_of_scope}}
proof: {{whatEvidenceSupportsThisTransition}}
duration: {{duration}}
verification_result: passed
completed_at: {{date}}
---
# {{milestoneId}}: {{milestoneTitle}}
<!-- One-liner must say what the milestone actually delivered, not just that it completed.
Good: "State machine integrity with completing-milestone gating, doctor audits, and observability validation"
Bad: "Milestone 2 completed" -->
**{{oneLiner}}**
## What Happened
<!-- Cross-slice narrative: compress all slice summaries into a coherent story.
Focus on what was built, how the slices connected, and what the milestone
achieved as a whole — not a task-by-task replay. -->
{{crossSliceNarrative}}
## Cross-Slice Verification
<!-- How were the milestone's success criteria verified?
Reference specific tests, commands, browser checks, or observable behaviors.
Each success criterion from the roadmap should have a corresponding verification entry. -->
{{howSuccessCriteriaWereVerified}}
## Requirement Changes
<!-- Transitions with evidence. Each requirement that changed status during this milestone
should be listed with the proof that supports the transition. -->
- {{requirementId}}: {{fromStatus}} → {{toStatus}} — {{evidence}}
## Forward Intelligence
<!-- Write what you wish you'd known at the start of this milestone.
This section is read by the next milestone's planning and research steps.
Be specific and concrete — this is the most valuable context you can transfer. -->
### What the next milestone should know
- {{insightThatWouldHelpDownstreamWork}}
### What's fragile
- {{fragileAreaOrThinImplementation}} — {{whyItMatters}}
### Authoritative diagnostics
- {{whereAFutureAgentShouldLookFirst}} — {{whyThisSignalIsTrustworthy}}
### What assumptions changed
- {{originalAssumption}} — {{whatActuallyHappened}}
## Files Created/Modified
- `{{filePath}}` — {{description}}
- `{{filePath}}` — {{description}}

@@ -0,0 +1,133 @@
# {{sliceId}}: {{sliceTitle}}
**Goal:** {{goal}}
**Demo:** {{demo}}
## Must-Haves
- {{mustHave}}
- {{mustHave}}
## Proof Level
- This slice proves: {{contract | integration | operational | final-assembly}}
- Real runtime required: {{yes/no}}
- Human/UAT required: {{yes/no}}
## Verification
<!-- Define what "done" looks like BEFORE detailing tasks.
This section is the slice's objective stopping condition — execution isn't done
until everything here passes.
For non-trivial projects:
- Write actual test files into the codebase during the first task
- Tests should assert on the slice's demo outcome and boundary contracts
- Name the test files here so execution has an unambiguous target
For simple projects or scripts:
- Executable verification commands (bash assertions, curl checks, etc.) are sufficient
If the project has no test framework and the work is non-trivial,
the first task should set one up. A test runner costs 2 minutes
and pays for itself immediately.
For non-trivial backend, integration, async, stateful, or UI work:
- Include at least one verification check for an observability or failure-path signal
- Verify not just that the feature works, but that a future agent can inspect its state when it fails -->
- {{testFileOrCommand — e.g. `npm test -- --grep "auth flow"` or `bash scripts/verify-s01.sh`}}
- {{testFileOrCommand}}
## Observability / Diagnostics
<!-- Required for non-trivial backend, integration, async, stateful, or UI slices.
Describe how a future agent will inspect current state, detect failure,
and localize the problem with minimal ambiguity.
Prefer:
- structured logs/events over ad hoc console strings
- stable error codes/types over vague failures
- health/readiness/status surfaces over hidden internal state
- persisted failure state when it materially improves retries or recovery
Keep this section concise and high-signal. Do not log secrets or sensitive raw payloads. -->
- Runtime signals: {{structured log/event, state transition, metric, or none}}
- Inspection surfaces: {{status endpoint, CLI command, script, UI state, DB table, or none}}
- Failure visibility: {{last error, retry count, phase, timestamp, correlation id, or none}}
- Redaction constraints: {{secret/PII boundary or none}}
## Integration Closure
- Upstream surfaces consumed: {{specific files / modules / contracts}}
- New wiring introduced in this slice: {{entrypoint / composition / runtime hookup, or none}}
- What remains before the milestone is truly usable end-to-end: {{list or "nothing"}}
## Tasks
<!--
If every task below is completed exactly as written, the Goal and Demo above
should be true at the stated proof level. Tasks should close the loop on the
slice, not merely prepare for later work unless the Demo truthfully says the
slice only proves fixture/contract-level behavior.
Write each task as an executable increment, not a vague intention.
Prefer action-oriented titles:
- "Wire real auth middleware into dashboard routes"
- "Persist job status and expose failure diagnostics"
- "Add browser test covering empty-state recovery"
Avoid vague titles:
- "Set up auth"
- "Handle errors"
- "Improve UI"
Each task should usually include:
- Why: why this task exists / what part of the slice it closes
- Files: the main files likely touched
- Do: concrete implementation steps and important constraints
- Verify: the command, test, or runtime check that proves it worked
- Done when: a measurable acceptance condition
Keep the checkbox line format exactly:
- [ ] **T01: Title** `est:30m`
-->
- [ ] **T01: {{taskTitle}}** `est:{{estimate}}`
- Why: {{whyThisTaskExists}}
- Files: `{{filePath}}`, `{{filePath}}`
- Do: {{specificImplementationStepsAndConstraints}}
- Verify: {{testCommandOrRuntimeCheck}}
- Done when: {{measurableAcceptanceCondition}}
- [ ] **T02: {{taskTitle}}** `est:{{estimate}}`
- Why: {{whyThisTaskExists}}
- Files: `{{filePath}}`, `{{filePath}}`
- Do: {{specificImplementationStepsAndConstraints}}
- Verify: {{testCommandOrRuntimeCheck}}
- Done when: {{measurableAcceptanceCondition}}
- [ ] **T03: {{taskTitle}}** `est:{{estimate}}`
- Why: {{whyThisTaskExists}}
- Files: `{{filePath}}`, `{{filePath}}`
- Do: {{specificImplementationStepsAndConstraints}}
- Verify: {{testCommandOrRuntimeCheck}}
- Done when: {{measurableAcceptanceCondition}}
<!--
Format rules (parsers depend on this exact structure):
- Checkbox line: - [ ] **T01: Title** `est:30m`
- Description: indented text on the next line(s)
- Mark done: change [ ] to [x]
- Tasks execute sequentially in order (T01, T02, T03, ...)
- est: is informational (e.g. 30m, 1h, 2h) and optional
Integration closure rule:
- At least one slice in any multi-boundary milestone should perform real composition/wiring, not just contract hardening
- For the final assembly slice, verification must exercise the real entrypoint or runtime path
-->
## Files Likely Touched
- `{{filePath}}`
- `{{filePath}}`
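A parser honoring the exact checkbox format documented above might look like the following sketch. The regex and field names are illustrative, not the extension's actual parser:

```typescript
interface ParsedTask {
  id: string;
  title: string;
  done: boolean;
  estimate?: string;
}

// Matches lines like: - [ ] **T01: Title** `est:30m`  (est is optional)
const TASK_RE = /^- \[([ x])\] \*\*(T\d+): (.+?)\*\*(?: `est:([^`]+)`)?$/;

function parseTaskLine(line: string): ParsedTask | null {
  const m = line.match(TASK_RE);
  if (!m) return null;
  return { id: m[2], title: m[3], done: m[1] === "x", estimate: m[4] };
}

console.log(parseTaskLine("- [x] **T01: Wire real auth middleware into dashboard routes** `est:30m`"));
// { id: "T01", title: "Wire real auth middleware into dashboard routes", done: true, estimate: "30m" }
```

The anchored `^`/`$` and literal backticks are what make the "parsers depend on this exact structure" warning real: any deviation in spacing or bolding fails the match entirely rather than degrading gracefully.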

@@ -0,0 +1,15 @@
---
version: 1
always_use_skills: []
prefer_skills: []
avoid_skills: []
skill_rules: []
custom_instructions: []
models: {}
skill_discovery:
auto_supervisor: {}
---
# GSD Skill Preferences
See `~/.pi/agent/extensions/gsd/docs/preferences-reference.md` for full field documentation and examples.

@@ -0,0 +1,31 @@
# Project
## What This Is
{{whatTheProjectDoes — plain language, current state, not aspirational}}
## Core Value
<!-- This is the primary value anchor for prioritization and tradeoffs.
If scope must shrink, this should survive. -->
{{theOneThingThatMustWorkEvenIfEverythingElseIsCut}}
## Current State
{{whatHasBeenBuiltSoFar — what works, what exists, what's deployed}}
## Architecture / Key Patterns
{{howItsStructured — conventions, tech stack, key modules, established patterns}}
## Capability Contract
See `.gsd/REQUIREMENTS.md` for the explicit capability contract, requirement status, and coverage mapping.
## Milestone Sequence
<!-- Check off milestones as they complete. One-liners should describe intent, not implementation detail. -->
- [ ] M001: {{title}} — {{oneLiner}}
- [ ] M002: {{title}} — {{oneLiner}}

@@ -0,0 +1,28 @@
---
date: {{YYYY-MM-DD}}
triggering_slice: {{milestoneId/sliceId}}
verdict: {{no-change | modified}}
---
# Reassessment: {{triggering_slice}}
## Changes Made
<!-- What changed in the roadmap as a result of this reassessment.
Write "No changes." if verdict is no-change. -->
{{placeholder}}
## Requirement Coverage Impact
<!-- Which requirements were added, removed, deferred, or reordered.
Write "None." if no requirements were affected. -->
{{placeholder}}
## Decision References
<!-- D-numbers from DECISIONS.md that informed or resulted from this reassessment.
Write "None." if no recorded decisions apply. -->
{{placeholder}}

@@ -0,0 +1,81 @@
# Requirements
This file is the explicit capability and coverage contract for the project.
Use it to track what is actively in scope, what has been validated by completed work, what is intentionally deferred, and what is explicitly out of scope.
Guidelines:
- Keep requirements capability-oriented, not a giant feature wishlist.
- Requirements should be atomic, testable, and stated in plain language.
- Every **Active** requirement should be mapped to a slice, deferred, blocked with reason, or moved out of scope.
- Each requirement should have one accountable primary owner and may have supporting slices.
- Research may suggest requirements, but research does not silently make them binding.
- Validation means the requirement was actually proven by completed work and verification, not just discussed.
## Active
### R001 — {{requirementTitle}}
- Class: {{core-capability | primary-user-loop | launchability | continuity | failure-visibility | integration | quality-attribute | operability | admin/support | compliance/security | differentiator | constraint | anti-feature}}
- Status: active
- Description: {{what must be true in plain language}}
- Why it matters: {{why this matters to actual product usefulness/completeness}}
- Source: {{user | inferred | research | execution}}
- Primary owning slice: {{M001/S01 | none yet}}
- Supporting slices: {{M001/S02, M001/S03 | none}}
- Validation: {{unmapped | mapped | partial | validated}}
- Notes: {{constraints / acceptance nuance / why not yet validated}}
## Validated
### R010 — {{requirementTitle}}
- Class: {{failure-visibility}}
- Status: validated
- Description: {{what was proven}}
- Why it matters: {{why it matters}}
- Source: {{user | inferred | research | execution}}
- Primary owning slice: {{M001/S01}}
- Supporting slices: {{none}}
- Validation: validated
- Notes: {{what verification proved this}}
## Deferred
### R020 — {{requirementTitle}}
- Class: {{admin/support}}
- Status: deferred
- Description: {{useful later, not now}}
- Why it matters: {{why it might matter later}}
- Source: {{user | inferred | research | execution}}
- Primary owning slice: {{none}}
- Supporting slices: {{none}}
- Validation: unmapped
- Notes: {{why deferred now}}
## Out of Scope
### R030 — {{requirementTitle}}
- Class: {{anti-feature | constraint | core-capability}}
- Status: out-of-scope
- Description: {{what is explicitly excluded}}
- Why it matters: {{what scope confusion this prevents}}
- Source: {{user | inferred | research | execution}}
- Primary owning slice: {{none}}
- Supporting slices: {{none}}
- Validation: n/a
- Notes: {{why excluded}}
## Traceability
| ID | Class | Status | Primary owner | Supporting | Proof |
|---|---|---|---|---|---|
| R001 | primary-user-loop | active | M001/S01 | none | mapped |
| R010 | failure-visibility | validated | M001/S01 | none | validated |
| R020 | admin/support | deferred | none | none | unmapped |
| R030 | anti-feature | out-of-scope | none | none | n/a |
## Coverage Summary
- Active requirements: {{count}}
- Mapped to slices: {{count}}
- Validated: {{count}}
- Unmapped active requirements: {{count}}

@@ -0,0 +1,46 @@
# {{scope}} — Research
**Date:** {{date}}
## Summary
{{summary — 2-3 paragraphs with primary recommendation}}
## Recommendation
{{whatApproachToTake_AND_why}}
## Don't Hand-Roll
| Problem | Existing Solution | Why Use It |
|---------|------------------|------------|
| {{problem}} | {{solution}} | {{why}} |
## Existing Code and Patterns
- `{{filePath}}` — {{whatItDoesAndHowToReuseIt}}
- `{{filePath}}` — {{patternToFollowOrAvoid}}
## Constraints
- {{hardConstraintFromCodebaseOrRuntime}}
- {{constraintFromDependencies}}
## Common Pitfalls
- **{{pitfall}}** — {{howToAvoid}}
- **{{pitfall}}** — {{howToAvoid}}
## Open Risks
- {{riskThatCouldSurfaceDuringExecution}}
## Skills Discovered
| Technology | Skill | Status |
|------------|-------|--------|
| {{technology}} | {{owner/repo@skill}} | {{installed / available / none found}} |
## Sources
- {{whatWasLearned}} (source: [{{title}}]({{url}}))

@@ -0,0 +1,118 @@
# {{milestoneId}}: {{milestoneTitle}}
**Vision:** {{vision}}
## Success Criteria
<!-- Write success criteria as observable truths, not implementation tasks.
Prefer user-visible or runtime-visible outcomes that can be re-checked at
milestone completion.
Good:
- "User can complete the full import flow end-to-end"
- "The daemon reconnects automatically after restart"
Bad:
- "Add import API and UI"
- "Refactor reconnect logic" -->
- {{criterion}}
- {{criterion}}
## Key Risks / Unknowns
<!-- List the real risks and uncertainties that shape how slices are ordered.
If the project is straightforward, this section can be short or empty.
Don't invent risks — only list things that could actually invalidate downstream work. -->
- {{risk}} — {{whyItMatters}}
- {{risk}} — {{whyItMatters}}
## Proof Strategy
<!-- For each real risk above, name which slice retires it and what "proven" looks like.
Proof comes from building the real thing, not from spikes or research.
Skip this section for straightforward projects with no major unknowns. -->
- {{riskOrUnknown}} → retire in {{sliceId}} by proving {{whatWillBeProven}}
- {{riskOrUnknown}} → retire in {{sliceId}} by proving {{whatWillBeProven}}
## Verification Classes
- Contract verification: {{tests / shell verifiers / fixtures / artifact checks}}
- Integration verification: {{real subsystem interaction that must be exercised, or none}}
- Operational verification: {{service lifecycle / restart / reconnect / supervision / deploy-install behavior, or none}}
- UAT / human verification: {{what needs real human judgment, or none}}
## Milestone Definition of Done
This milestone is complete only when all are true:
- {{all slice deliverables are complete}}
- {{shared components are actually wired together}}
- {{the real entrypoint exists and is exercised}}
- {{success criteria are re-checked against live behavior, not just artifacts}}
- {{final integrated acceptance scenarios pass}}
## Requirement Coverage
- Covers: {{R001, R002}}
- Partially covers: {{R003 or none}}
- Leaves for later: {{R004 or none}}
- Orphan risks: {{none or what is still unmapped}}
## Slices
- [ ] **S01: {{sliceTitle}}** `risk:high` `depends:[]`
> After this: {{whatIsDemoableWhenThisSliceIsDone}}
- [ ] **S02: {{sliceTitle}}** `risk:medium` `depends:[S01]`
> After this: {{whatIsDemoableWhenThisSliceIsDone}}
- [ ] **S03: {{sliceTitle}}** `risk:low` `depends:[S01]`
> After this: {{whatIsDemoableWhenThisSliceIsDone}}
<!--
Format rules (parsers depend on this exact structure):
- Checkbox line: - [ ] **S01: Title** `risk:high|medium|low` `depends:[S01,S02]`
- Demo line: > After this: one sentence showing what's demoable
- Mark done: change [ ] to [x]
- Order slices by risk (highest first)
- Each slice must be a vertical, demoable increment — not a layer
- If all slices are completed exactly as written, the milestone's promised outcome should actually work at the stated proof level
- depends:[X,Y] means X and Y must be done before this slice starts
Planning quality rules:
- Every slice must ship real, working, demoable code — no research-only or foundation-only slices
- Early slices should prove the hardest thing works by building through the uncertain path
- Each slice should establish a stable surface that downstream slices can depend on
- Demo lines should describe concrete, verifiable evidence — not vague claims
- In brownfield projects, ground slices in existing modules and patterns
- If a slice doesn't produce something testable end-to-end, it's probably a layer — restructure it
- If the milestone crosses multiple runtime boundaries (for example daemon + API + UI, bot + subprocess + service manager, or extension + RPC + filesystem), include an explicit final integration slice that proves the assembled system works end-to-end in a real environment
- Contract or fixture proof does not replace final assembly proof when the user-visible outcome depends on live wiring
- Each "After this" line must be truthful about proof level: if only fixtures or tests prove it, say so; do not imply the user can already perform the live end-to-end behavior unless that has actually been exercised
-->
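The exact checkbox and `depends` syntax the parsers rely on can be sketched as a small reference parser. This is a hypothetical illustration written for this template; the project's real parser is not shown here and may differ in details:

```typescript
// Hypothetical sketch of a parser for the slice checkbox line format above.
interface SliceLine {
  done: boolean;
  id: string;
  title: string;
  risk: 'high' | 'medium' | 'low';
  depends: string[];
}

function parseSliceLine(line: string): SliceLine | null {
  // Matches: - [ ] **S01: Title** `risk:high` `depends:[S02,S03]`
  const m = line.match(
    /^- \[( |x)\] \*\*(S\d+): (.+?)\*\* `risk:(high|medium|low)` `depends:\[([^\]]*)\]`$/,
  );
  if (!m) return null;
  return {
    done: m[1] === 'x',
    id: m[2]!,
    title: m[3]!,
    risk: m[4]! as SliceLine['risk'],
    depends: m[5] ? m[5].split(',').map((s) => s.trim()) : [],
  };
}
```

A line that deviates from the format (missing bold markers, malformed `depends`) returns `null` rather than a partial record, which is why the format rules above insist on the exact structure.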
## Boundary Map
<!-- Be specific. Name concrete outputs: API endpoints, event payloads, shared types/interfaces,
persisted record shapes, CLI contracts, file formats, or invariants.
"Produces: auth system" is too vague. "Produces: session middleware that attaches
authenticated user to request context" is useful.
Consumes should name what downstream slices assume is already available and stable.
If the project has a test framework, boundary contracts should ideally be exercised by tests. -->
### S01 → S02
Produces:
- {{concreteOutput — API, type, data shape, interface, or invariant}}
Consumes:
- nothing (first slice)
### S01 → S03
Produces:
- {{concreteOutput — API, type, data shape, interface, or invariant}}
Consumes:
- nothing (first slice)


@ -0,0 +1,58 @@
---
id: {{sliceId}}
milestone: {{milestoneId}}
status: {{draft|ready|in_progress|complete}}
---
# {{sliceId}}: {{sliceTitle}} — Context
<!-- Slice-scoped context. Milestone-only sections (acceptance criteria, completion class,
milestone sequence) do not belong here — those live in the milestone context. -->
## Goal
<!-- One sentence: what this slice delivers when it is done. -->
{{sliceGoal}}
## Why This Slice
<!-- Why this slice is being done now. What does it unblock, and why does order matter? -->
{{whyNowAndWhatItUnblocks}}
## Scope
<!-- What is and is not in scope for this slice. Be explicit about non-goals. -->
### In Scope
- {{inScopeItem}}
### Out of Scope
- {{outOfScopeItem}}
## Constraints
<!-- Known constraints: time-boxes, hard dependencies, prior decisions this slice must respect. -->
- {{constraint}}
## Integration Points
<!-- Artifacts or subsystems this slice consumes and produces. -->
### Consumes
- `{{fileOrArtifact}}` — {{howItIsUsed}}
### Produces
- `{{fileOrArtifact}}` — {{whatItProvides}}
## Open Questions
<!-- Unresolved questions at planning time. Answer them before or during execution. -->
- {{question}} — {{currentThinking}}


@ -0,0 +1,99 @@
---
id: {{sliceId}}
parent: {{milestoneId}}
milestone: {{milestoneId}}
provides:
- {{whatThisSliceProvides}}
requires:
- slice: {{depSliceId}}
provides: {{whatWasConsumed}}
affects:
- {{downstreamSliceId}}
key_files:
- {{filePath}}
key_decisions:
- {{decision}}
patterns_established:
- {{pattern}}
observability_surfaces:
- {{status endpoint, structured log, persisted failure state, diagnostic command, or none}}
drill_down_paths:
- {{pathToTaskSummary}}
duration: {{duration}}
verification_result: passed
completed_at: {{date}}
---
# {{sliceId}}: {{sliceTitle}}
<!-- One-liner must say what actually shipped, not just that work completed.
Good: "Structured job status endpoint with persisted failure diagnostics"
Bad: "Status feature implemented" -->
**{{oneLiner}}**
## What Happened
{{narrative — compress task summaries into a coherent story}}
## Verification
{{whatWasVerifiedAcrossAllTasks — tests, builds, manual checks}}
<!-- If the project has no REQUIREMENTS.md, omit all four requirement sections below entirely — do not fill them with "none". These sections only apply when requirements are being actively tracked. -->
## Requirements Advanced
- {{requirementId}} — {{howThisSliceAdvancedIt}}
## Requirements Validated
- {{requirementId}} — {{whatProofNowMakesItValidated}}
## New Requirements Surfaced
- {{newRequirementOr_none}}
## Requirements Invalidated or Re-scoped
- {{requirementIdOr_none}} — {{what changed}}
## Deviations
<!-- Deviations are unplanned changes to the written plan, not ordinary debugging inside the plan's intended scope. -->
{{deviationsFromPlan_OR_none}}
## Known Limitations
<!-- Known limitations are real gaps, rough edges, or deferred constraints that still exist after this slice shipped. -->
{{whatDoesntWorkYet_OR_whatWasDeferredToLaterSlices}}
## Follow-ups
<!-- Follow-ups are concrete next actions discovered during execution, not a restatement of known limitations. -->
{{workDeferredOrDiscoveredDuringExecution_OR_none}}
## Files Created/Modified
- `{{filePath}}` — {{description}}
- `{{filePath}}` — {{description}}
## Forward Intelligence
<!-- Write what you wish you'd known at the start of this slice.
This section is read by the next slice's planning and research steps.
Be specific and concrete — this is the most valuable context you can transfer. -->
### What the next slice should know
- {{insightThatWouldHelpDownstreamWork}}
### What's fragile
- {{fragileAreaOrThinImplementation}} — {{whyItMatters}}
### Authoritative diagnostics
- {{whereAFutureAgentShouldLookFirst}} — {{whyThisSignalIsTrustworthy}}
### What assumptions changed
- {{originalAssumption}} — {{whatActuallyHappened}}


@ -0,0 +1,19 @@
# GSD State
**Active Milestone:** {{milestoneId}} — {{milestoneTitle}}
**Active Slice:** {{sliceId}} — {{sliceTitle}}
**Active Task:** {{taskId}} — {{taskTitle}}
**Phase:** {{phase}}
**Slice Branch:** {{activeBranch}}
**Active Workspace:** {{activeWorkspace}}
**Next Action:** {{nextAction}}
**Last Updated:** {{date}}
**Requirements Status:** {{activeCount}} active · {{validatedCount}} validated · {{deferredCount}} deferred · {{outOfScopeCount}} out of scope
## Recent Decisions
- {{decision}}
## Blockers
- (none)


@ -0,0 +1,52 @@
---
# Optional scope estimate — helps the plan quality validator detect over-scoped tasks.
# Tasks with 10+ estimated steps or 12+ estimated files trigger a warning to consider splitting.
estimated_steps: {{estimatedSteps}}
estimated_files: {{estimatedFiles}}
---
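The over-scope thresholds described in the frontmatter comment can be expressed as a one-line predicate. This is an illustrative sketch; the actual validator's names and wiring may differ:

```typescript
// Hypothetical sketch of the plan-quality over-scope check: tasks with
// 10+ estimated steps or 12+ estimated files trigger a split warning.
function isOverScoped(estimatedSteps: number, estimatedFiles: number): boolean {
  return estimatedSteps >= 10 || estimatedFiles >= 12;
}
```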
# {{taskId}}: {{taskTitle}}
**Slice:** {{sliceId}} — {{sliceTitle}}
**Milestone:** {{milestoneId}}
## Description
{{description}}
## Steps
1. {{step}}
2. {{step}}
3. {{step}}
## Must-Haves
- [ ] {{mustHave}}
- [ ] {{mustHave}}
## Verification
- {{howToVerifyThisTaskIsActuallyDone}}
- {{commandToRun_OR_behaviorToCheck}}
## Observability Impact
<!-- If this task creates or changes a runtime boundary, async flow, API, UI state,
background process, or error path, explain how it improves or depends on
future agent observability. Use "None" when genuinely not applicable. -->
- Signals added/changed: {{structured logs, statuses, errors, metrics, or None}}
- How a future agent inspects this: {{command, endpoint, file, UI state, or None}}
- Failure state exposed: {{what becomes visible on failure, or None}}
## Inputs
- `{{filePath}}` — {{whatThisTaskNeedsFromPriorWork}}
- {{priorTaskSummaryInsight}}
## Expected Output
<!-- This task should produce a real increment toward making the slice goal/demo true. A full slice plan should not be able to mark every task complete while the claimed slice behavior still does not work at the stated proof level. -->
- `{{filePath}}` — {{whatThisTaskShouldProduceOrModify}}


@ -0,0 +1,57 @@
---
id: {{taskId}}
parent: {{sliceId}}
milestone: {{milestoneId}}
provides:
- {{whatThisTaskProvides}}
key_files:
- {{filePath}}
key_decisions:
- {{decision}}
patterns_established:
- {{pattern}}
observability_surfaces:
- {{status endpoint, structured log, persisted failure state, diagnostic command, or none}}
duration: {{duration}}
verification_result: passed
completed_at: {{date}}
# Set blocker_discovered: true only if execution revealed the remaining slice plan
# is fundamentally invalid (wrong API, missing capability, architectural mismatch).
# Do NOT set true for ordinary bugs, minor deviations, or fixable issues.
blocker_discovered: false
---
# {{taskId}}: {{taskTitle}}
<!-- One-liner must say what actually shipped, not just that work completed.
Good: "Added retry-aware worker status logging"
Bad: "Implemented logging improvements" -->
**{{oneLiner}}**
## What Happened
{{narrative}}
## Verification
{{whatWasVerifiedAndHow — commands run, tests passed, behavior confirmed}}
## Diagnostics
{{howToInspectWhatThisTaskBuiltLater — status surfaces, logs, error shapes, failure artifacts, or none}}
## Deviations
<!-- Deviations are unplanned changes to the written task plan, not ordinary debugging during implementation. -->
{{deviationsFromPlan_OR_none}}
## Known Issues
{{issuesDiscoveredButNotFixed_OR_none}}
## Files Created/Modified
- `{{filePath}}` — {{description}}
- `{{filePath}}` — {{description}}


@ -0,0 +1,54 @@
# {{sliceId}}: {{sliceTitle}} — UAT
**Milestone:** {{milestoneId}}
**Written:** {{date}}
## UAT Type
- UAT mode: {{artifact-driven | live-runtime | human-experience | mixed}}
- Why this mode is sufficient: {{reason}}
## Preconditions
{{whatMustBeTrueBeforeTesting — server running, data seeded, etc.}}
## Smoke Test
{{oneQuickCheckThatConfirmsTheSliceBasicallyWorks}}
## Test Cases
### 1. {{testName}}
1. {{step}}
2. {{step}}
3. **Expected:** {{expected}}
### 2. {{testName}}
1. {{step}}
2. **Expected:** {{expected}}
## Edge Cases
### {{edgeCaseName}}
1. {{step}}
2. **Expected:** {{expected}}
## Failure Signals
- {{whatWouldIndicateSomethingIsBroken — errors, missing UI, wrong data}}
## Requirements Proved By This UAT
- {{requirementIdOr_none}} — {{what this UAT proves}}
## Not Proven By This UAT
- {{what this UAT intentionally does not prove}}
- {{remaining live/runtime/operational gaps, if any}}
## Notes for Tester
{{anythingTheHumanShouldKnow — known rough edges, things to ignore, areas needing gut check}}


@ -0,0 +1,327 @@
// Tests for pruneActivityLogs — age-based activity log pruning with
// highest-seq preservation invariant — plus step-11 prompt text assertion.
//
// Sections:
// (a) Basic pruning: one old file deleted, two recent survive
// (b) Highest-seq preserved even when all files are old
// (c) retentionDays=0 boundary: all non-highest-seq deleted
// (d) No-op when all files are recent
// (e) Empty directory: no crash
// (f) All old files: only highest-seq survives
// (g) Single file: always preserved (it IS highest-seq)
// (h) Seq number is tie-breaker (010 beats 001 lexicographically and numerically)
// (i) Non-matching filenames ignored: notes.txt survives, no crash
// (j) Step-11 prompt text: "refresh current state if needed"
import { mkdtempSync, mkdirSync, readdirSync, rmSync, utimesSync, writeFileSync } from 'node:fs';
import { join, dirname } from 'node:path';
import { tmpdir } from 'node:os';
import { fileURLToPath } from 'node:url';
import { pruneActivityLogs } from '../activity-log.ts';
const __dirname = dirname(fileURLToPath(import.meta.url));
// ─── Assertion helpers ─────────────────────────────────────────────────────
let passed = 0;
let failed = 0;
function assert(condition: boolean, message: string): void {
if (condition) {
passed++;
} else {
failed++;
console.error(` FAIL: ${message}`);
}
}
function assertEq<T>(actual: T, expected: T, message: string): void {
if (JSON.stringify(actual) === JSON.stringify(expected)) {
passed++;
} else {
failed++;
console.error(` FAIL: ${message} — expected ${JSON.stringify(expected)}, got ${JSON.stringify(actual)}`);
}
}
// ─── Fixture helpers ───────────────────────────────────────────────────────
let tmpDirs: string[] = [];
function createTmpActivityDir(): string {
const dir = mkdtempSync(join(tmpdir(), 'gsd-prune-test-'));
tmpDirs.push(dir);
return dir;
}
function writeActivityFile(activityDir: string, seq: string, name: string): string {
mkdirSync(activityDir, { recursive: true });
const filePath = join(activityDir, `${seq}-${name}.jsonl`);
writeFileSync(filePath, `{"seq":${parseInt(seq, 10)},"name":"${name}"}\n`, 'utf-8');
return filePath;
}
/** Set mtime to daysAgo days in the past. */
function backdateFile(filePath: string, daysAgo: number): void {
const pastMs = Date.now() - daysAgo * 24 * 60 * 60 * 1000;
const pastDate = new Date(pastMs);
utimesSync(filePath, pastDate, pastDate);
}
function cleanup(): void {
for (const dir of tmpDirs) {
rmSync(dir, { recursive: true, force: true });
}
tmpDirs = [];
}
process.on('exit', cleanup);
// ─── Helper: get sorted filenames (basenames only) in a directory ──────────
function listFiles(dir: string): string[] {
return readdirSync(dir).sort();
}
// ═══════════════════════════════════════════════════════════════════════════
// Tests
// ═══════════════════════════════════════════════════════════════════════════
async function main(): Promise<void> {
// ─── (a) Basic pruning ────────────────────────────────────────────────────
console.log('\n── (a) Basic pruning: one old file deleted, two recent survive');
{
const dir = createTmpActivityDir();
const f001 = writeActivityFile(dir, '001', 'execute-task-M001-S01-T01');
const _f002 = writeActivityFile(dir, '002', 'execute-task-M001-S01-T02');
const _f003 = writeActivityFile(dir, '003', 'execute-task-M001-S01-T03');
backdateFile(f001, 40); // older than 30-day retention
pruneActivityLogs(dir, 30);
const remaining = listFiles(dir);
assert(
!remaining.includes('001-execute-task-M001-S01-T01.jsonl'),
'(a) file 001 deleted (40 days old, past 30-day threshold)',
);
assert(
remaining.includes('002-execute-task-M001-S01-T02.jsonl'),
'(a) file 002 survives (recent)',
);
assert(
remaining.includes('003-execute-task-M001-S01-T03.jsonl'),
'(a) file 003 survives (recent, also highest-seq)',
);
}
// ─── (b) Highest-seq preserved even when all files are old ───────────────
console.log('\n── (b) Highest-seq preserved even when all files are old');
{
const dir = createTmpActivityDir();
const f001 = writeActivityFile(dir, '001', 'execute-task-M001-S01-T01');
const f002 = writeActivityFile(dir, '002', 'execute-task-M001-S01-T02');
const f003 = writeActivityFile(dir, '003', 'execute-task-M001-S01-T03');
backdateFile(f001, 40);
backdateFile(f002, 40);
backdateFile(f003, 40); // all old, but 003 is highest-seq
pruneActivityLogs(dir, 30);
const remaining = listFiles(dir);
assertEq(remaining.length, 1, '(b) exactly 1 file survives when all are old');
assert(
remaining.includes('003-execute-task-M001-S01-T03.jsonl'),
'(b) highest-seq file (003) is the survivor',
);
}
// ─── (c) retentionDays=0 boundary ────────────────────────────────────────
console.log('\n── (c) retentionDays=0: all non-highest-seq deleted even if brand-new');
{
const dir = createTmpActivityDir();
// All files have mtime=now (freshly written — no backdating)
writeActivityFile(dir, '001', 'execute-task-M002-S01-T01');
writeActivityFile(dir, '002', 'execute-task-M002-S01-T02');
writeActivityFile(dir, '003', 'execute-task-M002-S01-T03');
pruneActivityLogs(dir, 0); // cutoff = now → everything is "expired"
const remaining = listFiles(dir);
assertEq(remaining.length, 1, '(c) retentionDays=0: exactly 1 file survives');
assert(
remaining.includes('003-execute-task-M002-S01-T03.jsonl'),
'(c) retentionDays=0: only highest-seq (003) survives',
);
}
// ─── (d) No-op when all files are recent ─────────────────────────────────
console.log('\n── (d) No-op when all files are recent');
{
const dir = createTmpActivityDir();
writeActivityFile(dir, '001', 'execute-task-M003-S01-T01');
writeActivityFile(dir, '002', 'execute-task-M003-S01-T02');
writeActivityFile(dir, '003', 'execute-task-M003-S01-T03');
// No backdating — all files are fresh
pruneActivityLogs(dir, 30);
const remaining = listFiles(dir);
assertEq(remaining.length, 3, '(d) all 3 files survive when all are recent');
}
// ─── (e) Empty directory: no crash ────────────────────────────────────────
console.log('\n── (e) Empty directory: no crash');
{
const dir = createTmpActivityDir();
// dir exists but is empty
let threw = false;
try {
pruneActivityLogs(dir, 30);
} catch {
threw = true;
}
assert(!threw, '(e) pruneActivityLogs does not throw on empty directory');
assert(
readdirSync(dir).length === 0,
'(e) directory still exists and is still empty after no-op',
);
}
// ─── (f) All old files: only highest-seq survives ─────────────────────────
console.log('\n── (f) All old files: only highest-seq survives');
{
const dir = createTmpActivityDir();
const f004 = writeActivityFile(dir, '004', 'execute-task-M004-S01-T01');
const f005 = writeActivityFile(dir, '005', 'execute-task-M004-S01-T02');
const f006 = writeActivityFile(dir, '006', 'execute-task-M004-S01-T03');
backdateFile(f004, 60);
backdateFile(f005, 60);
backdateFile(f006, 60);
pruneActivityLogs(dir, 30);
const remaining = listFiles(dir);
assertEq(remaining.length, 1, '(f) exactly 1 file survives when all are old');
assert(
remaining[0].startsWith('006-'),
'(f) the surviving file starts with 006 (highest-seq)',
);
}
// ─── (g) Single file: always preserved ────────────────────────────────────
console.log('\n── (g) Single file: always preserved (it IS highest-seq)');
{
const dir = createTmpActivityDir();
const f001 = writeActivityFile(dir, '001', 'execute-task-M005-S01-T01');
backdateFile(f001, 100); // very old
pruneActivityLogs(dir, 30);
const remaining = listFiles(dir);
assertEq(remaining.length, 1, '(g) single file survives even when very old (it is the highest-seq)');
assert(
remaining.includes('001-execute-task-M005-S01-T01.jsonl'),
'(g) the single file (001) is preserved',
);
}
// ─── (h) Seq tie-breaker: 010 is higher than 001 ─────────────────────────
console.log('\n── (h) Seq number tie-breaker: 010 beats 001 numerically');
{
const dir = createTmpActivityDir();
const f001 = writeActivityFile(dir, '001', 'execute-task-M006-S01-T01');
const f010 = writeActivityFile(dir, '010', 'execute-task-M006-S01-T10');
backdateFile(f001, 40);
backdateFile(f010, 40); // both old; 010 is numerically highest
pruneActivityLogs(dir, 30);
const remaining = listFiles(dir);
assertEq(remaining.length, 1, '(h) exactly 1 file survives');
assert(
remaining.includes('010-execute-task-M006-S01-T10.jsonl'),
'(h) seq 010 (numeric 10) survives over seq 001 (numeric 1)',
);
}
// ─── (i) Non-matching filenames ignored ───────────────────────────────────
console.log('\n── (i) Non-matching filenames ignored: notes.txt survives, no crash');
{
const dir = createTmpActivityDir();
const f001 = writeActivityFile(dir, '001', 'execute-task-M007-S01-T01');
const notesPath = join(dir, 'notes.txt');
writeFileSync(notesPath, 'some notes\n', 'utf-8');
backdateFile(f001, 40); // eligible for pruning
// notes.txt never gets a seq prefix → should be ignored by pruner
let threw = false;
try {
pruneActivityLogs(dir, 30);
} catch {
threw = true;
}
assert(!threw, '(i) no crash when non-matching file is present');
const remaining = listFiles(dir);
assert(
remaining.includes('notes.txt'),
'(i) notes.txt (non-matching filename) survives pruning unchanged',
);
// notes.txt is not seq-bearing, so 001 is the only seq file and therefore
// the highest-seq file: it survives pruning despite being old.
assert(
remaining.includes('001-execute-task-M007-S01-T01.jsonl'),
'(i) seq 001 survives (it is the highest-seq among seq files)',
);
}
// ─── (j) Step-11 prompt text assertion ────────────────────────────────────
console.log('\n── (j) Step-11 prompt text: "refresh current state if needed"');
{
const { readFileSync } = await import('node:fs');
const promptPath = join(__dirname, '..', 'prompts', 'complete-slice.md');
const content = readFileSync(promptPath, 'utf-8');
assert(
content.includes('refresh current state if needed'),
'(j) complete-slice.md step 11 contains "refresh current state if needed"',
);
}
// ═══════════════════════════════════════════════════════════════════════════
// Results
// ═══════════════════════════════════════════════════════════════════════════
console.log(`\n${'='.repeat(40)}`);
console.log(`Results: ${passed} passed, ${failed} failed`);
if (failed > 0) {
process.exit(1);
} else {
console.log('All tests passed ✓');
}
}
main().catch((error) => {
console.error(error);
process.exit(1);
});


@ -0,0 +1,56 @@
import { mkdtempSync, mkdirSync, rmSync, writeFileSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";
import { runGSDDoctor, selectDoctorScope, filterDoctorIssues } from "../doctor.js";
let passed = 0;
let failed = 0;
function assert(condition: boolean, message: string): void {
if (condition) {
passed++;
} else {
failed++;
console.error(` FAIL: ${message}`);
}
}
const tmpBase = mkdtempSync(join(tmpdir(), "gsd-auto-preflight-test-"));
const gsd = join(tmpBase, ".gsd");
mkdirSync(join(gsd, "milestones", "M001", "slices", "S01", "tasks"), { recursive: true });
mkdirSync(join(gsd, "milestones", "M009", "slices", "S01", "tasks"), { recursive: true });
writeFileSync(join(gsd, "milestones", "M001", "M001-ROADMAP.md"), `# M001: Historical\n\n## Slices\n- [x] **S01: Old Slice** \`risk:low\` \`depends:[]\`\n > After this: old done\n`);
writeFileSync(join(gsd, "milestones", "M001", "slices", "S01", "S01-PLAN.md"), `# S01: Old Slice\n\n**Goal:** Old\n**Demo:** Old\n\n## Must-Haves\n- done\n\n## Tasks\n- [x] **T01: Old Task** \`est:5m\`\n done\n`);
writeFileSync(join(gsd, "milestones", "M001", "slices", "S01", "tasks", "T01-SUMMARY.md"), `---\nid: T01\nparent: S01\nmilestone: M001\nprovides: []\nrequires: []\naffects: []\nkey_files: []\nkey_decisions: []\npatterns_established: []\nobservability_surfaces: []\ndrill_down_paths: []\nduration: 5m\nverification_result: passed\ncompleted_at: 2026-03-09T00:00:00Z\n---\n\n# T01: Old Task\n\n**Done**\n\n## What Happened\nDone.\n\n## Diagnostics\n- log\n`);
writeFileSync(join(gsd, "milestones", "M001", "slices", "S01", "S01-SUMMARY.md"), `---\nid: S01\nparent: M001\nmilestone: M001\nprovides: []\nrequires: []\naffects: []\nkey_files: []\nkey_decisions: []\npatterns_established: []\nobservability_surfaces: []\ndrill_down_paths: []\nduration: 5m\nverification_result: passed\ncompleted_at: 2026-03-09T00:00:00Z\n---\n\n# S01: Old Slice\n\n**Done**\n\n## What Happened\nDone.\n\n## Verification\nDone.\n\n## Deviations\nNone\n\n## Known Limitations\nNone\n\n## Follow-ups\nNone\n\n## Files Created/Modified\n- \`x\` — x\n\n## Forward Intelligence\n\n### What the next slice should know\n- x\n\n### What's fragile\n- x\n\n### Authoritative diagnostics\n- x\n\n### What assumptions changed\n- x\n`);
writeFileSync(join(gsd, "milestones", "M001", "M001-SUMMARY.md"), `---\nid: M001\nstatus: complete\ncompleted_at: 2026-03-09T00:00:00Z\n---\n\n# M001: Historical\n\nComplete.\n`);
writeFileSync(join(gsd, "milestones", "M009", "M009-ROADMAP.md"), `# M009: Active\n\n## Slices\n- [ ] **S01: Active Slice** \`risk:low\` \`depends:[]\`\n > After this: active works\n`);
writeFileSync(join(gsd, "milestones", "M009", "slices", "S01", "S01-PLAN.md"), `# S01: Active Slice\n\n**Goal:** Active\n**Demo:** Active\n\n## Must-Haves\n- done\n\n## Tasks\n- [ ] **T01: Active Task** \`est:5m\`\n todo\n`);
async function main(): Promise<void> {
const scope = await selectDoctorScope(tmpBase);
assert(scope === "M009/S01", "active scope selected instead of historical milestone");
const scopedReport = await runGSDDoctor(tmpBase, { fix: false, scope });
const scopedBlocking = filterDoctorIssues(scopedReport.issues, { scope, includeWarnings: false });
assert(scopedBlocking.length === 0, "no blocking issues in active scope");
const historicalReport = await runGSDDoctor(tmpBase, { fix: false });
const historicalWarnings = historicalReport.issues.filter(issue => issue.unitId.startsWith("M001/S01") && issue.severity === "warning");
assert(historicalWarnings.length > 0, "full repo still contains historical warning drift");
rmSync(tmpBase, { recursive: true, force: true });
console.log(`Results: ${passed} passed, ${failed} failed`);
if (failed > 0) process.exit(1);
}
main().catch((error) => {
console.error(error);
process.exit(1);
});


@ -0,0 +1,53 @@
import test from 'node:test';
import assert from 'node:assert/strict';
import { mkdtempSync, readFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { writeUnitRuntimeRecord, readUnitRuntimeRecord } from '../unit-runtime.ts';
import { resolveAutoSupervisorConfig } from '../preferences.ts';
test('resolveAutoSupervisorConfig provides safe timeout defaults', () => {
const supervisor = resolveAutoSupervisorConfig();
assert.equal(supervisor.soft_timeout_minutes, 20);
assert.equal(supervisor.idle_timeout_minutes, 10);
assert.equal(supervisor.hard_timeout_minutes, 30);
});
test('writeUnitRuntimeRecord persists progress and recovery metadata defaults', () => {
const base = mkdtempSync(join(tmpdir(), 'gsd-auto-supervisor-'));
const startedAt = 1234567890;
writeUnitRuntimeRecord(base, 'plan-milestone', 'M010', startedAt, {
phase: 'dispatched',
lastProgressAt: startedAt,
progressCount: 1,
lastProgressKind: 'dispatch',
});
const runtime = readUnitRuntimeRecord(base, 'plan-milestone', 'M010');
assert.ok(runtime);
assert.equal(runtime.phase, 'dispatched');
assert.equal(runtime.lastProgressAt, startedAt);
assert.equal(runtime.progressCount, 1);
assert.equal(runtime.lastProgressKind, 'dispatch');
assert.equal(runtime.recoveryAttempts, 0);
});
test('writeUnitRuntimeRecord keeps explicit recovery attempt fields', () => {
const base = mkdtempSync(join(tmpdir(), 'gsd-auto-supervisor-'));
const startedAt = 2234567890;
writeUnitRuntimeRecord(base, 'research-milestone', 'M011', startedAt, {
phase: 'timeout',
recoveryAttempts: 2,
lastRecoveryReason: 'idle',
lastProgressAt: startedAt + 50,
progressCount: 3,
lastProgressKind: 'recovery-retry',
});
const runtime = JSON.parse(readFileSync(join(base, '.gsd/runtime/units/research-milestone-M011.json'), 'utf8'));
assert.equal(runtime.recoveryAttempts, 2);
assert.equal(runtime.lastRecoveryReason, 'idle');
assert.equal(runtime.lastProgressKind, 'recovery-retry');
});


@ -0,0 +1,225 @@
import { mkdtempSync, mkdirSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { join, dirname } from "node:path";
import { tmpdir } from "node:os";
import { fileURLToPath } from "node:url";
// loadPrompt reads from ~/.pi/agent/extensions/gsd/prompts/ (main checkout).
// In a worktree the file may not exist there yet, so we resolve prompts
// relative to this test file's location (the worktree copy).
const __dirname = dirname(fileURLToPath(import.meta.url));
const worktreePromptsDir = join(__dirname, "..", "prompts");
let passed = 0;
let failed = 0;
function assert(condition: boolean, message: string): void {
if (condition) {
passed++;
} else {
failed++;
console.error(` FAIL: ${message}`);
}
}
function assertEq<T>(actual: T, expected: T, message: string): void {
if (JSON.stringify(actual) === JSON.stringify(expected)) {
passed++;
} else {
failed++;
console.error(` FAIL: ${message} — expected ${JSON.stringify(expected)}, got ${JSON.stringify(actual)}`);
}
}
/**
* Load a prompt template from the worktree prompts directory
* and apply variable substitution (mirrors loadPrompt logic).
*/
function loadPromptFromWorktree(name: string, vars: Record<string, string> = {}): string {
const path = join(worktreePromptsDir, `${name}.md`);
let content = readFileSync(path, "utf-8");
for (const [key, value] of Object.entries(vars)) {
content = content.replaceAll(`{{${key}}}`, value);
}
return content.trim();
}
// ─── Fixture Helpers ───────────────────────────────────────────────────────
function createFixtureBase(): string {
const base = mkdtempSync(join(tmpdir(), "gsd-complete-ms-test-"));
mkdirSync(join(base, ".gsd", "milestones"), { recursive: true });
return base;
}
function writeRoadmap(base: string, mid: string, content: string): void {
const dir = join(base, ".gsd", "milestones", mid);
mkdirSync(dir, { recursive: true });
writeFileSync(join(dir, `${mid}-ROADMAP.md`), content);
}
function writeMilestoneSummary(base: string, mid: string, content: string): void {
const dir = join(base, ".gsd", "milestones", mid);
mkdirSync(dir, { recursive: true });
writeFileSync(join(dir, `${mid}-SUMMARY.md`), content);
}
function cleanup(base: string): void {
rmSync(base, { recursive: true, force: true });
}
// ═══════════════════════════════════════════════════════════════════════════
// Tests
// ═══════════════════════════════════════════════════════════════════════════
async function main(): Promise<void> {
// ─── Prompt Template Loading ───────────────────────────────────────────
console.log("\n=== complete-milestone prompt template exists ===");
{
let result: string;
let threw = false;
try {
result = loadPromptFromWorktree("complete-milestone", {
milestoneId: "M001",
milestoneTitle: "Test Milestone",
roadmapPath: ".gsd/milestones/M001/M001-ROADMAP.md",
inlinedContext: "test context block",
});
} catch (err) {
threw = true;
result = "";
console.error(` ERROR: loadPrompt threw: ${err}`);
}
assert(!threw, "loadPrompt does not throw for complete-milestone");
assert(typeof result === "string" && result.length > 0, "loadPrompt returns a non-empty string");
}
// ─── Variable Substitution ─────────────────────────────────────────────
console.log("\n=== prompt variable substitution ===");
{
const prompt = loadPromptFromWorktree("complete-milestone", {
milestoneId: "M001",
milestoneTitle: "Integration Feature",
roadmapPath: ".gsd/milestones/M001/M001-ROADMAP.md",
inlinedContext: "--- inlined slice summaries and context ---",
});
assert(prompt.includes("M001"), "prompt contains milestoneId 'M001'");
assert(prompt.includes("Integration Feature"), "prompt contains milestoneTitle");
assert(prompt.includes(".gsd/milestones/M001/M001-ROADMAP.md"), "prompt contains roadmapPath");
assert(prompt.includes("--- inlined slice summaries and context ---"), "prompt contains inlinedContext");
assert(!prompt.includes("{{milestoneId}}"), "no un-substituted {{milestoneId}}");
assert(!prompt.includes("{{milestoneTitle}}"), "no un-substituted {{milestoneTitle}}");
assert(!prompt.includes("{{roadmapPath}}"), "no un-substituted {{roadmapPath}}");
assert(!prompt.includes("{{inlinedContext}}"), "no un-substituted {{inlinedContext}}");
}
// ─── Prompt Content Integrity ──────────────────────────────────────────
console.log("\n=== prompt content integrity ===");
{
const prompt = loadPromptFromWorktree("complete-milestone", {
milestoneId: "M002",
milestoneTitle: "Completion Workflow",
roadmapPath: ".gsd/milestones/M002/M002-ROADMAP.md",
inlinedContext: "context",
});
assert(prompt.includes("Complete Milestone"), "prompt contains 'Complete Milestone' heading");
assert(prompt.includes("success criter"), "prompt mentions success criteria verification"); // "success criter" matches both "criteria" and "criterion"
assert(prompt.includes("milestone-summary") || prompt.includes("milestoneSummary"), "prompt references milestone summary artifact");
assert(prompt.includes("Milestone M002 complete"), "prompt contains completion sentinel for M002");
}
// ─── diagnoseExpectedArtifact behavior ─────────────────────────────────
// Since diagnoseExpectedArtifact is not exported from auto.ts, we test
// the same logic by reimplementing the switch case for complete-milestone
// and verifying against known path patterns.
console.log("\n=== diagnoseExpectedArtifact logic for complete-milestone ===");
{
// Import the path helpers used by diagnoseExpectedArtifact
const { relMilestoneFile } = await import("../paths.ts");
// Simulate diagnoseExpectedArtifact("complete-milestone", "M001", base) logic
const base = createFixtureBase();
try {
writeRoadmap(base, "M001", `# M001\n\n## Slices\n- [x] **S01: Done** \`risk:low\` \`depends:[]\`\n > After this: done\n`);
const unitId = "M001"; // unit type under test: "complete-milestone"
const parts = unitId.split("/");
const mid = parts[0]!;
// This is the exact logic from diagnoseExpectedArtifact for "complete-milestone"
const result = `${relMilestoneFile(base, mid, "SUMMARY")} (milestone summary)`;
assert(typeof result === "string", "diagnose returns a string");
assert(result.includes("SUMMARY"), "diagnose result mentions SUMMARY");
assert(result.includes("milestone"), "diagnose result mentions milestone");
assert(result.includes("M001"), "diagnose result includes the milestone ID");
} finally {
cleanup(base);
}
}
// ─── deriveState integration: completing-milestone dispatches correctly ─
console.log("\n=== deriveState completing-milestone integration ===");
{
const { deriveState, isMilestoneComplete } = await import("../state.ts");
const { parseRoadmap } = await import("../files.ts");
const base = createFixtureBase();
try {
writeRoadmap(base, "M001", `# M001: Integration Test
**Vision:** Test completing-milestone flow.
## Slices
- [x] **S01: Slice One** \`risk:low\` \`depends:[]\`
> After this: done.
- [x] **S02: Slice Two** \`risk:low\` \`depends:[S01]\`
> After this: done.
`);
// Verify isMilestoneComplete returns true
const { loadFile } = await import("../files.ts");
const roadmapPath = join(base, ".gsd", "milestones", "M001", "M001-ROADMAP.md");
const roadmapContent = await loadFile(roadmapPath);
const roadmap = parseRoadmap(roadmapContent!);
assert(isMilestoneComplete(roadmap), "isMilestoneComplete returns true when all slices are [x]");
// Verify deriveState returns completing-milestone phase
const state = await deriveState(base);
assertEq(state.phase, "completing-milestone", "deriveState returns completing-milestone when all slices done, no summary");
assertEq(state.activeMilestone?.id, "M001", "active milestone is M001");
assertEq(state.activeSlice, null, "no active slice in completing-milestone");
// Now add the summary and verify it transitions to complete
writeMilestoneSummary(base, "M001", "# M001 Summary\n\nDone.");
const stateAfter = await deriveState(base);
assertEq(stateAfter.phase, "complete", "deriveState returns complete after summary exists");
assertEq(stateAfter.registry[0]?.status, "complete", "registry shows complete status");
} finally {
cleanup(base);
}
}
// ═════════════════════════════════════════════════════════════════════════
// Results
// ═════════════════════════════════════════════════════════════════════════
console.log(`\n${"=".repeat(40)}`);
console.log(`Results: ${passed} passed, ${failed} failed`);
if (failed > 0) {
process.exit(1);
} else {
console.log("All tests passed ✓");
}
}
main().catch((error) => {
console.error(error);
process.exit(1);
});

@@ -0,0 +1,160 @@
/**
* Contract tests for `formatCostProjection`.
 * Tests the pure function directly: no file I/O, no extension context.
*
* This test intentionally fails at import time (or on first assertion)
* because `formatCostProjection` does not yet exist in metrics.ts.
* That failure confirms the test runs against real code. (T01 state)
*/
import {
type SliceAggregate,
formatCostProjection,
} from "../metrics.js";
// ─── Test helpers ─────────────────────────────────────────────────────────────
function makeSliceAggregate(sliceId: string, cost: number): SliceAggregate {
return {
sliceId,
units: 1,
tokens: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0, total: 0 },
cost,
duration: 1000,
};
}
let passed = 0;
let failed = 0;
function assert(condition: boolean, message: string): void {
if (condition) {
passed++;
} else {
failed++;
console.error(` FAIL: ${message}`);
}
}
function assertEq<T>(actual: T, expected: T, message: string): void {
if (actual === expected) {
passed++;
} else {
failed++;
console.error(` FAIL: ${message} — expected ${JSON.stringify(expected)}, got ${JSON.stringify(actual)}`);
}
}
// ─── formatCostProjection ─────────────────────────────────────────────────────
console.log("\n=== formatCostProjection ===");
// 1. Zero completed slices → empty result
{
const result = formatCostProjection([], 3);
assertEq(result.length, 0, "zero slices → empty array");
}
// 2. One slice → suppressed (need ≥2 to project reliably)
{
const result = formatCostProjection([makeSliceAggregate("M001/S01", 0.10)], 3);
assertEq(result.length, 0, "one slice → suppressed (no projection shown)");
}
// 3. Two slices → projection shown (result.length > 0)
{
const slices = [
makeSliceAggregate("M001/S01", 0.10),
makeSliceAggregate("M001/S02", 0.10),
];
const result = formatCostProjection(slices, 5);
assert(result.length > 0, "two slices → projection shown");
}
// 4. Two-slice result: result[0] contains "$" (cost is formatted)
{
const slices = [
makeSliceAggregate("M001/S01", 0.10),
makeSliceAggregate("M001/S02", 0.10),
];
const result = formatCostProjection(slices, 5);
assert(result.length > 0 && result[0]?.includes("$") === true, "projection line contains \"$\"");
}
// 5. Budget ceiling hit: total $0.20 >= ceiling $0.05 → line contains "ceiling"
{
const slices = [
makeSliceAggregate("M001/S01", 0.10),
makeSliceAggregate("M001/S02", 0.10),
];
const result = formatCostProjection(slices, 5, 0.05);
const hasCeilingLine = result.some(
line => line.toLowerCase().includes("ceiling")
);
assert(hasCeilingLine, "ceiling warning appears when total ($0.20) >= ceiling ($0.05)");
}
// 6. Budget ceiling not hit: total $0.20 < ceiling $100.00 → no ceiling line
{
const slices = [
makeSliceAggregate("M001/S01", 0.10),
makeSliceAggregate("M001/S02", 0.10),
];
const result = formatCostProjection(slices, 5, 100.00);
const hasCeilingLine = result.some(
line => line.toLowerCase().includes("ceiling")
);
assert(!hasCeilingLine, "no ceiling warning when total ($0.20) < ceiling ($100.00)");
}
// 7. No ceiling arg → no ceiling line
{
const slices = [
makeSliceAggregate("M001/S01", 0.10),
makeSliceAggregate("M001/S02", 0.10),
];
const result = formatCostProjection(slices, 5);
const hasCeilingLine = result.some(
line => line.toLowerCase().includes("ceiling")
);
assert(!hasCeilingLine, "no ceiling warning when no ceiling is set");
}
// 8. Rounding: avg $0.10 × 5 remaining = $0.50 → result[0] contains "$0.50"
{
const slices = [
makeSliceAggregate("M001/S01", 0.10),
makeSliceAggregate("M001/S02", 0.10),
];
const result = formatCostProjection(slices, 5);
const hasRoundedCost = result.some(line => line.includes("$0.50"));
assert(hasRoundedCost, "projected cost $0.50 (avg $0.10 × 5 remaining) appears in output");
}
// 9. Bare milestone entries excluded from average:
// makeSliceAggregate('M001', 5.00) has no "/" in sliceId → excluded from avg calc.
// Only M001/S01 ($0.10) and M001/S02 ($0.10) count → avg $0.10 × 3 remaining = $0.30
{
const slices = [
makeSliceAggregate("M001", 5.00), // bare milestone — must be excluded
makeSliceAggregate("M001/S01", 0.10),
makeSliceAggregate("M001/S02", 0.10),
];
const result = formatCostProjection(slices, 3);
const hasCorrectProjection = result.some(line => line.includes("$0.30"));
assert(
hasCorrectProjection,
"bare milestone entry excluded from avg: projection shows $0.30 (avg $0.10 × 3), not $1.83 (including $5.00 entry)"
);
}
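// ─── Reference sketch (illustrative) ──────────────────────────────────────────
// A minimal model of the projection contract pinned down by cases 1-9 above.
// `Agg` and `projectCost` are hypothetical names for illustration only; the
// real SliceAggregate and formatCostProjection live in metrics.ts and may
// differ in detail.
interface Agg { sliceId: string; cost: number }
function projectCost(slices: Agg[], remaining: number, ceiling?: number): string[] {
  // Bare milestone rollups (no "/" in the id) are excluded from the average.
  const perSlice = slices.filter(s => s.sliceId.includes("/"));
  if (perSlice.length < 2) return []; // fewer than two samples: suppress projection
  const spent = perSlice.reduce((sum, s) => sum + s.cost, 0);
  const avg = spent / perSlice.length;
  const lines = [`Projected: $${(avg * remaining).toFixed(2)} (avg $${avg.toFixed(2)} × ${remaining} remaining)`];
  if (ceiling !== undefined && spent >= ceiling) {
    lines.push(`Budget ceiling $${ceiling.toFixed(2)} reached`);
  }
  return lines;
}
// Sanity check: the sketch reproduces case 8 (avg $0.10 × 5 remaining = $0.50).
if (!projectCost([{ sliceId: "M001/S01", cost: 0.10 }, { sliceId: "M001/S02", cost: 0.10 }], 5).some(line => line.includes("$0.50"))) {
  throw new Error("reference sketch out of sync with contract");
}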
// ─── Summary ──────────────────────────────────────────────────────────────────
console.log(`\n${"=".repeat(40)}`);
console.log(`Results: ${passed} passed, ${failed} failed`);
if (failed > 0) {
console.error(`${failed} test(s) failed`);
process.exit(1);
} else {
console.log("All tests passed ✓");
}

@@ -0,0 +1,341 @@
import { mkdtempSync, mkdirSync, rmSync, writeFileSync } from 'node:fs';
import { join } from 'node:path';
import { tmpdir } from 'node:os';
import { deriveState } from '../state.ts';
let passed = 0;
let failed = 0;
function assert(condition: boolean, message: string): void {
if (condition) {
passed++;
} else {
failed++;
console.error(` FAIL: ${message}`);
}
}
function assertEq<T>(actual: T, expected: T, message: string): void {
if (JSON.stringify(actual) === JSON.stringify(expected)) {
passed++;
} else {
failed++;
console.error(` FAIL: ${message} — expected ${JSON.stringify(expected)}, got ${JSON.stringify(actual)}`);
}
}
// ─── Fixture Helpers ───────────────────────────────────────────────────────
function createFixtureBase(): string {
const base = mkdtempSync(join(tmpdir(), 'gsd-deps-test-'));
mkdirSync(join(base, '.gsd', 'milestones'), { recursive: true });
return base;
}
function writeRoadmap(base: string, mid: string, content: string): void {
const dir = join(base, '.gsd', 'milestones', mid);
mkdirSync(dir, { recursive: true });
writeFileSync(join(dir, `${mid}-ROADMAP.md`), content);
}
function writeMilestoneSummary(base: string, mid: string, content: string): void {
const dir = join(base, '.gsd', 'milestones', mid);
mkdirSync(dir, { recursive: true });
writeFileSync(join(dir, `${mid}-SUMMARY.md`), content);
}
/**
* Creates M00x-CONTEXT.md with a valid YAML frontmatter block.
* frontmatter is the raw YAML lines between the --- delimiters.
*/
function writeContext(base: string, mid: string, frontmatter: string): void {
const dir = join(base, '.gsd', 'milestones', mid);
mkdirSync(dir, { recursive: true });
writeFileSync(join(dir, `${mid}-CONTEXT.md`), `---\n${frontmatter}\n---\n`);
}
function writeSlicePlan(base: string, mid: string, sid: string, content: string): void {
const dir = join(base, '.gsd', 'milestones', mid, 'slices', sid);
mkdirSync(join(dir, 'tasks'), { recursive: true });
writeFileSync(join(dir, `${sid}-PLAN.md`), content);
}
function cleanup(base: string): void {
rmSync(base, { recursive: true, force: true });
}
// ═══════════════════════════════════════════════════════════════════════════
// Test Groups
// ═══════════════════════════════════════════════════════════════════════════
async function main(): Promise<void> {
// ─── Test Group 1: blocked-deps ────────────────────────────────────────
// M001 is incomplete (no SUMMARY), M002 depends_on M001 → M002 is pending
console.log('\n=== blocked-deps ===');
{
const base = createFixtureBase();
try {
// M001: incomplete (one slice, no SUMMARY)
writeRoadmap(base, 'M001', `# M001: First Milestone
**Vision:** First milestone still in progress.
## Slices
- [ ] **S01: Incomplete Slice** \`risk:low\` \`depends:[]\`
> After this: Done.
`);
// M001: add a slice plan with an active task so phase is 'executing'
writeSlicePlan(base, 'M001', 'S01', `# S01: Incomplete Slice
**Goal:** Verify dep-blocked milestone behavior.
**Demo:** Tests pass.
## Tasks
- [ ] **T01: Do work** \`est:15m\`
First task still in progress.
`);
// M002: depends on M001, also incomplete
writeRoadmap(base, 'M002', `# M002: Second Milestone
**Vision:** Second milestone blocked by M001.
## Slices
- [ ] **S01: Blocked Slice** \`risk:low\` \`depends:[]\`
> After this: Done.
`);
writeContext(base, 'M002', 'depends_on: [M001]');
const state = await deriveState(base);
assertEq(state.registry[0]?.status, 'active', 'blocked-deps: M001 is active');
assertEq(state.registry[1]?.status, 'pending', 'blocked-deps: M002 is pending (dep-blocked)');
assertEq(state.phase, 'executing', 'blocked-deps: phase is executing (M001 is active)');
assertEq(state.activeMilestone?.id, 'M001', 'blocked-deps: activeMilestone is M001');
} finally {
cleanup(base);
}
}
// ─── Test Group 2: unblocked-deps ──────────────────────────────────────
// M001 is complete (all slices [x] + SUMMARY), M002 depends_on M001 → M002 becomes active
console.log('\n=== unblocked-deps ===');
{
const base = createFixtureBase();
try {
// M001: complete (all slices done + SUMMARY present)
writeRoadmap(base, 'M001', `# M001: First Milestone
**Vision:** First milestone complete.
## Slices
- [x] **S01: Done** \`risk:low\` \`depends:[]\`
> After this: Done.
`);
writeMilestoneSummary(base, 'M001', '# M001 Summary\n\nFirst milestone is complete.');
// M002: depends on M001, now unblocked
writeRoadmap(base, 'M002', `# M002: Second Milestone
**Vision:** Second milestone now active.
## Slices
- [ ] **S01: Active Slice** \`risk:low\` \`depends:[]\`
> After this: Done.
`);
writeContext(base, 'M002', 'depends_on: [M001]');
const state = await deriveState(base);
assertEq(state.registry[0]?.status, 'complete', 'unblocked-deps: M001 is complete');
assertEq(state.registry[1]?.status, 'active', 'unblocked-deps: M002 is active');
assertEq(state.activeMilestone?.id, 'M002', 'unblocked-deps: activeMilestone is M002');
assert(state.phase !== 'blocked', 'unblocked-deps: phase is not blocked');
} finally {
cleanup(base);
}
}
// ─── Test Group 3: all-blocked ─────────────────────────────────────────
// M001 depends_on M002, M002 depends_on M001 — circular dep, neither can activate
console.log('\n=== all-blocked ===');
{
const base = createFixtureBase();
try {
// M001: depends on M002
writeRoadmap(base, 'M001', `# M001: First Milestone
**Vision:** Circular dependency.
## Slices
- [ ] **S01: Waiting** \`risk:low\` \`depends:[]\`
> After this: Done.
`);
writeContext(base, 'M001', 'depends_on: [M002]');
// M002: depends on M001
writeRoadmap(base, 'M002', `# M002: Second Milestone
**Vision:** Also in circular dependency.
## Slices
- [ ] **S01: Also Waiting** \`risk:low\` \`depends:[]\`
> After this: Done.
`);
writeContext(base, 'M002', 'depends_on: [M001]');
const state = await deriveState(base);
assertEq(state.phase, 'blocked', 'all-blocked: phase is blocked');
assertEq(state.activeMilestone, null, 'all-blocked: activeMilestone is null (neither milestone can activate)');
assert(state.blockers.length > 0, 'all-blocked: blockers array is non-empty');
} finally {
cleanup(base);
}
}
// ─── Test Group 4: absent-context ──────────────────────────────────────
// Neither M001 nor M002 has a CONTEXT.md → no dep constraints, normal sequential behavior
console.log('\n=== absent-context ===');
{
const base = createFixtureBase();
try {
// M001: incomplete, no CONTEXT.md
writeRoadmap(base, 'M001', `# M001: First Milestone
**Vision:** No context file, no deps.
## Slices
- [ ] **S01: Incomplete** \`risk:low\` \`depends:[]\`
> After this: Done.
`);
// M002: incomplete, no CONTEXT.md
writeRoadmap(base, 'M002', `# M002: Second Milestone
**Vision:** Also no context file.
## Slices
- [ ] **S01: Pending** \`risk:low\` \`depends:[]\`
> After this: Done.
`);
const state = await deriveState(base);
assertEq(state.registry[0]?.status, 'active', 'absent-context: M001 is active');
assertEq(state.registry[1]?.status, 'pending', 'absent-context: M002 is pending');
assertEq(state.activeMilestone?.id, 'M001', 'absent-context: activeMilestone is M001');
assert(state.phase !== 'blocked', 'absent-context: phase is not blocked');
} finally {
cleanup(base);
}
}
// ─── Test Group 5: forward-dep ─────────────────────────────────────────
// M001 depends_on M002, but M002 is already complete → M001 can activate
console.log('\n=== forward-dep ===');
{
const base = createFixtureBase();
try {
// M001: depends on M002, but M002 is complete so M001 is unblocked
writeRoadmap(base, 'M001', `# M001: First Milestone
**Vision:** Depends on M002 which is already complete.
## Slices
- [ ] **S01: Ready** \`risk:low\` \`depends:[]\`
> After this: Done.
`);
writeContext(base, 'M001', 'depends_on: [M002]');
// M002: complete (all slices [x] + SUMMARY)
writeRoadmap(base, 'M002', `# M002: Second Milestone
**Vision:** Already complete.
## Slices
- [x] **S01: Done** \`risk:low\` \`depends:[]\`
> After this: Done.
`);
writeMilestoneSummary(base, 'M002', '# M002 Summary\n\nSecond milestone is complete.');
const state = await deriveState(base);
assertEq(state.activeMilestone?.id, 'M001', 'forward-dep: activeMilestone is M001');
assertEq(state.registry[1]?.status, 'complete', 'forward-dep: M002 is complete');
assert(state.phase !== 'blocked', 'forward-dep: phase is not blocked');
} finally {
cleanup(base);
}
}
// ─── Test Group 6: empty-deps-list ─────────────────────────────────────
// M002 has `depends_on: []` — empty list means no constraint, normal sequential behavior
console.log('\n=== empty-deps-list ===');
{
const base = createFixtureBase();
try {
// M001: incomplete, no context
writeRoadmap(base, 'M001', `# M001: First Milestone
**Vision:** First milestone still in progress.
## Slices
- [ ] **S01: Incomplete** \`risk:low\` \`depends:[]\`
> After this: Done.
`);
// M002: empty deps list — no constraint from deps, but still sequential after M001
writeRoadmap(base, 'M002', `# M002: Second Milestone
**Vision:** Empty deps list, no blocking constraint.
## Slices
- [ ] **S01: Waiting for M001** \`risk:low\` \`depends:[]\`
> After this: Done.
`);
writeContext(base, 'M002', 'depends_on: []');
const state = await deriveState(base);
assertEq(state.registry[0]?.status, 'active', 'empty-deps-list: M001 is active');
assertEq(state.registry[1]?.status, 'pending', 'empty-deps-list: M002 is pending (M001 not done yet)');
assert(state.phase !== 'blocked', 'empty-deps-list: phase is not blocked');
} finally {
cleanup(base);
}
}
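// ─── Reference sketch (illustrative) ───────────────────────────────────
// A minimal model of the depends_on gate exercised by the six groups
// above. `Entry`, `depsSatisfied`, and `pickActive` are hypothetical
// names; the real resolution logic lives in state.ts and may differ in
// detail.
interface Entry { id: string; complete: boolean; dependsOn: string[] }
const depsSatisfied = (e: Entry, all: Map<string, Entry>): boolean =>
  // An empty (or absent) depends_on list imposes no constraint.
  e.dependsOn.every(dep => all.get(dep)?.complete === true);
const pickActive = (entries: Entry[]): Entry | null => {
  const all = new Map(entries.map(e => [e.id, e] as const));
  // The first incomplete milestone with satisfied deps activates; none → blocked.
  return entries.find(e => !e.complete && depsSatisfied(e, all)) ?? null;
};
// Sanity check: circular deps block both milestones in the sketch too.
if (pickActive([
  { id: 'M001', complete: false, dependsOn: ['M002'] },
  { id: 'M002', complete: false, dependsOn: ['M001'] },
]) !== null) {
  throw new Error('reference sketch out of sync: circular deps should block');
}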
// ═════════════════════════════════════════════════════════════════════════
// Results
// ═════════════════════════════════════════════════════════════════════════
console.log(`\n${'='.repeat(40)}`);
console.log(`Results: ${passed} passed, ${failed} failed`);
if (failed > 0) {
process.exit(1);
} else {
console.log('All tests passed ✓');
}
}
main().catch((error) => {
console.error(error);
process.exit(1);
});

@@ -0,0 +1,637 @@
import { mkdtempSync, mkdirSync, rmSync, writeFileSync } from 'node:fs';
import { join } from 'node:path';
import { tmpdir } from 'node:os';
import { deriveState, isSliceComplete, isMilestoneComplete } from '../state.ts';
let passed = 0;
let failed = 0;
function assert(condition: boolean, message: string): void {
if (condition) {
passed++;
} else {
failed++;
console.error(` FAIL: ${message}`);
}
}
function assertEq<T>(actual: T, expected: T, message: string): void {
if (JSON.stringify(actual) === JSON.stringify(expected)) {
passed++;
} else {
failed++;
console.error(` FAIL: ${message} — expected ${JSON.stringify(expected)}, got ${JSON.stringify(actual)}`);
}
}
// ─── Fixture Helpers ───────────────────────────────────────────────────────
function createFixtureBase(): string {
const base = mkdtempSync(join(tmpdir(), 'gsd-state-test-'));
mkdirSync(join(base, '.gsd', 'milestones'), { recursive: true });
return base;
}
function writeRoadmap(base: string, mid: string, content: string): void {
const dir = join(base, '.gsd', 'milestones', mid);
mkdirSync(dir, { recursive: true });
writeFileSync(join(dir, `${mid}-ROADMAP.md`), content);
}
function writePlan(base: string, mid: string, sid: string, content: string): void {
const dir = join(base, '.gsd', 'milestones', mid, 'slices', sid);
mkdirSync(join(dir, 'tasks'), { recursive: true });
writeFileSync(join(dir, `${sid}-PLAN.md`), content);
}
function writeContinue(base: string, mid: string, sid: string, content: string): void {
const dir = join(base, '.gsd', 'milestones', mid, 'slices', sid);
mkdirSync(dir, { recursive: true });
writeFileSync(join(dir, `${sid}-CONTINUE.md`), content);
}
function writeMilestoneSummary(base: string, mid: string, content: string): void {
const dir = join(base, '.gsd', 'milestones', mid);
mkdirSync(dir, { recursive: true });
writeFileSync(join(dir, `${mid}-SUMMARY.md`), content);
}
function writeRequirements(base: string, content: string): void {
writeFileSync(join(base, '.gsd', 'REQUIREMENTS.md'), content);
}
function cleanup(base: string): void {
rmSync(base, { recursive: true, force: true });
}
// ═══════════════════════════════════════════════════════════════════════════
// Test Groups
// ═══════════════════════════════════════════════════════════════════════════
async function main(): Promise<void> {
// ─── Test 1: empty milestones dir → pre-planning ───────────────────────
console.log('\n=== empty milestones dir → pre-planning ===');
{
const base = createFixtureBase();
try {
const state = await deriveState(base);
assertEq(state.phase, 'pre-planning', 'phase is pre-planning');
assertEq(state.activeMilestone, null, 'activeMilestone is null');
assertEq(state.activeSlice, null, 'activeSlice is null');
assertEq(state.activeTask, null, 'activeTask is null');
assertEq(state.registry, [], 'registry is empty');
assertEq(state.progress?.milestones?.done, 0, 'milestones done = 0');
assertEq(state.progress?.milestones?.total, 0, 'milestones total = 0');
} finally {
cleanup(base);
}
}
// ─── Test 2: milestone dir exists but no roadmap → pre-planning ────────
console.log('\n=== milestone dir exists but no roadmap → pre-planning ===');
{
const base = createFixtureBase();
try {
// Create M001 directory but no roadmap file
mkdirSync(join(base, '.gsd', 'milestones', 'M001'), { recursive: true });
const state = await deriveState(base);
assertEq(state.phase, 'pre-planning', 'phase is pre-planning');
assert(state.activeMilestone !== null, 'activeMilestone is not null');
assertEq(state.activeMilestone?.id, 'M001', 'activeMilestone id is M001');
assertEq(state.activeSlice, null, 'activeSlice is null');
assertEq(state.activeTask, null, 'activeTask is null');
assertEq(state.registry.length, 1, 'registry has 1 entry');
assertEq(state.registry[0]?.status, 'active', 'registry entry status is active');
} finally {
cleanup(base);
}
}
// ─── Test 3: roadmap with incomplete slice, no plan → planning ─────────
console.log('\n=== roadmap with incomplete slice, no plan → planning ===');
{
const base = createFixtureBase();
try {
writeRoadmap(base, 'M001', `# M001: Test Milestone
**Vision:** Test planning phase.
## Slices
- [ ] **S01: Test Slice** \`risk:low\` \`depends:[]\`
> After this: Slice is done.
`);
const state = await deriveState(base);
assertEq(state.phase, 'planning', 'phase is planning');
assert(state.activeSlice !== null, 'activeSlice is not null');
assertEq(state.activeSlice?.id, 'S01', 'activeSlice id is S01');
assertEq(state.activeTask, null, 'activeTask is null');
assertEq(state.progress?.slices?.done, 0, 'slices done = 0');
assertEq(state.progress?.slices?.total, 1, 'slices total = 1');
} finally {
cleanup(base);
}
}
// ─── Test 4: roadmap + plan with incomplete tasks → executing ──────────
console.log('\n=== roadmap + plan with incomplete tasks → executing ===');
{
const base = createFixtureBase();
try {
writeRoadmap(base, 'M001', `# M001: Test Milestone
**Vision:** Test executing phase.
## Slices
- [ ] **S01: Test Slice** \`risk:low\` \`depends:[]\`
> After this: Slice is done.
`);
writePlan(base, 'M001', 'S01', `# S01: Test Slice
**Goal:** Test executing.
**Demo:** Tests pass.
## Tasks
- [ ] **T01: First** \`est:10m\`
First task description.
- [ ] **T02: Second** \`est:10m\`
Second task description.
`);
const state = await deriveState(base);
assertEq(state.phase, 'executing', 'phase is executing');
assert(state.activeTask !== null, 'activeTask is not null');
assertEq(state.activeTask?.id, 'T01', 'activeTask id is T01');
assertEq(state.progress?.tasks?.done, 0, 'tasks done = 0');
assertEq(state.progress?.tasks?.total, 2, 'tasks total = 2');
} finally {
cleanup(base);
}
}
// ─── Test 5: executing + continue file → resume message ─────────────
console.log('\n=== executing + continue file → resume message ===');
{
const base = createFixtureBase();
try {
writeRoadmap(base, 'M001', `# M001: Test Milestone
**Vision:** Test interrupted resume.
## Slices
- [ ] **S01: Test Slice** \`risk:low\` \`depends:[]\`
> After this: Slice is done.
`);
writePlan(base, 'M001', 'S01', `# S01: Test Slice
**Goal:** Test interrupted.
**Demo:** Tests pass.
## Tasks
- [ ] **T01: First Task** \`est:10m\`
First task description.
`);
writeContinue(base, 'M001', 'S01', `---
milestone: M001
slice: S01
task: T01
step: 2
totalSteps: 5
status: interrupted
savedAt: 2026-03-10T10:00:00Z
---
# Continue: T01
## Completed Work
Step 1 done.
## Remaining Work
Steps 2-5.
## Next Action
Continue from step 2.
`);
const state = await deriveState(base);
assertEq(state.phase, 'executing', 'interrupted: phase is executing');
assert(state.activeTask !== null, 'interrupted: activeTask is not null');
assertEq(state.activeTask?.id, 'T01', 'interrupted: activeTask id is T01');
assert(
state.nextAction.includes('Resume') || state.nextAction.includes('resume') || state.nextAction.includes('continue.md'),
'interrupted: nextAction mentions Resume/resume/continue.md'
);
} finally {
cleanup(base);
}
}
// ─── Test 6: all tasks done, slice not [x] → summarizing ──────────────
console.log('\n=== all tasks done, slice not [x] → summarizing ===');
{
const base = createFixtureBase();
try {
writeRoadmap(base, 'M001', `# M001: Test Milestone
**Vision:** Test summarizing phase.
## Slices
- [ ] **S01: Test Slice** \`risk:low\` \`depends:[]\`
> After this: Slice is done.
`);
writePlan(base, 'M001', 'S01', `# S01: Test Slice
**Goal:** Test summarizing.
**Demo:** Tests pass.
## Tasks
- [x] **T01: First Done** \`est:10m\`
Already completed.
- [x] **T02: Second Done** \`est:10m\`
Also completed.
`);
const state = await deriveState(base);
assertEq(state.phase, 'summarizing', 'summarizing: phase is summarizing');
assert(state.activeSlice !== null, 'summarizing: activeSlice is not null');
assertEq(state.activeSlice?.id, 'S01', 'summarizing: activeSlice id is S01');
assertEq(state.activeTask, null, 'summarizing: activeTask is null');
assert(
state.nextAction.toLowerCase().includes('summary') || state.nextAction.toLowerCase().includes('complete'),
'summarizing: nextAction mentions summary or complete'
);
assertEq(state.progress?.tasks?.done, 2, 'summarizing: tasks done = 2');
assertEq(state.progress?.tasks?.total, 2, 'summarizing: tasks total = 2');
} finally {
cleanup(base);
}
}
// ─── Test 7: all milestones complete → complete ────────────────────────
console.log('\n=== all milestones complete → complete ===');
{
const base = createFixtureBase();
try {
writeRoadmap(base, 'M001', `# M001: Test Milestone
**Vision:** Test complete phase.
## Slices
- [x] **S01: Done Slice** \`risk:low\` \`depends:[]\`
> After this: Done.
`);
writeMilestoneSummary(base, 'M001', `# M001 Summary\n\nMilestone complete.`);
const state = await deriveState(base);
assertEq(state.phase, 'complete', 'complete: phase is complete');
assertEq(state.activeSlice, null, 'complete: activeSlice is null');
assertEq(state.activeTask, null, 'complete: activeTask is null');
assert(
state.nextAction.toLowerCase().includes('complete'),
'complete: nextAction mentions complete'
);
assertEq(state.registry.length, 1, 'complete: registry has 1 entry');
assertEq(state.registry[0]?.status, 'complete', 'complete: registry[0] status is complete');
} finally {
cleanup(base);
}
}
// ─── Test 8: blocked dependencies ──────────────────────────────────────
console.log('\n=== blocked dependencies ===');
{
// Case A: S01 active (deps satisfied), S02 blocked on S01
const base1 = createFixtureBase();
try {
writeRoadmap(base1, 'M001', `# M001: Test Milestone
**Vision:** Test blocked deps.
## Slices
- [ ] **S01: First** \`risk:low\` \`depends:[]\`
> After this: S01 done.
- [ ] **S02: Second** \`risk:low\` \`depends:[S01]\`
> After this: S02 done.
`);
// S01 has a plan with incomplete task — it's the active slice
writePlan(base1, 'M001', 'S01', `# S01: First
**Goal:** First slice.
**Demo:** Tests pass.
## Tasks
- [ ] **T01: Incomplete** \`est:10m\`
Still working.
`);
const state1 = await deriveState(base1);
assertEq(state1.phase, 'executing', 'blocked-A: phase is executing (S01 active)');
assertEq(state1.activeSlice?.id, 'S01', 'blocked-A: activeSlice is S01');
} finally {
cleanup(base1);
}
// Case B: S01 depends on nonexistent S99 → truly blocked
const base2 = createFixtureBase();
try {
writeRoadmap(base2, 'M001', `# M001: Test Milestone
**Vision:** Test truly blocked.
## Slices
- [ ] **S01: Blocked** \`risk:low\` \`depends:[S99]\`
> After this: Done.
`);
const state2 = await deriveState(base2);
assertEq(state2.phase, 'blocked', 'blocked-B: phase is blocked');
assertEq(state2.activeSlice, null, 'blocked-B: activeSlice is null');
assert(state2.blockers.length > 0, 'blocked-B: blockers array is non-empty');
} finally {
cleanup(base2);
}
}
// ─── Test 9: multi-milestone registry ──────────────────────────────────
console.log('\n=== multi-milestone registry ===');
{
const base = createFixtureBase();
try {
// M001: complete (all slices done)
writeRoadmap(base, 'M001', `# M001: First Milestone
**Vision:** Already done.
## Slices
- [x] **S01: Done** \`risk:low\` \`depends:[]\`
> After this: Done.
`);
writeMilestoneSummary(base, 'M001', `# M001 Summary\n\nFirst milestone complete.`);
// M002: active (has incomplete slices)
writeRoadmap(base, 'M002', `# M002: Second Milestone
**Vision:** Currently active.
## Slices
- [ ] **S01: In Progress** \`risk:low\` \`depends:[]\`
> After this: Done.
`);
// M003: just a dir (no roadmap → pending since M002 is already active)
mkdirSync(join(base, '.gsd', 'milestones', 'M003'), { recursive: true });
const state = await deriveState(base);
assertEq(state.registry.length, 3, 'multi-ms: registry has 3 entries');
assertEq(state.registry[0]?.id, 'M001', 'multi-ms: registry[0] is M001');
assertEq(state.registry[0]?.status, 'complete', 'multi-ms: M001 is complete');
assertEq(state.registry[1]?.id, 'M002', 'multi-ms: registry[1] is M002');
assertEq(state.registry[1]?.status, 'active', 'multi-ms: M002 is active');
assertEq(state.registry[2]?.id, 'M003', 'multi-ms: registry[2] is M003');
assertEq(state.registry[2]?.status, 'pending', 'multi-ms: M003 is pending');
assertEq(state.activeMilestone?.id, 'M002', 'multi-ms: activeMilestone is M002');
assertEq(state.progress?.milestones?.done, 1, 'multi-ms: milestones done = 1');
assertEq(state.progress?.milestones?.total, 3, 'multi-ms: milestones total = 3');
} finally {
cleanup(base);
}
}
// ─── Test 10: requirements integration ─────────────────────────────────
console.log('\n=== requirements integration ===');
{
const base = createFixtureBase();
try {
writeRequirements(base, `# Requirements
## Active
### R001 First Active Requirement
- Status: active
- Description: Something active.
### R002 Second Active Requirement
- Status: active
- Description: Another active one.
## Validated
### R003 Validated Requirement
- Status: validated
- Description: Already validated.
## Deferred
### R004 Deferred Requirement
- Status: deferred
- Description: Pushed back.
### R005 Another Deferred
- Status: deferred
- Description: Also deferred.
## Out of Scope
### R006 Out of Scope Requirement
- Status: out-of-scope
- Description: Not doing this.
`);
// Need at least an empty milestones dir for deriveState
const state = await deriveState(base);
assert(state.requirements !== undefined, 'requirements: requirements object exists');
assertEq(state.requirements?.active, 2, 'requirements: active = 2');
assertEq(state.requirements?.validated, 1, 'requirements: validated = 1');
assertEq(state.requirements?.deferred, 2, 'requirements: deferred = 2');
assertEq(state.requirements?.outOfScope, 1, 'requirements: outOfScope = 1');
assertEq(state.requirements?.total, 6, 'requirements: total = 6 (sum of all)');
} finally {
cleanup(base);
}
}
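// ─── Reference sketch (illustrative) ───────────────────────────────────
// A minimal model of the requirements tally asserted in Test 10 above.
// `tallyRequirements` is a hypothetical helper; the real parser in
// state.ts may work differently (e.g. keying off section headings rather
// than `- Status:` lines).
const tallyRequirements = (md: string) => {
  const counts = { active: 0, validated: 0, deferred: 0, outOfScope: 0, total: 0 };
  for (const m of md.matchAll(/^- Status:\s*(\S+)/gm)) {
    if (m[1] === 'active') counts.active++;
    else if (m[1] === 'validated') counts.validated++;
    else if (m[1] === 'deferred') counts.deferred++;
    else if (m[1] === 'out-of-scope') counts.outOfScope++;
    counts.total++;
  }
  return counts;
};
// Sanity check against the fixture shape used above.
const tally = tallyRequirements('- Status: active\n- Status: active\n- Status: validated\n- Status: deferred\n- Status: deferred\n- Status: out-of-scope\n');
if (tally.active !== 2 || tally.validated !== 1 || tally.deferred !== 2 || tally.outOfScope !== 1 || tally.total !== 6) {
  throw new Error('reference sketch out of sync with requirements tally');
}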
// ─── Test 11: all slices [x], no summary → completing-milestone ────────
console.log('\n=== all slices [x], no summary → completing-milestone ===');
{
const base = createFixtureBase();
try {
writeRoadmap(base, 'M001', `# M001: Test Milestone
**Vision:** Test completing-milestone phase.
## Slices
- [x] **S01: First Done** \`risk:low\` \`depends:[]\`
> After this: S01 complete.
- [x] **S02: Second Done** \`risk:low\` \`depends:[S01]\`
> After this: S02 complete.
`);
const state = await deriveState(base);
assertEq(state.phase, 'completing-milestone', 'completing-ms: phase is completing-milestone');
assert(state.activeMilestone !== null, 'completing-ms: activeMilestone is not null');
assertEq(state.activeMilestone?.id, 'M001', 'completing-ms: activeMilestone id is M001');
assertEq(state.activeSlice, null, 'completing-ms: activeSlice is null');
assertEq(state.activeTask, null, 'completing-ms: activeTask is null');
assertEq(state.registry.length, 1, 'completing-ms: registry has 1 entry');
assertEq(state.registry[0]?.status, 'active', 'completing-ms: registry[0] status is active (not complete)');
assertEq(state.progress?.slices?.done, 2, 'completing-ms: slices done = 2');
assertEq(state.progress?.slices?.total, 2, 'completing-ms: slices total = 2');
assert(
state.nextAction.toLowerCase().includes('summary') || state.nextAction.toLowerCase().includes('complete'),
'completing-ms: nextAction mentions summary or complete'
);
} finally {
cleanup(base);
}
}
// ─── Test 12: all slices [x], summary exists → complete ───────────────
console.log('\n=== all slices [x], summary exists → complete ===');
{
const base = createFixtureBase();
try {
writeRoadmap(base, 'M001', `# M001: Test Milestone
**Vision:** Test that summary presence means complete.
## Slices
- [x] **S01: Done** \`risk:low\` \`depends:[]\`
> After this: Done.
`);
writeMilestoneSummary(base, 'M001', `# M001 Summary\n\nMilestone is complete.`);
const state = await deriveState(base);
assertEq(state.phase, 'complete', 'summary-exists: phase is complete');
assertEq(state.registry.length, 1, 'summary-exists: registry has 1 entry');
assertEq(state.registry[0]?.status, 'complete', 'summary-exists: registry[0] status is complete');
assertEq(state.activeSlice, null, 'summary-exists: activeSlice is null');
assertEq(state.activeTask, null, 'summary-exists: activeTask is null');
} finally {
cleanup(base);
}
}
// ─── Test 13: multi-milestone completing-milestone ─────────────────────
console.log('\n=== multi-milestone completing-milestone ===');
{
const base = createFixtureBase();
try {
// M001: all slices done + summary exists → complete
writeRoadmap(base, 'M001', `# M001: First Milestone
**Vision:** Already complete with summary.
## Slices
- [x] **S01: Done** \`risk:low\` \`depends:[]\`
> After this: Done.
`);
writeMilestoneSummary(base, 'M001', `# M001 Summary\n\nFirst milestone complete.`);
// M002: all slices done, no summary → completing-milestone
writeRoadmap(base, 'M002', `# M002: Second Milestone
**Vision:** All slices done but no summary.
## Slices
- [x] **S01: Done** \`risk:low\` \`depends:[]\`
> After this: Done.
- [x] **S02: Also Done** \`risk:low\` \`depends:[S01]\`
> After this: Done.
`);
// M003: has incomplete slices → pending (M002 is active)
writeRoadmap(base, 'M003', `# M003: Third Milestone
**Vision:** Not yet started.
## Slices
- [ ] **S01: Not Started** \`risk:low\` \`depends:[]\`
> After this: Done.
`);
const state = await deriveState(base);
assertEq(state.phase, 'completing-milestone', 'multi-completing: phase is completing-milestone');
assertEq(state.activeMilestone?.id, 'M002', 'multi-completing: activeMilestone is M002');
assertEq(state.activeSlice, null, 'multi-completing: activeSlice is null');
assertEq(state.activeTask, null, 'multi-completing: activeTask is null');
assertEq(state.registry.length, 3, 'multi-completing: registry has 3 entries');
assertEq(state.registry[0]?.id, 'M001', 'multi-completing: registry[0] is M001');
assertEq(state.registry[0]?.status, 'complete', 'multi-completing: M001 is complete');
assertEq(state.registry[1]?.id, 'M002', 'multi-completing: registry[1] is M002');
assertEq(state.registry[1]?.status, 'active', 'multi-completing: M002 is active (completing-milestone)');
assertEq(state.registry[2]?.id, 'M003', 'multi-completing: registry[2] is M003');
assertEq(state.registry[2]?.status, 'pending', 'multi-completing: M003 is pending');
assertEq(state.progress?.milestones?.done, 1, 'multi-completing: milestones done = 1');
assertEq(state.progress?.milestones?.total, 3, 'multi-completing: milestones total = 3');
assertEq(state.progress?.slices?.done, 2, 'multi-completing: slices done = 2');
assertEq(state.progress?.slices?.total, 2, 'multi-completing: slices total = 2');
} finally {
cleanup(base);
}
}
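// Tests 11-13 pin down a milestone-phase rule: all slices checked with no
// summary file means `completing-milestone`, and a summary flips it to
// `complete`. A minimal sketch of that rule, assuming the phase derives only
// from roadmap checkboxes plus summary-file presence (the "executing" label
// below is hypothetical, not taken from deriveState):

```typescript
// Sketch only, not the real deriveState logic. Assumes a single milestone's
// phase depends only on its slice checkboxes and summary-file presence.
function milestonePhaseSketch(sliceDone: boolean[], hasSummary: boolean): string {
  const allDone = sliceDone.length > 0 && sliceDone.every(Boolean);
  if (!allDone) return "executing"; // hypothetical label for the in-progress case
  return hasSummary ? "complete" : "completing-milestone";
}
```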
// ═════════════════════════════════════════════════════════════════════════
// Results
// ═════════════════════════════════════════════════════════════════════════
console.log(`\n${'='.repeat(40)}`);
console.log(`Results: ${passed} passed, ${failed} failed`);
if (failed > 0) {
process.exit(1);
} else {
console.log('All tests passed ✓');
}
}
main().catch((error) => {
console.error(error);
process.exit(1);
});

import { mkdtempSync, mkdirSync, readFileSync, rmSync, writeFileSync, existsSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";
import { formatDoctorReport, runGSDDoctor, summarizeDoctorIssues, filterDoctorIssues, selectDoctorScope } from "../doctor.js";
let passed = 0;
let failed = 0;
function assert(condition: boolean, message: string): void {
if (condition) {
passed++;
} else {
failed++;
console.error(` FAIL: ${message}`);
}
}
function assertEq<T>(actual: T, expected: T, message: string): void {
if (JSON.stringify(actual) === JSON.stringify(expected)) {
passed++;
} else {
failed++;
console.error(` FAIL: ${message} — expected ${JSON.stringify(expected)}, got ${JSON.stringify(actual)}`);
}
}
const tmpBase = mkdtempSync(join(tmpdir(), "gsd-doctor-test-"));
const gsd = join(tmpBase, ".gsd");
const mDir = join(gsd, "milestones", "M001");
const sDir = join(mDir, "slices", "S01");
const tDir = join(sDir, "tasks");
mkdirSync(tDir, { recursive: true });
writeFileSync(join(mDir, "M001-ROADMAP.md"), `# M001: Test Milestone
## Slices
- [ ] **S01: Demo Slice** \`risk:low\` \`depends:[]\`
> After this: demo works
`);
writeFileSync(join(sDir, "S01-PLAN.md"), `# S01: Demo Slice
**Goal:** Demo
**Demo:** Demo
## Must-Haves
- done
## Tasks
- [x] **T01: Implement thing** \`est:10m\`
Task is complete.
`);
writeFileSync(join(tDir, "T01-SUMMARY.md"), `---
id: T01
parent: S01
milestone: M001
provides: []
requires: []
affects: []
key_files: []
key_decisions: []
patterns_established: []
observability_surfaces: []
drill_down_paths: []
duration: 10m
verification_result: passed
completed_at: 2026-03-09T00:00:00Z
---
# T01: Implement thing
**Done**
## What Happened
Implemented.
## Diagnostics
- log
`);
async function main(): Promise<void> {
console.log("\n=== doctor diagnose ===");
{
const report = await runGSDDoctor(tmpBase, { fix: false });
assert(!report.ok, "report is not ok when completion artifacts are missing");
assert(report.issues.some(issue => issue.code === "all_tasks_done_missing_slice_summary"), "detects missing slice summary");
assert(report.issues.some(issue => issue.code === "all_tasks_done_missing_slice_uat"), "detects missing slice UAT");
}
console.log("\n=== doctor formatting ===");
{
const report = await runGSDDoctor(tmpBase, { fix: false });
const summary = summarizeDoctorIssues(report.issues);
assertEq(summary.errors, 2, "two blocking errors in summary");
const scoped = filterDoctorIssues(report.issues, { scope: "M001/S01", includeWarnings: true });
assert(scoped.length >= 2, "scope filter keeps slice issues");
const text = formatDoctorReport(report, { scope: "M001/S01", includeWarnings: true, maxIssues: 5 });
assert(text.includes("Scope: M001/S01"), "formatted report shows scope");
assert(text.includes("Top issue types:"), "formatted report shows grouped issue types");
}
console.log("\n=== doctor default scope ===");
{
const scope = await selectDoctorScope(tmpBase);
assertEq(scope, "M001/S01", "default doctor scope targets the active slice");
}
console.log("\n=== doctor fix ===");
{
const report = await runGSDDoctor(tmpBase, { fix: true });
if (report.fixesApplied.length < 3) console.error(report);
assert(report.fixesApplied.length >= 3, "applies multiple fixes");
assert(existsSync(join(sDir, "S01-SUMMARY.md")), "creates placeholder slice summary");
assert(existsSync(join(sDir, "S01-UAT.md")), "creates placeholder UAT");
const plan = readFileSync(join(sDir, "S01-PLAN.md"), "utf-8");
assert(plan.includes("- [x] **T01:"), "marks task checkbox done");
const roadmap = readFileSync(join(mDir, "M001-ROADMAP.md"), "utf-8");
assert(roadmap.includes("- [x] **S01:"), "marks slice checkbox done");
const state = readFileSync(join(gsd, "STATE.md"), "utf-8");
assert(state.includes("# GSD State"), "writes state file");
}
rmSync(tmpBase, { recursive: true, force: true });
// ─── Milestone summary detection: missing summary ──────────────────────
console.log("\n=== doctor detects missing milestone summary ===");
{
const msBase = mkdtempSync(join(tmpdir(), "gsd-doctor-ms-test-"));
const msGsd = join(msBase, ".gsd");
const msMDir = join(msGsd, "milestones", "M001");
const msSDir = join(msMDir, "slices", "S01");
const msTDir = join(msSDir, "tasks");
mkdirSync(msTDir, { recursive: true });
// Roadmap with ALL slices [x] — milestone is complete by slice status
writeFileSync(join(msMDir, "M001-ROADMAP.md"), `# M001: Test Milestone
## Slices
- [x] **S01: Done Slice** \`risk:low\` \`depends:[]\`
> After this: done
`);
// Slice has plan with all tasks done
writeFileSync(join(msSDir, "S01-PLAN.md"), `# S01: Done Slice
**Goal:** Done
**Demo:** Done
## Tasks
- [x] **T01: Done Task** \`est:10m\`
Done.
`);
// Task summary exists
writeFileSync(join(msTDir, "T01-SUMMARY.md"), `---
id: T01
parent: S01
milestone: M001
---
# T01: Done
**Done**
## What Happened
Done.
`);
// Slice summary exists (so slice-level checks pass)
writeFileSync(join(msSDir, "S01-SUMMARY.md"), `---
id: S01
parent: M001
---
# S01: Done
`);
// Slice UAT exists (so slice-level checks pass)
writeFileSync(join(msSDir, "S01-UAT.md"), `# S01 UAT\nDone.\n`);
// NO milestone summary — this is the condition we're detecting
const report = await runGSDDoctor(msBase, { fix: false });
assert(
report.issues.some(issue => issue.code === "all_slices_done_missing_milestone_summary"),
"detects missing milestone summary when all slices are done"
);
const msIssue = report.issues.find(issue => issue.code === "all_slices_done_missing_milestone_summary");
assertEq(msIssue?.scope, "milestone", "milestone summary issue has scope 'milestone'");
assertEq(msIssue?.severity, "warning", "milestone summary issue has severity 'warning'");
assertEq(msIssue?.unitId, "M001", "milestone summary issue unitId is 'M001'");
assert(msIssue?.message?.includes("SUMMARY") ?? false, "milestone summary issue message mentions SUMMARY");
rmSync(msBase, { recursive: true, force: true });
}
// ─── Milestone summary detection: summary present (no false positive) ──
console.log("\n=== doctor does NOT flag milestone with summary ===");
{
const msBase = mkdtempSync(join(tmpdir(), "gsd-doctor-ms-ok-test-"));
const msGsd = join(msBase, ".gsd");
const msMDir = join(msGsd, "milestones", "M001");
const msSDir = join(msMDir, "slices", "S01");
const msTDir = join(msSDir, "tasks");
mkdirSync(msTDir, { recursive: true });
// Roadmap with ALL slices [x]
writeFileSync(join(msMDir, "M001-ROADMAP.md"), `# M001: Test Milestone
## Slices
- [x] **S01: Done Slice** \`risk:low\` \`depends:[]\`
> After this: done
`);
writeFileSync(join(msSDir, "S01-PLAN.md"), `# S01: Done Slice
**Goal:** Done
**Demo:** Done
## Tasks
- [x] **T01: Done Task** \`est:10m\`
Done.
`);
writeFileSync(join(msTDir, "T01-SUMMARY.md"), `---
id: T01
parent: S01
milestone: M001
---
# T01: Done
**Done**
## What Happened
Done.
`);
writeFileSync(join(msSDir, "S01-SUMMARY.md"), `---
id: S01
parent: M001
---
# S01: Done
`);
writeFileSync(join(msSDir, "S01-UAT.md"), `# S01 UAT\nDone.\n`);
// Milestone summary EXISTS
writeFileSync(join(msMDir, "M001-SUMMARY.md"), `# M001 Summary\n\nMilestone complete.`);
const report = await runGSDDoctor(msBase, { fix: false });
assert(
!report.issues.some(issue => issue.code === "all_slices_done_missing_milestone_summary"),
"does NOT report missing milestone summary when summary exists"
);
rmSync(msBase, { recursive: true, force: true });
}
// ─── blocker_discovered_no_replan detection ────────────────────────────
console.log("\n=== doctor detects blocker_discovered_no_replan ===");
{
const bBase = mkdtempSync(join(tmpdir(), "gsd-doctor-blocker-test-"));
const bGsd = join(bBase, ".gsd");
const bMDir = join(bGsd, "milestones", "M001");
const bSDir = join(bMDir, "slices", "S01");
const bTDir = join(bSDir, "tasks");
mkdirSync(bTDir, { recursive: true });
writeFileSync(join(bMDir, "M001-ROADMAP.md"), `# M001: Test Milestone
## Slices
- [ ] **S01: Test Slice** \`risk:low\` \`depends:[]\`
> After this: stuff works
`);
writeFileSync(join(bSDir, "S01-PLAN.md"), `# S01: Test Slice
**Goal:** Test
**Demo:** Test
## Tasks
- [x] **T01: First task** \`est:10m\`
First task.
- [ ] **T02: Second task** \`est:10m\`
Second task.
`);
// Task summary with blocker_discovered: true
writeFileSync(join(bTDir, "T01-SUMMARY.md"), `---
id: T01
parent: S01
milestone: M001
provides: []
key_files: []
key_decisions: []
patterns_established: []
observability_surfaces: []
duration: 10m
verification_result: passed
completed_at: 2026-03-10T00:00:00Z
blocker_discovered: true
---
# T01: First task
**Found a blocker.**
## What Happened
Discovered an issue.
`);
// No REPLAN.md — should trigger the issue
const report = await runGSDDoctor(bBase, { fix: false });
const blockerIssues = report.issues.filter(i => i.code === "blocker_discovered_no_replan");
assert(blockerIssues.length > 0, "detects blocker_discovered_no_replan");
assertEq(blockerIssues[0]?.severity, "warning", "blocker issue has warning severity");
assertEq(blockerIssues[0]?.scope, "slice", "blocker issue has slice scope");
assert(blockerIssues[0]?.message?.includes("T01") ?? false, "blocker issue message mentions T01");
assert(blockerIssues[0]?.message?.includes("S01") ?? false, "blocker issue message mentions S01");
rmSync(bBase, { recursive: true, force: true });
}
// ─── blocker_discovered with REPLAN.md (no false positive) ─────────────
console.log("\n=== doctor does NOT flag blocker when REPLAN.md exists ===");
{
const bBase = mkdtempSync(join(tmpdir(), "gsd-doctor-blocker-ok-test-"));
const bGsd = join(bBase, ".gsd");
const bMDir = join(bGsd, "milestones", "M001");
const bSDir = join(bMDir, "slices", "S01");
const bTDir = join(bSDir, "tasks");
mkdirSync(bTDir, { recursive: true });
writeFileSync(join(bMDir, "M001-ROADMAP.md"), `# M001: Test Milestone
## Slices
- [ ] **S01: Test Slice** \`risk:low\` \`depends:[]\`
> After this: stuff works
`);
writeFileSync(join(bSDir, "S01-PLAN.md"), `# S01: Test Slice
**Goal:** Test
**Demo:** Test
## Tasks
- [x] **T01: First task** \`est:10m\`
First task.
- [ ] **T02: Second task** \`est:10m\`
Second task.
`);
writeFileSync(join(bTDir, "T01-SUMMARY.md"), `---
id: T01
parent: S01
milestone: M001
blocker_discovered: true
completed_at: 2026-03-10T00:00:00Z
---
# T01: First task
**Found a blocker.**
## What Happened
Discovered an issue.
`);
// REPLAN.md exists — should NOT trigger
writeFileSync(join(bSDir, "S01-REPLAN.md"), `# Replan\n\nAlready replanned.`);
const report = await runGSDDoctor(bBase, { fix: false });
const blockerIssues = report.issues.filter(i => i.code === "blocker_discovered_no_replan");
assertEq(blockerIssues.length, 0, "no blocker_discovered_no_replan when REPLAN.md exists");
rmSync(bBase, { recursive: true, force: true });
}
// ─── Must-have verification: all addressed → no issue ─────────────────
console.log("\n=== doctor: done task with must-haves all addressed → no issue ===");
{
const mhBase = mkdtempSync(join(tmpdir(), "gsd-doctor-mh-ok-"));
const mhGsd = join(mhBase, ".gsd");
const mhMDir = join(mhGsd, "milestones", "M001");
const mhSDir = join(mhMDir, "slices", "S01");
const mhTDir = join(mhSDir, "tasks");
mkdirSync(mhTDir, { recursive: true });
writeFileSync(join(mhMDir, "M001-ROADMAP.md"), `# M001: Test\n\n## Slices\n- [ ] **S01: Slice** \`risk:low\` \`depends:[]\`\n > After this: done\n`);
writeFileSync(join(mhSDir, "S01-PLAN.md"), `# S01: Slice\n\n**Goal:** Demo\n**Demo:** Demo\n\n## Tasks\n- [x] **T01: Implement** \`est:10m\`\n Done.\n`);
// Task plan with must-haves
writeFileSync(join(mhTDir, "T01-PLAN.md"), `# T01: Implement\n\n## Must-Haves\n\n- [ ] \`parseWidgets\` function exported\n- [ ] Unit tests pass with zero failures\n`);
// Summary mentioning both must-haves
writeFileSync(join(mhTDir, "T01-SUMMARY.md"), `---\nid: T01\nparent: S01\nmilestone: M001\n---\n# T01: Implement\n\n## What Happened\nAdded parseWidgets function. Unit tests pass with zero failures.\n`);
const report = await runGSDDoctor(mhBase, { fix: false });
assert(
!report.issues.some(i => i.code === "task_done_must_haves_not_verified"),
"no must-have issue when all must-haves are addressed"
);
rmSync(mhBase, { recursive: true, force: true });
}
// ─── Must-have verification: not addressed → warning fired ───────────
console.log("\n=== doctor: done task with must-haves NOT addressed → warning ===");
{
const mhBase = mkdtempSync(join(tmpdir(), "gsd-doctor-mh-fail-"));
const mhGsd = join(mhBase, ".gsd");
const mhMDir = join(mhGsd, "milestones", "M001");
const mhSDir = join(mhMDir, "slices", "S01");
const mhTDir = join(mhSDir, "tasks");
mkdirSync(mhTDir, { recursive: true });
writeFileSync(join(mhMDir, "M001-ROADMAP.md"), `# M001: Test\n\n## Slices\n- [ ] **S01: Slice** \`risk:low\` \`depends:[]\`\n > After this: done\n`);
writeFileSync(join(mhSDir, "S01-PLAN.md"), `# S01: Slice\n\n**Goal:** Demo\n**Demo:** Demo\n\n## Tasks\n- [x] **T01: Implement** \`est:10m\`\n Done.\n`);
// Task plan with 3 must-haves
writeFileSync(join(mhTDir, "T01-PLAN.md"), `# T01: Implement\n\n## Must-Haves\n\n- [ ] \`parseWidgets\` function exported\n- [ ] \`countWidgets\` utility added\n- [ ] Full regression suite passes\n`);
// Summary mentions only parseWidgets — the other two are missing
writeFileSync(join(mhTDir, "T01-SUMMARY.md"), `---\nid: T01\nparent: S01\nmilestone: M001\n---\n# T01: Implement\n\n## What Happened\nAdded parseWidgets function.\n`);
const report = await runGSDDoctor(mhBase, { fix: false });
const mhIssue = report.issues.find(i => i.code === "task_done_must_haves_not_verified");
assert(!!mhIssue, "must-have issue is fired when summary doesn't address all must-haves");
assertEq(mhIssue?.severity, "warning", "must-have issue is warning severity");
assertEq(mhIssue?.scope, "task", "must-have issue scope is task");
assert(mhIssue?.message?.includes("3 must-haves") ?? false, "message mentions total must-have count");
assert(mhIssue?.message?.includes("only 1") ?? false, "message mentions addressed count");
assertEq(mhIssue?.fixable, false, "must-have issue is not fixable");
rmSync(mhBase, { recursive: true, force: true });
}
// ─── Must-have verification: no task plan → no issue ─────────────────
console.log("\n=== doctor: done task with no task plan file → no issue ===");
{
const mhBase = mkdtempSync(join(tmpdir(), "gsd-doctor-mh-noplan-"));
const mhGsd = join(mhBase, ".gsd");
const mhMDir = join(mhGsd, "milestones", "M001");
const mhSDir = join(mhMDir, "slices", "S01");
const mhTDir = join(mhSDir, "tasks");
mkdirSync(mhTDir, { recursive: true });
writeFileSync(join(mhMDir, "M001-ROADMAP.md"), `# M001: Test\n\n## Slices\n- [ ] **S01: Slice** \`risk:low\` \`depends:[]\`\n > After this: done\n`);
writeFileSync(join(mhSDir, "S01-PLAN.md"), `# S01: Slice\n\n**Goal:** Demo\n**Demo:** Demo\n\n## Tasks\n- [x] **T01: Implement** \`est:10m\`\n Done.\n`);
// NO task plan file — just a summary
writeFileSync(join(mhTDir, "T01-SUMMARY.md"), `---\nid: T01\nparent: S01\nmilestone: M001\n---\n# T01: Implement\n\n## What Happened\nDone.\n`);
const report = await runGSDDoctor(mhBase, { fix: false });
assert(
!report.issues.some(i => i.code === "task_done_must_haves_not_verified"),
"no must-have issue when task plan file doesn't exist"
);
rmSync(mhBase, { recursive: true, force: true });
}
// ─── Must-have verification: plan exists but no Must-Haves section → no issue
console.log("\n=== doctor: done task with plan but no Must-Haves section → no issue ===");
{
const mhBase = mkdtempSync(join(tmpdir(), "gsd-doctor-mh-nosect-"));
const mhGsd = join(mhBase, ".gsd");
const mhMDir = join(mhGsd, "milestones", "M001");
const mhSDir = join(mhMDir, "slices", "S01");
const mhTDir = join(mhSDir, "tasks");
mkdirSync(mhTDir, { recursive: true });
writeFileSync(join(mhMDir, "M001-ROADMAP.md"), `# M001: Test\n\n## Slices\n- [ ] **S01: Slice** \`risk:low\` \`depends:[]\`\n > After this: done\n`);
writeFileSync(join(mhSDir, "S01-PLAN.md"), `# S01: Slice\n\n**Goal:** Demo\n**Demo:** Demo\n\n## Tasks\n- [x] **T01: Implement** \`est:10m\`\n Done.\n`);
// Task plan with NO Must-Haves section
writeFileSync(join(mhTDir, "T01-PLAN.md"), `# T01: Implement\n\n## Steps\n\n1. Do the thing.\n\n## Verification\n\n- Run tests.\n`);
writeFileSync(join(mhTDir, "T01-SUMMARY.md"), `---\nid: T01\nparent: S01\nmilestone: M001\n---\n# T01: Implement\n\n## What Happened\nDone.\n`);
const report = await runGSDDoctor(mhBase, { fix: false });
assert(
!report.issues.some(i => i.code === "task_done_must_haves_not_verified"),
"no must-have issue when task plan has no Must-Haves section"
);
rmSync(mhBase, { recursive: true, force: true });
}
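// The three must-have tests above imply a heuristic check rather than exact
// phrase matching: the failing case counts the parseWidgets item as addressed
// even though the summary never repeats the full item text. One sketch
// consistent with all three outcomes, assuming items sit under a
// "## Must-Haves" heading and an item counts as addressed when most of its
// significant words appear in the summary (not the doctor's real algorithm):

```typescript
// Sketch: must-have verification heuristic inferred from the tests above.
// Assumptions: items are "- [ ]" checklist lines under "## Must-Haves"; an
// item is addressed when more than half of its words longer than 3 characters
// appear in the summary, case-insensitively.
function countAddressedMustHaves(plan: string, summary: string): { total: number; addressed: number } {
  const section = plan.split(/^## Must-Haves/m)[1]?.split(/^## /m)[0] ?? "";
  const items = [...section.matchAll(/^- \[[ x]\] (.+)$/gm)].map(m => (m[1] ?? "").replace(/`/g, ""));
  const body = summary.toLowerCase();
  const addressed = items.filter(item => {
    const words = item.toLowerCase().split(/\W+/).filter(w => w.length > 3);
    return words.length > 0 && words.filter(w => body.includes(w)).length / words.length > 0.5;
  }).length;
  return { total: items.length, addressed };
}
```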
console.log(`\n${"=".repeat(40)}`);
console.log(`Results: ${passed} passed, ${failed} failed`);
if (failed > 0) {
process.exit(1);
} else {
console.log("All tests passed ✓");
}
}
main().catch((error) => {
console.error(error);
process.exit(1);
});

/**
* Tests for GSD metrics disk I/O: init, snapshot, and the load/save cycle.
* Uses a temp directory to avoid touching real .gsd/ state.
*/
import { mkdtempSync, mkdirSync, readFileSync, rmSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";
import {
initMetrics,
resetMetrics,
getLedger,
snapshotUnitMetrics,
type MetricsLedger,
} from "../metrics.js";
let passed = 0;
let failed = 0;
function assert(condition: boolean, message: string): void {
if (condition) {
passed++;
} else {
failed++;
console.error(` FAIL: ${message}`);
}
}
function assertEq<T>(actual: T, expected: T, message: string): void {
if (JSON.stringify(actual) === JSON.stringify(expected)) {
passed++;
} else {
failed++;
console.error(` FAIL: ${message} — expected ${JSON.stringify(expected)}, got ${JSON.stringify(actual)}`);
}
}
// ─── Setup ────────────────────────────────────────────────────────────────────
const tmpBase = mkdtempSync(join(tmpdir(), "gsd-metrics-test-"));
mkdirSync(join(tmpBase, ".gsd"), { recursive: true });
// Mock ExtensionContext with session entries
function mockCtx(messages: any[] = []): any {
const entries = messages.map((msg, i) => ({
type: "message",
id: `entry-${i}`,
parentId: i > 0 ? `entry-${i - 1}` : null,
timestamp: new Date().toISOString(),
message: msg,
}));
return {
sessionManager: {
getEntries: () => entries,
},
model: { id: "claude-sonnet-4-20250514" },
};
}
// ─── Tests ────────────────────────────────────────────────────────────────────
console.log("\n=== initMetrics / getLedger ===");
{
resetMetrics();
assert(getLedger() === null, "ledger null before init");
initMetrics(tmpBase);
const ledger = getLedger();
assert(ledger !== null, "ledger not null after init");
assertEq(ledger!.version, 1, "version is 1");
assertEq(ledger!.units.length, 0, "no units initially");
}
console.log("\n=== snapshotUnitMetrics ===");
{
resetMetrics();
initMetrics(tmpBase);
// Simulate a session with assistant messages containing usage data
const ctx = mockCtx([
{ role: "user", content: "Do the thing" },
{
role: "assistant",
content: [
{ type: "text", text: "I'll do the thing" },
{ type: "tool_call", id: "tc1", name: "bash", input: {} },
],
usage: {
input: 5000,
output: 2000,
cacheRead: 3000,
cacheWrite: 500,
totalTokens: 10500,
cost: { input: 0.015, output: 0.03, cacheRead: 0.003, cacheWrite: 0.002, total: 0.05 },
},
},
{ role: "toolResult", toolCallId: "tc1", content: [{ type: "text", text: "ok" }] },
{
role: "assistant",
content: [{ type: "text", text: "Done!" }],
usage: {
input: 8000,
output: 1000,
cacheRead: 6000,
cacheWrite: 200,
totalTokens: 15200,
cost: { input: 0.024, output: 0.015, cacheRead: 0.006, cacheWrite: 0.001, total: 0.046 },
},
},
]);
const unit = snapshotUnitMetrics(ctx, "execute-task", "M001/S01/T01", Date.now() - 5000, "claude-sonnet-4-20250514");
assert(unit !== null, "unit returned");
assertEq(unit!.type, "execute-task", "type");
assertEq(unit!.id, "M001/S01/T01", "id");
assertEq(unit!.tokens.input, 13000, "input tokens (5000+8000)");
assertEq(unit!.tokens.output, 3000, "output tokens (2000+1000)");
assertEq(unit!.tokens.cacheRead, 9000, "cacheRead (3000+6000)");
assertEq(unit!.tokens.total, 25700, "total tokens (10500+15200)");
assert(Math.abs(unit!.cost - 0.096) < 0.001, `cost ~0.096 (got ${unit!.cost})`);
assertEq(unit!.toolCalls, 1, "1 tool call");
assertEq(unit!.assistantMessages, 2, "2 assistant messages");
assertEq(unit!.userMessages, 1, "1 user message");
// Verify ledger persisted
const ledger = getLedger()!;
assertEq(ledger.units.length, 1, "1 unit in ledger");
}
console.log("\n=== Persistence across init/reset cycles ===");
{
// Reset and re-init — should load from disk
resetMetrics();
initMetrics(tmpBase);
const ledger = getLedger()!;
assertEq(ledger.units.length, 1, "unit survived reset+init");
assertEq(ledger.units[0].id, "M001/S01/T01", "correct unit ID");
// Add another unit
const ctx = mockCtx([
{
role: "assistant",
content: [{ type: "text", text: "Research complete" }],
usage: {
input: 3000, output: 1500, cacheRead: 1000, cacheWrite: 300, totalTokens: 5800,
cost: { input: 0.009, output: 0.023, cacheRead: 0.001, cacheWrite: 0.001, total: 0.034 },
},
},
]);
snapshotUnitMetrics(ctx, "research-slice", "M001/S02", Date.now() - 3000, "claude-sonnet-4-20250514");
// Verify both units persisted
resetMetrics();
initMetrics(tmpBase);
const final = getLedger()!;
assertEq(final.units.length, 2, "2 units after second snapshot");
}
console.log("\n=== File content verification ===");
{
const raw = readFileSync(join(tmpBase, ".gsd", "metrics.json"), "utf-8");
const parsed: MetricsLedger = JSON.parse(raw);
assertEq(parsed.version, 1, "file version is 1");
assertEq(parsed.units.length, 2, "file has 2 units");
assert(parsed.projectStartedAt > 0, "projectStartedAt is set");
}
console.log("\n=== Empty session handling ===");
{
resetMetrics();
initMetrics(tmpBase);
// Empty session — no messages
const ctx = mockCtx([]);
const unit = snapshotUnitMetrics(ctx, "plan-slice", "M001/S01", Date.now(), "test-model");
assert(unit === null, "returns null for empty session");
// Ledger shouldn't have grown
assertEq(getLedger()!.units.length, 2, "still 2 units (empty session not added)");
}
// ─── Cleanup ──────────────────────────────────────────────────────────────────
resetMetrics();
rmSync(tmpBase, { recursive: true, force: true });
console.log(`\n${"=".repeat(40)}`);
console.log(`Results: ${passed} passed, ${failed} failed`);
if (failed > 0) {
process.exit(1);
} else {
console.log("All tests passed ✓");
}

/**
* Tests for GSD metrics aggregation logic.
* Tests the pure functions: no file I/O, no extension context.
*/
import {
type UnitMetrics,
type TokenCounts,
classifyUnitPhase,
aggregateByPhase,
aggregateBySlice,
aggregateByModel,
getProjectTotals,
formatCost,
formatTokenCount,
} from "../metrics.js";
// ─── Test helpers ─────────────────────────────────────────────────────────────
function makeUnit(overrides: Partial<UnitMetrics> = {}): UnitMetrics {
return {
type: "execute-task",
id: "M001/S01/T01",
model: "claude-sonnet-4-20250514",
startedAt: 1000,
finishedAt: 2000,
tokens: { input: 1000, output: 500, cacheRead: 200, cacheWrite: 100, total: 1800 },
cost: 0.05,
toolCalls: 3,
assistantMessages: 2,
userMessages: 1,
...overrides,
};
}
let passed = 0;
let failed = 0;
function assert(condition: boolean, message: string): void {
if (condition) {
passed++;
} else {
failed++;
console.error(` FAIL: ${message}`);
}
}
function assertEq<T>(actual: T, expected: T, message: string): void {
if (actual === expected) {
passed++;
} else {
failed++;
console.error(` FAIL: ${message} — expected ${JSON.stringify(expected)}, got ${JSON.stringify(actual)}`);
}
}
function assertClose(actual: number, expected: number, tolerance: number, message: string): void {
if (Math.abs(actual - expected) <= tolerance) {
passed++;
} else {
failed++;
console.error(` FAIL: ${message} — expected ~${expected}, got ${actual}`);
}
}
// ─── Phase classification ─────────────────────────────────────────────────────
console.log("\n=== classifyUnitPhase ===");
assertEq(classifyUnitPhase("research-milestone"), "research", "research-milestone → research");
assertEq(classifyUnitPhase("research-slice"), "research", "research-slice → research");
assertEq(classifyUnitPhase("plan-milestone"), "planning", "plan-milestone → planning");
assertEq(classifyUnitPhase("plan-slice"), "planning", "plan-slice → planning");
assertEq(classifyUnitPhase("execute-task"), "execution", "execute-task → execution");
assertEq(classifyUnitPhase("complete-slice"), "completion", "complete-slice → completion");
assertEq(classifyUnitPhase("reassess-roadmap"), "reassessment", "reassess-roadmap → reassessment");
assertEq(classifyUnitPhase("unknown-thing"), "execution", "unknown → execution (fallback)");
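// The assertions above fully determine a prefix-based mapping. A sketch
// consistent with them, assuming unit types follow verb-noun naming with
// execution as the fallback (an inference from these tests, not the actual
// classifyUnitPhase implementation):

```typescript
// Sketch: prefix classifier matching the assertions above.
function classifyUnitPhaseSketch(unitType: string): string {
  if (unitType.startsWith("research-")) return "research";
  if (unitType.startsWith("plan-")) return "planning";
  if (unitType.startsWith("complete-")) return "completion";
  if (unitType.startsWith("reassess-")) return "reassessment";
  return "execution"; // covers execute-* and the unknown fallback
}
```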
// ─── getProjectTotals ─────────────────────────────────────────────────────────
console.log("\n=== getProjectTotals ===");
{
const units = [
makeUnit({ tokens: { input: 1000, output: 500, cacheRead: 200, cacheWrite: 100, total: 1800 }, cost: 0.05, toolCalls: 3, startedAt: 1000, finishedAt: 2000 }),
makeUnit({ tokens: { input: 2000, output: 1000, cacheRead: 400, cacheWrite: 200, total: 3600 }, cost: 0.10, toolCalls: 5, startedAt: 2000, finishedAt: 4000 }),
];
const totals = getProjectTotals(units);
assertEq(totals.units, 2, "total units");
assertEq(totals.tokens.input, 3000, "total input tokens");
assertEq(totals.tokens.output, 1500, "total output tokens");
assertEq(totals.tokens.cacheRead, 600, "total cacheRead");
assertEq(totals.tokens.cacheWrite, 300, "total cacheWrite");
assertEq(totals.tokens.total, 5400, "total tokens");
assertClose(totals.cost, 0.15, 0.001, "total cost");
assertEq(totals.toolCalls, 8, "total tool calls");
assertEq(totals.duration, 3000, "total duration");
}
{
const totals = getProjectTotals([]);
assertEq(totals.units, 0, "empty: zero units");
assertEq(totals.cost, 0, "empty: zero cost");
assertEq(totals.tokens.total, 0, "empty: zero tokens");
}
// ─── aggregateByPhase ─────────────────────────────────────────────────────────
console.log("\n=== aggregateByPhase ===");
{
const units = [
makeUnit({ type: "research-milestone", cost: 0.02 }),
makeUnit({ type: "research-slice", cost: 0.03 }),
makeUnit({ type: "plan-milestone", cost: 0.01 }),
makeUnit({ type: "plan-slice", cost: 0.02 }),
makeUnit({ type: "execute-task", cost: 0.10 }),
makeUnit({ type: "execute-task", cost: 0.08 }),
makeUnit({ type: "complete-slice", cost: 0.01 }),
makeUnit({ type: "reassess-roadmap", cost: 0.005 }),
];
const phases = aggregateByPhase(units);
assertEq(phases.length, 5, "5 phases");
assertEq(phases[0].phase, "research", "first phase is research");
assertEq(phases[0].units, 2, "2 research units");
assertClose(phases[0].cost, 0.05, 0.001, "research cost");
assertEq(phases[1].phase, "planning", "second phase is planning");
assertEq(phases[1].units, 2, "2 planning units");
assertEq(phases[2].phase, "execution", "third phase is execution");
assertEq(phases[2].units, 2, "2 execution units");
assertClose(phases[2].cost, 0.18, 0.001, "execution cost");
assertEq(phases[3].phase, "completion", "fourth phase is completion");
assertEq(phases[4].phase, "reassessment", "fifth phase is reassessment");
}
// ─── aggregateBySlice ─────────────────────────────────────────────────────────
console.log("\n=== aggregateBySlice ===");
{
const units = [
makeUnit({ id: "M001/S01/T01", cost: 0.05 }),
makeUnit({ id: "M001/S01/T02", cost: 0.04 }),
makeUnit({ id: "M001/S02/T01", cost: 0.10 }),
makeUnit({ id: "M001", type: "research-milestone", cost: 0.02 }),
];
const slices = aggregateBySlice(units);
assertEq(slices.length, 3, "3 slice groups");
const s01 = slices.find(s => s.sliceId === "M001/S01");
assert(!!s01, "M001/S01 exists");
assertEq(s01!.units, 2, "M001/S01 has 2 units");
assertClose(s01!.cost, 0.09, 0.001, "M001/S01 cost");
const s02 = slices.find(s => s.sliceId === "M001/S02");
assert(!!s02, "M001/S02 exists");
assertEq(s02!.units, 1, "M001/S02 has 1 unit");
const mLevel = slices.find(s => s.sliceId === "M001");
assert(!!mLevel, "M001 (milestone-level) exists");
}
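These fixtures imply the slice key is just the first two `/`-separated segments of the unit id, so `M001/S01/T01` and `M001/S01/T02` collapse into one row while a bare `M001` forms its own milestone-level group. A sketch under that assumption (`groupBySlice` is a hypothetical name for what `aggregateBySlice` does):

```typescript
// Hypothetical sketch: group units by the first two id segments.
type SliceUnit = { id: string; cost: number };
type SliceRow = { sliceId: string; units: number; cost: number };

function groupBySlice(units: SliceUnit[]): SliceRow[] {
  const rows = new Map<string, SliceRow>();
  for (const u of units) {
    // "M001/S01/T01" → "M001/S01"; a bare "M001" stays "M001".
    const sliceId = u.id.split('/').slice(0, 2).join('/');
    const row = rows.get(sliceId) ?? { sliceId, units: 0, cost: 0 };
    row.units += 1;
    row.cost += u.cost;
    rows.set(sliceId, row);
  }
  return [...rows.values()];
}
```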
// ─── aggregateByModel ─────────────────────────────────────────────────────────
console.log("\n=== aggregateByModel ===");
{
const units = [
makeUnit({ model: "claude-sonnet-4-20250514", cost: 0.05 }),
makeUnit({ model: "claude-sonnet-4-20250514", cost: 0.04 }),
makeUnit({ model: "claude-opus-4-20250514", cost: 0.30 }),
];
const models = aggregateByModel(units);
assertEq(models.length, 2, "2 models");
// Sorted by cost desc — opus should be first
assertEq(models[0].model, "claude-opus-4-20250514", "opus first (higher cost)");
assertClose(models[0].cost, 0.30, 0.001, "opus cost");
assertEq(models[1].model, "claude-sonnet-4-20250514", "sonnet second");
assertEq(models[1].units, 2, "sonnet has 2 units");
}
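The model aggregation is the same group-and-sum shape with one extra contract the comment calls out: rows are sorted by cost descending, so the pricier model lands first. A sketch under that assumption (`groupByModel` is a hypothetical stand-in for `aggregateByModel`):

```typescript
// Hypothetical sketch: group by model id, then sort by total cost descending.
type ModelUnit = { model: string; cost: number };
type ModelRow = { model: string; units: number; cost: number };

function groupByModel(units: ModelUnit[]): ModelRow[] {
  const rows = new Map<string, ModelRow>();
  for (const u of units) {
    const row = rows.get(u.model) ?? { model: u.model, units: 0, cost: 0 };
    row.units += 1;
    row.cost += u.cost;
    rows.set(u.model, row);
  }
  return [...rows.values()].sort((a, b) => b.cost - a.cost);
}
```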
// ─── formatCost ───────────────────────────────────────────────────────────────
console.log("\n=== formatCost ===");
assertEq(formatCost(0), "$0.0000", "zero cost");
assertEq(formatCost(0.001), "$0.0010", "sub-cent cost");
assertEq(formatCost(0.05), "$0.050", "5 cents");
assertEq(formatCost(1.50), "$1.50", "dollar+");
assertEq(formatCost(14.20), "$14.20", "double digits");
// ─── formatTokenCount ─────────────────────────────────────────────────────────
console.log("\n=== formatTokenCount ===");
assertEq(formatTokenCount(0), "0", "zero tokens");
assertEq(formatTokenCount(500), "500", "sub-k");
assertEq(formatTokenCount(1500), "1.5k", "1.5k");
assertEq(formatTokenCount(150000), "150.0k", "150k");
assertEq(formatTokenCount(1500000), "1.50M", "1.5M");
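Taken together, the `formatCost` and `formatTokenCount` assertions define tiered precision: four decimals below a cent, three below a dollar, two above; and raw counts below 1k, one decimal with a `k` suffix below 1M, two decimals with `M` above. A sketch that satisfies every fixture above (the `Sketch` suffixes mark these as hypothetical reconstructions, not the shipped implementations):

```typescript
// Hypothetical sketches inferred from the formatCost/formatTokenCount fixtures.
function formatCostSketch(cost: number): string {
  if (cost < 0.01) return `$${cost.toFixed(4)}`; // sub-cent: "$0.0010"
  if (cost < 1) return `$${cost.toFixed(3)}`;    // cents: "$0.050"
  return `$${cost.toFixed(2)}`;                  // dollars: "$14.20"
}

function formatTokenCountSketch(n: number): string {
  if (n < 1_000) return String(n);                       // "500"
  if (n < 1_000_000) return `${(n / 1_000).toFixed(1)}k`; // "150.0k"
  return `${(n / 1_000_000).toFixed(2)}M`;                // "1.50M"
}
```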
// ─── Summary ──────────────────────────────────────────────────────────────────
console.log(`\n${"=".repeat(40)}`);
console.log(`Results: ${passed} passed, ${failed} failed`);
if (failed > 0) {
process.exit(1);
} else {
console.log("All tests passed ✓");
}


@ -0,0 +1,309 @@
import { parseTaskPlanMustHaves } from '../files.ts';
let passed = 0;
let failed = 0;
function assert(condition: boolean, message: string): void {
if (condition) passed++;
else {
failed++;
console.error(` FAIL: ${message}`);
}
}
function assertEq<T>(actual: T, expected: T, message: string): void {
if (JSON.stringify(actual) === JSON.stringify(expected)) passed++;
else {
failed++;
console.error(` FAIL: ${message} — expected ${JSON.stringify(expected)}, got ${JSON.stringify(actual)}`);
}
}
// ═══════════════════════════════════════════════════════════════════════════
// (a) Standard unchecked format: - [ ] text
// ═══════════════════════════════════════════════════════════════════════════
console.log('\n=== parseTaskPlanMustHaves: standard unchecked ===');
{
const content = `# T01: Test Task
## Must-Haves
- [ ] First must-have item
- [ ] Second must-have item
`;
const result = parseTaskPlanMustHaves(content);
assertEq(result.length, 2, 'should return 2 items');
assertEq(result[0].text, 'First must-have item', 'first item text');
assertEq(result[0].checked, false, 'first item unchecked');
assertEq(result[1].text, 'Second must-have item', 'second item text');
assertEq(result[1].checked, false, 'second item unchecked');
}
// ═══════════════════════════════════════════════════════════════════════════
// (b) Checked variants: - [x] and - [X]
// ═══════════════════════════════════════════════════════════════════════════
console.log('\n=== parseTaskPlanMustHaves: checked [x] and [X] ===');
{
const content = `## Must-Haves
- [x] Lowercase checked item
- [X] Uppercase checked item
`;
const result = parseTaskPlanMustHaves(content);
assertEq(result.length, 2, 'should return 2 items');
assertEq(result[0].checked, true, 'lowercase x is checked');
assertEq(result[0].text, 'Lowercase checked item', 'lowercase x text');
assertEq(result[1].checked, true, 'uppercase X is checked');
assertEq(result[1].text, 'Uppercase checked item', 'uppercase X text');
}
// ═══════════════════════════════════════════════════════════════════════════
// (c) No-checkbox bullets: - text
// ═══════════════════════════════════════════════════════════════════════════
console.log('\n=== parseTaskPlanMustHaves: no-checkbox bullets ===');
{
const content = `## Must-Haves
- Plain bullet item
- Another plain item
`;
const result = parseTaskPlanMustHaves(content);
assertEq(result.length, 2, 'should return 2 items');
assertEq(result[0].text, 'Plain bullet item', 'plain bullet text');
assertEq(result[0].checked, false, 'plain bullet defaults to unchecked');
assertEq(result[1].text, 'Another plain item', 'second plain bullet text');
}
// ═══════════════════════════════════════════════════════════════════════════
// (d) Indented variants
// ═══════════════════════════════════════════════════════════════════════════
console.log('\n=== parseTaskPlanMustHaves: indented variants ===');
{
const content = `## Must-Haves
- [ ] Indented unchecked item
- [x] Indented checked item
- Plain indented item
`;
const result = parseTaskPlanMustHaves(content);
assertEq(result.length, 3, 'should return 3 items');
assertEq(result[0].text, 'Indented unchecked item', 'indented unchecked text');
assertEq(result[0].checked, false, 'indented unchecked state');
assertEq(result[1].text, 'Indented checked item', 'indented checked text');
assertEq(result[1].checked, true, 'indented checked state');
assertEq(result[2].text, 'Plain indented item', 'indented plain text');
assertEq(result[2].checked, false, 'indented plain state');
}
// ═══════════════════════════════════════════════════════════════════════════
// (e) Mixed checkbox states in one section
// ═══════════════════════════════════════════════════════════════════════════
console.log('\n=== parseTaskPlanMustHaves: mixed states ===');
{
const content = `## Must-Haves
- [ ] Unchecked one
- [x] Checked one
- [X] Also checked
- Plain bullet
- [ ] Another unchecked
`;
const result = parseTaskPlanMustHaves(content);
assertEq(result.length, 5, 'should return 5 items');
assertEq(result[0].checked, false, 'first is unchecked');
assertEq(result[1].checked, true, 'second is checked');
assertEq(result[2].checked, true, 'third is checked (uppercase)');
assertEq(result[3].checked, false, 'fourth (plain) is unchecked');
assertEq(result[4].checked, false, 'fifth is unchecked');
}
// ═══════════════════════════════════════════════════════════════════════════
// (f) Missing Must-Haves section → empty array
// ═══════════════════════════════════════════════════════════════════════════
console.log('\n=== parseTaskPlanMustHaves: missing section ===');
{
const content = `# T01: Some Task
## Description
Some description here.
## Verification
- Run tests
`;
const result = parseTaskPlanMustHaves(content);
assertEq(result.length, 0, 'returns empty array when section missing');
assert(Array.isArray(result), 'result is an array');
}
// ═══════════════════════════════════════════════════════════════════════════
// (g) Empty Must-Haves section → empty array
// ═══════════════════════════════════════════════════════════════════════════
console.log('\n=== parseTaskPlanMustHaves: empty section ===');
{
const content = `## Must-Haves
## Verification
- Run tests
`;
const result = parseTaskPlanMustHaves(content);
assertEq(result.length, 0, 'returns empty array when section is empty');
}
// ═══════════════════════════════════════════════════════════════════════════
// (h) Content with YAML frontmatter
// ═══════════════════════════════════════════════════════════════════════════
console.log('\n=== parseTaskPlanMustHaves: YAML frontmatter ===');
{
const content = `---
estimated_steps: 5
estimated_files: 3
---
# T01: Task with frontmatter
## Must-Haves
- [ ] Real must-have after frontmatter
- [x] Checked must-have after frontmatter
`;
const result = parseTaskPlanMustHaves(content);
assertEq(result.length, 2, 'frontmatter does not pollute results');
assertEq(result[0].text, 'Real must-have after frontmatter', 'first item text correct');
assertEq(result[0].checked, false, 'first item unchecked');
assertEq(result[1].text, 'Checked must-have after frontmatter', 'second item text correct');
assertEq(result[1].checked, true, 'second item checked');
}
// Verify frontmatter content is not misinterpreted as must-haves
console.log('\n=== parseTaskPlanMustHaves: frontmatter-only content ===');
{
const content = `---
estimated_steps: 5
estimated_files: 3
---
# T01: Task with only frontmatter
## Description
No must-haves section here.
`;
const result = parseTaskPlanMustHaves(content);
assertEq(result.length, 0, 'frontmatter-only content returns empty array');
}
// ═══════════════════════════════════════════════════════════════════════════
// (i) Real task plan format (based on S01/T01-PLAN.md structure)
// ═══════════════════════════════════════════════════════════════════════════
console.log('\n=== parseTaskPlanMustHaves: real task plan format ===');
{
const content = `---
estimated_steps: 5
estimated_files: 3
---
# T01: Add completing-milestone phase to deriveState with tests
**Slice:** S01 Milestone Completion Unit
**Milestone:** M002
## Description
Add the \`completing-milestone\` phase to the GSD state machine.
## Steps
1. Add \`'completing-milestone'\` to the \`Phase\` union type in \`types.ts\`.
2. In \`state.ts\`, modify the registry-building loop.
## Must-Haves
- [ ] \`Phase\` type includes \`'completing-milestone'\`
- [ ] \`deriveState\` returns \`phase: 'completing-milestone'\` when all slices are \`[x]\` and no \`M00x-SUMMARY.md\` exists
- [ ] \`deriveState\` returns milestone as \`'complete'\` and advances when summary exists
- [ ] All 63+ existing \`deriveState\` tests pass without modification
- [ ] New test fixtures cover single-milestone and multi-milestone completing-milestone scenarios
## Verification
- Run tests
- All existing 63 assertions pass
## Observability Impact
- Signals added/changed: \`completing-milestone\` phase now visible
- How a future agent inspects this: Run \`deriveState(basePath)\`
- Failure state exposed: If \`deriveState\` doesn't detect the phase
## Inputs
- \`agent/extensions/gsd/types.ts\` — Phase type definition
## Expected Output
- \`agent/extensions/gsd/types.ts\` — Phase union includes \`'completing-milestone'\`
`;
const result = parseTaskPlanMustHaves(content);
assertEq(result.length, 5, 'real plan has 5 must-haves');
assert(result[0].text.includes('`Phase` type includes'), 'first must-have text matches');
assert(result[1].text.includes('`deriveState` returns'), 'second must-have text matches');
assertEq(result[0].checked, false, 'all real must-haves are unchecked');
assertEq(result[4].checked, false, 'last real must-have is unchecked');
assert(result[4].text.includes('multi-milestone'), 'last must-have references multi-milestone');
}
// ═══════════════════════════════════════════════════════════════════════════
// Edge cases
// ═══════════════════════════════════════════════════════════════════════════
console.log('\n=== parseTaskPlanMustHaves: empty string ===');
{
const result = parseTaskPlanMustHaves('');
assertEq(result.length, 0, 'empty string returns empty array');
}
console.log('\n=== parseTaskPlanMustHaves: must-haves with inline code and backticks ===');
{
const content = `## Must-Haves
- [ ] \`functionName\` is exported from \`module.ts\`
- [x] Returns \`Array<{ text: string }>\` with correct extraction
`;
const result = parseTaskPlanMustHaves(content);
assertEq(result.length, 2, 'handles backtick content');
assert(result[0].text.includes('`functionName`'), 'preserves backticks in text');
assertEq(result[0].checked, false, 'backtick item unchecked');
assertEq(result[1].checked, true, 'backtick item checked');
}
console.log('\n=== parseTaskPlanMustHaves: asterisk bullets ===');
{
const content = `## Must-Haves
* [ ] Asterisk unchecked
* [x] Asterisk checked
* Plain asterisk
`;
const result = parseTaskPlanMustHaves(content);
assertEq(result.length, 3, 'handles asterisk bullets');
assertEq(result[0].checked, false, 'asterisk unchecked');
assertEq(result[1].checked, true, 'asterisk checked');
assertEq(result[2].checked, false, 'plain asterisk unchecked');
}
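The scenarios above fully specify the parser's contract: find the `## Must-Haves` heading, stop at the next `##` heading, and accept `-` or `*` bullets at any indentation with an optional `[ ]`/`[x]`/`[X]` checkbox, defaulting plain bullets to unchecked. A single-regex sketch satisfying all of them (`parseMustHavesSketch` is a hypothetical reconstruction of `parseTaskPlanMustHaves`):

```typescript
// Hypothetical sketch of the must-haves parser the tests above exercise.
type MustHave = { text: string; checked: boolean };

function parseMustHavesSketch(content: string): MustHave[] {
  const items: MustHave[] = [];
  let inSection = false;
  for (const line of content.split('\n')) {
    if (/^##\s+Must-Haves\s*$/.test(line)) { inSection = true; continue; }
    if (inSection && /^##\s/.test(line)) break; // next section ends the scan
    if (!inSection) continue;
    // "- [x] text", "* [ ] text", or a plain "- text" bullet, any indentation.
    const m = line.match(/^\s*[-*]\s+(?:\[([ xX])\]\s+)?(.*)$/);
    if (m && m[2].trim()) {
      items.push({ text: m[2].trim(), checked: m[1] === 'x' || m[1] === 'X' });
    }
  }
  return items;
}
```

YAML frontmatter never pollutes the result simply because its lines sit outside the `## Must-Haves` section, which matches scenarios (f) through (h).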
// ═══════════════════════════════════════════════════════════════════════════
console.log(`\n=== Results: ${passed} passed, ${failed} failed ===\n`);
process.exit(failed > 0 ? 1 : 0);

File diff suppressed because it is too large.


@ -0,0 +1,163 @@
// Tests for inlinePriorMilestoneSummary — the cross-milestone context bridging helper.
//
// Scenarios covered:
// (A) M002 with M001-SUMMARY.md present → returns string containing "Prior Milestone Summary" and summary content
// (B) M001 (no prior milestone in dir) → returns null
// (C) M002 with no M001-SUMMARY.md written → returns null
// (D) M003 with M002 dir present but no M002-SUMMARY.md → returns null
import { mkdtempSync, mkdirSync, rmSync, writeFileSync } from 'node:fs';
import { join, dirname } from 'node:path';
import { tmpdir } from 'node:os';
import { fileURLToPath } from 'node:url';
import { inlinePriorMilestoneSummary } from '../files.ts';
// ─── Resolve this test file's directory ────────────────────────────────────
const __dirname = dirname(fileURLToPath(import.meta.url));
// ─── Assertion helpers ─────────────────────────────────────────────────────
let passed = 0;
let failed = 0;
function assert(condition: boolean, message: string): void {
if (condition) {
passed++;
} else {
failed++;
console.error(` FAIL: ${message}`);
}
}
function assertEq<T>(actual: T, expected: T, message: string): void {
if (JSON.stringify(actual) === JSON.stringify(expected)) {
passed++;
} else {
failed++;
console.error(` FAIL: ${message} — expected ${JSON.stringify(expected)}, got ${JSON.stringify(actual)}`);
}
}
// ─── Fixture helpers ───────────────────────────────────────────────────────
function createFixtureBase(): string {
const base = mkdtempSync(join(tmpdir(), 'gsd-plan-ms-test-'));
mkdirSync(join(base, '.gsd', 'milestones'), { recursive: true });
return base;
}
function writeMilestoneDir(base: string, mid: string): void {
mkdirSync(join(base, '.gsd', 'milestones', mid), { recursive: true });
}
function writeMilestoneSummary(base: string, mid: string, content: string): void {
const dir = join(base, '.gsd', 'milestones', mid);
mkdirSync(dir, { recursive: true });
writeFileSync(join(dir, `${mid}-SUMMARY.md`), content);
}
function cleanup(base: string): void {
rmSync(base, { recursive: true, force: true });
}
// ═══════════════════════════════════════════════════════════════════════════
// Tests
// ═══════════════════════════════════════════════════════════════════════════
async function main(): Promise<void> {
// ─── (A) M002 with M001-SUMMARY.md present ────────────────────────────────
console.log('\n── (A) M002 with M001-SUMMARY.md present → string containing "Prior Milestone Summary"');
{
const base = createFixtureBase();
try {
writeMilestoneDir(base, 'M001');
writeMilestoneDir(base, 'M002');
writeMilestoneSummary(base, 'M001', '# M001 Summary\n\nKey decisions: used TypeScript throughout.\n');
const result = await inlinePriorMilestoneSummary('M002', base);
assert(result !== null, '(A) result is not null when prior milestone has SUMMARY');
assert(
typeof result === 'string' && result.includes('Prior Milestone Summary'),
'(A) result contains "Prior Milestone Summary" label',
);
assert(
typeof result === 'string' && result.includes('Key decisions: used TypeScript throughout.'),
'(A) result contains the summary file content',
);
} finally {
cleanup(base);
}
}
// ─── (B) M001 (no prior milestone in dir) ─────────────────────────────────
console.log('\n── (B) M001 — first milestone, no prior → null');
{
const base = createFixtureBase();
try {
writeMilestoneDir(base, 'M001');
const result = await inlinePriorMilestoneSummary('M001', base);
assertEq(result, null, '(B) M001 with no prior milestone → null');
} finally {
cleanup(base);
}
}
// ─── (C) M002 with no M001-SUMMARY.md ────────────────────────────────────
console.log('\n── (C) M002 with M001 dir but no M001-SUMMARY.md → null');
{
const base = createFixtureBase();
try {
writeMilestoneDir(base, 'M001');
writeMilestoneDir(base, 'M002');
// Intentionally do NOT write M001-SUMMARY.md
const result = await inlinePriorMilestoneSummary('M002', base);
assertEq(result, null, '(C) M002 when M001 has no SUMMARY file → null');
} finally {
cleanup(base);
}
}
// ─── (D) M003 with M002 dir but no M002-SUMMARY.md ───────────────────────
console.log('\n── (D) M003, M002 is immediately prior but has no SUMMARY → null');
{
const base = createFixtureBase();
try {
writeMilestoneDir(base, 'M001');
writeMilestoneDir(base, 'M002');
writeMilestoneDir(base, 'M003');
// M001 has a summary — but M002 (the immediately prior to M003) does NOT
writeMilestoneSummary(base, 'M001', '# M001 Summary\n\nOld context.\n');
// Intentionally do NOT write M002-SUMMARY.md
const result = await inlinePriorMilestoneSummary('M003', base);
assertEq(result, null, '(D) M003 when M002 (immediately prior) has no SUMMARY → null');
} finally {
cleanup(base);
}
}
// ═══════════════════════════════════════════════════════════════════════════
// Results
// ═══════════════════════════════════════════════════════════════════════════
console.log(`\n${'='.repeat(40)}`);
console.log(`Results: ${passed} passed, ${failed} failed`);
if (failed > 0) {
process.exit(1);
} else {
console.log('All tests passed ✓');
}
}
main().catch((error) => {
console.error(error);
process.exit(1);
});


@ -0,0 +1,386 @@
import { validateTaskPlanContent, validateSlicePlanContent } from '../observability-validator.ts';
let passed = 0;
let failed = 0;
function assert(condition: boolean, message: string): void {
if (condition) passed++;
else {
failed++;
console.error(` FAIL: ${message}`);
}
}
function assertEq<T>(actual: T, expected: T, message: string): void {
if (JSON.stringify(actual) === JSON.stringify(expected)) passed++;
else {
failed++;
console.error(` FAIL: ${message} — expected ${JSON.stringify(expected)}, got ${JSON.stringify(actual)}`);
}
}
// ═══════════════════════════════════════════════════════════════════════════
// validateTaskPlanContent — empty/missing Steps section
// ═══════════════════════════════════════════════════════════════════════════
console.log('\n=== validateTaskPlanContent: empty Steps section ===');
{
const content = `# T01: Some Task
## Description
Do something useful.
## Steps
## Verification
- Run the tests and confirm output.
`;
const issues = validateTaskPlanContent('T01-PLAN.md', content);
const stepsIssues = issues.filter(i => i.ruleId === 'empty_steps_section');
assert(stepsIssues.length >= 1, 'empty Steps section produces empty_steps_section issue');
if (stepsIssues.length > 0) {
assertEq(stepsIssues[0].severity, 'warning', 'empty_steps_section severity is warning');
assertEq(stepsIssues[0].scope, 'task-plan', 'empty_steps_section scope is task-plan');
}
}
console.log('\n=== validateTaskPlanContent: missing Steps section entirely ===');
{
const content = `# T01: Some Task
## Description
Do something useful.
## Verification
- Run the tests.
`;
const issues = validateTaskPlanContent('T01-PLAN.md', content);
const stepsIssues = issues.filter(i => i.ruleId === 'empty_steps_section');
assert(stepsIssues.length >= 1, 'missing Steps section produces empty_steps_section issue');
}
// ═══════════════════════════════════════════════════════════════════════════
// validateTaskPlanContent — placeholder-only Verification
// ═══════════════════════════════════════════════════════════════════════════
console.log('\n=== validateTaskPlanContent: placeholder-only Verification ===');
{
const content = `# T01: Some Task
## Steps
1. Do the thing.
2. Do the other thing.
## Verification
- {{placeholder verification step}}
- {{another placeholder}}
`;
const issues = validateTaskPlanContent('T01-PLAN.md', content);
const verifyIssues = issues.filter(i => i.ruleId === 'placeholder_verification');
assert(verifyIssues.length >= 1, 'placeholder-only Verification produces placeholder_verification issue');
if (verifyIssues.length > 0) {
assertEq(verifyIssues[0].severity, 'warning', 'placeholder_verification severity is warning');
assertEq(verifyIssues[0].scope, 'task-plan', 'placeholder_verification scope is task-plan');
}
}
console.log('\n=== validateTaskPlanContent: Verification with only template text ===');
{
const content = `# T01: Some Task
## Steps
1. Do the thing.
## Verification
{{whatWasVerifiedAndHow commands run, tests passed, behavior confirmed}}
`;
const issues = validateTaskPlanContent('T01-PLAN.md', content);
const verifyIssues = issues.filter(i => i.ruleId === 'placeholder_verification');
assert(verifyIssues.length >= 1, 'template-text-only Verification produces placeholder_verification issue');
}
// ═══════════════════════════════════════════════════════════════════════════
// validateSlicePlanContent — empty inline task entries
// ═══════════════════════════════════════════════════════════════════════════
console.log('\n=== validateSlicePlanContent: empty inline task entries ===');
{
const content = `# S01: Some Slice
**Goal:** Build the thing.
**Demo:** It works.
## Tasks
- [ ] **T01: First Task** \`est:20m\`
- [ ] **T02: Second Task** \`est:15m\`
## Verification
- Run the tests.
`;
const issues = validateSlicePlanContent('S01-PLAN.md', content);
const emptyTaskIssues = issues.filter(i => i.ruleId === 'empty_task_entry');
assert(emptyTaskIssues.length >= 1, 'task entries with no description produce empty_task_entry issue');
if (emptyTaskIssues.length > 0) {
assertEq(emptyTaskIssues[0].severity, 'warning', 'empty_task_entry severity is warning');
assertEq(emptyTaskIssues[0].scope, 'slice-plan', 'empty_task_entry scope is slice-plan');
}
}
console.log('\n=== validateSlicePlanContent: task entries with content are fine ===');
{
const content = `# S01: Some Slice
**Goal:** Build the thing.
**Demo:** It works.
## Tasks
- [ ] **T01: First Task** \`est:20m\`
- Why: Because it matters.
- Files: \`src/index.ts\`
- Do: Implement the feature.
- [ ] **T02: Second Task** \`est:15m\`
- Why: Also important.
- Do: Add tests.
## Verification
- Run the tests.
`;
const issues = validateSlicePlanContent('S01-PLAN.md', content);
const emptyTaskIssues = issues.filter(i => i.ruleId === 'empty_task_entry');
assertEq(emptyTaskIssues.length, 0, 'task entries with description content produce no empty_task_entry issues');
}
// ═══════════════════════════════════════════════════════════════════════════
// validateTaskPlanContent — scope_estimate over threshold
// ═══════════════════════════════════════════════════════════════════════════
console.log('\n=== validateTaskPlanContent: scope_estimate over threshold ===');
{
const content = `---
estimated_steps: 12
estimated_files: 15
---
# T01: Big Task
## Steps
1. Step one.
2. Step two.
3. Step three.
## Verification
- Check it works.
`;
const issues = validateTaskPlanContent('T01-PLAN.md', content);
const stepsOverIssues = issues.filter(i => i.ruleId === 'scope_estimate_steps_high');
const filesOverIssues = issues.filter(i => i.ruleId === 'scope_estimate_files_high');
assert(stepsOverIssues.length >= 1, 'estimated_steps=12 (>=10) produces scope_estimate_steps_high issue');
assert(filesOverIssues.length >= 1, 'estimated_files=15 (>=12) produces scope_estimate_files_high issue');
if (stepsOverIssues.length > 0) {
assertEq(stepsOverIssues[0].severity, 'warning', 'scope_estimate_steps_high severity is warning');
assertEq(stepsOverIssues[0].scope, 'task-plan', 'scope_estimate_steps_high scope is task-plan');
}
if (filesOverIssues.length > 0) {
assertEq(filesOverIssues[0].severity, 'warning', 'scope_estimate_files_high severity is warning');
}
}
// ═══════════════════════════════════════════════════════════════════════════
// validateTaskPlanContent — scope_estimate within limits
// ═══════════════════════════════════════════════════════════════════════════
console.log('\n=== validateTaskPlanContent: scope_estimate within limits ===');
{
const content = `---
estimated_steps: 4
estimated_files: 6
---
# T01: Small Task
## Steps
1. Do the thing.
## Verification
- Verify it works.
`;
const issues = validateTaskPlanContent('T01-PLAN.md', content);
const scopeIssues = issues.filter(i =>
i.ruleId === 'scope_estimate_steps_high' || i.ruleId === 'scope_estimate_files_high'
);
assertEq(scopeIssues.length, 0, 'scope_estimate within limits produces no scope issues');
}
// ═══════════════════════════════════════════════════════════════════════════
// validateTaskPlanContent — missing scope_estimate (no warning)
// ═══════════════════════════════════════════════════════════════════════════
console.log('\n=== validateTaskPlanContent: missing scope_estimate ===');
{
const content = `# T01: No Frontmatter Task
## Steps
1. Do the thing.
## Verification
- Verify it works.
`;
const issues = validateTaskPlanContent('T01-PLAN.md', content);
const scopeIssues = issues.filter(i =>
i.ruleId === 'scope_estimate_steps_high' || i.ruleId === 'scope_estimate_files_high'
);
assertEq(scopeIssues.length, 0, 'missing scope_estimate produces no scope issues');
}
console.log('\n=== validateTaskPlanContent: frontmatter without scope keys ===');
{
const content = `---
id: T01
parent: S01
---
# T01: Task With Other Frontmatter
## Steps
1. Do the thing.
## Verification
- Verify it works.
`;
const issues = validateTaskPlanContent('T01-PLAN.md', content);
const scopeIssues = issues.filter(i =>
i.ruleId === 'scope_estimate_steps_high' || i.ruleId === 'scope_estimate_files_high'
);
assertEq(scopeIssues.length, 0, 'frontmatter without scope keys produces no scope issues');
}
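The four scope-estimate cases above imply a check that parses YAML frontmatter, reads `estimated_steps` and `estimated_files`, and warns at thresholds of 10 and 12 respectively, staying silent when the frontmatter or the keys are absent. A sketch under those inferred thresholds (`scopeEstimateIssues` is a hypothetical slice of what `validateTaskPlanContent` does):

```typescript
// Hypothetical sketch of the scope-estimate rule, thresholds inferred from fixtures.
type Issue = { ruleId: string; severity: 'warning'; scope: 'task-plan' };

function scopeEstimateIssues(content: string): Issue[] {
  const issues: Issue[] = [];
  const fm = content.match(/^---\n([\s\S]*?)\n---/); // leading YAML frontmatter
  if (!fm) return issues; // no frontmatter → no scope warnings
  const steps = fm[1].match(/^estimated_steps:\s*(\d+)/m);
  const files = fm[1].match(/^estimated_files:\s*(\d+)/m);
  if (steps && Number(steps[1]) >= 10) {
    issues.push({ ruleId: 'scope_estimate_steps_high', severity: 'warning', scope: 'task-plan' });
  }
  if (files && Number(files[1]) >= 12) {
    issues.push({ ruleId: 'scope_estimate_files_high', severity: 'warning', scope: 'task-plan' });
  }
  return issues;
}
```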
// ═══════════════════════════════════════════════════════════════════════════
// Clean plans — no false positives
// ═══════════════════════════════════════════════════════════════════════════
console.log('\n=== Clean task plan: no plan-quality issues ===');
{
const content = `---
estimated_steps: 5
estimated_files: 3
---
# T01: Well-Formed Task
## Description
A real task with real content.
## Steps
1. Read the input files.
2. Parse the configuration.
3. Transform the data.
4. Write the output.
5. Verify the results.
## Must-Haves
- [ ] Output file is valid JSON
- [ ] All input records are processed
## Verification
- Run \`node --test tests/transform.test.ts\` — all assertions pass
- Manually inspect output.json for correct structure
## Observability Impact
- Signals added/changed: structured error log on parse failure
- How a future agent inspects this: check stderr for JSON parse errors
- Failure state exposed: exit code 1 + error message on invalid input
`;
const issues = validateTaskPlanContent('T01-PLAN.md', content);
const planQualityIssues = issues.filter(i =>
i.ruleId === 'empty_steps_section' ||
i.ruleId === 'placeholder_verification' ||
i.ruleId === 'scope_estimate_steps_high' ||
i.ruleId === 'scope_estimate_files_high'
);
assertEq(planQualityIssues.length, 0, 'clean task plan produces no plan-quality issues');
}
console.log('\n=== Clean slice plan: no plan-quality issues ===');
{
const content = `# S01: Well-Formed Slice
**Goal:** Build a complete feature.
**Demo:** Run the test suite and see all green.
## Tasks
- [ ] **T01: Create tests** \`est:20m\`
- Why: Tests define the contract before implementation.
- Files: \`tests/feature.test.ts\`
- Do: Write comprehensive test assertions.
- Verify: Test file runs without syntax errors.
- [ ] **T02: Implement feature** \`est:30m\`
- Why: Core implementation.
- Files: \`src/feature.ts\`
- Do: Build the feature to make tests pass.
- Verify: All tests pass.
## Verification
- \`node --test tests/feature.test.ts\` — all assertions pass
- Check error output for diagnostic messages
## Observability / Diagnostics
- Runtime signals: structured error objects with error codes
- Inspection surfaces: test output shows pass/fail counts
- Failure visibility: exit code 1 on failure with descriptive message
- Redaction constraints: none
`;
const issues = validateSlicePlanContent('S01-PLAN.md', content);
const planQualityIssues = issues.filter(i => i.ruleId === 'empty_task_entry');
assertEq(planQualityIssues.length, 0, 'clean slice plan produces no empty_task_entry issues');
}
// ═══════════════════════════════════════════════════════════════════════════
// Results
// ═══════════════════════════════════════════════════════════════════════════
console.log(`\nResults: ${passed} passed, ${failed} failed`);
if (failed > 0) process.exit(1);
console.log('All tests passed ✓');

Some files were not shown because too many files have changed in this diff.