diff --git a/docs-internal/configuration.md b/docs-internal/configuration.md
index 067eb5da8..2c7fe49ed 100644
--- a/docs-internal/configuration.md
+++ b/docs-internal/configuration.md
@@ -578,7 +578,7 @@ prefer_skills:
avoid_skills: []
```
-Skills can be bare names (looked up in `~/.gsd/agent/skills/`) or absolute paths.
+Skills can be bare names (looked up in `~/.agents/skills/` and `.agents/skills/`) or absolute paths.
### `skill_rules`
diff --git a/docs-internal/context-and-hooks/07-the-system-prompt-anatomy.md b/docs-internal/context-and-hooks/07-the-system-prompt-anatomy.md
index aa0fc79ea..7bb2c57cc 100644
--- a/docs-internal/context-and-hooks/07-the-system-prompt-anatomy.md
+++ b/docs-internal/context-and-hooks/07-the-system-prompt-anatomy.md
@@ -174,7 +174,7 @@ When a skill file references a relative path, resolve it against the skill direc
commit-outstanding
Commit all uncommitted files in logical groups
- /Users/you/.gsd/agent/skills/commit-outstanding/SKILL.md
+ /Users/you/.agents/skills/commit-outstanding/SKILL.md
```
diff --git a/docs-internal/skills.md b/docs-internal/skills.md
index 71f039546..6a9e1d567 100644
--- a/docs-internal/skills.md
+++ b/docs-internal/skills.md
@@ -2,28 +2,85 @@
Skills are specialized instruction sets that GSD loads when the task matches. They provide domain-specific guidance for the LLM — coding patterns, framework idioms, testing strategies, and tool usage.
-## Bundled Skills
+Skills follow the open [Agent Skills standard](https://agentskills.io/) and are **not GSD-specific** — they work with Claude Code, OpenAI Codex, Cursor, GitHub Copilot, Windsurf, and 40+ other agents.
-GSD ships with these skills, installed to `~/.gsd/agent/skills/`:
+## Skill Directories
-| Skill | Trigger | Description |
-|-------|---------|-------------|
-| `frontend-design` | Web UI work — components, pages, dashboards, styling | Production-grade frontend with high design quality |
-| `swiftui` | macOS/iOS apps — SwiftUI, Xcode, App Store | Full lifecycle from creation to shipping |
-| `debug-like-expert` | Complex debugging — after standard approaches fail | Methodical investigation with evidence gathering |
-| `rust-core` | Rust code — ownership, lifetimes, traits, async | Idiomatic, safe, performant Rust patterns |
-| `axum-web-framework` | Axum web apps — routing, middleware, extractors | Complete Axum development guide |
-| `axum-tests` | Testing Axum apps — integration tests, mock state | Test patterns for Axum applications |
-| `tauri` | Tauri v2 desktop apps — setup, plugins, bundling | Cross-platform desktop app development |
-| `tauri-ipc-developer` | Tauri IPC — React-Rust type-safe communication | Command scaffolding and serialization |
-| `tauri-devtools` | Tauri debugging — CrabNebula DevTools integration | Profiling and monitoring |
-| `github-workflows` | GitHub Actions — CI/CD, workflow debugging | Live syntax, run monitoring, failure diagnosis |
-| `security-audit` | Security auditing — dependency scanning, OWASP | Comprehensive security assessment |
-| `security-review` | Code security review — injection, XSS, auth flaws | Vulnerability-focused code review |
-| `security-docker` | Docker security — Dockerfile, runtime hardening | Container security best practices |
-| `review` | Code review — staged changes, PRs, security, performance | Diff-aware code review with quality analysis |
-| `test` | Test generation and execution — auto-detects frameworks | Generate tests or run existing suites with failure analysis |
-| `lint` | Linting and formatting — ESLint, Biome, Prettier | Auto-detect linter, fix issues, report remaining problems |
+GSD reads skills from two locations, in priority order:
+
+| Location | Scope | Description |
+|-----------------------------------|---------|----------------------------------------------------------|
+| `~/.agents/skills/` | Global | Shared across all projects and all compatible agents |
+| `.agents/skills/` (project root) | Project | Project-specific skills, committable to version control |
+
+Global skills take precedence over project skills when names collide.
+
+> **Migration from `~/.gsd/agent/skills/`:** On first launch after upgrading, GSD automatically copies skills from the legacy `~/.gsd/agent/skills/` directory to `~/.agents/skills/`. The old directory is preserved for backward compatibility.
+
+## Installing Skills
+
+Skills are installed via the [skills.sh CLI](https://skills.sh):
+
+```bash
+# Interactive — choose skills and target agents
+npx skills add dpearson2699/swift-ios-skills
+
+# Install specific skills non-interactively
+npx skills add dpearson2699/swift-ios-skills --skill swift-concurrency --skill swiftui-patterns -y
+
+# Install all skills from a repo
+npx skills add dpearson2699/swift-ios-skills --all
+
+# Check for updates
+npx skills check
+
+# Update installed skills
+npx skills update
+```
+
+### Onboarding Catalog
+
+During `gsd init`, GSD detects the project's tech stack and recommends relevant skill packs. For brownfield projects, detection is automatic; for greenfield projects, the user picks a tech stack.
+
+The curated catalog is maintained in `src/resources/extensions/gsd/skill-catalog.ts`. Each entry maps a tech stack to a skills.sh repo and specific skill names.
+
+#### Available Skill Packs
+
+**Swift (any Swift project — `Package.swift` or `.xcodeproj` detected):**
+- **SwiftUI** — layout, navigation, animations, gestures, Liquid Glass
+- **Swift Core** — Swift language, concurrency, Codable, Charts, Testing, SwiftData
+
+**iOS (only when `.xcodeproj` targets `iphoneos` via SDKROOT):**
+- **iOS App Frameworks** — App Intents, Widgets, StoreKit, MapKit, Live Activities
+- **iOS Data Frameworks** — CloudKit, HealthKit, MusicKit, WeatherKit, Contacts
+- **iOS AI & ML** — Core ML, Vision, on-device AI, speech recognition
+- **iOS Engineering** — networking, security, accessibility, localization, Instruments
+- **iOS Hardware** — Bluetooth, CoreMotion, NFC, PencilKit, RealityKit
+- **iOS Platform** — CallKit, EnergyKit, HomeKit, SharePlay, PermissionKit
+
+**Web:**
+- **React & Web Frontend** — React best practices, web design, composition patterns
+- **React Native** — cross-platform mobile patterns
+- **Frontend Design & UX** — frontend design, accessibility
+
+**Languages:**
+- **Rust** — Rust patterns and best practices
+- **Python** — Python patterns and best practices
+- **Go** — Go patterns and best practices
+
+**General:**
+- **Document Handling** — PDF, DOCX, XLSX, PPTX creation and manipulation
+
+### Maintaining the Catalog
+
+The skill catalog lives in [`src/resources/extensions/gsd/skill-catalog.ts`](../src/resources/extensions/gsd/skill-catalog.ts). To add or update a pack:
+
+1. Add a `SkillPack` entry to the `SKILL_CATALOG` array with `repo`, `skills`, and matching criteria
+2. For language-detection matching, use `matchLanguages` (values from `detection.ts` `LANGUAGE_MAP`)
+3. For Xcode platform matching, use `matchXcodePlatforms` (e.g., `["iphoneos"]` — parsed from `SDKROOT` in `project.pbxproj`)
+4. For file-presence matching, use `matchFiles` (checked against `PROJECT_FILES` in `detection.ts`)
+5. If the pack should appear in greenfield choices, add it to `GREENFIELD_STACKS`
+6. Packs sharing the same `repo` are batched into a single `npx skills add` invocation
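+
+A hypothetical entry, as a sketch only (the authoritative `SkillPack` shape lives in `skill-catalog.ts`; field names beyond `repo`, `skills`, and the `match*` criteria listed above are illustrative):
+
+```typescript
+// Hypothetical SKILL_CATALOG entry. Only repo, skills, and
+// matchXcodePlatforms are documented above; `name` is assumed.
+const iosAiMlPack: SkillPack = {
+  name: "iOS AI & ML",
+  repo: "dpearson2699/swift-ios-skills", // packs sharing a repo batch into one `npx skills add`
+  skills: ["core-ml", "vision"],         // illustrative skill names
+  matchXcodePlatforms: ["iphoneos"],     // parsed from SDKROOT in project.pbxproj
+};
+```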
## Skill Discovery
@@ -59,18 +116,18 @@ skill_rules:
### Resolution Order
Skills can be referenced by:
-1. **Bare name** — e.g., `frontend-design` → scans `~/.gsd/agent/skills/` and project skills
-2. **Absolute path** — e.g., `/Users/you/.gsd/agent/skills/my-skill/SKILL.md`
+1. **Bare name** — e.g., `frontend-design` → scans `~/.agents/skills/` and project `.agents/skills/`
+2. **Absolute path** — e.g., `/Users/you/.agents/skills/my-skill/SKILL.md`
3. **Directory path** — e.g., `~/custom-skills/my-skill` → looks for `SKILL.md` inside
-User skills (`~/.gsd/agent/skills/`) take precedence over project skills.
+Global skills (`~/.agents/skills/`) take precedence over project skills (`.agents/skills/`).
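+
+For example, a `prefer_skills` list (keys as documented in `configuration.md`) can mix all three reference forms:
+
+```yaml
+prefer_skills:
+  - frontend-design                              # bare name, resolved from skill directories
+  - /Users/you/.agents/skills/my-skill/SKILL.md  # absolute path
+  - ~/custom-skills/my-skill                     # directory containing SKILL.md
+```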
## Custom Skills
Create your own skills by adding a directory with a `SKILL.md` file:
```
-~/.gsd/agent/skills/my-skill/
+~/.agents/skills/my-skill/
SKILL.md — instructions for the LLM
references/ — optional reference files
```
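+
+A minimal `SKILL.md` sketch, assuming the Agent Skills frontmatter convention with `name` and `description` fields:
+
+```markdown
+---
+name: my-skill
+description: Use when working on X; covers patterns for Y.
+---
+
+# my-skill
+
+Instructions the LLM follows when this skill is active.
+
+- Prefer pattern A over B in this codebase.
+- See references/ for supporting material.
+```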
@@ -82,10 +139,12 @@ The `SKILL.md` file contains instructions the LLM follows when the skill is acti
Place skills in your project for project-specific guidance:
```
-.gsd/agent/skills/my-project-skill/
+.agents/skills/my-project-skill/
SKILL.md
```
+Project-local skills can be committed to version control so team members share the same skill set.
+
## Skill Lifecycle Management
GSD tracks skill performance across auto-mode sessions and surfaces health data to help you maintain skill quality.
diff --git a/docs-internal/what-is-pi/09-the-customization-stack.md b/docs-internal/what-is-pi/09-the-customization-stack.md
index 10a3fb42d..10d032b39 100644
--- a/docs-internal/what-is-pi/09-the-customization-stack.md
+++ b/docs-internal/what-is-pi/09-the-customization-stack.md
@@ -48,8 +48,8 @@ On-demand capability packages following the [Agent Skills standard](https://agen
```
**Placement:**
-- `~/.gsd/agent/skills/` or `~/.agents/skills/` (global)
-- `.gsd/skills/` or `.agents/skills/` (project, searched up to git root)
+- `~/.agents/skills/` (global — shared across all agents)
+- `.agents/skills/` (project, searched up to git root)
**Skill structure:**
```
diff --git a/docs/ADR-001-branchless-worktree-architecture.md b/docs/ADR-001-branchless-worktree-architecture.md
new file mode 100644
index 000000000..478dade24
--- /dev/null
+++ b/docs/ADR-001-branchless-worktree-architecture.md
@@ -0,0 +1,279 @@
+# ADR-001: Branchless Worktree Architecture
+
+**Status:** Proposed
+**Date:** 2026-03-15
+**Deciders:** Lex Christopherson
+**Advisors:** Claude Opus 4.6, Gemini 2.5 Pro, GPT-5.4 (Codex)
+
+## Context
+
+GSD uses git for isolation during autonomous coding sessions. The current architecture (shipped in M003, v2.13.0) creates a **worktree per milestone** with **slice branches inside each worktree**. Each slice (`S01`, `S02`, ...) gets its own branch (`gsd/M001/S01`) within the worktree, which merges back to the milestone branch (`milestone/M001`) via `--no-ff` when the slice completes. The milestone branch squash-merges to `main` when the milestone completes.
+
+This architecture replaced a previous "branch-per-slice" model that had severe `.gsd/` merge conflicts. M003 solved the merge conflicts but retained slice branches inside worktrees, inheriting complexity that has produced persistent, user-facing failures.
+
+### Problems
+
+**1. Planning artifact invisibility (loop detection failures)**
+
+When `research-slice` or `plan-slice` dispatches, the agent writes artifacts (e.g., `S02-RESEARCH.md`) on a slice branch. After the agent completes, `handleAgentEnd` switches back to the milestone branch for the next dispatch. The artifact is on the slice branch, not the milestone branch. `verifyExpectedArtifact()` checks the milestone branch, can't find the file, increments the loop counter, retries, same result. After 3 retries → hard stop. After 6 lifetime dispatches → permanent stop. This burns budget and blocks progress.
+
+Documented in the auto-stop architecture doc as "The Branch-Switching Problem."
+
+**2. `.gsd/` state clobbering across branches**
+
+`.gsd/` is gitignored (line 52 of `.gitignore`: `.gsd/`). Planning artifacts (roadmaps, plans, summaries, decisions, requirements) live in `.gsd/milestones/` but are invisible to git. When multiple branches or worktrees operate from the same repo, they share a single `.gsd/` directory on disk. Branch A's M001 roadmap overwrites Branch B's M001 roadmap. GSD reads corrupted state, shows wrong milestone as complete, or enters infinite dispatch loops.
+
+The codebase has a contradictory workaround: `smartStage()` (git-service.ts:304-352) force-adds `GSD_DURABLE_PATHS` (milestones/, DECISIONS.md, PROJECT.md, REQUIREMENTS.md, QUEUE.md) despite the `.gitignore`. This means `.gsd/milestones/` IS partially tracked on some branches but the gitignore claims otherwise. The code fights the configuration.
+
+**3. Merge/conflict code complexity**
+
+The current slice branch model requires:
+- `mergeSliceToMilestone()` — 98 lines, `--no-ff` merge with `withMergeHeal` wrapper
+- `mergeSliceToMain()` — 189 lines, squash-merge with conflict detection/categorization/auto-resolution
+- `git-self-heal.ts` — 198 lines, 3 recovery functions for merge failures
+- `fix-merge` dispatch unit — dedicated LLM session to resolve conflicts the auto-resolver can't handle
+- `smartStage()` — 49 lines of runtime exclusion during staging
+- Conflict categorization — 80 lines classifying `.gsd/` vs runtime vs code conflicts
+
+Total: **~582 lines** of merge/branch/conflict code across 3 files, plus the `fix-merge` prompt template and dispatch logic. This code exists solely because of slice branches.
+
+**4. Dual isolation modes**
+
+Branch-mode (`git-service.ts:mergeSliceToMain`) and worktree-mode (`auto-worktree.ts:mergeSliceToMilestone`) have parallel implementations with different merge strategies, different conflict handling, and different branch naming. Both paths must be maintained and tested. 11 test files exercise merge/branch/worktree logic.
+
+**5. Bug history**
+
+- v2.11.1: URGENT fix for parse cache staleness causing repeated unit dispatch (directly caused by branch switching invalidation timing)
+- v2.13.1: Windows hotfix for multi-line commit messages in `mergeSliceToMilestone`
+- 15+ separate bug fixes for `.gsd/` merge conflicts in the pre-M003 era
+- Persistent user complaints about loop detection failures and state corruption
+
+## Decision
+
+**Eliminate slice branches entirely.** All work within a milestone worktree commits sequentially on a single branch (`milestone/<id>`). No branch creation, no branch switching, no slice merges, no conflict resolution within a worktree.
+
+Track `.gsd/` planning artifacts in git. Gitignore only runtime/ephemeral state.
+
+### The Architecture
+
+```
+main ──────────────────────────────────────────── main
+ │ ↑
+ └─ worktree (milestone/M001) │
+ │ │
+ commit: feat(M001): context + roadmap │
+ commit: feat(M001/S01): research │
+ commit: feat(M001/S01): plan │
+ commit: feat(M001/S01/T01): impl │
+ commit: feat(M001/S01/T02): impl │
+ commit: feat(M001/S01): summary + UAT │
+ commit: feat(M001/S02): research │
+ commit: ... │
+ commit: feat(M001): milestone complete │
+ │ │
+ └──────────── squash merge ──────────────────┘
+```
+
+### Git Primitives Used
+
+| Primitive | Purpose |
+|-----------|---------|
+| **Worktrees** | One per active milestone. Filesystem isolation. |
+| **Commits** | Granular sequential history of every action. |
+| **Squash merge** | Clean single commit on `main` per milestone. |
+| **Branches** | Only `main` and `milestone/<id>`. Nothing else. |
+
+### Git Primitives NOT Used
+
+| Primitive | Why Not |
+|-----------|---------|
+| Slice branches | Slices are sequential. Branches add complexity with no rollback benefit. |
+| `--no-ff` merges | No branches to merge within a worktree. |
+| Branch switching | Never happens. All work on one branch. |
+| Conflict resolution | No merges within a worktree means no conflicts within a worktree. |
+
+### `.gsd/` Tracking Model
+
+**Tracked in git (travels with the branch):**
+```
+.gsd/milestones/ — roadmaps, plans, summaries, research, contexts, task plans/summaries
+.gsd/PROJECT.md — project overview
+.gsd/DECISIONS.md — architectural decision register
+.gsd/REQUIREMENTS.md — requirements register
+.gsd/QUEUE.md — work queue
+```
+
+**Gitignored (ephemeral, runtime, infrastructure):**
+```
+.gsd/runtime/ — dispatch records, timeout tracking
+.gsd/activity/ — JSONL session dumps
+.gsd/worktrees/ — git worktree working directories
+.gsd/auto.lock — crash detection sentinel
+.gsd/metrics.json — token/cost accumulator
+.gsd/completed-units.json — dispatch idempotency tracker
+.gsd/STATE.md — derived state cache (rebuilt by deriveState())
+.gsd/gsd.db — SQLite cache (rebuilt from tracked markdown by importers)
+.gsd/DISCUSSION-MANIFEST.json — discussion phase tracking
+.gsd/milestones/**/*-CONTINUE.md — interrupted-work markers
+.gsd/milestones/**/continue.md — legacy continue markers
+```
+
+### `.gitignore` Update
+
+Replace the current blanket `.gsd/` ignore with explicit runtime-only ignores:
+
+```gitignore
+# ── GSD: Runtime / Ephemeral ─────────────────────────────────
+.gsd/auto.lock
+.gsd/completed-units.json
+.gsd/STATE.md
+.gsd/metrics.json
+.gsd/gsd.db
+.gsd/activity/
+.gsd/runtime/
+.gsd/worktrees/
+.gsd/DISCUSSION-MANIFEST.json
+.gsd/milestones/**/*-CONTINUE.md
+.gsd/milestones/**/continue.md
+```
+
+Planning artifacts (milestones/, PROJECT.md, DECISIONS.md, REQUIREMENTS.md, QUEUE.md) are NOT in `.gitignore` and are tracked normally.
+
+## Consequences
+
+### Code Deletion
+
+| File | Lines Deleted | What's Removed |
+|------|--------------|----------------|
+| `auto-worktree.ts` | ~246 | `mergeSliceToMilestone()`, `shouldUseWorktreeIsolation()`, `getMergeToMainMode()`, slice merge guards |
+| `git-service.ts` | ~250 | `mergeSliceToMain()`, conflict resolution, runtime stripping post-merge, `ensureSliceBranch()`, `switchToMain()` |
+| `git-self-heal.ts` | ~86 | `abortAndReset()`, `withMergeHeal()` (merge-specific recovery) |
+| `auto.ts` | ~150 | Merge dispatch guards, `fix-merge` dispatch path, branch-mode routing |
+| `worktree.ts` | ~40 | `getSliceBranchName()`, `ensureSliceBranch()`, `mergeSliceToMain()` delegates |
+| **Test files** | ~11 files | `auto-worktree-merge.test.ts`, `auto-worktree-milestone-merge.test.ts`, merge-related test cases |
+| **Total** | **~770+ lines** | |
+
+### What `mergeMilestoneToMain()` Becomes
+
+The function simplifies dramatically:
+1. Auto-commit any dirty state in worktree
+2. `chdir` back to main repo root
+3. `git checkout main`
+4. `git merge --squash milestone/<id>`
+5. `git commit` with milestone summary
+6. Remove worktree + delete branch
+
+No conflict categorization. No runtime file stripping. No `.gsd/` special handling. Planning artifacts merge cleanly because they live in `.gsd/milestones/M001/`, which doesn't exist on `main` until this merge.
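+
+As a sketch of the equivalent git commands, assuming milestone `M001` with branch `milestone/M001` and worktree path `.gsd/worktrees/M001` (paths illustrative):
+
+```bash
+# 1. Auto-commit any dirty state in the worktree
+git -C .gsd/worktrees/M001 add -A
+git -C .gsd/worktrees/M001 commit -m "chore(M001): auto-commit" || true  # no-op if clean
+# 2-3. Back to the main repo root, on main
+cd /path/to/main-repo   # path illustrative
+git checkout main
+# 4-5. Squash-merge and commit with the milestone summary
+git merge --squash milestone/M001
+git commit -m "feat(M001): milestone complete"
+# 6. Remove worktree and delete the branch
+git worktree remove .gsd/worktrees/M001
+git branch -D milestone/M001
+```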
+
+### What `smartStage()` Becomes
+
+The force-add of `GSD_DURABLE_PATHS` is no longer needed — planning artifacts are not gitignored, so `git add -A` picks them up naturally. The function reduces to:
+
+1. `git add -A`
+2. `git reset HEAD -- <runtime paths>` (unstage runtime files)
+
+The `_runtimeFilesCleanedUp` one-time migration logic can also be removed.
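+
+As a two-command sketch (runtime paths illustrative):
+
+```bash
+git add -A                                      # planning artifacts are no longer gitignored
+git reset HEAD -- .gsd/runtime/ .gsd/activity/  # unstage runtime files
+```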
+
+### What Happens to `handleAgentEnd()`
+
+After any unit completes:
+1. Invalidate caches
+2. `autoCommitCurrentBranch()` — commits on the one and only branch
+3. `verifyExpectedArtifact()` — file is always on the current branch (no branch switching)
+4. Persist completion key
+
+The "Path A fix" (lines 937-953) becomes the only path. No branch mismatch possible.
+
+### What Happens to `fix-merge`
+
+The `fix-merge` dispatch unit type is eliminated. Within a worktree, there are no merges that can conflict. The only merge is milestone→main (squash), and if that conflicts (rare, parallel milestone edge case), it's handled as a one-time resolution at milestone completion — not a dispatch loop.
+
+### Backwards Compatibility
+
+The `shouldUseWorktreeIsolation()` three-tier preference resolution is replaced by a single behavior: worktree isolation is always used. The `git.isolation: "branch"` preference is deprecated.
+
+Projects with existing `gsd/M001/S01` slice branches can still be read by state derivation, but new work never creates slice branches.
+
+### Risks
+
+**1. Parallel milestone code conflicts at squash-merge time**
+
+If two milestones modify the same source file, the second squash-merge to `main` will conflict. Mitigation: `git fetch origin main && git rebase main` before squash-merge. This is standard practice and rare in single-user workflows.
+
+**2. Loss of per-slice git history after squash**
+
+Squash merge collapses all commits into one on `main`. Mitigations:
+- Commit messages tag slices (`feat(M001/S01/T01):`) — filterable with `git log --grep`
+- The milestone branch can be preserved (not deleted) if history is needed
+- Alternative: `git merge --no-ff` instead of `--squash` to keep history on `main`
+
+**3. SQLite DB desync after `git reset`**
+
+If tracked markdown rolls back via `git reset --hard`, the gitignored `gsd.db` doesn't. Mitigation: the importer layer (M001/S02) rebuilds the DB from markdown on startup. The DB is a cache, markdown is truth.
+
+**4. Disk space with multiple worktrees**
+
+Each worktree duplicates the working directory (including `node_modules`). Mitigation: single active milestone at a time (single-user workflow), immediate cleanup after completion.
+
+## Alternatives Considered
+
+### A. Keep slice branches, fix visibility with immediate mini-merges
+
+After `research-slice` or `plan-slice`, immediately merge the slice branch back to the milestone branch. This fixes the loop detection bug but retains all merge complexity.
+
+**Rejected:** Adds another merge path instead of removing the root cause. Still requires conflict resolution, self-healing, branch switching.
+
+### B. Keep `.gsd/` gitignored, bootstrap from git history for manual worktrees
+
+When GSD detects an empty `.gsd/` in a worktree, reconstruct state from the branch's git history using `git show <branch>:.gsd/...`.
+
+**Rejected:** Recovery logic, not architecture. Doesn't fix the fundamental problem of branch-agnostic state. Fails when git history has been rewritten.
+
+### C. Branch-scoped `.gsd/` directories (`.gsd/branches/<branch>/milestones/...`)
+
+Each branch writes to a namespaced subdirectory within `.gsd/`.
+
+**Rejected:** Adds complexity instead of removing it. Requires renaming/moving on branch creation, doesn't work with standard git tools (`git checkout` doesn't rename directories).
+
+## Validation
+
+This architecture was stress-tested by three independent models:
+
+**Gemini 2.5 Pro** identified 6 attack vectors. None broke the core model. Recommendations: pre-flight rebase before squash-merge (adopted), heartbeat locks (already exists), DB rebuild on startup (adopted via M001/S02 importers).
+
+**GPT-5.4 (Codex)** read the full codebase and confirmed the model is sound. Identified that `smartStage()` already force-adds durable paths (validating the tracked-artifact approach) and that `resolveMainWorktreeRoot` in PR #487 is architecturally wrong (adopted — PR to be closed).
+
+**Codebase analysis** confirmed `.gsd/milestones/` is already partially tracked on `main` despite the `.gitignore`, that `GSD_DURABLE_PATHS` exists as a code-level acknowledgment that planning artifacts should be tracked, and that the README already documents the correct runtime-only gitignore pattern.
+
+### Codex (GPT-5.4) Dissent — "No Slice Branches Is a Redesign"
+
+Codex read the full codebase and raised 4 concerns. Each is addressed:
+
+**Concern 1: "Crash after slice done but before integration — today the runtime detects orphaned slice branches and merges them."**
+
+Rebuttal: In the branchless model, there is no integration step to crash between. Slice work is committed directly on the milestone branch. On restart, `deriveState()` reads the branch state as-is. The orphaned-branch recovery path exists solely because of slice branches — removing branches removes the failure mode it recovers from.
+
+**Concern 2: "Concurrent edits to shared root docs (PROJECT.md, DECISIONS.md) from two terminals."**
+
+Rebuttal: Valid edge case. If `/gsd queue` edits `DECISIONS.md` on `main` while auto-mode edits it in a worktree, there's a content conflict at squash-merge time. This is a standard git content conflict — no different from two developers editing the same file. Handled by normal merge resolution. Not caused by or solved by slice branches.
+
+**Concern 3: "Slice→milestone merges provide continuous integration. Removing them pushes conflict discovery to the end."**
+
+Rebuttal: In a single-user sequential workflow, there is nothing to integrate against within a worktree. Each slice builds on the previous one. The only conflict source is `main` diverging (e.g., another milestone merging first), which slice→milestone merges don't catch anyway — they merge within the worktree, not against `main`. Pre-flight rebase before squash-merge catches this more directly.
+
+**Concern 4: "Replace slice branches with another explicit slice-boundary primitive. Don't just delete them."**
+
+Response: Accepted in spirit. Commits with conventional tags (`feat(M001/S01):`, `feat(M001/S01/T01):`) serve as the slice boundary primitive. `git log --grep="M001/S01"` isolates a slice's history. `git revert` targets specific commits. Git tags (`gsd/M001/S01-complete`) can mark slice completion if needed. The boundary primitive is commit metadata, not branches.
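+
+A sketch of these commit-metadata operations (tag name and commit hash illustrative):
+
+```bash
+git log --oneline --grep="M001/S01"   # isolate one slice's history
+git revert <task-commit-sha>          # undo a specific task commit
+git tag gsd/M001/S01-complete         # optionally mark slice completion
+```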
+
+## Action Items
+
+1. Close PR #487 (`resolveMainWorktreeRoot`) — contradicts this architecture
+2. Implement as a GSD milestone with phases:
+ - Update `.gitignore` and force-add existing planning artifacts
+ - Remove slice branch creation/switching/merging code
+ - Simplify `mergeMilestoneToMain()` and `smartStage()`
+ - Remove `fix-merge` dispatch unit
+ - Remove branch-mode isolation (`git.isolation: "branch"`)
+ - Update/delete 11 test files
+ - Update README suggested gitignore
+ - Migration path for existing projects with slice branches
diff --git a/docs/ADR-003-pipeline-simplification.md b/docs/ADR-003-pipeline-simplification.md
new file mode 100644
index 000000000..ddc31f609
--- /dev/null
+++ b/docs/ADR-003-pipeline-simplification.md
@@ -0,0 +1,738 @@
+# ADR-003: Auto-Mode Pipeline Simplification
+
+**Status:** Proposed
+**Date:** 2026-03-18
+**Deciders:** Lex Christopherson
+**Related:** ADR-001 (branchless worktree architecture), ADR-002 (external state directory)
+**Audited by:** Claude Opus 4.6, OpenAI Codex — findings incorporated below.
+
+## Context
+
+GSD auto-mode orchestrates a multi-session pipeline where each "unit" of work runs in a fresh LLM session. The pipeline for a single milestone with N slices and M tasks per slice runs through:
+
+```
+research-milestone → plan-milestone →
+ (research-slice → plan-slice → execute-task × M → complete-slice → reassess-roadmap) × N →
+ validate-milestone → complete-milestone
+```
+
+The exact session count depends on profile. The "quality" profile runs all phases. The "balanced" profile skips slice research by default. The "budget" profile skips milestone research, slice research, reassessment, and milestone validation. This ADR uses the quality profile as the baseline for analysis — it represents the full pipeline and the worst-case ceremony overhead.
+
+For a typical 4-slice, 3-task milestone under the quality profile:
+- 1 research-milestone + 1 plan-milestone
+- Per slice: research-slice (skipped for S01) + plan-slice + 3 execute-task + complete-slice + reassess-roadmap (skipped for last slice, since all slices are done)
+- Per-slice total for S01: 0 + 1 + 3 + 1 + 1 = 6
+- Per-slice total for S02–S03: 1 + 1 + 3 + 1 + 1 = 7 each; S04 skips reassess (it's the last slice), so its total is 6
+- Slices total: 6 + 7 + 7 + 6 = 26
+- Plus: 1 validate-milestone + 1 complete-milestone
+
+**Total: 30 sessions.** Only 12 are task execution. The remaining 18 are pipeline ceremony.
+
+(The "balanced" profile drops slice research for S02-S04: 30 - 3 = 27 sessions. The "budget" profile drops milestone research, all slice research, reassessment, and validation: 30 - 1 - 3 - 3 - 1 = 22 sessions.)
+
+### The Token Tax
+
+Every fresh session re-ingests static context via prompt inlining. The `auto-prompts.ts` builders (1,099 lines) inline the following files into nearly every unit type:
+
+| File | Inlined Into | Changes After |
+|------|-------------|---------------|
+| ROADMAP | research-slice, plan-slice, execute-task (excerpt), complete-slice, reassess, validate, complete-milestone | plan-milestone (rare reassess rewrites) |
+| DECISIONS.md | research-milestone, plan-milestone, research-slice, plan-slice, complete-milestone, validate | Appended occasionally during execution |
+| REQUIREMENTS.md | research-milestone, plan-milestone, research-slice, plan-slice, complete-slice, complete-milestone, validate | Updated during complete-slice |
+| KNOWLEDGE.md | research-milestone, plan-milestone, research-slice, plan-slice, execute-task, complete-slice, complete-milestone, validate | Appended occasionally during execution |
+| PROJECT.md | research-milestone, plan-milestone, complete-milestone, validate | Rarely updated |
+
+The ROADMAP alone is inlined into 7 unit types. It never changes during normal execution. This is a static document being re-tokenized per session at a cost of 5–20K tokens each time.
+
+For the 30-session milestone above (quality profile), context re-ingestion costs approximately:
+- ROADMAP: 7 re-inlines × ~10K tokens = 70K tokens
+- DECISIONS: 6 re-inlines × ~5K tokens = 30K tokens
+- REQUIREMENTS: 8 re-inlines × ~5K tokens = 40K tokens
+- KNOWLEDGE: 8 re-inlines × ~3K tokens = 24K tokens
+- Templates (research, plan, task-plan, etc.): ~2K per inline × ~10 units = 20K tokens
+- Dependency summaries: ~8K per slice plan × 3 non-S01 slices = 24K tokens
+
+**Total context re-ingestion overhead: ~208K tokens per milestone.** This is pure waste — the LLM re-reads documents it already processed in prior sessions, gaining no new information.
+
+### The Lossy Handoff Problem
+
+Each session boundary is a lossy compression step. The research-milestone agent reads the codebase and writes a RESEARCH.md. The plan-milestone agent reads that research and produces a ROADMAP. The research-slice agent reads the ROADMAP and explores the codebase again for its slice scope. The plan-slice agent reads that slice research and produces a PLAN.
+
+This is a game of telephone:
+
+```
+Codebase → [researcher reads code] → RESEARCH.md → [planner reads research] → ROADMAP
+ ↑ often re-reads the same code
+```
+
+The research prompt explicitly says: *"Write for the roadmap planner."* The plan prompt says: *"Trust the research. Don't re-read code."* But planners routinely re-read code because research is a lossy compression — a summary of what one LLM session saw, not the thing itself. The fidelity loss compounds at each handoff.
+
+### The Machinery Tax
+
+The multi-session pipeline requires extensive orchestration machinery to handle edge cases, failures, and recovery:
+
+| File | Lines | Purpose |
+|------|-------|---------|
+| `auto-recovery.ts` | 591 | Artifact resolution, loop remediation, skip/rerun logic |
+| `auto-stuck-detection.ts` | 220 | Dispatch loop detection, lifetime caps, stub recovery |
+| `auto-idempotency.ts` | 150 | Skip completed units, phantom loop detection, stale key recovery |
+| `session-forensics.ts` | 536 | Post-mortem analysis, crash briefings, deep diagnostics |
+| `auto-timeout-recovery.ts` | 262 | Resume after timeout, recovery briefing synthesis |
+| `crash-recovery.ts` | 108 | Lock file management, crash detection |
+| `auto-post-unit.ts` | 591 | Post-agent processing, verification, commits, state sync |
+| `auto-verification.ts` | 229 | Post-task verification enforcement |
+| `verification-gate.ts` | 643 | Test/lint/audit gate runner |
+| `doctor-proactive.ts` | 292 | Health checks, proactive healing, escalation detection |
+| **Total** | **3,622** | **Recovery, verification, and post-processing** |
+
+This is 3,622 lines of code managing the complexity of a 15-rule dispatch table across 13 unit types. Much of this machinery exists because the pipeline has so many sessions that failures, timeouts, and stuck states are statistically likely.
+
+### The Ceremony Sessions
+
+Six of the 13 unit types produce no code. They exist purely to manage the pipeline:
+
+| Unit Type | What It Does | Sessions per Milestone (quality, 4-slice) |
+|-----------|-------------|----------------------|
+| research-milestone | Reads codebase, writes RESEARCH.md | 1 |
+| research-slice | Reads codebase for slice scope, writes slice RESEARCH.md | 3 (skipped for S01) |
+| complete-slice | Re-reads ROADMAP + plan + all task summaries, writes slice SUMMARY.md + UAT.md | 4 |
+| reassess-roadmap | Re-reads ROADMAP + slice summary, almost always says "roadmap is fine" | 3 (skipped after last slice) |
+| validate-milestone | Re-reads ROADMAP + all slice summaries, writes VALIDATION.md | 1 |
+| complete-milestone | Re-reads ROADMAP + all slice summaries, writes SUMMARY.md | 1 |
+
+Total: 1 + 3 + 4 + 3 + 1 + 1 = **13 ceremony sessions** (under quality profile), each consuming 12–37K tokens of prompt context. Under the balanced profile this drops to 9 (no slice research). These sessions burn tokens re-reading documents that earlier sessions produced, only to emit intermediate artifacts that downstream sessions re-read in turn.
+
+### Root Cause
+
+The pipeline was designed around a paradigm where:
+1. LLM context windows are small (32K–100K tokens)
+2. Sessions are expensive, so specialize each one
+3. Handoffs between specialized agents produce better results than generalist sessions
+4. Research → plan → execute is the "correct" decomposition of intellectual work
+
+With 200K+ token context windows and prompt caching, assumptions 1 and 2 are obsolete. Assumption 3 is demonstrably false — handoffs lose fidelity. Assumption 4 confuses human workflow patterns with LLM-optimal patterns. An LLM with tool access is already researching while it plans. Forcing it to serialize research into a document, then read that document in a new session, is an artificial bottleneck.
+
+## Decision
+
+**Collapse the pipeline from 13 unit types to 5. Merge research into planning. Fold completion into post-unit mechanical processing. Replace LLM-driven validation with mechanical verification aggregation.**
+
+### The Simplified Pipeline
+
+```
+plan-milestone → (plan-slice → execute-task × M) × N → done
+```
+
+Note: `discuss` is an interactive human-facing session, not an auto-mode unit — it's not counted in session math. It continues to work as-is.
+
+For the same 4-slice, 3-task milestone:
+- 1 plan-milestone (S01 plan + task plans produced inline via single-slice fast path if applicable)
+- S01: plan-slice skipped (milestone planner already explored) + 3 execute-task = 3
+- S02–S04: plan-slice + 3 execute-task = 4 each × 3 slices = 12
+
+**Total: 1 + 3 + 12 = 16 sessions** (down from 30). The 14 eliminated sessions were the highest-waste ones — each re-ingested context for minimal value.
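The session math above can be sketched as a closed-form check. This is illustrative only — the real pipeline derives state from files via `deriveState()`, not from a formula:

```typescript
// Illustrative sketch of the simplified pipeline's session count.
// S01's plan-slice is skipped because the milestone planner already explored.
function newPipelineSessions(slices: number, tasksPerSlice: number): number {
  const planMilestone = 1;
  const planSlice = slices - 1;               // one per slice, skipped for S01
  const executeTask = slices * tasksPerSlice; // one per task
  return planMilestone + planSlice + executeTask;
}

// 4 slices × 3 tasks → 1 + 3 + 12 = 16
```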
+
+### Unit Type Changes
+
+#### 1. Merge research-milestone INTO plan-milestone
+
+**Current:** Two sessions. Researcher explores codebase, writes RESEARCH.md. Planner reads RESEARCH.md, writes ROADMAP.
+
+**New:** One session. The plan-milestone agent explores the codebase directly and produces the ROADMAP. It has full tool access — it can read files, run commands, search code. The "research" happens naturally as part of planning, not as a serialized intermediary.
+
+**What changes:**
+- The plan-milestone prompt gains the research-milestone's exploration instructions: "Explore relevant code, check technologies, identify constraints."
+- The plan-milestone prompt drops "Trust the research" — there is no research document to trust.
+- The RESEARCH.md artifact becomes optional. If the planner wants to capture notes for downstream reference, it can write one. But it's not required, and downstream units don't depend on it.
+- Skill discovery instructions move into the plan-milestone prompt.
+- The research-milestone template (`prompts/research-milestone.md`) is retained but only used when explicitly dispatched via `/gsd dispatch research`.
+
+**Token savings:** ~1 full session (12–37K tokens of prompt context) + the RESEARCH.md document no longer re-inlined into plan-milestone (~5–15K tokens).
+
+**Quality impact:** Positive. The planner has direct access to the codebase instead of reading a lossy summary. It can verify assumptions in real time instead of trusting a prior session's interpretation.
+
+#### 2. Merge research-slice INTO plan-slice
+
+**Current:** Two sessions per non-S01 slice. Slice researcher explores codebase for slice scope, writes slice RESEARCH.md. Slice planner reads that research, writes PLAN.md + task plans.
+
+**New:** One session. The plan-slice agent explores the relevant code directly and produces the slice plan with task plans.
+
+**What changes:**
+- The plan-slice prompt gains exploration instructions: "Read the relevant code for this slice's scope before decomposing."
+- The plan-slice prompt drops "Trust the research" — there is no slice research document.
+- Slice RESEARCH.md becomes optional (same as milestone research above).
+- The research-slice template is retained for explicit dispatch.
+- The `skip_slice_research` preference becomes the default behavior rather than an opt-in.
+- The dispatch rule "planning (no research, not S01) → research-slice" is removed.
+
+**Token savings:** ~1 session per non-S01 slice × (N-1) slices. For a 4-slice milestone: 3 sessions × 12–37K tokens = 36–111K tokens.
+
+**Quality impact:** Positive. The planner can read actual code files instead of a summary. It verifies file paths, function signatures, and patterns directly rather than trusting a researcher's notes.
+
+#### 3. Fold complete-slice INTO mechanical post-unit processing
+
+**Current:** After all tasks in a slice complete, `deriveState()` emits the `summarizing` phase, dispatching a separate complete-slice LLM session that re-reads the ROADMAP, slice plan, and ALL task summaries to write a slice SUMMARY.md and UAT.md.
+
+**New:** Slice completion moves to a **post-gate mechanical closeout** in `auto-post-unit.ts`, not into the final executor's prompt. After the last execute-task's verification gate passes:
+
+1. The post-unit processing detects that all tasks in the slice are done (same check `deriveState()` uses to emit `summarizing`).
+2. It runs mechanical slice completion: aggregate task summaries into a SUMMARY.md using structured frontmatter, generate a UAT.md from the slice plan's verification section, mark the slice done in the ROADMAP.
+3. If the mechanical summary is insufficient (complex slices where structured aggregation loses important narrative), the system detects low quality (e.g., summary is below a character threshold) and dispatches a standalone complete-slice LLM session as recovery.
+
+**Why post-gate, not in the executor prompt:**
+- Codex audit identified that folding completion into execute-task creates a verification-retry ordering problem: if the executor writes SUMMARY.md and marks the slice done in the ROADMAP before the verification gate runs, a gate failure would retry against incorrect derived state (the slice appears complete when it isn't).
+- Post-gate processing runs after verification succeeds, so state transitions are always consistent.
+- The executor's context budget is fully available for its actual work.
+
+**What changes in `deriveState()`:**
+- The `summarizing` phase still exists in state derivation (all tasks done, slice not marked complete).
+- The `summarizing → complete-slice` dispatch rule is demoted to a fallback. In the normal path, post-unit processing handles the transition synchronously.
+- If post-unit mechanical completion fails or produces low-quality output, the `summarizing` phase still exists as a dispatch target and the system falls back to dispatching a complete-slice LLM session.
+
+**What changes:**
+- `auto-post-unit.ts` gains a `mechanicalSliceCompletion()` function.
+- The complete-slice dispatch rule is removed from the default path but retained as a fallback.
+- The complete-slice template is retained for recovery and explicit dispatch.
+- The `summarizing` phase in `state.ts` is unchanged — it serves as the fallback trigger if mechanical completion doesn't run.
+
+**Full completion contract preserved:** The mechanical completion writes all three required artifacts (SUMMARY.md, UAT.md, ROADMAP checkbox) — matching the current complete-slice contract. It also handles the REQUIREMENTS.md updates and DECISIONS.md appendix that the current complete-slice prompt performs; the KNOWLEDGE.md appendix is deliberately skipped, since executors already write those entries during execution (see Risk 5 below for details).
+
+**Token savings:** ~1 session per slice × N slices. For a 4-slice milestone: 4 sessions × 12–37K tokens = 48–148K tokens.
+
+**Quality impact:** For most slices, the mechanical summary is sufficient — it aggregates structured frontmatter fields (provides, requires, affects, key_files, key_decisions, patterns_established) from task summaries. For complex slices with important narrative context, the LLM fallback preserves quality.
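As a sketch of what "aggregates structured frontmatter fields" means in practice (field names come from the summary schema above; the helper itself is hypothetical):

```typescript
// Hypothetical helper: merge one frontmatter list field across task summaries,
// dropping duplicates while preserving first-seen order.
interface TaskFrontmatter {
  provides: string[];
  key_files: string[];
}

function aggregateField(
  summaries: TaskFrontmatter[],
  field: keyof TaskFrontmatter,
): string[] {
  return [...new Set(summaries.flatMap(s => s[field]))];
}
```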
+
+#### 4. Eliminate reassess-roadmap (make opt-in)
+
+**Current:** After every slice completion, a reassess-roadmap session re-reads the ROADMAP and slice summary, then almost always writes "roadmap is fine."
+
+**New:** Reassessment is eliminated by default. The plan-slice agent for the next slice serves as the natural reassessment point — it reads the ROADMAP and prior slice summaries, and can adjust its plan if the ground has shifted.
+
+**What changes:**
+- The reassess-roadmap dispatch rule fires only when the `reassess_after_slice` preference is enabled (default: off, was effectively always-on).
+- The plan-slice prompt gains a reassessment preamble: "Before planning this slice, verify that the roadmap's assumptions still hold given prior slice summaries. If the remaining roadmap needs adjustment, modify it before proceeding."
+- The `checkNeedsReassessment()` function in auto-prompts.ts becomes a preference gate, not a mandatory check.
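A minimal sketch of that preference gate — the `AutoPreferences` shape and function name are assumptions; only the `reassess_after_slice` key comes from the ADR:

```typescript
// Hypothetical preference gate: reassessment fires only when opted in,
// and never after the final slice (matching the current skip behavior).
interface AutoPreferences {
  reassess_after_slice?: boolean; // default: off
}

function shouldReassess(prefs: AutoPreferences, isFinalSlice: boolean): boolean {
  if (isFinalSlice) return false;
  return prefs.reassess_after_slice === true;
}
```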
+
+**Token savings:** ~1 session per completed non-final slice, i.e. up to (N-1) sessions minus any already skipped. For a 4-slice milestone under the quality profile: 3 sessions × 12–37K tokens = 36–111K tokens.
+
+**Quality impact:** Neutral. The reassess prompt says *"Bias strongly toward 'roadmap is fine.'"* — acknowledging that most reassessments produce no change. JIT reassessment during the next plan-slice is more informed (has the next slice's context) and costs zero additional tokens.
+
+#### 5. Replace validate-milestone with mechanical verification
+
+**Current:** An LLM session re-reads the ROADMAP and all slice summaries, checks success criteria against delivery evidence, and writes a VALIDATION.md with a verdict. It also inlines UAT-RESULT artifacts from slices with `uat_dispatch` enabled.
+
+**New:** The system mechanically aggregates verification results from all tasks and slices. The canonical verification data sources are:
+
+1. **`T##-VERIFY.json`** files (written by `writeVerificationJSON()` in `verification-evidence.ts`) — machine-readable per-task verification results with command, exit code, verdict, duration, and blocking status.
+2. **`S##-UAT-RESULT.md`** files (when `uat_dispatch` is enabled) — human or artifact-driven UAT outcomes.
+3. **Task summary frontmatter** `verification_result` field — a human-readable pass/fail string (not structured, used as a secondary signal).
+
+The aggregator reads `T##-VERIFY.json` as the primary source of truth, supplements with UAT-RESULT artifacts, and produces a deterministic VALIDATION.md.
+
+**What changes:**
+- A new `aggregateMilestoneVerification()` function collects `T##-VERIFY.json` files and `S##-UAT-RESULT.md` files across all slices.
+- The function produces a VALIDATION.md with per-task and per-slice pass/fail status, UAT evidence, and an overall verdict.
+- The LLM-driven validate-milestone session is removed from the default pipeline.
+- The validate-milestone template is retained for explicit dispatch (users who want LLM-driven validation can run `/gsd dispatch validate`).
+- The `skip_milestone_validation` preference (which writes a pass-through VALIDATION.md) becomes the default behavior, with the mechanical aggregation replacing it.
+
+```typescript
+async function aggregateMilestoneVerification(base: string, mid: string): Promise<{ verdict: string; checks: EvidenceCheckJSON[]; uatResults: { sliceId: string; content: string }[] }> {
+ const roadmap = parseRoadmap(await loadFile(resolveMilestoneFile(base, mid, "ROADMAP")));
+ const checks: EvidenceCheckJSON[] = [];
+ const uatResults: { sliceId: string; content: string }[] = [];
+
+ for (const slice of roadmap.slices) {
+ // Primary source: T##-VERIFY.json files (machine-readable, written by verification-gate.ts)
+ const tDir = resolveTasksDir(base, mid, slice.id);
+ if (tDir) {
+ const verifyFiles = resolveTaskFiles(tDir, "VERIFY");
+ for (const file of verifyFiles) {
+ const content = await loadFile(join(tDir, file));
+ if (content) {
+ const evidence: EvidenceJSON = JSON.parse(content);
+ checks.push(...evidence.checks);
+ }
+ }
+ }
+
+ // Secondary source: S##-UAT-RESULT.md (when uat_dispatch enabled)
+ const uatResultFile = resolveSliceFile(base, mid, slice.id, "UAT-RESULT");
+ if (uatResultFile) {
+ const uatContent = await loadFile(uatResultFile);
+ if (uatContent) uatResults.push({ sliceId: slice.id, content: uatContent });
+ }
+ }
+
+  // Guard against a vacuous pass: no verification evidence at all is not a pass.
+  const allChecksPassed = checks.length > 0 && checks.every(c => c.verdict === "pass");
+ const hasUatFailures = uatResults.some(r => r.content.includes("❌") || r.content.includes("FAIL"));
+ const verdict = allChecksPassed && !hasUatFailures ? "pass" : "needs-attention";
+
+ return { verdict, checks, uatResults };
+}
+```
+
+**Token savings:** 1 session × 12–37K tokens. This session is one of the most context-heavy — it inlines the ROADMAP + all slice summaries + all UAT results.
+
+**Quality impact:** Positive. Mechanical verification is deterministic and complete. LLM validation is subjective and can miss things. The verification gate and UAT system already do the hard work — the validate session was a redundant re-check. The `T##-VERIFY.json` artifacts are the canonical machine-readable source, not task summary frontmatter.
+
+#### 6. Replace complete-milestone with mechanical completion
+
+**Current:** An LLM session re-reads the ROADMAP and all slice summaries to write a SUMMARY.md.
+
+**New:** The system produces a milestone summary mechanically by aggregating slice summaries. The summary includes: milestone title, success criteria with pass/fail status, slice completion dates, key decisions made, and patterns established (all extracted from structured frontmatter in slice summaries).
+
+**What changes:**
+- A new `generateMilestoneSummary()` function reads all slice SUMMARY.md files, extracts frontmatter fields, and produces a structured milestone SUMMARY.md.
+- The complete-milestone dispatch rule is replaced with a synchronous post-processing step after the validation artifact is written.
+- The complete-milestone template is retained for explicit dispatch.
+
+**What changes in `deriveState()`:**
+- The `validating-milestone` and `completing-milestone` phases still exist in state derivation.
+- When mechanical validation + completion runs synchronously in post-unit processing, these phases are transient — `deriveState()` emits them, but the mechanical processing writes the VALIDATION.md and SUMMARY.md artifacts before the next dispatch cycle, so the phases resolve immediately.
+- If mechanical processing fails, the phases remain as dispatch targets and the system falls back to dispatching LLM sessions for validation and/or completion.
+
+**Token savings:** 1 session × 12–37K tokens.
+
+**Quality impact:** Neutral. Milestone summaries are archival — they capture what happened, not make decisions. Mechanical aggregation of structured frontmatter is more reliable than an LLM re-interpreting task summaries.
+
+### Dispatch Table Changes
+
+**Current: 15 rules.**
+
+```
+1. rewrite-docs (override gate)
+2. summarizing → complete-slice
+3. run-uat (post-completion)
+4. reassess-roadmap (post-completion)
+5. needs-discussion → stop
+6. pre-planning (no context) → stop
+7. pre-planning (no research) → research-milestone
+8. pre-planning (has research) → plan-milestone
+9. planning (no research, not S01) → research-slice
+10. planning → plan-slice
+11. replanning-slice → replan-slice
+12. executing → execute-task (recovery)
+13. executing → execute-task
+14. validating-milestone → validate-milestone
+15. completing-milestone → complete-milestone
+```
+
+**New: 12 rules.**
+
+```
+1. rewrite-docs (override gate) [unchanged]
+2. summarizing → complete-slice [FALLBACK ONLY — fires when mechanical completion didn't run]
+3. run-uat (post-completion) [unchanged, preference-gated]
+4. needs-discussion → stop [unchanged]
+5. pre-planning (no context) → stop [unchanged]
+6. pre-planning → plan-milestone [rules 7+8 merged — research folded in]
+7. planning → plan-slice [rules 9+10 merged — research folded in]
+8. replanning-slice → replan-slice [unchanged]
+9. executing → execute-task (recovery) [unchanged]
+10. executing → execute-task [unchanged]
+11. validating-milestone → validate-milestone [FALLBACK ONLY — fires when mechanical validation didn't run]
+12. completing-milestone → complete-milestone [FALLBACK ONLY — fires when mechanical completion didn't run]
+```
+
+Note: Rules 2, 11, and 12 are retained as **fallbacks** for cases where mechanical processing fails. They do not fire in the normal path because post-unit processing writes the required artifacts before the next dispatch cycle. This means `deriveState()` is unchanged — it still emits `summarizing`, `validating-milestone`, and `completing-milestone` phases. The change is that these phases are normally resolved mechanically before dispatch evaluates them.
+
+**Removed rules (no longer in default path):**
+- `reassess-roadmap` — folded into next plan-slice (or opt-in preference)
+- `pre-planning (no research) → research-milestone` — merged into plan-milestone
+- `planning (no research, not S01) → research-slice` — merged into plan-slice
+
+### Prompt Changes
+
+#### plan-milestone.md — gains exploration instructions
+
+Add before the planning steps:
+
+```markdown
+## Explore First, Then Decompose
+
+You have full tool access. Before decomposing into slices:
+1. Explore the relevant codebase — read key files, understand existing patterns, identify constraints.
+2. For unfamiliar libraries, use `resolve_library` / `get_library_docs`.
+3. Skill Discovery ({{skillDiscoveryMode}}):{{skillDiscoveryInstructions}}
+
+Narrate key findings as you go. If findings are significant enough to benefit downstream slice planners, write {{researchOutputPath}} — but only if the content would genuinely help. Don't write a research doc just because the template exists.
+```
+
+#### plan-slice.md — gains exploration + reassessment preamble
+
+Add before the planning steps:
+
+```markdown
+## Verify Roadmap Assumptions
+
+Before planning this slice, check whether the roadmap's assumptions still hold:
+- Review prior slice summaries (inlined above). Did anything change that affects this slice?
+- If the remaining roadmap needs adjustment, modify the unchecked slices in {{roadmapPath}} before proceeding.
+
+## Explore Slice Scope
+
+Read the relevant code for this slice before decomposing:
+1. Check the files and modules this slice will touch.
+2. Verify the approach described in the roadmap against the actual codebase state.
+3. If the roadmap's description of this slice is wrong or outdated, adjust your plan accordingly.
+```
+
+### Context Inlining Changes
+
+#### Reduce inlining for planning sessions — provide paths for stable documents
+
+Planning sessions (plan-milestone, plan-slice) currently inline ROADMAP, DECISIONS, REQUIREMENTS, KNOWLEDGE, and PROJECT. Since these sessions now also explore the codebase (merged research), the total prompt size grows. To offset this, stable documents should be provided as file paths rather than inlined content for planning sessions.
+
+**Current pattern:**
+```typescript
+inlined.push(await inlineFile(roadmapPath, roadmapRel, "Milestone Roadmap"));
+```
+
+**New pattern for plan-milestone/plan-slice:**
+```typescript
+sourcePaths.push(`- Milestone Roadmap: \`${roadmapRel}\` — read this for the full slice decomposition`);
+```
+
+The prompt header changes from "All relevant context has been preloaded below" to "Source files are listed below. Read them before proceeding."
+
+**What stays inlined:**
+- **Task plan** in execute-task (it's the executor's authoritative contract — must be in prompt)
+- **Slice plan excerpt** in execute-task (goal/demo/verification — small and task-specific)
+- **Prior task summaries** in execute-task (carry-forward context — already budget-managed)
+- **Milestone context** in plan-milestone (it's the starting input — relatively small)
+
+**What moves to file-path references:**
+- ROADMAP in plan-slice, complete-slice, reassess, validate, complete-milestone
+- DECISIONS.md everywhere except execute-task (where it's already omitted for minimal inline level)
+- REQUIREMENTS.md everywhere except execute-task
+- KNOWLEDGE.md everywhere (already uses `inlineFileSmart` for execute-task)
+- PROJECT.md everywhere
+
+**Interaction with budget engine:** The current budget engine (`context-budget.ts`) truncates inlined content when it exceeds budget. Removing inlining means the LLM reads the full file via tool call. For most documents (ROADMAP ~3-10K chars, DECISIONS ~2-5K chars), the full read is within budget. For very large REQUIREMENTS.md files (>30K chars), the LLM may need to use the DB-scoped query (`inlineRequirementsFromDb` with slice scoping) or the compact formatter. The path reference should note: "For large files, use scoped queries."
+
+**Risk: LLMs might not read referenced files.**
+
+This is the most significant behavioral risk in this ADR. Inlined content forces processing. Path references require the LLM to decide to read. Mitigation:
+
+1. **Mandatory read directives.** The prompt says "You MUST read the following files before proceeding" with a numbered list of 2-3 critical files. Not "read as needed" — a direct instruction.
+2. **Verification.** The plan-slice prompt requires citing the ROADMAP's slice description in its output (slice title, risk level, depends). If these don't match, the planner didn't read it.
+3. **Phased rollout.** Phase 4 (context reduction) is separate from Phase 1 (research merge). This allows measuring whether path references degrade plan quality before full rollout.
+4. **Fallback.** If path references prove unreliable, restore inlining for critical documents only (ROADMAP in plan-slice). The budget engine still handles truncation.
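Mitigation 2 could be checked mechanically along these lines. The citation fields mirror the roadmap slice description named above (title, risk level, depends); the comparison helper itself is hypothetical:

```typescript
// Hypothetical check: did the slice planner actually read the ROADMAP?
// Compare the citation fields its output must include against the roadmap entry.
interface SliceCitation {
  title: string;
  risk: string;
  depends: string[];
}

function planCitesRoadmap(roadmap: SliceCitation, plan: SliceCitation): boolean {
  return (
    roadmap.title === plan.title &&
    roadmap.risk === plan.risk &&
    roadmap.depends.length === plan.depends.length &&
    roadmap.depends.every((dep, i) => dep === plan.depends[i])
  );
}
```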
+
+**Token savings (Phase 4 only):** Eliminates ~150K tokens of re-ingestion per milestone (revised from 208K — the execute-task sessions retain inlined content). The LLM reads files as needed via tool calls, cached by API prompt caching. Net savings are ~50-60% of the re-ingestion overhead, since the LLM still reads most files once per session.
+
+### Post-Unit Processing Changes
+
+#### Mechanical slice completion
+
+After the last execute-task's verification gate passes and post-unit processing detects all tasks done:
+
+```typescript
+async function mechanicalSliceCompletion(base: string, mid: string, sid: string): Promise<boolean> {
+ const tDir = resolveTasksDir(base, mid, sid);
+ if (!tDir) return false;
+
+ const summaryFiles = resolveTaskFiles(tDir, "SUMMARY").sort();
+ const taskSummaries = await Promise.all(
+ summaryFiles.map(async f => ({ file: f, summary: parseSummary(await loadFile(join(tDir, f)) ?? "") }))
+ );
+
+ // Aggregate structured frontmatter
+ const allProvides = taskSummaries.flatMap(t => t.summary.frontmatter.provides);
+ const allKeyFiles = taskSummaries.flatMap(t => t.summary.frontmatter.key_files);
+ const allDecisions = taskSummaries.flatMap(t => t.summary.frontmatter.key_decisions);
+ const allPatterns = taskSummaries.flatMap(t => t.summary.frontmatter.patterns_established);
+ const allAffects = taskSummaries.flatMap(t => t.summary.frontmatter.affects);
+
+ // Build slice SUMMARY.md from aggregated frontmatter
+ const sliceSummary = formatSliceSummary({ sid, provides: allProvides, keyFiles: allKeyFiles, ... });
+
+ // Build UAT.md from slice plan's Verification section
+  const slicePlanContent = await loadFile(resolveSliceFile(base, mid, sid, "PLAN"));
+  const verificationSection = extractMarkdownSection(slicePlanContent ?? "", "Verification");
+ const sliceUat = formatSliceUat(sid, verificationSection);
+
+  // Write the three completion artifacts (summary, UAT, roadmap checkbox)
+ writeFileSync(sliceSummaryPath, sliceSummary);
+ writeFileSync(sliceUatPath, sliceUat);
+ markSliceDoneInRoadmap(base, mid, sid);
+
+ // Handle REQUIREMENTS.md updates (currently done by complete-slice prompt step 5)
+ // Mechanical: mark requirements as Validated if all tasks covering them passed verification.
+ await mechanicalRequirementsUpdate(base, mid, sid, taskSummaries);
+
+ // Handle DECISIONS.md appendix (currently done by complete-slice prompt step 8)
+ // Mechanical: collect key_decisions from task summaries not already in DECISIONS.md
+ await appendNewDecisions(base, taskSummaries);
+
+ // Handle KNOWLEDGE.md appendix (currently done by complete-slice prompt step 9)
+ // Not mechanical — skip. Knowledge entries require judgment about what's genuinely useful.
+ // The executor tasks already write KNOWLEDGE.md entries during execution (step 13 in execute-task).
+
+ return true;
+}
+```
+
+**Fallback:** If `mechanicalSliceCompletion()` fails or produces output below a quality threshold (e.g., summary under 200 chars for a multi-task slice), the `summarizing` phase persists in `deriveState()` and the dispatch table's retained fallback rule dispatches a complete-slice LLM session.
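The quality-threshold trigger might look like this — the 200-char figure is the ADR's own example, and the function name is hypothetical:

```typescript
// Hypothetical fallback trigger: a multi-task slice whose mechanical summary
// came out suspiciously short gets a complete-slice LLM session instead.
const MIN_SUMMARY_CHARS = 200;

function needsLlmFallback(summary: string, taskCount: number): boolean {
  return taskCount > 1 && summary.trim().length < MIN_SUMMARY_CHARS;
}
```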
+
+#### Mechanical milestone validation
+
+See `aggregateMilestoneVerification()` above (Section 5). Reads `T##-VERIFY.json` and `S##-UAT-RESULT.md` as canonical sources.
+
+#### Mechanical milestone summary
+
+```typescript
+async function generateMilestoneSummary(base: string, mid: string): Promise<string> {
+ const roadmap = parseRoadmap(await loadFile(resolveMilestoneFile(base, mid, "ROADMAP")));
+ const sliceSummaries = [];
+
+ for (const slice of roadmap.slices) {
+ const content = await loadFile(resolveSliceFile(base, mid, slice.id, "SUMMARY"));
+ if (content) sliceSummaries.push({ id: slice.id, summary: parseSummary(content) });
+ }
+
+ // Aggregate frontmatter fields across all slice summaries
+ // Produce structured markdown from the aggregation
+ return formatMilestoneSummary(roadmap, sliceSummaries);
+}
+```
+
+## Consequences
+
+### Session Count Reduction
+
+Counts assume no fallback sessions fire (mechanical processing succeeds). "Current" uses quality profile. "New" is the simplified pipeline.
+
+| Milestone Shape | Current Sessions (quality) | New Sessions | Reduction |
+|----------------|---------------------------|--------------|-----------|
+| 1 slice, 2 tasks | 9 | 3 | 67% |
+| 2 slices, 3 tasks | 17 | 8 | 53% |
+| 4 slices, 3 tasks | 30 | 16 | 47% |
+| 6 slices, 4 tasks | 46 | 31 | 33% |
+
+**Derivation (4-slice, 3-task):**
+
+Current (quality): research-milestone(1) + plan-milestone(1) + [research-slice(0) + plan-slice(1) + execute(3) + complete-slice(1) + reassess(1)] for S01 + [research-slice(1) + plan-slice(1) + execute(3) + complete-slice(1) + reassess(1)] × 2 for S02-S03 + [research-slice(1) + plan-slice(1) + execute(3) + complete-slice(1) + reassess(0)] for S04 + validate(1) + complete-milestone(1) = 2 + 6 + 14 + 6 + 2 = 30.
+
+New: plan-milestone(1) + [execute(3)] for S01 + [plan-slice(1) + execute(3)] × 3 for S02-S04 = 1 + 3 + 12 = 16.
+
+### Token Savings
+
+Eliminated sessions are the primary savings mechanism. Context re-ingestion reduction is a secondary effect of having fewer sessions (each of the remaining sessions still ingests some context). These are NOT additive — the re-ingestion savings are already captured in the eliminated session savings.
+
+| Source | Per Milestone (4-slice, 3-task) |
+|--------|-------------------------------|
+| Eliminated research sessions (1 milestone + 3 slice) | 48–148K tokens |
+| Eliminated complete-slice sessions (4) | 48–148K tokens |
+| Eliminated reassess sessions (3) | 36–111K tokens |
+| Eliminated validate session (1) | 12–37K tokens |
+| Eliminated complete-milestone session (1) | 12–37K tokens |
+| **Total estimated savings** | **~156–481K tokens** |
+
+At current Opus pricing ($15/MTok input, $75/MTok output — as of March 2026), the input savings alone are **$2.34–$7.22 per milestone**. Output savings are harder to estimate but typically 30-50% of input.
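The arithmetic behind those dollar figures, input side only, at the rate stated above:

```typescript
// Input-token cost at the quoted Opus rate ($15 per million input tokens).
const OPUS_INPUT_USD_PER_MTOK = 15;

function inputCostUsd(tokens: number): number {
  return (tokens / 1_000_000) * OPUS_INPUT_USD_PER_MTOK;
}

// 156K tokens → $2.34; 481K tokens → ~$7.22
```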
+
+### Code Deletion
+
+| File / Section | Lines | Impact |
+|----------------|-------|--------|
+| `auto-dispatch.ts` — 3 removed default-path rules | ~40 | Simpler dispatch table |
+| `auto-prompts.ts` — 5 builders become fallback-only | ~250 | `buildResearchMilestonePrompt`, `buildResearchSlicePrompt`, `buildCompleteSlicePrompt`, `buildValidateMilestonePrompt`, `buildCompleteMilestonePrompt` move to explicit-dispatch codepath |
+| `auto-prompts.ts` — reduced inlining (Phase 4) | ~100 | Remove `inlineFile` calls for static docs in planning prompts, replace with path references |
+| Context re-ingestion helpers (Phase 4) | ~50 | `inlineDecisionsFromDb`, `inlineRequirementsFromDb`, `inlineProjectFromDb` simplified for planning paths |
+| **Total deletable** | **~440** | |
+
+### Code Added
+
+| File / Section | Lines | Impact |
+|----------------|-------|--------|
+| `auto-prompts.ts` — plan-milestone exploration | ~30 | Research instructions merged in |
+| `auto-prompts.ts` — plan-slice reassessment + exploration | ~25 | Reassessment + exploration preamble |
+| `auto-post-unit.ts` — `mechanicalSliceCompletion()` | ~80 | Structured frontmatter aggregation, UAT generation, artifact writes |
+| `auto-verification.ts` — `aggregateMilestoneVerification()` | ~60 | T##-VERIFY.json + UAT-RESULT aggregation |
+| `auto-unit-closeout.ts` — `generateMilestoneSummary()` | ~60 | Mechanical summary generation |
+| **Total added** | **~255** | |
+
+### Net Impact
+
+- **~185 lines net deleted** (440 deleted - 255 added)
+- **3 fewer default-path dispatch rules** (15 → 12, with 3 retained as fallbacks)
+- **6 fewer unit types in the default pipeline** (13 → 7 active; 6 retained for fallback/explicit dispatch)
+- **~156–481K fewer tokens per milestone**
+- **14 fewer session handoffs per 4-slice milestone under quality profile** (each a potential failure/timeout point)
+- `auto-prompts.ts` goes from ~1,099 lines to ~924 lines (~175 lines net reduction)
+
+### What Stays Unchanged
+
+- The **discuss** flow (guided-flow.ts, interactive discussion)
+- The **dispatch table architecture** (declarative rules, first-match-wins)
+- The **fresh session per unit** pattern (still used for plan-slice and execute-task)
+- The **state derivation** (`deriveState()` reads files, derives phase — all existing phases preserved)
+- The **verification gate** (runs tests/lint after each task)
+- The **worktree isolation** model
+- The **crash recovery**, **idempotency**, and **stuck detection** systems (fewer sessions means these fire less often, but the safety nets remain)
+- The **metrics** and **cost tracking** systems
+- The **parallel orchestrator** for independent milestones
+- All prompt templates are **retained** — for fallback, recovery, and explicit dispatch via `/gsd dispatch <unit>`
+
+### What Gets Simpler Downstream
+
+Less machinery is needed when sessions are fewer:
+
+- **Fewer recovery paths.** 14 fewer sessions means 14 fewer opportunities for timeouts, stuck states, and missing artifacts.
+- **Simpler `auto-post-unit.ts`.** Reassess dispatch logic removed (opt-in only). Mechanical completion/validation added but replaces more complex LLM-session dispatch.
+- **Simpler `auto-stuck-detection.ts`.** Fewer unit types means fewer dispatch-loop patterns to detect.
+- **Simpler `auto-idempotency.ts`.** Fewer completed-key types to track.
+
+These simplifications are downstream effects — they don't need to happen in the same change. But they represent ~500-1000 lines of code that becomes significantly simpler or unnecessary as a consequence of this ADR.
+
+## Risks
+
+### 1. Plan-milestone sessions become heavier
+
+Merging research into planning makes plan-milestone sessions longer. The planner must explore the codebase AND decompose into slices in a single session. Risk: the session hits context pressure before finishing.
+
+**Mitigation:** Plan-milestone is the session that benefits most from a large context window, and modern windows (200K+ tokens) easily accommodate exploration plus planning. The single-slice fast path in plan-milestone.md already combines planning with slice-plan and task-plan writing in one session — this extends that pattern. Phase 4 (reducing inlining for planning sessions) further offsets the added exploration work.
+
+**Phase ordering note:** Phase 1 (merge research into planning) adds exploration to plan-milestone. If Phase 4 (reduce inlining) hasn't landed yet, the plan-milestone prompt includes both exploration instructions AND the full inlined context. This is the most context-heavy state. To mitigate, Phase 1 should also reduce inlining for plan-milestone/plan-slice specifically — moving DECISIONS, REQUIREMENTS, and PROJECT to path references while keeping ROADMAP and CONTEXT inlined. This is a targeted subset of Phase 4, not a separate phase.
+
+### 2. Mechanical completion quality
+
+The mechanical slice completion aggregates structured frontmatter but cannot produce narrative context, forward intelligence sections, or nuanced UAT scenarios that the current LLM-driven complete-slice session produces.
+
+**Mitigation:**
+- For most slices (2-3 tasks, straightforward work), structured aggregation is sufficient. The frontmatter fields (provides, requires, affects, key_files, key_decisions, patterns_established) capture the essential information.
+- The quality threshold fallback dispatches a complete-slice LLM session for complex slices.
+- The LLM fallback is zero-cost to implement — the complete-slice template and dispatch rule are retained.
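+
+The threshold itself can be a handful of structural predicates. A minimal sketch — the interface, field names, and limits here are illustrative assumptions, not the real implementation:
+
+```typescript
+// Sketch of the quality-threshold gate that decides whether the
+// mechanical summary suffices, or whether to fall back to a
+// complete-slice LLM session. All names and limits are illustrative.
+interface MechanicalSummary {
+  body: string;           // aggregated narrative from task frontmatter
+  keyDecisions: string[]; // collected key_decisions entries
+  uatScenarios: string[]; // generated UAT scenario stubs
+}
+
+function passesQualityThreshold(s: MechanicalSummary, taskCount: number): boolean {
+  // Complex slices (4+ tasks) always go to the LLM fallback.
+  if (taskCount >= 4) return false;
+  // A too-short summary suggests the frontmatter was sparse.
+  if (s.body.length < 200) return false;
+  // Required artifacts must be present.
+  return s.keyDecisions.length > 0 && s.uatScenarios.length > 0;
+}
+```
+
+Because the check is pure and cheap, tuning it (the "quality threshold is tunable" point above) amounts to adjusting a few constants rather than changing dispatch behavior.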
+
+### 3. Loss of research artifacts
+
+RESEARCH.md files provided a useful paper trail for debugging plan quality. Without them, it's harder to understand why a planner made certain decisions.
+
+**Mitigation:**
+- The planner's narration (visible in the conversation transcript) captures exploration reasoning.
+- RESEARCH.md is optional, not eliminated. Planners can write one when exploration is complex.
+- The KNOWLEDGE.md file captures non-obvious patterns and decisions.
+- DECISIONS.md captures structural choices.
+
+### 4. Reassessment gaps
+
+Without mandatory reassessment, a slice might complete with findings that invalidate the remaining roadmap, and the next planner might not notice.
+
+**Mitigation:**
+- The plan-slice prompt includes a reassessment preamble that explicitly checks prior slice summaries.
+- The `blocker_discovered` flag in task summaries already triggers automatic replanning.
+- Users who want explicit reassessment can enable the `reassess_after_slice` preference.
+
+### 5. Mechanical completion doesn't cover all complete-slice responsibilities
+
+The current complete-slice prompt (steps 5, 8, 9) updates REQUIREMENTS.md, appends to DECISIONS.md, and appends to KNOWLEDGE.md. The mechanical completion handles REQUIREMENTS.md and DECISIONS.md but cannot produce KNOWLEDGE.md entries (which require judgment about what's genuinely useful).
+
+**Mitigation:**
+- Execute-task prompt step 13 already instructs executors to append to KNOWLEDGE.md during task execution. Most knowledge entries are discovered during implementation, not during completion.
+- DECISIONS.md appendix is handled mechanically by collecting `key_decisions` from task summaries and deduplicating against existing entries.
+- REQUIREMENTS.md updates are handled mechanically by cross-referencing task verification results against requirement-to-slice mappings.
+- For the LLM fallback path (complex slices), the complete-slice prompt retains all responsibilities.
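+
+The DECISIONS.md dedup step can be sketched as a pure function; the signature and entry format are illustrative assumptions:
+
+```typescript
+// Sketch of the mechanical DECISIONS.md append: collect key_decisions
+// from each task summary, drop anything already recorded, and return
+// only the fresh entries to append. Shapes are illustrative.
+function appendNewDecisions(
+  existingDecisions: string[],  // parsed entries from DECISIONS.md
+  taskKeyDecisions: string[][], // key_decisions per task summary
+): string[] {
+  const seen = new Set(existingDecisions.map((d) => d.trim().toLowerCase()));
+  const fresh: string[] = [];
+  for (const decisions of taskKeyDecisions) {
+    for (const d of decisions) {
+      const key = d.trim().toLowerCase();
+      if (!seen.has(key)) {
+        seen.add(key); // also dedupes across tasks in the same slice
+        fresh.push(d.trim());
+      }
+    }
+  }
+  return fresh;
+}
+```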
+
+### 6. Migration path
+
+Milestones in progress when this change deploys will have state files (RESEARCH.md, etc.) that the new pipeline doesn't produce. The dispatch table must gracefully handle both old-style and new-style state.
+
+**Mitigation:**
+- Dispatch rules check for file existence, not file absence. A milestone with an existing RESEARCH.md still works — the plan-milestone rule fires regardless of whether research exists.
+- The idempotency system already handles "completed research unit → dispatch plan" transitions.
+- All `deriveState()` phases are preserved — old-style state resolves correctly.
+- No migration needed. The new pipeline is strictly more permissive than the old one.
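+
+To make the first mitigation concrete, here is a sketch of a first-match-wins rule whose predicate is indifferent to legacy artifacts; the rule shape is illustrative, not the actual dispatch-table types:
+
+```typescript
+// Sketch: the pre-planning rule matches on derived phase only.
+// It never tests for the *absence* of RESEARCH.md, so a milestone
+// that already has one still dispatches plan-milestone.
+interface MilestoneState {
+  phase: "pre-planning" | "planning" | "executing" | "summarizing";
+  files: Set<string>; // state files present on disk
+}
+
+interface DispatchRule {
+  name: string;
+  matches: (s: MilestoneState) => boolean;
+  unit: string;
+}
+
+const rules: DispatchRule[] = [
+  {
+    name: "pre-planning → plan-milestone",
+    // Fires whether or not a legacy RESEARCH.md exists.
+    matches: (s) => s.phase === "pre-planning",
+    unit: "plan-milestone",
+  },
+];
+
+function dispatch(s: MilestoneState): string | undefined {
+  return rules.find((r) => r.matches(s))?.unit;
+}
+```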
+
+## Alternatives Considered
+
+### A. Keep research as a separate session, just make it optional
+
+Add a `skip_research` preference (already exists) and make it default to true. This is the minimal change — one boolean flip.
+
+**Rejected:** This saves sessions but doesn't address the context re-ingestion problem, the lossy handoff problem, or the ceremony session overhead. It's a preference toggle, not an architectural improvement.
+
+### B. Keep all unit types but share context via a persistent cache
+
+Instead of fresh sessions, maintain a shared context store that persists across units. Each unit reads from the store instead of re-inlining files.
+
+**Rejected:** This requires a fundamentally different session model — either a long-running session (which hits context limits) or a cache mechanism that the LLM can query (which doesn't exist in the Claude API). The fresh-session-per-unit model is correct; the problem is what we put in each session, not the session model itself.
+
+### C. Collapse everything into a single session per slice
+
+One session per slice: plan + execute all tasks + complete. Maximum context efficiency.
+
+**Rejected:** This hits real context limits for slices with 4+ tasks. Task execution is legitimately heavy — reading code, writing code, running tests, debugging failures. A single session for all of this would exhaust the context window. The plan-slice / execute-task boundary is a genuine engineering constraint, not ceremony.
+
+### D. Fold completion into the last executor's prompt instead of post-unit processing
+
+The original design had the last execute-task writing SUMMARY.md, UAT.md, and marking the slice done.
+
+**Rejected (per Codex audit):** This creates a verification-retry ordering problem. If the executor writes SUMMARY.md and marks the slice done in the ROADMAP before the verification gate runs, a gate failure retries against incorrect derived state. Post-gate mechanical processing avoids this by running only after verification succeeds.
+
+### E. Keep complete-slice as a separate session
+
+The mechanical summary quality might be insufficient for complex slices.
+
+**Addressed:** The mechanical approach with LLM fallback provides the best of both worlds. Simple slices get fast mechanical completion. Complex slices fall back to the existing LLM session. The quality threshold is tunable.
+
+## Action Items
+
+### Phase 1: Merge research into planning (+ targeted inlining reduction)
+1. Update `buildPlanMilestonePrompt()` — add exploration instructions, skill discovery, drop "Trust the research"
+2. Update `buildPlanSlicePrompt()` — add exploration instructions, reassessment preamble, drop "Trust the research"
+3. Remove dispatch rule "pre-planning (no research) → research-milestone" — merge with "pre-planning (has research) → plan-milestone" into single "pre-planning → plan-milestone"
+4. Remove dispatch rule "planning (no research, not S01) → research-slice"
+5. Update `plan-milestone.md` and `plan-slice.md` prompt templates
+6. Make `skip_research` and `skip_slice_research` preferences default to true (backwards compat)
+7. Retain research templates for explicit `/gsd dispatch research` use
+8. **Targeted inlining reduction for planning sessions:** Move DECISIONS, REQUIREMENTS, PROJECT to path references in plan-milestone and plan-slice prompts. Keep ROADMAP and CONTEXT inlined. This prevents context pressure from the added exploration work.
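+
+The targeted inlining split in item 8 can be sketched as follows; the file lists match the item above, but the builder shape and `read` callback are illustrative, not the real prompt-builder API:
+
+```typescript
+// Sketch of the inline/reference split for planning prompts:
+// hot documents stay inlined, stable documents become
+// mandatory-read path references.
+const INLINED = ["ROADMAP.md", "CONTEXT.md"];
+const PATH_REFERENCED = ["DECISIONS.md", "REQUIREMENTS.md", "PROJECT.md"];
+
+function buildContextSection(read: (f: string) => string): string {
+  const inlined = INLINED
+    .map((f) => `## ${f}\n\n${read(f)}`)
+    .join("\n\n");
+  const refs = PATH_REFERENCED
+    .map((f) => `- ${f}`)
+    .join("\n");
+  return `${inlined}\n\nYou MUST read these files before planning:\n${refs}`;
+}
+```
+
+The design point: the stable documents still reach the model, but at read-on-demand cost instead of unconditional token cost in every planning session.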
+
+### Phase 2: Mechanical slice completion
+9. Implement `mechanicalSliceCompletion()` in `auto-post-unit.ts`
+10. Wire into post-unit processing: detect all-tasks-done after verification gate passes, run mechanical completion
+11. Implement quality threshold check (summary length, artifact presence)
+12. Retain `summarizing → complete-slice` dispatch rule as fallback for mechanical failures
+13. Implement `mechanicalRequirementsUpdate()` and `appendNewDecisions()`
+
+### Phase 3: Mechanical milestone validation + completion
+14. Implement `aggregateMilestoneVerification()` reading `T##-VERIFY.json` and `S##-UAT-RESULT.md`
+15. Implement `generateMilestoneSummary()` from slice summary aggregation
+16. Wire into post-unit processing: after last slice completion, run mechanical validation + summary
+17. Make reassess-roadmap opt-in via `reassess_after_slice` preference (default: false)
+18. Retain `validating-milestone` and `completing-milestone` dispatch rules as fallbacks
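+
+The aggregation in items 14–16 can be sketched as a fold over the two evidence sources; the record shapes are assumptions about what `T##-VERIFY.json` and `S##-UAT-RESULT.md` contain, not their actual schemas:
+
+```typescript
+// Sketch of aggregateMilestoneVerification(): combine per-task
+// verification results with per-slice UAT outcomes into a single
+// milestone-level verdict. Record shapes are illustrative.
+interface TaskVerify { task: string; passed: boolean }
+interface UatResult { slice: string; passed: boolean }
+
+function aggregateMilestoneVerification(
+  verifies: TaskVerify[],
+  uats: UatResult[],
+): { passed: boolean; failures: string[] } {
+  const failures = [
+    ...verifies.filter((v) => !v.passed).map((v) => `task ${v.task}`),
+    ...uats.filter((u) => !u.passed).map((u) => `UAT ${u.slice}`),
+  ];
+  return { passed: failures.length === 0, failures };
+}
+```
+
+A non-empty `failures` list is what would trigger the retained `validating-milestone` fallback rule instead of mechanical completion.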
+
+### Phase 4: Full context re-ingestion reduction
+19. Replace remaining `inlineFile()` calls for stable documents with mandatory-read path references
+20. Update prompt headers with explicit "You MUST read" directives for critical files
+21. Add plan output verification (must cite ROADMAP slice description)
+22. Measure plan quality metrics before/after to validate the change
+
+### Phase 5: Downstream simplification (optional, deferred)
+23. Simplify `auto-post-unit.ts` — remove reassess dispatch logic (opt-in only)
+24. Simplify `auto-stuck-detection.ts` — fewer unit type patterns
+25. Simplify `auto-idempotency.ts` — fewer completed-key types
+26. Review `auto-recovery.ts` — simplify recovery paths for unit types that are now fallback-only
+27. Update auto-mode documentation (`docs/auto-mode.md`)
+
+## Audit Trail
+
+### Round 1 — Three-model review (March 18, 2026)
+
+**Claude Opus 4.6** identified 8 issues:
+1. ✅ Session count math inconsistent about S01 plan-slice skip — **fixed**: explicit derivation added with per-slice breakdown
+2. ✅ `discuss` session counted in pipeline but not in math — **fixed**: noted as interactive session, not auto-mode unit
+3. ✅ Token savings double-counting (eliminated sessions + re-ingestion) — **fixed**: removed overlap, noted savings are not additive
+4. ✅ Context inlining change (file paths vs inline) underanalyzed — **fixed**: expanded to dedicated risk section with enforcement strategy, phased rollout, and interaction with budget engine
+5. ✅ Budget engine interaction not discussed — **fixed**: addressed in context inlining section
+6. ✅ `aggregateMilestoneVerification()` reads wrong data source — **fixed**: now reads `T##-VERIFY.json` as primary source, supplemented by `S##-UAT-RESULT.md`
+7. ✅ Phase ordering creates heavy intermediate state (Phase 1 without Phase 4) — **fixed**: Phase 1 now includes targeted inlining reduction for planning sessions
+8. ✅ ADR number conflict — **fixed**: confirmed no ADR-003 exists in `docs/` (the referenced file doesn't exist in current git)
+
+**OpenAI Codex** identified 6 issues:
+1. ✅ HIGH: Folding completion into execute-task breaks verification-retry model — **fixed**: moved completion to post-gate mechanical processing instead of executor prompt. Added Alternative D explaining why.
+2. ✅ HIGH: Mechanical validation reads nonexistent `verification_evidence` frontmatter — **fixed**: now reads `T##-VERIFY.json` (canonical machine-readable source from `verification-evidence.ts`)
+3. ✅ HIGH: Replacement validation drops UAT evidence — **fixed**: aggregator now reads both `T##-VERIFY.json` and `S##-UAT-RESULT.md`
+4. ✅ HIGH: "State derivation stays unchanged" is false — **fixed**: explicitly documented that `deriveState()` phases are preserved, mechanical processing resolves them synchronously, fallback dispatch rules handle failures
+5. ✅ MEDIUM: Folded completion omits REQUIREMENTS.md and KNOWLEDGE.md updates — **fixed**: mechanical completion handles REQUIREMENTS.md and DECISIONS.md; KNOWLEDGE.md addressed in Risk 5
+6. ✅ MEDIUM: Session and token math inconsistent — **fixed**: complete rederivation with per-slice breakdown, corrected to 30 baseline sessions, noted profile variations
+
+**Gemini 2.5 Pro** audit was not usable — it hallucinated the ADR as a CI/CD pipeline document about GitHub Actions, matrix builds, and nx workspace tooling. No findings were applicable to the actual content.
diff --git a/docs/FILE-SYSTEM-MAP.md b/docs/FILE-SYSTEM-MAP.md
new file mode 100644
index 000000000..cfaa65fae
--- /dev/null
+++ b/docs/FILE-SYSTEM-MAP.md
@@ -0,0 +1,1020 @@
+# GSD2 File System Map
+
+Maps every source file to its system/subsystem labels.
+
+---
+
+## System Labels Reference
+
+| Label | Description |
+|-------|-------------|
+| **Agent Core** | Core agent loop, session lifecycle, SDK factory |
+| **AI Providers** | LLM provider implementations (Anthropic, OpenAI, Google, etc.) |
+| **API Routes** | Next.js API route handlers (web server) |
+| **AST** | Abstract Syntax Tree search/rewrite via tree-sitter + ast-grep |
+| **Async Jobs** | Background bash job management |
+| **Auth / OAuth** | Authentication, OAuth flows, token storage |
+| **Auto Engine** | GSD autonomous execution loop, dispatch, supervision |
+| **Bg Shell** | Background process / interactive shell management |
+| **Browser Tools** | Playwright-based browser automation extension |
+| **Build System** | Scripts for build, packaging, version management, CI |
+| **CLI** | Command-line entry points and argument parsing |
+| **CMux** | Tmux/multiplexer session integration |
+| **Commands** | GSD slash/sub-command routing and handlers |
+| **Compaction** | Context token reduction and summarization |
+| **Config** | Paths, defaults, models, preferences, constants |
+| **Context7** | Library documentation fetching extension |
+| **Doctor / Diagnostics** | Health checks, forensics, skill health |
+| **Event System** | Event bus, publication/subscription |
+| **Extension Registry** | Extension discovery, manifests, enable/disable |
+| **Extensions** | Extension loader, runner, project trust, hooks |
+| **File Search** | grep, glob, fd — file and content discovery |
+| **GSD Workflow** | Core GSD planning/execution workflow engine |
+| **Google Search** | Web search via Google API |
+| **Headless Mode** | Non-interactive / scripted command execution |
+| **Image Processing** | Image decode, resize, encode, clipboard images |
+| **Integration Tests** | Smoke, fixture, live, regression test suites |
+| **Loader / Bootstrap** | Startup initialization, extension sync, tool bootstrap |
+| **LSP** | Language Server Protocol client and multiplexer |
+| **Mac Tools** | macOS-native utilities (Swift CLI) |
+| **MCP Server/Client** | Model Context Protocol server and client |
+| **Memory Extension** | In-session memory pipeline and storage |
+| **Migration** | Data and config migration tools |
+| **Modes** | Interactive TUI, Print, RPC, and Web modes |
+| **Model System** | Model discovery, resolution, routing, registry |
+| **Native / Rust Tools** | N-API Rust engine modules |
+| **Node.js Bindings** | TypeScript wrappers around Rust N-API modules |
+| **Onboarding** | First-run wizard and setup flows |
+| **Permissions** | Permission management for tools and trust |
+| **Remote Questions** | Remote prompting via Slack, Discord, Telegram |
+| **Search the Web** | Brave/Jina/Tavily-based web search extension |
+| **Session Management** | Session file I/O, branches, fork trees |
+| **Skills** | Skill tool registration, health, telemetry |
+| **Slash Commands** | Command boilerplate generators extension |
+| **State Machine** | State, history, persistence, reactive graph |
+| **Studio App** | Electron desktop app (renderer, main, preload) |
+| **Subagent** | Parallel/serial subagent delegation |
+| **Syntax Highlighting** | Syntect-backed ANSI code coloring |
+| **Text Processing** | Diff, truncation, HTML→MD, ANSI, JSON parse |
+| **Tool System** | Tool implementations (bash, edit, read, write, grep…) |
+| **TTSR** | Time-Traveling Stream Rules regex guardrails |
+| **TUI Components** | Terminal UI component library (pi-tui) |
+| **Universal Config** | Multi-tool configuration file discovery |
+| **Voice** | Voice input extension (Swift/Python) |
+| **VS Code Extension** | VS Code sidebar, chat participant, RPC client |
+| **Web Mode** | Web server service layer and RPC bridge |
+| **Web UI** | Next.js frontend components, pages, hooks |
+| **Worktree** | Git worktree lifecycle, sync, name generation |
+
+---
+
+## src/ — Core Application Files
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| src/app-paths.ts | Config | App directory paths (GSD_HOME, sessions, web PID, prefs) |
+| src/app-paths.js | Config | Compiled JS version |
+| src/bundled-extension-paths.ts | Extension Registry | Serializes/parses bundled extension directory paths |
+| src/bundled-resource-path.ts | Loader/Bootstrap, Extension Registry | Resolves bundled raw resource files from package root |
+| src/cli.ts | CLI | Main CLI entry point — arg parsing, mode detection, plugin init |
+| src/cli-web-branch.ts | CLI, Web Mode | Web CLI branch; session dir resolution, legacy migration |
+| src/extension-discovery.ts | Extension Registry | Discovers extension entry points from FS and package.json |
+| src/extension-registry.ts | Extension Registry | Extension manifests, registry persistence, enable/disable |
+| src/headless-answers.ts | Headless Mode | Pre-supply answers to extension UI requests in headless |
+| src/headless-context.ts | Headless Mode | Context loading from stdin/files; project bootstrapping |
+| src/headless-events.ts | Headless Mode | Event classification, terminal detection, idle timeouts |
+| src/headless-query.ts | Headless Mode, CLI | Read-only snapshot query (state, dispatch preview, costs) |
+| src/headless-ui.ts | Headless Mode | Extension UI auto-response, progress formatting |
+| src/headless.ts | Headless Mode | Orchestrator for /gsd subcommands without TUI via RPC |
+| src/help-text.ts | CLI | Generates help text for all subcommands |
+| src/loader.ts | Loader/Bootstrap | Fast-path startup, extension discovery/validation, env setup |
+| src/logo.ts | CLI | ASCII logo rendering for welcome screen and loader |
+| src/mcp-server.ts | MCP Server/Client | Native MCP server over stdin/stdout for external AI clients |
+| src/models-resolver.ts | Config, Auth/OAuth | Resolves models.json with fallback from Pi to GSD |
+| src/onboarding.ts | Onboarding | First-run wizard — LLM auth, OAuth, API keys, tool setup |
+| src/pi-migration.ts | Config, Auth/OAuth | Migrates provider credentials from Pi auth.json to GSD |
+| src/project-sessions.ts | State Machine, CLI | Session-per-project directory paths from project CWD |
+| src/remote-questions-config.ts | Config, Onboarding | Saves remote questions (Discord, Slack, Telegram) config |
+| src/resource-loader.ts | Loader/Bootstrap, Extension Registry | Initializes, syncs, validates bundled resources |
+| src/startup-timings.ts | CLI, Build System | Optional startup timing instrumentation |
+| src/tool-bootstrap.ts | Loader/Bootstrap | Manages fd/rg availability, falls back to built-in |
+| src/update-check.ts | CLI | Checks npm registry for new versions (cached) |
+| src/update-cmd.ts | CLI | Executes npm install to update gsd-pi package |
+| src/web-mode.ts | Web Mode | Launches/manages web server process (PID tracking, browser) |
+| src/welcome-screen.ts | CLI | Welcome panel — logo, version, model info |
+| src/wizard.ts | Onboarding, Config | Loads env keys from auth.json → hydrates process.env |
+| src/worktree-cli.ts | Worktree, CLI | Worktree lifecycle: create, list, merge, clean, remove |
+| src/worktree-name-gen.ts | Worktree | Generates random worktree names (adjective-verbing-noun) |
+
+### src/web/ — Web Service Layer
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| src/web/auto-dashboard-service.ts | Web Mode, Auto Engine | Loads auto-mode dashboard state (active, paused, costs) |
+| src/web/bridge-service.ts | Web Mode, State Machine | Central hub spawning RPC sessions, managing session state |
+| src/web/captures-service.ts | Web Mode | Loads knowledge capture entries via child process bridge |
+| src/web/cleanup-service.ts | Web Mode | Collects GSD branches and snapshot refs for cleanup |
+| src/web/cli-entry.ts | Web Mode, CLI | Builds/resolves GSD CLI entry points for RPC/interactive |
+| src/web/doctor-service.ts | Web Mode, Doctor/Diagnostics | Runs diagnostics, returns fixer operations |
+| src/web/export-service.ts | Web Mode | Generates exported project reports (markdown/JSON) |
+| src/web/forensics-service.ts | Web Mode, Doctor/Diagnostics | Loads forensic report data (traces, metrics, issues) |
+| src/web/git-summary-service.ts | Web Mode | Provides git branch, commit history, diff summary |
+| src/web/history-service.ts | Web Mode | Loads metrics ledger, aggregates history views |
+| src/web/hooks-service.ts | Web Mode | Manages git hook registration and shell integration |
+| src/web/inspect-service.ts | Web Mode | Detailed inspection of project state and traces |
+| src/web/knowledge-service.ts | Web Mode | Reads and parses KNOWLEDGE.md |
+| src/web/onboarding-service.ts | Web Mode, Onboarding, Auth/OAuth | Manages onboarding state, auth refresh, lock reasons |
+| src/web/project-discovery-service.ts | Web Mode | Discovers and catalogs projects in filesystem |
+| src/web/recovery-diagnostics-service.ts | Web Mode | Recovery suggestions for error states/blockers |
+| src/web/settings-service.ts | Web Mode, Config | Loads preferences, routing config, budget, totals |
+| src/web/skill-health-service.ts | Web Mode, Doctor/Diagnostics | Loads skill health report with capability assessments |
+| src/web/undo-service.ts | Web Mode | Manages undo/snapshot and restoration |
+| src/web/update-service.ts | Web Mode | Checks for and executes application updates |
+| src/web/visualizer-service.ts | Web Mode | Generates visual representations of project state |
+| src/web/web-auth-storage.ts | Web Mode, Auth/OAuth | OAuth and API key credential storage for web mode |
+
+---
+
+## packages/pi-agent-core/src/ — Agent Core
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| agent-loop.ts | Agent Core, State Machine | Core agent execution loop — tool calls and LLM interactions |
+| agent.ts | Agent Core | Main Agent class wrapping loop with state management |
+| proxy.ts | Agent Core | Proxy wrapper for agent functionality |
+| types.ts | Agent Core | Type definitions for agent config, context, events |
+| index.ts | Agent Core | Package exports |
+
+---
+
+## packages/pi-ai/src/ — AI Providers
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| index.ts | AI Providers | Main export hub for providers and streaming |
+| api-registry.ts | AI Providers | Registry for managing multiple AI provider implementations |
+| models.ts | AI Providers | Model definitions and metadata |
+| models.generated.ts | AI Providers | Auto-generated model list from provider registries |
+| stream.ts | AI Providers | Main streaming interface dispatching to registered providers |
+| types.ts | AI Providers | Core types for models, APIs, streaming options |
+| env-api-keys.ts | AI Providers, Auth/OAuth | Environment variable API key resolution |
+| web-runtime-env-api-keys.ts | AI Providers, Auth/OAuth | Web runtime API key handling |
+| web-runtime-oauth.ts | AI Providers, Auth/OAuth | Web runtime OAuth token management |
+| providers/register-builtins.ts | AI Providers | Registration of built-in provider implementations |
+| providers/anthropic.ts | AI Providers | Anthropic API provider |
+| providers/anthropic-shared.ts | AI Providers | Shared utilities for Anthropic provider variants |
+| providers/anthropic-vertex.ts | AI Providers | Google Vertex AI Anthropic models |
+| providers/amazon-bedrock.ts | AI Providers | AWS Bedrock LLM provider |
+| providers/bedrock-provider.ts | AI Providers | Bedrock-specific streaming logic |
+| providers/google.ts | AI Providers | Google Generative AI provider |
+| providers/google-gemini-cli.ts | AI Providers | Google Gemini CLI authentication provider |
+| providers/google-shared.ts | AI Providers | Shared Google provider utilities |
+| providers/google-vertex.ts | AI Providers | Google Vertex AI provider |
+| providers/mistral.ts | AI Providers | Mistral AI provider |
+| providers/openai-completions.ts | AI Providers | OpenAI legacy completions API |
+| providers/openai-responses.ts | AI Providers | OpenAI responses (chat) API |
+| providers/openai-responses-shared.ts | AI Providers | Shared OpenAI responses utilities |
+| providers/openai-shared.ts | AI Providers | Shared OpenAI utilities |
+| providers/openai-codex-responses.ts | AI Providers | OpenAI Codex-specific response handling |
+| providers/azure-openai-responses.ts | AI Providers | Azure OpenAI responses provider |
+| providers/github-copilot-headers.ts | AI Providers | GitHub Copilot custom header construction |
+| providers/simple-options.ts | AI Providers | Common options builder for simple streaming |
+| providers/transform-messages.ts | AI Providers | Message transformation for provider compatibility |
+| utils/oauth/index.ts | Auth/OAuth | OAuth utilities export hub |
+| utils/oauth/types.ts | Auth/OAuth | OAuth credential and prompt types |
+| utils/oauth/pkce.ts | Auth/OAuth | PKCE flow implementation |
+| utils/oauth/github-copilot.ts | Auth/OAuth | GitHub Copilot OAuth flow |
+| utils/oauth/google-oauth-utils.ts | Auth/OAuth | Shared Google OAuth utilities |
+| utils/oauth/google-gemini-cli.ts | Auth/OAuth | Google Gemini CLI OAuth flow |
+| utils/oauth/google-antigravity.ts | Auth/OAuth | Google Antigravity OAuth implementation |
+| utils/oauth/openai-codex.ts | Auth/OAuth | OpenAI Codex OAuth flow |
+| utils/oauth/anthropic.ts | Auth/OAuth | Anthropic OAuth flow |
+| utils/event-stream.ts | AI Providers | Event stream parsing and handling |
+| utils/hash.ts | AI Providers | Hashing utilities |
+| utils/json-parse.ts | AI Providers | Resilient JSON parsing with recovery |
+| utils/overflow.ts | AI Providers | Token/context overflow detection |
+| utils/sanitize-unicode.ts | AI Providers | Unicode sanitization for API compatibility |
+| utils/validation.ts | AI Providers | Request/response validation schemas |
+| utils/typebox-helpers.ts | AI Providers | TypeBox schema helpers |
+
+---
+
+## packages/pi-tui/src/ — TUI Components
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| index.ts | TUI Components | Main TUI export hub |
+| tui.ts | TUI Components | Core TUI renderer and component system |
+| terminal.ts | TUI Components | Low-level terminal I/O and rendering |
+| keys.ts | TUI Components | Keyboard key parsing and matching |
+| keybindings.ts | TUI Components | Keybinding configuration and management |
+| stdin-buffer.ts | TUI Components | Buffered stdin for batch key processing |
+| editor-component.ts | TUI Components | Interface for custom editor implementations |
+| autocomplete.ts | TUI Components | Autocomplete suggestion provider system |
+| fuzzy.ts | TUI Components | Fuzzy matching algorithm |
+| terminal-image.ts | TUI Components | Terminal image protocol (Kitty, iTerm2) |
+| kill-ring.ts | TUI Components | Emacs-style kill ring buffer |
+| undo-stack.ts | TUI Components | Undo/redo stack for editor operations |
+| overlay-layout.ts | TUI Components | Overlay/modal dialog layout system |
+| utils.ts | TUI Components | Text width calculation, ANSI utilities |
+| components/box.ts | TUI Components | Box drawing with borders and styling |
+| components/text.ts | TUI Components | Simple text display component |
+| components/truncated-text.ts | TUI Components | Text with automatic truncation |
+| components/spacer.ts | TUI Components | Vertical/horizontal spacing |
+| components/input.ts | TUI Components | Single-line text input with history |
+| components/loader.ts | TUI Components | Animated loading spinner |
+| components/cancellable-loader.ts | TUI Components | Loading spinner with cancel |
+| components/image.ts | TUI Components | Image display with theme support |
+| components/select-list.ts | TUI Components | List selection UI with keyboard nav |
+| components/settings-list.ts | TUI Components | Settings/preferences list display |
+| components/editor.ts | TUI Components | Full multi-line editor with syntax awareness |
+| components/markdown.ts | TUI Components | Markdown rendering to terminal |
+
+---
+
+## packages/pi-coding-agent/src/ — Coding Agent
+
+### CLI
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| cli.ts | CLI | Main CLI entry point and argument routing |
+| main.ts | CLI | CLI main entry with mode routing |
+| cli/args.ts | CLI | CLI argument definition and parsing |
+| cli/config-selector.ts | CLI | Interactive configuration selection |
+| cli/file-processor.ts | CLI | File input processing for agent context |
+| cli/list-models.ts | CLI, Model System | Model listing and discovery UI |
+| cli/session-picker.ts | CLI | Session selection interface |
+
+### Core — Session & State
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| core/agent-session.ts | Agent Core, State Machine | Core session abstraction, agent lifecycle, persistence |
+| core/session-manager.ts | Session Management | Session file I/O, branch/fork tree management |
+| core/event-bus.ts | Agent Core, Event System | Event publication and subscription |
+| core/messages.ts | State Machine | Message type definitions and constructors |
+| core/settings-manager.ts | Session Management, Config | Session-level settings persistence |
+
+### Core — Tool System
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| core/tools/index.ts | Tool System | Tool registry and factory exports |
+| core/tools/bash.ts | Tool System | Bash/shell command execution tool |
+| core/tools/bash-interceptor.ts | Tool System | Bash command interception and filtering |
+| core/tools/edit.ts | Tool System | File editing tool with line ranges |
+| core/tools/edit-diff.ts | Tool System | Edit tool with diff-based operations |
+| core/tools/read.ts | Tool System | File reading tool |
+| core/tools/write.ts | Tool System | File writing tool |
+| core/tools/find.ts | Tool System, File Search | File discovery tool |
+| core/tools/grep.ts | Tool System, File Search | Pattern search tool |
+| core/tools/ls.ts | Tool System | Directory listing tool |
+| core/tools/truncate.ts | Tool System, Text Processing | Output truncation utility |
+| core/tools/hashline.ts | Tool System | Hash-based line identification |
+| core/tools/hashline-read.ts | Tool System | File reading with hash-based line ranges |
+| core/tools/hashline-edit.ts | Tool System | File editing with hash-based line identification |
+| core/tools/path-utils.ts | Tool System | Path normalization and validation |
+| core/bash-executor.ts | Tool System | High-level bash execution with event handling |
+| core/exec.ts | Tool System | Utility functions for command execution |
+
+### Core — Model Management
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| core/model-registry.ts | Model System | Model metadata and capability registry |
+| core/model-discovery.ts | Model System | Model discovery from external sources |
+| core/model-resolver.ts | Model System | Model selection and resolution logic |
+| core/models-json-writer.ts | Model System | Model metadata serialization |
+
+### Core — AI & Context
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| core/prompt-templates.ts | Agent Core | Template system for prompt construction |
+| core/system-prompt.ts | Agent Core | System prompt building and management |
+| core/retry-handler.ts | AI Providers | Retry logic with exponential backoff |
+| core/fallback-resolver.ts | Model System | Model fallback resolution on API failures |
+| core/slash-commands.ts | Commands | Built-in slash command definitions and handlers |
+
+### Core — Extensions & Skills
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| core/extensions/index.ts | Extensions | Extension system exports |
+| core/extensions/types.ts | Extensions | Extension event and context types |
+| core/extensions/loader.ts | Extensions | Extension discovery and loading |
+| core/extensions/runner.ts | Extensions, Event System | Extension event dispatch and execution |
+| core/extensions/wrapper.ts | Extensions, Tool System | Tool wrapping for extension monitoring |
+| core/extensions/project-trust.ts | Extensions, Permissions | Project trust management for local extensions |
+| core/skills.ts | Skills, Tool System | Skill tool registration and management |
+
+### Core — Compaction
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| core/compaction-orchestrator.ts | Compaction | Orchestrates session compaction decisions |
+| core/compaction/compaction.ts | Compaction | Context token reduction via summarization |
+| core/compaction/branch-summarization.ts | Compaction | Branch history summarization for context limits |
+| core/compaction/utils.ts | Compaction | Compaction utilities |
+
+### Core — Configuration & Auth
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| config.ts | Config | Directory paths and version management |
+| core/sdk.ts | Agent Core | Main SDK factory for creating agent sessions |
+| core/resolve-config-value.ts | Config | Config value resolution from environment/files |
+| core/resource-loader.ts | Config, Loader/Bootstrap | Extensible resource loading (tools, extensions, themes) |
+| core/defaults.ts | Config | Default configuration values |
+| core/constants.ts | Config | Global constants |
+| core/auth-storage.ts | Auth/OAuth, Permissions | OAuth token storage and management |
+| migrations.ts | Config, Migration | Configuration migration and deprecation handling |
+
+### Core — Artifacts & Export
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| core/artifact-manager.ts | Agent Core | Artifact file management and metadata |
+| core/blob-store.ts | Agent Core | Binary data storage for images and attachments |
+| core/export-html/index.ts | Web Mode | Session export to HTML |
+| core/export-html/ansi-to-html.ts | Web Mode | ANSI code to HTML conversion |
+| core/export-html/tool-renderer.ts | Web Mode | HTML rendering for tool calls/results |
+
+### Core — LSP
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| core/lsp/index.ts | LSP | LSP integration exports |
+| core/lsp/client.ts | LSP | LSP client implementation |
+| core/lsp/lspmux.ts | LSP | LSP server multiplexing |
+| core/lsp/config.ts | LSP | LSP server configuration |
+| core/lsp/edits.ts | LSP | LSP-based code editing operations |
+| core/lsp/helpers.ts | LSP | LSP utility functions |
+| core/lsp/types.ts | LSP | LSP type definitions |
+| core/lsp/utils.ts | LSP | LSP utilities |
+
+### Core — Utilities
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| core/fs-utils.ts | Tool System | File system utilities (atomic writes, temp files) |
+| core/lock-utils.ts | Tool System | File locking for concurrent access |
+| core/timings.ts | Build System | Performance timing measurement |
+| core/diagnostics.ts | Doctor/Diagnostics | Diagnostic information collection |
+| core/discovery-cache.ts | Model System | Model discovery result caching |
+| core/keybindings.ts | TUI Components | Keybinding definitions |
+| core/footer-data-provider.ts | TUI Components | Footer information provider |
+| core/index.ts | Agent Core | Core module exports |
+| index.ts | Agent Core | Package exports |
+| utils/clipboard.ts | Tool System | Clipboard read/write |
+| utils/clipboard-native.ts | Tool System | Native clipboard implementation |
+| utils/clipboard-image.ts | Tool System | Clipboard image support |
+| utils/error.ts | Agent Core | Error message extraction/formatting |
+| utils/frontmatter.ts | Config | YAML frontmatter parsing |
+| utils/git.ts | Tool System | Git information and utilities |
+| utils/image-convert.ts | Image Processing | Image format conversion |
+| utils/image-resize.ts | Image Processing | Image resizing and optimization |
+| utils/mime.ts | Tool System | MIME type detection |
+| utils/path-display.ts | TUI Components | Path formatting for display |
+| utils/photon.ts | Agent Core | Photon scripting runtime support |
+| utils/shell.ts | Tool System | Shell detection and execution |
+| utils/changelog.ts | CLI | Changelog parsing |
+| utils/sleep.ts | Agent Core | Async sleep/delay utility |
+| utils/tools-manager.ts | Tool System | Tool discovery and management |
+| package-manager.ts | Build System | npm/yarn/pnpm/bun abstraction |
+
+### Modes
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| modes/index.ts | Modes | Mode system exports |
+| modes/print-mode.ts | Modes | Non-interactive print mode |
+| modes/rpc/rpc-mode.ts | Modes, MCP Server/Client | RPC server mode for remote access |
+| modes/rpc/rpc-client.ts | Modes, MCP Server/Client | RPC client for remote agent interaction |
+| modes/rpc/rpc-types.ts | Modes, MCP Server/Client | RPC protocol type definitions |
+| modes/rpc/jsonl.ts | Modes | JSONL serialization for RPC |
+| modes/rpc/remote-terminal.ts | Modes | Remote terminal output handling |
+| modes/shared/command-context-actions.ts | Modes, Commands | Shared command context utilities |
+| modes/interactive/interactive-mode.ts | Modes, TUI Components | Main interactive TUI mode orchestration |
+| modes/interactive/interactive-mode-state.ts | Modes, TUI Components, State Machine | Interactive mode state management |
+| modes/interactive/slash-command-handlers.ts | Modes, Commands | Interactive mode slash command handlers |
+| modes/interactive/theme/theme.ts | TUI Components | Theme system and hot reloading |
+| modes/interactive/theme/themes.ts | TUI Components | Built-in theme definitions |
+| modes/interactive/utils/shorten-path.ts | TUI Components | Path shortening for display |
+| modes/interactive/controllers/chat-controller.ts | Modes, TUI Components | Chat input and message submission |
+| modes/interactive/controllers/input-controller.ts | Modes, TUI Components | Input handling and routing |
+| modes/interactive/controllers/model-controller.ts | Modes, TUI Components, Model System | Model/provider/thinking configuration |
+| modes/interactive/controllers/extension-ui-controller.ts | Modes, TUI Components, Extensions | Extension UI event handling |
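+The `modes/rpc/jsonl.ts` entry above implies a line-delimited JSON transport for RPC mode. As a minimal, hypothetical sketch of JSONL framing (the actual message shapes live in `modes/rpc/rpc-types.ts` and may differ):
+
+```typescript
+// Hypothetical JSONL framing helpers; illustrative only — the real
+// implementation in modes/rpc/jsonl.ts is not reproduced here.
+export function encodeJsonl(messages: object[]): string {
+  // One JSON document per line, newline-terminated, so a reader can
+  // split on "\n" without buffering partial documents.
+  return messages.map((m) => JSON.stringify(m)).join("\n") + "\n";
+}
+
+export function decodeJsonl(chunk: string): object[] {
+  return chunk
+    .split("\n")
+    .filter((line) => line.trim().length > 0)
+    .map((line) => JSON.parse(line));
+}
+```
+
+The newline terminator is what makes the format streamable: a client can parse each complete line as it arrives rather than waiting for the full response.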
+
+### Modes — Interactive Components
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| components/index.ts | TUI Components | Interactive mode component exports |
+| components/armin.ts | TUI Components | Assistant message rendering |
+| components/assistant-message.ts | TUI Components | Assistant message display |
+| components/user-message.ts | TUI Components | User message display |
+| components/user-message-selector.ts | TUI Components | User message editing selector |
+| components/bash-execution.ts | TUI Components, Tool System | Bash execution result display |
+| components/tool-execution.ts | TUI Components, Tool System | Tool call and result display |
+| components/custom-message.ts | TUI Components | Custom message type display |
+| components/custom-editor.ts | TUI Components | Custom editor integration |
+| components/skill-invocation-message.ts | TUI Components, Skills | Skill invocation display |
+| components/branch-summary-message.ts | TUI Components, Compaction | Branch summary display |
+| components/compaction-summary-message.ts | TUI Components, Compaction | Compaction summary display |
+| components/diff.ts | TUI Components, Text Processing | Diff display component |
+| components/tree-render-utils.ts | TUI Components, Session Management | Session tree rendering utilities |
+| components/tree-selector.ts | TUI Components, Session Management | Session tree navigation UI |
+| components/session-selector.ts | TUI Components, Session Management | Session selection UI |
+| components/session-selector-search.ts | TUI Components, Session Management | Session search UI |
+| components/model-selector.ts | TUI Components, Model System | Model selection UI |
+| components/scoped-models-selector.ts | TUI Components, Model System | Scoped model selection |
+| components/thinking-selector.ts | TUI Components, Model System | Thinking level selection |
+| components/provider-manager.ts | TUI Components, AI Providers | Provider configuration UI |
+| components/oauth-selector.ts | TUI Components, Auth/OAuth | OAuth provider selection/login |
+| components/login-dialog.ts | TUI Components, Auth/OAuth | OAuth login dialog |
+| components/theme-selector.ts | TUI Components | Theme selection UI |
+| components/config-selector.ts | TUI Components, Config | Configuration selection UI |
+| components/extension-selector.ts | TUI Components, Extensions | Extension selection UI |
+| components/extension-editor.ts | TUI Components, Extensions | Extension code editor |
+| components/extension-input.ts | TUI Components, Extensions | Extension input handling |
+| components/settings-selector.ts | TUI Components, Config | Settings/preferences UI |
+| components/show-images-selector.ts | TUI Components, Config | Image display toggle |
+| components/bordered-loader.ts | TUI Components | Loading spinner with border |
+| components/countdown-timer.ts | TUI Components | Countdown timer display |
+| components/dynamic-border.ts | TUI Components | Dynamic border drawing |
+| components/keybinding-hints.ts | TUI Components | Keybinding help display |
+| components/footer.ts | TUI Components | Footer information display |
+| components/daxnuts.ts | TUI Components | Special rendering effect |
+| components/visual-truncate.ts | TUI Components | Visual text truncation |
+
+### Resources — Memory Extension
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| resources/extensions/memory/index.ts | Memory Extension | Memory extension index and setup |
+| resources/extensions/memory/pipeline.ts | Memory Extension | Memory processing pipeline |
+| resources/extensions/memory/storage.ts | Memory Extension | Memory persistence storage |
+
+---
+
+## src/resources/extensions/ — Extension Subsystems
+
+### GSD Extension (Core Workflow Engine)
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| gsd/index.ts | GSD Workflow | Main GSD extension bootstrap and registration |
+| gsd/auto.ts | Auto Engine | Automatic workflow execution and loop management |
+| gsd/auto-dashboard.ts | Auto Engine, Web Mode | Real-time dashboard for auto-run progress |
+| gsd/auto-worktree.ts | Auto Engine, Worktree | Automatic worktree creation and branch management |
+| gsd/auto-recovery.ts | Auto Engine | Recovery for crashed/stalled workflows |
+| gsd/auto-start.ts | Auto Engine | Initialization sequence for automatic execution |
+| gsd/auto-worktree-sync.ts | Auto Engine, Worktree | State sync between worktrees and main |
+| gsd/auto-model-selection.ts | Auto Engine, Model System | Intelligent LLM model routing |
+| gsd/auto-direct-dispatch.ts | Auto Engine | Direct command dispatching without planning |
+| gsd/auto-dispatch.ts | Auto Engine | Task queueing and priority-based dispatch |
+| gsd/auto-timeout-recovery.ts | Auto Engine | Timeout handling and recovery |
+| gsd/auto-post-unit.ts | Auto Engine | Post-unit milestone completion processing |
+| gsd/auto-unit-closeout.ts | Auto Engine | Unit finalization and archiving |
+| gsd/auto-verification.ts | Auto Engine | Post-execution verification |
+| gsd/auto-timers.ts | Auto Engine | Timeout and deadline management |
+| gsd/auto-loop.ts | Auto Engine, State Machine | Execution loop state and cycle management |
+| gsd/auto-supervisor.ts | Auto Engine | Supervision and oversight of autonomous runs |
+| gsd/auto-budget.ts | Auto Engine | Token/cost budgeting and tracking |
+| gsd/auto-observability.ts | Auto Engine | Observability hooks and telemetry |
+| gsd/auto-tool-tracking.ts | Auto Engine | Tool usage instrumentation |
+| gsd/doctor.ts | Doctor/Diagnostics | Health check and system diagnostics |
+| gsd/doctor-checks.ts | Doctor/Diagnostics | Individual diagnostic checks |
+| gsd/doctor-providers.ts | Doctor/Diagnostics | Diagnostic data source providers |
+| gsd/doctor-format.ts | Doctor/Diagnostics | Diagnostic output formatting |
+| gsd/state.ts | State Machine | Milestone and workflow state management |
+| gsd/history.ts | State Machine | State history and versioning |
+| gsd/json-persistence.ts | State Machine | JSON-based persistence layer |
+| gsd/memory-store.ts | State Machine | In-memory state storage |
+| gsd/reactive-graph.ts | State Machine | Reactive dependency graph for state |
+| gsd/routing-history.ts | State Machine | History of routing decisions |
+| gsd/cache.ts | State Machine | Caching layer for performance |
+| gsd/model-router.ts | Model System | LLM model selection and routing logic |
+| gsd/worktree.ts | Worktree | Worktree creation and management |
+| gsd/worktree-manager.ts | Worktree | Higher-level worktree orchestration |
+| gsd/worktree-resolver.ts | Worktree | Worktree path and reference resolution |
+| gsd/unit-runtime.ts | Auto Engine | Unit-level execution runtime |
+| gsd/activity-log.ts | GSD Workflow | Activity tracking and logging |
+| gsd/debug-logger.ts | GSD Workflow | Debug output and verbose logging |
+| gsd/commands.ts | Commands | Main command dispatcher |
+| gsd/commands-handlers.ts | Commands | Command-specific handlers |
+| gsd/commands-bootstrap.ts | Commands | Bootstrap and initialization commands |
+| gsd/commands-config.ts | Commands, Config | Configuration management commands |
+| gsd/commands-extensions.ts | Commands, Extensions | Extension discovery and management |
+| gsd/commands-inspect.ts | Commands, Doctor/Diagnostics | Database and state inspection tools |
+| gsd/commands-logs.ts | Commands | Log viewing and filtering |
+| gsd/commands-workflow-templates.ts | Commands, GSD Workflow | Workflow template management |
+| gsd/commands-cmux.ts | Commands, CMux | Tmux/cmux integration commands |
+| gsd/exit-command.ts | Commands | Exit and cleanup commands |
+| gsd/undo.ts | Commands | Undo and rollback functionality |
+| gsd/kill.ts | Commands | Process termination and cleanup |
+| gsd/worktree-command.ts | Commands, Worktree | Worktree subcommands |
+| gsd/namespaced-resolver.ts | GSD Workflow | Namespace and scoped resource resolution |
+| gsd/error-utils.ts | GSD Workflow | Error handling and formatting |
+| gsd/errors.ts | GSD Workflow | Error type definitions |
+| gsd/diff-context.ts | GSD Workflow | Diff-based context extraction |
+| gsd/memory-extractor.ts | GSD Workflow | Memory and context extraction from state |
+| gsd/structured-data-formatter.ts | GSD Workflow | Structured output formatting |
+| gsd/export-html.ts | GSD Workflow | HTML export of milestone reports |
+| gsd/reports.ts | GSD Workflow | Report generation and summaries |
+| gsd/notifications.ts | GSD Workflow | User notification and messaging |
+| gsd/triage-ui.ts | GSD Workflow | Triage interface for issue categorization |
+| gsd/guided-flow.ts | GSD Workflow | User-guided workflow orchestration |
+| gsd/env-utils.ts | GSD Workflow | Environment variable utilities |
+| gsd/git-constants.ts | GSD Workflow | Git-related constants and paths |
+| gsd/milestone-id-utils.ts | GSD Workflow | Milestone ID generation and parsing |
+| gsd/resource-version.ts | GSD Workflow | Resource versioning helpers |
+| gsd/atomic-write.ts | GSD Workflow | Atomic file write operations |
+| gsd/captures.ts | GSD Workflow | Artifact capture and storage |
+| gsd/changelog.ts | GSD Workflow | Changelog generation |
+| gsd/claude-import.ts | GSD Workflow | Claude API/resource importing |
+| gsd/collision-diagnostics.ts | Doctor/Diagnostics | Collision detection and diagnostics |
+| gsd/prompt-loader.ts | GSD Workflow | Prompt template loading |
+| gsd/file-watcher.ts | GSD Workflow | File system change monitoring |
+| gsd/parallel-eligibility.ts | GSD Workflow | Parallel execution eligibility checks |
+| gsd/plugin-importer.ts | GSD Workflow, Extensions | Custom plugin/extension importing |
+| gsd/verification-gate.ts | GSD Workflow | Pre-execution verification checks |
+| gsd/preference-models.ts | Config, Model System | Model preference configuration |
+| gsd/preferences-skills.ts | Config, Skills | Skill preference configuration |
+| gsd/post-unit-hooks.ts | GSD Workflow | Post-unit execution hooks |
+| gsd/skill-telemetry.ts | Skills | Skill usage and performance telemetry |
+| gsd/bootstrap/* | GSD Workflow, Loader/Bootstrap | Extension initialization and hook registration |
+| gsd/auto/* | Auto Engine | Auto-execution engine components |
+| gsd/commands/* | Commands | Command routing and handling |
+| gsd/templates/* | GSD Workflow | Output templates and formatters |
+| gsd/prompts/* | GSD Workflow | System prompts and instructions |
+| gsd/workflow-templates/* | GSD Workflow | Workflow starter templates and registry |
+| gsd/skills/* | Skills | Integrated skill configurations |
+| gsd/migrate/* | Migration | Data migration and upgrade tools |
+
+### Other Extensions
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| async-jobs/index.ts | Async Jobs | Background bash command execution extension |
+| async-jobs/job-manager.ts | Async Jobs | Background job lifecycle management |
+| async-jobs/async-bash-tool.ts | Async Jobs, Tool System | Tool for spawning background bash processes |
+| async-jobs/await-tool.ts | Async Jobs, Tool System | Tool for waiting on job completion |
+| async-jobs/cancel-job-tool.ts | Async Jobs, Tool System | Tool for cancelling background jobs |
+| bg-shell/index.ts | Bg Shell | Interactive background process management extension |
+| bg-shell/bg-shell-tool.ts | Bg Shell, Tool System | Tool for spawning background processes |
+| bg-shell/bg-shell-command.ts | Bg Shell, Commands | Command handler for bg subcommands |
+| bg-shell/bg-shell-lifecycle.ts | Bg Shell | Process lifecycle and state management |
+| bg-shell/process-manager.ts | Bg Shell | Core process management implementation |
+| bg-shell/readiness-detector.ts | Bg Shell | Startup readiness detection |
+| bg-shell/interaction.ts | Bg Shell | Interactive process communication |
+| bg-shell/output-formatter.ts | Bg Shell | Process output formatting |
+| bg-shell/overlay.ts | Bg Shell, TUI Components | Terminal overlay for process monitoring |
+| browser-tools/index.ts | Browser Tools | Playwright-based browser automation extension |
+| browser-tools/core.ts | Browser Tools | Core Playwright instance management |
+| browser-tools/lifecycle.ts | Browser Tools | Browser session lifecycle |
+| browser-tools/capture.ts | Browser Tools | Screenshot and media capture |
+| browser-tools/settle.ts | Browser Tools | Page settlement and readiness detection |
+| browser-tools/refs.ts | Browser Tools | Reference-based element selection |
+| browser-tools/state.ts | Browser Tools, State Machine | Browser state management |
+| browser-tools/tools/navigation.ts | Browser Tools, Tool System | Navigation and page loading tool |
+| browser-tools/tools/interaction.ts | Browser Tools, Tool System | Element interaction tool (click, type) |
+| browser-tools/tools/screenshot.ts | Browser Tools, Tool System | Screenshot and visual capture tool |
+| browser-tools/tools/inspection.ts | Browser Tools, Tool System | Page inspection tool |
+| browser-tools/tools/session.ts | Browser Tools, Tool System | Session management and cookies tool |
+| browser-tools/tools/pages.ts | Browser Tools, Tool System | Multi-page management tool |
+| browser-tools/tools/forms.ts | Browser Tools, Tool System | Form filling and submission tool |
+| browser-tools/tools/wait.ts | Browser Tools, Tool System | Wait conditions and polling tool |
+| browser-tools/tools/assertions.ts | Browser Tools, Tool System | Visual and content assertions tool |
+| browser-tools/tools/verify.ts | Browser Tools, Tool System | Verification checks tool |
+| browser-tools/tools/extract.ts | Browser Tools, Tool System | Data extraction tool |
+| browser-tools/tools/pdf.ts | Browser Tools, Tool System | PDF export/generation tool |
+| browser-tools/tools/state-persistence.ts | Browser Tools, Tool System | State save/restore tool |
+| browser-tools/tools/network-mock.ts | Browser Tools, Tool System | Network mocking/interception tool |
+| browser-tools/tools/device.ts | Browser Tools, Tool System | Device emulation tool |
+| browser-tools/tools/visual-diff.ts | Browser Tools, Tool System | Visual regression testing tool |
+| browser-tools/tools/zoom.ts | Browser Tools, Tool System | Zoom and viewport manipulation tool |
+| browser-tools/tools/codegen.ts | Browser Tools, Tool System | Test code generation tool |
+| browser-tools/tools/action-cache.ts | Browser Tools | Action caching and replay |
+| context7/index.ts | Context7, Tool System | Library documentation fetching extension |
+| google-search/index.ts | Google Search, Tool System | Web search via Google API |
+| search-the-web/index.ts | Search the Web | Brave/Jina/Tavily-based web search extension |
+| search-the-web/provider.ts | Search the Web | Search provider abstraction |
+| search-the-web/native-search.ts | Search the Web | Native Brave search implementation |
+| search-the-web/tavily.ts | Search the Web | Tavily search provider |
+| search-the-web/tool-search.ts | Search the Web, Tool System | Search tool implementation |
+| search-the-web/tool-fetch-page.ts | Search the Web, Tool System | Page fetching tool |
+| search-the-web/cache.ts | Search the Web | Search result caching |
+| remote-questions/index.ts | Remote Questions | Remote question routing extension |
+| remote-questions/manager.ts | Remote Questions | Question lifecycle management |
+| remote-questions/slack-adapter.ts | Remote Questions | Slack messaging adapter |
+| remote-questions/discord-adapter.ts | Remote Questions | Discord messaging adapter |
+| remote-questions/telegram-adapter.ts | Remote Questions | Telegram messaging adapter |
+| mcp-client/index.ts | MCP Server/Client | Model Context Protocol client integration |
+| subagent/index.ts | Subagent, Agent Core | Parallel/serial subagent delegation extension |
+| subagent/agents.ts | Subagent, Agent Core | Agent registry and discovery |
+| subagent/isolation.ts | Subagent | Execution isolation and sandboxing |
+| subagent/worker-registry.ts | Subagent | Worker process management |
+| slash-commands/index.ts | Slash Commands, Commands | Command boilerplate generators extension |
+| slash-commands/create-slash-command.ts | Slash Commands | Generator for new slash command scaffolding |
+| slash-commands/create-extension.ts | Slash Commands, Extensions | Generator for new extension scaffolding |
+| universal-config/index.ts | Universal Config | Multi-tool configuration file discovery |
+| universal-config/discovery.ts | Universal Config | Configuration file discovery |
+| universal-config/scanners.ts | Universal Config | Tool-specific config scanners |
+| ttsr/index.ts | TTSR | TTSR regex engine — streaming output guardrails |
+| ttsr/ttsr-manager.ts | TTSR | Streaming rule manager |
+| ttsr/rule-loader.ts | TTSR | Rule loading and parsing |
+| voice/index.ts | Voice | Voice input mode extension |
+| voice/speech-recognizer.swift | Voice | macOS Swift speech recognizer |
+| voice/speech-recognizer.py | Voice | Linux/Windows Python speech recognizer |
+| cmux/index.ts | CMux | Tmux/multiplexer session management |
+| mac-tools/index.ts | Mac Tools | macOS-specific utilities extension |
+| mac-tools/swift-cli/Sources/main.swift | Mac Tools | macOS native tools Swift implementation |
+| aws-auth/index.ts | Auth/OAuth | AWS authentication and credential handling |
+| shared/ui.ts | TUI Components | Generic UI components and utilities |
+| shared/tui.ts | TUI Components | Terminal UI helpers |
+| shared/interview-ui.ts | TUI Components | Interview-style questionnaire UI |
+| shared/confirm-ui.ts | TUI Components | Confirmation dialog UI |
+| shared/terminal.ts | TUI Components | Terminal operations and formatting |
+| shared/format-utils.ts | GSD Workflow | String formatting utilities |
+| shared/sanitize.ts | GSD Workflow | Input sanitization |
+| shared/frontmatter.ts | Config | YAML frontmatter parsing |
+
+### src/resources/agents/
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| javascript-pro.md | Subagent | JavaScript specialist agent definition |
+| typescript-pro.md | Subagent | TypeScript specialist agent definition |
+| worker.md | Subagent | Generic worker agent definition |
+| researcher.md | Subagent | Research and exploration agent definition |
+| scout.md | Subagent | Scout/pathfinding agent definition |
+
+### src/resources/skills/
+
+| Skill Directory | System Label(s) | Description |
+|-----------------|-----------------|-------------|
+| react-best-practices/ | Skills | React development patterns (62 files) |
+| userinterface-wiki/ | Skills | UI/UX guidelines and component reference (155 files) |
+| create-skill/ | Skills | Skill creation scaffolding and templates (25 files) |
+| create-gsd-extension/ | Skills, Extensions | GSD extension scaffolding (22 files) |
+| code-optimizer/ | Skills | Performance optimization techniques (16 files) |
+| agent-browser/ | Skills, Browser Tools | Browser automation guidance (11 files) |
+| github-workflows/ | Skills | GitHub Actions workflow patterns (10 files) |
+| debug-like-expert/ | Skills | Advanced debugging techniques (6 files) |
+| make-interfaces-feel-better/ | Skills | UI/UX improvement patterns (5 files) |
+| accessibility/ | Skills | WCAG and accessibility standards |
+| core-web-vitals/ | Skills | Web performance metrics guidance |
+| web-quality-audit/ | Skills | Quality audit procedures |
+| best-practices/ | Skills | General development best practices |
+| frontend-design/ | Skills | Frontend design principles |
+| lint/ | Skills | Code linting standards |
+| review/ | Skills | Code review guidelines |
+| test/ | Skills | Testing strategies and patterns |
+| web-design-guidelines/ | Skills | Web design principles |
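+Each skill directory above follows the Agent Skills layout: a `SKILL.md` whose YAML frontmatter declares the skill's name and trigger description, followed by the instructions themselves. A minimal sketch, using the `commit-outstanding` skill referenced elsewhere in these docs as the example:
+
+```markdown
+---
+name: commit-outstanding
+description: Commit all uncommitted files in logical groups
+---
+
+Step-by-step instructions for the agent go here. Relative paths in the
+skill body are resolved against this skill's directory.
+```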
+
+---
+
+## web/ — Web Frontend (Next.js)
+
+### App Shell & Navigation
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| web/app/layout.tsx | Web UI | Root Next.js layout with theme provider and font |
+| web/app/page.tsx | Web UI | Entry page loading GSDAppShell |
+| web/components/gsd/app-shell.tsx | Web UI | Main app shell — sidebar, panels, terminal, commands |
+| web/components/gsd/sidebar.tsx | Web UI | Multi-panel sidebar with milestone explorer |
+| web/components/gsd/status-bar.tsx | Web UI | Status bar with workspace state and metrics |
+
+### Main Views
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| web/components/gsd/dashboard.tsx | Web UI | Dashboard with workflow actions and metrics |
+| web/components/gsd/chat-mode.tsx | Web UI | Chat interface for agent interaction |
+| web/components/gsd/projects-view.tsx | Web UI | Project browser and selector |
+| web/components/gsd/files-view.tsx | Web UI | File browser and explorer |
+| web/components/gsd/activity-view.tsx | Web UI | Activity log and history view |
+| web/components/gsd/roadmap.tsx | Web UI, GSD Workflow | Milestone roadmap visualization |
+| web/components/gsd/visualizer-view.tsx | Web UI, Doctor/Diagnostics | Workflow visualization |
+| web/components/gsd/project-welcome.tsx | Web UI | Welcome screen for new projects |
+| web/components/gsd/knowledge-captures-panel.tsx | Web UI | Knowledge and capture management |
+
+### Terminal
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| web/components/gsd/terminal.tsx | Web UI | Terminal widget with input mode handling |
+| web/components/gsd/shell-terminal.tsx | Web UI | Shell terminal with PTY integration |
+| web/components/gsd/main-session-terminal.tsx | Web UI | Main session terminal display |
+| web/components/gsd/dual-terminal.tsx | Web UI | Side-by-side terminal layout |
+
+### Commands & Dialogs
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| web/components/gsd/command-surface.tsx | Web UI, Commands | Command palette and slash command dispatcher |
+| web/components/gsd/remaining-command-panels.tsx | Web UI, Commands | History, undo, export, cleanup panels |
+| web/components/gsd/diagnostics-panels.tsx | Web UI, Doctor/Diagnostics | Doctor, forensics, skill health panels |
+| web/components/gsd/settings-panels.tsx | Web UI, Config | Settings and preferences panels |
+| web/components/gsd/guided-dialog.tsx | Web UI | Generic guided dialog component |
+| web/components/gsd/update-banner.tsx | Web UI | Update notification banner |
+| web/components/gsd/scope-badge.tsx | Web UI | Scope badge indicator |
+| web/components/gsd/loading-skeletons.tsx | Web UI | Loading skeleton placeholders |
+| web/components/gsd/code-editor.tsx | Web UI | Code editor display component |
+| web/components/gsd/file-content-viewer.tsx | Web UI | File content viewer and previewer |
+| web/components/gsd/focused-panel.tsx | Web UI | Focused panel layout component |
+
+### Onboarding
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| web/components/gsd/onboarding-gate.tsx | Web UI, Onboarding | Gate and orchestration for onboarding flow |
+| web/components/gsd/onboarding/step-welcome.tsx | Web UI, Onboarding | Welcome step |
+| web/components/gsd/onboarding/step-mode.tsx | Web UI, Onboarding | User mode selection step |
+| web/components/gsd/onboarding/step-provider.tsx | Web UI, Onboarding | LLM provider selection step |
+| web/components/gsd/onboarding/step-authenticate.tsx | Web UI, Onboarding, Auth/OAuth | Authentication step |
+| web/components/gsd/onboarding/step-dev-root.tsx | Web UI, Onboarding | Dev root directory selection step |
+| web/components/gsd/onboarding/step-project.tsx | Web UI, Onboarding | Project selection step |
+| web/components/gsd/onboarding/step-remote.tsx | Web UI, Onboarding | Remote configuration step |
+| web/components/gsd/onboarding/step-optional.tsx | Web UI, Onboarding | Optional settings step |
+| web/components/gsd/onboarding/step-ready.tsx | Web UI, Onboarding | Ready confirmation step |
+| web/components/gsd/onboarding/wizard-stepper.tsx | Web UI, Onboarding | Stepper progress indicator |
+
+### API Routes
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| web/app/api/boot/route.ts | API Routes, State Machine | Initial boot payload with project/workspace state |
+| web/app/api/session/manage/route.ts | API Routes, Session Management | Session rename and management |
+| web/app/api/session/browser/route.ts | API Routes, Session Management | Session browser listing |
+| web/app/api/session/command/route.ts | API Routes, Session Management | Session command execution |
+| web/app/api/session/events/route.ts | API Routes, Session Management | Session event streaming (SSE) |
+| web/app/api/terminal/stream/route.ts | API Routes | PTY output streaming via SSE |
+| web/app/api/terminal/input/route.ts | API Routes | Terminal input submission |
+| web/app/api/terminal/resize/route.ts | API Routes | Terminal resize |
+| web/app/api/terminal/sessions/route.ts | API Routes | Terminal session management |
+| web/app/api/terminal/upload/route.ts | API Routes | File upload for terminal |
+| web/app/api/bridge-terminal/stream/route.ts | API Routes, Web Mode | Bridge terminal output streaming |
+| web/app/api/bridge-terminal/input/route.ts | API Routes, Web Mode | Bridge terminal input |
+| web/app/api/bridge-terminal/resize/route.ts | API Routes, Web Mode | Bridge terminal resize |
+| web/app/api/projects/route.ts | API Routes | Project discovery and listing |
+| web/app/api/live-state/route.ts | API Routes, State Machine | Live workspace state updates |
+| web/app/api/steer/route.ts | API Routes, Commands | Steering endpoint for agent direction |
+| web/app/api/history/route.ts | API Routes, State Machine | History and metrics |
+| web/app/api/undo/route.ts | API Routes, Commands | Undo operation |
+| web/app/api/cleanup/route.ts | API Routes, Commands | Cleanup operation |
+| web/app/api/export-data/route.ts | API Routes, Commands | Data export |
+| web/app/api/knowledge/route.ts | API Routes, GSD Workflow | Knowledge base |
+| web/app/api/hooks/route.ts | API Routes, GSD Workflow | Git hooks management |
+| web/app/api/inspect/route.ts | API Routes, Doctor/Diagnostics | Inspection and analysis |
+| web/app/api/doctor/route.ts | API Routes, Doctor/Diagnostics | Doctor diagnostic tool |
+| web/app/api/forensics/route.ts | API Routes, Doctor/Diagnostics | Forensics analysis |
+| web/app/api/skill-health/route.ts | API Routes, Doctor/Diagnostics | Skill health check |
+| web/app/api/visualizer/route.ts | API Routes, Doctor/Diagnostics | Workflow visualization |
+| web/app/api/preferences/route.ts | API Routes, Config | User preferences |
+| web/app/api/settings-data/route.ts | API Routes, Config | Settings data |
+| web/app/api/dev-mode/route.ts | API Routes, Config | Development mode toggle |
+| web/app/api/captures/route.ts | API Routes, GSD Workflow | Knowledge captures |
+| web/app/api/browse-directories/route.ts | API Routes | Directory browsing |
+| web/app/api/files/route.ts | API Routes, Tool System | File system access |
+| web/app/api/git/route.ts | API Routes, Tool System | Git operations |
+| web/app/api/onboarding/route.ts | API Routes, Onboarding | Onboarding data |
+| web/app/api/recovery/route.ts | API Routes, Doctor/Diagnostics | Recovery operations |
+| web/app/api/remote-questions/route.ts | API Routes, Remote Questions | Remote question handling |
+| web/app/api/shutdown/route.ts | API Routes | Graceful shutdown |
+| web/app/api/update/route.ts | API Routes, CLI | Update check |
+
+### Library & State
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| web/lib/auth.ts | Auth/OAuth | Client-side auth token management from URL fragment |
+| web/lib/gsd-workspace-store.tsx | State Machine | Global workspace state store with external store |
+| web/lib/project-store-manager.tsx | State Machine | Multi-project store manager with SSE lifecycle |
+| web/lib/shutdown-gate.ts | State Machine | Graceful shutdown coordination |
+| web/lib/browser-slash-command-dispatch.ts | Commands | Slash command dispatch |
+| web/lib/workflow-actions.ts | GSD Workflow | Primary workflow action derivation logic |
+| web/lib/workflow-action-execution.ts | GSD Workflow | Workflow action execution handler |
+| web/lib/command-surface-contract.ts | Commands | Command surface request/response contract types |
+| web/lib/pty-manager.ts | Web UI | Server-side PTY spawning and session management |
+| web/lib/pty-chat-parser.ts | Web UI | PTY output parsing for chat display |
+| web/lib/remaining-command-types.ts | Web UI | Browser-safe types for command surfaces |
+| web/lib/knowledge-captures-types.ts | GSD Workflow | Knowledge entry and captures types |
+| web/lib/diagnostics-types.ts | Doctor/Diagnostics | Diagnostics panel types |
+| web/lib/settings-types.ts | Config | Settings and preferences types |
+| web/lib/visualizer-types.ts | Doctor/Diagnostics | Workflow visualizer types |
+| web/lib/session-browser-contract.ts | Session Management | Session browser contract types |
+| web/lib/git-summary-contract.ts | Tool System | Git summary contract types |
+| web/lib/utils.ts | Web UI | Common utility functions |
+| web/lib/project-url.ts | Web UI | Project URL parsing and construction |
+| web/lib/workspace-status.ts | Web UI, State Machine | Workspace status derivation |
+| web/lib/image-utils.ts | Image Processing | Image handling and processing utilities |
+| web/lib/use-editor-font-size.ts | Web UI | Editor font size preference hook |
+| web/lib/use-terminal-font-size.ts | Web UI | Terminal font size preference hook |
+| web/lib/use-user-mode.ts | Web UI | User mode hook |
+| web/hooks/use-mobile.ts | Web UI | Mobile viewport detection hook |
+| web/hooks/use-toast.ts | Web UI | Toast notification hook |
+| web/components/theme-provider.tsx | Web UI | Theme provider for dark/light modes |
+| web/components/ui/* (50+ files) | Web UI | Shadcn/ui base component library |
+
+---
+
+## vscode-extension/ — VS Code Extension
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| vscode-extension/src/extension.ts | VS Code Extension | Extension activation, client management, command registration |
+| vscode-extension/src/gsd-client.ts | VS Code Extension, MCP Server/Client | RPC client for GSD agent communication |
+| vscode-extension/src/chat-participant.ts | VS Code Extension | Chat participant for @gsd command |
+| vscode-extension/src/sidebar.ts | VS Code Extension | Sidebar webview provider with status display |
+
+---
+
+## studio/ — Electron Desktop App
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| studio/electron.vite.config.ts | Studio App, Build System | Electron Vite build configuration |
+| studio/src/main/index.ts | Studio App | Electron main process window creation |
+| studio/src/preload/index.ts | Studio App | Context isolation preload for IPC bridge |
+| studio/src/preload/index.d.ts | Studio App | Preload bridge type definitions |
+| studio/src/renderer/src/main.tsx | Studio App | React renderer entry point |
+| studio/src/renderer/src/App.tsx | Studio App | Main app component |
+| studio/src/renderer/src/lib/theme/tokens.ts | Studio App | Design tokens (colors, fonts, sizes) |
+
+---
+
+## native/ — Rust Engine
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| native/crates/engine/src/lib.rs | Native/Rust Tools | N-API entry point exposing all Rust modules |
+| native/crates/engine/src/grep.rs | File Search, Native/Rust Tools | Ripgrep-backed regex search with context/globbing |
+| native/crates/engine/src/glob.rs | File Search, Native/Rust Tools | Glob-pattern FS discovery with gitignore + scan cache |
+| native/crates/engine/src/fd.rs | File Search, Native/Rust Tools | Fuzzy file discovery for autocomplete/@-mentions |
+| native/crates/engine/src/highlight.rs | Syntax Highlighting, Native/Rust Tools | Syntect-backed ANSI syntax highlighting |
+| native/crates/engine/src/ast.rs | AST, Native/Rust Tools | Linker shim for AST N-API registrations |
+| native/crates/engine/src/diff.rs | Text Processing, Native/Rust Tools | Fuzzy matching, Unicode normalization, unified diffs |
+| native/crates/engine/src/image.rs | Image Processing, Native/Rust Tools | Image decode/encode and resize |
+| native/crates/engine/src/html.rs | Text Processing, Native/Rust Tools | HTML to Markdown conversion |
+| native/crates/engine/src/text.rs | Text Processing, Native/Rust Tools | ANSI-aware text measurement and slicing |
+| native/crates/engine/src/truncate.rs | Text Processing, Native/Rust Tools | Line-boundary-aware output truncation |
+| native/crates/engine/src/ps.rs | Native/Rust Tools | Cross-platform process tree management |
+| native/crates/engine/src/clipboard.rs | Native/Rust Tools | Clipboard read/write for text and images |
+| native/crates/engine/src/json_parse.rs | Text Processing, Native/Rust Tools | Streaming JSON parser with partial recovery |
+| native/crates/engine/src/gsd_parser.rs | GSD Workflow, Native/Rust Tools | .gsd/ directory file parser (markdown, frontmatter) |
+| native/crates/engine/src/ttsr.rs | TTSR, Native/Rust Tools | TTSR regex engine with compiled RegexSet |
+| native/crates/engine/src/stream_process.rs | Text Processing, Native/Rust Tools | Bash stream processor (UTF-8, ANSI strip, binary) |
+| native/crates/engine/src/xxhash.rs | Native/Rust Tools | xxHash32 for hashline edit tool |
+| native/crates/engine/src/git.rs | Native/Rust Tools | Native git operations via libgit2 |
+| native/crates/engine/src/fs_cache.rs | File Search, Native/Rust Tools | TTL-based FS scan cache with explicit invalidation |
+| native/crates/engine/src/glob_util.rs | File Search, Native/Rust Tools | Shared glob-pattern helpers |
+| native/crates/engine/src/task.rs | Native/Rust Tools | Blocking work on libuv thread pool with cancellation |
+| native/crates/engine/build.rs | Build System | Cargo build script for napi-build compilation |
+| native/crates/grep/src/lib.rs | File Search, Native/Rust Tools | Ripgrep search library (in-memory and on-disk) |
+| native/crates/ast/src/lib.rs | AST, Native/Rust Tools | AST-aware structural search and rewrite engine |
+| native/crates/ast/src/ast.rs | AST, Native/Rust Tools | ast-grep integration for structural code search |
+| native/crates/ast/src/language/mod.rs | AST, Native/Rust Tools | Vendored language defs and tree-sitter bindings |
+| native/crates/ast/src/language/parsers.rs | AST, Native/Rust Tools | Pre-compiled tree-sitter parsers (50+ languages) |
+
+## packages/native/src/ — Node.js Rust Bindings
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| packages/native/src/native.ts | Native/Rust Tools, Node.js Bindings | Native addon loader with platform fallback |
+| packages/native/src/grep/index.ts | File Search, Node.js Bindings | Ripgrep wrapper for regex search |
+| packages/native/src/fd/index.ts | File Search, Node.js Bindings | Fuzzy file discovery wrapper |
+| packages/native/src/highlight/index.ts | Syntax Highlighting, Node.js Bindings | Syntax highlighting wrapper |
+| packages/native/src/image/index.ts | Image Processing, Node.js Bindings | Image processing wrapper |
+| packages/native/src/html/index.ts | Text Processing, Node.js Bindings | HTML to Markdown wrapper |
+| packages/native/src/diff/index.ts | Text Processing, Node.js Bindings | Text diffing wrapper |
+| packages/native/src/ps/index.ts | Native/Rust Tools, Node.js Bindings | Process tree management wrapper |
+| packages/native/src/truncate/index.ts | Text Processing, Node.js Bindings | Output truncation wrapper |
+| packages/native/src/json-parse/index.ts | Text Processing, Node.js Bindings | JSON parsing wrapper |
+| packages/native/src/stream-process/index.ts | Text Processing, Node.js Bindings | Stream processing wrapper |
+| packages/native/src/ttsr/index.ts | TTSR, Node.js Bindings | TTSR regex engine wrapper |
+
+---
+
+## tests/ — Test Suite
+
+| File / Directory | System Label(s) | Description |
+|------------------|-----------------|-------------|
+| tests/smoke/run.ts | Integration Tests | Test runner for smoke tests |
+| tests/smoke/test-help.ts | Integration Tests | Smoke test for help command |
+| tests/smoke/test-init.ts | Integration Tests | Smoke test for initialization |
+| tests/smoke/test-version.ts | Integration Tests | Smoke test for version command |
+| tests/fixtures/run.ts | Integration Tests | Fixture-based test harness with recording replay |
+| tests/fixtures/provider.ts | Integration Tests | Fixture provider and replayer for LLM turns |
+| tests/fixtures/record.ts | Integration Tests | Recording fixture capture |
+| tests/fixtures/recordings/*.json | Integration Tests | Pre-recorded LLM agent interaction fixtures |
+| tests/live/run.ts | Integration Tests | Live API roundtrip test runner |
+| tests/live/test-anthropic-roundtrip.ts | Integration Tests, AI Providers | Live Anthropic API integration test |
+| tests/live/test-openai-roundtrip.ts | Integration Tests, AI Providers | Live OpenAI API integration test |
+| tests/live-regression/run.ts | Integration Tests | Live regression test runner |
+| tests/repro-worktree-bug/*.mjs | Integration Tests, Worktree | Worktree bug reproduction scripts |
+
+---
+
+## scripts/ — Build & Utility
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| scripts/dev.js | Build System | Dev supervisor — tsc and resource watcher |
+| scripts/dev-cli.js | Build System | CLI development mode runner |
+| scripts/watch-resources.js | Build System | Resource file watcher for hot reload |
+| scripts/bump-version.mjs | Build System | Version bumper for package.json and platform packages |
+| scripts/sync-pkg-version.cjs | Build System | Sync pkg/package.json with workspace version |
+| scripts/copy-resources.cjs | Build System | Resource file copier for distribution |
+| scripts/copy-export-html.cjs | Build System | HTML export asset copier |
+| scripts/copy-themes.cjs | Build System | Theme file copier |
+| scripts/link-workspace-packages.cjs | Build System | Workspace package symlink manager |
+| scripts/ensure-workspace-builds.cjs | Build System | Postinstall build checker |
+| scripts/build-web-if-stale.cjs | Build System | Conditional web build trigger |
+| scripts/stage-web-standalone.cjs | Build System | Web standalone staging |
+| scripts/generate-changelog.mjs | Build System | Changelog generator from commits |
+| scripts/update-changelog.mjs | Build System | Changelog updater |
+| scripts/version-stamp.mjs | Build System | Version timestamp generator |
+| scripts/validate-pack.sh | Build System | Package validation script |
+| scripts/validate-pack.js | Build System | Package validation (Node.js) |
+| scripts/install-pi-global.js | Build System | Global installation helper |
+| scripts/uninstall-pi-global.js | Build System | Global uninstallation helper |
+| scripts/install-hooks.sh | Build System, GSD Workflow | Git hook installer |
+| scripts/secret-scan.sh | Build System, Auth/OAuth | Secret scanning for credentials |
+| scripts/docs-prompt-injection-scan.sh | Build System | Prompt injection detection in docs |
+| scripts/check-skill-references.mjs | Build System, Skills | Skill reference validator |
+| scripts/preview-dashboard.ts | Web Mode | Dashboard preview server |
+| scripts/ci_monitor.cjs | Build System | CI monitoring dashboard |
+| scripts/recover-gsd-1364.sh | Build System, Migration | Recovery script for issue #1364 |
+| scripts/recover-gsd-1364.ps1 | Build System, Migration | Recovery script for issue #1364 (PowerShell) |
+| scripts/recover-gsd-1668.sh | Build System, Migration | Recovery script for issue #1668 |
+| scripts/recover-gsd-1668.ps1 | Build System, Migration | Recovery script for issue #1668 (PowerShell) |
+
+---
+
+## System → File Reverse Index
+
+Quick lookup: which files are part of each system?
+
+| System | Key Files (abbreviated) |
+|--------|------------------------|
+| **Agent Core** | pi-agent-core/src/*, pi-coding-agent/src/core/agent-session.ts, agent-loop.ts, agent.ts, event-bus.ts, sdk.ts |
+| **AI Providers** | pi-ai/src/providers/*, pi-ai/src/stream.ts, pi-ai/src/models*.ts |
+| **API Routes** | web/app/api/**/*.ts |
+| **AST** | native/crates/ast/*, packages/native/src/ast/ |
+| **Async Jobs** | src/resources/extensions/async-jobs/* |
+| **Auth / OAuth** | pi-ai/src/utils/oauth/*, src/web/web-auth-storage.ts, core/auth-storage.ts, src/pi-migration.ts, aws-auth/index.ts, web/lib/auth.ts |
+| **Auto Engine** | src/resources/extensions/gsd/auto*.ts, gsd/auto-loop.ts, gsd/auto-supervisor.ts, gsd/unit-runtime.ts |
+| **Bg Shell** | src/resources/extensions/bg-shell/* |
+| **Browser Tools** | src/resources/extensions/browser-tools/* |
+| **Build System** | scripts/*, native/crates/engine/build.rs |
+| **CLI** | src/cli.ts, src/cli-web-branch.ts, src/help-text.ts, src/update*.ts, pi-coding-agent/src/cli.ts, src/worktree-cli.ts |
+| **CMux** | src/resources/extensions/cmux/index.ts |
+| **Commands** | gsd/commands*.ts, gsd/exit-command.ts, gsd/undo.ts, gsd/kill.ts, pi-coding-agent/src/core/slash-commands.ts |
+| **Compaction** | pi-coding-agent/src/core/compaction*.ts, core/compaction/* |
+| **Config** | src/app-paths.ts, src/models-resolver.ts, src/remote-questions-config.ts, src/wizard.ts, core/defaults.ts, core/constants.ts, config.ts |
+| **Context7** | src/resources/extensions/context7/index.ts |
+| **Doctor / Diagnostics** | gsd/doctor*.ts, gsd/collision-diagnostics.ts, core/diagnostics.ts, web/lib/diagnostics-types.ts, web/app/api/doctor/*, forensics/* |
+| **Event System** | pi-coding-agent/src/core/event-bus.ts, gsd/auto-observability.ts |
+| **Extension Registry** | src/extension-discovery.ts, src/extension-registry.ts, src/bundled-extension-paths.ts |
+| **Extensions** | pi-coding-agent/src/core/extensions/*, src/resource-loader.ts |
+| **File Search** | native/crates/engine/src/grep.rs, glob.rs, fd.rs, fs_cache.rs, packages/native/src/grep/*, fd/*, core/tools/grep.ts, find.ts |
+| **GSD Workflow** | src/resources/extensions/gsd/* (non-auto), gsd/reports.ts, gsd/notifications.ts, gsd/prompts/*, gsd/workflow-templates/* |
+| **Google Search** | src/resources/extensions/google-search/index.ts |
+| **Headless Mode** | src/headless*.ts |
+| **Image Processing** | native/crates/engine/src/image.rs, packages/native/src/image/*, utils/image-*.ts, web/lib/image-utils.ts |
+| **Integration Tests** | tests/**/* |
+| **Loader / Bootstrap** | src/loader.ts, src/resource-loader.ts, src/tool-bootstrap.ts, src/bundled-resource-path.ts, gsd/bootstrap/* |
+| **LSP** | pi-coding-agent/src/core/lsp/* |
+| **Mac Tools** | src/resources/extensions/mac-tools/* |
+| **MCP Server/Client** | src/mcp-server.ts, src/resources/extensions/mcp-client/index.ts, vscode-extension/src/gsd-client.ts, modes/rpc/* |
+| **Memory Extension** | pi-coding-agent/src/resources/extensions/memory/* |
+| **Migration** | gsd/migrate/*, src/pi-migration.ts, pi-coding-agent/src/migrations.ts, scripts/recover-*.sh |
+| **Modes** | pi-coding-agent/src/modes/* |
+| **Model System** | pi-coding-agent/src/core/model-*.ts, pi-ai/src/models*.ts, pi-ai/src/api-registry.ts, gsd/model-router.ts |
+| **Native / Rust Tools** | native/crates/engine/src/* |
+| **Node.js Bindings** | packages/native/src/* |
+| **Onboarding** | src/onboarding.ts, src/wizard.ts, web/components/gsd/onboarding/*, web/app/api/onboarding/* |
+| **Permissions** | core/extensions/project-trust.ts, core/auth-storage.ts |
+| **Remote Questions** | src/resources/extensions/remote-questions/* |
+| **Search the Web** | src/resources/extensions/search-the-web/* |
+| **Session Management** | pi-coding-agent/src/core/session-manager.ts, core/settings-manager.ts, web/app/api/session/* |
+| **Skills** | src/resources/skills/*, gsd/skill-telemetry.ts, gsd/preferences-skills.ts, core/skills.ts |
+| **Slash Commands** | src/resources/extensions/slash-commands/* |
+| **State Machine** | gsd/state.ts, gsd/history.ts, gsd/json-persistence.ts, gsd/memory-store.ts, gsd/reactive-graph.ts, core/agent-session.ts, web/lib/gsd-workspace-store.tsx |
+| **Studio App** | studio/* |
+| **Subagent** | src/resources/extensions/subagent/*, src/resources/agents/* |
+| **Syntax Highlighting** | native/crates/engine/src/highlight.rs, packages/native/src/highlight/* |
+| **Text Processing** | native/crates/engine/src/diff.rs, html.rs, text.rs, truncate.rs, json_parse.rs, stream_process.rs |
+| **Tool System** | pi-coding-agent/src/core/tools/*, core/bash-executor.ts, core/exec.ts |
+| **TTSR** | src/resources/extensions/ttsr/*, native/crates/engine/src/ttsr.rs, packages/native/src/ttsr/* |
+| **TUI Components** | packages/pi-tui/src/*, pi-coding-agent/src/modes/interactive/components/*, pi-coding-agent/src/modes/interactive/controllers/* |
+| **Universal Config** | src/resources/extensions/universal-config/* |
+| **Voice** | src/resources/extensions/voice/* |
+| **VS Code Extension** | vscode-extension/src/* |
+| **Web Mode** | src/web/*.ts, src/web-mode.ts |
+| **Web UI** | web/app/*.tsx, web/components/*, web/hooks/*, web/lib/* |
+| **Worktree** | src/worktree-cli.ts, src/worktree-name-gen.ts, gsd/worktree*.ts, tests/repro-worktree-bug/* |
diff --git a/docs/FRONTIER-TECHNIQUES.md b/docs/FRONTIER-TECHNIQUES.md
new file mode 100644
index 000000000..6aa5ad59a
--- /dev/null
+++ b/docs/FRONTIER-TECHNIQUES.md
@@ -0,0 +1,741 @@
+# Frontier Techniques for GSD-2
+
+Research into cutting-edge AI agent techniques that map directly to GSD-2's architecture, ranked by impact and feasibility.
+
+**Date:** 2026-03-25
+**Status:** Research / Pre-RFC
+
+---
+
+## Table of Contents
+
+- [Executive Summary](#executive-summary)
+- [1. Skill Library Evolution](#1-skill-library-evolution)
+- [2. DAG-Based Parallel Tool Execution](#2-dag-based-parallel-tool-execution)
+- [3. Speculative Tool Execution](#3-speculative-tool-execution)
+- [4. Semantic Context Compression](#4-semantic-context-compression)
+- [5. Cross-Session Learning Graph](#5-cross-session-learning-graph)
+- [6. MCTS-Based Planning](#6-mcts-based-planning)
+- [Priority Matrix](#priority-matrix)
+- [Sources & References](#sources--references)
+
+---
+
+## Executive Summary
+
+GSD-2 is a multi-layered, event-driven agent platform with strong extensibility primitives: a skill system, file-based memory, session branching, compaction, and 16+ extension lifecycle hooks. These existing primitives create natural integration points for six frontier techniques that could fundamentally change how GSD operates.
+
+The techniques fall into three categories:
+
+| Category | Techniques | Theme |
+|----------|-----------|-------|
+| **Self-Improvement** | Skill Library Evolution, Cross-Session Learning Graph | GSD gets better the more you use it |
+| **Performance** | DAG Tool Execution, Speculative Tool Execution | GSD gets faster per turn |
+| **Intelligence** | Semantic Context Compression, MCTS Planning | GSD reasons better with the same context budget |
+
+---
+
+## 1. Skill Library Evolution
+
+**Category:** Self-Improvement
+**Impact:** Massive | **Effort:** Medium | **Priority:** #1
+
+### What It Is
+
+Inspired by [SkillRL](https://arxiv.org/abs/2602.08234) (ICLR 2026), this technique transforms GSD's skill system from static instruction files into a self-improving knowledge base. Instead of skills being written once and updated manually, they evolve based on execution outcomes.
+
+SkillRL demonstrates that agents with learned skill libraries outperform baselines by 15.3%+ across task benchmarks, with 10-20% token compression compared to raw trajectory storage.
+
+### How It Works
+
+```
+┌─────────────────────────────────────────────────────────┐
+│ EXECUTION LOOP │
+│ │
+│ 1. Skill invoked → agent executes task │
+│ 2. Outcome captured (success/failure + trajectory) │
+│ 3. Trajectory distilled: │
+│ ├─ Success → strategic pattern extracted │
+│ └─ Failure → anti-pattern + lesson recorded │
+│ 4. Skill file updated with versioned improvement │
+│ 5. Next invocation benefits from accumulated learnings │
+│ │
+└─────────────────────────────────────────────────────────┘
+```
+
+**Two types of learned knowledge:**
+
+| Type | Description | Example |
+|------|-------------|---------|
+| **General Skills** | Universal strategic guidance applicable across tasks | "When editing TypeScript files, always check for type errors via LSP before committing" |
+| **Task-Specific Skills** | Category-level heuristics for specific skill domains | "The `fix-issue` skill should check CI status before opening a PR, not after" |
+
+### Why It Fits GSD-2
+
+GSD already has every primitive needed:
+
+- **Skill files** (`~/.agents/skills/`, `.agents/skills/`) — the storage layer exists
+- **Extension hooks** (`turn_end`, `agent_end`) — outcome capture points exist
+- **Memory system** (MEMORY.md + individual files) — persistence exists
+- **`/improve-skill` and `/heal-skill` commands** — manual versions of this loop already exist
+
+The gap is automation: connecting execution outcomes back to skill files without human intervention.
+
+### Integration Points
+
+| GSD Component | Role in Integration |
+|---------------|-------------------|
+| `agent-session.ts` → `turn_end` event | Captures execution outcome (success/failure signals) |
+| Extension hook: `agent_end` | Triggers trajectory distillation |
+| Skill file system | Receives versioned updates with learned patterns |
+| `compaction.ts` | Provides trajectory data from the session for distillation |
+
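+The loop can be sketched end to end. This is a minimal sketch, not GSD's implementation: the `TurnRecord`, `SkillLesson`, and `summarize` shapes are hypothetical, and a real distiller would call a cheap model rather than truncate the trajectory.
+
+```typescript
+type Outcome = "success" | "failure" | "partial";
+
+interface TurnRecord {
+  skill: string;        // skill that was active this turn
+  outcome: Outcome;     // classified from turn_end signals
+  trajectory: string[]; // distilled tool/step log from compaction data
+}
+
+interface SkillLesson {
+  version: number;                 // monotonic version stamp
+  kind: "pattern" | "anti-pattern";
+  text: string;
+}
+
+// Distill a finished turn into a lesson; partial outcomes are deferred.
+function distill(record: TurnRecord): SkillLesson | null {
+  if (record.outcome === "partial") return null;
+  return {
+    version: Date.now(),
+    kind: record.outcome === "success" ? "pattern" : "anti-pattern",
+    text: summarize(record.trajectory),
+  };
+}
+
+// Append the lesson as a versioned block, preserving the original skill body.
+function applyLesson(skillBody: string, lesson: SkillLesson): string {
+  return skillBody +
+    `\n<!-- learned v${lesson.version} (${lesson.kind}) -->\n${lesson.text}\n`;
+}
+
+// Placeholder distillation: a real version would be an LLM call.
+function summarize(trajectory: string[]): string {
+  return trajectory.slice(-3).join(" -> ");
+}
+```
+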
+### Architecture
+
+```
+User invokes skill
+ │
+ ▼
+┌──────────────┐ ┌──────────────────┐
+│ AgentSession │────▶│ Skill Executor │
+│ (turn_end) │ │ (tracks outcome) │
+└──────────────┘ └────────┬─────────┘
+ │
+ ┌─────────▼──────────┐
+ │ Outcome Classifier │
+ │ (success/failure/ │
+ │ partial) │
+ └─────────┬──────────┘
+ │
+ ┌───────────────┼───────────────┐
+ ▼ ▼ ▼
+ ┌────────────┐ ┌──────────────┐ ┌───────────┐
+ │ Success │ │ Failure │ │ Partial │
+ │ Distiller │ │ Distiller │ │ Analyzer │
+ └─────┬──────┘ └──────┬───────┘ └─────┬─────┘
+ │ │ │
+ ▼ ▼ ▼
+ ┌─────────────────────────────────────────────┐
+ │ Skill File Updater │
+ │ • Appends learned pattern to skill │
+ │ • Versions the update │
+ │ • Preserves original skill intent │
+ └─────────────────────────────────────────────┘
+```
+
+### Open Questions
+
+- **Drift prevention:** How to prevent accumulated learnings from overwhelming the original skill intent?
+- **Conflict resolution:** What happens when a lesson from one session contradicts another?
+- **Quality gate:** Should updates require a validation pass before being written?
+
+---
+
+## 2. DAG-Based Parallel Tool Execution
+
+**Category:** Performance
+**Impact:** High | **Effort:** Medium | **Priority:** #2
+
+### What It Is
+
+The [LLM Compiler pattern](https://arxiv.org/pdf/2312.04511) (ICML 2024) treats multi-tool workflows like a compiler optimization pass. When the model returns multiple tool calls in a single response, instead of executing them sequentially, the system:
+
+1. Analyzes dependencies between tool calls
+2. Constructs a Directed Acyclic Graph (DAG)
+3. Executes independent tools in parallel
+4. Blocks only on actual data dependencies
+
+### How It Works
+
+**Current GSD behavior (sequential):**
+```
+Read(auth.ts) ─── 150ms ───▶ result
+ │
+Read(types.ts) ─── 120ms ──▶ result
+ │
+Grep("login") ─── 80ms ────▶ result
+ │
+Read(test.ts) ─── 130ms ───▶ result
+ │
+Total: ~480ms sequential
+```
+
+**With DAG execution (parallel):**
+```
+Read(auth.ts) ─── 150ms ──▶ result ─┐
+Read(types.ts) ─── 120ms ──▶ result ─┤
+Grep("login") ─── 80ms ───▶ result ─┤── all complete at 150ms
+Read(test.ts) ─── 130ms ──▶ result ─┘
+ │
+Total: ~150ms (max of parallel set)
+```
+
+**Dependency analysis rules:**
+
+| Tool A | Tool B | Dependency? | Reason |
+|--------|--------|-------------|--------|
+| Read(file) | Read(file) | No | Reads have no side effects, so they never conflict |
+| Read(file) | Grep(pattern) | No | Independent data sources |
+| Read(file) | Edit(file) | Yes | Edit depends on Read content |
+| Edit(file) | Edit(file) | Yes | Edits to same file must serialize |
+| Bash(cmd) | Bash(cmd) | Maybe | Depends on side effects |
+| Write(file) | Read(file) | Yes | Read after write needs write to complete |
+
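+The rules above reduce to a small predicate. A sketch under simplifying assumptions (the `ToolCall` shape is hypothetical; real side-effect metadata would live on the tool definitions, and Bash is serialized conservatively since its side effects are unknown):
+
+```typescript
+interface ToolCall {
+  name: "Read" | "Grep" | "Glob" | "Edit" | "Write" | "Bash";
+  file?: string; // target file, when applicable
+}
+
+const MUTATES = new Set(["Edit", "Write"]);
+
+// True if `b` must wait for `a` to complete.
+function dependsOn(a: ToolCall, b: ToolCall): boolean {
+  // Conservative: shell commands may conflict with anything, so serialize.
+  if (a.name === "Bash" || b.name === "Bash") return true;
+  const sameFile = a.file !== undefined && a.file === b.file;
+  // Conflict only when both touch the same file and at least one mutates it.
+  return sameFile && (MUTATES.has(a.name) || MUTATES.has(b.name));
+}
+```
+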
+### Why It Fits GSD-2
+
+The model already emits multiple `tool_use` blocks in a single response. GSD processes them, but the execution path in `agent-loop.ts` handles them in sequence. The parallelism opportunity is sitting right there.
+
+**Estimated impact:** A typical coding turn involves 3-5 tool calls. If roughly 60% of those are parallelizable (reads, greps, globs), per-turn latency could drop by 40-60%. Over a 50-turn session, that's minutes saved.
+
+### Integration Points
+
+| GSD Component | Role in Integration |
+|---------------|-------------------|
+| `agent-loop.ts` tool execution path | Replace sequential execution with DAG scheduler |
+| Tool definitions | Annotate tools with side-effect metadata (pure/impure) |
+| Extension hooks (`tool_*`) | Must still fire in correct order per dependency chain |
+
+### Architecture
+
+```
+Model response with N tool_use blocks
+ │
+ ▼
+┌──────────────────────────────┐
+│ Dependency Analyzer │
+│ • Parse tool calls │
+│ • Identify file overlaps │
+│ • Identify data dependencies │
+│ • Classify: pure vs impure │
+└──────────────┬───────────────┘
+ │
+ ▼
+┌──────────────────────────────┐
+│ DAG Constructor │
+│ • Nodes = tool calls │
+│ • Edges = dependencies │
+│ • Topological sort │
+└──────────────┬───────────────┘
+ │
+ ▼
+┌──────────────────────────────┐
+│ Parallel Executor │
+│ • Execute roots immediately │
+│ • On completion, unlock │
+│ dependent nodes │
+│ • Collect all results │
+│ • Return in original order │
+└──────────────────────────────┘
+```
+
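+The executor stage can be sketched as a wave scheduler: run every node whose dependencies have resolved, await the wave, repeat. The `Node` shape here is hypothetical, not GSD's tool-call representation:
+
+```typescript
+interface Node {
+  id: string;
+  deps: string[];             // ids this node must wait for
+  run: () => Promise<string>; // the underlying tool call
+}
+
+async function executeDag(nodes: Node[]): Promise<Map<string, string>> {
+  const results = new Map<string, string>();
+  const pending = new Set(nodes);
+  while (pending.size > 0) {
+    // Roots of the remaining graph: all deps already resolved.
+    const ready = [...pending].filter(n => n.deps.every(d => results.has(d)));
+    if (ready.length === 0) throw new Error("dependency cycle detected");
+    await Promise.all(ready.map(async n => {
+      results.set(n.id, await n.run());
+      pending.delete(n);
+    }));
+  }
+  return results; // caller re-orders to match the original tool_use order
+}
+```
+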
+### Open Questions
+
+- **Bash side effects:** How to determine if two Bash commands conflict without executing them?
+- **Extension hooks:** Should `tool_start`/`tool_end` events fire in execution order or original order?
+- **Error propagation:** If a parallel tool fails, do dependent tools get cancelled or receive the error?
+
+---
+
+## 3. Speculative Tool Execution
+
+**Category:** Performance
+**Impact:** High | **Effort:** Low-Medium | **Priority:** #3
+
+### What It Is
+
+Based on [Speculative Tool Calls research](https://arxiv.org/pdf/2512.15834), this technique predicts which tools the model will request and pre-executes them before the model responds. Correct predictions eliminate the first tool-call round-trip entirely. Wrong predictions are simply discarded; the only cost is the compute spent on the unused pre-execution.
+
+### How It Works
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│ User: "fix the bug in auth.ts" │
+│ │
+│ BEFORE model responds: │
+│ Speculator predicts: │
+│ ├─ Read("auth.ts") → pre-executed ✓ │
+│ ├─ Grep("error|bug", "auth") → pre-executed ✓ │
+│ ├─ LSP diagnostics(auth.ts) → pre-executed ✓ │
+│ └─ Read("auth.test.ts") → pre-executed ✓ │
+│ │
+│ Model responds with tool calls: │
+│ ├─ Read("auth.ts") → CACHE HIT (0ms) │
+│ ├─ Read("auth.test.ts") → CACHE HIT (0ms) │
+│ └─ Grep("login", "src/") → cache miss (execute) │
+│ │
+│ Hit rate: 2/3 = 67% │
+│ Latency saved: ~300ms on this turn │
+└─────────────────────────────────────────────────────────────┘
+```
+
+**Prediction strategies (simplest to most sophisticated):**
+
+| Strategy | Description | Expected Hit Rate |
+|----------|-------------|-------------------|
+| **Keyword extraction** | Parse user prompt for file paths, function names → Read those files | 40-60% |
+| **Session history** | Track which tools follow which user prompt patterns | 50-70% |
+| **Learned patterns** | Use the skill library evolution data to predict tool sequences | 60-80% |
+| **Model pre-query** | Ask a fast/cheap model to predict tool calls | 70-85% |
+
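+The simplest strategy (keyword extraction) fits in a few lines. A sketch with assumed shapes: the `read` callback stands in for the real Read tool, and the cache key is just tool name plus arguments:
+
+```typescript
+const cache = new Map<string, Promise<string>>();
+
+const keyOf = (tool: string, arg: string) => `${tool}:${arg}`;
+
+// Pull likely file paths (tokens with an extension) out of the user prompt.
+function extractPaths(prompt: string): string[] {
+  return prompt.match(/[\w./-]+\.\w+/g) ?? [];
+}
+
+// Fire-and-forget pre-execution of predicted Reads.
+function speculate(prompt: string, read: (p: string) => Promise<string>): void {
+  for (const path of extractPaths(prompt)) {
+    const key = keyOf("Read", path);
+    if (!cache.has(key)) cache.set(key, read(path));
+  }
+}
+
+// Later, when the model actually requests Read(path): hit = no extra latency.
+function executeRead(path: string, read: (p: string) => Promise<string>): Promise<string> {
+  return cache.get(keyOf("Read", path)) ?? read(path);
+}
+```
+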
+### Why It Fits GSD-2
+
+The #1 latency bottleneck in GSD is the round-trip: user prompt → model thinks → model requests tool → tool executes → result sent back → model thinks again. Speculative execution overlaps tool execution with the model's thinking time, so a correct prediction removes that first round-trip entirely.
+
+GSD's architecture makes this easy to add:
+- `AgentSession.prompt()` already processes user input before sending to the model
+- Tool results are already cached in the message array
+- The extension system can intercept input and spawn pre-fetches
+
+### Integration Points
+
+| GSD Component | Role in Integration |
+|---------------|-------------------|
+| `AgentSession.prompt()` | Trigger speculation after user input, before model call |
+| Tool result cache (new) | Store speculated results keyed by tool+args |
+| `agent-loop.ts` tool execution | Check cache before executing; serve cached result on hit |
+| Extension hook: `input` | Parse user intent for file paths, patterns |
+
+### Architecture
+
+```
+User input arrives
+ │
+ ├──────────────────────────────────────┐
+ │ │
+ ▼ ▼
+┌───────────────┐ ┌──────────────────┐
+│ Send to LLM │ │ Speculator │
+│ (normal path) │ │ • Extract paths │
+│ │ │ • Predict tools │
+│ ... waiting │ │ • Pre-execute │
+│ for response │ │ • Cache results │
+│ │ └──────────────────┘
+│ │ │
+│ │◀─── model returns ──────────│
+│ │ tool_use blocks │
+└───────┬───────┘ │
+ │ │
+ ▼ │
+┌───────────────┐ │
+│ Tool Executor │◀──── check cache ───────────┘
+│ • Cache hit? │
+│ → return │
+│ • Cache miss? │
+│ → execute │
+└───────────────┘
+```
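+
+The "check cache" step reduces to a lookup keyed by tool name plus canonicalized arguments. A sketch — the `SpeculationCache` shape is hypothetical, and GSD's real tool executor and result types will differ:
+
+```typescript
+// Hypothetical speculation cache: results keyed by tool + canonical args.
+type ToolResult = { output: string; cachedAt: number };
+
+class SpeculationCache {
+  private store = new Map<string, ToolResult>();
+
+  private key(tool: string, args: Record<string, unknown>): string {
+    // JSON over sorted keys so {a, b} and {b, a} produce the same key.
+    const sorted = Object.keys(args).sort().map((k) => [k, args[k]]);
+    return `${tool}:${JSON.stringify(sorted)}`;
+  }
+
+  put(tool: string, args: Record<string, unknown>, output: string): void {
+    this.store.set(this.key(tool, args), { output, cachedAt: Date.now() });
+  }
+
+  /** Returns the cached result and evicts it (single-use), or undefined on miss. */
+  take(tool: string, args: Record<string, unknown>): ToolResult | undefined {
+    const k = this.key(tool, args);
+    const hit = this.store.get(k);
+    if (hit) this.store.delete(k); // single-use: the file may change after serving
+    return hit;
+  }
+}
+```
+
+Single-use eviction is one possible answer to the staleness question: a speculated result is served at most once, so a repeated call re-reads the file.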
+
+### Cost Analysis
+
+| Scenario | Cost |
+|----------|------|
+| **Correct prediction** | ~0ms latency (result already available). Compute cost: the pre-execution itself (trivial for Read/Grep). |
+| **Wrong prediction** | Wasted compute for the pre-executed tool. For Read/Grep/Glob, this is <10ms of I/O. |
+| **Partial hit** | Net positive whenever the latency saved on hits outweighs the wasted I/O on misses — roughly any hit rate above 20%, given how cheap Read/Grep misses are. |
+
+### Open Questions
+
+- **TTL for cached results:** How long are speculated results valid? File contents can change between speculation and model request.
+- **Side effects:** Should only pure tools (Read, Grep, Glob, LSP) be speculatable?
+- **Resource limits:** Cap on number of speculative executions per turn to prevent I/O storms?
+
+---
+
+## 4. Semantic Context Compression
+
+**Category:** Intelligence
+**Impact:** High | **Effort:** High | **Priority:** #4
+
+### What It Is
+
+GSD's compaction system uses a char/4 heuristic for token estimation and all-or-nothing LLM summarization for context reduction. Research from [Zylos](https://zylos.ai/research/2026-02-28-ai-agent-context-compression-strategies) and [context engineering literature](https://rlancemartin.github.io/2025/06/23/context_engineering/) shows that embedding-based compression achieves 80-90% token reduction while preserving the ability to selectively recall specific historical context.
+
+### Current GSD Compaction (Weaknesses Highlighted)
+
+```
+Messages: [M1, M2, M3, M4, M5, M6, M7, M8, M9, M10]
+ ▲
+Token budget exceeded │ recent
+ │
+Current approach:
+┌─────────────────────────┬─────────────────────────┐
+│ M1-M6: LLM-summarized │ M7-M10: kept verbatim │
+│ into single blob │ (last ~20k tokens) │
+│ │ │
+│ ⚠ All detail lost │ ✓ Full fidelity │
+│ ⚠ No selective recall │ │
+│ ⚠ char/4 overestimates │ │
+└─────────────────────────┴─────────────────────────┘
+```
+
+**Three specific weaknesses:**
+
+| Weakness | Impact | Current Code Location |
+|----------|--------|-----------------------|
+| char/4 token estimation | ~25% overestimate → compacts too early → wastes context | `compaction.ts:201-259` |
+| All-or-nothing summarization | Loses specific details that may be relevant later | `compaction.ts:327-400` |
+| No retrieval from compacted history | Once summarized, detail is gone forever | `compaction-orchestrator.ts` |
+
+### Proposed: Tiered Memory Architecture
+
+```
+┌─────────────────────────────────────────────────────────┐
+│ HOT TIER │
+│ Recent turns (last ~20k tokens) │
+│ Full text, full fidelity │
+│ Storage: in-context messages │
+│ Access: always in prompt │
+├─────────────────────────────────────────────────────────┤
+│ WARM TIER │
+│ Older turns (beyond context window) │
+│ Stored as embeddings + compressed text │
+│ Storage: session-local vector index │
+│ Access: retrieved when semantically relevant to │
+│ current turn │
+│ Token cost: only retrieved segments count │
+├─────────────────────────────────────────────────────────┤
+│ COLD TIER │
+│ Ancient turns / previous sessions │
+│ Stored as summaries + metadata │
+│ Storage: disk (existing session files) │
+│ Access: retrieved only on explicit recall │
+│ Token cost: minimal summary headers │
+└─────────────────────────────────────────────────────────┘
+```
+
+**How retrieval works per turn:**
+
+```
+New user prompt arrives
+ │
+ ▼
+┌───────────────────┐
+│ Embed the prompt │ (compute embedding of user's question)
+└────────┬──────────┘
+ │
+ ├──── query warm tier ──▶ top-K relevant historical turns
+ │ (cosine similarity > threshold)
+ │
+ ├──── always include ──▶ hot tier (recent turns, full text)
+ │
+ ▼
+┌───────────────────┐
+│ Compose context │
+│ = hot + retrieved │
+│ + system prompt │
+└───────────────────┘
+```
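+
+Warm-tier retrieval is a plain nearest-neighbor query. A minimal sketch using flat-array cosine similarity — the embedding source and tier storage are assumed, not existing GSD code:
+
+```typescript
+// Hypothetical warm-tier lookup: rank stored turn embeddings by cosine
+// similarity to the prompt embedding, return the top-K above a threshold.
+interface WarmEntry { text: string; embedding: number[] }
+
+function cosine(a: number[], b: number[]): number {
+  let dot = 0, na = 0, nb = 0;
+  for (let i = 0; i < a.length; i++) {
+    dot += a[i] * b[i];
+    na += a[i] * a[i];
+    nb += b[i] * b[i];
+  }
+  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
+}
+
+export function retrieveWarm(
+  promptEmbedding: number[],
+  tier: WarmEntry[],
+  k = 3,
+  threshold = 0.5,
+): string[] {
+  return tier
+    .map((e) => ({ e, score: cosine(promptEmbedding, e.embedding) }))
+    .filter((s) => s.score > threshold)
+    .sort((x, y) => y.score - x.score)
+    .slice(0, k)
+    .map((s) => s.e.text);
+}
+```
+
+A flat scan like this is fine for session-scale indexes (hundreds of turns); an HNSW index only pays off at much larger scale.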
+
+### Token Estimation Improvement
+
+Replace char/4 with adaptive estimation:
+
+| Approach | Accuracy | Cost |
+|----------|----------|------|
+| **char/4 (current)** | ~75% (overestimates) | Zero |
+| **Provider-reported usage** | 100% (for last turn) | Zero (already tracked) |
+| **tiktoken/provider tokenizer** | ~98% | ~5ms per message |
+| **Hybrid: actual for recent, char/4 for old** | ~95% | Negligible |
+
+The hybrid approach — use actual token counts from provider responses for recent messages, fall back to char/4 for older messages — is a quick win that requires no new dependencies.
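+
+A sketch of that hybrid estimator — the message shape and `actualTokens` field are illustrative, not GSD's real types:
+
+```typescript
+// Hybrid token estimation: trust provider-reported counts where present,
+// fall back to the char/4 heuristic otherwise.
+interface Msg {
+  text: string;
+  actualTokens?: number; // usage reported by the provider, when available
+}
+
+export function estimateTokens(messages: Msg[]): number {
+  return messages.reduce(
+    (sum, m) => sum + (m.actualTokens ?? Math.ceil(m.text.length / 4)),
+    0,
+  );
+}
+```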
+
+### Integration Points
+
+| GSD Component | Role in Integration |
+|---------------|-------------------|
+| `compaction.ts` | Replace cut-point algorithm with tiered approach |
+| `compaction-orchestrator.ts` | Add warm-tier retrieval before model call |
+| `agent-session.ts` message building | Inject retrieved warm-tier segments |
+| Session persistence layer | Store embeddings alongside session entries |
+
+### Open Questions
+
+- **Embedding model:** Local (fast, private) or API (better quality, adds latency)?
+- **Index format:** Simple cosine similarity on flat arrays vs. HNSW index?
+- **Retrieval budget:** How many tokens to allocate to warm-tier retrievals per turn?
+- **Coherence:** How to prevent retrieved historical context from confusing the model about the current state?
+
+---
+
+## 5. Cross-Session Learning Graph
+
+**Category:** Self-Improvement
+**Impact:** Transformative | **Effort:** High | **Priority:** #5
+
+### What It Is
+
+GSD's memory system (MEMORY.md + individual files) stores flat, file-based memories. A learning graph extends this into a structured knowledge base that captures relationships between codebases, files, errors, solutions, and patterns across all sessions.
+
+This is informed by research on [agent memory architectures](https://github.com/Shichun-Liu/Agent-Memory-Paper-List) and the emerging discipline of [context engineering](https://thenewstack.io/memory-for-ai-agents-a-new-paradigm-of-context-engineering/).
+
+### Current Memory vs Learning Graph
+
+| Aspect | Current (MEMORY.md) | Learning Graph |
+|--------|---------------------|----------------|
+| **Structure** | Flat file list | Nodes + edges (graph) |
+| **Relationships** | None | "file X often breaks when Y changes" |
+| **Retrieval** | All loaded into context | Query-driven, only relevant nodes |
+| **Learning** | Manual (user says "remember X") | Automatic from execution outcomes |
+| **Scope** | Per-project directory | Per-project with cross-project patterns |
+| **Staleness** | Manual cleanup | Confidence decay over time |
+
+### Graph Schema
+
+```
+┌──────────┐ touches ┌──────────┐
+│ Session │────────────────▶│ File │
+│ │ │ │
+│ • date │ │ • path │
+│ • outcome │ │ • type │
+│ • tokens │ │ • churn │
+└────┬──────┘ └─────┬─────┘
+ │ │
+ │ encountered │ involved_in
+ │ │
+ ▼ ▼
+┌──────────┐ resolved_by ┌──────────┐
+│ Error │────────────────▶│ Solution │
+│ │ │ │
+│ • type │ │ • pattern │
+│ • message │ │ • success │
+│ • freq │ │ rate │
+└──────────┘ └──────────┘
+ │ │
+ │ prevented_by │ uses
+ │ │
+ ▼ ▼
+┌──────────┐ ┌──────────┐
+│ Pattern │ │ Tool │
+│ │ │ │
+│ • type │ │ • name │
+│ • desc │ │ • avg │
+│ • conf │ │ time │
+└──────────┘ └──────────┘
+```
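+
+In TypeScript terms, the schema above is a small set of node and edge types plus traversals over them. A sketch — all names are hypothetical, and the query deliberately ignores edge direction:
+
+```typescript
+// Hypothetical learning-graph types mirroring the schema diagram,
+// plus one query expressed as a direction-agnostic neighbor scan.
+type NodeKind = "session" | "file" | "error" | "solution" | "pattern" | "tool";
+type EdgeKind =
+  | "touches" | "encountered" | "involved_in"
+  | "resolved_by" | "prevented_by" | "uses";
+
+interface GraphNode {
+  id: string;
+  kind: NodeKind;
+  props: Record<string, string | number>;
+}
+
+interface GraphEdge {
+  from: string; // node id
+  to: string;   // node id
+  kind: EdgeKind;
+  weight: number; // e.g. co-occurrence count or success rate
+}
+
+/** "What errors have occurred in this file?" — error nodes adjacent to it. */
+export function connectedErrors(
+  fileId: string,
+  nodes: GraphNode[],
+  edges: GraphEdge[],
+): GraphNode[] {
+  const neighborIds = new Set<string>();
+  for (const e of edges) {
+    if (e.from === fileId) neighborIds.add(e.to);
+    if (e.to === fileId) neighborIds.add(e.from);
+  }
+  return nodes.filter((n) => n.kind === "error" && neighborIds.has(n.id));
+}
+```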
+
+### Example Queries
+
+| Query | Result |
+|-------|--------|
+| "What errors have occurred in `auth.ts`?" | List of error nodes connected to that file node |
+| "What's the typical fix for `TypeError` in this codebase?" | Solution nodes with highest success rate for that error type |
+| "Which files tend to break together?" | File clusters with high co-occurrence in error sessions |
+| "What tools are slowest in this project?" | Tool nodes sorted by avg execution time |
+
+### Integration Points
+
+| GSD Component | Role in Integration |
+|---------------|-------------------|
+| `session-manager.ts` | Write graph nodes on session save |
+| `agent-session.ts` prompt building | Query graph for relevant context before model call |
+| Memory system (MEMORY.md) | Coexists — graph handles structured knowledge, memory handles preferences/feedback |
+| Extension hook: `agent_end` | Trigger graph update with session outcome |
+
+### Storage Options
+
+| Option | Pros | Cons |
+|--------|------|------|
+| **SQLite + json columns** | Simple, no dependencies, fast queries | No native vector search |
+| **SQLite + sqlite-vss** | Adds vector similarity to SQLite | Extra native dependency |
+| **Flat JSON files** | Zero dependencies, git-friendly | Slow for large graphs |
+| **LanceDB** | Embedded vector DB, no server | Additional dependency |
+
+### Open Questions
+
+- **Privacy:** Graph contains detailed codebase interaction history — should it be encrypted at rest?
+- **Portability:** Should the graph travel with the project (e.g. committed under `.gsd/`) or stay user-local?
+- **Garbage collection:** How to prune stale nodes (e.g., files that no longer exist)?
+
+---
+
+## 6. MCTS-Based Planning
+
+**Category:** Intelligence
+**Impact:** Transformative | **Effort:** Very High | **Priority:** #6
+
+### What It Is
+
+Inspired by [ToolTree](https://www.agentic-patterns.com/patterns/skill-library-evolution/) and Monte Carlo Tree Search, this technique replaces GSD's linear action selection with a tree-based planner that explores multiple solution paths simultaneously.
+
+Instead of the model deciding one action at a time and hoping it works, the system:
+
+1. Generates N candidate next-actions
+2. Scores each based on estimated probability of reaching the goal
+3. Explores promising branches in parallel
+4. Backtracks when a path fails, without wasting the user's context on dead ends
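+
+The loop those four steps describe can be sketched as a propose → score → explore → prune skeleton. Everything here is an assumption — GSD has no planner API today, and scoring and execution are stubbed:
+
+```typescript
+// Skeleton of the propose → score → explore → prune loop.
+interface Candidate { action: string; score: number }
+
+export function selectBranch(
+  propose: () => string[],            // generate N candidate actions
+  score: (action: string) => number,  // estimated P(reaching the goal)
+  execute: (action: string) => boolean, // true = branch succeeded
+  maxBranches = 3,
+): string | undefined {
+  const ranked: Candidate[] = propose()
+    .map((action) => ({ action, score: score(action) }))
+    .sort((a, b) => b.score - a.score)
+    .slice(0, maxBranches);           // budget: explore only the top-N
+  for (const c of ranked) {
+    if (execute(c.action)) return c.action; // first success wins
+    // failure → prune this branch, fall through to the next candidate
+  }
+  return undefined; // all branches failed → fall back to linear execution
+}
+```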
+
+### Current vs MCTS Approach
+
+**Current (linear):**
+```
+User: "fix the auth bug"
+ │
+ ▼
+Action 1: Read auth.ts ──▶ Action 2: Edit line 45 ──▶ Action 3: Run tests
+ │
+ Tests fail ✗
+ │
+ ▼
+ Action 4: Try different edit
+ │
+ Tests fail ✗
+ │
+ ▼
+ Action 5: Read error log...
+ (linear flailing)
+```
+
+**With MCTS (tree search):**
+```
+User: "fix the auth bug"
+ │
+ ▼
+Read auth.ts
+ │
+ ├── Branch A: Edit line 45 (score: 0.6)
+ │ └── Run tests → FAIL → prune
+ │
+ ├── Branch B: Check auth middleware (score: 0.7) ◀── highest score
+ │ └── Edit middleware.ts → Run tests → PASS ✓
+ │
+ └── Branch C: Check env config (score: 0.3)
+ └── (not explored — lower score)
+
+Result: Branch B succeeds after 2 actions, not 5+
+```
+
+### Why It Fits GSD-2
+
+GSD already has session branching primitives:
+- `fork()` creates a branch from any message
+- Branch summaries compress history at fork points
+- Tree navigation (`/tree`) lets users explore branches
+- Session tree is already a first-class concept
+
+The gap: these primitives are user-triggered. MCTS would make the agent trigger them automatically during problem-solving.
+
+### Architecture
+
+```
+┌─────────────────────────────────────────────────────────┐
+│ MCTS Planning Layer │
+│ │
+│ ┌─────────────┐ ┌──────────────┐ ┌────────────┐ │
+│ │ Proposer │───▶│ Scorer │───▶│ Selector │ │
+│ │ Generate N │ │ Estimate P │ │ Pick best │ │
+│ │ candidates │ │ of success │ │ to explore │ │
+│ └─────────────┘ └──────────────┘ └─────┬──────┘ │
+│ │ │
+│ ┌─────────────┐ ┌──────────────┐ │ │
+│ │ Pruner │◀───│ Executor │◀─────────┘ │
+│ │ Kill dead │ │ Run action │ │
+│ │ branches │ │ in worktree │ │
+│ └─────────────┘ └──────────────┘ │
+└─────────────────────────────────────────────────────────┘
+ │
+ ▼
+┌─────────────────────┐
+│ Agent Session │
+│ (receives winning │
+│ branch as result) │
+└─────────────────────┘
+```
+
+### Scoring Approaches
+
+| Approach | Speed | Quality | Cost |
+|----------|-------|---------|------|
+| **Heuristic** (file relevance, error proximity) | Fast | Low | Free |
+| **Fast model** (haiku-class rates candidates) | Medium | Medium | Low |
+| **Self-evaluation** (main model rates its own proposals) | Slow | High | High |
+| **Learned scorer** (trained on past outcomes from learning graph) | Fast | High | Free at inference |
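+
+The heuristic row is cheap enough to prototype first. A sketch scoring a candidate by how many of the files it would touch appear in the failing error output — the weights are arbitrary and purely illustrative:
+
+```typescript
+// Hypothetical heuristic scorer: candidates that touch files mentioned in
+// the most recent failure output score higher. Weights are arbitrary.
+export function heuristicScore(
+  candidateFiles: string[], // files the candidate action would touch
+  errorOutput: string,      // most recent failure text
+): number {
+  if (candidateFiles.length === 0) return 0.1; // no file signal at all
+  const mentioned = candidateFiles.filter((f) => errorOutput.includes(f));
+  // Base 0.3 plus up to 0.6 for error proximity.
+  return 0.3 + 0.6 * (mentioned.length / candidateFiles.length);
+}
+```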
+
+### Integration Points
+
+| GSD Component | Role in Integration |
+|---------------|-------------------|
+| `agent-loop.ts` | New planning phase between user prompt and action execution |
+| Session branching (`fork()`) | Used to create exploration branches |
+| Git worktrees | Each branch explored in an isolated worktree |
+| `agent-session.ts` | Receives the winning branch and presents it as the result |
+| Skill Library Evolution (#1) | Provides learned patterns to improve the scorer over time |
+
+### Cost-Benefit Analysis
+
+| Factor | Value |
+|--------|-------|
+| **LLM calls per turn** | 2-5x more (proposal generation + scoring) |
+| **Token usage** | 3-10x more per complex problem |
+| **Success rate on hard problems** | Estimated 30-50% improvement |
+| **Time to solution** | Fewer total turns despite more LLM calls per turn |
+| **User experience** | Agent appears to "think harder" on hard problems |
+
+### Open Questions
+
+- **When to activate:** MCTS is expensive. Should it only activate when the agent detects a hard problem (repeated failures, high uncertainty)?
+- **Branch isolation:** Git worktrees work for file changes, but how to isolate Bash side effects?
+- **Budget control:** How many branches to explore before falling back to linear execution?
+- **Transparency:** Should the user see the exploration tree or just the winning path?
+
+---
+
+## Priority Matrix
+
+| # | Technique | Impact | Effort | Compounding | Dependencies |
+|---|-----------|--------|--------|-------------|--------------|
+| 1 | **Skill Library Evolution** | Massive | Medium | Yes — improves all other techniques | None |
+| 2 | **DAG Tool Execution** | High | Medium | No — static speedup | None |
+| 3 | **Speculative Tool Execution** | High | Low-Med | Yes — improves with learning | Benefits from #1 |
+| 4 | **Semantic Context Compression** | High | High | No — static improvement | None |
+| 5 | **Cross-Session Learning Graph** | Transformative | High | Yes — feeds #1, #3, #6 | Benefits from #1 |
+| 6 | **MCTS Planning** | Transformative | Very High | Yes — improves with #1, #5 | Benefits from #1, #5 |
+
+### Recommended Implementation Order
+
+```
+Phase 1 (Foundation) Phase 2 (Performance) Phase 3 (Intelligence)
+───────────────────── ───────────────────── ─────────────────────
+┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
+│ Skill Library │ │ DAG Tool Exec │ │ Semantic Context│
+│ Evolution │──feeds──▶│ │ │ Compression │
+│ │ │ Speculative │ │ │
+│ │──feeds──▶│ Tool Exec │ │ MCTS Planning │
+└─────────────────┘ └─────────────────┘ └─────────────────┘
+ │ ▲
+┌─────────────────┐ │ │
+│ Cross-Session │───────────────────┴──────────────────────────┘
+│ Learning Graph │ (feeds intelligence layer)
+└─────────────────┘
+```
+
+**Phase 1** creates the feedback loop that makes everything else better over time.
+**Phase 2** delivers immediate, measurable performance wins.
+**Phase 3** requires the most architectural change but delivers the deepest capability gains.
+
+---
+
+## Sources & References
+
+### Papers
+
+- [SkillRL: Evolving Agents via Recursive Skill-Augmented RL](https://arxiv.org/abs/2602.08234) — ICLR 2026. Skill library evolution framework.
+- [LLMCompiler: An LLM Compiler for Parallel Function Calling](https://arxiv.org/pdf/2312.04511) — ICML 2024. DAG-based tool execution.
+- [Optimizing Agentic LLM Inference via Speculative Tool Calls](https://arxiv.org/pdf/2512.15834) — Speculative execution for agent tools.
+- [RISE: Recursive Introspection for Self-Improvement](https://proceedings.neurips.cc/paper_files/paper/2024/file/639d992f819c2b40387d4d5170b8ffd7-Paper-Conference.pdf) — NeurIPS 2024. Self-improving LLM agents.
+- [Don't Break the Cache: Prompt Caching for Agentic Tasks](https://arxiv.org/html/2601.06007v1) — Prompt caching evaluation.
+- [Efficient LLM Serving for Agentic Workflows](https://arxiv.org/html/2603.16104v1) — Systems perspective on agent serving.
+
+### Industry & Analysis
+
+- [Context Engineering for Agents](https://rlancemartin.github.io/2025/06/23/context_engineering/) — Lance Martin's comprehensive guide.
+- [AI Agent Context Compression Strategies](https://zylos.ai/research/2026-02-28-ai-agent-context-compression-strategies) — Zylos Research, Feb 2026.
+- [Context Engineering for Coding Agents](https://martinfowler.com/articles/exploring-gen-ai/context-engineering-coding-agents.html) — Martin Fowler.
+- [Memory for AI Agents: A New Paradigm](https://thenewstack.io/memory-for-ai-agents-a-new-paradigm-of-context-engineering/) — The New Stack.
+- [LLM Compiler Agent Pattern](https://agent-patterns.readthedocs.io/en/stable/patterns/llm-compiler.html) — Agent Patterns documentation.
+- [Skill Library Evolution Pattern](https://www.agentic-patterns.com/patterns/skill-library-evolution/) — Awesome Agentic Patterns.
+
+### Workshops & Events
+
+- [ICLR 2026 Workshop on AI with Recursive Self-Improvement](https://iclr.cc/virtual/2026/workshop/10000796)
+- [Agent Memory Paper List](https://github.com/Shichun-Liu/Agent-Memory-Paper-List) — Comprehensive survey.
+- [Awesome Context Engineering](https://github.com/Meirtz/Awesome-Context-Engineering) — Papers, frameworks, guides.
diff --git a/docs/PRD-branchless-worktree-architecture.md b/docs/PRD-branchless-worktree-architecture.md
new file mode 100644
index 000000000..4c511353c
--- /dev/null
+++ b/docs/PRD-branchless-worktree-architecture.md
@@ -0,0 +1,383 @@
+# PRD: Branchless Worktree Architecture
+
+**Author:** Lex Christopherson
+**Date:** 2026-03-15
+**ADR:** [ADR-001-branchless-worktree-architecture.md](./ADR-001-branchless-worktree-architecture.md)
+**Priority:** Critical — blocks reliable auto-mode operation
+
+---
+
+## Problem Statement
+
+GSD's auto-mode is unreliable. Users experience:
+
+1. **Infinite loop detection failures** — the agent writes planning artifacts on slice branches that become invisible after branch switching, causing `verifyExpectedArtifact()` to fail repeatedly. Auto-mode burns budget retrying the same unit 3-6 times before hard-stopping. This is the #1 user complaint.
+
+2. **State corruption across branches** — `.gsd/` planning artifacts (roadmaps, plans, decisions) are gitignored but branch-specific. Multiple branches sharing a single `.gsd/` directory clobber each other's state. Users see wrong milestones marked complete, wrong roadmaps loaded, and auto-mode starting from the wrong phase.
+
+3. **Excessive complexity** — 770+ lines of merge, conflict resolution, branch switching, and self-healing code exist solely to manage slice branches inside worktrees. This code has required 15+ bug fixes across versions and remains the primary source of auto-mode failures.
+
+These problems are architectural. They cannot be fixed by patching individual symptoms.
+
+## Vision
+
+Auto-mode uses git worktrees for isolation and sequential commits for history. No branch switching. No merge conflicts within a worktree. Planning artifacts are tracked in git and travel with the branch. The git layer is so simple it can't break.
+
+## Success Criteria
+
+| Criterion | Measurement |
+|-----------|-------------|
+| Zero loop detection failures from branch visibility | No `verifyExpectedArtifact()` failures caused by branch mismatch in 50 consecutive auto-mode runs |
+| Zero `.gsd/` state corruption | Manual worktrees created via `git worktree add` have correct `.gsd/` state without any GSD-specific initialization |
+| Code deletion | Net removal of ≥500 lines of merge/conflict/branch-switching code |
+| Test simplification | Removal or simplification of ≥6 merge-specific test files |
+| Backwards compatibility | Existing projects with `gsd/M001/S01` slice branches continue to work (read-only; new work uses new model) |
+| No new git primitives | The implementation uses only: worktrees, commits, squash-merge. No new branch types, merge strategies, or conflict resolution. |
+
+## Non-Goals
+
+- Parallel slice execution within a single worktree (if needed later, use separate worktrees)
+- Changing how milestones relate to `main` (squash-merge stays)
+- Modifying the dispatch unit types or state machine (except removing `fix-merge`)
+- Changing the `worktree-manager.ts` manual worktree API (`/worktree` command)
+
+## Current Architecture
+
+### Branch Model (M003, v2.13.0)
+
+```
+main
+ └─ milestone/M001 (worktree at .gsd/worktrees/M001/)
+ ├─ gsd/M001/S01 (slice branch — code + .gsd/ artifacts)
+ │ └── merge --no-ff → milestone/M001
+ ├─ gsd/M001/S02
+ │ └── merge --no-ff → milestone/M001
+ └── squash merge → main
+```
+
+### Data Flow
+
+```
+Agent writes file → on slice branch → handleAgentEnd → auto-commit on slice branch
+→ switch to milestone branch → verifyExpectedArtifact → FILE NOT FOUND (it's on slice branch)
+→ loop counter++ → retry → same result → HARD STOP
+```
+
+### Code Involved
+
+| File | Lines | Purpose |
+|------|-------|---------|
+| `auto-worktree.ts` | 512 | Worktree lifecycle + slice→milestone merge |
+| `git-service.ts` | 915 | Branch creation, switching, merge with conflict resolution |
+| `git-self-heal.ts` | 198 | Merge failure recovery |
+| `auto.ts` | ~150 | Merge dispatch guards, fix-merge routing, branch-mode vs worktree-mode branching |
+| `worktree.ts` | ~40 | Slice branch delegates |
+| 11 test files | ~2,000 | Merge/branch/worktree test coverage |
+
+### `.gsd/` Tracking (Current — Contradictory)
+
+- `.gitignore` line 52: `.gsd/` — ignores everything
+- `smartStage()` lines 338-349: force-adds `GSD_DURABLE_PATHS` — tracks milestones/, DECISIONS.md, PROJECT.md, REQUIREMENTS.md, QUEUE.md
+- Result: `.gsd/milestones/` is partially tracked on some branches, fully ignored on others. The code fights the config.
+
+## Proposed Architecture
+
+### Branch Model
+
+```
+main
+ └─ milestone/M001 (worktree at .gsd/worktrees/M001/)
+ │
+ commit: feat(M001): context + roadmap
+ commit: feat(M001/S01): research
+ commit: feat(M001/S01): plan
+ commit: feat(M001/S01/T01): implement auth service
+ commit: feat(M001/S01/T02): implement auth tests
+ commit: feat(M001/S01): summary + UAT
+ commit: docs(M001): reassess roadmap after S01
+ commit: feat(M001/S02): research
+ commit: feat(M001/S02): plan
+ commit: ...
+ commit: feat(M001): milestone complete
+ │
+ └── squash merge → main
+```
+
+One branch. Sequential commits. No merges within the worktree.
+
+### Data Flow
+
+```
+Agent writes file → on milestone branch → handleAgentEnd → auto-commit on milestone branch
+→ verifyExpectedArtifact → FILE FOUND (same branch) → persist completion → next dispatch
+```
+
+### `.gsd/` Tracking (Proposed — Coherent)
+
+**Tracked (travels with branch):**
+```
+.gsd/milestones/**/*.md (except CONTINUE markers)
+.gsd/milestones/**/*.json (META.json integration records)
+.gsd/PROJECT.md
+.gsd/DECISIONS.md
+.gsd/REQUIREMENTS.md
+.gsd/QUEUE.md
+```
+
+**Gitignored (ephemeral):**
+```
+.gsd/auto.lock
+.gsd/completed-units.json
+.gsd/STATE.md
+.gsd/metrics.json
+.gsd/gsd.db
+.gsd/activity/
+.gsd/runtime/
+.gsd/worktrees/
+.gsd/DISCUSSION-MANIFEST.json
+.gsd/milestones/**/*-CONTINUE.md
+.gsd/milestones/**/continue.md
+```
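+
+One way to express this split in an actual `.gitignore` is a blanket ignore followed by negations — a sketch that may need tuning against GSD's real layout:
+
+```
+# Ignore .gsd/ contents by default…
+.gsd/*
+# …then re-include the tracked planning artifacts.
+!.gsd/milestones/
+!.gsd/PROJECT.md
+!.gsd/DECISIONS.md
+!.gsd/REQUIREMENTS.md
+!.gsd/QUEUE.md
+# Ephemeral files inside the re-included tree stay ignored.
+.gsd/milestones/**/*-CONTINUE.md
+.gsd/milestones/**/continue.md
+```
+
+Note the negation targets the `.gsd/milestones/` directory itself: git cannot re-include a file whose parent directory is excluded, so the directory must be un-ignored before its contents are visible.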
+
+### Why This Works
+
+| Problem | How It's Solved |
+|---------|----------------|
+| Artifact invisibility after branch switch | No branch switching. Artifacts commit on the one branch. |
+| `.gsd/` state clobbering | Artifacts tracked in git. Each branch carries its own `.gsd/`. `git worktree add` and `git checkout` give correct state. |
+| Merge conflict complexity | No merges within a worktree. Only merge is milestone→main (squash). |
+| Manual worktree initialization | Tracked artifacts are checked out with the branch. No GSD-specific bootstrap needed. |
+| Dual isolation mode maintenance | Single mode: worktree. Branch-mode (`git.isolation: "branch"`) deprecated. |
+
+## Implementation Plan
+
+### Phase 1: `.gitignore` + Tracking Fix
+
+**Goal:** Planning artifacts are tracked in git. `.gitignore` reflects reality.
+
+1. Update `.gitignore`:
+ - Remove blanket `.gsd/` ignore
+ - Add explicit runtime-only ignores (see proposed list above)
+
+2. Force-add existing planning artifacts on current branch:
+ ```
+ git add --force .gsd/milestones/ .gsd/PROJECT.md .gsd/DECISIONS.md .gsd/REQUIREMENTS.md .gsd/QUEUE.md
+ ```
+
+3. Ensure runtime files are NOT tracked:
+ ```
+ git rm --cached -r .gsd/runtime/ .gsd/activity/ .gsd/STATE.md .gsd/metrics.json .gsd/completed-units.json .gsd/auto.lock
+ ```
+
+4. Update README suggested `.gitignore` section
+
+5. Remove `smartStage()`'s force-add of `GSD_DURABLE_PATHS` — no longer needed once `.gitignore` stops blocking those paths
+
+**Verification:** `git status` shows planning artifacts tracked, runtime files untracked. `git worktree add` on a new worktree has correct `.gsd/milestones/` state.
+
+### Phase 2: Remove Slice Branch Creation + Switching
+
+**Goal:** No code creates, switches to, or references slice branches for new work.
+
+1. Remove `ensureSliceBranch()` from `git-service.ts` (lines 485-544)
+2. Remove `switchToMain()` from `git-service.ts` (lines 549-563)
+3. Remove `getSliceBranchName()` from `worktree.ts` (lines 94-98)
+4. Remove `isOnSliceBranch()` and `getActiveSliceBranch()` from `worktree.ts`
+5. Update `auto.ts` dispatch paths — remove branch creation before `execute-task`
+6. Update `handleAgentEnd` — remove branch-switching logic post-dispatch
+
+**Verification:** Auto-mode runs a full slice (research → plan → execute → complete) without creating any branches. All commits land on `milestone/`.
+
+### Phase 3: Remove Slice Merge Code
+
+**Goal:** All slice→milestone and slice→main merge code is deleted.
+
+1. Remove `mergeSliceToMilestone()` from `auto-worktree.ts` (lines 253-350)
+2. Remove `mergeSliceToMain()` from `git-service.ts` (lines 705-893)
+3. Remove merge dispatch guards from `auto.ts` (lines 1635-1679)
+4. Remove `fix-merge` dispatch unit type from `auto.ts`
+5. Remove `buildPromptForFixMerge()` from `auto.ts`
+6. Remove `withMergeHeal()` from `git-self-heal.ts` (lines 99-136)
+7. Remove `abortAndReset()` from `git-self-heal.ts` (lines 37-84) — or simplify to crash-recovery-only
+8. Remove `shouldUseWorktreeIsolation()` preference resolution — worktree is the only mode
+9. Remove `getMergeToMainMode()` — milestone merge is the only mode
+10. Deprecate `git.isolation: "branch"` and `git.merge_to_main: "slice"` preferences
+
+**Verification:** `git grep mergeSliceToMilestone` returns zero results. `git grep mergeSliceToMain` returns zero results. `git grep fix-merge` returns zero results (outside of changelog/docs).
+
+### Phase 4: Simplify `mergeMilestoneToMain()`
+
+**Goal:** Milestone→main merge is clean and minimal.
+
+The function becomes:
+1. Auto-commit any dirty state in worktree
+2. `process.chdir(originalBasePath)` — back to main repo
+3. `git checkout main`
+4. `git merge --squash milestone/`
+5. Build commit message with milestone summary + slice manifest
+6. `git commit`
+7. Optional: `git push`
+8. `removeWorktree()` + `git branch -D milestone/`
+
+No conflict categorization. No runtime file stripping (runtime files are gitignored, not in the merge). No `.gsd/` special handling.
+
+If the squash-merge conflicts (parallel-milestone edge case): stop auto-mode with a clear error; the user resolves it manually, or GSD dispatches a one-time resolution session.
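+
+Step 5 is the only non-trivial logic left in the function. A sketch of the message builder — the `SliceRecord` shape and function name are hypothetical:
+
+```typescript
+// Hypothetical builder for the milestone→main squash commit message:
+// a conventional-commit summary line plus a manifest of completed slices.
+interface SliceRecord { id: string; title: string }
+
+export function buildSquashMessage(
+  milestoneId: string,
+  summary: string,
+  slices: SliceRecord[],
+): string {
+  const manifest = slices.map((s) => `- ${s.id}: ${s.title}`).join("\n");
+  return `feat(${milestoneId}): ${summary}\n\nSlices:\n${manifest}`;
+}
+```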
+
+**Verification:** Complete a full milestone in auto-mode. `main` receives one squash commit with all code and planning artifacts.
+
+### Phase 5: Test Cleanup
+
+**Goal:** Test suite reflects the simplified architecture.
+
+1. Delete or rewrite:
+ - `auto-worktree-merge.test.ts` — tests slice→milestone merge (deleted)
+ - `auto-worktree-milestone-merge.test.ts` — rewrite for simplified milestone→main
+ - `worktree-e2e.test.ts` — rewrite for branchless flow
+ - `worktree-integration.test.ts` — rewrite for branchless flow
+ - Merge-related test cases in `git-service.test.ts`
+
+2. Add new tests:
+ - Branchless worktree lifecycle: create → commit → commit → squash-merge → cleanup
+ - `.gsd/` tracking: planning artifacts tracked, runtime files ignored
+ - Manual worktree: `git worktree add` has correct `.gsd/` state
+ - Crash recovery: dirty state on milestone branch, restart, auto-commit, continue
+
+3. Remove merge-specific doctor checks or simplify:
+ - `corrupt_merge_state` — keep (still relevant for milestone→main)
+ - `orphaned_auto_worktree` — keep
+ - `stale_milestone_branch` — keep
+ - `tracked_runtime_files` — keep
+
+**Verification:** `npm run test` passes. No test references `mergeSliceToMilestone`, `mergeSliceToMain`, or `ensureSliceBranch`.
+
+### Phase 6: Migration + Backwards Compatibility
+
+**Goal:** Existing projects with slice branches continue to work.
+
+1. State derivation (`deriveState()`) continues to read `gsd/M001/S01` branch naming for legacy detection
+2. On first run after upgrade:
+ - Detect existing slice branches
+ - Notify user: "GSD no longer creates slice branches. Existing branches are preserved but new work commits directly to the milestone branch."
+ - No forced migration — legacy branches are read-only context
+3. Doctor check: `legacy_slice_branches` — informational, not auto-fix
+4. Update `shouldUseWorktreeIsolation()` preference handling:
+ - `git.isolation: "worktree"` → default behavior (only option)
+ - `git.isolation: "branch"` → warning, treated as worktree
+ - Remove preference UI for isolation mode
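+
+Legacy detection in step 1 is a branch-name match. A sketch — the regex assumes the `gsd/M001/S01` naming convention shown in the current branch model:
+
+```typescript
+// Hypothetical detector for legacy slice branches (gsd/<milestone>/<slice>).
+const LEGACY_SLICE = /^gsd\/(M\d{3})\/(S\d{2})$/;
+
+export function legacySliceBranches(
+  branches: string[],
+): { branch: string; milestone: string; slice: string }[] {
+  return branches.flatMap((branch) => {
+    const m = LEGACY_SLICE.exec(branch);
+    return m ? [{ branch, milestone: m[1], slice: m[2] }] : [];
+  });
+}
+```
+
+Feeding this `git branch --format='%(refname:short)'` output is enough to drive both the upgrade notification and the `legacy_slice_branches` doctor check.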
+
+**Verification:** Open a project with existing `gsd/M001/S01` branches. GSD reads state correctly, new work commits on milestone branch without slice branches.
+
+## Stress Test Results
+
+Validated by three independent models:
+
+### Gemini 2.5 Pro — 6 Attack Vectors
+
+| Attack | Severity | Mitigation |
+|--------|----------|------------|
+| Parallel milestone code conflict at squash-merge | Medium | `git rebase main` before squash. Rare in single-user. |
+| SQLite desync after `git reset --hard` | Low | DB rebuilt from tracked markdown on startup (M001/S02 importers). |
+| Ghost lock after SIGKILL | Low | Existing heartbeat lock detection handles this. |
+| Squash merge loses bisect granularity | Low | Commit messages tag slices. Branch preservable if needed. |
+| Disk space with multiple worktrees | Low | Single active milestone at a time. Immediate cleanup. |
+| Plan-action atomicity gap (crash between write and commit) | Low | `handleAgentEnd` auto-commits. Sequential model simplifies recovery. |
+
+### GPT-5.4 (Codex) — Codebase-Informed Analysis
+
+- Confirmed `smartStage()` force-add already implements tracked-artifact intent
+- Confirmed `resolveMainWorktreeRoot` (PR #487) contradicts this architecture
+- Confirmed `.gsd/milestones/` partially tracked on `main` despite `.gitignore`
+- Verdict: **Model is sound. Removes only accidental complexity.**
+
+### GPT-5.4 (Codex) — Dissenting Opinion
+
+Codex agreed on tracked artifacts and worktree-per-milestone, but pushed back on removing slice branches, calling it "a redesign, not a simplification." Specific concerns:
+
+| Concern | Rebuttal |
+|---------|----------|
+| Crash recovery for orphaned slice branches disappears | The failure mode (orphaned branch needing merge) is caused by slice branches. Removing branches removes the failure. Sequential commits on one branch need no orphan recovery. |
+| Concurrent edits to shared root docs (DECISIONS.md) from two terminals | Standard content conflict at squash-merge time. Not caused by or solved by slice branches. |
+| Continuous integration via slice→milestone merges | In sequential single-user work, there's nothing to integrate against within the worktree. Pre-flight rebase before squash-merge is more direct. |
+| Need a replacement slice-boundary primitive | Accepted: conventional commit tags (`feat(M001/S01):`) + optional git tags (`gsd/M001/S01-complete`) serve as boundaries. |
+
+Codex's analysis confirms the tracked-artifact approach but recommends treating branchless as a deliberate redesign with explicit replacement primitives, not a casual deletion.
+
+### Edge Case: Two Milestones Touching Same Source Files
+
+Scenario: M001 and M002 both modify `src/auth.ts`. M001 squash-merges first.
+
+Resolution: Before M002 squash-merges, rebase onto updated `main`:
+```
+cd .gsd/worktrees/M002
+git fetch origin main
+git rebase main
+# Resolve any conflicts (code-only, never .gsd/)
+# Then squash-merge
+```
+
+This is a standard git workflow. GSD can automate the rebase step as a pre-merge check.
+
+### Edge Case: Agent Crash Mid-Commit
+
+Scenario: Power loss during `git commit` on the milestone branch.
+
+Resolution: Git's internal journaling protects the object store. On restart:
+- If commit completed: state is consistent
+- If the commit didn't complete: the working directory has uncommitted changes, and `handleAgentEnd` auto-commits them on the next dispatch
+- No branch to be "stuck between" — single branch means no split-brain state
+
+### Edge Case: User Edits Main While Worktree Active
+
+Scenario: User makes manual commits on `main` while M001 worktree is active.
+
+Resolution: Worktree is on `milestone/M001` branch, independent of `main`. Manual `main` commits don't affect the worktree. At squash-merge time, `git merge --squash` handles the divergence normally. If there's a conflict, it's resolved once.
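
A throwaway-repo sketch of this scenario (the real flow uses a worktree, but a plain branch checkout shows the same divergence-then-squash behavior; the `git config` lines just make commits work in a clean environment):

```shell
# Simulate: user commits on main while milestone/M001 work proceeds,
# then squash-merge the milestone back as one commit.
set -euo pipefail
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.email ci@example.com
git config user.name ci
git commit -q --allow-empty -m "init"
git switch -q -c milestone/M001
echo work > feature.txt
git add feature.txt
git commit -q -m "feat(M001/S01): add feature"
git switch -q main
git commit -q --allow-empty -m "manual commit on main"  # user edit during milestone
git merge --squash milestone/M001 >/dev/null
git commit -q -m "M001: complete milestone"
test -f feature.txt  # milestone work arrives despite the divergence
```

The squash-merge succeeds without conflict because the manual commit and the milestone touched disjoint paths; when they overlap, the conflict is resolved once at this point.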
+
+## Metrics
+
+### Before (Current)
+
+| Metric | Value |
+|--------|-------|
+| Merge/conflict/branch code | 770+ lines across 4 files |
+| Merge-related test files | 11 files |
+| Branch types | 4 (main, milestone/*, gsd/*/*, worktree/*) |
+| Merge strategies | 3 (--no-ff, --squash, conflict resolution) |
+| Dispatch unit types with merge logic | 2 (complete-slice, fix-merge) |
+| Isolation modes | 2 (branch, worktree) |
+| Doctor git checks | 4 |
+
+### After (Proposed)
+
+| Metric | Value |
+|--------|-------|
+| Merge/conflict/branch code | ~50 lines (simplified `mergeMilestoneToMain` only) |
+| Merge-related test files | 3-4 files (rewritten) |
+| Branch types | 2 (main, milestone/*) |
+| Merge strategies | 1 (--squash) |
+| Dispatch unit types with merge logic | 0 |
+| Isolation modes | 1 (worktree) |
+| Doctor git checks | 3-4 (simplified) |
+
+### Net Impact
+
+- **~720 lines deleted** (net, after simplified replacements)
+- **~7 test files deleted or consolidated**
+- **2 branch types eliminated**
+- **2 merge strategies eliminated**
+- **1 dispatch unit type eliminated** (fix-merge)
+- **1 isolation mode eliminated** (branch)
+- **0 merge conflicts possible within a worktree**
+
+## Dependencies
+
+- **M001 (Memory Database):** The SQLite database (`gsd.db`) must remain gitignored. The M001/S02 importer layer rebuilds it from tracked markdown. This PRD's `.gitignore` update explicitly ignores `gsd.db`.
+
+- **PR #487:** Must be closed. The `resolveMainWorktreeRoot` approach (sharing `.gsd/` across worktrees) contradicts tracked-artifact architecture.
+
+## Open Questions
+
+1. **Squash vs `--no-ff` for milestone→main merge?** Squash gives clean history on `main` but loses bisect granularity. `--no-ff` preserves granular commits but clutters `main`. Current proposal: squash (matching existing behavior), with option to preserve milestone branch for debugging.
+
+2. **Should `worktrees/` move outside `.gsd/`?** Having worktrees inside `.gsd/` creates a nesting-doll pattern (worktree contains `.gsd/` which is inside `.gsd/worktrees/`). Relocating to `.gsd-worktrees/` or `~/.gsd/worktrees//` is cleaner but changes the filesystem layout. Recommendation: defer, address separately if it causes issues.
+
+3. **Pre-flight rebase automation?** Before milestone→main squash-merge, should GSD automatically `git rebase main`? Gemini recommends yes. Risk: rebase can fail with conflicts, adding a code path. Recommendation: implement as a doctor check ("milestone branch is behind main by N commits") with manual resolution, automate later if needed.
diff --git a/docs/README.md b/docs/README.md
new file mode 100644
index 000000000..c37b303c0
--- /dev/null
+++ b/docs/README.md
@@ -0,0 +1,53 @@
+# GSD Documentation
+
+Welcome to the GSD documentation. This covers everything from getting started to advanced configuration, auto-mode internals, and extending GSD with the Pi SDK.
+
+## User Documentation
+
+| Guide | Description |
+|-------|-------------|
+| [Getting Started](./getting-started.md) | Installation, first run, and basic usage |
+| [Auto Mode](./auto-mode.md) | How autonomous execution works — the state machine, crash recovery, and steering |
+| [Commands Reference](./commands.md) | All commands, keyboard shortcuts, and CLI flags |
+| [Remote Questions](./remote-questions.md) | Discord and Slack integration for headless auto-mode |
+| [Configuration](./configuration.md) | Preferences, model selection, git settings, and token profiles |
+| [Custom Models](./custom-models.md) | Add custom providers (Ollama, vLLM, LM Studio, proxies) via models.json |
+| [Token Optimization](./token-optimization.md) | Token profiles, context compression, complexity routing, and adaptive learning (v2.17) |
+| [Dynamic Model Routing](./dynamic-model-routing.md) | Complexity-based model selection, cost tables, escalation, and budget pressure (v2.19) |
+| [Captures & Triage](./captures-triage.md) | Fire-and-forget thought capture during auto-mode with automated triage (v2.19) |
+| [Workflow Visualizer](./visualizer.md) | Interactive TUI overlay for progress, dependencies, metrics, and timeline (v2.19) |
+| [Cost Management](./cost-management.md) | Budget ceilings, cost tracking, projections, and enforcement modes |
+| [Git Strategy](./git-strategy.md) | Worktree isolation, branching model, and merge behavior |
+| [Parallel Orchestration](./parallel-orchestration.md) | Run multiple milestones simultaneously with worker isolation and coordination |
+| [Working in Teams](./working-in-teams.md) | Unique milestone IDs, `.gitignore` setup, and shared planning artifacts |
+| [Skills](./skills.md) | Bundled skills, skill discovery, and custom skill authoring |
+| [Migration from v1](./migration.md) | Migrating `.planning` directories from the original GSD |
+| [Troubleshooting](./troubleshooting.md) | Common issues, `/gsd doctor` (real-time visibility v2.40), `/gsd forensics` (full debugger v2.40), and recovery procedures |
+| [Web Interface](./web-interface.md) | Browser-based project management with `pi --web` (v2.41) |
+| [VS Code Extension](../vscode-extension/README.md) | Chat participant, sidebar dashboard, and RPC integration for VS Code |
+
+## Architecture & Internals
+
+| Guide | Description |
+|-------|-------------|
+| [Architecture Overview](./architecture.md) | System design, extension model, state-on-disk, and dispatch pipeline |
+| [Native Engine](../native/README.md) | Rust N-API modules for performance-critical operations |
+| [ADR-001: Branchless Worktree Architecture](./ADR-001-branchless-worktree-architecture.md) | Decision record for the v2.14 git architecture |
+| [ADR-003: Pipeline Simplification](./ADR-003-pipeline-simplification.md) | Research merged into planning, mechanical completion (v2.30) |
+
+## Pi SDK Documentation
+
+These guides cover the underlying Pi SDK that GSD is built on. Useful if you want to extend GSD or build your own agent application.
+
+| Guide | Description |
+|-------|-------------|
+| [What is Pi](./what-is-pi/README.md) | Core concepts — modes, agent loop, sessions, tools, providers |
+| [Extending Pi](./extending-pi/README.md) | Building extensions — tools, commands, UI, events, state |
+| [Context & Hooks](./context-and-hooks/README.md) | Context pipeline, hook reference, inter-extension communication |
+| [Pi UI / TUI](./pi-ui-tui/README.md) | Terminal UI components, theming, keyboard input, rendering |
+
+## Research
+
+| Guide | Description |
+|-------|-------------|
+| [Building Coding Agents](./building-coding-agents/README.md) | Research notes on agent design — decomposition, context engineering, cost/quality tradeoffs |
diff --git a/docs/agent-knowledge-index.md b/docs/agent-knowledge-index.md
new file mode 100644
index 000000000..6d9cb6c77
--- /dev/null
+++ b/docs/agent-knowledge-index.md
@@ -0,0 +1,222 @@
+# Agent Knowledge Index
+
+Use this file as a machine-operational routing table for pi docs and research references.
+
+Rules:
+
+- Read only the specific files relevant to the current task.
+- Prefer the primary bundle first.
+- Read files in parallel when the task clearly maps to multiple known references.
+- Use absolute paths directly with `read`.
+- Follow conditional references only when the primary bundle does not answer the question.
+
+## Pi architecture
+
+Use when:
+
+- understanding how pi works end to end
+- tracing subsystem relationships
+- understanding sessions, compaction, models, tools, or prompt flow
+- deciding how to embed pi in a branded app, custom CLI, desktop app, or web product
+
+Read first:
+
+- `/Users/lexchristopherson/.gsd/docs/what-is-pi/01-what-pi-is.md`
+- `/Users/lexchristopherson/.gsd/docs/what-is-pi/04-the-architecture-how-everything-fits-together.md`
+- `/Users/lexchristopherson/.gsd/docs/what-is-pi/05-the-agent-loop-how-pi-thinks.md`
+
+Read together when relevant:
+
+- `/Users/lexchristopherson/.gsd/docs/what-is-pi/06-tools-how-pi-acts-on-the-world.md`
+- `/Users/lexchristopherson/.gsd/docs/what-is-pi/07-sessions-memory-that-branches.md`
+- `/Users/lexchristopherson/.gsd/docs/what-is-pi/08-compaction-how-pi-manages-context-limits.md`
+- `/Users/lexchristopherson/.gsd/docs/what-is-pi/09-the-customization-stack.md`
+- `/Users/lexchristopherson/.gsd/docs/what-is-pi/10-providers-models-multi-model-by-default.md`
+- `/Users/lexchristopherson/.gsd/docs/what-is-pi/13-context-files-project-instructions.md`
+
+Follow-up if needed:
+
+- `/Users/lexchristopherson/.gsd/docs/what-is-pi/03-the-four-modes-of-operation.md`
+- `/Users/lexchristopherson/.gsd/docs/what-is-pi/11-the-interactive-tui.md`
+- `/Users/lexchristopherson/.gsd/docs/what-is-pi/12-the-message-queue-talking-while-pi-thinks.md`
+- `/Users/lexchristopherson/.gsd/docs/what-is-pi/14-the-sdk-rpc-embedding-pi.md`
+- `/Users/lexchristopherson/.gsd/docs/what-is-pi/15-pi-packages-the-ecosystem.md`
+- `/Users/lexchristopherson/.gsd/docs/what-is-pi/16-why-pi-matters-what-makes-it-different.md`
+- `/Users/lexchristopherson/.gsd/docs/what-is-pi/17-file-reference-all-documentation.md`
+- `/Users/lexchristopherson/.gsd/docs/what-is-pi/18-quick-reference-commands-shortcuts.md`
+- `/Users/lexchristopherson/.gsd/docs/what-is-pi/19-building-branded-apps-on-top-of-pi.md`
+
+## Context engineering, hooks, and context flow
+
+Use when:
+
+- understanding how user prompts flow through to the LLM
+- working with before_agent_start, context, tool_call, tool_result, input hooks
+- injecting, filtering, or transforming LLM context
+- understanding message types and what the LLM actually sees
+- coordinating multiple extensions
+- building mode systems, presets, or context management extensions
+- debugging why the LLM does or doesn't see certain information
+
+Read first:
+
+- `/Users/lexchristopherson/.gsd/docs/context-and-hooks/01-the-context-pipeline.md`
+- `/Users/lexchristopherson/.gsd/docs/context-and-hooks/02-hook-reference.md`
+
+Read together when relevant:
+
+- `/Users/lexchristopherson/.gsd/docs/context-and-hooks/03-context-injection-patterns.md`
+- `/Users/lexchristopherson/.gsd/docs/context-and-hooks/04-message-types-and-llm-visibility.md`
+- `/Users/lexchristopherson/.gsd/docs/context-and-hooks/05-inter-extension-communication.md`
+- `/Users/lexchristopherson/.gsd/docs/context-and-hooks/06-advanced-patterns-from-source.md`
+- `/Users/lexchristopherson/.gsd/docs/context-and-hooks/07-the-system-prompt-anatomy.md`
+
+## Extension development
+
+Use when:
+
+- building or modifying extensions
+- adding tools, commands, hooks, renderers, state, or packaging
+
+Read first:
+
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/01-what-are-extensions.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/02-architecture-mental-model.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/03-getting-started.md`
+
+Read together when relevant:
+
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/06-the-extension-lifecycle.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/07-events-the-nervous-system.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/08-extensioncontext-what-you-can-access.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/09-extensionapi-what-you-can-do.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/10-custom-tools-giving-the-llm-new-abilities.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/11-custom-commands-user-facing-actions.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/14-custom-rendering-controlling-what-the-user-sees.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/25-slash-command-subcommand-patterns.md` (subcommand-style slash command UX via `getArgumentCompletions()`)
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/15-system-prompt-modification.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/22-key-rules-gotchas.md`
+
+Follow-up if needed:
+
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/04-extension-locations-discovery.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/05-extension-structure-styles.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/12-custom-ui-visual-components.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/13-state-management-persistence.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/16-compaction-session-control.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/17-model-provider-management.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/18-remote-execution-tool-overrides.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/19-packaging-distribution.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/20-mode-behavior.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/21-error-handling.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/23-file-reference-documentation.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/24-file-reference-example-extensions.md`
+
+## Pi UI and TUI
+
+Use when:
+
+- building dialogs, widgets, overlays, custom editors, or UI renderers
+- working on TUI layout or display behavior
+
+Read first:
+
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/01-the-ui-architecture.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/03-entry-points-how-ui-gets-on-screen.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/22-quick-reference-all-ui-apis.md`
+
+Read together when relevant:
+
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/04-built-in-dialog-methods.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/05-persistent-ui-elements.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/06-ctx-ui-custom-full-custom-components.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/07-built-in-components-the-building-blocks.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/12-overlays-floating-modals-and-panels.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/13-custom-editors-replacing-the-input.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/14-tool-rendering-custom-tool-display.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/15-message-rendering-custom-message-display.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/21-common-mistakes-and-how-to-avoid-them.md`
+
+Follow-up if needed:
+
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/02-the-component-interface-foundation-of-everything.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/08-high-level-components-from-pi-coding-agent.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/09-keyboard-input-how-to-handle-keys.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/10-line-width-the-cardinal-rule.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/11-theming-colors-and-styles.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/16-performance-caching-and-invalidation.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/17-theme-changes-and-invalidation.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/18-ime-support-the-focusable-interface.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/19-building-a-complete-component-step-by-step.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/20-real-world-patterns-from-examples.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/23-file-reference-example-extensions-with-ui.md`
+
+## Building coding agents
+
+Use when:
+
+- designing agent behavior
+- improving autonomy, speed, context handling, or decomposition
+- solving hard ambiguity, safety, or verification problems
+
+Read first:
+
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/01-work-decomposition.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/06-maximizing-agent-autonomy-superpowers.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/11-god-tier-context-engineering.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/12-handling-ambiguity-contradiction.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/26-cross-cutting-themes-where-all-4-models-converge.md`
+
+Read together when relevant:
+
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/03-state-machine-context-management.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/04-optimal-storage-for-project-context.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/05-parallelization-strategy.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/07-system-prompt-llm-vs-deterministic-split.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/08-speed-optimization.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/10-top-10-pitfalls-to-avoid.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/17-irreversible-operations-safety-architecture.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/20-error-taxonomy-routing.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/24-security-trust-boundaries.md`
+
+Follow-up if needed:
+
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/02-what-to-keep-discard-from-human-engineering.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/09-top-10-tips-for-a-world-class-agent.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/13-long-running-memory-fidelity.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/14-multi-agent-semantic-conflict-resolution.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/15-legacy-code-brownfield-onboarding.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/16-encoding-taste-aesthetics.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/18-the-handoff-problem-agent-human-maintainability.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/19-when-to-scrap-and-start-over.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/21-cost-quality-tradeoff-model-routing.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/22-cross-project-learning-reusable-intelligence.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/23-evolution-across-project-scale.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/25-designing-for-non-technical-users-vibe-coders.md`
+
+## Pi product docs
+
+Use when:
+
+- the user asks about pi itself, its SDK, extensions, themes, skills, packages, TUI, prompt templates, keybindings, or custom providers
+
+Read first:
+
+- `/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/README.md`
+
+Read together when relevant:
+
+- `/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/docs/extensions.md`
+- `/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/docs/themes.md`
+- `/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/docs/skills.md`
+- `/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/docs/prompt-templates.md`
+- `/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/docs/tui.md`
+- `/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/docs/keybindings.md`
+- `/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/docs/sdk.md`
+- `/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/docs/custom-provider.md`
+- `/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/docs/models.md`
+- `/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/docs/packages.md`
+
+Follow-up if needed:
+
+- `/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/examples`
diff --git a/docs/architecture.md b/docs/architecture.md
new file mode 100644
index 000000000..a166c148b
--- /dev/null
+++ b/docs/architecture.md
@@ -0,0 +1,162 @@
+# Architecture Overview
+
+GSD is a TypeScript application built on the [Pi SDK](https://github.com/badlogic/pi-mono). It embeds the Pi coding agent and extends it with the GSD workflow engine, auto mode state machine, and project management primitives.
+
+## System Structure
+
+```
+gsd (CLI binary)
+ └─ loader.ts Sets PI_PACKAGE_DIR, GSD env vars, dynamic-imports cli.ts
+ └─ cli.ts Wires SDK managers, loads extensions, starts InteractiveMode
+ ├─ onboarding.ts First-run setup wizard (LLM provider + tool keys)
+ ├─ wizard.ts Env hydration from stored auth.json credentials
+ ├─ app-paths.ts ~/.gsd/agent/, ~/.gsd/sessions/, auth.json
+ ├─ resource-loader.ts Syncs bundled extensions + agents to ~/.gsd/agent/
+ └─ src/resources/
+ ├─ extensions/gsd/ Core GSD extension
+ ├─ extensions/... 12 supporting extensions
+ ├─ agents/ scout, researcher, worker
+ ├─ AGENTS.md Agent routing instructions
+ └─ GSD-WORKFLOW.md Manual bootstrap protocol
+
+gsd headless Headless mode — CI/cron orchestration via RPC child process
+gsd --mode mcp MCP server mode — exposes tools over stdin/stdout
+
+vscode-extension/ VS Code extension — chat participant (@gsd), sidebar dashboard, RPC integration
+```
+
+## Key Design Decisions
+
+### State Lives on Disk
+
+`.gsd/` is the sole source of truth. Auto mode reads it, writes it, and advances based on what it finds. No in-memory state survives across sessions. This enables crash recovery, multi-terminal steering, and session resumption.
+
+### Two-File Loader Pattern
+
+`loader.ts` sets all environment variables with zero SDK imports, then dynamically imports `cli.ts`, which performs static SDK imports. This ensures `PI_PACKAGE_DIR` is set before any SDK code evaluates.
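
A minimal sketch of the pattern (names and paths are illustrative; `node:path` stands in for the real `cli.ts`):

```typescript
// loader.ts equivalent: assign env vars at top level, with zero SDK imports.
process.env.PI_PACKAGE_DIR = "/opt/gsd/pkg"; // illustrative path

async function boot(): Promise<string> {
  // Dynamic import defers module evaluation until after the assignment above.
  // A static `import ... from "./cli.js"` would be hoisted and evaluated
  // before PI_PACKAGE_DIR exists.
  const cli = await import("node:path"); // stand-in for "./cli.js"
  return cli.basename(process.env.PI_PACKAGE_DIR ?? "");
}
```

The ordering guarantee comes entirely from the dynamic import: hoisted static imports run before any statement in the importing module.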
+
+### `pkg/` Shim Directory
+
+`PI_PACKAGE_DIR` points to `pkg/` (not project root) to avoid Pi's theme resolution colliding with GSD's `src/` directory. Contains only `piConfig` and theme assets.
+
+### Always-Overwrite Sync
+
+Bundled extensions and agents are synced to `~/.gsd/agent/` on every launch, not just on the first run. This means `npm update -g` takes effect immediately.
+
+### Lazy Provider Loading
+
+LLM provider SDKs (Anthropic, OpenAI, Google, etc.) are lazy-loaded on first use rather than imported at startup. This significantly reduces cold-start time — only the provider you actually connect to gets loaded.
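
A sketch of the mechanism, assuming a registry keyed by provider id (`node:os` stands in for a real provider SDK module):

```typescript
type Provider = { id: string };

const providerCache = new Map<string, Promise<Provider>>();

// The first call for a given id triggers the (expensive) SDK import; later
// calls reuse the cached promise, so startup pays for zero providers.
function getProvider(id: string): Promise<Provider> {
  let loading = providerCache.get(id);
  if (!loading) {
    loading = import("node:os").then(() => ({ id })); // stand-in for the SDK module
    providerCache.set(id, loading);
  }
  return loading;
}
```

Caching the promise rather than the resolved module also deduplicates concurrent first requests for the same provider.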
+
+### Fresh Session Per Unit
+
+Every dispatch creates a new agent session. The LLM starts with a clean context window containing only the pre-inlined artifacts it needs. This prevents quality degradation from context accumulation.
+
+## Bundled Extensions
+
+| Extension | What It Provides |
+|-----------|-----------------|
+| **GSD** | Core workflow engine — auto mode, state machine, commands, dashboard |
+| **Browser Tools** | Playwright-based browser automation — navigation, forms, screenshots, PDF export, device emulation, visual regression, structured data extraction, route mocking, accessibility tree inspection, and semantic actions |
+| **Search the Web** | Brave Search, Tavily, or Jina page extraction |
+| **Google Search** | Gemini-powered web search with AI-synthesized answers |
+| **Context7** | Up-to-date library/framework documentation |
+| **Background Shell** | Long-running process management with readiness detection |
+| **Subagent** | Delegated tasks with isolated context windows |
+| **Mac Tools** | macOS native app automation via Accessibility APIs |
+| **MCP Client** | Native MCP server integration via @modelcontextprotocol/sdk |
+| **Voice** | Real-time speech-to-text (macOS, Linux) |
+| **Slash Commands** | Custom command creation |
+| **LSP** | Language Server Protocol — diagnostics, definitions, references, hover, rename |
+| **Ask User Questions** | Structured user input with single/multi-select |
+| **Secure Env Collect** | Masked secret collection |
+| **Async Jobs** | Background command execution with `async_bash`, `await_job`, `cancel_job` |
+| **Remote Questions** | Discord, Slack, and Telegram integration for headless question routing |
+| **TTSR** | Tool-triggered system rules — conditional context injection based on tool usage |
+| **Universal Config** | Discovery of existing AI tool configurations (Claude Code, Cursor, Windsurf, etc.) |
+
+## Bundled Agents
+
+| Agent | Role |
+|-------|------|
+| **Scout** | Fast codebase recon — compressed context for handoff |
+| **Researcher** | Web research — finds and synthesizes current information |
+| **Worker** | General-purpose execution in an isolated context window |
+
+## Native Engine
+
+Performance-critical operations use a Rust N-API engine:
+
+- **grep** — ripgrep-backed content search
+- **glob** — gitignore-aware file discovery
+- **ps** — cross-platform process tree management
+- **highlight** — syntect-based syntax highlighting
+- **ast** — structural code search via ast-grep
+- **diff** — fuzzy text matching and unified diff generation
+- **text** — ANSI-aware text measurement and wrapping
+- **html** — HTML-to-Markdown conversion
+- **image** — decode, encode, resize images
+- **fd** — fuzzy file path discovery
+- **clipboard** — native clipboard access
+- **git** — libgit2-backed git read operations (v2.16+)
+- **parser** — GSD file parsing and frontmatter extraction
+
+## Dispatch Pipeline
+
+The auto mode dispatch pipeline:
+
+```
+1. Read disk state (STATE.md, roadmap, plans)
+2. Determine next unit type and ID
+3. Classify complexity → select model tier
+4. Apply budget pressure adjustments
+5. Check routing history for adaptive adjustments
+6. Dynamic model routing (if enabled) → select cheapest model for tier
+7. Resolve effective model (with fallbacks)
+8. Check pending captures → triage if needed
+9. Build dispatch prompt (applying inline level compression)
+10. Create fresh agent session
+11. Inject prompt and let LLM execute
+12. On completion: snapshot metrics, verify artifacts, persist state
+13. Loop to step 1
+```
+
+Phase skipping (from token profile) gates steps 2-3: if a phase is skipped, the corresponding unit type is never dispatched.
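
The loop's essential shape can be sketched in TypeScript (all names here are illustrative stand-ins, not GSD's actual API):

```typescript
type Unit = { id: string };

// Stand-in for steps 1-2: derive the next unit purely from persisted state.
function nextUnit(pendingOnDisk: string[]): Unit | null {
  const id = pendingOnDisk.shift();
  return id ? { id } : null;
}

// Steps 3-12 collapse into one stand-in per iteration; the key property is
// that each cycle re-reads disk state rather than trusting in-memory leftovers.
function dispatchLoop(pendingOnDisk: string[]): string[] {
  const completed: string[] = [];
  for (let unit = nextUnit(pendingOnDisk); unit; unit = nextUnit(pendingOnDisk)) {
    completed.push(unit.id); // fresh session + prompt build + execute + persist
  }
  return completed;
}
```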
+
+## Key Modules (v2.33)
+
+| Module | Purpose |
+|--------|---------|
+| `auto.ts` | Auto-mode state machine and orchestration |
+| `auto/session.ts` | `AutoSession` class — all mutable auto-mode state in one encapsulated instance |
+| `auto-dispatch.ts` | Declarative dispatch table (phase → unit mapping) |
+| `auto-idempotency.ts` | Completed-key checks, skip loop detection, key eviction |
+| `auto-stuck-detection.ts` | Stuck loop recovery and unit retry escalation |
+| `auto-start.ts` | Fresh-start bootstrap — git/state init, crash lock detection, worktree setup |
+| `auto-post-unit.ts` | Post-unit processing — commit, doctor, state rebuild, hooks |
+| `auto-verification.ts` | Post-unit verification gate (lint/test/typecheck with auto-fix retries) |
+| `auto-prompts.ts` | Prompt builders with inline level compression |
+| `auto-worktree.ts` | Worktree lifecycle (create, enter, merge, teardown) |
+| `auto-recovery.ts` | Expected artifact resolution, completed-key persistence, self-healing |
+| `auto-timeout-recovery.ts` | Timed-out unit recovery and continuation |
+| `auto-timers.ts` | Unit supervision — soft/idle/hard timeouts, continue-here monitor |
+| `complexity-classifier.ts` | Unit complexity classification (light/standard/heavy) |
+| `model-router.ts` | Dynamic model routing with cost-aware selection |
+| `model-cost-table.ts` | Built-in per-model cost data for cross-provider comparison |
+| `routing-history.ts` | Adaptive learning from routing outcomes |
+| `captures.ts` | Fire-and-forget thought capture and triage classification |
+| `triage-resolution.ts` | Capture resolution (inject, defer, replan, quick-task) |
+| `visualizer-overlay.ts` | Workflow visualizer TUI overlay |
+| `visualizer-data.ts` | Data loading for visualizer tabs |
+| `visualizer-views.ts` | Tab renderers (progress, deps, metrics, timeline, discussion status) |
+| `metrics.ts` | Token and cost tracking ledger |
+| `state.ts` | State derivation from disk |
+| `session-lock.ts` | OS-level exclusive session locking (proper-lockfile) |
+| `crash-recovery.ts` | Lock file management for crash detection and recovery |
+| `preferences.ts` | Preference loading, merging, validation |
+| `git-service.ts` | Git operations — commit, merge, worktree sync, completed-units cross-boundary sync |
+| `unit-id.ts` | Centralized `parseUnitId()` — milestone/slice/task extraction from unit IDs |
+| `error-utils.ts` | `getErrorMessage()` — unified error-to-string conversion |
+| `roadmap-slices.ts` | Roadmap parser with prose fallback for LLM-generated variants |
+| `memory-extractor.ts` | Extract reusable knowledge from session transcripts |
+| `memory-store.ts` | Persistent memory store for cross-session knowledge |
+| `queue-order.ts` | Milestone queue ordering |
diff --git a/docs/auto-mode.md b/docs/auto-mode.md
new file mode 100644
index 000000000..5d2c47e3a
--- /dev/null
+++ b/docs/auto-mode.md
@@ -0,0 +1,301 @@
+# Auto Mode
+
+Auto mode is GSD's autonomous execution engine. Run `/gsd auto`, walk away, come back to built software with clean git history.
+
+## How It Works
+
+Auto mode is a **state machine driven by files on disk**. It reads `.gsd/STATE.md`, determines the next unit of work, creates a fresh agent session, injects a focused prompt with all relevant context pre-inlined, and lets the LLM execute. When the LLM finishes, auto mode reads disk state again and dispatches the next unit.
+
+### The Loop
+
+Each slice flows through phases automatically:
+
+```
+Plan (with integrated research) → Execute (per task) → Complete → Reassess Roadmap → Next Slice
+ ↓ (all slices done)
+ Validate Milestone → Complete Milestone
+```
+
+- **Plan** — scouts the codebase, researches relevant docs, and decomposes the slice into tasks with must-haves
+- **Execute** — runs each task in a fresh context window
+- **Complete** — writes summary, UAT script, marks roadmap, commits
+- **Reassess** — checks if the roadmap still makes sense
+- **Validate Milestone** — reconciliation gate after all slices complete; compares roadmap success criteria against actual results, catches gaps before sealing the milestone
+
+## Key Properties
+
+### Fresh Session Per Unit
+
+Every task, research phase, and planning step gets a clean context window. No accumulated garbage. No degraded quality from context bloat. The dispatch prompt includes everything needed — task plans, prior summaries, dependency context, decisions register — so the LLM starts oriented instead of spending tool calls reading files.
+
+### Context Pre-Loading
+
+The dispatch prompt is carefully constructed with:
+
+| Inlined Artifact | Purpose |
+|------------------|---------|
+| Task plan | What to build |
+| Slice plan | Where this task fits |
+| Prior task summaries | What's already done |
+| Dependency summaries | Cross-slice context |
+| Roadmap excerpt | Overall direction |
+| Decisions register | Architectural context |
+
+The amount of context inlined is controlled by your [token profile](./token-optimization.md). Budget mode inlines minimal context; quality mode inlines everything.
+
+### Git Isolation
+
+GSD isolates milestone work using one of three modes (configured via `git.isolation` in preferences):
+
+- **`worktree`** (default): Each milestone runs in its own git worktree at `.gsd/worktrees/<milestone-id>/` on a `milestone/<milestone-id>` branch. All slice work commits sequentially — no branch switching, no merge conflicts mid-milestone. When the milestone completes, it's squash-merged to main as one clean commit.
+- **`branch`**: Work happens in the project root on a `milestone/<milestone-id>` branch. Useful for submodule-heavy repos where worktrees don't work well.
+- **`none`**: Work happens directly on your current branch. No worktree, no milestone branch. Ideal for hot-reload workflows where file isolation breaks dev tooling.
+
+See [Git Strategy](./git-strategy.md) for details.
+
+### Parallel Execution
+
+When your project has independent milestones, you can run them simultaneously. Each milestone gets its own worker process and worktree. See [Parallel Orchestration](./parallel-orchestration.md) for setup and usage.
+
+### Crash Recovery
+
+A lock file tracks the current unit. If the session dies, the next `/gsd auto` reads the surviving session file, synthesizes a recovery briefing from every tool call that made it to disk, and resumes with full context.
+
+**Headless auto-restart (v2.26):** When running `gsd headless auto`, crashes trigger automatic restart with exponential backoff (5s → 10s → 30s cap, default 3 attempts). Configure with `--max-restarts N`. A SIGINT or SIGTERM bypasses the restart. Combined with crash recovery, this enables true overnight "run until done" execution.
+
+### Provider Error Recovery
+
+GSD classifies provider errors and auto-resumes when safe:
+
+| Error type | Examples | Action |
+|-----------|----------|--------|
+| **Rate limit** | 429, "too many requests" | Auto-resume after retry-after header or 60s |
+| **Server error** | 500, 502, 503, "overloaded", "api_error" | Auto-resume after 30s |
+| **Permanent** | "unauthorized", "invalid key", "billing" | Pause indefinitely (requires manual resume) |
+
+No manual intervention needed for transient errors — the session pauses briefly and continues automatically.
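+
+As a sketch, the classification in the table above maps to a small dispatcher. The function name, message matching, and return shape here are illustrative assumptions, not GSD's actual API:
+
+```typescript
+// Hedged sketch of the error-classification table; names are assumptions.
+type ErrorAction = { kind: "resume"; delayMs: number } | { kind: "pause" };
+
+function classifyProviderError(message: string, retryAfterSec?: number): ErrorAction {
+  const m = message.toLowerCase();
+  // Permanent: pause until a human intervenes.
+  if (["unauthorized", "invalid key", "billing"].some(s => m.includes(s))) {
+    return { kind: "pause" };
+  }
+  // Rate limit: honor the retry-after header, else wait 60s.
+  if (m.includes("429") || m.includes("too many requests")) {
+    return { kind: "resume", delayMs: (retryAfterSec ?? 60) * 1000 };
+  }
+  // Everything else is treated as a transient server error: resume after 30s.
+  return { kind: "resume", delayMs: 30_000 };
+}
+```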
+
+### Incremental Memory (v2.26)
+
+GSD maintains a `KNOWLEDGE.md` file — an append-only register of project-specific rules, patterns, and lessons learned. The agent reads it at the start of every unit and appends to it when discovering recurring issues, non-obvious patterns, or rules that future sessions should follow. This gives auto-mode cross-session memory that survives context window boundaries.
+
+### Context Pressure Monitor (v2.26)
+
+When context usage reaches 70%, GSD sends a wrap-up signal to the agent, nudging it to finish durable output (commit, write summaries) before the context window fills. This prevents sessions from hitting the hard context limit mid-task with no artifacts written.
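+
+The trigger condition is simple to state. As an illustrative sketch (the 70% threshold comes from the text; the function name is hypothetical):
+
+```typescript
+// Fires the wrap-up signal once context usage crosses the threshold.
+function shouldSignalWrapUp(tokensUsed: number, contextLimit: number, threshold = 0.7): boolean {
+  return tokensUsed / contextLimit >= threshold;
+}
+```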
+
+### Meaningful Commit Messages (v2.26)
+
+Commits are generated from task summaries — not generic "complete task" messages. Each commit message reflects what was actually built, giving clean `git log` output that reads like a changelog.
+
+### Stuck Detection (v2.39)
+
+GSD uses a sliding-window analysis to detect stuck loops. Instead of a simple "same unit dispatched twice" counter, the detector examines recent dispatch history for repeated patterns — catching cycles like A→B→A→B as well as single-unit repeats. On detection, GSD retries once with a deep diagnostic prompt. If the retry fails too, auto mode stops and reports the exact file it expected, so you can intervene.
+
+The sliding-window approach reduces false positives on legitimate retries (e.g., verification failures that self-correct) while catching genuine stuck loops faster.
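+
+A minimal version of the sliding-window check might look like this (illustrative only; GSD's actual detector is more involved):
+
+```typescript
+// Returns true when the last `window` dispatched unit ids form a short
+// repeating cycle, e.g. [A,A,A,A] (period 1) or [A,B,A,B] (period 2).
+function looksStuck(history: string[], window = 4): boolean {
+  if (history.length < window) return false;
+  const recent = history.slice(-window);
+  for (let period = 1; period <= window / 2; period++) {
+    if (recent.every((unit, i) => unit === recent[i % period])) return true;
+  }
+  return false;
+}
+```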
+
+### Post-Mortem Investigation (v2.40)
+
+`/gsd forensics` is a full-access GSD debugger for post-mortem analysis of auto-mode failures. It provides:
+
+- **Anomaly detection** — structured identification of stuck loops, cost spikes, timeouts, missing artifacts, and crashes with severity levels
+- **Unit traces** — last 10 unit executions with error details and execution times
+- **Metrics analysis** — cost, token counts, and execution time breakdowns
+- **Doctor integration** — includes structural health issues from `/gsd doctor`
+- **LLM-guided investigation** — an agent session with full tool access to investigate root causes
+
+```
+/gsd forensics [optional problem description]
+```
+
+See [Troubleshooting](./troubleshooting.md) for more on diagnosing issues.
+
+### Timeout Supervision
+
+Three timeout tiers prevent runaway sessions:
+
+| Timeout | Default | Behavior |
+|---------|---------|----------|
+| Soft | 20 min | Warns the LLM to wrap up |
+| Idle | 10 min | Detects stalls, intervenes |
+| Hard | 30 min | Pauses auto mode |
+
+Recovery steering nudges the LLM to finish durable output before timing out. Configure in preferences:
+
+```yaml
+auto_supervisor:
+ soft_timeout_minutes: 20
+ idle_timeout_minutes: 10
+ hard_timeout_minutes: 30
+```
+
+### Cost Tracking
+
+Every unit's token usage and cost are captured, broken down by phase, slice, and model. The dashboard shows running totals and projections. Budget ceilings can pause auto mode before overspending.
+
+See [Cost Management](./cost-management.md).
+
+### Adaptive Replanning
+
+After each slice completes, the roadmap is reassessed. If the work revealed new information that changes the plan, slices are reordered, added, or removed before continuing. The `budget` token profile skips this step.
+
+### Verification Enforcement (v2.26)
+
+Configure shell commands that run automatically after every task execution:
+
+```yaml
+verification_commands:
+ - npm run lint
+ - npm run test
+verification_auto_fix: true # auto-retry on failure (default)
+verification_max_retries: 2 # max retry attempts (default: 2)
+```
+
+Failures trigger auto-fix retries — the agent sees the verification output and attempts to fix the issues before advancing. This ensures code quality gates are enforced mechanically, not by LLM compliance.
+
+### Slice Discussion Gate (v2.26)
+
+For projects where you want human review before each slice begins:
+
+```yaml
+require_slice_discussion: true
+```
+
+Auto-mode pauses before each slice, presenting the slice context for discussion. After you confirm, execution continues. Useful for high-stakes projects where you want to review the plan before the agent builds.
+
+### HTML Reports (v2.26)
+
+After a milestone completes, GSD auto-generates a self-contained HTML report in `.gsd/reports/`. Reports include project summary, progress tree, slice dependency graph (SVG DAG), cost/token metrics with bar charts, execution timeline, changelog, and knowledge base. No external dependencies — all CSS and JS are inlined.
+
+```yaml
+auto_report: true # enabled by default
+```
+
+Generate manually anytime with `/gsd export --html`, or generate reports for all milestones at once with `/gsd export --html --all` (v2.28).
+
+### Failure Recovery (v2.28)
+
+v2.28 hardens auto-mode reliability with multiple safeguards: atomic file writes prevent corruption on crash, OAuth fetch timeouts (30s) prevent indefinite hangs, RPC subprocess exit is detected and reported, and blob garbage collection prevents unbounded disk growth. Combined with the existing crash recovery and headless auto-restart, auto-mode is designed for true "fire and forget" overnight execution.
+
+### Pipeline Architecture (v2.40)
+
+The auto-loop is structured as a linear phase pipeline rather than recursive dispatch. Each iteration flows through explicit stages:
+
+1. **Pre-Dispatch** — validate state, check guards, resolve model preferences
+2. **Dispatch** — execute the unit with a focused prompt
+3. **Post-Unit** — close out the unit, update caches, run cleanup
+4. **Verification** — optional validation gate (lint, test, etc.)
+5. **Stuck Detection** — sliding-window pattern analysis
+
+This linear flow is easier to debug, uses less memory (no recursive call stack), and provides cleaner error recovery since each phase has well-defined entry and exit conditions.
+
+### Real-Time Health Visibility (v2.40)
+
+Doctor issues (from `/gsd doctor`) now surface in real time across three places:
+
+- **Dashboard widget** — health indicator with issue count and severity
+- **Workflow visualizer** — issues shown in the status panel
+- **HTML reports** — health section with all issues at report generation time
+
+Issues are classified by severity: `error` (blocks auto-mode), `warning` (non-blocking), and `info` (advisory). Auto-mode checks health at dispatch time and can pause on critical issues.
+
+### Skill Activation in Prompts (v2.39)
+
+Configured skills are automatically resolved and injected into dispatch prompts. The agent receives an "Available Skills" block listing skills that match the current context, based on:
+
+- `always_use_skills` — always included
+- `prefer_skills` — included with preference indicator
+- `skill_rules` — conditional activation based on `when` clauses
+
+See [Configuration](./configuration.md) for skill routing preferences.
+
+## Controlling Auto Mode
+
+### Start
+
+```
+/gsd auto
+```
+
+### Pause
+
+Press **Escape**. The conversation is preserved. You can interact with the agent, inspect state, or resume.
+
+### Resume
+
+```
+/gsd auto
+```
+
+Auto mode reads disk state and picks up where it left off.
+
+### Stop
+
+```
+/gsd stop
+```
+
+Stops auto mode gracefully. Can be run from a different terminal.
+
+### Steer
+
+```
+/gsd steer
+```
+
+Hard-steer plan documents during execution without stopping the pipeline. Changes are picked up at the next phase boundary.
+
+### Capture
+
+```
+/gsd capture "add rate limiting to API endpoints"
+```
+
+Fire-and-forget thought capture. Captures are triaged automatically between tasks. See [Captures & Triage](./captures-triage.md).
+
+### Visualize
+
+```
+/gsd visualize
+```
+
+Open the workflow visualizer — interactive tabs for progress, dependencies, metrics, and timeline. See [Workflow Visualizer](./visualizer.md).
+
+## Dashboard
+
+`Ctrl+Alt+G` or `/gsd status` shows real-time progress:
+
+- Current milestone, slice, and task
+- Auto mode elapsed time and phase
+- Per-unit cost and token breakdown
+- Cost projections
+- Completed and in-progress units
+- Pending capture count (when captures are awaiting triage)
+- Parallel worker status (when running parallel milestones — includes 80% budget alert)
+
+## Phase Skipping
+
+Token profiles can skip certain phases to reduce cost:
+
+| Phase | `budget` | `balanced` | `quality` |
+|-------|----------|------------|-----------|
+| Milestone Research | Skipped | Runs | Runs |
+| Slice Research | Skipped | Skipped | Runs |
+| Reassess Roadmap | Skipped | Runs | Runs |
+
+See [Token Optimization](./token-optimization.md) for details.
+
+## Dynamic Model Routing
+
+When enabled, auto-mode automatically selects cheaper models for simple units (slice completion, UAT) and reserves expensive models for complex work (replanning, architectural tasks). See [Dynamic Model Routing](./dynamic-model-routing.md).
+
+## Reactive Task Execution (v2.38)
+
+When `reactive_execution: true` is set in preferences, GSD derives a dependency graph from IO annotations in task plans. Tasks that don't conflict (no shared file reads/writes) are dispatched in parallel via subagents, while dependent tasks wait for their predecessors to complete.
+
+```yaml
+reactive_execution: true # disabled by default
+```
+
+The graph derivation is pure and deterministic — it resolves a ready-set of tasks, detects conflicts, and guards against deadlocks. Verification results carry forward across parallel batches, so tasks that pass verification don't need to be re-verified when subsequent tasks in the same slice complete.
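+
+A minimal sketch of the ready-set and conflict idea, assuming a hypothetical `TaskIO` shape derived from the IO annotations:
+
+```typescript
+interface TaskIO {
+  id: string;
+  reads: string[];   // files this task reads
+  writes: string[];  // files this task writes
+}
+
+// A task is ready when none of its files conflict with work in flight:
+// write/write and read/write overlaps block dispatch; read/read does not.
+function readySet(pending: TaskIO[], running: TaskIO[]): TaskIO[] {
+  const touched = new Set(running.flatMap(t => [...t.reads, ...t.writes]));
+  return pending.filter(t =>
+    !t.writes.some(f => touched.has(f)) &&
+    !t.reads.some(f => running.some(r => r.writes.includes(f)))
+  );
+}
+```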
+
+The implementation lives in `reactive-graph.ts` (graph derivation, ready-set resolution, conflict/deadlock detection) with integration into `auto-dispatch.ts` and `auto-prompts.ts`.
diff --git a/docs/building-coding-agents/01-work-decomposition.md b/docs/building-coding-agents/01-work-decomposition.md
new file mode 100644
index 000000000..fe58f8d2a
--- /dev/null
+++ b/docs/building-coding-agents/01-work-decomposition.md
@@ -0,0 +1,34 @@
+# Work Decomposition
+
+**The universal consensus:** Elite engineers never jump from vision to code. They use **progressive decomposition** through layers of abstraction.
+
+### The Compression Ladder
+
+```
+Vision → Capabilities → Systems/Architecture → Features → Tasks
+```
+
+Each layer answers a different question:
+
+| Layer | Question |
+|-------|----------|
+| Vision | What world are we creating? |
+| Capabilities | What must the product be able to do? |
+| Systems | What infrastructure enables those capabilities? |
+| Features | What does the user interact with? |
+| Tasks | What exact code gets written? |
+
+### Core Principles (All 4 Models Agree)
+
+- **Start with outcomes, not features.** Define "done" before anything else. Not "build a login page" but "a user can securely access their dashboard using OAuth."
+- **Vertical slices over horizontal layers.** Build thin end-to-end slices (UI → API → DB) rather than completing all backend before all frontend. Each slice is independently demoable and testable.
+- **The 1-Day Rule.** If a task takes longer than a day, it's not a task — it's a milestone. Break it down further until each item is a single, clear action completable in one sitting.
+- **Risk-first exploration.** Identify the hardest/most uncertain parts first. Spike on unknowns before committing to architecture. "Kill the biggest risks while they are still cheap to fix."
+- **Interface-first design.** Define contracts between components before building them. This enables parallel work and creates natural verification checkpoints.
+- **MECE decomposition.** Tasks should be Mutually Exclusive (no overlap) and Collectively Exhaustive (complete the vision when all are done).
+
+### The Recursive Heuristic
+
+> If something feels fuzzy, break it down one level deeper. Keep decomposing until it is obvious how to start each task.
+
+---
diff --git a/docs/building-coding-agents/02-what-to-keep-discard-from-human-engineering.md b/docs/building-coding-agents/02-what-to-keep-discard-from-human-engineering.md
new file mode 100644
index 000000000..347c9abc8
--- /dev/null
+++ b/docs/building-coding-agents/02-what-to-keep-discard-from-human-engineering.md
@@ -0,0 +1,38 @@
+# What to Keep & Discard from Human Engineering
+
+### KEEP & Amplify
+
+| Practice | Why It Matters More for AI |
+|----------|---------------------------|
+| **Clear product intent & experience specs** | AI needs direction, not instructions. "How should it feel?" drives architecture. |
+| **Acceptance criteria as the backbone** | Becomes TDD at its logical extreme — human writes tests in natural language, AI makes them true. |
+| **Vertical slicing** | Even more critical — prevents AI from going deep down a wrong path fast and confidently. |
+| **Interface-first approach** | Creates natural checkpoints, makes systems modular and replaceable. |
+| **Explicit constraints & non-functional requirements** | Narrows the search space. Without them AI may produce technically correct but strategically wrong systems. |
+| **Architecture Decision Records (ADRs)** | Prevents AI from "accidentally" undoing decisions made weeks ago. |
+| **Feedback loops** | Build → test → observe → refine. Accelerated to machine speed. |
+
+### DISCARD
+
+| Practice | Why It's Dead Weight |
+|----------|---------------------|
+| **Estimation rituals** (story points, velocity, sprint planning) | AI doesn't get tired, doesn't context-switch, works at machine speed. |
+| **Communication overhead** (standups, design reviews, PR reviews) | Only one communication channel matters: human ↔ agent. |
+| **Manual code review for style** | Automated linting + formatting handles this deterministically. |
+| **Step-by-step instructions** | Provide outcomes, not "how." |
+| **Heavy upfront documentation** | AI can read the entire repo instantly. Document *intent* and *why*, not *how*. |
+| **Gradual skill-building** | No ramp-up, no knowledge silos, no "only Sarah knows how that module works." |
+| **Defensive architecture against human error** | Tests still needed, but for a different reason: verifying AI's interpretation of intent. |
+
+### The New Human Role
+
+| Responsibility | Description |
+|---------------|-------------|
+| **Defining "good"** | Vision, personas, experience specs, success metrics |
+| **Taste & judgment** | Aesthetics, emotional experience, brand voice |
+| **Strategic decisions** | Which problems matter, product pivots |
+| **Gut checks at milestones** | Does this *feel* right? |
+
+> **The core shift:** Human = intention + taste. AI = exploration + execution.
+
+---
diff --git a/docs/building-coding-agents/03-state-machine-context-management.md b/docs/building-coding-agents/03-state-machine-context-management.md
new file mode 100644
index 000000000..c6e8da7e5
--- /dev/null
+++ b/docs/building-coding-agents/03-state-machine-context-management.md
@@ -0,0 +1,46 @@
+# State Machine & Context Management
+
+### The Fundamental Tension
+
+The agent needs to understand the whole project to make good decisions, but any single context window degrades with too much information — not just from token limits but from **attention dilution**.
+
+### Layered Memory Architecture (Universal Agreement)
+
+```
+Project Manifest (always loaded, <1000 tokens)
+ ↓
+Task Context (per-task, relevant files + specs)
+ ↓
+Retrieval Layer (pull-based, on-demand)
+ ↓
+Ground Truth (filesystem, git, actual code)
+```
+
+| Layer | Content | Access Pattern | Token Impact |
+|-------|---------|---------------|--------------|
+| **Working Context** (L1) | Current task + 3–5 relevant files | Dynamically assembled per LLM call | 8k–25k tokens |
+| **Session/Episodic** (L2) | Compressed history + recent decisions | Auto-summarized at transitions | Summary only |
+| **Project Semantic** (L3) | Full codebase summaries, dependency graph, ADRs | Vector + Graph retrieval | Pointers only |
+| **Ground Truth** (L4) | Actual files, git history, test results | Agent reads via tools | Zero in prompt |
+
+### The State Machine
+
+The agent should always be in one explicit state:
+
+```
+PLAN → IMPLEMENT → TEST → DEBUG → VERIFY → DOCUMENT
+```
+
+**Critical transitions that matter:**
+- **Task completion:** Defined by automated tests passing + acceptance criteria met
+- **Stuck detection:** Triggered by repeated failed attempts or missing information
+- **Plan revision:** Triggered when completed tasks reveal wrong assumptions
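+
+The explicit-state idea can be sketched as a typed transition table. The edges below are illustrative assumptions, not a prescription:
+
+```typescript
+type Phase = "PLAN" | "IMPLEMENT" | "TEST" | "DEBUG" | "VERIFY" | "DOCUMENT";
+
+// Legal transitions: test failures branch to DEBUG; wrong assumptions
+// surfaced in VERIFY trigger plan revision.
+const NEXT: Record<Phase, Phase[]> = {
+  PLAN: ["IMPLEMENT"],
+  IMPLEMENT: ["TEST"],
+  TEST: ["DEBUG", "VERIFY"],
+  DEBUG: ["IMPLEMENT", "TEST"],
+  VERIFY: ["DOCUMENT", "PLAN"],
+  DOCUMENT: ["PLAN"],
+};
+
+function transition(from: Phase, to: Phase): Phase {
+  if (!NEXT[from].includes(to)) throw new Error(`illegal transition: ${from} -> ${to}`);
+  return to;
+}
+```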
+
+### Key Principles
+
+- **Summarize aggressively between phases.** Don't carry full implementation context forward — carry compressed summaries: what was built, what decisions were made, what interfaces were created.
+- **Pull-based, not push-based context.** Don't preload everything the agent might need. Let it ask for what it discovers it needs.
+- **Use structured state for reliability.** Natural language summaries drift. Use JSON/typed configs for anything the system needs to track. Reserve natural language for reasoning.
+- **The filesystem is external memory.** The codebase itself is the most detailed representation of current state. Hold *understanding* about code in context, not the code itself.
+
+---
diff --git a/docs/building-coding-agents/04-optimal-storage-for-project-context.md b/docs/building-coding-agents/04-optimal-storage-for-project-context.md
new file mode 100644
index 000000000..e6d45ace9
--- /dev/null
+++ b/docs/building-coding-agents/04-optimal-storage-for-project-context.md
@@ -0,0 +1,56 @@
+# Optimal Storage for Project Context
+
+### The Universal Answer: Plain Text Files in the Repo + Structured State Store
+
+All four models converge on a hybrid approach. The key insight: **don't over-engineer with databases and vector stores, but don't under-engineer with a single massive file either.**
+
+### The Optimal Stack
+
+| Storage | What Lives Here | Why |
+|---------|----------------|-----|
+| **Project Manifest** (`PROJECT.md`) | Vision, principles, architecture overview, component status | Always loaded, <1000 tokens, single source of truth |
+| **Structured State** (JSON/SQLite/Postgres) | Task status, phase, dependencies, verification results | Machine-parseable, drives state machine transitions |
+| **Context Directory** (`.context/` or `.ai/`) | Architecture docs, task specs, decision records | Organized for retrieval, not human browsing |
+| **Git Repository** | Actual source code, test results | Ultimate ground truth, never duplicated |
+| **Knowledge Graph** (optional at scale) | File → function → dependency relationships | Enables "what breaks if I change this?" queries |
+
+### Why Plain Files Win
+
+- AI reads files directly — no query language, no ORM, no API calls
+- Version control comes free via git
+- Human can read and edit with any text editor
+- Survives tooling changes — not locked into any system
+
+### Why NOT Vector Stores (as primary)
+
+- Project context is **structured** — you know where things are
+- Vector stores return **approximately relevant** results — approximate is often wrong in codebases
+- They can't represent state, relationships, or task progress
+
+### The Hybrid Format
+
+Individual files use **YAML frontmatter + Markdown body**:
+```markdown
+---
+status: in_progress
+dependencies: [AUTH-01, DB-02]
+acceptance_criteria:
+ - User can reset password via email
+ - Token expires after 30 minutes
+---
+
+## Task: Password Reset Flow
+[Rich narrative description and context here]
+```
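+
+Splitting such a file is mechanical. A minimal sketch (a real implementation would hand the frontmatter to a YAML parser such as `js-yaml`):
+
+```typescript
+// Separates YAML frontmatter from the Markdown body.
+function splitFrontmatter(doc: string): { frontmatter: string; body: string } {
+  const match = /^---\n([\s\S]*?)\n---\n([\s\S]*)$/.exec(doc);
+  return match
+    ? { frontmatter: match[1], body: match[2] }
+    : { frontmatter: "", body: doc };
+}
+```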
+
+### Size Discipline
+
+| File | Target Size |
+|------|------------|
+| Project Manifest | <1,000 tokens |
+| Individual task files (completed) | <500 tokens |
+| Architecture doc | <2,000 tokens |
+
+> The context system isn't just storage — it's a **compression engine**. Its job is to maintain maximum useful understanding in minimum token footprint.
+
+---
diff --git a/docs/building-coding-agents/05-parallelization-strategy.md b/docs/building-coding-agents/05-parallelization-strategy.md
new file mode 100644
index 000000000..83823021d
--- /dev/null
+++ b/docs/building-coding-agents/05-parallelization-strategy.md
@@ -0,0 +1,62 @@
+# Parallelization Strategy
+
+### Core Principle
+
+> Parallelize across boundaries, serialize within them.
+
+The quality of parallelization is directly determined by the quality of interface definitions.
+
+### The Diamond Pattern
+
+```
+ Planning (narrow, serial)
+ ↓
+ Fan Out (parallel execution)
+ ↓
+ Convergence (integration verification)
+ ↓
+ Fan Out (next parallel set)
+```
+
+### Phase-by-Phase Strategy
+
+#### Planning: Mostly Serial, with Parallel Spikes
+- High-level decomposition must be serial (one coherent act of reasoning)
+- **Parallelize uncertainty resolution:** Multiple spikes investigating different risks simultaneously
+- Output: A dependency graph that explicitly identifies what can be parallelized
+
+#### Execution: Massive Parallelization with Right Topology
+
+| Work Type | Strategy |
+|-----------|----------|
+| **Independent leaf tasks** | Embarrassingly parallel — one agent per module |
+| **Dependent chains** | Serial within chain, but chains run in parallel |
+| **Convergence points** | Strictly serial — integration verification |
+
+**Critical insight:** The frontend doesn't need the real API — it needs the API *contract*. Once contracts exist, both sides build in parallel.
+
+#### Testing: The Most Interesting Story
+- **Unit tests:** Same agent, same context, atomic with code
+- **Cross-task tests:** All parallel by definition
+- **Integration tests:** Parallel across different boundaries
+- **E2E tests:** Serial (exercises whole system)
+
+#### Verification: Deliberate Redundancy
+- **Adversarial verification:** Separate reviewer agent with fresh context evaluates against spec
+- **Red-team parallelism:** Agent tries to break the implementation
+
+### Coordination Rules
+
+- Agents communicate through the **filesystem**, never directly
+- Each agent works on a **branch** — merge on success, discard on failure
+- One agent per file at a time (file locking)
+- Optimal concurrency: **3–8 simultaneous agents** for most projects
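+
+The one-agent-per-file rule can be sketched as a simple lock table (in-memory here for illustration; real coordination would go through the filesystem, per the first rule above):
+
+```typescript
+// file path -> id of the agent holding the lock
+const locks = new Map<string, string>();
+
+// All-or-nothing acquisition: an agent locks every file it needs, or none.
+function acquire(agent: string, files: string[]): boolean {
+  if (files.some(f => locks.has(f) && locks.get(f) !== agent)) return false;
+  for (const f of files) locks.set(f, agent);
+  return true;
+}
+
+function release(agent: string): void {
+  for (const [file, holder] of locks) {
+    if (holder === agent) locks.delete(file);
+  }
+}
+```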
+
+### Anti-Patterns
+
+- ❌ Don't parallelize tasks that modify the same files
+- ❌ Don't parallelize interacting decisions
+- ❌ Don't skip convergence/integration verification
+- ❌ Don't over-parallelize (coordination tax eats gains above ~8 agents)
+
+---
diff --git a/docs/building-coding-agents/06-maximizing-agent-autonomy-superpowers.md b/docs/building-coding-agents/06-maximizing-agent-autonomy-superpowers.md
new file mode 100644
index 000000000..2d5802580
--- /dev/null
+++ b/docs/building-coding-agents/06-maximizing-agent-autonomy-superpowers.md
@@ -0,0 +1,44 @@
+# Maximizing Agent Autonomy & Superpowers
+
+### The Foundational Insight
+
+> Autonomy comes from **self-correction**, not from getting it right the first time. The power isn't in the initial generation — it's in iteration speed and feedback signal quality.
+
+### The Essential Tool Arsenal
+
+| Category | Tools | Why |
+|----------|-------|-----|
+| **Execution Environment** | Terminal, filesystem, git, package manager | Closes the write → run → debug → verify loop |
+| **Verification** | Test runner, linter, type checker, security scanner | Ground truth over self-assessment |
+| **Observation** | Logs, browser/renderer, performance profiler | Sees what users would see |
+| **Exploration** | Code search, documentation lookup, web research | Self-directed learning |
+| **Recovery** | Git revert, branch management, checkpoints | Safety net that enables boldness |
+
+### Self-Verification Architecture
+
+Every task completion should self-evaluate against a checklist:
+1. Does the code compile?
+2. Do all existing tests still pass?
+3. Do new tests pass?
+4. Does the application actually start?
+5. Can I exercise the feature and see expected behavior?
+6. Does this match acceptance criteria point by point?
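+
+Expressed as a mechanical gate (the checklist items come from above; the field names are assumptions):
+
+```typescript
+interface CompletionChecks {
+  compiles: boolean;
+  existingTestsPass: boolean;
+  newTestsPass: boolean;
+  appStarts: boolean;
+  featureExercised: boolean;
+  criteriaMet: boolean;
+}
+
+// The task is done only when every check on the list holds.
+const taskComplete = (c: CompletionChecks): boolean => Object.values(c).every(Boolean);
+```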
+
+### Debugging Superpowers
+
+- **Temporary instrumentation:** Add logging, remove after diagnosis
+- **Bisection:** Walk back through changes to find where regression was introduced
+- **Minimal reproduction:** Strip away everything except exact conditions that trigger failure
+- **Exploratory tests:** Quick throwaway scripts to test hypotheses
+
+### Meta-Cognitive Layer
+
+- **Scratchpad:** External reasoning space to track hypotheses, attempts, and outcomes
+- **Stuck detection:** After N failed attempts, trigger step-back with fresh context and explicitly different approach
+- **Structured escalation:** "Here's what I'm trying, here's what I've tried, here's what I think the issue is, here's what I need from you"
+
+### The Philosophy
+
+> You're not trying to build an agent that doesn't make mistakes. You're building one that **catches and fixes its own mistakes faster than a human would notice them**. Not intelligence — **closed-loop execution with rich feedback**.
+
+---
diff --git a/docs/building-coding-agents/07-system-prompt-llm-vs-deterministic-split.md b/docs/building-coding-agents/07-system-prompt-llm-vs-deterministic-split.md
new file mode 100644
index 000000000..4772b7a3a
--- /dev/null
+++ b/docs/building-coding-agents/07-system-prompt-llm-vs-deterministic-split.md
@@ -0,0 +1,82 @@
+# System Prompt & LLM vs Deterministic Split
+
+### The Core Separation Principle
+
+> If you could write an if-else statement that handles it correctly every time, **it should not be in the LLM's context**. Every token the model spends reasoning about something deterministic is wasted and introduces hallucination risk.
+
+### What the LLM Owns
+
+| Capability | Why LLM |
+|-----------|---------|
+| Understanding intent | Interpretation, judgment |
+| Architectural reasoning | Weighing tradeoffs |
+| Code generation | Creative, context-dependent |
+| Debugging & diagnosis | Abductive reasoning, hypothesis formation |
+| Self-critique & quality assessment | Judgment calls |
+
+### What TypeScript/Deterministic Code Owns
+
+| Capability | Why Deterministic |
+|-----------|-------------------|
+| State machine transitions | Typed state object, no ambiguity |
+| Context assembly | Predict + pre-load what agent needs |
+| File operations | Validate paths, handle encoding, manage permissions |
+| Test execution & result parsing | Structured results, not raw terminal output |
+| Build & environment management | Install deps, start servers, manage ports |
+| Code formatting | Run prettier automatically, never waste LLM tokens |
+| Task scheduling & dependency resolution | Graph traversal, instant vs 5-second LLM call |
+| Summarization triggers | Mechanical workflow, LLM provides content |
+
+### Modular System Prompt Architecture
+
+```
+Base Layer (always present, ~500 tokens)
+ → Identity, core behavioral rules, general approach
+
+Phase-Specific Layer (swapped based on state)
+ → Planning mode: decomposition, interfaces, risks
+ → Execution mode: implementation, testing, iteration
+ → Debugging mode: diagnosis, hypothesis testing, isolation
+
+Task-Specific Layer (assembled fresh per task)
+ → Current spec, acceptance criteria, relevant contracts, prior attempts
+
+Tools Layer
+ → Available tool definitions and parameters
+```
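+
+One way to realize the layering, sketched with hypothetical names (layer contents are project-specific):
+
+```typescript
+type Mode = "planning" | "execution" | "debugging";
+
+interface PromptLayers {
+  base: string;                // always present
+  phase: Record<Mode, string>; // swapped on state transitions
+  task: string;                // assembled fresh per task
+  tools: string;               // tool definitions
+}
+
+function assembleSystemPrompt(layers: PromptLayers, mode: Mode): string {
+  return [layers.base, layers.phase[mode], layers.task, layers.tools].join("\n\n");
+}
+```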
+
+### Tool Design Philosophy
+
+> Each tool should do one thing, do it completely, and return structured results the LLM can immediately act on.
+
+- **Bad:** LLM calls `readFile` → `parseJSON` → `runCommand` (3 calls, 3 failure points)
+- **Good:** LLM calls `runTests(filter)` → gets structured pass/fail with locations (1 call, clean result)
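+
+The "good" shape amounts to parsing the runner's report once, deterministically, so the LLM never sees raw terminal output. A sketch, with an invented report format:
+
+```typescript
+interface TestFailure { file: string; line: number; message: string }
+interface TestResult { passed: number; failed: number; failures: TestFailure[] }
+
+// Turns a (hypothetical) JSON test report into the structured result
+// a single `runTests` tool call would return.
+function parseTestReport(json: string): TestResult {
+  const report = JSON.parse(json) as {
+    tests: { ok: boolean; file: string; line: number; msg?: string }[];
+  };
+  const failures = report.tests
+    .filter(t => !t.ok)
+    .map(t => ({ file: t.file, line: t.line, message: t.msg ?? "failed" }));
+  return { passed: report.tests.length - failures.length, failed: failures.length, failures };
+}
+```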
+
+### Essential Tools
+
+| Tool | Returns |
+|------|---------|
+| `runTests` | Structured results: pass count, fail count, per-failure details |
+| `readFiles` | Batched file contents (array of paths, not one at a time) |
+| `writeFile` | Auto-formats before writing |
+| `searchCodebase` | Grep-like results with file paths and line numbers |
+| `getProjectState` | Manifest + current task spec + related task statuses |
+| `updateTaskStatus` | Handles downstream state updates automatically |
+| `buildProject` | Structured errors with file paths and line numbers |
+| `browserCheck` | Screenshot or structured description of rendered output |
+| `commitChanges` | Enforces conventions, runs pre-commit hooks |
+| `revertToCheckpoint` | Rolls back to last known good state |
+
+### Prompt Patterns That Maximize Agency
+
+1. **Tell it what it CAN do, not what it can't.** "Full authority as long as acceptance criteria and tests pass."
+2. **Explicit permission to iterate.** "First attempt doesn't need to be perfect. Write, run, observe, improve."
+3. **Clear exit conditions.** Concrete, measurable, unambiguous definition of "done."
+4. **Built-in scratchpad.** "Write reasoning in thinking blocks. Track attempts and outcomes."
+5. **Recovery protocol.** "After 3 failed approaches, produce structured escalation."
+
+### The Meta-Principle
+
+> Your TypeScript orchestrator is the deterministic skeleton — workflow, state, context, tools, coordination. The LLM is the reasoning muscle — understanding, creativity, judgment, problem-solving. **Neither should do the other's job.** When you get this right, the LLM becomes dramatically more capable because it's only doing what it's good at, with exactly the context it needs.
+
+---
diff --git a/docs/building-coding-agents/08-speed-optimization.md b/docs/building-coding-agents/08-speed-optimization.md
new file mode 100644
index 000000000..48fbf5fe3
--- /dev/null
+++ b/docs/building-coding-agents/08-speed-optimization.md
@@ -0,0 +1,60 @@
+# Speed Optimization
+
+### The #1 Speed Principle
+
+> The fastest possible operation is the one you don't perform. Before optimizing any step, ask: does this step need to exist at all?
+
+### Speed Levers (Ranked by Impact)
+
+#### 1. Minimize LLM Calls
+- **Batch intent into single calls.** Don't generate code, then tests, then docs separately. One call: "implement, test, and document." TypeScript splits the output.
+- **Deterministic fast paths.** Missing import? Syntax error? Fix without an LLM call if the fix is mechanical.
+- Audit call chains ruthlessly — most systems have 50%+ unnecessary sequential calls.
+
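+A deterministic fast path might look like this sketch: resolve a missing-import error from a pre-built export index, and only fall through to the LLM when the mechanical fix fails. The `exportIndex` map is an assumption about orchestrator state, not an existing API:
+
+```typescript
+// Assumed: an export index maintained by the orchestrator as files change.
+const exportIndex: Record<string, string> = {
+  formatDate: "src/utils/date.ts",
+  UserSchema: "src/models/user.ts",
+};
+
+function fixMissingImport(symbol: string): string | null {
+  const file = exportIndex[symbol];
+  if (!file) return null; // unknown symbol: fall through to the LLM
+  return `import { ${symbol} } from "${file.replace(/\.ts$/, "")}";`;
+}
+```
+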
+#### 2. Make Feedback Loops Instantaneous
+- Use test watch mode (no cold start)
+- Run only relevant test subsets (track which files affect which tests)
+- Incremental builds (hot module reloading)
+- Async, non-blocking file writes
+
+#### 3. Precompute Context
+- Predict what the agent will need based on task definition
+- Pre-load into the prompt — no tool calls needed mid-generation
+- **Speculative pre-fetching** (like CPU cache prefetching)
+
+#### 4. Parallelize Independent Work
+- Minimize startup cost for new parallel agents (pre-built templates, warm connections)
+- Use the dependency graph to identify independent work automatically
+
+#### 5. Stream Everything, Block on Nothing
+- Process tokens as they arrive
+- Pipeline parallelism: start formatting code while commit message is still generating
+
+#### 6. Cache Aggressively
+- In-memory cache of everything agent might need
+- Cross-task caching for unchanged files
+- Cache LLM results for deterministic inputs (boilerplate, type definitions)
+
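+Caching LLM results for deterministic inputs can be as simple as keying on a prompt digest. A synchronous sketch (real calls are async; the hash here is deliberately simplified, where a production system would use a cryptographic digest):
+
+```typescript
+function promptHash(input: string): string {
+  let h = 0;
+  for (let i = 0; i < input.length; i++) {
+    h = (h * 31 + input.charCodeAt(i)) | 0;
+  }
+  return h.toString(16);
+}
+
+const llmCache = new Map<string, string>();
+
+function cachedCall(prompt: string, call: (p: string) => string): string {
+  const key = promptHash(prompt);
+  const hit = llmCache.get(key);
+  if (hit !== undefined) return hit; // cache hit: zero latency, zero tokens
+  const result = call(prompt);
+  llmCache.set(key, result);
+  return result;
+}
+```
+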
+#### 7. Minimize Token Waste
+- Dense context, not verbose context
+- Structured formats for structured data
+- Minify reference code that's informational, not for modification
+
+### Anti-Patterns That Murder Speed
+
+| Anti-Pattern | Fix |
+|-------------|-----|
+| Re-verifying things that can't have changed | Dependency-aware selective re-verification |
+| Excessive self-reflection on simple tasks | Complexity-based workflow routing |
+| Over-summarization between micro-steps | Only full context reset at task boundaries |
+| Waiting for human approval on auto-verifiable work | Human checkpoints at milestones, not tasks |
+| Quadratic history growth | Aggressive compression at every transition |
+| Synchronous blocking tools | Async everything, pipeline parallelism |
+
+### The Speed Multiplier Nobody Talks About
+
+**Failure prediction.** Track patterns across tasks. If certain task types fail on first attempt, pre-load extra guidance. Preventing a failed iteration is faster than executing one.
+
+> The magical feeling of speed comes from only doing things that matter, and then doing those things as fast as possible. The system should feel like the agent knew what to do and just did it.
+
+---
diff --git a/docs/building-coding-agents/09-top-10-tips-for-a-world-class-agent.md b/docs/building-coding-agents/09-top-10-tips-for-a-world-class-agent.md
new file mode 100644
index 000000000..8d1d9e031
--- /dev/null
+++ b/docs/building-coding-agents/09-top-10-tips-for-a-world-class-agent.md
@@ -0,0 +1,33 @@
+# Top 10 Tips for a World-Class Agent
+
+### 1. The Orchestrator Is the Product, Not the Model
+The model is a commodity. Two teams using the same model produce wildly different results based on orchestration quality. Invest 70% of effort in the orchestrator, 30% in prompt engineering.
+
+### 2. Context Assembly Is a Craft
+Profile your context like you'd profile code. Measure which context elements correlate with first-attempt success. Prune relentlessly. The right files, in the right order, with the right framing, at the right level of detail.
+
+### 3. Make the Feedback Loop the Fastest Thing
+Treat feedback loop latency like a game engine treats frame rate. Incremental builds, targeted tests, pre-warmed servers, cached deps. Put it on a dashboard you look at every day.
+
+### 4. Build First-Class Error Recovery Into Every Layer
+Retry with variation (never the same way twice), automatic rollback, structured escalation, ability to park blocked tasks. **Design failure paths first** — they'll get more use than you expect.
+
+### 5. Verify Through Execution, Not Self-Assessment
+An agent that asks itself "is this correct?" says yes 90% of the time regardless. Run the code, observe results, get ground truth. Self-assessment supplements execution-based verification, never replaces it.
+
+### 6. Return Structured, Actionable Data from Every Tool
+Don't return raw terminal output. Return structured objects: what passed, what failed, where, why. Remove cognitive load from the model — it directly translates to better decisions.
+
+### 7. Use a DAG, Not a Flat List
+Explicit inputs, outputs, dependencies, acceptance criteria per task. Maximizes parallelism, identifies critical path, enables smart impact tracing when things change.
+
+### 8. Keep the Manifest Small and Always Current
+One file, <1000 tokens, always included. Updated automatically after every task completion. If it drifts from reality, everything downstream suffers.
+
+### 9. Build Observability From Day One
+Log every LLM call. Track iterations per task type, token usage, failure rates, first-attempt success rates. This is your training data for improving the orchestrator. Teams that instrument well improve 10x faster.
+
+### 10. Make Human Touchpoints High-Leverage and Low-Friction
+Present specific questions with context, not walls of text. "The API could return nested or flat fields — which fits your vision?" is a 5-second decision. "Please review everything" takes 20 minutes.
+
+---
diff --git a/docs/building-coding-agents/10-top-10-pitfalls-to-avoid.md b/docs/building-coding-agents/10-top-10-pitfalls-to-avoid.md
new file mode 100644
index 000000000..906a66060
--- /dev/null
+++ b/docs/building-coding-agents/10-top-10-pitfalls-to-avoid.md
@@ -0,0 +1,33 @@
+# Top 10 Pitfalls to Avoid
+
+### 1. Putting Workflow Logic in the Prompt
+Control flow belongs in TypeScript with actual conditionals and state tracking. Prompts that describe workflows are fragile, inconsistently followed, and impossible to debug with a debugger.
+
+### 2. Unbounded Context Accumulation
+Each iteration adds noise. After 7 iterations, context is bloated with stale information from attempts 1–5. **Carry forward only current state and most recent error.** Summarize or discard everything else.
+
+### 3. Trusting the Model's Self-Assessment of Completion
+Models are biased toward completion. Never let the model be the sole judge. Use deterministic checks: tests pass, it builds, acceptance criteria have corresponding passing tests.
+
+### 4. Over-Engineering Tools Before Understanding Workflows
+Start with a small general-purpose set (file read/write, execute command, run tests). Watch where the agent struggles in real tasks. Then build specialized tools to solve observed problems.
+
+### 5. Neglecting the Cold-Start Problem
+The first task is fundamentally different from the twentieth. Use deterministic templates for project scaffolding, conventions, and test infrastructure before handing off to the agent.
+
+### 6. Too Much Autonomy Too Early
+An agent going slightly wrong for 2 hours produces a mountain of throwaway code. Start with more checkpoints than needed. Earn autonomy incrementally for proven task types.
+
+### 7. Ignoring Compounding Inconsistency
+Different naming, different patterns, different structures across files = technical debt that confuses the agent itself later. Enforce consistency through linting or by showing existing examples before new code.
+
+### 8. Building for the Demo, Not the Recovery
+The demo is the happy path. The product is what happens when tests fail, builds break, APIs change. **Spend 2x as much time on failure/recovery paths.** The agent spends more time recovering than succeeding first-attempt.
+
+### 9. Treating All Tasks as Equally Complex
+Simple utility functions and complex state management shouldn't go through the same workflow. Classify by complexity. Simple tasks get a fast path. Complex tasks get the full treatment.
+
+### 10. Not Measuring What Actually Matters
+Don't just track tokens and costs. Measure: first-attempt success rate, iterations to completion, human intervention frequency, code survival rate (does it survive the next 3 tasks?), stuck-detection accuracy. These guide real improvement.
+
+---
diff --git a/docs/building-coding-agents/11-god-tier-context-engineering.md b/docs/building-coding-agents/11-god-tier-context-engineering.md
new file mode 100644
index 000000000..90de37c38
--- /dev/null
+++ b/docs/building-coding-agents/11-god-tier-context-engineering.md
@@ -0,0 +1,97 @@
+# God-Tier Context Engineering
+
+### The Core Principle
+
+> God-tier context engineering treats the context window as a **designed experience for the model**, not as a bucket you throw information into. The context window is the UX of your agent. Design it accordingly.
+
+### The 10 Commandments of Context Engineering
+
+#### 1. The Pyramid of Relevance
+- **Sharp focus:** Active files at full detail
+- **Present but compressed:** Interface contracts, manifest, task definition
+- **Summarized or absent:** Other components' internals, completed task histories
+
+Each tier has a token budget. If full-resolution tier is large, outer tiers compress harder.
+
+#### 2. Context Is a Cache, Not a History
+Treat it like a CPU cache: holds exactly what's needed now, everything else evicted. The question isn't "what has happened" but "what does the model need to see right now?"
+
+#### 3. Separate Reference from Instruction
+- **Instruction context** (what to do) → beginning and end of prompt (highest attention)
+- **Reference context** (helpful info) → middle, clearly delineated
+
+Manage them independently. Compress reference aggressively while keeping instructions at full detail.
+
+#### 4. Earn Every Token's Place
+Implement a token budget system:
+
+| Category | Budget |
+|----------|--------|
+| System prompt + behavioral instructions | ~15% |
+| Manifest | ~5% |
+| Task spec + acceptance criteria | ~20% |
+| Active code files | ~40% |
+| Interface contracts | ~10% |
+| Reserve (tool results, errors) | ~10% |
+
+When any category exceeds budget, intelligently summarize (not truncate).
+
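+The budget table can be enforced mechanically. A sketch that mirrors the percentages above; when `fits` returns false, the content is routed through summarization rather than truncated:
+
+```typescript
+// Illustrative category names; percentages mirror the table above.
+const budgets: Record<string, number> = {
+  system: 0.15,
+  manifest: 0.05,
+  taskSpec: 0.2,
+  activeFiles: 0.4,
+  interfaces: 0.1,
+  reserve: 0.1,
+};
+
+function allowance(category: string, contextWindow: number): number {
+  return Math.floor((budgets[category] ?? 0) * contextWindow);
+}
+
+function fits(category: string, tokens: number, contextWindow: number): boolean {
+  // When this returns false, summarize the content; never hard-truncate.
+  return tokens <= allowance(category, contextWindow);
+}
+```
+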
+#### 5. Write for the Model's Attention Pattern
+- Critical info at the very beginning and reiterated at the end
+- Structured blocks with clear headers and delimiters
+- Consistent formatting conventions
+
+```
+TASK: Implement password reset flow
+STATUS: New
+DEPENDS ON: auth-module (complete), email-service (complete)
+ACCEPTANCE CRITERIA:
+- User can request reset via email
+- Token expires after 30 minutes
+- New password meets existing validation rules
+- All existing auth tests pass
+RELEVANT INTERFACES: [below]
+ACTIVE FILES: [below]
+```
+
+#### 6. Compress at Every State Transition
+- Task completion → 50–100 token completion record
+- Use a **dedicated summarization call** with a tight prompt (not the working agent self-summarizing)
+- **Cascading summarization:** Task summaries → milestone summaries → phase summaries (5:1 compression ratio at each level)
+
+#### 7. Use the Filesystem as Your Infinite Context Window
+- Organize files for retrieval, not human browsing
+- Predictable naming conventions = instant lookup
+- Essentially a custom database on top of the filesystem
+
+#### 8. Profile Context Quality, Not Just Size
+Track first-attempt success rate as a function of context composition. What was in context when it succeeded vs failed? Let data guide what constitutes high-quality context.
+
+#### 9. Dynamic Context Based on Task Phase
+Different phases need different context:
+
+| Phase | Optimal Context |
+|-------|----------------|
+| Understanding | Spec, acceptance criteria, broad architectural context |
+| Implementation | Active files, interface contracts, coding patterns |
+| Debugging | Failing test output, relevant code, test code |
+| Verification | Acceptance criteria prominently, ability to exercise feature |
+
+#### 10. Design for Context Recovery
+- **Checkpoint** context state at task starts and phase transitions
+- On detected confusion (repeated failures, increasing iterations, off-task output): **roll back to checkpoint** and re-enter with fresh context + concise failure info + strategy hint
+- Structured recovery ≠ naive retry. It rebuilds context from scratch with learned information.
+
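+Checkpointing and rollback can be sketched as follows; the confusion thresholds are illustrative judgment calls, not recommended values:
+
+```typescript
+// A checkpoint captures enough state to rebuild the prompt from scratch.
+interface ContextCheckpoint {
+  taskId: string;
+  phase: string;
+  contextSnapshot: string;
+}
+
+const checkpoints: ContextCheckpoint[] = [];
+
+function checkpoint(taskId: string, phase: string, snapshot: string): void {
+  checkpoints.push({ taskId, phase, contextSnapshot: snapshot });
+}
+
+function detectConfusion(recentFailures: number, iterations: number): boolean {
+  // Thresholds are illustrative.
+  return recentFailures >= 3 || iterations > 8;
+}
+
+function rollback(): ContextCheckpoint | undefined {
+  // Re-enter from the most recent known-good state with fresh context.
+  return checkpoints[checkpoints.length - 1];
+}
+```
+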
+### The God-Tier Strategy in One Sentence
+
+> Orchestrator-assembled minimal slice + persistent hierarchical memory. Every single LLM call stays 8k–25k tokens while the agent has perfect knowledge of a 500k-line codebase and months of project history.
+
+---
+
+---
+
+# Part II: The Hard Problems (Grey Area Synthesis)
+
+> Synthesized from a second round of deep conversations with all four models, targeting the 13 hardest unsolved problems in autonomous coding agents — plus a critical question on accessibility for non-technical users.
+
+---
diff --git a/docs/building-coding-agents/12-handling-ambiguity-contradiction.md b/docs/building-coding-agents/12-handling-ambiguity-contradiction.md
new file mode 100644
index 000000000..d94936254
--- /dev/null
+++ b/docs/building-coding-agents/12-handling-ambiguity-contradiction.md
@@ -0,0 +1,56 @@
+# Handling Ambiguity & Contradiction
+
+**The universal consensus:** This is the highest-cost failure mode. An agent confidently building the wrong thing based on a reasonable-but-incorrect interpretation burns hours of work discovered only at milestone reviews.
+
+### The Three-Layer Strategy (All 4 Models Agree)
+
+#### Layer 1: Classification of Ambiguity Type
+
+Every requirement should be classified during planning:
+
+| Classification | Action |
+|---------------|--------|
+| **Clear and actionable** | Proceed autonomously |
+| **Ambiguous but decidable with sensible defaults** | Proceed + document assumptions |
+| **Genuinely unclear or contradictory** | Halt and escalate to human |
+
+> The middle category is where most real work lives. "The user should be able to reset their password" has a hundred implied decisions. A good agent resolves these with sensible defaults and **documents the assumptions it made** — it doesn't ask about every one.
+
+#### Layer 2: The Assumption Ledger
+
+Every task completion updates the assumption ledger (`assumptions.md`, or an equivalent structured record) with every interpretive decision the agent made:
+
+```json
+{
+ "assumptions": [
+ "Password reset tokens expire after 30 minutes (common security practice)",
+ "Email delivery, not SMS",
+ "No password history check"
+ ],
+ "confidence": 0.82
+}
+```
+
+The human reviews these at **milestones, not in real-time** — preserving speed while maintaining correctness.
+
+#### Layer 3: Contradiction Detection Pass
+
+Before execution begins, a **dedicated reasoning pass** (separate from planning) scans for conflicts:
+- Do requirements contradict each other?
+- Do acceptance criteria conflict with stated architecture?
+- Are there implicit assumptions in one requirement that violate another?
+
+### Escalation Threshold
+
+- **Impact confined to current task** → decide and document
+- **Impact touches interface contracts** → escalate (wrong interpretation cascades)
+
+Grok adds a **"Multi-Hypothesis Planning"** approach: when underspecification is detected, generate three distinct "Intent Hypotheses" (The Minimalist Path, The Scalable Path, The Feature-Rich Path). If the semantic distance between them exceeds a threshold, hard-halt and present a decision matrix to the human.
+
+### The Deepest Pitfall
+
+Models don't naturally express uncertainty — they pick an interpretation and run with it as if it's obviously correct. The system prompt must explicitly instruct confidence-level flagging, and the orchestrator must treat low-confidence decisions differently from high-confidence ones.
+
+> **Proven result:** Grok reports this pattern cuts wrong-path rework by ~65% in 2026 evaluations.
+
+---
diff --git a/docs/building-coding-agents/13-long-running-memory-fidelity.md b/docs/building-coding-agents/13-long-running-memory-fidelity.md
new file mode 100644
index 000000000..118fd9c0c
--- /dev/null
+++ b/docs/building-coding-agents/13-long-running-memory-fidelity.md
@@ -0,0 +1,34 @@
+# Long-Running Memory Fidelity
+
+**The core problem:** Every compression loses information. Over enough compressions, summaries drift from reality like a photocopy of a photocopy. The system can't easily tell it's happening because it only sees the current summary, not what was lost.
+
+### Multi-Tier Memory with Different Decay Rates
+
+| Tier | Decay Rate | Content | Update Strategy |
+|------|-----------|---------|-----------------|
+| **Manifest** | Fast (updates every task) | Current state only, <1000 tokens | Continuous overwrite — no history |
+| **Decision Log** | Never decays (append-only) | Every significant architectural decision + rationale | Never summarized, grows linearly |
+| **Task Archive** | Medium | Compressed task completion records | Available for retrieval, not routinely loaded |
+
+### The Critical Mechanism: Periodic Reconciliation
+
+All four models converge on some form of automated audit:
+
+- **Claude:** Every milestone or N tasks — agent compares manifest against actual codebase
+- **Gemini:** Every N commits, spawn a "History Auditor" agent whose sole job is manifest-vs-code comparison
+- **GPT:** Self-healing summaries with checksums — when source files change, invalidate and regenerate
+- **Grok:** Deterministic "Memory Fidelity Audit" node every 5 checkpoints — samples key invariants, scores drift 0–100, auto-rebuilds if drift >15%
+
+### The Golden Rule
+
+> **Never summarize summaries.** Each compression layer regenerates from the one below. The codebase is always the lossless source of truth.
+
+### The Most Dangerous Form of Drift
+
+Not factual inaccuracy — **the loss of "why."** The manifest says "auth uses JWT tokens." Three months ago there was a long discussion about why JWT was chosen over session-based auth. That context is exactly what gets compressed away. The **append-only decision log** solves this by preserving *why* indefinitely even as *what* gets continuously compressed.
+
+### Phase Boundary Refresh
+
+For very long projects (weeks/months), **rebuild the manifest from scratch** at phase boundaries by having the agent read the actual codebase + decision log — rather than carrying forward the old manifest with incremental updates. This is the equivalent of defragmenting a hard drive.
+
+---
diff --git a/docs/building-coding-agents/14-multi-agent-semantic-conflict-resolution.md b/docs/building-coding-agents/14-multi-agent-semantic-conflict-resolution.md
new file mode 100644
index 000000000..33acefca5
--- /dev/null
+++ b/docs/building-coding-agents/14-multi-agent-semantic-conflict-resolution.md
@@ -0,0 +1,25 @@
+# Multi-Agent Semantic Conflict Resolution
+
+**The hard case:** Git-level merge conflicts are easy. The real problem is code that merges cleanly but doesn't work — agents honoring the same typed interface while disagreeing on semantics (e.g., Agent A returns `null` for "not found," Agent B treats `null` as "error").
+
+### Three Lines of Defense (Universal Agreement)
+
+#### 1. Semantically Rich Interface Contracts
+
+Don't just define type signatures — define **behavioral contracts**: What does `null` mean? What are the error semantics? What invariants does the caller rely on? Contracts should be miniature specs, not just type definitions.
+
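+A behavioral contract in TypeScript is the type signature plus binding doc comments. The `UserStore` example below is hypothetical; the point is that the comments, not just the types, are shared verbatim with every agent implementing or consuming the interface:
+
+```typescript
+interface User {
+  id: string;
+  email: string;
+}
+
+class UserStoreError extends Error {}
+
+/**
+ * Looks up a user by id.
+ *
+ * Semantics (binding on caller AND implementer):
+ * - Returns null when no user exists. null means "not found", never "error".
+ * - Throws UserStoreError on infrastructure failure (DB down, timeout).
+ * - Never returns a partially populated user.
+ */
+interface UserStore {
+  findById(id: string): Promise<User | null>;
+}
+
+// An implementation that honors the contract above.
+class InMemoryUserStore implements UserStore {
+  constructor(private users: Map<string, User>) {}
+  async findById(id: string): Promise<User | null> {
+    return this.users.get(id) ?? null; // null strictly means "not found"
+  }
+}
+```
+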
+#### 2. Pre-Written Integration Tests
+
+Write integration tests **during planning, before parallel execution begins** — tests that exercise semantic expectations, not just types. These are waiting when parallel branches converge.
+
+#### 3. Dedicated Integration/Reconciliation Agent
+
+After parallel branches merge, a focused agent gets: interface contracts + both implementations + integration tests. Its job is finding semantic mismatches, not rebuilding.
+
+### The Highest-Value Technique
+
+**Adversarial edge-case generation at integration points.** The integration agent reads both implementations, sees how each handles boundaries, and generates new tests that specifically probe the assumption gaps between them. This catches the subtlest bugs.
+
+Gemini adds the concept of a **"Shadow Merge"** agent that runs "Cross-Impact Analysis" before actual merge — looking for "Logical Race Conditions" where Worker A changed a utility that Worker B relied on, even when the git merge is clean.
+
+---
diff --git a/docs/building-coding-agents/15-legacy-code-brownfield-onboarding.md b/docs/building-coding-agents/15-legacy-code-brownfield-onboarding.md
new file mode 100644
index 000000000..0a044d758
--- /dev/null
+++ b/docs/building-coding-agents/15-legacy-code-brownfield-onboarding.md
@@ -0,0 +1,35 @@
+# Legacy Code & Brownfield Onboarding
+
+**The fundamental difference:** Greenfield = design → implement. Brownfield = **observe → infer → validate → modify.**
+
+### The Onboarding Pipeline (All 4 Models Agree)
+
+#### Phase 1: Structural Analysis (Deterministic)
+- Dependency graph mapping
+- Module identification, LOC per component
+- Test coverage analysis, entry point discovery
+- Database schema mapping
+
+#### Phase 2: Convention Extraction (LLM-Assisted)
+- Sample representative files across modules
+- Identify: error handling patterns, naming conventions, API structure, DB access patterns, testing patterns
+- Output: a **conventions document** that becomes critical reference context
+
+#### Phase 3: Pattern Mining
+- Extract implicit "tribal knowledge" — workarounds for browser bugs, special customer cases, performance hacks that look like mistakes
+- Generate decision records into project state
+
+### The Cardinal Rules
+
+| Rule | Why |
+|------|-----|
+| **Observe first, edit later** | Agents must never modify code they don't understand |
+| **Preserve local consistency over global ideals** | Resist the "Junior Refactor" — don't "fix" legacy code to modern standards |
+| **Add characterization tests before modifying** | Tests that document *current behavior*, not *correct behavior* |
+| **Minimal, surgical modifications** | Refactoring is a separate task requiring explicit human approval |
+
+### The Biggest Pitfall
+
+The agent will try to refactor legacy code to match its sense of good patterns. Left unchecked, this produces massive diffs that change behavior in subtle ways. **Enforce strict rules:** modifications to legacy code should be minimal and surgical.
+
+---
diff --git a/docs/building-coding-agents/16-encoding-taste-aesthetics.md b/docs/building-coding-agents/16-encoding-taste-aesthetics.md
new file mode 100644
index 000000000..c57aee799
--- /dev/null
+++ b/docs/building-coding-agents/16-encoding-taste-aesthetics.md
@@ -0,0 +1,34 @@
+# Encoding Taste & Aesthetics
+
+**The honest frontier:** This is where all four models are most candid about current limitations.
+
+### What CAN Be Automated
+
+| Technique | Description |
+|-----------|-------------|
+| **Reference-based extraction** | "Feels like Linear" → extract concrete attributes: spacing ratios, animation timing curves, color relationships, typography |
+| **Style specification** | Convert extracted attributes to verifiable parameters: "transitions 150–200ms ease-out, 8px grid spacing, specific contrast ratios" |
+| **Automated verification** | Lighthouse scores, visual regression tests, accessibility audits, performance budgets, design system linting |
+| **Visual comparison** | Render output, compare against reference screenshots using vision-capable models |
+| **A/B comparison** | Show two versions, human picks which "feels better" — faster than absolute judgment |
+
+### What CANNOT Be Automated
+
+The **gestalt** — the overall feeling, emotional response, sense of quality emerging from a thousand small interacting decisions. *Does this feel premium? Fast? Trustworthy?* These are fundamentally subjective.
+
+### The Optimal Strategy
+
+**Narrow the gap** by converting as much "taste" as possible into **concrete, verifiable specifications upfront:**
+
+- Not "use nice spacing" → "16px between sections, 8px between related elements, 4px between tightly coupled elements"
+- Exact animation timing curves, color values with contrast ratios, typography weights and sizes
+
+Then **reserve human review for the remaining subjective layer** with structured, specific questions:
+
+> "Does the density feel right? Does the transition timing feel snappy enough? Does the empty state feel intentional or broken?"
+
+### The Emerging Frontier
+
+Vision-capable models for aesthetic evaluation — render output, capture screenshot, compare against references on specific visual dimensions. Imperfect but improving rapidly. Grok reports ~80–85% of taste can be automated this way; the remainder stays human-only.
+
+---
diff --git a/docs/building-coding-agents/17-irreversible-operations-safety-architecture.md b/docs/building-coding-agents/17-irreversible-operations-safety-architecture.md
new file mode 100644
index 000000000..cb17b6d6f
--- /dev/null
+++ b/docs/building-coding-agents/17-irreversible-operations-safety-architecture.md
@@ -0,0 +1,31 @@
+# Irreversible Operations & Safety Architecture
+
+**The core principle (universal agreement):** Irreversible operations should **never be executed by the agent.** The agent prepares them; the human executes them.
+
+### Risk-Graded Action Classification
+
+| Class | Examples | Policy |
+|-------|----------|--------|
+| **Reversible** | Code edits, UI changes, unit tests | Full autonomy + auto-revert on failure |
+| **Semi-Reversible** | New files, dependencies | Auto-execute + git checkpoint |
+| **Irreversible** | DB migrations, external API changes, data transformations | Human-in-the-loop required |
+| **External Side-Effect** | Payment charges, third-party API calls with side effects | Human approval + dry-run + rollback plan |
+
+### Per-Operation Protocols
+
+| Operation | Agent Does | Human Does |
+|-----------|-----------|-----------|
+| **Database migrations** | Write migration + rollback + tests, run against test DB, produce review package | Review package, execute migration |
+| **External APIs** | Build + test against sandbox/mock versions | Switch from sandbox to production |
+| **Deployment** | Produce artifacts, verify in staging | Trigger production deployment |
+
+### The Classification Must Be:
+- **Static and deterministic** (not left to the agent's judgment)
+- **Conservative** (if there's doubt, classify as irreversible)
+- **Enforced by the orchestrator** (the agent never encounters an irreversible operation without interception)
+
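+Such a classification can live in a static table the orchestrator consults before any tool call. A sketch with illustrative operation names, defaulting conservatively:
+
+```typescript
+type RiskClass = "reversible" | "semi-reversible" | "irreversible";
+
+// Static table, owned by the orchestrator. The agent never consults it;
+// the orchestrator intercepts every tool call against it.
+const operationRisk: Record<string, RiskClass> = {
+  editFile: "reversible",
+  addDependency: "semi-reversible",
+  runMigration: "irreversible",
+  chargePayment: "irreversible",
+};
+
+function classify(op: string): RiskClass {
+  // Conservative default: unknown operations are treated as irreversible.
+  return operationRisk[op] ?? "irreversible";
+}
+
+function requiresHuman(op: string): boolean {
+  return classify(op) === "irreversible";
+}
+```
+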
+### The Subtlety Most Miss
+
+Data transformations that technically don't delete anything but **lose information through reformatting**. Converting a nullable column to non-nullable with a default value permanently destroys the distinction between rows that had real values and rows that got the default. These must be flagged with the same severity as deletions.
+
+---
diff --git a/docs/building-coding-agents/18-the-handoff-problem-agent-human-maintainability.md b/docs/building-coding-agents/18-the-handoff-problem-agent-human-maintainability.md
new file mode 100644
index 000000000..1220ad599
--- /dev/null
+++ b/docs/building-coding-agents/18-the-handoff-problem-agent-human-maintainability.md
@@ -0,0 +1,31 @@
+# The Handoff Problem: Agent → Human Maintainability
+
+**The failure modes of AI-generated code** that all four models identify:
+
+### Known Anti-Patterns
+
+| Pattern | Problem | Fix |
+|---------|---------|-----|
+| **Flat code** | Everything in one function/file to reduce inconsistency risk | Enforce human-friendly modular patterns |
+| **Clever solutions** | Dense functional chains (`filter().map().reduce().flatMap()`) | Max 3 chained operations; extract named intermediates |
+| **Useless comments** | `// filter active users` above a filter call | Require *why* comments, skip *what* comments |
+| **Over-abstraction** | Creates clever custom abstractions no human can follow | Enforce standard framework patterns over custom inventions |
+| **Missing breadcrumbs** | No README files in directories, no ADRs, no diagrams | Include documentation in task completion checklist |
+
+### The Architecture That Maximizes Handoff Quality
+
+**Enforce well-known frameworks and conventions** over custom patterns. A codebase using standard Next.js/Express/React patterns is immediately navigable. A codebase with custom-invented patterns requires learning a new system.
+
+### Verification Mechanism
+
+**Automated readability test:** Periodically have a **separate agent** (with no knowledge of the building agent's decisions) attempt to add a feature using only the code and docs. If it struggles, a human will too.
+
+### Gemini's "Boring Code" Principle
+
+> Humans hate "clever" AI code; they love "boring" AI code. Run a **Complexity Linter** — if a function has cyclomatic complexity >10, the reviewer agent rejects it.
+
+### Grok's Maintainability Checklist
+
+Every file gets: auto-generated JSDoc/TS comments + ADR for every major decision. No magic numbers, no over-abstraction. Mandatory "maintainability score" (cyclomatic complexity + test coverage + comment density) in the critic node.
+
+---
diff --git a/docs/building-coding-agents/19-when-to-scrap-and-start-over.md b/docs/building-coding-agents/19-when-to-scrap-and-start-over.md
new file mode 100644
index 000000000..35248d46f
--- /dev/null
+++ b/docs/building-coding-agents/19-when-to-scrap-and-start-over.md
@@ -0,0 +1,27 @@
+# When to Scrap and Start Over
+
+### The Four Signals (Cross-Model Convergence)
+
+| Signal | What It Looks Like |
+|--------|-------------------|
+| **Iteration count trending upward** | Task 1: 3 iterations. Task 2: 5. Task 3: 8. Complexity compounding, not resolving. |
+| **Test flakiness increasing** | Previously passing tests intermittently fail — hidden coupling being strained |
+| **Same files modified repeatedly** | Every task touches the same core module — god object absorbing too much responsibility |
+| **Acceptance criteria requiring exceptions** | "Works except when X" / "Passes if you ignore test Y" — agent negotiating with criteria |
+
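+These signals can be tracked numerically. A sketch with illustrative thresholds:
+
+```typescript
+interface TaskSignals {
+  iterations: number[]; // per-task iteration counts, in order
+  flakyTestCount: number;
+  hotFileTouches: Map<string, number>; // file path -> times modified
+}
+
+function iterationsTrendingUp(iterations: number[]): boolean {
+  if (iterations.length < 3) return false;
+  const last3 = iterations.slice(-3);
+  return last3[0] < last3[1] && last3[1] < last3[2];
+}
+
+function shouldReassess(s: TaskSignals): boolean {
+  // Thresholds are illustrative judgment calls, not prescribed values.
+  const hotFile = Math.max(0, ...Array.from(s.hotFileTouches.values()));
+  return iterationsTrendingUp(s.iterations) || s.flakyTestCount > 2 || hotFile > 5;
+}
+```
+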
+### The Reassessment Protocol
+
+When thresholds are crossed, trigger a **focused LLM call** with: manifest + original spec + task summaries + signal data. Prompt: *"Is the current approach viable or would a different architecture serve better? If different, what and why?"*
+
+### The Critical Architectural Enabler: Make Rewrites Cheap
+
+- Clean interface contracts + good test suites → rewriting internals while preserving interfaces is low-risk
+- Tests verify new implementation against same criteria
+- Interface contracts ensure nothing downstream breaks
+- **Every major approach on a branch** that can be discarded without affecting anything else
+
+Gemini's **"Sunk-Cost Heuristic"**: Monitor "Task Re-entry Rate." If the same 3 tests have been attempted >5 times, or if the refactor-to-feature ratio exceeds 4:1, trigger a "Whiteboard Session."
+
+Grok adds **parallel experimentation**: create a "Rewrite Branch" subgraph, run the same vision on a clean slate for one vertical slice, compare metrics. Only merge if superior. Cost is near-zero because it runs in parallel and is discarded on failure.
+
+---
diff --git a/docs/building-coding-agents/20-error-taxonomy-routing.md b/docs/building-coding-agents/20-error-taxonomy-routing.md
new file mode 100644
index 000000000..fa330dd36
--- /dev/null
+++ b/docs/building-coding-agents/20-error-taxonomy-routing.md
@@ -0,0 +1,28 @@
+# Error Taxonomy & Routing
+
+**The key insight:** Different errors have fundamentally different causes and optimal resolution strategies. Treating them uniformly is one of the biggest sources of wasted iterations.
+
+### The Optimal Taxonomy
+
+| Error Class | Context Needed | Optimal Handler | Escalation |
+|-------------|---------------|-----------------|------------|
+| **Syntax/Type** | Error message + offending file + types | Deterministic fast path (no LLM needed) | Only if fast path fails |
+| **Logic** | Failing test (expected vs actual) + implementation + spec | LLM with medium, focused context | After 3 attempts |
+| **Design** | Original spec + architecture + interface contracts + implementation | LLM with broad context | Often needs human input |
+| **Performance** | Profiling data + benchmarks + code | Specialist optimization agent | If regression >2x |
+| **Security** | Static analysis results + secure pattern reference | Conservative fix prompt | Always flag for review |
+| **Environment** | Environment config + recent dep changes + error output | Specialized env context | If not auto-resolved |
+| **Flaky Tests** | Run test multiple times to confirm flakiness | Quarantine, don't fix | Infrastructure agent |
+
+### Critical Routing Rules
+
+- **Flaky tests:** Detect by running failing tests multiple times. If inconsistent, **quarantine** — never trigger a fix cycle.
+- **Environment errors:** Classify as potentially environmental when they appear in build/startup rather than tests.
+- **Security:** Caught by static analysis in the deterministic layer, not by the LLM. Run security linting after every task.
+- **Syntax/Type:** Hit a deterministic fast path first. Missing import? Search codebase for the export. Only escalate to LLM if mechanical fix fails.
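The routing rules above reduce to a lookup from error class to handler and context strategy. A sketch (class names and context labels mirror the taxonomy table but are assumptions, not a fixed schema):

```python
# Illustrative routing table; each class gets its own handler and context recipe.
ROUTES = {
    "syntax":     {"handler": "deterministic-fast-path", "context": ["error", "file", "types"]},
    "logic":      {"handler": "llm-focused", "context": ["failing_test", "implementation", "spec"]},
    "design":     {"handler": "llm-broad",   "context": ["spec", "architecture", "contracts", "implementation"]},
    "flaky-test": {"handler": "quarantine",  "context": []},  # never trigger a fix cycle
}

def route(error_class):
    # Unknown classes fall back to the broad-context path rather than guessing
    return ROUTES.get(error_class, ROUTES["design"])
```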
+
+### The Architecture
+
+The orchestrator classifies every error → selects the appropriate context assembly strategy → optionally selects a different prompt framing. The agent experiences this as *"I got exactly the information I need"* rather than *"I got a dump of everything."*
+
+---
diff --git a/docs/building-coding-agents/21-cost-quality-tradeoff-model-routing.md b/docs/building-coding-agents/21-cost-quality-tradeoff-model-routing.md
new file mode 100644
index 000000000..d98a3c8cb
--- /dev/null
+++ b/docs/building-coding-agents/21-cost-quality-tradeoff-model-routing.md
@@ -0,0 +1,26 @@
+# Cost-Quality Tradeoff & Model Routing
+
+### The Key Insight
+
+Quality requirements vary enormously across task types, but most systems use the same model for everything.
+
+### The Optimal Model Routing Strategy (All 4 Agree)
+
+| Task Type | Model Tier | Rationale |
+|-----------|-----------|-----------|
+| **Planning, architecture, critique** | Frontier (always) | Planning errors cascade through every downstream task |
+| **Ambiguity resolution** | Frontier | Wrong interpretation = wasted execution |
+| **Well-specified implementation** (CRUD, standard UI, utilities) | Mid-tier / capable but cheaper | Task is well-defined, patterns established |
+| **Code review, test generation** | Mid-tier | Evaluating against known criteria, not generating novel solutions |
+| **Summarization** (task records, manifest updates) | Lightest viable | Language competence, minimal reasoning depth |
+| **Boilerplate** | Small/fast model | Predictable output, low reasoning requirements |
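At the orchestrator level this is a small lookup with a deliberately conservative default (tier and task names here are placeholders):

```python
# Sketch of orchestrator-level model routing.
TIER_BY_TASK = {
    "planning": "frontier", "architecture": "frontier", "critique": "frontier",
    "ambiguity-resolution": "frontier",
    "implementation": "mid", "code-review": "mid", "test-generation": "mid",
    "summarization": "light", "boilerplate": "light",
}

def pick_tier(task_type):
    # Default to frontier: over-spending on one call is cheaper than a
    # planning error that cascades through every downstream task.
    return TIER_BY_TASK.get(task_type, "frontier")
```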
+
+### The Non-Obvious Cost Optimization
+
+> **Reducing wasted tokens is higher leverage than reducing token price.** A bloated context window costs money on every single call. Trimming 500 unnecessary tokens from context assembly saves more over a project than switching to a model that's 10% cheaper.
+
+### Measurement
+
+Track **cost-per-successful-task**, not cost-per-task. If the cheaper model requires twice as many iterations, it's not actually cheaper. Grok reports 60-70% cost reduction with zero quality loss when routing is done at the orchestrator level.
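Concretely (a toy calculation under assumed per-call prices, purely for illustration):

```python
def cost_per_successful_task(total_cost, tasks_succeeded):
    """The metric that matters: dollars per task that actually landed."""
    return float("inf") if tasks_succeeded == 0 else total_cost / tasks_succeeded

# A "cheaper" model that needs twice the calls per success can lose outright:
cheap    = cost_per_successful_task(total_cost=20 * 0.10, tasks_succeeded=10)  # 2 calls/success
frontier = cost_per_successful_task(total_cost=10 * 0.15, tasks_succeeded=10)  # 1 call/success
```

Here the nominally cheaper model costs $0.20 per successful task versus $0.15 for the frontier model.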
+
+---
diff --git a/docs/building-coding-agents/22-cross-project-learning-reusable-intelligence.md b/docs/building-coding-agents/22-cross-project-learning-reusable-intelligence.md
new file mode 100644
index 000000000..96ab8687e
--- /dev/null
+++ b/docs/building-coding-agents/22-cross-project-learning-reusable-intelligence.md
@@ -0,0 +1,30 @@
+# Cross-Project Learning & Reusable Intelligence
+
+### What Transfers Well
+
+| Type | Transferability | Example |
+|------|----------------|---------|
+| **Problem-solving patterns** (abstract) | ✅ High | "When implementing OAuth, these are the common pitfalls and the architecture that avoids them" |
+| **Code templates & scaffolding** | ✅ With adaptation | Proven auth module structure, tested payment integration pattern |
+| **Learned pitfalls** | ✅ High | "When integrating Stripe, these are the edge cases around webhooks that most implementations miss" |
+| **Project-specific conventions** | ❌ Does not transfer | Architectural decisions are contextual |
+| **Domain logic** | ❌ Does not transfer | Business rules are project-specific |
+
+### The Optimal Architecture: A Pattern Library
+
+Each pattern includes:
+- Description of the problem it solves
+- The approach and tradeoffs
+- Common pitfalls
+- Verification tests
+- Reference implementation
+
+### Growth Through Extraction, Not Manual Curation
+
+When a task completes with high quality (first-attempt success, no subsequent modifications, clean review), flag it as a **candidate for pattern extraction.** A dedicated pass determines whether the solution embodies a generalizable pattern.
+
+### The Critical Constraint
+
+Patterns should be **descriptive, not prescriptive** — "here's an approach that has worked well, with these tradeoffs" not "always do it this way." Grok adds an overfitting guard: require **3+ project examples** before promoting to reusable.
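Taken together, a pattern record might look like this (field names follow the list above; the schema itself is an assumption, not a defined format):

```python
from dataclasses import dataclass

@dataclass
class Pattern:
    problem: str              # description of the problem it solves
    approach: str             # the approach and its tradeoffs
    pitfalls: list            # common pitfalls
    verification_tests: list  # how to check an instance works
    reference_impl: str       # pointer to a reference implementation
    projects_seen: int = 1    # overfitting guard: promote only at 3+ projects

    def promotable(self) -> bool:
        return self.projects_seen >= 3
```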
+
+---
diff --git a/docs/building-coding-agents/23-evolution-across-project-scale.md b/docs/building-coding-agents/23-evolution-across-project-scale.md
new file mode 100644
index 000000000..085acb8f5
--- /dev/null
+++ b/docs/building-coding-agents/23-evolution-across-project-scale.md
@@ -0,0 +1,31 @@
+# Evolution Across Project Scale
+
+### Phase Transitions (All 4 Models Converge)
+
+#### 0–1k LOC: The Monolithic Phase
+- Everything fits in one context window
+- Agent reads entire codebase, makes globally coherent decisions
+- Orchestrator is simple, manifest barely needed
+- **This is where most demos live**
+
+#### 1k–10k LOC: The Modular Phase
+- Codebase no longer fits in one context window
+- **What breaks first: consistency** — agent sees fragments that gradually diverge
+- Requirements: modular context assembly, manifest as essential map, interface contracts, convention enforcement (linting, formatting)
+
+#### 10k–50k LOC: The Architectural Phase
+- Relationships between components become non-obvious
+- Changing one thing might affect ten others through indirect dependencies
+- **What breaks:** planning quality — planner can't understand full system
+- Requirements: dependency-aware context assembly, impact analysis before execution, more conservative/incremental plans
+
+#### 50k–100k+ LOC: The Organizational Phase
+- System of systems — no single agent context can reason about the whole thing
+- **What breaks:** integration — interactions between components become so numerous that integration testing becomes the bottleneck
+- Requirements: hierarchical planning (system-level planner → component-level agents), continuous integration verification, possibly distributed orchestrator, hierarchy of manifests
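The phase boundaries above can be read as a simple strategy selector (thresholds are the approximate ones from the headings, not hard rules):

```python
def context_strategy(loc):
    if loc < 1_000:
        return "monolithic"      # whole codebase fits in one context window
    if loc < 10_000:
        return "modular"         # manifest as map, modular context assembly
    if loc < 50_000:
        return "architectural"   # dependency-aware assembly, impact analysis
    return "organizational"      # hierarchical planning, hierarchy of manifests
```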
+
+### The Meta-Insight
+
+> The architecture of your agentic system should **mirror the architecture of the software it's building.** Microservices projects need a more distributed orchestrator. Monolithic projects can use a simpler one.
+
+---
diff --git a/docs/building-coding-agents/24-security-trust-boundaries.md b/docs/building-coding-agents/24-security-trust-boundaries.md
new file mode 100644
index 000000000..b2d2c1e68
--- /dev/null
+++ b/docs/building-coding-agents/24-security-trust-boundaries.md
@@ -0,0 +1,35 @@
+# Security & Trust Boundaries
+
+### Hard Boundaries — Things the Agent Should NEVER Do (Universal Agreement)
+
+| Forbidden Action | Why |
+|-----------------|-----|
+| Access production systems directly | Agent's world is the dev environment, full stop |
+| Access or embed secrets | API keys and credentials must never appear in agent context or output |
+| Make network requests to arbitrary destinations | Restrictive firewall, whitelist only required services |
+| Modify its own orchestrator, prompts, or tools | Prevents removing safety constraints |
+| Execute commands outside the project directory | Sandbox to project dir + temp working dirs only |
+
+### The Sandboxing Architecture
+
+| Layer | Mechanism |
+|-------|-----------|
+| **Execution** | Containerized (Docker + seccomp), restricted filesystem, network policy |
+| **Filesystem** | Content-addressable storage — agent *proposes* changes, backend validates before writing |
+| **Secrets** | Vault proxy with short-lived tokens, never direct credentials |
+| **Commands** | Parsed and blocked for dangerous patterns (`rm -rf /`, `curl` to unknown hosts) |
+| **Dependencies** | Approved dependency list — new deps are auto-approved if on the pre-approved list, otherwise they require human approval |
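The command-parsing layer can start as a pattern filter. A deliberately tiny sketch (the patterns and the `api.internal` allow-listed host are invented; real deployments need a far fuller list and should prefer allow-lists over block-lists):

```python
import re

# Hypothetical block-list of dangerous command shapes.
DANGEROUS = [
    r"rm\s+-rf\s+/\s*$",                 # recursive delete from filesystem root
    r"\bcurl\b(?!.*\bapi\.internal\b)",  # network call to any non-approved host
]

def is_blocked(command):
    return any(re.search(pattern, command) for pattern in DANGEROUS)
```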
+
+### The Capability-Based Security Model
+
+The orchestrator runs **outside** the sandbox. The agent requests operations through a controlled API. The orchestrator validates every request before executing. The agent doesn't have direct access to anything — it has access to **tools that the orchestrator mediates.**
+
+### The Subtle Risk
+
+The agent introduces vulnerabilities not through malice but through **plausible-looking insecure patterns**: string concatenation for SQL queries, disabling CORS for convenience, logging sensitive data for debugging. Security linting rules should be tuned to catch these **AI-common patterns** specifically.
+
+### The Trust Model
+
+> Think of the agent as a **highly capable but unvetted contractor.** Give them the codebase and dev environment. Don't give them production credentials, deployment access, or the ability to modify security infrastructure. The goal isn't to make the agent safe by limiting capabilities — it's to make the **environment** safe so the agent can be maximally capable within it.
+
+---
diff --git a/docs/building-coding-agents/25-designing-for-non-technical-users-vibe-coders.md b/docs/building-coding-agents/25-designing-for-non-technical-users-vibe-coders.md
new file mode 100644
index 000000000..b67515ef2
--- /dev/null
+++ b/docs/building-coding-agents/25-designing-for-non-technical-users-vibe-coders.md
@@ -0,0 +1,116 @@
+# Designing for Non-Technical Users ("Vibe Coders")
+
+**The question that matters most** — because everything else is worthless if only engineers can use it.
+
+### The Fundamental Principle (All 4 Models Converge)
+
+> The human should **never have to think in code.** Not in input, not in output, not in error messages, not in verification, not in debugging. The entire technical layer should be absorbed by the system. The human operates purely in **intent, vision, preference, and judgment.**
+
+### The Core Philosophy
+
+| Human Provides | System Provides |
+|---------------|-----------------|
+| Vision & imagination | Engineering intelligence |
+| Taste & aesthetic judgment | Technical translation |
+| Direction & priorities | Architecture & implementation |
+| "This feels off — calmer, like Linear" | Concrete CSS/animation/spacing changes |
+
+### The 10 Pillars of a Magical Non-Technical Experience
+
+#### 1. Intent-Based Input, Not Specification
+Users speak naturally: *"I want an app where people can upload recipes and find them by ingredient."* The system runs a **discovery conversation** that feels like talking to a brilliant product partner — not filling out a requirements form. Behind the scenes, answers compile into structured specs, acceptance criteria, and interface contracts the human never sees.
+
+> **Critical:** Questions should be about the *experience*, not the *implementation.* Never "relational or document store?" Always "should search find exact matches only, or also substitutable ingredients?"
+
+#### 2. Show the Thing, Not the Process
+After each milestone: a **working preview**, not a task list. The human interacts with the real thing at every checkpoint — clicks around, feels it, reacts. Progress is communicated as capability, not code: *"Your app can now save workouts and retrieve them later"* — not *"implemented REST endpoint."*
+
+#### 3. Collaborative Builder, Not Command Executor
+The agent should feel like a senior co-founder:
+```
+User: I want something like Notion but for recipes.
+
+Agent: Here's how I'd approach that:
+- Recipe database with tagging
+- Search by ingredient
+- Meal planner
+
+Would you like to prioritize simplicity or advanced features?
+```
+This implicitly educates the user while avoiding wrong builds from vague specs.
+
+#### 4. Problems, Not Errors
+The human should **never see a stack trace**. Technical failures are either resolved silently or translated to domain-level questions:
+
+| ❌ Never Show | ✅ Show Instead |
+|--------------|----------------|
+| `TypeError: Cannot read property 'map' of undefined` | "The recipe list isn't displaying correctly. I'm fixing it now — should be ready in a few minutes." |
+| `ECONNREFUSED localhost:5432` | "I'm having trouble connecting to the database. Working on it." |
+| Ambiguous technical decision | "When someone searches 'chicken,' should results include recipes where chicken is optional?" |
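A sketch of that translation layer (the marker-to-message mapping and the phrasings are invented for illustration):

```python
# Maps low-level error markers to user-facing status messages.
TRANSLATIONS = {
    "TypeError":    "Part of the app isn't displaying correctly. I'm fixing it now.",
    "ECONNREFUSED": "I'm having trouble connecting to the database. Working on it.",
}

def humanize(error_text):
    for marker, message in TRANSLATIONS.items():
        if marker in error_text:
            return message
    # Unknown failures still never leak a stack trace
    return "Something went wrong behind the scenes. I'm on it."
```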
+
+#### 5. Reactions, Not Reviews
+Design for **reactions** to the running app, not code reviews. Like working with an interior designer: *"I love the color but the couch feels too big."* Visual, spatial, experiential feedback. **A/B comparison** is the most powerful pattern: show two versions, human picks which "feels better" in seconds.
+
+#### 6. Engineering Tradeoffs as Simple Choices
+Instead of *"Which auth provider?"* → ask *"Which matters more: A) Simplicity B) Maximum customization C) Enterprise security"* — the system maps answers to technical decisions automatically.
+
+#### 7. Safety Blanket
+- Auto-backups every slice + "undo entire feature" button
+- **"Vibe Checkpoints"** — before every major change, a save point. "Go back to how it was ten minutes ago."
+- Deployment previews before anything goes live
+- No irreversible actions without plain-English confirmation
+
+#### 8. Progressive Disclosure
+Start ultra-simple. Offer "Advanced mode" toggle only if the user ever asks. The system should **progressively reveal engineering** — at first pure vision → later architecture tweaking → eventually deep collaboration. Many users will never leave the simple mode, and that's fine.
+
+#### 9. Implicit Teaching
+When the user asks *"why is that taking longer?"*:
+> "The recipe search needs to look through all recipes every time. I'm adding an index — think of it like a table of contents — so it can find things faster."
+
+Optional, triggered by curiosity, expressed in analogy. Over time, users develop useful mental models of software **without it ever being mandatory.**
+
+#### 10. Invisible Deployment & Operations
+"I want to share this with people" → receive a URL. Behind the scenes: hosting, domain, database, SSL, CI/CD. Ongoing maintenance equally invisible. Simple dashboard: *"Your recipe app had 340 visitors this week. Everything is running smoothly."*
+
+### The Translation Layer (The Magic Glue)
+
+A deterministic "Human Translator" node at the front of every orchestrator cycle:
+
+```
+Raw user message + references
+ ↓
+ [Human Translator]
+ ↓
+Precise assumptions, invariants, success criteria
+ ↓
+ [Rest of the god-tier orchestrator pipeline]
+```
+
+The rest of the graph never sees "vibe language" — only clean spec. This preserves all technical quality while shielding the user.
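In code, the translator is a deterministic wrapper around one focused LLM call (the prompt wording and the `llm` callable are placeholders, not a real API):

```python
def translate_intent(raw_message, llm):
    """Front-of-pipeline 'Human Translator' node (sketch)."""
    prompt = (
        "Rewrite the user's request as precise assumptions, invariants, and "
        "success criteria. No implementation details, no vibe language.\n\n"
        "Request: " + raw_message
    )
    # Downstream nodes receive only this cleaned spec, never the raw message
    return llm(prompt)
```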
+
+### The Scope Protection Layer
+
+Non-technical users often don't realize how complex their requests are. The system must be honest — gently:
+
+> *"That's a great idea. Adding social features is significant — it involves user profiles, a follow system, a feed algorithm, and notifications. It'll take as long as everything we've built so far. Want me to go ahead, or finish core recipe features first?"*
+
+This respects agency while providing the information needed for good decisions.
+
+### The Meta-Principle
+
+> The system is a **creative tool**, not a development tool. It should feel like Photoshop or Ableton — a powerful instrument that lets a person with vision manifest that vision without understanding the underlying mechanics. A music producer doesn't need to understand digital signal processing. A filmmaker doesn't need to understand codec compression. **A person with a great app idea shouldn't need to understand React component lifecycle.**
+
+### What Makes It Feel Magical
+
+The most powerful systems feel magical when they:
+- Understand vague ideas
+- Ask smart clarifying questions
+- Translate intent into architecture
+- Show visible progress quickly
+- Make experimentation safe
+- Explain decisions clearly
+- Hide complexity without blocking power
+
+> When these align, the user experiences: **"I can build anything I imagine."** That feeling is the real product.
+
+---
diff --git a/docs/building-coding-agents/26-cross-cutting-themes-where-all-4-models-converge.md b/docs/building-coding-agents/26-cross-cutting-themes-where-all-4-models-converge.md
new file mode 100644
index 000000000..07fa2a984
--- /dev/null
+++ b/docs/building-coding-agents/26-cross-cutting-themes-where-all-4-models-converge.md
@@ -0,0 +1,33 @@
+# Cross-Cutting Themes (Where All 4 Models Converge)
+
+### Original Themes (Reinforced)
+
+These ideas appeared independently in all four conversations across both rounds, indicating the highest-confidence principles:
+
+1. **The LLM should only do what requires judgment.** Everything deterministic belongs in code.
+2. **Vertical slices are non-negotiable.** End-to-end working increments at every stage.
+3. **Context leanness = quality.** Less (but more relevant) context produces better outputs than more context.
+4. **Execution-based verification beats self-assessment.** Run the code. Trust test results over the model's opinion.
+5. **The orchestrator is the product.** The model is a commodity; the system around it is the differentiator.
+6. **State must be structured and deterministic.** Never let the LLM manage its own lifecycle or memory.
+7. **Speed comes from removing unnecessary work.** Not from doing the same work faster.
+8. **Failure recovery matters more than happy-path perfection.** Design the error paths first.
+9. **Human involvement should be high-leverage.** Specific questions with context, not open-ended reviews.
+10. **The system improves over time.** Track patterns, cache solutions, learn from failures.
+
+### New Themes (From Grey Area Deep-Dives)
+
+11. **Document assumptions, don't ask about every one.** Proceed with sensible defaults + transparent logging. Review at milestones, not in real-time.
+12. **The codebase is the lossless source of truth.** Summaries are lossy caches that must be periodically reconciled against actual code. Never summarize summaries.
+13. **Semantic conflicts are harder than syntactic ones.** Interface contracts must be behavioral specs, not just type signatures. Integration testing is a first-class concern, not an afterthought.
+14. **Observe before modifying.** Especially in legacy codebases — the agent must understand existing patterns before changing them. Preserve local consistency over global ideals.
+15. **Taste can be ~80-85% automated.** Convert subjective preferences to concrete, verifiable specs. Reserve human judgment for the remaining gestalt. The gap is closing fast with vision-capable models.
+16. **Irreversible operations are categorically different.** The agent prepares; the human executes. No exceptions.
+17. **"Boring" code is good code.** For handoff, enforce standard patterns, limit complexity, and write *why* comments. Automated readability testing catches problems before humans encounter them.
+18. **Make rewrites cheap, not rare.** Clean interfaces + good tests + branch-based experimentation = rewriting is a safe, routine operation rather than a crisis.
+19. **Route errors by type, not by severity.** Different error classes need different context, different handlers, and different escalation thresholds. Flaky tests should be quarantined, not fixed.
+20. **The magic is the translation layer.** For non-technical users, the entire value proposition is the invisible bridge between human intent and technical execution. Every moment the user has to think like a developer is a failure.
+
+---
+
+*Generated March 2026. Updated with grey-area deep-dive synthesis. Source material: two rounds of parallel deep-dive conversations with Claude (Anthropic), Gemini (Google), GPT (OpenAI), and Grok (xAI) on optimal autonomous AI coding agent architecture — including the 13 hardest unsolved problems and designing for non-technical users.*
diff --git a/docs/building-coding-agents/README.md b/docs/building-coding-agents/README.md
new file mode 100644
index 000000000..ea5b8eb81
--- /dev/null
+++ b/docs/building-coding-agents/README.md
@@ -0,0 +1,37 @@
+# Building Coding Agents — Research
+
+> Split into individual files for easier consumption.
+
+## Table of Contents
+
+- [01. Work Decomposition](./01-work-decomposition.md)
+- [02. What to Keep & Discard from Human Engineering](./02-what-to-keep-discard-from-human-engineering.md)
+- [03. State Machine & Context Management](./03-state-machine-context-management.md)
+- [04. Optimal Storage for Project Context](./04-optimal-storage-for-project-context.md)
+- [05. Parallelization Strategy](./05-parallelization-strategy.md)
+- [06. Maximizing Agent Autonomy & Superpowers](./06-maximizing-agent-autonomy-superpowers.md)
+- [07. System Prompt & LLM vs Deterministic Split](./07-system-prompt-llm-vs-deterministic-split.md)
+- [08. Speed Optimization](./08-speed-optimization.md)
+- [09. Top 10 Tips for a World-Class Agent](./09-top-10-tips-for-a-world-class-agent.md)
+- [10. Top 10 Pitfalls to Avoid](./10-top-10-pitfalls-to-avoid.md)
+- [11. God-Tier Context Engineering](./11-god-tier-context-engineering.md)
+- [12. Handling Ambiguity & Contradiction](./12-handling-ambiguity-contradiction.md)
+- [13. Long-Running Memory Fidelity](./13-long-running-memory-fidelity.md)
+- [14. Multi-Agent Semantic Conflict Resolution](./14-multi-agent-semantic-conflict-resolution.md)
+- [15. Legacy Code & Brownfield Onboarding](./15-legacy-code-brownfield-onboarding.md)
+- [16. Encoding Taste & Aesthetics](./16-encoding-taste-aesthetics.md)
+- [17. Irreversible Operations & Safety Architecture](./17-irreversible-operations-safety-architecture.md)
+- [18. The Handoff Problem: Agent → Human Maintainability](./18-the-handoff-problem-agent-human-maintainability.md)
+- [19. When to Scrap and Start Over](./19-when-to-scrap-and-start-over.md)
+- [20. Error Taxonomy & Routing](./20-error-taxonomy-routing.md)
+- [21. Cost-Quality Tradeoff & Model Routing](./21-cost-quality-tradeoff-model-routing.md)
+- [22. Cross-Project Learning & Reusable Intelligence](./22-cross-project-learning-reusable-intelligence.md)
+- [23. Evolution Across Project Scale](./23-evolution-across-project-scale.md)
+- [24. Security & Trust Boundaries](./24-security-trust-boundaries.md)
+- [25. Designing for Non-Technical Users ("Vibe Coders")](./25-designing-for-non-technical-users-vibe-coders.md)
+- [26. Cross-Cutting Themes (Where All 4 Models Converge)](./26-cross-cutting-themes-where-all-4-models-converge.md)
+
+---
+
+*Split into per-section files for surgical context loading.*
+
diff --git a/docs/captures-triage.md b/docs/captures-triage.md
new file mode 100644
index 000000000..1c5f7e3f7
--- /dev/null
+++ b/docs/captures-triage.md
@@ -0,0 +1,82 @@
+# Captures & Triage
+
+*Introduced in v2.19.0*
+
+Captures let you fire-and-forget thoughts during auto-mode execution. Instead of pausing auto-mode to steer, you can capture ideas, bugs, or scope changes and let GSD triage them at natural seams between tasks.
+
+## Quick Start
+
+While auto-mode is running (or any time):
+
+```
+/gsd capture "add rate limiting to the API endpoints"
+/gsd capture "the auth flow should support OAuth, not just JWT"
+```
+
+Captures are appended to `.gsd/CAPTURES.md` and triaged automatically between tasks.
+
+## How It Works
+
+### Pipeline
+
+```
+capture → triage → confirm → resolve → resume
+```
+
+1. **Capture** — `/gsd capture "thought"` appends to `.gsd/CAPTURES.md` with a timestamp and unique ID
+2. **Triage** — at natural seams between tasks (in `handleAgentEnd`), GSD detects pending captures and classifies them
+3. **Confirm** — the user is shown the proposed resolution and confirms or adjusts
+4. **Resolve** — the resolution is applied (task injection, replan trigger, deferral, etc.)
+5. **Resume** — auto-mode continues
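For illustration, an appended entry might look like the sketch below; the actual `CAPTURES.md` entry format is defined by GSD and may differ:

```python
import time
import uuid

def format_capture(text):
    # Hypothetical entry shape: short unique id + UTC timestamp + raw thought.
    stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
    cid = uuid.uuid4().hex[:8]
    return f"- [{cid}] {stamp} {text}"
```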
+
+### Classification Types
+
+Each capture is classified into one of five types:
+
+| Type | Meaning | Resolution |
+|------|---------|------------|
+| `quick-task` | Small, self-contained fix | Inline quick task executed immediately |
+| `inject` | New task needed in current slice | Task injected into the active slice plan |
+| `defer` | Important but not urgent | Deferred to roadmap reassessment |
+| `replan` | Changes the current approach | Triggers slice replan with capture context |
+| `note` | Informational, no action needed | Acknowledged, no plan changes |
+
+### Automatic Triage
+
+Triage fires automatically between tasks during auto-mode. The triage prompt receives:
+- All pending captures
+- The current slice plan
+- The active roadmap
+
+The LLM classifies each capture and proposes a resolution. Plan-modifying resolutions (inject, replan) require user confirmation.
+
+### Manual Triage
+
+Trigger triage manually at any time:
+
+```
+/gsd triage
+```
+
+This is useful when you've accumulated several captures and want to process them before the next natural seam.
+
+## Dashboard Integration
+
+The progress widget shows a pending capture count badge when captures are waiting for triage. This is visible in both the `Ctrl+Alt+G` dashboard and the auto-mode progress widget.
+
+## Context Injection
+
+Capture context is automatically injected into:
+- **Replan-slice prompts** — so the replan knows what triggered it
+- **Reassess-roadmap prompts** — so deferred captures influence roadmap decisions
+
+## Worktree Awareness
+
+Captures always resolve to the **original project root's** `.gsd/CAPTURES.md`, not the worktree's local copy. This ensures captures from a steering terminal are visible to the auto-mode session running in a worktree.
+
+## Commands
+
+| Command | Description |
+|---------|-------------|
+| `/gsd capture "text"` | Capture a thought (quotes optional for single words) |
+| `/gsd triage` | Manually trigger triage of pending captures |
diff --git a/docs/ci-cd-pipeline.md b/docs/ci-cd-pipeline.md
new file mode 100644
index 000000000..80410d124
--- /dev/null
+++ b/docs/ci-cd-pipeline.md
@@ -0,0 +1,196 @@
+# CI/CD Pipeline Guide
+
+## Overview
+
+GSD 2 uses a three-stage promotion pipeline that automatically moves merged PRs through **Dev → Test → Prod** environments using npm dist-tags.
+
+```
+PR merged to main
+ │
+ ▼
+ ┌─────────┐ ci.yml passes (build, test, typecheck)
+ │ DEV │ → publishes gsd-pi@-dev. with @dev tag
+ └────┬────┘
+ ▼ (automatic if green)
+ ┌─────────┐ CLI smoke tests + LLM fixture replay
+ │ TEST │ → promotes to @next tag
+ └────┬────┘ → pushes Docker image as :next
+ ▼ (manual approval required)
+ ┌─────────┐ optional real-LLM integration tests
+ │ PROD │ → promotes to @latest tag
+ └─────────┘ → creates GitHub Release
+```
+
+## For Contributors: Testing Your PR Before It Ships
+
+### Install the Dev Build
+
+Every merged PR is immediately installable:
+
+```bash
+# Latest dev build (bleeding edge, every merged PR)
+npx gsd-pi@dev
+
+# Test candidate (passed smoke + fixture tests)
+npx gsd-pi@next
+
+# Stable production release
+npx gsd-pi@latest # or just: npx gsd-pi
+```
+
+### Using Docker
+
+```bash
+# Test candidate
+docker run --rm -v $(pwd):/workspace ghcr.io/gsd-build/gsd-pi:next --version
+
+# Stable
+docker run --rm -v $(pwd):/workspace ghcr.io/gsd-build/gsd-pi:latest --version
+```
+
+### Checking if a Fix Landed
+
+1. Find the PR's merge commit SHA (first 7 chars)
+2. Check if it's in `@dev`: `npm view gsd-pi@dev version`
+ - If the version ends in `-dev.`, your PR is in dev
+3. Check if it promoted to `@next`: `npm view gsd-pi@next version`
+4. Check if it's in production: `npm view gsd-pi@latest version`
+
+## For Maintainers
+
+### Pipeline Workflows
+
+| Workflow | File | Trigger | Purpose |
+|----------|------|---------|---------|
+| CI | `ci.yml` | PR + push to main | Build, test, typecheck — **gate for all promotions** |
+| Release Pipeline | `pipeline.yml` | After CI succeeds on main | Three-stage promotion |
+| Native Binaries | `build-native.yml` | `v*` tags | Cross-compile platform binaries |
+| Dev Cleanup | `cleanup-dev-versions.yml` | Weekly (Monday 06:00 UTC) | Unpublish `-dev.` versions older than 30 days |
+| AI Triage | `triage.yml` | New issues + PRs | Automated classification via Claude Haiku (v2.36) |
+
+**CI optimization (v2.38):** GitHub Actions minutes were reduced ~60-70% (~10k → ~3-4k/month) through workflow consolidation and caching improvements.
+
+**Pipeline optimization (v2.41):**
+- **Shallow clones** — CI lint and build jobs use `fetch-depth: 1` or `fetch-depth: 2` instead of full history, saving ~30-60s per job
+- **npm cache in pipeline** — dev-publish, test-verify, and prod-release now use `cache: 'npm'` on setup-node, saving ~1-2 min per job on repeat runs
+- **Exponential backoff** — npm registry propagation waits in `build-native.yml` replaced hardcoded `sleep 30` + fixed 15s retries with exponential backoff (5s → 10s → 20s → 30s cap), typically finishing in <15s when the registry is fast
+- **Security hardening** — pipeline.yml moved `${{ }}` expressions from `run:` blocks to `env:` variables to prevent command injection vectors
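The backoff schedule described above can be expressed as a generator (a sketch of the delays only, not the actual workflow code):

```python
from itertools import islice

def backoff_delays(start=5, cap=30):
    """Yield 5, 10, 20, 30, 30, ... seconds between registry checks."""
    delay = start
    while True:
        yield delay
        delay = min(delay * 2, cap)
```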
+
+### Docs-Only PR Detection (v2.41)
+
+CI automatically detects when a PR contains only documentation changes (`.md` files and `docs/` content). When docs-only:
+
+- **Skipped:** `build`, `windows-portability` (no code to compile or test)
+- **Still runs:** `lint` (secret scanning, `.gsd/` check), `docs-check` (prompt injection scan)
+
+This saves CI minutes on documentation PRs while still enforcing security checks.
+
+### Prompt Injection Scan (v2.41)
+
+The `docs-check` job runs `scripts/docs-prompt-injection-scan.sh` on every PR that touches markdown files. It scans documentation prose (excluding fenced code blocks) for patterns that could manipulate LLM behavior when docs are ingested as context:
+
+- **System prompt markers** — `<|im_start|>system`, `[SYSTEM]:`
+- **Role/instruction overrides** — `ignore previous instructions`, `you are now`, `new instructions:`
+- **Hidden HTML directives** — instructions hidden in HTML comments
+
+```
+---
+description: Review staged git changes
+---
+Review the staged changes (`git diff --cached`). Focus on:
+- Bugs and logic errors
+- Security issues
+- Performance problems
+Focus area: $1
+```
+
+Usage: `/review "error handling"` → expands with `$1` = "error handling"
+
+**Placement:**
+- `~/.gsd/agent/prompts/` (global)
+- `.gsd/prompts/` (project-local)
+
+### Themes
+
+JSON files defining the color palette for the TUI. Hot-reload: edit the file and pi applies changes immediately.
+
+**Built-in:** `dark`, `light`
+
+**Placement:**
+- `~/.gsd/agent/themes/` (global)
+- `.gsd/themes/` (project-local)
+
+---
diff --git a/docs/what-is-pi/10-providers-models-multi-model-by-default.md b/docs/what-is-pi/10-providers-models-multi-model-by-default.md
new file mode 100644
index 000000000..f218ff10d
--- /dev/null
+++ b/docs/what-is-pi/10-providers-models-multi-model-by-default.md
@@ -0,0 +1,58 @@
+# Providers & Models — Multi-Model by Default
+
+Pi isn't locked to one provider. It supports 20+ providers out of the box and lets you add more.
+
+### Authentication Methods
+
+**OAuth subscriptions (via `/login`):**
+- Anthropic Claude Pro/Max
+- OpenAI ChatGPT Plus/Pro (Codex)
+- GitHub Copilot
+- Google Gemini CLI
+- Google Antigravity
+
+**API keys (via environment variables):**
+- Anthropic, Anthropic (Vertex AI), OpenAI, Azure OpenAI, Google Gemini, Google Vertex, Amazon Bedrock
+- Mistral, Groq, Cerebras, xAI, OpenRouter, Vercel AI Gateway
+- ZAI, OpenCode Zen, OpenCode Go, Hugging Face, Kimi, MiniMax
+
+### Model Switching
+
+You can switch models at any time during a conversation:
+
+- `/model` — Open the model selector
+- `Ctrl+L` — Same as `/model`
+- `Ctrl+P` / `Shift+Ctrl+P` — Cycle through scoped models
+- `Shift+Tab` — Cycle thinking level
+
+Model changes are recorded in the session as `model_change` entries, so when you resume a session, pi knows which model you were using.
+
+### CLI Model Selection
+
+```bash
+pi --model sonnet # Fuzzy match
+pi --model openai/gpt-4o # Provider/model
+pi --model sonnet:high # With thinking level
+pi --models "claude-*,gpt-4o" # Scope models for Ctrl+P cycling
+pi --list-models # List all available
+pi --list-models gemini # Search by name
+```
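A sketch of how the `provider/model:thinking` syntax above could be parsed (`parseModelArg` and the `ModelSpec` shape are assumptions for illustration, not pi's API):

```typescript
// Hypothetical parse of the --model argument: "provider/model:thinking".
// Both provider and thinking level are optional.
interface ModelSpec {
  provider?: string;
  model: string;
  thinking?: string;
}

function parseModelArg(arg: string): ModelSpec {
  const [idPart, thinking] = arg.split(":");
  const slash = idPart.indexOf("/");
  return slash === -1
    ? { model: idPart, thinking }
    : { provider: idPart.slice(0, slash), model: idPart.slice(slash + 1), thinking };
}
```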
+
+### Custom Providers
+
+Add providers via `~/.gsd/agent/models.json` (simple) or extensions (advanced with OAuth, custom streaming):
+
+```json
+// ~/.gsd/agent/models.json
+{
+ "providers": [{
+ "name": "my-proxy",
+ "baseUrl": "https://proxy.example.com",
+ "apiKey": "PROXY_API_KEY",
+ "api": "anthropic-messages",
+ "models": [{ "id": "claude-sonnet-4", "name": "Sonnet via Proxy", ... }]
+ }]
+}
+```
+
+---
diff --git a/docs/what-is-pi/11-the-interactive-tui.md b/docs/what-is-pi/11-the-interactive-tui.md
new file mode 100644
index 000000000..9da934044
--- /dev/null
+++ b/docs/what-is-pi/11-the-interactive-tui.md
@@ -0,0 +1,50 @@
+# The Interactive TUI
+
+Pi's terminal interface is built with a custom TUI framework (`@mariozechner/pi-tui`).
+
+### Layout (top to bottom)
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│ Startup Header │
+│ Shows: shortcuts, loaded AGENTS.md files, prompts, │
+│ skills, extensions │
+├─────────────────────────────────────────────────────────────┤
+│ │
+│ Messages Area │
+│ User messages, assistant responses, tool calls/results, │
+│ notifications, errors, extension UI │
+│ │
+├─────────────────────────────────────────────────────────────┤
+│ [Widgets above editor - from extensions] │
+├─────────────────────────────────────────────────────────────┤
+│ Editor (input area) │
+│ Border color = thinking level │
+├─────────────────────────────────────────────────────────────┤
+│ [Widgets below editor - from extensions] │
+├─────────────────────────────────────────────────────────────┤
+│ Footer: cwd │ session name │ tokens │ cost │ context │ model│
+│ [Extension status indicators] │
+└─────────────────────────────────────────────────────────────┘
+```
+
+### Editor Features
+
+| Feature | How |
+|---------|-----|
+| File reference | Type `@` to fuzzy-search project files |
+| Path completion | Tab to complete paths |
+| Multi-line | Shift+Enter |
+| Images | Ctrl+V to paste, or drag onto terminal |
+| Bash commands | `!command` (sends output to LLM), `!!command` (runs without sending) |
+| External editor | Ctrl+G opens `$VISUAL` or `$EDITOR` |
+
+### Tool Output Display
+
+Tool calls and results are rendered inline with collapsible output:
+- `Ctrl+O` — Toggle expand/collapse all tool output
+- `Ctrl+T` — Toggle expand/collapse thinking blocks
+
+Extensions can provide custom renderers for their tools, controlling exactly how tool calls and results appear.
+
+---
diff --git a/docs/what-is-pi/12-the-message-queue-talking-while-pi-thinks.md b/docs/what-is-pi/12-the-message-queue-talking-while-pi-thinks.md
new file mode 100644
index 000000000..952c0f1c8
--- /dev/null
+++ b/docs/what-is-pi/12-the-message-queue-talking-while-pi-thinks.md
@@ -0,0 +1,20 @@
+# The Message Queue — Talking While Pi Thinks
+
+Pi doesn't make you wait for the agent to finish before sending more instructions. You can queue messages while the agent is streaming:
+
+| Key | Behavior |
+|-----|----------|
+| **Enter** | Queue a **steering** message — delivered after current tool, interrupts remaining tools |
+| **Alt+Enter** | Queue a **follow-up** message — delivered after agent finishes all work |
+| **Escape** | Abort the agent and restore queued messages to editor |
+| **Alt+Up** | Retrieve queued messages back to editor |
+
+**Steering** is for course-correction: "Stop, do this instead." The message is delivered after the current tool finishes, but remaining tool calls in the LLM's response are skipped.
+
+**Follow-up** is for chaining: "After you're done with that, also do this." The message waits until the agent has no more tool calls to make.
+
+**Settings:**
+- `steeringMode`: `"one-at-a-time"` (default) or `"all"` (deliver all queued at once)
+- `followUpMode`: same options
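The two delivery behaviors can be modeled as a small two-kind queue; this toy `MessageQueue` is illustrative, not pi's implementation:

```typescript
// Toy model of the two queues: steering messages are delivered after the
// current tool and skip the rest of the turn; follow-ups wait until the agent
// has no more tool calls. Names are illustrative.
type Queued = { kind: "steering" | "follow_up"; text: string };

class MessageQueue {
  private items: Queued[] = [];

  enqueue(kind: Queued["kind"], text: string): void {
    this.items.push({ kind, text });
  }

  // Delivered after the current tool finishes ("one-at-a-time" mode).
  nextSteering(): Queued | undefined {
    const i = this.items.findIndex((m) => m.kind === "steering");
    return i === -1 ? undefined : this.items.splice(i, 1)[0];
  }

  // Delivered only once the turn is fully finished.
  drainFollowUps(): Queued[] {
    const out = this.items.filter((m) => m.kind === "follow_up");
    this.items = this.items.filter((m) => m.kind !== "follow_up");
    return out;
  }
}
```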
+
+---
diff --git a/docs/what-is-pi/13-context-files-project-instructions.md b/docs/what-is-pi/13-context-files-project-instructions.md
new file mode 100644
index 000000000..822fb6ada
--- /dev/null
+++ b/docs/what-is-pi/13-context-files-project-instructions.md
@@ -0,0 +1,34 @@
+# Context Files — Project Instructions
+
+Pi loads instruction files automatically at startup:
+
+### AGENTS.md (or CLAUDE.md)
+
+Pi looks for `AGENTS.md` or `CLAUDE.md` in:
+1. `~/.gsd/agent/AGENTS.md` (global)
+2. Every parent directory from cwd up to filesystem root
+3. Current directory
+
+All matching files are concatenated and included in the system prompt. Use these for project conventions, common commands, architectural notes.
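The lookup chain can be sketched as pure path math (`agentsFileCandidates` is a hypothetical helper; pi's actual concatenation order may differ):

```typescript
// Sketch of the lookup described above: the global file first, then AGENTS.md
// in every directory walking from cwd up to the filesystem root.
import path from "node:path";

function agentsFileCandidates(cwd: string, agentDir: string): string[] {
  const candidates = [path.join(agentDir, "AGENTS.md")];
  let dir = path.resolve(cwd);
  while (true) {
    candidates.push(path.join(dir, "AGENTS.md"));
    const parent = path.dirname(dir);
    if (parent === dir) break; // reached the filesystem root
    dir = parent;
  }
  return candidates;
}
```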
+
+### System Prompt Override
+
+Replace the default system prompt entirely:
+- `.gsd/SYSTEM.md` (project)
+- `~/.gsd/agent/SYSTEM.md` (global)
+
+Append to it instead:
+- `.gsd/APPEND_SYSTEM.md` (project)
+- `~/.gsd/agent/APPEND_SYSTEM.md` (global)
+
+### File Arguments
+
+Include files directly in prompts from the CLI:
+
+```bash
+pi @prompt.md "Answer this"
+pi -p @screenshot.png "What's in this image?"
+pi @code.ts @test.ts "Review these files"
+```
+
+---
diff --git a/docs/what-is-pi/14-the-sdk-rpc-embedding-pi.md b/docs/what-is-pi/14-the-sdk-rpc-embedding-pi.md
new file mode 100644
index 000000000..80aac1a9e
--- /dev/null
+++ b/docs/what-is-pi/14-the-sdk-rpc-embedding-pi.md
@@ -0,0 +1,55 @@
+# The SDK & RPC — Embedding Pi
+
+Pi isn't just a terminal tool. It's designed to be embedded in other applications.
+
+### SDK (TypeScript)
+
+For Node.js/TypeScript applications, import and use pi directly:
+
+```typescript
+import { AuthStorage, createAgentSession, ModelRegistry, SessionManager } from "@mariozechner/pi-coding-agent";
+
+const authStorage = AuthStorage.create();
+const modelRegistry = new ModelRegistry(authStorage);
+
+const { session } = await createAgentSession({
+ sessionManager: SessionManager.inMemory(),
+ authStorage,
+ modelRegistry,
+});
+
+// Subscribe to events
+session.subscribe((event) => {
+ if (event.type === "message_update" && event.assistantMessageEvent.type === "text_delta") {
+ process.stdout.write(event.assistantMessageEvent.delta);
+ }
+});
+
+// Send prompts
+await session.prompt("What files are in the current directory?");
+```
+
+The SDK gives you full control: custom tools, custom resource loaders, session management, model selection, event streaming. See the [openclaw/openclaw](https://github.com/openclaw/openclaw) project for a real-world SDK integration.
+
+### RPC Mode (Any Language)
+
+For non-Node.js applications, spawn pi as a subprocess and communicate via JSON over stdin/stdout:
+
+```bash
+pi --mode rpc --provider anthropic
+```
+
+Send commands:
+```json
+{"type": "prompt", "message": "Hello, world!"}
+{"type": "steer", "message": "Stop and do this instead"}
+{"type": "follow_up", "message": "After you're done, also do this"}
+```
+
+Receive events:
+```json
+{"type": "event", "event": {"type": "message_update", ...}}
+{"type": "response", "command": "prompt", "success": true}
+```
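Both directions are newline-delimited JSON, so a client mainly needs a command encoder and an incremental line decoder. This sketch assumes only the field names shown above; everything else is illustrative:

```typescript
// One command per line on stdin.
function encodeCommand(cmd: { type: string; message?: string }): string {
  return JSON.stringify(cmd) + "\n";
}

// Incremental parser for newline-delimited JSON arriving in arbitrary chunks
// from stdout: buffer partial lines, emit each complete line as a message.
function makeLineDecoder(onMessage: (msg: unknown) => void): (chunk: string) => void {
  let buffer = "";
  return (chunk: string) => {
    buffer += chunk;
    let nl: number;
    while ((nl = buffer.indexOf("\n")) !== -1) {
      const line = buffer.slice(0, nl).trim();
      buffer = buffer.slice(nl + 1);
      if (line) onMessage(JSON.parse(line));
    }
  };
}
```

In practice you would pipe `encodeCommand` output to the subprocess's stdin and feed its stdout chunks to the decoder.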
+
+---
diff --git a/docs/what-is-pi/15-pi-packages-the-ecosystem.md b/docs/what-is-pi/15-pi-packages-the-ecosystem.md
new file mode 100644
index 000000000..4e19de60a
--- /dev/null
+++ b/docs/what-is-pi/15-pi-packages-the-ecosystem.md
@@ -0,0 +1,43 @@
+# Pi Packages — The Ecosystem
+
+Pi packages bundle extensions, skills, prompts, and themes for distribution via npm or git.
+
+### Installing
+
+```bash
+pi install npm:@foo/bar@1.0.0 # From npm (pinned)
+pi install npm:@foo/bar # From npm (latest)
+pi install git:github.com/user/repo # From git
+pi install ./local/path # From local path
+pi list # Show installed
+pi update # Update non-pinned
+pi remove npm:@foo/bar # Uninstall
+pi config # Enable/disable resources
+```
+
+### Creating
+
+Add a `pi` key to `package.json`:
+
+```json
+{
+ "name": "my-pi-package",
+ "keywords": ["pi-package"],
+ "pi": {
+ "extensions": ["./extensions"],
+ "skills": ["./skills"],
+ "prompts": ["./prompts"],
+ "themes": ["./themes"]
+ }
+}
+```
+
+Or just use conventional directory names (`extensions/`, `skills/`, `prompts/`, `themes/`) and pi discovers them automatically.
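That fallback can be sketched as a merge of the `pi` key over the conventional defaults (`resolveResources` is illustrative, not pi's loader):

```typescript
// Sketch: use the package.json "pi" key when present, otherwise fall back to
// the conventional directory names. A simplification of the real discovery.
type PiResources = {
  extensions?: string[];
  skills?: string[];
  prompts?: string[];
  themes?: string[];
};

const CONVENTIONAL: Required<PiResources> = {
  extensions: ["./extensions"],
  skills: ["./skills"],
  prompts: ["./prompts"],
  themes: ["./themes"],
};

function resolveResources(pkg: { pi?: PiResources }): Required<PiResources> {
  // Explicit "pi" entries override the conventional defaults per key.
  return { ...CONVENTIONAL, ...pkg.pi };
}
```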
+
+### Finding Packages
+
+- [Package gallery](https://shittycodingagent.ai/packages)
+- [npm search](https://www.npmjs.com/search?q=keywords%3Api-package)
+- [Discord community](https://discord.com/invite/3cU7Bz4UPx)
+
+---
diff --git a/docs/what-is-pi/16-why-pi-matters-what-makes-it-different.md b/docs/what-is-pi/16-why-pi-matters-what-makes-it-different.md
new file mode 100644
index 000000000..389076ff9
--- /dev/null
+++ b/docs/what-is-pi/16-why-pi-matters-what-makes-it-different.md
@@ -0,0 +1,29 @@
+# Why Pi Matters — What Makes It Different
+
+### vs. Other Coding Agents
+
+| Aspect | Typical agents | Pi |
+|--------|---------------|-----|
+| **Customization** | Fork the repo or wait for features | Extension system — build anything without forking |
+| **Model lock-in** | One provider, maybe two | 20+ providers, switch mid-conversation |
+| **Session management** | Linear history, maybe undo | Tree-based branching with in-place navigation |
+| **Context management** | Basic truncation | Structured compaction with summaries, customizable via extensions |
+| **Distribution** | No ecosystem | Pi packages via npm/git, shareable extensions/skills/themes |
+| **Embedding** | Not designed for it | SDK + RPC mode, built for integration |
+| **Philosophy** | Opinionated, batteries-included | Minimal core, extend to your workflow |
+
+### The Core Value Propositions
+
+1. **Extensibility as architecture.** Not an afterthought. The event system, tool registration, command system, and custom UI were designed from day one to make extensions as powerful as built-in features.
+
+2. **Session branching.** Tree-based conversations mean you never lose work. Explore different approaches, keep all of them, jump between them with `/tree`.
+
+3. **Compaction with structure.** When context gets too large, pi summarizes it with a structured format that preserves goals, decisions, and progress. Extensions can customize this entirely.
+
+4. **Multi-model fluidity.** Switch between Claude, GPT, Gemini, or any of 20+ providers mid-conversation. Use the best model for each part of the task.
+
+5. **Progressive disclosure.** Skills load their full instructions only when needed. The system prompt stays lean. Extensions register tools that appear only when active.
+
+6. **Platform, not product.** Pi is infrastructure you build on. Sub-agents, plan mode, permission gates, MCP support, custom workflows — build exactly what you need, share it as a package.
+
+---
diff --git a/docs/what-is-pi/17-file-reference-all-documentation.md b/docs/what-is-pi/17-file-reference-all-documentation.md
new file mode 100644
index 000000000..d23990c9c
--- /dev/null
+++ b/docs/what-is-pi/17-file-reference-all-documentation.md
@@ -0,0 +1,54 @@
+# File Reference — All Documentation
+
+All paths relative to:
+```
+/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/
+```
+
+### Core Documentation
+
+| File | What It Covers |
+|------|---------------|
+| `README.md` | Main documentation — quick start, all features, CLI reference, philosophy |
+| `docs/extensions.md` | Extensions API — events, tools, commands, UI, state, rendering (1,972 lines) |
+| `docs/tui.md` | TUI component system — Component interface, built-in components, keyboard, theming, overlays |
+| `docs/session.md` | Session format — JSONL tree structure, entry types, message types, SessionManager API |
+| `docs/compaction.md` | Compaction & branch summarization — triggers, algorithm, summary format, extension hooks |
+| `docs/packages.md` | Pi packages — creating, installing, distributing via npm/git |
+| `docs/skills.md` | Skills — structure, frontmatter, locations, invocation |
+| `docs/prompt-templates.md` | Prompt templates — format, arguments, locations |
+| `docs/themes.md` | Themes — creating custom themes, color palette |
+| `docs/settings.md` | Settings — all configuration options |
+| `docs/keybindings.md` | Keyboard shortcuts — format, built-in bindings, customization |
+| `docs/providers.md` | Provider setup — detailed instructions for each provider |
+| `docs/models.md` | Custom models — models.json format |
+| `docs/custom-provider.md` | Custom providers — advanced: OAuth, custom streaming, model definitions |
+| `docs/sdk.md` | SDK — AgentSession, events, embedding pi in applications |
+| `docs/rpc.md` | RPC mode — JSON protocol, commands, events |
+| `docs/json.md` | JSON mode — event stream format |
+| `docs/what-is-pi/19-building-branded-apps-on-top-of-pi.md` | Branded app architecture — shipping your own CLI, app-owned storage, SDK vs RPC, bundling resources |
+| `docs/development.md` | Contributing — development setup, forking, debugging |
+| `docs/windows.md` | Windows platform notes |
+| `docs/termux.md` | Termux (Android) setup |
+| `docs/terminal-setup.md` | Terminal configuration recommendations |
+| `docs/shell-aliases.md` | Shell alias patterns |
+
+### Example Extensions
+
+See the companion doc **Pi-Extensions-Complete-Guide.md** for a categorized reference of all 50+ example extensions.
+
+```
+examples/extensions/ # All example extensions
+examples/sdk/ # SDK usage examples
+```
+
+### Source Code (on GitHub)
+
+| Package | Purpose |
+|---------|---------|
+| `packages/coding-agent` | The main pi package — agent, tools, extensions, session, compaction |
+| `packages/tui` | Terminal UI component library |
+| `packages/ai` | Core LLM toolkit — providers, streaming, message types |
+| `packages/agent` | Agent loop framework |
+
+---
diff --git a/docs/what-is-pi/18-quick-reference-commands-shortcuts.md b/docs/what-is-pi/18-quick-reference-commands-shortcuts.md
new file mode 100644
index 000000000..fa6b09ad0
--- /dev/null
+++ b/docs/what-is-pi/18-quick-reference-commands-shortcuts.md
@@ -0,0 +1,68 @@
+# Quick Reference — Commands & Shortcuts
+
+### Commands
+
+| Command | Description |
+|---------|-------------|
+| `/login`, `/logout` | OAuth authentication |
+| `/model` | Switch models |
+| `/scoped-models` | Configure Ctrl+P model cycling |
+| `/settings` | Thinking level, theme, delivery mode, transport |
+| `/resume` | Browse previous sessions |
+| `/new` | New session |
+| `/name <name>` | Name current session |
+| `/session` | Session info (path, tokens, cost) |
+| `/tree` | Navigate session tree |
+| `/fork` | Fork to new session |
+| `/compact [instructions]` | Manual compaction |
+| `/copy` | Copy last response to clipboard |
+| `/export [file]` | Export to HTML |
+| `/share` | Upload as private GitHub gist |
+| `/reload` | Reload extensions, skills, prompts, context files |
+| `/hotkeys` | Show all keyboard shortcuts |
+| `/changelog` | Version history |
+| `/quit`, `/exit` | Exit pi |
+
+### Keyboard Shortcuts
+
+| Key | Action |
+|-----|--------|
+| Ctrl+C | Clear editor / quit (twice) |
+| Escape | Cancel/abort / open `/tree` (twice) |
+| Ctrl+L | Model selector |
+| Ctrl+P / Shift+Ctrl+P | Cycle scoped models |
+| Shift+Tab | Cycle thinking level |
+| Ctrl+O | Toggle tool output expand/collapse |
+| Ctrl+T | Toggle thinking block expand/collapse |
+| Ctrl+G | Open external editor |
+| Ctrl+V | Paste (including images) |
+| Enter (during streaming) | Queue steering message |
+| Alt+Enter (during streaming) | Queue follow-up message |
+| Alt+Up | Retrieve queued messages |
+
+### CLI
+
+```bash
+pi # Interactive mode
+pi "prompt" # Interactive with initial prompt
+pi -p "prompt" # Print mode (non-interactive)
+pi -c # Continue last session
+pi -r # Resume (browse sessions)
+pi --model provider/model:thinking # Specify model
+pi --tools read,bash # Specify tools
+pi -e ./extension.ts # Load extension
+pi --mode rpc # RPC mode
+pi --mode json # JSON mode
+pi @file.ts "Review this" # Include file in prompt
+pi install npm:package # Install package
+pi list # List packages
+```
+
+---
+
+*This document was generated from the Pi documentation. Source files are at:*
+```
+/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/
+```
+
+*Companion document: **Pi-Extensions-Complete-Guide.md** (on Desktop)*
diff --git a/docs/what-is-pi/19-building-branded-apps-on-top-of-pi.md b/docs/what-is-pi/19-building-branded-apps-on-top-of-pi.md
new file mode 100644
index 000000000..ba467b03b
--- /dev/null
+++ b/docs/what-is-pi/19-building-branded-apps-on-top-of-pi.md
@@ -0,0 +1,896 @@
+# Building Branded Apps on Top of Pi
+
+This document covers the part that the extension docs, SDK docs, RPC docs, and package docs only imply when read together:
+
+**How do you build your own product on top of pi** so users run **your** app, **your** command, and **your** UI rather than installing and managing pi directly?
+
+Examples:
+- a branded CLI like `gsd`
+- a desktop app that uses pi as its backend engine
+- a web or Electron app that uses pi sessions, tools, and event streaming
+- an internal company agent product built on pi primitives
+
+The short answer is:
+
+- **Yes, you can build your own branded app on top of pi**
+- **No, end users do not need to install pi globally** if you ship your own app that depends on pi packages
+- **No, you do not have to rely on `~/.gsd`** if you embed pi with custom paths and storage
+- **Yes, you can bundle your own extensions, prompts, themes, skills, and providers** inside your app
+
+The rest of this document explains the architecture choices, storage choices, packaging strategies, and practical tradeoffs.
+
+---
+
+## 19.1 The Three Ways to Use Pi as a Foundation
+
+There are really three layers you can build on:
+
+1. **`@mariozechner/pi-coding-agent`**
+ - Highest-level embedding API
+ - Best when you want pi's session system, resource loading, tools, extension model, and coding-agent behaviors
+2. **Pi CLI in RPC mode**
+ - Best when you want process isolation or language-agnostic integration
+3. **`@mariozechner/pi-agent-core`**
+ - Lower-level agent loop without the full pi coding-agent shell
+ - Best when you want more of the engine than the product surface
+
+For most branded CLI or desktop app use cases, start with **`@mariozechner/pi-coding-agent`**.
+
+### Rule of thumb
+
+- Want your own **CLI/TUI** with pi behavior under the hood -> use **SDK embedding** via `createAgentSession()`
+- Want your own app in a **different language** or want a **subprocess boundary** -> use **RPC mode**
+- Want a more generic **agent engine** and will build more infrastructure yourself -> use **`@mariozechner/pi-agent-core`**
+
+---
+
+## 19.2 The Biggest Misconception: Pi Does Not Require a Global `pi` Install
+
+If you are building a product on top of pi, your users do **not** need to install `pi` globally with npm.
+
+You can ship your own app that depends on:
+
+- `@mariozechner/pi-coding-agent`
+- `@mariozechner/pi-agent-core`
+- `@mariozechner/pi-ai`
+- `@mariozechner/pi-tui`
+- `@mariozechner/pi-web-ui`
+
+That means a branded command like:
+
+```bash
+gsd
+```
+
+can be **your** executable, backed by pi internals, without asking users to separately install and run `pi`.
+
+### What this means in practice
+
+Instead of telling users:
+
+```bash
+npm install -g @mariozechner/pi-coding-agent
+pi
+```
+
+you can ship:
+
+```bash
+npm install -g my-gsd
+# or a standalone binary / packaged desktop app
+
+gsd
+```
+
+And inside `gsd`, you import pi packages and create your own session, UI, storage, and resource loading behavior.
+
+---
+
+## 19.3 The Second Biggest Misconception: `~/.gsd` Is a Default, Not a Requirement
+
+Pi CLI defaults to `~/.gsd/agent`, but embedded applications are not forced to use it.
+
+When you use `createAgentSession()`, you can control:
+
+- `agentDir`
+- `cwd`
+- `authStorage`
+- `modelRegistry`
+- `resourceLoader`
+- `sessionManager`
+- `settingsManager`
+
+That means your app can store state under:
+
+- `~/.gsd/agent`
+- `~/Library/Application Support/GSD`
+- `%APPDATA%/GSD`
+- an app-local portable directory
+- a project-local directory
+
+instead of pi's defaults.
+
+### Things you can relocate
+
+- auth and OAuth credentials
+- settings
+- models config
+- sessions
+- extensions
+- prompt templates
+- themes
+- AGENTS-style context files
+
+### Important nuance
+
+If you use the default resource loader and default managers, pi behaves like pi:
+- standard discovery
+- standard config locations
+- standard session directories
+
+If you pass custom managers and loaders, pi becomes an engine inside **your** app.
+
+---
+
+## 19.4 Choose an Architecture First
+
+Before writing code, decide which of these architectures you actually want.
+
+### Architecture A: Branded Node CLI or TUI using the SDK
+
+This is the most natural fit for tools like `gsd`.
+
+You create your own executable and call `createAgentSession()` directly.
+
+#### Good for
+- a branded terminal tool
+- a custom TUI
+- internal company coding agents
+- a CLI with pi sessions, tools, and extensions under the hood
+
+#### Benefits
+- type-safe
+- no subprocess management
+- easy to customize storage and discovery
+- easiest way to remove dependency on `~/.gsd`
+- easiest way to bundle built-in resources
+
+#### Typical stack
+- `@mariozechner/pi-coding-agent`
+- optionally `@mariozechner/pi-tui`
+- your own entrypoint and app directories
+
+---
+
+### Architecture B: Branded App + Pi RPC subprocess
+
+Here your app spawns pi as a subprocess and talks to it over JSON lines.
+
+#### Good for
+- non-Node host applications
+- desktop shells with a strict engine boundary
+- process isolation
+- integrations where restarting the engine independently is useful
+
+#### Benefits
+- language-agnostic
+- process isolation
+- JSON protocol is explicit and stream-friendly
+
+#### Costs
+- you must manage subprocess lifecycle
+- some UI features are degraded compared to pi's native TUI
+- extension UI works through a request/response sub-protocol, not full TUI embedding
+
+---
+
+### Architecture C: App built on `pi-agent-core` or `pi-web-ui`
+
+This is for cases where you want pi's model and agent infrastructure but not necessarily pi's full coding-agent product surface.
+
+#### Good for
+- browser apps
+- web chat products
+- custom artifact workflows
+- custom message types and renderers
+
+#### Benefits
+- lower-level control
+- more app-specific freedom
+- easier fit for non-terminal interfaces
+
+#### Costs
+- you build more yourself
+- fewer coding-agent-specific conveniences out of the box
+
+---
+
+## 19.5 SDK vs RPC vs Agent-Core
+
+Use this decision table.
+
+| Goal | Best Starting Point |
+|------|---------------------|
+| Branded CLI like `gsd` | `@mariozechner/pi-coding-agent` SDK |
+| Branded TUI with coding tools | `@mariozechner/pi-coding-agent` SDK |
+| Desktop app with subprocess boundary | pi RPC mode |
+| Non-Node integration | pi RPC mode |
+| Browser chat app | `@mariozechner/pi-web-ui` + `@mariozechner/pi-agent-core` |
+| Generic agent engine with custom infrastructure | `@mariozechner/pi-agent-core` |
+| Want pi sessions/resources/extensions but app-owned directories | `@mariozechner/pi-coding-agent` SDK |
+
+### More detailed tradeoff matrix
+
+| Concern | SDK | RPC | agent-core |
+|--------|-----|-----|------------|
+| Type safety | Excellent | Weak at protocol boundary | Excellent |
+| Process isolation | No | Yes | No |
+| Language agnostic | No | Yes | No |
+| Full pi session/resource system | Yes | Yes | No |
+| App-owned storage | Yes | Partial / external orchestration | Yes |
+| Rich custom UI | Strong | Moderate | Strong |
+| Uses pi extension ecosystem easily | Yes | Yes | No, not directly |
+| Simplest branded CLI path | Yes | No | No |
+
+---
+
+## 19.6 The Recommended Path for a Branded CLI Like `gsd`
+
+If you want users to run:
+
+```bash
+gsd
+```
+
+and you want it to feel like your product rather than "pi but renamed," the default recommendation is:
+
+1. Build a Node/TypeScript app
+2. Depend on `@mariozechner/pi-coding-agent`
+3. Create your own executable entrypoint
+4. Use `createAgentSession()` directly
+5. Set custom directories for config/auth/sessions
+6. Bundle your own extensions/prompts/themes/providers
+7. Expose only the commands and UX you want
+
+That gives you the best control over:
+- branding
+- defaults
+- storage layout
+- startup behavior
+- extension loading
+- model/provider setup
+
+---
+
+## 19.7 App-Owned Storage Layout
+
+A branded app should usually own its own storage hierarchy.
+
+Example:
+
+```text
+~/.gsd/
+ agent/
+ auth.json
+ models.json
+ settings.json
+ extensions/
+ prompts/
+ themes/
+ skills/
+ sessions/
+```
+
+Or on macOS:
+
+```text
+~/Library/Application Support/GSD/
+ agent/
+ sessions/
+```
+
+### Why this matters
+
+If your product reuses pi's default directories, then:
+- it shares state with the user's pi installation
+- branding becomes muddy
+- support/debugging becomes more confusing
+- product boundaries become less clear
+
+Use app-specific directories unless you intentionally want interoperability with a user's pi environment.
+
+### Minimal example
+
+```typescript
+import path from "node:path";
+import os from "node:os";
+import {
+ AuthStorage,
+ createAgentSession,
+ ModelRegistry,
+ SessionManager,
+ SettingsManager,
+} from "@mariozechner/pi-coding-agent";
+
+const appRoot = path.join(os.homedir(), ".gsd");
+const agentDir = path.join(appRoot, "agent");
+const sessionsDir = path.join(appRoot, "sessions");
+
+const authStorage = AuthStorage.create(path.join(agentDir, "auth.json"));
+const modelRegistry = new ModelRegistry(authStorage, path.join(agentDir, "models.json"));
+const settingsManager = SettingsManager.create(process.cwd(), agentDir);
+const sessionManager = SessionManager.create(process.cwd(), sessionsDir);
+
+const { session } = await createAgentSession({
+ cwd: process.cwd(),
+ agentDir,
+ authStorage,
+ modelRegistry,
+ settingsManager,
+ sessionManager,
+});
+```
+
+This is the core pattern for “my app uses pi, but not as global pi.”
+
+---
+
+## 19.8 Bundling Resources Inside Your App
+
+This is another place where people often assume they must rely on discovery from `~/.gsd` or `.gsd/`.
+
+You do not.
+
+Your app can bundle:
+- extensions
+- prompts
+- themes
+- skills
+- AGENTS-style context
+- provider registrations
+
+inside your own package or app bundle.
+
+### Strategy 1: Use custom paths with `DefaultResourceLoader`
+
+```typescript
+import { DefaultResourceLoader } from "@mariozechner/pi-coding-agent";
+
+const loader = new DefaultResourceLoader({
+ cwd: process.cwd(),
+ agentDir,
+ additionalExtensionPaths: [
+ "/absolute/path/to/bundled/extension.ts",
+ ],
+});
+
+await loader.reload();
+```
+
+### Strategy 2: Use inline extension factories
+
+```typescript
+const loader = new DefaultResourceLoader({
+ cwd: process.cwd(),
+ agentDir,
+ extensionFactories: [
+ (pi) => {
+ pi.registerCommand("hello", {
+ description: "My branded command",
+ handler: async (_args, ctx) => ctx.ui.notify("Hello from GSD", "info"),
+ });
+ },
+ ],
+});
+```
+
+### Strategy 3: Override discovered resources entirely
+
+```typescript
+const loader = new DefaultResourceLoader({
+ cwd: process.cwd(),
+ agentDir,
+ promptsOverride: () => ({ prompts: [], diagnostics: [] }),
+ skillsOverride: () => ({ skills: [], diagnostics: [] }),
+ agentsFilesOverride: () => ({ agentsFiles: [] }),
+ systemPromptOverride: () => "You are GSD, a specialized software delivery agent.",
+});
+```
+
+### Why this matters
+
+For a branded product, it is often better to think in terms of:
+- **bundled built-ins shipped by your app**
+- optional plugin support later
+
+rather than:
+- user-managed global pi resources first
+
+---
+
+## 19.9 Discovery vs Bundling
+
+These are different product strategies.
+
+### Discovery-driven product
+You intentionally load from:
+- `~/.gsd/agent/...`
+- `.gsd/...`
+- installed pi packages
+
+#### Good when
+- your product is basically pi with additions
+- you want compatibility with existing pi user workflows
+
+### Bundled-app product
+You intentionally ship your own resources and avoid implicit user-level discovery.
+
+#### Good when
+- you want strong branding
+- you want predictable behavior
+- you want supportability and reproducibility
+- you do not want random user extensions affecting behavior
+
+### Recommendation
+For a branded tool like `gsd`, default to **bundled-app product** behavior.
+
+If you later add plugin support, make it explicit.
+
+---
+
+## 19.10 Using Pi Packages Internally vs Externally
+
+Pi packages are a sharing mechanism for extensions, prompts, skills, and themes.
+
+But when you are building your own app, there are two separate questions:
+
+1. **Should your app itself be distributed as a pi package?**
+2. **Should your app internally use pi-package-style resource organization?**
+
+### Usually, for a branded app:
+- **No** on #1
+- **Maybe** on #2
+
+If your users run your app directly, your app is usually a normal Node package, binary, or desktop app, not a pi package.
+
+But internally, you may still organize resources in a pi-friendly structure:
+
+```text
+src/
+resources/
+ extensions/
+ prompts/
+ themes/
+ skills/
+```
+
+and load them through your resource loader.
+
+### When pi packages still matter
+Pi packages are still useful when:
+- you want optional add-ons
+- you want to reuse existing pi ecosystem resources
+- you want third parties to extend your app through pi-compatible bundles
+
+---
+
+## 19.11 RPC Mode for Branded Apps
+
+RPC mode is the right answer when your product wants pi as a subprocess engine.
+
+Start it with:
+
+```bash
+pi --mode rpc
+```
+
+or programmatically by calling `runRpcMode(session)` in your own Node process.
+
+### RPC is good for
+- non-Node clients
+- desktop shells in other runtimes
+- separate engine process architecture
+- explicit JSON protocol boundaries
+
+### What RPC gives you
+- prompt / steer / follow_up / abort
+- model selection
+- state inspection
+- session operations
+- bash execution
+- event streaming
+- extension UI request/response protocol
+
+### Important limitation
+RPC is not the same thing as embedding pi's full native TUI.
+
+Some extension UI methods degrade in RPC mode.
+
+#### Dialogs still work
+- `select`
+- `confirm`
+- `input`
+- `editor`
+
+#### Fire-and-forget UI signals still work
+- notifications
+- status
+- widgets
+- title
+- editor text setting
+
+#### Some richer TUI behaviors do not map cleanly
+- full `custom()` component workflows
+- some footer/header/editor replacement behavior
+- some theme-specific TUI behavior
+
+If your branded app needs a deeply custom UI, SDK embedding or direct app-level UI integration is usually better.
+
+---
+
+## 19.12 Extension UI in RPC Mode
+
+One subtle but important point: **extensions with user interaction are still possible in RPC mode**, but through a protocol, not by directly rendering pi TUI components.
+
+The client receives `extension_ui_request` messages and must answer with `extension_ui_response` for blocking dialogs.
+
+This means you can build your own frontend and still support many extension-driven workflows.
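For the blocking dialog methods (`select`, `confirm`, `input`, `editor`), the host routes each request to its own UI and replies. A dispatch sketch — the payload field names (`id`, `method`, `value`) are assumptions; check the RPC docs for the real `extension_ui_request` shape:

```typescript
// Route a blocking extension_ui_request to a host-app handler and build
// the matching extension_ui_response. Field names are assumed, not exact.
type DialogMethod = "select" | "confirm" | "input" | "editor";

interface UiRequest {
  id: string;
  method: DialogMethod;
  [key: string]: unknown;
}

interface UiResponse {
  type: "extension_ui_response";
  id: string;
  value: unknown;
}

function answerUiRequest(
  req: UiRequest,
  handlers: Record<DialogMethod, (req: UiRequest) => unknown>,
): UiResponse {
  return { type: "extension_ui_response", id: req.id, value: handlers[req.method](req) };
}
```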
+
+### But know the boundary
+RPC mode preserves interaction patterns, but not full TUI component identity.
+
+If your extension assumes pi's exact terminal UI surface, it may need adaptation.
+
+---
+
+## 19.13 Web and Browser Apps
+
+If your app is a web app or browser-hosted UI, look closely at:
+
+- `@mariozechner/pi-agent-core`
+- `@mariozechner/pi-web-ui`
+
+`pi-web-ui` already provides:
+- chat UI
+- session storage
+- provider key storage
+- attachments
+- artifacts
+- model selection
+- settings dialogs
+- renderers and tool renderers
+
+This is effectively a starter kit for a branded web app using pi-related primitives.
+
+### Use pi-web-ui when
+- you want a browser or Electron-friendly UI surface
+- you want a ready-made chat shell
+- you do not specifically want pi's TUI
+
+### Use pi-coding-agent SDK when
+- you want coding-agent-specific resource loading, sessions, extensions, and coding tool behaviors
+- your app is terminal-first or Node-first
+
+---
+
+## 19.14 Branding Boundaries: What Still Feels Like Pi?
+
+This matters if you are building a white-labeled or branded product.
+
+### If you spawn the pi CLI directly
+Your product is closer to “pi as a subprocess.”
+That is fine, but many pi-level assumptions remain nearby.
+
+### If you embed `@mariozechner/pi-coding-agent`
+You can hide most pi branding and product surface decisions.
+You keep the coding-agent infrastructure but own the app UX.
+
+### If you use `@mariozechner/pi-agent-core`
+You are even lower-level. Pi becomes more of a library source than a user-visible product.
+
+### Practical recommendation
+If branding matters, do not treat the pi CLI binary as your product surface unless you truly want pi semantics exposed.
+
+Use the SDK or lower-level packages and build your own interface.
+
+---
+
+## 19.15 Session Strategy for a Branded App
+
+Decide whether your app wants:
+
+- **persistent sessions** with app-owned storage
+- **ephemeral sessions** only
+- **project-local sessions**
+- **branching session history** exposed to users
+
+### Persistent app-owned sessions
+Most natural for a CLI or desktop app.
+
+```typescript
+const sessionManager = SessionManager.create(process.cwd(), sessionsDir);
+```
+
+### Ephemeral mode
+Useful for task-runner or automation workflows.
+
+```typescript
+const sessionManager = SessionManager.inMemory();
+```
+
+### Important question
+Do you want your app to share session files with pi itself?
+
+Usually the answer should be **no** unless interoperability is an explicit feature.
+
+---
+
+## 19.16 Settings Strategy for a Branded App
+
+You should decide whether settings are:
+
+- file-backed
+- in-memory
+- app-global
+- project-local
+- user-editable
+- controlled only by your product UI
+
+### App-owned settings
+
+```typescript
+const settingsManager = SettingsManager.create(projectCwd, agentDir);
+```
+
+with `agentDir` pointing into your app-owned config directory.
+
+### Fully controlled settings
+
+```typescript
+const settingsManager = SettingsManager.inMemory({
+ compaction: { enabled: true },
+ retry: { enabled: true, maxRetries: 2 },
+});
+```
+
+Use in-memory settings when you want the host app to own the config model entirely.
+
+---
+
+## 19.17 Provider and Auth Strategy
+
+A branded app should decide whether users:
+- bring their own API keys
+- use OAuth through pi provider support
+- connect to your proxy/backend
+- use your own registered providers
+
+### App-owned auth paths
+Use custom `AuthStorage` paths.
+
+```typescript
+const authStorage = AuthStorage.create("/path/to/gsd/auth.json");
+```
+
+### App-owned model config
+Use your own `models.json` location or register providers dynamically.
+
+```typescript
+const modelRegistry = new ModelRegistry(authStorage, "/path/to/gsd/models.json");
+```
+
+### Custom provider strategy
+If your app talks to a proxy or company backend, register providers from your app or bundled extensions.
+
+That keeps the app experience aligned with your branding and infrastructure.
+
+---
+
+## 19.18 Building a Branded `gsd` CLI: Recommended Shape
+
+A practical architecture looks like this:
+
+```text
+my-gsd/
+ package.json
+ src/
+ cli.ts
+ app-paths.ts
+ session.ts
+ resource-loader.ts
+ ui/
+ resources/
+ extensions/
+ prompts/
+ themes/
+ skills/
+```
+
+### In `cli.ts`
+- parse your app flags
+- compute app directories
+- create auth/model/settings/session managers
+- create resource loader
+- create agent session
+- run your own mode (custom TUI, print mode, or RPC bridge)
+
+### In `resource-loader.ts`
+- load bundled resources
+- optionally disable ambient pi discovery
+- add your branded system prompt and context files
+
+### In bundled extensions
+- add your commands
+- register your custom tools
+- control your app-specific behaviors
+
+---
+
+## 19.19 Minimal SDK Skeleton for a Branded CLI
+
+```typescript
+import path from "node:path";
+import os from "node:os";
+import {
+ AuthStorage,
+ createAgentSession,
+ DefaultResourceLoader,
+ ModelRegistry,
+ SessionManager,
+ SettingsManager,
+} from "@mariozechner/pi-coding-agent";
+
+const appRoot = path.join(os.homedir(), ".gsd");
+const agentDir = path.join(appRoot, "agent");
+const sessionsDir = path.join(appRoot, "sessions");
+
+const authStorage = AuthStorage.create(path.join(agentDir, "auth.json"));
+const modelRegistry = new ModelRegistry(authStorage, path.join(agentDir, "models.json"));
+const settingsManager = SettingsManager.create(process.cwd(), agentDir);
+const sessionManager = SessionManager.create(process.cwd(), sessionsDir);
+
+const resourceLoader = new DefaultResourceLoader({
+ cwd: process.cwd(),
+ agentDir,
+ settingsManager,
+ systemPromptOverride: () =>
+ "You are GSD, a branded software delivery agent. Prefer project-specific workflows and terminology.",
+ additionalExtensionPaths: [
+ path.resolve("resources/extensions/index.ts"),
+ ],
+});
+
+await resourceLoader.reload();
+
+const { session } = await createAgentSession({
+ cwd: process.cwd(),
+ agentDir,
+ authStorage,
+ modelRegistry,
+ settingsManager,
+ sessionManager,
+ resourceLoader,
+});
+
+session.subscribe((event) => {
+ if (event.type === "message_update" && event.assistantMessageEvent.type === "text_delta") {
+ process.stdout.write(event.assistantMessageEvent.delta);
+ }
+});
+
+await session.prompt("Help me understand this repo.");
+```
+
+This is not yet a full product, but it is the correct starting shape for one.
+
+---
+
+## 19.20 When to Reuse Pi's Interactive Mode
+
+The SDK exports `InteractiveMode`, `runPrintMode`, and `runRpcMode`.
+
+These are useful if you want to reuse existing pi surfaces while changing the surrounding setup.
+
+### Reuse `InteractiveMode` when
+- you want pi's TUI mostly intact
+- but with app-owned storage, extensions, defaults, and resources
+
+### Do not reuse it when
+- you want a strongly branded UI
+- you want different commands or layout metaphors
+- you want your app to feel fundamentally different from pi
+
+For a white-labeled product, `InteractiveMode` is a good prototyping step, not always the final product surface.
+
+---
+
+## 19.21 What to Avoid in a Branded Product
+
+### Avoid accidental dependence on ambient user state
+If your app silently loads from a user's `~/.pi`, you may get:
+- surprising extensions
+- strange prompts
+- odd themes
+- hard-to-debug behavior differences
+
+### Avoid mixing branding and storage casually
+If your app is called `gsd` but state lives in `~/.pi`, users will notice.
+
+### Avoid choosing RPC just because it sounds generic
+If your app is already Node/TypeScript, SDK embedding is usually simpler and more powerful.
+
+### Avoid exposing every pi concept unless you want to
+A branded product should choose what the user sees.
+You do not need to expose:
+- all slash commands
+- all extension loading paths
+- all package concepts
+- all theme/customization behaviors
+
+---
+
+## 19.22 Suggested Product Postures
+
+### Posture A: “Pi-compatible branded shell”
+- Uses pi concepts openly
+- Supports pi packages and pi-style discovery
+- Good for power users
+
+### Posture B: “Branded app powered by pi”
+- Uses pi internally
+- App-owned directories and resources
+- Explicit plugins only
+- Good for productized tools like `gsd`
+
+### Posture C: “Custom agent product using pi primitives”
+- Uses `pi-agent-core` or selective libraries
+- Pi itself is mostly invisible
+- Good for SaaS or browser products
+
+For most branded command-line products, posture **B** is the best fit.
+
+---
+
+## 19.23 Recommended Documentation Reading Order for This Use Case
+
+If you are building a branded app on top of pi, read in this order:
+
+1. `what-is-pi/14-the-sdk-rpc-embedding-pi.md`
+2. this file
+3. `extending-pi/19-packaging-distribution.md`
+4. `extending-pi/04-extension-locations-discovery.md`
+5. `extending-pi/05-extension-structure-styles.md`
+6. `extending-pi/12-custom-ui-visual-components.md`
+7. `pi-ui-tui/01-the-ui-architecture.md`
+8. `pi-ui-tui/03-entry-points-how-ui-gets-on-screen.md`
+9. `pi-ui-tui/22-quick-reference-all-ui-apis.md`
+
+Then read the source package docs for exact API details:
+- `packages/coding-agent/docs/sdk.md`
+- `packages/coding-agent/docs/rpc.md`
+- `packages/coding-agent/docs/extensions.md`
+- `packages/coding-agent/docs/packages.md`
+- `packages/web-ui/README.md`
+
+---
+
+## 19.24 Bottom Line
+
+If your goal is:
+
+> “I want users to download and run `gsd`, and have it use pi internally without requiring a separate pi install or `~/.gsd` setup.”
+
+Then the answer is:
+
+- **Yes, that is a supported architecture**
+- **Use the SDK first unless you have a strong reason to choose RPC**
+- **Use app-owned storage directories**
+- **Bundle your own resources instead of relying on global discovery**
+- **Use pi packages as an ecosystem mechanism, not as a requirement for your app's internal structure**
+- **Treat pi as a foundation layer, not necessarily the product surface**
+
+That is the difference between:
+- “using pi as a user tool”
+- and “building your own product on top of pi.”
diff --git a/docs/what-is-pi/README.md b/docs/what-is-pi/README.md
new file mode 100644
index 000000000..6dcb20022
--- /dev/null
+++ b/docs/what-is-pi/README.md
@@ -0,0 +1,30 @@
+# Pi: What It Is, How It Works, and Why It Matters
+
+> Split into individual files for easier consumption.
+
+## Table of Contents
+
+- [01. What Pi Is](./01-what-pi-is.md)
+- [02. Design Philosophy](./02-design-philosophy.md)
+- [03. The Four Modes of Operation](./03-the-four-modes-of-operation.md)
+- [04. The Architecture — How Everything Fits Together](./04-the-architecture-how-everything-fits-together.md)
+- [05. The Agent Loop — How Pi Thinks](./05-the-agent-loop-how-pi-thinks.md)
+- [06. Tools — How Pi Acts on the World](./06-tools-how-pi-acts-on-the-world.md)
+- [07. Sessions — Memory That Branches](./07-sessions-memory-that-branches.md)
+- [08. Compaction — How Pi Manages Context Limits](./08-compaction-how-pi-manages-context-limits.md)
+- [09. The Customization Stack](./09-the-customization-stack.md)
+- [10. Providers & Models — Multi-Model by Default](./10-providers-models-multi-model-by-default.md)
+- [11. The Interactive TUI](./11-the-interactive-tui.md)
+- [12. The Message Queue — Talking While Pi Thinks](./12-the-message-queue-talking-while-pi-thinks.md)
+- [13. Context Files — Project Instructions](./13-context-files-project-instructions.md)
+- [14. The SDK & RPC — Embedding Pi](./14-the-sdk-rpc-embedding-pi.md)
+- [15. Pi Packages — The Ecosystem](./15-pi-packages-the-ecosystem.md)
+- [16. Why Pi Matters — What Makes It Different](./16-why-pi-matters-what-makes-it-different.md)
+- [17. File Reference — All Documentation](./17-file-reference-all-documentation.md)
+- [18. Quick Reference — Commands & Shortcuts](./18-quick-reference-commands-shortcuts.md)
+- [19. Building Branded Apps on Top of Pi](./19-building-branded-apps-on-top-of-pi.md)
+
+---
+
+*Split into per-section files for surgical context loading.*
+
diff --git a/docs/working-in-teams.md b/docs/working-in-teams.md
new file mode 100644
index 000000000..71956d5ff
--- /dev/null
+++ b/docs/working-in-teams.md
@@ -0,0 +1,101 @@
+# Working in Teams
+
+GSD supports multi-user workflows where several developers work on the same repository concurrently.
+
+## Setup
+
+### 1. Set Team Mode
+
+The simplest way to configure GSD for team use is to set `mode: team` in your project preferences. This enables unique milestone IDs, push branches, and pre-merge checks in one setting:
+
+```yaml
+# .gsd/preferences.md (project-level, committed to git)
+---
+version: 1
+mode: team
+---
+```
+
+This is equivalent to manually setting `unique_milestone_ids: true`, `git.push_branches: true`, `git.pre_merge_check: true`, and other team-appropriate defaults. You can still override individual settings — for example, adding `git.auto_push: true` on top of `mode: team` if your team prefers auto-push.
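For example, a committed preferences file that keeps team mode but opts into auto-push (the `git.auto_push` override mentioned above) looks like:

```yaml
# .gsd/preferences.md
---
version: 1
mode: team
git:
  auto_push: true
---
```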
+
+Alternatively, you can configure each setting individually without using a mode (see [Git Strategy](git-strategy.md) for details).
+
+### 2. Configure `.gitignore`
+
+Share planning artifacts (milestones, roadmaps, decisions) while keeping runtime files local:
+
+```bash
+# ── GSD: Runtime / Ephemeral (per-developer, per-session) ──────
+.gsd/auto.lock
+.gsd/completed-units.json
+.gsd/STATE.md
+.gsd/metrics.json
+.gsd/activity/
+.gsd/runtime/
+.gsd/worktrees/
+.gsd/milestones/**/continue.md
+.gsd/milestones/**/*-CONTINUE.md
+```
+
+**What gets shared** (committed to git):
+- `.gsd/preferences.md` — project preferences
+- `.gsd/PROJECT.md` — living project description
+- `.gsd/REQUIREMENTS.md` — requirement contract
+- `.gsd/DECISIONS.md` — architectural decisions
+- `.gsd/milestones/` — roadmaps, plans, summaries, research
+
+**What stays local** (gitignored):
+- Lock files, metrics, state cache, runtime records, worktrees, activity logs
+
+### 3. Commit the Preferences
+
+```bash
+git add .gsd/preferences.md
+git commit -m "chore: enable GSD team workflow"
+```
+
+## `commit_docs: false`
+
+For teams where only some members use GSD, or when company policy requires a clean repo:
+
+```yaml
+git:
+ commit_docs: false
+```
+
+This adds `.gsd/` to `.gitignore` entirely and keeps all artifacts local. The developer gets the benefits of structured planning without affecting teammates who don't use GSD.
+
+## Migrating an Existing Project
+
+If you have an existing project with `.gsd/` blanket-ignored:
+
+1. Ensure no milestones are in progress (clean state)
+2. Update `.gitignore` to use the selective pattern above
+3. Add `unique_milestone_ids: true` to `.gsd/preferences.md`
+4. Optionally rename existing milestones to use unique IDs:
+ ```
+ I have turned on unique milestone ids, please update all old milestone
+ ids to use this new format e.g. M001-abc123 where abc123 is a random
+   6 char lowercase alphanumeric string. Update all references in all
+ .gsd file contents, file names and directory names. Validate your work
+ once done to ensure referential integrity.
+ ```
+5. Commit
+
+## Parallel Development
+
+Multiple developers can run auto mode simultaneously on different milestones. Each developer:
+
+- Gets their own worktree (`.gsd/worktrees//`, gitignored)
+- Works on a unique `milestone/` branch
+- Squash-merges to main independently
+
+Milestone dependencies can be declared in `M00X-CONTEXT.md` frontmatter:
+
+```yaml
+---
+depends_on: [M001-eh88as]
+---
+```
+
+GSD enforces that dependent milestones complete before starting downstream work.
diff --git a/packages/pi-coding-agent/src/core/package-manager.ts b/packages/pi-coding-agent/src/core/package-manager.ts
index d29c44ca5..e07b28c4e 100644
--- a/packages/pi-coding-agent/src/core/package-manager.ts
+++ b/packages/pi-coding-agent/src/core/package-manager.ts
@@ -1701,9 +1701,18 @@ export class DefaultPackageManager implements PackageManager {
);
}
{
+ // Ecosystem skills (~/.agents/skills/) take priority over legacy config-dir skills.
+ // Skip legacy dir entirely when migration has completed (marker file present).
+ const legacySkillsMigrated =
+ resolve(userDirs.skills) !== resolve(userAgentsSkillsDir) &&
+ existsSync(join(userDirs.skills, ".migrated-to-agents"));
+ const legacyUserSkillEntries =
+ !legacySkillsMigrated && userSubdirs.has("skills")
+ ? collectAutoSkillEntries(userDirs.skills)
+ : [];
const skillEntries = [
- ...(userSubdirs.has("skills") ? collectAutoSkillEntries(userDirs.skills) : []),
...collectAutoSkillEntries(userAgentsSkillsDir),
+ ...legacyUserSkillEntries,
];
if (skillEntries.length > 0) {
addResources("skills", skillEntries, userMetadata, userOverrides.skills, globalBaseDir);
diff --git a/packages/pi-coding-agent/src/core/skills.ts b/packages/pi-coding-agent/src/core/skills.ts
index 9868b1546..a8ab488ef 100644
--- a/packages/pi-coding-agent/src/core/skills.ts
+++ b/packages/pi-coding-agent/src/core/skills.ts
@@ -2,10 +2,28 @@ import { existsSync, readdirSync, readFileSync, realpathSync, statSync } from "f
import ignore from "ignore";
import { homedir } from "os";
import { basename, dirname, isAbsolute, join, relative, resolve, sep } from "path";
-import { CONFIG_DIR_NAME, getAgentDir } from "../config.js";
import { parseFrontmatter } from "../utils/frontmatter.js";
import { toPosixPath } from "../utils/path-display.js";
import type { ResourceDiagnostic } from "./diagnostics.js";
+import { CONFIG_DIR_NAME } from "../config.js";
+
+/**
+ * The standard ecosystem skills directory used by skills.sh and the
+ * Agent Skills standard. All agents share this location for globally
+ * installed skills.
+ */
+export const ECOSYSTEM_SKILLS_DIR = join(homedir(), ".agents", "skills");
+
+/**
+ * The standard project-level skills directory (`.agents/skills/` relative to cwd).
+ */
+export const ECOSYSTEM_PROJECT_SKILLS_DIR = ".agents";
+
+/**
+ * Legacy skills directory (~/.gsd/agent/skills/ or ~/.pi/agent/skills/).
+ * Read as a fallback so existing installs don't lose skills before migration runs.
+ */
+const LEGACY_SKILLS_DIR = join(homedir(), CONFIG_DIR_NAME, "agent", "skills");
/** Max name length per spec */
const MAX_NAME_LENGTH = 64;
@@ -331,7 +349,7 @@ function escapeXml(str: string): string {
export interface LoadSkillsOptions {
/** Working directory for project-local skills. Default: process.cwd() */
cwd?: string;
- /** Agent config directory for global skills. Default: ~/.pi/agent */
+ /** @deprecated Skills now use ~/.agents/skills/ exclusively. This option is ignored. */
agentDir?: string;
/** Explicit skill paths (files or directories) */
skillPaths?: string[];
@@ -357,10 +375,7 @@ function resolveSkillPath(p: string, cwd: string): string {
* Returns skills and any validation diagnostics.
*/
export function loadSkills(options: LoadSkillsOptions = {}): LoadSkillsResult {
- const { cwd = process.cwd(), agentDir, skillPaths = [], includeDefaults = true } = options;
-
- // Resolve agentDir - if not provided, use default from config
- const resolvedAgentDir = agentDir ?? getAgentDir();
+ const { cwd = process.cwd(), skillPaths = [], includeDefaults = true } = options;
const skillMap = new Map();
const realPathSet = new Set();
@@ -404,12 +419,22 @@ export function loadSkills(options: LoadSkillsOptions = {}): LoadSkillsResult {
}
if (includeDefaults) {
- addSkills(loadSkillsFromDirInternal(join(resolvedAgentDir, "skills"), "user", true));
- addSkills(loadSkillsFromDirInternal(resolve(cwd, CONFIG_DIR_NAME, "skills"), "project", true));
+ // Primary: ~/.agents/skills/ — the industry-standard skills.sh location
+ addSkills(loadSkillsFromDirInternal(ECOSYSTEM_SKILLS_DIR, "user", true));
+ // Primary project: .agents/skills/ — standard project-level location
+ addSkills(loadSkillsFromDirInternal(resolve(cwd, ECOSYSTEM_PROJECT_SKILLS_DIR, "skills"), "project", true));
+
+ // Legacy fallback: read skills from ~/.gsd/agent/skills/ so existing
+ // installs keep working until the one-time migration in resource-loader
+ // copies them to ~/.agents/skills/. Skip if migration has completed.
+ const legacyMigrated = existsSync(join(LEGACY_SKILLS_DIR, ".migrated-to-agents"));
+ if (LEGACY_SKILLS_DIR !== ECOSYSTEM_SKILLS_DIR && existsSync(LEGACY_SKILLS_DIR) && !legacyMigrated) {
+ addSkills(loadSkillsFromDirInternal(LEGACY_SKILLS_DIR, "user", true));
+ }
}
- const userSkillsDir = join(resolvedAgentDir, "skills");
- const projectSkillsDir = resolve(cwd, CONFIG_DIR_NAME, "skills");
+ const userSkillsDir = ECOSYSTEM_SKILLS_DIR;
+ const projectSkillsDir = resolve(cwd, ECOSYSTEM_PROJECT_SKILLS_DIR, "skills");
const isUnderPath = (target: string, root: string): boolean => {
const normalizedRoot = resolve(root);
diff --git a/packages/pi-coding-agent/src/index.ts b/packages/pi-coding-agent/src/index.ts
index 9787c3b5e..e194e0324 100644
--- a/packages/pi-coding-agent/src/index.ts
+++ b/packages/pi-coding-agent/src/index.ts
@@ -219,6 +219,8 @@ export {
} from "./core/settings-manager.js";
// Skills
export {
+ ECOSYSTEM_SKILLS_DIR,
+ ECOSYSTEM_PROJECT_SKILLS_DIR,
formatSkillsForPrompt,
getLoadedSkills,
type LoadSkillsFromDirOptions,
diff --git a/scripts/install-hooks.sh b/scripts/install-hooks.sh
new file mode 100755
index 000000000..30bfd629e
--- /dev/null
+++ b/scripts/install-hooks.sh
@@ -0,0 +1,34 @@
+#!/usr/bin/env bash
+# Installs the git pre-commit hook for secret scanning.
+# Safe to run multiple times — only installs if not already present.
+
+set -euo pipefail
+
+HOOK_DIR="$(git rev-parse --git-dir)/hooks"
+HOOK_FILE="$HOOK_DIR/pre-commit"
+MARKER="# gsd-secret-scan"
+
+mkdir -p "$HOOK_DIR"
+
+# Check if our hook is already installed
+if [[ -f "$HOOK_FILE" ]] && grep -q "$MARKER" "$HOOK_FILE" 2>/dev/null; then
+ echo "secret-scan pre-commit hook already installed."
+ exit 0
+fi
+
+# If a pre-commit hook already exists, append; otherwise create
+if [[ -f "$HOOK_FILE" ]]; then
+ echo "" >> "$HOOK_FILE"
+ echo "$MARKER" >> "$HOOK_FILE"
+ echo 'bash "$(git rev-parse --show-toplevel)/scripts/secret-scan.sh"' >> "$HOOK_FILE"
+ echo "secret-scan appended to existing pre-commit hook."
+else
+ cat > "$HOOK_FILE" << 'EOF'
+#!/usr/bin/env bash
+# gsd-secret-scan
+# Pre-commit hook: scan staged files for hardcoded secrets
+bash "$(git rev-parse --show-toplevel)/scripts/secret-scan.sh"
+EOF
+ chmod +x "$HOOK_FILE"
+ echo "secret-scan pre-commit hook installed."
+fi
diff --git a/src/resource-loader.ts b/src/resource-loader.ts
index ded6d3185..690a2e788 100644
--- a/src/resource-loader.ts
+++ b/src/resource-loader.ts
@@ -1,7 +1,7 @@
import { DefaultResourceLoader } from '@gsd/pi-coding-agent'
import { createHash } from 'node:crypto'
import { homedir } from 'node:os'
-import { chmodSync, copyFileSync, cpSync, existsSync, lstatSync, mkdirSync, readFileSync, readlinkSync, readdirSync, rmSync, statSync, symlinkSync, unlinkSync, writeFileSync } from 'node:fs'
+import { chmodSync, copyFileSync, cpSync, existsSync, lstatSync, mkdirSync, openSync, closeSync, readFileSync, readlinkSync, readdirSync, rmSync, statSync, symlinkSync, unlinkSync, writeFileSync } from 'node:fs'
import { dirname, join, relative, resolve } from 'node:path'
import { fileURLToPath } from 'node:url'
import { compareSemver } from './update-check.js'
@@ -381,9 +381,12 @@ function pruneRemovedBundledExtensions(
*
* - extensions/ → ~/.gsd/agent/extensions/ (overwrite when version changes)
* - agents/ → ~/.gsd/agent/agents/ (overwrite when version changes)
- * - skills/ → ~/.gsd/agent/skills/ (overwrite when version changes)
* - GSD-WORKFLOW.md → ~/.gsd/agent/GSD-WORKFLOW.md (fallback for env var miss)
*
+ * Skills are NOT synced here. They are installed by the user via the
+ * skills.sh CLI (`npx skills add `) into ~/.agents/skills/ — the
+ * industry-standard Agent Skills ecosystem directory.
+ *
* Skips the copy when the managed-resources.json version matches the current
* GSD version, avoiding ~128ms of synchronous cpSync on every startup.
* After `npm update -g @glittercowboy/gsd`, versions will differ and the
@@ -408,6 +411,10 @@ export function initResources(agentDir: string): void {
// extensions fail to resolve @gsd/* packages, rendering GSD non-functional.
ensureNodeModulesSymlink(agentDir)
+ // Migrate legacy skills on every launch (not gated by manifest) so that
+ // partial-failure retries don't wait for a version bump.
+ migrateSkillsToEcosystemDir(agentDir)
+
// Skip the full copy when both version AND content fingerprint match.
// Version-only checks miss same-version content changes (npm link dev workflow,
// hotfixes within a release). The content hash catches those at ~1ms cost.
@@ -424,7 +431,13 @@ export function initResources(agentDir: string): void {
syncResourceDir(bundledExtensionsDir, join(agentDir, 'extensions'))
syncResourceDir(join(resourcesDir, 'agents'), join(agentDir, 'agents'))
- syncResourceDir(join(resourcesDir, 'skills'), join(agentDir, 'skills'))
+ // Skills are no longer force-synced here. Users install skills via the
+ // skills.sh CLI (`npx skills add `) into ~/.agents/skills/ which
+ // is the industry-standard Agent Skills ecosystem directory.
+ //
+ // Migration from the legacy ~/.gsd/agent/skills/ directory is handled
+ // above the manifest check so it runs on every launch (including retries
+ // after partial copy failures).
// Sync GSD-WORKFLOW.md to agentDir as a fallback for when GSD_WORKFLOW_PATH
// env var is not set (e.g. fork/dev builds, alternative entry points).
@@ -441,6 +454,109 @@ export function initResources(agentDir: string): void {
ensureRegistryEntries(join(agentDir, 'extensions'))
}
+// ─── Legacy Skill Migration ──────────────────────────────────────────────────────
+
+/**
+ * One-time migration: copy user-customized skills from the old
+ * ~/.gsd/agent/skills/ directory into ~/.agents/skills/.
+ *
+ * The migration is conservative:
+ * - Only skill directories containing a SKILL.md are considered.
+ * - Copies, does not move — the old directory stays intact so downgrading
+ * to a pre-migration GSD version still works.
+ * - Collision-safe — if a skill name already exists in the target, the
+ * existing ecosystem skill wins (user may have already installed a newer
+ * version via skills.sh).
+ * - Writes a `.migrated-to-agents` marker inside the legacy directory so
+ * the migration runs at most once.
+ */
+function migrateSkillsToEcosystemDir(agentDir: string): void {
+ const legacyDir = join(agentDir, 'skills')
+ const markerPath = join(legacyDir, '.migrated-to-agents')
+
+ // Already migrated or no legacy dir — nothing to do
+ if (!existsSync(legacyDir)) return
+
+ // Atomic marker check — 'wx' fails if file already exists, preventing races
+ // when two GSD processes start simultaneously.
+ let markerFd: number
+ try {
+ markerFd = openSync(markerPath, 'wx')
+ } catch {
+ return // marker already exists (another process won the race, or already migrated)
+ }
+
+ try {
+ const ecosystemDir = join(homedir(), '.agents', 'skills')
+ mkdirSync(ecosystemDir, { recursive: true })
+
+ const entries = readdirSync(legacyDir, { withFileTypes: true })
+ let migrated = 0
+ let candidates = 0
+ for (const entry of entries) {
+ // Handle both real directories and symlinks pointing to directories
+ const isDir = entry.isDirectory()
+ const isSymlink = entry.isSymbolicLink()
+ if (!isDir && !isSymlink) continue
+
+ const sourcePath = join(legacyDir, entry.name)
+
+ // For symlinks, verify the target is a directory
+ if (isSymlink) {
+ try {
+ const stat = statSync(sourcePath)
+ if (!stat.isDirectory()) continue
+ } catch {
+ continue // broken symlink — skip
+ }
+ }
+
+ const skillMd = join(sourcePath, 'SKILL.md')
+ if (!existsSync(skillMd)) continue
+
+ const target = join(ecosystemDir, entry.name)
+ if (existsSync(target)) continue // ecosystem version wins
+
+ candidates++
+ try {
+ if (isSymlink) {
+ // Recreate the symlink in the ecosystem directory using an absolute
+ // target. Relative symlinks would resolve from the new parent dir
+ // (~/.agents/skills/) instead of the original (~/.gsd/agent/skills/),
+ // pointing to the wrong location.
+ const rawTarget = readlinkSync(sourcePath)
+ const absTarget = resolve(dirname(sourcePath), rawTarget)
+ symlinkSync(absTarget, target)
+ } else {
+ cpSync(sourcePath, target, { recursive: true })
+ }
+ migrated++
+ } catch {
+ // non-fatal — skip this skill
+ }
+ }
+
+ // If any skills failed to copy, remove the marker so migration retries
+ // on the next launch. This keeps the legacy dir as fallback until every
+ // skill has been successfully migrated.
+ if (migrated < candidates) {
+ try { closeSync(markerFd); markerFd = -1 } catch { /* non-fatal */ }
+ try { unlinkSync(markerPath) } catch { /* non-fatal */ }
+ return
+ }
+
+ // Write migration info to the marker
+ try { writeFileSync(markerFd, `Migrated ${migrated} skill(s) to ${ecosystemDir} on ${new Date().toISOString()}\n`) } catch { /* non-fatal */ }
+ } catch {
+ // can't create ecosystem dir or read legacy dir — close fd first (required on Windows
+ // where unlinkSync fails on open handles), then remove marker so we retry next launch
+ try { closeSync(markerFd); markerFd = -1 } catch { /* non-fatal */ }
+ try { unlinkSync(markerPath) } catch { /* non-fatal */ }
+ } finally {
+ if (markerFd !== -1) { try { closeSync(markerFd) } catch { /* non-fatal */ } }
+ }
+}
+
export function hasStaleCompiledExtensionSiblings(extensionsDir: string): boolean {
if (!existsSync(extensionsDir)) return false
for (const entry of readdirSync(extensionsDir, { withFileTypes: true })) {
diff --git a/src/resources/extensions/gsd/auto-observability.ts b/src/resources/extensions/gsd/auto-observability.ts
new file mode 100644
index 000000000..ddcc0bf3d
--- /dev/null
+++ b/src/resources/extensions/gsd/auto-observability.ts
@@ -0,0 +1,74 @@
+/**
+ * Pre-dispatch observability checks for auto-mode units.
+ * Validates plan/summary file quality and builds repair instructions
+ * for the agent to fix gaps before proceeding with the unit.
+ */
+
+import type { ExtensionContext } from "@gsd/pi-coding-agent";
+import {
+ validatePlanBoundary,
+ validateExecuteBoundary,
+ validateCompleteBoundary,
+ formatValidationIssues,
+} from "./observability-validator.js";
+import type { ValidationIssue } from "./observability-validator.js";
+
+export async function collectObservabilityWarnings(
+ ctx: ExtensionContext,
+ basePath: string,
+ unitType: string,
+ unitId: string,
+): Promise<ValidationIssue[]> {
+ // Hook units have custom artifacts — skip standard observability checks
+ if (unitType.startsWith("hook/")) return [];
+
+ const parts = unitId.split("/");
+ const mid = parts[0];
+ const sid = parts[1];
+ const tid = parts[2];
+
+ if (!mid || !sid) return [];
+
+  let issues: ValidationIssue[] = [];
+
+ if (unitType === "plan-slice") {
+ issues = await validatePlanBoundary(basePath, mid, sid);
+ } else if (unitType === "execute-task" && tid) {
+ issues = await validateExecuteBoundary(basePath, mid, sid, tid);
+ } else if (unitType === "complete-slice") {
+ issues = await validateCompleteBoundary(basePath, mid, sid);
+ }
+
+ if (issues.length > 0) {
+ ctx.ui.notify(
+ `Observability check (${unitType}) found ${issues.length} warning${issues.length === 1 ? "" : "s"}:\n${formatValidationIssues(issues)}`,
+ "warning",
+ );
+ }
+
+ return issues;
+}
+
+export function buildObservabilityRepairBlock(issues: ValidationIssue[]): string {
+ if (issues.length === 0) return "";
+ const items = issues.map(issue => {
+ const fileName = issue.file.split("/").pop() || issue.file;
+ let line = `- **${fileName}**: ${issue.message}`;
+ if (issue.suggestion) line += ` → ${issue.suggestion}`;
+ return line;
+ });
+ return [
+ "",
+ "---",
+ "",
+ "## Pre-flight: Observability gaps to fix FIRST",
+ "",
+ "The following issues were detected in plan/summary files for this unit.",
+ "**Read each flagged file, apply the fix described, then proceed with the unit.**",
+ "",
+ ...items,
+ "",
+ "---",
+ "",
+ ].join("\n");
+}
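The per-issue line formatting in `buildObservabilityRepairBlock` can be sketched in isolation; the `Issue` shape and sample values below are illustrative, not real GSD output:

```typescript
// Standalone sketch of the repair-item formatting above.
interface Issue {
  file: string;
  message: string;
  suggestion?: string;
}

function formatRepairItem(issue: Issue): string {
  // Use the basename so the bullet stays readable for deep paths.
  const fileName = issue.file.split("/").pop() || issue.file;
  let line = `- **${fileName}**: ${issue.message}`;
  if (issue.suggestion) line += ` → ${issue.suggestion}`;
  return line;
}

const item = formatRepairItem({
  file: ".gsd/m1/s1/03-task.md",
  message: "Task plan has an empty Steps section.",
  suggestion: "Add numbered steps.",
});
// item: "- **03-task.md**: Task plan has an empty Steps section. → Add numbered steps."
```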
diff --git a/src/resources/extensions/gsd/detection.ts b/src/resources/extensions/gsd/detection.ts
index 3c01a277a..7507d427d 100644
--- a/src/resources/extensions/gsd/detection.ts
+++ b/src/resources/extensions/gsd/detection.ts
@@ -6,7 +6,7 @@
* flow to show when entering a project directory.
*/
-import { existsSync, readdirSync, readFileSync, statSync } from "node:fs";
+import { existsSync, openSync, readSync, closeSync, readdirSync, readFileSync, statSync } from "node:fs";
import { join } from "node:path";
import { homedir } from "node:os";
import { gsdRoot } from "./paths.js";
@@ -48,6 +48,9 @@ export interface V2Detection {
hasContext: boolean;
}
+/** Apple platform SDKROOTs found in Xcode project.pbxproj files. */
+export type XcodePlatform = "iphoneos" | "macosx" | "watchos" | "appletvos" | "xros";
+
export interface ProjectSignals {
/** Detected project/package files */
detectedFiles: string[];
@@ -57,6 +60,8 @@ export interface ProjectSignals {
isMonorepo: boolean;
/** Primary language hint */
primaryLanguage?: string;
+ /** Apple platform SDKROOTs detected from *.xcodeproj/project.pbxproj */
+ xcodePlatforms: XcodePlatform[];
/** Has existing CI configuration? */
hasCI: boolean;
/** Has existing test setup? */
@@ -97,10 +102,81 @@ export const PROJECT_FILES = [
"project.yml",
".xcodeproj",
".xcworkspace",
- // Docker
+ // Cloud platform config files
+ "firebase.json",
+ "cdk.json",
+ "samconfig.toml",
+ "serverless.yml",
+ "serverless.yaml",
+ "azure-pipelines.yml",
+ // Database / ORM config files
+ "prisma/schema.prisma",
+ "supabase/config.toml",
+ "drizzle.config.ts",
+ "drizzle.config.js",
+ "redis.conf",
+ // React Native markers
+ "metro.config.js",
+ "metro.config.ts",
+ "react-native.config.js",
+ // Frontend framework config files
+ "angular.json",
+ "next.config.js",
+ "next.config.ts",
+ "next.config.mjs",
+ "nuxt.config.ts",
+ "nuxt.config.js",
+ "svelte.config.js",
+ "svelte.config.ts",
+ // Vue CLI config files
+ "vue.config.js",
+ "vue.config.ts",
+ // Frontend tooling
+ "tailwind.config.js",
+ "tailwind.config.ts",
+ "tailwind.config.mjs",
+ "tailwind.config.cjs",
+ // Android project markers
+ "app/build.gradle",
+ "app/build.gradle.kts",
+ // Container / DevOps config files
"Dockerfile",
+ "docker-compose.yml",
+ "docker-compose.yaml",
+ // Infrastructure as Code
+ "main.tf",
+ // Kubernetes / Helm markers
+ "Chart.yaml",
+ "kustomization.yaml",
+ // CI/CD markers
+ ".github/workflows",
+ // Blockchain / Web3 markers
+ "hardhat.config.js",
+ "hardhat.config.ts",
+ "foundry.toml",
+ // Data engineering markers
+ "dbt_project.yml",
+ "airflow.cfg",
+ // Game engine markers
+ "ProjectSettings/ProjectVersion.txt",
+ "project.godot",
+ // Python framework markers
+ "manage.py",
+ "requirements.txt",
] as const;
+/** File extensions that indicate SQLite databases in the project. */
+const SQLITE_EXTENSIONS = [".sqlite", ".sqlite3", ".db"] as const;
+
+/** File extensions that indicate SQL usage (migrations, schemas, seeds). */
+const SQL_EXTENSIONS = [".sql"] as const;
+
+/** File extensions that indicate .NET / C# projects. */
+const DOTNET_EXTENSIONS = [".csproj", ".sln", ".fsproj"] as const;
+
+/** File extensions that indicate Vue.js single-file components. */
+const VUE_EXTENSIONS = [".vue"] as const;
+
const LANGUAGE_MAP: Record<string, string> = {
"package.json": "javascript/typescript",
"Cargo.toml": "rust",
@@ -111,6 +187,8 @@ const LANGUAGE_MAP: Record<string, string> = {
"pom.xml": "java",
"build.gradle": "java/kotlin",
"build.gradle.kts": "kotlin",
+ "app/build.gradle": "java/kotlin",
+ "app/build.gradle.kts": "kotlin",
"CMakeLists.txt": "c/c++",
"composer.json": "php",
"pubspec.yaml": "dart/flutter",
@@ -125,6 +203,8 @@ const LANGUAGE_MAP: Record<string, string> = {
".xcodeproj": "swift/xcode",
".xcworkspace": "swift/xcode",
"Dockerfile": "docker",
+ "manage.py": "python",
+ "requirements.txt": "python",
};
const MONOREPO_MARKERS = [
@@ -159,6 +239,44 @@ const TEST_MARKERS = [
"phpunit.xml",
] as const;
+/** Directories skipped during bounded recursive project scans. */
+const RECURSIVE_SCAN_IGNORED_DIRS = new Set([
+ ".git",
+ "node_modules",
+ ".venv",
+ "venv",
+ "dist",
+ "build",
+ "coverage",
+ ".next",
+ ".nuxt",
+ "target",
+ "vendor",
+ ".turbo",
+ "Pods",
+ "bin",
+ "obj",
+ ".gradle",
+ "DerivedData",
+ "out",
+]) as ReadonlySet<string>;
+
+/** Project file markers only matched at the project root — excluded from recursive suffix matching. */
+const ROOT_ONLY_PROJECT_FILES = new Set([
+ ".github/workflows",
+ "package.json",
+ "Gemfile",
+ "Makefile",
+ "CMakeLists.txt",
+ "build.gradle",
+ "build.gradle.kts",
+ "deno.json",
+ "deno.jsonc",
+]);
+
+const MAX_RECURSIVE_SCAN_FILES = 2000;
+const MAX_RECURSIVE_SCAN_DEPTH = 6;
+
// ─── Core Detection ─────────────────────────────────────────────────────────────
/**
@@ -280,9 +398,88 @@ export function detectProjectSignals(basePath: string): ProjectSignals {
}
}
+ // Bounded recursive scan for nested markers and dependency files.
+ // This covers common brownfield layouts like src/App/App.csproj,
+ // db/migrations/*.sql, src/components/*.vue, and services/api/pyproject.toml
+ // without walking the entire repo or diving into heavyweight folders.
+ const scannedFiles = scanProjectFiles(basePath);
+
+ for (const file of PROJECT_FILES) {
+ if (detectedFiles.includes(file) || ROOT_ONLY_PROJECT_FILES.has(file)) continue;
+ const hasMatch = file === "requirements.txt"
+ ? scannedFiles.some(isPythonRequirementsFile)
+ : scannedFiles.some((scannedFile) => matchesProjectFileMarker(scannedFile, file));
+ if (hasMatch) {
+ pushUnique(detectedFiles, file);
+ if (!primaryLanguage && LANGUAGE_MAP[file]) {
+ primaryLanguage = LANGUAGE_MAP[file];
+ }
+ }
+ }
+
+ if (scannedFiles.some((file) => SQLITE_EXTENSIONS.some((ext) => file.endsWith(ext)))) {
+ pushUnique(detectedFiles, "*.sqlite");
+ }
+ if (scannedFiles.some((file) => SQL_EXTENSIONS.some((ext) => file.endsWith(ext)))) {
+ pushUnique(detectedFiles, "*.sql");
+ }
+
+ const hasCsproj = scannedFiles.some((file) => file.endsWith(".csproj"));
+ const hasFsproj = scannedFiles.some((file) => file.endsWith(".fsproj"));
+ const hasSln = scannedFiles.some((file) => file.endsWith(".sln"));
+
+ if (hasCsproj) {
+ pushUnique(detectedFiles, "*.csproj");
+ if (!primaryLanguage) primaryLanguage = "csharp";
+ }
+ if (hasFsproj) {
+ pushUnique(detectedFiles, "*.fsproj");
+ if (!primaryLanguage) primaryLanguage = "fsharp";
+ }
+ if (hasSln) {
+ pushUnique(detectedFiles, "*.sln");
+ if (!primaryLanguage) primaryLanguage = "dotnet";
+ }
+
+ if (scannedFiles.some((file) => VUE_EXTENSIONS.some((ext) => file.endsWith(ext)))) {
+ pushUnique(detectedFiles, "*.vue");
+ }
+
+ // Python framework detection — scan dependency files for framework-specific packages.
+ // Adds synthetic markers (e.g. "dep:fastapi") so skill catalog matchFiles can reference them.
+ const dependencyFiles = scannedFiles.filter((file) =>
+ isPythonRequirementsFile(file) || file.endsWith("pyproject.toml"),
+ );
+ if (containsFastapiDependency(basePath, dependencyFiles)) {
+ pushUnique(detectedFiles, "dep:fastapi");
+ }
+
+ const springBootBuildFiles = scannedFiles.filter((file) =>
+ file.endsWith("pom.xml") || file.endsWith("build.gradle") || file.endsWith("build.gradle.kts"),
+ );
+ const springBootVersionCatalogs = scannedFiles.filter((file) => file.endsWith(".versions.toml"));
+ const springBootSettingsFiles = scannedFiles.filter((file) =>
+ file.endsWith("settings.gradle") || file.endsWith("settings.gradle.kts"),
+ );
+ if (containsSpringBootMarker(basePath, springBootBuildFiles, springBootVersionCatalogs, springBootSettingsFiles)) {
+ pushUnique(detectedFiles, "dep:spring-boot");
+ if (!primaryLanguage) {
+ primaryLanguage = "java/kotlin";
+ }
+ }
+
// Git repo detection
const isGitRepo = existsSync(join(basePath, ".git"));
+ // Xcode platform detection — parse SDKROOT from project.pbxproj
+ const xcodePlatforms = detectXcodePlatforms(basePath);
+
+ // Set primaryLanguage to swift when an Xcode project is found but no
+ // Package.swift was detected (CocoaPods or SPM-less projects).
+ if (!primaryLanguage && xcodePlatforms.length > 0) {
+ primaryLanguage = "swift";
+ }
+
// Monorepo detection
let isMonorepo = false;
for (const marker of MONOREPO_MARKERS) {
@@ -325,6 +522,7 @@ export function detectProjectSignals(basePath: string): ProjectSignals {
isGitRepo,
isMonorepo,
primaryLanguage,
+ xcodePlatforms,
hasCI,
hasTests,
packageManager,
@@ -332,6 +530,100 @@ export function detectProjectSignals(basePath: string): ProjectSignals {
};
}
+// ─── Xcode Platform Detection ───────────────────────────────────────────────────
+
+/** Known SDKROOT values → canonical platform names. */
+const SDKROOT_MAP: Record<string, XcodePlatform> = {
+ iphoneos: "iphoneos",
+ iphonesimulator: "iphoneos", // simulator builds still target iOS
+ macosx: "macosx",
+ watchos: "watchos",
+ watchsimulator: "watchos",
+ appletvos: "appletvos",
+ appletvsimulator: "appletvos",
+ xros: "xros",
+ xrsimulator: "xros",
+};
+
+/** Regex for SUPPORTED_PLATFORMS — fallback when SDKROOT = auto (Xcode 15+). */
+const SUPPORTED_PLATFORMS_RE = /SUPPORTED_PLATFORMS\s*=\s*"([^"]+)"/gi;
+
+/** Read at most `maxBytes` from a file without loading the full file into memory. */
+function readBounded(filePath: string, maxBytes: number): string {
+ const buf = Buffer.alloc(maxBytes);
+ const fd = openSync(filePath, "r");
+ try {
+ const bytesRead = readSync(fd, buf, 0, maxBytes, 0);
+ return buf.toString("utf-8", 0, bytesRead);
+ } finally {
+ closeSync(fd);
+ }
+}
+
+/** Common subdirectories where .xcodeproj may live in monorepos / standard layouts. */
+const XCODE_SUBDIRS = ["ios", "macos", "app", "apps"] as const;
+
+/**
+ * Scan *.xcodeproj directories for project.pbxproj and extract SDKROOT values.
+ * Returns deduplicated, canonical platform list (e.g. ["iphoneos"]).
+ *
+ * Reading the pbxproj is a lightweight regex scan — no full plist parsing needed.
+ * We read at most 1 MB per file to keep detection fast.
+ * Searches both the project root and common subdirectories (ios/, macos/, app/).
+ */
+function detectXcodePlatforms(basePath: string): XcodePlatform[] {
+  const platforms = new Set<XcodePlatform>();
+
+ // Directories to scan: project root + common subdirs
+ const dirsToScan = [basePath];
+ for (const sub of XCODE_SUBDIRS) {
+ const subPath = join(basePath, sub);
+ if (existsSync(subPath)) dirsToScan.push(subPath);
+ }
+
+ for (const dir of dirsToScan) {
+ try {
+ const entries = readdirSync(dir, { withFileTypes: true });
+ for (const entry of entries) {
+ if (!entry.isDirectory() || !entry.name.endsWith(".xcodeproj")) continue;
+ const pbxprojPath = join(dir, entry.name, "project.pbxproj");
+ try {
+ const content = readBounded(pbxprojPath, 1024 * 1024);
+        // Match `SDKROOT = <value>;` — both quoted and unquoted forms
+ const sdkRe = /SDKROOT\s*=\s*"?([a-z]+)"?\s*;/gi;
+ let m: RegExpExecArray | null;
+ let foundExplicit = false;
+ while ((m = sdkRe.exec(content)) !== null) {
+ const val = m[1].toLowerCase();
+ if (val === "auto") continue; // handled below via SUPPORTED_PLATFORMS
+ const canonical = SDKROOT_MAP[val];
+ if (canonical) {
+ platforms.add(canonical);
+ foundExplicit = true;
+ }
+ }
+ // Xcode 15+ defaults SDKROOT to "auto"; fall back to SUPPORTED_PLATFORMS
+ if (!foundExplicit) {
+ let sp: RegExpExecArray | null;
+ while ((sp = SUPPORTED_PLATFORMS_RE.exec(content)) !== null) {
+ for (const tok of sp[1].split(/\s+/)) {
+ const canonical = SDKROOT_MAP[tok.toLowerCase()];
+ if (canonical) platforms.add(canonical);
+ }
+ }
+ SUPPORTED_PLATFORMS_RE.lastIndex = 0;
+ }
+ } catch {
+ // unreadable pbxproj — skip
+ }
+ }
+ } catch {
+ // unreadable directory
+ }
+ }
+ return [...platforms];
+}
+
// ─── Package Manager Detection ──────────────────────────────────────────────────
function detectPackageManager(basePath: string): string | undefined {
@@ -392,7 +684,7 @@ function detectVerificationCommands(
commands.push("go vet ./...");
}
- if (detectedFiles.includes("pyproject.toml") || detectedFiles.includes("setup.py")) {
+ if (detectedFiles.includes("pyproject.toml") || detectedFiles.includes("setup.py") || detectedFiles.includes("requirements.txt")) {
commands.push("pytest");
}
@@ -487,3 +779,370 @@ function readMakefileTargets(basePath: string): string[] {
return [];
}
}
+
+function pushUnique(arr: string[], value: string): void {
+ if (!arr.includes(value)) arr.push(value);
+}
+
+function matchesProjectFileMarker(scannedFile: string, marker: string): boolean {
+ const normalized = scannedFile.replaceAll("\\", "/");
+ return (
+ normalized === marker ||
+ normalized.endsWith(`/${marker}`)
+ );
+}
+
+function isPythonRequirementsFile(relativePath: string): boolean {
+ const normalized = relativePath.replaceAll("\\", "/");
+ const basename = normalized.slice(normalized.lastIndexOf("/") + 1);
+ return (
+ basename === "requirements.txt" ||
+ basename === "requirements.in" ||
+ /^requirements([-.].+)?\.(txt|in)$/i.test(basename) ||
+ /(^|\/)requirements\/.+\.(txt|in)$/i.test(normalized)
+ );
+}
+
+function containsFastapiDependency(basePath: string, relativePaths: string[]): boolean {
+ for (const relativePath of relativePaths) {
+ try {
+ const raw = readBounded(join(basePath, relativePath), 64 * 1024);
+ const content = extractDependencyContent(relativePath, raw);
+ if (isPythonRequirementsFile(relativePath)) {
+ for (const line of content.split("\n")) {
+ if (extractRequirementName(line) === "fastapi") return true;
+ }
+ continue;
+ }
+
+ if (relativePath.endsWith("pyproject.toml")) {
+ if (containsFastapiInPyproject(content)) return true;
+ }
+ } catch {
+ // unreadable file — continue scanning other candidate files
+ }
+ }
+
+ return false;
+}
+
+function containsSpringBootMarker(
+ basePath: string,
+ buildFiles: string[],
+ versionCatalogFiles: string[],
+ settingsFiles: string[],
+): boolean {
+  const usedPluginAliases = new Set<string>();
+  const usedLibraryAliases = new Set<string>();
+ const catalogAccessors = resolveVersionCatalogAccessors(basePath, versionCatalogFiles, settingsFiles);
+
+ for (const relativePath of buildFiles) {
+ try {
+ const raw = readBounded(join(basePath, relativePath), 64 * 1024);
+ const content = stripDependencyComments(relativePath, raw);
+ if (containsDirectSpringBootReference(relativePath, content)) {
+ return true;
+ }
+
+ const normalized = content.toLowerCase();
+ let match: RegExpExecArray | null;
+ for (const accessor of catalogAccessors) {
+ const aliasRe = new RegExp(`alias\\(\\s*${accessor}\\.plugins\\.([a-z0-9_.-]+)\\s*\\)`, "gi");
+ while ((match = aliasRe.exec(normalized)) !== null) {
+ usedPluginAliases.add(normalizePluginAlias(match[1]));
+ }
+
+ const libraryAliasRe = new RegExp(`\\b${accessor}\\.((?!plugins\\b)[a-z0-9_.-]+)`, "gi");
+ while ((match = libraryAliasRe.exec(normalized)) !== null) {
+ usedLibraryAliases.add(normalizePluginAlias(match[1]));
+ }
+ }
+ } catch {
+ // unreadable build file — continue scanning others
+ }
+ }
+
+ if (usedPluginAliases.size === 0 && usedLibraryAliases.size === 0) {
+ return false;
+ }
+ if (versionCatalogFiles.length === 0) {
+ return false;
+ }
+
+  const springBootAliases = new Set<string>();
+  const springBootLibraries = new Set<string>();
+ const pendingSpringBootBundles: Array<{ bundleAlias: string; referencedAliases: string[] }> = [];
+ for (const relativePath of versionCatalogFiles) {
+ try {
+ const raw = readBounded(join(basePath, relativePath), 64 * 1024);
+ const content = stripDependencyComments(relativePath, raw);
+ const aliasRe = /^\s*([A-Za-z0-9_.-]+)\s*=\s*\{[^\n}]*\bid\s*=\s*["']org\.springframework\.boot["'][^\n}]*\}/gm;
+ let match: RegExpExecArray | null;
+ while ((match = aliasRe.exec(content)) !== null) {
+ springBootAliases.add(normalizePluginAlias(match[1]));
+ }
+
+ const libraryRe = /^\s*([A-Za-z0-9_.-]+)\s*=\s*\{[^\n}]*\b(module\s*=\s*["']org\.springframework\.boot:[^"']+["']|group\s*=\s*["']org\.springframework\.boot["'][^\n}]*\bname\s*=\s*["']spring-boot[^"']*["'])[^\n}]*\}/gm;
+ while ((match = libraryRe.exec(content)) !== null) {
+ springBootLibraries.add(normalizePluginAlias(match[1]));
+ }
+
+ const bundleRe = /^\s*([A-Za-z0-9_.-]+)\s*=\s*\[([\s\S]*?)\]/gm;
+ while ((match = bundleRe.exec(content)) !== null) {
+ pendingSpringBootBundles.push({
+ bundleAlias: normalizePluginAlias(`bundles.${match[1]}`),
+ referencedAliases: match[2]
+ .split(",")
+ .map((part) => normalizePluginAlias(part.replace(/["'\s]/g, "")))
+ .filter(Boolean),
+ });
+ }
+ } catch {
+ // unreadable version catalog — continue scanning others
+ }
+ }
+
+  const springBootBundles = new Set<string>();
+ for (const pendingBundle of pendingSpringBootBundles) {
+ if (pendingBundle.referencedAliases.some((alias) => springBootLibraries.has(alias))) {
+ springBootBundles.add(pendingBundle.bundleAlias);
+ }
+ }
+
+ for (const alias of usedPluginAliases) {
+ if (springBootAliases.has(alias)) return true;
+ }
+ for (const alias of usedLibraryAliases) {
+ if (springBootLibraries.has(alias) || springBootBundles.has(alias)) return true;
+ }
+
+ return false;
+}
+
+function stripDependencyComments(relativePath: string, content: string): string {
+ if (relativePath.endsWith("requirements.txt")) {
+ return content.replace(/(^|\s)#.*$/gm, "");
+ }
+ if (relativePath.endsWith("pyproject.toml")) {
+ return content.replace(/(^|\s)#.*$/gm, "");
+ }
+ if (relativePath.endsWith(".versions.toml")) {
+ return content.replace(/(^|\s)#.*$/gm, "");
+ }
+ if (relativePath.endsWith("settings.gradle") || relativePath.endsWith("settings.gradle.kts")) {
+ return content
+ .replace(/\/\*[\s\S]*?\*\//g, "")
+ .replace(/\/\/.*$/gm, "");
+ }
+ if (relativePath.endsWith("pom.xml")) {
+    return content.replace(/<!--[\s\S]*?-->/g, "");
+ }
+ if (relativePath.endsWith("build.gradle") || relativePath.endsWith("build.gradle.kts")) {
+ return content
+ .replace(/\/\*[\s\S]*?\*\//g, "")
+ .replace(/\/\/.*$/gm, "");
+ }
+ return content;
+}
+
+function extractDependencyContent(relativePath: string, content: string): string {
+ const stripped = stripDependencyComments(relativePath, content);
+ if (relativePath.endsWith("pyproject.toml")) {
+ return extractPyprojectDependencySections(stripped);
+ }
+ return stripped;
+}
+
+function extractRequirementName(spec: string): string | null {
+ const trimmed = spec.trim().replace(/^["']|["']$/g, "");
+ if (!trimmed) return null;
+
+ const match = trimmed.match(/^([A-Za-z0-9_.-]+)(?:\[[^\]]+\])?(?=\s*(?:@|[<>=!~;]|$))/);
+ if (!match) return null;
+ return normalizePackageName(match[1]);
+}
+
+function containsFastapiInPyproject(content: string): boolean {
+ for (const line of content.split("\n")) {
+ const keyMatch = line.match(/^\s*([A-Za-z0-9_.-]+)\s*=/);
+ if (keyMatch) {
+ const key = normalizePackageName(keyMatch[1]);
+ if (key === "fastapi") {
+ return true;
+ }
+ if (key !== "dependencies") {
+ continue;
+ }
+ }
+
+ const quotedSpecRe = /["']([^"']+)["']/g;
+ let match: RegExpExecArray | null;
+ while ((match = quotedSpecRe.exec(line)) !== null) {
+ if (extractRequirementName(match[1]) === "fastapi") {
+ return true;
+ }
+ }
+ }
+
+ return false;
+}
+
+function containsDirectSpringBootReference(relativePath: string, content: string): boolean {
+ if (relativePath.endsWith("pom.xml")) {
+    return /<groupId>\s*org\.springframework\.boot\s*<\/groupId>/i.test(content);
+ }
+
+ if (relativePath.endsWith("build.gradle") || relativePath.endsWith("build.gradle.kts")) {
+ return /(id\s*\(?\s*["']org\.springframework\.boot["']|apply\s*\(?\s*plugin\s*[:=]\s*["']org\.springframework\.boot["']|(?:implementation|api|compileOnly|runtimeOnly|testImplementation|annotationProcessor|kapt)\s*\(?\s*["'][^"']*org\.springframework\.boot:[^"']*spring-boot[^"']*["'])/i.test(content);
+ }
+
+ return false;
+}
+
+function extractPyprojectDependencySections(content: string): string {
+ const lines = content.split("\n");
+ const collected: string[] = [];
+ let section = "";
+ let collectingProjectDeps = false;
+ let collectingOptionalDeps = false;
+ let bracketDepth = 0;
+
+ for (const line of lines) {
+ const trimmed = line.trim();
+
+ if (collectingProjectDeps) {
+ collected.push(line);
+ bracketDepth += countChar(line, "[") - countChar(line, "]");
+ if (bracketDepth <= 0) {
+ collectingProjectDeps = false;
+ }
+ continue;
+ }
+
+ if (collectingOptionalDeps) {
+ collected.push(line);
+ bracketDepth += countChar(line, "[") - countChar(line, "]");
+ if (bracketDepth <= 0) {
+ collectingOptionalDeps = false;
+ }
+ continue;
+ }
+
+ const sectionMatch = trimmed.match(/^\[([^\]]+)\]$/);
+ if (sectionMatch) {
+ section = sectionMatch[1].trim();
+ continue;
+ }
+
+ if (section === "project" && /^dependencies\s*=\s*\[/.test(trimmed)) {
+ collected.push(line);
+ bracketDepth = countChar(line, "[") - countChar(line, "]");
+ collectingProjectDeps = bracketDepth > 0;
+ continue;
+ }
+
+ if (
+ section === "project.optional-dependencies" ||
+ section === "tool.poetry.dependencies"
+ ) {
+ if (section === "project.optional-dependencies") {
+ const equalsIndex = line.indexOf("=");
+ if (equalsIndex !== -1) {
+ const value = line.slice(equalsIndex + 1);
+ collected.push(value);
+ bracketDepth = countChar(value, "[") - countChar(value, "]");
+ collectingOptionalDeps = bracketDepth > 0;
+ }
+ } else {
+ collected.push(line);
+ }
+ }
+ }
+
+ return collected.join("\n");
+}
+
+function countChar(text: string, char: string): number {
+ return [...text].filter((c) => c === char).length;
+}
+
+function normalizePackageName(name: string): string {
+ return name.toLowerCase().replace(/[_.]/g, "-");
+}
+
+function normalizePluginAlias(alias: string): string {
+ return alias.toLowerCase().replace(/[-_]/g, ".");
+}
+
+function versionCatalogAccessorName(relativePath: string): string {
+ const normalized = relativePath.replaceAll("\\", "/");
+ const basename = normalized.slice(normalized.lastIndexOf("/") + 1);
+ return basename.replace(/\.versions\.toml$/i, "").toLowerCase();
+}
+
+function resolveVersionCatalogAccessors(
+ basePath: string,
+ versionCatalogFiles: string[],
+ settingsFiles: string[],
+): Set<string> {
+ const accessors = new Set(versionCatalogFiles.map(versionCatalogAccessorName).filter(Boolean));
+ if (versionCatalogFiles.length === 0 || settingsFiles.length === 0) {
+ return accessors;
+ }
+
+ for (const settingsFile of settingsFiles) {
+ try {
+ const raw = readBounded(join(basePath, settingsFile), 64 * 1024);
+ const content = stripDependencyComments(settingsFile, raw);
+ const createRe = /create\(\s*["']([A-Za-z0-9_]+)["']\s*\)\s*\{[\s\S]*?([A-Za-z0-9_.-]+\.versions\.toml)["']?\s*\)\s*\)/g;
+ let match: RegExpExecArray | null;
+ while ((match = createRe.exec(content)) !== null) {
+ const accessor = match[1].toLowerCase();
+ const catalogBasename = match[2].replaceAll("\\", "/").split("/").pop()!;
+ if (versionCatalogFiles.some((file) => {
+ const normalized = file.replaceAll("\\", "/");
+ return normalized === catalogBasename || normalized.endsWith(`/${catalogBasename}`);
+ })) {
+ accessors.add(accessor);
+ }
+ }
+ } catch {
+ // unreadable settings file — ignore
+ }
+ }
+
+ return accessors;
+}
+
+function scanProjectFiles(basePath: string): string[] {
+ const files: string[] = [];
+ const queue: Array<{ path: string; depth: number }> = [{ path: basePath, depth: 0 }];
+
+ while (queue.length > 0 && files.length < MAX_RECURSIVE_SCAN_FILES) {
+ const current = queue.shift()!;
+ let entries: Array<{ name: string; isDirectory(): boolean; isFile(): boolean }>;
+ try {
+ entries = readdirSync(current.path, { withFileTypes: true, encoding: "utf8" });
+ } catch {
+ continue;
+ }
+
+ for (const entry of entries) {
+ const entryPath = join(current.path, entry.name);
+ const relativePath = entryPath.slice(basePath.length + 1);
+
+ if (entry.isDirectory()) {
+ if (current.depth < MAX_RECURSIVE_SCAN_DEPTH && !RECURSIVE_SCAN_IGNORED_DIRS.has(entry.name)) {
+ queue.push({ path: entryPath, depth: current.depth + 1 });
+ }
+ continue;
+ }
+
+ if (!entry.isFile()) continue;
+ files.push(relativePath);
+ if (files.length >= MAX_RECURSIVE_SCAN_FILES) break;
+ }
+ }
+
+ return files;
+}
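The bounded traversal in `scanProjectFiles` is a plain breadth-first walk with depth, file-count, and ignore-list caps. A minimal sketch of the same shape, run over an in-memory tree instead of the filesystem (the `TreeNode` shape, tree data, and limits are illustrative, not part of GSD):

```typescript
// Minimal sketch of the bounded breadth-first scan above.
interface TreeNode {
  name: string;
  children?: TreeNode[]; // present → directory, absent → file
}

const IGNORED_DIRS = new Set(["node_modules", ".git"]);
const MAX_FILES = 100;
const MAX_DEPTH = 2;

function scanTree(root: TreeNode): string[] {
  const files: string[] = [];
  const queue: Array<{ node: TreeNode; path: string; depth: number }> = [
    { node: root, path: "", depth: 0 },
  ];
  while (queue.length > 0 && files.length < MAX_FILES) {
    const { node, path, depth } = queue.shift()!;
    for (const child of node.children ?? []) {
      const childPath = path ? `${path}/${child.name}` : child.name;
      if (child.children) {
        // Descend only within the depth cap and outside ignored directories.
        if (depth < MAX_DEPTH && !IGNORED_DIRS.has(child.name)) {
          queue.push({ node: child, path: childPath, depth: depth + 1 });
        }
      } else {
        files.push(childPath);
        if (files.length >= MAX_FILES) break;
      }
    }
  }
  return files;
}

const found = scanTree({
  name: "",
  children: [
    { name: "a.txt" },
    { name: "node_modules", children: [{ name: "x.js" }] },
    { name: "src", children: [{ name: "b.ts" }] },
  ],
});
// found: ["a.txt", "src/b.ts"]; node_modules is never entered
```

Breadth-first order keeps shallow markers (the most common case) ahead of deeply nested ones when the file cap truncates the scan.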
diff --git a/src/resources/extensions/gsd/init-wizard.ts b/src/resources/extensions/gsd/init-wizard.ts
index c83cda4a6..de634ce99 100644
--- a/src/resources/extensions/gsd/init-wizard.ts
+++ b/src/resources/extensions/gsd/init-wizard.ts
@@ -15,6 +15,7 @@ import { ensureGitignore, untrackRuntimeFiles } from "./gitignore.js";
import { gsdRoot } from "./paths.js";
import { assertSafeDirectory } from "./validate-directory.js";
import type { ProjectDetection, ProjectSignals } from "./detection.js";
+import { runSkillInstallStep } from "./skill-catalog.js";
// ─── Types ──────────────────────────────────────────────────────────────────────
@@ -223,7 +224,14 @@ export async function showProjectInit(
await customizeAdvancedPrefs(ctx, prefs);
}
- // ── Step 8: Bootstrap .gsd/ ────────────────────────────────────────────────
+ // ── Step 8: Skill Installation ─────────────────────────────────────────────
+ try {
+ await runSkillInstallStep(ctx, signals);
+ } catch {
+ // Non-fatal — skill installation failure should never block project init
+ }
+
+ // ── Step 9: Bootstrap .gsd/ ────────────────────────────────────────────────
bootstrapGsdDirectory(basePath, prefs, signals);
// Ensure .gitignore
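The wizard wraps the new skill step in a bare try/catch so an optional step can fail without aborting init. The same pattern in isolation (function and variable names here are illustrative, not GSD APIs):

```typescript
// Generic "non-fatal step" wrapper: run an optional async step and
// report success, swallowing any failure so the surrounding flow continues.
async function runOptionalStep(step: () => Promise<void>): Promise<boolean> {
  try {
    await step();
    return true;
  } catch {
    // Optional steps (e.g. skill installation) must never block init.
    return false;
  }
}
```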
diff --git a/src/resources/extensions/gsd/observability-validator.ts b/src/resources/extensions/gsd/observability-validator.ts
new file mode 100644
index 000000000..0fb87f5d2
--- /dev/null
+++ b/src/resources/extensions/gsd/observability-validator.ts
@@ -0,0 +1,456 @@
+import { loadFile } from "./files.js";
+import { resolveSliceFile, resolveTaskFile, resolveTasksDir, resolveTaskFiles } from "./paths.js";
+
+export interface ValidationIssue {
+ severity: "info" | "warning" | "error";
+ scope: "slice-plan" | "task-plan" | "task-summary" | "slice-summary";
+ file: string;
+ ruleId: string;
+ message: string;
+ suggestion?: string;
+}
+
+function getSection(content: string, heading: string, level: number = 2): string | null {
+ const prefix = "#".repeat(level) + " ";
+ const escaped = heading.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
+ const regex = new RegExp(`^${prefix}${escaped}\\s*$`, "m");
+ const match = regex.exec(content);
+ if (!match) return null;
+
+ const start = match.index + match[0].length;
+ const rest = content.slice(start);
+ const nextHeading = rest.match(new RegExp(`^#{1,${level}} `, "m"));
+ const end = nextHeading ? nextHeading.index! : rest.length;
+ return rest.slice(0, end).trim();
+}
+
+function getFrontmatter(content: string): string | null {
+ const trimmed = content.trimStart();
+ if (!trimmed.startsWith("---")) return null;
+ const afterFirst = trimmed.indexOf("\n");
+ if (afterFirst === -1) return null;
+ const rest = trimmed.slice(afterFirst + 1);
+ const endIdx = rest.indexOf("\n---");
+ if (endIdx === -1) return null;
+ return rest.slice(0, endIdx);
+}
+
+function hasFrontmatterKey(content: string, key: string): boolean {
+ const fm = getFrontmatter(content);
+ if (!fm) return false;
+ return new RegExp(`^${key}:`, "m").test(fm);
+}
+
+function normalizeMeaningfulLines(text: string): string[] {
+ return text
+ .split("\n")
+ .map(line => line.trim())
+ .filter(line => line.length > 0)
+    .filter(line => !line.startsWith("<!--"))
+ .filter(line => !/^[-*]\s*\{\{.+\}\}$/.test(line))
+ .filter(line => !/^\{\{.+\}\}$/.test(line));
+}
+
+function sectionLooksPlaceholderOnly(text: string | null): boolean {
+ if (!text) return true;
+ const lines = normalizeMeaningfulLines(text)
+ .map(line => line.replace(/^[-*]\s+/, "").trim())
+ .filter(line => line.length > 0);
+
+ if (lines.length === 0) return true;
+
+ return lines.every(line => {
+ const lower = line.toLowerCase();
+ return lower === "none" ||
+ lower.endsWith(": none") ||
+ lower.includes("{{") ||
+ lower.includes("}}") ||
+ lower.startsWith("required for non-trivial") ||
+ lower.startsWith("describe how a future agent") ||
+ lower.startsWith("prefer:") ||
+ lower.startsWith("keep this section concise");
+ });
+}
+
+function textSuggestsObservabilityRelevant(content: string): boolean {
+ const lower = content.toLowerCase();
+ const needles = [
+ " api", "route", "server", "worker", "queue", "job", "sync", "import",
+ "webhook", "auth", "db", "database", "migration", "cache", "background",
+ "polling", "realtime", "socket", "stateful", "integration", "ui", "form",
+ "submit", "status", "service", "pipeline", "health endpoint", "error path"
+ ];
+ return needles.some(needle => lower.includes(needle));
+}
+
+function verificationMentionsDiagnostics(section: string | null): boolean {
+ if (!section) return false;
+ const lower = section.toLowerCase();
+ const needles = [
+ "error", "failure", "diagnostic", "status", "health", "inspect", "log",
+ "network", "console", "retry", "last error", "correlation", "readiness"
+ ];
+ return needles.some(needle => lower.includes(needle));
+}
+
+export function validateSlicePlanContent(file: string, content: string): ValidationIssue[] {
+ const issues: ValidationIssue[] = [];
+
+ // ── Plan quality rules (always run, not gated by runtime relevance) ──
+
+ const tasksSection = getSection(content, "Tasks", 2);
+ if (tasksSection) {
+ const lines = tasksSection.split("\n");
+ const taskLinePattern = /^- \[[ x]\] \*\*T\d+:/;
+ const taskLineIndices: number[] = [];
+ for (let i = 0; i < lines.length; i++) {
+ if (taskLinePattern.test(lines[i])) taskLineIndices.push(i);
+ }
+
+ for (let t = 0; t < taskLineIndices.length; t++) {
+ const start = taskLineIndices[t];
+ const end = t + 1 < taskLineIndices.length ? taskLineIndices[t + 1] : lines.length;
+ // Check lines between this task header and the next (or section end)
+ const bodyLines = lines.slice(start + 1, end);
+ const meaningful = bodyLines.filter(l => l.trim().length > 0);
+ if (meaningful.length === 0) {
+ issues.push({
+ severity: "warning",
+ scope: "slice-plan",
+ file,
+ ruleId: "empty_task_entry",
+ message: "Inline task entry has no description content beneath the checkbox line.",
+ suggestion: "Add at least a Why/Files/Do/Verify summary so the task is self-describing.",
+ });
+ }
+ }
+ }
+
+ // ── Observability rules (gated by runtime relevance) ──
+
+ const relevant = textSuggestsObservabilityRelevant(content);
+ if (!relevant) return issues;
+
+ const obs = getSection(content, "Observability / Diagnostics", 2);
+ const verification = getSection(content, "Verification", 2);
+
+ if (!obs) {
+ issues.push({
+ severity: "warning",
+ scope: "slice-plan",
+ file,
+ ruleId: "missing_observability_section",
+ message: "Slice plan appears non-trivial but is missing `## Observability / Diagnostics`.",
+ suggestion: "Add runtime signals, inspection surfaces, failure visibility, and redaction constraints.",
+ });
+ } else if (sectionLooksPlaceholderOnly(obs)) {
+ issues.push({
+ severity: "warning",
+ scope: "slice-plan",
+ file,
+ ruleId: "observability_section_placeholder_only",
+ message: "Slice plan has `## Observability / Diagnostics` but it still looks like placeholder text.",
+ suggestion: "Replace placeholders with concrete signals and inspection surfaces a future agent should trust.",
+ });
+ }
+
+ if (!verificationMentionsDiagnostics(verification)) {
+ issues.push({
+ severity: "warning",
+ scope: "slice-plan",
+ file,
+ ruleId: "verification_missing_diagnostic_check",
+ message: "Slice verification does not appear to include any diagnostic or failure-path check.",
+ suggestion: "Add at least one verification step for inspectable failure state, structured error output, status surface, or equivalent.",
+ });
+ }
+
+ return issues;
+}
+
+export function validateTaskPlanContent(file: string, content: string): ValidationIssue[] {
+ const issues: ValidationIssue[] = [];
+
+ // ── Plan quality rules (always run, not gated by runtime relevance) ──
+
+ // Rule: empty or missing Steps section
+ const stepsSection = getSection(content, "Steps", 2);
+ if (stepsSection === null || sectionLooksPlaceholderOnly(stepsSection)) {
+ issues.push({
+ severity: "warning",
+ scope: "task-plan",
+ file,
+ ruleId: "empty_steps_section",
+ message: "Task plan has an empty or missing `## Steps` section.",
+ suggestion: "Add concrete numbered implementation steps so execution has a clear sequence.",
+ });
+ }
+
+ // Rule: placeholder-only Verification section
+ const verificationSection = getSection(content, "Verification", 2);
+ if (verificationSection !== null && sectionLooksPlaceholderOnly(verificationSection)) {
+ issues.push({
+ severity: "warning",
+ scope: "task-plan",
+ file,
+ ruleId: "placeholder_verification",
+ message: "Task plan has `## Verification` but it still looks like placeholder text.",
+ suggestion: "Replace placeholders with concrete verification commands, test runs, or observable checks.",
+ });
+ }
+
+ // Rule: scope estimate thresholds
+ const fm = getFrontmatter(content);
+ if (fm) {
+ const stepsMatch = fm.match(/^estimated_steps:\s*(\d+)/m);
+ const filesMatch = fm.match(/^estimated_files:\s*(\d+)/m);
+
+ if (stepsMatch) {
+ const estimatedSteps = parseInt(stepsMatch[1], 10);
+ if (estimatedSteps >= 10) {
+ issues.push({
+ severity: "warning",
+ scope: "task-plan",
+ file,
+ ruleId: "scope_estimate_steps_high",
+ message: `Task plan estimates ${estimatedSteps} steps (threshold: 10). Consider splitting into smaller tasks.`,
+ suggestion: "Break the task into sub-tasks or reduce scope so each task stays focused and completable in one pass.",
+ });
+ }
+ }
+
+ if (filesMatch) {
+ const estimatedFiles = parseInt(filesMatch[1], 10);
+ if (estimatedFiles >= 12) {
+ issues.push({
+ severity: "warning",
+ scope: "task-plan",
+ file,
+ ruleId: "scope_estimate_files_high",
+ message: `Task plan estimates ${estimatedFiles} files (threshold: 12). Consider splitting into smaller tasks.`,
+ suggestion: "Break the task into sub-tasks or reduce scope to keep the change footprint manageable.",
+ });
+ }
+ }
+ }
+
+ // Rule: Inputs and Expected Output should contain backtick-wrapped file paths
+ const inputsSection = getSection(content, "Inputs", 2);
+ const outputSection = getSection(content, "Expected Output", 2);
+ const backtickPathPattern = /`[^`]*[./][^`]*`/;
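+  // Sanity check of what this heuristic accepts (illustrative examples only):
+  //   `src/types.ts`  → match (contains "/" and ".")
+  //   `config.json`   → match (contains ".")
+  //   `make build`    → no match (no "." or "/")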
+
+ if (outputSection === null || !backtickPathPattern.test(outputSection)) {
+ issues.push({
+ severity: "warning",
+ scope: "task-plan",
+ file,
+ ruleId: "missing_output_file_paths",
+ message: "Task plan `## Expected Output` is missing or has no backtick-wrapped file paths.",
+ suggestion: "List concrete output file paths in backticks (e.g. `src/types.ts`). These are machine-parsed to derive task dependencies.",
+ });
+ }
+
+ if (inputsSection !== null && inputsSection.trim().length > 0 && !backtickPathPattern.test(inputsSection)) {
+ issues.push({
+ severity: "info",
+ scope: "task-plan",
+ file,
+ ruleId: "missing_input_file_paths",
+ message: "Task plan `## Inputs` has content but no backtick-wrapped file paths.",
+ suggestion: "List input file paths in backticks (e.g. `src/config.json`). These are machine-parsed to derive task dependencies.",
+ });
+ }
+
+ // ── Observability rules (gated by runtime relevance) ──
+
+ const relevant = textSuggestsObservabilityRelevant(content);
+ if (!relevant) return issues;
+
+ const obs = getSection(content, "Observability Impact", 2);
+ if (!obs) {
+ issues.push({
+ severity: "warning",
+ scope: "task-plan",
+ file,
+ ruleId: "missing_observability_impact",
+ message: "Task plan appears runtime-relevant but is missing `## Observability Impact`.",
+ suggestion: "Explain what signals change, how a future agent inspects this task, and what failure state becomes visible.",
+ });
+ } else if (sectionLooksPlaceholderOnly(obs)) {
+ issues.push({
+ severity: "warning",
+ scope: "task-plan",
+ file,
+ ruleId: "observability_impact_placeholder_only",
+ message: "Task plan has `## Observability Impact` but it still looks empty or placeholder-only.",
+ suggestion: "Fill in concrete inspection surfaces or explicitly justify why observability is not applicable.",
+ });
+ }
+
+ return issues;
+}
+
+export function validateTaskSummaryContent(file: string, content: string): ValidationIssue[] {
+ const issues: ValidationIssue[] = [];
+ if (!hasFrontmatterKey(content, "observability_surfaces")) {
+ issues.push({
+ severity: "warning",
+ scope: "task-summary",
+ file,
+ ruleId: "missing_observability_frontmatter",
+ message: "Task summary is missing `observability_surfaces` in frontmatter.",
+ suggestion: "List the durable status/log/error surfaces a future agent should use.",
+ });
+ }
+
+ const diagnostics = getSection(content, "Diagnostics", 2);
+ if (!diagnostics) {
+ issues.push({
+ severity: "warning",
+ scope: "task-summary",
+ file,
+ ruleId: "missing_diagnostics_section",
+ message: "Task summary is missing `## Diagnostics`.",
+ suggestion: "Document how to inspect what this task built later.",
+ });
+ } else if (sectionLooksPlaceholderOnly(diagnostics)) {
+ issues.push({
+ severity: "warning",
+ scope: "task-summary",
+ file,
+ ruleId: "diagnostics_placeholder_only",
+ message: "Task summary diagnostics section still looks like placeholder text.",
+ suggestion: "Replace placeholders with concrete commands, endpoints, logs, error shapes, or failure artifacts.",
+ });
+ }
+
+ const evidence = getSection(content, "Verification Evidence", 2);
+ if (!evidence) {
+ issues.push({
+ severity: "warning",
+ scope: "task-summary",
+ file,
+ ruleId: "evidence_block_missing",
+ message: "Task summary is missing `## Verification Evidence`.",
+ suggestion: "Add a verification evidence table showing gate check results (command, exit code, verdict, duration).",
+ });
+ } else if (sectionLooksPlaceholderOnly(evidence)) {
+ issues.push({
+ severity: "warning",
+ scope: "task-summary",
+ file,
+ ruleId: "evidence_block_placeholder",
+ message: "Task summary verification evidence section still looks like placeholder text.",
+ suggestion: "Replace placeholders with actual gate results or note that no verification commands were discovered.",
+ });
+ }
+
+ return issues;
+}
+
+export function validateSliceSummaryContent(file: string, content: string): ValidationIssue[] {
+ const issues: ValidationIssue[] = [];
+ if (!hasFrontmatterKey(content, "observability_surfaces")) {
+ issues.push({
+ severity: "warning",
+ scope: "slice-summary",
+ file,
+ ruleId: "missing_observability_frontmatter",
+ message: "Slice summary is missing `observability_surfaces` in frontmatter.",
+ suggestion: "List the authoritative diagnostics and durable inspection surfaces for this slice.",
+ });
+ }
+
+ const diagnostics = getSection(content, "Authoritative diagnostics", 3);
+ if (!diagnostics) {
+ issues.push({
+ severity: "warning",
+ scope: "slice-summary",
+ file,
+ ruleId: "missing_authoritative_diagnostics",
+ message: "Slice summary is missing `### Authoritative diagnostics` in Forward Intelligence.",
+ suggestion: "Tell future agents where to look first and why that signal is trustworthy.",
+ });
+ } else if (sectionLooksPlaceholderOnly(diagnostics)) {
+ issues.push({
+ severity: "warning",
+ scope: "slice-summary",
+ file,
+ ruleId: "authoritative_diagnostics_placeholder_only",
+ message: "Slice summary includes authoritative diagnostics but it still looks like placeholder text.",
+ suggestion: "Replace placeholders with the real first-stop diagnostic surface for this slice.",
+ });
+ }
+
+ return issues;
+}
+
+export async function validatePlanBoundary(basePath: string, milestoneId: string, sliceId: string): Promise<ValidationIssue[]> {
+ const issues: ValidationIssue[] = [];
+ const slicePlan = resolveSliceFile(basePath, milestoneId, sliceId, "PLAN");
+ if (slicePlan) {
+ const content = await loadFile(slicePlan);
+ if (content) issues.push(...validateSlicePlanContent(slicePlan, content));
+ }
+
+ const tasksDir = resolveTasksDir(basePath, milestoneId, sliceId);
+ const taskPlans = tasksDir ? resolveTaskFiles(tasksDir, "PLAN") : [];
+ for (const file of taskPlans) {
+ const taskId = file.split("-")[0];
+ const taskPlan = resolveTaskFile(basePath, milestoneId, sliceId, taskId, "PLAN");
+ if (!taskPlan) continue;
+ const content = await loadFile(taskPlan);
+ if (content) issues.push(...validateTaskPlanContent(taskPlan, content));
+ }
+
+ return issues;
+}
+
+export async function validateExecuteBoundary(basePath: string, milestoneId: string, sliceId: string, taskId: string): Promise<ValidationIssue[]> {
+ const issues: ValidationIssue[] = [];
+ const slicePlan = resolveSliceFile(basePath, milestoneId, sliceId, "PLAN");
+ if (slicePlan) {
+ const content = await loadFile(slicePlan);
+ if (content) issues.push(...validateSlicePlanContent(slicePlan, content));
+ }
+
+ const taskPlan = resolveTaskFile(basePath, milestoneId, sliceId, taskId, "PLAN");
+ if (taskPlan) {
+ const content = await loadFile(taskPlan);
+ if (content) issues.push(...validateTaskPlanContent(taskPlan, content));
+ }
+
+ return issues;
+}
+
+export async function validateCompleteBoundary(basePath: string, milestoneId: string, sliceId: string): Promise<ValidationIssue[]> {
+ const issues: ValidationIssue[] = [];
+ const tasksDir = resolveTasksDir(basePath, milestoneId, sliceId);
+ const taskSummaries = tasksDir ? resolveTaskFiles(tasksDir, "SUMMARY") : [];
+ for (const file of taskSummaries) {
+ const taskId = file.split("-")[0];
+ const taskSummary = resolveTaskFile(basePath, milestoneId, sliceId, taskId, "SUMMARY");
+ if (!taskSummary) continue;
+ const content = await loadFile(taskSummary);
+ if (content) issues.push(...validateTaskSummaryContent(taskSummary, content));
+ }
+
+ const sliceSummary = resolveSliceFile(basePath, milestoneId, sliceId, "SUMMARY");
+ if (sliceSummary) {
+ const content = await loadFile(sliceSummary);
+ if (content) issues.push(...validateSliceSummaryContent(sliceSummary, content));
+ }
+
+ return issues;
+}
+
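+/**
+ * Render issues as a short bullet list for inline display, capped at `limit`
+ * entries. Example shape (illustrative):
+ *
+ *   - PLAN.md: Task plan has an empty or missing `## Steps` section.
+ *   - ...and 3 more
+ */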
+export function formatValidationIssues(issues: ValidationIssue[], limit: number = 4): string {
+ if (issues.length === 0) return "";
+ const lines = issues.slice(0, limit).map(issue => {
+ const fileName = issue.file.split("/").pop() || issue.file;
+ return `- ${fileName}: ${issue.message}`;
+ });
+ if (issues.length > limit) lines.push(`- ...and ${issues.length - limit} more`);
+ return lines.join("\n");
+}
diff --git a/src/resources/extensions/gsd/preferences-skills.ts b/src/resources/extensions/gsd/preferences-skills.ts
index b449af8b4..1ad5a6d39 100644
--- a/src/resources/extensions/gsd/preferences-skills.ts
+++ b/src/resources/extensions/gsd/preferences-skills.ts
@@ -8,7 +8,6 @@
import { existsSync, readdirSync } from "node:fs";
import { homedir } from "node:os";
import { isAbsolute, join } from "node:path";
-import { getAgentDir } from "@gsd/pi-coding-agent";
import { statSync } from "node:fs";
import type {
@@ -25,13 +24,20 @@ export type { GSDSkillRule, SkillDiscoveryMode, SkillResolution, SkillResolution
/**
* Known skill directories, in priority order.
- * User skills (~/.gsd/agent/skills/) take precedence over project skills.
+ * Global skills (~/.agents/skills/) take precedence over project skills.
+ * Legacy ~/.gsd/agent/skills/ is included as a fallback for pre-migration installs.
*/
export function getSkillSearchDirs(cwd: string): Array<{ dir: string; method: SkillResolution["method"] }> {
- return [
- { dir: join(getAgentDir(), "skills"), method: "user-skill" },
- { dir: join(cwd, ".pi", "agent", "skills"), method: "project-skill" },
+ const dirs: Array<{ dir: string; method: SkillResolution["method"] }> = [
+ { dir: join(homedir(), ".agents", "skills"), method: "user-skill" },
+ { dir: join(cwd, ".agents", "skills"), method: "project-skill" },
];
+ // Legacy fallback — read skills from old GSD directory only if migration hasn't completed
+ const legacyDir = join(homedir(), ".gsd", "agent", "skills");
+ if (existsSync(legacyDir) && !existsSync(join(legacyDir, ".migrated-to-agents"))) {
+ dirs.push({ dir: legacyDir, method: "user-skill" });
+ }
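+  // Resulting priority when the legacy dir exists and is unmigrated:
+  //   ~/.agents/skills → <cwd>/.agents/skills → ~/.gsd/agent/skills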
+ return dirs;
}
/**
diff --git a/src/resources/extensions/gsd/roadmap-mutations.ts b/src/resources/extensions/gsd/roadmap-mutations.ts
new file mode 100644
index 000000000..39521462b
--- /dev/null
+++ b/src/resources/extensions/gsd/roadmap-mutations.ts
@@ -0,0 +1,134 @@
+/**
+ * Roadmap Mutations — shared utilities for modifying roadmap checkbox state.
+ *
+ * Extracts the duplicated "flip slice checkbox" pattern that existed in
+ * doctor.ts, mechanical-completion.ts, and auto-recovery.ts.
+ */
+
+import { readFileSync } from "node:fs";
+import { atomicWriteSync } from "./atomic-write.js";
+import { resolveMilestoneFile } from "./paths.js";
+import { clearParseCache } from "./files.js";
+
+/**
+ * Mark a slice as done ([x]) in the milestone roadmap.
+ * Idempotent — no-op if already checked or if the slice isn't found.
+ *
+ * @returns true if the roadmap was modified, false if no change was needed
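+ *
+ * @example
+ * // Hypothetical arguments — flips "- [ ] **S01: Title**" to "- [x] **S01: Title**"
+ * markSliceDoneInRoadmap(gsdRoot, "M01", "S01");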
+ */
+export function markSliceDoneInRoadmap(basePath: string, mid: string, sid: string): boolean {
+ const roadmapFile = resolveMilestoneFile(basePath, mid, "ROADMAP");
+ if (!roadmapFile) return false;
+
+ let content: string;
+ try {
+ content = readFileSync(roadmapFile, "utf-8");
+ } catch {
+ return false;
+ }
+
+ // Try checkbox format first: "- [ ] **S01: Title**"
+ let updated = content.replace(
+ new RegExp(`^(\\s*-\\s+)\\[ \\]\\s+\\*\\*${sid}:`, "m"),
+ `$1[x] **${sid}:`,
+ );
+
+ // If checkbox format didn't match, try prose format: "## S01: Title" -> "## S01: \u2713 Title"
+ if (updated === content) {
+ updated = content.replace(
+ new RegExp(`^(#{1,4}\\s+(?:\\*{0,2})(?:Slice\\s+)?${sid}\\*{0,2}[:\\s.\\u2014\\u2013-]+\\s*)(.+)`, "m"),
+ (match, prefix, title) => {
+ // Already marked done — no-op
+ if (/^\u2713/.test(title) || /\(Complete\)\s*$/i.test(title)) return match;
+ return `${prefix}\u2713 ${title}`;
+ },
+ );
+ }
+
+ if (updated === content) return false;
+
+ atomicWriteSync(roadmapFile, updated);
+ clearParseCache();
+ return true;
+}
+
+/**
+ * Mark a slice as not done ([ ]) in the milestone roadmap.
+ * Idempotent — no-op if already unchecked or if the slice isn't found.
+ *
+ * @returns true if the roadmap was modified, false if no change was needed
+ */
+export function markSliceUndoneInRoadmap(basePath: string, mid: string, sid: string): boolean {
+ const roadmapFile = resolveMilestoneFile(basePath, mid, "ROADMAP");
+ if (!roadmapFile) return false;
+
+ let content: string;
+ try {
+ content = readFileSync(roadmapFile, "utf-8");
+ } catch {
+ return false;
+ }
+
+ const updated = content.replace(
+ new RegExp(`^(\\s*-\\s+)\\[x\\]\\s+\\*\\*${sid}:`, "m"),
+ `$1[ ] **${sid}:`,
+ );
+
+ if (updated === content) return false;
+
+ atomicWriteSync(roadmapFile, updated);
+ clearParseCache();
+ return true;
+}
+
+/**
+ * Mark a task as done ([x]) in the slice plan.
+ * Idempotent — no-op if already checked or if the task isn't found.
+ *
+ * @returns true if the plan was modified, false if no change was needed
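+ *
+ * @example
+ * // Hypothetical arguments — flips "- [ ] **T03: Title**" to "- [x] **T03: Title**"
+ * markTaskDoneInPlan(gsdRoot, planPath, "T03");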
+ */
+export function markTaskDoneInPlan(basePath: string, planPath: string, tid: string): boolean {
+ let content: string;
+ try {
+ content = readFileSync(planPath, "utf-8");
+ } catch {
+ return false;
+ }
+
+ const updated = content.replace(
+ new RegExp(`^(\\s*-\\s+)\\[ \\]\\s+\\*\\*${tid}:`, "m"),
+ `$1[x] **${tid}:`,
+ );
+
+ if (updated === content) return false;
+
+ atomicWriteSync(planPath, updated);
+ clearParseCache();
+ return true;
+}
+
+/**
+ * Mark a task as not done ([ ]) in the slice plan.
+ * Idempotent — no-op if already unchecked or if the task isn't found.
+ *
+ * @returns true if the plan was modified, false if no change was needed
+ */
+export function markTaskUndoneInPlan(basePath: string, planPath: string, tid: string): boolean {
+ let content: string;
+ try {
+ content = readFileSync(planPath, "utf-8");
+ } catch {
+ return false;
+ }
+
+ const updated = content.replace(
+ new RegExp(`^(\\s*-\\s+)\\[x\\]\\s+\\*\\*${tid}:`, "mi"),
+ `$1[ ] **${tid}:`,
+ );
+
+ if (updated === content) return false;
+
+ atomicWriteSync(planPath, updated);
+ clearParseCache();
+ return true;
+}
diff --git a/src/resources/extensions/gsd/skill-catalog.ts b/src/resources/extensions/gsd/skill-catalog.ts
new file mode 100644
index 000000000..8f1c5d760
--- /dev/null
+++ b/src/resources/extensions/gsd/skill-catalog.ts
@@ -0,0 +1,1085 @@
+/**
+ * GSD Skill Catalog — Curated skill packs mapped to tech stacks.
+ *
+ * Each pack maps a detected (or user-chosen) tech stack to a skills.sh
+ * repo + specific skill names. The init wizard uses this catalog to
+ * install relevant skills during project onboarding.
+ *
+ * Installation is delegated entirely to the skills.sh CLI:
+ *     npx skills add <owner>/<repo> --skill <name> [--skill <name> ...] -y
+ *
+ * Skills are installed into ~/.agents/skills/ (the industry-standard
+ * ecosystem directory shared across all agents).
+ */
+
+import { execFile } from "node:child_process";
+import { existsSync } from "node:fs";
+import { join } from "node:path";
+import { homedir } from "node:os";
+import type { ExtensionCommandContext } from "@gsd/pi-coding-agent";
+import { showNextAction } from "../shared/tui.js";
+import type { ProjectSignals, XcodePlatform } from "./detection.js";
+
+// ─── Catalog Types ────────────────────────────────────────────────────────────
+
+export interface SkillPack {
+ /** Human-readable name shown in the wizard */
+ label: string;
+ /** Short description */
+ description: string;
+ /** skills.sh repo identifier (owner/repo) */
+ repo: string;
+ /** Specific skill names to install from the repo */
+ skills: string[];
+ /** Which detected primaryLanguage values trigger this pack */
+ matchLanguages?: string[];
+ /** Which detected project files trigger this pack */
+ matchFiles?: string[];
+ /** Trigger when Xcode project targets one of these platforms */
+ matchXcodePlatforms?: XcodePlatform[];
+ /** Always include this pack in brownfield recommendations */
+ matchAlways?: boolean;
+}
+
+// ─── Curated Catalog ──────────────────────────────────────────────────────────
+
+export const SKILL_CATALOG: SkillPack[] = [
+ // ── Swift (language-level — any Swift project) ────────────────────────────
+ {
+ label: "SwiftUI",
+ description: "SwiftUI layout, navigation, animations, gestures, Liquid Glass",
+ repo: "dpearson2699/swift-ios-skills",
+ skills: [
+ "swiftui-animation",
+ "swiftui-gestures",
+ "swiftui-layout-components",
+ "swiftui-liquid-glass",
+ "swiftui-navigation",
+ "swiftui-patterns",
+ "swiftui-performance",
+ "swiftui-uikit-interop",
+ ],
+ matchLanguages: ["swift"],
+ matchFiles: ["Package.swift"],
+ },
+ {
+ label: "Swift Core",
+ description: "Swift language, concurrency, Codable, Charts, Testing, SwiftData",
+ repo: "dpearson2699/swift-ios-skills",
+ skills: [
+ "swift-codable",
+ "swift-charts",
+ "swift-concurrency",
+ "swift-language",
+ "swift-testing",
+ "swiftdata",
+ ],
+ matchLanguages: ["swift"],
+ matchFiles: ["Package.swift"],
+ },
+ // ── iOS (Xcode project targeting iphoneos required) ───────────────────────
+ {
+ label: "iOS App Frameworks",
+ description: "App Intents, Widgets, StoreKit, MapKit, Live Activities, push notifications",
+ repo: "dpearson2699/swift-ios-skills",
+ skills: [
+ "alarmkit",
+ "app-clips",
+ "app-intents",
+ "live-activities",
+ "mapkit-location",
+ "photos-camera-media",
+ "push-notifications",
+ "storekit",
+ "tipkit",
+ "widgetkit",
+ ],
+ matchXcodePlatforms: ["iphoneos"],
+ },
+ {
+ label: "iOS Data Frameworks",
+ description: "CloudKit, HealthKit, MusicKit, WeatherKit, Contacts, Calendar",
+ repo: "dpearson2699/swift-ios-skills",
+ skills: [
+ "cloudkit-sync",
+ "contacts-framework",
+ "eventkit-calendar",
+ "healthkit",
+ "musickit-audio",
+ "passkit-wallet",
+ "weatherkit",
+ ],
+ matchXcodePlatforms: ["iphoneos"],
+ },
+ {
+ label: "iOS AI & ML",
+ description: "Core ML, Vision, on-device AI, speech recognition, NLP",
+ repo: "dpearson2699/swift-ios-skills",
+ skills: [
+ "apple-on-device-ai",
+ "coreml",
+ "natural-language",
+ "speech-recognition",
+ "vision-framework",
+ ],
+ matchXcodePlatforms: ["iphoneos"],
+ },
+ {
+ label: "iOS Engineering",
+ description: "Networking, security, accessibility, localization, Instruments, App Store review",
+ repo: "dpearson2699/swift-ios-skills",
+ skills: [
+ "app-store-review",
+ "authentication",
+ "background-processing",
+ "debugging-instruments",
+ "device-integrity",
+ "ios-accessibility",
+ "ios-localization",
+ "ios-networking",
+ "ios-security",
+ "metrickit-diagnostics",
+ ],
+ matchXcodePlatforms: ["iphoneos"],
+ },
+ {
+ label: "iOS Hardware",
+ description: "Bluetooth, CoreMotion, NFC, PencilKit, RealityKit AR",
+ repo: "dpearson2699/swift-ios-skills",
+ skills: [
+ "core-bluetooth",
+ "core-motion",
+ "core-nfc",
+ "pencilkit-drawing",
+ "realitykit-ar",
+ ],
+ matchXcodePlatforms: ["iphoneos"],
+ },
+ {
+ label: "iOS Platform",
+ description: "CallKit, EnergyKit, HomeKit, SharePlay, PermissionKit",
+ repo: "dpearson2699/swift-ios-skills",
+ skills: [
+ "callkit-voip",
+ "energykit",
+ "homekit-matter",
+ "permissionkit",
+ "shareplay-activities",
+ ],
+ matchXcodePlatforms: ["iphoneos"],
+ },
+ // ── React / Next.js ───────────────────────────────────────────────────────
+ {
+ label: "React & Web Frontend",
+ description: "React best practices and composition patterns",
+ repo: "vercel-labs/agent-skills",
+ skills: [
+ "vercel-react-best-practices",
+ "vercel-composition-patterns",
+ ],
+ matchLanguages: ["javascript/typescript"],
+ },
+ {
+ label: "shadcn/ui",
+ description: "shadcn/ui component library patterns and usage",
+ repo: "shadcn/ui",
+ skills: ["shadcn"],
+ matchLanguages: ["javascript/typescript"],
+ },
+ // ── React Native ──────────────────────────────────────────────────────────
+ {
+ label: "React Native",
+ description: "React Native and Expo best practices for performant mobile apps",
+ repo: "vercel-labs/agent-skills",
+ skills: ["vercel-react-native-skills"],
+ matchFiles: ["metro.config.js", "metro.config.ts", "react-native.config.js"],
+ },
+ {
+ label: "React Native Architecture",
+ description: "React Native app architecture, navigation, and cross-platform design patterns",
+ repo: "wshobson/agents",
+ skills: ["react-native-architecture", "react-native-design"],
+ matchFiles: ["metro.config.js", "metro.config.ts", "react-native.config.js"],
+ },
+ // ── TypeScript & JS Ecosystem (wshobson/agents — 41K combined installs) ──
+ {
+ label: "TypeScript & JS Development",
+ description: "Advanced TypeScript types, Node.js backend, testing, and modern JS patterns",
+ repo: "wshobson/agents",
+ skills: [
+ "typescript-advanced-types",
+ "nodejs-backend-patterns",
+ "javascript-testing-patterns",
+ "modern-javascript-patterns",
+ ],
+ matchLanguages: ["javascript/typescript"],
+ },
+ // ── React State (wshobson/agents — 8.1K combined installs) ─────────────
+ {
+ label: "React State & Patterns",
+ description: "State management with Zustand, Jotai, React Query, and React modernization",
+ repo: "wshobson/agents",
+ skills: ["react-state-management", "react-modernization"],
+ matchLanguages: ["javascript/typescript"],
+ },
+ // ── Tailwind CSS (wshobson/agents — 22.8K installs) ───────────────────
+ {
+ label: "Tailwind CSS",
+ description: "Tailwind v4 design system, CVA patterns, and utility-first CSS",
+ repo: "wshobson/agents",
+ skills: ["tailwind-design-system"],
+ matchFiles: [
+ "tailwind.config.js",
+ "tailwind.config.ts",
+ "tailwind.config.mjs",
+ "tailwind.config.cjs",
+ ],
+ },
+ // ── General Frontend ──────────────────────────────────────────────────────
+ {
+ label: "Frontend Design & UX",
+ description: "Frontend design, accessibility, and browser automation",
+ repo: "anthropics/skills",
+ skills: ["frontend-design"],
+ matchLanguages: ["javascript/typescript"],
+ },
+ // ── Angular ───────────────────────────────────────────────────────────────
+ {
+ label: "Angular",
+ description: "Angular components, signals, forms, routing, and testing",
+ repo: "analogjs/angular-skills",
+ skills: [
+ "angular-component",
+ "angular-signals",
+ "angular-forms",
+ "angular-routing",
+ "angular-testing",
+ ],
+ matchFiles: ["angular.json"],
+ },
+ {
+ label: "Angular Migration",
+ description: "Migrate from AngularJS to Angular with hybrid mode and incremental rewriting",
+ repo: "wshobson/agents",
+ skills: ["angular-migration"],
+ matchFiles: ["angular.json"],
+ },
+ // ── Vue.js / Nuxt ────────────────────────────────────────────────────────
+ {
+ label: "Vue.js",
+ description: "Vue best practices, Pinia state, Vue Router, and testing",
+ repo: "vuejs-ai/skills",
+ skills: [
+ "vue-best-practices",
+ "vue-pinia-best-practices",
+ "vue-router-best-practices",
+ "vue-testing-best-practices",
+ ],
+ matchFiles: ["nuxt.config.ts", "nuxt.config.js", "vue.config.js", "vue.config.ts", "*.vue"],
+ },
+ // ── Svelte / SvelteKit ────────────────────────────────────────────────────
+ {
+ label: "Svelte",
+ description: "Svelte code patterns and SvelteKit best practices",
+ repo: "sveltejs/ai-tools",
+ skills: ["svelte-code-writer", "svelte-core-bestpractices"],
+ matchFiles: ["svelte.config.js", "svelte.config.ts"],
+ },
+ // ── Next.js ───────────────────────────────────────────────────────────────
+ {
+ label: "Next.js",
+ description: "Next.js app router, server components, and deployment patterns",
+ repo: "vercel-labs/vercel-plugin",
+ skills: ["nextjs"],
+ matchFiles: ["next.config.js", "next.config.ts", "next.config.mjs"],
+ },
+ {
+ label: "Next.js App Router Patterns",
+ description: "Next.js 14+ App Router, React Server Components, and streaming",
+ repo: "wshobson/agents",
+ skills: ["nextjs-app-router-patterns"],
+ matchFiles: ["next.config.js", "next.config.ts", "next.config.mjs"],
+ },
+ // ── Java / Spring Boot ────────────────────────────────────────────────────
+ {
+ label: "Java & Spring Boot",
+ description: "Spring Boot best practices, DI, RESTful APIs, JPA, testing, and security",
+ repo: "github/awesome-copilot",
+ skills: ["java-springboot"],
+ matchFiles: ["dep:spring-boot"],
+ },
+ // ── .NET / C# ────────────────────────────────────────────────────────────
+ {
+ label: ".NET & C#",
+ description: ".NET best practices, design patterns, and upgrade guidance",
+ repo: "github/awesome-copilot",
+ skills: ["dotnet-best-practices", "dotnet-design-pattern-review"],
+ matchLanguages: ["csharp"],
+ matchFiles: ["*.csproj"],
+ },
+ {
+ label: ".NET Backend Patterns",
+ description: ".NET backend architecture, middleware, and production patterns",
+ repo: "wshobson/agents",
+ skills: ["dotnet-backend-patterns"],
+ matchFiles: ["*.csproj", "*.fsproj", "*.sln"],
+ },
+ // ── Flutter / Dart ────────────────────────────────────────────────────────
+ {
+ label: "Flutter",
+ description: "Flutter layouts, architecture, state management, and testing",
+ repo: "flutter/skills",
+ skills: [
+ "flutter-building-layouts",
+ "flutter-architecting-apps",
+ "flutter-managing-state",
+ "flutter-testing-apps",
+ ],
+ matchLanguages: ["dart/flutter"],
+ matchFiles: ["pubspec.yaml"],
+ },
+ // ── PHP / Laravel ─────────────────────────────────────────────────────────
+ {
+ label: "PHP & Laravel",
+ description: "Laravel patterns, PHP best practices, and testing",
+ repo: "jeffallan/claude-skills",
+ skills: ["laravel-specialist", "php-pro"],
+ matchLanguages: ["php"],
+ matchFiles: ["composer.json"],
+ },
+ // ── Django ────────────────────────────────────────────────────────────────
+ {
+ label: "Django",
+ description: "Django expert patterns, models, views, and middleware",
+ repo: "vintasoftware/django-ai-plugins",
+ skills: ["django-expert"],
+ matchFiles: ["manage.py"],
+ },
+ // ── Rust ──────────────────────────────────────────────────────────────────
+ {
+ label: "Rust",
+ description: "Rust language patterns and best practices",
+ repo: "anthropics/skills",
+ skills: ["rust-best-practices"],
+ matchLanguages: ["rust"],
+ matchFiles: ["Cargo.toml"],
+ },
+ {
+ label: "Rust Async Patterns",
+ description: "Async Rust with Tokio, futures, and proper error handling",
+ repo: "wshobson/agents",
+ skills: ["rust-async-patterns"],
+ matchLanguages: ["rust"],
+ matchFiles: ["Cargo.toml"],
+ },
+ // ── Python ────────────────────────────────────────────────────────────────
+ {
+ label: "Python",
+ description: "Python patterns and best practices",
+ repo: "anthropics/skills",
+ skills: ["python-best-practices"],
+ matchLanguages: ["python"],
+ matchFiles: ["pyproject.toml", "setup.py", "requirements.txt"],
+ },
+ {
+ label: "Python Advanced",
+ description: "Python performance, testing, async patterns, and uv package manager",
+ repo: "wshobson/agents",
+ skills: [
+ "python-performance-optimization",
+ "python-testing-patterns",
+ "async-python-patterns",
+ "uv-package-manager",
+ ],
+ matchLanguages: ["python"],
+ matchFiles: ["pyproject.toml", "setup.py", "requirements.txt"],
+ },
+ // FastAPI — detected by scanning requirements.txt / pyproject.toml for the
+ // "fastapi" dependency. Uses the "dep:fastapi" synthetic marker from detection.ts.
+ {
+ label: "FastAPI",
+ description: "Production-ready FastAPI projects with async patterns and error handling",
+ repo: "wshobson/agents",
+ skills: ["fastapi-templates"],
+ matchFiles: ["dep:fastapi"],
+ },
+ // ── Go ────────────────────────────────────────────────────────────────────
+ {
+ label: "Go",
+ description: "Go language patterns and best practices",
+ repo: "anthropics/skills",
+ skills: ["go-best-practices"],
+ matchLanguages: ["go"],
+ matchFiles: ["go.mod"],
+ },
+ {
+ label: "Go Concurrency Patterns",
+ description: "Go concurrency with channels, worker pools, and context cancellation",
+ repo: "wshobson/agents",
+ skills: ["go-concurrency-patterns"],
+ matchLanguages: ["go"],
+ matchFiles: ["go.mod"],
+ },
+ // ── Database / ORM ─────────────────────────────────────────────────────────
+ {
+ label: "Prisma",
+ description: "Prisma ORM setup, schema design, client API, and migrations",
+ repo: "prisma/skills",
+ skills: [
+ "prisma-database-setup",
+ "prisma-client-api",
+ "prisma-cli",
+ ],
+ matchFiles: ["prisma/schema.prisma"],
+ },
+ {
+ label: "Supabase & Postgres",
+ description: "Supabase project setup, auth, Postgres best practices, and Firestore",
+ repo: "supabase/agent-skills",
+ skills: ["supabase-postgres-best-practices"],
+ matchFiles: ["supabase/config.toml"],
+ },
+ {
+ label: "PostgreSQL Design",
+ description: "PostgreSQL table design, indexing strategies, and query optimization",
+ repo: "wshobson/agents",
+ skills: ["postgresql-table-design"],
+ matchFiles: ["supabase/config.toml", "*.sql"],
+ },
+ {
+ label: "SQL Optimization & Review",
+ description: "Universal SQL performance optimization, security (injection prevention), and code review",
+ repo: "github/awesome-copilot",
+ skills: ["sql-optimization", "sql-code-review"],
+ matchFiles: [
+ "*.sql",
+ "*.sqlite",
+ "prisma/schema.prisma",
+ "supabase/config.toml",
+ "drizzle.config.ts",
+ "drizzle.config.js",
+ ],
+ },
+ {
+ label: "Redis",
+ description: "Redis development patterns and best practices",
+ repo: "redis/agent-skills",
+ skills: ["redis-development"],
+ matchFiles: ["redis.conf"],
+ },
+ // ── Cloud Platforms ────────────────────────────────────────────────────────
+ {
+ label: "Firebase",
+ description: "Firebase setup, auth, Firestore, hosting, and AI Logic",
+ repo: "firebase/agent-skills",
+ skills: [
+ "firebase-basics",
+ "firebase-auth-basics",
+ "firebase-firestore-basics",
+ "firebase-hosting-basics",
+ "firebase-ai-logic",
+ ],
+ matchFiles: ["firebase.json"],
+ },
+ {
+ label: "Azure",
+ description: "Azure deployment, AI services, storage, cost optimization, and diagnostics",
+ repo: "microsoft/github-copilot-for-azure",
+ skills: [
+ "azure-deploy",
+ "azure-ai",
+ "azure-storage",
+ "azure-cost-optimization",
+ "azure-diagnostics",
+ ],
+ matchFiles: ["azure-pipelines.yml"],
+ },
+ {
+ label: "AWS",
+ description: "AWS deployment, Lambda, and serverless patterns",
+ repo: "awslabs/agent-plugins",
+ skills: ["deploy", "aws-lambda", "aws-serverless-deployment"],
+ matchFiles: ["cdk.json", "samconfig.toml", "serverless.yml", "serverless.yaml"],
+ },
+ // ── Container / DevOps ─────────────────────────────────────────────────────
+ {
+ label: "Docker",
+ description: "Multi-stage Dockerfiles, layer optimization, and security hardening",
+ repo: "github/awesome-copilot",
+ skills: ["multi-stage-dockerfile"],
+ matchFiles: ["Dockerfile", "docker-compose.yml", "docker-compose.yaml"],
+ },
+ // ── Infrastructure as Code ─────────────────────────────────────────────────
+ {
+ label: "Terraform",
+ description: "Terraform style guide, testing, and stack patterns",
+ repo: "hashicorp/agent-skills",
+ skills: ["terraform-style-guide", "terraform-test", "terraform-stacks"],
+ matchFiles: ["main.tf"],
+ },
+ // ── Android (wshobson/agents — 7K installs) ────────────────────────────────
+ {
+ label: "Android",
+ description: "Android app design following Material Design 3 guidelines",
+ repo: "wshobson/agents",
+ skills: ["mobile-android-design"],
+ matchFiles: ["app/build.gradle", "app/build.gradle.kts"],
+ },
+ // ── Kubernetes (wshobson/agents — 4 skills) ────────────────────────────────
+ {
+ label: "Kubernetes",
+ description: "K8s manifests, Helm charts, GitOps workflows, and security policies",
+ repo: "wshobson/agents",
+ skills: [
+ "k8s-manifest-generator",
+ "helm-chart-scaffolding",
+ "gitops-workflow",
+ "k8s-security-policies",
+ ],
+ matchFiles: ["Chart.yaml", "kustomization.yaml"],
+ },
+ // ── CI/CD (wshobson/agents — 3 skills) ─────────────────────────────────────
+ {
+ label: "CI/CD Automation",
+ description: "Pipeline design, GitHub Actions workflows, and secrets management",
+ repo: "wshobson/agents",
+ skills: [
+ "deployment-pipeline-design",
+ "github-actions-templates",
+ "secrets-management",
+ ],
+ matchFiles: [".github/workflows"],
+ },
+ // ── Blockchain / Web3 (wshobson/agents — 3 skills) ─────────────────────────
+ {
+ label: "Blockchain & Web3",
+ description: "Solidity security, DeFi protocols, and smart contract testing",
+ repo: "wshobson/agents",
+ skills: ["solidity-security", "defi-protocol-templates", "web3-testing"],
+ matchFiles: ["hardhat.config.js", "hardhat.config.ts", "foundry.toml"],
+ },
+ // ── Data Engineering (wshobson/agents — 4 skills) ──────────────────────────
+ {
+ label: "Data Engineering",
+ description: "dbt transformations, Airflow DAGs, Spark optimization, and data quality",
+ repo: "wshobson/agents",
+ skills: [
+ "dbt-transformation-patterns",
+ "airflow-dag-patterns",
+ "spark-optimization",
+ "data-quality-frameworks",
+ ],
+ matchFiles: ["dbt_project.yml", "airflow.cfg"],
+ },
+ // ── Game Development — Unity (wshobson/agents) ─────────────────────────────
+ {
+ label: "Unity",
+ description: "Unity ECS patterns for high-performance game systems",
+ repo: "wshobson/agents",
+ skills: ["unity-ecs-patterns"],
+ matchFiles: ["ProjectSettings/ProjectVersion.txt"],
+ },
+ // ── Game Development — Godot (wshobson/agents) ─────────────────────────────
+ {
+ label: "Godot",
+ description: "Godot GDScript best practices and scene composition",
+ repo: "wshobson/agents",
+ skills: ["godot-gdscript-patterns"],
+ matchFiles: ["project.godot"],
+ },
+ // ── Essential (all projects) ────────────────────────────────────────────
+ {
+ label: "Skill Discovery",
+ description: "Find and install new agent skills from the ecosystem",
+ repo: "vercel-labs/skills",
+ skills: ["find-skills"],
+ matchAlways: true,
+ },
+ {
+ label: "Skill Authoring",
+ description: "Create, audit, and refine SKILL.md files",
+ repo: "anthropics/skills",
+ skills: ["skill-creator"],
+ matchAlways: true,
+ },
+ {
+ label: "Browser Automation",
+ description: "Browser automation for web scraping, testing, and interaction",
+ repo: "vercel-labs/agent-browser",
+ skills: ["agent-browser"],
+ matchAlways: true,
+ },
+ // ── General Tooling ───────────────────────────────────────────────────────
+ {
+ label: "Document Handling",
+ description: "PDF, DOCX, XLSX, PPTX creation and manipulation",
+ repo: "anthropics/skills",
+ skills: ["pdf", "docx", "xlsx", "pptx"],
+ matchAlways: true,
+ },
+ // ── Code Quality (wshobson/agents — matchAlways) ──────────────────────────
+ {
+ label: "Code Review & Quality",
+ description: "Code review excellence and error handling patterns",
+ repo: "wshobson/agents",
+ skills: ["code-review-excellence", "error-handling-patterns"],
+ matchAlways: true,
+ },
+ {
+ label: "Git Advanced Workflows",
+ description: "Advanced Git rebasing, cherry-picking, bisect, worktrees, and reflog",
+ repo: "wshobson/agents",
+ skills: ["git-advanced-workflows"],
+ matchAlways: true,
+ },
+];
+
+// ─── Greenfield Tech Stack Choices ────────────────────────────────────────────
+
+/**
+ * Tech stack → pack mappings for programmatic use.
+ *
+ * NOT shown directly to users during init (greenfield installs essentials
+ * only and defers stack-specific skills). These mappings are available for:
+ * 1. The LLM to install skills after establishing a design
+ * 2. The `/gsd skills` command (explicit user request)
+ * 3. Re-running brownfield detection after project files are created
+ */
+export const GREENFIELD_STACKS: Array<{
+ id: string;
+ label: string;
+ description: string;
+ packs: string[];
+}> = [
+ {
+ id: "ios",
+ label: "iOS App",
+ description: "Full iOS development — SwiftUI, Swift, and all iOS frameworks",
+ packs: [
+ "SwiftUI",
+ "Swift Core",
+ "iOS App Frameworks",
+ "iOS Data Frameworks",
+ "iOS AI & ML",
+ "iOS Engineering",
+ "iOS Hardware",
+ "iOS Platform",
+ ],
+ },
+ {
+ id: "swift",
+ label: "Swift (non-iOS)",
+ description: "Swift packages, server-side Swift, CLI tools, SwiftUI without iOS",
+ packs: ["SwiftUI", "Swift Core"],
+ },
+ {
+ id: "react-web",
+ label: "React Web",
+ description: "React, Next.js, shadcn/ui, web frontend",
+ packs: [
+ "React & Web Frontend",
+ "TypeScript & JS Development",
+ "React State & Patterns",
+ "Tailwind CSS",
+ "shadcn/ui",
+ "Frontend Design & UX",
+ ],
+ },
+ {
+ id: "react-native",
+ label: "React Native",
+ description: "Cross-platform mobile with React Native",
+ packs: ["React Native", "React Native Architecture", "React & Web Frontend", "TypeScript & JS Development"],
+ },
+ {
+ id: "fullstack-js",
+ label: "Full-Stack JavaScript/TypeScript",
+ description: "Node.js backend + React frontend",
+ packs: [
+ "React & Web Frontend",
+ "TypeScript & JS Development",
+ "React State & Patterns",
+ "Tailwind CSS",
+ "shadcn/ui",
+ "Frontend Design & UX",
+ "Prisma",
+ ],
+ },
+ {
+ id: "rust",
+ label: "Rust",
+ description: "Systems programming with Rust",
+ packs: ["Rust", "Rust Async Patterns"],
+ },
+ {
+ id: "python",
+ label: "Python",
+ description: "Python applications, scripts, or ML",
+ packs: ["Python", "Python Advanced"],
+ },
+ {
+ id: "go",
+ label: "Go",
+ description: "Go services and CLIs",
+ packs: ["Go", "Go Concurrency Patterns"],
+ },
+ {
+ id: "firebase",
+ label: "Firebase",
+ description: "Firebase backend — auth, Firestore, hosting, AI",
+ packs: ["Firebase"],
+ },
+ {
+ id: "aws",
+ label: "AWS",
+ description: "AWS deployment, Lambda, serverless",
+ packs: ["AWS"],
+ },
+ {
+ id: "azure",
+ label: "Azure",
+ description: "Azure deployment, AI, storage, diagnostics",
+ packs: ["Azure"],
+ },
+ {
+ id: "angular",
+ label: "Angular",
+ description: "Angular components, signals, forms, routing",
+ packs: ["Angular", "Angular Migration", "Frontend Design & UX"],
+ },
+ {
+ id: "vue",
+ label: "Vue.js / Nuxt",
+ description: "Vue.js with Pinia, Vue Router, and testing",
+ packs: ["Vue.js", "Frontend Design & UX"],
+ },
+ {
+ id: "svelte",
+ label: "Svelte / SvelteKit",
+ description: "Svelte 5 and SvelteKit patterns",
+ packs: ["Svelte", "Tailwind CSS", "Frontend Design & UX"],
+ },
+ {
+ id: "nextjs",
+ label: "Next.js",
+ description: "Next.js app router, React, and Vercel deployment",
+ packs: [
+ "Next.js",
+ "Next.js App Router Patterns",
+ "React & Web Frontend",
+ "TypeScript & JS Development",
+ "Tailwind CSS",
+ "shadcn/ui",
+ ],
+ },
+ {
+ id: "flutter",
+ label: "Flutter",
+ description: "Cross-platform Flutter/Dart development",
+ packs: ["Flutter"],
+ },
+ {
+ id: "java",
+ label: "Java / Spring Boot",
+ description: "Spring Boot APIs, JPA, and testing",
+ packs: ["Java & Spring Boot"],
+ },
+ {
+ id: "dotnet",
+ label: ".NET / C#",
+ description: "ASP.NET Core, Entity Framework, and design patterns",
+ packs: [".NET & C#", ".NET Backend Patterns"],
+ },
+ {
+ id: "php",
+ label: "PHP / Laravel",
+ description: "Laravel patterns and PHP best practices",
+ packs: ["PHP & Laravel"],
+ },
+ {
+ id: "django",
+ label: "Django",
+ description: "Django models, views, middleware, and Celery",
+ packs: ["Django", "Python", "Python Advanced"],
+ },
+ {
+ id: "fastapi",
+ label: "FastAPI",
+ description: "FastAPI web APIs with async patterns",
+ packs: ["FastAPI", "Python", "Python Advanced"],
+ },
+ {
+ id: "android",
+ label: "Android / Kotlin",
+ description: "Android app development with Material Design 3",
+ packs: ["Android"],
+ },
+ {
+ id: "kubernetes",
+ label: "Kubernetes",
+ description: "Kubernetes manifests, Helm charts, and GitOps",
+ packs: ["Kubernetes", "Docker"],
+ },
+ {
+ id: "blockchain",
+ label: "Blockchain / Web3",
+ description: "Solidity, DeFi protocols, and smart contract testing",
+ packs: ["Blockchain & Web3"],
+ },
+ {
+ id: "data-engineering",
+ label: "Data Engineering",
+ description: "dbt, Airflow, Spark, and data quality",
+ packs: ["Data Engineering", "Python", "Python Advanced"],
+ },
+ {
+ id: "unity",
+ label: "Unity",
+ description: "Unity game development with ECS patterns",
+ packs: ["Unity"],
+ },
+ {
+ id: "godot",
+ label: "Godot",
+ description: "Godot game development with GDScript",
+ packs: ["Godot"],
+ },
+ {
+ id: "other",
+ label: "Other / Skip",
+ description: "Install skills later with npx skills add",
+ packs: [],
+ },
+];
+
+// ─── Detection → Pack Matching ────────────────────────────────────────────────
+
+/**
+ * Match project signals to relevant skill packs.
+ * Returns packs in catalog order (not sorted by match type).
+ */
+export function matchPacksForProject(signals: ProjectSignals): SkillPack[] {
+ const matched = new Set<SkillPack>();
+
+ for (const pack of SKILL_CATALOG) {
+ // Language match
+ if (pack.matchLanguages && signals.primaryLanguage) {
+ if (pack.matchLanguages.includes(signals.primaryLanguage)) {
+ matched.add(pack);
+ continue;
+ }
+ }
+
+ // File match
+ if (pack.matchFiles) {
+ for (const file of pack.matchFiles) {
+ if (signals.detectedFiles.includes(file)) {
+ matched.add(pack);
+ break;
+ }
+ }
+ }
+
+ // Xcode platform match (e.g. iOS packs only when SDKROOT = iphoneos)
+ if (pack.matchXcodePlatforms && signals.xcodePlatforms.length > 0) {
+ const hasMatch = pack.matchXcodePlatforms.some((p) => signals.xcodePlatforms.includes(p));
+ if (hasMatch) matched.add(pack);
+ }
+
+ // Always-include packs (essentials)
+ if (pack.matchAlways) {
+ matched.add(pack);
+ }
+ }
+
+ return [...matched];
+}
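+
+// Example usage (hypothetical signals object, for illustration — other
+// ProjectSignals fields omitted): a project with a Prisma schema matches
+// the Prisma pack via matchFiles, plus every matchAlways pack (Skill
+// Discovery, Document Handling, ...):
+//
+// const packs = matchPacksForProject({
+// primaryLanguage: "javascript/typescript",
+// detectedFiles: ["prisma/schema.prisma"],
+// xcodePlatforms: [],
+// });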
+
+// ─── Installation ─────────────────────────────────────────────────────────────
+
+/**
+ * Install a skill pack via the skills.sh CLI.
+ * Runs: npx --yes skills add <repo> --skill <name> [--skill <name> ...] -y
+ *
+ * Returns true if installation succeeded.
+ */
+export function installSkillPack(pack: SkillPack): Promise<boolean> {
+ return new Promise<boolean>((resolve) => {
+ // --yes = npx auto-install, -y = skills.sh non-interactive
+ const args = ["--yes", "skills", "add", pack.repo];
+
+ for (const skill of pack.skills) {
+ args.push("--skill", skill);
+ }
+ args.push("-y");
+
+ execFile("npx", args, { timeout: 120_000 }, (error) => {
+ resolve(!error);
+ });
+ });
+}
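+
+// For example, a pack { repo: "anthropics/skills", skills: ["pdf", "docx"] }
+// (hypothetical values) produces the command:
+//
+// npx --yes skills add anthropics/skills --skill pdf --skill docx -y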
+
+/**
+ * Install multiple packs, batching by repo to minimize npx invocations.
+ * Returns the labels of successfully installed packs.
+ */
+export async function installPacksBatched(
+ packs: SkillPack[],
+ onProgress?: (label: string) => void,
+): Promise<string[]> {
+ // Group packs by repo
+ const byRepo = new Map<string, { skills: string[]; labels: string[] }>();
+ for (const pack of packs) {
+ const entry = byRepo.get(pack.repo) ?? { skills: [], labels: [] };
+ entry.skills.push(...pack.skills);
+ entry.labels.push(pack.label);
+ byRepo.set(pack.repo, entry);
+ }
+
+ const installed: string[] = [];
+ for (const [repo, { skills, labels }] of byRepo) {
+ onProgress?.(labels.join(", "));
+ const ok = await new Promise<boolean>((resolve) => {
+ // --yes = npx auto-install, -y = skills.sh non-interactive
+ const args = ["--yes", "skills", "add", repo];
+ for (const skill of skills) {
+ args.push("--skill", skill);
+ }
+ args.push("-y");
+ execFile("npx", args, { timeout: 120_000 }, (error) => {
+ resolve(!error);
+ });
+ });
+ if (ok) installed.push(...labels);
+ }
+ return installed;
+}
+
+/**
+ * Check whether every skill in a pack is already installed.
+ */
+export function isPackInstalled(pack: SkillPack): boolean {
+ const skillsDir = join(homedir(), ".agents", "skills");
+ if (!existsSync(skillsDir)) return false;
+
+ return pack.skills.every((name) =>
+ existsSync(join(skillsDir, name, "SKILL.md")),
+ );
+}
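+
+// Example (hypothetical pack): for skills ["pdf", "docx"] this checks
+// ~/.agents/skills/pdf/SKILL.md and ~/.agents/skills/docx/SKILL.md.
+// A partial install (only one present) reports false, so the next
+// install pass re-fetches the whole pack.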
+
+// ─── Init Wizard Integration ──────────────────────────────────────────────────
+
+/**
+ * Run skill installation step during project init.
+ *
+ * Brownfield (signals.detectedFiles.length > 0):
+ * Auto-detects tech stack → shows matched packs → installs accepted ones.
+ *
+ * Greenfield (no files detected):
+ * Installs essential packs only (find-skills, skill-creator, etc.).
+ * Stack-specific skills are deferred — once the LLM establishes a design
+ * and creates project files (package.json, firebase.json, etc.), brownfield
+ * detection will pick them up on the next `gsd init` or via auto-mode
+ * skill discovery.
+ *
+ * Returns the list of installed pack labels.
+ */
+export async function runSkillInstallStep(
+ ctx: ExtensionCommandContext,
+ signals: ProjectSignals,
+): Promise<string[]> {
+ const installed: string[] = [];
+ const isBrownfield = signals.detectedFiles.length > 0;
+
+ if (isBrownfield) {
+ // ── Brownfield: auto-detect and confirm ─────────────────────────────────
+ const matched = matchPacksForProject(signals);
+ if (matched.length === 0) return installed;
+
+ // Filter out already-installed packs
+ const toInstall = matched.filter((p) => !isPackInstalled(p));
+ if (toInstall.length === 0) return installed;
+
+ // Group for display: Swift packs vs iOS packs vs other
+ const swiftPacks = toInstall.filter((p) => p.matchLanguages?.includes("swift"));
+ const iosPacks = toInstall.filter((p) => p.matchXcodePlatforms?.includes("iphoneos"));
+ const otherPacks = toInstall.filter((p) => !swiftPacks.includes(p) && !iosPacks.includes(p));
+
+ const summaryLines: string[] = [];
+ const hasIOS = signals.xcodePlatforms.includes("iphoneos");
+ if (hasIOS) {
+ summaryLines.push(`Detected: iOS project (${signals.primaryLanguage ?? "swift"})`);
+ } else if (signals.xcodePlatforms.length > 0) {
+ summaryLines.push(`Detected: ${signals.xcodePlatforms.join(", ")} Xcode project (${signals.primaryLanguage ?? "swift"})`);
+ } else {
+ summaryLines.push(`Detected: ${signals.primaryLanguage ?? "unknown"} project`);
+ }
+ summaryLines.push("");
+ summaryLines.push("Recommended skill packs:");
+ if (swiftPacks.length > 0) {
+ summaryLines.push(` Swift: ${swiftPacks.map((p) => p.label).join(", ")}`);
+ }
+ if (iosPacks.length > 0) {
+ summaryLines.push(` iOS: ${iosPacks.map((p) => p.label).join(", ")}`);
+ }
+ for (const p of otherPacks) {
+ summaryLines.push(` • ${p.label}: ${p.description}`);
+ }
+
+ const totalSkills = toInstall.reduce((n, p) => n + p.skills.length, 0);
+ const choice = await showNextAction(ctx, {
+ title: "GSD — Install Skills",
+ summary: summaryLines,
+ actions: [
+ {
+ id: "install",
+ label: "Install recommended skills",
+ description: `Install ${totalSkills} skills from ${toInstall.length} pack${toInstall.length > 1 ? "s" : ""} via skills.sh`,
+ recommended: true,
+ },
+ {
+ id: "skip",
+ label: "Skip",
+ description: "Install skills later with npx skills add",
+ },
+ ],
+ notYetMessage: "Run /gsd init when ready.",
+ });
+
+ if (choice === "install") {
+ const labels = await installPacksBatched(toInstall, (label) => {
+ ctx.ui.notify(`Installing ${label} skills...`, "info");
+ });
+ installed.push(...labels);
+ const failed = toInstall.filter((p) => !installed.includes(p.label));
+ for (const pack of failed) {
+ ctx.ui.notify(`Failed to install ${pack.label} — try manually: npx skills add ${pack.repo}`, "info");
+ }
+ }
+ } else {
+ // ── Greenfield: install essentials only ─────────────────────────────────
+ // Don't ask the user what tech stack they're building — they may not know
+ // yet, especially non-technical users. Install essential packs (discovery,
+ // authoring, browser, docs) and let stack-specific skills auto-detect later
+ // once the LLM establishes the design and creates project files.
+ const essentials = SKILL_CATALOG.filter((p) => p.matchAlways && !isPackInstalled(p));
+ if (essentials.length === 0) return installed;
+
+ const totalSkills = essentials.reduce((n, p) => n + p.skills.length, 0);
+ const choice = await showNextAction(ctx, {
+ title: "GSD — Install Essential Skills",
+ summary: [
+ "GSD will install essential agent skills (skill discovery, authoring,",
+ "browser automation, document handling).",
+ "",
+ "Stack-specific skills (React, Swift, Python, etc.) will be recommended",
+ "automatically once your project files are in place.",
+ ],
+ actions: [
+ {
+ id: "install",
+ label: "Install essentials",
+ description: `Install ${totalSkills} essential skills via skills.sh`,
+ recommended: true,
+ },
+ {
+ id: "skip",
+ label: "Skip",
+ description: "Install skills later with npx skills add",
+ },
+ ],
+ notYetMessage: "Run /gsd init when ready.",
+ });
+
+ if (choice === "install") {
+ const labels = await installPacksBatched(essentials, (label) => {
+ ctx.ui.notify(`Installing ${label} skills...`, "info");
+ });
+ installed.push(...labels);
+ }
+ }
+
+ if (installed.length > 0) {
+ ctx.ui.notify(`Installed: ${installed.join(", ")}`, "info");
+ }
+
+ return installed;
+}
diff --git a/src/resources/extensions/gsd/skill-discovery.ts b/src/resources/extensions/gsd/skill-discovery.ts
index f623c1a21..e8c224ea4 100644
--- a/src/resources/extensions/gsd/skill-discovery.ts
+++ b/src/resources/extensions/gsd/skill-discovery.ts
@@ -10,9 +10,10 @@
import { existsSync, readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";
-import { getAgentDir } from "@gsd/pi-coding-agent";
+import { homedir } from "node:os";
-const SKILLS_DIR = join(getAgentDir(), "skills");
+/** Industry-standard skills.sh global skills directory */
+const SKILLS_DIR = join(homedir(), ".agents", "skills");
export interface DiscoveredSkill {
name: string;
diff --git a/src/resources/extensions/gsd/skill-health.ts b/src/resources/extensions/gsd/skill-health.ts
index 778bba7a3..a59f4d8aa 100644
--- a/src/resources/extensions/gsd/skill-health.ts
+++ b/src/resources/extensions/gsd/skill-health.ts
@@ -15,7 +15,7 @@
import { existsSync, readFileSync, readdirSync } from "node:fs";
import { join } from "node:path";
-import { getAgentDir } from "@gsd/pi-coding-agent";
+import { homedir } from "node:os";
import type { UnitMetrics, MetricsLedger } from "./metrics.js";
import { formatCost, formatTokenCount, loadLedgerFromDisk } from "./metrics.js";
import { getSkillLastUsed, detectStaleSkills } from "./skill-telemetry.js";
@@ -208,7 +208,7 @@ export function formatSkillDetail(basePath: string, skillName: string): string {
}
// Check for SKILL.md existence
- const skillPath = join(getAgentDir(), "skills", skillName, "SKILL.md");
+ const skillPath = join(homedir(), ".agents", "skills", skillName, "SKILL.md");
if (existsSync(skillPath)) {
const stat = require("node:fs").statSync(skillPath);
lines.push("");
diff --git a/src/resources/extensions/gsd/skill-telemetry.ts b/src/resources/extensions/gsd/skill-telemetry.ts
index ac99e4e83..f1bddfd21 100644
--- a/src/resources/extensions/gsd/skill-telemetry.ts
+++ b/src/resources/extensions/gsd/skill-telemetry.ts
@@ -13,7 +13,7 @@
import { existsSync, readdirSync, readFileSync, statSync } from "node:fs";
import { join } from "node:path";
-import { getAgentDir } from "@gsd/pi-coding-agent";
+import { homedir } from "node:os";
// ─── In-memory state ──────────────────────────────────────────────────────────
@@ -30,8 +30,14 @@ const activelyLoadedSkills = new Set();
* Called before each unit starts.
*/
export function captureAvailableSkills(): void {
- const skillsDir = join(getAgentDir(), "skills");
- availableSkills = listSkillNames(skillsDir);
+ const skillsDir = join(homedir(), ".agents", "skills");
+ const legacyDir = join(homedir(), ".gsd", "agent", "skills");
+ const names = listSkillNames(skillsDir);
+ // Include skills still in the legacy directory only if migration hasn't completed
+ const legacyMigrated = existsSync(join(legacyDir, ".migrated-to-agents"));
+ const legacyNames = legacyMigrated ? [] : listSkillNames(legacyDir);
+ const all = new Set([...names, ...legacyNames]);
+ availableSkills = [...all];
activelyLoadedSkills.clear();
}
@@ -99,8 +105,12 @@ export function detectStaleSkills(
const stale: string[] = [];
// Check all installed skills, not just those with usage data
- const skillsDir = join(getAgentDir(), "skills");
- const installed = listSkillNames(skillsDir);
+ const skillsDir = join(homedir(), ".agents", "skills");
+ const legacyDir = join(homedir(), ".gsd", "agent", "skills");
+ const legacyMigrated = existsSync(join(legacyDir, ".migrated-to-agents"));
+ const legacyNames = legacyMigrated ? [] : listSkillNames(legacyDir);
+ const installedSet = new Set([...listSkillNames(skillsDir), ...legacyNames]);
+ const installed = [...installedSet];
for (const skill of installed) {
const lastTs = lastUsed.get(skill);
diff --git a/src/resources/extensions/gsd/tests/detection.test.ts b/src/resources/extensions/gsd/tests/detection.test.ts
index 1f363b72d..b1a1647dc 100644
--- a/src/resources/extensions/gsd/tests/detection.test.ts
+++ b/src/resources/extensions/gsd/tests/detection.test.ts
@@ -350,3 +350,841 @@ test("detectProjectSignals: Makefile with test target", (t) => {
assert.ok(signals.detectedFiles.includes("Makefile"));
assert.ok(signals.verificationCommands.includes("make test"));
});
+
+test("detectProjectSignals: SQLite file detection via extensions", () => {
+ const dir = makeTempDir("signals-sqlite");
+ try {
+ writeFileSync(join(dir, "app.sqlite3"), "", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("*.sqlite"), "should add synthetic *.sqlite marker");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: SQL file detection", () => {
+ const dir = makeTempDir("signals-sql");
+ try {
+ writeFileSync(join(dir, "migrations.sql"), "", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("*.sql"), "should add synthetic *.sql marker");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: nested SQL file detection", () => {
+ const dir = makeTempDir("signals-sql-nested");
+ try {
+ mkdirSync(join(dir, "db", "migrations"), { recursive: true });
+ writeFileSync(join(dir, "db", "migrations", "001_init.sql"), "", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("*.sql"), "should detect nested SQL files");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: .db file triggers SQLite detection", () => {
+ const dir = makeTempDir("signals-db");
+ try {
+ writeFileSync(join(dir, "data.db"), "", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("*.sqlite"), "should add synthetic *.sqlite marker for .db files");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: no SQLite markers without matching files", () => {
+ const dir = makeTempDir("signals-no-sqlite");
+ try {
+ writeFileSync(join(dir, "package.json"), "{}", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(!signals.detectedFiles.includes("*.sqlite"), "should not have *.sqlite marker");
+ assert.ok(!signals.detectedFiles.includes("*.sql"), "should not have *.sql marker");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: .NET project via .csproj extension", () => {
+ const dir = makeTempDir("signals-dotnet");
+ try {
+ writeFileSync(join(dir, "MyApp.csproj"), "", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("*.csproj"), "should add synthetic *.csproj marker");
+ assert.equal(signals.primaryLanguage, "csharp");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: nested .csproj detection", () => {
+ const dir = makeTempDir("signals-dotnet-nested");
+ try {
+ mkdirSync(join(dir, "src", "App"), { recursive: true });
+ writeFileSync(join(dir, "src", "App", "App.csproj"), "", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("*.csproj"), "should detect nested .csproj files");
+ assert.equal(signals.primaryLanguage, "csharp");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: .NET project via .sln extension", () => {
+ const dir = makeTempDir("signals-sln");
+ try {
+ writeFileSync(join(dir, "MyApp.sln"), "", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("*.sln"), "should add synthetic *.sln marker for .sln files");
+ assert.equal(signals.primaryLanguage, "dotnet");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: F# project via .fsproj extension", () => {
+ const dir = makeTempDir("signals-fsharp");
+ try {
+ writeFileSync(join(dir, "MyApp.fsproj"), "", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("*.fsproj"), "should add synthetic *.fsproj marker");
+ assert.equal(signals.primaryLanguage, "fsharp");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Angular project via angular.json", () => {
+ const dir = makeTempDir("signals-angular");
+ try {
+ writeFileSync(join(dir, "angular.json"), "{}", "utf-8");
+ writeFileSync(join(dir, "package.json"), "{}", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("angular.json"));
+ assert.equal(signals.primaryLanguage, "javascript/typescript");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Next.js project via next.config.ts", () => {
+ const dir = makeTempDir("signals-nextjs");
+ try {
+ writeFileSync(join(dir, "next.config.ts"), "export default {}", "utf-8");
+ writeFileSync(join(dir, "package.json"), "{}", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("next.config.ts"));
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: nested Next.js config via packages/web/next.config.ts", () => {
+ const dir = makeTempDir("signals-nextjs-nested");
+ try {
+ mkdirSync(join(dir, "packages", "web"), { recursive: true });
+ writeFileSync(join(dir, "packages", "web", "next.config.ts"), "export default {}", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("next.config.ts"), "should detect nested Next.js config");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Flutter project via pubspec.yaml", () => {
+ const dir = makeTempDir("signals-flutter");
+ try {
+ writeFileSync(join(dir, "pubspec.yaml"), "name: my_app", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("pubspec.yaml"));
+ assert.equal(signals.primaryLanguage, "dart/flutter");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Django project via manage.py", () => {
+ const dir = makeTempDir("signals-django");
+ try {
+ writeFileSync(join(dir, "manage.py"), "#!/usr/bin/env python", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("manage.py"));
+ assert.equal(signals.primaryLanguage, "python");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: nested Django manage.py", () => {
+ const dir = makeTempDir("signals-django-nested");
+ try {
+ mkdirSync(join(dir, "services", "api"), { recursive: true });
+ writeFileSync(join(dir, "services", "api", "manage.py"), "#!/usr/bin/env python", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("manage.py"), "should detect nested manage.py");
+ assert.equal(signals.primaryLanguage, "python");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Docker project via Dockerfile", () => {
+ const dir = makeTempDir("signals-docker");
+ try {
+ writeFileSync(join(dir, "Dockerfile"), "FROM node:18", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("Dockerfile"));
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Terraform project via main.tf", () => {
+ const dir = makeTempDir("signals-terraform");
+ try {
+ writeFileSync(join(dir, "main.tf"), 'provider "aws" {}', "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("main.tf"));
+ } finally {
+ cleanup(dir);
+ }
+});
+
+// ── QA4/QA5 — new detection tests ──────────────────────────────────────────
+
+test("detectProjectSignals: Vue.js via .vue files in src/", () => {
+ const dir = makeTempDir("signals-vue");
+ try {
+ writeFileSync(join(dir, "package.json"), '{"name":"vue-app"}', "utf-8");
+ mkdirSync(join(dir, "src"), { recursive: true });
+ writeFileSync(join(dir, "src", "App.vue"), "", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("*.vue"), "should add *.vue synthetic marker");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Vue.js via nested .vue file in src/components/", () => {
+ const dir = makeTempDir("signals-vue-nested");
+ try {
+ writeFileSync(join(dir, "package.json"), '{"name":"vue-app"}', "utf-8");
+ mkdirSync(join(dir, "src", "components"), { recursive: true });
+ writeFileSync(join(dir, "src", "components", "Card.vue"), "", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("*.vue"), "should detect nested .vue files");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Vue CLI via vue.config.js", () => {
+ const dir = makeTempDir("signals-vue-cli");
+ try {
+ writeFileSync(join(dir, "package.json"), '{"name":"vue-cli-app"}', "utf-8");
+ writeFileSync(join(dir, "vue.config.js"), "module.exports = {};", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("vue.config.js"));
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: requirements.txt sets Python language", () => {
+ const dir = makeTempDir("signals-requirements");
+ try {
+ writeFileSync(join(dir, "requirements.txt"), "flask==3.0\n", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("requirements.txt"));
+ assert.equal(signals.primaryLanguage, "python");
+ assert.ok(signals.verificationCommands.includes("pytest"), "should suggest pytest for requirements.txt projects");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Android project via app/build.gradle", () => {
+ const dir = makeTempDir("signals-android");
+ try {
+ mkdirSync(join(dir, "app"), { recursive: true });
+ writeFileSync(join(dir, "app", "build.gradle"), "apply plugin: 'com.android.application'", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("app/build.gradle"));
+ assert.equal(signals.primaryLanguage, "java/kotlin");
+ assert.ok(!signals.detectedFiles.includes("build.gradle"), "should not collapse Android app/build.gradle into generic build.gradle");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: nested app/build.gradle normalizes to Android marker", () => {
+ const dir = makeTempDir("signals-android-nested");
+ try {
+ mkdirSync(join(dir, "apps", "mobile", "app"), { recursive: true });
+ writeFileSync(join(dir, "apps", "mobile", "app", "build.gradle"), "apply plugin: 'com.android.application'", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("app/build.gradle"), "should detect nested Android app/build.gradle");
+ assert.ok(!signals.detectedFiles.includes("build.gradle"), "should not emit generic build.gradle marker for nested Android modules");
+ assert.equal(signals.primaryLanguage, "java/kotlin");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Unity project via ProjectSettings/ProjectVersion.txt", () => {
+ const dir = makeTempDir("signals-unity");
+ try {
+ mkdirSync(join(dir, "ProjectSettings"), { recursive: true });
+ writeFileSync(join(dir, "ProjectSettings", "ProjectVersion.txt"), "m_EditorVersion: 2022.3", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("ProjectSettings/ProjectVersion.txt"));
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Godot project via project.godot", () => {
+ const dir = makeTempDir("signals-godot");
+ try {
+ writeFileSync(join(dir, "project.godot"), "[application]", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("project.godot"));
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Airflow via airflow.cfg", () => {
+ const dir = makeTempDir("signals-airflow");
+ try {
+ writeFileSync(join(dir, "airflow.cfg"), "[core]\ndags_folder = ./dags", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("airflow.cfg"));
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Kubernetes via Chart.yaml (Helm)", () => {
+ const dir = makeTempDir("signals-k8s");
+ try {
+ writeFileSync(join(dir, "Chart.yaml"), "apiVersion: v2\nname: my-chart", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("Chart.yaml"));
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Blockchain via hardhat.config.ts", () => {
+ const dir = makeTempDir("signals-blockchain");
+ try {
+ writeFileSync(join(dir, "hardhat.config.ts"), 'import "@nomiclabs/hardhat-ethers"', "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("hardhat.config.ts"));
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: CI/CD via .github/workflows", () => {
+ const dir = makeTempDir("signals-cicd");
+ try {
+ mkdirSync(join(dir, ".github", "workflows"), { recursive: true });
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes(".github/workflows"));
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Tailwind via tailwind.config.ts", () => {
+ const dir = makeTempDir("signals-tailwind");
+ try {
+ writeFileSync(join(dir, "package.json"), '{"name":"tw-app"}', "utf-8");
+ writeFileSync(join(dir, "tailwind.config.ts"), "export default {};", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("tailwind.config.ts"));
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: FastAPI detected via requirements.txt dependency", () => {
+ const dir = makeTempDir("signals-fastapi-req");
+ try {
+ writeFileSync(join(dir, "requirements.txt"), "fastapi==0.115.0\nuvicorn[standard]\n", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("dep:fastapi"), "should add dep:fastapi marker");
+ assert.equal(signals.primaryLanguage, "python");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: FastAPI detected via pyproject.toml dependency", () => {
+ const dir = makeTempDir("signals-fastapi-pyproject");
+ try {
+ writeFileSync(join(dir, "pyproject.toml"), '[project]\ndependencies = ["fastapi>=0.100"]\n', "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("dep:fastapi"), "should add dep:fastapi marker");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: FastAPI detected with PEP 508 ~= operator", () => {
+ const dir = makeTempDir("signals-fastapi-compatible-release");
+ try {
+ writeFileSync(join(dir, "requirements.txt"), "fastapi~=0.115\n", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("dep:fastapi"), "~= should count as a FastAPI dependency");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: pyproject metadata mention does not trigger dep:fastapi", () => {
+ const dir = makeTempDir("signals-fastapi-pyproject-metadata");
+ try {
+ writeFileSync(
+ join(dir, "pyproject.toml"),
+ '[project]\nname = "example"\nkeywords = ["fastapi"]\ndependencies = ["flask>=3.0"]\n',
+ "utf-8",
+ );
+ const signals = detectProjectSignals(dir);
+ assert.ok(!signals.detectedFiles.includes("dep:fastapi"), "metadata-only mentions should not trigger FastAPI detection");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: pyproject dependency table extras do not trigger dep:fastapi", () => {
+ const dir = makeTempDir("signals-fastapi-pyproject-table-extra");
+ try {
+ writeFileSync(
+ join(dir, "pyproject.toml"),
+ '[tool.poetry.dependencies]\npython = "^3.12"\nmy-sdk = { version = "^1.0", extras = ["fastapi"] }\n',
+ "utf-8",
+ );
+ const signals = detectProjectSignals(dir);
+ assert.ok(!signals.detectedFiles.includes("dep:fastapi"), "dependency table extras should not imply FastAPI framework usage");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Poetry group FastAPI dependency does not imply app framework usage", () => {
+ const dir = makeTempDir("signals-fastapi-poetry-group");
+ try {
+ writeFileSync(
+ join(dir, "pyproject.toml"),
+ '[tool.poetry.dependencies]\npython = "^3.12"\nflask = "^3.0"\n\n[tool.poetry.group.dev.dependencies]\nfastapi = "^0.115"\n',
+ "utf-8",
+ );
+ const signals = detectProjectSignals(dir);
+ assert.ok(!signals.detectedFiles.includes("dep:fastapi"), "Poetry dev-group dependencies should not imply FastAPI app usage");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: pyproject optional-dependency group name does not trigger dep:fastapi", () => {
+ const dir = makeTempDir("signals-fastapi-pyproject-extra-name");
+ try {
+ writeFileSync(
+ join(dir, "pyproject.toml"),
+ '[project]\ndependencies = ["flask>=3.0"]\n\n[project.optional-dependencies]\nfastapi = ["orjson>=3"]\n',
+ "utf-8",
+ );
+ const signals = detectProjectSignals(dir);
+ assert.ok(!signals.detectedFiles.includes("dep:fastapi"), "optional-dependency extra names should not trigger FastAPI detection");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: pyproject multiline optional dependency emits dep:fastapi", () => {
+ const dir = makeTempDir("signals-fastapi-pyproject-optional-multiline");
+ try {
+ writeFileSync(
+ join(dir, "pyproject.toml"),
+ '[project]\ndependencies = ["flask>=3.0"]\n\n[project.optional-dependencies]\napi = [\n "fastapi>=0.115",\n "uvicorn>=0.30",\n]\n',
+ "utf-8",
+ );
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("dep:fastapi"), "multiline optional dependency arrays should trigger FastAPI detection");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: FastAPI direct reference with @ emits dep:fastapi", () => {
+ const dir = makeTempDir("signals-fastapi-direct-reference");
+ try {
+ writeFileSync(join(dir, "requirements.txt"), "fastapi @ https://example.com/fastapi.whl\n", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("dep:fastapi"), "direct-reference dependencies should trigger FastAPI detection");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: FastAPI detected via requirements.in", () => {
+ const dir = makeTempDir("signals-fastapi-requirements-in");
+ try {
+ writeFileSync(join(dir, "requirements.in"), "fastapi>=0.115\n", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("dep:fastapi"), "requirements.in should trigger FastAPI detection");
+ assert.ok(signals.detectedFiles.includes("requirements.txt"), "requirements.in should normalize to requirements.txt marker");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: FastAPI detected via nested requirements/base.in", () => {
+ const dir = makeTempDir("signals-fastapi-requirements-dir-in");
+ try {
+ mkdirSync(join(dir, "requirements"), { recursive: true });
+ writeFileSync(join(dir, "requirements", "base.in"), "fastapi>=0.115\n", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("dep:fastapi"), "requirements/base.in should trigger FastAPI detection");
+ assert.ok(signals.detectedFiles.includes("requirements.txt"), "requirements/base.in should normalize to requirements.txt marker");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: FastAPI comments do not trigger dep:fastapi", () => {
+ const dir = makeTempDir("signals-fastapi-comment");
+ try {
+ writeFileSync(join(dir, "requirements.txt"), "# maybe evaluate fastapi later\nflask==3.0\n", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(!signals.detectedFiles.includes("dep:fastapi"), "comments should not trigger FastAPI detection");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: FastAPI inline comments do not trigger dep:fastapi", () => {
+ const dir = makeTempDir("signals-fastapi-inline-comment");
+ try {
+ writeFileSync(join(dir, "requirements.txt"), "flask==3.0 # maybe fastapi later\n", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(!signals.detectedFiles.includes("dep:fastapi"), "inline comments should not trigger FastAPI detection");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: fastapi-* packages do not trigger dep:fastapi without fastapi itself", () => {
+ const dir = makeTempDir("signals-fastapi-suffix-only");
+ try {
+ writeFileSync(join(dir, "requirements.txt"), "fastapi-users==13.0\n", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(!signals.detectedFiles.includes("dep:fastapi"), "fastapi-* packages alone should not imply FastAPI framework usage");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: dependency extras mentioning fastapi do not trigger dep:fastapi", () => {
+ const dir = makeTempDir("signals-fastapi-extra-only");
+ try {
+ writeFileSync(join(dir, "requirements.txt"), "my-sdk[fastapi]>=1.0\n", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(!signals.detectedFiles.includes("dep:fastapi"), "dependency extras should not imply FastAPI framework usage");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Django project does NOT get dep:fastapi marker", () => {
+ const dir = makeTempDir("signals-django-no-fastapi");
+ try {
+ writeFileSync(join(dir, "requirements.txt"), "django==5.0\ncelery\n", "utf-8");
+ writeFileSync(join(dir, "manage.py"), "#!/usr/bin/env python", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(!signals.detectedFiles.includes("dep:fastapi"), "should NOT add dep:fastapi for Django");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: FastAPI detected case-insensitively (PyPI canonical name)", () => {
+ const dir = makeTempDir("signals-fastapi-case");
+ try {
+ writeFileSync(join(dir, "pyproject.toml"), '[project]\ndependencies = ["FastAPI>=0.100"]\n', "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("dep:fastapi"), "should detect FastAPI (mixed case)");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: FastAPI detected via nested service requirements.txt", () => {
+ const dir = makeTempDir("signals-fastapi-nested");
+ try {
+ mkdirSync(join(dir, "services", "api"), { recursive: true });
+ writeFileSync(join(dir, "services", "api", "requirements.txt"), "fastapi==0.115.0\n", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("dep:fastapi"), "should detect FastAPI in nested service requirements.txt");
+ assert.ok(signals.detectedFiles.includes("requirements.txt"), "should normalize nested requirements.txt marker");
+ assert.equal(signals.primaryLanguage, "python");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: nested Prisma schema normalizes to prisma/schema.prisma", () => {
+ const dir = makeTempDir("signals-prisma-nested");
+ try {
+ mkdirSync(join(dir, "services", "api", "prisma"), { recursive: true });
+ writeFileSync(join(dir, "services", "api", "prisma", "schema.prisma"), "datasource db { provider = \"sqlite\" }", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("prisma/schema.prisma"), "should detect nested Prisma schema");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: nested Spring Boot Gradle service emits dep:spring-boot", () => {
+ const dir = makeTempDir("signals-spring-gradle-nested");
+ try {
+ mkdirSync(join(dir, "services", "api"), { recursive: true });
+ writeFileSync(
+ join(dir, "services", "api", "build.gradle"),
+ "plugins { id 'org.springframework.boot' version '3.2.0' }",
+ "utf-8",
+ );
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("dep:spring-boot"), "should detect nested Spring Boot Gradle service");
+ assert.equal(signals.primaryLanguage, "java/kotlin");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: legacy apply plugin syntax emits dep:spring-boot", () => {
+ const dir = makeTempDir("signals-spring-apply-plugin");
+ try {
+ writeFileSync(join(dir, "build.gradle"), "apply plugin: 'org.springframework.boot'", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("dep:spring-boot"), "apply plugin syntax should trigger Spring Boot detection");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: nested Spring Boot Kotlin DSL service still uses neutral java/kotlin language hint", () => {
+ const dir = makeTempDir("signals-spring-gradle-kts-nested");
+ try {
+ mkdirSync(join(dir, "services", "api"), { recursive: true });
+ writeFileSync(
+ join(dir, "services", "api", "build.gradle.kts"),
+ "plugins { id(\"org.springframework.boot\") version \"3.2.0\" }",
+ "utf-8",
+ );
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("dep:spring-boot"));
+ assert.equal(signals.primaryLanguage, "java/kotlin");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Android Gradle project does not emit dep:spring-boot", () => {
+ const dir = makeTempDir("signals-android-no-spring");
+ try {
+ writeFileSync(join(dir, "build.gradle"), "plugins { id 'com.android.application' }", "utf-8");
+ mkdirSync(join(dir, "app"), { recursive: true });
+ writeFileSync(join(dir, "app", "build.gradle"), "plugins { id 'com.android.application' }", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(!signals.detectedFiles.includes("dep:spring-boot"), "Android Gradle files should not trigger Spring Boot detection");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Android inline comments do not emit dep:spring-boot", () => {
+ const dir = makeTempDir("signals-android-inline-comment");
+ try {
+ writeFileSync(join(dir, "build.gradle"), "plugins { id 'com.android.application' } // spring-boot maybe later", "utf-8");
+ mkdirSync(join(dir, "app"), { recursive: true });
+ writeFileSync(join(dir, "app", "build.gradle"), "plugins { id 'com.android.application' }", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(!signals.detectedFiles.includes("dep:spring-boot"), "inline comments should not trigger Spring Boot detection");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: build metadata mentioning spring-boot does not emit dep:spring-boot", () => {
+ const dir = makeTempDir("signals-spring-metadata-only");
+ try {
+ writeFileSync(join(dir, "build.gradle"), 'def notes = "spring-boot migration planned later"', "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(!signals.detectedFiles.includes("dep:spring-boot"), "arbitrary metadata text should not trigger Spring Boot detection");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Maven artifactId alone does not emit dep:spring-boot", () => {
+ const dir = makeTempDir("signals-spring-maven-artifact-only");
+ try {
+    writeFileSync(
+      join(dir, "pom.xml"),
+      '<project><modelVersion>4.0.0</modelVersion><groupId>com.example</groupId><artifactId>spring-boot-tools</artifactId></project>',
+      "utf-8",
+    );
+ const signals = detectProjectSignals(dir);
+ assert.ok(!signals.detectedFiles.includes("dep:spring-boot"), "artifactId alone should not imply Spring Boot");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Spring Boot version-catalog alias emits dep:spring-boot", () => {
+ const dir = makeTempDir("signals-spring-version-catalog");
+ try {
+ mkdirSync(join(dir, "gradle"), { recursive: true });
+ writeFileSync(join(dir, "build.gradle.kts"), "plugins { alias(libs.plugins.backend.web) }", "utf-8");
+ writeFileSync(
+ join(dir, "gradle", "libs.versions.toml"),
+ "[plugins]\nbackend-web = { id = 'org.springframework.boot', version = '3.2.0' }\n",
+ "utf-8",
+ );
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("dep:spring-boot"), "should detect Spring Boot via version-catalog alias");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: commented Spring Boot alias in libs.versions.toml does not emit dep:spring-boot", () => {
+ const dir = makeTempDir("signals-spring-version-catalog-comment");
+ try {
+ mkdirSync(join(dir, "gradle"), { recursive: true });
+ writeFileSync(join(dir, "build.gradle.kts"), "plugins { alias(libs.plugins.backend.web) }", "utf-8");
+ writeFileSync(
+ join(dir, "gradle", "libs.versions.toml"),
+ "[plugins]\n# backend-web = { id = 'org.springframework.boot', version = '3.2.0' }\n",
+ "utf-8",
+ );
+ const signals = detectProjectSignals(dir);
+ assert.ok(!signals.detectedFiles.includes("dep:spring-boot"), "commented aliases should not trigger Spring Boot detection");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: unused Spring Boot alias in libs.versions.toml does not emit dep:spring-boot", () => {
+ const dir = makeTempDir("signals-spring-version-catalog-unused");
+ try {
+ mkdirSync(join(dir, "gradle"), { recursive: true });
+ writeFileSync(join(dir, "build.gradle.kts"), "plugins { alias(libs.plugins.backend.web) }", "utf-8");
+ writeFileSync(
+ join(dir, "gradle", "libs.versions.toml"),
+ "[plugins]\nother-plugin = { id = 'org.springframework.boot', version = '3.2.0' }\n",
+ "utf-8",
+ );
+ const signals = detectProjectSignals(dir);
+ assert.ok(!signals.detectedFiles.includes("dep:spring-boot"), "unused Spring Boot aliases should not trigger detection");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: spring-like alias name without Spring Boot id does not emit dep:spring-boot", () => {
+ const dir = makeTempDir("signals-spring-version-catalog-false-alias");
+ try {
+ mkdirSync(join(dir, "gradle"), { recursive: true });
+ writeFileSync(join(dir, "build.gradle.kts"), "plugins { alias(libs.plugins.spring.boot.conventions) }", "utf-8");
+ writeFileSync(
+ join(dir, "gradle", "libs.versions.toml"),
+ "[plugins]\nspring-boot-conventions = { id = 'com.example.conventions', version = '1.0.0' }\n",
+ "utf-8",
+ );
+ const signals = detectProjectSignals(dir);
+ assert.ok(!signals.detectedFiles.includes("dep:spring-boot"), "spring-looking alias names should not imply Spring Boot without matching id");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Spring Boot version-catalog library alias emits dep:spring-boot", () => {
+ const dir = makeTempDir("signals-spring-version-catalog-library");
+ try {
+ mkdirSync(join(dir, "gradle"), { recursive: true });
+ writeFileSync(join(dir, "build.gradle.kts"), "dependencies { implementation(libs.backend.web) }", "utf-8");
+ writeFileSync(
+ join(dir, "gradle", "libs.versions.toml"),
+ "[libraries]\nbackend-web = { module = 'org.springframework.boot:spring-boot-starter-web', version = '3.2.0' }\n",
+ "utf-8",
+ );
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("dep:spring-boot"), "Spring Boot library aliases should trigger detection");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Spring Boot version-catalog bundle alias emits dep:spring-boot", () => {
+ const dir = makeTempDir("signals-spring-version-catalog-bundle");
+ try {
+ mkdirSync(join(dir, "gradle"), { recursive: true });
+ writeFileSync(join(dir, "build.gradle.kts"), "dependencies { implementation(libs.bundles.backend.web) }", "utf-8");
+ writeFileSync(
+ join(dir, "gradle", "libs.versions.toml"),
+ "[libraries]\nspring-boot-starter-web = { module = 'org.springframework.boot:spring-boot-starter-web', version = '3.2.0' }\n\n[bundles]\nbackend-web = ['spring-boot-starter-web']\n",
+ "utf-8",
+ );
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("dep:spring-boot"), "Spring Boot bundle aliases should trigger detection");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Spring Boot custom version-catalog accessor emits dep:spring-boot", () => {
+ const dir = makeTempDir("signals-spring-version-catalog-custom-accessor");
+ try {
+ mkdirSync(join(dir, "gradle"), { recursive: true });
+ writeFileSync(join(dir, "build.gradle.kts"), "plugins { alias(backend.plugins.web) }", "utf-8");
+ writeFileSync(
+ join(dir, "gradle", "backend.versions.toml"),
+ "[plugins]\nweb = { id = 'org.springframework.boot', version = '3.2.0' }\n",
+ "utf-8",
+ );
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("dep:spring-boot"), "custom version-catalog accessors should trigger Spring Boot detection");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Spring Boot settings-defined catalog accessor emits dep:spring-boot", () => {
+ const dir = makeTempDir("signals-spring-version-catalog-settings-accessor");
+ try {
+ mkdirSync(join(dir, "gradle"), { recursive: true });
+ writeFileSync(
+ join(dir, "settings.gradle.kts"),
+ 'dependencyResolutionManagement { versionCatalogs { create("backendLibs") { from(files("./gradle/backend.versions.toml")) } } }',
+ "utf-8",
+ );
+ writeFileSync(join(dir, "build.gradle.kts"), "plugins { alias(backendLibs.plugins.web) }", "utf-8");
+ writeFileSync(
+ join(dir, "gradle", "backend.versions.toml"),
+ "[plugins]\nweb = { id = 'org.springframework.boot', version = '3.2.0' }\n",
+ "utf-8",
+ );
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("dep:spring-boot"), "settings-defined catalog accessors should trigger Spring Boot detection");
+ } finally {
+ cleanup(dir);
+ }
+});
diff --git a/src/resources/extensions/gsd/tests/plan-quality-validator.test.ts b/src/resources/extensions/gsd/tests/plan-quality-validator.test.ts
new file mode 100644
index 000000000..fdbc8de0c
--- /dev/null
+++ b/src/resources/extensions/gsd/tests/plan-quality-validator.test.ts
@@ -0,0 +1,474 @@
+import { validateTaskPlanContent, validateSlicePlanContent } from '../observability-validator.ts';
+import { createTestContext } from './test-helpers.ts';
+
+const { assertEq, assertTrue, report } = createTestContext();
+
+// ═══════════════════════════════════════════════════════════════════════════
+// validateTaskPlanContent — empty/missing Steps section
+// ═══════════════════════════════════════════════════════════════════════════
+
+console.log('\n=== validateTaskPlanContent: empty Steps section ===');
+{
+ const content = `# T01: Some Task
+
+## Description
+
+Do something useful.
+
+## Steps
+
+## Verification
+
+- Run the tests and confirm output.
+`;
+
+ const issues = validateTaskPlanContent('T01-PLAN.md', content);
+ const stepsIssues = issues.filter(i => i.ruleId === 'empty_steps_section');
+ assertTrue(stepsIssues.length >= 1, 'empty Steps section produces empty_steps_section issue');
+ if (stepsIssues.length > 0) {
+ assertEq(stepsIssues[0].severity, 'warning', 'empty_steps_section severity is warning');
+ assertEq(stepsIssues[0].scope, 'task-plan', 'empty_steps_section scope is task-plan');
+ }
+}
+
+console.log('\n=== validateTaskPlanContent: missing Steps section entirely ===');
+{
+ const content = `# T01: Some Task
+
+## Description
+
+Do something useful.
+
+## Verification
+
+- Run the tests.
+`;
+
+ const issues = validateTaskPlanContent('T01-PLAN.md', content);
+ const stepsIssues = issues.filter(i => i.ruleId === 'empty_steps_section');
+ assertTrue(stepsIssues.length >= 1, 'missing Steps section produces empty_steps_section issue');
+}
+
+// ═══════════════════════════════════════════════════════════════════════════
+// validateTaskPlanContent — placeholder-only Verification
+// ═══════════════════════════════════════════════════════════════════════════
+
+console.log('\n=== validateTaskPlanContent: placeholder-only Verification ===');
+{
+ const content = `# T01: Some Task
+
+## Steps
+
+1. Do the thing.
+2. Do the other thing.
+
+## Verification
+
+- {{placeholder verification step}}
+- {{another placeholder}}
+`;
+
+ const issues = validateTaskPlanContent('T01-PLAN.md', content);
+ const verifyIssues = issues.filter(i => i.ruleId === 'placeholder_verification');
+ assertTrue(verifyIssues.length >= 1, 'placeholder-only Verification produces placeholder_verification issue');
+ if (verifyIssues.length > 0) {
+ assertEq(verifyIssues[0].severity, 'warning', 'placeholder_verification severity is warning');
+ assertEq(verifyIssues[0].scope, 'task-plan', 'placeholder_verification scope is task-plan');
+ }
+}
+
+console.log('\n=== validateTaskPlanContent: Verification with only template text ===');
+{
+ const content = `# T01: Some Task
+
+## Steps
+
+1. Do the thing.
+
+## Verification
+
+{{whatWasVerifiedAndHow — commands run, tests passed, behavior confirmed}}
+`;
+
+ const issues = validateTaskPlanContent('T01-PLAN.md', content);
+ const verifyIssues = issues.filter(i => i.ruleId === 'placeholder_verification');
+ assertTrue(verifyIssues.length >= 1, 'template-text-only Verification produces placeholder_verification issue');
+}
+
+// ═══════════════════════════════════════════════════════════════════════════
+// validateSlicePlanContent — empty inline task entries
+// ═══════════════════════════════════════════════════════════════════════════
+
+console.log('\n=== validateSlicePlanContent: empty inline task entries ===');
+{
+ const content = `# S01: Some Slice
+
+**Goal:** Build the thing.
+**Demo:** It works.
+
+## Tasks
+
+- [ ] **T01: First Task** \`est:20m\`
+
+- [ ] **T02: Second Task** \`est:15m\`
+
+## Verification
+
+- Run the tests.
+`;
+
+ const issues = validateSlicePlanContent('S01-PLAN.md', content);
+ const emptyTaskIssues = issues.filter(i => i.ruleId === 'empty_task_entry');
+ assertTrue(emptyTaskIssues.length >= 1, 'task entries with no description produce empty_task_entry issue');
+ if (emptyTaskIssues.length > 0) {
+ assertEq(emptyTaskIssues[0].severity, 'warning', 'empty_task_entry severity is warning');
+ assertEq(emptyTaskIssues[0].scope, 'slice-plan', 'empty_task_entry scope is slice-plan');
+ }
+}
+
+console.log('\n=== validateSlicePlanContent: task entries with content are fine ===');
+{
+ const content = `# S01: Some Slice
+
+**Goal:** Build the thing.
+**Demo:** It works.
+
+## Tasks
+
+- [ ] **T01: First Task** \`est:20m\`
+ - Why: Because it matters.
+ - Files: \`src/index.ts\`
+ - Do: Implement the feature.
+
+- [ ] **T02: Second Task** \`est:15m\`
+ - Why: Also important.
+ - Do: Add tests.
+
+## Verification
+
+- Run the tests.
+`;
+
+ const issues = validateSlicePlanContent('S01-PLAN.md', content);
+ const emptyTaskIssues = issues.filter(i => i.ruleId === 'empty_task_entry');
+ assertEq(emptyTaskIssues.length, 0, 'task entries with description content produce no empty_task_entry issues');
+}
+
+// ═══════════════════════════════════════════════════════════════════════════
+// validateTaskPlanContent — scope_estimate over threshold
+// ═══════════════════════════════════════════════════════════════════════════
+
+console.log('\n=== validateTaskPlanContent: scope_estimate over threshold ===');
+{
+ const content = `---
+estimated_steps: 12
+estimated_files: 15
+---
+
+# T01: Big Task
+
+## Steps
+
+1. Step one.
+2. Step two.
+3. Step three.
+
+## Verification
+
+- Check it works.
+`;
+
+ const issues = validateTaskPlanContent('T01-PLAN.md', content);
+ const stepsOverIssues = issues.filter(i => i.ruleId === 'scope_estimate_steps_high');
+ const filesOverIssues = issues.filter(i => i.ruleId === 'scope_estimate_files_high');
+ assertTrue(stepsOverIssues.length >= 1, 'estimated_steps=12 (>=10) produces scope_estimate_steps_high issue');
+ assertTrue(filesOverIssues.length >= 1, 'estimated_files=15 (>=12) produces scope_estimate_files_high issue');
+ if (stepsOverIssues.length > 0) {
+ assertEq(stepsOverIssues[0].severity, 'warning', 'scope_estimate_steps_high severity is warning');
+ assertEq(stepsOverIssues[0].scope, 'task-plan', 'scope_estimate_steps_high scope is task-plan');
+ }
+ if (filesOverIssues.length > 0) {
+ assertEq(filesOverIssues[0].severity, 'warning', 'scope_estimate_files_high severity is warning');
+ }
+}
+
+// ═══════════════════════════════════════════════════════════════════════════
+// validateTaskPlanContent — scope_estimate within limits
+// ═══════════════════════════════════════════════════════════════════════════
+
+console.log('\n=== validateTaskPlanContent: scope_estimate within limits ===');
+{
+ const content = `---
+estimated_steps: 4
+estimated_files: 6
+---
+
+# T01: Small Task
+
+## Steps
+
+1. Do the thing.
+
+## Verification
+
+- Verify it works.
+`;
+
+ const issues = validateTaskPlanContent('T01-PLAN.md', content);
+ const scopeIssues = issues.filter(i =>
+ i.ruleId === 'scope_estimate_steps_high' || i.ruleId === 'scope_estimate_files_high'
+ );
+ assertEq(scopeIssues.length, 0, 'scope_estimate within limits produces no scope issues');
+}
+
+// ═══════════════════════════════════════════════════════════════════════════
+// validateTaskPlanContent — missing scope_estimate (no warning)
+// ═══════════════════════════════════════════════════════════════════════════
+
+console.log('\n=== validateTaskPlanContent: missing scope_estimate ===');
+{
+ const content = `# T01: No Frontmatter Task
+
+## Steps
+
+1. Do the thing.
+
+## Verification
+
+- Verify it works.
+`;
+
+ const issues = validateTaskPlanContent('T01-PLAN.md', content);
+ const scopeIssues = issues.filter(i =>
+ i.ruleId === 'scope_estimate_steps_high' || i.ruleId === 'scope_estimate_files_high'
+ );
+ assertEq(scopeIssues.length, 0, 'missing scope_estimate produces no scope issues');
+}
+
+console.log('\n=== validateTaskPlanContent: frontmatter without scope keys ===');
+{
+ const content = `---
+id: T01
+parent: S01
+---
+
+# T01: Task With Other Frontmatter
+
+## Steps
+
+1. Do the thing.
+
+## Verification
+
+- Verify it works.
+`;
+
+ const issues = validateTaskPlanContent('T01-PLAN.md', content);
+ const scopeIssues = issues.filter(i =>
+ i.ruleId === 'scope_estimate_steps_high' || i.ruleId === 'scope_estimate_files_high'
+ );
+ assertEq(scopeIssues.length, 0, 'frontmatter without scope keys produces no scope issues');
+}
+
+// ═══════════════════════════════════════════════════════════════════════════
+// Clean plans — no false positives
+// ═══════════════════════════════════════════════════════════════════════════
+
+console.log('\n=== Clean task plan: no plan-quality issues ===');
+{
+ const content = `---
+estimated_steps: 5
+estimated_files: 3
+---
+
+# T01: Well-Formed Task
+
+## Description
+
+A real task with real content.
+
+## Steps
+
+1. Read the input files.
+2. Parse the configuration.
+3. Transform the data.
+4. Write the output.
+5. Verify the results.
+
+## Must-Haves
+
+- [ ] Output file is valid JSON
+- [ ] All input records are processed
+
+## Verification
+
+- Run \`node --test tests/transform.test.ts\` — all assertions pass
+- Manually inspect output.json for correct structure
+
+## Observability Impact
+
+- Signals added/changed: structured error log on parse failure
+- How a future agent inspects this: check stderr for JSON parse errors
+- Failure state exposed: exit code 1 + error message on invalid input
+`;
+
+ const issues = validateTaskPlanContent('T01-PLAN.md', content);
+ const planQualityIssues = issues.filter(i =>
+ i.ruleId === 'empty_steps_section' ||
+ i.ruleId === 'placeholder_verification' ||
+ i.ruleId === 'scope_estimate_steps_high' ||
+ i.ruleId === 'scope_estimate_files_high'
+ );
+ assertEq(planQualityIssues.length, 0, 'clean task plan produces no plan-quality issues');
+}
+
+console.log('\n=== Clean slice plan: no plan-quality issues ===');
+{
+ const content = `# S01: Well-Formed Slice
+
+**Goal:** Build a complete feature.
+**Demo:** Run the test suite and see all green.
+
+## Tasks
+
+- [ ] **T01: Create tests** \`est:20m\`
+ - Why: Tests define the contract before implementation.
+ - Files: \`tests/feature.test.ts\`
+ - Do: Write comprehensive test assertions.
+ - Verify: Test file runs without syntax errors.
+
+- [ ] **T02: Implement feature** \`est:30m\`
+ - Why: Core implementation.
+ - Files: \`src/feature.ts\`
+ - Do: Build the feature to make tests pass.
+ - Verify: All tests pass.
+
+## Verification
+
+- \`node --test tests/feature.test.ts\` — all assertions pass
+- Check error output for diagnostic messages
+
+## Observability / Diagnostics
+
+- Runtime signals: structured error objects with error codes
+- Inspection surfaces: test output shows pass/fail counts
+- Failure visibility: exit code 1 on failure with descriptive message
+- Redaction constraints: none
+`;
+
+ const issues = validateSlicePlanContent('S01-PLAN.md', content);
+ const planQualityIssues = issues.filter(i => i.ruleId === 'empty_task_entry');
+ assertEq(planQualityIssues.length, 0, 'clean slice plan produces no empty_task_entry issues');
+}
+
+// ═══════════════════════════════════════════════════════════════════════════
+// validateTaskPlanContent — missing output file paths
+// ═══════════════════════════════════════════════════════════════════════════
+
+console.log('\n=== validateTaskPlanContent: missing output file paths ===');
+{
+ const content = `# T01: Some Task
+
+## Description
+
+Do something.
+
+## Steps
+
+1. Do the thing
+
+## Verification
+
+- Check it works
+
+## Expected Output
+
+This task produces the main output.
+`;
+
+ const issues = validateTaskPlanContent('T01-PLAN.md', content);
+ const outputIssues = issues.filter(i => i.ruleId === 'missing_output_file_paths');
+ assertTrue(outputIssues.length >= 1, 'Expected Output without file paths triggers missing_output_file_paths');
+}
+
+console.log('\n=== validateTaskPlanContent: valid output file paths ===');
+{
+ const content = `# T01: Some Task
+
+## Description
+
+Do something.
+
+## Steps
+
+1. Do the thing
+
+## Verification
+
+- Check it works
+
+## Expected Output
+
+- \`src/types.ts\` — New type definitions
+`;
+
+ const issues = validateTaskPlanContent('T01-PLAN.md', content);
+ const outputIssues = issues.filter(i => i.ruleId === 'missing_output_file_paths');
+ assertEq(outputIssues.length, 0, 'Expected Output with file paths does not trigger warning');
+}
+
+console.log('\n=== validateTaskPlanContent: missing input file paths (info severity) ===');
+{
+ const content = `# T01: Some Task
+
+## Description
+
+Do something.
+
+## Steps
+
+1. Do the thing
+
+## Verification
+
+- Check it works
+
+## Inputs
+
+Prior task summary insights about the architecture.
+
+## Expected Output
+
+- \`src/output.ts\` — Output file
+`;
+
+ const issues = validateTaskPlanContent('T01-PLAN.md', content);
+ const inputIssues = issues.filter(i => i.ruleId === 'missing_input_file_paths');
+ assertTrue(inputIssues.length >= 1, 'Inputs without file paths triggers missing_input_file_paths');
+ if (inputIssues.length > 0) {
+ assertEq(inputIssues[0].severity, 'info', 'missing_input_file_paths is info severity (not warning)');
+ }
+}
+
+console.log('\n=== validateTaskPlanContent: no Expected Output section at all ===');
+{
+ const content = `# T01: Some Task
+
+## Description
+
+Do something.
+
+## Steps
+
+1. Do the thing
+
+## Verification
+
+- Check it works
+`;
+
+ const issues = validateTaskPlanContent('T01-PLAN.md', content);
+ const outputIssues = issues.filter(i => i.ruleId === 'missing_output_file_paths');
+ assertTrue(outputIssues.length >= 1, 'Missing Expected Output section triggers missing_output_file_paths');
+}
+
+report();
diff --git a/src/resources/extensions/gsd/tests/skill-catalog.test.ts b/src/resources/extensions/gsd/tests/skill-catalog.test.ts
new file mode 100644
index 000000000..4f7e3375e
--- /dev/null
+++ b/src/resources/extensions/gsd/tests/skill-catalog.test.ts
@@ -0,0 +1,193 @@
+/**
+ * Unit tests for GSD Skill Catalog — pack matching logic.
+ *
+ * Exercises matchPacksForProject() to verify that project signals
+ * correctly map to skill packs.
+ */
+
+import test from "node:test";
+import assert from "node:assert/strict";
+import { PROJECT_FILES } from "../detection.ts";
+import { GREENFIELD_STACKS, SKILL_CATALOG, matchPacksForProject } from "../skill-catalog.ts";
+import type { ProjectSignals } from "../detection.ts";
+
+function makeSignals(overrides: Partial<ProjectSignals> = {}): ProjectSignals {
+ return {
+ detectedFiles: [],
+ isGitRepo: false,
+ isMonorepo: false,
+ xcodePlatforms: [],
+ hasCI: false,
+ hasTests: false,
+ verificationCommands: [],
+ ...overrides,
+ };
+}
+
+function packLabels(signals: ProjectSignals): string[] {
+ return matchPacksForProject(signals).map((p) => p.label);
+}
+
+// ── matchAlways packs are always included ────────────────────────────────────
+
+test("matchPacksForProject: always includes matchAlways packs", () => {
+ const labels = packLabels(makeSignals());
+ assert.ok(labels.includes("Skill Discovery"), "should include Skill Discovery");
+ assert.ok(labels.includes("Skill Authoring"), "should include Skill Authoring");
+ assert.ok(labels.includes("Browser Automation"), "should include Browser Automation");
+ assert.ok(labels.includes("Document Handling"), "should include Document Handling");
+ assert.ok(labels.includes("Code Review & Quality"), "should include Code Review & Quality");
+ assert.ok(labels.includes("Git Advanced Workflows"), "should include Git Advanced Workflows");
+});
+
+// ── Language matching ────────────────────────────────────────────────────────
+
+test("matchPacksForProject: Python language matches Python packs", () => {
+ const labels = packLabels(makeSignals({ primaryLanguage: "python", detectedFiles: ["pyproject.toml"] }));
+ assert.ok(labels.includes("Python"), "should include Python");
+ assert.ok(labels.includes("Python Advanced"), "should include Python Advanced");
+});
+
+test("matchPacksForProject: Rust language matches Rust packs", () => {
+ const labels = packLabels(makeSignals({ primaryLanguage: "rust", detectedFiles: ["Cargo.toml"] }));
+ assert.ok(labels.includes("Rust"), "should include Rust");
+ assert.ok(labels.includes("Rust Async Patterns"), "should include Rust Async Patterns");
+});
+
+test("matchPacksForProject: Go language matches Go packs", () => {
+ const labels = packLabels(makeSignals({ primaryLanguage: "go", detectedFiles: ["go.mod"] }));
+ assert.ok(labels.includes("Go"), "should include Go");
+ assert.ok(labels.includes("Go Concurrency Patterns"), "should include Go Concurrency Patterns");
+});
+
+test("matchPacksForProject: JS/TS matches web frontend packs", () => {
+ const labels = packLabels(makeSignals({ primaryLanguage: "javascript/typescript", detectedFiles: ["package.json"] }));
+ assert.ok(labels.includes("React & Web Frontend"), "should include React");
+ assert.ok(labels.includes("TypeScript & JS Development"), "should include TS/JS Dev");
+ assert.ok(labels.includes("React State & Patterns"), "should include React State");
+ assert.ok(labels.includes("shadcn/ui"), "should include shadcn");
+ assert.ok(labels.includes("Frontend Design & UX"), "should include Frontend Design");
+});
+
+// ── File matching ────────────────────────────────────────────────────────────
+
+test("matchPacksForProject: angular.json triggers Angular packs", () => {
+ const labels = packLabels(makeSignals({ detectedFiles: ["angular.json"] }));
+ assert.ok(labels.includes("Angular"), "should include Angular");
+ assert.ok(labels.includes("Angular Migration"), "should include Angular Migration");
+});
+
+test("matchPacksForProject: next.config.ts triggers Next.js packs", () => {
+ const labels = packLabels(makeSignals({ detectedFiles: ["next.config.ts"] }));
+ assert.ok(labels.includes("Next.js"), "should include Next.js");
+ assert.ok(labels.includes("Next.js App Router Patterns"), "should include Next.js App Router");
+});
+
+test("matchPacksForProject: *.vue triggers Vue.js", () => {
+ const labels = packLabels(makeSignals({ detectedFiles: ["*.vue"] }));
+ assert.ok(labels.includes("Vue.js"), "should include Vue.js");
+});
+
+test("matchPacksForProject: Chart.yaml triggers Kubernetes", () => {
+ const labels = packLabels(makeSignals({ detectedFiles: ["Chart.yaml"] }));
+ assert.ok(labels.includes("Kubernetes"), "should include Kubernetes");
+});
+
+test("matchPacksForProject: hardhat.config.ts triggers Blockchain", () => {
+ const labels = packLabels(makeSignals({ detectedFiles: ["hardhat.config.ts"] }));
+ assert.ok(labels.includes("Blockchain & Web3"), "should include Blockchain & Web3");
+});
+
+test("matchPacksForProject: tailwind.config.ts triggers Tailwind CSS", () => {
+ const labels = packLabels(makeSignals({ detectedFiles: ["tailwind.config.ts"] }));
+ assert.ok(labels.includes("Tailwind CSS"), "should include Tailwind CSS");
+});
+
+// ── Xcode platform matching ─────────────────────────────────────────────────
+
+test("matchPacksForProject: iphoneos triggers iOS packs", () => {
+ const labels = packLabels(makeSignals({ xcodePlatforms: ["iphoneos"] }));
+ assert.ok(labels.includes("iOS App Frameworks"), "should include iOS App Frameworks");
+ assert.ok(labels.includes("iOS Data Frameworks"), "should include iOS Data Frameworks");
+ assert.ok(labels.includes("iOS AI & ML"), "should include iOS AI & ML");
+ assert.ok(labels.includes("iOS Engineering"), "should include iOS Engineering");
+ assert.ok(labels.includes("iOS Hardware"), "should include iOS Hardware");
+ assert.ok(labels.includes("iOS Platform"), "should include iOS Platform");
+});
+
+// ── Isolation checks — packs that should NOT match ──────────────────────────
+
+test("matchPacksForProject: FastAPI does not match generic Python", () => {
+ const labels = packLabels(makeSignals({ primaryLanguage: "python", detectedFiles: ["pyproject.toml"] }));
+ assert.ok(!labels.includes("FastAPI"), "FastAPI should NOT match generic Python projects");
+});
+
+test("matchPacksForProject: FastAPI matches when dep:fastapi detected", () => {
+ const labels = packLabels(makeSignals({ primaryLanguage: "python", detectedFiles: ["pyproject.toml", "dep:fastapi"] }));
+ assert.ok(labels.includes("FastAPI"), "FastAPI should match when dep:fastapi is in detectedFiles");
+});
+
+test("matchPacksForProject: Spring Boot does not match via language alone", () => {
+ // Simulate Android project: has java/kotlin language but no root pom.xml/build.gradle
+ const labels = packLabels(makeSignals({ primaryLanguage: "java/kotlin", detectedFiles: ["app/build.gradle"] }));
+ assert.ok(!labels.includes("Java & Spring Boot"), "Spring Boot should NOT match via language alone");
+});
+
+test("matchPacksForProject: Spring Boot matches only via dep:spring-boot", () => {
+ const positive = packLabels(makeSignals({ detectedFiles: ["dep:spring-boot"] }));
+ assert.ok(positive.includes("Java & Spring Boot"), "should include Spring Boot pack when dependency marker exists");
+
+ const androidLike = packLabels(makeSignals({ detectedFiles: ["build.gradle", "app/build.gradle"], primaryLanguage: "java/kotlin" }));
+ assert.ok(!androidLike.includes("Java & Spring Boot"), "generic Gradle + Android markers should not imply Spring Boot");
+});
+
+test("matchPacksForProject: Unity does not include Godot", () => {
+ const labels = packLabels(makeSignals({ detectedFiles: ["ProjectSettings/ProjectVersion.txt"] }));
+ assert.ok(labels.includes("Unity"), "should include Unity");
+ assert.ok(!labels.includes("Godot"), "should NOT include Godot");
+});
+
+test("matchPacksForProject: Godot does not include Unity", () => {
+ const labels = packLabels(makeSignals({ detectedFiles: ["project.godot"] }));
+ assert.ok(labels.includes("Godot"), "should include Godot");
+ assert.ok(!labels.includes("Unity"), "should NOT include Unity");
+});
+
+test("matchPacksForProject: .NET backend patterns match F# and solution markers", () => {
+ const fsprojLabels = packLabels(makeSignals({ detectedFiles: ["*.fsproj"], primaryLanguage: "fsharp" }));
+ assert.ok(fsprojLabels.includes(".NET Backend Patterns"), "should include generic .NET backend patterns for F# projects");
+ assert.ok(!fsprojLabels.includes(".NET & C#"), "should not include C#-specific pack for F# projects");
+
+ const slnLabels = packLabels(makeSignals({ detectedFiles: ["*.sln"], primaryLanguage: "dotnet" }));
+ assert.ok(slnLabels.includes(".NET Backend Patterns"), "should include generic .NET backend patterns for solution files");
+});
+
+test("SKILL_CATALOG: every matchFiles entry is backed by detection", () => {
+ const knownMarkers = new Set([
+ ...PROJECT_FILES,
+ "*.sqlite",
+ "*.sql",
+ "*.csproj",
+ "*.fsproj",
+ "*.sln",
+ "*.vue",
+ "dep:fastapi",
+ "dep:spring-boot",
+ ]);
+
+ for (const pack of SKILL_CATALOG) {
+ for (const marker of pack.matchFiles ?? []) {
+ assert.ok(knownMarkers.has(marker), `Unknown detection marker: ${marker} (pack: ${pack.label})`);
+ }
+ }
+});
+
+test("GREENFIELD_STACKS: every pack label resolves to SKILL_CATALOG", () => {
+ const labels = new Set(SKILL_CATALOG.map((pack) => pack.label));
+
+ for (const stack of GREENFIELD_STACKS) {
+ for (const packLabel of stack.packs) {
+ assert.ok(labels.has(packLabel), `Unknown pack label: ${packLabel} (stack: ${stack.id})`);
+ }
+ }
+});
diff --git a/src/tests/app-smoke.test.ts b/src/tests/app-smoke.test.ts
index 90d8a7953..c6a55f291 100644
--- a/src/tests/app-smoke.test.ts
+++ b/src/tests/app-smoke.test.ts
@@ -203,8 +203,7 @@ test("initResources syncs extensions, agents, and skills to target dir", async (
// Agents synced
assert.ok(existsSync(join(fakeAgentDir, "agents", "scout.md")), "scout agent synced");
- // Skills synced
- assert.ok(existsSync(join(fakeAgentDir, "skills")), "skills directory synced");
+ // Skills are NOT synced here — they use ~/.agents/skills/ via skills.sh
// Version manifest synced
const managedVersion = readManagedResourceVersion(fakeAgentDir);