diff --git a/docs-internal/configuration.md b/docs-internal/configuration.md
index 067eb5da8..2c7fe49ed 100644
--- a/docs-internal/configuration.md
+++ b/docs-internal/configuration.md
@@ -578,7 +578,7 @@ prefer_skills:
avoid_skills: []
```
-Skills can be bare names (looked up in `~/.gsd/agent/skills/`) or absolute paths.
+Skills can be bare names (looked up in `~/.agents/skills/` and `.agents/skills/`) or absolute paths.
### `skill_rules`
diff --git a/docs-internal/context-and-hooks/07-the-system-prompt-anatomy.md b/docs-internal/context-and-hooks/07-the-system-prompt-anatomy.md
index aa0fc79ea..7bb2c57cc 100644
--- a/docs-internal/context-and-hooks/07-the-system-prompt-anatomy.md
+++ b/docs-internal/context-and-hooks/07-the-system-prompt-anatomy.md
@@ -174,7 +174,7 @@ When a skill file references a relative path, resolve it against the skill direc
commit-outstanding
Commit all uncommitted files in logical groups
- /Users/you/.gsd/agent/skills/commit-outstanding/SKILL.md
+ /Users/you/.agents/skills/commit-outstanding/SKILL.md
```
diff --git a/docs-internal/skills.md b/docs-internal/skills.md
index 71f039546..6a9e1d567 100644
--- a/docs-internal/skills.md
+++ b/docs-internal/skills.md
@@ -2,28 +2,85 @@
Skills are specialized instruction sets that GSD loads when the task matches. They provide domain-specific guidance for the LLM — coding patterns, framework idioms, testing strategies, and tool usage.
-## Bundled Skills
+Skills follow the open [Agent Skills standard](https://agentskills.io/) and are **not GSD-specific** — they work with Claude Code, OpenAI Codex, Cursor, GitHub Copilot, Windsurf, and 40+ other agents.
-GSD ships with these skills, installed to `~/.gsd/agent/skills/`:
+## Skill Directories
-| Skill | Trigger | Description |
-|-------|---------|-------------|
-| `frontend-design` | Web UI work — components, pages, dashboards, styling | Production-grade frontend with high design quality |
-| `swiftui` | macOS/iOS apps — SwiftUI, Xcode, App Store | Full lifecycle from creation to shipping |
-| `debug-like-expert` | Complex debugging — after standard approaches fail | Methodical investigation with evidence gathering |
-| `rust-core` | Rust code — ownership, lifetimes, traits, async | Idiomatic, safe, performant Rust patterns |
-| `axum-web-framework` | Axum web apps — routing, middleware, extractors | Complete Axum development guide |
-| `axum-tests` | Testing Axum apps — integration tests, mock state | Test patterns for Axum applications |
-| `tauri` | Tauri v2 desktop apps — setup, plugins, bundling | Cross-platform desktop app development |
-| `tauri-ipc-developer` | Tauri IPC — React-Rust type-safe communication | Command scaffolding and serialization |
-| `tauri-devtools` | Tauri debugging — CrabNebula DevTools integration | Profiling and monitoring |
-| `github-workflows` | GitHub Actions — CI/CD, workflow debugging | Live syntax, run monitoring, failure diagnosis |
-| `security-audit` | Security auditing — dependency scanning, OWASP | Comprehensive security assessment |
-| `security-review` | Code security review — injection, XSS, auth flaws | Vulnerability-focused code review |
-| `security-docker` | Docker security — Dockerfile, runtime hardening | Container security best practices |
-| `review` | Code review — staged changes, PRs, security, performance | Diff-aware code review with quality analysis |
-| `test` | Test generation and execution — auto-detects frameworks | Generate tests or run existing suites with failure analysis |
-| `lint` | Linting and formatting — ESLint, Biome, Prettier | Auto-detect linter, fix issues, report remaining problems |
+GSD reads skills from two locations, in priority order:
+
+| Location | Scope | Description |
+|-----------------------------------|---------|----------------------------------------------------------|
+| `~/.agents/skills/` | Global | Shared across all projects and all compatible agents |
+| `.agents/skills/` (project root) | Project | Project-specific skills, committable to version control |
+
+Global skills take precedence over project skills when names collide.
+
+> **Migration from `~/.gsd/agent/skills/`:** On first launch after upgrading, GSD automatically copies skills from the legacy `~/.gsd/agent/skills/` directory to `~/.agents/skills/`. The old directory is preserved for backward compatibility.
+
+## Installing Skills
+
+Skills are installed via the [skills.sh CLI](https://skills.sh):
+
+```bash
+# Interactive — choose skills and target agents
+npx skills add dpearson2699/swift-ios-skills
+
+# Install specific skills non-interactively
+npx skills add dpearson2699/swift-ios-skills --skill swift-concurrency --skill swiftui-patterns -y
+
+# Install all skills from a repo
+npx skills add dpearson2699/swift-ios-skills --all
+
+# Check for updates
+npx skills check
+
+# Update installed skills
+npx skills update
+```
+
+### Onboarding Catalog
+
+During `gsd init`, GSD detects the project's tech stack and recommends relevant skill packs. For brownfield projects, detection is automatic; for greenfield projects, the user picks a tech stack.
+
+The curated catalog is maintained in `src/resources/extensions/gsd/skill-catalog.ts`. Each entry maps a tech stack to a skills.sh repo and specific skill names.
+
+#### Available Skill Packs
+
+**Swift (any Swift project — `Package.swift` or `.xcodeproj` detected):**
+- **SwiftUI** — layout, navigation, animations, gestures, Liquid Glass
+- **Swift Core** — Swift language, concurrency, Codable, Charts, Testing, SwiftData
+
+**iOS (only when `.xcodeproj` targets `iphoneos` via SDKROOT):**
+- **iOS App Frameworks** — App Intents, Widgets, StoreKit, MapKit, Live Activities
+- **iOS Data Frameworks** — CloudKit, HealthKit, MusicKit, WeatherKit, Contacts
+- **iOS AI & ML** — Core ML, Vision, on-device AI, speech recognition
+- **iOS Engineering** — networking, security, accessibility, localization, Instruments
+- **iOS Hardware** — Bluetooth, CoreMotion, NFC, PencilKit, RealityKit
+- **iOS Platform** — CallKit, EnergyKit, HomeKit, SharePlay, PermissionKit
+
+**Web:**
+- **React & Web Frontend** — React best practices, web design, composition patterns
+- **React Native** — cross-platform mobile patterns
+- **Frontend Design & UX** — frontend design, accessibility
+
+**Languages:**
+- **Rust** — Rust patterns and best practices
+- **Python** — Python patterns and best practices
+- **Go** — Go patterns and best practices
+
+**General:**
+- **Document Handling** — PDF, DOCX, XLSX, PPTX creation and manipulation
+
+### Maintaining the Catalog
+
+The skill catalog lives in [`src/resources/extensions/gsd/skill-catalog.ts`](../src/resources/extensions/gsd/skill-catalog.ts). To add or update a pack:
+
+1. Add a `SkillPack` entry to the `SKILL_CATALOG` array with `repo`, `skills`, and matching criteria
+2. For language-detection matching, use `matchLanguages` (values from `detection.ts` `LANGUAGE_MAP`)
+3. For Xcode platform matching, use `matchXcodePlatforms` (e.g., `["iphoneos"]` — parsed from `SDKROOT` in `project.pbxproj`)
+4. For file-presence matching, use `matchFiles` (checked against `PROJECT_FILES` in `detection.ts`)
+5. If the pack should appear in greenfield choices, add it to `GREENFIELD_STACKS`
+6. Packs sharing the same `repo` are batched into a single `npx skills add` invocation
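+
+A hypothetical entry, as a sketch only (the authoritative `SkillPack` shape lives in `skill-catalog.ts`; field names beyond `repo`, `skills`, and the `match*` criteria listed above are illustrative):
+
+```typescript
+// Hypothetical SKILL_CATALOG entry. Only repo, skills, and
+// matchXcodePlatforms are documented above; `name` is assumed.
+const iosAiMlPack: SkillPack = {
+  name: "iOS AI & ML",
+  repo: "dpearson2699/swift-ios-skills", // packs sharing a repo batch into one `npx skills add`
+  skills: ["core-ml", "vision"],         // illustrative skill names
+  matchXcodePlatforms: ["iphoneos"],     // parsed from SDKROOT in project.pbxproj
+};
+```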
## Skill Discovery
@@ -59,18 +116,18 @@ skill_rules:
### Resolution Order
Skills can be referenced by:
-1. **Bare name** — e.g., `frontend-design` → scans `~/.gsd/agent/skills/` and project skills
-2. **Absolute path** — e.g., `/Users/you/.gsd/agent/skills/my-skill/SKILL.md`
+1. **Bare name** — e.g., `frontend-design` → scans `~/.agents/skills/` and project `.agents/skills/`
+2. **Absolute path** — e.g., `/Users/you/.agents/skills/my-skill/SKILL.md`
3. **Directory path** — e.g., `~/custom-skills/my-skill` → looks for `SKILL.md` inside
-User skills (`~/.gsd/agent/skills/`) take precedence over project skills.
+Global skills (`~/.agents/skills/`) take precedence over project skills (`.agents/skills/`).
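+
+For example, a `prefer_skills` list (keys as documented in `configuration.md`) can mix all three reference forms:
+
+```yaml
+prefer_skills:
+  - frontend-design                              # bare name, resolved from skill directories
+  - /Users/you/.agents/skills/my-skill/SKILL.md  # absolute path
+  - ~/custom-skills/my-skill                     # directory containing SKILL.md
+```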
## Custom Skills
Create your own skills by adding a directory with a `SKILL.md` file:
```
-~/.gsd/agent/skills/my-skill/
+~/.agents/skills/my-skill/
SKILL.md — instructions for the LLM
references/ — optional reference files
```
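+
+A minimal `SKILL.md` sketch, assuming the Agent Skills frontmatter convention with `name` and `description` fields:
+
+```markdown
+---
+name: my-skill
+description: Use when working on X; covers patterns for Y.
+---
+
+# my-skill
+
+Instructions the LLM follows when this skill is active.
+
+- Prefer pattern A over B in this codebase.
+- See references/ for supporting material.
+```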
@@ -82,10 +139,12 @@ The `SKILL.md` file contains instructions the LLM follows when the skill is acti
Place skills in your project for project-specific guidance:
```
-.gsd/agent/skills/my-project-skill/
+.agents/skills/my-project-skill/
SKILL.md
```
+Project-local skills can be committed to version control so team members share the same skill set.
+
## Skill Lifecycle Management
GSD tracks skill performance across auto-mode sessions and surfaces health data to help you maintain skill quality.
diff --git a/docs-internal/what-is-pi/09-the-customization-stack.md b/docs-internal/what-is-pi/09-the-customization-stack.md
index 10a3fb42d..10d032b39 100644
--- a/docs-internal/what-is-pi/09-the-customization-stack.md
+++ b/docs-internal/what-is-pi/09-the-customization-stack.md
@@ -48,8 +48,8 @@ On-demand capability packages following the [Agent Skills standard](https://agen
```
**Placement:**
-- `~/.gsd/agent/skills/` or `~/.agents/skills/` (global)
-- `.gsd/skills/` or `.agents/skills/` (project, searched up to git root)
+- `~/.agents/skills/` (global — shared across all agents)
+- `.agents/skills/` (project, searched up to git root)
**Skill structure:**
```
diff --git a/docs/ADR-001-branchless-worktree-architecture.md b/docs/ADR-001-branchless-worktree-architecture.md
new file mode 100644
index 000000000..478dade24
--- /dev/null
+++ b/docs/ADR-001-branchless-worktree-architecture.md
@@ -0,0 +1,279 @@
+# ADR-001: Branchless Worktree Architecture
+
+**Status:** Proposed
+**Date:** 2026-03-15
+**Deciders:** Lex Christopherson
+**Advisors:** Claude Opus 4.6, Gemini 2.5 Pro, GPT-5.4 (Codex)
+
+## Context
+
+GSD uses git for isolation during autonomous coding sessions. The current architecture (shipped in M003, v2.13.0) creates a **worktree per milestone** with **slice branches inside each worktree**. Each slice (`S01`, `S02`, ...) gets its own branch (`gsd/M001/S01`) within the worktree, which merges back to the milestone branch (`milestone/M001`) via `--no-ff` when the slice completes. The milestone branch squash-merges to `main` when the milestone completes.
+
+This architecture replaced a previous "branch-per-slice" model that had severe `.gsd/` merge conflicts. M003 solved the merge conflicts but retained slice branches inside worktrees, inheriting complexity that has produced persistent, user-facing failures.
+
+### Problems
+
+**1. Planning artifact invisibility (loop detection failures)**
+
+When `research-slice` or `plan-slice` dispatches, the agent writes artifacts (e.g., `S02-RESEARCH.md`) on a slice branch. After the agent completes, `handleAgentEnd` switches back to the milestone branch for the next dispatch. The artifact is on the slice branch, not the milestone branch. `verifyExpectedArtifact()` checks the milestone branch, can't find the file, increments the loop counter, retries, same result. After 3 retries → hard stop. After 6 lifetime dispatches → permanent stop. This burns budget and blocks progress.
+
+Documented in the auto-stop architecture doc as "The Branch-Switching Problem."
+
+**2. `.gsd/` state clobbering across branches**
+
+`.gsd/` is gitignored (line 52 of `.gitignore`: `.gsd/`). Planning artifacts (roadmaps, plans, summaries, decisions, requirements) live in `.gsd/milestones/` but are invisible to git. When multiple branches or worktrees operate from the same repo, they share a single `.gsd/` directory on disk. Branch A's M001 roadmap overwrites Branch B's M001 roadmap. GSD reads corrupted state, shows wrong milestone as complete, or enters infinite dispatch loops.
+
+The codebase has a contradictory workaround: `smartStage()` (git-service.ts:304-352) force-adds `GSD_DURABLE_PATHS` (milestones/, DECISIONS.md, PROJECT.md, REQUIREMENTS.md, QUEUE.md) despite the `.gitignore`. This means `.gsd/milestones/` IS partially tracked on some branches but the gitignore claims otherwise. The code fights the configuration.
+
+**3. Merge/conflict code complexity**
+
+The current slice branch model requires:
+- `mergeSliceToMilestone()` — 98 lines, `--no-ff` merge with `withMergeHeal` wrapper
+- `mergeSliceToMain()` — 189 lines, squash-merge with conflict detection/categorization/auto-resolution
+- `git-self-heal.ts` — 198 lines, 3 recovery functions for merge failures
+- `fix-merge` dispatch unit — dedicated LLM session to resolve conflicts the auto-resolver can't handle
+- `smartStage()` — 49 lines of runtime exclusion during staging
+- Conflict categorization — 80 lines classifying `.gsd/` vs runtime vs code conflicts
+
+Total: **~582 lines** of merge/branch/conflict code across 3 files, plus the `fix-merge` prompt template and dispatch logic. This code exists solely because of slice branches.
+
+**4. Dual isolation modes**
+
+Branch-mode (`git-service.ts:mergeSliceToMain`) and worktree-mode (`auto-worktree.ts:mergeSliceToMilestone`) have parallel implementations with different merge strategies, different conflict handling, and different branch naming. Both paths must be maintained and tested. 11 test files exercise merge/branch/worktree logic.
+
+**5. Bug history**
+
+- v2.11.1: URGENT fix for parse cache staleness causing repeated unit dispatch (directly caused by branch switching invalidation timing)
+- v2.13.1: Windows hotfix for multi-line commit messages in `mergeSliceToMilestone`
+- 15+ separate bug fixes for `.gsd/` merge conflicts in the pre-M003 era
+- Persistent user complaints about loop detection failures and state corruption
+
+## Decision
+
+**Eliminate slice branches entirely.** All work within a milestone worktree commits sequentially on a single branch (`milestone/<id>`). No branch creation, no branch switching, no slice merges, no conflict resolution within a worktree.
+
+Track `.gsd/` planning artifacts in git. Gitignore only runtime/ephemeral state.
+
+### The Architecture
+
+```
+main ──────────────────────────────────────────── main
+ │ ↑
+ └─ worktree (milestone/M001) │
+ │ │
+ commit: feat(M001): context + roadmap │
+ commit: feat(M001/S01): research │
+ commit: feat(M001/S01): plan │
+ commit: feat(M001/S01/T01): impl │
+ commit: feat(M001/S01/T02): impl │
+ commit: feat(M001/S01): summary + UAT │
+ commit: feat(M001/S02): research │
+ commit: ... │
+ commit: feat(M001): milestone complete │
+ │ │
+ └──────────── squash merge ──────────────────┘
+```
+
+### Git Primitives Used
+
+| Primitive | Purpose |
+|-----------|---------|
+| **Worktrees** | One per active milestone. Filesystem isolation. |
+| **Commits** | Granular sequential history of every action. |
+| **Squash merge** | Clean single commit on `main` per milestone. |
+| **Branches** | Only `main` and `milestone/<id>`. Nothing else. |
+
+### Git Primitives NOT Used
+
+| Primitive | Why Not |
+|-----------|---------|
+| Slice branches | Slices are sequential. Branches add complexity with no rollback benefit. |
+| `--no-ff` merges | No branches to merge within a worktree. |
+| Branch switching | Never happens. All work on one branch. |
+| Conflict resolution | No merges within a worktree means no conflicts within a worktree. |
+
+### `.gsd/` Tracking Model
+
+**Tracked in git (travels with the branch):**
+```
+.gsd/milestones/ — roadmaps, plans, summaries, research, contexts, task plans/summaries
+.gsd/PROJECT.md — project overview
+.gsd/DECISIONS.md — architectural decision register
+.gsd/REQUIREMENTS.md — requirements register
+.gsd/QUEUE.md — work queue
+```
+
+**Gitignored (ephemeral, runtime, infrastructure):**
+```
+.gsd/runtime/ — dispatch records, timeout tracking
+.gsd/activity/ — JSONL session dumps
+.gsd/worktrees/ — git worktree working directories
+.gsd/auto.lock — crash detection sentinel
+.gsd/metrics.json — token/cost accumulator
+.gsd/completed-units.json — dispatch idempotency tracker
+.gsd/STATE.md — derived state cache (rebuilt by deriveState())
+.gsd/gsd.db — SQLite cache (rebuilt from tracked markdown by importers)
+.gsd/DISCUSSION-MANIFEST.json — discussion phase tracking
+.gsd/milestones/**/*-CONTINUE.md — interrupted-work markers
+.gsd/milestones/**/continue.md — legacy continue markers
+```
+
+### `.gitignore` Update
+
+Replace the current blanket `.gsd/` ignore with explicit runtime-only ignores:
+
+```gitignore
+# ── GSD: Runtime / Ephemeral ─────────────────────────────────
+.gsd/auto.lock
+.gsd/completed-units.json
+.gsd/STATE.md
+.gsd/metrics.json
+.gsd/gsd.db
+.gsd/activity/
+.gsd/runtime/
+.gsd/worktrees/
+.gsd/DISCUSSION-MANIFEST.json
+.gsd/milestones/**/*-CONTINUE.md
+.gsd/milestones/**/continue.md
+```
+
+Planning artifacts (milestones/, PROJECT.md, DECISIONS.md, REQUIREMENTS.md, QUEUE.md) are NOT in `.gitignore` and are tracked normally.
+
+## Consequences
+
+### Code Deletion
+
+| File | Lines Deleted | What's Removed |
+|------|--------------|----------------|
+| `auto-worktree.ts` | ~246 | `mergeSliceToMilestone()`, `shouldUseWorktreeIsolation()`, `getMergeToMainMode()`, slice merge guards |
+| `git-service.ts` | ~250 | `mergeSliceToMain()`, conflict resolution, runtime stripping post-merge, `ensureSliceBranch()`, `switchToMain()` |
+| `git-self-heal.ts` | ~86 | `abortAndReset()`, `withMergeHeal()` (merge-specific recovery) |
+| `auto.ts` | ~150 | Merge dispatch guards, `fix-merge` dispatch path, branch-mode routing |
+| `worktree.ts` | ~40 | `getSliceBranchName()`, `ensureSliceBranch()`, `mergeSliceToMain()` delegates |
+| **Test files** | ~11 files | `auto-worktree-merge.test.ts`, `auto-worktree-milestone-merge.test.ts`, merge-related test cases |
+| **Total** | **~770+ lines** | |
+
+### What `mergeMilestoneToMain()` Becomes
+
+The function simplifies dramatically:
+1. Auto-commit any dirty state in worktree
+2. `chdir` back to main repo root
+3. `git checkout main`
+4. `git merge --squash milestone/<id>`
+5. `git commit` with milestone summary
+6. Remove worktree + delete branch
+
+No conflict categorization. No runtime file stripping. No `.gsd/` special handling. Planning artifacts merge cleanly because they live in `.gsd/milestones/M001/`, which doesn't exist on `main` until this merge.
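+
+As a sketch of the equivalent git commands, assuming milestone `M001` with branch `milestone/M001` and worktree path `.gsd/worktrees/M001` (paths illustrative):
+
+```bash
+# 1. Auto-commit any dirty state in the worktree
+git -C .gsd/worktrees/M001 add -A
+git -C .gsd/worktrees/M001 commit -m "chore(M001): auto-commit" || true  # no-op if clean
+# 2-3. Back to the main repo root, on main
+cd /path/to/main-repo   # path illustrative
+git checkout main
+# 4-5. Squash-merge and commit with the milestone summary
+git merge --squash milestone/M001
+git commit -m "feat(M001): milestone complete"
+# 6. Remove worktree and delete the branch
+git worktree remove .gsd/worktrees/M001
+git branch -D milestone/M001
+```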
+
+### What `smartStage()` Becomes
+
+The force-add of `GSD_DURABLE_PATHS` is no longer needed — planning artifacts are not gitignored, so `git add -A` picks them up naturally. The function reduces to:
+
+1. `git add -A`
+2. `git reset HEAD -- <runtime paths>` (unstage runtime files)
+
+The `_runtimeFilesCleanedUp` one-time migration logic can also be removed.
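+
+As a two-command sketch (runtime paths illustrative):
+
+```bash
+git add -A                                      # planning artifacts are no longer gitignored
+git reset HEAD -- .gsd/runtime/ .gsd/activity/  # unstage runtime files
+```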
+
+### What Happens to `handleAgentEnd()`
+
+After any unit completes:
+1. Invalidate caches
+2. `autoCommitCurrentBranch()` — commits on the one and only branch
+3. `verifyExpectedArtifact()` — file is always on the current branch (no branch switching)
+4. Persist completion key
+
+The "Path A fix" (lines 937-953) becomes the only path. No branch mismatch possible.
+
+### What Happens to `fix-merge`
+
+The `fix-merge` dispatch unit type is eliminated. Within a worktree, there are no merges that can conflict. The only merge is milestone→main (squash), and if that conflicts (rare, parallel milestone edge case), it's handled as a one-time resolution at milestone completion — not a dispatch loop.
+
+### Backwards Compatibility
+
+The `shouldUseWorktreeIsolation()` three-tier preference resolution is replaced by a single behavior: worktree isolation is always used. The `git.isolation: "branch"` preference is deprecated.
+
+Projects with existing `gsd/M001/S01` slice branches can still be read by state derivation, but new work never creates slice branches.
+
+### Risks
+
+**1. Parallel milestone code conflicts at squash-merge time**
+
+If two milestones modify the same source file, the second squash-merge to `main` will conflict. Mitigation: `git fetch origin main && git rebase main` before squash-merge. This is standard practice and rare in single-user workflows.
+
+**2. Loss of per-slice git history after squash**
+
+Squash merge collapses all commits into one on `main`. Mitigations:
+- Commit messages tag slices (`feat(M001/S01/T01):`) — filterable with `git log --grep`
+- The milestone branch can be preserved (not deleted) if history is needed
+- Alternative: `git merge --no-ff` instead of `--squash` to keep history on `main`
+
+**3. SQLite DB desync after `git reset`**
+
+If tracked markdown rolls back via `git reset --hard`, the gitignored `gsd.db` doesn't. Mitigation: the importer layer (M001/S02) rebuilds the DB from markdown on startup. The DB is a cache, markdown is truth.
+
+**4. Disk space with multiple worktrees**
+
+Each worktree duplicates the working directory (including `node_modules`). Mitigation: single active milestone at a time (single-user workflow), immediate cleanup after completion.
+
+## Alternatives Considered
+
+### A. Keep slice branches, fix visibility with immediate mini-merges
+
+After `research-slice` or `plan-slice`, immediately merge the slice branch back to the milestone branch. This fixes the loop detection bug but retains all merge complexity.
+
+**Rejected:** Adds another merge path instead of removing the root cause. Still requires conflict resolution, self-healing, branch switching.
+
+### B. Keep `.gsd/` gitignored, bootstrap from git history for manual worktrees
+
+When GSD detects an empty `.gsd/` in a worktree, reconstruct state from the branch's git history using `git show <branch>:.gsd/...`.
+
+**Rejected:** Recovery logic, not architecture. Doesn't fix the fundamental problem of branch-agnostic state. Fails when git history has been rewritten.
+
+### C. Branch-scoped `.gsd/` directories (`.gsd/branches/<branch>/milestones/...`)
+
+Each branch writes to a namespaced subdirectory within `.gsd/`.
+
+**Rejected:** Adds complexity instead of removing it. Requires renaming/moving on branch creation, doesn't work with standard git tools (`git checkout` doesn't rename directories).
+
+## Validation
+
+This architecture was stress-tested by three independent models:
+
+**Gemini 2.5 Pro** identified 6 attack vectors. None broke the core model. Recommendations: pre-flight rebase before squash-merge (adopted), heartbeat locks (already exists), DB rebuild on startup (adopted via M001/S02 importers).
+
+**GPT-5.4 (Codex)** read the full codebase and confirmed the model is sound. Identified that `smartStage()` already force-adds durable paths (validating the tracked-artifact approach) and that `resolveMainWorktreeRoot` in PR #487 is architecturally wrong (adopted — PR to be closed).
+
+**Codebase analysis** confirmed `.gsd/milestones/` is already partially tracked on `main` despite the `.gitignore`, that `GSD_DURABLE_PATHS` exists as a code-level acknowledgment that planning artifacts should be tracked, and that the README already documents the correct runtime-only gitignore pattern.
+
+### Codex (GPT-5.4) Dissent — "No Slice Branches Is a Redesign"
+
+Codex read the full codebase and raised 4 concerns. Each is addressed:
+
+**Concern 1: "Crash after slice done but before integration — today the runtime detects orphaned slice branches and merges them."**
+
+Rebuttal: In the branchless model, there is no integration step to crash between. Slice work is committed directly on the milestone branch. On restart, `deriveState()` reads the branch state as-is. The orphaned-branch recovery path exists solely because of slice branches — removing branches removes the failure mode it recovers from.
+
+**Concern 2: "Concurrent edits to shared root docs (PROJECT.md, DECISIONS.md) from two terminals."**
+
+Rebuttal: Valid edge case. If `/gsd queue` edits `DECISIONS.md` on `main` while auto-mode edits it in a worktree, there's a content conflict at squash-merge time. This is a standard git content conflict — no different from two developers editing the same file. Handled by normal merge resolution. Not caused by or solved by slice branches.
+
+**Concern 3: "Slice→milestone merges provide continuous integration. Removing them pushes conflict discovery to the end."**
+
+Rebuttal: In a single-user sequential workflow, there is nothing to integrate against within a worktree. Each slice builds on the previous one. The only conflict source is `main` diverging (e.g., another milestone merging first), which slice→milestone merges don't catch anyway — they merge within the worktree, not against `main`. Pre-flight rebase before squash-merge catches this more directly.
+
+**Concern 4: "Replace slice branches with another explicit slice-boundary primitive. Don't just delete them."**
+
+Response: Accepted in spirit. Commits with conventional tags (`feat(M001/S01):`, `feat(M001/S01/T01):`) serve as the slice boundary primitive. `git log --grep="M001/S01"` isolates a slice's history. `git revert` targets specific commits. Git tags (`gsd/M001/S01-complete`) can mark slice completion if needed. The boundary primitive is commit metadata, not branches.
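+
+A sketch of these commit-metadata operations (tag name and commit hash illustrative):
+
+```bash
+git log --oneline --grep="M001/S01"   # isolate one slice's history
+git revert <task-commit-sha>          # undo a specific task commit
+git tag gsd/M001/S01-complete         # optionally mark slice completion
+```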
+
+## Action Items
+
+1. Close PR #487 (`resolveMainWorktreeRoot`) — contradicts this architecture
+2. Implement as a GSD milestone with phases:
+ - Update `.gitignore` and force-add existing planning artifacts
+ - Remove slice branch creation/switching/merging code
+ - Simplify `mergeMilestoneToMain()` and `smartStage()`
+ - Remove `fix-merge` dispatch unit
+ - Remove branch-mode isolation (`git.isolation: "branch"`)
+ - Update/delete 11 test files
+ - Update README suggested gitignore
+ - Migration path for existing projects with slice branches
diff --git a/docs/ADR-003-pipeline-simplification.md b/docs/ADR-003-pipeline-simplification.md
new file mode 100644
index 000000000..ddc31f609
--- /dev/null
+++ b/docs/ADR-003-pipeline-simplification.md
@@ -0,0 +1,738 @@
+# ADR-003: Auto-Mode Pipeline Simplification
+
+**Status:** Proposed
+**Date:** 2026-03-18
+**Deciders:** Lex Christopherson
+**Related:** ADR-001 (branchless worktree architecture), ADR-002 (external state directory)
+**Audited by:** Claude Opus 4.6, OpenAI Codex — findings incorporated below.
+
+## Context
+
+GSD auto-mode orchestrates a multi-session pipeline where each "unit" of work runs in a fresh LLM session. The pipeline for a single milestone with N slices and M tasks per slice runs through:
+
+```
+research-milestone → plan-milestone →
+ (research-slice → plan-slice → execute-task × M → complete-slice → reassess-roadmap) × N →
+ validate-milestone → complete-milestone
+```
+
+The exact session count depends on profile. The "quality" profile runs all phases. The "balanced" profile skips slice research by default. The "budget" profile skips milestone research, slice research, reassessment, and milestone validation. This ADR uses the quality profile as the baseline for analysis — it represents the full pipeline and the worst-case ceremony overhead.
+
+For a typical 4-slice, 3-task milestone under the quality profile:
+- 1 research-milestone + 1 plan-milestone
+- Per slice: research-slice (skipped for S01) + plan-slice + 3 execute-task + complete-slice + reassess-roadmap (skipped for last slice, since all slices are done)
+- Per-slice total for S01: 0 + 1 + 3 + 1 + 1 = 6
+- Per-slice total for S02–S03: 1 + 1 + 3 + 1 + 1 = 7 each; S04 skips reassess (it's the last slice), so its total is 6
+- Slices total: 6 + 7 + 7 + 6 = 26
+- Plus: 1 validate-milestone + 1 complete-milestone
+
+**Total: 30 sessions.** Only 12 are task execution. The remaining 18 are pipeline ceremony.
+
+(The "balanced" profile drops slice research for S02-S04: 30 - 3 = 27 sessions. The "budget" profile drops milestone research, all slice research, reassessment, and validation: 30 - 1 - 3 - 3 - 1 = 22 sessions.)
+
+### The Token Tax
+
+Every fresh session re-ingests static context via prompt inlining. The `auto-prompts.ts` builders (1,099 lines) inline the following files into nearly every unit type:
+
+| File | Inlined Into | Changes After |
+|------|-------------|---------------|
+| ROADMAP | research-slice, plan-slice, execute-task (excerpt), complete-slice, reassess, validate, complete-milestone | plan-milestone (rare reassess rewrites) |
+| DECISIONS.md | research-milestone, plan-milestone, research-slice, plan-slice, complete-milestone, validate | Appended occasionally during execution |
+| REQUIREMENTS.md | research-milestone, plan-milestone, research-slice, plan-slice, complete-slice, complete-milestone, validate | Updated during complete-slice |
+| KNOWLEDGE.md | research-milestone, plan-milestone, research-slice, plan-slice, execute-task, complete-slice, complete-milestone, validate | Appended occasionally during execution |
+| PROJECT.md | research-milestone, plan-milestone, complete-milestone, validate | Rarely updated |
+
+The ROADMAP alone is inlined into 7 unit types. It never changes during normal execution. This is a static document being re-tokenized per session at a cost of 5–20K tokens each time.
+
+For the 30-session milestone above (quality profile), context re-ingestion costs approximately:
+- ROADMAP: 7 re-inlines × ~10K tokens = 70K tokens
+- DECISIONS: 6 re-inlines × ~5K tokens = 30K tokens
+- REQUIREMENTS: 8 re-inlines × ~5K tokens = 40K tokens
+- KNOWLEDGE: 8 re-inlines × ~3K tokens = 24K tokens
+- Templates (research, plan, task-plan, etc.): ~2K per inline × ~10 units = 20K tokens
+- Dependency summaries: ~8K per slice plan × 3 non-S01 slices = 24K tokens
+
+**Total context re-ingestion overhead: ~208K tokens per milestone.** This is pure waste — the LLM re-reads documents it already processed in prior sessions, gaining no new information.
+
+### The Lossy Handoff Problem
+
+Each session boundary is a lossy compression step. The research-milestone agent reads the codebase and writes a RESEARCH.md. The plan-milestone agent reads that research and produces a ROADMAP. The research-slice agent reads the ROADMAP and explores the codebase again for its slice scope. The plan-slice agent reads that slice research and produces a PLAN.
+
+This is a game of telephone:
+
+```
+Codebase → [researcher reads code] → RESEARCH.md → [planner reads research] → ROADMAP
+ ↑ often re-reads the same code
+```
+
+The research prompt explicitly says: *"Write for the roadmap planner."* The plan prompt says: *"Trust the research. Don't re-read code."* But planners routinely re-read code because research is a lossy compression — a summary of what one LLM session saw, not the thing itself. The fidelity loss compounds at each handoff.
+
+### The Machinery Tax
+
+The multi-session pipeline requires extensive orchestration machinery to handle edge cases, failures, and recovery:
+
+| File | Lines | Purpose |
+|------|-------|---------|
+| `auto-recovery.ts` | 591 | Artifact resolution, loop remediation, skip/rerun logic |
+| `auto-stuck-detection.ts` | 220 | Dispatch loop detection, lifetime caps, stub recovery |
+| `auto-idempotency.ts` | 150 | Skip completed units, phantom loop detection, stale key recovery |
+| `session-forensics.ts` | 536 | Post-mortem analysis, crash briefings, deep diagnostics |
+| `auto-timeout-recovery.ts` | 262 | Resume after timeout, recovery briefing synthesis |
+| `crash-recovery.ts` | 108 | Lock file management, crash detection |
+| `auto-post-unit.ts` | 591 | Post-agent processing, verification, commits, state sync |
+| `auto-verification.ts` | 229 | Post-task verification enforcement |
+| `verification-gate.ts` | 643 | Test/lint/audit gate runner |
+| `doctor-proactive.ts` | 292 | Health checks, proactive healing, escalation detection |
+| **Total** | **3,622** | **Recovery, verification, and post-processing** |
+
+This is 3,622 lines of code managing the complexity of a 15-rule dispatch table across 13 unit types. Much of this machinery exists because the pipeline has so many sessions that failures, timeouts, and stuck states are statistically likely.
+
+### The Ceremony Sessions
+
+Six of the 13 unit types produce no code. They exist purely to manage the pipeline:
+
+| Unit Type | What It Does | Sessions per Milestone (quality, 4-slice) |
+|-----------|-------------|----------------------|
+| research-milestone | Reads codebase, writes RESEARCH.md | 1 |
+| research-slice | Reads codebase for slice scope, writes slice RESEARCH.md | 3 (skipped for S01) |
+| complete-slice | Re-reads ROADMAP + plan + all task summaries, writes slice SUMMARY.md + UAT.md | 4 |
+| reassess-roadmap | Re-reads ROADMAP + slice summary, almost always says "roadmap is fine" | 3 (skipped after last slice) |
+| validate-milestone | Re-reads ROADMAP + all slice summaries, writes VALIDATION.md | 1 |
+| complete-milestone | Re-reads ROADMAP + all slice summaries, writes SUMMARY.md | 1 |
+
+Total: 1 + 3 + 4 + 3 + 1 + 1 = **13 ceremony sessions** (under quality profile), each consuming 12–37K tokens of prompt context. Under the balanced profile this drops to 9 (no slice research). These sessions burn tokens re-reading documents that earlier sessions produced, only to emit intermediate artifacts that downstream sessions re-read in turn.
+
+### Root Cause
+
+The pipeline was designed around a paradigm where:
+1. LLM context windows are small (32K–100K tokens)
+2. Sessions are expensive, so specialize each one
+3. Handoffs between specialized agents produce better results than generalist sessions
+4. Research → plan → execute is the "correct" decomposition of intellectual work
+
+With 200K+ token context windows and prompt caching, assumptions 1 and 2 are obsolete. Assumption 3 is demonstrably false — handoffs lose fidelity. Assumption 4 confuses human workflow patterns with LLM-optimal patterns. An LLM with tool access is already researching while it plans. Forcing it to serialize research into a document, then read that document in a new session, is an artificial bottleneck.
+
+## Decision
+
+**Collapse the pipeline from 13 unit types to 5. Merge research into planning. Fold completion into post-unit mechanical processing. Replace LLM-driven validation with mechanical verification aggregation.**
+
+### The Simplified Pipeline
+
+```
+plan-milestone → (plan-slice → execute-task × M) × N → done
+```
+
+Note: `discuss` is an interactive human-facing session, not an auto-mode unit — it's not counted in session math. It continues to work as-is.
+
+For the same 4-slice, 3-task milestone:
+- 1 plan-milestone (S01 plan + task plans produced inline via single-slice fast path if applicable)
+- S01: plan-slice skipped (milestone planner already explored) + 3 execute-task = 3
+- S02–S04: plan-slice + 3 execute-task = 4 each × 3 slices = 12
+
+**Total: 1 + 3 + 12 = 16 sessions** (down from 30). The 14 eliminated sessions were the highest-waste ones — each re-ingested context for minimal value.
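The session math above can be sketched as a closed-form check. This is illustrative only — the real pipeline derives state from files via `deriveState()`, not from a formula:

```typescript
// Illustrative sketch of the simplified pipeline's session count.
// S01's plan-slice is skipped because the milestone planner already explored.
function newPipelineSessions(slices: number, tasksPerSlice: number): number {
  const planMilestone = 1;
  const planSlice = slices - 1;               // one per slice, skipped for S01
  const executeTask = slices * tasksPerSlice; // one per task
  return planMilestone + planSlice + executeTask;
}

// 4 slices × 3 tasks → 1 + 3 + 12 = 16
```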
+
+### Unit Type Changes
+
+#### 1. Merge research-milestone INTO plan-milestone
+
+**Current:** Two sessions. Researcher explores codebase, writes RESEARCH.md. Planner reads RESEARCH.md, writes ROADMAP.
+
+**New:** One session. The plan-milestone agent explores the codebase directly and produces the ROADMAP. It has full tool access — it can read files, run commands, search code. The "research" happens naturally as part of planning, not as a serialized intermediary.
+
+**What changes:**
+- The plan-milestone prompt gains the research-milestone's exploration instructions: "Explore relevant code, check technologies, identify constraints."
+- The plan-milestone prompt drops "Trust the research" — there is no research document to trust.
+- The RESEARCH.md artifact becomes optional. If the planner wants to capture notes for downstream reference, it can write one. But it's not required, and downstream units don't depend on it.
+- Skill discovery instructions move into the plan-milestone prompt.
+- The research-milestone template (`prompts/research-milestone.md`) is retained but only used when explicitly dispatched via `/gsd dispatch research`.
+
+**Token savings:** ~1 full session (12–37K tokens of prompt context) + the RESEARCH.md document no longer re-inlined into plan-milestone (~5–15K tokens).
+
+**Quality impact:** Positive. The planner has direct access to the codebase instead of reading a lossy summary. It can verify assumptions in real time instead of trusting a prior session's interpretation.
+
+#### 2. Merge research-slice INTO plan-slice
+
+**Current:** Two sessions per non-S01 slice. Slice researcher explores codebase for slice scope, writes slice RESEARCH.md. Slice planner reads that research, writes PLAN.md + task plans.
+
+**New:** One session. The plan-slice agent explores the relevant code directly and produces the slice plan with task plans.
+
+**What changes:**
+- The plan-slice prompt gains exploration instructions: "Read the relevant code for this slice's scope before decomposing."
+- The plan-slice prompt drops "Trust the research" — there is no slice research document.
+- Slice RESEARCH.md becomes optional (same as milestone research above).
+- The research-slice template is retained for explicit dispatch.
+- The `skip_slice_research` preference becomes the default behavior rather than an opt-in.
+- The dispatch rule "planning (no research, not S01) → research-slice" is removed.
+
+**Token savings:** ~1 session per non-S01 slice × (N-1) slices. For a 4-slice milestone: 3 sessions × 12–37K tokens = 36–111K tokens.
+
+**Quality impact:** Positive. The planner can read actual code files instead of a summary. It verifies file paths, function signatures, and patterns directly rather than trusting a researcher's notes.
+
+#### 3. Fold complete-slice INTO mechanical post-unit processing
+
+**Current:** After all tasks in a slice complete, `deriveState()` emits the `summarizing` phase, dispatching a separate complete-slice LLM session that re-reads the ROADMAP, slice plan, and ALL task summaries to write a slice SUMMARY.md and UAT.md.
+
+**New:** Slice completion moves to a **post-gate mechanical closeout** in `auto-post-unit.ts`, not into the final executor's prompt. After the last execute-task's verification gate passes:
+
+1. The post-unit processing detects that all tasks in the slice are done (same check `deriveState()` uses to emit `summarizing`).
+2. It runs mechanical slice completion: aggregate task summaries into a SUMMARY.md using structured frontmatter, generate a UAT.md from the slice plan's verification section, mark the slice done in the ROADMAP.
+3. If the mechanical summary is insufficient (complex slices where structured aggregation loses important narrative), the system detects low quality (e.g., summary is below a character threshold) and dispatches a standalone complete-slice LLM session as recovery.
+
+**Why post-gate, not in the executor prompt:**
+- Codex audit identified that folding completion into execute-task creates a verification-retry ordering problem: if the executor writes SUMMARY.md and marks the slice done in the ROADMAP before the verification gate runs, a gate failure would retry against incorrect derived state (the slice appears complete when it isn't).
+- Post-gate processing runs after verification succeeds, so state transitions are always consistent.
+- The executor's context budget is fully available for its actual work.
+
+**What changes in `deriveState()`:**
+- The `summarizing` phase still exists in state derivation (all tasks done, slice not marked complete).
+- The `summarizing → complete-slice` dispatch rule is demoted to a fallback. In the normal path, post-unit processing handles the transition synchronously.
+- If post-unit mechanical completion fails or produces low-quality output, the `summarizing` phase still exists as a dispatch target and the system falls back to dispatching a complete-slice LLM session.
+
+**What changes:**
+- `auto-post-unit.ts` gains a `mechanicalSliceCompletion()` function.
+- The complete-slice dispatch rule is removed from the default path but retained as a fallback.
+- The complete-slice template is retained for recovery and explicit dispatch.
+- The `summarizing` phase in `state.ts` is unchanged — it serves as the fallback trigger if mechanical completion doesn't run.
+
+**Full completion contract preserved:** The mechanical completion writes all three required artifacts (SUMMARY.md, UAT.md, ROADMAP checkbox) — matching the current complete-slice contract. It also handles the REQUIREMENTS.md updates and DECISIONS.md appendix that the current complete-slice prompt performs; the KNOWLEDGE.md appendix is deliberately skipped, since executors already write those entries during execution (see Risk 5 below for details).
+
+**Token savings:** ~1 session per slice × N slices. For a 4-slice milestone: 4 sessions × 12–37K tokens = 48–148K tokens.
+
+**Quality impact:** For most slices, the mechanical summary is sufficient — it aggregates structured frontmatter fields (provides, requires, affects, key_files, key_decisions, patterns_established) from task summaries. For complex slices with important narrative context, the LLM fallback preserves quality.
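As a sketch of what "aggregates structured frontmatter fields" means in practice (field names come from the summary schema above; the helper itself is hypothetical):

```typescript
// Hypothetical helper: merge one frontmatter list field across task summaries,
// dropping duplicates while preserving first-seen order.
interface TaskFrontmatter {
  provides: string[];
  key_files: string[];
}

function aggregateField(
  summaries: TaskFrontmatter[],
  field: keyof TaskFrontmatter,
): string[] {
  return [...new Set(summaries.flatMap(s => s[field]))];
}
```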
+
+#### 4. Eliminate reassess-roadmap (make opt-in)
+
+**Current:** After every slice completion, a reassess-roadmap session re-reads the ROADMAP and slice summary, then almost always writes "roadmap is fine."
+
+**New:** Reassessment is eliminated by default. The plan-slice agent for the next slice serves as the natural reassessment point — it reads the ROADMAP and prior slice summaries, and can adjust its plan if the ground has shifted.
+
+**What changes:**
+- The reassess-roadmap dispatch rule fires only when the `reassess_after_slice` preference is enabled (default: off, was effectively always-on).
+- The plan-slice prompt gains a reassessment preamble: "Before planning this slice, verify that the roadmap's assumptions still hold given prior slice summaries. If the remaining roadmap needs adjustment, modify it before proceeding."
+- The `checkNeedsReassessment()` function in auto-prompts.ts becomes a preference gate, not a mandatory check.
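A minimal sketch of that preference gate — the `AutoPreferences` shape and function name are assumptions; only the `reassess_after_slice` key comes from the ADR:

```typescript
// Hypothetical preference gate: reassessment fires only when opted in,
// and never after the final slice (matching the current skip behavior).
interface AutoPreferences {
  reassess_after_slice?: boolean; // default: off
}

function shouldReassess(prefs: AutoPreferences, isFinalSlice: boolean): boolean {
  if (isFinalSlice) return false;
  return prefs.reassess_after_slice === true;
}
```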
+
+**Token savings:** ~1 session per completed non-final slice, i.e. up to (N-1) sessions minus any already skipped. For a 4-slice milestone under the quality profile: 3 sessions × 12–37K tokens = 36–111K tokens.
+
+**Quality impact:** Neutral. The reassess prompt says *"Bias strongly toward 'roadmap is fine.'"* — acknowledging that most reassessments produce no change. JIT reassessment during the next plan-slice is more informed (has the next slice's context) and costs zero additional tokens.
+
+#### 5. Replace validate-milestone with mechanical verification
+
+**Current:** An LLM session re-reads the ROADMAP and all slice summaries, checks success criteria against delivery evidence, and writes a VALIDATION.md with a verdict. It also inlines UAT-RESULT artifacts from slices with `uat_dispatch` enabled.
+
+**New:** The system mechanically aggregates verification results from all tasks and slices. The canonical verification data sources are:
+
+1. **`T##-VERIFY.json`** files (written by `writeVerificationJSON()` in `verification-evidence.ts`) — machine-readable per-task verification results with command, exit code, verdict, duration, and blocking status.
+2. **`S##-UAT-RESULT.md`** files (when `uat_dispatch` is enabled) — human or artifact-driven UAT outcomes.
+3. **Task summary frontmatter** `verification_result` field — a human-readable pass/fail string (not structured, used as a secondary signal).
+
+The aggregator reads `T##-VERIFY.json` as the primary source of truth, supplements with UAT-RESULT artifacts, and produces a deterministic VALIDATION.md.
+
+**What changes:**
+- A new `aggregateMilestoneVerification()` function collects `T##-VERIFY.json` files and `S##-UAT-RESULT.md` files across all slices.
+- The function produces a VALIDATION.md with per-task and per-slice pass/fail status, UAT evidence, and an overall verdict.
+- The LLM-driven validate-milestone session is removed from the default pipeline.
+- The validate-milestone template is retained for explicit dispatch (users who want LLM-driven validation can run `/gsd dispatch validate`).
+- The `skip_milestone_validation` preference (which writes a pass-through VALIDATION.md) becomes the default behavior, with the mechanical aggregation replacing it.
+
+```typescript
+async function aggregateMilestoneVerification(base: string, mid: string): Promise<{ verdict: string; checks: EvidenceCheckJSON[]; uatResults: { sliceId: string; content: string }[] }> {
+ const roadmap = parseRoadmap(await loadFile(resolveMilestoneFile(base, mid, "ROADMAP")));
+ const checks: EvidenceCheckJSON[] = [];
+ const uatResults: { sliceId: string; content: string }[] = [];
+
+ for (const slice of roadmap.slices) {
+ // Primary source: T##-VERIFY.json files (machine-readable, written by verification-gate.ts)
+ const tDir = resolveTasksDir(base, mid, slice.id);
+ if (tDir) {
+ const verifyFiles = resolveTaskFiles(tDir, "VERIFY");
+ for (const file of verifyFiles) {
+ const content = await loadFile(join(tDir, file));
+ if (content) {
+ const evidence: EvidenceJSON = JSON.parse(content);
+ checks.push(...evidence.checks);
+ }
+ }
+ }
+
+ // Secondary source: S##-UAT-RESULT.md (when uat_dispatch enabled)
+ const uatResultFile = resolveSliceFile(base, mid, slice.id, "UAT-RESULT");
+ if (uatResultFile) {
+ const uatContent = await loadFile(uatResultFile);
+ if (uatContent) uatResults.push({ sliceId: slice.id, content: uatContent });
+ }
+ }
+
+  // Guard against a vacuous pass: no verification evidence at all is not a pass.
+  const allChecksPassed = checks.length > 0 && checks.every(c => c.verdict === "pass");
+ const hasUatFailures = uatResults.some(r => r.content.includes("❌") || r.content.includes("FAIL"));
+ const verdict = allChecksPassed && !hasUatFailures ? "pass" : "needs-attention";
+
+ return { verdict, checks, uatResults };
+}
+```
+
+**Token savings:** 1 session × 12–37K tokens. This session is one of the most context-heavy — it inlines the ROADMAP + all slice summaries + all UAT results.
+
+**Quality impact:** Positive. Mechanical verification is deterministic and complete. LLM validation is subjective and can miss things. The verification gate and UAT system already do the hard work — the validate session was a redundant re-check. The `T##-VERIFY.json` artifacts are the canonical machine-readable source, not task summary frontmatter.
+
+#### 6. Replace complete-milestone with mechanical completion
+
+**Current:** An LLM session re-reads the ROADMAP and all slice summaries to write a SUMMARY.md.
+
+**New:** The system produces a milestone summary mechanically by aggregating slice summaries. The summary includes: milestone title, success criteria with pass/fail status, slice completion dates, key decisions made, and patterns established (all extracted from structured frontmatter in slice summaries).
+
+**What changes:**
+- A new `generateMilestoneSummary()` function reads all slice SUMMARY.md files, extracts frontmatter fields, and produces a structured milestone SUMMARY.md.
+- The complete-milestone dispatch rule is replaced with a synchronous post-processing step after the validation artifact is written.
+- The complete-milestone template is retained for explicit dispatch.
+
+**What changes in `deriveState()`:**
+- The `validating-milestone` and `completing-milestone` phases still exist in state derivation.
+- When mechanical validation + completion runs synchronously in post-unit processing, these phases are transient — `deriveState()` emits them, but the mechanical processing writes the VALIDATION.md and SUMMARY.md artifacts before the next dispatch cycle, so the phases resolve immediately.
+- If mechanical processing fails, the phases remain as dispatch targets and the system falls back to dispatching LLM sessions for validation and/or completion.
+
+**Token savings:** 1 session × 12–37K tokens.
+
+**Quality impact:** Neutral. Milestone summaries are archival — they capture what happened, not make decisions. Mechanical aggregation of structured frontmatter is more reliable than an LLM re-interpreting task summaries.
+
+### Dispatch Table Changes
+
+**Current: 15 rules.**
+
+```
+1. rewrite-docs (override gate)
+2. summarizing → complete-slice
+3. run-uat (post-completion)
+4. reassess-roadmap (post-completion)
+5. needs-discussion → stop
+6. pre-planning (no context) → stop
+7. pre-planning (no research) → research-milestone
+8. pre-planning (has research) → plan-milestone
+9. planning (no research, not S01) → research-slice
+10. planning → plan-slice
+11. replanning-slice → replan-slice
+12. executing → execute-task (recovery)
+13. executing → execute-task
+14. validating-milestone → validate-milestone
+15. completing-milestone → complete-milestone
+```
+
+**New: 12 rules.**
+
+```
+1. rewrite-docs (override gate) [unchanged]
+2. summarizing → complete-slice [FALLBACK ONLY — fires when mechanical completion didn't run]
+3. run-uat (post-completion) [unchanged, preference-gated]
+4. needs-discussion → stop [unchanged]
+5. pre-planning (no context) → stop [unchanged]
+6. pre-planning → plan-milestone [rules 7+8 merged — research folded in]
+7. planning → plan-slice [rules 9+10 merged — research folded in]
+8. replanning-slice → replan-slice [unchanged]
+9. executing → execute-task (recovery) [unchanged]
+10. executing → execute-task [unchanged]
+11. validating-milestone → validate-milestone [FALLBACK ONLY — fires when mechanical validation didn't run]
+12. completing-milestone → complete-milestone [FALLBACK ONLY — fires when mechanical completion didn't run]
+```
+
+Note: Rules 2, 11, and 12 are retained as **fallbacks** for cases where mechanical processing fails. They do not fire in the normal path because post-unit processing writes the required artifacts before the next dispatch cycle. This means `deriveState()` is unchanged — it still emits `summarizing`, `validating-milestone`, and `completing-milestone` phases. The change is that these phases are normally resolved mechanically before dispatch evaluates them.
+
+**Removed rules (no longer in default path):**
+- `reassess-roadmap` — folded into next plan-slice (or opt-in preference)
+- `pre-planning (no research) → research-milestone` — merged into plan-milestone
+- `planning (no research, not S01) → research-slice` — merged into plan-slice
+
+### Prompt Changes
+
+#### plan-milestone.md — gains exploration instructions
+
+Add before the planning steps:
+
+```markdown
+## Explore First, Then Decompose
+
+You have full tool access. Before decomposing into slices:
+1. Explore the relevant codebase — read key files, understand existing patterns, identify constraints.
+2. For unfamiliar libraries, use `resolve_library` / `get_library_docs`.
+3. Skill Discovery ({{skillDiscoveryMode}}):{{skillDiscoveryInstructions}}
+
+Narrate key findings as you go. If findings are significant enough to benefit downstream slice planners, write {{researchOutputPath}} — but only if the content would genuinely help. Don't write a research doc just because the template exists.
+```
+
+#### plan-slice.md — gains exploration + reassessment preamble
+
+Add before the planning steps:
+
+```markdown
+## Verify Roadmap Assumptions
+
+Before planning this slice, check whether the roadmap's assumptions still hold:
+- Review prior slice summaries (inlined above). Did anything change that affects this slice?
+- If the remaining roadmap needs adjustment, modify the unchecked slices in {{roadmapPath}} before proceeding.
+
+## Explore Slice Scope
+
+Read the relevant code for this slice before decomposing:
+1. Check the files and modules this slice will touch.
+2. Verify the approach described in the roadmap against the actual codebase state.
+3. If the roadmap's description of this slice is wrong or outdated, adjust your plan accordingly.
+```
+
+### Context Inlining Changes
+
+#### Reduce inlining for planning sessions — provide paths for stable documents
+
+Planning sessions (plan-milestone, plan-slice) currently inline ROADMAP, DECISIONS, REQUIREMENTS, KNOWLEDGE, and PROJECT. Since these sessions now also explore the codebase (merged research), the total prompt size grows. To offset this, stable documents should be provided as file paths rather than inlined content for planning sessions.
+
+**Current pattern:**
+```typescript
+inlined.push(await inlineFile(roadmapPath, roadmapRel, "Milestone Roadmap"));
+```
+
+**New pattern for plan-milestone/plan-slice:**
+```typescript
+sourcePaths.push(`- Milestone Roadmap: \`${roadmapRel}\` — read this for the full slice decomposition`);
+```
+
+The prompt header changes from "All relevant context has been preloaded below" to "Source files are listed below. Read them before proceeding."
+
+**What stays inlined:**
+- **Task plan** in execute-task (it's the executor's authoritative contract — must be in prompt)
+- **Slice plan excerpt** in execute-task (goal/demo/verification — small and task-specific)
+- **Prior task summaries** in execute-task (carry-forward context — already budget-managed)
+- **Milestone context** in plan-milestone (it's the starting input — relatively small)
+
+**What moves to file-path references:**
+- ROADMAP in plan-slice, complete-slice, reassess, validate, complete-milestone
+- DECISIONS.md everywhere except execute-task (where it's already omitted for minimal inline level)
+- REQUIREMENTS.md everywhere except execute-task
+- KNOWLEDGE.md everywhere (already uses `inlineFileSmart` for execute-task)
+- PROJECT.md everywhere
+
+**Interaction with budget engine:** The current budget engine (`context-budget.ts`) truncates inlined content when it exceeds budget. Removing inlining means the LLM reads the full file via tool call. For most documents (ROADMAP ~3-10K chars, DECISIONS ~2-5K chars), the full read is within budget. For very large REQUIREMENTS.md files (>30K chars), the LLM may need to use the DB-scoped query (`inlineRequirementsFromDb` with slice scoping) or the compact formatter. The path reference should note: "For large files, use scoped queries."
+
+**Risk: LLMs might not read referenced files.**
+
+This is the most significant behavioral risk in this ADR. Inlined content forces processing. Path references require the LLM to decide to read. Mitigation:
+
+1. **Mandatory read directives.** The prompt says "You MUST read the following files before proceeding" with a numbered list of 2-3 critical files. Not "read as needed" — a direct instruction.
+2. **Verification.** The plan-slice prompt requires citing the ROADMAP's slice description in its output (slice title, risk level, depends). If these don't match, the planner didn't read it.
+3. **Phased rollout.** Phase 4 (context reduction) is separate from Phase 1 (research merge). This allows measuring whether path references degrade plan quality before full rollout.
+4. **Fallback.** If path references prove unreliable, restore inlining for critical documents only (ROADMAP in plan-slice). The budget engine still handles truncation.
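Mitigation 2 could be checked mechanically along these lines. The citation fields mirror the roadmap slice description named above (title, risk level, depends); the comparison helper itself is hypothetical:

```typescript
// Hypothetical check: did the slice planner actually read the ROADMAP?
// Compare the citation fields its output must include against the roadmap entry.
interface SliceCitation {
  title: string;
  risk: string;
  depends: string[];
}

function planCitesRoadmap(roadmap: SliceCitation, plan: SliceCitation): boolean {
  return (
    roadmap.title === plan.title &&
    roadmap.risk === plan.risk &&
    roadmap.depends.length === plan.depends.length &&
    roadmap.depends.every((dep, i) => dep === plan.depends[i])
  );
}
```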
+
+**Token savings (Phase 4 only):** Eliminates ~150K tokens of re-ingestion per milestone (revised from 208K — the execute-task sessions retain inlined content). The LLM reads files as needed via tool calls, cached by API prompt caching. Net savings are ~50-60% of the re-ingestion overhead, since the LLM still reads most files once per session.
+
+### Post-Unit Processing Changes
+
+#### Mechanical slice completion
+
+After the last execute-task's verification gate passes and post-unit processing detects all tasks done:
+
+```typescript
+async function mechanicalSliceCompletion(base: string, mid: string, sid: string): Promise<boolean> {
+ const tDir = resolveTasksDir(base, mid, sid);
+ if (!tDir) return false;
+
+ const summaryFiles = resolveTaskFiles(tDir, "SUMMARY").sort();
+ const taskSummaries = await Promise.all(
+ summaryFiles.map(async f => ({ file: f, summary: parseSummary(await loadFile(join(tDir, f)) ?? "") }))
+ );
+
+ // Aggregate structured frontmatter
+ const allProvides = taskSummaries.flatMap(t => t.summary.frontmatter.provides);
+ const allKeyFiles = taskSummaries.flatMap(t => t.summary.frontmatter.key_files);
+ const allDecisions = taskSummaries.flatMap(t => t.summary.frontmatter.key_decisions);
+ const allPatterns = taskSummaries.flatMap(t => t.summary.frontmatter.patterns_established);
+ const allAffects = taskSummaries.flatMap(t => t.summary.frontmatter.affects);
+
+ // Build slice SUMMARY.md from aggregated frontmatter
+ const sliceSummary = formatSliceSummary({ sid, provides: allProvides, keyFiles: allKeyFiles, ... });
+
+ // Build UAT.md from slice plan's Verification section
+  const slicePlanContent = await loadFile(resolveSliceFile(base, mid, sid, "PLAN"));
+  const verificationSection = extractMarkdownSection(slicePlanContent ?? "", "Verification");
+ const sliceUat = formatSliceUat(sid, verificationSection);
+
+  // Write the three completion artifacts (summary, UAT, roadmap checkbox)
+ writeFileSync(sliceSummaryPath, sliceSummary);
+ writeFileSync(sliceUatPath, sliceUat);
+ markSliceDoneInRoadmap(base, mid, sid);
+
+ // Handle REQUIREMENTS.md updates (currently done by complete-slice prompt step 5)
+ // Mechanical: mark requirements as Validated if all tasks covering them passed verification.
+ await mechanicalRequirementsUpdate(base, mid, sid, taskSummaries);
+
+ // Handle DECISIONS.md appendix (currently done by complete-slice prompt step 8)
+ // Mechanical: collect key_decisions from task summaries not already in DECISIONS.md
+ await appendNewDecisions(base, taskSummaries);
+
+ // Handle KNOWLEDGE.md appendix (currently done by complete-slice prompt step 9)
+ // Not mechanical — skip. Knowledge entries require judgment about what's genuinely useful.
+ // The executor tasks already write KNOWLEDGE.md entries during execution (step 13 in execute-task).
+
+ return true;
+}
+```
+
+**Fallback:** If `mechanicalSliceCompletion()` fails or produces output below a quality threshold (e.g., summary under 200 chars for a multi-task slice), the `summarizing` phase persists in `deriveState()` and the dispatch table's retained fallback rule dispatches a complete-slice LLM session.
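The quality-threshold trigger might look like this — the 200-char figure is the ADR's own example, and the function name is hypothetical:

```typescript
// Hypothetical fallback trigger: a multi-task slice whose mechanical summary
// came out suspiciously short gets a complete-slice LLM session instead.
const MIN_SUMMARY_CHARS = 200;

function needsLlmFallback(summary: string, taskCount: number): boolean {
  return taskCount > 1 && summary.trim().length < MIN_SUMMARY_CHARS;
}
```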
+
+#### Mechanical milestone validation
+
+See `aggregateMilestoneVerification()` above (Section 5). Reads `T##-VERIFY.json` and `S##-UAT-RESULT.md` as canonical sources.
+
+#### Mechanical milestone summary
+
+```typescript
+async function generateMilestoneSummary(base: string, mid: string): Promise<string> {
+ const roadmap = parseRoadmap(await loadFile(resolveMilestoneFile(base, mid, "ROADMAP")));
+ const sliceSummaries = [];
+
+ for (const slice of roadmap.slices) {
+ const content = await loadFile(resolveSliceFile(base, mid, slice.id, "SUMMARY"));
+ if (content) sliceSummaries.push({ id: slice.id, summary: parseSummary(content) });
+ }
+
+ // Aggregate frontmatter fields across all slice summaries
+ // Produce structured markdown from the aggregation
+ return formatMilestoneSummary(roadmap, sliceSummaries);
+}
+```
+
+## Consequences
+
+### Session Count Reduction
+
+Counts assume no fallback sessions fire (mechanical processing succeeds). "Current" uses quality profile. "New" is the simplified pipeline.
+
+| Milestone Shape | Current Sessions (quality) | New Sessions | Reduction |
+|----------------|---------------------------|--------------|-----------|
+| 1 slice, 2 tasks | 9 | 3 | 67% |
+| 2 slices, 3 tasks | 17 | 8 | 53% |
+| 4 slices, 3 tasks | 30 | 16 | 47% |
+| 6 slices, 4 tasks | 46 | 31 | 33% |
+
+**Derivation (4-slice, 3-task):**
+
+Current (quality): research-milestone(1) + plan-milestone(1) + [research-slice(0) + plan-slice(1) + execute(3) + complete-slice(1) + reassess(1)] for S01 + [research-slice(1) + plan-slice(1) + execute(3) + complete-slice(1) + reassess(1)] × 2 for S02-S03 + [research-slice(1) + plan-slice(1) + execute(3) + complete-slice(1) + reassess(0)] for S04 + validate(1) + complete-milestone(1) = 2 + 6 + 14 + 6 + 2 = 30.
+
+New: plan-milestone(1) + [execute(3)] for S01 + [plan-slice(1) + execute(3)] × 3 for S02-S04 = 1 + 3 + 12 = 16.
+
+### Token Savings
+
+Eliminated sessions are the primary savings mechanism. Context re-ingestion reduction is a secondary effect of having fewer sessions (each of the remaining sessions still ingests some context). These are NOT additive — the re-ingestion savings are already captured in the eliminated session savings.
+
+| Source | Per Milestone (4-slice, 3-task) |
+|--------|-------------------------------|
+| Eliminated research sessions (1 milestone + 3 slice) | 48–148K tokens |
+| Eliminated complete-slice sessions (4) | 48–148K tokens |
+| Eliminated reassess sessions (3) | 36–111K tokens |
+| Eliminated validate session (1) | 12–37K tokens |
+| Eliminated complete-milestone session (1) | 12–37K tokens |
+| **Total estimated savings** | **~156–481K tokens** |
+
+At current Opus pricing ($15/MTok input, $75/MTok output — as of March 2026), the input savings alone are **$2.34–$7.22 per milestone**. Output savings are harder to estimate but typically 30-50% of input.
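The arithmetic behind those dollar figures, input side only, at the rate stated above:

```typescript
// Input-token cost at the quoted Opus rate ($15 per million input tokens).
const OPUS_INPUT_USD_PER_MTOK = 15;

function inputCostUsd(tokens: number): number {
  return (tokens / 1_000_000) * OPUS_INPUT_USD_PER_MTOK;
}

// 156K tokens → $2.34; 481K tokens → ~$7.22
```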
+
+### Code Deletion
+
+| File / Section | Lines | Impact |
+|----------------|-------|--------|
+| `auto-dispatch.ts` — 3 removed default-path rules | ~40 | Simpler dispatch table |
+| `auto-prompts.ts` — 5 builders become fallback-only | ~250 | `buildResearchMilestonePrompt`, `buildResearchSlicePrompt`, `buildCompleteSlicePrompt`, `buildValidateMilestonePrompt`, `buildCompleteMilestonePrompt` move to explicit-dispatch codepath |
+| `auto-prompts.ts` — reduced inlining (Phase 4) | ~100 | Remove `inlineFile` calls for static docs in planning prompts, replace with path references |
+| Context re-ingestion helpers (Phase 4) | ~50 | `inlineDecisionsFromDb`, `inlineRequirementsFromDb`, `inlineProjectFromDb` simplified for planning paths |
+| **Total deletable** | **~440** | |
+
+### Code Added
+
+| File / Section | Lines | Impact |
+|----------------|-------|--------|
+| `auto-prompts.ts` — plan-milestone exploration | ~30 | Research instructions merged in |
+| `auto-prompts.ts` — plan-slice reassessment + exploration | ~25 | Reassessment + exploration preamble |
+| `auto-post-unit.ts` — `mechanicalSliceCompletion()` | ~80 | Structured frontmatter aggregation, UAT generation, artifact writes |
+| `auto-verification.ts` — `aggregateMilestoneVerification()` | ~60 | T##-VERIFY.json + UAT-RESULT aggregation |
+| `auto-unit-closeout.ts` — `generateMilestoneSummary()` | ~60 | Mechanical summary generation |
+| **Total added** | **~255** | |
+
+### Net Impact
+
+- **~185 lines net deleted** (440 deleted - 255 added)
+- **3 fewer default-path dispatch rules** (15 → 12, with 3 retained as fallbacks)
+- **6 fewer unit types in the default pipeline** (13 → 7 active; 6 retained for fallback/explicit dispatch)
+- **~156–481K fewer tokens per milestone**
+- **14 fewer session handoffs per 4-slice milestone under quality profile** (each a potential failure/timeout point)
+- `auto-prompts.ts` goes from ~1,099 lines to ~924 lines (~175 lines net reduction)
+
+### What Stays Unchanged
+
+- The **discuss** flow (guided-flow.ts, interactive discussion)
+- The **dispatch table architecture** (declarative rules, first-match-wins)
+- The **fresh session per unit** pattern (still used for plan-slice and execute-task)
+- The **state derivation** (`deriveState()` reads files, derives phase — all existing phases preserved)
+- The **verification gate** (runs tests/lint after each task)
+- The **worktree isolation** model
+- The **crash recovery**, **idempotency**, and **stuck detection** systems (fewer sessions means these fire less often, but the safety nets remain)
+- The **metrics** and **cost tracking** systems
+- The **parallel orchestrator** for independent milestones
+- All prompt templates are **retained** — for fallback, recovery, and explicit dispatch via `/gsd dispatch <unit>`
+
+### What Gets Simpler Downstream
+
+Less machinery is needed when sessions are fewer:
+
+- **Fewer recovery paths.** 14 fewer sessions means 14 fewer opportunities for timeouts, stuck states, and missing artifacts.
+- **Simpler `auto-post-unit.ts`.** Reassess dispatch logic removed (opt-in only). Mechanical completion/validation added but replaces more complex LLM-session dispatch.
+- **Simpler `auto-stuck-detection.ts`.** Fewer unit types means fewer dispatch-loop patterns to detect.
+- **Simpler `auto-idempotency.ts`.** Fewer completed-key types to track.
+
+These simplifications are downstream effects — they don't need to happen in the same change. But they represent ~500-1000 lines of code that becomes significantly simpler or unnecessary as a consequence of this ADR.
+
+## Risks
+
+### 1. Plan-milestone sessions become heavier
+
+Merging research into planning makes plan-milestone sessions longer. The planner must explore the codebase AND decompose into slices in a single session. Risk: the session hits context pressure before finishing.
+
+**Mitigation:** Plan-milestone is the session that benefits most from a large context window, and modern windows (200K+ tokens) easily accommodate exploration plus planning. The single-slice fast path in plan-milestone.md already combines planning with slice-plan and task-plan writing in one session — this extends that pattern. Phase 4 (reducing inlining for planning sessions) further offsets the added exploration work.
+
+**Phase ordering note:** Phase 1 (merge research into planning) adds exploration to plan-milestone. If Phase 4 (reduce inlining) hasn't landed yet, the plan-milestone prompt includes both exploration instructions AND the full inlined context. This is the most context-heavy state. To mitigate, Phase 1 should also reduce inlining for plan-milestone/plan-slice specifically — moving DECISIONS, REQUIREMENTS, and PROJECT to path references while keeping ROADMAP and CONTEXT inlined. This is a targeted subset of Phase 4, not a separate phase.
+
+### 2. Mechanical completion quality
+
+The mechanical slice completion aggregates structured frontmatter but cannot produce narrative context, forward intelligence sections, or nuanced UAT scenarios that the current LLM-driven complete-slice session produces.
+
+**Mitigation:**
+- For most slices (2-3 tasks, straightforward work), structured aggregation is sufficient. The frontmatter fields (provides, requires, affects, key_files, key_decisions, patterns_established) capture the essential information.
+- The quality threshold fallback dispatches a complete-slice LLM session for complex slices.
+- The LLM fallback is zero-cost to implement — the complete-slice template and dispatch rule are retained.
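+
+The threshold itself can be a handful of structural predicates. A minimal sketch — the interface, field names, and limits here are illustrative assumptions, not the real implementation:
+
+```typescript
+// Sketch of the quality-threshold gate that decides whether the
+// mechanical summary suffices, or whether to fall back to a
+// complete-slice LLM session. All names and limits are illustrative.
+interface MechanicalSummary {
+  body: string;           // aggregated narrative from task frontmatter
+  keyDecisions: string[]; // collected key_decisions entries
+  uatScenarios: string[]; // generated UAT scenario stubs
+}
+
+function passesQualityThreshold(s: MechanicalSummary, taskCount: number): boolean {
+  // Complex slices (4+ tasks) always go to the LLM fallback.
+  if (taskCount >= 4) return false;
+  // A too-short summary suggests the frontmatter was sparse.
+  if (s.body.length < 200) return false;
+  // Required artifacts must be present.
+  return s.keyDecisions.length > 0 && s.uatScenarios.length > 0;
+}
+```
+
+Because the check is pure and cheap, tuning it (the "quality threshold is tunable" point above) amounts to adjusting a few constants rather than changing dispatch behavior.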
+
+### 3. Loss of research artifacts
+
+RESEARCH.md files provided a useful paper trail for debugging plan quality. Without them, it's harder to understand why a planner made certain decisions.
+
+**Mitigation:**
+- The planner's narration (visible in the conversation transcript) captures exploration reasoning.
+- RESEARCH.md is optional, not eliminated. Planners can write one when exploration is complex.
+- The KNOWLEDGE.md file captures non-obvious patterns and decisions.
+- DECISIONS.md captures structural choices.
+
+### 4. Reassessment gaps
+
+Without mandatory reassessment, a slice might complete with findings that invalidate the remaining roadmap, and the next planner might not notice.
+
+**Mitigation:**
+- The plan-slice prompt includes a reassessment preamble that explicitly checks prior slice summaries.
+- The `blocker_discovered` flag in task summaries already triggers automatic replanning.
+- Users who want explicit reassessment can enable the `reassess_after_slice` preference.
+
+### 5. Mechanical completion doesn't cover all complete-slice responsibilities
+
+The current complete-slice prompt (steps 5, 8, 9) updates REQUIREMENTS.md, appends to DECISIONS.md, and appends to KNOWLEDGE.md. The mechanical completion handles REQUIREMENTS.md and DECISIONS.md but cannot produce KNOWLEDGE.md entries (which require judgment about what's genuinely useful).
+
+**Mitigation:**
+- Execute-task prompt step 13 already instructs executors to append to KNOWLEDGE.md during task execution. Most knowledge entries are discovered during implementation, not during completion.
+- DECISIONS.md appendix is handled mechanically by collecting `key_decisions` from task summaries and deduplicating against existing entries.
+- REQUIREMENTS.md updates are handled mechanically by cross-referencing task verification results against requirement-to-slice mappings.
+- For the LLM fallback path (complex slices), the complete-slice prompt retains all responsibilities.
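+
+The DECISIONS.md dedup step can be sketched as a pure function; the signature and entry format are illustrative assumptions:
+
+```typescript
+// Sketch of the mechanical DECISIONS.md append: collect key_decisions
+// from each task summary, drop anything already recorded, and return
+// only the fresh entries to append. Shapes are illustrative.
+function appendNewDecisions(
+  existingDecisions: string[],  // parsed entries from DECISIONS.md
+  taskKeyDecisions: string[][], // key_decisions per task summary
+): string[] {
+  const seen = new Set(existingDecisions.map((d) => d.trim().toLowerCase()));
+  const fresh: string[] = [];
+  for (const decisions of taskKeyDecisions) {
+    for (const d of decisions) {
+      const key = d.trim().toLowerCase();
+      if (!seen.has(key)) {
+        seen.add(key); // also dedupes across tasks in the same slice
+        fresh.push(d.trim());
+      }
+    }
+  }
+  return fresh;
+}
+```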
+
+### 6. Migration path
+
+Milestones in progress when this change deploys will have state files (RESEARCH.md, etc.) that the new pipeline doesn't produce. The dispatch table must gracefully handle both old-style and new-style state.
+
+**Mitigation:**
+- Dispatch rules check for file existence, not file absence. A milestone with an existing RESEARCH.md still works — the plan-milestone rule fires regardless of whether research exists.
+- The idempotency system already handles "completed research unit → dispatch plan" transitions.
+- All `deriveState()` phases are preserved — old-style state resolves correctly.
+- No migration needed. The new pipeline is strictly more permissive than the old one.
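+
+To make the first mitigation concrete, here is a sketch of a first-match-wins rule whose predicate is indifferent to legacy artifacts; the rule shape is illustrative, not the actual dispatch-table types:
+
+```typescript
+// Sketch: the pre-planning rule matches on derived phase only.
+// It never tests for the *absence* of RESEARCH.md, so a milestone
+// that already has one still dispatches plan-milestone.
+interface MilestoneState {
+  phase: "pre-planning" | "planning" | "executing" | "summarizing";
+  files: Set<string>; // state files present on disk
+}
+
+interface DispatchRule {
+  name: string;
+  matches: (s: MilestoneState) => boolean;
+  unit: string;
+}
+
+const rules: DispatchRule[] = [
+  {
+    name: "pre-planning → plan-milestone",
+    // Fires whether or not a legacy RESEARCH.md exists.
+    matches: (s) => s.phase === "pre-planning",
+    unit: "plan-milestone",
+  },
+];
+
+function dispatch(s: MilestoneState): string | undefined {
+  return rules.find((r) => r.matches(s))?.unit;
+}
+```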
+
+## Alternatives Considered
+
+### A. Keep research as a separate session, just make it optional
+
+Add a `skip_research` preference (already exists) and make it default to true. This is the minimal change — one boolean flip.
+
+**Rejected:** This saves sessions but doesn't address the context re-ingestion problem, the lossy handoff problem, or the ceremony session overhead. It's a preference toggle, not an architectural improvement.
+
+### B. Keep all unit types but share context via a persistent cache
+
+Instead of fresh sessions, maintain a shared context store that persists across units. Each unit reads from the store instead of re-inlining files.
+
+**Rejected:** This requires a fundamentally different session model — either a long-running session (which hits context limits) or a cache mechanism that the LLM can query (which doesn't exist in the Claude API). The fresh-session-per-unit model is correct; the problem is what we put in each session, not the session model itself.
+
+### C. Collapse everything into a single session per slice
+
+One session per slice: plan + execute all tasks + complete. Maximum context efficiency.
+
+**Rejected:** This hits real context limits for slices with 4+ tasks. Task execution is legitimately heavy — reading code, writing code, running tests, debugging failures. A single session for all of this would exhaust the context window. The plan-slice / execute-task boundary is a genuine engineering constraint, not ceremony.
+
+### D. Fold completion into the last executor's prompt instead of post-unit processing
+
+The original design had the last execute-task writing SUMMARY.md, UAT.md, and marking the slice done.
+
+**Rejected (per Codex audit):** This creates a verification-retry ordering problem. If the executor writes SUMMARY.md and marks the slice done in the ROADMAP before the verification gate runs, a gate failure retries against incorrect derived state. Post-gate mechanical processing avoids this by running only after verification succeeds.
+
+### E. Keep complete-slice as a separate session
+
+The mechanical summary quality might be insufficient for complex slices.
+
+**Addressed:** The mechanical approach with LLM fallback provides the best of both worlds. Simple slices get fast mechanical completion. Complex slices fall back to the existing LLM session. The quality threshold is tunable.
+
+## Action Items
+
+### Phase 1: Merge research into planning (+ targeted inlining reduction)
+1. Update `buildPlanMilestonePrompt()` — add exploration instructions, skill discovery, drop "Trust the research"
+2. Update `buildPlanSlicePrompt()` — add exploration instructions, reassessment preamble, drop "Trust the research"
+3. Remove dispatch rule "pre-planning (no research) → research-milestone" — merge with "pre-planning (has research) → plan-milestone" into single "pre-planning → plan-milestone"
+4. Remove dispatch rule "planning (no research, not S01) → research-slice"
+5. Update `plan-milestone.md` and `plan-slice.md` prompt templates
+6. Make `skip_research` and `skip_slice_research` preferences default to true (backwards compat)
+7. Retain research templates for explicit `/gsd dispatch research` use
+8. **Targeted inlining reduction for planning sessions:** Move DECISIONS, REQUIREMENTS, PROJECT to path references in plan-milestone and plan-slice prompts. Keep ROADMAP and CONTEXT inlined. This prevents context pressure from the added exploration work.
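+
+The targeted inlining split in item 8 can be sketched as follows; the file lists match the item above, but the builder shape and `read` callback are illustrative, not the real prompt-builder API:
+
+```typescript
+// Sketch of the inline/reference split for planning prompts:
+// hot documents stay inlined, stable documents become
+// mandatory-read path references.
+const INLINED = ["ROADMAP.md", "CONTEXT.md"];
+const PATH_REFERENCED = ["DECISIONS.md", "REQUIREMENTS.md", "PROJECT.md"];
+
+function buildContextSection(read: (f: string) => string): string {
+  const inlined = INLINED
+    .map((f) => `## ${f}\n\n${read(f)}`)
+    .join("\n\n");
+  const refs = PATH_REFERENCED
+    .map((f) => `- ${f}`)
+    .join("\n");
+  return `${inlined}\n\nYou MUST read these files before planning:\n${refs}`;
+}
+```
+
+The design point: the stable documents still reach the model, but at read-on-demand cost instead of unconditional token cost in every planning session.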
+
+### Phase 2: Mechanical slice completion
+9. Implement `mechanicalSliceCompletion()` in `auto-post-unit.ts`
+10. Wire into post-unit processing: detect all-tasks-done after verification gate passes, run mechanical completion
+11. Implement quality threshold check (summary length, artifact presence)
+12. Retain `summarizing → complete-slice` dispatch rule as fallback for mechanical failures
+13. Implement `mechanicalRequirementsUpdate()` and `appendNewDecisions()`
+
+### Phase 3: Mechanical milestone validation + completion
+14. Implement `aggregateMilestoneVerification()` reading `T##-VERIFY.json` and `S##-UAT-RESULT.md`
+15. Implement `generateMilestoneSummary()` from slice summary aggregation
+16. Wire into post-unit processing: after last slice completion, run mechanical validation + summary
+17. Make reassess-roadmap opt-in via `reassess_after_slice` preference (default: false)
+18. Retain `validating-milestone` and `completing-milestone` dispatch rules as fallbacks
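+
+The aggregation in items 14–16 can be sketched as a fold over the two evidence sources; the record shapes are assumptions about what `T##-VERIFY.json` and `S##-UAT-RESULT.md` contain, not their actual schemas:
+
+```typescript
+// Sketch of aggregateMilestoneVerification(): combine per-task
+// verification results with per-slice UAT outcomes into a single
+// milestone-level verdict. Record shapes are illustrative.
+interface TaskVerify { task: string; passed: boolean }
+interface UatResult { slice: string; passed: boolean }
+
+function aggregateMilestoneVerification(
+  verifies: TaskVerify[],
+  uats: UatResult[],
+): { passed: boolean; failures: string[] } {
+  const failures = [
+    ...verifies.filter((v) => !v.passed).map((v) => `task ${v.task}`),
+    ...uats.filter((u) => !u.passed).map((u) => `UAT ${u.slice}`),
+  ];
+  return { passed: failures.length === 0, failures };
+}
+```
+
+A non-empty `failures` list is what would trigger the retained `validating-milestone` fallback rule instead of mechanical completion.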
+
+### Phase 4: Full context re-ingestion reduction
+19. Replace remaining `inlineFile()` calls for stable documents with mandatory-read path references
+20. Update prompt headers with explicit "You MUST read" directives for critical files
+21. Add plan output verification (must cite ROADMAP slice description)
+22. Measure plan quality metrics before/after to validate the change
+
+### Phase 5: Downstream simplification (optional, deferred)
+23. Simplify `auto-post-unit.ts` — remove reassess dispatch logic (opt-in only)
+24. Simplify `auto-stuck-detection.ts` — fewer unit type patterns
+25. Simplify `auto-idempotency.ts` — fewer completed-key types
+26. Review `auto-recovery.ts` — simplify recovery paths for unit types that are now fallback-only
+27. Update auto-mode documentation (`docs/auto-mode.md`)
+
+## Audit Trail
+
+### Round 1 — Three-model review (March 18, 2026)
+
+**Claude Opus 4.6** identified 8 issues:
+1. ✅ Session count math inconsistent about S01 plan-slice skip — **fixed**: explicit derivation added with per-slice breakdown
+2. ✅ `discuss` session counted in pipeline but not in math — **fixed**: noted as interactive session, not auto-mode unit
+3. ✅ Token savings double-counting (eliminated sessions + re-ingestion) — **fixed**: removed overlap, noted savings are not additive
+4. ✅ Context inlining change (file paths vs inline) underanalyzed — **fixed**: expanded to dedicated risk section with enforcement strategy, phased rollout, and interaction with budget engine
+5. ✅ Budget engine interaction not discussed — **fixed**: addressed in context inlining section
+6. ✅ `aggregateMilestoneVerification()` reads wrong data source — **fixed**: now reads `T##-VERIFY.json` as primary source, supplemented by `S##-UAT-RESULT.md`
+7. ✅ Phase ordering creates heavy intermediate state (Phase 1 without Phase 4) — **fixed**: Phase 1 now includes targeted inlining reduction for planning sessions
+8. ✅ ADR number conflict — **fixed**: confirmed no ADR-003 exists in `docs/` (the referenced file doesn't exist in current git)
+
+**OpenAI Codex** identified 6 issues:
+1. ✅ HIGH: Folding completion into execute-task breaks verification-retry model — **fixed**: moved completion to post-gate mechanical processing instead of executor prompt. Added Alternative D explaining why.
+2. ✅ HIGH: Mechanical validation reads nonexistent `verification_evidence` frontmatter — **fixed**: now reads `T##-VERIFY.json` (canonical machine-readable source from `verification-evidence.ts`)
+3. ✅ HIGH: Replacement validation drops UAT evidence — **fixed**: aggregator now reads both `T##-VERIFY.json` and `S##-UAT-RESULT.md`
+4. ✅ HIGH: "State derivation stays unchanged" is false — **fixed**: explicitly documented that `deriveState()` phases are preserved, mechanical processing resolves them synchronously, fallback dispatch rules handle failures
+5. ✅ MEDIUM: Folded completion omits REQUIREMENTS.md and KNOWLEDGE.md updates — **fixed**: mechanical completion handles REQUIREMENTS.md and DECISIONS.md; KNOWLEDGE.md addressed in Risk 5
+6. ✅ MEDIUM: Session and token math inconsistent — **fixed**: complete rederivation with per-slice breakdown, corrected to 30 baseline sessions, noted profile variations
+
+**Gemini 2.5 Pro** audit was not usable — it hallucinated the ADR as a CI/CD pipeline document about GitHub Actions, matrix builds, and nx workspace tooling. No findings were applicable to the actual content.
diff --git a/docs/FILE-SYSTEM-MAP.md b/docs/FILE-SYSTEM-MAP.md
new file mode 100644
index 000000000..cfaa65fae
--- /dev/null
+++ b/docs/FILE-SYSTEM-MAP.md
@@ -0,0 +1,1020 @@
+# GSD2 File System Map
+
+Maps every source file to its system/subsystem labels.
+
+---
+
+## System Labels Reference
+
+| Label | Description |
+|-------|-------------|
+| **Agent Core** | Core agent loop, session lifecycle, SDK factory |
+| **AI Providers** | LLM provider implementations (Anthropic, OpenAI, Google, etc.) |
+| **API Routes** | Next.js API route handlers (web server) |
+| **AST** | Abstract Syntax Tree search/rewrite via tree-sitter + ast-grep |
+| **Async Jobs** | Background bash job management |
+| **Auth / OAuth** | Authentication, OAuth flows, token storage |
+| **Auto Engine** | GSD autonomous execution loop, dispatch, supervision |
+| **Bg Shell** | Background process / interactive shell management |
+| **Browser Tools** | Playwright-based browser automation extension |
+| **Build System** | Scripts for build, packaging, version management, CI |
+| **CLI** | Command-line entry points and argument parsing |
+| **CMux** | Tmux/multiplexer session integration |
+| **Commands** | GSD slash/sub-command routing and handlers |
+| **Compaction** | Context token reduction and summarization |
+| **Config** | Paths, defaults, models, preferences, constants |
+| **Context7** | Library documentation fetching extension |
+| **Doctor / Diagnostics** | Health checks, forensics, skill health |
+| **Event System** | Event bus, publication/subscription |
+| **Extension Registry** | Extension discovery, manifests, enable/disable |
+| **Extensions** | Extension loader, runner, project trust, hooks |
+| **File Search** | grep, glob, fd — file and content discovery |
+| **GSD Workflow** | Core GSD planning/execution workflow engine |
+| **Google Search** | Web search via Google API |
+| **Headless Mode** | Non-interactive / scripted command execution |
+| **Image Processing** | Image decode, resize, encode, clipboard images |
+| **Integration Tests** | Smoke, fixture, live, regression test suites |
+| **Loader / Bootstrap** | Startup initialization, extension sync, tool bootstrap |
+| **LSP** | Language Server Protocol client and multiplexer |
+| **Mac Tools** | macOS-native utilities (Swift CLI) |
+| **MCP Server/Client** | Model Context Protocol server and client |
+| **Memory Extension** | In-session memory pipeline and storage |
+| **Migration** | Data and config migration tools |
+| **Modes** | Interactive TUI, Print, RPC, and Web modes |
+| **Model System** | Model discovery, resolution, routing, registry |
+| **Native / Rust Tools** | N-API Rust engine modules |
+| **Node.js Bindings** | TypeScript wrappers around Rust N-API modules |
+| **Onboarding** | First-run wizard and setup flows |
+| **Permissions** | Permission management for tools and trust |
+| **Remote Questions** | Remote prompting via Slack, Discord, Telegram |
+| **Search the Web** | Brave/Jina/Tavily-based web search extension |
+| **Session Management** | Session file I/O, branches, fork trees |
+| **Skills** | Skill tool registration, health, telemetry |
+| **Slash Commands** | Command boilerplate generators extension |
+| **State Machine** | State, history, persistence, reactive graph |
+| **Studio App** | Electron desktop app (renderer, main, preload) |
+| **Subagent** | Parallel/serial subagent delegation |
+| **Syntax Highlighting** | Syntect-backed ANSI code coloring |
+| **Text Processing** | Diff, truncation, HTML→MD, ANSI, JSON parse |
+| **Tool System** | Tool implementations (bash, edit, read, write, grep…) |
+| **TTSR** | Time-Traveling Stream Rules regex guardrails |
+| **TUI Components** | Terminal UI component library (pi-tui) |
+| **Universal Config** | Multi-tool configuration file discovery |
+| **Voice** | Voice input extension (Swift/Python) |
+| **VS Code Extension** | VS Code sidebar, chat participant, RPC client |
+| **Web Mode** | Web server service layer and RPC bridge |
+| **Web UI** | Next.js frontend components, pages, hooks |
+| **Worktree** | Git worktree lifecycle, sync, name generation |
+
+---
+
+## src/ — Core Application Files
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| src/app-paths.ts | Config | App directory paths (GSD_HOME, sessions, web PID, prefs) |
+| src/app-paths.js | Config | Compiled JS version |
+| src/bundled-extension-paths.ts | Extension Registry | Serializes/parses bundled extension directory paths |
+| src/bundled-resource-path.ts | Loader/Bootstrap, Extension Registry | Resolves bundled raw resource files from package root |
+| src/cli.ts | CLI | Main CLI entry point — arg parsing, mode detection, plugin init |
+| src/cli-web-branch.ts | CLI, Web Mode | Web CLI branch; session dir resolution, legacy migration |
+| src/extension-discovery.ts | Extension Registry | Discovers extension entry points from FS and package.json |
+| src/extension-registry.ts | Extension Registry | Extension manifests, registry persistence, enable/disable |
+| src/headless-answers.ts | Headless Mode | Pre-supply answers to extension UI requests in headless |
+| src/headless-context.ts | Headless Mode | Context loading from stdin/files; project bootstrapping |
+| src/headless-events.ts | Headless Mode | Event classification, terminal detection, idle timeouts |
+| src/headless-query.ts | Headless Mode, CLI | Read-only snapshot query (state, dispatch preview, costs) |
+| src/headless-ui.ts | Headless Mode | Extension UI auto-response, progress formatting |
+| src/headless.ts | Headless Mode | Orchestrator for /gsd subcommands without TUI via RPC |
+| src/help-text.ts | CLI | Generates help text for all subcommands |
+| src/loader.ts | Loader/Bootstrap | Fast-path startup, extension discovery/validation, env setup |
+| src/logo.ts | CLI | ASCII logo rendering for welcome screen and loader |
+| src/mcp-server.ts | MCP Server/Client | Native MCP server over stdin/stdout for external AI clients |
+| src/models-resolver.ts | Config, Auth/OAuth | Resolves models.json with fallback from Pi to GSD |
+| src/onboarding.ts | Onboarding | First-run wizard — LLM auth, OAuth, API keys, tool setup |
+| src/pi-migration.ts | Config, Auth/OAuth | Migrates provider credentials from Pi auth.json to GSD |
+| src/project-sessions.ts | State Machine, CLI | Session-per-project directory paths from project CWD |
+| src/remote-questions-config.ts | Config, Onboarding | Saves remote questions (Discord, Slack, Telegram) config |
+| src/resource-loader.ts | Loader/Bootstrap, Extension Registry | Initializes, syncs, validates bundled resources |
+| src/startup-timings.ts | CLI, Build System | Optional startup timing instrumentation |
+| src/tool-bootstrap.ts | Loader/Bootstrap | Manages fd/rg availability, falls back to built-in |
+| src/update-check.ts | CLI | Checks npm registry for new versions (cached) |
+| src/update-cmd.ts | CLI | Executes npm install to update gsd-pi package |
+| src/web-mode.ts | Web Mode | Launches/manages web server process (PID tracking, browser) |
+| src/welcome-screen.ts | CLI | Welcome panel — logo, version, model info |
+| src/wizard.ts | Onboarding, Config | Loads env keys from auth.json → hydrates process.env |
+| src/worktree-cli.ts | Worktree, CLI | Worktree lifecycle: create, list, merge, clean, remove |
+| src/worktree-name-gen.ts | Worktree | Generates random worktree names (adjective-verbing-noun) |
+
+### src/web/ — Web Service Layer
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| src/web/auto-dashboard-service.ts | Web Mode, Auto Engine | Loads auto-mode dashboard state (active, paused, costs) |
+| src/web/bridge-service.ts | Web Mode, State Machine | Central hub spawning RPC sessions, managing session state |
+| src/web/captures-service.ts | Web Mode | Loads knowledge capture entries via child process bridge |
+| src/web/cleanup-service.ts | Web Mode | Collects GSD branches and snapshot refs for cleanup |
+| src/web/cli-entry.ts | Web Mode, CLI | Builds/resolves GSD CLI entry points for RPC/interactive |
+| src/web/doctor-service.ts | Web Mode, Doctor/Diagnostics | Runs diagnostics, returns fixer operations |
+| src/web/export-service.ts | Web Mode | Generates exported project reports (markdown/JSON) |
+| src/web/forensics-service.ts | Web Mode, Doctor/Diagnostics | Loads forensic report data (traces, metrics, issues) |
+| src/web/git-summary-service.ts | Web Mode | Provides git branch, commit history, diff summary |
+| src/web/history-service.ts | Web Mode | Loads metrics ledger, aggregates history views |
+| src/web/hooks-service.ts | Web Mode | Manages git hook registration and shell integration |
+| src/web/inspect-service.ts | Web Mode | Detailed inspection of project state and traces |
+| src/web/knowledge-service.ts | Web Mode | Reads and parses KNOWLEDGE.md |
+| src/web/onboarding-service.ts | Web Mode, Onboarding, Auth/OAuth | Manages onboarding state, auth refresh, lock reasons |
+| src/web/project-discovery-service.ts | Web Mode | Discovers and catalogs projects in filesystem |
+| src/web/recovery-diagnostics-service.ts | Web Mode | Recovery suggestions for error states/blockers |
+| src/web/settings-service.ts | Web Mode, Config | Loads preferences, routing config, budget, totals |
+| src/web/skill-health-service.ts | Web Mode, Doctor/Diagnostics | Loads skill health report with capability assessments |
+| src/web/undo-service.ts | Web Mode | Manages undo/snapshot and restoration |
+| src/web/update-service.ts | Web Mode | Checks for and executes application updates |
+| src/web/visualizer-service.ts | Web Mode | Generates visual representations of project state |
+| src/web/web-auth-storage.ts | Web Mode, Auth/OAuth | OAuth and API key credential storage for web mode |
+
+---
+
+## packages/pi-agent-core/src/ — Agent Core
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| agent-loop.ts | Agent Core, State Machine | Core agent execution loop — tool calls and LLM interactions |
+| agent.ts | Agent Core | Main Agent class wrapping loop with state management |
+| proxy.ts | Agent Core | Proxy wrapper for agent functionality |
+| types.ts | Agent Core | Type definitions for agent config, context, events |
+| index.ts | Agent Core | Package exports |
+
+---
+
+## packages/pi-ai/src/ — AI Providers
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| index.ts | AI Providers | Main export hub for providers and streaming |
+| api-registry.ts | AI Providers | Registry for managing multiple AI provider implementations |
+| models.ts | AI Providers | Model definitions and metadata |
+| models.generated.ts | AI Providers | Auto-generated model list from provider registries |
+| stream.ts | AI Providers | Main streaming interface dispatching to registered providers |
+| types.ts | AI Providers | Core types for models, APIs, streaming options |
+| env-api-keys.ts | AI Providers, Auth/OAuth | Environment variable API key resolution |
+| web-runtime-env-api-keys.ts | AI Providers, Auth/OAuth | Web runtime API key handling |
+| web-runtime-oauth.ts | AI Providers, Auth/OAuth | Web runtime OAuth token management |
+| providers/register-builtins.ts | AI Providers | Registration of built-in provider implementations |
+| providers/anthropic.ts | AI Providers | Anthropic API provider |
+| providers/anthropic-shared.ts | AI Providers | Shared utilities for Anthropic provider variants |
+| providers/anthropic-vertex.ts | AI Providers | Google Vertex AI Anthropic models |
+| providers/amazon-bedrock.ts | AI Providers | AWS Bedrock LLM provider |
+| providers/bedrock-provider.ts | AI Providers | Bedrock-specific streaming logic |
+| providers/google.ts | AI Providers | Google Generative AI provider |
+| providers/google-gemini-cli.ts | AI Providers | Google Gemini CLI authentication provider |
+| providers/google-shared.ts | AI Providers | Shared Google provider utilities |
+| providers/google-vertex.ts | AI Providers | Google Vertex AI provider |
+| providers/mistral.ts | AI Providers | Mistral AI provider |
+| providers/openai-completions.ts | AI Providers | OpenAI legacy completions API |
+| providers/openai-responses.ts | AI Providers | OpenAI responses (chat) API |
+| providers/openai-responses-shared.ts | AI Providers | Shared OpenAI responses utilities |
+| providers/openai-shared.ts | AI Providers | Shared OpenAI utilities |
+| providers/openai-codex-responses.ts | AI Providers | OpenAI Codex-specific response handling |
+| providers/azure-openai-responses.ts | AI Providers | Azure OpenAI responses provider |
+| providers/github-copilot-headers.ts | AI Providers | GitHub Copilot custom header construction |
+| providers/simple-options.ts | AI Providers | Common options builder for simple streaming |
+| providers/transform-messages.ts | AI Providers | Message transformation for provider compatibility |
+| utils/oauth/index.ts | Auth/OAuth | OAuth utilities export hub |
+| utils/oauth/types.ts | Auth/OAuth | OAuth credential and prompt types |
+| utils/oauth/pkce.ts | Auth/OAuth | PKCE flow implementation |
+| utils/oauth/github-copilot.ts | Auth/OAuth | GitHub Copilot OAuth flow |
+| utils/oauth/google-oauth-utils.ts | Auth/OAuth | Shared Google OAuth utilities |
+| utils/oauth/google-gemini-cli.ts | Auth/OAuth | Google Gemini CLI OAuth flow |
+| utils/oauth/google-antigravity.ts | Auth/OAuth | Google Antigravity OAuth implementation |
+| utils/oauth/openai-codex.ts | Auth/OAuth | OpenAI Codex OAuth flow |
+| utils/oauth/anthropic.ts | Auth/OAuth | Anthropic OAuth flow |
+| utils/event-stream.ts | AI Providers | Event stream parsing and handling |
+| utils/hash.ts | AI Providers | Hashing utilities |
+| utils/json-parse.ts | AI Providers | Resilient JSON parsing with recovery |
+| utils/overflow.ts | AI Providers | Token/context overflow detection |
+| utils/sanitize-unicode.ts | AI Providers | Unicode sanitization for API compatibility |
+| utils/validation.ts | AI Providers | Request/response validation schemas |
+| utils/typebox-helpers.ts | AI Providers | TypeBox schema helpers |
+
+---
+
+## packages/pi-tui/src/ — TUI Components
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| index.ts | TUI Components | Main TUI export hub |
+| tui.ts | TUI Components | Core TUI renderer and component system |
+| terminal.ts | TUI Components | Low-level terminal I/O and rendering |
+| keys.ts | TUI Components | Keyboard key parsing and matching |
+| keybindings.ts | TUI Components | Keybinding configuration and management |
+| stdin-buffer.ts | TUI Components | Buffered stdin for batch key processing |
+| editor-component.ts | TUI Components | Interface for custom editor implementations |
+| autocomplete.ts | TUI Components | Autocomplete suggestion provider system |
+| fuzzy.ts | TUI Components | Fuzzy matching algorithm |
+| terminal-image.ts | TUI Components | Terminal image protocol (Kitty, iTerm2) |
+| kill-ring.ts | TUI Components | Emacs-style kill ring buffer |
+| undo-stack.ts | TUI Components | Undo/redo stack for editor operations |
+| overlay-layout.ts | TUI Components | Overlay/modal dialog layout system |
+| utils.ts | TUI Components | Text width calculation, ANSI utilities |
+| components/box.ts | TUI Components | Box drawing with borders and styling |
+| components/text.ts | TUI Components | Simple text display component |
+| components/truncated-text.ts | TUI Components | Text with automatic truncation |
+| components/spacer.ts | TUI Components | Vertical/horizontal spacing |
+| components/input.ts | TUI Components | Single-line text input with history |
+| components/loader.ts | TUI Components | Animated loading spinner |
+| components/cancellable-loader.ts | TUI Components | Loading spinner with cancel |
+| components/image.ts | TUI Components | Image display with theme support |
+| components/select-list.ts | TUI Components | List selection UI with keyboard nav |
+| components/settings-list.ts | TUI Components | Settings/preferences list display |
+| components/editor.ts | TUI Components | Full multi-line editor with syntax awareness |
+| components/markdown.ts | TUI Components | Markdown rendering to terminal |
+
+---
+
+## packages/pi-coding-agent/src/ — Coding Agent
+
+### CLI
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| cli.ts | CLI | Main CLI entry point and argument routing |
+| main.ts | CLI | CLI main entry with mode routing |
+| cli/args.ts | CLI | CLI argument definition and parsing |
+| cli/config-selector.ts | CLI | Interactive configuration selection |
+| cli/file-processor.ts | CLI | File input processing for agent context |
+| cli/list-models.ts | CLI, Model System | Model listing and discovery UI |
+| cli/session-picker.ts | CLI | Session selection interface |
+
+### Core — Session & State
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| core/agent-session.ts | Agent Core, State Machine | Core session abstraction, agent lifecycle, persistence |
+| core/session-manager.ts | Session Management | Session file I/O, branch/fork tree management |
+| core/event-bus.ts | Agent Core, Event System | Event publication and subscription |
+| core/messages.ts | State Machine | Message type definitions and constructors |
+| core/settings-manager.ts | Session Management, Config | Session-level settings persistence |
+
+### Core — Tool System
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| core/tools/index.ts | Tool System | Tool registry and factory exports |
+| core/tools/bash.ts | Tool System | Bash/shell command execution tool |
+| core/tools/bash-interceptor.ts | Tool System | Bash command interception and filtering |
+| core/tools/edit.ts | Tool System | File editing tool with line ranges |
+| core/tools/edit-diff.ts | Tool System | Edit tool with diff-based operations |
+| core/tools/read.ts | Tool System | File reading tool |
+| core/tools/write.ts | Tool System | File writing tool |
+| core/tools/find.ts | Tool System, File Search | File discovery tool |
+| core/tools/grep.ts | Tool System, File Search | Pattern search tool |
+| core/tools/ls.ts | Tool System | Directory listing tool |
+| core/tools/truncate.ts | Tool System, Text Processing | Output truncation utility |
+| core/tools/hashline.ts | Tool System | Hash-based line identification |
+| core/tools/hashline-read.ts | Tool System | File reading with hash-based line ranges |
+| core/tools/hashline-edit.ts | Tool System | File editing with hash-based line identification |
+| core/tools/path-utils.ts | Tool System | Path normalization and validation |
+| core/bash-executor.ts | Tool System | High-level bash execution with event handling |
+| core/exec.ts | Tool System | Utility functions for command execution |
+
+### Core — Model Management
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| core/model-registry.ts | Model System | Model metadata and capability registry |
+| core/model-discovery.ts | Model System | Model discovery from external sources |
+| core/model-resolver.ts | Model System | Model selection and resolution logic |
+| core/models-json-writer.ts | Model System | Model metadata serialization |
+
+### Core — AI & Context
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| core/prompt-templates.ts | Agent Core | Template system for prompt construction |
+| core/system-prompt.ts | Agent Core | System prompt building and management |
+| core/retry-handler.ts | AI Providers | Retry logic with exponential backoff |
+| core/fallback-resolver.ts | Model System | Model fallback resolution on API failures |
+| core/slash-commands.ts | Commands | Built-in slash command definitions and handlers |
+
+### Core — Extensions & Skills
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| core/extensions/index.ts | Extensions | Extension system exports |
+| core/extensions/types.ts | Extensions | Extension event and context types |
+| core/extensions/loader.ts | Extensions | Extension discovery and loading |
+| core/extensions/runner.ts | Extensions, Event System | Extension event dispatch and execution |
+| core/extensions/wrapper.ts | Extensions, Tool System | Tool wrapping for extension monitoring |
+| core/extensions/project-trust.ts | Extensions, Permissions | Project trust management for local extensions |
+| core/skills.ts | Skills, Tool System | Skill tool registration and management |
+
+### Core — Compaction
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| core/compaction-orchestrator.ts | Compaction | Orchestrates session compaction decisions |
+| core/compaction/compaction.ts | Compaction | Context token reduction via summarization |
+| core/compaction/branch-summarization.ts | Compaction | Branch history summarization for context limits |
+| core/compaction/utils.ts | Compaction | Compaction utilities |
+
+### Core — Configuration & Auth
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| config.ts | Config | Directory paths and version management |
+| core/sdk.ts | Agent Core | Main SDK factory for creating agent sessions |
+| core/resolve-config-value.ts | Config | Config value resolution from environment/files |
+| core/resource-loader.ts | Config, Loader/Bootstrap | Extensible resource loading (tools, extensions, themes) |
+| core/defaults.ts | Config | Default configuration values |
+| core/constants.ts | Config | Global constants |
+| core/auth-storage.ts | Auth/OAuth, Permissions | OAuth token storage and management |
+| migrations.ts | Config, Migration | Configuration migration and deprecation handling |
+
+### Core — Artifacts & Export
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| core/artifact-manager.ts | Agent Core | Artifact file management and metadata |
+| core/blob-store.ts | Agent Core | Binary data storage for images and attachments |
+| core/export-html/index.ts | Web Mode | Session export to HTML |
+| core/export-html/ansi-to-html.ts | Web Mode | ANSI code to HTML conversion |
+| core/export-html/tool-renderer.ts | Web Mode | HTML rendering for tool calls/results |
+
+### Core — LSP
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| core/lsp/index.ts | LSP | LSP integration exports |
+| core/lsp/client.ts | LSP | LSP client implementation |
+| core/lsp/lspmux.ts | LSP | LSP server multiplexing |
+| core/lsp/config.ts | LSP | LSP server configuration |
+| core/lsp/edits.ts | LSP | LSP-based code editing operations |
+| core/lsp/helpers.ts | LSP | LSP utility functions |
+| core/lsp/types.ts | LSP | LSP type definitions |
+| core/lsp/utils.ts | LSP | LSP utilities |
+
+### Core — Utilities
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| core/fs-utils.ts | Tool System | File system utilities (atomic writes, temp files) |
+| core/lock-utils.ts | Tool System | File locking for concurrent access |
+| core/timings.ts | Build System | Performance timing measurement |
+| core/diagnostics.ts | Doctor/Diagnostics | Diagnostic information collection |
+| core/discovery-cache.ts | Model System | Model discovery result caching |
+| core/keybindings.ts | TUI Components | Keybinding definitions |
+| core/footer-data-provider.ts | TUI Components | Footer information provider |
+| core/index.ts | Agent Core | Core module exports |
+| index.ts | Agent Core | Package exports |
+| utils/clipboard.ts | Tool System | Clipboard read/write |
+| utils/clipboard-native.ts | Tool System | Native clipboard implementation |
+| utils/clipboard-image.ts | Tool System | Clipboard image support |
+| utils/error.ts | Agent Core | Error message extraction/formatting |
+| utils/frontmatter.ts | Config | YAML frontmatter parsing |
+| utils/git.ts | Tool System | Git information and utilities |
+| utils/image-convert.ts | Image Processing | Image format conversion |
+| utils/image-resize.ts | Image Processing | Image resizing and optimization |
+| utils/mime.ts | Tool System | MIME type detection |
+| utils/path-display.ts | TUI Components | Path formatting for display |
+| utils/photon.ts | Agent Core | Photon scripting runtime support |
+| utils/shell.ts | Tool System | Shell detection and execution |
+| utils/changelog.ts | CLI | Changelog parsing |
+| utils/sleep.ts | Agent Core | Async sleep/delay utility |
+| utils/tools-manager.ts | Tool System | Tool discovery and management |
+| package-manager.ts | Build System | npm/yarn/pnpm/bun abstraction |
+
+### Modes
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| modes/index.ts | Modes | Mode system exports |
+| modes/print-mode.ts | Modes | Non-interactive print mode |
+| modes/rpc/rpc-mode.ts | Modes, MCP Server/Client | RPC server mode for remote access |
+| modes/rpc/rpc-client.ts | Modes, MCP Server/Client | RPC client for remote agent interaction |
+| modes/rpc/rpc-types.ts | Modes, MCP Server/Client | RPC protocol type definitions |
+| modes/rpc/jsonl.ts | Modes | JSONL serialization for RPC |
+| modes/rpc/remote-terminal.ts | Modes | Remote terminal output handling |
+| modes/shared/command-context-actions.ts | Modes, Commands | Shared command context utilities |
+| modes/interactive/interactive-mode.ts | Modes, TUI Components | Main interactive TUI mode orchestration |
+| modes/interactive/interactive-mode-state.ts | Modes, TUI Components, State Machine | Interactive mode state management |
+| modes/interactive/slash-command-handlers.ts | Modes, Commands | Interactive mode slash command handlers |
+| modes/interactive/theme/theme.ts | TUI Components | Theme system and hot reloading |
+| modes/interactive/theme/themes.ts | TUI Components | Built-in theme definitions |
+| modes/interactive/utils/shorten-path.ts | TUI Components | Path shortening for display |
+| modes/interactive/controllers/chat-controller.ts | Modes, TUI Components | Chat input and message submission |
+| modes/interactive/controllers/input-controller.ts | Modes, TUI Components | Input handling and routing |
+| modes/interactive/controllers/model-controller.ts | Modes, TUI Components, Model System | Model/provider/thinking configuration |
+| modes/interactive/controllers/extension-ui-controller.ts | Modes, TUI Components, Extensions | Extension UI event handling |
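+The `modes/rpc/jsonl.ts` entry above implies a line-delimited JSON transport for RPC mode. As a minimal, hypothetical sketch of JSONL framing (the actual message shapes live in `modes/rpc/rpc-types.ts` and may differ):
+
+```typescript
+// Hypothetical JSONL framing helpers; illustrative only — the real
+// implementation in modes/rpc/jsonl.ts is not reproduced here.
+export function encodeJsonl(messages: object[]): string {
+  // One JSON document per line, newline-terminated, so a reader can
+  // split on "\n" without buffering partial documents.
+  return messages.map((m) => JSON.stringify(m)).join("\n") + "\n";
+}
+
+export function decodeJsonl(chunk: string): object[] {
+  return chunk
+    .split("\n")
+    .filter((line) => line.trim().length > 0)
+    .map((line) => JSON.parse(line));
+}
+```
+
+The newline terminator is what makes the format streamable: a client can parse each complete line as it arrives rather than waiting for the full response.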
+
+### Modes — Interactive Components
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| components/index.ts | TUI Components | Interactive mode component exports |
+| components/armin.ts | TUI Components | Assistant message rendering |
+| components/assistant-message.ts | TUI Components | Assistant message display |
+| components/user-message.ts | TUI Components | User message display |
+| components/user-message-selector.ts | TUI Components | User message editing selector |
+| components/bash-execution.ts | TUI Components, Tool System | Bash execution result display |
+| components/tool-execution.ts | TUI Components, Tool System | Tool call and result display |
+| components/custom-message.ts | TUI Components | Custom message type display |
+| components/custom-editor.ts | TUI Components | Custom editor integration |
+| components/skill-invocation-message.ts | TUI Components, Skills | Skill invocation display |
+| components/branch-summary-message.ts | TUI Components, Compaction | Branch summary display |
+| components/compaction-summary-message.ts | TUI Components, Compaction | Compaction summary display |
+| components/diff.ts | TUI Components, Text Processing | Diff display component |
+| components/tree-render-utils.ts | TUI Components, Session Management | Session tree rendering utilities |
+| components/tree-selector.ts | TUI Components, Session Management | Session tree navigation UI |
+| components/session-selector.ts | TUI Components, Session Management | Session selection UI |
+| components/session-selector-search.ts | TUI Components, Session Management | Session search UI |
+| components/model-selector.ts | TUI Components, Model System | Model selection UI |
+| components/scoped-models-selector.ts | TUI Components, Model System | Scoped model selection |
+| components/thinking-selector.ts | TUI Components, Model System | Thinking level selection |
+| components/provider-manager.ts | TUI Components, AI Providers | Provider configuration UI |
+| components/oauth-selector.ts | TUI Components, Auth/OAuth | OAuth provider selection/login |
+| components/login-dialog.ts | TUI Components, Auth/OAuth | OAuth login dialog |
+| components/theme-selector.ts | TUI Components | Theme selection UI |
+| components/config-selector.ts | TUI Components, Config | Configuration selection UI |
+| components/extension-selector.ts | TUI Components, Extensions | Extension selection UI |
+| components/extension-editor.ts | TUI Components, Extensions | Extension code editor |
+| components/extension-input.ts | TUI Components, Extensions | Extension input handling |
+| components/settings-selector.ts | TUI Components, Config | Settings/preferences UI |
+| components/show-images-selector.ts | TUI Components, Config | Image display toggle |
+| components/bordered-loader.ts | TUI Components | Loading spinner with border |
+| components/countdown-timer.ts | TUI Components | Countdown timer display |
+| components/dynamic-border.ts | TUI Components | Dynamic border drawing |
+| components/keybinding-hints.ts | TUI Components | Keybinding help display |
+| components/footer.ts | TUI Components | Footer information display |
+| components/daxnuts.ts | TUI Components | Special rendering effect |
+| components/visual-truncate.ts | TUI Components | Visual text truncation |
+
+### Resources — Memory Extension
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| resources/extensions/memory/index.ts | Memory Extension | Memory extension index and setup |
+| resources/extensions/memory/pipeline.ts | Memory Extension | Memory processing pipeline |
+| resources/extensions/memory/storage.ts | Memory Extension | Memory persistence storage |
+
+---
+
+## src/resources/extensions/ — Extension Subsystems
+
+### GSD Extension (Core Workflow Engine)
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| gsd/index.ts | GSD Workflow | Main GSD extension bootstrap and registration |
+| gsd/auto.ts | Auto Engine | Automatic workflow execution and loop management |
+| gsd/auto-dashboard.ts | Auto Engine, Web Mode | Real-time dashboard for auto-run progress |
+| gsd/auto-worktree.ts | Auto Engine, Worktree | Automatic worktree creation and branch management |
+| gsd/auto-recovery.ts | Auto Engine | Recovery for crashed/stalled workflows |
+| gsd/auto-start.ts | Auto Engine | Initialization sequence for automatic execution |
+| gsd/auto-worktree-sync.ts | Auto Engine, Worktree | State sync between worktrees and main |
+| gsd/auto-model-selection.ts | Auto Engine, Model System | Intelligent LLM model routing |
+| gsd/auto-direct-dispatch.ts | Auto Engine | Direct command dispatching without planning |
+| gsd/auto-dispatch.ts | Auto Engine | Task queueing and priority-based dispatch |
+| gsd/auto-timeout-recovery.ts | Auto Engine | Timeout handling and recovery |
+| gsd/auto-post-unit.ts | Auto Engine | Post-unit milestone completion processing |
+| gsd/auto-unit-closeout.ts | Auto Engine | Unit finalization and archiving |
+| gsd/auto-verification.ts | Auto Engine | Post-execution verification |
+| gsd/auto-timers.ts | Auto Engine | Timeout and deadline management |
+| gsd/auto-loop.ts | Auto Engine, State Machine | Execution loop state and cycle management |
+| gsd/auto-supervisor.ts | Auto Engine | Supervision and oversight of autonomous runs |
+| gsd/auto-budget.ts | Auto Engine | Token/cost budgeting and tracking |
+| gsd/auto-observability.ts | Auto Engine | Observability hooks and telemetry |
+| gsd/auto-tool-tracking.ts | Auto Engine | Tool usage instrumentation |
+| gsd/doctor.ts | Doctor/Diagnostics | Health check and system diagnostics |
+| gsd/doctor-checks.ts | Doctor/Diagnostics | Individual diagnostic checks |
+| gsd/doctor-providers.ts | Doctor/Diagnostics | Diagnostic data source providers |
+| gsd/doctor-format.ts | Doctor/Diagnostics | Diagnostic output formatting |
+| gsd/state.ts | State Machine | Milestone and workflow state management |
+| gsd/history.ts | State Machine | State history and versioning |
+| gsd/json-persistence.ts | State Machine | JSON-based persistence layer |
+| gsd/memory-store.ts | State Machine | In-memory state storage |
+| gsd/reactive-graph.ts | State Machine | Reactive dependency graph for state |
+| gsd/routing-history.ts | State Machine | History of routing decisions |
+| gsd/cache.ts | State Machine | Caching layer for performance |
+| gsd/model-router.ts | Model System | LLM model selection and routing logic |
+| gsd/worktree.ts | Worktree | Worktree creation and management |
+| gsd/worktree-manager.ts | Worktree | Higher-level worktree orchestration |
+| gsd/worktree-resolver.ts | Worktree | Worktree path and reference resolution |
+| gsd/unit-runtime.ts | Auto Engine | Unit-level execution runtime |
+| gsd/activity-log.ts | GSD Workflow | Activity tracking and logging |
+| gsd/debug-logger.ts | GSD Workflow | Debug output and verbose logging |
+| gsd/commands.ts | Commands | Main command dispatcher |
+| gsd/commands-handlers.ts | Commands | Command-specific handlers |
+| gsd/commands-bootstrap.ts | Commands | Bootstrap and initialization commands |
+| gsd/commands-config.ts | Commands, Config | Configuration management commands |
+| gsd/commands-extensions.ts | Commands, Extensions | Extension discovery and management |
+| gsd/commands-inspect.ts | Commands, Doctor/Diagnostics | Database and state inspection tools |
+| gsd/commands-logs.ts | Commands | Log viewing and filtering |
+| gsd/commands-workflow-templates.ts | Commands, GSD Workflow | Workflow template management |
+| gsd/commands-cmux.ts | Commands, CMux | Tmux/cmux integration commands |
+| gsd/exit-command.ts | Commands | Exit and cleanup commands |
+| gsd/undo.ts | Commands | Undo and rollback functionality |
+| gsd/kill.ts | Commands | Process termination and cleanup |
+| gsd/worktree-command.ts | Commands, Worktree | Worktree subcommands |
+| gsd/namespaced-resolver.ts | GSD Workflow | Namespace and scoped resource resolution |
+| gsd/error-utils.ts | GSD Workflow | Error handling and formatting |
+| gsd/errors.ts | GSD Workflow | Error type definitions |
+| gsd/diff-context.ts | GSD Workflow | Diff-based context extraction |
+| gsd/memory-extractor.ts | GSD Workflow | Memory and context extraction from state |
+| gsd/structured-data-formatter.ts | GSD Workflow | Structured output formatting |
+| gsd/export-html.ts | GSD Workflow | HTML export of milestone reports |
+| gsd/reports.ts | GSD Workflow | Report generation and summaries |
+| gsd/notifications.ts | GSD Workflow | User notification and messaging |
+| gsd/triage-ui.ts | GSD Workflow | Triage interface for issue categorization |
+| gsd/guided-flow.ts | GSD Workflow | User-guided workflow orchestration |
+| gsd/env-utils.ts | GSD Workflow | Environment variable utilities |
+| gsd/git-constants.ts | GSD Workflow | Git-related constants and paths |
+| gsd/milestone-id-utils.ts | GSD Workflow | Milestone ID generation and parsing |
+| gsd/resource-version.ts | GSD Workflow | Resource versioning helpers |
+| gsd/atomic-write.ts | GSD Workflow | Atomic file write operations |
+| gsd/captures.ts | GSD Workflow | Artifact capture and storage |
+| gsd/changelog.ts | GSD Workflow | Changelog generation |
+| gsd/claude-import.ts | GSD Workflow | Claude API/resource importing |
+| gsd/collision-diagnostics.ts | Doctor/Diagnostics | Collision detection and diagnostics |
+| gsd/prompt-loader.ts | GSD Workflow | Prompt template loading |
+| gsd/file-watcher.ts | GSD Workflow | File system change monitoring |
+| gsd/parallel-eligibility.ts | GSD Workflow | Parallel execution eligibility checks |
+| gsd/plugin-importer.ts | GSD Workflow, Extensions | Custom plugin/extension importing |
+| gsd/verification-gate.ts | GSD Workflow | Pre-execution verification checks |
+| gsd/preference-models.ts | Config, Model System | Model preference configuration |
+| gsd/preferences-skills.ts | Config, Skills | Skill preference configuration |
+| gsd/post-unit-hooks.ts | GSD Workflow | Post-unit execution hooks |
+| gsd/skill-telemetry.ts | Skills | Skill usage and performance telemetry |
+| gsd/bootstrap/* | GSD Workflow, Loader/Bootstrap | Extension initialization and hook registration |
+| gsd/auto/* | Auto Engine | Auto-execution engine components |
+| gsd/commands/* | Commands | Command routing and handling |
+| gsd/templates/* | GSD Workflow | Output templates and formatters |
+| gsd/prompts/* | GSD Workflow | System prompts and instructions |
+| gsd/workflow-templates/* | GSD Workflow | Workflow starter templates and registry |
+| gsd/skills/* | Skills | Integrated skill configurations |
+| gsd/migrate/* | Migration | Data migration and upgrade tools |
+
+### Other Extensions
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| async-jobs/index.ts | Async Jobs | Background bash command execution extension |
+| async-jobs/job-manager.ts | Async Jobs | Background job lifecycle management |
+| async-jobs/async-bash-tool.ts | Async Jobs, Tool System | Tool for spawning background bash processes |
+| async-jobs/await-tool.ts | Async Jobs, Tool System | Tool for waiting on job completion |
+| async-jobs/cancel-job-tool.ts | Async Jobs, Tool System | Tool for cancelling background jobs |
+| bg-shell/index.ts | Bg Shell | Interactive background process management extension |
+| bg-shell/bg-shell-tool.ts | Bg Shell, Tool System | Tool for spawning background processes |
+| bg-shell/bg-shell-command.ts | Bg Shell, Commands | Command handler for bg subcommands |
+| bg-shell/bg-shell-lifecycle.ts | Bg Shell | Process lifecycle and state management |
+| bg-shell/process-manager.ts | Bg Shell | Core process management implementation |
+| bg-shell/readiness-detector.ts | Bg Shell | Startup readiness detection |
+| bg-shell/interaction.ts | Bg Shell | Interactive process communication |
+| bg-shell/output-formatter.ts | Bg Shell | Process output formatting |
+| bg-shell/overlay.ts | Bg Shell, TUI Components | Terminal overlay for process monitoring |
+| browser-tools/index.ts | Browser Tools | Playwright-based browser automation extension |
+| browser-tools/core.ts | Browser Tools | Core Playwright instance management |
+| browser-tools/lifecycle.ts | Browser Tools | Browser session lifecycle |
+| browser-tools/capture.ts | Browser Tools | Screenshot and media capture |
+| browser-tools/settle.ts | Browser Tools | Page settlement and readiness detection |
+| browser-tools/refs.ts | Browser Tools | Reference-based element selection |
+| browser-tools/state.ts | Browser Tools, State Machine | Browser state management |
+| browser-tools/tools/navigation.ts | Browser Tools, Tool System | Navigation and page loading tool |
+| browser-tools/tools/interaction.ts | Browser Tools, Tool System | Element interaction tool (click, type) |
+| browser-tools/tools/screenshot.ts | Browser Tools, Tool System | Screenshot and visual capture tool |
+| browser-tools/tools/inspection.ts | Browser Tools, Tool System | Page inspection tool |
+| browser-tools/tools/session.ts | Browser Tools, Tool System | Session management and cookies tool |
+| browser-tools/tools/pages.ts | Browser Tools, Tool System | Multi-page management tool |
+| browser-tools/tools/forms.ts | Browser Tools, Tool System | Form filling and submission tool |
+| browser-tools/tools/wait.ts | Browser Tools, Tool System | Wait conditions and polling tool |
+| browser-tools/tools/assertions.ts | Browser Tools, Tool System | Visual and content assertions tool |
+| browser-tools/tools/verify.ts | Browser Tools, Tool System | Verification checks tool |
+| browser-tools/tools/extract.ts | Browser Tools, Tool System | Data extraction tool |
+| browser-tools/tools/pdf.ts | Browser Tools, Tool System | PDF export/generation tool |
+| browser-tools/tools/state-persistence.ts | Browser Tools, Tool System | State save/restore tool |
+| browser-tools/tools/network-mock.ts | Browser Tools, Tool System | Network mocking/interception tool |
+| browser-tools/tools/device.ts | Browser Tools, Tool System | Device emulation tool |
+| browser-tools/tools/visual-diff.ts | Browser Tools, Tool System | Visual regression testing tool |
+| browser-tools/tools/zoom.ts | Browser Tools, Tool System | Zoom and viewport manipulation tool |
+| browser-tools/tools/codegen.ts | Browser Tools, Tool System | Test code generation tool |
+| browser-tools/tools/action-cache.ts | Browser Tools | Action caching and replay |
+| context7/index.ts | Context7, Tool System | Library documentation fetching extension |
+| google-search/index.ts | Google Search, Tool System | Web search via Google API |
+| search-the-web/index.ts | Search the Web | Brave/Jina/Tavily-based web search extension |
+| search-the-web/provider.ts | Search the Web | Search provider abstraction |
+| search-the-web/native-search.ts | Search the Web | Native Brave search implementation |
+| search-the-web/tavily.ts | Search the Web | Tavily search provider |
+| search-the-web/tool-search.ts | Search the Web, Tool System | Search tool implementation |
+| search-the-web/tool-fetch-page.ts | Search the Web, Tool System | Page fetching tool |
+| search-the-web/cache.ts | Search the Web | Search result caching |
+| remote-questions/index.ts | Remote Questions | Remote question routing extension |
+| remote-questions/manager.ts | Remote Questions | Question lifecycle management |
+| remote-questions/slack-adapter.ts | Remote Questions | Slack messaging adapter |
+| remote-questions/discord-adapter.ts | Remote Questions | Discord messaging adapter |
+| remote-questions/telegram-adapter.ts | Remote Questions | Telegram messaging adapter |
+| mcp-client/index.ts | MCP Server/Client | Model Context Protocol client integration |
+| subagent/index.ts | Subagent, Agent Core | Parallel/serial subagent delegation extension |
+| subagent/agents.ts | Subagent, Agent Core | Agent registry and discovery |
+| subagent/isolation.ts | Subagent | Execution isolation and sandboxing |
+| subagent/worker-registry.ts | Subagent | Worker process management |
+| slash-commands/index.ts | Slash Commands, Commands | Command boilerplate generators extension |
+| slash-commands/create-slash-command.ts | Slash Commands | Generator for new slash command scaffolding |
+| slash-commands/create-extension.ts | Slash Commands, Extensions | Generator for new extension scaffolding |
+| universal-config/index.ts | Universal Config | Multi-tool configuration file discovery |
+| universal-config/discovery.ts | Universal Config | Configuration file discovery |
+| universal-config/scanners.ts | Universal Config | Tool-specific config scanners |
+| ttsr/index.ts | TTSR | TTSR regex engine — streaming output guardrails |
+| ttsr/ttsr-manager.ts | TTSR | Streaming rule manager |
+| ttsr/rule-loader.ts | TTSR | Rule loading and parsing |
+| voice/index.ts | Voice | Voice input mode extension |
+| voice/speech-recognizer.swift | Voice | macOS Swift speech recognizer |
+| voice/speech-recognizer.py | Voice | Linux/Windows Python speech recognizer |
+| cmux/index.ts | CMux | Tmux/multiplexer session management |
+| mac-tools/index.ts | Mac Tools | macOS-specific utilities extension |
+| mac-tools/swift-cli/Sources/main.swift | Mac Tools | macOS native tools Swift implementation |
+| aws-auth/index.ts | Auth/OAuth | AWS authentication and credential handling |
+| shared/ui.ts | TUI Components | Generic UI components and utilities |
+| shared/tui.ts | TUI Components | Terminal UI helpers |
+| shared/interview-ui.ts | TUI Components | Interview-style questionnaire UI |
+| shared/confirm-ui.ts | TUI Components | Confirmation dialog UI |
+| shared/terminal.ts | TUI Components | Terminal operations and formatting |
+| shared/format-utils.ts | GSD Workflow | String formatting utilities |
+| shared/sanitize.ts | GSD Workflow | Input sanitization |
+| shared/frontmatter.ts | Config | YAML frontmatter parsing |
+
+### src/resources/agents/
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| javascript-pro.md | Subagent | JavaScript specialist agent definition |
+| typescript-pro.md | Subagent | TypeScript specialist agent definition |
+| worker.md | Subagent | Generic worker agent definition |
+| researcher.md | Subagent | Research and exploration agent definition |
+| scout.md | Subagent | Scout/pathfinding agent definition |
+
+### src/resources/skills/
+
+| Skill Directory | System Label(s) | Description |
+|-----------------|-----------------|-------------|
+| react-best-practices/ | Skills | React development patterns (62 files) |
+| userinterface-wiki/ | Skills | UI/UX guidelines and component reference (155 files) |
+| create-skill/ | Skills | Skill creation scaffolding and templates (25 files) |
+| create-gsd-extension/ | Skills, Extensions | GSD extension scaffolding (22 files) |
+| code-optimizer/ | Skills | Performance optimization techniques (16 files) |
+| agent-browser/ | Skills, Browser Tools | Browser automation guidance (11 files) |
+| github-workflows/ | Skills | GitHub Actions workflow patterns (10 files) |
+| debug-like-expert/ | Skills | Advanced debugging techniques (6 files) |
+| make-interfaces-feel-better/ | Skills | UI/UX improvement patterns (5 files) |
+| accessibility/ | Skills | WCAG and accessibility standards |
+| core-web-vitals/ | Skills | Web performance metrics guidance |
+| web-quality-audit/ | Skills | Quality audit procedures |
+| best-practices/ | Skills | General development best practices |
+| frontend-design/ | Skills | Frontend design principles |
+| lint/ | Skills | Code linting standards |
+| review/ | Skills | Code review guidelines |
+| test/ | Skills | Testing strategies and patterns |
+| web-design-guidelines/ | Skills | Web design principles |
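+Each skill directory above follows the Agent Skills layout: a `SKILL.md` whose YAML frontmatter declares the skill's name and trigger description, followed by the instructions themselves. A minimal sketch, using the `commit-outstanding` skill referenced elsewhere in these docs as the example:
+
+```markdown
+---
+name: commit-outstanding
+description: Commit all uncommitted files in logical groups
+---
+
+Step-by-step instructions for the agent go here. Relative paths in the
+skill body are resolved against this skill's directory.
+```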
+
+---
+
+## web/ — Web Frontend (Next.js)
+
+### App Shell & Navigation
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| web/app/layout.tsx | Web UI | Root Next.js layout with theme provider and font |
+| web/app/page.tsx | Web UI | Entry page loading GSDAppShell |
+| web/components/gsd/app-shell.tsx | Web UI | Main app shell — sidebar, panels, terminal, commands |
+| web/components/gsd/sidebar.tsx | Web UI | Multi-panel sidebar with milestone explorer |
+| web/components/gsd/status-bar.tsx | Web UI | Status bar with workspace state and metrics |
+
+### Main Views
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| web/components/gsd/dashboard.tsx | Web UI | Dashboard with workflow actions and metrics |
+| web/components/gsd/chat-mode.tsx | Web UI | Chat interface for agent interaction |
+| web/components/gsd/projects-view.tsx | Web UI | Project browser and selector |
+| web/components/gsd/files-view.tsx | Web UI | File browser and explorer |
+| web/components/gsd/activity-view.tsx | Web UI | Activity log and history view |
+| web/components/gsd/roadmap.tsx | Web UI, GSD Workflow | Milestone roadmap visualization |
+| web/components/gsd/visualizer-view.tsx | Web UI, Doctor/Diagnostics | Workflow visualization |
+| web/components/gsd/project-welcome.tsx | Web UI | Welcome screen for new projects |
+| web/components/gsd/knowledge-captures-panel.tsx | Web UI | Knowledge and capture management |
+
+### Terminal
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| web/components/gsd/terminal.tsx | Web UI | Terminal widget with input mode handling |
+| web/components/gsd/shell-terminal.tsx | Web UI | Shell terminal with PTY integration |
+| web/components/gsd/main-session-terminal.tsx | Web UI | Main session terminal display |
+| web/components/gsd/dual-terminal.tsx | Web UI | Side-by-side terminal layout |
+
+### Commands & Dialogs
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| web/components/gsd/command-surface.tsx | Web UI, Commands | Command palette and slash command dispatcher |
+| web/components/gsd/remaining-command-panels.tsx | Web UI, Commands | History, undo, export, cleanup panels |
+| web/components/gsd/diagnostics-panels.tsx | Web UI, Doctor/Diagnostics | Doctor, forensics, skill health panels |
+| web/components/gsd/settings-panels.tsx | Web UI, Config | Settings and preferences panels |
+| web/components/gsd/guided-dialog.tsx | Web UI | Generic guided dialog component |
+| web/components/gsd/update-banner.tsx | Web UI | Update notification banner |
+| web/components/gsd/scope-badge.tsx | Web UI | Scope badge indicator |
+| web/components/gsd/loading-skeletons.tsx | Web UI | Loading skeleton placeholders |
+| web/components/gsd/code-editor.tsx | Web UI | Code editor display component |
+| web/components/gsd/file-content-viewer.tsx | Web UI | File content viewer and previewer |
+| web/components/gsd/focused-panel.tsx | Web UI | Focused panel layout component |
+
+### Onboarding
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| web/components/gsd/onboarding-gate.tsx | Web UI, Onboarding | Gate and orchestration for onboarding flow |
+| web/components/gsd/onboarding/step-welcome.tsx | Web UI, Onboarding | Welcome step |
+| web/components/gsd/onboarding/step-mode.tsx | Web UI, Onboarding | User mode selection step |
+| web/components/gsd/onboarding/step-provider.tsx | Web UI, Onboarding | LLM provider selection step |
+| web/components/gsd/onboarding/step-authenticate.tsx | Web UI, Onboarding, Auth/OAuth | Authentication step |
+| web/components/gsd/onboarding/step-dev-root.tsx | Web UI, Onboarding | Dev root directory selection step |
+| web/components/gsd/onboarding/step-project.tsx | Web UI, Onboarding | Project selection step |
+| web/components/gsd/onboarding/step-remote.tsx | Web UI, Onboarding | Remote configuration step |
+| web/components/gsd/onboarding/step-optional.tsx | Web UI, Onboarding | Optional settings step |
+| web/components/gsd/onboarding/step-ready.tsx | Web UI, Onboarding | Ready confirmation step |
+| web/components/gsd/onboarding/wizard-stepper.tsx | Web UI, Onboarding | Stepper progress indicator |
+
+### API Routes
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| web/app/api/boot/route.ts | API Routes, State Machine | Initial boot payload with project/workspace state |
+| web/app/api/session/manage/route.ts | API Routes, Session Management | Session rename and management |
+| web/app/api/session/browser/route.ts | API Routes, Session Management | Session browser listing |
+| web/app/api/session/command/route.ts | API Routes, Session Management | Session command execution |
+| web/app/api/session/events/route.ts | API Routes, Session Management | Session event streaming (SSE) |
+| web/app/api/terminal/stream/route.ts | API Routes | PTY output streaming via SSE |
+| web/app/api/terminal/input/route.ts | API Routes | Terminal input submission |
+| web/app/api/terminal/resize/route.ts | API Routes | Terminal resize |
+| web/app/api/terminal/sessions/route.ts | API Routes | Terminal session management |
+| web/app/api/terminal/upload/route.ts | API Routes | File upload for terminal |
+| web/app/api/bridge-terminal/stream/route.ts | API Routes, Web Mode | Bridge terminal output streaming |
+| web/app/api/bridge-terminal/input/route.ts | API Routes, Web Mode | Bridge terminal input |
+| web/app/api/bridge-terminal/resize/route.ts | API Routes, Web Mode | Bridge terminal resize |
+| web/app/api/projects/route.ts | API Routes | Project discovery and listing |
+| web/app/api/live-state/route.ts | API Routes, State Machine | Live workspace state updates |
+| web/app/api/steer/route.ts | API Routes, Commands | Steering endpoint for agent direction |
+| web/app/api/history/route.ts | API Routes, State Machine | History and metrics |
+| web/app/api/undo/route.ts | API Routes, Commands | Undo operation |
+| web/app/api/cleanup/route.ts | API Routes, Commands | Cleanup operation |
+| web/app/api/export-data/route.ts | API Routes, Commands | Data export |
+| web/app/api/knowledge/route.ts | API Routes, GSD Workflow | Knowledge base |
+| web/app/api/hooks/route.ts | API Routes, GSD Workflow | Git hooks management |
+| web/app/api/inspect/route.ts | API Routes, Doctor/Diagnostics | Inspection and analysis |
+| web/app/api/doctor/route.ts | API Routes, Doctor/Diagnostics | Doctor diagnostic tool |
+| web/app/api/forensics/route.ts | API Routes, Doctor/Diagnostics | Forensics analysis |
+| web/app/api/skill-health/route.ts | API Routes, Doctor/Diagnostics | Skill health check |
+| web/app/api/visualizer/route.ts | API Routes, Doctor/Diagnostics | Workflow visualization |
+| web/app/api/preferences/route.ts | API Routes, Config | User preferences |
+| web/app/api/settings-data/route.ts | API Routes, Config | Settings data |
+| web/app/api/dev-mode/route.ts | API Routes, Config | Development mode toggle |
+| web/app/api/captures/route.ts | API Routes, GSD Workflow | Knowledge captures |
+| web/app/api/browse-directories/route.ts | API Routes | Directory browsing |
+| web/app/api/files/route.ts | API Routes, Tool System | File system access |
+| web/app/api/git/route.ts | API Routes, Tool System | Git operations |
+| web/app/api/onboarding/route.ts | API Routes, Onboarding | Onboarding data |
+| web/app/api/recovery/route.ts | API Routes, Doctor/Diagnostics | Recovery operations |
+| web/app/api/remote-questions/route.ts | API Routes, Remote Questions | Remote question handling |
+| web/app/api/shutdown/route.ts | API Routes | Graceful shutdown |
+| web/app/api/update/route.ts | API Routes, CLI | Update check |
+
+### Library & State
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| web/lib/auth.ts | Auth/OAuth | Client-side auth token management from URL fragment |
+| web/lib/gsd-workspace-store.tsx | State Machine | Global workspace state store with external store |
+| web/lib/project-store-manager.tsx | State Machine | Multi-project store manager with SSE lifecycle |
+| web/lib/shutdown-gate.ts | State Machine | Graceful shutdown coordination |
+| web/lib/browser-slash-command-dispatch.ts | Commands | Slash command dispatch |
+| web/lib/workflow-actions.ts | GSD Workflow | Primary workflow action derivation logic |
+| web/lib/workflow-action-execution.ts | GSD Workflow | Workflow action execution handler |
+| web/lib/command-surface-contract.ts | Commands | Command surface request/response contract types |
+| web/lib/pty-manager.ts | Web UI | Server-side PTY spawning and session management |
+| web/lib/pty-chat-parser.ts | Web UI | PTY output parsing for chat display |
+| web/lib/remaining-command-types.ts | Web UI | Browser-safe types for command surfaces |
+| web/lib/knowledge-captures-types.ts | GSD Workflow | Knowledge entry and captures types |
+| web/lib/diagnostics-types.ts | Doctor/Diagnostics | Diagnostics panel types |
+| web/lib/settings-types.ts | Config | Settings and preferences types |
+| web/lib/visualizer-types.ts | Doctor/Diagnostics | Workflow visualizer types |
+| web/lib/session-browser-contract.ts | Session Management | Session browser contract types |
+| web/lib/git-summary-contract.ts | Tool System | Git summary contract types |
+| web/lib/utils.ts | Web UI | Common utility functions |
+| web/lib/project-url.ts | Web UI | Project URL parsing and construction |
+| web/lib/workspace-status.ts | Web UI, State Machine | Workspace status derivation |
+| web/lib/image-utils.ts | Image Processing | Image handling and processing utilities |
+| web/lib/use-editor-font-size.ts | Web UI | Editor font size preference hook |
+| web/lib/use-terminal-font-size.ts | Web UI | Terminal font size preference hook |
+| web/lib/use-user-mode.ts | Web UI | User mode hook |
+| web/hooks/use-mobile.ts | Web UI | Mobile viewport detection hook |
+| web/hooks/use-toast.ts | Web UI | Toast notification hook |
+| web/components/theme-provider.tsx | Web UI | Theme provider for dark/light modes |
+| web/components/ui/* (50+ files) | Web UI | Shadcn/ui base component library |
+
+---
+
+## vscode-extension/ — VS Code Extension
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| vscode-extension/src/extension.ts | VS Code Extension | Extension activation, client management, command registration |
+| vscode-extension/src/gsd-client.ts | VS Code Extension, MCP Server/Client | RPC client for GSD agent communication |
+| vscode-extension/src/chat-participant.ts | VS Code Extension | Chat participant for @gsd command |
+| vscode-extension/src/sidebar.ts | VS Code Extension | Sidebar webview provider with status display |
+
+---
+
+## studio/ — Electron Desktop App
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| studio/electron.vite.config.ts | Studio App, Build System | Electron Vite build configuration |
+| studio/src/main/index.ts | Studio App | Electron main process window creation |
+| studio/src/preload/index.ts | Studio App | Context isolation preload for IPC bridge |
+| studio/src/preload/index.d.ts | Studio App | Preload bridge type definitions |
+| studio/src/renderer/src/main.tsx | Studio App | React renderer entry point |
+| studio/src/renderer/src/App.tsx | Studio App | Main app component |
+| studio/src/renderer/src/lib/theme/tokens.ts | Studio App | Design tokens (colors, fonts, sizes) |
+
+---
+
+## native/ — Rust Engine
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| native/crates/engine/src/lib.rs | Native/Rust Tools | N-API entry point exposing all Rust modules |
+| native/crates/engine/src/grep.rs | File Search, Native/Rust Tools | Ripgrep-backed regex search with context/globbing |
+| native/crates/engine/src/glob.rs | File Search, Native/Rust Tools | Glob-pattern FS discovery with gitignore + scan cache |
+| native/crates/engine/src/fd.rs | File Search, Native/Rust Tools | Fuzzy file discovery for autocomplete/@-mentions |
+| native/crates/engine/src/highlight.rs | Syntax Highlighting, Native/Rust Tools | Syntect-backed ANSI syntax highlighting |
+| native/crates/engine/src/ast.rs | AST, Native/Rust Tools | Linker shim for AST N-API registrations |
+| native/crates/engine/src/diff.rs | Text Processing, Native/Rust Tools | Fuzzy matching, Unicode normalization, unified diffs |
+| native/crates/engine/src/image.rs | Image Processing, Native/Rust Tools | Image decode/encode and resize |
+| native/crates/engine/src/html.rs | Text Processing, Native/Rust Tools | HTML to Markdown conversion |
+| native/crates/engine/src/text.rs | Text Processing, Native/Rust Tools | ANSI-aware text measurement and slicing |
+| native/crates/engine/src/truncate.rs | Text Processing, Native/Rust Tools | Line-boundary-aware output truncation |
+| native/crates/engine/src/ps.rs | Native/Rust Tools | Cross-platform process tree management |
+| native/crates/engine/src/clipboard.rs | Native/Rust Tools | Clipboard read/write for text and images |
+| native/crates/engine/src/json_parse.rs | Text Processing, Native/Rust Tools | Streaming JSON parser with partial recovery |
+| native/crates/engine/src/gsd_parser.rs | GSD Workflow, Native/Rust Tools | .gsd/ directory file parser (markdown, frontmatter) |
+| native/crates/engine/src/ttsr.rs | TTSR, Native/Rust Tools | TTSR regex engine with compiled RegexSet |
+| native/crates/engine/src/stream_process.rs | Text Processing, Native/Rust Tools | Bash stream processor (UTF-8, ANSI strip, binary) |
+| native/crates/engine/src/xxhash.rs | Native/Rust Tools | xxHash32 for hashline edit tool |
+| native/crates/engine/src/git.rs | Native/Rust Tools | Native git operations via libgit2 |
+| native/crates/engine/src/fs_cache.rs | File Search, Native/Rust Tools | TTL-based FS scan cache with explicit invalidation |
+| native/crates/engine/src/glob_util.rs | File Search, Native/Rust Tools | Shared glob-pattern helpers |
+| native/crates/engine/src/task.rs | Native/Rust Tools | Blocking work on libuv thread pool with cancellation |
+| native/crates/engine/build.rs | Build System | Cargo build script for napi-build compilation |
+| native/crates/grep/src/lib.rs | File Search, Native/Rust Tools | Ripgrep search library (in-memory and on-disk) |
+| native/crates/ast/src/lib.rs | AST, Native/Rust Tools | AST-aware structural search and rewrite engine |
+| native/crates/ast/src/ast.rs | AST, Native/Rust Tools | ast-grep integration for structural code search |
+| native/crates/ast/src/language/mod.rs | AST, Native/Rust Tools | Vendored language defs and tree-sitter bindings |
+| native/crates/ast/src/language/parsers.rs | AST, Native/Rust Tools | Pre-compiled tree-sitter parsers (50+ languages) |
+
+## packages/native/src/ — Node.js Rust Bindings
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| packages/native/src/native.ts | Native/Rust Tools, Node.js Bindings | Native addon loader with platform fallback |
+| packages/native/src/grep/index.ts | File Search, Node.js Bindings | Ripgrep wrapper for regex search |
+| packages/native/src/fd/index.ts | File Search, Node.js Bindings | Fuzzy file discovery wrapper |
+| packages/native/src/highlight/index.ts | Syntax Highlighting, Node.js Bindings | Syntax highlighting wrapper |
+| packages/native/src/image/index.ts | Image Processing, Node.js Bindings | Image processing wrapper |
+| packages/native/src/html/index.ts | Text Processing, Node.js Bindings | HTML to Markdown wrapper |
+| packages/native/src/diff/index.ts | Text Processing, Node.js Bindings | Text diffing wrapper |
+| packages/native/src/ps/index.ts | Native/Rust Tools, Node.js Bindings | Process tree management wrapper |
+| packages/native/src/truncate/index.ts | Text Processing, Node.js Bindings | Output truncation wrapper |
+| packages/native/src/json-parse/index.ts | Text Processing, Node.js Bindings | JSON parsing wrapper |
+| packages/native/src/stream-process/index.ts | Text Processing, Node.js Bindings | Stream processing wrapper |
+| packages/native/src/ttsr/index.ts | TTSR, Node.js Bindings | TTSR regex engine wrapper |
+
+---
+
+## tests/ — Test Suite
+
+| File / Directory | System Label(s) | Description |
+|------------------|-----------------|-------------|
+| tests/smoke/run.ts | Integration Tests | Test runner for smoke tests |
+| tests/smoke/test-help.ts | Integration Tests | Smoke test for help command |
+| tests/smoke/test-init.ts | Integration Tests | Smoke test for initialization |
+| tests/smoke/test-version.ts | Integration Tests | Smoke test for version command |
+| tests/fixtures/run.ts | Integration Tests | Fixture-based test harness with recording replay |
+| tests/fixtures/provider.ts | Integration Tests | Fixture provider and replayer for LLM turns |
+| tests/fixtures/record.ts | Integration Tests | Recording fixture capture |
+| tests/fixtures/recordings/*.json | Integration Tests | Pre-recorded LLM agent interaction fixtures |
+| tests/live/run.ts | Integration Tests | Live API roundtrip test runner |
+| tests/live/test-anthropic-roundtrip.ts | Integration Tests, AI Providers | Live Anthropic API integration test |
+| tests/live/test-openai-roundtrip.ts | Integration Tests, AI Providers | Live OpenAI API integration test |
+| tests/live-regression/run.ts | Integration Tests | Live regression test runner |
+| tests/repro-worktree-bug/*.mjs | Integration Tests, Worktree | Worktree bug reproduction scripts |
+
+---
+
+## scripts/ — Build & Utility
+
+| File | System Label(s) | Description |
+|------|-----------------|-------------|
+| scripts/dev.js | Build System | Dev supervisor — tsc and resource watcher |
+| scripts/dev-cli.js | Build System | CLI development mode runner |
+| scripts/watch-resources.js | Build System | Resource file watcher for hot reload |
+| scripts/bump-version.mjs | Build System | Version bumper for package.json and platform packages |
+| scripts/sync-pkg-version.cjs | Build System | Sync pkg/package.json with workspace version |
+| scripts/copy-resources.cjs | Build System | Resource file copier for distribution |
+| scripts/copy-export-html.cjs | Build System | HTML export asset copier |
+| scripts/copy-themes.cjs | Build System | Theme file copier |
+| scripts/link-workspace-packages.cjs | Build System | Workspace package symlink manager |
+| scripts/ensure-workspace-builds.cjs | Build System | Postinstall build checker |
+| scripts/build-web-if-stale.cjs | Build System | Conditional web build trigger |
+| scripts/stage-web-standalone.cjs | Build System | Web standalone staging |
+| scripts/generate-changelog.mjs | Build System | Changelog generator from commits |
+| scripts/update-changelog.mjs | Build System | Changelog updater |
+| scripts/version-stamp.mjs | Build System | Version timestamp generator |
+| scripts/validate-pack.sh | Build System | Package validation script |
+| scripts/validate-pack.js | Build System | Package validation (Node.js) |
+| scripts/install-pi-global.js | Build System | Global installation helper |
+| scripts/uninstall-pi-global.js | Build System | Global uninstallation helper |
+| scripts/install-hooks.sh | Build System, GSD Workflow | Git hook installer |
+| scripts/secret-scan.sh | Build System, Auth/OAuth | Secret scanning for credentials |
+| scripts/docs-prompt-injection-scan.sh | Build System | Prompt injection detection in docs |
+| scripts/check-skill-references.mjs | Build System, Skills | Skill reference validator |
+| scripts/preview-dashboard.ts | Web Mode | Dashboard preview server |
+| scripts/ci_monitor.cjs | Build System | CI monitoring dashboard |
+| scripts/recover-gsd-1364.sh | Build System, Migration | Recovery script for issue #1364 |
+| scripts/recover-gsd-1364.ps1 | Build System, Migration | Recovery script for issue #1364 (PowerShell) |
+| scripts/recover-gsd-1668.sh | Build System, Migration | Recovery script for issue #1668 |
+| scripts/recover-gsd-1668.ps1 | Build System, Migration | Recovery script for issue #1668 (PowerShell) |
+
+---
+
+## System → File Reverse Index
+
+Quick lookup: which files are part of each system?
+
+| System | Key Files (abbreviated) |
+|--------|------------------------|
+| **Agent Core** | pi-agent-core/src/*, pi-coding-agent/src/core/agent-session.ts, agent-loop.ts, agent.ts, event-bus.ts, sdk.ts |
+| **AI Providers** | pi-ai/src/providers/*, pi-ai/src/stream.ts, pi-ai/src/models*.ts |
+| **API Routes** | web/app/api/**/*.ts |
+| **AST** | native/crates/ast/*, packages/native/src/ast/ |
+| **Async Jobs** | src/resources/extensions/async-jobs/* |
+| **Auth / OAuth** | pi-ai/src/utils/oauth/*, src/web/web-auth-storage.ts, core/auth-storage.ts, src/pi-migration.ts, aws-auth/index.ts, web/lib/auth.ts |
+| **Auto Engine** | src/resources/extensions/gsd/auto*.ts, gsd/auto-loop.ts, gsd/auto-supervisor.ts, gsd/unit-runtime.ts |
+| **Bg Shell** | src/resources/extensions/bg-shell/* |
+| **Browser Tools** | src/resources/extensions/browser-tools/* |
+| **Build System** | scripts/*, native/crates/engine/build.rs |
+| **CLI** | src/cli.ts, src/cli-web-branch.ts, src/help-text.ts, src/update*.ts, pi-coding-agent/src/cli.ts, src/worktree-cli.ts |
+| **CMux** | src/resources/extensions/cmux/index.ts |
+| **Commands** | gsd/commands*.ts, gsd/exit-command.ts, gsd/undo.ts, gsd/kill.ts, pi-coding-agent/src/core/slash-commands.ts |
+| **Compaction** | pi-coding-agent/src/core/compaction*.ts, core/compaction/* |
+| **Config** | src/app-paths.ts, src/models-resolver.ts, src/remote-questions-config.ts, src/wizard.ts, core/defaults.ts, core/constants.ts, config.ts |
+| **Context7** | src/resources/extensions/context7/index.ts |
+| **Doctor / Diagnostics** | gsd/doctor*.ts, gsd/collision-diagnostics.ts, core/diagnostics.ts, web/lib/diagnostics-types.ts, web/app/api/doctor/*, forensics/* |
+| **Event System** | pi-coding-agent/src/core/event-bus.ts, gsd/auto-observability.ts |
+| **Extension Registry** | src/extension-discovery.ts, src/extension-registry.ts, src/bundled-extension-paths.ts |
+| **Extensions** | pi-coding-agent/src/core/extensions/*, src/resource-loader.ts |
+| **File Search** | native/crates/engine/src/grep.rs, glob.rs, fd.rs, fs_cache.rs, packages/native/src/grep/*, fd/*, core/tools/grep.ts, find.ts |
+| **GSD Workflow** | src/resources/extensions/gsd/* (non-auto), gsd/reports.ts, gsd/notifications.ts, gsd/prompts/*, gsd/workflow-templates/* |
+| **Google Search** | src/resources/extensions/google-search/index.ts |
+| **Headless Mode** | src/headless*.ts |
+| **Image Processing** | native/crates/engine/src/image.rs, packages/native/src/image/*, utils/image-*.ts, web/lib/image-utils.ts |
+| **Integration Tests** | tests/**/* |
+| **Loader / Bootstrap** | src/loader.ts, src/resource-loader.ts, src/tool-bootstrap.ts, src/bundled-resource-path.ts, gsd/bootstrap/* |
+| **LSP** | pi-coding-agent/src/core/lsp/* |
+| **Mac Tools** | src/resources/extensions/mac-tools/* |
+| **MCP Server/Client** | src/mcp-server.ts, src/resources/extensions/mcp-client/index.ts, vscode-extension/src/gsd-client.ts, modes/rpc/* |
+| **Memory Extension** | pi-coding-agent/src/resources/extensions/memory/* |
+| **Migration** | gsd/migrate/*, src/pi-migration.ts, pi-coding-agent/src/migrations.ts, scripts/recover-*.sh |
+| **Modes** | pi-coding-agent/src/modes/* |
+| **Model System** | pi-coding-agent/src/core/model-*.ts, pi-ai/src/models*.ts, pi-ai/src/api-registry.ts, gsd/model-router.ts |
+| **Native / Rust Tools** | native/crates/engine/src/* |
+| **Node.js Bindings** | packages/native/src/* |
+| **Onboarding** | src/onboarding.ts, src/wizard.ts, web/components/gsd/onboarding/*, web/app/api/onboarding/* |
+| **Permissions** | core/extensions/project-trust.ts, core/auth-storage.ts |
+| **Remote Questions** | src/resources/extensions/remote-questions/* |
+| **Search the Web** | src/resources/extensions/search-the-web/* |
+| **Session Management** | pi-coding-agent/src/core/session-manager.ts, core/settings-manager.ts, web/app/api/session/* |
+| **Skills** | src/resources/skills/*, gsd/skill-telemetry.ts, gsd/preferences-skills.ts, core/skills.ts |
+| **Slash Commands** | src/resources/extensions/slash-commands/* |
+| **State Machine** | gsd/state.ts, gsd/history.ts, gsd/json-persistence.ts, gsd/memory-store.ts, gsd/reactive-graph.ts, core/agent-session.ts, web/lib/gsd-workspace-store.tsx |
+| **Studio App** | studio/* |
+| **Subagent** | src/resources/extensions/subagent/*, src/resources/agents/* |
+| **Syntax Highlighting** | native/crates/engine/src/highlight.rs, packages/native/src/highlight/* |
+| **Text Processing** | native/crates/engine/src/diff.rs, html.rs, text.rs, truncate.rs, json_parse.rs, stream_process.rs |
+| **Tool System** | pi-coding-agent/src/core/tools/*, core/bash-executor.ts, core/exec.ts |
+| **TTSR** | src/resources/extensions/ttsr/*, native/crates/engine/src/ttsr.rs, packages/native/src/ttsr/* |
+| **TUI Components** | packages/pi-tui/src/*, pi-coding-agent/src/modes/interactive/components/*, pi-coding-agent/src/modes/interactive/controllers/* |
+| **Universal Config** | src/resources/extensions/universal-config/* |
+| **Voice** | src/resources/extensions/voice/* |
+| **VS Code Extension** | vscode-extension/src/* |
+| **Web Mode** | src/web/*.ts, src/web-mode.ts |
+| **Web UI** | web/app/*.tsx, web/components/*, web/hooks/*, web/lib/* |
+| **Worktree** | src/worktree-cli.ts, src/worktree-name-gen.ts, gsd/worktree*.ts, tests/repro-worktree-bug/* |
diff --git a/docs/FRONTIER-TECHNIQUES.md b/docs/FRONTIER-TECHNIQUES.md
new file mode 100644
index 000000000..6aa5ad59a
--- /dev/null
+++ b/docs/FRONTIER-TECHNIQUES.md
@@ -0,0 +1,741 @@
+# Frontier Techniques for GSD-2
+
+Research into cutting-edge AI agent techniques that map directly to GSD-2's architecture, ranked by impact and feasibility.
+
+**Date:** 2026-03-25
+**Status:** Research / Pre-RFC
+
+---
+
+## Table of Contents
+
+- [Executive Summary](#executive-summary)
+- [1. Skill Library Evolution](#1-skill-library-evolution)
+- [2. DAG-Based Parallel Tool Execution](#2-dag-based-parallel-tool-execution)
+- [3. Speculative Tool Execution](#3-speculative-tool-execution)
+- [4. Semantic Context Compression](#4-semantic-context-compression)
+- [5. Cross-Session Learning Graph](#5-cross-session-learning-graph)
+- [6. MCTS-Based Planning](#6-mcts-based-planning)
+- [Priority Matrix](#priority-matrix)
+- [Sources & References](#sources--references)
+
+---
+
+## Executive Summary
+
+GSD-2 is a multi-layered, event-driven agent platform with strong extensibility primitives: a skill system, file-based memory, session branching, compaction, and 16+ extension lifecycle hooks. These existing primitives create natural integration points for six frontier techniques that could fundamentally change how GSD operates.
+
+The techniques fall into three categories:
+
+| Category | Techniques | Theme |
+|----------|-----------|-------|
+| **Self-Improvement** | Skill Library Evolution, Cross-Session Learning Graph | GSD gets better the more you use it |
+| **Performance** | DAG Tool Execution, Speculative Tool Execution | GSD gets faster per turn |
+| **Intelligence** | Semantic Context Compression, MCTS Planning | GSD reasons better with the same context budget |
+
+---
+
+## 1. Skill Library Evolution
+
+**Category:** Self-Improvement
+**Impact:** Massive | **Effort:** Medium | **Priority:** #1
+
+### What It Is
+
+Inspired by [SkillRL](https://arxiv.org/abs/2602.08234) (ICLR 2026), this technique transforms GSD's skill system from static instruction files into a self-improving knowledge base. Instead of skills being written once and updated manually, they evolve based on execution outcomes.
+
+SkillRL demonstrates that agents with learned skill libraries outperform baselines by 15.3%+ across task benchmarks, with 10-20% token compression compared to raw trajectory storage.
+
+### How It Works
+
+```
+┌─────────────────────────────────────────────────────────┐
+│ EXECUTION LOOP │
+│ │
+│ 1. Skill invoked → agent executes task │
+│ 2. Outcome captured (success/failure + trajectory) │
+│ 3. Trajectory distilled: │
+│ ├─ Success → strategic pattern extracted │
+│ └─ Failure → anti-pattern + lesson recorded │
+│ 4. Skill file updated with versioned improvement │
+│ 5. Next invocation benefits from accumulated learnings │
+│ │
+└─────────────────────────────────────────────────────────┘
+```
+
+**Two types of learned knowledge:**
+
+| Type | Description | Example |
+|------|-------------|---------|
+| **General Skills** | Universal strategic guidance applicable across tasks | "When editing TypeScript files, always check for type errors via LSP before committing" |
+| **Task-Specific Skills** | Category-level heuristics for specific skill domains | "The `fix-issue` skill should check CI status before opening a PR, not after" |
+
+### Why It Fits GSD-2
+
+GSD already has every primitive needed:
+
+- **Skill files** (`~/.agents/skills/`, `.agents/skills/`) — the storage layer exists
+- **Extension hooks** (`turn_end`, `agent_end`) — outcome capture points exist
+- **Memory system** (MEMORY.md + individual files) — persistence exists
+- **`/improve-skill` and `/heal-skill` commands** — manual versions of this loop already exist
+
+The gap is automation: connecting execution outcomes back to skill files without human intervention.
+
+### Integration Points
+
+| GSD Component | Role in Integration |
+|---------------|-------------------|
+| `agent-session.ts` → `turn_end` event | Captures execution outcome (success/failure signals) |
+| Extension hook: `agent_end` | Triggers trajectory distillation |
+| Skill file system | Receives versioned updates with learned patterns |
+| `compaction.ts` | Provides trajectory data from the session for distillation |
+
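+The loop can be sketched end to end. This is a minimal sketch, not GSD's implementation: the `TurnRecord`, `SkillLesson`, and `summarize` shapes are hypothetical, and a real distiller would call a cheap model rather than truncate the trajectory.
+
+```typescript
+type Outcome = "success" | "failure" | "partial";
+
+interface TurnRecord {
+  skill: string;        // skill that was active this turn
+  outcome: Outcome;     // classified from turn_end signals
+  trajectory: string[]; // distilled tool/step log from compaction data
+}
+
+interface SkillLesson {
+  version: number;                 // monotonic version stamp
+  kind: "pattern" | "anti-pattern";
+  text: string;
+}
+
+// Distill a finished turn into a lesson; partial outcomes are deferred.
+function distill(record: TurnRecord): SkillLesson | null {
+  if (record.outcome === "partial") return null;
+  return {
+    version: Date.now(),
+    kind: record.outcome === "success" ? "pattern" : "anti-pattern",
+    text: summarize(record.trajectory),
+  };
+}
+
+// Append the lesson as a versioned block, preserving the original skill body.
+function applyLesson(skillBody: string, lesson: SkillLesson): string {
+  return skillBody +
+    `\n<!-- learned v${lesson.version} (${lesson.kind}) -->\n${lesson.text}\n`;
+}
+
+// Placeholder distillation: a real version would be an LLM call.
+function summarize(trajectory: string[]): string {
+  return trajectory.slice(-3).join(" -> ");
+}
+```
+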
+### Architecture
+
+```
+User invokes skill
+ │
+ ▼
+┌──────────────┐ ┌──────────────────┐
+│ AgentSession │────▶│ Skill Executor │
+│ (turn_end) │ │ (tracks outcome) │
+└──────────────┘ └────────┬─────────┘
+ │
+ ┌─────────▼──────────┐
+ │ Outcome Classifier │
+ │ (success/failure/ │
+ │ partial) │
+ └─────────┬──────────┘
+ │
+ ┌───────────────┼───────────────┐
+ ▼ ▼ ▼
+ ┌────────────┐ ┌──────────────┐ ┌───────────┐
+ │ Success │ │ Failure │ │ Partial │
+ │ Distiller │ │ Distiller │ │ Analyzer │
+ └─────┬──────┘ └──────┬───────┘ └─────┬─────┘
+ │ │ │
+ ▼ ▼ ▼
+ ┌─────────────────────────────────────────────┐
+ │ Skill File Updater │
+ │ • Appends learned pattern to skill │
+ │ • Versions the update │
+ │ • Preserves original skill intent │
+ └─────────────────────────────────────────────┘
+```
+
+### Open Questions
+
+- **Drift prevention:** How to prevent accumulated learnings from overwhelming the original skill intent?
+- **Conflict resolution:** What happens when a lesson from one session contradicts another?
+- **Quality gate:** Should updates require a validation pass before being written?
+
+---
+
+## 2. DAG-Based Parallel Tool Execution
+
+**Category:** Performance
+**Impact:** High | **Effort:** Medium | **Priority:** #2
+
+### What It Is
+
+The [LLM Compiler pattern](https://arxiv.org/pdf/2312.04511) (ICML 2024) treats multi-tool workflows like a compiler optimization pass. When the model returns multiple tool calls in a single response, instead of executing them sequentially, the system:
+
+1. Analyzes dependencies between tool calls
+2. Constructs a Directed Acyclic Graph (DAG)
+3. Executes independent tools in parallel
+4. Blocks only on actual data dependencies
+
+### How It Works
+
+**Current GSD behavior (sequential):**
+```
+Read(auth.ts) ─── 150ms ───▶ result
+ │
+Read(types.ts) ─── 120ms ──▶ result
+ │
+Grep("login") ─── 80ms ────▶ result
+ │
+Read(test.ts) ─── 130ms ───▶ result
+ │
+Total: ~480ms sequential
+```
+
+**With DAG execution (parallel):**
+```
+Read(auth.ts) ─── 150ms ──▶ result ─┐
+Read(types.ts) ─── 120ms ──▶ result ─┤
+Grep("login") ─── 80ms ───▶ result ─┤── all complete at 150ms
+Read(test.ts) ─── 130ms ──▶ result ─┘
+ │
+Total: ~150ms (max of parallel set)
+```
+
+**Dependency analysis rules:**
+
+| Tool A | Tool B | Dependency? | Reason |
+|--------|--------|-------------|--------|
+| Read(file) | Read(file) | No | Reads have no side effects, so they never conflict |
+| Read(file) | Grep(pattern) | No | Independent data sources |
+| Read(file) | Edit(file) | Yes | Edit depends on Read content |
+| Edit(file) | Edit(file) | Yes | Edits to same file must serialize |
+| Bash(cmd) | Bash(cmd) | Maybe | Depends on side effects |
+| Write(file) | Read(file) | Yes | Read after write needs write to complete |
+
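+The rules above reduce to a small predicate. A sketch under simplifying assumptions (the `ToolCall` shape is hypothetical; real side-effect metadata would live on the tool definitions, and Bash is serialized conservatively since its side effects are unknown):
+
+```typescript
+interface ToolCall {
+  name: "Read" | "Grep" | "Glob" | "Edit" | "Write" | "Bash";
+  file?: string; // target file, when applicable
+}
+
+const MUTATES = new Set(["Edit", "Write"]);
+
+// True if `b` must wait for `a` to complete.
+function dependsOn(a: ToolCall, b: ToolCall): boolean {
+  // Conservative: shell commands may conflict with anything, so serialize.
+  if (a.name === "Bash" || b.name === "Bash") return true;
+  const sameFile = a.file !== undefined && a.file === b.file;
+  // Conflict only when both touch the same file and at least one mutates it.
+  return sameFile && (MUTATES.has(a.name) || MUTATES.has(b.name));
+}
+```
+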
+### Why It Fits GSD-2
+
+The model already emits multiple `tool_use` blocks in a single response. GSD processes them, but the execution path in `agent-loop.ts` handles them in sequence. The parallelism opportunity is sitting right there.
+
+**Estimated impact:** A typical coding turn involves 3-5 tool calls. If roughly 60% of those are parallelizable (reads, greps, globs), per-turn latency could drop by 40-60%. Over a 50-turn session, that's minutes saved.
+
+### Integration Points
+
+| GSD Component | Role in Integration |
+|---------------|-------------------|
+| `agent-loop.ts` tool execution path | Replace sequential execution with DAG scheduler |
+| Tool definitions | Annotate tools with side-effect metadata (pure/impure) |
+| Extension hooks (`tool_*`) | Must still fire in correct order per dependency chain |
+
+### Architecture
+
+```
+Model response with N tool_use blocks
+ │
+ ▼
+┌──────────────────────────────┐
+│ Dependency Analyzer │
+│ • Parse tool calls │
+│ • Identify file overlaps │
+│ • Identify data dependencies │
+│ • Classify: pure vs impure │
+└──────────────┬───────────────┘
+ │
+ ▼
+┌──────────────────────────────┐
+│ DAG Constructor │
+│ • Nodes = tool calls │
+│ • Edges = dependencies │
+│ • Topological sort │
+└──────────────┬───────────────┘
+ │
+ ▼
+┌──────────────────────────────┐
+│ Parallel Executor │
+│ • Execute roots immediately │
+│ • On completion, unlock │
+│ dependent nodes │
+│ • Collect all results │
+│ • Return in original order │
+└──────────────────────────────┘
+```
+
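+The executor stage can be sketched as a wave scheduler: run every node whose dependencies have resolved, await the wave, repeat. The `Node` shape here is hypothetical, not GSD's tool-call representation:
+
+```typescript
+interface Node {
+  id: string;
+  deps: string[];             // ids this node must wait for
+  run: () => Promise<string>; // the underlying tool call
+}
+
+async function executeDag(nodes: Node[]): Promise<Map<string, string>> {
+  const results = new Map<string, string>();
+  const pending = new Set(nodes);
+  while (pending.size > 0) {
+    // Roots of the remaining graph: all deps already resolved.
+    const ready = [...pending].filter(n => n.deps.every(d => results.has(d)));
+    if (ready.length === 0) throw new Error("dependency cycle detected");
+    await Promise.all(ready.map(async n => {
+      results.set(n.id, await n.run());
+      pending.delete(n);
+    }));
+  }
+  return results; // caller re-orders to match the original tool_use order
+}
+```
+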
+### Open Questions
+
+- **Bash side effects:** How to determine if two Bash commands conflict without executing them?
+- **Extension hooks:** Should `tool_start`/`tool_end` events fire in execution order or original order?
+- **Error propagation:** If a parallel tool fails, do dependent tools get cancelled or receive the error?
+
+---
+
+## 3. Speculative Tool Execution
+
+**Category:** Performance
+**Impact:** High | **Effort:** Low-Medium | **Priority:** #3
+
+### What It Is
+
+Based on [Speculative Tool Calls research](https://arxiv.org/pdf/2512.15834), this technique predicts which tools the model will request and pre-executes them before the model responds. Correct predictions eliminate the first tool-call round-trip entirely. Wrong predictions are simply discarded; the only cost is the compute spent on the unused pre-execution.
+
+### How It Works
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│ User: "fix the bug in auth.ts" │
+│ │
+│ BEFORE model responds: │
+│ Speculator predicts: │
+│ ├─ Read("auth.ts") → pre-executed ✓ │
+│ ├─ Grep("error|bug", "auth") → pre-executed ✓ │
+│ ├─ LSP diagnostics(auth.ts) → pre-executed ✓ │
+│ └─ Read("auth.test.ts") → pre-executed ✓ │
+│ │
+│ Model responds with tool calls: │
+│ ├─ Read("auth.ts") → CACHE HIT (0ms) │
+│ ├─ Read("auth.test.ts") → CACHE HIT (0ms) │
+│ └─ Grep("login", "src/") → cache miss (execute) │
+│ │
+│ Hit rate: 2/3 = 67% │
+│ Latency saved: ~300ms on this turn │
+└─────────────────────────────────────────────────────────────┘
+```
+
+**Prediction strategies (simplest to most sophisticated):**
+
+| Strategy | Description | Expected Hit Rate |
+|----------|-------------|-------------------|
+| **Keyword extraction** | Parse user prompt for file paths, function names → Read those files | 40-60% |
+| **Session history** | Track which tools follow which user prompt patterns | 50-70% |
+| **Learned patterns** | Use the skill library evolution data to predict tool sequences | 60-80% |
+| **Model pre-query** | Ask a fast/cheap model to predict tool calls | 70-85% |
+
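+The simplest strategy (keyword extraction) fits in a few lines. A sketch with assumed shapes: the `read` callback stands in for the real Read tool, and the cache key is just tool name plus arguments:
+
+```typescript
+const cache = new Map<string, Promise<string>>();
+
+const keyOf = (tool: string, arg: string) => `${tool}:${arg}`;
+
+// Pull likely file paths (tokens with an extension) out of the user prompt.
+function extractPaths(prompt: string): string[] {
+  return prompt.match(/[\w./-]+\.\w+/g) ?? [];
+}
+
+// Fire-and-forget pre-execution of predicted Reads.
+function speculate(prompt: string, read: (p: string) => Promise<string>): void {
+  for (const path of extractPaths(prompt)) {
+    const key = keyOf("Read", path);
+    if (!cache.has(key)) cache.set(key, read(path));
+  }
+}
+
+// Later, when the model actually requests Read(path): hit = no extra latency.
+function executeRead(path: string, read: (p: string) => Promise<string>): Promise<string> {
+  return cache.get(keyOf("Read", path)) ?? read(path);
+}
+```
+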
+### Why It Fits GSD-2
+
+The #1 latency bottleneck in GSD is the round-trip: user prompt → model thinks → model requests tool → tool executes → result sent back → model thinks again. Speculative execution overlaps tool execution with the model's thinking time, so a correct prediction removes that first round-trip entirely.
+
+GSD's architecture makes this easy to add:
+- `AgentSession.prompt()` already processes user input before sending to the model
+- Tool results are already cached in the message array
+- The extension system can intercept input and spawn pre-fetches
+
+### Integration Points
+
+| GSD Component | Role in Integration |
+|---------------|-------------------|
+| `AgentSession.prompt()` | Trigger speculation after user input, before model call |
+| Tool result cache (new) | Store speculated results keyed by tool+args |
+| `agent-loop.ts` tool execution | Check cache before executing; serve cached result on hit |
+| Extension hook: `input` | Parse user intent for file paths, patterns |
+
+### Architecture
+
+```
+User input arrives
+ │
+ ├──────────────────────────────────────┐
+ │ │
+ ▼ ▼
+┌───────────────┐ ┌──────────────────┐
+│ Send to LLM │ │ Speculator │
+│ (normal path) │ │ • Extract paths │
+│ │ │ • Predict tools │
+│ ... waiting │ │ • Pre-execute │
+│ for response │ │ • Cache results │
+│ │ └──────────────────┘
+│ │ │
+│ │◀─── model returns ──────────│
+│ │ tool_use blocks │
+└───────┬───────┘ │
+ │ │
+ ▼ │
+┌───────────────┐ │
+│ Tool Executor │◀──── check cache ───────────┘
+│ • Cache hit? │
+│ → return │
+│ • Cache miss? │
+│ → execute │
+└───────────────┘
+```
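+
+The "check cache" step reduces to a lookup keyed by tool name plus canonicalized arguments. A sketch — the `SpeculationCache` shape is hypothetical, and GSD's real tool executor and result types will differ:
+
+```typescript
+// Hypothetical speculation cache: results keyed by tool + canonical args.
+type ToolResult = { output: string; cachedAt: number };
+
+class SpeculationCache {
+  private store = new Map<string, ToolResult>();
+
+  private key(tool: string, args: Record<string, unknown>): string {
+    // JSON over sorted keys so {a, b} and {b, a} produce the same key.
+    const sorted = Object.keys(args).sort().map((k) => [k, args[k]]);
+    return `${tool}:${JSON.stringify(sorted)}`;
+  }
+
+  put(tool: string, args: Record<string, unknown>, output: string): void {
+    this.store.set(this.key(tool, args), { output, cachedAt: Date.now() });
+  }
+
+  /** Returns the cached result and evicts it (single-use), or undefined on miss. */
+  take(tool: string, args: Record<string, unknown>): ToolResult | undefined {
+    const k = this.key(tool, args);
+    const hit = this.store.get(k);
+    if (hit) this.store.delete(k); // single-use: the file may change after serving
+    return hit;
+  }
+}
+```
+
+Single-use eviction is one possible answer to the staleness question: a speculated result is served at most once, so a repeated call re-reads the file.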
+
+### Cost Analysis
+
+| Scenario | Cost |
+|----------|------|
+| **Correct prediction** | ~0ms latency (result already available). Compute cost: the pre-execution itself (trivial for Read/Grep). |
+| **Wrong prediction** | Wasted compute for the pre-executed tool. For Read/Grep/Glob, this is <10ms of I/O. |
+| **Partial hit** | Net positive whenever the latency saved on hits outweighs the wasted I/O on misses — roughly any hit rate above 20%, given how cheap Read/Grep misses are. |
+
+### Open Questions
+
+- **TTL for cached results:** How long are speculated results valid? File contents can change between speculation and model request.
+- **Side effects:** Should only pure tools (Read, Grep, Glob, LSP) be speculatable?
+- **Resource limits:** Cap on number of speculative executions per turn to prevent I/O storms?
+
+---
+
+## 4. Semantic Context Compression
+
+**Category:** Intelligence
+**Impact:** High | **Effort:** High | **Priority:** #4
+
+### What It Is
+
+GSD's compaction system uses a char/4 heuristic for token estimation and all-or-nothing LLM summarization for context reduction. Research from [Zylos](https://zylos.ai/research/2026-02-28-ai-agent-context-compression-strategies) and [context engineering literature](https://rlancemartin.github.io/2025/06/23/context_engineering/) shows that embedding-based compression achieves 80-90% token reduction while preserving the ability to selectively recall specific historical context.
+
+### Current GSD Compaction (Weaknesses Highlighted)
+
+```
+Messages: [M1, M2, M3, M4, M5, M6, M7, M8, M9, M10]
+ ▲
+Token budget exceeded │ recent
+ │
+Current approach:
+┌─────────────────────────┬─────────────────────────┐
+│ M1-M6: LLM-summarized │ M7-M10: kept verbatim │
+│ into single blob │ (last ~20k tokens) │
+│ │ │
+│ ⚠ All detail lost │ ✓ Full fidelity │
+│ ⚠ No selective recall │ │
+│ ⚠ char/4 overestimates │ │
+└─────────────────────────┴─────────────────────────┘
+```
+
+**Three specific weaknesses:**
+
+| Weakness | Impact | Current Code Location |
+|----------|--------|-----------------------|
+| char/4 token estimation | ~25% overestimate → compacts too early → wastes context | `compaction.ts:201-259` |
+| All-or-nothing summarization | Loses specific details that may be relevant later | `compaction.ts:327-400` |
+| No retrieval from compacted history | Once summarized, detail is gone forever | `compaction-orchestrator.ts` |
+
+### Proposed: Tiered Memory Architecture
+
+```
+┌─────────────────────────────────────────────────────────┐
+│ HOT TIER │
+│ Recent turns (last ~20k tokens) │
+│ Full text, full fidelity │
+│ Storage: in-context messages │
+│ Access: always in prompt │
+├─────────────────────────────────────────────────────────┤
+│ WARM TIER │
+│ Older turns (beyond context window) │
+│ Stored as embeddings + compressed text │
+│ Storage: session-local vector index │
+│ Access: retrieved when semantically relevant to │
+│ current turn │
+│ Token cost: only retrieved segments count │
+├─────────────────────────────────────────────────────────┤
+│ COLD TIER │
+│ Ancient turns / previous sessions │
+│ Stored as summaries + metadata │
+│ Storage: disk (existing session files) │
+│ Access: retrieved only on explicit recall │
+│ Token cost: minimal summary headers │
+└─────────────────────────────────────────────────────────┘
+```
+
+**How retrieval works per turn:**
+
+```
+New user prompt arrives
+ │
+ ▼
+┌───────────────────┐
+│ Embed the prompt │ (compute embedding of user's question)
+└────────┬──────────┘
+ │
+ ├──── query warm tier ──▶ top-K relevant historical turns
+ │ (cosine similarity > threshold)
+ │
+ ├──── always include ──▶ hot tier (recent turns, full text)
+ │
+ ▼
+┌───────────────────┐
+│ Compose context │
+│ = hot + retrieved │
+│ + system prompt │
+└───────────────────┘
+```
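+
+Warm-tier retrieval is a plain nearest-neighbor query. A minimal sketch using flat-array cosine similarity — the embedding source and tier storage are assumed, not existing GSD code:
+
+```typescript
+// Hypothetical warm-tier lookup: rank stored turn embeddings by cosine
+// similarity to the prompt embedding, return the top-K above a threshold.
+interface WarmEntry { text: string; embedding: number[] }
+
+function cosine(a: number[], b: number[]): number {
+  let dot = 0, na = 0, nb = 0;
+  for (let i = 0; i < a.length; i++) {
+    dot += a[i] * b[i];
+    na += a[i] * a[i];
+    nb += b[i] * b[i];
+  }
+  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
+}
+
+export function retrieveWarm(
+  promptEmbedding: number[],
+  tier: WarmEntry[],
+  k = 3,
+  threshold = 0.5,
+): string[] {
+  return tier
+    .map((e) => ({ e, score: cosine(promptEmbedding, e.embedding) }))
+    .filter((s) => s.score > threshold)
+    .sort((x, y) => y.score - x.score)
+    .slice(0, k)
+    .map((s) => s.e.text);
+}
+```
+
+A flat scan like this is fine for session-scale indexes (hundreds of turns); an HNSW index only pays off at much larger scale.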
+
+### Token Estimation Improvement
+
+Replace char/4 with adaptive estimation:
+
+| Approach | Accuracy | Cost |
+|----------|----------|------|
+| **char/4 (current)** | ~75% (overestimates) | Zero |
+| **Provider-reported usage** | 100% (for last turn) | Zero (already tracked) |
+| **tiktoken/provider tokenizer** | ~98% | ~5ms per message |
+| **Hybrid: actual for recent, char/4 for old** | ~95% | Negligible |
+
+The hybrid approach — use actual token counts from provider responses for recent messages, fall back to char/4 for older messages — is a quick win that requires no new dependencies.
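+
+A sketch of that hybrid estimator — the message shape and `actualTokens` field are illustrative, not GSD's real types:
+
+```typescript
+// Hybrid token estimation: trust provider-reported counts where present,
+// fall back to the char/4 heuristic otherwise.
+interface Msg {
+  text: string;
+  actualTokens?: number; // usage reported by the provider, when available
+}
+
+export function estimateTokens(messages: Msg[]): number {
+  return messages.reduce(
+    (sum, m) => sum + (m.actualTokens ?? Math.ceil(m.text.length / 4)),
+    0,
+  );
+}
+```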
+
+### Integration Points
+
+| GSD Component | Role in Integration |
+|---------------|-------------------|
+| `compaction.ts` | Replace cut-point algorithm with tiered approach |
+| `compaction-orchestrator.ts` | Add warm-tier retrieval before model call |
+| `agent-session.ts` message building | Inject retrieved warm-tier segments |
+| Session persistence layer | Store embeddings alongside session entries |
+
+### Open Questions
+
+- **Embedding model:** Local (fast, private) or API (better quality, adds latency)?
+- **Index format:** Simple cosine similarity on flat arrays vs. HNSW index?
+- **Retrieval budget:** How many tokens to allocate to warm-tier retrievals per turn?
+- **Coherence:** How to prevent retrieved historical context from confusing the model about the current state?
+
+---
+
+## 5. Cross-Session Learning Graph
+
+**Category:** Self-Improvement
+**Impact:** Transformative | **Effort:** High | **Priority:** #5
+
+### What It Is
+
+GSD's memory system (MEMORY.md + individual files) stores flat, file-based memories. A learning graph extends this into a structured knowledge base that captures relationships between codebases, files, errors, solutions, and patterns across all sessions.
+
+This is informed by research on [agent memory architectures](https://github.com/Shichun-Liu/Agent-Memory-Paper-List) and the emerging discipline of [context engineering](https://thenewstack.io/memory-for-ai-agents-a-new-paradigm-of-context-engineering/).
+
+### Current Memory vs Learning Graph
+
+| Aspect | Current (MEMORY.md) | Learning Graph |
+|--------|---------------------|----------------|
+| **Structure** | Flat file list | Nodes + edges (graph) |
+| **Relationships** | None | "file X often breaks when Y changes" |
+| **Retrieval** | All loaded into context | Query-driven, only relevant nodes |
+| **Learning** | Manual (user says "remember X") | Automatic from execution outcomes |
+| **Scope** | Per-project directory | Per-project with cross-project patterns |
+| **Staleness** | Manual cleanup | Confidence decay over time |
+
+### Graph Schema
+
+```
+┌──────────┐ touches ┌──────────┐
+│ Session │────────────────▶│ File │
+│ │ │ │
+│ • date │ │ • path │
+│ • outcome │ │ • type │
+│ • tokens │ │ • churn │
+└────┬──────┘ └─────┬─────┘
+ │ │
+ │ encountered │ involved_in
+ │ │
+ ▼ ▼
+┌──────────┐ resolved_by ┌──────────┐
+│ Error │────────────────▶│ Solution │
+│ │ │ │
+│ • type │ │ • pattern │
+│ • message │ │ • success │
+│ • freq │ │ rate │
+└──────────┘ └──────────┘
+ │ │
+ │ prevented_by │ uses
+ │ │
+ ▼ ▼
+┌──────────┐ ┌──────────┐
+│ Pattern │ │ Tool │
+│ │ │ │
+│ • type │ │ • name │
+│ • desc │ │ • avg │
+│ • conf │ │ time │
+└──────────┘ └──────────┘
+```
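+
+In TypeScript terms, the schema above is a small set of node and edge types plus traversals over them. A sketch — all names are hypothetical, and the query deliberately ignores edge direction:
+
+```typescript
+// Hypothetical learning-graph types mirroring the schema diagram,
+// plus one query expressed as a direction-agnostic neighbor scan.
+type NodeKind = "session" | "file" | "error" | "solution" | "pattern" | "tool";
+type EdgeKind =
+  | "touches" | "encountered" | "involved_in"
+  | "resolved_by" | "prevented_by" | "uses";
+
+interface GraphNode {
+  id: string;
+  kind: NodeKind;
+  props: Record<string, string | number>;
+}
+
+interface GraphEdge {
+  from: string; // node id
+  to: string;   // node id
+  kind: EdgeKind;
+  weight: number; // e.g. co-occurrence count or success rate
+}
+
+/** "What errors have occurred in this file?" — error nodes adjacent to it. */
+export function connectedErrors(
+  fileId: string,
+  nodes: GraphNode[],
+  edges: GraphEdge[],
+): GraphNode[] {
+  const neighborIds = new Set<string>();
+  for (const e of edges) {
+    if (e.from === fileId) neighborIds.add(e.to);
+    if (e.to === fileId) neighborIds.add(e.from);
+  }
+  return nodes.filter((n) => n.kind === "error" && neighborIds.has(n.id));
+}
+```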
+
+### Example Queries
+
+| Query | Result |
+|-------|--------|
+| "What errors have occurred in `auth.ts`?" | List of error nodes connected to that file node |
+| "What's the typical fix for `TypeError` in this codebase?" | Solution nodes with highest success rate for that error type |
+| "Which files tend to break together?" | File clusters with high co-occurrence in error sessions |
+| "What tools are slowest in this project?" | Tool nodes sorted by avg execution time |
+
+### Integration Points
+
+| GSD Component | Role in Integration |
+|---------------|-------------------|
+| `session-manager.ts` | Write graph nodes on session save |
+| `agent-session.ts` prompt building | Query graph for relevant context before model call |
+| Memory system (MEMORY.md) | Coexists — graph handles structured knowledge, memory handles preferences/feedback |
+| Extension hook: `agent_end` | Trigger graph update with session outcome |
+
+### Storage Options
+
+| Option | Pros | Cons |
+|--------|------|------|
+| **SQLite + json columns** | Simple, no dependencies, fast queries | No native vector search |
+| **SQLite + sqlite-vss** | Adds vector similarity to SQLite | Extra native dependency |
+| **Flat JSON files** | Zero dependencies, git-friendly | Slow for large graphs |
+| **LanceDB** | Embedded vector DB, no server | Additional dependency |
+
+### Open Questions
+
+- **Privacy:** Graph contains detailed codebase interaction history — should it be encrypted at rest?
+- **Portability:** Should the graph travel with the project (e.g. committed under `.gsd/`) or stay user-local?
+- **Garbage collection:** How to prune stale nodes (e.g., files that no longer exist)?
+
+---
+
+## 6. MCTS-Based Planning
+
+**Category:** Intelligence
+**Impact:** Transformative | **Effort:** Very High | **Priority:** #6
+
+### What It Is
+
+Inspired by [ToolTree](https://www.agentic-patterns.com/patterns/skill-library-evolution/) and Monte Carlo Tree Search, this technique replaces GSD's linear action selection with a tree-based planner that explores multiple solution paths simultaneously.
+
+Instead of the model deciding one action at a time and hoping it works, the system:
+
+1. Generates N candidate next-actions
+2. Scores each based on estimated probability of reaching the goal
+3. Explores promising branches in parallel
+4. Backtracks when a path fails, without wasting the user's context on dead ends
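+
+The loop those four steps describe can be sketched as a propose → score → explore → prune skeleton. Everything here is an assumption — GSD has no planner API today, and scoring and execution are stubbed:
+
+```typescript
+// Skeleton of the propose → score → explore → prune loop.
+interface Candidate { action: string; score: number }
+
+export function selectBranch(
+  propose: () => string[],            // generate N candidate actions
+  score: (action: string) => number,  // estimated P(reaching the goal)
+  execute: (action: string) => boolean, // true = branch succeeded
+  maxBranches = 3,
+): string | undefined {
+  const ranked: Candidate[] = propose()
+    .map((action) => ({ action, score: score(action) }))
+    .sort((a, b) => b.score - a.score)
+    .slice(0, maxBranches);           // budget: explore only the top-N
+  for (const c of ranked) {
+    if (execute(c.action)) return c.action; // first success wins
+    // failure → prune this branch, fall through to the next candidate
+  }
+  return undefined; // all branches failed → fall back to linear execution
+}
+```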
+
+### Current vs MCTS Approach
+
+**Current (linear):**
+```
+User: "fix the auth bug"
+ │
+ ▼
+Action 1: Read auth.ts ──▶ Action 2: Edit line 45 ──▶ Action 3: Run tests
+ │
+ Tests fail ✗
+ │
+ ▼
+ Action 4: Try different edit
+ │
+ Tests fail ✗
+ │
+ ▼
+ Action 5: Read error log...
+ (linear flailing)
+```
+
+**With MCTS (tree search):**
+```
+User: "fix the auth bug"
+ │
+ ▼
+Read auth.ts
+ │
+ ├── Branch A: Edit line 45 (score: 0.6)
+ │ └── Run tests → FAIL → prune
+ │
+ ├── Branch B: Check auth middleware (score: 0.7) ◀── highest score
+ │ └── Edit middleware.ts → Run tests → PASS ✓
+ │
+ └── Branch C: Check env config (score: 0.3)
+ └── (not explored — lower score)
+
+Result: Branch B succeeds after 2 actions, not 5+
+```
+
+### Why It Fits GSD-2
+
+GSD already has session branching primitives:
+- `fork()` creates a branch from any message
+- Branch summaries compress history at fork points
+- Tree navigation (`/tree`) lets users explore branches
+- Session tree is already a first-class concept
+
+The gap: these primitives are user-triggered. MCTS would make the agent trigger them automatically during problem-solving.
+
+### Architecture
+
+```
+┌─────────────────────────────────────────────────────────┐
+│ MCTS Planning Layer │
+│ │
+│ ┌─────────────┐ ┌──────────────┐ ┌────────────┐ │
+│ │ Proposer │───▶│ Scorer │───▶│ Selector │ │
+│ │ Generate N │ │ Estimate P │ │ Pick best │ │
+│ │ candidates │ │ of success │ │ to explore │ │
+│ └─────────────┘ └──────────────┘ └─────┬──────┘ │
+│ │ │
+│ ┌─────────────┐ ┌──────────────┐ │ │
+│ │ Pruner │◀───│ Executor │◀─────────┘ │
+│ │ Kill dead │ │ Run action │ │
+│ │ branches │ │ in worktree │ │
+│ └─────────────┘ └──────────────┘ │
+└─────────────────────────────────────────────────────────┘
+ │
+ ▼
+┌─────────────────────┐
+│ Agent Session │
+│ (receives winning │
+│ branch as result) │
+└─────────────────────┘
+```
+
+### Scoring Approaches
+
+| Approach | Speed | Quality | Cost |
+|----------|-------|---------|------|
+| **Heuristic** (file relevance, error proximity) | Fast | Low | Free |
+| **Fast model** (haiku-class rates candidates) | Medium | Medium | Low |
+| **Self-evaluation** (main model rates its own proposals) | Slow | High | High |
+| **Learned scorer** (trained on past outcomes from learning graph) | Fast | High | Free at inference |
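+
+The heuristic row is cheap enough to prototype first. A sketch scoring a candidate by how many of the files it would touch appear in the failing error output — the weights are arbitrary and purely illustrative:
+
+```typescript
+// Hypothetical heuristic scorer: candidates that touch files mentioned in
+// the most recent failure output score higher. Weights are arbitrary.
+export function heuristicScore(
+  candidateFiles: string[], // files the candidate action would touch
+  errorOutput: string,      // most recent failure text
+): number {
+  if (candidateFiles.length === 0) return 0.1; // no file signal at all
+  const mentioned = candidateFiles.filter((f) => errorOutput.includes(f));
+  // Base 0.3 plus up to 0.6 for error proximity.
+  return 0.3 + 0.6 * (mentioned.length / candidateFiles.length);
+}
+```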
+
+### Integration Points
+
+| GSD Component | Role in Integration |
+|---------------|-------------------|
+| `agent-loop.ts` | New planning phase between user prompt and action execution |
+| Session branching (`fork()`) | Used to create exploration branches |
+| Git worktrees | Each branch explored in an isolated worktree |
+| `agent-session.ts` | Receives the winning branch and presents it as the result |
+| Skill Library Evolution (#1) | Provides learned patterns to improve the scorer over time |
+
+### Cost-Benefit Analysis
+
+| Factor | Value |
+|--------|-------|
+| **LLM calls per turn** | 2-5x more (proposal generation + scoring) |
+| **Token usage** | 3-10x more per complex problem |
+| **Success rate on hard problems** | Estimated 30-50% improvement |
+| **Time to solution** | Fewer total turns despite more LLM calls per turn |
+| **User experience** | Agent appears to "think harder" on hard problems |
+
+### Open Questions
+
+- **When to activate:** MCTS is expensive. Should it only activate when the agent detects a hard problem (repeated failures, high uncertainty)?
+- **Branch isolation:** Git worktrees work for file changes, but how to isolate Bash side effects?
+- **Budget control:** How many branches to explore before falling back to linear execution?
+- **Transparency:** Should the user see the exploration tree or just the winning path?
+
+---
+
+## Priority Matrix
+
+| # | Technique | Impact | Effort | Compounding | Dependencies |
+|---|-----------|--------|--------|-------------|--------------|
+| 1 | **Skill Library Evolution** | Massive | Medium | Yes — improves all other techniques | None |
+| 2 | **DAG Tool Execution** | High | Medium | No — static speedup | None |
+| 3 | **Speculative Tool Execution** | High | Low-Med | Yes — improves with learning | Benefits from #1 |
+| 4 | **Semantic Context Compression** | High | High | No — static improvement | None |
+| 5 | **Cross-Session Learning Graph** | Transformative | High | Yes — feeds #1, #3, #6 | Benefits from #1 |
+| 6 | **MCTS Planning** | Transformative | Very High | Yes — improves with #1, #5 | Benefits from #1, #5 |
+
+### Recommended Implementation Order
+
+```
+Phase 1 (Foundation) Phase 2 (Performance) Phase 3 (Intelligence)
+───────────────────── ───────────────────── ─────────────────────
+┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
+│ Skill Library │ │ DAG Tool Exec │ │ Semantic Context│
+│ Evolution │──feeds──▶│ │ │ Compression │
+│ │ │ Speculative │ │ │
+│ │──feeds──▶│ Tool Exec │ │ MCTS Planning │
+└─────────────────┘ └─────────────────┘ └─────────────────┘
+ │ ▲
+┌─────────────────┐ │ │
+│ Cross-Session │───────────────────┴──────────────────────────┘
+│ Learning Graph │ (feeds intelligence layer)
+└─────────────────┘
+```
+
+**Phase 1** creates the feedback loop that makes everything else better over time.
+**Phase 2** delivers immediate, measurable performance wins.
+**Phase 3** requires the most architectural change but delivers the deepest capability gains.
+
+---
+
+## Sources & References
+
+### Papers
+
+- [SkillRL: Evolving Agents via Recursive Skill-Augmented RL](https://arxiv.org/abs/2602.08234) — ICLR 2026. Skill library evolution framework.
+- [LLMCompiler: An LLM Compiler for Parallel Function Calling](https://arxiv.org/pdf/2312.04511) — ICML 2024. DAG-based tool execution.
+- [Optimizing Agentic LLM Inference via Speculative Tool Calls](https://arxiv.org/pdf/2512.15834) — Speculative execution for agent tools.
+- [RISE: Recursive Introspection for Self-Improvement](https://proceedings.neurips.cc/paper_files/paper/2024/file/639d992f819c2b40387d4d5170b8ffd7-Paper-Conference.pdf) — NeurIPS 2024. Self-improving LLM agents.
+- [Don't Break the Cache: Prompt Caching for Agentic Tasks](https://arxiv.org/html/2601.06007v1) — Prompt caching evaluation.
+- [Efficient LLM Serving for Agentic Workflows](https://arxiv.org/html/2603.16104v1) — Systems perspective on agent serving.
+
+### Industry & Analysis
+
+- [Context Engineering for Agents](https://rlancemartin.github.io/2025/06/23/context_engineering/) — Lance Martin's comprehensive guide.
+- [AI Agent Context Compression Strategies](https://zylos.ai/research/2026-02-28-ai-agent-context-compression-strategies) — Zylos Research, Feb 2026.
+- [Context Engineering for Coding Agents](https://martinfowler.com/articles/exploring-gen-ai/context-engineering-coding-agents.html) — Martin Fowler.
+- [Memory for AI Agents: A New Paradigm](https://thenewstack.io/memory-for-ai-agents-a-new-paradigm-of-context-engineering/) — The New Stack.
+- [LLM Compiler Agent Pattern](https://agent-patterns.readthedocs.io/en/stable/patterns/llm-compiler.html) — Agent Patterns documentation.
+- [Skill Library Evolution Pattern](https://www.agentic-patterns.com/patterns/skill-library-evolution/) — Awesome Agentic Patterns.
+
+### Workshops & Events
+
+- [ICLR 2026 Workshop on AI with Recursive Self-Improvement](https://iclr.cc/virtual/2026/workshop/10000796)
+- [Agent Memory Paper List](https://github.com/Shichun-Liu/Agent-Memory-Paper-List) — Comprehensive survey.
+- [Awesome Context Engineering](https://github.com/Meirtz/Awesome-Context-Engineering) — Papers, frameworks, guides.
diff --git a/docs/PRD-branchless-worktree-architecture.md b/docs/PRD-branchless-worktree-architecture.md
new file mode 100644
index 000000000..4c511353c
--- /dev/null
+++ b/docs/PRD-branchless-worktree-architecture.md
@@ -0,0 +1,383 @@
+# PRD: Branchless Worktree Architecture
+
+**Author:** Lex Christopherson
+**Date:** 2026-03-15
+**ADR:** [ADR-001-branchless-worktree-architecture.md](./ADR-001-branchless-worktree-architecture.md)
+**Priority:** Critical — blocks reliable auto-mode operation
+
+---
+
+## Problem Statement
+
+GSD's auto-mode is unreliable. Users experience:
+
+1. **Infinite loop detection failures** — the agent writes planning artifacts on slice branches that become invisible after branch switching, causing `verifyExpectedArtifact()` to fail repeatedly. Auto-mode burns budget retrying the same unit 3-6 times before hard-stopping. This is the #1 user complaint.
+
+2. **State corruption across branches** — `.gsd/` planning artifacts (roadmaps, plans, decisions) are gitignored but branch-specific. Multiple branches sharing a single `.gsd/` directory clobber each other's state. Users see wrong milestones marked complete, wrong roadmaps loaded, and auto-mode starting from the wrong phase.
+
+3. **Excessive complexity** — 770+ lines of merge, conflict resolution, branch switching, and self-healing code exist solely to manage slice branches inside worktrees. This code has required 15+ bug fixes across versions and remains the primary source of auto-mode failures.
+
+These problems are architectural. They cannot be fixed by patching individual symptoms.
+
+## Vision
+
+Auto-mode uses git worktrees for isolation and sequential commits for history. No branch switching. No merge conflicts within a worktree. Planning artifacts are tracked in git and travel with the branch. The git layer is so simple it can't break.
+
+## Success Criteria
+
+| Criterion | Measurement |
+|-----------|-------------|
+| Zero loop detection failures from branch visibility | No `verifyExpectedArtifact()` failures caused by branch mismatch in 50 consecutive auto-mode runs |
+| Zero `.gsd/` state corruption | Manual worktrees created via `git worktree add` have correct `.gsd/` state without any GSD-specific initialization |
+| Code deletion | Net removal of ≥500 lines of merge/conflict/branch-switching code |
+| Test simplification | Removal or simplification of ≥6 merge-specific test files |
+| Backwards compatibility | Existing projects with `gsd/M001/S01` slice branches continue to work (read-only; new work uses new model) |
+| No new git primitives | The implementation uses only: worktrees, commits, squash-merge. No new branch types, merge strategies, or conflict resolution. |
+
+## Non-Goals
+
+- Parallel slice execution within a single worktree (if needed later, use separate worktrees)
+- Changing how milestones relate to `main` (squash-merge stays)
+- Modifying the dispatch unit types or state machine (except removing `fix-merge`)
+- Changing the `worktree-manager.ts` manual worktree API (`/worktree` command)
+
+## Current Architecture
+
+### Branch Model (M003, v2.13.0)
+
+```
+main
+ └─ milestone/M001 (worktree at .gsd/worktrees/M001/)
+ ├─ gsd/M001/S01 (slice branch — code + .gsd/ artifacts)
+ │ └── merge --no-ff → milestone/M001
+ ├─ gsd/M001/S02
+ │ └── merge --no-ff → milestone/M001
+ └── squash merge → main
+```
+
+### Data Flow
+
+```
+Agent writes file → on slice branch → handleAgentEnd → auto-commit on slice branch
+→ switch to milestone branch → verifyExpectedArtifact → FILE NOT FOUND (it's on slice branch)
+→ loop counter++ → retry → same result → HARD STOP
+```
+
+### Code Involved
+
+| File | Lines | Purpose |
+|------|-------|---------|
+| `auto-worktree.ts` | 512 | Worktree lifecycle + slice→milestone merge |
+| `git-service.ts` | 915 | Branch creation, switching, merge with conflict resolution |
+| `git-self-heal.ts` | 198 | Merge failure recovery |
+| `auto.ts` | ~150 | Merge dispatch guards, fix-merge routing, branch-mode vs worktree-mode branching |
+| `worktree.ts` | ~40 | Slice branch delegates |
+| 11 test files | ~2,000 | Merge/branch/worktree test coverage |
+
+### `.gsd/` Tracking (Current — Contradictory)
+
+- `.gitignore` line 52: `.gsd/` — ignores everything
+- `smartStage()` lines 338-349: force-adds `GSD_DURABLE_PATHS` — tracks milestones/, DECISIONS.md, PROJECT.md, REQUIREMENTS.md, QUEUE.md
+- Result: `.gsd/milestones/` is partially tracked on some branches, fully ignored on others. The code fights the config.
+
+## Proposed Architecture
+
+### Branch Model
+
+```
+main
+ └─ milestone/M001 (worktree at .gsd/worktrees/M001/)
+ │
+ commit: feat(M001): context + roadmap
+ commit: feat(M001/S01): research
+ commit: feat(M001/S01): plan
+ commit: feat(M001/S01/T01): implement auth service
+ commit: feat(M001/S01/T02): implement auth tests
+ commit: feat(M001/S01): summary + UAT
+ commit: docs(M001): reassess roadmap after S01
+ commit: feat(M001/S02): research
+ commit: feat(M001/S02): plan
+ commit: ...
+ commit: feat(M001): milestone complete
+ │
+ └── squash merge → main
+```
+
+One branch. Sequential commits. No merges within the worktree.
+
+### Data Flow
+
+```
+Agent writes file → on milestone branch → handleAgentEnd → auto-commit on milestone branch
+→ verifyExpectedArtifact → FILE FOUND (same branch) → persist completion → next dispatch
+```
+
+### `.gsd/` Tracking (Proposed — Coherent)
+
+**Tracked (travels with branch):**
+```
+.gsd/milestones/**/*.md (except CONTINUE markers)
+.gsd/milestones/**/*.json (META.json integration records)
+.gsd/PROJECT.md
+.gsd/DECISIONS.md
+.gsd/REQUIREMENTS.md
+.gsd/QUEUE.md
+```
+
+**Gitignored (ephemeral):**
+```
+.gsd/auto.lock
+.gsd/completed-units.json
+.gsd/STATE.md
+.gsd/metrics.json
+.gsd/gsd.db
+.gsd/activity/
+.gsd/runtime/
+.gsd/worktrees/
+.gsd/DISCUSSION-MANIFEST.json
+.gsd/milestones/**/*-CONTINUE.md
+.gsd/milestones/**/continue.md
+```
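+
+One way to express this split in an actual `.gitignore` is a blanket ignore followed by negations — a sketch that may need tuning against GSD's real layout:
+
+```
+# Ignore .gsd/ contents by default…
+.gsd/*
+# …then re-include the tracked planning artifacts.
+!.gsd/milestones/
+!.gsd/PROJECT.md
+!.gsd/DECISIONS.md
+!.gsd/REQUIREMENTS.md
+!.gsd/QUEUE.md
+# Ephemeral files inside the re-included tree stay ignored.
+.gsd/milestones/**/*-CONTINUE.md
+.gsd/milestones/**/continue.md
+```
+
+Note the negation targets the `.gsd/milestones/` directory itself: git cannot re-include a file whose parent directory is excluded, so the directory must be un-ignored before its contents are visible.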
+
+### Why This Works
+
+| Problem | How It's Solved |
+|---------|----------------|
+| Artifact invisibility after branch switch | No branch switching. Artifacts commit on the one branch. |
+| `.gsd/` state clobbering | Artifacts tracked in git. Each branch carries its own `.gsd/`. `git worktree add` and `git checkout` give correct state. |
+| Merge conflict complexity | No merges within a worktree. Only merge is milestone→main (squash). |
+| Manual worktree initialization | Tracked artifacts are checked out with the branch. No GSD-specific bootstrap needed. |
+| Dual isolation mode maintenance | Single mode: worktree. Branch-mode (`git.isolation: "branch"`) deprecated. |
+
+## Implementation Plan
+
+### Phase 1: `.gitignore` + Tracking Fix
+
+**Goal:** Planning artifacts are tracked in git. `.gitignore` reflects reality.
+
+1. Update `.gitignore`:
+ - Remove blanket `.gsd/` ignore
+ - Add explicit runtime-only ignores (see proposed list above)
+
+2. Force-add existing planning artifacts on current branch:
+ ```
+ git add --force .gsd/milestones/ .gsd/PROJECT.md .gsd/DECISIONS.md .gsd/REQUIREMENTS.md .gsd/QUEUE.md
+ ```
+
+3. Ensure runtime files are NOT tracked:
+ ```
+ git rm --cached -r .gsd/runtime/ .gsd/activity/ .gsd/STATE.md .gsd/metrics.json .gsd/completed-units.json .gsd/auto.lock
+ ```
+
+4. Update README suggested `.gitignore` section
+
+5. Remove `smartStage()`'s force-add of `GSD_DURABLE_PATHS` — no longer needed once `.gitignore` stops blocking those paths
+
+**Verification:** `git status` shows planning artifacts tracked, runtime files untracked. `git worktree add` on a new worktree has correct `.gsd/milestones/` state.
+
+### Phase 2: Remove Slice Branch Creation + Switching
+
+**Goal:** No code creates, switches to, or references slice branches for new work.
+
+1. Remove `ensureSliceBranch()` from `git-service.ts` (lines 485-544)
+2. Remove `switchToMain()` from `git-service.ts` (lines 549-563)
+3. Remove `getSliceBranchName()` from `worktree.ts` (lines 94-98)
+4. Remove `isOnSliceBranch()` and `getActiveSliceBranch()` from `worktree.ts`
+5. Update `auto.ts` dispatch paths — remove branch creation before `execute-task`
+6. Update `handleAgentEnd` — remove branch-switching logic post-dispatch
+
+**Verification:** Auto-mode runs a full slice (research → plan → execute → complete) without creating any branches. All commits land on `milestone/`.
+
+### Phase 3: Remove Slice Merge Code
+
+**Goal:** All slice→milestone and slice→main merge code is deleted.
+
+1. Remove `mergeSliceToMilestone()` from `auto-worktree.ts` (lines 253-350)
+2. Remove `mergeSliceToMain()` from `git-service.ts` (lines 705-893)
+3. Remove merge dispatch guards from `auto.ts` (lines 1635-1679)
+4. Remove `fix-merge` dispatch unit type from `auto.ts`
+5. Remove `buildPromptForFixMerge()` from `auto.ts`
+6. Remove `withMergeHeal()` from `git-self-heal.ts` (lines 99-136)
+7. Remove `abortAndReset()` from `git-self-heal.ts` (lines 37-84) — or simplify to crash-recovery-only
+8. Remove `shouldUseWorktreeIsolation()` preference resolution — worktree is the only mode
+9. Remove `getMergeToMainMode()` — milestone merge is the only mode
+10. Deprecate `git.isolation: "branch"` and `git.merge_to_main: "slice"` preferences
+
+**Verification:** `git grep mergeSliceToMilestone` returns zero results. `git grep mergeSliceToMain` returns zero results. `git grep fix-merge` returns zero results (outside of changelog/docs).
+
+### Phase 4: Simplify `mergeMilestoneToMain()`
+
+**Goal:** Milestone→main merge is clean and minimal.
+
+The function becomes:
+1. Auto-commit any dirty state in worktree
+2. `process.chdir(originalBasePath)` — back to main repo
+3. `git checkout main`
+4. `git merge --squash milestone/`
+5. Build commit message with milestone summary + slice manifest
+6. `git commit`
+7. Optional: `git push`
+8. `removeWorktree()` + `git branch -D milestone/`
+
+No conflict categorization. No runtime file stripping (runtime files are gitignored, not in the merge). No `.gsd/` special handling.
+
+If the squash-merge conflicts (parallel-milestone edge case): stop auto-mode with a clear error; the user resolves it manually, or GSD dispatches a one-time resolution session.
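+
+Step 5 is the only non-trivial logic left in the function. A sketch of the message builder — the `SliceRecord` shape and function name are hypothetical:
+
+```typescript
+// Hypothetical builder for the milestone→main squash commit message:
+// a conventional-commit summary line plus a manifest of completed slices.
+interface SliceRecord { id: string; title: string }
+
+export function buildSquashMessage(
+  milestoneId: string,
+  summary: string,
+  slices: SliceRecord[],
+): string {
+  const manifest = slices.map((s) => `- ${s.id}: ${s.title}`).join("\n");
+  return `feat(${milestoneId}): ${summary}\n\nSlices:\n${manifest}`;
+}
+```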
+
+**Verification:** Complete a full milestone in auto-mode. `main` receives one squash commit with all code and planning artifacts.
+
+### Phase 5: Test Cleanup
+
+**Goal:** Test suite reflects the simplified architecture.
+
+1. Delete or rewrite:
+ - `auto-worktree-merge.test.ts` — tests slice→milestone merge (deleted)
+ - `auto-worktree-milestone-merge.test.ts` — rewrite for simplified milestone→main
+ - `worktree-e2e.test.ts` — rewrite for branchless flow
+ - `worktree-integration.test.ts` — rewrite for branchless flow
+ - Merge-related test cases in `git-service.test.ts`
+
+2. Add new tests:
+ - Branchless worktree lifecycle: create → commit → commit → squash-merge → cleanup
+ - `.gsd/` tracking: planning artifacts tracked, runtime files ignored
+ - Manual worktree: `git worktree add` has correct `.gsd/` state
+ - Crash recovery: dirty state on milestone branch, restart, auto-commit, continue
+
+3. Remove merge-specific doctor checks or simplify:
+ - `corrupt_merge_state` — keep (still relevant for milestone→main)
+ - `orphaned_auto_worktree` — keep
+ - `stale_milestone_branch` — keep
+ - `tracked_runtime_files` — keep
+
+**Verification:** `npm run test` passes. No test references `mergeSliceToMilestone`, `mergeSliceToMain`, or `ensureSliceBranch`.
+
+### Phase 6: Migration + Backwards Compatibility
+
+**Goal:** Existing projects with slice branches continue to work.
+
+1. State derivation (`deriveState()`) continues to read `gsd/M001/S01` branch naming for legacy detection
+2. On first run after upgrade:
+ - Detect existing slice branches
+ - Notify user: "GSD no longer creates slice branches. Existing branches are preserved but new work commits directly to the milestone branch."
+ - No forced migration — legacy branches are read-only context
+3. Doctor check: `legacy_slice_branches` — informational, not auto-fix
+4. Update `shouldUseWorktreeIsolation()` preference handling:
+ - `git.isolation: "worktree"` → default behavior (only option)
+ - `git.isolation: "branch"` → warning, treated as worktree
+ - Remove preference UI for isolation mode
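+
+Legacy detection in step 1 is a branch-name match. A sketch — the regex assumes the `gsd/M001/S01` naming convention shown in the current branch model:
+
+```typescript
+// Hypothetical detector for legacy slice branches (gsd/<milestone>/<slice>).
+const LEGACY_SLICE = /^gsd\/(M\d{3})\/(S\d{2})$/;
+
+export function legacySliceBranches(
+  branches: string[],
+): { branch: string; milestone: string; slice: string }[] {
+  return branches.flatMap((branch) => {
+    const m = LEGACY_SLICE.exec(branch);
+    return m ? [{ branch, milestone: m[1], slice: m[2] }] : [];
+  });
+}
+```
+
+Feeding this `git branch --format='%(refname:short)'` output is enough to drive both the upgrade notification and the `legacy_slice_branches` doctor check.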
+
+**Verification:** Open a project with existing `gsd/M001/S01` branches. GSD reads state correctly, new work commits on milestone branch without slice branches.
+
+## Stress Test Results
+
+Validated by three independent models:
+
+### Gemini 2.5 Pro — 6 Attack Vectors
+
+| Attack | Severity | Mitigation |
+|--------|----------|------------|
+| Parallel milestone code conflict at squash-merge | Medium | `git rebase main` before squash. Rare in single-user. |
+| SQLite desync after `git reset --hard` | Low | DB rebuilt from tracked markdown on startup (M001/S02 importers). |
+| Ghost lock after SIGKILL | Low | Existing heartbeat lock detection handles this. |
+| Squash merge loses bisect granularity | Low | Commit messages tag slices. Branch preservable if needed. |
+| Disk space with multiple worktrees | Low | Single active milestone at a time. Immediate cleanup. |
+| Plan-action atomicity gap (crash between write and commit) | Low | `handleAgentEnd` auto-commits. Sequential model simplifies recovery. |
+
+### GPT-5.4 (Codex) — Codebase-Informed Analysis
+
+- Confirmed `smartStage()` force-add already implements tracked-artifact intent
+- Confirmed `resolveMainWorktreeRoot` (PR #487) contradicts this architecture
+- Confirmed `.gsd/milestones/` partially tracked on `main` despite `.gitignore`
+- Verdict: **Model is sound. Removes only accidental complexity.**
+
+### GPT-5.4 (Codex) — Dissenting Opinion
+
+Codex agreed on tracked artifacts and worktree-per-milestone, but pushed back on removing slice branches, calling it "a redesign, not a simplification." Specific concerns:
+
+| Concern | Rebuttal |
+|---------|----------|
+| Crash recovery for orphaned slice branches disappears | The failure mode (orphaned branch needing merge) is caused by slice branches. Removing branches removes the failure. Sequential commits on one branch need no orphan recovery. |
+| Concurrent edits to shared root docs (DECISIONS.md) from two terminals | Standard content conflict at squash-merge time. Not caused by or solved by slice branches. |
+| Continuous integration via slice→milestone merges | In sequential single-user work, there's nothing to integrate against within the worktree. Pre-flight rebase before squash-merge is more direct. |
+| Need a replacement slice-boundary primitive | Accepted: conventional commit tags (`feat(M001/S01):`) + optional git tags (`gsd/M001/S01-complete`) serve as boundaries. |
+
+Codex's analysis confirms the tracked-artifact approach but recommends treating branchless as a deliberate redesign with explicit replacement primitives, not a casual deletion.
+
+### Edge Case: Two Milestones Touching Same Source Files
+
+Scenario: M001 and M002 both modify `src/auth.ts`. M001 squash-merges first.
+
+Resolution: Before M002 squash-merges, rebase onto updated `main`:
+```
+cd .gsd/worktrees/M002
+git fetch origin main
+git rebase main
+# Resolve any conflicts (code-only, never .gsd/)
+# Then squash-merge
+```
+
+This is a standard git workflow. GSD can automate the rebase step as a pre-merge check.
+
+### Edge Case: Agent Crash Mid-Commit
+
+Scenario: Power loss during `git commit` on the milestone branch.
+
+Resolution: Git's internal journaling protects the object store. On restart:
+- If commit completed: state is consistent
+- If the commit didn't complete: the working directory has uncommitted changes, and `handleAgentEnd` auto-commits them on the next dispatch
+- No branch to be "stuck between" — single branch means no split-brain state
+
+### Edge Case: User Edits Main While Worktree Active
+
+Scenario: User makes manual commits on `main` while M001 worktree is active.
+
+Resolution: Worktree is on `milestone/M001` branch, independent of `main`. Manual `main` commits don't affect the worktree. At squash-merge time, `git merge --squash` handles the divergence normally. If there's a conflict, it's resolved once.
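
A throwaway-repo sketch of this scenario (the real flow uses a worktree, but a plain branch checkout shows the same divergence-then-squash behavior; the `git config` lines just make commits work in a clean environment):

```shell
# Simulate: user commits on main while milestone/M001 work proceeds,
# then squash-merge the milestone back as one commit.
set -euo pipefail
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.email ci@example.com
git config user.name ci
git commit -q --allow-empty -m "init"
git switch -q -c milestone/M001
echo work > feature.txt
git add feature.txt
git commit -q -m "feat(M001/S01): add feature"
git switch -q main
git commit -q --allow-empty -m "manual commit on main"  # user edit during milestone
git merge --squash milestone/M001 >/dev/null
git commit -q -m "M001: complete milestone"
test -f feature.txt  # milestone work arrives despite the divergence
```

The squash-merge succeeds without conflict because the manual commit and the milestone touched disjoint paths; when they overlap, the conflict is resolved once at this point.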
+
+## Metrics
+
+### Before (Current)
+
+| Metric | Value |
+|--------|-------|
+| Merge/conflict/branch code | 770+ lines across 4 files |
+| Merge-related test files | 11 files |
+| Branch types | 4 (main, milestone/*, gsd/*/*, worktree/*) |
+| Merge strategies | 3 (--no-ff, --squash, conflict resolution) |
+| Dispatch unit types with merge logic | 2 (complete-slice, fix-merge) |
+| Isolation modes | 2 (branch, worktree) |
+| Doctor git checks | 4 |
+
+### After (Proposed)
+
+| Metric | Value |
+|--------|-------|
+| Merge/conflict/branch code | ~50 lines (simplified `mergeMilestoneToMain` only) |
+| Merge-related test files | 3-4 files (rewritten) |
+| Branch types | 2 (main, milestone/*) |
+| Merge strategies | 1 (--squash) |
+| Dispatch unit types with merge logic | 0 |
+| Isolation modes | 1 (worktree) |
+| Doctor git checks | 3-4 (simplified) |
+
+### Net Impact
+
+- **~720 lines deleted** (net, after simplified replacements)
+- **~7 test files deleted or consolidated**
+- **2 branch types eliminated**
+- **2 merge strategies eliminated**
+- **1 dispatch unit type eliminated** (fix-merge)
+- **1 isolation mode eliminated** (branch)
+- **0 merge conflicts possible within a worktree**
+
+## Dependencies
+
+- **M001 (Memory Database):** The SQLite database (`gsd.db`) must remain gitignored. The M001/S02 importer layer rebuilds it from tracked markdown. This PRD's `.gitignore` update explicitly ignores `gsd.db`.
+
+- **PR #487:** Must be closed. The `resolveMainWorktreeRoot` approach (sharing `.gsd/` across worktrees) contradicts tracked-artifact architecture.
+
+## Open Questions
+
+1. **Squash vs `--no-ff` for milestone→main merge?** Squash gives clean history on `main` but loses bisect granularity. `--no-ff` preserves granular commits but clutters `main`. Current proposal: squash (matching existing behavior), with option to preserve milestone branch for debugging.
+
+2. **Should `worktrees/` move outside `.gsd/`?** Having worktrees inside `.gsd/` creates a nesting-doll pattern (worktree contains `.gsd/` which is inside `.gsd/worktrees/`). Relocating to `.gsd-worktrees/` or `~/.gsd/worktrees//` is cleaner but changes the filesystem layout. Recommendation: defer, address separately if it causes issues.
+
+3. **Pre-flight rebase automation?** Before milestone→main squash-merge, should GSD automatically `git rebase main`? Gemini recommends yes. Risk: rebase can fail with conflicts, adding a code path. Recommendation: implement as a doctor check ("milestone branch is behind main by N commits") with manual resolution, automate later if needed.
diff --git a/docs/README.md b/docs/README.md
new file mode 100644
index 000000000..c37b303c0
--- /dev/null
+++ b/docs/README.md
@@ -0,0 +1,53 @@
+# GSD Documentation
+
+Welcome to the GSD documentation. This covers everything from getting started to advanced configuration, auto-mode internals, and extending GSD with the Pi SDK.
+
+## User Documentation
+
+| Guide | Description |
+|-------|-------------|
+| [Getting Started](./getting-started.md) | Installation, first run, and basic usage |
+| [Auto Mode](./auto-mode.md) | How autonomous execution works — the state machine, crash recovery, and steering |
+| [Commands Reference](./commands.md) | All commands, keyboard shortcuts, and CLI flags |
+| [Remote Questions](./remote-questions.md) | Discord and Slack integration for headless auto-mode |
+| [Configuration](./configuration.md) | Preferences, model selection, git settings, and token profiles |
+| [Custom Models](./custom-models.md) | Add custom providers (Ollama, vLLM, LM Studio, proxies) via models.json |
+| [Token Optimization](./token-optimization.md) | Token profiles, context compression, complexity routing, and adaptive learning (v2.17) |
+| [Dynamic Model Routing](./dynamic-model-routing.md) | Complexity-based model selection, cost tables, escalation, and budget pressure (v2.19) |
+| [Captures & Triage](./captures-triage.md) | Fire-and-forget thought capture during auto-mode with automated triage (v2.19) |
+| [Workflow Visualizer](./visualizer.md) | Interactive TUI overlay for progress, dependencies, metrics, and timeline (v2.19) |
+| [Cost Management](./cost-management.md) | Budget ceilings, cost tracking, projections, and enforcement modes |
+| [Git Strategy](./git-strategy.md) | Worktree isolation, branching model, and merge behavior |
+| [Parallel Orchestration](./parallel-orchestration.md) | Run multiple milestones simultaneously with worker isolation and coordination |
+| [Working in Teams](./working-in-teams.md) | Unique milestone IDs, `.gitignore` setup, and shared planning artifacts |
+| [Skills](./skills.md) | Bundled skills, skill discovery, and custom skill authoring |
+| [Migration from v1](./migration.md) | Migrating `.planning` directories from the original GSD |
+| [Troubleshooting](./troubleshooting.md) | Common issues, `/gsd doctor` (real-time visibility v2.40), `/gsd forensics` (full debugger v2.40), and recovery procedures |
+| [Web Interface](./web-interface.md) | Browser-based project management with `pi --web` (v2.41) |
+| [VS Code Extension](../vscode-extension/README.md) | Chat participant, sidebar dashboard, and RPC integration for VS Code |
+
+## Architecture & Internals
+
+| Guide | Description |
+|-------|-------------|
+| [Architecture Overview](./architecture.md) | System design, extension model, state-on-disk, and dispatch pipeline |
+| [Native Engine](../native/README.md) | Rust N-API modules for performance-critical operations |
+| [ADR-001: Branchless Worktree Architecture](./ADR-001-branchless-worktree-architecture.md) | Decision record for the v2.14 git architecture |
+| [ADR-003: Pipeline Simplification](./ADR-003-pipeline-simplification.md) | Research merged into planning, mechanical completion (v2.30) |
+
+## Pi SDK Documentation
+
+These guides cover the underlying Pi SDK that GSD is built on. Useful if you want to extend GSD or build your own agent application.
+
+| Guide | Description |
+|-------|-------------|
+| [What is Pi](./what-is-pi/README.md) | Core concepts — modes, agent loop, sessions, tools, providers |
+| [Extending Pi](./extending-pi/README.md) | Building extensions — tools, commands, UI, events, state |
+| [Context & Hooks](./context-and-hooks/README.md) | Context pipeline, hook reference, inter-extension communication |
+| [Pi UI / TUI](./pi-ui-tui/README.md) | Terminal UI components, theming, keyboard input, rendering |
+
+## Research
+
+| Guide | Description |
+|-------|-------------|
+| [Building Coding Agents](./building-coding-agents/README.md) | Research notes on agent design — decomposition, context engineering, cost/quality tradeoffs |
diff --git a/docs/agent-knowledge-index.md b/docs/agent-knowledge-index.md
new file mode 100644
index 000000000..6d9cb6c77
--- /dev/null
+++ b/docs/agent-knowledge-index.md
@@ -0,0 +1,222 @@
+# Agent Knowledge Index
+
+Use this file as a machine-operational routing table for pi docs and research references.
+
+Rules:
+
+- Read only the specific files relevant to the current task.
+- Prefer the primary bundle first.
+- Read files in parallel when the task clearly maps to multiple known references.
+- Use absolute paths directly with `read`.
+- Follow conditional references only when the primary bundle does not answer the question.
+
+## Pi architecture
+
+Use when:
+
+- understanding how pi works end to end
+- tracing subsystem relationships
+- understanding sessions, compaction, models, tools, or prompt flow
+- deciding how to embed pi in a branded app, custom CLI, desktop app, or web product
+
+Read first:
+
+- `/Users/lexchristopherson/.gsd/docs/what-is-pi/01-what-pi-is.md`
+- `/Users/lexchristopherson/.gsd/docs/what-is-pi/04-the-architecture-how-everything-fits-together.md`
+- `/Users/lexchristopherson/.gsd/docs/what-is-pi/05-the-agent-loop-how-pi-thinks.md`
+
+Read together when relevant:
+
+- `/Users/lexchristopherson/.gsd/docs/what-is-pi/06-tools-how-pi-acts-on-the-world.md`
+- `/Users/lexchristopherson/.gsd/docs/what-is-pi/07-sessions-memory-that-branches.md`
+- `/Users/lexchristopherson/.gsd/docs/what-is-pi/08-compaction-how-pi-manages-context-limits.md`
+- `/Users/lexchristopherson/.gsd/docs/what-is-pi/09-the-customization-stack.md`
+- `/Users/lexchristopherson/.gsd/docs/what-is-pi/10-providers-models-multi-model-by-default.md`
+- `/Users/lexchristopherson/.gsd/docs/what-is-pi/13-context-files-project-instructions.md`
+
+Follow-up if needed:
+
+- `/Users/lexchristopherson/.gsd/docs/what-is-pi/03-the-four-modes-of-operation.md`
+- `/Users/lexchristopherson/.gsd/docs/what-is-pi/11-the-interactive-tui.md`
+- `/Users/lexchristopherson/.gsd/docs/what-is-pi/12-the-message-queue-talking-while-pi-thinks.md`
+- `/Users/lexchristopherson/.gsd/docs/what-is-pi/14-the-sdk-rpc-embedding-pi.md`
+- `/Users/lexchristopherson/.gsd/docs/what-is-pi/15-pi-packages-the-ecosystem.md`
+- `/Users/lexchristopherson/.gsd/docs/what-is-pi/16-why-pi-matters-what-makes-it-different.md`
+- `/Users/lexchristopherson/.gsd/docs/what-is-pi/17-file-reference-all-documentation.md`
+- `/Users/lexchristopherson/.gsd/docs/what-is-pi/18-quick-reference-commands-shortcuts.md`
+- `/Users/lexchristopherson/.gsd/docs/what-is-pi/19-building-branded-apps-on-top-of-pi.md`
+
+## Context engineering, hooks, and context flow
+
+Use when:
+
+- understanding how user prompts flow through to the LLM
+- working with before_agent_start, context, tool_call, tool_result, input hooks
+- injecting, filtering, or transforming LLM context
+- understanding message types and what the LLM actually sees
+- coordinating multiple extensions
+- building mode systems, presets, or context management extensions
+- debugging why the LLM does or doesn't see certain information
+
+Read first:
+
+- `/Users/lexchristopherson/.gsd/docs/context-and-hooks/01-the-context-pipeline.md`
+- `/Users/lexchristopherson/.gsd/docs/context-and-hooks/02-hook-reference.md`
+
+Read together when relevant:
+
+- `/Users/lexchristopherson/.gsd/docs/context-and-hooks/03-context-injection-patterns.md`
+- `/Users/lexchristopherson/.gsd/docs/context-and-hooks/04-message-types-and-llm-visibility.md`
+- `/Users/lexchristopherson/.gsd/docs/context-and-hooks/05-inter-extension-communication.md`
+- `/Users/lexchristopherson/.gsd/docs/context-and-hooks/06-advanced-patterns-from-source.md`
+- `/Users/lexchristopherson/.gsd/docs/context-and-hooks/07-the-system-prompt-anatomy.md`
+
+## Extension development
+
+Use when:
+
+- building or modifying extensions
+- adding tools, commands, hooks, renderers, state, or packaging
+
+Read first:
+
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/01-what-are-extensions.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/02-architecture-mental-model.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/03-getting-started.md`
+
+Read together when relevant:
+
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/06-the-extension-lifecycle.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/07-events-the-nervous-system.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/08-extensioncontext-what-you-can-access.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/09-extensionapi-what-you-can-do.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/10-custom-tools-giving-the-llm-new-abilities.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/11-custom-commands-user-facing-actions.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/14-custom-rendering-controlling-what-the-user-sees.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/25-slash-command-subcommand-patterns.md` (subcommand-style slash command UX via `getArgumentCompletions()`)
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/15-system-prompt-modification.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/22-key-rules-gotchas.md`
+
+Follow-up if needed:
+
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/04-extension-locations-discovery.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/05-extension-structure-styles.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/12-custom-ui-visual-components.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/13-state-management-persistence.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/16-compaction-session-control.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/17-model-provider-management.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/18-remote-execution-tool-overrides.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/19-packaging-distribution.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/20-mode-behavior.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/21-error-handling.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/23-file-reference-documentation.md`
+- `/Users/lexchristopherson/.gsd/docs/extending-pi/24-file-reference-example-extensions.md`
+
+## Pi UI and TUI
+
+Use when:
+
+- building dialogs, widgets, overlays, custom editors, or UI renderers
+- working on TUI layout or display behavior
+
+Read first:
+
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/01-the-ui-architecture.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/03-entry-points-how-ui-gets-on-screen.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/22-quick-reference-all-ui-apis.md`
+
+Read together when relevant:
+
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/04-built-in-dialog-methods.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/05-persistent-ui-elements.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/06-ctx-ui-custom-full-custom-components.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/07-built-in-components-the-building-blocks.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/12-overlays-floating-modals-and-panels.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/13-custom-editors-replacing-the-input.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/14-tool-rendering-custom-tool-display.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/15-message-rendering-custom-message-display.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/21-common-mistakes-and-how-to-avoid-them.md`
+
+Follow-up if needed:
+
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/02-the-component-interface-foundation-of-everything.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/08-high-level-components-from-pi-coding-agent.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/09-keyboard-input-how-to-handle-keys.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/10-line-width-the-cardinal-rule.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/11-theming-colors-and-styles.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/16-performance-caching-and-invalidation.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/17-theme-changes-and-invalidation.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/18-ime-support-the-focusable-interface.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/19-building-a-complete-component-step-by-step.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/20-real-world-patterns-from-examples.md`
+- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/23-file-reference-example-extensions-with-ui.md`
+
+## Building coding agents
+
+Use when:
+
+- designing agent behavior
+- improving autonomy, speed, context handling, or decomposition
+- solving hard ambiguity, safety, or verification problems
+
+Read first:
+
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/01-work-decomposition.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/06-maximizing-agent-autonomy-superpowers.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/11-god-tier-context-engineering.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/12-handling-ambiguity-contradiction.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/26-cross-cutting-themes-where-all-4-models-converge.md`
+
+Read together when relevant:
+
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/03-state-machine-context-management.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/04-optimal-storage-for-project-context.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/05-parallelization-strategy.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/07-system-prompt-llm-vs-deterministic-split.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/08-speed-optimization.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/10-top-10-pitfalls-to-avoid.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/17-irreversible-operations-safety-architecture.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/20-error-taxonomy-routing.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/24-security-trust-boundaries.md`
+
+Follow-up if needed:
+
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/02-what-to-keep-discard-from-human-engineering.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/09-top-10-tips-for-a-world-class-agent.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/13-long-running-memory-fidelity.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/14-multi-agent-semantic-conflict-resolution.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/15-legacy-code-brownfield-onboarding.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/16-encoding-taste-aesthetics.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/18-the-handoff-problem-agent-human-maintainability.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/19-when-to-scrap-and-start-over.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/21-cost-quality-tradeoff-model-routing.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/22-cross-project-learning-reusable-intelligence.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/23-evolution-across-project-scale.md`
+- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/25-designing-for-non-technical-users-vibe-coders.md`
+
+## Pi product docs
+
+Use when:
+
+- the user asks about pi itself, its SDK, extensions, themes, skills, packages, TUI, prompt templates, keybindings, or custom providers
+
+Read first:
+
+- `/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/README.md`
+
+Read together when relevant:
+
+- `/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/docs/extensions.md`
+- `/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/docs/themes.md`
+- `/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/docs/skills.md`
+- `/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/docs/prompt-templates.md`
+- `/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/docs/tui.md`
+- `/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/docs/keybindings.md`
+- `/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/docs/sdk.md`
+- `/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/docs/custom-provider.md`
+- `/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/docs/models.md`
+- `/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/docs/packages.md`
+
+Follow-up if needed:
+
+- `/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/examples`
diff --git a/docs/architecture.md b/docs/architecture.md
new file mode 100644
index 000000000..a166c148b
--- /dev/null
+++ b/docs/architecture.md
@@ -0,0 +1,162 @@
+# Architecture Overview
+
+GSD is a TypeScript application built on the [Pi SDK](https://github.com/badlogic/pi-mono). It embeds the Pi coding agent and extends it with the GSD workflow engine, auto mode state machine, and project management primitives.
+
+## System Structure
+
+```
+gsd (CLI binary)
+ └─ loader.ts Sets PI_PACKAGE_DIR, GSD env vars, dynamic-imports cli.ts
+ └─ cli.ts Wires SDK managers, loads extensions, starts InteractiveMode
+ ├─ onboarding.ts First-run setup wizard (LLM provider + tool keys)
+ ├─ wizard.ts Env hydration from stored auth.json credentials
+ ├─ app-paths.ts ~/.gsd/agent/, ~/.gsd/sessions/, auth.json
+ ├─ resource-loader.ts Syncs bundled extensions + agents to ~/.gsd/agent/
+ └─ src/resources/
+ ├─ extensions/gsd/ Core GSD extension
+ ├─ extensions/... 12 supporting extensions
+ ├─ agents/ scout, researcher, worker
+ ├─ AGENTS.md Agent routing instructions
+ └─ GSD-WORKFLOW.md Manual bootstrap protocol
+
+gsd headless Headless mode — CI/cron orchestration via RPC child process
+gsd --mode mcp MCP server mode — exposes tools over stdin/stdout
+
+vscode-extension/ VS Code extension — chat participant (@gsd), sidebar dashboard, RPC integration
+```
+
+## Key Design Decisions
+
+### State Lives on Disk
+
+`.gsd/` is the sole source of truth. Auto mode reads it, writes it, and advances based on what it finds. No in-memory state survives across sessions. This enables crash recovery, multi-terminal steering, and session resumption.
+
+### Two-File Loader Pattern
+
+`loader.ts` sets all environment variables with zero SDK imports, then dynamically imports `cli.ts`, which performs static SDK imports. This ensures `PI_PACKAGE_DIR` is set before any SDK code evaluates.
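
A minimal sketch of the pattern (names and paths are illustrative; `node:path` stands in for the real `cli.ts`):

```typescript
// loader.ts equivalent: assign env vars at top level, with zero SDK imports.
process.env.PI_PACKAGE_DIR = "/opt/gsd/pkg"; // illustrative path

async function boot(): Promise<string> {
  // Dynamic import defers module evaluation until after the assignment above.
  // A static `import ... from "./cli.js"` would be hoisted and evaluated
  // before PI_PACKAGE_DIR exists.
  const cli = await import("node:path"); // stand-in for "./cli.js"
  return cli.basename(process.env.PI_PACKAGE_DIR ?? "");
}
```

The ordering guarantee comes entirely from the dynamic import: hoisted static imports run before any statement in the importing module.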
+
+### `pkg/` Shim Directory
+
+`PI_PACKAGE_DIR` points to `pkg/` (not project root) to avoid Pi's theme resolution colliding with GSD's `src/` directory. Contains only `piConfig` and theme assets.
+
+### Always-Overwrite Sync
+
+Bundled extensions and agents are synced to `~/.gsd/agent/` on every launch, not just on the first run. This means `npm update -g` takes effect immediately.
+
+### Lazy Provider Loading
+
+LLM provider SDKs (Anthropic, OpenAI, Google, etc.) are lazy-loaded on first use rather than imported at startup. This significantly reduces cold-start time — only the provider you actually connect to gets loaded.
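
A sketch of the mechanism, assuming a registry keyed by provider id (`node:os` stands in for a real provider SDK module):

```typescript
type Provider = { id: string };

const providerCache = new Map<string, Promise<Provider>>();

// The first call for a given id triggers the (expensive) SDK import; later
// calls reuse the cached promise, so startup pays for zero providers.
function getProvider(id: string): Promise<Provider> {
  let loading = providerCache.get(id);
  if (!loading) {
    loading = import("node:os").then(() => ({ id })); // stand-in for the SDK module
    providerCache.set(id, loading);
  }
  return loading;
}
```

Caching the promise rather than the resolved module also deduplicates concurrent first requests for the same provider.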
+
+### Fresh Session Per Unit
+
+Every dispatch creates a new agent session. The LLM starts with a clean context window containing only the pre-inlined artifacts it needs. This prevents quality degradation from context accumulation.
+
+## Bundled Extensions
+
+| Extension | What It Provides |
+|-----------|-----------------|
+| **GSD** | Core workflow engine — auto mode, state machine, commands, dashboard |
+| **Browser Tools** | Playwright-based browser automation — navigation, forms, screenshots, PDF export, device emulation, visual regression, structured data extraction, route mocking, accessibility tree inspection, and semantic actions |
+| **Search the Web** | Brave Search, Tavily, or Jina page extraction |
+| **Google Search** | Gemini-powered web search with AI-synthesized answers |
+| **Context7** | Up-to-date library/framework documentation |
+| **Background Shell** | Long-running process management with readiness detection |
+| **Subagent** | Delegated tasks with isolated context windows |
+| **Mac Tools** | macOS native app automation via Accessibility APIs |
+| **MCP Client** | Native MCP server integration via @modelcontextprotocol/sdk |
+| **Voice** | Real-time speech-to-text (macOS, Linux) |
+| **Slash Commands** | Custom command creation |
+| **LSP** | Language Server Protocol — diagnostics, definitions, references, hover, rename |
+| **Ask User Questions** | Structured user input with single/multi-select |
+| **Secure Env Collect** | Masked secret collection |
+| **Async Jobs** | Background command execution with `async_bash`, `await_job`, `cancel_job` |
+| **Remote Questions** | Discord, Slack, and Telegram integration for headless question routing |
+| **TTSR** | Tool-triggered system rules — conditional context injection based on tool usage |
+| **Universal Config** | Discovery of existing AI tool configurations (Claude Code, Cursor, Windsurf, etc.) |
+
+## Bundled Agents
+
+| Agent | Role |
+|-------|------|
+| **Scout** | Fast codebase recon — compressed context for handoff |
+| **Researcher** | Web research — finds and synthesizes current information |
+| **Worker** | General-purpose execution in an isolated context window |
+
+## Native Engine
+
+Performance-critical operations use a Rust N-API engine:
+
+- **grep** — ripgrep-backed content search
+- **glob** — gitignore-aware file discovery
+- **ps** — cross-platform process tree management
+- **highlight** — syntect-based syntax highlighting
+- **ast** — structural code search via ast-grep
+- **diff** — fuzzy text matching and unified diff generation
+- **text** — ANSI-aware text measurement and wrapping
+- **html** — HTML-to-Markdown conversion
+- **image** — decode, encode, resize images
+- **fd** — fuzzy file path discovery
+- **clipboard** — native clipboard access
+- **git** — libgit2-backed git read operations (v2.16+)
+- **parser** — GSD file parsing and frontmatter extraction
+
+## Dispatch Pipeline
+
+The auto mode dispatch pipeline:
+
+```
+1. Read disk state (STATE.md, roadmap, plans)
+2. Determine next unit type and ID
+3. Classify complexity → select model tier
+4. Apply budget pressure adjustments
+5. Check routing history for adaptive adjustments
+6. Dynamic model routing (if enabled) → select cheapest model for tier
+7. Resolve effective model (with fallbacks)
+8. Check pending captures → triage if needed
+9. Build dispatch prompt (applying inline level compression)
+10. Create fresh agent session
+11. Inject prompt and let LLM execute
+12. On completion: snapshot metrics, verify artifacts, persist state
+13. Loop to step 1
+```
+
+Phase skipping (from token profile) gates steps 2-3: if a phase is skipped, the corresponding unit type is never dispatched.
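
The loop's essential shape can be sketched in TypeScript (all names here are illustrative stand-ins, not GSD's actual API):

```typescript
type Unit = { id: string };

// Stand-in for steps 1-2: derive the next unit purely from persisted state.
function nextUnit(pendingOnDisk: string[]): Unit | null {
  const id = pendingOnDisk.shift();
  return id ? { id } : null;
}

// Steps 3-12 collapse into one stand-in per iteration; the key property is
// that each cycle re-reads disk state rather than trusting in-memory leftovers.
function dispatchLoop(pendingOnDisk: string[]): string[] {
  const completed: string[] = [];
  for (let unit = nextUnit(pendingOnDisk); unit; unit = nextUnit(pendingOnDisk)) {
    completed.push(unit.id); // fresh session + prompt build + execute + persist
  }
  return completed;
}
```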
+
+## Key Modules (v2.33)
+
+| Module | Purpose |
+|--------|---------|
+| `auto.ts` | Auto-mode state machine and orchestration |
+| `auto/session.ts` | `AutoSession` class — all mutable auto-mode state in one encapsulated instance |
+| `auto-dispatch.ts` | Declarative dispatch table (phase → unit mapping) |
+| `auto-idempotency.ts` | Completed-key checks, skip loop detection, key eviction |
+| `auto-stuck-detection.ts` | Stuck loop recovery and unit retry escalation |
+| `auto-start.ts` | Fresh-start bootstrap — git/state init, crash lock detection, worktree setup |
+| `auto-post-unit.ts` | Post-unit processing — commit, doctor, state rebuild, hooks |
+| `auto-verification.ts` | Post-unit verification gate (lint/test/typecheck with auto-fix retries) |
+| `auto-prompts.ts` | Prompt builders with inline level compression |
+| `auto-worktree.ts` | Worktree lifecycle (create, enter, merge, teardown) |
+| `auto-recovery.ts` | Expected artifact resolution, completed-key persistence, self-healing |
+| `auto-timeout-recovery.ts` | Timed-out unit recovery and continuation |
+| `auto-timers.ts` | Unit supervision — soft/idle/hard timeouts, continue-here monitor |
+| `complexity-classifier.ts` | Unit complexity classification (light/standard/heavy) |
+| `model-router.ts` | Dynamic model routing with cost-aware selection |
+| `model-cost-table.ts` | Built-in per-model cost data for cross-provider comparison |
+| `routing-history.ts` | Adaptive learning from routing outcomes |
+| `captures.ts` | Fire-and-forget thought capture and triage classification |
+| `triage-resolution.ts` | Capture resolution (inject, defer, replan, quick-task) |
+| `visualizer-overlay.ts` | Workflow visualizer TUI overlay |
+| `visualizer-data.ts` | Data loading for visualizer tabs |
+| `visualizer-views.ts` | Tab renderers (progress, deps, metrics, timeline, discussion status) |
+| `metrics.ts` | Token and cost tracking ledger |
+| `state.ts` | State derivation from disk |
+| `session-lock.ts` | OS-level exclusive session locking (proper-lockfile) |
+| `crash-recovery.ts` | Lock file management for crash detection and recovery |
+| `preferences.ts` | Preference loading, merging, validation |
+| `git-service.ts` | Git operations — commit, merge, worktree sync, completed-units cross-boundary sync |
+| `unit-id.ts` | Centralized `parseUnitId()` — milestone/slice/task extraction from unit IDs |
+| `error-utils.ts` | `getErrorMessage()` — unified error-to-string conversion |
+| `roadmap-slices.ts` | Roadmap parser with prose fallback for LLM-generated variants |
+| `memory-extractor.ts` | Extract reusable knowledge from session transcripts |
+| `memory-store.ts` | Persistent memory store for cross-session knowledge |
+| `queue-order.ts` | Milestone queue ordering |
diff --git a/docs/auto-mode.md b/docs/auto-mode.md
new file mode 100644
index 000000000..5d2c47e3a
--- /dev/null
+++ b/docs/auto-mode.md
@@ -0,0 +1,301 @@
+# Auto Mode
+
+Auto mode is GSD's autonomous execution engine. Run `/gsd auto`, walk away, come back to built software with clean git history.
+
+## How It Works
+
+Auto mode is a **state machine driven by files on disk**. It reads `.gsd/STATE.md`, determines the next unit of work, creates a fresh agent session, injects a focused prompt with all relevant context pre-inlined, and lets the LLM execute. When the LLM finishes, auto mode reads disk state again and dispatches the next unit.
+
+### The Loop
+
+Each slice flows through phases automatically:
+
+```
+Plan (with integrated research) → Execute (per task) → Complete → Reassess Roadmap → Next Slice
+ ↓ (all slices done)
+ Validate Milestone → Complete Milestone
+```
+
+- **Plan** — scouts the codebase, researches relevant docs, and decomposes the slice into tasks with must-haves
+- **Execute** — runs each task in a fresh context window
+- **Complete** — writes summary, UAT script, marks roadmap, commits
+- **Reassess** — checks if the roadmap still makes sense
+- **Validate Milestone** — reconciliation gate after all slices complete; compares roadmap success criteria against actual results, catches gaps before sealing the milestone
+
+## Key Properties
+
+### Fresh Session Per Unit
+
+Every task, research phase, and planning step gets a clean context window. No accumulated garbage. No degraded quality from context bloat. The dispatch prompt includes everything needed — task plans, prior summaries, dependency context, decisions register — so the LLM starts oriented instead of spending tool calls reading files.
+
+### Context Pre-Loading
+
+The dispatch prompt is carefully constructed with:
+
+| Inlined Artifact | Purpose |
+|------------------|---------|
+| Task plan | What to build |
+| Slice plan | Where this task fits |
+| Prior task summaries | What's already done |
+| Dependency summaries | Cross-slice context |
+| Roadmap excerpt | Overall direction |
+| Decisions register | Architectural context |
+
+The amount of context inlined is controlled by your [token profile](./token-optimization.md). Budget mode inlines minimal context; quality mode inlines everything.
+
+### Git Isolation
+
+GSD isolates milestone work using one of three modes (configured via `git.isolation` in preferences):
+
+- **`worktree`** (default): Each milestone runs in its own git worktree at `.gsd/worktrees/<milestone-id>/` on a `milestone/<milestone-id>` branch. All slice work commits sequentially — no branch switching, no merge conflicts mid-milestone. When the milestone completes, it's squash-merged to main as one clean commit.
+- **`branch`**: Work happens in the project root on a `milestone/<milestone-id>` branch. Useful for submodule-heavy repos where worktrees don't work well.
+- **`none`**: Work happens directly on your current branch. No worktree, no milestone branch. Ideal for hot-reload workflows where file isolation breaks dev tooling.
+
+See [Git Strategy](./git-strategy.md) for details.
+
+### Parallel Execution
+
+When your project has independent milestones, you can run them simultaneously. Each milestone gets its own worker process and worktree. See [Parallel Orchestration](./parallel-orchestration.md) for setup and usage.
+
+### Crash Recovery
+
+A lock file tracks the current unit. If the session dies, the next `/gsd auto` reads the surviving session file, synthesizes a recovery briefing from every tool call that made it to disk, and resumes with full context.
+
+**Headless auto-restart (v2.26):** When running `gsd headless auto`, crashes trigger automatic restart with exponential backoff (5s → 10s → 30s cap, default 3 attempts). Configure with `--max-restarts N`. A SIGINT or SIGTERM bypasses the restart. Combined with crash recovery, this enables true overnight "run until done" execution.
+
+### Provider Error Recovery
+
+GSD classifies provider errors and auto-resumes when safe:
+
+| Error type | Examples | Action |
+|-----------|----------|--------|
+| **Rate limit** | 429, "too many requests" | Auto-resume after retry-after header or 60s |
+| **Server error** | 500, 502, 503, "overloaded", "api_error" | Auto-resume after 30s |
+| **Permanent** | "unauthorized", "invalid key", "billing" | Pause indefinitely (requires manual resume) |
+
+No manual intervention needed for transient errors — the session pauses briefly and continues automatically.
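+
+As a sketch, the classification in the table above maps to a small dispatcher. The function name, message matching, and return shape here are illustrative assumptions, not GSD's actual API:
+
+```typescript
+// Hedged sketch of the error-classification table; names are assumptions.
+type ErrorAction = { kind: "resume"; delayMs: number } | { kind: "pause" };
+
+function classifyProviderError(message: string, retryAfterSec?: number): ErrorAction {
+  const m = message.toLowerCase();
+  // Permanent: pause until a human intervenes.
+  if (["unauthorized", "invalid key", "billing"].some(s => m.includes(s))) {
+    return { kind: "pause" };
+  }
+  // Rate limit: honor the retry-after header, else wait 60s.
+  if (m.includes("429") || m.includes("too many requests")) {
+    return { kind: "resume", delayMs: (retryAfterSec ?? 60) * 1000 };
+  }
+  // Everything else is treated as a transient server error: resume after 30s.
+  return { kind: "resume", delayMs: 30_000 };
+}
+```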
+
+### Incremental Memory (v2.26)
+
+GSD maintains a `KNOWLEDGE.md` file — an append-only register of project-specific rules, patterns, and lessons learned. The agent reads it at the start of every unit and appends to it when discovering recurring issues, non-obvious patterns, or rules that future sessions should follow. This gives auto-mode cross-session memory that survives context window boundaries.
+
+### Context Pressure Monitor (v2.26)
+
+When context usage reaches 70%, GSD sends a wrap-up signal to the agent, nudging it to finish durable output (commit, write summaries) before the context window fills. This prevents sessions from hitting the hard context limit mid-task with no artifacts written.
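+
+The trigger condition is simple to state. As an illustrative sketch (the 70% threshold comes from the text; the function name is hypothetical):
+
+```typescript
+// Fires the wrap-up signal once context usage crosses the threshold.
+function shouldSignalWrapUp(tokensUsed: number, contextLimit: number, threshold = 0.7): boolean {
+  return tokensUsed / contextLimit >= threshold;
+}
+```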
+
+### Meaningful Commit Messages (v2.26)
+
+Commits are generated from task summaries — not generic "complete task" messages. Each commit message reflects what was actually built, giving clean `git log` output that reads like a changelog.
+
+### Stuck Detection (v2.39)
+
+GSD uses a sliding-window analysis to detect stuck loops. Instead of a simple "same unit dispatched twice" counter, the detector examines recent dispatch history for repeated patterns — catching cycles like A→B→A→B as well as single-unit repeats. On detection, GSD retries once with a deep diagnostic prompt. If the retry fails too, auto mode stops and reports the exact file it expected, so you can intervene.
+
+The sliding-window approach reduces false positives on legitimate retries (e.g., verification failures that self-correct) while catching genuine stuck loops faster.
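+
+A minimal version of the sliding-window check might look like this (illustrative only; GSD's actual detector is more involved):
+
+```typescript
+// Returns true when the last `window` dispatched unit ids form a short
+// repeating cycle, e.g. [A,A,A,A] (period 1) or [A,B,A,B] (period 2).
+function looksStuck(history: string[], window = 4): boolean {
+  if (history.length < window) return false;
+  const recent = history.slice(-window);
+  for (let period = 1; period <= window / 2; period++) {
+    if (recent.every((unit, i) => unit === recent[i % period])) return true;
+  }
+  return false;
+}
+```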
+
+### Post-Mortem Investigation (v2.40)
+
+`/gsd forensics` is a full-access GSD debugger for post-mortem analysis of auto-mode failures. It provides:
+
+- **Anomaly detection** — structured identification of stuck loops, cost spikes, timeouts, missing artifacts, and crashes with severity levels
+- **Unit traces** — last 10 unit executions with error details and execution times
+- **Metrics analysis** — cost, token counts, and execution time breakdowns
+- **Doctor integration** — includes structural health issues from `/gsd doctor`
+- **LLM-guided investigation** — an agent session with full tool access to investigate root causes
+
+```
+/gsd forensics [optional problem description]
+```
+
+See [Troubleshooting](./troubleshooting.md) for more on diagnosing issues.
+
+### Timeout Supervision
+
+Three timeout tiers prevent runaway sessions:
+
+| Timeout | Default | Behavior |
+|---------|---------|----------|
+| Soft | 20 min | Warns the LLM to wrap up |
+| Idle | 10 min | Detects stalls, intervenes |
+| Hard | 30 min | Pauses auto mode |
+
+Recovery steering nudges the LLM to finish durable output before timing out. Configure in preferences:
+
+```yaml
+auto_supervisor:
+ soft_timeout_minutes: 20
+ idle_timeout_minutes: 10
+ hard_timeout_minutes: 30
+```
+
+### Cost Tracking
+
+Every unit's token usage and cost are captured, broken down by phase, slice, and model. The dashboard shows running totals and projections. Budget ceilings can pause auto mode before overspending.
+
+See [Cost Management](./cost-management.md).
+
+### Adaptive Replanning
+
+After each slice completes, the roadmap is reassessed. If the work revealed new information that changes the plan, slices are reordered, added, or removed before continuing. The `budget` token profile skips this step.
+
+### Verification Enforcement (v2.26)
+
+Configure shell commands that run automatically after every task execution:
+
+```yaml
+verification_commands:
+ - npm run lint
+ - npm run test
+verification_auto_fix: true # auto-retry on failure (default)
+verification_max_retries: 2 # max retry attempts (default: 2)
+```
+
+Failures trigger auto-fix retries — the agent sees the verification output and attempts to fix the issues before advancing. This ensures code quality gates are enforced mechanically, not by LLM compliance.
+
+### Slice Discussion Gate (v2.26)
+
+For projects where you want human review before each slice begins:
+
+```yaml
+require_slice_discussion: true
+```
+
+Auto-mode pauses before each slice, presenting the slice context for discussion. After you confirm, execution continues. Useful for high-stakes projects where you want to review the plan before the agent builds.
+
+### HTML Reports (v2.26)
+
+After a milestone completes, GSD auto-generates a self-contained HTML report in `.gsd/reports/`. Reports include project summary, progress tree, slice dependency graph (SVG DAG), cost/token metrics with bar charts, execution timeline, changelog, and knowledge base. No external dependencies — all CSS and JS are inlined.
+
+```yaml
+auto_report: true # enabled by default
+```
+
+Generate manually anytime with `/gsd export --html`, or generate reports for all milestones at once with `/gsd export --html --all` (v2.28).
+
+### Failure Recovery (v2.28)
+
+v2.28 hardens auto-mode reliability with multiple safeguards: atomic file writes prevent corruption on crash, OAuth fetch timeouts (30s) prevent indefinite hangs, RPC subprocess exit is detected and reported, and blob garbage collection prevents unbounded disk growth. Combined with the existing crash recovery and headless auto-restart, auto-mode is designed for true "fire and forget" overnight execution.
+
+### Pipeline Architecture (v2.40)
+
+The auto-loop is structured as a linear phase pipeline rather than recursive dispatch. Each iteration flows through explicit stages:
+
+1. **Pre-Dispatch** — validate state, check guards, resolve model preferences
+2. **Dispatch** — execute the unit with a focused prompt
+3. **Post-Unit** — close out the unit, update caches, run cleanup
+4. **Verification** — optional validation gate (lint, test, etc.)
+5. **Stuck Detection** — sliding-window pattern analysis
+
+This linear flow is easier to debug, uses less memory (no recursive call stack), and provides cleaner error recovery since each phase has well-defined entry and exit conditions.
+
+### Real-Time Health Visibility (v2.40)
+
+Doctor issues (from `/gsd doctor`) now surface in real time across three places:
+
+- **Dashboard widget** — health indicator with issue count and severity
+- **Workflow visualizer** — issues shown in the status panel
+- **HTML reports** — health section with all issues at report generation time
+
+Issues are classified by severity: `error` (blocks auto-mode), `warning` (non-blocking), and `info` (advisory). Auto-mode checks health at dispatch time and can pause on critical issues.
+
+### Skill Activation in Prompts (v2.39)
+
+Configured skills are automatically resolved and injected into dispatch prompts. The agent receives an "Available Skills" block listing skills that match the current context, based on:
+
+- `always_use_skills` — always included
+- `prefer_skills` — included with preference indicator
+- `skill_rules` — conditional activation based on `when` clauses
+
+See [Configuration](./configuration.md) for skill routing preferences.
+
+## Controlling Auto Mode
+
+### Start
+
+```
+/gsd auto
+```
+
+### Pause
+
+Press **Escape**. The conversation is preserved. You can interact with the agent, inspect state, or resume.
+
+### Resume
+
+```
+/gsd auto
+```
+
+Auto mode reads disk state and picks up where it left off.
+
+### Stop
+
+```
+/gsd stop
+```
+
+Stops auto mode gracefully. Can be run from a different terminal.
+
+### Steer
+
+```
+/gsd steer
+```
+
+Hard-steer plan documents during execution without stopping the pipeline. Changes are picked up at the next phase boundary.
+
+### Capture
+
+```
+/gsd capture "add rate limiting to API endpoints"
+```
+
+Fire-and-forget thought capture. Captures are triaged automatically between tasks. See [Captures & Triage](./captures-triage.md).
+
+### Visualize
+
+```
+/gsd visualize
+```
+
+Open the workflow visualizer — interactive tabs for progress, dependencies, metrics, and timeline. See [Workflow Visualizer](./visualizer.md).
+
+## Dashboard
+
+`Ctrl+Alt+G` or `/gsd status` shows real-time progress:
+
+- Current milestone, slice, and task
+- Auto mode elapsed time and phase
+- Per-unit cost and token breakdown
+- Cost projections
+- Completed and in-progress units
+- Pending capture count (when captures are awaiting triage)
+- Parallel worker status (when running parallel milestones — includes 80% budget alert)
+
+## Phase Skipping
+
+Token profiles can skip certain phases to reduce cost:
+
+| Phase | `budget` | `balanced` | `quality` |
+|-------|----------|------------|-----------|
+| Milestone Research | Skipped | Runs | Runs |
+| Slice Research | Skipped | Skipped | Runs |
+| Reassess Roadmap | Skipped | Runs | Runs |
+
+See [Token Optimization](./token-optimization.md) for details.
+
+## Dynamic Model Routing
+
+When enabled, auto-mode automatically selects cheaper models for simple units (slice completion, UAT) and reserves expensive models for complex work (replanning, architectural tasks). See [Dynamic Model Routing](./dynamic-model-routing.md).
+
+## Reactive Task Execution (v2.38)
+
+When `reactive_execution: true` is set in preferences, GSD derives a dependency graph from IO annotations in task plans. Tasks that don't conflict (no shared file reads/writes) are dispatched in parallel via subagents, while dependent tasks wait for their predecessors to complete.
+
+```yaml
+reactive_execution: true # disabled by default
+```
+
+The graph derivation is pure and deterministic — it resolves a ready-set of tasks, detects conflicts, and guards against deadlocks. Verification results carry forward across parallel batches, so tasks that pass verification don't need to be re-verified when subsequent tasks in the same slice complete.
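+
+A minimal sketch of the ready-set and conflict idea, assuming a hypothetical `TaskIO` shape derived from the IO annotations:
+
+```typescript
+interface TaskIO {
+  id: string;
+  reads: string[];   // files this task reads
+  writes: string[];  // files this task writes
+}
+
+// A task is ready when none of its files conflict with work in flight:
+// write/write and read/write overlaps block dispatch; read/read does not.
+function readySet(pending: TaskIO[], running: TaskIO[]): TaskIO[] {
+  const touched = new Set(running.flatMap(t => [...t.reads, ...t.writes]));
+  return pending.filter(t =>
+    !t.writes.some(f => touched.has(f)) &&
+    !t.reads.some(f => running.some(r => r.writes.includes(f)))
+  );
+}
+```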
+
+The implementation lives in `reactive-graph.ts` (graph derivation, ready-set resolution, conflict/deadlock detection) with integration into `auto-dispatch.ts` and `auto-prompts.ts`.
diff --git a/docs/building-coding-agents/01-work-decomposition.md b/docs/building-coding-agents/01-work-decomposition.md
new file mode 100644
index 000000000..fe58f8d2a
--- /dev/null
+++ b/docs/building-coding-agents/01-work-decomposition.md
@@ -0,0 +1,34 @@
+# Work Decomposition
+
+**The universal consensus:** Elite engineers never jump from vision to code. They use **progressive decomposition** through layers of abstraction.
+
+### The Compression Ladder
+
+```
+Vision → Capabilities → Systems/Architecture → Features → Tasks
+```
+
+Each layer answers a different question:
+
+| Layer | Question |
+|-------|----------|
+| Vision | What world are we creating? |
+| Capabilities | What must the product be able to do? |
+| Systems | What infrastructure enables those capabilities? |
+| Features | What does the user interact with? |
+| Tasks | What exact code gets written? |
+
+### Core Principles (All 4 Models Agree)
+
+- **Start with outcomes, not features.** Define "done" before anything else. Not "build a login page" but "a user can securely access their dashboard using OAuth."
+- **Vertical slices over horizontal layers.** Build thin end-to-end slices (UI → API → DB) rather than completing all backend before all frontend. Each slice is independently demoable and testable.
+- **The 1-Day Rule.** If a task takes longer than a day, it's not a task — it's a milestone. Break it down further until each item is a single, clear action completable in one sitting.
+- **Risk-first exploration.** Identify the hardest/most uncertain parts first. Spike on unknowns before committing to architecture. "Kill the biggest risks while they are still cheap to fix."
+- **Interface-first design.** Define contracts between components before building them. This enables parallel work and creates natural verification checkpoints.
+- **MECE decomposition.** Tasks should be Mutually Exclusive (no overlap) and Collectively Exhaustive (complete the vision when all are done).
+
+### The Recursive Heuristic
+
+> If something feels fuzzy, break it down one level deeper. Keep decomposing until it is obvious how to start each task.
+
+---
diff --git a/docs/building-coding-agents/02-what-to-keep-discard-from-human-engineering.md b/docs/building-coding-agents/02-what-to-keep-discard-from-human-engineering.md
new file mode 100644
index 000000000..347c9abc8
--- /dev/null
+++ b/docs/building-coding-agents/02-what-to-keep-discard-from-human-engineering.md
@@ -0,0 +1,38 @@
+# What to Keep & Discard from Human Engineering
+
+### KEEP & Amplify
+
+| Practice | Why It Matters More for AI |
+|----------|---------------------------|
+| **Clear product intent & experience specs** | AI needs direction, not instructions. "How should it feel?" drives architecture. |
+| **Acceptance criteria as the backbone** | Becomes TDD at its logical extreme — human writes tests in natural language, AI makes them true. |
+| **Vertical slicing** | Even more critical — prevents AI from going deep down a wrong path fast and confidently. |
+| **Interface-first approach** | Creates natural checkpoints, makes systems modular and replaceable. |
+| **Explicit constraints & non-functional requirements** | Narrows the search space. Without them AI may produce technically correct but strategically wrong systems. |
+| **Architecture Decision Records (ADRs)** | Prevents AI from "accidentally" undoing decisions made weeks ago. |
+| **Feedback loops** | Build → test → observe → refine. Accelerated to machine speed. |
+
+### DISCARD
+
+| Practice | Why It's Dead Weight |
+|----------|---------------------|
+| **Estimation rituals** (story points, velocity, sprint planning) | AI doesn't get tired, doesn't context-switch, works at machine speed. |
+| **Communication overhead** (standups, design reviews, PR reviews) | Only one communication channel matters: human ↔ agent. |
+| **Manual code review for style** | Automated linting + formatting handles this deterministically. |
+| **Step-by-step instructions** | Provide outcomes, not "how." |
+| **Heavy upfront documentation** | AI can read the entire repo instantly. Document *intent* and *why*, not *how*. |
+| **Gradual skill-building** | No ramp-up, no knowledge silos, no "only Sarah knows how that module works." |
+| **Defensive architecture against human error** | Tests still needed, but for a different reason: verifying AI's interpretation of intent. |
+
+### The New Human Role
+
+| Responsibility | Description |
+|---------------|-------------|
+| **Defining "good"** | Vision, personas, experience specs, success metrics |
+| **Taste & judgment** | Aesthetics, emotional experience, brand voice |
+| **Strategic decisions** | Which problems matter, product pivots |
+| **Gut checks at milestones** | Does this *feel* right? |
+
+> **The core shift:** Human = intention + taste. AI = exploration + execution.
+
+---
diff --git a/docs/building-coding-agents/03-state-machine-context-management.md b/docs/building-coding-agents/03-state-machine-context-management.md
new file mode 100644
index 000000000..c6e8da7e5
--- /dev/null
+++ b/docs/building-coding-agents/03-state-machine-context-management.md
@@ -0,0 +1,46 @@
+# State Machine & Context Management
+
+### The Fundamental Tension
+
+The agent needs to understand the whole project to make good decisions, but any single context window degrades with too much information — not just from token limits but from **attention dilution**.
+
+### Layered Memory Architecture (Universal Agreement)
+
+```
+Project Manifest (always loaded, <1000 tokens)
+ ↓
+Task Context (per-task, relevant files + specs)
+ ↓
+Retrieval Layer (pull-based, on-demand)
+ ↓
+Ground Truth (filesystem, git, actual code)
+```
+
+| Layer | Content | Access Pattern | Token Impact |
+|-------|---------|---------------|--------------|
+| **Working Context** (L1) | Current task + 3–5 relevant files | Dynamically assembled per LLM call | 8k–25k tokens |
+| **Session/Episodic** (L2) | Compressed history + recent decisions | Auto-summarized at transitions | Summary only |
+| **Project Semantic** (L3) | Full codebase summaries, dependency graph, ADRs | Vector + Graph retrieval | Pointers only |
+| **Ground Truth** (L4) | Actual files, git history, test results | Agent reads via tools | Zero in prompt |
+
+### The State Machine
+
+The agent should always be in one explicit state:
+
+```
+PLAN → IMPLEMENT → TEST → DEBUG → VERIFY → DOCUMENT
+```
+
+**Critical transitions that matter:**
+- **Task completion:** Defined by automated tests passing + acceptance criteria met
+- **Stuck detection:** Triggered by repeated failed attempts or missing information
+- **Plan revision:** Triggered when completed tasks reveal wrong assumptions
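+
+The explicit-state idea can be sketched as a typed transition table. The edges below are illustrative assumptions, not a prescription:
+
+```typescript
+type Phase = "PLAN" | "IMPLEMENT" | "TEST" | "DEBUG" | "VERIFY" | "DOCUMENT";
+
+// Legal transitions: test failures branch to DEBUG; wrong assumptions
+// surfaced in VERIFY trigger plan revision.
+const NEXT: Record<Phase, Phase[]> = {
+  PLAN: ["IMPLEMENT"],
+  IMPLEMENT: ["TEST"],
+  TEST: ["DEBUG", "VERIFY"],
+  DEBUG: ["IMPLEMENT", "TEST"],
+  VERIFY: ["DOCUMENT", "PLAN"],
+  DOCUMENT: ["PLAN"],
+};
+
+function transition(from: Phase, to: Phase): Phase {
+  if (!NEXT[from].includes(to)) throw new Error(`illegal transition: ${from} -> ${to}`);
+  return to;
+}
+```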
+
+### Key Principles
+
+- **Summarize aggressively between phases.** Don't carry full implementation context forward — carry compressed summaries: what was built, what decisions were made, what interfaces were created.
+- **Pull-based, not push-based context.** Don't preload everything the agent might need. Let it ask for what it discovers it needs.
+- **Use structured state for reliability.** Natural language summaries drift. Use JSON/typed configs for anything the system needs to track. Reserve natural language for reasoning.
+- **The filesystem is external memory.** The codebase itself is the most detailed representation of current state. Hold *understanding* about code in context, not the code itself.
+
+---
diff --git a/docs/building-coding-agents/04-optimal-storage-for-project-context.md b/docs/building-coding-agents/04-optimal-storage-for-project-context.md
new file mode 100644
index 000000000..e6d45ace9
--- /dev/null
+++ b/docs/building-coding-agents/04-optimal-storage-for-project-context.md
@@ -0,0 +1,56 @@
+# Optimal Storage for Project Context
+
+### The Universal Answer: Plain Text Files in the Repo + Structured State Store
+
+All four models converge on a hybrid approach. The key insight: **don't over-engineer with databases and vector stores, but don't under-engineer with a single massive file either.**
+
+### The Optimal Stack
+
+| Storage | What Lives Here | Why |
+|---------|----------------|-----|
+| **Project Manifest** (`PROJECT.md`) | Vision, principles, architecture overview, component status | Always loaded, <1000 tokens, single source of truth |
+| **Structured State** (JSON/SQLite/Postgres) | Task status, phase, dependencies, verification results | Machine-parseable, drives state machine transitions |
+| **Context Directory** (`.context/` or `.ai/`) | Architecture docs, task specs, decision records | Organized for retrieval, not human browsing |
+| **Git Repository** | Actual source code, test results | Ultimate ground truth, never duplicated |
+| **Knowledge Graph** (optional at scale) | File → function → dependency relationships | Enables "what breaks if I change this?" queries |
+
+### Why Plain Files Win
+
+- AI reads files directly — no query language, no ORM, no API calls
+- Version control comes free via git
+- Human can read and edit with any text editor
+- Survives tooling changes — not locked into any system
+
+### Why NOT Vector Stores (as primary)
+
+- Project context is **structured** — you know where things are
+- Vector stores return **approximately relevant** results — approximate is often wrong in codebases
+- They can't represent state, relationships, or task progress
+
+### The Hybrid Format
+
+Individual files use **YAML frontmatter + Markdown body**:
+```markdown
+---
+status: in_progress
+dependencies: [AUTH-01, DB-02]
+acceptance_criteria:
+ - User can reset password via email
+ - Token expires after 30 minutes
+---
+
+## Task: Password Reset Flow
+[Rich narrative description and context here]
+```
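+
+Splitting such a file is mechanical. A minimal sketch (a real implementation would hand the frontmatter to a YAML parser such as `js-yaml`):
+
+```typescript
+// Separates YAML frontmatter from the Markdown body.
+function splitFrontmatter(doc: string): { frontmatter: string; body: string } {
+  const match = /^---\n([\s\S]*?)\n---\n([\s\S]*)$/.exec(doc);
+  return match
+    ? { frontmatter: match[1], body: match[2] }
+    : { frontmatter: "", body: doc };
+}
+```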
+
+### Size Discipline
+
+| File | Target Size |
+|------|------------|
+| Project Manifest | <1,000 tokens |
+| Individual task files (completed) | <500 tokens |
+| Architecture doc | <2,000 tokens |
+
+> The context system isn't just storage — it's a **compression engine**. Its job is to maintain maximum useful understanding in minimum token footprint.
+
+---
diff --git a/docs/building-coding-agents/05-parallelization-strategy.md b/docs/building-coding-agents/05-parallelization-strategy.md
new file mode 100644
index 000000000..83823021d
--- /dev/null
+++ b/docs/building-coding-agents/05-parallelization-strategy.md
@@ -0,0 +1,62 @@
+# Parallelization Strategy
+
+### Core Principle
+
+> Parallelize across boundaries, serialize within them.
+
+The quality of parallelization is directly determined by the quality of interface definitions.
+
+### The Diamond Pattern
+
+```
+ Planning (narrow, serial)
+ ↓
+ Fan Out (parallel execution)
+ ↓
+ Convergence (integration verification)
+ ↓
+ Fan Out (next parallel set)
+```
+
+### Phase-by-Phase Strategy
+
+#### Planning: Mostly Serial, with Parallel Spikes
+- High-level decomposition must be serial (one coherent act of reasoning)
+- **Parallelize uncertainty resolution:** Multiple spikes investigating different risks simultaneously
+- Output: A dependency graph that explicitly identifies what can be parallelized
+
+#### Execution: Massive Parallelization with Right Topology
+
+| Work Type | Strategy |
+|-----------|----------|
+| **Independent leaf tasks** | Embarrassingly parallel — one agent per module |
+| **Dependent chains** | Serial within chain, but chains run in parallel |
+| **Convergence points** | Strictly serial — integration verification |
+
+**Critical insight:** The frontend doesn't need the real API — it needs the API *contract*. Once contracts exist, both sides build in parallel.
+
+#### Testing: The Most Interesting Story
+- **Unit tests:** Same agent, same context, atomic with code
+- **Cross-task tests:** All parallel by definition
+- **Integration tests:** Parallel across different boundaries
+- **E2E tests:** Serial (exercises whole system)
+
+#### Verification: Deliberate Redundancy
+- **Adversarial verification:** Separate reviewer agent with fresh context evaluates against spec
+- **Red-team parallelism:** Agent tries to break the implementation
+
+### Coordination Rules
+
+- Agents communicate through the **filesystem**, never directly
+- Each agent works on a **branch** — merge on success, discard on failure
+- One agent per file at a time (file locking)
+- Optimal concurrency: **3–8 simultaneous agents** for most projects
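+
+The one-agent-per-file rule can be sketched as a simple lock table (in-memory here for illustration; real coordination would go through the filesystem, per the first rule above):
+
+```typescript
+// file path -> id of the agent holding the lock
+const locks = new Map<string, string>();
+
+// All-or-nothing acquisition: an agent locks every file it needs, or none.
+function acquire(agent: string, files: string[]): boolean {
+  if (files.some(f => locks.has(f) && locks.get(f) !== agent)) return false;
+  for (const f of files) locks.set(f, agent);
+  return true;
+}
+
+function release(agent: string): void {
+  for (const [file, holder] of locks) {
+    if (holder === agent) locks.delete(file);
+  }
+}
+```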
+
+### Anti-Patterns
+
+- ❌ Don't parallelize tasks that modify the same files
+- ❌ Don't parallelize interacting decisions
+- ❌ Don't skip convergence/integration verification
+- ❌ Don't over-parallelize (coordination tax eats gains above ~8 agents)
+
+---
diff --git a/docs/building-coding-agents/06-maximizing-agent-autonomy-superpowers.md b/docs/building-coding-agents/06-maximizing-agent-autonomy-superpowers.md
new file mode 100644
index 000000000..2d5802580
--- /dev/null
+++ b/docs/building-coding-agents/06-maximizing-agent-autonomy-superpowers.md
@@ -0,0 +1,44 @@
+# Maximizing Agent Autonomy & Superpowers
+
+### The Foundational Insight
+
+> Autonomy comes from **self-correction**, not from getting it right the first time. The power isn't in the initial generation — it's in iteration speed and feedback signal quality.
+
+### The Essential Tool Arsenal
+
+| Category | Tools | Why |
+|----------|-------|-----|
+| **Execution Environment** | Terminal, filesystem, git, package manager | Closes the write → run → debug → verify loop |
+| **Verification** | Test runner, linter, type checker, security scanner | Ground truth over self-assessment |
+| **Observation** | Logs, browser/renderer, performance profiler | Sees what users would see |
+| **Exploration** | Code search, documentation lookup, web research | Self-directed learning |
+| **Recovery** | Git revert, branch management, checkpoints | Safety net that enables boldness |
+
+### Self-Verification Architecture
+
+Every task completion should self-evaluate against a checklist:
+1. Does the code compile?
+2. Do all existing tests still pass?
+3. Do new tests pass?
+4. Does the application actually start?
+5. Can I exercise the feature and see expected behavior?
+6. Does this match acceptance criteria point by point?
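+
+Expressed as a mechanical gate (the checklist items come from above; the field names are assumptions):
+
+```typescript
+interface CompletionChecks {
+  compiles: boolean;
+  existingTestsPass: boolean;
+  newTestsPass: boolean;
+  appStarts: boolean;
+  featureExercised: boolean;
+  criteriaMet: boolean;
+}
+
+// The task is done only when every check on the list holds.
+const taskComplete = (c: CompletionChecks): boolean => Object.values(c).every(Boolean);
+```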
+
+### Debugging Superpowers
+
+- **Temporary instrumentation:** Add logging, remove after diagnosis
+- **Bisection:** Walk back through changes to find where regression was introduced
+- **Minimal reproduction:** Strip away everything except exact conditions that trigger failure
+- **Exploratory tests:** Quick throwaway scripts to test hypotheses
+
+### Meta-Cognitive Layer
+
+- **Scratchpad:** External reasoning space to track hypotheses, attempts, and outcomes
+- **Stuck detection:** After N failed attempts, trigger step-back with fresh context and explicitly different approach
+- **Structured escalation:** "Here's what I'm trying, here's what I've tried, here's what I think the issue is, here's what I need from you"
+
+### The Philosophy
+
+> You're not trying to build an agent that doesn't make mistakes. You're building one that **catches and fixes its own mistakes faster than a human would notice them**. Not intelligence — **closed-loop execution with rich feedback**.
+
+---
diff --git a/docs/building-coding-agents/07-system-prompt-llm-vs-deterministic-split.md b/docs/building-coding-agents/07-system-prompt-llm-vs-deterministic-split.md
new file mode 100644
index 000000000..4772b7a3a
--- /dev/null
+++ b/docs/building-coding-agents/07-system-prompt-llm-vs-deterministic-split.md
@@ -0,0 +1,82 @@
+# System Prompt & LLM vs Deterministic Split
+
+### The Core Separation Principle
+
+> If you could write an if-else statement that handles it correctly every time, **it should not be in the LLM's context**. Every token the model spends reasoning about something deterministic is wasted and introduces hallucination risk.
+
+### What the LLM Owns
+
+| Capability | Why LLM |
+|-----------|---------|
+| Understanding intent | Interpretation, judgment |
+| Architectural reasoning | Weighing tradeoffs |
+| Code generation | Creative, context-dependent |
+| Debugging & diagnosis | Abductive reasoning, hypothesis formation |
+| Self-critique & quality assessment | Judgment calls |
+
+### What TypeScript/Deterministic Code Owns
+
+| Capability | Why Deterministic |
+|-----------|-------------------|
+| State machine transitions | Typed state object, no ambiguity |
+| Context assembly | Predict + pre-load what agent needs |
+| File operations | Validate paths, handle encoding, manage permissions |
+| Test execution & result parsing | Structured results, not raw terminal output |
+| Build & environment management | Install deps, start servers, manage ports |
+| Code formatting | Run prettier automatically, never waste LLM tokens |
+| Task scheduling & dependency resolution | Graph traversal, instant vs 5-second LLM call |
+| Summarization triggers | Mechanical workflow, LLM provides content |
+
+### Modular System Prompt Architecture
+
+```
+Base Layer (always present, ~500 tokens)
+ → Identity, core behavioral rules, general approach
+
+Phase-Specific Layer (swapped based on state)
+ → Planning mode: decomposition, interfaces, risks
+ → Execution mode: implementation, testing, iteration
+ → Debugging mode: diagnosis, hypothesis testing, isolation
+
+Task-Specific Layer (assembled fresh per task)
+ → Current spec, acceptance criteria, relevant contracts, prior attempts
+
+Tools Layer
+ → Available tool definitions and parameters
+```
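+
+One way to realize the layering, sketched with hypothetical names (layer contents are project-specific):
+
+```typescript
+type Mode = "planning" | "execution" | "debugging";
+
+interface PromptLayers {
+  base: string;                // always present
+  phase: Record<Mode, string>; // swapped on state transitions
+  task: string;                // assembled fresh per task
+  tools: string;               // tool definitions
+}
+
+function assembleSystemPrompt(layers: PromptLayers, mode: Mode): string {
+  return [layers.base, layers.phase[mode], layers.task, layers.tools].join("\n\n");
+}
+```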
+
+### Tool Design Philosophy
+
+> Each tool should do one thing, do it completely, and return structured results the LLM can immediately act on.
+
+- **Bad:** LLM calls `readFile` → `parseJSON` → `runCommand` (3 calls, 3 failure points)
+- **Good:** LLM calls `runTests(filter)` → gets structured pass/fail with locations (1 call, clean result)
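+
+The "good" shape amounts to parsing the runner's report once, deterministically, so the LLM never sees raw terminal output. A sketch, with an invented report format:
+
+```typescript
+interface TestFailure { file: string; line: number; message: string }
+interface TestResult { passed: number; failed: number; failures: TestFailure[] }
+
+// Turns a (hypothetical) JSON test report into the structured result
+// a single `runTests` tool call would return.
+function parseTestReport(json: string): TestResult {
+  const report = JSON.parse(json) as {
+    tests: { ok: boolean; file: string; line: number; msg?: string }[];
+  };
+  const failures = report.tests
+    .filter(t => !t.ok)
+    .map(t => ({ file: t.file, line: t.line, message: t.msg ?? "failed" }));
+  return { passed: report.tests.length - failures.length, failed: failures.length, failures };
+}
+```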
+
+### Essential Tools
+
+| Tool | Returns |
+|------|---------|
+| `runTests` | Structured results: pass count, fail count, per-failure details |
+| `readFiles` | Batched file contents (array of paths, not one at a time) |
+| `writeFile` | Auto-formats before writing |
+| `searchCodebase` | Grep-like results with file paths and line numbers |
+| `getProjectState` | Manifest + current task spec + related task statuses |
+| `updateTaskStatus` | Handles downstream state updates automatically |
+| `buildProject` | Structured errors with file paths and line numbers |
+| `browserCheck` | Screenshot or structured description of rendered output |
+| `commitChanges` | Enforces conventions, runs pre-commit hooks |
+| `revertToCheckpoint` | Rolls back to last known good state |
+
+### Prompt Patterns That Maximize Agency
+
+1. **Tell it what it CAN do, not what it can't.** "Full authority as long as acceptance criteria and tests pass."
+2. **Explicit permission to iterate.** "First attempt doesn't need to be perfect. Write, run, observe, improve."
+3. **Clear exit conditions.** Concrete, measurable, unambiguous definition of "done."
+4. **Built-in scratchpad.** "Write reasoning in thinking blocks. Track attempts and outcomes."
+5. **Recovery protocol.** "After 3 failed approaches, produce structured escalation."
+
+### The Meta-Principle
+
+> Your TypeScript orchestrator is the deterministic skeleton — workflow, state, context, tools, coordination. The LLM is the reasoning muscle — understanding, creativity, judgment, problem-solving. **Neither should do the other's job.** When you get this right, the LLM becomes dramatically more capable because it's only doing what it's good at, with exactly the context it needs.
+
+---
diff --git a/docs/building-coding-agents/08-speed-optimization.md b/docs/building-coding-agents/08-speed-optimization.md
new file mode 100644
index 000000000..48fbf5fe3
--- /dev/null
+++ b/docs/building-coding-agents/08-speed-optimization.md
@@ -0,0 +1,60 @@
+# Speed Optimization
+
+### The #1 Speed Principle
+
+> The fastest possible operation is the one you don't perform. Before optimizing any step, ask: does this step need to exist at all?
+
+### Speed Levers (Ranked by Impact)
+
+#### 1. Minimize LLM Calls
+- **Batch intent into single calls.** Don't generate code, then tests, then docs separately. One call: "implement, test, and document." TypeScript splits the output.
+- **Deterministic fast paths.** Missing import? Syntax error? Fix without an LLM call if the fix is mechanical.
+- Audit call chains ruthlessly — most systems have 50%+ unnecessary sequential calls.
+
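+A deterministic fast path might look like this sketch: resolve a missing-import error from a pre-built export index, and only fall through to the LLM when the mechanical fix fails. The `exportIndex` map is an assumption about orchestrator state, not an existing API:
+
+```typescript
+// Assumed: an export index maintained by the orchestrator as files change.
+const exportIndex: Record<string, string> = {
+  formatDate: "src/utils/date.ts",
+  UserSchema: "src/models/user.ts",
+};
+
+function fixMissingImport(symbol: string): string | null {
+  const file = exportIndex[symbol];
+  if (!file) return null; // unknown symbol: fall through to the LLM
+  return `import { ${symbol} } from "${file.replace(/\.ts$/, "")}";`;
+}
+```
+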
+#### 2. Make Feedback Loops Instantaneous
+- Use test watch mode (no cold start)
+- Run only relevant test subsets (track which files affect which tests)
+- Incremental builds (hot module reloading)
+- Async, non-blocking file writes
+
+#### 3. Precompute Context
+- Predict what the agent will need based on task definition
+- Pre-load into the prompt — no tool calls needed mid-generation
+- **Speculative pre-fetching** (like CPU cache prefetching)
+
+#### 4. Parallelize Independent Work
+- Minimize startup cost for new parallel agents (pre-built templates, warm connections)
+- Use the dependency graph to identify independent work automatically
+
+#### 5. Stream Everything, Block on Nothing
+- Process tokens as they arrive
+- Pipeline parallelism: start formatting code while commit message is still generating
+
+#### 6. Cache Aggressively
+- In-memory cache of everything agent might need
+- Cross-task caching for unchanged files
+- Cache LLM results for deterministic inputs (boilerplate, type definitions)
+
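+Caching LLM results for deterministic inputs can be as simple as keying on a prompt digest. A synchronous sketch (real calls are async; the hash here is deliberately simplified, where a production system would use a cryptographic digest):
+
+```typescript
+function promptHash(input: string): string {
+  let h = 0;
+  for (let i = 0; i < input.length; i++) {
+    h = (h * 31 + input.charCodeAt(i)) | 0;
+  }
+  return h.toString(16);
+}
+
+const llmCache = new Map<string, string>();
+
+function cachedCall(prompt: string, call: (p: string) => string): string {
+  const key = promptHash(prompt);
+  const hit = llmCache.get(key);
+  if (hit !== undefined) return hit; // cache hit: zero latency, zero tokens
+  const result = call(prompt);
+  llmCache.set(key, result);
+  return result;
+}
+```
+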
+#### 7. Minimize Token Waste
+- Dense context, not verbose context
+- Structured formats for structured data
+- Minify reference code that's informational, not for modification
+
+### Anti-Patterns That Murder Speed
+
+| Anti-Pattern | Fix |
+|-------------|-----|
+| Re-verifying things that can't have changed | Dependency-aware selective re-verification |
+| Excessive self-reflection on simple tasks | Complexity-based workflow routing |
+| Over-summarization between micro-steps | Only full context reset at task boundaries |
+| Waiting for human approval on auto-verifiable work | Human checkpoints at milestones, not tasks |
+| Quadratic history growth | Aggressive compression at every transition |
+| Synchronous blocking tools | Async everything, pipeline parallelism |
+
+### The Speed Multiplier Nobody Talks About
+
+**Failure prediction.** Track patterns across tasks. If certain task types fail on first attempt, pre-load extra guidance. Preventing a failed iteration is faster than executing one.
+
+> The magical feeling of speed comes from only doing things that matter, and then doing those things as fast as possible. The system should feel like the agent knew what to do and just did it.
+
+---
diff --git a/docs/building-coding-agents/09-top-10-tips-for-a-world-class-agent.md b/docs/building-coding-agents/09-top-10-tips-for-a-world-class-agent.md
new file mode 100644
index 000000000..8d1d9e031
--- /dev/null
+++ b/docs/building-coding-agents/09-top-10-tips-for-a-world-class-agent.md
@@ -0,0 +1,33 @@
+# Top 10 Tips for a World-Class Agent
+
+### 1. The Orchestrator Is the Product, Not the Model
+The model is a commodity. Two teams using the same model produce wildly different results based on orchestration quality. Invest 70% of effort in the orchestrator, 30% in prompt engineering.
+
+### 2. Context Assembly Is a Craft
+Profile your context like you'd profile code. Measure which context elements correlate with first-attempt success. Prune relentlessly. The right files, in the right order, with the right framing, at the right level of detail.
+
+### 3. Make the Feedback Loop the Fastest Thing
+Treat feedback loop latency like a game engine treats frame rate. Incremental builds, targeted tests, pre-warmed servers, cached deps. Put it on a dashboard you look at every day.
+
+### 4. Build First-Class Error Recovery Into Every Layer
+Retry with variation (never the same way twice), automatic rollback, structured escalation, ability to park blocked tasks. **Design failure paths first** — they'll get more use than you expect.
+
+### 5. Verify Through Execution, Not Self-Assessment
+An agent that asks itself "is this correct?" says yes 90% of the time regardless. Run the code, observe results, get ground truth. Self-assessment supplements execution-based verification, never replaces it.
+
+### 6. Return Structured, Actionable Data from Every Tool
+Don't return raw terminal output. Return structured objects: what passed, what failed, where, why. Remove cognitive load from the model — it directly translates to better decisions.
+
+### 7. Use a DAG, Not a Flat List
+Explicit inputs, outputs, dependencies, acceptance criteria per task. Maximizes parallelism, identifies critical path, enables smart impact tracing when things change.
+
+### 8. Keep the Manifest Small and Always Current
+One file, <1000 tokens, always included. Updated automatically after every task completion. If it drifts from reality, everything downstream suffers.
+
+### 9. Build Observability From Day One
+Log every LLM call. Track iterations per task type, token usage, failure rates, first-attempt success rates. This is your training data for improving the orchestrator. Teams that instrument well improve 10x faster.
+
+### 10. Make Human Touchpoints High-Leverage and Low-Friction
+Present specific questions with context, not walls of text. "The API could return nested or flat fields — which fits your vision?" is a 5-second decision. "Please review everything" takes 20 minutes.
+
+---
diff --git a/docs/building-coding-agents/10-top-10-pitfalls-to-avoid.md b/docs/building-coding-agents/10-top-10-pitfalls-to-avoid.md
new file mode 100644
index 000000000..906a66060
--- /dev/null
+++ b/docs/building-coding-agents/10-top-10-pitfalls-to-avoid.md
@@ -0,0 +1,33 @@
+# Top 10 Pitfalls to Avoid
+
+### 1. Putting Workflow Logic in the Prompt
+Control flow belongs in TypeScript with actual conditionals and state tracking. Prompts that describe workflows are fragile, inconsistently followed, and impossible to debug with a debugger.
+
+### 2. Unbounded Context Accumulation
+Each iteration adds noise. After 7 iterations, context is bloated with stale information from attempts 1–5. **Carry forward only current state and most recent error.** Summarize or discard everything else.
+
+### 3. Trusting the Model's Self-Assessment of Completion
+Models are biased toward completion. Never let the model be the sole judge. Use deterministic checks: tests pass, it builds, acceptance criteria have corresponding passing tests.
+
+### 4. Over-Engineering Tools Before Understanding Workflows
+Start with a small general-purpose set (file read/write, execute command, run tests). Watch where the agent struggles in real tasks. Then build specialized tools to solve observed problems.
+
+### 5. Neglecting the Cold-Start Problem
+The first task is fundamentally different from the twentieth. Use deterministic templates for project scaffolding, conventions, and test infrastructure before handing off to the agent.
+
+### 6. Too Much Autonomy Too Early
+An agent going slightly wrong for 2 hours produces a mountain of throwaway code. Start with more checkpoints than needed. Earn autonomy incrementally for proven task types.
+
+### 7. Ignoring Compounding Inconsistency
+Different naming, different patterns, different structures across files = technical debt that confuses the agent itself later. Enforce consistency through linting or by showing existing examples before new code.
+
+### 8. Building for the Demo, Not the Recovery
+The demo is the happy path. The product is what happens when tests fail, builds break, APIs change. **Spend 2x as much time on failure/recovery paths.** The agent spends more time recovering than succeeding first-attempt.
+
+### 9. Treating All Tasks as Equally Complex
+Simple utility functions and complex state management shouldn't go through the same workflow. Classify by complexity. Simple tasks get a fast path. Complex tasks get the full treatment.
+
+### 10. Not Measuring What Actually Matters
+Don't just track tokens and costs. Measure: first-attempt success rate, iterations to completion, human intervention frequency, code survival rate (does it survive the next 3 tasks?), stuck-detection accuracy. These guide real improvement.
+
+---
diff --git a/docs/building-coding-agents/11-god-tier-context-engineering.md b/docs/building-coding-agents/11-god-tier-context-engineering.md
new file mode 100644
index 000000000..90de37c38
--- /dev/null
+++ b/docs/building-coding-agents/11-god-tier-context-engineering.md
@@ -0,0 +1,97 @@
+# God-Tier Context Engineering
+
+### The Core Principle
+
+> God-tier context engineering treats the context window as a **designed experience for the model**, not as a bucket you throw information into. The context window is the UX of your agent. Design it accordingly.
+
+### The 10 Commandments of Context Engineering
+
+#### 1. The Pyramid of Relevance
+- **Sharp focus:** Active files at full detail
+- **Present but compressed:** Interface contracts, manifest, task definition
+- **Summarized or absent:** Other components' internals, completed task histories
+
+Each tier has a token budget. If full-resolution tier is large, outer tiers compress harder.
+
+#### 2. Context Is a Cache, Not a History
+Treat it like a CPU cache: holds exactly what's needed now, everything else evicted. The question isn't "what has happened" but "what does the model need to see right now?"
+
+#### 3. Separate Reference from Instruction
+- **Instruction context** (what to do) → beginning and end of prompt (highest attention)
+- **Reference context** (helpful info) → middle, clearly delineated
+
+Manage them independently. Compress reference aggressively while keeping instructions at full detail.
+
+#### 4. Earn Every Token's Place
+Implement a token budget system:
+
+| Category | Budget |
+|----------|--------|
+| System prompt + behavioral instructions | ~15% |
+| Manifest | ~5% |
+| Task spec + acceptance criteria | ~20% |
+| Active code files | ~40% |
+| Interface contracts | ~10% |
+| Reserve (tool results, errors) | ~10% |
+
+When any category exceeds budget, intelligently summarize (not truncate).
+
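+The budget table can be enforced mechanically. A sketch that mirrors the percentages above; when `fits` returns false, the content is routed through summarization rather than truncated:
+
+```typescript
+// Illustrative category names; percentages mirror the table above.
+const budgets: Record<string, number> = {
+  system: 0.15,
+  manifest: 0.05,
+  taskSpec: 0.2,
+  activeFiles: 0.4,
+  interfaces: 0.1,
+  reserve: 0.1,
+};
+
+function allowance(category: string, contextWindow: number): number {
+  return Math.floor((budgets[category] ?? 0) * contextWindow);
+}
+
+function fits(category: string, tokens: number, contextWindow: number): boolean {
+  // When this returns false, summarize the content; never hard-truncate.
+  return tokens <= allowance(category, contextWindow);
+}
+```
+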
+#### 5. Write for the Model's Attention Pattern
+- Critical info at the very beginning and reiterated at the end
+- Structured blocks with clear headers and delimiters
+- Consistent formatting conventions
+
+```
+TASK: Implement password reset flow
+STATUS: New
+DEPENDS ON: auth-module (complete), email-service (complete)
+ACCEPTANCE CRITERIA:
+- User can request reset via email
+- Token expires after 30 minutes
+- New password meets existing validation rules
+- All existing auth tests pass
+RELEVANT INTERFACES: [below]
+ACTIVE FILES: [below]
+```
+
+#### 6. Compress at Every State Transition
+- Task completion → 50–100 token completion record
+- Use a **dedicated summarization call** with a tight prompt (not the working agent self-summarizing)
+- **Cascading summarization:** Task summaries → milestone summaries → phase summaries (5:1 compression ratio at each level)
+
+#### 7. Use the Filesystem as Your Infinite Context Window
+- Organize files for retrieval, not human browsing
+- Predictable naming conventions = instant lookup
+- Essentially a custom database on top of the filesystem
+
+#### 8. Profile Context Quality, Not Just Size
+Track first-attempt success rate as a function of context composition. What was in context when it succeeded vs failed? Let data guide what constitutes high-quality context.
+
+#### 9. Dynamic Context Based on Task Phase
+Different phases need different context:
+
+| Phase | Optimal Context |
+|-------|----------------|
+| Understanding | Spec, acceptance criteria, broad architectural context |
+| Implementation | Active files, interface contracts, coding patterns |
+| Debugging | Failing test output, relevant code, test code |
+| Verification | Acceptance criteria prominently, ability to exercise feature |
+
+#### 10. Design for Context Recovery
+- **Checkpoint** context state at task starts and phase transitions
+- On detected confusion (repeated failures, increasing iterations, off-task output): **roll back to checkpoint** and re-enter with fresh context + concise failure info + strategy hint
+- Structured recovery ≠ naive retry. It rebuilds context from scratch with learned information.
+
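+Checkpointing and rollback can be sketched as follows; the confusion thresholds are illustrative judgment calls, not recommended values:
+
+```typescript
+// A checkpoint captures enough state to rebuild the prompt from scratch.
+interface ContextCheckpoint {
+  taskId: string;
+  phase: string;
+  contextSnapshot: string;
+}
+
+const checkpoints: ContextCheckpoint[] = [];
+
+function checkpoint(taskId: string, phase: string, snapshot: string): void {
+  checkpoints.push({ taskId, phase, contextSnapshot: snapshot });
+}
+
+function detectConfusion(recentFailures: number, iterations: number): boolean {
+  // Thresholds are illustrative.
+  return recentFailures >= 3 || iterations > 8;
+}
+
+function rollback(): ContextCheckpoint | undefined {
+  // Re-enter from the most recent known-good state with fresh context.
+  return checkpoints[checkpoints.length - 1];
+}
+```
+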
+### The God-Tier Strategy in One Sentence
+
+> Orchestrator-assembled minimal slice + persistent hierarchical memory. Every single LLM call stays 8k–25k tokens while the agent has perfect knowledge of a 500k-line codebase and months of project history.
+
+---
+
+---
+
+# Part II: The Hard Problems (Grey Area Synthesis)
+
+> Synthesized from a second round of deep conversations with all four models, targeting the 13 hardest unsolved problems in autonomous coding agents — plus a critical question on accessibility for non-technical users.
+
+---
diff --git a/docs/building-coding-agents/12-handling-ambiguity-contradiction.md b/docs/building-coding-agents/12-handling-ambiguity-contradiction.md
new file mode 100644
index 000000000..d94936254
--- /dev/null
+++ b/docs/building-coding-agents/12-handling-ambiguity-contradiction.md
@@ -0,0 +1,56 @@
+# Handling Ambiguity & Contradiction
+
+**The universal consensus:** This is the highest-cost failure mode. An agent confidently building the wrong thing based on a reasonable-but-incorrect interpretation burns hours of work discovered only at milestone reviews.
+
+### The Three-Layer Strategy (All 4 Models Agree)
+
+#### Layer 1: Classification of Ambiguity Type
+
+Every requirement should be classified during planning:
+
+| Classification | Action |
+|---------------|--------|
+| **Clear and actionable** | Proceed autonomously |
+| **Ambiguous but decidable with sensible defaults** | Proceed + document assumptions |
+| **Genuinely unclear or contradictory** | Halt and escalate to human |
+
+> The middle category is where most real work lives. "The user should be able to reset their password" has a hundred implied decisions. A good agent resolves these with sensible defaults and **documents the assumptions it made** — it doesn't ask about every one.
+
+#### Layer 2: The Assumption Ledger
+
+Every task completion updates the assumption ledger (`assumptions.md`, or an equivalent structured record) with every interpretive decision the agent made:
+
+```json
+{
+ "assumptions": [
+ "Password reset tokens expire after 30 minutes (common security practice)",
+ "Email delivery, not SMS",
+ "No password history check"
+ ],
+ "confidence": 0.82
+}
+```
+
+The human reviews these at **milestones, not in real-time** — preserving speed while maintaining correctness.
+
+#### Layer 3: Contradiction Detection Pass
+
+Before execution begins, a **dedicated reasoning pass** (separate from planning) scans for conflicts:
+- Do requirements contradict each other?
+- Do acceptance criteria conflict with stated architecture?
+- Are there implicit assumptions in one requirement that violate another?
+
+### Escalation Threshold
+
+- **Impact confined to current task** → decide and document
+- **Impact touches interface contracts** → escalate (wrong interpretation cascades)
+
+Grok adds a **"Multi-Hypothesis Planning"** approach: when underspecification is detected, generate three distinct "Intent Hypotheses" (The Minimalist Path, The Scalable Path, The Feature-Rich Path). If the semantic distance between them exceeds a threshold, hard-halt and present a decision matrix to the human.
+
+### The Deepest Pitfall
+
+Models don't naturally express uncertainty — they pick an interpretation and run with it as if it's obviously correct. The system prompt must explicitly instruct confidence-level flagging, and the orchestrator must treat low-confidence decisions differently from high-confidence ones.
+
+> **Proven result:** Grok reports this pattern cuts wrong-path rework by ~65% in 2026 evaluations.
+
+---
diff --git a/docs/building-coding-agents/13-long-running-memory-fidelity.md b/docs/building-coding-agents/13-long-running-memory-fidelity.md
new file mode 100644
index 000000000..118fd9c0c
--- /dev/null
+++ b/docs/building-coding-agents/13-long-running-memory-fidelity.md
@@ -0,0 +1,34 @@
+# Long-Running Memory Fidelity
+
+**The core problem:** Every compression loses information. Over enough compressions, summaries drift from reality like a photocopy of a photocopy. The system can't easily tell it's happening because it only sees the current summary, not what was lost.
+
+### Multi-Tier Memory with Different Decay Rates
+
+| Tier | Decay Rate | Content | Update Strategy |
+|------|-----------|---------|-----------------|
+| **Manifest** | Fast (updates every task) | Current state only, <1000 tokens | Continuous overwrite — no history |
+| **Decision Log** | Never decays (append-only) | Every significant architectural decision + rationale | Never summarized, grows linearly |
+| **Task Archive** | Medium | Compressed task completion records | Available for retrieval, not routinely loaded |
+
+### The Critical Mechanism: Periodic Reconciliation
+
+All four models converge on some form of automated audit:
+
+- **Claude:** Every milestone or N tasks — agent compares manifest against actual codebase
+- **Gemini:** Every N commits, spawn a "History Auditor" agent whose sole job is manifest-vs-code comparison
+- **GPT:** Self-healing summaries with checksums — when source files change, invalidate and regenerate
+- **Grok:** Deterministic "Memory Fidelity Audit" node every 5 checkpoints — samples key invariants, scores drift 0–100, auto-rebuilds if drift >15%
+
+### The Golden Rule
+
+> **Never summarize summaries.** Each compression layer regenerates from the one below. The codebase is always the lossless source of truth.
+
+### The Most Dangerous Form of Drift
+
+Not factual inaccuracy — **the loss of "why."** The manifest says "auth uses JWT tokens." Three months ago there was a long discussion about why JWT was chosen over session-based auth. That context is exactly what gets compressed away. The **append-only decision log** solves this by preserving *why* indefinitely even as *what* gets continuously compressed.
+
+### Phase Boundary Refresh
+
+For very long projects (weeks/months), **rebuild the manifest from scratch** at phase boundaries by having the agent read the actual codebase + decision log — rather than carrying forward the old manifest with incremental updates. This is the equivalent of defragmenting a hard drive.
+
+---
diff --git a/docs/building-coding-agents/14-multi-agent-semantic-conflict-resolution.md b/docs/building-coding-agents/14-multi-agent-semantic-conflict-resolution.md
new file mode 100644
index 000000000..33acefca5
--- /dev/null
+++ b/docs/building-coding-agents/14-multi-agent-semantic-conflict-resolution.md
@@ -0,0 +1,25 @@
+# Multi-Agent Semantic Conflict Resolution
+
+**The hard case:** Git-level merge conflicts are easy. The real problem is code that merges cleanly but doesn't work — agents honoring the same typed interface while disagreeing on semantics (e.g., Agent A returns `null` for "not found," Agent B treats `null` as "error").
+
+### Three Lines of Defense (Universal Agreement)
+
+#### 1. Semantically Rich Interface Contracts
+
+Don't just define type signatures — define **behavioral contracts**: What does `null` mean? What are the error semantics? What invariants does the caller rely on? Contracts should be miniature specs, not just type definitions.
+
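+A behavioral contract in TypeScript is the type signature plus binding doc comments. The `UserStore` example below is hypothetical; the point is that the comments, not just the types, are shared verbatim with every agent implementing or consuming the interface:
+
+```typescript
+interface User {
+  id: string;
+  email: string;
+}
+
+class UserStoreError extends Error {}
+
+/**
+ * Looks up a user by id.
+ *
+ * Semantics (binding on caller AND implementer):
+ * - Returns null when no user exists. null means "not found", never "error".
+ * - Throws UserStoreError on infrastructure failure (DB down, timeout).
+ * - Never returns a partially populated user.
+ */
+interface UserStore {
+  findById(id: string): Promise<User | null>;
+}
+
+// An implementation that honors the contract above.
+class InMemoryUserStore implements UserStore {
+  constructor(private users: Map<string, User>) {}
+  async findById(id: string): Promise<User | null> {
+    return this.users.get(id) ?? null; // null strictly means "not found"
+  }
+}
+```
+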
+#### 2. Pre-Written Integration Tests
+
+Write integration tests **during planning, before parallel execution begins** — tests that exercise semantic expectations, not just types. These are waiting when parallel branches converge.
+
+#### 3. Dedicated Integration/Reconciliation Agent
+
+After parallel branches merge, a focused agent gets: interface contracts + both implementations + integration tests. Its job is finding semantic mismatches, not rebuilding.
+
+### The Highest-Value Technique
+
+**Adversarial edge-case generation at integration points.** The integration agent reads both implementations, sees how each handles boundaries, and generates new tests that specifically probe the assumption gaps between them. This catches the subtlest bugs.
+
+Gemini adds the concept of a **"Shadow Merge"** agent that runs "Cross-Impact Analysis" before actual merge — looking for "Logical Race Conditions" where Worker A changed a utility that Worker B relied on, even when the git merge is clean.
+
+---
diff --git a/docs/building-coding-agents/15-legacy-code-brownfield-onboarding.md b/docs/building-coding-agents/15-legacy-code-brownfield-onboarding.md
new file mode 100644
index 000000000..0a044d758
--- /dev/null
+++ b/docs/building-coding-agents/15-legacy-code-brownfield-onboarding.md
@@ -0,0 +1,35 @@
+# Legacy Code & Brownfield Onboarding
+
+**The fundamental difference:** Greenfield = design → implement. Brownfield = **observe → infer → validate → modify.**
+
+### The Onboarding Pipeline (All 4 Models Agree)
+
+#### Phase 1: Structural Analysis (Deterministic)
+- Dependency graph mapping
+- Module identification, LOC per component
+- Test coverage analysis, entry point discovery
+- Database schema mapping
+
+#### Phase 2: Convention Extraction (LLM-Assisted)
+- Sample representative files across modules
+- Identify: error handling patterns, naming conventions, API structure, DB access patterns, testing patterns
+- Output: a **conventions document** that becomes critical reference context
+
+#### Phase 3: Pattern Mining
+- Extract implicit "tribal knowledge" — workarounds for browser bugs, special customer cases, performance hacks that look like mistakes
+- Generate decision records into project state
+
+### The Cardinal Rules
+
+| Rule | Why |
+|------|-----|
+| **Observe first, edit later** | Agents must never modify code they don't understand |
+| **Preserve local consistency over global ideals** | Resist the "Junior Refactor" — don't "fix" legacy code to modern standards |
+| **Add characterization tests before modifying** | Tests that document *current behavior*, not *correct behavior* |
+| **Minimal, surgical modifications** | Refactoring is a separate task requiring explicit human approval |
+
+### The Biggest Pitfall
+
+The agent will try to refactor legacy code to match its sense of good patterns. Left unchecked, this produces massive diffs that change behavior in subtle ways. **Enforce strict rules:** modifications to legacy code should be minimal and surgical.
+
+---
diff --git a/docs/building-coding-agents/16-encoding-taste-aesthetics.md b/docs/building-coding-agents/16-encoding-taste-aesthetics.md
new file mode 100644
index 000000000..c57aee799
--- /dev/null
+++ b/docs/building-coding-agents/16-encoding-taste-aesthetics.md
@@ -0,0 +1,34 @@
+# Encoding Taste & Aesthetics
+
+**The honest frontier:** This is where all four models are most candid about current limitations.
+
+### What CAN Be Automated
+
+| Technique | Description |
+|-----------|-------------|
+| **Reference-based extraction** | "Feels like Linear" → extract concrete attributes: spacing ratios, animation timing curves, color relationships, typography |
+| **Style specification** | Convert extracted attributes to verifiable parameters: "transitions 150–200ms ease-out, 8px grid spacing, specific contrast ratios" |
+| **Automated verification** | Lighthouse scores, visual regression tests, accessibility audits, performance budgets, design system linting |
+| **Visual comparison** | Render output, compare against reference screenshots using vision-capable models |
+| **A/B comparison** | Show two versions, human picks which "feels better" — faster than absolute judgment |
+
+### What CANNOT Be Automated
+
+The **gestalt** — the overall feeling, emotional response, sense of quality emerging from a thousand small interacting decisions. *Does this feel premium? Fast? Trustworthy?* These are fundamentally subjective.
+
+### The Optimal Strategy
+
+**Narrow the gap** by converting as much "taste" as possible into **concrete, verifiable specifications upfront:**
+
+- Not "use nice spacing" → "16px between sections, 8px between related elements, 4px between tightly coupled elements"
+- Exact animation timing curves, color values with contrast ratios, typography weights and sizes
+
+Then **reserve human review for the remaining subjective layer** with structured, specific questions:
+
+> "Does the density feel right? Does the transition timing feel snappy enough? Does the empty state feel intentional or broken?"
+
+### The Emerging Frontier
+
+Vision-capable models for aesthetic evaluation — render output, capture screenshot, compare against references on specific visual dimensions. Imperfect but improving rapidly. Grok reports ~80–85% of taste can be automated this way; the remainder stays human-only.
+
+---
diff --git a/docs/building-coding-agents/17-irreversible-operations-safety-architecture.md b/docs/building-coding-agents/17-irreversible-operations-safety-architecture.md
new file mode 100644
index 000000000..cb17b6d6f
--- /dev/null
+++ b/docs/building-coding-agents/17-irreversible-operations-safety-architecture.md
@@ -0,0 +1,31 @@
+# Irreversible Operations & Safety Architecture
+
+**The core principle (universal agreement):** Irreversible operations should **never be executed by the agent.** The agent prepares them; the human executes them.
+
+### Risk-Graded Action Classification
+
+| Class | Examples | Policy |
+|-------|----------|--------|
+| **Reversible** | Code edits, UI changes, unit tests | Full autonomy + auto-revert on failure |
+| **Semi-Reversible** | New files, dependencies | Auto-execute + git checkpoint |
+| **Irreversible** | DB migrations, external API changes, data transformations | Human-in-the-loop required |
+| **External Side-Effect** | Payment charges, third-party API calls with side effects | Human approval + dry-run + rollback plan |
+
+### Per-Operation Protocols
+
+| Operation | Agent Does | Human Does |
+|-----------|-----------|-----------|
+| **Database migrations** | Write migration + rollback + tests, run against test DB, produce review package | Review package, execute migration |
+| **External APIs** | Build + test against sandbox/mock versions | Switch from sandbox to production |
+| **Deployment** | Produce artifacts, verify in staging | Trigger production deployment |
+
+### The Classification Must Be:
+- **Static and deterministic** (not left to the agent's judgment)
+- **Conservative** (if there's doubt, classify as irreversible)
+- **Enforced by the orchestrator** (the agent never encounters an irreversible operation without interception)
+
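+Such a classification can live in a static table the orchestrator consults before any tool call. A sketch with illustrative operation names, defaulting conservatively:
+
+```typescript
+type RiskClass = "reversible" | "semi-reversible" | "irreversible";
+
+// Static table, owned by the orchestrator. The agent never consults it;
+// the orchestrator intercepts every tool call against it.
+const operationRisk: Record<string, RiskClass> = {
+  editFile: "reversible",
+  addDependency: "semi-reversible",
+  runMigration: "irreversible",
+  chargePayment: "irreversible",
+};
+
+function classify(op: string): RiskClass {
+  // Conservative default: unknown operations are treated as irreversible.
+  return operationRisk[op] ?? "irreversible";
+}
+
+function requiresHuman(op: string): boolean {
+  return classify(op) === "irreversible";
+}
+```
+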
+### The Subtlety Most Miss
+
+Data transformations that technically don't delete anything but **lose information through reformatting**. Converting a nullable column to non-nullable with a default value permanently destroys the distinction between rows that had real values and rows that got the default. These must be flagged with the same severity as deletions.
+
+---
diff --git a/docs/building-coding-agents/18-the-handoff-problem-agent-human-maintainability.md b/docs/building-coding-agents/18-the-handoff-problem-agent-human-maintainability.md
new file mode 100644
index 000000000..1220ad599
--- /dev/null
+++ b/docs/building-coding-agents/18-the-handoff-problem-agent-human-maintainability.md
@@ -0,0 +1,31 @@
+# The Handoff Problem: Agent → Human Maintainability
+
+**The failure modes of AI-generated code** that all four models identify:
+
+### Known Anti-Patterns
+
+| Pattern | Problem | Fix |
+|---------|---------|-----|
+| **Flat code** | Everything in one function/file to reduce inconsistency risk | Enforce human-friendly modular patterns |
+| **Clever solutions** | Dense functional chains (`filter().map().reduce().flatMap()`) | Max 3 chained operations; extract named intermediates |
+| **Useless comments** | `// filter active users` above a filter call | Require *why* comments, skip *what* comments |
+| **Over-abstraction** | Creates clever custom abstractions no human can follow | Enforce standard framework patterns over custom inventions |
+| **Missing breadcrumbs** | No README files in directories, no ADRs, no diagrams | Include documentation in task completion checklist |
+
+### The Architecture That Maximizes Handoff Quality
+
+**Enforce well-known frameworks and conventions** over custom patterns. A codebase using standard Next.js/Express/React patterns is immediately navigable. A codebase with custom-invented patterns requires learning a new system.
+
+### Verification Mechanism
+
+**Automated readability test:** Periodically have a **separate agent** (with no knowledge of the building agent's decisions) attempt to add a feature using only the code and docs. If it struggles, a human will too.
+
+### Gemini's "Boring Code" Principle
+
+> Humans hate "clever" AI code; they love "boring" AI code. Run a **Complexity Linter** — if a function has cyclomatic complexity >10, the reviewer agent rejects it.
+
+### Grok's Maintainability Checklist
+
+Every file gets: auto-generated JSDoc/TS comments + ADR for every major decision. No magic numbers, no over-abstraction. Mandatory "maintainability score" (cyclomatic complexity + test coverage + comment density) in the critic node.
+
+---
diff --git a/docs/building-coding-agents/19-when-to-scrap-and-start-over.md b/docs/building-coding-agents/19-when-to-scrap-and-start-over.md
new file mode 100644
index 000000000..35248d46f
--- /dev/null
+++ b/docs/building-coding-agents/19-when-to-scrap-and-start-over.md
@@ -0,0 +1,27 @@
+# When to Scrap and Start Over
+
+### The Four Signals (Cross-Model Convergence)
+
+| Signal | What It Looks Like |
+|--------|-------------------|
+| **Iteration count trending upward** | Task 1: 3 iterations. Task 2: 5. Task 3: 8. Complexity compounding, not resolving. |
+| **Test flakiness increasing** | Previously passing tests intermittently fail — hidden coupling being strained |
+| **Same files modified repeatedly** | Every task touches the same core module — god object absorbing too much responsibility |
+| **Acceptance criteria requiring exceptions** | "Works except when X" / "Passes if you ignore test Y" — agent negotiating with criteria |
+
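+These signals can be tracked numerically. A sketch with illustrative thresholds:
+
+```typescript
+interface TaskSignals {
+  iterations: number[]; // per-task iteration counts, in order
+  flakyTestCount: number;
+  hotFileTouches: Map<string, number>; // file path -> times modified
+}
+
+function iterationsTrendingUp(iterations: number[]): boolean {
+  if (iterations.length < 3) return false;
+  const last3 = iterations.slice(-3);
+  return last3[0] < last3[1] && last3[1] < last3[2];
+}
+
+function shouldReassess(s: TaskSignals): boolean {
+  // Thresholds are illustrative judgment calls, not prescribed values.
+  const hotFile = Math.max(0, ...Array.from(s.hotFileTouches.values()));
+  return iterationsTrendingUp(s.iterations) || s.flakyTestCount > 2 || hotFile > 5;
+}
+```
+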
+### The Reassessment Protocol
+
+When thresholds are crossed, trigger a **focused LLM call** with: manifest + original spec + task summaries + signal data. Prompt: *"Is the current approach viable or would a different architecture serve better? If different, what and why?"*
+
+### The Critical Architectural Enabler: Make Rewrites Cheap
+
+- Clean interface contracts + good test suites → rewriting internals while preserving interfaces is low-risk
+- Tests verify new implementation against same criteria
+- Interface contracts ensure nothing downstream breaks
+- **Every major approach on a branch** that can be discarded without affecting anything else
+
+Gemini's **"Sunk-Cost Heuristic"**: Monitor "Task Re-entry Rate." If the same 3 tests have been attempted >5 times, or if the refactor-to-feature ratio exceeds 4:1, trigger a "Whiteboard Session."
+
+Grok adds **parallel experimentation**: create a "Rewrite Branch" subgraph, run the same vision on a clean slate for one vertical slice, compare metrics. Only merge if superior. Cost is near-zero because it runs in parallel and is discarded on failure.
+
+---
diff --git a/docs/building-coding-agents/20-error-taxonomy-routing.md b/docs/building-coding-agents/20-error-taxonomy-routing.md
new file mode 100644
index 000000000..fa330dd36
--- /dev/null
+++ b/docs/building-coding-agents/20-error-taxonomy-routing.md
@@ -0,0 +1,28 @@
+# Error Taxonomy & Routing
+
+**The key insight:** Different errors have fundamentally different causes and optimal resolution strategies. Treating them uniformly is one of the biggest sources of wasted iterations.
+
+### The Optimal Taxonomy
+
+| Error Class | Context Needed | Optimal Handler | Escalation |
+|-------------|---------------|-----------------|------------|
+| **Syntax/Type** | Error message + offending file + types | Deterministic fast path (no LLM needed) | Only if fast path fails |
+| **Logic** | Failing test (expected vs actual) + implementation + spec | LLM with medium, focused context | After 3 attempts |
+| **Design** | Original spec + architecture + interface contracts + implementation | LLM with broad context | Often needs human input |
+| **Performance** | Profiling data + benchmarks + code | Specialist optimization agent | If regression >2x |
+| **Security** | Static analysis results + secure pattern reference | Conservative fix prompt | Always flag for review |
+| **Environment** | Environment config + recent dep changes + error output | Specialized env context | If not auto-resolved |
+| **Flaky Tests** | Run test multiple times to confirm flakiness | Quarantine, don't fix | Infrastructure agent |
+
+### Critical Routing Rules
+
+- **Flaky tests:** Detect by running failing tests multiple times. If inconsistent, **quarantine** — never trigger a fix cycle.
+- **Environment errors:** Classify as potentially environmental when they appear in build/startup rather than tests.
+- **Security:** Caught by static analysis in the deterministic layer, not by the LLM. Run security linting after every task.
+- **Syntax/Type:** Hit a deterministic fast path first. Missing import? Search codebase for the export. Only escalate to LLM if mechanical fix fails.
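The routing rules above reduce to a lookup from error class to handler and context strategy. A sketch (class names and context labels mirror the taxonomy table but are assumptions, not a fixed schema):

```python
# Illustrative routing table; each class gets its own handler and context recipe.
ROUTES = {
    "syntax":     {"handler": "deterministic-fast-path", "context": ["error", "file", "types"]},
    "logic":      {"handler": "llm-focused", "context": ["failing_test", "implementation", "spec"]},
    "design":     {"handler": "llm-broad",   "context": ["spec", "architecture", "contracts", "implementation"]},
    "flaky-test": {"handler": "quarantine",  "context": []},  # never trigger a fix cycle
}

def route(error_class):
    # Unknown classes fall back to the broad-context path rather than guessing
    return ROUTES.get(error_class, ROUTES["design"])
```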
+
+### The Architecture
+
+The orchestrator classifies every error → selects the appropriate context assembly strategy → optionally selects a different prompt framing. The agent experiences this as *"I got exactly the information I need"* rather than *"I got a dump of everything."*
+
+---
diff --git a/docs/building-coding-agents/21-cost-quality-tradeoff-model-routing.md b/docs/building-coding-agents/21-cost-quality-tradeoff-model-routing.md
new file mode 100644
index 000000000..d98a3c8cb
--- /dev/null
+++ b/docs/building-coding-agents/21-cost-quality-tradeoff-model-routing.md
@@ -0,0 +1,26 @@
+# Cost-Quality Tradeoff & Model Routing
+
+### The Key Insight
+
+Quality requirements vary enormously across task types, but most systems use the same model for everything.
+
+### The Optimal Model Routing Strategy (All 4 Agree)
+
+| Task Type | Model Tier | Rationale |
+|-----------|-----------|-----------|
+| **Planning, architecture, critique** | Frontier (always) | Planning errors cascade through every downstream task |
+| **Ambiguity resolution** | Frontier | Wrong interpretation = wasted execution |
+| **Well-specified implementation** (CRUD, standard UI, utilities) | Mid-tier / capable but cheaper | Task is well-defined, patterns established |
+| **Code review, test generation** | Mid-tier | Evaluating against known criteria, not generating novel solutions |
+| **Summarization** (task records, manifest updates) | Lightest viable | Language competence, minimal reasoning depth |
+| **Boilerplate** | Small/fast model | Predictable output, low reasoning requirements |
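At the orchestrator level this is a small lookup with a deliberately conservative default (tier and task names here are placeholders):

```python
# Sketch of orchestrator-level model routing.
TIER_BY_TASK = {
    "planning": "frontier", "architecture": "frontier", "critique": "frontier",
    "ambiguity-resolution": "frontier",
    "implementation": "mid", "code-review": "mid", "test-generation": "mid",
    "summarization": "light", "boilerplate": "light",
}

def pick_tier(task_type):
    # Default to frontier: over-spending on one call is cheaper than a
    # planning error that cascades through every downstream task.
    return TIER_BY_TASK.get(task_type, "frontier")
```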
+
+### The Non-Obvious Cost Optimization
+
+> **Reducing wasted tokens is higher leverage than reducing token price.** A bloated context window costs money on every single call. Trimming 500 unnecessary tokens from context assembly saves more over a project than switching to a model that's 10% cheaper.
+
+### Measurement
+
+Track **cost-per-successful-task**, not cost-per-task. If the cheaper model requires twice as many iterations, it's not actually cheaper. Grok reports 60-70% cost reduction with zero quality loss when routing is done at the orchestrator level.
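Concretely (a toy calculation under assumed per-call prices, purely for illustration):

```python
def cost_per_successful_task(total_cost, tasks_succeeded):
    """The metric that matters: dollars per task that actually landed."""
    return float("inf") if tasks_succeeded == 0 else total_cost / tasks_succeeded

# A "cheaper" model that needs twice the calls per success can lose outright:
cheap    = cost_per_successful_task(total_cost=20 * 0.10, tasks_succeeded=10)  # 2 calls/success
frontier = cost_per_successful_task(total_cost=10 * 0.15, tasks_succeeded=10)  # 1 call/success
```

Here the nominally cheaper model costs $0.20 per successful task versus $0.15 for the frontier model.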
+
+---
diff --git a/docs/building-coding-agents/22-cross-project-learning-reusable-intelligence.md b/docs/building-coding-agents/22-cross-project-learning-reusable-intelligence.md
new file mode 100644
index 000000000..96ab8687e
--- /dev/null
+++ b/docs/building-coding-agents/22-cross-project-learning-reusable-intelligence.md
@@ -0,0 +1,30 @@
+# Cross-Project Learning & Reusable Intelligence
+
+### What Transfers Well
+
+| Type | Transferability | Example |
+|------|----------------|---------|
+| **Problem-solving patterns** (abstract) | ✅ High | "When implementing OAuth, these are the common pitfalls and the architecture that avoids them" |
+| **Code templates & scaffolding** | ✅ With adaptation | Proven auth module structure, tested payment integration pattern |
+| **Learned pitfalls** | ✅ High | "When integrating Stripe, these are the edge cases around webhooks that most implementations miss" |
+| **Project-specific conventions** | ❌ Does not transfer | Architectural decisions are contextual |
+| **Domain logic** | ❌ Does not transfer | Business rules are project-specific |
+
+### The Optimal Architecture: A Pattern Library
+
+Each pattern includes:
+- Description of the problem it solves
+- The approach and tradeoffs
+- Common pitfalls
+- Verification tests
+- Reference implementation
+
+### Growth Through Extraction, Not Manual Curation
+
+When a task completes with high quality (first-attempt success, no subsequent modifications, clean review), flag it as a **candidate for pattern extraction.** A dedicated pass determines whether the solution embodies a generalizable pattern.
+
+### The Critical Constraint
+
+Patterns should be **descriptive, not prescriptive** — "here's an approach that has worked well, with these tradeoffs" not "always do it this way." Grok adds an overfitting guard: require **3+ project examples** before promoting to reusable.
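Taken together, a pattern record might look like this (field names follow the list above; the schema itself is an assumption, not a defined format):

```python
from dataclasses import dataclass

@dataclass
class Pattern:
    problem: str              # description of the problem it solves
    approach: str             # the approach and its tradeoffs
    pitfalls: list            # common pitfalls
    verification_tests: list  # how to check an instance works
    reference_impl: str       # pointer to a reference implementation
    projects_seen: int = 1    # overfitting guard: promote only at 3+ projects

    def promotable(self) -> bool:
        return self.projects_seen >= 3
```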
+
+---
diff --git a/docs/building-coding-agents/23-evolution-across-project-scale.md b/docs/building-coding-agents/23-evolution-across-project-scale.md
new file mode 100644
index 000000000..085acb8f5
--- /dev/null
+++ b/docs/building-coding-agents/23-evolution-across-project-scale.md
@@ -0,0 +1,31 @@
+# Evolution Across Project Scale
+
+### Phase Transitions (All 4 Models Converge)
+
+#### 0–1k LOC: The Monolithic Phase
+- Everything fits in one context window
+- Agent reads entire codebase, makes globally coherent decisions
+- Orchestrator is simple, manifest barely needed
+- **This is where most demos live**
+
+#### 1k–10k LOC: The Modular Phase
+- Codebase no longer fits in one context window
+- **What breaks first: consistency** — agent sees fragments that gradually diverge
+- Requirements: modular context assembly, manifest as essential map, interface contracts, convention enforcement (linting, formatting)
+
+#### 10k–50k LOC: The Architectural Phase
+- Relationships between components become non-obvious
+- Changing one thing might affect ten others through indirect dependencies
+- **What breaks:** planning quality — planner can't understand full system
+- Requirements: dependency-aware context assembly, impact analysis before execution, more conservative/incremental plans
+
+#### 50k–100k+ LOC: The Organizational Phase
+- System of systems — no single agent context can reason about the whole thing
+- **What breaks:** integration — interactions between components become so numerous that integration testing becomes the bottleneck
+- Requirements: hierarchical planning (system-level planner → component-level agents), continuous integration verification, possibly distributed orchestrator, hierarchy of manifests
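The phase boundaries above can be read as a simple strategy selector (thresholds are the approximate ones from the headings, not hard rules):

```python
def context_strategy(loc):
    if loc < 1_000:
        return "monolithic"      # whole codebase fits in one context window
    if loc < 10_000:
        return "modular"         # manifest as map, modular context assembly
    if loc < 50_000:
        return "architectural"   # dependency-aware assembly, impact analysis
    return "organizational"      # hierarchical planning, hierarchy of manifests
```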
+
+### The Meta-Insight
+
+> The architecture of your agentic system should **mirror the architecture of the software it's building.** Microservices projects need a more distributed orchestrator. Monolithic projects can use a simpler one.
+
+---
diff --git a/docs/building-coding-agents/24-security-trust-boundaries.md b/docs/building-coding-agents/24-security-trust-boundaries.md
new file mode 100644
index 000000000..b2d2c1e68
--- /dev/null
+++ b/docs/building-coding-agents/24-security-trust-boundaries.md
@@ -0,0 +1,35 @@
+# Security & Trust Boundaries
+
+### Hard Boundaries — Things the Agent Should NEVER Do (Universal Agreement)
+
+| Forbidden Action | Why |
+|-----------------|-----|
+| Access production systems directly | Agent's world is the dev environment, full stop |
+| Access or embed secrets | API keys and credentials must never appear in agent context or output |
+| Make network requests to arbitrary destinations | Restrictive firewall, whitelist only required services |
+| Modify its own orchestrator, prompts, or tools | Prevents removing safety constraints |
+| Execute commands outside the project directory | Sandbox to project dir + temp working dirs only |
+
+### The Sandboxing Architecture
+
+| Layer | Mechanism |
+|-------|-----------|
+| **Execution** | Containerized (Docker + seccomp), restricted filesystem, network policy |
+| **Filesystem** | Content-addressable storage — agent *proposes* changes, backend validates before writing |
+| **Secrets** | Vault proxy with short-lived tokens, never direct credentials |
+| **Commands** | Parsed and blocked for dangerous patterns (`rm -rf /`, `curl` to unknown hosts) |
+| **Dependencies** | Approved dependency list — new deps are auto-approved if on the pre-approved list, otherwise they require human approval |
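The command-parsing layer can start as a pattern filter. A deliberately tiny sketch (the patterns and the `api.internal` allow-listed host are invented; real deployments need a far fuller list and should prefer allow-lists over block-lists):

```python
import re

# Hypothetical block-list of dangerous command shapes.
DANGEROUS = [
    r"rm\s+-rf\s+/\s*$",                 # recursive delete from filesystem root
    r"\bcurl\b(?!.*\bapi\.internal\b)",  # network call to any non-approved host
]

def is_blocked(command):
    return any(re.search(pattern, command) for pattern in DANGEROUS)
```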
+
+### The Capability-Based Security Model
+
+The orchestrator runs **outside** the sandbox. The agent requests operations through a controlled API. The orchestrator validates every request before executing. The agent doesn't have direct access to anything — it has access to **tools that the orchestrator mediates.**
+
+### The Subtle Risk
+
+The agent introduces vulnerabilities not through malice but through **plausible-looking insecure patterns**: string concatenation for SQL queries, disabling CORS for convenience, logging sensitive data for debugging. Security linting rules should be tuned to catch these **AI-common patterns** specifically.
+
+### The Trust Model
+
+> Think of the agent as a **highly capable but unvetted contractor.** Give them the codebase and dev environment. Don't give them production credentials, deployment access, or the ability to modify security infrastructure. The goal isn't to make the agent safe by limiting capabilities — it's to make the **environment** safe so the agent can be maximally capable within it.
+
+---
diff --git a/docs/building-coding-agents/25-designing-for-non-technical-users-vibe-coders.md b/docs/building-coding-agents/25-designing-for-non-technical-users-vibe-coders.md
new file mode 100644
index 000000000..b67515ef2
--- /dev/null
+++ b/docs/building-coding-agents/25-designing-for-non-technical-users-vibe-coders.md
@@ -0,0 +1,116 @@
+# Designing for Non-Technical Users ("Vibe Coders")
+
+**The question that matters most** — because everything else is worthless if only engineers can use it.
+
+### The Fundamental Principle (All 4 Models Converge)
+
+> The human should **never have to think in code.** Not in input, not in output, not in error messages, not in verification, not in debugging. The entire technical layer should be absorbed by the system. The human operates purely in **intent, vision, preference, and judgment.**
+
+### The Core Philosophy
+
+| Human Provides | System Provides |
+|---------------|-----------------|
+| Vision & imagination | Engineering intelligence |
+| Taste & aesthetic judgment | Technical translation |
+| Direction & priorities | Architecture & implementation |
+| "This feels off — calmer, like Linear" | Concrete CSS/animation/spacing changes |
+
+### The 10 Pillars of a Magical Non-Technical Experience
+
+#### 1. Intent-Based Input, Not Specification
+Users speak naturally: *"I want an app where people can upload recipes and find them by ingredient."* The system runs a **discovery conversation** that feels like talking to a brilliant product partner — not filling out a requirements form. Behind the scenes, answers compile into structured specs, acceptance criteria, and interface contracts the human never sees.
+
+> **Critical:** Questions should be about the *experience*, not the *implementation.* Never "relational or document store?" Always "should search find exact matches only, or also substitutable ingredients?"
+
+#### 2. Show the Thing, Not the Process
+After each milestone: a **working preview**, not a task list. The human interacts with the real thing at every checkpoint — clicks around, feels it, reacts. Progress is communicated as capability, not code: *"Your app can now save workouts and retrieve them later"* — not *"implemented REST endpoint."*
+
+#### 3. Collaborative Builder, Not Command Executor
+The agent should feel like a senior co-founder:
+```
+User: I want something like Notion but for recipes.
+
+Agent: Here's how I'd approach that:
+- Recipe database with tagging
+- Search by ingredient
+- Meal planner
+
+Would you like to prioritize simplicity or advanced features?
+```
+This implicitly educates the user while avoiding wrong builds from vague specs.
+
+#### 4. Problems, Not Errors
+The human should **never see a stack trace**. Technical failures are either resolved silently or translated to domain-level questions:
+
+| ❌ Never Show | ✅ Show Instead |
+|--------------|----------------|
+| `TypeError: Cannot read property 'map' of undefined` | "The recipe list isn't displaying correctly. I'm fixing it now — should be ready in a few minutes." |
+| `ECONNREFUSED localhost:5432` | "I'm having trouble connecting to the database. Working on it." |
+| Ambiguous technical decision | "When someone searches 'chicken,' should results include recipes where chicken is optional?" |
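A sketch of that translation layer (the marker-to-message mapping and the phrasings are invented for illustration):

```python
# Maps low-level error markers to user-facing status messages.
TRANSLATIONS = {
    "TypeError":    "Part of the app isn't displaying correctly. I'm fixing it now.",
    "ECONNREFUSED": "I'm having trouble connecting to the database. Working on it.",
}

def humanize(error_text):
    for marker, message in TRANSLATIONS.items():
        if marker in error_text:
            return message
    # Unknown failures still never leak a stack trace
    return "Something went wrong behind the scenes. I'm on it."
```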
+
+#### 5. Reactions, Not Reviews
+Design for **reactions** to the running app, not code reviews. Like working with an interior designer: *"I love the color but the couch feels too big."* Visual, spatial, experiential feedback. **A/B comparison** is the most powerful pattern: show two versions, human picks which "feels better" in seconds.
+
+#### 6. Engineering Tradeoffs as Simple Choices
+Instead of *"Which auth provider?"* → ask *"Which matters more: A) Simplicity B) Maximum customization C) Enterprise security"* — the system maps answers to technical decisions automatically.
+
+#### 7. Safety Blanket
+- Auto-backups every slice + "undo entire feature" button
+- **"Vibe Checkpoints"** — before every major change, a save point. "Go back to how it was ten minutes ago."
+- Deployment previews before anything goes live
+- No irreversible actions without plain-English confirmation
+
+#### 8. Progressive Disclosure
+Start ultra-simple. Offer "Advanced mode" toggle only if the user ever asks. The system should **progressively reveal engineering** — at first pure vision → later architecture tweaking → eventually deep collaboration. Many users will never leave the simple mode, and that's fine.
+
+#### 9. Implicit Teaching
+When the user asks *"why is that taking longer?"*:
+> "The recipe search needs to look through all recipes every time. I'm adding an index — think of it like a table of contents — so it can find things faster."
+
+Optional, triggered by curiosity, expressed in analogy. Over time, users develop useful mental models of software **without it ever being mandatory.**
+
+#### 10. Invisible Deployment & Operations
+"I want to share this with people" → receive a URL. Behind the scenes: hosting, domain, database, SSL, CI/CD. Ongoing maintenance equally invisible. Simple dashboard: *"Your recipe app had 340 visitors this week. Everything is running smoothly."*
+
+### The Translation Layer (The Magic Glue)
+
+A deterministic "Human Translator" node at the front of every orchestrator cycle:
+
+```
+Raw user message + references
+ ↓
+ [Human Translator]
+ ↓
+Precise assumptions, invariants, success criteria
+ ↓
+ [Rest of the god-tier orchestrator pipeline]
+```
+
+The rest of the graph never sees "vibe language" — only clean spec. This preserves all technical quality while shielding the user.
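In code, the translator is a deterministic wrapper around one focused LLM call (the prompt wording and the `llm` callable are placeholders, not a real API):

```python
def translate_intent(raw_message, llm):
    """Front-of-pipeline 'Human Translator' node (sketch)."""
    prompt = (
        "Rewrite the user's request as precise assumptions, invariants, and "
        "success criteria. No implementation details, no vibe language.\n\n"
        "Request: " + raw_message
    )
    # Downstream nodes receive only this cleaned spec, never the raw message
    return llm(prompt)
```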
+
+### The Scope Protection Layer
+
+Non-technical users often don't realize how complex their requests are. The system must be honest — gently:
+
+> *"That's a great idea. Adding social features is significant — it involves user profiles, a follow system, a feed algorithm, and notifications. It'll take as long as everything we've built so far. Want me to go ahead, or finish core recipe features first?"*
+
+This respects agency while providing the information needed for good decisions.
+
+### The Meta-Principle
+
+> The system is a **creative tool**, not a development tool. It should feel like Photoshop or Ableton — a powerful instrument that lets a person with vision manifest that vision without understanding the underlying mechanics. A music producer doesn't need to understand digital signal processing. A filmmaker doesn't need to understand codec compression. **A person with a great app idea shouldn't need to understand React component lifecycle.**
+
+### What Makes It Feel Magical
+
+The most powerful systems feel magical when they:
+- Understand vague ideas
+- Ask smart clarifying questions
+- Translate intent into architecture
+- Show visible progress quickly
+- Make experimentation safe
+- Explain decisions clearly
+- Hide complexity without blocking power
+
+> When these align, the user experiences: **"I can build anything I imagine."** That feeling is the real product.
+
+---
diff --git a/docs/building-coding-agents/26-cross-cutting-themes-where-all-4-models-converge.md b/docs/building-coding-agents/26-cross-cutting-themes-where-all-4-models-converge.md
new file mode 100644
index 000000000..07fa2a984
--- /dev/null
+++ b/docs/building-coding-agents/26-cross-cutting-themes-where-all-4-models-converge.md
@@ -0,0 +1,33 @@
+# Cross-Cutting Themes (Where All 4 Models Converge)
+
+### Original Themes (Reinforced)
+
+These ideas appeared independently in all four conversations across both rounds, indicating the highest-confidence principles:
+
+1. **The LLM should only do what requires judgment.** Everything deterministic belongs in code.
+2. **Vertical slices are non-negotiable.** End-to-end working increments at every stage.
+3. **Context leanness = quality.** Less (but more relevant) context produces better outputs than more context.
+4. **Execution-based verification beats self-assessment.** Run the code. Trust test results over the model's opinion.
+5. **The orchestrator is the product.** The model is a commodity; the system around it is the differentiator.
+6. **State must be structured and deterministic.** Never let the LLM manage its own lifecycle or memory.
+7. **Speed comes from removing unnecessary work.** Not from doing the same work faster.
+8. **Failure recovery matters more than happy-path perfection.** Design the error paths first.
+9. **Human involvement should be high-leverage.** Specific questions with context, not open-ended reviews.
+10. **The system improves over time.** Track patterns, cache solutions, learn from failures.
+
+### New Themes (From Grey Area Deep-Dives)
+
+11. **Document assumptions, don't ask about every one.** Proceed with sensible defaults + transparent logging. Review at milestones, not in real-time.
+12. **The codebase is the lossless source of truth.** Summaries are lossy caches that must be periodically reconciled against actual code. Never summarize summaries.
+13. **Semantic conflicts are harder than syntactic ones.** Interface contracts must be behavioral specs, not just type signatures. Integration testing is a first-class concern, not an afterthought.
+14. **Observe before modifying.** Especially in legacy codebases — the agent must understand existing patterns before changing them. Preserve local consistency over global ideals.
+15. **Taste can be ~80-85% automated.** Convert subjective preferences to concrete, verifiable specs. Reserve human judgment for the remaining gestalt. The gap is closing fast with vision-capable models.
+16. **Irreversible operations are categorically different.** The agent prepares; the human executes. No exceptions.
+17. **"Boring" code is good code.** For handoff, enforce standard patterns, limit complexity, and write *why* comments. Automated readability testing catches problems before humans encounter them.
+18. **Make rewrites cheap, not rare.** Clean interfaces + good tests + branch-based experimentation = rewriting is a safe, routine operation rather than a crisis.
+19. **Route errors by type, not by severity.** Different error classes need different context, different handlers, and different escalation thresholds. Flaky tests should be quarantined, not fixed.
+20. **The magic is the translation layer.** For non-technical users, the entire value proposition is the invisible bridge between human intent and technical execution. Every moment the user has to think like a developer is a failure.
+
+---
+
+*Generated March 2026. Updated with grey-area deep-dive synthesis. Source material: two rounds of parallel deep-dive conversations with Claude (Anthropic), Gemini (Google), GPT (OpenAI), and Grok (xAI) on optimal autonomous AI coding agent architecture — including the 13 hardest unsolved problems and designing for non-technical users.*
diff --git a/docs/building-coding-agents/README.md b/docs/building-coding-agents/README.md
new file mode 100644
index 000000000..ea5b8eb81
--- /dev/null
+++ b/docs/building-coding-agents/README.md
@@ -0,0 +1,37 @@
+# Building Coding Agents — Research
+
+> Split into individual files for easier consumption.
+
+## Table of Contents
+
+- [01. Work Decomposition](./01-work-decomposition.md)
+- [02. What to Keep & Discard from Human Engineering](./02-what-to-keep-discard-from-human-engineering.md)
+- [03. State Machine & Context Management](./03-state-machine-context-management.md)
+- [04. Optimal Storage for Project Context](./04-optimal-storage-for-project-context.md)
+- [05. Parallelization Strategy](./05-parallelization-strategy.md)
+- [06. Maximizing Agent Autonomy & Superpowers](./06-maximizing-agent-autonomy-superpowers.md)
+- [07. System Prompt & LLM vs Deterministic Split](./07-system-prompt-llm-vs-deterministic-split.md)
+- [08. Speed Optimization](./08-speed-optimization.md)
+- [09. Top 10 Tips for a World-Class Agent](./09-top-10-tips-for-a-world-class-agent.md)
+- [10. Top 10 Pitfalls to Avoid](./10-top-10-pitfalls-to-avoid.md)
+- [11. God-Tier Context Engineering](./11-god-tier-context-engineering.md)
+- [12. Handling Ambiguity & Contradiction](./12-handling-ambiguity-contradiction.md)
+- [13. Long-Running Memory Fidelity](./13-long-running-memory-fidelity.md)
+- [14. Multi-Agent Semantic Conflict Resolution](./14-multi-agent-semantic-conflict-resolution.md)
+- [15. Legacy Code & Brownfield Onboarding](./15-legacy-code-brownfield-onboarding.md)
+- [16. Encoding Taste & Aesthetics](./16-encoding-taste-aesthetics.md)
+- [17. Irreversible Operations & Safety Architecture](./17-irreversible-operations-safety-architecture.md)
+- [18. The Handoff Problem: Agent → Human Maintainability](./18-the-handoff-problem-agent-human-maintainability.md)
+- [19. When to Scrap and Start Over](./19-when-to-scrap-and-start-over.md)
+- [20. Error Taxonomy & Routing](./20-error-taxonomy-routing.md)
+- [21. Cost-Quality Tradeoff & Model Routing](./21-cost-quality-tradeoff-model-routing.md)
+- [22. Cross-Project Learning & Reusable Intelligence](./22-cross-project-learning-reusable-intelligence.md)
+- [23. Evolution Across Project Scale](./23-evolution-across-project-scale.md)
+- [24. Security & Trust Boundaries](./24-security-trust-boundaries.md)
+- [25. Designing for Non-Technical Users ("Vibe Coders")](./25-designing-for-non-technical-users-vibe-coders.md)
+- [26. Cross-Cutting Themes (Where All 4 Models Converge)](./26-cross-cutting-themes-where-all-4-models-converge.md)
+
+---
+
+*Split into per-section files for surgical context loading.*
+
diff --git a/docs/captures-triage.md b/docs/captures-triage.md
new file mode 100644
index 000000000..1c5f7e3f7
--- /dev/null
+++ b/docs/captures-triage.md
@@ -0,0 +1,82 @@
+# Captures & Triage
+
+*Introduced in v2.19.0*
+
+Captures let you fire-and-forget thoughts during auto-mode execution. Instead of pausing auto-mode to steer, you can capture ideas, bugs, or scope changes and let GSD triage them at natural seams between tasks.
+
+## Quick Start
+
+While auto-mode is running (or any time):
+
+```
+/gsd capture "add rate limiting to the API endpoints"
+/gsd capture "the auth flow should support OAuth, not just JWT"
+```
+
+Captures are appended to `.gsd/CAPTURES.md` and triaged automatically between tasks.
+
+## How It Works
+
+### Pipeline
+
+```
+capture → triage → confirm → resolve → resume
+```
+
+1. **Capture** — `/gsd capture "thought"` appends to `.gsd/CAPTURES.md` with a timestamp and unique ID
+2. **Triage** — at natural seams between tasks (in `handleAgentEnd`), GSD detects pending captures and classifies them
+3. **Confirm** — the user is shown the proposed resolution and confirms or adjusts
+4. **Resolve** — the resolution is applied (task injection, replan trigger, deferral, etc.)
+5. **Resume** — auto-mode continues
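For illustration, an appended entry might look like the sketch below; the actual `CAPTURES.md` entry format is defined by GSD and may differ:

```python
import time
import uuid

def format_capture(text):
    # Hypothetical entry shape: short unique id + UTC timestamp + raw thought.
    stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
    cid = uuid.uuid4().hex[:8]
    return f"- [{cid}] {stamp} {text}"
```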
+
+### Classification Types
+
+Each capture is classified into one of five types:
+
+| Type | Meaning | Resolution |
+|------|---------|------------|
+| `quick-task` | Small, self-contained fix | Inline quick task executed immediately |
+| `inject` | New task needed in current slice | Task injected into the active slice plan |
+| `defer` | Important but not urgent | Deferred to roadmap reassessment |
+| `replan` | Changes the current approach | Triggers slice replan with capture context |
+| `note` | Informational, no action needed | Acknowledged, no plan changes |
+
+### Automatic Triage
+
+Triage fires automatically between tasks during auto-mode. The triage prompt receives:
+- All pending captures
+- The current slice plan
+- The active roadmap
+
+The LLM classifies each capture and proposes a resolution. Plan-modifying resolutions (inject, replan) require user confirmation.
+
+### Manual Triage
+
+Trigger triage manually at any time:
+
+```
+/gsd triage
+```
+
+This is useful when you've accumulated several captures and want to process them before the next natural seam.
+
+## Dashboard Integration
+
+The progress widget shows a pending capture count badge when captures are waiting for triage. This is visible in both the `Ctrl+Alt+G` dashboard and the auto-mode progress widget.
+
+## Context Injection
+
+Capture context is automatically injected into:
+- **Replan-slice prompts** — so the replan knows what triggered it
+- **Reassess-roadmap prompts** — so deferred captures influence roadmap decisions
+
+## Worktree Awareness
+
+Captures always resolve to the **original project root's** `.gsd/CAPTURES.md`, not the worktree's local copy. This ensures captures from a steering terminal are visible to the auto-mode session running in a worktree.
+
+## Commands
+
+| Command | Description |
+|---------|-------------|
+| `/gsd capture "text"` | Capture a thought (quotes optional for single words) |
+| `/gsd triage` | Manually trigger triage of pending captures |
diff --git a/docs/ci-cd-pipeline.md b/docs/ci-cd-pipeline.md
new file mode 100644
index 000000000..80410d124
--- /dev/null
+++ b/docs/ci-cd-pipeline.md
@@ -0,0 +1,196 @@
+# CI/CD Pipeline Guide
+
+## Overview
+
+GSD 2 uses a three-stage promotion pipeline that automatically moves merged PRs through **Dev → Test → Prod** environments using npm dist-tags.
+
+```
+PR merged to main
+ │
+ ▼
+ ┌─────────┐ ci.yml passes (build, test, typecheck)
+ │ DEV │ → publishes gsd-pi@-dev. with @dev tag
+ └────┬────┘
+ ▼ (automatic if green)
+ ┌─────────┐ CLI smoke tests + LLM fixture replay
+ │ TEST │ → promotes to @next tag
+ └────┬────┘ → pushes Docker image as :next
+ ▼ (manual approval required)
+ ┌─────────┐ optional real-LLM integration tests
+ │ PROD │ → promotes to @latest tag
+ └─────────┘ → creates GitHub Release
+```
+
+## For Contributors: Testing Your PR Before It Ships
+
+### Install the Dev Build
+
+Every merged PR is immediately installable:
+
+```bash
+# Latest dev build (bleeding edge, every merged PR)
+npx gsd-pi@dev
+
+# Test candidate (passed smoke + fixture tests)
+npx gsd-pi@next
+
+# Stable production release
+npx gsd-pi@latest # or just: npx gsd-pi
+```
+
+### Using Docker
+
+```bash
+# Test candidate
+docker run --rm -v $(pwd):/workspace ghcr.io/gsd-build/gsd-pi:next --version
+
+# Stable
+docker run --rm -v $(pwd):/workspace ghcr.io/gsd-build/gsd-pi:latest --version
+```
+
+### Checking if a Fix Landed
+
+1. Find the PR's merge commit SHA (first 7 chars)
+2. Check if it's in `@dev`: `npm view gsd-pi@dev version`
+ - If the version ends in `-dev.`, your PR is in dev
+3. Check if it promoted to `@next`: `npm view gsd-pi@next version`
+4. Check if it's in production: `npm view gsd-pi@latest version`
+
+## For Maintainers
+
+### Pipeline Workflows
+
+| Workflow | File | Trigger | Purpose |
+|----------|------|---------|---------|
+| CI | `ci.yml` | PR + push to main | Build, test, typecheck — **gate for all promotions** |
+| Release Pipeline | `pipeline.yml` | After CI succeeds on main | Three-stage promotion |
+| Native Binaries | `build-native.yml` | `v*` tags | Cross-compile platform binaries |
+| Dev Cleanup | `cleanup-dev-versions.yml` | Weekly (Monday 06:00 UTC) | Unpublish `-dev.` versions older than 30 days |
+| AI Triage | `triage.yml` | New issues + PRs | Automated classification via Claude Haiku (v2.36) |
+
+**CI optimization (v2.38):** GitHub Actions minutes were reduced ~60-70% (~10k → ~3-4k/month) through workflow consolidation and caching improvements.
+
+**Pipeline optimization (v2.41):**
+- **Shallow clones** — CI lint and build jobs use `fetch-depth: 1` or `fetch-depth: 2` instead of full history, saving ~30-60s per job
+- **npm cache in pipeline** — dev-publish, test-verify, and prod-release now use `cache: 'npm'` on setup-node, saving ~1-2 min per job on repeat runs
+- **Exponential backoff** — npm registry propagation waits in `build-native.yml` replaced hardcoded `sleep 30` + fixed 15s retries with exponential backoff (5s → 10s → 20s → 30s cap), typically finishing in <15s when the registry is fast
+- **Security hardening** — pipeline.yml moved `${{ }}` expressions from `run:` blocks to `env:` variables to prevent command injection vectors
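The backoff schedule described above can be expressed as a generator (a sketch of the delays only, not the actual workflow code):

```python
from itertools import islice

def backoff_delays(start=5, cap=30):
    """Yield 5, 10, 20, 30, 30, ... seconds between registry checks."""
    delay = start
    while True:
        yield delay
        delay = min(delay * 2, cap)
```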
+
+### Docs-Only PR Detection (v2.41)
+
+CI automatically detects when a PR contains only documentation changes (`.md` files and `docs/` content). When docs-only:
+
+- **Skipped:** `build`, `windows-portability` (no code to compile or test)
+- **Still runs:** `lint` (secret scanning, `.gsd/` check), `docs-check` (prompt injection scan)
+
+This saves CI minutes on documentation PRs while still enforcing security checks.
+
+### Prompt Injection Scan (v2.41)
+
+The `docs-check` job runs `scripts/docs-prompt-injection-scan.sh` on every PR that touches markdown files. It scans documentation prose (excluding fenced code blocks) for patterns that could manipulate LLM behavior when docs are ingested as context:
+
+- **System prompt markers** — `<|im_start|>system`, `[SYSTEM]:`
+- **Role/instruction overrides** — `ignore previous instructions`, `you are now`, `new instructions:`
+- **Hidden HTML directives** — instructions hidden in HTML comments
+
+```
+---
+description: Review staged git changes
+---
+Review the staged changes (`git diff --cached`). Focus on:
+- Bugs and logic errors
+- Security issues
+- Performance problems
+Focus area: $1
+```
+
+Usage: `/review "error handling"` → expands with `$1` = "error handling"
+
+**Placement:**
+- `~/.gsd/agent/prompts/` (global)
+- `.gsd/prompts/` (project-local)
+
+### Themes
+
+JSON files defining the color palette for the TUI. Hot-reload: edit the file and pi applies changes immediately.
+
+**Built-in:** `dark`, `light`
+
+**Placement:**
+- `~/.gsd/agent/themes/` (global)
+- `.gsd/themes/` (project-local)
+
+---
diff --git a/docs/what-is-pi/10-providers-models-multi-model-by-default.md b/docs/what-is-pi/10-providers-models-multi-model-by-default.md
new file mode 100644
index 000000000..f218ff10d
--- /dev/null
+++ b/docs/what-is-pi/10-providers-models-multi-model-by-default.md
@@ -0,0 +1,58 @@
+# Providers & Models — Multi-Model by Default
+
+Pi isn't locked to one provider. It supports 20+ providers out of the box and lets you add more.
+
+### Authentication Methods
+
+**OAuth subscriptions (via `/login`):**
+- Anthropic Claude Pro/Max
+- OpenAI ChatGPT Plus/Pro (Codex)
+- GitHub Copilot
+- Google Gemini CLI
+- Google Antigravity
+
+**API keys (via environment variables):**
+- Anthropic, Anthropic (Vertex AI), OpenAI, Azure OpenAI, Google Gemini, Google Vertex, Amazon Bedrock
+- Mistral, Groq, Cerebras, xAI, OpenRouter, Vercel AI Gateway
+- ZAI, OpenCode Zen, OpenCode Go, Hugging Face, Kimi, MiniMax
+
+### Model Switching
+
+You can switch models at any time during a conversation:
+
+- `/model` — Open the model selector
+- `Ctrl+L` — Same as `/model`
+- `Ctrl+P` / `Shift+Ctrl+P` — Cycle through scoped models
+- `Shift+Tab` — Cycle thinking level
+
+Model changes are recorded in the session as `model_change` entries, so when you resume a session, pi knows which model you were using.
+
+### CLI Model Selection
+
+```bash
+pi --model sonnet # Fuzzy match
+pi --model openai/gpt-4o # Provider/model
+pi --model sonnet:high # With thinking level
+pi --models "claude-*,gpt-4o" # Scope models for Ctrl+P cycling
+pi --list-models # List all available
+pi --list-models gemini # Search by name
+```
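A sketch of how the `provider/model:thinking` syntax above could be parsed (`parseModelArg` and the `ModelSpec` shape are assumptions for illustration, not pi's API):

```typescript
// Hypothetical parse of the --model argument: "provider/model:thinking".
// Both provider and thinking level are optional.
interface ModelSpec {
  provider?: string;
  model: string;
  thinking?: string;
}

function parseModelArg(arg: string): ModelSpec {
  const [idPart, thinking] = arg.split(":");
  const slash = idPart.indexOf("/");
  return slash === -1
    ? { model: idPart, thinking }
    : { provider: idPart.slice(0, slash), model: idPart.slice(slash + 1), thinking };
}
```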
+
+### Custom Providers
+
+Add providers via `~/.gsd/agent/models.json` (simple) or extensions (advanced with OAuth, custom streaming):
+
+```json
+// ~/.gsd/agent/models.json
+{
+ "providers": [{
+ "name": "my-proxy",
+ "baseUrl": "https://proxy.example.com",
+ "apiKey": "PROXY_API_KEY",
+ "api": "anthropic-messages",
+ "models": [{ "id": "claude-sonnet-4", "name": "Sonnet via Proxy", ... }]
+ }]
+}
+```
+
+---
diff --git a/docs/what-is-pi/11-the-interactive-tui.md b/docs/what-is-pi/11-the-interactive-tui.md
new file mode 100644
index 000000000..9da934044
--- /dev/null
+++ b/docs/what-is-pi/11-the-interactive-tui.md
@@ -0,0 +1,50 @@
+# The Interactive TUI
+
+Pi's terminal interface is built with a custom TUI framework (`@mariozechner/pi-tui`).
+
+### Layout (top to bottom)
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│ Startup Header │
+│ Shows: shortcuts, loaded AGENTS.md files, prompts, │
+│ skills, extensions │
+├─────────────────────────────────────────────────────────────┤
+│ │
+│ Messages Area │
+│ User messages, assistant responses, tool calls/results, │
+│ notifications, errors, extension UI │
+│ │
+├─────────────────────────────────────────────────────────────┤
+│ [Widgets above editor - from extensions] │
+├─────────────────────────────────────────────────────────────┤
+│ Editor (input area) │
+│ Border color = thinking level │
+├─────────────────────────────────────────────────────────────┤
+│ [Widgets below editor - from extensions] │
+├─────────────────────────────────────────────────────────────┤
+│ Footer: cwd │ session name │ tokens │ cost │ context │ model│
+│ [Extension status indicators] │
+└─────────────────────────────────────────────────────────────┘
+```
+
+### Editor Features
+
+| Feature | How |
+|---------|-----|
+| File reference | Type `@` to fuzzy-search project files |
+| Path completion | Tab to complete paths |
+| Multi-line | Shift+Enter |
+| Images | Ctrl+V to paste, or drag onto terminal |
+| Bash commands | `!command` (sends output to LLM), `!!command` (runs without sending) |
+| External editor | Ctrl+G opens `$VISUAL` or `$EDITOR` |
+
+### Tool Output Display
+
+Tool calls and results are rendered inline with collapsible output:
+- `Ctrl+O` — Toggle expand/collapse all tool output
+- `Ctrl+T` — Toggle expand/collapse thinking blocks
+
+Extensions can provide custom renderers for their tools, controlling exactly how tool calls and results appear.
+
+---
diff --git a/docs/what-is-pi/12-the-message-queue-talking-while-pi-thinks.md b/docs/what-is-pi/12-the-message-queue-talking-while-pi-thinks.md
new file mode 100644
index 000000000..952c0f1c8
--- /dev/null
+++ b/docs/what-is-pi/12-the-message-queue-talking-while-pi-thinks.md
@@ -0,0 +1,20 @@
+# The Message Queue — Talking While Pi Thinks
+
+Pi doesn't make you wait for the agent to finish before sending more instructions. You can queue messages while the agent is streaming:
+
+| Key | Behavior |
+|-----|----------|
+| **Enter** | Queue a **steering** message — delivered after current tool, interrupts remaining tools |
+| **Alt+Enter** | Queue a **follow-up** message — delivered after agent finishes all work |
+| **Escape** | Abort the agent and restore queued messages to editor |
+| **Alt+Up** | Retrieve queued messages back to editor |
+
+**Steering** is for course-correction: "Stop, do this instead." The message is delivered after the current tool finishes, but remaining tool calls in the LLM's response are skipped.
+
+**Follow-up** is for chaining: "After you're done with that, also do this." The message waits until the agent has no more tool calls to make.
+
+**Settings:**
+- `steeringMode`: `"one-at-a-time"` (default) or `"all"` (deliver all queued at once)
+- `followUpMode`: same options
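The two delivery behaviors can be modeled as a small two-kind queue; this toy `MessageQueue` is illustrative, not pi's implementation:

```typescript
// Toy model of the two queues: steering messages are delivered after the
// current tool and skip the rest of the turn; follow-ups wait until the agent
// has no more tool calls. Names are illustrative.
type Queued = { kind: "steering" | "follow_up"; text: string };

class MessageQueue {
  private items: Queued[] = [];

  enqueue(kind: Queued["kind"], text: string): void {
    this.items.push({ kind, text });
  }

  // Delivered after the current tool finishes ("one-at-a-time" mode).
  nextSteering(): Queued | undefined {
    const i = this.items.findIndex((m) => m.kind === "steering");
    return i === -1 ? undefined : this.items.splice(i, 1)[0];
  }

  // Delivered only once the turn is fully finished.
  drainFollowUps(): Queued[] {
    const out = this.items.filter((m) => m.kind === "follow_up");
    this.items = this.items.filter((m) => m.kind !== "follow_up");
    return out;
  }
}
```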
+
+---
diff --git a/docs/what-is-pi/13-context-files-project-instructions.md b/docs/what-is-pi/13-context-files-project-instructions.md
new file mode 100644
index 000000000..822fb6ada
--- /dev/null
+++ b/docs/what-is-pi/13-context-files-project-instructions.md
@@ -0,0 +1,34 @@
+# Context Files — Project Instructions
+
+Pi loads instruction files automatically at startup:
+
+### AGENTS.md (or CLAUDE.md)
+
+Pi looks for `AGENTS.md` or `CLAUDE.md` in:
+1. `~/.gsd/agent/AGENTS.md` (global)
+2. Every parent directory from cwd up to filesystem root
+3. Current directory
+
+All matching files are concatenated and included in the system prompt. Use these for project conventions, common commands, architectural notes.
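The lookup chain can be sketched as pure path math (`agentsFileCandidates` is a hypothetical helper; pi's actual concatenation order may differ):

```typescript
// Sketch of the lookup described above: the global file first, then AGENTS.md
// in every directory walking from cwd up to the filesystem root.
import path from "node:path";

function agentsFileCandidates(cwd: string, agentDir: string): string[] {
  const candidates = [path.join(agentDir, "AGENTS.md")];
  let dir = path.resolve(cwd);
  while (true) {
    candidates.push(path.join(dir, "AGENTS.md"));
    const parent = path.dirname(dir);
    if (parent === dir) break; // reached the filesystem root
    dir = parent;
  }
  return candidates;
}
```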
+
+### System Prompt Override
+
+Replace the default system prompt entirely:
+- `.gsd/SYSTEM.md` (project)
+- `~/.gsd/agent/SYSTEM.md` (global)
+
+Append to it instead:
+- `.gsd/APPEND_SYSTEM.md` (project)
+- `~/.gsd/agent/APPEND_SYSTEM.md` (global)
+
+### File Arguments
+
+Include files directly in prompts from the CLI:
+
+```bash
+pi @prompt.md "Answer this"
+pi -p @screenshot.png "What's in this image?"
+pi @code.ts @test.ts "Review these files"
+```
+
+---
diff --git a/docs/what-is-pi/14-the-sdk-rpc-embedding-pi.md b/docs/what-is-pi/14-the-sdk-rpc-embedding-pi.md
new file mode 100644
index 000000000..80aac1a9e
--- /dev/null
+++ b/docs/what-is-pi/14-the-sdk-rpc-embedding-pi.md
@@ -0,0 +1,55 @@
+# The SDK & RPC — Embedding Pi
+
+Pi isn't just a terminal tool. It's designed to be embedded in other applications.
+
+### SDK (TypeScript)
+
+For Node.js/TypeScript applications, import and use pi directly:
+
+```typescript
+import { AuthStorage, createAgentSession, ModelRegistry, SessionManager } from "@mariozechner/pi-coding-agent";
+
+const authStorage = AuthStorage.create();
+const modelRegistry = new ModelRegistry(authStorage);
+
+const { session } = await createAgentSession({
+ sessionManager: SessionManager.inMemory(),
+ authStorage,
+ modelRegistry,
+});
+
+// Subscribe to events
+session.subscribe((event) => {
+ if (event.type === "message_update" && event.assistantMessageEvent.type === "text_delta") {
+ process.stdout.write(event.assistantMessageEvent.delta);
+ }
+});
+
+// Send prompts
+await session.prompt("What files are in the current directory?");
+```
+
+The SDK gives you full control: custom tools, custom resource loaders, session management, model selection, event streaming. See the [openclaw/openclaw](https://github.com/openclaw/openclaw) project for a real-world SDK integration.
+
+### RPC Mode (Any Language)
+
+For non-Node.js applications, spawn pi as a subprocess and communicate via JSON over stdin/stdout:
+
+```bash
+pi --mode rpc --provider anthropic
+```
+
+Send commands:
+```json
+{"type": "prompt", "message": "Hello, world!"}
+{"type": "steer", "message": "Stop and do this instead"}
+{"type": "follow_up", "message": "After you're done, also do this"}
+```
+
+Receive events:
+```json
+{"type": "event", "event": {"type": "message_update", ...}}
+{"type": "response", "command": "prompt", "success": true}
+```
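Both directions are newline-delimited JSON, so a client mainly needs a command encoder and an incremental line decoder. This sketch assumes only the field names shown above; everything else is illustrative:

```typescript
// One command per line on stdin.
function encodeCommand(cmd: { type: string; message?: string }): string {
  return JSON.stringify(cmd) + "\n";
}

// Incremental parser for newline-delimited JSON arriving in arbitrary chunks
// from stdout: buffer partial lines, emit each complete line as a message.
function makeLineDecoder(onMessage: (msg: unknown) => void): (chunk: string) => void {
  let buffer = "";
  return (chunk: string) => {
    buffer += chunk;
    let nl: number;
    while ((nl = buffer.indexOf("\n")) !== -1) {
      const line = buffer.slice(0, nl).trim();
      buffer = buffer.slice(nl + 1);
      if (line) onMessage(JSON.parse(line));
    }
  };
}
```

In practice you would pipe `encodeCommand` output to the subprocess's stdin and feed its stdout chunks to the decoder.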
+
+---
diff --git a/docs/what-is-pi/15-pi-packages-the-ecosystem.md b/docs/what-is-pi/15-pi-packages-the-ecosystem.md
new file mode 100644
index 000000000..4e19de60a
--- /dev/null
+++ b/docs/what-is-pi/15-pi-packages-the-ecosystem.md
@@ -0,0 +1,43 @@
+# Pi Packages — The Ecosystem
+
+Pi packages bundle extensions, skills, prompts, and themes for distribution via npm or git.
+
+### Installing
+
+```bash
+pi install npm:@foo/bar@1.0.0 # From npm (pinned)
+pi install npm:@foo/bar # From npm (latest)
+pi install git:github.com/user/repo # From git
+pi install ./local/path # From local path
+pi list # Show installed
+pi update # Update non-pinned
+pi remove npm:@foo/bar # Uninstall
+pi config # Enable/disable resources
+```
+
+### Creating
+
+Add a `pi` key to `package.json`:
+
+```json
+{
+ "name": "my-pi-package",
+ "keywords": ["pi-package"],
+ "pi": {
+ "extensions": ["./extensions"],
+ "skills": ["./skills"],
+ "prompts": ["./prompts"],
+ "themes": ["./themes"]
+ }
+}
+```
+
+Or just use conventional directory names (`extensions/`, `skills/`, `prompts/`, `themes/`) and pi discovers them automatically.
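That fallback can be sketched as a merge of the `pi` key over the conventional defaults (`resolveResources` is illustrative, not pi's loader):

```typescript
// Sketch: use the package.json "pi" key when present, otherwise fall back to
// the conventional directory names. A simplification of the real discovery.
type PiResources = {
  extensions?: string[];
  skills?: string[];
  prompts?: string[];
  themes?: string[];
};

const CONVENTIONAL: Required<PiResources> = {
  extensions: ["./extensions"],
  skills: ["./skills"],
  prompts: ["./prompts"],
  themes: ["./themes"],
};

function resolveResources(pkg: { pi?: PiResources }): Required<PiResources> {
  // Explicit "pi" entries override the conventional defaults per key.
  return { ...CONVENTIONAL, ...pkg.pi };
}
```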
+
+### Finding Packages
+
+- [Package gallery](https://shittycodingagent.ai/packages)
+- [npm search](https://www.npmjs.com/search?q=keywords%3Api-package)
+- [Discord community](https://discord.com/invite/3cU7Bz4UPx)
+
+---
diff --git a/docs/what-is-pi/16-why-pi-matters-what-makes-it-different.md b/docs/what-is-pi/16-why-pi-matters-what-makes-it-different.md
new file mode 100644
index 000000000..389076ff9
--- /dev/null
+++ b/docs/what-is-pi/16-why-pi-matters-what-makes-it-different.md
@@ -0,0 +1,29 @@
+# Why Pi Matters — What Makes It Different
+
+### vs. Other Coding Agents
+
+| Aspect | Typical agents | Pi |
+|--------|---------------|-----|
+| **Customization** | Fork the repo or wait for features | Extension system — build anything without forking |
+| **Model lock-in** | One provider, maybe two | 20+ providers, switch mid-conversation |
+| **Session management** | Linear history, maybe undo | Tree-based branching with in-place navigation |
+| **Context management** | Basic truncation | Structured compaction with summaries, customizable via extensions |
+| **Distribution** | No ecosystem | Pi packages via npm/git, shareable extensions/skills/themes |
+| **Embedding** | Not designed for it | SDK + RPC mode, built for integration |
+| **Philosophy** | Opinionated, batteries-included | Minimal core, extend to your workflow |
+
+### The Core Value Propositions
+
+1. **Extensibility as architecture.** Not an afterthought. The event system, tool registration, command system, and custom UI were designed from day one to make extensions as powerful as built-in features.
+
+2. **Session branching.** Tree-based conversations mean you never lose work. Explore different approaches, keep all of them, jump between them with `/tree`.
+
+3. **Compaction with structure.** When context gets too large, pi summarizes it with a structured format that preserves goals, decisions, and progress. Extensions can customize this entirely.
+
+4. **Multi-model fluidity.** Switch between Claude, GPT, Gemini, or any of 20+ providers mid-conversation. Use the best model for each part of the task.
+
+5. **Progressive disclosure.** Skills load their full instructions only when needed. The system prompt stays lean. Extensions register tools that appear only when active.
+
+6. **Platform, not product.** Pi is infrastructure you build on. Sub-agents, plan mode, permission gates, MCP support, custom workflows — build exactly what you need, share it as a package.
+
+---
diff --git a/docs/what-is-pi/17-file-reference-all-documentation.md b/docs/what-is-pi/17-file-reference-all-documentation.md
new file mode 100644
index 000000000..d23990c9c
--- /dev/null
+++ b/docs/what-is-pi/17-file-reference-all-documentation.md
@@ -0,0 +1,54 @@
+# File Reference — All Documentation
+
+All paths relative to:
+```
+/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/
+```
+
+### Core Documentation
+
+| File | What It Covers |
+|------|---------------|
+| `README.md` | Main documentation — quick start, all features, CLI reference, philosophy |
+| `docs/extensions.md` | Extensions API — events, tools, commands, UI, state, rendering (1,972 lines) |
+| `docs/tui.md` | TUI component system — Component interface, built-in components, keyboard, theming, overlays |
+| `docs/session.md` | Session format — JSONL tree structure, entry types, message types, SessionManager API |
+| `docs/compaction.md` | Compaction & branch summarization — triggers, algorithm, summary format, extension hooks |
+| `docs/packages.md` | Pi packages — creating, installing, distributing via npm/git |
+| `docs/skills.md` | Skills — structure, frontmatter, locations, invocation |
+| `docs/prompt-templates.md` | Prompt templates — format, arguments, locations |
+| `docs/themes.md` | Themes — creating custom themes, color palette |
+| `docs/settings.md` | Settings — all configuration options |
+| `docs/keybindings.md` | Keyboard shortcuts — format, built-in bindings, customization |
+| `docs/providers.md` | Provider setup — detailed instructions for each provider |
+| `docs/models.md` | Custom models — models.json format |
+| `docs/custom-provider.md` | Custom providers — advanced: OAuth, custom streaming, model definitions |
+| `docs/sdk.md` | SDK — AgentSession, events, embedding pi in applications |
+| `docs/rpc.md` | RPC mode — JSON protocol, commands, events |
+| `docs/json.md` | JSON mode — event stream format |
+| `docs/what-is-pi/19-building-branded-apps-on-top-of-pi.md` | Branded app architecture — shipping your own CLI, app-owned storage, SDK vs RPC, bundling resources |
+| `docs/development.md` | Contributing — development setup, forking, debugging |
+| `docs/windows.md` | Windows platform notes |
+| `docs/termux.md` | Termux (Android) setup |
+| `docs/terminal-setup.md` | Terminal configuration recommendations |
+| `docs/shell-aliases.md` | Shell alias patterns |
+
+### Example Extensions
+
+See the companion doc **Pi-Extensions-Complete-Guide.md** for a categorized reference of all 50+ example extensions.
+
+```
+examples/extensions/ # All example extensions
+examples/sdk/ # SDK usage examples
+```
+
+### Source Code (on GitHub)
+
+| Package | Purpose |
+|---------|---------|
+| `packages/coding-agent` | The main pi package — agent, tools, extensions, session, compaction |
+| `packages/tui` | Terminal UI component library |
+| `packages/ai` | Core LLM toolkit — providers, streaming, message types |
+| `packages/agent` | Agent loop framework |
+
+---
diff --git a/docs/what-is-pi/18-quick-reference-commands-shortcuts.md b/docs/what-is-pi/18-quick-reference-commands-shortcuts.md
new file mode 100644
index 000000000..fa6b09ad0
--- /dev/null
+++ b/docs/what-is-pi/18-quick-reference-commands-shortcuts.md
@@ -0,0 +1,68 @@
+# Quick Reference — Commands & Shortcuts
+
+### Commands
+
+| Command | Description |
+|---------|-------------|
+| `/login`, `/logout` | OAuth authentication |
+| `/model` | Switch models |
+| `/scoped-models` | Configure Ctrl+P model cycling |
+| `/settings` | Thinking level, theme, delivery mode, transport |
+| `/resume` | Browse previous sessions |
+| `/new` | New session |
+| `/name <name>` | Name current session |
+| `/session` | Session info (path, tokens, cost) |
+| `/tree` | Navigate session tree |
+| `/fork` | Fork to new session |
+| `/compact [instructions]` | Manual compaction |
+| `/copy` | Copy last response to clipboard |
+| `/export [file]` | Export to HTML |
+| `/share` | Upload as private GitHub gist |
+| `/reload` | Reload extensions, skills, prompts, context files |
+| `/hotkeys` | Show all keyboard shortcuts |
+| `/changelog` | Version history |
+| `/quit`, `/exit` | Exit pi |
+
+### Keyboard Shortcuts
+
+| Key | Action |
+|-----|--------|
+| Ctrl+C | Clear editor / quit (twice) |
+| Escape | Cancel/abort / open `/tree` (twice) |
+| Ctrl+L | Model selector |
+| Ctrl+P / Shift+Ctrl+P | Cycle scoped models |
+| Shift+Tab | Cycle thinking level |
+| Ctrl+O | Toggle tool output expand/collapse |
+| Ctrl+T | Toggle thinking block expand/collapse |
+| Ctrl+G | Open external editor |
+| Ctrl+V | Paste (including images) |
+| Enter (during streaming) | Queue steering message |
+| Alt+Enter (during streaming) | Queue follow-up message |
+| Alt+Up | Retrieve queued messages |
+
+### CLI
+
+```bash
+pi # Interactive mode
+pi "prompt" # Interactive with initial prompt
+pi -p "prompt" # Print mode (non-interactive)
+pi -c # Continue last session
+pi -r # Resume (browse sessions)
+pi --model provider/model:thinking # Specify model
+pi --tools read,bash # Specify tools
+pi -e ./extension.ts # Load extension
+pi --mode rpc # RPC mode
+pi --mode json # JSON mode
+pi @file.ts "Review this" # Include file in prompt
+pi install npm:package # Install package
+pi list # List packages
+```
+
+---
+
+*This document was generated from the Pi documentation. Source files are at:*
+```
+/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/
+```
+
+*Companion document: **Pi-Extensions-Complete-Guide.md** (on Desktop)*
diff --git a/docs/what-is-pi/19-building-branded-apps-on-top-of-pi.md b/docs/what-is-pi/19-building-branded-apps-on-top-of-pi.md
new file mode 100644
index 000000000..ba467b03b
--- /dev/null
+++ b/docs/what-is-pi/19-building-branded-apps-on-top-of-pi.md
@@ -0,0 +1,896 @@
+# Building Branded Apps on Top of Pi
+
+This document covers the part that the extension docs, SDK docs, RPC docs, and package docs only imply when read together:
+
+**How do you build your own product on top of pi** so users run **your** app, **your** command, and **your** UI rather than installing and managing pi directly?
+
+Examples:
+- a branded CLI like `gsd`
+- a desktop app that uses pi as its backend engine
+- a web or Electron app that uses pi sessions, tools, and event streaming
+- an internal company agent product built on pi primitives
+
+The short answer is:
+
+- **Yes, you can build your own branded app on top of pi**
+- **No, end users do not need to install pi globally** if you ship your own app that depends on pi packages
+- **No, you do not have to rely on `~/.gsd`** if you embed pi with custom paths and storage
+- **Yes, you can bundle your own extensions, prompts, themes, skills, and providers** inside your app
+
+The rest of this document explains the architecture choices, storage choices, packaging strategies, and practical tradeoffs.
+
+---
+
+## 19.1 The Three Ways to Use Pi as a Foundation
+
+There are really three layers you can build on:
+
+1. **`@mariozechner/pi-coding-agent`**
+ - Highest-level embedding API
+ - Best when you want pi's session system, resource loading, tools, extension model, and coding-agent behaviors
+2. **Pi CLI in RPC mode**
+ - Best when you want process isolation or language-agnostic integration
+3. **`@mariozechner/pi-agent-core`**
+ - Lower-level agent loop without the full pi coding-agent shell
+ - Best when you want more of the engine than the product surface
+
+For most branded CLI or desktop app use cases, start with **`@mariozechner/pi-coding-agent`**.
+
+### Rule of thumb
+
+- Want your own **CLI/TUI** with pi behavior under the hood -> use **SDK embedding** via `createAgentSession()`
+- Want your own app in a **different language** or want a **subprocess boundary** -> use **RPC mode**
+- Want a more generic **agent engine** and will build more infrastructure yourself -> use **`@mariozechner/pi-agent-core`**
+
+---
+
+## 19.2 The Biggest Misconception: Pi Does Not Require a Global `pi` Install
+
+If you are building a product on top of pi, your users do **not** need to install `pi` globally with npm.
+
+You can ship your own app that depends on:
+
+- `@mariozechner/pi-coding-agent`
+- `@mariozechner/pi-agent-core`
+- `@mariozechner/pi-ai`
+- `@mariozechner/pi-tui`
+- `@mariozechner/pi-web-ui`
+
+That means a branded command like:
+
+```bash
+gsd
+```
+
+can be **your** executable, backed by pi internals, without asking users to separately install and run `pi`.
+
+### What this means in practice
+
+Instead of telling users:
+
+```bash
+npm install -g @mariozechner/pi-coding-agent
+pi
+```
+
+you can ship:
+
+```bash
+npm install -g my-gsd
+# or a standalone binary / packaged desktop app
+
+gsd
+```
+
+And inside `gsd`, you import pi packages and create your own session, UI, storage, and resource loading behavior.
+
+---
+
+## 19.3 The Second Biggest Misconception: `~/.gsd` Is a Default, Not a Requirement
+
+Pi CLI defaults to `~/.gsd/agent`, but embedded applications are not forced to use it.
+
+When you use `createAgentSession()`, you can control:
+
+- `agentDir`
+- `cwd`
+- `authStorage`
+- `modelRegistry`
+- `resourceLoader`
+- `sessionManager`
+- `settingsManager`
+
+That means your app can store state under:
+
+- `~/.gsd/agent`
+- `~/Library/Application Support/GSD`
+- `%APPDATA%/GSD`
+- an app-local portable directory
+- a project-local directory
+
+instead of pi's defaults.
+
+### Things you can relocate
+
+- auth and OAuth credentials
+- settings
+- models config
+- sessions
+- extensions
+- prompt templates
+- themes
+- AGENTS-style context files
+
+### Important nuance
+
+If you use the default resource loader and default managers, pi behaves like pi:
+- standard discovery
+- standard config locations
+- standard session directories
+
+If you pass custom managers and loaders, pi becomes an engine inside **your** app.
+
+---
+
+## 19.4 Choose an Architecture First
+
+Before writing code, decide which of these architectures you actually want.
+
+### Architecture A: Branded Node CLI or TUI using the SDK
+
+This is the most natural fit for tools like `gsd`.
+
+You create your own executable and call `createAgentSession()` directly.
+
+#### Good for
+- a branded terminal tool
+- a custom TUI
+- internal company coding agents
+- a CLI with pi sessions, tools, and extensions under the hood
+
+#### Benefits
+- type-safe
+- no subprocess management
+- easy to customize storage and discovery
+- easiest way to remove dependency on `~/.gsd`
+- easiest way to bundle built-in resources
+
+#### Typical stack
+- `@mariozechner/pi-coding-agent`
+- optionally `@mariozechner/pi-tui`
+- your own entrypoint and app directories
+
+---
+
+### Architecture B: Branded App + Pi RPC subprocess
+
+Here your app spawns pi as a subprocess and talks to it over JSON lines.
+
+#### Good for
+- non-Node host applications
+- desktop shells with a strict engine boundary
+- process isolation
+- integrations where restarting the engine independently is useful
+
+#### Benefits
+- language-agnostic
+- process isolation
+- JSON protocol is explicit and stream-friendly
+
+#### Costs
+- you must manage subprocess lifecycle
+- some UI features are degraded compared to pi's native TUI
+- extension UI works through a request/response sub-protocol, not full TUI embedding
+
+---
+
+### Architecture C: App built on `pi-agent-core` or `pi-web-ui`
+
+This is for cases where you want pi's model and agent infrastructure but not necessarily pi's full coding-agent product surface.
+
+#### Good for
+- browser apps
+- web chat products
+- custom artifact workflows
+- custom message types and renderers
+
+#### Benefits
+- lower-level control
+- more app-specific freedom
+- easier fit for non-terminal interfaces
+
+#### Costs
+- you build more yourself
+- fewer coding-agent-specific conveniences out of the box
+
+---
+
+## 19.5 SDK vs RPC vs Agent-Core
+
+Use this decision table.
+
+| Goal | Best Starting Point |
+|------|---------------------|
+| Branded CLI like `gsd` | `@mariozechner/pi-coding-agent` SDK |
+| Branded TUI with coding tools | `@mariozechner/pi-coding-agent` SDK |
+| Desktop app with subprocess boundary | pi RPC mode |
+| Non-Node integration | pi RPC mode |
+| Browser chat app | `@mariozechner/pi-web-ui` + `@mariozechner/pi-agent-core` |
+| Generic agent engine with custom infrastructure | `@mariozechner/pi-agent-core` |
+| Want pi sessions/resources/extensions but app-owned directories | `@mariozechner/pi-coding-agent` SDK |
+
+### More detailed tradeoff matrix
+
+| Concern | SDK | RPC | agent-core |
+|--------|-----|-----|------------|
+| Type safety | Excellent | Weak at protocol boundary | Excellent |
+| Process isolation | No | Yes | No |
+| Language agnostic | No | Yes | No |
+| Full pi session/resource system | Yes | Yes | No |
+| App-owned storage | Yes | Partial / external orchestration | Yes |
+| Rich custom UI | Strong | Moderate | Strong |
+| Uses pi extension ecosystem easily | Yes | Yes | No, not directly |
+| Simplest branded CLI path | Yes | No | No |
+
+---
+
+## 19.6 The Recommended Path for a Branded CLI Like `gsd`
+
+If you want users to run:
+
+```bash
+gsd
+```
+
+and you want it to feel like your product rather than "pi but renamed," the default recommendation is:
+
+1. Build a Node/TypeScript app
+2. Depend on `@mariozechner/pi-coding-agent`
+3. Create your own executable entrypoint
+4. Use `createAgentSession()` directly
+5. Set custom directories for config/auth/sessions
+6. Bundle your own extensions/prompts/themes/providers
+7. Expose only the commands and UX you want
+
+That gives you the best control over:
+- branding
+- defaults
+- storage layout
+- startup behavior
+- extension loading
+- model/provider setup
+
+---
+
+## 19.7 App-Owned Storage Layout
+
+A branded app should usually own its own storage hierarchy.
+
+Example:
+
+```text
+~/.gsd/
+ agent/
+ auth.json
+ models.json
+ settings.json
+ extensions/
+ prompts/
+ themes/
+ skills/
+ sessions/
+```
+
+Or on macOS:
+
+```text
+~/Library/Application Support/GSD/
+ agent/
+ sessions/
+```
+
+### Why this matters
+
+If your product reuses pi's default directories, then:
+- it shares state with the user's pi installation
+- branding becomes muddy
+- support/debugging becomes more confusing
+- product boundaries become less clear
+
+Use app-specific directories unless you intentionally want interoperability with a user's pi environment.
+
+### Minimal example
+
+```typescript
+import path from "node:path";
+import os from "node:os";
+import {
+ AuthStorage,
+ createAgentSession,
+ ModelRegistry,
+ SessionManager,
+ SettingsManager,
+} from "@mariozechner/pi-coding-agent";
+
+const appRoot = path.join(os.homedir(), ".gsd");
+const agentDir = path.join(appRoot, "agent");
+const sessionsDir = path.join(appRoot, "sessions");
+
+const authStorage = AuthStorage.create(path.join(agentDir, "auth.json"));
+const modelRegistry = new ModelRegistry(authStorage, path.join(agentDir, "models.json"));
+const settingsManager = SettingsManager.create(process.cwd(), agentDir);
+const sessionManager = SessionManager.create(process.cwd(), sessionsDir);
+
+const { session } = await createAgentSession({
+ cwd: process.cwd(),
+ agentDir,
+ authStorage,
+ modelRegistry,
+ settingsManager,
+ sessionManager,
+});
+```
+
+This is the core pattern for “my app uses pi, but not as global pi.”
+
+---
+
+## 19.8 Bundling Resources Inside Your App
+
+This is another place where people often assume they must rely on discovery from `~/.gsd` or `.gsd/`.
+
+You do not.
+
+Your app can bundle:
+- extensions
+- prompts
+- themes
+- skills
+- AGENTS-style context
+- provider registrations
+
+inside your own package or app bundle.
+
+### Strategy 1: Use custom paths with `DefaultResourceLoader`
+
+```typescript
+import { DefaultResourceLoader } from "@mariozechner/pi-coding-agent";
+
+const loader = new DefaultResourceLoader({
+ cwd: process.cwd(),
+ agentDir,
+ additionalExtensionPaths: [
+ "/absolute/path/to/bundled/extension.ts",
+ ],
+});
+
+await loader.reload();
+```
+
+### Strategy 2: Use inline extension factories
+
+```typescript
+const loader = new DefaultResourceLoader({
+ cwd: process.cwd(),
+ agentDir,
+ extensionFactories: [
+ (pi) => {
+ pi.registerCommand("hello", {
+ description: "My branded command",
+ handler: async (_args, ctx) => ctx.ui.notify("Hello from GSD", "info"),
+ });
+ },
+ ],
+});
+```
+
+### Strategy 3: Override discovered resources entirely
+
+```typescript
+const loader = new DefaultResourceLoader({
+ cwd: process.cwd(),
+ agentDir,
+ promptsOverride: () => ({ prompts: [], diagnostics: [] }),
+ skillsOverride: () => ({ skills: [], diagnostics: [] }),
+ agentsFilesOverride: () => ({ agentsFiles: [] }),
+ systemPromptOverride: () => "You are GSD, a specialized software delivery agent.",
+});
+```
+
+### Why this matters
+
+For a branded product, it is often better to think in terms of:
+- **bundled built-ins shipped by your app**
+- optional plugin support later
+
+rather than:
+- user-managed global pi resources first
+
+---
+
+## 19.9 Discovery vs Bundling
+
+These are different product strategies.
+
+### Discovery-driven product
+You intentionally load from:
+- `~/.gsd/agent/...`
+- `.gsd/...`
+- installed pi packages
+
+#### Good when
+- your product is basically pi with additions
+- you want compatibility with existing pi user workflows
+
+### Bundled-app product
+You intentionally ship your own resources and avoid implicit user-level discovery.
+
+#### Good when
+- you want strong branding
+- you want predictable behavior
+- you want supportability and reproducibility
+- you do not want random user extensions affecting behavior
+
+### Recommendation
+For a branded tool like `gsd`, default to **bundled-app product** behavior.
+
+If you later add plugin support, make it explicit.
+
+---
+
+## 19.10 Using Pi Packages Internally vs Externally
+
+Pi packages are a sharing mechanism for extensions, prompts, skills, and themes.
+
+But when you are building your own app, there are two separate questions:
+
+1. **Should your app itself be distributed as a pi package?**
+2. **Should your app internally use pi-package-style resource organization?**
+
+### Usually, for a branded app:
+- **No** on #1
+- **Maybe** on #2
+
+If your users run your app directly, your app is usually a normal Node package, binary, or desktop app, not a pi package.
+
+But internally, you may still organize resources in a pi-friendly structure:
+
+```text
+src/
+resources/
+ extensions/
+ prompts/
+ themes/
+ skills/
+```
+
+and load them through your resource loader.
+
+### When pi packages still matter
+Pi packages are still useful when:
+- you want optional add-ons
+- you want to reuse existing pi ecosystem resources
+- you want third parties to extend your app through pi-compatible bundles
+
+---
+
+## 19.11 RPC Mode for Branded Apps
+
+RPC mode is the right answer when your product wants pi as a subprocess engine.
+
+Start it with:
+
+```bash
+pi --mode rpc
+```
+
+or programmatically by calling `runRpcMode(session)` in your own Node process.
+
+### RPC is good for
+- non-Node clients
+- desktop shells in other runtimes
+- separate engine process architecture
+- explicit JSON protocol boundaries
+
+### What RPC gives you
+- prompt / steer / follow_up / abort
+- model selection
+- state inspection
+- session operations
+- bash execution
+- event streaming
+- extension UI request/response protocol
+
+### Important limitation
+RPC is not the same thing as embedding pi's full native TUI.
+
+Some extension UI methods degrade in RPC mode.
+
+#### Dialogs still work
+- `select`
+- `confirm`
+- `input`
+- `editor`
+
+#### Fire-and-forget UI signals still work
+- notifications
+- status
+- widgets
+- title
+- editor text setting
+
+#### Some richer TUI behaviors do not map cleanly
+- full `custom()` component workflows
+- some footer/header/editor replacement behavior
+- some theme-specific TUI behavior
+
+If your branded app needs a deeply custom UI, SDK embedding or direct app-level UI integration is usually better.
+
+---
+
+## 19.12 Extension UI in RPC Mode
+
+One subtle but important point: **extensions with user interaction are still possible in RPC mode**, but through a protocol, not by directly rendering pi TUI components.
+
+The client receives `extension_ui_request` messages and must answer with `extension_ui_response` for blocking dialogs.
+
+This means you can build your own frontend and still support many extension-driven workflows.
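For the blocking dialog methods (`select`, `confirm`, `input`, `editor`), the host routes each request to its own UI and replies. A dispatch sketch — the payload field names (`id`, `method`, `value`) are assumptions; check the RPC docs for the real `extension_ui_request` shape:

```typescript
// Route a blocking extension_ui_request to a host-app handler and build
// the matching extension_ui_response. Field names are assumed, not exact.
type DialogMethod = "select" | "confirm" | "input" | "editor";

interface UiRequest {
  id: string;
  method: DialogMethod;
  [key: string]: unknown;
}

interface UiResponse {
  type: "extension_ui_response";
  id: string;
  value: unknown;
}

function answerUiRequest(
  req: UiRequest,
  handlers: Record<DialogMethod, (req: UiRequest) => unknown>,
): UiResponse {
  return { type: "extension_ui_response", id: req.id, value: handlers[req.method](req) };
}
```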
+
+### But know the boundary
+RPC mode preserves interaction patterns, but not full TUI component identity.
+
+If your extension assumes pi's exact terminal UI surface, it may need adaptation.
+
+---
+
+## 19.13 Web and Browser Apps
+
+If your app is a web app or browser-hosted UI, look closely at:
+
+- `@mariozechner/pi-agent-core`
+- `@mariozechner/pi-web-ui`
+
+`pi-web-ui` already provides:
+- chat UI
+- session storage
+- provider key storage
+- attachments
+- artifacts
+- model selection
+- settings dialogs
+- renderers and tool renderers
+
+This is effectively a starter kit for a branded web app using pi-related primitives.
+
+### Use pi-web-ui when
+- you want a browser or Electron-friendly UI surface
+- you want a ready-made chat shell
+- you do not specifically want pi's TUI
+
+### Use pi-coding-agent SDK when
+- you want coding-agent-specific resource loading, sessions, extensions, and coding tool behaviors
+- your app is terminal-first or Node-first
+
+---
+
+## 19.14 Branding Boundaries: What Still Feels Like Pi?
+
+This matters if you are building a white-labeled or branded product.
+
+### If you spawn the pi CLI directly
+Your product is closer to “pi as a subprocess.”
+That is fine, but many pi-level assumptions remain nearby.
+
+### If you embed `@mariozechner/pi-coding-agent`
+You can hide most pi branding and product surface decisions.
+You keep the coding-agent infrastructure but own the app UX.
+
+### If you use `@mariozechner/pi-agent-core`
+You are even lower-level. Pi becomes more of a library source than a user-visible product.
+
+### Practical recommendation
+If branding matters, do not treat the pi CLI binary as your product surface unless you truly want pi semantics exposed.
+
+Use the SDK or lower-level packages and build your own interface.
+
+---
+
+## 19.15 Session Strategy for a Branded App
+
+Decide whether your app wants:
+
+- **persistent sessions** with app-owned storage
+- **ephemeral sessions** only
+- **project-local sessions**
+- **branching session history** exposed to users
+
+### Persistent app-owned sessions
+Most natural for a CLI or desktop app.
+
+```typescript
+const sessionManager = SessionManager.create(process.cwd(), sessionsDir);
+```
+
+### Ephemeral mode
+Useful for task-runner or automation workflows.
+
+```typescript
+const sessionManager = SessionManager.inMemory();
+```
+
+### Important question
+Do you want your app to share session files with pi itself?
+
+Usually the answer should be **no** unless interoperability is an explicit feature.
+
+---
+
+## 19.16 Settings Strategy for a Branded App
+
+You should decide whether settings are:
+
+- file-backed
+- in-memory
+- app-global
+- project-local
+- user-editable
+- controlled only by your product UI
+
+### App-owned settings
+
+```typescript
+const settingsManager = SettingsManager.create(projectCwd, agentDir);
+```
+
+with `agentDir` pointing into your app-owned config directory.
+
+### Fully controlled settings
+
+```typescript
+const settingsManager = SettingsManager.inMemory({
+ compaction: { enabled: true },
+ retry: { enabled: true, maxRetries: 2 },
+});
+```
+
+Use in-memory settings when you want the host app to own the config model entirely.
+
+---
+
+## 19.17 Provider and Auth Strategy
+
+A branded app should decide whether users:
+- bring their own API keys
+- use OAuth through pi provider support
+- connect to your proxy/backend
+- use your own registered providers
+
+### App-owned auth paths
+Use custom `AuthStorage` paths.
+
+```typescript
+const authStorage = AuthStorage.create("/path/to/gsd/auth.json");
+```
+
+### App-owned model config
+Use your own `models.json` location or register providers dynamically.
+
+```typescript
+const modelRegistry = new ModelRegistry(authStorage, "/path/to/gsd/models.json");
+```
+
+### Custom provider strategy
+If your app talks to a proxy or company backend, register providers from your app or bundled extensions.
+
+That keeps the app experience aligned with your branding and infrastructure.
+
+---
+
+## 19.18 Building a Branded `gsd` CLI: Recommended Shape
+
+A practical architecture looks like this:
+
+```text
+my-gsd/
+ package.json
+ src/
+ cli.ts
+ app-paths.ts
+ session.ts
+ resource-loader.ts
+ ui/
+ resources/
+ extensions/
+ prompts/
+ themes/
+ skills/
+```
+
+### In `cli.ts`
+- parse your app flags
+- compute app directories
+- create auth/model/settings/session managers
+- create resource loader
+- create agent session
+- run your own mode (custom TUI, print mode, or RPC bridge)
+
+### In `resource-loader.ts`
+- load bundled resources
+- optionally disable ambient pi discovery
+- add your branded system prompt and context files
+
+### In bundled extensions
+- add your commands
+- register your custom tools
+- control your app-specific behaviors
+
+---
+
+## 19.19 Minimal SDK Skeleton for a Branded CLI
+
+```typescript
+import path from "node:path";
+import os from "node:os";
+import {
+ AuthStorage,
+ createAgentSession,
+ DefaultResourceLoader,
+ ModelRegistry,
+ SessionManager,
+ SettingsManager,
+} from "@mariozechner/pi-coding-agent";
+
+const appRoot = path.join(os.homedir(), ".gsd");
+const agentDir = path.join(appRoot, "agent");
+const sessionsDir = path.join(appRoot, "sessions");
+
+const authStorage = AuthStorage.create(path.join(agentDir, "auth.json"));
+const modelRegistry = new ModelRegistry(authStorage, path.join(agentDir, "models.json"));
+const settingsManager = SettingsManager.create(process.cwd(), agentDir);
+const sessionManager = SessionManager.create(process.cwd(), sessionsDir);
+
+const resourceLoader = new DefaultResourceLoader({
+ cwd: process.cwd(),
+ agentDir,
+ settingsManager,
+ systemPromptOverride: () =>
+ "You are GSD, a branded software delivery agent. Prefer project-specific workflows and terminology.",
+ additionalExtensionPaths: [
+ path.resolve("resources/extensions/index.ts"),
+ ],
+});
+
+await resourceLoader.reload();
+
+const { session } = await createAgentSession({
+ cwd: process.cwd(),
+ agentDir,
+ authStorage,
+ modelRegistry,
+ settingsManager,
+ sessionManager,
+ resourceLoader,
+});
+
+session.subscribe((event) => {
+ if (event.type === "message_update" && event.assistantMessageEvent.type === "text_delta") {
+ process.stdout.write(event.assistantMessageEvent.delta);
+ }
+});
+
+await session.prompt("Help me understand this repo.");
+```
+
+This is not yet a full product, but it is the correct starting shape for one.
+
+---
+
+## 19.20 When to Reuse Pi's Interactive Mode
+
+The SDK exports `InteractiveMode`, `runPrintMode`, and `runRpcMode`.
+
+These are useful if you want to reuse existing pi surfaces while changing the surrounding setup.
+
+### Reuse `InteractiveMode` when
+- you want pi's TUI mostly intact
+- but with app-owned storage, extensions, defaults, and resources
+
+### Do not reuse it when
+- you want a strongly branded UI
+- you want different commands or layout metaphors
+- you want your app to feel fundamentally different from pi
+
+For a white-labeled product, `InteractiveMode` is a good prototyping step, not always the final product surface.
+
+---
+
+## 19.21 What to Avoid in a Branded Product
+
+### Avoid accidental dependence on ambient user state
+If your app silently loads from a user's `~/.pi`, you may get:
+- surprising extensions
+- strange prompts
+- odd themes
+- hard-to-debug behavior differences
+
+### Avoid mixing branding and storage casually
+If your app is called `gsd` but state lives in `~/.pi`, users will notice.
+
+### Avoid choosing RPC just because it sounds generic
+If your app is already Node/TypeScript, SDK embedding is usually simpler and more powerful.
+
+### Avoid exposing every pi concept unless you want to
+A branded product should choose what the user sees.
+You do not need to expose:
+- all slash commands
+- all extension loading paths
+- all package concepts
+- all theme/customization behaviors
+
+---
+
+## 19.22 Suggested Product Postures
+
+### Posture A: “Pi-compatible branded shell”
+- Uses pi concepts openly
+- Supports pi packages and pi-style discovery
+- Good for power users
+
+### Posture B: “Branded app powered by pi”
+- Uses pi internally
+- App-owned directories and resources
+- Explicit plugins only
+- Good for productized tools like `gsd`
+
+### Posture C: “Custom agent product using pi primitives”
+- Uses `pi-agent-core` or selective libraries
+- Pi itself is mostly invisible
+- Good for SaaS or browser products
+
+For most branded command-line products, posture **B** is the best fit.
+
+---
+
+## 19.23 Recommended Documentation Reading Order for This Use Case
+
+If you are building a branded app on top of pi, read in this order:
+
+1. `what-is-pi/14-the-sdk-rpc-embedding-pi.md`
+2. this file
+3. `extending-pi/19-packaging-distribution.md`
+4. `extending-pi/04-extension-locations-discovery.md`
+5. `extending-pi/05-extension-structure-styles.md`
+6. `extending-pi/12-custom-ui-visual-components.md`
+7. `pi-ui-tui/01-the-ui-architecture.md`
+8. `pi-ui-tui/03-entry-points-how-ui-gets-on-screen.md`
+9. `pi-ui-tui/22-quick-reference-all-ui-apis.md`
+
+Then read the source package docs for exact API details:
+- `packages/coding-agent/docs/sdk.md`
+- `packages/coding-agent/docs/rpc.md`
+- `packages/coding-agent/docs/extensions.md`
+- `packages/coding-agent/docs/packages.md`
+- `packages/web-ui/README.md`
+
+---
+
+## 19.24 Bottom Line
+
+If your goal is:
+
+> “I want users to download and run `gsd`, and have it use pi internally without requiring a separate pi install or `~/.gsd` setup.”
+
+Then the answer is:
+
+- **Yes, that is a supported architecture**
+- **Use the SDK first unless you have a strong reason to choose RPC**
+- **Use app-owned storage directories**
+- **Bundle your own resources instead of relying on global discovery**
+- **Use pi packages as an ecosystem mechanism, not as a requirement for your app's internal structure**
+- **Treat pi as a foundation layer, not necessarily the product surface**
+
+That is the difference between:
+- “using pi as a user tool”
+- and “building your own product on top of pi.”
diff --git a/docs/what-is-pi/README.md b/docs/what-is-pi/README.md
new file mode 100644
index 000000000..6dcb20022
--- /dev/null
+++ b/docs/what-is-pi/README.md
@@ -0,0 +1,30 @@
+# Pi: What It Is, How It Works, and Why It Matters
+
+> Split into individual files for easier consumption.
+
+## Table of Contents
+
+- [01. What Pi Is](./01-what-pi-is.md)
+- [02. Design Philosophy](./02-design-philosophy.md)
+- [03. The Four Modes of Operation](./03-the-four-modes-of-operation.md)
+- [04. The Architecture — How Everything Fits Together](./04-the-architecture-how-everything-fits-together.md)
+- [05. The Agent Loop — How Pi Thinks](./05-the-agent-loop-how-pi-thinks.md)
+- [06. Tools — How Pi Acts on the World](./06-tools-how-pi-acts-on-the-world.md)
+- [07. Sessions — Memory That Branches](./07-sessions-memory-that-branches.md)
+- [08. Compaction — How Pi Manages Context Limits](./08-compaction-how-pi-manages-context-limits.md)
+- [09. The Customization Stack](./09-the-customization-stack.md)
+- [10. Providers & Models — Multi-Model by Default](./10-providers-models-multi-model-by-default.md)
+- [11. The Interactive TUI](./11-the-interactive-tui.md)
+- [12. The Message Queue — Talking While Pi Thinks](./12-the-message-queue-talking-while-pi-thinks.md)
+- [13. Context Files — Project Instructions](./13-context-files-project-instructions.md)
+- [14. The SDK & RPC — Embedding Pi](./14-the-sdk-rpc-embedding-pi.md)
+- [15. Pi Packages — The Ecosystem](./15-pi-packages-the-ecosystem.md)
+- [16. Why Pi Matters — What Makes It Different](./16-why-pi-matters-what-makes-it-different.md)
+- [17. File Reference — All Documentation](./17-file-reference-all-documentation.md)
+- [18. Quick Reference — Commands & Shortcuts](./18-quick-reference-commands-shortcuts.md)
+- [19. Building Branded Apps on Top of Pi](./19-building-branded-apps-on-top-of-pi.md)
+
+---
+
+*Split into per-section files for surgical context loading.*
+
diff --git a/docs/working-in-teams.md b/docs/working-in-teams.md
new file mode 100644
index 000000000..71956d5ff
--- /dev/null
+++ b/docs/working-in-teams.md
@@ -0,0 +1,101 @@
+# Working in Teams
+
+GSD supports multi-user workflows where several developers work on the same repository concurrently.
+
+## Setup
+
+### 1. Set Team Mode
+
+The simplest way to configure GSD for team use is to set `mode: team` in your project preferences. This enables unique milestone IDs, push branches, and pre-merge checks in one setting:
+
+```yaml
+# .gsd/preferences.md (project-level, committed to git)
+---
+version: 1
+mode: team
+---
+```
+
+This is equivalent to manually setting `unique_milestone_ids: true`, `git.push_branches: true`, `git.pre_merge_check: true`, and other team-appropriate defaults. You can still override individual settings — for example, adding `git.auto_push: true` on top of `mode: team` if your team prefers auto-push.
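For example, a committed preferences file that keeps team mode but opts into auto-push (the `git.auto_push` override mentioned above) looks like:

```yaml
# .gsd/preferences.md
---
version: 1
mode: team
git:
  auto_push: true
---
```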
+
+Alternatively, you can configure each setting individually without using a mode (see [Git Strategy](git-strategy.md) for details).
+
+### 2. Configure `.gitignore`
+
+Share planning artifacts (milestones, roadmaps, decisions) while keeping runtime files local:
+
+```bash
+# ── GSD: Runtime / Ephemeral (per-developer, per-session) ──────
+.gsd/auto.lock
+.gsd/completed-units.json
+.gsd/STATE.md
+.gsd/metrics.json
+.gsd/activity/
+.gsd/runtime/
+.gsd/worktrees/
+.gsd/milestones/**/continue.md
+.gsd/milestones/**/*-CONTINUE.md
+```
+
+**What gets shared** (committed to git):
+- `.gsd/preferences.md` — project preferences
+- `.gsd/PROJECT.md` — living project description
+- `.gsd/REQUIREMENTS.md` — requirement contract
+- `.gsd/DECISIONS.md` — architectural decisions
+- `.gsd/milestones/` — roadmaps, plans, summaries, research
+
+**What stays local** (gitignored):
+- Lock files, metrics, state cache, runtime records, worktrees, activity logs
+
+### 3. Commit the Preferences
+
+```bash
+git add .gsd/preferences.md
+git commit -m "chore: enable GSD team workflow"
+```
+
+## `commit_docs: false`
+
+For teams where only some members use GSD, or when company policy requires a clean repo:
+
+```yaml
+git:
+ commit_docs: false
+```
+
+This adds `.gsd/` to `.gitignore` entirely and keeps all artifacts local. The developer gets the benefits of structured planning without affecting teammates who don't use GSD.
+
+## Migrating an Existing Project
+
+If you have an existing project with `.gsd/` blanket-ignored:
+
+1. Ensure no milestones are in progress (clean state)
+2. Update `.gitignore` to use the selective pattern above
+3. Add `unique_milestone_ids: true` to `.gsd/preferences.md`
+4. Optionally rename existing milestones to use unique IDs:
+ ```
+ I have turned on unique milestone ids, please update all old milestone
+ ids to use this new format e.g. M001-abc123 where abc123 is a random
+   6 char lowercase alphanumeric string. Update all references in all
+ .gsd file contents, file names and directory names. Validate your work
+ once done to ensure referential integrity.
+ ```
+5. Commit
+
+## Parallel Development
+
+Multiple developers can run auto mode simultaneously on different milestones. Each developer:
+
+- Gets their own worktree (`.gsd/worktrees//`, gitignored)
+- Works on a unique `milestone/` branch
+- Squash-merges to main independently
+
+Milestone dependencies can be declared in `M00X-CONTEXT.md` frontmatter:
+
+```yaml
+---
+depends_on: [M001-eh88as]
+---
+```
+
+GSD enforces that dependent milestones complete before starting downstream work.
diff --git a/packages/pi-coding-agent/src/core/package-manager.ts b/packages/pi-coding-agent/src/core/package-manager.ts
index d29c44ca5..e07b28c4e 100644
--- a/packages/pi-coding-agent/src/core/package-manager.ts
+++ b/packages/pi-coding-agent/src/core/package-manager.ts
@@ -1701,9 +1701,18 @@ export class DefaultPackageManager implements PackageManager {
);
}
{
+ // Ecosystem skills (~/.agents/skills/) take priority over legacy config-dir skills.
+ // Skip legacy dir entirely when migration has completed (marker file present).
+ const legacySkillsMigrated =
+ resolve(userDirs.skills) !== resolve(userAgentsSkillsDir) &&
+ existsSync(join(userDirs.skills, ".migrated-to-agents"));
+ const legacyUserSkillEntries =
+ !legacySkillsMigrated && userSubdirs.has("skills")
+ ? collectAutoSkillEntries(userDirs.skills)
+ : [];
const skillEntries = [
- ...(userSubdirs.has("skills") ? collectAutoSkillEntries(userDirs.skills) : []),
...collectAutoSkillEntries(userAgentsSkillsDir),
+ ...legacyUserSkillEntries,
];
if (skillEntries.length > 0) {
addResources("skills", skillEntries, userMetadata, userOverrides.skills, globalBaseDir);
diff --git a/packages/pi-coding-agent/src/core/skills.ts b/packages/pi-coding-agent/src/core/skills.ts
index 9868b1546..a8ab488ef 100644
--- a/packages/pi-coding-agent/src/core/skills.ts
+++ b/packages/pi-coding-agent/src/core/skills.ts
@@ -2,10 +2,28 @@ import { existsSync, readdirSync, readFileSync, realpathSync, statSync } from "f
import ignore from "ignore";
import { homedir } from "os";
import { basename, dirname, isAbsolute, join, relative, resolve, sep } from "path";
-import { CONFIG_DIR_NAME, getAgentDir } from "../config.js";
import { parseFrontmatter } from "../utils/frontmatter.js";
import { toPosixPath } from "../utils/path-display.js";
import type { ResourceDiagnostic } from "./diagnostics.js";
+import { CONFIG_DIR_NAME } from "../config.js";
+
+/**
+ * The standard ecosystem skills directory used by skills.sh and the
+ * Agent Skills standard. All agents share this location for globally
+ * installed skills.
+ */
+export const ECOSYSTEM_SKILLS_DIR = join(homedir(), ".agents", "skills");
+
+/**
+ * The standard project-level skills directory (`.agents/skills/` relative to cwd).
+ */
+export const ECOSYSTEM_PROJECT_SKILLS_DIR = ".agents";
+
+/**
+ * Legacy skills directory (~/.gsd/agent/skills/ or ~/.pi/agent/skills/).
+ * Read as a fallback so existing installs don't lose skills before migration runs.
+ */
+const LEGACY_SKILLS_DIR = join(homedir(), CONFIG_DIR_NAME, "agent", "skills");
/** Max name length per spec */
const MAX_NAME_LENGTH = 64;
@@ -331,7 +349,7 @@ function escapeXml(str: string): string {
export interface LoadSkillsOptions {
/** Working directory for project-local skills. Default: process.cwd() */
cwd?: string;
- /** Agent config directory for global skills. Default: ~/.pi/agent */
+ /** @deprecated Skills now use ~/.agents/skills/ exclusively. This option is ignored. */
agentDir?: string;
/** Explicit skill paths (files or directories) */
skillPaths?: string[];
@@ -357,10 +375,7 @@ function resolveSkillPath(p: string, cwd: string): string {
* Returns skills and any validation diagnostics.
*/
export function loadSkills(options: LoadSkillsOptions = {}): LoadSkillsResult {
- const { cwd = process.cwd(), agentDir, skillPaths = [], includeDefaults = true } = options;
-
- // Resolve agentDir - if not provided, use default from config
- const resolvedAgentDir = agentDir ?? getAgentDir();
+ const { cwd = process.cwd(), skillPaths = [], includeDefaults = true } = options;
const skillMap = new Map();
const realPathSet = new Set();
@@ -404,12 +419,22 @@ export function loadSkills(options: LoadSkillsOptions = {}): LoadSkillsResult {
}
if (includeDefaults) {
- addSkills(loadSkillsFromDirInternal(join(resolvedAgentDir, "skills"), "user", true));
- addSkills(loadSkillsFromDirInternal(resolve(cwd, CONFIG_DIR_NAME, "skills"), "project", true));
+ // Primary: ~/.agents/skills/ — the industry-standard skills.sh location
+ addSkills(loadSkillsFromDirInternal(ECOSYSTEM_SKILLS_DIR, "user", true));
+ // Primary project: .agents/skills/ — standard project-level location
+ addSkills(loadSkillsFromDirInternal(resolve(cwd, ECOSYSTEM_PROJECT_SKILLS_DIR, "skills"), "project", true));
+
+ // Legacy fallback: read skills from ~/.gsd/agent/skills/ so existing
+ // installs keep working until the one-time migration in resource-loader
+ // copies them to ~/.agents/skills/. Skip if migration has completed.
+ const legacyMigrated = existsSync(join(LEGACY_SKILLS_DIR, ".migrated-to-agents"));
+ if (LEGACY_SKILLS_DIR !== ECOSYSTEM_SKILLS_DIR && existsSync(LEGACY_SKILLS_DIR) && !legacyMigrated) {
+ addSkills(loadSkillsFromDirInternal(LEGACY_SKILLS_DIR, "user", true));
+ }
}
- const userSkillsDir = join(resolvedAgentDir, "skills");
- const projectSkillsDir = resolve(cwd, CONFIG_DIR_NAME, "skills");
+ const userSkillsDir = ECOSYSTEM_SKILLS_DIR;
+ const projectSkillsDir = resolve(cwd, ECOSYSTEM_PROJECT_SKILLS_DIR, "skills");
const isUnderPath = (target: string, root: string): boolean => {
const normalizedRoot = resolve(root);
diff --git a/packages/pi-coding-agent/src/index.ts b/packages/pi-coding-agent/src/index.ts
index 9787c3b5e..e194e0324 100644
--- a/packages/pi-coding-agent/src/index.ts
+++ b/packages/pi-coding-agent/src/index.ts
@@ -219,6 +219,8 @@ export {
} from "./core/settings-manager.js";
// Skills
export {
+ ECOSYSTEM_SKILLS_DIR,
+ ECOSYSTEM_PROJECT_SKILLS_DIR,
formatSkillsForPrompt,
getLoadedSkills,
type LoadSkillsFromDirOptions,
diff --git a/scripts/install-hooks.sh b/scripts/install-hooks.sh
new file mode 100755
index 000000000..30bfd629e
--- /dev/null
+++ b/scripts/install-hooks.sh
@@ -0,0 +1,34 @@
+#!/usr/bin/env bash
+# Installs the git pre-commit hook for secret scanning.
+# Safe to run multiple times — only installs if not already present.
+
+set -euo pipefail
+
+HOOK_DIR="$(git rev-parse --git-dir)/hooks"
+HOOK_FILE="$HOOK_DIR/pre-commit"
+MARKER="# gsd-secret-scan"
+
+mkdir -p "$HOOK_DIR"
+
+# Check if our hook is already installed
+if [[ -f "$HOOK_FILE" ]] && grep -q "$MARKER" "$HOOK_FILE" 2>/dev/null; then
+ echo "secret-scan pre-commit hook already installed."
+ exit 0
+fi
+
+# If a pre-commit hook already exists, append; otherwise create
+if [[ -f "$HOOK_FILE" ]]; then
+ echo "" >> "$HOOK_FILE"
+ echo "$MARKER" >> "$HOOK_FILE"
+ echo 'bash "$(git rev-parse --show-toplevel)/scripts/secret-scan.sh"' >> "$HOOK_FILE"
+ echo "secret-scan appended to existing pre-commit hook."
+else
+ cat > "$HOOK_FILE" << 'EOF'
+#!/usr/bin/env bash
+# gsd-secret-scan
+# Pre-commit hook: scan staged files for hardcoded secrets
+bash "$(git rev-parse --show-toplevel)/scripts/secret-scan.sh"
+EOF
+ chmod +x "$HOOK_FILE"
+ echo "secret-scan pre-commit hook installed."
+fi
diff --git a/src/resource-loader.ts b/src/resource-loader.ts
index ded6d3185..690a2e788 100644
--- a/src/resource-loader.ts
+++ b/src/resource-loader.ts
@@ -1,7 +1,7 @@
import { DefaultResourceLoader } from '@gsd/pi-coding-agent'
import { createHash } from 'node:crypto'
import { homedir } from 'node:os'
-import { chmodSync, copyFileSync, cpSync, existsSync, lstatSync, mkdirSync, readFileSync, readlinkSync, readdirSync, rmSync, statSync, symlinkSync, unlinkSync, writeFileSync } from 'node:fs'
+import { chmodSync, copyFileSync, cpSync, existsSync, lstatSync, mkdirSync, openSync, closeSync, readFileSync, readlinkSync, readdirSync, rmSync, statSync, symlinkSync, unlinkSync, writeFileSync } from 'node:fs'
import { dirname, join, relative, resolve } from 'node:path'
import { fileURLToPath } from 'node:url'
import { compareSemver } from './update-check.js'
@@ -381,9 +381,12 @@ function pruneRemovedBundledExtensions(
*
* - extensions/ → ~/.gsd/agent/extensions/ (overwrite when version changes)
* - agents/ → ~/.gsd/agent/agents/ (overwrite when version changes)
- * - skills/ → ~/.gsd/agent/skills/ (overwrite when version changes)
* - GSD-WORKFLOW.md → ~/.gsd/agent/GSD-WORKFLOW.md (fallback for env var miss)
*
+ * Skills are NOT synced here. They are installed by the user via the
+ * skills.sh CLI (`npx skills add `) into ~/.agents/skills/ — the
+ * industry-standard Agent Skills ecosystem directory.
+ *
* Skips the copy when the managed-resources.json version matches the current
* GSD version, avoiding ~128ms of synchronous cpSync on every startup.
* After `npm update -g @glittercowboy/gsd`, versions will differ and the
@@ -408,6 +411,10 @@ export function initResources(agentDir: string): void {
// extensions fail to resolve @gsd/* packages, rendering GSD non-functional.
ensureNodeModulesSymlink(agentDir)
+ // Migrate legacy skills on every launch (not gated by manifest) so that
+ // partial-failure retries don't wait for a version bump.
+ migrateSkillsToEcosystemDir(agentDir)
+
// Skip the full copy when both version AND content fingerprint match.
// Version-only checks miss same-version content changes (npm link dev workflow,
// hotfixes within a release). The content hash catches those at ~1ms cost.
@@ -424,7 +431,13 @@ export function initResources(agentDir: string): void {
syncResourceDir(bundledExtensionsDir, join(agentDir, 'extensions'))
syncResourceDir(join(resourcesDir, 'agents'), join(agentDir, 'agents'))
- syncResourceDir(join(resourcesDir, 'skills'), join(agentDir, 'skills'))
+ // Skills are no longer force-synced here. Users install skills via the
+ // skills.sh CLI (`npx skills add `) into ~/.agents/skills/ which
+ // is the industry-standard Agent Skills ecosystem directory.
+ //
+ // Migration from the legacy ~/.gsd/agent/skills/ directory is handled
+ // above the manifest check so it runs on every launch (including retries
+ // after partial copy failures).
// Sync GSD-WORKFLOW.md to agentDir as a fallback for when GSD_WORKFLOW_PATH
// env var is not set (e.g. fork/dev builds, alternative entry points).
@@ -441,6 +454,109 @@ export function initResources(agentDir: string): void {
ensureRegistryEntries(join(agentDir, 'extensions'))
}
+// ─── Legacy Skill Migration ──────────────────────────────────────────────────────
+
+/**
+ * One-time migration: copy user-customized skills from the old
+ * ~/.gsd/agent/skills/ directory into ~/.agents/skills/.
+ *
+ * The migration is conservative:
+ * - Only skill directories containing a SKILL.md are considered.
+ * - Copies, does not move — the old directory stays intact so downgrading
+ * to a pre-migration GSD version still works.
+ * - Collision-safe — if a skill name already exists in the target, the
+ * existing ecosystem skill wins (user may have already installed a newer
+ * version via skills.sh).
+ * - Writes a `.migrated-to-agents` marker inside the legacy directory so
+ * the migration runs at most once.
+ */
+function migrateSkillsToEcosystemDir(agentDir: string): void {
+ const legacyDir = join(agentDir, 'skills')
+ const markerPath = join(legacyDir, '.migrated-to-agents')
+
+ // Already migrated or no legacy dir — nothing to do
+ if (!existsSync(legacyDir)) return
+
+ // Atomic marker check — 'wx' fails if file already exists, preventing races
+ // when two GSD processes start simultaneously.
+ let markerFd: number
+ try {
+ markerFd = openSync(markerPath, 'wx')
+ } catch {
+ return // marker already exists (another process won the race, or already migrated)
+ }
+
+ try {
+ const ecosystemDir = join(homedir(), '.agents', 'skills')
+ mkdirSync(ecosystemDir, { recursive: true })
+
+ const entries = readdirSync(legacyDir, { withFileTypes: true })
+ let migrated = 0
+ let candidates = 0
+ for (const entry of entries) {
+ // Handle both real directories and symlinks pointing to directories
+ const isDir = entry.isDirectory()
+ const isSymlink = entry.isSymbolicLink()
+ if (!isDir && !isSymlink) continue
+
+ const sourcePath = join(legacyDir, entry.name)
+
+ // For symlinks, verify the target is a directory
+ if (isSymlink) {
+ try {
+ const stat = statSync(sourcePath)
+ if (!stat.isDirectory()) continue
+ } catch {
+ continue // broken symlink — skip
+ }
+ }
+
+ const skillMd = join(sourcePath, 'SKILL.md')
+ if (!existsSync(skillMd)) continue
+
+ const target = join(ecosystemDir, entry.name)
+ if (existsSync(target)) continue // ecosystem version wins
+
+ candidates++
+ try {
+ if (isSymlink) {
+ // Recreate the symlink in the ecosystem directory using an absolute
+ // target. Relative symlinks would resolve from the new parent dir
+ // (~/.agents/skills/) instead of the original (~/.gsd/agent/skills/),
+ // pointing to the wrong location.
+ const rawTarget = readlinkSync(sourcePath)
+ const absTarget = resolve(dirname(sourcePath), rawTarget)
+ symlinkSync(absTarget, target)
+ } else {
+ cpSync(sourcePath, target, { recursive: true })
+ }
+ migrated++
+ } catch {
+ // non-fatal — skip this skill
+ }
+ }
+
+ // If any skills failed to copy, remove the marker so migration retries
+ // on the next launch. This keeps the legacy dir as fallback until every
+ // skill has been successfully migrated.
+ if (migrated < candidates) {
+ try { closeSync(markerFd); markerFd = -1 } catch { /* non-fatal */ }
+ try { unlinkSync(markerPath) } catch { /* non-fatal */ }
+ return
+ }
+
+ // Write migration info to the marker
+ try { writeFileSync(markerFd, `Migrated ${migrated} skill(s) to ${ecosystemDir} on ${new Date().toISOString()}\n`) } catch { /* non-fatal */ }
+ } catch {
+ // can't create ecosystem dir or read legacy dir — close fd first (required on Windows
+ // where unlinkSync fails on open handles), then remove marker so we retry next launch
+ try { closeSync(markerFd); markerFd = -1 } catch { /* non-fatal */ }
+ try { unlinkSync(markerPath) } catch { /* non-fatal */ }
+ } finally {
+ if (markerFd !== -1) { try { closeSync(markerFd) } catch { /* non-fatal */ } }
+ }
+}
+
export function hasStaleCompiledExtensionSiblings(extensionsDir: string): boolean {
if (!existsSync(extensionsDir)) return false
for (const entry of readdirSync(extensionsDir, { withFileTypes: true })) {
diff --git a/src/resources/extensions/gsd/auto-observability.ts b/src/resources/extensions/gsd/auto-observability.ts
new file mode 100644
index 000000000..ddcc0bf3d
--- /dev/null
+++ b/src/resources/extensions/gsd/auto-observability.ts
@@ -0,0 +1,74 @@
+/**
+ * Pre-dispatch observability checks for auto-mode units.
+ * Validates plan/summary file quality and builds repair instructions
+ * for the agent to fix gaps before proceeding with the unit.
+ */
+
+import type { ExtensionContext } from "@gsd/pi-coding-agent";
+import {
+ validatePlanBoundary,
+ validateExecuteBoundary,
+ validateCompleteBoundary,
+ formatValidationIssues,
+} from "./observability-validator.js";
+import type { ValidationIssue } from "./observability-validator.js";
+
+export async function collectObservabilityWarnings(
+ ctx: ExtensionContext,
+ basePath: string,
+ unitType: string,
+ unitId: string,
+): Promise<ValidationIssue[]> {
+ // Hook units have custom artifacts — skip standard observability checks
+ if (unitType.startsWith("hook/")) return [];
+
+ const parts = unitId.split("/");
+ const mid = parts[0];
+ const sid = parts[1];
+ const tid = parts[2];
+
+ if (!mid || !sid) return [];
+
+  let issues: ValidationIssue[] = [];
+
+ if (unitType === "plan-slice") {
+ issues = await validatePlanBoundary(basePath, mid, sid);
+ } else if (unitType === "execute-task" && tid) {
+ issues = await validateExecuteBoundary(basePath, mid, sid, tid);
+ } else if (unitType === "complete-slice") {
+ issues = await validateCompleteBoundary(basePath, mid, sid);
+ }
+
+ if (issues.length > 0) {
+ ctx.ui.notify(
+ `Observability check (${unitType}) found ${issues.length} warning${issues.length === 1 ? "" : "s"}:\n${formatValidationIssues(issues)}`,
+ "warning",
+ );
+ }
+
+ return issues;
+}
+
+export function buildObservabilityRepairBlock(issues: ValidationIssue[]): string {
+ if (issues.length === 0) return "";
+ const items = issues.map(issue => {
+ const fileName = issue.file.split("/").pop() || issue.file;
+ let line = `- **${fileName}**: ${issue.message}`;
+ if (issue.suggestion) line += ` → ${issue.suggestion}`;
+ return line;
+ });
+ return [
+ "",
+ "---",
+ "",
+ "## Pre-flight: Observability gaps to fix FIRST",
+ "",
+ "The following issues were detected in plan/summary files for this unit.",
+ "**Read each flagged file, apply the fix described, then proceed with the unit.**",
+ "",
+ ...items,
+ "",
+ "---",
+ "",
+ ].join("\n");
+}
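The per-issue line formatting in `buildObservabilityRepairBlock` can be sketched in isolation; the `Issue` shape and sample values below are illustrative, not real GSD output:

```typescript
// Standalone sketch of the repair-item formatting above.
interface Issue {
  file: string;
  message: string;
  suggestion?: string;
}

function formatRepairItem(issue: Issue): string {
  // Use the basename so the bullet stays readable for deep paths.
  const fileName = issue.file.split("/").pop() || issue.file;
  let line = `- **${fileName}**: ${issue.message}`;
  if (issue.suggestion) line += ` → ${issue.suggestion}`;
  return line;
}

const item = formatRepairItem({
  file: ".gsd/m1/s1/03-task.md",
  message: "Task plan has an empty Steps section.",
  suggestion: "Add numbered steps.",
});
// item: "- **03-task.md**: Task plan has an empty Steps section. → Add numbered steps."
```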
diff --git a/src/resources/extensions/gsd/detection.ts b/src/resources/extensions/gsd/detection.ts
index 3c01a277a..7507d427d 100644
--- a/src/resources/extensions/gsd/detection.ts
+++ b/src/resources/extensions/gsd/detection.ts
@@ -6,7 +6,7 @@
* flow to show when entering a project directory.
*/
-import { existsSync, readdirSync, readFileSync, statSync } from "node:fs";
+import { existsSync, openSync, readSync, closeSync, readdirSync, readFileSync, statSync } from "node:fs";
import { join } from "node:path";
import { homedir } from "node:os";
import { gsdRoot } from "./paths.js";
@@ -48,6 +48,9 @@ export interface V2Detection {
hasContext: boolean;
}
+/** Apple platform SDKROOTs found in Xcode project.pbxproj files. */
+export type XcodePlatform = "iphoneos" | "macosx" | "watchos" | "appletvos" | "xros";
+
export interface ProjectSignals {
/** Detected project/package files */
detectedFiles: string[];
@@ -57,6 +60,8 @@ export interface ProjectSignals {
isMonorepo: boolean;
/** Primary language hint */
primaryLanguage?: string;
+ /** Apple platform SDKROOTs detected from *.xcodeproj/project.pbxproj */
+ xcodePlatforms: XcodePlatform[];
/** Has existing CI configuration? */
hasCI: boolean;
/** Has existing test setup? */
@@ -97,10 +102,81 @@ export const PROJECT_FILES = [
"project.yml",
".xcodeproj",
".xcworkspace",
- // Docker
+ // Cloud platform config files
+ "firebase.json",
+ "cdk.json",
+ "samconfig.toml",
+ "serverless.yml",
+ "serverless.yaml",
+ "azure-pipelines.yml",
+ // Database / ORM config files
+ "prisma/schema.prisma",
+ "supabase/config.toml",
+ "drizzle.config.ts",
+ "drizzle.config.js",
+ "redis.conf",
+ // React Native markers
+ "metro.config.js",
+ "metro.config.ts",
+ "react-native.config.js",
+ // Frontend framework config files
+ "angular.json",
+ "next.config.js",
+ "next.config.ts",
+ "next.config.mjs",
+ "nuxt.config.ts",
+ "nuxt.config.js",
+ "svelte.config.js",
+ "svelte.config.ts",
+ // Vue CLI config files
+ "vue.config.js",
+ "vue.config.ts",
+ // Frontend tooling
+ "tailwind.config.js",
+ "tailwind.config.ts",
+ "tailwind.config.mjs",
+ "tailwind.config.cjs",
+ // Android project markers
+ "app/build.gradle",
+ "app/build.gradle.kts",
+ // Container / DevOps config files
"Dockerfile",
+ "docker-compose.yml",
+ "docker-compose.yaml",
+ // Infrastructure as Code
+ "main.tf",
+ // Kubernetes / Helm markers
+ "Chart.yaml",
+ "kustomization.yaml",
+ // CI/CD markers
+ ".github/workflows",
+ // Blockchain / Web3 markers
+ "hardhat.config.js",
+ "hardhat.config.ts",
+ "foundry.toml",
+ // Data engineering markers
+ "dbt_project.yml",
+ "airflow.cfg",
+ // Game engine markers
+ "ProjectSettings/ProjectVersion.txt",
+ "project.godot",
+ // Python framework markers
+ "manage.py",
+ "requirements.txt",
] as const;
+/** File extensions that indicate SQLite databases in the project. */
+const SQLITE_EXTENSIONS = [".sqlite", ".sqlite3", ".db"] as const;
+
+/** File extensions that indicate SQL usage (migrations, schemas, seeds). */
+const SQL_EXTENSIONS = [".sql"] as const;
+
+/** File extensions that indicate .NET / C# projects. */
+const DOTNET_EXTENSIONS = [".csproj", ".sln", ".fsproj"] as const;
+
+/** File extensions that indicate Vue.js single-file components. */
+const VUE_EXTENSIONS = [".vue"] as const;
+
const LANGUAGE_MAP: Record<string, string> = {
"package.json": "javascript/typescript",
"Cargo.toml": "rust",
@@ -111,6 +187,8 @@ const LANGUAGE_MAP: Record<string, string> = {
"pom.xml": "java",
"build.gradle": "java/kotlin",
"build.gradle.kts": "kotlin",
+ "app/build.gradle": "java/kotlin",
+ "app/build.gradle.kts": "kotlin",
"CMakeLists.txt": "c/c++",
"composer.json": "php",
"pubspec.yaml": "dart/flutter",
@@ -125,6 +203,8 @@ const LANGUAGE_MAP: Record<string, string> = {
".xcodeproj": "swift/xcode",
".xcworkspace": "swift/xcode",
"Dockerfile": "docker",
+ "manage.py": "python",
+ "requirements.txt": "python",
};
const MONOREPO_MARKERS = [
@@ -159,6 +239,44 @@ const TEST_MARKERS = [
"phpunit.xml",
] as const;
+/** Directories skipped during bounded recursive project scans. */
+const RECURSIVE_SCAN_IGNORED_DIRS = new Set([
+ ".git",
+ "node_modules",
+ ".venv",
+ "venv",
+ "dist",
+ "build",
+ "coverage",
+ ".next",
+ ".nuxt",
+ "target",
+ "vendor",
+ ".turbo",
+ "Pods",
+ "bin",
+ "obj",
+ ".gradle",
+ "DerivedData",
+ "out",
+]) as ReadonlySet<string>;
+
+/** Project file markers only matched at the project root — excluded from recursive suffix matching. */
+const ROOT_ONLY_PROJECT_FILES = new Set([
+ ".github/workflows",
+ "package.json",
+ "Gemfile",
+ "Makefile",
+ "CMakeLists.txt",
+ "build.gradle",
+ "build.gradle.kts",
+ "deno.json",
+ "deno.jsonc",
+]);
+
+const MAX_RECURSIVE_SCAN_FILES = 2000;
+const MAX_RECURSIVE_SCAN_DEPTH = 6;
+
// ─── Core Detection ─────────────────────────────────────────────────────────────
/**
@@ -280,9 +398,88 @@ export function detectProjectSignals(basePath: string): ProjectSignals {
}
}
+ // Bounded recursive scan for nested markers and dependency files.
+ // This covers common brownfield layouts like src/App/App.csproj,
+ // db/migrations/*.sql, src/components/*.vue, and services/api/pyproject.toml
+ // without walking the entire repo or diving into heavyweight folders.
+ const scannedFiles = scanProjectFiles(basePath);
+
+ for (const file of PROJECT_FILES) {
+ if (detectedFiles.includes(file) || ROOT_ONLY_PROJECT_FILES.has(file)) continue;
+ const hasMatch = file === "requirements.txt"
+ ? scannedFiles.some(isPythonRequirementsFile)
+ : scannedFiles.some((scannedFile) => matchesProjectFileMarker(scannedFile, file));
+ if (hasMatch) {
+ pushUnique(detectedFiles, file);
+ if (!primaryLanguage && LANGUAGE_MAP[file]) {
+ primaryLanguage = LANGUAGE_MAP[file];
+ }
+ }
+ }
+
+ if (scannedFiles.some((file) => SQLITE_EXTENSIONS.some((ext) => file.endsWith(ext)))) {
+ pushUnique(detectedFiles, "*.sqlite");
+ }
+ if (scannedFiles.some((file) => SQL_EXTENSIONS.some((ext) => file.endsWith(ext)))) {
+ pushUnique(detectedFiles, "*.sql");
+ }
+
+ const hasCsproj = scannedFiles.some((file) => file.endsWith(".csproj"));
+ const hasFsproj = scannedFiles.some((file) => file.endsWith(".fsproj"));
+ const hasSln = scannedFiles.some((file) => file.endsWith(".sln"));
+
+ if (hasCsproj) {
+ pushUnique(detectedFiles, "*.csproj");
+ if (!primaryLanguage) primaryLanguage = "csharp";
+ }
+ if (hasFsproj) {
+ pushUnique(detectedFiles, "*.fsproj");
+ if (!primaryLanguage) primaryLanguage = "fsharp";
+ }
+ if (hasSln) {
+ pushUnique(detectedFiles, "*.sln");
+ if (!primaryLanguage) primaryLanguage = "dotnet";
+ }
+
+ if (scannedFiles.some((file) => VUE_EXTENSIONS.some((ext) => file.endsWith(ext)))) {
+ pushUnique(detectedFiles, "*.vue");
+ }
+
+ // Python framework detection — scan dependency files for framework-specific packages.
+ // Adds synthetic markers (e.g. "dep:fastapi") so skill catalog matchFiles can reference them.
+ const dependencyFiles = scannedFiles.filter((file) =>
+ isPythonRequirementsFile(file) || file.endsWith("pyproject.toml"),
+ );
+ if (containsFastapiDependency(basePath, dependencyFiles)) {
+ pushUnique(detectedFiles, "dep:fastapi");
+ }
+
+ const springBootBuildFiles = scannedFiles.filter((file) =>
+ file.endsWith("pom.xml") || file.endsWith("build.gradle") || file.endsWith("build.gradle.kts"),
+ );
+ const springBootVersionCatalogs = scannedFiles.filter((file) => file.endsWith(".versions.toml"));
+ const springBootSettingsFiles = scannedFiles.filter((file) =>
+ file.endsWith("settings.gradle") || file.endsWith("settings.gradle.kts"),
+ );
+ if (containsSpringBootMarker(basePath, springBootBuildFiles, springBootVersionCatalogs, springBootSettingsFiles)) {
+ pushUnique(detectedFiles, "dep:spring-boot");
+ if (!primaryLanguage) {
+ primaryLanguage = "java/kotlin";
+ }
+ }
+
// Git repo detection
const isGitRepo = existsSync(join(basePath, ".git"));
+ // Xcode platform detection — parse SDKROOT from project.pbxproj
+ const xcodePlatforms = detectXcodePlatforms(basePath);
+
+ // Set primaryLanguage to swift when an Xcode project is found but no
+ // Package.swift was detected (CocoaPods or SPM-less projects).
+ if (!primaryLanguage && xcodePlatforms.length > 0) {
+ primaryLanguage = "swift";
+ }
+
// Monorepo detection
let isMonorepo = false;
for (const marker of MONOREPO_MARKERS) {
@@ -325,6 +522,7 @@ export function detectProjectSignals(basePath: string): ProjectSignals {
isGitRepo,
isMonorepo,
primaryLanguage,
+ xcodePlatforms,
hasCI,
hasTests,
packageManager,
@@ -332,6 +530,100 @@ export function detectProjectSignals(basePath: string): ProjectSignals {
};
}
+// ─── Xcode Platform Detection ───────────────────────────────────────────────────
+
+/** Known SDKROOT values → canonical platform names. */
+const SDKROOT_MAP: Record<string, XcodePlatform> = {
+ iphoneos: "iphoneos",
+ iphonesimulator: "iphoneos", // simulator builds still target iOS
+ macosx: "macosx",
+ watchos: "watchos",
+ watchsimulator: "watchos",
+ appletvos: "appletvos",
+ appletvsimulator: "appletvos",
+ xros: "xros",
+ xrsimulator: "xros",
+};
+
+/** Regex for SUPPORTED_PLATFORMS — fallback when SDKROOT = auto (Xcode 15+). */
+const SUPPORTED_PLATFORMS_RE = /SUPPORTED_PLATFORMS\s*=\s*"([^"]+)"/gi;
+
+/** Read at most `maxBytes` from a file without loading the full file into memory. */
+function readBounded(filePath: string, maxBytes: number): string {
+ const buf = Buffer.alloc(maxBytes);
+ const fd = openSync(filePath, "r");
+ try {
+ const bytesRead = readSync(fd, buf, 0, maxBytes, 0);
+ return buf.toString("utf-8", 0, bytesRead);
+ } finally {
+ closeSync(fd);
+ }
+}
+
+/** Common subdirectories where .xcodeproj may live in monorepos / standard layouts. */
+const XCODE_SUBDIRS = ["ios", "macos", "app", "apps"] as const;
+
+/**
+ * Scan *.xcodeproj directories for project.pbxproj and extract SDKROOT values.
+ * Returns deduplicated, canonical platform list (e.g. ["iphoneos"]).
+ *
+ * Reading the pbxproj is a lightweight regex scan — no full plist parsing needed.
+ * We read at most 1 MB per file to keep detection fast.
+ * Searches both the project root and common subdirectories (ios/, macos/, app/).
+ */
+function detectXcodePlatforms(basePath: string): XcodePlatform[] {
+  const platforms = new Set<XcodePlatform>();
+
+ // Directories to scan: project root + common subdirs
+ const dirsToScan = [basePath];
+ for (const sub of XCODE_SUBDIRS) {
+ const subPath = join(basePath, sub);
+ if (existsSync(subPath)) dirsToScan.push(subPath);
+ }
+
+ for (const dir of dirsToScan) {
+ try {
+ const entries = readdirSync(dir, { withFileTypes: true });
+ for (const entry of entries) {
+ if (!entry.isDirectory() || !entry.name.endsWith(".xcodeproj")) continue;
+ const pbxprojPath = join(dir, entry.name, "project.pbxproj");
+ try {
+ const content = readBounded(pbxprojPath, 1024 * 1024);
+        // Match `SDKROOT = <value>;` — both quoted and unquoted forms
+ const sdkRe = /SDKROOT\s*=\s*"?([a-z]+)"?\s*;/gi;
+ let m: RegExpExecArray | null;
+ let foundExplicit = false;
+ while ((m = sdkRe.exec(content)) !== null) {
+ const val = m[1].toLowerCase();
+ if (val === "auto") continue; // handled below via SUPPORTED_PLATFORMS
+ const canonical = SDKROOT_MAP[val];
+ if (canonical) {
+ platforms.add(canonical);
+ foundExplicit = true;
+ }
+ }
+ // Xcode 15+ defaults SDKROOT to "auto"; fall back to SUPPORTED_PLATFORMS
+ if (!foundExplicit) {
+ let sp: RegExpExecArray | null;
+ while ((sp = SUPPORTED_PLATFORMS_RE.exec(content)) !== null) {
+ for (const tok of sp[1].split(/\s+/)) {
+ const canonical = SDKROOT_MAP[tok.toLowerCase()];
+ if (canonical) platforms.add(canonical);
+ }
+ }
+ SUPPORTED_PLATFORMS_RE.lastIndex = 0;
+ }
+ } catch {
+ // unreadable pbxproj — skip
+ }
+ }
+ } catch {
+ // unreadable directory
+ }
+ }
+ return [...platforms];
+}
+
// ─── Package Manager Detection ──────────────────────────────────────────────────
function detectPackageManager(basePath: string): string | undefined {
@@ -392,7 +684,7 @@ function detectVerificationCommands(
commands.push("go vet ./...");
}
- if (detectedFiles.includes("pyproject.toml") || detectedFiles.includes("setup.py")) {
+ if (detectedFiles.includes("pyproject.toml") || detectedFiles.includes("setup.py") || detectedFiles.includes("requirements.txt")) {
commands.push("pytest");
}
@@ -487,3 +779,370 @@ function readMakefileTargets(basePath: string): string[] {
return [];
}
}
+
+function pushUnique(arr: string[], value: string): void {
+ if (!arr.includes(value)) arr.push(value);
+}
+
+function matchesProjectFileMarker(scannedFile: string, marker: string): boolean {
+ const normalized = scannedFile.replaceAll("\\", "/");
+ return (
+ normalized === marker ||
+ normalized.endsWith(`/${marker}`)
+ );
+}
+
+function isPythonRequirementsFile(relativePath: string): boolean {
+ const normalized = relativePath.replaceAll("\\", "/");
+ const basename = normalized.slice(normalized.lastIndexOf("/") + 1);
+ return (
+ basename === "requirements.txt" ||
+ basename === "requirements.in" ||
+ /^requirements([-.].+)?\.(txt|in)$/i.test(basename) ||
+ /(^|\/)requirements\/.+\.(txt|in)$/i.test(normalized)
+ );
+}
+
+function containsFastapiDependency(basePath: string, relativePaths: string[]): boolean {
+ for (const relativePath of relativePaths) {
+ try {
+ const raw = readBounded(join(basePath, relativePath), 64 * 1024);
+ const content = extractDependencyContent(relativePath, raw);
+ if (isPythonRequirementsFile(relativePath)) {
+ for (const line of content.split("\n")) {
+ if (extractRequirementName(line) === "fastapi") return true;
+ }
+ continue;
+ }
+
+ if (relativePath.endsWith("pyproject.toml")) {
+ if (containsFastapiInPyproject(content)) return true;
+ }
+ } catch {
+ // unreadable file — continue scanning other candidate files
+ }
+ }
+
+ return false;
+}
+
+function containsSpringBootMarker(
+ basePath: string,
+ buildFiles: string[],
+ versionCatalogFiles: string[],
+ settingsFiles: string[],
+): boolean {
+  const usedPluginAliases = new Set<string>();
+  const usedLibraryAliases = new Set<string>();
+ const catalogAccessors = resolveVersionCatalogAccessors(basePath, versionCatalogFiles, settingsFiles);
+
+ for (const relativePath of buildFiles) {
+ try {
+ const raw = readBounded(join(basePath, relativePath), 64 * 1024);
+ const content = stripDependencyComments(relativePath, raw);
+ if (containsDirectSpringBootReference(relativePath, content)) {
+ return true;
+ }
+
+ const normalized = content.toLowerCase();
+ let match: RegExpExecArray | null;
+ for (const accessor of catalogAccessors) {
+ const aliasRe = new RegExp(`alias\\(\\s*${accessor}\\.plugins\\.([a-z0-9_.-]+)\\s*\\)`, "gi");
+ while ((match = aliasRe.exec(normalized)) !== null) {
+ usedPluginAliases.add(normalizePluginAlias(match[1]));
+ }
+
+ const libraryAliasRe = new RegExp(`\\b${accessor}\\.((?!plugins\\b)[a-z0-9_.-]+)`, "gi");
+ while ((match = libraryAliasRe.exec(normalized)) !== null) {
+ usedLibraryAliases.add(normalizePluginAlias(match[1]));
+ }
+ }
+ } catch {
+ // unreadable build file — continue scanning others
+ }
+ }
+
+ if (usedPluginAliases.size === 0 && usedLibraryAliases.size === 0) {
+ return false;
+ }
+ if (versionCatalogFiles.length === 0) {
+ return false;
+ }
+
+  const springBootAliases = new Set<string>();
+  const springBootLibraries = new Set<string>();
+ const pendingSpringBootBundles: Array<{ bundleAlias: string; referencedAliases: string[] }> = [];
+ for (const relativePath of versionCatalogFiles) {
+ try {
+ const raw = readBounded(join(basePath, relativePath), 64 * 1024);
+ const content = stripDependencyComments(relativePath, raw);
+ const aliasRe = /^\s*([A-Za-z0-9_.-]+)\s*=\s*\{[^\n}]*\bid\s*=\s*["']org\.springframework\.boot["'][^\n}]*\}/gm;
+ let match: RegExpExecArray | null;
+ while ((match = aliasRe.exec(content)) !== null) {
+ springBootAliases.add(normalizePluginAlias(match[1]));
+ }
+
+ const libraryRe = /^\s*([A-Za-z0-9_.-]+)\s*=\s*\{[^\n}]*\b(module\s*=\s*["']org\.springframework\.boot:[^"']+["']|group\s*=\s*["']org\.springframework\.boot["'][^\n}]*\bname\s*=\s*["']spring-boot[^"']*["'])[^\n}]*\}/gm;
+ while ((match = libraryRe.exec(content)) !== null) {
+ springBootLibraries.add(normalizePluginAlias(match[1]));
+ }
+
+ const bundleRe = /^\s*([A-Za-z0-9_.-]+)\s*=\s*\[([\s\S]*?)\]/gm;
+ while ((match = bundleRe.exec(content)) !== null) {
+ pendingSpringBootBundles.push({
+ bundleAlias: normalizePluginAlias(`bundles.${match[1]}`),
+ referencedAliases: match[2]
+ .split(",")
+ .map((part) => normalizePluginAlias(part.replace(/["'\s]/g, "")))
+ .filter(Boolean),
+ });
+ }
+ } catch {
+ // unreadable version catalog — continue scanning others
+ }
+ }
+
+  const springBootBundles = new Set<string>();
+ for (const pendingBundle of pendingSpringBootBundles) {
+ if (pendingBundle.referencedAliases.some((alias) => springBootLibraries.has(alias))) {
+ springBootBundles.add(pendingBundle.bundleAlias);
+ }
+ }
+
+ for (const alias of usedPluginAliases) {
+ if (springBootAliases.has(alias)) return true;
+ }
+ for (const alias of usedLibraryAliases) {
+ if (springBootLibraries.has(alias) || springBootBundles.has(alias)) return true;
+ }
+
+ return false;
+}
+
+function stripDependencyComments(relativePath: string, content: string): string {
+ if (relativePath.endsWith("requirements.txt")) {
+ return content.replace(/(^|\s)#.*$/gm, "");
+ }
+ if (relativePath.endsWith("pyproject.toml")) {
+ return content.replace(/(^|\s)#.*$/gm, "");
+ }
+ if (relativePath.endsWith(".versions.toml")) {
+ return content.replace(/(^|\s)#.*$/gm, "");
+ }
+ if (relativePath.endsWith("settings.gradle") || relativePath.endsWith("settings.gradle.kts")) {
+ return content
+ .replace(/\/\*[\s\S]*?\*\//g, "")
+ .replace(/\/\/.*$/gm, "");
+ }
+ if (relativePath.endsWith("pom.xml")) {
+    return content.replace(/<!--[\s\S]*?-->/g, "");
+ }
+ if (relativePath.endsWith("build.gradle") || relativePath.endsWith("build.gradle.kts")) {
+ return content
+ .replace(/\/\*[\s\S]*?\*\//g, "")
+ .replace(/\/\/.*$/gm, "");
+ }
+ return content;
+}
+
+function extractDependencyContent(relativePath: string, content: string): string {
+ const stripped = stripDependencyComments(relativePath, content);
+ if (relativePath.endsWith("pyproject.toml")) {
+ return extractPyprojectDependencySections(stripped);
+ }
+ return stripped;
+}
+
+function extractRequirementName(spec: string): string | null {
+ const trimmed = spec.trim().replace(/^["']|["']$/g, "");
+ if (!trimmed) return null;
+
+ const match = trimmed.match(/^([A-Za-z0-9_.-]+)(?:\[[^\]]+\])?(?=\s*(?:@|[<>=!~;]|$))/);
+ if (!match) return null;
+ return normalizePackageName(match[1]);
+}
+
+function containsFastapiInPyproject(content: string): boolean {
+ for (const line of content.split("\n")) {
+ const keyMatch = line.match(/^\s*([A-Za-z0-9_.-]+)\s*=/);
+ if (keyMatch) {
+ const key = normalizePackageName(keyMatch[1]);
+ if (key === "fastapi") {
+ return true;
+ }
+ if (key !== "dependencies") {
+ continue;
+ }
+ }
+
+ const quotedSpecRe = /["']([^"']+)["']/g;
+ let match: RegExpExecArray | null;
+ while ((match = quotedSpecRe.exec(line)) !== null) {
+ if (extractRequirementName(match[1]) === "fastapi") {
+ return true;
+ }
+ }
+ }
+
+ return false;
+}
+
+function containsDirectSpringBootReference(relativePath: string, content: string): boolean {
+ if (relativePath.endsWith("pom.xml")) {
+    return /<groupId>\s*org\.springframework\.boot\s*<\/groupId>/i.test(content);
+ }
+
+ if (relativePath.endsWith("build.gradle") || relativePath.endsWith("build.gradle.kts")) {
+ return /(id\s*\(?\s*["']org\.springframework\.boot["']|apply\s*\(?\s*plugin\s*[:=]\s*["']org\.springframework\.boot["']|(?:implementation|api|compileOnly|runtimeOnly|testImplementation|annotationProcessor|kapt)\s*\(?\s*["'][^"']*org\.springframework\.boot:[^"']*spring-boot[^"']*["'])/i.test(content);
+ }
+
+ return false;
+}
+
+function extractPyprojectDependencySections(content: string): string {
+ const lines = content.split("\n");
+ const collected: string[] = [];
+ let section = "";
+ let collectingProjectDeps = false;
+ let collectingOptionalDeps = false;
+ let bracketDepth = 0;
+
+ for (const line of lines) {
+ const trimmed = line.trim();
+
+ if (collectingProjectDeps) {
+ collected.push(line);
+ bracketDepth += countChar(line, "[") - countChar(line, "]");
+ if (bracketDepth <= 0) {
+ collectingProjectDeps = false;
+ }
+ continue;
+ }
+
+ if (collectingOptionalDeps) {
+ collected.push(line);
+ bracketDepth += countChar(line, "[") - countChar(line, "]");
+ if (bracketDepth <= 0) {
+ collectingOptionalDeps = false;
+ }
+ continue;
+ }
+
+ const sectionMatch = trimmed.match(/^\[([^\]]+)\]$/);
+ if (sectionMatch) {
+ section = sectionMatch[1].trim();
+ continue;
+ }
+
+ if (section === "project" && /^dependencies\s*=\s*\[/.test(trimmed)) {
+ collected.push(line);
+ bracketDepth = countChar(line, "[") - countChar(line, "]");
+ collectingProjectDeps = bracketDepth > 0;
+ continue;
+ }
+
+ if (
+ section === "project.optional-dependencies" ||
+ section === "tool.poetry.dependencies"
+ ) {
+ if (section === "project.optional-dependencies") {
+ const equalsIndex = line.indexOf("=");
+ if (equalsIndex !== -1) {
+ const value = line.slice(equalsIndex + 1);
+ collected.push(value);
+ bracketDepth = countChar(value, "[") - countChar(value, "]");
+ collectingOptionalDeps = bracketDepth > 0;
+ }
+ } else {
+ collected.push(line);
+ }
+ }
+ }
+
+ return collected.join("\n");
+}
+
+function countChar(text: string, char: string): number {
+ return [...text].filter((c) => c === char).length;
+}
+
+function normalizePackageName(name: string): string {
+ return name.toLowerCase().replace(/[_.]/g, "-");
+}
+
+function normalizePluginAlias(alias: string): string {
+ return alias.toLowerCase().replace(/[-_]/g, ".");
+}
+
+function versionCatalogAccessorName(relativePath: string): string {
+ const normalized = relativePath.replaceAll("\\", "/");
+ const basename = normalized.slice(normalized.lastIndexOf("/") + 1);
+ return basename.replace(/\.versions\.toml$/i, "").toLowerCase();
+}
+
+function resolveVersionCatalogAccessors(
+ basePath: string,
+ versionCatalogFiles: string[],
+ settingsFiles: string[],
+): Set<string> {
+ const accessors = new Set(versionCatalogFiles.map(versionCatalogAccessorName).filter(Boolean));
+ if (versionCatalogFiles.length === 0 || settingsFiles.length === 0) {
+ return accessors;
+ }
+
+ for (const settingsFile of settingsFiles) {
+ try {
+ const raw = readBounded(join(basePath, settingsFile), 64 * 1024);
+ const content = stripDependencyComments(settingsFile, raw);
+ const createRe = /create\(\s*["']([A-Za-z0-9_]+)["']\s*\)\s*\{[\s\S]*?([A-Za-z0-9_.-]+\.versions\.toml)["']?\s*\)\s*\)/g;
+ let match: RegExpExecArray | null;
+ while ((match = createRe.exec(content)) !== null) {
+ const accessor = match[1].toLowerCase();
+ const catalogBasename = match[2].replaceAll("\\", "/").split("/").pop()!;
+ if (versionCatalogFiles.some((file) => {
+ const normalized = file.replaceAll("\\", "/");
+ return normalized === catalogBasename || normalized.endsWith(`/${catalogBasename}`);
+ })) {
+ accessors.add(accessor);
+ }
+ }
+ } catch {
+ // unreadable settings file — ignore
+ }
+ }
+
+ return accessors;
+}
+
+function scanProjectFiles(basePath: string): string[] {
+ const files: string[] = [];
+ const queue: Array<{ path: string; depth: number }> = [{ path: basePath, depth: 0 }];
+
+ while (queue.length > 0 && files.length < MAX_RECURSIVE_SCAN_FILES) {
+ const current = queue.shift()!;
+ let entries: Array<{ name: string; isDirectory(): boolean; isFile(): boolean }>;
+ try {
+ entries = readdirSync(current.path, { withFileTypes: true, encoding: "utf8" });
+ } catch {
+ continue;
+ }
+
+ for (const entry of entries) {
+ const entryPath = join(current.path, entry.name);
+ const relativePath = entryPath.slice(basePath.length + 1);
+
+ if (entry.isDirectory()) {
+ if (current.depth < MAX_RECURSIVE_SCAN_DEPTH && !RECURSIVE_SCAN_IGNORED_DIRS.has(entry.name)) {
+ queue.push({ path: entryPath, depth: current.depth + 1 });
+ }
+ continue;
+ }
+
+ if (!entry.isFile()) continue;
+ files.push(relativePath);
+ if (files.length >= MAX_RECURSIVE_SCAN_FILES) break;
+ }
+ }
+
+ return files;
+}
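The bounded traversal in `scanProjectFiles` is a plain breadth-first walk with depth, file-count, and ignore-list caps. A minimal sketch of the same shape, run over an in-memory tree instead of the filesystem (the `TreeNode` shape, tree data, and limits are illustrative, not part of GSD):

```typescript
// Minimal sketch of the bounded breadth-first scan above.
interface TreeNode {
  name: string;
  children?: TreeNode[]; // present → directory, absent → file
}

const IGNORED_DIRS = new Set(["node_modules", ".git"]);
const MAX_FILES = 100;
const MAX_DEPTH = 2;

function scanTree(root: TreeNode): string[] {
  const files: string[] = [];
  const queue: Array<{ node: TreeNode; path: string; depth: number }> = [
    { node: root, path: "", depth: 0 },
  ];
  while (queue.length > 0 && files.length < MAX_FILES) {
    const { node, path, depth } = queue.shift()!;
    for (const child of node.children ?? []) {
      const childPath = path ? `${path}/${child.name}` : child.name;
      if (child.children) {
        // Descend only within the depth cap and outside ignored directories.
        if (depth < MAX_DEPTH && !IGNORED_DIRS.has(child.name)) {
          queue.push({ node: child, path: childPath, depth: depth + 1 });
        }
      } else {
        files.push(childPath);
        if (files.length >= MAX_FILES) break;
      }
    }
  }
  return files;
}

const found = scanTree({
  name: "",
  children: [
    { name: "a.txt" },
    { name: "node_modules", children: [{ name: "x.js" }] },
    { name: "src", children: [{ name: "b.ts" }] },
  ],
});
// found: ["a.txt", "src/b.ts"]; node_modules is never entered
```

Breadth-first order keeps shallow markers (the most common case) ahead of deeply nested ones when the file cap truncates the scan.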
diff --git a/src/resources/extensions/gsd/init-wizard.ts b/src/resources/extensions/gsd/init-wizard.ts
index c83cda4a6..de634ce99 100644
--- a/src/resources/extensions/gsd/init-wizard.ts
+++ b/src/resources/extensions/gsd/init-wizard.ts
@@ -15,6 +15,7 @@ import { ensureGitignore, untrackRuntimeFiles } from "./gitignore.js";
import { gsdRoot } from "./paths.js";
import { assertSafeDirectory } from "./validate-directory.js";
import type { ProjectDetection, ProjectSignals } from "./detection.js";
+import { runSkillInstallStep } from "./skill-catalog.js";
// ─── Types ──────────────────────────────────────────────────────────────────────
@@ -223,7 +224,14 @@ export async function showProjectInit(
await customizeAdvancedPrefs(ctx, prefs);
}
- // ── Step 8: Bootstrap .gsd/ ────────────────────────────────────────────────
+ // ── Step 8: Skill Installation ─────────────────────────────────────────────
+ try {
+ await runSkillInstallStep(ctx, signals);
+ } catch {
+ // Non-fatal — skill installation failure should never block project init
+ }
+
+ // ── Step 9: Bootstrap .gsd/ ────────────────────────────────────────────────
bootstrapGsdDirectory(basePath, prefs, signals);
// Ensure .gitignore
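The wizard wraps the new skill step in a bare try/catch so an optional step can fail without aborting init. The same pattern in isolation (function and variable names here are illustrative, not GSD APIs):

```typescript
// Generic "non-fatal step" wrapper: run an optional async step and
// report success, swallowing any failure so the surrounding flow continues.
async function runOptionalStep(step: () => Promise<void>): Promise<boolean> {
  try {
    await step();
    return true;
  } catch {
    // Optional steps (e.g. skill installation) must never block init.
    return false;
  }
}
```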
diff --git a/src/resources/extensions/gsd/observability-validator.ts b/src/resources/extensions/gsd/observability-validator.ts
new file mode 100644
index 000000000..0fb87f5d2
--- /dev/null
+++ b/src/resources/extensions/gsd/observability-validator.ts
@@ -0,0 +1,456 @@
+import { loadFile } from "./files.js";
+import { resolveSliceFile, resolveTaskFile, resolveTasksDir, resolveTaskFiles } from "./paths.js";
+
+export interface ValidationIssue {
+ severity: "info" | "warning" | "error";
+ scope: "slice-plan" | "task-plan" | "task-summary" | "slice-summary";
+ file: string;
+ ruleId: string;
+ message: string;
+ suggestion?: string;
+}
+
+function getSection(content: string, heading: string, level: number = 2): string | null {
+ const prefix = "#".repeat(level) + " ";
+ const escaped = heading.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
+ const regex = new RegExp(`^${prefix}${escaped}\\s*$`, "m");
+ const match = regex.exec(content);
+ if (!match) return null;
+
+ const start = match.index + match[0].length;
+ const rest = content.slice(start);
+ const nextHeading = rest.match(new RegExp(`^#{1,${level}} `, "m"));
+ const end = nextHeading ? nextHeading.index! : rest.length;
+ return rest.slice(0, end).trim();
+}
+
+function getFrontmatter(content: string): string | null {
+ const trimmed = content.trimStart();
+ if (!trimmed.startsWith("---")) return null;
+ const afterFirst = trimmed.indexOf("\n");
+ if (afterFirst === -1) return null;
+ const rest = trimmed.slice(afterFirst + 1);
+ const endIdx = rest.indexOf("\n---");
+ if (endIdx === -1) return null;
+ return rest.slice(0, endIdx);
+}
+
+function hasFrontmatterKey(content: string, key: string): boolean {
+ const fm = getFrontmatter(content);
+ if (!fm) return false;
+ return new RegExp(`^${key}:`, "m").test(fm);
+}
+
+function normalizeMeaningfulLines(text: string): string[] {
+ return text
+ .split("\n")
+ .map(line => line.trim())
+ .filter(line => line.length > 0)
+    .filter(line => !line.startsWith("<!--"))
+ .filter(line => !/^[-*]\s*\{\{.+\}\}$/.test(line))
+ .filter(line => !/^\{\{.+\}\}$/.test(line));
+}
+
+function sectionLooksPlaceholderOnly(text: string | null): boolean {
+ if (!text) return true;
+ const lines = normalizeMeaningfulLines(text)
+ .map(line => line.replace(/^[-*]\s+/, "").trim())
+ .filter(line => line.length > 0);
+
+ if (lines.length === 0) return true;
+
+ return lines.every(line => {
+ const lower = line.toLowerCase();
+ return lower === "none" ||
+ lower.endsWith(": none") ||
+ lower.includes("{{") ||
+ lower.includes("}}") ||
+ lower.startsWith("required for non-trivial") ||
+ lower.startsWith("describe how a future agent") ||
+ lower.startsWith("prefer:") ||
+ lower.startsWith("keep this section concise");
+ });
+}
+
+function textSuggestsObservabilityRelevant(content: string): boolean {
+ const lower = content.toLowerCase();
+ const needles = [
+ " api", "route", "server", "worker", "queue", "job", "sync", "import",
+ "webhook", "auth", "db", "database", "migration", "cache", "background",
+ "polling", "realtime", "socket", "stateful", "integration", "ui", "form",
+ "submit", "status", "service", "pipeline", "health endpoint", "error path"
+ ];
+ return needles.some(needle => lower.includes(needle));
+}
+
+function verificationMentionsDiagnostics(section: string | null): boolean {
+ if (!section) return false;
+ const lower = section.toLowerCase();
+ const needles = [
+ "error", "failure", "diagnostic", "status", "health", "inspect", "log",
+ "network", "console", "retry", "last error", "correlation", "readiness"
+ ];
+ return needles.some(needle => lower.includes(needle));
+}
+
+export function validateSlicePlanContent(file: string, content: string): ValidationIssue[] {
+ const issues: ValidationIssue[] = [];
+
+ // ── Plan quality rules (always run, not gated by runtime relevance) ──
+
+ const tasksSection = getSection(content, "Tasks", 2);
+ if (tasksSection) {
+ const lines = tasksSection.split("\n");
+ const taskLinePattern = /^- \[[ x]\] \*\*T\d+:/;
+ const taskLineIndices: number[] = [];
+ for (let i = 0; i < lines.length; i++) {
+ if (taskLinePattern.test(lines[i])) taskLineIndices.push(i);
+ }
+
+ for (let t = 0; t < taskLineIndices.length; t++) {
+ const start = taskLineIndices[t];
+ const end = t + 1 < taskLineIndices.length ? taskLineIndices[t + 1] : lines.length;
+ // Check lines between this task header and the next (or section end)
+ const bodyLines = lines.slice(start + 1, end);
+ const meaningful = bodyLines.filter(l => l.trim().length > 0);
+ if (meaningful.length === 0) {
+ issues.push({
+ severity: "warning",
+ scope: "slice-plan",
+ file,
+ ruleId: "empty_task_entry",
+ message: "Inline task entry has no description content beneath the checkbox line.",
+ suggestion: "Add at least a Why/Files/Do/Verify summary so the task is self-describing.",
+ });
+ }
+ }
+ }
+
+ // ── Observability rules (gated by runtime relevance) ──
+
+ const relevant = textSuggestsObservabilityRelevant(content);
+ if (!relevant) return issues;
+
+ const obs = getSection(content, "Observability / Diagnostics", 2);
+ const verification = getSection(content, "Verification", 2);
+
+ if (!obs) {
+ issues.push({
+ severity: "warning",
+ scope: "slice-plan",
+ file,
+ ruleId: "missing_observability_section",
+ message: "Slice plan appears non-trivial but is missing `## Observability / Diagnostics`.",
+ suggestion: "Add runtime signals, inspection surfaces, failure visibility, and redaction constraints.",
+ });
+ } else if (sectionLooksPlaceholderOnly(obs)) {
+ issues.push({
+ severity: "warning",
+ scope: "slice-plan",
+ file,
+ ruleId: "observability_section_placeholder_only",
+ message: "Slice plan has `## Observability / Diagnostics` but it still looks like placeholder text.",
+ suggestion: "Replace placeholders with concrete signals and inspection surfaces a future agent should trust.",
+ });
+ }
+
+ if (!verificationMentionsDiagnostics(verification)) {
+ issues.push({
+ severity: "warning",
+ scope: "slice-plan",
+ file,
+ ruleId: "verification_missing_diagnostic_check",
+ message: "Slice verification does not appear to include any diagnostic or failure-path check.",
+ suggestion: "Add at least one verification step for inspectable failure state, structured error output, status surface, or equivalent.",
+ });
+ }
+
+ return issues;
+}
+
+export function validateTaskPlanContent(file: string, content: string): ValidationIssue[] {
+ const issues: ValidationIssue[] = [];
+
+ // ── Plan quality rules (always run, not gated by runtime relevance) ──
+
+ // Rule: empty or missing Steps section
+ const stepsSection = getSection(content, "Steps", 2);
+ if (stepsSection === null || sectionLooksPlaceholderOnly(stepsSection)) {
+ issues.push({
+ severity: "warning",
+ scope: "task-plan",
+ file,
+ ruleId: "empty_steps_section",
+ message: "Task plan has an empty or missing `## Steps` section.",
+ suggestion: "Add concrete numbered implementation steps so execution has a clear sequence.",
+ });
+ }
+
+ // Rule: placeholder-only Verification section
+ const verificationSection = getSection(content, "Verification", 2);
+ if (verificationSection !== null && sectionLooksPlaceholderOnly(verificationSection)) {
+ issues.push({
+ severity: "warning",
+ scope: "task-plan",
+ file,
+ ruleId: "placeholder_verification",
+ message: "Task plan has `## Verification` but it still looks like placeholder text.",
+ suggestion: "Replace placeholders with concrete verification commands, test runs, or observable checks.",
+ });
+ }
+
+ // Rule: scope estimate thresholds
+ const fm = getFrontmatter(content);
+ if (fm) {
+ const stepsMatch = fm.match(/^estimated_steps:\s*(\d+)/m);
+ const filesMatch = fm.match(/^estimated_files:\s*(\d+)/m);
+
+ if (stepsMatch) {
+ const estimatedSteps = parseInt(stepsMatch[1], 10);
+ if (estimatedSteps >= 10) {
+ issues.push({
+ severity: "warning",
+ scope: "task-plan",
+ file,
+ ruleId: "scope_estimate_steps_high",
+ message: `Task plan estimates ${estimatedSteps} steps (threshold: 10). Consider splitting into smaller tasks.`,
+ suggestion: "Break the task into sub-tasks or reduce scope so each task stays focused and completable in one pass.",
+ });
+ }
+ }
+
+ if (filesMatch) {
+ const estimatedFiles = parseInt(filesMatch[1], 10);
+ if (estimatedFiles >= 12) {
+ issues.push({
+ severity: "warning",
+ scope: "task-plan",
+ file,
+ ruleId: "scope_estimate_files_high",
+ message: `Task plan estimates ${estimatedFiles} files (threshold: 12). Consider splitting into smaller tasks.`,
+ suggestion: "Break the task into sub-tasks or reduce scope to keep the change footprint manageable.",
+ });
+ }
+ }
+ }
+
+ // Rule: Inputs and Expected Output should contain backtick-wrapped file paths
+ const inputsSection = getSection(content, "Inputs", 2);
+ const outputSection = getSection(content, "Expected Output", 2);
+ const backtickPathPattern = /`[^`]*[./][^`]*`/;
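+  // Sanity check of what this heuristic accepts (illustrative examples only):
+  //   `src/types.ts`  → match (contains "/" and ".")
+  //   `config.json`   → match (contains ".")
+  //   `make build`    → no match (no "." or "/")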
+
+ if (outputSection === null || !backtickPathPattern.test(outputSection)) {
+ issues.push({
+ severity: "warning",
+ scope: "task-plan",
+ file,
+ ruleId: "missing_output_file_paths",
+ message: "Task plan `## Expected Output` is missing or has no backtick-wrapped file paths.",
+ suggestion: "List concrete output file paths in backticks (e.g. `src/types.ts`). These are machine-parsed to derive task dependencies.",
+ });
+ }
+
+ if (inputsSection !== null && inputsSection.trim().length > 0 && !backtickPathPattern.test(inputsSection)) {
+ issues.push({
+ severity: "info",
+ scope: "task-plan",
+ file,
+ ruleId: "missing_input_file_paths",
+ message: "Task plan `## Inputs` has content but no backtick-wrapped file paths.",
+ suggestion: "List input file paths in backticks (e.g. `src/config.json`). These are machine-parsed to derive task dependencies.",
+ });
+ }
+
+ // ── Observability rules (gated by runtime relevance) ──
+
+ const relevant = textSuggestsObservabilityRelevant(content);
+ if (!relevant) return issues;
+
+ const obs = getSection(content, "Observability Impact", 2);
+ if (!obs) {
+ issues.push({
+ severity: "warning",
+ scope: "task-plan",
+ file,
+ ruleId: "missing_observability_impact",
+ message: "Task plan appears runtime-relevant but is missing `## Observability Impact`.",
+ suggestion: "Explain what signals change, how a future agent inspects this task, and what failure state becomes visible.",
+ });
+ } else if (sectionLooksPlaceholderOnly(obs)) {
+ issues.push({
+ severity: "warning",
+ scope: "task-plan",
+ file,
+ ruleId: "observability_impact_placeholder_only",
+ message: "Task plan has `## Observability Impact` but it still looks empty or placeholder-only.",
+ suggestion: "Fill in concrete inspection surfaces or explicitly justify why observability is not applicable.",
+ });
+ }
+
+ return issues;
+}
+
+export function validateTaskSummaryContent(file: string, content: string): ValidationIssue[] {
+ const issues: ValidationIssue[] = [];
+ if (!hasFrontmatterKey(content, "observability_surfaces")) {
+ issues.push({
+ severity: "warning",
+ scope: "task-summary",
+ file,
+ ruleId: "missing_observability_frontmatter",
+ message: "Task summary is missing `observability_surfaces` in frontmatter.",
+ suggestion: "List the durable status/log/error surfaces a future agent should use.",
+ });
+ }
+
+ const diagnostics = getSection(content, "Diagnostics", 2);
+ if (!diagnostics) {
+ issues.push({
+ severity: "warning",
+ scope: "task-summary",
+ file,
+ ruleId: "missing_diagnostics_section",
+ message: "Task summary is missing `## Diagnostics`.",
+ suggestion: "Document how to inspect what this task built later.",
+ });
+ } else if (sectionLooksPlaceholderOnly(diagnostics)) {
+ issues.push({
+ severity: "warning",
+ scope: "task-summary",
+ file,
+ ruleId: "diagnostics_placeholder_only",
+ message: "Task summary diagnostics section still looks like placeholder text.",
+ suggestion: "Replace placeholders with concrete commands, endpoints, logs, error shapes, or failure artifacts.",
+ });
+ }
+
+ const evidence = getSection(content, "Verification Evidence", 2);
+ if (!evidence) {
+ issues.push({
+ severity: "warning",
+ scope: "task-summary",
+ file,
+ ruleId: "evidence_block_missing",
+ message: "Task summary is missing `## Verification Evidence`.",
+ suggestion: "Add a verification evidence table showing gate check results (command, exit code, verdict, duration).",
+ });
+ } else if (sectionLooksPlaceholderOnly(evidence)) {
+ issues.push({
+ severity: "warning",
+ scope: "task-summary",
+ file,
+ ruleId: "evidence_block_placeholder",
+ message: "Task summary verification evidence section still looks like placeholder text.",
+ suggestion: "Replace placeholders with actual gate results or note that no verification commands were discovered.",
+ });
+ }
+
+ return issues;
+}
+
+export function validateSliceSummaryContent(file: string, content: string): ValidationIssue[] {
+ const issues: ValidationIssue[] = [];
+ if (!hasFrontmatterKey(content, "observability_surfaces")) {
+ issues.push({
+ severity: "warning",
+ scope: "slice-summary",
+ file,
+ ruleId: "missing_observability_frontmatter",
+ message: "Slice summary is missing `observability_surfaces` in frontmatter.",
+ suggestion: "List the authoritative diagnostics and durable inspection surfaces for this slice.",
+ });
+ }
+
+ const diagnostics = getSection(content, "Authoritative diagnostics", 3);
+ if (!diagnostics) {
+ issues.push({
+ severity: "warning",
+ scope: "slice-summary",
+ file,
+ ruleId: "missing_authoritative_diagnostics",
+ message: "Slice summary is missing `### Authoritative diagnostics` in Forward Intelligence.",
+ suggestion: "Tell future agents where to look first and why that signal is trustworthy.",
+ });
+ } else if (sectionLooksPlaceholderOnly(diagnostics)) {
+ issues.push({
+ severity: "warning",
+ scope: "slice-summary",
+ file,
+ ruleId: "authoritative_diagnostics_placeholder_only",
+ message: "Slice summary includes authoritative diagnostics but it still looks like placeholder text.",
+ suggestion: "Replace placeholders with the real first-stop diagnostic surface for this slice.",
+ });
+ }
+
+ return issues;
+}
+
+export async function validatePlanBoundary(basePath: string, milestoneId: string, sliceId: string): Promise<ValidationIssue[]> {
+ const issues: ValidationIssue[] = [];
+ const slicePlan = resolveSliceFile(basePath, milestoneId, sliceId, "PLAN");
+ if (slicePlan) {
+ const content = await loadFile(slicePlan);
+ if (content) issues.push(...validateSlicePlanContent(slicePlan, content));
+ }
+
+ const tasksDir = resolveTasksDir(basePath, milestoneId, sliceId);
+ const taskPlans = tasksDir ? resolveTaskFiles(tasksDir, "PLAN") : [];
+ for (const file of taskPlans) {
+ const taskId = file.split("-")[0];
+ const taskPlan = resolveTaskFile(basePath, milestoneId, sliceId, taskId, "PLAN");
+ if (!taskPlan) continue;
+ const content = await loadFile(taskPlan);
+ if (content) issues.push(...validateTaskPlanContent(taskPlan, content));
+ }
+
+ return issues;
+}
+
+export async function validateExecuteBoundary(basePath: string, milestoneId: string, sliceId: string, taskId: string): Promise<ValidationIssue[]> {
+ const issues: ValidationIssue[] = [];
+ const slicePlan = resolveSliceFile(basePath, milestoneId, sliceId, "PLAN");
+ if (slicePlan) {
+ const content = await loadFile(slicePlan);
+ if (content) issues.push(...validateSlicePlanContent(slicePlan, content));
+ }
+
+ const taskPlan = resolveTaskFile(basePath, milestoneId, sliceId, taskId, "PLAN");
+ if (taskPlan) {
+ const content = await loadFile(taskPlan);
+ if (content) issues.push(...validateTaskPlanContent(taskPlan, content));
+ }
+
+ return issues;
+}
+
+export async function validateCompleteBoundary(basePath: string, milestoneId: string, sliceId: string): Promise<ValidationIssue[]> {
+ const issues: ValidationIssue[] = [];
+ const tasksDir = resolveTasksDir(basePath, milestoneId, sliceId);
+ const taskSummaries = tasksDir ? resolveTaskFiles(tasksDir, "SUMMARY") : [];
+ for (const file of taskSummaries) {
+ const taskId = file.split("-")[0];
+ const taskSummary = resolveTaskFile(basePath, milestoneId, sliceId, taskId, "SUMMARY");
+ if (!taskSummary) continue;
+ const content = await loadFile(taskSummary);
+ if (content) issues.push(...validateTaskSummaryContent(taskSummary, content));
+ }
+
+ const sliceSummary = resolveSliceFile(basePath, milestoneId, sliceId, "SUMMARY");
+ if (sliceSummary) {
+ const content = await loadFile(sliceSummary);
+ if (content) issues.push(...validateSliceSummaryContent(sliceSummary, content));
+ }
+
+ return issues;
+}
+
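+/**
+ * Render issues as a short bullet list for inline display, capped at `limit`
+ * entries. Example shape (illustrative):
+ *
+ *   - PLAN.md: Task plan has an empty or missing `## Steps` section.
+ *   - ...and 3 more
+ */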
+export function formatValidationIssues(issues: ValidationIssue[], limit: number = 4): string {
+ if (issues.length === 0) return "";
+ const lines = issues.slice(0, limit).map(issue => {
+ const fileName = issue.file.split("/").pop() || issue.file;
+ return `- ${fileName}: ${issue.message}`;
+ });
+ if (issues.length > limit) lines.push(`- ...and ${issues.length - limit} more`);
+ return lines.join("\n");
+}
diff --git a/src/resources/extensions/gsd/preferences-skills.ts b/src/resources/extensions/gsd/preferences-skills.ts
index b449af8b4..1ad5a6d39 100644
--- a/src/resources/extensions/gsd/preferences-skills.ts
+++ b/src/resources/extensions/gsd/preferences-skills.ts
@@ -8,7 +8,6 @@
import { existsSync, readdirSync } from "node:fs";
import { homedir } from "node:os";
import { isAbsolute, join } from "node:path";
-import { getAgentDir } from "@gsd/pi-coding-agent";
import { statSync } from "node:fs";
import type {
@@ -25,13 +24,20 @@ export type { GSDSkillRule, SkillDiscoveryMode, SkillResolution, SkillResolution
/**
* Known skill directories, in priority order.
- * User skills (~/.gsd/agent/skills/) take precedence over project skills.
+ * Global skills (~/.agents/skills/) take precedence over project skills.
+ * Legacy ~/.gsd/agent/skills/ is included as a fallback for pre-migration installs.
*/
export function getSkillSearchDirs(cwd: string): Array<{ dir: string; method: SkillResolution["method"] }> {
- return [
- { dir: join(getAgentDir(), "skills"), method: "user-skill" },
- { dir: join(cwd, ".pi", "agent", "skills"), method: "project-skill" },
+ const dirs: Array<{ dir: string; method: SkillResolution["method"] }> = [
+ { dir: join(homedir(), ".agents", "skills"), method: "user-skill" },
+ { dir: join(cwd, ".agents", "skills"), method: "project-skill" },
];
+ // Legacy fallback — read skills from old GSD directory only if migration hasn't completed
+ const legacyDir = join(homedir(), ".gsd", "agent", "skills");
+ if (existsSync(legacyDir) && !existsSync(join(legacyDir, ".migrated-to-agents"))) {
+ dirs.push({ dir: legacyDir, method: "user-skill" });
+ }
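+  // Resulting priority when the legacy dir exists and is unmigrated:
+  //   ~/.agents/skills → <cwd>/.agents/skills → ~/.gsd/agent/skills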
+ return dirs;
}
/**
diff --git a/src/resources/extensions/gsd/roadmap-mutations.ts b/src/resources/extensions/gsd/roadmap-mutations.ts
new file mode 100644
index 000000000..39521462b
--- /dev/null
+++ b/src/resources/extensions/gsd/roadmap-mutations.ts
@@ -0,0 +1,134 @@
+/**
+ * Roadmap Mutations — shared utilities for modifying roadmap checkbox state.
+ *
+ * Extracts the duplicated "flip slice checkbox" pattern that existed in
+ * doctor.ts, mechanical-completion.ts, and auto-recovery.ts.
+ */
+
+import { readFileSync } from "node:fs";
+import { atomicWriteSync } from "./atomic-write.js";
+import { resolveMilestoneFile } from "./paths.js";
+import { clearParseCache } from "./files.js";
+
+/**
+ * Mark a slice as done ([x]) in the milestone roadmap.
+ * Idempotent — no-op if already checked or if the slice isn't found.
+ *
+ * @returns true if the roadmap was modified, false if no change was needed
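+ *
+ * @example
+ * // Hypothetical arguments — flips "- [ ] **S01: Title**" to "- [x] **S01: Title**"
+ * markSliceDoneInRoadmap(gsdRoot, "M01", "S01");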
+ */
+export function markSliceDoneInRoadmap(basePath: string, mid: string, sid: string): boolean {
+ const roadmapFile = resolveMilestoneFile(basePath, mid, "ROADMAP");
+ if (!roadmapFile) return false;
+
+ let content: string;
+ try {
+ content = readFileSync(roadmapFile, "utf-8");
+ } catch {
+ return false;
+ }
+
+ // Try checkbox format first: "- [ ] **S01: Title**"
+ let updated = content.replace(
+ new RegExp(`^(\\s*-\\s+)\\[ \\]\\s+\\*\\*${sid}:`, "m"),
+ `$1[x] **${sid}:`,
+ );
+
+ // If checkbox format didn't match, try prose format: "## S01: Title" -> "## S01: \u2713 Title"
+ if (updated === content) {
+ updated = content.replace(
+ new RegExp(`^(#{1,4}\\s+(?:\\*{0,2})(?:Slice\\s+)?${sid}\\*{0,2}[:\\s.\\u2014\\u2013-]+\\s*)(.+)`, "m"),
+ (match, prefix, title) => {
+ // Already marked done — no-op
+ if (/^\u2713/.test(title) || /\(Complete\)\s*$/i.test(title)) return match;
+ return `${prefix}\u2713 ${title}`;
+ },
+ );
+ }
+
+ if (updated === content) return false;
+
+ atomicWriteSync(roadmapFile, updated);
+ clearParseCache();
+ return true;
+}
+
+/**
+ * Mark a slice as not done ([ ]) in the milestone roadmap.
+ * Idempotent — no-op if already unchecked or if the slice isn't found.
+ *
+ * @returns true if the roadmap was modified, false if no change was needed
+ */
+export function markSliceUndoneInRoadmap(basePath: string, mid: string, sid: string): boolean {
+ const roadmapFile = resolveMilestoneFile(basePath, mid, "ROADMAP");
+ if (!roadmapFile) return false;
+
+ let content: string;
+ try {
+ content = readFileSync(roadmapFile, "utf-8");
+ } catch {
+ return false;
+ }
+
+ const updated = content.replace(
+ new RegExp(`^(\\s*-\\s+)\\[x\\]\\s+\\*\\*${sid}:`, "m"),
+ `$1[ ] **${sid}:`,
+ );
+
+ if (updated === content) return false;
+
+ atomicWriteSync(roadmapFile, updated);
+ clearParseCache();
+ return true;
+}
+
+/**
+ * Mark a task as done ([x]) in the slice plan.
+ * Idempotent — no-op if already checked or if the task isn't found.
+ *
+ * @returns true if the plan was modified, false if no change was needed
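+ *
+ * @example
+ * // Hypothetical arguments — flips "- [ ] **T03: Title**" to "- [x] **T03: Title**"
+ * markTaskDoneInPlan(gsdRoot, planPath, "T03");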
+ */
+export function markTaskDoneInPlan(basePath: string, planPath: string, tid: string): boolean {
+ let content: string;
+ try {
+ content = readFileSync(planPath, "utf-8");
+ } catch {
+ return false;
+ }
+
+ const updated = content.replace(
+ new RegExp(`^(\\s*-\\s+)\\[ \\]\\s+\\*\\*${tid}:`, "m"),
+ `$1[x] **${tid}:`,
+ );
+
+ if (updated === content) return false;
+
+ atomicWriteSync(planPath, updated);
+ clearParseCache();
+ return true;
+}
+
+/**
+ * Mark a task as not done ([ ]) in the slice plan.
+ * Idempotent — no-op if already unchecked or if the task isn't found.
+ *
+ * @returns true if the plan was modified, false if no change was needed
+ */
+export function markTaskUndoneInPlan(basePath: string, planPath: string, tid: string): boolean {
+ let content: string;
+ try {
+ content = readFileSync(planPath, "utf-8");
+ } catch {
+ return false;
+ }
+
+ const updated = content.replace(
+ new RegExp(`^(\\s*-\\s+)\\[x\\]\\s+\\*\\*${tid}:`, "mi"),
+ `$1[ ] **${tid}:`,
+ );
+
+ if (updated === content) return false;
+
+ atomicWriteSync(planPath, updated);
+ clearParseCache();
+ return true;
+}
diff --git a/src/resources/extensions/gsd/skill-catalog.ts b/src/resources/extensions/gsd/skill-catalog.ts
new file mode 100644
index 000000000..8f1c5d760
--- /dev/null
+++ b/src/resources/extensions/gsd/skill-catalog.ts
@@ -0,0 +1,1085 @@
+/**
+ * GSD Skill Catalog — Curated skill packs mapped to tech stacks.
+ *
+ * Each pack maps a detected (or user-chosen) tech stack to a skills.sh
+ * repo + specific skill names. The init wizard uses this catalog to
+ * install relevant skills during project onboarding.
+ *
+ * Installation is delegated entirely to the skills.sh CLI:
+ *     npx skills add <owner>/<repo> --skill <name> [--skill <name> ...] -y
+ *
+ * Skills are installed into ~/.agents/skills/ (the industry-standard
+ * ecosystem directory shared across all agents).
+ */
+
+import { execFile } from "node:child_process";
+import { existsSync } from "node:fs";
+import { join } from "node:path";
+import { homedir } from "node:os";
+import type { ExtensionCommandContext } from "@gsd/pi-coding-agent";
+import { showNextAction } from "../shared/tui.js";
+import type { ProjectSignals, XcodePlatform } from "./detection.js";
+
+// ─── Catalog Types ────────────────────────────────────────────────────────────
+
+export interface SkillPack {
+ /** Human-readable name shown in the wizard */
+ label: string;
+ /** Short description */
+ description: string;
+ /** skills.sh repo identifier (owner/repo) */
+ repo: string;
+ /** Specific skill names to install from the repo */
+ skills: string[];
+ /** Which detected primaryLanguage values trigger this pack */
+ matchLanguages?: string[];
+ /** Which detected project files trigger this pack */
+ matchFiles?: string[];
+ /** Trigger when Xcode project targets one of these platforms */
+ matchXcodePlatforms?: XcodePlatform[];
+ /** Always include this pack in brownfield recommendations */
+ matchAlways?: boolean;
+}
+
+// ─── Curated Catalog ──────────────────────────────────────────────────────────
+
+export const SKILL_CATALOG: SkillPack[] = [
+ // ── Swift (language-level — any Swift project) ────────────────────────────
+ {
+ label: "SwiftUI",
+ description: "SwiftUI layout, navigation, animations, gestures, Liquid Glass",
+ repo: "dpearson2699/swift-ios-skills",
+ skills: [
+ "swiftui-animation",
+ "swiftui-gestures",
+ "swiftui-layout-components",
+ "swiftui-liquid-glass",
+ "swiftui-navigation",
+ "swiftui-patterns",
+ "swiftui-performance",
+ "swiftui-uikit-interop",
+ ],
+ matchLanguages: ["swift"],
+ matchFiles: ["Package.swift"],
+ },
+ {
+ label: "Swift Core",
+ description: "Swift language, concurrency, Codable, Charts, Testing, SwiftData",
+ repo: "dpearson2699/swift-ios-skills",
+ skills: [
+ "swift-codable",
+ "swift-charts",
+ "swift-concurrency",
+ "swift-language",
+ "swift-testing",
+ "swiftdata",
+ ],
+ matchLanguages: ["swift"],
+ matchFiles: ["Package.swift"],
+ },
+ // ── iOS (Xcode project targeting iphoneos required) ───────────────────────
+ {
+ label: "iOS App Frameworks",
+ description: "App Intents, Widgets, StoreKit, MapKit, Live Activities, push notifications",
+ repo: "dpearson2699/swift-ios-skills",
+ skills: [
+ "alarmkit",
+ "app-clips",
+ "app-intents",
+ "live-activities",
+ "mapkit-location",
+ "photos-camera-media",
+ "push-notifications",
+ "storekit",
+ "tipkit",
+ "widgetkit",
+ ],
+ matchXcodePlatforms: ["iphoneos"],
+ },
+ {
+ label: "iOS Data Frameworks",
+ description: "CloudKit, HealthKit, MusicKit, WeatherKit, Contacts, Calendar",
+ repo: "dpearson2699/swift-ios-skills",
+ skills: [
+ "cloudkit-sync",
+ "contacts-framework",
+ "eventkit-calendar",
+ "healthkit",
+ "musickit-audio",
+ "passkit-wallet",
+ "weatherkit",
+ ],
+ matchXcodePlatforms: ["iphoneos"],
+ },
+ {
+ label: "iOS AI & ML",
+ description: "Core ML, Vision, on-device AI, speech recognition, NLP",
+ repo: "dpearson2699/swift-ios-skills",
+ skills: [
+ "apple-on-device-ai",
+ "coreml",
+ "natural-language",
+ "speech-recognition",
+ "vision-framework",
+ ],
+ matchXcodePlatforms: ["iphoneos"],
+ },
+ {
+ label: "iOS Engineering",
+ description: "Networking, security, accessibility, localization, Instruments, App Store review",
+ repo: "dpearson2699/swift-ios-skills",
+ skills: [
+ "app-store-review",
+ "authentication",
+ "background-processing",
+ "debugging-instruments",
+ "device-integrity",
+ "ios-accessibility",
+ "ios-localization",
+ "ios-networking",
+ "ios-security",
+ "metrickit-diagnostics",
+ ],
+ matchXcodePlatforms: ["iphoneos"],
+ },
+ {
+ label: "iOS Hardware",
+ description: "Bluetooth, CoreMotion, NFC, PencilKit, RealityKit AR",
+ repo: "dpearson2699/swift-ios-skills",
+ skills: [
+ "core-bluetooth",
+ "core-motion",
+ "core-nfc",
+ "pencilkit-drawing",
+ "realitykit-ar",
+ ],
+ matchXcodePlatforms: ["iphoneos"],
+ },
+ {
+ label: "iOS Platform",
+ description: "CallKit, EnergyKit, HomeKit, SharePlay, PermissionKit",
+ repo: "dpearson2699/swift-ios-skills",
+ skills: [
+ "callkit-voip",
+ "energykit",
+ "homekit-matter",
+ "permissionkit",
+ "shareplay-activities",
+ ],
+ matchXcodePlatforms: ["iphoneos"],
+ },
+ // ── React / Next.js ───────────────────────────────────────────────────────
+ {
+ label: "React & Web Frontend",
+ description: "React best practices and composition patterns",
+ repo: "vercel-labs/agent-skills",
+ skills: [
+ "vercel-react-best-practices",
+ "vercel-composition-patterns",
+ ],
+ matchLanguages: ["javascript/typescript"],
+ },
+ {
+ label: "shadcn/ui",
+ description: "shadcn/ui component library patterns and usage",
+ repo: "shadcn/ui",
+ skills: ["shadcn"],
+ matchLanguages: ["javascript/typescript"],
+ },
+ // ── React Native ──────────────────────────────────────────────────────────
+ {
+ label: "React Native",
+ description: "React Native and Expo best practices for performant mobile apps",
+ repo: "vercel-labs/agent-skills",
+ skills: ["vercel-react-native-skills"],
+ matchFiles: ["metro.config.js", "metro.config.ts", "react-native.config.js"],
+ },
+ {
+ label: "React Native Architecture",
+ description: "React Native app architecture, navigation, and cross-platform design patterns",
+ repo: "wshobson/agents",
+ skills: ["react-native-architecture", "react-native-design"],
+ matchFiles: ["metro.config.js", "metro.config.ts", "react-native.config.js"],
+ },
+ // ── TypeScript & JS Ecosystem (wshobson/agents — 41K combined installs) ──
+ {
+ label: "TypeScript & JS Development",
+ description: "Advanced TypeScript types, Node.js backend, testing, and modern JS patterns",
+ repo: "wshobson/agents",
+ skills: [
+ "typescript-advanced-types",
+ "nodejs-backend-patterns",
+ "javascript-testing-patterns",
+ "modern-javascript-patterns",
+ ],
+ matchLanguages: ["javascript/typescript"],
+ },
+ // ── React State (wshobson/agents — 8.1K combined installs) ─────────────
+ {
+ label: "React State & Patterns",
+ description: "State management with Zustand, Jotai, React Query, and React modernization",
+ repo: "wshobson/agents",
+ skills: ["react-state-management", "react-modernization"],
+ matchLanguages: ["javascript/typescript"],
+ },
+ // ── Tailwind CSS (wshobson/agents — 22.8K installs) ───────────────────
+ {
+ label: "Tailwind CSS",
+ description: "Tailwind v4 design system, CVA patterns, and utility-first CSS",
+ repo: "wshobson/agents",
+ skills: ["tailwind-design-system"],
+ matchFiles: [
+ "tailwind.config.js",
+ "tailwind.config.ts",
+ "tailwind.config.mjs",
+ "tailwind.config.cjs",
+ ],
+ },
+ // ── General Frontend ──────────────────────────────────────────────────────
+ {
+ label: "Frontend Design & UX",
+ description: "Frontend design, accessibility, and browser automation",
+ repo: "anthropics/skills",
+ skills: ["frontend-design"],
+ matchLanguages: ["javascript/typescript"],
+ },
+ // ── Angular ───────────────────────────────────────────────────────────────
+ {
+ label: "Angular",
+ description: "Angular components, signals, forms, routing, and testing",
+ repo: "analogjs/angular-skills",
+ skills: [
+ "angular-component",
+ "angular-signals",
+ "angular-forms",
+ "angular-routing",
+ "angular-testing",
+ ],
+ matchFiles: ["angular.json"],
+ },
+ {
+ label: "Angular Migration",
+ description: "Migrate from AngularJS to Angular with hybrid mode and incremental rewriting",
+ repo: "wshobson/agents",
+ skills: ["angular-migration"],
+ matchFiles: ["angular.json"],
+ },
+ // ── Vue.js / Nuxt ────────────────────────────────────────────────────────
+ {
+ label: "Vue.js",
+ description: "Vue best practices, Pinia state, Vue Router, and testing",
+ repo: "vuejs-ai/skills",
+ skills: [
+ "vue-best-practices",
+ "vue-pinia-best-practices",
+ "vue-router-best-practices",
+ "vue-testing-best-practices",
+ ],
+ matchFiles: ["nuxt.config.ts", "nuxt.config.js", "vue.config.js", "vue.config.ts", "*.vue"],
+ },
+ // ── Svelte / SvelteKit ────────────────────────────────────────────────────
+ {
+ label: "Svelte",
+ description: "Svelte code patterns and SvelteKit best practices",
+ repo: "sveltejs/ai-tools",
+ skills: ["svelte-code-writer", "svelte-core-bestpractices"],
+ matchFiles: ["svelte.config.js", "svelte.config.ts"],
+ },
+ // ── Next.js ───────────────────────────────────────────────────────────────
+ {
+ label: "Next.js",
+ description: "Next.js app router, server components, and deployment patterns",
+ repo: "vercel-labs/vercel-plugin",
+ skills: ["nextjs"],
+ matchFiles: ["next.config.js", "next.config.ts", "next.config.mjs"],
+ },
+ {
+ label: "Next.js App Router Patterns",
+ description: "Next.js 14+ App Router, React Server Components, and streaming",
+ repo: "wshobson/agents",
+ skills: ["nextjs-app-router-patterns"],
+ matchFiles: ["next.config.js", "next.config.ts", "next.config.mjs"],
+ },
+ // ── Java / Spring Boot ────────────────────────────────────────────────────
+ {
+ label: "Java & Spring Boot",
+ description: "Spring Boot best practices, DI, RESTful APIs, JPA, testing, and security",
+ repo: "github/awesome-copilot",
+ skills: ["java-springboot"],
+ matchFiles: ["dep:spring-boot"],
+ },
+ // ── .NET / C# ────────────────────────────────────────────────────────────
+ {
+ label: ".NET & C#",
+ description: ".NET best practices, design patterns, and upgrade guidance",
+ repo: "github/awesome-copilot",
+ skills: ["dotnet-best-practices", "dotnet-design-pattern-review"],
+ matchLanguages: ["csharp"],
+ matchFiles: ["*.csproj"],
+ },
+ {
+ label: ".NET Backend Patterns",
+ description: ".NET backend architecture, middleware, and production patterns",
+ repo: "wshobson/agents",
+ skills: ["dotnet-backend-patterns"],
+ matchFiles: ["*.csproj", "*.fsproj", "*.sln"],
+ },
+ // ── Flutter / Dart ────────────────────────────────────────────────────────
+ {
+ label: "Flutter",
+ description: "Flutter layouts, architecture, state management, and testing",
+ repo: "flutter/skills",
+ skills: [
+ "flutter-building-layouts",
+ "flutter-architecting-apps",
+ "flutter-managing-state",
+ "flutter-testing-apps",
+ ],
+ matchLanguages: ["dart/flutter"],
+ matchFiles: ["pubspec.yaml"],
+ },
+ // ── PHP / Laravel ─────────────────────────────────────────────────────────
+ {
+ label: "PHP & Laravel",
+ description: "Laravel patterns, PHP best practices, and testing",
+ repo: "jeffallan/claude-skills",
+ skills: ["laravel-specialist", "php-pro"],
+ matchLanguages: ["php"],
+ matchFiles: ["composer.json"],
+ },
+ // ── Django ────────────────────────────────────────────────────────────────
+ {
+ label: "Django",
+ description: "Django expert patterns, models, views, and middleware",
+ repo: "vintasoftware/django-ai-plugins",
+ skills: ["django-expert"],
+ matchFiles: ["manage.py"],
+ },
+ // ── Rust ──────────────────────────────────────────────────────────────────
+ {
+ label: "Rust",
+ description: "Rust language patterns and best practices",
+ repo: "anthropics/skills",
+ skills: ["rust-best-practices"],
+ matchLanguages: ["rust"],
+ matchFiles: ["Cargo.toml"],
+ },
+ {
+ label: "Rust Async Patterns",
+ description: "Async Rust with Tokio, futures, and proper error handling",
+ repo: "wshobson/agents",
+ skills: ["rust-async-patterns"],
+ matchLanguages: ["rust"],
+ matchFiles: ["Cargo.toml"],
+ },
+ // ── Python ────────────────────────────────────────────────────────────────
+ {
+ label: "Python",
+ description: "Python patterns and best practices",
+ repo: "anthropics/skills",
+ skills: ["python-best-practices"],
+ matchLanguages: ["python"],
+ matchFiles: ["pyproject.toml", "setup.py", "requirements.txt"],
+ },
+ {
+ label: "Python Advanced",
+ description: "Python performance, testing, async patterns, and uv package manager",
+ repo: "wshobson/agents",
+ skills: [
+ "python-performance-optimization",
+ "python-testing-patterns",
+ "async-python-patterns",
+ "uv-package-manager",
+ ],
+ matchLanguages: ["python"],
+ matchFiles: ["pyproject.toml", "setup.py", "requirements.txt"],
+ },
+ // FastAPI — detected by scanning requirements.txt / pyproject.toml for the
+ // "fastapi" dependency. Uses the "dep:fastapi" synthetic marker from detection.ts.
+ {
+ label: "FastAPI",
+ description: "Production-ready FastAPI projects with async patterns and error handling",
+ repo: "wshobson/agents",
+ skills: ["fastapi-templates"],
+ matchFiles: ["dep:fastapi"],
+ },
+ // ── Go ────────────────────────────────────────────────────────────────────
+ {
+ label: "Go",
+ description: "Go language patterns and best practices",
+ repo: "anthropics/skills",
+ skills: ["go-best-practices"],
+ matchLanguages: ["go"],
+ matchFiles: ["go.mod"],
+ },
+ {
+ label: "Go Concurrency Patterns",
+ description: "Go concurrency with channels, worker pools, and context cancellation",
+ repo: "wshobson/agents",
+ skills: ["go-concurrency-patterns"],
+ matchLanguages: ["go"],
+ matchFiles: ["go.mod"],
+ },
+ // ── Database / ORM ─────────────────────────────────────────────────────────
+ {
+ label: "Prisma",
+ description: "Prisma ORM setup, schema design, client API, and migrations",
+ repo: "prisma/skills",
+ skills: [
+ "prisma-database-setup",
+ "prisma-client-api",
+ "prisma-cli",
+ ],
+ matchFiles: ["prisma/schema.prisma"],
+ },
+ {
+ label: "Supabase & Postgres",
+ description: "Supabase project setup, auth, Postgres best practices, and Firestore",
+ repo: "supabase/agent-skills",
+ skills: ["supabase-postgres-best-practices"],
+ matchFiles: ["supabase/config.toml"],
+ },
+ {
+ label: "PostgreSQL Design",
+ description: "PostgreSQL table design, indexing strategies, and query optimization",
+ repo: "wshobson/agents",
+ skills: ["postgresql-table-design"],
+ matchFiles: ["supabase/config.toml", "*.sql"],
+ },
+ {
+ label: "SQL Optimization & Review",
+ description: "Universal SQL performance optimization, security (injection prevention), and code review",
+ repo: "github/awesome-copilot",
+ skills: ["sql-optimization", "sql-code-review"],
+ matchFiles: [
+ "*.sql",
+ "*.sqlite",
+ "prisma/schema.prisma",
+ "supabase/config.toml",
+ "drizzle.config.ts",
+ "drizzle.config.js",
+ ],
+ },
+ {
+ label: "Redis",
+ description: "Redis development patterns and best practices",
+ repo: "redis/agent-skills",
+ skills: ["redis-development"],
+ matchFiles: ["redis.conf"],
+ },
+ // ── Cloud Platforms ────────────────────────────────────────────────────────
+ {
+ label: "Firebase",
+ description: "Firebase setup, auth, Firestore, hosting, and AI Logic",
+ repo: "firebase/agent-skills",
+ skills: [
+ "firebase-basics",
+ "firebase-auth-basics",
+ "firebase-firestore-basics",
+ "firebase-hosting-basics",
+ "firebase-ai-logic",
+ ],
+ matchFiles: ["firebase.json"],
+ },
+ {
+ label: "Azure",
+ description: "Azure deployment, AI services, storage, cost optimization, and diagnostics",
+ repo: "microsoft/github-copilot-for-azure",
+ skills: [
+ "azure-deploy",
+ "azure-ai",
+ "azure-storage",
+ "azure-cost-optimization",
+ "azure-diagnostics",
+ ],
+ matchFiles: ["azure-pipelines.yml"],
+ },
+ {
+ label: "AWS",
+ description: "AWS deployment, Lambda, and serverless patterns",
+ repo: "awslabs/agent-plugins",
+ skills: ["deploy", "aws-lambda", "aws-serverless-deployment"],
+ matchFiles: ["cdk.json", "samconfig.toml", "serverless.yml", "serverless.yaml"],
+ },
+ // ── Container / DevOps ─────────────────────────────────────────────────────
+ {
+ label: "Docker",
+ description: "Multi-stage Dockerfiles, layer optimization, and security hardening",
+ repo: "github/awesome-copilot",
+ skills: ["multi-stage-dockerfile"],
+ matchFiles: ["Dockerfile", "docker-compose.yml", "docker-compose.yaml"],
+ },
+ // ── Infrastructure as Code ─────────────────────────────────────────────────
+ {
+ label: "Terraform",
+ description: "Terraform style guide, testing, and stack patterns",
+ repo: "hashicorp/agent-skills",
+ skills: ["terraform-style-guide", "terraform-test", "terraform-stacks"],
+ matchFiles: ["main.tf"],
+ },
+ // ── Android (wshobson/agents — 7K installs) ────────────────────────────────
+ {
+ label: "Android",
+ description: "Android app design following Material Design 3 guidelines",
+ repo: "wshobson/agents",
+ skills: ["mobile-android-design"],
+ matchFiles: ["app/build.gradle", "app/build.gradle.kts"],
+ },
+ // ── Kubernetes (wshobson/agents — 4 skills) ────────────────────────────────
+ {
+ label: "Kubernetes",
+ description: "K8s manifests, Helm charts, GitOps workflows, and security policies",
+ repo: "wshobson/agents",
+ skills: [
+ "k8s-manifest-generator",
+ "helm-chart-scaffolding",
+ "gitops-workflow",
+ "k8s-security-policies",
+ ],
+ matchFiles: ["Chart.yaml", "kustomization.yaml"],
+ },
+ // ── CI/CD (wshobson/agents — 3 skills) ─────────────────────────────────────
+ {
+ label: "CI/CD Automation",
+ description: "Pipeline design, GitHub Actions workflows, and secrets management",
+ repo: "wshobson/agents",
+ skills: [
+ "deployment-pipeline-design",
+ "github-actions-templates",
+ "secrets-management",
+ ],
+ matchFiles: [".github/workflows"],
+ },
+ // ── Blockchain / Web3 (wshobson/agents — 3 skills) ─────────────────────────
+ {
+ label: "Blockchain & Web3",
+ description: "Solidity security, DeFi protocols, and smart contract testing",
+ repo: "wshobson/agents",
+ skills: ["solidity-security", "defi-protocol-templates", "web3-testing"],
+ matchFiles: ["hardhat.config.js", "hardhat.config.ts", "foundry.toml"],
+ },
+ // ── Data Engineering (wshobson/agents — 4 skills) ──────────────────────────
+ {
+ label: "Data Engineering",
+ description: "dbt transformations, Airflow DAGs, Spark optimization, and data quality",
+ repo: "wshobson/agents",
+ skills: [
+ "dbt-transformation-patterns",
+ "airflow-dag-patterns",
+ "spark-optimization",
+ "data-quality-frameworks",
+ ],
+ matchFiles: ["dbt_project.yml", "airflow.cfg"],
+ },
+ // ── Game Development — Unity (wshobson/agents) ─────────────────────────────
+ {
+ label: "Unity",
+ description: "Unity ECS patterns for high-performance game systems",
+ repo: "wshobson/agents",
+ skills: ["unity-ecs-patterns"],
+ matchFiles: ["ProjectSettings/ProjectVersion.txt"],
+ },
+ // ── Game Development — Godot (wshobson/agents) ─────────────────────────────
+ {
+ label: "Godot",
+ description: "Godot GDScript best practices and scene composition",
+ repo: "wshobson/agents",
+ skills: ["godot-gdscript-patterns"],
+ matchFiles: ["project.godot"],
+ },
+ // ── Essential (all projects) ────────────────────────────────────────────
+ {
+ label: "Skill Discovery",
+ description: "Find and install new agent skills from the ecosystem",
+ repo: "vercel-labs/skills",
+ skills: ["find-skills"],
+ matchAlways: true,
+ },
+ {
+ label: "Skill Authoring",
+ description: "Create, audit, and refine SKILL.md files",
+ repo: "anthropics/skills",
+ skills: ["skill-creator"],
+ matchAlways: true,
+ },
+ {
+ label: "Browser Automation",
+ description: "Browser automation for web scraping, testing, and interaction",
+ repo: "vercel-labs/agent-browser",
+ skills: ["agent-browser"],
+ matchAlways: true,
+ },
+ // ── General Tooling ───────────────────────────────────────────────────────
+ {
+ label: "Document Handling",
+ description: "PDF, DOCX, XLSX, PPTX creation and manipulation",
+ repo: "anthropics/skills",
+ skills: ["pdf", "docx", "xlsx", "pptx"],
+ matchAlways: true,
+ },
+ // ── Code Quality (wshobson/agents — matchAlways) ──────────────────────────
+ {
+ label: "Code Review & Quality",
+ description: "Code review excellence and error handling patterns",
+ repo: "wshobson/agents",
+ skills: ["code-review-excellence", "error-handling-patterns"],
+ matchAlways: true,
+ },
+ {
+ label: "Git Advanced Workflows",
+ description: "Advanced Git rebasing, cherry-picking, bisect, worktrees, and reflog",
+ repo: "wshobson/agents",
+ skills: ["git-advanced-workflows"],
+ matchAlways: true,
+ },
+];
+
+// ─── Greenfield Tech Stack Choices ────────────────────────────────────────────
+
+/**
+ * Tech stack → pack mappings for programmatic use.
+ *
+ * NOT shown directly to users during init (greenfield installs essentials
+ * only and defers stack-specific skills). These mappings are available for:
+ * 1. The LLM to install skills after establishing a design
+ * 2. The `/gsd skills` command (explicit user request)
+ * 3. Re-running brownfield detection after project files are created
+ */
+export const GREENFIELD_STACKS: Array<{
+ id: string;
+ label: string;
+ description: string;
+ packs: string[];
+}> = [
+ {
+ id: "ios",
+ label: "iOS App",
+ description: "Full iOS development — SwiftUI, Swift, and all iOS frameworks",
+ packs: [
+ "SwiftUI",
+ "Swift Core",
+ "iOS App Frameworks",
+ "iOS Data Frameworks",
+ "iOS AI & ML",
+ "iOS Engineering",
+ "iOS Hardware",
+ "iOS Platform",
+ ],
+ },
+ {
+ id: "swift",
+ label: "Swift (non-iOS)",
+ description: "Swift packages, server-side Swift, CLI tools, SwiftUI without iOS",
+ packs: ["SwiftUI", "Swift Core"],
+ },
+ {
+ id: "react-web",
+ label: "React Web",
+ description: "React, Next.js, shadcn/ui, web frontend",
+ packs: [
+ "React & Web Frontend",
+ "TypeScript & JS Development",
+ "React State & Patterns",
+ "Tailwind CSS",
+ "shadcn/ui",
+ "Frontend Design & UX",
+ ],
+ },
+ {
+ id: "react-native",
+ label: "React Native",
+ description: "Cross-platform mobile with React Native",
+ packs: ["React Native", "React Native Architecture", "React & Web Frontend", "TypeScript & JS Development"],
+ },
+ {
+ id: "fullstack-js",
+ label: "Full-Stack JavaScript/TypeScript",
+ description: "Node.js backend + React frontend",
+ packs: [
+ "React & Web Frontend",
+ "TypeScript & JS Development",
+ "React State & Patterns",
+ "Tailwind CSS",
+ "shadcn/ui",
+ "Frontend Design & UX",
+ "Prisma",
+ ],
+ },
+ {
+ id: "rust",
+ label: "Rust",
+ description: "Systems programming with Rust",
+ packs: ["Rust", "Rust Async Patterns"],
+ },
+ {
+ id: "python",
+ label: "Python",
+ description: "Python applications, scripts, or ML",
+ packs: ["Python", "Python Advanced"],
+ },
+ {
+ id: "go",
+ label: "Go",
+ description: "Go services and CLIs",
+ packs: ["Go", "Go Concurrency Patterns"],
+ },
+ {
+ id: "firebase",
+ label: "Firebase",
+ description: "Firebase backend — auth, Firestore, hosting, AI",
+ packs: ["Firebase"],
+ },
+ {
+ id: "aws",
+ label: "AWS",
+ description: "AWS deployment, Lambda, serverless",
+ packs: ["AWS"],
+ },
+ {
+ id: "azure",
+ label: "Azure",
+ description: "Azure deployment, AI, storage, diagnostics",
+ packs: ["Azure"],
+ },
+ {
+ id: "angular",
+ label: "Angular",
+ description: "Angular components, signals, forms, routing",
+ packs: ["Angular", "Angular Migration", "Frontend Design & UX"],
+ },
+ {
+ id: "vue",
+ label: "Vue.js / Nuxt",
+ description: "Vue.js with Pinia, Vue Router, and testing",
+ packs: ["Vue.js", "Frontend Design & UX"],
+ },
+ {
+ id: "svelte",
+ label: "Svelte / SvelteKit",
+ description: "Svelte 5 and SvelteKit patterns",
+ packs: ["Svelte", "Tailwind CSS", "Frontend Design & UX"],
+ },
+ {
+ id: "nextjs",
+ label: "Next.js",
+ description: "Next.js app router, React, and Vercel deployment",
+ packs: [
+ "Next.js",
+ "Next.js App Router Patterns",
+ "React & Web Frontend",
+ "TypeScript & JS Development",
+ "Tailwind CSS",
+ "shadcn/ui",
+ ],
+ },
+ {
+ id: "flutter",
+ label: "Flutter",
+ description: "Cross-platform Flutter/Dart development",
+ packs: ["Flutter"],
+ },
+ {
+ id: "java",
+ label: "Java / Spring Boot",
+ description: "Spring Boot APIs, JPA, and testing",
+ packs: ["Java & Spring Boot"],
+ },
+ {
+ id: "dotnet",
+ label: ".NET / C#",
+ description: "ASP.NET Core, Entity Framework, and design patterns",
+ packs: [".NET & C#", ".NET Backend Patterns"],
+ },
+ {
+ id: "php",
+ label: "PHP / Laravel",
+ description: "Laravel patterns and PHP best practices",
+ packs: ["PHP & Laravel"],
+ },
+ {
+ id: "django",
+ label: "Django",
+ description: "Django models, views, middleware, and Celery",
+ packs: ["Django", "Python", "Python Advanced"],
+ },
+ {
+ id: "fastapi",
+ label: "FastAPI",
+ description: "FastAPI web APIs with async patterns",
+ packs: ["FastAPI", "Python", "Python Advanced"],
+ },
+ {
+ id: "android",
+ label: "Android / Kotlin",
+ description: "Android app development with Material Design 3",
+ packs: ["Android"],
+ },
+ {
+ id: "kubernetes",
+ label: "Kubernetes",
+ description: "Kubernetes manifests, Helm charts, and GitOps",
+ packs: ["Kubernetes", "Docker"],
+ },
+ {
+ id: "blockchain",
+ label: "Blockchain / Web3",
+ description: "Solidity, DeFi protocols, and smart contract testing",
+ packs: ["Blockchain & Web3"],
+ },
+ {
+ id: "data-engineering",
+ label: "Data Engineering",
+ description: "dbt, Airflow, Spark, and data quality",
+ packs: ["Data Engineering", "Python", "Python Advanced"],
+ },
+ {
+ id: "unity",
+ label: "Unity",
+ description: "Unity game development with ECS patterns",
+ packs: ["Unity"],
+ },
+ {
+ id: "godot",
+ label: "Godot",
+ description: "Godot game development with GDScript",
+ packs: ["Godot"],
+ },
+ {
+ id: "other",
+ label: "Other / Skip",
+ description: "Install skills later with npx skills add",
+ packs: [],
+ },
+];
+
+// ─── Detection → Pack Matching ────────────────────────────────────────────────
+
+/**
+ * Match project signals to relevant skill packs.
+ * Returns packs in catalog order (not sorted by match type).
+ */
+export function matchPacksForProject(signals: ProjectSignals): SkillPack[] {
+ const matched = new Set<SkillPack>();
+
+ for (const pack of SKILL_CATALOG) {
+ // Language match
+ if (pack.matchLanguages && signals.primaryLanguage) {
+ if (pack.matchLanguages.includes(signals.primaryLanguage)) {
+ matched.add(pack);
+ continue;
+ }
+ }
+
+ // File match
+ if (pack.matchFiles) {
+ for (const file of pack.matchFiles) {
+ if (signals.detectedFiles.includes(file)) {
+ matched.add(pack);
+ break;
+ }
+ }
+ }
+
+ // Xcode platform match (e.g. iOS packs only when SDKROOT = iphoneos)
+ if (pack.matchXcodePlatforms && signals.xcodePlatforms.length > 0) {
+ const hasMatch = pack.matchXcodePlatforms.some((p) => signals.xcodePlatforms.includes(p));
+ if (hasMatch) matched.add(pack);
+ }
+
+ // Always-include packs (essentials)
+ if (pack.matchAlways) {
+ matched.add(pack);
+ }
+ }
+
+ return [...matched];
+}
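+
+// Example usage (hypothetical signals object, for illustration — other
+// ProjectSignals fields omitted): a project with a Prisma schema matches
+// the Prisma pack via matchFiles, plus every matchAlways pack (Skill
+// Discovery, Document Handling, ...):
+//
+// const packs = matchPacksForProject({
+// primaryLanguage: "javascript/typescript",
+// detectedFiles: ["prisma/schema.prisma"],
+// xcodePlatforms: [],
+// });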
+
+// ─── Installation ─────────────────────────────────────────────────────────────
+
+/**
+ * Install a skill pack via the skills.sh CLI.
+ * Runs: npx --yes skills add <repo> --skill <name> [--skill <name> ...] -y
+ *
+ * Returns true if installation succeeded.
+ */
+export function installSkillPack(pack: SkillPack): Promise<boolean> {
+ return new Promise<boolean>((resolve) => {
+ // --yes = npx auto-install, -y = skills.sh non-interactive
+ const args = ["--yes", "skills", "add", pack.repo];
+
+ for (const skill of pack.skills) {
+ args.push("--skill", skill);
+ }
+ args.push("-y");
+
+ execFile("npx", args, { timeout: 120_000 }, (error) => {
+ resolve(!error);
+ });
+ });
+}
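+
+// For example, a pack { repo: "anthropics/skills", skills: ["pdf", "docx"] }
+// (hypothetical values) produces the command:
+//
+// npx --yes skills add anthropics/skills --skill pdf --skill docx -y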
+
+/**
+ * Install multiple packs, batching by repo to minimize npx invocations.
+ * Returns the labels of successfully installed packs.
+ */
+export async function installPacksBatched(
+ packs: SkillPack[],
+ onProgress?: (label: string) => void,
+): Promise<string[]> {
+ // Group packs by repo
+ const byRepo = new Map<string, { skills: string[]; labels: string[] }>();
+ for (const pack of packs) {
+ const entry = byRepo.get(pack.repo) ?? { skills: [], labels: [] };
+ entry.skills.push(...pack.skills);
+ entry.labels.push(pack.label);
+ byRepo.set(pack.repo, entry);
+ }
+
+ const installed: string[] = [];
+ for (const [repo, { skills, labels }] of byRepo) {
+ onProgress?.(labels.join(", "));
+ const ok = await new Promise<boolean>((resolve) => {
+ // --yes = npx auto-install, -y = skills.sh non-interactive
+ const args = ["--yes", "skills", "add", repo];
+ for (const skill of skills) {
+ args.push("--skill", skill);
+ }
+ args.push("-y");
+ execFile("npx", args, { timeout: 120_000 }, (error) => {
+ resolve(!error);
+ });
+ });
+ if (ok) installed.push(...labels);
+ }
+ return installed;
+}
+
+/**
+ * Check whether every skill in a pack is already installed.
+ */
+export function isPackInstalled(pack: SkillPack): boolean {
+ const skillsDir = join(homedir(), ".agents", "skills");
+ if (!existsSync(skillsDir)) return false;
+
+ return pack.skills.every((name) =>
+ existsSync(join(skillsDir, name, "SKILL.md")),
+ );
+}
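+
+// Example (hypothetical pack): for skills ["pdf", "docx"] this checks
+// ~/.agents/skills/pdf/SKILL.md and ~/.agents/skills/docx/SKILL.md.
+// A partial install (only one present) reports false, so the next
+// install pass re-fetches the whole pack.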
+
+// ─── Init Wizard Integration ──────────────────────────────────────────────────
+
+/**
+ * Run skill installation step during project init.
+ *
+ * Brownfield (signals.detectedFiles.length > 0):
+ * Auto-detects tech stack → shows matched packs → installs accepted ones.
+ *
+ * Greenfield (no files detected):
+ * Installs essential packs only (find-skills, skill-creator, etc.).
+ * Stack-specific skills are deferred — once the LLM establishes a design
+ * and creates project files (package.json, firebase.json, etc.), brownfield
+ * detection will pick them up on the next `gsd init` or via auto-mode
+ * skill discovery.
+ *
+ * Returns the list of installed pack labels.
+ */
+export async function runSkillInstallStep(
+ ctx: ExtensionCommandContext,
+ signals: ProjectSignals,
+): Promise<string[]> {
+ const installed: string[] = [];
+ const isBrownfield = signals.detectedFiles.length > 0;
+
+ if (isBrownfield) {
+ // ── Brownfield: auto-detect and confirm ─────────────────────────────────
+ const matched = matchPacksForProject(signals);
+ if (matched.length === 0) return installed;
+
+ // Filter out already-installed packs
+ const toInstall = matched.filter((p) => !isPackInstalled(p));
+ if (toInstall.length === 0) return installed;
+
+ // Group for display: Swift packs vs iOS packs vs other
+ const swiftPacks = toInstall.filter((p) => p.matchLanguages?.includes("swift"));
+ const iosPacks = toInstall.filter((p) => p.matchXcodePlatforms?.includes("iphoneos"));
+ const otherPacks = toInstall.filter((p) => !swiftPacks.includes(p) && !iosPacks.includes(p));
+
+ const summaryLines: string[] = [];
+ const hasIOS = signals.xcodePlatforms.includes("iphoneos");
+ if (hasIOS) {
+ summaryLines.push(`Detected: iOS project (${signals.primaryLanguage ?? "swift"})`);
+ } else if (signals.xcodePlatforms.length > 0) {
+ summaryLines.push(`Detected: ${signals.xcodePlatforms.join(", ")} Xcode project (${signals.primaryLanguage ?? "swift"})`);
+ } else {
+ summaryLines.push(`Detected: ${signals.primaryLanguage ?? "unknown"} project`);
+ }
+ summaryLines.push("");
+ summaryLines.push("Recommended skill packs:");
+ if (swiftPacks.length > 0) {
+ summaryLines.push(` Swift: ${swiftPacks.map((p) => p.label).join(", ")}`);
+ }
+ if (iosPacks.length > 0) {
+ summaryLines.push(` iOS: ${iosPacks.map((p) => p.label).join(", ")}`);
+ }
+ for (const p of otherPacks) {
+ summaryLines.push(` • ${p.label}: ${p.description}`);
+ }
+
+ const totalSkills = toInstall.reduce((n, p) => n + p.skills.length, 0);
+ const choice = await showNextAction(ctx, {
+ title: "GSD — Install Skills",
+ summary: summaryLines,
+ actions: [
+ {
+ id: "install",
+ label: "Install recommended skills",
+ description: `Install ${totalSkills} skills from ${toInstall.length} pack${toInstall.length > 1 ? "s" : ""} via skills.sh`,
+ recommended: true,
+ },
+ {
+ id: "skip",
+ label: "Skip",
+ description: "Install skills later with npx skills add",
+ },
+ ],
+ notYetMessage: "Run /gsd init when ready.",
+ });
+
+ if (choice === "install") {
+ const labels = await installPacksBatched(toInstall, (label) => {
+ ctx.ui.notify(`Installing ${label} skills...`, "info");
+ });
+ installed.push(...labels);
+ const failed = toInstall.filter((p) => !installed.includes(p.label));
+ for (const pack of failed) {
+ ctx.ui.notify(`Failed to install ${pack.label} — try manually: npx skills add ${pack.repo}`, "info");
+ }
+ }
+ } else {
+ // ── Greenfield: install essentials only ─────────────────────────────────
+ // Don't ask the user what tech stack they're building — they may not know
+ // yet, especially non-technical users. Install essential packs (discovery,
+ // authoring, browser, docs) and let stack-specific skills auto-detect later
+ // once the LLM establishes the design and creates project files.
+ const essentials = SKILL_CATALOG.filter((p) => p.matchAlways && !isPackInstalled(p));
+ if (essentials.length === 0) return installed;
+
+ const totalSkills = essentials.reduce((n, p) => n + p.skills.length, 0);
+ const choice = await showNextAction(ctx, {
+ title: "GSD — Install Essential Skills",
+ summary: [
+ "GSD will install essential agent skills (skill discovery, authoring,",
+ "browser automation, document handling).",
+ "",
+ "Stack-specific skills (React, Swift, Python, etc.) will be recommended",
+ "automatically once your project files are in place.",
+ ],
+ actions: [
+ {
+ id: "install",
+ label: "Install essentials",
+ description: `Install ${totalSkills} essential skills via skills.sh`,
+ recommended: true,
+ },
+ {
+ id: "skip",
+ label: "Skip",
+ description: "Install skills later with npx skills add",
+ },
+ ],
+ notYetMessage: "Run /gsd init when ready.",
+ });
+
+ if (choice === "install") {
+ const labels = await installPacksBatched(essentials, (label) => {
+ ctx.ui.notify(`Installing ${label} skills...`, "info");
+ });
+ installed.push(...labels);
+ }
+ }
+
+ if (installed.length > 0) {
+ ctx.ui.notify(`Installed: ${installed.join(", ")}`, "info");
+ }
+
+ return installed;
+}
diff --git a/src/resources/extensions/gsd/skill-discovery.ts b/src/resources/extensions/gsd/skill-discovery.ts
index f623c1a21..e8c224ea4 100644
--- a/src/resources/extensions/gsd/skill-discovery.ts
+++ b/src/resources/extensions/gsd/skill-discovery.ts
@@ -10,9 +10,10 @@
import { existsSync, readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";
-import { getAgentDir } from "@gsd/pi-coding-agent";
+import { homedir } from "node:os";
-const SKILLS_DIR = join(getAgentDir(), "skills");
+/** Industry-standard skills.sh global skills directory */
+const SKILLS_DIR = join(homedir(), ".agents", "skills");
export interface DiscoveredSkill {
name: string;
diff --git a/src/resources/extensions/gsd/skill-health.ts b/src/resources/extensions/gsd/skill-health.ts
index 778bba7a3..a59f4d8aa 100644
--- a/src/resources/extensions/gsd/skill-health.ts
+++ b/src/resources/extensions/gsd/skill-health.ts
@@ -15,7 +15,7 @@
import { existsSync, readFileSync, readdirSync } from "node:fs";
import { join } from "node:path";
-import { getAgentDir } from "@gsd/pi-coding-agent";
+import { homedir } from "node:os";
import type { UnitMetrics, MetricsLedger } from "./metrics.js";
import { formatCost, formatTokenCount, loadLedgerFromDisk } from "./metrics.js";
import { getSkillLastUsed, detectStaleSkills } from "./skill-telemetry.js";
@@ -208,7 +208,7 @@ export function formatSkillDetail(basePath: string, skillName: string): string {
}
// Check for SKILL.md existence
- const skillPath = join(getAgentDir(), "skills", skillName, "SKILL.md");
+ const skillPath = join(homedir(), ".agents", "skills", skillName, "SKILL.md");
if (existsSync(skillPath)) {
const stat = require("node:fs").statSync(skillPath);
lines.push("");
diff --git a/src/resources/extensions/gsd/skill-telemetry.ts b/src/resources/extensions/gsd/skill-telemetry.ts
index ac99e4e83..f1bddfd21 100644
--- a/src/resources/extensions/gsd/skill-telemetry.ts
+++ b/src/resources/extensions/gsd/skill-telemetry.ts
@@ -13,7 +13,7 @@
import { existsSync, readdirSync, readFileSync, statSync } from "node:fs";
import { join } from "node:path";
-import { getAgentDir } from "@gsd/pi-coding-agent";
+import { homedir } from "node:os";
// ─── In-memory state ──────────────────────────────────────────────────────────
@@ -30,8 +30,14 @@ const activelyLoadedSkills = new Set();
* Called before each unit starts.
*/
export function captureAvailableSkills(): void {
- const skillsDir = join(getAgentDir(), "skills");
- availableSkills = listSkillNames(skillsDir);
+ const skillsDir = join(homedir(), ".agents", "skills");
+ const legacyDir = join(homedir(), ".gsd", "agent", "skills");
+ const names = listSkillNames(skillsDir);
+ // Include skills still in the legacy directory only if migration hasn't completed
+ const legacyMigrated = existsSync(join(legacyDir, ".migrated-to-agents"));
+ const legacyNames = legacyMigrated ? [] : listSkillNames(legacyDir);
+ const all = new Set([...names, ...legacyNames]);
+ availableSkills = [...all];
activelyLoadedSkills.clear();
}
@@ -99,8 +105,12 @@ export function detectStaleSkills(
const stale: string[] = [];
// Check all installed skills, not just those with usage data
- const skillsDir = join(getAgentDir(), "skills");
- const installed = listSkillNames(skillsDir);
+ const skillsDir = join(homedir(), ".agents", "skills");
+ const legacyDir = join(homedir(), ".gsd", "agent", "skills");
+ const legacyMigrated = existsSync(join(legacyDir, ".migrated-to-agents"));
+ const legacyNames = legacyMigrated ? [] : listSkillNames(legacyDir);
+ const installedSet = new Set([...listSkillNames(skillsDir), ...legacyNames]);
+ const installed = [...installedSet];
for (const skill of installed) {
const lastTs = lastUsed.get(skill);
diff --git a/src/resources/extensions/gsd/tests/detection.test.ts b/src/resources/extensions/gsd/tests/detection.test.ts
index 1f363b72d..b1a1647dc 100644
--- a/src/resources/extensions/gsd/tests/detection.test.ts
+++ b/src/resources/extensions/gsd/tests/detection.test.ts
@@ -350,3 +350,841 @@ test("detectProjectSignals: Makefile with test target", (t) => {
assert.ok(signals.detectedFiles.includes("Makefile"));
assert.ok(signals.verificationCommands.includes("make test"));
});
+
+test("detectProjectSignals: SQLite file detection via extensions", () => {
+ const dir = makeTempDir("signals-sqlite");
+ try {
+ writeFileSync(join(dir, "app.sqlite3"), "", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("*.sqlite"), "should add synthetic *.sqlite marker");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: SQL file detection", () => {
+ const dir = makeTempDir("signals-sql");
+ try {
+ writeFileSync(join(dir, "migrations.sql"), "", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("*.sql"), "should add synthetic *.sql marker");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: nested SQL file detection", () => {
+ const dir = makeTempDir("signals-sql-nested");
+ try {
+ mkdirSync(join(dir, "db", "migrations"), { recursive: true });
+ writeFileSync(join(dir, "db", "migrations", "001_init.sql"), "", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("*.sql"), "should detect nested SQL files");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: .db file triggers SQLite detection", () => {
+ const dir = makeTempDir("signals-db");
+ try {
+ writeFileSync(join(dir, "data.db"), "", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("*.sqlite"), "should add synthetic *.sqlite marker for .db files");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: no SQLite markers without matching files", () => {
+ const dir = makeTempDir("signals-no-sqlite");
+ try {
+ writeFileSync(join(dir, "package.json"), "{}", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(!signals.detectedFiles.includes("*.sqlite"), "should not have *.sqlite marker");
+ assert.ok(!signals.detectedFiles.includes("*.sql"), "should not have *.sql marker");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: .NET project via .csproj extension", () => {
+ const dir = makeTempDir("signals-dotnet");
+ try {
+ writeFileSync(join(dir, "MyApp.csproj"), "", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("*.csproj"), "should add synthetic *.csproj marker");
+ assert.equal(signals.primaryLanguage, "csharp");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: nested .csproj detection", () => {
+ const dir = makeTempDir("signals-dotnet-nested");
+ try {
+ mkdirSync(join(dir, "src", "App"), { recursive: true });
+ writeFileSync(join(dir, "src", "App", "App.csproj"), "", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("*.csproj"), "should detect nested .csproj files");
+ assert.equal(signals.primaryLanguage, "csharp");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: .NET project via .sln extension", () => {
+ const dir = makeTempDir("signals-sln");
+ try {
+ writeFileSync(join(dir, "MyApp.sln"), "", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("*.sln"), "should add synthetic *.sln marker for .sln files");
+ assert.equal(signals.primaryLanguage, "dotnet");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: F# project via .fsproj extension", () => {
+ const dir = makeTempDir("signals-fsharp");
+ try {
+ writeFileSync(join(dir, "MyApp.fsproj"), "", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("*.fsproj"), "should add synthetic *.fsproj marker");
+ assert.equal(signals.primaryLanguage, "fsharp");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Angular project via angular.json", () => {
+ const dir = makeTempDir("signals-angular");
+ try {
+ writeFileSync(join(dir, "angular.json"), "{}", "utf-8");
+ writeFileSync(join(dir, "package.json"), "{}", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("angular.json"));
+ assert.equal(signals.primaryLanguage, "javascript/typescript");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Next.js project via next.config.ts", () => {
+ const dir = makeTempDir("signals-nextjs");
+ try {
+ writeFileSync(join(dir, "next.config.ts"), "export default {}", "utf-8");
+ writeFileSync(join(dir, "package.json"), "{}", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("next.config.ts"));
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: nested Next.js config via packages/web/next.config.ts", () => {
+ const dir = makeTempDir("signals-nextjs-nested");
+ try {
+ mkdirSync(join(dir, "packages", "web"), { recursive: true });
+ writeFileSync(join(dir, "packages", "web", "next.config.ts"), "export default {}", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("next.config.ts"), "should detect nested Next.js config");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Flutter project via pubspec.yaml", () => {
+ const dir = makeTempDir("signals-flutter");
+ try {
+ writeFileSync(join(dir, "pubspec.yaml"), "name: my_app", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("pubspec.yaml"));
+ assert.equal(signals.primaryLanguage, "dart/flutter");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Django project via manage.py", () => {
+ const dir = makeTempDir("signals-django");
+ try {
+ writeFileSync(join(dir, "manage.py"), "#!/usr/bin/env python", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("manage.py"));
+ assert.equal(signals.primaryLanguage, "python");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: nested Django manage.py", () => {
+ const dir = makeTempDir("signals-django-nested");
+ try {
+ mkdirSync(join(dir, "services", "api"), { recursive: true });
+ writeFileSync(join(dir, "services", "api", "manage.py"), "#!/usr/bin/env python", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("manage.py"), "should detect nested manage.py");
+ assert.equal(signals.primaryLanguage, "python");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Docker project via Dockerfile", () => {
+ const dir = makeTempDir("signals-docker");
+ try {
+ writeFileSync(join(dir, "Dockerfile"), "FROM node:18", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("Dockerfile"));
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Terraform project via main.tf", () => {
+ const dir = makeTempDir("signals-terraform");
+ try {
+ writeFileSync(join(dir, "main.tf"), 'provider "aws" {}', "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("main.tf"));
+ } finally {
+ cleanup(dir);
+ }
+});
+
+// ── QA4/QA5 — new detection tests ──────────────────────────────────────────
+
+test("detectProjectSignals: Vue.js via .vue files in src/", () => {
+ const dir = makeTempDir("signals-vue");
+ try {
+ writeFileSync(join(dir, "package.json"), '{"name":"vue-app"}', "utf-8");
+ mkdirSync(join(dir, "src"), { recursive: true });
+ writeFileSync(join(dir, "src", "App.vue"), "", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("*.vue"), "should add *.vue synthetic marker");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Vue.js via nested .vue file in src/components/", () => {
+ const dir = makeTempDir("signals-vue-nested");
+ try {
+ writeFileSync(join(dir, "package.json"), '{"name":"vue-app"}', "utf-8");
+ mkdirSync(join(dir, "src", "components"), { recursive: true });
+ writeFileSync(join(dir, "src", "components", "Card.vue"), "", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("*.vue"), "should detect nested .vue files");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Vue CLI via vue.config.js", () => {
+ const dir = makeTempDir("signals-vue-cli");
+ try {
+ writeFileSync(join(dir, "package.json"), '{"name":"vue-cli-app"}', "utf-8");
+ writeFileSync(join(dir, "vue.config.js"), "module.exports = {};", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("vue.config.js"));
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: requirements.txt sets Python language", () => {
+ const dir = makeTempDir("signals-requirements");
+ try {
+ writeFileSync(join(dir, "requirements.txt"), "flask==3.0\n", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("requirements.txt"));
+ assert.equal(signals.primaryLanguage, "python");
+ assert.ok(signals.verificationCommands.includes("pytest"), "should suggest pytest for requirements.txt projects");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Android project via app/build.gradle", () => {
+ const dir = makeTempDir("signals-android");
+ try {
+ mkdirSync(join(dir, "app"), { recursive: true });
+ writeFileSync(join(dir, "app", "build.gradle"), "apply plugin: 'com.android.application'", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("app/build.gradle"));
+ assert.equal(signals.primaryLanguage, "java/kotlin");
+ assert.ok(!signals.detectedFiles.includes("build.gradle"), "should not collapse Android app/build.gradle into generic build.gradle");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: nested app/build.gradle normalizes to Android marker", () => {
+ const dir = makeTempDir("signals-android-nested");
+ try {
+ mkdirSync(join(dir, "apps", "mobile", "app"), { recursive: true });
+ writeFileSync(join(dir, "apps", "mobile", "app", "build.gradle"), "apply plugin: 'com.android.application'", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("app/build.gradle"), "should detect nested Android app/build.gradle");
+ assert.ok(!signals.detectedFiles.includes("build.gradle"), "should not emit generic build.gradle marker for nested Android modules");
+ assert.equal(signals.primaryLanguage, "java/kotlin");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Unity project via ProjectSettings/ProjectVersion.txt", () => {
+ const dir = makeTempDir("signals-unity");
+ try {
+ mkdirSync(join(dir, "ProjectSettings"), { recursive: true });
+ writeFileSync(join(dir, "ProjectSettings", "ProjectVersion.txt"), "m_EditorVersion: 2022.3", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("ProjectSettings/ProjectVersion.txt"));
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Godot project via project.godot", () => {
+ const dir = makeTempDir("signals-godot");
+ try {
+ writeFileSync(join(dir, "project.godot"), "[application]", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("project.godot"));
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Airflow via airflow.cfg", () => {
+ const dir = makeTempDir("signals-airflow");
+ try {
+ writeFileSync(join(dir, "airflow.cfg"), "[core]\ndags_folder = ./dags", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("airflow.cfg"));
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Kubernetes via Chart.yaml (Helm)", () => {
+ const dir = makeTempDir("signals-k8s");
+ try {
+ writeFileSync(join(dir, "Chart.yaml"), "apiVersion: v2\nname: my-chart", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("Chart.yaml"));
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Blockchain via hardhat.config.ts", () => {
+ const dir = makeTempDir("signals-blockchain");
+ try {
+ writeFileSync(join(dir, "hardhat.config.ts"), 'import "@nomiclabs/hardhat-ethers"', "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("hardhat.config.ts"));
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: CI/CD via .github/workflows", () => {
+ const dir = makeTempDir("signals-cicd");
+ try {
+ mkdirSync(join(dir, ".github", "workflows"), { recursive: true });
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes(".github/workflows"));
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Tailwind via tailwind.config.ts", () => {
+ const dir = makeTempDir("signals-tailwind");
+ try {
+ writeFileSync(join(dir, "package.json"), '{"name":"tw-app"}', "utf-8");
+ writeFileSync(join(dir, "tailwind.config.ts"), "export default {};", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("tailwind.config.ts"));
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: FastAPI detected via requirements.txt dependency", () => {
+ const dir = makeTempDir("signals-fastapi-req");
+ try {
+ writeFileSync(join(dir, "requirements.txt"), "fastapi==0.115.0\nuvicorn[standard]\n", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("dep:fastapi"), "should add dep:fastapi marker");
+ assert.equal(signals.primaryLanguage, "python");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: FastAPI detected via pyproject.toml dependency", () => {
+ const dir = makeTempDir("signals-fastapi-pyproject");
+ try {
+ writeFileSync(join(dir, "pyproject.toml"), '[project]\ndependencies = ["fastapi>=0.100"]\n', "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("dep:fastapi"), "should add dep:fastapi marker");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: FastAPI detected with PEP 508 ~= operator", () => {
+ const dir = makeTempDir("signals-fastapi-compatible-release");
+ try {
+ writeFileSync(join(dir, "requirements.txt"), "fastapi~=0.115\n", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("dep:fastapi"), "~= should count as a FastAPI dependency");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: pyproject metadata mention does not trigger dep:fastapi", () => {
+ const dir = makeTempDir("signals-fastapi-pyproject-metadata");
+ try {
+ writeFileSync(
+ join(dir, "pyproject.toml"),
+ '[project]\nname = "example"\nkeywords = ["fastapi"]\ndependencies = ["flask>=3.0"]\n',
+ "utf-8",
+ );
+ const signals = detectProjectSignals(dir);
+ assert.ok(!signals.detectedFiles.includes("dep:fastapi"), "metadata-only mentions should not trigger FastAPI detection");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: pyproject dependency table extras do not trigger dep:fastapi", () => {
+ const dir = makeTempDir("signals-fastapi-pyproject-table-extra");
+ try {
+ writeFileSync(
+ join(dir, "pyproject.toml"),
+ '[tool.poetry.dependencies]\npython = "^3.12"\nmy-sdk = { version = "^1.0", extras = ["fastapi"] }\n',
+ "utf-8",
+ );
+ const signals = detectProjectSignals(dir);
+ assert.ok(!signals.detectedFiles.includes("dep:fastapi"), "dependency table extras should not imply FastAPI framework usage");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Poetry group FastAPI dependency does not imply app framework usage", () => {
+ const dir = makeTempDir("signals-fastapi-poetry-group");
+ try {
+ writeFileSync(
+ join(dir, "pyproject.toml"),
+ '[tool.poetry.dependencies]\npython = "^3.12"\nflask = "^3.0"\n\n[tool.poetry.group.dev.dependencies]\nfastapi = "^0.115"\n',
+ "utf-8",
+ );
+ const signals = detectProjectSignals(dir);
+ assert.ok(!signals.detectedFiles.includes("dep:fastapi"), "Poetry dev-group dependencies should not imply FastAPI app usage");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: pyproject optional-dependency group name does not trigger dep:fastapi", () => {
+ const dir = makeTempDir("signals-fastapi-pyproject-extra-name");
+ try {
+ writeFileSync(
+ join(dir, "pyproject.toml"),
+ '[project]\ndependencies = ["flask>=3.0"]\n\n[project.optional-dependencies]\nfastapi = ["orjson>=3"]\n',
+ "utf-8",
+ );
+ const signals = detectProjectSignals(dir);
+ assert.ok(!signals.detectedFiles.includes("dep:fastapi"), "optional-dependency extra names should not trigger FastAPI detection");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: pyproject multiline optional dependency emits dep:fastapi", () => {
+ const dir = makeTempDir("signals-fastapi-pyproject-optional-multiline");
+ try {
+ writeFileSync(
+ join(dir, "pyproject.toml"),
+ '[project]\ndependencies = ["flask>=3.0"]\n\n[project.optional-dependencies]\napi = [\n "fastapi>=0.115",\n "uvicorn>=0.30",\n]\n',
+ "utf-8",
+ );
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("dep:fastapi"), "multiline optional dependency arrays should trigger FastAPI detection");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: FastAPI direct reference with @ emits dep:fastapi", () => {
+ const dir = makeTempDir("signals-fastapi-direct-reference");
+ try {
+ writeFileSync(join(dir, "requirements.txt"), "fastapi @ https://example.com/fastapi.whl\n", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("dep:fastapi"), "direct-reference dependencies should trigger FastAPI detection");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: FastAPI detected via requirements.in", () => {
+ const dir = makeTempDir("signals-fastapi-requirements-in");
+ try {
+ writeFileSync(join(dir, "requirements.in"), "fastapi>=0.115\n", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("dep:fastapi"), "requirements.in should trigger FastAPI detection");
+ assert.ok(signals.detectedFiles.includes("requirements.txt"), "requirements.in should normalize to requirements.txt marker");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: FastAPI detected via nested requirements/base.in", () => {
+ const dir = makeTempDir("signals-fastapi-requirements-dir-in");
+ try {
+ mkdirSync(join(dir, "requirements"), { recursive: true });
+ writeFileSync(join(dir, "requirements", "base.in"), "fastapi>=0.115\n", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("dep:fastapi"), "requirements/base.in should trigger FastAPI detection");
+ assert.ok(signals.detectedFiles.includes("requirements.txt"), "requirements/base.in should normalize to requirements.txt marker");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: FastAPI comments do not trigger dep:fastapi", () => {
+ const dir = makeTempDir("signals-fastapi-comment");
+ try {
+ writeFileSync(join(dir, "requirements.txt"), "# maybe evaluate fastapi later\nflask==3.0\n", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(!signals.detectedFiles.includes("dep:fastapi"), "comments should not trigger FastAPI detection");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: FastAPI inline comments do not trigger dep:fastapi", () => {
+ const dir = makeTempDir("signals-fastapi-inline-comment");
+ try {
+ writeFileSync(join(dir, "requirements.txt"), "flask==3.0 # maybe fastapi later\n", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(!signals.detectedFiles.includes("dep:fastapi"), "inline comments should not trigger FastAPI detection");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: fastapi-* packages do not trigger dep:fastapi without fastapi itself", () => {
+ const dir = makeTempDir("signals-fastapi-suffix-only");
+ try {
+ writeFileSync(join(dir, "requirements.txt"), "fastapi-users==13.0\n", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(!signals.detectedFiles.includes("dep:fastapi"), "fastapi-* packages alone should not imply FastAPI framework usage");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: dependency extras mentioning fastapi do not trigger dep:fastapi", () => {
+ const dir = makeTempDir("signals-fastapi-extra-only");
+ try {
+ writeFileSync(join(dir, "requirements.txt"), "my-sdk[fastapi]>=1.0\n", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(!signals.detectedFiles.includes("dep:fastapi"), "dependency extras should not imply FastAPI framework usage");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Django project does NOT get dep:fastapi marker", () => {
+ const dir = makeTempDir("signals-django-no-fastapi");
+ try {
+ writeFileSync(join(dir, "requirements.txt"), "django==5.0\ncelery\n", "utf-8");
+ writeFileSync(join(dir, "manage.py"), "#!/usr/bin/env python", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(!signals.detectedFiles.includes("dep:fastapi"), "should NOT add dep:fastapi for Django");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: FastAPI detected case-insensitively (PyPI canonical name)", () => {
+ const dir = makeTempDir("signals-fastapi-case");
+ try {
+ writeFileSync(join(dir, "pyproject.toml"), '[project]\ndependencies = ["FastAPI>=0.100"]\n', "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("dep:fastapi"), "should detect FastAPI (mixed case)");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: FastAPI detected via nested service requirements.txt", () => {
+ const dir = makeTempDir("signals-fastapi-nested");
+ try {
+ mkdirSync(join(dir, "services", "api"), { recursive: true });
+ writeFileSync(join(dir, "services", "api", "requirements.txt"), "fastapi==0.115.0\n", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("dep:fastapi"), "should detect FastAPI in nested service requirements.txt");
+ assert.ok(signals.detectedFiles.includes("requirements.txt"), "should normalize nested requirements.txt marker");
+ assert.equal(signals.primaryLanguage, "python");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: nested Prisma schema normalizes to prisma/schema.prisma", () => {
+ const dir = makeTempDir("signals-prisma-nested");
+ try {
+ mkdirSync(join(dir, "services", "api", "prisma"), { recursive: true });
+ writeFileSync(join(dir, "services", "api", "prisma", "schema.prisma"), "datasource db { provider = \"sqlite\" }", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("prisma/schema.prisma"), "should detect nested Prisma schema");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: nested Spring Boot Gradle service emits dep:spring-boot", () => {
+ const dir = makeTempDir("signals-spring-gradle-nested");
+ try {
+ mkdirSync(join(dir, "services", "api"), { recursive: true });
+ writeFileSync(
+ join(dir, "services", "api", "build.gradle"),
+ "plugins { id 'org.springframework.boot' version '3.2.0' }",
+ "utf-8",
+ );
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("dep:spring-boot"), "should detect nested Spring Boot Gradle service");
+ assert.equal(signals.primaryLanguage, "java/kotlin");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: legacy apply plugin syntax emits dep:spring-boot", () => {
+ const dir = makeTempDir("signals-spring-apply-plugin");
+ try {
+ writeFileSync(join(dir, "build.gradle"), "apply plugin: 'org.springframework.boot'", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("dep:spring-boot"), "apply plugin syntax should trigger Spring Boot detection");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: nested Spring Boot Kotlin DSL service still uses neutral java/kotlin language hint", () => {
+ const dir = makeTempDir("signals-spring-gradle-kts-nested");
+ try {
+ mkdirSync(join(dir, "services", "api"), { recursive: true });
+ writeFileSync(
+ join(dir, "services", "api", "build.gradle.kts"),
+ "plugins { id(\"org.springframework.boot\") version \"3.2.0\" }",
+ "utf-8",
+ );
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("dep:spring-boot"));
+ assert.equal(signals.primaryLanguage, "java/kotlin");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Android Gradle project does not emit dep:spring-boot", () => {
+ const dir = makeTempDir("signals-android-no-spring");
+ try {
+ writeFileSync(join(dir, "build.gradle"), "plugins { id 'com.android.application' }", "utf-8");
+ mkdirSync(join(dir, "app"), { recursive: true });
+ writeFileSync(join(dir, "app", "build.gradle"), "plugins { id 'com.android.application' }", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(!signals.detectedFiles.includes("dep:spring-boot"), "Android Gradle files should not trigger Spring Boot detection");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Android inline comments do not emit dep:spring-boot", () => {
+ const dir = makeTempDir("signals-android-inline-comment");
+ try {
+ writeFileSync(join(dir, "build.gradle"), "plugins { id 'com.android.application' } // spring-boot maybe later", "utf-8");
+ mkdirSync(join(dir, "app"), { recursive: true });
+ writeFileSync(join(dir, "app", "build.gradle"), "plugins { id 'com.android.application' }", "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(!signals.detectedFiles.includes("dep:spring-boot"), "inline comments should not trigger Spring Boot detection");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: build metadata mentioning spring-boot does not emit dep:spring-boot", () => {
+ const dir = makeTempDir("signals-spring-metadata-only");
+ try {
+ writeFileSync(join(dir, "build.gradle"), 'def notes = "spring-boot migration planned later"', "utf-8");
+ const signals = detectProjectSignals(dir);
+ assert.ok(!signals.detectedFiles.includes("dep:spring-boot"), "arbitrary metadata text should not trigger Spring Boot detection");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Maven artifactId alone does not emit dep:spring-boot", () => {
+ const dir = makeTempDir("signals-spring-maven-artifact-only");
+ try {
+    writeFileSync(
+      join(dir, "pom.xml"),
+      '<project><modelVersion>4.0.0</modelVersion><groupId>com.example</groupId><artifactId>spring-boot-tools</artifactId></project>',
+      "utf-8",
+    );
+ const signals = detectProjectSignals(dir);
+ assert.ok(!signals.detectedFiles.includes("dep:spring-boot"), "artifactId alone should not imply Spring Boot");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Spring Boot version-catalog alias emits dep:spring-boot", () => {
+ const dir = makeTempDir("signals-spring-version-catalog");
+ try {
+ mkdirSync(join(dir, "gradle"), { recursive: true });
+ writeFileSync(join(dir, "build.gradle.kts"), "plugins { alias(libs.plugins.backend.web) }", "utf-8");
+ writeFileSync(
+ join(dir, "gradle", "libs.versions.toml"),
+ "[plugins]\nbackend-web = { id = 'org.springframework.boot', version = '3.2.0' }\n",
+ "utf-8",
+ );
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("dep:spring-boot"), "should detect Spring Boot via version-catalog alias");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: commented Spring Boot alias in libs.versions.toml does not emit dep:spring-boot", () => {
+ const dir = makeTempDir("signals-spring-version-catalog-comment");
+ try {
+ mkdirSync(join(dir, "gradle"), { recursive: true });
+ writeFileSync(join(dir, "build.gradle.kts"), "plugins { alias(libs.plugins.backend.web) }", "utf-8");
+ writeFileSync(
+ join(dir, "gradle", "libs.versions.toml"),
+ "[plugins]\n# backend-web = { id = 'org.springframework.boot', version = '3.2.0' }\n",
+ "utf-8",
+ );
+ const signals = detectProjectSignals(dir);
+ assert.ok(!signals.detectedFiles.includes("dep:spring-boot"), "commented aliases should not trigger Spring Boot detection");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: unused Spring Boot alias in libs.versions.toml does not emit dep:spring-boot", () => {
+ const dir = makeTempDir("signals-spring-version-catalog-unused");
+ try {
+ mkdirSync(join(dir, "gradle"), { recursive: true });
+ writeFileSync(join(dir, "build.gradle.kts"), "plugins { alias(libs.plugins.backend.web) }", "utf-8");
+ writeFileSync(
+ join(dir, "gradle", "libs.versions.toml"),
+ "[plugins]\nother-plugin = { id = 'org.springframework.boot', version = '3.2.0' }\n",
+ "utf-8",
+ );
+ const signals = detectProjectSignals(dir);
+ assert.ok(!signals.detectedFiles.includes("dep:spring-boot"), "unused Spring Boot aliases should not trigger detection");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: spring-like alias name without Spring Boot id does not emit dep:spring-boot", () => {
+ const dir = makeTempDir("signals-spring-version-catalog-false-alias");
+ try {
+ mkdirSync(join(dir, "gradle"), { recursive: true });
+ writeFileSync(join(dir, "build.gradle.kts"), "plugins { alias(libs.plugins.spring.boot.conventions) }", "utf-8");
+ writeFileSync(
+ join(dir, "gradle", "libs.versions.toml"),
+ "[plugins]\nspring-boot-conventions = { id = 'com.example.conventions', version = '1.0.0' }\n",
+ "utf-8",
+ );
+ const signals = detectProjectSignals(dir);
+ assert.ok(!signals.detectedFiles.includes("dep:spring-boot"), "spring-looking alias names should not imply Spring Boot without matching id");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Spring Boot version-catalog library alias emits dep:spring-boot", () => {
+ const dir = makeTempDir("signals-spring-version-catalog-library");
+ try {
+ mkdirSync(join(dir, "gradle"), { recursive: true });
+ writeFileSync(join(dir, "build.gradle.kts"), "dependencies { implementation(libs.backend.web) }", "utf-8");
+ writeFileSync(
+ join(dir, "gradle", "libs.versions.toml"),
+ "[libraries]\nbackend-web = { module = 'org.springframework.boot:spring-boot-starter-web', version = '3.2.0' }\n",
+ "utf-8",
+ );
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("dep:spring-boot"), "Spring Boot library aliases should trigger detection");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Spring Boot version-catalog bundle alias emits dep:spring-boot", () => {
+ const dir = makeTempDir("signals-spring-version-catalog-bundle");
+ try {
+ mkdirSync(join(dir, "gradle"), { recursive: true });
+ writeFileSync(join(dir, "build.gradle.kts"), "dependencies { implementation(libs.bundles.backend.web) }", "utf-8");
+ writeFileSync(
+ join(dir, "gradle", "libs.versions.toml"),
+ "[libraries]\nspring-boot-starter-web = { module = 'org.springframework.boot:spring-boot-starter-web', version = '3.2.0' }\n\n[bundles]\nbackend-web = ['spring-boot-starter-web']\n",
+ "utf-8",
+ );
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("dep:spring-boot"), "Spring Boot bundle aliases should trigger detection");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Spring Boot custom version-catalog accessor emits dep:spring-boot", () => {
+ const dir = makeTempDir("signals-spring-version-catalog-custom-accessor");
+ try {
+ mkdirSync(join(dir, "gradle"), { recursive: true });
+ writeFileSync(join(dir, "build.gradle.kts"), "plugins { alias(backend.plugins.web) }", "utf-8");
+ writeFileSync(
+ join(dir, "gradle", "backend.versions.toml"),
+ "[plugins]\nweb = { id = 'org.springframework.boot', version = '3.2.0' }\n",
+ "utf-8",
+ );
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("dep:spring-boot"), "custom version-catalog accessors should trigger Spring Boot detection");
+ } finally {
+ cleanup(dir);
+ }
+});
+
+test("detectProjectSignals: Spring Boot settings-defined catalog accessor emits dep:spring-boot", () => {
+ const dir = makeTempDir("signals-spring-version-catalog-settings-accessor");
+ try {
+ mkdirSync(join(dir, "gradle"), { recursive: true });
+ writeFileSync(
+ join(dir, "settings.gradle.kts"),
+ 'dependencyResolutionManagement { versionCatalogs { create("backendLibs") { from(files("./gradle/backend.versions.toml")) } } }',
+ "utf-8",
+ );
+ writeFileSync(join(dir, "build.gradle.kts"), "plugins { alias(backendLibs.plugins.web) }", "utf-8");
+ writeFileSync(
+ join(dir, "gradle", "backend.versions.toml"),
+ "[plugins]\nweb = { id = 'org.springframework.boot', version = '3.2.0' }\n",
+ "utf-8",
+ );
+ const signals = detectProjectSignals(dir);
+ assert.ok(signals.detectedFiles.includes("dep:spring-boot"), "settings-defined catalog accessors should trigger Spring Boot detection");
+ } finally {
+ cleanup(dir);
+ }
+});
diff --git a/src/resources/extensions/gsd/tests/plan-quality-validator.test.ts b/src/resources/extensions/gsd/tests/plan-quality-validator.test.ts
new file mode 100644
index 000000000..fdbc8de0c
--- /dev/null
+++ b/src/resources/extensions/gsd/tests/plan-quality-validator.test.ts
@@ -0,0 +1,474 @@
+import { validateTaskPlanContent, validateSlicePlanContent } from '../observability-validator.ts';
+import { createTestContext } from './test-helpers.ts';
+
+const { assertEq, assertTrue, report } = createTestContext();
+
+// ═══════════════════════════════════════════════════════════════════════════
+// validateTaskPlanContent — empty/missing Steps section
+// ═══════════════════════════════════════════════════════════════════════════
+
+console.log('\n=== validateTaskPlanContent: empty Steps section ===');
+{
+ const content = `# T01: Some Task
+
+## Description
+
+Do something useful.
+
+## Steps
+
+## Verification
+
+- Run the tests and confirm output.
+`;
+
+ const issues = validateTaskPlanContent('T01-PLAN.md', content);
+ const stepsIssues = issues.filter(i => i.ruleId === 'empty_steps_section');
+ assertTrue(stepsIssues.length >= 1, 'empty Steps section produces empty_steps_section issue');
+ if (stepsIssues.length > 0) {
+ assertEq(stepsIssues[0].severity, 'warning', 'empty_steps_section severity is warning');
+ assertEq(stepsIssues[0].scope, 'task-plan', 'empty_steps_section scope is task-plan');
+ }
+}
+
+console.log('\n=== validateTaskPlanContent: missing Steps section entirely ===');
+{
+ const content = `# T01: Some Task
+
+## Description
+
+Do something useful.
+
+## Verification
+
+- Run the tests.
+`;
+
+ const issues = validateTaskPlanContent('T01-PLAN.md', content);
+ const stepsIssues = issues.filter(i => i.ruleId === 'empty_steps_section');
+ assertTrue(stepsIssues.length >= 1, 'missing Steps section produces empty_steps_section issue');
+}
+
+// ═══════════════════════════════════════════════════════════════════════════
+// validateTaskPlanContent — placeholder-only Verification
+// ═══════════════════════════════════════════════════════════════════════════
+
+console.log('\n=== validateTaskPlanContent: placeholder-only Verification ===');
+{
+ const content = `# T01: Some Task
+
+## Steps
+
+1. Do the thing.
+2. Do the other thing.
+
+## Verification
+
+- {{placeholder verification step}}
+- {{another placeholder}}
+`;
+
+ const issues = validateTaskPlanContent('T01-PLAN.md', content);
+ const verifyIssues = issues.filter(i => i.ruleId === 'placeholder_verification');
+ assertTrue(verifyIssues.length >= 1, 'placeholder-only Verification produces placeholder_verification issue');
+ if (verifyIssues.length > 0) {
+ assertEq(verifyIssues[0].severity, 'warning', 'placeholder_verification severity is warning');
+ assertEq(verifyIssues[0].scope, 'task-plan', 'placeholder_verification scope is task-plan');
+ }
+}
+
+console.log('\n=== validateTaskPlanContent: Verification with only template text ===');
+{
+ const content = `# T01: Some Task
+
+## Steps
+
+1. Do the thing.
+
+## Verification
+
+{{whatWasVerifiedAndHow — commands run, tests passed, behavior confirmed}}
+`;
+
+ const issues = validateTaskPlanContent('T01-PLAN.md', content);
+ const verifyIssues = issues.filter(i => i.ruleId === 'placeholder_verification');
+ assertTrue(verifyIssues.length >= 1, 'template-text-only Verification produces placeholder_verification issue');
+}
+
+// ═══════════════════════════════════════════════════════════════════════════
+// validateSlicePlanContent — empty inline task entries
+// ═══════════════════════════════════════════════════════════════════════════
+
+console.log('\n=== validateSlicePlanContent: empty inline task entries ===');
+{
+ const content = `# S01: Some Slice
+
+**Goal:** Build the thing.
+**Demo:** It works.
+
+## Tasks
+
+- [ ] **T01: First Task** \`est:20m\`
+
+- [ ] **T02: Second Task** \`est:15m\`
+
+## Verification
+
+- Run the tests.
+`;
+
+ const issues = validateSlicePlanContent('S01-PLAN.md', content);
+ const emptyTaskIssues = issues.filter(i => i.ruleId === 'empty_task_entry');
+ assertTrue(emptyTaskIssues.length >= 1, 'task entries with no description produce empty_task_entry issue');
+ if (emptyTaskIssues.length > 0) {
+ assertEq(emptyTaskIssues[0].severity, 'warning', 'empty_task_entry severity is warning');
+ assertEq(emptyTaskIssues[0].scope, 'slice-plan', 'empty_task_entry scope is slice-plan');
+ }
+}
+
+console.log('\n=== validateSlicePlanContent: task entries with content are fine ===');
+{
+ const content = `# S01: Some Slice
+
+**Goal:** Build the thing.
+**Demo:** It works.
+
+## Tasks
+
+- [ ] **T01: First Task** \`est:20m\`
+ - Why: Because it matters.
+ - Files: \`src/index.ts\`
+ - Do: Implement the feature.
+
+- [ ] **T02: Second Task** \`est:15m\`
+ - Why: Also important.
+ - Do: Add tests.
+
+## Verification
+
+- Run the tests.
+`;
+
+ const issues = validateSlicePlanContent('S01-PLAN.md', content);
+ const emptyTaskIssues = issues.filter(i => i.ruleId === 'empty_task_entry');
+ assertEq(emptyTaskIssues.length, 0, 'task entries with description content produce no empty_task_entry issues');
+}
+
+// ═══════════════════════════════════════════════════════════════════════════
+// validateTaskPlanContent — scope_estimate over threshold
+// ═══════════════════════════════════════════════════════════════════════════
+
+console.log('\n=== validateTaskPlanContent: scope_estimate over threshold ===');
+{
+ const content = `---
+estimated_steps: 12
+estimated_files: 15
+---
+
+# T01: Big Task
+
+## Steps
+
+1. Step one.
+2. Step two.
+3. Step three.
+
+## Verification
+
+- Check it works.
+`;
+
+ const issues = validateTaskPlanContent('T01-PLAN.md', content);
+ const stepsOverIssues = issues.filter(i => i.ruleId === 'scope_estimate_steps_high');
+ const filesOverIssues = issues.filter(i => i.ruleId === 'scope_estimate_files_high');
+ assertTrue(stepsOverIssues.length >= 1, 'estimated_steps=12 (>=10) produces scope_estimate_steps_high issue');
+ assertTrue(filesOverIssues.length >= 1, 'estimated_files=15 (>=12) produces scope_estimate_files_high issue');
+ if (stepsOverIssues.length > 0) {
+ assertEq(stepsOverIssues[0].severity, 'warning', 'scope_estimate_steps_high severity is warning');
+ assertEq(stepsOverIssues[0].scope, 'task-plan', 'scope_estimate_steps_high scope is task-plan');
+ }
+ if (filesOverIssues.length > 0) {
+ assertEq(filesOverIssues[0].severity, 'warning', 'scope_estimate_files_high severity is warning');
+ }
+}
+
+// ═══════════════════════════════════════════════════════════════════════════
+// validateTaskPlanContent — scope_estimate within limits
+// ═══════════════════════════════════════════════════════════════════════════
+
+console.log('\n=== validateTaskPlanContent: scope_estimate within limits ===');
+{
+ const content = `---
+estimated_steps: 4
+estimated_files: 6
+---
+
+# T01: Small Task
+
+## Steps
+
+1. Do the thing.
+
+## Verification
+
+- Verify it works.
+`;
+
+ const issues = validateTaskPlanContent('T01-PLAN.md', content);
+ const scopeIssues = issues.filter(i =>
+ i.ruleId === 'scope_estimate_steps_high' || i.ruleId === 'scope_estimate_files_high'
+ );
+ assertEq(scopeIssues.length, 0, 'scope_estimate within limits produces no scope issues');
+}
+
+// ═══════════════════════════════════════════════════════════════════════════
+// validateTaskPlanContent — missing scope_estimate (no warning)
+// ═══════════════════════════════════════════════════════════════════════════
+
+console.log('\n=== validateTaskPlanContent: missing scope_estimate ===');
+{
+ const content = `# T01: No Frontmatter Task
+
+## Steps
+
+1. Do the thing.
+
+## Verification
+
+- Verify it works.
+`;
+
+ const issues = validateTaskPlanContent('T01-PLAN.md', content);
+ const scopeIssues = issues.filter(i =>
+ i.ruleId === 'scope_estimate_steps_high' || i.ruleId === 'scope_estimate_files_high'
+ );
+ assertEq(scopeIssues.length, 0, 'missing scope_estimate produces no scope issues');
+}
+
+console.log('\n=== validateTaskPlanContent: frontmatter without scope keys ===');
+{
+ const content = `---
+id: T01
+parent: S01
+---
+
+# T01: Task With Other Frontmatter
+
+## Steps
+
+1. Do the thing.
+
+## Verification
+
+- Verify it works.
+`;
+
+ const issues = validateTaskPlanContent('T01-PLAN.md', content);
+ const scopeIssues = issues.filter(i =>
+ i.ruleId === 'scope_estimate_steps_high' || i.ruleId === 'scope_estimate_files_high'
+ );
+ assertEq(scopeIssues.length, 0, 'frontmatter without scope keys produces no scope issues');
+}
+
+// ═══════════════════════════════════════════════════════════════════════════
+// Clean plans — no false positives
+// ═══════════════════════════════════════════════════════════════════════════
+
+console.log('\n=== Clean task plan: no plan-quality issues ===');
+{
+ const content = `---
+estimated_steps: 5
+estimated_files: 3
+---
+
+# T01: Well-Formed Task
+
+## Description
+
+A real task with real content.
+
+## Steps
+
+1. Read the input files.
+2. Parse the configuration.
+3. Transform the data.
+4. Write the output.
+5. Verify the results.
+
+## Must-Haves
+
+- [ ] Output file is valid JSON
+- [ ] All input records are processed
+
+## Verification
+
+- Run \`node --test tests/transform.test.ts\` — all assertions pass
+- Manually inspect output.json for correct structure
+
+## Observability Impact
+
+- Signals added/changed: structured error log on parse failure
+- How a future agent inspects this: check stderr for JSON parse errors
+- Failure state exposed: exit code 1 + error message on invalid input
+`;
+
+ const issues = validateTaskPlanContent('T01-PLAN.md', content);
+ const planQualityIssues = issues.filter(i =>
+ i.ruleId === 'empty_steps_section' ||
+ i.ruleId === 'placeholder_verification' ||
+ i.ruleId === 'scope_estimate_steps_high' ||
+ i.ruleId === 'scope_estimate_files_high'
+ );
+ assertEq(planQualityIssues.length, 0, 'clean task plan produces no plan-quality issues');
+}
+
+console.log('\n=== Clean slice plan: no plan-quality issues ===');
+{
+ const content = `# S01: Well-Formed Slice
+
+**Goal:** Build a complete feature.
+**Demo:** Run the test suite and see all green.
+
+## Tasks
+
+- [ ] **T01: Create tests** \`est:20m\`
+ - Why: Tests define the contract before implementation.
+ - Files: \`tests/feature.test.ts\`
+ - Do: Write comprehensive test assertions.
+ - Verify: Test file runs without syntax errors.
+
+- [ ] **T02: Implement feature** \`est:30m\`
+ - Why: Core implementation.
+ - Files: \`src/feature.ts\`
+ - Do: Build the feature to make tests pass.
+ - Verify: All tests pass.
+
+## Verification
+
+- \`node --test tests/feature.test.ts\` — all assertions pass
+- Check error output for diagnostic messages
+
+## Observability / Diagnostics
+
+- Runtime signals: structured error objects with error codes
+- Inspection surfaces: test output shows pass/fail counts
+- Failure visibility: exit code 1 on failure with descriptive message
+- Redaction constraints: none
+`;
+
+ const issues = validateSlicePlanContent('S01-PLAN.md', content);
+ const planQualityIssues = issues.filter(i => i.ruleId === 'empty_task_entry');
+ assertEq(planQualityIssues.length, 0, 'clean slice plan produces no empty_task_entry issues');
+}
+
+// ═══════════════════════════════════════════════════════════════════════════
+// validateTaskPlanContent — missing output file paths
+// ═══════════════════════════════════════════════════════════════════════════
+
+console.log('\n=== validateTaskPlanContent: missing output file paths ===');
+{
+ const content = `# T01: Some Task
+
+## Description
+
+Do something.
+
+## Steps
+
+1. Do the thing
+
+## Verification
+
+- Check it works
+
+## Expected Output
+
+This task produces the main output.
+`;
+
+ const issues = validateTaskPlanContent('T01-PLAN.md', content);
+ const outputIssues = issues.filter(i => i.ruleId === 'missing_output_file_paths');
+ assertTrue(outputIssues.length >= 1, 'Expected Output without file paths triggers missing_output_file_paths');
+}
+
+console.log('\n=== validateTaskPlanContent: valid output file paths ===');
+{
+ const content = `# T01: Some Task
+
+## Description
+
+Do something.
+
+## Steps
+
+1. Do the thing
+
+## Verification
+
+- Check it works
+
+## Expected Output
+
+- \`src/types.ts\` — New type definitions
+`;
+
+ const issues = validateTaskPlanContent('T01-PLAN.md', content);
+ const outputIssues = issues.filter(i => i.ruleId === 'missing_output_file_paths');
+ assertEq(outputIssues.length, 0, 'Expected Output with file paths does not trigger warning');
+}
+
+console.log('\n=== validateTaskPlanContent: missing input file paths (info severity) ===');
+{
+ const content = `# T01: Some Task
+
+## Description
+
+Do something.
+
+## Steps
+
+1. Do the thing
+
+## Verification
+
+- Check it works
+
+## Inputs
+
+Prior task summary insights about the architecture.
+
+## Expected Output
+
+- \`src/output.ts\` — Output file
+`;
+
+ const issues = validateTaskPlanContent('T01-PLAN.md', content);
+ const inputIssues = issues.filter(i => i.ruleId === 'missing_input_file_paths');
+ assertTrue(inputIssues.length >= 1, 'Inputs without file paths triggers missing_input_file_paths');
+ if (inputIssues.length > 0) {
+ assertEq(inputIssues[0].severity, 'info', 'missing_input_file_paths is info severity (not warning)');
+ }
+}
+
+console.log('\n=== validateTaskPlanContent: no Expected Output section at all ===');
+{
+ const content = `# T01: Some Task
+
+## Description
+
+Do something.
+
+## Steps
+
+1. Do the thing
+
+## Verification
+
+- Check it works
+`;
+
+ const issues = validateTaskPlanContent('T01-PLAN.md', content);
+ const outputIssues = issues.filter(i => i.ruleId === 'missing_output_file_paths');
+ assertTrue(outputIssues.length >= 1, 'Missing Expected Output section triggers missing_output_file_paths');
+}
+
+report();
diff --git a/src/resources/extensions/gsd/tests/skill-catalog.test.ts b/src/resources/extensions/gsd/tests/skill-catalog.test.ts
new file mode 100644
index 000000000..4f7e3375e
--- /dev/null
+++ b/src/resources/extensions/gsd/tests/skill-catalog.test.ts
@@ -0,0 +1,193 @@
+/**
+ * Unit tests for GSD Skill Catalog — pack matching logic.
+ *
+ * Exercises matchPacksForProject() to verify that project signals
+ * correctly map to skill packs.
+ */
+
+import test from "node:test";
+import assert from "node:assert/strict";
+import { PROJECT_FILES } from "../detection.ts";
+import { GREENFIELD_STACKS, SKILL_CATALOG, matchPacksForProject } from "../skill-catalog.ts";
+import type { ProjectSignals } from "../detection.ts";
+
+function makeSignals(overrides: Partial<ProjectSignals> = {}): ProjectSignals {
+ return {
+ detectedFiles: [],
+ isGitRepo: false,
+ isMonorepo: false,
+ xcodePlatforms: [],
+ hasCI: false,
+ hasTests: false,
+ verificationCommands: [],
+ ...overrides,
+ };
+}
+
+function packLabels(signals: ProjectSignals): string[] {
+ return matchPacksForProject(signals).map((p) => p.label);
+}
+
+// ── matchAlways packs are always included ────────────────────────────────────
+
+test("matchPacksForProject: always includes matchAlways packs", () => {
+ const labels = packLabels(makeSignals());
+ assert.ok(labels.includes("Skill Discovery"), "should include Skill Discovery");
+ assert.ok(labels.includes("Skill Authoring"), "should include Skill Authoring");
+ assert.ok(labels.includes("Browser Automation"), "should include Browser Automation");
+ assert.ok(labels.includes("Document Handling"), "should include Document Handling");
+ assert.ok(labels.includes("Code Review & Quality"), "should include Code Review & Quality");
+ assert.ok(labels.includes("Git Advanced Workflows"), "should include Git Advanced Workflows");
+});
+
+// ── Language matching ────────────────────────────────────────────────────────
+
+test("matchPacksForProject: Python language matches Python packs", () => {
+ const labels = packLabels(makeSignals({ primaryLanguage: "python", detectedFiles: ["pyproject.toml"] }));
+ assert.ok(labels.includes("Python"), "should include Python");
+ assert.ok(labels.includes("Python Advanced"), "should include Python Advanced");
+});
+
+test("matchPacksForProject: Rust language matches Rust packs", () => {
+ const labels = packLabels(makeSignals({ primaryLanguage: "rust", detectedFiles: ["Cargo.toml"] }));
+ assert.ok(labels.includes("Rust"), "should include Rust");
+ assert.ok(labels.includes("Rust Async Patterns"), "should include Rust Async Patterns");
+});
+
+test("matchPacksForProject: Go language matches Go packs", () => {
+ const labels = packLabels(makeSignals({ primaryLanguage: "go", detectedFiles: ["go.mod"] }));
+ assert.ok(labels.includes("Go"), "should include Go");
+ assert.ok(labels.includes("Go Concurrency Patterns"), "should include Go Concurrency Patterns");
+});
+
+test("matchPacksForProject: JS/TS matches web frontend packs", () => {
+ const labels = packLabels(makeSignals({ primaryLanguage: "javascript/typescript", detectedFiles: ["package.json"] }));
+ assert.ok(labels.includes("React & Web Frontend"), "should include React");
+ assert.ok(labels.includes("TypeScript & JS Development"), "should include TS/JS Dev");
+ assert.ok(labels.includes("React State & Patterns"), "should include React State");
+ assert.ok(labels.includes("shadcn/ui"), "should include shadcn");
+ assert.ok(labels.includes("Frontend Design & UX"), "should include Frontend Design");
+});
+
+// ── File matching ────────────────────────────────────────────────────────────
+
+test("matchPacksForProject: angular.json triggers Angular packs", () => {
+ const labels = packLabels(makeSignals({ detectedFiles: ["angular.json"] }));
+ assert.ok(labels.includes("Angular"), "should include Angular");
+ assert.ok(labels.includes("Angular Migration"), "should include Angular Migration");
+});
+
+test("matchPacksForProject: next.config.ts triggers Next.js packs", () => {
+ const labels = packLabels(makeSignals({ detectedFiles: ["next.config.ts"] }));
+ assert.ok(labels.includes("Next.js"), "should include Next.js");
+ assert.ok(labels.includes("Next.js App Router Patterns"), "should include Next.js App Router");
+});
+
+test("matchPacksForProject: *.vue triggers Vue.js", () => {
+ const labels = packLabels(makeSignals({ detectedFiles: ["*.vue"] }));
+ assert.ok(labels.includes("Vue.js"), "should include Vue.js");
+});
+
+test("matchPacksForProject: Chart.yaml triggers Kubernetes", () => {
+ const labels = packLabels(makeSignals({ detectedFiles: ["Chart.yaml"] }));
+ assert.ok(labels.includes("Kubernetes"), "should include Kubernetes");
+});
+
+test("matchPacksForProject: hardhat.config.ts triggers Blockchain", () => {
+ const labels = packLabels(makeSignals({ detectedFiles: ["hardhat.config.ts"] }));
+ assert.ok(labels.includes("Blockchain & Web3"), "should include Blockchain & Web3");
+});
+
+test("matchPacksForProject: tailwind.config.ts triggers Tailwind CSS", () => {
+ const labels = packLabels(makeSignals({ detectedFiles: ["tailwind.config.ts"] }));
+ assert.ok(labels.includes("Tailwind CSS"), "should include Tailwind CSS");
+});
+
+// ── Xcode platform matching ─────────────────────────────────────────────────
+
+test("matchPacksForProject: iphoneos triggers iOS packs", () => {
+ const labels = packLabels(makeSignals({ xcodePlatforms: ["iphoneos"] }));
+ assert.ok(labels.includes("iOS App Frameworks"), "should include iOS App Frameworks");
+ assert.ok(labels.includes("iOS Data Frameworks"), "should include iOS Data Frameworks");
+ assert.ok(labels.includes("iOS AI & ML"), "should include iOS AI & ML");
+ assert.ok(labels.includes("iOS Engineering"), "should include iOS Engineering");
+ assert.ok(labels.includes("iOS Hardware"), "should include iOS Hardware");
+ assert.ok(labels.includes("iOS Platform"), "should include iOS Platform");
+});
+
+// ── Isolation checks — packs that should NOT match ──────────────────────────
+
+test("matchPacksForProject: FastAPI does not match generic Python", () => {
+ const labels = packLabels(makeSignals({ primaryLanguage: "python", detectedFiles: ["pyproject.toml"] }));
+ assert.ok(!labels.includes("FastAPI"), "FastAPI should NOT match generic Python projects");
+});
+
+test("matchPacksForProject: FastAPI matches when dep:fastapi detected", () => {
+ const labels = packLabels(makeSignals({ primaryLanguage: "python", detectedFiles: ["pyproject.toml", "dep:fastapi"] }));
+ assert.ok(labels.includes("FastAPI"), "FastAPI should match when dep:fastapi is in detectedFiles");
+});
+
+test("matchPacksForProject: Spring Boot does not match via language alone", () => {
+ // Simulate Android project: has java/kotlin language but no root pom.xml/build.gradle
+ const labels = packLabels(makeSignals({ primaryLanguage: "java/kotlin", detectedFiles: ["app/build.gradle"] }));
+ assert.ok(!labels.includes("Java & Spring Boot"), "Spring Boot should NOT match via language alone");
+});
+
+test("matchPacksForProject: Spring Boot matches only via dep:spring-boot", () => {
+ const positive = packLabels(makeSignals({ detectedFiles: ["dep:spring-boot"] }));
+ assert.ok(positive.includes("Java & Spring Boot"), "should include Spring Boot pack when dependency marker exists");
+
+ const androidLike = packLabels(makeSignals({ detectedFiles: ["build.gradle", "app/build.gradle"], primaryLanguage: "java/kotlin" }));
+ assert.ok(!androidLike.includes("Java & Spring Boot"), "generic Gradle + Android markers should not imply Spring Boot");
+});
+
+test("matchPacksForProject: Unity does not include Godot", () => {
+ const labels = packLabels(makeSignals({ detectedFiles: ["ProjectSettings/ProjectVersion.txt"] }));
+ assert.ok(labels.includes("Unity"), "should include Unity");
+ assert.ok(!labels.includes("Godot"), "should NOT include Godot");
+});
+
+test("matchPacksForProject: Godot does not include Unity", () => {
+ const labels = packLabels(makeSignals({ detectedFiles: ["project.godot"] }));
+ assert.ok(labels.includes("Godot"), "should include Godot");
+ assert.ok(!labels.includes("Unity"), "should NOT include Unity");
+});
+
+test("matchPacksForProject: .NET backend patterns match F# and solution markers", () => {
+ const fsprojLabels = packLabels(makeSignals({ detectedFiles: ["*.fsproj"], primaryLanguage: "fsharp" }));
+ assert.ok(fsprojLabels.includes(".NET Backend Patterns"), "should include generic .NET backend patterns for F# projects");
+ assert.ok(!fsprojLabels.includes(".NET & C#"), "should not include C#-specific pack for F# projects");
+
+ const slnLabels = packLabels(makeSignals({ detectedFiles: ["*.sln"], primaryLanguage: "dotnet" }));
+ assert.ok(slnLabels.includes(".NET Backend Patterns"), "should include generic .NET backend patterns for solution files");
+});
+
+test("SKILL_CATALOG: every matchFiles entry is backed by detection", () => {
+ const knownMarkers = new Set([
+ ...PROJECT_FILES,
+ "*.sqlite",
+ "*.sql",
+ "*.csproj",
+ "*.fsproj",
+ "*.sln",
+ "*.vue",
+ "dep:fastapi",
+ "dep:spring-boot",
+ ]);
+
+ for (const pack of SKILL_CATALOG) {
+ for (const marker of pack.matchFiles ?? []) {
+ assert.ok(knownMarkers.has(marker), `Unknown detection marker: ${marker} (pack: ${pack.label})`);
+ }
+ }
+});
+
+test("GREENFIELD_STACKS: every pack label resolves to SKILL_CATALOG", () => {
+ const labels = new Set(SKILL_CATALOG.map((pack) => pack.label));
+
+ for (const stack of GREENFIELD_STACKS) {
+ for (const packLabel of stack.packs) {
+ assert.ok(labels.has(packLabel), `Unknown pack label: ${packLabel} (stack: ${stack.id})`);
+ }
+ }
+});
diff --git a/src/tests/app-smoke.test.ts b/src/tests/app-smoke.test.ts
index 90d8a7953..c6a55f291 100644
--- a/src/tests/app-smoke.test.ts
+++ b/src/tests/app-smoke.test.ts
@@ -203,8 +203,7 @@ test("initResources syncs extensions, agents, and skills to target dir", async (
// Agents synced
assert.ok(existsSync(join(fakeAgentDir, "agents", "scout.md")), "scout agent synced");
- // Skills synced
- assert.ok(existsSync(join(fakeAgentDir, "skills")), "skills directory synced");
+ // Skills are NOT synced here — they use ~/.agents/skills/ via skills.sh
// Version manifest synced
const managedVersion = readManagedResourceVersion(fakeAgentDir);