fix: resolve merge conflicts with main for interrupted-session resume
- auto.ts: fix broken paused session resume (duplicate meta variable, restore shouldResumePausedSession guard, add pausedSessionFile/pausedUnitType/pausedUnitId assignments)
- auto-start.ts: remove orphaned crash block (missing imports)
- guided-flow.ts: resolve conflict markers, keep assessInterruptedSession branching
- interrupted-session.ts: remove completedUnits from isBootstrapCrashLock
- crash-recovery.test.ts: resolve conflict, update isLockProcessAlive semantics, remove completedUnits from all fixtures
- auto-recovery.test.ts: add missing imports/helpers, remove dead selfHealRuntimeRecords tests, update assertions for main's APIs (hasImplementationArtifacts returns strings, buildLoopRemediationSteps uses gsd undo-task/recover, run-uat resolves to ASSESSMENT)
- interrupted-session-auto.test.ts: remove completedUnits, restore shouldResumePausedSession source assertion
- interrupted-session-ui.test.ts: remove completedUnits, update pendingAutoStartMap assertion
.dockerignore (new file, 53 lines)
@@ -0,0 +1,53 @@
+# ── Build artifacts ──
+dist/
+build/
+coverage/
+*.tsbuildinfo
+
+# ── Dependencies ──
+node_modules/
+packages/*/node_modules/
+
+# ── Environment & secrets ──
+.env
+.env.*
+!.env.example
+.gsd/
+
+# ── IDE & OS ──
+.idea/
+.vscode/
+*.code-workspace
+.DS_Store
+Thumbs.db
+
+# ── Git ──
+.git/
+.github/
+
+# ── Development files ──
+.claude/
+.plans/
+.artifacts/
+.bg-shell/
+.bg_shell
+*.log
+*.swp
+*.swo
+*~
+tmp/
+.cache/
+
+# ── Native build artifacts ──
+native/
+target/
+
+# ── Test fixtures ──
+tests/
+
+# ── Lock files (npm is canonical via package-lock.json) ──
+pnpm-lock.yaml
+bun.lock
+
+# ── Tarballs ──
+*.tgz
.github/CODEOWNERS (new file, 36 lines)
@@ -0,0 +1,36 @@
+# CODEOWNERS
+# Defines required reviewers per path. GitHub enforces these on PRs.
+# https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-code-owners
+#
+# Format: <pattern> <@user or @org/team>
+# Last matching rule wins.
+
+# Default: maintainers review everything not explicitly matched below
+* @gsd-build/maintainers
+
+# Core agent orchestration — RFC required, senior review only
+packages/pi-agent-core/ @gsd-build/maintainers
+src/resources/extensions/gsd/ @gsd-build/maintainers
+
+# AI/LLM provider integrations
+packages/pi-ai/ @gsd-build/maintainers
+
+# Terminal UI
+packages/pi-tui/ @gsd-build/maintainers
+
+# Native bindings — platform-specific, needs careful review
+native/ @gsd-build/maintainers
+
+# CI/CD and release pipeline — high blast radius
+.github/ @gsd-build/maintainers
+scripts/ @gsd-build/maintainers
+Dockerfile @gsd-build/maintainers
+
+# Security-sensitive files — always require maintainer sign-off
+.secretscanignore @gsd-build/maintainers
+scripts/secret-scan.sh @gsd-build/maintainers
+scripts/install-hooks.sh @gsd-build/maintainers
+
+# Contributor-facing docs — keep accurate, maintainers approve
+CONTRIBUTING.md @gsd-build/maintainers
+VISION.md @gsd-build/maintainers
.github/workflows/ai-triage.yml (66 lines changed)
@@ -12,9 +12,9 @@ permissions:

 jobs:
   triage:
-    runs-on: ubuntu-latest
+    runs-on: blacksmith-4vcpu-ubuntu-2404
     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@v6
        with:
          sparse-checkout: |
            VISION.md
@@ -96,41 +96,47 @@ jobs:
             Be generous in your assessment — only flag clear violations. Ambiguous cases should be marked as aligned.
             Do NOT flag issues/PRs that are legitimately reporting bugs or requesting features, even if they could be better written.`;

-            const response = await fetch('https://api.anthropic.com/v1/messages', {
-              method: 'POST',
-              headers: {
-                'x-api-key': process.env.ANTHROPIC_API_KEY,
-                'content-type': 'application/json',
-                'anthropic-version': '2023-06-01'
-              },
-              body: JSON.stringify({
-                model: 'claude-haiku-4-5-20251001',
-                max_tokens: 1024,
-                messages: [{ role: 'user', content: prompt }]
-              })
-            });
-
-            if (!response.ok) {
-              const err = await response.text();
-              core.setFailed(`Anthropic API error: ${response.status} ${err}`);
-              return;
-            }
-
-            const data = await response.json();
-            const text = data.content[0].text;
-
-            // Extract JSON from response (handle markdown code blocks)
-            const jsonMatch = text.match(/\{[\s\S]*\}/);
-            if (!jsonMatch) {
-              core.setFailed(`Could not parse Claude response: ${text}`);
+            if (!process.env.ANTHROPIC_API_KEY) {
+              core.warning('Skipping AI triage because ANTHROPIC_API_KEY is not configured.');
               return;
             }

             let result;
             try {
+              const response = await fetch('https://api.anthropic.com/v1/messages', {
+                method: 'POST',
+                headers: {
+                  'x-api-key': process.env.ANTHROPIC_API_KEY,
+                  'content-type': 'application/json',
+                  'anthropic-version': '2023-06-01'
+                },
+                body: JSON.stringify({
+                  model: 'claude-haiku-4-5-20251001',
+                  max_tokens: 1024,
+                  messages: [{ role: 'user', content: prompt }]
+                }),
+                signal: AbortSignal.timeout(20000)
+              });
+
+              if (!response.ok) {
+                const err = await response.text();
+                core.warning(`Skipping AI triage after Anthropic API error: ${response.status} ${err}`);
+                return;
+              }
+
+              const data = await response.json();
+              const text = data.content?.[0]?.text ?? '';
+
+              // Extract JSON from response (handle markdown code blocks)
+              const jsonMatch = text.match(/\{[\s\S]*\}/);
+              if (!jsonMatch) {
+                core.warning(`Skipping AI triage because the model response was not parseable JSON: ${text}`);
+                return;
+              }
+
               result = JSON.parse(jsonMatch[0]);
             } catch (e) {
-              core.setFailed(`JSON parse error: ${e.message}\nRaw text: ${text}`);
+              core.warning(`Skipping AI triage after unexpected failure: ${e.message}`);
               return;
             }
             core.info(`Triage result: ${JSON.stringify(result, null, 2)}`);
.github/workflows/build-native.yml (7 lines changed)
@@ -46,8 +46,9 @@ jobs:

       - name: Install Rust toolchain
         uses: dtolnay/rust-toolchain@stable
-        with:
-          targets: ${{ matrix.target }}
+
+      - name: Add Rust compilation target
+        run: rustup target add ${{ matrix.target }}

       - name: Cache Rust build artifacts
         uses: Swatinem/rust-cache@v2
@@ -97,7 +98,7 @@ jobs:
   publish:
     needs: build
     if: startsWith(github.ref, 'refs/tags/v') || github.event.inputs.publish == 'true'
-    runs-on: ubuntu-latest
+    runs-on: blacksmith-4vcpu-ubuntu-2404
     name: Publish platform packages

     steps:
.github/workflows/ci.yml (101 lines changed)
@@ -1,3 +1,4 @@
+# CI workflow — builds, tests, and gates merges to main
 name: CI

 on:
@@ -24,7 +25,8 @@ concurrency:

 jobs:
   detect-changes:
-    runs-on: ubuntu-latest
+    timeout-minutes: 2
+    runs-on: blacksmith-4vcpu-ubuntu-2404
     outputs:
       docs-only: ${{ steps.check.outputs.docs-only }}
     steps:
@@ -59,7 +61,8 @@ jobs:
           fi

   docs-check:
-    runs-on: ubuntu-latest
+    timeout-minutes: 5
+    runs-on: blacksmith-4vcpu-ubuntu-2404
     needs: detect-changes
     steps:
       - uses: actions/checkout@v6
@@ -70,8 +73,9 @@ jobs:
         run: bash scripts/docs-prompt-injection-scan.sh --diff origin/main

   lint:
+    timeout-minutes: 5
     needs: detect-changes
-    runs-on: ubuntu-latest
+    runs-on: blacksmith-4vcpu-ubuntu-2404
     steps:
       - uses: actions/checkout@v6
         with:
@@ -80,6 +84,9 @@ jobs:
       - name: Scan for hardcoded secrets
         run: bash scripts/secret-scan.sh --diff origin/main

+      - name: Scan for base64-encoded secrets
+        run: bash scripts/base64-scan.sh --diff origin/main
+
       - name: Ensure .gsd/ is not checked in
         run: |
           if [ -d ".gsd" ]; then
@@ -95,10 +102,17 @@ jobs:
       - name: Validate skill references
        run: node scripts/check-skill-references.mjs

+      - name: Require tests with source changes
+        if: github.event_name == 'pull_request'
+        env:
+          PR_BASE_SHA: ${{ github.event.pull_request.base.sha }}
+        run: bash scripts/require-tests.sh
+
   build:
+    timeout-minutes: 15
     needs: detect-changes
     if: needs.detect-changes.outputs.docs-only != 'true'
-    runs-on: ubuntu-latest
+    runs-on: blacksmith-4vcpu-ubuntu-2404

     steps:
       - name: Checkout repository
@@ -131,15 +145,21 @@ jobs:
       - name: Run unit tests
         run: npm run test:unit

       - name: Run package tests
         run: npm run test:packages

+      - name: Run integration tests
+        run: npm run test:integration
+
+      - name: Check test coverage thresholds
+        run: npm run test:coverage
+
   windows-portability:
+    timeout-minutes: 15
     needs: detect-changes
     if: >-
-      needs.detect-changes.outputs.docs-only != 'true' &&
-      github.event_name == 'push' && github.ref == 'refs/heads/main'
-    runs-on: windows-latest
+      needs.detect-changes.outputs.docs-only != 'true'
+    runs-on: blacksmith-4vcpu-windows-2025

     steps:
       - name: Checkout repository
@@ -162,3 +182,70 @@ jobs:

       - name: Run unit tests
         run: npm run test:unit

+      - name: Run package tests
+        run: npm run test:packages
+
+  rtk-portability:
+    timeout-minutes: 20
+    needs: detect-changes
+    if: needs.detect-changes.outputs.docs-only != 'true'
+    strategy:
+      fail-fast: false
+      matrix:
+        include:
+          - label: linux
+            os: blacksmith-4vcpu-ubuntu-2404
+          - label: windows
+            os: blacksmith-4vcpu-windows-2025
+          - label: macos
+            os: macos-15
+    runs-on: ${{ matrix.os }}
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v6
+
+      - name: Setup Node.js
+        uses: actions/setup-node@v6
+        with:
+          node-version: '24'
+          cache: 'npm'
+
+      - name: Install dependencies
+        env:
+          PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD: '1'
+        run: npm ci
+
+      - name: Validate managed RTK install
+        run: >-
+          node --experimental-strip-types --input-type=module -e
+          "const mod = await import('./src/rtk.ts');
+          const path = mod.getManagedRtkPath(process.platform);
+          if (!mod.validateRtkBinary(path)) {
+            console.error('Managed RTK validation failed:', path);
+            process.exit(1);
+          }
+          console.log('Managed RTK validated at', path);"
+
+      - name: Run RTK-focused portability tests
+        run: >-
+          node --import ./src/resources/extensions/gsd/tests/resolve-ts.mjs
+          --experimental-strip-types --experimental-test-isolation=process --test
+          src/tests/rtk.test.ts
+          src/tests/rtk-execution-seams.test.ts
+          src/tests/postinstall.test.ts
+          src/tests/app-smoke.test.ts
+          src/resources/extensions/gsd/tests/custom-verification.test.ts
+          src/resources/extensions/gsd/tests/verification-gate.test.ts
+
+      - name: Generate RTK benchmark evidence
+        if: matrix.label == 'linux'
+        run: node scripts/rtk-benchmark.mjs --output .artifacts/rtk-benchmark.md
+
+      - name: Upload RTK benchmark artifact
+        if: matrix.label == 'linux'
+        uses: actions/upload-artifact@v4
+        with:
+          name: rtk-benchmark-linux
+          path: .artifacts/rtk-benchmark.md
.github/workflows/cleanup-dev-versions.yml (2 lines changed)
@@ -11,7 +11,7 @@ permissions:
 jobs:
   cleanup:
     name: Remove stale -dev versions
-    runs-on: ubuntu-latest
+    runs-on: blacksmith-4vcpu-ubuntu-2404
     steps:
       - uses: actions/setup-node@v6
         with:
.github/workflows/pipeline.yml (22 lines changed)
@@ -7,7 +7,7 @@ on:
     branches: [main]

 concurrency:
-  group: pipeline-${{ github.sha }}
+  group: pipeline-main
   cancel-in-progress: false

 permissions:
@@ -18,7 +18,7 @@ jobs:
   dev-publish:
     name: Dev Publish
     if: ${{ github.event.workflow_run.conclusion == 'success' }}
-    runs-on: ubuntu-latest
+    runs-on: blacksmith-4vcpu-ubuntu-2404
     container:
       image: ghcr.io/gsd-build/gsd-ci-builder:latest
       credentials:
@@ -71,7 +71,7 @@ jobs:
   test-verify:
     name: Test & Verify
     needs: dev-publish
-    runs-on: ubuntu-latest
+    runs-on: blacksmith-4vcpu-ubuntu-2404
     steps:
       - uses: actions/checkout@v6

@@ -81,8 +81,15 @@ jobs:
           registry-url: https://registry.npmjs.org
           cache: 'npm'

-      - name: Install gsd-pi@dev globally
-        run: npm install -g gsd-pi@dev
+      - name: Install gsd-pi@dev globally (with registry propagation retry)
+        run: |
+          for i in 1 2 3 4 5 6; do
+            npm install -g gsd-pi@dev && exit 0
+            echo "Attempt $i failed — waiting 10s for npm registry propagation..."
+            sleep 10
+          done
+          echo "Failed to install gsd-pi@dev after 6 attempts"
+          exit 1

       - name: Run smoke tests (against installed binary)
         run: |
@@ -129,7 +136,7 @@ jobs:
   prod-release:
     name: Production Release
     needs: [dev-publish, test-verify]
-    runs-on: ubuntu-latest
+    runs-on: blacksmith-4vcpu-ubuntu-2404
     environment: prod
     steps:
       - uses: actions/checkout@v6
@@ -180,6 +187,7 @@ jobs:
           git add package.json package-lock.json CHANGELOG.md native/npm/*/package.json pkg/package.json packages/pi-coding-agent/package.json
           git commit -m "release: v${RELEASE_VERSION}"
           git tag "v${RELEASE_VERSION}"
+          git pull --rebase origin main
           git push origin main
           git push origin "v${RELEASE_VERSION}"

@@ -240,7 +248,7 @@ jobs:
   update-builder:
     name: Update CI Builder Image
     if: ${{ github.event.workflow_run.conclusion == 'success' }}
-    runs-on: ubuntu-latest
+    runs-on: blacksmith-4vcpu-ubuntu-2404
     steps:
       - uses: actions/checkout@v6
        with:
.github/workflows/pr-risk.yml (16 lines changed)
@@ -14,19 +14,19 @@ permissions:
 jobs:
   risk-check:
     name: Classify changed files and assess risk
-    runs-on: ubuntu-latest
+    runs-on: blacksmith-4vcpu-ubuntu-2404

     steps:
       # Checkout the BASE branch — our trusted script and map, not fork code.
       - name: Checkout base
-        uses: actions/checkout@v4
+        uses: actions/checkout@v6
         with:
           ref: ${{ github.base_ref }}

       - name: Setup Node.js
-        uses: actions/setup-node@v4
+        uses: actions/setup-node@v6
         with:
-          node-version: '20'
+          node-version: '24'

       # Use the GitHub API to get changed files — no fork code is executed.
       - name: Get changed files
@@ -44,14 +44,14 @@ jobs:
         id: risk
         run: |
           REPORT=$(cat /tmp/changed-files.txt | node scripts/pr-risk-check.mjs --github || true)
-          echo "report<<EOF" >> $GITHUB_OUTPUT
-          echo "$REPORT" >> $GITHUB_OUTPUT
-          echo "EOF" >> $GITHUB_OUTPUT
+          echo "report<<EOF" >> "$GITHUB_OUTPUT"
+          echo "$REPORT" >> "$GITHUB_OUTPUT"
+          echo "EOF" >> "$GITHUB_OUTPUT"

           RISK_LEVEL=$(cat /tmp/changed-files.txt | node scripts/pr-risk-check.mjs --json 2>/dev/null \
             | node -e "let d=''; process.stdin.on('data',c=>d+=c); process.stdin.on('end',()=>{ try { console.log(JSON.parse(d).risk) } catch { console.log('low') } })" \
             || echo "low")
-          echo "level=$RISK_LEVEL" >> $GITHUB_OUTPUT
+          echo "level=$RISK_LEVEL" >> "$GITHUB_OUTPUT"

       - name: Write step summary
         run: echo "${{ steps.risk.outputs.report }}" >> $GITHUB_STEP_SUMMARY
.github/workflows/regenerate-models.yml (new file, 43 lines)
@@ -0,0 +1,43 @@
+# Regenerates models.generated.ts from live provider APIs weekly.
+# Opens a PR automatically if the model list has changed.
+name: Regenerate model registry
+
+on:
+  schedule:
+    - cron: '0 6 * * 1' # Every Monday at 06:00 UTC
+  workflow_dispatch: # Allow manual trigger
+
+permissions:
+  contents: write
+  pull-requests: write
+
+jobs:
+  regenerate:
+    runs-on: blacksmith-4vcpu-ubuntu-2404
+    timeout-minutes: 15
+    steps:
+      - uses: actions/checkout@v6
+
+      - uses: actions/setup-node@v4
+        with:
+          node-version: '22'
+          cache: 'npm'
+
+      - name: Install dependencies
+        run: npm ci
+
+      - name: Regenerate model registry
+        run: npx tsx packages/pi-ai/scripts/generate-models.ts
+
+      - name: Open PR if changed
+        uses: peter-evans/create-pull-request@v7
+        with:
+          commit-message: 'chore(pi-ai): regenerate model registry from upstream APIs'
+          title: 'chore(pi-ai): regenerate model registry from upstream APIs'
+          body: |
+            Automated weekly regeneration of `models.generated.ts` from live provider APIs.
+
+            Run `packages/pi-ai/scripts/generate-models.ts` — no logic changed, output only.
+          branch: chore/auto-regenerate-models
+          labels: chore
+          delete-branch: true
.gitignore (20 lines changed)
@@ -1,4 +1,17 @@
+
+# ── Compiled test output ──
+dist-test/
+
+# ── Compiled output in src/ (should only contain .ts source) ──
+src/**/*.js
+src/**/*.js.map
+src/**/*.d.ts
+src/**/*.d.ts.map
+!src/**/*.test.js
+
+# ── Repowise index (local machine-generated cache) ──
+.repowise/
+
 # ── GSD project state (development-only, lives in worktree branches) ──
 package-lock.json
 .claude/

@@ -39,6 +52,9 @@ tmp/
 packages/*/dist/
 packages/*/node_modules/

+# ── Scratch/WIP files ──
+preflight-script.ts
+
 # ── GSD baseline (auto-generated) ──
 dist/
 !/pkg/dist/modes/

@@ -52,6 +68,7 @@ TODOS.md
 .planning/
 .audits/
 docs/coherence-audit/
+.plans/

 # ── GSD project state (per-worktree, never committed) ──
 .gsd/

@@ -62,3 +79,6 @@ bun.lock

 # ── GSD baseline (auto-generated) ──
 .gsd
+
+# ── GSD baseline (auto-generated) ──
+.gsd-id
.mcp.json (new file, 14 lines)
@@ -0,0 +1,14 @@
+{
+  "mcpServers": {
+    "repowise": {
+      "command": "repowise",
+      "args": [
+        "mcp",
+        "/Users/jeremymcspadden/Github/gsd-2",
+        "--transport",
+        "stdio"
+      ],
+      "description": "repowise: codebase intelligence \u2014 docs, graph, git signals, dead code, decisions"
+    }
+  }
+}
.npmrc (new file, 1 line)
@@ -0,0 +1 @@
+engine-strict=true
.plans/extension-loading-multi-path.md (new file, 138 lines)
@@ -0,0 +1,138 @@
# Extension Loading: Dependency Sort + Unified Enable/Disable

## Context

GSD-2 has a well-structured extension system with three discovery paths (bundled, global/community, project-local) that are **already wired up** through pi's `DefaultPackageManager.addAutoDiscoveredResources()`. However, two critical gaps remain:

1. `sortExtensionPaths()` (topological dependency sort) is implemented but **never called** — `dependencies.extensions` in manifests is decorative
2. The GSD extension registry (enable/disable) only applies to **bundled** extensions — community extensions bypass it entirely

### Architecture (Current Flow)

```
GSD loader.ts
  → discoverExtensionEntryPaths(bundledExtDir)
  → filter by GSD registry (isExtensionEnabled)
  → set GSD_BUNDLED_EXTENSION_PATHS env var
      ↓
DefaultResourceLoader.reload()
  → packageManager.resolve()
      → addAutoDiscoveredResources()
          → project: cwd/.gsd/extensions/ (CONFIG_DIR_NAME = ".gsd")
          → global: ~/.gsd/agent/extensions/ (includes synced bundled)
  → loadExtensions(mergedPaths) ← NO sort, NO registry check on community
```

### Key Files

| File | Role |
|------|------|
| `src/loader.ts` (lines 146-161) | GSD startup — bundled discovery + registry filter |
| `src/extension-sort.ts` | Topological sort (Kahn's BFS) — EXISTS but NEVER CALLED |
| `src/extension-registry.ts` | Registry I/O, enable/disable, tier checks |
| `src/resource-loader.ts` (lines 589-607) | `buildResourceLoader()` — constructs DefaultResourceLoader |
| `packages/pi-coding-agent/src/core/resource-loader.ts` (lines 311-395) | `reload()` — merges paths, calls `loadExtensions()` |
| `packages/pi-coding-agent/src/core/package-manager.ts` (lines 1585-1700) | `addAutoDiscoveredResources()` — auto-discovers from .gsd/ dirs |
| `packages/pi-coding-agent/src/core/extensions/loader.ts` (lines 945-1002) | `discoverAndLoadExtensions()` — DEAD CODE, never invoked |

---

## Plan

### Task 1: Wire topological sort into extension loading

**What:** Call `sortExtensionPaths()` on the merged extension paths before passing them to `loadExtensions()`.

**Where:** `packages/pi-coding-agent/src/core/resource-loader.ts` ~line 381-385

**Before:**
```typescript
const extensionsResult = await loadExtensions(extensionPaths, this.cwd, this.eventBus);
```

**After:**
```typescript
import { sortExtensionPaths } from '../../../src/extension-sort.js';

const { sortedPaths, warnings } = sortExtensionPaths(extensionPaths);
for (const w of warnings) {
  // emit as diagnostic, not hard error
}
const extensionsResult = await loadExtensions(sortedPaths, this.cwd, this.eventBus);
```

**Consideration:** `sortExtensionPaths` lives in `src/` (GSD side), not in `packages/pi-coding-agent/`. Need to either:
- (a) Move it into pi-coding-agent as a shared utility, OR
- (b) Import it cross-package (already done for other GSD→pi imports), OR
- (c) Call it on the GSD side before paths reach pi — harder since auto-discovered paths are added inside pi's package manager

Option (a) is cleanest — the sort logic only depends on `readManifestFromEntryPath` which is also in `src/extension-registry.ts` but could be duplicated or shared.
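The Kahn's-BFS behavior this plan relies on can be sketched as follows. This is a minimal illustration, not the repo's `sortExtensionPaths` (which operates on entry paths and reads manifests from disk); the `ExtensionManifest` shape and the `sortExtensions` name are assumptions for the example, but the warning semantics (missing deps and cycles warn, extensions still load) match the plan.

```typescript
interface ExtensionManifest {
  id: string;
  dependencies?: { extensions?: string[] };
}

// Kahn's BFS: returns ids ordered so dependencies load first;
// missing deps and cycles produce warnings instead of hard errors.
function sortExtensions(manifests: ExtensionManifest[]): { sorted: string[]; warnings: string[] } {
  const byId = new Map(manifests.map((m) => [m.id, m]));
  const warnings: string[] = [];
  const indegree = new Map<string, number>(manifests.map((m) => [m.id, 0]));
  const dependents = new Map<string, string[]>();
  for (const m of manifests) {
    for (const dep of m.dependencies?.extensions ?? []) {
      if (!byId.has(dep)) {
        warnings.push(`'${m.id}' declares dependency '${dep}' which is not installed — loading anyway`);
        continue;
      }
      indegree.set(m.id, (indegree.get(m.id) ?? 0) + 1);
      dependents.set(dep, [...(dependents.get(dep) ?? []), m.id]);
    }
  }
  // Seed the queue with dependency-free extensions, alphabetical for determinism.
  const queue = manifests.filter((m) => indegree.get(m.id) === 0).map((m) => m.id).sort();
  const sorted: string[] = [];
  while (queue.length > 0) {
    const id = queue.shift()!;
    sorted.push(id);
    for (const next of dependents.get(id) ?? []) {
      indegree.set(next, indegree.get(next)! - 1);
      if (indegree.get(next) === 0) queue.push(next);
    }
  }
  if (sorted.length < manifests.length) {
    // Anything left is part of a cycle — warn, then load in alphabetical order.
    const cyclic = manifests.map((m) => m.id).filter((id) => !sorted.includes(id)).sort();
    warnings.push(`dependency cycle among: ${cyclic.join(', ')}`);
    sorted.push(...cyclic);
  }
  return { sorted, warnings };
}
```

The Task 5 test cases (dependency ordering, missing dep, cycle) map directly onto this function's three branches.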
### Task 2: Apply GSD registry to community extensions

**What:** When `buildResourceLoader()` in `src/resource-loader.ts` constructs the DefaultResourceLoader, also discover and filter community extensions from `~/.gsd/agent/extensions/` through the GSD registry — same as it already does for `~/.pi/agent/extensions/` paths.

**Where:** `src/resource-loader.ts` → `buildResourceLoader()` (lines 589-607)

**Current code already filters pi extensions:**
```typescript
const piExtensionPaths = discoverExtensionEntryPaths(piExtensionsDir)
  .filter((entryPath) => !bundledKeys.has(getExtensionKey(entryPath, piExtensionsDir)))
  .filter((entryPath) => {
    const manifest = readManifestFromEntryPath(entryPath)
    if (!manifest) return true
    return isExtensionEnabled(registry, manifest.id)
  })
```

**Add similar filtering for community extensions in agentDir:**
- Discover extensions in `~/.gsd/agent/extensions/` that are NOT bundled
- Filter through `isExtensionEnabled(registry, manifest.id)`
- Pass as disabled (via override patterns or pre-filtering) to the resource loader

**Alternative approach:** Hook into `addAutoDiscoveredResources` or the `addResource` call to check the GSD registry. This might be cleaner since the auto-discovery already happens inside pi's package manager.

### Task 3: Emit sort warnings as diagnostics

**What:** Surface dependency warnings (missing deps, cycles) through GSD's diagnostic system so users see them.

**Where:** Wherever the sort is invoked from Task 1.

**Format:**
```
⚠ Extension 'gsd-watch' declares dependency 'gsd' which is not installed — loading anyway
⚠ Extensions 'foo' and 'bar' form a dependency cycle — loading in alphabetical order
```

### Task 4: Clean up dead code

**What:** The `discoverAndLoadExtensions()` function in `packages/pi-coding-agent/src/core/extensions/loader.ts` (lines 945-1002) is exported but never invoked. The project-local trust model inside it (`getUntrustedExtensionPaths`) also never runs.

**Options:**
- (a) Remove it entirely — it's dead
- (b) Mark deprecated — in case upstream pi uses it
- (c) Leave it — lowest risk

Recommend (b) for now — add `@deprecated` JSDoc so it doesn't grow new callers.

### Task 5: Tests

- **Sort integration test:** Create two extensions where A depends on B. Verify B loads before A after sort.
- **Registry community test:** Drop a community extension in `~/.gsd/agent/extensions/`, run `gsd extensions disable <id>`, verify it doesn't load.
- **Conflict test:** Same extension ID in project-local and global — verify project-local wins.
- **Missing dep test:** Extension declares dependency on non-existent extension — verify warning emitted, extension still loads.
- **Cycle test:** Two extensions that depend on each other — verify warning, both load.

---

## Follow-up PR (separate)

**Subagent extension forwarding:** Update `src/resources/extensions/subagent/index.ts` to forward ALL extension paths (not just bundled) to child processes. May need a second env var like `GSD_COMMUNITY_EXTENSION_PATHS` or consolidate into `GSD_EXTENSION_PATHS`.

---

## Open Questions

1. **Where should `sortExtensionPaths` live?** Currently in `src/` (GSD side). Needs to be callable from pi's resource-loader. Options: move to pi, keep and import cross-package, or duplicate.
2. **Should community extensions respect the same registry as bundled?** Or should they have their own enable/disable mechanism? Current plan unifies them.
3. **Project-local trust:** The TOFU model in the dead `discoverAndLoadExtensions()` never runs. Should `addAutoDiscoveredResources` also gate project-local extensions behind trust? Or is `.gsd/extensions/` in your own project always trusted?
@@ -11,7 +11,7 @@ Users on capped plans (e.g., Claude Pro) exhaust weekly token limits in 15-20 hours
 ## Current Architecture

 ### What Exists
-- **Phase-based model config:** Users can set different models per phase via `preferences.md` (research, planning, execution, completion)
+- **Phase-based model config:** Users can set different models per phase via `PREFERENCES.md` (research, planning, execution, completion)
 - **Fallback chains:** Each phase supports `fallbacks: [model1, model2]` for error recovery
 - **Pre-dispatch hooks:** `PreDispatchResult` has a `model` field but it's **never applied** in `auto.ts` — this is a ready-made extension point
 - **Model registry:** `ModelRegistry.getAvailable()` provides all configured models with metadata
.plans/ollama-native-provider.md (new file, 241 lines)
@@ -0,0 +1,241 @@
# Ollama Extension — First-Class Local LLM Support

## Status: DRAFT — Awaiting approval

## Problem

Ollama support in GSD2 currently requires manual `models.json` configuration. Users must:
1. Know the OpenAI-compatibility endpoint (`localhost:11434/v1`)
2. Manually list every model they want to use
3. Set compat flags (`supportsDeveloperRole: false`, etc.)
4. Use a dummy API key

There's an `ollama-cloud` provider for hosted Ollama, and a discovery adapter that can list models, but no first-class **local Ollama** extension that "just works."

## Goal

Make Ollama the easiest way to use GSD2 — zero config when Ollama is running locally. All Ollama functionality lives in a single extension: `src/resources/extensions/ollama/`.

## Architecture

Everything is a self-contained extension under `src/resources/extensions/ollama/`. The extension:
- Auto-detects Ollama on startup via health check
- Discovers and registers local models with the model registry
- Provides native Ollama API streaming (not OpenAI shim)
- Exposes `/ollama` slash commands for model management
- Registers an LLM-callable tool for model pull/status

Minimal core changes — only `KnownProvider` and `KnownApi` type additions in `pi-ai`, and `env-api-keys.ts` for key resolution. Everything else is in the extension.
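The auto-detect step can be sketched as a pair of helpers: resolve the endpoint (honoring `OLLAMA_HOST`), then probe it with a short timeout so startup never blocks on a missing daemon. The names `resolveOllamaBaseUrl` and `isOllamaRunning` are hypothetical, and the 1.5s default mirrors the timeout proposed in Phase 1 below.

```typescript
// Resolve the Ollama endpoint: honor OLLAMA_HOST, default to localhost:11434.
function resolveOllamaBaseUrl(env: Record<string, string | undefined> = process.env): string {
  const host = env.OLLAMA_HOST?.trim();
  if (!host) return "http://localhost:11434";
  // Accept both bare host:port values and full URLs; strip a trailing slash.
  return host.startsWith("http") ? host.replace(/\/$/, "") : `http://${host}`;
}

// Health check: GET / with a short timeout, so a missing daemon just
// means "Ollama not detected" rather than a hung session_start.
async function isOllamaRunning(baseUrl = resolveOllamaBaseUrl(), timeoutMs = 1500): Promise<boolean> {
  try {
    const res = await fetch(baseUrl, { signal: AbortSignal.timeout(timeoutMs) });
    return res.ok;
  } catch {
    return false; // connection refused, DNS failure, or timeout
  }
}
```

Keeping the probe non-throwing means the extension entry can call it unconditionally on `session_start` and silently skip registration when it returns false.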
|
||||
|
||||
## File Structure
|
||||
|
||||
```
|
||||
src/resources/extensions/ollama/
|
||||
├── index.ts # Extension entry — wires everything on session_start
|
||||
├── ollama-client.ts # HTTP client for Ollama REST API (/api/*)
|
||||
├── ollama-discovery.ts # Model discovery + capability detection
|
||||
├── ollama-provider.ts # Native /api/chat streaming provider (registers with pi-ai)
|
||||
├── ollama-commands.ts # /ollama slash commands (status, pull, list, remove, ps)
|
||||
├── ollama-tool.ts # LLM-callable tool for model management
|
||||
├── model-capabilities.ts # Known model capability table (context window, vision, reasoning)
|
||||
└── types.ts # Shared types for Ollama API responses
|
||||
```

## Scope

### Phase 1: Auto-Discovery + OpenAI-Compat Routing

**What:** Extension that auto-detects Ollama, discovers models, and registers them using the existing `openai-completions` API provider. Zero config needed.

**Extension files:**

- `ollama/index.ts` — Main entry. On `session_start`:
  1. Probe `localhost:11434` (or `OLLAMA_HOST`) with a 1.5s timeout
  2. If reachable, discover models via `/api/tags`
  3. Register discovered models with `ctx.modelRegistry` using correct defaults
  4. Show a status widget if Ollama is detected
- `ollama/ollama-client.ts` — Low-level HTTP client:
  - `isRunning()` — `GET /` health check
  - `getVersion()` — `GET /api/version`
  - `listModels()` — `GET /api/tags`
  - `showModel(name)` — `POST /api/show` (details, template, parameters, size)
  - `getRunningModels()` — `GET /api/ps` (loaded models, VRAM usage)
  - `pullModel(name, onProgress)` — `POST /api/pull` (streaming progress)
  - `deleteModel(name)` — `DELETE /api/delete`
  - `copyModel(source, dest)` — `POST /api/copy`
  - Respects the `OLLAMA_HOST` env var for non-default endpoints
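
The client's endpoint resolution could look like the sketch below. Ollama accepts bare `host:port` values as well as full URLs in `OLLAMA_HOST`; the exact normalization rules here are an assumption, not the shipped implementation.

```typescript
// Hypothetical helper: normalize OLLAMA_HOST into a base URL for the client.
export function resolveOllamaHost(env: Record<string, string | undefined>): string {
  const raw = env["OLLAMA_HOST"]?.trim();
  if (!raw) return "http://localhost:11434";
  // Already a full URL: strip any trailing slash.
  if (/^https?:\/\//.test(raw)) return raw.replace(/\/+$/, "");
  // Bare "host" or "host:port": assume http and the default port.
  return raw.includes(":") ? `http://${raw}` : `http://${raw}:11434`;
}
```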
- `ollama/ollama-discovery.ts` — Enhanced model discovery:
  - Calls `/api/tags` to get the model list
  - Calls `/api/show` per model (batched, cached) to get:
    - `details.parameter_size` → estimate context window
    - `details.families` → detect vision (clip) and reasoning (deepseek-r1)
    - `modelfile` → extract default parameters
  - Returns enriched `DiscoveredModel[]` with proper capabilities
- `ollama/model-capabilities.ts` — Known-model lookup table:
  - Maps well-known model families to capabilities
  - e.g., `llama3.1` → `{ contextWindow: 131072, input: ["text"] }`
  - e.g., `llava` → `{ contextWindow: 4096, input: ["text", "image"] }`
  - e.g., `deepseek-r1` → `{ reasoning: true, contextWindow: 131072 }`
  - e.g., `qwen2.5-coder` → `{ contextWindow: 131072, input: ["text"] }`
  - Fallback: estimate from parameter count if not in the table
- `ollama/types.ts` — Ollama API response types
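
The lookup-plus-fallback behavior above can be sketched as a pure function. Table entries mirror the examples given; the fallback thresholds (32k context at 7B+ parameters, 8k below) are illustrative assumptions, not measured values.

```typescript
interface Capabilities {
  contextWindow: number;
  input: string[];
  reasoning?: boolean;
}

// Known-model table, keyed by family (the part of the tag before ":").
const KNOWN_MODELS: Record<string, Capabilities> = {
  "llama3.1": { contextWindow: 131072, input: ["text"] },
  "llava": { contextWindow: 4096, input: ["text", "image"] },
  "deepseek-r1": { contextWindow: 131072, input: ["text"], reasoning: true },
  "qwen2.5-coder": { contextWindow: 131072, input: ["text"] },
};

export function lookupCapabilities(modelId: string, parameterSize?: string): Capabilities {
  // "llama3.1:8b" -> family "llama3.1"
  const family = modelId.split(":")[0];
  const known = KNOWN_MODELS[family];
  if (known) return known;
  // Fallback: rough context-window estimate from parameter count ("8B", "70B").
  const params = parseFloat(parameterSize ?? "");
  const contextWindow = Number.isFinite(params) && params >= 7 ? 32768 : 8192;
  return { contextWindow, input: ["text"] };
}
```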

**Core changes (minimal):**
- `packages/pi-ai/src/types.ts` — Add `"ollama"` to `KnownProvider`
- `packages/pi-ai/src/env-api-keys.ts` — Add `"ollama"` key resolution (returns an `"ollama"` placeholder — no real key needed)
- `src/onboarding.ts` — Add `"ollama"` to the provider selection list
- `src/wizard.ts` — Add an `ollama` entry (no key required)

**Model registration details:**

Each discovered model registers as:

```typescript
{
  id: "llama3.1:8b",           // from /api/tags
  name: "Llama 3.1 8B",        // humanized
  api: "openai-completions",   // uses existing provider
  provider: "ollama",
  baseUrl: "http://localhost:11434/v1",
  cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
  reasoning: false,            // from capabilities table
  input: ["text"],             // from capabilities table
  contextWindow: 131072,       // from capabilities table or /api/show
  maxTokens: 16384,            // conservative default
  compat: {
    supportsDeveloperRole: false,
    supportsReasoningEffort: false,
    supportsUsageInStreaming: false,
    maxTokensField: "max_tokens",
  },
}
```
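
The "humanized" display name above could be derived from the raw tag along these lines. The casing and spacing rules are assumptions for illustration; the real helper may differ.

```typescript
// Hypothetical sketch: "llama3.1:8b" -> "Llama 3.1 8B".
export function humanizeModelId(id: string): string {
  const [family, tag] = id.split(":");
  // Insert a space at letter/digit boundaries: "llama3.1" -> "llama 3.1".
  const spaced = family.replace(/([a-z])(\d)/gi, "$1 $2");
  const name = spaced.charAt(0).toUpperCase() + spaced.slice(1);
  return tag ? `${name} ${tag.toUpperCase()}` : name;
}
```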

**Behavior:**
- `gsd --list-models` shows all locally pulled Ollama models automatically
- `/model ollama/llama3.1:8b` works without any config file
- If Ollama isn't running, the extension stays silent — no errors, no models listed
- `models.json` overrides still work (user config wins over auto-discovery)

### Phase 2: Native Ollama API Provider (`/api/chat`)

**What:** A dedicated streaming provider that speaks Ollama's native protocol instead of the OpenAI compatibility shim.

**Extension files:**

- `ollama/ollama-provider.ts` — Native `/api/chat` streaming:
  - Registers an `"ollama-chat"` API with `registerApiProvider()`
  - Implements `stream()` and `streamSimple()`:
    - Maps GSD `Context` → Ollama message format
    - Maps GSD `Tool[]` → Ollama tool format
    - Streams NDJSON responses and maps them back to `AssistantMessage` events
    - Extracts `<think>` blocks for reasoning models (deepseek-r1, qwq)
  - Ollama-specific options:
    - `keep_alive` — control model memory retention (default: "5m")
    - `num_ctx` — pass through the model's context window
    - `num_predict` — max output tokens
    - Temperature, top_p, top_k
  - Response metadata:
    - `eval_count` / `eval_duration` → tokens/sec in usage stats
    - `total_duration`, `load_duration` → performance visibility
  - Vision support: converts image content to base64 for multimodal models
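
The NDJSON handling, `<think>` extraction, and tokens/sec derivation described above can be sketched together. Each line of an `/api/chat` stream is a JSON object, and Ollama reports `eval_duration` in nanoseconds; the `<think>` delimiter handling and the field subset shown are simplifying assumptions.

```typescript
interface ChatChunk {
  message?: { content?: string };
  done?: boolean;
  eval_count?: number;
  eval_duration?: number; // nanoseconds
}

// Illustrative sketch: fold a complete NDJSON stream into text, reasoning
// text, and a tokens/sec figure from the final chunk's eval stats.
export function parseChatStream(ndjson: string): {
  text: string;
  thinking: string;
  tokensPerSec: number | null;
} {
  let text = "";
  let tokensPerSec: number | null = null;
  for (const line of ndjson.split("\n")) {
    if (!line.trim()) continue;
    const chunk: ChatChunk = JSON.parse(line);
    text += chunk.message?.content ?? "";
    if (chunk.done && chunk.eval_count && chunk.eval_duration) {
      tokensPerSec = chunk.eval_count / (chunk.eval_duration / 1e9);
    }
  }
  // Pull out <think>...</think> blocks emitted by reasoning models.
  const thinking = (text.match(/<think>([\s\S]*?)<\/think>/) ?? [])[1] ?? "";
  text = text.replace(/<think>[\s\S]*?<\/think>/g, "").trim();
  return { text, thinking, tokensPerSec };
}
```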

**Core changes:**
- `packages/pi-ai/src/types.ts` — Add `"ollama-chat"` to `KnownApi`

**Phase 1 models switch to `api: "ollama-chat"` by default.** Users can force OpenAI-compat via a `models.json` override if needed.

**Why native over OpenAI-compat:**
- Full `keep_alive` / `num_ctx` control
- Better error messages (Ollama-native vs. generic OpenAI)
- More reliable tool calling on Ollama's native format
- Performance metrics in the response (tokens/sec)
- Foundation for model management commands

### Phase 3: Local LLM Management UX

**What:** `/ollama` slash commands and an LLM tool for model management.

**Extension files:**

- `ollama/ollama-commands.ts` — Slash commands registered via `pi.registerCommand()`:
  - `/ollama` — Status overview:

    ```
    Ollama v0.5.7 — running (localhost:11434)

    Loaded:
      llama3.1:8b    4.7 GB VRAM    idle 3m

    Available:
      llama3.1:8b       (4.7 GB)
      qwen2.5-coder:7b  (4.4 GB)
      deepseek-r1:8b    (4.9 GB)
    ```

  - `/ollama pull <model>` — Pull with streaming progress via `ctx.ui.setWidget()`
  - `/ollama list` — List all local models with sizes and families
  - `/ollama remove <model>` — Delete a model (with confirmation)
  - `/ollama ps` — Running models + VRAM usage
- `ollama/ollama-tool.ts` — LLM-callable tool registered via `pi.registerTool()`:
  - `ollama_manage` tool — lets the agent pull/list/check models
  - Parameters: `{ action: "list" | "pull" | "status" | "ps", model?: string }`
  - Use case: the agent detects it needs a model and pulls it automatically

**UX Flow:**

```
$ gsd
> /ollama
Ollama v0.5.7 — running (localhost:11434)
Loaded:
  llama3.1:8b — 4.7 GB VRAM, idle 3m
Available:
  llama3.1:8b       (4.7 GB)
  qwen2.5-coder:7b  (4.4 GB)
  deepseek-r1:8b    (4.9 GB)

> /ollama pull codestral:22b
Pulling codestral:22b...
████████████████████████████░░░░ 78% (14.2 GB / 18.1 GB)
✓ codestral:22b ready

> /model ollama/codestral:22b
Switched to codestral:22b (local, Ollama)
```

## Implementation Order

1. **Phase 1** — Auto-discovery with OpenAI-compat routing. Biggest user impact, smallest risk.
2. **Phase 3** — Management UX (`/ollama` commands). Valuable even before the native API.
3. **Phase 2** — Native `/api/chat` provider. An optimization over OpenAI-compat; do it last.

## Core Changes Summary (minimal)

| File | Change |
|------|--------|
| `packages/pi-ai/src/types.ts` | Add `"ollama"` to `KnownProvider`, `"ollama-chat"` to `KnownApi` (Phase 2) |
| `packages/pi-ai/src/env-api-keys.ts` | Add `"ollama"` → always returns the `"ollama"` placeholder |
| `src/onboarding.ts` | Add `"ollama"` to the provider picker |
| `src/wizard.ts` | Add an `"ollama"` key mapping (no key required) |

Everything else lives in `src/resources/extensions/ollama/`.

## Risks & Mitigations

| Risk | Mitigation |
|------|------------|
| Ollama not running — startup probe latency | 1.5s timeout; cache the result; probe async so it doesn't block TUI paint |
| Model capabilities unknown | Known-model table + `/api/show` fallback + `parameter_size` estimation |
| Tool calling unreliable on small models | Detect parameter count; warn on models under 7B |
| Ollama API changes between versions | Detect version via `/api/version`; use stable endpoints only |
| Conflicts with `models.json` Ollama config | User config always wins; auto-discovered models merge beneath manual config |
| Extension disabled | Extension is purely additive; disabling it removes all Ollama features cleanly, with no impact on core |

## Testing Strategy

- Unit tests: `ollama-client.ts` with mocked fetch responses
- Unit tests: `ollama-discovery.ts` model capability parsing
- Unit tests: `ollama-provider.ts` message format mapping + NDJSON stream parsing
- Unit tests: `model-capabilities.ts` known-model lookups
- Integration test: mock HTTP server simulating Ollama `/api/tags`, `/api/chat`, `/api/pull`
- Manual test: real Ollama instance with llama3.1, qwen2.5-coder, deepseek-r1

## Open Questions

1. **Startup probe** — Probe Ollama on `session_start` (adds ~1.5s if not running) or lazily on first `/model`? **Recommendation: async probe on session_start (non-blocking); eager if `OLLAMA_HOST` is set.**
2. **Auto-start** — Try to launch Ollama if installed but not running? **Recommendation: no — too invasive. Show a helpful message in `/ollama` status instead.**
3. **Vision support** — Support multimodal models (llava, etc.) in the Phase 2 native API? **Recommendation: yes, detected via the capabilities table.**
4. **Model refresh** — How often to re-probe Ollama for new models? **Recommendation: on `/ollama list`, on `/model`, and every 5 min (existing TTL).**

@@ -134,7 +134,7 @@ Quick filesystem scan (no heavy reads):

### Task 1.4: `isFirstEverLaunch(): boolean`

Returns `true` if `~/.gsd/` doesn't exist or has no `PREFERENCES.md`.

---

@@ -298,7 +298,7 @@ Step 8: Advanced (collapsed by default, expandable)

Step 9: Bootstrap .gsd/ structure
- Creates .gsd/milestones/
- Creates .gsd/PREFERENCES.md (from wizard answers)
- Creates .gitignore entries
- Seeds CONTEXT.md with detected project signals
- Commits "chore: init gsd" (if commit_docs enabled)

@@ -42,7 +42,7 @@ The `/gsd prefs wizard` currently only configures 6 of 18+ preference fields. Us

- Added missing keys to `orderedKeys` in `serializePreferencesToFrontmatter()`

### Group 6: Update Template & Docs ✓
- Updated `templates/PREFERENCES.md` with new fields
- Updated `docs/preferences-reference.md` with budget, notifications, git, and hooks

### Group 7: Tests ✓

.plans/single-writer-engine-v3-control-plane.md (new file, 396 lines)

# Single-Writer Engine v3: Agent Control Plane
# Plan: State machine guards + actor causation + reversibility
# Created: 2026-03-25

---

## Background

v2 gave the engine **write discipline** — agents can't corrupt STATE.md directly, every mutation goes through the DB, and the event log is append-only.

What v2 did NOT give us: **behavioral control**. Agents can still:
- Complete a task twice (silent overwrite)
- Complete a slice with open tasks (if they bypass the slice status check)
- Complete a milestone in any status
- Re-plan already-completed slices/tasks
- Call any tool on any unit regardless of ownership
- Leave no trace of *who* did what or *why*

This plan bundles three work streams that close those gaps together, since they share infrastructure (WorkflowEvent schema, DB query surface, handler preconditions).

---

## Work Streams

### Stream 1 — State Machine Guards (P0)
Add precondition checks to all 8 tool handlers so invalid transitions return an error instead of silently succeeding.

### Stream 2 — Actor Identity + Persistent Audit Log (P1)
Extend `WorkflowEvent` with `actor_name` and `trigger_reason`. Flush the in-process `workflow-logger` buffer to a persistent `.gsd/audit-log.jsonl` after every tool invocation, so "who did what and why" is durable.

### Stream 3 — Reversibility + Unit Ownership (P2)
Add `gsd_task_reopen` and `gsd_slice_reopen` tools. Add a unit-ownership validation layer so an agent can only complete/reopen units it explicitly claimed.

---

## Detailed Task Breakdown

---

### Stream 1: State Machine Guards

#### S1-T1: Add `getTask`, `getSlice`, `getMilestoneById` existence helpers to `gsd-db.ts`

**File:** `src/resources/extensions/gsd/gsd-db.ts`

These are read-only DB helpers that confirm an entity exists and return its current `status` field before any mutation. Each returns `null` if not found.

```ts
getTask(taskId: string, sliceId: string): { status: string } | null
getSlice(sliceId: string, milestoneId: string): { status: string } | null
getMilestoneById(milestoneId: string): { status: string } | null
```

Note: `getSlice` may already exist — check before adding a duplicate. The audit report references it in `complete-slice.ts` line 207, but only to list tasks. We need a version that returns the slice row itself.

---

#### S1-T2: Guard `complete-task.ts` — enforce valid transitions

**File:** `src/resources/extensions/gsd/tools/complete-task.ts`

Preconditions to add (before the transaction block):
1. `getMilestoneById(milestoneId)` → must exist, must NOT be `"complete"` or `"done"`
2. `getSlice(sliceId, milestoneId)` → must exist, must be `"pending"` or `"in_progress"`
3. `getTask(taskId, sliceId)` → if it exists, status must be `"pending"` (not already `"complete"`)

On failure: return `{ error: "<reason>" }` — do NOT throw.
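
The three preconditions can be expressed as one pure guard function, assuming the S1-T1 helper row shape. The function name and error wording here are illustrative; only the checks themselves come from the list above.

```typescript
type Row = { status: string } | null;

// Returns an error object if any guard fails, else null (safe to proceed
// into the transaction). Never throws, per the convention above.
export function checkCompleteTaskGuards(
  milestone: Row,
  slice: Row,
  task: Row,
): { error: string } | null {
  if (!milestone) return { error: "Milestone not found" };
  if (milestone.status === "complete" || milestone.status === "done")
    return { error: "Milestone is already complete" };
  if (!slice) return { error: "Slice not found" };
  if (slice.status !== "pending" && slice.status !== "in_progress")
    return { error: `Slice is ${slice.status}; expected pending or in_progress` };
  if (task && task.status !== "pending")
    return { error: "Task is already complete" };
  return null;
}
```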

---

#### S1-T3: Guard `complete-slice.ts` — enforce valid transitions

**File:** `src/resources/extensions/gsd/tools/complete-slice.ts`

Preconditions to add:
1. `getSlice(sliceId, milestoneId)` → must exist, status must be `"pending"` or `"in_progress"` (not already `"complete"`)
2. `getMilestoneById(milestoneId)` → must exist, must NOT be `"complete"`
3. All tasks in the slice must be `"complete"` (already enforced — keep it, and add the explicit slice-status check before it)

---

#### S1-T4: Guard `complete-milestone.ts` — enforce valid transitions

**File:** `src/resources/extensions/gsd/tools/complete-milestone.ts`

Preconditions to add:
1. `getMilestoneById(milestoneId)` → must exist, status must be `"active"` (not already `"complete"`)
2. Keep the existing all-slices-complete check
3. Add a deep check: all tasks across all slices must also be `"complete"` (not just slice status)

---

#### S1-T5: Guard `plan-task.ts` — block re-planning completed tasks

**File:** `src/resources/extensions/gsd/tools/plan-task.ts`

Preconditions to add:
1. `getSlice(sliceId, milestoneId)` → must exist, status must NOT be `"complete"` (blocks planning on a closed slice)
2. If the task exists (`getTask`), status must be `"pending"` — block re-planning a `"complete"` task

---

#### S1-T6: Guard `plan-slice.ts` — block re-planning completed slices

**File:** `src/resources/extensions/gsd/tools/plan-slice.ts`

Preconditions to add:
1. `getSlice(sliceId, milestoneId)` → if it exists, status must NOT be `"complete"`
2. `getMilestoneById(milestoneId)` → must exist, status must NOT be `"complete"`

---

#### S1-T7: Guard `plan-milestone.ts` — block re-planning completed milestones

**File:** `src/resources/extensions/gsd/tools/plan-milestone.ts`

Preconditions to add:
1. If the milestone exists (`getMilestoneById`), status must NOT be `"complete"`
2. Validate the `depends_on` array: each referenced milestoneId must exist and be `"complete"` before this milestone can be planned

---

#### S1-T8: Guard `reassess-roadmap.ts` — verify completedSliceId is actually complete

**File:** `src/resources/extensions/gsd/tools/reassess-roadmap.ts`

Gap: `completedSliceId` is accepted without confirming its status is actually `"complete"`. Also: no check that the milestone is still `"active"` (could reassess after the milestone is done).

Preconditions to add:
1. `getSlice(completedSliceId, milestoneId)` → status must be `"complete"`
2. `getMilestoneById(milestoneId)` → status must be `"active"`

---

#### S1-T9: Guard `replan-slice.ts` — verify blockerTaskId exists and is complete

**File:** `src/resources/extensions/gsd/tools/replan-slice.ts`

Gaps:
- `blockerTaskId` is accepted without verifying it exists or is `"complete"`
- No check that the slice is still `"in_progress"` (could replan after the slice is complete)

Preconditions to add:
1. `getSlice(sliceId, milestoneId)` → status must be `"in_progress"` or `"pending"`, NOT `"complete"`
2. `getTask(blockerTaskId, sliceId)` → must exist, status must be `"complete"`

---

### Stream 2: Actor Identity + Persistent Audit Log

#### S2-T1: Extend `WorkflowEvent` with actor identity and causation fields

**File:** `src/resources/extensions/gsd/workflow-events.ts`

Extend the `WorkflowEvent` interface:
```ts
export interface WorkflowEvent {
  cmd: string;
  params: Record<string, unknown>;
  ts: string;
  hash: string;
  actor: "agent" | "system";
  actor_name?: string;     // ADD: e.g. "executor-agent-01", "gsd-orchestrator"
  trigger_reason?: string; // ADD: e.g. "plan-phase complete", "user invoked gsd_complete_task"
  session_id?: string;     // ADD: process.env.GSD_SESSION_ID if set
}
```

Update `appendEvent` to accept and persist these new optional fields. Hash computation must remain stable (it still hashes only `cmd + params`, not the new fields) so fork detection isn't broken.
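
The stability requirement can be made concrete: because the hash covers only `cmd` and `params`, attaching actor metadata cannot change it. The sketch below uses `node:crypto`; the actual hashing scheme in `workflow-events.ts` may differ, so treat this as illustrative.

```typescript
import { createHash } from "node:crypto";

// Hash only the fields that participate in fork detection.
export function eventHash(cmd: string, params: Record<string, unknown>): string {
  return createHash("sha256")
    .update(JSON.stringify({ cmd, params }))
    .digest("hex");
}
```

With this shape, `appendEvent` would compute the hash before attaching `actor_name`, `trigger_reason`, or `session_id`, so two events that differ only in actor metadata still hash identically.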

---

#### S2-T2: Update all 8 tool handlers to pass actor identity to `appendEvent`

**Files:** All 8 handlers in `src/resources/extensions/gsd/tools/`

Each handler receives its inputs. Add a convention where params can include:
- `actor_name` (optional string) — the caller passes its agent identity
- `trigger_reason` (optional string) — the caller passes why this action was triggered

If not provided, default to `actor_name: "agent"`, `trigger_reason: undefined`.

Handlers pass these through to `appendEvent`.

The tool schemas (in the MCP tool definitions) should expose `actor_name` and `trigger_reason` as optional string params so agents can self-identify.

---

#### S2-T3: Persist `workflow-logger` to `.gsd/audit-log.jsonl`

**File:** `src/resources/extensions/gsd/workflow-logger.ts`

Current behavior: `_buffer` is in-process memory, drained per unit and dropped. This means errors/warnings disappear across context resets.

Change: After `_push()` writes to the in-process buffer, also append the entry to `.gsd/audit-log.jsonl` (using `appendFileSync`). This requires the basePath to be available — either pass it via a module-level setter (`setLogBasePath(path)`) called at engine init, or accept it as a param on `logWarning`/`logError`.

The audit-log format should match `LogEntry` serialized as JSON + newline, consistent with `event-log.jsonl`.

---

#### S2-T4: Add `readAuditLog` helper to `workflow-logger.ts`

**File:** `src/resources/extensions/gsd/workflow-logger.ts`

Expose a read function so the auto-loop and diagnostics can surface persistent audit entries without replaying the event log:

```ts
export function readAuditLog(basePath: string): LogEntry[]
```
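
One possible shape for this helper splits the JSONL parsing into a pure function, so a truncated final line (from an interrupted append) never breaks diagnostics. The `LogEntry` fields shown are assumptions about the existing type.

```typescript
import { existsSync, readFileSync } from "node:fs";
import { join } from "node:path";

interface LogEntry { level: string; message: string; ts: string }

// Pure JSONL parser: skips blank lines and tolerates a corrupt tail.
export function parseAuditLog(raw: string): LogEntry[] {
  const entries: LogEntry[] = [];
  for (const line of raw.split("\n")) {
    if (!line.trim()) continue;
    try {
      entries.push(JSON.parse(line));
    } catch {
      // Ignore a partially written final line.
    }
  }
  return entries;
}

export function readAuditLog(basePath: string): LogEntry[] {
  const file = join(basePath, ".gsd", "audit-log.jsonl");
  return existsSync(file) ? parseAuditLog(readFileSync(file, "utf8")) : [];
}
```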

---

### Stream 3: Reversibility + Unit Ownership

#### S3-T1: Add `updateTaskStatus` and `updateSliceStatus` DB helpers

**File:** `src/resources/extensions/gsd/gsd-db.ts`

If they don't already exist (check first):
```ts
updateTaskStatus(taskId: string, sliceId: string, status: string): void
updateSliceStatus(sliceId: string, milestoneId: string, status: string): void
```

These are the write primitives the reopen tools need.

---

#### S3-T2: Implement `gsd_task_reopen` tool handler

**New file:** `src/resources/extensions/gsd/tools/reopen-task.ts`

Logic:
1. Validate that `taskId`, `sliceId`, and `milestoneId` are non-empty strings
2. `getTask(taskId, sliceId)` → must exist, status must be `"complete"` (can't reopen what isn't closed)
3. `getSlice(sliceId, milestoneId)` → must exist, status must NOT be `"complete"` (can't reopen a task inside a closed slice — too late)
4. `getMilestoneById(milestoneId)` → must exist, status must NOT be `"complete"`
5. In a transaction: `updateTaskStatus(taskId, sliceId, "pending")`
6. Append event: `cmd: "reopen_task"`, including `actor_name` and `trigger_reason`
7. Invalidate the state cache + render projections

---

#### S3-T3: Implement `gsd_slice_reopen` tool handler

**New file:** `src/resources/extensions/gsd/tools/reopen-slice.ts`

Logic:
1. Validate `sliceId` and `milestoneId`
2. `getSlice(sliceId, milestoneId)` → must exist, status must be `"complete"`
3. `getMilestoneById(milestoneId)` → must NOT be `"complete"`
4. In a transaction: `updateSliceStatus(sliceId, milestoneId, "in_progress")` + set all tasks back to `"pending"`
5. Append event: `cmd: "reopen_slice"`
6. Invalidate the state cache + render projections

---

#### S3-T4: Add unit ownership claim/check mechanism

**New file:** `src/resources/extensions/gsd/unit-ownership.ts`

A lightweight JSON file at `.gsd/unit-claims.json` maps unit IDs to agent names:
```json
{
  "M01/S01/T01": { "agent": "executor-01", "claimed_at": "2026-03-25T..." },
  "M01/S01": { "agent": "executor-01", "claimed_at": "2026-03-25T..." }
}
```

Functions:
```ts
claimUnit(basePath, unitKey, agentName): void // atomic write
releaseUnit(basePath, unitKey): void
getOwner(basePath, unitKey): string | null
```

`unitKey` format: `"<milestoneId>/<sliceId>/<taskId>"` for tasks, `"<milestoneId>/<sliceId>"` for slices.
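
The key format above is simple enough to centralize in one helper, so no call site concatenates the segments by hand. The name `unitKey` matches the parameter name in the function signatures; the helper itself is illustrative.

```typescript
// Build the canonical unit key: tasks get three segments, slices two.
export function unitKey(milestoneId: string, sliceId: string, taskId?: string): string {
  return taskId
    ? `${milestoneId}/${sliceId}/${taskId}`
    : `${milestoneId}/${sliceId}`;
}
```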

---

#### S3-T5: Wire ownership check into `complete-task` and `complete-slice`

**Files:** `complete-task.ts`, `complete-slice.ts`

If `actor_name` is provided AND `.gsd/unit-claims.json` exists AND the unit is claimed:
- Verify `actor_name` matches the registered owner
- On mismatch: return `{ error: "Unit <key> is owned by <owner>, not <actor>" }`
- If there is no claim file or the unit is unclaimed: allow the operation (opt-in ownership)

Ownership is enforced only when claims are present, keeping the feature opt-in.
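
The opt-in rule above reduces to one pure check, where `claims` is the parsed contents of `.gsd/unit-claims.json` (or `null` when the file is absent). The function name and error string are illustrative but follow the template given.

```typescript
interface Claim { agent: string; claimed_at: string }

// Returns an error only when a claim exists and the actor does not match.
export function checkOwnership(
  claims: Record<string, Claim> | null,
  unitKey: string,
  actorName: string | undefined,
): { error: string } | null {
  if (!claims || !actorName) return null; // opt-in: nothing to enforce
  const claim = claims[unitKey];
  if (!claim) return null;                // unclaimed unit: allow
  if (claim.agent !== actorName) {
    return { error: `Unit ${unitKey} is owned by ${claim.agent}, not ${actorName}` };
  }
  return null;
}
```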

---

## Files Changed Summary

| File | Change Type |
|------|-------------|
| `gsd-db.ts` | Add `getTask`, `getMilestoneById` existence helpers; add `updateTaskStatus`, `updateSliceStatus` |
| `workflow-events.ts` | Extend `WorkflowEvent` with `actor_name`, `trigger_reason`, `session_id` |
| `workflow-logger.ts` | Add persistent flush to `.gsd/audit-log.jsonl`; add `setLogBasePath`; add `readAuditLog` |
| `tools/complete-task.ts` | State machine guards + ownership check + actor passthrough |
| `tools/complete-slice.ts` | State machine guards + ownership check + actor passthrough |
| `tools/complete-milestone.ts` | State machine guards + deep task check |
| `tools/plan-task.ts` | Block re-planning complete tasks |
| `tools/plan-slice.ts` | Block re-planning complete slices |
| `tools/plan-milestone.ts` | Block re-planning complete milestones + depends_on validation |
| `tools/reassess-roadmap.ts` | Verify completedSliceId status + milestone status check |
| `tools/replan-slice.ts` | Verify blockerTaskId exists + slice status check |
| `tools/reopen-task.ts` | NEW — gsd_task_reopen handler |
| `tools/reopen-slice.ts` | NEW — gsd_slice_reopen handler |
| `unit-ownership.ts` | NEW — claim/release/check ownership |

---

## Execution Order (Dependencies)

```
S1-T1 (DB helpers)
 └── S1-T2 (complete-task guards)
 └── S1-T3 (complete-slice guards)
 └── S1-T4 (complete-milestone guards)
 └── S1-T5 (plan-task guards)
 └── S1-T6 (plan-slice guards)
 └── S1-T7 (plan-milestone guards)
 └── S1-T8 (reassess-roadmap guards)
 └── S1-T9 (replan-slice guards)
 └── S3-T1 (updateTask/SliceStatus helpers) ── S3-T2, S3-T3

S2-T1 (WorkflowEvent schema)
 └── S2-T2 (handler actor passthrough)

S2-T3 (audit-log flush)
 └── S2-T4 (readAuditLog)

S3-T4 (unit-ownership.ts)
 └── S3-T5 (wire into complete-task/slice)
```

Parallelizable:
- All of Stream 1 (S1-T2 through S1-T9) can run in parallel once S1-T1 is done
- Stream 2 is fully independent of Stream 1; Stream 3 needs only the DB helpers (S1-T1, S3-T1) before its reopen tools

---

## What Success Looks Like

After this phase:

1. **Double-complete** → returns `{ error: "Task T01 is already complete" }` instead of silently overwriting
2. **Complete slice with open tasks** → still blocked (was already caught), plus a slice-status guard is added
3. **Re-plan closed work** → returns `{ error: "Cannot re-plan: slice S01 is already complete" }`
4. **Wrong agent completes task** → returns `{ error: "Unit M01/S01/T01 is owned by executor-01, not executor-02" }`
5. **Post-mortem** → `.gsd/audit-log.jsonl` has a full trace with actor_name + trigger_reason across context resets
6. **Oops recovery** → `gsd_task_reopen` / `gsd_slice_reopen` without manual SQL surgery
7. **depends_on enforcement** → cannot plan M02 if M01 is not yet complete

---

## Decisions

1. **Ownership: opt-in** — enforced only when `.gsd/unit-claims.json` exists. Zero breaking change for existing workflows; teams adopt incrementally.

2. **Slice reopen: reset all tasks to `"pending"`** — simpler invariant. If you're reopening a slice, you're redoing the work. Partial resets create ambiguous state.

3. **`trigger_reason`: caller-provided** — agents know *why* they acted; the engine can only know *what* was called. Defaults to `undefined` if not passed.

4. **Session ID: engine-generated** — a UUID generated once at engine startup, stored in module state in `workflow-events.ts`. No reliance on agents setting env vars correctly.

5. **Idempotency: fix in this phase** — convert `insertAssessment` and `insertReplanHistory` to upserts (keyed on `milestoneId+sliceId` and `milestoneId+sliceId+ts` respectively). Accumulating duplicate records on retry is a bug, not a feature.

### Additional task from decision 5

#### S1-T10: Convert `insertAssessment` and `insertReplanHistory` to upserts

**File:** `src/resources/extensions/gsd/gsd-db.ts`

- `insertAssessment`: upsert keyed on `(milestone_id, completed_slice_id)` — one assessment per completed slice per milestone
- `insertReplanHistory`: upsert keyed on `(milestone_id, slice_id, blocker_task_id)` — one replan record per blocker per slice
.prompt-injection-scanignore (new file, 2 lines)

# False positives in GSD prompt templates — these are legitimate LLM instructions, not injection
src/resources/extensions/gsd/prompts/doctor-heal.md:You are now responsible

CHANGELOG.md (1034 lines changed)
CONTRIBUTING.md (169 lines changed)

@@ -11,6 +11,59 @@ Read [VISION.md](VISION.md) before contributing. It defines what GSD-2 is, what

3. **No issue? Create one first** for new features. Bug fixes for obvious problems can skip this step.
4. **Architectural changes require an RFC.** If your change touches core systems (auto-mode, agent-core, orchestration), open an issue describing your approach and get approval before writing code. We use Architecture Decision Records (ADRs) for significant decisions.

## Branching and commits

Always work on a dedicated branch. Never push directly to `main`.

**Branch naming:** `<type>/<short-description>`

| Type | When to use |
|------|-------------|
| `feat/` | New functionality |
| `fix/` | Bug or defect correction |
| `refactor/` | Code restructuring, no behavior change |
| `test/` | Adding or updating tests |
| `docs/` | Documentation only |
| `chore/` | Dependencies, tooling, housekeeping |
| `ci/` | CI/CD configuration |

**Commit messages** must follow [Conventional Commits](https://www.conventionalcommits.org/). The commit-msg hook enforces this locally; CI enforces it on push.

```
<type>(<scope>): <short summary>
```

Valid types: `feat` `fix` `docs` `chore` `refactor` `test` `infra` `ci` `perf` `build` `revert`

```
feat(pi-agent-core): add streaming output for long-running tasks
fix(pi-ai): resolve null pointer on empty provider response
chore(deps): bump typescript from 5.3.0 to 5.4.2
```
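
The rule the examples follow can be sketched as a validation check. This mirrors what the commit-msg hook enforces conceptually; the actual hook may be stricter (body rules, length limits), and the function name here is illustrative.

```typescript
const TYPES = ["feat", "fix", "docs", "chore", "refactor", "test", "infra", "ci", "perf", "build", "revert"];

// Accepts "<type>: <summary>" or "<type>(<scope>): <summary>" on the first line.
export function isValidCommitMessage(msg: string): boolean {
  const re = new RegExp(`^(${TYPES.join("|")})(\\([a-z0-9-]+\\))?: .+`);
  return re.test(msg.split("\n")[0]);
}
```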
|
||||
|
||||
Keep branches current by rebasing onto `main` — do not merge `main` into your feature branch:
|
||||
|
||||
```bash
|
||||
git fetch origin
|
||||
git rebase origin/main
|
||||
```
|
||||
|
||||
## Working with GSD (team workflow)
|
||||
|
||||
GSD uses worktree-based isolation for multi-developer work. If you're contributing with GSD running, enable team mode in your project preferences:
|
||||
|
||||
```yaml
|
||||
# .gsd/PREFERENCES.md
|
||||
---
|
||||
version: 1
|
||||
mode: team
|
||||
---
|
||||
```
|
||||
|
||||
This enables unique milestone IDs, branch pushing, and pre-merge checks — preventing milestone ID collisions when multiple contributors run auto-mode simultaneously. Each developer gets their own isolated worktree; squash merges to `main` happen independently.
|
||||
|
||||
For full details see [docs/working-in-teams.md](docs/working-in-teams.md) and [docs/git-strategy.md](docs/git-strategy.md).
|
||||
|
||||
## Opening a pull request
|
||||
|
||||
### PR description format
|
||||
|
|
@ -65,10 +118,12 @@ If your PR changes any public API, CLI behavior, config format, or file structur
|
|||
|
||||
AI-generated PRs are first-class citizens here. We welcome them. We just ask for transparency:

- **Disclose it.** Note that the PR is AI-assisted in your description. Do not credit the AI tool as an author or co-author in the commit or PR.
- **Test it.** AI-generated code must be tested to the same standard as human-written code. "The AI said it works" is not a test plan.
- **Understand it.** You should be able to explain what the code does and why. If a reviewer asks a question, "I'll ask the AI" is not an answer.

AI agents opening PRs must follow the same workflow as human contributors: clean working tree, new branch per task, CI passing before requesting review. Multi-phase work should start as a Draft PR and only move to Ready when complete.

AI PRs go through the same review process as any other PR. No special treatment in either direction.

## Architecture guidelines

@@ -91,9 +146,14 @@ The codebase is organized into these areas. All are open to contributions:

| AI/LLM layer | `packages/pi-ai` | Provider integrations, model handling |
| Agent core | `packages/pi-agent-core` | Agent orchestration — RFC required for changes |
| Coding agent | `packages/pi-coding-agent` | The main coding agent |
| MCP server | `packages/mcp-server` | Project state tools and MCP protocol |
| GSD extension | `src/resources/extensions/gsd/` | GSD workflow — RFC required for auto-mode |
| Other extensions | `src/resources/extensions/` | Browser, search, voice, MCP client, etc. |
| Native engine | `native/` | Rust N-API modules (grep, git, AST, etc.) |
| VS Code extension | `vscode-extension/` | Chat participant, sidebar, RPC integration |
| Web interface | `web/` | Browser-based dashboard |
| CI/Build | `.github/`, `scripts/` | Workflows, build scripts |
| Documentation | `docs/` | User guides, ADRs, SDK docs |

## Review process

@@ -103,12 +163,113 @@ PRs go through automated review first, then human review. To help us review efficiently:

- Respond to review comments. If you disagree, explain why — discussion is welcome.
- If your PR has been open for a while without review, ping in Discord. We're a small team and things slip.

### What reviewers verify

Reading a diff is not the same as verifying a change. Our review standard is execution-based, not static-analysis-based.

**What reviewers do:**

1. **Check out the branch** — check out the PR branch locally (or in a worktree). Don't review from the diff view alone.
2. **Build the branch** — run `npm run build`. A diff that doesn't compile is not reviewable.
3. **Run the test suite** — run `npm test`. CI status is a signal, not a substitute for local verification.
4. **Trace root cause for bug fixes** — confirm the diff addresses the root cause described in the issue, not just the symptom.
5. **Check for a regression test** — bug fixes must include a test that would have caught the original bug. If it's absent, the fix is incomplete.

Only after completing these steps should a reviewer make claims about correctness.

**What "looks right" means:**

"Looks right" is the starting point for review, not the conclusion. "The tests pass" only means the tests pass — not that the claimed bug is fixed or the feature works as described. A well-written commit message on a broken change is still a broken change.

### What contributors must provide to unblock review

- **Bug fixes** — include a regression test. A fix without a test is an assertion, not a proof.
- **Features** — include tests covering the primary success path and at least one failure path.
- **Behavior changes** — update or replace any existing tests that cover the changed behavior. Don't leave passing-but-wrong tests in place.

If your PR claims to fix issue #N, reviewers will verify the fix addresses the root cause described in #N — not just that CI is green.

## Testing standards

This project uses Node.js built-in `node:test` as the test runner. All new tests must follow these patterns:

### Use `node:test` and `node:assert/strict`

```typescript
import { describe, test, beforeEach, afterEach } from "node:test";
import assert from "node:assert/strict";
```

Do not use `createTestContext()` from `test-helpers.ts` (legacy, being removed). Do not introduce Jest, Vitest, or other test frameworks.

### Use `beforeEach`/`afterEach` or `t.after()` for cleanup — never `try`/`finally`

```typescript
// ✅ CORRECT — shared fixture with beforeEach/afterEach
describe("feature", () => {
  let tmp: string;
  beforeEach(() => { tmp = mkdtempSync(join(tmpdir(), "test-")); });
  afterEach(() => { rmSync(tmp, { recursive: true, force: true }); });

  test("case", () => { /* clean test body */ });
});

// ✅ CORRECT — per-test cleanup with t.after()
test("case", (t) => {
  const tmp = mkdtempSync(join(tmpdir(), "test-"));
  t.after(() => { rmSync(tmp, { recursive: true, force: true }); });
  // test body
});

// ❌ WRONG — inline try/finally
test("case", () => {
  const tmp = mkdtempSync(join(tmpdir(), "test-"));
  try {
    // test body
  } finally {
    rmSync(tmp, { recursive: true, force: true });
  }
});
```

**When to use which:**

- `beforeEach`/`afterEach` — when all tests in a `describe` block share the same setup/teardown pattern
- `t.after()` — when each test has unique cleanup (different fixtures, env vars, etc.)
- `try`/`finally` — only inside standalone helper functions that don't have access to the test context `t` (e.g., `withEnv()`, `capture()`)
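
A sketch of that last case, assuming a hypothetical shape for `withEnv()` (the real helper in this repo may differ): a standalone function has no access to the test context `t`, so `try`/`finally` is the only way to guarantee cleanup.

```typescript
// Hypothetical withEnv()-style helper. It runs outside any test context,
// so try/finally guarantees the environment is restored even if fn throws.
function withEnv<T>(vars: Record<string, string>, fn: () => T): T {
  const saved = new Map<string, string | undefined>();
  for (const [key, value] of Object.entries(vars)) {
    saved.set(key, process.env[key]);
    process.env[key] = value;
  }
  try {
    return fn();
  } finally {
    // Restore prior values; delete keys that were unset before.
    for (const [key, prev] of saved) {
      if (prev === undefined) delete process.env[key];
      else process.env[key] = prev;
    }
  }
}
```

A test then calls `withEnv({ MY_FLAG: "on" }, () => ...)` and never worries about leaking environment state into the next test.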
### Template literal fixture data

When constructing multi-line fixture content (markdown, YAML, etc.) inside indented test blocks, use array join to avoid unintended leading whitespace:

```typescript
// ✅ CORRECT — no indentation leakage
const content = [
  "## Slices",
  "- [x] **S01: First slice**",
  "- [ ] **S02: Second slice**",
].join("\n");

// ❌ WRONG — template literal inside describe/test adds leading spaces
const content = `
  ## Slices
  - [x] **S01: First slice**
`;
// Each line now has 2 leading spaces, breaking ^## regex anchors
```

### Test-first for bug fixes

Bug fixes must include a regression test that fails before the fix and passes after. Write the test first, confirm it fails, then apply the fix. See the `test-first-bugfix` skill.
## Local development

```bash
# Install dependencies
npm ci

# Install git hooks (secret scanning + commit message validation)
npm run secret-scan:install-hook

# Build
npm run build
```

@@ -119,6 +280,10 @@

```bash
npm test
npx tsc --noEmit
```

Run `npm run secret-scan:install-hook` once after cloning. It installs two hooks:

- **pre-commit** — blocks commits containing hardcoded secrets or credentials
- **commit-msg** — validates Conventional Commits format before the commit lands

CI must pass before your PR will be reviewed. Run these locally to save time.

## Security

23
Dockerfile
@@ -1,26 +1,5 @@
# ──────────────────────────────────────────────
# Stage 1: CI Builder
# Image: ghcr.io/gsd-build/gsd-ci-builder
# Used by: pipeline.yml Dev stage
# ──────────────────────────────────────────────
FROM node:24-bookworm AS builder

# Rust toolchain (stable, minimal profile)
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --default-toolchain stable --profile minimal
ENV PATH="/root/.cargo/bin:${PATH}"

# Cross-compilation for linux-arm64
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc-aarch64-linux-gnu \
    g++-aarch64-linux-gnu \
    && rustup target add aarch64-unknown-linux-gnu \
    && rm -rf /var/lib/apt/lists/*

# Verify toolchain
RUN node --version && rustc --version && cargo --version

# ──────────────────────────────────────────────
# Runtime
# Image: ghcr.io/gsd-build/gsd-pi
# Used by: end users via docker run
# ──────────────────────────────────────────────

132
README.md
@@ -7,8 +7,9 @@
[](https://www.npmjs.com/package/gsd-pi)
|
||||
[](https://www.npmjs.com/package/gsd-pi)
|
||||
[](https://github.com/gsd-build/GSD-2)
|
||||
[](https://discord.gg/gsd)
|
||||
[](https://discord.com/invite/nKXTsAcmbT)
|
||||
[](LICENSE)
|
||||
[](https://dexscreener.com/solana/dwudwjvan7bzkw9zwlbyv6kspdlvhwzrqy6ebk8xzxkv)
|
||||
|
||||
The original GSD went viral as a prompt framework for Claude Code. It worked, but it was fighting the tool — injecting prompts through slash commands, hoping the LLM would follow instructions, with no actual control over context windows, sessions, or execution.
|
||||
|
||||
|
|
@@ -18,81 +19,77 @@ One command. Walk away. Come back to a built project with clean git history.
<pre><code>npm install -g gsd-pi@latest</code></pre>

> GSD now provisions a managed [RTK](https://github.com/rtk-ai/rtk) binary on supported macOS, Linux, and Windows installs to compress shell-command output in `bash`, `async_bash`, `bg_shell`, and verification flows. GSD forces `RTK_TELEMETRY_DISABLED=1` for all managed invocations. Set `GSD_RTK_DISABLED=1` to disable the integration.

> **📋 NOTICE: New to Node on Mac?** If you installed Node.js via Homebrew, you may be running a development release instead of LTS. **[Read this guide](./docs/node-lts-macos.md)** to pin Node 24 LTS and avoid compatibility issues.

</div>

---

## What's New in v2.41.0
## What's New in v2.67

### New Features
### Context Engineering

- **Browser-based web interface** — run GSD from the browser with `gsd --web`. Full project management, real-time progress, and multi-project support via server-sent events. (#1717)
- **Doctor: worktree lifecycle checks** — `/gsd doctor` now validates worktree health, detects orphaned worktrees, consolidates cleanup, and enhances `/worktree list` with lifecycle status. (#1814)
- **CI: docs-only PR detection** — PRs that only change documentation skip build and test steps, with a new prompt injection scan for security. (#1699)
- **Custom Models guide** — new documentation for adding custom providers (Ollama, vLLM, LM Studio, proxies) via `models.json`. (#1670)
- **Tiered Context Injection (M005)** — relevance-scoped context with 65%+ token reduction. Decision scope cascade derives context from slice metadata instead of blanket injection.
- **Resilient transient error recovery** — defers to Core RetryHandler and fixes cmdCtx race conditions for more reliable auto-mode sessions.

### Data Loss Prevention (Critical Fixes)
### Provider & Model Improvements

This release includes 7 fixes preventing silent data loss in auto-mode:

- **Anthropic subscription routing** — users with Anthropic subscriptions are automatically routed through Claude Code CLI provider with proper display names across all UI surfaces.
- **Claude Code provider hardening** — native Windows claude lookup, fallback guards, and `out of extra usage` error matching.
- **XML parameter recovery** — pi-ai recovers XML parameters trapped in JSON strings from providers.
- **Hallucination guard** — execute-task agents that complete with zero tool calls are now rejected as hallucinated. Previously, agents could produce detailed but fabricated summaries without writing any code, wasting ~$25/milestone. (#1838)
- **Merge anchor verification** — before deleting a milestone worktree/branch, GSD now verifies the code is actually on the integration branch. Prevents orphaning commits when squash-merge produces an empty diff. (#1829)
- **Dirty working tree detection** — `nativeMergeSquash` now distinguishes dirty-tree rejections from content conflicts, preventing silent commit loss when synced `.gsd/` files block the merge. (#1752)
- **Doctor cleanup safety** — the `orphaned_completed_units` check no longer auto-fixes during post-task health checks. Previously, timing races could cause the doctor to remove valid completion keys, reverting users to earlier tasks. (#1825)
- **Root file reverse-sync** — worktree teardown now syncs root-level `.gsd/` files (PROJECT.md, REQUIREMENTS.md, completed-units.json) back to the project root. Previously these were lost on milestone closeout. (#1831)
- **Empty merge guard** — milestone branches with unanchored code changes are preserved instead of deleted when squash-merge produces nothing to commit. (#1755)
- **Crash-safe task closeout** — orphaned checkboxes in PLAN.md are unchecked on retry, preventing phantom task completion. (#1759)

### Safety & Data Integrity

### Auto-Mode Stability

- **LLM safety harness** — auto-mode damage control prevents the LLM from running destructive operations or querying `gsd.db` directly via bash.
- **5-wave state machine hardening** — critical data integrity fixes across atomic writes, randomized tmp paths, event log reconciliation, session recovery, and consistency enforcement. 86+ regression tests added.
- **Discussion gate enforcement** — mechanical enforcement for discussion question gates with fail-closed behavior.
- **Enhanced verification** — pre-execution plan verification checks, post-execution cross-task consistency checks, blocking behavior and strict mode.
- **Terminal hang fix** — `stopAuto()` now resolves pending promises, preventing the terminal from freezing permanently after stopping auto-mode. (#1818)
- **Signal handler coverage** — SIGHUP and SIGINT now clean up lock files, not just SIGTERM. Prevents stranded locks on VS-Code crash. (#1821)
- **Needs-discussion routing** — milestones in `needs-discussion` phase now route to the smart entry UI instead of hard-stopping, breaking the infinite loop. (#1820)
- **Infrastructure error handling** — auto-mode stops immediately on ENOSPC, ENOMEM, and similar unrecoverable errors instead of retrying. (#1780)
- **Dependency-aware dispatch** — slice dispatch now uses declared `depends_on` instead of positional ordering. (#1770)
- **Queue mode depth verification** — the write gate now processes depth verification in queue mode, fixing a deadlock where CONTEXT.md writes were permanently blocked. (#1823)

### Parallel Execution & Dispatch
### Roadmap Parser Improvements

- **Slice-level parallelism** — dependency-aware parallel dispatch within a milestone, not just across milestones.
- **Parallel research slices** — research and milestone validation run in parallel.
- **Worker model override** — configure different models for parallel milestone workers.
- **Table format support** — roadmaps using markdown tables (`| S01 | Title | Risk | Status |`) are now parsed correctly. (#1741)
- **Prose header fallback** — when `## Slices` contains H3 headers instead of checkboxes, the prose parser is invoked as a fallback. (#1744)
- **Completion marker detection** — prose headers with `✓` or `(Complete)` markers are correctly identified as done. (#1816)
- **Zero-slice stub handling** — stub roadmaps from `/gsd queue` return `pre-planning` instead of `blocked`. (#1826)
- **Immediate roadmap fix** — roadmap checkbox and UAT stub are fixed immediately after last task instead of deferring to `complete-slice`. (#1819)

### TUI & Notifications

### State & Git Improvements

- **Persistent notification panel** — TUI overlay, widget, and web API for real-time notifications.
- **Remote questions race** — local TUI races against remote channel (Slack/Discord) instead of remote-only routing.
- **OS-specific keyboard shortcuts** — shortcut hints now adapt to macOS/Linux/Windows.
- **`/gsd show-config`** — inspect active configuration at a glance.
- **CONTEXT-DRAFT.md fallback** — `depends_on` is read from CONTEXT-DRAFT.md when CONTEXT.md doesn't exist, preventing draft milestones from being promoted past dependency constraints. (#1743)
- **Unborn branch support** — `nativeBranchExists` handles repos with zero commits, preventing dispatch deadlock on new repos. (#1815)
- **Ghost milestone detection** — empty `.gsd/milestones/` directories are skipped instead of crashing `deriveState()`. (#1817)
- **Default branch detection** — milestone merge detects `master` vs `main` instead of hardcoding. (#1669)
- **Milestone title extraction** — titles are pulled from CONTEXT.md headings when no ROADMAP exists. (#1729)

### Infrastructure
### Windows & Platform

- **Ollama native provider** — `/api/chat` provider with full option exposure, `apiKey` auth mode, and headless probe.
- **MCP OAuth** — MCP client supports OAuth auth provider for HTTP transport.
- **WAL-safe migration backup** — database migrations create WAL-safe backups with stronger regression tests.
- **Xcode/xcodegen detection** — project detection now supports Xcode bundles and xcodegen.
- **170+ bug fixes** — state machine resilience, worktree safety, prompt injection, session recovery, and more.

- **Windows path handling** — 8.3 short paths, `pathToFileURL` for ESM imports, and `realpathSync.native` fixes across the test suite and verification gate. (#1804)
- **DEP0190 fix** — `spawnSync` deprecation warning eliminated by passing commands to shell explicitly. (#1827)
- **Web build skip on Windows** — Next.js webpack EPERM errors on system directories are handled gracefully.

See the full [Changelog](./CHANGELOG.md) for details on every release.

### Developer Experience

<details>
<summary>Previous highlights (v2.63 and earlier)</summary>

- **@ file finder fix** — typing `@` no longer freezes the TUI. The fix adds debounce, dedup, and empty-query short-circuit. (#1832)
- **Tool-call loop guard** — detects and breaks infinite tool-call loops within a single unit, preventing stack overflow. (#1801)
- **Completion deferral fix** — roadmap checkbox and UAT stub are fixed at task level, closing the fragile handoff window between last task and `complete-slice`. (#1819)
- **MCP server** — 6 read-only project state tools for external integrations, auto-wrapup guard, and question dedup
- **Ollama extension** — first-class local LLM support via Ollama, with dynamic routing enabled by default
- **Discord bot & daemon** — dedicated daemon package, Discord bot, and headless text mode with tool calls
- **Capability-aware model routing (ADR-004)** — capability scoring, `before_model_select` hook, and task metadata extraction
- **VS Code sidebar redesign** — SCM provider, checkpoints, diagnostics panel, activity feed, workflow controls, session forking
- **`/gsd parallel watch`** — native TUI overlay for real-time worker monitoring
- **Codebase map** — automatic codebase map injection for fresh agent contexts
- **`--resume` flag** — resume previous sessions from the CLI
- **Concurrent invocation guard** — prevents overlapping auto-mode runs
- **VS Code integration** — status bar, file decorations, bash terminal, session tree, conversation history, and code lens
- **Skills overhaul** — 30+ skill packs covering major frameworks, databases, and cloud platforms
- **Single-writer state engine** — disciplined state transitions with machine guards and TOCTOU hardening
- **DB-backed planning tools** — atomic SQLite tool calls for state transitions
- **Declarative workflow engine** — YAML workflows through auto-loop
- **Doctor: worktree lifecycle checks** — validates worktree health, detects orphans, consolidates cleanup

See the full [Changelog](./CHANGELOG.md) for all 70+ fixes in this release.

### Previous highlights (v2.39–v2.40)

- **GitHub sync extension** — auto-sync milestones to GitHub Issues, PRs, and Milestones
- **Skill tool resolution** — skills auto-activate in dispatched prompts
- **Health check phase 2** — real-time doctor issues in dashboard and visualizer
- **Forensics upgrade** — full-access GSD debugger with anomaly detection
- **Pipeline decomposition** — auto-loop rewritten as linear phase pipeline
- **Sliding-window stuck detection** — pattern-aware, fewer false positives
- **Data-loss recovery** — automatic detection and recovery from v2.30–v2.38 migration issues

</details>

---

@@ -118,7 +115,9 @@ Full documentation is available in the [`docs/`](./docs/) directory:
- **[Visualizer](./docs/visualizer.md)** — workflow visualizer with stats and discussion status
- **[Remote Questions](./docs/remote-questions.md)** — route decisions to Slack or Discord when human input is needed
- **[Dynamic Model Routing](./docs/dynamic-model-routing.md)** — complexity-based model selection and budget pressure
- **[Web Interface](./docs/web-interface.md)** — browser-based project management and real-time progress
- **[Pipeline Simplification (ADR-003)](./docs/ADR-003-pipeline-simplification.md)** — merged research into planning, mechanical completion
- **[Docker Sandbox](./docker/README.md)** — run GSD auto mode in an isolated Docker container
- **[Migration from v1](./docs/migration.md)** — `.planning` → `.gsd` migration

---

@@ -218,7 +217,7 @@ Auto mode is a state machine driven by files on disk. It reads `.gsd/STATE.md`,
2. **Context pre-loading** — The dispatch prompt includes inlined task plans, slice plans, prior task summaries, dependency summaries, roadmap excerpts, and decisions register. The LLM starts with everything it needs instead of spending tool calls reading files.

3. **Git isolation** — When `git.isolation` is set to `worktree` or `branch`, each milestone runs on its own `milestone/<MID>` branch (in a worktree or in-place). All slice work commits sequentially — no branch switching, no merge conflicts. When the milestone completes, it's squash-merged to main as one clean commit. The default is `none` (work on the current branch), configurable via preferences.

4. **Crash recovery** — A lock file tracks the current unit. If the session dies, the next `/gsd auto` reads the surviving session file, synthesizes a recovery briefing from every tool call that made it to disk, and resumes with full context. Parallel orchestrator state is persisted to disk with PID liveness detection, so multi-worker sessions survive crashes too. In headless mode, crashes trigger automatic restart with exponential backoff (default 3 attempts).

@@ -354,6 +353,8 @@ On first run, GSD launches a branded setup wizard that walks you through LLM pro
| `/gsd stop` | Stop auto mode gracefully |
| `/gsd steer` | Hard-steer plan documents during execution |
| `/gsd discuss` | Discuss architecture and decisions (works alongside auto mode) |
| `/gsd rethink` | Conversational project reorganization |
| `/gsd mcp` | MCP server status and connectivity |
| `/gsd status` | Progress dashboard |
| `/gsd queue` | Queue future milestones (safe during auto mode) |
| `/gsd prefs` | Model selection, timeouts, budget ceiling |

@@ -460,7 +461,7 @@ An auto-generated `index.html` shows all reports with progression metrics across
### Preferences

GSD preferences live in `~/.gsd/PREFERENCES.md` (global) or `.gsd/PREFERENCES.md` (project). Manage with `/gsd prefs`.

```yaml
---

@@ -501,7 +502,7 @@ auto_report: true
| `skill_rules` | Situational rules for skill routing |
| `skill_staleness_days` | Skills unused for N days get deprioritized (default: 60, 0 = disabled) |
| `unique_milestone_ids` | Uses unique milestone names to avoid clashes when working in teams |
| `git.isolation` | `none` (default), `worktree`, or `branch` — enable worktree or branch isolation for milestone work |
| `git.manage_gitignore` | Set `false` to prevent GSD from modifying `.gitignore` |
| `verification_commands`| Array of shell commands to run after task execution (e.g., `["npm run lint", "npm run test"]`) |
| `verification_auto_fix`| Auto-retry on verification failures (default: true) |

@@ -542,7 +543,7 @@ See the full [Token Optimization Guide](./docs/token-optimization.md) for details
### Bundled Tools

GSD ships with 24 extensions, all loaded automatically:

| Extension | What it provides |
| ---------------------- | ---------------------------------------------------------------------------------------------------------------------- |

@@ -564,7 +565,12 @@ GSD ships with 19 extensions, all loaded automatically:
| **Remote Questions** | Route decisions to Slack/Discord when human input is needed in headless/CI mode |
| **Universal Config** | Discover and import MCP servers and rules from other AI coding tools |
| **AWS Auth** | Automatic Bedrock credential refresh for AWS-hosted models |
| **Ollama** | First-class local LLM support via Ollama |
| **Claude Code CLI** | External provider extension for Claude Code CLI |
| **cmux** | Claude multiplexer integration — desktop notifications, sidebar metadata, visual subagent splits |
| **GitHub Sync** | Auto-sync milestones to GitHub Issues, PRs, and Milestones |
| **LSP** | Language Server Protocol — diagnostics, definitions, references, hover, rename |
| **TTSR** | Tool-triggered system rules — conditional context injection based on tool usage |

### Bundled Agents

@@ -611,7 +617,7 @@ The best practice for working in teams is to ensure unique milestone names across
### Unique Milestone Names

Create or amend your `.gsd/PREFERENCES.md` file within the repo to include `unique_milestone_ids: true` e.g.

```markdown
---

@@ -620,7 +626,7 @@ unique_milestone_ids: true
---
```

With the above `.gitignore` set up, the `.gsd/PREFERENCES.md` file is checked into the repo, ensuring all teammates use unique milestone names to avoid collisions.

Milestone names will now be generated with a 6-char random string appended, e.g. instead of `M001` you'll get something like `M001-ush8s3`.

@@ -628,7 +634,7 @@ Milestone names will now be generated with a 6 char random string appended e.g.
1. Ensure you are not in the middle of any milestones (clean state)
2. Update the `.gsd/` related entries in your `.gitignore` to follow the `Suggested .gitignore setup` section under `Working in teams` (ensure you are no longer blanket ignoring the whole `.gsd/` directory)
3. Update your `.gsd/PREFERENCES.md` file within the repo as per section `Unique Milestone Names`
4. If you want to update all your existing milestones use this prompt in GSD: `I have turned on unique milestone ids, please update all old milestone ids to use this new format e.g. M001-abc123 where abc123 is a random 6 char lowercase alpha numeric string. Update all references in all .gsd file contents, file names and directory names. Validate your work once done to ensure referential integrity.`
5. Commit to git

@@ -649,7 +655,7 @@ gsd (CLI binary)
├─ resource-loader.ts     Syncs bundled extensions + agents to ~/.gsd/agent/
└─ src/resources/
   ├─ extensions/gsd/     Core GSD extension (auto, state, commands, ...)
   ├─ extensions/...      23 supporting extensions
   ├─ agents/             scout, researcher, worker
   ├─ AGENTS.md           Agent routing instructions
   └─ GSD-WORKFLOW.md     Manual bootstrap protocol

44
docker/.env.example
Normal file
@@ -0,0 +1,44 @@
# ──────────────────────────────────────────────
# GSD Docker Sandbox — Environment Variables
# Copy this file to .env and fill in your keys.
# ──────────────────────────────────────────────

# ── Container User Identity ──
# Match your host UID/GID to avoid permission issues on bind mounts.
# Run `id -u` and `id -g` on your host to find the right values.
PUID=1000
PGID=1000

# ── LLM Provider API Keys (at least one required) ──

# Anthropic (Claude)
# ANTHROPIC_API_KEY=sk-ant-...

# OpenAI
# OPENAI_API_KEY=sk-...

# Google (Gemini)
# GOOGLE_API_KEY=...

# OpenRouter (multi-provider gateway)
# OPENROUTER_API_KEY=sk-or-...

# ── Optional: Research & Search Tools ──

# Brave Search API
# BRAVE_API_KEY=...

# Tavily Search API
# TAVILY_API_KEY=tvly-...

# Jina AI (reader/search)
# JINA_API_KEY=...

# ── Optional: Git & GitHub ──

# GitHub personal access token (for PR operations)
# GITHUB_TOKEN=ghp_...

# Git author identity inside the sandbox
# GIT_AUTHOR_NAME=Your Name
# GIT_AUTHOR_EMAIL=you@example.com
20
docker/Dockerfile.ci-builder
Normal file
@@ -0,0 +1,20 @@
# ──────────────────────────────────────────────
# CI Builder
# Image: ghcr.io/gsd-build/gsd-ci-builder
# Used by: pipeline.yml Dev stage
# ──────────────────────────────────────────────
FROM node:24-bookworm

# Rust toolchain (stable, minimal profile)
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --default-toolchain stable --profile minimal
ENV PATH="/root/.cargo/bin:${PATH}"

# Cross-compilation for linux-arm64
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc-aarch64-linux-gnu \
    g++-aarch64-linux-gnu \
    && rustup target add aarch64-unknown-linux-gnu \
    && rm -rf /var/lib/apt/lists/*

# Verify toolchain
RUN node --version && rustc --version && cargo --version
42
docker/Dockerfile.sandbox
Normal file
@@ -0,0 +1,42 @@
# ──────────────────────────────────────────────
# GSD Docker Sandbox Template
# Base: docker/sandbox-templates:shell
# Purpose: Isolated environment for GSD auto mode
# Usage: docker sandbox create --template ./docker
# ──────────────────────────────────────────────
FROM node:24-bookworm-slim

# System dependencies required by GSD
RUN apt-get update && apt-get install -y --no-install-recommends \
    git \
    curl \
    ca-certificates \
    openssh-client \
    gosu \
    && rm -rf /var/lib/apt/lists/*

# Install GSD globally — version controlled via build arg
ARG GSD_VERSION=latest
RUN npm install -g gsd-pi@${GSD_VERSION}

# Create non-root user for sandbox isolation
RUN groupadd --gid 1000 gsd \
    && useradd --uid 1000 --gid gsd --shell /bin/bash --create-home gsd

# Persistent GSD state directory
RUN mkdir -p /home/gsd/.gsd && chown -R gsd:gsd /home/gsd/.gsd

# Workspace directory — synced from host via Docker sandbox
WORKDIR /workspace
RUN chown gsd:gsd /workspace

# Entrypoint handles UID/GID remapping, bootstrap, and drops to gsd user
COPY entrypoint.sh /usr/local/bin/entrypoint.sh
COPY bootstrap.sh /usr/local/bin/bootstrap.sh
RUN chmod +x /usr/local/bin/entrypoint.sh /usr/local/bin/bootstrap.sh

# Expose default GSD web UI port
EXPOSE 3000

ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]
CMD ["gsd", "--help"]
144
docker/README.md
Normal file
@@ -0,0 +1,144 @@
# GSD Docker Sandbox

Run GSD auto mode inside an isolated Docker sandbox so it cannot touch your host filesystem, SSH keys, or other projects.

## Prerequisites

- Docker Desktop 4.58+ (macOS or Windows; Linux support is experimental)
- At least one LLM provider API key

## Docker Images

| File | Purpose |
|------|---------|
| `Dockerfile.sandbox` | Runtime sandbox with entrypoint (UID remapping, bootstrap) |
| `Dockerfile.ci-builder` | CI builds — includes build tools, no entrypoint magic |

## Compose Files

| File | Purpose |
|------|---------|
| `docker-compose.yaml` | Minimal zero-config setup — just works with sensible defaults |
| `docker-compose.full.yaml` | Fully documented reference with all options, resource limits, health checks |

Start with `docker-compose.yaml`. Copy options from `docker-compose.full.yaml` when you need them.

## Quick Start

### Option A: Docker Sandbox CLI (recommended)

Docker Sandboxes provide MicroVM isolation — each sandbox runs in a lightweight VM with its own kernel and private Docker daemon.

```bash
# Create a sandbox from the template
docker sandbox create --template ./docker --name gsd-sandbox

# Shell into the sandbox
docker sandbox exec -it gsd-sandbox bash

# Inside the sandbox, run GSD
gsd auto "implement the feature described in issue #42"
```

### Option B: Docker Compose

For environments without Docker Sandbox support, use Compose for container-level isolation:

```bash
# 1. Configure API keys
cp docker/.env.example docker/.env
# Edit docker/.env with your keys

# 2. Start the sandbox
docker compose -f docker/docker-compose.yaml up -d

# 3. Shell into the container
docker exec -it gsd-sandbox bash

# 4. Run GSD inside the container
gsd auto "implement the feature described in issue #42"
```

## UID/GID Remapping

The entrypoint handles UID/GID remapping via `PUID` and `PGID` environment variables. This avoids permission issues on bind-mounted volumes by matching the container's `gsd` user to your host UID/GID.

```bash
# Find your host UID/GID
id -u  # PUID
id -g  # PGID
```

Set these in your `.env` file or in the `environment` section of the compose file. Defaults to `1000:1000`.
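For example, to capture your host IDs once and hand them to Compose (a sketch assuming you keep `PUID`/`PGID` in `docker/.env` instead of hardcoding them under `environment:`):

```shell
# Record the host UID/GID in the env file that docker compose reads
mkdir -p docker
printf 'PUID=%s\nPGID=%s\n' "$(id -u)" "$(id -g)" >> docker/.env
```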
## Entrypoint Behavior

The container entrypoint (`entrypoint.sh`) runs four steps on every start:

1. **UID/GID remapping** — adjusts the `gsd` user to match `PUID`/`PGID`
2. **Pre-create critical files** — prevents Docker bind-mount from creating directories where files are expected
3. **Sentinel-based bootstrap** — runs `bootstrap.sh` exactly once on first boot
4. **Drop privileges** — `exec gosu gsd` for proper PID 1 signal forwarding

No hardcoded `user:` directive in compose — the entrypoint starts as root, remaps, then drops to `gsd`.

## Two-Terminal Workflow

GSD's recommended workflow uses two terminals — one for auto mode, one for interactive discussion:

```bash
# Terminal 1: auto mode
docker sandbox exec -it gsd-sandbox bash
gsd auto "your task description"

# Terminal 2: discuss / monitor
docker sandbox exec -it gsd-sandbox bash
gsd discuss
```

With Docker Compose, replace `docker sandbox exec` with `docker exec`.

## Credential Injection

### Docker Sandbox (automatic)

Docker's proxy layer forwards API keys set in your host shell config (`~/.bashrc`, `~/.zshrc`) into the sandbox automatically. Keys are never stored inside the sandbox.

### Docker Compose (manual)

Copy `docker/.env.example` to `docker/.env` and fill in your keys. The `.env` file is gitignored and never committed.

## Network Allowlisting

If you restrict outbound network access in your sandbox, GSD needs these endpoints:

| Purpose | Endpoints |
|---------|-----------|
| LLM APIs | `api.anthropic.com`, `api.openai.com`, `generativelanguage.googleapis.com`, `openrouter.ai` |
| Package registry | `registry.npmjs.org` |
| Research tools | `api.search.brave.com`, `api.tavily.com`, `r.jina.ai` |
| GitHub | `api.github.com`, `github.com` |

## Customizing the Image

Build with a specific GSD version:

```bash
docker compose -f docker/docker-compose.yaml build --build-arg GSD_VERSION=2.51.0
```

## Cleanup

```bash
# Docker Sandbox
docker sandbox rm gsd-sandbox

# Docker Compose
docker compose -f docker/docker-compose.yaml down -v
```

## Known Limitations

- **macOS/Windows only**: Docker Sandboxes require Docker Desktop 4.58+. Linux sandbox support is experimental.
- **Environment parity**: The sandbox runs Debian (bookworm). macOS-only dependencies may not work inside the sandbox.
- **Named agent registration**: Docker Desktop's built-in named agents (claude, codex, etc.) are registered by Docker itself. Third-party tools cannot register new named agents. GSD uses the generic shell sandbox type with a custom template instead.
27
docker/bootstrap.sh
Executable file
@@ -0,0 +1,27 @@
#!/bin/bash
set -e

# ──────────────────────────────────────────────
# GSD First-Boot Bootstrap
#
# Runs once on initial container creation.
# Called by entrypoint.sh as the gsd user.
#
# This script is idempotent — safe to run multiple
# times, but the sentinel in entrypoint.sh ensures
# it only runs once in practice.
# ──────────────────────────────────────────────

# ── Git Identity ────────────────────────────────────────
# Without this, git commits inside the container will fail
# or use garbage defaults.

if [ -n "${GIT_AUTHOR_NAME}" ]; then
    git config --global user.name "${GIT_AUTHOR_NAME}"
fi

if [ -n "${GIT_AUTHOR_EMAIL}" ]; then
    git config --global user.email "${GIT_AUTHOR_EMAIL}"
fi

echo "Bootstrap complete."
61
docker/docker-compose.full.yaml
Normal file
@@ -0,0 +1,61 @@
services:
  gsd:
    build:
      context: .                       # Build context is the docker/ directory
      dockerfile: Dockerfile.sandbox   # Runtime sandbox image with entrypoint
      args:
        GSD_VERSION: latest            # Pin a specific version: GSD_VERSION=2.51.0

    container_name: gsd-sandbox

    ports:
      - "3000:3000"                    # GSD web UI

    volumes:
      - ../:/workspace                 # Project root mounted into the container
      - gsd-state:/home/gsd/.gsd       # Persistent GSD state across restarts
      # - ~/.ssh:/home/gsd/.ssh:ro             # SSH keys for git operations (read-only)
      # - ~/.gitconfig:/home/gsd/.gitconfig:ro # Host git config

    env_file:
      - .env                           # API keys and secrets (see .env.example)

    environment:
      - NODE_ENV=development
      # UID/GID remapping — match your host user to avoid permission issues
      # on bind-mounted volumes. The entrypoint remaps the container's gsd
      # user to these IDs at startup. Run `id -u` / `id -g` to find yours.
      - PUID=1000
      - PGID=1000
      # Git identity inside the container (overrides .env if set here)
      # - GIT_AUTHOR_NAME=Your Name
      # - GIT_AUTHOR_EMAIL=you@example.com

    stdin_open: true                   # Keep stdin open for interactive use
    tty: true                          # Allocate a pseudo-TTY

    # Health check — verify GSD is installed and responsive
    healthcheck:
      test: ["CMD", "gsd", "--version"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s

    # Resource limits — uncomment to constrain container resources
    # deploy:
    #   resources:
    #     limits:
    #       cpus: "4.0"
    #       memory: 8G
    #     reservations:
    #       cpus: "1.0"
    #       memory: 2G

    # Network mode — uncomment ONE if you need host networking
    # network_mode: host     # Full host network access (no port mapping needed)
    # network_mode: bridge   # Default Docker bridge (already the default)

volumes:
  gsd-state:
    driver: local
23
docker/docker-compose.yaml
Normal file
@@ -0,0 +1,23 @@
services:
  gsd:
    build:
      context: .
      dockerfile: Dockerfile.sandbox
      args:
        GSD_VERSION: latest
    container_name: gsd-sandbox
    ports:
      - "3000:3000"
    volumes:
      - ../:/workspace
      - gsd-state:/home/gsd/.gsd
    env_file:
      - .env
    environment:
      - NODE_ENV=development
    stdin_open: true
    tty: true

volumes:
  gsd-state:
    driver: local
81
docker/entrypoint.sh
Executable file
@@ -0,0 +1,81 @@
#!/bin/bash
set -e

# ──────────────────────────────────────────────
# GSD Container Entrypoint
#
# Responsibilities:
# 1. UID/GID remapping — match host user via PUID/PGID
# 2. Pre-create critical files — prevent Docker bind-mount
#    from creating directories where files are expected
# 3. Sentinel-based bootstrap — one-time first-boot setup
# 4. Signal forwarding — exec into the final process
# ──────────────────────────────────────────────

GSD_USER="gsd"
GSD_HOME="/home/${GSD_USER}"
GSD_DIR="${GSD_HOME}/.gsd"

# ── 1. UID/GID Remapping ────────────────────────────────
# Accept PUID/PGID from the environment so the container
# can run with the same UID/GID as the host user, avoiding
# permission headaches on bind-mounted volumes.

PUID="${PUID:-1000}"
PGID="${PGID:-1000}"

CURRENT_UID=$(id -u "${GSD_USER}")
CURRENT_GID=$(id -g "${GSD_USER}")

REMAPPED=0

if [ "${PGID}" != "${CURRENT_GID}" ]; then
    groupmod -o -g "${PGID}" "${GSD_USER}"
    REMAPPED=1
fi

if [ "${PUID}" != "${CURRENT_UID}" ]; then
    usermod -o -u "${PUID}" "${GSD_USER}"
    REMAPPED=1
fi

# Fix ownership only when UID/GID actually changed
if [ "${REMAPPED}" -eq 1 ]; then
    chown -R "${PUID}:${PGID}" "${GSD_HOME}"
    chown "${PUID}:${PGID}" /workspace
fi

# ── 2. Pre-create Critical Files ────────────────────────
# Docker bind-mounts will create a *directory* if the target
# path doesn't exist. We need these to be files, so touch
# them before Docker gets a chance to mangle things.

mkdir -p "${GSD_DIR}"

if [ ! -f "${GSD_DIR}/settings.json" ]; then
    echo '{}' > "${GSD_DIR}/settings.json"
fi

chown "${PUID}:${PGID}" "${GSD_DIR}" "${GSD_DIR}/settings.json"

# ── 3. Sentinel-based Bootstrap ─────────────────────────
# Run first-boot setup exactly once. Subsequent container
# starts (or restarts) skip this entirely.

SENTINEL="${GSD_DIR}/.bootstrapped"

if [ ! -f "${SENTINEL}" ]; then
    if [ -x /usr/local/bin/bootstrap.sh ]; then
        # Run bootstrap as the gsd user so files get correct ownership
        gosu "${GSD_USER}" /usr/local/bin/bootstrap.sh
    fi
    touch "${SENTINEL}"
    chown "${PUID}:${PGID}" "${SENTINEL}"
fi

# ── 4. Drop Privileges & Exec ──────────────────────────
# Replace this shell process with the final command running
# as the gsd user. exec + gosu = proper PID 1 = proper
# signal forwarding (SIGTERM, SIGINT, etc.).

exec gosu "${GSD_USER}" "$@"
460
docs/ADR-004-capability-aware-model-routing.md
Normal file
@@ -0,0 +1,460 @@
# ADR-004: Capability-Aware Model Routing

**Status:** Implemented (Phase 2)
**Date:** 2026-03-26
**Revised:** 2026-04-03
**Deciders:** Jeremy McSpadden
**Related:** ADR-003 (pipeline simplification), [Issue #2655](https://github.com/gsd-build/gsd-2/issues/2655), `docs/dynamic-model-routing.md`

## Context

GSD already supports dynamic model routing in auto-mode, but the current router is fundamentally **complexity-tier- and cost-based**, not **task-capability-based**.

Today the selection pipeline is:

```
unit dispatch
  → classifyUnitComplexity(unitType, unitId, basePath, budgetPct)
      → UNIT_TYPE_TIERS default mapping
      → analyzeTaskComplexity() / analyzePlanComplexity() [metadata heuristics]
      → getAdaptiveTierAdjustment() [routing history]
      → applyBudgetPressure() [budget ceiling]
  → resolveModelForComplexity(classification, phaseConfig, routingConfig, availableModelIds)
      → downgrade-only: never upgrades beyond user's configured model
      → MODEL_CAPABILITY_TIER lookup → cheapest available in tier
      → fallback chain assembly
  → resolveModelId() → pi.setModel()
  → before_provider_request hook (payload mutation only)
```

This architecture works when all models inside a tier are effectively interchangeable. That assumption no longer holds.

Users increasingly configure heterogeneous provider pools through `models.json`, scoped provider setup, and `/scoped-models`. In practice:

- Claude-class models often perform best on greenfield implementation and architecture work
- Codex-class models often perform best on debugging, refactoring, and root-cause analysis
- Gemini-class models often perform best on long-context synthesis and research-heavy tasks
- Fast small models are often best for cheap validation, triage, and lightweight hooks

The current router cannot express those differences. If Claude and Codex are both available at the same tier, GSD either:

- treats them as equivalent and picks the cheaper one, or
- requires the user to hardcode specific phase models manually

That produces three structural problems:

### 1. Wrong optimization target

The router optimizes primarily for **task difficulty vs model cost**. The real problem is **task requirements vs model strengths**, subject to cost constraints.

### 2. Poor behavior with heterogeneous pools

Different users have different subscriptions and provider access. A fixed mapping like "research always uses Gemini" does not generalize when the user only has Claude + Codex, or only local models.

### 3. Capability knowledge is trapped in user intuition

Experienced users know which models are better at coding, debugging, research, long-context work, or instruction following. GSD has no representation for that knowledge, so it cannot route intelligently on the user's behalf.

The system already has several building blocks that make a richer router feasible:

- unit types already encode the kind of work being dispatched
- `complexity-classifier.ts` already extracts rich `TaskMetadata` (file counts, dependency counts, tags, complexity keywords, code block counts)
- `auto-dispatch.ts` and prompt builders provide stable task categories
- `ctx.modelRegistry.getAvailable()` exposes the current model pool
- `models.json` already supports user overrides and cost data per model
- budget ceilings, routing history, and retry escalation already exist
- the `model_select` hook fires on model changes and could be extended for pre-selection interception

## Decision

**Extend dynamic routing from a one-dimensional tier system to a two-dimensional system that combines complexity classification ("how hard") with capability scoring ("what kind"), while preserving downgrade-only semantics, budget controls, and user overrideability.**

### Design Principles

1. **Downgrade-only invariant is preserved.** The user's configured model for a phase is always the ceiling. Capability scoring ranks models within the eligible set — it never promotes above the user's configured model.

2. **Complexity classification remains.** The existing `classifyUnitComplexity()` pipeline (unit type defaults, task plan analysis, adaptive learning, budget pressure) continues to determine tier eligibility. Capability scoring selects among tier-eligible models.

3. **Cost is a constraint, not a score dimension.** Budget pressure constrains which models are eligible. Capability profiles describe what models are good at, not what they cost.

4. **Requirement vectors are dynamic, not static.** Task requirements are computed from `(unitType, TaskMetadata)`, not from unit type alone.

### The Revised Routing Pipeline

```
unit dispatch
  → classifyUnitComplexity(unitType, unitId, basePath, budgetPct)
      [unchanged — determines tier eligibility and budget filtering]
  → resolveModelForComplexity(classification, phaseConfig, routingConfig, availableModelIds)
      → STEP 1: filter to tier-eligible models (downgrade-only from user ceiling)
      → STEP 2: if capability routing enabled AND >1 eligible model:
          → computeTaskRequirements(unitType, taskMetadata)
          → scoreEligibleModels(eligible, taskRequirements)
          → select highest-scoring model (deterministic tie-break by cost, then ID)
      → STEP 3: assemble fallback chain
  → resolveModelId() → pi.setModel()
```

### Model Capability Profiles

Each model gains an optional capability profile:
```ts
interface ModelCapabilities {
  coding: number;       // greenfield implementation, code generation
  debugging: number;    // root-cause analysis, error diagnosis, refactoring
  research: number;     // information synthesis, investigation, exploration
  reasoning: number;    // multi-step logic, planning, architecture
  speed: number;        // response latency (inverse of thinking time)
  longContext: number;  // effective use of large input windows
  instruction: number;  // instruction following, structured output adherence
}
```

Scores are normalized `0–100` across seven dimensions. There is no `costEfficiency` dimension — cost is handled separately by budget pressure and tier economics.

Models without a capability profile are treated as having a uniform score of 50 in every dimension, which makes capability scoring a no-op for them and preserves the existing cheapest-in-tier behavior.

### Dynamic Task Requirement Vectors

Requirement vectors are computed as a function of `(unitType, TaskMetadata)`, not looked up from a static table. This preserves the nuance that `classifyUnitComplexity` already captures.

```ts
function computeTaskRequirements(
  unitType: string,
  metadata?: TaskMetadata,
): Partial<Record<keyof ModelCapabilities, number>> {
  // Base vector from unit type
  const base = BASE_REQUIREMENTS[unitType] ?? { reasoning: 0.5 };

  // Refine based on task metadata (only for execute-task)
  if (unitType === "execute-task" && metadata) {
    // Docs/config/rename tasks → boost instruction, reduce coding
    if (metadata.tags?.some(t => /^(docs?|readme|comment|config|typo|rename)$/i.test(t))) {
      return { ...base, instruction: 0.9, coding: 0.3, speed: 0.7 };
    }
    // Debugging keywords → boost debugging and reasoning
    if (metadata.complexityKeywords?.some(k => k === "concurrency" || k === "compatibility")) {
      return { ...base, debugging: 0.9, reasoning: 0.8 };
    }
    // Migration/architecture → boost reasoning and coding
    if (metadata.complexityKeywords?.some(k => k === "migration" || k === "architecture")) {
      return { ...base, reasoning: 0.9, coding: 0.8 };
    }
    // Many files or high estimated lines → boost coding
    if ((metadata.fileCount ?? 0) >= 6 || (metadata.estimatedLines ?? 0) >= 500) {
      return { ...base, coding: 0.9, reasoning: 0.7 };
    }
  }

  return base;
}
```

Base requirement vectors by unit type:

```ts
const BASE_REQUIREMENTS: Record<string, Partial<Record<keyof ModelCapabilities, number>>> = {
  "execute-task": { coding: 0.9, instruction: 0.7, speed: 0.3 },
  "research-milestone": { research: 0.9, longContext: 0.7, reasoning: 0.5 },
  "research-slice": { research: 0.9, longContext: 0.7, reasoning: 0.5 },
  "plan-milestone": { reasoning: 0.9, coding: 0.5 },
  "plan-slice": { reasoning: 0.9, coding: 0.5 },
  "replan-slice": { reasoning: 0.9, debugging: 0.6, coding: 0.5 },
  "reassess-roadmap": { reasoning: 0.9, research: 0.5 },
  "complete-slice": { instruction: 0.8, speed: 0.7 },
  "run-uat": { instruction: 0.7, speed: 0.8 },
  "discuss-milestone": { reasoning: 0.6, instruction: 0.7 },
  "complete-milestone": { instruction: 0.8, reasoning: 0.5 },
};
```

### Scoring Function

```ts
function scoreModel(
  model: ModelCapabilities,
  requirements: Partial<Record<keyof ModelCapabilities, number>>,
): number {
  let weightedSum = 0;
  let weightSum = 0;
  const entries = Object.entries(requirements) as [keyof ModelCapabilities, number][];
  for (const [dim, weight] of entries) {
    const capability = model[dim] ?? 50;
    weightedSum += weight * capability;
    weightSum += weight;
  }
  return weightSum > 0 ? weightedSum / weightSum : 50;
}
```

This produces a **weighted average** in the range `0–100`, where each dimension's contribution is proportional to its requirement weight. The output is directly comparable across models regardless of how many dimensions the requirement vector has.

**Tie-breaking:** When two models score within 2 points of each other, prefer the cheaper model (by `MODEL_COST_PER_1K_INPUT`). If cost is also equal, break ties by lexicographic model ID for determinism.
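To make the weighted average and the tie-break concrete, here is a self-contained sketch of selection over a two-model pool. The profile numbers and model IDs are illustrative, and `pickModel` is a hypothetical helper written for this example, not an existing GSD function:

```typescript
type CapabilityVector = Record<string, number>;

// Weighted-average score, mirroring scoreModel above; unknown dimensions
// fall back to the neutral score of 50.
function scoreCandidate(caps: CapabilityVector, reqs: CapabilityVector): number {
  let weightedSum = 0;
  let weightSum = 0;
  for (const [dim, weight] of Object.entries(reqs)) {
    weightedSum += weight * (caps[dim] ?? 50);
    weightSum += weight;
  }
  return weightSum > 0 ? weightedSum / weightSum : 50;
}

// Highest score wins; within 2 points, the cheaper model wins; model IDs
// are iterated in sorted order so the result is deterministic.
function pickModel(
  pool: Record<string, CapabilityVector>,
  reqs: CapabilityVector,
  costPer1k: Record<string, number>,
): string {
  const ids = Object.keys(pool).sort();
  let best = ids[0];
  for (const id of ids.slice(1)) {
    const diff = scoreCandidate(pool[id], reqs) - scoreCandidate(pool[best], reqs);
    if (diff > 2) best = id;
    else if (diff >= -2 && (costPer1k[id] ?? Infinity) < (costPer1k[best] ?? Infinity)) best = id;
  }
  return best;
}

// execute-task base vector: coding-heavy, some instruction, a little speed
const reqs = { coding: 0.9, instruction: 0.7, speed: 0.3 };
const pool = {
  "model-a": { coding: 85, instruction: 85, speed: 60 },
  "model-b": { coding: 84, instruction: 84, speed: 60 }, // within 2 points of model-a
};
const winner = pickModel(pool, reqs, { "model-a": 3.0, "model-b": 0.25 });
// winner === "model-b": the scores differ by under 2 points, so the much
// cheaper model takes the tie-break despite its marginally lower score
```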
### Configuration Model

Built-in capability profiles ship as a data table alongside `MODEL_CAPABILITY_TIER` and `MODEL_COST_PER_1K_INPUT` in `model-router.ts`:

```ts
const MODEL_CAPABILITY_PROFILES: Record<string, ModelCapabilities> = {
  "claude-opus-4-6": { coding: 95, debugging: 90, research: 85, reasoning: 95, speed: 30, longContext: 80, instruction: 90 },
  "claude-sonnet-4-6": { coding: 85, debugging: 80, research: 75, reasoning: 80, speed: 60, longContext: 75, instruction: 85 },
  "claude-haiku-4-5": { coding: 60, debugging: 50, research: 45, reasoning: 50, speed: 95, longContext: 50, instruction: 75 },
  "gpt-4o": { coding: 80, debugging: 75, research: 70, reasoning: 75, speed: 65, longContext: 70, instruction: 80 },
  "gpt-4o-mini": { coding: 55, debugging: 45, research: 40, reasoning: 45, speed: 90, longContext: 45, instruction: 70 },
  "gemini-2.5-pro": { coding: 75, debugging: 70, research: 85, reasoning: 75, speed: 55, longContext: 90, instruction: 75 },
  "gemini-2.0-flash": { coding: 50, debugging: 40, research: 50, reasoning: 40, speed: 95, longContext: 60, instruction: 65 },
  "deepseek-chat": { coding: 75, debugging: 65, research: 55, reasoning: 70, speed: 70, longContext: 55, instruction: 65 },
  "o3": { coding: 80, debugging: 85, research: 80, reasoning: 92, speed: 25, longContext: 70, instruction: 85 },
};
```

Users can override capability profiles in `models.json` per provider:

```json
{
  "providers": {
    "anthropic": {
      "modelOverrides": {
        "claude-sonnet-4-6": {
          "capabilities": {
            "debugging": 90,
            "research": 85
          }
        }
      }
    }
  }
}
```

Partial overrides are deep-merged with built-in defaults. This uses the same `modelOverrides` path that already supports `contextWindow`, `cost`, and `compat` overrides.
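The override semantics can be sketched as a per-dimension overlay on the built-in profile. This is a sketch of the described merge behavior, not the actual `models.json` loader:

```typescript
interface ModelCapabilities {
  coding: number;
  debugging: number;
  research: number;
  reasoning: number;
  speed: number;
  longContext: number;
  instruction: number;
}

// Overlay a partial user override on a built-in profile: overridden
// dimensions win, every other dimension keeps the built-in value.
function mergeCapabilities(
  builtin: ModelCapabilities,
  override: Partial<ModelCapabilities> = {},
): ModelCapabilities {
  return { ...builtin, ...override };
}

// Built-in profile for claude-sonnet-4-6 (from the table above) combined
// with the override from the models.json example
const sonnet: ModelCapabilities = {
  coding: 85, debugging: 80, research: 75, reasoning: 80,
  speed: 60, longContext: 75, instruction: 85,
};
const merged = mergeCapabilities(sonnet, { debugging: 90, research: 85 });
// merged.debugging is 90 and merged.research is 85; coding stays 85
```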
### Profile Versioning

Built-in capability profiles are maintained alongside the existing `MODEL_CAPABILITY_TIER` and `MODEL_COST_PER_1K_INPUT` tables in `model-router.ts`. When the `@gsd/pi-ai` model catalog is updated with new models, the capability profile table must be updated in the same PR. A linting rule should flag any model present in `MODEL_CAPABILITY_TIER` but missing from `MODEL_CAPABILITY_PROFILES`.

Profiles are versioned implicitly by GSD release. The existing `models.json` `modelOverrides` mechanism allows users to correct stale defaults immediately without waiting for a GSD update.

### Extension-First Rollout

Capability-aware routing should be prototypable as an extension before moving to core. The current hook surface is **insufficient** for this:

- `before_provider_request` fires after model selection, at the API payload level — too late to swap the model choice.
- `model_select` fires reactively when a model changes, not before selection — it cannot influence the choice.

**Required hook addition:** A `before_model_select` hook that fires within `selectAndApplyModel()` after tier classification but before `resolveModelForComplexity()`. This hook would receive:

```ts
interface BeforeModelSelectEvent {
  unitType: string;
  unitId: string;
  classification: ClassificationResult;
  taskMetadata: TaskMetadata;
  eligibleModels: string[]; // tier-filtered available models
  phaseConfig: ResolvedModelConfig;
}
```

Return value: `{ modelId: string } | undefined` (override the selection, or `undefined` to use the default).

This hook enables an extension to implement capability scoring externally, test it against real workloads, and validate behavior before the logic moves into `model-router.ts`.
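Under the proposed contract, the core of a Phase 2 extension could be sketched as a pure handler like the following. How the handler registers with GSD's extension runtime is assumed rather than shown, and the event shape is trimmed to the fields the sketch uses:

```typescript
// Hypothetical handler for the proposed before_model_select hook: rank the
// tier-eligible models by capability score and override the default pick.
interface BeforeModelSelectEvent {
  unitType: string;
  unitId: string;
  eligibleModels: string[]; // tier-filtered available models
}

type ScoreFn = (modelId: string, unitType: string) => number;

function beforeModelSelect(
  event: BeforeModelSelectEvent,
  score: ScoreFn,
): { modelId: string } | undefined {
  // Nothing to rank: defer to the default cheapest-in-tier behavior
  if (event.eligibleModels.length <= 1) return undefined;
  const ranked = [...event.eligibleModels].sort(
    (a, b) =>
      score(b, event.unitType) - score(a, event.unitType) ||
      a.localeCompare(b), // deterministic tie-break by model ID
  );
  return { modelId: ranked[0] };
}
```

Returning `undefined` on single-model pools keeps the extension a strict no-op whenever capability scoring has nothing to add, matching the graceful-degradation principle above.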
**Rollout sequence:**
|
||||
|
||||
1. **Phase 1:** Add `before_model_select` hook and `TaskMetadata` to `ClassificationResult`. Ship built-in capability profile data table. No core routing changes.
|
||||
2. **Phase 2:** Implement capability scoring as an extension that hooks `before_model_select`. Gather user feedback through routing history.
|
||||
3. **Phase 3:** If behavior proves stable, move scoring into `resolveModelForComplexity()` in core. Extension hook remains for custom routing strategies.
|
||||
|
||||
### Observability
|
||||
|
||||
Every routing decision must be inspectable. The existing `RoutingDecision` interface is extended:
|
||||
|
||||
```ts
|
||||
interface RoutingDecision {
|
||||
modelId: string;
|
||||
fallbacks: string[];
|
||||
tier: ComplexityTier;
|
||||
wasDowngraded: boolean;
|
||||
reason: string;
|
||||
// New fields:
|
||||
capabilityScores?: Record<string, number>; // model ID → score
|
||||
taskRequirements?: Partial<Record<string, number>>; // dimension → weight
|
||||
selectionMethod: "tier-only" | "capability-scored";
|
||||
}
|
||||
```
|
||||
|
||||
When verbose mode is on, the routing notification includes the top-scoring models and why the winner was selected:
|
||||
|
||||
```
|
||||
Dynamic routing [S]: claude-sonnet-4-6 (scored 82.3 — coding:0.9×85, debugging:0.6×80)
|
||||
runner-up: gpt-4o (scored 78.1)
|
||||
```
|
||||
|
||||
## Consequences
|
||||
|
||||
### Positive
|
||||
|
||||
#### 1. Better model-task fit
|
||||
|
||||
Routing decisions become based on the kind of work being done, not only how expensive or complex the work appears. A debugging task routes to the strongest debugger in the pool; a research task routes to the best synthesizer.
|
||||
|
||||
#### 2. Works across arbitrary model pools
|
||||
|
||||
The router no longer depends on a hardcoded vendor assumption. If a user has only Claude + Codex, it can still route intelligently between them. If the user adds Gemini or local models later, the same scoring system continues to work.
|
||||
|
||||
#### 3. Preserves all existing invariants
|
||||
|
||||
- **Downgrade-only semantics:** capability scoring never upgrades beyond the user's configured phase model.
|
||||
- **Budget pressure:** unchanged — constrains tier eligibility before scoring runs.
|
||||
- **Retry escalation:** unchanged — escalates tier, then scoring picks the best model in the new tier.
|
||||
- **Fallback chains:** assembled the same way, with capability-scored winner as primary.
|
||||
|
||||
#### 4. Creates a testable, versionable contract for routing behavior
|
||||
|
||||
Capability profiles and task vectors are explicit data structures. Routing decisions are inspectable in verbose mode. The scoring function is a pure function suitable for deterministic unit tests.

#### 5. Opens the door to adaptive learning

Existing routing history (`routing-history.ts`) can later refine capability scores per task type. When a model consistently fails at a particular task shape, its effective score for that dimension decreases. This is a natural extension of the existing `getAdaptiveTierAdjustment()` mechanism.

#### 6. Graceful degradation

Models without capability profiles get uniform scores, producing the same cheapest-in-tier behavior as today. Zero behavior change for users who don't configure heterogeneous pools.

### Negative

#### 1. More metadata to maintain

Built-in model profiles will drift as model families evolve. Mitigation: profiles live in a single data table, versioned with GSD releases, with a lint rule for completeness.

#### 2. Scoring can create false precision

A `0–100` capability scale looks exact but is still heuristic. Mitigation: document profiles as "relative rankings, not benchmarks." The 2-point tie-breaking threshold prevents insignificant score differences from overriding cost optimization.
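
The tie-breaking rule can be sketched as follows. Only the 2-point threshold comes from this ADR; the candidate shape and field names are illustrative assumptions:

```typescript
// Hypothetical candidate shape, for illustration only.
interface Candidate {
  id: string;
  score: number;       // capability score, 0–100
  costPerMTok: number; // blended cost signal
}

const TIE_THRESHOLD = 2; // score gaps under 2 points are treated as ties

// Among models within TIE_THRESHOLD of the top score, pick the cheapest.
function pickWithCostTieBreak(candidates: Candidate[]): Candidate {
  const top = Math.max(...candidates.map((c) => c.score));
  const tied = candidates.filter((c) => top - c.score <= TIE_THRESHOLD);
  return tied.reduce((best, c) => (c.costPerMTok < best.costPerMTok ? c : best));
}
```

So a model 1 point behind the leader but several times cheaper wins, while a model 10 points behind never does, regardless of cost.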

#### 3. More routing complexity

The current tier router is simple to explain and debug. Multi-dimensional scoring is more powerful but harder to reason about. Mitigation: verbose observability output shows scores and reasons. The `selectionMethod` field in routing decisions makes it clear whether capability scoring was active.

#### 4. Stronger test requirements

The router will need coverage for:

- profile loading and override merge rules (partial deep-merge from `modelOverrides`)
- `computeTaskRequirements()` with various unit types and metadata combinations
- scoring function correctness (weighted average, tie-breaking)
- interaction with tier eligibility filtering
- budget pressure applied before scoring, not conflicting with it
- fallback behavior when no scored model is eligible
- graceful degradation when no profiles exist (uniform scores)
- `before_model_select` hook contract (extension path)

#### 5. New hook surface to maintain

The `before_model_select` hook adds a new extension API contract that must be maintained across releases. Mitigation: the hook is narrowly scoped — one event type, optional return.

### Neutral / Migration

#### 1. Tier-based routing does not disappear

Complexity tiers remain as:

- the primary "how hard is this" signal that determines tier eligibility
- the fallback behavior for models without capability profiles
- the escalation path on retries (light → standard → heavy)

Capability scoring adds the "what kind of work" signal on top. The two systems are layered, not competing.

#### 2. Existing preferences continue to work

`dynamic_routing.tier_models` still works — it pins a specific model per tier, bypassing capability scoring for that tier. Per-phase model overrides (`models.planning`, `models.execution`, etc.) continue to set the ceiling. No existing configuration breaks.

#### 3. Documentation update required

`docs/dynamic-model-routing.md` must be updated to explain:

- what capability profiles are and how to override them
- how scoring interacts with tier routing
- how to read verbose routing output
- how to use `before_model_select` for custom routing extensions

## Risks

### 1. Hardcoded vendor stereotypes become stale

If the default profiles are not reviewed regularly, GSD will encode outdated assumptions about which models are "best" at which tasks.

**Mitigation:** Keep defaults in a single data table (not scattered conditionals). Lint for completeness against the model catalog. User overrides via `modelOverrides` provide an immediate escape hatch. Document profiles as heuristic rankings, not benchmarks.

### 2. Budget logic and capability logic may conflict in user perception

The highest-scoring model may not be selected because budget pressure constrained the eligible tier. This could look inconsistent if the user doesn't understand the pipeline order.

**Mitigation:** Pipeline order is explicit and enforced in code:

1. Complexity classification determines tier
2. Budget pressure may downgrade tier
3. Tier-eligible models are filtered (downgrade-only from user ceiling)
4. Capability scoring ranks the eligible set
5. Cost tie-breaks within scoring threshold

Verbose output shows each step. The user sees "budget pressure: 85%" in the reason string when a downgrade occurs.
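
A compressed sketch of that five-step pipeline, with every name and threshold hypothetical except the ordering itself:

```typescript
// Illustrative pipeline sketch — names, shapes, and the 85% cutoff are assumptions,
// not GSD's actual implementation. Only the step ordering is taken from the ADR.
type Tier = "light" | "standard" | "heavy";
const TIER_ORDER: Tier[] = ["light", "standard", "heavy"];

interface ModelInfo { id: string; tier: Tier; score: number; cost: number; }

function routeModel(models: ModelInfo[], tier: Tier, budgetPressure: number): ModelInfo {
  // 2. Budget pressure may downgrade the tier before anything is scored.
  if (budgetPressure >= 0.85 && tier !== "light") {
    tier = TIER_ORDER[TIER_ORDER.indexOf(tier) - 1];
  }
  // 3. Filter to tier-eligible models (downgrade-only from the user ceiling).
  const eligible = models.filter(
    (m) => TIER_ORDER.indexOf(m.tier) <= TIER_ORDER.indexOf(tier),
  );
  // 4–5. Rank by capability score; break near-ties (≤ 2 points) on cost.
  // Assumes `eligible` is non-empty; real code would fall back here.
  const top = Math.max(...eligible.map((m) => m.score));
  const tied = eligible.filter((m) => top - m.score <= 2);
  return tied.reduce((best, m) => (m.cost < best.cost ? m : best));
}
```

The point of the sketch is the ordering: budget acts on the tier, never on the score, so the two cost signals can't fight each other.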

### 3. Task-type classification may be too coarse initially

A unit type like `execute-task` contains many sub-shapes. The initial base vector plus metadata refinement may not distinguish all meaningful cases.

**Mitigation:** The `computeTaskRequirements()` function is designed for iterative refinement. The existing `TaskMetadata` already captures tags, complexity keywords, file counts, dependency counts, and code block counts. New metadata signals can be added to the existing `extractTaskMetadata()` without changing the scoring function. Routing history provides signal on where refinement is needed.

### 4. Unknown and custom models may score poorly by default

Users often bring custom provider IDs, local models, or vendor aliases that will not exist in the built-in profile table.

**Mitigation:** Unknown models receive uniform scores (50 across all dimensions), making capability scoring a no-op — they compete on cost within their tier, same as today. Users can add capability profiles via `modelOverrides` in `models.json` for models they know well.
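
A minimal sketch of that fallback. The dimension list, the profile table, and the override merge are all assumptions for illustration:

```typescript
// Hypothetical dimension list and profile table, for illustration only.
const DIMENSIONS = ["coding", "reasoning", "vision", "speed"] as const;
type Profile = Record<(typeof DIMENSIONS)[number], number>;

const BUILTIN_PROFILES: Record<string, Profile> = {
  // Built-in entries would live here; unknown model IDs fall through below.
};

// Unknown models get a uniform 50 on every dimension, so scoring is a no-op
// for them and cheapest-in-tier behavior is preserved. User overrides win last.
function getProfile(
  modelId: string,
  overrides: Record<string, Partial<Profile>> = {},
): Profile {
  const uniform = Object.fromEntries(DIMENSIONS.map((d) => [d, 50])) as Profile;
  return { ...uniform, ...BUILTIN_PROFILES[modelId], ...overrides[modelId] };
}
```

An override only needs to specify the dimensions the user actually knows; everything else stays neutral.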

### 5. Extension hook adds API surface

The `before_model_select` hook creates a contract that extensions may depend on.

**Mitigation:** The hook has a narrow, well-defined interface. It is additive (existing hooks unchanged). The return type is simple (`{ modelId } | undefined`). Breaking changes would be handled through the same extension API versioning as other hooks.

## Alternatives Considered

### A. Keep pure complexity-tier routing

Rejected because it optimizes cost within a tier but still treats meaningfully different models as interchangeable. The existing `MODEL_CAPABILITY_TIER` table already proves this is a recognized gap — it just stops at three buckets.

### B. Hardcode task → model mappings

Rejected because it breaks as soon as the user does not have the expected model. This is appropriate for a closed product with a fixed fleet, not for GSD's user-configured provider model.

### C. Route only by user-specified per-phase models

Rejected because it pushes all routing intelligence onto the user and does not adapt to retries, task subtype, or provider heterogeneity.

### D. Use capability-aware routing only as an extension, never in core

Not rejected as a starting point, but insufficient as the long-term architecture. Extension prototyping is the recommended first phase. However, coherent preferences, diagnostics, testing, and profile versioning will likely require core integration if the model proves valuable.

### E. Add `costEfficiency` as a capability dimension

Rejected because it conflates two concerns. If cost appears in both the scoring function and the budget constraint, the router has two competing cost signals that produce confusing behavior (e.g., a cheap model wins on `costEfficiency` score but then gets filtered out by budget pressure, or vice versa). Cost constrains eligibility; capability determines ranking.

### F. Use static requirement vectors per unit type (no metadata refinement)

Rejected because the existing `classifyUnitComplexity()` already proves that unit type alone is too coarse. An `execute-task` for docs and an `execute-task` for a migration are categorically different. The metadata signals (tags, complexity keywords, file counts) that the classifier already extracts should inform requirement vectors.

## Appendix: Current Architecture Reference

For implementors, the current routing pipeline files:

| File | Role |
|------|------|
| `auto-dispatch.ts` | Rule table that determines unit type + prompt |
| `auto-model-selection.ts` | Orchestrates model selection for each dispatch |
| `complexity-classifier.ts` | Tier classification with task metadata analysis |
| `model-router.ts` | Tier → model resolution with downgrade-only semantics |
| `routing-history.ts` | Adaptive learning from success/failure patterns |
| `preferences-models.ts` | Per-phase model config resolution and fallbacks |
| `register-hooks.ts` | Hook registration including `before_provider_request` |

The capability scoring additions would primarily touch `model-router.ts` (profiles, scoring function) and `auto-model-selection.ts` (passing metadata to the router, new hook point).

285
docs/ADR-007-model-catalog-split.md
Normal file

@ -0,0 +1,285 @@
# ADR-007: Model Catalog Split and Provider API Encapsulation

**Status:** Proposed
**Date:** 2026-04-03
**Deciders:** Jeremy McSpadden
**Related:** ADR-004 (capability-aware model routing), [ADR-005](https://github.com/gsd-build/gsd-2/issues/2790), [ADR-006](https://github.com/gsd-build/gsd-2/issues/2995), `packages/pi-ai/src/providers/`, `packages/pi-ai/src/models.ts`

## Context

The model/provider system in `pi-ai` has two structural problems worth fixing — but the system is **not fundamentally broken**. The heavy lifting (lazy SDK imports, registry-based dispatch, extension-based registration) is already well designed. This ADR targets the two areas where the current design creates real friction, without proposing unnecessary runtime changes.

### Current Architecture

```
stream.ts
 └─ import "./providers/register-builtins.js"    ← side-effect import at load time
     ├─ import anthropic.ts               (6.8 KB)
     ├─ import anthropic-vertex.ts        (3.9 KB)
     ├─ import openai-completions.ts      (26 KB)
     ├─ import openai-responses.ts        (6.4 KB)
     ├─ import openai-codex-responses.ts  (29 KB)
     ├─ import azure-openai-responses.ts  (7.8 KB)
     ├─ import google.ts                  (13.6 KB)
     ├─ import google-vertex.ts           (14.5 KB)
     ├─ import google-gemini-cli.ts       (30 KB)
     ├─ import mistral.ts                 (18.9 KB)
     └─ amazon-bedrock.ts                 (24 KB)  ← only lazy-loaded provider

models.ts
 └─ import models.generated.ts  ← 13,848 lines, ALL providers, loaded at init
 └─ import models.custom.ts     ← 197 lines, additional providers
```

### What Already Works Well

1. **SDK lazy loading.** Every provider file uses `async function getXxxClass()` with a cached dynamic `import()`. The heavy npm packages (`@anthropic-ai/sdk`, `openai`, `@google/genai`, `@aws-sdk/*`, `@mistralai/*`) are only loaded on the first API call. This is where the real startup cost would be — and it's already handled.
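
The cached-lazy-import pattern can be reduced to a tiny helper. This is a sketch of the general idiom, not the actual `pi-ai` code; `node:crypto` stands in for a heavy provider SDK:

```typescript
// Sketch of the cached lazy-import idiom: the loader runs at most once,
// and only on first use. node:crypto is a stand-in for a heavy SDK here.
function lazy<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => (cached ??= load()); // reuse the same promise on every call
}

let loads = 0; // counts how many times the underlying import actually ran
const getCrypto = lazy(() => {
  loads++;
  return import("node:crypto");
});
```

Caching the promise (rather than the resolved value) also makes concurrent first calls safe: both callers await the same in-flight import.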

2. **Registry-based dispatch.** `api-registry.ts` cleanly maps API types to stream functions. Callers use `stream(model, context)` and the registry routes to the right provider. This pattern is sound.

3. **Extension registration.** Ollama and Claude Code CLI register via `registerApiProvider()` at runtime. This extensibility point works correctly.

4. **Provider implementation code loading (~200 KB total).** While all providers load eagerly, V8 parses local `.js` files in single-digit milliseconds each. The total parse cost for all provider files is ~10-30ms — not a user-visible bottleneck in a CLI that is about to make a multi-second API call anyway.

### What's Actually Worth Fixing

#### Problem 1: Monolithic model catalog — developer experience, not runtime

`models.generated.ts` is **13,848 lines in a single file**. This creates real friction:

- **PR reviews are painful.** When the generation script runs, the diff is a wall of changes across unrelated providers. Reviewers can't tell what actually changed for a specific provider.
- **Navigation is slow.** Finding a specific model requires scrolling or searching through thousands of lines of static object literals.
- **Merge conflicts are frequent.** Any two PRs that touch model generation will conflict on the same monolithic file.
- **Git blame is useless.** Every line was "last changed" by the generation script, obscuring the history of individual provider additions.

The runtime cost of loading all model definitions is negligible — a Map of ~200 model objects is maybe 50-100 KB of heap. The problem is purely one of code organization and developer workflow.

#### Problem 2: Barrel export leaks provider internals — API design

`packages/pi-ai/src/index.ts` re-exports every provider module's internals:

```typescript
export * from "./providers/anthropic.js";
export * from "./providers/google.js";
export * from "./providers/google-gemini-cli.js";
export * from "./providers/google-vertex.js";
export * from "./providers/mistral.js";
export * from "./providers/openai-completions.js";
export * from "./providers/openai-responses.js";
// ... etc
```

This is a public API problem:

- **Consumers can bypass the registry.** Any code that does `import { streamAnthropic } from "pi-ai"` takes a direct dependency on an implementation detail that should be internal.
- **Refactoring is blocked.** Renaming a function inside a provider file is a breaking change because it's re-exported from the package root.
- **API surface is unnecessarily large.** The public API should be `stream()`, `streamSimple()`, `registerApiProvider()`, model utilities, and types. Provider-specific stream functions are implementation details.

### What Is NOT Worth Changing

**Lazy provider loading (converting `register-builtins.ts` to async on-demand loading).** This was considered and rejected because:

1. **The SDKs are already lazy.** The heavy cost is handled. Provider implementation code (~200 KB of local `.js`) parses in ~10-30ms total.
2. **Async resolution adds complexity to the hot path.** `stream.ts` currently does a synchronous `Map.get()`. Making `resolveApiProvider` async adds a microtask hop to every API call — not just the first. Small but measurable, and for no user-visible gain.
3. **High blast radius, low payoff.** Touching `stream.ts`, `api-registry.ts`, and the registration lifecycle simultaneously risks regressions in the core streaming path for an optimization that wouldn't show up in profiling.
4. **Bedrock's lazy loading is a special case, not a template.** It exists because `@aws-sdk/client-bedrock-runtime` is uniquely massive. Generalizing this pattern to providers whose SDKs are already lazy-imported doesn't compound the benefit.

## Decision

**Make two targeted improvements to code organization and API hygiene. Do not change runtime loading behavior.**

### Change 1: Split `models.generated.ts` into per-provider files

Replace the monolithic 13,848-line generated file with per-provider files:

```
packages/pi-ai/src/models/
├── index.ts              ← re-exports combined registry, same public API
├── generated/
│   ├── anthropic.ts      ← Anthropic model definitions
│   ├── openai.ts         ← OpenAI model definitions
│   ├── google.ts         ← Google model definitions
│   ├── mistral.ts        ← Mistral model definitions
│   ├── amazon-bedrock.ts ← Bedrock model definitions
│   ├── groq.ts           ← Groq model definitions
│   ├── xai.ts            ← xAI model definitions
│   ├── cerebras.ts       ← Cerebras model definitions
│   ├── openrouter.ts     ← OpenRouter model definitions
│   └── ...               ← one file per provider in the catalog
├── custom.ts             ← replaces models.custom.ts (unchanged content)
└── capability-patches.ts ← CAPABILITY_PATCHES extracted for clarity
```

**`models/index.ts` keeps the exact same synchronous public API:**

```typescript
// models/index.ts
// GSD-2 — Model registry (split by provider for maintainability)

import { ANTHROPIC_MODELS } from "./generated/anthropic.js";
import { OPENAI_MODELS } from "./generated/openai.js";
import { GOOGLE_MODELS } from "./generated/google.js";
// ... one import per provider

import { CUSTOM_MODELS } from "./custom.js";
import { CAPABILITY_PATCHES, applyCapabilityPatches } from "./capability-patches.js";
import type { Api, KnownProvider, Model, Usage } from "../types.js";

// Combine all generated models into a single registry — same as today
const MODELS = {
  ...ANTHROPIC_MODELS,
  ...OPENAI_MODELS,
  ...GOOGLE_MODELS,
  // ...
};

// Rest of the file is identical to current models.ts:
// modelRegistry Map construction, capability patch application,
// getModel(), getProviders(), getModels(), calculateCost(),
// supportsXhigh(), modelsAreEqual()
```

**Key constraint: loading stays synchronous and eager.** All model files are statically imported. The Map is built at module init exactly as today. No async, no lazy loading, no runtime behavior change. This is purely a file-organization change.

**Update `generate-models.ts`** to emit one file per provider instead of a single `models.generated.ts`. The script already groups models by provider internally — it just needs to write separate files instead of one.
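
The per-provider emission step could look roughly like this. Everything here (the `ModelDef` shape, constant naming, file paths) is a hypothetical sketch, not the real internals of `generate-models.ts`:

```typescript
// Hypothetical sketch: group a flat model list by provider and render one
// generated source file per provider. Names and shapes are illustrative.
interface ModelDef { id: string; provider: string; contextWindow: number; }

function emitPerProviderFiles(models: ModelDef[]): Map<string, string> {
  const byProvider = new Map<string, ModelDef[]>();
  for (const m of models) {
    const group = byProvider.get(m.provider) ?? [];
    group.push(m);
    byProvider.set(m.provider, group);
  }
  const files = new Map<string, string>();
  for (const [provider, defs] of byProvider) {
    // e.g. "amazon-bedrock" → AMAZON_BEDROCK_MODELS
    const constName = `${provider.toUpperCase().replace(/-/g, "_")}_MODELS`;
    const entries = defs
      .map((m) => `  "${m.id}": ${JSON.stringify(m)},`)
      .join("\n");
    files.set(
      `generated/${provider}.ts`,
      `export const ${constName} = {\n${entries}\n};\n`,
    );
  }
  return files;
}
```

Returning a filename → contents map keeps the grouping logic pure and testable; the actual script would then write each entry to disk.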

#### Why this matters

| Before | After |
|--------|-------|
| PR diffs show 13K-line file changes | PR diffs scoped to the provider that changed |
| Merge conflicts on any concurrent model update | Conflicts only when the same provider is touched |
| `git blame` shows "regenerate models" for every line | `git blame` shows per-provider history |
| Finding a model = search through 13K lines | Finding a model = open the provider file |
| One reviewer must understand all providers | Reviewers only need context for the affected provider |

### Change 2: Stop barrel-exporting provider internals

**Update `packages/pi-ai/src/index.ts`:**

```typescript
// Before (current — 17 re-exports including all providers):
export * from "./providers/anthropic.js";
export * from "./providers/azure-openai-responses.js";
export * from "./providers/google.js";
export * from "./providers/google-gemini-cli.js";
export * from "./providers/google-vertex.js";
export * from "./providers/mistral.js";
export * from "./providers/openai-completions.js";
export * from "./providers/openai-responses.js";
export * from "./providers/register-builtins.js";
// ...

// After (clean public API):
export * from "./api-registry.js";
export * from "./env-api-keys.js";
export * from "./models/index.js";
export * from "./providers/register-builtins.js"; // resetApiProviders() is public
export * from "./stream.js";
export * from "./types.js";
export * from "./utils/event-stream.js";
export * from "./utils/json-parse.js";
export type { OAuthAuthInfo, OAuthCredentials, /* ... */ } from "./utils/oauth/types.js";
export * from "./utils/overflow.js";
export * from "./utils/typebox-helpers.js";
export * from "./utils/repair-tool-json.js";
export * from "./utils/validation.js";
```

Provider-specific exports (`streamAnthropic`, `streamGoogle`, etc.) are removed from the public API. Any external consumer that imported them directly should use the registry-based `stream()` / `streamSimple()` functions instead — which is how all internal callers already work.

#### Why this matters

- **Enforces the registry pattern.** The correct way to call a provider is `stream(model, context)`. Direct provider function imports create fragile coupling.
- **Enables future refactoring.** Provider internal function signatures can change without breaking the package API. Today, renaming `streamAnthropic` would be a semver-breaking change.
- **Reduces API surface.** Consumers see only what they need: `stream`, `streamSimple`, `registerApiProvider`, model utilities, and types.

### What Does NOT Change

- **Runtime behavior** — all providers still load eagerly, same as today
- **The `Model<TApi>` type system** — all types, interfaces, and generics stay the same
- **The `ApiProvider` interface** — providers still implement `{ api, stream, streamSimple }`
- **The `api-registry.ts` registry** — synchronous `Map.get()` dispatch, unchanged
- **`stream.ts`** — no changes to the streaming entry point
- **`register-builtins.ts`** — still eagerly imports and registers all providers (only `resetApiProviders` remains in the barrel export)
- **The extension system** — `registerApiProvider()` continues to work for Ollama, Claude Code CLI, etc.
- **`models.json` user config** — custom models, overrides, provider settings are unaffected
- **Model discovery** — discovery adapters are already lazy and independent
- **Model routing** — ADR-004's capability-aware routing is orthogonal

## Consequences

### Positive

1. **Cleaner PRs.** Model catalog changes are scoped to the provider that changed. Reviewers see a 200-line diff in `models/generated/openai.ts` instead of a 13K-line diff in `models.generated.ts`.

2. **Fewer merge conflicts.** Two PRs that update different providers no longer conflict on the same file.

3. **Better navigability.** Developers can jump directly to `models/generated/anthropic.ts` to see Anthropic's model definitions instead of searching through a monolith.

4. **Cleaner package API.** `pi-ai` exports only what consumers need. Provider internals are properly encapsulated.

5. **Future-proofs refactoring.** Provider implementation details can evolve without breaking the public API contract.

6. **Zero runtime risk.** No changes to loading, registration, streaming, or dispatch. The refactor is purely structural.

### Negative

1. **More files.** Instead of 1 generated file + 1 custom file, we'll have ~15-20 generated files. A marginal complexity increase, but each file is focused and small.

2. **Generation script update.** `generate-models.ts` needs to write per-provider files. The script already groups by provider, so this is straightforward but requires testing.

3. **Import audit for the barrel export change.** Any code that directly imports `streamAnthropic` (etc.) from `pi-ai` needs to be updated. Based on research, the main consumer is `register-builtins.ts` itself, which imports providers directly (not through the barrel). External usage should be minimal.

## Alternatives Considered

### 1. Full lazy provider loading (original ADR-005 proposal)

Make all providers load on demand via async dynamic imports, generalizing the Bedrock pattern. **Rejected** because:

- SDK imports are already lazy — the heavy cost is handled
- Provider implementation parsing is ~10-30ms total — not a bottleneck
- Adds async complexity to the synchronous stream dispatch hot path
- High migration effort and regression risk for an unmeasurable performance gain

### 2. Plugin architecture with separate npm packages

Move each provider to its own package (`@gsd/provider-anthropic`, etc.). Maximum isolation, but dramatically more complex build/release/versioning. Overkill for a monorepo where all providers ship together.

### 3. Do nothing

The current architecture works, and this is a valid choice. The split is justified by the developer-experience friction (13K-line file, merge conflicts, unusable git blame) and the API hygiene issue (leaking provider internals), not by a runtime problem. If the team is not experiencing these friction points, deferring is reasonable.

## Implementation Plan

### Wave 1: Split Model Catalog (Low-Medium Risk)

1. Update `generate-models.ts` to emit per-provider files into `models/generated/`
2. Create `models/index.ts` that imports all per-provider files and builds the same registry
3. Extract `CAPABILITY_PATCHES` into `models/capability-patches.ts`
4. Move `models.custom.ts` to `models/custom.ts`
5. Update imports in `models.ts` (or replace it with the new `models/index.ts`)
6. Verify `npm run build` and `npm run test` pass
7. Delete `models.generated.ts` and `models.custom.ts`

### Wave 2: Clean Up Barrel Export (Low Risk)

1. Remove provider re-exports from `index.ts`
2. Grep for direct provider imports from `"pi-ai"` across the codebase
3. Migrate any found usages to use `stream()` / `streamSimple()` through the registry
4. Verify build and tests

### Wave 3: Validate

1. Run the full test suite
2. Verify extension registration (Ollama, Claude Code CLI) still works
3. Verify the `resetApiProviders()` test helper still works
4. Spot-check a few providers end-to-end

## References

- Current model catalog: `packages/pi-ai/src/models.generated.ts` (13,848 lines)
- Current barrel export: `packages/pi-ai/src/index.ts`
- Model registry: `packages/pi-ai/src/models.ts`
- API provider registry: `packages/pi-ai/src/api-registry.ts`
- Eager registration: `packages/pi-ai/src/providers/register-builtins.ts`
- Stream dispatch: `packages/pi-ai/src/stream.ts`
- Generation script: `packages/pi-ai/scripts/generate-models.ts`
- Extension registration: `packages/pi-coding-agent/src/core/model-registry.ts`
- ADR-004: `docs/ADR-004-capability-aware-model-routing.md`

741
docs/FRONTIER-TECHNIQUES.md
Normal file

@ -0,0 +1,741 @@
# Frontier Techniques for GSD-2

Research into cutting-edge AI agent techniques that map directly to GSD-2's architecture, ranked by impact and feasibility.

**Date:** 2026-03-25
**Status:** Research / Pre-RFC

---

## Table of Contents

- [Executive Summary](#executive-summary)
- [1. Skill Library Evolution](#1-skill-library-evolution)
- [2. DAG-Based Parallel Tool Execution](#2-dag-based-parallel-tool-execution)
- [3. Speculative Tool Execution](#3-speculative-tool-execution)
- [4. Semantic Context Compression](#4-semantic-context-compression)
- [5. Cross-Session Learning Graph](#5-cross-session-learning-graph)
- [6. MCTS-Based Planning](#6-mcts-based-planning)
- [Priority Matrix](#priority-matrix)
- [Sources & References](#sources--references)

---

## Executive Summary

GSD-2 is a multi-layered, event-driven agent platform with strong extensibility primitives: a skill system, file-based memory, session branching, compaction, and 16+ extension lifecycle hooks. These existing primitives create natural integration points for six frontier techniques that could fundamentally change how GSD operates.

The techniques fall into three categories:

| Category | Techniques | Theme |
|----------|-----------|-------|
| **Self-Improvement** | Skill Library Evolution, Cross-Session Learning Graph | GSD gets better the more you use it |
| **Performance** | DAG Tool Execution, Speculative Tool Execution | GSD gets faster per turn |
| **Intelligence** | Semantic Context Compression, MCTS Planning | GSD reasons better with the same context budget |

---

## 1. Skill Library Evolution

**Category:** Self-Improvement
**Impact:** Massive | **Effort:** Medium | **Priority:** #1

### What It Is

Inspired by [SkillRL](https://arxiv.org/abs/2602.08234) (ICLR 2026), this technique transforms GSD's skill system from static instruction files into a self-improving knowledge base. Instead of skills being written once and updated manually, they evolve based on execution outcomes.

SkillRL demonstrates that agents with learned skill libraries outperform baselines by 15.3%+ across task benchmarks, with 10-20% token compression compared to raw trajectory storage.

### How It Works

```
┌────────────────────────────────────────────────────────┐
│ EXECUTION LOOP                                         │
│                                                        │
│ 1. Skill invoked → agent executes task                 │
│ 2. Outcome captured (success/failure + trajectory)     │
│ 3. Trajectory distilled:                               │
│    ├─ Success → strategic pattern extracted            │
│    └─ Failure → anti-pattern + lesson recorded         │
│ 4. Skill file updated with versioned improvement       │
│ 5. Next invocation benefits from accumulated learnings │
│                                                        │
└────────────────────────────────────────────────────────┘
```

**Two types of learned knowledge:**

| Type | Description | Example |
|------|-------------|---------|
| **General Skills** | Universal strategic guidance applicable across tasks | "When editing TypeScript files, always check for type errors via LSP before committing" |
| **Task-Specific Skills** | Category-level heuristics for specific skill domains | "The `fix-issue` skill should check CI status before opening a PR, not after" |

### Why It Fits GSD-2

GSD already has every primitive needed:

- **Skill files** (`~/.claude/skills/`, `.claude/skills/`) — the storage layer exists
- **Extension hooks** (`turn_end`, `agent_end`) — outcome capture points exist
- **Memory system** (MEMORY.md + individual files) — persistence exists
- **`/improve-skill` and `/heal-skill` commands** — manual versions of this loop already exist

The gap is automation: connecting execution outcomes back to skill files without human intervention.
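
To make the gap concrete, here is one possible shape for the automated step: a pure function that folds a distilled lesson into a skill file's text. Everything here (the section title, version format, and `Lesson` shape) is a hypothetical sketch, not an existing GSD API:

```typescript
// Hypothetical sketch of the "skill file updater" step. The section header,
// version format, and Lesson shape are all assumptions for illustration.
interface Lesson {
  outcome: "success" | "failure";
  pattern: string; // distilled strategic pattern or anti-pattern
}

const SECTION = "## Learned patterns (auto-distilled)";

function appendLesson(skillMarkdown: string, lesson: Lesson, version: number): string {
  const prefix = lesson.outcome === "success" ? "Do" : "Avoid";
  const entry = `- v${version} (${lesson.outcome}): ${prefix}: ${lesson.pattern}`;
  // Preserve the original skill intent: only append, never rewrite.
  if (!skillMarkdown.includes(SECTION)) {
    return `${skillMarkdown.trimEnd()}\n\n${SECTION}\n${entry}\n`;
  }
  return `${skillMarkdown.trimEnd()}\n${entry}\n`;
}
```

Keeping the update append-only and confined to one section is one cheap answer to the drift question raised below: the original instructions are never touched.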

### Integration Points

| GSD Component | Role in Integration |
|---------------|-------------------|
| `agent-session.ts` → `turn_end` event | Captures execution outcome (success/failure signals) |
| Extension hook: `agent_end` | Triggers trajectory distillation |
| Skill file system | Receives versioned updates with learned patterns |
| `compaction.ts` | Provides trajectory data from the session for distillation |

### Architecture

```
User invokes skill
        │
        ▼
┌──────────────┐     ┌──────────────────┐
│ AgentSession │────▶│ Skill Executor   │
│ (turn_end)   │     │ (tracks outcome) │
└──────────────┘     └────────┬─────────┘
                              │
                    ┌─────────▼──────────┐
                    │ Outcome Classifier │
                    │ (success/failure/  │
                    │  partial)          │
                    └─────────┬──────────┘
                              │
              ┌───────────────┼───────────────┐
              ▼               ▼               ▼
      ┌────────────┐  ┌──────────────┐  ┌───────────┐
      │ Success    │  │ Failure      │  │ Partial   │
      │ Distiller  │  │ Distiller    │  │ Analyzer  │
      └─────┬──────┘  └──────┬───────┘  └─────┬─────┘
            │                │                │
            ▼                ▼                ▼
      ┌─────────────────────────────────────────────┐
      │           Skill File Updater                │
      │  • Appends learned pattern to skill         │
      │  • Versions the update                      │
      │  • Preserves original skill intent          │
      └─────────────────────────────────────────────┘
```

### Open Questions

- **Drift prevention:** How to prevent accumulated learnings from overwhelming the original skill intent?
- **Conflict resolution:** What happens when a lesson from one session contradicts another?
- **Quality gate:** Should updates require a validation pass before being written?

## 2. DAG-Based Parallel Tool Execution

**Category:** Performance
**Impact:** High | **Effort:** Medium | **Priority:** #2

### What It Is

The [LLM Compiler pattern](https://arxiv.org/pdf/2312.04511) (ICML 2024) treats multi-tool workflows like a compiler optimization pass. When the model returns multiple tool calls in a single response, instead of executing them sequentially, the system:

1. Analyzes dependencies between tool calls
2. Constructs a Directed Acyclic Graph (DAG)
3. Executes independent tools in parallel
4. Blocks only on actual data dependencies

### How It Works

**Current GSD behavior (sequential):**
```
Read(auth.ts) ─── 150ms ───▶ result
                                │
Read(types.ts) ─── 120ms ──▶ result
                                │
Grep("login") ─── 80ms ────▶ result
                                │
Read(test.ts) ─── 130ms ───▶ result
                                │
Total: ~480ms sequential
```

**With DAG execution (parallel):**
```
Read(auth.ts)  ─── 150ms ──▶ result ─┐
Read(types.ts) ─── 120ms ──▶ result ─┤
Grep("login")  ─── 80ms ───▶ result ─┤── all complete at 150ms
Read(test.ts)  ─── 130ms ──▶ result ─┘
                                     │
Total: ~150ms (max of parallel set)
```

**Dependency analysis rules:**

| Tool A | Tool B | Dependency? | Reason |
|--------|--------|-------------|--------|
| Read(file) | Read(file) | No | Reads are idempotent |
| Read(file) | Grep(pattern) | No | Independent data sources |
| Read(file) | Edit(file) | Yes | Edit depends on Read content |
| Edit(file) | Edit(file) | Yes | Edits to same file must serialize |
| Bash(cmd) | Bash(cmd) | Maybe | Depends on side effects |
| Write(file) | Read(file) | Yes | Read after write needs write to complete |

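The rules in the table above reduce to a small pairwise classifier. This is a sketch, not GSD's tool metadata: the `ToolCall` shape and the conservative "always serialize Bash with Bash" choice (resolving the table's "Maybe" pessimistically) are assumptions.

```typescript
// Sketch of the dependency-analysis table as a predicate. Returns true when
// call `b` must wait for call `a` (with `a` appearing earlier in the model's
// response). Tool names and the "pure" set are illustrative assumptions.

interface ToolCall {
  name: "Read" | "Grep" | "Glob" | "Edit" | "Write" | "Bash";
  file?: string; // target path, when the tool has one
}

// Read-only tools: safe to run concurrently with each other.
const PURE = new Set(["Read", "Grep", "Glob"]);

function dependsOn(a: ToolCall, b: ToolCall): boolean {
  // Bash side effects are opaque without executing, so serialize Bash pairs
  // (the table's "Maybe", resolved conservatively).
  if (a.name === "Bash" && b.name === "Bash") return true;
  // Two pure tools never conflict, even on the same file.
  if (PURE.has(a.name) && PURE.has(b.name)) return false;
  // Otherwise at least one side mutates: conflict if they touch the same file.
  if (a.file !== undefined && a.file === b.file) return true;
  return false;
}
```

Tool-definition annotations (the pure/impure metadata noted under Integration Points) would replace the hardcoded `PURE` set in a real implementation.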
### Why It Fits GSD-2

The model already emits multiple `tool_use` blocks in a single response. GSD processes them, but the execution path in `agent-loop.ts` handles them in sequence. The parallelism opportunity is sitting right there.

**Estimated impact:** A typical coding turn involves 3-5 tool calls. With 60% parallelizable (reads, greps, globs), per-turn latency drops by 40-60%. Over a 50-turn session, that's minutes saved.

### Integration Points

| GSD Component | Role in Integration |
|---------------|-------------------|
| `agent-loop.ts` tool execution path | Replace sequential execution with DAG scheduler |
| Tool definitions | Annotate tools with side-effect metadata (pure/impure) |
| Extension hooks (`tool_*`) | Must still fire in correct order per dependency chain |

### Architecture

```
Model response with N tool_use blocks
               │
               ▼
┌──────────────────────────────┐
│ Dependency Analyzer          │
│ • Parse tool calls           │
│ • Identify file overlaps     │
│ • Identify data dependencies │
│ • Classify: pure vs impure   │
└──────────────┬───────────────┘
               │
               ▼
┌──────────────────────────────┐
│ DAG Constructor              │
│ • Nodes = tool calls         │
│ • Edges = dependencies       │
│ • Topological sort           │
└──────────────┬───────────────┘
               │
               ▼
┌──────────────────────────────┐
│ Parallel Executor            │
│ • Execute roots immediately  │
│ • On completion, unlock      │
│   dependent nodes            │
│ • Collect all results        │
│ • Return in original order   │
└──────────────────────────────┘
```

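The constructor-plus-executor stages above can be sketched as a level-by-level scheduler: run every call whose dependencies are already satisfied, await the batch, repeat. This is an assumption-laden simplification; the real executor described above would unlock dependent nodes individually as each completes rather than batching.

```typescript
// Sketch: execute tool calls respecting a pairwise dependency predicate.
// Dependencies only point backwards (a call can depend on earlier calls in
// the model's emitted order), so a ready batch always exists. Results come
// back in the original emission order, as the architecture above requires.

async function runDag<T>(
  calls: T[],
  dependsOn: (earlier: T, later: T) => boolean,
  exec: (call: T) => Promise<unknown>,
): Promise<unknown[]> {
  const results = new Array<unknown>(calls.length);
  const done = new Set<number>();
  while (done.size < calls.length) {
    // Ready = not yet run, and every earlier call it depends on has finished.
    const ready = calls
      .map((_, i) => i)
      .filter(i =>
        !done.has(i) &&
        calls.every((other, j) => j >= i || done.has(j) || !dependsOn(other, calls[i])));
    // Run the whole ready set concurrently.
    await Promise.all(ready.map(async i => {
      results[i] = await exec(calls[i]);
      done.add(i);
    }));
  }
  return results; // original order preserved for the model's tool_result blocks
}
```

With a `dependsOn` like the classifier sketched earlier, four independent reads collapse into one `Promise.all` batch, which is exactly the ~480ms-to-~150ms win from the timing diagrams.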
### Open Questions

- **Bash side effects:** How to determine if two Bash commands conflict without executing them?
- **Extension hooks:** Should `tool_start`/`tool_end` events fire in execution order or original order?
- **Error propagation:** If a parallel tool fails, do dependent tools get cancelled or receive the error?

---

## 3. Speculative Tool Execution

**Category:** Performance
**Impact:** High | **Effort:** Low-Medium | **Priority:** #3

### What It Is

Based on [Speculative Tool Calls research](https://arxiv.org/pdf/2512.15834), this technique predicts which tools the model will request and pre-executes them before the model responds. Correct predictions eliminate the first tool-call round-trip entirely. Wrong predictions are simply discarded; the only cost is the wasted compute.

### How It Works

```
┌─────────────────────────────────────────────────────────────┐
│ User: "fix the bug in auth.ts"                              │
│                                                             │
│ BEFORE model responds:                                      │
│   Speculator predicts:                                      │
│   ├─ Read("auth.ts")           → pre-executed ✓             │
│   ├─ Grep("error|bug", "auth") → pre-executed ✓             │
│   ├─ LSP diagnostics(auth.ts)  → pre-executed ✓             │
│   └─ Read("auth.test.ts")      → pre-executed ✓             │
│                                                             │
│ Model responds with tool calls:                             │
│   ├─ Read("auth.ts")       → CACHE HIT (0ms)                │
│   ├─ Read("auth.test.ts")  → CACHE HIT (0ms)                │
│   └─ Grep("login", "src/") → cache miss (execute)           │
│                                                             │
│ Hit rate: 2/3 = 67%                                         │
│ Latency saved: ~300ms on this turn                          │
└─────────────────────────────────────────────────────────────┘
```

**Prediction strategies (simplest to most sophisticated):**

| Strategy | Description | Expected Hit Rate |
|----------|-------------|-------------------|
| **Keyword extraction** | Parse user prompt for file paths, function names → Read those files | 40-60% |
| **Session history** | Track which tools follow which user prompt patterns | 50-70% |
| **Learned patterns** | Use the skill library evolution data to predict tool sequences | 60-80% |
| **Model pre-query** | Ask a fast/cheap model to predict tool calls | 70-85% |

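The first strategy in the table is a few lines of code, which is why its effort rating is low. A minimal sketch, assuming a hypothetical `SpeculatedCall` shape and a deliberately narrow file-extension regex:

```typescript
// Keyword-extraction speculator: pull likely file paths out of the user's
// prompt and turn each into a predicted Read call. The extension list and
// call shape are illustrative assumptions, not GSD's real tool schema.

interface SpeculatedCall {
  tool: "Read";
  args: { path: string };
}

// Match bare file references like auth.ts, src/auth.test.ts, config.json.
const PATH_RE = /\b[\w./-]+\.(?:ts|tsx|js|json|md)\b/g;

function speculateFromPrompt(prompt: string): SpeculatedCall[] {
  const paths = new Set(prompt.match(PATH_RE) ?? []); // dedupe repeats
  return [...paths].map(path => ({ tool: "Read" as const, args: { path } }));
}
```

For "fix the bug in auth.ts" this predicts a single `Read("auth.ts")`, matching the first pre-execution line in the diagram; the later strategies in the table would layer session history and learned patterns on top of the same interface.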
### Why It Fits GSD-2

The #1 latency bottleneck in GSD is the round-trip: user prompt → model thinks → model requests tool → tool executes → result sent back → model thinks again. Speculative execution attacks the highest-latency step.

GSD's architecture makes this easy to add:
- `AgentSession.prompt()` already processes user input before sending to the model
- Tool results are already cached in the message array
- The extension system can intercept input and spawn pre-fetches

### Integration Points

| GSD Component | Role in Integration |
|---------------|-------------------|
| `AgentSession.prompt()` | Trigger speculation after user input, before model call |
| Tool result cache (new) | Store speculated results keyed by tool+args |
| `agent-loop.ts` tool execution | Check cache before executing; serve cached result on hit |
| Extension hook: `input` | Parse user intent for file paths, patterns |

### Architecture

```
User input arrives
        │
        ├──────────────────────────────────────┐
        │                                      │
        ▼                                      ▼
┌───────────────┐                    ┌──────────────────┐
│ Send to LLM   │                    │ Speculator       │
│ (normal path) │                    │ • Extract paths  │
│               │                    │ • Predict tools  │
│ ... waiting   │                    │ • Pre-execute    │
│ for response  │                    │ • Cache results  │
│               │                    └──────────────────┘
│               │                             │
│               │◀─── model returns ──────────│
│               │     tool_use blocks         │
└───────┬───────┘                             │
        │                                     │
        ▼                                     │
┌───────────────┐                             │
│ Tool Executor │◀──── check cache ───────────┘
│ • Cache hit?  │
│   → return    │
│ • Cache miss? │
│   → execute   │
└───────────────┘
```

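The "check cache" edge in the diagram needs a cache keyed by tool plus canonicalized arguments, with some staleness rule. A sketch under stated assumptions: staleness is a simple validity window here (real code would also check file mtimes, per the TTL open question below), and the class name is hypothetical.

```typescript
// Speculation cache: results keyed by tool name + args (keys sorted so that
// argument-object ordering does not cause spurious misses). Entries expire
// after a fixed window, a crude stand-in for mtime-based invalidation.

interface CacheEntry {
  result: unknown;
  storedAt: number; // epoch ms
}

class SpeculationCache {
  private entries = new Map<string, CacheEntry>();
  constructor(private ttlMs: number) {}

  private key(tool: string, args: Record<string, unknown>): string {
    return tool + ":" + JSON.stringify(args, Object.keys(args).sort());
  }

  put(tool: string, args: Record<string, unknown>, result: unknown): void {
    this.entries.set(this.key(tool, args), { result, storedAt: Date.now() });
  }

  // Cached result, or undefined on miss or expiry.
  get(tool: string, args: Record<string, unknown>): unknown {
    const entry = this.entries.get(this.key(tool, args));
    if (!entry || Date.now() - entry.storedAt > this.ttlMs) return undefined;
    return entry.result;
  }
}
```

The executor path then becomes: `cache.get(tool, args) ?? execute(tool, args)`, which is the hit/miss branch in the box above.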
### Cost Analysis

| Scenario | Cost |
|----------|------|
| **Correct prediction** | ~0ms latency (result already available). Compute cost: the pre-execution itself (trivial for Read/Grep). |
| **Wrong prediction** | Wasted compute for the pre-executed tool. For Read/Grep/Glob, this is <10ms of I/O. |
| **Partial hit** | Net positive as long as hit rate > 20% (given how cheap misses are). |

### Open Questions

- **TTL for cached results:** How long are speculated results valid? File contents can change between speculation and model request.
- **Side effects:** Should only pure tools (Read, Grep, Glob, LSP) be speculatable?
- **Resource limits:** Cap on number of speculative executions per turn to prevent I/O storms?

---

## 4. Semantic Context Compression

**Category:** Intelligence
**Impact:** High | **Effort:** High | **Priority:** #4

### What It Is

GSD's compaction system uses a char/4 heuristic for token estimation and all-or-nothing LLM summarization for context reduction. Research from [Zylos](https://zylos.ai/research/2026-02-28-ai-agent-context-compression-strategies) and [context engineering literature](https://rlancemartin.github.io/2025/06/23/context_engineering/) shows that embedding-based compression achieves 80-90% token reduction while preserving the ability to selectively recall specific historical context.

### Current GSD Compaction (Weaknesses Highlighted)

```
Messages: [M1, M2, M3, M4, M5, M6, M7, M8, M9, M10]
                                        ▲
Token budget exceeded                   │ recent
                                        │
Current approach:
┌─────────────────────────┬─────────────────────────┐
│ M1-M6: LLM-summarized   │ M7-M10: kept verbatim   │
│ into single blob        │ (last ~20k tokens)      │
│                         │                         │
│ ⚠ All detail lost       │ ✓ Full fidelity         │
│ ⚠ No selective recall   │                         │
│ ⚠ char/4 overestimates  │                         │
└─────────────────────────┴─────────────────────────┘
```

**Three specific weaknesses:**

| Weakness | Impact | Current Code Location |
|----------|--------|-----------------------|
| char/4 token estimation | ~25% overestimate → compacts too early → wastes context | `compaction.ts:201-259` |
| All-or-nothing summarization | Loses specific details that may be relevant later | `compaction.ts:327-400` |
| No retrieval from compacted history | Once summarized, detail is gone forever | `compaction-orchestrator.ts` |

### Proposed: Tiered Memory Architecture

```
┌─────────────────────────────────────────────────────────┐
│ HOT TIER                                                │
│ Recent turns (last ~20k tokens)                         │
│ Full text, full fidelity                                │
│ Storage: in-context messages                            │
│ Access: always in prompt                                │
├─────────────────────────────────────────────────────────┤
│ WARM TIER                                               │
│ Older turns (beyond context window)                     │
│ Stored as embeddings + compressed text                  │
│ Storage: session-local vector index                     │
│ Access: retrieved when semantically relevant to         │
│         current turn                                    │
│ Token cost: only retrieved segments count               │
├─────────────────────────────────────────────────────────┤
│ COLD TIER                                               │
│ Ancient turns / previous sessions                       │
│ Stored as summaries + metadata                          │
│ Storage: disk (existing session files)                  │
│ Access: retrieved only on explicit recall               │
│ Token cost: minimal summary headers                     │
└─────────────────────────────────────────────────────────┘
```

**How retrieval works per turn:**

```
New user prompt arrives
        │
        ▼
┌───────────────────┐
│ Embed the prompt  │  (compute embedding of user's question)
└────────┬──────────┘
         │
         ├──── query warm tier ──▶ top-K relevant historical turns
         │                        (cosine similarity > threshold)
         │
         ├──── always include ──▶ hot tier (recent turns, full text)
         │
         ▼
┌───────────────────┐
│ Compose context   │
│ = hot + retrieved │
│ + system prompt   │
└───────────────────┘
```

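The warm-tier query step is plain vector math until the index grows large. A minimal sketch, assuming embeddings arrive from some external model as plain number arrays and a hypothetical `WarmEntry` record shape:

```typescript
// Warm-tier lookup: brute-force cosine similarity over stored turn
// embeddings, returning the top-K entries above a relevance threshold.

interface WarmEntry {
  turnId: number;
  embedding: number[];
  compressedText: string;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1); // guard zero vectors
}

function retrieveWarm(
  query: number[], tier: WarmEntry[], k: number, threshold = 0.3,
): WarmEntry[] {
  return tier
    .map(e => ({ e, score: cosine(query, e.embedding) }))
    .filter(x => x.score > threshold)       // drop irrelevant history
    .sort((x, y) => y.score - x.score)      // best first
    .slice(0, k)
    .map(x => x.e);
}
```

Brute force is fine for a session-local index of a few hundred turns; the HNSW question below only matters if the tier grows well beyond that.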
### Token Estimation Improvement

Replace char/4 with adaptive estimation:

| Approach | Accuracy | Cost |
|----------|----------|------|
| **char/4 (current)** | ~75% (overestimates) | Zero |
| **Provider-reported usage** | 100% (for last turn) | Zero (already tracked) |
| **tiktoken/provider tokenizer** | ~98% | ~5ms per message |
| **Hybrid: actual for recent, char/4 for old** | ~95% | Negligible |

The hybrid approach — use actual token counts from provider responses for recent messages, fall back to char/4 for older messages — is a quick win that requires no new dependencies.

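The hybrid row is small enough to show whole. The `Message` shape here is an assumption; the real field would be whatever GSD records from provider usage responses.

```typescript
// Hybrid token estimator: trust the provider-reported count where one was
// recorded, fall back to the char/4 heuristic otherwise.

interface Message {
  text: string;
  reportedTokens?: number; // from provider usage, when available
}

function estimateTokens(messages: Message[]): number {
  return messages.reduce(
    (sum, m) => sum + (m.reportedTokens ?? Math.ceil(m.text.length / 4)),
    0,
  );
}
```

Because recent messages dominate the budget near the compaction threshold, most of the total comes from exact counts, which is where the ~95% accuracy in the table comes from.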
### Integration Points

| GSD Component | Role in Integration |
|---------------|-------------------|
| `compaction.ts` | Replace cut-point algorithm with tiered approach |
| `compaction-orchestrator.ts` | Add warm-tier retrieval before model call |
| `agent-session.ts` message building | Inject retrieved warm-tier segments |
| Session persistence layer | Store embeddings alongside session entries |

### Open Questions

- **Embedding model:** Local (fast, private) or API (better quality, adds latency)?
- **Index format:** Simple cosine similarity on flat arrays vs. HNSW index?
- **Retrieval budget:** How many tokens to allocate to warm-tier retrievals per turn?
- **Coherence:** How to prevent retrieved historical context from confusing the model about the current state?

---

## 5. Cross-Session Learning Graph

**Category:** Self-Improvement
**Impact:** Transformative | **Effort:** High | **Priority:** #5

### What It Is

GSD's memory system (MEMORY.md + individual files) stores flat, file-based memories. A learning graph extends this into a structured knowledge base that captures relationships between codebases, files, errors, solutions, and patterns across all sessions.

This is informed by research on [agent memory architectures](https://github.com/Shichun-Liu/Agent-Memory-Paper-List) and the emerging discipline of [context engineering](https://thenewstack.io/memory-for-ai-agents-a-new-paradigm-of-context-engineering/).

### Current Memory vs Learning Graph

| Aspect | Current (MEMORY.md) | Learning Graph |
|--------|---------------------|----------------|
| **Structure** | Flat file list | Nodes + edges (graph) |
| **Relationships** | None | "file X often breaks when Y changes" |
| **Retrieval** | All loaded into context | Query-driven, only relevant nodes |
| **Learning** | Manual (user says "remember X") | Automatic from execution outcomes |
| **Scope** | Per-project directory | Per-project with cross-project patterns |
| **Staleness** | Manual cleanup | Confidence decay over time |

### Graph Schema

```
┌──────────┐     touches     ┌──────────┐
│ Session  │────────────────▶│ File     │
│          │                 │          │
│ • date   │                 │ • path   │
│ • outcome│                 │ • type   │
│ • tokens │                 │ • churn  │
└────┬─────┘                 └────┬─────┘
     │                            │
     │ encountered                │ involved_in
     │                            │
     ▼                            ▼
┌──────────┐   resolved_by   ┌──────────┐
│ Error    │────────────────▶│ Solution │
│          │                 │          │
│ • type   │                 │ • pattern│
│ • message│                 │ • success│
│ • freq   │                 │   rate   │
└──────────┘                 └──────────┘
     │                            │
     │ prevented_by               │ uses
     │                            │
     ▼                            ▼
┌──────────┐                 ┌──────────┐
│ Pattern  │                 │ Tool     │
│          │                 │          │
│ • type   │                 │ • name   │
│ • desc   │                 │ • avg    │
│ • conf   │                 │   time   │
└──────────┘                 └──────────┘
```

### Example Queries

| Query | Result |
|-------|--------|
| "What errors have occurred in `auth.ts`?" | List of error nodes connected to that file node |
| "What's the typical fix for `TypeError` in this codebase?" | Solution nodes with highest success rate for that error type |
| "Which files tend to break together?" | File clusters with high co-occurrence in error sessions |
| "What tools are slowest in this project?" | Tool nodes sorted by avg execution time |

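To make the first query concrete, here is a tiny in-memory version of the graph. Node kinds and edge labels follow the schema diagram, but the `LearningGraph` class itself is an illustrative assumption, not a proposal for the storage layer (those trade-offs are in the table below).

```typescript
// Minimal node/edge store with one traversal: "what errors have occurred
// in this file?" follows involved_in edges backwards from the file node.

interface GraphNode {
  id: string;
  kind: "session" | "file" | "error" | "solution" | "pattern" | "tool";
  data: Record<string, unknown>;
}

interface GraphEdge {
  from: string;
  to: string;
  label: string; // e.g. "touches", "involved_in", "resolved_by"
}

class LearningGraph {
  private nodes = new Map<string, GraphNode>();
  private edges: GraphEdge[] = [];

  addNode(n: GraphNode): void { this.nodes.set(n.id, n); }
  addEdge(from: string, label: string, to: string): void {
    this.edges.push({ from, to, label });
  }

  // First example query: error nodes connected to a file via involved_in.
  errorsForFile(fileId: string): GraphNode[] {
    return this.edges
      .filter(e => e.label === "involved_in" && e.to === fileId)
      .map(e => this.nodes.get(e.from))
      .filter((n): n is GraphNode => n !== undefined && n.kind === "error");
  }
}
```

The other example queries are the same shape: filter edges by label, hop, aggregate on node data (success rate, avg time). That uniformity is what makes SQLite a plausible backing store.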
### Integration Points

| GSD Component | Role in Integration |
|---------------|-------------------|
| `session-manager.ts` | Write graph nodes on session save |
| `agent-session.ts` prompt building | Query graph for relevant context before model call |
| Memory system (MEMORY.md) | Coexists — graph handles structured knowledge, memory handles preferences/feedback |
| Extension hook: `agent_end` | Trigger graph update with session outcome |

### Storage Options

| Option | Pros | Cons |
|--------|------|------|
| **SQLite + json columns** | Simple, no dependencies, fast queries | No native vector search |
| **SQLite + sqlite-vss** | Adds vector similarity to SQLite | Extra native dependency |
| **Flat JSON files** | Zero dependencies, git-friendly | Slow for large graphs |
| **LanceDB** | Embedded vector DB, no server | Additional dependency |

### Open Questions

- **Privacy:** Graph contains detailed codebase interaction history — should it be encrypted at rest?
- **Portability:** Should the graph travel with the project (`.claude/` dir) or stay user-local?
- **Garbage collection:** How to prune stale nodes (e.g., files that no longer exist)?

---

## 6. MCTS-Based Planning

**Category:** Intelligence
**Impact:** Transformative | **Effort:** Very High | **Priority:** #6

### What It Is

Inspired by [ToolTree](https://www.agentic-patterns.com/patterns/skill-library-evolution/) and Monte Carlo Tree Search, this technique replaces GSD's linear action selection with a tree-based planner that explores multiple solution paths simultaneously.

Instead of the model deciding one action at a time and hoping it works, the system:

1. Generates N candidate next-actions
2. Scores each based on estimated probability of reaching the goal
3. Explores promising branches in parallel
4. Backtracks when a path fails, without wasting the user's context on dead ends

### Current vs MCTS Approach

**Current (linear):**
```
User: "fix the auth bug"
        │
        ▼
Action 1: Read auth.ts ──▶ Action 2: Edit line 45 ──▶ Action 3: Run tests
                                                            │
                                                      Tests fail ✗
                                                            │
                                                            ▼
                                              Action 4: Try different edit
                                                            │
                                                      Tests fail ✗
                                                            │
                                                            ▼
                                              Action 5: Read error log...
                                              (linear flailing)
```

**With MCTS (tree search):**
```
User: "fix the auth bug"
        │
        ▼
   Read auth.ts
        │
        ├── Branch A: Edit line 45 (score: 0.6)
        │   └── Run tests → FAIL → prune
        │
        ├── Branch B: Check auth middleware (score: 0.7) ◀── highest score
        │   └── Edit middleware.ts → Run tests → PASS ✓
        │
        └── Branch C: Check env config (score: 0.3)
            └── (not explored — lower score)

Result: Branch B succeeds after 2 actions, not 5+
```

### Why It Fits GSD-2

GSD already has session branching primitives:
- `fork()` creates a branch from any message
- Branch summaries compress history at fork points
- Tree navigation (`/tree`) lets users explore branches
- Session tree is already a first-class concept

The gap: these primitives are user-triggered. MCTS would make the agent trigger them automatically during problem-solving.

### Architecture

```
┌─────────────────────────────────────────────────────────┐
│                   MCTS Planning Layer                   │
│                                                         │
│  ┌─────────────┐    ┌──────────────┐    ┌────────────┐  │
│  │ Proposer    │───▶│ Scorer       │───▶│ Selector   │  │
│  │ Generate N  │    │ Estimate P   │    │ Pick best  │  │
│  │ candidates  │    │ of success   │    │ to explore │  │
│  └─────────────┘    └──────────────┘    └─────┬──────┘  │
│                                               │         │
│  ┌─────────────┐    ┌──────────────┐          │         │
│  │ Pruner      │◀───│ Executor     │◀─────────┘         │
│  │ Kill dead   │    │ Run action   │                    │
│  │ branches    │    │ in worktree  │                    │
│  └─────────────┘    └──────────────┘                    │
└─────────────────────────────────────────────────────────┘
                          │
                          ▼
              ┌─────────────────────┐
              │ Agent Session       │
              │ (receives winning   │
              │  branch as result)  │
              └─────────────────────┘
```

### Scoring Approaches

| Approach | Speed | Quality | Cost |
|----------|-------|---------|------|
| **Heuristic** (file relevance, error proximity) | Fast | Low | Free |
| **Fast model** (haiku-class rates candidates) | Medium | Medium | Low |
| **Self-evaluation** (main model rates its own proposals) | Slow | High | High |
| **Learned scorer** (trained on past outcomes from learning graph) | Fast | High | Free at inference |

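The Selector/Pruner pair from the architecture box reduces to a small loop once the scorer is pluggable. To be clear about scope: this sketch is greedy best-first selection, not full MCTS with rollouts and backpropagation, and the `Branch` shape is an assumption.

```typescript
// Selection sketch: prune failed branches, then pick the surviving candidate
// with the highest estimated probability of success. Any row from the
// scoring table above can be plugged in as `score`.

interface Branch {
  actions: string[]; // action sequence explored so far
  failed: boolean;   // set by the executor when e.g. tests fail
}

type Scorer = (b: Branch) => number; // estimated P(success), 0..1

function pickBranch(candidates: Branch[], score: Scorer): Branch | undefined {
  const alive = candidates.filter(b => !b.failed); // pruner
  if (alive.length === 0) return undefined;        // fall back to linear mode
  return alive.reduce((best, b) => (score(b) > score(best) ? b : best));
}
```

The `undefined` return is the budget-control escape hatch from the open questions: when every branch is pruned, the planner hands control back to ordinary linear execution.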
### Integration Points

| GSD Component | Role in Integration |
|---------------|-------------------|
| `agent-loop.ts` | New planning phase between user prompt and action execution |
| Session branching (`fork()`) | Used to create exploration branches |
| Git worktrees | Each branch explored in an isolated worktree |
| `agent-session.ts` | Receives the winning branch and presents it as the result |
| Skill Library Evolution (#1) | Provides learned patterns to improve the scorer over time |

### Cost-Benefit Analysis

| Factor | Value |
|--------|-------|
| **LLM calls per turn** | 2-5x more (proposal generation + scoring) |
| **Token usage** | 3-10x more per complex problem |
| **Success rate on hard problems** | Estimated 30-50% improvement |
| **Time to solution** | Fewer total turns despite more LLM calls per turn |
| **User experience** | Agent appears to "think harder" on hard problems |

### Open Questions

- **When to activate:** MCTS is expensive. Should it only activate when the agent detects a hard problem (repeated failures, high uncertainty)?
- **Branch isolation:** Git worktrees work for file changes, but how to isolate Bash side effects?
- **Budget control:** How many branches to explore before falling back to linear execution?
- **Transparency:** Should the user see the exploration tree or just the winning path?

---

## Priority Matrix

| # | Technique | Impact | Effort | Compounding | Dependencies |
|---|-----------|--------|--------|-------------|--------------|
| 1 | **Skill Library Evolution** | Massive | Medium | Yes — improves all other techniques | None |
| 2 | **DAG Tool Execution** | High | Medium | No — static speedup | None |
| 3 | **Speculative Tool Execution** | High | Low-Med | Yes — improves with learning | Benefits from #1 |
| 4 | **Semantic Context Compression** | High | High | No — static improvement | None |
| 5 | **Cross-Session Learning Graph** | Transformative | High | Yes — feeds #1, #3, #6 | Benefits from #1 |
| 6 | **MCTS Planning** | Transformative | Very High | Yes — improves with #1, #5 | Benefits from #1, #5 |

### Recommended Implementation Order

```
Phase 1 (Foundation)       Phase 2 (Performance)      Phase 3 (Intelligence)
─────────────────────      ─────────────────────      ─────────────────────
┌─────────────────┐        ┌─────────────────┐        ┌─────────────────┐
│ Skill Library   │        │ DAG Tool Exec   │        │ Semantic Context│
│ Evolution       │──feeds─▶│                │        │ Compression     │
│                 │        │ Speculative     │        │                 │
│                 │──feeds─▶│ Tool Exec      │        │ MCTS Planning   │
└─────────────────┘        └─────────────────┘        └─────────────────┘
                                   │                          ▲
┌─────────────────┐                │                          │
│ Cross-Session   │────────────────┴──────────────────────────┘
│ Learning Graph  │        (feeds intelligence layer)
└─────────────────┘
```

**Phase 1** creates the feedback loop that makes everything else better over time.
**Phase 2** delivers immediate, measurable performance wins.
**Phase 3** requires the most architectural change but delivers the deepest capability gains.

---

## Sources & References

### Papers

- [SkillRL: Evolving Agents via Recursive Skill-Augmented RL](https://arxiv.org/abs/2602.08234) — ICLR 2026. Skill library evolution framework.
- [LLMCompiler: An LLM Compiler for Parallel Function Calling](https://arxiv.org/pdf/2312.04511) — ICML 2024. DAG-based tool execution.
- [Optimizing Agentic LLM Inference via Speculative Tool Calls](https://arxiv.org/pdf/2512.15834) — Speculative execution for agent tools.
- [RISE: Recursive Introspection for Self-Improvement](https://proceedings.neurips.cc/paper_files/paper/2024/file/639d992f819c2b40387d4d5170b8ffd7-Paper-Conference.pdf) — NeurIPS 2024. Self-improving LLM agents.
- [Don't Break the Cache: Prompt Caching for Agentic Tasks](https://arxiv.org/html/2601.06007v1) — Prompt caching evaluation.
- [Efficient LLM Serving for Agentic Workflows](https://arxiv.org/html/2603.16104v1) — Systems perspective on agent serving.

### Industry & Analysis

- [Context Engineering for Agents](https://rlancemartin.github.io/2025/06/23/context_engineering/) — Lance Martin's comprehensive guide.
- [AI Agent Context Compression Strategies](https://zylos.ai/research/2026-02-28-ai-agent-context-compression-strategies) — Zylos Research, Feb 2026.
- [Context Engineering for Coding Agents](https://martinfowler.com/articles/exploring-gen-ai/context-engineering-coding-agents.html) — Martin Fowler.
- [Memory for AI Agents: A New Paradigm](https://thenewstack.io/memory-for-ai-agents-a-new-paradigm-of-context-engineering/) — The New Stack.
- [LLM Compiler Agent Pattern](https://agent-patterns.readthedocs.io/en/stable/patterns/llm-compiler.html) — Agent Patterns documentation.
- [Skill Library Evolution Pattern](https://www.agentic-patterns.com/patterns/skill-library-evolution/) — Awesome Agentic Patterns.

### Workshops & Events

- [ICLR 2026 Workshop on AI with Recursive Self-Improvement](https://iclr.cc/virtual/2026/workshop/10000796)
- [Agent Memory Paper List](https://github.com/Shichun-Liu/Agent-Memory-Paper-List) — Comprehensive survey.
- [Awesome Context Engineering](https://github.com/Meirtz/Awesome-Context-Engineering) — Papers, frameworks, guides.

@@ -11,7 +11,8 @@ Welcome to the GSD documentation. This covers everything from getting started to
| [Commands Reference](./commands.md) | All commands, keyboard shortcuts, and CLI flags |
| [Remote Questions](./remote-questions.md) | Discord and Slack integration for headless auto-mode |
| [Configuration](./configuration.md) | Preferences, model selection, git settings, and token profiles |
| [Provider Setup](./providers.md) | Step-by-step setup for OpenRouter, Ollama, LM Studio, vLLM, and all supported providers |
| [Custom Models](./custom-models.md) | Advanced model configuration — models.json schema, compat flags, overrides |
| [Token Optimization](./token-optimization.md) | Token profiles, context compression, complexity routing, and adaptive learning (v2.17) |
| [Dynamic Model Routing](./dynamic-model-routing.md) | Complexity-based model selection, cost tables, escalation, and budget pressure (v2.19) |
| [Captures & Triage](./captures-triage.md) | Fire-and-forget thought capture during auto-mode with automated triage (v2.19) |

@@ -23,7 +24,7 @@ Welcome to the GSD documentation. This covers everything from getting started to
| [Skills](./skills.md) | Bundled skills, skill discovery, and custom skill authoring |
| [Migration from v1](./migration.md) | Migrating `.planning` directories from the original GSD |
| [Troubleshooting](./troubleshooting.md) | Common issues, `/gsd doctor` (real-time visibility v2.40), `/gsd forensics` (full debugger v2.40), and recovery procedures |
| [Web Interface](./web-interface.md) | Browser-based project management with `gsd --web` (v2.41) |
| [VS Code Extension](../vscode-extension/README.md) | Chat participant, sidebar dashboard, and RPC integration for VS Code |

## Architecture & Internals
@@ -34,6 +35,9 @@ Welcome to the GSD documentation. This covers everything from getting started to
| [Native Engine](../native/README.md) | Rust N-API modules for performance-critical operations |
| [ADR-001: Branchless Worktree Architecture](./ADR-001-branchless-worktree-architecture.md) | Decision record for the v2.14 git architecture |
| [ADR-003: Pipeline Simplification](./ADR-003-pipeline-simplification.md) | Research merged into planning, mechanical completion (v2.30) |
| [ADR-004: Capability-Aware Model Routing](./ADR-004-capability-aware-model-routing.md) | Extend routing from tier/cost selection to task-capability matching |
| [ADR-007: Model Catalog Split](./ADR-007-model-catalog-split.md) | Separate model metadata from routing logic for extensibility |
| [Context Optimization Opportunities](./pi-context-optimization-opportunities.md) | Analysis of context window usage and optimization strategies |

## Pi SDK Documentation

@@ -14,7 +14,7 @@ gsd (CLI binary)
├─ resource-loader.ts        Syncs bundled extensions + agents to ~/.gsd/agent/
└─ src/resources/
   ├─ extensions/gsd/        Core GSD extension
   ├─ extensions/...         23 supporting extensions
   ├─ agents/                scout, researcher, worker
   ├─ AGENTS.md              Agent routing instructions
   └─ GSD-WORKFLOW.md        Manual bootstrap protocol

@@ -73,6 +73,12 @@ Every dispatch creates a new agent session. The LLM starts with a clean context
| **Remote Questions** | Discord, Slack, and Telegram integration for headless question routing |
| **TTSR** | Tool-triggered system rules — conditional context injection based on tool usage |
| **Universal Config** | Discovery of existing AI tool configurations (Claude Code, Cursor, Windsurf, etc.) |
| **AWS Auth** | AWS credential management and authentication |
| **Claude Code CLI** | Claude Code CLI integration |
| **cmux** | Context multiplexing for multi-session coordination |
| **GitHub Sync** | GitHub issue and PR synchronization |
| **Ollama** | Local Ollama model integration |
| **Shared** | Shared utilities across extensions |

## Bundled Agents

@@ -122,7 +128,7 @@ The auto mode dispatch pipeline:

Phase skipping (from token profile) gates steps 2-3: if a phase is skipped, the corresponding unit type is never dispatched.

## Key Modules (v2.67)

| Module | Purpose |
|--------|---------|

@@ -160,3 +166,11 @@ Phase skipping (from token profile) gates steps 2-3: if a phase is skipped, the
| `memory-extractor.ts` | Extract reusable knowledge from session transcripts |
| `memory-store.ts` | Persistent memory store for cross-session knowledge |
| `queue-order.ts` | Milestone queue ordering |
| `context-masker.ts` | Context masking for model routing optimization |
| `phase-anchor.ts` | Phase anchoring for dispatch pipeline |
| `slice-parallel-orchestrator.ts` | Slice-level parallelism with dependency-aware dispatch |
| `slice-parallel-eligibility.ts` | Slice parallel eligibility checks |
| `slice-parallel-conflict.ts` | Slice parallel conflict detection |
| `preferences-models.ts` | Model preferences configuration |
| `preferences-validation.ts` | Preferences validation |
| `preferences-types.ts` | Preferences type definitions |
@@ -9,12 +9,16 @@
| `/gsd auto` | Autonomous mode — research, plan, execute, commit, repeat |
| `/gsd quick` | Execute a quick task with GSD guarantees (atomic commits, state tracking) without full planning overhead |
| `/gsd stop` | Stop auto mode gracefully |
| `/gsd pause` | Pause auto-mode (preserves state, `/gsd auto` to resume) |
| `/gsd steer` | Hard-steer plan documents during execution |
| `/gsd discuss` | Discuss architecture and decisions (works alongside auto mode) |
| `/gsd status` | Progress dashboard |
| `/gsd widget` | Cycle dashboard widget: full / small / min / off |
| `/gsd queue` | Queue and reorder future milestones (safe during auto mode) |
| `/gsd capture` | Fire-and-forget thought capture (works during auto mode) |
| `/gsd triage` | Manually trigger triage of pending captures |
| `/gsd dispatch` | Dispatch a specific phase directly (research, plan, execute, complete, reassess, uat, replan) |
| `/gsd history` | View execution history (supports `--cost`, `--phase`, `--model` filters) |
| `/gsd forensics` | Full-access GSD debugger — structured anomaly detection, unit traces, and LLM-guided root-cause analysis for auto-mode failures |
| `/gsd cleanup` | Clean up GSD state files and stale worktrees |
| `/gsd visualize` | Open workflow visualizer (progress, deps, metrics, timeline) |
@@ -22,6 +26,11 @@
| `/gsd export --html --all` | Generate retrospective reports for all milestones at once |
| `/gsd update` | Update GSD to the latest version in-session |
| `/gsd knowledge` | Add persistent project knowledge (rule, pattern, or lesson) |
| `/gsd fast` | Toggle service tier for supported models (prioritized API routing) |
| `/gsd rate` | Rate last unit's model tier (over/ok/under) — improves adaptive routing |
| `/gsd changelog` | Show categorized release notes |
| `/gsd logs` | Browse activity logs, debug logs, and metrics |
| `/gsd remote` | Control remote auto-mode |
| `/gsd help` | Categorized command reference with descriptions for all GSD subcommands |

## Configuration & Diagnostics
@@ -33,6 +42,9 @@
| `/gsd config` | Re-run the provider setup wizard (LLM provider + tool keys) |
| `/gsd keys` | API key manager — list, add, remove, test, rotate, doctor |
| `/gsd doctor` | Runtime health checks with auto-fix — issues surface in real time across widget, visualizer, and HTML reports (v2.40) |
| `/gsd inspect` | Show SQLite DB diagnostics |
| `/gsd init` | Project init wizard — detect, configure, bootstrap `.gsd/` |
| `/gsd setup` | Global setup status and configuration |
| `/gsd skill-health` | Skill lifecycle dashboard — usage stats, success rates, token trends, staleness warnings |
| `/gsd skill-health <name>` | Detailed view for a single skill |
| `/gsd skill-health --declining` | Show only skills flagged for declining performance |
@@ -48,8 +60,10 @@
| `/gsd new-milestone` | Create a new milestone |
| `/gsd skip` | Prevent a unit from auto-mode dispatch |
| `/gsd undo` | Revert last completed unit |
-| Park milestone | Available via `/gsd` wizard → "Milestone actions" → "Park" |
-| Unpark milestone | Available via `/gsd` wizard → "Milestone actions" → "Unpark" |
+| `/gsd undo-task` | Reset a specific task's completion state (DB + markdown) |
+| `/gsd reset-slice` | Reset a slice and all its tasks (DB + markdown) |
+| `/gsd park` | Park a milestone — skip without deleting |
+| `/gsd unpark` | Reactivate a parked milestone |
| Discard milestone | Available via `/gsd` wizard → "Milestone actions" → "Discard" |

## Parallel Orchestration
@@ -65,6 +79,46 @@

See [Parallel Orchestration](./parallel-orchestration.md) for full documentation.

## Workflow Templates (v2.42)

| Command | Description |
|---------|-------------|
| `/gsd start` | Start a workflow template (bugfix, spike, feature, hotfix, refactor, security-audit, dep-upgrade, full-project) |
| `/gsd start resume` | Resume an in-progress workflow |
| `/gsd templates` | List available workflow templates |
| `/gsd templates info <name>` | Show detailed template info |

## Custom Workflows (v2.42)

| Command | Description |
|---------|-------------|
| `/gsd workflow new` | Create a new workflow definition (via skill) |
| `/gsd workflow run <name>` | Create a run and start auto-mode |
| `/gsd workflow list` | List workflow runs |
| `/gsd workflow validate <name>` | Validate a workflow definition YAML |
| `/gsd workflow pause` | Pause custom workflow auto-mode |
| `/gsd workflow resume` | Resume paused custom workflow auto-mode |

## Extensions

| Command | Description |
|---------|-------------|
| `/gsd extensions list` | List all extensions and their status |
| `/gsd extensions enable <id>` | Enable a disabled extension |
| `/gsd extensions disable <id>` | Disable an extension |
| `/gsd extensions info <id>` | Show extension details |

## cmux Integration

| Command | Description |
|---------|-------------|
| `/gsd cmux status` | Show cmux detection, prefs, and capabilities |
| `/gsd cmux on` | Enable cmux integration |
| `/gsd cmux off` | Disable cmux integration |
| `/gsd cmux notifications on/off` | Toggle cmux desktop notifications |
| `/gsd cmux sidebar on/off` | Toggle cmux sidebar metadata |
| `/gsd cmux splits on/off` | Toggle cmux visual subagent splits |

## GitHub Sync (v2.39)

| Command | Description |
@@ -116,6 +170,14 @@ Enable with `github.enabled: true` in preferences. Requires `gh` CLI installed a
| `gsd --print "msg"` (`-p`) | Single-shot prompt mode (no TUI) |
| `gsd --mode <text\|json\|rpc\|mcp>` | Output mode for non-interactive use |
| `gsd --list-models [search]` | List available models and exit |
| `gsd --web [path]` | Start browser-based web interface (optional project path) |
| `gsd --worktree [name]` (`-w`) | Start session in a git worktree (auto-generates name if omitted) |
| `gsd --no-session` | Disable session persistence |
| `gsd --extension <path>` | Load an additional extension (can be repeated) |
| `gsd --append-system-prompt <text>` | Append text to the system prompt |
| `gsd --tools <list>` | Comma-separated list of tools to enable |
| `gsd --version` (`-v`) | Print version and exit |
| `gsd --help` (`-h`) | Print help and exit |
| `gsd sessions` | Interactive session picker — list all saved sessions for the current directory and choose one to resume |
| `gsd --debug` | Enable structured JSONL diagnostic logging for troubleshooting dispatch and state issues |
| `gsd config` | Set up global API keys for search and docs tools (saved to `~/.gsd/agent/auth.json`, applies to all projects). See [Global API Keys](./configuration.md#global-api-keys-gsd-config). |
@@ -1,14 +1,14 @@
# Configuration

-GSD preferences live in `~/.gsd/preferences.md` (global) or `.gsd/preferences.md` (project-local). Manage interactively with `/gsd prefs`.
+GSD preferences live in `~/.gsd/PREFERENCES.md` (global) or `.gsd/PREFERENCES.md` (project-local). Manage interactively with `/gsd prefs`.

## `/gsd prefs` Commands

| Command | Description |
|---------|-------------|
| `/gsd prefs` | Open the global preferences wizard (default) |
-| `/gsd prefs global` | Interactive wizard for global preferences (`~/.gsd/preferences.md`) |
-| `/gsd prefs project` | Interactive wizard for project preferences (`.gsd/preferences.md`) |
+| `/gsd prefs global` | Interactive wizard for global preferences (`~/.gsd/PREFERENCES.md`) |
+| `/gsd prefs project` | Interactive wizard for project preferences (`.gsd/PREFERENCES.md`) |
| `/gsd prefs status` | Show current preference files, merged values, and skill resolution status |
| `/gsd prefs wizard` | Alias for `/gsd prefs global` |
| `/gsd prefs setup` | Alias for `/gsd prefs wizard` — creates preferences file if missing |
@@ -42,8 +42,8 @@ token_profile: balanced

| Scope | Path | Applies to |
|-------|------|-----------|
-| Global | `~/.gsd/preferences.md` | All projects |
-| Project | `.gsd/preferences.md` | Current project only |
+| Global | `~/.gsd/PREFERENCES.md` | All projects |
+| Project | `.gsd/PREFERENCES.md` | Current project only |

**Merge behavior:**
- **Scalar fields** (`skill_discovery`, `budget_ceiling`): project wins if defined
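The scalar merge rule above can be sketched as a small function (an illustration only; the `Prefs` shape and function name are assumptions, not GSD's internals):

```typescript
// Sketch of global/project preference merging for scalar fields:
// the project value wins when defined, otherwise the global value applies.
interface Prefs {
  skill_discovery?: boolean;
  budget_ceiling?: number;
}

function mergeScalars(global: Prefs, project: Prefs): Prefs {
  const merged: Prefs = { ...global };
  for (const [key, value] of Object.entries(project)) {
    // Skip undefined project values so the global value survives.
    if (value !== undefined) (merged as Record<string, unknown>)[key] = value;
  }
  return merged;
}
```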
@@ -159,6 +159,8 @@ Recommended verification order:
| `GSD_PROJECT_ID` | (auto-hash) | Override the automatic project identity hash. Per-project state goes to `$GSD_HOME/projects/<GSD_PROJECT_ID>/` instead of the computed hash. Useful for CI/CD or sharing state across clones of the same repo. (v2.39) |
| `GSD_STATE_DIR` | `$GSD_HOME` | Per-project state root. Controls where `projects/<repo-hash>/` directories are created. Takes precedence over `GSD_HOME` for project state. |
| `GSD_CODING_AGENT_DIR` | `$GSD_HOME/agent` | Agent directory containing managed resources, extensions, and auth. Takes precedence over `GSD_HOME` for agent paths. |
| `GSD_ALLOWED_COMMAND_PREFIXES` | (built-in list) | Comma-separated command prefixes allowed for `!command` value resolution. Overrides `allowedCommandPrefixes` in settings.json. See [Custom Models — Command Allowlist](custom-models.md#command-allowlist). |
| `GSD_FETCH_ALLOWED_URLS` | (none) | Comma-separated hostnames exempted from `fetch_page` URL blocking. Overrides `fetchAllowedUrls` in settings.json. See [URL Blocking](#url-blocking-fetch_page). |

## All Settings
@@ -346,6 +348,43 @@ verification_max_retries: 2   # max retry attempts (default: 2)
| `verification_auto_fix` | boolean | `true` | Auto-retry when verification fails |
| `verification_max_retries` | number | `2` | Maximum auto-fix retry attempts |

### URL Blocking (`fetch_page`)

The `fetch_page` tool blocks requests to private and internal network addresses to prevent server-side request forgery (SSRF). This protects against the agent being tricked into accessing internal services, cloud metadata endpoints, or local files.

**Blocked by default:**

| Category | Examples |
|----------|----------|
| Private IP ranges | `10.x.x.x`, `172.16-31.x.x`, `192.168.x.x`, `127.x.x.x` |
| Link-local / cloud metadata | `169.254.x.x` (AWS/GCP instance metadata) |
| Cloud metadata hostnames | `metadata.google.internal`, `instance-data` |
| Localhost | `localhost` (any port) |
| Non-HTTP protocols | `file://`, `ftp://` |
| IPv6 private ranges | `::1`, `fc00:`, `fd`, `fe80:` |

Public URLs (`https://example.com`, `http://8.8.8.8`) are not affected.

**Allowing specific internal hosts:**

If you need the agent to fetch from internal URLs (self-hosted docs, internal APIs behind a VPN), add their hostnames to `fetchAllowedUrls` in global settings (`~/.gsd/agent/settings.json`):

```json
{
  "fetchAllowedUrls": ["internal-docs.company.com", "192.168.1.50"]
}
```

Alternatively, set the `GSD_FETCH_ALLOWED_URLS` environment variable (comma-separated). The env var takes precedence over settings.json:

```bash
export GSD_FETCH_ALLOWED_URLS="internal-docs.company.com,192.168.1.50"
```

Allowed hostnames bypass the blocklist checks. The protocol restriction (HTTP/HTTPS only) still applies — `file://` and `ftp://` cannot be allowlisted.

> **Note:** This setting is global-only. Project-level settings.json cannot override the URL allowlist — this prevents a cloned repo from directing `fetch_page` at internal infrastructure.
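The blocklist-plus-allowlist behavior described above can be sketched as follows. This is an illustrative approximation, not GSD's actual implementation; names and the conservative IPv6 handling are assumptions:

```typescript
// Sketch: block private/internal targets, allow explicit allowlist entries,
// and enforce the HTTP/HTTPS-only protocol rule even for allowlisted hosts.
const PRIVATE_PATTERNS = [
  /^10\./, /^127\./, /^192\.168\./, /^169\.254\./,
  /^172\.(1[6-9]|2\d|3[01])\./,
];
const BLOCKED_HOSTS = new Set(["localhost", "metadata.google.internal", "instance-data"]);

function isBlockedUrl(raw: string, allowedHosts: string[] = []): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return true; // unparseable input is blocked
  }
  // Protocol restriction applies even to allowlisted hosts.
  if (url.protocol !== "http:" && url.protocol !== "https:") return true;
  const host = url.hostname.toLowerCase();
  if (allowedHosts.includes(host)) return false; // allowlist bypasses blocklist
  if (BLOCKED_HOSTS.has(host)) return true;
  // Conservative: treat any IPv6 literal as internal for this sketch.
  if (host.startsWith("[") || host.includes(":")) return true;
  return PRIVATE_PATTERNS.some((re) => re.test(host));
}
```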
### `auto_report` (v2.26)

Auto-generate HTML reports after milestone completion:
@@ -374,8 +413,8 @@ git:
  auto_push: false        # push commits to remote after committing
  push_branches: false    # push milestone branch to remote
  remote: origin          # git remote name
-  snapshots: false        # WIP snapshot commits during long tasks
-  pre_merge_check: false  # run checks before worktree merge (true/false/"auto")
+  snapshots: true         # WIP snapshot commits during long tasks
+  pre_merge_check: auto   # run checks before worktree merge (true/false/"auto")
  commit_type: feat       # override conventional commit prefix
  main_branch: main       # primary branch name
  merge_strategy: squash  # how worktree branches merge: "squash" or "merge"
@@ -392,8 +431,8 @@ git:
| `auto_push` | boolean | `false` | Push commits to remote after committing |
| `push_branches` | boolean | `false` | Push milestone branch to remote |
| `remote` | string | `"origin"` | Git remote name |
-| `snapshots` | boolean | `false` | WIP snapshot commits during long tasks |
-| `pre_merge_check` | bool/string | `false` | Run checks before merge (`true`/`false`/`"auto"`) |
+| `snapshots` | boolean | `true` | WIP snapshot commits during long tasks |
+| `pre_merge_check` | bool/string | `"auto"` | Run checks before merge (`true`/`false`/`"auto"`) |
| `commit_type` | string | (inferred) | Override conventional commit prefix (`feat`, `fix`, `refactor`, `docs`, `test`, `chore`, `perf`, `ci`, `build`, `style`) |
| `main_branch` | string | `"main"` | Primary branch name |
| `merge_strategy` | string | `"squash"` | How worktree branches merge: `"squash"` (combine all commits) or `"merge"` (preserve individual commits) |
@@ -494,6 +533,14 @@ notifications:
  on_attention: true   # notify when manual attention needed
```

**macOS delivery:** GSD uses [`terminal-notifier`](https://github.com/julienXX/terminal-notifier) when available, falling back to `osascript`. We recommend installing `terminal-notifier` for reliable notification delivery:

```bash
brew install terminal-notifier
```

Why: `osascript display notification` is attributed to your terminal app (Ghostty, iTerm2, etc.), which may not have notification permissions in System Settings → Notifications. `terminal-notifier` registers as its own app and prompts for permission on first use. See [Troubleshooting: Notifications not appearing on macOS](troubleshooting.md#notifications-not-appearing-on-macos) if notifications aren't working.

### `remote_questions`

Route interactive questions to Slack or Discord for headless auto mode:
@@ -578,7 +625,7 @@ prefer_skills:
avoid_skills: []
```

-Skills can be bare names (looked up in `~/.gsd/agent/skills/`) or absolute paths.
+Skills can be bare names (looked up in `~/.agents/skills/` and `.agents/skills/`) or absolute paths.

### `skill_rules`
@@ -639,6 +686,7 @@ Complexity-based model routing. See [Dynamic Model Routing](./dynamic-model-rout
```yaml
dynamic_routing:
  enabled: true
  capability_routing: true   # score models by task capability (v2.59)
  tier_models:
    light: claude-haiku-4-5
    standard: claude-sonnet-4-6
@@ -648,6 +696,48 @@ dynamic_routing:
  cross_provider: true
```

### `context_management` (v2.59)

Controls observation masking and tool result truncation during auto-mode sessions. Reduces context bloat between compactions with zero LLM overhead.

```yaml
context_management:
  observation_masking: true            # replace old tool results with placeholders (default: true)
  observation_mask_turns: 8            # keep results from last N user turns (1-50, default: 8)
  compaction_threshold_percent: 0.70   # target compaction at 70% context usage (0.5-0.95, default: 0.70)
  tool_result_max_chars: 800           # cap individual tool result content (200-10000, default: 800)
```
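As a rough sketch of what `observation_mask_turns` and `tool_result_max_chars` control (types and names here are illustrative assumptions, not GSD's internals): tool results older than the turn window are replaced with placeholders, and oversized results are truncated.

```typescript
// Sketch: mask stale tool results and cap the size of recent ones.
interface Turn {
  role: "user" | "tool";
  content: string;
}

function maskObservations(turns: Turn[], maskTurns = 8, maxChars = 800): Turn[] {
  // Index of the Nth-most-recent user turn; tool results before it are masked.
  const userIdxs = turns
    .map((t, i) => (t.role === "user" ? i : -1))
    .filter((i) => i >= 0);
  const cutoff = userIdxs.length > maskTurns ? userIdxs[userIdxs.length - maskTurns] : 0;

  return turns.map((t, i) => {
    if (t.role !== "tool") return t;
    if (i < cutoff) return { ...t, content: "[masked: stale tool result]" };
    if (t.content.length > maxChars)
      return { ...t, content: t.content.slice(0, maxChars) + "… [truncated]" };
    return t;
  });
}
```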
### `service_tier` (v2.42)

OpenAI service tier preference for supported models. Toggle with `/gsd fast`.

| Value | Behavior |
|-------|----------|
| `"priority"` | Priority tier — 2x cost, faster responses |
| `"flex"` | Flex tier — 0.5x cost, slower responses |
| (unset) | Default tier |

```yaml
service_tier: priority
```

### `forensics_dedup` (v2.43)

Opt-in: search existing issues and PRs before filing from `/gsd forensics`. Uses additional AI tokens.

```yaml
forensics_dedup: true   # default: false
```

### `show_token_cost` (v2.44)

Opt-in: show per-prompt and cumulative session token cost in the footer.

```yaml
show_token_cost: true   # default: false
```

### `auto_visualize`

Show the workflow visualizer automatically after milestone completion:
@@ -734,6 +824,13 @@ notifications:
# Visualizer
auto_visualize: true

# Service tier
service_tier: priority   # "priority" or "flex" (for /gsd fast)

# Diagnostics
forensics_dedup: true    # deduplicate before filing forensics issues
show_token_cost: true    # show per-prompt cost in footer

# Hooks
post_unit_hooks:
  - name: code-review
@@ -174,7 +174,7 @@ When a skill file references a relative path, resolve it against the skill direc
<skill>
<name>commit-outstanding</name>
<description>Commit all uncommitted files in logical groups</description>
-<location>/Users/you/.gsd/agent/skills/commit-outstanding/SKILL.md</location>
+<location>/Users/you/.agents/skills/commit-outstanding/SKILL.md</location>
</skill>
</available_skills>
```
@@ -131,6 +131,36 @@ The `apiKey` and `headers` fields support three formats:
"apiKey": "sk-..."
```

#### Command Allowlist

Shell commands (`!command`) are restricted to a set of known credential tools. Only commands starting with one of these are allowed to execute:

`pass`, `op`, `aws`, `gcloud`, `vault`, `security`, `gpg`, `bw`, `gopass`, `lpass`

Commands not on this list are blocked and the value resolves to `undefined`. A warning is written to stderr.

Shell operators (`;`, `|`, `&`, `` ` ``, `$`, `>`, `<`) are also blocked in command arguments to prevent injection.

**Customizing the allowlist:**

If you use a credential tool not on the default list, override it in global settings (`~/.gsd/agent/settings.json`):

```json
{
  "allowedCommandPrefixes": ["pass", "op", "sops", "doppler", "mycli"]
}
```

This replaces the default list entirely — include any defaults you still want.

Alternatively, set the `GSD_ALLOWED_COMMAND_PREFIXES` environment variable (comma-separated). The env var takes precedence over settings.json:

```bash
export GSD_ALLOWED_COMMAND_PREFIXES="pass,op,sops,doppler"
```

> **Note:** This setting is global-only. Project-level settings.json (`<project>/.gsd/settings.json`) cannot override the command allowlist — this prevents a cloned repo from escalating command execution privileges.
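The prefix check plus operator guard described above can be sketched like this (an illustrative approximation; function and constant names are assumptions, not GSD's actual code):

```typescript
// Sketch: allow only known credential-tool prefixes, and reject arguments
// containing shell operators to prevent command injection.
const DEFAULT_PREFIXES = [
  "pass", "op", "aws", "gcloud", "vault",
  "security", "gpg", "bw", "gopass", "lpass",
];
const SHELL_OPERATORS = /[;|&`$><]/;

function isCommandAllowed(command: string, prefixes = DEFAULT_PREFIXES): boolean {
  const [head, ...args] = command.trim().split(/\s+/);
  // The first word must exactly match an allowed prefix...
  if (!prefixes.includes(head)) return false;
  // ...and no argument may contain a shell operator.
  return !args.some((a) => SHELL_OPERATORS.test(a));
}
```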
### Custom Headers

```json
@@ -1,12 +1,20 @@
# Dynamic Model Routing

-*Introduced in v2.19.0*
+*Introduced in v2.19.0. Capability scoring introduced in v2.52.0.*

Dynamic model routing automatically selects cheaper models for simple work and reserves expensive models for complex tasks. This reduces token consumption by 20-50% on capped plans without sacrificing quality where it matters.

Starting in v2.52.0, the router uses **capability-aware scoring** to select the *best fit* model for each task, not just the cheapest one in the tier.

## How It Works

-Each unit dispatched by auto-mode is classified into a complexity tier:
+Each unit dispatched by auto-mode passes through a two-stage pipeline:

**Stage 1: Complexity classification** — classifies the work into a tier (light/standard/heavy).

**Stage 2: Capability scoring** — within the eligible tier, ranks available models by how well their capabilities match the task's requirements.

The key rule: **downgrade-only semantics**. The user's configured model is always the ceiling — routing never upgrades beyond what you've configured.

| Tier | Typical Work | Default Model Level |
|------|-------------|-------------------|
@@ -14,8 +22,6 @@ Each unit dispatched by auto-mode is classified into a complexity tier:
| **Standard** | Research, planning, execution, milestone completion | Sonnet-class |
| **Heavy** | Replanning, roadmap reassessment, complex execution | Opus-class |

-The router then selects a model for that tier. The key rule: **downgrade-only semantics**. The user's configured model is always the ceiling — routing never upgrades beyond what you've configured.

## Enabling

Dynamic routing is off by default. Enable it in preferences:
@@ -41,6 +47,7 @@ dynamic_routing:
  budget_pressure: true      # auto-downgrade when approaching budget ceiling (default: true)
  cross_provider: true       # consider models from other providers (default: true)
  hooks: true                # apply routing to post-unit hooks (default: true)
  capability_routing: true   # enable capability scoring within tier (default: true)
```

### `tier_models`
@@ -70,6 +77,157 @@ When approaching the budget ceiling, the router progressively downgrades:

When enabled, the router may select models from providers other than your primary. This uses the built-in cost table to find the cheapest model at each tier. Requires the target provider to be configured.

### `capability_routing`

When enabled (default: true), the router uses capability scoring to pick the best model in a tier rather than always defaulting to the cheapest. Set to `false` to revert to cheapest-in-tier behavior:

```yaml
dynamic_routing:
  enabled: true
  capability_routing: false   # disable scoring, use cheapest-in-tier
```

## Capability Profiles

Each model has a built-in **capability profile** — a 7-dimension score (0–100) representing how well it handles different task types:

| Dimension | What It Represents |
|-----------|-------------------|
| `coding` | Code generation and implementation accuracy |
| `debugging` | Diagnosing and fixing errors |
| `research` | Synthesizing information and exploring topics |
| `reasoning` | Multi-step logical reasoning |
| `speed` | Latency and throughput (inverse of capability depth) |
| `longContext` | Handling large codebases and long documents |
| `instruction` | Following structured instructions precisely |

**Built-in profiles** exist for 9 models: `claude-opus-4-6`, `claude-sonnet-4-6`, `claude-haiku-4-5`, `gpt-4o`, `gpt-4o-mini`, `gemini-2.5-pro`, `gemini-2.0-flash`, `deepseek-chat`, `o3`.

Models without a built-in profile receive **uniform scores of 50** across all dimensions. This is a cold-start policy — unknown models compete but don't have an advantage. For those models, routing behaves the same from the user's perspective as it did before capability scoring was introduced.

**Profiles are heuristic rankings, not benchmarks.** They represent approximate relative strengths, not verified benchmark results. Use user overrides (below) to correct them for models you know well.

## How Scoring Works

The routing pipeline within a tier:

```
classify complexity tier
        ↓
filter eligible models for tier
        ↓
fire before_model_select hook (optional override)
        ↓
capability-score eligible models
        ↓
select winner (or first eligible if scoring is disabled)
```

**Scoring formula:** weighted average of capability dimensions

```
score = Σ(weight × capability) / Σ(weights)
```

**Task requirements** are dynamic — different task types weight dimensions differently:

| Unit Type | Key Dimensions |
|-----------|---------------|
| `execute-task` | coding (0.9), instruction (0.7), speed (0.3) |
| `research-*` | research (0.9), longContext (0.7), reasoning (0.5) |
| `plan-*` | reasoning (0.9), coding (0.5) |
| `replan-slice` | reasoning (0.9), debugging (0.6), coding (0.5) |
| `complete-slice`, `run-uat` | instruction (0.8), speed (0.7) |

For `execute-task`, requirements are further refined by task metadata signals:

- Tags like `docs`, `config`, `readme` → boost instruction weight
- Keywords like `concurrency`, `compatibility` → boost debugging and reasoning
- Keywords like `migration`, `architecture` → boost reasoning and coding
- Large file counts (≥6) or large estimated line counts (≥500) → boost coding and reasoning

**Tie-breaking:** When two models score within 2 points of each other, the cheaper model wins. If costs are equal, lexicographic model ID breaks the tie (deterministic).
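The weighted-average formula and the tie-break rule above can be sketched as follows. This is a minimal illustration; the `Profile` shape, cost field, and function names are assumptions, not GSD's actual internals:

```typescript
// Sketch: weighted capability scoring with the 2-point / cheaper-model tie-break.
type Profile = Record<string, number>; // dimension → score (0-100)
type Weights = Record<string, number>; // dimension → weight

function capabilityScore(profile: Profile, weights: Weights): number {
  let num = 0, den = 0;
  for (const [dim, w] of Object.entries(weights)) {
    num += w * (profile[dim] ?? 50); // unknown dimension falls back to cold-start 50
    den += w;
  }
  return den > 0 ? num / den : 0;
}

function pickModel(
  models: { id: string; profile: Profile; costPerMTok: number }[],
  weights: Weights,
): string {
  const scored = models.map((m) => ({ ...m, score: capabilityScore(m.profile, weights) }));
  scored.sort((a, b) => b.score - a.score);
  let winner = scored[0];
  for (const rival of scored.slice(1)) {
    // Within 2 points, prefer the cheaper model; equal cost → lexicographic ID.
    if (winner.score - rival.score <= 2) {
      if (rival.costPerMTok < winner.costPerMTok) winner = rival;
      else if (rival.costPerMTok === winner.costPerMTok && rival.id < winner.id) winner = rival;
    }
  }
  return winner.id;
}
```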
## User Overrides

Correct built-in capability profiles for models you know well using `modelOverrides` in your models configuration:

```json
{
  "providers": {
    "anthropic": {
      "modelOverrides": {
        "claude-sonnet-4-6": {
          "capabilities": {
            "debugging": 90,
            "research": 85
          }
        }
      }
    }
  }
}
```

Overrides are **deep-merged** with built-in defaults — only the specified dimensions are overridden; others retain their built-in values.

**Use case:** You've found that a model consistently outperforms its built-in profile on specific task types. Override the relevant dimensions to steer the router toward that model for those tasks.
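At the single-profile level, the deep-merge behavior described above amounts to a per-dimension merge (an illustrative sketch; names are assumptions, not GSD's internals):

```typescript
// Sketch: merge user capability overrides onto a built-in profile.
// Only the specified dimensions change; the rest keep their built-in values.
type CapabilityProfile = Record<string, number>;

function mergeProfile(
  builtIn: CapabilityProfile,
  overrides: Partial<CapabilityProfile>,
): CapabilityProfile {
  return { ...builtIn, ...overrides };
}
```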
## Verbose Output

When verbose mode is active, the router logs its routing decision. When capability scoring was used, the log includes a full scoring breakdown:

```
Dynamic routing [S]: claude-sonnet-4-6 (capability-scored) — claude-sonnet-4-6: 82.3, gpt-4o: 78.1, deepseek-chat: 72.0
```

When tier-only routing was used (scoring disabled, single eligible model, or routing guards applied):

```
Dynamic routing [S]: claude-sonnet-4-6 (standard complexity, multiple steps)
```

The `selectionMethod` field in the routing decision indicates which path was taken:

- `"capability-scored"` — capability scoring selected the winner
- `"tier-only"` — cheapest in tier (or explicit pin) was used

## Extension Hook

Extensions can intercept and override model selection using the `before_model_select` hook.

The hook fires **after** tier filtering (eligible models are known) and **before** capability scoring (scores have not been computed yet). A hook can override selection entirely or return `undefined` to let scoring proceed normally.

**Registering a handler:**

```typescript
pi.on("before_model_select", async (event) => {
  const { unitType, unitId, classification, taskMetadata, eligibleModels, phaseConfig } = event;

  // Custom routing strategy: always use gemini for research tasks
  if (unitType.startsWith("research-")) {
    const gemini = eligibleModels.find(id => id.includes("gemini"));
    if (gemini) return { modelId: gemini };
  }

  // Return undefined to let capability scoring proceed
  return undefined;
});
```

**Event payload:**

| Field | Type | Description |
|-------|------|-------------|
| `unitType` | `string` | The unit type being dispatched (e.g., `"execute-task"`) |
| `unitId` | `string` | Unique identifier for this unit dispatch |
| `classification` | `{ tier, reason, downgraded }` | The complexity classification result |
| `taskMetadata` | `Record<string, unknown> \| undefined` | Task metadata extracted from the unit plan |
| `eligibleModels` | `string[]` | Models eligible for the classified tier |
| `phaseConfig` | `{ primary, fallbacks } \| undefined` | The user's configured model for this phase |

**Return value:** `{ modelId: string }` to override selection, or `undefined` to defer to capability scoring.

**First-override-wins:** If multiple extensions register handlers, the first one to return a non-undefined result wins. Subsequent handlers are not called.

## Complexity Classification

Units are classified using pure heuristics — no LLM calls, sub-millisecond:
@@ -39,6 +39,10 @@ GSD is also available as a VS Code extension. Install from the marketplace (publ

The CLI (`gsd-pi`) must be installed first — the extension connects to it via RPC.

### Web Interface

GSD also has a browser-based interface. Run `gsd --web` to start a local web server with a visual dashboard, real-time progress, and multi-project support. See [Web Interface](./web-interface.md) for details.

## First Launch

Run `gsd` in any directory:
@@ -54,6 +58,8 @@ GSD displays a welcome screen showing your version, active model, and available

If you have an existing Pi installation, provider credentials are imported automatically.

For detailed setup instructions for specific providers (OpenRouter, Ollama, LM Studio, vLLM, and more), see the [Provider Setup Guide](./providers.md).

Re-run the wizard anytime with:

```bash
@ -36,10 +36,10 @@ Use this for hot-reload workflows where file isolation breaks dev tooling (e.g.,
|
|||
```
main ─────────────────────────────────────────────────────────
 │                                                    ↑
 └── milestone/M001 (worktree) ────────────────────────┘
       commit: feat: core types
       commit: feat: markdown parser
       commit: feat: file writer
       commit: docs: workflow docs
       ...
       → squash-merged to main as single commit
```
@@ -56,13 +56,13 @@ With [parallel orchestration](./parallel-orchestration.md) enabled, multiple mil

```
main ──────────────────────────────────────────────────────────
 │                    ↑                                 ↑
 ├── milestone/M002 (worktree) ─────────┘               │
 │     commit: feat: auth types                         │
 │     commit: feat: JWT middleware                     │
 │     → squash-merged first                            │
 │                                                      │
 └── milestone/M003 (worktree) ─────────────────────────┘
       commit: feat: dashboard layout
       commit: feat: chart components
       → squash-merged second
```
@@ -75,13 +75,16 @@ Each worktree operates on its own branch with its own commit history. Merges hap

### Commit Format

Commits use conventional commit format with GSD metadata in trailers:

```
feat: core type definitions

GSD-Task: M001/S01/T01

feat: markdown parser for plan files

GSD-Task: M001/S01/T02
```

## Worktree Management
@@ -126,7 +126,7 @@ File overlaps are warnings, not blockers. Both milestones work in separate workt

## Configuration

Add to `~/.gsd/PREFERENCES.md` or `.gsd/PREFERENCES.md`:

```yaml
---
```

198  docs/pi-context-optimization-opportunities.md  (new file)

@@ -0,0 +1,198 @@
# pi-coding-agent: Context Optimization Opportunities

> **Status**: Research only — not planned for implementation.
> Scope: `packages/pi-coding-agent` and `packages/pi-agent-core` infrastructure.
> These changes would benefit every consumer of the pi engine, not just GSD.

---

## 1. Prompt Caching (`cache_control`) — Highest Impact

**Current state**: Every LLM call re-pays the full input token cost for the system prompt, tool definitions, and context files. No `cache_control` breakpoints are set anywhere in the API call path.

**Opportunity**: Anthropic's KV cache delivers a 90% cost reduction on cached tokens (0.1x the input rate). Claude Code achieves 92–98% cache hit rates by placing stable content before volatile content.

**Where to instrument** (`packages/pi-ai/src/providers/anthropic.ts`):
- Set `cache_control: { type: "ephemeral" }` on the last tool definition block
- Set `cache_control` after the static system prompt sections (base boilerplate + context files)
- Leave the per-turn user message uncached
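The placement can be sketched as a request body. The `tools`, `system`, and `cache_control` shapes follow the Anthropic Messages API; the model id, prompt text, and tool definitions are placeholders:

```typescript
// Where the two cache_control breakpoints would sit in an Anthropic
// Messages API request body (typed loosely; a shape sketch, not a
// working client call).
const request: Record<string, any> = {
  model: "claude-sonnet-4",
  max_tokens: 8192,
  tools: [
    { name: "read_file", description: "Read a file", input_schema: { type: "object" } },
    // Breakpoint 1: on the LAST tool, caching the whole (sorted) tool list.
    {
      name: "run_bash",
      description: "Run a shell command",
      input_schema: { type: "object" },
      cache_control: { type: "ephemeral" },
    },
  ],
  system: [
    // Breakpoint 2: after the static sections (base prompt + context files).
    { type: "text", text: "<base system prompt + AGENTS.md content>", cache_control: { type: "ephemeral" } },
    // Dynamic content stays AFTER the breakpoint so it never invalidates the cache.
    { type: "text", text: `Current time: ${new Date().toISOString()}` },
  ],
  messages: [{ role: "user", content: "Implement the parser" }], // per-turn, uncached
};
console.log(request.tools[1].cache_control.type);
```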
**Critical constraint**: The cache breakpoint must be placed *after* all static content and *before* any dynamic content (timestamps, per-request variables). Moving a timestamp before a cache breakpoint defeats it on every call.

**Cache hierarchy**: Tools → system → messages. Changing a tool definition invalidates the system and message caches. Tool definitions should be sorted deterministically (alphabetically) to prevent spurious cache misses.

**Expected savings**: 80–90% reduction in input token cost for multi-turn sessions (the dominant cost pattern in GSD auto-mode).

---

## 2. Observation Masking in the Message Pipeline

**Current state**: `agent-loop.ts` passes the full `context.messages` array to the LLM on every turn. Tool results from 50 turns ago are re-read in full on every subsequent call. The `transformContext` hook exists on `AgentContext` and fires before every LLM call, but has no default implementation — extensions are responsible for any pruning.

**Opportunity**: Replace old tool result content with lightweight placeholders after N turns. JetBrains Research tested this on SWE-bench Verified (500 tasks, up to 250-turn trajectories) and found:
- 50%+ cost reduction vs. unmanaged history
- Performance matched or slightly exceeded LLM summarization
- Zero overhead (no extra LLM call required)

**Proposed implementation** (default `transformContext` in `pi-agent-core`):
```typescript
// Keep last KEEP_RECENT_TURNS verbatim; mask older tool results
const KEEP_RECENT_TURNS = 8;

// Index of the first message within the most recent N assistant turns
// (sketch: the real boundary logic would live next to the message types).
function findTurnBoundary(messages: AgentMessage[], keepTurns: number): number {
  let turns = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].type === "assistant" && ++turns === keepTurns) return i;
  }
  return 0;
}

function defaultObservationMask(messages: AgentMessage[]): AgentMessage[] {
  const cutoff = findTurnBoundary(messages, KEEP_RECENT_TURNS);
  return messages.map((m, i) => {
    if (i >= cutoff) return m;
    if (m.type === "toolResult" || m.type === "bashExecution") {
      return { ...m, content: "[result masked — within summarized history]", excludeFromContext: false };
    }
    return m;
  });
}
```
**Compaction interaction**: Observation masking reduces the token accumulation rate, pushing the compaction threshold further out. The two mechanisms are complementary — masking handles the steady state, compaction handles the rare deep-session case.

---

## 3. Earlier Compaction Threshold

**Current state** (`packages/pi-coding-agent/src/core/constants.ts`):

```typescript
COMPACTION_RESERVE_TOKENS = 16_384    // triggers at contextWindow - 16K
COMPACTION_KEEP_RECENT_TOKENS = 20_000
```

For a 200K context window, compaction fires at ~184K tokens — about 92% utilization.

**Problem**: Context drift (not raw exhaustion) causes ~65% of enterprise agent failures. Performance degrades measurably beyond ~30K tokens per Zylos production data. The current threshold lets sessions run degraded for a long stretch before compaction fires.

**Opportunity**: Lower the trigger to 70% utilization. For a 200K window, this means compacting at ~140K tokens — about 44K tokens earlier.

```typescript
// Proposed
COMPACTION_THRESHOLD_PERCENT = 0.70   // fire at 70% of contextWindow
COMPACTION_RESERVE_TOKENS = contextWindow * (1 - COMPACTION_THRESHOLD_PERCENT)
```

**Trade-off**: More frequent compactions, each happening earlier when there's more "fresh" content to keep. Summary quality improves because less material needs to be discarded at each cut.

---
## 4. Tool Result Truncation at Write Time

**Current state**: `TOOL_RESULT_MAX_CHARS = 2_000` in `constants.ts`, but this limit is only applied *during compaction summarization*, not when the tool result enters the message store. A bash result returning 50KB of log output is stored and re-sent verbatim until compaction fires.

**Opportunity**: Truncate at write time in `messages.ts` → `convertToLlm()` or in the tool result handler. Two strategies:

- **Hard truncation**: Slice at N chars, append `"\n[truncated — {original_length} chars]"`. Simple, zero overhead.
- **Semantic head/tail**: Keep the first 500 chars (context, command echo) + the last 1000 chars (final output, errors). Better for bash results, where the end contains the error.

**Recommendation**: Semantic head/tail as the default, configurable per tool type. File read results benefit from head; bash/test output benefits from head+tail.
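A minimal sketch of the semantic head/tail strategy with the limits suggested above (the function name and truncation marker are illustrative, not an existing pi API):

```typescript
// Keep the first `head` chars and the last `tail` chars, with a marker
// noting how much was omitted in between.
function truncateHeadTail(s: string, head = 500, tail = 1000): string {
  if (s.length <= head + tail) return s;
  const omitted = s.length - head - tail;
  return `${s.slice(0, head)}\n[... ${omitted} chars truncated ...]\n${s.slice(-tail)}`;
}

const short = truncateHeadTail("ok");            // returned unchanged
const long = truncateHeadTail("x".repeat(5000)); // head + marker + tail
console.log(short.length, long.length);
```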
---

## 5. Context File Deduplication and Trim

**Current state** (`packages/pi-coding-agent/src/core/resource-loader.ts`, lines 84–109):
- Searches from `~/.gsd/agent/` → ancestor dirs → cwd
- Deduplicates by *file path* but not by *content*
- Entire file content is concatenated verbatim into the system prompt — no trimming, no summarization

**Anti-pattern**: A project with AGENTS.md at 3 ancestor levels (repo root, workspace, home) injects all three in full. If they share common boilerplate, that content is re-injected multiple times.

**Opportunities**:
1. **Content deduplication**: Hash paragraph-level chunks; skip any chunk already seen in a previously loaded file
2. **Section-aware loading**: Parse `## ` headings in AGENTS.md; only include sections relevant to the current task type (e.g., the `## Testing` section only when running tests)
3. **Token budget enforcement**: If total context files exceed N tokens, summarize the oldest/most-distant file rather than including it verbatim
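Opportunity 1 could look like the following sketch; splitting on blank lines and SHA-256 chunk hashing are assumptions, not the loader's actual logic:

```typescript
import { createHash } from "node:crypto";

// Concatenate context files, skipping any paragraph-level chunk whose
// hash has already been seen in an earlier file.
function dedupeChunks(files: string[]): string {
  const seen = new Set<string>();
  const out: string[] = [];
  for (const file of files) {
    for (const raw of file.split(/\n{2,}/)) {
      const chunk = raw.trim();
      if (!chunk) continue;
      const key = createHash("sha256").update(chunk).digest("hex");
      if (!seen.has(key)) {
        seen.add(key);
        out.push(chunk);
      }
    }
  }
  return out.join("\n\n");
}

const merged = dedupeChunks([
  "Shared boilerplate.\n\nRepo-specific rules.",
  "Shared boilerplate.\n\nWorkspace rules.",
]);
console.log(merged.split("\n\n").length); // → 3
```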
---

## 6. Skill Content Lazy Loading and Summarization

**Current state**: When `/skill:name` is invoked, the full skill file content is injected inline as `<skill>...</skill>` in the user message. No chunking, no summarization. A 10KB skill file adds ~2,500 tokens to that turn.

**Opportunity**:
- **Cached skill injection**: If the same skill is used across multiple turns (rare but possible), it's re-injected each time. Cache with `cache_control` after the first injection.
- **Skill digest mode**: Inject a 200-token summary of the skill on first reference; full content only if the model requests it via a `get_skill_detail` tool call. Reduces cost for skills that don't end up being followed.
- **Skill prefetching**: Before a known long session (e.g., auto-mode start), pre-inject all likely skills with `cache_control` so they're cached for the entire session.

---

## 7. Token Estimation Accuracy

**Current state** (`compaction.ts`, line 216): a `chars / 4` heuristic. This underestimates token count for English prose (~3.5 chars/token), and underestimates it even more for code with short identifiers or Unicode.

**Opportunity**: Use a proper tokenizer.
- `@anthropic-ai/tokenizer` (tiktoken-compatible, ships with the SDK) — accurate but ~5ms per call
- Tiered approach: use chars/4 for display; use the proper tokenizer only for compaction threshold decisions (where accuracy matters)

**Impact**: More accurate compaction timing, fewer unnecessary compactions, slightly better `COMPACTION_KEEP_RECENT_TOKENS` boundary placement.
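The tiered approach in sketch form; `preciseTokenCount` is a stand-in for a real tokenizer such as `@anthropic-ai/tokenizer` (its body here is illustrative), and the 0.9 guard band is an assumed tuning value:

```typescript
// Cheap heuristic: fine for display, too coarse for threshold decisions.
function cheapTokenEstimate(text: string): number {
  return Math.ceil(text.length / 4);
}

// Placeholder for a real tokenizer call; the 3.5 divisor only simulates
// a more accurate count for this sketch.
function preciseTokenCount(text: string): number {
  return Math.ceil(text.length / 3.5);
}

// Only pay the tokenizer cost when the cheap estimate is near the boundary.
function shouldCompact(text: string, thresholdTokens: number): boolean {
  const rough = cheapTokenEstimate(text);
  if (rough < thresholdTokens * 0.9) return false;
  return preciseTokenCount(text) >= thresholdTokens;
}

console.log(shouldCompact("x".repeat(1000), 1000)); // → false
```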
---

## 8. Format: Markdown over XML for Internal Context

**Current state**: The message pipeline uses `<skill>`, `<summary>`, `<compaction>` XML wrappers in several places. System prompt sections are largely prose Markdown.

**Findings**: XML tags carry 15–40% more tokens than equivalent Markdown for the same semantic content, due to paired open/close tags. However, Claude was optimized for XML and shows higher accuracy on tasks requiring precise section parsing.

**Recommendation**: Audit XML usage in the pipeline and convert to Markdown where the content is:
- Non-nested (flat instructions, status messages)
- Human-readable rather than machine-parsed by the model
- Not requiring precise boundary detection

Keep XML for: few-shot examples with ambiguous boundaries, skill content (which requires precise isolation from surrounding text), and compaction summaries that the model must treat as authoritative history.

**Estimated savings**: 5–15% reduction in system prompt token count.

---
## 9. Dynamic Tool Set Delivery

**Current state**: All tool definitions are included in every LLM request. Tool descriptions consume 60–80% of input tokens in static configurations. As new extensions register tools, the baseline grows linearly.

**Opportunity** (higher complexity): Implement the three-function Dynamic Toolset pattern:
1. `search_tools(query)` — semantic search over the tool catalog
2. `describe_tools(ids[])` — fetch full schemas on demand
3. `execute_tool(id, params)` — unchanged execution
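The three functions can be sketched over an in-memory catalog. Substring matching stands in for the semantic search index, and all names follow the pattern description rather than an existing pi API:

```typescript
interface ToolDef {
  id: string;
  description: string;
  schema: object;
}

const catalog: ToolDef[] = [
  { id: "read_file", description: "read a file from disk", schema: {} },
  { id: "run_tests", description: "run the project test suite", schema: {} },
];

// search_tools: return matching ids only (cheap for the model to scan).
function searchTools(query: string): string[] {
  return catalog.filter((t) => t.description.includes(query)).map((t) => t.id);
}

// describe_tools: fetch full schemas on demand for the chosen ids.
function describeTools(ids: string[]): ToolDef[] {
  return catalog.filter((t) => ids.includes(t.id));
}

const hits = searchTools("test");
console.log(hits, describeTools(hits).length); // → [ "run_tests" ] 1
```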
Speakeasy measured 91–97% token reduction with a 100% task success rate. Trade-off: 2–3x more tool calls and ~50% longer wall time, but dramatically lower net cost.

**Feasibility for pi**: The tool registry (`packages/pi-coding-agent/src/core/tool-registry.ts`) already stores tool metadata separately from definitions. The primary engineering work is the semantic search index and the `search_tools` / `describe_tools` tool implementations.

---
## 10. Cost Attribution and Per-Phase Reporting

**Current state**: `SessionManager.getUsageTotals()` accumulates cost across the entire session. No per-phase or per-agent breakdown is stored. Cost visibility is limited to the footer total and the `GSD_SHOW_TOKEN_COST=1` per-turn display.

**Opportunity**: Emit structured cost events that extensions can subscribe to:

```typescript
interface CostCheckpointEvent {
  type: "cost_checkpoint";
  label: string;           // "discuss-phase", "execute-slice-3"
  deltaTokens: Usage;      // tokens since last checkpoint
  cumulativeTokens: Usage;
  cumulativeCost: number;
}
```

The GSD extension could consume these events to surface per-milestone cost in `/gsd stats` and flag milestones that are disproportionately expensive — enabling budget-aware planning.
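An extension-side consumer might roll checkpoints up into per-label totals. The event shape follows the proposal above, trimmed to the fields used; the subscription wiring itself is omitted:

```typescript
// Per-label cost totals derived from a stream of checkpoint events
// (sketch of a consumer, not pi's event bus).
interface CostCheckpointEvent {
  type: "cost_checkpoint";
  label: string;
  cumulativeCost: number;
}

function totalsByLabel(events: CostCheckpointEvent[]): Map<string, number> {
  const totals = new Map<string, number>();
  let prev = 0;
  for (const e of events) {
    const delta = e.cumulativeCost - prev; // cost attributable to this label
    totals.set(e.label, (totals.get(e.label) ?? 0) + delta);
    prev = e.cumulativeCost;
  }
  return totals;
}

const totals = totalsByLabel([
  { type: "cost_checkpoint", label: "discuss-phase", cumulativeCost: 0.12 },
  { type: "cost_checkpoint", label: "execute-slice-1", cumulativeCost: 0.45 },
]);
console.log(totals.get("discuss-phase"), totals.get("execute-slice-1"));
```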
---

## Implementation Ordering (if pursued)

| Priority | Item | Effort | Expected Impact |
|----------|------|--------|-----------------|
| 1 | Prompt caching (`cache_control`) | Low | 80–90% input cost reduction |
| 2 | Earlier compaction threshold (70%) | Trivial | Reduces drift in long sessions |
| 3 | Tool result truncation at write time | Low | Reduces context bloat between compactions |
| 4 | Context file deduplication | Medium | Variable — high for multi-level AGENTS.md setups |
| 5 | Observation masking (default `transformContext`) | Medium | 50%+ on long-running agents |
| 6 | Token estimation (proper tokenizer) | Low | Accuracy improvement, minor cost impact |
| 7 | Markdown over XML audit | Low | 5–15% system prompt reduction |
| 8 | Skill caching with `cache_control` | Low | Meaningful for skill-heavy sessions |
| 9 | Dynamic tool set delivery | High | 90%+ on large tool catalogs; major architecture change |
| 10 | Per-phase cost attribution events | Medium | Visibility only; enables future budget routing |
(16 binary image files removed in this diff; 17–536 KiB each)
587  docs/providers.md  (new file)

@@ -0,0 +1,587 @@
# Provider Setup Guide

Step-by-step setup instructions for every LLM provider GSD supports. If you ran the onboarding wizard (`gsd config`) and picked a provider, you may already be configured — check with `/model` inside a session.

## Table of Contents

- [Quick Reference](#quick-reference)
- [Built-in Providers](#built-in-providers)
  - [Anthropic (Claude)](#anthropic-claude)
  - [OpenAI](#openai)
  - [Google Gemini](#google-gemini)
  - [OpenRouter](#openrouter)
  - [Groq](#groq)
  - [xAI (Grok)](#xai-grok)
  - [Mistral](#mistral)
  - [GitHub Copilot](#github-copilot)
  - [Amazon Bedrock](#amazon-bedrock)
  - [Anthropic on Vertex AI](#anthropic-on-vertex-ai)
  - [Azure OpenAI](#azure-openai)
- [Local Providers](#local-providers)
  - [Ollama](#ollama)
  - [LM Studio](#lm-studio)
  - [vLLM](#vllm)
  - [SGLang](#sglang)
- [Custom OpenAI-Compatible Endpoints](#custom-openai-compatible-endpoints)
- [Common Pitfalls](#common-pitfalls)
- [Verifying Your Setup](#verifying-your-setup)

## Quick Reference

| Provider | Auth Method | Env Variable | Config File |
|----------|-------------|--------------|-------------|
| Anthropic | OAuth or API key | `ANTHROPIC_API_KEY` | — |
| OpenAI | API key | `OPENAI_API_KEY` | — |
| Google Gemini | API key | `GEMINI_API_KEY` | — |
| OpenRouter | API key | `OPENROUTER_API_KEY` | Optional `models.json` |
| Groq | API key | `GROQ_API_KEY` | — |
| xAI | API key | `XAI_API_KEY` | — |
| Mistral | API key | `MISTRAL_API_KEY` | — |
| GitHub Copilot | OAuth | `GH_TOKEN` | — |
| Amazon Bedrock | IAM credentials | `AWS_PROFILE` or `AWS_ACCESS_KEY_ID` | — |
| Vertex AI | ADC | `GOOGLE_APPLICATION_CREDENTIALS` | — |
| Azure OpenAI | API key | `AZURE_OPENAI_API_KEY` | — |
| Ollama | None (local) | — | `models.json` required |
| LM Studio | None (local) | — | `models.json` required |
| vLLM / SGLang | None (local) | — | `models.json` required |

---
## Built-in Providers

Built-in providers have models pre-registered in GSD. You only need to supply credentials.

### Anthropic (Claude)

**Recommended.** Anthropic models have the deepest integration: built-in web search, extended thinking, and prompt caching.

**Option A — Browser sign-in (recommended):**

```bash
gsd config
# Choose "Sign in with your browser" → "Anthropic (Claude)"
```

Or inside a session: `/login`

**Option B — API key:**

```bash
export ANTHROPIC_API_KEY="sk-ant-..."
```

Or paste it during `gsd config` when prompted.

**Get a key:** [console.anthropic.com/settings/keys](https://console.anthropic.com/settings/keys)

### OpenAI

```bash
export OPENAI_API_KEY="sk-..."
```

Or run `gsd config` and choose "Paste an API key", then "OpenAI".

**Get a key:** [platform.openai.com/api-keys](https://platform.openai.com/api-keys)

### Google Gemini

```bash
export GEMINI_API_KEY="..."
```

**Get a key:** [aistudio.google.com/app/apikey](https://aistudio.google.com/app/apikey)

### OpenRouter

OpenRouter aggregates 200+ models from multiple providers behind a single API key.

**Step 1 — Get your API key:**

Go to [openrouter.ai/keys](https://openrouter.ai/keys) and create a key.

**Step 2 — Set the key:**

```bash
export OPENROUTER_API_KEY="sk-or-..."
```

Or run `gsd config`, choose "Paste an API key", then "OpenRouter".

**Step 3 — Switch to an OpenRouter model:**

Inside a GSD session, type `/model` and select an OpenRouter model. Models are prefixed with `openrouter/` (e.g., `openrouter/anthropic/claude-sonnet-4`).

**Optional — Add custom OpenRouter models via `models.json`:**

If you want models not in the built-in list, add them to `~/.gsd/agent/models.json`:

```json
{
  "providers": {
    "openrouter": {
      "baseUrl": "https://openrouter.ai/api/v1",
      "apiKey": "OPENROUTER_API_KEY",
      "api": "openai-completions",
      "models": [
        {
          "id": "meta-llama/llama-3.3-70b",
          "name": "Llama 3.3 70B (OpenRouter)",
          "reasoning": false,
          "input": ["text"],
          "contextWindow": 131072,
          "maxTokens": 32768,
          "cost": { "input": 0.3, "output": 0.3, "cacheRead": 0, "cacheWrite": 0 }
        }
      ]
    }
  }
}
```

Note: the `apiKey` field here is the *name* of the environment variable, not the literal key. GSD resolves it automatically. You can also use a literal value or a shell command (see [Value Resolution](./custom-models.md#value-resolution)).

**Optional — Route through specific providers:**

Use `modelOverrides` to control which upstream provider OpenRouter uses:

```json
{
  "providers": {
    "openrouter": {
      "modelOverrides": {
        "anthropic/claude-sonnet-4": {
          "compat": {
            "openRouterRouting": {
              "only": ["amazon-bedrock"]
            }
          }
        }
      }
    }
  }
}
```

### Groq

```bash
export GROQ_API_KEY="gsk_..."
```

**Get a key:** [console.groq.com/keys](https://console.groq.com/keys)

### xAI (Grok)

```bash
export XAI_API_KEY="xai-..."
```

**Get a key:** [console.x.ai](https://console.x.ai)

### Mistral

```bash
export MISTRAL_API_KEY="..."
```

**Get a key:** [console.mistral.ai/api-keys](https://console.mistral.ai/api-keys)

### GitHub Copilot

Uses OAuth — sign in through the browser:

```bash
gsd config
# Choose "Sign in with your browser" → "GitHub Copilot"
```

Requires an active GitHub Copilot subscription.

### Amazon Bedrock

Bedrock uses AWS IAM credentials, not API keys. Any of these work:

```bash
# Option 1: Named profile
export AWS_PROFILE="my-profile"

# Option 2: IAM keys
export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_REGION="us-east-1"

# Option 3: Bedrock API key (bearer token)
export AWS_BEARER_TOKEN_BEDROCK="..."
```

ECS task roles and IRSA (Kubernetes) are also detected automatically.

### Anthropic on Vertex AI

Uses Google Cloud Application Default Credentials:

```bash
gcloud auth application-default login
export ANTHROPIC_VERTEX_PROJECT_ID="my-project-id"
```

Or set `GOOGLE_CLOUD_PROJECT` and ensure ADC credentials exist at `~/.config/gcloud/application_default_credentials.json`.

### Azure OpenAI

```bash
export AZURE_OPENAI_API_KEY="..."
```

---
## Local Providers

Local providers run on your machine. They require a `models.json` configuration file because GSD needs to know the endpoint URL and which models are available.

**Config file location:** `~/.gsd/agent/models.json`

The file reloads each time you open `/model` — no restart needed.

### Ollama

**Step 1 — Install and start Ollama:**

```bash
# macOS
brew install ollama
ollama serve

# Or download from https://ollama.com
```

**Step 2 — Pull a model:**

```bash
ollama pull llama3.1:8b
ollama pull qwen2.5-coder:7b
```

**Step 3 — Create `~/.gsd/agent/models.json`:**

```json
{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434/v1",
      "api": "openai-completions",
      "apiKey": "ollama",
      "compat": {
        "supportsDeveloperRole": false,
        "supportsReasoningEffort": false
      },
      "models": [
        { "id": "llama3.1:8b" },
        { "id": "qwen2.5-coder:7b" }
      ]
    }
  }
}
```

The `apiKey` is required by the config schema, but Ollama ignores it — any value works.

**Step 4 — Select the model:**

Inside GSD, type `/model` and pick your Ollama model.

**Ollama tips:**
- Ollama does not support the `developer` role or `reasoning_effort` — always set `compat.supportsDeveloperRole: false` and `compat.supportsReasoningEffort: false`.
- If you get empty responses, check that `ollama serve` is running and the model is pulled.
- Context window and max tokens default to 128K / 16K if not specified. Override these if your model has different limits.

### LM Studio

**Step 1 — Install LM Studio:**

Download from [lmstudio.ai](https://lmstudio.ai).

**Step 2 — Start the local server:**

In LM Studio, go to the "Local Server" tab, load a model, and click "Start Server". The default port is 1234.

**Step 3 — Create `~/.gsd/agent/models.json`:**

```json
{
  "providers": {
    "lm-studio": {
      "baseUrl": "http://localhost:1234/v1",
      "api": "openai-completions",
      "apiKey": "lm-studio",
      "compat": {
        "supportsDeveloperRole": false,
        "supportsReasoningEffort": false
      },
      "models": [
        {
          "id": "your-model-name",
          "name": "My Local Model",
          "contextWindow": 32768,
          "maxTokens": 4096
        }
      ]
    }
  }
}
```

Replace `your-model-name` with the model identifier shown in LM Studio's server tab.

**LM Studio tips:**
- The model ID in `models.json` must match what LM Studio reports in its server API. Check the server tab for the exact string.
- LM Studio defaults to port 1234. If you changed it, update `baseUrl` accordingly.
- Increase `contextWindow` and `maxTokens` if your model supports larger contexts.

### vLLM

```json
{
  "providers": {
    "vllm": {
      "baseUrl": "http://localhost:8000/v1",
      "api": "openai-completions",
      "apiKey": "vllm",
      "compat": {
        "supportsDeveloperRole": false,
        "supportsReasoningEffort": false,
        "supportsUsageInStreaming": false
      },
      "models": [
        {
          "id": "meta-llama/Llama-3.1-8B-Instruct",
          "contextWindow": 128000,
          "maxTokens": 16384
        }
      ]
    }
  }
}
```

The model `id` must match the `--model` flag you passed to `vllm serve`.

### SGLang

```json
{
  "providers": {
    "sglang": {
      "baseUrl": "http://localhost:30000/v1",
      "api": "openai-completions",
      "apiKey": "sglang",
      "compat": {
        "supportsDeveloperRole": false,
        "supportsReasoningEffort": false
      },
      "models": [
        {
          "id": "meta-llama/Llama-3.1-8B-Instruct"
        }
      ]
    }
  }
}
```

---
## Custom OpenAI-Compatible Endpoints

Any server that implements the OpenAI Chat Completions API can work with GSD. This covers proxies (LiteLLM, Portkey, Helicone), self-hosted inference, and new providers.

**Quickest path — use the onboarding wizard:**

```bash
gsd config
# Choose "Paste an API key" → "Custom (OpenAI-compatible)"
# Enter: base URL, API key, model ID
```

This writes `~/.gsd/agent/models.json` for you automatically.

**Manual setup:**

```json
{
  "providers": {
    "my-provider": {
      "baseUrl": "https://my-endpoint.example.com/v1",
      "apiKey": "MY_PROVIDER_API_KEY",
      "api": "openai-completions",
      "models": [
        {
          "id": "model-id-here",
          "name": "Friendly Model Name",
          "reasoning": false,
          "input": ["text"],
          "contextWindow": 128000,
          "maxTokens": 16384,
          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 }
        }
      ]
    }
  }
}
```

**Adding custom headers (for proxies):**

```json
{
  "providers": {
    "litellm-proxy": {
      "baseUrl": "https://litellm.example.com/v1",
      "apiKey": "MY_API_KEY",
      "api": "openai-completions",
      "headers": {
        "x-custom-header": "value"
      },
      "models": [...]
    }
  }
}
```

**Qwen models with thinking mode:**

For Qwen-compatible servers, use `thinkingFormat` to enable thinking mode:

```json
{
  "compat": {
    "thinkingFormat": "qwen",
    "supportsDeveloperRole": false
  }
}
```

Use `"qwen-chat-template"` instead if the server requires `chat_template_kwargs.enable_thinking`.

For the full reference on `compat` fields, `modelOverrides`, value resolution, and advanced configuration, see [Custom Models](./custom-models.md).

---
## Common Pitfalls
|
||||
|
||||
### "Authentication failed" with a valid key

**Cause:** The key is set in your shell but not visible to GSD.

**Fix:** Make sure the environment variable is exported in the same terminal where you run `gsd`. Or use `gsd config` to save the key to `~/.gsd/agent/auth.json` so it persists across sessions.

### OpenRouter models not appearing in `/model`

**Cause:** No `OPENROUTER_API_KEY` set, so GSD hides OpenRouter models.

**Fix:** Set the key and restart GSD:

```bash
export OPENROUTER_API_KEY="sk-or-..."
gsd
```

### Ollama returns empty responses

**Cause:** Ollama server isn't running, or the model isn't pulled.

**Fix:**

```bash
# Verify the server is running
curl http://localhost:11434/v1/models

# Pull the model if missing
ollama pull llama3.1:8b
```

### LM Studio model ID mismatch

**Cause:** The `id` in `models.json` doesn't match what LM Studio exposes via its API.

**Fix:** Check the LM Studio server tab for the exact model identifier. It often includes the filename or quantization level (e.g., `lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF`).

### `developer` role error with local models

**Cause:** Most local inference servers don't support the OpenAI `developer` message role.

**Fix:** Add `compat.supportsDeveloperRole: false` to the provider config. This makes GSD send `system` messages instead:

```json
{
  "compat": {
    "supportsDeveloperRole": false,
    "supportsReasoningEffort": false
  }
}
```

### `stream_options` error with local models

**Cause:** Some servers don't support `stream_options: { include_usage: true }`.

**Fix:** Add `compat.supportsUsageInStreaming: false`:

```json
{
  "compat": {
    "supportsUsageInStreaming": false
  }
}
```

### "apiKey is required" validation error

**Cause:** The `models.json` schema requires `apiKey` when `models` are defined.

**Fix:** For local servers that don't need auth, set a dummy value:

```json
"apiKey": "not-needed"
```

### Cost shows $0.00 for custom models

**Expected behavior.** GSD defaults cost to zero for custom models. Override with the `cost` field if you want accurate cost tracking:

```json
"cost": { "input": 0.15, "output": 0.60, "cacheRead": 0.015, "cacheWrite": 0.19 }
```

Values are per million tokens.
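As a quick sanity check, you can compute a session's cost from the token counts and the per-million rates above. The token counts here are made-up example figures, not real usage data:

```shell
# Cost for a hypothetical session: 120k input tokens, 45k output tokens,
# priced with the example rates above ($0.15 in / $0.60 out per million tokens).
awk 'BEGIN {
  input = 120000; output = 45000      # example token counts, not real usage
  in_rate = 0.15; out_rate = 0.60     # $ per million tokens, from the cost field
  printf "$%.4f\n", (input * in_rate + output * out_rate) / 1000000
}'
# → $0.0450
```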

---

## Verifying Your Setup

After configuring a provider:

1. **Launch GSD:**

   ```bash
   gsd
   ```

2. **Check available models:**

   ```
   /model
   ```

   Your provider's models should appear in the list.

3. **Switch to the model:**

   Select it from the `/model` picker.

4. **Send a test message:**

   Type anything to confirm the model responds.

If the model doesn't appear, check:
- The environment variable is set in the current shell
- `models.json` is valid JSON (use `cat ~/.gsd/agent/models.json | python3 -m json.tool`)
- The server is running (for local providers)

For additional help, see [Troubleshooting](./troubleshooting.md) or run `/gsd doctor` inside a session.

The setup wizard:

3. Lists servers the bot belongs to (or lets you pick)
4. Lists text channels in the selected server
5. Sends a test message to confirm permissions
6. Saves the configuration to `~/.gsd/PREFERENCES.md`

**Bot requirements:**
- A Discord bot application with a token (from [Discord Developer Portal](https://discord.com/developers/applications))

## Configuration

Remote questions are configured in `~/.gsd/PREFERENCES.md`:

```yaml
remote_questions:
```

docs/skills.md

Skills are specialized instruction sets that GSD loads when the task matches. They provide domain-specific guidance for the LLM — coding patterns, framework idioms, testing strategies, and tool usage.

Skills follow the open [Agent Skills standard](https://agentskills.io/) and are **not GSD-specific** — they work with Claude Code, OpenAI Codex, Cursor, GitHub Copilot, Windsurf, and 40+ other agents.

## Skill Directories

GSD reads skills from two locations, in priority order:

| Location | Scope | Description |
|-----------------------------------|---------|----------------------------------------------------------|
| `~/.agents/skills/` | Global | Shared across all projects and all compatible agents |
| `.agents/skills/` (project root) | Project | Project-specific skills, committable to version control |

Global skills take precedence over project skills when names collide.

> **Migration from `~/.gsd/agent/skills/`:** On first launch after upgrading, GSD automatically copies skills from the legacy `~/.gsd/agent/skills/` directory to `~/.agents/skills/`. The old directory is preserved for backward compatibility.

## Installing Skills

Skills are installed via the [skills.sh CLI](https://skills.sh):

```bash
# Interactive — choose skills and target agents
npx skills add dpearson2699/swift-ios-skills

# Install specific skills non-interactively
npx skills add dpearson2699/swift-ios-skills --skill swift-concurrency --skill swiftui-patterns -y

# Install all skills from a repo
npx skills add dpearson2699/swift-ios-skills --all

# Check for updates
npx skills check

# Update installed skills
npx skills update
```

### Onboarding Catalog

During `gsd init`, GSD detects the project's tech stack and recommends relevant skill packs. For brownfield projects, detection is automatic; for greenfield projects, the user picks a tech stack.

The curated catalog is maintained in `src/resources/extensions/gsd/skill-catalog.ts`. Each entry maps a tech stack to a skills.sh repo and specific skill names.

#### Available Skill Packs

**Swift (any Swift project — `Package.swift` or `.xcodeproj` detected):**
- **SwiftUI** — layout, navigation, animations, gestures, Liquid Glass
- **Swift Core** — Swift language, concurrency, Codable, Charts, Testing, SwiftData

**iOS (only when `.xcodeproj` targets `iphoneos` via SDKROOT):**
- **iOS App Frameworks** — App Intents, Widgets, StoreKit, MapKit, Live Activities
- **iOS Data Frameworks** — CloudKit, HealthKit, MusicKit, WeatherKit, Contacts
- **iOS AI & ML** — Core ML, Vision, on-device AI, speech recognition
- **iOS Engineering** — networking, security, accessibility, localization, Instruments
- **iOS Hardware** — Bluetooth, CoreMotion, NFC, PencilKit, RealityKit
- **iOS Platform** — CallKit, EnergyKit, HomeKit, SharePlay, PermissionKit

**Web:**
- **React & Web Frontend** — React best practices, web design, composition patterns
- **React Native** — cross-platform mobile patterns
- **Frontend Design & UX** — frontend design, accessibility

**Languages:**
- **Rust** — Rust patterns and best practices
- **Python** — Python patterns and best practices
- **Go** — Go patterns and best practices

**General:**
- **Document Handling** — PDF, DOCX, XLSX, PPTX creation and manipulation

### Maintaining the Catalog

The skill catalog lives in [`src/resources/extensions/gsd/skill-catalog.ts`](../src/resources/extensions/gsd/skill-catalog.ts). To add or update a pack:

1. Add a `SkillPack` entry to the `SKILL_CATALOG` array with `repo`, `skills`, and matching criteria
2. For language-detection matching, use `matchLanguages` (values from `detection.ts` `LANGUAGE_MAP`)
3. For Xcode platform matching, use `matchXcodePlatforms` (e.g., `["iphoneos"]` — parsed from `SDKROOT` in `project.pbxproj`)
4. For file-presence matching, use `matchFiles` (checked against `PROJECT_FILES` in `detection.ts`)
5. If the pack should appear in greenfield choices, add it to `GREENFIELD_STACKS`
6. Packs sharing the same `repo` are batched into a single `npx skills add` invocation

## Skill Discovery

### Resolution Order

Skills can be referenced by:
1. **Bare name** — e.g., `frontend-design` → scans `~/.agents/skills/` and project `.agents/skills/`
2. **Absolute path** — e.g., `/Users/you/.agents/skills/my-skill/SKILL.md`
3. **Directory path** — e.g., `~/custom-skills/my-skill` → looks for `SKILL.md` inside

Global skills (`~/.agents/skills/`) take precedence over project skills (`.agents/skills/`).

## Custom Skills

Create your own skills by adding a directory with a `SKILL.md` file:

```
~/.agents/skills/my-skill/
  SKILL.md      — instructions for the LLM
  references/   — optional reference files
```
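
The layout above can be scaffolded from the shell. The skill name and description here are placeholders; the frontmatter fields mirror the Agent Skills standard used by the bundled skills:

```shell
# Scaffold a minimal custom skill ("my-skill" and its description are placeholders)
mkdir -p ~/.agents/skills/my-skill/references
cat > ~/.agents/skills/my-skill/SKILL.md << 'EOF'
---
name: my-skill
description: When this skill should activate and what it covers.
---

Instructions the LLM follows when this skill is active.
EOF
```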

The `SKILL.md` file contains instructions the LLM follows when the skill is activated.

Place skills in your project for project-specific guidance:

```
.agents/skills/my-project-skill/
  SKILL.md
```

Project-local skills can be committed to version control so team members share the same skill set.

## Skill Lifecycle Management

GSD tracks skill performance across auto-mode sessions and surfaces health data to help you maintain skill quality.


## How the Pieces Fit Together

```
PREFERENCES.md
 └─ token_profile: balanced
     ├─ resolveProfileDefaults() → model defaults + phase skip defaults
     ├─ resolveInlineLevel() → standard
     │   └─ prompt builders gate context inclusion by level
     ├─ classifyUnitComplexity() → routes to execution/execution_simple model
     │   ├─ task plan analysis (steps, files, signals)
     │   ├─ unit type defaults
     │   ├─ budget pressure adjustment
     │   ├─ adaptive learning from routing-history.json
     │   └─ capability scoring (when capability_routing: true)
     │       └─ 7-dimension model profiles × task requirement vectors
     └─ context_management
         ├─ observation masking (before_provider_request hook)
         ├─ tool result truncation (tool_result_max_chars)
         └─ phase handoff anchors (injected into prompt builders)
```

The profile is resolved once and flows through the entire dispatch pipeline. Explicit preferences override profile defaults at every layer.

## Observation Masking

*Introduced in v2.59.0*

During auto-mode sessions, tool results accumulate in the conversation history and consume context window space. Observation masking replaces tool result content older than N user turns with a lightweight placeholder before each LLM call. This reduces token usage with zero LLM overhead — no summarization calls, no latency.

Masking is enabled by default during auto-mode. Configure via preferences:

```yaml
context_management:
  observation_masking: true     # default: true (set false to disable)
  observation_mask_turns: 8     # keep results from last 8 user turns (range: 1-50)
  tool_result_max_chars: 800    # truncate individual tool results beyond this length
```

### How It Works

1. Before each provider request, the `before_provider_request` hook inspects the messages array
2. Tool results (`toolResult`, `bashExecution`) older than the configured turn threshold are replaced with `[result masked — within summarized history]`
3. Recent tool results (within the keep window) are preserved in full
4. All assistant and user messages are always preserved — only tool result content is masked

This pairs with the existing compaction system: masking reduces context pressure between compactions, and compaction handles the full context reset when the window fills.

### Tool Result Truncation

Individual tool results that exceed `tool_result_max_chars` (default: 800) are truncated with a `…[truncated]` marker. This prevents a single large tool output from dominating the context window.
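
The rule can be sketched as a small shell function. This is a sketch of the observable behavior only, not GSD's actual implementation:

```shell
# Sketch of the truncation rule: keep the first N characters, append the marker.
truncate_result() {
  s=$1; max=${2:-800}
  if [ "$(printf %s "$s" | wc -c)" -gt "$max" ]; then
    printf '%s…[truncated]' "$(printf %s "$s" | cut -c1-"$max")"
  else
    printf %s "$s"
  fi
}

truncate_result "abcdefghij" 4
# → abcd…[truncated]
```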

## Phase Handoff Anchors

*Introduced in v2.59.0*

When auto-mode transitions between phases (research → planning → execution), structured JSON anchors are written to `.gsd/milestones/<mid>/anchors/<phase>.json`. Downstream prompt builders inject these anchors so the next phase inherits intent, decisions, blockers, and next steps without re-inferring from artifact files.

This reduces context drift — the failure mode behind an estimated 65% of enterprise agent failures, where agents lose track of prior decisions across phase boundaries.

Anchors are written automatically after successful completion of `research-milestone`, `research-slice`, `plan-milestone`, and `plan-slice` units. No configuration needed.

## Prompt Compression

*Introduced in v2.29.0*


**Headless mode:** `gsd headless auto` auto-restarts the entire process on crash (default 3 attempts with exponential backoff). Combined with provider error auto-resume, this enables true overnight unattended execution.

For common provider setup issues (role errors, streaming errors, model ID mismatches), see the [Provider Setup Guide — Common Pitfalls](./providers.md#common-pitfalls).

### Budget ceiling reached

**Symptoms:** Auto mode pauses with "Budget ceiling reached."


```bash
rm -rf "$(dirname .gsd)/.gsd.lock"
```

- If the error persists, close tools that may be holding the file open and then retry.
- If repeated failures continue, run `/gsd doctor` to confirm the repo state is still healthy and report the exact path + error code.

### Node v24 web boot failure

**Symptoms:** `gsd --web` fails with `ERR_UNSUPPORTED_NODE_MODULES_TYPE_STRIPPING` on Node v24.

**Cause:** Node v24 changed type-stripping behavior for `node_modules`, breaking the Next.js web build.

**Fix:** Fixed in v2.42.0+ (#1864). Upgrade to the latest version.

### Orphan web server process

**Symptoms:** `gsd --web` fails because port 3000 is already in use, even though no GSD session is running.

**Cause:** A previous web server process was not cleaned up on exit.

**Fix:** Fixed in v2.42.0+. GSD now cleans up stale web server processes automatically. If you're on an older version, kill the orphan process manually: `lsof -ti:3000 | xargs kill`.

### Non-JS project blocked by worktree health check

**Symptoms:** Worktree health check fails or blocks auto-mode in projects that don't use Node.js (e.g., Rust, Go, Python).

**Cause:** The worktree health check only recognized JavaScript ecosystems prior to v2.42.0.

**Fix:** Fixed in v2.42.0+ (#1860). The health check now supports 17+ ecosystems. Upgrade to the latest version.

### German/non-English locale git errors

**Symptoms:** Git commands fail or produce unexpected results when the system locale is non-English (e.g., German).

**Cause:** GSD parsed git output assuming English locale strings.

**Fix:** Fixed in v2.42.0+. All git commands now force `LC_ALL=C` to ensure consistent English output regardless of system locale.

## MCP Client Issues

### `mcp_servers` shows no configured servers


Doctor rebuilds `STATE.md` from plan and roadmap files on disk and fixes detected issues.

- **Forensics:** `/gsd forensics` for structured post-mortem analysis of auto-mode failures
- **Session logs:** `.gsd/activity/` contains JSONL session dumps for crash forensics

## iTerm2-Specific Issues

### Ctrl+Alt shortcuts trigger the wrong action

**Symptoms:** Pressing Ctrl+Alt+G opens the external editor prompt (Ctrl+G) instead of the GSD dashboard. Other Ctrl+Alt shortcuts behave as their Ctrl-only counterparts.

**Cause:** iTerm2's default Left Option Key setting is "Normal", which swallows the Alt modifier for Ctrl+Alt key combinations. The terminal receives only the Ctrl key, so Ctrl+Alt+G arrives as Ctrl+G.

**Fix:** In iTerm2, go to **Profiles → Keys → General** and set **Left Option Key** to **Esc+**. This makes Alt/Option send an escape prefix that terminal applications can detect, enabling Ctrl+Alt shortcuts to work correctly.

## Windows-Specific Issues

### LSP returns ENOENT on Windows (MSYS2/Git Bash)


This shows which servers are active and, if none are found, diagnoses why.

| Go | `go install golang.org/x/tools/gopls@latest` |

After installing, run `lsp reload` to restart detection without restarting GSD.

## Notifications

### Notifications not appearing on macOS

**Symptoms:** `notifications.enabled: true` in preferences, but no desktop notifications appear during auto-mode (no milestone complete alerts, no budget warnings, no error notifications). No error messages logged.

**Cause:** GSD uses `osascript display notification` as a fallback on macOS. This command is attributed to your terminal app (Ghostty, iTerm2, Alacritty, Kitty, Warp, etc.). If that app doesn't have notification permissions in System Settings → Notifications, macOS silently drops the notification — `osascript` exits 0 with no error.

Most terminal apps don't appear in the Notifications settings panel until they've successfully delivered at least one notification, creating a chicken-and-egg problem.

**Fix (recommended):** Install `terminal-notifier`, which registers as its own Notification Center app:

```bash
brew install terminal-notifier
```

GSD automatically prefers `terminal-notifier` when available. On first use, macOS will prompt you to allow notifications — this is the expected behavior.

**Fix (alternative):** Go to **System Settings → Notifications** and enable notifications for your terminal app. If your terminal doesn't appear in the list, try sending a test notification from Terminal.app first to register "Script Editor":

```bash
osascript -e 'display notification "test" with title "GSD"'
```

**Verify:** After applying either fix, test with:

```bash
terminal-notifier -title "GSD" -message "working!" -sound Glass
```


GSD includes a browser-based web interface for project management and real-time progress.

## Quick Start

```bash
gsd --web
```

This starts a local web server and opens the GSD dashboard in your default browser.

### CLI Flags (v2.42.0)

```bash
gsd --web --host 0.0.0.0 --port 8080 --allowed-origins "https://example.com"
```

| Flag | Default | Description |
|------|---------|-------------|
| `--host` | `localhost` | Bind address for the web server |
| `--port` | `3000` | Port for the web server |
| `--allowed-origins` | (none) | Comma-separated list of allowed CORS origins |

## Features

- **Project management** — view milestones, slices, and tasks in a visual dashboard
- **Real-time progress** — server-sent events push status updates as auto-mode executes
- **Multi-project support** — manage multiple projects from a single browser tab via `?project=` URL parameter
- **Change project root** — switch project directories from the web UI without restarting the server (v2.44)
- **Onboarding flow** — API key setup and provider configuration through the browser
- **Model selection** — switch models and providers from the web UI

## Configuration

The web server binds to `localhost:3000` by default. Use `--host`, `--port`, and `--allowed-origins` to override (see CLI Flags above).

### Environment Variables

| Variable | Description |
|----------|-------------|
| `GSD_WEB_PROJECT_CWD` | Default project path when `?project=` is not specified |

## Node v24 Compatibility

Node v24 introduced breaking changes to type stripping that caused `ERR_UNSUPPORTED_NODE_MODULES_TYPE_STRIPPING` on web boot. This is fixed in v2.42.0+ (#1864). If you encounter this error, upgrade GSD.

## Auth Token Persistence

As of v2.42.0, the web UI persists the auth token in `sessionStorage` so it survives page refreshes (#1877). Previously, refreshing the page required re-authentication.

## Platform Notes

- **Windows**: The web build is skipped on Windows due to Next.js webpack EPERM issues with system directories. The CLI remains fully functional.


On-demand capability packages following the [Agent Skills standard](https://agentskills.io/).

**Placement:**
- `~/.agents/skills/` (global — shared across all agents)
- `.agents/skills/` (project, searched up to git root)

**Skill structure:**


Or just use conventional directory names (`extensions/`, `skills/`, `prompts/`, …).

- [Package gallery](https://shittycodingagent.ai/packages)
- [npm search](https://www.npmjs.com/search?q=keywords%3Api-package)
- [Discord community](https://discord.com/invite/nKXTsAcmbT)

---


| Alt+Enter (during streaming) | Queue follow-up message |
| Alt+Up | Retrieve queued messages |

> **iTerm2 users:** Ctrl+Alt shortcuts (e.g., Ctrl+Alt+G for the GSD dashboard) require Left Option Key set to "Esc+" in Profiles → Keys → General. The default "Normal" setting swallows the Alt modifier.

### CLI


GSD supports multi-user workflows where several developers work on the same repo.

The simplest way to configure GSD for team use is to set `mode: team` in your project preferences. This enables unique milestone IDs, push branches, and pre-merge checks in one setting:

```yaml
# .gsd/PREFERENCES.md (project-level, committed to git)
---
version: 1
mode: team
```

Share planning artifacts (milestones, roadmaps, decisions) while keeping runtime state local.

**What gets shared** (committed to git):
- `.gsd/PREFERENCES.md` — project preferences
- `.gsd/PROJECT.md` — living project description
- `.gsd/REQUIREMENTS.md` — requirement contract
- `.gsd/DECISIONS.md` — architectural decisions

### 3. Commit the Preferences

```bash
git add .gsd/PREFERENCES.md
git commit -m "chore: enable GSD team workflow"
```

If you have an existing project with `.gsd/` blanket-ignored:

1. Ensure no milestones are in progress (clean state)
2. Update `.gitignore` to use the selective pattern above
3. Add `unique_milestone_ids: true` to `.gsd/PREFERENCES.md`
4. Optionally rename existing milestones to use unique IDs:
   ```
   I have turned on unique milestone ids, please update all old milestone
   ```

gsd-orchestrator/SKILL.md

---
name: gsd-orchestrator
description: >
  Build software products autonomously via GSD headless mode. Handles the full
  lifecycle: write a spec, launch a build, poll for completion, handle blockers,
  track costs, and verify the result. Use when asked to "build something",
  "create a project", "run gsd", "check build status", or any task that
  requires autonomous software development via subprocess.
metadata:
  openclaw:
    requires:
      bins: [gsd]
    install:
      kind: node
      package: gsd-pi
      bins: [gsd]
---

<objective>
You are an autonomous agent that builds software by orchestrating GSD as a subprocess.
GSD is a headless CLI that plans, codes, tests, and ships software from a spec.
You control it via shell commands, exit codes, and JSON output — no SDK, no RPC.
</objective>

<mental_model>
GSD headless is a subprocess you launch and monitor. Think of it like a junior developer
you hand a spec to:

1. You write the spec (what to build)
2. You launch the build (`gsd headless ... new-milestone --context spec.md --auto`)
3. You wait for it to finish (exit code tells you the outcome)
4. You check the result (query state, inspect files, verify deliverables)
5. If blocked, you intervene (steer, supply answers, or escalate)

The subprocess handles all planning, coding, testing, and git commits internally.
You never write application code yourself — GSD does that.
</mental_model>

<critical_rules>
- **Flags before command.** `gsd headless [--flags] [command] [args]`. Flags after the command are ignored.
- **Redirect stderr.** JSON output goes to stdout. Progress goes to stderr. Always `2>/dev/null` when parsing JSON.
- **Check exit codes.** 0=success, 1=error, 10=blocked (needs you), 11=cancelled.
- **Use `query` to poll.** Instant (~50ms), no LLM cost. Use it between steps, not `auto` for status.
- **Budget awareness.** Track `cost.total` from query results. Set limits before launching long runs.
- **One project directory per build.** Each GSD project needs its own directory with a `.gsd/` folder.
</critical_rules>

<routing>
Route based on what you need to do:

**Build something from scratch:**
Read `workflows/build-from-spec.md` — write spec, init directory, launch, monitor, verify.

**Check on a running or completed build:**
Read `workflows/monitor-and-poll.md` — query state, interpret phases, handle blockers.

**Execute with fine-grained control:**
Read `workflows/step-by-step.md` — run one unit at a time with decision points.

**Understand the JSON output:**
Read `references/json-result.md` — field reference for HeadlessJsonResult.

**Pre-supply answers or secrets:**
Read `references/answer-injection.md` — answer file schema and injection mechanism.

**Look up a specific command:**
Read `references/commands.md` — full command reference with flags and examples.
</routing>

<quick_reference>

**Launch a full build (spec to working code):**
```bash
mkdir -p /tmp/my-project && cd /tmp/my-project && git init
cat > spec.md << 'EOF'
# Your Product Spec Here
Build a ...
EOF
gsd headless --output-format json --context spec.md new-milestone --auto 2>/dev/null
```

**Check project state (instant, free):**
```bash
cd /path/to/project
gsd headless query | jq '{phase: .state.phase, progress: .state.progress, cost: .cost.total}'
```

**Resume work on an existing project:**
```bash
cd /path/to/project
gsd headless --output-format json auto 2>/dev/null
```

**Run one step at a time:**
```bash
RESULT=$(gsd headless --output-format json next 2>/dev/null)
echo "$RESULT" | jq '{status: .status, phase: .phase, cost: .cost.total}'
```

</quick_reference>

<exit_codes>
| Code | Meaning | Your action |
|------|---------|-------------|
| `0` | Success | Check deliverables, verify output, report completion |
| `1` | Error or timeout | Inspect stderr, check `.gsd/STATE.md`, retry or escalate |
| `10` | Blocked | Query state for blocker details, steer around it or escalate to human |
| `11` | Cancelled | Process was interrupted — resume with `--resume <sessionId>` or restart |
</exit_codes>
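
The table above can be wired into a small dispatcher. The function only maps codes to action labels; it is a sketch, and real handling would query state and escalate as described:

```shell
# Map GSD's documented headless exit codes to a next-action label.
handle_exit() {
  case "$1" in
    0)  echo "success: verify deliverables" ;;
    1)  echo "error: inspect stderr and .gsd/STATE.md" ;;
    10) echo "blocked: query state for blocker details" ;;
    11) echo "cancelled: resume with --resume <sessionId>" ;;
    *)  echo "unknown exit code: $1" ;;
  esac
}

# Typical wiring after a run:
#   gsd headless --output-format json auto 2>/dev/null
#   handle_exit $?
handle_exit 10
# → blocked: query state for blocker details
```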
|
||||
|
||||
<project_structure>
GSD creates and manages all state in `.gsd/`:

```
.gsd/
  PROJECT.md         # What this project is
  REQUIREMENTS.md    # Capability contract
  DECISIONS.md       # Architectural decisions (append-only)
  KNOWLEDGE.md       # Persistent project knowledge (patterns, rules, lessons)
  STATE.md           # Current phase and next action
  milestones/
    M001-xxxxx/
      M001-xxxxx-CONTEXT.md   # Scope, constraints, assumptions
      M001-xxxxx-ROADMAP.md   # Slices with checkboxes
      M001-xxxxx-SUMMARY.md   # Completion summary
      slices/S01/
        S01-PLAN.md           # Tasks
        S01-SUMMARY.md        # Slice summary
        tasks/
          T01-PLAN.md         # Individual task spec
          T01-SUMMARY.md      # Task completion summary
```

State is derived from files on disk — checkboxes in ROADMAP.md and PLAN.md are the source of truth for completion. You never need to edit these files (GSD manages them), but you can read them to understand progress.
</project_structure>
<flags>
| Flag | Description |
|------|-------------|
| `--output-format <fmt>` | `text` (default), `json` (structured result at exit), `stream-json` (JSONL events) |
| `--json` | Alias for `--output-format stream-json` — JSONL event stream to stdout |
| `--bare` | Skip CLAUDE.md, AGENTS.md, user settings, user skills. Use for CI/ecosystem runs. |
| `--resume <id>` | Resume a prior headless session by its session ID |
| `--timeout N` | Overall timeout in ms (default: 300000, use 0 to disable) |
| `--model ID` | Override LLM model |
| `--supervised` | Forward interactive UI requests to orchestrator via stdout/stdin |
| `--response-timeout N` | Timeout (ms) for orchestrator response in supervised mode (default: 30000) |
| `--answers <path>` | Pre-supply answers and secrets from JSON file |
| `--events <types>` | Filter JSONL to specific event types (comma-separated, implies `--json`) |
| `--verbose` | Show tool calls in progress output |
| `--context <path>` | Spec file path for `new-milestone` (use `-` for stdin) |
| `--context-text <text>` | Inline spec text for `new-milestone` |
| `--auto` | Chain into auto-mode after `new-milestone` |
</flags>
<answer_injection>
Pre-supply answers and secrets for fully autonomous runs:

```bash
gsd headless --answers answers.json --output-format json auto 2>/dev/null
```

```json
{
  "questions": { "question_id": "selected_option" },
  "secrets": { "API_KEY": "sk-..." },
  "defaults": { "strategy": "first_option" }
}
```

- **questions** — question ID to answer (string for single-select, string[] for multi-select)
- **secrets** — env var to value, injected into the child process environment
- **defaults.strategy** — `"first_option"` (default) or `"cancel"` for unmatched questions

See `references/answer-injection.md` for the full mechanism.
</answer_injection>
<event_streaming>
For real-time monitoring, use JSONL event streaming:

```bash
gsd headless --json auto 2>/dev/null | while read -r line; do
  TYPE=$(echo "$line" | jq -r '.type')
  case "$TYPE" in
    tool_execution_start) echo "Tool: $(echo "$line" | jq -r '.toolName')" ;;
    extension_ui_request) echo "GSD: $(echo "$line" | jq -r '.message // .title // empty')" ;;
    agent_end)            echo "Session ended" ;;
  esac
done
```

Filter to specific events: `--events agent_end,execution_complete,extension_ui_request`

Available types: `agent_start`, `agent_end`, `tool_execution_start`, `tool_execution_end`,
`tool_execution_update`, `extension_ui_request`, `message_start`, `message_end`,
`message_update`, `turn_start`, `turn_end`, `cost_update`, `execution_complete`.
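
The same dispatch loop can be exercised offline against a canned JSONL stream. This sketch substitutes simple pattern matches for the `jq` extraction; the sample event shapes are illustrative:

```bash
# Dispatch over a canned event stream instead of a live `gsd headless --json` pipe.
printf '%s\n' \
  '{"type":"tool_execution_start","toolName":"bash"}' \
  '{"type":"agent_end"}' |
while read -r line; do
  case "$line" in
    *'"type":"tool_execution_start"'*) echo "tool event" ;;
    *'"type":"agent_end"'*)            echo "session ended" ;;
  esac
done
```
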
</event_streaming>
<all_commands>
| Command | Purpose |
|---------|---------|
| `auto` | Run all queued units until milestone complete or blocked (default) |
| `next` | Run exactly one unit, then exit |
| `query` | Instant JSON snapshot — state, next dispatch, costs (no LLM, ~50ms) |
| `new-milestone` | Create milestone from spec file |
| `dispatch <phase>` | Force specific phase (research, plan, execute, complete, reassess, uat, replan) |
| `stop` / `pause` | Control auto-mode |
| `steer <desc>` | Hard-steer plan mid-execution |
| `skip` / `undo` | Unit control |
| `queue` | Queue/reorder milestones |
| `history` | View execution history |
| `doctor` | Health check + auto-fix |
| `knowledge <rule>` | Add persistent project knowledge |

See `references/commands.md` for the complete reference.
</all_commands>

---

**New file:** `gsd-orchestrator/references/answer-injection.md` (119 lines)
# Answer Injection
Pre-supply answers and secrets to eliminate interactive prompts during headless execution.

## Usage

```bash
gsd headless --answers answers.json auto
gsd headless --answers answers.json new-milestone --context spec.md --auto
```

The `--answers` flag takes a path to a JSON file containing pre-supplied answers and secrets.
## Answer File Schema

```json
{
  "questions": {
    "question_id": "selected_option_label",
    "multi_select_question": ["option_a", "option_b"]
  },
  "secrets": {
    "API_KEY": "sk-...",
    "DATABASE_URL": "postgres://..."
  },
  "defaults": {
    "strategy": "first_option"
  }
}
```

### Fields

| Field | Type | Description |
|-------|------|-------------|
| `questions` | `Record<string, string \| string[]>` | Map question ID → answer. String for single-select, string array for multi-select. |
| `secrets` | `Record<string, string>` | Map env var name → value. Injected into the child process environment. |
| `defaults.strategy` | `"first_option" \| "cancel"` | Fallback for unmatched questions. Default: `"first_option"`. |
## How Secrets Work

Secrets are injected as environment variables into the GSD child process:

1. The orchestrator passes the answer file via `--answers`
2. GSD reads the file and sets secret values as env vars in the child process
3. When `secure_env_collect` runs inside the agent, it finds the keys already in `process.env`
4. The tool skips the interactive prompt and reports the keys as "already configured"

Secrets are never logged or included in event streams.
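
Step 3's "keys already in `process.env`" check has a shell analogue an orchestrator can use to verify injection before launch. A sketch with a hard-coded key for illustration:

```bash
API_KEY="sk-test"   # pretend this was injected via the answers file

# Same presence test the tool performs: prompt only if the key is unset or empty.
if [ -n "${API_KEY:-}" ]; then
  echo "API_KEY already configured"
else
  echo "API_KEY needs prompting"
fi
```
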
## How Question Matching Works

Two-phase correlation:

1. **Observe** — GSD monitors `tool_execution_start` events for `ask_user_questions` to extract question metadata (ID, options, allowMultiple)
2. **Match** — Subsequent `extension_ui_request` events are correlated to the metadata and responded to with the pre-supplied answer

Handles out-of-order events (`extension_ui_request` can arrive before `tool_execution_start`) via a deferred processing queue with a 500ms timeout.

## Coexistence with `--supervised`

Both `--answers` and `--supervised` can be active simultaneously. Priority order:

1. The answer injector tries first
2. If no answer is found, supervised mode forwards to the orchestrator
3. If there is no orchestrator response within `--response-timeout`, the auto-responder kicks in
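
The priority chain behaves like a fall-through of lookups, each returning non-zero when it has no answer. A sketch with hypothetical stub functions (none of these names exist in GSD; they only make the snippet self-contained):

```bash
# Stubs standing in for the three answer sources, in priority order.
lookup_injected() { [ "$1" = "known_q" ] && echo "from answers file"; }
ask_supervisor()  { false; }               # stub: no orchestrator attached
auto_default()    { echo "first option"; } # built-in auto-responder

answer_question() {
  lookup_injected "$1" || ask_supervisor "$1" || auto_default "$1"
}

answer_question known_q     # answered by the injector
answer_question unknown_q   # falls through to the default strategy
```
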
## Without Answer Injection

Headless mode has built-in auto-responders for all prompt types:

| Prompt Type | Default Behavior |
|-------------|------------------|
| Select | Picks first option |
| Confirm | Auto-confirms |
| Input | Empty string |
| Editor | Returns prefill or empty |

Answer injection overrides these defaults with specific answers when precision matters.

## Diagnostics

The injector tracks statistics printed in the session summary:

| Stat | Description |
|------|-------------|
| `questionsAnswered` | Questions resolved from the answer file |
| `questionsDefaulted` | Questions handled by the default strategy |
| `secretsProvided` | Number of secrets injected |

A warning is printed at exit for any unused question IDs and secret keys.
## Example: Orchestrator with Answers

```bash
# Create answer file
cat > answers.json << 'EOF'
{
  "questions": {
    "test_framework": "vitest",
    "package_manager": "pnpm"
  },
  "secrets": {
    "OPENAI_API_KEY": "sk-...",
    "DATABASE_URL": "postgres://localhost:5432/mydb"
  },
  "defaults": {
    "strategy": "first_option"
  }
}
EOF

# Run with pre-supplied answers
gsd headless --answers answers.json --output-format json auto 2>/dev/null

# Parse result
RESULT=$(gsd headless --answers answers.json --output-format json next 2>/dev/null)
echo "$RESULT" | jq '{status: .status, cost: .cost.total}'
```

---

**New file:** `gsd-orchestrator/references/commands.md` (210 lines)
# GSD Commands Reference
All commands run as subprocesses via `gsd headless [flags] [command] [args...]`.

## Global Flags

These flags apply to any `gsd headless` invocation:

| Flag | Description |
|------|-------------|
| `--output-format <fmt>` | `text` (default), `json` (structured result), `stream-json` (JSONL) |
| `--json` | Alias for `--output-format stream-json` |
| `--bare` | Minimal context: skip CLAUDE.md, AGENTS.md, user settings, user skills |
| `--resume <id>` | Resume a prior headless session by ID |
| `--timeout N` | Overall timeout in ms (default: 300000) |
| `--model ID` | Override LLM model |
| `--supervised` | Forward interactive UI requests to orchestrator via stdout/stdin |
| `--response-timeout N` | Timeout for orchestrator response in supervised mode (default: 30000ms) |
| `--answers <path>` | Pre-supply answers and secrets from JSON file |
| `--events <types>` | Filter JSONL output to specific event types (comma-separated, implies `--json`) |
| `--verbose` | Show tool calls in progress output |
## Exit Codes

| Code | Meaning | When |
|------|---------|------|
| `0` | Success | Unit/milestone completed normally |
| `1` | Error or timeout | Runtime error, LLM failure, or `--timeout` exceeded |
| `10` | Blocked | Execution hit a blocker requiring human intervention |
| `11` | Cancelled | User or orchestrator cancelled the operation |
## Workflow Commands

### `auto` (default)

Autonomous mode — loop through all pending units until milestone complete or blocked.

```bash
gsd headless --output-format json auto
```

### `next`

Step mode — execute exactly one unit (task/slice/milestone step), then exit. Recommended for orchestrators that need decision points between steps.

```bash
gsd headless --output-format json next
```

### `new-milestone`

Create a milestone from a specification document.

```bash
gsd headless new-milestone --context spec.md
gsd headless new-milestone --context spec.md --auto
gsd headless new-milestone --context-text "Build a REST API" --auto
cat spec.md | gsd headless new-milestone --context - --auto
```

Extra flags:

- `--context <path>` — path to spec/PRD file (use `-` for stdin)
- `--context-text <text>` — inline specification text
- `--auto` — start auto-mode after milestone creation
### `dispatch <phase>`

Force-route to a specific phase, bypassing normal state-machine routing.

```bash
gsd headless dispatch research
gsd headless dispatch plan
gsd headless dispatch execute
gsd headless dispatch complete
gsd headless dispatch reassess
gsd headless dispatch uat
gsd headless dispatch replan
```

### `discuss`

Start a guided milestone/slice discussion.

```bash
gsd headless discuss
```
### `stop`

Stop auto-mode gracefully.

```bash
gsd headless stop
```

### `pause`

Pause auto-mode (preserves state, resumable).

```bash
gsd headless pause
```
## State Inspection

### `query`

**Instant JSON snapshot** — state, next dispatch, parallel costs. No LLM, ~50ms. The recommended way for orchestrators to inspect state.

```bash
gsd headless query
gsd headless query | jq '.state.phase'
gsd headless query | jq '.next'
gsd headless query | jq '.cost.total'
```

### `status`

Progress dashboard (TUI overlay — useful interactively, not for parsing).

```bash
gsd headless status
```

### `history`

Execution history. Supports `--cost`, `--phase`, `--model`, and `limit` arguments.

```bash
gsd headless history
```
## Unit Control

### `skip`

Prevent a unit from being dispatched in auto-mode.

```bash
gsd headless skip
```

### `undo`

Revert the last completed unit. Use `--force` to bypass confirmation.

```bash
gsd headless undo
gsd headless undo --force
```

### `steer <description>`

Hard-steer plan documents during execution. Useful for mid-course corrections.

```bash
gsd headless steer "Skip the blocked dependency, use mock instead"
```

### `queue`

Queue and reorder future milestones.

```bash
gsd headless queue
```
## Configuration & Health

### `doctor`

Runtime health checks with auto-fix.

```bash
gsd headless doctor
```

### `prefs`

Manage preferences (global/project/status/wizard/setup).

```bash
gsd headless prefs
```

### `knowledge <rule|pattern|lesson>`

Add persistent project knowledge.

```bash
gsd headless knowledge "Always use UTC timestamps in API responses"
```
## Phases

GSD workflows progress through these phases:

```
pre-planning → needs-discussion → discussing → researching → planning →
executing → verifying → summarizing → advancing → validating-milestone →
completing-milestone → complete
```

Special phases: `paused`, `blocked`, `replanning-slice`
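
A coarse classifier over these phase names can drive orchestrator logic; the actions follow the phase table in `workflows/monitor-and-poll.md`. A sketch:

```bash
# Map a GSD phase name to a coarse orchestrator action.
phase_action() {
  case "$1" in
    complete)                      echo "verify" ;;
    blocked)                       echo "intervene" ;;
    paused)                        echo "resume" ;;
    pre-planning|needs-discussion) echo "dispatch" ;;
    *)                             echo "wait" ;;   # all in-progress phases
  esac
}

phase_action executing   # prints: wait
phase_action blocked     # prints: intervene
```
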
## Hierarchy

- **Milestone**: Shippable version (4–10 slices, 1–4 weeks)
- **Slice**: One demoable vertical capability (1–7 tasks, 1–3 days)
- **Task**: One context-window-sized unit of work (one session)
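
Unit IDs follow this hierarchy as slash-separated paths (e.g. the `M001/S01/T01` shape that `query` returns in `.next.unitId`). Splitting one with parameter expansion:

```bash
UNIT_ID="M001/S01/T01"

MILESTONE=${UNIT_ID%%/*}                 # M001
TASK=${UNIT_ID##*/}                      # T01
SLICE=${UNIT_ID#*/}; SLICE=${SLICE%%/*}  # S01

echo "$MILESTONE $SLICE $TASK"   # prints: M001 S01 T01
```
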

---

**New file:** `gsd-orchestrator/references/json-result.md` (162 lines)
# HeadlessJsonResult Reference
When using `--output-format json`, GSD collects events silently and emits a single `HeadlessJsonResult` JSON object to stdout at process exit. This is the structured result for orchestrator decision-making.

## Obtaining the Result

```bash
# Capture the JSON result
RESULT=$(gsd headless --output-format json next 2>/dev/null)
EXIT=$?

# Parse fields with jq
echo "$RESULT" | jq '.status'
echo "$RESULT" | jq '.cost.total'
echo "$RESULT" | jq '.nextAction'
```

**Important:** Progress text goes to stderr. The JSON result goes to stdout. Redirect stderr to `/dev/null` when parsing stdout.
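
The stdout/stderr split can be demonstrated with a stand-in; `fake_gsd` is hypothetical and only mimics GSD's behavior of writing progress to stderr and the result to stdout:

```bash
fake_gsd() {
  echo "progress: executing..." >&2   # progress text -> stderr
  echo '{"status":"success"}'         # JSON result  -> stdout
}

RESULT=$(fake_gsd 2>/dev/null)   # stderr discarded, stdout captured
echo "$RESULT"                   # prints: {"status":"success"}
```
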
## Field Reference

### Top-Level Fields

| Field | Type | Description |
|-------|------|-------------|
| `status` | `"success" \| "error" \| "blocked" \| "cancelled" \| "timeout"` | Final session status. Maps directly to exit codes. |
| `exitCode` | `number` | Process exit code: `0` (success), `1` (error/timeout), `10` (blocked), `11` (cancelled). |
| `sessionId` | `string \| undefined` | Session identifier. Pass to `--resume <id>` to continue this session. |
| `duration` | `number` | Session wall-clock duration in milliseconds. |
| `cost` | `CostObject` | Token usage and cost breakdown. See below. |
| `toolCalls` | `number` | Total number of tool calls made during the session. |
| `events` | `number` | Total number of events processed during the session. |
| `milestone` | `string \| undefined` | Active milestone ID (e.g. `"M001"`). |
| `phase` | `string \| undefined` | Current GSD phase at session end (e.g. `"executing"`, `"blocked"`, `"complete"`). |
| `nextAction` | `string \| undefined` | Recommended next action from the state machine (e.g. `"dispatch"`, `"complete"`). |
| `artifacts` | `string[] \| undefined` | Paths to artifacts created or modified during the session. |
| `commits` | `string[] \| undefined` | Git commit SHAs created during the session. |

### Status → Exit Code Mapping

| Status | Exit Code | Constant | Meaning |
|--------|-----------|----------|---------|
| `success` | `0` | `EXIT_SUCCESS` | Unit or milestone completed successfully |
| `error` | `1` | `EXIT_ERROR` | Runtime error or LLM failure |
| `timeout` | `1` | `EXIT_ERROR` | `--timeout` deadline exceeded |
| `blocked` | `10` | `EXIT_BLOCKED` | Execution blocked — needs human intervention |
| `cancelled` | `11` | `EXIT_CANCELLED` | Cancelled by user or orchestrator |

### Cost Object

| Field | Type | Description |
|-------|------|-------------|
| `cost.total` | `number` | Total cost in USD for the session. |
| `cost.input_tokens` | `number` | Number of input tokens consumed. |
| `cost.output_tokens` | `number` | Number of output tokens generated. |
| `cost.cache_read_tokens` | `number` | Number of tokens served from prompt cache. |
| `cost.cache_write_tokens` | `number` | Number of tokens written to prompt cache. |
## Parsing Patterns

### Decision-Making After Each Step

```bash
RESULT=$(gsd headless --output-format json next 2>/dev/null)
EXIT=$?

case $EXIT in
  0)
    PHASE=$(echo "$RESULT" | jq -r '.phase')
    NEXT=$(echo "$RESULT" | jq -r '.nextAction')
    echo "Success — phase: $PHASE, next: $NEXT"
    ;;
  1)
    STATUS=$(echo "$RESULT" | jq -r '.status')
    echo "Failed — status: $STATUS"
    ;;
  10)
    echo "Blocked — needs intervention"
    gsd headless query | jq '.state'
    ;;
  11)
    echo "Cancelled"
    ;;
esac
```
### Cost Tracking

```bash
RESULT=$(gsd headless --output-format json next 2>/dev/null)

COST=$(echo "$RESULT" | jq -r '.cost.total')
INPUT=$(echo "$RESULT" | jq -r '.cost.input_tokens')
OUTPUT=$(echo "$RESULT" | jq -r '.cost.output_tokens')

echo "Cost: \$$COST (${INPUT} in / ${OUTPUT} out)"
```
### Session Resumption

```bash
# First run — capture session ID
RESULT=$(gsd headless --output-format json next 2>/dev/null)
SESSION_ID=$(echo "$RESULT" | jq -r '.sessionId')

# Resume the same session later
gsd headless --resume "$SESSION_ID" --output-format json next 2>/dev/null
```
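
Note that `sessionId` may be absent (it is `string | undefined`), and `jq -r '.sessionId'` then prints the literal string `null`. Guard before passing it to `--resume`; a pure-shell sketch with the value hard-coded:

```bash
SESSION_ID="null"   # what `jq -r` prints when .sessionId is missing

case "$SESSION_ID" in
  null|"") echo "no session to resume" ;;
  *)       echo "resuming $SESSION_ID" ;;
esac
```
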
### Artifact Collection

```bash
RESULT=$(gsd headless --output-format json auto 2>/dev/null)

# List files created/modified
echo "$RESULT" | jq -r '.artifacts[]?'

# List commits made
echo "$RESULT" | jq -r '.commits[]?'
```
## Example Result

```json
{
  "status": "success",
  "exitCode": 0,
  "sessionId": "abc123def456",
  "duration": 45200,
  "cost": {
    "total": 0.42,
    "input_tokens": 15000,
    "output_tokens": 3500,
    "cache_read_tokens": 8000,
    "cache_write_tokens": 2000
  },
  "toolCalls": 12,
  "events": 87,
  "milestone": "M001",
  "phase": "executing",
  "nextAction": "dispatch",
  "artifacts": [
    ".gsd/milestones/M001/slices/S01/tasks/T01-SUMMARY.md"
  ],
  "commits": [
    "a1b2c3d"
  ]
}
```
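
When `jq` is unavailable, flat string fields can be pulled out with `sed`. A fragile fallback sketch for illustration only (prefer `jq`; the sample object is hard-coded):

```bash
RESULT='{"status":"success","exitCode":0,"sessionId":"abc123def456"}'

# Extract a top-level string field; breaks on nested objects or escaped quotes.
STATUS=$(printf '%s' "$RESULT" | sed -n 's/.*"status":"\([^"]*\)".*/\1/p')
echo "$STATUS"   # prints: success
```
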
## Combined with `query` for the Full Picture

The `HeadlessJsonResult` captures what happened during a session. Use `query` for the current project state:

```bash
# What happened in this step?
RESULT=$(gsd headless --output-format json next 2>/dev/null)
echo "$RESULT" | jq '{status, cost: .cost.total, phase}'

# What's the overall project state now?
gsd headless query | jq '{phase: .state.phase, progress: .state.progress, totalCost: .cost.total}'
```

---

**New file:** `gsd-orchestrator/templates/spec.md` (20 lines)
# [Product Name]
## What

[One paragraph: what this product does. Be concrete — "A CLI tool that converts CSV files to JSON", not "A data transformation solution".]

## Requirements

- [User can DO something specific and observable]
- [User can DO another specific thing]
- [System DOES something automatically]
- [Error case: system handles X gracefully]

## Technical Constraints

- Language: [Node.js / Python / Go / Rust / etc.]
- Framework: [Express / FastAPI / none / etc.]
- External dependencies: [list APIs, databases, services]
- Environment: [Node >= 22 / Python 3.12+ / etc.]

## Out of Scope

- [Explicit exclusion 1 — prevents scope creep]
- [Explicit exclusion 2]

---

**New file:** `gsd-orchestrator/workflows/build-from-spec.md` (184 lines)
# Build From Spec
End-to-end workflow: take a product idea or specification and produce working software.

## Prerequisites

- `gsd` CLI installed (`npm install -g gsd-pi`)
- A directory for the project (can be empty)
- Git initialized in the directory

## Process

### Step 1: Prepare the project directory

```bash
PROJECT_DIR="/tmp/my-project-name"
mkdir -p "$PROJECT_DIR"
cd "$PROJECT_DIR"
git init 2>/dev/null  # GSD needs a git repo
```
### Step 2: Write the spec file

Write a spec file that describes what to build. More detail = better results.

```bash
cat > spec.md << 'SPEC'
# Product Name

## What
[Concrete description of what to build]

## Requirements
- [Specific, testable requirement 1]
- [Specific, testable requirement 2]
- [Specific, testable requirement 3]

## Technical Constraints
- [Language, framework, or platform requirements]
- [External services or APIs involved]
- [Performance or security requirements]

## Out of Scope
- [Things explicitly NOT included]
SPEC
```

**Spec quality matters.** Vague specs produce vague results. Include:

- What the user can DO when it's done (not what code to write)
- Technical constraints (language, framework, Node version)
- What's out of scope (prevents scope creep)
### Step 3: Launch the build

**Fire-and-forget (simplest — GSD does everything):**

```bash
cd "$PROJECT_DIR"
RESULT=$(gsd headless --output-format json --timeout 0 --context spec.md new-milestone --auto 2>/dev/null)
EXIT=$?
```

`--timeout 0` disables the timeout for long builds. `--auto` chains milestone creation into execution.

**With budget limit:**

```bash
# Use step-by-step mode with budget checks instead of auto
# See workflows/step-by-step.md
```

**For CI or ecosystem runs (no user config):**

```bash
RESULT=$(gsd headless --bare --output-format json --timeout 0 --context spec.md new-milestone --auto 2>/dev/null)
EXIT=$?
```
### Step 4: Handle the result

```bash
case $EXIT in
  0)
    # Success — verify deliverables
    STATUS=$(echo "$RESULT" | jq -r '.status')
    COST=$(echo "$RESULT" | jq -r '.cost.total')
    COMMITS=$(echo "$RESULT" | jq -r '.commits | length')
    echo "Build complete: $STATUS, cost: \$$COST, commits: $COMMITS"

    # Inspect what was built
    gsd headless query | jq '.state.progress'

    # Check the actual files
    ls -la "$PROJECT_DIR"
    ;;
  1)
    # Error — inspect and decide
    echo "Build failed"
    echo "$RESULT" | jq '{status: .status, phase: .phase}'

    # Check state for details
    gsd headless query | jq '.state'
    ;;
  10)
    # Blocked — needs intervention
    echo "Build blocked — needs human input"
    gsd headless query | jq '{phase: .state.phase, blockers: .state.blockers}'

    # Options: steer, supply answers, or escalate
    # See workflows/monitor-and-poll.md for blocker handling
    ;;
  11)
    echo "Build was cancelled"
    ;;
esac
```
### Step 5: Verify deliverables

After a successful build, verify the output:

```bash
cd "$PROJECT_DIR"

# Check project state
gsd headless query | jq '{
  phase: .state.phase,
  progress: .state.progress,
  cost: .cost.total
}'

# Check git log for what was built
git log --oneline

# Run the project's own tests if they exist
[ -f package.json ] && npm test 2>/dev/null
[ -f Makefile ] && make test 2>/dev/null
```
## Complete Example

```bash
# 1. Setup
mkdir -p /tmp/todo-api && cd /tmp/todo-api && git init

# 2. Write spec
cat > spec.md << 'SPEC'
# Todo API

Build a REST API for managing todo items using Node.js and Express.

## Requirements
- GET /todos — list all todos
- POST /todos — create a todo (title, completed)
- PUT /todos/:id — update a todo
- DELETE /todos/:id — delete a todo
- Todos stored in-memory (no database)
- Input validation with descriptive error messages
- Health check endpoint at GET /health

## Technical Constraints
- Node.js with ESM modules
- Express framework
- No external database — in-memory array
- Port configurable via PORT env var (default 3000)

## Out of Scope
- Authentication
- Persistent storage
- Frontend
SPEC

# 3. Launch
RESULT=$(gsd headless --output-format json --timeout 0 --context spec.md new-milestone --auto 2>/dev/null)
EXIT=$?

# 4. Report
if [ $EXIT -eq 0 ]; then
  COST=$(echo "$RESULT" | jq -r '.cost.total')
  echo "Build complete (\$$COST)"
  echo "Files created:"
  find . -not -path './.gsd/*' -not -path './.git/*' -type f
else
  echo "Build failed (exit $EXIT)"
  echo "$RESULT" | jq .
fi
```

---

**New file:** `gsd-orchestrator/workflows/monitor-and-poll.md` (187 lines)
# Monitor and Poll
Check the status of a GSD project, handle blockers, track costs, and decide next actions.

## Checking Project State

The `query` command is your primary monitoring tool. It's instant (~50ms), costs nothing (no LLM), and returns the full project snapshot.

```bash
cd /path/to/project
gsd headless query
```

### Key fields to inspect

```bash
# Overall status
gsd headless query | jq '{
  phase: .state.phase,
  milestone: .state.activeMilestone.id,
  slice: .state.activeSlice.id,
  task: .state.activeTask.id,
  progress: .state.progress,
  cost: .cost.total
}'

# What should happen next
gsd headless query | jq '.next'
# Returns: { "action": "dispatch", "unitType": "execute-task", "unitId": "M001/S01/T01" }

# Is it done?
gsd headless query | jq '.state.phase'
# "complete" = done, "blocked" = needs you, anything else = in progress
```
### Phase meanings

| Phase | Meaning | Your action |
|-------|---------|-------------|
| `pre-planning` | Milestone exists, no slices planned yet | Run `auto` or `next` |
| `needs-discussion` | Ambiguities need resolution | Supply answers or run with defaults |
| `discussing` | Discussion in progress | Wait |
| `researching` | Codebase/library research | Wait |
| `planning` | Creating task plans | Wait |
| `executing` | Writing code | Wait |
| `verifying` | Checking must-haves | Wait |
| `summarizing` | Recording what happened | Wait |
| `advancing` | Moving to next task/slice | Wait |
| `evaluating-gates` | Quality checks before execution | Wait or run `next` |
| `validating-milestone` | Final milestone checks | Wait |
| `completing-milestone` | Archiving and cleanup | Wait |
| `complete` | Done | Verify deliverables |
| `blocked` | Needs human input | Handle blocker (see below) |
| `paused` | Explicitly paused | Resume with `auto` |
## Handling Blockers

When the exit code is `10` or the phase is `blocked`:

```bash
# 1. Understand the blocker
gsd headless query | jq '{phase: .state.phase, blockers: .state.blockers, nextAction: .state.nextAction}'

# 2. Option A: Steer around it
gsd headless steer "Skip the database dependency, use in-memory storage instead"

# 3. Option B: Supply pre-built answers
cat > fix.json << 'EOF'
{
  "questions": { "blocked_question_id": "workaround_option" },
  "defaults": { "strategy": "first_option" }
}
EOF
gsd headless --answers fix.json auto

# 4. Option C: Force a specific phase
gsd headless dispatch replan

# 5. Option D: Escalate to the user
echo "GSD build blocked. Phase: $(gsd headless query | jq -r '.state.phase')"
echo "Manual intervention required."
```
## Cost Tracking

```bash
# Current cumulative cost
gsd headless query | jq '.cost.total'

# Per-worker breakdown
gsd headless query | jq '.cost.workers'

# After a step (from HeadlessJsonResult)
RESULT=$(gsd headless --output-format json next 2>/dev/null)
echo "$RESULT" | jq '.cost'
```
### Budget enforcement pattern

```bash
MAX_BUDGET=15.00

check_budget() {
  TOTAL=$(gsd headless query | jq -r '.cost.total')
  OVER=$(echo "$TOTAL > $MAX_BUDGET" | bc -l)
  if [ "$OVER" = "1" ]; then
    echo "Budget exceeded: \$$TOTAL > \$$MAX_BUDGET"
    gsd headless stop
    return 1
  fi
  return 0
}
```
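
If `bc` is not installed, the same comparison works in integer cents with POSIX arithmetic. A sketch with hard-coded amounts, assuming cost strings have at most two decimal places:

```bash
# Convert a dollar string to integer cents, e.g. "12.5" -> 1250, "0.42" -> 42.
to_cents() {
  whole=${1%%.*}
  frac=${1#*.}
  [ "$frac" = "$1" ] && frac=0                 # no decimal point at all
  frac=$(printf '%-2s' "$frac" | tr ' ' '0')   # right-pad to two digits
  echo $(( whole * 100 + ${frac#0} ))
}

TOTAL="12.50"
MAX_BUDGET="15.00"
if [ "$(to_cents "$TOTAL")" -gt "$(to_cents "$MAX_BUDGET")" ]; then
  echo "over budget"
else
  echo "within budget"
fi
```
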
|
||||
## Poll-and-React Loop
|
||||
|
||||
For agents that need to periodically check on a build:
|
||||
|
||||
```bash
|
||||
cd /path/to/project
|
||||
|
||||
poll_project() {
|
||||
STATE=$(gsd headless query 2>/dev/null)
|
||||
if [ -z "$STATE" ]; then
|
||||
echo "NO_PROJECT"
|
||||
return
|
||||
fi
|
||||
|
||||
PHASE=$(echo "$STATE" | jq -r '.state.phase')
|
||||
COST=$(echo "$STATE" | jq -r '.cost.total')
|
||||
PROGRESS=$(echo "$STATE" | jq -r '"\(.state.progress.milestones.done)/\(.state.progress.milestones.total) milestones, \(.state.progress.tasks.done)/\(.state.progress.tasks.total) tasks"')
|
||||
|
||||
case "$PHASE" in
|
||||
complete)
|
||||
echo "COMPLETE cost=\$$COST progress=$PROGRESS"
|
||||
;;
|
||||
blocked)
|
||||
BLOCKER=$(echo "$STATE" | jq -r '.state.nextAction // "unknown"')
|
||||
echo "BLOCKED reason=$BLOCKER cost=\$$COST"
|
||||
;;
|
||||
*)
|
||||
NEXT=$(echo "$STATE" | jq -r '.next.action // "none"')
|
||||
echo "IN_PROGRESS phase=$PHASE next=$NEXT cost=\$$COST progress=$PROGRESS"
|
||||
;;
|
||||
esac
|
||||
}
|
||||
```
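
The function above pairs naturally with a small supervisor loop. A sketch follows; the `supervise` helper and the 300-second default interval are illustrative choices, not GSD APIs:

```shell
# Re-runs a status command until it reports a terminal state, sleeping in between.
supervise() {  # usage: supervise <status-command>, e.g. supervise poll_project
  while :; do
    STATUS=$("$@")
    echo "$STATUS"
    case "$STATUS" in
      COMPLETE*|BLOCKED*|NO_PROJECT) break ;;  # terminal states emitted by poll_project
    esac
    sleep "${POLL_INTERVAL:-300}"
  done
}
```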

## Resuming Work

If a build was interrupted or you need to continue:

```bash
cd /path/to/project

# Check current state
gsd headless query | jq '.state.phase'

# Resume from where it left off
gsd headless --output-format json auto 2>/dev/null

# Or resume a specific session
gsd headless --resume "$SESSION_ID" --output-format json auto 2>/dev/null
```

## Reading Build Artifacts

After completion, inspect what GSD produced:

```bash
cd /path/to/project

# Project summary
cat .gsd/PROJECT.md

# What was decided
cat .gsd/DECISIONS.md

# Requirements and their validation status
cat .gsd/REQUIREMENTS.md

# Milestone summary
cat .gsd/milestones/M001-*/M001-*-SUMMARY.md 2>/dev/null

# Git history (GSD commits per-slice)
git log --oneline
```

`gsd-orchestrator/workflows/step-by-step.md`

# Step-by-Step Execution

Run GSD one unit at a time with decision points between steps. Use this when you need control over execution — budget enforcement, progress reporting, conditional logic, or the ability to steer mid-build.

## When to use this vs `auto`

| Approach | Use when |
|----------|----------|
| `auto` | You trust the build, just want the result |
| `next` loop | You need budget checks, progress updates, or intervention points |

## Core Loop

```bash
cd /path/to/project
MAX_BUDGET=20.00
TOTAL_COST=0

while true; do
  # Run one unit
  RESULT=$(gsd headless --output-format json next 2>/dev/null)
  EXIT=$?

  # Parse result
  STATUS=$(echo "$RESULT" | jq -r '.status')
  STEP_COST=$(echo "$RESULT" | jq -r '.cost.total')
  PHASE=$(echo "$RESULT" | jq -r '.phase // empty')
  SESSION_ID=$(echo "$RESULT" | jq -r '.sessionId // empty')

  # Handle exit codes
  case $EXIT in
    0) ;; # success — continue
    1)
      echo "Step failed: $STATUS"
      break
      ;;
    10)
      echo "Blocked — needs intervention"
      gsd headless query | jq '.state'
      break
      ;;
    11)
      echo "Cancelled"
      break
      ;;
  esac

  # Check if milestone complete
  CURRENT_PHASE=$(gsd headless query | jq -r '.state.phase')
  if [ "$CURRENT_PHASE" = "complete" ]; then
    TOTAL_COST=$(gsd headless query | jq -r '.cost.total')
    echo "Milestone complete. Total cost: \$$TOTAL_COST"
    break
  fi

  # Budget check
  TOTAL_COST=$(gsd headless query | jq -r '.cost.total')
  OVER=$(echo "$TOTAL_COST > $MAX_BUDGET" | bc -l)
  if [ "$OVER" = "1" ]; then
    echo "Budget limit (\$$MAX_BUDGET) exceeded at \$$TOTAL_COST"
    gsd headless stop
    break
  fi

  # Progress report
  PROGRESS=$(gsd headless query | jq -r '"\(.state.progress.tasks.done)/\(.state.progress.tasks.total) tasks"')
  echo "Step done ($STATUS). Phase: $CURRENT_PHASE, Progress: $PROGRESS, Cost: \$$TOTAL_COST"
done
```

## Step-by-Step with Spec Creation

Complete flow from idea to working code with full control:

```bash
# 1. Setup
PROJECT_DIR="/tmp/my-project"
mkdir -p "$PROJECT_DIR" && cd "$PROJECT_DIR" && git init 2>/dev/null

# 2. Write spec
cat > spec.md << 'SPEC'
[Your spec here]
SPEC

# 3. Create the milestone (planning only, no execution)
RESULT=$(gsd headless --output-format json --context spec.md new-milestone 2>/dev/null)
EXIT=$?

if [ $EXIT -ne 0 ]; then
  echo "Milestone creation failed"
  echo "$RESULT" | jq .
  exit 1
fi

echo "Milestone created. Starting execution..."

# 4. Execute step-by-step
STEP=0
while true; do
  STEP=$((STEP + 1))
  RESULT=$(gsd headless --output-format json next 2>/dev/null)
  EXIT=$?

  [ $EXIT -ne 0 ] && break

  PHASE=$(gsd headless query | jq -r '.state.phase')
  COST=$(gsd headless query | jq -r '.cost.total')

  echo "Step $STEP complete. Phase: $PHASE, Cost: \$$COST"

  [ "$PHASE" = "complete" ] && break
done

echo "Build finished in $STEP steps"
```

## Intervention Patterns

### Steer mid-execution

If you detect the build going in the wrong direction:

```bash
# Check what's happening
gsd headless query | jq '{phase: .state.phase, task: .state.activeTask}'

# Redirect
gsd headless steer "Use SQLite instead of PostgreSQL for storage"

# Continue
gsd headless --output-format json next 2>/dev/null
```

### Skip a stuck unit

```bash
gsd headless skip
gsd headless --output-format json next 2>/dev/null
```

### Undo last completed unit

```bash
gsd headless undo --force
gsd headless --output-format json next 2>/dev/null
```

### Force a specific phase

```bash
gsd headless dispatch replan   # Re-plan the current slice
gsd headless dispatch execute  # Skip to execution
gsd headless dispatch uat      # Jump to user acceptance testing
```

`mintlify-docs/docs.json`

{
  "$schema": "https://mintlify.com/docs.json",
  "theme": "mint",
  "name": "GSD",
  "logo": {
    "light": "/images/logo.svg",
    "dark": "/images/logo.svg",
    "href": "https://github.com/gsd-build/gsd-2/tree/main/docs"
  },
  "favicon": "/images/favicon.svg",
  "colors": {
    "primary": "#7dcfff",
    "light": "#7dcfff",
    "dark": "#1a1b26"
  },
  "appearance": {
    "default": "dark"
  },
  "background": {
    "decoration": "gradient"
  },
  "fonts": {
    "heading": {
      "family": "JetBrains Mono",
      "weight": 700
    },
    "body": {
      "family": "Inter",
      "weight": 400
    }
  },
  "navbar": {
    "links": [
      {
        "label": "GitHub",
        "href": "https://github.com/gsd-build/gsd-2"
      }
    ],
    "primary": {
      "type": "button",
      "label": "Install",
      "href": "/getting-started"
    }
  },
  "footer": {
    "socials": {
      "github": "https://github.com/gsd-build/gsd-2"
    }
  },
  "navigation": {
    "groups": [
      {
        "group": "Getting started",
        "pages": [
          "introduction",
          "getting-started"
        ]
      },
      {
        "group": "Core concepts",
        "pages": [
          "guides/auto-mode",
          "guides/commands",
          "guides/git-strategy"
        ]
      },
      {
        "group": "Configuration",
        "pages": [
          "guides/configuration",
          "guides/custom-models",
          "guides/token-optimization",
          "guides/dynamic-model-routing",
          "guides/cost-management"
        ]
      },
      {
        "group": "Features",
        "pages": [
          "guides/captures-triage",
          "guides/parallel-orchestration",
          "guides/remote-questions",
          "guides/skills",
          "guides/visualizer",
          "guides/web-interface",
          "guides/working-in-teams"
        ]
      },
      {
        "group": "Reference",
        "pages": [
          "guides/troubleshooting",
          "guides/migration"
        ]
      }
    ]
  },
  "search": {
    "prompt": "Search GSD docs..."
  }
}

`mintlify-docs/getting-started.mdx`

---
title: "Getting started"
description: "Install GSD, configure your LLM provider, and run your first autonomous session."
---

## Install

```bash
npm install -g gsd-pi
```

Requires Node.js 22+ and Git.

<Note>
**`command not found: gsd`?** Your shell may not have npm's global bin directory in `$PATH`. Run `npm prefix -g` to find it, then add `$(npm prefix -g)/bin` to your PATH. See [troubleshooting](/guides/troubleshooting) for details.
</Note>

GSD checks for updates every 24 hours. Update in-session with `/gsd update`.

## First launch

```bash
gsd
```

On first launch, a setup wizard walks you through:

1. **LLM provider** — 20+ providers (Anthropic, OpenAI, Google, OpenRouter, GitHub Copilot, Amazon Bedrock, Azure, and more). OAuth handles Claude Max and Copilot subscriptions automatically; otherwise paste an API key.
2. **Tool API keys** (optional) — Brave Search, Context7, Jina, Slack, Discord. Press Enter to skip any.

Re-run the wizard anytime:

```bash
gsd config
```

### Set up API keys

For non-Anthropic models, you may need a search API key. Run `/gsd config` to set keys globally — they're saved to `~/.gsd/agent/auth.json` and apply to all projects.

### Set up MCP servers

To connect GSD to local or external MCP servers, add project-local config in `.mcp.json` or `.gsd/mcp.json`. See [configuration](/guides/configuration) for examples. Use `/gsd mcp` to verify connectivity.

### Offline mode

GSD works fully offline with local models (Ollama, vLLM, LM Studio). Configure a [custom model](/guides/custom-models) and GSD handles the rest — no internet connection required.

## Choose a model

GSD auto-selects a default model after login. Switch anytime:

```
/model
```

Or configure per-phase models in [preferences](/guides/configuration).

## Two ways to work

<Tabs>
  <Tab title="Step mode">
    Type `/gsd` inside a session. GSD executes one unit at a time, pausing between each with a wizard showing what completed and what's next.

    - **No `.gsd/` directory** → starts a discussion to capture your project vision
    - **Milestone exists, no roadmap** → discuss or research the milestone
    - **Roadmap exists, slices pending** → plan the next slice or execute a task
    - **Mid-task** → resume where you left off
  </Tab>
  <Tab title="Auto mode">
    Type `/gsd auto` and walk away. GSD autonomously researches, plans, executes, verifies, commits, and advances through every slice until the milestone is complete.

    ```
    /gsd auto
    ```

    See [auto mode](/guides/auto-mode) for the full details.
  </Tab>
</Tabs>

## Two terminals, one project

The recommended workflow: auto mode in one terminal, steering from another.

**Terminal 1 — let it build:**

```bash
gsd
/gsd auto
```

**Terminal 2 — steer while it works:**

```bash
gsd
/gsd discuss   # talk through architecture decisions
/gsd status    # check progress
/gsd queue     # queue the next milestone
```

Both terminals read and write the same `.gsd/` files. Decisions in terminal 2 are picked up at the next phase boundary automatically.

## Project structure

GSD organizes work into a hierarchy:

```
Milestone → a shippable version (4-10 slices)
  Slice → one demoable vertical capability (1-7 tasks)
    Task → one context-window-sized unit of work
```

All state lives on disk in `.gsd/`:

<Accordion title="Directory structure">
```
.gsd/
  PROJECT.md       — what the project is right now
  REQUIREMENTS.md  — requirement contract (active/validated/deferred)
  DECISIONS.md     — append-only architectural decisions
  KNOWLEDGE.md     — cross-session rules, patterns, and lessons
  RUNTIME.md       — runtime context: API endpoints, env vars, services
  STATE.md         — quick-glance status
  milestones/
    M001/
      M001-ROADMAP.md  — slice plan with risk levels and dependencies
      M001-CONTEXT.md  — scope and goals from discussion
      slices/
        S01/
          S01-PLAN.md     — task decomposition
          S01-SUMMARY.md  — what happened
          S01-UAT.md      — human test script
          tasks/
            T01-PLAN.md
            T01-SUMMARY.md
```
</Accordion>

## Resume a session

```bash
gsd --continue   # or gsd -c
```

Resumes the most recent session. To pick from all saved sessions:

```bash
gsd sessions
```

## VS Code extension

GSD is also available as a VS Code extension (publisher: FluxLabs). It provides:

- **`@gsd` chat participant** — talk to the agent in VS Code Chat
- **Sidebar dashboard** — connection status, model info, token usage, quick actions
- **Full command palette** — start/stop agent, switch models, export sessions

The CLI (`gsd-pi`) must be installed first — the extension connects to it via RPC.

## Web interface

```bash
gsd --web
```

A browser-based dashboard with real-time progress and multi-project support. See [web interface](/guides/web-interface) for details.

## Troubleshooting

### `gsd` runs `git svn dcommit` instead of GSD

The [oh-my-zsh git plugin](https://github.com/ohmyzsh/ohmyzsh/tree/master/plugins/git) defines `alias gsd='git svn dcommit'`.

**Option 1** — Remove the alias in `~/.zshrc` (after the `source $ZSH/oh-my-zsh.sh` line):

```bash
unalias gsd 2>/dev/null
```

**Option 2** — Use the alternative binary name:

```bash
gsd-cli
```

Both `gsd` and `gsd-cli` point to the same binary.

`mintlify-docs/guides/auto-mode.mdx`

---
title: "Auto mode"
description: "GSD's autonomous execution engine — run /gsd auto, walk away, come back to built software with clean git history."
---

Auto mode is a **state machine driven by files on disk**. It reads `.gsd/STATE.md`, determines the next unit of work, creates a fresh agent session with pre-loaded context, and lets the LLM execute. When the LLM finishes, auto mode reads disk state again and dispatches the next unit.

## The loop

```
Plan → Execute (per task) → Complete → Reassess Roadmap → Next Slice
                                          ↓ (all slices done)
                               Validate → Complete Milestone
```

- **Plan** — scouts the codebase, researches docs, decomposes the slice into tasks
- **Execute** — runs each task in a fresh context window
- **Complete** — writes summary, UAT script, marks roadmap, commits
- **Reassess** — checks if the roadmap still makes sense
- **Validate** — reconciliation gate after all slices; catches gaps before sealing the milestone

## Key properties

### Fresh session per unit

Every task, research phase, and planning step gets a clean context window. The dispatch prompt includes everything needed — task plans, prior summaries, dependency context, decisions register — so the LLM starts oriented.

### Context pre-loading

| Inlined artifact | Purpose |
|------------------|---------|
| Task plan | What to build |
| Slice plan | Where this task fits |
| Prior task summaries | What's already done |
| Dependency summaries | Cross-slice context |
| Roadmap excerpt | Overall direction |
| Decisions register | Architectural context |

The amount of context inlined is controlled by your [token profile](/guides/token-optimization). Budget mode inlines minimal context; quality mode inlines everything.

### Git isolation

GSD isolates milestone work using one of three modes (configured via `git.isolation` in preferences):

- **`none`** (default) — work happens on your current branch. No isolation overhead.
- **`worktree`** — each milestone runs in its own git worktree. Squash-merged to main on completion.
- **`branch`** — work happens on a `milestone/<MID>` branch in the project root. Useful for submodule-heavy repos.

See [git strategy](/guides/git-strategy) for details.
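
As an illustration, selecting worktree isolation in preferences might look like the fragment below. The nesting is an assumption inferred from the `git.isolation` key named above; check your preferences file for the authoritative format.

```yaml
# Assumed shape — only the git.isolation key is documented above.
git:
  isolation: worktree   # one of: none | worktree | branch
```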
|
||||
|
||||
### Crash recovery
|
||||
|
||||
A lock file tracks the current unit. If the session dies, the next `/gsd auto` synthesizes a recovery briefing from tool calls that made it to disk and resumes with full context.
|
||||
|
||||
**Headless auto-restart:** When running `gsd headless auto`, crashes trigger automatic restart with exponential backoff (5s → 10s → 30s cap, default 3 attempts). Combined with crash recovery, this enables overnight "run until done" execution.
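
The backoff schedule described above (5s → 10s → 30s cap) can be sketched as a small helper. This is illustrative only, not GSD's internal code:

```shell
# Delay in seconds before restart attempt N (1-based), capped at 30s.
backoff_delay() {
  case "$1" in
    1) echo 5 ;;
    2) echo 10 ;;
    *) echo 30 ;;   # attempt 3 and beyond hit the cap
  esac
}
```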

### Provider error recovery

| Error type | Examples | Action |
|-----------|----------|--------|
| Rate limit | 429, "too many requests" | Auto-resume after retry-after header or 60s |
| Server error | 500, 502, 503, "overloaded" | Auto-resume after 30s |
| Permanent | "unauthorized", "invalid key" | Pause indefinitely (requires manual resume) |

### Stuck detection

A sliding-window analysis detects stuck loops — catching cycles like A→B→A→B as well as single-unit repeats. On detection, GSD retries once with a diagnostic prompt. If it fails again, auto mode stops with the exact file it expected.
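
The idea can be illustrated with a minimal window check over the last four dispatched unit IDs. This is a sketch only; GSD's real detector is more involved:

```shell
# True when the window repeats with period 1 (A A A A) or period 2 (A B A B).
is_stuck() {  # usage: is_stuck U1 U2 U3 U4 (oldest first)
  [ "$1" = "$3" ] && [ "$2" = "$4" ]
}
```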

### Timeout supervision

| Timeout | Default | Behavior |
|---------|---------|----------|
| Soft | 20 min | Warns the LLM to wrap up |
| Idle | 10 min | Detects stalls, intervenes |
| Hard | 30 min | Pauses auto mode |

Configure in preferences:

```yaml
auto_supervisor:
  soft_timeout_minutes: 20
  idle_timeout_minutes: 10
  hard_timeout_minutes: 30
```

### Incremental memory

GSD maintains a `KNOWLEDGE.md` file — an append-only register of project-specific rules, patterns, and lessons learned. The agent reads it at the start of every unit and appends when discovering recurring issues or non-obvious patterns.

### Verification enforcement

```yaml
verification_commands:
  - npm run lint
  - npm run test
verification_auto_fix: true
verification_max_retries: 2
```

Failures trigger auto-fix retries — the agent sees the output and attempts to fix issues before advancing.

### HTML reports

After milestone completion, GSD auto-generates a self-contained HTML report with progress tree, dependency graph, cost/token metrics, execution timeline, and changelog.

```yaml
auto_report: true # enabled by default
```

Generate manually with `/gsd export --html`, or for all milestones with `/gsd export --html --all`.

### Reactive task execution

When `reactive_execution: true` is set, GSD derives a dependency graph from IO annotations in task plans. Tasks that don't conflict are dispatched in parallel via subagents.

```yaml
reactive_execution: true # disabled by default
```

## Controlling auto mode

<Steps>
  <Step title="Start">
    ```
    /gsd auto
    ```
  </Step>
  <Step title="Pause">
    Press **Escape**. The conversation is preserved. You can interact with the agent, inspect state, or resume.
  </Step>
  <Step title="Resume">
    ```
    /gsd auto
    ```
    Auto mode reads disk state and picks up where it left off.
  </Step>
  <Step title="Stop">
    ```
    /gsd stop
    ```
    Stops auto mode gracefully. Can be run from a different terminal.
  </Step>
</Steps>

### Steer during execution

```
/gsd steer
```

Hard-steer plan documents without stopping the pipeline. Changes are picked up at the next phase boundary.

### Capture thoughts

```
/gsd capture "add rate limiting to API endpoints"
```

Fire-and-forget thought capture. Triaged automatically between tasks. See [captures and triage](/guides/captures-triage).

## Dashboard

`Ctrl+Alt+G` or `/gsd status` shows real-time progress:

- Current milestone, slice, and task
- Auto mode elapsed time and phase
- Per-unit cost and token breakdown
- Cost projections
- Pending capture count

## Phase skipping

Token profiles can skip phases to reduce cost:

| Phase | `budget` | `balanced` | `quality` |
|-------|----------|------------|-----------|
| Milestone research | Skipped | Runs | Runs |
| Slice research | Skipped | Skipped | Runs |
| Reassess roadmap | Skipped | Runs | Runs |

See [token optimization](/guides/token-optimization) for details.

`mintlify-docs/guides/captures-triage.mdx`

---
title: "Captures and triage"
description: "Fire-and-forget thought capture during auto-mode with automated triage."
---

Captures let you fire-and-forget thoughts during auto-mode execution. Instead of pausing to steer, capture ideas, bugs, or scope changes and let GSD triage them at natural seams between tasks.

## Quick start

While auto-mode is running (or any time):

```
/gsd capture "add rate limiting to the API endpoints"
/gsd capture "the auth flow should support OAuth, not just JWT"
```

Captures are appended to `.gsd/CAPTURES.md` and triaged automatically between tasks.

## How it works

```
capture → triage → confirm → resolve → resume
```

<Steps>
  <Step title="Capture">
    `/gsd capture "thought"` appends to `.gsd/CAPTURES.md` with a timestamp and unique ID.
  </Step>
  <Step title="Triage">
    At natural seams between tasks, GSD classifies each capture.
  </Step>
  <Step title="Confirm">
    You're shown the proposed resolution. Plan-modifying resolutions require confirmation.
  </Step>
  <Step title="Resolve">
    The resolution is applied (task injection, replan trigger, deferral, etc.).
  </Step>
  <Step title="Resume">
    Auto-mode continues.
  </Step>
</Steps>

## Classification types

| Type | Meaning | Resolution |
|------|---------|------------|
| `quick-task` | Small, self-contained fix | Inline quick task executed immediately |
| `inject` | New task needed in current slice | Task injected into the active slice plan |
| `defer` | Important but not urgent | Deferred to roadmap reassessment |
| `replan` | Changes the current approach | Triggers slice replan with capture context |
| `note` | Informational, no action | Acknowledged, no plan changes |

## Manual triage

Trigger triage at any time:

```
/gsd triage
```

Useful when you've accumulated several captures and want to process them before the next natural seam.

## Dashboard integration

The progress widget shows a pending capture count badge when captures are waiting for triage. Visible in both the `Ctrl+Alt+G` dashboard and the auto-mode widget.

## Context injection

Capture context is automatically injected into:
- **Replan-slice prompts** — so the replan knows what triggered it
- **Reassess-roadmap prompts** — so deferred captures influence roadmap decisions

## Worktree awareness

Captures resolve to the **original project root's** `.gsd/CAPTURES.md`, not the worktree's local copy. Captures from a steering terminal are visible to the auto-mode session running in a worktree.

`mintlify-docs/guides/commands.mdx`

---
title: "Commands reference"
description: "Every GSD command, keyboard shortcut, and CLI flag."
---

## Session commands

| Command | Description |
|---------|-------------|
| `/gsd` | Step mode — execute one unit at a time, pause between each |
| `/gsd next` | Explicit step mode (same as `/gsd`) |
| `/gsd auto` | Autonomous mode — research, plan, execute, commit, repeat |
| `/gsd quick` | Execute a quick task with GSD guarantees without full planning overhead |
| `/gsd stop` | Stop auto mode gracefully |
| `/gsd pause` | Pause auto mode (preserves state, `/gsd auto` to resume) |
| `/gsd steer` | Hard-steer plan documents during execution |
| `/gsd discuss` | Discuss architecture and decisions (works alongside auto mode) |
| `/gsd rethink` | Conversational project reorganization |
| `/gsd mcp` | MCP server status and connectivity |
| `/gsd status` | Progress dashboard |
| `/gsd widget` | Cycle dashboard widget: full / small / min / off |
| `/gsd queue` | Queue and reorder future milestones (safe during auto mode) |
| `/gsd capture` | Fire-and-forget thought capture (works during auto mode) |
| `/gsd triage` | Manually trigger triage of pending captures |
| `/gsd dispatch` | Dispatch a specific phase directly |
| `/gsd history` | View execution history (supports `--cost`, `--phase`, `--model` filters) |
| `/gsd forensics` | Full-access debugger for auto-mode failures |
| `/gsd cleanup` | Clean up GSD state files and stale worktrees |
| `/gsd visualize` | Open workflow visualizer |
| `/gsd export --html` | Generate self-contained HTML report |
| `/gsd export --html --all` | Generate reports for all milestones |
| `/gsd update` | Update GSD to the latest version in-session |
| `/gsd knowledge` | Add persistent project knowledge |
| `/gsd fast` | Toggle service tier for supported models |
| `/gsd rate` | Rate last unit's model tier (over/ok/under) |
| `/gsd changelog` | Show categorized release notes |
| `/gsd logs` | Browse activity logs, debug logs, and metrics |
| `/gsd remote` | Control remote auto-mode |
| `/gsd help` | Categorized command reference |

## Configuration and diagnostics

| Command | Description |
|---------|-------------|
| `/gsd prefs` | Model selection, timeouts, budget ceiling |
| `/gsd mode` | Switch workflow mode (solo/team) |
| `/gsd config` | Re-run the provider setup wizard |
| `/gsd keys` | API key manager — list, add, remove, test, rotate |
| `/gsd doctor` | Runtime health checks with auto-fix |
| `/gsd inspect` | Show SQLite DB diagnostics |
| `/gsd init` | Project init wizard |
| `/gsd setup` | Global setup status and configuration |
| `/gsd skill-health` | Skill lifecycle dashboard |
| `/gsd hooks` | Show configured post-unit and pre-dispatch hooks |
| `/gsd run-hook` | Manually trigger a specific hook |
| `/gsd migrate` | Migrate a v1 `.planning` directory to `.gsd` format |

## Milestone management

| Command | Description |
|---------|-------------|
| `/gsd new-milestone` | Create a new milestone |
| `/gsd skip` | Prevent a unit from auto-mode dispatch |
| `/gsd undo` | Revert last completed unit |
| `/gsd undo-task` | Reset a specific task's completion state |
| `/gsd reset-slice` | Reset a slice and all its tasks |
| `/gsd park` | Park a milestone — skip without deleting |
| `/gsd unpark` | Reactivate a parked milestone |

## Parallel orchestration

| Command | Description |
|---------|-------------|
| `/gsd parallel start` | Analyze eligibility, confirm, and start workers |
| `/gsd parallel status` | Show all workers with state, progress, and cost |
| `/gsd parallel stop [MID]` | Stop all workers or a specific one |
| `/gsd parallel pause [MID]` | Pause all or a specific worker |
| `/gsd parallel resume [MID]` | Resume paused workers |
| `/gsd parallel merge [MID]` | Merge completed milestones to main |

## Workflow templates

| Command | Description |
|---------|-------------|
| `/gsd start` | Start a workflow template (bugfix, spike, feature, hotfix, refactor, etc.) |
| `/gsd start resume` | Resume an in-progress workflow |
| `/gsd templates` | List available workflow templates |
| `/gsd templates info <name>` | Show detailed template info |

## Custom workflows

| Command | Description |
|---------|-------------|
| `/gsd workflow new` | Create a new workflow definition |
| `/gsd workflow run <name>` | Create a run and start auto-mode |
| `/gsd workflow list` | List workflow runs |
| `/gsd workflow validate <name>` | Validate a workflow definition |
| `/gsd workflow pause` | Pause custom workflow auto-mode |
| `/gsd workflow resume` | Resume paused custom workflow auto-mode |

## Extensions

| Command | Description |
|---------|-------------|
| `/gsd extensions list` | List all extensions and their status |
| `/gsd extensions enable <id>` | Enable a disabled extension |
| `/gsd extensions disable <id>` | Disable an extension |
| `/gsd extensions info <id>` | Show extension details |

## Keyboard shortcuts

| Shortcut | Action |
|----------|--------|
| `Ctrl+Alt+G` | Toggle dashboard overlay |
| `Ctrl+Alt+V` | Toggle voice transcription |
| `Ctrl+Alt+B` | Show background shell processes |
| `Ctrl+V` / `Alt+V` | Paste image from clipboard |
| `Escape` | Pause auto mode |

<Note>
In terminals without Kitty keyboard protocol support (macOS Terminal.app, JetBrains IDEs), slash-command fallbacks are shown instead of `Ctrl+Alt` shortcuts.
</Note>

## CLI flags

| Flag | Description |
|------|-------------|
| `gsd` | Start a new interactive session |
| `gsd --continue` (`-c`) | Resume the most recent session |
| `gsd --model <id>` | Override the default model |
| `gsd --print "msg"` (`-p`) | Single-shot prompt mode (no TUI) |
| `gsd --mode <text\|json\|rpc\|mcp>` | Output mode for non-interactive use |
| `gsd --list-models [search]` | List available models and exit |
| `gsd --web [path]` | Start browser-based web interface |
| `gsd --worktree` (`-w`) `[name]` | Start session in a git worktree |
| `gsd --no-session` | Disable session persistence |
| `gsd --extension <path>` | Load an additional extension |
| `gsd --version` (`-v`) | Print version and exit |
| `gsd sessions` | Interactive session picker |
| `gsd config` | Set up global API keys |
| `gsd update` | Update GSD to the latest version |
|
||||
|
||||
## Headless mode

`gsd headless` runs commands without a TUI — designed for CI, cron jobs, and scripted automation.

```bash
gsd headless                          # run auto mode
gsd headless next                     # run a single unit
gsd headless query                    # instant JSON snapshot (~50ms, no LLM)
gsd headless --timeout 600000 auto    # with timeout
gsd headless new-milestone --context brief.md --auto
```

| Flag | Description |
|------|-------------|
| `--timeout N` | Overall timeout in milliseconds (default: 300000) |
| `--max-restarts N` | Auto-restart on crash (default: 3, set 0 to disable) |
| `--json` | Stream events as JSONL to stdout |
| `--model ID` | Override the model |
| `--context <file>` | Context file for `new-milestone` (use `-` for stdin) |
| `--auto` | Chain into auto-mode after milestone creation |

**Exit codes:** `0` = complete, `1` = error/timeout, `2` = blocked.

### `gsd headless query`

Returns a JSON snapshot of the project state — no LLM session, instant response.

```bash
gsd headless query | jq '.state.phase'   # "executing"
gsd headless query | jq '.next'          # next dispatch action
gsd headless query | jq '.cost.total'    # total spend
```

## MCP server mode

```bash
gsd --mode mcp
```

Runs GSD as a Model Context Protocol server over stdin/stdout, exposing all tools to external AI clients (Claude Desktop, VS Code Copilot, etc.).

mintlify-docs/guides/configuration.mdx

---
title: "Configuration"
description: "Preferences, model selection, MCP servers, hooks, and all settings."
---

GSD preferences live in `~/.gsd/PREFERENCES.md` (global) or `.gsd/PREFERENCES.md` (project-local). Manage interactively with `/gsd prefs`.

## Preferences commands

| Command | Description |
|---------|-------------|
| `/gsd prefs` | Open the global preferences wizard |
| `/gsd prefs global` | Global preferences wizard |
| `/gsd prefs project` | Project preferences wizard |
| `/gsd prefs status` | Show current files, merged values, and skill status |

## Preferences file format

Preferences use YAML frontmatter in a markdown file:

```yaml
---
version: 1
models:
  research: claude-sonnet-4-6
  planning: claude-opus-4-6
  execution: claude-sonnet-4-6
  completion: claude-sonnet-4-6
skill_discovery: suggest
auto_supervisor:
  soft_timeout_minutes: 20
  idle_timeout_minutes: 10
  hard_timeout_minutes: 30
budget_ceiling: 50.00
token_profile: balanced
---
```

## Global vs project preferences

| Scope | Path | Applies to |
|-------|------|-----------|
| Global | `~/.gsd/PREFERENCES.md` | All projects |
| Project | `.gsd/PREFERENCES.md` | Current project only |

**Merge behavior:**

- **Scalar fields** — project wins if defined
- **Array fields** — concatenated (global first, then project)
- **Object fields** — shallow-merged, project overrides per-key
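
The merge rules above can be sketched in a few lines of TypeScript (an illustrative sketch of the documented behavior only — `mergePrefs` is a hypothetical name, not GSD's actual implementation):

```typescript
// Sketch of the documented merge rules, assuming preferences parse to plain objects:
// scalars — project wins; arrays — concatenated global-first; objects — shallow-merged.
type Prefs = Record<string, unknown>;

function mergePrefs(global: Prefs, project: Prefs): Prefs {
  const merged: Prefs = { ...global };
  for (const [key, value] of Object.entries(project)) {
    const base = merged[key];
    if (Array.isArray(base) && Array.isArray(value)) {
      merged[key] = [...base, ...value]; // arrays: global entries first, then project
    } else if (
      base !== null && typeof base === "object" && !Array.isArray(base) &&
      value !== null && typeof value === "object" && !Array.isArray(value)
    ) {
      merged[key] = { ...(base as Prefs), ...(value as Prefs) }; // objects: project overrides per key
    } else {
      merged[key] = value; // scalars: project wins
    }
  }
  return merged;
}
```

For example, a `models` object defined in both files keeps the global `research` entry while the project's `planning` entry wins.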

## Global API keys

Tool API keys are stored globally in `~/.gsd/agent/auth.json`. Set them once with `/gsd config`.

| Tool | Environment variable | Purpose |
|------|---------------------|---------|
| Tavily Search | `TAVILY_API_KEY` | Web search for non-Anthropic models |
| Brave Search | `BRAVE_API_KEY` | Web search for non-Anthropic models |
| Context7 Docs | `CONTEXT7_API_KEY` | Library documentation lookup |

Anthropic models have built-in web search — no extra keys needed.

## MCP servers

GSD connects to external MCP servers configured in project files:

- `.mcp.json` — repo-shared config
- `.gsd/mcp.json` — local-only config

<Tabs>
<Tab title="stdio server">
```json
{
  "mcpServers": {
    "my-server": {
      "type": "stdio",
      "command": "/absolute/path/to/python3",
      "args": ["/absolute/path/to/server.py"],
      "env": {
        "API_URL": "http://localhost:8000"
      }
    }
  }
}
```
</Tab>
<Tab title="HTTP server">
```json
{
  "mcpServers": {
    "my-http-server": {
      "url": "http://localhost:8080/mcp"
    }
  }
}
```
</Tab>
</Tabs>

Verify from a GSD session: `mcp_servers` → `mcp_discover` → `mcp_call`.

## Models

Per-phase model selection:

```yaml
models:
  research: claude-sonnet-4-6
  planning:
    model: claude-opus-4-6
    fallbacks:
      - openrouter/z-ai/glm-5
  execution: claude-sonnet-4-6
  execution_simple: claude-haiku-4-5-20250414
  completion: claude-sonnet-4-6
  subagent: claude-sonnet-4-6
```

**Phases:** `research`, `planning`, `execution`, `execution_simple`, `completion`, `subagent`

When switching to a model fails, GSD automatically tries the next model in the `fallbacks` list.

For custom providers (Ollama, vLLM, LM Studio), see [custom models](/guides/custom-models).

## All settings

### `token_profile`

Coordinates model selection, phase skipping, and context compression. Values: `budget`, `balanced` (default), `quality`. See [token optimization](/guides/token-optimization).

### `budget_ceiling`

Maximum USD spend during auto mode:

```yaml
budget_ceiling: 50.00
budget_enforcement: pause  # warn, pause (default), or halt
```

### `auto_supervisor`

Timeout thresholds:

```yaml
auto_supervisor:
  soft_timeout_minutes: 20
  idle_timeout_minutes: 10
  hard_timeout_minutes: 30
```

### `skill_discovery`

| Value | Behavior |
|-------|----------|
| `auto` | Skills found and applied automatically |
| `suggest` | Skills identified but not auto-installed (default) |
| `off` | Disabled |

### Verification

```yaml
verification_commands:
  - npm run lint
  - npm run test
verification_auto_fix: true
verification_max_retries: 2
```

### Git

See [git strategy](/guides/git-strategy) for full git configuration.

### Notifications

```yaml
notifications:
  enabled: true
  on_complete: true
  on_error: true
  on_budget: true
  on_milestone: true
  on_attention: true
```

### Post-unit hooks

```yaml
post_unit_hooks:
  - name: code-review
    after: [execute-task]
    prompt: "Review the code changes for quality and security."
    model: claude-opus-4-6
    max_cycles: 1
    artifact: REVIEW.md
```

### Pre-dispatch hooks

```yaml
pre_dispatch_hooks:
  - name: add-standards
    before: [execute-task]
    action: modify  # modify, skip, or replace
    prepend: "Follow our coding standards."
```

### Skill routing

```yaml
always_use_skills:
  - debug-like-expert
prefer_skills:
  - frontend-design
skill_rules:
  - when: task involves authentication
    use: [clerk]
```

### Custom instructions

```yaml
custom_instructions:
  - "Always use TypeScript strict mode"
  - "Prefer functional patterns over classes"
```

### Dynamic routing

See [dynamic model routing](/guides/dynamic-model-routing).

### Parallel execution

See [parallel orchestration](/guides/parallel-orchestration).

## Environment variables

| Variable | Default | Description |
|----------|---------|-------------|
| `GSD_HOME` | `~/.gsd` | Global GSD directory |
| `GSD_PROJECT_ID` | (auto-hash) | Override project identity hash |
| `GSD_STATE_DIR` | `$GSD_HOME` | Per-project state root |
| `GSD_CODING_AGENT_DIR` | `$GSD_HOME/agent` | Agent directory |

## Full example

<Accordion title="Complete preferences file">
```yaml
---
version: 1

models:
  research: openrouter/deepseek/deepseek-r1
  planning:
    model: claude-opus-4-6
    fallbacks:
      - openrouter/z-ai/glm-5
  execution: claude-sonnet-4-6
  execution_simple: claude-haiku-4-5-20250414
  completion: claude-sonnet-4-6

token_profile: balanced

dynamic_routing:
  enabled: true
  escalate_on_failure: true
  budget_pressure: true

budget_ceiling: 25.00
budget_enforcement: pause
context_pause_threshold: 80

auto_supervisor:
  soft_timeout_minutes: 15
  hard_timeout_minutes: 25

git:
  auto_push: true
  merge_strategy: squash
  isolation: none
  commit_docs: true

skill_discovery: suggest
always_use_skills:
  - debug-like-expert
skill_rules:
  - when: task involves authentication
    use: [clerk]

notifications:
  on_complete: false
  on_milestone: true
  on_attention: true

auto_visualize: true
service_tier: priority
forensics_dedup: true
show_token_cost: true

post_unit_hooks:
  - name: code-review
    after: [execute-task]
    prompt: "Review {sliceId}/{taskId} for quality and security."
    artifact: REVIEW.md
---
```
</Accordion>

mintlify-docs/guides/cost-management.mdx

---
title: "Cost management"
description: "Budget ceilings, cost tracking, projections, and enforcement modes."
---

GSD tracks token usage and cost for every unit of work dispatched during auto mode. This data powers the dashboard, budget enforcement, and cost projections.

## Cost tracking

Every unit's metrics are captured automatically:

- **Token counts** — input, output, cache read, cache write, total
- **Cost** — USD cost per unit
- **Duration** — wall-clock time
- **Tool calls** — number of tool invocations
- **Message counts** — assistant and user messages

Data is stored in `.gsd/metrics.json` and survives across sessions.

### Viewing costs

`Ctrl+Alt+G` or `/gsd status` shows real-time cost breakdown by:

- Phase (research, planning, execution, completion, reassessment)
- Slice (M001/S01, M001/S02, ...)
- Model (which models consumed the most budget)
- Project totals

## Budget ceiling

```yaml
budget_ceiling: 50.00
```

### Enforcement modes

| Mode | Behavior |
|------|----------|
| `warn` | Log a warning, continue |
| `pause` | Pause auto mode (default when ceiling is set) |
| `halt` | Stop auto mode entirely |
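
Mapped out as code, the table above might look like this (a minimal sketch; `enforceBudget` and the action names are assumptions, not GSD's API):

```typescript
type EnforcementMode = "warn" | "pause" | "halt";
type BudgetAction = "continue" | "warn" | "pause" | "halt";

// Compare spend against the ceiling and apply the documented behavior:
// warn logs and continues; pause/halt stop auto mode.
function enforceBudget(totalUsd: number, ceilingUsd: number, mode: EnforcementMode): BudgetAction {
  if (totalUsd < ceilingUsd) return "continue"; // under ceiling: nothing to do
  return mode;
}
```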

## Cost projections

After two or more slices complete, GSD projects the remaining cost:

```
Projected remaining: $12.40 ($6.20/slice avg × 2 remaining)
```
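
The projection is plain arithmetic — the average cost of completed slices multiplied by the slices remaining. A sketch (hypothetical function name):

```typescript
// Average completed-slice cost × remaining slices; null until two slices have completed.
function projectRemaining(completedSliceCosts: number[], remainingSlices: number): number | null {
  if (completedSliceCosts.length < 2) return null;
  const avg = completedSliceCosts.reduce((sum, c) => sum + c, 0) / completedSliceCosts.length;
  return avg * remainingSlices;
}
```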

## Budget pressure and model downgrading

When approaching the budget ceiling, the [complexity router](/guides/token-optimization) automatically downgrades model assignments:

| Budget used | Effect |
|------------|--------|
| < 50% | No adjustment |
| 50-75% | Standard tasks → Light |
| 75-90% | More aggressive |
| > 90% | Nearly everything downgrades |

## Token profiles and cost

| Profile | Typical savings | How |
|---------|----------------|-----|
| `budget` | 40-60% | Cheaper models, phase skipping, minimal context |
| `balanced` | 10-20% | Default models, skip slice research |
| `quality` | 0% (baseline) | Full models, all phases |

See [token optimization](/guides/token-optimization) for details.

## Tips

- Start with `balanced` and a generous `budget_ceiling` to establish baseline costs
- Check `/gsd status` after a few slices to see per-slice averages
- Switch to `budget` for well-understood, repetitive work
- Use `quality` only for architectural decisions
- Per-phase model selection lets you use Opus for planning while keeping execution on Sonnet
- Enable [dynamic routing](/guides/dynamic-model-routing) for automatic downgrading on simple tasks
- Use `/gsd visualize` → Metrics tab to see where your budget is going

mintlify-docs/guides/custom-models.mdx

---
title: "Custom models"
description: "Add custom providers and models (Ollama, vLLM, LM Studio, proxies) via models.json."
---

Define custom models and providers in `~/.gsd/agent/models.json`. This lets you add models not in the default registry — self-hosted endpoints, fine-tuned models, proxies, or new provider releases.

The file reloads each time you open `/model` — no restart needed.

## Minimal example

For local models (Ollama, LM Studio, vLLM):

```json
{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434/v1",
      "api": "openai-completions",
      "apiKey": "ollama",
      "models": [
        { "id": "llama3.1:8b" },
        { "id": "qwen2.5-coder:7b" }
      ]
    }
  }
}
```

The `apiKey` is required but Ollama ignores it — any value works.

## Supported APIs

| API | Description |
|-----|-------------|
| `openai-completions` | OpenAI Chat Completions (most compatible) |
| `openai-responses` | OpenAI Responses API |
| `anthropic-messages` | Anthropic Messages API |
| `google-generative-ai` | Google Generative AI |

## Provider configuration

| Field | Description |
|-------|-------------|
| `baseUrl` | API endpoint URL |
| `api` | API type |
| `apiKey` | API key (supports shell commands, env vars, or literals) |
| `headers` | Custom headers |
| `authHeader` | Set `true` to add `Authorization: Bearer` automatically |
| `models` | Array of model configurations |
| `modelOverrides` | Per-model overrides for built-in models |

### Value resolution

The `apiKey` and `headers` fields support three formats:

```json
"apiKey": "!security find-generic-password -ws 'anthropic'"  // shell command
"apiKey": "MY_API_KEY"                                       // env variable
"apiKey": "sk-..."                                           // literal value
```
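
Resolution could be implemented roughly like this (an illustrative sketch — the `!` prefix and env-var conventions come from the formats above, but `resolveValue` and its exact heuristics are assumptions, not GSD's parser):

```typescript
import { execSync } from "node:child_process";

// "!cmd" runs a shell command, an UPPER_SNAKE_CASE name present in the
// environment is read from it, and anything else is used as a literal.
function resolveValue(raw: string, env: Record<string, string | undefined> = process.env): string {
  if (raw.startsWith("!")) {
    return execSync(raw.slice(1), { encoding: "utf8" }).trim(); // shell command
  }
  const fromEnv = env[raw];
  if (/^[A-Z][A-Z0-9_]*$/.test(raw) && fromEnv !== undefined) {
    return fromEnv; // environment variable
  }
  return raw; // literal value
}
```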

## Model configuration

| Field | Required | Default | Description |
|-------|----------|---------|-------------|
| `id` | Yes | — | Model identifier (passed to the API) |
| `name` | No | `id` | Human-readable label |
| `api` | No | provider's `api` | Override per model |
| `reasoning` | No | `false` | Supports extended thinking |
| `input` | No | `["text"]` | `["text"]` or `["text", "image"]` |
| `contextWindow` | No | `128000` | Context window size |
| `maxTokens` | No | `16384` | Maximum output tokens |
| `cost` | No | all zeros | Per-million tokens: `input`, `output`, `cacheRead`, `cacheWrite` |

## Overriding built-in providers

Route a built-in provider through a proxy without redefining models:

```json
{
  "providers": {
    "anthropic": {
      "baseUrl": "https://my-proxy.example.com/v1"
    }
  }
}
```

All built-in Anthropic models remain available. To add custom models alongside built-in ones, include the `models` array.

## OpenAI compatibility

For providers with partial OpenAI compatibility, use the `compat` field at provider or model level:

```json
{
  "providers": {
    "local-llm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "compat": {
        "supportsDeveloperRole": false,
        "supportsReasoningEffort": false
      },
      "models": [...]
    }
  }
}
```

| Field | Description |
|-------|-------------|
| `supportsDeveloperRole` | Use `developer` vs `system` role |
| `supportsReasoningEffort` | Support for `reasoning_effort` parameter |
| `supportsUsageInStreaming` | Support for `stream_options: { include_usage: true }` |
| `maxTokensField` | `max_completion_tokens` or `max_tokens` |
| `thinkingFormat` | `reasoning_effort`, `zai`, `qwen`, or `qwen-chat-template` |
| `openRouterRouting` | OpenRouter provider selection config |
| `vercelGatewayRouting` | Vercel AI Gateway provider selection |

## Community provider extensions

| Extension | Provider | Models | Install |
|-----------|----------|--------|---------|
| [`pi-dashscope`](https://www.npmjs.com/package/pi-dashscope) | Alibaba DashScope | Qwen3, GLM-5, MiniMax M2.5, Kimi K2.5 | `gsd install npm:pi-dashscope` |

mintlify-docs/guides/dynamic-model-routing.mdx

---
title: "Dynamic model routing"
description: "Automatically select cheaper models for simple work and reserve expensive models for complex tasks."
---

Dynamic model routing classifies each dispatched unit into a complexity tier and selects an appropriate model. This reduces token consumption by 20-50% without sacrificing quality where it matters.

The key rule: **downgrade-only semantics**. Your configured model is always the ceiling — routing never upgrades beyond what you've configured.

## Enabling

```yaml
dynamic_routing:
  enabled: true
```

## Complexity tiers

| Tier | Typical work | Default model level |
|------|-------------|-------------------|
| **Light** | Slice completion, UAT, hooks | Haiku-class |
| **Standard** | Research, planning, execution | Sonnet-class |
| **Heavy** | Replanning, roadmap reassessment | Opus-class |

## Configuration

```yaml
dynamic_routing:
  enabled: true
  tier_models:
    light: claude-haiku-4-5
    standard: claude-sonnet-4-6
    heavy: claude-opus-4-6
  escalate_on_failure: true  # bump tier on task failure
  budget_pressure: true      # auto-downgrade near budget ceiling
  cross_provider: true       # consider models from other providers
```

### `escalate_on_failure`

When a task fails at a given tier, the router escalates: Light → Standard → Heavy. Prevents cheap models from burning retries on work that needs more reasoning.

### `budget_pressure`

Progressive downgrading as budget ceiling approaches:

| Budget used | Effect |
|------------|--------|
| < 50% | No adjustment |
| 50-75% | Standard → Light |
| 75-90% | More aggressive |
| > 90% | Nearly everything → Light |
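
The schedule above can be sketched as a pure function (thresholds from the table; the exact 75-90% behavior and the function name are assumptions):

```typescript
type Tier = "light" | "standard" | "heavy";

// Downgrade-only: a classified tier can move down under budget pressure, never up.
function applyBudgetPressure(tier: Tier, budgetUsedFraction: number): Tier {
  if (budgetUsedFraction < 0.5) return tier;                                    // < 50%: no adjustment
  if (budgetUsedFraction < 0.75) return tier === "standard" ? "light" : tier;   // 50-75%: standard → light
  if (budgetUsedFraction < 0.9) return tier === "heavy" ? "standard" : "light"; // 75-90%: more aggressive
  return "light";                                                               // > 90%: nearly everything → light
}
```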

### `cross_provider`

The router may select models from providers other than your primary, using a built-in cost table to find the cheapest model at each tier.

## Task plan analysis

For `execute-task` units, the classifier analyzes the task plan:

| Signal | Simple → Light | Complex → Heavy |
|--------|---------------|----------------|
| Step count | ≤ 3 | ≥ 8 |
| File count | ≤ 3 | ≥ 8 |
| Description length | < 500 chars | > 2000 chars |
| Code blocks | — | ≥ 5 |
| Complexity keywords | None | Present |
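
A sketch of how these signals could combine into a tier (thresholds from the table; the voting scheme, `TaskPlan` shape, and names are assumptions):

```typescript
type Tier = "light" | "standard" | "heavy";

interface TaskPlan {
  steps: number;
  files: number;
  descriptionChars: number;
  codeBlocks: number;
  hasComplexityKeywords: boolean;
}

// Each signal votes light (-1) or heavy (+1); a strong majority decides, otherwise standard.
function classifyTaskPlan(plan: TaskPlan): Tier {
  let score = 0;
  score += plan.steps <= 3 ? -1 : plan.steps >= 8 ? 1 : 0;
  score += plan.files <= 3 ? -1 : plan.files >= 8 ? 1 : 0;
  score += plan.descriptionChars < 500 ? -1 : plan.descriptionChars > 2000 ? 1 : 0;
  score += plan.codeBlocks >= 5 ? 1 : 0;
  score += plan.hasComplexityKeywords ? 1 : 0;
  if (score <= -2) return "light";
  if (score >= 2) return "heavy";
  return "standard";
}
```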

## Adaptive learning

The routing history (`.gsd/routing-history.json`) tracks success/failure per tier per unit type. If a tier's failure rate exceeds 20%, future classifications are bumped up.

User feedback (`/gsd rate`) is weighted 2x vs automatic outcomes.
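
The bump rule might be sketched as (the 20% threshold and 2x weighting come from the text; the `Outcome` shape and names are assumptions):

```typescript
type Tier = "light" | "standard" | "heavy";
interface Outcome { success: boolean; weight: number } // weight 2 for /gsd rate feedback, 1 for automatic outcomes

// Escalate a classification one tier when the weighted failure rate exceeds 20%.
function adjustForHistory(tier: Tier, outcomes: Outcome[]): Tier {
  const total = outcomes.reduce((sum, o) => sum + o.weight, 0);
  if (total === 0) return tier;
  const failed = outcomes.filter(o => !o.success).reduce((sum, o) => sum + o.weight, 0);
  if (failed / total <= 0.2) return tier;
  return tier === "light" ? "standard" : "heavy";
}
```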

## Cost table

| Model | Input (per M) | Output (per M) |
|-------|-------|--------|
| claude-haiku-4-5 | $0.80 | $4.00 |
| claude-sonnet-4-6 | $3.00 | $15.00 |
| claude-opus-4-6 | $15.00 | $75.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| gpt-4o | $2.50 | $10.00 |
| gemini-2.0-flash | $0.10 | $0.40 |

The cost table is for comparison only — actual billing comes from your provider.

## Interaction with token profiles

- **Token profiles** control phase skipping and context compression
- **Dynamic routing** controls per-unit model selection within those constraints

The `budget` profile + dynamic routing provides maximum cost savings.

mintlify-docs/guides/git-strategy.mdx

---
title: "Git strategy"
description: "Isolation modes, branching model, and merge behavior for milestone work."
---

GSD uses git for milestone isolation and sequential commits. You choose an **isolation mode** that controls where work happens. The strategy is fully automated — no manual branch management needed.

## Isolation modes

Configure via the `git.isolation` preference:

| Mode | Working directory | Branch | Best for |
|------|-------------------|--------|----------|
| `none` (default) | Project root | Current branch | Most projects — no isolation overhead |
| `worktree` | `.gsd/worktrees/<MID>/` | `milestone/<MID>` | Full file isolation |
| `branch` | Project root | `milestone/<MID>` | Submodule-heavy repos |

### `none` mode (default)

Work happens directly on your current branch. No worktree, no milestone branch. GSD still commits sequentially with conventional commit messages, but there's no branch isolation. This is the simplest mode and works well for most projects.

### `worktree` mode

Each milestone gets its own git worktree on a `milestone/<MID>` branch. All execution happens inside the worktree. On completion, the worktree is squash-merged to main as one clean commit. The worktree and branch are cleaned up.

### `branch` mode

Work happens in the project root on a `milestone/<MID>` branch. No worktree is created. On completion, the branch is merged to main.

<Note>
**Changed in v2.45.0:** The default isolation mode changed from `worktree` to `none`. If your workflow relies on worktree isolation, set `git.isolation: worktree` explicitly in your preferences.
</Note>

## Branching model

```
main ─────────────────────────────────────────────────────────
  │                                                      ↑
  └── milestone/M001 (worktree) ─────────────────────────┘
        commit: feat: core types
        commit: feat: markdown parser
        commit: feat: file writer
        → squash-merged to main as single commit
```

### Parallel worktrees

With [parallel orchestration](/guides/parallel-orchestration) enabled, multiple milestones run in separate worktrees simultaneously:

```
main ──────────────────────────────────────────────────────────
  │                                      ↑                ↑
  ├── milestone/M002 (worktree) ─────────┘                │
  │     → squash-merged first                             │
  │                                                       │
  └── milestone/M003 (worktree) ──────────────────────────┘
        → squash-merged second
```

Merges happen sequentially to avoid conflicts.

### Commit format

Conventional commit format with GSD metadata in trailers:

```
feat: core type definitions

GSD-Task: M001/S01/T01

feat: markdown parser for plan files

GSD-Task: M001/S01/T02
```

## Workflow modes

Set `mode` to get sensible defaults:

```yaml
mode: solo  # personal projects
mode: team  # shared repos
```

| Setting | `solo` | `team` |
|---|---|---|
| `git.auto_push` | `true` | `false` |
| `git.push_branches` | `false` | `true` |
| `git.pre_merge_check` | `false` | `true` |
| `git.merge_strategy` | `"squash"` | `"squash"` |
| `unique_milestone_ids` | `false` | `true` |

Mode defaults are the lowest priority — any explicit preference overrides them.

## Git preferences

```yaml
git:
  auto_push: false
  push_branches: false
  remote: origin
  snapshots: false
  pre_merge_check: false
  commit_type: feat
  main_branch: main
  merge_strategy: squash  # "squash" or "merge"
  isolation: none         # "none" (default), "worktree", or "branch"
  commit_docs: true
  auto_pr: false
  pr_target_branch: develop
```

### Automatic pull requests

For teams using Gitflow or branch-based workflows:

```yaml
git:
  auto_push: true
  auto_pr: true
  pr_target_branch: develop
```

Pushes the milestone branch and creates a PR targeting your specified branch. Requires `gh` CLI installed and authenticated.

### `commit_docs: false`

Adds `.gsd/` to `.gitignore` and keeps all planning artifacts local-only. Useful for teams where only some members use GSD.

## Worktree management

### Automatic (auto mode)

1. Milestone starts → worktree created at `.gsd/worktrees/<MID>/`
2. Planning artifacts copied into the worktree
3. All execution happens inside the worktree
4. Milestone completes → squash-merged to main
5. Worktree and branch cleaned up

### Manual

```
/worktree create
/worktree switch
/worktree merge
/worktree remove
```

## Self-healing

GSD includes automatic recovery for common git issues:

- **Detached HEAD** — automatically reattaches to the correct branch
- **Stale lock files** — removes `index.lock` files from crashed processes
- **Orphaned worktrees** — detects and offers cleanup

Run `/gsd doctor` to check git health manually.

mintlify-docs/guides/migration.mdx

---
title: "Migration from v1"
description: "Migrate .planning directories from the original GSD to GSD-2's .gsd format."
---

If you have projects with `.planning` directories from the original Get Shit Done (v1), you can migrate them to GSD-2's `.gsd` format.

## Running the migration

```bash
# From within the project directory
/gsd migrate

# Or specify a path
/gsd migrate ~/projects/my-old-project
```

## What gets migrated

The migration tool:

- Parses `PROJECT.md`, `ROADMAP.md`, `REQUIREMENTS.md`, phase directories, plans, summaries, and research
- Maps phases → slices, plans → tasks, milestones → milestones
- Preserves completion state (`[x]` phases stay done, summaries carry over)
- Consolidates research files
- Shows a preview before writing anything
- Optionally runs an agent-driven review of the output

## Supported formats

The migration handles various v1 format variations:

- Milestone-sectioned roadmaps with `<details>` blocks
- Bold phase entries
- Bullet-format requirements
- Decimal phase numbering
- Duplicate phase numbers across milestones

## Post-migration

Verify the output:

```
/gsd doctor
```

This checks `.gsd/` integrity and flags any structural issues.

mintlify-docs/guides/parallel-orchestration.mdx

---
title: "Parallel orchestration"
description: "Run multiple milestones simultaneously in isolated git worktrees."
---

Run multiple milestones simultaneously. Each gets its own worker process, branch, and context window — while a coordinator tracks progress, enforces budgets, and keeps everything in sync.

<Note>
Parallel mode is disabled by default (`parallel.enabled: false`) and is strictly opt-in.
</Note>

## Quick start

1. Enable in preferences:

```yaml
parallel:
  enabled: true
  max_workers: 2
```

2. Start parallel execution:

```
/gsd parallel start
```

3. Monitor progress:

```
/gsd parallel status
```

## Architecture

```
┌─────────────────────────────────────────────────────┐
│  Coordinator (your GSD session)                     │
│                                                     │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐       │
│  │ Worker 1 │    │ Worker 2 │    │ Worker 3 │  ...  │
│  │   M001   │    │   M003   │    │   M005   │       │
│  └──────────┘    └──────────┘    └──────────┘       │
│       │               │               │             │
│       ▼               ▼               ▼             │
│  .gsd/worktrees/ .gsd/worktrees/ .gsd/worktrees/    │
└─────────────────────────────────────────────────────┘
```

### Worker isolation

| Resource | Isolation method |
|----------|-----------------|
| Filesystem | Git worktree — separate checkout |
| Git branch | `milestone/<MID>` per milestone |
| State | `GSD_MILESTONE_LOCK` — each worker sees only its milestone |
| Context | Separate process with its own agent sessions |
| Metrics | Each worktree has its own `metrics.json` |

## Eligibility analysis

Before starting, GSD checks which milestones can run concurrently:

1. **Not complete** — finished milestones are skipped
2. **Dependencies satisfied** — all `dependsOn` entries must be complete
3. **File overlap check** — shared files get a warning (not a blocker)
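
The first two checks can be sketched as a filter (illustrative only; the `Milestone` shape and names are assumptions — the file-overlap warning is omitted):

```typescript
interface Milestone {
  id: string;
  complete: boolean;
  dependsOn: string[];
}

// Eligible = not yet complete, and every dependency already complete.
function eligibleMilestones(milestones: Milestone[]): Milestone[] {
  const done = new Set(milestones.filter(m => m.complete).map(m => m.id));
  return milestones.filter(m => !m.complete && m.dependsOn.every(d => done.has(d)));
}
```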
|
||||
|
||||
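The three checks above can be sketched as a filter. This is illustrative only; the field names (`complete`, `depends_on`, `files`) are assumptions for the sketch, not GSD's actual milestone schema.

```python
def eligible_milestones(milestones):
    """Return milestones that may run concurrently, per the three checks above."""
    done = {m["id"] for m in milestones if m["complete"]}
    eligible = []
    for m in milestones:
        if m["complete"]:
            continue  # check 1: finished milestones are skipped
        if not all(dep in done for dep in m.get("depends_on", [])):
            continue  # check 2: all dependencies must be complete
        # check 3: file overlap with another eligible milestone is a warning only
        overlaps = [o["id"] for o in eligible
                    if set(m.get("files", [])) & set(o.get("files", []))]
        if overlaps:
            print(f"warning: {m['id']} shares files with {overlaps}")
        eligible.append(m)
    return eligible
```

Note that check 3 never removes a milestone from the result; it only surfaces the overlap.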
## Configuration

```yaml
parallel:
  enabled: false
  max_workers: 2
  budget_ceiling: 50.00
  merge_strategy: "per-milestone" # or "per-slice"
  auto_merge: "confirm" # "auto", "confirm", or "manual"
```

| Key | Default | Description |
|-----|---------|-------------|
| `enabled` | `false` | Master toggle |
| `max_workers` | `2` | Concurrent workers (1-4) |
| `budget_ceiling` | none | Aggregate cost limit across all workers |
| `merge_strategy` | `"per-milestone"` | When to merge back to main |
| `auto_merge` | `"confirm"` | How merge-back is handled |
## Commands

| Command | Description |
|---------|-------------|
| `/gsd parallel start` | Analyze, confirm, and start workers |
| `/gsd parallel status` | Show workers with state, progress, cost |
| `/gsd parallel stop [MID]` | Stop all or a specific worker |
| `/gsd parallel pause [MID]` | Pause all or a specific worker |
| `/gsd parallel resume [MID]` | Resume paused workers |
| `/gsd parallel merge [MID]` | Merge completed milestones to main |
## Merge reconciliation

- `.gsd/` state files — auto-resolved (accept milestone branch version)
- Code conflicts — merge halts and shows the conflicting files. Resolve manually and retry.

## Budget management

When `budget_ceiling` is set, aggregate cost is tracked across all workers. When the ceiling is reached, the coordinator signals workers to stop.
## Troubleshooting

### "No milestones are eligible"

All milestones are complete or blocked by dependencies. Check `/gsd queue`.

### Worker crashed

Workers persist state to disk. On restart, the coordinator detects dead PIDs. Run `/gsd doctor --fix` to clean up, then `/gsd parallel start` to spawn new workers.

### Merge conflicts

```
/gsd parallel merge       # see which milestones conflict
# resolve in .gsd/worktrees/<MID>/
/gsd parallel merge MID   # retry
```
84
mintlify-docs/guides/remote-questions.mdx
Normal file

@@ -0,0 +1,84 @@
---
title: "Remote questions"
description: "Discord, Slack, and Telegram integration for headless auto-mode."
---

Remote questions allow GSD to ask for user input via Slack, Discord, or Telegram when running in headless auto-mode. When GSD encounters a decision point, it posts the question to your configured channel and polls for a response.

## Setup
<Tabs>
<Tab title="Discord">
```
/gsd remote discord
```

The setup wizard validates your bot token, picks a server and channel, sends a test message, and saves the config.

**Bot requirements:**
- A Discord bot token from the [Developer Portal](https://discord.com/developers/applications)
- Permissions: Send Messages, Read Message History, Add Reactions, View Channel
</Tab>
<Tab title="Slack">
```
/gsd remote slack
```

The setup wizard validates your bot token, picks a channel, sends a test message, and saves the config.

**Bot requirements:**
- A Slack bot token (`xoxb-...`) from [Slack API](https://api.slack.com/apps)
- Scopes: `chat:write`, `reactions:read`, `reactions:write`, `channels:read`, `groups:read`, `channels:history`, `groups:history`
</Tab>
<Tab title="Telegram">
```
/gsd remote telegram
```

The setup wizard validates your bot token, prompts for a chat ID, sends a test message, and saves the config.

**Bot requirements:**
- A bot token from [@BotFather](https://t.me/BotFather)
- Bot must be added to the target group chat
</Tab>
</Tabs>
## Configuration

```yaml
remote_questions:
  channel: discord
  channel_id: "1234567890123456789"
  timeout_minutes: 5
  poll_interval_seconds: 5
```
## How it works

1. GSD encounters a decision point during auto-mode
2. The question is posted to your channel as a rich embed (Discord) or Block Kit message (Slack)
3. GSD polls for a response at the configured interval
4. You respond by reacting with a number emoji or replying with text
5. GSD picks up the response and continues
6. A check reaction confirms receipt

### Response formats

**Single question:** React with a number emoji (1️⃣-5️⃣) or reply with a number.

**Multiple questions:** Reply with semicolons (`1;2;custom text`) or one answer per line.
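A minimal sketch of parsing a multi-question reply in either accepted shape (semicolon-separated, or one answer per line). This is illustrative only, not GSD's actual parser:

```python
def parse_answers(reply: str) -> list[str]:
    """Split a reply into one answer per question.

    Accepts either semicolons ("1;2;custom text") or one answer per line.
    """
    reply = reply.strip()
    if ";" in reply:
        parts = reply.split(";")       # semicolon-separated form
    else:
        parts = reply.splitlines()     # one answer per line
    return [p.strip() for p in parts if p.strip()]
```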
### Timeouts

If no response arrives within `timeout_minutes`, the LLM makes a conservative default choice or pauses auto-mode.
## Commands

| Command | Description |
|---------|-------------|
| `/gsd remote` | Show menu and current status |
| `/gsd remote slack` | Set up Slack |
| `/gsd remote discord` | Set up Discord |
| `/gsd remote telegram` | Set up Telegram |
| `/gsd remote status` | Show current config and last prompt status |
| `/gsd remote disconnect` | Remove configuration |
97
mintlify-docs/guides/skills.mdx
Normal file

@@ -0,0 +1,97 @@
---
title: "Skills"
description: "Specialized instruction sets that provide domain-specific guidance to the LLM."
---

Skills are specialized instruction sets that GSD loads when the task matches. They provide domain-specific guidance — coding patterns, framework idioms, testing strategies, and tool usage.

## Bundled skills

GSD ships with these skills, installed to `~/.gsd/agent/skills/`:
| Skill | Trigger | Description |
|-------|---------|-------------|
| `frontend-design` | Web UI work | Production-grade frontend with high design quality |
| `swiftui` | macOS/iOS apps | Full lifecycle from creation to shipping |
| `debug-like-expert` | Complex debugging | Methodical investigation with evidence gathering |
| `rust-core` | Rust code | Idiomatic, safe, performant Rust patterns |
| `axum-web-framework` | Axum web apps | Complete Axum development guide |
| `tauri` | Tauri v2 desktop apps | Cross-platform desktop development |
| `github-workflows` | GitHub Actions | CI/CD, workflow debugging |
| `security-audit` | Security auditing | Dependency scanning, OWASP |
| `review` | Code review | Diff-aware quality analysis |
| `test` | Test generation | Auto-detects frameworks |
| `lint` | Linting and formatting | ESLint, Biome, Prettier |
## Skill discovery

The `skill_discovery` preference controls how GSD finds skills:

| Mode | Behavior |
|------|----------|
| `auto` | Skills found and applied automatically |
| `suggest` | Skills identified but require confirmation (default) |
| `off` | No skill discovery |
## Skill preferences

```yaml
always_use_skills:
  - debug-like-expert
prefer_skills:
  - frontend-design
avoid_skills:
  - security-docker
skill_rules:
  - when: task involves Clerk authentication
    use: [clerk]
  - when: frontend styling work
    prefer: [frontend-design]
```
### Resolution order

1. **Bare name** — e.g., `frontend-design` → scans `~/.gsd/agent/skills/` and project skills
2. **Absolute path** — e.g., `/Users/you/.gsd/agent/skills/my-skill/SKILL.md`
3. **Directory path** — looks for `SKILL.md` inside

User skills take precedence over project skills.
## Custom skills

Create a directory with a `SKILL.md` file:

```
~/.gsd/agent/skills/my-skill/
  SKILL.md      — instructions for the LLM
  references/   — optional reference files
```

### Project-local skills

```
.gsd/agent/skills/my-project-skill/
  SKILL.md
```
## Skill health dashboard

```
/gsd skill-health              # overview table
/gsd skill-health rust-core    # detailed view
/gsd skill-health --stale 30   # unused for 30+ days
/gsd skill-health --declining  # falling success rates
```

The dashboard flags:
- Success rate below 70% over the last 10 uses
- Token usage rising 20%+
- Skills unused beyond the staleness threshold
### Staleness detection

```yaml
skill_staleness_days: 60 # default: 60, set 0 to disable
```

Stale skills are excluded from automatic matching but remain invokable explicitly.
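The staleness rule above reduces to a small predicate. A sketch (illustrative only), including the documented behavior that `0` disables detection:

```python
from datetime import datetime, timedelta

def is_stale(last_used: datetime, staleness_days: int, now: datetime) -> bool:
    """True if a skill should be excluded from automatic matching."""
    if staleness_days <= 0:
        return False  # 0 disables staleness detection entirely
    return now - last_used > timedelta(days=staleness_days)
```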
175
mintlify-docs/guides/token-optimization.mdx
Normal file

@@ -0,0 +1,175 @@
---
title: "Token optimization"
description: "Token profiles, context compression, and complexity-based task routing to reduce costs by 40-60%."
---

GSD's token optimization system has three pillars: **token profiles**, **context compression**, and **complexity-based task routing**.
## Token profiles

A token profile coordinates model selection, phase skipping, and context compression. Set it in preferences:

```yaml
token_profile: balanced
```
### `budget` — maximum savings (40-60% reduction)

| Dimension | Setting |
|-----------|---------|
| Planning model | Sonnet |
| Execution model | Sonnet |
| Simple task model | Haiku |
| Completion model | Haiku |
| Milestone research | Skipped |
| Slice research | Skipped |
| Reassessment | Skipped |
| Context level | Minimal |

Best for: prototyping, small projects, well-understood codebases.
### `balanced` — smart defaults

| Dimension | Setting |
|-----------|---------|
| All models | User's default |
| Subagent model | Sonnet |
| Milestone research | Runs |
| Slice research | Skipped |
| Reassessment | Runs |
| Context level | Standard |

Best for: most projects, day-to-day development.
### `quality` — full context

Every phase runs. Every context artifact is inlined. No shortcuts. Best for: complex architectures, greenfield projects, critical production work.
## Context compression

Each profile maps to an **inline level** controlling how much context is pre-loaded into dispatch prompts:

| Profile | Level | What's included |
|---------|-------|-----------------|
| `budget` | Minimal | Task plan, essential prior summaries (truncated). Drops decisions, requirements, templates. |
| `balanced` | Standard | Task plan, prior summaries, slice plan, roadmap excerpt. |
| `quality` | Full | Everything — all plans, summaries, decisions, requirements, templates. |
### Prompt compression

GSD can apply deterministic text compression before falling back to section-boundary truncation:

```yaml
compression_strategy: compress # or "truncate"
```

| Strategy | Behavior | Default for |
|----------|----------|------------|
| `truncate` | Drop entire sections at boundaries | `quality` |
| `compress` | Heuristic text compression first, then truncate | `budget`, `balanced` |
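The distinction between the two strategies can be sketched roughly: `truncate` drops whole sections from the end until the prompt fits a budget, while `compress` first shrinks each section heuristically and only then truncates. The helpers below are hypothetical illustrations (with character-count budgets and whitespace collapsing standing in for GSD's real heuristics):

```python
def truncate_at_sections(sections: list[str], budget: int) -> str:
    """Drop trailing sections until the joined prompt fits the budget."""
    kept, used = [], 0
    for s in sections:
        if used + len(s) > budget:
            break  # section boundary: drop this section and everything after it
        kept.append(s)
        used += len(s)
    return "\n\n".join(kept)

def compress_then_truncate(sections: list[str], budget: int) -> str:
    """Compress each section first (here: collapse whitespace), then truncate."""
    squeezed = [" ".join(s.split()) for s in sections]
    return truncate_at_sections(squeezed, budget)
```

Compression keeps more sections alive at the same budget, which is why it is the default for `budget` and `balanced`.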
### Context selection

```yaml
context_selection: smart # or "full"
```

| Mode | Behavior | Default for |
|------|----------|------------|
| `full` | Inline entire files | `balanced`, `quality` |
| `smart` | TF-IDF semantic chunking for large files | `budget` |
## Complexity-based task routing

GSD classifies each task by complexity and routes it to an appropriate model tier.

<Warning>
Dynamic routing requires explicit `models` in your preferences. Without a `models` section, routing is skipped.
</Warning>
### Classification signals

| Signal | Simple | Standard | Complex |
|--------|--------|----------|---------|
| Step count | ≤ 3 | 4-7 | ≥ 8 |
| File count | ≤ 3 | 4-7 | ≥ 8 |
| Description length | < 500 chars | 500-2000 | > 2000 chars |
| Code blocks | — | — | ≥ 5 |
| Complexity keywords | None | Any present | — |

**Complexity keywords:** `research`, `investigate`, `refactor`, `migrate`, `integrate`, `complex`, `architect`, `redesign`, `security`, `performance`, `concurrent`, `parallel`
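The table above can be read as a rough decision procedure. The sketch below applies the documented thresholds literally and treats any complex-tier signal as sufficient; GSD's real classifier may weigh signals differently, so this is illustrative only:

```python
KEYWORDS = {"research", "investigate", "refactor", "migrate", "integrate",
            "complex", "architect", "redesign", "security", "performance",
            "concurrent", "parallel"}

def classify(steps: int, files: int, desc: str, code_blocks: int = 0) -> str:
    """Classify a task as simple, standard, or complex per the table above."""
    if steps >= 8 or files >= 8 or len(desc) > 2000 or code_blocks >= 5:
        return "complex"
    if (steps > 3 or files > 3 or len(desc) >= 500
            or any(k in desc.lower() for k in KEYWORDS)):
        return "standard"
    return "simple"
```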
### Budget pressure

When approaching the budget ceiling, the classifier automatically downgrades tiers:

| Budget used | Effect |
|------------|--------|
| < 50% | No adjustment |
| 50-75% | Standard → Light |
| 75-90% | More aggressive |
| > 90% | Everything except Heavy → Light |
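The downgrade policy can be sketched as a function of the budget fraction used. Note the assumptions: the tier names (`light`, `standard`, `heavy`) and the exact behavior of the 75-90% band are illustrative, since the docs only say "more aggressive":

```python
def downgrade(tier: str, budget_used: float) -> str:
    """Downgrade a model tier under budget pressure, per the table above."""
    if budget_used < 0.50:
        return tier                                    # no adjustment
    if budget_used < 0.75:
        return "light" if tier == "standard" else tier # Standard -> Light
    if budget_used < 0.90:
        # "more aggressive" (assumption): standard still drops to light
        return "light" if tier == "standard" else tier
    return tier if tier == "heavy" else "light"        # > 90%: all but Heavy
```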
## Adaptive learning

GSD tracks success/failure per tier and adjusts classifications over time. User feedback via `/gsd rate` is weighted 2x:

```
/gsd rate over   # model was overpowered
/gsd rate ok     # appropriate
/gsd rate under  # too weak
```
## Configuration examples

<Tabs>
<Tab title="Cost-optimized">
```yaml
---
version: 1
token_profile: budget
budget_ceiling: 25.00
models:
  execution_simple: claude-haiku-4-5-20250414
---
```
</Tab>
<Tab title="Balanced with custom models">
```yaml
---
version: 1
token_profile: balanced
models:
  planning:
    model: claude-opus-4-6
    fallbacks:
      - openrouter/z-ai/glm-5
  execution: claude-sonnet-4-6
---
```
</Tab>
<Tab title="Full quality">
```yaml
---
version: 1
token_profile: quality
models:
  planning: claude-opus-4-6
  execution: claude-opus-4-6
---
```
</Tab>
</Tabs>

Per-phase overrides always win over profile defaults:

```yaml
---
version: 1
token_profile: budget
phases:
  skip_research: false # keep research despite budget profile
models:
  planning: claude-opus-4-6 # use Opus for planning despite budget
---
```
158
mintlify-docs/guides/troubleshooting.mdx
Normal file

@@ -0,0 +1,158 @@
---
title: "Troubleshooting"
description: "Common issues, /gsd doctor, /gsd forensics, and recovery procedures."
---

## `/gsd doctor`

The built-in diagnostic tool validates `.gsd/` integrity:

```
/gsd doctor
```

It checks file structure, referential integrity, completion state consistency, git worktree health, and stale lock files.
## Common issues

<AccordionGroup>
<Accordion title="Auto mode loops on the same unit">
**Cause:** Stale cache after a crash, or the LLM didn't produce the expected artifact.

**Fix:** Run `/gsd doctor` to repair state, then `/gsd auto`.
</Accordion>

<Accordion title="Auto mode stops with 'Loop detected'">
**Cause:** A unit failed to produce its expected artifact twice in a row.

**Fix:** Check the task plan for clarity. Refine it manually, then `/gsd auto`.
</Accordion>

<Accordion title="command not found: gsd">
**Cause:** npm's global bin directory isn't in `$PATH`.

**Fix:**
```bash
npm prefix -g
echo 'export PATH="$(npm prefix -g)/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc
```

**Workaround:** `npx gsd-pi` or `$(npm prefix -g)/bin/gsd`
</Accordion>
<Accordion title="Provider errors during auto mode">
| Error type | Auto-resume? | Delay |
|-----------|-------------|-------|
| Rate limit (429) | Yes | retry-after or 60s |
| Server error (500, 502, 503) | Yes | 30s |
| Auth/billing | No | Manual resume |

For transient errors, configure fallback models:
```yaml
models:
  execution:
    model: claude-sonnet-4-6
    fallbacks:
      - openrouter/minimax/minimax-m2.5
```
</Accordion>

<Accordion title="Budget ceiling reached">
Increase `budget_ceiling` in preferences, or switch to the `budget` token profile. Resume with `/gsd auto`.
</Accordion>
<Accordion title="Stale lock file">
GSD auto-detects stale locks. If automatic recovery fails:
```bash
rm -f .gsd/auto.lock
rm -rf "$(dirname .gsd)/.gsd.lock"
```
</Accordion>

<Accordion title="Git merge conflicts on .gsd/ files">
GSD auto-resolves conflicts on `.gsd/` runtime files. For code conflicts, the LLM attempts resolution. If that fails, resolve manually.
</Accordion>

<Accordion title="EBUSY / EPERM / EACCES on Windows">
**Cause:** Antivirus, indexers, or editors briefly locking files during atomic rename.

**Fix:** Re-run the operation. Close tools holding files open if the error persists. Run `/gsd doctor` to verify repo health.
</Accordion>
<Accordion title="Worktree isolation stopped working after upgrade to v2.45+">
**Cause:** The default `git.isolation` mode changed from `worktree` to `none` in v2.45.0.

**Fix:** Set `git.isolation: worktree` explicitly in your preferences:
```yaml
git:
  isolation: worktree
```
</Accordion>

<Accordion title="Node.js version or git not found at startup">
**Cause:** GSD v2.45+ checks for Node.js >= 22 and git availability at startup.

**Fix:** Install Node.js 22+ (24 LTS recommended) and ensure `git` is in your PATH.
</Accordion>
</AccordionGroup>
## `/gsd forensics`

Full-access debugger for post-mortem analysis:

```
/gsd forensics [optional problem description]
```

Provides anomaly detection, unit traces, metrics analysis, doctor integration, and LLM-guided investigation.
## MCP client issues

Use `/gsd mcp` to check MCP server status and connectivity at a glance.

<AccordionGroup>
<Accordion title="No configured servers">
Verify `.mcp.json` or `.gsd/mcp.json` exists and parses as valid JSON.
</Accordion>

<Accordion title="mcp_discover times out">
Run the configured command outside GSD to confirm the server starts. Check backend URLs and dependencies.
</Accordion>

<Accordion title="Local server works manually but not in GSD">
Use absolute paths. Set required environment variables in the MCP config's `env` block.
</Accordion>
</AccordionGroup>
## Recovery procedures

### Reset auto mode state

```bash
rm .gsd/auto.lock
rm .gsd/completed-units.json
```

Then `/gsd auto` to restart from current disk state.

### Reset routing history

```bash
rm .gsd/routing-history.json
```

### Full state rebuild

```
/gsd doctor
```

Rebuilds `STATE.md` from plan and roadmap files on disk.
## Getting help

- **GitHub Issues:** [github.com/gsd-build/gsd-2/issues](https://github.com/gsd-build/gsd-2/issues)
- **Dashboard:** `Ctrl+Alt+G` or `/gsd status`
- **Forensics:** `/gsd forensics`
- **Session logs:** `.gsd/activity/`
82
mintlify-docs/guides/visualizer.mdx
Normal file

@@ -0,0 +1,82 @@
---
title: "Workflow visualizer"
description: "Interactive TUI overlay for progress, dependencies, metrics, and timeline."
---

The workflow visualizer is a full-screen TUI overlay with four tabs showing project progress, dependencies, cost metrics, and execution timeline.

## Opening

```
/gsd visualize
```

Or configure automatic display after milestone completion:

```yaml
auto_visualize: true
```
## Tabs

Switch tabs with `Tab`, `1`-`4`, or arrow keys.

### 1. Progress

A tree view of milestones, slices, and tasks with completion status:

```
M001: User Management          3/6 tasks ⏳
  ✅ S01: Auth module          3/3 tasks
     ✅ T01: Core types
     ✅ T02: JWT middleware
     ✅ T03: Login flow
  ⏳ S02: User dashboard       1/2 tasks
     ✅ T01: Layout component
     ⬜ T02: Profile page
```
### 2. Dependencies

ASCII dependency graph showing slice relationships:

```
S01 ──→ S02 ──→ S04
  └───→ S03 ──↗
```

### 3. Metrics

Bar charts showing cost and token usage by phase, slice, and model.

### 4. Timeline

Chronological execution history with unit type, timestamps, duration, model, and token counts.
## Controls

| Key | Action |
|-----|--------|
| `Tab` | Next tab |
| `Shift+Tab` | Previous tab |
| `1`-`4` | Jump to tab |
| `↑`/`↓` | Scroll |
| `Escape` / `q` | Close |

The visualizer refreshes from disk every 2 seconds, staying current alongside a running auto-mode session.
## HTML export

For shareable reports outside the terminal:

```
/gsd export --html
```

Generates a self-contained HTML file in `.gsd/reports/` with progress tree, dependency graph (SVG), cost/token charts, execution timeline, and changelog. All CSS and JS are inlined — printable to PDF from any browser.

```yaml
auto_report: true # auto-generate after milestone completion (default)
```

An auto-generated `index.html` shows all reports with progression metrics across milestones.
38
mintlify-docs/guides/web-interface.mdx
Normal file

@@ -0,0 +1,38 @@
---
title: "Web interface"
description: "Browser-based project management with real-time progress and multi-project support."
---

GSD includes a browser-based web interface for project management, real-time progress monitoring, and multi-project support.

## Quick start

```bash
gsd --web
```

### CLI flags

```bash
gsd --web --host 0.0.0.0 --port 8080 --allowed-origins "https://example.com"
```

| Flag | Default | Description |
|------|---------|-------------|
| `--host` | `localhost` | Bind address |
| `--port` | `3000` | Port |
| `--allowed-origins` | (none) | Comma-separated CORS origins |
## Features

- **Project management** — view milestones, slices, and tasks in a visual dashboard
- **Real-time progress** — server-sent events push status updates during auto-mode
- **Multi-project support** — manage multiple projects from a single tab via the `?project=` URL parameter
- **Change project root** — switch directories from the web UI without restarting
- **Onboarding flow** — API key setup and provider configuration through the browser
- **Model selection** — switch models and providers from the web UI

## Platform notes

- **macOS/Linux** — full support
- **Windows** — the web build is skipped due to Next.js webpack issues. The CLI remains fully functional.
72
mintlify-docs/guides/working-in-teams.mdx
Normal file

@@ -0,0 +1,72 @@
---
title: "Working in teams"
description: "Multi-user workflows with unique milestone IDs, push branches, and shared planning artifacts."
---

GSD supports multi-user workflows where several developers work on the same repository concurrently.

## Setup

### 1. Set team mode

```yaml
# .gsd/PREFERENCES.md (project-level, committed to git)
---
version: 1
mode: team
---
```

This enables unique milestone IDs, push branches, and pre-merge checks in one setting. Override individual settings on top of `mode: team` as needed.
### 2. Configure `.gitignore`

Share planning artifacts while keeping runtime files local:

```bash
# Runtime / ephemeral (per-developer)
.gsd/auto.lock
.gsd/completed-units.json
.gsd/STATE.md
.gsd/metrics.json
.gsd/activity/
.gsd/runtime/
.gsd/worktrees/
.gsd/milestones/**/continue.md
.gsd/milestones/**/*-CONTINUE.md
```

**Shared** (committed): preferences, PROJECT.md, REQUIREMENTS.md, DECISIONS.md, milestones.

**Local** (gitignored): lock files, metrics, state cache, worktrees, activity logs.
### 3. Commit

```bash
git add .gsd/PREFERENCES.md
git commit -m "chore: enable GSD team workflow"
```

## `commit_docs: false`

For teams where only some members use GSD:

```yaml
git:
  commit_docs: false
```

Adds `.gsd/` to `.gitignore` entirely. The developer gets structured planning without affecting teammates.
## Parallel development

Multiple developers run auto mode simultaneously on different milestones. Each developer gets their own worktree and unique `milestone/<MID>` branch. Milestone dependencies can be declared:

```yaml
# M00X-CONTEXT.md frontmatter
---
depends_on: [M001-eh88as]
---
```

GSD enforces that dependent milestones complete before starting downstream work.