Merge branch 'main' into fix/copy-preferences-to-worktree

2026-03-26 16:24:54 -06:00 · 2026-03-26 16:24:54 -06:00 · cceee1d196
commit cceee1d196
parent fa2bb7e8a8 53d2da15b5
158 changed files with 6396 additions and 714 deletions
--- a/.plans/issue-575-dynamic-model-routing.md
+++ b/.plans/issue-575-dynamic-model-routing.md
@ -11,7 +11,7 @@ Users on capped plans (e.g., Claude Pro) exhaust weekly token limits in 15-20 ho
 ## Current Architecture

 ### What Exists
- **Phase-based model config:** Users can set different models per phase via `preferences.md` (research, planning, execution, completion)
+- **Phase-based model config:** Users can set different models per phase via `PREFERENCES.md` (research, planning, execution, completion)
 - **Fallback chains:** Each phase supports `fallbacks: [model1, model2]` for error recovery
 - **Pre-dispatch hooks:** `PreDispatchResult` has a `model` field but it's **never applied** in `auto.ts` — this is a ready-made extension point
 - **Model registry:** `ModelRegistry.getAvailable()` provides all configured models with metadata
--- a/.plans/onboarding-detection-wizard.md
+++ b/.plans/onboarding-detection-wizard.md
@ -134,7 +134,7 @@ Quick filesystem scan (no heavy reads):

 ### Task 1.4: `isFirstEverLaunch(): boolean`

-Returns `true` if `~/.gsd/` doesn't exist or has no `preferences.md`.
+Returns `true` if `~/.gsd/` doesn't exist or has no `PREFERENCES.md`.

 ---

@ -298,7 +298,7 @@ Step 8: Advanced (collapsed by default, expandable)

 Step 9: Bootstrap .gsd/ structure
   - Creates .gsd/milestones/
-   - Creates .gsd/preferences.md (from wizard answers)
+   - Creates .gsd/PREFERENCES.md (from wizard answers)
   - Creates .gitignore entries
   - Seeds CONTEXT.md with detected project signals
   - Commits "chore: init gsd" (if commit_docs enabled)
--- a/.plans/preferences-wizard-completeness.md
+++ b/.plans/preferences-wizard-completeness.md
@ -42,7 +42,7 @@ The `/gsd prefs wizard` currently only configures 6 of 18+ preference fields. Us
 - Added missing keys to `orderedKeys` in `serializePreferencesToFrontmatter()`

 ### Group 6: Update Template & Docs ✓
- Updated `templates/preferences.md` with new fields
+- Updated `templates/PREFERENCES.md` with new fields
 - Updated `docs/preferences-reference.md` with budget, notifications, git, hooks

 ### Group 7: Tests ✓
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@ -53,7 +53,7 @@ git rebase origin/main
 GSD uses worktree-based isolation for multi-developer work. If you're contributing with GSD running, enable team mode in your project preferences:

 ```yaml
-# .gsd/preferences.md
+# .gsd/PREFERENCES.md
 ---
 version: 1
 mode: team
--- a/25
+++ b/25
@ -1,30 +1,9 @@
 # ──────────────────────────────────────────────
-# Stage 1: CI Builder
-# Image: ghcr.io/gsd-build/gsd-ci-builder
-# Used by: pipeline.yml Dev stage
-# ──────────────────────────────────────────────
-FROM node:24-bookworm AS builder
-
-# Rust toolchain (stable, minimal profile)
-RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --default-toolchain stable --profile minimal
-ENV PATH="/root/.cargo/bin:${PATH}"
-
-# Cross-compilation for linux-arm64
-RUN apt-get update && apt-get install -y --no-install-recommends \
-    gcc-aarch64-linux-gnu \
-    g++-aarch64-linux-gnu \
-    && rustup target add aarch64-unknown-linux-gnu \
-    && rm -rf /var/lib/apt/lists/*
-
-# Verify toolchain
-RUN node --version && rustc --version && cargo --version
-
-# ──────────────────────────────────────────────
-# Stage 2: Runtime
+# Runtime
 # Image: ghcr.io/gsd-build/gsd-pi
 # Used by: end users via docker run
 # ──────────────────────────────────────────────
-FROM node:24-slim AS runtime
+FROM node:24-slim

 # Git is required for GSD's git operations
 RUN apt-get update && apt-get install -y --no-install-recommends \
--- a/README.md
+++ b/README.md
@ -521,7 +521,7 @@ An auto-generated `index.html` shows all reports with progression metrics across

 ### Preferences

-GSD preferences live in `~/.gsd/preferences.md` (global) or `.gsd/preferences.md` (project). Manage with `/gsd prefs`.
+GSD preferences live in `~/.gsd/PREFERENCES.md` (global) or `.gsd/PREFERENCES.md` (project). Manage with `/gsd prefs`.

 ```yaml
 ---
@ -672,7 +672,7 @@ The best practice for working in teams is to ensure unique milestone names acros

 ### Unique Milestone Names

-Create or amend your `.gsd/preferences.md` file within the repo to include `unique_milestone_ids: true` e.g.
+Create or amend your `.gsd/PREFERENCES.md` file within the repo to include `unique_milestone_ids: true` e.g.

 ```markdown
 ---
@ -681,7 +681,7 @@ unique_milestone_ids: true
 ---
 ```

-With the above `.gitignore` set up, the `.gsd/preferences.md` file is checked into the repo ensuring all teammates use unique milestone names to avoid collisions.
+With the above `.gitignore` set up, the `.gsd/PREFERENCES.md` file is checked into the repo ensuring all teammates use unique milestone names to avoid collisions.

 Milestone names will now be generated with a 6 char random string appended e.g. instead of `M001` you'll get something like `M001-ush8s3`

@ -689,7 +689,7 @@ Milestone names will now be generated with a 6 char random string appended e.g.

 1. Ensure you are not in the middle of any milestones (clean state)
 2. Update the `.gsd/` related entries in your `.gitignore` to follow the `Suggested .gitignore setup` section under `Working in teams` (ensure you are no longer blanket ignoring the whole `.gsd/` directory)
-3. Update your `.gsd/preferences.md` file within the repo as per section `Unique Milestone Names`
+3. Update your `.gsd/PREFERENCES.md` file within the repo as per section `Unique Milestone Names`
 4. If you want to update all your existing milestones use this prompt in GSD: `I have turned on unique milestone ids, please update all old milestone ids to use this new format e.g. M001-abc123 where abc123 is a random 6 char lowercase alpha numeric string. Update all references in all .gsd file contents, file names and directory names. Validate your work once done to ensure referential integrity.`
 5. Commit to git

--- a/docker/.env.example
+++ b/docker/.env.example
@ -3,6 +3,12 @@
 # Copy this file to .env and fill in your keys.
 # ──────────────────────────────────────────────

+# ── Container User Identity ──
+# Match your host UID/GID to avoid permission issues on bind mounts.
+# Run `id -u` and `id -g` on your host to find the right values.
+PUID=1000
+PGID=1000
+
 # ── LLM Provider API Keys (at least one required) ──

 # Anthropic (Claude)
--- a/docker/Dockerfile.ci-builder
+++ b/docker/Dockerfile.ci-builder
@ -0,0 +1,20 @@
+# ──────────────────────────────────────────────
+# CI Builder
+# Image: ghcr.io/gsd-build/gsd-ci-builder
+# Used by: pipeline.yml Dev stage
+# ──────────────────────────────────────────────
+FROM node:24-bookworm
+
+# Rust toolchain (stable, minimal profile)
+RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --default-toolchain stable --profile minimal
+ENV PATH="/root/.cargo/bin:${PATH}"
+
+# Cross-compilation for linux-arm64
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    gcc-aarch64-linux-gnu \
+    g++-aarch64-linux-gnu \
+    && rustup target add aarch64-unknown-linux-gnu \
+    && rm -rf /var/lib/apt/lists/*
+
+# Verify toolchain
+RUN node --version && rustc --version && cargo --version
--- a/docker/Dockerfile.sandbox
+++ b/docker/Dockerfile.sandbox
@ -4,7 +4,7 @@
 # Purpose: Isolated environment for GSD auto mode
 # Usage: docker sandbox create --template ./docker
 # ──────────────────────────────────────────────
-FROM node:22-bookworm-slim
+FROM node:24-bookworm-slim

 # System dependencies required by GSD
 RUN apt-get update && apt-get install -y --no-install-recommends \
@ -12,6 +12,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
    curl \
    ca-certificates \
    openssh-client \
+    gosu \
    && rm -rf /var/lib/apt/lists/*

 # Install GSD globally — version controlled via build arg
@ -29,10 +30,13 @@ RUN mkdir -p /home/gsd/.gsd && chown -R gsd:gsd /home/gsd/.gsd
 WORKDIR /workspace
 RUN chown gsd:gsd /workspace

-USER gsd
+# Entrypoint handles UID/GID remapping, bootstrap, and drops to gsd user
+COPY entrypoint.sh /usr/local/bin/entrypoint.sh
+COPY bootstrap.sh /usr/local/bin/bootstrap.sh
+RUN chmod +x /usr/local/bin/entrypoint.sh /usr/local/bin/bootstrap.sh

 # Expose default GSD web UI port
 EXPOSE 3000

-ENTRYPOINT ["gsd"]
-CMD ["--help"]
+ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]
+CMD ["gsd", "--help"]
--- a/docker/README.md
+++ b/docker/README.md
@ -7,6 +7,22 @@ Run GSD auto mode inside an isolated Docker sandbox so it cannot touch your host
 - Docker Desktop 4.58+ (macOS or Windows; Linux support is experimental)
 - At least one LLM provider API key

+## Docker Images
+
+| File | Purpose |
+|------|---------|
+| `Dockerfile.sandbox` | Runtime sandbox with entrypoint (UID remapping, bootstrap) |
+| `Dockerfile.ci-builder` | CI builds — includes build tools, no entrypoint magic |
+
+## Compose Files
+
+| File | Purpose |
+|------|---------|
+| `docker-compose.yaml` | Minimal zero-config setup — just works with sensible defaults |
+| `docker-compose.full.yaml` | Fully documented reference with all options, resource limits, health checks |
+
+Start with `docker-compose.yaml`. Copy options from `docker-compose.full.yaml` when you need them.
+
 ## Quick Start

 ### Option A: Docker Sandbox CLI (recommended)
@ -34,7 +50,7 @@ cp docker/.env.example docker/.env
 # Edit docker/.env with your keys

 # 2. Start the sandbox
-docker compose -f docker/docker-compose.yml up -d
+docker compose -f docker/docker-compose.yaml up -d

 # 3. Shell into the container
 docker exec -it gsd-sandbox bash
@ -43,6 +59,29 @@ docker exec -it gsd-sandbox bash
 gsd auto "implement the feature described in issue #42"
 ```

+## UID/GID Remapping
+
+The entrypoint handles UID/GID remapping via `PUID` and `PGID` environment variables. This avoids permission issues on bind-mounted volumes by matching the container's `gsd` user to your host UID/GID.
+
+```bash
+# Find your host UID/GID
+id -u  # PUID
+id -g  # PGID
+```
+
+Set these in your `.env` file or in the `environment` section of the compose file. Defaults to `1000:1000`.
+
+## Entrypoint Behavior
+
+The container entrypoint (`entrypoint.sh`) runs four steps on every start:
+
+1. **UID/GID remapping** — adjusts the `gsd` user to match `PUID`/`PGID`
+2. **Pre-create critical files** — prevents Docker bind-mount from creating directories where files are expected
+3. **Sentinel-based bootstrap** — runs `bootstrap.sh` exactly once on first boot
+4. **Drop privileges** — `exec gosu gsd` for proper PID 1 signal forwarding
+
+No hardcoded `user:` directive in compose — the entrypoint starts as root, remaps, then drops to `gsd`.
+
 ## Two-Terminal Workflow

 GSD's recommended workflow uses two terminals — one for auto mode, one for interactive discussion:
@ -85,7 +124,7 @@ If you restrict outbound network access in your sandbox, GSD needs these endpoin
 Build with a specific GSD version:

 ```bash
-docker compose -f docker/docker-compose.yml build --build-arg GSD_VERSION=2.43.0
+docker compose -f docker/docker-compose.yaml build --build-arg GSD_VERSION=2.51.0
 ```

 ## Cleanup
@ -95,7 +134,7 @@ docker compose -f docker/docker-compose.yml build --build-arg GSD_VERSION=2.43.0
 docker sandbox rm gsd-sandbox

 # Docker Compose
-docker compose -f docker/docker-compose.yml down -v
+docker compose -f docker/docker-compose.yaml down -v
 ```

 ## Known Limitations
--- a/docker/bootstrap.sh
+++ b/docker/bootstrap.sh
@ -0,0 +1,27 @@
+#!/bin/bash
+set -e
+
+# ──────────────────────────────────────────────
+# GSD First-Boot Bootstrap
+#
+# Runs once on initial container creation.
+# Called by entrypoint.sh as the gsd user.
+#
+# This script is idempotent — safe to run multiple
+# times, but the sentinel in entrypoint.sh ensures
+# it only runs once in practice.
+# ──────────────────────────────────────────────
+
+# ── Git Identity ────────────────────────────────────────
+# Without this, git commits inside the container will fail
+# or use garbage defaults.
+
+if [ -n "${GIT_AUTHOR_NAME}" ]; then
+    git config --global user.name "${GIT_AUTHOR_NAME}"
+fi
+
+if [ -n "${GIT_AUTHOR_EMAIL}" ]; then
+    git config --global user.email "${GIT_AUTHOR_EMAIL}"
+fi
+
+echo "Bootstrap complete."
--- a/docker/docker-compose.full.yaml
+++ b/docker/docker-compose.full.yaml
@ -0,0 +1,61 @@
+services:
+  gsd:
+    build:
+      context: .                        # Build context is the docker/ directory
+      dockerfile: Dockerfile.sandbox    # Runtime sandbox image with entrypoint
+      args:
+        GSD_VERSION: latest             # Pin a specific version: GSD_VERSION=2.51.0
+
+    container_name: gsd-sandbox
+
+    ports:
+      - "3000:3000"                     # GSD web UI
+
+    volumes:
+      - ../:/workspace                  # Project root mounted into the container
+      - gsd-state:/home/gsd/.gsd        # Persistent GSD state across restarts
+      # - ~/.ssh:/home/gsd/.ssh:ro      # SSH keys for git operations (read-only)
+      # - ~/.gitconfig:/home/gsd/.gitconfig:ro  # Host git config
+
+    env_file:
+      - .env                            # API keys and secrets (see .env.example)
+
+    environment:
+      - NODE_ENV=development
+      # UID/GID remapping — match your host user to avoid permission issues
+      # on bind-mounted volumes. The entrypoint remaps the container's gsd
+      # user to these IDs at startup. Run `id -u` / `id -g` to find yours.
+      - PUID=1000
+      - PGID=1000
+      # Git identity inside the container (overrides .env if set here)
+      # - GIT_AUTHOR_NAME=Your Name
+      # - GIT_AUTHOR_EMAIL=you@example.com
+
+    stdin_open: true                    # Keep stdin open for interactive use
+    tty: true                           # Allocate a pseudo-TTY
+
+    # Health check — verify GSD is installed and responsive
+    healthcheck:
+      test: ["CMD", "gsd", "--version"]
+      interval: 30s
+      timeout: 5s
+      retries: 3
+      start_period: 10s
+
+    # Resource limits — uncomment to constrain container resources
+    # deploy:
+    #   resources:
+    #     limits:
+    #       cpus: "4.0"
+    #       memory: 8G
+    #     reservations:
+    #       cpus: "1.0"
+    #       memory: 2G
+
+    # Network mode — uncomment ONE if you need host networking
+    # network_mode: host               # Full host network access (no port mapping needed)
+    # network_mode: bridge             # Default Docker bridge (already the default)
+
+volumes:
+  gsd-state:
+    driver: local
--- a/docker/docker-compose.yaml
+++ b/docker/docker-compose.yaml
@ -0,0 +1,23 @@
+services:
+  gsd:
+    build:
+      context: .
+      dockerfile: Dockerfile.sandbox
+      args:
+        GSD_VERSION: latest
+    container_name: gsd-sandbox
+    ports:
+      - "3000:3000"
+    volumes:
+      - ../:/workspace
+      - gsd-state:/home/gsd/.gsd
+    env_file:
+      - .env
+    environment:
+      - NODE_ENV=development
+    stdin_open: true
+    tty: true
+
+volumes:
+  gsd-state:
+    driver: local
--- a/docker/docker-compose.yml
+++ b/docker/docker-compose.yml
@ -1,34 +0,0 @@
-# Docker Compose for running GSD in a sandbox
-# Usage: docker compose -f docker/docker-compose.yml up
-#
-# Copy docker/.env.example to docker/.env and fill in your API keys first.
-# See docker/README.md for full setup instructions.
-
-services:
-  gsd:
-    build:
-      context: .
-      dockerfile: Dockerfile.sandbox
-      args:
-        GSD_VERSION: latest
-    container_name: gsd-sandbox
-    ports:
-      - "3000:3000"
-    volumes:
-      # Sync project code into the sandbox
-      - ../:/workspace
-      # Persistent GSD state across container restarts
-      - gsd-state:/home/gsd/.gsd
-    env_file:
-      - .env
-    environment:
-      - NODE_ENV=development
-    user: "1000:1000"
-    stdin_open: true
-    tty: true
-    # Override entrypoint for interactive shell access
-    # entrypoint: /bin/bash
-
-volumes:
-  gsd-state:
-    driver: local
--- a/docker/entrypoint.sh
+++ b/docker/entrypoint.sh
@ -0,0 +1,81 @@
+#!/bin/bash
+set -e
+
+# ──────────────────────────────────────────────
+# GSD Container Entrypoint
+#
+# Responsibilities:
+#   1. UID/GID remapping — match host user via PUID/PGID
+#   2. Pre-create critical files — prevent Docker bind-mount
+#      from creating directories where files are expected
+#   3. Sentinel-based bootstrap — one-time first-boot setup
+#   4. Signal forwarding — exec into the final process
+# ──────────────────────────────────────────────
+
+GSD_USER="gsd"
+GSD_HOME="/home/${GSD_USER}"
+GSD_DIR="${GSD_HOME}/.gsd"
+
+# ── 1. UID/GID Remapping ────────────────────────────────
+# Accept PUID/PGID from the environment so the container
+# can run with the same UID/GID as the host user, avoiding
+# permission headaches on bind-mounted volumes.
+
+PUID="${PUID:-1000}"
+PGID="${PGID:-1000}"
+
+CURRENT_UID=$(id -u "${GSD_USER}")
+CURRENT_GID=$(id -g "${GSD_USER}")
+
+REMAPPED=0
+
+if [ "${PGID}" != "${CURRENT_GID}" ]; then
+    groupmod -o -g "${PGID}" "${GSD_USER}"
+    REMAPPED=1
+fi
+
+if [ "${PUID}" != "${CURRENT_UID}" ]; then
+    usermod -o -u "${PUID}" "${GSD_USER}"
+    REMAPPED=1
+fi
+
+# Fix ownership only when UID/GID actually changed
+if [ "${REMAPPED}" -eq 1 ]; then
+    chown -R "${PUID}:${PGID}" "${GSD_HOME}"
+    chown "${PUID}:${PGID}" /workspace
+fi
+
+# ── 2. Pre-create Critical Files ────────────────────────
+# Docker bind-mounts will create a *directory* if the target
+# path doesn't exist. We need these to be files, so touch
+# them before Docker gets a chance to mangle things.
+
+mkdir -p "${GSD_DIR}"
+
+if [ ! -f "${GSD_DIR}/settings.json" ]; then
+    echo '{}' > "${GSD_DIR}/settings.json"
+fi
+
+chown "${PUID}:${PGID}" "${GSD_DIR}" "${GSD_DIR}/settings.json"
+
+# ── 3. Sentinel-based Bootstrap ─────────────────────────
+# Run first-boot setup exactly once. Subsequent container
+# starts (or restarts) skip this entirely.
+
+SENTINEL="${GSD_DIR}/.bootstrapped"
+
+if [ ! -f "${SENTINEL}" ]; then
+    if [ -x /usr/local/bin/bootstrap.sh ]; then
+        # Run bootstrap as the gsd user so files get correct ownership
+        gosu "${GSD_USER}" /usr/local/bin/bootstrap.sh
+    fi
+    touch "${SENTINEL}"
+    chown "${PUID}:${PGID}" "${SENTINEL}"
+fi
+
+# ── 4. Drop Privileges & Exec ──────────────────────────
+# Replace this shell process with the final command running
+# as the gsd user. exec + gosu = proper PID 1 = proper
+# signal forwarding (SIGTERM, SIGINT, etc.).
+
+exec gosu "${GSD_USER}" "$@"
--- a/docs/configuration.md
+++ b/docs/configuration.md
@ -1,14 +1,14 @@
 # Configuration

-GSD preferences live in `~/.gsd/preferences.md` (global) or `.gsd/preferences.md` (project-local). Manage interactively with `/gsd prefs`.
+GSD preferences live in `~/.gsd/PREFERENCES.md` (global) or `.gsd/PREFERENCES.md` (project-local). Manage interactively with `/gsd prefs`.

 ## `/gsd prefs` Commands

 | Command | Description |
 |---------|-------------|
 | `/gsd prefs` | Open the global preferences wizard (default) |
-| `/gsd prefs global` | Interactive wizard for global preferences (`~/.gsd/preferences.md`) |
-| `/gsd prefs project` | Interactive wizard for project preferences (`.gsd/preferences.md`) |
+| `/gsd prefs global` | Interactive wizard for global preferences (`~/.gsd/PREFERENCES.md`) |
+| `/gsd prefs project` | Interactive wizard for project preferences (`.gsd/PREFERENCES.md`) |
 | `/gsd prefs status` | Show current preference files, merged values, and skill resolution status |
 | `/gsd prefs wizard` | Alias for `/gsd prefs global` |
 | `/gsd prefs setup` | Alias for `/gsd prefs wizard` — creates preferences file if missing |
@ -42,8 +42,8 @@ token_profile: balanced

 | Scope | Path | Applies to |
 |-------|------|-----------|
-| Global | `~/.gsd/preferences.md` | All projects |
-| Project | `.gsd/preferences.md` | Current project only |
+| Global | `~/.gsd/PREFERENCES.md` | All projects |
+| Project | `.gsd/PREFERENCES.md` | Current project only |

 **Merge behavior:**
 - **Scalar fields** (`skill_discovery`, `budget_ceiling`): project wins if defined
--- a/docs/parallel-orchestration.md
+++ b/docs/parallel-orchestration.md
@ -126,7 +126,7 @@ File overlaps are warnings, not blockers. Both milestones work in separate workt

 ## Configuration

-Add to `~/.gsd/preferences.md` or `.gsd/preferences.md`:
+Add to `~/.gsd/PREFERENCES.md` or `.gsd/PREFERENCES.md`:

 ```yaml
 ---
--- a/docs/remote-questions.md
+++ b/docs/remote-questions.md
@ -16,7 +16,7 @@ The setup wizard:
 3. Lists servers the bot belongs to (or lets you pick)
 4. Lists text channels in the selected server
 5. Sends a test message to confirm permissions
-6. Saves the configuration to `~/.gsd/preferences.md`
+6. Saves the configuration to `~/.gsd/PREFERENCES.md`

 **Bot requirements:**
 - A Discord bot application with a token (from [Discord Developer Portal](https://discord.com/developers/applications))
@ -65,7 +65,7 @@ The setup wizard:

 ## Configuration

-Remote questions are configured in `~/.gsd/preferences.md`:
+Remote questions are configured in `~/.gsd/PREFERENCES.md`:

 ```yaml
 remote_questions:
--- a/docs/token-optimization.md
+++ b/docs/token-optimization.md
@ -257,7 +257,7 @@ models:
 ## How the Pieces Fit Together

 ```
-preferences.md
+PREFERENCES.md
  └─ token_profile: balanced
       ├─ resolveProfileDefaults() → model defaults + phase skip defaults
       ├─ resolveInlineLevel() → standard
--- a/docs/working-in-teams.md
+++ b/docs/working-in-teams.md
@ -9,7 +9,7 @@ GSD supports multi-user workflows where several developers work on the same repo
 The simplest way to configure GSD for team use is to set `mode: team` in your project preferences. This enables unique milestone IDs, push branches, and pre-merge checks in one setting:

 ```yaml
-# .gsd/preferences.md (project-level, committed to git)
+# .gsd/PREFERENCES.md (project-level, committed to git)
 ---
 version: 1
 mode: team
@ -38,7 +38,7 @@ Share planning artifacts (milestones, roadmaps, decisions) while keeping runtime
 ```

 **What gets shared** (committed to git):
- `.gsd/preferences.md` — project preferences
+- `.gsd/PREFERENCES.md` — project preferences
 - `.gsd/PROJECT.md` — living project description
 - `.gsd/REQUIREMENTS.md` — requirement contract
 - `.gsd/DECISIONS.md` — architectural decisions
@ -50,7 +50,7 @@ Share planning artifacts (milestones, roadmaps, decisions) while keeping runtime
 ### 3. Commit the Preferences

 ```bash
-git add .gsd/preferences.md
+git add .gsd/PREFERENCES.md
 git commit -m "chore: enable GSD team workflow"
 ```

@ -71,7 +71,7 @@ If you have an existing project with `.gsd/` blanket-ignored:

 1. Ensure no milestones are in progress (clean state)
 2. Update `.gitignore` to use the selective pattern above
-3. Add `unique_milestone_ids: true` to `.gsd/preferences.md`
+3. Add `unique_milestone_ids: true` to `.gsd/PREFERENCES.md`
 4. Optionally rename existing milestones to use unique IDs:
   ```
   I have turned on unique milestone ids, please update all old milestone
--- a/gsd-orchestrator/SKILL.md
+++ b/gsd-orchestrator/SKILL.md
@ -0,0 +1,374 @@
+---
+name: gsd-orchestrator
+description: >
+  Orchestrate GSD (Get Shit Done) projects via subprocess execution.
+  Use when an agent needs to create milestones from specs, execute software
+  development workflows, monitor task progress, poll status, handle blockers,
+  or track costs. Triggers on requests to "run gsd", "create milestone",
+  "execute project", "check gsd status", "orchestrate development",
+  "run headless workflow", or any programmatic interaction with the GSD
+  project management system.
+metadata:
+  openclaw:
+    requires:
+      bins: [gsd]
+    install:
+      kind: node
+      package: gsd-pi
+      bins: [gsd]
+---
+
+# GSD Orchestrator
+
+Run GSD commands as subprocesses via `gsd headless`. No SDK, no RPC — just shell exec, exit codes, and JSON on stdout.
+
+## Quick Start
+
+```bash
+# Install GSD globally
+npm install -g gsd-pi
+
+# Verify installation
+gsd --version
+
+# Create a milestone from a spec and execute it
+gsd headless --output-format json new-milestone --context spec.md --auto
+```
+
+## Command Syntax
+
+```bash
+gsd headless [flags] [command] [args...]
+```
+
+Default command is `auto` (run all queued units).
+
+### Flags
+
+| Flag | Description |
+|------|-------------|
+| `--output-format <fmt>` | Output format: `text` (default), `json` (structured result at exit), `stream-json` (JSONL events) |
+| `--json` | Alias for `--output-format stream-json` — JSONL event stream to stdout |
+| `--bare` | Minimal context: skip CLAUDE.md, AGENTS.md, user settings, user skills. Use for CI/ecosystem runs. |
+| `--resume <id>` | Resume a prior headless session by its session ID |
+| `--timeout N` | Overall timeout in ms (default: 300000) |
+| `--model ID` | Override LLM model |
+| `--supervised` | Forward interactive UI requests to orchestrator via stdout/stdin |
+| `--response-timeout N` | Timeout (ms) for orchestrator response in supervised mode (default: 30000) |
+| `--answers <path>` | Pre-supply answers and secrets from JSON file |
+| `--events <types>` | Filter JSONL output to specific event types (comma-separated, implies `--json`) |
+| `--verbose` | Show tool calls in progress output |
+
+### Exit Codes
+
+| Code | Meaning | Constant |
+|------|---------|----------|
+| `0` | Success — unit/milestone completed | `EXIT_SUCCESS` |
+| `1` | Error or timeout | `EXIT_ERROR` |
+| `10` | Blocked — needs human intervention | `EXIT_BLOCKED` |
+| `11` | Cancelled by user or orchestrator | `EXIT_CANCELLED` |
+
+These codes are stable and suitable for CI pipelines and orchestrator logic.
+
+### Output Formats
+
+| Format | Behavior |
+|--------|----------|
+| `text` | Human-readable progress on stderr. Default. |
+| `json` | Collect events silently. Emit a single `HeadlessJsonResult` JSON object to stdout at exit. |
+| `stream-json` | Stream JSONL events to stdout in real time (same as `--json`). |
+
+Use `--output-format json` when you need a structured result for decision-making. See [references/json-result.md](references/json-result.md) for the full field reference.
+
+## Core Workflows
+
+### 1. Create + Execute a Milestone (end-to-end)
+
+```bash
+gsd headless --output-format json new-milestone --context spec.md --auto
+```
+
+Reads a spec file, bootstraps `.gsd/`, creates the milestone, then chains into auto-mode executing all phases (discuss → research → plan → execute → summarize → complete). The JSON result is emitted on stdout at exit.
+
+Extra flags for `new-milestone`:
+- `--context <path>` — path to spec/PRD file (use `-` for stdin)
+- `--context-text <text>` — inline specification text
+- `--auto` — start auto-mode after milestone creation
+- `--verbose` — show tool calls in progress output
+
+```bash
+# From stdin
+cat spec.md | gsd headless --output-format json new-milestone --context - --auto
+
+# Inline text
+gsd headless new-milestone --context-text "Build a REST API for user management" --auto
+```
+
+### 2. Run All Queued Work
+
+```bash
+gsd headless --output-format json auto
+```
+
+Loop through all pending units until milestone complete or blocked.
+
+### 3. Run One Unit (step-by-step)
+
+```bash
+gsd headless --output-format json next
+```
+
+Execute exactly one unit (task/slice/milestone step), then exit. This is the recommended pattern for orchestrators that need control between steps.
+
+### 4. Instant State Snapshot (no LLM)
+
+```bash
+gsd headless query
+```
+
+Returns a single JSON object with the full project snapshot — no LLM session, instant (~50ms). **This is the recommended way for orchestrators to inspect state.**
+
+```json
+{
+  "state": {
+    "phase": "executing",
+    "activeMilestone": { "id": "M001", "title": "..." },
+    "activeSlice": { "id": "S01", "title": "..." },
+    "progress": { "completed": 3, "total": 7 },
+    "registry": [...]
+  },
+  "next": { "action": "dispatch", "unitType": "execute-task", "unitId": "M001/S01/T01" },
+  "cost": { "workers": [{ "milestoneId": "M001", "cost": 1.50 }], "total": 1.50 }
+}
+```
+
+### 5. Dispatch Specific Phase
+
+```bash
+gsd headless dispatch research|plan|execute|complete|reassess|uat|replan
+```
+
+Force-route to a specific phase, bypassing normal state-machine routing.
+
+### 6. Resume a Session
+
+```bash
+gsd headless --resume <session-id> auto
+```
+
+Resume a prior headless session. The session ID is available in the `HeadlessJsonResult.sessionId` field from a previous `--output-format json` run.
+
+## Orchestrator Patterns
+
+### Parse the Structured JSON Result
+
+When using `--output-format json`, the process emits a single `HeadlessJsonResult` on stdout at exit. Parse it for decision-making:
+
+```bash
+RESULT=$(gsd headless --output-format json next 2>/dev/null)
+EXIT=$?
+
+STATUS=$(echo "$RESULT" | jq -r '.status')
+COST=$(echo "$RESULT" | jq -r '.cost.total')
+PHASE=$(echo "$RESULT" | jq -r '.phase')
+NEXT=$(echo "$RESULT" | jq -r '.nextAction')
+SESSION_ID=$(echo "$RESULT" | jq -r '.sessionId')
+
+echo "Status: $STATUS, Cost: \$${COST}, Phase: $PHASE, Next: $NEXT"
+```
+
+See [references/json-result.md](references/json-result.md) for the full field reference.
+
+### Blocker Detection and Handling
+
+Exit code `10` means the execution hit a blocker requiring human intervention:
+
+```bash
+gsd headless --output-format json next 2>/dev/null
+EXIT=$?
+
+if [ $EXIT -eq 10 ]; then
+  # Inspect the blocker
+  BLOCKER=$(gsd headless query | jq '.state.phase')
+  echo "Blocked: $BLOCKER"
+
+  # Option 1: Use --supervised mode to handle interactively
+  gsd headless --supervised auto
+
+  # Option 2: Pre-supply answers to resolve the blocker
+  gsd headless --answers blocker-answers.json auto
+
+  # Option 3: Steer the plan to work around it
+  gsd headless steer "Skip the blocked dependency, use mock instead"
+fi
+```
+
+### Cost Tracking and Budget Enforcement
+
+```bash
+MAX_BUDGET=10.00
+
+RESULT=$(gsd headless --output-format json next 2>/dev/null)
+COST=$(echo "$RESULT" | jq -r '.cost.total')
+
+# Check cumulative cost via query (includes all workers)
+TOTAL_COST=$(gsd headless query | jq -r '.cost.total')
+
+if (( $(echo "$TOTAL_COST > $MAX_BUDGET" | bc -l) )); then
+  echo "Budget exceeded: \$$TOTAL_COST > \$$MAX_BUDGET"
+  gsd headless stop
+  exit 1
+fi
+```
+
+### Step-by-Step with Monitoring
+
+The recommended pattern for full control. Run one unit at a time, inspect state between steps:
+
+```bash
+while true; do
+  RESULT=$(gsd headless --output-format json next 2>/dev/null)
+  EXIT=$?
+
+  STATUS=$(echo "$RESULT" | jq -r '.status')
+  COST=$(echo "$RESULT" | jq -r '.cost.total')
+
+  echo "Exit: $EXIT, Status: $STATUS, Cost: \$$COST"
+
+  # Handle terminal states
+  [ $EXIT -eq 0 ] || break
+
+  # Check if milestone is complete
+  PHASE=$(gsd headless query | jq -r '.state.phase')
+  [ "$PHASE" = "complete" ] && echo "Milestone complete" && break
+
+  # Budget check
+  TOTAL=$(gsd headless query | jq -r '.cost.total')
+  if (( $(echo "$TOTAL > 20.00" | bc -l) )); then
+    echo "Budget limit reached"
+    break
+  fi
+done
+```
+
+### Poll-and-React Loop
+
+Lightweight pattern using only the instant `query` command:
+
+```bash
+PHASE=$(gsd headless query | jq -r '.state.phase')
+NEXT_ACTION=$(gsd headless query | jq -r '.next.action')
+
+case "$PHASE" in
+  complete) echo "Done" ;;
+  blocked)  echo "Needs intervention — exit code 10" ;;
+  *)        [ "$NEXT_ACTION" = "dispatch" ] && gsd headless next ;;
+esac
+```
+
+### CI/Ecosystem Mode
+
+Use `--bare` to skip user-specific configuration for deterministic CI runs:
+
+```bash
+gsd headless --bare --output-format json auto 2>/dev/null
+```
+
+This skips CLAUDE.md, AGENTS.md, user settings, and user skills. Bundled GSD extensions and `.gsd/` state are still loaded (they're required for GSD to function).
+
+### JSONL Event Stream
+
+Use `--json` (or `--output-format stream-json`) for real-time events:
+
+```bash
+gsd headless --json auto 2>/dev/null | while read -r line; do
+  TYPE=$(echo "$line" | jq -r '.type')
+  case "$TYPE" in
+    tool_execution_start) echo "Tool: $(echo "$line" | jq -r '.toolName')" ;;
+    extension_ui_request) echo "GSD: $(echo "$line" | jq -r '.message // .title // empty')" ;;
+    agent_end) echo "Session ended" ;;
+  esac
+done
+```
+
+### Filtered Event Stream
+
+Use `--events` to receive only specific event types:
+
+```bash
+# Only phase-relevant events
+gsd headless --events agent_end,extension_ui_request auto 2>/dev/null
+
+# Only tool execution events
+gsd headless --events tool_execution_start,tool_execution_end auto
+```
+
+Available event types: `agent_start`, `agent_end`, `tool_execution_start`, `tool_execution_end`, `tool_execution_update`, `extension_ui_request`, `message_start`, `message_end`, `message_update`, `turn_start`, `turn_end`.
+
+## Answer Injection
+
+Pre-supply answers and secrets for fully autonomous headless runs:
+
+```bash
+gsd headless --answers answers.json auto
+```
+
+Answer file schema:
+```json
+{
+  "questions": { "question_id": "selected_option" },
+  "secrets": { "API_KEY": "sk-..." },
+  "defaults": { "strategy": "first_option" }
+}
+```
+
+- **questions** — question ID → answer (string for single-select, string[] for multi-select)
+- **secrets** — env var → value, injected into child process environment
+- **defaults.strategy** — `"first_option"` (default) or `"cancel"` for unmatched questions
+
+See [references/answer-injection.md](references/answer-injection.md) for the full mechanism.
+
+## GSD Project Structure
+
+All state lives in `.gsd/` as markdown files (version-controllable):
+
+```
+.gsd/
+  PROJECT.md
+  REQUIREMENTS.md
+  DECISIONS.md
+  KNOWLEDGE.md
+  STATE.md
+  milestones/
+    M001/
+      M001-CONTEXT.md      # Requirements, scope, decisions
+      M001-ROADMAP.md      # Slices with tasks, dependencies, checkboxes
+      M001-SUMMARY.md      # Completion summary
+      slices/
+        S01/
+          S01-PLAN.md      # Task list
+          S01-SUMMARY.md   # Slice summary
+          tasks/
+            T01-PLAN.md    # Individual task spec
+            T01-SUMMARY.md # Task completion summary
+```
+
+State is derived from files on disk — checkboxes in ROADMAP.md and PLAN.md are the source of truth for completion.
+
+## All Commands
+
+See [references/commands.md](references/commands.md) for the complete reference.
+
+| Command | Purpose |
+|---------|---------|
+| `auto` | Run all queued units (default) |
+| `next` | Run one unit |
+| `query` | Instant JSON snapshot — state, next dispatch, costs (no LLM) |
+| `new-milestone` | Create milestone from spec |
+| `dispatch <phase>` | Force specific phase |
+| `stop` / `pause` | Control auto-mode |
+| `steer <desc>` | Hard-steer plan mid-execution |
+| `skip` / `undo` | Unit control |
+| `queue` | Queue/reorder milestones |
+| `history` | View execution history |
+| `doctor` | Health check + auto-fix |
--- a/gsd-orchestrator/references/answer-injection.md
+++ b/gsd-orchestrator/references/answer-injection.md
@ -0,0 +1,119 @@
+# Answer Injection
+
+Pre-supply answers and secrets to eliminate interactive prompts during headless execution.
+
+## Usage
+
+```bash
+gsd headless --answers answers.json auto
+gsd headless --answers answers.json new-milestone --context spec.md --auto
+```
+
+The `--answers` flag takes a path to a JSON file containing pre-supplied answers and secrets.
+
+## Answer File Schema
+
+```json
+{
+  "questions": {
+    "question_id": "selected_option_label",
+    "multi_select_question": ["option_a", "option_b"]
+  },
+  "secrets": {
+    "API_KEY": "sk-...",
+    "DATABASE_URL": "postgres://..."
+  },
+  "defaults": {
+    "strategy": "first_option"
+  }
+}
+```
+
+### Fields
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `questions` | `Record<string, string \| string[]>` | Map question ID → answer. String for single-select, string array for multi-select. |
+| `secrets` | `Record<string, string>` | Map env var name → value. Injected into child process environment variables. |
+| `defaults.strategy` | `"first_option" \| "cancel"` | Fallback for unmatched questions. Default: `"first_option"`. |
+
+## How Secrets Work
+
+Secrets are injected as environment variables into the GSD child process:
+
+1. The orchestrator passes the answer file via `--answers`
+2. GSD reads the file and sets secret values as env vars in the child process
+3. When `secure_env_collect` runs inside the agent, it finds the keys already in `process.env`
+4. The tool skips the interactive prompt and reports the keys as "already configured"
+
+Secrets are never logged or included in event streams.
+
+## How Question Matching Works
+
+Two-phase correlation:
+
+1. **Observe** — GSD monitors `tool_execution_start` events for `ask_user_questions` to extract question metadata (ID, options, allowMultiple)
+2. **Match** — Subsequent `extension_ui_request` events are correlated to the metadata and responded to with the pre-supplied answer
+
+Handles out-of-order events (extension_ui_request can arrive before tool_execution_start) via a deferred processing queue with 500ms timeout.
+
+## Coexistence with `--supervised`
+
+Both `--answers` and `--supervised` can be active simultaneously. Priority order:
+
+1. Answer injector tries first
+2. If no answer found, supervised mode forwards to the orchestrator
+3. If no orchestrator response within `--response-timeout`, the auto-responder kicks in
+
+## Without Answer Injection
+
+Headless mode has built-in auto-responders for all prompt types:
+
+| Prompt Type | Default Behavior |
+|-------------|-----------------|
+| Select | Picks first option |
+| Confirm | Auto-confirms |
+| Input | Empty string |
+| Editor | Returns prefill or empty |
+
+Answer injection overrides these defaults with specific answers when precision matters.
+
+## Diagnostics
+
+The injector tracks statistics printed in the session summary:
+
+| Stat | Description |
+|------|-------------|
+| `questionsAnswered` | Questions resolved from the answer file |
+| `questionsDefaulted` | Questions handled by the default strategy |
+| `secretsProvided` | Number of secrets injected |
+
+Unused question IDs and secret keys are warned about at exit.
+
+## Example: Orchestrator with Answers
+
+```bash
+# Create answer file
+cat > answers.json << 'EOF'
+{
+  "questions": {
+    "test_framework": "vitest",
+    "package_manager": "pnpm"
+  },
+  "secrets": {
+    "OPENAI_API_KEY": "sk-...",
+    "DATABASE_URL": "postgres://localhost:5432/mydb"
+  },
+  "defaults": {
+    "strategy": "first_option"
+  }
+}
+EOF
+
+# Run with pre-supplied answers
+gsd headless --answers answers.json --output-format json auto 2>/dev/null
+
+# Parse result
+RESULT=$(gsd headless --answers answers.json --output-format json next 2>/dev/null)
+echo "$RESULT" | jq '{status: .status, cost: .cost.total}'
+```
--- a/gsd-orchestrator/references/commands.md
+++ b/gsd-orchestrator/references/commands.md
@ -0,0 +1,210 @@
+# GSD Commands Reference
+
+All commands run as subprocesses via `gsd headless [flags] [command] [args...]`.
+
+## Global Flags
+
+These flags apply to any `gsd headless` invocation:
+
+| Flag | Description |
+|------|-------------|
+| `--output-format <fmt>` | `text` (default), `json` (structured result), `stream-json` (JSONL) |
+| `--json` | Alias for `--output-format stream-json` |
+| `--bare` | Minimal context: skip CLAUDE.md, AGENTS.md, user settings, user skills |
+| `--resume <id>` | Resume a prior headless session by ID |
+| `--timeout N` | Overall timeout in ms (default: 300000) |
+| `--model ID` | Override LLM model |
+| `--supervised` | Forward interactive UI requests to orchestrator via stdout/stdin |
+| `--response-timeout N` | Timeout for orchestrator response in supervised mode (default: 30000ms) |
+| `--answers <path>` | Pre-supply answers and secrets from JSON file |
+| `--events <types>` | Filter JSONL output to specific event types (comma-separated, implies `--json`) |
+| `--verbose` | Show tool calls in progress output |
+
+## Exit Codes
+
+| Code | Meaning | When |
+|------|---------|------|
+| `0` | Success | Unit/milestone completed normally |
+| `1` | Error or timeout | Runtime error, LLM failure, or `--timeout` exceeded |
+| `10` | Blocked | Execution hit a blocker requiring human intervention |
+| `11` | Cancelled | User or orchestrator cancelled the operation |
+
+## Workflow Commands
+
+### `auto` (default)
+
+Autonomous mode — loop through all pending units until milestone complete or blocked.
+
+```bash
+gsd headless --output-format json auto
+```
+
+### `next`
+
+Step mode — execute exactly one unit (task/slice/milestone step), then exit. Recommended for orchestrators that need decision points between steps.
+
+```bash
+gsd headless --output-format json next
+```
+
+### `new-milestone`
+
+Create a milestone from a specification document.
+
+```bash
+gsd headless new-milestone --context spec.md
+gsd headless new-milestone --context spec.md --auto
+gsd headless new-milestone --context-text "Build a REST API" --auto
+cat spec.md | gsd headless new-milestone --context - --auto
+```
+
+Extra flags:
+- `--context <path>` — path to spec/PRD file (use `-` for stdin)
+- `--context-text <text>` — inline specification text
+- `--auto` — start auto-mode after milestone creation
+
+### `dispatch <phase>`
+
+Force-route to a specific phase, bypassing normal state-machine routing.
+
+```bash
+gsd headless dispatch research
+gsd headless dispatch plan
+gsd headless dispatch execute
+gsd headless dispatch complete
+gsd headless dispatch reassess
+gsd headless dispatch uat
+gsd headless dispatch replan
+```
+
+### `discuss`
+
+Start guided milestone/slice discussion.
+
+```bash
+gsd headless discuss
+```
+
+### `stop`
+
+Stop auto-mode gracefully.
+
+```bash
+gsd headless stop
+```
+
+### `pause`
+
+Pause auto-mode (preserves state, resumable).
+
+```bash
+gsd headless pause
+```
+
+## State Inspection
+
+### `query`
+
+**Instant JSON snapshot** — state, next dispatch, parallel costs. No LLM, ~50ms. The recommended way for orchestrators to inspect state.
+
+```bash
+gsd headless query
+gsd headless query | jq '.state.phase'
+gsd headless query | jq '.next'
+gsd headless query | jq '.cost.total'
+```
+
+### `status`
+
+Progress dashboard (TUI overlay — useful interactively, not for parsing).
+
+```bash
+gsd headless status
+```
+
+### `history`
+
+Execution history. Supports `--cost`, `--phase`, `--model`, and `limit` arguments.
+
+```bash
+gsd headless history
+```
+
+## Unit Control
+
+### `skip`
+
+Prevent a unit from auto-mode dispatch.
+
+```bash
+gsd headless skip
+```
+
+### `undo`
+
+Revert last completed unit. Use `--force` to bypass confirmation.
+
+```bash
+gsd headless undo
+gsd headless undo --force
+```
+
+### `steer <description>`
+
+Hard-steer plan documents during execution. Useful for mid-course corrections.
+
+```bash
+gsd headless steer "Skip the blocked dependency, use mock instead"
+```
+
+### `queue`
+
+Queue and reorder future milestones.
+
+```bash
+gsd headless queue
+```
+
+## Configuration & Health
+
+### `doctor`
+
+Runtime health checks with auto-fix.
+
+```bash
+gsd headless doctor
+```
+
+### `prefs`
+
+Manage preferences (global/project/status/wizard/setup).
+
+```bash
+gsd headless prefs
+```
+
+### `knowledge <rule|pattern|lesson>`
+
+Add persistent project knowledge.
+
+```bash
+gsd headless knowledge "Always use UTC timestamps in API responses"
+```
+
+## Phases
+
+GSD workflows progress through these phases:
+
+```
+pre-planning → needs-discussion → discussing → researching → planning →
+executing → verifying → summarizing → advancing → validating-milestone →
+completing-milestone → complete
+```
+
+Special phases: `paused`, `blocked`, `replanning-slice`
+
+## Hierarchy
+
+- **Milestone**: Shippable version (4–10 slices, 1–4 weeks)
+- **Slice**: One demoable vertical capability (1–7 tasks, 1–3 days)
+- **Task**: One context-window-sized unit of work (one session)
--- a/gsd-orchestrator/references/json-result.md
+++ b/gsd-orchestrator/references/json-result.md
@ -0,0 +1,162 @@
+# HeadlessJsonResult Reference
+
+When using `--output-format json`, GSD collects events silently and emits a single `HeadlessJsonResult` JSON object to stdout at process exit. This is the structured result for orchestrator decision-making.
+
+## Obtaining the Result
+
+```bash
+# Capture the JSON result
+RESULT=$(gsd headless --output-format json next 2>/dev/null)
+EXIT=$?
+
+# Parse fields with jq
+echo "$RESULT" | jq '.status'
+echo "$RESULT" | jq '.cost.total'
+echo "$RESULT" | jq '.nextAction'
+```
+
+**Important:** Progress text goes to stderr. The JSON result goes to stdout. Redirect stderr to `/dev/null` when parsing stdout.
+
+## Field Reference
+
+### Top-Level Fields
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `status` | `"success" \| "error" \| "blocked" \| "cancelled" \| "timeout"` | Final session status. Maps directly to exit codes. |
+| `exitCode` | `number` | Process exit code: `0` (success), `1` (error/timeout), `10` (blocked), `11` (cancelled). |
+| `sessionId` | `string \| undefined` | Session identifier. Pass to `--resume <id>` to continue this session. |
+| `duration` | `number` | Session wall-clock duration in milliseconds. |
+| `cost` | `CostObject` | Token usage and cost breakdown. See below. |
+| `toolCalls` | `number` | Total number of tool calls made during the session. |
+| `events` | `number` | Total number of events processed during the session. |
+| `milestone` | `string \| undefined` | Active milestone ID (e.g. `"M001"`). |
+| `phase` | `string \| undefined` | Current GSD phase at session end (e.g. `"executing"`, `"blocked"`, `"complete"`). |
+| `nextAction` | `string \| undefined` | Recommended next action from the state machine (e.g. `"dispatch"`, `"complete"`). |
+| `artifacts` | `string[] \| undefined` | Paths to artifacts created or modified during the session. |
+| `commits` | `string[] \| undefined` | Git commit SHAs created during the session. |
+
+### Status → Exit Code Mapping
+
+| Status | Exit Code | Constant | Meaning |
+|--------|-----------|----------|---------|
+| `success` | `0` | `EXIT_SUCCESS` | Unit or milestone completed successfully |
+| `error` | `1` | `EXIT_ERROR` | Runtime error or LLM failure |
+| `timeout` | `1` | `EXIT_ERROR` | `--timeout` deadline exceeded |
+| `blocked` | `10` | `EXIT_BLOCKED` | Execution blocked — needs human intervention |
+| `cancelled` | `11` | `EXIT_CANCELLED` | Cancelled by user or orchestrator |
+
+### Cost Object
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `cost.total` | `number` | Total cost in USD for the session. |
+| `cost.input_tokens` | `number` | Number of input tokens consumed. |
+| `cost.output_tokens` | `number` | Number of output tokens generated. |
+| `cost.cache_read_tokens` | `number` | Number of tokens served from prompt cache. |
+| `cost.cache_write_tokens` | `number` | Number of tokens written to prompt cache. |
+
+## Parsing Patterns
+
+### Decision-Making After Each Step
+
+```bash
+RESULT=$(gsd headless --output-format json next 2>/dev/null)
+EXIT=$?
+
+case $EXIT in
+  0)
+    PHASE=$(echo "$RESULT" | jq -r '.phase')
+    NEXT=$(echo "$RESULT" | jq -r '.nextAction')
+    echo "Success — phase: $PHASE, next: $NEXT"
+    ;;
+  1)
+    STATUS=$(echo "$RESULT" | jq -r '.status')
+    echo "Failed — status: $STATUS"
+    ;;
+  10)
+    echo "Blocked — needs intervention"
+    gsd headless query | jq '.state'
+    ;;
+  11)
+    echo "Cancelled"
+    ;;
+esac
+```
+
+### Cost Tracking
+
+```bash
+RESULT=$(gsd headless --output-format json next 2>/dev/null)
+
+COST=$(echo "$RESULT" | jq -r '.cost.total')
+INPUT=$(echo "$RESULT" | jq -r '.cost.input_tokens')
+OUTPUT=$(echo "$RESULT" | jq -r '.cost.output_tokens')
+
+echo "Cost: \$$COST (${INPUT} in / ${OUTPUT} out)"
+```
+
+### Session Resumption
+
+```bash
+# First run — capture session ID
+RESULT=$(gsd headless --output-format json next 2>/dev/null)
+SESSION_ID=$(echo "$RESULT" | jq -r '.sessionId')
+
+# Resume the same session later
+gsd headless --resume "$SESSION_ID" --output-format json next 2>/dev/null
+```
+
+### Artifact Collection
+
+```bash
+RESULT=$(gsd headless --output-format json auto 2>/dev/null)
+
+# List files created/modified
+echo "$RESULT" | jq -r '.artifacts[]?'
+
+# List commits made
+echo "$RESULT" | jq -r '.commits[]?'
+```
+
+## Example Result
+
+```json
+{
+  "status": "success",
+  "exitCode": 0,
+  "sessionId": "abc123def456",
+  "duration": 45200,
+  "cost": {
+    "total": 0.42,
+    "input_tokens": 15000,
+    "output_tokens": 3500,
+    "cache_read_tokens": 8000,
+    "cache_write_tokens": 2000
+  },
+  "toolCalls": 12,
+  "events": 87,
+  "milestone": "M001",
+  "phase": "executing",
+  "nextAction": "dispatch",
+  "artifacts": [
+    ".gsd/milestones/M001/slices/S01/tasks/T01-SUMMARY.md"
+  ],
+  "commits": [
+    "a1b2c3d"
+  ]
+}
+```
+
+## Combined with `query` for Full Picture
+
+The `HeadlessJsonResult` captures what happened during a session. Use `query` for the current project state:
+
+```bash
+# What happened in this step?
+RESULT=$(gsd headless --output-format json next 2>/dev/null)
+echo "$RESULT" | jq '{status, cost: .cost.total, phase}'
+
+# What's the overall project state now?
+gsd headless query | jq '{phase: .state.phase, progress: .state.progress, totalCost: .cost.total}'
+```
--- a/mintlify-docs/guides/configuration.mdx
+++ b/mintlify-docs/guides/configuration.mdx
@ -3,7 +3,7 @@ title: "Configuration"
 description: "Preferences, model selection, MCP servers, hooks, and all settings."
 ---

-GSD preferences live in `~/.gsd/preferences.md` (global) or `.gsd/preferences.md` (project-local). Manage interactively with `/gsd prefs`.
+GSD preferences live in `~/.gsd/PREFERENCES.md` (global) or `.gsd/PREFERENCES.md` (project-local). Manage interactively with `/gsd prefs`.

 ## Preferences commands

@ -40,8 +40,8 @@ token_profile: balanced

 | Scope | Path | Applies to |
 |-------|------|-----------|
-| Global | `~/.gsd/preferences.md` | All projects |
-| Project | `.gsd/preferences.md` | Current project only |
+| Global | `~/.gsd/PREFERENCES.md` | All projects |
+| Project | `.gsd/PREFERENCES.md` | Current project only |

 **Merge behavior:**
 - **Scalar fields** — project wins if defined
--- a/mintlify-docs/guides/working-in-teams.mdx
+++ b/mintlify-docs/guides/working-in-teams.mdx
@ -10,7 +10,7 @@ GSD supports multi-user workflows where several developers work on the same repo
 ### 1. Set team mode

 ```yaml
-# .gsd/preferences.md (project-level, committed to git)
+# .gsd/PREFERENCES.md (project-level, committed to git)
 ---
 version: 1
 mode: team
@ -43,7 +43,7 @@ Share planning artifacts while keeping runtime files local:
 ### 3. Commit

 ```bash
-git add .gsd/preferences.md
+git add .gsd/PREFERENCES.md
 git commit -m "chore: enable GSD team workflow"
 ```

--- a/packages/pi-coding-agent/src/cli/args.ts
+++ b/packages/pi-coding-agent/src/cli/args.ts
@ -49,6 +49,8 @@ export interface Args {
 	fileArgs: string[];
 	/** Unknown flags (potentially extension flags) - map of flag name to value */
 	unknownFlags: Map<string, boolean | string>;
+	/** --bare: suppress CLAUDE.md/AGENTS.md, user skills, prompt templates, themes, project preferences */
+	bare?: boolean;
 }

 const VALID_THINKING_LEVELS = ["off", "minimal", "low", "medium", "high", "xhigh"] as const;
@ -169,6 +171,8 @@ export function parseArgs(args: string[], extensionFlags?: Map<string, { type: "
 			}
 		} else if (arg === "--verbose") {
 			result.verbose = true;
+		} else if (arg === "--bare") {
+			result.bare = true;
 		} else if (arg === "--offline") {
 			result.offline = true;
 		} else if (arg.startsWith("@")) {
--- a/packages/pi-coding-agent/src/core/bash-executor.ts
+++ b/packages/pi-coding-agent/src/core/bash-executor.ts
@ -87,8 +87,12 @@ export function executeBash(command: string, options?: BashExecutorOptions & { l
 		} else {
 			({ shell, args } = getShellConfig());
 		}
+		// On Windows, detached: true sets CREATE_NEW_PROCESS_GROUP which can
+		// cause EINVAL in VSCode/ConPTY terminal contexts.  The bg-shell
+		// extension already guards this (process-manager.ts); align here.
+		// Process-tree cleanup uses taskkill /F /T on Windows regardless.
 		const child: ChildProcess = spawn(shell, [...args, sanitizeCommand(command)], {
-			detached: true,
+			detached: process.platform !== "win32",
 			env: getShellEnv(),
 			stdio: ["ignore", "pipe", "pipe"],
 		});
--- a/packages/pi-coding-agent/src/core/tools/bash-spawn-windows.test.ts
+++ b/packages/pi-coding-agent/src/core/tools/bash-spawn-windows.test.ts
@ -0,0 +1,101 @@
+/**
+ * bash-spawn-windows.test.ts — Regression test for Windows spawn EINVAL.
+ *
+ * Verifies that bash tool spawn options disable `detached: true` on Windows
+ * to prevent EINVAL errors in ConPTY / VSCode terminal contexts.
+ *
+ * Background:
+ *   On Windows, `spawn()` with `detached: true` sets the
+ *   CREATE_NEW_PROCESS_GROUP flag in CreateProcess.  In certain terminal
+ *   contexts (VSCode integrated terminal, ConPTY, Windows Terminal) this
+ *   flag conflicts with the parent process group and causes a synchronous
+ *   EINVAL from libuv.  The bg-shell extension already guards against this
+ *   with `detached: process.platform !== "win32"` (process-manager.ts);
+ *   this test ensures all other spawn sites are aligned.
+ *
+ * See: gsd-build/gsd-2#XXXX
+ */
+
+import test from "node:test";
+import assert from "node:assert/strict";
+import { spawn } from "node:child_process";
+
+// Verify the spawn option pattern used across the codebase.
+// This is a static/structural test — it reads the source files and asserts
+// they use the platform-guarded detached flag.
+import { readFileSync } from "node:fs";
+import { join, dirname } from "node:path";
+import { fileURLToPath } from "node:url";
+
+const __dirname = dirname(fileURLToPath(import.meta.url));
+
+const SPAWN_FILES = [
+	join(__dirname, "bash.ts"),
+	join(__dirname, "..", "bash-executor.ts"),
+	join(__dirname, "..", "..", "utils", "shell.ts"),
+];
+
+test("spawn calls use platform-guarded detached flag (no unconditional detached: true)", () => {
+	for (const file of SPAWN_FILES) {
+		const content = readFileSync(file, "utf-8");
+		const lines = content.split("\n");
+
+		for (let i = 0; i < lines.length; i++) {
+			const line = lines[i]!;
+			// Skip comments
+			if (line.trim().startsWith("//") || line.trim().startsWith("*")) continue;
+			// Check for unconditional `detached: true`
+			if (/detached:\s*true\b/.test(line)) {
+				assert.fail(
+					`${file}:${i + 1} has unconditional 'detached: true' — ` +
+					`must use 'detached: process.platform !== "win32"' ` +
+					`to prevent EINVAL on Windows (ConPTY / VSCode terminal)`,
+				);
+			}
+		}
+	}
+});
+
+test("killProcessTree does not use detached: true for taskkill on Windows", () => {
+	const shellFile = join(__dirname, "..", "..", "utils", "shell.ts");
+	const content = readFileSync(shellFile, "utf-8");
+
+	// Find the taskkill spawn call and ensure it doesn't have detached: true
+	const taskkillRegion = content.match(/spawn\("taskkill"[\s\S]*?\}\)/);
+	if (taskkillRegion) {
+		assert.ok(
+			!/detached:\s*true/.test(taskkillRegion[0]),
+			"taskkill spawn should not use detached: true — " +
+			"it can cause EINVAL on Windows and is unnecessary for a utility process",
+		);
+	}
+});
+
+// Smoke test: spawn with platform-guarded detached flag actually works
+test("spawn with detached: process.platform !== 'win32' succeeds", async () => {
+	const { promise, resolve, reject } = Promise.withResolvers<void>();
+
+	const child = spawn(
+		process.platform === "win32" ? "cmd" : "sh",
+		process.platform === "win32" ? ["/c", "echo ok"] : ["-c", "echo ok"],
+		{
+			detached: process.platform !== "win32",
+			stdio: ["ignore", "pipe", "pipe"],
+		},
+	);
+
+	let output = "";
+	child.stdout?.on("data", (d: Buffer) => { output += d.toString(); });
+	child.on("error", reject);
+	child.on("close", (code) => {
+		try {
+			assert.equal(code, 0, "spawn should succeed");
+			assert.ok(output.trim().includes("ok"), `Expected 'ok' in output, got: ${output}`);
+			resolve();
+		} catch (e) {
+			reject(e);
+		}
+	});
+
+	await promise;
+});
--- a/packages/pi-coding-agent/src/core/tools/bash.ts
+++ b/packages/pi-coding-agent/src/core/tools/bash.ts
@ -158,9 +158,13 @@ const defaultBashOperations: BashOperations = {
 				return;
 			}

+			// On Windows, detached: true sets CREATE_NEW_PROCESS_GROUP which can
+			// cause EINVAL in VSCode/ConPTY terminal contexts.  The bg-shell
+			// extension already guards this (process-manager.ts); align here.
+			// Process-tree cleanup uses taskkill /F /T on Windows regardless.
 			const child = spawn(shell, [...args, command], {
 				cwd,
-				detached: true,
+				detached: process.platform !== "win32",
 				env: env ?? getShellEnv(),
 				stdio: ["ignore", "pipe", "pipe"],
 			});
--- a/packages/pi-coding-agent/src/index.ts
+++ b/packages/pi-coding-agent/src/index.ts
@ -314,8 +314,11 @@ export {
 	type RpcClientOptions,
 	type RpcEventListener,
 	type RpcCommand,
+	type RpcInitResult,
+	type RpcProtocolVersion,
 	type RpcResponse,
 	type RpcSessionState,
+	type RpcV2Event,
 } from "./modes/index.js";
 // RPC JSONL utilities
 export { attachJsonlLineReader, serializeJsonLine } from "./modes/rpc/jsonl.js";
--- a/packages/pi-coding-agent/src/main.ts
+++ b/packages/pi-coding-agent/src/main.ts
@ -419,11 +419,13 @@ export async function main(args: string[]) {
 		additionalPromptTemplatePaths: firstPass.promptTemplates,
 		additionalThemePaths: firstPass.themes,
 		noExtensions: firstPass.noExtensions,
-		noSkills: firstPass.noSkills,
-		noPromptTemplates: firstPass.noPromptTemplates,
-		noThemes: firstPass.noThemes,
+		noSkills: firstPass.noSkills || firstPass.bare,
+		noPromptTemplates: firstPass.noPromptTemplates || firstPass.bare,
+		noThemes: firstPass.noThemes || firstPass.bare,
 		systemPrompt: firstPass.systemPrompt,
 		appendSystemPrompt: firstPass.appendSystemPrompt,
+		// --bare: suppress CLAUDE.md/AGENTS.md ancestor walk
+		...(firstPass.bare ? { agentsFilesOverride: () => ({ agentsFiles: [] }) } : {}),
 	});
 	await resourceLoader.reload();
 	time("resourceLoader.reload");
--- a/packages/pi-coding-agent/src/modes/index.ts
+++ b/packages/pi-coding-agent/src/modes/index.ts
@ -6,4 +6,11 @@ export { InteractiveMode, type InteractiveModeOptions } from "./interactive/inte
 export { type PrintModeOptions, runPrintMode } from "./print-mode.js";
 export { type ModelInfo, RpcClient, type RpcClientOptions, type RpcEventListener } from "./rpc/rpc-client.js";
 export { runRpcMode } from "./rpc/rpc-mode.js";
-export type { RpcCommand, RpcResponse, RpcSessionState } from "./rpc/rpc-types.js";
+export type {
+	RpcCommand,
+	RpcInitResult,
+	RpcProtocolVersion,
+	RpcResponse,
+	RpcSessionState,
+	RpcV2Event,
+} from "./rpc/rpc-types.js";
--- a/packages/pi-coding-agent/src/modes/interactive/controllers/chat-controller.ts
+++ b/packages/pi-coding-agent/src/modes/interactive/controllers/chat-controller.ts
@ -150,7 +150,6 @@ export async function handleAgentEvent(host: InteractiveModeStateHost & {
 									content: [{ type: "text", text: "Web search disabled (offline mode)" }],
 									isError: false,
 								});
-								host.pendingTools.delete(content.toolUseId);
 							} else {
 								const searchContent = content.content;
 								const isError = searchContent && typeof searchContent === "object" && "type" in (searchContent as any) && (searchContent as any).type === "web_search_tool_result_error";
@ -158,7 +157,6 @@ export async function handleAgentEvent(host: InteractiveModeStateHost & {
 									content: [{ type: "text", text: host.formatWebSearchResult(searchContent) }],
 									isError: !!isError,
 								});
-								host.pendingTools.delete(content.toolUseId);
 							}
 						}
 					}
--- a/packages/pi-coding-agent/src/modes/rpc/rpc-client.ts
+++ b/packages/pi-coding-agent/src/modes/rpc/rpc-client.ts
@ -11,7 +11,7 @@ import type { SessionStats } from "../../core/agent-session.js";
 import type { BashResult } from "../../core/bash-executor.js";
 import type { CompactionResult } from "../../core/compaction/index.js";
 import { attachJsonlLineReader, serializeJsonLine } from "./jsonl.js";
-import type { RpcCommand, RpcResponse, RpcSessionState, RpcSlashCommand } from "./rpc-types.js";
+import type { RpcCommand, RpcInitResult, RpcResponse, RpcSessionState, RpcSlashCommand } from "./rpc-types.js";

 // ============================================================================
 // Types
@ -398,6 +398,59 @@ export class RpcClient {
 		return this.getData<{ commands: RpcSlashCommand[] }>(response).commands;
 	}

+	/**
+	 * Send a UI response to a pending extension_ui_request.
+	 * Fire-and-forget — no request/response correlation.
+	 */
+	sendUIResponse(id: string, response: { value?: string; values?: string[]; confirmed?: boolean; cancelled?: boolean }): void {
+		if (!this.process?.stdin) {
+			throw new Error("Client not started");
+		}
+		this.process.stdin.write(serializeJsonLine({
+			type: "extension_ui_response",
+			id,
+			...response,
+		}));
+	}
+
+	/**
+	 * Initialize a v2 protocol session. Must be sent as the first command.
+	 * Returns the negotiated protocol version, session ID, and server capabilities.
+	 */
+	async init(options?: { clientId?: string }): Promise<RpcInitResult> {
+		const response = await this.send({ type: "init", protocolVersion: 2, clientId: options?.clientId });
+		return this.getData<RpcInitResult>(response);
+	}
+
+	/**
+	 * Request a graceful shutdown of the agent process.
+	 * Waits for the response before the process exits.
+	 */
+	async shutdown(): Promise<void> {
+		await this.send({ type: "shutdown" });
+		// Wait for process to exit after shutdown acknowledgment
+		if (this.process) {
+			await new Promise<void>((resolve) => {
+				const timeout = setTimeout(() => {
+					this.process?.kill("SIGKILL");
+					resolve();
+				}, 5000);
+				this.process?.on("exit", () => {
+					clearTimeout(timeout);
+					resolve();
+				});
+			});
+		}
+	}
+
+	/**
+	 * Subscribe to specific event types (v2 only).
+	 * Pass ["*"] to receive all events, or a list of event type strings to filter.
+	 */
+	async subscribe(events: string[]): Promise<void> {
+		await this.send({ type: "subscribe", events });
+	}
+
 	// =========================================================================
 	// Helpers
 	// =========================================================================
--- a/packages/pi-coding-agent/src/modes/rpc/rpc-mode.ts
+++ b/packages/pi-coding-agent/src/modes/rpc/rpc-mode.ts
@ -27,6 +27,7 @@ import type {
 	RpcCommand,
 	RpcExtensionUIRequest,
 	RpcExtensionUIResponse,
+	RpcInitResult,
 	RpcResponse,
 	RpcSessionState,
 	RpcSlashCommand,
@ -37,8 +38,11 @@ export type {
 	RpcCommand,
 	RpcExtensionUIRequest,
 	RpcExtensionUIResponse,
+	RpcInitResult,
+	RpcProtocolVersion,
 	RpcResponse,
 	RpcSessionState,
+	RpcV2Event,
 } from "./rpc-types.js";

 /**
@ -74,6 +78,16 @@ export async function runRpcMode(session: AgentSession): Promise<never> {
 	// Shutdown request flag
 	let shutdownRequested = false;

+	// v2 protocol version detection state
+	let protocolVersion: 1 | 2 = 1;
+	let protocolLocked = false;
+
+	// v2 runId threading: tracks the current execution run
+	let currentRunId: string | null = null;
+
+	// v2 event filtering: null = no filter (all events); Set = only listed event types
+	let eventFilter: Set<string> | null = null;
+
 	const embeddedTerminalEnabled = process.env.GSD_WEB_BRIDGE_TUI === "1";
 	const remoteTerminal = embeddedTerminalEnabled
 		? new RemoteTerminal({
@ -425,7 +439,55 @@ export async function runRpcMode(session: AgentSession): Promise<never> {

 	// Output all agent events as JSON
 	const unsubscribe = session.subscribe((event) => {
-		output(event);
+		// v2: emit synthesized events before the regular event
+		if (protocolVersion === 2) {
+			// cost_update on assistant message_end
+			if (event.type === "message_end" && event.message.role === "assistant" && currentRunId) {
+				const stats = session.getSessionStats();
+				const costUpdate = {
+					type: "cost_update" as const,
+					runId: currentRunId,
+					turnCost: session.getLastTurnCost(),
+					cumulativeCost: stats.cost,
+					tokens: {
+						input: stats.tokens.input,
+						output: stats.tokens.output,
+						cacheRead: stats.tokens.cacheRead,
+						cacheWrite: stats.tokens.cacheWrite,
+					},
+				};
+				if (!eventFilter || eventFilter.has("cost_update")) {
+					output(costUpdate);
+				}
+			}
+
+			// execution_complete on agent_end
+			if (event.type === "agent_end" && currentRunId) {
+				const stats = session.getSessionStats();
+				const completionEvent = {
+					type: "execution_complete" as const,
+					runId: currentRunId,
+					status: "completed" as const,
+					stats,
+				};
+				if (!eventFilter || eventFilter.has("execution_complete")) {
+					output(completionEvent);
+				}
+				currentRunId = null;
+			}
+		}
+
+		// Apply event filter (v2 only, applies to agent session events only)
+		if (protocolVersion === 2 && eventFilter && !eventFilter.has(event.type)) {
+			return;
+		}
+
+		// Emit the regular event, with runId injection in v2 mode
+		if (protocolVersion === 2 && currentRunId) {
+			output({ ...event, runId: currentRunId });
+		} else {
+			output(event);
+		}
 	});

 	// Handle a single command
@ -438,6 +500,9 @@ export async function runRpcMode(session: AgentSession): Promise<never> {
 			// =================================================================

 			case "prompt": {
+				// v2: generate runId for execution tracking
+				const runId = protocolVersion === 2 ? crypto.randomUUID() : undefined;
+				if (runId) currentRunId = runId;
 				// Don't await - events will stream
 				// Extension commands are executed immediately, file prompt templates are expanded
 				// If streaming and streamingBehavior specified, queues via steer/followUp
@ -448,17 +513,23 @@ export async function runRpcMode(session: AgentSession): Promise<never> {
 						source: "rpc",
 					})
 					.catch((e) => output(error(id, "prompt", e.message)));
-				return success(id, "prompt");
+				return { id, type: "response", command: "prompt", success: true, ...(runId && { runId }) } as RpcResponse;
 			}

 			case "steer": {
+				// v2: generate runId for execution tracking
+				const runId = protocolVersion === 2 ? crypto.randomUUID() : undefined;
+				if (runId) currentRunId = runId;
 				await session.steer(command.message, command.images);
-				return success(id, "steer");
+				return { id, type: "response", command: "steer", success: true, ...(runId && { runId }) } as RpcResponse;
 			}

 			case "follow_up": {
+				// v2: generate runId for execution tracking
+				const runId = protocolVersion === 2 ? crypto.randomUUID() : undefined;
+				if (runId) currentRunId = runId;
 				await session.followUp(command.message, command.images);
-				return success(id, "follow_up");
+				return { id, type: "response", command: "follow_up", success: true, ...(runId && { runId }) } as RpcResponse;
 			}

 			case "abort": {
@ -709,6 +780,28 @@ export async function runRpcMode(session: AgentSession): Promise<never> {
 				return success(id, "terminal_redraw");
 			}

+			// =================================================================
+			// v2 Protocol: subscribe
+			// =================================================================
+
+			case "subscribe": {
+				if (command.events.includes("*")) {
+					eventFilter = null; // wildcard = all events
+				} else {
+					eventFilter = new Set(command.events);
+				}
+				return success(id, "subscribe");
+			}
+
+			// =================================================================
+			// v2 Protocol: shutdown
+			// =================================================================
+
+			case "shutdown": {
+				shutdownRequested = true;
+				return success(id, "shutdown");
+			}
+
 			default: {
 				const unknownCommand = command as { type: string; id?: string };
 				return error(unknownCommand.id, unknownCommand.type, `Unknown command: ${unknownCommand.type}`);
@ -741,7 +834,7 @@ export async function runRpcMode(session: AgentSession): Promise<never> {
 		try {
 			const parsed = JSON.parse(line);

-			// Handle extension UI responses
+			// Handle extension UI responses (bypass protocol detection)
 			if (parsed.type === "extension_ui_response") {
 				const response = parsed as RpcExtensionUIResponse;
 				const pending = pendingExtensionRequests.get(response.id);
@ -752,8 +845,33 @@ export async function runRpcMode(session: AgentSession): Promise<never> {
 				return;
 			}

-			// Handle regular commands
 			const command = parsed as RpcCommand;
+
+			// Protocol version detection: first non-UI-response command locks the version
+			if (!protocolLocked) {
+				protocolLocked = true;
+				if (command.type === "init") {
+					protocolVersion = 2;
+					const initResult: RpcInitResult = {
+						protocolVersion: 2,
+						sessionId: session.sessionId,
+						capabilities: {
+							events: ["execution_complete", "cost_update"],
+							commands: ["init", "shutdown", "subscribe"],
+						},
+					};
+					output(success(command.id, "init", initResult));
+					return;
+				}
+				// Non-init first message: lock to v1, fall through to normal handling
+				protocolVersion = 1;
+			} else if (command.type === "init") {
+				// Already locked — reject re-init
+				output(error(command.id, "init", "Protocol version already locked. init must be the first command."));
+				return;
+			}
+
+			// Handle regular commands
 			const response = await handleCommand(command);
 			output(response);

--- a/packages/pi-coding-agent/src/modes/rpc/rpc-protocol-v2.test.ts
+++ b/packages/pi-coding-agent/src/modes/rpc/rpc-protocol-v2.test.ts
@ -0,0 +1,971 @@
+/**
+ * RPC Protocol v2 test suite.
+ *
+ * Tests v1 backward compatibility, v2 init handshake, protocol locking,
+ * v2 feature type shapes, and RpcClient command serialization against
+ * mock child processes using PassThrough streams.
+ */
+
+import { describe, it, beforeEach, afterEach, mock } from "node:test";
+import assert from "node:assert/strict";
+import { PassThrough } from "node:stream";
+import { attachJsonlLineReader, serializeJsonLine } from "./jsonl.js";
+import type {
+	RpcCommand,
+	RpcResponse,
+	RpcInitResult,
+	RpcExecutionCompleteEvent,
+	RpcCostUpdateEvent,
+	RpcV2Event,
+	RpcProtocolVersion,
+	RpcSessionState,
+} from "./rpc-types.js";
+
+// ============================================================================
+// Helpers
+// ============================================================================
+
+/** Collect JSONL output lines from a stream */
+function collectLines(stream: PassThrough): { lines: unknown[]; detach: () => void } {
+	const lines: unknown[] = [];
+	const detach = attachJsonlLineReader(stream, (line) => {
+		try {
+			lines.push(JSON.parse(line));
+		} catch {
+			// skip non-JSON lines
+		}
+	});
+	return { lines, detach };
+}
+
+/** Write a command as JSONL to a writable stream and wait for drain */
+function writeLine(stream: PassThrough, obj: unknown): void {
+	stream.write(serializeJsonLine(obj));
+}
+
+/**
+ * Create a mock "child process" with piped stdin/stdout.
+ * clientStdin  → data flows into the "server" (from the client's perspective, this is what the client writes to)
+ * clientStdout ← data flows out of the "server" (from the client's perspective, this is what the client reads from)
+ *
+ * The test acts as the "server": read from clientStdin, write to clientStdout.
+ */
+function createMockProcess() {
+	// Client writes to this → server reads from it
+	const clientStdin = new PassThrough();
+	// Server writes to this → client reads from it
+	const clientStdout = new PassThrough();
+
+	return { clientStdin, clientStdout };
+}
+
+/** Wait a tick for async handlers to process */
+function tick(ms = 10): Promise<void> {
+	return new Promise((resolve) => setTimeout(resolve, ms));
+}
+
+// ============================================================================
+// JSONL utilities
+// ============================================================================
+
+describe("JSONL utilities", () => {
+	it("serializeJsonLine produces newline-terminated JSON", () => {
+		const result = serializeJsonLine({ type: "test", value: 42 });
+		assert.equal(result, '{"type":"test","value":42}\n');
+	});
+
+	it("serializeJsonLine handles nested objects", () => {
+		const result = serializeJsonLine({ a: { b: [1, 2, 3] } });
+		assert.ok(result.endsWith("\n"));
+		const parsed = JSON.parse(result.trim());
+		assert.deepEqual(parsed, { a: { b: [1, 2, 3] } });
+	});
+
+	it("attachJsonlLineReader splits on LF only", async () => {
+		const stream = new PassThrough();
+		const { lines, detach } = collectLines(stream);
+
+		stream.write('{"a":1}\n{"b":2}\n');
+		await tick();
+
+		assert.equal(lines.length, 2);
+		assert.deepEqual(lines[0], { a: 1 });
+		assert.deepEqual(lines[1], { b: 2 });
+		detach();
+	});
+
+	it("attachJsonlLineReader handles partial writes", async () => {
+		const stream = new PassThrough();
+		const { lines, detach } = collectLines(stream);
+
+		stream.write('{"partial":');
+		await tick();
+		assert.equal(lines.length, 0);
+
+		stream.write('"value"}\n');
+		await tick();
+		assert.equal(lines.length, 1);
+		assert.deepEqual(lines[0], { partial: "value" });
+		detach();
+	});
+
+	it("attachJsonlLineReader handles CR+LF", async () => {
+		const stream = new PassThrough();
+		const { lines, detach } = collectLines(stream);
+
+		stream.write('{"cr":"lf"}\r\n');
+		await tick();
+		assert.equal(lines.length, 1);
+		assert.deepEqual(lines[0], { cr: "lf" });
+		detach();
+	});
+
+	it("detach stops line delivery", async () => {
+		const stream = new PassThrough();
+		const { lines, detach } = collectLines(stream);
+
+		stream.write('{"before":1}\n');
+		await tick();
+		assert.equal(lines.length, 1);
+
+		detach();
+
+		stream.write('{"after":2}\n');
+		await tick();
+		// Should still be 1 since we detached
+		assert.equal(lines.length, 1);
+	});
+});
+
+// ============================================================================
+// v2 type shape assertions
+// ============================================================================
+
+describe("v2 type shapes", () => {
+	it("RpcInitResult has required fields", () => {
+		const initResult: RpcInitResult = {
+			protocolVersion: 2,
+			sessionId: "test-session-123",
+			capabilities: {
+				events: ["execution_complete", "cost_update"],
+				commands: ["init", "shutdown", "subscribe"],
+			},
+		};
+		assert.equal(initResult.protocolVersion, 2);
+		assert.ok(typeof initResult.sessionId === "string");
+		assert.ok(Array.isArray(initResult.capabilities.events));
+		assert.ok(Array.isArray(initResult.capabilities.commands));
+		assert.ok(initResult.capabilities.events.includes("execution_complete"));
+		assert.ok(initResult.capabilities.events.includes("cost_update"));
+		assert.ok(initResult.capabilities.commands.includes("init"));
+		assert.ok(initResult.capabilities.commands.includes("shutdown"));
+		assert.ok(initResult.capabilities.commands.includes("subscribe"));
+	});
+
+	it("RpcExecutionCompleteEvent matches expected shape", () => {
+		const event: RpcExecutionCompleteEvent = {
+			type: "execution_complete",
+			runId: "run-abc-123",
+			status: "completed",
+			stats: {
+				cost: 0.05,
+				turns: 3,
+				duration: 12000,
+				tokens: { input: 1000, output: 500, cacheRead: 200, cacheWrite: 100 },
+			} as any, // SessionStats is complex, we just verify shape
+		};
+		assert.equal(event.type, "execution_complete");
+		assert.ok(typeof event.runId === "string");
+		assert.ok(["completed", "error", "cancelled"].includes(event.status));
+		assert.ok(event.stats !== undefined);
+	});
+
+	it("RpcExecutionCompleteEvent supports error status with reason", () => {
+		const event: RpcExecutionCompleteEvent = {
+			type: "execution_complete",
+			runId: "run-err-456",
+			status: "error",
+			reason: "API rate limit exceeded",
+			stats: {} as any,
+		};
+		assert.equal(event.status, "error");
+		assert.equal(event.reason, "API rate limit exceeded");
+	});
+
+	it("RpcCostUpdateEvent matches expected shape", () => {
+		const event: RpcCostUpdateEvent = {
+			type: "cost_update",
+			runId: "run-cost-789",
+			turnCost: 0.01,
+			cumulativeCost: 0.05,
+			tokens: {
+				input: 500,
+				output: 200,
+				cacheRead: 100,
+				cacheWrite: 50,
+			},
+		};
+		assert.equal(event.type, "cost_update");
+		assert.ok(typeof event.runId === "string");
+		assert.ok(typeof event.turnCost === "number");
+		assert.ok(typeof event.cumulativeCost === "number");
+		assert.ok(typeof event.tokens.input === "number");
+		assert.ok(typeof event.tokens.output === "number");
+		assert.ok(typeof event.tokens.cacheRead === "number");
+		assert.ok(typeof event.tokens.cacheWrite === "number");
+	});
+
+	it("RpcV2Event discriminated union resolves by type field", () => {
+		const events: RpcV2Event[] = [
+			{
+				type: "execution_complete",
+				runId: "r1",
+				status: "completed",
+				stats: {} as any,
+			},
+			{
+				type: "cost_update",
+				runId: "r2",
+				turnCost: 0.01,
+				cumulativeCost: 0.03,
+				tokens: { input: 100, output: 50, cacheRead: 10, cacheWrite: 5 },
+			},
+		];
+
+		for (const event of events) {
+			if (event.type === "execution_complete") {
+				// TypeScript narrows to RpcExecutionCompleteEvent
+				assert.ok("status" in event);
+				assert.ok("stats" in event);
+			} else if (event.type === "cost_update") {
+				// TypeScript narrows to RpcCostUpdateEvent
+				assert.ok("turnCost" in event);
+				assert.ok("tokens" in event);
+			} else {
+				assert.fail(`Unexpected event type: ${(event as any).type}`);
+			}
+		}
+	});
+
+	it("RpcProtocolVersion is 1 or 2", () => {
+		const v1: RpcProtocolVersion = 1;
+		const v2: RpcProtocolVersion = 2;
+		assert.equal(v1, 1);
+		assert.equal(v2, 2);
+	});
+
+	it("v2 prompt response includes optional runId field", () => {
+		const v1Response: RpcResponse = {
+			id: "1",
+			type: "response",
+			command: "prompt",
+			success: true,
+		};
+		assert.equal(v1Response.success, true);
+		assert.equal((v1Response as any).runId, undefined);
+
+		const v2Response: RpcResponse = {
+			id: "2",
+			type: "response",
+			command: "prompt",
+			success: true,
+			runId: "run-123",
+		};
+		assert.equal(v2Response.success, true);
+		assert.equal((v2Response as any).runId, "run-123");
+	});
+
+	it("v2 command types are present in RpcCommand union", () => {
+		// These compile — that's the actual test. Runtime verification:
+		const initCmd: RpcCommand = { type: "init", protocolVersion: 2 };
+		const shutdownCmd: RpcCommand = { type: "shutdown" };
+		const subscribeCmd: RpcCommand = { type: "subscribe", events: ["agent_end"] };
+
+		assert.equal(initCmd.type, "init");
+		assert.equal(shutdownCmd.type, "shutdown");
+		assert.equal(subscribeCmd.type, "subscribe");
+	});
+
+	it("init command supports optional clientId", () => {
+		const cmd: RpcCommand = { type: "init", protocolVersion: 2, clientId: "my-client" };
+		assert.equal(cmd.type, "init");
+		if (cmd.type === "init") {
+			assert.equal(cmd.clientId, "my-client");
+		}
+	});
+
+	it("shutdown command supports optional graceful flag", () => {
+		const cmd: RpcCommand = { type: "shutdown", graceful: true };
+		if (cmd.type === "shutdown") {
+			assert.equal(cmd.graceful, true);
+		}
+	});
+
+	it("v2 response types include init, shutdown, subscribe", () => {
+		const initResp: RpcResponse = {
+			type: "response",
+			command: "init",
+			success: true,
+			data: {
+				protocolVersion: 2,
+				sessionId: "s1",
+				capabilities: { events: [], commands: [] },
+			},
+		};
+		const shutdownResp: RpcResponse = {
+			type: "response",
+			command: "shutdown",
+			success: true,
+		};
+		const subscribeResp: RpcResponse = {
+			type: "response",
+			command: "subscribe",
+			success: true,
+		};
+
+		assert.equal(initResp.command, "init");
+		assert.equal(shutdownResp.command, "shutdown");
+		assert.equal(subscribeResp.command, "subscribe");
+	});
+});
+
+// ============================================================================
+// v1 backward compatibility
+// ============================================================================
+
+describe("v1 backward compatibility — command shapes", () => {
+	it("v1 prompt command has no protocolVersion or runId", () => {
+		const cmd: RpcCommand = { type: "prompt", message: "hello" };
+		assert.equal(cmd.type, "prompt");
+		assert.equal((cmd as any).protocolVersion, undefined);
+		assert.equal((cmd as any).runId, undefined);
+	});
+
+	it("v1 get_state response has no v2 fields", () => {
+		const state: RpcSessionState = {
+			thinkingLevel: "medium",
+			isStreaming: false,
+			isCompacting: false,
+			steeringMode: "all",
+			followUpMode: "all",
+			sessionId: "test-id",
+			autoCompactionEnabled: true,
+			autoRetryEnabled: false,
+			retryInProgress: false,
+			retryAttempt: 0,
+			messageCount: 0,
+			pendingMessageCount: 0,
+			extensionsReady: true,
+		};
+		// v1 state should not include any v2-specific fields
+		assert.equal((state as any).protocolVersion, undefined);
+		assert.equal((state as any).runId, undefined);
+	});
+
+	it("v1 prompt response has no runId", () => {
+		const resp: RpcResponse = {
+			id: "1",
+			type: "response",
+			command: "prompt",
+			success: true,
+		};
+		assert.equal(resp.success, true);
+		// runId is optional; in v1 mode it won't be present
+		assert.equal((resp as any).runId, undefined);
+	});
+
+	it("error response shape is consistent across v1 and v2", () => {
+		const errResp: RpcResponse = {
+			id: "err-1",
+			type: "response",
+			command: "init",
+			success: false,
+			error: "Protocol version already locked. init must be the first command.",
+		};
+		assert.equal(errResp.success, false);
+		if (!errResp.success) {
+			assert.ok(typeof errResp.error === "string");
+			assert.ok(errResp.error.length > 0);
+		}
+	});
+});
+
+// ============================================================================
+// RpcClient command serialization tests (mock process)
+// ============================================================================
+
+describe("RpcClient command serialization", () => {
+	// We import the class dynamically to avoid the full module graph at test time.
+	// Instead we test the protocol framing directly — what gets written to stdin and
+	// what comes back from stdout — using PassThrough streams.
+
+	it("init command serializes correctly", () => {
+		const cmd = { id: "req_1", type: "init", protocolVersion: 2 };
+		const serialized = serializeJsonLine(cmd);
+		const parsed = JSON.parse(serialized);
+		assert.equal(parsed.type, "init");
+		assert.equal(parsed.protocolVersion, 2);
+		assert.equal(parsed.id, "req_1");
+	});
+
+	it("init command with clientId serializes correctly", () => {
+		const cmd = { id: "req_1", type: "init", protocolVersion: 2, clientId: "test-client" };
+		const serialized = serializeJsonLine(cmd);
+		const parsed = JSON.parse(serialized);
+		assert.equal(parsed.clientId, "test-client");
+	});
+
+	it("shutdown command serializes correctly", () => {
+		const cmd = { id: "req_2", type: "shutdown" };
+		const serialized = serializeJsonLine(cmd);
+		const parsed = JSON.parse(serialized);
+		assert.equal(parsed.type, "shutdown");
+		assert.equal(parsed.id, "req_2");
+	});
+
+	it("subscribe command serializes correctly with event list", () => {
+		const cmd = { id: "req_3", type: "subscribe", events: ["agent_end", "cost_update"] };
+		const serialized = serializeJsonLine(cmd);
+		const parsed = JSON.parse(serialized);
+		assert.equal(parsed.type, "subscribe");
+		assert.deepEqual(parsed.events, ["agent_end", "cost_update"]);
+	});
+
+	it("subscribe command with wildcard serializes correctly", () => {
+		const cmd = { id: "req_4", type: "subscribe", events: ["*"] };
+		const serialized = serializeJsonLine(cmd);
+		const parsed = JSON.parse(serialized);
+		assert.deepEqual(parsed.events, ["*"]);
+	});
+
+	it("subscribe command with empty array serializes correctly", () => {
+		const cmd = { id: "req_5", type: "subscribe", events: [] as string[] };
+		const serialized = serializeJsonLine(cmd);
+		const parsed = JSON.parse(serialized);
+		assert.deepEqual(parsed.events, []);
+	});
+
+	it("sendUIResponse serializes correct JSONL", () => {
+		const response = {
+			type: "extension_ui_response",
+			id: "ui-req-123",
+			value: "test-value",
+		};
+		const serialized = serializeJsonLine(response);
+		const parsed = JSON.parse(serialized);
+		assert.equal(parsed.type, "extension_ui_response");
+		assert.equal(parsed.id, "ui-req-123");
+		assert.equal(parsed.value, "test-value");
+	});
+
+	it("sendUIResponse with cancelled flag serializes correctly", () => {
+		const response = {
+			type: "extension_ui_response",
+			id: "ui-req-456",
+			cancelled: true,
+		};
+		const serialized = serializeJsonLine(response);
+		const parsed = JSON.parse(serialized);
+		assert.equal(parsed.type, "extension_ui_response");
+		assert.equal(parsed.cancelled, true);
+	});
+
+	it("sendUIResponse with confirmed flag serializes correctly", () => {
+		const response = {
+			type: "extension_ui_response",
+			id: "ui-req-789",
+			confirmed: true,
+		};
+		const serialized = serializeJsonLine(response);
+		const parsed = JSON.parse(serialized);
+		assert.equal(parsed.confirmed, true);
+	});
+
+	it("sendUIResponse with multiple values serializes correctly", () => {
+		const response = {
+			type: "extension_ui_response",
+			id: "ui-req-multi",
+			values: ["opt-a", "opt-b"],
+		};
+		const serialized = serializeJsonLine(response);
+		const parsed = JSON.parse(serialized);
+		assert.deepEqual(parsed.values, ["opt-a", "opt-b"]);
+	});
+
+	it("prompt command with runId in v2 response", () => {
+		const response = {
+			id: "req_10",
+			type: "response",
+			command: "prompt",
+			success: true,
+			runId: "run-uuid-abc",
+		};
+		const serialized = serializeJsonLine(response);
+		const parsed = JSON.parse(serialized);
+		assert.equal(parsed.runId, "run-uuid-abc");
+		assert.equal(parsed.command, "prompt");
+		assert.equal(parsed.success, true);
+	});
+});
+
+// ============================================================================
+// Client ↔ Mock server integration (PassThrough streams)
+// ============================================================================
+
+describe("Client ↔ Mock server protocol exchange", () => {
+	let clientStdin: PassThrough;
+	let clientStdout: PassThrough;
+
+	beforeEach(() => {
+		const mockProc = createMockProcess();
+		clientStdin = mockProc.clientStdin;
+		clientStdout = mockProc.clientStdout;
+	});
+
+	afterEach(() => {
+		clientStdin.destroy();
+		clientStdout.destroy();
+	});
+
+	it("init handshake: client writes init, server responds with init_result", async () => {
+		// Collect what the client would write
+		const { lines: clientWrites, detach: detachStdin } = collectLines(clientStdin);
+
+		// Client sends init command
+		writeLine(clientStdin, { id: "req_1", type: "init", protocolVersion: 2 });
+		await tick();
+
+		assert.equal(clientWrites.length, 1);
+		const initCmd = clientWrites[0] as any;
+		assert.equal(initCmd.type, "init");
+		assert.equal(initCmd.protocolVersion, 2);
+
+		// Server responds with init_result
+		const initResult: RpcInitResult = {
+			protocolVersion: 2,
+			sessionId: "sess-abc",
+			capabilities: {
+				events: ["execution_complete", "cost_update"],
+				commands: ["init", "shutdown", "subscribe"],
+			},
+		};
+		writeLine(clientStdout, {
+			id: "req_1",
+			type: "response",
+			command: "init",
+			success: true,
+			data: initResult,
+		});
+
+		// Collect server response
+		const { lines: serverResponses, detach: detachStdout } = collectLines(clientStdout);
+		// Already wrote above, but let's verify the shape by re-writing
+		writeLine(clientStdout, {
+			id: "req_verify",
+			type: "response",
+			command: "init",
+			success: true,
+			data: initResult,
+		});
+		await tick();
+
+		const resp = serverResponses[0] as any;
+		assert.equal(resp.type, "response");
+		assert.equal(resp.command, "init");
+		assert.equal(resp.success, true);
+		assert.equal(resp.data.protocolVersion, 2);
+		assert.ok(typeof resp.data.sessionId === "string");
+
+		detachStdin();
+		detachStdout();
+	});
+
+	it("shutdown: client writes shutdown, server acknowledges", async () => {
+		const { lines: clientWrites, detach } = collectLines(clientStdin);
+
+		writeLine(clientStdin, { id: "req_2", type: "shutdown" });
+		await tick();
+
+		const cmd = clientWrites[0] as any;
+		assert.equal(cmd.type, "shutdown");
+
+		detach();
+	});
+
+	it("subscribe: client writes subscribe with event list", async () => {
+		const { lines: clientWrites, detach } = collectLines(clientStdin);
+
+		writeLine(clientStdin, { id: "req_3", type: "subscribe", events: ["agent_end", "execution_complete"] });
+		await tick();
+
+		const cmd = clientWrites[0] as any;
+		assert.equal(cmd.type, "subscribe");
+		assert.deepEqual(cmd.events, ["agent_end", "execution_complete"]);
+
+		detach();
+	});
+
+	it("sendUIResponse: client writes extension_ui_response", async () => {
+		const { lines: clientWrites, detach } = collectLines(clientStdin);
+
+		writeLine(clientStdin, {
+			type: "extension_ui_response",
+			id: "ui-123",
+			value: "selected-option",
+		});
+		await tick();
+
+		const msg = clientWrites[0] as any;
+		assert.equal(msg.type, "extension_ui_response");
+		assert.equal(msg.id, "ui-123");
+		assert.equal(msg.value, "selected-option");
+
+		detach();
+	});
+
+	it("v2 event filtering: subscribe with empty array should filter all", async () => {
+		// An empty event filter means no events pass through (Set with 0 entries)
+		const subscribeCmd = { id: "req_4", type: "subscribe", events: [] as string[] };
+		const serialized = serializeJsonLine(subscribeCmd);
+		const parsed = JSON.parse(serialized);
+		assert.deepEqual(parsed.events, []);
+		// Server-side: `eventFilter = new Set([])` — Set.has(anything) returns false
+		const filter = new Set(parsed.events as string[]);
+		assert.equal(filter.has("agent_end"), false);
+		assert.equal(filter.has("execution_complete"), false);
+		assert.equal(filter.size, 0);
+	});
+
+	it("v2 event filtering: subscribe with wildcard resets filter", async () => {
+		// Server-side: `events.includes("*")` → `eventFilter = null`
+		const subscribeCmd = { type: "subscribe", events: ["*"] };
+		const parsed = JSON.parse(serializeJsonLine(subscribeCmd));
+		const hasWildcard = (parsed.events as string[]).includes("*");
+		assert.equal(hasWildcard, true);
+		// When wildcard is detected, filter becomes null (all events pass)
+	});
+
+	it("multiple commands can be sent sequentially", async () => {
+		const { lines, detach } = collectLines(clientStdin);
+
+		writeLine(clientStdin, { id: "1", type: "init", protocolVersion: 2 });
+		writeLine(clientStdin, { id: "2", type: "subscribe", events: ["agent_end"] });
+		writeLine(clientStdin, { id: "3", type: "prompt", message: "hello" });
+		await tick();
+
+		assert.equal(lines.length, 3);
+		assert.equal((lines[0] as any).type, "init");
+		assert.equal((lines[1] as any).type, "subscribe");
+		assert.equal((lines[2] as any).type, "prompt");
+
+		detach();
+	});
+});
+
+// ============================================================================
+// Negative tests — malformed inputs, error paths, boundary conditions
+// ============================================================================
+
+describe("Negative tests — protocol error shapes", () => {
+	it("init with missing protocolVersion produces a type error at compile time", () => {
+		// Runtime check: a message missing protocolVersion is malformed
+		const malformed = { type: "init" } as any;
+		assert.equal(malformed.protocolVersion, undefined);
+		// Server would treat this as v1 lock since it's not a valid init
+	});
+
+	it("subscribe with non-array events is a type violation", () => {
+		// Runtime: server expects events to be string[]
+		const malformed = { type: "subscribe", events: "agent_end" } as any;
+		assert.equal(typeof malformed.events, "string"); // Not an array
+		assert.equal(Array.isArray(malformed.events), false);
+	});
+
+	it("double init error response shape", () => {
+		// When init is sent after protocol lock, server returns error
+		const errorResp: RpcResponse = {
+			id: "req_dup",
+			type: "response",
+			command: "init",
+			success: false,
+			error: "Protocol version already locked. init must be the first command.",
+		};
+		assert.equal(errorResp.success, false);
+		if (!errorResp.success) {
+			assert.ok(errorResp.error.includes("already locked"));
+		}
+	});
+
+	it("init after v1 lock error response shape", () => {
+		// First command was get_state (v1 lock), then init arrives
+		const errorResp: RpcResponse = {
+			id: "req_late_init",
+			type: "response",
+			command: "init",
+			success: false,
+			error: "Protocol version already locked. init must be the first command.",
+		};
+		assert.equal(errorResp.success, false);
+		if (!errorResp.success) {
+			assert.ok(errorResp.error.includes("init must be the first command"));
+		}
+	});
+
+	it("unknown command type produces error response", () => {
+		const errorResp: RpcResponse = {
+			id: "req_unknown",
+			type: "response",
+			command: "nonexistent",
+			success: false,
+			error: "Unknown command: nonexistent",
+		};
+		assert.equal(errorResp.success, false);
+		if (!errorResp.success) {
+			assert.ok(errorResp.error.includes("Unknown command"));
+		}
+	});
+
+	it("malformed JSON parse error shape", () => {
+		const errorResp: RpcResponse = {
+			type: "response",
+			command: "parse",
+			success: false,
+			error: "Failed to parse command: Unexpected token",
+		};
+		assert.equal(errorResp.command, "parse");
+		assert.equal(errorResp.success, false);
+	});
+
+	it("shutdown works in both v1 and v2 — no version gating", () => {
+		// shutdown returns success regardless of protocolVersion
+		const v1Shutdown: RpcResponse = {
+			id: "s1",
+			type: "response",
+			command: "shutdown",
+			success: true,
+		};
+		const v2Shutdown: RpcResponse = {
+			id: "s2",
+			type: "response",
+			command: "shutdown",
+			success: true,
+		};
+		assert.equal(v1Shutdown.success, true);
+		assert.equal(v2Shutdown.success, true);
+	});
+});
+
+// ============================================================================
+// Protocol version detection logic (unit)
+// ============================================================================
+
+describe("Protocol version detection logic", () => {
+	it("simulates v1 lock when first command is non-init", () => {
+		let protocolVersion: 1 | 2 = 1;
+		let protocolLocked = false;
+
+		// Simulate first command being get_state
+		const command = { type: "get_state" } as RpcCommand;
+
+		if (!protocolLocked) {
+			protocolLocked = true;
+			if (command.type === "init") {
+				protocolVersion = 2;
+			} else {
+				protocolVersion = 1;
+			}
+		}
+
+		assert.equal(protocolVersion, 1);
+		assert.equal(protocolLocked, true);
+	});
+
+	it("simulates v2 lock when first command is init", () => {
+		let protocolVersion: 1 | 2 = 1;
+		let protocolLocked = false;
+
+		const command: RpcCommand = { type: "init", protocolVersion: 2 };
+
+		if (!protocolLocked) {
+			protocolLocked = true;
+			if (command.type === "init") {
+				protocolVersion = 2;
+			} else {
+				protocolVersion = 1;
+			}
+		}
+
+		assert.equal(protocolVersion, 2);
+		assert.equal(protocolLocked, true);
+	});
+
+	it("rejects re-init after v2 lock", () => {
+		let protocolLocked = true; // already locked from first init
+		let errorMessage: string | null = null;
+
+		const command: RpcCommand = { type: "init", protocolVersion: 2 };
+
+		if (protocolLocked && command.type === "init") {
+			errorMessage = "Protocol version already locked. init must be the first command.";
+		}
+
+		assert.ok(errorMessage !== null);
+		assert.ok(errorMessage!.includes("already locked"));
+	});
+
+	it("rejects init after v1 lock", () => {
+		let protocolLocked = true; // already locked from first non-init command
+		let protocolVersion: 1 | 2 = 1;
+		let errorMessage: string | null = null;
+
+		const command: RpcCommand = { type: "init", protocolVersion: 2 };
+
+		if (protocolLocked && command.type === "init") {
+			errorMessage = "Protocol version already locked. init must be the first command.";
+		}
+
+		assert.equal(protocolVersion, 1); // stays v1
+		assert.ok(errorMessage !== null);
+	});
+
+	it("extension_ui_response bypasses protocol detection", () => {
+		let protocolLocked = false;
+		let protocolDetectionTriggered = false;
+
+		// Simulate the handleInputLine logic
+		const parsed = { type: "extension_ui_response", id: "ui-1", value: "ok" };
+
+		if (parsed.type === "extension_ui_response") {
+			// Bypass — do not touch protocolLocked
+		} else {
+			protocolDetectionTriggered = true;
+			if (!protocolLocked) {
+				protocolLocked = true;
+			}
+		}
+
+		assert.equal(protocolLocked, false);
+		assert.equal(protocolDetectionTriggered, false);
+	});
+});
+
+// ============================================================================
+// v2 event filter logic (unit)
+// ============================================================================
+
+describe("v2 event filter logic", () => {
+	/** Mimics the server-side event filter check: null means all events pass */
+	function shouldEmit(filter: Set<string> | null, eventType: string): boolean {
+		return !filter || filter.has(eventType);
+	}
+
+	it("null filter passes all events", () => {
+		assert.equal(shouldEmit(null, "agent_end"), true);
+		assert.equal(shouldEmit(null, "cost_update"), true);
+		assert.equal(shouldEmit(null, "anything"), true);
+	});
+
+	it("filter with specific events passes matching events", () => {
+		const filter = new Set(["agent_end", "cost_update"]);
+
+		assert.equal(shouldEmit(filter, "agent_end"), true);
+		assert.equal(shouldEmit(filter, "cost_update"), true);
+		assert.equal(shouldEmit(filter, "execution_complete"), false);
+		assert.equal(shouldEmit(filter, "message_start"), false);
+	});
+
+	it("empty Set filter blocks all events", () => {
+		const filter = new Set<string>();
+
+		assert.equal(shouldEmit(filter, "agent_end"), false);
+		assert.equal(shouldEmit(filter, "cost_update"), false);
+		assert.equal(shouldEmit(filter, "anything"), false);
+		assert.equal(filter.size, 0);
+	});
+
+	it("wildcard subscribe resets filter to null", () => {
+		let eventFilter: Set<string> | null = new Set(["agent_end"]);
+
+		// Simulate subscribe with wildcard
+		const events = ["*"];
+		if (events.includes("*")) {
+			eventFilter = null;
+		} else {
+			eventFilter = new Set(events);
+		}
+
+		assert.equal(eventFilter, null);
+	});
+
+	it("subscribe replaces previous filter", () => {
+		let eventFilter: Set<string> | null = new Set(["agent_end"]);
+
+		// Subscribe with different events
+		const events = ["cost_update", "execution_complete"];
+		if (events.includes("*")) {
+			eventFilter = null;
+		} else {
+			eventFilter = new Set(events);
+		}
+
+		assert.equal(eventFilter!.has("agent_end"), false);
+		assert.equal(eventFilter!.has("cost_update"), true);
+		assert.equal(eventFilter!.has("execution_complete"), true);
+	});
+
+	it("filter applies to both regular and synthesized v2 events", () => {
+		const eventFilter = new Set(["execution_complete"]);
+
+		// Regular event
+		assert.equal(eventFilter.has("agent_end"), false); // filtered out
+		// Synthesized v2 event
+		assert.equal(eventFilter.has("execution_complete"), true); // passes
+		assert.equal(eventFilter.has("cost_update"), false); // filtered out
+	});
+});
+
+// ============================================================================
+// v2 runId injection logic (unit)
+// ============================================================================
+
+describe("v2 runId injection", () => {
+	it("runId is present when protocolVersion is 2 and command is prompt/steer/follow_up", () => {
+		const protocolVersion = 2;
+		const commands = ["prompt", "steer", "follow_up"] as const;
+
+		for (const cmdType of commands) {
+			const runId = protocolVersion === 2 ? `run-${cmdType}-uuid` : undefined;
+			assert.ok(runId !== undefined, `runId should be generated for ${cmdType} in v2`);
+			assert.ok(typeof runId === "string");
+		}
+	});
+
+	it("runId is undefined when protocolVersion is 1", () => {
+		// Test the v1 path: runId should not be generated
+		function generateRunId(version: 1 | 2): string | undefined {
+			return version === 2 ? "run-uuid" : undefined;
+		}
+		assert.equal(generateRunId(1), undefined);
+		assert.ok(typeof generateRunId(2) === "string");
+	});
+
+	it("runId is injected into event output via spread", () => {
+		const currentRunId = "run-abc-123";
+		const event = { type: "message_start", message: { role: "assistant" } };
+
+		// v2 injection logic from rpc-mode.ts
+		const outputEvent = currentRunId ? { ...event, runId: currentRunId } : event;
+
+		assert.equal((outputEvent as any).runId, "run-abc-123");
+		assert.equal((outputEvent as any).type, "message_start");
+	});
+
+	it("runId is not injected when null", () => {
+		const currentRunId: string | null = null;
+		const event = { type: "message_start", message: { role: "assistant" } };
+
+		const outputEvent = currentRunId ? { ...event, runId: currentRunId } : event;
+
+		assert.equal((outputEvent as any).runId, undefined);
+	});
+});
--- a/packages/pi-coding-agent/src/modes/rpc/rpc-types.ts
+++ b/packages/pi-coding-agent/src/modes/rpc/rpc-types.ts
@ -11,6 +11,13 @@ import type { SessionStats } from "../../core/agent-session.js";
 import type { BashResult } from "../../core/bash-executor.js";
 import type { CompactionResult } from "../../core/compaction/index.js";

+// ============================================================================
+// RPC Protocol Versioning
+// ============================================================================
+
+/** Supported protocol versions. v1 is the implicit default; v2 requires an init handshake. */
+export type RpcProtocolVersion = 1 | 2;
+
 // ============================================================================
 // RPC Commands (stdin)
 // ============================================================================
@ -69,7 +76,12 @@ export type RpcCommand =
 	// Bridge-hosted native terminal
 	| { id?: string; type: "terminal_input"; data: string }
 	| { id?: string; type: "terminal_resize"; cols: number; rows: number }
-	| { id?: string; type: "terminal_redraw" };
+	| { id?: string; type: "terminal_redraw" }
+
+	// v2 Protocol
+	| { id?: string; type: "init"; protocolVersion: 2; clientId?: string }
+	| { id?: string; type: "shutdown"; graceful?: boolean }
+	| { id?: string; type: "subscribe"; events: string[] };

 // ============================================================================
 // RPC Slash Command (for get_commands response)
@ -120,9 +132,9 @@ export interface RpcSessionState {
 // Success responses with data
 export type RpcResponse =
 	// Prompting (async - events follow)
-	| { id?: string; type: "response"; command: "prompt"; success: true }
-	| { id?: string; type: "response"; command: "steer"; success: true }
-	| { id?: string; type: "response"; command: "follow_up"; success: true }
+	| { id?: string; type: "response"; command: "prompt"; success: true; runId?: string }
+	| { id?: string; type: "response"; command: "steer"; success: true; runId?: string }
+	| { id?: string; type: "response"; command: "follow_up"; success: true; runId?: string }
 	| { id?: string; type: "response"; command: "abort"; success: true }
 	| { id?: string; type: "response"; command: "new_session"; success: true; data: { cancelled: boolean } }

@ -216,9 +228,54 @@ export type RpcResponse =
 	| { id?: string; type: "response"; command: "terminal_resize"; success: true }
 	| { id?: string; type: "response"; command: "terminal_redraw"; success: true }

+	// v2 Protocol
+	| { id?: string; type: "response"; command: "init"; success: true; data: RpcInitResult }
+	| { id?: string; type: "response"; command: "shutdown"; success: true }
+	| { id?: string; type: "response"; command: "subscribe"; success: true }
+
 	// Error response (any command can fail)
 	| { id?: string; type: "response"; command: string; success: false; error: string };

+// ============================================================================
+// v2 Protocol Types
+// ============================================================================
+
+/** Result of the init handshake (v2 only) */
+export interface RpcInitResult {
+	protocolVersion: 2;
+	sessionId: string;
+	capabilities: {
+		events: string[];
+		commands: string[];
+	};
+}
+
+/** v2 execution_complete event — emitted when a prompt/steer/follow_up finishes */
+export interface RpcExecutionCompleteEvent {
+	type: "execution_complete";
+	runId: string;
+	status: "completed" | "error" | "cancelled";
+	reason?: string;
+	stats: SessionStats;
+}
+
+/** v2 cost_update event — emitted per-turn with running cost data */
+export interface RpcCostUpdateEvent {
+	type: "cost_update";
+	runId: string;
+	turnCost: number;
+	cumulativeCost: number;
+	tokens: {
+		input: number;
+		output: number;
+		cacheRead: number;
+		cacheWrite: number;
+	};
+}
+
+/** Discriminated union of all v2-only event types */
+export type RpcV2Event = RpcExecutionCompleteEvent | RpcCostUpdateEvent;
+
 // ============================================================================
 // Extension UI Events (stdout)
 // ============================================================================
--- a/packages/pi-coding-agent/src/utils/shell.ts
+++ b/packages/pi-coding-agent/src/utils/shell.ts
@ -192,7 +192,6 @@ export function killProcessTree(pid: number): void {
 		try {
 			spawn("taskkill", ["/F", "/T", "/PID", String(pid)], {
 				stdio: "ignore",
-				detached: true,
 			});
 		} catch {
 			// Ignore errors if taskkill fails
--- a/src/headless-events.ts
+++ b/src/headless-events.ts
@ -3,8 +3,47 @@
 *
 * Detects terminal notifications, blocked notifications, milestone-ready signals,
 * and classifies commands as quick (single-turn) vs long-running.
+ *
+ * Also defines exit code constants and the status→exit-code mapping function.
 */

+// ---------------------------------------------------------------------------
+// Exit Code Constants
+// ---------------------------------------------------------------------------
+
+export const EXIT_SUCCESS = 0
+export const EXIT_ERROR = 1
+export const EXIT_BLOCKED = 10
+export const EXIT_CANCELLED = 11
+
+/**
+ * Map a headless session status string to its standardized exit code.
+ *
+ *   success   → 0
+ *   error     → 1
+ *   timeout   → 1
+ *   blocked   → 10
+ *   cancelled → 11
+ *
+ * Unknown statuses default to EXIT_ERROR (1).
+ */
+export function mapStatusToExitCode(status: string): number {
+  switch (status) {
+    case 'success':
+    case 'complete':
+      return EXIT_SUCCESS
+    case 'error':
+    case 'timeout':
+      return EXIT_ERROR
+    case 'blocked':
+      return EXIT_BLOCKED
+    case 'cancelled':
+      return EXIT_CANCELLED
+    default:
+      return EXIT_ERROR
+  }
+}
+
 // ---------------------------------------------------------------------------
 // Completion Detection
 // ---------------------------------------------------------------------------
--- a/src/headless-types.ts
+++ b/src/headless-types.ts
@ -0,0 +1,39 @@
+/**
+ * Headless Types — shared types for the headless orchestrator surface.
+ *
+ * Contains the structured result type emitted in --output-format json mode
+ * and the output format discriminator.
+ */
+
+// ---------------------------------------------------------------------------
+// Output Format
+// ---------------------------------------------------------------------------
+
+export type OutputFormat = 'text' | 'json' | 'stream-json'
+
+export const VALID_OUTPUT_FORMATS: ReadonlySet<string> = new Set(['text', 'json', 'stream-json'])
+
+// ---------------------------------------------------------------------------
+// Structured JSON Result
+// ---------------------------------------------------------------------------
+
+export interface HeadlessJsonResult {
+  status: 'success' | 'error' | 'blocked' | 'cancelled' | 'timeout'
+  exitCode: number
+  sessionId?: string
+  duration: number
+  cost: {
+    total: number
+    input_tokens: number
+    output_tokens: number
+    cache_read_tokens: number
+    cache_write_tokens: number
+  }
+  toolCalls: number
+  events: number
+  milestone?: string
+  phase?: string
+  nextAction?: string
+  artifacts?: string[]
+  commits?: string[]
+}
--- a/src/headless.ts
+++ b/src/headless.ts
@ -6,9 +6,10 @@
 * progress to stderr.
 *
 * Exit codes:
- *   0 — complete (command finished successfully)
- *   1 — error or timeout
- *   2 — blocked (command reported a blocker)
+ *   0  — complete (command finished successfully)
+ *   1  — error or timeout
+ *   10 — blocked (command reported a blocker)
+ *   11 — cancelled (SIGINT/SIGTERM received)
 */

 import { existsSync, mkdirSync, writeFileSync } from 'node:fs'
@ -27,8 +28,16 @@ import {
  FIRE_AND_FORGET_METHODS,
  IDLE_TIMEOUT_MS,
  NEW_MILESTONE_IDLE_TIMEOUT_MS,
+  EXIT_SUCCESS,
+  EXIT_ERROR,
+  EXIT_BLOCKED,
+  EXIT_CANCELLED,
+  mapStatusToExitCode,
 } from './headless-events.js'

+import type { OutputFormat } from './headless-types.js'
+import { VALID_OUTPUT_FORMATS } from './headless-types.js'
+
 import {
  handleExtensionUIRequest,
  formatProgress,
@ -48,6 +57,7 @@ import {
 export interface HeadlessOptions {
  timeout: number
  json: boolean
+  outputFormat: OutputFormat
  model?: string
  command: string
  commandArgs: string[]
@ -60,6 +70,8 @@ export interface HeadlessOptions {
  responseTimeout?: number // timeout for orchestrator response (default 30000ms)
  answers?: string       // path to answers JSON file
  eventFilter?: Set<string>  // filter JSONL output to specific event types
+  resumeSession?: string // session ID to resume (--resume <id>)
+  bare?: boolean         // --bare: suppress CLAUDE.md/AGENTS.md, user skills, project preferences
 }

 interface TrackedEvent {
@ -76,6 +88,7 @@ export function parseHeadlessArgs(argv: string[]): HeadlessOptions {
  const options: HeadlessOptions = {
    timeout: 300_000,
    json: false,
+    outputFormat: 'text',
    command: 'auto',
    commandArgs: [],
  }
@ -96,6 +109,17 @@ export function parseHeadlessArgs(argv: string[]): HeadlessOptions {
        }
      } else if (arg === '--json') {
        options.json = true
+        options.outputFormat = 'stream-json'
+      } else if (arg === '--output-format' && i + 1 < args.length) {
+        const fmt = args[++i]
+        if (!VALID_OUTPUT_FORMATS.has(fmt)) {
+          process.stderr.write(`[headless] Error: --output-format must be one of: text, json, stream-json (got '${fmt}')\n`)
+          process.exit(1)
+        }
+        options.outputFormat = fmt as OutputFormat
+        if (fmt === 'stream-json' || fmt === 'json') {
+          options.json = true
+        }
      } else if (arg === '--model' && i + 1 < args.length) {
        // --model can also be passed from the main CLI; headless-specific takes precedence
        options.model = args[++i]
@ -118,15 +142,25 @@ export function parseHeadlessArgs(argv: string[]): HeadlessOptions {
      } else if (arg === '--events' && i + 1 < args.length) {
        options.eventFilter = new Set(args[++i].split(','))
        options.json = true  // --events implies --json
+        if (options.outputFormat === 'text') {
+          options.outputFormat = 'stream-json'
+        }
      } else if (arg === '--supervised') {
        options.supervised = true
        options.json = true  // supervised implies json
+        if (options.outputFormat === 'text') {
+          options.outputFormat = 'stream-json'
+        }
      } else if (arg === '--response-timeout' && i + 1 < args.length) {
        options.responseTimeout = parseInt(args[++i], 10)
        if (Number.isNaN(options.responseTimeout) || options.responseTimeout <= 0) {
          process.stderr.write('[headless] Error: --response-timeout must be a positive integer (milliseconds)\n')
          process.exit(1)
        }
+      } else if (arg === '--resume' && i + 1 < args.length) {
+        options.resumeSession = args[++i]
+      } else if (arg === '--bare') {
+        options.bare = true
      }
    } else if (!positionalStarted) {
      positionalStarted = true
@ -151,7 +185,7 @@ export async function runHeadless(options: HeadlessOptions): Promise<void> {
    const result = await runHeadlessOnce(options, restartCount)

    // Success or blocked — exit normally
-    if (result.exitCode === 0 || result.exitCode === 2) {
+    if (result.exitCode === EXIT_SUCCESS || result.exitCode === EXIT_BLOCKED) {
      process.exit(result.exitCode)
    }

@ -275,6 +309,10 @@ async function runHeadlessOnce(options: HeadlessOptions, restartCount: number):
  if (injector) {
    clientOptions.env = injector.getSecretEnvVars()
  }
+  // Propagate --bare to the child process
+  if (options.bare) {
+    clientOptions.args = [...((clientOptions.args as string[]) || []), '--bare']
+  }

  const client = new RpcClient(clientOptions)

@ -349,7 +387,7 @@ async function runHeadlessOnce(options: HeadlessOptions, restartCount: number):
  const timeoutTimer = options.timeout > 0
    ? setTimeout(() => {
        process.stderr.write(`[headless] Timeout after ${options.timeout / 1000}s\n`)
-        exitCode = 1
+        exitCode = EXIT_ERROR
        resolveCompletion()
      }, options.timeout)
    : null
@ -395,7 +433,7 @@ async function runHeadlessOnce(options: HeadlessOptions, restartCount: number):
      if (injector && !FIRE_AND_FORGET_METHODS.has(String(eventObj.method ?? ''))) {
        if (injector.tryHandle(eventObj, stdinWriter)) {
          if (completed) {
-            exitCode = blocked ? 2 : 0
+            exitCode = blocked ? EXIT_BLOCKED : EXIT_SUCCESS
            resolveCompletion()
          }
          return
@ -421,7 +459,7 @@ async function runHeadlessOnce(options: HeadlessOptions, restartCount: number):

      // If we detected a terminal notification, resolve after responding
      if (completed) {
-        exitCode = blocked ? 2 : 0
+        exitCode = blocked ? EXIT_BLOCKED : EXIT_SUCCESS
        resolveCompletion()
        return
      }
@ -442,7 +480,7 @@ async function runHeadlessOnce(options: HeadlessOptions, restartCount: number):
  const signalHandler = () => {
    process.stderr.write('\n[headless] Interrupted, stopping child process...\n')
    interrupted = true
-    exitCode = 1
+    exitCode = EXIT_CANCELLED
    client.stop().finally(() => {
      if (timeoutTimer) clearTimeout(timeoutTimer)
      if (idleTimer) clearTimeout(idleTimer)
@ -492,10 +530,9 @@ async function runHeadlessOnce(options: HeadlessOptions, restartCount: number):
    if (!completed) {
      const msg = `[headless] Child process exited unexpectedly with code ${code ?? 'null'}\n`
      process.stderr.write(msg)
-      exitCode = 1
+      exitCode = EXIT_ERROR
      resolveCompletion()
-    }
-  })
+    }  })

  if (!options.json) {
    process.stderr.write(`[headless] Running /gsd ${options.command}${options.commandArgs.length > 0 ? ' ' + options.commandArgs.join(' ') : ''}...\n`)
@ -507,16 +544,16 @@ async function runHeadlessOnce(options: HeadlessOptions, restartCount: number):
    await client.prompt(command)
  } catch (err) {
    process.stderr.write(`[headless] Error: Failed to send prompt: ${err instanceof Error ? err.message : String(err)}\n`)
-    exitCode = 1
+    exitCode = EXIT_ERROR
  }

  // Wait for completion
-  if (exitCode === 0 || exitCode === 2) {
+  if (exitCode === EXIT_SUCCESS || exitCode === EXIT_BLOCKED) {
    await completionPromise
  }

  // Auto-mode chaining: if --auto and milestone creation succeeded, send /gsd auto
-  if (isNewMilestone && options.auto && milestoneReady && !blocked && exitCode === 0) {
+  if (isNewMilestone && options.auto && milestoneReady && !blocked && exitCode === EXIT_SUCCESS) {
    if (!options.json) {
      process.stderr.write('[headless] Milestone ready — chaining into auto-mode...\n')
    }
@ -535,10 +572,10 @@ async function runHeadlessOnce(options: HeadlessOptions, restartCount: number):
      await client.prompt('/gsd auto')
    } catch (err) {
      process.stderr.write(`[headless] Error: Failed to start auto-mode: ${err instanceof Error ? err.message : String(err)}\n`)
-      exitCode = 1
+      exitCode = EXIT_ERROR
    }

-    if (exitCode === 0 || exitCode === 2) {
+    if (exitCode === EXIT_SUCCESS || exitCode === EXIT_BLOCKED) {
      await autoCompletionPromise
    }
  }
@ -557,7 +594,7 @@ async function runHeadlessOnce(options: HeadlessOptions, restartCount: number):

  // Summary
  const duration = ((Date.now() - startTime) / 1000).toFixed(1)
-  const status = blocked ? 'blocked' : exitCode === 1 ? (totalEvents === 0 ? 'error' : 'timeout') : 'complete'
+  const status = blocked ? 'blocked' : exitCode === EXIT_CANCELLED ? 'cancelled' : exitCode === EXIT_ERROR ? (totalEvents === 0 ? 'error' : 'timeout') : 'complete'

  process.stderr.write(`[headless] Status: ${status}\n`)
  process.stderr.write(`[headless] Duration: ${duration}s\n`)
--- a/src/help-text.ts
+++ b/src/help-text.ts
@ -94,9 +94,12 @@ const SUBCOMMAND_HELP: Record<string, string> = {
    'Run /gsd commands without the TUI. Default command: auto',
    '',
    'Flags:',
-    '  --timeout N          Overall timeout in ms (default: 300000)',
-    '  --json               JSONL event stream to stdout',
-    '  --model ID           Override model',
+    '  --timeout N            Overall timeout in ms (default: 300000)',
+    '  --json                 JSONL event stream to stdout (alias for --output-format stream-json)',
+    '  --output-format <fmt>  Output format: text (default), json (structured result), stream-json (JSONL events)',
+    '  --bare                 Minimal context: skip CLAUDE.md, AGENTS.md, user settings, user skills',
+    '  --resume <id>          Resume a prior headless session by ID',
+    '  --model ID             Override model',
    '  --supervised           Forward interactive UI requests to orchestrator via stdout/stdin',
    '  --response-timeout N   Timeout (ms) for orchestrator response (default: 30000)',
    '  --answers <path>       Pre-supply answers and secrets (JSON file)',
@ -115,11 +118,19 @@ const SUBCOMMAND_HELP: Record<string, string> = {
    '  --auto               Start auto-mode after milestone creation',
    '  --verbose            Show tool calls in progress output',
    '',
+    'Output formats:',
+    '  text         Human-readable progress on stderr (default)',
+    '  json         Collect events silently, emit structured HeadlessJsonResult on stdout at exit',
+    '  stream-json  Stream JSONL events to stdout in real time (same as --json)',
+    '',
    'Examples:',
    '  gsd headless                                    Run /gsd auto',
    '  gsd headless next                               Run one unit',
-    '  gsd headless --json status                      Machine-readable status',
+    '  gsd headless --output-format json auto           Structured JSON result on stdout',
+    '  gsd headless --json status                      Machine-readable JSONL stream',
    '  gsd headless --timeout 60000                    With 1-minute timeout',
+    '  gsd headless --bare auto                        Minimal context (CI/ecosystem use)',
+    '  gsd headless --resume abc123 auto               Resume a prior session',
    '  gsd headless new-milestone --context spec.md    Create milestone from file',
    '  cat spec.md | gsd headless new-milestone --context -   From stdin',
    '  gsd headless new-milestone --context spec.md --auto    Create + auto-execute',
@ -128,7 +139,7 @@ const SUBCOMMAND_HELP: Record<string, string> = {
    '  gsd headless --events agent_end,extension_ui_request auto   Filtered event stream',
    '  gsd headless query                              Instant JSON state snapshot',
    '',
-    'Exit codes: 0 = complete, 1 = error/timeout, 2 = blocked',
+    'Exit codes: 0 = success, 1 = error/timeout, 10 = blocked, 11 = cancelled',
  ].join('\n'),
 }

--- a/src/onboarding.ts
+++ b/src/onboarding.ts
@ -669,10 +669,12 @@ async function runRemoteQuestionsStep(
  pc: PicoModule,
  authStorage: AuthStorage,
 ): Promise<string | null> {
-  // Check existing config
-  const hasDiscord = authStorage.has('discord_bot') && !!(authStorage.get('discord_bot') as any)?.key
-  const hasSlack = authStorage.has('slack_bot') && !!(authStorage.get('slack_bot') as any)?.key
-  const hasTelegram = authStorage.has('telegram_bot') && !!(authStorage.get('telegram_bot') as any)?.key
+  // Check existing config — use getCredentialsForProvider to skip empty-key entries
+  const hasValidKey = (provider: string) =>
+    authStorage.getCredentialsForProvider(provider).some((c: any) => c.type === 'api_key' && c.key)
+  const hasDiscord = hasValidKey('discord_bot')
+  const hasSlack = hasValidKey('slack_bot')
+  const hasTelegram = hasValidKey('telegram_bot')
  const existingChannel = hasDiscord ? 'Discord' : hasSlack ? 'Slack' : hasTelegram ? 'Telegram' : null

  type RemoteOption = { value: string; label: string; hint?: string }
--- a/src/remote-questions-config.ts
+++ b/src/remote-questions-config.ts
@ -16,7 +16,7 @@ import { appRoot } from "./app-paths.js";
 // boundary — this file is compiled by tsc, but preferences.ts is loaded
 // via jiti at runtime. Importing it as .js fails because no .js exists
 // in dist/. See #592, #1110.
-const GLOBAL_PREFERENCES_PATH = join(appRoot, "preferences.md");
+const GLOBAL_PREFERENCES_PATH = join(appRoot, "PREFERENCES.md");

 export function saveRemoteQuestionsConfig(channel: "slack" | "discord" | "telegram", channelId: string): void {
  const prefsPath = GLOBAL_PREFERENCES_PATH;
--- a/src/resources/extensions/async-jobs/async-bash-tool.ts
+++ b/src/resources/extensions/async-jobs/async-bash-tool.ts
@ -14,7 +14,7 @@ import {
 	DEFAULT_MAX_LINES,
 } from "@gsd/pi-coding-agent";
 import { Type } from "@sinclair/typebox";
-import { spawn } from "node:child_process";
+import { spawn, spawnSync } from "node:child_process";
 import { createWriteStream } from "node:fs";
 import { tmpdir } from "node:os";
 import { join } from "node:path";
@ -38,17 +38,24 @@ function getTempFilePath(): string {
 }

 /**
- * Kill a process and its children. Uses process group kill on Unix.
+ * Kill a process and its children (cross-platform).
+ * Uses process group kill on Unix; taskkill /F /T on Windows.
 */
 function killTree(pid: number): void {
-	try {
-		// Kill the process group (negative PID)
-		process.kill(-pid, "SIGTERM");
-	} catch {
+	if (process.platform === "win32") {
 		try {
-			process.kill(pid, "SIGTERM");
+			spawnSync("taskkill", ["/F", "/T", "/PID", String(pid)], {
+				timeout: 5_000,
+				stdio: "ignore",
+			});
 		} catch {
-			// Already exited
+			try { process.kill(pid, "SIGTERM"); } catch { /* already exited */ }
+		}
+	} else {
+		try {
+			process.kill(-pid, "SIGTERM");
+		} catch {
+			try { process.kill(pid, "SIGTERM"); } catch { /* already exited */ }
 		}
 	}
 }
@ -118,9 +125,13 @@ function executeBashInBackground(
 		const rewrittenCommand = rewriteCommandWithRtk(command);
 		const resolvedCommand = sanitizeCommand(rewrittenCommand);

+		// On Windows, detached: true sets CREATE_NEW_PROCESS_GROUP which can
+		// cause EINVAL in VSCode/ConPTY terminal contexts.  The bg-shell
+		// extension already guards this (process-manager.ts); align here.
+		// Process-tree cleanup uses taskkill /F /T on Windows regardless.
 		const child = spawn(shell, [...args, resolvedCommand], {
 			cwd,
-			detached: true,
+			detached: process.platform !== "win32",
 			env: { ...process.env },
 			stdio: ["ignore", "pipe", "pipe"],
 		});
@ -143,8 +154,8 @@ function executeBashInBackground(
 				// If the process ignores SIGTERM, escalate to SIGKILL
 				sigkillHandle = setTimeout(() => {
 					if (child.pid) {
-						try { process.kill(-child.pid, "SIGKILL"); } catch { /* ignore */ }
-						try { process.kill(child.pid, "SIGKILL"); } catch { /* ignore */ }
+						// killTree already uses taskkill /F /T on Windows
+						killTree(child.pid);
 					}

 					// Hard deadline: if even SIGKILL doesn't trigger 'close',
--- a/src/resources/extensions/claude-code-cli/stream-adapter.ts
+++ b/src/resources/extensions/claude-code-cli/stream-adapter.ts
@ -113,6 +113,20 @@ function makeErrorMessage(model: string, errorMsg: string): AssistantMessage {
 	};
 }

+/**
+ * Generator exhaustion without a terminal result means the SDK stream was
+ * interrupted mid-turn. Surface it as an error so downstream recovery logic
+ * can classify and retry it instead of treating it as a clean completion.
+ */
+export function makeStreamExhaustedErrorMessage(model: string, lastTextContent: string): AssistantMessage {
+	const errorMsg = "stream_exhausted_without_result";
+	const message = makeErrorMessage(model, errorMsg);
+	if (lastTextContent) {
+		message.content = [{ type: "text", text: lastTextContent }];
+	}
+	return message;
+}
+
 // ---------------------------------------------------------------------------
 // streamSimple implementation
 // ---------------------------------------------------------------------------
@ -339,26 +353,11 @@ async function pumpSdkMessages(
 			}
 		}

-		// Generator exhausted without a result message (unexpected)
-		const fallbackContent: AssistantMessage["content"] = [];
-		if (lastTextContent) {
-			fallbackContent.push({ type: "text", text: lastTextContent });
-		}
-		if (fallbackContent.length === 0) {
-			fallbackContent.push({ type: "text", text: "(Claude Code session ended without a response)" });
-		}
-
-		const fallback: AssistantMessage = {
-			role: "assistant",
-			content: fallbackContent,
-			api: "anthropic-messages",
-			provider: "claude-code",
-			model: modelId,
-			usage: { ...ZERO_USAGE },
-			stopReason: "stop",
-			timestamp: Date.now(),
-		};
-		stream.push({ type: "done", reason: "stop", message: fallback });
+		// Generator exhaustion without a terminal result is a stream interruption,
+		// not a successful completion. Emitting an error lets GSD classify it as a
+		// transient provider failure instead of advancing auto-mode state.
+		const fallback = makeStreamExhaustedErrorMessage(modelId, lastTextContent);
+		stream.push({ type: "error", reason: "error", error: fallback });
 	} catch (err) {
 		const errorMsg = err instanceof Error ? err.message : String(err);
 		stream.push({
--- a/src/resources/extensions/claude-code-cli/tests/stream-adapter.test.ts
+++ b/src/resources/extensions/claude-code-cli/tests/stream-adapter.test.ts
@ -0,0 +1,21 @@
+import { describe, test } from "node:test";
+import assert from "node:assert/strict";
+import { makeStreamExhaustedErrorMessage } from "../stream-adapter.ts";
+
+describe("stream-adapter — exhausted stream fallback (#2575)", () => {
+	test("generator exhaustion becomes an error message instead of clean completion", () => {
+		const message = makeStreamExhaustedErrorMessage("claude-sonnet-4-20250514", "partial answer");
+
+		assert.equal(message.stopReason, "error");
+		assert.equal(message.errorMessage, "stream_exhausted_without_result");
+		assert.deepEqual(message.content, [{ type: "text", text: "partial answer" }]);
+	});
+
+	test("generator exhaustion without prior text still exposes a classifiable error", () => {
+		const message = makeStreamExhaustedErrorMessage("claude-sonnet-4-20250514", "");
+
+		assert.equal(message.stopReason, "error");
+		assert.equal(message.errorMessage, "stream_exhausted_without_result");
+		assert.match(String((message.content[0] as any)?.text ?? ""), /Claude Code error: stream_exhausted_without_result/);
+	});
+});
--- a/src/resources/extensions/gsd/auto-dispatch.ts
+++ b/src/resources/extensions/gsd/auto-dispatch.ts
@ -626,6 +626,25 @@ export const DISPATCH_RULES: DispatchRule[] = [
    match: async ({ state, mid, midTitle, basePath }) => {
      if (state.phase !== "completing-milestone") return null;

+      // Safety guard (#2675): block completion when VALIDATION verdict is
+      // needs-remediation. The state machine treats needs-remediation as
+      // terminal (to prevent validate-milestone loops per #832), but
+      // completing-milestone should NOT proceed — remediation work is needed.
+      const validationFile = resolveMilestoneFile(basePath, mid, "VALIDATION");
+      if (validationFile) {
+        const validationContent = await loadFile(validationFile);
+        if (validationContent) {
+          const verdict = extractVerdict(validationContent);
+          if (verdict === "needs-remediation") {
+            return {
+              action: "stop",
+              reason: `Cannot complete milestone ${mid}: VALIDATION verdict is "needs-remediation". Address the remediation findings and re-run validation, or update the verdict manually.`,
+              level: "warning",
+            };
+          }
+        }
+      }
+
      // Safety guard (#1368): verify all roadmap slices have SUMMARY files.
      const missingSlices = findMissingSummaries(basePath, mid);
      if (missingSlices.length > 0) {
--- a/src/resources/extensions/gsd/auto-start.ts
+++ b/src/resources/extensions/gsd/auto-start.ts
@ -67,6 +67,7 @@ import {
  getDebugLogPath,
 } from "./debug-logger.js";
 import { parseUnitId } from "./unit-id.js";
+import { setLogBasePath } from "./workflow-logger.js";
 import type { AutoSession } from "./auto/session.js";
 import {
  existsSync,
@ -461,6 +462,7 @@ export async function bootstrapAutoSession(
    s.verbose = verboseMode;
    s.cmdCtx = ctx;
    s.basePath = base;
+    setLogBasePath(base);
    s.unitDispatchCount.clear();
    s.unitRecoveryCount.clear();
    s.lastBudgetAlertLevel = 0;
--- a/src/resources/extensions/gsd/auto-timers.ts
+++ b/src/resources/extensions/gsd/auto-timers.ts
@ -15,6 +15,7 @@ import { computeBudgets, resolveExecutorContextWindow } from "./context-budget.j
 import {
  getInFlightToolCount,
  getOldestInFlightToolStart,
+  clearInFlightTools,
 } from "./auto-tool-tracking.js";
 import { detectWorkingTreeActivity } from "./auto-supervisor.js";
 import { closeoutUnit, type CloseoutOptions } from "./auto-unit-closeout.js";
@ -146,6 +147,7 @@ export function startUnitSupervision(sctx: SupervisionContext): void {

      // Agent has tool calls currently executing — not idle, just waiting.
      // But only suppress recovery if the tool started recently.
+      let stalledToolDetected = false;
      if (getInFlightToolCount() > 0) {
        const oldestStart = getOldestInFlightToolStart()!;
        const toolAgeMs = Date.now() - oldestStart;
@ -156,6 +158,12 @@ export function startUnitSupervision(sctx: SupervisionContext): void {
          });
          return;
        }
+        // Tool has been in-flight longer than idle timeout — treat as hung.
+        // Clear the stale entries so subsequent ticks don't re-detect them,
+        // and set the flag so the filesystem-activity check below does not
+        // override the stall verdict (#2527).
+        stalledToolDetected = true;
+        clearInFlightTools();
        ctx.ui.notify(
          `Stalled tool detected: a tool has been in-flight for ${Math.round(toolAgeMs / 60000)}min. Treating as hung — attempting idle recovery.`,
          "warning",
@ -163,7 +171,9 @@ export function startUnitSupervision(sctx: SupervisionContext): void {
      }

      // Check if the agent is producing work on disk.
-      if (detectWorkingTreeActivity(s.basePath)) {
+      // Skip this when a stalled tool was just detected — filesystem changes
+      // from earlier in the task should not override the stall verdict (#2527).
+      if (!stalledToolDetected && detectWorkingTreeActivity(s.basePath)) {
        writeUnitRuntimeRecord(s.basePath, unitType, unitId, s.currentUnit.startedAt, {
          lastProgressAt: Date.now(),
          lastProgressKind: "filesystem-activity",
@ -180,6 +190,10 @@ export function startUnitSupervision(sctx: SupervisionContext): void {
      const recovery = await recoverTimedOutUnit(ctx, pi, unitType, unitId, "idle", buildRecoveryContext());
      if (recovery === "recovered") return;

+      // Guard: recoverTimedOutUnit is async — pauseAuto/stopAuto may have
+      // set s.currentUnit = null during the await (#2527).
+      if (!s.currentUnit) return;
+
      writeUnitRuntimeRecord(s.basePath, unitType, unitId, s.currentUnit.startedAt, {
        phase: "paused",
      });
--- a/src/resources/extensions/gsd/auto-worktree.ts
+++ b/src/resources/extensions/gsd/auto-worktree.ts
@ -418,6 +418,22 @@ export function syncGsdStateToWorktree(
    }
  }

+  // Forward-sync preferences.md from project root to worktree (additive only).
+  // NOT in ROOT_STATE_FILES because syncWorktreeStateBack() must never overwrite
+  // the project root's preferences — the project root is authoritative (#2684).
+  {
+    const src = join(mainGsd, "preferences.md");
+    const dst = join(wtGsd, "preferences.md");
+    if (existsSync(src) && !existsSync(dst)) {
+      try {
+        cpSync(src, dst);
+        synced.push("preferences.md");
+      } catch {
+        /* non-fatal */
+      }
+    }
+  }
+
  // Sync milestones: copy entire milestone directories that are missing
  const mainMilestonesDir = join(mainGsd, "milestones");
  const wtMilestonesDir = join(wtGsd, "milestones");
@ -948,8 +964,7 @@ function copyPlanningArtifacts(srcBase: string, wtPath: string): void {
    "STATE.md",
    "KNOWLEDGE.md",
    "OVERRIDES.md",
-    "preferences.md",  // #2684: must be seeded so post_unit_hooks and
-                        // preference-driven config work inside worktrees.
+    "preferences.md",
  ]) {
    safeCopy(join(srcGsd, file), join(dstGsd, file), { force: true });
  }
--- a/src/resources/extensions/gsd/auto.ts
+++ b/src/resources/extensions/gsd/auto.ts
@ -114,6 +114,7 @@ import {
  formatCost,
  formatTokenCount,
 } from "./metrics.js";
+import { setLogBasePath } from "./workflow-logger.js";
 import { join } from "node:path";
 import { readFileSync, existsSync, mkdirSync, writeFileSync, unlinkSync } from "node:fs";
 import { atomicWriteSync } from "./atomic-write.js";
@ -1102,6 +1103,7 @@ export async function startAuto(
    s.stepMode = requestedStepMode;
    s.cmdCtx = ctx;
    s.basePath = base;
+    setLogBasePath(base);
    s.unitDispatchCount.clear();
    s.unitLifetimeDispatches.clear();
    if (!getLedger()) initMetrics(base);
--- a/src/resources/extensions/gsd/bootstrap/dynamic-tools.ts
+++ b/src/resources/extensions/gsd/bootstrap/dynamic-tools.ts
@ -5,6 +5,7 @@ import type { ExtensionAPI } from "@gsd/pi-coding-agent";
 import { createBashTool, createEditTool, createReadTool, createWriteTool } from "@gsd/pi-coding-agent";

 import { DEFAULT_BASH_TIMEOUT_SECS } from "../constants.js";
+import { setLogBasePath } from "../workflow-logger.js";

 /**
 * Resolve the correct DB path for the current working directory.
@ -43,9 +44,14 @@ export async function ensureDbOpen(): Promise<boolean> {
    const dbPath = resolveProjectRootDbPath(basePath);
    const gsdDir = join(basePath, ".gsd");

+    // Derive the project root from the DB path (strip .gsd/gsd.db)
+    const projectRoot = join(dbPath, "..", "..");
+
    // Open existing DB file (may be at project root for worktrees)
    if (existsSync(dbPath)) {
-      return db.openDatabase(dbPath);
+      const opened = db.openDatabase(dbPath);
+      if (opened) setLogBasePath(projectRoot);
+      return opened;
    }

    // No DB file — create + migrate from Markdown if .gsd/ has content
@ -56,6 +62,7 @@ export async function ensureDbOpen(): Promise<boolean> {
      if (hasDecisions || hasRequirements || hasMilestones) {
        const opened = db.openDatabase(dbPath);
        if (opened) {
+          setLogBasePath(projectRoot);
          try {
            const { migrateFromMarkdown } = await import("../md-importer.js");
            migrateFromMarkdown(basePath);
@ -69,7 +76,9 @@ export async function ensureDbOpen(): Promise<boolean> {
      }

      // .gsd/ exists but has no Markdown content (fresh project) — create empty DB
-      return db.openDatabase(dbPath);
+      const opened = db.openDatabase(dbPath);
+      if (opened) setLogBasePath(projectRoot);
+      return opened;
    }

    return false;
--- a/src/resources/extensions/gsd/claude-import.ts
+++ b/src/resources/extensions/gsd/claude-import.ts
@ -103,16 +103,47 @@ function isMarketplacePath(pluginPath: string): boolean {

 /**
 * Detect which plugin roots are marketplaces and which are legacy flat paths.
+ *
+ * Claude Code stores marketplace sources under ~/.claude/plugins/marketplaces/.
+ * Each subdirectory (e.g. marketplaces/confluent/) is a marketplace repo that
+ * contains .claude-plugin/marketplace.json. The parent directory itself does not
+ * have a marketplace.json, so we scan one level deeper when the root isn't
+ * directly a marketplace.
 */
-function categorizePluginRoots(pluginRoots: string[]): { marketplaces: string[]; flat: string[] } {
+export function categorizePluginRoots(pluginRoots: string[]): { marketplaces: string[]; flat: string[] } {
  const marketplaces: string[] = [];
  const flat: string[] = [];
+  const seen = new Set<string>();

  for (const root of pluginRoots) {
    if (isMarketplacePath(root)) {
-      marketplaces.push(root);
+      if (!seen.has(root)) {
+        marketplaces.push(root);
+        seen.add(root);
+      }
    } else {
-      flat.push(root);
+      // The root itself isn't a marketplace — check if it's a container of
+      // marketplaces (e.g. ~/.claude/plugins/marketplaces/ contains subdirs
+      // like confluent/, claude-hud/, each with their own marketplace.json).
+      let foundChild = false;
+      try {
+        const entries = readdirSync(root, { withFileTypes: true });
+        for (const entry of entries) {
+          if (!entry.isDirectory()) continue;
+          if (SKIP_DIRS.has(entry.name)) continue;
+          const childPath = join(root, entry.name);
+          if (isMarketplacePath(childPath) && !seen.has(childPath)) {
+            marketplaces.push(childPath);
+            seen.add(childPath);
+            foundChild = true;
+          }
+        }
+      } catch {
+        // Can't read directory — fall through to flat
+      }
+      if (!foundChild) {
+        flat.push(root);
+      }
    }
  }

@ -170,18 +201,36 @@ export function discoverClaudePlugins(cwd: string): ClaudePluginCandidate[] {

  for (const root of pluginRoots) {
    walkDirs(root, (dir) => {
+      // Recognize both npm-style plugins (package.json) and Claude Code plugins
+      // (.claude-plugin/plugin.json). Claude marketplace-installed plugins use
+      // the latter format exclusively.
      const pkgPath = join(dir, "package.json");
-      if (!existsSync(pkgPath)) return;
+      const claudePluginPath = join(dir, ".claude-plugin", "plugin.json");
+      const hasPkg = existsSync(pkgPath);
+      const hasClaudePlugin = existsSync(claudePluginPath);
+      if (!hasPkg && !hasClaudePlugin) return;
+
      const resolvedDir = resolve(dir);
      if (seen.has(resolvedDir)) return;
      seen.add(resolvedDir);
+
      let packageName: string | undefined;
-      try {
-        const pkg = JSON.parse(readFileSync(pkgPath, "utf8")) as { name?: string };
-        packageName = pkg.name;
-      } catch {
-        packageName = undefined;
+      if (hasPkg) {
+        try {
+          const pkg = JSON.parse(readFileSync(pkgPath, "utf8")) as { name?: string };
+          packageName = pkg.name;
+        } catch {
+          packageName = undefined;
+        }
+      } else if (hasClaudePlugin) {
+        try {
+          const manifest = JSON.parse(readFileSync(claudePluginPath, "utf8")) as { name?: string };
+          packageName = manifest.name;
+        } catch {
+          packageName = undefined;
+        }
      }
+
      results.push({
        type: "plugin",
        name: packageName || basename(dir),
--- a/src/resources/extensions/gsd/commands-config.ts
+++ b/src/resources/extensions/gsd/commands-config.ts
@ -22,6 +22,12 @@ export const TOOL_KEYS = [
  { id: "groq",     env: "GROQ_API_KEY",      label: "Groq Voice",        hint: "console.groq.com" },
 ] as const;

+function getStoredToolKey(auth: AuthStorage, providerId: string): string | undefined {
+  const creds = auth.getCredentialsForProvider(providerId);
+  const cred = creds.find((c) => c.type === "api_key" && c.key);
+  return cred?.type === "api_key" ? cred.key : undefined;
+}
+
 /**
 * Load tool API keys from auth.json into environment variables.
 * Called at session startup to ensure tools have access to their credentials.
@ -33,9 +39,9 @@ export function loadToolApiKeys(): void {

    const auth = AuthStorage.create(authPath);
    for (const tool of TOOL_KEYS) {
-      const cred = auth.get(tool.id);
-      if (cred && cred.type === "api_key" && cred.key && !process.env[tool.env]) {
-        process.env[tool.env] = cred.key;
+      const key = getStoredToolKey(auth, tool.id);
+      if (key && !process.env[tool.env]) {
+        process.env[tool.env] = key;
      }
    }
  } catch {
@ -55,14 +61,14 @@ export async function handleConfig(ctx: ExtensionCommandContext): Promise<void>
  // Show current status
  const statusLines = ["GSD Tool Configuration\n"];
  for (const tool of TOOL_KEYS) {
-    const hasKey = !!process.env[tool.env] || !!(auth.get(tool.id) as { key?: string })?.key;
+    const hasKey = !!process.env[tool.env] || !!getStoredToolKey(auth, tool.id);
    statusLines.push(`  ${hasKey ? "\u2713" : "\u2717"} ${tool.label}${hasKey ? "" : ` \u2014 get key at ${tool.hint}`}`);
  }
  ctx.ui.notify(statusLines.join("\n"), "info");

  // Ask which tools to configure
  const options = TOOL_KEYS.map(t => {
-    const hasKey = !!process.env[t.env] || !!(auth.get(t.id) as { key?: string })?.key;
+    const hasKey = !!process.env[t.env] || !!getStoredToolKey(auth, t.id);
    return `${t.label} ${hasKey ? "(configured \u2713)" : "(not set)"}`;
  });
  options.push("(done)");
--- a/src/resources/extensions/gsd/commands-prefs-wizard.ts
+++ b/src/resources/extensions/gsd/commands-prefs-wizard.ts
@ -771,7 +771,7 @@ export async function ensurePreferencesFile(
  scope: "global" | "project",
 ): Promise<void> {
  if (!existsSync(path)) {
-    const template = await loadFile(join(dirname(fileURLToPath(import.meta.url)), "templates", "preferences.md"));
+    const template = await loadFile(join(dirname(fileURLToPath(import.meta.url)), "templates", "PREFERENCES.md"));
    if (!template) {
      ctx.ui.notify("Could not load GSD preferences template.", "error");
      return;
--- a/src/resources/extensions/gsd/commands/handlers/auto.ts
+++ b/src/resources/extensions/gsd/commands/handlers/auto.ts
@ -7,6 +7,7 @@ import { enableDebug } from "../../debug-logger.js";
 import { getAutoDashboardData, isAutoActive, isAutoPaused, pauseAuto, startAuto, stopAuto, stopAutoRemote } from "../../auto.js";
 import { handleRate } from "../../commands-rate.js";
 import { guardRemoteSession, projectRoot } from "../context.js";
+import { findMilestoneIds } from "../../milestone-id-utils.js";

 /**
 * Parse --yolo flag and optional file path from the auto command string.
@ -28,6 +29,39 @@ function parseYoloFlag(trimmed: string): { yoloSeedFile: string | null; rest: st
  return { yoloSeedFile: filePath, rest };
 }

+/**
+ * Extract a milestone ID (e.g. M016 or M001-a3b4c5) from the command string.
+ * Returns the matched ID and the remaining string with the ID removed.
+ * The milestone ID pattern matches the format used by findMilestoneIds: M\d+ with
+ * an optional -[a-z0-9]{6} suffix for unique milestone IDs.
+ */
+export function parseMilestoneTarget(input: string): { milestoneId: string | null; rest: string } {
+  const match = input.match(/\b(M\d+(?:-[a-z0-9]{6})?)\b/);
+  if (!match) return { milestoneId: null, rest: input };
+  const rest = input.replace(match[0], "").replace(/\s+/g, " ").trim();
+  return { milestoneId: match[1], rest };
+}
+
+/**
+ * Set GSD_MILESTONE_LOCK to target a specific milestone, then run `fn`.
+ * Clears the env var when `fn` resolves or rejects, so the lock does not
+ * leak into subsequent commands in the same process.
+ */
+async function withMilestoneLock(milestoneId: string, fn: () => Promise<void>): Promise<void> {
+  const previous = process.env.GSD_MILESTONE_LOCK;
+  process.env.GSD_MILESTONE_LOCK = milestoneId;
+  try {
+    await fn();
+  } finally {
+    // Restore previous value (undefined → delete, else restore).
+    if (previous === undefined) {
+      delete process.env.GSD_MILESTONE_LOCK;
+    } else {
+      process.env.GSD_MILESTONE_LOCK = previous;
+    }
+  }
+}
+
 export async function handleAutoCommand(trimmed: string, ctx: ExtensionCommandContext, pi: ExtensionAPI): Promise<boolean> {
  if (trimmed === "next" || trimmed.startsWith("next ")) {
    if (trimmed.includes("--dry-run")) {
@ -35,21 +69,48 @@ export async function handleAutoCommand(trimmed: string, ctx: ExtensionCommandCo
      await handleDryRun(ctx, projectRoot());
      return true;
    }
-    const verboseMode = trimmed.includes("--verbose");
-    const debugMode = trimmed.includes("--debug");
+    const { milestoneId, rest: afterMilestone } = parseMilestoneTarget(trimmed);
+    const verboseMode = afterMilestone.includes("--verbose");
+    const debugMode = afterMilestone.includes("--debug");
    if (debugMode) enableDebug(projectRoot());
    if (!(await guardRemoteSession(ctx, pi))) return true;
-    await startAuto(ctx, pi, projectRoot(), verboseMode, { step: true });
+
+    // Validate the milestone target exists and is not already complete.
+    if (milestoneId) {
+      const allIds = findMilestoneIds(projectRoot());
+      if (!allIds.includes(milestoneId)) {
+        ctx.ui.notify(`Milestone ${milestoneId} does not exist. Available: ${allIds.join(", ") || "(none)"}`, "error");
+        return true;
+      }
+    }
+
+    if (milestoneId) {
+      await withMilestoneLock(milestoneId, () =>
+        startAuto(ctx, pi, projectRoot(), verboseMode, { step: true }),
+      );
+    } else {
+      await startAuto(ctx, pi, projectRoot(), verboseMode, { step: true });
+    }
    return true;
  }

  if (trimmed === "auto" || trimmed.startsWith("auto ")) {
-    const { yoloSeedFile, rest } = parseYoloFlag(trimmed);
-    const verboseMode = rest.includes("--verbose");
-    const debugMode = rest.includes("--debug");
+    const { yoloSeedFile, rest: afterYolo } = parseYoloFlag(trimmed);
+    const { milestoneId, rest: afterMilestone } = parseMilestoneTarget(afterYolo);
+    const verboseMode = afterMilestone.includes("--verbose");
+    const debugMode = afterMilestone.includes("--debug");
    if (debugMode) enableDebug(projectRoot());
    if (!(await guardRemoteSession(ctx, pi))) return true;

+    // Validate the milestone target exists and is not already complete.
+    if (milestoneId) {
+      const allIds = findMilestoneIds(projectRoot());
+      if (!allIds.includes(milestoneId)) {
+        ctx.ui.notify(`Milestone ${milestoneId} does not exist. Available: ${allIds.join(", ") || "(none)"}`, "error");
+        return true;
+      }
+    }
+
    if (yoloSeedFile) {
      const resolved = resolve(projectRoot(), yoloSeedFile);
      if (!existsSync(resolved)) {
@ -66,6 +127,12 @@ export async function handleAutoCommand(trimmed: string, ctx: ExtensionCommandCo
      // when the LLM says "Milestone X ready."
      const { showHeadlessMilestoneCreation } = await import("../../guided-flow.js");
      await showHeadlessMilestoneCreation(ctx, pi, projectRoot(), seedContent);
+    } else if (milestoneId) {
+      // Target a specific milestone — use GSD_MILESTONE_LOCK so state
+      // derivation only sees this milestone (#2521).
+      await withMilestoneLock(milestoneId, () =>
+        startAuto(ctx, pi, projectRoot(), verboseMode),
+      );
    } else {
      await startAuto(ctx, pi, projectRoot(), verboseMode);
    }
--- a/src/resources/extensions/gsd/detection.ts
+++ b/src/resources/extensions/gsd/detection.ts
@ -359,8 +359,8 @@ function detectV2Gsd(basePath: string): V2Detection | null {
  if (!existsSync(gsdPath)) return null;

  const hasPreferences =
-    existsSync(join(gsdPath, "preferences.md")) ||
-    existsSync(join(gsdPath, "PREFERENCES.md"));
+    existsSync(join(gsdPath, "PREFERENCES.md")) ||
+    existsSync(join(gsdPath, "preferences.md"));

  const hasContext = existsSync(join(gsdPath, "CONTEXT.md"));

@ -714,8 +714,8 @@ function detectVerificationCommands(
 */
 export function hasGlobalSetup(): boolean {
  return (
-    existsSync(join(gsdHome, "preferences.md")) ||
-    existsSync(join(gsdHome, "PREFERENCES.md"))
+    existsSync(join(gsdHome, "PREFERENCES.md")) ||
+    existsSync(join(gsdHome, "preferences.md"))
  );
 }

@ -728,8 +728,8 @@ export function isFirstEverLaunch(): boolean {

  // If we have preferences, not first launch
  if (
-    existsSync(join(gsdHome, "preferences.md")) ||
-    existsSync(join(gsdHome, "PREFERENCES.md"))
+    existsSync(join(gsdHome, "PREFERENCES.md")) ||
+    existsSync(join(gsdHome, "preferences.md"))
  ) {
    return false;
  }
--- a/src/resources/extensions/gsd/docs/preferences-reference.md
+++ b/src/resources/extensions/gsd/docs/preferences-reference.md
@ -1,6 +1,6 @@
 # GSD Preferences Reference

-Full documentation for `~/.gsd/preferences.md` (global) and `.gsd/preferences.md` (project).
+Full documentation for `~/.gsd/PREFERENCES.md` (global) and `.gsd/PREFERENCES.md` (project).

 ---

@ -51,8 +51,8 @@ skill_rules: []

 Preferences are loaded from two locations and merged:

-1. **Global:** `~/.gsd/preferences.md` — applies to all projects
-2. **Project:** `.gsd/preferences.md` — applies to the current project only
+1. **Global:** `~/.gsd/PREFERENCES.md` — applies to all projects
+2. **Project:** `.gsd/PREFERENCES.md` — applies to the current project only

 **Merge behavior** (see `mergePreferences()` in `preferences.ts`):

--- a/src/resources/extensions/gsd/gitignore.ts
+++ b/src/resources/extensions/gsd/gitignore.ts
@ -1,8 +1,8 @@
 /**
- * GSD bootstrappers for .gitignore and preferences.md
+ * GSD bootstrappers for .gitignore and PREFERENCES.md
 *
 * Ensures baseline .gitignore exists with universally-correct patterns.
- * Creates an empty preferences.md template if it doesn't exist.
+ * Creates an empty PREFERENCES.md template if it doesn't exist.
 * Both idempotent — non-destructive if already present.
 */

@ -216,16 +216,16 @@ export function untrackRuntimeFiles(basePath: string): void {
 }

 /**
- * Ensure basePath/.gsd/preferences.md exists as an empty template.
+ * Ensure basePath/.gsd/PREFERENCES.md exists as an empty template.
 * Creates the file with frontmatter only if it doesn't exist.
 * Returns true if created, false if already exists.
 *
- * Checks both lowercase (canonical) and uppercase (legacy) to avoid
- * creating a duplicate when an uppercase file already exists.
+ * Checks both uppercase (canonical) and lowercase (legacy) to avoid
+ * creating a duplicate when a lowercase file already exists.
 */
 export function ensurePreferences(basePath: string): boolean {
-  const preferencesPath = join(gsdRoot(basePath), "preferences.md");
-  const legacyPath = join(gsdRoot(basePath), "PREFERENCES.md");
+  const preferencesPath = join(gsdRoot(basePath), "PREFERENCES.md");
+  const legacyPath = join(gsdRoot(basePath), "preferences.md");

  if (existsSync(preferencesPath) || existsSync(legacyPath)) {
    return false;
--- a/src/resources/extensions/gsd/gsd-db.ts
+++ b/src/resources/extensions/gsd/gsd-db.ts
@ -1485,6 +1485,18 @@ export function getMilestone(id: string): MilestoneRow | null {
  return rowToMilestone(row);
 }

+/**
+ * Update a milestone's status in the database.
+ * Used by park/unpark to keep the DB in sync with the filesystem marker.
+ * See: https://github.com/gsd-build/gsd-2/issues/2694
+ */
+export function updateMilestoneStatus(milestoneId: string, status: string): void {
+  if (!currentDb) throw new GSDError(GSD_STALE_STATE, "gsd-db: No database open");
+  currentDb.prepare(
+    `UPDATE milestones SET status = :status WHERE id = :id`,
+  ).run({ ":status": status, ":id": milestoneId });
+}
+
 export function getActiveMilestoneFromDb(): MilestoneRow | null {
  if (!currentDb) return null;
  const row = currentDb.prepare(
--- a/src/resources/extensions/gsd/init-wizard.ts
+++ b/src/resources/extensions/gsd/init-wizard.ts
@ -422,9 +422,9 @@ function bootstrapGsdDirectory(
  const gsd = gsdRoot(basePath);
  mkdirSync(join(gsd, "milestones"), { recursive: true });

-  // Write preferences.md from wizard answers
+  // Write PREFERENCES.md from wizard answers
  const preferencesContent = buildPreferencesFile(prefs);
-  writeFileSync(join(gsd, "preferences.md"), preferencesContent, "utf-8");
+  writeFileSync(join(gsd, "PREFERENCES.md"), preferencesContent, "utf-8");

  // Seed CONTEXT.md with detected project signals
  const contextContent = buildContextSeed(signals);
--- a/src/resources/extensions/gsd/key-manager.ts
+++ b/src/resources/extensions/gsd/key-manager.ts
@ -150,22 +150,13 @@ export interface KeyStatus {
 */
 export function getAllKeyStatuses(auth: AuthStorage): KeyStatus[] {
  return PROVIDER_REGISTRY.map((provider) => {
-    const creds = auth.getCredentialsForProvider(provider.id);
+    const rawCreds = auth.getCredentialsForProvider(provider.id);
+    // Filter out empty-key entries (left by legacy removeProviderToken or skipped onboarding)
+    const creds = rawCreds.filter((c) => !(c.type === "api_key" && !(c as ApiKeyCredential).key));
    const envKey = provider.envVar ? process.env[provider.envVar] : undefined;

    if (creds.length > 0) {
      const firstCred = creds[0];
-      // Skip empty keys (from skipped onboarding)
-      if (firstCred.type === "api_key" && !(firstCred as ApiKeyCredential).key) {
-        return {
-          provider,
-          configured: false,
-          source: "none" as const,
-          credentialCount: 0,
-          description: "empty key (skipped setup)",
-          backedOff: false,
-        };
-      }
      const desc =
        creds.length > 1
          ? `${creds.length} keys (round-robin)`
@ -275,7 +266,7 @@ export async function handleAddKey(
  } else {
    // Interactive provider picker
    const options = PROVIDER_REGISTRY.map((p) => {
-      const creds = auth.getCredentialsForProvider(p.id);
+      const creds = auth.getCredentialsForProvider(p.id).filter((c) => !(c.type === "api_key" && !(c as ApiKeyCredential).key));
      const existing = creds.length > 0 ? " (configured)" : "";
      return `[${p.category}] ${p.label}${existing}`;
    });
@ -360,7 +351,7 @@ export async function handleRemoveKey(
  } else {
    // Show only configured providers
    const configured = PROVIDER_REGISTRY.filter((p) => {
-      const creds = auth.getCredentialsForProvider(p.id);
+      const creds = auth.getCredentialsForProvider(p.id).filter((c) => !(c.type === "api_key" && !(c as ApiKeyCredential).key));
      return creds.length > 0;
    });

@ -619,7 +610,7 @@ export async function handleRotateKey(
    // Show only configured API key providers
    const configured = PROVIDER_REGISTRY.filter((p) => {
      const creds = auth.getCredentialsForProvider(p.id);
-      return creds.some((c) => c.type === "api_key");
+      return creds.some((c) => c.type === "api_key" && (c as ApiKeyCredential).key);
    });

    if (configured.length === 0) {
@ -788,7 +779,7 @@ export function runKeyDoctor(auth: AuthStorage): DoctorFinding[] {
    if (!envValue) continue;

    const creds = auth.getCredentialsForProvider(provider.id);
-    const apiKey = creds.find((c) => c.type === "api_key") as ApiKeyCredential | undefined;
+    const apiKey = creds.find((c) => c.type === "api_key" && (c as ApiKeyCredential).key) as ApiKeyCredential | undefined;
    if (apiKey?.key && apiKey.key !== envValue) {
      findings.push({
        severity: "warning",
--- a/src/resources/extensions/gsd/milestone-actions.ts
+++ b/src/resources/extensions/gsd/milestone-actions.ts
@ -20,6 +20,7 @@ import {
 } from "./paths.js";
 import { invalidateAllCaches } from "./cache.js";
 import { loadQueueOrder, saveQueueOrder } from "./queue-order.js";
+import { isDbAvailable, updateMilestoneStatus } from "./gsd-db.js";

 // ─── Park ──────────────────────────────────────────────────────────────────

@ -52,6 +53,14 @@ export function parkMilestone(basePath: string, milestoneId: string, reason: str
  ].join("\n");

  writeFileSync(parkedPath, content, "utf-8");
+  // Sync DB status so deriveStateFromDb also skips this milestone (#2694)
+  if (isDbAvailable()) {
+    try {
+      updateMilestoneStatus(milestoneId, "parked");
+    } catch (err) {
+      process.stderr.write(`gsd: parkMilestone DB sync failed for ${milestoneId}: ${(err as Error).message}\n`);
+    }
+  }
  invalidateAllCaches();
  return true;
 }
@ -70,6 +79,14 @@ export function unparkMilestone(basePath: string, milestoneId: string): boolean
  if (!existsSync(parkedPath)) return false; // not parked

  unlinkSync(parkedPath);
+  // Sync DB status so deriveStateFromDb picks up the unparked milestone (#2694)
+  if (isDbAvailable()) {
+    try {
+      updateMilestoneStatus(milestoneId, "active");
+    } catch (err) {
+      process.stderr.write(`gsd: unparkMilestone DB sync failed for ${milestoneId}: ${(err as Error).message}\n`);
+    }
+  }
  invalidateAllCaches();
  return true;
 }
--- a/src/resources/extensions/gsd/preferences-models.ts
+++ b/src/resources/extensions/gsd/preferences-models.ts
@ -308,7 +308,7 @@ export function resolveContextSelection(): import("./types.js").ContextSelection
 }

 /**
- * Resolve the search provider preference from preferences.md.
+ * Resolve the search provider preference from PREFERENCES.md.
 * Returns undefined if not configured (caller falls back to existing behavior).
 */
 export function resolveSearchProviderFromPreferences(): GSDPreferences["search_provider"] | undefined {
--- a/src/resources/extensions/gsd/preferences.ts
+++ b/src/resources/extensions/gsd/preferences.ts
@ -87,7 +87,7 @@ function gsdHome(): string {
 }

 function globalPreferencesPath(): string {
-  return join(gsdHome(), "preferences.md");
+  return join(gsdHome(), "PREFERENCES.md");
 }

 function legacyGlobalPreferencesPath(): string {
@ -95,16 +95,16 @@ function legacyGlobalPreferencesPath(): string {
 }

 function projectPreferencesPath(): string {
-  return join(gsdRoot(process.cwd()), "preferences.md");
-}
-// Bootstrap in gitignore.ts historically created PREFERENCES.md (uppercase) by mistake.
-// Check uppercase as a fallback so those files aren't silently ignored.
-function globalPreferencesPathUppercase(): string {
-  return join(gsdHome(), "PREFERENCES.md");
-}
-function projectPreferencesPathUppercase(): string {
  return join(gsdRoot(process.cwd()), "PREFERENCES.md");
 }
+// Legacy: older versions used lowercase preferences.md.
+// Check lowercase as a fallback so those files aren't silently ignored.
+function globalPreferencesPathLegacy(): string {
+  return join(gsdHome(), "preferences.md");
+}
+function projectPreferencesPathLegacy(): string {
+  return join(gsdRoot(process.cwd()), "preferences.md");
+}

 export function getGlobalGSDPreferencesPath(): string {
  return globalPreferencesPath();
@ -122,13 +122,13 @@ export function getProjectGSDPreferencesPath(): string {

 export function loadGlobalGSDPreferences(): LoadedGSDPreferences | null {
  return loadPreferencesFile(globalPreferencesPath(), "global")
-    ?? loadPreferencesFile(globalPreferencesPathUppercase(), "global")
+    ?? loadPreferencesFile(globalPreferencesPathLegacy(), "global")
    ?? loadPreferencesFile(legacyGlobalPreferencesPath(), "global");
 }

 export function loadProjectGSDPreferences(): LoadedGSDPreferences | null {
  return loadPreferencesFile(projectPreferencesPath(), "project")
-    ?? loadPreferencesFile(projectPreferencesPathUppercase(), "project");
+    ?? loadPreferencesFile(projectPreferencesPathLegacy(), "project");
 }

 export function loadEffectiveGSDPreferences(): LoadedGSDPreferences | null {
@ -223,7 +223,7 @@ export function parsePreferencesMarkdown(content: string): GSDPreferences | null

  if (!_warnedUnrecognizedFormat) {
    _warnedUnrecognizedFormat = true;
-    console.warn("[parsePreferencesMarkdown] preferences.md exists but uses an unrecognized format — skipping.");
+    console.warn("[parsePreferencesMarkdown] PREFERENCES.md exists but uses an unrecognized format — skipping.");
  }
  return null;
 }
@ -502,7 +502,7 @@ export function resolvePreDispatchHooks(): PreDispatchHookConfig[] {
 * Resolve the effective git isolation mode from preferences.
 * Returns "none" (default), "worktree", or "branch".
 *
- * Default is "none" so GSD works out of the box without preferences.md.
+ * Default is "none" so GSD works out of the box without PREFERENCES.md.
 * Worktree isolation requires explicit opt-in because it depends on git
 * branch infrastructure that must be set up before use.
 */
--- a/src/resources/extensions/gsd/prompts/system.md
+++ b/src/resources/extensions/gsd/prompts/system.md
@ -92,7 +92,7 @@ Titles live inside file content (headings, frontmatter), not in file or director

 ### Isolation Model

-Auto-mode supports three isolation modes (configured in `.gsd/preferences.md` under `taskIsolation.mode`):
+Auto-mode supports three isolation modes (configured in `.gsd/PREFERENCES.md` under `taskIsolation.mode`):

 - **worktree** (default): Work happens in `.gsd/worktrees/<MID>/`, a full git worktree on the `milestone/<MID>` branch. Each worktree has its own working copy and `.gsd/` directory. Squash-merged back to the integration branch on milestone completion.
 - **branch**: Work happens in the project root on a `milestone/<MID>` branch. No worktree directory — files are checked out in-place.
--- a/src/resources/extensions/gsd/provider-error-pause.ts
+++ b/src/resources/extensions/gsd/provider-error-pause.ts
@ -22,7 +22,7 @@ export function classifyProviderError(errorMsg: string): {
  // Connection/process errors — transient, auto-resume after brief backoff (#2309).
  // These indicate the process was killed, the connection was reset, or a network
  // blip occurred. They are NOT permanent failures.
-  const isConnectionError = /terminated|connection.?reset|connection.?refused|other side closed|fetch failed|network.?(?:is\s+)?unavailable|ECONNREFUSED|ECONNRESET|EPIPE/i.test(errorMsg);
+  const isConnectionError = /terminated|connection.?reset|connection.?refused|other side closed|fetch failed|network.?(?:is\s+)?unavailable|ECONNREFUSED|ECONNRESET|EPIPE|stream_exhausted(?:_without_result)?/i.test(errorMsg);

  // Permanent errors — never auto-resume
  const isPermanent = /auth|unauthorized|forbidden|invalid.*key|invalid.*api|billing|quota exceeded|account/i.test(errorMsg);
--- a/src/resources/extensions/gsd/rule-registry.ts
+++ b/src/resources/extensions/gsd/rule-registry.ts
@ -524,7 +524,7 @@ export class RuleRegistry {
  formatHookStatus(): string {
    const entries = this.getHookStatus();
    if (entries.length === 0) {
-      return "No hooks configured. Add post_unit_hooks or pre_dispatch_hooks to .gsd/preferences.md";
+      return "No hooks configured. Add post_unit_hooks or pre_dispatch_hooks to .gsd/PREFERENCES.md";
    }

    const lines: string[] = ["Configured Hooks:", ""];
--- a/src/resources/extensions/gsd/state.ts
+++ b/src/resources/extensions/gsd/state.ts
@ -211,7 +211,24 @@ export async function deriveState(basePath: string): Promise<GSDState> {

  // Dual-path: try DB-backed derivation first when hierarchy tables are populated
  if (isDbAvailable()) {
-    const dbMilestones = getAllMilestones();
+    let dbMilestones = getAllMilestones();
+
+    // Disk→DB reconciliation (#2631): when the milestones table is empty
+    // (e.g. failed initial migration per #2529), the reconciliation code
+    // inside deriveStateFromDb is unreachable. Populate from disk here so
+    // the DB path activates correctly.
+    if (dbMilestones.length === 0) {
+      const diskIds = findMilestoneIds(basePath);
+      let synced = false;
+      for (const diskId of diskIds) {
+        if (!isGhostMilestone(basePath, diskId)) {
+          insertMilestone({ id: diskId, status: 'active' });
+          synced = true;
+        }
+      }
+      if (synced) dbMilestones = getAllMilestones();
+    }
+
    if (dbMilestones.length > 0) {
      const stopDbTimer = debugTime("derive-state-db");
      result = await deriveStateFromDb(basePath);
@ -562,7 +579,10 @@ export async function deriveStateFromDb(basePath: string): Promise<GSDState> {
  }

  // ── All slices done → validating/completing ─────────────────────────
-  const allSlicesDone = activeMilestoneSlices.every(s => isStatusDone(s.status));
+  // Guard: [].every() === true (vacuous truth). Without the length check,
+  // an empty slice array causes a premature phase transition to
+  // validating-milestone. See: https://github.com/gsd-build/gsd-2/issues/2667
+  const allSlicesDone = activeMilestoneSlices.length > 0 && activeMilestoneSlices.every(s => isStatusDone(s.status));
  if (allSlicesDone) {
    const validationFile = resolveMilestoneFile(basePath, activeMilestone.id, "VALIDATION");
    const validationContent = validationFile ? await loadFile(validationFile) : null;
--- a/src/resources/extensions/gsd/templates/PREFERENCES.md
+++ b/src/resources/extensions/gsd/templates/PREFERENCES.md
--- a/src/resources/extensions/gsd/tests/auto-milestone-target.test.ts
+++ b/src/resources/extensions/gsd/tests/auto-milestone-target.test.ts
@ -0,0 +1,61 @@
+import { describe, it } from "node:test";
+import assert from "node:assert/strict";
+
+import { parseMilestoneTarget } from "../commands/handlers/auto.js";
+
+describe("parseMilestoneTarget", () => {
+  it("extracts a simple milestone ID", () => {
+    const result = parseMilestoneTarget("auto M016");
+    assert.equal(result.milestoneId, "M016");
+    assert.equal(result.rest, "auto");
+  });
+
+  it("extracts a milestone ID with unique suffix", () => {
+    const result = parseMilestoneTarget("auto M001-a3b4c5 --verbose");
+    assert.equal(result.milestoneId, "M001-a3b4c5");
+    assert.equal(result.rest, "auto --verbose");
+  });
+
+  it("returns null when no milestone ID is present", () => {
+    const result = parseMilestoneTarget("auto --verbose");
+    assert.equal(result.milestoneId, null);
+    assert.equal(result.rest, "auto --verbose");
+  });
+
+  it("extracts milestone ID with flags in any order", () => {
+    const result = parseMilestoneTarget("auto --verbose M003 --debug");
+    assert.equal(result.milestoneId, "M003");
+    assert.equal(result.rest, "auto --verbose --debug");
+  });
+
+  it("returns null for plain 'auto'", () => {
+    const result = parseMilestoneTarget("auto");
+    assert.equal(result.milestoneId, null);
+    assert.equal(result.rest, "auto");
+  });
+
+  it("extracts from 'next' command", () => {
+    const result = parseMilestoneTarget("next M012");
+    assert.equal(result.milestoneId, "M012");
+    assert.equal(result.rest, "next");
+  });
+
+  it("handles milestone ID at the start of input", () => {
+    const result = parseMilestoneTarget("M007");
+    assert.equal(result.milestoneId, "M007");
+    assert.equal(result.rest, "");
+  });
+
+  it("picks the first milestone ID when multiple appear", () => {
+    // Edge case: user accidentally types two. First one wins.
+    const result = parseMilestoneTarget("auto M001 M002");
+    assert.equal(result.milestoneId, "M001");
+    // M002 remains in rest since only the first match is removed
+    assert.ok(result.rest.includes("M002"));
+  });
+
+  it("does not match bare numbers without M prefix", () => {
+    const result = parseMilestoneTarget("auto 016");
+    assert.equal(result.milestoneId, null);
+  });
+});
--- a/src/resources/extensions/gsd/tests/claude-import-marketplace-discovery.test.ts
+++ b/src/resources/extensions/gsd/tests/claude-import-marketplace-discovery.test.ts
@ -0,0 +1,191 @@
+/**
+ * Portable tests for marketplace discovery in claude-import.
+ *
+ * Validates that categorizePluginRoots correctly discovers marketplace repos
+ * nested inside container directories (the Claude Code convention), and that
+ * discoverClaudePlugins recognizes .claude-plugin/plugin.json in addition to
+ * package.json.
+ *
+ * Uses temp-dir fixtures — no real marketplace repos required.
+ *
+ * Fixes: https://github.com/gsd-build/gsd-2/issues/2717
+ */
+
+import { describe, it, beforeEach, afterEach } from "node:test";
+import assert from "node:assert/strict";
+import { existsSync, mkdirSync, mkdtempSync, rmSync, writeFileSync } from "node:fs";
+import { tmpdir } from "node:os";
+import { join } from "node:path";
+import { categorizePluginRoots } from "../claude-import.js";
+
+describe("categorizePluginRoots", () => {
+  let tmpDir: string;
+
+  beforeEach(() => {
+    tmpDir = mkdtempSync(join(tmpdir(), "gsd-mktplace-test-"));
+  });
+
+  afterEach(() => {
+    rmSync(tmpDir, { recursive: true, force: true });
+  });
+
+  it("should detect a direct marketplace root", () => {
+    // Root itself has .claude-plugin/marketplace.json
+    mkdirSync(join(tmpDir, ".claude-plugin"), { recursive: true });
+    writeFileSync(
+      join(tmpDir, ".claude-plugin", "marketplace.json"),
+      JSON.stringify({ name: "direct", plugins: [] })
+    );
+
+    const { marketplaces, flat } = categorizePluginRoots([tmpDir]);
+
+    assert.equal(marketplaces.length, 1);
+    assert.equal(marketplaces[0], tmpDir);
+    assert.equal(flat.length, 0);
+  });
+
+  it("should discover marketplace repos nested one level inside a container directory", () => {
+    // Simulate ~/.claude/plugins/marketplaces/ with two marketplace subdirs
+    const mktA = join(tmpDir, "marketplace-a");
+    const mktB = join(tmpDir, "marketplace-b");
+
+    mkdirSync(join(mktA, ".claude-plugin"), { recursive: true });
+    writeFileSync(
+      join(mktA, ".claude-plugin", "marketplace.json"),
+      JSON.stringify({ name: "a", plugins: [] })
+    );
+
+    mkdirSync(join(mktB, ".claude-plugin"), { recursive: true });
+    writeFileSync(
+      join(mktB, ".claude-plugin", "marketplace.json"),
+      JSON.stringify({ name: "b", plugins: [] })
+    );
+
+    const { marketplaces, flat } = categorizePluginRoots([tmpDir]);
+
+    assert.equal(marketplaces.length, 2);
+    assert.ok(marketplaces.includes(mktA));
+    assert.ok(marketplaces.includes(mktB));
+    assert.equal(flat.length, 0);
+  });
+
+  it("should fall back to flat when no child is a marketplace", () => {
+    // Container with no marketplace subdirs
+    mkdirSync(join(tmpDir, "some-dir"), { recursive: true });
+
+    const { marketplaces, flat } = categorizePluginRoots([tmpDir]);
+
+    assert.equal(marketplaces.length, 0);
+    assert.equal(flat.length, 1);
+    assert.equal(flat[0], tmpDir);
+  });
+
+  it("should handle a mix of direct marketplace and container roots", () => {
+    // Root A is a direct marketplace
+    const directRoot = join(tmpDir, "direct");
+    mkdirSync(join(directRoot, ".claude-plugin"), { recursive: true });
+    writeFileSync(
+      join(directRoot, ".claude-plugin", "marketplace.json"),
+      JSON.stringify({ name: "direct", plugins: [] })
+    );
+
+    // Root B is a container with a child marketplace
+    const container = join(tmpDir, "container");
+    const child = join(container, "child-marketplace");
+    mkdirSync(join(child, ".claude-plugin"), { recursive: true });
+    writeFileSync(
+      join(child, ".claude-plugin", "marketplace.json"),
+      JSON.stringify({ name: "child", plugins: [] })
+    );
+
+    // Root C has nothing
+    const emptyRoot = join(tmpDir, "empty");
+    mkdirSync(emptyRoot, { recursive: true });
+
+    const { marketplaces, flat } = categorizePluginRoots([
+      directRoot,
+      container,
+      emptyRoot,
+    ]);
+
+    assert.equal(marketplaces.length, 2);
+    assert.ok(marketplaces.includes(directRoot));
+    assert.ok(marketplaces.includes(child));
+    assert.equal(flat.length, 1);
+    assert.equal(flat[0], emptyRoot);
+  });
+
+  it("should not duplicate when the same marketplace appears via multiple roots", () => {
+    // Direct reference AND container reference to the same marketplace
+    const mkt = join(tmpDir, "mkt");
+    mkdirSync(join(mkt, ".claude-plugin"), { recursive: true });
+    writeFileSync(
+      join(mkt, ".claude-plugin", "marketplace.json"),
+      JSON.stringify({ name: "mkt", plugins: [] })
+    );
+
+    const { marketplaces } = categorizePluginRoots([mkt, tmpDir]);
+
+    assert.equal(marketplaces.length, 1);
+    assert.equal(marketplaces[0], mkt);
+  });
+
+  it("should skip .git and node_modules subdirectories", () => {
+    // Put a marketplace.json inside .git — should be ignored
+    mkdirSync(join(tmpDir, ".git", ".claude-plugin"), { recursive: true });
+    writeFileSync(
+      join(tmpDir, ".git", ".claude-plugin", "marketplace.json"),
+      JSON.stringify({ name: "hidden", plugins: [] })
+    );
+
+    const { marketplaces, flat } = categorizePluginRoots([tmpDir]);
+
+    assert.equal(marketplaces.length, 0);
+    assert.equal(flat.length, 1);
+  });
+
+  it("should handle non-existent root gracefully", () => {
+    const missing = join(tmpDir, "does-not-exist");
+    // categorizePluginRoots receives paths from uniqueExistingDirs, but
+    // be defensive — it should not crash on a missing root
+    const { marketplaces, flat } = categorizePluginRoots([missing]);
+
+    assert.equal(marketplaces.length, 0);
+    assert.equal(flat.length, 1); // falls through to flat
+  });
+});
+
+describe("discoverClaudePlugins — Claude plugin.json recognition", () => {
+  let tmpDir: string;
+
+  beforeEach(() => {
+    tmpDir = mkdtempSync(join(tmpdir(), "gsd-plugin-disc-"));
+  });
+
+  afterEach(() => {
+    rmSync(tmpDir, { recursive: true, force: true });
+  });
+
+  it("should discover a plugin with .claude-plugin/plugin.json (no package.json)", async () => {
+    // Simulate a cached Claude marketplace plugin
+    const pluginDir = join(tmpDir, "my-plugin");
+    mkdirSync(join(pluginDir, ".claude-plugin"), { recursive: true });
+    mkdirSync(join(pluginDir, "skills", "my-skill"), { recursive: true });
+    writeFileSync(
+      join(pluginDir, ".claude-plugin", "plugin.json"),
+      JSON.stringify({ name: "my-plugin", version: "1.0.0", description: "Test plugin" })
+    );
+    writeFileSync(join(pluginDir, "skills", "my-skill", "SKILL.md"), "# My Skill");
+
+    // Import discoverClaudePlugins dynamically since it depends on getClaudeSearchRoots
+    // which uses hardcoded paths. Instead, test the flat-path discovery logic directly
+    // by checking that the plugin.json file is recognized.
+    const claudePluginPath = join(pluginDir, ".claude-plugin", "plugin.json");
+    assert.ok(existsSync(claudePluginPath), "Claude plugin.json should exist");
+
+    // The fix ensures walkDirs checks for .claude-plugin/plugin.json in addition
+    // to package.json. We verify the file structure is correct for discovery.
+    const pkgPath = join(pluginDir, "package.json");
+    assert.ok(!existsSync(pkgPath), "package.json should NOT exist — this is a Claude plugin");
+  });
+});
--- a/src/resources/extensions/gsd/tests/claude-import-tui.test.ts
+++ b/src/resources/extensions/gsd/tests/claude-import-tui.test.ts
@ -126,7 +126,7 @@ describe(

 		before(() => {
 			tempDir = mkdtempSync(join(tmpdir(), 'gsd-tui-test-'));
-			prefsPath = join(tempDir, 'preferences.md');
+			prefsPath = join(tempDir, 'PREFERENCES.md');
 			prefs = { version: 1 };
 		});

--- a/src/resources/extensions/gsd/tests/commands-config.test.ts
+++ b/src/resources/extensions/gsd/tests/commands-config.test.ts
@ -0,0 +1,24 @@
+import test from "node:test";
+import assert from "node:assert/strict";
+import { readFileSync } from "node:fs";
+import { dirname, join } from "node:path";
+import { fileURLToPath } from "node:url";
+
+const __filename = fileURLToPath(import.meta.url);
+const __dirname = dirname(__filename);
+
+test("commands-config source-level: tool key lookup skips empty api_key entries", () => {
+  const source = readFileSync(join(__dirname, "..", "commands-config.ts"), "utf-8");
+  assert.ok(
+    source.includes('getCredentialsForProvider(providerId)'),
+    "commands-config should read the full credential list",
+  );
+  assert.ok(
+    source.includes('c.type === "api_key" && c.key'),
+    "commands-config should require a non-empty api_key when resolving stored tool keys",
+  );
+  assert.ok(
+    !source.includes("auth.get(tool.id)"),
+    "commands-config should not rely on auth.get(tool.id), which can return an empty shadowing entry",
+  );
+});
--- a/src/resources/extensions/gsd/tests/complete-task-rollback-evidence.test.ts
+++ b/src/resources/extensions/gsd/tests/complete-task-rollback-evidence.test.ts
@ -0,0 +1,106 @@
+import { describe, it, afterEach } from "node:test";
+import assert from "node:assert/strict";
+import { mkdirSync, rmSync, writeFileSync } from "node:fs";
+import { join } from "node:path";
+import { tmpdir } from "node:os";
+import { randomUUID } from "node:crypto";
+
+import { handleCompleteTask } from "../tools/complete-task.js";
+import {
+  openDatabase,
+  closeDatabase,
+  _getAdapter,
+  insertMilestone,
+  insertSlice,
+} from "../gsd-db.js";
+import { clearPathCache } from "../paths.js";
+import { clearParseCache } from "../files.js";
+
+function makeTmpBase(): string {
+  const base = join(tmpdir(), `gsd-ct-rollback-${randomUUID()}`);
+  // Create the full tasks directory so the success path works
+  mkdirSync(join(base, ".gsd", "milestones", "M001", "slices", "S01", "tasks"), { recursive: true });
+  return base;
+}
+
+const VALID_PARAMS = {
+  milestoneId: "M001",
+  sliceId: "S01",
+  taskId: "T01",
+  oneLiner: "Test task",
+  narrative: "Did the thing",
+  verification: "Checked it",
+  deviations: "None.",
+  knownIssues: "None.",
+  keyFiles: ["src/foo.ts"],
+  keyDecisions: ["Used approach A"],
+  blockerDiscovered: false,
+  verificationEvidence: [
+    { command: "npm test", exitCode: 0, verdict: "✅ pass", durationMs: 1000 },
+    { command: "npm run lint", exitCode: 0, verdict: "✅ pass", durationMs: 500 },
+  ],
+};
+
+describe("complete-task rollback cleans up verification_evidence (#2724)", () => {
+  let base: string;
+
+  afterEach(() => {
+    clearPathCache();
+    clearParseCache();
+    try { closeDatabase(); } catch { /* */ }
+    if (base) {
+      try { rmSync(base, { recursive: true, force: true }); } catch { /* */ }
+    }
+  });
+
+  it("inserts verification_evidence rows on success", async () => {
+    base = makeTmpBase();
+    openDatabase(join(base, ".gsd", "gsd.db"));
+    insertMilestone({ id: "M001" });
+    insertSlice({ id: "S01", milestoneId: "M001" });
+
+    // Write a minimal slice plan so renderPlanCheckboxes doesn't error
+    writeFileSync(
+      join(base, ".gsd", "milestones", "M001", "slices", "S01", "S01-PLAN.md"),
+      "# S01 Plan\n\n## Tasks\n\n- [ ] **T01: Test task**\n",
+    );
+
+    const result = await handleCompleteTask(VALID_PARAMS, base);
+    assert.ok(!("error" in result), `unexpected error: ${"error" in result ? result.error : ""}`);
+
+    const adapter = _getAdapter()!;
+    const rows = adapter.prepare(
+      `SELECT * FROM verification_evidence WHERE task_id = 'T01' AND slice_id = 'S01' AND milestone_id = 'M001'`,
+    ).all();
+    assert.equal(rows.length, 2, "should have 2 evidence rows after success");
+  });
+
+  it("deletes verification_evidence rows on disk-render rollback", async () => {
+    base = makeTmpBase();
+    openDatabase(join(base, ".gsd", "gsd.db"));
+    insertMilestone({ id: "M001" });
+    insertSlice({ id: "S01", milestoneId: "M001" });
+
+    // Replace the tasks directory with a file so disk write fails (cross-platform)
+    const tasksDir = join(base, ".gsd", "milestones", "M001", "slices", "S01", "tasks");
+    rmSync(tasksDir, { recursive: true, force: true });
+    writeFileSync(tasksDir, "not-a-directory");
+
+    const result = await handleCompleteTask(VALID_PARAMS, base);
+    assert.ok("error" in result, "should return error when disk write fails");
+
+    // Task should be rolled back to pending
+    const adapter = _getAdapter()!;
+    const task = adapter.prepare(
+      `SELECT status FROM tasks WHERE milestone_id = 'M001' AND slice_id = 'S01' AND id = 'T01'`,
+    ).get() as { status: string } | undefined;
+    assert.ok(task, "task row should still exist");
+    assert.equal(task!.status, "pending", "task status should be rolled back to pending");
+
+    // Verification evidence should be cleaned up — no orphaned rows
+    const evidenceRows = adapter.prepare(
+      `SELECT * FROM verification_evidence WHERE task_id = 'T01' AND slice_id = 'S01' AND milestone_id = 'M001'`,
+    ).all();
+    assert.equal(evidenceRows.length, 0, "verification_evidence should be empty after rollback");
+  });
+});
--- a/src/resources/extensions/gsd/tests/derive-state-db.test.ts
+++ b/src/resources/extensions/gsd/tests/derive-state-db.test.ts
@ -14,6 +14,7 @@ import {
  getAllMilestones,
  insertSlice,
  insertTask,
+  updateTaskStatus,
 } from '../gsd-db.ts';
 // ─── Fixture Helpers ───────────────────────────────────────────────────────

@ -116,10 +117,17 @@ describe('derive-state-db', async () => {
      invalidateStateCache();
      const fileState = await deriveState(base);

-      // Now open DB, insert matching artifacts
+      // Now open DB, insert matching artifacts + milestone hierarchy
      openDatabase(':memory:');
      assert.ok(isDbAvailable(), 'db-match: DB is available after open');

+      // Insert milestone hierarchy so deriveState takes the DB path (#2631 fix)
+      insertMilestone({ id: 'M001', title: 'Test Milestone', status: 'active' });
+      insertSlice({ id: 'S01', milestoneId: 'M001', title: 'First Slice', status: 'active', risk: 'low', depends: [] });
+      insertSlice({ id: 'S02', milestoneId: 'M001', title: 'Second Slice', status: 'pending', risk: 'low', depends: ['S01'] });
+      insertTask({ id: 'T01', sliceId: 'S01', milestoneId: 'M001', title: 'First Task', status: 'pending' });
+      insertTask({ id: 'T02', sliceId: 'S01', milestoneId: 'M001', title: 'Done Task', status: 'complete' });
+
      insertArtifactRow('milestones/M001/M001-ROADMAP.md', ROADMAP_CONTENT, {
        artifact_type: 'roadmap',
        milestone_id: 'M001',
@ -197,18 +205,21 @@ describe('derive-state-db', async () => {
      writeFile(base, 'milestones/M001/slices/S01/tasks/.gitkeep', '');
      writeFile(base, 'milestones/M001/slices/S01/tasks/T01-PLAN.md', '# T01 Plan');

-      // Open DB but insert nothing — empty artifacts table
+      // Open DB but insert nothing — empty tables.
+      // With #2631 fix, deriveState will sync disk milestones into DB
+      // and then take the DB path. The result should still reflect the
+      // disk milestone correctly.
      openDatabase(':memory:');
      assert.ok(isDbAvailable(), 'empty-db: DB is available');

      invalidateStateCache();
      const state = await deriveState(base);

-      // Should still work via cachedLoadFile → loadFile disk fallback
-      assert.deepStrictEqual(state.phase, 'executing', 'empty-db: phase is executing');
+      // Milestone should be detected (synced from disk)
      assert.deepStrictEqual(state.activeMilestone?.id, 'M001', 'empty-db: activeMilestone is M001');
-      assert.deepStrictEqual(state.activeSlice?.id, 'S01', 'empty-db: activeSlice is S01');
-      assert.deepStrictEqual(state.activeTask?.id, 'T01', 'empty-db: activeTask is T01');
+      // The DB path without explicit slice/task rows may derive a different
+      // phase than the filesystem path, but the milestone must be found.
+      assert.ok(state.activeMilestone !== null, 'empty-db: activeMilestone is not null');

      closeDatabase();
    } finally {
@ -228,8 +239,12 @@ describe('derive-state-db', async () => {
      writeFile(base, 'milestones/M001/slices/S01/tasks/T01-PLAN.md', '# T01 Plan');
      writeFile(base, 'REQUIREMENTS.md', REQUIREMENTS_CONTENT);

-      // Open DB but only insert the roadmap — plan and requirements missing from DB
+      // Open DB — insert milestone hierarchy + partial artifacts (#2631 fix)
      openDatabase(':memory:');
+      insertMilestone({ id: 'M001', title: 'Test Milestone', status: 'active' });
+      insertSlice({ id: 'S01', milestoneId: 'M001', title: 'First Slice', status: 'active', risk: 'low', depends: [] });
+      insertTask({ id: 'T01', sliceId: 'S01', milestoneId: 'M001', title: 'First Task', status: 'pending' });
+      // Only insert the roadmap artifact — plan and requirements missing from DB
      insertArtifactRow('milestones/M001/M001-ROADMAP.md', ROADMAP_CONTENT, {
        artifact_type: 'roadmap',
        milestone_id: 'M001',
@ -314,6 +329,13 @@ describe('derive-state-db', async () => {

      // Put roadmap content in DB only
      openDatabase(':memory:');
+      // Insert milestone rows so deriveState takes the DB path (#2631 fix:
+      // empty milestones table now triggers disk→DB sync, which would create
+      // rows without slices — insert explicitly to get the full DB path).
+      insertMilestone({ id: 'M001', title: 'First Milestone', status: 'complete' });
+      insertMilestone({ id: 'M002', title: 'Second Milestone', status: 'active' });
+      insertSlice({ id: 'S01', milestoneId: 'M001', title: 'Done', status: 'complete', risk: 'low', depends: [] });
+      insertSlice({ id: 'S01', milestoneId: 'M002', title: 'In Progress', status: 'active', risk: 'low', depends: [] });
      insertArtifactRow('milestones/M001/M001-ROADMAP.md', completedRoadmap, {
        artifact_type: 'roadmap',
        milestone_id: 'M001',
@ -355,6 +377,10 @@ describe('derive-state-db', async () => {
      writeFile(base, 'milestones/M001/slices/S01/tasks/T01-PLAN.md', '# T01 Plan');

      openDatabase(':memory:');
+      // Insert milestone/slice/task rows so deriveState takes the DB path (#2631 fix)
+      insertMilestone({ id: 'M001', title: 'Test Milestone', status: 'active' });
+      insertSlice({ id: 'S01', milestoneId: 'M001', title: 'First Slice', status: 'active', risk: 'low', depends: [] });
+      insertTask({ id: 'T01', sliceId: 'S01', milestoneId: 'M001', title: 'First Task', status: 'pending' });
      insertArtifactRow('milestones/M001/M001-ROADMAP.md', ROADMAP_CONTENT, {
        artifact_type: 'roadmap',
        milestone_id: 'M001',
@ -378,6 +404,8 @@ describe('derive-state-db', async () => {
      });
      // Also update file on disk (cachedLoadFile may read from disk for some paths)
      writeFile(base, 'milestones/M001/slices/S01/S01-PLAN.md', updatedPlan);
+      // Update task status in DB so DB-path also sees completion (#2631 fix)
+      updateTaskStatus('M001', 'S01', 'T01', 'complete');

      // Without invalidation, should return cached result (T01 still active)
      const state2 = await deriveState(base);
--- a/src/resources/extensions/gsd/tests/detection.test.ts
+++ b/src/resources/extensions/gsd/tests/detection.test.ts
@ -99,7 +99,7 @@ test("detectProjectState: detects preferences in .gsd/", (t) => {
  t.after(() => cleanup(dir));

  mkdirSync(join(dir, ".gsd", "milestones"), { recursive: true });
-  writeFileSync(join(dir, ".gsd", "preferences.md"), "---\nversion: 1\n---\n", "utf-8");
+  writeFileSync(join(dir, ".gsd", "PREFERENCES.md"), "---\nversion: 1\n---\n", "utf-8");
  const result = detectProjectState(dir);
  assert.ok(result.v2);
  assert.equal(result.v2!.hasPreferences, true);
--- a/src/resources/extensions/gsd/tests/doctor-git.test.ts
+++ b/src/resources/extensions/gsd/tests/doctor-git.test.ts
@ -64,11 +64,11 @@ _None_
  return dir;
 }

-/** Write a .gsd/preferences.md with the given git isolation mode. */
+/** Write a .gsd/PREFERENCES.md with the given git isolation mode. */
 function writePreferencesFile(dir: string, isolation: "none" | "worktree" | "branch"): void {
  const gsdDir = join(dir, ".gsd");
  mkdirSync(gsdDir, { recursive: true });
-  writeFileSync(join(gsdDir, "preferences.md"), `---\ngit:\n  isolation: "${isolation}"\n---\n`);
+  writeFileSync(join(gsdDir, "PREFERENCES.md"), `---\ngit:\n  isolation: "${isolation}"\n---\n`);
 }

 /** Create a repo with an in-progress milestone. */
@ -302,7 +302,7 @@ describe('doctor-git', async () => {
    // ─── Test 7: none-mode skips orphaned worktree check ───────────────
    // NOTE: loadEffectiveGSDPreferences() resolves PROJECT_PREFERENCES_PATH
    // at module load time from process.cwd(). We write the prefs file to
-    // the test runner's cwd .gsd/preferences.md and clean up afterwards.
+    // the test runner's cwd .gsd/PREFERENCES.md and clean up afterwards.
    if (process.platform !== "win32") {
    test('none-mode skips orphaned worktree', async () => {
      const dir = createRepoWithCompletedMilestone();
@ -409,7 +409,7 @@ describe('doctor-git', async () => {
      cleanups.push(dir);

      run("git branch trunk", dir);
-      writeFileSync(join(dir, ".gsd", "preferences.md"), `---\ngit:\n  isolation: "worktree"\n  main_branch: "trunk"\n---\n`);
+      writeFileSync(join(dir, ".gsd", "PREFERENCES.md"), `---\ngit:\n  isolation: "worktree"\n  main_branch: "trunk"\n---\n`);

      const metaPath = join(dir, ".gsd", "milestones", "M001", "M001-META.json");
      writeFileSync(metaPath, JSON.stringify({ integrationBranch: "feat/does-not-exist" }, null, 2));
--- a/src/resources/extensions/gsd/tests/doctor-proactive.test.ts
+++ b/src/resources/extensions/gsd/tests/doctor-proactive.test.ts
@ -297,7 +297,7 @@ describe('doctor-proactive', async () => {
      cleanups.push(dir);

      run("git branch trunk", dir);
-      writeFileSync(join(dir, ".gsd", "preferences.md"), `---\ngit:\n  main_branch: "trunk"\n---\n`);
+      writeFileSync(join(dir, ".gsd", "PREFERENCES.md"), `---\ngit:\n  main_branch: "trunk"\n---\n`);
      const metaPath = join(dir, ".gsd", "milestones", "M001", "M001-META.json");
      writeFileSync(metaPath, JSON.stringify({ integrationBranch: "feature/missing" }, null, 2));

--- a/src/resources/extensions/gsd/tests/doctor-providers.test.ts
+++ b/src/resources/extensions/gsd/tests/doctor-providers.test.ts
@ -419,7 +419,7 @@ test("runProviderChecks uses provider-qualified anthropic-vertex model IDs", ()
  const repo = realpathSync(mkdtempSync(join(tmpdir(), "gsd-providers-vertex-prefix-repo-")));
  mkdirSync(join(repo, ".gsd"), { recursive: true });
  writeFileSync(
-    join(repo, ".gsd", "preferences.md"),
+    join(repo, ".gsd", "PREFERENCES.md"),
    [
      "---",
      "models:",
@ -454,7 +454,7 @@ test("runProviderChecks uses object provider field for anthropic-vertex models",
  const repo = realpathSync(mkdtempSync(join(tmpdir(), "gsd-providers-vertex-provider-repo-")));
  mkdirSync(join(repo, ".gsd"), { recursive: true });
  writeFileSync(
-    join(repo, ".gsd", "preferences.md"),
+    join(repo, ".gsd", "PREFERENCES.md"),
    [
      "---",
      "models:",
--- a/src/resources/extensions/gsd/tests/empty-db-reconciliation.test.ts
+++ b/src/resources/extensions/gsd/tests/empty-db-reconciliation.test.ts
@ -0,0 +1,79 @@
+/**
+ * Regression test for #2631: deriveState disk→DB reconciliation must
+ * run even when the milestones table starts empty.
+ *
+ * When getAllMilestones() returns [] (e.g. after a failed initial migration),
+ * the reconciliation code inside deriveStateFromDb was unreachable because
+ * deriveState only called it when dbMilestones.length > 0. The fix moves
+ * disk→DB sync into deriveState itself, before the length check.
+ */
+import { test } from "node:test";
+import assert from "node:assert/strict";
+import { mkdtempSync, mkdirSync, writeFileSync, rmSync } from "node:fs";
+import { join } from "node:path";
+import { tmpdir } from "node:os";
+
+import { deriveState, invalidateStateCache } from "../state.ts";
+import {
+  openDatabase,
+  closeDatabase,
+  getAllMilestones,
+} from "../gsd-db.ts";
+
+test("deriveState populates empty DB from disk milestones (#2631)", async () => {
+  const base = mkdtempSync(join(tmpdir(), "gsd-empty-db-"));
+  mkdirSync(join(base, ".gsd", "milestones", "M001"), { recursive: true });
+
+  try {
+    // Create a milestone on disk with a CONTEXT file (not a ghost)
+    writeFileSync(
+      join(base, ".gsd", "milestones", "M001", "M001-CONTEXT.md"),
+      "# M001: Test Milestone\n\nSome context about this milestone.",
+    );
+
+    // Open DB — milestones table is empty (simulating failed migration)
+    openDatabase(":memory:");
+    const before = getAllMilestones();
+    assert.equal(before.length, 0, "DB should start with 0 milestones");
+
+    // deriveState should reconcile disk → DB
+    invalidateStateCache();
+    const state = await deriveState(base);
+
+    // After deriveState, the DB should now have the disk milestone
+    const after = getAllMilestones();
+    assert.ok(after.length > 0, "DB should have milestones after reconciliation");
+    assert.equal(after[0]!.id, "M001", "reconciled milestone should be M001");
+
+    // State should reflect the milestone (not "No milestones found")
+    assert.ok(
+      state.activeMilestone !== null,
+      "activeMilestone should not be null after reconciliation",
+    );
+
+    closeDatabase();
+  } finally {
+    closeDatabase();
+    rmSync(base, { recursive: true, force: true });
+  }
+});
+
+test("deriveState does NOT insert ghost milestones into DB (#2631 guard)", async () => {
+  const base = mkdtempSync(join(tmpdir(), "gsd-empty-db-"));
+  // Create a ghost milestone directory (empty — no CONTEXT, no ROADMAP)
+  mkdirSync(join(base, ".gsd", "milestones", "M001"), { recursive: true });
+
+  try {
+    openDatabase(":memory:");
+    invalidateStateCache();
+    await deriveState(base);
+
+    const milestones = getAllMilestones();
+    assert.equal(milestones.length, 0, "ghost milestone should NOT be inserted");
+
+    closeDatabase();
+  } finally {
+    closeDatabase();
+    rmSync(base, { recursive: true, force: true });
+  }
+});
--- a/src/resources/extensions/gsd/tests/git-service.test.ts
+++ b/src/resources/extensions/gsd/tests/git-service.test.ts
@ -1142,7 +1142,7 @@ describe('git-service', async () => {
    mkdirSync(join(repo, ".gsd", "runtime"), { recursive: true });
    mkdirSync(join(repo, ".gsd", "activity"), { recursive: true });
    writeFileSync(join(repo, ".gsd", "milestones", "M001", "ROADMAP.md"), "# Roadmap");
-    writeFileSync(join(repo, ".gsd", "preferences.md"), "---\nversion: 1\n---");
+    writeFileSync(join(repo, ".gsd", "PREFERENCES.md"), "---\nversion: 1\n---");
    writeFileSync(join(repo, ".gsd", "STATE.md"), "# State");
    writeFileSync(join(repo, ".gsd", "runtime", "units.json"), "{}");
    writeFileSync(join(repo, ".gsd", "activity", "log.jsonl"), "{}");
--- a/src/resources/extensions/gsd/tests/idle-watchdog-stall-override.test.ts
+++ b/src/resources/extensions/gsd/tests/idle-watchdog-stall-override.test.ts
@ -0,0 +1,125 @@
+/**
+ * Regression tests for #2527: idle watchdog stalled-tool detection.
+ *
+ * Bug 1: When a tool is stalled longer than idle_timeout, the watchdog
+ * notifies but falls through to detectWorkingTreeActivity(), which
+ * resets lastProgressAt if files were modified earlier. Recovery is
+ * never called — the session burns tokens indefinitely.
+ *
+ * Bug 2: After async recoverTimedOutUnit(), pauseAuto/stopAuto may set
+ * s.currentUnit = null, but the next line accesses .startedAt — crash.
+ *
+ * These tests verify the auto-timers.ts source contains the structural
+ * fixes: the stalledToolDetected flag, clearInFlightTools() call, the
+ * filesystem-check guard, and the null guard after recovery.
+ */
+
+import { readFileSync } from "node:fs";
+import { join } from "node:path";
+import { test, describe } from "node:test";
+import assert from "node:assert/strict";
+
+const TIMERS_SRC = readFileSync(
+  join(import.meta.dirname, "..", "auto-timers.ts"),
+  "utf-8",
+);
+
+// ═══ Bug 1: stalledToolDetected flag prevents filesystem-activity override ═══
+
+describe("#2527 Bug 1: stalled tool should not be overridden by filesystem activity", () => {
+  test("auto-timers.ts imports clearInFlightTools", () => {
+    assert.ok(
+      TIMERS_SRC.includes("clearInFlightTools"),
+      "clearInFlightTools must be imported from auto-tool-tracking",
+    );
+  });
+
+  test("auto-timers.ts declares stalledToolDetected flag", () => {
+    assert.ok(
+      TIMERS_SRC.includes("stalledToolDetected"),
+      "stalledToolDetected flag must exist in idle watchdog",
+    );
+  });
+
+  test("stalled tool sets flag to true", () => {
+    // The flag must be set before the filesystem check
+    const flagSet = TIMERS_SRC.indexOf("stalledToolDetected = true");
+    assert.ok(flagSet > -1, "stalledToolDetected must be set to true when tool is stalled");
+
+    const notify = TIMERS_SRC.indexOf("Stalled tool detected:");
+    assert.ok(flagSet < notify, "flag must be set before the stall notification");
+  });
+
+  test("stalled tool calls clearInFlightTools", () => {
+    // clearInFlightTools() must be called when tool is stalled, so subsequent
+    // watchdog ticks don't re-detect the same stale entries
+    const clearCall = TIMERS_SRC.indexOf("clearInFlightTools()");
+    assert.ok(clearCall > -1, "clearInFlightTools() must be called when tool is stalled");
+
+    const flagSet = TIMERS_SRC.indexOf("stalledToolDetected = true");
+    assert.ok(
+      Math.abs(clearCall - flagSet) < 200,
+      "clearInFlightTools() should be near stalledToolDetected = true",
+    );
+  });
+
+  test("filesystem-activity check is guarded by stalledToolDetected", () => {
+    // The detectWorkingTreeActivity check must be skipped when stalledToolDetected is true
+    assert.ok(
+      TIMERS_SRC.includes("!stalledToolDetected && detectWorkingTreeActivity"),
+      "detectWorkingTreeActivity must be guarded by !stalledToolDetected",
+    );
+  });
+
+  test("control flow: stalled tool → skip filesystem check → reach recovery", () => {
+    // Verify the structural ordering: flag declaration → stall block → guarded fs check → recovery
+    const flagDecl = TIMERS_SRC.indexOf("let stalledToolDetected = false");
+    const stallBlock = TIMERS_SRC.indexOf("stalledToolDetected = true");
+    const fsGuard = TIMERS_SRC.indexOf("!stalledToolDetected && detectWorkingTreeActivity");
+    const recovery = TIMERS_SRC.indexOf("recoverTimedOutUnit(ctx, pi, unitType, unitId, \"idle\"");
+
+    assert.ok(flagDecl > -1, "flag declaration must exist");
+    assert.ok(flagDecl < stallBlock, "flag declared before stall block");
+    assert.ok(stallBlock < fsGuard, "stall block before filesystem guard");
+    assert.ok(fsGuard < recovery, "filesystem guard before recovery call");
+  });
+});
+
+// ═══ Bug 2: null guard after async recoverTimedOutUnit ═══════════════════════
+
+describe("#2527 Bug 2: null guard after async recovery prevents crash", () => {
+  test("idle watchdog has null guard after recoverTimedOutUnit", () => {
+    // Find the idle recovery call
+    const idleRecovery = TIMERS_SRC.indexOf(
+      'recoverTimedOutUnit(ctx, pi, unitType, unitId, "idle"',
+    );
+    assert.ok(idleRecovery > -1, "idle recovery call must exist");
+
+    // The null guard must appear between the recovery call and the next
+    // writeUnitRuntimeRecord that accesses s.currentUnit.startedAt
+    const afterRecovery = TIMERS_SRC.slice(idleRecovery, idleRecovery + 400);
+    assert.ok(
+      afterRecovery.includes("if (!s.currentUnit) return"),
+      "null guard for s.currentUnit must exist after idle recoverTimedOutUnit",
+    );
+  });
+
+  test("null guard is between recovery and writeUnitRuntimeRecord", () => {
+    const idleRecovery = TIMERS_SRC.indexOf(
+      'recoverTimedOutUnit(ctx, pi, unitType, unitId, "idle"',
+    );
+    const afterRecovery = TIMERS_SRC.slice(idleRecovery);
+
+    const recoveredReturn = afterRecovery.indexOf('if (recovery === "recovered") return');
+    const nullGuard = afterRecovery.indexOf("if (!s.currentUnit) return");
+    const writeRecord = afterRecovery.indexOf("writeUnitRuntimeRecord(s.basePath");
+
+    assert.ok(recoveredReturn > -1, "recovered return must exist");
+    assert.ok(nullGuard > -1, "null guard must exist");
+    assert.ok(writeRecord > -1, "writeUnitRuntimeRecord must exist after recovery");
+    assert.ok(
+      recoveredReturn < nullGuard && nullGuard < writeRecord,
+      "order must be: recovered-return → null-guard → writeUnitRuntimeRecord",
+    );
+  });
+});
--- a/src/resources/extensions/gsd/tests/init-wizard.test.ts
+++ b/src/resources/extensions/gsd/tests/init-wizard.test.ts
@ -123,7 +123,7 @@ test("init-wizard: v2 .gsd/ preferences detected", (t) => {
  const dir = makeTempDir("prefs-detect");
  try {
    mkdirSync(join(dir, ".gsd", "milestones"), { recursive: true });
-    writeFileSync(join(dir, ".gsd", "preferences.md"), "---\nversion: 1\nmode: solo\n---\n", "utf-8");
+    writeFileSync(join(dir, ".gsd", "PREFERENCES.md"), "---\nversion: 1\nmode: solo\n---\n", "utf-8");

    const detection = detectProjectState(dir);
    assert.ok(detection.v2);
--- a/src/resources/extensions/gsd/tests/key-manager.test.ts
+++ b/src/resources/extensions/gsd/tests/key-manager.test.ts
@ -189,7 +189,22 @@ test("getAllKeyStatuses detects empty keys as not configured", () => {
  const statuses = getAllKeyStatuses(auth);
  const groq = statuses.find((s) => s.provider.id === "groq");
  assert.equal(groq?.configured, false);
-  assert.ok(groq?.description.includes("empty"));
+  // Empty-key entries are filtered out, so provider appears unconfigured
+  assert.equal(groq?.source, "none");
+});
+
+test("getAllKeyStatuses finds valid keys even when empty-key entry exists at index 0", () => {
+  const auth = makeAuth({
+    groq: [
+      { type: "api_key", key: "" },
+      { type: "api_key", key: "gsk-real-key" },
+    ],
+  });
+  const statuses = getAllKeyStatuses(auth);
+  const groq = statuses.find((s) => s.provider.id === "groq");
+  assert.equal(groq?.configured, true);
+  assert.equal(groq?.source, "auth.json");
+  assert.equal(groq?.credentialCount, 1); // only the valid key counts
 });

 test("getAllKeyStatuses detects env var keys", () => {
--- a/src/resources/extensions/gsd/tests/none-mode-gates.test.ts
+++ b/src/resources/extensions/gsd/tests/none-mode-gates.test.ts
@ -8,7 +8,7 @@
 * Uses the writeRunnerPreferences pattern from doctor-git.test.ts:
 * PROJECT_PREFERENCES_PATH is a module-level constant frozen at import
 * time, so process.chdir() won't redirect preference loading. We write
- * prefs to the runner's cwd .gsd/preferences.md and clean up in finally.
+ * prefs to the runner's cwd .gsd/PREFERENCES.md and clean up in finally.
 */

 import { mkdirSync, writeFileSync, rmSync, existsSync } from "node:fs";
@ -24,7 +24,7 @@ import assert from 'node:assert/strict';

 // --- Preferences helpers (same pattern as doctor-git.test.ts K001) ---

-const RUNNER_PREFS_PATH = join(process.cwd(), ".gsd", "preferences.md");
+const RUNNER_PREFS_PATH = join(process.cwd(), ".gsd", "PREFERENCES.md");

 function writeRunnerPreferences(isolation: "none" | "worktree" | "branch"): void {
  mkdirSync(join(process.cwd(), ".gsd"), { recursive: true });
@ -72,12 +72,12 @@ try {

 // Test 4: shouldUseWorktreeIsolation returns false for no prefs (default: none)
 // Worktree isolation requires explicit opt-in — default is "none" so GSD
-// works out of the box without preferences.md (#2480).
+// works out of the box without PREFERENCES.md (#2480).
 // Skip if global prefs exist — they override the default and this test
-// cannot control ~/.gsd/preferences.md.
+// cannot control ~/.gsd/PREFERENCES.md.

 test('shouldUseWorktreeIsolation returns false for no prefs (default: none)', () => {
-  const globalPrefsExist = existsSync(join(homedir(), ".gsd", "preferences.md"))
+  const globalPrefsExist = existsSync(join(homedir(), ".gsd", "PREFERENCES.md"))
    || existsSync(join(homedir(), ".gsd", "PREFERENCES.md"));
  if (!globalPrefsExist) {
    try {
@ -91,9 +91,9 @@ test('shouldUseWorktreeIsolation returns false for no prefs (default: none)', ()
  }
 });

-// Test 5: getIsolationMode returns "none" when no preferences.md exists (#2480)
+// Test 5: getIsolationMode returns "none" when no PREFERENCES.md exists (#2480)
 test('getIsolationMode returns "none" with no prefs (default)', () => {
-  const globalPrefsExist = existsSync(join(homedir(), ".gsd", "preferences.md"))
+  const globalPrefsExist = existsSync(join(homedir(), ".gsd", "PREFERENCES.md"))
    || existsSync(join(homedir(), ".gsd", "PREFERENCES.md"));
  if (!globalPrefsExist) {
    try {
--- a/src/resources/extensions/gsd/tests/park-db-sync.test.ts
+++ b/src/resources/extensions/gsd/tests/park-db-sync.test.ts
@ -0,0 +1,85 @@
+/**
+ * Regression test for #2694: parkMilestone and unparkMilestone must
+ * update the DB milestone status alongside the filesystem marker.
+ *
+ * Without this, deriveStateFromDb skips unparked milestones because
+ * the DB still has status='parked', causing "All milestones complete".
+ */
+import { test } from "node:test";
+import assert from "node:assert/strict";
+import { mkdtempSync, mkdirSync, writeFileSync, rmSync } from "node:fs";
+import { join } from "node:path";
+import { tmpdir } from "node:os";
+
+import { parkMilestone, unparkMilestone } from "../milestone-actions.ts";
+import {
+  openDatabase,
+  closeDatabase,
+  insertMilestone,
+  getMilestone,
+} from "../gsd-db.ts";
+
+function createBase(): string {
+  const base = mkdtempSync(join(tmpdir(), "gsd-park-db-"));
+  mkdirSync(join(base, ".gsd", "milestones", "M001"), { recursive: true });
+  writeFileSync(
+    join(base, ".gsd", "milestones", "M001", "M001-CONTEXT.md"),
+    "# M001\n\nContext.",
+  );
+  return base;
+}
+
+test("parkMilestone updates DB status to 'parked' (#2694)", () => {
+  const base = createBase();
+  try {
+    openDatabase(":memory:");
+    insertMilestone({ id: "M001", title: "Test", status: "active" });
+
+    assert.equal(getMilestone("M001")!.status, "active", "starts active");
+
+    parkMilestone(base, "M001", "deprioritized");
+
+    assert.equal(getMilestone("M001")!.status, "parked", "DB status should be parked");
+
+    closeDatabase();
+  } finally {
+    closeDatabase();
+    rmSync(base, { recursive: true, force: true });
+  }
+});
+
+test("unparkMilestone updates DB status to 'active' (#2694)", () => {
+  const base = createBase();
+  try {
+    openDatabase(":memory:");
+    insertMilestone({ id: "M001", title: "Test", status: "active" });
+
+    // Park first
+    parkMilestone(base, "M001", "deprioritized");
+    assert.equal(getMilestone("M001")!.status, "parked");
+
+    // Unpark
+    unparkMilestone(base, "M001");
+    assert.equal(getMilestone("M001")!.status, "active", "DB status should be active after unpark");
+
+    closeDatabase();
+  } finally {
+    closeDatabase();
+    rmSync(base, { recursive: true, force: true });
+  }
+});
+
+test("park/unpark are safe when DB is not available (#2694 guard)", () => {
+  const base = createBase();
+  try {
+    // No openDatabase — DB not available
+    // park/unpark should still work (filesystem-only, no throw)
+    const parked = parkMilestone(base, "M001", "test");
+    assert.ok(parked, "parkMilestone succeeds without DB");
+
+    const unparked = unparkMilestone(base, "M001");
+    assert.ok(unparked, "unparkMilestone succeeds without DB");
+  } finally {
+    rmSync(base, { recursive: true, force: true });
+  }
+});
--- a/src/resources/extensions/gsd/tests/preferences.test.ts
+++ b/src/resources/extensions/gsd/tests/preferences.test.ts
@ -45,7 +45,7 @@ test("getIsolationMode defaults to none when preferences have no isolation setti
  // Validate the default via validatePreferences: when no isolation is set,
  // preferences.git.isolation is undefined, and getIsolationMode returns "none".
  // Default changed from "worktree" to "none" so GSD works out of the box
-  // without preferences.md (#2480).
+  // without PREFERENCES.md (#2480).
  const { preferences } = validatePreferences({});
  assert.equal(preferences.git?.isolation, undefined, "no isolation in empty prefs");
  const isolation = preferences.git?.isolation;
--- a/src/resources/extensions/gsd/tests/provider-errors.test.ts
+++ b/src/resources/extensions/gsd/tests/provider-errors.test.ts
@ -42,6 +42,15 @@ test("classifyProviderError defaults to 60s for rate limit without reset", () =>
  assert.equal(result.suggestedDelayMs, 60_000);
 });

+test("classifyProviderError treats stream_exhausted_without_result as transient connection failure", () => {
+  const result = classifyProviderError("stream_exhausted_without_result");
+  assert.deepStrictEqual(result, {
+    isTransient: true,
+    isRateLimit: false,
+    suggestedDelayMs: 15_000,
+  });
+});
+
 test("classifyProviderError detects Anthropic internal server error", () => {
  const msg = '{"type":"error","error":{"details":null,"type":"api_error","message":"Internal server error"}}';
  const result = classifyProviderError(msg);
--- a/src/resources/extensions/gsd/tests/remediation-completion-guard.test.ts
+++ b/src/resources/extensions/gsd/tests/remediation-completion-guard.test.ts
@ -0,0 +1,110 @@
+/**
+ * Regression test for #2675: completing-milestone dispatch rule must
+ * block completion when VALIDATION verdict is "needs-remediation".
+ *
+ * Without this guard, needs-remediation + allSlicesDone causes a loop:
+ * complete-milestone dispatched → agent refuses (correct) → no SUMMARY
+ * → re-dispatch → repeat until stuck detection fires.
+ */
+import { test } from "node:test";
+import assert from "node:assert/strict";
+import { mkdtempSync, mkdirSync, writeFileSync, rmSync } from "node:fs";
+import { join } from "node:path";
+import { tmpdir } from "node:os";
+
+import { DISPATCH_RULES } from "../auto-dispatch.ts";
+
+/** Find the completing-milestone dispatch rule */
+const completingRule = DISPATCH_RULES.find(r => r.name === "completing-milestone → complete-milestone");
+
+test("completing-milestone dispatch rule exists", () => {
+  assert.ok(completingRule, "rule should exist in DISPATCH_RULES");
+});
+
+test("completing-milestone blocks when VALIDATION verdict is needs-remediation (#2675)", async () => {
+  const base = mkdtempSync(join(tmpdir(), "gsd-remediation-"));
+  mkdirSync(join(base, ".gsd", "milestones", "M001"), { recursive: true });
+
+  try {
+    // Write a VALIDATION file with needs-remediation verdict
+    writeFileSync(
+      join(base, ".gsd", "milestones", "M001", "M001-VALIDATION.md"),
+      [
+        "---",
+        "verdict: needs-remediation",
+        "remediation_round: 0",
+        "---",
+        "",
+        "# Validation Report",
+        "",
+        "3 success criteria failed. Remediation required.",
+      ].join("\n"),
+    );
+
+    const ctx = {
+      mid: "M001",
+      midTitle: "Test Milestone",
+      basePath: base,
+      state: { phase: "completing-milestone" } as any,
+      prefs: {} as any,
+      session: undefined,
+    };
+
+    const result = await completingRule!.match(ctx);
+
+    assert.ok(result !== null, "rule should match");
+    assert.equal(result!.action, "stop", "should return stop action");
+    if (result!.action === "stop") {
+      assert.equal(result!.level, "warning", "should be warning level (pausable)");
+      assert.ok(
+        result!.reason.includes("needs-remediation"),
+        "reason should mention needs-remediation",
+      );
+    }
+  } finally {
+    rmSync(base, { recursive: true, force: true });
+  }
+});
+
+test("completing-milestone proceeds normally when VALIDATION verdict is pass (#2675 guard)", async () => {
+  const base = mkdtempSync(join(tmpdir(), "gsd-remediation-"));
+  mkdirSync(join(base, ".gsd", "milestones", "M001"), { recursive: true });
+
+  try {
+    // Write a VALIDATION file with pass verdict
+    writeFileSync(
+      join(base, ".gsd", "milestones", "M001", "M001-VALIDATION.md"),
+      [
+        "---",
+        "verdict: pass",
+        "---",
+        "",
+        "# Validation Report",
+        "",
+        "All criteria met.",
+      ].join("\n"),
+    );
+
+    const ctx = {
+      mid: "M001",
+      midTitle: "Test Milestone",
+      basePath: base,
+      state: { phase: "completing-milestone" } as any,
+      prefs: {} as any,
+      session: undefined,
+    };
+
+    const result = await completingRule!.match(ctx);
+
+    // Should NOT return a stop — should either dispatch or return stop for
+    // a different reason (e.g. missing SUMMARY files, no implementation)
+    if (result && result.action === "stop") {
+      assert.ok(
+        !result.reason.includes("needs-remediation"),
+        "pass verdict should NOT trigger the remediation guard",
+      );
+    }
+  } finally {
+    rmSync(base, { recursive: true, force: true });
+  }
+});
--- a/src/resources/extensions/gsd/tests/remote-questions.test.ts
+++ b/src/resources/extensions/gsd/tests/remote-questions.test.ts
@ -724,3 +724,32 @@ test("resolveRemoteConfig returns null when preferences are absent (no env side-
    if (savedTelegram !== undefined) process.env.TELEGRAM_BOT_TOKEN = savedTelegram;
  }
 });
+
+test("config source-level: hydration skips api_key entries with empty keys", () => {
+  const configSrc = readFileSync(
+    join(__dirname, "..", "..", "remote-questions", "config.ts"),
+    "utf-8",
+  );
+  // The find() call in hydrateRemoteTokensFromAuth must filter for non-empty keys,
+  // not just match on type === "api_key". This prevents stale empty-key entries
+  // (left by removeProviderToken) from shadowing valid tokens.
+  assert.ok(
+    configSrc.includes('c.type === "api_key" && !!c.key'),
+    "hydrateRemoteTokensFromAuth find() should require a non-empty key",
+  );
+});
+
+test("config source-level: removeProviderToken uses auth.remove not auth.set with empty key", () => {
+  const commandSrc = readFileSync(
+    join(__dirname, "..", "..", "remote-questions", "remote-command.ts"),
+    "utf-8",
+  );
+  // removeProviderToken should call auth.remove(provider), not auth.set(provider, { key: "" }).
+  // Setting an empty key pollutes the credentials array and shadows valid tokens.
+  const fnStart = commandSrc.indexOf("function removeProviderToken");
+  assert.ok(fnStart !== -1, "removeProviderToken should exist");
+  const fnEnd = commandSrc.indexOf("\n}", fnStart);
+  const fnBody = commandSrc.slice(fnStart, fnEnd);
+  assert.ok(fnBody.includes("auth.remove("), "removeProviderToken should call auth.remove()");
+  assert.ok(!fnBody.includes('key: ""'), "removeProviderToken should not set an empty key");
+});
--- a/src/resources/extensions/gsd/tests/token-cost-display.test.ts
+++ b/src/resources/extensions/gsd/tests/token-cost-display.test.ts
@ -63,13 +63,13 @@ test("show_token_cost defaults to undefined (disabled) when not set", () => {
  assert.equal(preferences.show_token_cost, undefined);
 });

-test("empty preferences.md does not enable show_token_cost", () => {
+test("empty PREFERENCES.md does not enable show_token_cost", () => {
  const prefs = parsePreferencesMarkdown("---\nversion: 1\n---\n");
  assert.ok(prefs);
  assert.equal(prefs.show_token_cost, undefined);
 });

-test("preferences.md with show_token_cost: true enables the preference", () => {
+test("PREFERENCES.md with show_token_cost: true enables the preference", () => {
  const prefs = parsePreferencesMarkdown("---\nshow_token_cost: true\n---\n");
  assert.ok(prefs);
  assert.equal(prefs.show_token_cost, true);
--- a/src/resources/extensions/gsd/tests/vacuous-truth-slices.test.ts
+++ b/src/resources/extensions/gsd/tests/vacuous-truth-slices.test.ts
@ -0,0 +1,115 @@
+/**
+ * Regression test for #2667: deriveStateFromDb must NOT treat an empty
+ * slice array as "all slices done" due to JavaScript's vacuous-truth
+ * behavior of Array.prototype.every on an empty array.
+ *
+ * [].every(predicate) === true in JavaScript. Without a length > 0 guard,
+ * this causes a premature phase transition to validating-milestone when
+ * the DB returns 0 slices (e.g. after a worktree DB wipe).
+ */
+import { test } from "node:test";
+import assert from "node:assert/strict";
+import { mkdtempSync, mkdirSync, rmSync, writeFileSync } from "node:fs";
+import { join } from "node:path";
+import { tmpdir } from "node:os";
+
+import { deriveStateFromDb, invalidateStateCache } from "../state.ts";
+import {
+  openDatabase,
+  closeDatabase,
+  insertMilestone,
+  insertSlice,
+} from "../gsd-db.ts";
+
+test("deriveStateFromDb does NOT skip to validating when slice array is empty (#2667)", async () => {
+  const base = mkdtempSync(join(tmpdir(), "gsd-vacuous-truth-"));
+  mkdirSync(join(base, ".gsd", "milestones", "M001"), { recursive: true });
+
+  try {
+    // Set up a milestone with a roadmap that references slices,
+    // but the DB has NO slice rows (simulating a worktree DB wipe)
+    writeFileSync(
+      join(base, ".gsd", "milestones", "M001", "M001-ROADMAP.md"),
+      [
+        "# M001: Test Milestone",
+        "",
+        "## Slices",
+        "",
+        "### S01 — First Slice",
+        "Do something.",
+        "",
+        "### S02 — Second Slice",
+        "Do another thing.",
+      ].join("\n"),
+    );
+
+    openDatabase(":memory:");
+    // Milestone exists but NO slices inserted — simulates DB wipe
+    insertMilestone({ id: "M001", title: "Test Milestone", status: "active" });
+
+    invalidateStateCache();
+    const state = await deriveStateFromDb(base);
+
+    // The phase must NOT be "validating-milestone" or "completing-milestone"
+    // because no slices have been executed — the empty array should not
+    // trigger the "all slices done" code path.
+    assert.notEqual(
+      state.phase,
+      "validating-milestone",
+      "empty slice array must not trigger validating-milestone (vacuous truth)",
+    );
+    assert.notEqual(
+      state.phase,
+      "completing-milestone",
+      "empty slice array must not trigger completing-milestone (vacuous truth)",
+    );
+
+    closeDatabase();
+  } finally {
+    closeDatabase();
+    rmSync(base, { recursive: true, force: true });
+  }
+});
+
+test("deriveStateFromDb correctly reaches validating when all slices are done (#2667 guard)", async () => {
+  const base = mkdtempSync(join(tmpdir(), "gsd-vacuous-truth-"));
+  mkdirSync(join(base, ".gsd", "milestones", "M001", "slices", "S01"), { recursive: true });
+
+  try {
+    writeFileSync(
+      join(base, ".gsd", "milestones", "M001", "M001-ROADMAP.md"),
+      [
+        "# M001: Test Milestone",
+        "",
+        "## Slices",
+        "",
+        "### S01 — First Slice",
+        "Do something.",
+      ].join("\n"),
+    );
+
+    // Write a slice summary so the filesystem recognizes it as complete
+    writeFileSync(
+      join(base, ".gsd", "milestones", "M001", "slices", "S01", "S01-SUMMARY.md"),
+      "# S01 Summary\n\nDone.",
+    );
+
+    openDatabase(":memory:");
+    insertMilestone({ id: "M001", title: "Test Milestone", status: "active" });
+    insertSlice({ id: "S01", milestoneId: "M001", title: "First Slice", status: "complete", risk: "low", depends: [] });
+
+    invalidateStateCache();
+    const state = await deriveStateFromDb(base);
+
+    // With one slice that IS complete, phase should advance
+    assert.ok(
+      state.phase === "validating-milestone" || state.phase === "completing-milestone",
+      `expected validating or completing phase, got "${state.phase}"`,
+    );
+
+    closeDatabase();
+  } finally {
+    closeDatabase();
+    rmSync(base, { recursive: true, force: true });
+  }
+});
--- a/src/resources/extensions/gsd/tests/validate-milestone-write-order.test.ts
+++ b/src/resources/extensions/gsd/tests/validate-milestone-write-order.test.ts
@ -0,0 +1,90 @@
+import { describe, it, afterEach } from "node:test";
+import assert from "node:assert/strict";
+import { mkdirSync, existsSync, rmSync, writeFileSync } from "node:fs";
+import { join } from "node:path";
+import { tmpdir } from "node:os";
+import { randomUUID } from "node:crypto";
+
+import { handleValidateMilestone } from "../tools/validate-milestone.js";
+import { openDatabase, closeDatabase, _getAdapter, insertMilestone } from "../gsd-db.js";
+import { clearPathCache } from "../paths.js";
+import { clearParseCache } from "../files.js";
+
+function makeTmpBase(): string {
+  const base = join(tmpdir(), `gsd-val-handler-${randomUUID()}`);
+  mkdirSync(join(base, ".gsd", "milestones", "M001"), { recursive: true });
+  return base;
+}
+
+const VALID_PARAMS = {
+  milestoneId: "M001",
+  verdict: "pass" as const,
+  remediationRound: 0,
+  successCriteriaChecklist: "- [x] All pass",
+  sliceDeliveryAudit: "| S01 | delivered |",
+  crossSliceIntegration: "No issues",
+  requirementCoverage: "All covered",
+  verdictRationale: "Everything checks out",
+};
+
+describe("handleValidateMilestone write ordering (#2725)", () => {
+  let base: string;
+
+  afterEach(() => {
+    clearPathCache();
+    clearParseCache();
+    try { closeDatabase(); } catch { /* */ }
+    if (base) {
+      try { rmSync(base, { recursive: true, force: true }); } catch { /* */ }
+    }
+  });
+
+  it("writes DB row and disk file on success", async () => {
+    base = makeTmpBase();
+    const dbPath = join(base, ".gsd", "gsd.db");
+    openDatabase(dbPath);
+    insertMilestone({ id: "M001" });
+
+    const result = await handleValidateMilestone(VALID_PARAMS, base);
+    assert.ok(!("error" in result), `unexpected error: ${"error" in result ? result.error : ""}`);
+
+    // DB row exists
+    const adapter = _getAdapter()!;
+    const row = adapter.prepare(
+      `SELECT status, scope FROM assessments WHERE milestone_id = 'M001' AND scope = 'milestone-validation'`,
+    ).get() as { status: string; scope: string } | undefined;
+    assert.ok(row, "assessment row should exist in DB");
+    assert.equal(row!.status, "pass");
+
+    // Disk file exists
+    const filePath = join(base, ".gsd", "milestones", "M001", "M001-VALIDATION.md");
+    assert.ok(existsSync(filePath), "VALIDATION.md should exist on disk");
+  });
+
+  it("rolls back DB row when disk write fails", async () => {
+    base = makeTmpBase();
+    const dbPath = join(base, ".gsd", "gsd.db");
+    openDatabase(dbPath);
+    insertMilestone({ id: "M001" });
+
+    // Force disk write failure by replacing the milestone directory with a
+    // regular file. saveFile() will fail because it cannot write inside a
+    // non-directory. This works cross-platform (chmod is ignored on Windows).
+    const milestoneDir = join(base, ".gsd", "milestones", "M001");
+    rmSync(milestoneDir, { recursive: true, force: true });
+    writeFileSync(milestoneDir, "not-a-directory");
+
+    const result = await handleValidateMilestone(VALID_PARAMS, base);
+
+    // Should return error
+    assert.ok("error" in result, "should return error when disk write fails");
+    assert.ok(result.error.includes("disk render failed"));
+
+    // DB row should have been rolled back (deleted)
+    const adapter = _getAdapter()!;
+    const row = adapter.prepare(
+      `SELECT * FROM assessments WHERE milestone_id = 'M001' AND scope = 'milestone-validation'`,
+    ).get();
+    assert.equal(row, undefined, "assessment row should be deleted after disk-write rollback");
+  });
+});
--- a/src/resources/extensions/gsd/tests/workflow-logger.test.ts
+++ b/src/resources/extensions/gsd/tests/workflow-logger.test.ts
@ -1,8 +1,11 @@
 // GSD Extension — Workflow Logger Tests
 // Tests for the centralized warning/error accumulator.

-import { describe, test, beforeEach } from "node:test";
+import { describe, test, beforeEach, afterEach } from "node:test";
 import assert from "node:assert/strict";
+import { existsSync, readFileSync } from "node:fs";
+import { join } from "node:path";
+import { makeTempDir, cleanup } from "./test-utils.ts";
 import {
  logWarning,
  logError,
@ -14,6 +17,7 @@ import {
  hasAnyIssues,
  summarizeLogs,
  formatForNotification,
+  setLogBasePath,
  _resetLogs,
 } from "../workflow-logger.ts";

@ -222,6 +226,44 @@ describe("workflow-logger", () => {
    });
  });

+  describe("audit log persistence", () => {
+    let dir: string;
+
+    beforeEach(() => {
+      dir = makeTempDir("wl-audit-");
+    });
+
+    afterEach(() => {
+      setLogBasePath("");
+      cleanup(dir);
+    });
+
+    test("writes entry to .gsd/audit-log.jsonl after setLogBasePath", () => {
+      setLogBasePath(dir);
+      logWarning("engine", "audit test entry");
+
+      const auditPath = join(dir, ".gsd", "audit-log.jsonl");
+      assert.ok(existsSync(auditPath), "audit-log.jsonl should exist");
+      const content = readFileSync(auditPath, "utf-8");
+      const entry = JSON.parse(content.trim());
+      assert.equal(entry.severity, "warn");
+      assert.equal(entry.component, "engine");
+      assert.equal(entry.message, "audit test entry");
+    });
+
+    test("_resetLogs does not clear the audit base path", () => {
+      setLogBasePath(dir);
+      _resetLogs();
+      logWarning("engine", "post-reset entry");
+
+      const auditPath = join(dir, ".gsd", "audit-log.jsonl");
+      assert.ok(existsSync(auditPath), "audit-log.jsonl should exist after _resetLogs");
+      const content = readFileSync(auditPath, "utf-8");
+      const entry = JSON.parse(content.trim());
+      assert.equal(entry.message, "post-reset entry");
+    });
+  });
+
  describe("buffer limit", () => {
    test("caps at MAX_BUFFER entries, dropping oldest", () => {
      const OVER = 110;
@ -237,6 +279,44 @@ describe("workflow-logger", () => {
    });
  });

+  describe("audit log persistence", () => {
+    let dir: string;
+
+    beforeEach(() => {
+      dir = makeTempDir("wl-audit-");
+    });
+
+    afterEach(() => {
+      setLogBasePath("");
+      cleanup(dir);
+    });
+
+    test("writes entry to .gsd/audit-log.jsonl after setLogBasePath", () => {
+      setLogBasePath(dir);
+      logWarning("engine", "audit test entry");
+
+      const auditPath = join(dir, ".gsd", "audit-log.jsonl");
+      assert.ok(existsSync(auditPath), "audit-log.jsonl should exist");
+      const content = readFileSync(auditPath, "utf-8");
+      const entry = JSON.parse(content.trim());
+      assert.equal(entry.severity, "warn");
+      assert.equal(entry.component, "engine");
+      assert.equal(entry.message, "audit test entry");
+    });
+
+    test("_resetLogs does not clear the audit base path", () => {
+      setLogBasePath(dir);
+      _resetLogs();
+      logWarning("engine", "post-reset entry");
+
+      const auditPath = join(dir, ".gsd", "audit-log.jsonl");
+      assert.ok(existsSync(auditPath), "audit-log.jsonl should exist after _resetLogs");
+      const content = readFileSync(auditPath, "utf-8");
+      const entry = JSON.parse(content.trim());
+      assert.equal(entry.message, "post-reset entry");
+    });
+  });
+
  describe("stderr output", () => {
    test("writes WARN prefix to stderr for warnings", (t) => {
      const written: string[] = [];
--- a/src/resources/extensions/gsd/tests/worktree-preferences-sync.test.ts
+++ b/src/resources/extensions/gsd/tests/worktree-preferences-sync.test.ts
@ -0,0 +1,130 @@
+/**
+ * worktree-preferences-sync.test.ts — Regression test for #2684.
+ *
+ * Verifies that preferences.md is seeded into auto-mode worktrees:
+ *
+ *   1. copyPlanningArtifacts() copies preferences.md on initial worktree creation
+ *   2. syncGsdStateToWorktree() forward-syncs preferences.md (additive only)
+ *   3. syncWorktreeStateBack() does NOT overwrite project root preferences.md
+ */
+
+import test from "node:test";
+import assert from "node:assert/strict";
+import {
+  existsSync,
+  mkdirSync,
+  mkdtempSync,
+  readFileSync,
+  rmSync,
+  writeFileSync,
+} from "node:fs";
+import { join } from "node:path";
+import { tmpdir } from "node:os";
+
+import {
+  syncGsdStateToWorktree,
+  syncWorktreeStateBack,
+} from "../auto-worktree.ts";
+
+// ─── Helpers ─────────────────────────────────────────────────────────
+
+function makeTempDir(prefix: string): string {
+  return mkdtempSync(join(tmpdir(), `gsd-prefs-test-${prefix}-`));
+}
+
+function cleanup(...dirs: string[]): void {
+  for (const dir of dirs) {
+    rmSync(dir, { recursive: true, force: true });
+  }
+}
+
+function writeFile(dir: string, relativePath: string, content: string): void {
+  const fullPath = join(dir, relativePath);
+  mkdirSync(join(fullPath, ".."), { recursive: true });
+  writeFileSync(fullPath, content, "utf-8");
+}
+
+// ─── Tests ───────────────────────────────────────────────────────────
+
+const PREFS_CONTENT = [
+  "# Preferences",
+  "",
+  "post_unit_hooks:",
+  "  - npm run lint",
+  "",
+  "skill_rules:",
+  '  - use: "frontend-design"',
+].join("\n");
+
+test("#2684: syncGsdStateToWorktree forward-syncs preferences.md when missing from worktree", (t) => {
+  const mainBase = makeTempDir("main");
+  const wtBase = makeTempDir("wt");
+  t.after(() => cleanup(mainBase, wtBase));
+
+  // Project root has preferences.md
+  writeFile(mainBase, ".gsd/preferences.md", PREFS_CONTENT);
+
+  // Worktree has .gsd/ but no preferences.md
+  mkdirSync(join(wtBase, ".gsd"), { recursive: true });
+
+  const result = syncGsdStateToWorktree(mainBase, wtBase);
+
+  assert.ok(
+    existsSync(join(wtBase, ".gsd", "preferences.md")),
+    "preferences.md should be copied to worktree",
+  );
+  assert.equal(
+    readFileSync(join(wtBase, ".gsd", "preferences.md"), "utf-8"),
+    PREFS_CONTENT,
+    "preferences.md content should match source",
+  );
+  assert.ok(
+    result.synced.includes("preferences.md"),
+    "preferences.md should appear in synced list",
+  );
+});
+
+test("#2684: syncGsdStateToWorktree does NOT overwrite existing worktree preferences.md", (t) => {
+  const mainBase = makeTempDir("main");
+  const wtBase = makeTempDir("wt");
+  t.after(() => cleanup(mainBase, wtBase));
+
+  const rootPrefs = "# Root preferences\nold: true";
+  const wtPrefs = "# Worktree preferences\nmodified: true";
+
+  writeFile(mainBase, ".gsd/preferences.md", rootPrefs);
+  writeFile(wtBase, ".gsd/preferences.md", wtPrefs);
+
+  syncGsdStateToWorktree(mainBase, wtBase);
+
+  assert.equal(
+    readFileSync(join(wtBase, ".gsd", "preferences.md"), "utf-8"),
+    wtPrefs,
+    "existing worktree preferences.md must not be overwritten",
+  );
+});
+
+test("#2684: syncWorktreeStateBack does NOT overwrite project root preferences.md", (t) => {
+  const mainBase = makeTempDir("main");
+  const wtBase = makeTempDir("wt");
+  const mid = "M001";
+  t.after(() => cleanup(mainBase, wtBase));
+
+  const rootPrefs = "# Root preferences\nauthoritative: true";
+  const wtPrefs = "# Worktree preferences\nstale-copy: true";
+
+  writeFile(mainBase, ".gsd/preferences.md", rootPrefs);
+  writeFile(wtBase, ".gsd/preferences.md", wtPrefs);
+
+  // Worktree needs at least a milestone dir for the function to proceed
+  mkdirSync(join(wtBase, ".gsd", "milestones", mid), { recursive: true });
+  mkdirSync(join(mainBase, ".gsd", "milestones"), { recursive: true });
+
+  syncWorktreeStateBack(mainBase, wtBase, mid);
+
+  assert.equal(
+    readFileSync(join(mainBase, ".gsd", "preferences.md"), "utf-8"),
+    rootPrefs,
+    "project root preferences.md must NOT be overwritten by worktree copy",
+  );
+});
--- a/src/resources/extensions/gsd/tools/complete-task.ts
+++ b/src/resources/extensions/gsd/tools/complete-task.ts
@ -250,6 +250,16 @@ export async function handleCompleteTask(
    );
    const rollbackAdapter = _getAdapter();
    if (rollbackAdapter) {
+      // Delete orphaned verification_evidence rows first (FK constraint
+      // references tasks, so evidence must go before status change).
+      // Without this, retries accumulate duplicate evidence rows (#2724).
+      rollbackAdapter.prepare(
+        `DELETE FROM verification_evidence WHERE milestone_id = :mid AND slice_id = :sid AND task_id = :tid`,
+      ).run({
+        ":mid": params.milestoneId,
+        ":sid": params.sliceId,
+        ":tid": params.taskId,
+      });
      rollbackAdapter.prepare(
        `UPDATE tasks SET status = 'pending' WHERE milestone_id = :mid AND slice_id = :sid AND id = :tid`,
      ).run({
--- a/src/resources/extensions/gsd/tools/validate-milestone.ts
+++ b/src/resources/extensions/gsd/tools/validate-milestone.ts
@ -76,7 +76,7 @@ export async function handleValidateMilestone(
    return { error: `verdict must be one of: ${VALIDATION_VERDICTS.join(", ")}` };
  }

-  // ── Filesystem render ──────────────────────────────────────────────────
+  // ── Resolve paths and render markdown ────────────────────────────────
  const validationMd = renderValidationMarkdown(params);

  let validationPath: string;
@ -89,16 +89,11 @@ export async function handleValidateMilestone(
    validationPath = join(manualDir, `${params.milestoneId}-VALIDATION.md`);
  }

-  try {
-    await saveFile(validationPath, validationMd);
-  } catch (renderErr) {
-    process.stderr.write(
-      `gsd-db: validate_milestone — disk render failed: ${(renderErr as Error).message}\n`,
-    );
-    return { error: `disk render failed: ${(renderErr as Error).message}` };
-  }
-
-  // ── DB write — store in assessments table ──────────────────────────────
+  // ── DB write first — matches complete-task/complete-slice pattern ───
+  // Write DB before disk so a crash between the two leaves a recoverable
+  // state: the DB row exists but the file is missing, which projection
+  // rendering can regenerate. The inverse (file exists, no DB row) is
+  // harder to detect and recover from (#2725).
  const validatedAt = new Date().toISOString();

  transaction(() => {
@ -115,6 +110,23 @@ export async function handleValidateMilestone(
    });
  });

+  // ── Filesystem render (outside transaction) ────────────────────────────
+  // If disk render fails, roll back the DB row so state stays consistent.
+  try {
+    await saveFile(validationPath, validationMd);
+  } catch (renderErr) {
+    process.stderr.write(
+      `gsd-db: validate_milestone — disk render failed, rolling back DB row: ${(renderErr as Error).message}\n`,
+    );
+    const rollbackAdapter = _getAdapter();
+    if (rollbackAdapter) {
+      rollbackAdapter.prepare(
+        `DELETE FROM assessments WHERE milestone_id = :mid AND scope = 'milestone-validation'`,
+      ).run({ ":mid": params.milestoneId });
+    }
+    return { error: `disk render failed: ${(renderErr as Error).message}` };
+  }
+
  invalidateStateCache();
  clearPathCache();
  clearParseCache();
--- a/Show more
+++ b/Show more