docs(M003): context, requirements, and roadmap

2026-03-14 22:02:26 -06:00
commit 24597873a0
parent ac33781fd0
6 changed files with 947 additions and 0 deletions
--- a/.gsd/DECISIONS.md
+++ b/.gsd/DECISIONS.md
@@ -0,0 +1,41 @@
+# Decisions Register
+
+<!-- Append-only. Never edit or remove existing rows.
+     To reverse a decision, add a new row that supersedes it.
+     Read this file at the start of any planning or research phase. -->
+
+| # | When | Scope | Decision | Choice | Rationale | Revisable? |
+|---|------|-------|----------|--------|-----------|------------|
+| D001 | M001 | arch | Secret collection insertion point | At `/gsd auto` entry (startAuto), not as a dispatch unit type | Keeps the state machine untouched. Collection is a one-time gate, not a repeating unit. Simpler, less risk of dispatch loop bugs. | Yes — if collection needs to happen mid-milestone |
+| D002 | M001 | convention | Manifest file naming | `M00x-SECRETS.md` via existing `resolveMilestoneFile(base, mid, "SECRETS")` | Consistent with all other milestone-level files (CONTEXT, ROADMAP, RESEARCH). No new path resolver needed. | No |
+| D003 | M001 | pattern | Summary screen interactivity | Read-only with auto-skip (no interactive deselection) | Matches the "walk away" philosophy. Simpler UX, fewer edge cases. User can always re-run collection. | Yes — if users request deselection |
+| D004 | M001 | pattern | Guidance display placement | Same page as masked input (above the editor) | Single page per key — no extra navigation. User sees guidance while entering the value. | Yes — if terminal height constraints cause problems |
+| D005 | M001 | convention | Manifest format | Markdown with H3 sections per key, bold fields, numbered guidance | Consistent with all other .gsd files. Parser and formatter already exist in files.ts. | No |
+| D006 | M001 | arch | Destination inference | Reuse existing `detectDestination()` from get-secrets-from-user.ts | Simple file-presence checks (vercel.json → Vercel, convex/ → Convex, default → .env). Already proven. | Yes — if per-key destination override needed |
+| D007 | M002 | arch | File structure after module split | Split index.ts into state.ts, lifecycle.ts, capture.ts, settle.ts, refs.ts, utils.ts, evaluate-helpers.ts, and tools/ directory | 5000-line monolith is unmaintainable; module boundaries enable safe changes. core.js already established the pattern. | No |
+| D008 | M002 | library | Image resizing library | sharp | Fast, well-maintained, standard Node image processing. Replaces fragile canvas-based approach that depends on page context. | No |
+| D009 | M002 | convention | Navigate screenshot default | Off by default, opt-in via parameter | Big token savings. Agent uses browser_screenshot explicitly when visual verification needed. | Yes — if agents consistently need screenshots on navigate |
+| D010 | M002 | arch | Browser-side utility injection | page.addInitScript under window.__pi namespace | Survives navigation, available before page scripts, namespaced to avoid collisions. | Yes — if timing issues discovered |
+| D011 | M002 | convention | Intent resolution approach | Deterministic heuristics only, no LLM calls | Predictable latency and cost. Scoring functions are testable and debuggable. | Yes — if heuristic coverage proves insufficient |
+| D012 | M002 | convention | Browser reuse across sessions | Skip completely | Architecturally different from within-session work; user directed to exclude entirely. | No |
+| D013 | M002/S01 | pattern | Mutable state accessor pattern | get/set functions for all 18 state variables, not `export let` | ES module live bindings break under jiti's CJS shim. Accessors guarantee consumers see mutations. | No |
+| D014 | M002/S01 | pattern | ToolDeps interface location | Defined in state.ts alongside types it references | Keeps the dependency graph simple — tool files import state.ts for ToolDeps + types. | Yes — could move to separate types.ts if state.ts grows |
+| D015 | M002/S01 | pattern | Factory pattern for lifecycle-dependent utils | createGetLivePagesSnapshot(ensureBrowser) instead of direct import | Avoids circular dependency between utils.ts and lifecycle.ts. Wired at orchestrator level. | No |
+| D016 | M002/S01 | pattern | Tool file import strategy | Tool files import state accessors and core.js functions directly — ToolDeps carries only infrastructure functions needing lifecycle wiring | Keeps ToolDeps lean. State accessors are stable imports, not runtime-wired dependencies. Avoids bloating the deps interface with every utility. | Yes — if ToolDeps grows unwieldy |
+| D017 | M002/S02 | pattern | Action tool signal classification | High-signal: click, type, key_press, select_option, set_checked, navigate, click_ref, fill_ref. Low-signal: scroll, hover, drag, upload_file, hover_ref. | High-signal tools produce meaningful page changes worth capturing body text for diffs. Low-signal tools don't change page content. fill_ref is high-signal because input value changes affect form state. | Yes — if new tools need reclassification |
+| D018 | M002/S02 | pattern | postActionSummary retention | Keep postActionSummary in capture.ts for summary-only tools (go_back, go_forward, reload) but remove from action tools that do before/after diff | Summary-only tools don't do diffs and don't need beforeState — postActionSummary is the right abstraction for them. Action tools need consolidated capture. | Yes — could remove entirely if summary-only tools get before/after diff |
+| D019 | M002/S02 | tuning | Zero-mutation settle thresholds | 60ms detection window, 30ms shortened quiet window, totalMutationsSeen === 0 required | Conservative thresholds — 60ms is enough time for any async DOM update to start, 30ms shortened window still catches late mutations. Requiring zero total mutations (not just current poll) prevents false short-circuits. | Yes — if real-world testing shows 60ms is too short for slow SPAs |
+| D020 | M002/S04 | pattern | Form analysis evaluate location | Form analysis evaluate logic lives in tools/forms.ts, not extracted to evaluate-helpers.ts | Form-specific, not a shared utility. The label resolution heuristic is only used by form tools. Keeping it local avoids bloating the shared injection. | Yes — if S05 intent tools need label resolution |
+| D021 | M002/S04 | pattern | Fill uses Playwright APIs, not evaluate | browser_fill_form uses Playwright locator.fill()/selectOption()/setChecked() instead of page.evaluate() value setting | Playwright APIs trigger proper input/change events and handle framework-specific reactivity (React, Vue). Direct value setting via evaluate skips event dispatch and breaks reactive frameworks. | No |
+| D022 | M002/S04 | pattern | Fill field matching priority | Label (exact → case-insensitive) → name → placeholder → aria-label | Label is the most human-readable identifier. Name is the most reliable programmatic identifier. Placeholder and aria-label are fallbacks. Exact match before fuzzy prevents wrong-field fills. | Yes — if real-world usage shows a different priority works better |
+| D023 | M002/S05 | pattern | Intent scoring model | 4 orthogonal dimensions per intent, each 0-1, summed and clamped | Consistent scoring structure across all 8 intents. Makes scoring testable and debuggable — each dimension has a named reason. 4 dimensions balance discrimination vs complexity. | Yes — could add/remove dimensions per intent if real-world usage shows imbalance |
+| D024 | M002/S05 | pattern | search_field action type | Focus instead of click for search_field intent in browser_act | Search fields need keyboard focus for typing, not a click that might submit or toggle. Focus is the semantically correct action. Other intents use click. | Yes — if focus proves unreliable on specific input implementations |
+| D025 | M002/S06 | pattern | Test import strategy for browser-tools | jiti CJS imports instead of ESM resolve-ts hook | The resolve-ts ESM hook breaks on core.js (plain .js file imported by TS modules). jiti handles mixed .ts/.js imports correctly from a .cjs test file. | No |
+| D026 | M002/S06 | pattern | Testing module-private functions | Source extraction via readFileSync + brace-match + strip types + eval | Avoids exporting test-only APIs from production modules. Fragile to refactors but tests fail clearly when extraction breaks. Acceptable tradeoff for test code. | Yes — if private functions get exported for other reasons |
+| D027 | M003 | arch | Git isolation model | Worktree-per-milestone (default for new projects) | Eliminates .gsd/ merge conflicts structurally. Each milestone gets its own worktree with isolated .gsd/ state. Branch-per-slice remains as opt-in legacy mode via git.isolation: "branch". | No |
+| D028 | M003 | arch | Slice merge strategy within worktree | --no-ff merge (not squash) | Preserves full commit history as a diary of agent work. Merge commits give natural slice boundaries. Squash would destroy per-task granularity. | Yes — if commit noise proves problematic |
+| D029 | M003 | arch | Milestone-to-main merge strategy | Squash merge | Main gets one clean commit per milestone. Individually revertable. Reads like a changelog. Full history preserved on milestone branch for forensics. | No |
+| D030 | M003 | arch | Failure handling philosophy | Stop but self-heal | Auto-mode pauses, runs automatic repair (abort, reset, retry), resumes without user intervention in most cases. Only truly ambiguous conflicts need a human. Balances continuity with trust. | Yes — if self-heal proves unreliable |
+| D031 | M003 | arch | Target user priority | Vibe coder first | Zero git errors as the default. Senior engineers configure overrides. Biggest market opportunity is users who can't use git today. | No |
+| D032 | M003 | convention | Auto-worktree naming | Milestone ID as worktree name, milestone/<MID> as branch | .gsd/worktrees/M003/ with branch milestone/M003. Manual worktrees use worktree/<name> branches. No collision between auto and manual. | Yes — if naming conflicts discovered |
+| D033 | M003 | arch | Migration strategy | New projects default to worktree; existing keep branch-per-slice | Detection: if project has gsd/* branches or milestone META with integration branch → legacy. Otherwise → worktree. No forced migration. | Yes — if adoption shows users want migration tooling |
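The accessor pattern recorded in D013 can be sketched roughly as below. This is a minimal illustration and not the contents of `state.ts`: the variable `activePageId` and both function names are hypothetical, and `export` is left off so the snippet runs as a standalone script.

```typescript
// Hypothetical excerpt in the style of state.ts (all names are illustrative).
// The variable stays module-private; consumers never bind to it directly.
let activePageId: string | null = null;

// In a real module these two functions would carry `export`.
function getActivePageId(): string | null {
  return activePageId;
}

function setActivePageId(id: string | null): void {
  activePageId = id;
}

// A consumer calls the accessor on every read, so it observes mutations
// even when the module loader copies exported values instead of keeping
// live bindings.
setActivePageId("page-1");
console.log(getActivePageId()); // prints "page-1"
```

The trade-off is one small function pair per state variable in exchange for not depending on how the loader treats `export let`.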
--- a/.gsd/PROJECT.md
+++ b/.gsd/PROJECT.md
@@ -0,0 +1,43 @@
+# Project
+
+## What This Is
+
+A pi coding agent extension (GSD — "Get Stuff Done") that provides structured planning, auto-mode execution, and project management for autonomous coding sessions. Includes proactive secret management, browser automation tools for UI verification, and worktree-isolated git architecture for zero-friction autonomous execution.
+
+## Core Value
+
+Auto-mode runs from start to finish without blocking. Git is invisible — no merge conflicts, no checkout errors, no state corruption. The system is automagical for vibe coders and configurable for senior engineers.
+
+## Current State
+
+The GSD extension is fully functional with:
+- Milestone/slice/task planning hierarchy
+- Auto-mode state machine with fresh-session-per-unit dispatch
+- Guided `/gsd` wizard flow
+- `secure_env_collect` tool with masked TUI input, multi-destination write support, guidance display, and summary screen
+- Proactive secret management: planning prompts forecast secrets, manifests persist them, auto-mode collects them before first dispatch
+- Browser-tools extension with 47 registered tools covering navigation, interaction, inspection, verification, tracing, debugging, form intelligence (browser_analyze_form, browser_fill_form), and intent-ranked retrieval and semantic actions (browser_find_best, browser_act)
+- Browser-tools `core.js` with shared utilities for action timeline, page registry, state diffing, assertions, fingerprinting
+- Branch-per-slice git model with squash merge to main (being superseded by worktree-isolated model in M003)
+
+## Architecture / Key Patterns
+
+- **Extension model**: pi extensions register tools, commands, hooks via `ExtensionAPI`
+- **State machine**: `auto.ts` drives `dispatchNextUnit()` which reads disk state and dispatches fresh sessions
+- **Secrets gate**: `startAuto()` checks `getManifestStatus()` before first dispatch
+- **Disk-driven state**: `.gsd/` files are the source of truth, `STATE.md` is derived cache
+- **File parsing**: `files.ts` has markdown parsers for all GSD file types
+- **Browser-tools**: Modular structure — slim `index.ts` orchestrator, 7 focused infrastructure modules (state.ts, utils.ts, evaluate-helpers.ts, lifecycle.ts, capture.ts, settle.ts, refs.ts), 11 categorized tool files under `tools/` (including forms.ts, intent.ts), shared infrastructure in `core.js` (~1000 lines). Browser-side utilities injected once via `addInitScript` under `window.__pi` namespace. Uses Playwright for browser control. Accessibility-first state representation, deterministic versioned refs, adaptive DOM settling, compact post-action summaries. Form tools use Playwright locator APIs for type-aware filling with structured result reporting. Intent tools use deterministic 4-dimension heuristic scoring for element retrieval and one-call semantic actions.
+- **Prompt templates**: `prompts/` directory with mustache-like `{{var}}` substitution
+- **TUI components**: `@gsd/pi-tui` provides `Editor`, `Text`, key handling, themes
+- **Git architecture**: Worktree-per-milestone isolation (default for new projects). Each milestone gets its own git worktree with isolated `.gsd/` state. Slices merge via `--no-ff` into the milestone branch (preserving full commit history). Milestones squash-merge to main on completion. Legacy branch-per-slice model supported via `git.isolation: "branch"` preference.
+
+## Capability Contract
+
+See `.gsd/REQUIREMENTS.md` for the explicit capability contract, requirement status, and coverage mapping.
+
+## Milestone Sequence
+
+- [x] M001: Proactive Secret Management — Front-loaded API key collection into planning so auto-mode runs uninterrupted (10 requirements validated)
+- [x] M002: Browser Tools Performance & Intelligence — Module decomposition, action pipeline optimization, sharp-based screenshots, form intelligence, intent-ranked retrieval, semantic actions, 108-test suite (12 requirements validated)
+- [ ] M003: Worktree-Isolated Git Architecture — Worktree-per-milestone isolation eliminating merge conflicts, self-healing git repair, zero git errors for vibe coders, configurable for senior engineers
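The "mustache-like `{{var}}` substitution" mentioned for prompt templates in PROJECT.md might look something like the sketch below. The function name `renderTemplate` is hypothetical, and leaving unknown placeholders untouched is an assumption; the real loader in `prompts/` may behave differently.

```typescript
// Hypothetical helper: replaces {{name}} placeholders with values from `vars`.
// Assumption: placeholders with no matching key are left in place unchanged.
function renderTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match: string, name: string): string => {
    // Own-property check so keys like "constructor" are not picked up
    // from the object prototype.
    if (Object.prototype.hasOwnProperty.call(vars, name)) {
      return vars[name] as string;
    }
    return match;
  });
}

console.log(
  renderTemplate("Write the manifest to {{secretsOutputPath}}.", {
    secretsOutputPath: ".gsd/milestones/M001/M001-SECRETS.md",
  }),
);
```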
--- a/.gsd/REQUIREMENTS.md
+++ b/.gsd/REQUIREMENTS.md
@@ -0,0 +1,553 @@
+# Requirements
+
+This file is the explicit capability and coverage contract for the project.
+
+## Active
+
+### R029 — Auto-worktree creation on milestone start
+- Class: core-capability
+- Status: active
+- Description: When auto-mode starts a new milestone, it automatically creates a git worktree under `.gsd/worktrees/<MID>/` with branch `milestone/<MID>`, `chdir`s into it, and dispatches all units from within the worktree. The user never runs a git command.
+- Why it matters: Worktree isolation gives each milestone its own `.gsd/` directory, eliminating the entire category of `.gsd/` merge conflicts that have caused ~15 separate bug fixes to date.
+- Source: user
+- Primary owning slice: M003/S01
+- Supporting slices: none
+- Validation: unmapped
+- Notes: Must handle: fresh milestone (no worktree yet), resumed milestone (worktree already exists), milestone started from non-main branch. Must coexist with manual `/worktree` command.
+
+### R030 — Auto-worktree teardown + squash-merge on milestone complete
+- Class: core-capability
+- Status: active
+- Description: When a milestone completes, the milestone branch is squash-merged to main with a rich commit message, the worktree is removed, and `process.chdir` returns to the main project root. Main receives exactly one commit per milestone.
+- Why it matters: Main stays clean and always represents completed, working milestones. One commit per milestone is individually revertable.
+- Source: user
+- Primary owning slice: M003/S03
+- Supporting slices: M003/S01
+- Validation: unmapped
+- Notes: Must handle: dirty worktree at teardown time (auto-commit first), failed squash-merge (self-heal), remote push after merge (if auto_push enabled).
+
+### R031 — `--no-ff` slice merges within milestone worktree
+- Class: core-capability
+- Status: active
+- Description: Completed slices merge into the milestone branch via `--no-ff` merge instead of squash. This preserves the full per-task commit history on the milestone branch, with merge commits providing natural slice boundaries.
+- Why it matters: The commit history is a diary of the agent's work. The LLM can read `git log` to understand what happened. Squashing slices destroys this granularity. `--no-ff` merge commits give clean slice boundaries while keeping all commits.
+- Source: user
+- Primary owning slice: M003/S02
+- Supporting slices: M003/S01
+- Validation: unmapped
+- Notes: This is the default for worktree-isolated mode. The branch-per-slice legacy model retains its existing squash default.
+
+### R032 — Rich milestone-level squash commit message
+- Class: core-capability
+- Status: active
+- Description: When a milestone squash-merges to main, the commit message summarizes all slices and their key outcomes. Format: conventional commit subject + slice task list body + branch metadata.
+- Why it matters: Main's git log should read like a changelog. Each milestone commit should tell the full story of what was built.
+- Source: user
+- Primary owning slice: M003/S03
+- Supporting slices: none
+- Validation: unmapped
+- Notes: Similar to current rich commit message for slice merges, but at milestone level. Should list all slices with their titles and key outcomes.
+
+### R033 — `git.isolation` preference
+- Class: core-capability
+- Status: active
+- Description: A `git.isolation` preference with values `"worktree"` (default for new projects) and `"branch"` (legacy model). New projects that have never run GSD default to worktree isolation. Existing projects with an established branch-per-slice history default to branch mode.
+- Why it matters: Backwards compatibility — existing projects must not break. New projects get the better model by default.
+- Source: user
+- Primary owning slice: M003/S04
+- Supporting slices: none
+- Validation: unmapped
+- Notes: Detection heuristic: if the project has existing `gsd/*` branches or milestone metadata with integration branch records, it's a legacy project → default to "branch". Otherwise → default to "worktree".
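The detection heuristic in R033's notes can be sketched as a pure function. Everything named here (`ProjectGitFacts`, `resolveIsolation`, the field names) is hypothetical, and letting an explicitly set preference win over detection is an assumption that the requirement does not state outright.

```typescript
type Isolation = "worktree" | "branch";

// Hypothetical input shape: facts gathered about the repository up front.
interface ProjectGitFacts {
  branches: string[];                  // local branch names
  hasIntegrationBranchRecord: boolean; // milestone metadata records an integration branch
  explicitPreference?: Isolation;      // value of git.isolation, if the user set one
}

function resolveIsolation(facts: ProjectGitFacts): Isolation {
  // Assumption: an explicit git.isolation setting overrides detection.
  if (facts.explicitPreference !== undefined) {
    return facts.explicitPreference;
  }
  // Legacy signal 1: existing gsd/* branches.
  const hasLegacyBranches = facts.branches.some((name) => name.startsWith("gsd/"));
  // Legacy signal 2: milestone metadata with an integration branch record.
  if (hasLegacyBranches || facts.hasIntegrationBranchRecord) {
    return "branch";
  }
  return "worktree";
}
```

Taking plain data instead of shelling out to git keeps the decision easy to unit test in both isolation modes.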
+
+### R034 — `git.merge_to_main` preference
+- Class: core-capability
+- Status: active
+- Description: A `git.merge_to_main` preference with values `"milestone"` (default) and `"slice"`. In milestone mode, main only receives commits when milestones complete. In slice mode, each completed slice squash-merges to main immediately (current behavior).
+- Why it matters: Senior engineers who want frequent integration can opt into slice-level merges. Vibe coders get the cleaner milestone-level default.
+- Source: user
+- Primary owning slice: M003/S04
+- Supporting slices: M003/S03
+- Validation: unmapped
+- Notes: `merge_to_main: "slice"` with `isolation: "worktree"` is valid — slices squash-merge to main from within the worktree, but the worktree still provides `.gsd/` isolation.
+
+### R035 — Self-healing git repair on failure
+- Class: core-capability
+- Status: active
+- Description: When git operations fail during auto-mode (merge conflict, checkout failure, corrupt state), the system automatically attempts repair: abort incomplete merges, reset working tree, retry the operation. Only truly unresolvable conflicts (two humans edited the same code) pause auto-mode with a clear explanation.
+- Why it matters: The north star is "automagical — just runs." Git errors are the #1 cause of auto-mode halting. Self-healing eliminates most of those stops.
+- Source: user
+- Primary owning slice: M003/S05
+- Supporting slices: M003/S01, M003/S02, M003/S03
+- Validation: unmapped
+- Notes: The worktree model eliminates most `.gsd/` conflicts structurally. Self-healing handles the remaining edge cases (code conflicts, remote divergence, corrupt index).
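The "abort, reset, retry" loop described in R035 could be structured along these lines. This is only a sketch under stated assumptions: `runWithSelfHeal` and `HealResult` are hypothetical names, the operation and repair steps are injected callbacks rather than real git calls, and the default of two attempts is illustrative.

```typescript
type HealResult<T> =
  | { ok: true; value: T; attempts: number }
  | { ok: false; error: unknown; attempts: number };

// Runs `operation`; on failure runs `repair` (for example: abort an
// incomplete merge and reset the working tree) and retries, up to
// `maxAttempts` total attempts.
function runWithSelfHeal<T>(
  operation: () => T,
  repair: () => void,
  maxAttempts: number = 2,
): HealResult<T> {
  let lastError: unknown = undefined;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return { ok: true, value: operation(), attempts: attempt };
    } catch (error) {
      lastError = error;
      // Repair only when another attempt will follow.
      if (attempt < maxAttempts) {
        repair();
      }
    }
  }
  // Still failing after all attempts: the caller pauses auto-mode
  // and explains the problem to the user.
  return { ok: false, error: lastError, attempts: maxAttempts };
}
```

Returning a result object instead of rethrowing lets the caller decide between resuming and pausing with a non-technical message.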
+
+### R036 — `.gsd/` conflict resolution elimination
+- Class: quality-attribute
+- Status: active
+- Description: The ~60 lines of `.gsd/` auto-resolve conflict code in `mergeSliceToMain` and the ~44 merge-related recovery paths in `auto.ts` are simplified or removed. Worktree isolation makes most of this code structurally unnecessary.
+- Why it matters: Dead conflict resolution code is maintenance burden and a source of bugs. If the architecture eliminates the problem, the code that patches it should go.
+- Source: inferred
+- Primary owning slice: M003/S02
+- Supporting slices: M003/S06
+- Validation: unmapped
+- Notes: Only remove code that is genuinely unnecessary in worktree mode. Keep the legacy branch-per-slice path intact for `git.isolation: "branch"` users.
+
+### R037 — Zero git errors for vibe coders
+- Class: primary-user-loop
+- Status: active
+- Description: Users with zero git knowledge should never see a git error message during auto-mode. All git operations are invisible. If something fails, the system self-heals or presents a non-technical explanation with a clear action ("Run `/gsd doctor` to fix this").
+- Why it matters: Vibe coders are the primary market. Git errors are incomprehensible to them and destroy trust in the system.
+- Source: user
+- Primary owning slice: M003/S05
+- Supporting slices: all M003 slices
+- Validation: unmapped
+- Notes: This is a quality bar, not a single feature. Every git-touching codepath must handle errors gracefully.
+
+### R038 — Backwards compatibility with branch-per-slice model
+- Class: continuity
+- Status: active
+- Description: Existing projects that use the branch-per-slice model continue working exactly as they do today. No migration required. The old codepaths remain functional when `git.isolation: "branch"` is active.
+- Why it matters: Breaking existing users' workflows would destroy trust.
+- Source: user
+- Primary owning slice: M003/S04
+- Supporting slices: none
+- Validation: unmapped
+- Notes: All existing git-service.ts tests must continue passing in branch mode.
+
+### R039 — Manual `/worktree` coexistence with auto-worktrees
+- Class: integration
+- Status: active
+- Description: The manual `/worktree` command for exploration coexists with auto-mode's milestone worktrees. Different naming conventions prevent conflicts: auto-worktrees use `milestone/M003` branches, manual worktrees use `worktree/<name>` branches.
+- Why it matters: Manual worktrees are a valuable exploration tool. They shouldn't be broken by auto-mode's worktree usage.
+- Source: user
+- Primary owning slice: M003/S01
+- Supporting slices: none
+- Validation: unmapped
+- Notes: Auto-worktrees are created under `.gsd/worktrees/` just like manual ones, but with milestone ID as the name. The naming convention prevents branch collisions.
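The naming convention in R039 can be expressed as two small helpers. The function names and the `WorktreeNames` shape are hypothetical; the path and branch patterns come from the requirement text.

```typescript
interface WorktreeNames {
  path: string;   // directory relative to the project root
  branch: string; // git branch checked out in that worktree
}

// Auto-mode worktree: milestone ID as the name, milestone/<MID> as the branch.
function autoWorktreeNames(milestoneId: string): WorktreeNames {
  return {
    path: `.gsd/worktrees/${milestoneId}`,
    branch: `milestone/${milestoneId}`,
  };
}

// Manual /worktree: same parent directory, worktree/<name> as the branch.
function manualWorktreeNames(name: string): WorktreeNames {
  return {
    path: `.gsd/worktrees/${name}`,
    branch: `worktree/${name}`,
  };
}

console.log(autoWorktreeNames("M003").branch);      // prints "milestone/M003"
console.log(manualWorktreeNames("spike").branch);   // prints "worktree/spike"
```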
+
+### R040 — Doctor git health checks
+- Class: operability
+- Status: active
+- Description: `/gsd doctor` detects and optionally fixes git-related issues: orphaned auto-worktrees, stale milestone branches, corrupt merge state (MERGE_HEAD/SQUASH_MSG), tracked runtime files, missing gitignore patterns.
+- Why it matters: When things do go wrong, users need a one-command fix. Doctor is the safety net.
+- Source: inferred
+- Primary owning slice: M003/S06
+- Supporting slices: M003/S05
+- Validation: unmapped
+- Notes: Doctor already handles planning artifact issues. This extends it to git health.
+
+### R041 — Test coverage for worktree-isolated flow
+- Class: quality-attribute
+- Status: active
+- Description: Test suite covers: auto-worktree create/teardown, `--no-ff` slice merge within worktree, milestone squash to main, preference switching between isolation modes, self-heal scenarios, doctor git checks. All existing git tests continue passing.
+- Why it matters: The git system is the most bug-prone part of GSD. Tests prevent regressions.
+- Source: inferred
+- Primary owning slice: M003/S07
+- Supporting slices: all M003 slices
+- Validation: unmapped
+- Notes: Must test both worktree and branch isolation modes.
+
+## Validated
+
+### R001 — Secret forecasting during milestone planning
+- Class: core-capability
+- Status: validated
+- Description: When a milestone is planned, the LLM analyzes slices for external service dependencies and writes a secrets manifest listing every predicted API key with setup guidance.
+- Why it matters: Without forecasting, auto-mode discovers missing keys mid-execution and blocks for hours waiting for user input.
+- Source: user
+- Primary owning slice: M001/S01
+- Supporting slices: none
+- Validation: plan-milestone.md Secret Forecasting section (line 62) instructs LLM to write manifest. Parser round-trip tested in parsers.test.ts.
+- Notes: The plan-milestone prompt has forecasting instructions. The manifest format and parser are implemented and tested.
+
+### R002 — Secrets manifest persisted in .gsd/
+- Class: continuity
+- Status: validated
+- Description: The secrets manifest is a durable markdown file at `.gsd/milestones/M00x/M00x-SECRETS.md` that survives session boundaries and can be re-read by any future unit.
+- Why it matters: Collection may happen in a different session than planning. The manifest must persist on disk.
+- Source: user
+- Primary owning slice: M001/S01
+- Supporting slices: none
+- Validation: parseSecretsManifest/formatSecretsManifest round-trip tested (parsers.test.ts), resolveMilestoneFile(base, mid, "SECRETS") resolves path.
+- Notes: Parser/formatter implemented in files.ts. Template exists at templates/secrets-manifest.md.
+
+### R003 — Step-by-step guidance per key
+- Class: primary-user-loop
+- Status: validated
+- Description: Each secret in the manifest includes numbered steps for obtaining the key (navigate to dashboard → create project → generate key → copy), a dashboard URL, and a format hint.
+- Why it matters: Users shouldn't have to figure out where to find each key. The guidance makes collection self-service.
+- Source: user
+- Primary owning slice: M001/S02
+- Supporting slices: M001/S01
+- Validation: collectOneSecret renders numbered dim-styled guidance steps with wrapping (collect-from-manifest.test.ts tests 6-8).
+- Notes: Guidance quality is LLM-dependent and best-effort.
+
+### R004 — Summary screen before collection
+- Class: primary-user-loop
+- Status: validated
+- Description: Before collecting secrets one-by-one, show a read-only summary screen listing all needed keys with their status (pending / already set / skipped). Auto-skip keys that already exist in the environment.
+- Why it matters: The user needs to see the full picture before entering keys. Already-set keys should not require re-entry.
+- Source: user
+- Primary owning slice: M001/S02
+- Supporting slices: none
+- Validation: showSecretsSummary() renders read-only ctx.ui.custom screen with status indicators via makeUI().progressItem() (collect-from-manifest.test.ts tests 4-5).
+- Notes: Read-only with auto-skip — no interactive deselection.
+
+### R005 — Existing key detection and silent skip
+- Class: primary-user-loop
+- Status: validated
+- Description: Before prompting for a key, check `.env` and `process.env`. If the key already exists, mark it as "already set" in the summary and skip collection.
+- Why it matters: Users shouldn't re-enter keys they've already configured. Prevents frustration and errors.
+- Source: user
+- Primary owning slice: M001/S02
+- Supporting slices: none
+- Validation: getManifestStatus cross-references checkExistingEnvKeys, categorizes env-present keys as existing (manifest-status.test.ts tests 4,7). collectSecretsFromManifest skips them (collect-from-manifest.test.ts tests 1-2).
+- Notes: `checkExistingEnvKeys()` implemented in get-secrets-from-user.ts.
+
+### R006 — Smart destination detection
+- Class: integration
+- Status: validated
+- Description: Automatically detect whether secrets should go to .env, Vercel, or Convex based on project file presence (vercel.json → Vercel, convex/ dir → Convex, default → .env).
+- Why it matters: Users shouldn't have to specify the destination manually. The system should do the right thing.
+- Source: user
+- Primary owning slice: M001/S02
+- Supporting slices: none
+- Validation: collectSecretsFromManifest calls detectDestination() for destination inference. applySecrets() routes to dotenv/vercel/convex accordingly.
+- Notes: `detectDestination()` implemented in get-secrets-from-user.ts.
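The file-presence rule in R006 can be sketched as follows. This is not the actual `detectDestination()` from get-secrets-from-user.ts: the name `sketchDetectDestination` and the injected `pathExists` callback are hypothetical, and checking `vercel.json` before `convex/` when both are present is an assumption based only on the order in which the requirement lists them.

```typescript
type Destination = "dotenv" | "vercel" | "convex";

// `pathExists` is injected so the rule can be exercised without a file system.
function sketchDetectDestination(
  pathExists: (relativePath: string) => boolean,
): Destination {
  // Assumption: vercel.json takes precedence if both markers exist.
  if (pathExists("vercel.json")) {
    return "vercel";
  }
  if (pathExists("convex")) {
    return "convex";
  }
  // Default: write to a local .env file.
  return "dotenv";
}

const present = new Set<string>(["package.json", "convex"]);
console.log(sketchDetectDestination((p) => present.has(p))); // prints "convex"
```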
+
+### R007 — Auto-mode collection at entry point
+- Class: core-capability
+- Status: validated
+- Description: When the user runs `/gsd auto`, check for a secrets manifest with pending keys. If found, collect them before dispatching the first slice. Collection happens once at the entry point, not as a dispatch unit.
+- Why it matters: This is the primary integration point — auto-mode must not start execution with uncollected secrets.
+- Source: user
+- Primary owning slice: M001/S03
+- Supporting slices: M001/S01, M001/S02
+- Validation: startAuto() secrets gate at auto.ts:479. auto-secrets-gate.test.ts — 3/3 pass covering null manifest, pending keys, and no-pending-keys paths.
+- Notes: Collection at entry point (startAuto), not as a separate unit type in dispatchNextUnit. D001 satisfied.
+
+### R008 — Guided /gsd wizard integration
+- Class: core-capability
+- Status: validated
+- Description: After milestone planning in the guided `/gsd` flow, trigger secret collection if a manifest exists with pending keys.
+- Why it matters: Users who plan via the wizard should also get prompted for secrets before auto-mode begins.
+- Source: user
+- Primary owning slice: M001/S03
+- Supporting slices: M001/S01, M001/S02
+- Validation: guided-flow.ts calls startAuto() directly (lines 52, 486, 647, 794) — all guided flow paths that start auto-mode inherit the secrets gate.
+- Notes: The guided flow dispatches to startAuto after planning. Collection is inherited via the gate.
+
+### R009 — Planning prompts instruct LLM to forecast secrets
+- Class: integration
+- Status: validated
+- Description: The plan-milestone prompt template includes instructions for the LLM to analyze slices for external service dependencies and write the secrets manifest.
+- Why it matters: Without prompt instructions, the LLM won't know to forecast secrets.
+- Source: user
+- Primary owning slice: M001/S01
+- Supporting slices: none
+- Validation: plan-milestone.md has Secret Forecasting section at line 62 with instructions to write {{secretsOutputPath}} with H3 sections per key.
+- Notes: Implemented in plan-milestone.md.
+
+### R010 — secure_env_collect enhanced with guidance display
+- Class: primary-user-loop
+- Status: validated
+- Description: The secure_env_collect TUI renders multi-line guidance steps above the masked input field on the same page, so the user sees setup instructions while entering the key.
+- Why it matters: Without visible guidance, the user has to find keys on their own despite the LLM having generated instructions.
+- Source: user
+- Primary owning slice: M001/S02
+- Supporting slices: none
+- Validation: collectOneSecret accepts guidance parameter, renders numbered dim-styled lines with wrapTextWithAnsi above masked input (collect-from-manifest.test.ts tests 6-8).
+- Notes: The guidance field is rendered in collectOneSecret().
+
+### R015 — Module decomposition of browser-tools
+- Class: quality-attribute
+- Status: validated
+- Description: The monolithic browser-tools index.ts (~5000 lines) is split into focused modules: shared infrastructure, tool groups, and browser-side utilities. All 43 existing tools continue to work identically.
+- Why it matters: A 5000-line file is unmaintainable and makes targeted changes risky. Module boundaries enable safe refactoring and new tool development.
+- Source: user
+- Primary owning slice: M002/S01
+- Supporting slices: none
+- Validation: Extension loads via jiti, 43 tools register, browser navigate/snapshot/click work against real page, index.ts is 47-line orchestrator with zero registerTool calls, 9 tool files under tools/.
+- Notes: core.js already exists with ~1000 lines of shared utilities. The split extends this pattern.
+
+### R016 — Shared browser-side evaluate utilities
+- Class: quality-attribute
+- Status: validated
+- Description: Common functions duplicated across page.evaluate boundaries (cssPath, simpleHash, isVisible, isEnabled, inferRole, accessibleName) are injected once and referenced from all evaluate callbacks.
+- Why it matters: Currently buildRefSnapshot and resolveRefTarget each redeclare ~100 lines of identical utility code. Deduplication reduces payload size, improves maintainability, and ensures consistency.
+- Source: user
+- Primary owning slice: M002/S01
+- Supporting slices: none
+- Validation: window.__pi contains all 9 functions, survives navigation, refs.ts has zero inline redeclarations, close/reopen re-injects via addInitScript correctly.
+- Notes: Uses context.addInitScript under window.__pi namespace.
+
+### R017 — Consolidated state capture per action
+- Class: core-capability
+- Status: validated
+- Description: The before-state capture, after-state capture, post-action summary, and recent-error check are consolidated into fewer page.evaluate calls per action.
+- Why it matters: Every action tool currently runs 3-4 separate page.evaluate calls for state capture. Consolidating them reduces latency on every single browser interaction.
+- Source: user
+- Primary owning slice: M002/S02
+- Supporting slices: M002/S01
+- Validation: postActionSummary eliminated from action tools, countOpenDialogs removed from ToolDeps, consolidated capture pattern. Build passes.
+- Notes: captureCompactPageState and postActionSummary are merged into a single evaluate call.
+
+### R018 — Conditional body text capture
+- Class: core-capability
+- Status: validated
+- Description: Body text capture (includeBodyText: true) is skipped for low-signal actions (scroll, hover, Tab key press) and enabled for high-signal actions (navigate, click, type, submit).
+- Why it matters: Capturing 4000 chars of body text on every scroll or hover is wasteful. Conditional capture reduces evaluate overhead.
+- Source: user
+- Primary owning slice: M002/S02
+- Supporting slices: none
+- Validation: explicit includeBodyText true/false per tool signal level in interaction.ts. Classification codified in D017. Build passes.
+- Notes: Requires classifying each tool as high-signal or low-signal.
+
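The high-signal/low-signal split in R018 can be pictured as a small lookup table. The sketch below is illustrative only: `ACTION_SIGNAL`, `shouldIncludeBodyText`, and the action keys are hypothetical names, not the contents of interaction.ts, and defaulting unknown actions to capture is an assumption.

```typescript
// Signal levels follow the R018 description. The map and function names are
// hypothetical and do not mirror the real interaction.ts.
type SignalLevel = "high" | "low";

const ACTION_SIGNAL: Record<string, SignalLevel> = {
  navigate: "high",
  click: "high",
  type: "high",
  submit: "high",
  scroll: "low",
  hover: "low",
  press_tab: "low",
};

// Returns the includeBodyText flag for an action. Unknown actions capture
// body text (assumption: extra context is safer than missing context).
function shouldIncludeBodyText(action: string): boolean {
  return ACTION_SIGNAL[action] !== "low";
}
```

Keeping the classification in one table would make the per-tool choice easy to audit against D017.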
+### R019 — Faster settle on zero mutations
+- Class: core-capability
+- Status: validated
+- Description: settleAfterActionAdaptive short-circuits with a smaller quiet window when no mutation observer fires in the first 60ms.
+- Why it matters: Many SPA interactions produce no DOM changes. Short-circuiting saves time on the most common case.
+- Source: user
+- Primary owning slice: M002/S02
+- Supporting slices: none
+- Validation: zero_mutation_shortcut settle reason in state.ts type union and settle.ts return path. 60ms/30ms thresholds codified in D019. Build passes.
+- Notes: Track whether any mutation fired at all; if zero after 60ms, use a shorter quiet window.
+
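A minimal sketch of the R019 short-circuit decision, separated from any real observer or timer code. The 60ms and 30ms values come from the validation note; reading 30ms as the shortened quiet window is an assumption, and `chooseQuietWindow`, `SettleDecision`, and the `"default"` reason are hypothetical names.

```typescript
// Thresholds from the R019 validation note. Treating 30ms as the shortened
// quiet window is an assumption.
const ZERO_MUTATION_CHECK_MS = 60;
const SHORT_QUIET_WINDOW_MS = 30;

interface SettleDecision {
  quietWindowMs: number;
  reason: "zero_mutation_shortcut" | "default";
}

// Picks the quiet window to wait for, given how many mutations were observed
// and how long the observer has been running.
function chooseQuietWindow(
  mutationCount: number,
  elapsedMs: number,
  defaultQuietMs: number,
): SettleDecision {
  if (elapsedMs >= ZERO_MUTATION_CHECK_MS && mutationCount === 0) {
    return {
      quietWindowMs: Math.min(SHORT_QUIET_WINDOW_MS, defaultQuietMs),
      reason: "zero_mutation_shortcut",
    };
  }
  return { quietWindowMs: defaultQuietMs, reason: "default" };
}
```

The decision is a pure function here so that it can be unit-tested without a browser; the real settle.ts may structure it differently.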
+### R020 — Sharp-based screenshot resizing
+- Class: core-capability
+- Status: validated
+- Description: constrainScreenshot uses the sharp Node library for image resizing instead of bouncing buffers through page canvas context.
+- Why it matters: Resizing in Node is faster than bouncing buffers through a page canvas, and it removes the page dependency from image processing.
+- Source: user
+- Primary owning slice: M002/S03
+- Supporting slices: M002/S01
+- Validation: constrainScreenshot uses sharp(buffer).metadata() and sharp(buffer).resize(). Zero page.evaluate calls in capture.ts. Build passes.
+- Notes: sharp added as a dependency.
+
+### R021 — Opt-in screenshots on navigate
+- Class: core-capability
+- Status: validated
+- Description: browser_navigate does not capture or return a screenshot by default. An explicit parameter opts in to screenshot capture.
+- Why it matters: Significant token savings — the screenshot payload is large and often unnecessary.
+- Source: user
+- Primary owning slice: M002/S03
+- Supporting slices: none
+- Validation: browser_navigate has screenshot parameter default false. Capture gated. Build passes.
+- Notes: Default is off. The agent can still use browser_screenshot explicitly.
+
+### R022 — Form analysis tool (browser_analyze_form)
+- Class: core-capability
+- Status: validated
+- Description: A browser_analyze_form tool that returns field inventory including labels, names, types, required status, current values, validation state, and submit controls.
+- Why it matters: Collapses 3-8 tool calls for form analysis into one.
+- Source: user
+- Primary owning slice: M002/S04
+- Supporting slices: M002/S01
+- Validation: 7-level label resolution, form auto-detection, fieldset grouping, submit button discovery. Verified end-to-end against 12-field test form. Build passes.
+- Notes: Must handle label association via for/id, wrapping label, aria-label, aria-labelledby, and placeholder.
+
+### R023 — Form fill tool (browser_fill_form)
+- Class: core-capability
+- Status: validated
+- Description: A browser_fill_form tool that maps labels/names/placeholders to inputs and fills them with type-aware Playwright APIs.
+- Why it matters: Collapses 3-5 tool calls for form filling into one.
+- Source: user
+- Primary owning slice: M002/S04
+- Supporting slices: M002/S01
+- Validation: 5-strategy field resolution, type-aware fill via Playwright APIs, verified end-to-end with 10 fields. Build passes.
+- Notes: Returns matched fields, unmatched values, fields skipped, and validation state.
+
+### R024 — Intent-ranked element retrieval (browser_find_best)
+- Class: core-capability
+- Status: validated
+- Description: A browser_find_best tool that returns scored candidates using deterministic heuristic ranking for 8 semantic intents.
+- Why it matters: Cuts a round trip and reduces reasoning tokens for common element-finding tasks.
+- Source: user
+- Primary owning slice: M002/S05
+- Supporting slices: M002/S01
+- Validation: 8 intents implemented with 4-dimension scoring. Verified via Playwright tests. Build passes, tool count = 47.
+- Notes: Deterministic heuristics only. No hidden LLM calls.
+
+### R025 — Semantic action tool (browser_act)
+- Class: core-capability
+- Status: validated
+- Description: A browser_act tool that resolves the top candidate for a semantic intent and executes the action in one call.
+- Why it matters: Collapses 2-4 tool calls for common micro-tasks into one.
+- Source: user
+- Primary owning slice: M002/S05
+- Supporting slices: M002/S04
+- Validation: Resolves via same scoring engine as browser_find_best. Executes via Playwright locator. Returns before/after diff. Build passes, tool count = 47.
+- Notes: Builds on browser_find_best for element selection. Bounded — does not loop or retry.
+
+### R026 — Test coverage for new and refactored code
+- Class: quality-attribute
+- Status: validated
+- Description: Test suite covers shared browser-side utilities, settle logic, screenshot resizing, form tools, and intent ranking.
+- Why it matters: Provides regression protection for both the refactored code and the new features.
+- Source: user
+- Primary owning slice: M002/S06
+- Supporting slices: all M002 slices
+- Validation: 108 tests (63 unit + 45 integration) passing via `npm run test:browser-tools`.
+- Notes: Unit-test whatever is testable without a browser. Use Playwright integration tests for tools that need a page.
+
+## Deferred
+
+### R011 — Multi-milestone secret forecasting
+- Class: core-capability
+- Status: deferred
+- Description: Forecast secrets across all planned milestones, not just the active one.
+- Why it matters: Would provide a complete picture of all secrets needed for the project.
+- Source: user
+- Primary owning slice: none
+- Supporting slices: none
+- Validation: unmapped
+- Notes: Deferred — single-milestone forecasting is sufficient for now.
+
+### R012 — Secret rotation reminders
+- Class: operability
+- Status: deferred
+- Description: Track secret age and remind users when keys may need rotation.
+- Why it matters: Security best practice, but not essential for the core workflow.
+- Source: user
+- Primary owning slice: none
+- Supporting slices: none
+- Validation: unmapped
+- Notes: Deferred — out of scope for initial release.
+
+### R027 — Browser reuse across sessions
+- Class: core-capability
+- Status: deferred
+- Description: Keep a warm browser instance across rapid successive agent contexts to avoid ~2-3s Chrome cold-start per session.
+- Why it matters: Would eliminate Chrome launch latency in auto-mode.
+- Source: user
+- Primary owning slice: none
+- Supporting slices: none
+- Validation: unmapped
+- Notes: Deferred — skip completely per user direction.
+
+### R042 — Parallel milestone execution in multiple worktrees
+- Class: core-capability
+- Status: deferred
+- Description: Run multiple milestones simultaneously in separate worktrees with independent auto-mode sessions.
+- Why it matters: Natural extension of worktree-per-milestone architecture. Would enable parallel work streams.
+- Source: user
+- Primary owning slice: none
+- Supporting slices: none
+- Validation: unmapped
+- Notes: Deferred — ship sequential milestone execution first. The worktree infrastructure naturally supports this later.
+
+### R043 — Native libgit2 write operations
+- Class: quality-attribute
+- Status: deferred
+- Description: Extend the Rust/libgit2 native module to cover write operations (commit, merge, checkout) in addition to the current read-only queries.
+- Why it matters: Would eliminate execSync overhead for git writes on the hot path.
+- Source: inferred
+- Primary owning slice: none
+- Supporting slices: none
+- Validation: unmapped
+- Notes: Deferred — execSync writes are functional. Optimize later if profiling shows it matters.
+
+## Out of Scope
+
+### R013 — Curated service knowledge base
+- Class: anti-feature
+- Status: out-of-scope
+- Description: A static database of known services with pre-written guidance for each API key.
+- Why it matters: Prevents scope creep. LLM-generated guidance is sufficient and stays current without maintenance.
+- Source: user
+- Primary owning slice: none
+- Supporting slices: none
+- Validation: n/a
+- Notes: LLM generates guidance dynamically.
+
+### R014 — Just-in-time collection enhancement
+- Class: anti-feature
+- Status: out-of-scope
+- Description: Detect missing secrets during task execution and collect them inline.
+- Why it matters: Prevents scope confusion. M001 is about proactive collection, not reactive.
+- Source: user
+- Primary owning slice: none
+- Supporting slices: none
+- Validation: n/a
+- Notes: Existing secure_env_collect already handles reactive collection.
+
+### R028 — LLM-powered intent resolution
+- Class: anti-feature
+- Status: out-of-scope
+- Description: Using hidden LLM calls inside browser_find_best or browser_act for intent resolution.
+- Why it matters: Prevents unpredictable latency and cost.
+- Source: inferred
+- Primary owning slice: none
+- Supporting slices: none
+- Validation: n/a
+- Notes: browser_find_best and browser_act use scoring heuristics, not LLM inference.
+
+### R044 — Rebase merge strategy
+- Class: anti-feature
+- Status: out-of-scope
+- Description: Adding rebase as a merge strategy option alongside squash and --no-ff merge.
+- Why it matters: Rebase rewrites history, which conflicts with the "commit diary" philosophy. It also introduces more failure modes (rebase conflicts are harder to auto-resolve than merge conflicts).
+- Source: inferred
+- Primary owning slice: none
+- Supporting slices: none
+- Validation: n/a
+- Notes: --no-ff merge + squash covers all needed use cases without history rewriting.
+
+## Traceability
+
+| ID | Class | Status | Primary owner | Supporting | Proof |
+|---|---|---|---|---|---|
+| R001 | core-capability | validated | M001/S01 | none | plan-milestone.md Secret Forecasting section, parser round-trip tests |
+| R002 | continuity | validated | M001/S01 | none | parseSecretsManifest/formatSecretsManifest round-trip tested |
+| R003 | primary-user-loop | validated | M001/S02 | M001/S01 | collect-from-manifest.test.ts tests 6-8 |
+| R004 | primary-user-loop | validated | M001/S02 | none | collect-from-manifest.test.ts tests 4-5 |
+| R005 | primary-user-loop | validated | M001/S02 | none | manifest-status.test.ts tests 4,7; collect-from-manifest.test.ts tests 1-2 |
+| R006 | integration | validated | M001/S02 | none | collectSecretsFromManifest calls detectDestination() |
+| R007 | core-capability | validated | M001/S03 | M001/S01, M001/S02 | auto-secrets-gate.test.ts 3/3 pass |
+| R008 | core-capability | validated | M001/S03 | M001/S01, M001/S02 | guided-flow.ts calls startAuto() at lines 52, 486, 647, 794 |
+| R009 | integration | validated | M001/S01 | none | plan-milestone.md Secret Forecasting section line 62 |
+| R010 | primary-user-loop | validated | M001/S02 | none | collect-from-manifest.test.ts tests 6-8 |
+| R011 | core-capability | deferred | none | none | unmapped |
+| R012 | operability | deferred | none | none | unmapped |
+| R013 | anti-feature | out-of-scope | none | none | n/a |
+| R014 | anti-feature | out-of-scope | none | none | n/a |
+| R015 | quality-attribute | validated | M002/S01 | none | jiti load, 43 tools register, slim index, browser spot-check |
+| R016 | quality-attribute | validated | M002/S01 | none | window.__pi injection, zero inline redeclarations, survives navigation |
+| R017 | core-capability | validated | M002/S02 | M002/S01 | postActionSummary eliminated, consolidated capture pattern |
+| R018 | core-capability | validated | M002/S02 | none | explicit includeBodyText true/false per tool signal level |
+| R019 | core-capability | validated | M002/S02 | none | zero_mutation_shortcut settle reason, 60ms/30ms thresholds |
+| R020 | core-capability | validated | M002/S03 | M002/S01 | sharp-based constrainScreenshot, zero page.evaluate in capture.ts |
+| R021 | core-capability | validated | M002/S03 | none | screenshot param default false, capture gated |
+| R022 | core-capability | validated | M002/S04 | M002/S01 | 7-level label resolution, verified against 12-field test form |
+| R023 | core-capability | validated | M002/S04 | M002/S01 | 5-strategy field resolution, verified end-to-end with 10 fields |
+| R024 | core-capability | validated | M002/S05 | M002/S01 | 8-intent scoring, Playwright tests, differentiated rankings |
+| R025 | core-capability | validated | M002/S05 | M002/S04 | top candidate execution, settle + diff, graceful error |
+| R026 | quality-attribute | validated | M002/S06 | all M002 | 108 tests passing via npm run test:browser-tools |
+| R027 | core-capability | deferred | none | none | unmapped |
+| R028 | anti-feature | out-of-scope | none | none | n/a |
+| R029 | core-capability | active | M003/S01 | none | unmapped |
+| R030 | core-capability | active | M003/S03 | M003/S01 | unmapped |
+| R031 | core-capability | active | M003/S02 | M003/S01 | unmapped |
+| R032 | core-capability | active | M003/S03 | none | unmapped |
+| R033 | core-capability | active | M003/S04 | none | unmapped |
+| R034 | core-capability | active | M003/S04 | M003/S03 | unmapped |
+| R035 | core-capability | active | M003/S05 | M003/S01, M003/S02, M003/S03 | unmapped |
+| R036 | quality-attribute | active | M003/S02 | M003/S06 | unmapped |
+| R037 | primary-user-loop | active | M003/S05 | all M003 | unmapped |
+| R038 | continuity | active | M003/S04 | none | unmapped |
+| R039 | integration | active | M003/S01 | none | unmapped |
+| R040 | operability | active | M003/S06 | M003/S05 | unmapped |
+| R041 | quality-attribute | active | M003/S07 | all M003 | unmapped |
+| R042 | core-capability | deferred | none | none | unmapped |
+| R043 | quality-attribute | deferred | none | none | unmapped |
+| R044 | anti-feature | out-of-scope | none | none | n/a |
+
+## Coverage Summary
+
+- Active requirements: 13
+- Mapped to slices: 13
+- Validated: 22
+- Deferred: 5
+- Out of scope: 4
+- Unmapped active requirements: 0
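The Coverage Summary counts can be cross-checked mechanically against the Traceability table. The sketch below assumes the six-column row layout used in that table; `tallyStatuses` is a hypothetical helper, not part of the GSD codebase.

```typescript
type Status = "active" | "validated" | "deferred" | "out-of-scope";

// Counts requirements per status from Traceability rows shaped like
// "| R001 | class | status | owner | supporting | proof |".
// Header and separator rows are skipped because their ID cell is not "R<n>".
function tallyStatuses(tableLines: string[]): Record<Status, number> {
  const counts: Record<Status, number> = {
    active: 0,
    validated: 0,
    deferred: 0,
    "out-of-scope": 0,
  };
  for (const line of tableLines) {
    const cells = line.split("|").map((c) => c.trim());
    // cells[0] is whatever precedes the leading pipe and is ignored.
    const id = cells[1] ?? "";
    const status = cells[3] ?? "";
    if (!/^R\d+$/.test(id)) continue;
    if (
      status === "active" ||
      status === "validated" ||
      status === "deferred" ||
      status === "out-of-scope"
    ) {
      counts[status] += 1;
    }
  }
  return counts;
}
```

Counting the 44 rows of the table by hand gives 13 active, 22 validated, 5 deferred, and 4 out-of-scope, which matches the summary above.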
--- a/.gsd/STATE.md
+++ b/.gsd/STATE.md
@ -0,0 +1,23 @@
+# GSD State
+
+**Active Milestone:** M003 — Worktree-Isolated Git Architecture
+**Active Slice:** None
+**Phase:** pre-planning
+
+## Milestone Registry
+- ✅ **M001:** Proactive Secret Management
+- ✅ **M002:** Browser Tools Performance & Intelligence
+- 🔄 **M003:** Worktree-Isolated Git Architecture
+
+## Recent Decisions
+- D027: Git isolation model — worktree-per-milestone (default for new projects)
+- D028: Slice merge strategy within worktree — --no-ff merge
+- D029: Milestone-to-main merge strategy — squash merge
+- D030: Failure handling philosophy — stop but self-heal
+- D031: Target user priority — vibe coder first
+
+## Blockers
+- None
+
+## Next Action
+Research and plan M003. Context and roadmap are written. Ready for auto-mode.
--- a/.gsd/milestones/M003/M003-CONTEXT.md
+++ b/.gsd/milestones/M003/M003-CONTEXT.md
@ -0,0 +1,114 @@
+# M003: Worktree-Isolated Git Architecture
+
+**Gathered:** 2026-03-14
+**Status:** Ready for planning
+
+## Project Description
+
+Overhaul GSD's git system to use worktree-per-milestone isolation as the default model. Each milestone gets its own git worktree with an isolated `.gsd/` directory, eliminating the entire category of `.gsd/` merge conflicts that have caused ~15 separate bug fixes to date. Slices merge into the milestone branch via `--no-ff` (preserving full commit history as a diary of the agent's work). Milestones squash-merge to main on completion (keeping main clean). The system is automagical for vibe coders — zero git errors, zero git knowledge required — and configurable for senior engineers via preferences.
+
+## Why This Milestone
+
+The current branch-per-slice model shares `.gsd/` state across branches, causing merge conflicts that halt auto-mode. The CHANGELOG shows a pattern: each fix leads to a new edge case. The root cause is structural — sharing mutable state across branches. Worktree isolation eliminates the problem architecturally rather than patching symptoms.
+
+## User-Visible Outcome
+
+### When this milestone is complete, the user can:
+
+- Run `/gsd auto` on a new project and have it execute start-to-finish without any git errors, merge conflicts, or mysterious halts
+- See clean `git log` on main with one commit per completed milestone
+- Configure `git.merge_to_main: "slice"` in preferences to get slice-level integration if they want it
+- Run `/gsd doctor` to detect and fix git-related issues
+- Use manual `/worktree` alongside auto-mode without conflicts
+
+### Entry point / environment
+
+- Entry point: `/gsd auto` CLI command, `/gsd doctor` CLI command
+- Environment: local dev — any git repository
+- Live dependencies involved: git CLI, optional libgit2 native module
+
+## Completion Class
+
+- Contract complete means: auto-worktree create/teardown lifecycle works, slice merges use `--no-ff`, milestone squashes to main, preferences switch between modes, self-heal recovers from common failures, all tests pass
+- Integration complete means: the full auto-mode lifecycle (startAuto → dispatch units → complete slices → complete milestone → merge to main) works end-to-end in a real git repo with real file changes
+- Operational complete means: existing projects on branch-per-slice model continue working unchanged, manual `/worktree` coexists without conflicts
+
+## Final Integrated Acceptance
+
+To call this milestone complete, we must prove:
+
+- Auto-mode on a fresh project creates a worktree, executes through multiple slices, and merges the milestone to main — with zero git errors
+- An existing project with branch-per-slice history continues working identically (no regression)
+- A deliberately introduced merge conflict is self-healed without user intervention
+- `git log main` shows exactly one squash commit per completed milestone
+- `git log milestone/M003` shows full commit history with `--no-ff` merge boundaries per slice
+
+## Risks and Unknowns
+
+- **`process.chdir` in auto-mode** — auto-mode currently passes `basePath` to all functions but doesn't `chdir`. Worktree mode needs `chdir` into the worktree so that all tool calls (bash, read, write, edit) resolve against the worktree. The worktree-command.ts already does this, but auto-mode doesn't. Risk: some codepath uses `basePath` while another uses `process.cwd()`, causing split-brain.
+- **Worktree `.gsd/` inheritance** — when a worktree is created, it gets a copy of the project files from the milestone branch base. But `.gsd/` planning files from the main tree may or may not be wanted in the worktree. Need to decide: copy planning state or start fresh.
+- **State machine re-entry** — if auto-mode is paused and resumed, the worktree must be re-entered (if it still exists). The pause/resume logic in `startAuto` needs to handle this.
+- **Existing orphan recovery** — the current `mergeOrphanedSliceBranches` logic needs to work within the worktree context, not just on main.
+
+> See `.gsd/DECISIONS.md` for all architectural and pattern decisions — it is an append-only register; read it during planning, append to it during execution.
+
+## Relevant Requirements
+
+- R029 — Auto-worktree creation on milestone start
+- R030 — Auto-worktree teardown + squash-merge on milestone complete
+- R031 — `--no-ff` slice merges within milestone worktree
+- R032 — Rich milestone-level squash commit message
+- R033 — `git.isolation` preference
+- R034 — `git.merge_to_main` preference
+- R035 — Self-healing git repair on failure
+- R036 — `.gsd/` conflict resolution elimination
+- R037 — Zero git errors for vibe coders
+- R038 — Backwards compatibility with branch-per-slice model
+- R039 — Manual `/worktree` coexistence with auto-worktrees
+- R040 — Doctor git health checks
+- R041 — Test coverage for worktree-isolated flow
+
+## Scope
+
+### In Scope
+
+- Auto-worktree lifecycle wired into `startAuto()` and `complete-milestone`
+- `--no-ff` merge for slices within worktree, squash for milestone to main
+- `git.isolation` and `git.merge_to_main` preferences with validation
+- Self-healing git repair (abort, reset, retry) for common failure modes
+- Doctor git health checks (orphaned worktrees, stale branches, corrupt state)
+- Simplification of `.gsd/` conflict resolution code (worktree mode only)
+- Test suite for both worktree and branch isolation modes
+- Backwards compatibility with existing branch-per-slice projects
+
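The abort, reset, retry sequence named under self-healing git repair could look roughly like the sketch below. Everything here is hypothetical: `RepairOps`, `runWithSelfHeal`, the two-attempt default, and the pause message are illustrations, and the real implementation would wrap git CLI calls rather than injected callbacks.

```typescript
// Hypothetical shape: the real repair code would wrap git CLI calls.
interface RepairOps {
  attempt: () => void; // the merge or checkout; throws on failure
  abort: () => void;   // e.g. abort an in-progress merge
  reset: () => void;   // e.g. restore a clean working tree
}

type HealResult =
  | { ok: true; attempts: number }
  | { ok: false; attempts: number; message: string };

// Tries the operation, cleaning up and retrying after each failure.
// maxAttempts = 2 is an assumed default, not a value from the source.
function runWithSelfHeal(ops: RepairOps, maxAttempts = 2): HealResult {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      ops.attempt();
      return { ok: true, attempts: attempt };
    } catch {
      // Leave the repository clean before retrying or giving up.
      ops.abort();
      ops.reset();
    }
  }
  return {
    ok: false,
    attempts: maxAttempts,
    message: "Auto-mode paused: some changes need your review before continuing.",
  };
}
```

Injecting the operations keeps the retry policy testable without a repository; only the final failure surfaces to the user, and it does so as a plain-language message rather than raw git output.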
+### Out of Scope / Non-Goals
+
+- Parallel milestone execution (deferred to future milestone)
+- Native libgit2 write operations (deferred)
+- Rebase merge strategy (anti-feature — conflicts with commit diary philosophy)
+- Remote git operations beyond existing auto-push
+
+## Technical Constraints
+
+- Must work with git CLI (libgit2 native module is optional, read-only)
+- `process.chdir` is the mechanism for worktree switching (proven in worktree-command.ts)
+- All file tools (read, write, edit, bash) resolve against `process.cwd()` — this is the reason `chdir` works
+- Source files are in `src/resources/extensions/gsd/`, tests in `src/resources/extensions/gsd/tests/`
+- Tests run via `npm run test:unit` and `npm run test:integration`
+
+## Integration Points
+
+- `auto.ts` — primary integration point for worktree lifecycle in `startAuto()`, `dispatchNextUnit()`, `handleAgentEnd()`
+- `git-service.ts` — `GitServiceImpl` class owns all git mutation operations
+- `worktree.ts` — thin facade over `GitServiceImpl`, exports `ensureSliceBranch`, `mergeSliceToMain`, etc.
+- `worktree-manager.ts` — existing worktree create/list/remove/merge operations
+- `worktree-command.ts` — manual `/worktree` command with `process.chdir` handling
+- `preferences.ts` — preference validation and loading
+- `doctor.ts` — health check and auto-fix system
+- `native-git-bridge.ts` — libgit2 read operations
+- `dispatch-guard.ts` — prior-slice completion checking
+
+## Open Questions
+
+- **Worktree naming convention for auto-worktrees** — should auto-worktrees use the milestone ID as the name (`.gsd/worktrees/M003/`) or a prefixed name (`.gsd/worktrees/auto-M003/`)? Current thinking: bare milestone ID is cleaner and the branch convention (`milestone/M003` vs `worktree/<name>`) disambiguates from manual worktrees.
+- **`.gsd/` file handling on worktree creation** — should the worktree inherit the main tree's `.gsd/` planning files, or should they be cleared for a fresh start? Current thinking: inherit — the worktree needs the milestone's CONTEXT.md and ROADMAP.md to continue planning.
--- a/.gsd/milestones/M003/M003-ROADMAP.md
+++ b/.gsd/milestones/M003/M003-ROADMAP.md
@ -0,0 +1,173 @@
+# M003: Worktree-Isolated Git Architecture
+
+**Vision:** Overhaul GSD's git system so that auto-mode is automagical — zero git errors, zero merge conflicts, zero user intervention required. Each milestone gets its own isolated worktree. Main is always clean. The system just runs.
+
+## Success Criteria
+
+- Auto-mode on a fresh project executes through an entire milestone without any git errors or halts
+- Main branch only receives commits when milestones complete (one squash commit per milestone)
+- Full commit history preserved within milestone worktree branches via `--no-ff` slice merges
+- Existing branch-per-slice projects continue working identically — zero regressions
+- Self-healing resolves common git failures (merge conflict, checkout issue, corrupt state) without user intervention
+- `/gsd doctor` detects and fixes git health issues (orphaned worktrees, stale branches, corrupt merge state)
+
+## Key Risks / Unknowns
+
+- **`process.chdir` coherence in auto-mode** — all tool calls must resolve against the worktree path after chdir. The worktree-command.ts has proven this works, but auto-mode's `basePath` variable and `process.cwd()` must stay in sync.
+- **Worktree `.gsd/` inheritance** — creating a worktree copies project files from the base branch. `.gsd/` planning files (CONTEXT, ROADMAP) must carry through; runtime files (STATE.md, metrics, activity) must not cause conflicts.
+- **State machine re-entry on resume** — pausing and resuming auto-mode must re-enter the worktree if it exists. The current pause/resume logic doesn't handle this.
+
+## Proof Strategy
+
+- `process.chdir` coherence → retire in S01 by proving auto-mode dispatches and executes a unit inside the worktree with all file operations resolving correctly
+- Worktree `.gsd/` inheritance → retire in S01 by proving planning files are available after worktree creation and runtime files don't conflict
+- State machine re-entry → retire in S01 by proving pause/resume correctly re-enters the worktree
+
+## Verification Classes
+
+- Contract verification: git operations produce expected branch state, file layout, and commit history in temp repos
+- Integration verification: full auto-mode lifecycle (create worktree → execute slices → merge milestone → teardown) in a real git repo
+- Operational verification: existing branch-per-slice projects continue working; manual `/worktree` coexists
+- UAT / human verification: run auto-mode on a real project and confirm zero git errors
+
+## Milestone Definition of Done
+
+This milestone is complete only when all are true:
+
+- Auto-worktree lifecycle works end-to-end (create, execute, merge, teardown)
+- `--no-ff` slice merges produce correct history on milestone branch
+- Milestone squash to main produces clean single commit
+- `git.isolation` and `git.merge_to_main` preferences work with validation
+- Self-healing recovers from common git failures without user intervention
+- Existing branch-per-slice projects pass all existing tests
+- `/gsd doctor` detects and fixes git health issues
+- Full test suite passes for both worktree and branch isolation modes
+- Success criteria re-checked against live behavior
+
+## Requirement Coverage
+
+- Covers: R029, R030, R031, R032, R033, R034, R035, R036, R037, R038, R039, R040, R041
+- Partially covers: none
+- Leaves for later: R042 (parallel milestones), R043 (native libgit2 writes)
+- Orphan risks: none
+
+## Slices
+
+- [ ] **S01: Auto-worktree lifecycle in auto-mode** `risk:high` `depends:[]`
+  > After this: `startAuto()` on a new milestone creates a worktree under `.gsd/worktrees/M003/`, `chdir`s into it, and dispatches units inside the worktree. Pause/resume re-enters the worktree. Progress widget shows the worktree branch. Verified via running auto-mode unit dispatch in a temp repo worktree.
+
+- [ ] **S02: --no-ff slice merges + conflict elimination** `risk:high` `depends:[S01]`
+  > After this: completed slices merge into the milestone branch via `--no-ff` instead of squash. The `.gsd/` auto-resolve conflict code in `mergeSliceToMain` is bypassed in worktree mode. `git log` on the milestone branch shows full commit history with merge commit boundaries per slice. Verified in temp repo.
+
+- [ ] **S03: Milestone-to-main squash merge + worktree teardown** `risk:high` `depends:[S01,S02]`
+  > After this: `complete-milestone` squash-merges the milestone branch to main with a rich commit message listing all slices, removes the worktree, and `chdir`s back to the main project root. `git log main` shows one clean commit. Auto-push works if enabled. Verified in temp repo with remote.
+
+- [ ] **S04: Preferences + backwards compatibility** `risk:medium` `depends:[S01]`
+  > After this: `git.isolation: "worktree"` (default for new projects) / `"branch"` (existing projects) and `git.merge_to_main: "milestone"` / `"slice"` preferences are validated and respected. An existing project with `gsd/*` branches defaults to branch mode and works identically to today. Verified by running tests in both modes.
+
+- [ ] **S05: Self-healing git repair** `risk:medium` `depends:[S01,S02,S03]`
+  > After this: when a merge fails or checkout breaks during auto-mode, the system aborts the failed operation, resets working tree state, and retries. Only truly unresolvable conflicts (real code conflicts between human-edited files) pause auto-mode. Users see non-technical messages, not raw git errors. Verified by deliberately introducing failures and confirming auto-recovery.
+
+- [ ] **S06: Doctor + cleanup + code simplification** `risk:low` `depends:[S01,S02,S03,S05]`
+  > After this: `/gsd doctor` detects orphaned auto-worktrees, stale milestone branches, corrupt merge state (MERGE_HEAD/SQUASH_MSG), and tracked runtime files — and fixes them. Dead `.gsd/` conflict resolution code removed from worktree-mode paths in git-service.ts. Verified via doctor test cases.
+
+- [ ] **S07: Test suite for worktree-isolated flow** `risk:low` `depends:[S01,S02,S03,S04,S05,S06]`
+  > After this: full test coverage for auto-worktree create/teardown, `--no-ff` slice merge, milestone squash, preference switching, self-heal scenarios, doctor checks. All existing git tests still pass. Both isolation modes tested. Verified via `npm run test:unit && npm run test:integration`.
+
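The S04 preference rules can be sketched as a validator plus a default resolver. The allowed values and the "existing `gsd/*` branches default to branch mode" rule come from the slice description; `validateGitPrefs`, `resolveIsolation`, the error wording, and the rule that an explicit preference always wins are assumptions, not the actual preferences.ts API.

```typescript
type IsolationMode = "worktree" | "branch";

// Raw preference values as they might arrive from a config file.
interface GitPrefs {
  isolation?: unknown;
  merge_to_main?: unknown;
}

// Returns a list of human-readable validation errors (empty when valid).
function validateGitPrefs(prefs: GitPrefs): string[] {
  const errors: string[] = [];
  const { isolation, merge_to_main } = prefs;
  if (isolation !== undefined && isolation !== "worktree" && isolation !== "branch") {
    errors.push('git.isolation must be "worktree" or "branch"');
  }
  if (merge_to_main !== undefined && merge_to_main !== "milestone" && merge_to_main !== "slice") {
    errors.push('git.merge_to_main must be "milestone" or "slice"');
  }
  return errors;
}

// Assumption: an explicit preference wins. Otherwise projects that already
// have gsd/* slice branches stay on branch mode and new projects get worktrees.
function resolveIsolation(
  pref: IsolationMode | undefined,
  hasLegacySliceBranches: boolean,
): IsolationMode {
  if (pref !== undefined) return pref;
  return hasLegacySliceBranches ? "branch" : "worktree";
}
```

Splitting validation from resolution lets the default depend on repository state while the validator stays a pure check on the config values.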
+<!--
+  Format rules (parsers depend on this exact structure):
+  - Checkbox line: - [ ] **S01: Title** `risk:high|medium|low` `depends:[S01,S02]`
+  - Demo line:     >  After this: one sentence showing what's demoable
+  - Mark done:     change [ ] to [x]
+  - Order slices by risk (highest first)
+  - Each slice must be a vertical, demoable increment — not a layer
+  - If all slices are completed exactly as written, the milestone's promised outcome should actually work at the stated proof level
+  - depends:[X,Y] means X and Y must be done before this slice starts
+-->
+
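Since the format rules above say parsers depend on the exact checkbox-line structure, here is a minimal sketch of what such a parser might look like. `parseSliceLine` and `SliceEntry` are hypothetical names; the real parser in the GSD codebase may differ.

```typescript
interface SliceEntry {
  id: string;
  title: string;
  done: boolean;
  risk: "high" | "medium" | "low";
  depends: string[];
}

// Matches: - [ ] **S01: Title** `risk:high` `depends:[S01,S02]`
const SLICE_LINE =
  /^- \[( |x)\] \*\*(S\d+): (.+?)\*\* `risk:(high|medium|low)` `depends:\[([^\]]*)\]`$/;

// Parses one roadmap checkbox line; returns null for any other line.
function parseSliceLine(line: string): SliceEntry | null {
  const m = SLICE_LINE.exec(line.trim());
  if (!m) return null;
  const [, mark = "", id = "", title = "", risk = "low", deps = ""] = m;
  return {
    id,
    title,
    done: mark === "x",
    risk: risk as SliceEntry["risk"],
    depends: deps
      .split(",")
      .map((d) => d.trim())
      .filter((d) => d.length > 0),
  };
}
```

A strict regular expression fails closed: a line that drifts from the documented format returns `null` instead of being half-parsed.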
+## Boundary Map
+
+### S01 → S02, S03, S04, S05
+
+Produces:
+- `createAutoWorktree(basePath, milestoneId)` — creates worktree, returns worktree path
+- `teardownAutoWorktree(basePath, milestoneId)` — removes worktree, returns to main tree
+- `isInAutoWorktree(basePath)` → boolean — detects if currently in an auto-worktree
+- `getAutoWorktreePath(basePath, milestoneId)` → string | null — resolves worktree path
+- `enterAutoWorktree(basePath, milestoneId)` — `process.chdir` into existing worktree
+- Updated `startAuto()` in auto.ts that creates/enters worktree on milestone start
+- Updated pause/resume logic that re-enters worktree on resume
+
+Consumes:
+- nothing (first slice)
+
+### S01 → S02
+
+Produces:
+- The worktree infrastructure that S02 merges slices within
+
+Consumes:
+- nothing (first slice)
+
+### S02 → S03
+
+Produces:
+- `mergeSliceToMilestone(basePath, milestoneId, sliceId, sliceTitle)` — `--no-ff` merge of slice branch into milestone branch within worktree
+- Simplified merge path that skips `.gsd/` conflict resolution in worktree mode
+
+Consumes from S01:
+- `isInAutoWorktree()` to determine which merge strategy to use
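As a rough illustration of the `--no-ff` merge that `mergeSliceToMilestone` is described as performing, the git arguments might be assembled like this. `noFastForwardMergeArgs` is a hypothetical helper, and the branch name and commit message are placeholders supplied by the caller; the roadmap does not specify either.

```typescript
// Hypothetical helper: argv for `git merge --no-ff`, to be run inside the worktree
// while the milestone branch is checked out.
function noFastForwardMergeArgs(sliceBranch: string, message: string): string[] {
  // --no-ff forces a merge commit even when a fast-forward is possible,
  // so each slice stays visible as its own unit in the milestone history.
  return ["merge", "--no-ff", "-m", message, sliceBranch];
}
```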
+
+### S02 → S06
+
+Produces:
+- Knowledge of which conflict resolution code is dead in worktree mode
+
+Consumes from S01:
+- Worktree detection functions
+
+### S03 → S05
+
+Produces:
+- `mergeMilestoneToMain(basePath, milestoneId)` — squash-merge milestone branch to main
+- `buildMilestoneCommitMessage(milestoneId, milestoneTitle, slices)` — rich squash commit
+
+Consumes from S01:
+- `teardownAutoWorktree()` for worktree removal after merge
+- `isInAutoWorktree()` for detection
+
+Consumes from S02:
+- Merged milestone branch with `--no-ff` slice history
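One possible shape for `buildMilestoneCommitMessage` is sketched below. The signature follows the boundary map, but the `SliceSummary` type and the message layout (subject line, blank line, one bullet per slice) are assumptions made for illustration; the roadmap only says the squash commit should be "rich".

```typescript
// Assumed element type for the `slices` parameter; the real type is not given.
interface SliceSummary {
  id: string;
  title: string;
}

function buildMilestoneCommitMessage(
  milestoneId: string,
  milestoneTitle: string,
  slices: SliceSummary[],
): string {
  const subject = `${milestoneId}: ${milestoneTitle}`;
  if (slices.length === 0) return subject;
  // One bullet per merged slice so the squash commit still records what it contains.
  const body = slices.map((s) => `- ${s.id}: ${s.title}`).join("\n");
  return `${subject}\n\n${body}`;
}
```

The resulting string could be passed to `git commit -m` after a `git merge --squash` of the milestone branch.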
+
+### S04 → S01, S02, S03
+
+Produces:
+- `git.isolation` preference — `"worktree"` | `"branch"`
+- `git.merge_to_main` preference — `"milestone"` | `"slice"`
+- `shouldUseWorktreeIsolation(basePath)` — resolves effective isolation mode
+- Preference validation in `preferences.ts`
+
+Consumes from S01:
+- Auto-worktree functions (gated by isolation preference)
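The preference validation could look roughly like the following. The allowed values come from the list above; everything else is an assumption for illustration: the `validateGitPreferences` name, the `hasExistingGsdBranches` flag, and in particular the choice to default `merge_to_main` to `"slice"` for existing projects, which the roadmap implies but does not state outright.

```typescript
type IsolationMode = "worktree" | "branch";
type MergeToMain = "milestone" | "slice";

interface GitPreferences {
  isolation: IsolationMode;
  merge_to_main: MergeToMain;
}

// Hypothetical validator; the real logic is planned for preferences.ts.
function validateGitPreferences(
  raw: { isolation?: unknown; merge_to_main?: unknown },
  hasExistingGsdBranches: boolean,
): { prefs: GitPreferences; errors: string[] } {
  const errors: string[] = [];

  // Defaults: new projects get worktree mode, existing gsd/* projects keep branch mode.
  let isolation: IsolationMode = hasExistingGsdBranches ? "branch" : "worktree";
  // Assumption: merge_to_main follows the same new/existing split.
  let mergeToMain: MergeToMain = hasExistingGsdBranches ? "slice" : "milestone";

  const rawIsolation = raw.isolation;
  if (rawIsolation === "worktree" || rawIsolation === "branch") {
    isolation = rawIsolation;
  } else if (rawIsolation !== undefined) {
    errors.push(`git.isolation must be "worktree" or "branch", got ${JSON.stringify(rawIsolation)}`);
  }

  const rawMerge = raw.merge_to_main;
  if (rawMerge === "milestone" || rawMerge === "slice") {
    mergeToMain = rawMerge;
  } else if (rawMerge !== undefined) {
    errors.push(`git.merge_to_main must be "milestone" or "slice", got ${JSON.stringify(rawMerge)}`);
  }

  return { prefs: { isolation, merge_to_main: mergeToMain }, errors };
}
```

Invalid values fall back to the default and are reported, rather than thrown, so a caller can decide whether a bad preference should block auto-mode.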
+
+### S05 → S06
+
+Produces:
+- Structured git error handling patterns (try/abort/reset/retry)
+- User-facing error message formatting
+
+Consumes from S01:
+- Worktree detection (to scope repair to the correct working tree)
+
+Consumes from S02:
+- Merge operations that may fail
+
+Consumes from S03:
+- Milestone merge that may fail
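The try/abort/reset/retry pattern named above might be structured like this. It is a sketch under stated assumptions: `RepairSteps`, `RepairResult`, and `runWithSelfHeal` are hypothetical names, the retry limit of two attempts is arbitrary, and the user-facing message is placeholder wording. The git commands are injected as callbacks so the control flow can be shown without running git.

```typescript
// Hypothetical callbacks; in practice each would shell out to git.
interface RepairSteps {
  attempt: () => void; // the merge or checkout; throws on failure
  abort: () => void;   // e.g. `git merge --abort`
  reset: () => void;   // e.g. `git reset --hard`
}

type RepairResult =
  | { ok: true; attempts: number }
  | { ok: false; attempts: number; userMessage: string };

function runWithSelfHeal(steps: RepairSteps, maxAttempts = 2): RepairResult {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      steps.attempt();
      return { ok: true, attempts: attempt };
    } catch {
      // Abort may itself fail when no operation is in progress; ignore that case.
      try {
        steps.abort();
      } catch {
        /* nothing to abort */
      }
      // Return the working tree to a clean state before retrying or pausing.
      steps.reset();
    }
  }
  return {
    ok: false,
    attempts: maxAttempts,
    // Placeholder wording: plain language instead of a raw git error.
    userMessage: "The merge could not be completed automatically. Auto-mode is paused so the conflict can be resolved.",
  };
}
```

Returning a result object instead of rethrowing keeps raw git errors away from the user and lets the caller decide when to pause auto-mode.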
+
+### S06 → S07
+
+Produces:
+- Doctor git health check functions
+- Simplified git-service.ts with dead code removed
+
+Consumes from S05:
+- Error handling patterns for doctor fix operations
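A doctor check for leftover merge state (the `MERGE_HEAD`/`SQUASH_MSG` case from S06) could be as small as the sketch below. `detectStaleMergeState` and `FileExists` are hypothetical names; the file-existence test is injected so the example stays self-contained, and a real check would presumably pass `fs.existsSync` together with the directory reported by `git rev-parse --git-dir`, since a worktree's git directory is not simply `.git/`.

```typescript
// Injected so the check can be exercised without touching the filesystem.
type FileExists = (path: string) => boolean;

// Returns the names of leftover merge-state files found in the given git directory.
function detectStaleMergeState(gitDir: string, exists: FileExists): string[] {
  const markers = ["MERGE_HEAD", "SQUASH_MSG"];
  return markers.filter((name) => exists(`${gitDir}/${name}`));
}
```

An empty result means no interrupted merge or squash was detected; a non-empty one tells the fix step which state to clean up.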