From 24597873a046e45696989d9e45167cad659949fa Mon Sep 17 00:00:00 2001 From: Lex Christopherson Date: Sat, 14 Mar 2026 22:02:26 -0600 Subject: [PATCH] docs(M003): context, requirements, and roadmap --- .gsd/DECISIONS.md | 41 ++ .gsd/PROJECT.md | 43 +++ .gsd/REQUIREMENTS.md | 553 +++++++++++++++++++++++++++ .gsd/STATE.md | 23 ++ .gsd/milestones/M003/M003-CONTEXT.md | 114 ++++++ .gsd/milestones/M003/M003-ROADMAP.md | 173 +++++++++ 6 files changed, 947 insertions(+) create mode 100644 .gsd/DECISIONS.md create mode 100644 .gsd/PROJECT.md create mode 100644 .gsd/REQUIREMENTS.md create mode 100644 .gsd/STATE.md create mode 100644 .gsd/milestones/M003/M003-CONTEXT.md create mode 100644 .gsd/milestones/M003/M003-ROADMAP.md diff --git a/.gsd/DECISIONS.md b/.gsd/DECISIONS.md new file mode 100644 index 000000000..e2cab5d0c --- /dev/null +++ b/.gsd/DECISIONS.md @@ -0,0 +1,41 @@ +# Decisions Register + + + +| # | When | Scope | Decision | Choice | Rationale | Revisable? | +|---|------|-------|----------|--------|-----------|------------| +| D001 | M001 | arch | Secret collection insertion point | At `/gsd auto` entry (startAuto), not as a dispatch unit type | Keeps the state machine untouched. Collection is a one-time gate, not a repeating unit. Simpler, less risk of dispatch loop bugs. | Yes — if collection needs to happen mid-milestone | +| D002 | M001 | convention | Manifest file naming | `M00x-SECRETS.md` via existing `resolveMilestoneFile(base, mid, "SECRETS")` | Consistent with all other milestone-level files (CONTEXT, ROADMAP, RESEARCH). No new path resolver needed. | No | +| D003 | M001 | pattern | Summary screen interactivity | Read-only with auto-skip (no interactive deselection) | Matches the "walk away" philosophy. Simpler UX, fewer edge cases. User can always re-run collection. | Yes — if users request deselection | +| D004 | M001 | pattern | Guidance display placement | Same page as masked input (above the editor) | Single page per key — no extra navigation. 
User sees guidance while entering the value. | Yes — if terminal height constraints cause problems | +| D005 | M001 | convention | Manifest format | Markdown with H3 sections per key, bold fields, numbered guidance | Consistent with all other .gsd files. Parser and formatter already exist in files.ts. | No | +| D006 | M001 | arch | Destination inference | Reuse existing `detectDestination()` from get-secrets-from-user.ts | Simple file-presence checks (vercel.json → Vercel, convex/ → Convex, default → .env). Already proven. | Yes — if per-key destination override needed | +| D007 | M002 | arch | File structure after module split | Split index.ts into state.ts, lifecycle.ts, capture.ts, settle.ts, refs.ts, utils.ts, evaluate-helpers.ts, and tools/ directory | 5000-line monolith is unmaintainable; module boundaries enable safe changes. core.js already established the pattern. | No | +| D008 | M002 | library | Image resizing library | sharp | Fast, well-maintained, standard Node image processing. Replaces fragile canvas-based approach that depends on page context. | No | +| D009 | M002 | convention | Navigate screenshot default | Off by default, opt-in via parameter | Big token savings. Agent uses browser_screenshot explicitly when visual verification needed. | Yes — if agents consistently need screenshots on navigate | +| D010 | M002 | arch | Browser-side utility injection | page.addInitScript under window.__pi namespace | Survives navigation, available before page scripts, namespaced to avoid collisions. | Yes — if timing issues discovered | +| D011 | M002 | convention | Intent resolution approach | Deterministic heuristics only, no LLM calls | Predictable latency and cost. Scoring functions are testable and debuggable. | Yes — if heuristic coverage proves insufficient | +| D012 | M002 | convention | Browser reuse across sessions | Skip completely | Architecturally different from within-session work; user directed to exclude entirely. 
| No | +| D013 | M002/S01 | pattern | Mutable state accessor pattern | get/set functions for all 18 state variables, not `export let` | ES module live bindings break under jiti's CJS shim. Accessors guarantee consumers see mutations. | No | +| D014 | M002/S01 | pattern | ToolDeps interface location | Defined in state.ts alongside types it references | Keeps the dependency graph simple — tool files import state.ts for ToolDeps + types. | Yes — could move to separate types.ts if state.ts grows | +| D015 | M002/S01 | pattern | Factory pattern for lifecycle-dependent utils | createGetLivePagesSnapshot(ensureBrowser) instead of direct import | Avoids circular dependency between utils.ts and lifecycle.ts. Wired at orchestrator level. | No | +| D016 | M002/S01 | pattern | Tool file import strategy | Tool files import state accessors and core.js functions directly — ToolDeps carries only infrastructure functions needing lifecycle wiring | Keeps ToolDeps lean. State accessors are stable imports, not runtime-wired dependencies. Avoids bloating the deps interface with every utility. | Yes — if ToolDeps grows unwieldy | +| D017 | M002/S02 | pattern | Action tool signal classification | High-signal: click, type, key_press, select_option, set_checked, navigate, click_ref, fill_ref. Low-signal: scroll, hover, drag, upload_file, hover_ref. | High-signal tools produce meaningful page changes worth capturing body text for diffs. Low-signal tools don't change page content. fill_ref is high-signal because input value changes affect form state. | Yes — if new tools need reclassification | +| D018 | M002/S02 | pattern | postActionSummary retention | Keep postActionSummary in capture.ts for summary-only tools (go_back, go_forward, reload) but remove from action tools that do before/after diff | Summary-only tools don't do diffs and don't need beforeState — postActionSummary is the right abstraction for them. Action tools need consolidated capture. 
| Yes — could remove entirely if summary-only tools get before/after diff | +| D019 | M002/S02 | tuning | Zero-mutation settle thresholds | 60ms detection window, 30ms shortened quiet window, totalMutationsSeen === 0 required | Conservative thresholds — 60ms is enough time for any async DOM update to start, 30ms shortened window still catches late mutations. Requiring zero total mutations (not just current poll) prevents false short-circuits. | Yes — if real-world testing shows 60ms is too short for slow SPAs | +| D020 | M002/S04 | pattern | Form analysis evaluate location | Form analysis evaluate logic lives in tools/forms.ts, not extracted to evaluate-helpers.ts | Form-specific, not a shared utility. The label resolution heuristic is only used by form tools. Keeping it local avoids bloating the shared injection. | Yes — if S05 intent tools need label resolution | +| D021 | M002/S04 | pattern | Fill uses Playwright APIs, not evaluate | browser_fill_form uses Playwright locator.fill()/selectOption()/setChecked() instead of page.evaluate() value setting | Playwright APIs trigger proper input/change events and handle framework-specific reactivity (React, Vue). Direct value setting via evaluate skips event dispatch and breaks reactive frameworks. | No | +| D022 | M002/S04 | pattern | Fill field matching priority | Label (exact → case-insensitive) → name → placeholder → aria-label | Label is the most human-readable identifier. Name is the most reliable programmatic identifier. Placeholder and aria-label are fallbacks. Exact match before fuzzy prevents wrong-field fills. | Yes — if real-world usage shows a different priority works better | +| D023 | M002/S05 | pattern | Intent scoring model | 4 orthogonal dimensions per intent, each 0-1, summed and clamped | Consistent scoring structure across all 8 intents. Makes scoring testable and debuggable — each dimension has a named reason. 4 dimensions balance discrimination vs complexity. 
| Yes — could add/remove dimensions per intent if real-world usage shows imbalance | +| D024 | M002/S05 | pattern | search_field action type | Focus instead of click for search_field intent in browser_act | Search fields need keyboard focus for typing, not a click that might submit or toggle. Focus is the semantically correct action. Other intents use click. | Yes — if focus proves unreliable on specific input implementations | +| D025 | M002/S06 | pattern | Test import strategy for browser-tools | jiti CJS imports instead of ESM resolve-ts hook | The resolve-ts ESM hook breaks on core.js (plain .js file imported by TS modules). jiti handles mixed .ts/.js imports correctly from a .cjs test file. | No | +| D026 | M002/S06 | pattern | Testing module-private functions | Source extraction via readFileSync + brace-match + strip types + eval | Avoids exporting test-only APIs from production modules. Fragile to refactors but tests fail clearly when extraction breaks. Acceptable tradeoff for test code. | Yes — if private functions get exported for other reasons | +| D027 | M003 | arch | Git isolation model | Worktree-per-milestone (default for new projects) | Eliminates .gsd/ merge conflicts structurally. Each milestone gets its own worktree with isolated .gsd/ state. Branch-per-slice remains as opt-in legacy mode via git.isolation: "branch". | No | +| D028 | M003 | arch | Slice merge strategy within worktree | --no-ff merge (not squash) | Preserves full commit history as a diary of agent work. Merge commits give natural slice boundaries. Squash would destroy per-task granularity. | Yes — if commit noise proves problematic | +| D029 | M003 | arch | Milestone-to-main merge strategy | Squash merge | Main gets one clean commit per milestone. Individually revertable. Reads like a changelog. Full history preserved on milestone branch for forensics. 
| No | +| D030 | M003 | arch | Failure handling philosophy | Stop but self-heal | Auto-mode pauses, runs automatic repair (abort, reset, retry), resumes without user intervention in most cases. Only truly ambiguous conflicts need a human. Balances continuity with trust. | Yes — if self-heal proves unreliable | +| D031 | M003 | arch | Target user priority | Vibe coder first | Zero git errors as the default. Senior engineers configure overrides. Biggest market opportunity is users who can't use git today. | No | +| D032 | M003 | convention | Auto-worktree naming | Milestone ID as worktree name, milestone/ as branch | .gsd/worktrees/M003/ with branch milestone/M003. Manual worktrees use worktree/ branches. No collision between auto and manual. | Yes — if naming conflicts discovered | +| D033 | M003 | arch | Migration strategy | New projects default to worktree; existing keep branch-per-slice | Detection: if project has gsd/* branches or milestone META with integration branch → legacy. Otherwise → worktree. No forced migration. | Yes — if adoption shows users want migration tooling | diff --git a/.gsd/PROJECT.md b/.gsd/PROJECT.md new file mode 100644 index 000000000..80ff99aac --- /dev/null +++ b/.gsd/PROJECT.md @@ -0,0 +1,43 @@ +# Project + +## What This Is + +A pi coding agent extension (GSD — "Get Stuff Done") that provides structured planning, auto-mode execution, and project management for autonomous coding sessions. Includes proactive secret management, browser automation tools for UI verification, and worktree-isolated git architecture for zero-friction autonomous execution. + +## Core Value + +Auto-mode runs from start to finish without blocking. Git is invisible — no merge conflicts, no checkout errors, no state corruption. The system is automagical for vibe coders and configurable for senior engineers. 
+ +## Current State + +The GSD extension is fully functional with: +- Milestone/slice/task planning hierarchy +- Auto-mode state machine with fresh-session-per-unit dispatch +- Guided `/gsd` wizard flow +- `secure_env_collect` tool with masked TUI input, multi-destination write support, guidance display, and summary screen +- Proactive secret management: planning prompts forecast secrets, manifests persist them, auto-mode collects them before first dispatch +- Browser-tools extension with 47 registered tools covering navigation, interaction, inspection, verification, tracing, debugging, form intelligence (browser_analyze_form, browser_fill_form), and intent-ranked retrieval and semantic actions (browser_find_best, browser_act) +- Browser-tools `core.js` with shared utilities for action timeline, page registry, state diffing, assertions, fingerprinting +- Branch-per-slice git model with squash merge to main (being superseded by worktree-isolated model in M003) + +## Architecture / Key Patterns + +- **Extension model**: pi extensions register tools, commands, hooks via `ExtensionAPI` +- **State machine**: `auto.ts` drives `dispatchNextUnit()` which reads disk state and dispatches fresh sessions +- **Secrets gate**: `startAuto()` checks `getManifestStatus()` before first dispatch +- **Disk-driven state**: `.gsd/` files are the source of truth, `STATE.md` is derived cache +- **File parsing**: `files.ts` has markdown parsers for all GSD file types +- **Browser-tools**: Modular structure — slim `index.ts` orchestrator, 8 focused infrastructure modules (state.ts, utils.ts, evaluate-helpers.ts, lifecycle.ts, capture.ts, settle.ts, refs.ts), 11 categorized tool files under `tools/` (including forms.ts, intent.ts), shared infrastructure in `core.js` (~1000 lines). Browser-side utilities injected once via `addInitScript` under `window.__pi` namespace. Uses Playwright for browser control. 
Accessibility-first state representation, deterministic versioned refs, adaptive DOM settling, compact post-action summaries. Form tools use Playwright locator APIs for type-aware filling with structured result reporting. Intent tools use deterministic 4-dimension heuristic scoring for element retrieval and one-call semantic actions. +- **Prompt templates**: `prompts/` directory with mustache-like `{{var}}` substitution +- **TUI components**: `@gsd/pi-tui` provides `Editor`, `Text`, key handling, themes +- **Git architecture**: Worktree-per-milestone isolation (default for new projects). Each milestone gets its own git worktree with isolated `.gsd/` state. Slices merge via `--no-ff` into the milestone branch (preserving full commit history). Milestones squash-merge to main on completion. Legacy branch-per-slice model supported via `git.isolation: "branch"` preference. + +## Capability Contract + +See `.gsd/REQUIREMENTS.md` for the explicit capability contract, requirement status, and coverage mapping. + +## Milestone Sequence + +- [x] M001: Proactive Secret Management — Front-loaded API key collection into planning so auto-mode runs uninterrupted (10 requirements validated) +- [x] M002: Browser Tools Performance & Intelligence — Module decomposition, action pipeline optimization, sharp-based screenshots, form intelligence, intent-ranked retrieval, semantic actions, 108-test suite (12 requirements validated) +- [ ] M003: Worktree-Isolated Git Architecture — Worktree-per-milestone isolation eliminating merge conflicts, self-healing git repair, zero git errors for vibe coders, configurable for senior engineers diff --git a/.gsd/REQUIREMENTS.md b/.gsd/REQUIREMENTS.md new file mode 100644 index 000000000..cdce74991 --- /dev/null +++ b/.gsd/REQUIREMENTS.md @@ -0,0 +1,553 @@ +# Requirements + +This file is the explicit capability and coverage contract for the project. 
+ +## Active + +### R029 — Auto-worktree creation on milestone start +- Class: core-capability +- Status: active +- Description: When auto-mode starts a new milestone, it automatically creates a git worktree under `.gsd/worktrees//` with branch `milestone/`, `chdir`s into it, and dispatches all units from within the worktree. The user never runs a git command. +- Why it matters: Worktree isolation gives each milestone its own `.gsd/` directory, eliminating the entire category of `.gsd/` merge conflicts that have caused ~15 separate bug fixes to date. +- Source: user +- Primary owning slice: M003/S01 +- Supporting slices: none +- Validation: unmapped +- Notes: Must handle: fresh milestone (no worktree yet), resumed milestone (worktree already exists), milestone started from non-main branch. Must coexist with manual `/worktree` command. + +### R030 — Auto-worktree teardown + squash-merge on milestone complete +- Class: core-capability +- Status: active +- Description: When a milestone completes, the milestone branch is squash-merged to main with a rich commit message, the worktree is removed, and `process.chdir` returns to the main project root. Main receives exactly one commit per milestone. +- Why it matters: Main stays clean and always represents completed, working milestones. One commit per milestone is individually revertable. +- Source: user +- Primary owning slice: M003/S03 +- Supporting slices: M003/S01 +- Validation: unmapped +- Notes: Must handle: dirty worktree at teardown time (auto-commit first), failed squash-merge (self-heal), remote push after merge (if auto_push enabled). + +### R031 — `--no-ff` slice merges within milestone worktree +- Class: core-capability +- Status: active +- Description: Completed slices merge into the milestone branch via `--no-ff` merge instead of squash. This preserves the full per-task commit history on the milestone branch, with merge commits providing natural slice boundaries. 
+- Why it matters: The commit history is a diary of the agent's work. The LLM can read `git log` to understand what happened. Squashing slices destroys this granularity. `--no-ff` merge commits give clean slice boundaries while keeping all commits. +- Source: user +- Primary owning slice: M003/S02 +- Supporting slices: M003/S01 +- Validation: unmapped +- Notes: This is the default for worktree-isolated mode. The branch-per-slice legacy model retains its existing squash default. + +### R032 — Rich milestone-level squash commit message +- Class: core-capability +- Status: active +- Description: When a milestone squash-merges to main, the commit message summarizes all slices and their key outcomes. Format: conventional commit subject + slice task list body + branch metadata. +- Why it matters: Main's git log should read like a changelog. Each milestone commit should tell the full story of what was built. +- Source: user +- Primary owning slice: M003/S03 +- Supporting slices: none +- Validation: unmapped +- Notes: Similar to current rich commit message for slice merges, but at milestone level. Should list all slices with their titles and key outcomes. + +### R033 — `git.isolation` preference +- Class: core-capability +- Status: active +- Description: A `git.isolation` preference with values `"worktree"` (default for new projects) and `"branch"` (legacy model). New projects that have never run GSD default to worktree isolation. Existing projects with an established branch-per-slice history default to branch mode. +- Why it matters: Backwards compatibility — existing projects must not break. New projects get the better model by default. +- Source: user +- Primary owning slice: M003/S04 +- Supporting slices: none +- Validation: unmapped +- Notes: Detection heuristic: if the project has existing `gsd/*` branches or milestone metadata with integration branch records, it's a legacy project → default to "branch". Otherwise → default to "worktree". 
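
The detection heuristic in the R033 notes can be sketched as a pure function. This is a minimal illustration, not the project's implementation: `ProjectGitFacts`, `resolveIsolation`, and the rule that an explicit preference overrides detection are all assumptions made for the example.

```typescript
// Hypothetical names throughout; the real GSD detection code is not part of this patch.
type IsolationMode = "worktree" | "branch";

interface ProjectGitFacts {
  /** Local branch names, e.g. collected from `git branch --format=%(refname:short)`. */
  branches: string[];
  /** True if any milestone metadata records an integration branch. */
  hasIntegrationBranchRecord: boolean;
  /** Explicit `git.isolation` preference, if the user set one (assumed to win). */
  configuredIsolation?: IsolationMode;
}

function resolveIsolation(facts: ProjectGitFacts): IsolationMode {
  // Assumption: an explicit preference takes priority over detection.
  if (facts.configuredIsolation) return facts.configuredIsolation;

  // Legacy signal 1: existing gsd/* branches from the branch-per-slice model.
  const hasLegacyBranches = facts.branches.some((b) => /^gsd\//.test(b));

  // Legacy signal 2: milestone metadata with an integration branch record.
  return hasLegacyBranches || facts.hasIntegrationBranchRecord ? "branch" : "worktree";
}
```

Keeping the decision free of git calls means both defaults can be unit-tested without a repository; the caller gathers the facts once and passes them in.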
+ +### R034 — `git.merge_to_main` preference +- Class: core-capability +- Status: active +- Description: A `git.merge_to_main` preference with values `"milestone"` (default) and `"slice"`. In milestone mode, main only receives commits when milestones complete. In slice mode, each completed slice squash-merges to main immediately (current behavior). +- Why it matters: Senior engineers who want frequent integration can opt into slice-level merges. Vibe coders get the cleaner milestone-level default. +- Source: user +- Primary owning slice: M003/S04 +- Supporting slices: M003/S03 +- Validation: unmapped +- Notes: `merge_to_main: "slice"` with `isolation: "worktree"` is valid — slices squash-merge to main from within the worktree, but the worktree still provides `.gsd/` isolation. + +### R035 — Self-healing git repair on failure +- Class: core-capability +- Status: active +- Description: When git operations fail during auto-mode (merge conflict, checkout failure, corrupt state), the system automatically attempts repair: abort incomplete merges, reset working tree, retry the operation. Only truly unresolvable conflicts (two humans edited the same code) pause auto-mode with a clear explanation. +- Why it matters: The north star is "automagical — just runs." Git errors are the #1 cause of auto-mode halting. Self-healing eliminates most of those stops. +- Source: user +- Primary owning slice: M003/S05 +- Supporting slices: M003/S01, M003/S02, M003/S03 +- Validation: unmapped +- Notes: The worktree model eliminates most `.gsd/` conflicts structurally. Self-healing handles the remaining edge cases (code conflicts, remote divergence, corrupt index). + +### R036 — `.gsd/` conflict resolution elimination +- Class: quality-attribute +- Status: active +- Description: The ~60 lines of `.gsd/` auto-resolve conflict code in `mergeSliceToMain` and the ~44 merge-related recovery paths in `auto.ts` are simplified or removed. 
Worktree isolation makes most of this code structurally unnecessary. +- Why it matters: Dead conflict resolution code is maintenance burden and a source of bugs. If the architecture eliminates the problem, the code that patches it should go. +- Source: inferred +- Primary owning slice: M003/S02 +- Supporting slices: M003/S06 +- Validation: unmapped +- Notes: Only remove code that is genuinely unnecessary in worktree mode. Keep the legacy branch-per-slice path intact for `git.isolation: "branch"` users. + +### R037 — Zero git errors for vibe coders +- Class: primary-user-loop +- Status: active +- Description: Users with zero git knowledge should never see a git error message during auto-mode. All git operations are invisible. If something fails, the system self-heals or presents a non-technical explanation with a clear action ("Run `/gsd doctor` to fix this"). +- Why it matters: Vibe coders are the primary market. Git errors are incomprehensible to them and destroy trust in the system. +- Source: user +- Primary owning slice: M003/S05 +- Supporting slices: all M003 slices +- Validation: unmapped +- Notes: This is a quality bar, not a single feature. Every git-touching codepath must handle errors gracefully. + +### R038 — Backwards compatibility with branch-per-slice model +- Class: continuity +- Status: active +- Description: Existing projects that use the branch-per-slice model continue working exactly as they do today. No migration required. The old codepaths remain functional when `git.isolation: "branch"` is active. +- Why it matters: Breaking existing users' workflows would destroy trust. +- Source: user +- Primary owning slice: M003/S04 +- Supporting slices: none +- Validation: unmapped +- Notes: All existing git-service.ts tests must continue passing in branch mode. 
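
The repair loop described in R035 (abort, reset, retry), with the non-technical fallback that R037 asks for, might look roughly like the sketch below. `runWithSelfHeal`, the `HealResult` shape, the retry count, and the message wording are assumptions for illustration; the actual repair steps would be git operations supplied by the caller.

```typescript
// Hypothetical sketch: names, result shape, and retry count are assumptions.
type HealResult<T> =
  | { ok: true; value: T; attempts: number }
  | { ok: false; userMessage: string; attempts: number };

function runWithSelfHeal<T>(
  operation: () => T,
  repairSteps: Array<() => void>,
  maxAttempts = 2,
): HealResult<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return { ok: true, value: operation(), attempts: attempt };
    } catch {
      if (attempt === maxAttempts) break;
      // Best-effort repair before the retry, e.g. abort a merge, then reset the tree.
      for (const step of repairSteps) {
        try {
          step();
        } catch {
          // A failed repair step should not prevent the retry from running.
        }
      }
    }
  }
  // Out of attempts: surface a plain-language message instead of raw git output.
  return {
    ok: false,
    userMessage: "Something went wrong while saving your work. Run `/gsd doctor` to fix this.",
    attempts: maxAttempts,
  };
}
```

Returning a result object rather than throwing lets auto-mode decide whether to pause, and keeps raw git stderr out of anything shown to the user.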
+ +### R039 — Manual `/worktree` coexistence with auto-worktrees +- Class: integration +- Status: active +- Description: The manual `/worktree` command for exploration coexists with auto-mode's milestone worktrees. Different naming conventions prevent conflicts: auto-worktrees use `milestone/M003` branches, manual worktrees use `worktree/` branches. +- Why it matters: Manual worktrees are a valuable exploration tool. They shouldn't be broken by auto-mode's worktree usage. +- Source: user +- Primary owning slice: M003/S01 +- Supporting slices: none +- Validation: unmapped +- Notes: Auto-worktrees are created under `.gsd/worktrees/` just like manual ones, but with milestone ID as the name. The naming convention prevents branch collisions. + +### R040 — Doctor git health checks +- Class: operability +- Status: active +- Description: `/gsd doctor` detects and optionally fixes git-related issues: orphaned auto-worktrees, stale milestone branches, corrupt merge state (MERGE_HEAD/SQUASH_MSG), tracked runtime files, missing gitignore patterns. +- Why it matters: When things do go wrong, users need a one-command fix. Doctor is the safety net. +- Source: inferred +- Primary owning slice: M003/S06 +- Supporting slices: M003/S05 +- Validation: unmapped +- Notes: Doctor already handles planning artifact issues. This extends it to git health. + +### R041 — Test coverage for worktree-isolated flow +- Class: quality-attribute +- Status: active +- Description: Test suite covers: auto-worktree create/teardown, `--no-ff` slice merge within worktree, milestone squash to main, preference switching between isolation modes, self-heal scenarios, doctor git checks. All existing git tests continue passing. +- Why it matters: The git system is the most bug-prone part of GSD. Tests prevent regressions. +- Source: inferred +- Primary owning slice: M003/S07 +- Supporting slices: all M003 slices +- Validation: unmapped +- Notes: Must test both worktree and branch isolation modes. 
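
One of the doctor checks listed in R040, corrupt merge state, can be sketched as a small probe over the git directory. The function name, issue codes, and the injected `exists` callback are assumptions for this example. The underlying git behavior is standard: `MERGE_HEAD` exists while a merge is unfinished, and `SQUASH_MSG` is left by `git merge --squash` until the result is committed.

```typescript
// Hypothetical sketch: issue codes and the injected `exists` probe are assumptions.
interface GitHealthIssue {
  code: "merge_in_progress" | "squash_in_progress";
  file: string;
  fixHint: string;
}

function findCorruptMergeState(
  gitDir: string,
  exists: (path: string) => boolean,
): GitHealthIssue[] {
  const issues: GitHealthIssue[] = [];

  // Git keeps MERGE_HEAD around while a merge has been started but not concluded.
  if (exists(`${gitDir}/MERGE_HEAD`)) {
    issues.push({ code: "merge_in_progress", file: "MERGE_HEAD", fixHint: "git merge --abort" });
  }

  // SQUASH_MSG remains after `git merge --squash` until the squashed result is committed.
  if (exists(`${gitDir}/SQUASH_MSG`)) {
    issues.push({
      code: "squash_in_progress",
      file: "SQUASH_MSG",
      fixHint: "commit or reset the squashed changes",
    });
  }

  return issues;
}
```

In real use the probe could be Node's `fs.existsSync`. Note that a linked worktree has its own git directory, so the check would need to run once per worktree as well as for the main checkout.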
+ +## Validated + +### R001 — Secret forecasting during milestone planning +- Class: core-capability +- Status: validated +- Description: When a milestone is planned, the LLM analyzes slices for external service dependencies and writes a secrets manifest listing every predicted API key with setup guidance. +- Why it matters: Without forecasting, auto-mode discovers missing keys mid-execution and blocks for hours waiting for user input. +- Source: user +- Primary owning slice: M001/S01 +- Supporting slices: none +- Validation: plan-milestone.md Secret Forecasting section (line 62) instructs LLM to write manifest. Parser round-trip tested in parsers.test.ts. +- Notes: The plan-milestone prompt has forecasting instructions. The manifest format and parser are implemented and tested. + +### R002 — Secrets manifest persisted in .gsd/ +- Class: continuity +- Status: validated +- Description: The secrets manifest is a durable markdown file at `.gsd/milestones/M00x/M00x-SECRETS.md` that survives session boundaries and can be re-read by any future unit. +- Why it matters: Collection may happen in a different session than planning. The manifest must persist on disk. +- Source: user +- Primary owning slice: M001/S01 +- Supporting slices: none +- Validation: parseSecretsManifest/formatSecretsManifest round-trip tested (parsers.test.ts), resolveMilestoneFile(base, mid, "SECRETS") resolves path. +- Notes: Parser/formatter implemented in files.ts. Template exists at templates/secrets-manifest.md. + +### R003 — Step-by-step guidance per key +- Class: primary-user-loop +- Status: validated +- Description: Each secret in the manifest includes numbered steps for obtaining the key (navigate to dashboard → create project → generate key → copy), a dashboard URL, and a format hint. +- Why it matters: Users shouldn't have to figure out where to find each key. The guidance makes collection self-service. 
+- Source: user +- Primary owning slice: M001/S02 +- Supporting slices: M001/S01 +- Validation: collectOneSecret renders numbered dim-styled guidance steps with wrapping (collect-from-manifest.test.ts tests 6-8). +- Notes: Guidance quality is LLM-dependent and best-effort. + +### R004 — Summary screen before collection +- Class: primary-user-loop +- Status: validated +- Description: Before collecting secrets one-by-one, show a read-only summary screen listing all needed keys with their status (pending / already set / skipped). Auto-skip keys that already exist in the environment. +- Why it matters: The user needs to see the full picture before entering keys. Already-set keys should not require re-entry. +- Source: user +- Primary owning slice: M001/S02 +- Supporting slices: none +- Validation: showSecretsSummary() renders read-only ctx.ui.custom screen with status indicators via makeUI().progressItem() (collect-from-manifest.test.ts tests 4-5). +- Notes: Read-only with auto-skip — no interactive deselection. + +### R005 — Existing key detection and silent skip +- Class: primary-user-loop +- Status: validated +- Description: Before prompting for a key, check `.env` and `process.env`. If the key already exists, mark it as "already set" in the summary and skip collection. +- Why it matters: Users shouldn't re-enter keys they've already configured. Prevents frustration and errors. +- Source: user +- Primary owning slice: M001/S02 +- Supporting slices: none +- Validation: getManifestStatus cross-references checkExistingEnvKeys, categorizes env-present keys as existing (manifest-status.test.ts tests 4,7). collectSecretsFromManifest skips them (collect-from-manifest.test.ts tests 1-2). +- Notes: `checkExistingEnvKeys()` implemented in get-secrets-from-user.ts. 
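
The existing-key check in R005 amounts to looking each wanted key up in `.env` and in `process.env`. The sketch below is an illustration only and is not the project's `checkExistingEnvKeys()`; `findExistingKeys`, the simplified `.env` parsing, and treating an empty value as "not set" are assumptions.

```typescript
// Hypothetical sketch; not the project's `checkExistingEnvKeys()` implementation.
function findExistingKeys(
  wanted: string[],
  dotenvText: string,
  processEnv: Record<string, string | undefined>,
): string[] {
  const fromFile: Record<string, boolean> = {};

  for (const rawLine of dotenvText.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines and comments.
    if (line === "" || line.charAt(0) === "#") continue;

    // Accept `KEY=value` and `export KEY=value`; quoting rules are ignored here.
    const match = /^(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*)$/.exec(line);
    // Assumption: a key with an empty value still needs to be collected.
    if (match && match[2].trim() !== "") fromFile[match[1]] = true;
  }

  // A key counts as "already set" if either source provides a non-empty value.
  return wanted.filter((key) => fromFile[key] === true || Boolean(processEnv[key]));
}
```

A caller would pass the contents of `.env` (or an empty string if the file is missing) together with `process.env`, then mark the returned keys as "already set" on the summary screen.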
+ +### R006 — Smart destination detection +- Class: integration +- Status: validated +- Description: Automatically detect whether secrets should go to .env, Vercel, or Convex based on project file presence (vercel.json → Vercel, convex/ dir → Convex, default → .env). +- Why it matters: Users shouldn't have to specify the destination manually. The system should do the right thing. +- Source: user +- Primary owning slice: M001/S02 +- Supporting slices: none +- Validation: collectSecretsFromManifest calls detectDestination() for destination inference. applySecrets() routes to dotenv/vercel/convex accordingly. +- Notes: `detectDestination()` implemented in get-secrets-from-user.ts. + +### R007 — Auto-mode collection at entry point +- Class: core-capability +- Status: validated +- Description: When the user runs `/gsd auto`, check for a secrets manifest with pending keys. If found, collect them before dispatching the first slice. Collection happens once at the entry point, not as a dispatch unit. +- Why it matters: This is the primary integration point — auto-mode must not start execution with uncollected secrets. +- Source: user +- Primary owning slice: M001/S03 +- Supporting slices: M001/S01, M001/S02 +- Validation: startAuto() secrets gate at auto.ts:479. auto-secrets-gate.test.ts — 3/3 pass covering null manifest, pending keys, and no-pending-keys paths. +- Notes: Collection at entry point (startAuto), not as a separate unit type in dispatchNextUnit. D001 satisfied. + +### R008 — Guided /gsd wizard integration +- Class: core-capability +- Status: validated +- Description: After milestone planning in the guided `/gsd` flow, trigger secret collection if a manifest exists with pending keys. +- Why it matters: Users who plan via the wizard should also get prompted for secrets before auto-mode begins. 
+- Source: user +- Primary owning slice: M001/S03 +- Supporting slices: M001/S01, M001/S02 +- Validation: guided-flow.ts calls startAuto() directly (lines 52, 486, 647, 794) — all guided flow paths that start auto-mode inherit the secrets gate. +- Notes: The guided flow dispatches to startAuto after planning. Collection is inherited via the gate. + +### R009 — Planning prompts instruct LLM to forecast secrets +- Class: integration +- Status: validated +- Description: The plan-milestone prompt template includes instructions for the LLM to analyze slices for external service dependencies and write the secrets manifest. +- Why it matters: Without prompt instructions, the LLM won't know to forecast secrets. +- Source: user +- Primary owning slice: M001/S01 +- Supporting slices: none +- Validation: plan-milestone.md has Secret Forecasting section at line 62 with instructions to write {{secretsOutputPath}} with H3 sections per key. +- Notes: Implemented in plan-milestone.md. + +### R010 — secure_env_collect enhanced with guidance display +- Class: primary-user-loop +- Status: validated +- Description: The secure_env_collect TUI renders multi-line guidance steps above the masked input field on the same page, so the user sees setup instructions while entering the key. +- Why it matters: Without visible guidance, the user has to find keys on their own despite the LLM having generated instructions. +- Source: user +- Primary owning slice: M001/S02 +- Supporting slices: none +- Validation: collectOneSecret accepts guidance parameter, renders numbered dim-styled lines with wrapTextWithAnsi above masked input (collect-from-manifest.test.ts tests 6-8). +- Notes: The guidance field is rendered in collectOneSecret(). + +### R015 — Module decomposition of browser-tools +- Class: quality-attribute +- Status: validated +- Description: The monolithic browser-tools index.ts (~5000 lines) is split into focused modules: shared infrastructure, tool groups, and browser-side utilities. 
All 43 existing tools continue to work identically. +- Why it matters: A 5000-line file is unmaintainable and makes targeted changes risky. Module boundaries enable safe refactoring and new tool development. +- Source: user +- Primary owning slice: M002/S01 +- Supporting slices: none +- Validation: Extension loads via jiti, 43 tools register, browser navigate/snapshot/click work against real page, index.ts is 47-line orchestrator with zero registerTool calls, 9 tool files under tools/. +- Notes: core.js already exists with ~1000 lines of shared utilities. The split extends this pattern. + +### R016 — Shared browser-side evaluate utilities +- Class: quality-attribute +- Status: validated +- Description: Common functions duplicated across page.evaluate boundaries (cssPath, simpleHash, isVisible, isEnabled, inferRole, accessibleName) are injected once and referenced from all evaluate callbacks. +- Why it matters: Currently buildRefSnapshot and resolveRefTarget each redeclare ~100 lines of identical utility code. Deduplication reduces payload size, improves maintainability, and ensures consistency. +- Source: user +- Primary owning slice: M002/S01 +- Supporting slices: none +- Validation: window.__pi contains all 9 functions, survives navigation, refs.ts has zero inline redeclarations, close/reopen re-injects via addInitScript correctly. +- Notes: Uses context.addInitScript under window.__pi namespace. + +### R017 — Consolidated state capture per action +- Class: core-capability +- Status: validated +- Description: The before-state capture, after-state capture, post-action summary, and recent-error check are consolidated into fewer page.evaluate calls per action. +- Why it matters: Every action tool currently runs 3-4 separate page.evaluate calls for state capture. Consolidating them reduces latency on every single browser interaction. 
+- Source: user +- Primary owning slice: M002/S02 +- Supporting slices: M002/S01 +- Validation: postActionSummary eliminated from action tools, countOpenDialogs removed from ToolDeps, consolidated capture pattern. Build passes. +- Notes: captureCompactPageState and postActionSummary merged into single evaluate. + +### R018 — Conditional body text capture +- Class: core-capability +- Status: validated +- Description: Body text capture (includeBodyText: true) is skipped for low-signal actions (scroll, hover, Tab key press) and enabled for high-signal actions (navigate, click, type, submit). +- Why it matters: Capturing 4000 chars of body text on every scroll or hover is wasteful. Conditional capture reduces evaluate overhead. +- Source: user +- Primary owning slice: M002/S02 +- Supporting slices: none +- Validation: explicit includeBodyText true/false per tool signal level in interaction.ts. Classification codified in D017. Build passes. +- Notes: Requires classifying each tool as high-signal or low-signal. + +### R019 — Faster settle on zero mutations +- Class: core-capability +- Status: validated +- Description: settleAfterActionAdaptive short-circuits with a smaller quiet window when no mutation observer fires in the first 60ms. +- Why it matters: Many SPA interactions produce no DOM changes. Short-circuiting saves time on the most common case. +- Source: user +- Primary owning slice: M002/S02 +- Supporting slices: none +- Validation: zero_mutation_shortcut settle reason in state.ts type union and settle.ts return path. 60ms/30ms thresholds codified in D019. Build passes. +- Notes: Track whether any mutation fired at all; if zero after 60ms, use a shorter quiet window. + +### R020 — Sharp-based screenshot resizing +- Class: core-capability +- Status: validated +- Description: constrainScreenshot uses the sharp Node library for image resizing instead of bouncing buffers through page canvas context. +- Why it matters: Faster, no page dependency for image processing. 
+- Source: user +- Primary owning slice: M002/S03 +- Supporting slices: M002/S01 +- Validation: constrainScreenshot uses sharp(buffer).metadata() and sharp(buffer).resize(). Zero page.evaluate calls in capture.ts. Build passes. +- Notes: sharp added as a dependency. + +### R021 — Opt-in screenshots on navigate +- Class: core-capability +- Status: validated +- Description: browser_navigate does not capture or return a screenshot by default. An explicit parameter opts in to screenshot capture. +- Why it matters: Significant token savings — the screenshot payload is large and often unnecessary. +- Source: user +- Primary owning slice: M002/S03 +- Supporting slices: none +- Validation: browser_navigate has screenshot parameter default false. Capture gated. Build passes. +- Notes: Default is off. The agent can still use browser_screenshot explicitly. + +### R022 — Form analysis tool (browser_analyze_form) +- Class: core-capability +- Status: validated +- Description: A browser_analyze_form tool that returns field inventory including labels, names, types, required status, current values, validation state, and submit controls. +- Why it matters: Collapses 3-8 tool calls for form analysis into one. +- Source: user +- Primary owning slice: M002/S04 +- Supporting slices: M002/S01 +- Validation: 7-level label resolution, form auto-detection, fieldset grouping, submit button discovery. Verified end-to-end against 12-field test form. Build passes. +- Notes: Must handle label association via for/id, wrapping label, aria-label, aria-labelledby, and placeholder. + +### R023 — Form fill tool (browser_fill_form) +- Class: core-capability +- Status: validated +- Description: A browser_fill_form tool that maps labels/names/placeholders to inputs and fills them with type-aware Playwright APIs. +- Why it matters: Collapses 3-5 tool calls for form filling into one. 
+- Source: user +- Primary owning slice: M002/S04 +- Supporting slices: M002/S01 +- Validation: 5-strategy field resolution, type-aware fill via Playwright APIs, verified end-to-end with 10 fields. Build passes. +- Notes: Returns matched fields, unmatched values, fields skipped, and validation state. + +### R024 — Intent-ranked element retrieval (browser_find_best) +- Class: core-capability +- Status: validated +- Description: A browser_find_best tool that returns scored candidates using deterministic heuristic ranking for 8 semantic intents. +- Why it matters: Cuts a round trip and reduces reasoning tokens for common element-finding tasks. +- Source: user +- Primary owning slice: M002/S05 +- Supporting slices: M002/S01 +- Validation: 8 intents implemented with 4-dimension scoring. Verified via Playwright tests. Build passes, tool count = 47. +- Notes: Deterministic heuristics only. No hidden LLM calls. + +### R025 — Semantic action tool (browser_act) +- Class: core-capability +- Status: validated +- Description: A browser_act tool that resolves the top candidate for a semantic intent and executes the action in one call. +- Why it matters: Collapses 2-4 tool calls for common micro-tasks into one. +- Source: user +- Primary owning slice: M002/S05 +- Supporting slices: M002/S04 +- Validation: Resolves via same scoring engine as browser_find_best. Executes via Playwright locator. Returns before/after diff. Build passes, tool count = 47. +- Notes: Builds on browser_find_best for element selection. Bounded — does not loop or retry. + +### R026 — Test coverage for new and refactored code +- Class: quality-attribute +- Status: validated +- Description: Test suite covers shared browser-side utilities, settle logic, screenshot resizing, form tools, and intent ranking. +- Why it matters: Regression protection for refactored and new features. 
+- Source: user +- Primary owning slice: M002/S06 +- Supporting slices: all M002 slices +- Validation: 108 tests (63 unit + 45 integration) passing via `npm run test:browser-tools`. +- Notes: Test what's unit-testable without a browser. Integration tests with Playwright for tools that need a page. + +## Deferred + +### R011 — Multi-milestone secret forecasting +- Class: core-capability +- Status: deferred +- Description: Forecast secrets across all planned milestones, not just the active one. +- Why it matters: Would provide a complete picture of all secrets needed for the project. +- Source: user +- Primary owning slice: none +- Supporting slices: none +- Validation: unmapped +- Notes: Deferred — single-milestone forecasting is sufficient for now. + +### R012 — Secret rotation reminders +- Class: operability +- Status: deferred +- Description: Track secret age and remind users when keys may need rotation. +- Why it matters: Security best practice, but not essential for the core workflow. +- Source: user +- Primary owning slice: none +- Supporting slices: none +- Validation: unmapped +- Notes: Deferred — out of scope for initial release. + +### R027 — Browser reuse across sessions +- Class: core-capability +- Status: deferred +- Description: Keep a warm browser instance across rapid successive agent contexts to avoid ~2-3s Chrome cold-start per session. +- Why it matters: Would eliminate Chrome launch latency in auto-mode. +- Source: user +- Primary owning slice: none +- Supporting slices: none +- Validation: unmapped +- Notes: Deferred — skip completely per user direction. + +### R042 — Parallel milestone execution in multiple worktrees +- Class: core-capability +- Status: deferred +- Description: Run multiple milestones simultaneously in separate worktrees with independent auto-mode sessions. +- Why it matters: Natural extension of worktree-per-milestone architecture. Would enable parallel work streams. 
+- Source: user +- Primary owning slice: none +- Supporting slices: none +- Validation: unmapped +- Notes: Deferred — ship sequential milestone execution first. The worktree infrastructure naturally supports this later. + +### R043 — Native libgit2 write operations +- Class: quality-attribute +- Status: deferred +- Description: Extend the Rust/libgit2 native module to cover write operations (commit, merge, checkout) in addition to the current read-only queries. +- Why it matters: Would eliminate execSync overhead for git writes on the hot path. +- Source: inferred +- Primary owning slice: none +- Supporting slices: none +- Validation: unmapped +- Notes: Deferred — execSync writes are functional. Optimize later if profiling shows it matters. + +## Out of Scope + +### R013 — Curated service knowledge base +- Class: anti-feature +- Status: out-of-scope +- Description: A static database of known services with pre-written guidance for each API key. +- Why it matters: Prevents scope creep. LLM-generated guidance is sufficient and stays current without maintenance. +- Source: user +- Primary owning slice: none +- Supporting slices: none +- Validation: n/a +- Notes: LLM generates guidance dynamically. + +### R014 — Just-in-time collection enhancement +- Class: anti-feature +- Status: out-of-scope +- Description: Detect missing secrets during task execution and collect them inline. +- Why it matters: Prevents scope confusion. M001 is about proactive collection, not reactive. +- Source: user +- Primary owning slice: none +- Supporting slices: none +- Validation: n/a +- Notes: Existing secure_env_collect already handles reactive collection. + +### R028 — LLM-powered intent resolution +- Class: anti-feature +- Status: out-of-scope +- Description: Using hidden LLM calls inside browser_find_best or browser_act for intent resolution. +- Why it matters: Prevents unpredictable latency and cost. 
+- Source: inferred +- Primary owning slice: none +- Supporting slices: none +- Validation: n/a +- Notes: browser_find_best and browser_act use scoring heuristics, not LLM inference. + +### R044 — Rebase merge strategy +- Class: anti-feature +- Status: out-of-scope +- Description: Adding rebase as a merge strategy option alongside squash and --no-ff merge. +- Why it matters: Rebase rewrites history, which conflicts with the "commit diary" philosophy. It also introduces more failure modes (rebase conflicts are harder to auto-resolve than merge conflicts). +- Source: inferred +- Primary owning slice: none +- Supporting slices: none +- Validation: n/a +- Notes: --no-ff merge + squash covers all needed use cases without history rewriting. + +## Traceability + +| ID | Class | Status | Primary owner | Supporting | Proof | +|---|---|---|---|---|---| +| R001 | core-capability | validated | M001/S01 | none | plan-milestone.md Secret Forecasting section, parser round-trip tests | +| R002 | continuity | validated | M001/S01 | none | parseSecretsManifest/formatSecretsManifest round-trip tested | +| R003 | primary-user-loop | validated | M001/S02 | M001/S01 | collect-from-manifest.test.ts tests 6-8 | +| R004 | primary-user-loop | validated | M001/S02 | none | collect-from-manifest.test.ts tests 4-5 | +| R005 | primary-user-loop | validated | M001/S02 | none | manifest-status.test.ts tests 4,7; collect-from-manifest.test.ts tests 1-2 | +| R006 | integration | validated | M001/S02 | none | collectSecretsFromManifest calls detectDestination() | +| R007 | core-capability | validated | M001/S03 | M001/S01, M001/S02 | auto-secrets-gate.test.ts 3/3 pass | +| R008 | core-capability | validated | M001/S03 | M001/S01, M001/S02 | guided-flow.ts calls startAuto() at lines 52, 486, 647, 794 | +| R009 | integration | validated | M001/S01 | none | plan-milestone.md Secret Forecasting section line 62 | +| R010 | primary-user-loop | validated | M001/S02 | none | collect-from-manifest.test.ts 
tests 6-8 | +| R011 | core-capability | deferred | none | none | unmapped | +| R012 | operability | deferred | none | none | unmapped | +| R013 | anti-feature | out-of-scope | none | none | n/a | +| R014 | anti-feature | out-of-scope | none | none | n/a | +| R015 | quality-attribute | validated | M002/S01 | none | jiti load, 43 tools register, slim index, browser spot-check | +| R016 | quality-attribute | validated | M002/S01 | none | window.__pi injection, zero inline redeclarations, survives navigation | +| R017 | core-capability | validated | M002/S02 | M002/S01 | postActionSummary eliminated, consolidated capture pattern | +| R018 | core-capability | validated | M002/S02 | none | explicit includeBodyText true/false per tool signal level | +| R019 | core-capability | validated | M002/S02 | none | zero_mutation_shortcut settle reason, 60ms/30ms thresholds | +| R020 | core-capability | validated | M002/S03 | M002/S01 | sharp-based constrainScreenshot, zero page.evaluate in capture.ts | +| R021 | core-capability | validated | M002/S03 | none | screenshot param default false, capture gated | +| R022 | core-capability | validated | M002/S04 | M002/S01 | 7-level label resolution, verified against 12-field test form | +| R023 | core-capability | validated | M002/S04 | M002/S01 | 5-strategy field resolution, verified end-to-end with 10 fields | +| R024 | core-capability | validated | M002/S05 | M002/S01 | 8-intent scoring, Playwright tests, differentiated rankings | +| R025 | core-capability | validated | M002/S05 | M002/S04 | top candidate execution, settle + diff, graceful error | +| R026 | quality-attribute | validated | M002/S06 | all M002 | 108 tests passing via npm run test:browser-tools | +| R027 | core-capability | deferred | none | none | unmapped | +| R028 | anti-feature | out-of-scope | none | none | n/a | +| R029 | core-capability | active | M003/S01 | none | unmapped | +| R030 | core-capability | active | M003/S03 | M003/S01 | unmapped | +| R031 | 
core-capability | active | M003/S02 | M003/S01 | unmapped | +| R032 | core-capability | active | M003/S03 | none | unmapped | +| R033 | core-capability | active | M003/S04 | none | unmapped | +| R034 | core-capability | active | M003/S04 | M003/S03 | unmapped | +| R035 | core-capability | active | M003/S05 | M003/S01, M003/S02, M003/S03 | unmapped | +| R036 | quality-attribute | active | M003/S02 | M003/S06 | unmapped | +| R037 | primary-user-loop | active | M003/S05 | all M003 | unmapped | +| R038 | continuity | active | M003/S04 | none | unmapped | +| R039 | integration | active | M003/S01 | none | unmapped | +| R040 | operability | active | M003/S06 | M003/S05 | unmapped | +| R041 | quality-attribute | active | M003/S07 | all M003 | unmapped | +| R042 | core-capability | deferred | none | none | unmapped | +| R043 | quality-attribute | deferred | none | none | unmapped | +| R044 | anti-feature | out-of-scope | none | none | n/a | + +## Coverage Summary + +- Active requirements: 13 +- Mapped to slices: 13 +- Validated: 22 +- Deferred: 5 +- Out of scope: 4 +- Unmapped active requirements: 0 diff --git a/.gsd/STATE.md b/.gsd/STATE.md new file mode 100644 index 000000000..d4fd32765 --- /dev/null +++ b/.gsd/STATE.md @@ -0,0 +1,23 @@ +# GSD State + +**Active Milestone:** M003 — Worktree-Isolated Git Architecture +**Active Slice:** None +**Phase:** pre-planning + +## Milestone Registry +- ✅ **M001:** Proactive Secret Management +- ✅ **M002:** Browser Tools Performance & Intelligence +- 🔄 **M003:** Worktree-Isolated Git Architecture + +## Recent Decisions +- D027: Git isolation model — worktree-per-milestone (default for new projects) +- D028: Slice merge strategy within worktree — --no-ff merge +- D029: Milestone-to-main merge strategy — squash merge +- D030: Failure handling philosophy — stop but self-heal +- D031: Target user priority — vibe coder first + +## Blockers +- None + +## Next Action +Research and plan M003. Context and roadmap are written. 
Ready for auto-mode. diff --git a/.gsd/milestones/M003/M003-CONTEXT.md b/.gsd/milestones/M003/M003-CONTEXT.md new file mode 100644 index 000000000..9021f4fe3 --- /dev/null +++ b/.gsd/milestones/M003/M003-CONTEXT.md @@ -0,0 +1,114 @@ +# M003: Worktree-Isolated Git Architecture + +**Gathered:** 2026-03-14 +**Status:** Ready for planning + +## Project Description + +Overhaul GSD's git system to use worktree-per-milestone isolation as the default model. Each milestone gets its own git worktree with an isolated `.gsd/` directory, eliminating the entire category of `.gsd/` merge conflicts that have caused ~15 separate bug fixes to date. Slices merge into the milestone branch via `--no-ff` (preserving full commit history as a diary of the agent's work). Milestones squash-merge to main on completion (keeping main clean). The system is automagical for vibe coders — zero git errors, zero git knowledge required — and configurable for senior engineers via preferences. + +## Why This Milestone + +The current branch-per-slice model shares `.gsd/` state across branches, causing merge conflicts that halt auto-mode. The CHANGELOG shows a pattern: each fix leads to a new edge case. The root cause is structural — sharing mutable state across branches. Worktree isolation eliminates the problem architecturally rather than patching symptoms. 
+ +## User-Visible Outcome + +### When this milestone is complete, the user can: + +- Run `/gsd auto` on a new project and have it execute start-to-finish without any git errors, merge conflicts, or mysterious halts +- See clean `git log` on main with one commit per completed milestone +- Configure `git.merge_to_main: "slice"` in preferences to get slice-level integration if they want it +- Run `/gsd doctor` to detect and fix git-related issues +- Use manual `/worktree` alongside auto-mode without conflicts + +### Entry point / environment + +- Entry point: `/gsd auto` CLI command, `/gsd doctor` CLI command +- Environment: local dev — any git repository +- Live dependencies involved: git CLI, optional libgit2 native module + +## Completion Class + +- Contract complete means: auto-worktree create/teardown lifecycle works, slice merges use `--no-ff`, milestone squashes to main, preferences switch between modes, self-heal recovers from common failures, all tests pass +- Integration complete means: the full auto-mode lifecycle (startAuto → dispatch units → complete slices → complete milestone → merge to main) works end-to-end in a real git repo with real file changes +- Operational complete means: existing projects on branch-per-slice model continue working unchanged, manual `/worktree` coexists without conflicts + +## Final Integrated Acceptance + +To call this milestone complete, we must prove: + +- Auto-mode on a fresh project creates a worktree, executes through multiple slices, and merges the milestone to main — with zero git errors +- An existing project with branch-per-slice history continues working identically (no regression) +- A deliberately introduced merge conflict is self-healed without user intervention +- `git log main` shows exactly one squash commit per completed milestone +- `git log milestone/M003` shows full commit history with `--no-ff` merge boundaries per slice + +## Risks and Unknowns + +- **`process.chdir` in auto-mode** — auto-mode currently 
passes `basePath` to all functions but doesn't `chdir`. Worktree mode needs `chdir` into the worktree so that all tool calls (bash, read, write, edit) resolve against the worktree. The worktree-command.ts already does this, but auto-mode doesn't. Risk: some codepath uses `basePath` while another uses `process.cwd()`, causing split-brain. +- **Worktree `.gsd/` inheritance** — when a worktree is created, it gets a copy of the project files from the milestone branch base. But `.gsd/` planning files from the main tree may or may not be wanted in the worktree. Need to decide: copy planning state or start fresh. +- **State machine re-entry** — if auto-mode is paused and resumed, the worktree must be re-entered (if it still exists). The pause/resume logic in `startAuto` needs to handle this. +- **Existing orphan recovery** — the current `mergeOrphanedSliceBranches` logic needs to work within the worktree context, not just on main. + +> See `.gsd/DECISIONS.md` for all architectural and pattern decisions — it is an append-only register; read it during planning, append to it during execution. 
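The split-brain risk above (one codepath trusting `basePath`, another trusting `process.cwd()`) could be surfaced early with a small guard. The following is a minimal sketch, not the project's implementation: `checkCwdCoherence` and `CwdCheck` are hypothetical names, and only Node's built-in `fs` and `path` modules are assumed.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

/** Result of comparing the tracked base path with the process working directory. */
export interface CwdCheck {
  coherent: boolean;
  basePath: string;
  cwd: string;
}

// Resolve symlinks when the path exists; otherwise fall back to a lexical resolve.
function canonical(p: string): string {
  const resolved = path.resolve(p);
  try {
    return fs.realpathSync(resolved);
  } catch {
    return resolved;
  }
}

/**
 * Hypothetical guard (not an existing GSD function): reports whether the
 * tracked basePath and the working directory point at the same place.
 */
export function checkCwdCoherence(
  basePath: string,
  cwd: string = process.cwd(),
): CwdCheck {
  const a = canonical(basePath);
  const b = canonical(cwd);
  return { coherent: a === b, basePath: a, cwd: b };
}
```

A caller might run this right after `process.chdir` into the worktree and again before each unit dispatch, pausing auto-mode when `coherent` is false rather than letting tools write into the wrong tree.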
+ +## Relevant Requirements + +- R029 — Auto-worktree creation on milestone start +- R030 — Auto-worktree teardown + squash-merge on milestone complete +- R031 — `--no-ff` slice merges within milestone worktree +- R032 — Rich milestone-level squash commit message +- R033 — `git.isolation` preference +- R034 — `git.merge_to_main` preference +- R035 — Self-healing git repair on failure +- R036 — `.gsd/` conflict resolution elimination +- R037 — Zero git errors for vibe coders +- R038 — Backwards compatibility with branch-per-slice model +- R039 — Manual `/worktree` coexistence with auto-worktrees +- R040 — Doctor git health checks +- R041 — Test coverage for worktree-isolated flow + +## Scope + +### In Scope + +- Auto-worktree lifecycle wired into `startAuto()` and `complete-milestone` +- `--no-ff` merge for slices within worktree, squash for milestone to main +- `git.isolation` and `git.merge_to_main` preferences with validation +- Self-healing git repair (abort, reset, retry) for common failure modes +- Doctor git health checks (orphaned worktrees, stale branches, corrupt state) +- Simplification of `.gsd/` conflict resolution code (worktree mode only) +- Test suite for both worktree and branch isolation modes +- Backwards compatibility with existing branch-per-slice projects + +### Out of Scope / Non-Goals + +- Parallel milestone execution (deferred to future milestone) +- Native libgit2 write operations (deferred) +- Rebase merge strategy (anti-feature — conflicts with commit diary philosophy) +- Remote git operations beyond existing auto-push + +## Technical Constraints + +- Must work with git CLI (libgit2 native module is optional, read-only) +- `process.chdir` is the mechanism for worktree switching (proven in worktree-command.ts) +- All file tools (read, write, edit, bash) resolve against `process.cwd()` — this is the reason `chdir` works +- Source files are in `src/resources/extensions/gsd/`, tests in `src/resources/extensions/gsd/tests/` +- Tests run via 
`npm run test:unit` and `npm run test:integration` + +## Integration Points + +- `auto.ts` — primary integration point for worktree lifecycle in `startAuto()`, `dispatchNextUnit()`, `handleAgentEnd()` +- `git-service.ts` — `GitServiceImpl` class owns all git mutation operations +- `worktree.ts` — thin facade over `GitServiceImpl`, exports `ensureSliceBranch`, `mergeSliceToMain`, etc. +- `worktree-manager.ts` — existing worktree create/list/remove/merge operations +- `worktree-command.ts` — manual `/worktree` command with `process.chdir` handling +- `preferences.ts` — preference validation and loading +- `doctor.ts` — health check and auto-fix system +- `native-git-bridge.ts` — libgit2 read operations +- `dispatch-guard.ts` — prior-slice completion checking + +## Open Questions + +- **Worktree naming convention for auto-worktrees** — should auto-worktrees use the milestone ID as the name (`.gsd/worktrees/M003/`) or a prefixed name (`.gsd/worktrees/auto-M003/`)? Current thinking: bare milestone ID is cleaner and the branch convention (`milestone/M003` vs `worktree/`) disambiguates from manual worktrees. +- **`.gsd/` file handling on worktree creation** — should the worktree inherit the main tree's `.gsd/` planning files, or should they be cleared for a fresh start? Current thinking: inherit — the worktree needs the milestone's CONTEXT.md and ROADMAP.md to continue planning. diff --git a/.gsd/milestones/M003/M003-ROADMAP.md b/.gsd/milestones/M003/M003-ROADMAP.md new file mode 100644 index 000000000..bdee3c372 --- /dev/null +++ b/.gsd/milestones/M003/M003-ROADMAP.md @@ -0,0 +1,173 @@ +# M003: Worktree-Isolated Git Architecture + +**Vision:** Overhaul GSD's git system so that auto-mode is automagical — zero git errors, zero merge conflicts, zero user intervention required. Each milestone gets its own isolated worktree. Main is always clean. The system just runs. 
+ +## Success Criteria + +- Auto-mode on a fresh project executes through an entire milestone without any git errors or halts +- Main branch only receives commits when milestones complete (one squash commit per milestone) +- Full commit history preserved within milestone worktree branches via `--no-ff` slice merges +- Existing branch-per-slice projects continue working identically — zero regressions +- Self-healing resolves common git failures (merge conflict, checkout issue, corrupt state) without user intervention +- `/gsd doctor` detects and fixes git health issues (orphaned worktrees, stale branches, corrupt merge state) + +## Key Risks / Unknowns + +- **`process.chdir` coherence in auto-mode** — all tool calls must resolve against the worktree path after chdir. The worktree-command.ts has proven this works, but auto-mode's `basePath` variable and `process.cwd()` must stay in sync. +- **Worktree `.gsd/` inheritance** — creating a worktree copies project files from the base branch. `.gsd/` planning files (CONTEXT, ROADMAP) must carry through; runtime files (STATE.md, metrics, activity) must not cause conflicts. +- **State machine re-entry on resume** — pausing and resuming auto-mode must re-enter the worktree if it exists. The current pause/resume logic doesn't handle this. 
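The re-entry risk above can be pictured as a single decision made on resume: if the milestone's worktree still exists, work continues there, otherwise in the main tree. This is a sketch under assumptions: `autoWorktreeDir` and `resolveResumeDir` are hypothetical helpers, and the `.gsd/worktrees/<milestoneId>` layout follows the roadmap's current thinking rather than a settled convention.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Assumed layout: <project>/.gsd/worktrees/<milestoneId>.
export function autoWorktreeDir(basePath: string, milestoneId: string): string {
  return path.join(basePath, ".gsd", "worktrees", milestoneId);
}

/**
 * Hypothetical helper: picks the directory auto-mode should resume in.
 * The existence check is injectable so the decision can be tested
 * without touching the filesystem.
 */
export function resolveResumeDir(
  basePath: string,
  milestoneId: string,
  exists: (p: string) => boolean = fs.existsSync,
): string {
  const candidate = autoWorktreeDir(basePath, milestoneId);
  return exists(candidate) ? candidate : basePath;
}
```

On resume, the caller would pass the result to `process.chdir` before dispatching the next unit, so that file tools resolve against the same tree the paused session was using.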
+ +## Proof Strategy + +- `process.chdir` coherence → retire in S01 by proving auto-mode dispatches and executes a unit inside the worktree with all file operations resolving correctly +- Worktree `.gsd/` inheritance → retire in S01 by proving planning files are available after worktree creation and runtime files don't conflict +- State machine re-entry → retire in S01 by proving pause/resume correctly re-enters the worktree + +## Verification Classes + +- Contract verification: git operations produce expected branch state, file layout, and commit history in temp repos +- Integration verification: full auto-mode lifecycle (create worktree → execute slices → merge milestone → teardown) in a real git repo +- Operational verification: existing branch-per-slice projects continue working; manual `/worktree` coexists +- UAT / human verification: run auto-mode on a real project and confirm zero git errors + +## Milestone Definition of Done + +This milestone is complete only when all are true: + +- Auto-worktree lifecycle works end-to-end (create, execute, merge, teardown) +- `--no-ff` slice merges produce correct history on milestone branch +- Milestone squash to main produces clean single commit +- `git.isolation` and `git.merge_to_main` preferences work with validation +- Self-healing recovers from common git failures without user intervention +- Existing branch-per-slice projects pass all existing tests +- `/gsd doctor` detects and fixes git health issues +- Full test suite passes for both worktree and branch isolation modes +- Success criteria re-checked against live behavior + +## Requirement Coverage + +- Covers: R029, R030, R031, R032, R033, R034, R035, R036, R037, R038, R039, R040, R041 +- Partially covers: none +- Leaves for later: R042 (parallel milestones), R043 (native libgit2 writes) +- Orphan risks: none + +## Slices + +- [ ] **S01: Auto-worktree lifecycle in auto-mode** `risk:high` `depends:[]` + > After this: `startAuto()` on a new milestone creates a 
worktree under `.gsd/worktrees/M003/`, `chdir`s into it, and dispatches units inside the worktree. Pause/resume re-enters the worktree. Progress widget shows the worktree branch. Verified via running auto-mode unit dispatch in a temp repo worktree. + +- [ ] **S02: --no-ff slice merges + conflict elimination** `risk:high` `depends:[S01]` + > After this: completed slices merge into the milestone branch via `--no-ff` instead of squash. The `.gsd/` auto-resolve conflict code in `mergeSliceToMain` is bypassed in worktree mode. `git log` on the milestone branch shows full commit history with merge commit boundaries per slice. Verified in temp repo. + +- [ ] **S03: Milestone-to-main squash merge + worktree teardown** `risk:high` `depends:[S01,S02]` + > After this: `complete-milestone` squash-merges the milestone branch to main with a rich commit message listing all slices, removes the worktree, `chdir`s back to the main project root. `git log main` shows one clean commit. Auto-push works if enabled. Verified in temp repo with remote. + +- [ ] **S04: Preferences + backwards compatibility** `risk:medium` `depends:[S01]` + > After this: `git.isolation: "worktree"` (default for new projects) / `"branch"` (existing projects) and `git.merge_to_main: "milestone"` / `"slice"` preferences are validated and respected. An existing project with `gsd/*` branches defaults to branch mode and works identically to today. Verified by running tests in both modes. + +- [ ] **S05: Self-healing git repair** `risk:medium` `depends:[S01,S02,S03]` + > After this: when a merge fails or checkout breaks during auto-mode, the system aborts the failed operation, resets working tree state, and retries. Only truly unresolvable conflicts (real code conflicts between human-edited files) pause auto-mode. Users see non-technical messages, not raw git errors. Verified by deliberately introducing failures and confirming auto-recovery. 
+ +- [ ] **S06: Doctor + cleanup + code simplification** `risk:low` `depends:[S01,S02,S03,S05]` + > After this: `/gsd doctor` detects orphaned auto-worktrees, stale milestone branches, corrupt merge state (MERGE_HEAD/SQUASH_MSG), and tracked runtime files — and fixes them. Dead `.gsd/` conflict resolution code removed from worktree-mode paths in git-service.ts. Verified via doctor test cases. + +- [ ] **S07: Test suite for worktree-isolated flow** `risk:low` `depends:[S01,S02,S03,S04,S05,S06]` + > After this: full test coverage for auto-worktree create/teardown, `--no-ff` slice merge, milestone squash, preference switching, self-heal scenarios, doctor checks. All existing git tests still pass. Both isolation modes tested. Verified via `npm run test:unit && npm run test:integration`. + + + +## Boundary Map + +### S01 → S02, S03, S04, S05 + +Produces: +- `createAutoWorktree(basePath, milestoneId)` — creates worktree, returns worktree path +- `teardownAutoWorktree(basePath, milestoneId)` — removes worktree, returns to main tree +- `isInAutoWorktree(basePath)` → boolean — detects if currently in an auto-worktree +- `getAutoWorktreePath(basePath, milestoneId)` → string | null — resolves worktree path +- `enterAutoWorktree(basePath, milestoneId)` — `process.chdir` into existing worktree +- Updated `startAuto()` in auto.ts that creates/enters worktree on milestone start +- Updated pause/resume logic that re-enters worktree on resume + +Consumes: +- nothing (first slice) + +### S01 → S02 + +Produces: +- The worktree infrastructure that S02 merges slices within + +Consumes: +- nothing (first slice) + +### S02 → S03 + +Produces: +- `mergeSliceToMilestone(basePath, milestoneId, sliceId, sliceTitle)` — `--no-ff` merge of slice branch into milestone branch within worktree +- Simplified merge path that skips `.gsd/` conflict resolution in worktree mode + +Consumes from S01: +- `isInAutoWorktree()` to determine which merge strategy to use + +### S02 → S06 + +Produces: +- 
Knowledge of which conflict resolution code is dead in worktree mode + +Consumes from S01: +- Worktree detection functions + +### S03 → S05 + +Produces: +- `mergeMilestoneToMain(basePath, milestoneId)` — squash-merge milestone branch to main +- `buildMilestoneCommitMessage(milestoneId, milestoneTitle, slices)` — rich squash commit + +Consumes from S01: +- `teardownAutoWorktree()` for worktree removal after merge +- `isInAutoWorktree()` for detection + +Consumes from S02: +- Merged milestone branch with `--no-ff` slice history + +### S04 → S01, S02, S03 + +Produces: +- `git.isolation` preference — `"worktree"` | `"branch"` +- `git.merge_to_main` preference — `"milestone"` | `"slice"` +- `shouldUseWorktreeIsolation(basePath)` — resolves effective isolation mode +- Preference validation in `preferences.ts` + +Consumes from S01: +- Auto-worktree functions (gated by isolation preference) + +### S05 → S06 + +Produces: +- Structured git error handling patterns (try/abort/reset/retry) +- User-facing error message formatting + +Consumes from S01: +- Worktree detection (to scope repair to correct working tree) +Consumes from S02: +- Merge operations that may fail +Consumes from S03: +- Milestone merge that may fail + +### S06 → S07 + +Produces: +- Doctor git health check functions +- Simplified git-service.ts with dead code removed + +Consumes from S05: +- Error handling patterns for doctor fix operations
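
The S04 boundary above names two preferences and their allowed values. One way the validation and defaulting could look is sketched below. Everything here is illustrative: `resolveGitPreferences` is a hypothetical name, and the default for `merge_to_main` in branch mode (`"slice"`) is an assumption made so that legacy projects keep merging per slice; the roadmap does not state it.

```typescript
export type IsolationMode = "worktree" | "branch";
export type MergeToMain = "milestone" | "slice";

export interface GitPreferences {
  isolation?: unknown;
  merge_to_main?: unknown;
}

export interface ResolvedGitPreferences {
  isolation: IsolationMode;
  mergeToMain: MergeToMain;
  errors: string[];
}

const ISOLATION_MODES: readonly IsolationMode[] = ["worktree", "branch"];
const MERGE_TARGETS: readonly MergeToMain[] = ["milestone", "slice"];

// Return the value if it is allowed; otherwise record an error and use the fallback.
function pick<T extends string>(
  value: unknown,
  allowed: readonly T[],
  fallback: T,
  key: string,
  errors: string[],
): T {
  if (value === undefined) return fallback;
  if (typeof value === "string" && (allowed as readonly string[]).indexOf(value) !== -1) {
    return value as T;
  }
  errors.push(`${key} must be one of: ${allowed.join(", ")}`);
  return fallback;
}

/**
 * Hypothetical resolver. New projects default to worktree isolation;
 * projects that already have legacy slice branches default to branch mode.
 */
export function resolveGitPreferences(
  prefs: GitPreferences,
  hasLegacySliceBranches: boolean,
): ResolvedGitPreferences {
  const errors: string[] = [];
  const isolation = pick(
    prefs.isolation,
    ISOLATION_MODES,
    hasLegacySliceBranches ? "branch" : "worktree",
    "git.isolation",
    errors,
  );
  // Assumption: the merge default follows the isolation mode.
  const mergeToMain = pick(
    prefs.merge_to_main,
    MERGE_TARGETS,
    isolation === "worktree" ? "milestone" : "slice",
    "git.merge_to_main",
    errors,
  );
  return { isolation, mergeToMain, errors };
}
```

Falling back to a default while collecting errors, instead of throwing, keeps auto-mode running on a misconfigured project and leaves it to the caller to decide how to report the problem.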