diff --git a/.gsd/DECISIONS.md b/.gsd/DECISIONS.md
new file mode 100644
index 000000000..9909cd4f2
--- /dev/null
+++ b/.gsd/DECISIONS.md
@@ -0,0 +1,20 @@
+# Decisions Register
+
+
+
+| # | When | Scope | Decision | Choice | Rationale | Revisable? |
+|---|------|-------|----------|--------|-----------|------------|
+| D001 | M001 | arch | Secret collection insertion point | At `/gsd auto` entry (startAuto), not as a dispatch unit type | Keeps the state machine untouched. Collection is a one-time gate, not a repeating unit. Simpler, less risk of dispatch loop bugs. | Yes — if collection needs to happen mid-milestone |
+| D002 | M001 | convention | Manifest file naming | `M00x-SECRETS.md` via existing `resolveMilestoneFile(base, mid, "SECRETS")` | Consistent with all other milestone-level files (CONTEXT, ROADMAP, RESEARCH). No new path resolver needed. | No |
+| D003 | M001 | pattern | Summary screen interactivity | Read-only with auto-skip (no interactive deselection) | Matches the "walk away" philosophy. Simpler UX, fewer edge cases. User can always re-run collection. | Yes — if users request deselection |
+| D004 | M001 | pattern | Guidance display placement | Same page as masked input (above the editor) | Single page per key — no extra navigation. User sees guidance while entering the value. | Yes — if terminal height constraints cause problems |
+| D005 | M001 | convention | Manifest format | Markdown with H3 sections per key, bold fields, numbered guidance | Consistent with all other .gsd files. Parser and formatter already exist in files.ts. | No |
+| D006 | M001 | arch | Destination inference | Reuse existing `detectDestination()` from get-secrets-from-user.ts | Simple file-presence checks (vercel.json → Vercel, convex/ → Convex, default → .env). Already proven. | Yes — if per-key destination override needed |
+| D007 | M002 | arch | File structure after module split | Split index.ts into state.ts, lifecycle.ts, capture.ts, settle.ts, refs.ts, utils.ts, evaluate-helpers.ts, and tools/ directory | 5000-line monolith is unmaintainable; module boundaries enable safe changes. core.js already established the pattern. | No |
+| D008 | M002 | library | Image resizing library | sharp | Fast, well-maintained, standard Node image processing. Replaces fragile canvas-based approach that depends on page context. | No |
+| D009 | M002 | convention | Navigate screenshot default | Off by default, opt-in via parameter | Big token savings. Agent uses browser_screenshot explicitly when visual verification needed. | Yes — if agents consistently need screenshots on navigate |
+| D010 | M002 | arch | Browser-side utility injection | page.addInitScript under window.__pi namespace | Survives navigation, available before page scripts, namespaced to avoid collisions. | Yes — if timing issues discovered |
+| D011 | M002 | convention | Intent resolution approach | Deterministic heuristics only, no LLM calls | Predictable latency and cost. Scoring functions are testable and debuggable. | Yes — if heuristic coverage proves insufficient |
+| D012 | M002 | convention | Browser reuse across sessions | Skip completely | Architecturally different from within-session work; user directed to exclude entirely. | No |
diff --git a/.gsd/PROJECT.md b/.gsd/PROJECT.md
new file mode 100644
index 000000000..42e1f4517
--- /dev/null
+++ b/.gsd/PROJECT.md
@@ -0,0 +1,41 @@
+# Project
+
+## What This Is
+
+A pi coding agent extension (GSD — "Get Stuff Done") that provides structured planning, auto-mode execution, and project management for autonomous coding sessions. Includes proactive secret management and browser automation tools for UI verification.
+
+## Core Value
+
+Auto-mode runs from start to finish without blocking. The agent has hands, eyes, and judgment in the browser — fast, token-efficient, and reliable.
+
+## Current State
+
+The GSD extension is fully functional with:
+- Milestone/slice/task planning hierarchy
+- Auto-mode state machine with fresh-session-per-unit dispatch
+- Guided `/gsd` wizard flow
+- `secure_env_collect` tool with masked TUI input, multi-destination write support, guidance display, and summary screen
+- Proactive secret management: planning prompts forecast secrets, manifests persist them, auto-mode collects them before first dispatch
+- Browser-tools extension with 43 registered tools covering navigation, interaction, inspection, verification, tracing, and debugging
+- Browser-tools `core.js` with shared utilities for action timeline, page registry, state diffing, assertions, fingerprinting
+
+## Architecture / Key Patterns
+
+- **Extension model**: pi extensions register tools, commands, hooks via `ExtensionAPI`
+- **State machine**: `auto.ts` drives `dispatchNextUnit()` which reads disk state and dispatches fresh sessions
+- **Secrets gate**: `startAuto()` checks `getManifestStatus()` before first dispatch
+- **Disk-driven state**: `.gsd/` files are the source of truth, `STATE.md` is derived cache
+- **File parsing**: `files.ts` has markdown parsers for all GSD file types
+- **Browser-tools**: Single `index.ts` (~5000 lines) with all tool registrations, shared infrastructure in `core.js` (~1000 lines). Uses Playwright for browser control. Accessibility-first state representation, deterministic versioned refs, adaptive DOM settling, compact post-action summaries.
+- **Prompt templates**: `prompts/` directory with mustache-like `{{var}}` substitution
+- **TUI components**: `@gsd/pi-tui` provides `Editor`, `Text`, key handling, themes
+- **Branch-per-slice**: git branches isolate slice work, squash-merged to main on completion
+
+## Capability Contract
+
+See `.gsd/REQUIREMENTS.md` for the explicit capability contract, requirement status, and coverage mapping.
+
+## Milestone Sequence
+
+- [x] M001: Proactive Secret Management — Front-loaded API key collection into planning so auto-mode runs uninterrupted (10 requirements validated)
+- [ ] M002: Browser Tools Performance & Intelligence — Module decomposition, action pipeline optimization, sharp-based screenshots, form intelligence, intent-ranked retrieval, semantic actions
diff --git a/.gsd/REQUIREMENTS.md b/.gsd/REQUIREMENTS.md
new file mode 100644
index 000000000..fea891ae2
--- /dev/null
+++ b/.gsd/REQUIREMENTS.md
@@ -0,0 +1,360 @@
+# Requirements
+
+This file is the explicit capability and coverage contract for the project.
+
+## Active
+
+### R015 — Module decomposition of browser-tools
+- Class: quality-attribute
+- Status: active
+- Description: The monolithic browser-tools index.ts (~5000 lines) is split into focused modules: shared infrastructure, tool groups, and browser-side utilities. All 43 existing tools continue to work identically.
+- Why it matters: A 5000-line file is unmaintainable and makes targeted changes risky. Module boundaries enable safe refactoring and new tool development.
+- Source: user
+- Primary owning slice: M002/S01
+- Supporting slices: none
+- Validation: unmapped
+- Notes: core.js already exists with ~1000 lines of shared utilities. The split extends this pattern.
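The main regression risk for R015 is the module-level state (browser, context, page registry, logs) that all 43 tools share inside one file today. A minimal sketch of one common pattern follows, assuming a dedicated state module: other modules read and change the state only through functions, because an ES module cannot reassign a `let` binding it imports. The names `BrowserToolsState`, `getState`, `setBrowserHandles`, `registerPage`, `appendLog`, and `resetState` are hypothetical, and Playwright handles are typed as `unknown` so the example has no dependencies.

```typescript
// Hypothetical sketch of a shared-state module (e.g. state.ts) after the split.
// Playwright objects are typed as `unknown` so this compiles without dependencies.

interface PageEntry {
  id: number;
  url: string;
}

interface BrowserToolsState {
  browser: unknown | null;
  context: unknown | null;
  pageRegistry: Map<number, PageEntry>;
  logs: string[];
}

function createEmptyState(): BrowserToolsState {
  return { browser: null, context: null, pageRegistry: new Map(), logs: [] };
}

// Single mutable instance, private to this module.
let state: BrowserToolsState = createEmptyState();

// Tool modules read through this getter instead of importing `let` bindings,
// which importers cannot reassign.
export function getState(): Readonly<BrowserToolsState> {
  return state;
}

export function setBrowserHandles(browser: unknown, context: unknown): void {
  state.browser = browser;
  state.context = context;
}

export function registerPage(entry: PageEntry): void {
  state.pageRegistry.set(entry.id, entry);
}

export function appendLog(line: string, maxLines = 500): void {
  state.logs.push(line);
  // Drop the oldest lines so long sessions keep a bounded log.
  if (state.logs.length > maxLines) {
    state.logs.splice(0, state.logs.length - maxLines);
  }
}

// Called when the browser closes so every tool module sees the same reset.
export function resetState(): void {
  state = createEmptyState();
}
```

Keeping the state object private and exposing `resetState()` means a browser restart replaces the state in one place rather than in each tool module.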
+
+### R016 — Shared browser-side evaluate utilities
+- Class: quality-attribute
+- Status: active
+- Description: Common functions duplicated across page.evaluate boundaries (cssPath, simpleHash, isVisible, isEnabled, inferRole, accessibleName) are injected once and referenced from all evaluate callbacks.
+- Why it matters: Currently buildRefSnapshot and resolveRefTarget each redeclare ~100 lines of identical utility code. Deduplication reduces payload size, improves maintainability, and ensures consistency.
+- Source: user
+- Primary owning slice: M002/S01
+- Supporting slices: none
+- Validation: unmapped
+- Notes: Options include page.addInitScript or a one-time setup evaluate that attaches to window.
+
+### R017 — Consolidated state capture per action
+- Class: core-capability
+- Status: active
+- Description: The before-state capture, after-state capture, post-action summary, and recent-error check are consolidated into fewer page.evaluate calls per action. Target: ~50-100ms savings per action.
+- Why it matters: Every action tool currently runs 3-4 separate page.evaluate calls for state capture. Consolidating them reduces latency on every single browser interaction.
+- Source: user
+- Primary owning slice: M002/S02
+- Supporting slices: M002/S01
+- Validation: unmapped
+- Notes: captureCompactPageState and postActionSummary can likely be merged into a single evaluate.
+
+### R018 — Conditional body text capture
+- Class: core-capability
+- Status: active
+- Description: Body text capture (includeBodyText: true) is skipped for low-signal actions (scroll, hover, Tab key press) and enabled for high-signal actions (navigate, click, type, submit).
+- Why it matters: Capturing 4000 chars of body text on every scroll or hover is wasteful. Conditional capture reduces evaluate overhead for frequent low-signal actions.
+- Source: user
+- Primary owning slice: M002/S02
+- Supporting slices: none
+- Validation: unmapped
+- Notes: Requires classifying each tool as high-signal or low-signal.
+
+### R019 — Faster settle on zero mutations
+- Class: core-capability
+- Status: active
+- Description: settleAfterActionAdaptive short-circuits with a smaller quiet window when no mutation observer fires in the first 60ms. Target: ~40-80ms savings on zero-mutation actions.
+- Why it matters: Many SPA interactions produce no DOM changes. The current settle logic always waits the full quiet window regardless. Short-circuiting saves time on the most common case.
+- Source: user
+- Primary owning slice: M002/S02
+- Supporting slices: none
+- Validation: unmapped
+- Notes: Track whether any mutation fired at all; if zero after 60ms, use a shorter quiet window.
+
+### R020 — Sharp-based screenshot resizing
+- Class: core-capability
+- Status: active
+- Description: constrainScreenshot uses the sharp Node library for image resizing instead of bouncing buffers through page canvas context. Faster, no page dependency.
+- Why it matters: The current approach sends full screenshot buffer to the page as base64, creates an Image, draws to canvas, exports, then sends back. This is slow and fragile (depends on working canvas context).
+- Source: user
+- Primary owning slice: M002/S03
+- Supporting slices: M002/S01
+- Validation: unmapped
+- Notes: sharp added as a dependency. API: sharp(buffer).resize(w, h, { fit: 'inside' }).jpeg({ quality }).toBuffer()
+
+### R021 — Opt-in screenshots on navigate
+- Class: core-capability
+- Status: active
+- Description: browser_navigate does not capture or return a screenshot by default. An explicit parameter (e.g. screenshot: true) opts in to screenshot capture.
+- Why it matters: The current always-inline screenshot is a large base64 payload in every navigation response. For many verifications the compact page summary + diff is sufficient. Significant token savings.
+- Source: user
+- Primary owning slice: M002/S03
+- Supporting slices: none
+- Validation: unmapped
+- Notes: Default is off. The agent can still use browser_screenshot explicitly when visual verification is needed.
+
+### R022 — Form analysis tool (browser_analyze_form)
+- Class: core-capability
+- Status: active
+- Description: A browser_analyze_form tool that takes a form selector and returns field inventory including: field labels, names, types, required status, current values, validation state, and submit controls.
+- Why it matters: A huge percentage of browser tasks are form tasks. Currently the agent needs 3-8 tool calls to analyze a form. This collapses that into one call.
+- Source: user
+- Primary owning slice: M002/S04
+- Supporting slices: M002/S01
+- Validation: unmapped
+- Notes: Must handle label association via for/id, wrapping label, aria-label, aria-labelledby, and placeholder.
+
+### R023 — Form fill tool (browser_fill_form)
+- Class: core-capability
+- Status: active
+- Description: A browser_fill_form tool that takes a form selector, a values object mapping field identifiers to values, and an optional submit flag. Maps labels/names/placeholders to inputs and fills them.
+- Why it matters: Filling a login form currently takes 3-5 tool calls (find inputs, type email, type password, click submit). This collapses it to one call.
+- Source: user
+- Primary owning slice: M002/S04
+- Supporting slices: M002/S01
+- Validation: unmapped
+- Notes: Returns matched fields, unmatched values, fields skipped, and validation state after fill.
+
+### R024 — Intent-ranked element retrieval (browser_find_best)
+- Class: core-capability
+- Status: active
+- Description: A browser_find_best tool that takes an intent string (e.g. "submit form", "close dialog", "primary CTA") and returns scored candidates with reasons, using deterministic heuristic ranking.
+- Why it matters: The agent frequently needs "which button submits this form?" Currently it does browser_find → gets 15 candidates → reasons about which one. A heuristic ranker cuts a round trip and reduces reasoning tokens.
+- Source: user
+- Primary owning slice: M002/S05
+- Supporting slices: M002/S01
+- Validation: unmapped
+- Notes: Deterministic heuristics only. No hidden LLM calls.
+
+### R025 — Semantic action tool (browser_act)
+- Class: core-capability
+- Status: active
+- Description: A browser_act tool that takes a semantic intent (e.g. "submit the current form", "close the active modal", "click the primary CTA") and executes the obvious action sequence internally.
+- Why it matters: Each of these common micro-tasks currently takes 2-4 tool calls. browser_act collapses them into one.
+- Source: user
+- Primary owning slice: M002/S05
+- Supporting slices: M002/S04
+- Validation: unmapped
+- Notes: Builds on browser_find_best for element selection. Bounded — does not loop or retry.
+
+### R026 — Test coverage for new and refactored code
+- Class: quality-attribute
+- Status: active
+- Description: Test suite covers shared browser-side utilities, settle logic, screenshot resizing, form tools, and intent ranking. Tests verify correctness and guard against regressions.
+- Why it matters: A 5000-line file with zero tests is fragile. The refactoring and new features need regression protection.
+- Source: user
+- Primary owning slice: M002/S06
+- Supporting slices: all M002 slices
+- Validation: unmapped
+- Notes: Test what's unit-testable without a running browser (heuristics, scoring, utility functions). Integration tests with Playwright for tools that need a page.
+
+## Validated
+
+### R001 — Secret forecasting during milestone planning
+- Class: core-capability
+- Status: validated
+- Description: When a milestone is planned, the LLM analyzes slices for external service dependencies and writes a secrets manifest listing every predicted API key with setup guidance.
+- Why it matters: Without forecasting, auto-mode discovers missing keys mid-execution and blocks for hours waiting for user input.
+- Source: user
+- Primary owning slice: M001/S01
+- Supporting slices: none
+- Validation: plan-milestone.md Secret Forecasting section (line 62) instructs LLM to write manifest. Parser round-trip tested in parsers.test.ts.
+- Notes: The plan-milestone prompt has forecasting instructions. The manifest format and parser are implemented and tested.
+
+### R002 — Secrets manifest persisted in .gsd/
+- Class: continuity
+- Status: validated
+- Description: The secrets manifest is a durable markdown file at `.gsd/milestones/M00x/M00x-SECRETS.md` that survives session boundaries and can be re-read by any future unit.
+- Why it matters: Collection may happen in a different session than planning. The manifest must persist on disk.
+- Source: user
+- Primary owning slice: M001/S01
+- Supporting slices: none
+- Validation: parseSecretsManifest/formatSecretsManifest round-trip tested (parsers.test.ts), resolveMilestoneFile(base, mid, "SECRETS") resolves path.
+- Notes: Parser/formatter implemented in files.ts. Template exists at templates/secrets-manifest.md.
+
+### R003 — Step-by-step guidance per key
+- Class: primary-user-loop
+- Status: validated
+- Description: Each secret in the manifest includes numbered steps for obtaining the key (navigate to dashboard → create project → generate key → copy), a dashboard URL, and a format hint.
+- Why it matters: Users shouldn't have to figure out where to find each key. The guidance makes collection self-service.
+- Source: user
+- Primary owning slice: M001/S02
+- Supporting slices: M001/S01
+- Validation: collectOneSecret renders numbered dim-styled guidance steps with wrapping (collect-from-manifest.test.ts tests 6-8).
+- Notes: Guidance quality is LLM-dependent and best-effort.
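R003 describes guidance rendered as numbered steps that wrap within the terminal width. A minimal sketch of that layout follows, assuming plain text with no ANSI styling; `formatGuidanceSteps` is a hypothetical helper, not the actual `collectOneSecret` or `wrapTextWithAnsi` code.

```typescript
// Hypothetical helper: number each guidance step and word-wrap it to `width`,
// indenting continuation lines under the step text rather than under the number.
function formatGuidanceSteps(steps: string[], width: number): string[] {
  const lines: string[] = [];
  steps.forEach((step, index) => {
    const prefix = `${index + 1}. `;
    const indent = " ".repeat(prefix.length);
    const available = Math.max(1, width - prefix.length);
    let current = "";
    let first = true;
    // Emit the line being built, numbered only if it is the step's first line.
    const flush = (): void => {
      lines.push((first ? prefix : indent) + current);
      first = false;
      current = "";
    };
    for (const word of step.split(/\s+/).filter(Boolean)) {
      if (current.length === 0) {
        current = word; // an over-long word (e.g. a URL) is kept whole, not split
      } else if (current.length + 1 + word.length <= available) {
        current += " " + word;
      } else {
        flush();
        current = word;
      }
    }
    flush();
  });
  return lines;
}
```

Aligning continuation lines under the step text keeps the numbers scannable when a long instruction wraps.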
+
+### R004 — Summary screen before collection
+- Class: primary-user-loop
+- Status: validated
+- Description: Before collecting secrets one-by-one, show a read-only summary screen listing all needed keys with their status (pending / already set / skipped). Auto-skip keys that already exist in the environment.
+- Why it matters: The user needs to see the full picture before entering keys. Already-set keys should not require re-entry.
+- Source: user
+- Primary owning slice: M001/S02
+- Supporting slices: none
+- Validation: showSecretsSummary() renders read-only ctx.ui.custom screen with status indicators via makeUI().progressItem() (collect-from-manifest.test.ts tests 4-5).
+- Notes: Read-only with auto-skip — no interactive deselection.
+
+### R005 — Existing key detection and silent skip
+- Class: primary-user-loop
+- Status: validated
+- Description: Before prompting for a key, check `.env` and `process.env`. If the key already exists, mark it as "already set" in the summary and skip collection.
+- Why it matters: Users shouldn't re-enter keys they've already configured. Prevents frustration and errors.
+- Source: user
+- Primary owning slice: M001/S02
+- Supporting slices: none
+- Validation: getManifestStatus cross-references checkExistingEnvKeys, categorizes env-present keys as existing (manifest-status.test.ts tests 4,7). collectSecretsFromManifest skips them (collect-from-manifest.test.ts tests 1-2).
+- Notes: `checkExistingEnvKeys()` implemented in get-secrets-from-user.ts.
+
+### R006 — Smart destination detection
+- Class: integration
+- Status: validated
+- Description: Automatically detect whether secrets should go to .env, Vercel, or Convex based on project file presence (vercel.json → Vercel, convex/ dir → Convex, default → .env).
+- Why it matters: Users shouldn't have to specify the destination manually. The system should do the right thing.
+- Source: user
+- Primary owning slice: M001/S02
+- Supporting slices: none
+- Validation: collectSecretsFromManifest calls detectDestination() for destination inference. applySecrets() routes to dotenv/vercel/convex accordingly.
+- Notes: `detectDestination()` implemented in get-secrets-from-user.ts.
+
+### R007 — Auto-mode collection at entry point
+- Class: core-capability
+- Status: validated
+- Description: When the user runs `/gsd auto`, check for a secrets manifest with pending keys. If found, collect them before dispatching the first slice. Collection happens once at the entry point, not as a dispatch unit.
+- Why it matters: This is the primary integration point — auto-mode must not start execution with uncollected secrets.
+- Source: user
+- Primary owning slice: M001/S03
+- Supporting slices: M001/S01, M001/S02
+- Validation: startAuto() secrets gate at auto.ts:479. auto-secrets-gate.test.ts — 3/3 pass covering null manifest, pending keys, and no-pending-keys paths.
+- Notes: Collection at entry point (startAuto), not as a separate unit type in dispatchNextUnit. D001 satisfied.
+
+### R008 — Guided /gsd wizard integration
+- Class: core-capability
+- Status: validated
+- Description: After milestone planning in the guided `/gsd` flow, trigger secret collection if a manifest exists with pending keys.
+- Why it matters: Users who plan via the wizard should also get prompted for secrets before auto-mode begins.
+- Source: user
+- Primary owning slice: M001/S03
+- Supporting slices: M001/S01, M001/S02
+- Validation: guided-flow.ts calls startAuto() directly (lines 52, 486, 647, 794) — all guided flow paths that start auto-mode inherit the secrets gate.
+- Notes: The guided flow dispatches to startAuto after planning. Collection is inherited via the gate.
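R005 checks both `.env` and `process.env` before prompting for a key. The sketch below expresses that check as a pure function over the `.env` file contents and an environment object, which keeps it unit-testable without touching the filesystem. `parseDotenvKeys` and `findExistingKeys` are hypothetical names; the simplified `.env` parsing (no multi-line values, no inline comments, no variable expansion) and the rule that an empty value counts as "not set" are assumptions, not the documented behavior of `checkExistingEnvKeys()`.

```typescript
// Hypothetical sketch: simplified .env parsing. Supports KEY=value lines,
// an optional `export ` prefix, one pair of surrounding quotes, and `#` comment lines.
function parseDotenvKeys(contents: string): Map<string, string> {
  const values = new Map<string, string>();
  for (const rawLine of contents.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const match = /^(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*)$/.exec(line);
    const key = match?.[1];
    if (!key) continue;
    let value = (match?.[2] ?? "").trim();
    // Strip one pair of matching surrounding quotes.
    const quoted = /^(['"])(.*)\1$/.exec(value);
    if (quoted) value = quoted[2] ?? "";
    values.set(key, value);
  }
  return values;
}

// Split manifest keys into those already configured and those still pending.
function findExistingKeys(
  manifestKeys: string[],
  dotenvContents: string,
  env: Record<string, string | undefined>,
): { existing: string[]; pending: string[] } {
  const dotenv = parseDotenvKeys(dotenvContents);
  const existing: string[] = [];
  const pending: string[] = [];
  for (const key of manifestKeys) {
    // Assumption: an empty value counts as "not set", so the user is still prompted.
    const fromFile = dotenv.get(key) ?? "";
    const fromEnv = env[key] ?? "";
    if (fromFile !== "" || fromEnv !== "") {
      existing.push(key);
    } else {
      pending.push(key);
    }
  }
  return { existing, pending };
}
```

A caller would pass the contents of `.env` (or an empty string if the file is missing) together with `process.env`.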
+
+### R009 — Planning prompts instruct LLM to forecast secrets
+- Class: integration
+- Status: validated
+- Description: The plan-milestone prompt template includes instructions for the LLM to analyze slices for external service dependencies and write the secrets manifest.
+- Why it matters: Without prompt instructions, the LLM won't know to forecast secrets.
+- Source: user
+- Primary owning slice: M001/S01
+- Supporting slices: none
+- Validation: plan-milestone.md has Secret Forecasting section at line 62 with instructions to write {{secretsOutputPath}} with H3 sections per key.
+- Notes: Implemented in plan-milestone.md.
+
+### R010 — secure_env_collect enhanced with guidance display
+- Class: primary-user-loop
+- Status: validated
+- Description: The secure_env_collect TUI renders multi-line guidance steps above the masked input field on the same page, so the user sees setup instructions while entering the key.
+- Why it matters: Without visible guidance, the user has to find keys on their own despite the LLM having generated instructions.
+- Source: user
+- Primary owning slice: M001/S02
+- Supporting slices: none
+- Validation: collectOneSecret accepts guidance parameter, renders numbered dim-styled lines with wrapTextWithAnsi above masked input (collect-from-manifest.test.ts tests 6-8).
+- Notes: The guidance field is rendered in collectOneSecret().
+
+## Deferred
+
+### R011 — Multi-milestone secret forecasting
+- Class: core-capability
+- Status: deferred
+- Description: Forecast secrets across all planned milestones, not just the active one.
+- Why it matters: Would provide a complete picture of all secrets needed for the project.
+- Source: user
+- Primary owning slice: none
+- Supporting slices: none
+- Validation: unmapped
+- Notes: Deferred — single-milestone forecasting is sufficient for now.
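R009's prompt refers to `{{secretsOutputPath}}`, and the project notes describe the prompt templates as using mustache-like `{{var}}` substitution. A minimal sketch of such a substitution step follows. `renderPrompt` is a hypothetical name, and throwing on a missing variable (rather than leaving the placeholder in place) is an assumption, chosen so that a prompt never reaches the LLM with an unfilled output path.

```typescript
// Hypothetical sketch of mustache-like {{var}} substitution for prompt templates.
// Only simple identifiers are supported: no sections, loops, or HTML escaping.
function renderPrompt(template: string, vars: Record<string, string>): string {
  return template.replace(
    /\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}/g,
    (_match: string, name: string): string => {
      // Own-property check avoids picking up inherited names such as "toString".
      const value = Object.prototype.hasOwnProperty.call(vars, name) ? vars[name] : undefined;
      if (value === undefined) {
        // Fail loudly instead of sending an unfilled placeholder to the model.
        throw new Error(`Missing template variable: ${name}`);
      }
      return value;
    },
  );
}
```

For example, rendering `Write the manifest to {{secretsOutputPath}}.` with `secretsOutputPath` set to a `.gsd/milestones/...` path yields the instruction with the concrete path filled in.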
+
+### R012 — Secret rotation reminders
+- Class: operability
+- Status: deferred
+- Description: Track secret age and remind users when keys may need rotation.
+- Why it matters: Security best practice, but not essential for the core workflow.
+- Source: user
+- Primary owning slice: none
+- Supporting slices: none
+- Validation: unmapped
+- Notes: Deferred — out of scope for initial release.
+
+### R027 — Browser reuse across sessions
+- Class: core-capability
+- Status: deferred
+- Description: Keep a warm browser instance across rapid successive agent contexts (e.g. GSD auto-mode cycling through tasks) to avoid ~2-3s Chrome cold-start per session.
+- Why it matters: Would eliminate Chrome launch latency in auto-mode. But requires inter-process coordination and is architecturally different from within-session optimizations.
+- Source: user
+- Primary owning slice: none
+- Supporting slices: none
+- Validation: unmapped
+- Notes: Deferred — skip completely per user direction.
+
+## Out of Scope
+
+### R013 — Curated service knowledge base
+- Class: anti-feature
+- Status: out-of-scope
+- Description: A static database of known services with pre-written guidance for each API key.
+- Why it matters: Prevents scope creep. LLM-generated guidance is sufficient and stays current without maintenance.
+- Source: user
+- Primary owning slice: none
+- Supporting slices: none
+- Validation: n/a
+- Notes: LLM generates guidance dynamically. A static KB would become stale.
+
+### R014 — Just-in-time collection enhancement
+- Class: anti-feature
+- Status: out-of-scope
+- Description: Detect missing secrets during task execution and collect them inline.
+- Why it matters: Prevents scope confusion. The whole point of M001 is proactive collection, not reactive.
+- Source: user
+- Primary owning slice: none
+- Supporting slices: none
+- Validation: n/a
+- Notes: Existing secure_env_collect already handles reactive collection. This milestone is about proactive.
+
+### R028 — LLM-powered intent resolution
+- Class: anti-feature
+- Status: out-of-scope
+- Description: Using hidden LLM calls inside browser_find_best or browser_act for intent resolution.
+- Why it matters: Prevents unpredictable latency and cost. Intent resolution must be deterministic heuristics only.
+- Source: inferred
+- Primary owning slice: none
+- Supporting slices: none
+- Validation: n/a
+- Notes: browser_find_best and browser_act use scoring heuristics, not LLM inference.
+
+## Traceability
+
+| ID | Class | Status | Primary owner | Supporting | Proof |
+|---|---|---|---|---|---|
+| R001 | core-capability | validated | M001/S01 | none | plan-milestone.md Secret Forecasting section, parser round-trip tests |
+| R002 | continuity | validated | M001/S01 | none | parseSecretsManifest/formatSecretsManifest round-trip tested |
+| R003 | primary-user-loop | validated | M001/S02 | M001/S01 | collect-from-manifest.test.ts tests 6-8 |
+| R004 | primary-user-loop | validated | M001/S02 | none | collect-from-manifest.test.ts tests 4-5 |
+| R005 | primary-user-loop | validated | M001/S02 | none | manifest-status.test.ts tests 4,7; collect-from-manifest.test.ts tests 1-2 |
+| R006 | integration | validated | M001/S02 | none | collectSecretsFromManifest calls detectDestination() |
+| R007 | core-capability | validated | M001/S03 | M001/S01, M001/S02 | auto-secrets-gate.test.ts 3/3 pass |
+| R008 | core-capability | validated | M001/S03 | M001/S01, M001/S02 | guided-flow.ts calls startAuto() at lines 52, 486, 647, 794 |
+| R009 | integration | validated | M001/S01 | none | plan-milestone.md Secret Forecasting section line 62 |
+| R010 | primary-user-loop | validated | M001/S02 | none | collect-from-manifest.test.ts tests 6-8 |
+| R011 | core-capability | deferred | none | none | unmapped |
+| R012 | operability | deferred | none | none | unmapped |
+| R013 | anti-feature | out-of-scope | none | none | n/a |
+| R014 | anti-feature | out-of-scope | none | none | n/a |
+| R015 | quality-attribute | active | M002/S01 | none | unmapped |
+| R016 | quality-attribute | active | M002/S01 | none | unmapped |
+| R017 | core-capability | active | M002/S02 | M002/S01 | unmapped |
+| R018 | core-capability | active | M002/S02 | none | unmapped |
+| R019 | core-capability | active | M002/S02 | none | unmapped |
+| R020 | core-capability | active | M002/S03 | M002/S01 | unmapped |
+| R021 | core-capability | active | M002/S03 | none | unmapped |
+| R022 | core-capability | active | M002/S04 | M002/S01 | unmapped |
+| R023 | core-capability | active | M002/S04 | M002/S01 | unmapped |
+| R024 | core-capability | active | M002/S05 | M002/S01 | unmapped |
+| R025 | core-capability | active | M002/S05 | M002/S04 | unmapped |
+| R026 | quality-attribute | active | M002/S06 | all M002 | unmapped |
+| R027 | core-capability | deferred | none | none | unmapped |
+| R028 | anti-feature | out-of-scope | none | none | n/a |
+
+## Coverage Summary
+
+- Active requirements: 12
+- Validated requirements: 10
+- Deferred requirements: 3
+- Out of scope: 3
+- Unmapped active requirements: 12
diff --git a/.gsd/STATE.md b/.gsd/STATE.md
new file mode 100644
index 000000000..1ddb55e41
--- /dev/null
+++ b/.gsd/STATE.md
@@ -0,0 +1,24 @@
+# GSD State
+
+**Active Milestone:** M002 — Browser Tools Performance & Intelligence
+**Active Slice:** None
+**Active Task:** None
+**Phase:** planned
+**Requirements Status:** 12 active · 10 validated · 3 deferred · 3 out of scope
+
+## Milestone Registry
+- ✅ **M001:** Proactive Secret Management
+- 🔵 **M002:** Browser Tools Performance & Intelligence
+
+## Recent Decisions
+- D007: Split index.ts into focused modules (state, lifecycle, capture, settle, refs, utils, evaluate-helpers, tools/)
+- D008: Use sharp for image resizing
+- D009: Navigate screenshots off by default
+- D010: Inject browser-side utilities via addInitScript under window.__pi
+- D011: Deterministic heuristics only for intent resolution
+
+## Blockers
+- None
+
+## Next Action
+Begin S01: Module decomposition and shared evaluate utilities — plan the slice, then execute.
diff --git a/.gsd/milestones/M002/M002-CONTEXT.md b/.gsd/milestones/M002/M002-CONTEXT.md
new file mode 100644
index 000000000..d3aeaf77d
--- /dev/null
+++ b/.gsd/milestones/M002/M002-CONTEXT.md
@@ -0,0 +1,120 @@
+# M002: Browser Tools Performance & Intelligence — Context
+
+**Gathered:** 2026-03-12
+**Status:** Ready for planning
+
+## Project Description
+
+Performance optimization and capability expansion of pi's browser-tools extension. The extension provides 43 browser interaction tools to the coding agent via Playwright. This milestone decomposes the monolithic 5000-line index.ts into modules, optimizes the per-action performance pipeline, replaces canvas-based screenshot resizing with sharp, and adds form intelligence, intent-ranked element retrieval, and semantic action tools.
+
+## Why This Milestone
+
+The browser-tools extension is the agent's primary interface for UI verification and testing. Every action pays a latency tax from redundant page.evaluate calls, unnecessary body text capture, and canvas-based screenshot resizing. The monolithic file structure makes changes risky. And the most common browser tasks (forms, finding the right button, executing obvious micro-actions) still require multiple tool calls where one would suffice.
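Of the costs listed above, unnecessary body text capture is the most direct to remove: R018 calls for classifying each tool as high-signal or low-signal. A minimal sketch of that classification follows. The `ActionKind` values mirror the actions R018 names (navigate, click, type, submit versus scroll, hover, Tab key press), but the `ActionInfo` shape, the `key` field, and `shouldCaptureBodyText` are hypothetical; the real tool list and capture options live in browser-tools and may differ.

```typescript
// Hypothetical classification for R018: decide per action whether the
// post-action state capture should include page body text.
type ActionKind = "navigate" | "click" | "type" | "submit" | "scroll" | "hover" | "key_press";

interface ActionInfo {
  kind: ActionKind;
  key?: string; // only meaningful for key_press
}

const HIGH_SIGNAL: ReadonlySet<ActionKind> = new Set<ActionKind>(["navigate", "click", "type", "submit"]);

// R018 names Tab as a low-signal key; extend this set if other keys prove equally quiet.
const LOW_SIGNAL_KEYS: ReadonlySet<string> = new Set(["Tab"]);

function shouldCaptureBodyText(action: ActionInfo): boolean {
  if (HIGH_SIGNAL.has(action.kind)) return true;
  if (action.kind === "key_press") {
    // Assumption: other keys (e.g. Enter) may submit a form, so capture by default.
    return action.key === undefined || !LOW_SIGNAL_KEYS.has(action.key);
  }
  return false; // scroll, hover
}
```

Defaulting unknown keys to "capture" trades a little latency for not missing the page change that an Enter-triggered submit would cause.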
+
+## User-Visible Outcome
+
+### When this milestone is complete, the user can:
+
+- See faster browser interactions (fewer evaluate round-trips, faster settle, faster screenshots)
+- See smaller token payloads (no screenshots on navigate by default, no body text on scroll/hover)
+- Use `browser_analyze_form` to inspect any form's fields, types, values, and validation in one call
+- Use `browser_fill_form` to fill a form by label/name/placeholder mapping in one call
+- Use `browser_find_best` with an intent to get scored element candidates
+- Use `browser_act` to execute common micro-tasks ("submit form", "close modal") in one call
+
+### Entry point / environment
+
+- Entry point: pi CLI with browser-tools extension loaded
+- Environment: local dev, any website/web app
+- Live dependencies involved: Playwright browser instance, sharp npm package
+
+## Completion Class
+
+- Contract complete means: Tests pass for shared utilities, heuristic scoring, form analysis logic, and screenshot resizing
+- Integration complete means: All 43 existing tools work with the new module structure; new tools work against real web pages
+- Operational complete means: Build succeeds; the extension loads and registers all tools
+
+## Final Integrated Acceptance
+
+To call this milestone complete, we must prove:
+
+- All existing browser tools work identically after module decomposition (build + behavioral spot-check)
+- New tools (browser_analyze_form, browser_fill_form, browser_find_best, browser_act) register and execute against a real page
+- Screenshot resizing uses sharp (no canvas evaluate calls)
+- Navigate returns no screenshot by default
+- Test suite passes
+
+## Risks and Unknowns
+
+- Module split regression risk — 43 tools sharing module-level state (browser, context, pageRegistry, logs) must all still work after decomposition
+- sharp native dependency — binary compatibility across platforms (macOS, Linux)
+- addInitScript timing — injected scripts must be available before any evaluate that references them, including on new pages and after navigation
+- Form label association complexity — real-world forms use diverse patterns (for/id, wrapping labels, aria-label, aria-labelledby, placeholder, custom components)
+
+## Existing Codebase / Prior Art
+
+- `src/resources/extensions/browser-tools/index.ts` — The monolithic file being decomposed (~5000 lines, 43 tools, all shared infrastructure)
+- `src/resources/extensions/browser-tools/core.js` — Existing shared utilities (~1000 lines: action timeline, page registry, state diffing, assertions, fingerprinting, snapshot modes, batch execution)
+- `src/resources/extensions/browser-tools/BROWSER-TOOLS-V2-PROPOSAL.md` — Design proposal; many items already implemented (assertions, batch, diff, timeline, pages, frames, traces). M002 covers remaining items: form intelligence, intent ranking, semantic actions, plus performance work not in V2 proposal.
+- `src/resources/extensions/browser-tools/package.json` — Extension package metadata
+
+> See `.gsd/DECISIONS.md` for all architectural and pattern decisions — it is an append-only register; read it during planning, append to it during execution.
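The label-association risk listed above can be made concrete as a resolution order. The sketch below works on a plain description of a field rather than live DOM nodes, so it runs without a browser. The `FieldDescriptor` shape and the priority order (aria-labelledby, aria-label, `<label for>`, wrapping label, placeholder, then `name`) are assumptions loosely modeled on the accessible-name computation, not the planned browser_analyze_form logic, and custom components are not covered.

```typescript
// Hypothetical, DOM-free description of a form field and its labelling context.
interface FieldDescriptor {
  name?: string;
  ariaLabel?: string;
  ariaLabelledByText?: string[]; // text of the elements referenced by aria-labelledby
  forLabelText?: string;         // text of a <label for="..."> pointing at the field
  wrappingLabelText?: string;    // text of an enclosing <label>
  placeholder?: string;
}

type LabelSource = "aria-labelledby" | "aria-label" | "label-for" | "wrapping-label" | "placeholder" | "name";

interface ResolvedLabel {
  label: string;
  source: LabelSource;
}

// Collapse internal whitespace (labels often contain newlines) and trim.
const clean = (text: string | undefined): string => (text ?? "").replace(/\s+/g, " ").trim();

// Returns the first non-empty candidate in priority order, or null if none.
function resolveFieldLabel(field: FieldDescriptor): ResolvedLabel | null {
  const candidates: Array<[LabelSource, string]> = [
    ["aria-labelledby", clean((field.ariaLabelledByText ?? []).join(" "))],
    ["aria-label", clean(field.ariaLabel)],
    ["label-for", clean(field.forLabelText)],
    ["wrapping-label", clean(field.wrappingLabelText)],
    ["placeholder", clean(field.placeholder)],
    ["name", clean(field.name)],
  ];
  for (const [source, label] of candidates) {
    if (label !== "") return { label, source };
  }
  return null;
}
```

Returning the `source` alongside the label is a design choice that makes an unexpected match easier to debug, for example when a placeholder is used because no real label exists.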
+
+## Relevant Requirements
+
+- R015 — Module decomposition: split index.ts into focused modules
+- R016 — Shared evaluate utilities: inject once, reference everywhere
+- R017 — Consolidated state capture: fewer evaluate calls per action
+- R018 — Conditional body text: skip for low-signal actions
+- R019 — Faster settle: short-circuit on zero mutations
+- R020 — Sharp-based screenshot resizing
+- R021 — Opt-in navigate screenshots
+- R022 — browser_analyze_form
+- R023 — browser_fill_form
+- R024 — browser_find_best
+- R025 — browser_act
+- R026 — Test coverage
+
+## Scope
+
+### In Scope
+
+- Decomposing index.ts into modules (core infrastructure, tool groups, browser-side utilities)
+- Injecting shared browser-side utilities once via addInitScript or setup evaluate
+- Consolidating captureCompactPageState + postActionSummary into fewer evaluate calls
+- Conditional body text capture based on action signal level
+- Short-circuiting settle on zero-mutation actions
+- Replacing constrainScreenshot canvas approach with sharp
+- Making screenshots opt-in on browser_navigate (default off)
+- New tool: browser_analyze_form
+- New tool: browser_fill_form
+- New tool: browser_find_best (deterministic heuristic scoring)
+- New tool: browser_act (semantic micro-actions)
+- Test coverage for new and refactored code
+
+### Out of Scope / Non-Goals
+
+- Browser reuse across sessions (deferred, skip completely)
+- LLM-powered intent resolution (deterministic heuristics only)
+- Changes to core.js beyond what's needed for the module split
+- Changes to existing tool APIs (all 43 existing tools maintain their current interface)
+
+## Technical Constraints
+
+- Must maintain backward compatibility for all 43 existing tools
+- sharp is acceptable as a native dependency
+- Browser-side injected utilities must work on any web page (no assumptions about page content)
+- addInitScript runs before page scripts; must not conflict with page globals
+- All injected browser-side code must use a namespaced global (e.g. window.__pi) to avoid collisions
+
+## Integration Points
+
+- Playwright — browser automation library, provides page.evaluate, page.addInitScript, locator API
+- sharp — Node image processing library, replaces canvas-based constrainScreenshot
+- pi extension API — registerTool, pi.on("session_shutdown"), ExtensionAPI interface
+- core.js — existing shared utilities that index.ts imports
+
+## Open Questions
+
+- Best approach for shared evaluate utilities: page.addInitScript vs one-time page.evaluate at ensureBrowser time — addInitScript survives navigation but runs before page scripts; setup evaluate is simpler but must be re-run on navigation. Likely addInitScript is correct.
+- How to handle the module-level mutable state (browser, context, pageRegistry, logs, refs) during decomposition — probably a shared state module that all tool modules import.
diff --git a/.gsd/milestones/M002/M002-ROADMAP.md b/.gsd/milestones/M002/M002-ROADMAP.md
new file mode 100644
index 000000000..3d715c294
--- /dev/null
+++ b/.gsd/milestones/M002/M002-ROADMAP.md
@@ -0,0 +1,169 @@
+# M002: Browser Tools Performance & Intelligence
+
+**Vision:** Transform browser-tools from a monolithic 5000-line file into a modular, faster, and smarter browser automation layer. Reduce per-action latency through consolidated state capture and faster settling. Replace fragile canvas screenshot resizing with sharp. Add form intelligence, intent-ranked retrieval, and semantic action tools that collapse common multi-call patterns into single tool calls.
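The faster settling described in the Vision (R019: short-circuit on zero mutations) can be sketched as a polling loop over a mutation counter. This is a hedged illustration, not `settleAfterActionAdaptive()` itself. It assumes that the counter is read through an injected async function, which stands in for `readMutationCounter()` because the real signature is not specified in this plan. It also assumes that a baseline value is captured before the action runs. `settleAfterActionSketch`, `SettleOptions`, and the timing defaults are hypothetical placeholders.

```typescript
// Hypothetical tuning knobs; the values in DEFAULT_SETTLE are placeholders.
interface SettleOptions {
  probeMs: number; // wait before the first post-action check
  quietMs: number; // counter must stay unchanged this long to count as settled
  pollMs: number;  // polling interval while mutations are still arriving
  maxMs: number;   // hard cap on total settle time
}

interface SettleResult {
  settled: boolean;
  shortCircuited: boolean;
  elapsedMs: number;
}

const DEFAULT_SETTLE: SettleOptions = { probeMs: 50, quietMs: 150, pollMs: 50, maxMs: 2000 };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function settleAfterActionSketch(
  readCounter: () => Promise<number>, // stand-in for reading the injected mutation counter
  baseline: number,                   // counter value captured before the action ran
  opts: SettleOptions = DEFAULT_SETTLE,
): Promise<SettleResult> {
  const start = Date.now();
  await sleep(opts.probeMs);
  let last = await readCounter();

  // Zero-mutation short-circuit: the action changed nothing, so skip the quiet window.
  if (last === baseline) {
    return { settled: true, shortCircuited: true, elapsedMs: Date.now() - start };
  }

  // Otherwise wait until the counter stays unchanged for quietMs, bounded by maxMs.
  let quietSince = Date.now();
  while (Date.now() - start < opts.maxMs) {
    await sleep(opts.pollMs);
    const current = await readCounter();
    if (current !== last) {
      last = current;
      quietSince = Date.now();
    } else if (Date.now() - quietSince >= opts.quietMs) {
      return { settled: true, shortCircuited: false, elapsedMs: Date.now() - start };
    }
  }
  return { settled: false, shortCircuited: false, elapsedMs: Date.now() - start };
}
```

Taking the baseline before the action, rather than when settling starts, matters here. Otherwise, mutations that the action triggers synchronously would be folded into the baseline, and a busy page would be misread as a zero-mutation action.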
+
+## Success Criteria
+
+- All 43 existing browser tools work identically after module decomposition
+- Per-action latency reduced by consolidating state capture evaluate calls
+- settleAfterActionAdaptive short-circuits on zero-mutation actions
+- constrainScreenshot uses sharp in Node, not page canvas
+- browser_navigate returns no screenshot by default
+- browser_analyze_form returns field inventory for any standard HTML form
+- browser_fill_form fills fields by label/name/placeholder mapping
+- browser_find_best returns scored candidates for semantic intents
+- browser_act executes common micro-tasks in one call
+- Test suite covers shared utilities, heuristics, and new tools
+
+## Key Risks / Unknowns
+
+- Module split regression — 43 tools sharing mutable module-level state must all survive decomposition
+- addInitScript behavior — injected utilities must be available in all evaluate contexts, survive navigation, and not collide with page globals
+- Form label association — real-world forms use diverse patterns; the heuristic mapper must handle common cases robustly
+
+## Proof Strategy
+
+- Module split regression → retire in S01 by proving build succeeds and all existing tools register/execute with the new structure
+- addInitScript behavior → retire in S01 by proving shared utilities are callable from evaluate callbacks after navigation
+- Form label association → retire in S04 by proving browser_analyze_form and browser_fill_form work on a real multi-field form
+
+## Verification Classes
+
+- Contract verification: unit tests for heuristic scoring, utility functions, form analysis logic, screenshot resizing
+- Integration verification: existing tools register and execute against a real browser page after module split
+- Operational verification: build succeeds, extension loads, sharp dependency resolves
+- UAT / human verification: spot-check new tools against real web forms and pages
+
+## Milestone Definition of Done
+
+This milestone is complete only when all are true:
+
+- index.ts is decomposed into focused modules; build succeeds
+- Shared browser-side utilities are injected once and used by buildRefSnapshot, resolveRefTarget, and new tools
+- Action tools use consolidated state capture (fewer evaluate calls than before)
+- Low-signal actions skip body text capture
+- Settle short-circuits on zero-mutation actions
+- constrainScreenshot uses sharp
+- browser_navigate defaults to no screenshot
+- browser_analyze_form, browser_fill_form, browser_find_best, and browser_act are registered and functional
+- Test suite passes
+- All 43 existing tools verified against a running page (spot-check)
+
+## Requirement Coverage
+
+- Covers: R015, R016, R017, R018, R019, R020, R021, R022, R023, R024, R025, R026
+- Partially covers: none
+- Leaves for later: R027 (browser reuse — deferred)
+- Orphan risks: none
+
+## Slices
+
+- [ ] **S01: Module decomposition and shared evaluate utilities** `risk:high` `depends:[]`
+ > After this: all 43 existing browser tools work identically with the new module structure; shared browser-side utilities (cssPath, simpleHash, isVisible, isEnabled, inferRole, accessibleName) are injected once via addInitScript and used by buildRefSnapshot and resolveRefTarget — verified by build success and spot-check against a real page.
+
+- [ ] **S02: Action pipeline performance** `risk:medium` `depends:[S01]`
+ > After this: captureCompactPageState and postActionSummary are consolidated into fewer evaluate calls per action; settleAfterActionAdaptive short-circuits on zero-mutation actions; low-signal actions (scroll, hover, Tab) skip body text capture — verified by build success and behavioral spot-check.
+
+- [ ] **S03: Screenshot pipeline** `risk:low` `depends:[S01]`
+ > After this: constrainScreenshot uses sharp instead of canvas; browser_navigate returns no screenshot by default with an explicit parameter to opt in — verified by build success and running browser_navigate to confirm no screenshot in response.
+
+- [ ] **S04: Form intelligence** `risk:medium` `depends:[S01]`
+ > After this: browser_analyze_form returns field inventory (labels, types, required, values, validation) for any form; browser_fill_form fills fields by label/name/placeholder mapping and optionally submits — verified by running both tools against a real multi-field form.
+
+- [ ] **S05: Intent-ranked retrieval and semantic actions** `risk:medium` `depends:[S01]`
+ > After this: browser_find_best returns scored candidates for intents like "submit form", "close dialog", "primary CTA"; browser_act executes common micro-tasks in one call — verified by running both tools against real pages.
+
+- [ ] **S06: Test coverage** `risk:low` `depends:[S01,S02,S03,S04,S05]`
+ > After this: test suite covers shared browser-side utilities, settle logic, screenshot resizing, form analysis heuristics, intent scoring, and semantic action resolution — verified by test runner passing.
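For S05, "deterministic heuristics only" (D011) can be pictured as additive scoring over element summaries. The sketch below assumes a hypothetical `Candidate` shape, roughly what a ref snapshot might expose. It covers only two of the intents named in the Slices list. `scoreCandidate`, `findBest`, the weights, and the keyword lists are illustrative guesses, not tuned values or the planned API.

```typescript
// Hypothetical element summary, e.g. derived from a ref snapshot entry.
interface Candidate {
  ref: string;
  role: string;    // e.g. "button", "link"
  name: string;    // accessible name
  type?: string;   // type attribute for <button>/<input>, if any
  inForm: boolean;
  inDialog: boolean;
  visible: boolean;
  enabled: boolean;
}

type Intent = "submit form" | "close dialog";

interface Scored {
  ref: string;
  score: number;
  reasons: string[];
}

// Illustrative keyword lists; a real mapper would need broader coverage.
const SUBMIT_WORDS = /\b(submit|send|save|continue|sign in|log in|next)\b/i;
const CLOSE_WORDS = /\b(close|dismiss|cancel)\b|^[×x✕]$/i;

function scoreCandidate(c: Candidate, intent: Intent): Scored {
  if (!c.visible || !c.enabled) return { ref: c.ref, score: 0, reasons: ["hidden or disabled"] };
  const reasons: string[] = [];
  let score = 0;
  if (c.role === "button") { score += 1; reasons.push("button role"); }
  if (intent === "submit form") {
    if (c.type === "submit") { score += 3; reasons.push("type=submit"); }
    if (c.inForm) { score += 2; reasons.push("inside form"); }
    if (SUBMIT_WORDS.test(c.name)) { score += 2; reasons.push("submit-like name"); }
  } else {
    if (c.inDialog) { score += 3; reasons.push("inside dialog"); }
    if (CLOSE_WORDS.test(c.name.trim())) { score += 3; reasons.push("close-like name"); }
  }
  return { ref: c.ref, score, reasons };
}

// Rank candidates; ties break on ref so the output order is deterministic.
function findBest(candidates: Candidate[], intent: Intent, limit = 3): Scored[] {
  return candidates
    .map((c) => scoreCandidate(c, intent))
    .filter((s) => s.score > 0)
    .sort((a, b) => b.score - a.score || a.ref.localeCompare(b.ref))
    .slice(0, limit);
}
```

Returning `reasons` with each score keeps the ranking debuggable. Breaking ties on `ref` keeps results stable across runs, which the S06 tests could rely on.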
+
+## Boundary Map
+
+### S01 → S02
+
+Produces:
+- `browser-tools/state.ts` — shared mutable state module (browser, context, pageRegistry, logs, refs, timeline, session state) with accessor functions
+- `browser-tools/utils.ts` — shared Node-side utilities (truncateText, artifact helpers, error formatting)
+- `browser-tools/lifecycle.ts` — ensureBrowser(), closeBrowser(), getActivePage(), getActiveTarget(), attachPageListeners()
+- `browser-tools/capture.ts` — captureCompactPageState(), postActionSummary(), constrainScreenshot(), captureErrorScreenshot(), getRecentErrors()
+- `browser-tools/settle.ts` — settleAfterActionAdaptive(), ensureMutationCounter(), readMutationCounter(), readFocusedDescriptor()
+- `browser-tools/refs.ts` — buildRefSnapshot(), resolveRefTarget(), parseRef(), ref state management
+- `browser-tools/evaluate-helpers.ts` — browser-side utility source injected via addInitScript (cssPath, simpleHash, isVisible, isEnabled, inferRole, accessibleName)
+- `browser-tools/tools/` — tool registration files grouped by category
+
+Consumes:
+- nothing (first slice)
+
+### S01 → S03
+
+Produces:
+- `browser-tools/capture.ts` — constrainScreenshot() as a separate function whose internals S03 will replace
+
+Consumes:
+- nothing (first slice)
+
+### S01 → S04
+
+Produces:
+- `browser-tools/evaluate-helpers.ts` — shared browser-side utilities that form tools will reference
+- `browser-tools/lifecycle.ts` — ensureBrowser(), getActiveTarget()
+- `browser-tools/state.ts` — action timeline, page state accessors
+
+Consumes:
+- nothing (first slice)
+
+### S01 → S05
+
+Produces:
+- `browser-tools/evaluate-helpers.ts` — shared browser-side utilities that intent tools will reference
+- `browser-tools/refs.ts` — buildRefSnapshot() for element inventory
+- `browser-tools/lifecycle.ts` — ensureBrowser(), getActiveTarget()
+
+Consumes:
+- nothing (first slice)
+
+### S02 → S06
+
+Produces:
+- Consolidated captureCompactPageState + postActionSummary logic (testable)
+- Modified settleAfterActionAdaptive with zero-mutation short-circuit (testable)
+- Action signal classification (high/low) for body text capture (testable)
+
+Consumes from S01:
+- Module structure, shared state, evaluate helpers
+
+### S03 → S06
+
+Produces:
+- sharp-based constrainScreenshot (testable with buffer fixtures)
+
+Consumes from S01:
+- capture.ts module structure
+
+### S04 → S05
+
+Produces:
+- Form analysis evaluate logic (field inventory, label mapping) that browser_act reuses for "submit form" intent
+
+Consumes from S01:
+- evaluate-helpers.ts, lifecycle.ts, state.ts
+
+### S04 → S06
+
+Produces:
+- Form label association heuristics (testable)
+- Field inventory logic (testable)
+
+Consumes from S01:
+- Module structure
+
+### S05 → S06
+
+Produces:
+- Intent scoring heuristics (testable)
+- Semantic action resolution logic (testable)
+
+Consumes from S01:
+- Module structure, refs, evaluate helpers
+
+Consumes from S04:
+- Form analysis logic for "submit form" intent
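The S02 → S06 boundary lists action signal classification (high/low) for body text capture as testable. That is easiest if the classification is a small pure function. A minimal sketch, assuming a hypothetical `ActionDescriptor` shape. The low-signal set mirrors the examples given for S02 (scroll, hover, Tab) and is not an exhaustive list. `classifyActionSignal` and `shouldCaptureBodyText` are illustrative names.

```typescript
type SignalLevel = "high" | "low";

// Hypothetical action summary passed to the post-action capture step.
interface ActionDescriptor {
  kind: "click" | "type" | "navigate" | "scroll" | "hover" | "press" | "select";
  key?: string; // for "press" actions, e.g. "Tab" or "Enter"
}

// Focus-only keys taken from the S02 examples (assumption: Shift+Tab behaves like Tab).
const LOW_SIGNAL_KEYS = new Set(["Tab", "Shift+Tab"]);

function classifyActionSignal(action: ActionDescriptor): SignalLevel {
  switch (action.kind) {
    case "scroll":
    case "hover":
      return "low";
    case "press":
      return action.key !== undefined && LOW_SIGNAL_KEYS.has(action.key) ? "low" : "high";
    default:
      return "high"; // clicks, typing, navigation, and selection may change content
  }
}

// Body text is captured only after high-signal actions.
function shouldCaptureBodyText(action: ActionDescriptor): boolean {
  return classifyActionSignal(action) === "high";
}
```

Unknown or unlisted actions default to high signal. As a result, the function errs toward capturing body text rather than missing a page change.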