chore: untrack .claude/ and .gsd/ directories, gitignore *.tgz

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Lex Christopherson 2026-03-13 10:08:17 -06:00
parent d9a9a73ab2
commit 69d8baf17b
26 changed files with 2 additions and 2052 deletions


@@ -1,223 +0,0 @@
---
description: Publish GSD updates to npm and GitHub
---
Publish GSD updates with automatic changelog generation.
<process>
<step name="check_uncommitted">
## 1. Check for Uncommitted Changes
```bash
git status --short
```
If uncommitted changes exist:
- Ask: "Uncommitted changes detected. What commit message should I use?"
- Commit with provided message
- Continue to next step
</step>
<step name="get_commits_since_tag">
## 2. Get Commits Since Last Version
```bash
LAST_TAG=$(git describe --tags --abbrev=0 2>/dev/null || echo "")
if [ -n "$LAST_TAG" ]; then
  # Quote the range so an unusual tag name cannot be word-split by the shell
  git log "${LAST_TAG}..HEAD" --oneline --no-merges
else
  echo "No previous tags found"
fi
```
Capture the commit list for changelog generation.
</step>
<step name="check_docs">
## 3. Check Documentation Currency
Review the commits captured above and check whether README.md needs updates.
**Check for commits that require README updates:**
- New commands or features
- Changed command behavior or flags
- New configuration options
- New workflows or processes
- Deprecations or removals
**Review README.md against commits:**
1. Read README.md
2. For each significant commit, verify the feature/change is documented
3. Check command tables match actual commands
4. Check configuration tables match actual options
**If updates needed:**
1. Draft specific README changes
2. Present changes for approval
3. Apply approved changes
4. Commit: `git add README.md && git commit -m "docs: update README for vX.Y.Z"`
**If no updates needed:**
- State: "README is current with all changes"
- Continue to next step
</step>
<step name="generate_changelog_draft">
## 4. Generate Changelog Entry Draft
Analyze the commits and draft a curated changelog entry.
**Grouping rules:**
- **Added** — New features, commands, capabilities
- **Changed** — Modifications to existing behavior
- **Fixed** — Bug fixes
- **Removed** — Deprecated/removed features
- **BREAKING:** prefix for breaking changes
**Writing rules:**
- Write human-readable descriptions, not raw commit messages
- Focus on user impact, not implementation details
- Group related commits into single entries
- Flag breaking changes prominently with **BREAKING:** prefix
**Example draft:**
```markdown
## [X.Y.Z] - YYYY-MM-DD
### Added
- New `/gsd:whats-new` command for version awareness
### Changed
- Improved parallel execution performance
### Fixed
- STATE.md progress bar calculation
### Removed
- **BREAKING:** Removed deprecated ISSUES.md system
```
Present the draft for review.
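The grouping and writing rules above can be sketched as a small renderer. This is only an illustration: the `ChangelogItem` type and `renderChangelogEntry` function are hypothetical names, and omitting empty sections is an assumption. Only the four section names and the **BREAKING:** prefix come from the rules above.

```typescript
// Hypothetical types and helper; not part of the GSD codebase.
type Section = "Added" | "Changed" | "Fixed" | "Removed";

interface ChangelogItem {
  section: Section;
  description: string; // human-readable, user-focused text
  breaking?: boolean;  // true adds the **BREAKING:** prefix
}

// Sections are always emitted in this fixed order.
const SECTION_ORDER: Section[] = ["Added", "Changed", "Fixed", "Removed"];

function renderChangelogEntry(version: string, date: string, items: ChangelogItem[]): string {
  const lines: string[] = [`## [${version}] - ${date}`];
  for (const section of SECTION_ORDER) {
    const inSection = items.filter((item) => item.section === section);
    if (inSection.length === 0) continue; // assumption: empty sections are omitted
    lines.push(`### ${section}`);
    for (const item of inSection) {
      const prefix = item.breaking ? "**BREAKING:** " : "";
      lines.push(`- ${prefix}${item.description}`);
    }
  }
  return lines.join("\n");
}
```

Keeping the section order in one constant means the draft always reads Added, Changed, Fixed, Removed regardless of the order in which commits were analyzed.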
</step>
<step name="checkpoint_review" type="checkpoint:human-verify">
## 5. Review Changelog Draft
**Drafted changelog entry:**
[Show the generated draft]
**Verify:**
1. Categories are correct (Added/Changed/Fixed/Removed)
2. Descriptions are clear and user-focused
3. Breaking changes are marked with **BREAKING:** prefix
4. Nothing important is missing from commits
**Resume signal:** Type "approved" or provide edits
</step>
<step name="update_changelog">
## 6. Update CHANGELOG.md
After approval:
1. **Read current CHANGELOG.md**
2. **Insert new version section** after [Unreleased] header
3. **Update version links** at bottom:
- Add new version link: `[X.Y.Z]: https://github.com/gsd-build/gsd-2/releases/tag/vX.Y.Z`
- Update [Unreleased] comparison: `[Unreleased]: https://github.com/gsd-build/gsd-2/compare/vX.Y.Z...HEAD`
```bash
# Stage changelog
git add CHANGELOG.md
git commit -m "docs: update changelog for vX.Y.Z"
```
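As a rough sketch of the edit described in this step, the following helpers build the two link lines and insert a new section below the `[Unreleased]` header. The function names are hypothetical, and the assumption that the header line starts with `## [Unreleased]` is not confirmed by this document.

```typescript
// Hypothetical helpers; the repository URL is the one used in the link templates above.
const REPO_URL = "https://github.com/gsd-build/gsd-2";

function versionLinks(version: string): { release: string; unreleased: string } {
  return {
    release: `[${version}]: ${REPO_URL}/releases/tag/v${version}`,
    unreleased: `[Unreleased]: ${REPO_URL}/compare/v${version}...HEAD`,
  };
}

function insertRelease(changelog: string, entry: string): string {
  const lines = changelog.split("\n");
  // Assumption: the header is a level-2 heading that starts with "## [Unreleased]".
  const idx = lines.findIndex((line) => line.trim().startsWith("## [Unreleased]"));
  if (idx === -1) throw new Error("No [Unreleased] header found");
  lines.splice(idx + 1, 0, "", entry); // blank line, then the new version section
  return lines.join("\n");
}
```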
</step>
<step name="version_bump">
## 7. Bump Version
Ask which version bump type:
- `npm version patch` — Bug fixes (default)
- `npm version minor` — New features
- `npm version major` — Breaking changes
- `npm version prerelease --preid=alpha` — Experimental features
```bash
npm version patch # or minor/major/prerelease
```
This creates a version commit and tag.
</step>
<step name="push_and_publish">
## 8. Push and Publish
```bash
git push && git push --tags
```
Then publish to npm:
```bash
npm publish --access public
```
Verify the publish succeeded by checking the output for the package URL.
</step>
<step name="create_github_release">
## 9. Create GitHub Release
Create a GitHub Release from the tag.
```bash
gh release create vX.Y.Z --title "vX.Y.Z" --notes "[changelog content]" --latest
```
Use the approved changelog content as the release notes.
</step>
<step name="post_discord">
## 10. Post to Discord Changelog
Post the changelog entry to the GSD Discord community.
Use the Discord MCP server:
```
discord_execute("messages.send", {
"channel_id": "1464128246290579469",
"content": "**vX.Y.Z Released** \n\n[changelog content here]\n\nInstall/upgrade: `npx gsd-pi@latest`"
})
```
Format the message with:
- Version number as header
- The approved changelog content (Added/Changed/Fixed/Removed sections)
- Install command at the bottom
</step>
<step name="report">
## 11. Report Success
```
Published vX.Y.Z
- npm: https://www.npmjs.com/package/gsd-pi
- GitHub: https://github.com/gsd-build/gsd-2/releases/tag/vX.Y.Z
```
</step>
</process>
<success_criteria>
- README.md checked against commits and updated if needed
- Changelog entry drafted from commits
- User reviewed and approved entry
- CHANGELOG.md updated and committed
- Version bumped via npm version
- Pushed to GitHub with tags
- Published to npm via `npm publish`
- GitHub Release created with `gh release create`
- Changelog posted to Discord #changelog channel
</success_criteria>

.gitignore

@@ -1,6 +1,8 @@
# ── GSD (user project artifacts — never commit) ──
.gsd/
.claude/
*.tgz
.DS_Store
Thumbs.db
*.swp


@@ -1,34 +0,0 @@
# Decisions Register
<!-- Append-only. Never edit or remove existing rows.
To reverse a decision, add a new row that supersedes it.
Read this file at the start of any planning or research phase. -->
| # | When | Scope | Decision | Choice | Rationale | Revisable? |
|---|------|-------|----------|--------|-----------|------------|
| D001 | M001 | arch | Secret collection insertion point | At `/gsd auto` entry (startAuto), not as a dispatch unit type | Keeps the state machine untouched. Collection is a one-time gate, not a repeating unit. Simpler, less risk of dispatch loop bugs. | Yes — if collection needs to happen mid-milestone |
| D002 | M001 | convention | Manifest file naming | `M00x-SECRETS.md` via existing `resolveMilestoneFile(base, mid, "SECRETS")` | Consistent with all other milestone-level files (CONTEXT, ROADMAP, RESEARCH). No new path resolver needed. | No |
| D003 | M001 | pattern | Summary screen interactivity | Read-only with auto-skip (no interactive deselection) | Matches the "walk away" philosophy. Simpler UX, fewer edge cases. User can always re-run collection. | Yes — if users request deselection |
| D004 | M001 | pattern | Guidance display placement | Same page as masked input (above the editor) | Single page per key — no extra navigation. User sees guidance while entering the value. | Yes — if terminal height constraints cause problems |
| D005 | M001 | convention | Manifest format | Markdown with H3 sections per key, bold fields, numbered guidance | Consistent with all other .gsd files. Parser and formatter already exist in files.ts. | No |
| D006 | M001 | arch | Destination inference | Reuse existing `detectDestination()` from get-secrets-from-user.ts | Simple file-presence checks (vercel.json → Vercel, convex/ → Convex, default → .env). Already proven. | Yes — if per-key destination override needed |
| D007 | M002 | arch | File structure after module split | Split index.ts into state.ts, lifecycle.ts, capture.ts, settle.ts, refs.ts, utils.ts, evaluate-helpers.ts, and tools/ directory | 5000-line monolith is unmaintainable; module boundaries enable safe changes. core.js already established the pattern. | No |
| D008 | M002 | library | Image resizing library | sharp | Fast, well-maintained, standard Node image processing. Replaces fragile canvas-based approach that depends on page context. | No |
| D009 | M002 | convention | Navigate screenshot default | Off by default, opt-in via parameter | Big token savings. Agent uses browser_screenshot explicitly when visual verification needed. | Yes — if agents consistently need screenshots on navigate |
| D010 | M002 | arch | Browser-side utility injection | page.addInitScript under window.__pi namespace | Survives navigation, available before page scripts, namespaced to avoid collisions. | Yes — if timing issues discovered |
| D011 | M002 | convention | Intent resolution approach | Deterministic heuristics only, no LLM calls | Predictable latency and cost. Scoring functions are testable and debuggable. | Yes — if heuristic coverage proves insufficient |
| D012 | M002 | convention | Browser reuse across sessions | Skip completely | Architecturally different from within-session work; user directed to exclude entirely. | No |
| D013 | M002/S01 | pattern | Mutable state accessor pattern | get/set functions for all 18 state variables, not `export let` | ES module live bindings break under jiti's CJS shim. Accessors guarantee consumers see mutations. | No |
| D014 | M002/S01 | pattern | ToolDeps interface location | Defined in state.ts alongside types it references | Keeps the dependency graph simple — tool files import state.ts for ToolDeps + types. | Yes — could move to separate types.ts if state.ts grows |
| D015 | M002/S01 | pattern | Factory pattern for lifecycle-dependent utils | createGetLivePagesSnapshot(ensureBrowser) instead of direct import | Avoids circular dependency between utils.ts and lifecycle.ts. Wired at orchestrator level. | No |
| D016 | M002/S01 | pattern | Tool file import strategy | Tool files import state accessors and core.js functions directly — ToolDeps carries only infrastructure functions needing lifecycle wiring | Keeps ToolDeps lean. State accessors are stable imports, not runtime-wired dependencies. Avoids bloating the deps interface with every utility. | Yes — if ToolDeps grows unwieldy |
| D017 | M002/S02 | pattern | Action tool signal classification | High-signal: click, type, key_press, select_option, set_checked, navigate, click_ref, fill_ref. Low-signal: scroll, hover, drag, upload_file, hover_ref. | High-signal tools produce meaningful page changes worth capturing body text for diffs. Low-signal tools don't change page content. fill_ref is high-signal because input value changes affect form state. | Yes — if new tools need reclassification |
| D018 | M002/S02 | pattern | postActionSummary retention | Keep postActionSummary in capture.ts for summary-only tools (go_back, go_forward, reload) but remove from action tools that do before/after diff | Summary-only tools don't do diffs and don't need beforeState — postActionSummary is the right abstraction for them. Action tools need consolidated capture. | Yes — could remove entirely if summary-only tools get before/after diff |
| D019 | M002/S02 | tuning | Zero-mutation settle thresholds | 60ms detection window, 30ms shortened quiet window, totalMutationsSeen === 0 required | Conservative thresholds — 60ms is enough time for any async DOM update to start, 30ms shortened window still catches late mutations. Requiring zero total mutations (not just current poll) prevents false short-circuits. | Yes — if real-world testing shows 60ms is too short for slow SPAs |
| D020 | M002/S04 | pattern | Form analysis evaluate location | Form analysis evaluate logic lives in tools/forms.ts, not extracted to evaluate-helpers.ts | Form-specific, not a shared utility. The label resolution heuristic is only used by form tools. Keeping it local avoids bloating the shared injection. | Yes — if S05 intent tools need label resolution |
| D021 | M002/S04 | pattern | Fill uses Playwright APIs, not evaluate | browser_fill_form uses Playwright locator.fill()/selectOption()/setChecked() instead of page.evaluate() value setting | Playwright APIs trigger proper input/change events and handle framework-specific reactivity (React, Vue). Direct value setting via evaluate skips event dispatch and breaks reactive frameworks. | No |
| D022 | M002/S04 | pattern | Fill field matching priority | Label (exact → case-insensitive) → name → placeholder → aria-label | Label is the most human-readable identifier. Name is the most reliable programmatic identifier. Placeholder and aria-label are fallbacks. Exact match before fuzzy prevents wrong-field fills. | Yes — if real-world usage shows a different priority works better |
| D023 | M002/S05 | pattern | Intent scoring model | 4 orthogonal dimensions per intent, each 0-1, summed and clamped | Consistent scoring structure across all 8 intents. Makes scoring testable and debuggable — each dimension has a named reason. 4 dimensions balance discrimination vs complexity. | Yes — could add/remove dimensions per intent if real-world usage shows imbalance |
| D024 | M002/S05 | pattern | search_field action type | Focus instead of click for search_field intent in browser_act | Search fields need keyboard focus for typing, not a click that might submit or toggle. Focus is the semantically correct action. Other intents use click. | Yes — if focus proves unreliable on specific input implementations |
| D025 | M002/S06 | pattern | Test import strategy for browser-tools | jiti CJS imports instead of ESM resolve-ts hook | The resolve-ts ESM hook breaks on core.js (plain .js file imported by TS modules). jiti handles mixed .ts/.js imports correctly from a .cjs test file. | No |
| D026 | M002/S06 | pattern | Testing module-private functions | Source extraction via readFileSync + brace-match + strip types + eval | Avoids exporting test-only APIs from production modules. Fragile to refactors but tests fail clearly when extraction breaks. Acceptable tradeoff for test code. | Yes — if private functions get exported for other reasons |
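As an illustration of the accessor pattern recorded in D013, a minimal sketch might look like the following. The variable `activePageId` and its accessors are hypothetical stand-ins for the 18 state variables the register mentions; the actual names in state.ts are not shown here.

```typescript
// Hypothetical module-private state; consumers never import the variable itself.
let activePageId: string | null = null;

// Each read goes through a function call, so the caller always sees the
// current value even when the module loader copies exports at import time.
function getActivePageId(): string | null {
  return activePageId;
}

function setActivePageId(id: string | null): void {
  activePageId = id;
}
```

The trade-off is a little boilerplate per variable in exchange for not depending on how the loader treats `export let` bindings.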


@@ -1,41 +0,0 @@
# Project
## What This Is
A pi coding agent extension (GSD — "Get Stuff Done") that provides structured planning, auto-mode execution, and project management for autonomous coding sessions. Includes proactive secret management and browser automation tools for UI verification.
## Core Value
Auto-mode runs from start to finish without blocking. The agent has hands, eyes, and judgment in the browser — fast, token-efficient, and reliable.
## Current State
The GSD extension is fully functional with:
- Milestone/slice/task planning hierarchy
- Auto-mode state machine with fresh-session-per-unit dispatch
- Guided `/gsd` wizard flow
- `secure_env_collect` tool with masked TUI input, multi-destination write support, guidance display, and summary screen
- Proactive secret management: planning prompts forecast secrets, manifests persist them, auto-mode collects them before first dispatch
- Browser-tools extension with 47 registered tools covering navigation, interaction, inspection, verification, tracing, debugging, form intelligence (browser_analyze_form, browser_fill_form), and intent-ranked retrieval and semantic actions (browser_find_best, browser_act)
- Browser-tools `core.js` with shared utilities for action timeline, page registry, state diffing, assertions, fingerprinting
## Architecture / Key Patterns
- **Extension model**: pi extensions register tools, commands, hooks via `ExtensionAPI`
- **State machine**: `auto.ts` drives `dispatchNextUnit()` which reads disk state and dispatches fresh sessions
- **Secrets gate**: `startAuto()` checks `getManifestStatus()` before first dispatch
- **Disk-driven state**: `.gsd/` files are the source of truth, `STATE.md` is derived cache
- **File parsing**: `files.ts` has markdown parsers for all GSD file types
- **Browser-tools**: Modular structure — slim `index.ts` orchestrator, 8 focused infrastructure modules (state.ts, utils.ts, evaluate-helpers.ts, lifecycle.ts, capture.ts, settle.ts, refs.ts), 11 categorized tool files under `tools/` (including forms.ts, intent.ts), shared infrastructure in `core.js` (~1000 lines). Browser-side utilities injected once via `addInitScript` under `window.__pi` namespace. Uses Playwright for browser control. Accessibility-first state representation, deterministic versioned refs, adaptive DOM settling, compact post-action summaries. Form tools use Playwright locator APIs for type-aware filling with structured result reporting. Intent tools use deterministic 4-dimension heuristic scoring for element retrieval and one-call semantic actions.
- **Prompt templates**: `prompts/` directory with mustache-like `{{var}}` substitution
- **TUI components**: `@gsd/pi-tui` provides `Editor`, `Text`, key handling, themes
- **Branch-per-slice**: git branches isolate slice work, squash-merged to main on completion
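For the mustache-like `{{var}}` substitution mentioned under prompt templates, a minimal sketch could look like this. The function name `renderTemplate`, the tolerance for whitespace inside the braces, and the choice to leave unknown placeholders untouched are all assumptions rather than the extension's documented behavior.

```typescript
// Hypothetical helper: replaces {{name}} with vars[name] when that key exists.
function renderTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, name: string) =>
    // Own-property check avoids picking up inherited keys such as "constructor".
    Object.prototype.hasOwnProperty.call(vars, name) ? String(vars[name]) : match,
  );
}
```

Leaving unmatched placeholders in the output makes a missing variable visible in the rendered prompt instead of silently producing an empty string.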
## Capability Contract
See `.gsd/REQUIREMENTS.md` for the explicit capability contract, requirement status, and coverage mapping.
## Milestone Sequence
- [x] M001: Proactive Secret Management — Front-loaded API key collection into planning so auto-mode runs uninterrupted (10 requirements validated)
- [x] M002: Browser Tools Performance & Intelligence — Module decomposition, action pipeline optimization, sharp-based screenshots, form intelligence, intent-ranked retrieval, semantic actions, 108-test suite (12 requirements validated)


@@ -1,360 +0,0 @@
# Requirements
This file is the explicit capability and coverage contract for the project.
## Active
### R020 — Sharp-based screenshot resizing
- Class: core-capability
- Status: validated
- Description: constrainScreenshot uses the sharp Node library for image resizing instead of bouncing buffers through page canvas context. Faster, no page dependency.
- Why it matters: The current approach sends the full screenshot buffer to the page as base64, creates an Image, draws it to a canvas, exports the result, then sends it back. This is slow and fragile (it depends on a working canvas context).
- Source: user
- Primary owning slice: M002/S03
- Supporting slices: M002/S01
- Validation: constrainScreenshot uses sharp(buffer).metadata() and sharp(buffer).resize(). Zero page.evaluate calls in capture.ts. sharp added to root dependencies and extension peerDependencies. Build passes.
- Notes: sharp added as a dependency. API: sharp(buffer).resize(w, h, { fit: 'inside' }).jpeg({ quality }).toBuffer()
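The size arithmetic behind an aspect-preserving, fit-inside resize can be sketched without any image library. This is not sharp's implementation, and `fitInside` is a hypothetical helper. It also assumes images already within bounds are left alone, which may or may not match how constrainScreenshot configures sharp.

```typescript
interface Size {
  width: number;
  height: number;
}

// Hypothetical helper: largest size that fits in the box while keeping the aspect ratio.
function fitInside(source: Size, maxWidth: number, maxHeight: number): Size {
  // The smaller ratio wins so both dimensions fit; the cap at 1 means no upscaling (assumption).
  const scale = Math.min(maxWidth / source.width, maxHeight / source.height, 1);
  return {
    width: Math.max(1, Math.round(source.width * scale)),
    height: Math.max(1, Math.round(source.height * scale)),
  };
}
```

For example, a 3000x1500 capture constrained to a 1568x1568 box is limited by its width, so the result is 1568x784.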
### R021 — Opt-in screenshots on navigate
- Class: core-capability
- Status: validated
- Description: browser_navigate does not capture or return a screenshot by default. An explicit parameter (e.g. screenshot: true) opts in to screenshot capture.
- Why it matters: The current always-inline screenshot is a large base64 payload in every navigation response. For many verifications the compact page summary + diff is sufficient. Significant token savings.
- Source: user
- Primary owning slice: M002/S03
- Supporting slices: none
- Validation: browser_navigate has screenshot: Type.Optional(Type.Boolean({ default: false })) parameter. Screenshot capture gated with if (params.screenshot). browser_reload unchanged. Build passes.
- Notes: Default is off. The agent can still use browser_screenshot explicitly when visual verification is needed.
### R022 — Form analysis tool (browser_analyze_form)
- Class: core-capability
- Status: validated
- Description: A browser_analyze_form tool that takes a form selector and returns field inventory including: field labels, names, types, required status, current values, validation state, and submit controls.
- Why it matters: A huge percentage of browser tasks are form tasks. Currently the agent needs 3-8 tool calls to analyze a form. This collapses that into one call.
- Source: user
- Primary owning slice: M002/S04
- Supporting slices: M002/S01
- Validation: browser_analyze_form registered (45 tools total), 7-level label resolution (aria-labelledby → aria-label → label[for] → wrapping label → placeholder → title → humanized name), form auto-detection, fieldset grouping, submit button discovery. Verified end-to-end against 12-field test form with diverse label associations. Build passes.
- Notes: Must handle label association via for/id, wrapping label, aria-label, aria-labelledby, and placeholder.
### R023 — Form fill tool (browser_fill_form)
- Class: core-capability
- Status: validated
- Description: A browser_fill_form tool that takes a form selector, a values object mapping field identifiers to values, and an optional submit flag. Maps labels/names/placeholders to inputs and fills them.
- Why it matters: Filling a login form currently takes 3-5 tool calls (find inputs, type email, type password, click submit). This collapses it to one call.
- Source: user
- Primary owning slice: M002/S04
- Supporting slices: M002/S01
- Validation: browser_fill_form registered (45 tools total), 5-strategy field resolution (label exact/loose → name → placeholder → aria-label), type-aware filling via Playwright APIs (fill/selectOption/setChecked), file/hidden skip, ambiguity detection, optional submit, post-fill validation. Verified end-to-end: 10 fields filled correctly, file input skipped, unmatched key reported. Build passes.
- Notes: Returns matched fields, unmatched values, fields skipped, and validation state after fill.
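The resolution order in the validation note (label exact, label case-insensitive, name, placeholder, aria-label) can be sketched as an ordered list of predicates. The `FieldInfo` shape and `resolveField` function are hypothetical, and this sketch simply takes the first match per strategy rather than reproducing the ambiguity detection the tool is said to perform.

```typescript
// Hypothetical field description; the real tool reads these from the DOM.
interface FieldInfo {
  label?: string;
  name?: string;
  placeholder?: string;
  ariaLabel?: string;
}

type MatchStrategy = "label-exact" | "label-loose" | "name" | "placeholder" | "aria-label";

function resolveField(key: string, fields: FieldInfo[]): { index: number; strategy: MatchStrategy } | null {
  const lower = key.toLowerCase();
  // Strategies are tried in priority order; the first one with a hit wins.
  const strategies: Array<[MatchStrategy, (f: FieldInfo) => boolean]> = [
    ["label-exact", (f) => f.label === key],
    ["label-loose", (f) => f.label !== undefined && f.label.toLowerCase() === lower],
    ["name", (f) => f.name === key],
    ["placeholder", (f) => f.placeholder === key],
    ["aria-label", (f) => f.ariaLabel === key],
  ];
  for (const [strategy, matches] of strategies) {
    const index = fields.findIndex(matches);
    if (index !== -1) return { index, strategy };
  }
  return null; // the caller would report this key as unmatched
}
```

Running the exact-label pass over every field before any looser pass is what keeps a case-insensitive hit on one field from shadowing an exact hit on another.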
### R024 — Intent-ranked element retrieval (browser_find_best)
- Class: core-capability
- Status: validated
- Description: A browser_find_best tool that takes an intent string (e.g. "submit form", "close dialog", "primary CTA") and returns scored candidates with reasons, using deterministic heuristic ranking.
- Why it matters: The agent frequently needs to answer "which button submits this form?" Currently it calls browser_find, gets 15 candidates, and then reasons about which one to use. A heuristic ranker cuts a round trip and reduces reasoning tokens.
- Source: user
- Primary owning slice: M002/S05
- Supporting slices: M002/S01
- Validation: 8 intents implemented with 4-dimension scoring (submit_form, close_dialog, primary_cta, search_field, next_step, dismiss, auth_action, back_navigation). Each returns up to 5 candidates sorted by score with CSS selectors and reason strings. Intent normalization accepts underscores/spaces/hyphens. Verified via Playwright tests against real HTML pages with differentiated rankings. Build passes, tool count = 47.
- Notes: Deterministic heuristics only. No hidden LLM calls.
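A minimal sketch of the intent normalization and top-5 ranking described in the validation note follows. The eight intent names are taken from that note; everything else is assumed for illustration, including the helper names, the case-insensitive matching, and the choice to clamp each dimension to 0-1 before summing.

```typescript
const INTENTS = [
  "submit_form", "close_dialog", "primary_cta", "search_field",
  "next_step", "dismiss", "auth_action", "back_navigation",
] as const;
type Intent = (typeof INTENTS)[number];

// Accepts spaces, hyphens, or underscores between words (case-insensitivity is an assumption).
function normalizeIntent(raw: string): Intent | null {
  const key = raw.trim().toLowerCase().replace(/[\s-]+/g, "_");
  return (INTENTS as readonly string[]).includes(key) ? (key as Intent) : null;
}

// Hypothetical candidate shape: a selector plus four dimension scores.
interface ScoredCandidate {
  selector: string;
  dimensions: [number, number, number, number];
}

const clamp01 = (n: number): number => Math.min(1, Math.max(0, n));

// Assumption: each dimension is clamped to 0-1 and the four are summed (range 0-4).
function totalScore(candidate: ScoredCandidate): number {
  return candidate.dimensions.reduce((sum, d) => sum + clamp01(d), 0);
}

function topCandidates(candidates: ScoredCandidate[], limit = 5): ScoredCandidate[] {
  return [...candidates].sort((a, b) => totalScore(b) - totalScore(a)).slice(0, limit);
}
```

Because the ranking is a pure function of the dimension scores, the same page state always yields the same ordering, which is what makes this kind of heuristic testable.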
### R025 — Semantic action tool (browser_act)
- Class: core-capability
- Status: validated
- Description: A browser_act tool that takes a semantic intent (e.g. "submit the current form", "close the active modal", "click the primary CTA") and executes the obvious action sequence internally.
- Why it matters: Each of these common micro-tasks currently takes 2-4 tool calls. browser_act collapses them into one.
- Source: user
- Primary owning slice: M002/S05
- Supporting slices: M002/S04
- Validation: Resolves top candidate via same scoring engine as browser_find_best. Executes via Playwright locator.click() with getByRole fallback (focus for search_field). Settles via settleAfterActionAdaptive, returns before/after diff. Zero-candidate returns isError:true without throwing. Verified via Playwright test scripts. Build passes, tool count = 47.
- Notes: Builds on browser_find_best for element selection. Bounded — does not loop or retry.
### R026 — Test coverage for new and refactored code
- Class: quality-attribute
- Status: validated
- Description: Test suite covers shared browser-side utilities, settle logic, screenshot resizing, form tools, and intent ranking. Tests verify correctness and guard against regressions.
- Why it matters: A 5000-line file with zero tests is fragile. The refactoring and new features need regression protection.
- Source: user
- Primary owning slice: M002/S06
- Supporting slices: all M002 slices
- Validation: 108 tests (63 unit + 45 integration) passing via `npm run test:browser-tools`. Unit tests cover pure functions, state accessors, EVALUATE_HELPERS_SOURCE validity, constrainScreenshot with sharp. Integration tests cover window.__pi utilities, intent scoring differentiation, and form label resolution — all via Playwright against real DOM.
- Notes: Test what's unit-testable without a running browser (heuristics, scoring, utility functions). Integration tests with Playwright for tools that need a page.
## Validated
### R017 — Consolidated state capture per action
- Class: core-capability
- Status: validated
- Description: The before-state capture, after-state capture, post-action summary, and recent-error check are consolidated into fewer page.evaluate calls per action. Target: ~50-100ms savings per action.
- Why it matters: Every action tool currently runs 3-4 separate page.evaluate calls for state capture. Consolidating them reduces latency on every single browser interaction.
- Source: user
- Primary owning slice: M002/S02
- Supporting slices: M002/S01
- Validation: postActionSummary eliminated from action tools (grep returns 0 in interaction.ts), countOpenDialogs removed from ToolDeps, high-signal tools use single captureCompactPageState + formatCompactStateSummary pattern. Build passes.
- Notes: captureCompactPageState and postActionSummary can likely be merged into a single evaluate.
### R018 — Conditional body text capture
- Class: core-capability
- Status: validated
- Description: Body text capture (includeBodyText: true) is skipped for low-signal actions (scroll, hover, Tab key press) and enabled for high-signal actions (navigate, click, type, submit).
- Why it matters: Capturing 4000 chars of body text on every scroll or hover is wasteful. Conditional capture reduces evaluate overhead for frequent low-signal actions.
- Source: user
- Primary owning slice: M002/S02
- Supporting slices: none
- Validation: grep shows explicit includeBodyText: true for 5 high-signal tools and includeBodyText: false for 4 low-signal tools in interaction.ts. Classification codified in D017. Build passes.
- Notes: Requires classifying each tool as high-signal or low-signal.
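The classification this requirement depends on can be sketched as a lookup. The tool names below are taken from the D017 entry in the decisions register; the helper name `shouldIncludeBodyText` and the fallback for unclassified tools are assumptions.

```typescript
// Tool names as listed in D017.
const HIGH_SIGNAL = new Set([
  "click", "type", "key_press", "select_option",
  "set_checked", "navigate", "click_ref", "fill_ref",
]);
const LOW_SIGNAL = new Set(["scroll", "hover", "drag", "upload_file", "hover_ref"]);

// Hypothetical helper deciding the includeBodyText flag for a tool.
function shouldIncludeBodyText(tool: string): boolean {
  if (HIGH_SIGNAL.has(tool)) return true;
  if (LOW_SIGNAL.has(tool)) return false;
  // Assumption: tools not yet classified skip body text until someone classifies them.
  return false;
}
```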
### R019 — Faster settle on zero mutations
- Class: core-capability
- Status: validated
- Description: settleAfterActionAdaptive short-circuits with a smaller quiet window when no mutation observer fires in the first 60ms. Target: ~40-80ms savings on zero-mutation actions.
- Why it matters: Many SPA interactions produce no DOM changes. The current settle logic always waits the full quiet window regardless. Short-circuiting saves time on the most common case.
- Source: user
- Primary owning slice: M002/S02
- Supporting slices: none
- Validation: zero_mutation_shortcut settle reason in state.ts type union and settle.ts return path. Combined readSettleState() poll evaluate. 60ms/30ms thresholds codified in D019. Build passes.
- Notes: Track whether any mutation fired at all; if zero after 60ms, use a shorter quiet window.
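The short-circuit rule can be sketched as a pure function over the poll state. The 60 ms and 30 ms values come from D019; the function name, its parameters, and the idea of passing in the normal quiet window are assumptions, since the default window is not stated in this document.

```typescript
const ZERO_MUTATION_DETECTION_MS = 60; // from D019
const SHORT_QUIET_WINDOW_MS = 30;      // from D019

// Hypothetical helper: picks the quiet window to wait for at the current poll.
function quietWindowFor(elapsedMs: number, totalMutationsSeen: number, defaultQuietMs: number): number {
  // The shortcut needs zero mutations over the whole settle, not just the latest poll.
  const zeroMutation = elapsedMs >= ZERO_MUTATION_DETECTION_MS && totalMutationsSeen === 0;
  return zeroMutation ? Math.min(SHORT_QUIET_WINDOW_MS, defaultQuietMs) : defaultQuietMs;
}
```

Counting total mutations rather than mutations since the last poll is what prevents a brief lull after an early DOM change from triggering the shortcut.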
### R015 — Module decomposition of browser-tools
- Class: quality-attribute
- Status: validated
- Description: The monolithic browser-tools index.ts (~5000 lines) is split into focused modules: shared infrastructure, tool groups, and browser-side utilities. All 43 existing tools continue to work identically.
- Why it matters: A 5000-line file is unmaintainable and makes targeted changes risky. Module boundaries enable safe refactoring and new tool development.
- Source: user
- Primary owning slice: M002/S01
- Supporting slices: none
- Validation: Extension loads via jiti, 43 tools register, browser navigate/snapshot/click work against real page, index.ts is 47-line orchestrator with zero registerTool calls, 9 tool files under tools/.
- Notes: core.js already exists with ~1000 lines of shared utilities. The split extends this pattern.
### R016 — Shared browser-side evaluate utilities
- Class: quality-attribute
- Status: validated
- Description: Common functions duplicated across page.evaluate boundaries (cssPath, simpleHash, isVisible, isEnabled, inferRole, accessibleName) are injected once and referenced from all evaluate callbacks.
- Why it matters: Currently buildRefSnapshot and resolveRefTarget each redeclare ~100 lines of identical utility code. Deduplication reduces payload size, improves maintainability, and ensures consistency.
- Source: user
- Primary owning slice: M002/S01
- Supporting slices: none
- Validation: window.__pi contains all 9 functions, survives navigation, refs.ts has zero inline redeclarations, close/reopen re-injects via addInitScript correctly.
- Notes: Uses context.addInitScript under window.__pi namespace.
### R001 — Secret forecasting during milestone planning
- Class: core-capability
- Status: validated
- Description: When a milestone is planned, the LLM analyzes slices for external service dependencies and writes a secrets manifest listing every predicted API key with setup guidance.
- Why it matters: Without forecasting, auto-mode discovers missing keys mid-execution and blocks for hours waiting for user input.
- Source: user
- Primary owning slice: M001/S01
- Supporting slices: none
- Validation: plan-milestone.md Secret Forecasting section (line 62) instructs LLM to write manifest. Parser round-trip tested in parsers.test.ts.
- Notes: The plan-milestone prompt has forecasting instructions. The manifest format and parser are implemented and tested.
### R002 — Secrets manifest persisted in .gsd/
- Class: continuity
- Status: validated
- Description: The secrets manifest is a durable markdown file at `.gsd/milestones/M00x/M00x-SECRETS.md` that survives session boundaries and can be re-read by any future unit.
- Why it matters: Collection may happen in a different session than planning. The manifest must persist on disk.
- Source: user
- Primary owning slice: M001/S01
- Supporting slices: none
- Validation: parseSecretsManifest/formatSecretsManifest round-trip tested (parsers.test.ts), resolveMilestoneFile(base, mid, "SECRETS") resolves path.
- Notes: Parser/formatter implemented in files.ts. Template exists at templates/secrets-manifest.md.
### R003 — Step-by-step guidance per key
- Class: primary-user-loop
- Status: validated
- Description: Each secret in the manifest includes numbered steps for obtaining the key (navigate to dashboard → create project → generate key → copy), a dashboard URL, and a format hint.
- Why it matters: Users shouldn't have to figure out where to find each key. The guidance makes collection self-service.
- Source: user
- Primary owning slice: M001/S02
- Supporting slices: M001/S01
- Validation: collectOneSecret renders numbered dim-styled guidance steps with wrapping (collect-from-manifest.test.ts tests 6-8).
- Notes: Guidance quality is LLM-dependent and best-effort.
### R004 — Summary screen before collection
- Class: primary-user-loop
- Status: validated
- Description: Before collecting secrets one-by-one, show a read-only summary screen listing all needed keys with their status (pending / already set / skipped). Auto-skip keys that already exist in the environment.
- Why it matters: The user needs to see the full picture before entering keys. Already-set keys should not require re-entry.
- Source: user
- Primary owning slice: M001/S02
- Supporting slices: none
- Validation: showSecretsSummary() renders read-only ctx.ui.custom screen with status indicators via makeUI().progressItem() (collect-from-manifest.test.ts tests 4-5).
- Notes: Read-only with auto-skip — no interactive deselection.
### R005 — Existing key detection and silent skip
- Class: primary-user-loop
- Status: validated
- Description: Before prompting for a key, check `.env` and `process.env`. If the key already exists, mark it as "already set" in the summary and skip collection.
- Why it matters: Users shouldn't re-enter keys they've already configured. Prevents frustration and errors.
- Source: user
- Primary owning slice: M001/S02
- Supporting slices: none
- Validation: getManifestStatus cross-references checkExistingEnvKeys, categorizes env-present keys as existing (manifest-status.test.ts tests 4,7). collectSecretsFromManifest skips them (collect-from-manifest.test.ts tests 1-2).
- Notes: `checkExistingEnvKeys()` implemented in get-secrets-from-user.ts.
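
The detection rule in R005 can be pictured with a small sketch. This is not the `checkExistingEnvKeys()` implementation; the function names, the `.env` parsing rules, and the choice to treat any defined variable as "already set" are assumptions made for illustration.

```typescript
// Hypothetical sketch: names and parsing rules are assumptions, not the
// actual checkExistingEnvKeys() code in get-secrets-from-user.ts.

/** Extract variable names from the text of a .env file. */
function parseDotenvKeys(dotenvText: string): Set<string> {
  const keys = new Set<string>();
  for (const rawLine of dotenvText.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines and comments.
    if (line === "" || line.startsWith("#")) continue;
    // Accept an optional "export " prefix, then NAME=value.
    const match = /^(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=/.exec(line);
    const name = match?.[1];
    if (name) keys.add(name);
  }
  return keys;
}

/** Return the wanted keys that are already defined in .env text or in the environment. */
function findExistingKeys(
  wanted: string[],
  dotenvText: string,
  env: Record<string, string | undefined>,
): string[] {
  const fileKeys = parseDotenvKeys(dotenvText);
  return wanted.filter((key) => fileKeys.has(key) || env[key] !== undefined);
}
```

In a Node process the last two arguments could be the contents of `.env` and `process.env`; keys returned by `findExistingKeys` would be the ones marked "already set" and skipped.
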
### R006 — Smart destination detection
- Class: integration
- Status: validated
- Description: Automatically detect whether secrets should go to .env, Vercel, or Convex based on project file presence (vercel.json → Vercel, convex/ dir → Convex, default → .env).
- Why it matters: Users shouldn't have to specify the destination manually. The system should do the right thing.
- Source: user
- Primary owning slice: M001/S02
- Supporting slices: none
- Validation: collectSecretsFromManifest calls detectDestination() for destination inference. applySecrets() routes to dotenv/vercel/convex accordingly.
- Notes: `detectDestination()` implemented in get-secrets-from-user.ts.
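
The file-presence rule described in R006 might look roughly like the sketch below. It is not the real `detectDestination()`: the function name, the injected `exists` predicate, and the precedence when both markers are present (Vercel first, following the order in the description) are assumptions.

```typescript
type DestinationSketch = "dotenv" | "vercel" | "convex";

/**
 * Hypothetical sketch of destination inference from project files.
 * `exists` is injected so the rule can be exercised without touching disk.
 */
function detectDestinationSketch(
  exists: (relativePath: string) => boolean,
): DestinationSketch {
  // Assumed precedence: vercel.json wins when both markers are present.
  if (exists("vercel.json")) return "vercel";
  if (exists("convex")) return "convex";
  // Default when no platform marker is found.
  return "dotenv";
}
```

In Node, `exists` could be `(p) => fs.existsSync(path.join(projectRoot, p))`; a stricter version would also confirm that `convex` is a directory.
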
### R007 — Auto-mode collection at entry point
- Class: core-capability
- Status: validated
- Description: When the user runs `/gsd auto`, check for a secrets manifest with pending keys. If found, collect them before dispatching the first slice. Collection happens once at the entry point, not as a dispatch unit.
- Why it matters: This is the primary integration point — auto-mode must not start execution with uncollected secrets.
- Source: user
- Primary owning slice: M001/S03
- Supporting slices: M001/S01, M001/S02
- Validation: startAuto() secrets gate at auto.ts:479. auto-secrets-gate.test.ts — 3/3 pass covering null manifest, pending keys, and no-pending-keys paths.
- Notes: Collection at entry point (startAuto), not as a separate unit type in dispatchNextUnit. D001 satisfied.
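
The three paths named in the validation note (null manifest, pending keys, no pending keys) suggest a gate shaped something like the following. This is a simplified, synchronous sketch; the real gate in `startAuto()` is not shown in this document, and every name below is hypothetical.

```typescript
interface ManifestStatusSketch {
  pending: string[];
  existing: string[];
}

type GateOutcomeSketch = "no-manifest" | "nothing-pending" | "collected";

/**
 * Hypothetical entry-point gate. `loadStatus` returns null when no manifest
 * exists; `collect` stands in for the interactive collection step.
 */
function secretsGateSketch(
  loadStatus: () => ManifestStatusSketch | null,
  collect: (pendingKeys: string[]) => void,
): GateOutcomeSketch {
  const manifestStatus = loadStatus();
  if (manifestStatus === null) return "no-manifest";
  if (manifestStatus.pending.length === 0) return "nothing-pending";
  // Collect once, before any slice is dispatched.
  collect(manifestStatus.pending);
  return "collected";
}
```

Because the gate runs once before dispatch, collection never has to be modeled as its own unit type.
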
### R008 — Guided /gsd wizard integration
- Class: core-capability
- Status: validated
- Description: After milestone planning in the guided `/gsd` flow, trigger secret collection if a manifest exists with pending keys.
- Why it matters: Users who plan via the wizard should also get prompted for secrets before auto-mode begins.
- Source: user
- Primary owning slice: M001/S03
- Supporting slices: M001/S01, M001/S02
- Validation: guided-flow.ts calls startAuto() directly (lines 52, 486, 647, 794) — all guided flow paths that start auto-mode inherit the secrets gate.
- Notes: The guided flow dispatches to startAuto after planning. Collection is inherited via the gate.
### R009 — Planning prompts instruct LLM to forecast secrets
- Class: integration
- Status: validated
- Description: The plan-milestone prompt template includes instructions for the LLM to analyze slices for external service dependencies and write the secrets manifest.
- Why it matters: Without prompt instructions, the LLM won't know to forecast secrets.
- Source: user
- Primary owning slice: M001/S01
- Supporting slices: none
- Validation: plan-milestone.md has Secret Forecasting section at line 62 with instructions to write {{secretsOutputPath}} with H3 sections per key.
- Notes: Implemented in plan-milestone.md.
### R010 — secure_env_collect enhanced with guidance display
- Class: primary-user-loop
- Status: validated
- Description: The secure_env_collect TUI renders multi-line guidance steps above the masked input field on the same page, so the user sees setup instructions while entering the key.
- Why it matters: Without visible guidance, the user has to find keys on their own despite the LLM having generated instructions.
- Source: user
- Primary owning slice: M001/S02
- Supporting slices: none
- Validation: collectOneSecret accepts guidance parameter, renders numbered dim-styled lines with wrapTextWithAnsi above masked input (collect-from-manifest.test.ts tests 6-8).
- Notes: The guidance field is rendered in collectOneSecret().
## Deferred
### R011 — Multi-milestone secret forecasting
- Class: core-capability
- Status: deferred
- Description: Forecast secrets across all planned milestones, not just the active one.
- Why it matters: Would provide a complete picture of all secrets needed for the project.
- Source: user
- Primary owning slice: none
- Supporting slices: none
- Validation: unmapped
- Notes: Deferred — single-milestone forecasting is sufficient for now.
### R012 — Secret rotation reminders
- Class: operability
- Status: deferred
- Description: Track secret age and remind users when keys may need rotation.
- Why it matters: Security best practice, but not essential for the core workflow.
- Source: user
- Primary owning slice: none
- Supporting slices: none
- Validation: unmapped
- Notes: Deferred — out of scope for initial release.
### R027 — Browser reuse across sessions
- Class: core-capability
- Status: deferred
- Description: Keep a warm browser instance across rapid successive agent contexts (e.g. GSD auto-mode cycling through tasks) to avoid ~2-3s Chrome cold-start per session.
- Why it matters: Would eliminate Chrome launch latency in auto-mode. But requires inter-process coordination and is architecturally different from within-session optimizations.
- Source: user
- Primary owning slice: none
- Supporting slices: none
- Validation: unmapped
- Notes: Deferred — skip completely per user direction.
## Out of Scope
### R013 — Curated service knowledge base
- Class: anti-feature
- Status: out-of-scope
- Description: A static database of known services with pre-written guidance for each API key.
- Why it matters: Prevents scope creep. LLM-generated guidance is sufficient and stays current without maintenance.
- Source: user
- Primary owning slice: none
- Supporting slices: none
- Validation: n/a
- Notes: LLM generates guidance dynamically. A static KB would become stale.
### R014 — Just-in-time collection enhancement
- Class: anti-feature
- Status: out-of-scope
- Description: Detect missing secrets during task execution and collect them inline.
- Why it matters: Prevents scope confusion. The whole point of M001 is proactive collection, not reactive.
- Source: user
- Primary owning slice: none
- Supporting slices: none
- Validation: n/a
- Notes: Existing secure_env_collect already handles reactive collection. This milestone is about proactive.
### R028 — LLM-powered intent resolution
- Class: anti-feature
- Status: out-of-scope
- Description: Using hidden LLM calls inside browser_find_best or browser_act for intent resolution.
- Why it matters: Prevents unpredictable latency and cost. Intent resolution must be deterministic heuristics only.
- Source: inferred
- Primary owning slice: none
- Supporting slices: none
- Validation: n/a
- Notes: browser_find_best and browser_act use scoring heuristics, not LLM inference.
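
To make "scoring heuristics, not LLM inference" concrete, here is a minimal sketch of deterministic scoring for a "submit form" intent. The candidate shape, the weights, and the keyword list are illustrative assumptions and do not reproduce the scoring used by `browser_find_best`.

```typescript
interface CandidateSketch {
  tag: string;         // lower-case tag name, e.g. "button"
  type?: string;       // value of the type attribute, if any
  text: string;        // visible text or accessible name
  insideForm: boolean; // whether the element sits inside a <form>
}

/** Score one candidate for a "submit form" intent; higher is better. */
function scoreSubmitCandidate(c: CandidateSketch): number {
  let score = 0;
  if (c.type === "submit") score += 5;
  if (c.tag === "button") score += 2;
  if (c.insideForm) score += 2;
  if (/\b(submit|save|send|continue|sign in|log in)\b/i.test(c.text)) score += 3;
  return score;
}

/** Rank candidates by score, best first. */
function rankSubmitCandidates(
  candidates: CandidateSketch[],
): Array<{ candidate: CandidateSketch; score: number }> {
  return candidates
    .map((candidate, index) => ({ candidate, score: scoreSubmitCandidate(candidate), index }))
    // Original order breaks ties, so equal inputs always rank the same way.
    .sort((a, b) => b.score - a.score || a.index - b.index)
    .map(({ candidate, score }) => ({ candidate, score }));
}
```

The same input always produces the same ranking, which is the property the requirement is protecting: no hidden model call, no variable latency or cost.
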
## Traceability
| ID | Class | Status | Primary owner | Supporting | Proof |
|---|---|---|---|---|---|
| R001 | core-capability | validated | M001/S01 | none | plan-milestone.md Secret Forecasting section, parser round-trip tests |
| R002 | continuity | validated | M001/S01 | none | parseSecretsManifest/formatSecretsManifest round-trip tested |
| R003 | primary-user-loop | validated | M001/S02 | M001/S01 | collect-from-manifest.test.ts tests 6-8 |
| R004 | primary-user-loop | validated | M001/S02 | none | collect-from-manifest.test.ts tests 4-5 |
| R005 | primary-user-loop | validated | M001/S02 | none | manifest-status.test.ts tests 4,7; collect-from-manifest.test.ts tests 1-2 |
| R006 | integration | validated | M001/S02 | none | collectSecretsFromManifest calls detectDestination() |
| R007 | core-capability | validated | M001/S03 | M001/S01, M001/S02 | auto-secrets-gate.test.ts 3/3 pass |
| R008 | core-capability | validated | M001/S03 | M001/S01, M001/S02 | guided-flow.ts calls startAuto() at lines 52, 486, 647, 794 |
| R009 | integration | validated | M001/S01 | none | plan-milestone.md Secret Forecasting section line 62 |
| R010 | primary-user-loop | validated | M001/S02 | none | collect-from-manifest.test.ts tests 6-8 |
| R011 | core-capability | deferred | none | none | unmapped |
| R012 | operability | deferred | none | none | unmapped |
| R013 | anti-feature | out-of-scope | none | none | n/a |
| R014 | anti-feature | out-of-scope | none | none | n/a |
| R015 | quality-attribute | validated | M002/S01 | none | jiti load, 43 tools register, slim index, browser spot-check |
| R016 | quality-attribute | validated | M002/S01 | none | window.__pi injection, zero inline redeclarations, survives navigation |
| R017 | core-capability | validated | M002/S02 | M002/S01 | postActionSummary eliminated, countOpenDialogs removed from ToolDeps, consolidated capture pattern |
| R018 | core-capability | validated | M002/S02 | none | explicit includeBodyText true/false per tool signal level, classification in D017 |
| R019 | core-capability | validated | M002/S02 | none | zero_mutation_shortcut settle reason, combined readSettleState poll, 60ms/30ms thresholds in D019 |
| R020 | core-capability | validated | M002/S03 | M002/S01 | sharp-based constrainScreenshot, zero page.evaluate in capture.ts, build passes |
| R021 | core-capability | validated | M002/S03 | none | screenshot param default false, capture gated, browser_reload unchanged, build passes |
| R022 | core-capability | validated | M002/S04 | M002/S01 | 7-level label resolution, form auto-detection, verified against 12-field test form |
| R023 | core-capability | validated | M002/S04 | M002/S01 | 5-strategy field resolution, type-aware fill, verified end-to-end with 10 fields |
| R024 | core-capability | validated | M002/S05 | M002/S01 | 8-intent scoring, Playwright tests, differentiated rankings, build passes |
| R025 | core-capability | validated | M002/S05 | M002/S04 | top candidate execution via Playwright locator, settle + diff, graceful error, build passes |
| R026 | quality-attribute | validated | M002/S06 | all M002 | 108 tests passing via npm run test:browser-tools |
| R027 | core-capability | deferred | none | none | unmapped |
| R028 | anti-feature | out-of-scope | none | none | n/a |
## Coverage Summary
- Active requirements: 0
- Validated requirements: 22
- Deferred requirements: 3
- Out of scope: 3
- Unmapped active requirements: 0

@ -1,19 +0,0 @@
# GSD State
**Active Milestone:** M002 — Browser Tools Performance & Intelligence
**Active Slice:** None
**Phase:** complete
**Requirements Status:** 0 active · 22 validated · 3 deferred · 3 out of scope
## Milestone Registry
- ✅ **M001:** Proactive Secret Management
- ✅ **M002:** Browser Tools Performance & Intelligence
## Recent Decisions
- None recorded
## Blockers
- None
## Next Action
All milestones complete.

@ -1,120 +0,0 @@
# M002: Browser Tools Performance & Intelligence — Context
**Gathered:** 2026-03-12
**Status:** Ready for planning
## Project Description
Performance optimization and capability expansion of pi's browser-tools extension. The extension provides 43 browser interaction tools to the coding agent via Playwright. This milestone decomposes the monolithic 5000-line index.ts into modules, optimizes the per-action performance pipeline, replaces canvas-based screenshot resizing with sharp, and adds form intelligence, intent-ranked element retrieval, and semantic action tools.
## Why This Milestone
The browser-tools extension is the agent's primary interface for UI verification and testing. Every action pays a latency tax from redundant page.evaluate calls, unnecessary body text capture, and canvas-based screenshot resizing. The monolithic file structure makes changes risky. And the most common browser tasks (forms, finding the right button, executing obvious micro-actions) still require multiple tool calls where one would suffice.
## User-Visible Outcome
### When this milestone is complete, the user can:
- See faster browser interactions (fewer evaluate round-trips, faster settle, faster screenshots)
- See smaller token payloads (no screenshots on navigate by default, no body text on scroll/hover)
- Use `browser_analyze_form` to inspect any form's fields, types, values, and validation in one call
- Use `browser_fill_form` to fill a form by label/name/placeholder mapping in one call
- Use `browser_find_best` with an intent to get scored element candidates
- Use `browser_act` to execute common micro-tasks ("submit form", "close modal") in one call
### Entry point / environment
- Entry point: pi CLI with browser-tools extension loaded
- Environment: local dev, any website/web app
- Live dependencies involved: Playwright browser instance, sharp npm package
## Completion Class
- Contract complete means: Tests pass for shared utilities, heuristic scoring, form analysis logic, and screenshot resizing
- Integration complete means: All 43 existing tools work with the new module structure; new tools work against real web pages
- Operational complete means: Build succeeds; the extension loads and registers all tools
## Final Integrated Acceptance
To call this milestone complete, we must prove:
- All existing browser tools work identically after module decomposition (build + behavioral spot-check)
- New tools (browser_analyze_form, browser_fill_form, browser_find_best, browser_act) register and execute against a real page
- Screenshot resizing uses sharp (no canvas evaluate calls)
- Navigate returns no screenshot by default
- Test suite passes
## Risks and Unknowns
- Module split regression risk — 43 tools sharing module-level state (browser, context, pageRegistry, logs) must all still work after decomposition
- sharp native dependency — binary compatibility across platforms (macOS, Linux)
- addInitScript timing — injected scripts must be available before any evaluate that references them, including on new pages and after navigation
- Form label association complexity — real-world forms use diverse patterns (for/id, wrapping labels, aria-label, aria-labelledby, placeholder, custom components)
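
The label-association risk above comes down to choosing one label from several possible sources. A minimal sketch of a fallback chain follows; the field shape and the priority order are assumptions for illustration (loosely following accessible-name precedence) and do not describe the resolution logic the milestone actually ships.

```typescript
interface FieldFactsSketch {
  ariaLabelledByText?: string; // text of elements referenced by aria-labelledby
  ariaLabel?: string;
  labelForText?: string;       // text of a <label for="..."> matching the field id
  wrappingLabelText?: string;  // text of an ancestor <label>
  placeholder?: string;
  name?: string;
}

/** Return the first non-empty label source, or null if the field has none. */
function resolveFieldLabel(
  field: FieldFactsSketch,
): { label: string; source: string } | null {
  // Assumed priority order, highest first.
  const candidates: Array<[string, string | undefined]> = [
    ["aria-labelledby", field.ariaLabelledByText],
    ["aria-label", field.ariaLabel],
    ["label[for]", field.labelForText],
    ["wrapping-label", field.wrappingLabelText],
    ["placeholder", field.placeholder],
    ["name", field.name],
  ];
  for (const [source, value] of candidates) {
    // Collapse whitespace so multi-line label markup compares cleanly.
    const label = (value ?? "").replace(/\s+/g, " ").trim();
    if (label !== "") return { label, source };
  }
  return null;
}
```

Returning the source alongside the label makes it easier to see which pattern a given form relies on; custom components would need extra sources beyond this list.
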
## Existing Codebase / Prior Art
- `src/resources/extensions/browser-tools/index.ts` — The monolithic file being decomposed (~5000 lines, 43 tools, all shared infrastructure)
- `src/resources/extensions/browser-tools/core.js` — Existing shared utilities (~1000 lines: action timeline, page registry, state diffing, assertions, fingerprinting, snapshot modes, batch execution)
- `src/resources/extensions/browser-tools/BROWSER-TOOLS-V2-PROPOSAL.md` — Design proposal; many items already implemented (assertions, batch, diff, timeline, pages, frames, traces). M002 covers remaining items: form intelligence, intent ranking, semantic actions, plus performance work not in V2 proposal.
- `src/resources/extensions/browser-tools/package.json` — Extension package metadata
> See `.gsd/DECISIONS.md` for all architectural and pattern decisions — it is an append-only register; read it during planning, append to it during execution.
## Relevant Requirements
- R015 — Module decomposition: split index.ts into focused modules
- R016 — Shared evaluate utilities: inject once, reference everywhere
- R017 — Consolidated state capture: fewer evaluate calls per action
- R018 — Conditional body text: skip for low-signal actions
- R019 — Faster settle: short-circuit on zero mutations
- R020 — Sharp-based screenshot resizing
- R021 — Opt-in navigate screenshots
- R022 — browser_analyze_form
- R023 — browser_fill_form
- R024 — browser_find_best
- R025 — browser_act
- R026 — Test coverage
## Scope
### In Scope
- Decomposing index.ts into modules (core infrastructure, tool groups, browser-side utilities)
- Injecting shared browser-side utilities once via addInitScript or setup evaluate
- Consolidating captureCompactPageState + postActionSummary into fewer evaluate calls
- Conditional body text capture based on action signal level
- Short-circuiting settle on zero-mutation actions
- Replacing constrainScreenshot canvas approach with sharp
- Making screenshots opt-in on browser_navigate (default off)
- New tool: browser_analyze_form
- New tool: browser_fill_form
- New tool: browser_find_best (deterministic heuristic scoring)
- New tool: browser_act (semantic micro-actions)
- Test coverage for new and refactored code
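
As an illustration of "conditional body text capture based on action signal level", the sketch below maps an action name to a capture option. The action names, the set of low-signal actions, and the `includeBodyText` option name are assumptions here, not the classification the milestone records.

```typescript
type SignalLevelSketch = "high" | "low";

// Assumed classification: actions that rarely change page content.
const LOW_SIGNAL_ACTIONS: ReadonlySet<string> = new Set(["scroll", "hover"]);

/** Classify an action by how likely it is to change visible page content. */
function signalLevelFor(action: string): SignalLevelSketch {
  return LOW_SIGNAL_ACTIONS.has(action) ? "low" : "high";
}

/** Low-signal actions skip body text capture to keep the response payload small. */
function captureOptionsFor(action: string): { includeBodyText: boolean } {
  return { includeBodyText: signalLevelFor(action) === "high" };
}
```

Unknown actions default to high signal in this sketch, so a newly added tool would capture body text until someone classifies it explicitly.
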
### Out of Scope / Non-Goals
- Browser reuse across sessions (deferred, skip completely)
- LLM-powered intent resolution (deterministic heuristics only)
- Changes to core.js beyond what's needed for the module split
- Changes to existing tool APIs (all 43 existing tools maintain their current interface)
## Technical Constraints
- Must maintain backward compatibility for all 43 existing tools
- sharp is acceptable as a native dependency
- Browser-side injected utilities must work on any web page (no assumptions about page content)
- addInitScript runs before page scripts; must not conflict with page globals
- All injected browser-side code must use a namespaced global (e.g. window.__pi) to avoid collisions
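
The namespacing constraint can be sketched as a script string that attaches helpers to `window.__pi` and nothing else. The constant name, the single `simpleHash` helper, and its djb2-style body are stand-ins chosen for illustration; the real helper source is not reproduced in this document. The script is evaluated here against a plain object because Node has no `window`.

```typescript
// Browser-side source kept as a string so it could be handed to an init-script
// mechanism such as Playwright's addInitScript. Contents are a hypothetical sketch.
const HELPERS_SOURCE_SKETCH = `
(function () {
  // Reuse an existing namespace object instead of replacing it.
  var pi = (window.__pi = window.__pi || {});
  // djb2-style string hash, kept in the unsigned 32-bit range.
  pi.simpleHash = function (text) {
    var hash = 5381;
    for (var i = 0; i < text.length; i++) {
      hash = ((hash << 5) + hash + text.charCodeAt(i)) >>> 0;
    }
    return hash.toString(16);
  };
})();
`;

/** Evaluate the helper source against a stand-in window object. */
function installHelpersSketch(fakeWindow: Record<string, unknown>): void {
  new Function("window", HELPERS_SOURCE_SKETCH)(fakeWindow);
}
```

Everything lives under one property, and the IIFE keeps its locals out of the page's global scope, so the only possible collision is with a page that already defines `__pi` itself.
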
## Integration Points
- Playwright — browser automation library, provides page.evaluate, page.addInitScript, locator API
- sharp — Node image processing library, replaces canvas-based constrainScreenshot
- pi extension API — registerTool, pi.on("session_shutdown"), ExtensionAPI interface
- core.js — existing shared utilities that index.ts imports
## Open Questions
- Best approach for shared evaluate utilities: page.addInitScript vs one-time page.evaluate at ensureBrowser time — addInitScript survives navigation but runs before page scripts; setup evaluate is simpler but must be re-run on navigation. Likely addInitScript is correct.
- How to handle the module-level mutable state (browser, context, pageRegistry, logs, refs) during decomposition — probably a shared state module that all tool modules import.

@ -1,169 +0,0 @@
# M002: Browser Tools Performance & Intelligence
**Vision:** Transform browser-tools from a monolithic 5000-line file into a modular, faster, and smarter browser automation layer. Reduce per-action latency through consolidated state capture and faster settling. Replace fragile canvas screenshot resizing with sharp. Add form intelligence, intent-ranked retrieval, and semantic action tools that collapse common multi-call patterns into single tool calls.
## Success Criteria
- All 43 existing browser tools work identically after module decomposition
- Per-action latency reduced by consolidating state capture evaluate calls
- settleAfterActionAdaptive short-circuits on zero-mutation actions
- constrainScreenshot uses sharp in Node, not page canvas
- browser_navigate returns no screenshot by default
- browser_analyze_form returns field inventory for any standard HTML form
- browser_fill_form fills fields by label/name/placeholder mapping
- browser_find_best returns scored candidates for semantic intents
- browser_act executes common micro-tasks in one call
- Test suite covers shared utilities, heuristics, and new tools
## Key Risks / Unknowns
- Module split regression — 43 tools sharing mutable module-level state must all survive decomposition
- addInitScript behavior — injected utilities must be available in all evaluate contexts, survive navigation, not collide with page globals
- Form label association — real-world forms use diverse patterns; the heuristic mapper must handle common cases robustly
## Proof Strategy
- Module split regression → retire in S01 by proving build succeeds and all existing tools register/execute with the new structure
- addInitScript behavior → retire in S01 by proving shared utilities are callable from evaluate callbacks after navigation
- Form label association → retire in S04 by proving browser_analyze_form and browser_fill_form work on a real multi-field form
## Verification Classes
- Contract verification: unit tests for heuristic scoring, utility functions, form analysis logic, screenshot resizing
- Integration verification: existing tools register and execute against a real browser page after module split
- Operational verification: build succeeds, extension loads, sharp dependency resolves
- UAT / human verification: spot-check new tools against real web forms and pages
## Milestone Definition of Done
This milestone is complete only when all are true:
- index.ts is decomposed into focused modules; build succeeds
- Shared browser-side utilities are injected once and used by buildRefSnapshot, resolveRefTarget, and new tools
- Action tools use consolidated state capture (fewer evaluate calls than before)
- Low-signal actions skip body text capture
- Settle short-circuits on zero-mutation actions
- constrainScreenshot uses sharp
- browser_navigate defaults to no screenshot
- browser_analyze_form, browser_fill_form, browser_find_best, and browser_act are registered and functional
- Test suite passes
- All 43 existing tools verified against a running page (spot-check)
## Requirement Coverage
- Covers: R015, R016, R017, R018, R019, R020, R021, R022, R023, R024, R025, R026
- Partially covers: none
- Leaves for later: R027 (browser reuse — deferred)
- Orphan risks: none
## Slices
- [x] **S01: Module decomposition and shared evaluate utilities** `risk:high` `depends:[]`
> After this: all 43 existing browser tools work identically with the new module structure; shared browser-side utilities (cssPath, simpleHash, isVisible, isEnabled, inferRole, accessibleName) are injected once via addInitScript and used by buildRefSnapshot and resolveRefTarget — verified by build success and spot-check against a real page.
- [x] **S02: Action pipeline performance** `risk:medium` `depends:[S01]`
> After this: captureCompactPageState and postActionSummary are consolidated into fewer evaluate calls per action; settleAfterActionAdaptive short-circuits on zero-mutation actions; low-signal actions (scroll, hover, Tab) skip body text capture — verified by build success and behavioral spot-check.
- [x] **S03: Screenshot pipeline** `risk:low` `depends:[S01]`
> After this: constrainScreenshot uses sharp instead of canvas; browser_navigate returns no screenshot by default with an explicit parameter to opt in — verified by build success and running browser_navigate to confirm no screenshot in response.
- [x] **S04: Form intelligence** `risk:medium` `depends:[S01]`
> After this: browser_analyze_form returns field inventory (labels, types, required, values, validation) for any form; browser_fill_form fills fields by label/name/placeholder mapping and optionally submits — verified by running both tools against a real multi-field form.
- [x] **S05: Intent-ranked retrieval and semantic actions** `risk:medium` `depends:[S01]`
> After this: browser_find_best returns scored candidates for intents like "submit form", "close dialog", "primary CTA"; browser_act executes common micro-tasks in one call — verified by running both tools against real pages.
- [x] **S06: Test coverage** `risk:low` `depends:[S01,S02,S03,S04,S05]`
> After this: test suite covers shared browser-side utilities, settle logic, screenshot resizing, form analysis heuristics, intent scoring, and semantic action resolution — verified by test runner passing.
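
The S02 outcome above mentions short-circuiting the settle wait when an action causes no mutations. One way to picture that decision is the pure function below, which inspects polled samples of a mutation counter. It is a rough sketch only: the sample shape, the `quiet_window` and `timeout` reasons, and all three timing parameters are assumptions, and it is not the logic of `settleAfterActionAdaptive`.

```typescript
type SettleReasonSketch = "zero_mutation_shortcut" | "quiet_window" | "timeout";

interface SettleSampleSketch {
  elapsedMs: number;     // time since the action was performed
  mutationCount: number; // running mutation counter read from the page
}

interface SettleOptionsSketch {
  shortcutMs: number; // how long to wait before trusting "nothing changed"
  quietMs: number;    // required quiet period after the last observed mutation
  timeoutMs: number;  // hard upper bound
}

/** Return when polling could stop, or null if more samples are needed. */
function decideSettleSketch(
  samples: SettleSampleSketch[],
  opts: SettleOptionsSketch,
): { reason: SettleReasonSketch; atMs: number } | null {
  const first = samples[0];
  if (first === undefined) return null;
  let previousCount = first.mutationCount;
  let lastChangeMs: number | null = null; // null until a mutation is seen
  for (const sample of samples) {
    if (sample.mutationCount !== previousCount) {
      lastChangeMs = sample.elapsedMs;
      previousCount = sample.mutationCount;
    }
    if (lastChangeMs === null && sample.elapsedMs >= opts.shortcutMs) {
      return { reason: "zero_mutation_shortcut", atMs: sample.elapsedMs };
    }
    if (lastChangeMs !== null && sample.elapsedMs - lastChangeMs >= opts.quietMs) {
      return { reason: "quiet_window", atMs: sample.elapsedMs };
    }
    if (sample.elapsedMs >= opts.timeoutMs) {
      return { reason: "timeout", atMs: sample.elapsedMs };
    }
  }
  return null;
}
```

The saving comes from the first branch: an action that touches nothing exits after the short `shortcutMs` wait instead of sitting through a full quiet window.
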
## Boundary Map
### S01 → S02
Produces:
- `browser-tools/state.ts` — shared mutable state module (browser, context, pageRegistry, logs, refs, timeline, session state) with accessor functions
- `browser-tools/utils.ts` — shared Node-side utilities (truncateText, artifact helpers, error formatting)
- `browser-tools/lifecycle.ts` — ensureBrowser(), closeBrowser(), getActivePage(), getActiveTarget(), attachPageListeners()
- `browser-tools/capture.ts` — captureCompactPageState(), postActionSummary(), constrainScreenshot(), captureErrorScreenshot(), getRecentErrors()
- `browser-tools/settle.ts` — settleAfterActionAdaptive(), ensureMutationCounter(), readMutationCounter(), readFocusedDescriptor()
- `browser-tools/refs.ts` — buildRefSnapshot(), resolveRefTarget(), parseRef(), ref state management
- `browser-tools/evaluate-helpers.ts` — browser-side utility source injected via addInitScript (cssPath, simpleHash, isVisible, isEnabled, inferRole, accessibleName)
- `browser-tools/tools/` — tool registration files grouped by category
Consumes:
- nothing (first slice)
### S01 → S03
Produces:
- `browser-tools/capture.ts` — constrainScreenshot() as a separate function that S03 will replace internals of
Consumes:
- nothing (first slice)
### S01 → S04
Produces:
- `browser-tools/evaluate-helpers.ts` — shared browser-side utilities that form tools will reference
- `browser-tools/lifecycle.ts` — ensureBrowser(), getActiveTarget()
- `browser-tools/state.ts` — action timeline, page state accessors
Consumes:
- nothing (first slice)
### S01 → S05
Produces:
- `browser-tools/evaluate-helpers.ts` — shared browser-side utilities that intent tools will reference
- `browser-tools/refs.ts` — buildRefSnapshot() for element inventory
- `browser-tools/lifecycle.ts` — ensureBrowser(), getActiveTarget()
Consumes:
- nothing (first slice)
### S02 → S06
Produces:
- Consolidated captureCompactPageState + postActionSummary logic (testable)
- Modified settleAfterActionAdaptive with zero-mutation short-circuit (testable)
- Action signal classification (high/low) for body text capture (testable)
Consumes from S01:
- Module structure, shared state, evaluate helpers
### S03 → S06
Produces:
- sharp-based constrainScreenshot (testable with buffer fixtures)
Consumes from S01:
- capture.ts module structure
### S04 → S05
Produces:
- Form analysis evaluate logic (field inventory, label mapping) that browser_act reuses for "submit form" intent
Consumes from S01:
- evaluate-helpers.ts, lifecycle.ts, state.ts
### S04 → S06
Produces:
- Form label association heuristics (testable)
- Field inventory logic (testable)
Consumes from S01:
- Module structure
### S05 → S06
Produces:
- Intent scoring heuristics (testable)
- Semantic action resolution logic (testable)
Consumes from S01:
- Module structure, refs, evaluate helpers
Consumes from S04:
- Form analysis logic for "submit form" intent

@ -1,85 +0,0 @@
# S01: Module decomposition and shared evaluate utilities
**Goal:** Split browser-tools index.ts (~5000 lines) into focused modules with shared browser-side utilities injected via addInitScript — all 43 existing tools work identically after.
**Demo:** Extension loads via jiti, all 43 tools register, browser_navigate + browser_snapshot_refs + browser_click work against a real page, buildRefSnapshot/resolveRefTarget use window.__pi utilities instead of inline duplicates.
## Must-Haves
- All 18 mutable state variables live in state.ts with accessor/mutator functions
- Infrastructure functions (ensureBrowser, captureCompactPageState, settleAfterActionAdaptive, buildRefSnapshot, resolveRefTarget, etc.) live in dedicated modules
- 43 tool registrations distributed across 9 categorized files in tools/
- index.ts is a slim orchestrator (<50 lines) that imports and calls registration functions
- evaluate-helpers.ts exports a JS string constant defining window.__pi.{cssPath, simpleHash, isVisible, isEnabled, inferRole, accessibleName, isInteractiveEl, domPath, selectorHints}
- ensureBrowser() injects evaluate-helpers via context.addInitScript()
- buildRefSnapshot and resolveRefTarget reference window.__pi.* instead of redeclaring utilities inline
- Extension loads via jiti at runtime — no build step failures
- All 43 tools register and are callable
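
The accessor/mutator requirement for state.ts exists because a module that imports a `let` binding cannot reassign it, so shared mutable state has to be changed through functions owned by the defining module. A minimal sketch with two hypothetical variables follows; the real module holds 18, and these names and types are placeholders.

```typescript
// Hypothetical slice of a state module. In a real module these functions
// would be exported; `export` is omitted so the sketch runs as a plain script.

interface BrowserLikeSketch {
  close(): void; // stand-in for a Playwright Browser
}

let activeBrowser: BrowserLikeSketch | null = null;
let consoleLogEntries: string[] = [];

function getBrowser(): BrowserLikeSketch | null {
  return activeBrowser;
}

function setBrowser(next: BrowserLikeSketch | null): void {
  activeBrowser = next;
}

function getConsoleLogs(): readonly string[] {
  return consoleLogEntries;
}

function pushConsoleLog(entry: string): void {
  consoleLogEntries.push(entry);
}

/** Restore every variable to its initial value, e.g. when the browser closes. */
function resetAllState(): void {
  activeBrowser = null;
  consoleLogEntries = [];
}
```

Callers always read through `getBrowser()` at the moment of use, so they never hold a stale copy after `setBrowser()` or `resetAllState()` runs.
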
## Proof Level
- This slice proves: operational + integration (module split works at runtime, tools register and execute)
- Real runtime required: yes (jiti loading, Playwright browser)
- Human/UAT required: no (spot-check is agent-executable)
## Verification
- `node -e "const jiti = require('@mariozechner/jiti')(...); const ext = jiti('src/resources/extensions/browser-tools/index.ts'); console.log(typeof ext.default)"` — extension loads without error
- Run browser_navigate to a test page, then browser_snapshot_refs, then browser_click on a ref — all succeed
- Verify window.__pi utilities are available: `page.evaluate(() => typeof window.__pi?.cssPath)` returns "function"
- Count registered tools === 43
## Integration Closure
- Upstream surfaces consumed: `core.js` (pure helpers), `@gsd/pi-coding-agent` (ExtensionAPI type, truncation utils)
- New wiring introduced in this slice: state.ts accessor pattern, ToolDeps interface, addInitScript injection in ensureBrowser()
- What remains before the milestone is truly usable end-to-end: S02 (performance), S03 (screenshot/sharp), S04 (form tools), S05 (intent tools), S06 (tests)
## Tasks
- [x] **T01: Extract state, types, utilities, and evaluate-helpers modules** `est:1h`
- Why: Foundation — everything else imports from these. State accessors are the key risk (jiti mutable binding behavior). evaluate-helpers is a standalone string constant with no imports.
- Files: `src/resources/extensions/browser-tools/state.ts`, `src/resources/extensions/browser-tools/utils.ts`, `src/resources/extensions/browser-tools/evaluate-helpers.ts`
- Do: Extract all 18 mutable state variables + types into state.ts with get/set accessor functions and resetAllState(). Extract truncateText, artifact helpers, error formatting, accessibility helpers, assertion helpers, verification helpers into utils.ts. Write evaluate-helpers.ts as an exported string constant containing the browser-side JS for window.__pi utilities (cssPath, simpleHash, isVisible, isEnabled, inferRole, accessibleName, isInteractiveEl, domPath, selectorHints). Define ToolDeps interface that tool registration functions will accept. Preserve the djb2 hash invariant — simpleHash must match core.js computeContentHash algorithm.
- Verify: `node -e "..."` — state.ts, utils.ts, evaluate-helpers.ts all import without error via jiti
- Done when: Three modules exist, export correct interfaces, and load via jiti without circular dependency errors
- [x] **T02: Extract infrastructure modules and wire addInitScript injection** `est:1.5h`
- Why: Delivers R016 (shared evaluate utilities) and the infrastructure layer that all tool files depend on. This is where addInitScript injection lands and where buildRefSnapshot/resolveRefTarget stop redeclaring utilities.
- Files: `src/resources/extensions/browser-tools/lifecycle.ts`, `src/resources/extensions/browser-tools/capture.ts`, `src/resources/extensions/browser-tools/settle.ts`, `src/resources/extensions/browser-tools/refs.ts`
- Do: Extract ensureBrowser/closeBrowser/getActivePage/getActiveTarget/attachPageListeners into lifecycle.ts — add context.addInitScript(EVALUATE_HELPERS_SOURCE) right after browser.newContext(). Extract captureCompactPageState/postActionSummary/constrainScreenshot/captureErrorScreenshot/getRecentErrors into capture.ts. Extract settleAfterActionAdaptive/ensureMutationCounter/readMutationCounter/readFocusedDescriptor into settle.ts. Extract buildRefSnapshot/resolveRefTarget/parseRef/formatVersionedRef/staleRefGuidance into refs.ts — refactor the evaluate callbacks in buildRefSnapshot and resolveRefTarget to reference window.__pi.cssPath, window.__pi.simpleHash etc. instead of redeclaring them. All modules import state accessors from state.ts, never raw variables.
- Verify: Modules load via jiti. buildRefSnapshot evaluate callback no longer contains function declarations for cssPath/simpleHash (grep confirms). lifecycle.ts contains addInitScript call.
- Done when: Four infrastructure modules exist, lifecycle.ts injects evaluate-helpers, refs.ts uses window.__pi.*, all load without error
- [x] **T03: Extract tool registrations into grouped files and create slim index.ts** `est:1.5h`
- Why: Delivers R015 (module decomposition). The 43 tool registrations move from a single 3400-line block into 9 categorized files. index.ts becomes a slim orchestrator.
- Files: `src/resources/extensions/browser-tools/tools/navigation.ts`, `tools/screenshot.ts`, `tools/interaction.ts`, `tools/inspection.ts`, `tools/session.ts`, `tools/assertions.ts`, `tools/refs.ts`, `tools/wait.ts`, `tools/pages.ts`, `src/resources/extensions/browser-tools/index.ts`
- Do: Create tools/ directory. Each file exports a register function (e.g. registerNavigationTools(pi, deps)) that takes ExtensionAPI and ToolDeps. Move tool registrations verbatim — no logic changes, just import wiring. browser_batch in assertions.ts needs imports for settleAfterActionAdaptive, parseRef, resolveRefTarget, collectAssertionState, etc. Write new index.ts (<50 lines): import all register functions, build ToolDeps object, call each register function, register session_shutdown hook.
- Verify: Count pi.registerTool calls across all tool files === 43. Extension loads via jiti. index.ts is under 50 lines.
- Done when: Old monolithic index.ts is replaced by slim orchestrator, 9 tool files exist with correct tool counts per category, extension loads
- [x] **T04: Runtime verification against a real browser page** `est:30m`
- Why: The split is worthless if tools don't actually work. This task proves the operational contract by exercising the extension end-to-end.
- Files: none (verification only)
- Do: Load the extension, launch a browser, navigate to a page, take a snapshot, click a ref, verify window.__pi is injected. Check that buildRefSnapshot evaluate callback uses window.__pi (not inline declarations). Verify closeBrowser() resets all state. Verify re-launch after close works (addInitScript re-registered on new context).
- Verify: browser_navigate succeeds, browser_snapshot_refs returns refs, browser_click_ref resolves and clicks, page.evaluate(() => Object.keys(window.__pi)) returns expected function names, close + re-open cycle works
- Done when: All 43 tools register, navigate/snapshot/click work against a real page, window.__pi utilities are callable in evaluate context, close/reopen cycle passes
## Files Likely Touched
- `src/resources/extensions/browser-tools/index.ts` (rewritten to slim orchestrator)
- `src/resources/extensions/browser-tools/state.ts` (new)
- `src/resources/extensions/browser-tools/utils.ts` (new)
- `src/resources/extensions/browser-tools/evaluate-helpers.ts` (new)
- `src/resources/extensions/browser-tools/lifecycle.ts` (new)
- `src/resources/extensions/browser-tools/capture.ts` (new)
- `src/resources/extensions/browser-tools/settle.ts` (new)
- `src/resources/extensions/browser-tools/refs.ts` (new)
- `src/resources/extensions/browser-tools/tools/navigation.ts` (new)
- `src/resources/extensions/browser-tools/tools/screenshot.ts` (new)
- `src/resources/extensions/browser-tools/tools/interaction.ts` (new)
- `src/resources/extensions/browser-tools/tools/inspection.ts` (new)
- `src/resources/extensions/browser-tools/tools/session.ts` (new)
- `src/resources/extensions/browser-tools/tools/assertions.ts` (new)
- `src/resources/extensions/browser-tools/tools/refs.ts` (new)
- `src/resources/extensions/browser-tools/tools/wait.ts` (new)
- `src/resources/extensions/browser-tools/tools/pages.ts` (new)

@ -1,52 +0,0 @@
---
estimated_steps: 5
estimated_files: 3
---
# T01: Extract state, types, utilities, and evaluate-helpers modules
**Slice:** S01 — Module decomposition and shared evaluate utilities
**Milestone:** M002
## Description
Extract the foundation modules that all other browser-tools modules will import from. `state.ts` holds all 18 mutable state variables behind accessor functions (critical for jiti compatibility — ES module live bindings may not work). `utils.ts` holds Node-side utility functions. `evaluate-helpers.ts` exports a JS string constant for browser-side injection. Define the `ToolDeps` interface that tool registration functions will consume.
## Steps
1. Create `state.ts`: move all 18 mutable state variables (lines 62-202 of index.ts), their type/interface definitions, and the constants (ARTIFACT_ROOT, HAR_FILENAME). Export get/set accessor functions for each variable (getBrowser/setBrowser, getContext/setContext, etc.). Export `resetAllState()` that mirrors current `closeBrowser()`'s reset logic. Export the `pageRegistry` and `actionTimeline` instances (these are objects with internal state, not plain variables). Import `createPageRegistry`, `createActionTimeline`, `createBoundedLogPusher` from `./core.js`.
2. Create `utils.ts`: move `truncateText()`, `formatArtifactTimestamp()`, `ensureDir()`, `writeArtifactFile()`, `copyArtifactFile()`, `ensureSessionStartedAt()`, `ensureSessionArtifactDir()`, `buildSessionArtifactPath()`, `getActivePageMetadata()`, `getActiveFrameMetadata()`, `getSessionArtifactMetadata()`, `sanitizeArtifactName()`, `getLivePagesSnapshot()`, `resolveAccessibilityScope()`, `captureAccessibilityMarkdown()`, `isCriticalResourceType()`, `updatePendingCriticalRequests()`, `getPendingCriticalRequests()`, `verificationFromChecks()`, `verificationLine()`, `collectAssertionState()`, `formatAssertionText()`, `formatDiffText()`, `getUrlHash()`, `countOpenDialogs()`, `captureClickTargetState()`, `readInputLikeValue()`, `firstErrorLine()`, `beginTrackedAction()`, `finishTrackedAction()`, `getSinceTimestamp()`, `getConsoleEntriesSince()`, `getNetworkEntriesSince()`. These import state accessors from `./state.ts`. Functions that reference `browser`, `context`, `consoleLogs`, etc. use the accessor pattern.
3. Create `evaluate-helpers.ts`: export a single `EVALUATE_HELPERS_SOURCE` string constant containing an IIFE that attaches functions to `window.__pi`. The functions: `cssPath`, `simpleHash`, `isVisible`, `isEnabled`, `inferRole`, `accessibleName`, `isInteractiveEl`, `domPath`, `selectorHints`. Copy these verbatim from `buildRefSnapshot`'s evaluate callback (lines 1228-1430 of index.ts). Wrap in `(function() { window.__pi = window.__pi || {}; window.__pi.cssPath = ...; ... })()`. Ensure `simpleHash` uses the exact djb2 algorithm that matches `core.js`.
4. Define `ToolDeps` interface (in state.ts or a separate types file — decide based on import graph). This bundles the infrastructure functions that tool registration files need: `ensureBrowser`, `closeBrowser`, `getActivePage`, `getActiveTarget`, `getActivePageOrNull`, `captureCompactPageState`, `postActionSummary`, `constrainScreenshot`, `captureErrorScreenshot`, `getRecentErrors`, `settleAfterActionAdaptive`, `ensureMutationCounter`, `buildRefSnapshot`, `resolveRefTarget`, `parseRef`, `formatVersionedRef`, `staleRefGuidance`, `formatCompactStateSummary`, `beginTrackedAction`, `finishTrackedAction`, etc.
5. Verify all three modules load via jiti without errors. Check no circular dependencies exist (state.ts imports only from core.js and node stdlib; utils.ts imports from state.ts and core.js; evaluate-helpers.ts imports nothing).
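
The accessor pattern from step 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual `state.ts`: only three of the 18 variables are shown, and `BrowserLike`, `ConsoleEntry`, and `refVersion` are hypothetical stand-ins chosen for the example.

```typescript
// Hypothetical sketch of the accessor pattern. The real state.ts holds 18
// variables; three illustrative stand-ins are shown here.
interface BrowserLike { name: string } // stand-in for Playwright's Browser
interface ConsoleEntry { text: string; timestamp: number }

// Module-private state: never exported as raw `export let` bindings.
let browser: BrowserLike | null = null;
let consoleLogs: ConsoleEntry[] = [];
let refVersion = 0;

export function getBrowser(): BrowserLike | null { return browser; }
export function setBrowser(b: BrowserLike | null): void { browser = b; }

export function getConsoleLogs(): ConsoleEntry[] { return consoleLogs; }
export function setConsoleLogs(logs: ConsoleEntry[]): void { consoleLogs = logs; }

export function getRefVersion(): number { return refVersion; }
export function setRefVersion(v: number): void { refVersion = v; }

/** Reset every variable to its initial value (what closeBrowser() would call). */
export function resetAllState(): void {
  browser = null;
  consoleLogs = [];
  refVersion = 0;
}
```

Because callers always go through a function call, they read the current value even if the loader does not preserve ES module live bindings.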
## Must-Haves
- [ ] state.ts exports accessor functions for all 18 state variables, not raw `export let`
- [ ] state.ts exports `resetAllState()` that resets every variable to its initial value
- [ ] evaluate-helpers.ts `simpleHash` uses identical djb2 algorithm to core.js `computeContentHash`
- [ ] evaluate-helpers.ts covers all 9 functions: cssPath, simpleHash, isVisible, isEnabled, inferRole, accessibleName, isInteractiveEl, domPath, selectorHints
- [ ] No circular imports between the three new modules
- [ ] ToolDeps interface defined and exported
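
A rough sketch of the `EVALUATE_HELPERS_SOURCE` shape and the hash invariant follows. It assumes a common djb2 variant (seed 5381, `hash * 33 + charCode`, unsigned 32-bit, hex output); the variant actually used by `core.js` is not shown in this plan and may differ. Only two of the nine helpers appear, with simplified bodies, and `computeContentHashSketch` and `loadHelpersInto` are hypothetical names for this example.

```typescript
// Sketch only: two of the nine helpers, with simplified bodies.
export const EVALUATE_HELPERS_SOURCE = `(function () {
  var pi = (window.__pi = window.__pi || {});

  // djb2, assumed variant: seed 5381, hash * 33 + charCode, unsigned hex.
  pi.simpleHash = function (str) {
    var hash = 5381;
    for (var i = 0; i < str.length; i++) {
      hash = ((hash << 5) + hash + str.charCodeAt(i)) | 0;
    }
    return (hash >>> 0).toString(16);
  };

  // Visibility stub: a real helper would inspect computed style and layout.
  pi.isVisible = function (el) {
    return !!el && el.hidden !== true;
  };
})();`;

// Node-side twin of simpleHash, standing in for core.js's computeContentHash.
export function computeContentHashSketch(str: string): string {
  let hash = 5381;
  for (let i = 0; i < str.length; i++) {
    hash = ((hash << 5) + hash + str.charCodeAt(i)) | 0;
  }
  return (hash >>> 0).toString(16);
}

// Run the injected source against a fake window, roughly what the browser
// does when an init script executes.
export function loadHelpersInto(fakeWindow: Record<string, unknown>): void {
  new Function("window", EVALUATE_HELPERS_SOURCE)(fakeWindow);
}
```

Comparing the two implementations on the same inputs is one way to check the "identical djb2 algorithm" must-have without launching a browser.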
## Verification
- `node -e "const jiti = require('@mariozechner/jiti')(...); jiti('./src/resources/extensions/browser-tools/state.ts'); console.log('state ok')"` — no error
- `node -e "const jiti = require('@mariozechner/jiti')(...); jiti('./src/resources/extensions/browser-tools/utils.ts'); console.log('utils ok')"` — no error
- `node -e "const jiti = require('@mariozechner/jiti')(...); const h = jiti('./src/resources/extensions/browser-tools/evaluate-helpers.ts'); console.log(h.EVALUATE_HELPERS_SOURCE.includes('cssPath'))"` — prints true
- grep evaluate-helpers.ts for all 9 function names
## Inputs
- `src/resources/extensions/browser-tools/index.ts` — lines 62-202 (state/types), lines 204-620 (helpers), lines 1228-1430 (browser-side utilities)
- `src/resources/extensions/browser-tools/core.js` — `computeContentHash` djb2 algorithm for hash invariant check
## Expected Output
- `src/resources/extensions/browser-tools/state.ts` — all state + types + accessors + resetAllState + ToolDeps interface
- `src/resources/extensions/browser-tools/utils.ts` — all Node-side utility functions
- `src/resources/extensions/browser-tools/evaluate-helpers.ts` — EVALUATE_HELPERS_SOURCE string constant

@ -1,54 +0,0 @@
---
estimated_steps: 5
estimated_files: 4
---
# T02: Extract infrastructure modules and wire addInitScript injection
**Slice:** S01 — Module decomposition and shared evaluate utilities
**Milestone:** M002
## Description
Extract the four infrastructure modules (lifecycle, capture, settle, refs) that sit between state/utils and the tool registration layer. The key deliverable beyond mechanical extraction: `lifecycle.ts` injects `EVALUATE_HELPERS_SOURCE` via `context.addInitScript()` in `ensureBrowser()`, and `refs.ts` refactors `buildRefSnapshot`/`resolveRefTarget` evaluate callbacks to reference `window.__pi.*` instead of redeclaring utilities inline. This retires the R016 risk (shared browser-side evaluate utilities).
## Steps
1. Create `lifecycle.ts`: move `ensureBrowser()`, `closeBrowser()`, `getActivePage()`, `getActiveTarget()`, `getActivePageOrNull()`, `attachPageListeners()` from index.ts. Import state accessors from `./state.ts`. Import `EVALUATE_HELPERS_SOURCE` from `./evaluate-helpers.ts`. In `ensureBrowser()`, add `context.addInitScript(EVALUATE_HELPERS_SOURCE)` immediately after `browser.newContext()` and before `context.newPage()`. `closeBrowser()` calls `resetAllState()` from state.ts instead of resetting variables individually.
2. Create `capture.ts`: move `captureCompactPageState()`, `formatCompactStateSummary()`, `postActionSummary()`, `constrainScreenshot()`, `captureErrorScreenshot()`, `getRecentErrors()` from index.ts. Import from `./state.ts` and `./lifecycle.ts` as needed.
3. Create `settle.ts`: move `settleAfterActionAdaptive()`, `ensureMutationCounter()`, `readMutationCounter()`, `readFocusedDescriptor()` from index.ts. Import from `./state.ts`.
4. Create `refs.ts`: move `buildRefSnapshot()`, `resolveRefTarget()`, `parseRef()`, `formatVersionedRef()`, `staleRefGuidance()` from index.ts. **Refactor `buildRefSnapshot`'s evaluate callback:** remove the inline declarations of the 9 injected functions (`cssPath`, `simpleHash`, `isVisible`, `isEnabled`, `inferRole`, `accessibleName`, `isInteractiveEl`, `domPath`, `selectorHints`) and call `window.__pi.cssPath(el)`, `window.__pi.simpleHash(str)`, etc. instead. Keep `matchesMode`, `computeNearestHeading`, `computeFormOwnership` inline (they're not shared/duplicated). **Refactor `resolveRefTarget`'s evaluate callback:** remove inline `cssPath` and `simpleHash` declarations, replace with `window.__pi.cssPath` and `window.__pi.simpleHash`.
5. Verify all four modules load via jiti. Grep `buildRefSnapshot` and `resolveRefTarget` to confirm zero inline declarations of `cssPath` or `simpleHash`. Verify `lifecycle.ts` contains the `addInitScript` call.
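
The ordering required in step 1 (new context, then `addInitScript`, then first page) might look like the sketch below. The interfaces are trimmed structural stand-ins for the Playwright objects; `newContext`, `addInitScript`, and `newPage` are real Playwright method names, while the module-level variables, `getActiveContext`, and the placeholder constant are assumptions for illustration.

```typescript
// Minimal structural stand-ins for the Playwright objects used here.
interface PageLike { url(): string }
interface ContextLike {
  addInitScript(script: string): Promise<void>;
  newPage(): Promise<PageLike>;
}
interface BrowserLike { newContext(): Promise<ContextLike> }

// Placeholder for the constant exported by evaluate-helpers.ts.
const EVALUATE_HELPERS_SOURCE =
  "(function () { window.__pi = window.__pi || {}; })();";

let activeContext: ContextLike | null = null;
let activePage: PageLike | null = null;

export async function ensureBrowser(browser: BrowserLike): Promise<PageLike> {
  if (activePage) return activePage;
  const context = await browser.newContext();
  // Register before the first page exists, so every page created in this
  // context (and every navigation) runs the helpers first.
  await context.addInitScript(EVALUATE_HELPERS_SOURCE);
  activeContext = context;
  activePage = await context.newPage();
  return activePage;
}

export function getActiveContext(): ContextLike | null {
  return activeContext;
}

export function closeBrowser(): void {
  // The real module would also close the browser and call resetAllState().
  activeContext = null;
  activePage = null;
}
```

Since the init script is registered per context, a fresh context after `closeBrowser()` goes through the same three calls again, which is what the close/reopen check relies on.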
## Must-Haves
- [ ] lifecycle.ts calls `context.addInitScript(EVALUATE_HELPERS_SOURCE)` after `browser.newContext()` and before `context.newPage()`
- [ ] closeBrowser() in lifecycle.ts calls resetAllState() from state.ts
- [ ] buildRefSnapshot evaluate callback uses window.__pi.cssPath, window.__pi.simpleHash, etc. — zero inline redeclarations of the 9 shared functions
- [ ] resolveRefTarget evaluate callback uses window.__pi.cssPath and window.__pi.simpleHash — zero inline redeclarations
- [ ] No circular imports between infrastructure modules (lifecycle→state, capture→state+lifecycle, settle→state, refs→state)
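
To illustrate the refactored callbacks, here is a hedged sketch of a browser-side function that looks helpers up on the global object instead of redeclaring them. `ElLike`, `PiHelpers`, and `snapshotCallback` are hypothetical names; the real `buildRefSnapshot` callback works on DOM elements and uses all nine helpers. In a browser, `globalThis` is the same object as `window`, which lets the sketch also run under Node.

```typescript
// Shape of the injected helpers this sketch relies on (2 of the 9).
interface PiHelpers {
  cssPath(el: ElLike): string;
  simpleHash(s: string): string;
}
// Tiny stand-in for a DOM element so the sketch runs outside a browser.
interface ElLike { tag: string; text: string }

// Browser-side callback: in the real code a function like this would be
// passed to page.evaluate(). No helper is redeclared inside it.
export function snapshotCallback(
  elements: ElLike[],
): { path: string; hash: string }[] {
  const pi = (globalThis as unknown as { __pi?: PiHelpers }).__pi;
  if (!pi) {
    throw new Error("window.__pi missing: was addInitScript registered?");
  }
  return elements.map((el) => ({
    path: pi.cssPath(el),
    hash: pi.simpleHash(el.tag + "|" + el.text),
  }));
}
```

Failing loudly when `__pi` is absent makes a missing injection visible at the first snapshot rather than as a confusing `undefined is not a function` deeper in the callback.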
## Verification
- `grep -c "function cssPath\|function simpleHash" src/resources/extensions/browser-tools/refs.ts` returns 0
- `grep "addInitScript" src/resources/extensions/browser-tools/lifecycle.ts` returns a match
- `grep "resetAllState" src/resources/extensions/browser-tools/lifecycle.ts` returns a match
- All four modules load via jiti without error
## Inputs
- `src/resources/extensions/browser-tools/state.ts` — state accessors (from T01)
- `src/resources/extensions/browser-tools/utils.ts` — utility functions (from T01)
- `src/resources/extensions/browser-tools/evaluate-helpers.ts` — EVALUATE_HELPERS_SOURCE (from T01)
- `src/resources/extensions/browser-tools/index.ts` — source functions to extract
## Expected Output
- `src/resources/extensions/browser-tools/lifecycle.ts` — browser lifecycle with addInitScript injection
- `src/resources/extensions/browser-tools/capture.ts` — page state capture functions
- `src/resources/extensions/browser-tools/settle.ts` — DOM settle logic
- `src/resources/extensions/browser-tools/refs.ts` — ref snapshot/resolution using window.__pi.*

@ -1,70 +0,0 @@
---
estimated_steps: 4
estimated_files: 10
---
# T03: Extract tool registrations into grouped files and create slim index.ts
**Slice:** S01 — Module decomposition and shared evaluate utilities
**Milestone:** M002
## Description
Move all 43 tool registrations from the monolithic export default function into 9 categorized tool files under `tools/`. Each file exports a single registration function. Rewrite `index.ts` as a slim orchestrator that imports everything and wires it together. This is the largest task by line count but the most mechanical — tool implementations don't change, only their location and import sources.
## Steps
1. Create `tools/` directory and 9 tool files. Each exports a function like `export function registerNavigationTools(pi: ExtensionAPI, deps: ToolDeps)`. Tool categorization per research:
- `navigation.ts` — browser_navigate, browser_go_back, browser_go_forward, browser_reload (4 tools)
- `screenshot.ts` — browser_screenshot (1 tool)
- `interaction.ts` — browser_click, browser_drag, browser_type, browser_upload_file, browser_scroll, browser_hover, browser_key_press, browser_select_option, browser_set_checked, browser_set_viewport (10 tools)
- `inspection.ts` — browser_get_console_logs, browser_get_network_logs, browser_get_dialog_logs, browser_evaluate, browser_get_page_source, browser_get_accessibility_tree, browser_find (7 tools)
- `session.ts` — browser_close, browser_trace_start, browser_trace_stop, browser_export_har, browser_timeline, browser_session_summary, browser_debug_bundle (7 tools)
- `assertions.ts` — browser_assert, browser_diff, browser_batch (3 tools)
- `tools/refs.ts` — browser_snapshot_refs, browser_get_ref, browser_click_ref, browser_hover_ref, browser_fill_ref (5 tools)
- `wait.ts` — browser_wait_for (1 tool)
- `pages.ts` — browser_list_pages, browser_switch_page, browser_close_page, browser_list_frames, browser_select_frame (5 tools)
2. For each tool, the execute function body stays verbatim. Replace direct function calls (ensureBrowser, captureCompactPageState, etc.) with `deps.ensureBrowser()`, `deps.captureCompactPageState()`, etc. Replace direct state variable access (consoleLogs, currentRefMap, etc.) with state accessor calls imported from `../state.ts`.
3. Handle `browser_batch` carefully — its `executeStep` closure calls `settleAfterActionAdaptive`, `parseRef`, `resolveRefTarget`, `collectAssertionState`, `evaluateAssertionChecks`, and accesses `consoleLogs` directly. All of these come through deps or state imports. The `validateWaitParams`, `parseThreshold`, `meetsThreshold`, `includesNeedle`, `createRegionStableScript` come from core.js imports.
4. Rewrite `index.ts` as slim orchestrator: import all 9 register functions, import infrastructure modules, build the ToolDeps object, call each register function, register the `session_shutdown` hook. Target: under 50 lines. The old index.ts content is fully replaced.
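
The register-function layout described in these steps can be sketched roughly as below. The `ExtensionAPI`, `ToolDef`, and `ToolDeps` shapes are heavily trimmed assumptions, `pi.on("session_shutdown", ...)` is a guess at how the hook is registered, and `deps` is passed in for testability even though the plan has `index.ts` build it.

```typescript
// Hypothetical minimal shapes; the real ExtensionAPI and ToolDeps are larger.
interface ToolDef { name: string; execute(args: unknown): Promise<string> }
interface ExtensionAPI {
  registerTool(tool: ToolDef): void;
  on(event: string, handler: () => Promise<void> | void): void;
}
interface ToolDeps {
  ensureBrowser(): Promise<{ title: string }>;
  closeBrowser(): Promise<void>;
}

// tools/navigation.ts (sketch): one exported register function per category.
export function registerNavigationTools(pi: ExtensionAPI, deps: ToolDeps): void {
  pi.registerTool({
    name: "browser_navigate",
    // Infrastructure is reached through deps, not direct imports.
    execute: async () => (await deps.ensureBrowser()).title,
  });
}

// tools/session.ts (sketch)
export function registerSessionTools(pi: ExtensionAPI, deps: ToolDeps): void {
  pi.registerTool({
    name: "browser_close",
    execute: async () => {
      await deps.closeBrowser();
      return "closed";
    },
  });
}

// index.ts (sketch): call each register function, then add the shutdown hook.
export default function browserTools(pi: ExtensionAPI, deps: ToolDeps): void {
  for (const register of [registerNavigationTools, registerSessionTools]) {
    register(pi, deps);
  }
  pi.on("session_shutdown", () => deps.closeBrowser());
}
```

With this shape, counting registrations only needs a mock `pi` whose `registerTool` records names, which is the same idea as the grep-based count in the verification list.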
## Must-Haves
- [ ] Exactly 43 pi.registerTool calls across all 9 tool files (count must match)
- [ ] index.ts is under 50 lines and contains zero tool registrations
- [ ] browser_batch internal step execution works — all infrastructure functions accessible via deps/imports
- [ ] No tool parameter schemas or return formats changed
- [ ] Extension loads via jiti and all tools register
## Verification
- `grep -rc "pi.registerTool" src/resources/extensions/browser-tools/tools/` sums to 43
- `wc -l src/resources/extensions/browser-tools/index.ts` is under 50
- `grep "pi.registerTool" src/resources/extensions/browser-tools/index.ts` returns no matches
- Extension loads via jiti without error
## Inputs
- `src/resources/extensions/browser-tools/state.ts` — state accessors (from T01)
- `src/resources/extensions/browser-tools/utils.ts` — utility functions (from T01)
- `src/resources/extensions/browser-tools/lifecycle.ts` — browser lifecycle (from T02)
- `src/resources/extensions/browser-tools/capture.ts` — state capture (from T02)
- `src/resources/extensions/browser-tools/settle.ts` — DOM settle (from T02)
- `src/resources/extensions/browser-tools/refs.ts` — ref management (from T02)
- `src/resources/extensions/browser-tools/index.ts` — source tool registrations to extract (lines 1614-4989)
## Expected Output
- `src/resources/extensions/browser-tools/tools/navigation.ts` (4 tools)
- `src/resources/extensions/browser-tools/tools/screenshot.ts` (1 tool)
- `src/resources/extensions/browser-tools/tools/interaction.ts` (10 tools)
- `src/resources/extensions/browser-tools/tools/inspection.ts` (7 tools)
- `src/resources/extensions/browser-tools/tools/session.ts` (7 tools)
- `src/resources/extensions/browser-tools/tools/assertions.ts` (3 tools)
- `src/resources/extensions/browser-tools/tools/refs.ts` (5 tools)
- `src/resources/extensions/browser-tools/tools/wait.ts` (1 tool)
- `src/resources/extensions/browser-tools/tools/pages.ts` (5 tools)
- `src/resources/extensions/browser-tools/index.ts` — slim orchestrator (<50 lines)

@ -1,50 +0,0 @@
---
estimated_steps: 4
estimated_files: 0
---
# T04: Runtime verification against a real browser page
**Slice:** S01 — Module decomposition and shared evaluate utilities
**Milestone:** M002
## Description
End-to-end verification that the module split actually works at runtime. Load the extension via jiti, verify all 43 tools register, launch a real browser, navigate to a page, exercise snapshot/click/ref tools, confirm window.__pi injection, and verify the close/reopen cycle re-registers addInitScript. This is pure verification — no code changes unless bugs are found.
## Steps
1. Load the extension module via jiti and verify it exports a default function. Mock or use the real ExtensionAPI to count tool registrations — confirm exactly 43.
2. Use the running pi instance or a test script to exercise the browser tools sequence: browser_navigate to a local or test URL → verify page title returned → browser_snapshot_refs → verify ref nodes returned → browser_click_ref on a returned ref → verify click succeeds.
3. Verify window.__pi injection: use browser_evaluate to run `Object.keys(window.__pi)` and confirm it contains cssPath, simpleHash, isVisible, isEnabled, inferRole, accessibleName, isInteractiveEl, domPath, selectorHints. Navigate to a new URL and re-check — confirms addInitScript survives navigation.
4. Verify close/reopen: call browser_close, then browser_navigate again. Confirm window.__pi is still available on the new browser context (addInitScript re-registered on the fresh context created by ensureBrowser).
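
The key comparison in step 3 can be made mechanical with a small helper like the one below. It is only a sketch: `EXPECTED_PI_HELPERS` and `checkPiHelpers` are hypothetical names, and the input is assumed to be whatever `Object.keys(window.__pi)` returned from `browser_evaluate`.

```typescript
// The nine helper names listed in the plan.
export const EXPECTED_PI_HELPERS = [
  "cssPath", "simpleHash", "isVisible", "isEnabled", "inferRole",
  "accessibleName", "isInteractiveEl", "domPath", "selectorHints",
] as const;

// Compare the keys reported by the page with the expected set and report
// both directions of mismatch.
export function checkPiHelpers(
  actualKeys: string[],
): { missing: string[]; unexpected: string[] } {
  const actual = new Set(actualKeys);
  const expected = new Set<string>(EXPECTED_PI_HELPERS);
  return {
    missing: EXPECTED_PI_HELPERS.filter((k) => !actual.has(k)),
    unexpected: actualKeys.filter((k) => !expected.has(k)),
  };
}
```

Running the same check after navigation and again after the close/reopen cycle covers the last three must-haves with one function.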
## Must-Haves
- [ ] 43 tools registered (no more, no less)
- [ ] browser_navigate returns page title and URL
- [ ] browser_snapshot_refs returns ref nodes with valid structure
- [ ] window.__pi contains all 9 expected functions
- [ ] window.__pi survives navigation to new URL
- [ ] Close + reopen cycle works — window.__pi available on fresh context
## Verification
- Tool registration count === 43
- browser_navigate succeeds (returns content, no error)
- browser_snapshot_refs returns array with at least 1 ref
- `page.evaluate(() => Object.keys(window.__pi).sort())` returns the 9 expected function names
- After browser_close + browser_navigate: window.__pi still available
## Inputs
- All modules from T01-T03 in place
- A reachable URL to navigate to (localhost dev server or data: URL)
## Expected Output
- Verification passes — no code changes needed (or bug fixes applied if issues found)
- Slice is confirmed done

@ -1,56 +0,0 @@
# S02: Action pipeline performance
**Goal:** Reduce per-action evaluate overhead by consolidating state capture, short-circuiting settle on zero mutations, and skipping body text for low-signal actions.
**Demo:** Build succeeds. A browser_click action runs 3 fewer evaluate calls than before (5+N vs 8+N). Settle returns `zero_mutation_shortcut` reason when no mutations fire. Low-signal tools (scroll, hover, drag) skip body text capture.
## Must-Haves
- `postActionSummary` eliminated from high-signal tools — replaced by `captureCompactPageState` + `formatCompactStateSummary`
- `countOpenDialogs` removed as standalone call — dialog count comes from `captureCompactPageState`'s existing `dialog.count` field
- High-signal tools (click, type, key_press, select_option, set_checked, navigate) capture body text in afterState
- Low-signal tools (scroll, hover, drag, upload_file, hover_ref) skip body text in `captureCompactPageState`
- `settleAfterActionAdaptive` short-circuits with `zero_mutation_shortcut` settle reason when no mutations fire in the first 60ms
- `AdaptiveSettleDetails.settleReason` type includes `"zero_mutation_shortcut"`
- `readMutationCounter` + `readFocusedDescriptor` combined into single evaluate per settle poll
- Build succeeds via `npm run build`
## Proof Level
- This slice proves: operational + behavioral
- Real runtime required: no (build verification sufficient — behavioral improvements are structural, not observable without timing instrumentation)
- Human/UAT required: no
## Verification
- `npm run build` succeeds with zero errors
- `grep -c "countOpenDialogs" src/resources/extensions/browser-tools/tools/*.ts` returns 0 (no standalone dialog counting in tool files)
- `grep -c "postActionSummary" src/resources/extensions/browser-tools/tools/interaction.ts` returns 0 (interaction tools now use direct capture instead)
- `grep "zero_mutation_shortcut" src/resources/extensions/browser-tools/settle.ts` finds the new settle reason
- `grep "includeBodyText" src/resources/extensions/browser-tools/tools/interaction.ts` shows explicit true/false per tool signal level
## Tasks
- [x] **T01: Consolidate capture pipeline and classify tool signal levels** `est:45m`
- Why: R017 + R018 — eliminate redundant evaluate calls per action by removing the `postActionSummary` + separate `captureCompactPageState` pattern in high-signal tools, folding `countOpenDialogs` into the existing `dialog.count` from captureCompactPageState, and classifying tools as high/low signal for body text capture.
- Files: `capture.ts`, `state.ts`, `utils.ts`, `index.ts`, `tools/interaction.ts`, `tools/navigation.ts`, `tools/refs.ts`
- Do: (1) Remove `postActionSummary` from ToolDeps — high-signal tools call `captureCompactPageState(includeBodyText: true)` once for afterState and derive summary via `formatCompactStateSummary`. Low-signal tools call `captureCompactPageState(includeBodyText: false)` and derive summary. (2) Remove standalone `countOpenDialogs` calls from tool files — use `afterState.dialog.count` / `beforeState.dialog.count` from the state already captured. (3) Keep `postActionSummary` function in capture.ts but remove it from ToolDeps and stop using it in action tools. Summary-only tools (go_back, go_forward, reload) can keep calling it since they don't do before/after diff. (4) Update ToolDeps interface. (5) Build verify.
- Verify: `npm run build` succeeds. `grep -c "countOpenDialogs" src/resources/extensions/browser-tools/tools/*.ts` returns 0. High-signal tools in interaction.ts have `includeBodyText: true` in afterState capture and no `postActionSummary` call.
- Done when: Build passes and high-signal tools use consolidated capture with explicit body text classification.
- [x] **T02: Settle zero-mutation short-circuit and poll consolidation** `est:25m`
- Why: R019 — save ~50ms on zero-mutation actions by short-circuiting the settle quiet window, and reduce per-poll evaluate calls by combining readMutationCounter + readFocusedDescriptor into one evaluate.
- Files: `settle.ts`, `state.ts`
- Do: (1) Add `"zero_mutation_shortcut"` to `AdaptiveSettleDetails.settleReason` union in state.ts. (2) In `settleAfterActionAdaptive`, track whether any mutation has fired since start. After 60ms with zero mutations, switch to a 30ms quiet window instead of 100ms and return `zero_mutation_shortcut` reason. (3) Combine `readMutationCounter` + `readFocusedDescriptor` into a single `readSettleState(target, checkFocus)` evaluate that returns `{ mutationCount, focusDescriptor }`. Replace per-poll sequential evaluates with this combined call. (4) Build verify.
- Verify: `npm run build` succeeds. `grep "zero_mutation_shortcut" src/resources/extensions/browser-tools/settle.ts` finds the new reason. The combined poll evaluate is a single `target.evaluate()` call returning both mutation count and focus descriptor.
- Done when: Build passes. Settle logic has zero-mutation short-circuit and combined poll evaluate.
## Files Likely Touched
- `src/resources/extensions/browser-tools/capture.ts`
- `src/resources/extensions/browser-tools/settle.ts`
- `src/resources/extensions/browser-tools/state.ts`
- `src/resources/extensions/browser-tools/utils.ts`
- `src/resources/extensions/browser-tools/index.ts`
- `src/resources/extensions/browser-tools/tools/interaction.ts`
- `src/resources/extensions/browser-tools/tools/navigation.ts`
- `src/resources/extensions/browser-tools/tools/refs.ts`

@ -1,67 +0,0 @@
---
estimated_steps: 5
estimated_files: 7
---
# T01: Consolidate capture pipeline and classify tool signal levels
**Slice:** S02 — Action pipeline performance
**Milestone:** M002
## Description
Eliminate redundant evaluate round-trips per action by consolidating the capture pipeline. Currently high-signal tools call `postActionSummary` (which internally calls `captureCompactPageState` without body text) and then call `captureCompactPageState` again with `includeBodyText: true` — two evaluate calls for overlapping data. Additionally, tools call `countOpenDialogs` separately even though `captureCompactPageState` already captures `dialog.count`.
After this task: high-signal tools (click, type, key_press, select_option, set_checked, navigate) call `captureCompactPageState(includeBodyText: true)` once for afterState, derive the summary via `formatCompactStateSummary`, and read `dialog.count` from the captured state. Low-signal tools (scroll, hover, drag, upload_file) call `captureCompactPageState(includeBodyText: false)` and derive summary. Net saving: 3 evaluate round-trips per high-signal action.
## Steps
1. **Update ToolDeps in state.ts**: Remove `countOpenDialogs` from ToolDeps. `postActionSummary` stays in ToolDeps for now since summary-only tools (go_back, go_forward, reload) still use it — but action tools won't call it.
2. **Refactor high-signal tools in interaction.ts**: For `browser_click`, `browser_type`, `browser_key_press`, `browser_select_option`, `browser_set_checked`:
- Remove the `postActionSummary` call
- Remove standalone `countOpenDialogs` calls — use `beforeState.dialog.count` and `afterState.dialog.count` instead
- After settle, call `captureCompactPageState(p, { ..., includeBodyText: true })` once for afterState
- Derive summary text via `deps.formatCompactStateSummary(afterState)`
- The beforeState capture already has `dialog.count` — use it directly for dialog comparison
3. **Refactor browser_navigate in navigation.ts**: Same pattern — remove `postActionSummary`, use afterState (already captured) for summary via `formatCompactStateSummary`, use `dialog.count` from state.
4. **Refactor ref action tools in refs.ts**: For `browser_click_ref` — remove `countOpenDialogs` calls, use state's `dialog.count`. For `browser_click_ref`, `browser_hover_ref`, `browser_fill_ref` — replace `postActionSummary` with `captureCompactPageState` + `formatCompactStateSummary`. Mark ref action tools with explicit body text classification: `browser_click_ref` and `browser_fill_ref` get `includeBodyText: true` (high-signal), `browser_hover_ref` gets `includeBodyText: false` (low-signal).
5. **Classify low-signal tools in interaction.ts**: For `browser_scroll`, `browser_hover`, `browser_drag`, `browser_upload_file` — replace `postActionSummary` with `captureCompactPageState(includeBodyText: false)` + `formatCompactStateSummary`. This makes the signal classification explicit in code.
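
A minimal sketch of the consolidated pattern follows, assuming a trimmed `CompactPageState` and a fake `evaluate` that stands in for the single Playwright round-trip. `afterAction` and `HIGH_SIGNAL` are hypothetical names for this example; the plan itself sets `includeBodyText` explicitly per tool rather than through a lookup.

```typescript
// Hypothetical, trimmed shapes: the real CompactPageState has more fields.
interface CompactPageState {
  url: string;
  title: string;
  dialog: { count: number };
  bodyText?: string;
}
interface PageLike {
  // Stand-in for one page.evaluate() round-trip that gathers all fields.
  evaluate(arg: { includeBodyText: boolean }): Promise<CompactPageState>;
}

export async function captureCompactPageState(
  page: PageLike,
  opts: { includeBodyText: boolean },
): Promise<CompactPageState> {
  return page.evaluate({ includeBodyText: opts.includeBodyText });
}

export function formatCompactStateSummary(s: CompactPageState): string {
  return `${s.title} (${s.url}), dialogs: ${s.dialog.count}`;
}

// Signal classification from the plan: high-signal tools include body text.
const HIGH_SIGNAL = new Set([
  "click", "type", "key_press", "select_option", "set_checked", "navigate",
]);

export async function afterAction(
  page: PageLike,
  tool: string,
  before: CompactPageState,
) {
  const after = await captureCompactPageState(page, {
    includeBodyText: HIGH_SIGNAL.has(tool),
  });
  return {
    after,
    summary: formatCompactStateSummary(after),
    // Dialog delta comes from captured state, not a countOpenDialogs call.
    dialogsOpened: after.dialog.count - before.dialog.count,
  };
}
```

One capture feeds the after-state, the summary text, and the dialog comparison, which is where the saved round-trips come from.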
## Must-Haves
- [ ] No standalone `countOpenDialogs` calls in any tool file under `tools/`
- [ ] High-signal tools call `captureCompactPageState` with `includeBodyText: true` for afterState and derive summary via `formatCompactStateSummary`
- [ ] Low-signal tools call `captureCompactPageState` with `includeBodyText: false` and derive summary via `formatCompactStateSummary`
- [ ] `postActionSummary` remains available in ToolDeps for summary-only navigation tools (go_back, go_forward, reload) — these don't do before/after diff
- [ ] `countOpenDialogs` removed from ToolDeps interface and index.ts wiring
- [ ] `npm run build` succeeds
## Verification
- `npm run build` exits 0
- `grep -c "countOpenDialogs" src/resources/extensions/browser-tools/tools/*.ts` returns 0 for every file
- `grep -c "postActionSummary" src/resources/extensions/browser-tools/tools/interaction.ts` returns 0
- `grep "includeBodyText: false" src/resources/extensions/browser-tools/tools/interaction.ts` shows low-signal tools explicitly skipping body text
- `grep "includeBodyText: true" src/resources/extensions/browser-tools/tools/interaction.ts` shows high-signal tools explicitly including body text
## Inputs
- `src/resources/extensions/browser-tools/capture.ts` — `captureCompactPageState` and `postActionSummary` implementations
- `src/resources/extensions/browser-tools/state.ts` — ToolDeps interface, CompactPageState shape (includes `dialog.count`)
- `src/resources/extensions/browser-tools/utils.ts` — `formatCompactStateSummary`, `countOpenDialogs`
- `src/resources/extensions/browser-tools/tools/interaction.ts` — 10 interaction tools with current capture patterns
- `src/resources/extensions/browser-tools/tools/navigation.ts` — browser_navigate with postActionSummary + separate afterState capture
- `src/resources/extensions/browser-tools/tools/refs.ts` — click_ref/hover_ref/fill_ref with countOpenDialogs and postActionSummary
- S01 summary — module structure, ToolDeps contract, accessor patterns
## Expected Output
- `src/resources/extensions/browser-tools/state.ts` — ToolDeps without `countOpenDialogs`
- `src/resources/extensions/browser-tools/index.ts` — wiring without `countOpenDialogs`
- `src/resources/extensions/browser-tools/tools/interaction.ts` — all 10 tools using consolidated capture with explicit signal classification
- `src/resources/extensions/browser-tools/tools/navigation.ts` — browser_navigate using consolidated capture
- `src/resources/extensions/browser-tools/tools/refs.ts` — ref action tools using consolidated capture with signal classification

@ -1,52 +0,0 @@
---
estimated_steps: 3
estimated_files: 2
---
# T02: Settle zero-mutation short-circuit and poll consolidation
**Slice:** S02 — Action pipeline performance
**Milestone:** M002
## Description
Save ~50ms on zero-mutation actions by short-circuiting the settle quiet window, and reduce per-poll evaluate overhead by combining `readMutationCounter` and `readFocusedDescriptor` into a single evaluate call.
Currently `settleAfterActionAdaptive` runs the full 100ms quiet window even when zero mutations have occurred. For actions like scroll, hover, or clicking static elements, this is wasted time. After 60ms with no mutation counter increment, the quiet window drops to 30ms.
Additionally, each poll iteration runs `readMutationCounter` (1 evaluate) and optionally `readFocusedDescriptor` (1 evaluate) sequentially. Combining them into one evaluate saves 1 round-trip per poll iteration (typically 2-4 polls per settle).
## Steps
1. **Add settle reason to type in state.ts**: Extend `AdaptiveSettleDetails.settleReason` union to include `"zero_mutation_shortcut"`.
2. **Create combined poll evaluate in settle.ts**: Replace separate `readMutationCounter` + `readFocusedDescriptor` calls in the poll loop with a single `readSettleState(target, checkFocus)` function that returns `{ mutationCount: number; focusDescriptor: string }` from one `target.evaluate()`. When `checkFocus` is false, return empty string for focusDescriptor. Keep the standalone `readMutationCounter` and `readFocusedDescriptor` exports for other consumers (interaction.ts imports `readFocusedDescriptor` directly for key_press before/after focus comparison).
3. **Implement zero-mutation short-circuit in settleAfterActionAdaptive**: Track `totalMutationsSeen` (sum of all mutation increments across polls). After 60ms, if `totalMutationsSeen === 0`, switch `quietWindowMs` to 30ms. When settle completes under this condition, return `settleReason: "zero_mutation_shortcut"`. The initial `ensureMutationCounter` + first `readMutationCounter` call before the loop should also be combined into the loop's first iteration where possible (use the combined evaluate).
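The short-circuit in step 3 can be sketched as a poll loop with an injected clock, which makes the timing easy to reason about. This is an illustrative sketch only: the reason names other than `"zero_mutation_shortcut"`, the 20ms poll interval, the 500ms timeout, and the `Clock` interface are assumptions, not values taken from settle.ts.

```typescript
// Reason names other than "zero_mutation_shortcut" are placeholders (assumption).
type SettleReason = "quiet_window" | "timeout" | "zero_mutation_shortcut";

interface Clock {
  now(): number;
  sleep(ms: number): Promise<void>;
}

interface SettleResult {
  settleReason: SettleReason;
  durationMs: number;
  polls: number;
}

async function settleSketch(
  readCount: () => Promise<number>,
  clock: Clock,
  // 100/60/30 come from the task text; timeoutMs and pollMs are assumed values.
  opts = { timeoutMs: 500, pollMs: 20, quietWindowMs: 100, shortcutAfterMs: 60, shortcutQuietMs: 30 },
): Promise<SettleResult> {
  const start = clock.now();
  let lastCount = await readCount();
  let lastChangeAt = start;
  let totalMutationsSeen = 0;
  let polls = 0;

  while (clock.now() - start < opts.timeoutMs) {
    await clock.sleep(opts.pollMs);
    polls++;
    const count = await readCount();
    const now = clock.now();
    if (count > lastCount) {
      totalMutationsSeen += count - lastCount;
      lastCount = count;
      lastChangeAt = now;
    }
    // Shrink the quiet window only after 60ms with no mutations at all.
    const shortcut = totalMutationsSeen === 0 && now - start >= opts.shortcutAfterMs;
    const quietWindowMs = shortcut ? opts.shortcutQuietMs : opts.quietWindowMs;
    if (now - lastChangeAt >= quietWindowMs) {
      const settleReason: SettleReason = shortcut ? "zero_mutation_shortcut" : "quiet_window";
      return { settleReason, durationMs: now - start, polls };
    }
  }
  return { settleReason: "timeout", durationMs: clock.now() - start, polls };
}
```

With these assumed intervals and a fake clock, a page that never mutates settles at 60ms via the shortcut, while a page that mutates once at the first poll still waits out the full 100ms quiet window.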
## Must-Haves
- [ ] `AdaptiveSettleDetails.settleReason` union includes `"zero_mutation_shortcut"`
- [ ] Combined poll evaluate reads mutation counter + focus descriptor in one `evaluate()` call
- [ ] Zero-mutation short-circuit: after 60ms with no mutations, quiet window reduces to 30ms
- [ ] Settle returns `"zero_mutation_shortcut"` reason when short-circuit path is taken
- [ ] Standalone `readMutationCounter` and `readFocusedDescriptor` exports preserved for external consumers
- [ ] `npm run build` succeeds
## Verification
- `npm run build` exits 0
- `grep "zero_mutation_shortcut" src/resources/extensions/browser-tools/settle.ts` finds the new reason
- `grep "zero_mutation_shortcut" src/resources/extensions/browser-tools/state.ts` finds it in the type union
- The poll loop body contains a single `evaluate()` call (not two sequential ones)
## Inputs
- `src/resources/extensions/browser-tools/settle.ts` — current `settleAfterActionAdaptive`, `readMutationCounter`, `readFocusedDescriptor`
- `src/resources/extensions/browser-tools/state.ts` — `AdaptiveSettleDetails` interface
- S02 Research — settle timing analysis and proposed thresholds
## Expected Output
- `src/resources/extensions/browser-tools/settle.ts` — combined poll evaluate, zero-mutation short-circuit, new settle reason
- `src/resources/extensions/browser-tools/state.ts` — updated `AdaptiveSettleDetails.settleReason` type

View file

@ -1,40 +0,0 @@
# S03: Screenshot pipeline
**Goal:** `constrainScreenshot` uses sharp instead of canvas; `browser_navigate` returns no screenshot by default.
**Demo:** Build passes, `constrainScreenshot` calls sharp for dimension check and resize (no `page.evaluate`), `browser_navigate` omits screenshot unless `screenshot: true` is passed.
## Must-Haves
- `constrainScreenshot` uses `sharp(buffer).metadata()` for dimensions and `sharp(buffer).resize().jpeg()/png().toBuffer()` for resizing — no `page.evaluate` call
- Images already within MAX_SCREENSHOT_DIM bounds are returned unchanged (no re-encoding)
- JPEG output uses the `quality` parameter; PNG output uses lossless `.png()` (no quality param)
- `constrainScreenshot` keeps its existing `(page, buffer, mimeType, quality)` signature for backward compatibility
- `browser_navigate` has a `screenshot` parameter (default: `false`) gating screenshot capture
- `browser_reload` screenshot behavior is unchanged
- `captureErrorScreenshot` works with the new `constrainScreenshot`
- sharp added to root `package.json` dependencies and extension `peerDependencies`
## Verification
- `node -e "require('sharp')"` — sharp is installed and loadable
- `npx tsc --noEmit` or equivalent build check passes
- Grep verification: `grep -c "page.evaluate" src/resources/extensions/browser-tools/capture.ts` returns 0
- Grep verification: `grep "screenshot.*boolean" src/resources/extensions/browser-tools/tools/navigation.ts` finds the parameter
- Grep verification: `grep "default.*false\|screenshot.*false" src/resources/extensions/browser-tools/tools/navigation.ts` confirms default is false
- Extension loads via jiti and all 43 tools register
## Tasks
- [x] **T01: Replace constrainScreenshot with sharp and make navigate screenshots opt-in** `est:30m`
- Why: Delivers both R020 (sharp-based resizing) and R021 (opt-in navigate screenshots) — the two requirements this slice owns
- Files: `package.json`, `src/resources/extensions/browser-tools/package.json`, `src/resources/extensions/browser-tools/capture.ts`, `src/resources/extensions/browser-tools/tools/navigation.ts`
- Do: (1) Add sharp to root `package.json` dependencies and extension `peerDependencies`, run install. (2) Rewrite `constrainScreenshot` internals: use `sharp(buffer).metadata()` for width/height, return buffer unchanged if within bounds, otherwise `sharp(buffer).resize(MAX, MAX, { fit: 'inside' }).jpeg({ quality }).toBuffer()` for JPEG or `.png().toBuffer()` for PNG. Keep the `page` parameter unused. (3) Add `screenshot?: boolean` parameter (default: false) to `browser_navigate`, gate the screenshot capture block on it. Update the tool description. (4) Verify build, grep checks, extension load.
- Verify: Build passes; `grep -c "page.evaluate" capture.ts` returns 0; extension loads with 43 tools; navigate tool schema includes `screenshot` boolean parameter
- Done when: sharp handles all screenshot resizing with no page dependency; navigate returns no screenshot by default
## Files Likely Touched
- `package.json`
- `src/resources/extensions/browser-tools/package.json`
- `src/resources/extensions/browser-tools/capture.ts`
- `src/resources/extensions/browser-tools/tools/navigation.ts`

View file

@ -1,61 +0,0 @@
---
estimated_steps: 4
estimated_files: 4
---
# T01: Replace constrainScreenshot with sharp and make navigate screenshots opt-in
**Slice:** S03 — Screenshot pipeline
**Milestone:** M002
## Description
Two contained changes delivering R020 and R021. Replace `constrainScreenshot`'s manual JPEG/PNG header parsing and canvas-based resizing with sharp's `metadata()` and `resize()` APIs. Add an opt-in `screenshot` boolean parameter to `browser_navigate` (default false) so screenshots are only captured when explicitly requested.
## Steps
1. Add `sharp` to root `package.json` dependencies and to `src/resources/extensions/browser-tools/package.json` peerDependencies. Run `npm install`.
2. Rewrite `constrainScreenshot` in `capture.ts`:
- Add `import sharp from "sharp"` at top
- Replace manual header parsing with `const { width, height } = await sharp(buffer).metadata()`
- Early-return original buffer if `width <= MAX_SCREENSHOT_DIM && height <= MAX_SCREENSHOT_DIM`
- For JPEG: `return Buffer.from(await sharp(buffer).resize(MAX_SCREENSHOT_DIM, MAX_SCREENSHOT_DIM, { fit: 'inside' }).jpeg({ quality }).toBuffer())`
- For PNG: `return Buffer.from(await sharp(buffer).resize(MAX_SCREENSHOT_DIM, MAX_SCREENSHOT_DIM, { fit: 'inside' }).png().toBuffer())`
- Keep `page: Page` as first parameter (unused) — signature stability per D008 constraints
3. In `navigation.ts`, modify `browser_navigate`:
- Add `screenshot: Type.Optional(Type.Boolean({ description: "Capture and return a screenshot (default: false)", default: false }))` to parameters
- Gate the `screenshotContent` block with `if (params.screenshot)`
- Update the tool description to mention screenshots are opt-in
4. Verify: build passes, grep checks confirm no `page.evaluate` in capture.ts, extension loads with 43 tools via jiti
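The decision logic in step 2 (early return when within bounds, otherwise an aspect-preserving resize with a format-specific encoder) can be sketched as a pure function, separate from the sharp calls themselves. This is a hedged illustration: `planConstrain` and `ResizePlan` are hypothetical names, the `maxDim` value is passed in because the real `MAX_SCREENSHOT_DIM` is not stated here, and sharp's own rounding for `fit: 'inside'` may differ by a pixel from `Math.round`.

```typescript
type ScreenshotMime = "image/jpeg" | "image/png";

interface ResizePlan {
  resize: boolean;
  width: number;
  height: number;
  // Encoder to apply after resizing; null means "return the original buffer untouched".
  encoder: { format: "jpeg"; quality: number } | { format: "png" } | null;
}

function planConstrain(
  width: number,
  height: number,
  mimeType: ScreenshotMime,
  quality: number,
  maxDim: number,
): ResizePlan {
  // Within bounds: no resize and no re-encoding.
  if (width <= maxDim && height <= maxDim) {
    return { resize: false, width, height, encoder: null };
  }
  // "Inside" fit: scale so the larger side lands on maxDim, keeping aspect ratio.
  const scale = Math.min(maxDim / width, maxDim / height);
  const encoder =
    mimeType === "image/jpeg"
      ? { format: "jpeg" as const, quality }
      : { format: "png" as const }; // PNG is lossless, so quality is ignored
  return {
    resize: true,
    width: Math.max(1, Math.round(width * scale)),
    height: Math.max(1, Math.round(height * scale)),
    encoder,
  };
}
```

For example, with an assumed `maxDim` of 2000, a 4000x2000 JPEG is planned down to 2000x1000 with the caller's quality, and an 800x600 PNG is left alone.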
## Must-Haves
- [ ] `constrainScreenshot` uses sharp — zero `page.evaluate` calls in capture.ts
- [ ] Images within bounds returned unchanged (no re-encoding)
- [ ] JPEG uses quality param; PNG uses lossless `.png()`
- [ ] `(page, buffer, mimeType, quality)` signature preserved
- [ ] `browser_navigate` screenshot parameter defaults to false
- [ ] `browser_reload` screenshot behavior unchanged
- [ ] Build passes and extension loads with 43 tools
## Verification
- `npm install` succeeds with sharp
- `grep -c "page.evaluate" src/resources/extensions/browser-tools/capture.ts` returns 0
- `grep "screenshot.*Type.Boolean\|screenshot.*boolean" src/resources/extensions/browser-tools/tools/navigation.ts` finds the parameter
- Build/typecheck passes
- Extension loads via jiti: 43 tools registered
## Inputs
- `src/resources/extensions/browser-tools/capture.ts` — current `constrainScreenshot` with manual header parsing and canvas resizing (lines 126-182)
- `src/resources/extensions/browser-tools/tools/navigation.ts` — current `browser_navigate` with always-on screenshot (lines 56-61)
- `src/resources/extensions/browser-tools/state.ts` — ToolDeps interface with `constrainScreenshot` signature (line ~342)
- S01 summary — module structure, import patterns, ToolDeps contract
## Expected Output
- `package.json` — sharp added to dependencies
- `src/resources/extensions/browser-tools/package.json` — sharp added to peerDependencies
- `src/resources/extensions/browser-tools/capture.ts` — `constrainScreenshot` rewritten with sharp, zero `page.evaluate` calls
- `src/resources/extensions/browser-tools/tools/navigation.ts` — `browser_navigate` has `screenshot` parameter (default false), gated screenshot block, updated description

View file

@ -1,58 +0,0 @@
# S04: Form Intelligence
**Goal:** Two new browser tools — `browser_analyze_form` and `browser_fill_form` — that collapse multi-call form workflows into single tool calls.
**Demo:** Run `browser_analyze_form` against a multi-field HTML form and get a complete field inventory. Run `browser_fill_form` with a values mapping and see fields filled correctly with validation feedback.
## Must-Haves
- `browser_analyze_form` returns field inventory: labels, names, types, required, values, validation state, submit buttons
- Label resolution handles: `aria-labelledby`, `aria-label`, `<label for>`, wrapping `<label>`, `placeholder`, `title`, inferred from `name`
- `browser_fill_form` maps values by label, name, placeholder, aria-label — exact match first, then substring
- Fill uses Playwright APIs (`fill()`, `selectOption()`, `setChecked()`) not `page.evaluate()` value setting
- Fill reports: matched fields, unmatched keys, skipped fields (file inputs, hidden, custom dropdowns), validation state after fill
- Optional submit flag on `browser_fill_form`
- Ambiguous matches reported rather than wrong-field fills
- Auto-detect form if no selector provided (single form → use it, multiple → most visible inputs, none → body)
- Hidden fields included in analysis but flagged as not user-fillable
- Fieldset/legend grouping captured as context metadata
- Both tools registered and functional — build passes
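The auto-detect rule in the list above (single form, else the form with the most visible inputs, else body) is small enough to sketch directly. This is an illustration under assumptions: `FormCandidate` and `pickFormScope` are hypothetical names, and the tie-break (first form in document order wins) is a choice made here, not something the plan specifies.

```typescript
interface FormCandidate {
  selector: string;
  visibleInputCount: number;
}

// Picks the scope to analyze when the caller did not pass a selector.
function pickFormScope(forms: FormCandidate[]): string {
  const [first, ...rest] = forms;
  if (!first) return "body"; // no <form> on the page: fall back to the whole document body
  let best = first; // a single form falls through and is returned as-is
  for (const f of rest) {
    // Strict ">" keeps the earliest form on ties (assumed tie-break).
    if (f.visibleInputCount > best.visibleInputCount) best = f;
  }
  return best.selector;
}
```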
## Proof Level
- This slice proves: integration (tools work against real HTML forms in a running browser)
- Real runtime required: yes (Playwright browser for verification)
- Human/UAT required: no (automated verification against test page sufficient; UAT deferred to S06)
## Verification
- `cd pkg && npm run build` — build succeeds with new tools
- Standalone jiti verification script that loads the extension and confirms tool count is 45 (43 existing + 2 new)
- Browser verification: serve a test HTML form, run `browser_analyze_form`, assert field inventory matches expected structure
- Browser verification: run `browser_fill_form` with values mapping, assert fields are filled correctly
## Integration Closure
- Upstream surfaces consumed: `state.ts` (ToolDeps), `lifecycle.ts` (ensureBrowser, getActiveTarget), `settle.ts` (settleAfterActionAdaptive), `utils.ts` (beginTrackedAction, finishTrackedAction, formatCompactStateSummary), `capture.ts` (captureCompactPageState, captureErrorScreenshot)
- New wiring introduced in this slice: `import { registerFormTools } from "./tools/forms.js"` + `registerFormTools(pi, deps)` in index.ts
- What remains before the milestone is truly usable end-to-end: S05 (intent-ranked retrieval, semantic actions), S06 (test coverage)
## Tasks
- [x] **T01: Implement browser_analyze_form with full label resolution** `est:45m`
- Why: R022 — the form analysis tool is the foundation. Its evaluate function implements label resolution heuristics that drive both analysis output and inform the fill tool's matching strategy.
- Files: `src/resources/extensions/browser-tools/tools/forms.ts`, `src/resources/extensions/browser-tools/index.ts`
- Do: Create `forms.ts` with `registerFormTools(pi, deps)`. Implement `browser_analyze_form` with a single `page.evaluate()` that inventories all form fields — full label resolution (aria-labelledby, aria-label, label-for, wrapping label, placeholder, title, name), type/required/value/validation extraction, fieldset/legend grouping, submit button detection. Auto-detect form if no selector given. Follow interaction.ts patterns for beginTrackedAction/finishTrackedAction and error handling. Wire into index.ts.
- Verify: `npm run build` passes, jiti load confirms 44 tools registered (43 existing + `browser_analyze_form`)
- Done when: `browser_analyze_form` is registered, build succeeds, tool count is 44
- [x] **T02: Implement browser_fill_form and verify both tools against a real form** `est:45m`
- Why: R023 — the fill tool completes the form intelligence pair. End-to-end verification against a real form proves both tools work and retires the key risk (label association).
- Files: `src/resources/extensions/browser-tools/tools/forms.ts`
- Do: Add `browser_fill_form` to `forms.ts`. Matching logic: try exact label match via `getByLabel()`, then `[name=]`, then `[placeholder=]`, then `[aria-label=]`. Use `fill()` for text inputs, `selectOption()` for selects, `setChecked()` for checkboxes/radios. Handle ambiguity (report, don't guess). Skip file inputs and hidden fields. Settle after fills. Optional submit flag. Return matched/unmatched/skipped summary with post-fill validation state. Verify both tools against a served multi-field test HTML form.
- Verify: Build passes, serve test HTML form, run `browser_analyze_form` → verify field inventory, run `browser_fill_form` → verify fields filled and validation state returned
- Done when: Both tools work against a real multi-field form — analyze returns correct field inventory, fill maps values correctly and reports results
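The choice between `fill()`, `selectOption()`, and `setChecked()` in T02 can be sketched as a pure planning step that runs before any Playwright call. This is a hedged sketch: `FillAction` and `planFill` are hypothetical names, the text-like type list and the `'true'`/`'on'` rule follow the T02 plan, and skipping any other input type is an assumption made here.

```typescript
type FillAction =
  | { kind: "fill"; value: string }
  | { kind: "selectOption"; value: string }
  | { kind: "setChecked"; checked: boolean }
  | { kind: "skip"; reason: string };

const TEXT_LIKE = new Set(["text", "email", "password", "url", "tel", "search", "number"]);

// Decides which Playwright API a matched field should be driven with.
function planFill(tagName: string, inputType: string | undefined, value: string): FillAction {
  const tag = tagName.toLowerCase();
  const type = (inputType ?? "text").toLowerCase();
  if (tag === "select") return { kind: "selectOption", value };
  if (tag === "textarea") return { kind: "fill", value };
  if (tag !== "input") return { kind: "skip", reason: `unsupported element <${tag}>` };
  if (type === "file") return { kind: "skip", reason: "file input" };
  if (type === "hidden") return { kind: "skip", reason: "hidden input" };
  if (type === "checkbox" || type === "radio") {
    return { kind: "setChecked", checked: value === "true" || value === "on" };
  }
  // Assumption: anything outside the listed text-like types is skipped rather than guessed.
  return TEXT_LIKE.has(type)
    ? { kind: "fill", value }
    : { kind: "skip", reason: `unsupported input type "${type}"` };
}
```

Keeping this step free of browser calls means every skip reason can be reported in the `skipped` list without risking a tool-level failure.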
## Files Likely Touched
- `src/resources/extensions/browser-tools/tools/forms.ts` (new)
- `src/resources/extensions/browser-tools/index.ts`

View file

@ -1,67 +0,0 @@
---
estimated_steps: 5
estimated_files: 3
---
# T01: Implement browser_analyze_form with full label resolution
**Slice:** S04 — Form Intelligence
**Milestone:** M002
## Description
Create `tools/forms.ts` with the `registerFormTools` function and implement `browser_analyze_form`. The tool takes an optional form selector, auto-detects the form if not provided, and returns a structured field inventory via a single `page.evaluate()` call. The evaluate function implements the full label resolution priority chain (aria-labelledby → aria-label → label-for → wrapping label → placeholder → title → name inference). Wire the new file into `index.ts`.
## Steps
1. Create `src/resources/extensions/browser-tools/tools/forms.ts` with the `registerFormTools(pi, deps)` export. Define the `browser_analyze_form` tool schema using Typebox — parameters: `selector` (optional string for form CSS selector). Return type is a structured field inventory.
2. Implement the `browser_analyze_form` execute function following the interaction.ts pattern: `ensureBrowser()` → `getActiveTarget()` → `captureCompactPageState()` (for before-state) → `beginTrackedAction()` → `page.evaluate()` → `finishTrackedAction()`. Error path uses `captureErrorScreenshot()`.
3. Implement the `page.evaluate()` callback with:
- Form auto-detection: if no selector, find the single `<form>` or the form with most visible inputs, or fall back to `document.body`
- Field inventory: iterate all `<input>`, `<select>`, `<textarea>` within the form scope
- Label resolution (priority order): `aria-labelledby` → `aria-label` → `<label for="id">` → wrapping `<label>` → `placeholder` → `title` → humanized `name`
- For each field: extract `type`, `name`, `id`, `label` (resolved), `required`, `value`, `checked` (for checkboxes/radios), `options` (for selects), `validation` (ValidityState + validationMessage), `hidden` flag, `disabled` flag
- Fieldset/legend: walk up from each field to capture `<fieldset>` `<legend>` text as `group`
- Submit buttons: find `<button type="submit">`, `<input type="submit">`, and `<button>` without explicit type within the form
- Return: `{ formSelector, fields: [...], submitButtons: [...], fieldCount, visibleFieldCount }`
4. Wire into `index.ts`: import `registerFormTools` from `./tools/forms.js` and add `registerFormTools(pi, deps)` call alongside the other register calls.
5. Build and verify: run `npm run build`, then run a jiti verification script confirming 44 tools are registered (43 existing + `browser_analyze_form`; `browser_fill_form` is added in T02).
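The label priority chain from step 3 can be sketched over a plain snapshot object rather than live DOM nodes, which keeps the ordering easy to check. This is an illustrative sketch: `FieldSnapshot`, `resolveLabel`, and `humanizeName` are hypothetical names, and the exact humanizing rules (splitting camelCase, underscores, hyphens, brackets, and dots) are an assumption about what "humanized `name`" means.

```typescript
// Text already extracted for each association method (hypothetical shape).
interface FieldSnapshot {
  ariaLabelledByText?: string;
  ariaLabel?: string;
  labelForText?: string;
  wrappingLabelText?: string;
  placeholder?: string;
  title?: string;
  name?: string;
}

type LabelSource =
  | "aria-labelledby" | "aria-label" | "label-for" | "wrapping-label"
  | "placeholder" | "title" | "name" | "none";

// Turns "first_name" or "billingAddress" into a readable fallback label.
function humanizeName(name: string): string {
  const words = name
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    .replace(/[_\-\[\].]+/g, " ")
    .trim()
    .toLowerCase();
  return words.charAt(0).toUpperCase() + words.slice(1);
}

function resolveLabel(f: FieldSnapshot): { label: string; source: LabelSource } {
  const chain: Array<[LabelSource, string | undefined]> = [
    ["aria-labelledby", f.ariaLabelledByText],
    ["aria-label", f.ariaLabel],
    ["label-for", f.labelForText],
    ["wrapping-label", f.wrappingLabelText],
    ["placeholder", f.placeholder],
    ["title", f.title],
    ["name", f.name ? humanizeName(f.name) : undefined],
  ];
  for (const [source, raw] of chain) {
    // Whitespace-only values count as missing so the chain keeps falling through.
    const label = raw?.replace(/\s+/g, " ").trim();
    if (label) return { label, source };
  }
  return { label: "", source: "none" };
}
```

Returning the `source` alongside the label is optional, but it makes it visible in the analysis output which association method actually produced each label.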
## Must-Haves
- [ ] `browser_analyze_form` registered as a tool with optional `selector` parameter
- [ ] Single `page.evaluate()` call collects entire field inventory
- [ ] Label resolution handles all 7 priority levels: aria-labelledby, aria-label, label-for, wrapping label, placeholder, title, name
- [ ] Fields include: type, name, id, label, required, value, validation state, hidden flag, disabled flag
- [ ] Select elements include their options
- [ ] Checkbox/radio elements include checked state
- [ ] Fieldset/legend captured as group context
- [ ] Submit buttons detected and listed
- [ ] Form auto-detection when no selector provided
- [ ] Hidden fields included but flagged
- [ ] Follows beginTrackedAction/finishTrackedAction pattern with error handling
- [ ] Wired into index.ts
- [ ] Build passes
## Verification
- `cd pkg && npm run build` completes without errors
- jiti script loads extension, counts registered tools — expect 44 (43 + browser_analyze_form)
- `grep -c "registerFormTools" src/resources/extensions/browser-tools/index.ts` returns 2 (import + call)
## Inputs
- `src/resources/extensions/browser-tools/tools/interaction.ts` — pattern for tool registration, beginTrackedAction/finishTrackedAction, error handling
- `src/resources/extensions/browser-tools/index.ts` — wiring pattern for new tool groups
- `src/resources/extensions/browser-tools/state.ts` — ToolDeps interface (no modifications needed)
- `src/resources/extensions/browser-tools/evaluate-helpers.ts` — window.__pi utilities (accessibleName available but doesn't handle label-for; form evaluate must implement its own label resolution)
## Expected Output
- `src/resources/extensions/browser-tools/tools/forms.ts` — new file with `registerFormTools()` containing `browser_analyze_form` implementation
- `src/resources/extensions/browser-tools/index.ts` — modified with import and registration of form tools
- Build succeeds with 44 registered tools

View file

@ -1,78 +0,0 @@
---
estimated_steps: 5
estimated_files: 2
---
# T02: Implement browser_fill_form and verify both tools against a real form
**Slice:** S04 — Form Intelligence
**Milestone:** M002
## Description
Add `browser_fill_form` to `tools/forms.ts` and verify both form tools end-to-end against a served multi-field HTML test form. The fill tool takes a values mapping, resolves fields by label/name/placeholder/aria-label, fills using Playwright APIs, and returns a detailed result with matched/unmatched/skipped fields and post-fill validation state. End-to-end verification retires the key risk: label association works on real HTML forms.
## Steps
1. Implement `browser_fill_form` in `tools/forms.ts`. Schema: `selector` (optional form selector), `values` (Record<string, string> — keys are field identifiers, values are field values), `submit` (optional boolean). Follow interaction.ts patterns for tracked actions and error handling.
2. Implement the field matching and filling logic:
- For each key in the values mapping, resolve the target field in priority order:
1. Exact label match via Playwright `getByLabel(key, { exact: true })` scoped to the form
2. Case-insensitive substring label match via `getByLabel(key)` (Playwright's default non-exact matching) scoped to the form
3. `locator('[name="key"]')` scoped to form
4. `locator('[placeholder="key" i]')` scoped to form
5. `locator('[aria-label="key" i]')` scoped to form
- If multiple matches found for a key, report ambiguity — don't fill
- If no match found, add to unmatched list
- For matched fields: use `locator.fill()` for text/email/password/url/tel/search/number inputs and textareas, `locator.selectOption()` for selects, `locator.setChecked(value === 'true' || value === 'on')` for checkboxes/radios
- Skip file inputs and hidden inputs — add to skipped list with reason
- Catch Playwright errors per-field (e.g. `fill()` on non-fillable) and add to skipped with error message
3. After all fills: run `settleAfterActionAdaptive()`, then collect post-fill validation state via `page.evaluate()` on the form fields. If `submit` flag is set, find and click the form's submit button (first `[type=submit]` or first `<button>` in form). Return structured result: `{ matched: [...], unmatched: [...], skipped: [...], submitted: boolean, validationSummary: {...} }`.
4. Create a test HTML file with diverse field types: text inputs with labels (for, wrapping, aria-label), selects, checkboxes, radios, textareas, fieldsets, required fields, hidden inputs, a file input, a submit button. Serve it via a local HTTP server.
5. Verify end-to-end: navigate to the test form, run `browser_analyze_form` and verify the field inventory matches expected structure (correct labels, types, required flags). Run `browser_fill_form` with a values mapping and verify fields are filled (check via `page.evaluate` reading field values), unmatched keys are reported, and file/hidden inputs are skipped.
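The matching order and the "report ambiguity, don't fill" rule from step 2 can be modeled without a browser. The following is a sketch under assumptions: `FillableField`, `MatchResult`, and `matchField` are hypothetical names, fields are plain objects rather than Playwright locators, and the loose label strategy imitates a case-insensitive substring match.

```typescript
interface FillableField {
  id: string; // stable handle used only for reporting
  label?: string;
  name?: string;
  placeholder?: string;
  ariaLabel?: string;
}

type MatchResult =
  | { status: "matched"; fieldId: string; via: string }
  | { status: "ambiguous"; candidates: string[]; via: string }
  | { status: "unmatched" };

function matchField(key: string, fields: FillableField[]): MatchResult {
  const lower = key.trim().toLowerCase();
  if (lower === "") return { status: "unmatched" };

  // Strategies are tried in priority order; the first one with any hit decides.
  const strategies: Array<[string, (f: FillableField) => boolean]> = [
    ["label-exact", (f) => f.label === key],
    ["label-loose", (f) => (f.label ?? "").toLowerCase().includes(lower)],
    ["name", (f) => f.name === key],
    ["placeholder", (f) => (f.placeholder ?? "").toLowerCase() === lower],
    ["aria-label", (f) => (f.ariaLabel ?? "").toLowerCase() === lower],
  ];

  for (const [via, test] of strategies) {
    const hits = fields.filter(test);
    const [only] = hits;
    if (hits.length === 1 && only) return { status: "matched", fieldId: only.id, via };
    // More than one hit: report rather than guess which field was meant.
    if (hits.length > 1) return { status: "ambiguous", candidates: hits.map((h) => h.id), via };
  }
  return { status: "unmatched" };
}
```

One design consequence worth noting: an ambiguous hit at a higher-priority strategy stops the search, so a key like "email" is reported as ambiguous between "Email" and "Email address" instead of silently falling through to a `name` match.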
## Must-Haves
- [ ] `browser_fill_form` registered with `selector`, `values`, and `submit` parameters
- [ ] Matching resolves fields by label → name → placeholder → aria-label, exact first then case-insensitive
- [ ] Uses Playwright `fill()` for text-like inputs, `selectOption()` for selects, `setChecked()` for checkboxes/radios
- [ ] Ambiguous matches reported, not guessed
- [ ] File inputs and hidden inputs skipped with reason
- [ ] Per-field errors caught and reported (not tool-level crash)
- [ ] Post-fill validation state collected
- [ ] Optional submit clicks the submit button
- [ ] Settle after fills
- [ ] Both tools verified against a real multi-field HTML form
- [ ] Build passes, tool count is 45
## Verification
- `cd pkg && npm run build` completes without errors
- jiti script loads extension, counts registered tools — expect 45
- Serve test HTML form, navigate browser, run `browser_analyze_form`:
- Returns fields with correct labels resolved from various association methods
- Hidden fields flagged
- Select options listed
- Submit buttons detected
- Run `browser_fill_form` with values mapping:
- Text fields filled (verified by reading values back)
- Select option changed
- Checkbox checked
- File input skipped
- Hidden input skipped
- Unmatched keys reported
- Validation state returned
## Inputs
- `src/resources/extensions/browser-tools/tools/forms.ts` — T01's output with `browser_analyze_form` implemented
- `src/resources/extensions/browser-tools/tools/interaction.ts` — pattern reference for settle, tracked actions
## Expected Output
- `src/resources/extensions/browser-tools/tools/forms.ts` — updated with `browser_fill_form` implementation
- End-to-end verification passing: both tools work against a real multi-field form with diverse field types and label association patterns

View file

@ -1,52 +0,0 @@
# S05: Intent-ranked retrieval and semantic actions
**Goal:** `browser_find_best` returns scored candidates for semantic intents; `browser_act` resolves the top candidate and executes it in one call.
**Demo:** Run `browser_find_best` with intent "submit_form" against a real page with a form and get ranked candidates. Run `browser_act` with intent "close_dialog" against a page with a modal and see it dismissed.
## Must-Haves
- `browser_find_best` registered and functional with 8 intents: submit_form, close_dialog, primary_cta, search_field, next_step, dismiss, auth_action, back_navigation
- Each intent uses deterministic heuristic scoring (no LLM calls) with 2+ scoring dimensions per intent
- Candidates include CSS selectors usable with Playwright locator APIs
- Results capped at 5 candidates, scored 0-1 with human-readable reasons
- Intent strings normalized (accept underscores, spaces, mixed case)
- `browser_act` resolves top candidate, executes via Playwright locator click (not evaluate click), settles, returns before/after diff
- `browser_act` returns error (not throw) when zero candidates found
- Both tools wired into index.ts, tool count = 47
- Build passes
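Intent normalization, as required in the list above, amounts to stripping separators and case before comparing against the eight known intents. A minimal sketch, assuming hypothetical names `INTENTS` and `normalizeIntent`, and assuming unknown input should yield `null` rather than throw:

```typescript
const INTENTS = [
  "submit_form", "close_dialog", "primary_cta", "search_field",
  "next_step", "dismiss", "auth_action", "back_navigation",
] as const;

type Intent = (typeof INTENTS)[number];

// Accepts "Submit Form", "submit-form", "SUBMIT_FORM", etc.
function normalizeIntent(raw: string): Intent | null {
  const key = raw.toLowerCase().replace(/[\s_\-]+/g, "");
  return INTENTS.find((intent) => intent.replace(/_/g, "") === key) ?? null;
}
```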
## Proof Level
- This slice proves: integration (new tools against real browser pages)
- Real runtime required: yes (Playwright against real pages)
- Human/UAT required: no (automated verification sufficient)
## Verification
- `npm run build` passes
- `grep -c "pi.registerTool" src/resources/extensions/browser-tools/tools/*.ts` prints per-file counts that sum to 47
- `browser_find_best` with intent "submit_form" against a page with a `<form>` returns candidates with scores > 0
- `browser_find_best` with intent "close_dialog" against a page with a `[role="dialog"]` returns close button candidates
- `browser_act` with intent "submit_form" clicks the submit button and returns before/after state
- `browser_act` against a page with no dialog returns a graceful error (not throw) for "close_dialog" intent
- Scoring heuristics produce differentiated rankings (top candidate scores higher than others)
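The "graceful error (not throw)" behavior checked above can be sketched with the browser operations injected as plain functions. This is an assumption-laden illustration: `ActDeps`, `ActResult`, and `actSketch` are hypothetical names, candidates are assumed to arrive sorted best-first, and the real tool would additionally settle, diff state, and capture an error screenshot.

```typescript
interface Candidate {
  selector: string;
  score: number;
}

type ActResult =
  | { ok: true; selector: string; action: "click" | "focus" }
  | { ok: false; error: string };

// Browser operations are injected so the control flow can be exercised without Playwright.
interface ActDeps {
  findCandidates(intent: string): Promise<Candidate[]>;
  click(selector: string): Promise<void>;
  focus(selector: string): Promise<void>;
}

async function actSketch(intent: string, deps: ActDeps): Promise<ActResult> {
  const [top] = await deps.findCandidates(intent); // assumed sorted best-first
  if (!top) {
    // Zero candidates is an expected outcome, so it is returned, not thrown.
    return { ok: false, error: `No candidates found for intent "${intent}"` };
  }
  const action = intent === "search_field" ? "focus" : "click";
  try {
    if (action === "focus") await deps.focus(top.selector);
    else await deps.click(top.selector);
    return { ok: true, selector: top.selector, action };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```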
## Integration Closure
- Upstream surfaces consumed: `evaluate-helpers.ts` (window.__pi utilities), `lifecycle.ts` (ensureBrowser, getActiveTarget), `state.ts` (ToolDeps, CompactPageState), `utils.ts` (action tracking, formatting), `core.js` (diffCompactStates), `settle.ts` (settleAfterActionAdaptive)
- New wiring introduced: `tools/intent.ts` + import/call in `index.ts`
- What remains before the milestone is truly usable end-to-end: S06 (test coverage)
## Tasks
- [x] **T01: Implement browser_find_best and browser_act with 8-intent scoring engine** `est:45m`
- Why: This is the entire slice — two tools sharing a single intent resolution engine, all in one file following the established forms.ts pattern. The scoring evaluate script, both tool registrations, and the index.ts wiring are tightly coupled and well within a single context window (~350 lines new code, 2 files created/modified).
- Files: `src/resources/extensions/browser-tools/tools/intent.ts` (new), `src/resources/extensions/browser-tools/index.ts` (wire)
- Do: Build `buildIntentScoringScript(intent, scope?)` as a string template evaluate returning scored candidates with cssPath selectors. Implement 8 intent scoring functions using window.__pi utilities (inferRole, accessibleName, isVisible, isEnabled, isInteractiveEl). Register `browser_find_best` (intent + optional scope → scored candidates) and `browser_act` (intent + optional scope → resolve top candidate → Playwright locator click → settle → diff). Wire via registerIntentTools import + call in index.ts.
- Verify: `npm run build` passes; grep tool count = 47; run both tools against real test pages via Playwright scripts
- Done when: Both tools registered, build passes, verified against real pages with forms and dialogs
## Files Likely Touched
- `src/resources/extensions/browser-tools/tools/intent.ts` (new)
- `src/resources/extensions/browser-tools/index.ts` (wire registration)

View file

@ -1,85 +0,0 @@
---
estimated_steps: 5
estimated_files: 2
---
# T01: Implement browser_find_best and browser_act with 8-intent scoring engine
**Slice:** S05 — Intent-ranked retrieval and semantic actions
**Milestone:** M002
## Description
Create `tools/intent.ts` with both `browser_find_best` and `browser_act`, sharing a single intent resolution engine built as a string template evaluate script (same pattern as forms.ts `buildFormAnalysisScript`). The scoring engine runs entirely in-browser via `page.evaluate()`, using `window.__pi` utilities for element metadata. Each of 8 intents has a candidate selector strategy and multi-dimensional scoring function. `browser_act` takes the top candidate from the same scoring logic, executes via Playwright `locator().click()` (D021), settles, and returns a before/after diff.
## Steps
1. **Create `tools/intent.ts`** with the `registerIntentTools(pi, deps)` export function. Define the 8 intent names as a const array and use `StringEnum` for the parameter schema. Build `buildIntentScoringScript(intent, scope?)` as a string template that:
- Normalizes the intent string (lowercase, strip spaces/underscores/hyphens)
- For each intent, selects candidate elements (e.g., submit_form → buttons/inputs inside or near forms; close_dialog → buttons inside `[role="dialog"]` or `dialog` elements)
- Scores each candidate 0-1 across 2-4 dimensions (structural position, role, text signals, visibility/enabled state)
- Returns top 5 candidates sorted by score, each with: `{ score, selector, tag, role, name, text, reason }`
- Uses `window.__pi.cssPath()` for selector generation, `window.__pi.inferRole()` / `window.__pi.accessibleName()` / `window.__pi.isVisible()` / `window.__pi.isEnabled()` for scoring signals
2. **Implement the 8 intent scoring functions** inside the evaluate string template:
- `submitform` — query `button[type="submit"], input[type="submit"], button:not([type])` within forms; score by: is-submit-type, inside-form, text-suggests-submission, visible+enabled
- `closedialog` — query buttons/links inside `[role="dialog"], dialog, [aria-modal="true"]`; score by: text-matches-close-pattern, has-aria-label-close, is-visible, position (top-right gets a boost)
- `primarycta` — query all visible enabled buttons/links; score by: visual prominence (size), semantic weight (role=button > link), text-not-cancel/dismiss, position (main content area)
- `searchfield` — query inputs with type=search or role=searchbox or name/placeholder matching "search"; score by: type-match, placeholder-match, visibility, is-in-header/nav
- `nextstep` — query buttons/links with text matching next/continue/proceed/forward patterns; score by: text-match-strength, is-button, visible+enabled, not-disabled
- `dismiss` — query buttons/links matching close/cancel/dismiss/skip/no-thanks patterns; score by: text-match, position, inside-dialog/modal/overlay, is-visible
- `authaction` — query buttons/links matching login/sign-in/signup/register patterns; score by: text-match-strength, is-button-or-link, prominent-position, visible
- `backnavigation` — query buttons/links matching back/previous/return patterns; score by: text-match, has-back-arrow/icon, is-in-nav/header, visible
3. **Register `browser_find_best`** tool:
- Parameters: `intent` (StringEnum of 8 intents), optional `scope` (CSS selector to narrow search)
- Execute: ensureBrowser → getActiveTarget → captureCompactPageState (before) → target.evaluate(buildIntentScoringScript) → format results as markdown with scores and selectors → tracked action finish
- Output format: numbered candidates with score, selector, role, text, and reason
4. **Register `browser_act`** tool:
- Parameters: `intent` (same StringEnum), optional `scope` (CSS selector)
- Execute: ensureBrowser → captureCompactPageState (before) → target.evaluate(buildIntentScoringScript) → if zero candidates, return error → take top candidate → locator(candidate.selector).click() with getByRole fallback → settleAfterActionAdaptive → captureCompactPageState (after) → diffCompactStates → format result with before/after diff
- For search_field intent: focus instead of click
- Error handling: graceful error return when no candidates found, captureErrorScreenshot on unexpected failures
5. **Wire into index.ts**: Add `import { registerIntentTools } from "./tools/intent.js"` and `registerIntentTools(pi, deps)` call. Verify build passes and tool count = 47.
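To make the scoring shape in steps 1 and 2 concrete, here is a sketch of one intent (`submit_form`) scored across four dimensions, then sorted and capped at five. Everything numeric here is hypothetical: the weights, the text pattern, and the names `CandidateSignals`, `scoreSubmitForm`, and `rankCandidates` are illustrations rather than the actual engine, which runs inside the page as a string template and reads its signals from `window.__pi`.

```typescript
// Signals that would be gathered in the page (hypothetical shape).
interface CandidateSignals {
  selector: string;
  text: string;
  isSubmitType: boolean;
  insideForm: boolean;
  visible: boolean;
  enabled: boolean;
}

interface ScoredCandidate {
  score: number; // 0 to 1
  selector: string;
  reason: string; // human-readable list of dimensions that fired
}

// Hypothetical text pattern for "suggests submission".
const SUBMIT_TEXT = /\b(submit|send|save|continue|sign up|log in)\b/i;

function scoreSubmitForm(c: CandidateSignals): ScoredCandidate {
  // Integer weights summing to 100 avoid floating-point drift; values are illustrative.
  const dimensions = [
    { name: "type=submit", weight: 40, hit: c.isSubmitType },
    { name: "inside form", weight: 20, hit: c.insideForm },
    { name: "submission text", weight: 20, hit: SUBMIT_TEXT.test(c.text) },
    { name: "visible and enabled", weight: 20, hit: c.visible && c.enabled },
  ];
  const fired = dimensions.filter((d) => d.hit);
  const score = fired.reduce((sum, d) => sum + d.weight, 0) / 100;
  return { score, selector: c.selector, reason: fired.map((d) => d.name).join(", ") || "no signals" };
}

function rankCandidates(candidates: CandidateSignals[], limit = 5): ScoredCandidate[] {
  return candidates
    .map(scoreSubmitForm)
    .filter((c) => c.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

Because each dimension contributes independently, a generic in-form button (0.4 with these weights) ranks clearly below a true submit button with matching text (1.0), which is the differentiated ranking the plan asks for.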
## Must-Haves
- [ ] `browser_find_best` registered with 8-intent StringEnum parameter
- [ ] `browser_act` registered with same 8-intent parameter
- [ ] Intent scoring runs as a single page.evaluate() string template per call
- [ ] Each intent has 2+ orthogonal scoring dimensions producing differentiated rankings
- [ ] Scoring uses `window.__pi.*` utilities (no inline redeclarations)
- [ ] Candidates include CSS selectors from `window.__pi.cssPath()`
- [ ] Results capped at 5 candidates, scored 0-1
- [ ] Intent string normalization handles underscores, spaces, mixed case
- [ ] `browser_act` clicks via `target.locator(selector).click()` not `page.evaluate(() => el.click())`
- [ ] `browser_act` returns error (not throw) when zero candidates
- [ ] Both tools use tracked action pattern (beginTrackedAction / finishTrackedAction)
- [ ] Tool count = 47 after wiring
- [ ] `npm run build` passes
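
The intent normalization must-have could be sketched as below. This is only an assumed shape: `normalizeIntent`, `resolveIntent`, and `KNOWN_INTENTS` are hypothetical names, and the key set is a guess based on the normalized spellings (`dismiss`, `authaction`, `backnavigation`) used in the plan.

```javascript
// Hypothetical sketch of intent normalization; helper names are assumptions.

// Normalized intent keys. The first three appear in this plan in that form; the
// other four are assumed normalized forms of intents named in the plan.
// The eighth intent is not visible in this excerpt, so it is left out.
const KNOWN_INTENTS = new Set([
  "dismiss",
  "authaction",
  "backnavigation",
  "submitform",
  "closedialog",
  "searchfield",
  "primarycta",
]);

// Lowercase the input and strip underscores and whitespace.
function normalizeIntent(raw) {
  return String(raw).toLowerCase().replace(/[\s_]+/g, "");
}

// Return the normalized key, or null when the intent is not recognized.
function resolveIntent(raw) {
  const key = normalizeIntent(raw);
  return KNOWN_INTENTS.has(key) ? key : null;
}

console.log(resolveIntent("Close Dialog"), resolveIntent("SUBMIT_FORM"), resolveIntent("nope"));
```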
## Verification
- `npm run build` passes with zero errors
- `grep -c "pi.registerTool" src/resources/extensions/browser-tools/tools/*.ts | awk -F: '{s+=$2} END {print s}'` outputs 47
- Playwright verification script against a test HTML page with form + dialog:
- `browser_find_best` intent="submit_form" returns candidates with submit button scored highest
- `browser_find_best` intent="close_dialog" returns close/dismiss button inside dialog
- `browser_act` intent="submit_form" clicks the submit button
- `browser_act` intent="close_dialog" with no dialog on page returns error, not crash
## Inputs
- `src/resources/extensions/browser-tools/tools/forms.ts` — pattern for string template evaluates, tool registration, error handling
- `src/resources/extensions/browser-tools/tools/interaction.ts` — pattern for Playwright locator click with getByRole fallback
- `src/resources/extensions/browser-tools/evaluate-helpers.ts` — window.__pi API surface (9 functions)
- `src/resources/extensions/browser-tools/index.ts` — wiring pattern (import + ToolDeps + registerXTools call)
- `src/resources/extensions/browser-tools/state.ts` — ToolDeps interface, CompactPageState type
- S05-RESEARCH.md — intent list, scoring guidance, common pitfalls
## Expected Output
- `src/resources/extensions/browser-tools/tools/intent.ts` — new file with ~350-400 lines containing `registerIntentTools(pi, deps)`, `buildIntentScoringScript()`, and both tool registrations
- `src/resources/extensions/browser-tools/index.ts` — modified with 1 new import line + 1 new registration call


@ -1,43 +0,0 @@
# S06: Test coverage
**Goal:** Test suite covers shared browser-side utilities, settle logic, screenshot resizing, form analysis heuristics, intent scoring, and semantic action resolution.
**Demo:** `npm run test:browser-tools` passes — unit tests via jiti and integration tests via Playwright both green.
## Must-Haves
- Unit tests for pure Node-side functions: parseRef, formatVersionedRef, staleRefGuidance, formatCompactStateSummary, verificationFromChecks, verificationLine, sanitizeArtifactName, isCriticalResourceType, getUrlHash, firstErrorLine, formatArtifactTimestamp
- Unit test for EVALUATE_HELPERS_SOURCE syntax validity (parseable via `new Function()`)
- Unit tests for state accessor pairs (set/get round-trip) and resetAllState
- Unit tests for constrainScreenshot with synthetic sharp buffers (JPEG/PNG, within-bounds passthrough, over-bounds resize)
- Integration tests for window.__pi utility functions (simpleHash, isVisible, isEnabled, inferRole, accessibleName, isInteractiveEl, cssPath) via Playwright page.evaluate against real DOM
- Integration tests for intent scoring differentiation (submit_form, close_dialog, search_field, primary_cta) via Playwright page.evaluate of buildIntentScoringScript output
- Integration tests for form label resolution (7-level priority chain) via Playwright page.evaluate of buildFormAnalysisScript output
- `test:browser-tools` script in package.json — separate from existing `test` script
## Verification
- `npm run test:browser-tools` exits 0 with all tests passing
- Unit test file: `src/resources/extensions/browser-tools/tests/browser-tools-unit.test.cjs`
- Integration test file: `src/resources/extensions/browser-tools/tests/browser-tools-integration.test.mjs`
## Tasks
- [x] **T01: Unit tests for Node-side pure functions, state accessors, and constrainScreenshot** `est:30m`
- Why: Covers all pure-function logic from utils.ts, state.ts, evaluate-helpers.ts, and capture.ts that can be tested without a browser. These are the fastest, most stable tests.
- Files: `src/resources/extensions/browser-tools/tests/browser-tools-unit.test.cjs`, `package.json`
- Do: Create tests/ directory. Write CJS test file using `node:test` + `node:assert/strict` + `@mariozechner/jiti` for imports. Test pure functions from utils.ts (parseRef, formatVersionedRef, staleRefGuidance, formatCompactStateSummary, verificationFromChecks, verificationLine, sanitizeArtifactName, isCriticalResourceType, getUrlHash, firstErrorLine, formatArtifactTimestamp). Test EVALUATE_HELPERS_SOURCE parseable via `new Function()` and contains all 9 expected function names. Test state accessor round-trips and resetAllState. Test constrainScreenshot with synthetic sharp buffers: small JPEG passthrough, oversized JPEG resize, PNG resize. Add `test:browser-tools` script to package.json.
- Verify: `npm run test:browser-tools` passes all unit tests
- Done when: All unit tests pass, `test:browser-tools` script exists
- [x] **T02: Integration tests for browser-side utilities, intent scoring, and form analysis via Playwright** `est:30m`
- Why: Covers the evaluate-script logic that requires a real DOM — window.__pi functions, intent scoring heuristics, and form label resolution. These test the actual codepath (page.evaluate with IIFE strings) that the tools use in production.
- Files: `src/resources/extensions/browser-tools/tests/browser-tools-integration.test.mjs`, `package.json`
- Do: Write ESM test file using `node:test` + `node:assert/strict` + Playwright chromium. Launch browser once in `before()`, close in `after()`. Test window.__pi functions by injecting EVALUATE_HELPERS_SOURCE then evaluating each function against HTML fixtures via `page.setContent()`. Test intent scoring by evaluating the script that buildIntentScoringScript produces; the function is not exported, so read forms.ts and intent.ts to extract the evaluate script strings, or rebuild them with the same script-building approach the source uses. Test form analysis by evaluating buildFormAnalysisScript output against a multi-field HTML form. Set explicit viewport dimensions (1280×720) for deterministic scoring. Update `test:browser-tools` script to include this file.
- Verify: `npm run test:browser-tools` passes all integration tests
- Done when: All integration tests pass including browser-side utility, intent scoring, and form analysis tests
## Files Likely Touched
- `src/resources/extensions/browser-tools/tests/browser-tools-unit.test.cjs`
- `src/resources/extensions/browser-tools/tests/browser-tools-integration.test.mjs`
- `package.json`


@ -1,52 +0,0 @@
---
estimated_steps: 5
estimated_files: 3
---
# T01: Unit tests for Node-side pure functions, state accessors, and constrainScreenshot
**Slice:** S06 — Test coverage
**Milestone:** M002
## Description
Create the browser-tools test infrastructure and write unit tests for all pure Node-side functions. Uses jiti for TypeScript imports (the resolve-ts ESM hook breaks on core.js), `node:test` for the runner, and `node:assert/strict` for assertions. Tests constrainScreenshot with synthetic sharp buffers — it's a pure buffer-in/buffer-out function since S03 removed the page dependency.
## Steps
1. Create `src/resources/extensions/browser-tools/tests/` directory and the `.cjs` test file with jiti-based imports of utils.ts, state.ts, evaluate-helpers.ts, and capture.ts.
2. Write tests for pure utility functions from utils.ts: parseRef (valid ref, invalid ref, legacy format), formatVersionedRef, staleRefGuidance, formatCompactStateSummary (with mock CompactPageState), verificationFromChecks (pass/fail cases), verificationLine, sanitizeArtifactName (valid, empty, special chars), isCriticalResourceType (document/stylesheet/script vs image/font), getUrlHash, firstErrorLine (Error, string, unknown), formatArtifactTimestamp.
3. Write tests for EVALUATE_HELPERS_SOURCE: parseable via `new Function(source)`, contains all 9 expected function assignment strings (cssPath, simpleHash, isVisible, isEnabled, inferRole, accessibleName, isInteractiveEl, domPath, selectorHints).
4. Write tests for state accessor round-trips (setBrowser/getBrowser, setContext/getContext, setActiveFrame/getActiveFrame, setSessionStartedAt/getSessionStartedAt, setSessionArtifactDir/getSessionArtifactDir, setCurrentRefMap/getCurrentRefMap, setRefVersion/getRefVersion, setRefMetadata/getRefMetadata, setLastActionBeforeState/getLastActionBeforeState, setLastActionAfterState/getLastActionAfterState) and resetAllState clearing all of them.
5. Write tests for constrainScreenshot: create synthetic JPEG buffer (800×600) via sharp — should pass through unchanged. Create oversized JPEG buffer (3000×2000) — should resize within 1568px. Create oversized PNG buffer — should resize and return PNG. Add `test:browser-tools` script to package.json: `node --test src/resources/extensions/browser-tools/tests/browser-tools-unit.test.cjs`.
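
The syntax check in step 3 can be illustrated with a small stand-in. The source string below is not the real `EVALUATE_HELPERS_SOURCE`, and both the `pi.<name> =` assignment form and the `checkHelperSource` helper are assumptions for the sake of the sketch. In the actual suite these checks would sit inside `node:test` cases.

```javascript
// Stand-in for EVALUATE_HELPERS_SOURCE; the real constant lives in
// evaluate-helpers.ts and its exact text is not shown in this plan.
const HELPERS_SOURCE_STANDIN = `
(function () {
  var pi = (window.__pi = window.__pi || {});
  pi.simpleHash = function (s) { return String(s).length; };
  pi.isVisible = function (el) { return !!el; };
})();
`;

// Hypothetical helper: verify the source parses and mentions each expected name.
function checkHelperSource(source, expectedNames) {
  let parseError = null;
  try {
    // new Function() compiles the body without running it, so a SyntaxError
    // here means the source string is malformed.
    new Function(source);
  } catch (err) {
    parseError = err;
  }
  // Assumed assignment form: `pi.<name> =`.
  const missing = expectedNames.filter((name) => !source.includes(`pi.${name} =`));
  return { parseable: parseError === null, missing };
}

const report = checkHelperSource(HELPERS_SOURCE_STANDIN, ["simpleHash", "isVisible", "cssPath"]);
console.log(report);
```

Because the body is compiled but never called, the check does not need a `window` object, which is what lets it run as a plain Node unit test.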
## Must-Haves
- [ ] jiti imports work for all browser-tools modules
- [ ] All pure utility function tests pass
- [ ] EVALUATE_HELPERS_SOURCE syntax validation passes
- [ ] State accessor round-trip tests pass
- [ ] resetAllState clears all state
- [ ] constrainScreenshot passthrough for small images
- [ ] constrainScreenshot resizes oversized JPEG
- [ ] constrainScreenshot resizes oversized PNG
- [ ] `test:browser-tools` script added to package.json
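
A round-trip check for accessor pairs might be shaped like this. The state object is a hypothetical stand-in (the real accessors are exported from `state.ts`), and the default values after reset are assumptions.

```javascript
// Hypothetical stand-in for a few state.ts accessor pairs; defaults are assumed.
function createStateStandIn() {
  let browser = null;
  let refVersion = 0;
  let currentRefMap = {};
  return {
    setBrowser(value) { browser = value; },
    getBrowser() { return browser; },
    setRefVersion(value) { refVersion = value; },
    getRefVersion() { return refVersion; },
    setCurrentRefMap(value) { currentRefMap = value; },
    getCurrentRefMap() { return currentRefMap; },
    // Restore every field to its initial value.
    resetAllState() {
      browser = null;
      refVersion = 0;
      currentRefMap = {};
    },
  };
}

// A setter/getter pair round-trips when the getter returns the exact value set.
function roundTrips(setter, getter, value) {
  setter(value);
  return getter() === value;
}

const state = createStateStandIn();
const fakeBrowser = { name: "fake-browser" };
console.log(roundTrips(state.setBrowser, state.getBrowser, fakeBrowser));
state.resetAllState();
console.log(state.getBrowser());
```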
## Verification
- `npm run test:browser-tools` exits 0
- Test output shows all test cases passing
## Inputs
- `src/resources/extensions/browser-tools/utils.ts` — pure functions to test
- `src/resources/extensions/browser-tools/state.ts` — accessor pairs and resetAllState
- `src/resources/extensions/browser-tools/evaluate-helpers.ts` — EVALUATE_HELPERS_SOURCE constant
- `src/resources/extensions/browser-tools/capture.ts` — constrainScreenshot function
- S01 summary — accessor pattern details, jiti compatibility requirement
- S03 summary — constrainScreenshot is now pure buffer-in/buffer-out with unused `_page` param
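
The size expectations for the constrainScreenshot tests can be reasoned about with a small dimension calculation. This sketch assumes the 1568px limit applies to the longest side and that aspect ratio is preserved; `fitWithin` is a hypothetical helper, not the function in `capture.ts`.

```javascript
// Hypothetical helper: compute target dimensions for an image that must fit
// within maxDim on its longest side (an assumed reading of "within 1568px").
function fitWithin(width, height, maxDim = 1568) {
  const longest = Math.max(width, height);
  if (longest <= maxDim) {
    // Within bounds: passthrough, no resize needed.
    return { width, height, resized: false };
  }
  const scale = maxDim / longest;
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale),
    resized: true,
  };
}

console.log(fitWithin(800, 600));
console.log(fitWithin(3000, 2000));
```

Under these assumptions the 800×600 fixture passes through unchanged, while the 3000×2000 fixture lands at 1568 on its long side.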
## Expected Output
- `src/resources/extensions/browser-tools/tests/browser-tools-unit.test.cjs` — complete unit test file with 30+ test cases
- `package.json`: `test:browser-tools` script added


@ -1,64 +0,0 @@
---
estimated_steps: 4
estimated_files: 2
---
# T02: Integration tests for browser-side utilities, intent scoring, and form analysis via Playwright
**Slice:** S06 — Test coverage
**Milestone:** M002
## Description
Write Playwright-based integration tests that exercise the browser-side evaluate scripts against real DOM. These test the actual codepath — IIFE strings evaluated via `page.evaluate()` against HTML fixtures. Covers window.__pi utilities from evaluate-helpers.ts, intent scoring from intent.ts, and form label resolution from forms.ts. The scoring and form analysis functions are module-private (not exported), so we replicate the evaluate approach: read the source files to extract the IIFE strings, then evaluate them in Playwright.
## Steps
1. Create the `.mjs` test file. Import `node:test`, `node:assert/strict`, `playwright` (chromium), and use jiti or direct file reads to get EVALUATE_HELPERS_SOURCE and the evaluate script source strings. Launch Chromium once in `before()`, set viewport to 1280×720, close in `after()`.
2. Write window.__pi utility tests: inject EVALUATE_HELPERS_SOURCE via `page.evaluate()`, then test each function against inline HTML fixtures via `page.setContent()`:
- `simpleHash` — deterministic output for same input, different output for different input
- `isVisible` — visible element returns true, `display:none` returns false
- `isEnabled` — enabled input returns true, disabled returns false
- `inferRole` — button element → "button", anchor with href → "link", input[type=text] → "textbox"
- `accessibleName` — button with text content, input with aria-label, input with label[for]
- `isInteractiveEl` — button → true, div → false, input → true
- `cssPath` — returns a valid CSS selector string that `querySelector` resolves back to the element
3. Write intent scoring tests: read `tools/intent.ts` source, extract the IIFE returned by `buildIntentScoringScript` for each intent (or replicate the script-building approach), then evaluate against HTML fixtures:
- `submit_form` — form with submit button scores higher than a random button outside the form
- `close_dialog` — dialog with × button and Cancel: × button scores highest
- `search_field` — input[type=search] scores higher than input[type=text]
- `primary_cta` — large styled button in main content scores higher than small nav link
4. Write form analysis tests: replicate `buildFormAnalysisScript()` call (or extract the script string), evaluate against a multi-field HTML form:
- Label via `label[for]` resolves correctly
- Label via wrapping `<label>` resolves correctly
- Label via `aria-label` resolves correctly
- Label via `aria-labelledby` resolves correctly
- Label via `placeholder` as fallback
- Hidden input is flagged as hidden
- Submit button is discovered
Update `test:browser-tools` script to glob both test files.
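
The inject-then-evaluate flow from the steps above can be approximated without a browser, which may help when drafting fixtures. The sketch below is a stand-in only: it uses `new Function` with a stub `window` object in place of Playwright's `page.evaluate()`, the helper source is a made-up miniature of the real one, and `evaluateIn` is a hypothetical name. It does not exercise a real DOM.

```javascript
// Evaluate an IIFE expression string with `window` bound to the given object.
// Stand-in for page.evaluate(scriptString); no real DOM is involved.
function evaluateIn(windowStub, script) {
  return new Function("window", `return (${script});`)(windowStub);
}

// Miniature, made-up helper source that installs one utility on window.__pi.
const helpersSource = `(function () {
  var pi = (window.__pi = window.__pi || {});
  pi.simpleHash = function (s) {
    var h = 0;
    for (var i = 0; i < s.length; i++) h = (h * 31 + s.charCodeAt(i)) | 0;
    return (h >>> 0).toString(16);
  };
  return Object.keys(pi);
})()`;

// Second script: relies on the helpers injected by the first evaluation.
const probeScript = `(function () {
  var hash = window.__pi.simpleHash;
  return { same: hash("abc") === hash("abc"), different: hash("abc") !== hash("abd") };
})()`;

const win = {};
const installed = evaluateIn(win, helpersSource);
const result = evaluateIn(win, probeScript);
console.log(installed, result);
```

The two-phase shape mirrors the plan: helpers are injected once, then each test evaluates a separate script string that depends on `window.__pi`.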
## Must-Haves
- [ ] Chromium launches and closes cleanly
- [ ] All 7 window.__pi utility functions tested
- [ ] Intent scoring tests show differentiated rankings for at least 4 intents
- [ ] Form analysis tests verify label resolution for at least 5 association methods
- [ ] `test:browser-tools` script runs both unit and integration test files
## Verification
- `npm run test:browser-tools` exits 0 with both unit and integration tests passing
- Integration tests complete in <30s
## Inputs
- `src/resources/extensions/browser-tools/evaluate-helpers.ts` — EVALUATE_HELPERS_SOURCE for injection
- `src/resources/extensions/browser-tools/tools/intent.ts` — buildIntentScoringScript source (module-private, need to extract the script string)
- `src/resources/extensions/browser-tools/tools/forms.ts` — buildFormAnalysisScript source (module-private, need to extract the script string)
- T01 output — test infrastructure exists, `test:browser-tools` script in package.json
## Expected Output
- `src/resources/extensions/browser-tools/tests/browser-tools-integration.test.mjs` — integration test file with ~20-25 test cases
- `package.json`: `test:browser-tools` script updated to include both files