diff --git a/.gsd/DECISIONS.md b/.gsd/DECISIONS.md
index 0c3159a18..24c061b31 100644
--- a/.gsd/DECISIONS.md
+++ b/.gsd/DECISIONS.md
@@ -28,3 +28,5 @@
 | D020 | M002/S04 | pattern | Form analysis evaluate location | Form analysis evaluate logic lives in tools/forms.ts, not extracted to evaluate-helpers.ts | Form-specific, not a shared utility. The label resolution heuristic is only used by form tools. Keeping it local avoids bloating the shared injection. | Yes — if S05 intent tools need label resolution |
 | D021 | M002/S04 | pattern | Fill uses Playwright APIs, not evaluate | browser_fill_form uses Playwright locator.fill()/selectOption()/setChecked() instead of page.evaluate() value setting | Playwright APIs trigger proper input/change events and handle framework-specific reactivity (React, Vue). Direct value setting via evaluate skips event dispatch and breaks reactive frameworks. | No |
 | D022 | M002/S04 | pattern | Fill field matching priority | Label (exact → case-insensitive) → name → placeholder → aria-label | Label is the most human-readable identifier. Name is the most reliable programmatic identifier. Placeholder and aria-label are fallbacks. Exact match before fuzzy prevents wrong-field fills. | Yes — if real-world usage shows a different priority works better |
+| D023 | M002/S05 | pattern | Intent scoring model | 4 orthogonal dimensions per intent, each 0-1, summed and clamped | Consistent scoring structure across all 8 intents. Makes scoring testable and debuggable — each dimension has a named reason. 4 dimensions balance discrimination vs complexity. | Yes — could add/remove dimensions per intent if real-world usage shows imbalance |
+| D024 | M002/S05 | pattern | search_field action type | Focus instead of click for search_field intent in browser_act | Search fields need keyboard focus for typing, not a click that might submit or toggle. Focus is the semantically correct action. Other intents use click. | Yes — if focus proves unreliable on specific input implementations |
diff --git a/.gsd/PROJECT.md b/.gsd/PROJECT.md
index fd216e036..9fd033848 100644
--- a/.gsd/PROJECT.md
+++ b/.gsd/PROJECT.md
@@ -16,7 +16,7 @@ The GSD extension is fully functional with:
 - Guided `/gsd` wizard flow
 - `secure_env_collect` tool with masked TUI input, multi-destination write support, guidance display, and summary screen
 - Proactive secret management: planning prompts forecast secrets, manifests persist them, auto-mode collects them before first dispatch
-- Browser-tools extension with 45 registered tools covering navigation, interaction, inspection, verification, tracing, debugging, and form intelligence (browser_analyze_form, browser_fill_form)
+- Browser-tools extension with 47 registered tools covering navigation, interaction, inspection, verification, tracing, debugging, form intelligence (browser_analyze_form, browser_fill_form), and intent-ranked retrieval and semantic actions (browser_find_best, browser_act)
 - Browser-tools `core.js` with shared utilities for action timeline, page registry, state diffing, assertions, fingerprinting
 
 ## Architecture / Key Patterns
@@ -26,7 +26,7 @@ The GSD extension is fully functional with:
 - **Secrets gate**: `startAuto()` checks `getManifestStatus()` before first dispatch
 - **Disk-driven state**: `.gsd/` files are the source of truth, `STATE.md` is derived cache
 - **File parsing**: `files.ts` has markdown parsers for all GSD file types
-- **Browser-tools**: Modular structure — slim `index.ts` orchestrator, 8 focused infrastructure modules (state.ts, utils.ts, evaluate-helpers.ts, lifecycle.ts, capture.ts, settle.ts, refs.ts), 10 categorized tool files under `tools/` (including forms.ts), shared infrastructure in `core.js` (~1000 lines). Browser-side utilities injected once via `addInitScript` under `window.__pi` namespace. Uses Playwright for browser control. Accessibility-first state representation, deterministic versioned refs, adaptive DOM settling, compact post-action summaries. Form tools use Playwright locator APIs for type-aware filling with structured result reporting.
+- **Browser-tools**: Modular structure — slim `index.ts` orchestrator, 8 focused infrastructure modules (state.ts, utils.ts, evaluate-helpers.ts, lifecycle.ts, capture.ts, settle.ts, refs.ts), 11 categorized tool files under `tools/` (including forms.ts, intent.ts), shared infrastructure in `core.js` (~1000 lines). Browser-side utilities injected once via `addInitScript` under `window.__pi` namespace. Uses Playwright for browser control. Accessibility-first state representation, deterministic versioned refs, adaptive DOM settling, compact post-action summaries. Form tools use Playwright locator APIs for type-aware filling with structured result reporting. Intent tools use deterministic 4-dimension heuristic scoring for element retrieval and one-call semantic actions.
 - **Prompt templates**: `prompts/` directory with mustache-like `{{var}}` substitution
 - **TUI components**: `@gsd/pi-tui` provides `Editor`, `Text`, key handling, themes
 - **Branch-per-slice**: git branches isolate slice work, squash-merged to main on completion
diff --git a/.gsd/REQUIREMENTS.md b/.gsd/REQUIREMENTS.md
index c562f7655..52c5c0f2a 100644
--- a/.gsd/REQUIREMENTS.md
+++ b/.gsd/REQUIREMENTS.md
@@ -50,24 +50,24 @@ This file is the explicit capability and coverage contract for the project.
 
 ### R024 — Intent-ranked element retrieval (browser_find_best)
 - Class: core-capability
-- Status: active
+- Status: validated
 - Description: A browser_find_best tool that takes an intent string (e.g. "submit form", "close dialog", "primary CTA") and returns scored candidates with reasons, using deterministic heuristic ranking.
 - Why it matters: The agent frequently needs "which button submits this form?" Currently it does browser_find → gets 15 candidates → reasons about which one. A heuristic ranker cuts a round trip and reduces reasoning tokens.
 - Source: user
 - Primary owning slice: M002/S05
 - Supporting slices: M002/S01
-- Validation: unmapped
+- Validation: 8 intents implemented with 4-dimension scoring (submit_form, close_dialog, primary_cta, search_field, next_step, dismiss, auth_action, back_navigation). Each returns up to 5 candidates sorted by score with CSS selectors and reason strings. Intent normalization accepts underscores/spaces/hyphens. Verified via Playwright tests against real HTML pages with differentiated rankings. Build passes, tool count = 47.
 - Notes: Deterministic heuristics only. No hidden LLM calls.
 
 ### R025 — Semantic action tool (browser_act)
 - Class: core-capability
-- Status: active
+- Status: validated
 - Description: A browser_act tool that takes a semantic intent (e.g. "submit the current form", "close the active modal", "click the primary CTA") and executes the obvious action sequence internally.
 - Why it matters: Each of these common micro-tasks currently takes 2-4 tool calls. browser_act collapses them into one.
 - Source: user
 - Primary owning slice: M002/S05
 - Supporting slices: M002/S04
-- Validation: unmapped
+- Validation: Resolves top candidate via same scoring engine as browser_find_best. Executes via Playwright locator.click() with getByRole fallback (focus for search_field). Settles via settleAfterActionAdaptive, returns before/after diff. Zero-candidate returns isError:true without throwing. Verified via Playwright test scripts. Build passes, tool count = 47.
 - Notes: Builds on browser_find_best for element selection. Bounded — does not loop or retry.
 
 ### R026 — Test coverage for new and refactored code
@@ -345,16 +345,16 @@ This file is the explicit capability and coverage contract for the project.
 | R021 | core-capability | validated | M002/S03 | none | screenshot param default false, capture gated, browser_reload unchanged, build passes |
 | R022 | core-capability | validated | M002/S04 | M002/S01 | 7-level label resolution, form auto-detection, verified against 12-field test form |
 | R023 | core-capability | validated | M002/S04 | M002/S01 | 5-strategy field resolution, type-aware fill, verified end-to-end with 10 fields |
-| R024 | core-capability | active | M002/S05 | M002/S01 | unmapped |
-| R025 | core-capability | active | M002/S05 | M002/S04 | unmapped |
+| R024 | core-capability | validated | M002/S05 | M002/S01 | 8-intent scoring, Playwright tests, differentiated rankings, build passes |
+| R025 | core-capability | validated | M002/S05 | M002/S04 | top candidate execution via Playwright locator, settle + diff, graceful error, build passes |
 | R026 | quality-attribute | active | M002/S06 | all M002 | unmapped |
 | R027 | core-capability | deferred | none | none | unmapped |
 | R028 | anti-feature | out-of-scope | none | none | n/a |
 
 ## Coverage Summary
 
-- Active requirements: 3
-- Validated requirements: 19
+- Active requirements: 1
+- Validated requirements: 21
 - Deferred requirements: 3
 - Out of scope: 3
 - Unmapped active requirements: 3
diff --git a/.gsd/STATE.md b/.gsd/STATE.md
index 3baaef13a..d2e098e0c 100644
--- a/.gsd/STATE.md
+++ b/.gsd/STATE.md
@@ -1,7 +1,7 @@
 # GSD State
 
 **Active Milestone:** M002 — Browser Tools Performance & Intelligence
-**Active Slice:** S05 — Intent-ranked retrieval and semantic actions
+**Active Slice:** S06 — Test coverage
 **Phase:** planning
 **Requirements Status:** 7 active · 15 validated · 3 deferred · 3 out of scope
 
@@ -16,4 +16,4 @@
 - None
 
 ## Next Action
-Plan slice S05 (Intent-ranked retrieval and semantic actions).
+Plan slice S06 (Test coverage).
diff --git a/.gsd/milestones/M002/M002-ROADMAP.md b/.gsd/milestones/M002/M002-ROADMAP.md
index 1fb3c1880..22f327edf 100644
--- a/.gsd/milestones/M002/M002-ROADMAP.md
+++ b/.gsd/milestones/M002/M002-ROADMAP.md
@@ -70,7 +70,7 @@ This milestone is complete only when all are true:
 - [x] **S04: Form intelligence** `risk:medium` `depends:[S01]`
 > After this: browser_analyze_form returns field inventory (labels, types, required, values, validation) for any form; browser_fill_form fills fields by label/name/placeholder mapping and optionally submits — verified by running both tools against a real multi-field form.
 
-- [ ] **S05: Intent-ranked retrieval and semantic actions** `risk:medium` `depends:[S01]`
+- [x] **S05: Intent-ranked retrieval and semantic actions** `risk:medium` `depends:[S01]`
 > After this: browser_find_best returns scored candidates for intents like "submit form", "close dialog", "primary CTA"; browser_act executes common micro-tasks in one call — verified by running both tools against real pages.
 
 - [ ] **S06: Test coverage** `risk:low` `depends:[S01,S02,S03,S04,S05]`
diff --git a/.gsd/milestones/M002/slices/S05/S05-PLAN.md b/.gsd/milestones/M002/slices/S05/S05-PLAN.md
new file mode 100644
index 000000000..195e95e5b
--- /dev/null
+++ b/.gsd/milestones/M002/slices/S05/S05-PLAN.md
@@ -0,0 +1,52 @@
+# S05: Intent-ranked retrieval and semantic actions
+
+**Goal:** `browser_find_best` returns scored candidates for semantic intents; `browser_act` resolves the top candidate and executes it in one call.
+**Demo:** Run `browser_find_best` with intent "submit_form" against a real page with a form and get ranked candidates. Run `browser_act` with intent "close_dialog" against a page with a modal and see it dismissed.
+
+## Must-Haves
+
+- `browser_find_best` registered and functional with 8 intents: submit_form, close_dialog, primary_cta, search_field, next_step, dismiss, auth_action, back_navigation
+- Each intent uses deterministic heuristic scoring (no LLM calls) with 2+ scoring dimensions per intent
+- Candidates include CSS selectors usable with Playwright locator APIs
+- Results capped at 5 candidates, scored 0-1 with human-readable reasons
+- Intent strings normalized (accept underscores, spaces, mixed case)
+- `browser_act` resolves top candidate, executes via Playwright locator click (not evaluate click), settles, returns before/after diff
+- `browser_act` returns error (not throw) when zero candidates found
+- Both tools wired into index.ts, tool count = 47
+- Build passes
+
+## Proof Level
+
+- This slice proves: integration (new tools against real browser pages)
+- Real runtime required: yes (Playwright against real pages)
+- Human/UAT required: no (automated verification sufficient)
+
+## Verification
+
+- `npm run build` passes
+- `grep -c "pi.registerTool" src/resources/extensions/browser-tools/tools/*.ts` sums to 47
+- `browser_find_best` with intent "submit_form" against a page with a `