From 5155d69d554ab768de6fa3670b48bcd0f45bb014 Mon Sep 17 00:00:00 2001 From: Lex Christopherson Date: Fri, 13 Mar 2026 00:56:23 -0600 Subject: [PATCH] test(M002/S06): Test coverage Tasks: - chore(M002/S06): auto-commit after complete-slice - chore(M002/S06): auto-commit after complete-slice - chore(M002/S06/T02): auto-commit after execute-task - chore(M002/S06/T02): auto-commit after execute-task - chore(M002/S06/T01): auto-commit after execute-task - chore(M002/S06/T01): auto-commit after execute-task - chore(M002/S06): auto-commit after plan-slice - chore: update state for S06 execution - docs(S06): add slice plan Branch: gsd/M002/S06 --- .gsd/DECISIONS.md | 2 + .gsd/PROJECT.md | 2 +- .gsd/REQUIREMENTS.md | 12 +- .gsd/STATE.md | 6 +- .gsd/milestones/M002/M002-ROADMAP.md | 2 +- .gsd/milestones/M002/slices/S06/S06-PLAN.md | 43 ++ .../M002/slices/S06/tasks/T01-PLAN.md | 52 ++ .../M002/slices/S06/tasks/T02-PLAN.md | 64 ++ package-lock.json | 11 + package.json | 2 + .../tests/browser-tools-integration.test.mjs | 652 ++++++++++++++++++ .../tests/browser-tools-unit.test.cjs | 614 +++++++++++++++++ 12 files changed, 1451 insertions(+), 11 deletions(-) create mode 100644 .gsd/milestones/M002/slices/S06/S06-PLAN.md create mode 100644 .gsd/milestones/M002/slices/S06/tasks/T01-PLAN.md create mode 100644 .gsd/milestones/M002/slices/S06/tasks/T02-PLAN.md create mode 100644 src/resources/extensions/browser-tools/tests/browser-tools-integration.test.mjs create mode 100644 src/resources/extensions/browser-tools/tests/browser-tools-unit.test.cjs diff --git a/.gsd/DECISIONS.md b/.gsd/DECISIONS.md index 24c061b31..d75f541a4 100644 --- a/.gsd/DECISIONS.md +++ b/.gsd/DECISIONS.md @@ -30,3 +30,5 @@ | D022 | M002/S04 | pattern | Fill field matching priority | Label (exact → case-insensitive) → name → placeholder → aria-label | Label is the most human-readable identifier. Name is the most reliable programmatic identifier. Placeholder and aria-label are fallbacks. 
Exact match before fuzzy prevents wrong-field fills. | Yes — if real-world usage shows a different priority works better | | D023 | M002/S05 | pattern | Intent scoring model | 4 orthogonal dimensions per intent, each 0-1, summed and clamped | Consistent scoring structure across all 8 intents. Makes scoring testable and debuggable — each dimension has a named reason. 4 dimensions balance discrimination vs complexity. | Yes — could add/remove dimensions per intent if real-world usage shows imbalance | | D024 | M002/S05 | pattern | search_field action type | Focus instead of click for search_field intent in browser_act | Search fields need keyboard focus for typing, not a click that might submit or toggle. Focus is the semantically correct action. Other intents use click. | Yes — if focus proves unreliable on specific input implementations | +| D025 | M002/S06 | pattern | Test import strategy for browser-tools | jiti CJS imports instead of ESM resolve-ts hook | The resolve-ts ESM hook breaks on core.js (plain .js file imported by TS modules). jiti handles mixed .ts/.js imports correctly from a .cjs test file. | No | +| D026 | M002/S06 | pattern | Testing module-private functions | Source extraction via readFileSync + brace-match + strip types + eval | Avoids exporting test-only APIs from production modules. Fragile to refactors but tests fail clearly when extraction breaks. Acceptable tradeoff for test code. 
| Yes — if private functions get exported for other reasons | diff --git a/.gsd/PROJECT.md b/.gsd/PROJECT.md index 9fd033848..edee0b9a2 100644 --- a/.gsd/PROJECT.md +++ b/.gsd/PROJECT.md @@ -38,4 +38,4 @@ See `.gsd/REQUIREMENTS.md` for the explicit capability contract, requirement sta ## Milestone Sequence - [x] M001: Proactive Secret Management — Front-loaded API key collection into planning so auto-mode runs uninterrupted (10 requirements validated) -- [ ] M002: Browser Tools Performance & Intelligence — Module decomposition, action pipeline optimization, sharp-based screenshots, form intelligence, intent-ranked retrieval, semantic actions +- [x] M002: Browser Tools Performance & Intelligence — Module decomposition, action pipeline optimization, sharp-based screenshots, form intelligence, intent-ranked retrieval, semantic actions, 108-test suite (12 requirements validated) diff --git a/.gsd/REQUIREMENTS.md b/.gsd/REQUIREMENTS.md index 52c5c0f2a..e460733f6 100644 --- a/.gsd/REQUIREMENTS.md +++ b/.gsd/REQUIREMENTS.md @@ -72,13 +72,13 @@ This file is the explicit capability and coverage contract for the project. ### R026 — Test coverage for new and refactored code - Class: quality-attribute -- Status: active +- Status: validated - Description: Test suite covers shared browser-side utilities, settle logic, screenshot resizing, form tools, and intent ranking. Tests verify correctness and guard against regressions. - Why it matters: A 5000-line file with zero tests is fragile. The refactoring and new features need regression protection. - Source: user - Primary owning slice: M002/S06 - Supporting slices: all M002 slices -- Validation: unmapped +- Validation: 108 tests (63 unit + 45 integration) passing via `npm run test:browser-tools`. Unit tests cover pure functions, state accessors, EVALUATE_HELPERS_SOURCE validity, constrainScreenshot with sharp. 
Integration tests cover window.__pi utilities, intent scoring differentiation, and form label resolution — all via Playwright against real DOM. - Notes: Test what's unit-testable without a running browser (heuristics, scoring, utility functions). Integration tests with Playwright for tools that need a page. ## Validated @@ -347,14 +347,14 @@ This file is the explicit capability and coverage contract for the project. | R023 | core-capability | validated | M002/S04 | M002/S01 | 5-strategy field resolution, type-aware fill, verified end-to-end with 10 fields | | R024 | core-capability | validated | M002/S05 | M002/S01 | 8-intent scoring, Playwright tests, differentiated rankings, build passes | | R025 | core-capability | validated | M002/S05 | M002/S04 | top candidate execution via Playwright locator, settle + diff, graceful error, build passes | -| R026 | quality-attribute | active | M002/S06 | all M002 | unmapped | +| R026 | quality-attribute | validated | M002/S06 | all M002 | 108 tests passing via npm run test:browser-tools | | R027 | core-capability | deferred | none | none | unmapped | | R028 | anti-feature | out-of-scope | none | none | n/a | ## Coverage Summary -- Active requirements: 1 -- Validated requirements: 21 +- Active requirements: 0 +- Validated requirements: 22 - Deferred requirements: 3 - Out of scope: 3 -- Unmapped active requirements: 3 +- Unmapped active requirements: 0 diff --git a/.gsd/STATE.md b/.gsd/STATE.md index d2e098e0c..28cd7feed 100644 --- a/.gsd/STATE.md +++ b/.gsd/STATE.md @@ -1,8 +1,8 @@ # GSD State **Active Milestone:** M002 — Browser Tools Performance & Intelligence -**Active Slice:** S06 — Test coverage -**Phase:** planning +**Active Slice:** None +**Phase:** completing-milestone **Requirements Status:** 7 active · 15 validated · 3 deferred · 3 out of scope ## Milestone Registry @@ -16,4 +16,4 @@ - None ## Next Action -Plan slice S06 (Test coverage). +All slices complete in M002. Write milestone summary. 
diff --git a/.gsd/milestones/M002/M002-ROADMAP.md b/.gsd/milestones/M002/M002-ROADMAP.md index 22f327edf..d8daa5866 100644 --- a/.gsd/milestones/M002/M002-ROADMAP.md +++ b/.gsd/milestones/M002/M002-ROADMAP.md @@ -73,7 +73,7 @@ This milestone is complete only when all are true: - [x] **S05: Intent-ranked retrieval and semantic actions** `risk:medium` `depends:[S01]` > After this: browser_find_best returns scored candidates for intents like "submit form", "close dialog", "primary CTA"; browser_act executes common micro-tasks in one call — verified by running both tools against real pages. -- [ ] **S06: Test coverage** `risk:low` `depends:[S01,S02,S03,S04,S05]` +- [x] **S06: Test coverage** `risk:low` `depends:[S01,S02,S03,S04,S05]` > After this: test suite covers shared browser-side utilities, settle logic, screenshot resizing, form analysis heuristics, intent scoring, and semantic action resolution — verified by test runner passing. ## Boundary Map diff --git a/.gsd/milestones/M002/slices/S06/S06-PLAN.md b/.gsd/milestones/M002/slices/S06/S06-PLAN.md new file mode 100644 index 000000000..6d5b86221 --- /dev/null +++ b/.gsd/milestones/M002/slices/S06/S06-PLAN.md @@ -0,0 +1,43 @@ +# S06: Test coverage + +**Goal:** Test suite covers shared browser-side utilities, settle logic, screenshot resizing, form analysis heuristics, intent scoring, and semantic action resolution. +**Demo:** `npm run test:browser-tools` passes — unit tests via jiti and integration tests via Playwright both green. 
+ +## Must-Haves + +- Unit tests for pure Node-side functions: parseRef, formatVersionedRef, staleRefGuidance, formatCompactStateSummary, verificationFromChecks, verificationLine, sanitizeArtifactName, isCriticalResourceType, getUrlHash, firstErrorLine, formatArtifactTimestamp +- Unit test for EVALUATE_HELPERS_SOURCE syntax validity (parseable via `new Function()`) +- Unit tests for state accessor pairs (set/get round-trip) and resetAllState +- Unit tests for constrainScreenshot with synthetic sharp buffers (JPEG/PNG, within-bounds passthrough, over-bounds resize) +- Integration tests for window.__pi utility functions (simpleHash, isVisible, isEnabled, inferRole, accessibleName, isInteractiveEl, cssPath) via Playwright page.evaluate against real DOM +- Integration tests for intent scoring differentiation (submit_form, close_dialog, search_field, primary_cta) via Playwright page.evaluate of buildIntentScoringScript output +- Integration tests for form label resolution (7-level priority chain) via Playwright page.evaluate of buildFormAnalysisScript output +- `test:browser-tools` script in package.json — separate from existing `test` script + +## Verification + +- `npm run test:browser-tools` exits 0 with all tests passing +- Unit test file: `src/resources/extensions/browser-tools/tests/browser-tools-unit.test.cjs` +- Integration test file: `src/resources/extensions/browser-tools/tests/browser-tools-integration.test.mjs` + +## Tasks + +- [x] **T01: Unit tests for Node-side pure functions, state accessors, and constrainScreenshot** `est:30m` + - Why: Covers all pure-function logic from utils.ts, state.ts, evaluate-helpers.ts, and capture.ts that can be tested without a browser. These are the fastest, most stable tests. + - Files: `src/resources/extensions/browser-tools/tests/browser-tools-unit.test.cjs`, `package.json` + - Do: Create tests/ directory. Write CJS test file using `node:test` + `node:assert/strict` + `@mariozechner/jiti` for imports. 
Test pure functions from utils.ts (parseRef, formatVersionedRef, staleRefGuidance, formatCompactStateSummary, verificationFromChecks, verificationLine, sanitizeArtifactName, isCriticalResourceType, getUrlHash, firstErrorLine, formatArtifactTimestamp). Test EVALUATE_HELPERS_SOURCE parseable via `new Function()` and contains all 9 expected function names. Test state accessor round-trips and resetAllState. Test constrainScreenshot with synthetic sharp buffers: small JPEG passthrough, oversized JPEG resize, PNG resize. Add `test:browser-tools` script to package.json. + - Verify: `npm run test:browser-tools` passes all unit tests + - Done when: All unit tests pass, `test:browser-tools` script exists + +- [x] **T02: Integration tests for browser-side utilities, intent scoring, and form analysis via Playwright** `est:30m` + - Why: Covers the evaluate-script logic that requires a real DOM — window.__pi functions, intent scoring heuristics, and form label resolution. These test the actual codepath (page.evaluate with IIFE strings) that the tools use in production. + - Files: `src/resources/extensions/browser-tools/tests/browser-tools-integration.test.mjs`, `package.json` + - Do: Write ESM test file using `node:test` + `node:assert/strict` + Playwright chromium. Launch browser once in `before()`, close in `after()`. Test window.__pi functions by injecting EVALUATE_HELPERS_SOURCE then evaluating each function against HTML fixtures via `page.setContent()`. Test intent scoring by calling buildIntentScoringScript (not exported — read forms.ts and intent.ts to extract the evaluate script strings, or use the same evaluate-script-building approach from the source). Test form analysis by evaluating buildFormAnalysisScript output against a multi-field HTML form. Set explicit viewport dimensions (1280×720) for deterministic scoring. Update `test:browser-tools` script to include this file. 
+ - Verify: `npm run test:browser-tools` passes all integration tests + - Done when: All integration tests pass including browser-side utility, intent scoring, and form analysis tests + +## Files Likely Touched + +- `src/resources/extensions/browser-tools/tests/browser-tools-unit.test.cjs` +- `src/resources/extensions/browser-tools/tests/browser-tools-integration.test.mjs` +- `package.json` diff --git a/.gsd/milestones/M002/slices/S06/tasks/T01-PLAN.md b/.gsd/milestones/M002/slices/S06/tasks/T01-PLAN.md new file mode 100644 index 000000000..0f69f452c --- /dev/null +++ b/.gsd/milestones/M002/slices/S06/tasks/T01-PLAN.md @@ -0,0 +1,52 @@ +--- +estimated_steps: 5 +estimated_files: 3 +--- + +# T01: Unit tests for Node-side pure functions, state accessors, and constrainScreenshot + +**Slice:** S06 — Test coverage +**Milestone:** M002 + +## Description + +Create the browser-tools test infrastructure and write unit tests for all pure Node-side functions. Uses jiti for TypeScript imports (the resolve-ts ESM hook breaks on core.js), `node:test` for the runner, and `node:assert/strict` for assertions. Tests constrainScreenshot with synthetic sharp buffers — it's a pure buffer-in/buffer-out function since S03 removed the page dependency. + +## Steps + +1. Create `src/resources/extensions/browser-tools/tests/` directory and the `.cjs` test file with jiti-based imports of utils.ts, state.ts, evaluate-helpers.ts, and capture.ts. +2. Write tests for pure utility functions from utils.ts: parseRef (valid ref, invalid ref, legacy format), formatVersionedRef, staleRefGuidance, formatCompactStateSummary (with mock CompactPageState), verificationFromChecks (pass/fail cases), verificationLine, sanitizeArtifactName (valid, empty, special chars), isCriticalResourceType (document/stylesheet/script vs image/font), getUrlHash, firstErrorLine (Error, string, unknown), formatArtifactTimestamp. +3. 
Write tests for EVALUATE_HELPERS_SOURCE: parseable via `new Function(source)`, contains all 9 expected function assignment strings (cssPath, simpleHash, isVisible, isEnabled, inferRole, accessibleName, isInteractiveEl, domPath, selectorHints). +4. Write tests for state accessor round-trips (setBrowser/getBrowser, setContext/getContext, setActiveFrame/getActiveFrame, setSessionStartedAt/getSessionStartedAt, setSessionArtifactDir/getSessionArtifactDir, setCurrentRefMap/getCurrentRefMap, setRefVersion/getRefVersion, setRefMetadata/getRefMetadata, setLastActionBeforeState/getLastActionBeforeState, setLastActionAfterState/getLastActionAfterState) and resetAllState clearing all of them. +5. Write tests for constrainScreenshot: create synthetic JPEG buffer (800×600) via sharp — should pass through unchanged. Create oversized JPEG buffer (3000×2000) — should resize within 1568px. Create oversized PNG buffer — should resize and return PNG. Add `test:browser-tools` script to package.json: `node --test src/resources/extensions/browser-tools/tests/browser-tools-unit.test.cjs`. 
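Steps 3 and 4 above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual test file: `HELPERS_SOURCE_STANDIN`, `checkHelperSource`, `createState`, and `roundTrips` are hypothetical names, the `.name =` assignment pattern is a guess at how the helper source assigns its functions, and the reset defaults (`null`, `0`) are assumed rather than taken from `state.ts`. The real tests would import `EVALUATE_HELPERS_SOURCE` and the state accessors via jiti and wrap the checks in `node:test` cases.

```javascript
// Hypothetical stand-in for EVALUATE_HELPERS_SOURCE: a script string that
// assigns helper functions onto a namespace object when evaluated.
const HELPERS_SOURCE_STANDIN = `
  var pi = (globalThis.__piDemo = globalThis.__piDemo || {});
  pi.simpleHash = function (s) {
    var h = 0;
    for (var i = 0; i < s.length; i++) h = (h * 31 + s.charCodeAt(i)) | 0;
    return h;
  };
  pi.isEnabled = function (el) { return !el.disabled; };
`;

// Step 3 idea: new Function(source) compiles the string without running it,
// so a syntax error surfaces as a thrown SyntaxError. The ".name =" pattern
// checked here is an assumption about the assignment style.
function checkHelperSource(source, expectedNames) {
  try {
    new Function(source);
  } catch (err) {
    return { parseable: false, missing: expectedNames.slice() };
  }
  const missing = expectedNames.filter(
    (name) => !source.includes("." + name + " =")
  );
  return { parseable: true, missing };
}

// Step 4 idea: a tiny stand-in state module with accessor pairs and a reset.
// The default values (null, 0) are assumed for illustration.
function createState() {
  let browser = null;
  let refVersion = 0;
  return {
    getBrowser: () => browser,
    setBrowser: (b) => { browser = b; },
    getRefVersion: () => refVersion,
    setRefVersion: (v) => { refVersion = v; },
    resetAllState: () => { browser = null; refVersion = 0; },
  };
}

// Round-trip check: the getter must return exactly the value the setter stored.
function roundTrips(state, setterName, getterName, value) {
  state[setterName](value);
  return Object.is(state[getterName](), value);
}

// Example usage of both checks.
const report = checkHelperSource(HELPERS_SOURCE_STANDIN, ["simpleHash", "isEnabled"]);
const state = createState();
const fakeBrowser = { id: "fake-browser" };
console.log(report.parseable, report.missing.length);
console.log(roundTrips(state, "setBrowser", "getBrowser", fakeBrowser));
```

Compiling with `new Function()` rather than evaluating keeps the syntax check independent of any DOM globals the helpers reference, which is presumably why the plan treats it as a unit test rather than an integration test.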
+ +## Must-Haves + +- [ ] jiti imports work for all browser-tools modules +- [ ] All pure utility function tests pass +- [ ] EVALUATE_HELPERS_SOURCE syntax validation passes +- [ ] State accessor round-trip tests pass +- [ ] resetAllState clears all state +- [ ] constrainScreenshot passthrough for small images +- [ ] constrainScreenshot resizes oversized JPEG +- [ ] constrainScreenshot resizes oversized PNG +- [ ] `test:browser-tools` script added to package.json + +## Verification + +- `npm run test:browser-tools` exits 0 +- Test output shows all test cases passing + +## Inputs + +- `src/resources/extensions/browser-tools/utils.ts` — pure functions to test +- `src/resources/extensions/browser-tools/state.ts` — accessor pairs and resetAllState +- `src/resources/extensions/browser-tools/evaluate-helpers.ts` — EVALUATE_HELPERS_SOURCE constant +- `src/resources/extensions/browser-tools/capture.ts` — constrainScreenshot function +- S01 summary — accessor pattern details, jiti compatibility requirement +- S03 summary — constrainScreenshot is now pure buffer-in/buffer-out with unused `_page` param + +## Expected Output + +- `src/resources/extensions/browser-tools/tests/browser-tools-unit.test.cjs` — complete unit test file with 30+ test cases +- `package.json` — `test:browser-tools` script added diff --git a/.gsd/milestones/M002/slices/S06/tasks/T02-PLAN.md b/.gsd/milestones/M002/slices/S06/tasks/T02-PLAN.md new file mode 100644 index 000000000..fc0639edd --- /dev/null +++ b/.gsd/milestones/M002/slices/S06/tasks/T02-PLAN.md @@ -0,0 +1,64 @@ +--- +estimated_steps: 4 +estimated_files: 2 +--- + +# T02: Integration tests for browser-side utilities, intent scoring, and form analysis via Playwright + +**Slice:** S06 — Test coverage +**Milestone:** M002 + +## Description + +Write Playwright-based integration tests that exercise the browser-side evaluate scripts against real DOM. These test the actual codepath — IIFE strings evaluated via `page.evaluate()` against HTML fixtures. 
Covers window.__pi utilities from evaluate-helpers.ts, intent scoring from intent.ts, and form label resolution from forms.ts. The scoring and form analysis functions are module-private (not exported), so we replicate the evaluate approach: read the source files to extract the IIFE strings, then evaluate them in Playwright. + +## Steps + +1. Create the `.mjs` test file. Import `node:test`, `node:assert/strict`, `playwright` (chromium), and use jiti or direct file reads to get EVALUATE_HELPERS_SOURCE and the evaluate script source strings. Launch Chromium once in `before()`, set viewport to 1280×720, close in `after()`. +2. Write window.__pi utility tests: inject EVALUATE_HELPERS_SOURCE via `page.evaluate()`, then test each function against inline HTML fixtures via `page.setContent()`: + - `simpleHash` — deterministic output for same input, different output for different input + - `isVisible` — visible element returns true, `display:none` returns false + - `isEnabled` — enabled input returns true, disabled returns false + - `inferRole` — button element → "button", anchor with href → "link", input[type=text] → "textbox" + - `accessibleName` — button with text content, input with aria-label, input with label[for] + - `isInteractiveEl` — button → true, div → false, input → true + - `cssPath` — returns a valid CSS selector string that `querySelector` resolves back to the element +3. Write intent scoring tests: read `tools/intent.ts` source, extract the IIFE returned by `buildIntentScoringScript` for each intent (or replicate the script-building approach), then evaluate against HTML fixtures: + - `submit_form` — form with submit button scores higher than a random button outside the form + - `close_dialog` — dialog with × button and Cancel: × button scores highest + - `search_field` — input[type=search] scores higher than input[type=text] + - `primary_cta` — large styled button in main content scores higher than small nav link +4. 
Write form analysis tests: replicate `buildFormAnalysisScript()` call (or extract the script string), evaluate against a multi-field HTML form: + - Label via `label[for]` resolves correctly + - Label via wrapping `