chore: remove .gsd folder from tracking and consolidate gitignore (#78)
The .gsd/ directory contains user-specific GSD project artifacts that should never be committed. Remove all tracked .gsd files and consolidate the .gitignore entries to a single .gsd/ rule.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
parent b10b78cb75
commit 6e1e634251
8 changed files with 1 addition and 637 deletions
7
.gitignore
vendored
@@ -38,9 +38,4 @@ AGENTS.md
 TODOS.md
 
 # ── GSD baseline (auto-generated) ──
-.gsd/activity/
-.gsd/runtime/
-.gsd/worktrees/
-.gsd/auto.lock
-.gsd/metrics.json
-.gsd/STATE.md
+.gsd/
@@ -1,33 +0,0 @@
# Decisions Register

<!-- Append-only. Never edit or remove existing rows.
To reverse a decision, add a new row that supersedes it.
Read this file at the start of any planning or research phase. -->

| # | When | Scope | Decision | Choice | Rationale | Revisable? |
|---|------|-------|----------|--------|-----------|------------|
| D001 | M001 | arch | Embedding strategy | SDK (`createAgentSession` + `InteractiveMode`) | Type-safe, no subprocess management, full control over storage/resources, cleanest branded app path per pi docs | No |
| D002 | M001 | arch | State storage location | `~/.gsd/` (agent: `~/.gsd/agent/`, sessions: `~/.gsd/sessions/`) | Complete isolation from `~/.pi/`, clear brand identity, follows pi doc recommendation for branded apps | No |
| D003 | M001 | arch | Branding mechanism | `PI_PACKAGE_DIR` env var set before pi internals load, pointing to gsd package root; gsd `package.json` declares `piConfig: { name: "gsd", configDir: ".gsd" }` | `config.js` reads `APP_NAME` from `piConfig.name` in the package.json found at `PI_PACKAGE_DIR`. Only mechanism that renames the TUI header without patching pi source. | Yes — if pi adds a dedicated `createAgentSession` appName option |
| D004 | M001 | arch | Extension delivery | Copy extension `.ts` source into `src/resources/extensions/` at dev time; load via `DefaultResourceLoader.additionalExtensionPaths`; pi's jiti handles JIT compilation at runtime | Preserves pi's JIT compilation model, no separate build step for extensions, extensions stay readable source | Yes — if extension count grows large enough to warrant pre-compilation |
| D005 | M001 | scope | Skills in M001 | Excluded — extensions only | User decision during discussion | Yes — M002 candidate |
| D006 | M001 | scope | Plugin/install system | Deferred | Not MVP; bundled-only product for M001 | Yes — M002 candidate |
| D007 | M001 | arch | pi interop | None — GSD never reads or writes `~/.pi/` | GSD is a product, not a pi config. Interop would blur the brand boundary. | No |
| D008 | M001/S01 | verification | S01 verification strategy | Shell commands + real TTY launch (no test framework) | S01 is a pure binary launch / TUI branding check. The only meaningful assertion is whether the binary launches with "gsd" in the header — no unit-testable logic to isolate. Shell verification commands cover all must-haves. Test framework deferred to S02+ if needed. | Yes — add test framework in S02 if extension loading logic warrants it |
| D009 | M001/S01 | arch | `files` array in package.json | Set in T03 during S01 (`["dist", "package.json", "README.md"]`) | Correct npm publish manifest must be in place before S04 pack/publish. Setting it early avoids a late-stage surprise. | No |
| D010 | M001/S01/T02 | impl | ModelRegistry instantiation | Constructor `new ModelRegistry(authStorage)` — not a static factory | SDK types show no `.create()` on ModelRegistry; authStorage is passed directly to constructor. All other managers (AuthStorage, SettingsManager, SessionManager) use static `.create()` but synchronously. | No |
| D011 | M001/S01/T02 | impl | InteractiveMode.run() | Instance method: `new InteractiveMode(session); mode.run()` — not static | SDK type declarations confirm `run()` is an instance method; static call would fail at runtime. | No |
| D012 | M001/S01/T02 | impl | skipLibCheck in tsconfig | `skipLibCheck: true` added | `@google/genai` published types reference `@modelcontextprotocol/sdk` which is not installed as a type dep — causes transitive TS2307 error unrelated to gsd code. skipLibCheck is the standard fix for third-party type declaration issues. | Yes — remove if MCP types are added as a dep in the future |
| D013 | M001/S01/T03 | arch | `PI_PACKAGE_DIR` shim directory (`pkg/`) | Added `pkg/` dir with `package.json` (piConfig) + `dist/modes/interactive/theme/` (pi theme JSONs) as the `PI_PACKAGE_DIR` target | `config.js::getThemesDir()` uses `getPackageDir()` (= PI_PACKAGE_DIR) and checks if `<dir>/src` exists; if yes, uses `src/modes/interactive/theme/` instead of `dist/`. Our project has a real `src/` dir, causing themes to resolve to the wrong path. Pointing PI_PACKAGE_DIR at `pkg/` (which has no `src/`) avoids the collision while still providing `piConfig` for branding. `pkg/dist/modes/interactive/theme/` is populated by `npm run copy-themes` (build script). | Yes — if pi adds a dedicated `appName` option to createAgentSession making PI_PACKAGE_DIR unnecessary |
| D014 | M001/S02 | verification | S02 verification strategy | Shell commands + real TTY launch with stderr capture, no test framework | Extension loading is a runtime integration concern — no unit-testable logic to isolate. The meaningful assertions are: zero extension errors in stderr on launch, correct env vars in compiled loader.js, absence of `~/.pi/` refs in patched files. Shell commands cover all must-haves. Test framework deferred per D008. | Yes — add test framework if extension loading logic grows complex |
| D015 | M001/S02 | arch | subagent spawn approach | `spawn(process.execPath, [GSD_BIN_PATH, ...extensionArgs, ...args])` — no `pi` binary in PATH | Patched subagent spawns node directly with the gsd dist/loader.js entrypoint. This ensures spawned subagents always use the bundled gsd extensions, regardless of what `pi` is in PATH. `GSD_BIN_PATH` = `process.argv[1]` from loader.ts. | Yes — if pi adds a native subagent spawn API |
| D016 | M001/S02 | arch | shared/ is a library, not an extension entry point | `shared/` is NOT added to `additionalExtensionPaths` | `shared/ui.ts`, `shared/next-action-ui.ts` etc. are cross-extension imports, not independently registered extensions. They are discovered by jiti when gsd and ask-user-questions imports them via `../shared/*.js`. Adding shared/ as an extension entry point would attempt to register it as an extension (which it isn't). | No |
| D017 | M001/S02 | arch | AGENTS.md first-run write | `initResources()` writes bundled AGENTS.md to `~/.gsd/agent/AGENTS.md` on first launch | pi's `loadProjectContextFiles` discovers AGENTS.md from `agentDir` (`~/.gsd/agent/`). On fresh install this file doesn't exist. One-time write on launch (behind existsSync check) ensures spawned subagents always pick up GSD's hard rules and execution heuristics. | No |
| D018 | M001/S03 | arch | Wizard injection point | Pre-session: before `createAgentSession()`, not via `session_start` event hook | Running wizard before `createAgentSession()` ensures Anthropic key is in `authStorage` before `modelRegistry.getAvailable()` runs — avoids "No models available" fallback warning. S01 forward intelligence mentioned session_start hook; pre-session approach is strictly better because the session starts clean with a valid model. | Yes — if pi adds a native `beforeStart` or `authMissing` hook to `createAgentSession` |
| D019 | M001/S03 | verification | S03 verification strategy | Shell script (`scripts/verify-s03.sh`) for automated non-TTY/skip checks + interactive UAT for masked input and TUI launch | Wizard involves TTY interaction that cannot be meaningfully automated (masked stdin, TUI launch). Automated shell script covers all non-interactive assertions (exit codes, error text, env hydration). Interactive UAT covers the remaining visual/interactive behaviors. No test framework added — consistent with D008/D014. | Yes — add test framework if wizard logic grows complex |
| D020 | M001/S03 | arch | Wizard scope | Optional tool keys only (Brave/Context7/Jina) — Anthropic auth is pi's responsibility via OAuth | Wizard collecting Anthropic key was redundant (pi already handles it) and interfered with verify script automation. Optional-key scope satisfies R006. | Yes — if pi adds a native "no Anthropic key" callback hook |
| D021 | M001/S04 | arch | GSD_BUNDLED_EXTENSION_PATHS target | agentDir-based paths, not src/resources paths | When subagent spawns a child gsd process via --extension flags, the child also runs initResources + buildResourceLoader from agentDir. src/resources paths ≠ agentDir paths → pi deduplication fails → duplicate tool registration errors. Pointing to agentDir paths means both the --extension args and agentDir scan resolve identically → deduplication works. Safe because subagent spawning only happens after initResources has synced on first launch. | No |
| D022 | M001/S04 | verification | S04 verification strategy | 10-check `scripts/verify-s04.sh` for tarball install path; registry publish check automated; interactive UAT for wizard fire from clean install | Tarball install + launch is automatable (env isolation, background kill). Registry install check is automatable (prefix install + stderr check). Wizard TTY interaction is UAT-only. Consistent with D008/D014/D019 — shell scripts, no test framework. | Yes — add test framework if automated E2E is needed later |
| D023 | M003 | arch | Test flow execution model | Intent-based YAML specs, not deterministic scripts — agent interprets verify blocks with full adaptive intelligence | Evaluated Maestro (JVM dep, deterministic scripting, mobile-first) and decided against embedding or cloning it. GSD's advantage is AI-in-the-loop. Flows describe what to verify; the agent decides how. Faster iteration, better flakiness handling, plays to GSD's strength. | Yes — could add deterministic fast-path for simple assertions later |
| D024 | M003 | arch | Test browser isolation | test-flows runs its own Playwright instance, separate from browser-tools | Test execution must not be polluted by development browser state (cookies, auth, DOM mutations). Two Playwright instances in one process is supported. Keeps test-flows extension fully decoupled from browser-tools. | No |
| D025 | M003 | arch | Maestro integration | Not embedded — optional external tool if user installs it | Maestro requires JVM, adds ~200MB+ footprint, its YAML format is deterministic scripts not intent specs. GSD builds its own testing arm. Maestro MCP could be wired in later as an optional extension for users who want it. | Yes — could add maestro MCP wrapper extension later |
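
Editor's sketch (not part of the deleted file): rows D015 and D021 together describe how a child gsd process is launched. The following is a minimal illustration of that argument assembly, assuming the colon-delimited `GSD_BUNDLED_EXTENSION_PATHS` format and the `--extension` flag named in the register. The helper name `buildSubagentArgv` and the example paths are hypothetical, and this is not the project's actual code.

```typescript
// Hypothetical helper (not from the gsd source): assembles the argument list
// for a child gsd process, following the shape recorded in D015 and D021.
export function buildSubagentArgv(
  binPath: string,      // value of GSD_BIN_PATH, i.e. the dist/loader.js entrypoint
  bundledPaths: string, // value of GSD_BUNDLED_EXTENSION_PATHS, colon-delimited
  args: string[],       // arguments forwarded to the child
): string[] {
  const extensionArgs: string[] = [];
  for (const p of bundledPaths.split(":")) {
    // Skip empty segments so an unset or empty variable adds no flags.
    if (p.length > 0) {
      extensionArgs.push("--extension", p);
    }
  }
  return [binPath, ...extensionArgs, ...args];
}

// Example with placeholder paths (assumptions, chosen only for illustration).
const exampleArgv = buildSubagentArgv(
  "/opt/gsd/dist/loader.js",
  "/home/me/.gsd/agent/extensions/gsd:/home/me/.gsd/agent/extensions/subagent",
  ["placeholder-arg"],
);
console.log(exampleArgv);
```

Under D015 the resulting array would be the second argument to `spawn(process.execPath, argv)`, so the child runs under the same Node binary rather than whatever `pi` happens to be on PATH.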
@@ -1,37 +0,0 @@
# Project

## What This Is

GSD 2.0 is a branded npm CLI (`npm install -g gsd-pi`) that ships the full GSD coding agent experience as a standalone product. It embeds `@mariozechner/pi-coding-agent` via SDK, stores state in `~/.gsd/`, bundles the GSD extension, all supporting extensions, agents, and AGENTS.md context, and runs pi's `InteractiveMode` under the `gsd` brand. Users run `gsd` — not `pi`.

## Core Value

A single `npm install -g gsd-pi` gives any developer a fully configured, GSD-branded coding agent with the GSD extension, all supporting tools (browser, search, context7, subagent, bg-shell, etc.), and a first-run setup wizard that collects API keys — ready to use in under two minutes.

## Current State

M001 complete. `gsd-pi` published to npm (v2.3.7). `npm install -g gsd-pi` installs a working `gsd` binary that launches with GSD ASCII art branding, loads all 11 bundled extensions without errors, stores state in `~/.gsd/`, and runs the first-run wizard for optional API keys. All 9 M001 requirements validated. M002 (Branded Installer & Onboarding Experience) is in progress — S01 complete, S02-S03 planned.

Key structural artifact: `pkg/` shim directory — `PI_PACKAGE_DIR` points here (not project root) to avoid pi's `getThemesDir()` collision with our real `src/` dir. Committed; `pkg/dist/modes/interactive/theme/` populated by `npm run copy-themes` at build time.

## Architecture / Key Patterns

- **SDK embedding**: `@mariozechner/pi-coding-agent` imported as a library via `createAgentSession` + `InteractiveMode`
- **Branded app directories**: state lives in `~/.gsd/agent/`, sessions in `~/.gsd/sessions/` (constants in `src/app-paths.ts`)
- **Branding via `PI_PACKAGE_DIR`**: env var set in `src/loader.ts` before any pi SDK loads; points to `pkg/` shim; `pkg/package.json` declares `piConfig: { name: "gsd", configDir: ".gsd" }`
- **Two-file loader pattern**: `loader.ts` (sets env vars, zero SDK imports, dynamic-imports `cli.js`) → `cli.ts` (static SDK imports, wires all managers)
- **pkg/ shim**: lean subdirectory — only `package.json` (piConfig) and `dist/modes/interactive/theme/` (pi theme assets). No `src/`. Avoids `getThemesDir()` src-check collision.
- **Bundled extensions**: GSD extension + 10 supporting extensions in `src/resources/extensions/`; loaded via `buildResourceLoader()` → `DefaultResourceLoader.additionalExtensionPaths`; all 11 load clean on launch
- **Bundled agents + AGENTS.md**: scout, researcher, worker in `src/resources/agents/`; `initResources()` writes bundled AGENTS.md to `~/.gsd/agent/` on first launch (existsSync guard)
- **4 GSD_ env vars**: set in loader.ts before cli.js loads — `GSD_CODING_AGENT_DIR`, `GSD_BIN_PATH`, `GSD_WORKFLOW_PATH`, `GSD_BUNDLED_EXTENSION_PATHS`
- **First-run wizard**: `src/wizard.ts` — detects missing optional keys (Brave/Context7/Jina), prompts with masked TTY input, writes to `~/.gsd/agent/auth.json`; `loadStoredEnvKeys` hydrates env on every launch before extensions load
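
Editor's sketch (not part of the deleted file): the two-file loader pattern and the env vars listed above can be pictured as a pure function that computes the environment `loader.ts` would set before dynamically importing `cli.js`. The variable names come from the list above. The concrete values, in particular the `GSD_WORKFLOW_PATH` file name and the `extensions/` layout under the agent directory, are assumptions made only for illustration, as is the function name `buildLoaderEnv`.

```typescript
// Hypothetical sketch of the environment a loader could prepare.
// Variable names are from the document; the values are illustrative guesses.
type LoaderEnv = Record<string, string>;

export function buildLoaderEnv(opts: {
  packageRoot: string;      // gsd package root, which contains the pkg/ shim
  homeDir: string;          // user's home directory
  binPath: string;          // process.argv[1] in the real loader (per D015)
  extensionNames: string[]; // names of bundled extensions
}): LoaderEnv {
  const agentDir = `${opts.homeDir}/.gsd/agent`;
  return {
    // Points at the shim, not the project root, so no src/ directory is found.
    PI_PACKAGE_DIR: `${opts.packageRoot}/pkg`,
    GSD_CODING_AGENT_DIR: agentDir,
    GSD_BIN_PATH: opts.binPath,
    // Assumed file name; the source does not say what this path points to.
    GSD_WORKFLOW_PATH: `${agentDir}/WORKFLOW.md`,
    // Colon-delimited, agentDir-based paths (per D021); layout is assumed.
    GSD_BUNDLED_EXTENSION_PATHS: opts.extensionNames
      .map((name) => `${agentDir}/extensions/${name}`)
      .join(":"),
  };
}

const exampleEnv = buildLoaderEnv({
  packageRoot: "/opt/gsd",
  homeDir: "/home/me",
  binPath: "/opt/gsd/dist/loader.js",
  extensionNames: ["gsd", "subagent"],
});
console.log(exampleEnv);
```

In a loader following this pattern, the map would be copied onto `process.env` first, and only then would `await import("./cli.js")` run, so that no pi SDK module is evaluated before `PI_PACKAGE_DIR` is set.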

## Capability Contract

See `.gsd/REQUIREMENTS.md` for the explicit capability contract, requirement status, and coverage mapping.

## Milestone Sequence

- [x] M001: MVP CLI — `npm install -g gsd-pi` installs, launches, and runs with all bundled extensions and first-run setup
- [ ] M002: Branded Installer & Onboarding Experience — ASCII logo, postinstall banner, unified onboarding wizard
- [ ] M003: AI-Driven Test Flows — intent-based YAML test specs the agent writes during development and executes autonomously at UAT time (browser, mac, api targets)
@@ -1,7 +0,0 @@
# Queue

<!-- Append-only log of queued milestones. -->

| # | Queued | Milestone | Title | Depends On | Notes |
|---|--------|-----------|-------|------------|-------|
| 1 | 2026-03-11 | M003 | AI-Driven Test Flows | M001 (bundled extension infrastructure) | Intent-based YAML test specs — browser, mac, api targets — with flow-driven UAT type for autonomous execution at slice completion |
@@ -1,205 +0,0 @@
# Requirements

This file is the explicit capability and coverage contract for GSD 2.0.

## Active

### R001 — Single-command install

- Class: primary-user-loop
- Status: validated
- Description: `npm install -g gsd-pi` installs the gsd CLI and all bundled resources in a single command with no additional manual steps required
- Why it matters: The whole product promise is zero-friction install. If install requires manual steps, the product fails its core pitch.
- Source: user
- Primary owning slice: M001/S01
- Supporting slices: M001/S04
- Validation: S04 — npm install -g gsd-pi from registry installs working binary; zero extension load errors; R001 fully validated

### R002 — Branded identity

- Class: differentiator
- Status: validated
- Description: The CLI is named `gsd`, state lives in `~/.gsd/`, the TUI header shows "gsd", and no pi branding is visible to the user in normal operation
- Why it matters: GSD 2.0 is a product, not a pi config. Users should experience a coherent branded tool.
- Source: user
- Primary owning slice: M001/S01
- Supporting slices: none
- Validation: S01 — TUI header confirmed "gsd" via live runtime launch; piConfig.name=gsd, piConfig.configDir=.gsd verified; ~/.gsd/ confirmed created

### R003 — Bundled GSD extension

- Class: core-capability
- Status: validated
- Description: The `/gsd` command, auto-mode, GSD dashboard (Ctrl+Alt+G), and all GSD workflow commands work out of the box with no additional configuration
- Why it matters: The GSD extension is the primary reason users install this tool.
- Source: user
- Primary owning slice: M001/S02
- Supporting slices: none
- Validation: S02 — gsd extension loads without errors on launch (zero stderr extension errors confirmed); interactive /gsd command use deferred to S04 UAT

### R004 — Bundled supporting extensions

- Class: core-capability
- Status: validated
- Description: All extensions from `~/.pi/agent/extensions/` ship bundled: browser-tools, search-the-web, context7, subagent, bg-shell, worktree, plan-mode, slash-commands, ask-user-questions, get-secrets-from-user
- Why it matters: These extensions are what make the agent useful as a coding agent. GSD without browser tools, web search, and subagent is significantly less capable.
- Source: user
- Primary owning slice: M001/S02
- Supporting slices: none
- Validation: S02 — all 10 supporting extensions load without errors (zero stderr extension errors on launch); functional tool use (browser launch, web search) deferred to S04 UAT

### R005 — Bundled agents and AGENTS.md

- Class: core-capability
- Status: validated
- Description: The scout, researcher, and worker agents are bundled and available. The AGENTS.md hard rules and execution heuristics are loaded as the default agent context.
- Why it matters: Agents and AGENTS.md define how the model behaves. Without them, subagent delegation and model discipline don't work.
- Source: user
- Primary owning slice: M001/S02
- Supporting slices: none
- Validation: S02 — scout.md, researcher.md, worker.md present in src/resources/agents/; AGENTS.md (15,070 bytes) written to ~/.gsd/agent/ on first launch via initResources()

### R006 — First-run setup wizard

- Class: launchability
- Status: validated
- Description: On first run, if optional tool API keys (Brave, Context7, Jina) are missing, a wizard prompts for them with masked input. Keys are stored in `~/.gsd/agent/auth.json` and hydrated into process.env on every launch. Wizard does not run on subsequent starts if keys are already configured. Anthropic auth is handled by pi's OAuth/API key flow — not the wizard.
- Why it matters: Without API keys, nothing works. A wizard that detects and collects missing keys turns a broken first run into a successful one.
- Source: user
- Primary owning slice: M001/S03
- Supporting slices: none
- Validation: S03 — automated verify script (6/6 pass) + interactive UAT; wizard fires for missing optional keys, stores them, TUI launches, rerun skips wizard
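
Editor's sketch (not part of the deleted file): R006 describes two steps, hydrating stored keys into the environment and then deciding whether the wizard needs to run. A minimal model of that decision is shown below. The env var names and provider labels appear elsewhere in this commit; the flat name-to-value shape of the stored keys, the rule that an already-set variable wins over a stored one, and the function names are assumptions, not the project's implementation.

```typescript
// Optional tool keys and the provider labels used in warnings.
const OPTIONAL_KEYS: Array<[string, string]> = [
  ["BRAVE_API_KEY", "Brave Search"],
  ["CONTEXT7_API_KEY", "Context7"],
  ["JINA_API_KEY", "Jina"],
];

type Env = Record<string, string | undefined>;

// Copies stored keys into env. Assumption: a value already present in env
// takes precedence over the stored one.
export function hydrateEnv(stored: Record<string, string>, env: Env): void {
  for (const name of Object.keys(stored)) {
    if (!env[name]) {
      env[name] = stored[name];
    }
  }
}

// Returns the labels of providers whose key is still missing after hydration.
// An empty result means the wizard can be skipped.
export function missingProviders(env: Env): string[] {
  return OPTIONAL_KEYS.filter((entry) => !env[entry[0]]).map((entry) => entry[1]);
}

// Example: one key set in the shell, one key stored from an earlier run.
const demoEnv: Env = { BRAVE_API_KEY: "from-shell" };
hydrateEnv({ JINA_API_KEY: "from-auth-json" }, demoEnv);
console.log(missingProviders(demoEnv)); // only Context7 remains missing
```

Returning provider labels rather than a boolean also serves R009: the same list can be printed in the non-interactive warning so the message names what is missing.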

### R007 — Isolated state in ~/.gsd/

- Class: quality-attribute
- Status: validated
- Description: All GSD state (auth, sessions, settings, logs) lives in `~/.gsd/`, completely separate from `~/.pi/`. Installing gsd must not modify or read a user's existing pi configuration.
- Why it matters: Users may have an existing pi installation. GSD must not corrupt or interfere with it.
- Source: inferred
- Primary owning slice: M001/S01
- Supporting slices: none
- Validation: S01 — ~/.gsd/agent/ and ~/.gsd/sessions/ created after launch; ~/.pi/agent/sessions/ count unchanged (28/28) before and after gsd run

### R008 — npm update workflow

- Class: continuity
- Status: validated
- Description: `npm update -g gsd-pi` installs a new version with updated bundled resources. The update is clean — no stale extension files from old versions.
- Why it matters: Software that can't update cleanly accumulates technical debt and breaks silently.
- Source: user
- Primary owning slice: M001/S04
- Supporting slices: none
- Validation: S04 — cpSync force:true in initResources ensures npm update -g replaces bundled resources; tarball smoke test confirms clean install path

### R009 — Observable failure state

- Class: failure-visibility
- Status: validated
- Description: If optional tool API keys are missing in a non-interactive run, the warning is actionable: it names the missing providers. Extension load failures are surfaced, not silently swallowed.
- Why it matters: Silent failures are debugging nightmares. A future agent or user must be able to localize what broke without guessing.
- Source: inferred
- Primary owning slice: M001/S03
- Supporting slices: M001/S02
- Validation: S03 — non-TTY warning names all three missing providers (Brave Search, Context7, Jina); cat ~/.gsd/agent/auth.json shows stored state; extension load failure surface from S02 confirmed intact

### R010 — Test flow execution

- Class: core-capability
- Status: active
- Description: The agent can write YAML test specifications during development and execute them against browser, mac, and api targets via `run_test_flow` and `run_test_suite` tools. Flows use intent-based verification blocks (verify/given/expect) that the agent interprets adaptively. Browser tests run in a fresh isolated Playwright session.
- Why it matters: Closes the gap between "agent builds a feature" and "agent proves it works" — durable, re-runnable test artifacts that survive context wipes.
- Source: user
- Primary owning slice: M003 (TBD)
- Supporting slices: none
- Validation: unmapped

### R011 — Flow-driven UAT

- Class: core-capability
- Status: active
- Description: GSD auto-mode recognizes `flow-driven` as a UAT type. At slice completion, the UAT pipeline automatically executes all flow files in the slice's `flows/` directory and writes structured pass/fail results to the UAT result file.
- Why it matters: Makes UAT fully autonomous for slices with test flows — no human intervention needed for UI/API verification.
- Source: user
- Primary owning slice: M003 (TBD)
- Supporting slices: none
- Validation: unmapped

## Deferred

### R020 — Plugin system

- Class: differentiator
- Status: deferred
- Description: Allow users to install additional pi packages on top of GSD via `gsd install npm:pkg`
- Why it matters: Makes GSD extensible beyond what ships in the box
- Source: inferred
- Primary owning slice: none
- Supporting slices: none
- Validation: unmapped
- Notes: Deferred — M001 ships bundled-only. Plugin support is explicitly post-MVP.

### R021 — Skills bundle

- Class: core-capability
- Status: deferred
- Description: Ship the skills from `~/.pi/agent/skills/` as bundled GSD skills
- Why it matters: Skills provide specialized workflows
- Source: user
- Primary owning slice: none
- Supporting slices: none
- Validation: unmapped
- Notes: User explicitly excluded skills from M001. Can add in M002.

## Out of Scope

### R030 — pi compatibility / interoperability

- Class: anti-feature
- Status: out-of-scope
- Description: GSD does not read from or write to `~/.pi/`. There is no migration from pi to gsd. No `pi install npm:gsd` target.
- Why it matters: Prevents scope confusion. GSD is a product, not a pi extension.
- Source: user
- Primary owning slice: none
- Supporting slices: none
- Validation: n/a
- Notes: Explicitly out of scope by architecture decision.

### R031 — Web/desktop UI

- Class: constraint
- Status: out-of-scope
- Description: GSD 2.0 is terminal-only. No web UI, no Electron wrapper, no RPC mode.
- Why it matters: Keeps scope focused on the CLI product.
- Source: inferred
- Primary owning slice: none
- Supporting slices: none
- Validation: n/a
- Notes: `pi-web-ui` and RPC mode explicitly not used.

## Traceability

| ID | Class | Status | Primary owner | Supporting | Proof |
| ---- | ------------------ | ------------ | ------------- | ---------- | -------- |
| R001 | primary-user-loop | validated | M001/S01 | M001/S04 | S04 — npm install -g gsd-pi from registry; zero extension errors; binary confirmed |
| R002 | differentiator | validated | M001/S01 | none | S01 — TUI shows "gsd", piConfig confirmed, ~/.gsd/ confirmed |
| R003 | core-capability | validated | M001/S02 | none | S02 — gsd extension loads clean; interactive /gsd use deferred to S04 |
| R004 | core-capability | validated | M001/S02 | none | S02 — all 10 supporting extensions load without errors; functional use deferred to S04 |
| R005 | core-capability | validated | M001/S02 | none | S02 — agents present; AGENTS.md (15,070 bytes) written to ~/.gsd/agent/ on first launch |
| R006 | launchability | validated | M001/S03 | none | S03 — optional-key wizard fires, stores, skips on rerun |
| R007 | quality-attribute | validated | M001/S01 | none | S01 — ~/.gsd/ created; ~/.pi/ sessions unchanged (28/28) |
| R008 | continuity | validated | M001/S04 | none | S04 — cpSync force:true; tarball smoke confirms clean install path |
| R009 | failure-visibility | validated | M001/S03 | M001/S02 | S03 — non-TTY warning names missing providers; extension errors surface confirmed |
| R020 | differentiator | deferred | none | none | unmapped |
| R021 | core-capability | deferred | none | none | unmapped |
| R010 | core-capability | active | M003 (TBD) | none | unmapped |
| R011 | core-capability | active | M003 (TBD) | none | unmapped |
| R030 | anti-feature | out-of-scope | none | none | n/a |
| R031 | constraint | out-of-scope | none | none | n/a |

## Coverage Summary

- Active requirements: 11
- Mapped to slices: 9
- Validated: 9 (R001, R002, R003, R004, R005, R006, R007, R008, R009)
- Unmapped active requirements: 2 (R010, R011 — pending M003 planning)
@@ -1,20 +0,0 @@
# GSD State

**Active Milestone:** M002 — Branded Installer & Onboarding Experience
**Active Slice:** S02 — ASCII logo in postinstall + first-launch banner
**Phase:** executing
**Requirements Status:** 11 active · 0 validated · 2 deferred · 2 out of scope

## Milestone Registry
- ✅ **M001:** GSD 2.0 MVP CLI
- 🔄 **M002:** Branded Installer & Onboarding Experience
- ⬜ **M003:** M003

## Recent Decisions
- None recorded

## Blockers
- None

## Next Action
Execute T01: Create shared logo module and wire into postinstall + loader in slice S02.
@@ -1,196 +0,0 @@
---
id: M001
provides:
  - gsd-pi npm package (published, unscoped) — single-command install of the full GSD coding agent
  - gsd binary with "gsd" TUI branding, state in ~/.gsd/, ~/.pi/ untouched
  - 11 bundled extensions (gsd, browser-tools, search-the-web, context7, subagent, bg-shell, worktree, plan-mode, slash-commands, ask-user-questions, get-secrets-from-user)
  - Bundled agents (scout, researcher, worker) + AGENTS.md auto-deployed to ~/.gsd/agent/
  - First-run setup wizard (optional keys: Brave/Context7/Jina) with masked TTY input
  - pkg/ shim directory for PI_PACKAGE_DIR branding mechanism
  - Two-file loader pattern (loader.ts → cli.ts) with 4 GSD_ env vars
  - resource-loader.ts wiring all extensions via DefaultResourceLoader.additionalExtensionPaths
key_decisions:
  - D001: SDK embedding via createAgentSession + InteractiveMode (not subprocess)
  - D002: State in ~/.gsd/ for complete isolation from ~/.pi/
  - D003: PI_PACKAGE_DIR branding mechanism — set before pi internals load
  - D004: Extension delivery — copy .ts source, pi's jiti handles JIT compilation
  - D013: pkg/ shim directory — avoids getThemesDir() src-check collision
  - D015: subagent spawns process.execPath + GSD_BIN_PATH (not "pi" binary)
  - D017: AGENTS.md first-run write with existsSync guard
  - D018: Wizard injection point is pre-session (before createAgentSession)
  - D020: Wizard scope is optional keys only — Anthropic auth is pi's responsibility
  - D021: GSD_BUNDLED_EXTENSION_PATHS uses agentDir-based paths to prevent double-load
  - D023: Published as gsd-pi (unscoped) — @glittercowboy scope not provisioned on npm
patterns_established:
  - Two-file loader pattern: loader.ts (sets env, dynamic-imports) → cli.ts (static SDK imports)
  - pkg/ shim directory with piConfig and theme assets — PI_PACKAGE_DIR target with no src/ subdir
  - import.meta.url + fileURLToPath for module-relative resource paths
  - GSD_ env vars set in loader.ts before cli.js dynamic import
  - Pre-session auth gate: loadStoredEnvKeys → runWizardIfNeeded → initResources → createAgentSession
  - GSD_BUNDLED_EXTENSION_PATHS colon-delimited for subagent --extension args
  - process.execPath + GSD_BIN_PATH for spawning child gsd processes
  - existsSync guard on first-run resource writes to prevent overwriting user customizations
  - npm run copy-themes populates pkg/dist/modes/interactive/theme/ from node_modules at build time
observability_surfaces:
  - "TUI launch: (node dist/loader.js & sleep 4; kill $!) 2>&1 — GSD ASCII art + version confirms branding"
  - "Extension errors: (node dist/loader.js & sleep 6; kill $!) 2>&1 | grep 'Extension load error' — zero matches = all clean"
  - "State isolation: ls ~/.gsd/ — agent/, sessions/ present; ls ~/.pi/agent/sessions/ — count unchanged"
  - "Registry health: npm view gsd-pi — shows version, dist-tags, maintainer"
  - "Wizard behavior: BRAVE_API_KEY= CONTEXT7_API_KEY= JINA_API_KEY= node dist/loader.js < /dev/null 2>&1 — surfaces warning"
  - "Env vars: grep GSD_ dist/loader.js — confirms all 4 env vars set"
  - "Verify scripts: bash scripts/verify-s03.sh (6 checks), bash scripts/verify-s04.sh (10 checks)"
requirement_outcomes:
  - id: R001
    from_status: active
    to_status: validated
    proof: "S04 — npm install -g gsd-pi from registry installs working binary; zero extension load errors on launch"
  - id: R002
    from_status: active
    to_status: validated
    proof: "S01 — TUI header confirmed 'gsd' via live runtime launch; piConfig.name=gsd, piConfig.configDir=.gsd verified; ~/.gsd/ created"
  - id: R003
    from_status: active
    to_status: validated
    proof: "S02 — gsd extension loads without errors on launch (zero stderr extension errors confirmed)"
  - id: R004
    from_status: active
    to_status: validated
    proof: "S02 — all 10 supporting extensions load without errors on launch; confirmed via stderr capture"
  - id: R005
    from_status: active
    to_status: validated
    proof: "S02 — agents in src/resources/agents/; AGENTS.md (15,070 bytes) written to ~/.gsd/agent/ on first launch"
  - id: R006
    from_status: active
    to_status: validated
    proof: "S03 — automated verify script 6/6 pass + interactive UAT; wizard fires, stores keys, skips on rerun"
  - id: R007
    from_status: active
    to_status: validated
    proof: "S01 — ~/.gsd/ created; ~/.pi/agent/sessions/ count unchanged (28/28 before and after gsd launch)"
  - id: R008
    from_status: active
    to_status: validated
    proof: "S04 — cpSync force:true in initResources ensures update replaces bundled resources; tarball smoke confirms clean path"
  - id: R009
    from_status: active
    to_status: validated
    proof: "S03 — non-TTY warning names missing providers; S02 — extension load errors surface to stderr"
duration: ~5 hours across 4 slices (S01 ~1h, S02 ~75min, S03 ~45min, S04 ~3h)
verification_result: passed
completed_at: 2026-03-11
---

# M001: GSD 2.0 MVP CLI
|
||||
|
||||
**Single-command `npm install -g gsd-pi` installs a fully branded GSD coding agent with 11 bundled extensions, agents, first-run wizard, and state isolation — all 9 requirements validated.**
|
||||
|
||||
## What Happened
|
||||
|
||||
Four slices built the complete GSD 2.0 MVP CLI from scratch:
|
||||
|
||||
**S01 (CLI Scaffold and Branding)** established the binary architecture. The key discovery was that pi's `config.js::getThemesDir()` checks for a `src/` subdirectory at the `PI_PACKAGE_DIR` target — since the project has a real `src/`, this caused theme resolution to fail. The fix was the `pkg/` shim directory: a lean subdirectory containing only `package.json` (with piConfig) and theme assets, with no `src/` to trigger the collision. The two-file loader pattern (`loader.ts` sets env vars and dynamic-imports `cli.ts`) ensures `PI_PACKAGE_DIR` is set before any pi SDK code evaluates. After S01, the binary launched with "gsd" in the TUI header and state wrote to `~/.gsd/`.
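The ordering constraint behind the two-file loader can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual `loader.ts`: the names `resolvePkgDir` and `launch`, the injected `importCli` callback, and the layout (compiled loader in `dist/` beside a sibling `pkg/`) are hypothetical.

```typescript
import { dirname, resolve } from "node:path";
import { fileURLToPath } from "node:url";

// Hypothetical helper: resolve the pkg/ shim relative to the loader module.
// Assumes the compiled loader lives in dist/ next to a sibling pkg/ directory.
function resolvePkgDir(loaderUrl: string): string {
  return resolve(dirname(fileURLToPath(loaderUrl)), "..", "pkg");
}

// Set the env var first, then load the CLI. A static import of the CLI would be
// evaluated before this assignment runs, which is why the entry point needs a
// dynamic import at this step.
async function launch(
  loaderUrl: string,
  env: Record<string, string | undefined>,
  importCli: () => Promise<unknown>,
): Promise<void> {
  env.PI_PACKAGE_DIR = resolvePkgDir(loaderUrl);
  await importCli();
}
```

In a real entry point this would presumably be invoked as `launch(import.meta.url, process.env, () => import("./cli.js"))`; passing the importer in as a callback only serves to make the ordering easy to check in isolation.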
|
||||
|
||||
**S02 (Bundle Extensions and Agents)** copied all 11 extension source trees into `src/resources/extensions/` and applied surgical patches to 6 files to eliminate hardcoded `~/.pi/` paths. The subagent extension was patched to spawn `process.execPath` with `GSD_BIN_PATH` instead of `spawn("pi", ...)`. A `resource-loader.ts` module wires all 11 extension entry points into `DefaultResourceLoader.additionalExtensionPaths`. `initResources()` writes AGENTS.md to `~/.gsd/agent/` on first launch behind an existsSync guard. All 11 extensions loaded without errors on launch.
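A rough sketch of how the patched subagent spawn might assemble its command line. The function names are hypothetical, and two details are assumptions rather than facts from the source: that `GSD_BIN_PATH` is passed as the script argument to the Node binary, and that each bundled path becomes its own `--extension` flag.

```typescript
// Split the colon-delimited list into repeated --extension arguments.
// Assumption: one flag per path; empty segments are ignored.
function extensionArgs(bundledPaths: string | undefined): string[] {
  const args: string[] = [];
  for (const p of (bundledPaths ?? "").split(":")) {
    if (p.length > 0) args.push("--extension", p);
  }
  return args;
}

// Build the command for a child gsd process: the current Node binary runs the
// entry script named by GSD_BIN_PATH, instead of looking up `pi` on PATH.
function subagentCommand(
  env: Record<string, string | undefined>,
  execPath: string,
  extraArgs: string[] = [],
): { command: string; args: string[] } {
  const binPath = env.GSD_BIN_PATH;
  if (!binPath) throw new Error("GSD_BIN_PATH is not set");
  return {
    command: execPath,
    args: [binPath, ...extensionArgs(env.GSD_BUNDLED_EXTENSION_PATHS), ...extraArgs],
  };
}
```

The result would be handed to `child_process.spawn(command, args)` with `process.execPath` as `execPath`. Keeping the argument construction pure makes it straightforward to confirm that the child receives the same extension paths as the parent.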
|
||||
|
||||
**S03 (First-run Setup Wizard)** built `wizard.ts` with masked TTY input for optional API keys (Brave, Context7, Jina). The critical scoping decision: Anthropic auth is pi's responsibility via OAuth — the wizard only handles optional tool keys. The wizard wires into `cli.ts` as a pre-session auth gate: `loadStoredEnvKeys` → `runWizardIfNeeded` → `initResources` → `createAgentSession`. This ensures env is fully hydrated before extensions load.
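The first two stages of that gate can be illustrated with a small sketch. It is not the contents of `wizard.ts`: `hydrateEnv` and `missingOptionalKeys` are hypothetical names, the stored-key shape (a flat map of env var names to values) is assumed, and so is the rule that an already-exported variable wins over a stored one.

```typescript
// Hypothetical shape: a flat map of env var names to stored key values.
type StoredKeys = Record<string, string>;

// Copy stored keys into env without overwriting values that are already set.
// Returns the names that were actually applied.
function hydrateEnv(
  stored: StoredKeys,
  env: Record<string, string | undefined>,
): string[] {
  const applied: string[] = [];
  for (const name of Object.keys(stored)) {
    const value = stored[name];
    if (!env[name] && value) {
      env[name] = value;
      applied.push(name);
    }
  }
  return applied;
}

// The optional tool keys the wizard covers (Brave, Context7, Jina).
const OPTIONAL_KEYS = ["BRAVE_API_KEY", "CONTEXT7_API_KEY", "JINA_API_KEY"];

// After hydration, whatever is still missing is what a wizard would prompt for.
function missingOptionalKeys(env: Record<string, string | undefined>): string[] {
  return OPTIONAL_KEYS.filter((name) => !env[name]);
}
```

Running hydration before the missing-key check is what lets the wizard skip on rerun: once keys are stored, the second stage finds nothing to ask for.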
|
||||
|
||||
**S04 (npm Publish and Install Smoke Test)** fixed a `GSD_BUNDLED_EXTENSION_PATHS` bug where the env var pointed to `src/resources/` paths instead of agentDir-based paths (causing subagent double-load). The package was initially published as `@glittercowboy/gsd` but the npm scope wasn't provisioned — switched to unscoped `gsd-pi` which resolved immediately. Registry install confirmed working with zero extension load errors.
|
||||
|
||||
## Cross-Slice Verification
|
||||
|
||||
Each success criterion from the roadmap was verified:
|
||||
|
||||
**`npm install -g gsd-pi` in a clean environment produces a working `gsd` binary:**
|
||||
- `npm view gsd-pi` returns v2.3.7 on the npm registry
|
||||
- S04 verified tarball install to an isolated prefix with successful launch
|
||||
- 10-check automated smoke test (`scripts/verify-s04.sh`) all passed
|
||||
|
||||
**`gsd` TUI header shows "gsd" — no pi branding visible in normal operation:**
|
||||
- Live launch of `node dist/loader.js` displays GSD ASCII art logo + "Get Shit Done v2.3.7"
|
||||
- `piConfig.name=gsd`, `piConfig.configDir=.gsd` confirmed via node eval
|
||||
- `PI_PACKAGE_DIR` confirmed pointing to `pkg/` in compiled `dist/loader.js`
|
||||
|
||||
**State lives in `~/.gsd/` — `~/.pi/` is untouched:**
|
||||
- `ls ~/.gsd/` shows `agent/`, `sessions/`, `preferences.md`
|
||||
- S01 verified `~/.pi/agent/sessions/` count unchanged (28/28) before and after gsd launch
|
||||
|
||||
**First-run wizard fires when API keys are missing, collects them, and stores them:**
|
||||
- S03 automated verify script: 6/6 checks passed (build, non-TTY warning, non-TTY no-exit-1, wizard skip, env hydration)
|
||||
- Interactive UAT confirmed masked input, key storage, wizard skip on rerun
|
||||
|
||||
**`/gsd` command is registered and responds correctly:**
|
||||
- gsd extension loads without errors (zero `Extension load error` matches in launch output)
|
||||
- Extension source includes `commands.ts` with `/gsd` command registration
|
||||
|
||||
**All bundled extensions load and their tools are available to the model:**
|
||||
- Launch test with stderr capture: zero extension load errors across all 11 extensions
|
||||
- `grep GSD_ dist/loader.js` shows 11 lines confirming all GSD_ env vars present
|
||||
|
||||
**`npm update -g gsd-pi` works cleanly on an existing install:**
|
||||
- `initResources()` uses `cpSync` with `force: true` for bundled resource updates
|
||||
- S04 tarball smoke test confirmed clean install over existing state
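The two write policies used by `initResources()` (always replace bundled resources, write user-editable files only once) can be contrasted in a short sketch. The helper names and the throwaway directory layout are hypothetical; only `cpSync` with `force: true`, the `existsSync` guard, and the `extensions/` subdirectory of the agent dir come from the text.

```typescript
import { cpSync, existsSync, mkdirSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Bundled resources are replaced on every run so upgrades take effect.
function syncBundledResources(bundledDir: string, agentDir: string): void {
  mkdirSync(agentDir, { recursive: true });
  cpSync(bundledDir, join(agentDir, "extensions"), { recursive: true, force: true });
}

// User-editable files are written once; an existing file is left untouched.
function writeOnce(filePath: string, contents: string): boolean {
  if (existsSync(filePath)) return false;
  writeFileSync(filePath, contents);
  return true;
}

// Usage against a throwaway directory (stand-in for ~/.gsd/agent/).
const root = mkdtempSync(join(tmpdir(), "gsd-sketch-"));
const bundled = join(root, "bundled");
mkdirSync(bundled);
writeFileSync(join(bundled, "index.ts"), "// v2");
const agentDir = join(root, "agent");
syncBundledResources(bundled, agentDir);
const agentsMd = join(agentDir, "AGENTS.md");
const firstWrite = writeOnce(agentsMd, "bundled rules");
const secondWrite = writeOnce(agentsMd, "newer bundled rules");
const agentsMdContents = readFileSync(agentsMd, "utf8");
```

The second `writeOnce` call is a no-op, which is the trade-off of the guard: user customizations survive, but an upgraded install keeps its original AGENTS.md.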
|
||||
|
||||
## Requirement Changes
|
||||
|
||||
- R001: active → validated — `npm install -g gsd-pi` from registry installs working binary with zero extension errors
|
||||
- R002: active → validated — TUI shows "gsd", piConfig confirmed, ~/.gsd/ created, ~/.pi/ untouched
|
||||
- R003: active → validated — gsd extension loads without errors on launch
|
||||
- R004: active → validated — all 10 supporting extensions load without errors on launch
|
||||
- R005: active → validated — agents in src/resources/agents/; AGENTS.md auto-deployed to ~/.gsd/agent/
|
||||
- R006: active → validated — optional-key wizard fires, stores, skips on rerun; scope narrowed to optional keys only (Anthropic handled by pi)
|
||||
- R007: active → validated — ~/.gsd/ created; ~/.pi/ sessions unchanged (28/28)
|
||||
- R008: active → validated — cpSync force:true ensures update replaces bundled resources; tarball smoke confirmed
|
||||
- R009: active → validated — non-TTY warning names missing providers; extension load errors surface to stderr
|
||||
|
||||
## Forward Intelligence
|
||||
|
||||
### What the next milestone should know
|
||||
- The package is `gsd-pi` on npm (unscoped), not `@glittercowboy/gsd`. The binary name is `gsd`.
|
||||
- `PI_PACKAGE_DIR` points to `pkg/` shim — any pi config resolution goes through this directory. If pi changes how `config.js` resolves piConfig or themes, this mechanism may break.
|
||||
- `GSD_BUNDLED_EXTENSION_PATHS` must match what `buildResourceLoader` discovers from agentDir. After `initResources()` syncs extensions to `~/.gsd/agent/extensions/`, subagent spawning uses these agentDir-based paths for `--extension` args.
|
||||
- `initResources()` writes AGENTS.md only once (existsSync guard). Existing installs won't get updated AGENTS.md content on upgrade unless the guard logic changes.
|
||||
- The wizard only handles optional tool keys (Brave/Context7/Jina). Anthropic auth is entirely pi's territory.
|
||||
- `loadStoredEnvKeys` runs on every launch from `cli.ts`, hydrating env from `auth.json` before extensions load.
|
||||
- Extensions are `.ts` source JIT-compiled by pi's jiti at runtime — not pre-compiled. Any TypeScript syntax jiti doesn't support will fail at load time (visible via stderr), not at build time.
|
||||
|
||||
### What's fragile
|
||||
- `pkg/` shim + PI_PACKAGE_DIR mechanism — relies on undocumented `config.js::getThemesDir()` behavior (src-check). Any pi update changing this logic breaks branding silently. Observable signal: ENOENT on dark.json at launch.
|
||||
- `dist/resource-loader.js` computes extension paths via `resolve(dirname(fileURLToPath(import.meta.url)), '..', 'src', 'resources', ...)` — correct for local dev but depends on `src/resources` being in the published `files` array.
|
||||
- `@mariozechner/pi-coding-agent` version pin (`^0.57.1`) — breaking changes in pi SDK will cascade to extension loading failures.
|
||||
- `skipLibCheck: true` in tsconfig masks transitive type errors from pi/google deps.
|
||||
- jiti JIT compilation of bundled `.ts` extensions — cutting-edge TS features may fail silently at load time.
|
||||
|
||||
### Authoritative diagnostics
|
||||
- `npm view gsd-pi` — canonical registry health check; confirms version and availability
|
||||
- `bash scripts/verify-s04.sh` — 10-check install regression suite; PASS/FAIL labeled per check
|
||||
- `bash scripts/verify-s03.sh` — 6-check wizard regression suite
|
||||
- `(node dist/loader.js & sleep 6; kill $!) 2>&1 | grep "Extension load error"` — zero lines = all extensions clean
|
||||
- `grep GSD_ dist/loader.js` — confirms env var presence and values
|
||||
- `ls pkg/dist/modes/interactive/theme/` — dark.json and light.json must exist; run `npm run copy-themes` to fix
|
||||
|
||||
### What assumptions changed
|
||||
- PI_PACKAGE_DIR → project root was wrong — `pkg/` shim required due to getThemesDir() src-check (D013)
|
||||
- ModelRegistry is a constructor, not a static factory (D010)
|
||||
- InteractiveMode.run() is an instance method, not static (D011)
|
||||
- Scoped npm publish `@glittercowboy/gsd` failed — scope not provisioned; unscoped `gsd-pi` works (D023)
|
||||
- Wizard scope narrowed from required+optional keys to optional-only — pi handles Anthropic auth (D020)
|
||||
- extensionsResult.errors shape is `{ path, error }` not `{ message }` — SDK type correction
|
||||
|
||||
## Files Created/Modified
|
||||
|
||||
- `package.json` — project manifest: name=gsd-pi, bin.gsd, piConfig, type:module, files array, build scripts, prepublishOnly
|
||||
- `tsconfig.json` — NodeNext/ESM config with skipLibCheck:true, exclude src/resources
|
||||
- `src/loader.ts` — binary entrypoint: sets PI_PACKAGE_DIR + 4 GSD_ env vars, dynamic-imports cli.js
|
||||
- `src/cli.ts` — SDK wiring: AuthStorage, ModelRegistry, wizard, initResources, buildResourceLoader, createAgentSession, InteractiveMode
|
||||
- `src/app-paths.ts` — ~/.gsd/ path constants (appRoot, agentDir, sessionsDir, authFilePath)
|
||||
- `src/wizard.ts` — optional-key wizard: loadStoredEnvKeys + runWizardIfNeeded
|
||||
- `src/resource-loader.ts` — buildResourceLoader(agentDir) + initResources(agentDir)
|
||||
- `pkg/package.json` — piConfig shim: { name: "gsd", configDir: ".gsd" }
|
||||
- `pkg/dist/modes/interactive/theme/` — pi theme assets (copied by build)
|
||||
- `src/resources/extensions/` — all 11 bundled extension source trees (patched for ~/.gsd/)
|
||||
- `src/resources/agents/` — scout.md, researcher.md, worker.md
|
||||
- `src/resources/AGENTS.md` — bundled agent context rules
|
||||
- `src/resources/GSD-WORKFLOW.md` — GSD workflow protocol document
|
||||
- `scripts/verify-s03.sh` — 6-check wizard verification script
|
||||
- `scripts/verify-s04.sh` — 10-check install smoke test script
|
||||
|
|
@ -1,133 +0,0 @@
|
|||
# M003: AI-Driven Test Flows — Context
|
||||
|
||||
**Gathered:** 2026-03-11
|
||||
**Status:** Queued — pending auto-mode execution
|
||||
|
||||
## Project Description
|
||||
|
||||
A new GSD extension (`test-flows`) that introduces intent-based YAML test specifications the agent writes during development and executes autonomously at UAT time. Flows describe **what to verify** (not mechanical step-by-step scripts), and the agent interprets each verification block using its full adaptive intelligence — choosing selectors, handling flakiness, retrying intelligently, and diagnosing failures.
|
||||
|
||||
Supports three target surfaces: **browser** (web apps via Playwright), **mac** (native macOS apps via Accessibility APIs), and **api** (HTTP request/response verification).
|
||||
|
||||
This is GSD's testing arm — the thing that closes the loop between "agent builds a feature" and "agent proves it works."
|
||||
|
||||
## Why This Milestone
|
||||
|
||||
GSD's current UAT pipeline has a gap: `artifact-driven` UAT runs shell commands and file checks, while `live-runtime` and `human-experience` UAT punt to the human. There is no way for the agent to write durable, re-runnable UI/API tests during development that execute automatically at UAT time.
|
||||
|
||||
The agent already has the tools (`browser_*`, `mac_*`, `bash` for HTTP) — what's missing is a structured format for persisting test intent and a runner that orchestrates execution against fresh isolated sessions. This milestone fills that gap.
|
||||
|
||||
The insight from Maestro evaluation: don't compete with Maestro as a standalone deterministic test runner. Instead, leverage what GSD is uniquely good at — AI-driven adaptive execution of test specifications. The YAML files are intent specs, not scripts. The AI handles the "how."
|
||||
|
||||
## User-Visible Outcome
|
||||
|
||||
### When this milestone is complete, the user can:
|
||||
|
||||
- See the agent write `.yaml` test flow files during slice development that describe what to verify
|
||||
- Have UAT run automatically at slice completion — the agent executes all flow files and writes a structured pass/fail report
|
||||
- Read `S01-UAT-RESULT.md` with per-flow, per-verification pass/fail results, timing, screenshots on failure, and diagnostic context
|
||||
- Manually trigger test flows via the agent calling `run_test_flow` or `run_test_suite` tools at any time
|
||||
- Test web apps (browser target), macOS apps (mac target), and APIs (api target) from the same flow format
|
||||
|
||||
### Entry point / environment
|
||||
|
||||
- Entry point: LLM tool calls (`run_test_flow`, `run_test_suite`) + GSD auto-mode UAT pipeline
|
||||
- Environment: local dev (macOS terminal running `gsd`)
|
||||
- Live dependencies involved: Playwright (bundled), mac-tools Swift CLI (bundled), HTTP via Node fetch (built-in)
|
||||
|
||||
## Completion Class
|
||||
|
||||
- Contract complete means: flow YAML parser validates correctly, runner executes all three targets (browser/mac/api) and returns structured results, `flow-driven` UAT type is recognized by the auto-mode pipeline
|
||||
- Integration complete means: agent writes flows during development, auto-mode UAT dispatches `run_test_suite`, results appear in `S01-UAT-RESULT.md`, failures include screenshots and diagnostics
|
||||
- Operational complete means: the full loop works end-to-end in a real GSD auto-mode session — agent builds a web feature, writes test flows, completes the slice, UAT runs the flows, report is written
|
||||
|
||||
## Final Integrated Acceptance
|
||||
|
||||
To call this milestone complete, we must prove:
|
||||
|
||||
- Agent can write a browser-target flow YAML during development, and `run_test_flow` executes it against a running local web app with correct pass/fail results
|
||||
- Agent can write a mac-target flow YAML, and it executes against a real macOS app (e.g., TextEdit) with correct pass/fail results
|
||||
- Agent can write an api-target flow YAML with HTTP request/response checks, and it executes correctly
|
||||
- `flow-driven` UAT type triggers automatic test suite execution at slice completion in auto-mode, with results written to the UAT result file
|
||||
- Test execution uses a fresh isolated browser session, not the agent's development browser
|
||||
- Failures include actionable diagnostics: screenshots, console logs (browser), element state (mac), response bodies (api)
|
||||
|
||||
## Risks and Unknowns
|
||||
|
||||
- **Inter-extension isolation** — The test-flows extension must run its own Playwright browser instance, separate from browser-tools' instance. Two Playwright instances in the same process should work (Playwright supports it), but needs verification. If they conflict, the runner may need to use a subprocess.
|
||||
- **Mac-tools CLI access** — The test-flows extension needs to call the mac-tools Swift CLI binary directly. The binary is compiled on first use by the mac-tools extension. test-flows must either wait for mac-tools to compile it first, or handle compilation itself. Need to determine the right approach.
|
||||
- **Agent flow authoring quality** — The value depends on Claude writing good test specifications during development. If the generated flows are too vague or too brittle, the system fails in practice. This is a prompt engineering challenge, not a code challenge. The system prompt guidelines for the tool must be excellent.
|
||||
- **Adaptive execution reliability** — Each `verify` block is interpreted by the LLM. Non-determinism means a flow might pass one run and fail the next. Need to design the execution model to minimize this (clear verify/expect structure, retries, good diagnostics on failure).
|
||||
- **Execution model for verify blocks** — The runner tool receives a YAML flow and must execute each verify block. Since extensions can't call other extensions' tools, the runner must use Playwright/mac-tools/fetch directly (not via `browser_*` tools). This means reimplementing some of the smart waiting/settling logic from browser-tools. Alternatively, each verify block could be dispatched as an LLM sub-turn — but that's expensive and slow. The right balance needs to be found.
|
||||
|
||||
## Existing Codebase / Prior Art
|
||||
|
||||
- `src/resources/extensions/browser-tools/index.ts` — Full Playwright browser automation extension (~4990 lines). Reference for Playwright patterns, adaptive settling, assertion evaluation, screenshot capture. The test-flows runner will import Playwright directly rather than calling these tools.
|
||||
- `src/resources/extensions/browser-tools/core.js` — Runtime-neutral helpers: action timeline, assertion evaluation (`evaluateAssertionChecks`), compact state diffing. May be importable by test-flows.
|
||||
- `src/resources/extensions/mac-tools/index.ts` — macOS Accessibility API automation via Swift CLI. Reference for how to invoke the Swift CLI binary (`execFileSync` with JSON protocol).
|
||||
- `src/resources/extensions/gsd/auto.ts` — GSD auto-mode engine. Contains `checkNeedsRunUat()`, `buildRunUatPrompt()`, UAT dispatch logic. Must be modified to support `flow-driven` UAT type.
|
||||
- `src/resources/extensions/gsd/files.ts` — Contains `extractUatType()` which classifies UAT types from markdown content. Must be extended with `flow-driven`.
|
||||
- `src/resources/extensions/gsd/prompts/run-uat.md` — UAT execution prompt template. Must be extended with `flow-driven` instructions.
|
||||
- `src/resources/extensions/gsd/templates/uat.md` — UAT file template. Must include `flow-driven` as a valid UAT mode.
|
||||
- Maestro (external, not embedded) — Inspiration for YAML flow format and "arm's length" testing philosophy. Not a dependency. Key takeaways: declarative YAML syntax, smart waiting, accessibility-layer interaction, cross-platform unified format.
|
||||
|
||||
> See `.gsd/DECISIONS.md` for all architectural and pattern decisions — it is an append-only register; read it during planning, append to it during execution.
|
||||
|
||||
## Relevant Requirements
|
||||
|
||||
- R003 (Bundled GSD extension) — This extends the GSD extension's UAT pipeline with a new type
|
||||
- R004 (Bundled supporting extensions) — This adds a new bundled extension (`test-flows`)
|
||||
- New requirement candidates:
|
||||
- R010 — Test flow execution: agent can write and execute YAML test specifications against browser, mac, and api targets
|
||||
- R011 — Flow-driven UAT: auto-mode recognizes `flow-driven` UAT type and executes test suites automatically at slice completion
|
||||
|
||||
## Scope
|
||||
|
||||
### In Scope
|
||||
|
||||
- New `test-flows` extension in `src/resources/extensions/test-flows/`
|
||||
- YAML flow format: header (name, target, url/app/endpoint) + verification blocks (verify/given/expect)
|
||||
- Flow parser with validation and clear error messages
|
||||
- Browser target runner: own Playwright instance, fresh context per flow, smart waiting, screenshot capture
|
||||
- Mac target runner: direct Swift CLI invocation, element resolution, screenshot capture
|
||||
- API target runner: HTTP requests via Node fetch, status/header/body assertions
|
||||
- Two LLM tools: `run_test_flow` (single flow) and `run_test_suite` (directory of flows)
|
||||
- Structured result output: per-flow, per-verification pass/fail, timing, screenshots, diagnostics
|
||||
- New `flow-driven` UAT type in GSD extension (`files.ts`, `auto.ts`, `run-uat.md`, `uat.md`)
|
||||
- System prompt guidelines that teach the agent when and how to write good test flows
|
||||
- Flow files stored alongside slices: `.gsd/milestones/M00X/slices/S0X/flows/*.yaml`
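As an illustration of the flow format and parser validation listed above, here is a sketch of the parsed shape and a validator with explicit error messages. All type and field layouts are assumptions: the source only names the header fields (name, target, url/app/endpoint) and the block keys (verify/given/expect), and the pairing of `url` with browser, `app` with mac, and `endpoint` with api is inferred from the order in which they are listed.

```typescript
type FlowTarget = "browser" | "mac" | "api";

interface Verification {
  verify: string;   // what to check, stated as intent
  given?: string;   // optional precondition
  expect: string[]; // observable outcomes
}

interface TestFlow {
  name: string;
  target: FlowTarget;
  url?: string;      // assumed to belong to the browser target
  app?: string;      // assumed to belong to the mac target
  endpoint?: string; // assumed to belong to the api target
  verifications: Verification[];
}

// Which header field each target needs (assumed pairing).
const LOCATOR_FIELD: Record<FlowTarget, "url" | "app" | "endpoint"> = {
  browser: "url",
  mac: "app",
  api: "endpoint",
};

// Returns a list of human-readable problems; an empty list means the flow is valid.
function validateFlow(flow: TestFlow): string[] {
  const errors: string[] = [];
  if (!flow.name.trim()) errors.push("name is required");
  const field = LOCATOR_FIELD[flow.target];
  if (!flow[field]) errors.push(`target "${flow.target}" requires "${field}"`);
  if (flow.verifications.length === 0) {
    errors.push("at least one verification block is required");
  }
  flow.verifications.forEach((v, i) => {
    if (!v.verify.trim()) errors.push(`verifications[${i}]: verify is required`);
    if (v.expect.length === 0) errors.push(`verifications[${i}]: expect needs at least one entry`);
  });
  return errors;
}
```

Collecting every problem instead of throwing on the first one would let the runner report all issues in a flow file in a single pass.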
|
||||
|
||||
### Out of Scope / Non-Goals
|
||||
|
||||
- Maestro compatibility (not a goal — different format, different execution model)
|
||||
- Visual regression testing / image diffing (future enhancement)
|
||||
- Parallel flow execution / sharding (future enhancement)
|
||||
- CI/CD integration or headless-only mode (future enhancement)
|
||||
- Flow recording / interactive flow authoring UI (future enhancement — Maestro Studio equivalent)
|
||||
- Mobile device/simulator testing (would require Maestro or Appium — out of scope)
|
||||
|
||||
## Technical Constraints
|
||||
|
||||
- Must be a pi extension following existing patterns (`export default function(pi: ExtensionAPI)`)
|
||||
- Must use TypeBox for tool parameter schemas, StringEnum for enums
|
||||
- Must truncate tool output to stay within context limits
|
||||
- Browser runner must use a separate Playwright instance from browser-tools (test isolation)
|
||||
- Mac runner must invoke the Swift CLI binary at the known path (`src/resources/extensions/mac-tools/swift-cli/.build/release/mac-agent`)
|
||||
- No new npm dependencies beyond what's already bundled (Playwright, YAML parsing via existing means)
|
||||
- Extension loads via `additionalExtensionPaths` — same mechanism as all other bundled extensions
|
||||
|
||||
## Integration Points
|
||||
|
||||
- `browser-tools` extension — Shares Playwright dependency but NOT browser state. test-flows runs its own Playwright instance.
|
||||
- `mac-tools` extension — test-flows calls the same Swift CLI binary but independently. Must handle the case where the binary hasn't been compiled yet.
|
||||
- `gsd` extension — UAT pipeline integration: `files.ts` (extractUatType), `auto.ts` (checkNeedsRunUat, buildRunUatPrompt), `prompts/run-uat.md`, `templates/uat.md`
|
||||
- `src/loader.ts` / `src/cli.ts` — test-flows must be added to `GSD_BUNDLED_EXTENSION_PATHS` and `initResources()` file sync
|
||||
- Playwright — Direct import for browser automation (already a dependency of the project)
|
||||
- Node.js `fetch` — For API target HTTP requests (built into Node 18+)
|
||||
|
||||
## Open Questions
|
||||
|
||||
- **Verify block execution model** — Should each `verify` block be executed by deterministic code (parse expect clauses, run Playwright assertions) or by sending the block to the LLM as a sub-task? Deterministic is faster and cheaper but less adaptive. LLM sub-task is more flexible but slower and non-deterministic. Hybrid approach (deterministic for simple assertions, LLM for complex "verify this looks right" blocks) may be the sweet spot. Needs design decision in planning.
|
||||
- **YAML parsing** — Use `js-yaml` (would need to add as dependency) or parse the simple format manually? The format is simple enough that a hand-rolled parser might suffice and avoids a new dep.
|
||||
- **Mac binary compilation timing** — If test-flows needs the mac-tools binary and it hasn't been compiled yet, should test-flows trigger compilation or just fail with a clear message? Triggering compilation would duplicate logic from mac-tools extension.
|
||||
- **Flow file discovery for UAT** — When `run_test_suite` is called for a slice's flows, should it discover files by convention (all `.yaml` in the `flows/` dir) or should the UAT file explicitly list which flows to run?
|
||||