# CI/CD Pipeline Guide ## Overview GSD 2 uses a three-stage promotion pipeline that automatically moves merged PRs through **Dev → Test → Prod** environments using npm dist-tags. ``` PR merged to main │ ▼ ┌─────────┐ ci.yml passes (build, test, typecheck) │ DEV │ → publishes gsd-pi@-dev. with @dev tag └────┬────┘ ▼ (automatic if green) ┌─────────┐ CLI smoke tests + LLM fixture replay │ TEST │ → promotes to @next tag └────┬────┘ → pushes Docker image as :next ▼ (manual approval required) ┌─────────┐ optional real-LLM integration tests │ PROD │ → promotes to @latest tag └─────────┘ → creates GitHub Release ``` ## For Contributors: Testing Your PR Before It Ships ### Install the Dev Build Every merged PR is immediately installable: ```bash # Latest dev build (bleeding edge, every merged PR) npx gsd-pi@dev # Test candidate (passed smoke + fixture tests) npx gsd-pi@next # Stable production release npx gsd-pi@latest # or just: npx gsd-pi ``` ### Using Docker ```bash # Test candidate docker run --rm -v $(pwd):/workspace ghcr.io/gsd-build/gsd-pi:next --version # Stable docker run --rm -v $(pwd):/workspace ghcr.io/gsd-build/gsd-pi:latest --version ``` ### Checking if a Fix Landed 1. Find the PR's merge commit SHA (first 7 chars) 2. Check if it's in `@dev`: `npm view gsd-pi@dev version` - If the version ends in `-dev.`, your PR is in dev 3. Check if it promoted to `@next`: `npm view gsd-pi@next version` 4. Check if it's in production: `npm view gsd-pi@latest version` ## For Maintainers ### Pipeline Workflows | Workflow | File | Trigger | Purpose | |----------|------|---------|---------| | CI | `ci.yml` | PR + push to main | Build, test, typecheck — **gate for all promotions** | | Release Pipeline | `pipeline.yml` | After CI succeeds on main | Three-stage promotion | | Native Binaries | `build-native.yml` | `v*` tags | Cross-compile platform binaries | | Dev Cleanup | `cleanup-dev-versions.yml` | Weekly (Monday 06:00 UTC) | Unpublish `-dev.` versions older than 30 days | ### Gating Tests The pipeline only triggers after `ci.yml` passes. Key gating tests include: - **Unit tests** (`npm run test:unit`) — includes `auto-session-encapsulation.test.ts` which enforces that all auto-mode state is encapsulated in `AutoSession`, plus dispatch loop regression tests that exercise the full `deriveState → resolveDispatch → idempotency` chain without an LLM. Any PR adding module-level mutable state to `auto.ts` will fail CI and block the pipeline. - **Integration tests** (`npm run test:integration`) - **Extension typecheck** (`npm run typecheck:extensions`) - **Package validation** (`npm run validate-pack`) - **Smoke tests** (`npm run test:smoke`) — run post-build in the pipeline against the local binary and again against the globally-installed `@dev` package - **Fixture tests** (`npm run test:fixtures`) — replay recorded LLM conversations without hitting real APIs - **Live regression tests** (`npm run test:live-regression`) — run against the installed binary in the Test stage to catch runtime regressions before promotion to `@next` ### Approving a Prod Release 1. A version reaches the Test stage automatically 2. In GitHub Actions, the `prod-release` job will show "Waiting for review" 3. Click **Review deployments** → select `prod` → **Approve** 4. The version is promoted to `@latest` and a GitHub Release is created To enable live LLM tests during Prod promotion: - Set the `RUN_LIVE_TESTS` environment variable to `true` on the `prod` environment ### Rolling Back a Release If a broken version reaches production: ```bash # Roll back npm npm dist-tag add gsd-pi@ latest # Roll back Docker docker pull ghcr.io/gsd-build/gsd-pi: docker tag ghcr.io/gsd-build/gsd-pi: ghcr.io/gsd-build/gsd-pi:latest docker push ghcr.io/gsd-build/gsd-pi:latest ``` For `@dev` or `@next` rollbacks, the next successful merge will overwrite the tag automatically. ### GitHub Configuration Required | Setting | Value | |---------|-------| | Environment: `dev` | No protection rules | | Environment: `test` | No protection rules | | Environment: `prod` | Required reviewers: maintainers | | Secret: `NPM_TOKEN` | All environments | | Secret: `ANTHROPIC_API_KEY` | Prod environment only | | Secret: `OPENAI_API_KEY` | Prod environment only | | Variable: `RUN_LIVE_TESTS` | `false` (set to `true` to enable live LLM tests) | | GHCR | Enabled for the `gsd-build` org | ### Docker Images | Image | Base | Purpose | Tags | |-------|------|---------|------| | `ghcr.io/gsd-build/gsd-ci-builder` | `node:24-bookworm` | CI build environment with Rust toolchain | `:latest`, `:` | | `ghcr.io/gsd-build/gsd-pi` | `node:24-slim` | User-facing runtime | `:latest`, `:next`, `:v` | The CI builder image is rebuilt automatically when the `Dockerfile` changes. It eliminates ~3-5 min of toolchain setup per CI run. ## LLM Fixture Tests The fixture system records and replays LLM conversations without hitting real APIs (zero cost). ### Running Fixture Tests ```bash npm run test:fixtures ``` ### Recording New Fixtures ```bash # Set your API key, then record GSD_FIXTURE_MODE=record GSD_FIXTURE_DIR=./tests/fixtures/recordings \ node --experimental-strip-types tests/fixtures/record.ts ``` Fixtures are JSON files in `tests/fixtures/recordings/`. Each one captures a conversation's request/response pairs and replays them by turn index. ### When to Re-Record Re-record fixtures when: - Provider wire format changes (e.g., new field in Anthropic response) - Tool definitions change (affects request shape) - System prompt changes (may cause turn count mismatch) ## Version Strategy | Tag | Published | Format | Who uses it | |-----|-----------|--------|-------------| | `@dev` | Every merged PR | `2.27.0-dev.a3f2c1b` | Developers verifying fixes | | `@next` | Auto-promoted from dev | Same version | Early adopters, beta testers | | `@latest` | Manually approved | Same version | Production users | Old `-dev.` versions are cleaned up weekly (30-day retention).