# Repository Guidelines

## Setup Checklist for New Contributors

- [ ] Install dev dependencies: `npm install`
- [ ] Install pre-commit hooks: `npm run secret-scan:install-hook`
- [ ] Apply GitHub labels: `gh label create priority/P0 --color B60205 --description "Critical"` (see `.github/labels.yml` for full list)
- [ ] Verify devcontainer: `devcontainer build --workspace-folder .`
- [ ] Run first tech-debt scan: `node scripts/tech-debt-scan.mjs`

## Purpose-First Doctrine

sf follows **spec-first TDD**: see [`docs/SPEC_FIRST_TDD.md`](docs/SPEC_FIRST_TDD.md) for the full constitution.

Iron Law:

```
THE TEST IS THE SPEC.
THE JSDOC IS THE PURPOSE.
CODE EXISTS TO FULFILL PURPOSE.
NO BEHAVIOR CHANGE WITHOUT A FAILING TEST FIRST.
NO COMPLETION WITHOUT A REAL CONSUMER.
NO JUDGMENT CALL WITHOUT A CONFIDENCE AND FALSIFIER.
```

Every artifact (slice plan, task plan, function, test, ADR) must answer:

- **why** this behaviour exists
- **what value** it creates or protects
- **who** uses it in production (real consumer, not just tests)
- **what breaks** if it returns the wrong answer

If any answer is missing: `BLOCKED: purpose unclear — [field]`. Surfacing the gap beats rationalising past it.

## Project Structure

This is a TypeScript monorepo with npm workspaces. The main entry point is `dist/loader.js` (bin: `sf`).
- `src/` — Main CLI source (sf-run core, extensions, agents)
- `packages/` — Workspace packages (8 total): pi-tui, pi-ai, pi-agent-core, pi-coding-agent, daemon, mcp-server, native, rpc-client
- `web/` — Next.js web frontend (optional web host mode)
- `rust-engine/` — Rust N-API bindings for performance-critical operations
- `scripts/` — Build, dev, release, and CI helper scripts
- `tests/` — Fixtures, smoke tests, live tests, live-regression tests
- `docs/` — User guides and developer documentation
- `docker/` — Docker sandbox and builder configurations

## Build, Test, and Development Commands

```bash
# Full build (core + web)
npm run build

# Build core only (packages + tsc + resources)
npm run build:core

# Dev mode with hot reload
npm run dev

# Run all tests (unit + integration)
npm test

# Unit tests only
npm run test:unit

# Integration tests only
npm run test:integration

# Coverage check (Vitest V8 provider; thresholds: statements 40%, lines 40%, branches 20%, functions 20%)
npm run test:coverage

# Type check extensions (no emit)
npm run typecheck:extensions

# Native Rust build
npm run build:native

# Root lint checks (Biome over src/)
npm run lint
npm run lint:fix

# Web lint (Next.js ESLint; separate package)
npm --prefix web run lint

# Release workflow (changelog + version bump)
npm run release:changelog
npm run release:bump
```

## Coding Style & Naming Conventions

- **Language**: TypeScript with `"strict": true` enabled in all packages
- **Module resolution**: NodeNext
- **Target**: ES2022
- **Package manager**: npm (canonical; do not commit `bun.lock` or `pnpm-lock.yaml`)
- **Commit format**: Conventional Commits enforced via commit-msg hook
- **Branch naming**: `/` — e.g. `feat/new-command`, `fix/login-bug`
  - Types: `feat`, `fix`, `docs`, `chore`, `refactor`, `test`, `infra`, `ci`, `perf`, `build`, `revert`

### JSDoc Purpose Convention

Every exported function, type, class, and module-level constant opens with a JSDoc block whose first sentence is its **purpose** — the consumer-facing reason it exists. Not what it does (the signature shows that), but **why**.

```ts
/**
 * Acquire a unit claim atomically. Returns true on success, false if another worker
 * already holds an unexpired lease.
 *
 * Purpose: prevent two workers from dispatching the same unit when the run-lock is
 * unavailable (shared NFS, broken filesystem semantics) — the conditional UPDATE in
 * SQLite is the safety net.
 *
 * Consumer: auto-dispatch.ts when picking the next eligible unit per poll tick.
 */
export function claimUnit(unitId: string, leaseMs: number): boolean { ... }
```

Required for every exported symbol whose behaviour is non-trivial:

- **First line** — what it returns / does, in the present tense.
- **Purpose:** — why it exists; the value it protects.
- **Consumer:** — who calls it in production. If you can't name a consumer, the symbol shouldn't exist yet.

A bare `/** Helper. */` is a code smell. Either write the purpose or delete the symbol.

For module-level JSDoc (file headers): keep the existing `module-name.ts — short description` opening, then a `Purpose:` line stating why the module exists as a separable unit.
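
As a minimal sketch of the convention applied to a whole file, a module might open with a file header and then document each export with its first line, `Purpose:`, and `Consumer:`. The module name `lease-clock.ts`, the function `isLeaseExpired`, and the consumer named in its JSDoc are hypothetical illustrations, not symbols confirmed to exist in this repository.

```typescript
/**
 * lease-clock.ts — lease expiry arithmetic for unit claims. (hypothetical module)
 *
 * Purpose: keep the "is this lease still live?" comparison in one place so every
 * caller agrees on the exact moment a lease counts as expired.
 */

/**
 * Returns true when a lease acquired at `acquiredAtMs` with duration `leaseMs`
 * is no longer live at `nowMs`.
 *
 * Purpose: stop a claim from being treated as held after its lease has run out,
 * which would leave the unit permanently undispatchable.
 *
 * Consumer: hypothetical; a claim routine such as claimUnit deciding whether an
 * existing claim may be taken over.
 */
export function isLeaseExpired(
  acquiredAtMs: number,
  leaseMs: number,
  nowMs: number,
): boolean {
  // The lease is live strictly before its end instant and expired from that instant on.
  return nowMs >= acquiredAtMs + leaseMs;
}
```

Passing `nowMs` as a parameter rather than reading the clock inside the function keeps the behaviour contract testable without mocks.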
## Testing Guidelines

- **Primary test runner**: Vitest via `npm run test:unit`, `npm run test:integration`, and `npm test`
- **Node test runner**: used only by specific package/native/browser-tool scripts where `package.json` says `node --test`
- **Coverage tool**: Vitest coverage with `@vitest/coverage-v8`; thresholds are enforced in CI
- **Naming**: `*.test.ts` and `*.test.mjs` patterns
- **Smoke tests**: `npm run test:smoke`
- **Live tests**: `npm run test:live` (requires environment variables)

### Purposeful Tests

Test names are contract claims. Use the form `__`:

| Good | Bad |
|---|---|
| `claim_when_lease_expired_returns_true` | `test claim` |
| `dispatch_when_blocker_unresolved_skips_unit` | `test dispatch logic` |

Three-tier organisation:

1. **Behaviour contracts** (primary) — what the consumer receives. The spec. A different implementation that passes these is equally correct.
2. **Degradation contracts** — what happens when dependencies fail. Consumer must always get a useful response; failure must degrade, not crash.
3. **Implementation guards** (secondary, labelled `// guard:`) — protect specific failure modes (resource leaks, infinite loops). Refactors update guards, not behaviour contracts.

Write behaviour contracts first. They are the work order.

A test that asserts call counts or mock interactions is **mechanical**, not purposeful — it should be a labelled implementation guard, not a primary contract test. A test that breaks on a refactor without behaviour change is mechanical too. Fix the test or relabel it.

**Bug = missing correct-behaviour test.** When fixing a bug, write a test for the *correct* behaviour first — it must fail (RED) because the bug exists. If it passes immediately, the test is testing the broken behaviour; fix the test, not the code.

## Extension Development

Extensions live in `src/resources/extensions/`.
Each extension should:

- Export a manifest with `name`, `version`, `tools[]`, and `agents[]`
- Include tests in `src/resources/extensions//tests/`
- Register tools via the extension API

## Pull Request Guidelines

1. **Link an issue** — PRs without a linked issue will be closed without review
2. **One concern per PR** — don't bundle unrelated changes
3. **No drive-by formatting** — don't reformat code you didn't touch
4. **CI must pass** — fix failing tests before requesting review
5. **Rebase onto main** — do not merge main into your feature branch
6. Use the PR template at `.github/PULL_REQUEST_TEMPLATE.md`

## Environment Setup

Copy `docker/.env.example` to `.env` and fill in API keys. At minimum you need one LLM provider key (Anthropic, OpenAI, Google, or OpenRouter).

## Architecture Notes

- State lives on disk in `.sf/` — no in-memory state survives across sessions
- Bundled extensions/agents sync to `~/.sf/agent/` on every launch
- LLM providers are lazy-loaded on first use to reduce cold-start time
- Native Rust engine handles grep, glob, ps, highlight, ast, diff

## SF Planning State

`.sf/` is the canonical home for SF agent state. It contains milestone plans, slice plans, task plans, and ephemeral working files under `.sf/milestones/`, `.sf/STATE.md`, `.sf/QUEUE.md`, and related artifacts.

**Promote-only rule:** Agent state (the `.sf/` directory under `~/.sf/projects//`) is transient and gitignored — never committed directly. Project state (`.sf/` tracked in the repo root) contains only human-authored artifacts such as `DECISIONS.md`, `KNOWLEDGE.md`, `REQUIREMENTS.md`, `ROADMAP.md`, and `STATE.md`.
Promoted artifacts — milestone summaries, architecture decision records (ADRs), and durable specifications — belong in tracked documentation directories:

- `docs/plans/` — reviewed implementation plans promoted from `.sf/` milestone planning
- `docs/adr/` — accepted architectural decisions promoted from `.sf/DECISIONS.md`
- `docs/specs/` — long-lived behavior contracts and API specifications

**Naming conventions:**

- Milestone IDs: `M001`, `M002`, …
- Slice IDs: `S01`, `S02`, …
- Task IDs: `T01`, `T02`, …

**Commands:**

- `sf plan promote ` — copy a file from `.sf/` to `docs/plans/`, `docs/adr/`, or `docs/specs/`
- `sf plan list` — list milestone and slice files in `.sf/`
- `sf plan diff` — compare `.sf/` state with promoted `docs/` artifacts

See [`docs/plans/README.md`](docs/plans/README.md), [`docs/adr/README.md`](docs/adr/README.md), and [`docs/specs/README.md`](docs/specs/README.md) for directory-specific conventions.

## Eval Dump Inbox

SF/Pi automatically loads `AGENTS.md` and `CLAUDE.md` from the repo tree at startup. It does not automatically load `TODO.md`, but this repo uses root `TODO.md` as a temporary human dump inbox for eval and self-evolution ideas.

When a repo contains a root `TODO.md`, treat it as a temporary dump inbox and read it before planning substantive work in that repo. This applies even when the user does not explicitly mention evals.

Treat the `Raw Dump Inbox` section as untriaged source material, not as durable instructions. Triage it into reviewable artifacts: concrete eval cases, harness gaps, memory extraction requirements, docs, tests, or follow-up implementation tasks.

After triage, remove the processed dump notes from `TODO.md` so the file returns to an empty inbox/template state. Do not treat dumped notes as runtime memory or approved behavior until they are converted into tested, versioned project artifacts.
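
The "empty inbox" end state described above can be checked mechanically. The following is a rough sketch only: the helper name `pendingDumpNotes` is hypothetical, and it assumes that `Raw Dump Inbox` appears as a Markdown heading in `TODO.md` and that the section ends at the next heading. Neither assumption is confirmed by this document.

```typescript
/**
 * Returns the non-empty lines found under the "Raw Dump Inbox" heading.
 *
 * Purpose: let a triage step confirm that TODO.md is back to an empty inbox
 * before planning continues.
 *
 * Consumer: hypothetical; no such helper is known to exist in this repository.
 */
export function pendingDumpNotes(todoMarkdown: string): string[] {
  const notes: string[] = [];
  let inInbox = false;

  for (const line of todoMarkdown.split(/\r?\n/)) {
    const heading = /^#{1,6}\s+(.*)$/.exec(line);
    if (heading) {
      // Assumption: the inbox starts at its own heading and any other heading ends it.
      inInbox = (heading[1] ?? "").trim() === "Raw Dump Inbox";
      continue;
    }
    if (inInbox && line.trim() !== "") {
      notes.push(line.trim());
    }
  }
  return notes;
}
```

An empty result would mean the inbox holds nothing left to triage. A non-empty result lists the notes that still need to become evals, tests, docs, or tasks.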
## CI/CD

- `ci.yml` — builds, tests, gates merges to main
- `pipeline.yml` — three-stage release (dev → test → prod)
- `pr-risk.yml` — PR risk classification
- `ai-triage.yml` — AI-based issue/PR triage

## Code Quality Tooling

The repository uses the following quality tools:

- **Biome** — root source linting via `npm run lint` and autofix via `npm run lint:fix`
  - Scope: `src/` plus versioned JSON checks
  - Config: `biome.json`
  - Format touched files with `npx biome check --write `; full-repo formatting is not the current CI gate.
- **ESLint** — web app linting via `npm --prefix web run lint`
  - Scope: `web/`
  - Config: `web/eslint.config.mjs`
- **TypeScript** — Strict mode enabled; run `npm run typecheck:extensions`
- **Knip** — Detect unused code and dependencies: `npx knip` (config at `knip.json`)
- **jscpd** — Detect duplicate code: `npx jscpd` (config at `.jscpd.json`)
- **Tech Debt Scanner** — `node scripts/tech-debt-scan.mjs`
  - Tracks TODO/FIXME/HACK/XXX counts against thresholds
- **Secret Scan** — `npm run secret-scan` (pre-commit hook available via `npm run secret-scan:install-hook`)
- **Coverage** — `npm run test:coverage` (Vitest V8 coverage with 40/40/20/20 thresholds)

## Dev Container

A Dev Container configuration is available at `.devcontainer/devcontainer.json`. Open the repository in VS Code with the Dev Containers extension, or run:

```bash
devcontainer up --workspace-folder .
```

The container includes Node 24, Rust, GitHub CLI, Docker-in-Docker, and recommended VS Code extensions.

## Dependency Updates

Dependabot is configured at `.github/dependabot.yml` for:

- Root npm dependencies (weekly, grouped by ecosystem)
- Web app dependencies (weekly)
- GitHub Actions (weekly)

## Issue Labels

Label definitions are at `.github/labels.yml`. Apply labels using:

```bash
# Create a single label
gh label create priority/P0 --color B60205 --description "Critical — blocks release"

# Or use a label management action in CI
```