# ADR-007: Model Catalog Split and Provider API Encapsulation
**Status:** Proposed
**Date:** 2026-04-03
**Deciders:** Jeremy McSpadden
**Related:** ADR-004 (capability-aware model routing), [ADR-005](https://github.com/gsd-build/gsd-2/issues/2790), [ADR-006](https://github.com/gsd-build/gsd-2/issues/2995), `packages/pi-ai/src/providers/`, `packages/pi-ai/src/models.ts`
## Context
The model/provider system in `pi-ai` has two structural problems worth fixing — but the system is **not fundamentally broken**. The heavy lifting (lazy SDK imports, registry-based dispatch, extension-based registration) is already well-designed. This ADR targets the two areas where the current design creates real friction; it deliberately proposes no runtime changes, because none are needed.
### Current Architecture
```
stream.ts
└─ import "./providers/register-builtins.js"   ← side-effect import at load time
   ├─ import anthropic.ts (6.8 KB)
   ├─ import anthropic-vertex.ts (3.9 KB)
   ├─ import openai-completions.ts (26 KB)
   ├─ import openai-responses.ts (6.4 KB)
   ├─ import openai-codex-responses.ts (29 KB)
   ├─ import azure-openai-responses.ts (7.8 KB)
   ├─ import google.ts (13.6 KB)
   ├─ import google-vertex.ts (14.5 KB)
   ├─ import google-gemini-cli.ts (30 KB)
   ├─ import mistral.ts (18.9 KB)
   └─ amazon-bedrock.ts (24 KB)                ← only lazy-loaded provider

models.ts
├─ import models.generated.ts   ← 13,848 lines, ALL providers, loaded at init
└─ import models.custom.ts      ← 197 lines, additional providers
```
### What Already Works Well
1. **SDK lazy loading.** Every provider file uses `async function getXxxClass()` with a cached dynamic `import()`. The heavy npm packages (`@anthropic-ai/sdk`, `openai`, `@google/genai`, `@aws-sdk/*`, `@mistralai/*`) are only loaded on first API call. This is where the real startup cost would be — and it's already handled.
2. **Registry-based dispatch.** `api-registry.ts` cleanly maps API types to stream functions. Callers use `stream(model, context)` and the registry routes to the right provider. This pattern is sound.
3. **Extension registration.** Ollama and Claude Code CLI register via `registerApiProvider()` at runtime. This extensibility point works correctly.
4. **Provider implementation code loading (~200KB total).** While all providers load eagerly, V8 parses local `.js` files in single-digit milliseconds each. The total parse cost for all provider files is ~10-30ms — not a user-visible bottleneck on a CLI that's about to make a multi-second API call anyway.
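The cached dynamic-import idiom described in point 1 can be sketched as follows. This is a self-contained stand-in for illustration — `SdkClass` and the resolved object are placeholders for the real `import("@anthropic-ai/sdk")`, not pi-ai's actual code:

```typescript
// Sketch of the `async function getXxxClass()` pattern the provider files use.
// A plain object stands in for the heavy SDK so the example runs on its own.

type SdkClass = { name: string };

let cachedAnthropic: Promise<SdkClass> | undefined;

// First call kicks off the (simulated) dynamic import; every later call
// returns the same cached promise, so the SDK is only loaded once.
async function getAnthropicClass(): Promise<SdkClass> {
  // real provider code: cachedAnthropic ??= import("@anthropic-ai/sdk").then(...)
  cachedAnthropic ??= Promise.resolve({ name: "Anthropic" });
  return cachedAnthropic;
}
```

Because the cache holds the promise (not the resolved value), concurrent first calls share a single in-flight import instead of racing.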
### What's Actually Worth Fixing
#### Problem 1: Monolithic model catalog — developer experience, not runtime
`models.generated.ts` is **13,848 lines in a single file**. This creates real friction:
- **PR reviews are painful.** When the generation script runs, the diff is a wall of changes across unrelated providers. Reviewers can't tell what actually changed for a specific provider.
- **Navigation is slow.** Finding a specific model requires scrolling or searching through thousands of lines of static object literals.
- **Merge conflicts are frequent.** Any two PRs that touch model generation will conflict on the same monolithic file.
- **Git blame is useless.** Every line was "last changed" by the generation script, obscuring the history of individual provider additions.
The runtime cost of loading all model definitions is negligible — a Map of ~200 model objects is maybe 50-100KB of heap. The problem is purely about code organization and developer workflow.
#### Problem 2: Barrel export leaks provider internals — API design
`packages/pi-ai/src/index.ts` re-exports every provider module's internals:
```typescript
export * from "./providers/anthropic.js";
export * from "./providers/google.js";
export * from "./providers/google-gemini-cli.js";
export * from "./providers/google-vertex.js";
export * from "./providers/mistral.js";
export * from "./providers/openai-completions.js";
export * from "./providers/openai-responses.js";
// ... etc
```
This is a public API problem:
- **Consumers can bypass the registry.** Any code that does `import { streamAnthropic } from "pi-ai"` takes a direct dependency on an implementation detail that should be internal.
- **Refactoring is blocked.** Renaming a function inside a provider file is a breaking change because it's re-exported from the package root.
- **API surface is unnecessarily large.** The public API should be `stream()`, `streamSimple()`, `registerApiProvider()`, model utilities, and types. Provider-specific stream functions are implementation details.
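A miniature of the registry pattern these bullets argue for. The names `ApiProvider`, `registerApiProvider`, and `stream` mirror the ADR's vocabulary, but the shapes here are simplified stand-ins, not pi-ai's real types:

```typescript
// Toy registry: callers dispatch through stream(); they never import a
// provider module directly, so provider internals stay private.

type Api = "anthropic-messages" | "openai-completions";

interface ApiProvider {
  api: Api;
  stream: (model: string) => string; // real signature streams events
}

const registry = new Map<Api, ApiProvider>();

function registerApiProvider(p: ApiProvider): void {
  registry.set(p.api, p);
}

function stream(api: Api, model: string): string {
  const provider = registry.get(api); // synchronous Map.get(), as in api-registry.ts
  if (!provider) throw new Error(`no provider registered for ${api}`);
  return provider.stream(model);
}

registerApiProvider({ api: "anthropic-messages", stream: (m) => `anthropic:${m}` });
console.log(stream("anthropic-messages", "claude-3")); // prints: anthropic:claude-3
```

With only `stream` and `registerApiProvider` exported, renaming a provider-internal function is invisible to consumers.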
### What Is NOT Worth Changing
**Lazy provider loading (converting `register-builtins.ts` to async on-demand loading).** This was considered and rejected because:
1. **The SDKs are already lazy.** The heavy cost is handled. Provider implementation code (~200KB of local `.js`) parses in ~10-30ms total.
2. **Async resolution adds complexity to the hot path.** `stream.ts` currently does a synchronous `Map.get()`. Making `resolveApiProvider` async adds a microtask hop to every API call — not just the first. Small but measurable, and for no user-visible gain.
3. **High blast radius, low payoff.** Touching `stream.ts`, `api-registry.ts`, and the registration lifecycle simultaneously risks regressions in the core streaming path for an optimization that wouldn't show up in profiling.
4. **Bedrock's lazy loading is a special case, not a template.** It exists because `@aws-sdk/client-bedrock-runtime` is uniquely massive. Generalizing this pattern to providers where the SDK is already lazy-imported doesn't compound the benefit.
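The microtask-hop argument in point 2 can be seen with a toy resolver. Both function names here are hypothetical stand-ins for the real `resolveApiProvider`:

```typescript
// Today: a synchronous Map lookup — the provider is usable immediately.
// Rejected alternative: an async resolver — even a cache hit returns a
// Promise, so every call (not just the first) pays an await.

const providers = new Map<string, () => string>([["anthropic", () => "ok"]]);

function resolveApiProviderSync(api: string) {
  return providers.get(api);
}

async function resolveApiProviderAsync(api: string) {
  return providers.get(api); // would be `await import(...)` on a cache miss
}

const direct = resolveApiProviderSync("anthropic");    // provider, usable now
const pending = resolveApiProviderAsync("anthropic");  // Promise, must be awaited
console.log(direct?.() ?? "missing", pending instanceof Promise); // prints: ok true
```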
## Decision
**Make two targeted improvements to code organization and API hygiene. Do not change runtime loading behavior.**
### Change 1: Split `models.generated.ts` into per-provider files
Replace the monolithic 13,848-line generated file with per-provider files:
```
packages/pi-ai/src/models/
├── index.ts                 ← re-exports combined registry, same public API
├── generated/
│   ├── anthropic.ts         ← Anthropic model definitions
│   ├── openai.ts            ← OpenAI model definitions
│   ├── google.ts            ← Google model definitions
│   ├── mistral.ts           ← Mistral model definitions
│   ├── amazon-bedrock.ts    ← Bedrock model definitions
│   ├── groq.ts              ← Groq model definitions
│   ├── xai.ts               ← xAI model definitions
│   ├── cerebras.ts          ← Cerebras model definitions
│   ├── openrouter.ts        ← OpenRouter model definitions
│   └── ...                  ← one file per provider in the catalog
├── custom.ts                ← replaces models.custom.ts (unchanged content)
└── capability-patches.ts    ← CAPABILITY_PATCHES extracted for clarity
```
**`models/index.ts` keeps the exact same synchronous public API:**
```typescript
// models/index.ts
// GSD-2 — Model registry (split by provider for maintainability)
import { ANTHROPIC_MODELS } from "./generated/anthropic.js";
import { OPENAI_MODELS } from "./generated/openai.js";
import { GOOGLE_MODELS } from "./generated/google.js";
// ... one import per provider
import { CUSTOM_MODELS } from "./custom.js";
import { CAPABILITY_PATCHES, applyCapabilityPatches } from "./capability-patches.js";
import type { Api, KnownProvider, Model, Usage } from "../types.js";
// Combine all generated models into single registry — same as today
const MODELS = {
  ...ANTHROPIC_MODELS,
  ...OPENAI_MODELS,
  ...GOOGLE_MODELS,
  // ...
};
// Rest of the file is identical to current models.ts:
// modelRegistry Map construction, capability patch application,
// getModel(), getProviders(), getModels(), calculateCost(),
// supportsXhigh(), modelsAreEqual()
```
**Key constraint: loading stays synchronous and eager.** All model files are statically imported. The Map is built at module init exactly as today. No async, no lazy loading, no runtime behavior change. This is purely a file organization change.
**Update `generate-models.ts`** to emit one file per provider instead of a single `models.generated.ts`. The script already groups models by provider internally — it just needs to write separate files instead of one.
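As a rough illustration, the per-provider emission step might look like the sketch below. `ModelDef`, `groupByProvider`, and `emitFiles` are hypothetical names — the real script's internals and model shape will differ:

```typescript
// Hypothetical sketch: group the catalog by provider and emit one module per
// provider instead of a single models.generated.ts.

interface ModelDef {
  id: string;
  provider: string;
  contextWindow: number;
}

function groupByProvider(models: ModelDef[]): Map<string, ModelDef[]> {
  const groups = new Map<string, ModelDef[]>();
  for (const m of models) {
    const list = groups.get(m.provider) ?? [];
    list.push(m);
    groups.set(m.provider, list);
  }
  return groups;
}

// Returns a map of output path → file contents; the caller writes them to disk.
function emitFiles(models: ModelDef[]): Map<string, string> {
  const files = new Map<string, string>();
  for (const [provider, defs] of groupByProvider(models)) {
    const constName = `${provider.toUpperCase().replace(/-/g, "_")}_MODELS`;
    const body = defs.map((d) => `  "${d.id}": ${JSON.stringify(d)},`).join("\n");
    files.set(
      `models/generated/${provider}.ts`,
      `export const ${constName} = {\n${body}\n};\n`,
    );
  }
  return files;
}
```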
#### Why this matters
| Before | After |
|--------|-------|
| PR diffs show 13K-line file changes | PR diffs scoped to the provider that changed |
| Merge conflicts on any concurrent model update | Conflicts only when same provider is touched |
| `git blame` shows "regenerate models" for every line | `git blame` shows per-provider history |
| Finding a model = search through 13K lines | Finding a model = open the provider file |
| One reviewer must understand all providers | Reviewers only need context for affected provider |
### Change 2: Stop barrel-exporting provider internals
**Update `packages/pi-ai/src/index.ts`:**
```typescript
// Before (current — 17 re-exports including all providers):
export * from "./providers/anthropic.js";
export * from "./providers/azure-openai-responses.js";
export * from "./providers/google.js";
export * from "./providers/google-gemini-cli.js";
export * from "./providers/google-vertex.js";
export * from "./providers/mistral.js";
export * from "./providers/openai-completions.js";
export * from "./providers/openai-responses.js";
export * from "./providers/register-builtins.js";
// ...
// After (clean public API):
export * from "./api-registry.js";
export * from "./env-api-keys.js";
export * from "./models/index.js";
export * from "./providers/register-builtins.js"; // resetApiProviders() is public
export * from "./stream.js";
export * from "./types.js";
export * from "./utils/event-stream.js";
export * from "./utils/json-parse.js";
export type { OAuthAuthInfo, OAuthCredentials, /* ... */ } from "./utils/oauth/types.js";
export * from "./utils/overflow.js";
export * from "./utils/typebox-helpers.js";
export * from "./utils/repair-tool-json.js";
export * from "./utils/validation.js";
```
Provider-specific exports (`streamAnthropic`, `streamGoogle`, etc.) are removed from the public API. Any external consumer that imported them directly should use the registry-based `stream()` / `streamSimple()` functions instead — which is how all internal callers already work.
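The migration for such a consumer is mechanical. A sketch, with a simplified stand-in for pi-ai's real `stream` signature:

```typescript
// Stand-in for the registry entry point that stays public after this ADR.
interface Context {
  prompt: string;
}

function stream(model: { api: string; id: string }, ctx: Context): string {
  // real pi-ai: api-registry resolves the provider from model.api
  return `[${model.api}] ${model.id}: ${ctx.prompt}`;
}

// Before this ADR an external consumer could write:
//   import { streamAnthropic } from "pi-ai";
//   streamAnthropic(model, context);
// After, the same call goes through the registry entry point:
const out = stream({ api: "anthropic-messages", id: "claude-3" }, { prompt: "hi" });
console.log(out); // prints: [anthropic-messages] claude-3: hi
```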
#### Why this matters
- **Enforces the registry pattern.** The correct way to call a provider is `stream(model, context)`. Direct provider function imports create fragile coupling.
- **Enables future refactoring.** Provider internal function signatures can change without breaking the package API. Today, renaming `streamAnthropic` would be a semver-breaking change.
- **Reduces API surface.** Consumers see only what they need: `stream`, `streamSimple`, `registerApiProvider`, model utilities, and types.
### What Does NOT Change
- **Runtime behavior** — all providers still load eagerly, same as today
- **The `Model<TApi>` type system** — all types, interfaces, and generics stay the same
- **The `ApiProvider` interface** — providers still implement `{ api, stream, streamSimple }`
- **The `api-registry.ts` registry** — synchronous `Map.get()` dispatch, unchanged
- **`stream.ts`** — no changes to the streaming entry point
- **`register-builtins.ts`** — still eagerly imports and registers all providers (only `resetApiProviders` remains in barrel export)
- **The extension system** — `registerApiProvider()` continues to work for Ollama, Claude Code CLI, etc.
- **`models.json` user config** — custom models, overrides, provider settings are unaffected
- **Model discovery** — discovery adapters are already lazy and independent
- **Model routing** — ADR-004's capability-aware routing is orthogonal
## Consequences
### Positive
1. **Cleaner PRs.** Model catalog changes are scoped to the provider that changed. Reviewers see a 200-line diff in `models/generated/openai.ts` instead of a 13K-line diff in `models.generated.ts`.
2. **Fewer merge conflicts.** Two PRs that update different providers no longer conflict on the same file.
3. **Better navigability.** Developers can jump directly to `models/generated/anthropic.ts` to see Anthropic's model definitions instead of searching through a monolith.
4. **Cleaner package API.** `pi-ai` exports only what consumers need. Provider internals are properly encapsulated.
5. **Future-proofs refactoring.** Provider implementation details can evolve without breaking the public API contract.
6. **Zero runtime risk.** No changes to loading, registration, streaming, or dispatch. The refactor is purely structural.
### Negative
1. **More files.** Instead of 1 generated file + 1 custom file, we'll have ~15-20 generated files. Marginal complexity increase, but each file is focused and small.
2. **Generation script update.** `generate-models.ts` needs to write per-provider files. The script already groups by provider, so this is straightforward but requires testing.
3. **Import audit for barrel export change.** Any code that directly imports `streamAnthropic` (etc.) from `pi-ai` needs to be updated. Based on research, the main consumer is `register-builtins.ts` itself, which imports providers directly (not through the barrel). External usage should be minimal.
## Alternatives Considered
### 1. Full lazy provider loading (original ADR-005 proposal)
Make all providers load on-demand via async dynamic imports, generalizing the Bedrock pattern. **Rejected** because:
- SDK imports are already lazy — the heavy cost is handled
- Provider implementation parsing is ~10-30ms total — not a bottleneck
- Adds async complexity to the synchronous stream dispatch hot path
- High migration effort and regression risk for unmeasurable performance gain
### 2. Plugin architecture with separate npm packages
Move each provider to its own package (`@gsd/provider-anthropic`, etc.). Maximum isolation but dramatically more complex build/release/versioning. Overkill for a monorepo where all providers ship together.
### 3. Do nothing
The current architecture works. This is a valid choice. The split is justified by the developer experience friction (13K-line file, merge conflicts, unusable git blame) and the API hygiene issue (leaking provider internals), not by a runtime problem. If the team is not experiencing these friction points, deferring is reasonable.
## Implementation Plan
### Wave 1: Split Model Catalog (Low-Medium Risk)
1. Update `generate-models.ts` to emit per-provider files into `models/generated/`
2. Create `models/index.ts` that imports all per-provider files and builds the same registry
3. Extract `CAPABILITY_PATCHES` into `models/capability-patches.ts`
4. Move `models.custom.ts` to `models/custom.ts`
5. Update imports in `models.ts` (or replace it with the new `models/index.ts`)
6. Verify `npm run build` and `npm run test` pass
7. Delete `models.generated.ts` and `models.custom.ts`
### Wave 2: Clean Up Barrel Export (Low Risk)
1. Remove provider re-exports from `index.ts`
2. Grep for direct provider imports from `"pi-ai"` across the codebase
3. Migrate any found usages to use `stream()` / `streamSimple()` through the registry
4. Verify build and tests
### Wave 3: Validate
1. Run full test suite
2. Verify extension registration (Ollama, Claude Code CLI) still works
3. Verify `resetApiProviders()` test helper still works
4. Spot-check a few providers end-to-end
## References
- Current model catalog: `packages/pi-ai/src/models.generated.ts` (13,848 lines)
- Current barrel export: `packages/pi-ai/src/index.ts`
- Model registry: `packages/pi-ai/src/models.ts`
- API provider registry: `packages/pi-ai/src/api-registry.ts`
- Eager registration: `packages/pi-ai/src/providers/register-builtins.ts`
- Stream dispatch: `packages/pi-ai/src/stream.ts`
- Generation script: `packages/pi-ai/scripts/generate-models.ts`
- Extension registration: `packages/pi-coding-agent/src/core/model-registry.ts`
- ADR-004: `docs/ADR-004-capability-aware-model-routing.md`