feat: dynamic model routing for token consumption optimization (#579)

* feat: dynamic model routing for token consumption optimization (#575)

Add complexity-based model routing that classifies units into light/standard/heavy
tiers and routes to cheaper models when appropriate. Reduces token consumption
by 20-50% for users on capped plans.

- Complexity classifier with heuristic-based tier assignment (no LLM call)
- Model router with downgrade-only semantics (user's config is ceiling)
- Budget-pressure-aware routing (more aggressive as budget fills)
- Cross-provider cost comparison via bundled cost table
- Hook classification support
- Escalation on failure (light → standard → heavy)
- Full preference validation and merge support
- Metrics tracking with tier and downgrade fields
- 40 new tests (classifier, router, cost table)

Closes #575

* feat: phases 2-4 — dashboard, adaptive learning, task introspection

Phase 2 — Observability & Dashboard:
- Tier badge [L]/[S]/[H] displayed in progress widget next to phase label
- Dynamic routing savings summary shown in footer when units have been downgraded
- Tier and modelDowngraded fields passed through snapshotUnitMetrics

Phase 3 — Adaptive Learning:
- New routing-history.ts: tracks success/failure per tier per unit-type pattern
- Rolling window of 50 entries per pattern to prevent stale data
- User feedback support (over/under/ok) with 2x weight vs automatic
- Failure rate >20% auto-bumps tier for that pattern
- Tag-specific patterns (e.g. execute-task:docs) for granular learning
- History persists to .gsd/routing-history.json
- Classifier consults adaptive history before finalizing tier

Phase 4 — Task Plan Introspection:
- Code block counting in task plans (5+ blocks → heavy)
- Complexity keyword detection: migration, architecture, security,
  performance, concurrency, compatibility
- Multiple complexity keywords (2+) → heavy, single → standard
- New codeBlockCount and complexityKeywords fields in TaskMetadata

Tests: 16 new tests (routing history + introspection), 419 total passing
This commit is contained in:
Flux Labs 2026-03-16 08:53:53 -05:00 committed by GitHub
parent 7567d2db05
commit 77309207ce
12 changed files with 1851 additions and 75 deletions

View file

@ -0,0 +1,364 @@
# Plan: Dynamic Model Routing for Token Optimization
**Issue:** #575 — Token Consumption Optimization through Dynamic Model Selection
**Status:** Draft
**Date:** 2025-03-15
## Problem Statement
Users on capped plans (e.g., Claude Pro) exhaust weekly token limits in 15-20 hours of GSD usage. Currently, GSD uses a single model per phase (research/planning/execution/completion), configured statically in preferences. Simple tasks consume the same tokens as complex ones.
## Current Architecture
### What Exists
- **Phase-based model config:** Users can set different models per phase via `preferences.md` (research, planning, execution, completion)
- **Fallback chains:** Each phase supports `fallbacks: [model1, model2]` for error recovery
- **Pre-dispatch hooks:** `PreDispatchResult` has a `model` field but it's **never applied** in `auto.ts` — this is a ready-made extension point
- **Model registry:** `ModelRegistry.getAvailable()` provides all configured models with metadata
- **Per-unit metrics:** Token counts (input/output/cacheRead/cacheWrite), cost, and model tracked per unit
- **Budget enforcement:** Real-time cost tracking with alerts at 75%/90%/100%
### Key Files
| File | Role |
|------|------|
| `src/resources/extensions/gsd/auto.ts` | Dispatch logic, model switching (lines 1791-1879) |
| `src/resources/extensions/gsd/preferences.ts` | Model resolution, `resolveModelWithFallbacksForUnit()` |
| `src/resources/extensions/gsd/post-unit-hooks.ts` | Pre-dispatch hooks (model field defined but unused) |
| `src/resources/extensions/gsd/types.ts` | Type definitions for hooks and model config |
| `src/resources/extensions/gsd/metrics.ts` | Token tracking, aggregation, cost projection |
| `src/resources/extensions/gsd/auto-prompts.ts` | Prompt builders per unit type |
| `packages/pi-coding-agent/src/core/model-registry.ts` | Model availability and metadata |
## Proposed Design
### Core Concept: Task Complexity Classification
Before each unit dispatch, classify the task into a complexity tier and route to an appropriate model. This sits between preference resolution and model dispatch — it can **downgrade** but never **upgrade** beyond the user's configured model.
### Complexity Tiers
| Tier | Complexity | Example Tasks | Default Model |
|------|-----------|---------------|---------------|
| **Tier 1 — Light** | Low cognitive load, structured output | File reads, search aggregation, simple summaries, completion/summary units | Haiku / cheapest available |
| **Tier 2 — Standard** | Moderate reasoning, some creativity | Research synthesis, plan formatting, routine code generation, UAT checks | Sonnet / mid-tier |
| **Tier 3 — Heavy** | Complex reasoning, architecture, novel code | Complex execution tasks, replanning, multi-file refactors, debugging | Opus / user's configured model |
### Classification Signals
The classifier uses **heuristic signals** available before dispatch (no LLM call needed):
1. **Unit type** (strongest signal):
- `complete-slice`, `run-uat` → Tier 1 (structured summarization)
- `research-milestone`, `research-slice` → Tier 2 (synthesis)
- `plan-milestone`, `plan-slice` → Tier 2-3 (depends on scope)
- `execute-task` → Tier 2-3 (depends on task complexity)
- `replan-slice` → Tier 3 (requires understanding of failure)
2. **Task metadata** (for execution units):
- Lines of code estimated to change (from task plan)
- Number of files involved
- Dependency count
- Whether task involves new file creation vs. modification
- Tags/labels if present (e.g., "refactor", "test", "docs")
3. **Historical performance** (adaptive, Phase 2):
- If a Tier 2 model failed and escalated on similar tasks before, default to Tier 3
- Track success rate per tier per unit-type pattern
### Architecture
```
User Preferences (phase → model)
resolveModelWithFallbacksForUnit() ← existing
classifyUnitComplexity() ← NEW: returns Tier 1/2/3
resolveModelForTier() ← NEW: maps tier → model from available set
maybeDowngradeModel() ← NEW: only downgrades from user's configured model
Model dispatch (existing auto.ts logic)
```
### Key Design Decisions
1. **Downgrade-only:** The classifier can select a cheaper model than configured, never a more expensive one. The user's preference is the ceiling.
2. **Opt-in with easy override:** New preference key `dynamic_model_routing: true|false` (default: `false`). Users who want token savings enable it explicitly.
3. **Escalation on failure:** If a lower-tier model fails (tool errors, incomplete output, exceeds retries), automatically escalate to the next tier and retry the unit.
4. **No LLM call for classification:** Uses heuristics only — adding an LLM call to save tokens would be counterproductive.
5. **Respects existing fallback chains:** Dynamic routing integrates with existing `fallbacks` — if the dynamically selected model fails, it tries the fallback chain before escalating tiers.
6. **Transparent to user:** Dashboard shows which model was selected and why (tier badge in progress widget).
## Implementation Phases
### Phase 1: Foundation — Complexity Classifier & Routing (Core)
**Goal:** Build the classification and routing system, wire it into dispatch.
#### 1a. Define types and configuration
**File:** `src/resources/extensions/gsd/types.ts`
- Add `ComplexityTier` type: `'light' | 'standard' | 'heavy'`
- Add `DynamicRoutingConfig` interface:
```typescript
interface DynamicRoutingConfig {
enabled: boolean;
tier_models?: {
light?: string; // model ID for light tasks
standard?: string; // model ID for standard tasks
heavy?: string; // model ID for heavy tasks (default: user's configured model)
};
escalate_on_failure?: boolean; // default: true
}
```
**File:** `src/resources/extensions/gsd/preferences.ts`
- Add `dynamic_routing` to preference schema
- Add validation for the new config
- Add `loadDynamicRoutingConfig()` function
#### 1b. Build complexity classifier
**New file:** `src/resources/extensions/gsd/complexity-classifier.ts`
- `classifyUnitComplexity(unitType, unitId, metadata?)``ComplexityTier`
- Heuristic rules:
- Unit type mapping (see Tiers table above)
- Task plan analysis: parse task plan file for file count, estimated scope
- Dependency analysis: tasks with 3+ dependencies → bump to heavy
- Export `getClassificationReason()` for dashboard display
#### 1c. Build model router
**New file:** `src/resources/extensions/gsd/model-router.ts`
- `resolveModelForComplexity(tier, phaseConfig, availableModels)``ResolvedModelConfig`
- Logic:
1. Get user's configured model for phase (ceiling)
2. If `tier_models` configured, use tier-specific model
3. If not configured, use smart defaults from available models (cheapest for light, mid for standard, configured for heavy)
4. Validate selected model is available
5. Return with fallback chain: `[tier_model, ...configured_fallbacks, configured_primary]`
#### 1d. Wire into dispatch
**File:** `src/resources/extensions/gsd/auto.ts`
- In the model resolution block (lines 1791-1879):
1. After `resolveModelWithFallbacksForUnit()`, call classifier
2. If dynamic routing enabled, call router to potentially downgrade
3. Log tier and model selection to metrics
4. On unit failure: if using downgraded model, escalate tier and retry
#### 1e. Wire the unused pre-dispatch hook model field
**File:** `src/resources/extensions/gsd/auto.ts`
- Apply `preDispatchResult.model` when returned — this is already defined but unused
- Allows hooks to override dynamic routing decisions
#### Tests
**New file:** `src/resources/extensions/gsd/tests/complexity-classifier.test.ts`
- Test tier assignment for each unit type
- Test metadata-based adjustments (file count, dependency count)
- Test edge cases (missing metadata, unknown unit types)
**New file:** `src/resources/extensions/gsd/tests/model-router.test.ts`
- Test downgrade-only behavior (never exceeds configured model)
- Test tier-to-model mapping with various available model sets
- Test fallback chain construction
- Test when dynamic routing is disabled (passthrough)
**New file:** `src/resources/extensions/gsd/tests/dynamic-routing-integration.test.ts`
- Test full flow: unit → classify → route → dispatch
- Test escalation on failure
- Test preference loading and validation
---
### Phase 2: Observability & Dashboard
**Goal:** Make routing decisions visible to users.
#### 2a. Metrics tracking
**File:** `src/resources/extensions/gsd/metrics.ts`
- Add `tier` field to `UnitMetrics`
- Add `model_downgraded: boolean` field
- Add `escalation_count` field
- Add `aggregateByTier()` function
- Add `formatTierSavings()` — show estimated savings from downgrades
#### 2b. Dashboard integration
**File:** `src/resources/extensions/gsd/auto-dashboard.ts`
- Add tier badge to unit progress display (e.g., `[L]`, `[S]`, `[H]`)
- Add savings summary to completion stats: "Dynamic routing saved ~$X.XX (N units downgraded)"
- Color-code tier in token widget
#### Tests
- Test metrics aggregation by tier
- Test savings calculation
- Test dashboard formatting
---
### Phase 3: Adaptive Learning (Future)
**Goal:** Improve classification accuracy over time based on outcomes.
#### 3a. Outcome tracking
**File:** `src/resources/extensions/gsd/complexity-classifier.ts`
- Track success/failure per tier per unit-type pattern
- Store in `.gsd/routing-history.json` (project-level)
- Simple structure: `{ "execute-task:docs": { light: { success: 12, fail: 1 }, ... } }`
#### 3b. Adaptive thresholds
- If a tier has >20% failure rate for a pattern, auto-bump default tier
- Decay old data (rolling window of last 50 units)
- User can reset learning: `dynamic_routing_reset: true` in preferences
#### Tests
- Test learning updates on success/failure
- Test threshold bumping
- Test decay logic
- Test reset behavior
---
### Phase 4: Task Plan Introspection (Future)
**Goal:** Deeper classification using task plan content analysis.
- Parse task plan markdown for complexity signals:
- "Create new file" vs. "modify existing"
- Number of code blocks in plan
- Presence of keywords: "refactor", "migration", "architecture", "test", "docs", "config"
- Estimated lines of change (if specified)
- Weight these signals alongside unit-type heuristics
---
## Preference Configuration (User-Facing)
```yaml
---
version: 1
models:
research: claude-sonnet-4-6
planning: claude-opus-4-6
execution: claude-sonnet-4-6
completion: claude-sonnet-4-6
dynamic_routing:
enabled: true
tier_models:
light: claude-haiku-4-5
standard: claude-sonnet-4-6
# heavy: inherits from phase config (ceiling)
escalate_on_failure: true
---
```
## Risk Mitigation
| Risk | Mitigation |
|------|-----------|
| Cheaper model produces low-quality output | Downgrade-only design; escalation on failure; user can disable |
| Classification overhead adds latency | Heuristics-only, no LLM call; <1ms classification time |
| Complex preferences confuse users | Disabled by default; works with zero config if enabled (uses smart defaults) |
| Model not available in user's provider | Validation at preference load; falls back to configured model |
| Escalation loops | Max 1 escalation per unit; after that, use configured model |
## Estimated Token Savings
Based on typical GSD session patterns:
- ~30% of units are completion/summary (Tier 1 candidates)
- ~40% are research/standard planning (Tier 2 candidates)
- ~30% are complex execution (Tier 3, no downgrade)
If Haiku is ~10x cheaper than Opus and Sonnet is ~5x cheaper:
- **Conservative estimate:** 20-30% cost reduction with dynamic routing enabled
- **Aggressive estimate:** 40-50% for projects with many small tasks
## Resolved Design Decisions
All four open questions resolved as **yes** — folded into the plan as additional scope:
### 1. Post-unit hook classification — YES
Hooks get their own complexity classification. Most hooks are lightweight (validation, file checks) and should default to Tier 1. The existing `model` field on `PostUnitHookConfig` becomes the ceiling, same as phase models for units.
**Implementation:** Add to Phase 1d — extend `classifyUnitComplexity()` to accept hook metadata. Wire into hook dispatch at `auto.ts` lines 936-946.
### 2. Budget-pressure-aware routing — YES
As budget usage increases, the classifier becomes more aggressive about downgrading:
- **<50% budget used:** Normal classification
- **50-75% budget used:** Bump Tier 2 candidates down to Tier 1 where possible
- **75-90% budget used:** Only Tier 3 tasks get the configured model; everything else goes to cheapest available
- **>90% budget used:** Everything except `replan-slice` gets downgraded to cheapest
**Implementation:** Add to Phase 1b — `classifyUnitComplexity()` takes `budgetPct` parameter from existing `getBudgetAlertLevel()` logic. New function `applyBudgetPressure(tier, budgetPct)` adjusts the tier.
### 3. Multi-provider cost routing — YES
When multiple providers are configured, the router should consider cost differences. If a user has both Anthropic and OpenRouter, pick the cheapest option for the resolved tier.
**Implementation:**
- Add `cost_per_1k_tokens` metadata to model registry (or maintain a lookup table for known models)
- New file: `src/resources/extensions/gsd/model-cost-table.ts` — static cost table for known models, updatable via preferences
- `resolveModelForComplexity()` ranks available models by cost within a tier's capability range
- Preference key: `dynamic_routing.cross_provider: true|false` (default: true when enabled)
**Risk:** Cost data goes stale. Mitigate with a bundled cost table that gets updated with GSD releases + user override capability.
### 4. User feedback loop — YES
After each unit completes, users can flag the output quality to improve future classification.
**Implementation (Phase 3 — Adaptive Learning):**
- Post-unit prompt option: user can react with `/gsd:rate-unit [over|under|ok]`
- `over` = "this could have used a simpler model" → records downgrade signal
- `under` = "this needed a better model" → records upgrade signal
- `ok` = confirms current tier was appropriate
- Feedback stored alongside outcome data in `.gsd/routing-history.json`
- Classifier weights feedback signals 2x vs. automatic success/failure detection
- Skill: `gsd:rate-unit` — simple command that tags the last completed unit
### Updated Preference Configuration
```yaml
---
version: 1
models:
research: claude-sonnet-4-6
planning: claude-opus-4-6
execution: claude-sonnet-4-6
completion: claude-sonnet-4-6
dynamic_routing:
enabled: true
tier_models:
light: claude-haiku-4-5
standard: claude-sonnet-4-6
# heavy: inherits from phase config (ceiling)
escalate_on_failure: true
budget_pressure: true # more aggressive downgrading as budget fills
cross_provider: true # consider cost across providers
hooks: true # classify hooks too
---
```
### Updated Phase Summary
| Phase | Scope | Includes |
|-------|-------|----------|
| **1 — Foundation** | Classifier, router, dispatch, hook classification, budget pressure | Decisions 1 & 2 |
| **2 — Observability** | Dashboard, tier badges, savings tracking, cost table | Decision 3 |
| **3 — Adaptive Learning** | Outcome tracking, user feedback (`/gsd:rate-unit`), adaptive thresholds | Decision 4 |
| **4 — Task Introspection** | Parse task plans for deeper complexity signals | — |

View file

@ -10,7 +10,7 @@ import type { ExtensionContext, ExtensionCommandContext } from "@gsd/pi-coding-a
import type { GSDState } from "./types.js";
import { getCurrentBranch } from "./worktree.js";
import { getActiveHook } from "./post-unit-hooks.js";
import { getLedger, getProjectTotals, formatCost, formatTokenCount } from "./metrics.js";
import { getLedger, getProjectTotals, formatCost, formatTokenCount, formatTierSavings } from "./metrics.js";
import {
resolveMilestoneFile,
resolveSliceFile,
@ -239,6 +239,7 @@ export function updateProgressWidget(
unitId: string,
state: GSDState,
accessors: WidgetStateAccessors,
tierBadge?: string,
): void {
if (!ctx.hasUI) return;
@ -319,7 +320,8 @@ export function updateProgressWidget(
const target = task ? `${task.id}: ${task.title}` : unitId;
const actionLeft = `${pad}${theme.fg("accent", "▸")} ${theme.fg("accent", verb)} ${theme.fg("text", target)}`;
const phaseBadge = theme.fg("dim", phaseLabel);
const tierTag = tierBadge ? theme.fg("dim", `[${tierBadge}] `) : "";
const phaseBadge = `${tierTag}${theme.fg("dim", phaseLabel)}`;
lines.push(rightAlign(actionLeft, phaseBadge, width));
lines.push("");
@ -414,6 +416,14 @@ export function updateProgressWidget(
? `${modelPhase}${theme.fg("dim", modelDisplay)}`
: "";
lines.push(rightAlign(`${pad}${sLeft}`, sRight, width));
// Dynamic routing savings summary
if (mLedger && mLedger.units.some(u => u.tier)) {
const savings = formatTierSavings(mLedger.units);
if (savings) {
lines.push(truncateToWidth(theme.fg("dim", `${pad}${savings}`), width));
}
}
}
const hintParts: string[] = [];

View file

@ -39,9 +39,12 @@ import {
readUnitRuntimeRecord,
writeUnitRuntimeRecord,
} from "./unit-runtime.js";
import { resolveAutoSupervisorConfig, resolveModelWithFallbacksForUnit, loadEffectiveGSDPreferences, resolveSkillDiscoveryMode } from "./preferences.js";
import { resolveAutoSupervisorConfig, resolveModelWithFallbacksForUnit, loadEffectiveGSDPreferences, resolveSkillDiscoveryMode, resolveDynamicRoutingConfig } from "./preferences.js";
import { sendDesktopNotification } from "./notifications.js";
import type { GSDPreferences } from "./preferences.js";
import { classifyUnitComplexity, tierLabel } from "./complexity-classifier.js";
import { resolveModelForComplexity } from "./model-router.js";
import { initRoutingHistory, resetRoutingHistory, recordOutcome } from "./routing-history.js";
import {
checkPostUnitHooks,
getActiveHook,
@ -233,6 +236,9 @@ let autoStartTime: number = 0;
let completedUnits: { type: string; id: string; startedAt: number; finishedAt: number }[] = [];
let currentUnit: { type: string; id: string; startedAt: number } | null = null;
/** Track dynamic routing decision for the current unit (for metrics) */
let currentUnitRouting: { tier: string; modelDowngraded: boolean } | null = null;
/** Track current milestone to detect transitions */
let currentMilestoneId: string | null = null;
let lastBudgetAlertLevel: BudgetAlertLevel = 0;
@ -504,6 +510,7 @@ export async function stopAuto(ctx?: ExtensionContext, pi?: ExtensionAPI): Promi
}
resetMetrics();
resetRoutingHistory();
resetHookState();
if (basePath) clearPersistedHookState(basePath);
active = false;
@ -809,6 +816,9 @@ export async function startAuto(
// Initialize metrics — loads existing ledger from disk
initMetrics(base);
// Initialize routing history for adaptive learning
initRoutingHistory(base);
// Snapshot installed skills so we can detect new ones after research
if (resolveSkillDiscoveryMode() !== "off") {
snapshotSkills();
@ -1011,7 +1021,7 @@ export async function handleAgentEnd(
const hookStartedAt = Date.now();
if (currentUnit) {
const modelId = ctx.model?.id ?? "unknown";
snapshotUnitMetrics(ctx, currentUnit.type, currentUnit.id, currentUnit.startedAt, modelId);
snapshotUnitMetrics(ctx, currentUnit.type, currentUnit.id, currentUnit.startedAt, modelId, currentUnitRouting ?? undefined);
saveActivityLog(ctx, basePath, currentUnit.type, currentUnit.id);
}
currentUnit = { type: hookUnit.unitType, id: hookUnit.unitId, startedAt: hookStartedAt };
@ -1227,7 +1237,10 @@ function updateProgressWidget(
unitId: string,
state: GSDState,
): void {
_updateProgressWidget(ctx, unitType, unitId, state, widgetStateAccessors);
const badge = currentUnitRouting?.tier
? ({ light: "L", standard: "S", heavy: "H" }[currentUnitRouting.tier] ?? undefined)
: undefined;
_updateProgressWidget(ctx, unitType, unitId, state, widgetStateAccessors, badge);
}
/** State accessors for the widget — closures over module globals. */
@ -1395,7 +1408,7 @@ async function dispatchNextUnit(
// Save final session before stopping
if (currentUnit) {
const modelId = ctx.model?.id ?? "unknown";
snapshotUnitMetrics(ctx, currentUnit.type, currentUnit.id, currentUnit.startedAt, modelId);
snapshotUnitMetrics(ctx, currentUnit.type, currentUnit.id, currentUnit.startedAt, modelId, currentUnitRouting ?? undefined);
saveActivityLog(ctx, basePath, currentUnit.type, currentUnit.id);
}
sendDesktopNotification("GSD", "All milestones complete!", "success", "milestone");
@ -1423,7 +1436,7 @@ async function dispatchNextUnit(
if (!mid || !midTitle) {
if (currentUnit) {
const modelId = ctx.model?.id ?? "unknown";
snapshotUnitMetrics(ctx, currentUnit.type, currentUnit.id, currentUnit.startedAt, modelId);
snapshotUnitMetrics(ctx, currentUnit.type, currentUnit.id, currentUnit.startedAt, modelId, currentUnitRouting ?? undefined);
saveActivityLog(ctx, basePath, currentUnit.type, currentUnit.id);
}
await stopAuto(ctx, pi);
@ -1438,7 +1451,7 @@ async function dispatchNextUnit(
if (state.phase === "complete") {
if (currentUnit) {
const modelId = ctx.model?.id ?? "unknown";
snapshotUnitMetrics(ctx, currentUnit.type, currentUnit.id, currentUnit.startedAt, modelId);
snapshotUnitMetrics(ctx, currentUnit.type, currentUnit.id, currentUnit.startedAt, modelId, currentUnitRouting ?? undefined);
saveActivityLog(ctx, basePath, currentUnit.type, currentUnit.id);
}
// Clear completed-units.json for the finished milestone so it doesn't grow unbounded.
@ -1508,7 +1521,7 @@ async function dispatchNextUnit(
if (state.phase === "blocked") {
if (currentUnit) {
const modelId = ctx.model?.id ?? "unknown";
snapshotUnitMetrics(ctx, currentUnit.type, currentUnit.id, currentUnit.startedAt, modelId);
snapshotUnitMetrics(ctx, currentUnit.type, currentUnit.id, currentUnit.startedAt, modelId, currentUnitRouting ?? undefined);
saveActivityLog(ctx, basePath, currentUnit.type, currentUnit.id);
}
await stopAuto(ctx, pi);
@ -1616,7 +1629,7 @@ async function dispatchNextUnit(
if (dispatchResult.action === "stop") {
if (currentUnit) {
const modelId = ctx.model?.id ?? "unknown";
snapshotUnitMetrics(ctx, currentUnit.type, currentUnit.id, currentUnit.startedAt, modelId);
snapshotUnitMetrics(ctx, currentUnit.type, currentUnit.id, currentUnit.startedAt, modelId, currentUnitRouting ?? undefined);
saveActivityLog(ctx, basePath, currentUnit.type, currentUnit.id);
}
await stopAuto(ctx, pi);
@ -1726,7 +1739,7 @@ async function dispatchNextUnit(
if (lifetimeCount > MAX_LIFETIME_DISPATCHES) {
if (currentUnit) {
const modelId = ctx.model?.id ?? "unknown";
snapshotUnitMetrics(ctx, currentUnit.type, currentUnit.id, currentUnit.startedAt, modelId);
snapshotUnitMetrics(ctx, currentUnit.type, currentUnit.id, currentUnit.startedAt, modelId, currentUnitRouting ?? undefined);
}
saveActivityLog(ctx, basePath, unitType, unitId);
const expected = diagnoseExpectedArtifact(unitType, unitId, basePath);
@ -1740,7 +1753,7 @@ async function dispatchNextUnit(
if (prevCount >= MAX_UNIT_DISPATCHES) {
if (currentUnit) {
const modelId = ctx.model?.id ?? "unknown";
snapshotUnitMetrics(ctx, currentUnit.type, currentUnit.id, currentUnit.startedAt, modelId);
snapshotUnitMetrics(ctx, currentUnit.type, currentUnit.id, currentUnit.startedAt, modelId, currentUnitRouting ?? undefined);
}
saveActivityLog(ctx, basePath, unitType, unitId);
@ -1898,9 +1911,19 @@ async function dispatchNextUnit(
// The session still holds the previous unit's data (newSession hasn't fired yet).
if (currentUnit) {
const modelId = ctx.model?.id ?? "unknown";
snapshotUnitMetrics(ctx, currentUnit.type, currentUnit.id, currentUnit.startedAt, modelId);
snapshotUnitMetrics(ctx, currentUnit.type, currentUnit.id, currentUnit.startedAt, modelId, currentUnitRouting ?? undefined);
saveActivityLog(ctx, basePath, currentUnit.type, currentUnit.id);
// Record routing outcome for adaptive learning
if (currentUnitRouting) {
const isRetry = currentUnit.type === unitType && currentUnit.id === unitId;
recordOutcome(
currentUnit.type,
currentUnitRouting.tier as "light" | "standard" | "heavy",
!isRetry, // success = not being retried
);
}
// Only mark the previous unit as completed if:
// 1. We're not about to re-dispatch the same unit (retry scenario)
// 2. The expected artifact actually exists on disk
@ -2003,7 +2026,54 @@ async function dispatchNextUnit(
const modelConfig = resolveModelWithFallbacksForUnit(unitType);
if (modelConfig) {
const availableModels = ctx.modelRegistry.getAvailable();
const modelsToTry = [modelConfig.primary, ...modelConfig.fallbacks];
// ─── Dynamic Model Routing ─────────────────────────────────────────
// If enabled, classify unit complexity and potentially downgrade to a
// cheaper model. The user's configured model is the ceiling.
const routingConfig = resolveDynamicRoutingConfig();
let effectiveModelConfig = modelConfig;
let routingTierLabel = "";
currentUnitRouting = null;
if (routingConfig.enabled) {
// Compute budget pressure if budget ceiling is set
let budgetPct: number | undefined;
if (routingConfig.budget_pressure !== false) {
const budgetCeiling = prefs?.budget_ceiling;
if (budgetCeiling !== undefined && budgetCeiling > 0) {
const currentLedger = getLedger();
const totalCost = currentLedger ? getProjectTotals(currentLedger.units).cost : 0;
budgetPct = totalCost / budgetCeiling;
}
}
// Classify complexity (hook routing controlled by config.hooks)
const isHook = unitType.startsWith("hook/");
const shouldClassify = !isHook || routingConfig.hooks !== false;
if (shouldClassify) {
const classification = classifyUnitComplexity(unitType, unitId, basePath, budgetPct);
const availableModelIds = availableModels.map(m => m.id);
const routing = resolveModelForComplexity(classification, modelConfig, routingConfig, availableModelIds);
if (routing.wasDowngraded) {
effectiveModelConfig = {
primary: routing.modelId,
fallbacks: routing.fallbacks,
};
if (verbose) {
ctx.ui.notify(
`Dynamic routing [${tierLabel(classification.tier)}]: ${routing.modelId} (${classification.reason})`,
"info",
);
}
}
routingTierLabel = ` [${tierLabel(classification.tier)}]`;
currentUnitRouting = { tier: classification.tier, modelDowngraded: routing.wasDowngraded };
}
}
const modelsToTry = [effectiveModelConfig.primary, ...effectiveModelConfig.fallbacks];
let modelSet = false;
for (const modelId of modelsToTry) {
@ -2068,11 +2138,11 @@ async function dispatchNextUnit(
const ok = await pi.setModel(model, { persist: false });
if (ok) {
const fallbackNote = modelId === modelConfig.primary
const fallbackNote = modelId === effectiveModelConfig.primary
? ""
: ` (fallback from ${modelConfig.primary})`;
: ` (fallback from ${effectiveModelConfig.primary})`;
const phase = unitPhaseLabel(unitType);
ctx.ui.notify(`Model [${phase}]: ${model.provider}/${model.id}${fallbackNote}`, "info");
ctx.ui.notify(`Model [${phase}]${routingTierLabel}: ${model.provider}/${model.id}${fallbackNote}`, "info");
modelSet = true;
break;
} else {
@ -2151,7 +2221,7 @@ async function dispatchNextUnit(
if (currentUnit) {
const modelId = ctx.model?.id ?? "unknown";
snapshotUnitMetrics(ctx, currentUnit.type, currentUnit.id, currentUnit.startedAt, modelId);
snapshotUnitMetrics(ctx, currentUnit.type, currentUnit.id, currentUnit.startedAt, modelId, currentUnitRouting ?? undefined);
}
saveActivityLog(ctx, basePath, unitType, unitId);
@ -2177,7 +2247,7 @@ async function dispatchNextUnit(
timeoutAt: Date.now(),
});
const modelId = ctx.model?.id ?? "unknown";
snapshotUnitMetrics(ctx, currentUnit.type, currentUnit.id, currentUnit.startedAt, modelId);
snapshotUnitMetrics(ctx, currentUnit.type, currentUnit.id, currentUnit.startedAt, modelId, currentUnitRouting ?? undefined);
}
saveActivityLog(ctx, basePath, unitType, unitId);

View file

@ -0,0 +1,322 @@
// GSD Extension — Complexity Classifier
// Classifies unit complexity for dynamic model routing.
// Pure heuristics + adaptive learning — no LLM calls. Sub-millisecond classification.
import { existsSync, readFileSync } from "node:fs";
import { join } from "node:path";
import { gsdRoot } from "./paths.js";
import { getAdaptiveTierAdjustment } from "./routing-history.js";
// ─── Types ───────────────────────────────────────────────────────────────────
export type ComplexityTier = "light" | "standard" | "heavy";
export interface ClassificationResult {
tier: ComplexityTier;
reason: string;
downgraded: boolean; // true if budget pressure lowered the tier
}
export interface TaskMetadata {
fileCount?: number;
dependencyCount?: number;
isNewFile?: boolean;
tags?: string[];
estimatedLines?: number;
codeBlockCount?: number; // number of fenced code blocks in plan
complexityKeywords?: string[]; // detected complexity signals
}
// ─── Unit Type → Default Tier Mapping ────────────────────────────────────────
const UNIT_TYPE_TIERS: Record<string, ComplexityTier> = {
// Tier 1 — Light: structured summaries, completion, UAT
"complete-slice": "light",
"run-uat": "light",
// Tier 2 — Standard: research, routine planning
"research-milestone": "standard",
"research-slice": "standard",
"plan-milestone": "standard",
"plan-slice": "standard",
// Tier 3 — Heavy: execution, replanning (requires deep reasoning)
"execute-task": "standard", // default standard, upgraded by metadata
"replan-slice": "heavy",
"reassess-roadmap": "heavy",
};
// ─── Public API ──────────────────────────────────────────────────────────────
/**
* Classify unit complexity to determine which model tier to use.
*
* @param unitType The type of unit being dispatched
* @param unitId The unit ID (e.g. "M001/S01/T01")
* @param basePath Project base path (for reading task plans)
* @param budgetPct Current budget usage as fraction (0.0-1.0+), or undefined if no budget
* @param metadata Optional pre-parsed task metadata
*/
export function classifyUnitComplexity(
unitType: string,
unitId: string,
basePath: string,
budgetPct?: number,
metadata?: TaskMetadata,
): ClassificationResult {
// Hook units default to light
if (unitType.startsWith("hook/")) {
const result: ClassificationResult = { tier: "light", reason: "hook unit", downgraded: false };
return applyBudgetPressure(result, budgetPct);
}
// Start with the default tier for this unit type
let tier = UNIT_TYPE_TIERS[unitType] ?? "standard";
let reason = `unit type: ${unitType}`;
// For execute-task, analyze task metadata for complexity signals
if (unitType === "execute-task") {
const taskAnalysis = analyzeTaskComplexity(unitId, basePath, metadata);
tier = taskAnalysis.tier;
reason = taskAnalysis.reason;
}
// For plan-slice, check if the slice has many tasks (complex planning)
if (unitType === "plan-slice" || unitType === "plan-milestone") {
const planAnalysis = analyzePlanComplexity(unitId, basePath);
if (planAnalysis) {
tier = planAnalysis.tier;
reason = planAnalysis.reason;
}
}
// Adaptive learning: check if history suggests bumping the tier
const tags = metadata?.tags ?? extractTaskMetadata(unitId, basePath).tags;
const adaptiveAdjustment = getAdaptiveTierAdjustment(unitType, tier, tags);
if (adaptiveAdjustment && tierOrdinal(adaptiveAdjustment) > tierOrdinal(tier)) {
reason = `${reason} (adaptive: high failure rate at ${tier})`;
tier = adaptiveAdjustment;
}
const result: ClassificationResult = { tier, reason, downgraded: false };
return applyBudgetPressure(result, budgetPct);
}
/**
* Get a short label for the tier (for dashboard display).
*/
export function tierLabel(tier: ComplexityTier): string {
switch (tier) {
case "light": return "L";
case "standard": return "S";
case "heavy": return "H";
}
}
/**
* Get the tier ordering value (for comparison).
*/
export function tierOrdinal(tier: ComplexityTier): number {
switch (tier) {
case "light": return 0;
case "standard": return 1;
case "heavy": return 2;
}
}
// ─── Task Complexity Analysis ────────────────────────────────────────────────
interface TaskAnalysis {
tier: ComplexityTier;
reason: string;
}
function analyzeTaskComplexity(
unitId: string,
basePath: string,
metadata?: TaskMetadata,
): TaskAnalysis {
// Try to read task plan for complexity signals
const meta = metadata ?? extractTaskMetadata(unitId, basePath);
// Heavy signals
if (meta.dependencyCount && meta.dependencyCount >= 3) {
return { tier: "heavy", reason: `${meta.dependencyCount} dependencies` };
}
if (meta.fileCount && meta.fileCount >= 6) {
return { tier: "heavy", reason: `${meta.fileCount} files to modify` };
}
if (meta.estimatedLines && meta.estimatedLines >= 500) {
return { tier: "heavy", reason: `~${meta.estimatedLines} lines estimated` };
}
// Heavy signals from complexity keywords (Phase 4)
if (meta.complexityKeywords && meta.complexityKeywords.length >= 2) {
return { tier: "heavy", reason: `complex: ${meta.complexityKeywords.join(", ")}` };
}
if (meta.codeBlockCount && meta.codeBlockCount >= 5) {
return { tier: "heavy", reason: `${meta.codeBlockCount} code blocks in plan` };
}
// Standard signals from single complexity keyword
if (meta.complexityKeywords && meta.complexityKeywords.length === 1) {
return { tier: "standard", reason: `${meta.complexityKeywords[0]} task` };
}
// Light signals (simple tasks)
if (meta.tags?.some(t => /^(docs?|readme|comment|config|typo|rename)$/i.test(t))) {
return { tier: "light", reason: `simple task: ${meta.tags.join(", ")}` };
}
if (meta.fileCount !== undefined && meta.fileCount <= 1 && !meta.isNewFile) {
return { tier: "light", reason: "single file modification" };
}
// Standard by default
return { tier: "standard", reason: "standard execution task" };
}
function analyzePlanComplexity(
unitId: string,
basePath: string,
): TaskAnalysis | null {
// Check if this is a milestone-level plan (more complex) vs single slice
const parts = unitId.split("/");
if (parts.length === 1) {
// Milestone-level planning is always at least standard
return { tier: "standard", reason: "milestone-level planning" };
}
// For slice planning, try to read the context/research to gauge complexity
// If research exists and is large, bump to heavy
const [mid, sid] = parts;
const researchPath = join(gsdRoot(basePath), mid, "slices", sid, "RESEARCH.md");
try {
if (existsSync(researchPath)) {
const content = readFileSync(researchPath, "utf-8");
const lineCount = content.split("\n").length;
if (lineCount > 200) {
return { tier: "heavy", reason: `complex slice: ${lineCount}-line research` };
}
}
} catch {
// Non-fatal
}
return null; // Use default tier
}
/**
* Extract task metadata from the task plan file on disk.
*/
function extractTaskMetadata(unitId: string, basePath: string): TaskMetadata {
const meta: TaskMetadata = {};
const parts = unitId.split("/");
if (parts.length !== 3) return meta;
const [mid, sid, tid] = parts;
const taskPlanPath = join(gsdRoot(basePath), mid, "slices", sid, "tasks", `${tid}-PLAN.md`);
try {
if (!existsSync(taskPlanPath)) return meta;
const content = readFileSync(taskPlanPath, "utf-8");
const lines = content.split("\n");
// Count files mentioned in "Files:" or "- Files:" lines
const fileLines = lines.filter(l => /^\s*-?\s*files?\s*:/i.test(l));
if (fileLines.length > 0) {
// Count comma-separated or bullet-pointed files
const allFiles = new Set<string>();
for (const line of fileLines) {
const filesStr = line.replace(/^\s*-?\s*files?\s*:\s*/i, "");
const files = filesStr.split(/[,;]/).map(f => f.trim()).filter(Boolean);
files.forEach(f => allFiles.add(f));
}
meta.fileCount = allFiles.size;
}
// Check for "new file" or "create" keywords
meta.isNewFile = lines.some(l => /\b(create|new file|scaffold|bootstrap)\b/i.test(l));
// Look for tags/labels in frontmatter or content
const tags: string[] = [];
if (content.match(/\b(refactor|migration|architect)/i)) tags.push("refactor");
if (content.match(/\b(test|spec|coverage)\b/i)) tags.push("test");
if (content.match(/\b(doc|readme|comment|jsdoc)\b/i)) tags.push("docs");
if (content.match(/\b(config|env|setting)\b/i)) tags.push("config");
if (content.match(/\b(rename|typo|spelling)\b/i)) tags.push("rename");
meta.tags = tags;
// Try to extract estimated lines from content
const estimateMatch = content.match(/~?\s*(\d+)\s*lines?\b/i);
if (estimateMatch) {
meta.estimatedLines = parseInt(estimateMatch[1], 10);
}
// Phase 4: Deeper introspection signals
// Count fenced code blocks (```) — more code blocks = more complex implementation
const codeBlockMatches = content.match(/^```/gm);
meta.codeBlockCount = codeBlockMatches ? Math.floor(codeBlockMatches.length / 2) : 0;
// Detect complexity keywords that suggest harder tasks
const complexityKeywords: string[] = [];
if (content.match(/\b(migration|migrate|schema change)\b/i)) complexityKeywords.push("migration");
if (content.match(/\b(architect|design pattern|system design)\b/i)) complexityKeywords.push("architecture");
if (content.match(/\b(security|auth|encrypt|credential|vulnerability)\b/i)) complexityKeywords.push("security");
if (content.match(/\b(performance|optimize|cache|index)\b/i)) complexityKeywords.push("performance");
if (content.match(/\b(concurrent|parallel|race condition|mutex|lock)\b/i)) complexityKeywords.push("concurrency");
if (content.match(/\b(backward.?compat|breaking change|deprecat)\b/i)) complexityKeywords.push("compatibility");
meta.complexityKeywords = complexityKeywords;
} catch {
// Non-fatal — metadata extraction is best-effort
}
return meta;
}
// ─── Budget Pressure ─────────────────────────────────────────────────────────
/**
* Apply budget pressure to a classification result.
* As budget usage increases, more aggressively downgrade tiers.
*
* - <50%: Normal classification (no change)
* - 50-75%: Tier 2 Tier 1 where possible
* - 75-90%: Only heavy tasks keep configured model
* - >90%: Everything except replan-slice gets cheapest model
*/
function applyBudgetPressure(
result: ClassificationResult,
budgetPct?: number,
): ClassificationResult {
if (budgetPct === undefined || budgetPct < 0.5) return result;
const original = result.tier;
if (budgetPct >= 0.9) {
// >90%: almost everything goes to light
if (result.tier !== "heavy") {
result.tier = "light";
} else {
// Even heavy gets downgraded to standard
result.tier = "standard";
}
} else if (budgetPct >= 0.75) {
// 75-90%: only heavy stays, everything else goes to light
if (result.tier === "standard") {
result.tier = "light";
}
} else {
// 50-75%: standard → light
if (result.tier === "standard") {
result.tier = "light";
}
}
if (result.tier !== original) {
result.downgraded = true;
result.reason = `${result.reason} (budget pressure: ${Math.round(budgetPct * 100)}%)`;
}
return result;
}

View file

@ -39,6 +39,8 @@ export interface UnitMetrics {
toolCalls: number;
assistantMessages: number;
userMessages: number;
tier?: string; // complexity tier (light/standard/heavy) if dynamic routing active
modelDowngraded?: boolean; // true if dynamic routing used a cheaper model
}
export interface MetricsLedger {
@ -104,6 +106,7 @@ export function snapshotUnitMetrics(
unitId: string,
startedAt: number,
model: string,
extras?: { tier?: string; modelDowngraded?: boolean },
): UnitMetrics | null {
if (!ledger) return null;
@ -156,6 +159,8 @@ export function snapshotUnitMetrics(
toolCalls,
assistantMessages,
userMessages,
...(extras?.tier ? { tier: extras.tier } : {}),
...(extras?.modelDowngraded !== undefined ? { modelDowngraded: extras.modelDowngraded } : {}),
};
ledger.units.push(unit);
@ -294,6 +299,49 @@ export function getProjectTotals(units: UnitMetrics[]): ProjectTotals {
return totals;
}
// ─── Tier Aggregation ────────────────────────────────────────────────────────
export interface TierAggregate {
tier: string;
units: number;
tokens: TokenCounts;
cost: number;
downgraded: number; // units that were downgraded by dynamic routing
}
export function aggregateByTier(units: UnitMetrics[]): TierAggregate[] {
const map = new Map<string, TierAggregate>();
for (const u of units) {
const tier = u.tier ?? "unknown";
let agg = map.get(tier);
if (!agg) {
agg = { tier, units: 0, tokens: emptyTokens(), cost: 0, downgraded: 0 };
map.set(tier, agg);
}
agg.units++;
agg.tokens = addTokens(agg.tokens, u.tokens);
agg.cost += u.cost;
if (u.modelDowngraded) agg.downgraded++;
}
const order = ["light", "standard", "heavy", "unknown"];
return order.map(t => map.get(t)).filter((a): a is TierAggregate => !!a);
}
/**
* Format a summary of savings from dynamic routing.
* Returns empty string if no units were downgraded.
*/
export function formatTierSavings(units: UnitMetrics[]): string {
const downgraded = units.filter(u => u.modelDowngraded);
if (downgraded.length === 0) return "";
const downgradedCost = downgraded.reduce((sum, u) => sum + u.cost, 0);
const totalUnits = units.filter(u => u.tier).length;
const pct = totalUnits > 0 ? Math.round((downgraded.length / totalUnits) * 100) : 0;
return `Dynamic routing: ${downgraded.length}/${totalUnits} units downgraded (${pct}%), cost: ${formatCost(downgradedCost)}`;
}
// ─── Formatting helpers ───────────────────────────────────────────────────────
export function formatCost(cost: number): string {

View file

@ -0,0 +1,65 @@
// GSD Extension — Model Cost Table
// Static cost reference for known models, used by the dynamic router
// for cross-provider cost comparison.
//
// Costs are approximate per-1K-token rates in USD (input tokens).
// Updated with GSD releases. Users can override via preferences.
export interface ModelCostEntry {
/** Model ID (bare, without provider prefix) */
id: string;
/** Approximate cost per 1K input tokens in USD */
inputPer1k: number;
/** Approximate cost per 1K output tokens in USD */
outputPer1k: number;
/** Last updated date */
updatedAt: string;
}
/**
* Bundled cost table for known models.
* Updated periodically with GSD releases.
*/
export const BUNDLED_COST_TABLE: ModelCostEntry[] = [
// Anthropic
{ id: "claude-opus-4-6", inputPer1k: 0.015, outputPer1k: 0.075, updatedAt: "2025-03-15" },
{ id: "claude-sonnet-4-6", inputPer1k: 0.003, outputPer1k: 0.015, updatedAt: "2025-03-15" },
{ id: "claude-haiku-4-5", inputPer1k: 0.0008, outputPer1k: 0.004, updatedAt: "2025-03-15" },
{ id: "claude-sonnet-4-5-20250514", inputPer1k: 0.003, outputPer1k: 0.015, updatedAt: "2025-03-15" },
{ id: "claude-3-5-sonnet-latest", inputPer1k: 0.003, outputPer1k: 0.015, updatedAt: "2025-03-15" },
{ id: "claude-3-5-haiku-latest", inputPer1k: 0.0008, outputPer1k: 0.004, updatedAt: "2025-03-15" },
{ id: "claude-3-opus-latest", inputPer1k: 0.015, outputPer1k: 0.075, updatedAt: "2025-03-15" },
// OpenAI
{ id: "gpt-4o", inputPer1k: 0.0025, outputPer1k: 0.01, updatedAt: "2025-03-15" },
{ id: "gpt-4o-mini", inputPer1k: 0.00015, outputPer1k: 0.0006, updatedAt: "2025-03-15" },
{ id: "o1", inputPer1k: 0.015, outputPer1k: 0.06, updatedAt: "2025-03-15" },
{ id: "o3", inputPer1k: 0.015, outputPer1k: 0.06, updatedAt: "2025-03-15" },
{ id: "gpt-4-turbo", inputPer1k: 0.01, outputPer1k: 0.03, updatedAt: "2025-03-15" },
// Google
{ id: "gemini-2.0-flash", inputPer1k: 0.0001, outputPer1k: 0.0004, updatedAt: "2025-03-15" },
{ id: "gemini-flash-2.0", inputPer1k: 0.0001, outputPer1k: 0.0004, updatedAt: "2025-03-15" },
{ id: "gemini-2.5-pro", inputPer1k: 0.00125, outputPer1k: 0.005, updatedAt: "2025-03-15" },
// DeepSeek
{ id: "deepseek-chat", inputPer1k: 0.00014, outputPer1k: 0.00028, updatedAt: "2025-03-15" },
];
/**
* Lookup cost for a model ID. Returns undefined if not found.
*/
export function lookupModelCost(modelId: string): ModelCostEntry | undefined {
const bareId = modelId.includes("/") ? modelId.split("/").pop()! : modelId;
return BUNDLED_COST_TABLE.find(e => e.id === bareId)
?? BUNDLED_COST_TABLE.find(e => bareId.includes(e.id) || e.id.includes(bareId));
}
/**
* Compare two models by input cost. Returns negative if a is cheaper.
*/
export function compareModelCost(modelIdA: string, modelIdB: string): number {
const costA = lookupModelCost(modelIdA)?.inputPer1k ?? 999;
const costB = lookupModelCost(modelIdB)?.inputPer1k ?? 999;
return costA - costB;
}

View file

@ -0,0 +1,256 @@
// GSD Extension — Dynamic Model Router
// Maps complexity tiers to models, enforcing downgrade-only semantics.
// The user's configured model is always the ceiling.
import type { ComplexityTier, ClassificationResult } from "./complexity-classifier.js";
import { tierOrdinal } from "./complexity-classifier.js";
import type { ResolvedModelConfig } from "./preferences.js";
// ─── Types ───────────────────────────────────────────────────────────────────
export interface DynamicRoutingConfig {
enabled?: boolean;
tier_models?: {
light?: string;
standard?: string;
heavy?: string;
};
escalate_on_failure?: boolean; // default: true
budget_pressure?: boolean; // default: true
cross_provider?: boolean; // default: true
hooks?: boolean; // default: true
}
export interface RoutingDecision {
/** The model ID to use (may be downgraded from configured) */
modelId: string;
/** Fallback chain: [selected_model, ...configured_fallbacks, configured_primary] */
fallbacks: string[];
/** The complexity tier that drove this decision */
tier: ComplexityTier;
/** True if the model was downgraded from the configured primary */
wasDowngraded: boolean;
/** Human-readable reason for this decision */
reason: string;
}
// ─── Known Model Tiers ───────────────────────────────────────────────────────
// Maps known model IDs to their capability tier. Used when tier_models is not
// explicitly configured to pick the best available model for each tier.
const MODEL_CAPABILITY_TIER: Record<string, ComplexityTier> = {
// Light-tier models (cheapest)
"claude-haiku-4-5": "light",
"claude-3-5-haiku-latest": "light",
"claude-3-haiku-20240307": "light",
"gpt-4o-mini": "light",
"gemini-2.0-flash": "light",
"gemini-flash-2.0": "light",
// Standard-tier models
"claude-sonnet-4-6": "standard",
"claude-sonnet-4-5-20250514": "standard",
"claude-3-5-sonnet-latest": "standard",
"gpt-4o": "standard",
"gemini-2.5-pro": "standard",
"deepseek-chat": "standard",
// Heavy-tier models (most capable)
"claude-opus-4-6": "heavy",
"claude-3-opus-latest": "heavy",
"gpt-4-turbo": "heavy",
"o1": "heavy",
"o3": "heavy",
};
// ─── Cost Table (per 1K input tokens, approximate USD) ───────────────────────
// Used for cross-provider cost comparison when multiple providers offer
// the same capability tier.
const MODEL_COST_PER_1K_INPUT: Record<string, number> = {
"claude-haiku-4-5": 0.0008,
"claude-3-5-haiku-latest": 0.0008,
"claude-sonnet-4-6": 0.003,
"claude-sonnet-4-5-20250514": 0.003,
"claude-opus-4-6": 0.015,
"gpt-4o-mini": 0.00015,
"gpt-4o": 0.0025,
"gemini-2.0-flash": 0.0001,
"gemini-2.5-pro": 0.00125,
"deepseek-chat": 0.00014,
};
// ─── Public API ──────────────────────────────────────────────────────────────
/**
* Resolve the model to use for a given complexity tier.
*
* Downgrade-only: the returned model is always equal to or cheaper than
* the user's configured primary model. Never upgrades beyond configuration.
*
* @param classification The complexity classification result
* @param phaseConfig The user's configured model for this phase (ceiling)
* @param routingConfig Dynamic routing configuration
* @param availableModelIds List of available model IDs (from registry)
*/
export function resolveModelForComplexity(
classification: ClassificationResult,
phaseConfig: ResolvedModelConfig | undefined,
routingConfig: DynamicRoutingConfig,
availableModelIds: string[],
): RoutingDecision {
// If no phase config or routing disabled, pass through
if (!phaseConfig || !routingConfig.enabled) {
return {
modelId: phaseConfig?.primary ?? "",
fallbacks: phaseConfig?.fallbacks ?? [],
tier: classification.tier,
wasDowngraded: false,
reason: "dynamic routing disabled or no phase config",
};
}
const configuredPrimary = phaseConfig.primary;
const configuredTier = getModelTier(configuredPrimary);
const requestedTier = classification.tier;
// Downgrade-only: if requested tier >= configured tier, no change
if (tierOrdinal(requestedTier) >= tierOrdinal(configuredTier)) {
return {
modelId: configuredPrimary,
fallbacks: phaseConfig.fallbacks,
tier: requestedTier,
wasDowngraded: false,
reason: `tier ${requestedTier} >= configured ${configuredTier}`,
};
}
// Find the best model for the requested tier
const targetModelId = findModelForTier(
requestedTier,
routingConfig,
availableModelIds,
routingConfig.cross_provider !== false,
);
if (!targetModelId) {
// No suitable model found — use configured primary
return {
modelId: configuredPrimary,
fallbacks: phaseConfig.fallbacks,
tier: requestedTier,
wasDowngraded: false,
reason: `no ${requestedTier}-tier model available`,
};
}
// Build fallback chain: [downgraded_model, ...configured_fallbacks, configured_primary]
const fallbacks = [
...phaseConfig.fallbacks.filter(f => f !== targetModelId),
configuredPrimary,
].filter(f => f !== targetModelId);
return {
modelId: targetModelId,
fallbacks,
tier: requestedTier,
wasDowngraded: true,
reason: classification.reason,
};
}
/**
* Escalate to the next tier after a failure.
* Returns the new tier, or null if already at heavy (max).
*/
export function escalateTier(currentTier: ComplexityTier): ComplexityTier | null {
switch (currentTier) {
case "light": return "standard";
case "standard": return "heavy";
case "heavy": return null;
}
}
/**
* Get the default routing config (all features enabled).
*/
export function defaultRoutingConfig(): DynamicRoutingConfig {
return {
enabled: false,
escalate_on_failure: true,
budget_pressure: true,
cross_provider: true,
hooks: true,
};
}
// ─── Internal ────────────────────────────────────────────────────────────────
function getModelTier(modelId: string): ComplexityTier {
// Strip provider prefix if present
const bareId = modelId.includes("/") ? modelId.split("/").pop()! : modelId;
// Check exact match first
if (MODEL_CAPABILITY_TIER[bareId]) return MODEL_CAPABILITY_TIER[bareId];
// Check if any known model ID is a prefix/suffix match
for (const [knownId, tier] of Object.entries(MODEL_CAPABILITY_TIER)) {
if (bareId.includes(knownId) || knownId.includes(bareId)) return tier;
}
// Unknown models are assumed heavy (safest assumption)
return "heavy";
}
function findModelForTier(
tier: ComplexityTier,
config: DynamicRoutingConfig,
availableModelIds: string[],
crossProvider: boolean,
): string | null {
// 1. Check explicit tier_models config
const explicitModel = config.tier_models?.[tier];
if (explicitModel && availableModelIds.includes(explicitModel)) {
return explicitModel;
}
// Also check with provider prefix stripped
if (explicitModel) {
const match = availableModelIds.find(id => {
const bareAvail = id.includes("/") ? id.split("/").pop()! : id;
const bareExplicit = explicitModel.includes("/") ? explicitModel.split("/").pop()! : explicitModel;
return bareAvail === bareExplicit;
});
if (match) return match;
}
// 2. Auto-detect: find the cheapest available model in the requested tier
const candidates = availableModelIds
.filter(id => {
const modelTier = getModelTier(id);
return modelTier === tier;
})
.sort((a, b) => {
if (!crossProvider) return 0;
const costA = getModelCost(a);
const costB = getModelCost(b);
return costA - costB;
});
return candidates[0] ?? null;
}
function getModelCost(modelId: string): number {
const bareId = modelId.includes("/") ? modelId.split("/").pop()! : modelId;
if (MODEL_COST_PER_1K_INPUT[bareId] !== undefined) {
return MODEL_COST_PER_1K_INPUT[bareId];
}
// Check partial matches
for (const [knownId, cost] of Object.entries(MODEL_COST_PER_1K_INPUT)) {
if (bareId.includes(knownId) || knownId.includes(bareId)) return cost;
}
// Unknown cost — assume expensive to avoid routing to unknown cheap models
return 999;
}

View file

@ -4,6 +4,8 @@ import { isAbsolute, join } from "node:path";
import { getAgentDir } from "@gsd/pi-coding-agent";
import type { GitPreferences } from "./git-service.js";
import type { PostUnitHookConfig, PreDispatchHookConfig, BudgetEnforcementMode, NotificationPreferences, TokenProfile, InlineLevel, PhaseSkipPreferences } from "./types.js";
import type { DynamicRoutingConfig } from "./model-router.js";
import { defaultRoutingConfig } from "./model-router.js";
import { VALID_BRANCH_NAME } from "./git-service.js";
const GLOBAL_PREFERENCES_PATH = join(homedir(), ".gsd", "preferences.md");
@ -36,6 +38,7 @@ const KNOWN_PREFERENCE_KEYS = new Set<string>([
"git",
"post_unit_hooks",
"pre_dispatch_hooks",
"dynamic_routing",
"token_profile",
"phases",
]);
@ -128,6 +131,7 @@ export interface GSDPreferences {
git?: GitPreferences;
post_unit_hooks?: PostUnitHookConfig[];
pre_dispatch_hooks?: PreDispatchHookConfig[];
dynamic_routing?: DynamicRoutingConfig;
token_profile?: TokenProfile;
phases?: PhaseSkipPreferences;
}
@ -674,6 +678,20 @@ export function resolveModelWithFallbacksForUnit(unitType: string): ResolvedMode
};
}
/**
* Resolve the dynamic routing configuration from effective preferences.
* Returns the merged config with defaults applied.
*/
export function resolveDynamicRoutingConfig(): DynamicRoutingConfig {
const prefs = loadEffectiveGSDPreferences();
const configured = prefs?.preferences.dynamic_routing;
if (!configured) return defaultRoutingConfig();
return {
...defaultRoutingConfig(),
...configured,
};
}
export function resolveAutoSupervisorConfig(): AutoSupervisorConfig {
const prefs = loadEffectiveGSDPreferences();
const configured = prefs?.preferences.auto_supervisor ?? {};
@ -780,6 +798,9 @@ function mergePreferences(base: GSDPreferences, override: GSDPreferences): GSDPr
: undefined,
post_unit_hooks: mergePostUnitHooks(base.post_unit_hooks, override.post_unit_hooks),
pre_dispatch_hooks: mergePreDispatchHooks(base.pre_dispatch_hooks, override.pre_dispatch_hooks),
dynamic_routing: (base.dynamic_routing || override.dynamic_routing)
? { ...(base.dynamic_routing ?? {}), ...(override.dynamic_routing ?? {}) } as DynamicRoutingConfig
: undefined,
token_profile: override.token_profile ?? base.token_profile,
phases: (base.phases || override.phases)
? { ...(base.phases ?? {}), ...(override.phases ?? {}) }
@ -1100,6 +1121,56 @@ export function validatePreferences(preferences: GSDPreferences): {
}
}
// ─── Dynamic Routing ─────────────────────────────────────────────────
if (preferences.dynamic_routing !== undefined) {
if (typeof preferences.dynamic_routing === "object" && preferences.dynamic_routing !== null) {
const dr = preferences.dynamic_routing as unknown as Record<string, unknown>;
const validDr: Partial<DynamicRoutingConfig> = {};
if (dr.enabled !== undefined) {
if (typeof dr.enabled === "boolean") validDr.enabled = dr.enabled;
else errors.push("dynamic_routing.enabled must be a boolean");
}
if (dr.escalate_on_failure !== undefined) {
if (typeof dr.escalate_on_failure === "boolean") validDr.escalate_on_failure = dr.escalate_on_failure;
else errors.push("dynamic_routing.escalate_on_failure must be a boolean");
}
if (dr.budget_pressure !== undefined) {
if (typeof dr.budget_pressure === "boolean") validDr.budget_pressure = dr.budget_pressure;
else errors.push("dynamic_routing.budget_pressure must be a boolean");
}
if (dr.cross_provider !== undefined) {
if (typeof dr.cross_provider === "boolean") validDr.cross_provider = dr.cross_provider;
else errors.push("dynamic_routing.cross_provider must be a boolean");
}
if (dr.hooks !== undefined) {
if (typeof dr.hooks === "boolean") validDr.hooks = dr.hooks;
else errors.push("dynamic_routing.hooks must be a boolean");
}
if (dr.tier_models !== undefined) {
if (typeof dr.tier_models === "object" && dr.tier_models !== null) {
const tm = dr.tier_models as Record<string, unknown>;
const validTm: Record<string, string> = {};
for (const tier of ["light", "standard", "heavy"]) {
if (tm[tier] !== undefined) {
if (typeof tm[tier] === "string") validTm[tier] = tm[tier] as string;
else errors.push(`dynamic_routing.tier_models.${tier} must be a string`);
}
}
if (Object.keys(validTm).length > 0) validDr.tier_models = validTm as DynamicRoutingConfig["tier_models"];
} else {
errors.push("dynamic_routing.tier_models must be an object");
}
}
if (Object.keys(validDr).length > 0) {
validated.dynamic_routing = validDr as unknown as DynamicRoutingConfig;
}
} else {
errors.push("dynamic_routing must be an object");
}
}
// ─── Git Preferences ───────────────────────────────────────────────────
if (preferences.git && typeof preferences.git === "object") {
const git: Record<string, unknown> = {};

View file

@ -0,0 +1,181 @@
import test from "node:test";
import assert from "node:assert/strict";
import { classifyUnitComplexity, tierLabel, tierOrdinal } from "../complexity-classifier.js";
import type { ComplexityTier, TaskMetadata } from "../complexity-classifier.js";
// ─── tierLabel ───────────────────────────────────────────────────────────────
test("tierLabel returns correct short labels", () => {
assert.equal(tierLabel("light"), "L");
assert.equal(tierLabel("standard"), "S");
assert.equal(tierLabel("heavy"), "H");
});
// ─── tierOrdinal ─────────────────────────────────────────────────────────────
test("tierOrdinal returns correct ordering", () => {
assert.ok(tierOrdinal("light") < tierOrdinal("standard"));
assert.ok(tierOrdinal("standard") < tierOrdinal("heavy"));
});
// ─── Unit Type Classification ────────────────────────────────────────────────
test("complete-slice classifies as light", () => {
const result = classifyUnitComplexity("complete-slice", "M001/S01", "/tmp/fake");
assert.equal(result.tier, "light");
});
test("run-uat classifies as light", () => {
const result = classifyUnitComplexity("run-uat", "M001/S01", "/tmp/fake");
assert.equal(result.tier, "light");
});
test("research-milestone classifies as standard", () => {
const result = classifyUnitComplexity("research-milestone", "M001", "/tmp/fake");
assert.equal(result.tier, "standard");
});
test("research-slice classifies as standard", () => {
const result = classifyUnitComplexity("research-slice", "M001/S01", "/tmp/fake");
assert.equal(result.tier, "standard");
});
test("plan-milestone classifies as standard", () => {
const result = classifyUnitComplexity("plan-milestone", "M001", "/tmp/fake");
assert.equal(result.tier, "standard");
});
test("plan-slice classifies as standard", () => {
const result = classifyUnitComplexity("plan-slice", "M001/S01", "/tmp/fake");
assert.equal(result.tier, "standard");
});
test("replan-slice classifies as heavy", () => {
const result = classifyUnitComplexity("replan-slice", "M001/S01", "/tmp/fake");
assert.equal(result.tier, "heavy");
});
test("reassess-roadmap classifies as heavy", () => {
const result = classifyUnitComplexity("reassess-roadmap", "M001", "/tmp/fake");
assert.equal(result.tier, "heavy");
});
test("hook units classify as light", () => {
const result = classifyUnitComplexity("hook/verify", "M001/S01/T01", "/tmp/fake");
assert.equal(result.tier, "light");
assert.match(result.reason, /hook/);
});
test("unknown unit types default to standard", () => {
const result = classifyUnitComplexity("custom-thing", "M001", "/tmp/fake");
assert.equal(result.tier, "standard");
});
// ─── Task Metadata Classification ────────────────────────────────────────────
test("execute-task with many dependencies classifies as heavy", () => {
const metadata: TaskMetadata = { dependencyCount: 4 };
const result = classifyUnitComplexity("execute-task", "M001/S01/T01", "/tmp/fake", undefined, metadata);
assert.equal(result.tier, "heavy");
assert.match(result.reason, /dependencies/);
});
test("execute-task with many files classifies as heavy", () => {
const metadata: TaskMetadata = { fileCount: 8 };
const result = classifyUnitComplexity("execute-task", "M001/S01/T01", "/tmp/fake", undefined, metadata);
assert.equal(result.tier, "heavy");
assert.match(result.reason, /files/);
});
test("execute-task with large estimated lines classifies as heavy", () => {
const metadata: TaskMetadata = { estimatedLines: 600 };
const result = classifyUnitComplexity("execute-task", "M001/S01/T01", "/tmp/fake", undefined, metadata);
assert.equal(result.tier, "heavy");
assert.match(result.reason, /lines/);
});
test("execute-task with docs tags classifies as light", () => {
const metadata: TaskMetadata = { tags: ["docs"] };
const result = classifyUnitComplexity("execute-task", "M001/S01/T01", "/tmp/fake", undefined, metadata);
assert.equal(result.tier, "light");
});
test("execute-task with single file modification classifies as light", () => {
const metadata: TaskMetadata = { fileCount: 1, isNewFile: false };
const result = classifyUnitComplexity("execute-task", "M001/S01/T01", "/tmp/fake", undefined, metadata);
assert.equal(result.tier, "light");
});
test("execute-task with no metadata classifies as standard", () => {
const result = classifyUnitComplexity("execute-task", "M001/S01/T01", "/tmp/fake");
assert.equal(result.tier, "standard");
});
// ─── Budget Pressure ─────────────────────────────────────────────────────────
test("no budget pressure below 50%", () => {
const result = classifyUnitComplexity("research-slice", "M001/S01", "/tmp/fake", 0.3);
assert.equal(result.tier, "standard");
assert.equal(result.downgraded, false);
});
test("budget pressure at 50% downgrades standard to light", () => {
const result = classifyUnitComplexity("research-slice", "M001/S01", "/tmp/fake", 0.55);
assert.equal(result.tier, "light");
assert.equal(result.downgraded, true);
assert.match(result.reason, /budget pressure/);
});
test("budget pressure at 75% keeps heavy as heavy", () => {
const result = classifyUnitComplexity("replan-slice", "M001/S01", "/tmp/fake", 0.80);
assert.equal(result.tier, "heavy");
assert.equal(result.downgraded, false);
});
test("budget pressure at 90% downgrades heavy to standard", () => {
const result = classifyUnitComplexity("replan-slice", "M001/S01", "/tmp/fake", 0.95);
assert.equal(result.tier, "standard");
assert.equal(result.downgraded, true);
});
test("budget pressure at 90% downgrades standard to light", () => {
const result = classifyUnitComplexity("research-slice", "M001/S01", "/tmp/fake", 0.95);
assert.equal(result.tier, "light");
assert.equal(result.downgraded, true);
});
test("budget pressure at 90% downgrades light stays light", () => {
const result = classifyUnitComplexity("complete-slice", "M001/S01", "/tmp/fake", 0.95);
assert.equal(result.tier, "light");
});
// ─── Phase 4: Task Plan Introspection ────────────────────────────────────────
test("execute-task with multiple complexity keywords classifies as heavy", () => {
const metadata: TaskMetadata = { complexityKeywords: ["migration", "security"] };
const result = classifyUnitComplexity("execute-task", "M001/S01/T01", "/tmp/fake", undefined, metadata);
assert.equal(result.tier, "heavy");
assert.match(result.reason, /migration/);
assert.match(result.reason, /security/);
});
test("execute-task with single complexity keyword classifies as standard", () => {
const metadata: TaskMetadata = { complexityKeywords: ["performance"] };
const result = classifyUnitComplexity("execute-task", "M001/S01/T01", "/tmp/fake", undefined, metadata);
assert.equal(result.tier, "standard");
assert.match(result.reason, /performance/);
});
test("execute-task with many code blocks classifies as heavy", () => {
const metadata: TaskMetadata = { codeBlockCount: 6 };
const result = classifyUnitComplexity("execute-task", "M001/S01/T01", "/tmp/fake", undefined, metadata);
assert.equal(result.tier, "heavy");
assert.match(result.reason, /code blocks/);
});
test("execute-task with few code blocks stays standard", () => {
const metadata: TaskMetadata = { codeBlockCount: 2 };
const result = classifyUnitComplexity("execute-task", "M001/S01/T01", "/tmp/fake", undefined, metadata);
assert.equal(result.tier, "standard");
});

View file

@ -0,0 +1,69 @@
import test from "node:test";
import assert from "node:assert/strict";
import { lookupModelCost, compareModelCost, BUNDLED_COST_TABLE } from "../model-cost-table.js";
// ─── lookupModelCost ─────────────────────────────────────────────────────────
test("lookupModelCost finds exact match", () => {
const entry = lookupModelCost("claude-opus-4-6");
assert.ok(entry);
assert.equal(entry.id, "claude-opus-4-6");
assert.ok(entry.inputPer1k > 0);
assert.ok(entry.outputPer1k > 0);
});
test("lookupModelCost strips provider prefix", () => {
const entry = lookupModelCost("anthropic/claude-opus-4-6");
assert.ok(entry);
assert.equal(entry.id, "claude-opus-4-6");
});
test("lookupModelCost returns undefined for unknown model", () => {
const entry = lookupModelCost("totally-unknown-model");
assert.equal(entry, undefined);
});
test("lookupModelCost finds haiku", () => {
const entry = lookupModelCost("claude-haiku-4-5");
assert.ok(entry);
assert.ok(entry.inputPer1k < 0.001, "haiku should be cheap");
});
// ─── compareModelCost ────────────────────────────────────────────────────────
test("haiku is cheaper than opus", () => {
assert.ok(compareModelCost("claude-haiku-4-5", "claude-opus-4-6") < 0);
});
test("opus is more expensive than sonnet", () => {
assert.ok(compareModelCost("claude-opus-4-6", "claude-sonnet-4-6") > 0);
});
test("same model has equal cost", () => {
assert.equal(compareModelCost("claude-opus-4-6", "claude-opus-4-6"), 0);
});
// ─── BUNDLED_COST_TABLE ──────────────────────────────────────────────────────
test("cost table has entries for all major providers", () => {
const ids = BUNDLED_COST_TABLE.map(e => e.id);
// Anthropic
assert.ok(ids.includes("claude-opus-4-6"));
assert.ok(ids.includes("claude-sonnet-4-6"));
assert.ok(ids.includes("claude-haiku-4-5"));
// OpenAI
assert.ok(ids.includes("gpt-4o"));
assert.ok(ids.includes("gpt-4o-mini"));
// Google
assert.ok(ids.includes("gemini-2.0-flash"));
});
test("all cost table entries have valid data", () => {
for (const entry of BUNDLED_COST_TABLE) {
assert.ok(entry.id, `entry missing id`);
assert.ok(entry.inputPer1k >= 0, `${entry.id} inputPer1k should be >= 0`);
assert.ok(entry.outputPer1k >= 0, `${entry.id} outputPer1k should be >= 0`);
assert.ok(entry.updatedAt, `${entry.id} missing updatedAt`);
}
});

View file

@ -0,0 +1,167 @@
import test from "node:test";
import assert from "node:assert/strict";
import {
resolveModelForComplexity,
escalateTier,
defaultRoutingConfig,
} from "../model-router.js";
import type { DynamicRoutingConfig, RoutingDecision } from "../model-router.js";
import type { ClassificationResult } from "../complexity-classifier.js";
// ─── Helpers ─────────────────────────────────────────────────────────────────
function makeClassification(tier: "light" | "standard" | "heavy", reason = "test"): ClassificationResult {
return { tier, reason, downgraded: false };
}
const AVAILABLE_MODELS = [
"claude-opus-4-6",
"claude-sonnet-4-6",
"claude-haiku-4-5",
"gpt-4o-mini",
];
// ─── Passthrough when disabled ───────────────────────────────────────────────
test("returns configured model when routing is disabled", () => {
const config = { ...defaultRoutingConfig(), enabled: false };
const result = resolveModelForComplexity(
makeClassification("light"),
{ primary: "claude-opus-4-6", fallbacks: [] },
config,
AVAILABLE_MODELS,
);
assert.equal(result.modelId, "claude-opus-4-6");
assert.equal(result.wasDowngraded, false);
});
test("returns configured model when no phase config", () => {
const config = { ...defaultRoutingConfig(), enabled: true };
const result = resolveModelForComplexity(
makeClassification("light"),
undefined,
config,
AVAILABLE_MODELS,
);
assert.equal(result.modelId, "");
assert.equal(result.wasDowngraded, false);
});
// ─── Downgrade-only semantics ────────────────────────────────────────────────
test("does not downgrade when tier matches configured model tier", () => {
const config = { ...defaultRoutingConfig(), enabled: true };
const result = resolveModelForComplexity(
makeClassification("heavy"),
{ primary: "claude-opus-4-6", fallbacks: [] },
config,
AVAILABLE_MODELS,
);
assert.equal(result.modelId, "claude-opus-4-6");
assert.equal(result.wasDowngraded, false);
});
test("does not upgrade beyond configured model", () => {
const config = { ...defaultRoutingConfig(), enabled: true };
// Configured model is sonnet (standard), classification says heavy
const result = resolveModelForComplexity(
makeClassification("heavy"),
{ primary: "claude-sonnet-4-6", fallbacks: [] },
config,
AVAILABLE_MODELS,
);
assert.equal(result.modelId, "claude-sonnet-4-6");
assert.equal(result.wasDowngraded, false);
});
test("downgrades from opus to haiku for light tier", () => {
const config = { ...defaultRoutingConfig(), enabled: true };
const result = resolveModelForComplexity(
makeClassification("light"),
{ primary: "claude-opus-4-6", fallbacks: [] },
config,
AVAILABLE_MODELS,
);
// Should pick haiku or gpt-4o-mini (cheapest light tier)
assert.ok(
result.modelId === "claude-haiku-4-5" || result.modelId === "gpt-4o-mini",
`Expected light-tier model, got ${result.modelId}`,
);
assert.equal(result.wasDowngraded, true);
});
test("downgrades from opus to sonnet for standard tier", () => {
const config = { ...defaultRoutingConfig(), enabled: true };
const result = resolveModelForComplexity(
makeClassification("standard"),
{ primary: "claude-opus-4-6", fallbacks: [] },
config,
AVAILABLE_MODELS,
);
assert.equal(result.modelId, "claude-sonnet-4-6");
assert.equal(result.wasDowngraded, true);
});
// ─── Explicit tier_models ────────────────────────────────────────────────────
test("uses explicit tier_models when configured", () => {
const config: DynamicRoutingConfig = {
...defaultRoutingConfig(),
enabled: true,
tier_models: { light: "gpt-4o-mini", standard: "claude-sonnet-4-6" },
};
const result = resolveModelForComplexity(
makeClassification("light"),
{ primary: "claude-opus-4-6", fallbacks: [] },
config,
AVAILABLE_MODELS,
);
assert.equal(result.modelId, "gpt-4o-mini");
assert.equal(result.wasDowngraded, true);
});
// ─── Fallback chain construction ─────────────────────────────────────────────
test("fallback chain includes configured primary as last resort", () => {
const config = { ...defaultRoutingConfig(), enabled: true };
const result = resolveModelForComplexity(
makeClassification("light"),
{ primary: "claude-opus-4-6", fallbacks: ["claude-sonnet-4-6"] },
config,
AVAILABLE_MODELS,
);
assert.ok(result.wasDowngraded);
// Fallbacks should include the configured fallbacks and primary
assert.ok(result.fallbacks.includes("claude-opus-4-6"), "primary should be in fallbacks");
assert.ok(result.fallbacks.includes("claude-sonnet-4-6"), "configured fallback should be in fallbacks");
});
// ─── Escalation ──────────────────────────────────────────────────────────────
test("escalateTier moves light → standard", () => {
assert.equal(escalateTier("light"), "standard");
});
test("escalateTier moves standard → heavy", () => {
assert.equal(escalateTier("standard"), "heavy");
});
test("escalateTier returns null for heavy (max)", () => {
assert.equal(escalateTier("heavy"), null);
});
// ─── No suitable model available ─────────────────────────────────────────────
test("falls back to configured model when no light-tier model available", () => {
const config = { ...defaultRoutingConfig(), enabled: true };
// Only heavy-tier models available
const result = resolveModelForComplexity(
makeClassification("light"),
{ primary: "claude-opus-4-6", fallbacks: [] },
config,
["claude-opus-4-6"],
);
assert.equal(result.modelId, "claude-opus-4-6");
assert.equal(result.wasDowngraded, false);
});

View file

@ -1,87 +1,240 @@
/**
* Routing History structural tests for adaptive learning module.
*
* Verifies routing-history.ts exports and structure from #579.
* Uses source-level checks to avoid @gsd/pi-coding-agent import chain.
*/
import test from "node:test";
import assert from "node:assert/strict";
import { readFileSync } from "node:fs";
import { join, dirname } from "node:path";
import { fileURLToPath } from "node:url";
import { mkdirSync, rmSync, writeFileSync, readFileSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";
const __dirname = dirname(fileURLToPath(import.meta.url));
const historySrc = readFileSync(join(__dirname, "..", "routing-history.ts"), "utf-8");
import {
initRoutingHistory,
resetRoutingHistory,
recordOutcome,
recordFeedback,
getAdaptiveTierAdjustment,
clearRoutingHistory,
getRoutingHistory,
} from "../routing-history.js";
// ═══════════════════════════════════════════════════════════════════════════
// Module Exports
// ═══════════════════════════════════════════════════════════════════════════
// ─── Test Setup ──────────────────────────────────────────────────────────────
test("routing-history: exports initRoutingHistory", () => {
assert.ok(historySrc.includes("export function initRoutingHistory"), "should export initRoutingHistory");
function makeTmpDir(): string {
const dir = join(tmpdir(), `gsd-routing-test-${Date.now()}-${Math.random().toString(36).slice(2)}`);
mkdirSync(join(dir, ".gsd"), { recursive: true });
return dir;
}
function cleanup(dir: string): void {
try { rmSync(dir, { recursive: true, force: true }); } catch {}
resetRoutingHistory();
}
// ─── recordOutcome ───────────────────────────────────────────────────────────
test("recordOutcome tracks success and failure counts", () => {
const dir = makeTmpDir();
try {
initRoutingHistory(dir);
recordOutcome("execute-task", "standard", true);
recordOutcome("execute-task", "standard", true);
recordOutcome("execute-task", "standard", false);
const history = getRoutingHistory();
assert.ok(history);
const pattern = history.patterns["execute-task"];
assert.ok(pattern);
assert.equal(pattern.standard.success, 2);
assert.equal(pattern.standard.fail, 1);
} finally {
cleanup(dir);
}
});
test("routing-history: exports recordOutcome", () => {
assert.ok(historySrc.includes("export function recordOutcome"), "should export recordOutcome");
test("recordOutcome tracks tag-specific patterns", () => {
const dir = makeTmpDir();
try {
initRoutingHistory(dir);
recordOutcome("execute-task", "light", true, ["docs"]);
const history = getRoutingHistory();
assert.ok(history);
assert.ok(history.patterns["execute-task:docs"]);
assert.equal(history.patterns["execute-task:docs"].light.success, 1);
} finally {
cleanup(dir);
}
});
test("routing-history: exports recordFeedback", () => {
assert.ok(historySrc.includes("export function recordFeedback"), "should export recordFeedback");
test("recordOutcome applies rolling window", () => {
const dir = makeTmpDir();
try {
initRoutingHistory(dir);
// Record 60 successes — should be capped to 50
for (let i = 0; i < 60; i++) {
recordOutcome("execute-task", "standard", true);
}
const history = getRoutingHistory();
assert.ok(history);
const total = history.patterns["execute-task"].standard.success +
history.patterns["execute-task"].standard.fail;
assert.ok(total <= 50, `total ${total} should be <= 50`);
} finally {
cleanup(dir);
}
});
test("routing-history: exports getAdaptiveTierAdjustment", () => {
assert.ok(historySrc.includes("export function getAdaptiveTierAdjustment"), "should export getAdaptiveTierAdjustment");
// ─── getAdaptiveTierAdjustment ───────────────────────────────────────────────
test("no adjustment when insufficient data", () => {
const dir = makeTmpDir();
try {
initRoutingHistory(dir);
recordOutcome("execute-task", "light", false);
// Only 1 data point — not enough
const adj = getAdaptiveTierAdjustment("execute-task", "light");
assert.equal(adj, null);
} finally {
cleanup(dir);
}
});
test("routing-history: exports resetRoutingHistory", () => {
assert.ok(historySrc.includes("export function resetRoutingHistory"), "should export resetRoutingHistory");
test("bumps tier when failure rate exceeds threshold", () => {
const dir = makeTmpDir();
try {
initRoutingHistory(dir);
// Record high failure rate at light tier
recordOutcome("execute-task", "light", false);
recordOutcome("execute-task", "light", false);
recordOutcome("execute-task", "light", true);
// 2/3 = 66% failure rate > 20% threshold
const adj = getAdaptiveTierAdjustment("execute-task", "light");
assert.equal(adj, "standard");
} finally {
cleanup(dir);
}
});
// ═══════════════════════════════════════════════════════════════════════════
// Design Constants
// ═══════════════════════════════════════════════════════════════════════════
test("routing-history: uses rolling window of 50 entries", () => {
assert.ok(historySrc.includes("ROLLING_WINDOW = 50"), "should use 50-entry rolling window");
test("no adjustment when success rate is high", () => {
const dir = makeTmpDir();
try {
initRoutingHistory(dir);
for (let i = 0; i < 10; i++) {
recordOutcome("execute-task", "light", true);
}
const adj = getAdaptiveTierAdjustment("execute-task", "light");
assert.equal(adj, null);
} finally {
cleanup(dir);
}
});
test("routing-history: failure threshold is 20%", () => {
assert.ok(historySrc.includes("FAILURE_THRESHOLD = 0.20"), "should use 20% failure threshold");
test("tag-specific patterns take precedence", () => {
const dir = makeTmpDir();
try {
initRoutingHistory(dir);
// Base pattern has high success rate (tagged calls also count toward base)
for (let i = 0; i < 15; i++) {
recordOutcome("execute-task", "light", true);
}
// But docs-tagged tasks fail at light
recordOutcome("execute-task", "light", false, ["docs"]);
recordOutcome("execute-task", "light", false, ["docs"]);
recordOutcome("execute-task", "light", true, ["docs"]);
// With tags, should bump (docs pattern: 1/3 success = 66% failure)
const adj = getAdaptiveTierAdjustment("execute-task", "light", ["docs"]);
assert.equal(adj, "standard");
// Without tags, should not bump (base: 16/18 success = 11% failure)
const adjBase = getAdaptiveTierAdjustment("execute-task", "light");
assert.equal(adjBase, null);
} finally {
cleanup(dir);
}
});
test("routing-history: feedback weight is 2x", () => {
assert.ok(historySrc.includes("FEEDBACK_WEIGHT = 2"), "feedback should count 2x");
// ─── recordFeedback ──────────────────────────────────────────────────────────
test("recordFeedback stores feedback entries", () => {
const dir = makeTmpDir();
try {
initRoutingHistory(dir);
recordFeedback("execute-task", "M001/S01/T01", "standard", "over");
const history = getRoutingHistory();
assert.ok(history);
assert.equal(history.feedback.length, 1);
assert.equal(history.feedback[0].rating, "over");
assert.equal(history.feedback[0].tier, "standard");
} finally {
cleanup(dir);
}
});
// ═══════════════════════════════════════════════════════════════════════════
// Type Structure
// ═══════════════════════════════════════════════════════════════════════════
test("recordFeedback 'under' increases failure count at tier", () => {
const dir = makeTmpDir();
try {
initRoutingHistory(dir);
recordFeedback("execute-task", "M001/S01/T01", "light", "under");
test("routing-history: imports ComplexityTier from types.ts", () => {
assert.ok(
historySrc.includes('from "./types.js"') && historySrc.includes("ComplexityTier"),
"should import ComplexityTier from types.ts",
);
const history = getRoutingHistory();
assert.ok(history);
// "under" adds 2 (FEEDBACK_WEIGHT) failures
assert.equal(history.patterns["execute-task"].light.fail, 2);
} finally {
cleanup(dir);
}
});
test("routing-history: defines RoutingHistoryData interface", () => {
assert.ok(historySrc.includes("interface RoutingHistoryData"), "should define RoutingHistoryData");
test("recordFeedback 'over' increases success count at lower tier", () => {
const dir = makeTmpDir();
try {
initRoutingHistory(dir);
recordFeedback("execute-task", "M001/S01/T01", "standard", "over");
const history = getRoutingHistory();
assert.ok(history);
// "over" at standard → adds 2 successes at light
assert.equal(history.patterns["execute-task"].light.success, 2);
} finally {
cleanup(dir);
}
});
test("routing-history: defines FeedbackEntry interface", () => {
assert.ok(historySrc.includes("interface FeedbackEntry"), "should define FeedbackEntry");
// ─── clearRoutingHistory ─────────────────────────────────────────────────────
test("clearRoutingHistory resets all data", () => {
const dir = makeTmpDir();
try {
initRoutingHistory(dir);
recordOutcome("execute-task", "light", true);
clearRoutingHistory(dir);
const history = getRoutingHistory();
assert.ok(history);
assert.deepEqual(history.patterns, {});
assert.deepEqual(history.feedback, []);
} finally {
cleanup(dir);
}
});
// ═══════════════════════════════════════════════════════════════════════════
// Persistence
// ═══════════════════════════════════════════════════════════════════════════
// ─── Persistence ─────────────────────────────────────────────────────────────
test("routing-history: persists to routing-history.json", () => {
assert.ok(historySrc.includes("routing-history.json"), "should persist to routing-history.json");
});
test("routing history persists to disk and reloads", () => {
const dir = makeTmpDir();
try {
initRoutingHistory(dir);
recordOutcome("execute-task", "standard", true);
recordOutcome("execute-task", "standard", true);
resetRoutingHistory();
test("routing-history: has save and load functions", () => {
assert.ok(historySrc.includes("saveHistory") || historySrc.includes("function save"), "should have save");
assert.ok(historySrc.includes("loadHistory") || historySrc.includes("function load"), "should have load");
// Reload from disk
initRoutingHistory(dir);
const history = getRoutingHistory();
assert.ok(history);
assert.equal(history.patterns["execute-task"].standard.success, 2);
} finally {
cleanup(dir);
}
});