Merge pull request #2755 from jeremymcs/feat/capability-aware-model-routing-pr

feat: capability-aware model routing (ADR-004)
This commit is contained in:
Jeremy McSpadden 2026-04-04 15:23:38 -05:00 committed by GitHub
commit af82c37041
12 changed files with 1277 additions and 224 deletions

View file

@ -1,12 +1,20 @@
# Dynamic Model Routing
*Introduced in v2.19.0. Capability scoring introduced in v2.52.0.*
Dynamic model routing automatically selects cheaper models for simple work and reserves expensive models for complex tasks. This reduces token consumption by 20-50% on capped plans without sacrificing quality where it matters.
Starting in v2.52.0, the router uses **capability-aware scoring** to select the *best fit* model for each task, not just the cheapest one in the tier.
## How It Works
Each unit dispatched by auto-mode passes through a two-stage pipeline:
**Stage 1: Complexity classification** — classifies the work into a tier (light/standard/heavy).
**Stage 2: Capability scoring** — within the eligible tier, ranks available models by how well their capabilities match the task's requirements.
The key rule: **downgrade-only semantics**. The user's configured model is always the ceiling — routing never upgrades beyond what you've configured.
| Tier | Typical Work | Default Model Level |
|------|-------------|-------------------|
@ -14,8 +22,6 @@ Each unit dispatched by auto-mode is classified into a complexity tier:
| **Standard** | Research, planning, execution, milestone completion | Sonnet-class |
| **Heavy** | Replanning, roadmap reassessment, complex execution | Opus-class |
## Enabling
Dynamic routing is off by default. Enable it in preferences:
@ -41,6 +47,7 @@ dynamic_routing:
budget_pressure: true # auto-downgrade when approaching budget ceiling (default: true)
cross_provider: true # consider models from other providers (default: true)
hooks: true # apply routing to post-unit hooks (default: true)
capability_routing: true # enable capability scoring within tier (default: true)
```
### `tier_models`
@ -70,35 +77,156 @@ When approaching the budget ceiling, the router progressively downgrades:
When enabled, the router may select models from providers other than your primary. This uses the built-in cost table to find the cheapest model at each tier. Requires the target provider to be configured.
### `capability_routing`
*Introduced in v2.52.0 (ADR-004 Phase 2)*
When enabled (default: true), the router uses capability scoring to pick the best-fit model within a tier rather than always defaulting to the cheapest. Set to `false` to revert to cheapest-in-tier behavior:
```yaml
dynamic_routing:
  enabled: true
  capability_routing: false # disable scoring, use cheapest-in-tier
```
When disabled, models within the target tier are selected by the tier-only rules (cheapest first) rather than ranked by capability score.
## Capability Profiles
Each model has a built-in **capability profile** — a 7-dimension score (0–100) representing how well it handles different task types:
| Dimension | What It Represents |
|-----------|-------------------|
| `coding` | Code generation and implementation accuracy |
| `debugging` | Diagnosing and fixing errors |
| `research` | Synthesizing information and exploring topics |
| `reasoning` | Multi-step logical reasoning |
| `speed` | Latency and throughput (inverse of capability depth) |
| `longContext` | Handling large codebases and long documents |
| `instruction` | Following structured instructions precisely |
**Built-in profiles** exist for 9 models: `claude-opus-4-6`, `claude-sonnet-4-6`, `claude-haiku-4-5`, `gpt-4o`, `gpt-4o-mini`, `gemini-2.5-pro`, `gemini-2.0-flash`, `deepseek-chat`, `o3`.
Models without a built-in profile receive **uniform scores of 50** across all dimensions. This is a cold-start policy — unknown models compete but don't have an advantage. From the user's perspective, routing behaves the same as before capability scoring was introduced for those models.
**Profiles are heuristic rankings, not benchmarks.** They represent approximate relative strengths, not verified benchmark results. Use user overrides (below) to correct them for models you know well.
## How Scoring Works
The routing pipeline within a tier:
```
classify complexity tier
filter eligible models for tier
fire before_model_select hook (optional override)
capability score eligible models
select winner (or first eligible if scoring is disabled)
```
**Scoring formula:** weighted average of capability dimensions
```
score = Σ(weight × capability) / Σ(weights)
```
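As a sketch, the formula can be applied directly to the profiles and weights from the tables in this document. The function name and type shapes below are illustrative, not the shipped API:

```typescript
// Illustrative sketch of the weighted-average scoring formula.
// Profile and weights come from the tables in this doc; names are assumptions.
type Capabilities = Record<string, number>;

function scoreModel(capabilities: Capabilities, requirements: Record<string, number>): number {
  let weightedSum = 0;
  let weightSum = 0;
  for (const [dim, weight] of Object.entries(requirements)) {
    weightedSum += weight * (capabilities[dim] ?? 50); // unknown dimensions score a neutral 50
    weightSum += weight;
  }
  return weightSum > 0 ? weightedSum / weightSum : 50; // empty requirements → neutral score
}

// execute-task weights against the built-in claude-sonnet-4-6 profile:
const sonnet: Capabilities = { coding: 85, debugging: 80, research: 75, reasoning: 80, speed: 60, longContext: 75, instruction: 85 };
const executeTask = { coding: 0.9, instruction: 0.7, speed: 0.3 };
const score = scoreModel(sonnet, executeTask);
// (0.9·85 + 0.7·85 + 0.3·60) / 1.9 ≈ 81.1
```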
**Task requirements** are dynamic — different task types weight dimensions differently:
| Unit Type | Key Dimensions |
|-----------|---------------|
| `execute-task` | coding (0.9), instruction (0.7), speed (0.3) |
| `research-*` | research (0.9), longContext (0.7), reasoning (0.5) |
| `plan-*` | reasoning (0.9), coding (0.5) |
| `replan-slice` | reasoning (0.9), debugging (0.6), coding (0.5) |
| `complete-slice`, `run-uat` | instruction (0.8), speed (0.7) |
For `execute-task`, requirements are further refined by task metadata signals:
- Tags like `docs`, `config`, `readme` → boost instruction weight
- Keywords like `concurrency`, `compatibility` → boost debugging and reasoning
- Keywords like `migration`, `architecture` → boost reasoning and coding
- Large file counts (≥6) or large estimated line counts (≥500) → boost coding and reasoning
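The tag-based refinement above can be sketched as follows. The tag pattern and weight values mirror the signals described here; the function and type names are illustrative assumptions:

```typescript
// Sketch of metadata-driven requirement refinement for execute-task units.
// Tag regex and weights mirror the docs/config signal described above.
type Requirements = Record<string, number>;

function refineExecuteTaskRequirements(base: Requirements, tags: string[]): Requirements {
  // Docs/config-style tags: boost instruction-following, lower coding weight
  if (tags.some(t => /^(docs?|config|readme)$/i.test(t))) {
    return { ...base, instruction: 0.9, coding: 0.3, speed: 0.7 };
  }
  return base; // no matching signal: keep the base vector
}

const base: Requirements = { coding: 0.9, instruction: 0.7, speed: 0.3 };
const docsTask = refineExecuteTaskRequirements(base, ["docs"]);
// docsTask now weights instruction (0.9) over coding (0.3)
```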
**Tie-breaking:** When two models score within 2 points of each other, the cheaper model wins. If costs are equal, lexicographic model ID breaks the tie (deterministic).
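A comparator implementing this ordering might look like the sketch below; the cost figures are illustrative placeholders, not the built-in cost table:

```typescript
// Sketch of the tie-break ordering: score descending, then cheaper, then model ID.
interface ScoredModel { modelId: string; score: number; costPer1K: number }

function compareScored(a: ScoredModel, b: ScoredModel): number {
  const diff = b.score - a.score;
  if (Math.abs(diff) > 2) return diff;                                // clear winner on score
  if (a.costPer1K !== b.costPer1K) return a.costPer1K - b.costPer1K;  // within 2 pts: cheaper wins
  return a.modelId.localeCompare(b.modelId);                          // deterministic final tie-break
}

const ranked = [
  { modelId: "claude-sonnet-4-6", score: 79.9, costPer1K: 0.003 },
  { modelId: "gpt-4o", score: 78.5, costPer1K: 0.0025 },
].sort(compareScored);
// the 1.4-point gap is within the 2-point window, so the cheaper gpt-4o ranks first
```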
## User Overrides
Correct built-in capability profiles for models you know well using `modelOverrides` in your models configuration:
```json
{
"providers": {
"anthropic": {
"modelOverrides": {
"claude-sonnet-4-6": {
"capabilities": {
"debugging": 90,
"research": 85
}
}
}
}
}
}
```
Overrides are **deep-merged** with built-in defaults — only the specified dimensions are overridden; others retain their built-in values.
**Use case:** You've found that a model consistently outperforms its built-in profile on specific task types. Override the relevant dimensions to steer the router toward that model for those tasks.
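At the capability level, the deep merge reduces to a shallow spread: a sketch of the expected behavior (the built-in values are the claude-sonnet-4-6 profile from this doc; the helper name is an assumption):

```typescript
// Sketch of the override merge: only specified dimensions are replaced.
type Capabilities = Record<string, number>;

function mergeProfile(builtin: Capabilities, override: Partial<Capabilities>): Capabilities {
  return { ...builtin, ...override }; // overridden dimensions win; others keep built-in values
}

const builtin: Capabilities = { coding: 85, debugging: 80, research: 75, reasoning: 80, speed: 60, longContext: 75, instruction: 85 };
const effective = mergeProfile(builtin, { debugging: 90, research: 85 });
// effective.debugging === 90, effective.coding === 85 (unchanged)
```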
## Verbose Output
When verbose mode is active, the router logs its routing decision. When capability scoring was used, the log includes a full scoring breakdown:
```
Dynamic routing [S]: claude-sonnet-4-6 (capability-scored) — claude-sonnet-4-6: 82.3, gpt-4o: 78.1, deepseek-chat: 72.0
```
When tier-only routing was used (scoring disabled, single eligible model, or routing guards applied):
```
Dynamic routing [S]: claude-sonnet-4-6 (standard complexity, multiple steps)
```
The `selectionMethod` field in the routing decision indicates which path was taken:
- `"capability-scored"` — capability scoring selected the winner
- `"tier-only"` — cheapest in tier (or explicit pin) was used
## Extension Hook
Extensions can intercept and override model selection using the `before_model_select` hook.
The hook fires **after** tier filtering (eligible models are known) and **before** capability scoring (scores have not been computed yet). A hook can override selection entirely or return `undefined` to let scoring proceed normally.
**Registering a handler:**
```typescript
pi.on("before_model_select", async (event) => {
const { unitType, unitId, classification, taskMetadata, eligibleModels, phaseConfig } = event;
// Custom routing strategy: always use gemini for research tasks
if (unitType.startsWith("research-")) {
const gemini = eligibleModels.find(id => id.includes("gemini"));
if (gemini) return { modelId: gemini };
}
// Return undefined to let capability scoring proceed
return undefined;
});
```
**Event payload:**
| Field | Type | Description |
|-------|------|-------------|
| `unitType` | `string` | The unit type being dispatched (e.g., `"execute-task"`) |
| `unitId` | `string` | Unique identifier for this unit dispatch |
| `classification` | `{ tier, reason, downgraded }` | The complexity classification result |
| `taskMetadata` | `Record<string, unknown> \| undefined` | Task metadata extracted from the unit plan |
| `eligibleModels` | `string[]` | Models eligible for the classified tier |
| `phaseConfig` | `{ primary, fallbacks } \| undefined` | The user's configured model for this phase |
**Return value:** `{ modelId: string }` to override selection, or `undefined` to defer to capability scoring.
**First-override-wins:** If multiple extensions register handlers, the first one to return a non-undefined result wins. Subsequent handlers are not called.
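First-override-wins dispatch amounts to a sequential loop over registered handlers; a simplified sketch (types are stand-ins for the real event/result interfaces):

```typescript
// Sketch of first-override-wins semantics across multiple registered handlers.
type SelectHandler = (event: { unitType: string }) => Promise<{ modelId: string } | undefined>;

async function dispatchFirstWins(
  handlers: SelectHandler[],
  event: { unitType: string },
): Promise<{ modelId: string } | undefined> {
  for (const handler of handlers) {
    const result = await handler(event);
    if (result !== undefined) return result; // stop: later handlers never run
  }
  return undefined; // no override — capability scoring proceeds
}
```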
## Complexity Classification

View file

@ -428,6 +428,8 @@ export function createExtensionRuntime(): ExtensionRuntime {
unregisterProvider: (name) => {
runtime.pendingProviderRegistrations = runtime.pendingProviderRegistrations.filter((r) => r.name !== name);
},
// Stub replaced by ExtensionRunner at construction time via bindEmitMethods().
emitBeforeModelSelect: async () => undefined,
};
return runtime;
@ -579,6 +581,10 @@ function createExtensionAPI(
runtime.unregisterProvider(name);
},
async emitBeforeModelSelect(event: Omit<import("./types.js").BeforeModelSelectEvent, "type">): Promise<import("./types.js").BeforeModelSelectResult | undefined> {
return runtime.emitBeforeModelSelect(event);
},
events: eventBus,
} as ExtensionAPI;

View file

@ -13,6 +13,8 @@ import type { SessionManager } from "../session-manager.js";
import type {
BeforeAgentStartEvent,
BeforeAgentStartEventResult,
BeforeModelSelectEvent,
BeforeModelSelectResult,
BeforeProviderRequestEvent,
CompactOptions,
ContextEvent,
@ -230,6 +232,8 @@ export class ExtensionRunner {
this.cwd = cwd;
this.sessionManager = sessionManager;
this.modelRegistry = modelRegistry;
// Bind emit methods into the shared runtime so createExtensionAPI can delegate to them.
this.runtime.emitBeforeModelSelect = (event) => this.emitBeforeModelSelect(event);
}
bindCore(actions: ExtensionActions, contextActions: ExtensionContextActions): void {
@ -694,6 +698,21 @@ export class ExtensionRunner {
return currentPayload;
}
async emitBeforeModelSelect(event: Omit<BeforeModelSelectEvent, "type">): Promise<BeforeModelSelectResult | undefined> {
let result: BeforeModelSelectResult | undefined;
await this.invokeHandlers("before_model_select", () => ({
type: "before_model_select" as const,
...event,
} satisfies BeforeModelSelectEvent), (handlerResult) => {
if (handlerResult) {
result = handlerResult as BeforeModelSelectResult;
return { done: true }; // first override wins
}
return { done: false };
});
return result;
}
async emitBeforeAgentStart(
prompt: string,
images: ImageContent[] | undefined,

View file

@ -603,6 +603,22 @@ export interface ModelSelectEvent {
source: ModelSelectSource;
}
/** Fired before model selection runs capability scoring. Extensions can override the selected model. */
export interface BeforeModelSelectEvent {
type: "before_model_select";
unitType: string;
unitId: string;
classification: { tier: string; reason: string; downgraded: boolean };
taskMetadata?: Record<string, unknown>;
eligibleModels: string[];
phaseConfig?: { primary: string; fallbacks: string[] };
}
/** Result from before_model_select event handler. Return { modelId } to override selection. */
export interface BeforeModelSelectResult {
modelId: string;
}
// ============================================================================
// User Bash Events
// ============================================================================
@ -1052,6 +1068,14 @@ export interface ExtensionAPI {
on(event: "tool_result", handler: ExtensionHandler<ToolResultEvent, ToolResultEventResult>): void;
on(event: "user_bash", handler: ExtensionHandler<UserBashEvent, UserBashEventResult>): void;
on(event: "input", handler: ExtensionHandler<InputEvent, InputEventResult>): void;
on(event: "before_model_select", handler: ExtensionHandler<BeforeModelSelectEvent, BeforeModelSelectResult>): void;
// =========================================================================
// Event Emission (for host extensions that orchestrate model selection)
// =========================================================================
/** Emit before_model_select event. Returns override model ID or undefined. */
emitBeforeModelSelect(event: Omit<BeforeModelSelectEvent, "type">): Promise<BeforeModelSelectResult | undefined>;
// =========================================================================
// Tool Registration
@ -1367,6 +1391,8 @@ export interface ExtensionRuntimeState {
*/
registerProvider: (name: string, config: ProviderConfig) => void;
unregisterProvider: (name: string) => void;
/** Emit before_model_select event to all registered handlers. Bound by ExtensionRunner. */
emitBeforeModelSelect: (event: Omit<BeforeModelSelectEvent, "type">) => Promise<BeforeModelSelectResult | undefined>;
}
/**

View file

@ -9,8 +9,8 @@ import type { ExtensionAPI, ExtensionContext } from "@gsd/pi-coding-agent";
import type { GSDPreferences } from "./preferences.js";
import { resolveModelWithFallbacksForUnit, resolveDynamicRoutingConfig } from "./preferences.js";
import type { ComplexityTier } from "./complexity-classifier.js";
import { classifyUnitComplexity, tierLabel } from "./complexity-classifier.js";
import { resolveModelForComplexity, escalateTier, getEligibleModels, loadCapabilityOverrides } from "./model-router.js";
import { getLedger, getProjectTotals } from "./metrics.js";
import { unitPhaseLabel } from "./auto-dashboard.js";
@ -107,27 +107,89 @@ export async function selectAndApplyModel(
}
}
// Load user capability overrides from preferences (D-17: deep-merged with built-in profiles)
const capabilityOverrides = loadCapabilityOverrides(
(prefs as { modelOverrides?: Record<string, { capabilities?: Record<string, number> }> } | undefined) ?? {},
);
// Fire before_model_select hook (ADR-004, D-03)
// Hook can override model selection entirely by returning { modelId }
let hookOverride: string | undefined;
if (routingConfig.hooks !== false) {
const eligible = getEligibleModels(
classification.tier,
availableModelIds,
routingConfig,
);
const hookResult = await pi.emitBeforeModelSelect({
unitType,
unitId,
classification: {
tier: classification.tier,
reason: classification.reason,
downgraded: classification.downgraded,
},
taskMetadata: classification.taskMetadata as Record<string, unknown> | undefined,
eligibleModels: eligible,
phaseConfig: modelConfig ? {
primary: modelConfig.primary,
fallbacks: modelConfig.fallbacks ?? [],
} : undefined,
});
if (hookResult?.modelId) {
hookOverride = hookResult.modelId;
}
}
let routingResult: ReturnType<typeof resolveModelForComplexity>;
if (hookOverride) {
// Hook override bypasses capability scoring entirely
routingResult = {
modelId: hookOverride,
fallbacks: [
...(modelConfig?.fallbacks ?? []).filter(f => f !== hookOverride),
...(modelConfig?.primary && modelConfig.primary !== hookOverride ? [modelConfig.primary] : []),
],
tier: classification.tier,
wasDowngraded: hookOverride !== modelConfig?.primary,
reason: `hook override: ${hookOverride}`,
selectionMethod: "tier-only",
};
} else {
routingResult = resolveModelForComplexity(
classification,
modelConfig,
routingConfig,
availableModelIds,
unitType,
classification.taskMetadata,
capabilityOverrides,
);
}
if (routingResult.wasDowngraded) {
effectiveModelConfig = {
primary: routingResult.modelId,
fallbacks: routingResult.fallbacks,
};
if (verbose) {
if (routingResult.selectionMethod === "capability-scored" && routingResult.capabilityScores) {
// Verbose scoring breakdown for capability-scored decisions (D-20)
const tierLbl = tierLabel(classification.tier);
const scores = Object.entries(routingResult.capabilityScores)
.sort(([, a], [, b]) => b - a)
.map(([id, score]) => `${id}: ${score.toFixed(1)}`)
.join(", ");
ctx.ui.notify(
`Dynamic routing [${tierLbl}]: ${routingResult.modelId} (capability-scored) — ${scores}`,
"info",
);
} else {
ctx.ui.notify(
`Dynamic routing [${tierLabel(classification.tier)}]: ${routingResult.modelId} (${classification.reason})`,
"info",
);
}
}
}
routingTierLabel = ` [${tierLabel(classification.tier)}]`;

View file

@ -322,4 +322,12 @@ export function registerHooks(pi: ExtensionAPI): void {
payload.service_tier = tier;
return payload;
});
// Capability-aware model routing hook (ADR-004)
// Extensions can override model selection by returning { modelId: "..." }
// Return undefined to let the built-in capability scoring proceed.
pi.on("before_model_select", async (_event) => {
// Default: no override — let capability scoring handle selection
return undefined;
});
}

View file

@ -16,6 +16,7 @@ export interface ClassificationResult {
tier: ComplexityTier;
reason: string;
downgraded: boolean; // true if budget pressure lowered the tier
taskMetadata?: TaskMetadata;
}
export interface TaskMetadata {
@ -71,17 +72,20 @@ export function classifyUnitComplexity(
): ClassificationResult {
// Hook units default to light
if (unitType.startsWith("hook/")) {
const result: ClassificationResult = { tier: "light", reason: "hook unit", downgraded: false, taskMetadata: undefined };
return applyBudgetPressure(result, budgetPct);
}
// Start with the default tier for this unit type
let tier = UNIT_TYPE_TIERS[unitType] ?? "standard";
let reason = `unit type: ${unitType}`;
let taskMeta: TaskMetadata | undefined;
// For execute-task, analyze task metadata for complexity signals
if (unitType === "execute-task") {
const taskAnalysis = analyzeTaskComplexity(unitId, basePath, metadata);
// Extract metadata once and reuse throughout to avoid double-extraction
taskMeta = metadata ?? extractTaskMetadata(unitId, basePath);
const taskAnalysis = analyzeTaskComplexity(unitId, basePath, taskMeta);
tier = taskAnalysis.tier;
reason = taskAnalysis.reason;
}
@ -96,14 +100,15 @@ export function classifyUnitComplexity(
}
// Adaptive learning: check if history suggests bumping the tier
// Use already-extracted taskMeta.tags if available to avoid double-extraction
const tags = taskMeta?.tags ?? metadata?.tags;
const adaptiveAdjustment = getAdaptiveTierAdjustment(unitType, tier, tags);
if (adaptiveAdjustment && tierOrdinal(adaptiveAdjustment) > tierOrdinal(tier)) {
reason = `${reason} (adaptive: high failure rate at ${tier})`;
tier = adaptiveAdjustment;
}
const result: ClassificationResult = { tier, reason, downgraded: false, taskMetadata: taskMeta };
return applyBudgetPressure(result, budgetPct);
}

View file

@ -2,7 +2,7 @@
// Maps complexity tiers to models, enforcing downgrade-only semantics.
// The user's configured model is always the ceiling.
import type { ComplexityTier, ClassificationResult, TaskMetadata } from "./complexity-classifier.js";
import { tierOrdinal } from "./complexity-classifier.js";
import type { ResolvedModelConfig } from "./preferences.js";
@ -33,14 +33,27 @@ export interface RoutingDecision {
wasDowngraded: boolean;
/** Human-readable reason for this decision */
reason: string;
/** How the model was selected */
selectionMethod: "tier-only" | "capability-scored";
/** Capability scores per eligible model (capability-scored path only) */
capabilityScores?: Record<string, number>;
/** Task requirement vector used for scoring */
taskRequirements?: Partial<Record<string, number>>;
}
// ─── Capability Profiles ─────────────────────────────────────────────────────
/** Seven-dimension capability profile for a model. All values in 0–100 range. */
export interface ModelCapabilities {
coding: number;
debugging: number;
research: number;
reasoning: number;
speed: number;
longContext: number;
instruction: number;
}
// ─── Known Model Tiers ───────────────────────────────────────────────────────
// Maps known model IDs to their capability tier. Used when tier_models is not
// explicitly configured to pick the best available model for each tier.
@ -121,33 +134,27 @@ const MODEL_COST_PER_1K_INPUT: Record<string, number> = {
"deepseek-chat": 0.00014,
};
// ─── Capability Profiles Data Table ──────────────────────────────────────────
// Per-model capability profiles (0–100 scale). Used for capability-aware
// model selection within an eligible tier set.
export const MODEL_CAPABILITY_PROFILES: Record<string, ModelCapabilities> = {
"claude-opus-4-6": { coding: 95, debugging: 90, research: 85, reasoning: 95, speed: 30, longContext: 80, instruction: 90 },
"claude-sonnet-4-6": { coding: 85, debugging: 80, research: 75, reasoning: 80, speed: 60, longContext: 75, instruction: 85 },
"claude-haiku-4-5": { coding: 60, debugging: 50, research: 45, reasoning: 50, speed: 95, longContext: 50, instruction: 75 },
"gpt-4o": { coding: 80, debugging: 75, research: 70, reasoning: 75, speed: 65, longContext: 70, instruction: 80 },
"gpt-4o-mini": { coding: 55, debugging: 45, research: 40, reasoning: 45, speed: 90, longContext: 45, instruction: 70 },
"gemini-2.5-pro": { coding: 75, debugging: 70, research: 85, reasoning: 75, speed: 55, longContext: 90, instruction: 75 },
"gemini-2.0-flash": { coding: 50, debugging: 40, research: 50, reasoning: 40, speed: 95, longContext: 60, instruction: 65 },
"deepseek-chat": { coding: 75, debugging: 65, research: 55, reasoning: 70, speed: 70, longContext: 55, instruction: 65 },
"o3": { coding: 80, debugging: 85, research: 80, reasoning: 92, speed: 25, longContext: 70, instruction: 85 },
};
// ─── Base Task Requirements Data Table ───────────────────────────────────────
// Per-unit-type base requirement vectors. Weights indicate how important each
// capability dimension is for this unit type.
export const BASE_REQUIREMENTS: Record<string, Partial<Record<keyof ModelCapabilities, number>>> = {
"execute-task": { coding: 0.9, instruction: 0.7, speed: 0.3 },
"research-milestone": { research: 0.9, longContext: 0.7, reasoning: 0.5 },
"research-slice": { research: 0.9, longContext: 0.7, reasoning: 0.5 },
@ -161,15 +168,36 @@ const BASE_REQUIREMENTS: Record<string, Partial<Record<keyof ModelCapabilities,
"complete-milestone": { instruction: 0.8, reasoning: 0.5 },
};
/**
* Score a model's suitability for a task given a requirement vector.
* Returns a weighted average of capability dimensions (0–100).
* Returns 50 if requirements are empty (neutral score).
*/
export function scoreModel(
model: ModelCapabilities,
requirements: Partial<Record<keyof ModelCapabilities, number>>,
): number {
let weightedSum = 0;
let weightSum = 0;
for (const [dim, weight] of Object.entries(requirements)) {
const capability = model[dim as keyof ModelCapabilities] ?? 50;
weightedSum += weight * capability;
weightSum += weight;
}
return weightSum > 0 ? weightedSum / weightSum : 50;
}
/**
* Compute dynamic task requirements from unit type and optional task metadata.
* Returns a requirement vector refined by task-specific signals.
*/
export function computeTaskRequirements(
unitType: string,
metadata?: TaskMetadata,
): Partial<Record<keyof ModelCapabilities, number>> {
const base = BASE_REQUIREMENTS[unitType] ?? { reasoning: 0.5 };
if (unitType === "execute-task" && metadata) {
if (metadata.tags?.some(t => /^(docs?|readme|comment|config|typo|rename)$/i.test(t))) {
return { ...base, instruction: 0.9, coding: 0.3, speed: 0.7 };
@ -184,29 +212,101 @@ export function computeTaskRequirements(
return { ...base, coding: 0.9, reasoning: 0.7 };
}
}
return base;
}
/**
* Score all eligible models against a requirement vector and return them
* sorted by score descending. Within 2 points: prefer cheaper; equal cost:
* lexicographic tie-break by model ID.
*/
export function scoreEligibleModels(
eligibleModelIds: string[],
requirements: Partial<Record<keyof ModelCapabilities, number>>,
capabilityOverrides?: Record<string, Partial<ModelCapabilities>>,
): Array<{ modelId: string; score: number }> {
const scored = eligibleModelIds.map(modelId => {
const builtin = MODEL_CAPABILITY_PROFILES[modelId];
const override = capabilityOverrides?.[modelId];
const profile: ModelCapabilities = builtin
? override ? { ...builtin, ...override } : builtin
: { coding: 50, debugging: 50, research: 50, reasoning: 50, speed: 50, longContext: 50, instruction: 50 };
return { modelId, score: scoreModel(profile, requirements) };
});
scored.sort((a, b) => {
const scoreDiff = b.score - a.score;
if (Math.abs(scoreDiff) > 2) return scoreDiff;
const costA = MODEL_COST_PER_1K_INPUT[a.modelId] ?? Infinity;
const costB = MODEL_COST_PER_1K_INPUT[b.modelId] ?? Infinity;
if (costA !== costB) return costA - costB;
return a.modelId.localeCompare(b.modelId);
});
return scored;
}
// ─── Public API ──────────────────────────────────────────────────────────────
/**
* Return all models eligible for a given tier, sorted cheapest first.
* If routingConfig.tier_models[tier] is set and available, returns only that
* model. Otherwise filters availableModelIds by tier from MODEL_CAPABILITY_TIER.
*/
export function getEligibleModels(
tier: ComplexityTier,
availableModelIds: string[],
routingConfig: DynamicRoutingConfig,
): string[] {
// 1. Check explicit tier_models config
const explicitModel = routingConfig.tier_models?.[tier];
if (explicitModel) {
// Exact match
if (availableModelIds.includes(explicitModel)) return [explicitModel];
// Provider-prefix-stripped match
const match = availableModelIds.find(id => {
const bareAvail = id.includes("/") ? id.split("/").pop()! : id;
const bareExplicit = explicitModel.includes("/") ? explicitModel.split("/").pop()! : explicitModel;
return bareAvail === bareExplicit;
});
if (match) return [match];
}
// 2. Auto-detect: filter by tier, sort cheapest first
return availableModelIds
.filter(id => getModelTier(id) === tier)
.sort((a, b) => {
const costA = getModelCost(a);
const costB = getModelCost(b);
return costA - costB;
});
}
/**
* Build the fallback chain for a selected model: the configured fallbacks
* followed by the configured primary, with the selected model filtered out.
*/
function buildFallbackChain(selectedModelId: string, phaseConfig: ResolvedModelConfig): string[] {
return [
...phaseConfig.fallbacks.filter(f => f !== selectedModelId),
phaseConfig.primary,
].filter(f => f !== selectedModelId);
}
/**
* Load capability overrides from user preferences' modelOverrides section.
* Returns a map of model ID partial capability overrides to deep-merge with built-in profiles.
*
* Per D-17: partial capability overrides via models.json modelOverrides, deep-merged with defaults.
*/
export function loadCapabilityOverrides(
prefs: { modelOverrides?: Record<string, { capabilities?: Partial<ModelCapabilities> }> },
): Record<string, Partial<ModelCapabilities>> {
const result: Record<string, Partial<ModelCapabilities>> = {};
if (!prefs.modelOverrides) return result;
for (const [modelId, overrideEntry] of Object.entries(prefs.modelOverrides)) {
if (overrideEntry.capabilities) {
result[modelId] = overrideEntry.capabilities;
}
}
return result;
}
/**
* Resolve the model to use for a given complexity tier.
@ -214,10 +314,18 @@ export function scoreModel(
* Downgrade-only: the returned model is always equal to or cheaper than
* the user's configured primary model. Never upgrades beyond configuration.
*
* @param classification The complexity classification result
* @param phaseConfig The user's configured model for this phase (ceiling)
* @param routingConfig Dynamic routing configuration
* @param availableModelIds List of available model IDs (from registry)
* STEP 1: Filter to eligible models for the requested tier.
* STEP 2: Capability scoring ranks eligible models by task-capability match
* when capability_routing is enabled and multiple eligible models exist.
* STEP 3: Fallback chain assembly.
*
* @param classification The complexity classification result
* @param phaseConfig The user's configured model for this phase (ceiling)
* @param routingConfig Dynamic routing configuration
* @param availableModelIds List of available model IDs (from registry)
* @param unitType The unit type for capability requirement computation (optional)
* @param taskMetadata Task metadata for refined requirement vectors (optional)
* @param capabilityOverrides User-provided capability overrides (deep-merged with built-in profiles, optional)
*/
export function resolveModelForComplexity(
classification: ClassificationResult,
@@ -225,7 +333,8 @@ export function resolveModelForComplexity(
routingConfig: DynamicRoutingConfig,
availableModelIds: string[],
unitType?: string,
metadata?: { tags?: string[]; complexityKeywords?: string[]; fileCount?: number; estimatedLines?: number },
taskMetadata?: TaskMetadata,
capabilityOverrides?: Record<string, Partial<ModelCapabilities>>,
): RoutingDecision {
// If no phase config or routing disabled, pass through
if (!phaseConfig || !routingConfig.enabled) {
@@ -235,6 +344,7 @@
tier: classification.tier,
wasDowngraded: false,
reason: "dynamic routing disabled or no phase config",
selectionMethod: "tier-only",
};
}
@@ -254,6 +364,7 @@
tier: requestedTier,
wasDowngraded: false,
reason: `configured model "${configuredPrimary}" is not in the known tier map — honoring explicit config`,
selectionMethod: "tier-only",
};
}
@@ -265,48 +376,52 @@
tier: requestedTier,
wasDowngraded: false,
reason: `tier ${requestedTier} >= configured ${configuredTier}`,
selectionMethod: "tier-only",
};
}
// Find the best model for the requested tier
const useCapabilityScoring = routingConfig.capability_routing && unitType;
// STEP 1: Get all eligible models for the requested tier
const eligible = getEligibleModels(requestedTier, availableModelIds, routingConfig);
let targetModelId: string | null;
let capabilityScores: Record<string, number> | undefined;
let taskRequirements: Partial<Record<string, number>> | undefined;
let selectionMethod: "tier-only" | "capability-scored" = "tier-only";
if (useCapabilityScoring) {
const result = findModelForTierWithCapability(
requestedTier, routingConfig, availableModelIds,
routingConfig.cross_provider !== false, unitType, metadata,
);
targetModelId = result.modelId;
capabilityScores = Object.keys(result.scores).length > 0 ? result.scores : undefined;
taskRequirements = Object.keys(result.requirements).length > 0 ? result.requirements : undefined;
selectionMethod = capabilityScores ? "capability-scored" : "tier-only";
} else {
targetModelId = findModelForTier(
requestedTier, routingConfig, availableModelIds,
routingConfig.cross_provider !== false,
);
}
if (!targetModelId) {
if (eligible.length === 0) {
// No suitable model found — use configured primary
return {
modelId: configuredPrimary,
fallbacks: phaseConfig.fallbacks,
tier: requestedTier,
wasDowngraded: false,
reason: `no ${requestedTier}-tier model available`,
selectionMethod,
selectionMethod: "tier-only",
};
}
const fallbacks = [
...phaseConfig.fallbacks.filter(f => f !== targetModelId),
configuredPrimary,
].filter(f => f !== targetModelId);
// STEP 2: Capability scoring (when enabled and multiple eligible models exist)
if (routingConfig.capability_routing !== false && eligible.length > 1 && unitType) {
const requirements = computeTaskRequirements(unitType, taskMetadata);
const scored = scoreEligibleModels(eligible, requirements, capabilityOverrides);
const winner = scored[0];
if (winner) {
const capScores: Record<string, number> = {};
for (const s of scored) capScores[s.modelId] = s.score;
const fallbacks = buildFallbackChain(winner.modelId, phaseConfig);
return {
modelId: winner.modelId,
fallbacks,
tier: requestedTier,
wasDowngraded: true,
reason: `capability-scored: ${winner.modelId} (${winner.score.toFixed(1)}) for ${unitType}`,
capabilityScores: capScores,
taskRequirements: requirements,
selectionMethod: "capability-scored",
};
}
}
// STEP 3: Fallback — use first eligible model (cheapest in tier, or single eligible)
const targetModelId = eligible[0];
// Build fallback chain: [downgraded_model, ...configured_fallbacks, configured_primary]
const fallbacks = buildFallbackChain(targetModelId, phaseConfig);
return {
modelId: targetModelId,
@@ -314,9 +429,7 @@ export function resolveModelForComplexity(
tier: requestedTier,
wasDowngraded: true,
reason: classification.reason,
selectionMethod,
capabilityScores,
taskRequirements,
selectionMethod: "tier-only",
};
}
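The three-step shape of the resolver above can be condensed into a sketch (names and the `Decision` type are hypothetical simplifications, not the module's API):

```typescript
interface Decision { modelId: string; method: "tier-only" | "capability-scored"; }

function resolve(
  eligible: string[],                      // STEP 1 output: tier-filtered, cheapest first
  scores: Map<string, number> | null,      // STEP 2 input: capability scores, if scoring is enabled
  configuredPrimary: string,               // the user's ceiling
): Decision {
  // no candidate in the tier: keep the configured primary
  if (eligible.length === 0) return { modelId: configuredPrimary, method: "tier-only" };
  // scoring only decides anything when there is a real choice to make
  if (scores && eligible.length > 1) {
    const best = [...eligible].sort((a, b) => (scores.get(b) ?? 50) - (scores.get(a) ?? 50))[0];
    return { modelId: best, method: "capability-scored" };
  }
  // STEP 3 fallback: first eligible model (cheapest in tier, or the single candidate)
  return { modelId: eligible[0], method: "tier-only" };
}
```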
@@ -338,7 +451,7 @@ export function escalateTier(currentTier: ComplexityTier): ComplexityTier | null
export function defaultRoutingConfig(): DynamicRoutingConfig {
return {
enabled: true,
capability_routing: false,
capability_routing: true,
escalate_on_failure: true,
budget_pressure: true,
cross_provider: true,
@@ -360,8 +473,8 @@ function getModelTier(modelId: string): ComplexityTier {
if (bareId.includes(knownId) || knownId.includes(bareId)) return tier;
}
// Unknown models are assumed heavy (safest assumption)
return "heavy";
// Unknown models are assumed standard (per D-15: avoids silently ignoring user config)
return "standard";
}
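The bidirectional substring match and the D-15 default can be sketched standalone (the tier map entries here are hypothetical stand-ins for the real table):

```typescript
type Tier = "light" | "standard" | "heavy";

const TIER_MAP: Record<string, Tier> = {
  "haiku": "light", "sonnet": "standard", "opus": "heavy",
};

function tierOf(modelId: string): Tier {
  // strip any provider prefix ("anthropic/claude-..." → "claude-...")
  const bare = modelId.includes("/") ? modelId.split("/").pop()! : modelId;
  for (const [known, tier] of Object.entries(TIER_MAP)) {
    // matches in either direction, so aliases and versioned IDs both resolve
    if (bare.includes(known) || known.includes(bare)) return tier;
  }
  return "standard"; // D-15: unknown models default to standard, not heavy
}

console.log(tierOf("anthropic/claude-haiku-4-5")); // "light"
console.log(tierOf("mystery-model"));              // "standard"
```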
/** Check if a model ID has a known capability tier mapping. (#2192) */
@@ -374,93 +487,6 @@ function isKnownModel(modelId: string): boolean {
return false;
}
function findModelForTier(
tier: ComplexityTier,
config: DynamicRoutingConfig,
availableModelIds: string[],
crossProvider: boolean,
): string | null {
// 1. Check explicit tier_models config
const explicitModel = config.tier_models?.[tier];
if (explicitModel && availableModelIds.includes(explicitModel)) {
return explicitModel;
}
// Also check with provider prefix stripped
if (explicitModel) {
const match = availableModelIds.find(id => {
const bareAvail = id.includes("/") ? id.split("/").pop()! : id;
const bareExplicit = explicitModel.includes("/") ? explicitModel.split("/").pop()! : explicitModel;
return bareAvail === bareExplicit;
});
if (match) return match;
}
// 2. Auto-detect: find the cheapest available model in the requested tier
const candidates = availableModelIds
.filter(id => {
const modelTier = getModelTier(id);
return modelTier === tier;
})
.sort((a, b) => {
if (!crossProvider) return 0;
const costA = getModelCost(a);
const costB = getModelCost(b);
return costA - costB;
});
return candidates[0] ?? null;
}
function findModelForTierWithCapability(
tier: ComplexityTier,
config: DynamicRoutingConfig,
availableModelIds: string[],
crossProvider: boolean,
unitType: string,
metadata?: { tags?: string[]; complexityKeywords?: string[]; fileCount?: number; estimatedLines?: number },
): { modelId: string | null; scores: Record<string, number>; requirements: Partial<Record<string, number>> } {
const explicitModel = config.tier_models?.[tier];
if (explicitModel) {
const match = availableModelIds.find(id => {
const bareAvail = id.includes("/") ? id.split("/").pop()! : id;
const bareExplicit = explicitModel.includes("/") ? explicitModel.split("/").pop()! : explicitModel;
return bareAvail === bareExplicit || id === explicitModel;
});
if (match) return { modelId: match, scores: {}, requirements: {} };
}
const requirements = computeTaskRequirements(unitType, metadata);
const candidates = availableModelIds.filter(id => getModelTier(id) === tier);
if (candidates.length === 0) return { modelId: null, scores: {}, requirements };
const scores: Record<string, number> = {};
for (const id of candidates) {
const bareId = id.includes("/") ? id.split("/").pop()! : id;
const profile = getModelProfile(bareId);
scores[id] = scoreModel(profile, requirements);
}
candidates.sort((a, b) => {
const scoreDiff = scores[b] - scores[a];
if (Math.abs(scoreDiff) > 2) return scoreDiff;
if (crossProvider) {
const costDiff = getModelCost(a) - getModelCost(b);
if (costDiff !== 0) return costDiff;
}
return a.localeCompare(b);
});
return { modelId: candidates[0], scores, requirements };
}
function getModelProfile(bareId: string): ModelCapabilities {
if (MODEL_CAPABILITY_PROFILES[bareId]) return MODEL_CAPABILITY_PROFILES[bareId];
for (const [knownId, profile] of Object.entries(MODEL_CAPABILITY_PROFILES)) {
if (bareId.includes(knownId) || knownId.includes(bareId)) return profile;
}
return { coding: 50, debugging: 50, research: 50, reasoning: 50, speed: 50, longContext: 50, instruction: 50 };
}
function getModelCost(modelId: string): number {
const bareId = modelId.includes("/") ? modelId.split("/").pop()! : modelId;


@@ -0,0 +1,347 @@
// GSD Extension — Capability-Aware Router Tests
// Tests for new capability scoring functions and data tables (Plan 01-01)
import { describe, test } from "node:test";
import assert from "node:assert/strict";
import {
scoreModel,
computeTaskRequirements,
scoreEligibleModels,
getEligibleModels,
resolveModelForComplexity,
MODEL_CAPABILITY_PROFILES,
BASE_REQUIREMENTS,
defaultRoutingConfig,
} from "../model-router.js";
import type { ModelCapabilities, DynamicRoutingConfig, RoutingDecision } from "../model-router.js";
// ─── scoreModel ──────────────────────────────────────────────────────────────
describe("scoreModel", () => {
const sonnetProfile: ModelCapabilities = {
coding: 85, debugging: 80, research: 75, reasoning: 80,
speed: 60, longContext: 75, instruction: 85,
};
test("produces correct weighted average for single dimension", () => {
// Only coding weight 1.0 → result should be the coding score
const score = scoreModel(sonnetProfile, { coding: 1.0 });
assert.equal(score, 85);
});
test("produces correct weighted average for two dimensions (coding 0.9, instruction 0.7)", () => {
// (0.9*85 + 0.7*85) / (0.9+0.7) = (76.5+59.5)/1.6 = 136/1.6 = 85.0
const score = scoreModel(sonnetProfile, { coding: 0.9, instruction: 0.7 });
assert.ok(Math.abs(score - 85.0) < 0.01, `Expected ~85.0, got ${score}`);
});
test("returns 50 when requirements is empty", () => {
const score = scoreModel(sonnetProfile, {});
assert.equal(score, 50);
});
test("uses 50 as fallback for unknown dimension in requirements", () => {
// 'unknown' dimension not in profile → treated as 50
const score = scoreModel(sonnetProfile, { coding: 0.5, unknown: 1.0 } as any);
// (0.5*85 + 1.0*50) / (0.5+1.0) = (42.5+50)/1.5 = 92.5/1.5 = 61.67
assert.ok(score > 61 && score < 62, `Expected ~61.67, got ${score}`);
});
});
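The weighted-average formula these tests exercise can be reproduced in a few lines. This is a sketch consistent with the assertions above (50 as the neutral fallback for unknown dimensions and for empty requirements), not the module's actual implementation:

```typescript
type Dims = Record<string, number>;

function score(profile: Dims, requirements: Dims): number {
  let num = 0, den = 0;
  for (const [dim, weight] of Object.entries(requirements)) {
    num += weight * (profile[dim] ?? 50); // unknown dimensions fall back to 50
    den += weight;
  }
  return den === 0 ? 50 : num / den; // empty requirements → neutral 50
}

const sonnet = { coding: 85, instruction: 85 };
console.log(score(sonnet, { coding: 0.9, instruction: 0.7 })); // ~85
console.log(score(sonnet, {}));                                // 50
```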
// ─── computeTaskRequirements ─────────────────────────────────────────────────
describe("computeTaskRequirements", () => {
test("execute-task with no metadata returns base requirements", () => {
const req = computeTaskRequirements("execute-task", undefined);
assert.deepStrictEqual(req, { coding: 0.9, instruction: 0.7, speed: 0.3 });
});
test("execute-task with docs tag returns docs-adjusted requirements", () => {
const req = computeTaskRequirements("execute-task", { tags: ["docs"] });
assert.equal(req.instruction, 0.9);
assert.equal(req.coding, 0.3);
assert.equal(req.speed, 0.7);
});
test("execute-task with readme tag returns docs-adjusted requirements", () => {
const req = computeTaskRequirements("execute-task", { tags: ["readme"] });
assert.equal(req.instruction, 0.9);
});
test("execute-task with concurrency keyword boosts debugging and reasoning", () => {
const req = computeTaskRequirements("execute-task", { complexityKeywords: ["concurrency"] });
assert.equal(req.debugging, 0.9);
assert.equal(req.reasoning, 0.8);
});
test("execute-task with compatibility keyword boosts debugging and reasoning", () => {
const req = computeTaskRequirements("execute-task", { complexityKeywords: ["compatibility"] });
assert.equal(req.debugging, 0.9);
assert.equal(req.reasoning, 0.8);
});
test("execute-task with migration keyword boosts reasoning and coding", () => {
const req = computeTaskRequirements("execute-task", { complexityKeywords: ["migration"] });
assert.equal(req.reasoning, 0.9);
assert.equal(req.coding, 0.8);
});
test("execute-task with architecture keyword boosts reasoning and coding", () => {
const req = computeTaskRequirements("execute-task", { complexityKeywords: ["architecture"] });
assert.equal(req.reasoning, 0.9);
assert.equal(req.coding, 0.8);
});
test("execute-task with fileCount >= 6 boosts coding and reasoning", () => {
const req = computeTaskRequirements("execute-task", { fileCount: 8 });
assert.equal(req.coding, 0.9);
assert.equal(req.reasoning, 0.7);
});
test("execute-task with fileCount exactly 6 triggers large-file boost", () => {
const req = computeTaskRequirements("execute-task", { fileCount: 6 });
assert.equal(req.coding, 0.9);
assert.equal(req.reasoning, 0.7);
});
test("execute-task with estimatedLines >= 500 boosts coding and reasoning", () => {
const req = computeTaskRequirements("execute-task", { estimatedLines: 500 });
assert.equal(req.coding, 0.9);
assert.equal(req.reasoning, 0.7);
});
test("research-milestone with no metadata returns base requirements", () => {
const req = computeTaskRequirements("research-milestone", undefined);
assert.deepStrictEqual(req, { research: 0.9, longContext: 0.7, reasoning: 0.5 });
});
test("unknown unit type returns default reasoning requirement", () => {
const req = computeTaskRequirements("unknown-type", undefined);
assert.deepStrictEqual(req, { reasoning: 0.5 });
});
});
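The base-vector-plus-refinement flow these tests pin down can be sketched as follows. The base table and the exact adjustment mechanics are assumptions inferred from the assertions; only `execute-task` and the docs-style tag path are shown:

```typescript
type Req = Record<string, number>;

// hypothetical base table; values match the assertions above
const BASE: Record<string, Req> = {
  "execute-task": { coding: 0.9, instruction: 0.7, speed: 0.3 },
};

function requirements(unitType: string, tags: string[] = []): Req {
  const base = { ...(BASE[unitType] ?? { reasoning: 0.5 }) };
  // docs-style tasks need instruction-following and speed more than raw coding
  if (unitType === "execute-task" && tags.some(t => ["docs", "readme", "config"].includes(t))) {
    return { ...base, instruction: 0.9, coding: 0.3, speed: 0.7 };
  }
  return base;
}

console.log(requirements("execute-task", ["docs"]));
// { coding: 0.3, instruction: 0.9, speed: 0.7 }
```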
// ─── MODEL_CAPABILITY_PROFILES ───────────────────────────────────────────────
describe("MODEL_CAPABILITY_PROFILES", () => {
test("contains all 9 required models", () => {
const required = [
"claude-opus-4-6", "claude-sonnet-4-6", "claude-haiku-4-5",
"gpt-4o", "gpt-4o-mini", "gemini-2.5-pro", "gemini-2.0-flash",
"deepseek-chat", "o3",
];
for (const model of required) {
assert.ok(MODEL_CAPABILITY_PROFILES[model], `Missing profile for ${model}`);
}
});
test("each profile has all 7 capability dimensions", () => {
const dims: Array<keyof ModelCapabilities> = [
"coding", "debugging", "research", "reasoning",
"speed", "longContext", "instruction",
];
for (const [modelId, profile] of Object.entries(MODEL_CAPABILITY_PROFILES)) {
for (const dim of dims) {
assert.ok(profile[dim] !== undefined, `${modelId} missing dimension ${dim}`);
assert.ok(profile[dim] >= 0 && profile[dim] <= 100, `${modelId}.${dim} out of range`);
}
}
});
test("claude-opus-4-6 has high reasoning and coding", () => {
const opus = MODEL_CAPABILITY_PROFILES["claude-opus-4-6"];
assert.ok(opus.reasoning >= 90, `Expected reasoning >= 90, got ${opus.reasoning}`);
assert.ok(opus.coding >= 90, `Expected coding >= 90, got ${opus.coding}`);
});
test("claude-haiku-4-5 has high speed but lower reasoning", () => {
const haiku = MODEL_CAPABILITY_PROFILES["claude-haiku-4-5"];
assert.ok(haiku.speed >= 90, `Expected speed >= 90, got ${haiku.speed}`);
assert.ok(haiku.reasoning < 70, `Expected reasoning < 70, got ${haiku.reasoning}`);
});
});
// ─── BASE_REQUIREMENTS ───────────────────────────────────────────────────────
describe("BASE_REQUIREMENTS", () => {
test("contains all 11 unit types", () => {
const required = [
"execute-task", "research-milestone", "research-slice",
"plan-milestone", "plan-slice", "replan-slice",
"reassess-roadmap", "complete-slice", "run-uat",
"discuss-milestone", "complete-milestone",
];
for (const unitType of required) {
assert.ok(BASE_REQUIREMENTS[unitType], `Missing requirements for ${unitType}`);
}
});
});
// ─── scoreEligibleModels ─────────────────────────────────────────────────────
describe("scoreEligibleModels", () => {
test("returns array sorted by score descending", () => {
const requirements = { research: 0.9, longContext: 0.7, reasoning: 0.5 };
const results = scoreEligibleModels(["claude-sonnet-4-6", "gpt-4o"], requirements);
assert.ok(results.length === 2);
assert.ok(results[0].score >= results[1].score, "Should be sorted descending by score");
});
test("returns single model when only one eligible", () => {
const requirements = { coding: 0.9 };
const results = scoreEligibleModels(["claude-sonnet-4-6"], requirements);
assert.equal(results.length, 1);
assert.equal(results[0].modelId, "claude-sonnet-4-6");
});
test("models without profiles get uniform 50s score", () => {
const requirements = { coding: 1.0 };
const results = scoreEligibleModels(["unknown-model-xyz"], requirements);
assert.equal(results[0].score, 50);
});
test("ranks by score when scores differ by more than 2 points", () => {
  // gpt-4o-mini speed=90 vs gemini-2.0-flash speed=95: a 5-point gap,
  // outside the 2-point near-tie band, so the higher score wins outright
  const requirements = { speed: 1.0 };
  const results = scoreEligibleModels(["gpt-4o-mini", "gemini-2.0-flash"], requirements);
  assert.equal(results[0].modelId, "gemini-2.0-flash");
});
test("tie-breaks by lexicographic model ID when cost and score are equal", () => {
// Use models without cost entries — both get Infinity cost
const requirements = { coding: 1.0 };
const results = scoreEligibleModels(["model-z", "model-a"], requirements);
// Both unknown → score=50, cost=Infinity → tiebreak by ID
assert.equal(results[0].modelId, "model-a");
});
test("scoreEligibleModels respects capabilityOverrides", () => {
const requirements = { coding: 1.0 };
// Override claude-sonnet-4-6's coding to 30 (worse)
const results = scoreEligibleModels(
["claude-sonnet-4-6", "gpt-4o"],
requirements,
{ "claude-sonnet-4-6": { coding: 30 } },
);
// gpt-4o coding=80 should beat overridden sonnet coding=30
assert.equal(results[0].modelId, "gpt-4o");
});
});
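The ranking order these tests rely on (score first, cost as a tie-break inside a 2-point near-tie band, then model ID) can be written as a comparator. A sketch under those assumptions, not the module's code:

```typescript
interface Scored { modelId: string; score: number; cost: number; }

function rank(models: Scored[]): Scored[] {
  return [...models].sort((a, b) => {
    const diff = b.score - a.score;
    if (Math.abs(diff) > 2) return diff;           // clear winner by capability
    if (a.cost !== b.cost) return a.cost - b.cost; // near-tie: prefer cheaper
    return a.modelId.localeCompare(b.modelId);     // stable final tie-break
  });
}

const ranked = rank([
  { modelId: "model-z", score: 50, cost: Infinity },
  { modelId: "model-a", score: 50, cost: Infinity },
]);
console.log(ranked[0].modelId); // "model-a"
```

Note the explicit `a.cost !== b.cost` guard: with two `Infinity` costs, `a.cost - b.cost` would be `NaN` and corrupt the sort.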
// ─── getEligibleModels ───────────────────────────────────────────────────────
describe("getEligibleModels", () => {
const MODELS = [
"claude-opus-4-6", // heavy
"claude-sonnet-4-6", // standard
"claude-haiku-4-5", // light
"gpt-4o-mini", // light
];
test("returns light-tier models sorted by cost when no explicit config", () => {
const config: DynamicRoutingConfig = defaultRoutingConfig();
const result = getEligibleModels("light", MODELS, config);
assert.ok(result.length >= 1);
// All results should be light-tier
for (const id of result) {
assert.ok(
["claude-haiku-4-5", "gpt-4o-mini"].includes(id),
`Expected light-tier model, got ${id}`,
);
}
});
test("returns explicit tier_models when configured and available", () => {
const config: DynamicRoutingConfig = {
...defaultRoutingConfig(),
tier_models: { light: "gpt-4o-mini" },
};
const result = getEligibleModels("light", MODELS, config);
assert.deepStrictEqual(result, ["gpt-4o-mini"]);
});
test("returns empty array when no eligible models for tier", () => {
const config: DynamicRoutingConfig = defaultRoutingConfig();
// Only heavy model available, requesting light
const result = getEligibleModels("light", ["claude-opus-4-6"], config);
assert.equal(result.length, 0);
});
});
// ─── DynamicRoutingConfig extension ─────────────────────────────────────────
describe("DynamicRoutingConfig.capability_routing", () => {
test("defaultRoutingConfig includes capability_routing: true", () => {
const config = defaultRoutingConfig();
assert.equal(config.capability_routing, true);
});
});
// ─── RoutingDecision.selectionMethod ─────────────────────────────────────────
describe("RoutingDecision.selectionMethod", () => {
const MODELS = ["claude-opus-4-6", "claude-sonnet-4-6", "claude-haiku-4-5", "gpt-4o-mini"];
function makeClassification(tier: "light" | "standard" | "heavy") {
return { tier, reason: "test", downgraded: false };
}
test("returns selectionMethod: tier-only when routing is disabled", () => {
const config = { ...defaultRoutingConfig(), enabled: false };
const result: RoutingDecision = resolveModelForComplexity(
makeClassification("light"),
{ primary: "claude-opus-4-6", fallbacks: [] },
config,
MODELS,
);
assert.equal(result.selectionMethod, "tier-only");
});
test("returns selectionMethod: tier-only for no phase config passthrough", () => {
const config = { ...defaultRoutingConfig(), enabled: true };
const result: RoutingDecision = resolveModelForComplexity(
makeClassification("light"),
undefined,
config,
MODELS,
);
assert.equal(result.selectionMethod, "tier-only");
});
test("returns selectionMethod: tier-only for unknown model passthrough", () => {
const config = { ...defaultRoutingConfig(), enabled: true };
const result: RoutingDecision = resolveModelForComplexity(
makeClassification("light"),
{ primary: "custom-provider/my-model-v3", fallbacks: [] },
config,
["custom-provider/my-model-v3", ...MODELS],
);
assert.equal(result.selectionMethod, "tier-only");
});
test("returns selectionMethod: tier-only for no-downgrade passthrough", () => {
const config = { ...defaultRoutingConfig(), enabled: true };
const result: RoutingDecision = resolveModelForComplexity(
makeClassification("heavy"),
{ primary: "claude-opus-4-6", fallbacks: [] },
config,
MODELS,
);
assert.equal(result.selectionMethod, "tier-only");
});
test("returns selectionMethod: tier-only when downgraded", () => {
const config = { ...defaultRoutingConfig(), enabled: true };
const result: RoutingDecision = resolveModelForComplexity(
makeClassification("light"),
{ primary: "claude-opus-4-6", fallbacks: [] },
config,
MODELS,
);
assert.equal(result.selectionMethod, "tier-only");
});
});


@@ -1,7 +1,7 @@
import test from "node:test";
import test, { describe } from "node:test";
import assert from "node:assert/strict";
import { classifyUnitComplexity, tierLabel, tierOrdinal } from "../complexity-classifier.js";
import { classifyUnitComplexity, tierLabel, tierOrdinal, extractTaskMetadata } from "../complexity-classifier.js";
import type { ComplexityTier, TaskMetadata } from "../complexity-classifier.js";
// ─── tierLabel ───────────────────────────────────────────────────────────────
@@ -179,3 +179,28 @@ test("execute-task with few code blocks stays standard", () => {
const result = classifyUnitComplexity("execute-task", "M001/S01/T01", "/tmp/fake", undefined, metadata);
assert.equal(result.tier, "standard");
});
// ─── ClassificationResult taskMetadata passthrough ───────────────────────────
describe("ClassificationResult taskMetadata", () => {
test("classifyUnitComplexity for execute-task returns result with taskMetadata populated", () => {
const metadata: TaskMetadata = { fileCount: 3, tags: ["docs"] };
const result = classifyUnitComplexity("execute-task", "M001/S01/T01", "/tmp/fake", undefined, metadata);
assert.ok(result.taskMetadata !== undefined, "taskMetadata should be populated for execute-task");
assert.equal(result.taskMetadata!.tags?.[0], "docs");
});
test("classifyUnitComplexity for hook/xyz returns result with taskMetadata undefined", () => {
const result = classifyUnitComplexity("hook/verify", "M001/S01/T01", "/tmp/fake");
assert.equal(result.taskMetadata, undefined, "taskMetadata should be undefined for hook units");
});
test("classifyUnitComplexity for plan-slice returns result with taskMetadata undefined", () => {
const result = classifyUnitComplexity("plan-slice", "M001/S01", "/tmp/fake");
assert.equal(result.taskMetadata, undefined, "taskMetadata should be undefined for plan-slice");
});
test("extractTaskMetadata is importable as a named export and is a function", () => {
assert.equal(typeof extractTaskMetadata, "function", "extractTaskMetadata should be a callable function");
});
});


@@ -1,4 +1,4 @@
import test from "node:test";
import test, { describe } from "node:test";
import assert from "node:assert/strict";
import {
@@ -7,6 +7,8 @@ import {
defaultRoutingConfig,
scoreModel,
computeTaskRequirements,
scoreEligibleModels,
getEligibleModels,
MODEL_CAPABILITY_PROFILES,
} from "../model-router.js";
import type { DynamicRoutingConfig, RoutingDecision, ModelCapabilities } from "../model-router.js";
@@ -211,9 +213,9 @@ test("#2192: known model is still downgraded normally", () => {
// ─── Capability Scoring (ADR-004 Phase 2) ───────────────────────────────────
test("defaultRoutingConfig includes capability_routing: false", () => {
test("defaultRoutingConfig includes capability_routing: true", () => {
const config = defaultRoutingConfig();
assert.equal(config.capability_routing, false);
assert.equal(config.capability_routing, true);
});
test("scoreModel computes weighted average of capability × requirement", () => {
@@ -356,3 +358,401 @@ test("#2885: heavy openai-codex model downgrades to light for light task", () =>
// Should pick a light-tier model
assert.notEqual(result.modelId, "gpt-5.4", "should not use the heavy model for light task");
});
// ─── scoreModel ──────────────────────────────────────────────────────────────
describe("scoreModel", () => {
const sonnetProfile: ModelCapabilities = MODEL_CAPABILITY_PROFILES["claude-sonnet-4-6"]!;
test("produces correct weighted average for two dimensions (coding:0.9, instruction:0.7)", () => {
// (0.9*85 + 0.7*85) / (0.9+0.7) = (76.5+59.5)/1.6 = 136/1.6 = 85.0
const score = scoreModel(sonnetProfile, { coding: 0.9, instruction: 0.7 });
assert.ok(Math.abs(score - 85.0) < 0.01, `Expected ~85.0, got ${score}`);
});
test("returns 50 when requirements is empty", () => {
const score = scoreModel(sonnetProfile, {});
assert.equal(score, 50);
});
test("returns correct score for single dimension coding:1.0", () => {
// a single weight-1.0 dimension returns that dimension's raw value (opus coding=95)
const opusProfile = MODEL_CAPABILITY_PROFILES["claude-opus-4-6"]!;
const score = scoreModel(opusProfile, { coding: 1.0 });
assert.equal(score, 95);
});
test("handles all 7 dimensions correctly", () => {
// Uniform weight 1.0 on every dim → average of all dim values
const profile: ModelCapabilities = {
coding: 60, debugging: 60, research: 60, reasoning: 60,
speed: 60, longContext: 60, instruction: 60,
};
const reqs: Partial<Record<keyof ModelCapabilities, number>> = {
coding: 1.0, debugging: 1.0, research: 1.0, reasoning: 1.0,
speed: 1.0, longContext: 1.0, instruction: 1.0,
};
const score = scoreModel(profile, reqs);
assert.equal(score, 60);
});
});
// ─── computeTaskRequirements ─────────────────────────────────────────────────
describe("computeTaskRequirements", () => {
test("execute-task with no metadata returns base vector", () => {
const req = computeTaskRequirements("execute-task", undefined);
assert.deepStrictEqual(req, { coding: 0.9, instruction: 0.7, speed: 0.3 });
});
test("execute-task with tags:['docs'] adjusts requirements", () => {
const req = computeTaskRequirements("execute-task", { tags: ["docs"] });
assert.equal(req.instruction, 0.9);
assert.equal(req.coding, 0.3);
assert.equal(req.speed, 0.7);
});
test("execute-task with tags:['config'] adjusts requirements", () => {
const req = computeTaskRequirements("execute-task", { tags: ["config"] });
assert.equal(req.instruction, 0.9);
});
test("execute-task with complexityKeywords:['concurrency'] boosts debugging and reasoning", () => {
const req = computeTaskRequirements("execute-task", { complexityKeywords: ["concurrency"] });
assert.equal(req.debugging, 0.9);
assert.equal(req.reasoning, 0.8);
});
test("execute-task with complexityKeywords:['migration'] boosts reasoning and coding", () => {
const req = computeTaskRequirements("execute-task", { complexityKeywords: ["migration"] });
assert.equal(req.reasoning, 0.9);
assert.equal(req.coding, 0.8);
});
test("execute-task with fileCount:8 boosts coding and reasoning", () => {
const req = computeTaskRequirements("execute-task", { fileCount: 8 });
assert.equal(req.coding, 0.9);
assert.equal(req.reasoning, 0.7);
});
test("execute-task with estimatedLines:600 boosts coding and reasoning", () => {
const req = computeTaskRequirements("execute-task", { estimatedLines: 600 });
assert.equal(req.coding, 0.9);
assert.equal(req.reasoning, 0.7);
});
test("research-milestone returns correct base vector", () => {
const req = computeTaskRequirements("research-milestone");
assert.deepStrictEqual(req, { research: 0.9, longContext: 0.7, reasoning: 0.5 });
});
test("plan-slice returns correct base vector", () => {
const req = computeTaskRequirements("plan-slice");
assert.deepStrictEqual(req, { reasoning: 0.9, coding: 0.5 });
});
test("unknown-unit-type returns default reasoning requirement", () => {
const req = computeTaskRequirements("unknown-unit-type");
assert.deepStrictEqual(req, { reasoning: 0.5 });
});
test("non-execute-task with metadata ignores metadata refinements", () => {
// research-milestone should return the same vector regardless of metadata
const reqWithMeta = computeTaskRequirements("research-milestone", { tags: ["docs"], fileCount: 10 });
const reqWithout = computeTaskRequirements("research-milestone");
assert.deepStrictEqual(reqWithMeta, reqWithout);
});
});
// ─── scoreEligibleModels ─────────────────────────────────────────────────────
describe("scoreEligibleModels", () => {
test("ranks models by score descending when scores differ by more than 2", () => {
// research: heavily weights research dimension. gemini-2.5-pro has 85 research vs sonnet's 75
const requirements = { research: 0.9, longContext: 0.7, reasoning: 0.5 };
const results = scoreEligibleModels(["claude-sonnet-4-6", "gemini-2.5-pro"], requirements);
assert.equal(results.length, 2);
assert.ok(results[0].score >= results[1].score, "Should be sorted by score descending");
});
test("equal scores and equal costs tie-break lexicographically by model ID", () => {
  const requirements = { coding: 1.0 };
  // model-z and model-a have no built-in profiles: both score 50 (within the
  // 2-point band) and both cost Infinity, so the lexicographic ID tie-break applies
  const results = scoreEligibleModels(["model-z", "model-a"], requirements);
  assert.equal(results[0].modelId, "model-a");
});
test("single model returns array of one", () => {
const results = scoreEligibleModels(["claude-sonnet-4-6"], { coding: 0.9 });
assert.equal(results.length, 1);
assert.equal(results[0].modelId, "claude-sonnet-4-6");
});
test("unknown model with no profile gets score of 50", () => {
const results = scoreEligibleModels(["totally-unknown-model"], { coding: 1.0 });
assert.equal(results[0].score, 50);
});
test("capabilityOverrides deep-merges with built-in profile", () => {
const requirements = { coding: 1.0 };
// Override sonnet's coding to 30 — gpt-4o (coding=80) should win
const results = scoreEligibleModels(
["claude-sonnet-4-6", "gpt-4o"],
requirements,
{ "claude-sonnet-4-6": { coding: 30 } },
);
assert.equal(results[0].modelId, "gpt-4o", "gpt-4o should rank first after coding override");
});
});
// ─── getEligibleModels ───────────────────────────────────────────────────────
describe("getEligibleModels", () => {
const ALL_MODELS = [
"claude-opus-4-6", // heavy
"claude-sonnet-4-6", // standard
"claude-haiku-4-5", // light
"gpt-4o-mini", // light
"gpt-4o", // standard
];
test("returns light-tier models from available list sorted by cost", () => {
const config: DynamicRoutingConfig = defaultRoutingConfig();
const result = getEligibleModels("light", ALL_MODELS, config);
assert.ok(result.length >= 1);
for (const id of result) {
assert.ok(
["claude-haiku-4-5", "gpt-4o-mini"].includes(id),
`Expected light-tier model, got ${id}`,
);
}
});
test("returns standard-tier models from available list sorted by cost", () => {
const config: DynamicRoutingConfig = defaultRoutingConfig();
const result = getEligibleModels("standard", ALL_MODELS, config);
assert.ok(result.length >= 1);
for (const id of result) {
assert.ok(
["claude-sonnet-4-6", "gpt-4o"].includes(id),
`Expected standard-tier model, got ${id}`,
);
}
});
test("tier_models pinned model returns single-element array", () => {
const config: DynamicRoutingConfig = {
...defaultRoutingConfig(),
tier_models: { light: "gpt-4o-mini" },
};
const result = getEligibleModels("light", ALL_MODELS, config);
assert.deepStrictEqual(result, ["gpt-4o-mini"]);
});
test("empty available list returns empty array", () => {
const config: DynamicRoutingConfig = defaultRoutingConfig();
const result = getEligibleModels("light", [], config);
assert.equal(result.length, 0);
});
test("unknown models classified as standard appear in standard tier results", () => {
const config: DynamicRoutingConfig = defaultRoutingConfig();
// unknown-model-xyz has no entry → defaults to standard tier
const result = getEligibleModels("standard", ["unknown-model-xyz"], config);
assert.ok(result.includes("unknown-model-xyz"), "Unknown model should appear in standard tier");
});
});
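The eligibility rules exercised above reduce to two cases: a `tier_models` pin short-circuits to a single-element array, otherwise the available list is filtered by tier, with unknown models defaulting to "standard" per D-15 (the real function also sorts by cost, which this sketch omits). A minimal sketch, with the tier table and function name assumed:

```typescript
type Tier = "light" | "standard" | "heavy";

// Assumed tier assignments, mirroring the comments on ALL_MODELS above.
const TIER_OF: Record<string, Tier> = {
  "claude-opus-4-6": "heavy",
  "claude-sonnet-4-6": "standard",
  "gpt-4o": "standard",
  "claude-haiku-4-5": "light",
  "gpt-4o-mini": "light",
};

function eligibleModels(
  tier: Tier,
  available: string[],
  pinned?: Partial<Record<Tier, string>>,
): string[] {
  const pin = pinned?.[tier];
  if (pin) return [pin]; // tier_models pin wins outright
  // Unknown models default to "standard" (D-15)
  return available.filter((id) => (TIER_OF[id] ?? "standard") === tier);
}
```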
// ─── capability-aware routing integration ────────────────────────────────────
describe("capability-aware routing integration", () => {
// All standard-tier models available alongside heavy (opus)
const MULTI_MODEL_AVAILABLE = [
"claude-opus-4-6",
"claude-sonnet-4-6",
"gpt-4o",
"gemini-2.5-pro",
"claude-haiku-4-5",
"gpt-4o-mini",
];
// 1. Full pipeline with capability scoring active
test("full pipeline with capability_routing: true returns capability-scored decision", () => {
const config: DynamicRoutingConfig = { ...defaultRoutingConfig(), enabled: true, capability_routing: true };
// Configured primary is opus (heavy) — standard tier should trigger capability scoring
const result = resolveModelForComplexity(
{ tier: "standard", reason: "test", downgraded: false },
{ primary: "claude-opus-4-6", fallbacks: [] },
config,
MULTI_MODEL_AVAILABLE,
"execute-task",
{ tags: [], complexityKeywords: [], fileCount: 3, estimatedLines: 100, codeBlockCount: 0 },
);
assert.equal(result.selectionMethod, "capability-scored", "should use capability scoring when enabled with multiple eligible models");
assert.ok(result.capabilityScores !== undefined, "capabilityScores should be populated");
assert.ok(Object.keys(result.capabilityScores!).length > 1, "should have scores for multiple models");
assert.equal(result.wasDowngraded, true, "should be downgraded from opus");
});
// 2. capability_routing: false falls back to tier-only
test("capability_routing: false skips scoring and uses tier-only", () => {
const config: DynamicRoutingConfig = { ...defaultRoutingConfig(), enabled: true, capability_routing: false };
const result = resolveModelForComplexity(
{ tier: "standard", reason: "test", downgraded: false },
{ primary: "claude-opus-4-6", fallbacks: [] },
config,
MULTI_MODEL_AVAILABLE,
"execute-task",
undefined,
);
assert.equal(result.selectionMethod, "tier-only", "capability_routing: false should use tier-only");
assert.equal(result.capabilityScores, undefined, "capabilityScores should be undefined for tier-only");
});
// 3. Single eligible model skips scoring
test("single eligible model skips capability scoring and uses tier-only", () => {
const config: DynamicRoutingConfig = {
...defaultRoutingConfig(),
enabled: true,
capability_routing: true,
tier_models: { standard: "claude-sonnet-4-6" },
};
// Pin to single standard model — eligible.length === 1 → skips STEP 2
const result = resolveModelForComplexity(
{ tier: "standard", reason: "test", downgraded: false },
{ primary: "claude-opus-4-6", fallbacks: [] },
config,
MULTI_MODEL_AVAILABLE,
"execute-task",
undefined,
);
// Single pinned model → tier-only (no scoring needed)
assert.equal(result.selectionMethod, "tier-only", "single eligible model should use tier-only");
assert.equal(result.modelId, "claude-sonnet-4-6", "should use the pinned model");
});
// 4. Unknown model with no profile gets uniform 50s and competes
test("unknown model with no profile gets uniform score of 50 and can compete", () => {
const unknownModel = "unknown-future-model-xyz";
const config: DynamicRoutingConfig = { ...defaultRoutingConfig(), enabled: true, capability_routing: true };
// Add unknown model to available list at standard tier (unknown → standard per D-15)
// scoring should still work with score=50 for the unknown model
const requirements = { coding: 0.9, instruction: 0.7, speed: 0.3 };
const scored = scoreEligibleModels([unknownModel, "claude-sonnet-4-6"], requirements);
const unknownEntry = scored.find(s => s.modelId === unknownModel);
assert.ok(unknownEntry !== undefined, "unknown model should be in scored results");
// Unknown model gets uniform 50s: (0.9*50 + 0.7*50 + 0.3*50) / (0.9+0.7+0.3) ≈ 50
assert.ok(Math.abs(unknownEntry!.score - 50) < 0.01, `expected score ~50, got ${unknownEntry!.score}`);
});
// 5. Capability overrides change scoring outcome
test("capabilityOverrides boost a model above another for same task", () => {
// sonnet: coding=85, gpt-4o: coding=80. Override gpt-4o coding to 99 → gpt-4o should win.
const requirements = { coding: 1.0 };
const overrides = { "gpt-4o": { coding: 99 } };
const scored = scoreEligibleModels(["claude-sonnet-4-6", "gpt-4o"], requirements, overrides);
assert.equal(scored[0].modelId, "gpt-4o", "overridden model should win for coding-heavy task");
assert.ok(scored[0].score > 90, `expected score > 90 after override, got ${scored[0].score}`);
});
// 5b. Capability overrides pass through resolveModelForComplexity to scoreEligibleModels
test("resolveModelForComplexity passes capabilityOverrides to scoring step", () => {
const config: DynamicRoutingConfig = { ...defaultRoutingConfig(), enabled: true, capability_routing: true };
// sonnet coding=85, gpt-4o coding=80. Override gpt-4o coding to 99 → gpt-4o should win.
const overrides: Record<string, Partial<ModelCapabilities>> = { "gpt-4o": { coding: 99 } };
const result = resolveModelForComplexity(
{ tier: "standard", reason: "test", downgraded: false },
{ primary: "claude-opus-4-6", fallbacks: [] },
config,
["claude-opus-4-6", "claude-sonnet-4-6", "gpt-4o"],
"execute-task",
undefined,
overrides,
);
assert.equal(result.selectionMethod, "capability-scored");
assert.equal(result.modelId, "gpt-4o", "gpt-4o should win with coding override");
});
// 6. Regression: existing routing guards unchanged
test("regression: routing-disabled passthrough still returns tier-only", () => {
const config: DynamicRoutingConfig = { ...defaultRoutingConfig(), enabled: false };
const result = resolveModelForComplexity(
{ tier: "light", reason: "test", downgraded: false },
{ primary: "claude-opus-4-6", fallbacks: [] },
config,
MULTI_MODEL_AVAILABLE,
"execute-task",
undefined,
);
assert.equal(result.selectionMethod, "tier-only");
assert.equal(result.wasDowngraded, false);
assert.equal(result.modelId, "claude-opus-4-6");
});
test("regression: unknown-model bypass returns tier-only and does not downgrade", () => {
const config: DynamicRoutingConfig = { ...defaultRoutingConfig(), enabled: true };
const result = resolveModelForComplexity(
{ tier: "light", reason: "test", downgraded: false },
{ primary: "totally-unknown-custom-model", fallbacks: [] },
config,
["totally-unknown-custom-model", ...MULTI_MODEL_AVAILABLE],
"execute-task",
undefined,
);
assert.equal(result.selectionMethod, "tier-only");
assert.equal(result.wasDowngraded, false);
assert.equal(result.modelId, "totally-unknown-custom-model");
});
test("regression: no-downgrade-needed path returns tier-only", () => {
const config: DynamicRoutingConfig = { ...defaultRoutingConfig(), enabled: true, capability_routing: true };
// Configured model is sonnet (standard), requesting standard → no downgrade needed
const result = resolveModelForComplexity(
{ tier: "standard", reason: "test", downgraded: false },
{ primary: "claude-sonnet-4-6", fallbacks: [] },
config,
MULTI_MODEL_AVAILABLE,
"execute-task",
undefined,
);
assert.equal(result.selectionMethod, "tier-only");
assert.equal(result.wasDowngraded, false);
assert.equal(result.modelId, "claude-sonnet-4-6");
});
});
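The regression tests above fix a guard order for the selection method: routing disabled, an unknown primary model, or no needed downgrade all short-circuit to "tier-only", and capability scoring only runs when it is enabled and more than one model is eligible. A sketch of that decision under assumed names (the real `resolveModelForComplexity` takes richer inputs):

```typescript
interface RoutingGuards {
  enabled: boolean;          // dynamic_routing.enabled
  capabilityRouting: boolean; // dynamic_routing.capability_routing
  primaryKnown: boolean;     // isKnownModel() guard
  downgradeNeeded: boolean;  // configured tier above classified tier
  eligibleCount: number;     // models surviving tier filtering
}

function selectionMethod(g: RoutingGuards): "tier-only" | "capability-scored" {
  // Guards checked first: any bypass keeps the configured model as-is.
  if (!g.enabled || !g.primaryKnown || !g.downgradeNeeded) return "tier-only";
  // Scoring only pays off with a real choice to make.
  if (g.capabilityRouting && g.eligibleCount > 1) return "capability-scored";
  return "tier-only";
}
```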
// ─── getModelTier unknown default ────────────────────────────────────────────
describe("getModelTier unknown default", () => {
test("unknown model returns standard tier (not heavy) via downgrade behavior", () => {
// We can verify this indirectly: resolveModelForComplexity for a standard classification
// with an unknown primary model should NOT downgrade (because unknown → standard, not heavy)
const config = { ...defaultRoutingConfig(), enabled: true };
// Use "unknown-model-xyz" as primary — its tier will be "standard" per D-15
// Classification is "heavy" → tier >= standard → no downgrade
// But unknown models use the isKnownModel() guard, so they pass through anyway
// Test the positive: an unknown model is NOT treated as heavy
const result = resolveModelForComplexity(
makeClassification("standard"),
{ primary: "claude-sonnet-4-6", fallbacks: [] },
config,
["claude-sonnet-4-6", "claude-haiku-4-5", "gpt-4o-mini"],
);
// standard classification with standard model (sonnet) → no downgrade
assert.equal(result.wasDowngraded, false, "standard model should not downgrade for standard task");
assert.equal(result.modelId, "claude-sonnet-4-6");
});
test("unknown model in getEligibleModels defaults to standard tier", () => {
// Per D-15: getModelTier returns "standard" for unknown models
const config: DynamicRoutingConfig = defaultRoutingConfig();
const standardModels = getEligibleModels("standard", ["totally-unknown-model-abc"], config);
const lightModels = getEligibleModels("light", ["totally-unknown-model-abc"], config);
const heavyModels = getEligibleModels("heavy", ["totally-unknown-model-abc"], config);
assert.ok(standardModels.includes("totally-unknown-model-abc"), "Unknown model should be in standard tier");
assert.equal(lightModels.length, 0, "Unknown model should NOT be in light tier");
assert.equal(heavyModels.length, 0, "Unknown model should NOT be in heavy tier");
});
});


@@ -316,6 +316,7 @@ export interface ClassificationResult {
tier: ComplexityTier;
reason: string;
downgraded: boolean;
taskMetadata?: TaskMetadata;
}
export interface TaskMetadata {