# ADR-004: Capability-Aware Model Routing

**Status:** Implemented (Phase 2)
**Date:** 2026-03-26
**Revised:** 2026-04-03
**Deciders:** Jeremy McSpadden
**Related:** ADR-003 (pipeline simplification), [Issue #2655](https://github.com/singularity-forge/sf-run/issues/2655), `docs/dynamic-model-routing.md`

## Context

SF already supports dynamic model routing in auto-mode, but the current router is fundamentally **complexity-tier- and cost-based**, not **task-capability-based**.

Today the selection pipeline is:

```
unit dispatch
  → classifyUnitComplexity(unitType, unitId, basePath, budgetPct)
    → UNIT_TYPE_TIERS default mapping
    → analyzeTaskComplexity() / analyzePlanComplexity() [metadata heuristics]
    → getAdaptiveTierAdjustment() [routing history]
    → applyBudgetPressure() [budget ceiling]
  → resolveModelForComplexity(classification, phaseConfig, routingConfig, availableModelIds)
    → downgrade-only: never upgrades beyond user's configured model
    → MODEL_CAPABILITY_TIER lookup → cheapest available in tier
    → fallback chain assembly
  → resolveModelId() → pi.setModel()
  → before_provider_request hook (payload mutation only)
```

This architecture works when all models inside a tier are effectively interchangeable. That assumption no longer holds.

Users increasingly configure heterogeneous provider pools through `models.json`, scoped provider setup, and `/scoped-models`. In practice:

- Claude-class models often perform best on greenfield implementation and architecture work
- Codex-class models often perform best on debugging, refactoring, and root-cause analysis
- Gemini-class models often perform best on long-context synthesis and research-heavy tasks
- Fast small models are often best for cheap validation, triage, and lightweight hooks

The current router cannot express those differences. If Claude and Codex are both available at the same tier, SF either:

- treats them as equivalent and picks the cheaper one, or
- requires the user to hardcode specific phase models manually

That produces three structural problems:

### 1. Wrong optimization target

The router optimizes primarily for **task difficulty vs model cost**. The real problem is **task requirements vs model strengths**, subject to cost constraints.

### 2. Poor behavior with heterogeneous pools

Different users have different subscriptions and provider access. A fixed mapping like "research always uses Gemini" does not generalize when the user only has Claude + Codex, or only local models.

### 3. Capability knowledge is trapped in user intuition

Experienced users know which models are better at coding, debugging, research, long-context work, or instruction following. SF has no representation of that knowledge, so it cannot route intelligently on the user's behalf.

The system already has several building blocks that make a richer router feasible:

- unit types already encode the kind of work being dispatched
- `complexity-classifier.ts` already extracts rich `TaskMetadata` (file counts, dependency counts, tags, complexity keywords, code block counts)
- `auto-dispatch.ts` and prompt builders provide stable task categories
- `ctx.modelRegistry.getAvailable()` exposes the current model pool
- `models.json` already supports user overrides and cost data per model
- budget ceilings, routing history, and retry escalation already exist
- the `model_select` hook fires on model changes and could be extended for pre-selection interception

## Decision

**Extend dynamic routing from a one-dimensional tier system to a two-dimensional system that combines complexity classification ("how hard") with capability scoring ("what kind"), while preserving downgrade-only semantics, budget controls, and user overrideability.**

### Design Principles

1. **Downgrade-only invariant is preserved.** The user's configured model for a phase is always the ceiling. Capability scoring ranks models within the eligible set — it never promotes above the user's configured model. (A minimal filter sketch follows this list.)

2. **Complexity classification remains.** The existing `classifyUnitComplexity()` pipeline (unit type defaults, task plan analysis, adaptive learning, budget pressure) continues to determine tier eligibility. Capability scoring selects among tier-eligible models.

3. **Cost is a constraint, not a score dimension.** Budget pressure constrains which models are eligible. Capability profiles describe what models are good at, not what they cost.

4. **Requirement vectors are dynamic, not static.** Task requirements are computed from `(unitType, TaskMetadata)`, not from unit type alone.
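
A minimal sketch of the downgrade-only filter behind principle 1, assuming the tier names used later in this ADR (`light`/`standard`/`heavy`); `filterDowngradeOnly` and `TIER_RANK` are illustrative names, not existing SF code:

```ts
type ComplexityTier = "light" | "standard" | "heavy";

const TIER_RANK: Record<ComplexityTier, number> = { light: 0, standard: 1, heavy: 2 };

// Keep only models at or below the ceiling tier (the tier of the user's
// configured model for this phase). Capability scoring ranks what survives.
function filterDowngradeOnly(
  available: string[],
  modelTier: Record<string, ComplexityTier>, // e.g. MODEL_CAPABILITY_TIER
  ceiling: ComplexityTier,
): string[] {
  return available.filter(id => {
    const tier = modelTier[id];
    return tier !== undefined && TIER_RANK[tier] <= TIER_RANK[ceiling];
  });
}
```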

### The Revised Routing Pipeline

```
unit dispatch
  → classifyUnitComplexity(unitType, unitId, basePath, budgetPct)
    [unchanged — determines tier eligibility and budget filtering]
  → resolveModelForComplexity(classification, phaseConfig, routingConfig, availableModelIds)
    → STEP 1: filter to tier-eligible models (downgrade-only from user ceiling)
    → STEP 2: if capability routing enabled AND >1 eligible model:
      → computeTaskRequirements(unitType, taskMetadata)
      → scoreEligibleModels(eligible, taskRequirements)
      → select highest-scoring model (deterministic tie-break by cost, then ID)
    → STEP 3: assemble fallback chain
  → resolveModelId() → pi.setModel()
```

### Model Capability Profiles

Each model gains an optional capability profile:

```ts
interface ModelCapabilities {
  coding: number;      // greenfield implementation, code generation
  debugging: number;   // root-cause analysis, error diagnosis, refactoring
  research: number;    // information synthesis, investigation, exploration
  reasoning: number;   // multi-step logic, planning, architecture
  speed: number;       // response latency (inverse of thinking time)
  longContext: number; // effective use of large input windows
  instruction: number; // instruction following, structured output adherence
}
```

Scores are normalized to `0–100` across seven dimensions. There is no `costEfficiency` dimension — cost is handled separately by budget pressure and tier economics.

Models without a capability profile are treated as having uniform scores across all dimensions (score 50 in each), which makes capability scoring a no-op for those models and falls back to the existing cheapest-in-tier behavior.
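
For concreteness, that fallback can be a single neutral constant (the name is illustrative):

```ts
// Neutral profile for models with no capability entry. Every dimension is 50,
// so scoring cannot distinguish such models and cost decides, as today.
const UNIFORM_PROFILE: ModelCapabilities = {
  coding: 50, debugging: 50, research: 50, reasoning: 50,
  speed: 50, longContext: 50, instruction: 50,
};
```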

### Dynamic Task Requirement Vectors

Requirement vectors are computed as a function of `(unitType, TaskMetadata)`, not looked up from a static table. This preserves the nuance that `classifyUnitComplexity` already captures.

```ts
function computeTaskRequirements(
  unitType: string,
  metadata?: TaskMetadata,
): Partial<Record<keyof ModelCapabilities, number>> {
  // Base vector from unit type
  const base = BASE_REQUIREMENTS[unitType] ?? { reasoning: 0.5 };

  // Refine based on task metadata (only for execute-task)
  if (unitType === "execute-task" && metadata) {
    // Docs/config/rename tasks → boost instruction, reduce coding
    if (metadata.tags?.some(t => /^(docs?|readme|comment|config|typo|rename)$/i.test(t))) {
      return { ...base, instruction: 0.9, coding: 0.3, speed: 0.7 };
    }
    // Debugging keywords → boost debugging and reasoning
    if (metadata.complexityKeywords?.some(k => k === "concurrency" || k === "compatibility")) {
      return { ...base, debugging: 0.9, reasoning: 0.8 };
    }
    // Migration/architecture → boost reasoning and coding
    if (metadata.complexityKeywords?.some(k => k === "migration" || k === "architecture")) {
      return { ...base, reasoning: 0.9, coding: 0.8 };
    }
    // Many files or high estimated lines → boost coding
    if ((metadata.fileCount ?? 0) >= 6 || (metadata.estimatedLines ?? 0) >= 500) {
      return { ...base, coding: 0.9, reasoning: 0.7 };
    }
  }

  return base;
}
```

Base requirement vectors by unit type:

```ts
const BASE_REQUIREMENTS: Record<string, Partial<Record<keyof ModelCapabilities, number>>> = {
  "execute-task":       { coding: 0.9, instruction: 0.7, speed: 0.3 },
  "research-milestone": { research: 0.9, longContext: 0.7, reasoning: 0.5 },
  "research-slice":     { research: 0.9, longContext: 0.7, reasoning: 0.5 },
  "plan-milestone":     { reasoning: 0.9, coding: 0.5 },
  "plan-slice":         { reasoning: 0.9, coding: 0.5 },
  "replan-slice":       { reasoning: 0.9, debugging: 0.6, coding: 0.5 },
  "reassess-roadmap":   { reasoning: 0.9, research: 0.5 },
  "complete-slice":     { instruction: 0.8, speed: 0.7 },
  "run-uat":            { instruction: 0.7, speed: 0.8 },
  "discuss-milestone":  { reasoning: 0.6, instruction: 0.7 },
  "complete-milestone": { instruction: 0.8, reasoning: 0.5 },
};
```

### Scoring Function

```ts
function scoreModel(
  model: ModelCapabilities,
  requirements: Partial<Record<keyof ModelCapabilities, number>>,
): number {
  let weightedSum = 0;
  let weightSum = 0;
  for (const [dim, weight] of Object.entries(requirements)) {
    if (weight === undefined) continue; // Partial record: skip unset dimensions
    const capability = model[dim as keyof ModelCapabilities] ?? 50;
    weightedSum += weight * capability;
    weightSum += weight;
  }
  return weightSum > 0 ? weightedSum / weightSum : 50;
}
```

This produces a **weighted average** in the range `0–100`, where each dimension's contribution is proportional to its requirement weight. The output is directly comparable across models regardless of how many dimensions the requirement vector has.
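
As a concrete check of the arithmetic, here is the computation for two models from the profile table in this ADR, using the `execute-task` base vector:

```ts
// execute-task requirements: { coding: 0.9, instruction: 0.7, speed: 0.3 }
//
// claude-sonnet-4-6: (0.9*85 + 0.7*85 + 0.3*60) / (0.9 + 0.7 + 0.3)
//                  = 154 / 1.9   ≈ 81.1
// gpt-4o:            (0.9*80 + 0.7*80 + 0.3*65) / 1.9
//                  = 147.5 / 1.9 ≈ 77.6
//
// claude-sonnet-4-6 leads by about 3.4 points, outside the 2-point tie-break
// threshold described below, so cost never enters this decision.
```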

**Tie-breaking:** When two models score within 2 points of each other, prefer the cheaper model (by `MODEL_COST_PER_1K_INPUT`). If cost is also equal, break ties by lexicographic model ID for determinism.
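
A minimal sketch of the selection step under these rules, reusing `scoreModel()` and `UNIFORM_PROFILE` from above; `selectBest` and its cost-table parameter are illustrative, not existing SF functions:

```ts
// Pick the best eligible model: highest score wins; models within 2 points of
// the leader are treated as tied and resolved by cost, then by model ID.
// Precondition: eligible is non-empty (STEP 2 only runs with >1 model).
function selectBest(
  eligible: string[],
  profiles: Record<string, ModelCapabilities>,
  costPer1kInput: Record<string, number>, // e.g. MODEL_COST_PER_1K_INPUT
  requirements: Partial<Record<keyof ModelCapabilities, number>>,
): string {
  const TIE_THRESHOLD = 2;
  const scored = eligible.map(id => ({
    id,
    score: scoreModel(profiles[id] ?? UNIFORM_PROFILE, requirements),
    cost: costPer1kInput[id] ?? Number.POSITIVE_INFINITY,
  }));
  const top = Math.max(...scored.map(s => s.score));
  const contenders = scored.filter(s => top - s.score <= TIE_THRESHOLD);
  contenders.sort((a, b) => a.cost - b.cost || a.id.localeCompare(b.id));
  return contenders[0].id;
}
```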

### Configuration Model

Built-in capability profiles ship as a data table alongside `MODEL_CAPABILITY_TIER` and `MODEL_COST_PER_1K_INPUT` in `model-router.ts`:

```ts
const MODEL_CAPABILITY_PROFILES: Record<string, ModelCapabilities> = {
  "claude-opus-4-6":   { coding: 95, debugging: 90, research: 85, reasoning: 95, speed: 30, longContext: 80, instruction: 90 },
  "claude-sonnet-4-6": { coding: 85, debugging: 80, research: 75, reasoning: 80, speed: 60, longContext: 75, instruction: 85 },
  "claude-haiku-4-5":  { coding: 60, debugging: 50, research: 45, reasoning: 50, speed: 95, longContext: 50, instruction: 75 },
  "gpt-4o":            { coding: 80, debugging: 75, research: 70, reasoning: 75, speed: 65, longContext: 70, instruction: 80 },
  "gpt-4o-mini":       { coding: 55, debugging: 45, research: 40, reasoning: 45, speed: 90, longContext: 45, instruction: 70 },
  "gemini-2.5-pro":    { coding: 75, debugging: 70, research: 85, reasoning: 75, speed: 55, longContext: 90, instruction: 75 },
  "gemini-2.0-flash":  { coding: 50, debugging: 40, research: 50, reasoning: 40, speed: 95, longContext: 60, instruction: 65 },
  "deepseek-chat":     { coding: 75, debugging: 65, research: 55, reasoning: 70, speed: 70, longContext: 55, instruction: 65 },
  "o3":                { coding: 80, debugging: 85, research: 80, reasoning: 92, speed: 25, longContext: 70, instruction: 85 },
};
```

Users can override capability profiles in `models.json` per provider:

```json
{
  "providers": {
    "anthropic": {
      "modelOverrides": {
        "claude-sonnet-4-6": {
          "capabilities": {
            "debugging": 90,
            "research": 85
          }
        }
      }
    }
  }
}
```

Partial overrides are deep-merged with built-in defaults. This uses the same `modelOverrides` path that already supports `contextWindow`, `cost`, and `compat` overrides.
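
Because `ModelCapabilities` is a flat record of numbers, the per-model merge reduces to a spread; a minimal sketch (`mergeCapabilities` is an illustrative name):

```ts
// User values win; built-in defaults fill the remaining dimensions.
function mergeCapabilities(
  builtIn: ModelCapabilities,
  override?: Partial<ModelCapabilities>,
): ModelCapabilities {
  return { ...builtIn, ...override };
}

// With the models.json example above:
// mergeCapabilities(profiles["claude-sonnet-4-6"], { debugging: 90, research: 85 })
// → { coding: 85, debugging: 90, research: 85, reasoning: 80, speed: 60,
//     longContext: 75, instruction: 85 }
```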

### Profile Versioning

Built-in capability profiles are maintained alongside the existing `MODEL_CAPABILITY_TIER` and `MODEL_COST_PER_1K_INPUT` tables in `model-router.ts`. When the `@sf/pi-ai` model catalog is updated with new models, the capability profile table must be updated in the same PR. A linting rule should flag any model present in `MODEL_CAPABILITY_TIER` but missing from `MODEL_CAPABILITY_PROFILES`.
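
That completeness rule is cheap to enforce as a test; a sketch, assuming the tables are exported from `model-router.ts` and a vitest-style runner (both assumptions):

```ts
import { expect, it } from "vitest";
import { MODEL_CAPABILITY_PROFILES, MODEL_CAPABILITY_TIER } from "./model-router";

it("has a capability profile for every model with a tier assignment", () => {
  const missing = Object.keys(MODEL_CAPABILITY_TIER)
    .filter(id => !(id in MODEL_CAPABILITY_PROFILES));
  expect(missing).toEqual([]); // failure output names the offending models
});
```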

Profiles are versioned implicitly by SF release. The existing `models.json` `modelOverrides` mechanism allows users to correct stale defaults immediately without waiting for an SF update.

### Extension-First Rollout

Capability-aware routing should be prototypable as an extension before moving to core. The current hook surface is **insufficient** for this:

- `before_provider_request` fires after model selection, at the API payload level — too late to swap the model choice.
- `model_select` fires reactively when a model changes, not before selection — it cannot influence the choice.

**Required hook addition:** a `before_model_select` hook that fires within `selectAndApplyModel()` after tier classification but before `resolveModelForComplexity()`. This hook would receive:

```ts
interface BeforeModelSelectEvent {
  unitType: string;
  unitId: string;
  classification: ClassificationResult;
  taskMetadata: TaskMetadata;
  eligibleModels: string[]; // tier-filtered available models
  phaseConfig: ResolvedModelConfig;
}
```

Return value: `{ modelId: string } | undefined` (override the selection, or return `undefined` to use the default).

This hook enables an extension to implement capability scoring externally, test it against real workloads, and validate behavior before the logic moves into `model-router.ts`.
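
A hypothetical extension built on the proposed hook, reusing the scoring pieces defined earlier in this ADR; the `registerHook` registration shape is illustrative, not an existing SF API:

```ts
// Capability scoring implemented entirely in an extension, leaving core
// routing untouched. Returning undefined defers to the default selection.
registerHook("before_model_select", (event: BeforeModelSelectEvent) => {
  if (event.eligibleModels.length < 2) return undefined; // nothing to rank

  const requirements = computeTaskRequirements(event.unitType, event.taskMetadata);
  let best: { id: string; score: number } | undefined;
  for (const id of event.eligibleModels) {
    const profile = MODEL_CAPABILITY_PROFILES[id] ?? UNIFORM_PROFILE;
    const score = scoreModel(profile, requirements);
    if (!best || score > best.score) best = { id, score };
  }
  return best ? { modelId: best.id } : undefined;
});
```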

**Rollout sequence:**

1. **Phase 1:** Add the `before_model_select` hook and `TaskMetadata` to `ClassificationResult`. Ship the built-in capability profile data table. No core routing changes.
2. **Phase 2:** Implement capability scoring as an extension that hooks `before_model_select`. Gather user feedback through routing history.
3. **Phase 3:** If behavior proves stable, move scoring into `resolveModelForComplexity()` in core. The extension hook remains for custom routing strategies.

### Observability

Every routing decision must be inspectable. The existing `RoutingDecision` interface is extended:

```ts
interface RoutingDecision {
  modelId: string;
  fallbacks: string[];
  tier: ComplexityTier;
  wasDowngraded: boolean;
  reason: string;
  // New fields:
  capabilityScores?: Record<string, number>; // model ID → score
  taskRequirements?: Partial<Record<string, number>>; // dimension → weight
  selectionMethod: "tier-only" | "capability-scored";
}
```

When verbose mode is on, the routing notification includes the top-scoring models and why the winner was selected:

```
Dynamic routing [S]: claude-sonnet-4-6 (scored 82.3 — coding:0.9×85, debugging:0.6×80)
runner-up: gpt-4o (scored 78.1)
```

## Consequences

### Positive

#### 1. Better model-task fit

Routing decisions are based on the kind of work being done, not only on how expensive or complex the work appears. A debugging task routes to the strongest debugger in the pool; a research task routes to the best synthesizer.

#### 2. Works across arbitrary model pools

The router no longer depends on a hardcoded vendor assumption. If a user has only Claude + Codex, it can still route intelligently between them. If the user adds Gemini or local models later, the same scoring system continues to work.

#### 3. Preserves all existing invariants

- **Downgrade-only semantics:** capability scoring never upgrades beyond the user's configured phase model.
- **Budget pressure:** unchanged — constrains tier eligibility before scoring runs.
- **Retry escalation:** unchanged — escalates tier, then scoring picks the best model in the new tier.
- **Fallback chains:** assembled the same way, with the capability-scored winner as primary.

#### 4. Creates a testable, versionable contract for routing behavior

Capability profiles and task vectors are explicit data structures. Routing decisions are inspectable in verbose mode. The scoring function is a pure function suitable for deterministic unit tests.

#### 5. Opens the door to adaptive learning

Existing routing history (`routing-history.ts`) can later refine capability scores per task type. When a model consistently fails at a particular task shape, its effective score for that dimension decreases. This is a natural extension of the existing `getAdaptiveTierAdjustment()` mechanism.
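
A speculative sketch of that refinement; the failure-rate signal and the 20-point maximum penalty are assumptions about what routing history could expose, not designed behavior:

```ts
// Damp a built-in capability score using the observed failure rate (0–1)
// for a given (model, task shape) pair.
function effectiveScore(baseScore: number, observedFailureRate: number): number {
  const clamped = Math.min(Math.max(observedFailureRate, 0), 1);
  return Math.max(0, baseScore - 20 * clamped);
}
```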

#### 6. Graceful degradation

Models without capability profiles get uniform scores, producing the same cheapest-in-tier behavior as today. Zero behavior change for users who don't configure heterogeneous pools.

### Negative

#### 1. More metadata to maintain

Built-in model profiles will drift as model families evolve. Mitigation: profiles live in a single data table, versioned with SF releases, with a lint rule for completeness.

#### 2. Scoring can create false precision

A `0–100` capability scale looks exact but is still heuristic. Mitigation: document profiles as "relative rankings, not benchmarks." The 2-point tie-breaking threshold prevents insignificant score differences from overriding cost optimization.

#### 3. More routing complexity

The current tier router is simple to explain and debug. Multi-dimensional scoring is more powerful but harder to reason about. Mitigation: verbose observability output shows scores and reasons. The `selectionMethod` field in routing decisions makes it clear whether capability scoring was active.

#### 4. Stronger test requirements

The router will need coverage for:

- profile loading and override merge rules (partial deep-merge from `modelOverrides`)
- `computeTaskRequirements()` with various unit types and metadata combinations
- scoring function correctness (weighted average, tie-breaking; see the test sketch after this list)
- interaction with tier eligibility filtering
- budget pressure applied before scoring, not conflicting with it
- fallback behavior when no scored model is eligible
- graceful degradation when no profiles exist (uniform scores)
- `before_model_select` hook contract (extension path)
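
A deterministic test sketch for the scoring bullet above, using values from this ADR's tables (the runner choice is illustrative):

```ts
import { expect, it } from "vitest";

it("computes a weighted average over the requested dimensions", () => {
  const sonnet = MODEL_CAPABILITY_PROFILES["claude-sonnet-4-6"];
  const reqs = { coding: 0.9, instruction: 0.7, speed: 0.3 };
  // (0.9*85 + 0.7*85 + 0.3*60) / 1.9 = 154 / 1.9 ≈ 81.05
  expect(scoreModel(sonnet, reqs)).toBeCloseTo(81.05, 1);
});

it("returns the neutral score when no dimensions are requested", () => {
  expect(scoreModel(MODEL_CAPABILITY_PROFILES["gpt-4o"], {})).toBe(50);
});
```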

#### 5. New hook surface to maintain

The `before_model_select` hook adds a new extension API contract that must be maintained across releases. Mitigation: the hook is narrowly scoped — one event type, optional return.

### Neutral / Migration

#### 1. Tier-based routing does not disappear

Complexity tiers remain as:

- the primary "how hard is this" signal that determines tier eligibility
- the fallback behavior for models without capability profiles
- the escalation path on retries (light → standard → heavy)

Capability scoring adds the "what kind of work" signal on top. The two systems are layered, not competing.

#### 2. Existing preferences continue to work

`dynamic_routing.tier_models` still works — it pins a specific model per tier, bypassing capability scoring for that tier. Per-phase model overrides (`models.planning`, `models.execution`, etc.) continue to set the ceiling. No existing configuration breaks. (A hypothetical configuration sketch follows.)
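
A hypothetical preferences snippet showing both mechanisms side by side; the exact nesting is an assumption inferred from the key names above:

```json
{
  "dynamic_routing": {
    "tier_models": {
      "heavy": "claude-opus-4-6"
    }
  },
  "models": {
    "planning": "claude-opus-4-6",
    "execution": "claude-sonnet-4-6"
  }
}
```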

#### 3. Documentation update required

`docs/dynamic-model-routing.md` must be updated to explain:

- what capability profiles are and how to override them
- how scoring interacts with tier routing
- how to read verbose routing output
- how to use `before_model_select` for custom routing extensions

## Risks

### 1. Hardcoded vendor stereotypes become stale

If the default profiles are not reviewed regularly, SF will encode outdated assumptions about which models are "best" at which tasks.

**Mitigation:** Keep defaults in a single data table (not scattered conditionals). Lint for completeness against the model catalog. User overrides via `modelOverrides` provide an immediate escape hatch. Document profiles as heuristic rankings, not benchmarks.

### 2. Budget logic and capability logic may conflict in user perception

The highest-scoring model may not be selected because budget pressure constrained the eligible tier. This could look inconsistent if the user doesn't understand the pipeline order.

**Mitigation:** Pipeline order is explicit and enforced in code:

1. Complexity classification determines tier
2. Budget pressure may downgrade tier
3. Tier-eligible models are filtered (downgrade-only from user ceiling)
4. Capability scoring ranks the eligible set
5. Cost tie-breaks within the scoring threshold

Verbose output shows each step. The user sees "budget pressure: 85%" in the reason string when a downgrade occurs.

### 3. Task-type classification may be too coarse initially

A unit type like `execute-task` contains many sub-shapes. The initial base vector plus metadata refinement may not distinguish all meaningful cases.

**Mitigation:** The `computeTaskRequirements()` function is designed for iterative refinement. The existing `TaskMetadata` already captures tags, complexity keywords, file counts, dependency counts, and code block counts. New metadata signals can be added to the existing `extractTaskMetadata()` without changing the scoring function. Routing history provides signal on where refinement is needed.

### 4. Unknown and custom models may score poorly by default

Users often bring custom provider IDs, local models, or vendor aliases that will not exist in the built-in profile table.

**Mitigation:** Unknown models receive uniform scores (50 across all dimensions), making capability scoring a no-op — they compete on cost within their tier, same as today. Users can add capability profiles via `modelOverrides` in `models.json` for models they know well.

### 5. Extension hook adds API surface

The `before_model_select` hook creates a contract that extensions may depend on.

**Mitigation:** The hook has a narrow, well-defined interface. It is additive (existing hooks are unchanged). The return type is simple (`{ modelId } | undefined`). Breaking changes would be handled through the same extension API versioning as other hooks.

## Alternatives Considered

### A. Keep pure complexity-tier routing

Rejected because it optimizes cost within a tier but still treats meaningfully different models as interchangeable. The existing `MODEL_CAPABILITY_TIER` table already proves this is a recognized gap — it just stops at three buckets.

### B. Hardcode task → model mappings

Rejected because it breaks as soon as the user does not have the expected model. This is appropriate for a closed product with a fixed fleet, not for SF's user-configured provider model.

### C. Route only by user-specified per-phase models

Rejected because it pushes all routing intelligence onto the user and does not adapt to retries, task subtype, or provider heterogeneity.

### D. Use capability-aware routing only as an extension, never in core

Not rejected as a starting point, but insufficient as the long-term architecture. Extension prototyping is the recommended first phase. However, coherent preferences, diagnostics, testing, and profile versioning will likely require core integration if the model proves valuable.

### E. Add `costEfficiency` as a capability dimension

Rejected because it conflates two concerns. If cost appears in both the scoring function and the budget constraint, the router has two competing cost signals that produce confusing behavior (e.g., a cheap model wins on `costEfficiency` score but then gets filtered out by budget pressure, or vice versa). Cost constrains eligibility; capability determines ranking.

### F. Use static requirement vectors per unit type (no metadata refinement)

Rejected because the existing `classifyUnitComplexity()` already proves that unit type alone is too coarse. An `execute-task` for docs and an `execute-task` for a migration are categorically different. The metadata signals (tags, complexity keywords, file counts) that the classifier already extracts should inform requirement vectors.

## Appendix: Current Architecture Reference

For implementors, the current routing pipeline files:

| File | Role |
|------|------|
| `auto-dispatch.ts` | Rule table that determines unit type + prompt |
| `auto-model-selection.ts` | Orchestrates model selection for each dispatch |
| `complexity-classifier.ts` | Tier classification with task metadata analysis |
| `model-router.ts` | Tier → model resolution with downgrade-only semantics |
| `routing-history.ts` | Adaptive learning from success/failure patterns |
| `preferences-models.ts` | Per-phase model config resolution and fallbacks |
| `register-hooks.ts` | Hook registration including `before_provider_request` |

The capability scoring additions would primarily touch `model-router.ts` (profiles, scoring function) and `auto-model-selection.ts` (passing metadata to the router, new hook point).