Dynamic model routing automatically selects cheaper models for simple work and reserves expensive models for complex tasks. This reduces token consumption by 20-50% on capped plans without sacrificing quality where it matters.
## How It Works
Each unit dispatched by auto-mode is classified into one of three complexity tiers: **light**, **standard**, or **heavy**.
The router then selects a model for that tier. The key rule: **downgrade-only semantics**. The user's configured model is always the ceiling — routing never upgrades beyond what you've configured.
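The downgrade-only rule can be sketched as a clamp against the configured ceiling. Everything below — the tier ordering, the per-tier defaults, and `select_model` itself — is illustrative, not the router's actual implementation:

```python
# Sketch of downgrade-only selection. Tier names match this doc;
# the per-tier defaults are the ones shown in the Configuration section.
TIER_ORDER = ["light", "standard", "heavy"]

TIER_MODELS = {
    "light": "claude-haiku-4-5",
    "standard": "claude-sonnet-4-6",
    "heavy": "claude-opus-4-6",
}

def select_model(task_tier: str, configured_model: str) -> str:
    """Pick the tier's model, but never upgrade past the user's ceiling."""
    ceiling_tier = next(
        (t for t, m in TIER_MODELS.items() if m == configured_model), "heavy"
    )
    # Clamp the task's tier at the configured ceiling (downgrade-only).
    effective = min(TIER_ORDER.index(task_tier), TIER_ORDER.index(ceiling_tier))
    return TIER_MODELS[TIER_ORDER[effective]]
```

With `claude-sonnet-4-6` configured, a heavy task still gets Sonnet (no upgrade), while a light task is routed down to Haiku.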
## Enabling
Dynamic routing is off by default. Enable it in preferences:
```yaml
---
version: 1
dynamic_routing:
  enabled: true
---
```
## Configuration
```yaml
dynamic_routing:
  enabled: true
  tier_models:              # explicit model per tier (optional)
    light: claude-haiku-4-5
    standard: claude-sonnet-4-6
    heavy: claude-opus-4-6
  escalate_on_failure: true # bump tier on task failure (default: true)
  budget_pressure: true     # auto-downgrade when approaching budget ceiling (default: true)
  cross_provider: true      # consider models from other providers (default: true)
  hooks: true               # apply routing to post-unit hooks (default: true)
### `tier_models`
Override which model is used for each tier. When omitted, the router falls back to a built-in capability mapping that recognizes common model families.
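As a rough sketch, the built-in mapping can be thought of as family-name matching on model IDs. The entries below are illustrative, seeded from the tier defaults above — not the router's real internal table:

```python
# Illustrative family-to-tier mapping (assumed, not the actual table).
FAMILY_TIERS = {
    "haiku": "light",
    "gpt-4o-mini": "light",
    "gemini-2.0-flash": "light",
    "sonnet": "standard",
    "gpt-4o": "standard",
    "opus": "heavy",
}

def classify_model(model_id: str) -> str:
    # Longest-match first so "gpt-4o-mini" wins over "gpt-4o".
    for family in sorted(FAMILY_TIERS, key=len, reverse=True):
        if family in model_id:
            return FAMILY_TIERS[family]
    return "standard"  # conservative default for unknown models
```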
### `escalate_on_failure`
When a task fails at a given tier, the router escalates to the next tier on retry: Light → Standard → Heavy. This prevents cheap models from burning retries on work that needs more reasoning.
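A minimal sketch of the escalation step, assuming the three-tier ordering used throughout this doc:

```python
TIER_ORDER = ["light", "standard", "heavy"]

def escalate(tier: str) -> str:
    """Bump to the next tier after a failure; Heavy has nowhere to go."""
    i = TIER_ORDER.index(tier)
    return TIER_ORDER[min(i + 1, len(TIER_ORDER) - 1)]
```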
### `budget_pressure`
When approaching the budget ceiling, the router progressively downgrades:
| Budget Used | Effect |
|------------|--------|
| < 50% | No adjustment |
| 50-75% | Standard → Light |
| 75-90% | More aggressive downgrading |
| > 90% | Nearly everything → Light; only Heavy stays at Standard |
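The table above can be sketched as a downgrade function. The thresholds mirror the table; the "more aggressive" behavior in the 75-90% band is an assumption of this sketch (drop everything one tier):

```python
def apply_budget_pressure(tier: str, budget_used: float) -> str:
    """Downgrade the selected tier as budget consumption rises."""
    if budget_used < 0.50:
        return tier  # no adjustment
    if budget_used < 0.75:
        return "light" if tier == "standard" else tier
    if budget_used < 0.90:
        # Assumed interpretation of "more aggressive": one tier down.
        return {"heavy": "standard", "standard": "light"}.get(tier, "light")
    # > 90%: nearly everything -> light; only heavy stays at standard.
    return "standard" if tier == "heavy" else "light"
```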
### `cross_provider`
When enabled, the router may select models from providers other than your primary. This uses the built-in cost table to find the cheapest model at each tier. Requires the target provider to be configured.
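A sketch of the cross-provider comparison, assuming a blended input+output price and a hypothetical `provider_of` lookup — neither is the router's actual API:

```python
# Prices (input $/M, output $/M) taken from the cost table in this doc.
COSTS = {
    "claude-haiku-4-5": (0.80, 4.00),
    "gpt-4o-mini": (0.15, 0.60),
    "gemini-2.0-flash": (0.10, 0.40),
}

def cheapest(candidates, configured_providers, provider_of):
    """Return the lowest-cost candidate whose provider is configured.

    `provider_of` maps model id -> provider name (hypothetical helper);
    summing input and output prices is a simplification of this sketch.
    """
    usable = [m for m in candidates if provider_of[m] in configured_providers]
    return min(usable, key=lambda m: sum(COSTS[m]))
```

Note how an unconfigured provider's models are filtered out before the price comparison, matching the "requires the target provider to be configured" rule.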
### `capability_routing`
When `capability_routing` is enabled, the router goes beyond tier classification and scores models against task-specific capability requirements. Each known model has a 7-dimension profile:
| Dimension | Description |
|-----------|-------------|
| `longContext` | Performance with large context windows |
| `instruction` | Adherence to structured instructions and templates |
Each unit type maps to a weighted requirement vector. For example, `execute-task` weights `coding: 0.9, reasoning: 0.6, debugging: 0.5` while `research-slice` weights `research: 0.9, reasoning: 0.7, longContext: 0.5`.
For `execute-task` units, the classifier also inspects task metadata (tags, description) to refine requirements. Documentation tasks boost `instruction` and lower `coding`; test tasks boost `debugging`.
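The scoring can be sketched as a weighted dot product between a model's capability profile and the unit type's requirement vector, using the example weights from the text (the model profile in the test is a placeholder):

```python
# Example requirement vectors quoted in this doc.
REQUIREMENTS = {
    "execute-task": {"coding": 0.9, "reasoning": 0.6, "debugging": 0.5},
    "research-slice": {"research": 0.9, "reasoning": 0.7, "longContext": 0.5},
}

def capability_score(profile: dict, unit_type: str) -> float:
    """Weighted dot product of a model's profile and the unit's needs."""
    weights = REQUIREMENTS[unit_type]
    return sum(profile.get(dim, 0.0) * w for dim, w in weights.items())
```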
Enable capability routing:
```yaml
dynamic_routing:
  enabled: true
  capability_routing: true
```
When enabled, models within the target tier are ranked by capability score rather than selected arbitrarily. When disabled (the default), the existing tier-only selection applies.
The routing history (`.gsd/routing-history.json`) tracks success/failure per tier per unit type. If a tier's failure rate exceeds 20% for a given pattern, future classifications are bumped up. User feedback (`over`/`under`/`ok`) is weighted 2× vs automatic outcomes.
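A sketch of the bump check, with user feedback weighted 2× against automatic outcomes. The record field names (`outcome`, `feedback`) are assumptions here; the actual schema of `.gsd/routing-history.json` may differ:

```python
def should_bump(records) -> bool:
    """records: iterable of dicts like
    {"outcome": "success"|"failure", "feedback": "over"|"under"|"ok"|None}

    A user "under" verdict counts as a failure at double weight;
    records without feedback fall back to the automatic outcome.
    """
    fail = total = 0.0
    for r in records:
        fb = r.get("feedback")
        w = 2.0 if fb else 1.0  # user feedback counts double
        bad = (fb == "under") if fb else (r["outcome"] == "failure")
        total += w
        if bad:
            fail += w
    return total > 0 and fail / total > 0.20
```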
## Interaction with Token Profiles
Dynamic routing and token profiles are complementary:
- **Token profiles** (`budget`/`balanced`/`quality`) control phase skipping and context compression
- **Dynamic routing** controls per-unit model selection within the configured phase model
When both are active, token profiles set the baseline models and dynamic routing further optimizes within those baselines. The `budget` token profile + dynamic routing provides maximum cost savings.
## Cost Table
The router includes a built-in cost table for common models, used for cross-provider cost comparison. Costs are per-million tokens (input/output):
| Model | Input | Output |
|-------|-------|--------|
| claude-haiku-4-5 | $0.80 | $4.00 |
| claude-sonnet-4-6 | $3.00 | $15.00 |
| claude-opus-4-6 | $15.00 | $75.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| gpt-4o | $2.50 | $10.00 |
| gemini-2.0-flash | $0.10 | $0.40 |
The cost table is used for comparison only — actual billing comes from your provider.
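A quick worked example of how the table converts to an estimated per-unit cost (comparison only, as noted above):

```python
# Prices (input $/M tokens, output $/M tokens) from the table above.
COSTS_PER_M = {
    "claude-haiku-4-5": (0.80, 4.00),
    "claude-sonnet-4-6": (3.00, 15.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated dollar cost for one unit's token usage."""
    inp, out = COSTS_PER_M[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000
```

For example, a Sonnet unit that reads 100k tokens and emits 10k tokens estimates to $0.45 — (100,000 × $3.00 + 10,000 × $15.00) / 1,000,000.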