# Dynamic Model Routing

Dynamic model routing automatically selects cheaper models for simple work and reserves expensive models for complex tasks. This reduces cost by 20-50% without sacrificing quality where it matters.

## Enabling

```yaml
dynamic_routing:
  enabled: true
```

## How It Works

Each unit of work passes through two stages:

1. **Complexity classification**: classifies the work as light, standard, or heavy
2. **Capability scoring**: within the tier, ranks models by how well they match the task

**Key rule:** Your configured model is always the ceiling. Routing never upgrades beyond what you've set.

| Tier | Typical Work | Model Level |
|------|-------------|-------------|
| Light | Slice completion, UAT, hooks | Haiku-class |
| Standard | Research, planning, execution | Sonnet-class |
| Heavy | Replanning, roadmap reassessment | Opus-class |

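
The classification stage and the ceiling rule can be illustrated with a minimal Python sketch. It is not the router's actual implementation: the unit-type names, the `route_tier` helper, the default for unknown unit types, and the idea of expressing the ceiling as a tier are all assumptions made for illustration.

```python
from enum import IntEnum


class Tier(IntEnum):
    LIGHT = 0
    STANDARD = 1
    HEAVY = 2


# Hypothetical unit-type names, loosely based on the "Typical Work" column.
TIER_BY_UNIT_TYPE = {
    "complete-slice": Tier.LIGHT,
    "uat": Tier.LIGHT,
    "hook": Tier.LIGHT,
    "research": Tier.STANDARD,
    "plan": Tier.STANDARD,
    "execute": Tier.STANDARD,
    "replan": Tier.HEAVY,
    "reassess-roadmap": Tier.HEAVY,
}


def route_tier(unit_type: str, ceiling: Tier) -> Tier:
    """Classify a unit of work, then cap the result at the configured ceiling."""
    # Assumption: unknown unit types fall back to the standard tier.
    classified = TIER_BY_UNIT_TYPE.get(unit_type, Tier.STANDARD)
    # The ceiling only ever lowers the result; it never raises it.
    return min(classified, ceiling)
```

With a Sonnet-class ceiling, `route_tier("replan", Tier.STANDARD)` stays at `Tier.STANDARD`, while `route_tier("hook", Tier.STANDARD)` still drops to `Tier.LIGHT`.
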
## Configuration

```yaml
dynamic_routing:
  enabled: true
  tier_models:                # optional: explicit model per tier
    light: claude-haiku-4-5
    standard: claude-sonnet-4-6
    heavy: claude-opus-4-6
  escalate_on_failure: true   # bump tier on failure (default)
  budget_pressure: true       # auto-downgrade near budget ceiling (default)
  cross_provider: true        # consider models from other providers (default)
  capability_routing: true    # score models by task fit (default)
```

### Escalate on Failure

When a task fails at a given tier, the router escalates to the next tier on retry: Light → Standard → Heavy. This prevents cheap models from burning retries on work that needs more reasoning.

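
As a rough sketch of the escalation ladder, the following Python function moves one tier up per failed attempt and stops at the ceiling. The function name, the lowercase tier strings, and the one-step-per-failure policy are assumptions for illustration, not the router's actual code.

```python
TIERS = ["light", "standard", "heavy"]


def tier_for_attempt(start_tier: str, failures: int, ceiling: str = "heavy") -> str:
    """Return the tier to use after `failures` failed attempts at a task."""
    start = TIERS.index(start_tier)
    top = TIERS.index(ceiling)
    # Assumption: one step up per failure, and the start tier is at or below
    # the ceiling. Escalation never goes past the ceiling.
    return TIERS[min(start + failures, top)]
```

For example, a task that starts at `"light"` and fails twice would be retried at `"heavy"`, unless the ceiling is `"standard"`, in which case it stays there.
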
### Budget Pressure

As spending approaches the budget ceiling, the router progressively downgrades tiers:

| Budget Used | Effect |
|------------|--------|
| < 50% | No adjustment |
| 50-75% | Standard → Light |
| 75-90% | More aggressive downgrading |
| > 90% | Nearly everything → Light |

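
The thresholds in the table can be sketched as a small Python function. The table does not spell out what "more aggressive" means or which units are exempt above 90%, so the behavior in those two bands, the exact boundary handling, and the function name are assumptions, labeled as such in the comments.

```python
def apply_budget_pressure(tier: str, budget_used: float) -> str:
    """Downgrade `tier` based on the fraction of budget spent (0.0 to 1.0)."""
    if budget_used < 0.50:
        return tier  # no adjustment
    if budget_used < 0.75:
        # From the table: standard work drops to light; other tiers are kept.
        return "light" if tier == "standard" else tier
    if budget_used <= 0.90:
        # Assumption: "more aggressive" also steps heavy down to standard.
        return {"standard": "light", "heavy": "standard"}.get(tier, tier)
    # Assumption: above 90%, this sketch treats every tier as light,
    # although the table says only "nearly everything".
    return "light"
```

Under these assumptions, a heavy task at 80% budget use would run at the standard tier, and at 95% it would run at the light tier.
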
### Cross-Provider

When enabled, the router may select models from providers other than your primary provider, using the built-in cost table to find the cheapest model at each tier.

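
A minimal sketch of cheapest-in-tier selection across providers might look like the following. The `ModelInfo` type, the `cheapest_in_tier` helper, and every model name, provider name, and price in `COST_TABLE` are invented placeholders; the real built-in cost table is not shown in this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelInfo:
    name: str
    provider: str
    tier: str
    cost_per_mtok: float  # hypothetical price, USD per million tokens


# Placeholder entries; names and prices are invented for illustration.
COST_TABLE = [
    ModelInfo("model-a-small", "provider-a", "light", 1.00),
    ModelInfo("model-b-small", "provider-b", "light", 0.40),
    ModelInfo("model-a-mid", "provider-a", "standard", 3.00),
    ModelInfo("model-b-mid", "provider-b", "standard", 5.00),
]


def cheapest_in_tier(
    tier: str, primary_provider: str, cross_provider: bool = True
) -> Optional[ModelInfo]:
    """Pick the cheapest model in `tier`, optionally limited to one provider."""
    candidates = [
        m
        for m in COST_TABLE
        if m.tier == tier and (cross_provider or m.provider == primary_provider)
    ]
    if not candidates:
        return None  # no model available at this tier
    return min(candidates, key=lambda m: m.cost_per_mtok)
```

With `cross_provider=True`, the light tier resolves to the cheaper `provider-b` model even when `provider-a` is the primary; with `cross_provider=False`, selection stays within `provider-a`.
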
### Capability Routing

Models are scored across seven dimensions: coding, debugging, research, reasoning, speed, long-context handling, and instruction following. Different task types weight these dimensions differently: a research task prioritizes research and reasoning, while an execution task prioritizes coding and instruction following.

Set `capability_routing: false` to revert to simple cheapest-in-tier selection.

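
The weighted scoring described in this section can be sketched as a dot product between per-task weights and per-model capability scores. The weights, the model names, and the 0-to-1 capability scores below are all hypothetical; only the dimension names and the stated priorities for research and execution tasks come from the text.

```python
# Hypothetical weights per task type; each set sums to 1.0.
TASK_WEIGHTS = {
    "research": {"research": 0.4, "reasoning": 0.4, "long_context": 0.2},
    "execution": {"coding": 0.5, "instruction_following": 0.3, "debugging": 0.2},
}

# Hypothetical capability profiles on a 0-to-1 scale.
MODEL_PROFILES = {
    "model-x": {
        "coding": 0.9, "debugging": 0.8, "research": 0.5, "reasoning": 0.6,
        "speed": 0.7, "long_context": 0.5, "instruction_following": 0.9,
    },
    "model-y": {
        "coding": 0.6, "debugging": 0.5, "research": 0.9, "reasoning": 0.9,
        "speed": 0.4, "long_context": 0.8, "instruction_following": 0.7,
    },
}


def score(model: str, task_type: str) -> float:
    """Weighted sum of the model's capabilities for the given task type."""
    weights = TASK_WEIGHTS[task_type]
    profile = MODEL_PROFILES[model]
    return sum(weight * profile[dim] for dim, weight in weights.items())


def best_model(task_type: str, candidates: list) -> str:
    """Return the candidate with the highest score for the task type."""
    return max(candidates, key=lambda m: score(m, task_type))
```

With these made-up numbers, `model-y` ranks first for research tasks and `model-x` ranks first for execution tasks, which mirrors the prioritization described above.
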
## Interaction with Token Profiles

Dynamic routing and token profiles work together:

- **Token profiles** control phase skipping and context compression
- **Dynamic routing** controls per-unit model selection

Combining the `budget` profile with dynamic routing provides the greatest cost savings.

## Adaptive Learning

SF tracks routing outcomes in `.sf/routing-history.json`. If a tier's failure rate exceeds 20% for a given task type, future classifications for that task type are bumped up.

Use `/sf rate` to submit feedback:

```
/sf rate over   # too powerful, use cheaper next time
/sf rate ok     # just right
/sf rate under  # too weak, use stronger next time
```

Feedback is weighted 2x compared to automatic outcomes.

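
One way to picture the 20% threshold and the 2x feedback weight is the sketch below. The record layout (`failed` and `source` keys) is hypothetical and is not the actual schema of `.sf/routing-history.json`; treating an `under` rating as a failure is likewise an assumption.

```python
FAILURE_THRESHOLD = 0.20  # from the text: bump when the failure rate exceeds 20%
FEEDBACK_WEIGHT = 2.0     # from the text: feedback counts 2x


def weighted_failure_rate(records) -> float:
    """Compute the failure rate, counting feedback records twice.

    Each record is a dict with a boolean "failed" and a "source" of
    either "auto" or "feedback" (hypothetical layout).
    """
    total = 0.0
    failed = 0.0
    for record in records:
        weight = FEEDBACK_WEIGHT if record["source"] == "feedback" else 1.0
        total += weight
        if record["failed"]:
            failed += weight
    return failed / total if total else 0.0


def should_bump(records) -> bool:
    """True when the weighted failure rate is strictly above the threshold."""
    return weighted_failure_rate(records) > FAILURE_THRESHOLD
```

Under these assumptions, seven automatic successes plus one `under` rating give a weighted failure rate of 2/9, which is above 20%, so the next classification for that task type would be bumped up.
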