New docs: - dynamic-model-routing.md — complexity classification, tier models, escalation, budget pressure, cost table, adaptive learning - captures-triage.md — fire-and-forget capture, triage pipeline, classification types, dashboard integration, worktree awareness - visualizer.md — four-tab TUI overlay (progress, deps, metrics, timeline), controls, auto-refresh, auto_visualize preference Updated docs: - README.md — added links to three new docs - commands.md — added capture, triage, visualize, knowledge, queue reorder - configuration.md — added dynamic_routing and auto_visualize settings, updated full example with new config options - auto-mode.md — added capture, visualize sections, dashboard badge, dynamic model routing reference - architecture.md — updated dispatch pipeline (routing + captures steps), added key modules table for v2.19 - cost-management.md — added dynamic routing and visualizer tips
4.9 KiB
Dynamic Model Routing
Introduced in v2.19.0
Dynamic model routing automatically selects cheaper models for simple work and reserves expensive models for complex tasks. This reduces token consumption by 20-50% on capped plans without sacrificing quality where it matters.
How It Works
Each unit dispatched by auto-mode is classified into a complexity tier:
| Tier | Typical Work | Default Model Level |
|---|---|---|
| Light | Slice completion, UAT, hooks | Haiku-class |
| Standard | Research, planning, execution, milestone completion | Sonnet-class |
| Heavy | Replanning, roadmap reassessment, complex execution | Opus-class |
The router then selects a model for that tier. The key rule: downgrade-only semantics. The user's configured model is always the ceiling — routing never upgrades beyond what you've configured.
Enabling
Dynamic routing is off by default. Enable it in preferences:
---
version: 1
dynamic_routing:
enabled: true
---
Configuration
dynamic_routing:
enabled: true
tier_models: # explicit model per tier (optional)
light: claude-haiku-4-5
standard: claude-sonnet-4-6
heavy: claude-opus-4-6
escalate_on_failure: true # bump tier on task failure (default: true)
budget_pressure: true # auto-downgrade when approaching budget ceiling (default: true)
cross_provider: true # consider models from other providers (default: true)
hooks: true # apply routing to post-unit hooks (default: true)
tier_models
Override which model is used for each tier. When omitted, the router uses a built-in capability mapping that knows common model families:
- Light:
claude-haiku-4-5,gpt-4o-mini,gemini-2.0-flash - Standard:
claude-sonnet-4-6,gpt-4o,gemini-2.5-pro - Heavy:
claude-opus-4-6,gpt-4.5-preview,gemini-2.5-pro
escalate_on_failure
When a task fails at a given tier, the router escalates to the next tier on retry. Light → Standard → Heavy. This prevents cheap models from burning retries on work that needs more reasoning.
budget_pressure
When approaching the budget ceiling, the router progressively downgrades:
| Budget Used | Effect |
|---|---|
| < 50% | No adjustment |
| 50-75% | Standard → Light |
| 75-90% | More aggressive downgrading |
| > 90% | Nearly everything → Light; only Heavy stays at Standard |
cross_provider
When enabled, the router may select models from providers other than your primary. This uses the built-in cost table to find the cheapest model at each tier. Requires the target provider to be configured.
Complexity Classification
Units are classified using pure heuristics — no LLM calls, sub-millisecond:
Unit Type Defaults
| Unit Type | Default Tier |
|---|---|
complete-slice, run-uat |
Light |
research-*, plan-*, complete-milestone |
Standard |
execute-task |
Standard (upgraded by task analysis) |
replan-slice, reassess-roadmap |
Heavy |
hook/* |
Light |
Task Plan Analysis
For execute-task units, the classifier analyzes the task plan:
| Signal | Simple → Light | Complex → Heavy |
|---|---|---|
| Step count | ≤ 3 | ≥ 8 |
| File count | ≤ 3 | ≥ 8 |
| Description length | < 500 chars | > 2000 chars |
| Code blocks | — | ≥ 5 |
| Complexity keywords | None | Present |
Complexity keywords: research, investigate, refactor, migrate, integrate, complex, architect, redesign, security, performance, concurrent, parallel, distributed, backward compat
Adaptive Learning
The routing history (.gsd/routing-history.json) tracks success/failure per tier per unit type. If a tier's failure rate exceeds 20% for a given pattern, future classifications are bumped up. User feedback (over/under/ok) is weighted 2× vs automatic outcomes.
Interaction with Token Profiles
Dynamic routing and token profiles are complementary:
- Token profiles (
budget/balanced/quality) control phase skipping and context compression - Dynamic routing controls per-unit model selection within the configured phase model
When both are active, token profiles set the baseline models and dynamic routing further optimizes within those baselines. The budget token profile + dynamic routing provides maximum cost savings.
Cost Table
The router includes a built-in cost table for common models, used for cross-provider cost comparison. Costs are per-million tokens (input/output):
| Model | Input | Output |
|---|---|---|
| claude-haiku-4-5 | $0.80 | $4.00 |
| claude-sonnet-4-6 | $3.00 | $15.00 |
| claude-opus-4-6 | $15.00 | $75.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| gpt-4o | $2.50 | $10.00 |
| gemini-2.0-flash | $0.10 | $0.40 |
The cost table is used for comparison only — actual billing comes from your provider.