# Plan: Dynamic Model Routing for Token Optimization
**Issue:** #575 — Token Consumption Optimization through Dynamic Model Selection
**Status:** Draft
**Date:** 2025-03-15
## Problem Statement
Users on capped plans (e.g., Claude Pro) exhaust weekly token limits in 15-20 hours of GSD usage. Currently, GSD uses a single model per phase (research/planning/execution/completion), configured statically in preferences. Simple tasks consume the same tokens as complex ones.
| `src/resources/extensions/gsd/auto-prompts.ts` | Prompt builders per unit type |
| `packages/pi-coding-agent/src/core/model-registry.ts` | Model availability and metadata |
## Proposed Design
### Core Concept: Task Complexity Classification
Before each unit dispatch, classify the task into a complexity tier and route to an appropriate model. This sits between preference resolution and model dispatch — it can **downgrade** but never **upgrade** beyond the user's configured model.
### Complexity Tiers
| Tier | Complexity | Example Tasks | Default Model |
resolveModelForTier() ← NEW: maps tier → model from available set
│
▼
maybeDowngradeModel() ← NEW: only downgrades from user's configured model
│
▼
Model dispatch (existing auto.ts logic)
```
### Key Design Decisions
1.**Downgrade-only:** The classifier can select a cheaper model than configured, never a more expensive one. The user's preference is the ceiling.
2.**Opt-in with easy override:** New preference key `dynamic_model_routing: true|false` (default: `false`). Users who want token savings enable it explicitly.
3.**Escalation on failure:** If a lower-tier model fails (tool errors, incomplete output, exceeds retries), automatically escalate to the next tier and retry the unit.
4.**No LLM call for classification:** Uses heuristics only — adding an LLM call to save tokens would be counterproductive.
5.**Respects existing fallback chains:** Dynamic routing integrates with existing `fallbacks` — if the dynamically selected model fails, it tries the fallback chain before escalating tiers.
6.**Transparent to user:** Dashboard shows which model was selected and why (tier badge in progress widget).
## Implementation Phases
### Phase 1: Foundation — Complexity Classifier & Routing (Core)
**Goal:** Build the classification and routing system, wire it into dispatch.
| Complex preferences confuse users | Disabled by default; works with zero config if enabled (uses smart defaults) |
| Model not available in user's provider | Validation at preference load; falls back to configured model |
| Escalation loops | Max 1 escalation per unit; after that, use configured model |
## Estimated Token Savings
Based on typical GSD session patterns:
- ~30% of units are completion/summary (Tier 1 candidates)
- ~40% are research/standard planning (Tier 2 candidates)
- ~30% are complex execution (Tier 3, no downgrade)
If Haiku is ~10x cheaper than Opus and Sonnet is ~5x cheaper:
- **Conservative estimate:** 20-30% cost reduction with dynamic routing enabled
- **Aggressive estimate:** 40-50% for projects with many small tasks
## Resolved Design Decisions
All four open questions resolved as **yes** — folded into the plan as additional scope:
### 1. Post-unit hook classification — YES
Hooks get their own complexity classification. Most hooks are lightweight (validation, file checks) and should default to Tier 1. The existing `model` field on `PostUnitHookConfig` becomes the ceiling, same as phase models for units.
**Implementation:** Add to Phase 1d — extend `classifyUnitComplexity()` to accept hook metadata. Wire into hook dispatch at `auto.ts` lines 936-946.
### 2. Budget-pressure-aware routing — YES
As budget usage increases, the classifier becomes more aggressive about downgrading:
- **<50%budgetused:**Normalclassification
- **50-75% budget used:** Bump Tier 2 candidates down to Tier 1 where possible
- **75-90% budget used:** Only Tier 3 tasks get the configured model; everything else goes to cheapest available
**Implementation:** Add to Phase 1b — `classifyUnitComplexity()` takes `budgetPct` parameter from existing `getBudgetAlertLevel()` logic. New function `applyBudgetPressure(tier, budgetPct)` adjusts the tier.
### 3. Multi-provider cost routing — YES
When multiple providers are configured, the router should consider cost differences. If a user has both Anthropic and OpenRouter, pick the cheapest option for the resolved tier.
**Implementation:**
- Add `cost_per_1k_tokens` metadata to model registry (or maintain a lookup table for known models)
- New file: `src/resources/extensions/gsd/model-cost-table.ts` — static cost table for known models, updatable via preferences
-`resolveModelForComplexity()` ranks available models by cost within a tier's capability range
- Preference key: `dynamic_routing.cross_provider: true|false` (default: true when enabled)
**Risk:** Cost data goes stale. Mitigate with a bundled cost table that gets updated with GSD releases + user override capability.
### 4. User feedback loop — YES
After each unit completes, users can flag the output quality to improve future classification.
**Implementation (Phase 3 — Adaptive Learning):**
- Post-unit prompt option: user can react with `/gsd:rate-unit [over|under|ok]`
-`over` = "this could have used a simpler model" → records downgrade signal
-`under` = "this needed a better model" → records upgrade signal
-`ok` = confirms current tier was appropriate
- Feedback stored alongside outcome data in `.gsd/routing-history.json`
- Classifier weights feedback signals 2x vs. automatic success/failure detection
- Skill: `gsd:rate-unit` — simple command that tags the last completed unit
### Updated Preference Configuration
```yaml
---
version: 1
models:
research: claude-sonnet-4-6
planning: claude-opus-4-6
execution: claude-sonnet-4-6
completion: claude-sonnet-4-6
dynamic_routing:
enabled: true
tier_models:
light: claude-haiku-4-5
standard: claude-sonnet-4-6
# heavy: inherits from phase config (ceiling)
escalate_on_failure: true
budget_pressure: true # more aggressive downgrading as budget fills
cross_provider: true # consider cost across providers