singularity-forge/docs/dev/building-coding-agents/21-cost-quality-tradeoff-model-routing.md

# Cost-Quality Tradeoff & Model Routing

### The Key Insight

Quality requirements vary enormously across task types, but most systems use the same model for everything.

### The Optimal Model Routing Strategy (All 4 Agree)

| Task Type | Model Tier | Rationale |
|-----------|-----------|-----------|
| **Planning, architecture, critique** | Frontier (always) | Planning errors cascade through every downstream task |
| **Ambiguity resolution** | Frontier | Wrong interpretation = wasted execution |
| **Well-specified implementation** (CRUD, standard UI, utilities) | Mid-tier / capable but cheaper | Task is well-defined, patterns established |
| **Code review, test generation** | Mid-tier | Evaluating against known criteria, not generating novel solutions |
| **Summarization** (task records, manifest updates) | Lightest viable | Language competence, minimal reasoning depth |
| **Boilerplate** | Small/fast model | Predictable output, low reasoning requirements |

### The Non-Obvious Cost Optimization

> **Reducing wasted tokens is higher leverage than reducing token price.** A bloated context window costs money on every single call. Trimming 500 unnecessary tokens from context assembly saves more over a project than switching to a model that's 10% cheaper.

### Measurement

Track **cost-per-successful-task**, not cost-per-task. If the cheaper model requires twice as many iterations, it's not actually cheaper. Grok reports 60-70% cost reduction with zero quality loss when routing is done at the orchestrator level.

---
fix: restore PR files lost during merge conflict resolution Files added by PR #2008 that were not in main were dropped during the merge. Restore all src/, docs/, and scripts/ files from the pre-merge PR head. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> 2026-03-25 22:38:55 -06:00			`# Cost-Quality Tradeoff & Model Routing`

			`### The Key Insight`

			`Quality requirements vary enormously across task types, but most systems use the same model for everything.`

			`### The Optimal Model Routing Strategy (All 4 Agree)`

			`\| Task Type \| Model Tier \| Rationale \|`
			`\|-----------\|-----------\|-----------\|`
			`\| Planning, architecture, critique \| Frontier (always) \| Planning errors cascade through every downstream task \|`
			`\| Ambiguity resolution \| Frontier \| Wrong interpretation = wasted execution \|`
			`\| Well-specified implementation (CRUD, standard UI, utilities) \| Mid-tier / capable but cheaper \| Task is well-defined, patterns established \|`
			`\| Code review, test generation \| Mid-tier \| Evaluating against known criteria, not generating novel solutions \|`
			`\| Summarization (task records, manifest updates) \| Lightest viable \| Language competence, minimal reasoning depth \|`
			`\| Boilerplate \| Small/fast model \| Predictable output, low reasoning requirements \|`

			`### The Non-Obvious Cost Optimization`

			`> Reducing wasted tokens is higher leverage than reducing token price. A bloated context window costs money on every single call. Trimming 500 unnecessary tokens from context assembly saves more over a project than switching to a model that's 10% cheaper.`

			`### Measurement`

			`Track cost-per-successful-task, not cost-per-task. If the cheaper model requires twice as many iterations, it's not actually cheaper. Grok reports 60-70% cost reduction with zero quality loss when routing is done at the orchestrator level.`

			`---`