
Cost-Quality Tradeoff & Model Routing

The Key Insight

Quality requirements vary enormously across task types, but most systems use the same model for everything.

The Optimal Model Routing Strategy (All 4 Agree)

| Task Type | Model Tier | Rationale |
|---|---|---|
| Planning, architecture, critique | Frontier (always) | Planning errors cascade through every downstream task |
| Ambiguity resolution | Frontier | Wrong interpretation = wasted execution |
| Well-specified implementation (CRUD, standard UI, utilities) | Mid-tier / capable but cheaper | Task is well-defined, patterns established |
| Code review, test generation | Mid-tier | Evaluating against known criteria, not generating novel solutions |
| Summarization (task records, manifest updates) | Lightest viable | Language competence, minimal reasoning depth |
| Boilerplate | Small/fast model | Predictable output, low reasoning requirements |
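
One way an orchestrator might encode this routing is a plain lookup from task type to tier. The sketch below is illustrative only: the task-type labels, the `Tier` enum, and the `route` function are hypothetical names, and falling back to the frontier tier for unrecognized task types is an assumption of this sketch (an unclassified task is treated like an ambiguous one), not something the table specifies.

```python
from enum import Enum


class Tier(Enum):
    """Model tiers named after the routing table (labels are hypothetical)."""
    FRONTIER = "frontier"
    MID = "mid"
    LIGHT = "light"
    SMALL_FAST = "small-fast"


# Hypothetical task-type keys; the tier for each follows the routing table.
ROUTING_TABLE = {
    "planning": Tier.FRONTIER,
    "architecture": Tier.FRONTIER,
    "critique": Tier.FRONTIER,
    "ambiguity_resolution": Tier.FRONTIER,
    "implementation": Tier.MID,
    "code_review": Tier.MID,
    "test_generation": Tier.MID,
    "summarization": Tier.LIGHT,
    "boilerplate": Tier.SMALL_FAST,
}


def route(task_type: str) -> Tier:
    """Return the model tier for a task type.

    Assumption of this sketch: unknown task types go to the frontier tier,
    on the reasoning that a misrouted hard task costs more than an
    over-provisioned easy one.
    """
    return ROUTING_TABLE.get(task_type, Tier.FRONTIER)
```

Keeping the mapping in data rather than in branching logic means the tier assignments can be tuned per project without touching the orchestrator's control flow.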

The Non-Obvious Cost Optimization

Reducing wasted tokens is higher leverage than reducing token price. A bloated context window costs money on every single call. Trimming 500 unnecessary tokens from context assembly saves more over a project than switching to a model that's 10% cheaper, as long as those 500 tokens are more than roughly 10% of what a typical call sends.
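
A rough way to check this for a given project is to compare the two savings directly. The sketch below counts input tokens only and assumes a flat price per million tokens; the function names, the call count, and the price are all hypothetical illustration values, not measurements.

```python
def saving_from_trimming(calls: int, trimmed_tokens: int,
                         price_per_million: float) -> float:
    """Money saved by sending `trimmed_tokens` fewer input tokens per call."""
    return calls * trimmed_tokens * price_per_million / 1_000_000


def saving_from_cheaper_model(calls: int, tokens_per_call: int,
                              price_per_million: float,
                              discount_percent: float) -> float:
    """Money saved by a model whose input price is `discount_percent` lower."""
    full_cost = calls * tokens_per_call * price_per_million / 1_000_000
    return full_cost * discount_percent / 100


def break_even_tokens_per_call(trimmed_tokens: int,
                               discount_percent: float) -> float:
    """Context size at which both savings are equal.

    Below this size trimming saves more; above it the cheaper model does.
    """
    return trimmed_tokens * 100 / discount_percent


# Hypothetical project: 10,000 calls at $3 per million input tokens.
CALLS = 10_000
PRICE = 3.0

trim = saving_from_trimming(CALLS, 500, PRICE)
small_context = saving_from_cheaper_model(CALLS, 4_000, PRICE, 10)
large_context = saving_from_cheaper_model(CALLS, 20_000, PRICE, 10)
```

Under these assumptions, trimming 500 tokens beats a 10% price cut while the average call stays under 5,000 input tokens. Past that point the price cut saves more, but the two are not exclusive: trimmed context is cheaper on every tier.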

Measurement

Track cost-per-successful-task, not cost-per-task. If the cheaper model requires twice as many iterations, it has to cost less than half as much per iteration to actually be cheaper. Grok reports 60-70% cost reduction with zero quality loss when routing is done at the orchestrator level.
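
The metric can be sketched as total spend, including failed attempts, divided by the number of tasks that eventually succeeded. The `Attempt` record and its field names below are hypothetical, and the per-attempt costs are made-up illustration values.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """One model invocation against a task (hypothetical record shape)."""
    task_id: str
    cost: float
    succeeded: bool


def cost_per_successful_task(attempts: list[Attempt]) -> float:
    """Total cost of all attempts divided by the number of tasks that succeeded.

    Failed attempts still count toward cost. Returns infinity when no task
    succeeded, since the spend bought nothing.
    """
    total_cost = sum(a.cost for a in attempts)
    successful_tasks = {a.task_id for a in attempts if a.succeeded}
    if not successful_tasks:
        return float("inf")
    return total_cost / len(successful_tasks)


# Hypothetical comparison: the cheap model needs three tries per task,
# the expensive model succeeds on the first try.
cheap_model = [
    Attempt("t1", 0.10, False),
    Attempt("t1", 0.10, False),
    Attempt("t1", 0.10, True),
]
expensive_model = [Attempt("t1", 0.25, True)]

cheap_result = cost_per_successful_task(cheap_model)
expensive_result = cost_per_successful_task(expensive_model)
```

In this made-up case the model that is cheaper per attempt ends up more expensive per finished task, which is the effect the metric is meant to expose.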