singularity-forge/docs/dynamic-model-routing.md
Tom Boucher 7d5bf63b2d feat: GSD context optimization with model routing and context masking
* docs: add context optimization design spec, implementation plan, and pi-layer research

- Spec: 6-change design for GSD extension context optimization
- Plan: 9-task TDD implementation plan with exact file paths and code
- Pi-layer doc: 10 infrastructure opportunities (research only, not planned)

Part of #3171, #3406, #3452, #3433.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(context): add observation masking for auto-mode sessions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(context): add phase handoff anchors for auto-mode

Introduces PhaseAnchor read/write utilities so downstream agents can
inherit decisions, blockers, and intent written at phase boundaries
without re-inferring from conversation history.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(context): add capability-aware model routing and context management preferences

Implement ADR-004 Phase 2 capability scoring with 7-dimension model
profiles, task requirement vectors, and weighted scoring. Add
ContextManagementConfig preferences for observation masking thresholds.
Wire capability scoring into auto-model-selection dispatch path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(context): wire observation masking, phase anchors, and tool truncation

Register observation masker in before_provider_request hook to replace
old tool results with placeholders during auto-mode. Add tool result
truncation (configurable via context_management.tool_result_max_chars).
Inject phase handoff anchors into prompt builders so downstream phases
inherit decisions from research/planning. Write anchors after successful
phase completion. Update ADR-004 status to Implemented.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: remove internal planning artifacts from PR

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add capability routing, observation masking, and context management

Update dynamic-model-routing.md with capability-aware scoring section.
Update token-optimization.md with observation masking, tool truncation,
and phase handoff anchor documentation. Update configuration.md with
context_management preference block and capability_routing flag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Merge branch 'main' into feat/gsd-context-optimization

* fix: add context_management to known keys and prevent tool truncation state corruption

- Add missing 'context_management' to KNOWN_PREFERENCE_KEYS set so users
  don't get spurious unknown-key warnings when configuring it.
- Replace in-place mutation of tool result content with immutable spread
  to prevent corrupting shared conversation message objects.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add stop and backtrack to triage-ui classification labels

The Classification type gained stop and backtrack variants from main
but triage-ui.ts was not updated, causing a TypeScript build failure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: context masker and tool truncation operate on correct pi-ai message format

The observation masker and tool result truncation in before_provider_request
were checking m.type === "toolResult" but the actual pi-ai payload uses
m.role === "toolResult" with content as TextContent[] arrays (not strings).
bashExecution messages are converted to {role:"user"} by convertToLlm before
the hook fires, so checking m.type === "bashExecution" was a no-op.

- Fix context-masker to match on role, handle array content, detect bash
  results by their "Ran `" prefix
- Fix register-hooks truncation to operate on role:"toolResult" with
  array content blocks
- Update tests to use correct pi-ai LLM payload format

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-04 01:02:35 -04:00

157 lines
6.3 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Dynamic Model Routing
*Introduced in v2.19.0*
Dynamic model routing automatically selects cheaper models for simple work and reserves expensive models for complex tasks. This reduces token consumption by 20-50% on capped plans without sacrificing quality where it matters.
## How It Works
Each unit dispatched by auto-mode is classified into a complexity tier:
| Tier | Typical Work | Default Model Level |
|------|-------------|-------------------|
| **Light** | Slice completion, UAT, hooks | Haiku-class |
| **Standard** | Research, planning, execution, milestone completion | Sonnet-class |
| **Heavy** | Replanning, roadmap reassessment, complex execution | Opus-class |
The router then selects a model for that tier. The key rule: **downgrade-only semantics**. The user's configured model is always the ceiling — routing never upgrades beyond what you've configured.
## Enabling
Dynamic routing is off by default. Enable it in preferences:
```yaml
---
version: 1
dynamic_routing:
enabled: true
---
```
## Configuration
```yaml
dynamic_routing:
enabled: true
tier_models: # explicit model per tier (optional)
light: claude-haiku-4-5
standard: claude-sonnet-4-6
heavy: claude-opus-4-6
escalate_on_failure: true # bump tier on task failure (default: true)
budget_pressure: true # auto-downgrade when approaching budget ceiling (default: true)
cross_provider: true # consider models from other providers (default: true)
hooks: true # apply routing to post-unit hooks (default: true)
```
### `tier_models`
Override which model is used for each tier. When omitted, the router uses a built-in capability mapping that knows common model families:
- **Light:** `claude-haiku-4-5`, `gpt-4o-mini`, `gemini-2.0-flash`
- **Standard:** `claude-sonnet-4-6`, `gpt-4o`, `gemini-2.5-pro`
- **Heavy:** `claude-opus-4-6`, `gpt-4.5-preview`, `gemini-2.5-pro`
### `escalate_on_failure`
When a task fails at a given tier, the router escalates to the next tier on retry. Light → Standard → Heavy. This prevents cheap models from burning retries on work that needs more reasoning.
### `budget_pressure`
When approaching the budget ceiling, the router progressively downgrades:
| Budget Used | Effect |
|------------|--------|
| < 50% | No adjustment |
| 50-75% | Standard Light |
| 75-90% | More aggressive downgrading |
| > 90% | Nearly everything → Light; only Heavy stays at Standard |
### `cross_provider`
When enabled, the router may select models from providers other than your primary. This uses the built-in cost table to find the cheapest model at each tier. Requires the target provider to be configured.
## Capability-Aware Scoring
*Introduced in v2.59.0 (ADR-004 Phase 2)*
When `capability_routing` is enabled, the router goes beyond tier classification and scores models against task-specific capability requirements. Each known model has a 7-dimension profile:
| Dimension | What It Measures |
|-----------|-----------------|
| `coding` | Code generation, refactoring, implementation quality |
| `debugging` | Error diagnosis, fix accuracy |
| `research` | Information gathering, codebase exploration |
| `reasoning` | Multi-step logic, architectural decisions |
| `speed` | Response latency (inverse of cost) |
| `longContext` | Performance with large context windows |
| `instruction` | Adherence to structured instructions and templates |
Each unit type maps to a weighted requirement vector. For example, `execute-task` weights `coding: 0.9, reasoning: 0.6, debugging: 0.5` while `research-slice` weights `research: 0.9, reasoning: 0.7, longContext: 0.5`.
For `execute-task` units, the classifier also inspects task metadata (tags, description) to refine requirements. Documentation tasks boost `instruction` and lower `coding`; test tasks boost `debugging`.
Enable capability routing:
```yaml
dynamic_routing:
enabled: true
capability_routing: true
```
When enabled, models within the target tier are ranked by capability score rather than selected arbitrarily. When disabled (the default), the existing tier-only selection applies.
## Complexity Classification
Units are classified using pure heuristics — no LLM calls, sub-millisecond:
### Unit Type Defaults
| Unit Type | Default Tier |
|-----------|-------------|
| `complete-slice`, `run-uat` | Light |
| `research-*`, `plan-*`, `complete-milestone` | Standard |
| `execute-task` | Standard (upgraded by task analysis) |
| `replan-slice`, `reassess-roadmap` | Heavy |
| `hook/*` | Light |
### Task Plan Analysis
For `execute-task` units, the classifier analyzes the task plan:
| Signal | Simple → Light | Complex → Heavy |
|--------|---------------|----------------|
| Step count | ≤ 3 | ≥ 8 |
| File count | ≤ 3 | ≥ 8 |
| Description length | < 500 chars | > 2000 chars |
| Code blocks | — | ≥ 5 |
| Complexity keywords | None | Present |
**Complexity keywords:** `research`, `investigate`, `refactor`, `migrate`, `integrate`, `complex`, `architect`, `redesign`, `security`, `performance`, `concurrent`, `parallel`, `distributed`, `backward compat`
### Adaptive Learning
The routing history (`.gsd/routing-history.json`) tracks success/failure per tier per unit type. If a tier's failure rate exceeds 20% for a given pattern, future classifications are bumped up. User feedback (`over`/`under`/`ok`) is weighted 2× vs automatic outcomes.
## Interaction with Token Profiles
Dynamic routing and token profiles are complementary:
- **Token profiles** (`budget`/`balanced`/`quality`) control phase skipping and context compression
- **Dynamic routing** controls per-unit model selection within the configured phase model
When both are active, token profiles set the baseline models and dynamic routing further optimizes within those baselines. The `budget` token profile + dynamic routing provides maximum cost savings.
## Cost Table
The router includes a built-in cost table for common models, used for cross-provider cost comparison. Costs are per-million tokens (input/output):
| Model | Input | Output |
|-------|-------|--------|
| claude-haiku-4-5 | $0.80 | $4.00 |
| claude-sonnet-4-6 | $3.00 | $15.00 |
| claude-opus-4-6 | $15.00 | $75.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| gpt-4o | $2.50 | $10.00 |
| gemini-2.0-flash | $0.10 | $0.40 |
The cost table is used for comparison only — actual billing comes from your provider.