Dynamic Model Routing
Introduced in v2.19.0
Dynamic model routing automatically selects cheaper models for simple work and reserves expensive models for complex tasks. This reduces token consumption by 20-50% on capped plans without sacrificing quality where it matters.
How It Works
Each unit dispatched by auto-mode is classified into a complexity tier:
| Tier | Typical Work | Default Model Level |
|---|---|---|
| Light | Slice completion, UAT, hooks | Haiku-class |
| Standard | Research, planning, execution, milestone completion | Sonnet-class |
| Heavy | Replanning, roadmap reassessment, complex execution | Opus-class |
The router then selects a model for that tier. The key rule: downgrade-only semantics. The user's configured model is always the ceiling — routing never upgrades beyond what you've configured.
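The downgrade-only rule can be sketched in a few lines. This is an illustrative model, not the router's actual code: the model names come from the tier table above, but `LEVEL`, `TIER_DEFAULT`, and `selectModel` are hypothetical names.

```typescript
type Tier = "light" | "standard" | "heavy";

// Model "expense level", cheapest to priciest (illustrative).
const LEVEL: Record<string, number> = {
  "claude-haiku-4-5": 0,
  "claude-sonnet-4-6": 1,
  "claude-opus-4-6": 2,
};

// Default model per complexity tier (from the table above).
const TIER_DEFAULT: Record<Tier, string> = {
  light: "claude-haiku-4-5",
  standard: "claude-sonnet-4-6",
  heavy: "claude-opus-4-6",
};

// Pick the tier's model, but never exceed the user's configured ceiling.
function selectModel(tier: Tier, configuredModel: string): string {
  const candidate = TIER_DEFAULT[tier];
  return LEVEL[candidate] > LEVEL[configuredModel] ? configuredModel : candidate;
}
```

With Sonnet configured, a Heavy unit still runs on Sonnet; with Opus configured, a Light unit drops to Haiku.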
Enabling
Dynamic routing is off by default. Enable it in preferences:
```yaml
---
version: 1
dynamic_routing:
  enabled: true
---
```
Configuration
```yaml
dynamic_routing:
  enabled: true
  tier_models: # explicit model per tier (optional)
    light: claude-haiku-4-5
    standard: claude-sonnet-4-6
    heavy: claude-opus-4-6
  escalate_on_failure: true # bump tier on task failure (default: true)
  budget_pressure: true # auto-downgrade when approaching budget ceiling (default: true)
  cross_provider: true # consider models from other providers (default: true)
  hooks: true # apply routing to post-unit hooks (default: true)
```
tier_models
Override which model is used for each tier. When omitted, the router uses a built-in capability mapping that knows common model families:
- Light: `claude-haiku-4-5`, `gpt-4o-mini`, `gemini-2.0-flash`
- Standard: `claude-sonnet-4-6`, `gpt-4o`, `gemini-2.5-pro`
- Heavy: `claude-opus-4-6`, `gpt-4.5-preview`, `gemini-2.5-pro`
escalate_on_failure
When a task fails at a given tier, the router escalates to the next tier on retry. Light → Standard → Heavy. This prevents cheap models from burning retries on work that needs more reasoning.
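The escalation ladder is a simple step up an ordered list. A minimal sketch (the function name is hypothetical); note that Heavy has nowhere left to go, so it stays at Heavy:

```typescript
type Tier = "light" | "standard" | "heavy";

const ORDER: Tier[] = ["light", "standard", "heavy"];

// On task failure, retry one tier up; Heavy is already the top.
function escalate(tier: Tier): Tier {
  const i = ORDER.indexOf(tier);
  return ORDER[Math.min(i + 1, ORDER.length - 1)];
}
```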
budget_pressure
When approaching the budget ceiling, the router progressively downgrades:
| Budget Used | Effect |
|---|---|
| < 50% | No adjustment |
| 50-75% | Standard → Light |
| 75-90% | More aggressive downgrading |
| > 90% | Nearly everything → Light; only Heavy stays at Standard |
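The table above can be read as a pure function from budget fraction and tier to an adjusted tier. The sketch below is one plausible reading: the 75-90% band is described only as "more aggressive downgrading", so collapsing it with the >90% behavior is an assumption, as is the function name.

```typescript
type Tier = "light" | "standard" | "heavy";

// Apply budget-pressure downgrades; `used` is the fraction of budget consumed.
function applyBudgetPressure(tier: Tier, used: number): Tier {
  if (used < 0.5) return tier; // no adjustment
  if (used < 0.75) return tier === "standard" ? "light" : tier; // Standard → Light
  // 75%+ (illustrative): nearly everything → Light; Heavy bottoms out at Standard
  return tier === "heavy" ? "standard" : "light";
}
```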
cross_provider
When enabled, the router may select models from providers other than your primary. This uses the built-in cost table to find the cheapest model at each tier. Requires the target provider to be configured.
Capability-Aware Scoring
Introduced in v2.59.0 (ADR-004 Phase 2)
When capability_routing is enabled, the router goes beyond tier classification and scores models against task-specific capability requirements. Each known model has a 7-dimension profile:
| Dimension | What It Measures |
|---|---|
| `coding` | Code generation, refactoring, implementation quality |
| `debugging` | Error diagnosis, fix accuracy |
| `research` | Information gathering, codebase exploration |
| `reasoning` | Multi-step logic, architectural decisions |
| `speed` | Response latency (inverse of cost) |
| `longContext` | Performance with large context windows |
| `instruction` | Adherence to structured instructions and templates |
Each unit type maps to a weighted requirement vector. For example, `execute-task` weights `coding: 0.9`, `reasoning: 0.6`, `debugging: 0.5`, while `research-slice` weights `research: 0.9`, `reasoning: 0.7`, `longContext: 0.5`.
For execute-task units, the classifier also inspects task metadata (tags, description) to refine requirements. Documentation tasks boost instruction and lower coding; test tasks boost debugging.
Enable capability routing:
```yaml
dynamic_routing:
  enabled: true
  capability_routing: true
```
When enabled, models within the target tier are ranked by capability score rather than selected arbitrarily. When disabled (the default), the existing tier-only selection applies.
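Weighted scoring of this kind amounts to a dot product between the task's requirement vector and each model's capability profile. A sketch under stated assumptions: the profile numbers below are invented for illustration, and `capabilityScore`/`rank` are hypothetical names, but the mechanism (requirement weight × model capability, summed, highest first) matches the description above.

```typescript
type Profile = Record<string, number>; // dimension → 0..1

// Invented capability profiles, for illustration only.
const PROFILES: Record<string, Profile> = {
  "claude-sonnet-4-6": { coding: 0.85, reasoning: 0.8, debugging: 0.8 },
  "claude-haiku-4-5": { coding: 0.6, reasoning: 0.5, debugging: 0.55 },
};

// Score = sum over dimensions of requirementWeight * modelCapability.
function capabilityScore(profile: Profile, requirements: Profile): number {
  return Object.entries(requirements).reduce(
    (sum, [dim, weight]) => sum + weight * (profile[dim] ?? 0),
    0,
  );
}

// Rank candidate models within a tier by capability score, highest first.
function rank(models: string[], requirements: Profile): string[] {
  return [...models].sort(
    (a, b) =>
      capabilityScore(PROFILES[b], requirements) -
      capabilityScore(PROFILES[a], requirements),
  );
}
```

For the `execute-task` vector (`coding: 0.9`, `reasoning: 0.6`, `debugging: 0.5`), the higher-capability model wins the ranking.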
Complexity Classification
Units are classified using pure heuristics — no LLM calls, sub-millisecond:
Unit Type Defaults
| Unit Type | Default Tier |
|---|---|
| `complete-slice`, `run-uat` | Light |
| `research-*`, `plan-*`, `complete-milestone` | Standard |
| `execute-task` | Standard (upgraded by task analysis) |
| `replan-slice`, `reassess-roadmap` | Heavy |
| `hook/*` | Light |
Task Plan Analysis
For execute-task units, the classifier analyzes the task plan:
| Signal | Simple → Light | Complex → Heavy |
|---|---|---|
| Step count | ≤ 3 | ≥ 8 |
| File count | ≤ 3 | ≥ 8 |
| Description length | < 500 chars | > 2000 chars |
| Code blocks | — | ≥ 5 |
| Complexity keywords | None | Present |
Complexity keywords: research, investigate, refactor, migrate, integrate, complex, architect, redesign, security, performance, concurrent, parallel, distributed, backward compat
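The signal table translates into a threshold check over the task plan. This sketch assumes any single complex signal promotes to Heavy and all simple signals together demote to Light, with Standard in between; the real classifier may combine signals differently, and the type and function names are hypothetical.

```typescript
type Tier = "light" | "standard" | "heavy";

const COMPLEXITY_KEYWORDS = [
  "research", "investigate", "refactor", "migrate", "integrate",
  "complex", "architect", "redesign", "security", "performance",
  "concurrent", "parallel", "distributed", "backward compat",
];

interface TaskPlan {
  steps: number;
  files: number;
  description: string;
  codeBlocks: number;
}

function classifyTask(plan: TaskPlan): Tier {
  const desc = plan.description.toLowerCase();
  const hasKeyword = COMPLEXITY_KEYWORDS.some((k) => desc.includes(k));
  // Any complex signal → Heavy (assumption: signals are OR-ed).
  if (
    plan.steps >= 8 || plan.files >= 8 ||
    plan.description.length > 2000 || plan.codeBlocks >= 5 || hasKeyword
  ) return "heavy";
  // All simple signals → Light.
  if (plan.steps <= 3 && plan.files <= 3 && plan.description.length < 500)
    return "light";
  return "standard";
}
```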
Adaptive Learning
The routing history (.gsd/routing-history.json) tracks success/failure per tier per unit type. If a tier's failure rate exceeds 20% for a given pattern, future classifications are bumped up. User feedback (over/under/ok) is weighted 2× vs automatic outcomes.
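A sketch of the bump decision, assuming a straightforward weighted failure rate. How `under` feedback interacts with automatic outcomes is an assumption here (treated as a 2×-weighted failure signal); the function and type names are hypothetical.

```typescript
interface Outcome {
  success: boolean;
  feedback?: "over" | "under" | "ok"; // optional user feedback
}

// Bump future classifications up when the weighted failure rate exceeds 20%.
// User feedback is weighted 2x vs automatic outcomes; "under" means the
// tier was under-powered, so it counts as a failure (assumption).
function shouldBumpTier(outcomes: Outcome[]): boolean {
  let fail = 0;
  let total = 0;
  for (const o of outcomes) {
    const weight = o.feedback ? 2 : 1;
    total += weight;
    if (!o.success || o.feedback === "under") fail += weight;
  }
  return total > 0 && fail / total > 0.2;
}
```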
Interaction with Token Profiles
Dynamic routing and token profiles are complementary:
- Token profiles (`budget`/`balanced`/`quality`) control phase skipping and context compression
- Dynamic routing controls per-unit model selection within the configured phase model
When both are active, token profiles set the baseline models and dynamic routing further optimizes within those baselines. The budget token profile + dynamic routing provides maximum cost savings.
Cost Table
The router includes a built-in cost table for common models, used for cross-provider cost comparison. Costs are per-million tokens (input/output):
| Model | Input | Output |
|---|---|---|
| claude-haiku-4-5 | $0.80 | $4.00 |
| claude-sonnet-4-6 | $3.00 | $15.00 |
| claude-opus-4-6 | $15.00 | $75.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| gpt-4o | $2.50 | $10.00 |
| gemini-2.0-flash | $0.10 | $0.40 |
The cost table is used for comparison only — actual billing comes from your provider.
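Cross-provider comparison with this table reduces to picking the model with the lowest blended cost at a tier. A sketch using the figures above; the 3:1 input:output blend ratio and the function names are assumptions for illustration, not the router's actual formula.

```typescript
// Per-million-token costs from the table above (USD).
const COSTS: Record<string, { input: number; output: number }> = {
  "claude-haiku-4-5": { input: 0.8, output: 4.0 },
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
  "gemini-2.0-flash": { input: 0.1, output: 0.4 },
};

// Blend input/output into one comparable figure, assuming a 3:1
// input:output token ratio (assumption for illustration).
function blendedCost(model: string): number {
  const c = COSTS[model];
  return (3 * c.input + c.output) / 4;
}

// Cheapest model among a tier's candidates.
function cheapest(models: string[]): string {
  return models.reduce((a, b) => (blendedCost(b) < blendedCost(a) ? b : a));
}
```

At the Light tier, this blend ranks `gemini-2.0-flash` cheapest, then `gpt-4o-mini`, then `claude-haiku-4-5`.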