singularity-forge/docs/user-docs/dynamic-model-routing.md

# Dynamic Model Routing

*Introduced in v2.19.0. Capability scoring introduced in v2.52.0.*

Dynamic model routing automatically selects cheaper models for simple work and reserves expensive models for complex tasks. This reduces token consumption by 20-50% on capped plans without sacrificing quality where it matters.

Starting in v2.52.0, the router uses **capability-aware scoring** to select the *best fit* model for each task, not just the cheapest one in the tier.

## How It Works

Each unit dispatched by auto-mode passes through a two-stage pipeline:

**Stage 1: Complexity classification** — classifies the work into a tier (light/standard/heavy).

**Stage 2: Capability scoring** — within the eligible tier, ranks available models by how well their capabilities match the task's requirements.

The key rule: **downgrade-only semantics**. The user's configured model is always the ceiling — routing never upgrades beyond what you've configured.

| Tier | Typical Work | Default Model Level |
|------|-------------|-------------------|
| **Light** | Slice completion, UAT, hooks | Haiku-class |
| **Standard** | Research, planning, execution, milestone completion | Sonnet-class |
| **Heavy** | Replanning, roadmap reassessment, complex execution | Opus-class |

## Enabling

Dynamic routing is off by default. Enable it in preferences:

```yaml
---
version: 1
dynamic_routing:
  enabled: true
---
```

## Configuration

```yaml
dynamic_routing:
  enabled: true
  tier_models:                    # explicit model per tier (optional)
    light: claude-haiku-4-5
    standard: claude-sonnet-4-6
    heavy: claude-opus-4-6
  escalate_on_failure: true       # bump tier on task failure (default: true)
  budget_pressure: true           # auto-downgrade when approaching budget ceiling (default: true)
  cross_provider: true            # consider models from other providers (default: true)
  hooks: true                     # apply routing to post-unit hooks (default: true)
  capability_routing: true        # enable capability scoring within tier (default: true)
```

### `tier_models`

Override which model is used for each tier. When omitted, the router uses a built-in capability mapping that knows common model families:

- **Light:** `claude-haiku-4-5`, `gpt-4o-mini`, `gemini-2.0-flash`
- **Standard:** `claude-sonnet-4-6`, `gpt-4o`, `gemini-2.5-pro`
- **Heavy:** `claude-opus-4-6`, `gpt-4.5-preview`, `gemini-2.5-pro`

### `escalate_on_failure`

When a task fails at a given tier, the router escalates to the next tier on retry. Light → Standard → Heavy. This prevents cheap models from burning retries on work that needs more reasoning.

### `budget_pressure`

When approaching the budget ceiling, the router progressively downgrades:

| Budget Used | Effect |
|------------|--------|
| < 50% | No adjustment |
| 50-75% | Standard → Light |
| 75-90% | More aggressive downgrading |
| > 90% | Nearly everything → Light; only Heavy stays at Standard |

### `cross_provider`

When enabled, the router may select models from providers other than your primary. This uses the built-in cost table to find the cheapest model at each tier. Requires the target provider to be configured.

### `capability_routing`

When enabled (default: true), the router uses capability scoring to pick the best model in a tier rather than always defaulting to the cheapest. Set to `false` to revert to cheapest-in-tier behavior:

```yaml
dynamic_routing:
  enabled: true
  capability_routing: false   # disable scoring, use cheapest-in-tier
```

## Capability Profiles

Each model has a built-in **capability profile** — a 7-dimension score (0–100) representing how well it handles different task types:

| Dimension | What It Represents |
|-----------|-------------------|
| `coding` | Code generation and implementation accuracy |
| `debugging` | Diagnosing and fixing errors |
| `research` | Synthesizing information and exploring topics |
| `reasoning` | Multi-step logical reasoning |
| `speed` | Latency and throughput (inverse of capability depth) |
| `longContext` | Handling large codebases and long documents |
| `instruction` | Following structured instructions precisely |

**Built-in profiles** exist for 9 models: `claude-opus-4-6`, `claude-sonnet-4-6`, `claude-haiku-4-5`, `gpt-4o`, `gpt-4o-mini`, `gemini-2.5-pro`, `gemini-2.0-flash`, `deepseek-chat`, `o3`.

Models without a built-in profile receive **uniform scores of 50** across all dimensions. This is a cold-start policy — unknown models compete but don't have an advantage. From the user's perspective, routing behaves the same as before capability scoring was introduced for those models.

**Profiles are heuristic rankings, not benchmarks.** They represent approximate relative strengths, not verified benchmark results. Use user overrides (below) to correct them for models you know well.

## How Scoring Works

The routing pipeline within a tier:

```
classify complexity tier
    ↓
filter eligible models for tier
    ↓
fire before_model_select hook (optional override)
    ↓
capability score eligible models
    ↓
select winner (or first eligible if scoring is disabled)
```

**Scoring formula:** weighted average of capability dimensions

```
score = Σ(weight × capability) / Σ(weights)
```

**Task requirements** are dynamic — different task types weight dimensions differently:

| Unit Type | Key Dimensions |
|-----------|---------------|
| `execute-task` | coding (0.9), instruction (0.7), speed (0.3) |
| `research-*` | research (0.9), longContext (0.7), reasoning (0.5) |
| `plan-*` | reasoning (0.9), coding (0.5) |
| `replan-slice` | reasoning (0.9), debugging (0.6), coding (0.5) |
| `complete-slice`, `run-uat` | instruction (0.8), speed (0.7) |

For `execute-task`, requirements are further refined by task metadata signals:
- Tags like `docs`, `config`, `readme` → boost instruction weight
- Keywords like `concurrency`, `compatibility` → boost debugging and reasoning
- Keywords like `migration`, `architecture` → boost reasoning and coding
- Large file counts (≥6) or large estimated line counts (≥500) → boost coding and reasoning

**Tie-breaking:** When two models score within 2 points of each other, the cheaper model wins. If costs are equal, lexicographic model ID breaks the tie (deterministic).

## User Overrides

Correct built-in capability profiles for models you know well using `modelOverrides` in your models configuration:

```json
{
  "providers": {
    "anthropic": {
      "modelOverrides": {
        "claude-sonnet-4-6": {
          "capabilities": {
            "debugging": 90,
            "research": 85
          }
        }
      }
    }
  }
}
```

Overrides are **deep-merged** with built-in defaults — only the specified dimensions are overridden; others retain their built-in values.

**Use case:** You've found that a model consistently outperforms its built-in profile on specific task types. Override the relevant dimensions to steer the router toward that model for those tasks.

## Verbose Output

When verbose mode is active, the router logs its routing decision. When capability scoring was used, the log includes a full scoring breakdown:

```
Dynamic routing [S]: claude-sonnet-4-6 (capability-scored) — claude-sonnet-4-6: 82.3, gpt-4o: 78.1, deepseek-chat: 72.0
```

When tier-only routing was used (scoring disabled, single eligible model, or routing guards applied):

```
Dynamic routing [S]: claude-sonnet-4-6 (standard complexity, multiple steps)
```

The `selectionMethod` field in the routing decision indicates which path was taken:
- `"capability-scored"` — capability scoring selected the winner
- `"tier-only"` — cheapest in tier (or explicit pin) was used

## Extension Hook

Extensions can intercept and override model selection using the `before_model_select` hook.

The hook fires **after** tier filtering (eligible models are known) and **before** capability scoring (scores have not been computed yet). A hook can override selection entirely or return `undefined` to let scoring proceed normally.

**Registering a handler:**

```typescript
pi.on("before_model_select", async (event) => {
  const { unitType, unitId, classification, taskMetadata, eligibleModels, phaseConfig } = event;

  // Custom routing strategy: always use gemini for research tasks
  if (unitType.startsWith("research-")) {
    const gemini = eligibleModels.find(id => id.includes("gemini"));
    if (gemini) return { modelId: gemini };
  }

  // Return undefined to let capability scoring proceed
  return undefined;
});
```

**Event payload:**

| Field | Type | Description |
|-------|------|-------------|
| `unitType` | `string` | The unit type being dispatched (e.g., `"execute-task"`) |
| `unitId` | `string` | Unique identifier for this unit dispatch |
| `classification` | `{ tier, reason, downgraded }` | The complexity classification result |
| `taskMetadata` | `Record<string, unknown> \| undefined` | Task metadata extracted from the unit plan |
| `eligibleModels` | `string[]` | Models eligible for the classified tier |
| `phaseConfig` | `{ primary, fallbacks } \| undefined` | The user's configured model for this phase |

**Return value:** `{ modelId: string }` to override selection, or `undefined` to defer to capability scoring.

**First-override-wins:** If multiple extensions register handlers, the first one to return a non-undefined result wins. Subsequent handlers are not called.

## Complexity Classification

Units are classified using pure heuristics — no LLM calls, sub-millisecond:

### Unit Type Defaults

| Unit Type | Default Tier |
|-----------|-------------|
| `complete-slice`, `run-uat` | Light |
| `research-*`, `plan-*`, `complete-milestone` | Standard |
| `execute-task` | Standard (upgraded by task analysis) |
| `replan-slice`, `reassess-roadmap` | Heavy |
| `hook/*` | Light |

### Task Plan Analysis

For `execute-task` units, the classifier analyzes the task plan:

| Signal | Simple → Light | Complex → Heavy |
|--------|---------------|----------------|
| Step count | ≤ 3 | ≥ 8 |
| File count | ≤ 3 | ≥ 8 |
| Description length | < 500 chars | > 2000 chars |
| Code blocks | — | ≥ 5 |
| Complexity keywords | None | Present |

**Complexity keywords:** `research`, `investigate`, `refactor`, `migrate`, `integrate`, `complex`, `architect`, `redesign`, `security`, `performance`, `concurrent`, `parallel`, `distributed`, `backward compat`

### Adaptive Learning

The routing history (`.sf/routing-history.json`) tracks success/failure per tier per unit type. If a tier's failure rate exceeds 20% for a given pattern, future classifications are bumped up. User feedback (`over`/`under`/`ok`) is weighted 2× vs automatic outcomes.

## Interaction with Token Profiles

Dynamic routing and token profiles are complementary:

- **Token profiles** (`budget`/`balanced`/`quality`) control phase skipping and context compression
- **Dynamic routing** controls per-unit model selection within the configured phase model

When both are active, token profiles set the baseline models and dynamic routing further optimizes within those baselines. The `budget` token profile + dynamic routing provides maximum cost savings.

## Cost Table

The router includes a built-in cost table for common models, used for cross-provider cost comparison. Costs are per-million tokens (input/output):

| Model | Input | Output |
|-------|-------|--------|
| claude-haiku-4-5 | $0.80 | $4.00 |
| claude-sonnet-4-6 | $3.00 | $15.00 |
| claude-opus-4-6 | $15.00 | $75.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| gpt-4o | $2.50 | $10.00 |
| gemini-2.0-flash | $0.10 | $0.40 |

The cost table is used for comparison only — actual billing comes from your provider.
-												fix: restore PR files lost during merge conflict resolution

Files added by PR #2008 that were not in main were dropped during
the merge. Restore all src/, docs/, and scripts/ files from the
pre-merge PR head.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

											
										
										
											2026-03-25 22:38:55 -06:00
+								# Dynamic Model Routing
-												docs(01-05): update dynamic-model-routing.md with capability-aware routing features

- Add Capability Profiles section: 7 dimensions, 9 built-in profiles, uniform-50 cold-start
- Add How Scoring Works section: pipeline order, weighted average formula, task requirements table
- Add User Overrides section: modelOverrides JSON example, deep-merge semantics
- Update Configuration section: document capability_routing flag
- Add Verbose Output section: scoring breakdown format, selectionMethod field
- Add Extension Hook section: before_model_select payload, return value, first-override-wins

											
										
										
											2026-03-26 17:30:19 -05:00
+								*Introduced in v2.19.0. Capability scoring introduced in v2.52.0.*
-												fix: restore PR files lost during merge conflict resolution

Files added by PR #2008 that were not in main were dropped during
the merge. Restore all src/, docs/, and scripts/ files from the
pre-merge PR head.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

											
										
										
											2026-03-25 22:38:55 -06:00
 								Dynamic model routing automatically selects cheaper models for simple work and reserves expensive models for complex tasks. This reduces token consumption by 20-50% on capped plans without sacrificing quality where it matters.
-												docs(01-05): update dynamic-model-routing.md with capability-aware routing features

- Add Capability Profiles section: 7 dimensions, 9 built-in profiles, uniform-50 cold-start
- Add How Scoring Works section: pipeline order, weighted average formula, task requirements table
- Add User Overrides section: modelOverrides JSON example, deep-merge semantics
- Update Configuration section: document capability_routing flag
- Add Verbose Output section: scoring breakdown format, selectionMethod field
- Add Extension Hook section: before_model_select payload, return value, first-override-wins

											
										
										
											2026-03-26 17:30:19 -05:00
+								Starting in v2.52.0, the router uses **capability-aware scoring** to select the *best fit* model for each task, not just the cheapest one in the tier.
-												fix: restore PR files lost during merge conflict resolution

Files added by PR #2008 that were not in main were dropped during
the merge. Restore all src/, docs/, and scripts/ files from the
pre-merge PR head.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

											
										
										
											2026-03-25 22:38:55 -06:00
+								## How It Works
-												docs(01-05): update dynamic-model-routing.md with capability-aware routing features

- Add Capability Profiles section: 7 dimensions, 9 built-in profiles, uniform-50 cold-start
- Add How Scoring Works section: pipeline order, weighted average formula, task requirements table
- Add User Overrides section: modelOverrides JSON example, deep-merge semantics
- Update Configuration section: document capability_routing flag
- Add Verbose Output section: scoring breakdown format, selectionMethod field
- Add Extension Hook section: before_model_select payload, return value, first-override-wins

											
										
										
											2026-03-26 17:30:19 -05:00
+								Each unit dispatched by auto-mode passes through a two-stage pipeline:
 								**Stage 1: Complexity classification** — classifies the work into a tier (light/standard/heavy).
 								**Stage 2: Capability scoring** — within the eligible tier, ranks available models by how well their capabilities match the task's requirements.
 								The key rule: **downgrade-only semantics**. The user's configured model is always the ceiling — routing never upgrades beyond what you've configured.
-												fix: restore PR files lost during merge conflict resolution

Files added by PR #2008 that were not in main were dropped during
the merge. Restore all src/, docs/, and scripts/ files from the
pre-merge PR head.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

											
										
										
											2026-03-25 22:38:55 -06:00
 								| Tier | Typical Work | Default Model Level |
 								|------|-------------|-------------------|
 								| **Light** | Slice completion, UAT, hooks | Haiku-class |
 								| **Standard** | Research, planning, execution, milestone completion | Sonnet-class |
 								| **Heavy** | Replanning, roadmap reassessment, complex execution | Opus-class |
 								## Enabling
 								Dynamic routing is off by default. Enable it in preferences:
 								```yaml
 								---
 								version: 1
 								dynamic_routing:
 								  enabled: true
 								---
 								```
 								## Configuration
 								```yaml
 								dynamic_routing:
 								  enabled: true
 								  tier_models:                    # explicit model per tier (optional)
 								    light: claude-haiku-4-5
 								    standard: claude-sonnet-4-6
 								    heavy: claude-opus-4-6
 								  escalate_on_failure: true       # bump tier on task failure (default: true)
 								  budget_pressure: true           # auto-downgrade when approaching budget ceiling (default: true)
 								  cross_provider: true            # consider models from other providers (default: true)
 								  hooks: true                     # apply routing to post-unit hooks (default: true)
-												docs(01-05): update dynamic-model-routing.md with capability-aware routing features

- Add Capability Profiles section: 7 dimensions, 9 built-in profiles, uniform-50 cold-start
- Add How Scoring Works section: pipeline order, weighted average formula, task requirements table
- Add User Overrides section: modelOverrides JSON example, deep-merge semantics
- Update Configuration section: document capability_routing flag
- Add Verbose Output section: scoring breakdown format, selectionMethod field
- Add Extension Hook section: before_model_select payload, return value, first-override-wins

											
										
										
											2026-03-26 17:30:19 -05:00
+								  capability_routing: true        # enable capability scoring within tier (default: true)
-												fix: restore PR files lost during merge conflict resolution

Files added by PR #2008 that were not in main were dropped during
the merge. Restore all src/, docs/, and scripts/ files from the
pre-merge PR head.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

											
										
										
											2026-03-25 22:38:55 -06:00
+								```
 								### `tier_models`
 								Override which model is used for each tier. When omitted, the router uses a built-in capability mapping that knows common model families:
 								- **Light:** `claude-haiku-4-5`, `gpt-4o-mini`, `gemini-2.0-flash`
 								- **Standard:** `claude-sonnet-4-6`, `gpt-4o`, `gemini-2.5-pro`
 								- **Heavy:** `claude-opus-4-6`, `gpt-4.5-preview`, `gemini-2.5-pro`
 								### `escalate_on_failure`
 								When a task fails at a given tier, the router escalates to the next tier on retry. Light → Standard → Heavy. This prevents cheap models from burning retries on work that needs more reasoning.
 								### `budget_pressure`
 								When approaching the budget ceiling, the router progressively downgrades:
 								| Budget Used | Effect |
 								|------------|--------|
 								| < 50% | No adjustment |
 								| 50-75% | Standard → Light |
 								| 75-90% | More aggressive downgrading |
 								| > 90% | Nearly everything → Light; only Heavy stays at Standard |
 								### `cross_provider`
 								When enabled, the router may select models from providers other than your primary. This uses the built-in cost table to find the cheapest model at each tier. Requires the target provider to be configured.
-												docs(01-05): update dynamic-model-routing.md with capability-aware routing features

- Add Capability Profiles section: 7 dimensions, 9 built-in profiles, uniform-50 cold-start
- Add How Scoring Works section: pipeline order, weighted average formula, task requirements table
- Add User Overrides section: modelOverrides JSON example, deep-merge semantics
- Update Configuration section: document capability_routing flag
- Add Verbose Output section: scoring breakdown format, selectionMethod field
- Add Extension Hook section: before_model_select payload, return value, first-override-wins

											
										
										
											2026-03-26 17:30:19 -05:00
+								### `capability_routing`
-												feat: GSD context optimization with model routing and context masking

* docs: add context optimization design spec, implementation plan, and pi-layer research

- Spec: 6-change design for GSD extension context optimization
- Plan: 9-task TDD implementation plan with exact file paths and code
- Pi-layer doc: 10 infrastructure opportunities (research only, not planned)

Part of #3171, #3406, #3452, #3433.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(context): add observation masking for auto-mode sessions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(context): add phase handoff anchors for auto-mode

Introduces PhaseAnchor read/write utilities so downstream agents can
inherit decisions, blockers, and intent written at phase boundaries
without re-inferring from conversation history.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(context): add capability-aware model routing and context management preferences

Implement ADR-004 Phase 2 capability scoring with 7-dimension model
profiles, task requirement vectors, and weighted scoring. Add
ContextManagementConfig preferences for observation masking thresholds.
Wire capability scoring into auto-model-selection dispatch path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(context): wire observation masking, phase anchors, and tool truncation

Register observation masker in before_provider_request hook to replace
old tool results with placeholders during auto-mode. Add tool result
truncation (configurable via context_management.tool_result_max_chars).
Inject phase handoff anchors into prompt builders so downstream phases
inherit decisions from research/planning. Write anchors after successful
phase completion. Update ADR-004 status to Implemented.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: remove internal planning artifacts from PR

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add capability routing, observation masking, and context management

Update dynamic-model-routing.md with capability-aware scoring section.
Update token-optimization.md with observation masking, tool truncation,
and phase handoff anchor documentation. Update configuration.md with
context_management preference block and capability_routing flag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Merge branch 'main' into feat/gsd-context-optimization

* fix: add context_management to known keys and prevent tool truncation state corruption

- Add missing 'context_management' to KNOWN_PREFERENCE_KEYS set so users
  don't get spurious unknown-key warnings when configuring it.
- Replace in-place mutation of tool result content with immutable spread
  to prevent corrupting shared conversation message objects.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add stop and backtrack to triage-ui classification labels

The Classification type gained stop and backtrack variants from main
but triage-ui.ts was not updated, causing a TypeScript build failure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: context masker and tool truncation operate on correct pi-ai message format

The observation masker and tool result truncation in before_provider_request
were checking m.type === "toolResult" but the actual pi-ai payload uses
m.role === "toolResult" with content as TextContent[] arrays (not strings).
bashExecution messages are converted to {role:"user"} by convertToLlm before
the hook fires, so checking m.type === "bashExecution" was a no-op.

- Fix context-masker to match on role, handle array content, detect bash
  results by their "Ran `" prefix
- Fix register-hooks truncation to operate on role:"toolResult" with
  array content blocks
- Update tests to use correct pi-ai LLM payload format

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
											
										
										
											2026-04-04 01:02:35 -04:00
-												docs(01-05): update dynamic-model-routing.md with capability-aware routing features

- Add Capability Profiles section: 7 dimensions, 9 built-in profiles, uniform-50 cold-start
- Add How Scoring Works section: pipeline order, weighted average formula, task requirements table
- Add User Overrides section: modelOverrides JSON example, deep-merge semantics
- Update Configuration section: document capability_routing flag
- Add Verbose Output section: scoring breakdown format, selectionMethod field
- Add Extension Hook section: before_model_select payload, return value, first-override-wins

											
										
										
											2026-03-26 17:30:19 -05:00
+								When enabled (default: true), the router uses capability scoring to pick the best model in a tier rather than always defaulting to the cheapest. Set to `false` to revert to cheapest-in-tier behavior:
-												feat: GSD context optimization with model routing and context masking

* docs: add context optimization design spec, implementation plan, and pi-layer research

- Spec: 6-change design for GSD extension context optimization
- Plan: 9-task TDD implementation plan with exact file paths and code
- Pi-layer doc: 10 infrastructure opportunities (research only, not planned)

Part of #3171, #3406, #3452, #3433.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(context): add observation masking for auto-mode sessions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(context): add phase handoff anchors for auto-mode

Introduces PhaseAnchor read/write utilities so downstream agents can
inherit decisions, blockers, and intent written at phase boundaries
without re-inferring from conversation history.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(context): add capability-aware model routing and context management preferences

Implement ADR-004 Phase 2 capability scoring with 7-dimension model
profiles, task requirement vectors, and weighted scoring. Add
ContextManagementConfig preferences for observation masking thresholds.
Wire capability scoring into auto-model-selection dispatch path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(context): wire observation masking, phase anchors, and tool truncation

Register observation masker in before_provider_request hook to replace
old tool results with placeholders during auto-mode. Add tool result
truncation (configurable via context_management.tool_result_max_chars).
Inject phase handoff anchors into prompt builders so downstream phases
inherit decisions from research/planning. Write anchors after successful
phase completion. Update ADR-004 status to Implemented.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: remove internal planning artifacts from PR

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add capability routing, observation masking, and context management

Update dynamic-model-routing.md with capability-aware scoring section.
Update token-optimization.md with observation masking, tool truncation,
and phase handoff anchor documentation. Update configuration.md with
context_management preference block and capability_routing flag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Merge branch 'main' into feat/gsd-context-optimization

* fix: add context_management to known keys and prevent tool truncation state corruption

- Add missing 'context_management' to KNOWN_PREFERENCE_KEYS set so users
  don't get spurious unknown-key warnings when configuring it.
- Replace in-place mutation of tool result content with immutable spread
  to prevent corrupting shared conversation message objects.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add stop and backtrack to triage-ui classification labels

The Classification type gained stop and backtrack variants from main
but triage-ui.ts was not updated, causing a TypeScript build failure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: context masker and tool truncation operate on correct pi-ai message format

The observation masker and tool result truncation in before_provider_request
were checking m.type === "toolResult" but the actual pi-ai payload uses
m.role === "toolResult" with content as TextContent[] arrays (not strings).
bashExecution messages are converted to {role:"user"} by convertToLlm before
the hook fires, so checking m.type === "bashExecution" was a no-op.

- Fix context-masker to match on role, handle array content, detect bash
  results by their "Ran `" prefix
- Fix register-hooks truncation to operate on role:"toolResult" with
  array content blocks
- Update tests to use correct pi-ai LLM payload format

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
											
										
										
											2026-04-04 01:02:35 -04:00
-												docs(01-05): update dynamic-model-routing.md with capability-aware routing features

- Add Capability Profiles section: 7 dimensions, 9 built-in profiles, uniform-50 cold-start
- Add How Scoring Works section: pipeline order, weighted average formula, task requirements table
- Add User Overrides section: modelOverrides JSON example, deep-merge semantics
- Update Configuration section: document capability_routing flag
- Add Verbose Output section: scoring breakdown format, selectionMethod field
- Add Extension Hook section: before_model_select payload, return value, first-override-wins

											
										
										
											2026-03-26 17:30:19 -05:00
+								```yaml
 								dynamic_routing:
 								  enabled: true
 								  capability_routing: false   # disable scoring, use cheapest-in-tier
 								```
-												feat: GSD context optimization with model routing and context masking

* docs: add context optimization design spec, implementation plan, and pi-layer research

- Spec: 6-change design for GSD extension context optimization
- Plan: 9-task TDD implementation plan with exact file paths and code
- Pi-layer doc: 10 infrastructure opportunities (research only, not planned)

Part of #3171, #3406, #3452, #3433.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(context): add observation masking for auto-mode sessions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(context): add phase handoff anchors for auto-mode

Introduces PhaseAnchor read/write utilities so downstream agents can
inherit decisions, blockers, and intent written at phase boundaries
without re-inferring from conversation history.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(context): add capability-aware model routing and context management preferences

Implement ADR-004 Phase 2 capability scoring with 7-dimension model
profiles, task requirement vectors, and weighted scoring. Add
ContextManagementConfig preferences for observation masking thresholds.
Wire capability scoring into auto-model-selection dispatch path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(context): wire observation masking, phase anchors, and tool truncation

Register observation masker in before_provider_request hook to replace
old tool results with placeholders during auto-mode. Add tool result
truncation (configurable via context_management.tool_result_max_chars).
Inject phase handoff anchors into prompt builders so downstream phases
inherit decisions from research/planning. Write anchors after successful
phase completion. Update ADR-004 status to Implemented.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: remove internal planning artifacts from PR

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add capability routing, observation masking, and context management

Update dynamic-model-routing.md with capability-aware scoring section.
Update token-optimization.md with observation masking, tool truncation,
and phase handoff anchor documentation. Update configuration.md with
context_management preference block and capability_routing flag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Merge branch 'main' into feat/gsd-context-optimization

* fix: add context_management to known keys and prevent tool truncation state corruption

- Add missing 'context_management' to KNOWN_PREFERENCE_KEYS set so users
  don't get spurious unknown-key warnings when configuring it.
- Replace in-place mutation of tool result content with immutable spread
  to prevent corrupting shared conversation message objects.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add stop and backtrack to triage-ui classification labels

The Classification type gained stop and backtrack variants from main
but triage-ui.ts was not updated, causing a TypeScript build failure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: context masker and tool truncation operate on correct pi-ai message format

The observation masker and tool result truncation in before_provider_request
were checking m.type === "toolResult" but the actual pi-ai payload uses
m.role === "toolResult" with content as TextContent[] arrays (not strings).
bashExecution messages are converted to {role:"user"} by convertToLlm before
the hook fires, so checking m.type === "bashExecution" was a no-op.

- Fix context-masker to match on role, handle array content, detect bash
  results by their "Ran `" prefix
- Fix register-hooks truncation to operate on role:"toolResult" with
  array content blocks
- Update tests to use correct pi-ai LLM payload format

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
											
										
										
											2026-04-04 01:02:35 -04:00
-												docs(01-05): update dynamic-model-routing.md with capability-aware routing features

- Add Capability Profiles section: 7 dimensions, 9 built-in profiles, uniform-50 cold-start
- Add How Scoring Works section: pipeline order, weighted average formula, task requirements table
- Add User Overrides section: modelOverrides JSON example, deep-merge semantics
- Update Configuration section: document capability_routing flag
- Add Verbose Output section: scoring breakdown format, selectionMethod field
- Add Extension Hook section: before_model_select payload, return value, first-override-wins

											
										
										
											2026-03-26 17:30:19 -05:00
+								## Capability Profiles
-												feat: GSD context optimization with model routing and context masking

* docs: add context optimization design spec, implementation plan, and pi-layer research

- Spec: 6-change design for GSD extension context optimization
- Plan: 9-task TDD implementation plan with exact file paths and code
- Pi-layer doc: 10 infrastructure opportunities (research only, not planned)

Part of #3171, #3406, #3452, #3433.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(context): add observation masking for auto-mode sessions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(context): add phase handoff anchors for auto-mode

Introduces PhaseAnchor read/write utilities so downstream agents can
inherit decisions, blockers, and intent written at phase boundaries
without re-inferring from conversation history.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(context): add capability-aware model routing and context management preferences

Implement ADR-004 Phase 2 capability scoring with 7-dimension model
profiles, task requirement vectors, and weighted scoring. Add
ContextManagementConfig preferences for observation masking thresholds.
Wire capability scoring into auto-model-selection dispatch path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(context): wire observation masking, phase anchors, and tool truncation

Register observation masker in before_provider_request hook to replace
old tool results with placeholders during auto-mode. Add tool result
truncation (configurable via context_management.tool_result_max_chars).
Inject phase handoff anchors into prompt builders so downstream phases
inherit decisions from research/planning. Write anchors after successful
phase completion. Update ADR-004 status to Implemented.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: remove internal planning artifacts from PR

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add capability routing, observation masking, and context management

Update dynamic-model-routing.md with capability-aware scoring section.
Update token-optimization.md with observation masking, tool truncation,
and phase handoff anchor documentation. Update configuration.md with
context_management preference block and capability_routing flag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Merge branch 'main' into feat/gsd-context-optimization

* fix: add context_management to known keys and prevent tool truncation state corruption

- Add missing 'context_management' to KNOWN_PREFERENCE_KEYS set so users
  don't get spurious unknown-key warnings when configuring it.
- Replace in-place mutation of tool result content with immutable spread
  to prevent corrupting shared conversation message objects.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add stop and backtrack to triage-ui classification labels

The Classification type gained stop and backtrack variants from main
but triage-ui.ts was not updated, causing a TypeScript build failure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: context masker and tool truncation operate on correct pi-ai message format

The observation masker and tool result truncation in before_provider_request
were checking m.type === "toolResult" but the actual pi-ai payload uses
m.role === "toolResult" with content as TextContent[] arrays (not strings).
bashExecution messages are converted to {role:"user"} by convertToLlm before
the hook fires, so checking m.type === "bashExecution" was a no-op.

- Fix context-masker to match on role, handle array content, detect bash
  results by their "Ran `" prefix
- Fix register-hooks truncation to operate on role:"toolResult" with
  array content blocks
- Update tests to use correct pi-ai LLM payload format

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
											
										
										
											2026-04-04 01:02:35 -04:00
-												docs(01-05): update dynamic-model-routing.md with capability-aware routing features

- Add Capability Profiles section: 7 dimensions, 9 built-in profiles, uniform-50 cold-start
- Add How Scoring Works section: pipeline order, weighted average formula, task requirements table
- Add User Overrides section: modelOverrides JSON example, deep-merge semantics
- Update Configuration section: document capability_routing flag
- Add Verbose Output section: scoring breakdown format, selectionMethod field
- Add Extension Hook section: before_model_select payload, return value, first-override-wins

											
										
										
											2026-03-26 17:30:19 -05:00
+								Each model has a built-in **capability profile** — a 7-dimension score (0–100) representing how well it handles different task types:
-												feat: GSD context optimization with model routing and context masking

* docs: add context optimization design spec, implementation plan, and pi-layer research

- Spec: 6-change design for GSD extension context optimization
- Plan: 9-task TDD implementation plan with exact file paths and code
- Pi-layer doc: 10 infrastructure opportunities (research only, not planned)

Part of #3171, #3406, #3452, #3433.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(context): add observation masking for auto-mode sessions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(context): add phase handoff anchors for auto-mode

Introduces PhaseAnchor read/write utilities so downstream agents can
inherit decisions, blockers, and intent written at phase boundaries
without re-inferring from conversation history.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(context): add capability-aware model routing and context management preferences

Implement ADR-004 Phase 2 capability scoring with 7-dimension model
profiles, task requirement vectors, and weighted scoring. Add
ContextManagementConfig preferences for observation masking thresholds.
Wire capability scoring into auto-model-selection dispatch path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(context): wire observation masking, phase anchors, and tool truncation

Register observation masker in before_provider_request hook to replace
old tool results with placeholders during auto-mode. Add tool result
truncation (configurable via context_management.tool_result_max_chars).
Inject phase handoff anchors into prompt builders so downstream phases
inherit decisions from research/planning. Write anchors after successful
phase completion. Update ADR-004 status to Implemented.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: remove internal planning artifacts from PR

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add capability routing, observation masking, and context management

Update dynamic-model-routing.md with capability-aware scoring section.
Update token-optimization.md with observation masking, tool truncation,
and phase handoff anchor documentation. Update configuration.md with
context_management preference block and capability_routing flag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Merge branch 'main' into feat/gsd-context-optimization

* fix: add context_management to known keys and prevent tool truncation state corruption

- Add missing 'context_management' to KNOWN_PREFERENCE_KEYS set so users
  don't get spurious unknown-key warnings when configuring it.
- Replace in-place mutation of tool result content with immutable spread
  to prevent corrupting shared conversation message objects.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add stop and backtrack to triage-ui classification labels

The Classification type gained stop and backtrack variants from main
but triage-ui.ts was not updated, causing a TypeScript build failure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: context masker and tool truncation operate on correct pi-ai message format

The observation masker and tool result truncation in before_provider_request
were checking m.type === "toolResult" but the actual pi-ai payload uses
m.role === "toolResult" with content as TextContent[] arrays (not strings).
bashExecution messages are converted to {role:"user"} by convertToLlm before
the hook fires, so checking m.type === "bashExecution" was a no-op.

- Fix context-masker to match on role, handle array content, detect bash
  results by their "Ran `" prefix
- Fix register-hooks truncation to operate on role:"toolResult" with
  array content blocks
- Update tests to use correct pi-ai LLM payload format

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
											
										
										
											2026-04-04 01:02:35 -04:00
-												docs(01-05): update dynamic-model-routing.md with capability-aware routing features

- Add Capability Profiles section: 7 dimensions, 9 built-in profiles, uniform-50 cold-start
- Add How Scoring Works section: pipeline order, weighted average formula, task requirements table
- Add User Overrides section: modelOverrides JSON example, deep-merge semantics
- Update Configuration section: document capability_routing flag
- Add Verbose Output section: scoring breakdown format, selectionMethod field
- Add Extension Hook section: before_model_select payload, return value, first-override-wins

											
										
										
											2026-03-26 17:30:19 -05:00
+								| Dimension | What It Represents |
 								|-----------|-------------------|
 								| `coding` | Code generation and implementation accuracy |
 								| `debugging` | Diagnosing and fixing errors |
 								| `research` | Synthesizing information and exploring topics |
 								| `reasoning` | Multi-step logical reasoning |
 								| `speed` | Latency and throughput (inverse of capability depth) |
 								| `longContext` | Handling large codebases and long documents |
 								| `instruction` | Following structured instructions precisely |
-												feat: GSD context optimization with model routing and context masking

* docs: add context optimization design spec, implementation plan, and pi-layer research

- Spec: 6-change design for GSD extension context optimization
- Plan: 9-task TDD implementation plan with exact file paths and code
- Pi-layer doc: 10 infrastructure opportunities (research only, not planned)

Part of #3171, #3406, #3452, #3433.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(context): add observation masking for auto-mode sessions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(context): add phase handoff anchors for auto-mode

Introduces PhaseAnchor read/write utilities so downstream agents can
inherit decisions, blockers, and intent written at phase boundaries
without re-inferring from conversation history.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(context): add capability-aware model routing and context management preferences

Implement ADR-004 Phase 2 capability scoring with 7-dimension model
profiles, task requirement vectors, and weighted scoring. Add
ContextManagementConfig preferences for observation masking thresholds.
Wire capability scoring into auto-model-selection dispatch path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(context): wire observation masking, phase anchors, and tool truncation

Register observation masker in before_provider_request hook to replace
old tool results with placeholders during auto-mode. Add tool result
truncation (configurable via context_management.tool_result_max_chars).
Inject phase handoff anchors into prompt builders so downstream phases
inherit decisions from research/planning. Write anchors after successful
phase completion. Update ADR-004 status to Implemented.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: remove internal planning artifacts from PR

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add capability routing, observation masking, and context management

Update dynamic-model-routing.md with capability-aware scoring section.
Update token-optimization.md with observation masking, tool truncation,
and phase handoff anchor documentation. Update configuration.md with
context_management preference block and capability_routing flag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Merge branch 'main' into feat/gsd-context-optimization

* fix: add context_management to known keys and prevent tool truncation state corruption

- Add missing 'context_management' to KNOWN_PREFERENCE_KEYS set so users
  don't get spurious unknown-key warnings when configuring it.
- Replace in-place mutation of tool result content with immutable spread
  to prevent corrupting shared conversation message objects.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add stop and backtrack to triage-ui classification labels

The Classification type gained stop and backtrack variants from main
but triage-ui.ts was not updated, causing a TypeScript build failure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: context masker and tool truncation operate on correct pi-ai message format

The observation masker and tool result truncation in before_provider_request
were checking m.type === "toolResult" but the actual pi-ai payload uses
m.role === "toolResult" with content as TextContent[] arrays (not strings).
bashExecution messages are converted to {role:"user"} by convertToLlm before
the hook fires, so checking m.type === "bashExecution" was a no-op.

- Fix context-masker to match on role, handle array content, detect bash
  results by their "Ran `" prefix
- Fix register-hooks truncation to operate on role:"toolResult" with
  array content blocks
- Update tests to use correct pi-ai LLM payload format

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
											
										
										
											2026-04-04 01:02:35 -04:00
-												docs(01-05): update dynamic-model-routing.md with capability-aware routing features

- Add Capability Profiles section: 7 dimensions, 9 built-in profiles, uniform-50 cold-start
- Add How Scoring Works section: pipeline order, weighted average formula, task requirements table
- Add User Overrides section: modelOverrides JSON example, deep-merge semantics
- Update Configuration section: document capability_routing flag
- Add Verbose Output section: scoring breakdown format, selectionMethod field
- Add Extension Hook section: before_model_select payload, return value, first-override-wins

											
										
										
											2026-03-26 17:30:19 -05:00
+								**Built-in profiles** exist for 9 models: `claude-opus-4-6`, `claude-sonnet-4-6`, `claude-haiku-4-5`, `gpt-4o`, `gpt-4o-mini`, `gemini-2.5-pro`, `gemini-2.0-flash`, `deepseek-chat`, `o3`.
-												feat: GSD context optimization with model routing and context masking

* docs: add context optimization design spec, implementation plan, and pi-layer research

- Spec: 6-change design for GSD extension context optimization
- Plan: 9-task TDD implementation plan with exact file paths and code
- Pi-layer doc: 10 infrastructure opportunities (research only, not planned)

Part of #3171, #3406, #3452, #3433.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(context): add observation masking for auto-mode sessions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(context): add phase handoff anchors for auto-mode

Introduces PhaseAnchor read/write utilities so downstream agents can
inherit decisions, blockers, and intent written at phase boundaries
without re-inferring from conversation history.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(context): add capability-aware model routing and context management preferences

Implement ADR-004 Phase 2 capability scoring with 7-dimension model
profiles, task requirement vectors, and weighted scoring. Add
ContextManagementConfig preferences for observation masking thresholds.
Wire capability scoring into auto-model-selection dispatch path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(context): wire observation masking, phase anchors, and tool truncation

Register observation masker in before_provider_request hook to replace
old tool results with placeholders during auto-mode. Add tool result
truncation (configurable via context_management.tool_result_max_chars).
Inject phase handoff anchors into prompt builders so downstream phases
inherit decisions from research/planning. Write anchors after successful
phase completion. Update ADR-004 status to Implemented.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: remove internal planning artifacts from PR

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add capability routing, observation masking, and context management

Update dynamic-model-routing.md with capability-aware scoring section.
Update token-optimization.md with observation masking, tool truncation,
and phase handoff anchor documentation. Update configuration.md with
context_management preference block and capability_routing flag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Merge branch 'main' into feat/gsd-context-optimization

* fix: add context_management to known keys and prevent tool truncation state corruption

- Add missing 'context_management' to KNOWN_PREFERENCE_KEYS set so users
  don't get spurious unknown-key warnings when configuring it.
- Replace in-place mutation of tool result content with immutable spread
  to prevent corrupting shared conversation message objects.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add stop and backtrack to triage-ui classification labels

The Classification type gained stop and backtrack variants from main
but triage-ui.ts was not updated, causing a TypeScript build failure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: context masker and tool truncation operate on correct pi-ai message format

The observation masker and tool result truncation in before_provider_request
were checking m.type === "toolResult" but the actual pi-ai payload uses
m.role === "toolResult" with content as TextContent[] arrays (not strings).
bashExecution messages are converted to {role:"user"} by convertToLlm before
the hook fires, so checking m.type === "bashExecution" was a no-op.

- Fix context-masker to match on role, handle array content, detect bash
  results by their "Ran `" prefix
- Fix register-hooks truncation to operate on role:"toolResult" with
  array content blocks
- Update tests to use correct pi-ai LLM payload format

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
											
										
										
											2026-04-04 01:02:35 -04:00
-												docs(01-05): update dynamic-model-routing.md with capability-aware routing features

- Add Capability Profiles section: 7 dimensions, 9 built-in profiles, uniform-50 cold-start
- Add How Scoring Works section: pipeline order, weighted average formula, task requirements table
- Add User Overrides section: modelOverrides JSON example, deep-merge semantics
- Update Configuration section: document capability_routing flag
- Add Verbose Output section: scoring breakdown format, selectionMethod field
- Add Extension Hook section: before_model_select payload, return value, first-override-wins

											
										
										
											2026-03-26 17:30:19 -05:00
+								Models without a built-in profile receive **uniform scores of 50** across all dimensions. This is a cold-start policy — unknown models compete but don't have an advantage. From the user's perspective, routing behaves the same as before capability scoring was introduced for those models.
 								**Profiles are heuristic rankings, not benchmarks.** They represent approximate relative strengths, not verified benchmark results. Use user overrides (below) to correct them for models you know well.
 								## How Scoring Works
 								The routing pipeline within a tier:
 								```
 								classify complexity tier
 								    ↓
 								filter eligible models for tier
 								    ↓
 								fire before_model_select hook (optional override)
 								    ↓
 								capability score eligible models
 								    ↓
 								select winner (or first eligible if scoring is disabled)
 								```
 								**Scoring formula:** weighted average of capability dimensions
 								```
 								score = Σ(weight × capability) / Σ(weights)
 								```
 								**Task requirements** are dynamic — different task types weight dimensions differently:
 								| Unit Type | Key Dimensions |
 								|-----------|---------------|
 								| `execute-task` | coding (0.9), instruction (0.7), speed (0.3) |
 								| `research-*` | research (0.9), longContext (0.7), reasoning (0.5) |
 								| `plan-*` | reasoning (0.9), coding (0.5) |
 								| `replan-slice` | reasoning (0.9), debugging (0.6), coding (0.5) |
 								| `complete-slice`, `run-uat` | instruction (0.8), speed (0.7) |
 								For `execute-task`, requirements are further refined by task metadata signals:
 								- Tags like `docs`, `config`, `readme` → boost instruction weight
 								- Keywords like `concurrency`, `compatibility` → boost debugging and reasoning
 								- Keywords like `migration`, `architecture` → boost reasoning and coding
 								- Large file counts (≥6) or large estimated line counts (≥500) → boost coding and reasoning
 								**Tie-breaking:** When two models score within 2 points of each other, the cheaper model wins. If costs are equal, lexicographic model ID breaks the tie (deterministic).
 								## User Overrides
 								Correct built-in capability profiles for models you know well using `modelOverrides` in your models configuration:
 								```json
 								{
 								  "providers": {
 								    "anthropic": {
 								      "modelOverrides": {
 								        "claude-sonnet-4-6": {
 								          "capabilities": {
 								            "debugging": 90,
 								            "research": 85
 								          }
 								        }
 								      }
 								    }
 								  }
 								}
-												feat: GSD context optimization with model routing and context masking

* docs: add context optimization design spec, implementation plan, and pi-layer research

- Spec: 6-change design for GSD extension context optimization
- Plan: 9-task TDD implementation plan with exact file paths and code
- Pi-layer doc: 10 infrastructure opportunities (research only, not planned)

Part of #3171, #3406, #3452, #3433.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(context): add observation masking for auto-mode sessions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(context): add phase handoff anchors for auto-mode

Introduces PhaseAnchor read/write utilities so downstream agents can
inherit decisions, blockers, and intent written at phase boundaries
without re-inferring from conversation history.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(context): add capability-aware model routing and context management preferences

Implement ADR-004 Phase 2 capability scoring with 7-dimension model
profiles, task requirement vectors, and weighted scoring. Add
ContextManagementConfig preferences for observation masking thresholds.
Wire capability scoring into auto-model-selection dispatch path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(context): wire observation masking, phase anchors, and tool truncation

Register observation masker in before_provider_request hook to replace
old tool results with placeholders during auto-mode. Add tool result
truncation (configurable via context_management.tool_result_max_chars).
Inject phase handoff anchors into prompt builders so downstream phases
inherit decisions from research/planning. Write anchors after successful
phase completion. Update ADR-004 status to Implemented.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: remove internal planning artifacts from PR

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add capability routing, observation masking, and context management

Update dynamic-model-routing.md with capability-aware scoring section.
Update token-optimization.md with observation masking, tool truncation,
and phase handoff anchor documentation. Update configuration.md with
context_management preference block and capability_routing flag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Merge branch 'main' into feat/gsd-context-optimization

* fix: add context_management to known keys and prevent tool truncation state corruption

- Add missing 'context_management' to KNOWN_PREFERENCE_KEYS set so users
  don't get spurious unknown-key warnings when configuring it.
- Replace in-place mutation of tool result content with immutable spread
  to prevent corrupting shared conversation message objects.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add stop and backtrack to triage-ui classification labels

The Classification type gained stop and backtrack variants from main
but triage-ui.ts was not updated, causing a TypeScript build failure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: context masker and tool truncation operate on correct pi-ai message format

The observation masker and tool result truncation in before_provider_request
were checking m.type === "toolResult" but the actual pi-ai payload uses
m.role === "toolResult" with content as TextContent[] arrays (not strings).
bashExecution messages are converted to {role:"user"} by convertToLlm before
the hook fires, so checking m.type === "bashExecution" was a no-op.

- Fix context-masker to match on role, handle array content, detect bash
  results by their "Ran `" prefix
- Fix register-hooks truncation to operate on role:"toolResult" with
  array content blocks
- Update tests to use correct pi-ai LLM payload format

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
											
										
										
											2026-04-04 01:02:35 -04:00
+								```
-												docs(01-05): update dynamic-model-routing.md with capability-aware routing features

- Add Capability Profiles section: 7 dimensions, 9 built-in profiles, uniform-50 cold-start
- Add How Scoring Works section: pipeline order, weighted average formula, task requirements table
- Add User Overrides section: modelOverrides JSON example, deep-merge semantics
- Update Configuration section: document capability_routing flag
- Add Verbose Output section: scoring breakdown format, selectionMethod field
- Add Extension Hook section: before_model_select payload, return value, first-override-wins

											
										
										
											2026-03-26 17:30:19 -05:00
+								Overrides are **deep-merged** with built-in defaults — only the specified dimensions are overridden; others retain their built-in values.
 								**Use case:** You've found that a model consistently outperforms its built-in profile on specific task types. Override the relevant dimensions to steer the router toward that model for those tasks.
 								## Verbose Output
 								When verbose mode is active, the router logs its routing decision. When capability scoring was used, the log includes a full scoring breakdown:
 								```
 								Dynamic routing [S]: claude-sonnet-4-6 (capability-scored) — claude-sonnet-4-6: 82.3, gpt-4o: 78.1, deepseek-chat: 72.0
 								```
 								When tier-only routing was used (scoring disabled, single eligible model, or routing guards applied):
 								```
 								Dynamic routing [S]: claude-sonnet-4-6 (standard complexity, multiple steps)
 								```
 								The `selectionMethod` field in the routing decision indicates which path was taken:
 								- `"capability-scored"` — capability scoring selected the winner
 								- `"tier-only"` — cheapest in tier (or explicit pin) was used
 								## Extension Hook
 								Extensions can intercept and override model selection using the `before_model_select` hook.
 								The hook fires **after** tier filtering (eligible models are known) and **before** capability scoring (scores have not been computed yet). A hook can override selection entirely or return `undefined` to let scoring proceed normally.
 								**Registering a handler:**
 								```typescript
 								pi.on("before_model_select", async (event) => {
 								  const { unitType, unitId, classification, taskMetadata, eligibleModels, phaseConfig } = event;
 								  // Custom routing strategy: always use gemini for research tasks
 								  if (unitType.startsWith("research-")) {
 								    const gemini = eligibleModels.find(id => id.includes("gemini"));
 								    if (gemini) return { modelId: gemini };
 								  }
 								  // Return undefined to let capability scoring proceed
 								  return undefined;
 								});
 								```
 								**Event payload:**
 								| Field | Type | Description |
 								|-------|------|-------------|
 								| `unitType` | `string` | The unit type being dispatched (e.g., `"execute-task"`) |
 								| `unitId` | `string` | Unique identifier for this unit dispatch |
 								| `classification` | `{ tier, reason, downgraded }` | The complexity classification result |
 								| `taskMetadata` | `Record<string, unknown> \| undefined` | Task metadata extracted from the unit plan |
 								| `eligibleModels` | `string[]` | Models eligible for the classified tier |
 								| `phaseConfig` | `{ primary, fallbacks } \| undefined` | The user's configured model for this phase |
 								**Return value:** `{ modelId: string }` to override selection, or `undefined` to defer to capability scoring.
 								**First-override-wins:** If multiple extensions register handlers, the first one to return a non-undefined result wins. Subsequent handlers are not called.
-												feat: GSD context optimization with model routing and context masking

* docs: add context optimization design spec, implementation plan, and pi-layer research

- Spec: 6-change design for GSD extension context optimization
- Plan: 9-task TDD implementation plan with exact file paths and code
- Pi-layer doc: 10 infrastructure opportunities (research only, not planned)

Part of #3171, #3406, #3452, #3433.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(context): add observation masking for auto-mode sessions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(context): add phase handoff anchors for auto-mode

Introduces PhaseAnchor read/write utilities so downstream agents can
inherit decisions, blockers, and intent written at phase boundaries
without re-inferring from conversation history.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(context): add capability-aware model routing and context management preferences

Implement ADR-004 Phase 2 capability scoring with 7-dimension model
profiles, task requirement vectors, and weighted scoring. Add
ContextManagementConfig preferences for observation masking thresholds.
Wire capability scoring into auto-model-selection dispatch path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(context): wire observation masking, phase anchors, and tool truncation

Register observation masker in before_provider_request hook to replace
old tool results with placeholders during auto-mode. Add tool result
truncation (configurable via context_management.tool_result_max_chars).
Inject phase handoff anchors into prompt builders so downstream phases
inherit decisions from research/planning. Write anchors after successful
phase completion. Update ADR-004 status to Implemented.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: remove internal planning artifacts from PR

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add capability routing, observation masking, and context management

Update dynamic-model-routing.md with capability-aware scoring section.
Update token-optimization.md with observation masking, tool truncation,
and phase handoff anchor documentation. Update configuration.md with
context_management preference block and capability_routing flag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Merge branch 'main' into feat/gsd-context-optimization

* fix: add context_management to known keys and prevent tool truncation state corruption

- Add missing 'context_management' to KNOWN_PREFERENCE_KEYS set so users
  don't get spurious unknown-key warnings when configuring it.
- Replace in-place mutation of tool result content with immutable spread
  to prevent corrupting shared conversation message objects.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add stop and backtrack to triage-ui classification labels

The Classification type gained stop and backtrack variants from main
but triage-ui.ts was not updated, causing a TypeScript build failure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: context masker and tool truncation operate on correct pi-ai message format

The observation masker and tool result truncation in before_provider_request
were checking m.type === "toolResult" but the actual pi-ai payload uses
m.role === "toolResult" with content as TextContent[] arrays (not strings).
bashExecution messages are converted to {role:"user"} by convertToLlm before
the hook fires, so checking m.type === "bashExecution" was a no-op.

- Fix context-masker to match on role, handle array content, detect bash
  results by their "Ran `" prefix
- Fix register-hooks truncation to operate on role:"toolResult" with
  array content blocks
- Update tests to use correct pi-ai LLM payload format

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
											
										
										
											2026-04-04 01:02:35 -04:00
-												fix: restore PR files lost during merge conflict resolution

Files added by PR #2008 that were not in main were dropped during
the merge. Restore all src/, docs/, and scripts/ files from the
pre-merge PR head.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

											
										
										
											2026-03-25 22:38:55 -06:00
+								## Complexity Classification
 								Units are classified using pure heuristics — no LLM calls, sub-millisecond:
 								### Unit Type Defaults
 								| Unit Type | Default Tier |
 								|-----------|-------------|
 								| `complete-slice`, `run-uat` | Light |
 								| `research-*`, `plan-*`, `complete-milestone` | Standard |
 								| `execute-task` | Standard (upgraded by task analysis) |
 								| `replan-slice`, `reassess-roadmap` | Heavy |
 								| `hook/*` | Light |
 								### Task Plan Analysis
 								For `execute-task` units, the classifier analyzes the task plan:
 								| Signal | Simple → Light | Complex → Heavy |
 								|--------|---------------|----------------|
 								| Step count | ≤ 3 | ≥ 8 |
 								| File count | ≤ 3 | ≥ 8 |
 								| Description length | < 500 chars | > 2000 chars |
 								| Code blocks | — | ≥ 5 |
 								| Complexity keywords | None | Present |
 								**Complexity keywords:** `research`, `investigate`, `refactor`, `migrate`, `integrate`, `complex`, `architect`, `redesign`, `security`, `performance`, `concurrent`, `parallel`, `distributed`, `backward compat`
 								### Adaptive Learning
-												refactor(native): rename gsd_parser.rs to forge_parser.rs

Final rebrand: rename remaining Rust source file to complete the gsd → forge
transition. All parser references already use forge_parser after earlier commits.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

											
										
										
											2026-04-15 14:58:21 +02:00
+								The routing history (`.sf/routing-history.json`) tracks success/failure per tier per unit type. If a tier's failure rate exceeds 20% for a given pattern, future classifications are bumped up. User feedback (`over`/`under`/`ok`) is weighted 2× vs automatic outcomes.
-												fix: restore PR files lost during merge conflict resolution

Files added by PR #2008 that were not in main were dropped during
the merge. Restore all src/, docs/, and scripts/ files from the
pre-merge PR head.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

											
										
										
											2026-03-25 22:38:55 -06:00
 								## Interaction with Token Profiles
 								Dynamic routing and token profiles are complementary:
 								- **Token profiles** (`budget`/`balanced`/`quality`) control phase skipping and context compression
 								- **Dynamic routing** controls per-unit model selection within the configured phase model
 								When both are active, token profiles set the baseline models and dynamic routing further optimizes within those baselines. The `budget` token profile + dynamic routing provides maximum cost savings.
 								## Cost Table
 								The router includes a built-in cost table for common models, used for cross-provider cost comparison. Costs are per-million tokens (input/output):
 								| Model | Input | Output |
 								|-------|-------|--------|
 								| claude-haiku-4-5 | $0.80 | $4.00 |
 								| claude-sonnet-4-6 | $3.00 | $15.00 |
 								| claude-opus-4-6 | $15.00 | $75.00 |
 								| gpt-4o-mini | $0.15 | $0.60 |
 								| gpt-4o | $2.50 | $10.00 |
 								| gemini-2.0-flash | $0.10 | $0.40 |
 								The cost table is used for comparison only — actual billing comes from your provider.