GSD 2.17 introduces a coordinated token optimization system that can reduce token usage by 40-60% without sacrificing output quality for most workloads. The system has three pillars: **token profiles**, **context compression**, and **complexity-based task routing**.
## Token Profiles
A token profile is a single preference that coordinates model selection, phase skipping, and context compression level. Set it in your preferences:
```yaml
---
version: 1
token_profile: balanced
---
```
Three profiles are available:
### `budget` — Maximum Savings (40-60% reduction)
Optimized for cost-sensitive workflows. Uses cheaper models, skips optional phases, and compresses dispatch context to the minimum needed.
## Complexity-Based Task Routing

GSD classifies each task by complexity and routes it to an appropriate model tier when dynamic routing is enabled. Simple documentation fixes use cheaper models, while complex architectural work gets the reasoning power it needs.
> **Prerequisite:** Dynamic routing requires explicit `models` in your preferences. Without a `models` section, routing is skipped and the session's launch model is used for all phases. Token profiles set `models` automatically.
> **Ceiling behavior:** When dynamic routing is active, the model configured for each phase acts as a **ceiling**, not a fixed assignment. The router may downgrade to a cheaper model for simpler tasks but never upgrades beyond the configured model.
| Complexity | Model Key | Default Model |
|------------|-----------|---------------|
| Standard | `execution` | Sonnet / user default |
| Heavy | `execution` | Opus / user default |
Simple tasks use the `execution_simple` model key when configured. This is set automatically by the `budget` profile to Haiku.
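The ceiling behavior described above can be sketched as follows. This is a minimal illustration, not GSD's actual implementation; the tier names and the function signature are assumptions.

```python
# Illustrative sketch of ceiling-based model routing (not GSD's actual code).
TIERS = ["haiku", "sonnet", "opus"]  # ordered cheapest to most capable

def route(classified_tier: str, configured_ceiling: str) -> str:
    """Pick the cheaper of the classified tier and the configured ceiling.

    The router may downgrade below the ceiling for simple tasks,
    but never upgrades past it.
    """
    classified = TIERS.index(classified_tier)
    ceiling = TIERS.index(configured_ceiling)
    return TIERS[min(classified, ceiling)]
```

A task classified as simple under an Opus ceiling routes to the cheap model, while a task classified above the ceiling is capped at the configured model.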
### Budget Pressure
When approaching your budget ceiling, the classifier automatically downgrades tiers:
| Budget Used | Effect |
|------------|--------|
| < 50% | No adjustment |
| 50-90% | Standard → Light |
| > 90% | Everything except Heavy → Light; Heavy → Standard |
This graduated approach preserves model quality for the most complex work while progressively reducing cost as the ceiling approaches.
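The graduated downgrade can be sketched as a small function; the thresholds come from the table above, while the function name and tier strings are illustrative.

```python
# Sketch of budget-pressure tier downgrading (illustrative, not GSD's code).
def apply_budget_pressure(tier: str, budget_used: float) -> str:
    """Downgrade a classified tier based on the fraction of budget consumed."""
    if budget_used < 0.5:
        return tier                                      # < 50%: no adjustment
    if budget_used <= 0.9:
        return "Light" if tier == "Standard" else tier   # 50-90%: Standard -> Light
    # > 90%: Heavy -> Standard, everything else -> Light
    return "Standard" if tier == "Heavy" else "Light"
```

Note that Heavy work is only ever downgraded one step, so the hardest tasks keep a capable model until the very end of the budget.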
## Adaptive Learning (Routing History)
GSD tracks the success and failure of each tier assignment over time and adjusts future classifications accordingly. No opt-in is needed: tracking happens automatically, and history persists in `.gsd/routing-history.json`.
### How It Works
1. After each unit completes, the outcome (success/failure) is recorded against the unit type and tier used
2. Outcomes are tracked per-pattern (e.g., `execute-task`, `execute-task:docs`) with a rolling window of the last 50 entries
3. If a tier's failure rate exceeds 20% for a given pattern, future classifications for that pattern are bumped up one tier
4. The system also accepts tag-specific patterns (e.g., `execute-task:test` vs `execute-task:frontend`) for more granular routing
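The steps above can be sketched roughly as follows. The class and method names are illustrative, as are the tier labels; the 50-entry window and 20% threshold come from the text.

```python
from collections import defaultdict, deque

# Illustrative sketch of per-pattern outcome tracking (not GSD's actual code).
WINDOW = 50               # rolling window of recent outcomes
FAILURE_THRESHOLD = 0.20  # bump the tier when failures exceed 20%
TIERS = ["Light", "Standard", "Heavy"]

class RoutingHistory:
    def __init__(self):
        # (pattern, tier) -> recent success/failure outcomes, oldest dropped first
        self.outcomes = defaultdict(lambda: deque(maxlen=WINDOW))

    def record(self, pattern: str, tier: str, success: bool):
        self.outcomes[(pattern, tier)].append(success)

    def adjust(self, pattern: str, tier: str) -> str:
        """Bump the classification up one tier if recent failures exceed 20%."""
        window = self.outcomes[(pattern, tier)]
        if window:
            failure_rate = 1 - sum(window) / len(window)
            if failure_rate > FAILURE_THRESHOLD and tier != TIERS[-1]:
                return TIERS[TIERS.index(tier) + 1]
        return tier
```

Because patterns can carry tags (e.g. `execute-task:docs`), a failure streak on one kind of work bumps only that pattern's tier, not the whole unit type.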
## Context Compression

GSD can apply deterministic prompt compression before falling back to section-boundary truncation. This preserves more information when context exceeds the budget.
### Compression Strategy
Set via preferences:
```yaml
---
version: 1
compression_strategy: compress
---
```
Three strategies are available:

| Strategy | Behavior | Default For |
|----------|----------|------------|
| `truncate` | Drop entire sections at boundaries (pre-v2.29 behavior) | `quality` profile |
| `compress` | Apply heuristic text compression first, then truncate if still over budget | `budget` and `balanced` profiles |
| `smart` | Use TF-IDF semantic chunking for large files (>3KB), including only relevant portions | `budget` profile |

Compression removes redundant whitespace, abbreviates verbose phrases, deduplicates repeated content, and removes low-information boilerplate, all deterministically with no LLM calls.
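The deterministic heuristics behind `compress` might look like this minimal sketch. The specific rules and the abbreviation list are assumptions, not GSD's actual rule set; the point is that every step is pure string manipulation with no LLM calls.

```python
import re

# Illustrative sketch of deterministic, LLM-free text compression (not GSD's rules).
ABBREVIATIONS = {  # hypothetical verbose-phrase shortenings
    "in order to": "to",
    "is able to": "can",
}

def compress(text: str) -> str:
    # Strip trailing whitespace and collapse runs of blank lines.
    text = re.sub(r"[ \t]+\n", "\n", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    # Abbreviate verbose phrases.
    for verbose, short in ABBREVIATIONS.items():
        text = text.replace(verbose, short)
    # Deduplicate repeated non-blank lines, preserving order.
    seen, out = set(), []
    for line in text.split("\n"):
        if line.strip() and line in seen:
            continue
        seen.add(line)
        out.append(line)
    return "\n".join(out)
```

Because the transform is deterministic, the same input always compresses to the same output, which keeps prompt caching effective.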
### Structured Data Compression
At the `budget` and `balanced` inline context levels, decisions and requirements are formatted in a compact notation that saves 30-50% of tokens compared to full markdown tables.
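To illustrate where the savings come from, here is a hypothetical compact notation next to a markdown table rendering of the same records. GSD's actual notation is not documented here; both formats below are made up for comparison.

```python
# Hypothetical compact notation vs. a markdown table (illustrative only).
decisions = [
    {"id": "D1", "choice": "PostgreSQL", "reason": "relational queries"},
    {"id": "D2", "choice": "REST", "reason": "simple clients"},
]

def as_table(rows):
    # Full markdown table: header row, separator row, cell padding.
    lines = ["| id | choice | reason |", "|----|--------|--------|"]
    lines += [f"| {r['id']} | {r['choice']} | {r['reason']} |" for r in rows]
    return "\n".join(lines)

def as_compact(rows):
    # One delimited line per record: no header, separator, or padding.
    return "\n".join(f"{r['id']}:{r['choice']}|{r['reason']}" for r in rows)
```

The compact form drops the header and separator rows and the per-cell padding, which is where most of the 30-50% reduction comes from on small record sets.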
### Summary Distillation
When a slice has 3+ dependency summaries and the total exceeds the summary budget, GSD extracts essential structured data (provides, requires, key_files, key_decisions) and drops verbose prose sections before falling back to section-boundary truncation.
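Distillation as described above might be sketched like this. The field names `provides`, `requires`, `key_files`, and `key_decisions` come from the text; the function signature and budget check are assumptions.

```python
# Illustrative sketch of summary distillation (not GSD's actual code).
ESSENTIAL_KEYS = ("provides", "requires", "key_files", "key_decisions")

def distill(summaries: list[dict], budget: int) -> list[dict]:
    """Keep only essential structured fields when summaries exceed the budget."""
    total = sum(len(str(s)) for s in summaries)
    if len(summaries) >= 3 and total > budget:
        # Drop verbose prose sections, retain the structured essentials.
        return [{k: s[k] for k in ESSENTIAL_KEYS if k in s} for s in summaries]
    return summaries
```

Slices with fewer than three dependencies, or totals within budget, pass through untouched, so distillation only pays its information cost when truncation would otherwise kick in.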
### Cache Hit Rate Tracking
The metrics ledger now tracks `cacheHitRate` per unit (percentage of input tokens served from cache) and provides `aggregateCacheHitRate()` for session-wide cache performance.
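The per-unit rate and the session-wide aggregate could be computed as in this sketch; the ledger's actual data shapes are assumptions, and only the metric names come from the text.

```python
# Illustrative sketch of cache-hit-rate metrics (data shapes are assumptions).
def cache_hit_rate(cached_tokens: int, input_tokens: int) -> float:
    """Percentage of input tokens served from cache for one unit."""
    return 100.0 * cached_tokens / input_tokens if input_tokens else 0.0

def aggregate_cache_hit_rate(units: list[dict]) -> float:
    """Token-weighted session-wide cache hit rate across all units."""
    total_in = sum(u["input_tokens"] for u in units)
    total_cached = sum(u["cached_tokens"] for u in units)
    return cache_hit_rate(total_cached, total_in)
```

Weighting by tokens rather than averaging per-unit rates keeps a large uncached unit from being masked by many small cached ones.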