singularity-forge/gitbook/features/token-optimization.md
ace-pm b29c12d5e5 refactor(native): rename gsd_parser.rs to forge_parser.rs
Final rebrand: rename remaining Rust source file to complete the gsd → forge
transition. All parser references already use forge_parser after earlier commits.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 14:58:21 +02:00

108 lines
3.4 KiB
Markdown

# Token Optimization
SF's token optimization system can reduce token usage by 40-60% without sacrificing output quality. It has three pillars: **token profiles**, **context compression**, and **complexity-based task routing**.
## Token Profiles
A token profile coordinates model selection, phase skipping, and context compression with a single setting:
```yaml
token_profile: balanced
```
### `budget` — Maximum Savings (40-60%)
| Setting | Value |
|---------|-------|
| Planning model | Sonnet |
| Execution model | Sonnet |
| Simple task model | Haiku |
| Milestone research | Skipped |
| Slice research | Skipped |
| Roadmap reassessment | Skipped |
| Context level | Minimal |
Best for: prototyping, small projects, well-understood codebases.
### `balanced` — Smart Defaults (default)
| Setting | Value |
|---------|-------|
| All models | User's default |
| Milestone research | Runs |
| Slice research | Skipped |
| Roadmap reassessment | Runs |
| Context level | Standard |
Best for: most projects, day-to-day development.
### `quality` — Full Context
| Setting | Value |
|---------|-------|
| All models | User's configured defaults |
| All phases | Run |
| Context level | Full |
Best for: complex architectures, greenfield projects, critical work.
## Context Compression
Each profile controls how much context is pre-loaded into AI prompts:
| Profile | What's Included |
|---------|----------------|
| `budget` | Task plan and essential prior summaries only |
| `balanced` | Task plan, summaries, slice plan, roadmap excerpt |
| `quality` | Everything — all plans, summaries, decisions, requirements |
## Complexity-Based Task Routing
SF classifies each task by complexity and routes it to an appropriate model:
| Complexity | Indicators | Model Level |
|-----------|------------|-------------|
| Simple | ≤3 steps, ≤3 files, short description | Haiku-class |
| Standard | 4-7 steps, 4-7 files | Sonnet-class |
| Complex | ≥8 steps, ≥8 files, complexity keywords | Opus-class |
**Complexity keywords** that prevent simple classification: `refactor`, `migrate`, `integrate`, `architect`, `security`, `performance`, `concurrent`, `distributed`, and others.
{% hint style="info" %}
Dynamic routing requires `models` configured in your preferences and `dynamic_routing.enabled: true`. See [Dynamic Model Routing](dynamic-model-routing.md).
{% endhint %}
## Overriding Profile Defaults
The `token_profile` sets defaults, but explicit preferences always win:
```yaml
token_profile: budget
phases:
skip_research: false # override: keep research
models:
planning: claude-opus-4-6 # override: use Opus for planning
```
## Adaptive Learning
SF tracks success and failure of tier assignments over time. If a model tier's failure rate exceeds 20% for a given task type, future tasks of that type are bumped to a higher tier.
Submit manual feedback with:
```
/sf rate over # model was overpowered — use cheaper next time
/sf rate ok # model was appropriate
/sf rate under # model was too weak — use stronger next time
```
## Observation Masking
During auto mode, old tool results are replaced with lightweight placeholders before each AI call. This reduces token usage between compactions with zero overhead.
```yaml
context_management:
observation_masking: true # default: true
observation_mask_turns: 8 # keep results from last 8 turns
tool_result_max_chars: 800 # truncate large tool outputs
```