singularity-forge/gitbook/features/token-optimization.md
ace-pm 35dc87ef53 chore: sync workspace state after rebrand
- Rebrand commits already in history (gsd → forge)
- Sync pre-existing doc, docker, and CI config updates
- All rebrand artifacts verified in place:
  * Native crates: forge-engine, forge-ast, forge-grep
  * Log prefixes: [forge] across 22+ files
  * Binary: ~/bin/sf-run
  * Workspace scopes: @sf-run/*, @singularity-forge/*
  * Nix flake: Rust toolchain ready

System ready for: nix develop && bun run build:native

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 14:54:20 +02:00

3.4 KiB

Token Optimization

SF's token optimization system can reduce token usage by 40-60% without sacrificing output quality. It has three pillars: token profiles, context compression, and complexity-based task routing.

Token Profiles

A token profile coordinates model selection, phase skipping, and context compression with a single setting:

token_profile: balanced

budget — Maximum Savings (40-60%)

Setting Value
Planning model Sonnet
Execution model Sonnet
Simple task model Haiku
Milestone research Skipped
Slice research Skipped
Roadmap reassessment Skipped
Context level Minimal

Best for: prototyping, small projects, well-understood codebases.

balanced — Smart Defaults (default)

Setting Value
All models User's default
Milestone research Runs
Slice research Skipped
Roadmap reassessment Runs
Context level Standard

Best for: most projects, day-to-day development.

quality — Full Context

Setting Value
All models User's configured defaults
All phases Run
Context level Full

Best for: complex architectures, greenfield projects, critical work.

Context Compression

Each profile controls how much context is pre-loaded into AI prompts:

Profile What's Included
budget Task plan and essential prior summaries only
balanced Task plan, summaries, slice plan, roadmap excerpt
quality Everything — all plans, summaries, decisions, requirements

Complexity-Based Task Routing

SF classifies each task by complexity and routes it to an appropriate model:

Complexity Indicators Model Level
Simple ≤3 steps, ≤3 files, short description Haiku-class
Standard 4-7 steps, 4-7 files Sonnet-class
Complex ≥8 steps, ≥8 files, complexity keywords Opus-class

Complexity keywords that prevent simple classification: refactor, migrate, integrate, architect, security, performance, concurrent, distributed, and others.

{% hint style="info" %} Dynamic routing requires models configured in your preferences and dynamic_routing.enabled: true. See Dynamic Model Routing. {% endhint %}

Overriding Profile Defaults

The token_profile sets defaults, but explicit preferences always win:

token_profile: budget
phases:
  skip_research: false        # override: keep research
models:
  planning: claude-opus-4-6   # override: use Opus for planning

Adaptive Learning

SF tracks success and failure of tier assignments over time. If a model tier's failure rate exceeds 20% for a given task type, future tasks of that type are bumped to a higher tier.

Submit manual feedback with:

/gsd rate over    # model was overpowered — use cheaper next time
/gsd rate ok      # model was appropriate
/gsd rate under   # model was too weak — use stronger next time

Observation Masking

During auto mode, old tool results are replaced with lightweight placeholders before each AI call. This reduces token usage between compactions with zero overhead.

context_management:
  observation_masking: true     # default: true
  observation_mask_turns: 8     # keep results from last 8 turns
  tool_result_max_chars: 800    # truncate large tool outputs