test: add end-to-end token optimization benchmark

Benchmark validates all optimization modules against realistic GSD content:
- Structured data: 20% savings on decisions, 7% on requirements
- Prompt compression: 5-17% across light/moderate/aggressive levels
- Semantic chunking: 73% content reduction via TF-IDF selection
- Summary distillation: 73% savings preserving structured fields
- Combined pipeline: 43% total savings on realistic dispatch prompt
- Cache efficiency: 94% cacheable prefix, 85% estimated Anthropic savings
- Provider-aware: 14% budget accuracy improvement for Anthropic vs OpenAI
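
For readers unfamiliar with how figures like those above are typically derived, here is a minimal sketch of measuring a savings percentage and of TF-IDF chunk selection. It is not the benchmark's code: the diff is not shown here, so every name (`estimate_tokens`, `savings_percent`, `select_chunks`), the 4-characters-per-token estimate, and the smoothed IDF formula are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def estimate_tokens(text: str) -> int:
    """Rough token estimate: about 4 characters per token.
    This ratio is an assumption, not a real tokenizer."""
    return math.ceil(len(text) / 4)


def savings_percent(original: str, optimized: str) -> float:
    """Percentage of estimated tokens removed by an optimization step."""
    before = estimate_tokens(original)
    if before == 0:
        return 0.0
    return 100.0 * (before - estimate_tokens(optimized)) / before


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric word split (hypothetical helper)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def select_chunks(chunks: list[str], query: str, keep: int) -> list[str]:
    """Keep the `keep` chunks with the highest TF-IDF score against the
    query, returned in their original document order."""
    docs = [Counter(tokenize(c)) for c in chunks]
    n = len(chunks)

    # Document frequency: in how many chunks each term appears.
    df = Counter()
    for doc in docs:
        df.update(doc.keys())

    query_terms = set(tokenize(query))

    def score(doc: Counter) -> float:
        total = sum(doc.values()) or 1
        # Smoothed IDF (assumed variant): log((1 + n) / (1 + df)) + 1.
        return sum(
            (doc[t] / total) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t in query_terms
        )

    ranked = sorted(range(n), key=lambda i: score(docs[i]), reverse=True)[:keep]
    return [chunks[i] for i in sorted(ranked)]
```

Under these assumptions, a content reduction figure would come from calling `savings_percent` on the joined chunks before and after `select_chunks`. A real benchmark would likely use the provider's tokenizer rather than a character ratio.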
Jeremy McSpadden 2026-03-17 22:10:58 -05:00
parent d65da6c927
commit 4e7b3d486f
