New docs: - dynamic-model-routing.md — complexity classification, tier models, escalation, budget pressure, cost table, adaptive learning - captures-triage.md — fire-and-forget capture, triage pipeline, classification types, dashboard integration, worktree awareness - visualizer.md — four-tab TUI overlay (progress, deps, metrics, timeline), controls, auto-refresh, auto_visualize preference Updated docs: - README.md — added links to three new docs - commands.md — added capture, triage, visualize, knowledge, queue reorder - configuration.md — added dynamic_routing and auto_visualize settings, updated full example with new config options - auto-mode.md — added capture, visualize sections, dashboard badge, dynamic model routing reference - architecture.md — updated dispatch pipeline (routing + captures steps), added key modules table for v2.19 - cost-management.md — added dynamic routing and visualizer tips
93 lines
3.1 KiB
Markdown
93 lines
3.1 KiB
Markdown
# Cost Management
|
||
|
||
GSD tracks token usage and cost for every unit of work dispatched during auto mode. This data powers the dashboard, budget enforcement, and cost projections.
|
||
|
||
## Cost Tracking
|
||
|
||
Every unit's metrics are captured automatically:
|
||
|
||
- **Token counts** — input, output, cache read, cache write, total
|
||
- **Cost** — USD cost per unit
|
||
- **Duration** — wall-clock time
|
||
- **Tool calls** — number of tool invocations
|
||
- **Message counts** — assistant and user messages
|
||
|
||
Data is stored in `.gsd/metrics.json` and survives across sessions.
|
||
|
||
### Viewing Costs
|
||
|
||
**Dashboard:** `Ctrl+Alt+G` or `/gsd status` shows real-time cost breakdown.
|
||
|
||
**Aggregations available:**
|
||
- By phase (research, planning, execution, completion, reassessment)
|
||
- By slice (M001/S01, M001/S02, ...)
|
||
- By model (which models consumed the most budget)
|
||
- Project totals
|
||
|
||
## Budget Ceiling
|
||
|
||
Set a maximum spend for a project:
|
||
|
||
```yaml
|
||
---
|
||
version: 1
|
||
budget_ceiling: 50.00
|
||
---
|
||
```
|
||
|
||
### Enforcement Modes
|
||
|
||
Control what happens when the ceiling is reached:
|
||
|
||
```yaml
|
||
budget_enforcement: pause # default when ceiling is set
|
||
```
|
||
|
||
| Mode | Behavior |
|
||
|------|----------|
|
||
| `warn` | Log a warning, continue executing |
|
||
| `pause` | Pause auto mode, wait for user action |
|
||
| `halt` | Stop auto mode entirely |
|
||
|
||
## Cost Projections
|
||
|
||
Once at least two slices have completed, GSD projects the remaining cost:
|
||
|
||
```
|
||
Projected remaining: $12.40 ($6.20/slice avg × 2 remaining)
|
||
```
|
||
|
||
Projections use per-slice averages from completed work. If the budget ceiling has been reached, a warning is appended.
|
||
|
||
## Budget Pressure & Model Downgrading
|
||
|
||
When approaching the budget ceiling, the [complexity router](./token-optimization.md#budget-pressure) automatically downgrades model assignments to cheaper tiers. This is graduated:
|
||
|
||
- **< 50% used** — no adjustment
|
||
- **50-75% used** — standard tasks downgrade to light
|
||
- **75-90% used** — same, more aggressive
|
||
- **> 90% used** — nearly everything downgrades; only heavy tasks stay at standard
|
||
|
||
This ensures the budget is spread across remaining work instead of being exhausted early on complex tasks.
|
||
|
||
## Token Profiles & Cost
|
||
|
||
The `token_profile` preference directly affects cost:
|
||
|
||
| Profile | Typical Savings | How |
|
||
|---------|----------------|-----|
|
||
| `budget` | 40-60% | Cheaper models, phase skipping, minimal context |
|
||
| `balanced` | 10-20% | Default models, skip slice research, standard context |
|
||
| `quality` | 0% (baseline) | Full models, all phases, full context |
|
||
|
||
See [Token Optimization](./token-optimization.md) for details.
|
||
|
||
## Tips
|
||
|
||
- Start with `balanced` profile and a generous `budget_ceiling` to establish baseline costs
|
||
- Check `/gsd status` after a few slices to see per-slice cost averages
|
||
- Switch to `budget` profile for well-understood, repetitive work
|
||
- Use `quality` only when architectural decisions are being made
|
||
- Per-phase model selection lets you use Opus only for planning while keeping execution on Sonnet
|
||
- Enable `dynamic_routing` for automatic model downgrading on simple tasks — see [Dynamic Model Routing](./dynamic-model-routing.md)
|
||
- Use `/gsd visualize` → Metrics tab to see where your budget is going
|