feat: GSD context optimization with model routing and context masking
* docs: add context optimization design spec, implementation plan, and pi-layer research
  - Spec: 6-change design for GSD extension context optimization
  - Plan: 9-task TDD implementation plan with exact file paths and code
  - Pi-layer doc: 10 infrastructure opportunities (research only, not planned)
  Part of #3171, #3406, #3452, #3433.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(context): add observation masking for auto-mode sessions
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(context): add phase handoff anchors for auto-mode
  Introduces PhaseAnchor read/write utilities so downstream agents can inherit decisions, blockers, and intent written at phase boundaries without re-inferring from conversation history.
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(context): add capability-aware model routing and context management preferences
  Implement ADR-004 Phase 2 capability scoring with 7-dimension model profiles, task requirement vectors, and weighted scoring. Add ContextManagementConfig preferences for observation masking thresholds. Wire capability scoring into auto-model-selection dispatch path.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(context): wire observation masking, phase anchors, and tool truncation
  Register observation masker in before_provider_request hook to replace old tool results with placeholders during auto-mode. Add tool result truncation (configurable via context_management.tool_result_max_chars). Inject phase handoff anchors into prompt builders so downstream phases inherit decisions from research/planning. Write anchors after successful phase completion. Update ADR-004 status to Implemented.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: remove internal planning artifacts from PR
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add capability routing, observation masking, and context management
  Update dynamic-model-routing.md with capability-aware scoring section. Update token-optimization.md with observation masking, tool truncation, and phase handoff anchor documentation. Update configuration.md with context_management preference block and capability_routing flag.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Merge branch 'main' into feat/gsd-context-optimization

* fix: add context_management to known keys and prevent tool truncation state corruption
  - Add missing 'context_management' to KNOWN_PREFERENCE_KEYS set so users don't get spurious unknown-key warnings when configuring it.
  - Replace in-place mutation of tool result content with immutable spread to prevent corrupting shared conversation message objects.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add stop and backtrack to triage-ui classification labels
  The Classification type gained stop and backtrack variants from main, but triage-ui.ts was not updated, causing a TypeScript build failure.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: context masker and tool truncation operate on correct pi-ai message format
  The observation masker and tool result truncation in before_provider_request were checking m.type === "toolResult", but the actual pi-ai payload uses m.role === "toolResult" with content as TextContent[] arrays (not strings). bashExecution messages are converted to {role:"user"} by convertToLlm before the hook fires, so checking m.type === "bashExecution" was a no-op.
  - Fix context-masker to match on role, handle array content, detect bash results by their "Ran `" prefix
  - Fix register-hooks truncation to operate on role:"toolResult" with array content blocks
  - Update tests to use correct pi-ai LLM payload format
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Parent: bb47f5a087
Commit: 7d5bf63b2d
22 changed files with 1068 additions and 28 deletions
|
|
@ -1,8 +1,8 @@
|
|||
# ADR-004: Capability-Aware Model Routing
|
||||
|
||||
**Status:** Proposed (Revised)
|
||||
**Status:** Implemented (Phase 2)
|
||||
**Date:** 2026-03-26
|
||||
**Revised:** 2026-03-26
|
||||
**Revised:** 2026-04-03
|
||||
**Deciders:** Jeremy McSpadden
|
||||
**Related:** ADR-003 (pipeline simplification), [Issue #2655](https://github.com/gsd-build/gsd-2/issues/2655), `docs/dynamic-model-routing.md`
|
||||
|
||||
|
|
|
|||
|
|
@ -686,6 +686,7 @@ Complexity-based model routing. See [Dynamic Model Routing](./dynamic-model-rout
|
|||
```yaml
|
||||
dynamic_routing:
|
||||
enabled: true
|
||||
capability_routing: true # score models by task capability (v2.59)
|
||||
tier_models:
|
||||
light: claude-haiku-4-5
|
||||
standard: claude-sonnet-4-6
|
||||
|
|
@ -695,6 +696,18 @@ dynamic_routing:
|
|||
cross_provider: true
|
||||
```
|
||||
|
||||
### `context_management` (v2.59)
|
||||
|
||||
Controls observation masking and tool result truncation during auto-mode sessions. Reduces context bloat between compactions with zero LLM overhead.
|
||||
|
||||
```yaml
|
||||
context_management:
|
||||
observation_masking: true # replace old tool results with placeholders (default: true)
|
||||
observation_mask_turns: 8 # keep results from last N user turns (1-50, default: 8)
|
||||
compaction_threshold_percent: 0.70 # target compaction at 70% context usage (0.5-0.95, default: 0.70)
|
||||
tool_result_max_chars: 800 # cap individual tool result content (200-10000, default: 800)
|
||||
```
|
||||
|
||||
### `service_tier` (v2.42)
|
||||
|
||||
OpenAI service tier preference for supported models. Toggle with `/gsd fast`.
|
||||
|
|
|
|||
|
|
@ -70,6 +70,36 @@ When approaching the budget ceiling, the router progressively downgrades:
|
|||
|
||||
When enabled, the router may select models from providers other than your primary. This uses the built-in cost table to find the cheapest model at each tier. Requires the target provider to be configured.
|
||||
|
||||
## Capability-Aware Scoring
|
||||
|
||||
*Introduced in v2.59.0 (ADR-004 Phase 2)*
|
||||
|
||||
When `capability_routing` is enabled, the router goes beyond tier classification and scores models against task-specific capability requirements. Each known model has a 7-dimension profile:
|
||||
|
||||
| Dimension | What It Measures |
|
||||
|-----------|-----------------|
|
||||
| `coding` | Code generation, refactoring, implementation quality |
|
||||
| `debugging` | Error diagnosis, fix accuracy |
|
||||
| `research` | Information gathering, codebase exploration |
|
||||
| `reasoning` | Multi-step logic, architectural decisions |
|
||||
| `speed` | Response latency (inverse of cost) |
|
||||
| `longContext` | Performance with large context windows |
|
||||
| `instruction` | Adherence to structured instructions and templates |
|
||||
|
||||
Each unit type maps to a weighted requirement vector. For example, `execute-task` weights `coding: 0.9, reasoning: 0.6, debugging: 0.5` while `research-slice` weights `research: 0.9, reasoning: 0.7, longContext: 0.5`.
|
||||
|
||||
For `execute-task` units, the classifier also inspects task metadata (tags, description) to refine requirements. Documentation tasks boost `instruction` and lower `coding`; test tasks boost `debugging`.
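A worked example of the weighted score — the formula mirrors the scorer added in this release (weighted average of capability values over requirement weights); the profile values are the `claude-sonnet-4-6` entry from the capability table, and the weights are the `execute-task` example above:

```typescript
type Capability = "coding" | "debugging" | "research" | "reasoning" | "speed" | "longContext" | "instruction";

// Weighted average: sum(weight * capability) / sum(weight), on a 0–100 scale.
function score(profile: Record<Capability, number>, requirements: Partial<Record<Capability, number>>): number {
  let weighted = 0, weights = 0;
  for (const [dim, weight] of Object.entries(requirements) as [Capability, number][]) {
    weighted += weight * profile[dim];
    weights += weight;
  }
  return weights > 0 ? weighted / weights : 50;
}

const sonnet = { coding: 85, debugging: 80, research: 75, reasoning: 80, speed: 60, longContext: 75, instruction: 85 };
score(sonnet, { coding: 0.9, reasoning: 0.6, debugging: 0.5 }); // (76.5 + 48 + 40) / 2.0 ≈ 82.3
```

Within a tier, the candidate with the highest score wins; near-ties fall back to cost and a deterministic ordering.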
|
||||
|
||||
Enable capability routing:
|
||||
|
||||
```yaml
|
||||
dynamic_routing:
|
||||
enabled: true
|
||||
capability_routing: true
|
||||
```
|
||||
|
||||
When enabled, models within the target tier are ranked by capability score rather than selected arbitrarily. When disabled (the default), the existing tier-only selection applies.
|
||||
|
||||
## Complexity Classification
|
||||
|
||||
Units are classified using pure heuristics — no LLM calls, sub-millisecond:
|
||||
|
|
|
|||
198  docs/pi-context-optimization-opportunities.md (new file)
|
|
@ -0,0 +1,198 @@
|
|||
# pi-coding-agent: Context Optimization Opportunities
|
||||
|
||||
> **Status**: Research only — not planned for implementation.
|
||||
> Scope: `packages/pi-coding-agent` and `packages/pi-agent-core` infrastructure.
|
||||
> These changes would benefit every consumer of the pi engine, not just GSD.
|
||||
|
||||
---
|
||||
|
||||
## 1. Prompt Caching (`cache_control`) — Highest Impact
|
||||
|
||||
**Current state**: Every LLM call re-pays full input token cost for the system prompt, tool definitions, and context files. No `cache_control` breakpoints are set anywhere in the API call path.
|
||||
|
||||
**Opportunity**: Anthropic's KV cache delivers 90% cost reduction on cached tokens (0.1x input rate). Claude Code achieves 92–98% cache hit rates by placing stable content before volatile content.
|
||||
|
||||
**Where to instrument** (`packages/pi-ai/src/providers/anthropic.ts`):
|
||||
- Set `cache_control: { type: "ephemeral" }` on the last tool definition block
|
||||
- Set `cache_control` after the static system prompt sections (base boilerplate + context files)
|
||||
- Leave the per-turn user message uncached
|
||||
|
||||
**Critical constraint**: The cache breakpoint must be placed *after* all static content and *before* any dynamic content (timestamps, per-request variables). Moving a timestamp before a cache breakpoint defeats it on every call.
|
||||
|
||||
**Cache hierarchy**: Tools → system → messages. Changing a tool definition invalidates system and message caches. Tool definitions should be sorted deterministically (alphabetically) to prevent spurious cache misses.
|
||||
|
||||
**Expected savings**: 80–90% reduction in input token cost for multi-turn sessions (the dominant cost pattern in GSD auto-mode).
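A minimal sketch of where the breakpoints would go, using the Anthropic Messages API. The model id, tool definition, and prompt contents are placeholders — this is not the actual `anthropic.ts` change, just an illustration of the ordering rules above:

```typescript
import Anthropic from "@anthropic-ai/sdk";

// Placeholders — real values come from the resource loader and tool registry.
const staticSystemPrompt = "…base boilerplate…";
const contextFiles = "…AGENTS.md and friends…";
const userTurn = "…current user message…";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-sonnet-4-6", // placeholder model id
  max_tokens: 4096,
  tools: [
    // …all tool definitions, sorted deterministically…
    {
      name: "read_file",
      description: "Read a file from disk",
      input_schema: { type: "object", properties: { path: { type: "string" } } },
      cache_control: { type: "ephemeral" }, // breakpoint on the LAST tool block
    },
  ],
  system: [
    // Stable: base prompt + context files, cached across turns.
    { type: "text", text: staticSystemPrompt + "\n" + contextFiles, cache_control: { type: "ephemeral" } },
    // Volatile: timestamps and per-request values go AFTER the breakpoint.
    { type: "text", text: `Current time: ${new Date().toISOString()}` },
  ],
  // Per-turn content stays uncached.
  messages: [{ role: "user", content: userTurn }],
});
```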
|
||||
|
||||
---
|
||||
|
||||
## 2. Observation Masking in the Message Pipeline
|
||||
|
||||
**Current state**: `agent-loop.ts` passes the full `context.messages` array to the LLM on every turn. Tool results from 50 turns ago are re-read in full on every subsequent call. The `transformContext` hook exists on `AgentContext` and fires before every LLM call, but has no default implementation — extensions are responsible for any pruning.
|
||||
|
||||
**Opportunity**: Replace old tool result content with lightweight placeholders after N turns. JetBrains Research tested this on SWE-bench Verified (500 tasks, up to 250-turn trajectories) and found:
|
||||
- 50%+ cost reduction vs. unmanaged history
|
||||
- Performance matched or slightly exceeded LLM summarization
|
||||
- Zero overhead (no extra LLM call required)
|
||||
|
||||
**Proposed implementation** (default `transformContext` in `pi-agent-core`):
|
||||
```typescript
|
||||
// Keep last KEEP_RECENT_TURNS verbatim; mask older tool results
|
||||
const KEEP_RECENT_TURNS = 8;
|
||||
|
||||
function defaultObservationMask(messages: AgentMessage[]): AgentMessage[] {
|
||||
const cutoff = findTurnBoundary(messages, KEEP_RECENT_TURNS);
|
||||
return messages.map((m, i) => {
|
||||
if (i >= cutoff) return m;
|
||||
if (m.type === "toolResult" || m.type === "bashExecution") {
|
||||
return { ...m, content: "[result masked — within summarized history]", excludeFromContext: false };
|
||||
}
|
||||
return m;
|
||||
});
|
||||
}
|
||||
```
|
||||
|
||||
**Compaction interaction**: Observation masking reduces the token accumulation rate, pushing the compaction threshold further out. The two mechanisms are complementary — masking handles the steady state, compaction handles the rare deep-session case.
|
||||
|
||||
---
|
||||
|
||||
## 3. Earlier Compaction Threshold
|
||||
|
||||
**Current state** (`packages/pi-coding-agent/src/core/constants.ts`):
|
||||
```typescript
|
||||
COMPACTION_RESERVE_TOKENS = 16_384 // triggers at contextWindow - 16K
|
||||
COMPACTION_KEEP_RECENT_TOKENS = 20_000
|
||||
```
|
||||
|
||||
For a 200K context window, compaction fires at ~183K tokens — 91.5% utilization.
|
||||
|
||||
**Problem**: Context drift (not raw exhaustion) causes ~65% of enterprise agent failures, and Zylos production data shows performance degrading measurably beyond ~30K tokens. The current threshold lets sessions run degraded for a long stretch before compaction fires.
|
||||
|
||||
**Opportunity**: Lower the trigger to 70% utilization. For a 200K window, this means compacting at ~140K tokens — 43K tokens earlier.
|
||||
|
||||
```typescript
|
||||
// Proposed
|
||||
COMPACTION_THRESHOLD_PERCENT = 0.70 // fire at 70% of contextWindow
|
||||
COMPACTION_RESERVE_TOKENS = contextWindow * (1 - COMPACTION_THRESHOLD_PERCENT)
|
||||
```
|
||||
|
||||
**Trade-off**: More frequent compactions, each happening earlier when there's more "fresh" content to keep. Summary quality improves because less material needs to be discarded at each cut.
|
||||
|
||||
---
|
||||
|
||||
## 4. Tool Result Truncation at Write Time
|
||||
|
||||
**Current state**: `TOOL_RESULT_MAX_CHARS = 2_000` in `constants.ts`, but this limit is only applied *during compaction summarization*, not when the tool result enters the message store. A bash result returning 50KB of log output is stored and re-sent verbatim until compaction fires.
|
||||
|
||||
**Opportunity**: Truncate at write time in `messages.ts` → `convertToLlm()` or in the tool result handler. Two strategies:
|
||||
|
||||
- **Hard truncation**: Slice at N chars, append `"\n[truncated — {original_length} chars]"`. Simple, zero overhead.
|
||||
- **Semantic head/tail**: Keep first 500 chars (context, command echo) + last 1000 chars (final output, errors). Better for bash results where the end contains the error.
|
||||
|
||||
**Recommendation**: Semantic head/tail as the default, configurable per tool type. File read results benefit from head; bash/test output benefits from head+tail.
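A sketch of both strategies; the helper names and default limits are illustrative, not the shipped API:

```typescript
// Semantic head/tail: keep the start (context, command echo) and the end (final output, errors).
function truncateHeadTail(text: string, head = 500, tail = 1000): string {
  if (text.length <= head + tail) return text;
  const omitted = text.length - head - tail;
  return `${text.slice(0, head)}\n…[${omitted} chars omitted]…\n${text.slice(-tail)}`;
}

// Hard truncation: slice at a fixed cap and note the original length.
function truncateHard(text: string, max = 2000): string {
  return text.length <= max ? text : `${text.slice(0, max)}\n[truncated — ${text.length} chars]`;
}
```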
|
||||
|
||||
---
|
||||
|
||||
## 5. Context File Deduplication and Trim
|
||||
|
||||
**Current state** (`packages/pi-coding-agent/src/core/resource-loader.ts`, lines 84–109):
|
||||
- Searches from `~/.gsd/agent/` → ancestor dirs → cwd
|
||||
- Deduplicates by *file path* but not by *content*
|
||||
- Entire file content concatenated verbatim into system prompt — no trimming, no summarization
|
||||
|
||||
**Anti-pattern**: A project with AGENTS.md at 3 ancestor levels (repo root, workspace, home) injects all three in full. If they share common boilerplate, that content is re-injected multiple times.
|
||||
|
||||
**Opportunities**:
|
||||
1. **Content deduplication**: Hash paragraph-level chunks; skip any chunk already seen in a previously-loaded file (see the sketch after this list)
|
||||
2. **Section-aware loading**: Parse `## ` headings in AGENTS.md; only include sections relevant to the current task type (e.g., `## Testing` section only when running tests)
|
||||
3. **Token budget enforcement**: If total context files exceed N tokens, summarize oldest/most-distant file rather than including verbatim
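A sketch of the content-deduplication idea (item 1 above). The helper name and paragraph-splitting heuristic are assumptions:

```typescript
import { createHash } from "node:crypto";

// Chunks already injected by an earlier context file are skipped in later ones.
const seenChunks = new Set<string>();

function dedupeContextFile(content: string): string {
  const kept: string[] = [];
  for (const para of content.split(/\n{2,}/)) {
    const key = createHash("sha256").update(para.trim()).digest("hex");
    if (seenChunks.has(key)) continue; // boilerplate repeated across AGENTS.md levels
    seenChunks.add(key);
    kept.push(para);
  }
  return kept.join("\n\n");
}
```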
|
||||
|
||||
---
|
||||
|
||||
## 6. Skill Content Lazy Loading and Summarization
|
||||
|
||||
**Current state**: When `/skill:name` is invoked, the full skill file content is injected inline as `<skill>...</skill>` in the user message. No chunking, no summarization. A 10KB skill file adds ~2,500 tokens to that turn.
|
||||
|
||||
**Opportunity**:
|
||||
- **Cached skill injection**: If the same skill is used across multiple turns (rare but possible), it's re-injected each time. Cache with `cache_control` after first injection.
|
||||
- **Skill digest mode**: Inject a 200-token summary of the skill on first reference; full content only if the model requests it via a `get_skill_detail` tool call. Reduces cost for skills that don't end up being followed (see the sketch after this list).
|
||||
- **Skill prefetching**: Before a known long session (e.g., auto-mode start), pre-inject all likely skills with `cache_control` so they're cached for the entire session.
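A sketch of the digest-mode idea from the list above. The `SkillEntry` shape and the `get_skill_detail` tool are hypothetical:

```typescript
interface SkillEntry { name: string; digest: string; fullPath: string }

// On first reference, inject only the ~200-token digest…
function injectSkillDigest(skill: SkillEntry): string {
  return `<skill name="${skill.name}" mode="digest">\n${skill.digest}\n</skill>`;
}

// …and expose the full content behind an on-demand tool call.
const getSkillDetailTool = {
  name: "get_skill_detail",
  description: "Fetch the full content of a skill previously shown as a digest.",
  input_schema: { type: "object", properties: { name: { type: "string" } }, required: ["name"] },
};
```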
|
||||
|
||||
---
|
||||
|
||||
## 7. Token Estimation Accuracy
|
||||
|
||||
**Current state** (`compaction.ts`, line 216): `chars / 4` heuristic. This overestimates token count for English prose (~3.5 chars/token) and underestimates for code with short identifiers or Unicode.
|
||||
|
||||
**Opportunity**: Use a proper tokenizer.
|
||||
- `@anthropic-ai/tokenizer` (tiktoken-compatible, ships with the SDK) — accurate but ~5ms per call
|
||||
- Tiered approach: use chars/4 for display; use proper tokenizer only for compaction threshold decisions (where accuracy matters)
|
||||
|
||||
**Impact**: More accurate compaction timing, fewer unnecessary compactions, slightly better `COMPACTION_KEEP_RECENT_TOKENS` boundary placement.
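A sketch of the tiered approach, assuming the `countTokens` export from the package named above:

```typescript
import { countTokens } from "@anthropic-ai/tokenizer";

// Cheap estimate for display and frequent checks.
const roughTokens = (text: string) => Math.ceil(text.length / 4);

// Accurate count only where the decision matters (compaction threshold).
function shouldCompact(history: string, contextWindow: number, thresholdPct = 0.7): boolean {
  if (roughTokens(history) < contextWindow * thresholdPct * 0.8) return false; // clearly under budget
  return countTokens(history) >= contextWindow * thresholdPct;                 // precise check near the line
}
```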
|
||||
|
||||
---
|
||||
|
||||
## 8. Format: Markdown over XML for Internal Context
|
||||
|
||||
**Current state**: The message pipeline uses `<skill>`, `<summary>`, `<compaction>` XML wrappers in several places. System prompt sections are largely prose Markdown.
|
||||
|
||||
**Findings**: XML tags carry 15–40% more tokens than equivalent Markdown for the same semantic content, due to paired open/close tags. However, Claude was optimized for XML and shows higher accuracy on tasks requiring precise section parsing.
|
||||
|
||||
**Recommendation**: Audit XML usage in the pipeline and convert to Markdown where the content is:
|
||||
- Non-nested (flat instructions, status messages)
|
||||
- Human-readable rather than machine-parsed by the model
|
||||
- Not requiring precise boundary detection
|
||||
|
||||
Keep XML for: few-shot examples with ambiguous boundaries, skill content (requires precise isolation from surrounding text), compaction summaries that the model must treat as authoritative history.
|
||||
|
||||
**Estimated savings**: 5–15% reduction in system prompt token count.
|
||||
|
||||
---
|
||||
|
||||
## 9. Dynamic Tool Set Delivery
|
||||
|
||||
**Current state**: All tool definitions are included in every LLM request. Tool descriptions consume 60–80% of input tokens in static configurations. As new extensions register tools, the baseline grows linearly.
|
||||
|
||||
**Opportunity** (higher complexity): Implement the three-function Dynamic Toolset pattern:
|
||||
1. `search_tools(query)` — semantic search over tool catalog
|
||||
2. `describe_tools(ids[])` — fetch full schemas on demand
|
||||
3. `execute_tool(id, params)` — unchanged execution
|
||||
|
||||
Speakeasy measured 91–97% token reduction with a 100% task success rate. Trade-off: 2–3x more tool calls and ~50% longer wall time, but net cost is dramatically lower.
|
||||
|
||||
**Feasibility for pi**: The tool registry (`packages/pi-coding-agent/src/core/tool-registry.ts`) already stores tool metadata separately from definitions. The primary engineering work is the semantic search index and the `describe_tools` / `search_tools` tool implementations.
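A sketch of what the three meta-tool definitions might look like; the names follow the pattern above and the schemas are illustrative:

```typescript
const dynamicToolset = [
  {
    name: "search_tools",
    description: "Semantic search over the tool catalog; returns ids and one-line summaries.",
    input_schema: { type: "object", properties: { query: { type: "string" } }, required: ["query"] },
  },
  {
    name: "describe_tools",
    description: "Fetch full JSON schemas for specific tools by id.",
    input_schema: { type: "object", properties: { ids: { type: "array", items: { type: "string" } } }, required: ["ids"] },
  },
  {
    name: "execute_tool",
    description: "Execute a tool by id with the given parameters.",
    input_schema: {
      type: "object",
      properties: { id: { type: "string" }, params: { type: "object" } },
      required: ["id", "params"],
    },
  },
];
```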
|
||||
|
||||
---
|
||||
|
||||
## 10. Cost Attribution and Per-Phase Reporting
|
||||
|
||||
**Current state**: `SessionManager.getUsageTotals()` accumulates cost across the entire session. No per-phase or per-agent breakdown is stored. Cost visibility is limited to the footer total and `GSD_SHOW_TOKEN_COST=1` per-turn display.
|
||||
|
||||
**Opportunity**: Emit structured cost events that extensions can subscribe to:
|
||||
```typescript
|
||||
interface CostCheckpointEvent {
|
||||
type: "cost_checkpoint";
|
||||
label: string; // "discuss-phase", "execute-slice-3"
|
||||
deltaTokens: Usage; // tokens since last checkpoint
|
||||
cumulativeTokens: Usage;
|
||||
cumulativeCost: number;
|
||||
}
|
||||
```
|
||||
|
||||
GSD extension could consume these events to surface per-milestone cost in `/gsd stats` and flag milestones that are disproportionately expensive — enabling budget-aware planning.
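A sketch of such a consumer. The subscription mechanism and event delivery are assumptions — no such hook exists today:

```typescript
// Accumulate spend per checkpoint label by diffing cumulative cost between events.
const phaseCost = new Map<string, number>();
let lastCumulativeCost = 0;

function onCostCheckpoint(event: { label: string; cumulativeCost: number }): void {
  const delta = event.cumulativeCost - lastCumulativeCost;
  lastCumulativeCost = event.cumulativeCost;
  phaseCost.set(event.label, (phaseCost.get(event.label) ?? 0) + delta);
}

// `/gsd stats` could then sort phaseCost by spend to flag disproportionately expensive milestones.
```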
|
||||
|
||||
---
|
||||
|
||||
## Implementation Ordering (if pursued)
|
||||
|
||||
| Priority | Item | Effort | Expected Impact |
|
||||
|----------|------|--------|-----------------|
|
||||
| 1 | Prompt caching (`cache_control`) | Low | 80–90% input cost reduction |
|
||||
| 2 | Earlier compaction threshold (70%) | Trivial | Reduces drift in long sessions |
|
||||
| 3 | Tool result truncation at write time | Low | Reduces context bloat between compactions |
|
||||
| 4 | Context file deduplication | Medium | Variable — high for multi-level AGENTS.md setups |
|
||||
| 5 | Observation masking (default `transformContext`) | Medium | 50%+ on long-running agents |
|
||||
| 6 | Token estimation (proper tokenizer) | Low | Accuracy improvement, minor cost impact |
|
||||
| 7 | Markdown over XML audit | Low | 5–15% system prompt reduction |
|
||||
| 8 | Skill caching with `cache_control` | Low | Meaningful for skill-heavy sessions |
|
||||
| 9 | Dynamic tool set delivery | High | 90%+ on large tool catalogs; major architecture change |
|
||||
| 10 | Per-phase cost attribution events | Medium | Visibility only; enables future budget routing |
|
||||
|
|
@ -262,15 +262,59 @@ PREFERENCES.md
|
|||
├─ resolveProfileDefaults() → model defaults + phase skip defaults
|
||||
├─ resolveInlineLevel() → standard
|
||||
│ └─ prompt builders gate context inclusion by level
|
||||
└─ classifyUnitComplexity() → routes to execution/execution_simple model
|
||||
├─ task plan analysis (steps, files, signals)
|
||||
├─ unit type defaults
|
||||
├─ budget pressure adjustment
|
||||
└─ adaptive learning from routing-history.json
|
||||
├─ classifyUnitComplexity() → routes to execution/execution_simple model
|
||||
│ ├─ task plan analysis (steps, files, signals)
|
||||
│ ├─ unit type defaults
|
||||
│ ├─ budget pressure adjustment
|
||||
│ ├─ adaptive learning from routing-history.json
|
||||
│ └─ capability scoring (when capability_routing: true)
|
||||
│ └─ 7-dimension model profiles × task requirement vectors
|
||||
└─ context_management
|
||||
├─ observation masking (before_provider_request hook)
|
||||
├─ tool result truncation (tool_result_max_chars)
|
||||
└─ phase handoff anchors (injected into prompt builders)
|
||||
```
|
||||
|
||||
The profile is resolved once and flows through the entire dispatch pipeline. Explicit preferences override profile defaults at every layer.
|
||||
|
||||
## Observation Masking
|
||||
|
||||
*Introduced in v2.59.0*
|
||||
|
||||
During auto-mode sessions, tool results accumulate in the conversation history and consume context window space. Observation masking replaces tool result content older than N user turns with a lightweight placeholder before each LLM call. This reduces token usage with zero LLM overhead — no summarization calls, no latency.
|
||||
|
||||
Masking is enabled by default during auto-mode. Configure via preferences:
|
||||
|
||||
```yaml
|
||||
context_management:
|
||||
observation_masking: true # default: true (set false to disable)
|
||||
observation_mask_turns: 8 # keep results from last 8 user turns (range: 1-50)
|
||||
tool_result_max_chars: 800 # truncate individual tool results beyond this length
|
||||
```
|
||||
|
||||
### How It Works
|
||||
|
||||
1. Before each provider request, the `before_provider_request` hook inspects the messages array
|
||||
2. Tool results (`toolResult`, `bashExecution`) older than the configured turn threshold are replaced with `[result masked — within summarized history]`
|
||||
3. Recent tool results (within the keep window) are preserved in full
|
||||
4. All assistant and user messages are always preserved — only tool result content is masked
|
||||
|
||||
This pairs with the existing compaction system: masking reduces context pressure between compactions, and compaction handles the full context reset when the window fills.
|
||||
|
||||
### Tool Result Truncation
|
||||
|
||||
Individual tool results that exceed `tool_result_max_chars` (default: 800) are truncated with a `…[truncated]` marker. This prevents a single large tool output from dominating the context window.
|
||||
|
||||
## Phase Handoff Anchors
|
||||
|
||||
*Introduced in v2.59.0*
|
||||
|
||||
When auto-mode transitions between phases (research → planning → execution), structured JSON anchors are written to `.gsd/milestones/<mid>/anchors/<phase>.json`. Downstream prompt builders inject these anchors so the next phase inherits intent, decisions, blockers, and next steps without re-inferring from artifact files.
|
||||
|
||||
This reduces context drift — the failure mode behind an estimated ~65% of enterprise agent failures, where agents lose track of prior decisions across phase boundaries.
|
||||
|
||||
Anchors are written automatically after successful completion of `research-milestone`, `research-slice`, `plan-milestone`, and `plan-slice` units. No configuration needed.
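The anchor payload follows the `PhaseAnchor` shape introduced in this release; an illustrative example (values are hypothetical):

```typescript
// Illustrative anchor contents, as written to .gsd/milestones/m12/anchors/research-slice.json
const exampleAnchor = {
  phase: "research-slice",
  milestoneId: "m12",
  generatedAt: "2026-04-03T18:21:07.000Z",
  intent: "Completed research-slice for m12-s3",
  decisions: ["Reuse the existing before_provider_request hook rather than adding a new pipeline stage"],
  blockers: [],
  nextSteps: ["Plan slice 3 around the masking and truncation hook"],
};
```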
|
||||
|
||||
## Prompt Compression
|
||||
|
||||
*Introduced in v2.29.0*
|
||||
|
|
|
|||
|
|
@ -9,7 +9,7 @@ import type { ExtensionAPI, ExtensionContext } from "@gsd/pi-coding-agent";
|
|||
import type { GSDPreferences } from "./preferences.js";
|
||||
import { resolveModelWithFallbacksForUnit, resolveDynamicRoutingConfig } from "./preferences.js";
|
||||
import type { ComplexityTier } from "./complexity-classifier.js";
|
||||
import { classifyUnitComplexity, tierLabel } from "./complexity-classifier.js";
|
||||
import { classifyUnitComplexity, tierLabel, extractTaskMetadata } from "./complexity-classifier.js";
|
||||
import { resolveModelForComplexity, escalateTier } from "./model-router.js";
|
||||
import { getLedger, getProjectTotals } from "./metrics.js";
|
||||
import { unitPhaseLabel } from "./auto-dashboard.js";
|
||||
|
|
@ -107,7 +107,15 @@ export async function selectAndApplyModel(
|
|||
}
|
||||
}
|
||||
|
||||
const routingResult = resolveModelForComplexity(classification, modelConfig, routingConfig, availableModelIds);
|
||||
// Extract task metadata for capability scoring
|
||||
const taskMeta = unitType === "execute-task"
|
||||
? extractTaskMetadata(unitId, basePath)
|
||||
: undefined;
|
||||
|
||||
const routingResult = resolveModelForComplexity(
|
||||
classification, modelConfig, routingConfig, availableModelIds,
|
||||
unitType, taskMeta,
|
||||
);
|
||||
|
||||
if (routingResult.wasDowngraded) {
|
||||
effectiveModelConfig = {
|
||||
|
|
@ -115,8 +123,9 @@ export async function selectAndApplyModel(
|
|||
fallbacks: routingResult.fallbacks,
|
||||
};
|
||||
if (verbose) {
|
||||
const method = routingResult.selectionMethod === "capability-scored" ? "capability-scored" : "tier-only";
|
||||
ctx.ui.notify(
|
||||
`Dynamic routing [${tierLabel(classification.tier)}]: ${routingResult.modelId} (${classification.reason})`,
|
||||
`Dynamic routing [${tierLabel(classification.tier)}]: ${routingResult.modelId} (${method} — ${classification.reason})`,
|
||||
"info",
|
||||
);
|
||||
}
|
||||
|
|
|
|||
|
|
@ -26,6 +26,7 @@ import { existsSync } from "node:fs";
|
|||
import { computeBudgets, resolveExecutorContextWindow, truncateAtSectionBoundary } from "./context-budget.js";
|
||||
import { getPendingGates } from "./gsd-db.js";
|
||||
import { formatDecisionsCompact, formatRequirementsCompact } from "./structured-data-formatter.js";
|
||||
import { readPhaseAnchor, formatAnchorForPrompt } from "./phase-anchor.js";
|
||||
|
||||
// ─── Preamble Cap ─────────────────────────────────────────────────────────────
|
||||
|
||||
|
|
@ -906,6 +907,11 @@ export async function buildPlanMilestonePrompt(mid: string, midTitle: string, ba
|
|||
const researchRel = relMilestoneFile(base, mid, "RESEARCH");
|
||||
|
||||
const inlined: string[] = [];
|
||||
|
||||
// Inject phase handoff anchor from research phase (if available)
|
||||
const researchAnchor = readPhaseAnchor(base, mid, "research-milestone");
|
||||
if (researchAnchor) inlined.push(formatAnchorForPrompt(researchAnchor));
|
||||
|
||||
inlined.push(await inlineFile(contextPath, contextRel, "Milestone Context"));
|
||||
const researchInline = await inlineFileOptional(researchPath, researchRel, "Milestone Research");
|
||||
if (researchInline) inlined.push(researchInline);
|
||||
|
|
@ -1033,6 +1039,11 @@ export async function buildPlanSlicePrompt(
|
|||
const researchRel = relSliceFile(base, mid, sid, "RESEARCH");
|
||||
|
||||
const inlined: string[] = [];
|
||||
|
||||
// Inject phase handoff anchor from research phase (if available)
|
||||
const researchSliceAnchor = readPhaseAnchor(base, mid, "research-slice");
|
||||
if (researchSliceAnchor) inlined.push(formatAnchorForPrompt(researchSliceAnchor));
|
||||
|
||||
inlined.push(await inlineFile(roadmapPath, roadmapRel, "Milestone Roadmap"));
|
||||
const researchInline = await inlineFileOptional(researchPath, researchRel, "Slice Research");
|
||||
if (researchInline) inlined.push(researchInline);
|
||||
|
|
@ -1100,6 +1111,9 @@ export async function buildExecuteTaskPrompt(
|
|||
: { level: level as InlineLevel | undefined };
|
||||
const inlineLevel = opts.level ?? resolveInlineLevel();
|
||||
|
||||
// Inject phase handoff anchor from planning phase (if available)
|
||||
const planAnchor = readPhaseAnchor(base, mid, "plan-slice");
|
||||
|
||||
const priorSummaries = opts.carryForwardPaths ?? await getPriorTaskSummaryPaths(mid, sid, tid, base);
|
||||
const priorLines = priorSummaries.length > 0
|
||||
? priorSummaries.map(p => `- \`${p}\``).join("\n")
|
||||
|
|
@ -1190,9 +1204,12 @@ export async function buildExecuteTaskPrompt(
|
|||
? `### Runtime Context\nSource: \`.gsd/RUNTIME.md\`\n\n${runtimeContent.trim()}`
|
||||
: "";
|
||||
|
||||
const phaseAnchorSection = planAnchor ? formatAnchorForPrompt(planAnchor) : "";
|
||||
|
||||
return loadPrompt("execute-task", {
|
||||
overridesSection,
|
||||
runtimeContext,
|
||||
phaseAnchorSection,
|
||||
workingDirectory: base,
|
||||
milestoneId: mid, sliceId: sid, sliceTitle: sTitle, taskId: tid, taskTitle: tTitle,
|
||||
planPath: join(base, relSliceFile(base, mid, sid, "PLAN")),
|
||||
|
|
|
|||
|
|
@ -1205,6 +1205,23 @@ export async function runUnitPhase(
|
|||
s.unitRecoveryCount.delete(`${unitType}/${unitId}`);
|
||||
}
|
||||
|
||||
// Write phase handoff anchor after successful research/planning completion
|
||||
const anchorPhases = new Set(["research-milestone", "research-slice", "plan-milestone", "plan-slice"]);
|
||||
if (artifactVerified && mid && anchorPhases.has(unitType)) {
|
||||
try {
|
||||
const { writePhaseAnchor } = await import("../phase-anchor.js");
|
||||
writePhaseAnchor(s.basePath, mid, {
|
||||
phase: unitType,
|
||||
milestoneId: mid,
|
||||
generatedAt: new Date().toISOString(),
|
||||
intent: `Completed ${unitType} for ${unitId}`,
|
||||
decisions: [],
|
||||
blockers: [],
|
||||
nextSteps: [],
|
||||
});
|
||||
} catch { /* non-fatal — anchor is advisory */ }
|
||||
}
|
||||
|
||||
deps.emitJournalEvent({ ts: new Date().toISOString(), flowId: ic.flowId, seq: ic.nextSeq(), eventType: "unit-end", data: { unitType, unitId, status: unitResult.status, artifactVerified, ...(unitResult.errorContext ? { errorContext: unitResult.errorContext } : {}) }, causedBy: { flowId: ic.flowId, seq: unitStartSeq } });
|
||||
|
||||
return { action: "next", data: { unitStartedAt: s.currentUnit?.startedAt } };
|
||||
|
|
|
|||
|
|
@ -263,13 +263,62 @@ export function registerHooks(pi: ExtensionAPI): void {
|
|||
});
|
||||
|
||||
pi.on("before_provider_request", async (event) => {
|
||||
const modelId = event.model?.id;
|
||||
if (!modelId) return;
|
||||
const { getEffectiveServiceTier, supportsServiceTier } = await import("../service-tier.js");
|
||||
const tier = getEffectiveServiceTier();
|
||||
if (!tier || !supportsServiceTier(modelId)) return;
|
||||
const payload = event.payload as Record<string, unknown> | null;
|
||||
if (!payload || typeof payload !== "object") return;
|
||||
|
||||
// ── Observation Masking ─────────────────────────────────────────────
|
||||
// Replace old tool results with placeholders to reduce context bloat.
|
||||
// Only active during auto-mode when context_management.observation_masking is enabled.
|
||||
if (isAutoActive()) {
|
||||
try {
|
||||
const { loadEffectiveGSDPreferences } = await import("../preferences.js");
|
||||
const prefs = loadEffectiveGSDPreferences();
|
||||
const cmConfig = prefs?.preferences.context_management;
|
||||
|
||||
// Observation masking: replace old tool results with placeholders
|
||||
if (cmConfig?.observation_masking !== false) {
|
||||
const keepTurns = cmConfig?.observation_mask_turns ?? 8;
|
||||
const { createObservationMask } = await import("../context-masker.js");
|
||||
const mask = createObservationMask(keepTurns);
|
||||
const messages = payload.messages;
|
||||
if (Array.isArray(messages)) {
|
||||
payload.messages = mask(messages);
|
||||
}
|
||||
}
|
||||
|
||||
// Tool result truncation: cap individual tool result content length.
|
||||
// In pi-ai format, toolResult messages have role: "toolResult" and content: TextContent[].
|
||||
// Creates new objects to avoid mutating shared conversation state.
|
||||
const maxChars = cmConfig?.tool_result_max_chars ?? 800;
|
||||
const msgs = payload.messages;
|
||||
if (Array.isArray(msgs)) {
|
||||
payload.messages = msgs.map((msg: Record<string, unknown>) => {
|
||||
// Match toolResult messages (role: "toolResult", content is array of content blocks)
|
||||
if (msg?.role === "toolResult" && Array.isArray(msg.content)) {
|
||||
const blocks = msg.content as Array<Record<string, unknown>>;
|
||||
const totalLen = blocks.reduce((sum: number, b) => sum + (typeof b.text === "string" ? b.text.length : 0), 0);
|
||||
if (totalLen > maxChars) {
|
||||
const truncated = blocks.map(b => {
|
||||
if (typeof b.text === "string" && b.text.length > maxChars) {
|
||||
return { ...b, text: b.text.slice(0, maxChars) + "\n…[truncated]" };
|
||||
}
|
||||
return b;
|
||||
});
|
||||
return { ...msg, content: truncated };
|
||||
}
|
||||
}
|
||||
return msg;
|
||||
});
|
||||
}
|
||||
} catch { /* non-fatal */ }
|
||||
}
|
||||
|
||||
// ── Service Tier ────────────────────────────────────────────────────
|
||||
const modelId = event.model?.id;
|
||||
if (!modelId) return payload;
|
||||
const { getEffectiveServiceTier, supportsServiceTier } = await import("../service-tier.js");
|
||||
const tier = getEffectiveServiceTier();
|
||||
if (!tier || !supportsServiceTier(modelId)) return payload;
|
||||
payload.service_tier = tier;
|
||||
return payload;
|
||||
});
|
||||
|
|
|
|||
|
|
@ -15,7 +15,7 @@ import { gsdRoot } from "./paths.js";
|
|||
|
||||
// ─── Types ────────────────────────────────────────────────────────────────────
|
||||
|
||||
export type Classification = "quick-task" | "inject" | "defer" | "replan" | "note";
|
||||
export type Classification = "quick-task" | "inject" | "defer" | "replan" | "note" | "stop" | "backtrack";
|
||||
|
||||
export interface CaptureEntry {
|
||||
id: string;
|
||||
|
|
@ -42,7 +42,7 @@ export interface TriageResult {
|
|||
|
||||
const CAPTURES_FILENAME = "CAPTURES.md";
|
||||
const VALID_CLASSIFICATIONS: readonly string[] = [
|
||||
"quick-task", "inject", "defer", "replan", "note",
|
||||
"quick-task", "inject", "defer", "replan", "note", "stop", "backtrack",
|
||||
];
|
||||
|
||||
// ─── Path Resolution ──────────────────────────────────────────────────────────
|
||||
|
|
|
|||
|
|
@ -212,7 +212,7 @@ function analyzePlanComplexity(
|
|||
/**
|
||||
* Extract task metadata from the task plan file on disk.
|
||||
*/
|
||||
function extractTaskMetadata(unitId: string, basePath: string): TaskMetadata {
|
||||
export function extractTaskMetadata(unitId: string, basePath: string): TaskMetadata {
|
||||
const meta: TaskMetadata = {};
|
||||
const { milestone: mid, slice: sid, task: tid } = parseUnitId(unitId);
|
||||
if (!mid || !sid || !tid) return meta;
|
||||
|
|
|
|||
74  src/resources/extensions/gsd/context-masker.ts (new file)
|
|
@ -0,0 +1,74 @@
|
|||
/**
|
||||
* Observation masking for GSD auto-mode sessions.
|
||||
*
|
||||
* Replaces tool result content older than N turns with a placeholder.
|
||||
* Reduces context bloat between compactions with zero LLM overhead.
|
||||
* Preserves message ordering, roles, and all assistant/user messages.
|
||||
*
|
||||
* Operates on the pi-ai Message[] format (post-convertToLlm, pre-provider):
|
||||
* - toolResult messages: { role: "toolResult", content: TextContent[] }
|
||||
* - bash results are already converted to: { role: "user", content: [{type:"text",text:"..."}] }
|
||||
* and start with "Ran `" from bashExecutionToText.
|
||||
*/
|
||||
|
||||
interface MaskableMessage {
|
||||
role: string;
|
||||
content: unknown;
|
||||
type?: string;
|
||||
[key: string]: unknown;
|
||||
}
|
||||
|
||||
const MASK_PLACEHOLDER = "[result masked — within summarized history]";
|
||||
const MASK_CONTENT_BLOCK = [{ type: "text" as const, text: MASK_PLACEHOLDER }];
|
||||
|
||||
function findTurnBoundary(messages: MaskableMessage[], keepRecentTurns: number): number {
|
||||
let turnsSeen = 0;
|
||||
for (let i = messages.length - 1; i >= 0; i--) {
|
||||
const m = messages[i];
|
||||
// In the LLM payload, genuine user turns have role "user".
|
||||
// Tool results have role "toolResult" and are excluded by this check.
|
||||
if (m.role === "user") {
|
||||
// Skip bash-result user messages (converted from bashExecution) — these aren't real user turns
|
||||
if (isBashResultUserMessage(m)) continue;
|
||||
turnsSeen++;
|
||||
if (turnsSeen >= keepRecentTurns) return i;
|
||||
}
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
/**
|
||||
* Detect user messages that originated from bashExecution.
|
||||
* After convertToLlm, these are {role: "user", content: [{type:"text", text:"Ran `cmd`\n..."}]}.
|
||||
* The bashExecutionToText format always starts with "Ran `".
|
||||
*/
|
||||
function isBashResultUserMessage(m: MaskableMessage): boolean {
|
||||
if (m.role !== "user" || !Array.isArray(m.content)) return false;
|
||||
const first = m.content[0];
|
||||
return first && typeof first === "object" && "text" in first &&
|
||||
typeof first.text === "string" && first.text.startsWith("Ran `");
|
||||
}
|
||||
|
||||
function isMaskableMessage(m: MaskableMessage): boolean {
|
||||
// Tool result messages (role: "toolResult" in pi-ai format)
|
||||
if (m.role === "toolResult") return true;
|
||||
// Bash-result user messages (converted from bashExecution by convertToLlm)
|
||||
if (isBashResultUserMessage(m)) return true;
|
||||
return false;
|
||||
}
|
||||
|
||||
export function createObservationMask(keepRecentTurns: number = 8) {
|
||||
return (messages: MaskableMessage[]): MaskableMessage[] => {
|
||||
const boundary = findTurnBoundary(messages, keepRecentTurns);
|
||||
if (boundary === 0) return messages;
|
||||
|
||||
return messages.map((m, i) => {
|
||||
if (i >= boundary) return m;
|
||||
if (isMaskableMessage(m)) {
|
||||
// Content may be string or array of content blocks — always replace with array
|
||||
return { ...m, content: MASK_CONTENT_BLOCK };
|
||||
}
|
||||
return m;
|
||||
});
|
||||
};
|
||||
}
|
||||
|
|
@ -189,6 +189,13 @@ Setting `prefer_skills: []` does **not** disable skill discovery — it just mea
|
|||
- `budget_pressure`: boolean — downgrade model tier when budget is under pressure. Default: `true`.
|
||||
- `cross_provider`: boolean — allow routing across different providers. Default: `true`.
|
||||
- `hooks`: boolean — enable routing hooks. Default: `true`.
|
||||
- `capability_routing`: boolean — enable capability-profile scoring for model selection within a tier. Requires `enabled: true`. Default: `false`.
|
||||
|
||||
- `context_management`: configures context hygiene for auto-mode sessions. Keys:
|
||||
- `observation_masking`: boolean — mask old tool results to reduce context bloat. Default: `true`.
|
||||
- `observation_mask_turns`: number — keep this many recent turns verbatim (1-50). Default: `8`.
|
||||
- `compaction_threshold_percent`: number — trigger compaction at this % of context window (0.5-0.95). Lower values fire compaction earlier, reducing drift. Default: `0.70`.
|
||||
- `tool_result_max_chars`: number — max chars per tool result in GSD sessions (200-10000). Default: `800`.
|
||||
|
||||
- `auto_visualize`: boolean — show a visualizer hint after each milestone completion in auto-mode. Default: `false`.
|
||||
|
||||
|
|
|
|||
|
|
@ -10,6 +10,7 @@ import type { ResolvedModelConfig } from "./preferences.js";
|
|||
|
||||
export interface DynamicRoutingConfig {
|
||||
enabled?: boolean;
|
||||
capability_routing?: boolean; // default: false — enable capability profile scoring
|
||||
tier_models?: {
|
||||
light?: string;
|
||||
standard?: string;
|
||||
|
|
@ -32,6 +33,12 @@ export interface RoutingDecision {
|
|||
wasDowngraded: boolean;
|
||||
/** Human-readable reason for this decision */
|
||||
reason: string;
|
||||
/** How the model was selected. */
|
||||
selectionMethod?: "tier-only" | "capability-scored";
|
||||
/** Capability scores per model (when capability-scored). */
|
||||
capabilityScores?: Record<string, number>;
|
||||
/** Task requirement vector (when capability-scored). */
|
||||
taskRequirements?: Partial<Record<string, number>>;
|
||||
}
|
||||
|
||||
// ─── Known Model Tiers ───────────────────────────────────────────────────────
|
||||
|
|
@ -114,6 +121,91 @@ const MODEL_COST_PER_1K_INPUT: Record<string, number> = {
|
|||
"deepseek-chat": 0.00014,
|
||||
};
|
||||
|
||||
// ─── Capability Profiles (ADR-004 Phase 2) ──────────────────────────────────
|
||||
// 7-dimension profiles, 0–100 normalized. Models without a profile
|
||||
// score 50 uniformly — capability scoring is a no-op for them.
|
||||
|
||||
export interface ModelCapabilities {
|
||||
coding: number;
|
||||
debugging: number;
|
||||
research: number;
|
||||
reasoning: number;
|
||||
speed: number;
|
||||
longContext: number;
|
||||
instruction: number;
|
||||
}
|
||||
|
||||
export const MODEL_CAPABILITY_PROFILES: Record<string, ModelCapabilities> = {
|
||||
"claude-opus-4-6": { coding: 95, debugging: 90, research: 85, reasoning: 95, speed: 30, longContext: 80, instruction: 90 },
|
||||
"claude-sonnet-4-6": { coding: 85, debugging: 80, research: 75, reasoning: 80, speed: 60, longContext: 75, instruction: 85 },
|
||||
"claude-haiku-4-5": { coding: 60, debugging: 50, research: 45, reasoning: 50, speed: 95, longContext: 50, instruction: 75 },
|
||||
"gpt-4o": { coding: 80, debugging: 75, research: 70, reasoning: 75, speed: 65, longContext: 70, instruction: 80 },
|
||||
"gpt-4o-mini": { coding: 55, debugging: 45, research: 40, reasoning: 45, speed: 90, longContext: 45, instruction: 70 },
|
||||
"gemini-2.5-pro": { coding: 75, debugging: 70, research: 85, reasoning: 75, speed: 55, longContext: 90, instruction: 75 },
|
||||
"gemini-2.0-flash": { coding: 50, debugging: 40, research: 50, reasoning: 40, speed: 95, longContext: 60, instruction: 65 },
|
||||
"deepseek-chat": { coding: 75, debugging: 65, research: 55, reasoning: 70, speed: 70, longContext: 55, instruction: 65 },
|
||||
"o3": { coding: 80, debugging: 85, research: 80, reasoning: 92, speed: 25, longContext: 70, instruction: 85 },
|
||||
};
|
||||
|
||||
const BASE_REQUIREMENTS: Record<string, Partial<Record<keyof ModelCapabilities, number>>> = {
|
||||
"execute-task": { coding: 0.9, instruction: 0.7, speed: 0.3 },
|
||||
"research-milestone": { research: 0.9, longContext: 0.7, reasoning: 0.5 },
|
||||
"research-slice": { research: 0.9, longContext: 0.7, reasoning: 0.5 },
|
||||
"plan-milestone": { reasoning: 0.9, coding: 0.5 },
|
||||
"plan-slice": { reasoning: 0.9, coding: 0.5 },
|
||||
"replan-slice": { reasoning: 0.9, debugging: 0.6, coding: 0.5 },
|
||||
"reassess-roadmap": { reasoning: 0.9, research: 0.5 },
|
||||
"complete-slice": { instruction: 0.8, speed: 0.7 },
|
||||
"run-uat": { instruction: 0.7, speed: 0.8 },
|
||||
"discuss-milestone": { reasoning: 0.6, instruction: 0.7 },
|
||||
"complete-milestone": { instruction: 0.8, reasoning: 0.5 },
|
||||
};
|
||||
|
||||
/**
|
||||
* Compute a task requirement vector from unit type and optional metadata.
|
||||
*/
|
||||
export function computeTaskRequirements(
|
||||
unitType: string,
|
||||
metadata?: { tags?: string[]; complexityKeywords?: string[]; fileCount?: number; estimatedLines?: number },
|
||||
): Partial<Record<keyof ModelCapabilities, number>> {
|
||||
const base = { ...(BASE_REQUIREMENTS[unitType] ?? { reasoning: 0.5 }) };
|
||||
|
||||
if (unitType === "execute-task" && metadata) {
|
||||
if (metadata.tags?.some(t => /^(docs?|readme|comment|config|typo|rename)$/i.test(t))) {
|
||||
return { ...base, instruction: 0.9, coding: 0.3, speed: 0.7 };
|
||||
}
|
||||
if (metadata.complexityKeywords?.some(k => k === "concurrency" || k === "compatibility")) {
|
||||
return { ...base, debugging: 0.9, reasoning: 0.8 };
|
||||
}
|
||||
if (metadata.complexityKeywords?.some(k => k === "migration" || k === "architecture")) {
|
||||
return { ...base, reasoning: 0.9, coding: 0.8 };
|
||||
}
|
||||
if ((metadata.fileCount ?? 0) >= 6 || (metadata.estimatedLines ?? 0) >= 500) {
|
||||
return { ...base, coding: 0.9, reasoning: 0.7 };
|
||||
}
|
||||
}
|
||||
|
||||
return base;
|
||||
}
|
||||
|
||||
/**
|
||||
* Score a model against a task requirement vector.
|
||||
* Returns weighted average in range 0–100. Returns 50 for empty requirements.
|
||||
*/
|
||||
export function scoreModel(
|
||||
capabilities: ModelCapabilities,
|
||||
requirements: Partial<Record<keyof ModelCapabilities, number>>,
|
||||
): number {
|
||||
let weightedSum = 0;
|
||||
let weightSum = 0;
|
||||
for (const [dim, weight] of Object.entries(requirements)) {
|
||||
const capability = capabilities[dim as keyof ModelCapabilities] ?? 50;
|
||||
weightedSum += weight * capability;
|
||||
weightSum += weight;
|
||||
}
|
||||
return weightSum > 0 ? weightedSum / weightSum : 50;
|
||||
}
|
||||
|
||||
// ─── Public API ──────────────────────────────────────────────────────────────
|
||||
|
||||
/**
|
||||
|
|
@ -132,6 +224,8 @@ export function resolveModelForComplexity(
|
|||
phaseConfig: ResolvedModelConfig | undefined,
|
||||
routingConfig: DynamicRoutingConfig,
|
||||
availableModelIds: string[],
|
||||
unitType?: string,
|
||||
metadata?: { tags?: string[]; complexityKeywords?: string[]; fileCount?: number; estimatedLines?: number },
|
||||
): RoutingDecision {
|
||||
// If no phase config or routing disabled, pass through
|
||||
if (!phaseConfig || !routingConfig.enabled) {
|
||||
|
|
@ -175,25 +269,40 @@ export function resolveModelForComplexity(
|
|||
}
|
||||
|
||||
// Find the best model for the requested tier
|
||||
const targetModelId = findModelForTier(
|
||||
requestedTier,
|
||||
routingConfig,
|
||||
availableModelIds,
|
||||
routingConfig.cross_provider !== false,
|
||||
);
|
||||
const useCapabilityScoring = routingConfig.capability_routing && unitType;
|
||||
|
||||
let targetModelId: string | null;
|
||||
let capabilityScores: Record<string, number> | undefined;
|
||||
let taskRequirements: Partial<Record<string, number>> | undefined;
|
||||
let selectionMethod: "tier-only" | "capability-scored" = "tier-only";
|
||||
|
||||
if (useCapabilityScoring) {
|
||||
const result = findModelForTierWithCapability(
|
||||
requestedTier, routingConfig, availableModelIds,
|
||||
routingConfig.cross_provider !== false, unitType, metadata,
|
||||
);
|
||||
targetModelId = result.modelId;
|
||||
capabilityScores = Object.keys(result.scores).length > 0 ? result.scores : undefined;
|
||||
taskRequirements = Object.keys(result.requirements).length > 0 ? result.requirements : undefined;
|
||||
selectionMethod = capabilityScores ? "capability-scored" : "tier-only";
|
||||
} else {
|
||||
targetModelId = findModelForTier(
|
||||
requestedTier, routingConfig, availableModelIds,
|
||||
routingConfig.cross_provider !== false,
|
||||
);
|
||||
}
|
||||
|
||||
if (!targetModelId) {
|
||||
// No suitable model found — use configured primary
|
||||
return {
|
||||
modelId: configuredPrimary,
|
||||
fallbacks: phaseConfig.fallbacks,
|
||||
tier: requestedTier,
|
||||
wasDowngraded: false,
|
||||
reason: `no ${requestedTier}-tier model available`,
|
||||
selectionMethod,
|
||||
};
|
||||
}
|
||||
|
||||
// Build fallback chain: [downgraded_model, ...configured_fallbacks, configured_primary]
|
||||
const fallbacks = [
|
||||
...phaseConfig.fallbacks.filter(f => f !== targetModelId),
|
||||
configuredPrimary,
|
||||
|
|
@ -205,6 +314,9 @@ export function resolveModelForComplexity(
|
|||
tier: requestedTier,
|
||||
wasDowngraded: true,
|
||||
reason: classification.reason,
|
||||
selectionMethod,
|
||||
capabilityScores,
|
||||
taskRequirements,
|
||||
};
|
||||
}
|
||||
|
||||
|
|
@ -226,6 +338,7 @@ export function escalateTier(currentTier: ComplexityTier): ComplexityTier | null
|
|||
export function defaultRoutingConfig(): DynamicRoutingConfig {
|
||||
return {
|
||||
enabled: true,
|
||||
capability_routing: false,
|
||||
escalate_on_failure: true,
|
||||
budget_pressure: true,
|
||||
cross_provider: true,
|
||||
|
|
@ -298,6 +411,56 @@ function findModelForTier(
|
|||
return candidates[0] ?? null;
|
||||
}
|
||||
|
||||
function findModelForTierWithCapability(
|
||||
tier: ComplexityTier,
|
||||
config: DynamicRoutingConfig,
|
||||
availableModelIds: string[],
|
||||
crossProvider: boolean,
|
||||
unitType: string,
|
||||
metadata?: { tags?: string[]; complexityKeywords?: string[]; fileCount?: number; estimatedLines?: number },
|
||||
): { modelId: string | null; scores: Record<string, number>; requirements: Partial<Record<string, number>> } {
|
||||
const explicitModel = config.tier_models?.[tier];
|
||||
if (explicitModel) {
|
||||
const match = availableModelIds.find(id => {
|
||||
const bareAvail = id.includes("/") ? id.split("/").pop()! : id;
|
||||
const bareExplicit = explicitModel.includes("/") ? explicitModel.split("/").pop()! : explicitModel;
|
||||
return bareAvail === bareExplicit || id === explicitModel;
|
||||
});
|
||||
if (match) return { modelId: match, scores: {}, requirements: {} };
|
||||
}
|
||||
|
||||
const requirements = computeTaskRequirements(unitType, metadata);
|
||||
const candidates = availableModelIds.filter(id => getModelTier(id) === tier);
|
||||
if (candidates.length === 0) return { modelId: null, scores: {}, requirements };
|
||||
|
||||
const scores: Record<string, number> = {};
|
||||
for (const id of candidates) {
|
||||
const bareId = id.includes("/") ? id.split("/").pop()! : id;
|
||||
const profile = getModelProfile(bareId);
|
||||
scores[id] = scoreModel(profile, requirements);
|
||||
}
|
||||
|
||||
candidates.sort((a, b) => {
|
||||
const scoreDiff = scores[b] - scores[a];
|
||||
if (Math.abs(scoreDiff) > 2) return scoreDiff;
|
||||
if (crossProvider) {
|
||||
const costDiff = getModelCost(a) - getModelCost(b);
|
||||
if (costDiff !== 0) return costDiff;
|
||||
}
|
||||
return a.localeCompare(b);
|
||||
});
|
||||
|
||||
return { modelId: candidates[0], scores, requirements };
|
||||
}
|
||||
|
||||
function getModelProfile(bareId: string): ModelCapabilities {
|
||||
if (MODEL_CAPABILITY_PROFILES[bareId]) return MODEL_CAPABILITY_PROFILES[bareId];
|
||||
for (const [knownId, profile] of Object.entries(MODEL_CAPABILITY_PROFILES)) {
|
||||
if (bareId.includes(knownId) || knownId.includes(bareId)) return profile;
|
||||
}
|
||||
return { coding: 50, debugging: 50, research: 50, reasoning: 50, speed: 50, longContext: 50, instruction: 50 };
|
||||
}
|
||||
|
||||
function getModelCost(modelId: string): number {
|
||||
const bareId = modelId.includes("/") ? modelId.split("/").pop()! : modelId;
|
||||
|
||||
|
|
|
|||
71  src/resources/extensions/gsd/phase-anchor.ts (new file)
|
|
@ -0,0 +1,71 @@
|
|||
/**
|
||||
* Phase handoff anchors — compact structured summaries written between
|
||||
* GSD auto-mode phases so downstream agents inherit decisions, blockers,
|
||||
* and intent without re-inferring from scratch.
|
||||
*/
|
||||
|
||||
import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";
|
||||
import { join } from "node:path";
|
||||
import { gsdRoot } from "./paths.js";
|
||||
|
||||
export interface PhaseAnchor {
|
||||
phase: string;
|
||||
milestoneId: string;
|
||||
generatedAt: string;
|
||||
intent: string;
|
||||
decisions: string[];
|
||||
blockers: string[];
|
||||
nextSteps: string[];
|
||||
}
|
||||
|
||||
function anchorsDir(basePath: string, milestoneId: string): string {
|
||||
return join(gsdRoot(basePath), "milestones", milestoneId, "anchors");
|
||||
}
|
||||
|
||||
function anchorPath(basePath: string, milestoneId: string, phase: string): string {
|
||||
return join(anchorsDir(basePath, milestoneId), `${phase}.json`);
|
||||
}
|
||||
|
||||
export function writePhaseAnchor(basePath: string, milestoneId: string, anchor: PhaseAnchor): void {
|
||||
const dir = anchorsDir(basePath, milestoneId);
|
||||
if (!existsSync(dir)) {
|
||||
mkdirSync(dir, { recursive: true });
|
||||
}
|
||||
writeFileSync(anchorPath(basePath, milestoneId, anchor.phase), JSON.stringify(anchor, null, 2), "utf-8");
|
||||
}
|
||||
|
||||
export function readPhaseAnchor(basePath: string, milestoneId: string, phase: string): PhaseAnchor | null {
|
||||
const path = anchorPath(basePath, milestoneId, phase);
|
||||
if (!existsSync(path)) return null;
|
||||
try {
|
||||
return JSON.parse(readFileSync(path, "utf-8")) as PhaseAnchor;
|
||||
} catch {
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
export function formatAnchorForPrompt(anchor: PhaseAnchor): string {
|
||||
const lines: string[] = [
|
||||
`## Handoff from ${anchor.phase}`,
|
||||
"",
|
||||
`**Intent:** ${anchor.intent}`,
|
||||
];
|
||||
|
||||
if (anchor.decisions.length > 0) {
|
||||
lines.push("", "**Decisions:**");
|
||||
for (const d of anchor.decisions) lines.push(`- ${d}`);
|
||||
}
|
||||
|
||||
if (anchor.blockers.length > 0) {
|
||||
lines.push("", "**Blockers:**");
|
||||
for (const b of anchor.blockers) lines.push(`- ${b}`);
|
||||
}
|
||||
|
||||
if (anchor.nextSteps.length > 0) {
|
||||
lines.push("", "**Next steps:**");
|
||||
for (const s of anchor.nextSteps) lines.push(`- ${s}`);
|
||||
}
|
||||
|
||||
lines.push("", "---");
|
||||
return lines.join("\n");
|
||||
}
|
||||
|
|
@ -21,6 +21,13 @@ import type {
|
|||
GateEvaluationConfig,
|
||||
} from "./types.js";
|
||||
import type { DynamicRoutingConfig } from "./model-router.js";
|
||||
|
||||
export interface ContextManagementConfig {
|
||||
observation_masking?: boolean; // default: true
|
||||
observation_mask_turns?: number; // default: 8, range: 1-50
|
||||
compaction_threshold_percent?: number; // default: 0.70, range: 0.5-0.95
|
||||
tool_result_max_chars?: number; // default: 800, range: 200-10000
|
||||
}
|
||||
import type { GitHubSyncConfig } from "../github-sync/types.js";
|
||||
|
||||
// ─── Workflow Modes ──────────────────────────────────────────────────────────
|
||||
|
|
@ -94,6 +101,7 @@ export const KNOWN_PREFERENCE_KEYS = new Set<string>([
|
|||
"forensics_dedup",
|
||||
"show_token_cost",
|
||||
"stale_commit_threshold_minutes",
|
||||
"context_management",
|
||||
"experimental",
|
||||
]);
|
||||
|
||||
|
|
@ -227,6 +235,7 @@ export interface GSDPreferences {
|
|||
post_unit_hooks?: PostUnitHookConfig[];
|
||||
pre_dispatch_hooks?: PreDispatchHookConfig[];
|
||||
dynamic_routing?: DynamicRoutingConfig;
|
||||
context_management?: ContextManagementConfig;
|
||||
token_profile?: TokenProfile;
|
||||
phases?: PhaseSkipPreferences;
|
||||
auto_visualize?: boolean;
|
||||
|
|
|
|||
|
|
@ -428,6 +428,10 @@ export function validatePreferences(preferences: GSDPreferences): {
|
|||
if (typeof dr.hooks === "boolean") validDr.hooks = dr.hooks;
|
||||
else errors.push("dynamic_routing.hooks must be a boolean");
|
||||
}
|
||||
if (dr.capability_routing !== undefined) {
|
||||
if (typeof dr.capability_routing === "boolean") validDr.capability_routing = dr.capability_routing;
|
||||
else errors.push("dynamic_routing.capability_routing must be a boolean");
|
||||
}
|
||||
if (dr.tier_models !== undefined) {
|
||||
if (typeof dr.tier_models === "object" && dr.tier_models !== null) {
|
||||
const tm = dr.tier_models as Record<string, unknown>;
|
||||
|
|
@@ -452,6 +456,40 @@ export function validatePreferences(preferences: GSDPreferences): {
    }
  }

  // ─── Context Management ──────────────────────────────────────────────
  if (preferences.context_management !== undefined) {
    if (typeof preferences.context_management === "object" && preferences.context_management !== null) {
      const cm = preferences.context_management as unknown as Record<string, unknown>;
      const validCm: Record<string, unknown> = {};

      if (cm.observation_masking !== undefined) {
        if (typeof cm.observation_masking === "boolean") validCm.observation_masking = cm.observation_masking;
        else errors.push("context_management.observation_masking must be a boolean");
      }
      if (cm.observation_mask_turns !== undefined) {
        const turns = cm.observation_mask_turns;
        if (typeof turns === "number" && turns >= 1 && turns <= 50) validCm.observation_mask_turns = turns;
        else errors.push("context_management.observation_mask_turns must be a number between 1 and 50");
      }
      if (cm.compaction_threshold_percent !== undefined) {
        const pct = cm.compaction_threshold_percent;
        if (typeof pct === "number" && pct >= 0.5 && pct <= 0.95) validCm.compaction_threshold_percent = pct;
        else errors.push("context_management.compaction_threshold_percent must be a number between 0.5 and 0.95");
      }
      if (cm.tool_result_max_chars !== undefined) {
        const chars = cm.tool_result_max_chars;
        if (typeof chars === "number" && chars >= 200 && chars <= 10000) validCm.tool_result_max_chars = chars;
        else errors.push("context_management.tool_result_max_chars must be a number between 200 and 10000");
      }

      if (Object.keys(validCm).length > 0) {
        validated.context_management = validCm as any;
      }
    } else {
      errors.push("context_management must be an object");
    }
  }

  // ─── Parallel Config ────────────────────────────────────────────────────
  if (preferences.parallel && typeof preferences.parallel === "object") {
    const p = preferences.parallel as unknown as Record<string, unknown>;
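A hedged usage sketch: assuming validatePreferences returns the errors array populated above (its exact return shape is not shown in this hunk), an out-of-range value is reported as a descriptive error rather than thrown:

// Sketch only; the destructured return shape is an assumption based on errors.push above.
const { errors } = validatePreferences({
  context_management: { observation_mask_turns: 99 }, // outside the 1-50 range
} as GSDPreferences);
// errors would then include:
// "context_management.observation_mask_turns must be a number between 1 and 50"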
@@ -12,6 +12,8 @@ A researcher explored the codebase and a planner decomposed the work — you are

{{runtimeContext}}

{{phaseAnchorSection}}

{{resumeSection}}

{{carryForwardSection}}
src/resources/extensions/gsd/tests/context-masker.test.ts (new file, 122 lines)

@@ -0,0 +1,122 @@
import test from "node:test";
import assert from "node:assert/strict";

import { createObservationMask } from "../context-masker.js";

// These helpers produce messages in the pi-ai LLM payload format
// (post-convertToLlm, pre-provider), which is what before_provider_request sees.

function userMsg(content: string) {
  return { role: "user", content: [{ type: "text", text: content }] };
}

function assistantMsg(content: string) {
  return { role: "assistant", content: [{ type: "text", text: content }] };
}

/** toolResult in pi-ai format: role "toolResult", content as TextContent[] */
function toolResult(text: string) {
  return { role: "toolResult", content: [{ type: "text", text }], toolCallId: "toolu_test", toolName: "Read", isError: false };
}

/** bashExecution after convertToLlm: becomes a user message with "Ran `cmd`" prefix */
function bashResult(text: string) {
  return { role: "user", content: [{ type: "text", text: `Ran \`echo test\`\n\`\`\`\n${text}\n\`\`\`` }] };
}

const MASK_TEXT = "[result masked — within summarized history]";

test("masks nothing when message count is within keepRecentTurns", () => {
  const mask = createObservationMask(8);
  const messages = [
    userMsg("hello"),
    assistantMsg("hi"),
    toolResult("file contents"),
  ];
  const result = mask(messages as any);
  assert.equal(result.length, 3);
  assert.deepEqual((result[2].content as any)[0].text, "file contents");
});

test("masks tool results older than keepRecentTurns", () => {
  const mask = createObservationMask(2);
  const messages = [
    userMsg("turn 1"),
    toolResult("old tool output"),
    assistantMsg("response 1"),
    userMsg("turn 2"),
    toolResult("newer tool output"),
    assistantMsg("response 2"),
    userMsg("turn 3"),
    toolResult("newest tool output"),
    assistantMsg("response 3"),
  ];
  const result = mask(messages as any);
  // Old tool result (before boundary) should be masked
  assert.equal((result[1].content as any)[0].text, MASK_TEXT);
  // Recent tool results (within keep window) should be preserved
  assert.equal((result[4].content as any)[0].text, "newer tool output");
  assert.equal((result[7].content as any)[0].text, "newest tool output");
});

test("never masks assistant messages", () => {
  const mask = createObservationMask(1);
  const messages = [
    userMsg("turn 1"),
    assistantMsg("old reasoning"),
    userMsg("turn 2"),
    assistantMsg("new reasoning"),
  ];
  const result = mask(messages as any);
  assert.equal((result[1].content as any)[0].text, "old reasoning");
  assert.equal((result[3].content as any)[0].text, "new reasoning");
});

test("never masks user messages", () => {
  const mask = createObservationMask(1);
  const messages = [
    userMsg("old user message"),
    assistantMsg("response"),
    userMsg("new user message"),
    assistantMsg("response"),
  ];
  const result = mask(messages as any);
  assert.equal((result[0].content as any)[0].text, "old user message");
});

test("masks bash result user messages", () => {
  const mask = createObservationMask(1);
  const messages = [
    userMsg("turn 1"),
    bashResult("huge log output"),
    assistantMsg("response 1"),
    userMsg("turn 2"),
    assistantMsg("response 2"),
  ];
  const result = mask(messages as any);
  assert.equal((result[1].content as any)[0].text, MASK_TEXT);
});

test("returns same array length", () => {
  const mask = createObservationMask(1);
  const messages = [
    userMsg("a"), toolResult("b"), assistantMsg("c"),
    userMsg("d"), toolResult("e"), assistantMsg("f"),
  ];
  const result = mask(messages as any);
  assert.equal(result.length, messages.length);
});

test("masks toolResult by role, not by type field", () => {
  const mask = createObservationMask(1);
  const messages = [
    userMsg("turn 1"),
    // This is the actual pi-ai format: role "toolResult", no type field
    { role: "toolResult", content: [{ type: "text", text: "old result" }], toolCallId: "t1", toolName: "Read", isError: false },
    assistantMsg("response 1"),
    userMsg("turn 2"),
    assistantMsg("response 2"),
  ];
  const result = mask(messages as any);
  assert.equal((result[1].content as any)[0].text, MASK_TEXT);
});
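For readers without context-masker.ts open, here is a minimal sketch consistent with these tests; the turn-counting heuristic (count only genuine user messages, treat bash results as observations) is an assumption, and the shipped implementation may differ:

// A minimal sketch, not the shipped implementation. Types are simplified to the
// parts of the pi-ai LLM payload that the tests above rely on.
type LlmText = { type: "text"; text: string };
type LlmMessage = { role: string; content?: LlmText[]; [key: string]: unknown };

const MASK_TEXT = "[result masked — within summarized history]";

function createObservationMaskSketch(keepRecentTurns: number) {
  const isBashResult = (m: LlmMessage) =>
    m.role === "user" && m.content?.[0]?.text?.startsWith("Ran `") === true;

  return (messages: LlmMessage[]): LlmMessage[] => {
    // Assumption: a "turn" is a genuine user message; bash results also arrive
    // as user messages, so they are excluded from the turn count.
    const userTurns = messages
      .map((m, i) => (m.role === "user" && !isBashResult(m) ? i : -1))
      .filter((i) => i >= 0);
    // Everything before the Nth-most-recent user turn counts as old history.
    const boundary =
      userTurns.length > keepRecentTurns
        ? userTurns[userTurns.length - keepRecentTurns] ?? 0
        : 0;
    return messages.map((m, i) => {
      const isObservation = m.role === "toolResult" || isBashResult(m);
      if (i >= boundary || !isObservation) return m;
      // Immutable spread so shared conversation objects are never mutated.
      return { ...m, content: [{ type: "text", text: MASK_TEXT }] };
    });
  };
}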
@@ -5,8 +5,11 @@ import {
  resolveModelForComplexity,
  escalateTier,
  defaultRoutingConfig,
  scoreModel,
  computeTaskRequirements,
  MODEL_CAPABILITY_PROFILES,
} from "../model-router.js";
-import type { DynamicRoutingConfig, RoutingDecision } from "../model-router.js";
+import type { DynamicRoutingConfig, RoutingDecision, ModelCapabilities } from "../model-router.js";
import type { ClassificationResult } from "../complexity-classifier.js";

// ─── Helpers ─────────────────────────────────────────────────────────────────

@@ -206,6 +209,89 @@ test("#2192: known model is still downgraded normally", () => {
  assert.notEqual(result.modelId, "claude-opus-4-6");
});

// ─── Capability Scoring (ADR-004 Phase 2) ───────────────────────────────────

test("defaultRoutingConfig includes capability_routing: false", () => {
  const config = defaultRoutingConfig();
  assert.equal(config.capability_routing, false);
});

test("scoreModel computes weighted average of capability × requirement", () => {
  const caps: ModelCapabilities = {
    coding: 90, debugging: 80, research: 70,
    reasoning: 85, speed: 50, longContext: 60, instruction: 75,
  };
  const reqs = { coding: 0.9, reasoning: 0.5 };
  const score = scoreModel(caps, reqs);
  // Expected: (0.9*90 + 0.5*85) / (0.9 + 0.5) = (81 + 42.5) / 1.4 = 88.21...
  assert.ok(Math.abs(score - 88.21) < 0.1, `score ${score} should be ~88.21`);
});

test("scoreModel returns 50 for empty requirements", () => {
  const caps: ModelCapabilities = {
    coding: 90, debugging: 80, research: 70,
    reasoning: 85, speed: 50, longContext: 60, instruction: 75,
  };
  const score = scoreModel(caps, {});
  assert.equal(score, 50);
});

test("computeTaskRequirements returns base vector for known unit type", () => {
  const reqs = computeTaskRequirements("execute-task");
  assert.ok(reqs.coding !== undefined && reqs.coding > 0);
});

test("computeTaskRequirements boosts instruction for docs-tagged tasks", () => {
  const reqs = computeTaskRequirements("execute-task", { tags: ["docs"] });
  assert.ok((reqs.instruction ?? 0) >= 0.8);
  assert.ok((reqs.coding ?? 1) <= 0.4);
});

test("computeTaskRequirements returns generic vector for unknown unit type", () => {
  const reqs = computeTaskRequirements("unknown-unit");
  assert.ok(reqs.reasoning !== undefined);
});

test("resolveModelForComplexity uses capability scoring when enabled", () => {
  const config: DynamicRoutingConfig = {
    ...defaultRoutingConfig(),
    enabled: true,
    capability_routing: true,
  };
  const result = resolveModelForComplexity(
    makeClassification("light"),
    { primary: "claude-opus-4-6", fallbacks: [] },
    config,
    ["claude-opus-4-6", "claude-haiku-4-5", "gpt-4o-mini"],
    "execute-task",
  );
  assert.equal(result.wasDowngraded, true);
  assert.equal(result.selectionMethod, "capability-scored");
});

test("resolveModelForComplexity falls back to tier-only when capability_routing is false", () => {
  const config: DynamicRoutingConfig = {
    ...defaultRoutingConfig(),
    enabled: true,
    capability_routing: false,
  };
  const result = resolveModelForComplexity(
    makeClassification("light"),
    { primary: "claude-opus-4-6", fallbacks: [] },
    config,
    ["claude-opus-4-6", "claude-haiku-4-5", "gpt-4o-mini"],
  );
  assert.equal(result.wasDowngraded, true);
  assert.ok(!result.selectionMethod || result.selectionMethod === "tier-only");
});

test("MODEL_CAPABILITY_PROFILES has entries for core models", () => {
  const profiledModels = Object.keys(MODEL_CAPABILITY_PROFILES);
  assert.ok(profiledModels.length >= 9, `Expected ≥9 profiles, got ${profiledModels.length}`);
  assert.ok(MODEL_CAPABILITY_PROFILES["claude-opus-4-6"]);
  assert.ok(MODEL_CAPABILITY_PROFILES["claude-haiku-4-5"]);
});

// ─── #2885: openai-codex and modern OpenAI models in tier map ────────────────

test("#2885: openai-codex light-tier models are recognized", () => {
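A minimal sketch of the weighted-average scoring these tests pin down; the shipped scoreModel may normalize or clamp differently:

// Sketch only; assumes the requirement vector keys line up with ModelCapabilities keys.
function scoreModelSketch(
  caps: Record<string, number>,
  reqs: Record<string, number>,
): number {
  const entries = Object.entries(reqs);
  if (entries.length === 0) return 50; // neutral score when nothing is required
  let weighted = 0;
  let totalWeight = 0;
  for (const [dimension, weight] of entries) {
    weighted += weight * (caps[dimension] ?? 0);
    totalWeight += weight;
  }
  return weighted / totalWeight; // e.g. (0.9*90 + 0.5*85) / 1.4 ≈ 88.21
}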
src/resources/extensions/gsd/tests/phase-anchor.test.ts (new file, 83 lines)

@@ -0,0 +1,83 @@
import test from "node:test";
import assert from "node:assert/strict";
import { mkdtempSync, mkdirSync, rmSync, existsSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";

import { writePhaseAnchor, readPhaseAnchor, formatAnchorForPrompt } from "../phase-anchor.js";
import type { PhaseAnchor } from "../phase-anchor.js";

function makeTempBase(): string {
  const tmp = mkdtempSync(join(tmpdir(), "gsd-anchor-test-"));
  mkdirSync(join(tmp, ".gsd", "milestones", "M001", "anchors"), { recursive: true });
  return tmp;
}

test("writePhaseAnchor creates anchor file in correct location", () => {
  const base = makeTempBase();
  try {
    const anchor: PhaseAnchor = {
      phase: "discuss",
      milestoneId: "M001",
      generatedAt: new Date().toISOString(),
      intent: "Define authentication requirements",
      decisions: ["Use JWT tokens", "Session expiry 24h"],
      blockers: [],
      nextSteps: ["Plan the implementation slices"],
    };
    writePhaseAnchor(base, "M001", anchor);
    assert.ok(existsSync(join(base, ".gsd", "milestones", "M001", "anchors", "discuss.json")));
  } finally {
    rmSync(base, { recursive: true, force: true });
  }
});

test("readPhaseAnchor returns written anchor", () => {
  const base = makeTempBase();
  try {
    const anchor: PhaseAnchor = {
      phase: "plan",
      milestoneId: "M001",
      generatedAt: new Date().toISOString(),
      intent: "Break work into slices",
      decisions: ["3 slices: auth, UI, tests"],
      blockers: ["Need DB schema first"],
      nextSteps: ["Execute S01"],
    };
    writePhaseAnchor(base, "M001", anchor);
    const read = readPhaseAnchor(base, "M001", "plan");
    assert.ok(read);
    assert.equal(read!.intent, "Break work into slices");
    assert.deepEqual(read!.decisions, ["3 slices: auth, UI, tests"]);
    assert.deepEqual(read!.blockers, ["Need DB schema first"]);
  } finally {
    rmSync(base, { recursive: true, force: true });
  }
});

test("readPhaseAnchor returns null when no anchor exists", () => {
  const base = makeTempBase();
  try {
    const read = readPhaseAnchor(base, "M001", "discuss");
    assert.equal(read, null);
  } finally {
    rmSync(base, { recursive: true, force: true });
  }
});

test("formatAnchorForPrompt produces markdown block", () => {
  const anchor: PhaseAnchor = {
    phase: "discuss",
    milestoneId: "M001",
    generatedAt: "2026-04-03T00:00:00.000Z",
    intent: "Define requirements",
    decisions: ["Use JWT"],
    blockers: [],
    nextSteps: ["Plan slices"],
  };
  const md = formatAnchorForPrompt(anchor);
  assert.ok(md.includes("## Handoff from discuss"));
  assert.ok(md.includes("Define requirements"));
  assert.ok(md.includes("Use JWT"));
  assert.ok(md.includes("Plan slices"));
});
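A minimal sketch of the read/write pair these tests exercise; the shipped phase-anchor.ts presumably also creates the anchors directory (the test helper pre-creates it), and its error handling may differ:

// Sketch only; PhaseAnchor is the type the test imports from phase-anchor.ts.
import { existsSync, readFileSync, writeFileSync } from "node:fs";
import { join } from "node:path";
import type { PhaseAnchor } from "../phase-anchor.js";

function anchorPath(base: string, milestoneId: string, phase: string): string {
  return join(base, ".gsd", "milestones", milestoneId, "anchors", `${phase}.json`);
}

function writePhaseAnchorSketch(base: string, milestoneId: string, anchor: PhaseAnchor): void {
  writeFileSync(anchorPath(base, milestoneId, anchor.phase), JSON.stringify(anchor, null, 2));
}

function readPhaseAnchorSketch(base: string, milestoneId: string, phase: string): PhaseAnchor | null {
  const path = anchorPath(base, milestoneId, phase);
  if (!existsSync(path)) return null;
  return JSON.parse(readFileSync(path, "utf8")) as PhaseAnchor;
}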
@@ -49,10 +49,18 @@ const CLASSIFICATION_LABELS: Record<Classification, { label: string; description
    label: "Note",
    description: "Informational only — no action needed.",
  },
  "stop": {
    label: "Stop",
    description: "Halt current execution — a blocking issue requires resolution.",
  },
  "backtrack": {
    label: "Backtrack",
    description: "Undo recent steps and retry from an earlier checkpoint.",
  },
};

const ALL_CLASSIFICATIONS: Classification[] = [
-  "quick-task", "inject", "defer", "replan", "note",
+  "quick-task", "inject", "defer", "replan", "note", "stop", "backtrack",
];

// ─── Public API ───────────────────────────────────────────────────────────────