# ADR-0079: Autonomous Solver / Executor Separation

**Status:** Proposed

**Date:** 2026-05-12

**Stakeholders:** Autonomous mode, model router, checkpoint protocol, runtime safety

**Related:** `.sf/self-feedback.jsonl` entry `sf-mp34nxb6-27zdx7` (architecture-defect:solver-executor-conflation)

---
## Problem Statement

Today the autonomous loop conflates two distinct roles into a single LLM call:

1. **Executor**: does the unit work (read files, run tests, edit code).
2. **Autonomous solver**: observes what the executor produced and emits a canonical checkpoint to disk (`outcome`, `completedItems`, `remainingItems`, PDD, verification evidence).

Both roles are filled by the same model, which `model-router.js:computeTaskRequirements` picks from the unit type (`execute-task`, `plan-slice`, …). The router optimizes for the *executor's* job (cost, coding capability, speed) and may select a small coding-tuned model such as Codestral, Devstral, or Gemini Flash. Nothing in that selection requires the model to be agentic, refusal-resistant, or reliable at protocol reasoning.

When the chosen model cannot fill the agentic role, the protocol breaks in a way the repair loop cannot fix:

- **2026-05-12 M001-6377a4/S04/T02:** `mistral/codestral-latest` was routed to execute T02 (Align TUI Dashboard with Headless Status Output). It emitted:

  > "I'm sorry, but I currently don't have the necessary tools to assist with that specific request."

  No tool was called. The runtime logged `Autonomous solver checkpoint missing … repair attempt 1/4 (mentioned-checkpoint-without-tool)`, then re-prompted the *same* Codestral model with stronger "you MUST call the checkpoint tool" wording. Codestral dutifully called `Autonomous Checkpoint` with `outcome=continue` but produced zero file edits and zero work. The protocol layer reported success; the slice made no progress.

The repair logic at `auto/phases-unit.js:720-890` enforces only **protocol shape** ("did the LLM emit a checkpoint tool call?"). It checks neither **outcome** ("did the unit progress?") nor **refusal** ("did the executor refuse the task?"). And because executor and solver are the same call, each repair attempt simply re-asks the same broken model.
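To make the gap concrete, the sketch below contrasts a shape-only gate with one that also looks at outcome. It is a minimal illustration and not the code in `auto/phases-unit.js`. The turn record (`toolCalls`, `fileEdits`, `checkpoint`), the tool name `"checkpoint"`, and both function names are hypothetical.

```javascript
// Hypothetical turn record; field names are illustrative only.
// { toolCalls: string[], fileEdits: number, checkpoint: { outcome } | null }

// Shape-only gate: passes as soon as any checkpoint was emitted.
function shapeOnlyGate(turn) {
  return turn.checkpoint !== null;
}

// Outcome-aware gate: a `continue` checkpoint backed by zero work does not pass.
function outcomeAwareGate(turn) {
  if (turn.checkpoint === null) return false;
  const didWork =
    turn.fileEdits > 0 || turn.toolCalls.some((name) => name !== "checkpoint");
  if (turn.checkpoint.outcome === "continue" && !didWork) return false;
  return true;
}

// The T02 repair turn: only the checkpoint tool was called, with no edits.
const t02Repair = {
  toolCalls: ["checkpoint"],
  fileEdits: 0,
  checkpoint: { outcome: "continue" },
};

console.log(shapeOnlyGate(t02Repair));    // true
console.log(outcomeAwareGate(t02Repair)); // false
```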
## Goals

1. The protocol layer must remain functional even when the executor refuses or is incapable.
2. Refusals must surface as blockers that can escalate the model tier, not be silently recorded as forward progress.
3. No-op iterations (`continue` with zero work) must not satisfy the repair gate.
4. Solver model choice must be stable and independent of unit-type routing.
## Non-Goals

- Replacing the model router for executors. Routing per `unitType` remains; cheap/specialized models are still desirable for unit work.
- Mandating a specific solver vendor. The locked solver model is a pinned default; ops may override via preferences.
- Reworking the checkpoint schema. The same JSON shape persists; only *who emits it* changes.
## Proposed Architecture

### Two-Layer Loop

```
                   ┌────────────────────────────────────────┐
                   │ runUnit(ctx, unitType, unitId, prompt) │
                   └────────────────────┬───────────────────┘
                                        │
               ┌────────────────────────┴─────────────────────────┐
               │                                                  │
               ▼                                                  ▼
┌────────────────────────────┐                     ┌────────────────────────────┐
│ EXECUTOR PASS              │                     │ SOLVER PASS                │
│ model: routed per unit     │                     │ model: LOCKED kimi-k2.6    │
│ (Codestral, Gemini, ...)   │ ─── transcript ───▶ │ reads agent_end messages,  │
│ does the unit work         │                     │ emits canonical checkpoint │
│ NO checkpoint tool needed  │                     │ classifies refusal/no-op   │
└────────────────────────────┘                     └──────────────┬─────────────┘
                                                                  │
                                                                  ▼
                                                   ┌────────────────────────────┐
                                                   │ appendAutonomousSolver-    │
                                                   │ Checkpoint(basePath, …)    │
                                                   └────────────────────────────┘
```
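Read top to bottom, the diagram is a strict ordering: the executor pass finishes before the solver pass reads its transcript. Below is a minimal sketch of that ordering. It assumes three injected dependencies (`runExecutor`, `runSolver`, `appendCheckpoint`), which are hypothetical stand-ins for the routed executor call, the locked solver call, and `appendAutonomousSolverCheckpoint`. `runUnitTwoLayer` is also a hypothetical name, not the real `runUnit` signature.

```javascript
// Hypothetical two-layer unit loop. All three dependencies are injected
// stand-ins; none of these signatures come from the actual codebase.
async function runUnitTwoLayer(deps, { unitType, unitId, prompt }) {
  // 1. Executor pass: routed model does the work; no checkpoint required.
  const executor = await deps.runExecutor({ unitType, unitId, prompt });

  // 2. Solver pass: locked model reads only the executor transcript.
  const checkpoint = await deps.runSolver({
    unitType,
    unitId,
    executorTranscript: executor.transcript,
  });

  // 3. Persist the canonical checkpoint.
  await deps.appendCheckpoint(checkpoint);
  return checkpoint;
}

// Usage with fake dependencies that record the call order.
const calls = [];
const fakeDeps = {
  runExecutor: async () => {
    calls.push("executor");
    return { transcript: ["I'm sorry ..."] };
  },
  runSolver: async ({ executorTranscript }) => {
    calls.push("solver");
    return { outcome: "blocker", classification: "executor-refused", evidence: executorTranscript[0] };
  },
  appendCheckpoint: async () => {
    calls.push("append");
  },
};

runUnitTwoLayer(fakeDeps, { unitType: "execute-task", unitId: "T02", prompt: "..." })
  .then((cp) => console.log(calls.join(" -> "), cp.outcome));
// executor -> solver -> append blocker
```

Injecting the three calls keeps the ordering testable without a live model.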
### Solver Model Selection

A new helper, `resolveSolverModel(preferences)`, returns the pinned solver model. It:

- Defaults to `kimi-k2.6` (provider: `kimi-coding`).
- Allows a preference override via `preferences.autonomousSolver.model` (an operator escape hatch).
- **Never** consults the unit-type router, benchmark selector, Bayesian blender, or learning aggregator. The solver's model is a runtime invariant, not an optimization target.
- Falls back along a small, explicit chain (`kimi-k2.6` → `claude-sonnet-4-6` → `claude-opus-4-7`) if the primary is unreachable. If no model in the chain is reachable, it falls back to "synthesize blocker" rather than silently dropping the protocol layer.
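The selection rules above fit in a few lines. This is a minimal sketch under several assumptions. The `isReachable(model)` probe is hypothetical and supplied by the caller. `preferences.autonomousSolver.model` is treated as a plain model-id string. An override, when set, is simply tried before the default chain. The return shape is illustrative only.

```javascript
// Pinned default plus the explicit fallback chain from this ADR.
const SOLVER_FALLBACK_CHAIN = ["kimi-k2.6", "claude-sonnet-4-6", "claude-opus-4-7"];

// Hypothetical sketch. `isReachable` is an assumed probe supplied by the caller.
// Returns { model } for the first reachable candidate, or
// { model: null, action: "synthesize-blocker" } when nothing is reachable.
// It deliberately takes no unitType: the router is never consulted.
function resolveSolverModel(preferences, isReachable) {
  const override = preferences?.autonomousSolver?.model;
  const candidates = override
    ? [override, ...SOLVER_FALLBACK_CHAIN.filter((m) => m !== override)]
    : SOLVER_FALLBACK_CHAIN;

  for (const model of candidates) {
    if (isReachable(model)) return { model };
  }
  return { model: null, action: "synthesize-blocker" };
}

// Usage: the primary is down, so the sketch falls back to Sonnet.
const down = new Set(["kimi-k2.6"]);
console.log(resolveSolverModel({}, (m) => !down.has(m)).model); // claude-sonnet-4-6
```

Because the function receives no unit type, the "runtime invariant" property is visible in its signature rather than enforced by convention.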
### Solver Pass Contract

Input: `{ unitType, unitId, executorTranscript, lastIteration, projection }`.

Output (a checkpoint, written via `appendAutonomousSolverCheckpoint`):

```json
{
  "outcome": "continue|complete|blocker",
  "summary": "...",
  "completedItems": [...],
  "remainingItems": [...],
  "verificationEvidence": [...],
  "pdd": { "purpose": "...", "consumer": "...", ... },
  "classification": "executor-refused|executor-noop|progress|complete|blocker-...",
  "evidence": "string excerpts proving the classification"
}
```

The solver's prompt is a deterministic template at `prompts/autonomous-solver.md` that:

1. Embeds the executor transcript.
2. States the schema and outcome rules.
3. Includes the refusal/no-op classification rubric.
4. Instructs the solver to **never** propose code edits: its job is to observe, classify, and write the checkpoint.
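The solver's checkpoint becomes the single source of truth, so a cheap shape check before `appendAutonomousSolverCheckpoint` persists it can catch malformed or self-contradictory output. A minimal sketch follows. `validateSolverCheckpoint` is a hypothetical name. The classification-to-outcome rules are inferred from the Refusal Classification table rather than taken from a specified contract. The `blocker-` prefix check mirrors the `blocker-...` placeholder in the schema.

```javascript
const OUTCOMES = new Set(["continue", "complete", "blocker"]);
const ARRAY_FIELDS = ["completedItems", "remainingItems", "verificationEvidence"];

// Outcomes allowed per classification (assumed from the refusal table).
function allowedOutcomes(classification) {
  if (classification === "executor-refused") return ["blocker"];
  if (classification === "executor-noop") return ["blocker", "continue"];
  if (classification === "progress") return ["continue"];
  if (classification === "complete") return ["complete"];
  if (typeof classification === "string" && classification.startsWith("blocker-")) {
    return ["blocker"];
  }
  return [];
}

// Returns a list of problems; an empty list means the checkpoint looks well-formed.
function validateSolverCheckpoint(cp) {
  const errors = [];
  if (!OUTCOMES.has(cp.outcome)) errors.push(`unknown outcome: ${cp.outcome}`);
  if (typeof cp.summary !== "string") errors.push("summary must be a string");
  for (const field of ARRAY_FIELDS) {
    if (!Array.isArray(cp[field])) errors.push(`${field} must be an array`);
  }
  if (typeof cp.pdd !== "object" || cp.pdd === null) errors.push("pdd must be an object");
  if (typeof cp.evidence !== "string" || cp.evidence.length === 0) {
    errors.push("evidence must be a non-empty string");
  }
  const allowed = allowedOutcomes(cp.classification);
  if (allowed.length === 0) {
    errors.push(`unknown classification: ${cp.classification}`);
  } else if (!allowed.includes(cp.outcome)) {
    errors.push(`outcome ${cp.outcome} contradicts classification ${cp.classification}`);
  }
  return errors;
}

// Usage: a refusal recorded as `continue` is rejected.
const bad = {
  outcome: "continue", summary: "T02", completedItems: [], remainingItems: ["T02"],
  verificationEvidence: [], pdd: {}, classification: "executor-refused",
  evidence: "I'm sorry, but I currently don't have the necessary tools",
};
console.log(validateSolverCheckpoint(bad).join("; "));
// outcome continue contradicts classification executor-refused
```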
### Refusal Classification

`assessAutonomousSolverTurn` (and the new solver pass) checks the executor transcript for the following patterns:

| Pattern | Classification | Action |
|---|---|---|
| "I'm sorry", "I cannot help", "I don't have the necessary tools", "I can't assist with that" | `executor-refused` | Emit `outcome=blocker`; on retry, escalate the executor model tier |
| Zero tool calls, zero file edits, transcript shorter than a threshold | `executor-noop` | Emit `outcome=blocker` (or `continue` only if the executor explicitly states a wait state); on retry, do not treat a synthesized `continue` as progress |
| Tool calls + edits + explicit "I'm done" / completion signal | `progress` or `complete` | Emit `outcome=continue` or `complete` as appropriate |
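The table maps naturally onto a small pure function. The sketch below shows what the renamed `classifyExecutorTurn` might look like. It assumes a hypothetical, pre-digested turn summary (`text`, `toolCallCount`, `fileEditCount`, `statesWaitState`, `signalsCompletion`) and an arbitrary 200-character no-op threshold. Rows are checked in table order, so a refusal phrase wins even when some tools ran. Cases the table does not cover return `null` and are left to the solver model.

```javascript
// Refusal phrases copied from the rubric table; matched case-insensitively.
const REFUSAL_PATTERNS = [
  /i'm sorry/i,
  /i cannot help/i,
  /i don't have the necessary tools/i,
  /i can't assist with that/i,
];
const NOOP_TRANSCRIPT_THRESHOLD = 200; // characters; an assumed value

// Hypothetical input: { text, toolCallCount, fileEditCount, statesWaitState, signalsCompletion }
// Returns { classification, outcome }, or null when the table does not decide.
function classifyExecutorTurn(turn) {
  if (REFUSAL_PATTERNS.some((re) => re.test(turn.text))) {
    return { classification: "executor-refused", outcome: "blocker" };
  }
  const noWork = turn.toolCallCount === 0 && turn.fileEditCount === 0;
  if (noWork && turn.text.length < NOOP_TRANSCRIPT_THRESHOLD) {
    // `continue` only when the executor explicitly says it is waiting.
    return {
      classification: "executor-noop",
      outcome: turn.statesWaitState ? "continue" : "blocker",
    };
  }
  if (turn.toolCallCount > 0 && turn.fileEditCount > 0) {
    return turn.signalsCompletion
      ? { classification: "complete", outcome: "complete" }
      : { classification: "progress", outcome: "continue" };
  }
  return null; // not covered by the rubric; defer to the solver model
}

const t02 = {
  text: "I'm sorry, but I currently don't have the necessary tools to assist with that specific request.",
  toolCallCount: 0,
  fileEditCount: 0,
  statesWaitState: false,
  signalsCompletion: false,
};
console.log(classifyExecutorTurn(t02).classification); // executor-refused
```

Plain keyword matching is crude. An executor that writes "I'm sorry, fixing that now" would be flagged as a refusal, so the sketch is a starting point rather than a complete rubric.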
### Model Escalation on Refusal

When the solver classifies a turn as `executor-refused`, the loop records the executor's model and unit type in a "no-fly" entry. On the next iteration of the same unit, the router consults this list and selects the next tier up (Sonnet → Opus, or via a model-tier graph). After two escalations on the same unit, the loop pauses with a hard blocker.
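Below is a minimal sketch of the no-fly bookkeeping, keyed per (unitId, model) as Step 4 describes. The tier list is a placeholder for the unspecified model-tier graph, and `NoFlyList` and its methods are hypothetical. "After two escalations" is read as follows: the first two refusals escalate, and the third pauses.

```javascript
// Placeholder tiers, lowest first (the ADR's "Sonnet → Opus"). Any model not
// listed, e.g. a routed Codestral or Gemini Flash, counts as below the first tier.
const ESCALATION_TIERS = ["claude-sonnet-4-6", "claude-opus-4-7"];
const MAX_ESCALATIONS_PER_UNIT = 2;

const noFlyKey = (unitId, model) => `${unitId}::${model}`;

class NoFlyList {
  constructor() {
    this.banned = new Set();      // noFlyKey(unitId, model) entries
    this.escalations = new Map(); // unitId -> escalations performed so far
  }

  // The router would call this while selecting an executor model.
  isBanned(unitId, model) {
    return this.banned.has(noFlyKey(unitId, model));
  }

  // Record an `executor-refused` turn and decide what happens next.
  recordRefusal(unitId, model) {
    this.banned.add(noFlyKey(unitId, model));
    const done = this.escalations.get(unitId) ?? 0;
    if (done >= MAX_ESCALATIONS_PER_UNIT) {
      return { action: "pause", reason: "hard-blocker" };
    }
    // indexOf returns -1 for unlisted models, so the search starts at tier 0.
    const start = ESCALATION_TIERS.indexOf(model) + 1;
    const next = ESCALATION_TIERS.slice(start).find((m) => !this.isBanned(unitId, m));
    if (next === undefined) {
      return { action: "pause", reason: "no-higher-tier" };
    }
    this.escalations.set(unitId, done + 1);
    return { action: "retry", model: next };
  }
}

const noFly = new NoFlyList();
for (const model of ["mistral/codestral-latest", "claude-sonnet-4-6", "claude-opus-4-7"]) {
  const decision = noFly.recordRefusal("T02", model);
  console.log(model, "->", decision.action, decision.model ?? decision.reason);
}
// mistral/codestral-latest -> retry claude-sonnet-4-6
// claude-sonnet-4-6 -> retry claude-opus-4-7
// claude-opus-4-7 -> pause hard-blocker
```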
### Backward Compatibility

- The existing checkpoint shape is preserved; downstream consumers (`auto-post-unit.js`, journal events, learning aggregator) are unchanged.
- The "executor calls the checkpoint tool" path is retained as a **fast path**: if the executor *did* emit a valid checkpoint AND the solver agrees with its classification, the solver pass is a no-op rubber stamp. The solver synthesizes a checkpoint only when the executor failed to emit one or classified the turn incorrectly.
- The `mentioned-checkpoint-without-tool` repair attempts drop to zero. The solver is now the source of truth, so a missing executor checkpoint is normal, not a defect.
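The fast-path rule reduces to a single comparison. This minimal sketch assumes that "agrees" means the executor's checkpoint carries the same `classification` and `outcome` the solver derived independently, and that both checkpoints have already passed validation. `reconcileCheckpoints` is a hypothetical name.

```javascript
// Decide which checkpoint to persist. `executorCheckpoint` may be null when the
// executor never called the checkpoint tool (the normal case under this ADR).
function reconcileCheckpoints(executorCheckpoint, solverCheckpoint) {
  if (
    executorCheckpoint !== null &&
    executorCheckpoint.classification === solverCheckpoint.classification &&
    executorCheckpoint.outcome === solverCheckpoint.outcome
  ) {
    // Fast path: the solver rubber-stamps the executor's own checkpoint.
    return { source: "executor", checkpoint: executorCheckpoint };
  }
  // The executor was silent or wrong: the solver's checkpoint is the source of truth.
  return { source: "solver", checkpoint: solverCheckpoint };
}

// Usage: the T02 repair turn claimed `continue`; the solver saw a refusal.
const claimed = { outcome: "continue", classification: "progress" };
const observed = { outcome: "blocker", classification: "executor-refused" };
console.log(reconcileCheckpoints(claimed, observed).source); // solver
```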
## Migration

### Step 1: Pin solver model

Add `resolveSolverModel` to `model-router.js` (or a new `solver-model.js`). It does not participate in the router's capability scoring. Wire it into `runUnit`'s solver-pass invocation only.

### Step 2: Add solver pass

After `runUnit` returns, and before `assessAutonomousSolverTurn` runs, run the solver pass over the executor transcript. The solver pass writes the checkpoint directly. Executor checkpoint tool calls remain accepted but become advisory.

### Step 3: Refusal classifier

Extend `classifyAutonomousSolverMissingCheckpointFailure` (renamed to `classifyExecutorTurn`) to detect refusal patterns. Drive `outcome=blocker` from the classification, not from "missing checkpoint."

### Step 4: Model escalation

Add a per-(unitId, model) no-fly entry on `executor-refused`. The router consults the list during model selection.

### Step 5: Tests

Cover: the pinned-solver-model invariant, refusal pattern detection, no-op detection, solver-pass checkpoint emission when the executor is silent, the fast-path bypass when the executor emits a valid checkpoint, and the escalation chain.
## Risks

- **Solver-pass cost.** Adds one LLM call per unit. Mitigation: the solver pass uses a smaller prompt (transcript summary only) and can be skipped when the executor emitted a valid checkpoint.
- **Locked model availability.** If `kimi-k2.6` is unreachable, the solver pass fails. Mitigation: an explicit fallback chain; if every model in the chain fails, pause the loop rather than synthesize forward progress.
- **Solver hallucination.** The solver could misclassify turns and over-emit blockers. Mitigation: a deterministic prompt template, a classification rubric with example transcripts, and a self-feedback entry whenever the classification flips between iterations.
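The last mitigation needs only a per-unit memory of the previous classification. This minimal sketch assumes a hypothetical `emitSelfFeedback` callback. The entry fields are illustrative and do not follow the actual `.sf/self-feedback.jsonl` format. The sketch flags every change, so whether expected transitions such as `progress` → `complete` should be exempt is left open.

```javascript
// Tracks the last classification per unit and reports flips.
function createFlipDetector(emitSelfFeedback) {
  const last = new Map(); // unitId -> previous classification
  return function observe(unitId, iteration, classification) {
    const previous = last.get(unitId);
    last.set(unitId, classification);
    if (previous !== undefined && previous !== classification) {
      emitSelfFeedback({
        kind: "solver-classification-flip", // hypothetical entry kind
        unitId,
        iteration,
        from: previous,
        to: classification,
      });
      return true;
    }
    return false;
  };
}

const entries = [];
const observe = createFlipDetector((entry) => entries.push(entry));
observe("T02", 1, "executor-refused");
observe("T02", 2, "executor-refused");
observe("T02", 3, "progress");
console.log(entries.length, entries[0].from, "->", entries[0].to); // 1 executor-refused -> progress
```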
## Open Questions

1. Should the solver pass run *during* the executor turn (streaming observer) or *after* it (post-turn observer)? Post-turn is simpler and is what this ADR proposes; streaming would catch refusals earlier but adds complexity.
2. Should the solver pass also re-evaluate the executor's verification evidence (for example, confirming that cited tests actually exist), in effect becoming a partial verifier, or stay narrowly focused on checkpoint emission?
3. How does this interact with `keepSession: true` in `runUnit`? The solver pass is a separate session by definition; the executor session remains as-is.
## Decision Outcome (when accepted)

To be filled when the ADR is accepted. The initial cut targets steps 1–3 (pinned solver model, solver pass, refusal classifier). Steps 4–5 (escalation and tests) follow in a subsequent slice.