# ADR-0079: Autonomous Solver / Executor Separation

**Status:** Proposed
**Date:** 2026-05-12
**Stakeholders:** Autonomous mode, model router, checkpoint protocol, runtime safety
**Related:** `.sf/self-feedback.jsonl` entry `sf-mp34nxb6-27zdx7` (architecture-defect:solver-executor-conflation)

---

## Problem Statement

Today the autonomous loop conflates two distinct roles into a single LLM call:

1. **Executor** — does the unit work (read files, run tests, edit code).
2. **Autonomous solver** — observes what the executor produced and emits a canonical checkpoint to disk (`outcome`, `completedItems`, `remainingItems`, PDD, verification evidence).

Both roles are filled by the same model, picked by `model-router.js:computeTaskRequirements` from the unit type (`execute-task`, `plan-slice`, …). The router optimizes for the *executor's* job — cost, coding capability, speed — and may select a small coding-tuned model (Codestral, Devstral, Gemini Flash). Those models are *not* required to be agentic, refusal-resistant, or stable at protocol reasoning.

When the chosen model is incapable of the agentic role, the protocol breaks in a way the repair loop cannot fix:

- **2026-05-12 M001-6377a4/S04/T02:** `mistral/codestral-latest` was routed to execute T02 (Align TUI Dashboard with Headless Status Output). It emitted:

  > "I'm sorry, but I currently don't have the necessary tools to assist with that specific request."

  No tool was called. The runtime logged `Autonomous solver checkpoint missing … repair attempt 1/4 (mentioned-checkpoint-without-tool)`, then prompted the *same* Codestral with stronger "you MUST call the checkpoint tool" wording. Codestral dutifully called `Autonomous Checkpoint` with `outcome=continue` — and produced zero file edits, zero work. The protocol layer reported success; the slice made no progress.

The repair logic at `auto/phases-unit.js:720-890` only enforces **protocol shape** ("did the LLM emit a checkpoint tool call?").
It does not check **outcome** ("did the unit progress?") or **refusal** ("did the executor refuse the task?"). And because executor and solver are the same call, retrying the repair just re-asks the broken model.

## Goals

1. The protocol layer must remain functional even when the executor refuses or is incapable.
2. Refusals must surface as blockers that can escalate model tier — not silently synthesize forward progress.
3. No-op iterations (continue with zero work) must not satisfy the repair gate.
4. Solver model choice must be stable and independent of unit-type routing.

## Non-Goals

- Replacing the model router for executors. Routing per `unitType` remains; cheap/specialized models are still desirable for unit work.
- Mandating a specific solver vendor. The locked solver model is a pinned default; ops may override via preferences.
- Reworking the checkpoint schema. The same JSON shape persists; only *who emits it* changes.

## Proposed Architecture

### Two-Layer Loop

```
┌─────────────────────────────────────────┐
│ runUnit(ctx, unitType, unitId, prompt)  │
└─────────────────────┬───────────────────┘
                      │
              ┌───────┴────────────────────────────────┐
              │                                        │
              ▼                                        ▼
┌───────────────────────────┐            ┌────────────────────────────┐
│ EXECUTOR PASS             │            │ SOLVER PASS                │
│ model: routed per unit    │ transcript │ model: LOCKED kimi-k2.6    │
│ (Codestral, Gemini, ...)  │ ─────────▶ │ reads agent_end messages,  │
│ does the unit work        │            │ emits canonical checkpoint │
│ NO checkpoint tool needed │            │ classifies refusal/no-op   │
└───────────────────────────┘            └─────────────┬──────────────┘
                                                       │
                                                       ▼
                                         ┌───────────────────────────┐
                                         │ appendAutonomousSolver-   │
                                         │ Checkpoint(basePath, …)   │
                                         └───────────────────────────┘
```

### Solver Model Selection

A new helper `resolveSolverModel(preferences)` returns the pinned solver model. It:

- Defaults to `kimi-k2.6` (provider: `kimi-coding`).
- Allows preference override via `preferences.autonomousSolver.model` (operator escape hatch).
- **Never** consults the unit-type router, benchmark selector, Bayesian blender, or learning aggregator. The solver's model is a runtime invariant, not an optimization target.
- Falls back along a small explicit chain (`kimi-k2.6` → `claude-sonnet-4-6` → `claude-opus-4-7`) if the primary is unreachable. Falls back to "synthesize blocker" if none is reachable, rather than silently dropping the protocol layer.

### Solver Pass Contract

Input: `{ unitType, unitId, executorTranscript, lastIteration, projection }`.

Output (a checkpoint, written via `appendAutonomousSolverCheckpoint`):

```json
{
  "outcome": "continue|complete|blocker",
  "summary": "...",
  "completedItems": [...],
  "remainingItems": [...],
  "verificationEvidence": [...],
  "pdd": { "purpose": "...", "consumer": "...", ... },
  "classification": "executor-refused|executor-noop|progress|complete|blocker-...",
  "evidence": "string excerpts proving the classification"
}
```

The solver's prompt is a deterministic template at `prompts/autonomous-solver.md` that:

1. Embeds the executor transcript.
2. States the schema and outcome rules.
3. Includes the refusal/no-op classification rubric.
4. Instructs the solver to **never** propose code edits — its job is to observe, classify, and write the checkpoint.
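The selection rules above can be sketched as follows. This is a minimal illustration, not the implementation: the `isReachable` probe, the return shape, and the `provider` values for the Claude fallbacks are assumptions; only `resolveSolverModel`, the `preferences.autonomousSolver.model` path, and the pinned chain come from this ADR.

```javascript
// Illustrative sketch of resolveSolverModel: pinned default, operator
// override, explicit fallback chain. Provider names for the fallback
// entries are assumed, not specified by the ADR.
const SOLVER_MODEL_CHAIN = [
  { model: 'kimi-k2.6', provider: 'kimi-coding' },
  { model: 'claude-sonnet-4-6', provider: 'anthropic' },
  { model: 'claude-opus-4-7', provider: 'anthropic' },
];

function resolveSolverModel(preferences, isReachable = () => true) {
  // Operator escape hatch: an explicit preference wins outright.
  const override = preferences?.autonomousSolver?.model;
  if (override) {
    return { model: override, provider: null, source: 'preference' };
  }

  // Walk the pinned chain; never consult the unit-type router,
  // benchmark selector, or any learning machinery.
  for (const entry of SOLVER_MODEL_CHAIN) {
    if (isReachable(entry.model)) {
      return { ...entry, source: 'pinned-chain' };
    }
  }

  // Nothing reachable: tell the caller to synthesize a blocker
  // rather than silently dropping the protocol layer.
  return { model: null, provider: null, source: 'unreachable' };
}
```

Keeping the chain as a static constant (rather than a router query) is what makes the solver model a runtime invariant.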
### Refusal Classification

`assessAutonomousSolverTurn` (and the new solver pass) checks the executor transcript for:

| Pattern | Classification | Action |
|---|---|---|
| "I'm sorry", "I cannot help", "I don't have the necessary tools", "I can't assist with that" | `executor-refused` | Emit `outcome=blocker`; on retry, escalate executor model tier |
| Zero tool calls, zero file edits, transcript < threshold | `executor-noop` | Emit `outcome=blocker` (or `continue` only if the executor explicitly states a wait state); on retry, do not treat a synthesized continue as progress |
| Tool calls + edits + explicit "I'm done" / completion signal | `progress` or `complete` | Emit `outcome=continue` or `complete` as appropriate |

### Model Escalation on Refusal

When the solver classifies `executor-refused`, the loop records the executor's model and unit type in a "no-fly" entry. On the next iteration of the same unit, the router consults this list and selects the next tier up (Sonnet → Opus, or via a model-tier graph). After 2 escalations on the same unit, pause the loop with a hard blocker.

### Backward Compatibility

- The existing checkpoint shape is preserved; downstream consumers (`auto-post-unit.js`, journal events, learning aggregator) are unchanged.
- The "executor calls the checkpoint tool" path is retained as a **fast path**: if the executor *did* emit a valid checkpoint AND the solver agrees with its classification, the solver pass is a no-op rubber stamp. The solver only synthesizes when the executor failed to checkpoint or classified incorrectly.
- The `mentioned-checkpoint-without-tool` repair attempts collapse to zero — the solver is now the source of truth, so a missing executor checkpoint is normal, not a defect.

## Migration

### Step 1 — Pin solver model

Add `resolveSolverModel` to `model-router.js` (or a new `solver-model.js`). It does not participate in the router's capability scoring. Wire it into `runUnit`'s solver-pass invocation only.
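The classification rubric above could take roughly this shape in `classifyExecutorTurn`. This is a hedged sketch: the regexes, the length threshold, and the `{ text, toolCalls, fileEdits }` transcript-summary shape are assumptions for illustration, and the `progress`/`complete` split is collapsed to `progress` for brevity.

```javascript
// Illustrative refusal/no-op classifier following the rubric table.
// The transcript summary shape ({ text, toolCalls, fileEdits }) and the
// pattern list are assumptions, not existing code.
const REFUSAL_PATTERNS = [
  /i'?m sorry/i,
  /i cannot help/i,
  /i don'?t have the necessary tools/i,
  /i can'?t assist with that/i,
];

function classifyExecutorTurn({ text, toolCalls, fileEdits }, minLength = 200) {
  // Refusal language anywhere in the turn → outcome=blocker,
  // and the loop should escalate the executor model tier on retry.
  if (REFUSAL_PATTERNS.some((p) => p.test(text))) {
    return 'executor-refused';
  }
  // No tools, no edits, and a near-empty transcript → no-op; a
  // synthesized "continue" here must not count as progress.
  if (toolCalls === 0 && fileEdits === 0 && text.length < minLength) {
    return 'executor-noop';
  }
  // Otherwise real work happened (the full rubric would further
  // distinguish `progress` from `complete` via a completion signal).
  return 'progress';
}
```

Driving `outcome=blocker` from this classification, rather than from "missing checkpoint", is what lets the Codestral incident above surface as a blocker instead of a rubber-stamped continue.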
### Step 2 — Add solver pass

After `runUnit` returns, before `assessAutonomousSolverTurn`, run the solver pass with the executor transcript. The solver pass writes the checkpoint directly. Executor checkpoint tool calls remain accepted but become advisory.

### Step 3 — Refusal classifier

Extend `classifyAutonomousSolverMissingCheckpointFailure` (rename to `classifyExecutorTurn`) to detect refusal patterns. Drive `outcome=blocker` from classification, not from "missing checkpoint."

### Step 4 — Model escalation

Add a per-(unitId, model) no-fly entry on `executor-refused`. The router consults the list during selection.

### Step 5 — Tests

Cover: the pinned solver-model invariant, refusal pattern detection, no-op detection, solver-pass checkpoint emission when the executor is silent, fast-path bypass when the executor emits a valid checkpoint, and the escalation chain.

## Risks

- **Solver-pass cost.** Adds one LLM call per unit. Mitigation: the solver pass uses a smaller prompt (transcript summary only) and is skippable when the executor emitted a valid checkpoint.
- **Locked model availability.** If `kimi-k2.6` is unreachable, the solver pass fails. Mitigation: an explicit fallback chain; if all fail, pause the loop rather than synthesize.
- **Solver hallucination.** The solver could mis-classify and over-emit blockers. Mitigation: a deterministic prompt template, a classification rubric with example transcripts, and self-feedback when a classification flips between iterations.

## Open Questions

1. Should the solver pass run *during* the executor turn (streaming observer) or *after* (post-turn observer)? Post-turn is simpler and proposed here; streaming would catch refusals earlier but adds complexity.
2. Should the solver pass also re-evaluate the executor's verification evidence (cite tests that actually exist, etc.) — i.e. become a partial verifier — or stay narrowly focused on checkpoint emission?
3. How does this interact with `keepSession: true` in `runUnit`?
The solver pass is a separate session by definition; the executor session remains as-is.

## Decision Outcome (when accepted)

To be filled in when the ADR is accepted. The initial cut targets steps 1–3 (pinned solver model + solver pass + refusal classifier). Steps 4–5 (escalation + tests) follow in a subsequent slice.