- Lock solver model to kimi-k2.6 independent of unit-type router
- Executor prompt no longer requires checkpoint tool call
- Add dedicated solver pass that reads executor transcript and emits canonical checkpoint
- Classify executor refusals as blocker outcomes (already partially implemented)
- Classify no-op iterations (continue with zero work) as missing-checkpoint-retry
- Add tests for executor prompt block, solver pass prompt, no-op detection, and no-op assessment

Fixes sf-mp34nxb6-27zdx7
ADR-0079: Autonomous Solver / Executor Separation
Status: Proposed
Date: 2026-05-12
Stakeholders: Autonomous mode, model router, checkpoint protocol, runtime safety
Related: .sf/self-feedback.jsonl entry sf-mp34nxb6-27zdx7 (architecture-defect:solver-executor-conflation)
Problem Statement
Today the autonomous loop conflates two distinct roles into a single LLM call:
- Executor: does the unit work (read files, run tests, edit code).
- Autonomous solver: observes what the executor produced and emits a canonical checkpoint to disk (outcome, completedItems, remainingItems, PDD, verification evidence).
Both roles are filled by the same model, picked by model-router.js:computeTaskRequirements from the unit type (execute-task, plan-slice, …). The router optimizes for the executor's job — cost, coding capability, speed — and may select a small coding-tuned model (Codestral, Devstral, Gemini Flash). Those models are not required to be agentic, refusal-resistant, or stable at protocol reasoning.
When the chosen model is incapable of the agentic role, the protocol breaks in a way the repair loop cannot fix:
- 2026-05-12 M001-6377a4/S04/T02: mistral/codestral-latest was routed to execute T02 (Align TUI Dashboard with Headless Status Output). It emitted: "I'm sorry, but I currently don't have the necessary tools to assist with that specific request." No tool was called. The runtime logged "Autonomous solver checkpoint missing … repair attempt 1/4 (mentioned-checkpoint-without-tool)", then prompted the same Codestral with stronger "you MUST call the checkpoint tool" wording. Codestral dutifully called Autonomous Checkpoint with outcome=continue, and produced zero file edits, zero work. The protocol layer reported success; the slice made no progress.
The repair logic at auto/phases-unit.js:720-890 only enforces protocol shape ("did the LLM emit a checkpoint tool call?"). It does not check outcome ("did the unit progress?") or refusal ("did the executor refuse the task?"). And because executor and solver are the same call, retrying the repair just re-asks the broken model.
Goals
- The protocol layer must remain functional even when the executor refuses or is incapable.
- Refusals must surface as blockers that can escalate model tier — not silently synthesize forward progress.
- No-op iterations (continue with zero work) must not satisfy the repair gate.
- Solver model choice must be stable and independent of unit-type routing.
Non-Goals
- Replacing the model router for executors. Routing per unitType remains; cheap/specialized models are still desirable for unit work.
- Mandating a specific solver vendor. The locked solver model is a pinned default; ops may override via preferences.
- Reworking the checkpoint schema. The same JSON shape persists; only who emits it changes.
Proposed Architecture
Two-Layer Loop
                    ┌────────────────────────────────────────┐
                    │ runUnit(ctx, unitType, unitId, prompt) │
                    └───────────────────┬────────────────────┘
                                        │
               ┌────────────────────────┴────────────────────────┐
               │                                                 │
               ▼                                                 ▼
┌────────────────────────────┐                    ┌────────────────────────────┐
│ EXECUTOR PASS              │                    │ SOLVER PASS                │
│ model: routed per unit     │   transcript →     │ model: LOCKED kimi-k2.6    │
│ (Codestral, Gemini, ...)   │ ─────────────────▶ │ reads agent_end messages,  │
│ does the unit work         │                    │ emits canonical checkpoint │
│ NO checkpoint tool needed  │                    │ classifies refusal/no-op   │
└────────────────────────────┘                    └──────────────┬─────────────┘
                                                                 │
                                                                 ▼
                                                  ┌────────────────────────────┐
                                                  │ appendAutonomousSolver-    │
                                                  │ Checkpoint(basePath, …)    │
                                                  └────────────────────────────┘
Solver Model Selection
A new helper resolveSolverModel(preferences) returns the pinned solver model. It:
- Defaults to kimi-k2.6 (provider: kimi-coding).
- Allows preference override via preferences.autonomousSolver.model (operator escape hatch).
- Never consults the unit-type router, benchmark selector, Bayesian blender, or learning aggregator. The solver's model is a runtime invariant, not an optimization target.
- Falls back along a small explicit chain (kimi-k2.6 → claude-sonnet-4-6 → claude-opus-4-7) if the primary is unreachable. Falls back to "synthesize blocker" if none is reachable, rather than silently dropping the protocol layer.
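The selection rules above could be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the isReachable probe, the returned object shape, and the choice to try an operator override before the fallback chain are all assumptions, and the document does not name providers for the two fallback models.

```javascript
// Hypothetical sketch of resolveSolverModel. Only the model names, the
// kimi-coding provider, and preferences.autonomousSolver.model come from
// this ADR; everything else is assumed for illustration.
const SOLVER_FALLBACK_CHAIN = [
  { provider: "kimi-coding", model: "kimi-k2.6" },
  { provider: null, model: "claude-sonnet-4-6" }, // provider not stated in the ADR
  { provider: null, model: "claude-opus-4-7" }, // provider not stated in the ADR
];

// isReachable is an assumed injectable probe: (candidate) => boolean.
function resolveSolverModel(preferences = {}, isReachable = () => true) {
  const override = preferences.autonomousSolver?.model;
  // Assumption: an operator override is tried first, then the pinned chain.
  const candidates = override
    ? [{ provider: null, model: override }, ...SOLVER_FALLBACK_CHAIN]
    : SOLVER_FALLBACK_CHAIN;

  for (const candidate of candidates) {
    if (isReachable(candidate)) return { kind: "model", ...candidate };
  }
  // Nothing reachable: signal "synthesize blocker" instead of dropping the
  // protocol layer. Note that no router or benchmark data is consulted.
  return { kind: "synthesize-blocker", reason: "no solver model reachable" };
}
```

Passing the reachability probe as a parameter keeps the helper a pure function of its inputs, which makes the "pinned model" invariant straightforward to test.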
Solver Pass Contract
Input: { unitType, unitId, executorTranscript, lastIteration, projection }.
Output (a checkpoint, written via appendAutonomousSolverCheckpoint):
{
"outcome": "continue|complete|blocker",
"summary": "...",
"completedItems": [...],
"remainingItems": [...],
"verificationEvidence": [...],
"pdd": { "purpose": "...", "consumer": "...", ... },
"classification": "executor-refused|executor-noop|progress|complete|blocker-...",
"evidence": "string excerpts proving the classification"
}
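A shape check for the output above might look like the following sketch. The function name validateSolverCheckpoint and the decision to treat every listed field as required are assumptions for illustration; the ADR does not state which fields are optional.

```javascript
// Hypothetical validator for the solver-pass output shown above.
// Assumption: every field in the example is required.
const OUTCOMES = ["continue", "complete", "blocker"];
const STRING_FIELDS = ["summary", "classification", "evidence"];
const ARRAY_FIELDS = ["completedItems", "remainingItems", "verificationEvidence"];

// Returns a list of human-readable problems; an empty list means valid.
function validateSolverCheckpoint(checkpoint) {
  if (checkpoint === null || typeof checkpoint !== "object" || Array.isArray(checkpoint)) {
    return ["checkpoint must be a JSON object"];
  }
  const errors = [];
  if (!OUTCOMES.includes(checkpoint.outcome)) {
    errors.push(`outcome must be one of: ${OUTCOMES.join(", ")}`);
  }
  for (const field of STRING_FIELDS) {
    if (typeof checkpoint[field] !== "string") errors.push(`${field} must be a string`);
  }
  for (const field of ARRAY_FIELDS) {
    if (!Array.isArray(checkpoint[field])) errors.push(`${field} must be an array`);
  }
  if (checkpoint.pdd === null || typeof checkpoint.pdd !== "object") {
    errors.push("pdd must be an object");
  }
  return errors;
}
```

Returning a list of errors rather than throwing lets the caller decide whether a malformed solver output should be retried or turned into a blocker.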
The solver's prompt is a deterministic template at prompts/autonomous-solver.md that:
- Embeds the executor transcript.
- States the schema and outcome rules.
- Includes the refusal/no-op classification rubric.
- Instructs the solver to never propose code edits — its job is to observe, classify, and write the checkpoint.
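One way to keep the template deterministic is plain placeholder substitution with no model-dependent or time-dependent input. The sketch below assumes a {{name}} placeholder syntax and a renderSolverPrompt helper, neither of which is specified by this ADR.

```javascript
// Hypothetical renderer for prompts/autonomous-solver.md.
// Assumption: the template marks substitution points as {{name}}.
function renderSolverPrompt(template, { unitType, unitId, executorTranscript }) {
  const values = { unitType, unitId, executorTranscript };
  // A replacer function is used so that "$" sequences inside the transcript
  // are inserted literally instead of being read as replacement patterns.
  return template.replace(/\{\{(\w+)\}\}/g, (match, key) => {
    if (!Object.prototype.hasOwnProperty.call(values, key)) {
      throw new Error(`Unknown placeholder: ${key}`);
    }
    return String(values[key]);
  });
}
```

Because the output depends only on the template and the three inputs, the same executor transcript always produces the same solver prompt, and an unknown placeholder fails loudly instead of leaking into the prompt.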
Refusal Classification
assessAutonomousSolverTurn (and the new solver pass) check the executor transcript for:
| Pattern | Classification | Action |
|---|---|---|
| "I'm sorry", "I cannot help", "I don't have the necessary tools", "I can't assist with that" | executor-refused | Emit outcome=blocker; on retry, escalate executor model tier |
| Zero tool calls, zero file edits, transcript < threshold | executor-noop | Emit outcome=blocker (or continue only if executor explicitly states a wait state); on retry, do not treat synthesized continue as progress |
| Tool calls + edits + explicit "I'm done" / completion signal | progress or complete | Emit outcome=continue or complete as appropriate |
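The rubric in the table can be sketched as a pure function evaluated in row order. The input shape, the 200-character threshold, and the assumption that toolCallCount excludes the checkpoint tool itself are all illustrative; the ADR only says "threshold" and does not define how tool calls are counted. Phrase matching alone can also misfire on a transcript that apologizes and then does real work, so a real classifier would likely need more context.

```javascript
// Refusal phrases from the table above; ['’] tolerates curly apostrophes.
const REFUSAL_PATTERNS = [
  /\bI['’]m sorry\b/i,
  /\bI cannot help\b/i,
  /\bI don['’]t have the necessary tools\b/i,
  /\bI can['’]t assist with that\b/i,
];

// Hypothetical input shape. toolCallCount is assumed to exclude the
// checkpoint tool, so a bare "continue" checkpoint still counts as zero work.
function classifyExecutorTurn({
  transcriptText,
  toolCallCount,
  fileEditCount,
  completionSignal = false,
  statesWaitState = false,
  minTranscriptChars = 200, // assumed value; the ADR only says "threshold"
}) {
  const refusal = REFUSAL_PATTERNS.find((pattern) => pattern.test(transcriptText));
  if (refusal) {
    return {
      classification: "executor-refused",
      outcome: "blocker",
      evidence: transcriptText.match(refusal)[0],
    };
  }
  const isNoop =
    toolCallCount === 0 && fileEditCount === 0 && transcriptText.length < minTranscriptChars;
  if (isNoop) {
    return {
      classification: "executor-noop",
      // continue is allowed only for an explicit wait state
      outcome: statesWaitState ? "continue" : "blocker",
      evidence: "zero tool calls, zero file edits",
    };
  }
  return completionSignal
    ? { classification: "complete", outcome: "complete", evidence: "completion signal present" }
    : {
        classification: "progress",
        outcome: "continue",
        evidence: `${toolCallCount} tool calls, ${fileEditCount} file edits`,
      };
}
```

Applied to the 2026-05-12 incident transcript, this sketch returns executor-refused with outcome blocker, because the first pattern matches before any no-op check runs.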
Model Escalation on Refusal
When the solver classifies a turn as executor-refused, the loop records the executor's model and unit-type in a "no-fly" entry. On the next iteration of the same unit, the router consults this list and selects the next tier up (Sonnet → Opus, or via a model-tier graph). After 2 escalations on the same unit, the loop pauses with a hard blocker.
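The escalation rule could be sketched as a small in-memory tracker. This is an assumption-heavy illustration: the createEscalationTracker name, the linear tier list, and keying the no-fly entries by unitId (as Migration Step 4 suggests) are not confirmed by the ADR, and a real version would presumably persist its state.

```javascript
// Hypothetical no-fly tracker. tiers is an ordered list of model names,
// lowest tier first; the ADR mentions a model-tier graph as an alternative.
function createEscalationTracker(tiers, maxEscalations = 2) {
  const refusedByUnit = new Map(); // unitId -> Set of models that refused

  return {
    recordRefusal(unitId, model) {
      if (!refusedByUnit.has(unitId)) refusedByUnit.set(unitId, new Set());
      refusedByUnit.get(unitId).add(model);
    },

    // routedModel is whatever the unit-type router picked for this unit.
    selectModel(unitId, routedModel) {
      const refused = refusedByUnit.get(unitId) ?? new Set();
      if (refused.size > maxEscalations) {
        return { kind: "hard-blocker", reason: "escalation limit reached" };
      }
      if (!refused.has(routedModel)) return { kind: "model", model: routedModel };
      // Walk upward from the routed model and skip anything on the no-fly list.
      const start = tiers.indexOf(routedModel) + 1;
      const next = tiers.slice(start).find((model) => !refused.has(model));
      return next
        ? { kind: "model", model: next }
        : { kind: "hard-blocker", reason: "no higher tier available" };
    },
  };
}
```

With tiers of codestral, sonnet, and opus, two refusals move a unit from codestral to sonnet to opus, and a third refusal yields a hard blocker instead of another retry.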
Backward Compatibility
- The existing checkpoint shape is preserved; downstream consumers (auto-post-unit.js, journal events, learning aggregator) are unchanged.
- The "executor calls the checkpoint tool" path is retained as a fast path: if the executor did emit a valid checkpoint AND the solver agrees with its classification, the solver pass is a no-op rubber stamp. The solver only synthesizes when the executor failed to checkpoint or classified incorrectly.
- The mentioned-checkpoint-without-tool repair attempts collapse to zero: the solver is now the source of truth, so a missing executor checkpoint is normal, not a defect.
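The fast path described above can be sketched as a small reconciliation step. The chooseCheckpoint name, its arguments, and the reading of "agrees" as equality of outcome are assumptions made for illustration.

```javascript
// Hypothetical fast-path decision.
// executorCheckpoint: the checkpoint the executor emitted, or null if none.
// solverOutcome: the outcome the solver derived from the transcript.
// synthesize: assumed callback that builds the solver's own checkpoint.
function chooseCheckpoint({ executorCheckpoint, solverOutcome, synthesize }) {
  const agrees =
    executorCheckpoint != null && executorCheckpoint.outcome === solverOutcome;
  if (agrees) {
    // Rubber stamp: keep the executor's checkpoint unchanged.
    return { source: "executor", checkpoint: executorCheckpoint };
  }
  // Missing or contradicted checkpoint: the solver is the source of truth.
  return { source: "solver", checkpoint: synthesize() };
}
```

In the incident from the Problem Statement, the executor's outcome=continue would disagree with a solver outcome of blocker, so the synthesized checkpoint would be written instead of the executor's.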
Migration
Step 1 — Pin solver model
Add resolveSolverModel to model-router.js (or a new solver-model.js). It does not participate in the router's capability scoring. Wire it into runUnit's solver-pass invocation only.
Step 2 — Add solver pass
After runUnit returns, before assessAutonomousSolverTurn, run the solver pass with the executor transcript. The solver pass writes the checkpoint directly. Executor checkpoint tool calls remain accepted but become advisory.
Step 3 — Refusal classifier
Extend classifyAutonomousSolverMissingCheckpointFailure (rename to classifyExecutorTurn) to detect refusal patterns. Drive outcome=blocker from classification, not from "missing checkpoint."
Step 4 — Model escalation
Add a per-(unitId, model) no-fly entry on executor-refused. Router consults the list during selection.
Step 5 — Tests
Cover: pinned solver model invariant, refusal pattern detection, no-op detection, solver-pass checkpoint emission when executor is silent, fast-path bypass when executor emits a valid checkpoint, escalation chain.
Risks
- Solver-pass cost. Adds one LLM call per unit. Mitigation: solver pass uses a smaller prompt (transcript summary only) and is skippable when executor emitted a valid checkpoint.
- Locked model availability. If kimi-k2.6 is unreachable, the solver pass fails. Mitigation: explicit fallback chain; if all fail, pause the loop rather than synthesize progress.
- Solver hallucination. The solver could misclassify and over-emit blockers. Mitigation: deterministic prompt template, classification rubric with example transcripts, and self-feedback when classification flips between iterations.
Open Questions
- Should the solver pass run during the executor turn (streaming observer) or after (post-turn observer)? Post-turn is simpler and proposed here; streaming would catch refusals earlier but adds complexity.
- Should the solver pass also re-evaluate the executor's verification evidence (cite tests that actually exist, etc.) — i.e. become a partial verifier — or stay narrowly focused on checkpoint emission?
- How does this interact with keepSession: true in runUnit? The solver pass is a separate session by definition; the executor session remains as-is.
Decision Outcome (when accepted)
To be filled when the ADR is accepted. Initial cut targets steps 1–3 (pinned solver model + solver pass + refusal classifier). Steps 4–5 (escalation + tests) follow in a subsequent slice.