Mikael Hugo 55229f6604 fix(auto): split autonomous solver from executor per ADR-0079

- Lock solver model to kimi-k2.6 independent of unit-type router
- Executor prompt no longer requires checkpoint tool call
- Add dedicated solver pass that reads executor transcript and emits canonical checkpoint
- Classify executor refusals as blocker outcomes (already partially implemented)
- Classify no-op iterations (continue with zero work) as missing-checkpoint-retry
- Add tests for executor prompt block, solver pass prompt, no-op detection, and no-op assessment

Fixes sf-mp34nxb6-27zdx7

2026-05-12 23:55:02 +02:00


ADR-0079: Autonomous Solver / Executor Separation

Status: Proposed
Date: 2026-05-12
Stakeholders: Autonomous mode, model router, checkpoint protocol, runtime safety
Related: .sf/self-feedback.jsonl entry sf-mp34nxb6-27zdx7 (architecture-defect:solver-executor-conflation)

Problem Statement

Today the autonomous loop conflates two distinct roles into a single LLM call:

Executor — does the unit work (read files, run tests, edit code).
Autonomous solver — observes what the executor produced and emits a canonical checkpoint to disk (outcome, completedItems, remainingItems, PDD, verification evidence).

Both roles are filled by the same model, picked by model-router.js:computeTaskRequirements from the unit type (execute-task, plan-slice, …). The router optimizes for the executor's job — cost, coding capability, speed — and may select a small coding-tuned model (Codestral, Devstral, Gemini Flash). Those models are not required to be agentic, refusal-resistant, or stable at protocol reasoning.

When the chosen model is incapable of the agentic role, the protocol breaks in a way the repair loop cannot fix:

2026-05-12 M001-6377a4/S04/T02: mistral/codestral-latest was routed to execute T02 (Align TUI Dashboard with Headless Status Output). It emitted:

"I'm sorry, but I currently don't have the necessary tools to assist with that specific request."

No tool was called. The runtime logged "Autonomous solver checkpoint missing … repair attempt 1/4 (mentioned-checkpoint-without-tool)", then prompted the same Codestral with stronger "you MUST call the checkpoint tool" wording. Codestral dutifully called Autonomous Checkpoint with outcome=continue, yet produced zero file edits and zero work. The protocol layer reported success; the slice made no progress.

The repair logic at auto/phases-unit.js:720-890 only enforces protocol shape ("did the LLM emit a checkpoint tool call?"). It does not check outcome ("did the unit progress?") or refusal ("did the executor refuse the task?"). And because executor and solver are the same call, retrying the repair just re-asks the broken model.

Goals

The protocol layer must remain functional even when the executor refuses or is incapable.
Refusals must surface as blockers that can escalate model tier — not silently synthesize forward progress.
No-op iterations (continue with zero work) must not satisfy the repair gate.
Solver model choice must be stable and independent of unit-type routing.

Non-Goals

Replacing the model router for executors. Routing per unitType remains; cheap/specialized models are still desirable for unit work.
Mandating a specific solver vendor. The locked solver model is a pinned default; ops may override via preferences.
Reworking the checkpoint schema. The same JSON shape persists; only who emits it changes.

Proposed Architecture

Two-Layer Loop

                ┌─────────────────────────────────────────┐
                │ runUnit(ctx, unitType, unitId, prompt)  │
                └─────────────────────┬───────────────────┘
                                      │
              ┌───────────────────────┴───────────────────────┐
              │                                               │
              ▼                                               ▼
  ┌───────────────────────────┐                   ┌───────────────────────────┐
  │ EXECUTOR PASS             │                   │ SOLVER PASS               │
  │ model: routed per unit    │   transcript →    │ model: LOCKED kimi-k2.6   │
  │ (Codestral, Gemini, ...)  │ ────────────────▶ │ reads agent_end messages, │
  │ does the unit work        │                   │ emits canonical checkpoint│
  │ NO checkpoint tool needed │                   │ classifies refusal/no-op  │
  └───────────────────────────┘                   └─────────────┬─────────────┘
                                                                │
                                                                ▼
                                                ┌───────────────────────────┐
                                                │ appendAutonomousSolver-   │
                                                │ Checkpoint(basePath, …)   │
                                                └───────────────────────────┘
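The flow in the diagram can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the project's implementation: `runExecutorPass`, `runSolverPass`, and `writeCheckpoint` are hypothetical injected functions standing in for the routed executor call, the pinned solver call, and appendAutonomousSolverCheckpoint.

```javascript
// Sketch only: every function on `deps` is a hypothetical stand-in.
async function runUnitTwoLayer(deps, unit) {
  // Executor pass: model routed per unit type; no checkpoint tool call required.
  const executorTranscript = await deps.runExecutorPass(unit);

  // Solver pass: the pinned model reads the transcript and produces the checkpoint.
  const checkpoint = await deps.runSolverPass({
    unitType: unit.unitType,
    unitId: unit.unitId,
    executorTranscript,
  });

  // The solver's checkpoint is the one that gets persisted.
  await deps.writeCheckpoint(unit.basePath, checkpoint);
  return checkpoint;
}
```

Passing the three operations in as dependencies keeps the ordering (executor, then solver, then write) visible and lets a test substitute canned transcripts for real model calls.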

Solver Model Selection

A new helper resolveSolverModel(preferences) returns the pinned solver model. It:

Defaults to kimi-k2.6 (provider: kimi-coding).
Allows preference override via preferences.autonomousSolver.model (operator escape hatch).
Never consults the unit-type router, benchmark selector, Bayesian blender, or learning aggregator. The solver's model is a runtime invariant, not an optimization target.
Falls back along a small explicit chain (kimi-k2.6 → claude-sonnet-4-6 → claude-opus-4-7) if the primary is unreachable. If none of these is reachable, it synthesizes a blocker rather than silently dropping the protocol layer.
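A minimal sketch of the selection rules above, assuming a few details the ADR does not fix: the `isReachable` probe is a hypothetical second parameter, the return shape `{ model, synthesizeBlocker }` is invented for illustration, and an operator override is assumed to fall through to the same chain if it is unreachable. The provider field (kimi-coding) is omitted for brevity.

```javascript
// Fallback chain as listed in this ADR.
const SOLVER_FALLBACK_CHAIN = ["kimi-k2.6", "claude-sonnet-4-6", "claude-opus-4-7"];

// `isReachable` is a hypothetical probe supplied by the caller.
// Note what is absent: no unit type, no router, no benchmark data.
function resolveSolverModel(preferences = {}, isReachable = () => true) {
  const override = preferences.autonomousSolver?.model;
  // Assumption: an operator override is tried first, then the fixed chain.
  const candidates = override
    ? [override, ...SOLVER_FALLBACK_CHAIN.filter((m) => m !== override)]
    : SOLVER_FALLBACK_CHAIN;
  const model = candidates.find((m) => isReachable(m));
  // No reachable model: tell the caller to synthesize a blocker.
  return model
    ? { model, synthesizeBlocker: false }
    : { model: null, synthesizeBlocker: true };
}
```

Because the function takes only preferences and a reachability probe, the unit type cannot influence the result, which is one way to make the "runtime invariant" property checkable in a test.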

Solver Pass Contract

Input: { unitType, unitId, executorTranscript, lastIteration, projection }.

Output (a checkpoint, written via appendAutonomousSolverCheckpoint):

{
  "outcome": "continue|complete|blocker",
  "summary": "...",
  "completedItems": [...],
  "remainingItems": [...],
  "verificationEvidence": [...],
  "pdd": { "purpose": "...", "consumer": "...", ... },
  "classification": "executor-refused|executor-noop|progress|complete|blocker-...",
  "evidence": "string excerpts proving the classification"
}
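Before a checkpoint of this shape is written, a structural check along the following lines could reject malformed solver output. It is a sketch only: `validateSolverCheckpoint` is a hypothetical helper, and treating every field as required is an assumption, since the ADR does not say which fields are optional.

```javascript
const OUTCOMES = new Set(["continue", "complete", "blocker"]);
const ARRAY_FIELDS = ["completedItems", "remainingItems", "verificationEvidence"];
const STRING_FIELDS = ["summary", "classification", "evidence"];

// Returns a list of problems; an empty list means the shape looks valid.
function validateSolverCheckpoint(cp) {
  if (cp === null || typeof cp !== "object") return ["checkpoint is not an object"];
  const problems = [];
  if (!OUTCOMES.has(cp.outcome)) {
    problems.push("outcome must be continue, complete, or blocker");
  }
  for (const field of ARRAY_FIELDS) {
    if (!Array.isArray(cp[field])) problems.push(`${field} must be an array`);
  }
  for (const field of STRING_FIELDS) {
    if (typeof cp[field] !== "string") problems.push(`${field} must be a string`);
  }
  if (cp.pdd === null || typeof cp.pdd !== "object") {
    problems.push("pdd must be an object");
  }
  return problems;
}
```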

The solver's prompt is a deterministic template at prompts/autonomous-solver.md that:

Embeds the executor transcript.
States the schema and outcome rules.
Includes the refusal/no-op classification rubric.
Instructs the solver to never propose code edits — its job is to observe, classify, and write the checkpoint.
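One way to keep the prompt deterministic is plain placeholder substitution, sketched below. The `{{name}}` placeholder syntax, the `renderSolverPrompt` helper, and the sample template text are all assumptions for illustration; the actual contents of prompts/autonomous-solver.md are not shown in this ADR.

```javascript
// Hypothetical template text; the real prompt file is not reproduced here.
const SAMPLE_TEMPLATE = [
  "You observe an executor transcript and write a checkpoint.",
  "Never propose code edits.",
  "Unit: {{unitType}} / {{unitId}}",
  "Transcript:",
  "{{executorTranscript}}",
].join("\n");

// Deterministic rendering: the same template and values always yield the same prompt.
function renderSolverPrompt(template, values) {
  return template.replace(/\{\{(\w+)\}\}/g, (match, key) => {
    if (!Object.prototype.hasOwnProperty.call(values, key)) {
      throw new Error(`missing template value: ${key}`);
    }
    // The returned text is not re-scanned, so braces inside the
    // transcript are embedded literally rather than substituted.
    return String(values[key]);
  });
}
```

Failing loudly on a missing value, instead of leaving a placeholder in the prompt, avoids sending the solver a template with holes.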

Refusal Classification

assessAutonomousSolverTurn (and the new solver pass) checks the executor transcript for the following patterns:

| Pattern | Classification | Action |
| --- | --- | --- |
| "I'm sorry", "I cannot help", "I don't have the necessary tools", "I can't assist with that" | `executor-refused` | Emit `outcome=blocker`; on retry, escalate executor model tier |
| Zero tool calls, zero file edits, transcript < threshold | `executor-noop` | Emit `outcome=blocker` (or `continue` only if executor explicitly states a wait state); on retry, do not treat synthesized continue as progress |
| Tool calls + edits + explicit "I'm done" / completion signal | `progress` or `complete` | Emit `outcome=continue` or `complete` as appropriate |
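The rubric could be approximated by a deterministic pre-check like the sketch below. Several pieces are assumptions: the `turn` summary shape (`text`, `toolCalls`, `fileEdits`, `signalledDone`, `statedWait`), the `minLength` default standing in for the unspecified threshold, and the rule that refusal phrases only count when no work was done. Cases the table does not cover return `null` so that a later stage has to decide.

```javascript
// Phrases from the table; matching is case-insensitive and accepts curly apostrophes.
const REFUSAL_PATTERNS = [
  /i['’]m sorry/i,
  /i cannot help/i,
  /i don['’]t have the necessary tools/i,
  /i can['’]t assist with that/i,
];

// `turn` is a hypothetical summary of one executor turn.
// `minLength` stands in for the transcript threshold, which the ADR leaves open.
function classifyExecutorTurn(turn, minLength = 200) {
  const didWork = turn.toolCalls > 0 || turn.fileEdits > 0;
  // Assumption: an apology inside a turn that did real work is not a refusal.
  if (!didWork && REFUSAL_PATTERNS.some((re) => re.test(turn.text))) {
    return { classification: "executor-refused", outcome: "blocker" };
  }
  if (!didWork && turn.text.length < minLength) {
    // `continue` is allowed only for an explicitly stated wait state.
    return { classification: "executor-noop", outcome: turn.statedWait ? "continue" : "blocker" };
  }
  if (didWork) {
    return turn.signalledDone
      ? { classification: "complete", outcome: "complete" }
      : { classification: "progress", outcome: "continue" };
  }
  // Not covered by the table (no work, long transcript): leave undecided.
  return { classification: null, outcome: null };
}
```

Run against the Codestral refusal quoted in the problem statement (zero tool calls, zero edits), this sketch returns `executor-refused` with `outcome=blocker` instead of accepting a synthesized continue.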

Model Escalation on Refusal

When the solver classifies a turn as executor-refused, the loop records the executor's model and unit-type into a "no-fly" entry. On the next iteration of the same unit, the router consults this list and selects the next tier up (Sonnet → Opus, or via a model-tier graph). After 2 escalations on the same unit, the loop pauses with a hard blocker.
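The escalation rule might look like the sketch below. It rests on assumptions: `NoFlyList` and `selectExecutorModel` are hypothetical names, entries are keyed by unit id and model, `tiers` is a flat ordered list rather than a model-tier graph, and "after 2 escalations" is read as "pause once the initial model and both escalated models have refused".

```javascript
const MAX_ESCALATIONS = 2;

// In-memory no-fly list: unitId -> set of models that refused that unit.
class NoFlyList {
  constructor() {
    this.entries = new Map();
  }
  recordRefusal(unitId, model) {
    if (!this.entries.has(unitId)) this.entries.set(unitId, new Set());
    this.entries.get(unitId).add(model);
  }
  refusedModels(unitId) {
    return this.entries.get(unitId) ?? new Set();
  }
}

// `tiers` is an ordered list of model ids, lowest tier first (assumed shape).
function selectExecutorModel(noFly, unitId, tiers) {
  const refused = noFly.refusedModels(unitId);
  // Initial model plus two escalations have all refused: hard blocker.
  if (refused.size > MAX_ESCALATIONS) return { paused: true, model: null };
  const model = tiers.find((m) => !refused.has(m));
  // Running out of tiers also pauses the loop.
  return model ? { paused: false, model } : { paused: true, model: null };
}
```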

Backward Compatibility

The existing checkpoint shape is preserved; downstream consumers (auto-post-unit.js, journal events, learning aggregator) are unchanged.
The "executor calls the checkpoint tool" path is retained as a fast path: if the executor did emit a valid checkpoint AND the solver agrees with its classification, the solver pass is a no-op rubber stamp. The solver only synthesizes when the executor failed to checkpoint or classified incorrectly.
The mentioned-checkpoint-without-tool repair attempts collapse to zero — the solver is now the source of truth, so a missing executor checkpoint is normal, not a defect.
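The fast path described above could be expressed as a small reconciliation step. This is a sketch under assumptions: `reconcileCheckpoint`, the `solverVerdict` shape, and the `synthesize` callback are hypothetical, and agreement is reduced here to comparing the outcome field only.

```javascript
// `executorCheckpoint` is null when the executor emitted no checkpoint.
// `solverVerdict` is the solver's own reading of the transcript (hypothetical shape).
function reconcileCheckpoint(executorCheckpoint, solverVerdict, synthesize) {
  const agrees =
    executorCheckpoint != null &&
    executorCheckpoint.outcome === solverVerdict.outcome;
  // Fast path: the executor's checkpoint is accepted unchanged.
  if (agrees) return { checkpoint: executorCheckpoint, source: "executor" };
  // Otherwise the solver synthesizes the canonical checkpoint.
  return { checkpoint: synthesize(solverVerdict), source: "solver" };
}
```

A missing executor checkpoint takes the same branch as a disagreement, which matches the point that a silent executor is normal rather than a defect.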

Migration

Step 1 — Pin solver model

Add resolveSolverModel to model-router.js (or a new solver-model.js). It does not participate in the router's capability scoring. Wire it into runUnit's solver-pass invocation only.

Step 2 — Add solver pass

After runUnit returns, before assessAutonomousSolverTurn, run the solver pass with the executor transcript. The solver pass writes the checkpoint directly. Executor checkpoint tool calls remain accepted but become advisory.

Step 3 — Refusal classifier

Extend classifyAutonomousSolverMissingCheckpointFailure (rename to classifyExecutorTurn) to detect refusal patterns. Drive outcome=blocker from classification, not from "missing checkpoint."

Step 4 — Model escalation

Add a per-(unitId, model) no-fly entry on executor-refused. Router consults the list during selection.

Step 5 — Tests

Cover: pinned solver model invariant, refusal pattern detection, no-op detection, solver-pass checkpoint emission when executor is silent, fast-path bypass when executor emits a valid checkpoint, escalation chain.

Risks

Solver-pass cost. Adds one LLM call per unit. Mitigation: solver pass uses a smaller prompt (transcript summary only) and is skippable when executor emitted a valid checkpoint.
Locked model availability. If kimi-k2.6 is unreachable, solver pass fails. Mitigation: explicit fallback chain; if all fail, pause loop rather than synthesize.
Solver hallucination. Solver could mis-classify and over-emit blockers. Mitigation: deterministic prompt template, classification rubric with example transcripts, and self-feedback when classification flips between iterations.

Open Questions

Should the solver pass run during the executor turn (streaming observer) or after (post-turn observer)? Post-turn is simpler and proposed here; streaming would catch refusals earlier but adds complexity.
Should the solver pass also re-evaluate the executor's verification evidence (cite tests that actually exist, etc.) — i.e. become a partial verifier — or stay narrowly focused on checkpoint emission?
How does this interact with keepSession: true in runUnit? The solver pass is a separate session by definition; the executor session remains as-is.

Decision Outcome (when accepted)

To be filled when the ADR is accepted. Initial cut targets steps 1–3 (pinned solver model + solver pass + refusal classifier). Steps 4–5 (escalation + tests) follow in a subsequent slice.
