singularity-forge/docs/adr/0079-autonomous-solver-executor-separation.md

# ADR-0079: Autonomous Solver / Executor Separation

**Status:** Proposed
**Date:** 2026-05-12
**Stakeholders:** Autonomous mode, model router, checkpoint protocol, runtime safety
**Related:** `.sf/self-feedback.jsonl` entry `sf-mp34nxb6-27zdx7` (architecture-defect:solver-executor-conflation)

---

## Problem Statement

Today the autonomous loop conflates two distinct roles into a single LLM call:

1. **Executor** — does the unit work (read files, run tests, edit code).
2. **Autonomous solver** — observes what the executor produced and emits a canonical checkpoint to disk (`outcome`, `completedItems`, `remainingItems`, PDD, verification evidence).

Both roles are filled by the same model, picked by `model-router.js:computeTaskRequirements` from the unit type (`execute-task`, `plan-slice`, …). The router optimizes for the *executor's* job — cost, coding capability, speed — and may select a small coding-tuned model (Codestral, Devstral, Gemini Flash). Those models are *not* required to be agentic, refusal-resistant, or stable at protocol reasoning.

When the chosen model is incapable of the agentic role, the protocol breaks in a way the repair loop cannot fix:

- **2026-05-12 M001-6377a4/S04/T02:** `mistral/codestral-latest` was routed to execute T02 (Align TUI Dashboard with Headless Status Output). It emitted:
  > "I'm sorry, but I currently don't have the necessary tools to assist with that specific request."

  No tool was called. The runtime logged `Autonomous solver checkpoint missing … repair attempt 1/4 (mentioned-checkpoint-without-tool)`, then prompted the *same* Codestral with stronger "you MUST call the checkpoint tool" wording. Codestral dutifully called `Autonomous Checkpoint` with `outcome=continue` — and produced zero file edits, zero work. The protocol layer reported success; the slice made no progress.

The repair logic at `auto/phases-unit.js:720-890` only enforces **protocol shape** ("did the LLM emit a checkpoint tool call?"). It does not check **outcome** ("did the unit progress?") or **refusal** ("did the executor refuse the task?"). And because executor and solver are the same call, retrying the repair just re-asks the broken model.

## Goals

1. The protocol layer must remain functional even when the executor refuses or is incapable.
2. Refusals must surface as blockers that can escalate model tier — not silently synthesize forward progress.
3. No-op iterations (continue with zero work) must not satisfy the repair gate.
4. Solver model choice must be stable and independent of unit-type routing.

## Non-Goals

- Replacing the model router for executors. Routing per `unitType` remains; cheap/specialized models are still desirable for unit work.
- Mandating a specific solver vendor. The locked solver model is a pinned default; ops may override via preferences.
- Reworking the checkpoint schema. The same JSON shape persists; only *who emits it* changes.

## Proposed Architecture

### Two-Layer Loop

```
                ┌─────────────────────────────────────────┐
                │ runUnit(ctx, unitType, unitId, prompt)  │
                └─────────────────────┬───────────────────┘
                                      │
              ┌───────────────────────┴───────────────────────┐
              │                                               │
              ▼                                               ▼
  ┌───────────────────────────┐                   ┌───────────────────────────┐
  │ EXECUTOR PASS             │                   │ SOLVER PASS               │
  │ model: routed per unit    │   transcript →    │ model: LOCKED kimi-k2.6   │
  │ (Codestral, Gemini, ...)  │ ────────────────▶ │ reads agent_end messages, │
  │ does the unit work        │                   │ emits canonical checkpoint │
  │ NO checkpoint tool needed │                   │ classifies refusal/no-op   │
  └───────────────────────────┘                   └─────────────┬─────────────┘
                                                                │
                                                                ▼
                                                ┌───────────────────────────┐
                                                │ appendAutonomousSolver-   │
                                                │ Checkpoint(basePath, …)   │
                                                └───────────────────────────┘
```

### Solver Model Selection

A new helper `resolveSolverModel(preferences)` returns the pinned solver model. It:

- Defaults to `kimi-k2.6` (provider: `kimi-coding`).
- Allows preference override via `preferences.autonomousSolver.model` (operator escape hatch).
- **Never** consults the unit-type router, benchmark selector, Bayesian blender, or learning aggregator. The solver's model is a runtime invariant, not an optimization target.
- Falls back along a small explicit chain (`kimi-k2.6` → `claude-sonnet-4-6` → `claude-opus-4-7`) if the primary is unreachable. Falls back to "synthesize blocker" if none reachable, rather than silently dropping the protocol layer.

### Solver Pass Contract

Input: `{ unitType, unitId, executorTranscript, lastIteration, projection }`.

Output (a checkpoint, written via `appendAutonomousSolverCheckpoint`):

```json
{
  "outcome": "continue|complete|blocker",
  "summary": "...",
  "completedItems": [...],
  "remainingItems": [...],
  "verificationEvidence": [...],
  "pdd": { "purpose": "...", "consumer": "...", ... },
  "classification": "executor-refused|executor-noop|progress|complete|blocker-...",
  "evidence": "string excerpts proving the classification"
}
```

The solver's prompt is a deterministic template at `prompts/autonomous-solver.md` that:

1. Embeds the executor transcript.
2. States the schema and outcome rules.
3. Includes the refusal/no-op classification rubric.
4. Instructs the solver to **never** propose code edits — its job is to observe, classify, and write the checkpoint.

### Refusal Classification

`assessAutonomousSolverTurn` (and the new solver-pass) checks executor transcript for:

| Pattern | Classification | Action |
|---|---|---|
| "I'm sorry", "I cannot help", "I don't have the necessary tools", "I can't assist with that" | `executor-refused` | Emit `outcome=blocker`; on retry, escalate executor model tier |
| Zero tool calls, zero file edits, transcript < threshold | `executor-noop` | Emit `outcome=blocker` (or `continue` only if executor explicitly states a wait state); on retry, do not treat synthesized continue as progress |
| Tool calls + edits + explicit "I'm done" / completion signal | `progress` or `complete` | Emit `outcome=continue` or `complete` as appropriate |

### Model Escalation on Refusal

When solver classifies `executor-refused`, the loop records the executor's model and unit-type into a "no-fly" entry. On the next iteration of the same unit, the router consults this list and selects the next tier up (Sonnet → Opus, or via a model-tier graph). After 2 escalations on the same unit, pause the loop with a hard blocker.

### Backward Compatibility

- The existing checkpoint shape is preserved; downstream consumers (`auto-post-unit.js`, journal events, learning aggregator) are unchanged.
- The "executor calls the checkpoint tool" path is retained as a **fast path**: if the executor *did* emit a valid checkpoint AND the solver agrees with its classification, the solver pass is a no-op rubber stamp. The solver only synthesizes when the executor failed to checkpoint or classified incorrectly.
- The `mentioned-checkpoint-without-tool` repair attempts collapse to zero — the solver is now the source of truth, so a missing executor checkpoint is normal, not a defect.

## Migration

### Step 1 — Pin solver model

Add `resolveSolverModel` to `model-router.js` (or a new `solver-model.js`). It does not participate in the router's capability scoring. Wire it into `runUnit`'s solver-pass invocation only.

### Step 2 — Add solver pass

After `runUnit` returns, before `assessAutonomousSolverTurn`, run the solver pass with the executor transcript. The solver pass writes the checkpoint directly. Executor checkpoint tool calls remain accepted but become advisory.

### Step 3 — Refusal classifier

Extend `classifyAutonomousSolverMissingCheckpointFailure` (rename to `classifyExecutorTurn`) to detect refusal patterns. Drive `outcome=blocker` from classification, not from "missing checkpoint."

### Step 4 — Model escalation

Add a per-(unitId, model) no-fly entry on `executor-refused`. Router consults the list during selection.

### Step 5 — Tests

Cover: pinned solver model invariant, refusal pattern detection, no-op detection, solver-pass checkpoint emission when executor is silent, fast-path bypass when executor emits a valid checkpoint, escalation chain.

## Risks

- **Solver-pass cost.** Adds one LLM call per unit. Mitigation: solver pass uses a smaller prompt (transcript summary only) and is skippable when executor emitted a valid checkpoint.
- **Locked model availability.** If `kimi-k2.6` is unreachable, solver pass fails. Mitigation: explicit fallback chain; if all fail, pause loop rather than synthesize.
- **Solver hallucination.** Solver could mis-classify and over-emit blockers. Mitigation: deterministic prompt template, classification rubric with example transcripts, and self-feedback when classification flips between iterations.

## Open Questions

1. Should the solver pass run *during* the executor turn (streaming observer) or *after* (post-turn observer)? Post-turn is simpler and proposed here; streaming would catch refusals earlier but adds complexity.
2. Should the solver pass also re-evaluate the executor's verification evidence (cite tests that actually exist, etc.) — i.e. become a partial verifier — or stay narrowly focused on checkpoint emission?
3. How does this interact with `keepSession: true` in `runUnit`? The solver pass is a separate session by definition; the executor session remains as-is.

## Decision Outcome (when accepted)

To be filled when the ADR is accepted. Initial cut targets steps 1–3 (pinned solver model + solver pass + refusal classifier). Steps 4–5 (escalation + tests) follow in a subsequent slice.