fix(auto): split autonomous solver from executor per ADR-0079

- Lock solver model to kimi-k2.6 independent of unit-type router
- Executor prompt no longer requires checkpoint tool call
- Add dedicated solver pass that reads executor transcript and emits canonical checkpoint
- Classify executor refusals as blocker outcomes (already partially implemented)
- Classify no-op iterations (continue with zero work) as missing-checkpoint-retry
- Add tests for executor prompt block, solver pass prompt, no-op detection, and no-op assessment

Fixes sf-mp34nxb6-27zdx7
Mikael Hugo 2026-05-12 23:55:02 +02:00
parent e2f2cb7e2e
commit 55229f6604
5 changed files with 1250 additions and 55 deletions

View file

@ -0,0 +1,159 @@
# ADR-0079: Autonomous Solver / Executor Separation
**Status:** Proposed
**Date:** 2026-05-12
**Stakeholders:** Autonomous mode, model router, checkpoint protocol, runtime safety
**Related:** `.sf/self-feedback.jsonl` entry `sf-mp34nxb6-27zdx7` (architecture-defect:solver-executor-conflation)
---
## Problem Statement
Today the autonomous loop conflates two distinct roles into a single LLM call:
1. **Executor** — does the unit work (read files, run tests, edit code).
2. **Autonomous solver** — observes what the executor produced and emits a canonical checkpoint to disk (`outcome`, `completedItems`, `remainingItems`, PDD, verification evidence).
Both roles are filled by the same model, picked by `model-router.js:computeTaskRequirements` from the unit type (`execute-task`, `plan-slice`, …). The router optimizes for the *executor's* job — cost, coding capability, speed — and may select a small coding-tuned model (Codestral, Devstral, Gemini Flash). Those models are *not* required to be agentic, refusal-resistant, or stable at protocol reasoning.
When the chosen model is incapable of the agentic role, the protocol breaks in a way the repair loop cannot fix:
- **2026-05-12 M001-6377a4/S04/T02:** `mistral/codestral-latest` was routed to execute T02 (Align TUI Dashboard with Headless Status Output). It emitted:
> "I'm sorry, but I currently don't have the necessary tools to assist with that specific request."
No tool was called. The runtime logged `Autonomous solver checkpoint missing … repair attempt 1/4 (mentioned-checkpoint-without-tool)`, then prompted the *same* Codestral with stronger "you MUST call the checkpoint tool" wording. Codestral dutifully called `Autonomous Checkpoint` with `outcome=continue` — and produced zero file edits, zero work. The protocol layer reported success; the slice made no progress.
The repair logic at `auto/phases-unit.js:720-890` only enforces **protocol shape** ("did the LLM emit a checkpoint tool call?"). It does not check **outcome** ("did the unit progress?") or **refusal** ("did the executor refuse the task?"). And because executor and solver are the same call, retrying the repair just re-asks the broken model.
## Goals
1. The protocol layer must remain functional even when the executor refuses or is incapable.
2. Refusals must surface as blockers that can escalate model tier — not silently synthesize forward progress.
3. No-op iterations (continue with zero work) must not satisfy the repair gate.
4. Solver model choice must be stable and independent of unit-type routing.
## Non-Goals
- Replacing the model router for executors. Routing per `unitType` remains; cheap/specialized models are still desirable for unit work.
- Mandating a specific solver vendor. The locked solver model is a pinned default; ops may override via preferences.
- Reworking the checkpoint schema. The same JSON shape persists; only *who emits it* changes.
## Proposed Architecture
### Two-Layer Loop
```
┌─────────────────────────────────────────┐
│ runUnit(ctx, unitType, unitId, prompt) │
└─────────────────────┬───────────────────┘
┌───────────────────────┴───────────────────────┐
│ │
▼ ▼
┌───────────────────────────┐ ┌───────────────────────────┐
│ EXECUTOR PASS │ │ SOLVER PASS │
│ model: routed per unit │ transcript → │ model: LOCKED kimi-k2.6 │
│ (Codestral, Gemini, ...) │ ────────────────▶ │ reads agent_end messages, │
│ does the unit work │ │ emits canonical checkpoint │
│ NO checkpoint tool needed │ │ classifies refusal/no-op │
└───────────────────────────┘ └─────────────┬─────────────┘
┌───────────────────────────┐
│ appendAutonomousSolver- │
│ Checkpoint(basePath, …) │
└───────────────────────────┘
```
### Solver Model Selection
A new helper `resolveSolverModel(preferences)` returns the pinned solver model. It:
- Defaults to `kimi-k2.6` (provider: `kimi-coding`).
- Allows preference override via `preferences.autonomousSolver.model` (operator escape hatch).
- **Never** consults the unit-type router, benchmark selector, Bayesian blender, or learning aggregator. The solver's model is a runtime invariant, not an optimization target.
- Falls back along a small explicit chain (`kimi-k2.6` → `claude-sonnet-4-6` → `claude-opus-4-7`) if the primary is unreachable. If no candidate is reachable, synthesizes a blocked checkpoint rather than silently dropping the protocol layer.
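The resolution order above can be sketched as follows. The preference path (`preferences.autonomousSolver.model`) is the override named in this ADR; the rest is an illustrative shape, not the shipped API:

```javascript
// Sketch of the pinned-default resolution described above. Deliberately no
// router, benchmark selector, or learning input: the solver model is a
// runtime invariant, not an optimization target.
const SOLVER_FALLBACK_CHAIN = [
  { provider: "kimi-coding", id: "kimi-k2.6" },
  { provider: "anthropic", id: "claude-sonnet-4-6" },
  { provider: "anthropic", id: "claude-opus-4-7" },
];

function resolveSolverModelCandidates(preferences) {
  const override = preferences?.autonomousSolver?.model;
  // Operator escape hatch first, then the pinned chain.
  return override
    ? [override, ...SOLVER_FALLBACK_CHAIN]
    : [...SOLVER_FALLBACK_CHAIN];
}

function resolveSolverModel(preferences) {
  return resolveSolverModelCandidates(preferences)[0];
}
```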
### Solver Pass Contract
Input: `{ unitType, unitId, executorTranscript, lastIteration, projection }`.
Output (a checkpoint, written via `appendAutonomousSolverCheckpoint`):
```json
{
"outcome": "continue|complete|blocker",
"summary": "...",
"completedItems": [...],
"remainingItems": [...],
"verificationEvidence": [...],
"pdd": { "purpose": "...", "consumer": "...", ... },
"classification": "executor-refused|executor-noop|progress|complete|blocker-...",
"evidence": "string excerpts proving the classification"
}
```
The solver's prompt is a deterministic template at `prompts/autonomous-solver.md` that:
1. Embeds the executor transcript.
2. States the schema and outcome rules.
3. Includes the refusal/no-op classification rubric.
4. Instructs the solver to **never** propose code edits — its job is to observe, classify, and write the checkpoint.
### Refusal Classification
`assessAutonomousSolverTurn` (and the new solver-pass) checks executor transcript for:
| Pattern | Classification | Action |
|---|---|---|
| "I'm sorry", "I cannot help", "I don't have the necessary tools", "I can't assist with that" | `executor-refused` | Emit `outcome=blocker`; on retry, escalate executor model tier |
| Zero tool calls, zero file edits, transcript < threshold | `executor-noop` | Emit `outcome=blocker` (or `continue` only if executor explicitly states a wait state); on retry, do not treat synthesized continue as progress |
| Tool calls + edits + explicit "I'm done" / completion signal | `progress` or `complete` | Emit `outcome=continue` or `complete` as appropriate |
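The refusal rows of the rubric can be sketched as a small matcher. The regexes here are illustrative; the shipped `classifyExecutorRefusal` pattern set is deliberately conservative and may differ:

```javascript
// Illustrative refusal matcher for the rubric above; example regexes only.
const REFUSAL_PATTERNS = [
  {
    id: "apology-no-tools",
    regex:
      /i(?:'m| am) sorry[\s\S]{0,80}(?:don't|do not|cannot|can't) have (?:the )?(?:necessary )?tools?/i,
  },
  {
    id: "cannot-assist",
    regex: /i (?:cannot|can't|am unable to) (?:assist|help) with (?:that|this)/i,
  },
];

function classifyRefusalText(transcriptText) {
  for (const { id, regex } of REFUSAL_PATTERNS) {
    if (regex.test(transcriptText)) {
      // Positive match: the solver emits outcome=blocked with
      // blockerReason=executor-refused, and the retry path escalates tier.
      return { classification: "executor-refused", pattern: id };
    }
  }
  return null; // no refusal; fall through to no-op / progress checks
}
```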
### Model Escalation on Refusal
When solver classifies `executor-refused`, the loop records the executor's model and unit-type into a "no-fly" entry. On the next iteration of the same unit, the router consults this list and selects the next tier up (Sonnet → Opus, or via a model-tier graph). After 2 escalations on the same unit, pause the loop with a hard blocker.
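The no-fly bookkeeping under that policy can be sketched as below. The tier names and `TIER_UP` graph are hypothetical stand-ins for the model-tier graph; the real graph would come from the router:

```javascript
// Hypothetical sketch of the per-unit no-fly list and tier escalation.
const TIER_UP = new Map([["sonnet-tier", "opus-tier"]]);
const MAX_ESCALATIONS = 2;

function nextExecutorModel(noFly, unitId, refusedModel) {
  const banned = noFly.get(unitId) ?? new Set();
  banned.add(refusedModel); // this model may not fly this unit again
  noFly.set(unitId, banned);
  if (banned.size > MAX_ESCALATIONS) {
    // After two escalations on the same unit, stop retrying.
    return { action: "pause", reason: "escalation cap reached; hard blocker" };
  }
  const next = TIER_UP.get(refusedModel);
  return next
    ? { action: "escalate", model: next }
    : { action: "pause", reason: "no higher tier available" };
}
```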
### Backward Compatibility
- The existing checkpoint shape is preserved; downstream consumers (`auto-post-unit.js`, journal events, learning aggregator) are unchanged.
- The "executor calls the checkpoint tool" path is retained as a **fast path**: if the executor *did* emit a valid checkpoint AND the solver agrees with its classification, the solver pass is a no-op rubber stamp. The solver only synthesizes when the executor failed to checkpoint or classified incorrectly.
- The `mentioned-checkpoint-without-tool` repair attempts collapse to zero — the solver is now the source of truth, so a missing executor checkpoint is normal, not a defect.
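Under the fast-path rule above, the synthesis decision reduces to a small guard. Field names mirror the checkpoint schema in this ADR; the guard itself is illustrative, not the shipped implementation:

```javascript
// Sketch of the fast path: the solver only synthesizes a checkpoint when the
// executor failed to emit a valid one, or when the solver disagrees with it.
function needsSolverSynthesis(executorCheckpoint, solverClassification) {
  if (!executorCheckpoint) return true; // missing executor checkpoint is normal now
  const wellFormed =
    typeof executorCheckpoint.outcome === "string" &&
    Array.isArray(executorCheckpoint.completedItems);
  if (!wellFormed) return true;
  // Rubber-stamp only when the solver agrees with the executor's outcome.
  return executorCheckpoint.outcome !== solverClassification.expectedOutcome;
}
```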
## Migration
### Step 1 — Pin solver model
Add `resolveSolverModel` to `model-router.js` (or a new `solver-model.js`). It does not participate in the router's capability scoring. Wire it into `runUnit`'s solver-pass invocation only.
### Step 2 — Add solver pass
After `runUnit` returns, before `assessAutonomousSolverTurn`, run the solver pass with the executor transcript. The solver pass writes the checkpoint directly. Executor checkpoint tool calls remain accepted but become advisory.
### Step 3 — Refusal classifier
Extend `classifyAutonomousSolverMissingCheckpointFailure` (rename to `classifyExecutorTurn`) to detect refusal patterns. Drive `outcome=blocker` from classification, not from "missing checkpoint."
### Step 4 — Model escalation
Add a per-(unitId, model) no-fly entry on `executor-refused`. Router consults the list during selection.
### Step 5 — Tests
Cover: pinned solver model invariant, refusal pattern detection, no-op detection, solver-pass checkpoint emission when executor is silent, fast-path bypass when executor emits a valid checkpoint, escalation chain.
## Risks
- **Solver-pass cost.** Adds one LLM call per unit. Mitigation: solver pass uses a smaller prompt (transcript summary only) and is skippable when executor emitted a valid checkpoint.
- **Locked model availability.** If `kimi-k2.6` is unreachable, the solver pass fails. Mitigation: explicit fallback chain; if every candidate fails, emit a blocked checkpoint and pause rather than synthesize progress.
- **Solver hallucination.** Solver could mis-classify and over-emit blockers. Mitigation: deterministic prompt template, classification rubric with example transcripts, and self-feedback when classification flips between iterations.
## Open Questions
1. Should the solver pass run *during* the executor turn (streaming observer) or *after* (post-turn observer)? Post-turn is simpler and proposed here; streaming would catch refusals earlier but adds complexity.
2. Should the solver pass also re-evaluate the executor's verification evidence (cite tests that actually exist, etc.) — i.e. become a partial verifier — or stay narrowly focused on checkpoint emission?
3. How does this interact with `keepSession: true` in `runUnit`? The solver pass is a separate session by definition; the executor session remains as-is.
## Decision Outcome (when accepted)
To be filled when the ADR is accepted. Initial cut targets steps 1-3 (pinned solver model + solver pass + refusal classifier). Steps 4-5 (escalation + tests) follow in a subsequent slice.

View file

@ -26,17 +26,23 @@ import {
appendAutonomousSolverCheckpoint,
assessAutonomousSolverTurn,
beginAutonomousSolverIteration,
buildAutonomousExecutorPromptBlock,
buildAutonomousSolverMissingCheckpointRepairPrompt,
buildAutonomousSolverPromptBlock,
buildAutonomousSolverSteeringPromptBlock,
buildSolverPassPrompt,
classifyAutonomousSolverMissingCheckpointFailure,
classifyExecutorRefusal,
consumePendingAutonomousSolverSteering,
getConfiguredAutonomousSolverMaxIterations,
isNoOpExecutorTranscript,
readAutonomousSolverState,
recordAutonomousSolverMissingCheckpointRetry,
} from "../autonomous-solver.js";
import { resumeAutoAfterProviderDelay } from "../bootstrap/provider-error-resume.js";
import { debugLog } from "../debug-logger.js";
import { PROJECT_FILES } from "../detection.js";
import { getErrorMessage } from "../error-utils.js";
import { MergeConflictError } from "../git-service.js";
import { recordLearnedOutcome } from "../learning/runtime.js";
import { sfRoot } from "../paths.js";
@ -73,6 +79,14 @@ import {
} from "../sf-db.js";
import { getEligibleSlices } from "../slice-parallel-eligibility.js";
import { startSliceParallel } from "../slice-parallel-orchestrator.js";
import {
clearSliceRoutingForUnit,
recordSliceRouting,
} from "../slice-routing-cache.js";
import {
resolveSolverModel,
resolveSolverModelCandidates,
} from "../solver-model.js";
import { handleProductAudit } from "../tools/product-audit-tool.js";
import { parseUnitId } from "../unit-id.js";
import {
@ -114,14 +128,17 @@ import {
FINALIZE_PRE_TIMEOUT_MS,
withTimeout,
} from "./finalize-timeout.js";
import {
emitCancelledUnitEnd,
recordLearningOutcomeForUnit,
shouldSkipArtifactVerification,
} from "./phases-helpers.js";
import { runUnit } from "./run-unit.js";
import {
BUDGET_THRESHOLDS,
MAX_FINALIZE_TIMEOUTS,
MAX_RECOVERY_CHARS,
} from "./types.js";
// ─── Session timeout scheduled resume state ────────────────────────────────────────
let consecutiveSessionTimeouts = 0;
@ -458,7 +475,7 @@ export async function runUnitPhase(ic, iterData, loopState, sidecarItem) {
if (steeringBlock) {
finalPrompt = `${finalPrompt}\n\n---\n\n${steeringBlock}`;
}
finalPrompt = `${finalPrompt}\n\n---\n\n${buildAutonomousExecutorPromptBlock(solverState)}`;
deps.emitJournalEvent({
ts: new Date().toISOString(),
flowId: ic.flowId,
@ -505,8 +522,7 @@ export async function runUnitPhase(ic, iterData, loopState, sidecarItem) {
try {
finalPrompt = deps.reorderForCaching(finalPrompt);
} catch (reorderErr) {
const msg = getErrorMessage(reorderErr);
logWarning("engine", "Prompt reorder failed", { error: msg });
}
// Select and apply model (with tier escalation on retry — normal units only)
@ -706,13 +722,227 @@ export async function runUnitPhase(ic, iterData, loopState, sidecarItem) {
const unitResult = await runUnit(ctx, pi, s, unitType, unitId, finalPrompt);
s.lastUnitAgentEndMessages = unitResult.event?.messages ?? null;
let currentUnitResult = unitResult;
const executorMessages = unitResult.event?.messages ?? [];
const refusal =
unitResult.status !== "cancelled"
? classifyExecutorRefusal(executorMessages)
: null;
// Short-circuit: if runUnit was cancelled (provider not ready, session
// failed, timeout) there is no checkpoint to repair — skip the solver pass
// entirely and let the cancelled handler below surface the real cause.
let solverAssessment =
unitResult.status === "cancelled"
? { action: "none" }
: { action: "pending" };
// Refusal short-circuit: when the executor model returned a generic refusal,
// synthesize a blocked checkpoint immediately and skip the solver pass.
if (unitResult.status !== "cancelled" && refusal) {
const executorModel =
s.currentUnitModel?.provider && s.currentUnitModel?.id
? `${s.currentUnitModel.provider}/${s.currentUnitModel.id}`
: (s.currentUnitModel?.id ?? "unknown");
// Evict the sticky-routing entry for this slice — the model attached
// to it refused, so future units in the same slice should NOT re-pin
// the broken model.
try {
clearSliceRoutingForUnit(s.basePath, unitId);
} catch {
// best-effort
}
try {
appendAutonomousSolverCheckpoint(s.basePath, {
unitType,
unitId,
outcome: "blocked",
summary: `Executor (${executorModel}) refused the task. Pattern: ${refusal.pattern}. Repair-prompting the same model cannot produce progress; escalate the executor model or unblock this unit manually.`,
completedItems: [],
remainingItems: [
`Re-run ${unitType} ${unitId} with a more capable executor model — current routing selected an incapable model.`,
],
verificationEvidence: [
`executor-refusal-pattern=${refusal.pattern}`,
`executor-model=${executorModel}`,
],
blockerReason: `executor-refused (${refusal.pattern})`,
pdd: {
purpose:
"Surface executor refusals as protocol-level blockers instead of synthesizing fake progress.",
consumer: "autonomous loop pause-handler",
contract:
"On `executor-refused`, the loop pauses and self-feedback is filed; the operator must escalate the executor model.",
failureBoundary:
"If the operator does not escalate, the same refusal will recur on next dispatch.",
evidence: "classifyExecutorRefusal matched a refusal pattern",
nonGoals:
"This does not retry the unit automatically — capability mismatches require operator judgement (or a future automatic escalation policy).",
invariants: "Refusal never silently synthesizes a continue.",
assumptions:
"The refusal pattern set in classifyExecutorRefusal is conservative — false positives are rare and require operator review.",
},
});
} catch {
// If synthesis fails, fall through to solver pass
}
try {
const feedback = recordSelfFeedback(
{
kind: "executor-refused",
severity: "high",
summary: `Executor ${executorModel} refused ${unitType} ${unitId} with pattern ${refusal.pattern}; loop paused to prevent fake-progress synthesis.`,
evidence: [
`unit=${unitType} ${unitId}`,
`executor=${executorModel}`,
`refusal-pattern=${refusal.pattern}`,
"",
refusal.evidence ?? "",
].join("\n"),
suggestedFix:
"Escalate the executor model for this unit (or unit type) — the currently routed model lacks the agentic capabilities required. Long-term: separate the executor and autonomous-solver roles per ADR-0079 and pin the solver to a stable agentic model.",
acceptanceCriteria: [
"Executor model for this unit type is escalated to a model that passes the refusal-resistant tier.",
"Refusal pattern is added to classifyExecutorRefusal if a novel phrasing slipped through.",
],
occurredIn: { unitType, unitId },
source: "runtime",
},
s.basePath,
);
deps.emitJournalEvent({
ts: new Date().toISOString(),
flowId: ic.flowId,
seq: ic.nextSeq(),
eventType: "executor-refused",
data: {
unitType,
unitId,
executorModel,
pattern: refusal.pattern,
selfFeedbackId: feedback?.entry?.id,
blocking: feedback?.blocking,
},
});
} catch {
// self-feedback is observability; never block loop progression on it
}
ctx.ui.notify(
`Executor ${executorModel} refused ${unitType} ${unitId} (${refusal.pattern}); autonomous loop pausing instead of synthesizing fake progress. See SELF-FEEDBACK.md for escalation guidance.`,
"error",
);
solverAssessment = assessAutonomousSolverTurn(s.basePath, unitType, unitId);
}
// Solver pass: the stable solver model reads the executor transcript and
// emits the canonical checkpoint. This separates the executor role (unit
// work) from the solver role (protocol checkpoint) per ADR-0079.
if (unitResult.status !== "cancelled" && !refusal) {
const executorModel = s.currentUnitModel;
const solverCandidates = resolveSolverModelCandidates(prefs);
let solverPassResult = null;
for (const candidate of solverCandidates) {
const availableModels = ctx.modelRegistry.getAvailable?.() ?? [];
const match = availableModels.find(
(m) => m.provider === candidate.provider && m.id === candidate.id,
);
if (!match) continue;
const ok = await pi.setModel(match, { persist: false });
if (!ok) continue;
s.currentUnitModel = match;
ctx.ui.notify(
`Running solver pass for ${unitType} ${unitId} with ${match.provider}/${match.id}`,
"info",
);
const solverState = readAutonomousSolverState(s.basePath);
const solverPrompt = buildSolverPassPrompt(
executorMessages,
solverState,
unitType,
unitId,
);
try {
const result = await runUnit(
ctx,
pi,
s,
unitType,
unitId,
solverPrompt,
{ keepSession: false },
);
solverPassResult = result;
if (result.status !== "cancelled") {
currentUnitResult = result;
s.lastUnitAgentEndMessages = result.event?.messages ?? null;
break; // Solver pass succeeded
}
} catch {
// Try next fallback
}
}
if (!solverPassResult || solverPassResult.status === "cancelled") {
ctx.ui.notify(
`Solver pass failed for ${unitType} ${unitId} — no solver model was reachable. Synthesizing blocked checkpoint.`,
"error",
);
try {
appendAutonomousSolverCheckpoint(s.basePath, {
unitType,
unitId,
outcome: "blocked",
summary: `Solver pass failed — no solver model was reachable. The executor transcript could not be classified into a canonical checkpoint.`,
completedItems: [],
remainingItems: [
`Retry ${unitType} ${unitId} after verifying solver model availability.`,
],
verificationEvidence: ["solver-pass-failed"],
blockerReason: "solver-pass-failed",
pdd: {
purpose:
"Surface solver-pass failures as blockers rather than silently dropping the protocol layer.",
consumer: "autonomous loop pause-handler",
contract:
"On solver-pass failure, the loop pauses so the operator can fix model availability.",
failureBoundary:
"If all solver candidates are unreachable, the protocol layer cannot function.",
evidence:
"All solver candidates were unreachable or setModel failed.",
nonGoals:
"This does not retry with a different solver candidate automatically beyond the explicit fallback chain.",
invariants:
"Solver-pass failure never silently synthesizes a continue.",
assumptions:
"At least one solver candidate (kimi-k2.6 or fallback) is available in the model registry.",
},
});
} catch {
// best-effort
}
}
solverAssessment = assessAutonomousSolverTurn(
s.basePath,
unitType,
unitId,
executorMessages,
);
// Restore executor model after solver pass and assessment
if (executorModel) {
try {
await pi.setModel(executorModel, { persist: false });
} catch {
// best-effort restore
}
s.currentUnitModel = executorModel;
}
}
while (solverAssessment.action === "missing-checkpoint-retry") {
const diagnosis = classifyAutonomousSolverMissingCheckpointFailure(
currentUnitResult.event?.messages ?? [],
@ -779,6 +1009,26 @@ export async function runUnitPhase(ic, iterData, loopState, sidecarItem) {
remainingCount: solverCheckpoint.remainingItems?.length ?? 0,
},
});
// Record sticky-routing on successful outcomes only. `continue` is the
// usual within-iteration progress signal; `complete` is final success.
// We deliberately skip `blocked` and `decide` because attaching a model
// to a slice when it's known-stuck or known-undecided would defeat the
// fallback path.
if (
solverCheckpoint.outcome === "continue" ||
solverCheckpoint.outcome === "complete"
) {
try {
recordSliceRouting(
s.basePath,
unitType,
unitId,
s.currentUnitModel ?? ctx.model ?? null,
);
} catch {
// best-effort; routing cache must never break the loop
}
}
}
if (solverAssessment.action === "pause") {
const isMissingCheckpoint =
@ -808,7 +1058,7 @@ export async function runUnitPhase(ic, iterData, loopState, sidecarItem) {
acceptanceCriteria: [
"Missing-checkpoint repair attempts include failure classification in the prompt.",
"Repeated repair failures file self-feedback automatically.",
"Loop continues with a synthesized checkpoint instead of pausing for human input — EXCEPT when classifyExecutorRefusal short-circuits with `executor-refused`, in which case the loop emits a `blocked` checkpoint and pauses (synthesizing forward progress over a refusing executor is the bug we are fixing).",
],
occurredIn: { unitType, unitId },
source: "runtime",
@ -1087,8 +1337,7 @@ export async function runUnitPhase(ic, iterData, loopState, sidecarItem) {
resume: allowAutoResume
? () => {
void resumeAutoAfterProviderDelay(pi, ctx).catch((err) => {
const message = getErrorMessage(err);
ctx.ui.notify(
`Session timeout recovery failed: ${message}`,
"error",
@ -1280,10 +1529,7 @@ export async function runUnitPhase(ic, iterData, loopState, sidecarItem) {
});
} catch (err) {
/* non-fatal — anchor is advisory */
logWarning("engine", `phase anchor failed: ${getErrorMessage(err)}`);
}
}
if (currentUnitResult.status !== "completed" || !artifactVerified) {

View file

@ -281,7 +281,7 @@ export function beginAutonomousSolverIteration(
*
* Consumer: runUnitPhase prompt injection.
*/
function _buildAutonomousLoopPromptPrefix(state, header) {
const phase = getSolverPhase(state.iteration, state.maxIterations);
const stalled =
Number(state.iterationsSinceProgress) >= STALL_THRESHOLD_ITERATIONS;
@ -306,7 +306,7 @@ export function buildAutonomousSolverPromptBlock(state) {
};
const lines = [
`## ${header}`,
"",
`You are inside /autonomous iteration ${state.iteration} of ${state.maxIterations} for ${state.unitType} ${state.unitId}.`,
"",
@ -357,6 +357,25 @@ export function buildAutonomousSolverPromptBlock(state) {
);
}
return lines;
}
/**
* Build the PDD autonomous solver prompt block appended to unit prompts.
*
* Purpose: bind every autonomous unit to bounded iterations, evidence, stop
* signals, and the eight PDD fields instead of open-ended hidden retries.
* Phase-aware: ORIENT (iters 1-2) focuses on reading and planning; EXECUTE
* (middle) on implementation; CLOSE (final 3) on verifying and wrapping up.
* Stall/loop signals are injected when the system detects no progress.
*
* Consumer: runUnitPhase prompt injection (solver pass).
*/
export function buildAutonomousSolverPromptBlock(state) {
const lines = _buildAutonomousLoopPromptPrefix(
state,
"Autonomous Solver Loop Contract",
);
lines.push(
"",
"## CHECKPOINT REQUIREMENT",
@ -390,6 +409,142 @@ export function buildAutonomousSolverPromptBlock(state) {
return lines.join("\n");
}
/**
* Build the executor prompt block (no checkpoint requirement).
*
* Purpose: the executor focuses on doing the unit work. A separate solver pass
* reads the executor transcript and emits the canonical checkpoint.
*
* Consumer: runUnitPhase prompt injection (executor pass).
*/
export function buildAutonomousExecutorPromptBlock(state) {
const lines = _buildAutonomousLoopPromptPrefix(
state,
"Autonomous Executor Contract",
);
lines.push(
"",
"## EXECUTOR ROLE",
"",
"Your job is to do the unit work: read files, run tests, edit code, and produce concrete artifacts.",
"You do NOT need to call the `checkpoint` tool. A separate solver pass will observe your work and emit the canonical checkpoint.",
"Focus entirely on making verifiable progress toward the task goal.",
"",
"If you are executing an `execute-task` unit and the task is finished, `complete_task` remains mandatory.",
"End your turn when the bounded work is done or when you have made meaningful progress and need to wait for the next iteration.",
);
return lines.join("\n");
}
/**
* Build the solver pass prompt that reads an executor transcript.
*
* Purpose: give the stable solver model the executor transcript and instruct it
* to classify what happened and emit the canonical checkpoint.
*
* Consumer: runUnitPhase after the executor pass returns.
*/
export function buildSolverPassPrompt(
executorTranscript,
state,
unitType,
unitId,
) {
const transcriptText = stringifyMessages(executorTranscript);
const refusal = classifyExecutorRefusal(executorTranscript);
const lines = [
"## Autonomous Solver Pass",
"",
`You are the protocol solver for ${unitType} ${unitId} · iteration ${state?.iteration ?? "unknown"} of ${state?.maxIterations ?? "unknown"}.`,
"",
"Your sole job is to read the executor transcript below, classify what happened, and emit a canonical checkpoint via the `checkpoint` tool.",
"Do NOT edit files, run commands, or propose code changes. Observe and classify only.",
"",
"## Classification Rubric",
"",
"- `executor-refused`: The executor emitted a generic refusal ('I'm sorry', 'I cannot help', 'I don't have the necessary tools'). → checkpoint outcome=`blocked`, blockerReason=`executor-refused`.",
"- `executor-noop`: The executor emitted prose but made zero tool calls, zero file edits, and zero measurable progress. → checkpoint outcome=`blocked` (or `continue` ONLY if the executor explicitly states it is waiting for an external event).",
"- `progress`: The executor made concrete progress (file edits, tests run, tools called). → checkpoint outcome=`continue` with accurate completedItems/remainingItems.",
"- `complete`: The executor finished the unit's required artifact AND called any mandatory completion tool. → checkpoint outcome=`complete`.",
"- `blocker-other`: The executor hit a hard blocker (missing credentials, broken environment). → checkpoint outcome=`blocked` with a precise blockerReason.",
"",
"## Executor Transcript",
"",
"```",
transcriptText,
"```",
"",
];
if (refusal) {
lines.push(
`⚠️ Refusal pattern detected: ${refusal.pattern}.`,
"The executor refused the task. Emit outcome='blocked' with blockerReason='executor-refused'.",
"",
);
}
lines.push(
"Call `checkpoint` with all eight PDD fields and accurate completedItems / remainingItems.",
"Your final action MUST be the checkpoint tool call.",
);
return lines.join("\n");
}
/**
* Detect whether an executor transcript contains zero meaningful work.
*
* Purpose: no-op iterations (continue checkpoint with zero file/tool activity)
* must not satisfy the repair gate.
*
* Consumer: assessAutonomousSolverTurn to reject no-op continues.
*/
export function isNoOpExecutorTranscript(messages) {
if (!Array.isArray(messages) || messages.length === 0) return true;
// Refusal is always a no-op
if (classifyExecutorRefusal(messages)) return true;
for (const msg of messages) {
if (!msg || typeof msg !== "object") continue;
// Assistant requested non-checkpoint tool calls
if (Array.isArray(msg.tool_calls)) {
for (const tc of msg.tool_calls) {
const name = tc?.function?.name ?? tc?.name ?? "";
if (name && name !== "checkpoint") {
return false;
}
}
}
// Tool results from non-checkpoint tools
if (msg.role === "tool" || msg.role === "tool_result") {
const name = msg.name ?? "";
if (name && name !== "checkpoint") {
return false;
}
}
// Content that shows concrete work was done
const content = typeof msg.content === "string" ? msg.content : "";
if (
content.includes("File edited") ||
content.includes("File written") ||
content.includes("File created") ||
content.includes("```diff") ||
content.includes("--- a/") ||
content.includes("+++ b/")
) {
return false;
}
}
return true;
}
/**
* Record a solver checkpoint and update the markdown projection.
*
@ -541,6 +696,141 @@ export function recordAutonomousSolverMissingCheckpointRetry(
return nextState;
}
/**
* Detect that the executor model refused the task outright (rather than
* attempting and failing the protocol).
*
* Why: when a routed executor model (e.g. a code-completion model like
* Codestral) lacks the agentic capabilities required for the unit, it emits a
* generic refusal "I'm sorry, I currently don't have the necessary tools to
* assist with that specific request." The existing missing-checkpoint repair
* loop will dutifully re-prompt the same model until it emits a syntactically
* valid checkpoint with zero work, fabricating forward progress. Refusal must
* be caught earlier and surfaced as a `blocked` outcome so the loop pauses (or
* the executor model can be escalated on retry) rather than synthesizing a
* `continue` over no work.
*
* Returns null when no refusal pattern is detected.
*
* Consumer: runUnitPhase short-circuits the repair loop on a positive match.
*/
export function classifyExecutorRefusal(messages) {
const text = stringifyMessages(messages);
if (!text.trim()) return null;
const lower = text.toLowerCase();
const patterns = [
{
id: "apology-no-tools",
regex:
/i(?:'m| am)\s+sorry[^.]{0,80}(?:don't|do not|cannot|can't)\s+have\s+(?:the\s+)?(?:necessary\s+)?tools?/i,
},
{
id: "cannot-assist",
regex:
/i\s+(?:cannot|can't|am unable to|won't be able to)\s+(?:assist|help)\s+with\s+(?:that|this)/i,
},
{
id: "not-able-to-help",
regex:
/i\s+(?:am\s+not\s+able\s+to|do not have the ability to|don't have the ability to)\s+(?:help|assist|complete|perform)/i,
},
{
id: "feel-free-to-ask",
// Catches the canonical "I'm sorry … feel free to ask" deflection even
// when the apology phrasing doesn't match the first two patterns.
regex:
/(?:i(?:'m| am)\s+sorry|i\s+apologi[sz]e)[\s\S]{0,200}feel\s+free\s+to\s+ask/i,
},
{
id: "outside-capabilities",
regex:
/(?:that's|that is|this is)\s+(?:outside|beyond)\s+(?:my|the)\s+(?:capabilities|abilities|scope)/i,
},
];
for (const pattern of patterns) {
if (pattern.regex.test(lower) || pattern.regex.test(text)) {
return {
classification: "executor-refused",
pattern: pattern.id,
summary:
"The executor model refused the task rather than attempting it. This is a capability/routing problem, not a protocol problem — repairing the prompt will not produce progress.",
evidence: truncateEvidence(text),
};
}
}
return null;
}
/**
* Memoized lookup: is the `checkpoint` tool registered in the SF extension
* manifest? Used by `classifyAutonomousSolverMissingCheckpointFailure` to
* disambiguate "agent says tool is unavailable" from "agent mentioned a real
* tool but did not call it."
*
 * Why memoized: the previous implementation read the manifest from disk on
 * every classifier call. The CWD-sensitive path probing meant hidden I/O
 * inside what reads like a pure function, plus a coupling that made tests
 * awkward. The manifest does not change while the process is running, so a
 * single memoized read at first call is correct and fast.
*
* Callers that want test-time control (or are running in environments where
* the manifest can't be located, e.g. CI fixtures) pass an explicit
 * `checkpointToolRegistered` override to the classifier instead; no need to
* stub the filesystem.
*/
let _checkpointToolRegisteredCache = null;
function isCheckpointToolRegisteredFromManifest() {
if (_checkpointToolRegisteredCache !== null) {
return _checkpointToolRegisteredCache;
}
try {
const manifestPath = join(
process.cwd(),
"dist",
"resources",
"extensions",
"sf",
"extension-manifest.json",
);
const srcManifestPath = join(
process.cwd(),
"src",
"resources",
"extensions",
"sf",
"extension-manifest.json",
);
const manifestContent = existsSync(manifestPath)
? readFileSync(manifestPath, "utf-8")
: existsSync(srcManifestPath)
? readFileSync(srcManifestPath, "utf-8")
: null;
if (!manifestContent) {
_checkpointToolRegisteredCache = false;
return false;
}
const manifest = JSON.parse(manifestContent);
_checkpointToolRegisteredCache =
Array.isArray(manifest?.provides?.tools) &&
manifest.provides.tools.includes("checkpoint");
return _checkpointToolRegisteredCache;
} catch {
_checkpointToolRegisteredCache = false;
return false;
}
}
/**
* Test-only escape hatch to reset the manifest-lookup memoization. Tests that
* exercise the classifier under different "is checkpoint registered" assumptions
* should prefer the explicit `options.checkpointToolRegistered` override on
 * `classifyAutonomousSolverMissingCheckpointFailure`; this function exists
* only as a safety net for tests that need to clear a polluted module cache.
*/
export function _resetCheckpointToolRegisteredCacheForTests() {
_checkpointToolRegisteredCache = null;
}
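The memoize-plus-reset shape above can be illustrated on its own. The probe below is a hypothetical stand-in for the manifest read (the real code does `readFileSync` plus `JSON.parse`); all names here are illustrative.

```javascript
// Illustrative stand-in for the memoized manifest lookup. The expensive
// probe is faked with a call counter so the memo behavior is observable.
let cache = null;
let probeCalls = 0;

function probeManifest() {
  // Stand-in for the real on-disk manifest read.
  probeCalls += 1;
  return true;
}

function isRegistered() {
  if (cache === null) cache = probeManifest();
  return cache;
}

function resetForTests() {
  cache = null;
}

isRegistered();
isRegistered();
console.log(probeCalls); // 1: the second call is served from the memo

resetForTests();
isRegistered();
console.log(probeCalls); // 2: the reset forces exactly one fresh probe
```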
/**
* Classify why a solver turn omitted the checkpoint tool.
*
@@ -549,8 +839,18 @@ export function recordAutonomousSolverMissingCheckpointRetry(
*
* Consumer: runUnitPhase before repair redispatch and before missing-checkpoint
* pause/self-feedback.
*
* @param {Array} messages
* @param {object} [options]
* @param {boolean} [options.checkpointToolRegistered] - Explicit override for
* the "is the checkpoint tool registered in our manifest?" question. When
* omitted, falls back to the memoized manifest lookup. Tests should pass
* this explicitly so they don't depend on CWD or on-disk dist/ state.
*/
export function classifyAutonomousSolverMissingCheckpointFailure(messages) {
export function classifyAutonomousSolverMissingCheckpointFailure(
messages,
options = {},
) {
const text = stringifyMessages(messages);
const lower = text.toLowerCase();
if (!text.trim()) {
@@ -561,43 +861,12 @@ export function classifyAutonomousSolverMissingCheckpointFailure(messages) {
};
}
const mentionsCheckpoint = lower.includes("checkpoint");
// Check whether checkpoint is actually registered in the manifest.
// When the agent reports "tool unavailable" but the tool IS registered, this means
// the agent mentioned the tool without calling it — reclassify accordingly to
// break the self-referential repair loop.
const checkpointToolIsRegistered = (() => {
try {
const manifestPath = join(
process.cwd(),
"dist",
"resources",
"extensions",
"sf",
"extension-manifest.json",
);
const srcManifestPath = join(
process.cwd(),
"src",
"resources",
"extensions",
"sf",
"extension-manifest.json",
);
const manifestContent = existsSync(manifestPath)
? readFileSync(manifestPath, "utf-8")
: existsSync(srcManifestPath)
? readFileSync(srcManifestPath, "utf-8")
: null;
if (!manifestContent) return false;
const manifest = JSON.parse(manifestContent);
return (
Array.isArray(manifest?.provides?.tools) &&
manifest.provides.tools.includes("checkpoint")
);
} catch {
return false;
}
})();
// Resolve "is checkpoint registered" — explicit override wins, otherwise
// fall back to the memoized manifest lookup.
const checkpointToolIsRegistered =
typeof options.checkpointToolRegistered === "boolean"
? options.checkpointToolRegistered
: isCheckpointToolRegisteredFromManifest();
const mentionsToolUnavailable =
/(unknown|unavailable|not available|not found|no such) tool/.test(lower) ||
(lower.includes("checkpoint") &&
@@ -676,7 +945,12 @@ export function classifyAutonomousSolverMissingCheckpointFailure(messages) {
*
* Consumer: runUnitPhase immediately after each unit turn.
*/
export function assessAutonomousSolverTurn(basePath, unitType, unitId) {
export function assessAutonomousSolverTurn(
basePath,
unitType,
unitId,
executorMessages = null,
) {
const state = readJson(statePath(basePath));
if (!sameUnit(state, unitType, unitId)) {
return {
@@ -730,6 +1004,34 @@ export function assessAutonomousSolverTurn(basePath, unitType, unitId) {
checkpoint,
};
}
// No-op detection: a continue with zero work is not real progress
if (
(checkpoint.outcome === "continue" || checkpoint.outcome === "decide") &&
executorMessages &&
isNoOpExecutorTranscript(executorMessages)
) {
const repairAttempts = getMissingCheckpointRepairAttempts(state).filter(
(attempt) => Number(attempt.iteration) === Number(state.iteration),
).length;
if (repairAttempts >= DEFAULT_MISSING_CHECKPOINT_REPAIR_ATTEMPTS) {
return {
action: "pause",
reason: "solver-noop-continue",
state,
repairAttempts,
maxRepairAttempts: DEFAULT_MISSING_CHECKPOINT_REPAIR_ATTEMPTS,
checkpoint,
};
}
return {
action: "missing-checkpoint-retry",
reason: "solver-noop-continue",
state,
repairAttempt: repairAttempts + 1,
maxRepairAttempts: DEFAULT_MISSING_CHECKPOINT_REPAIR_ATTEMPTS,
checkpoint,
};
}
// "decide" is treated as "continue": agent reconstructs best-effort and moves on
return {
action:

View file

@@ -0,0 +1,119 @@
/**
 * solver-model.js: pinned model selection for the autonomous solver role.
*
* Why this exists:
* The "executor" and "autonomous solver" roles were historically conflated
* into a single LLM call selected by the unit-type router. When the router
* picked a coding-tuned or capability-limited model for the executor (e.g.
* `mistral/codestral-latest`, `google-gemini-cli/gemini-3-flash-preview`),
* the same model was expected to (a) do the unit work and (b) emit the
* canonical protocol checkpoint. Models that refuse agentic tasks or fail
* to follow tool-use contracts broke the protocol layer entirely and the
* missing-checkpoint repair loop could only re-prompt the same broken
* model, synthesizing fake `continue` outcomes over zero progress.
*
* The solver role MUST stay on a stable, agentic, refusal-resistant model
* independent of any per-unit routing choices. This module is the single
* place that decision is made.
*
* Contract:
* - Default solver model is `kimi-k2.6` (provider: `kimi-coding`).
* - Preference override is accepted ONLY when the operator has explicitly
* opted into it via `preferences.autonomousSolver.model`. Router output,
* benchmark scoring, learning blender, and unit-type routing are NEVER
* consulted here.
* - A fallback chain is provided so a brief outage of the primary does not
* take the protocol layer with it.
*
* Consumers (forthcoming): the solver-pass invocation in auto/phases-unit.js
* once the two-layer loop lands (see ADR-0079).
*/
/**
* Default model for the autonomous solver role. Locked. Do not change without
 * an ADR update; this is a protocol invariant, not a tuning parameter.
*/
export const SOLVER_MODEL_DEFAULT = {
provider: "kimi-coding",
id: "kimi-k2.6",
};
/**
* Explicit fallback chain when the default is unreachable. Ordered by
* preference. Each entry must be a stable agentic model that follows tool-use
* contracts; nothing on this list is a code-completion-only model.
*/
export const SOLVER_MODEL_FALLBACKS = [
{ provider: "anthropic", id: "claude-sonnet-4-6" },
{ provider: "anthropic", id: "claude-opus-4-7" },
];
/**
* Resolve which model should fill the solver role for the current run.
*
* @param {object} [preferences] - Operator preferences object. Only consulted
* for `preferences.autonomousSolver.model`. Anything else is ignored.
* @returns {{ provider: string, id: string }} the selected solver model
*/
export function resolveSolverModel(preferences) {
const override = preferences?.autonomousSolver?.model;
if (override && typeof override === "object" && override.id) {
return {
provider: String(override.provider ?? SOLVER_MODEL_DEFAULT.provider),
id: String(override.id),
};
}
if (typeof override === "string" && override.trim()) {
// Allow "provider/model" short form for ergonomics; default provider when
// only a model id is supplied.
const trimmed = override.trim();
const slash = trimmed.indexOf("/");
if (slash > 0) {
return {
provider: trimmed.slice(0, slash),
id: trimmed.slice(slash + 1),
};
}
return { provider: SOLVER_MODEL_DEFAULT.provider, id: trimmed };
}
return { ...SOLVER_MODEL_DEFAULT };
}
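The three override forms the contract accepts (object, `provider/model` short form, bare model id) can be exercised directly. The constant and function below are restated from the diff so the sketch runs standalone; the model ids passed in the calls are illustrative inputs only.

```javascript
// Restated from the module above so this usage sketch is self-contained.
const SOLVER_MODEL_DEFAULT = { provider: "kimi-coding", id: "kimi-k2.6" };

function resolveSolverModel(preferences) {
  const override = preferences?.autonomousSolver?.model;
  if (override && typeof override === "object" && override.id) {
    return {
      provider: String(override.provider ?? SOLVER_MODEL_DEFAULT.provider),
      id: String(override.id),
    };
  }
  if (typeof override === "string" && override.trim()) {
    const trimmed = override.trim();
    const slash = trimmed.indexOf("/");
    if (slash > 0) {
      return { provider: trimmed.slice(0, slash), id: trimmed.slice(slash + 1) };
    }
    return { provider: SOLVER_MODEL_DEFAULT.provider, id: trimmed };
  }
  return { ...SOLVER_MODEL_DEFAULT };
}

// No preferences: the locked default wins.
console.log(resolveSolverModel().id); // "kimi-k2.6"
// "provider/model" short form splits on the first slash.
console.log(
  resolveSolverModel({ autonomousSolver: { model: "anthropic/claude-sonnet-4-6" } }),
);
// A bare model id keeps the default provider.
console.log(
  resolveSolverModel({ autonomousSolver: { model: "some-model-id" } }).provider,
); // "kimi-coding"
```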
/**
* Resolve the ordered candidate list for the solver role: primary first, then
* the fallback chain. Callers iterate until they find a reachable provider.
*
* @param {object} [preferences]
* @returns {Array<{ provider: string, id: string }>}
*/
export function resolveSolverModelCandidates(preferences) {
const primary = resolveSolverModel(preferences);
const candidates = [primary];
for (const fallback of SOLVER_MODEL_FALLBACKS) {
if (
fallback.provider === primary.provider &&
fallback.id === primary.id
) {
continue;
}
candidates.push({ ...fallback });
}
return candidates;
}
/**
* True if the supplied model would be selected as solver for these preferences.
* Useful for invariants and tests.
*
* @param {{ provider?: string, id?: string }} model
* @param {object} [preferences]
* @returns {boolean}
*/
export function isSolverModel(model, preferences) {
if (!model?.id) return false;
const solver = resolveSolverModel(preferences);
return (
String(model.provider ?? solver.provider) === solver.provider &&
String(model.id) === solver.id
);
}

View file

@@ -7,13 +7,17 @@ import {
appendAutonomousSolverSteering,
assessAutonomousSolverTurn,
beginAutonomousSolverIteration,
buildAutonomousExecutorPromptBlock,
buildAutonomousSolverMissingCheckpointRepairPrompt,
buildAutonomousSolverPromptBlock,
buildSolverPassPrompt,
classifyAutonomousSolverMissingCheckpointFailure,
classifyExecutorRefusal,
consumePendingAutonomousSolverSteering,
detectSolverLoop,
getConfiguredAutonomousSolverMaxIterations,
getSolverPhase,
isNoOpExecutorTranscript,
readAutonomousSolverState,
readLatestAutonomousSolverCheckpoint,
recordAutonomousSolverMissingCheckpointRetry,
@@ -565,3 +569,368 @@ describe("autonomous solver", () => {
expect(prompt).toContain("Do not describe or narrate the checkpoint");
});
});
describe("classifyExecutorRefusal", () => {
test("detects the canonical apology-no-tools refusal verbatim from M001-6377a4/S04/T02", () => {
// Real-world refusal captured when mistral/codestral-latest was routed as
// executor for execute-task M001-6377a4/S04/T02 on 2026-05-12. The
// classifier must catch this exact phrasing or the repair loop will
// re-prompt the same broken model and synthesize fake progress.
const refusal = classifyExecutorRefusal([
{
role: "assistant",
content:
"I'm sorry, but I currently don't have the necessary tools to assist with that specific request. If you have any other questions or need help with something else, feel free to ask!",
},
]);
expect(refusal).not.toBeNull();
expect(refusal.classification).toBe("executor-refused");
// The "apology-no-tools" pattern is the most specific; "feel-free-to-ask"
// is a fallback that may match the same string. Either is acceptable as
// long as the result is a refusal.
expect(["apology-no-tools", "feel-free-to-ask"]).toContain(refusal.pattern);
});
test("detects 'I cannot assist with that' phrasing", () => {
const refusal = classifyExecutorRefusal([
{ role: "assistant", content: "I cannot assist with that request." },
]);
expect(refusal).not.toBeNull();
expect(refusal.pattern).toBe("cannot-assist");
});
test("detects 'outside my capabilities' phrasing", () => {
const refusal = classifyExecutorRefusal([
{
role: "assistant",
content:
"That's outside my capabilities — I am unable to perform file edits.",
},
]);
expect(refusal).not.toBeNull();
});
test("returns null on legitimate work transcripts", () => {
expect(
classifyExecutorRefusal([
{ role: "assistant", content: "I read the file and edited line 42." },
]),
).toBeNull();
expect(
classifyExecutorRefusal([
{
role: "assistant",
content:
"Checkpoint recorded: outcome=continue, completed steps 1-3.",
},
]),
).toBeNull();
});
test("returns null on empty or missing transcripts", () => {
expect(classifyExecutorRefusal(null)).toBeNull();
expect(classifyExecutorRefusal([])).toBeNull();
expect(
classifyExecutorRefusal([{ role: "assistant", content: "" }]),
).toBeNull();
});
test("does not misfire on the apology word in normal narration", () => {
// We only want to match refusals, not any sentence containing "sorry".
// A model saying "Sorry for the long output below — here's the full
// diff" should not be classified as a refusal.
const refusal = classifyExecutorRefusal([
{
role: "assistant",
content:
"Sorry for the long output below — here is the full diff of the change. I am about to run the tests now.",
},
]);
expect(refusal).toBeNull();
});
test("evidence is truncated for storage", () => {
const refusal = classifyExecutorRefusal([
{
role: "assistant",
content:
"I'm sorry, I currently don't have the necessary tools. " +
"x".repeat(8000),
},
]);
expect(refusal).not.toBeNull();
expect(refusal.evidence.length).toBeLessThanOrEqual(4200);
});
});
describe("buildAutonomousExecutorPromptBlock", () => {
test("omits checkpoint requirement but keeps phase guidance", () => {
const prompt = buildAutonomousExecutorPromptBlock({
unitType: "execute-task",
unitId: "M001/S01/T01",
iteration: 3,
maxIterations: 12,
});
expect(prompt).toContain("Autonomous Executor Contract");
expect(prompt).toContain("/autonomous iteration 3 of 12");
expect(prompt).toContain("EXECUTE PHASE");
expect(prompt).not.toContain("CHECKPOINT REQUIREMENT");
expect(prompt).not.toContain(
"Hard requirement: before ending the turn, call the actual `checkpoint` tool",
);
expect(prompt).toContain("You do NOT need to call the `checkpoint` tool");
expect(prompt).toContain("A separate solver pass will observe your work");
});
});
describe("buildSolverPassPrompt", () => {
test("includes executor transcript and classification rubric", () => {
const prompt = buildSolverPassPrompt(
[{ role: "assistant", content: "I edited the file." }],
{ iteration: 2, maxIterations: 10 },
"execute-task",
"M001/S01/T01",
);
expect(prompt).toContain("Autonomous Solver Pass");
expect(prompt).toContain("protocol solver for execute-task M001/S01/T01");
expect(prompt).toContain("Classification Rubric");
expect(prompt).toContain("executor-refused");
expect(prompt).toContain("executor-noop");
expect(prompt).toContain("progress");
expect(prompt).toContain("I edited the file.");
expect(prompt).toContain(
"Your final action MUST be the checkpoint tool call",
);
});
test("injects refusal warning when refusal is detected", () => {
const prompt = buildSolverPassPrompt(
[
{
role: "assistant",
content:
"I'm sorry, but I currently don't have the necessary tools to assist with that specific request.",
},
],
{ iteration: 1, maxIterations: 10 },
"execute-task",
"M001/S01/T01",
);
expect(prompt).toContain("Refusal pattern detected");
expect(prompt).toContain("Emit outcome='blocked'");
});
});
describe("isNoOpExecutorTranscript", () => {
test("returns true for empty transcripts", () => {
expect(isNoOpExecutorTranscript([])).toBe(true);
expect(isNoOpExecutorTranscript(null)).toBe(true);
expect(isNoOpExecutorTranscript(undefined)).toBe(true);
});
test("returns true for refusal transcripts", () => {
expect(
isNoOpExecutorTranscript([
{
role: "assistant",
content:
"I'm sorry, but I currently don't have the necessary tools to assist with that specific request.",
},
]),
).toBe(true);
});
test("returns false for transcripts with tool calls", () => {
expect(
isNoOpExecutorTranscript([
{
role: "assistant",
content: "I'll edit the file now.",
tool_calls: [
{
id: "tc_1",
function: { name: "edit", arguments: "{}" },
},
],
},
]),
).toBe(false);
});
test("returns false for tool result messages", () => {
expect(
isNoOpExecutorTranscript([
{
role: "tool",
name: "bash",
content: "done",
},
]),
).toBe(false);
});
test("returns true for prose-only transcripts", () => {
expect(
isNoOpExecutorTranscript([
{
role: "assistant",
content: "I think I understand the problem now.",
},
]),
).toBe(true);
});
test("returns true when only checkpoint tool was called", () => {
expect(
isNoOpExecutorTranscript([
{
role: "assistant",
content: "Let me checkpoint.",
tool_calls: [
{
id: "tc_1",
function: { name: "checkpoint", arguments: "{}" },
},
],
},
]),
).toBe(true);
});
});
describe("assessAutonomousSolverTurn no-op detection", () => {
test("continue_with_no_op_executor_messages_returns_missing_checkpoint_retry", () => {
const project = makeProject();
beginAutonomousSolverIteration(project, "execute-task", "M001/S01/T01");
appendAutonomousSolverCheckpoint(project, {
unitType: "execute-task",
unitId: "M001/S01/T01",
outcome: "continue",
summary: "More work remains.",
completedItems: ["First pass"],
remainingItems: ["Second pass"],
verificationEvidence: ["npx vitest run focused.test.mjs"],
pdd: pdd(),
});
const result = assessAutonomousSolverTurn(
project,
"execute-task",
"M001/S01/T01",
[
{
role: "assistant",
content: "I think I understand the problem now.",
},
],
);
expect(result.action).toBe("missing-checkpoint-retry");
expect(result.reason).toBe("solver-noop-continue");
});
test("continue_with_real_work_executor_messages_returns_continue", () => {
const project = makeProject();
beginAutonomousSolverIteration(project, "execute-task", "M001/S01/T01");
appendAutonomousSolverCheckpoint(project, {
unitType: "execute-task",
unitId: "M001/S01/T01",
outcome: "continue",
summary: "More work remains.",
completedItems: ["First pass"],
remainingItems: ["Second pass"],
verificationEvidence: ["npx vitest run focused.test.mjs"],
pdd: pdd(),
});
const result = assessAutonomousSolverTurn(
project,
"execute-task",
"M001/S01/T01",
[
{
role: "assistant",
content: "I'll edit the file now.",
tool_calls: [
{
id: "tc_1",
function: { name: "edit", arguments: "{}" },
},
],
},
],
);
expect(result.action).toBe("continue");
expect(result.reason).toBe("solver-continue");
});
test("no_op_continue_after_max_repairs_returns_pause", () => {
const project = makeProject();
beginAutonomousSolverIteration(project, "execute-task", "M001/S01/T01");
appendAutonomousSolverCheckpoint(project, {
unitType: "execute-task",
unitId: "M001/S01/T01",
outcome: "continue",
summary: "More work remains.",
completedItems: ["First pass"],
remainingItems: ["Second pass"],
verificationEvidence: ["npx vitest run focused.test.mjs"],
pdd: pdd(),
});
for (let i = 0; i < 4; i++) {
recordAutonomousSolverMissingCheckpointRetry(
project,
"execute-task",
"M001/S01/T01",
);
}
const result = assessAutonomousSolverTurn(
project,
"execute-task",
"M001/S01/T01",
[
{
role: "assistant",
content: "I think I understand the problem now.",
},
],
);
expect(result.action).toBe("pause");
expect(result.reason).toBe("solver-noop-continue");
expect(result.repairAttempts).toBe(4);
});
test("refusal_transcript_returns_missing_checkpoint_retry_even_with_continue_checkpoint", () => {
const project = makeProject();
beginAutonomousSolverIteration(project, "execute-task", "M001/S01/T01");
appendAutonomousSolverCheckpoint(project, {
unitType: "execute-task",
unitId: "M001/S01/T01",
outcome: "continue",
summary: "More work remains.",
completedItems: ["First pass"],
remainingItems: ["Second pass"],
verificationEvidence: ["npx vitest run focused.test.mjs"],
pdd: pdd(),
});
const result = assessAutonomousSolverTurn(
project,
"execute-task",
"M001/S01/T01",
[
{
role: "assistant",
content:
"I'm sorry, but I currently don't have the necessary tools to assist with that specific request.",
},
],
);
expect(result.action).toBe("missing-checkpoint-retry");
expect(result.reason).toBe("solver-noop-continue");
});
});