fix(auto): split autonomous solver from executor per ADR-0079

- Lock solver model to kimi-k2.6 independent of unit-type router
- Executor prompt no longer requires checkpoint tool call
- Add dedicated solver pass that reads executor transcript and emits canonical checkpoint
- Classify executor refusals as blocker outcomes (already partially implemented)
- Classify no-op iterations (continue with zero work) as missing-checkpoint-retry
- Add tests for executor prompt block, solver pass prompt, no-op detection, and no-op assessment

Fixes sf-mp34nxb6-27zdx7
parent e2f2cb7e2e
commit 55229f6604
5 changed files with 1250 additions and 55 deletions
159
docs/adr/0079-autonomous-solver-executor-separation.md
Normal file
@@ -0,0 +1,159 @@
# ADR-0079: Autonomous Solver / Executor Separation

**Status:** Proposed
**Date:** 2026-05-12
**Stakeholders:** Autonomous mode, model router, checkpoint protocol, runtime safety
**Related:** `.sf/self-feedback.jsonl` entry `sf-mp34nxb6-27zdx7` (architecture-defect:solver-executor-conflation)

---

## Problem Statement

Today the autonomous loop conflates two distinct roles into a single LLM call:

1. **Executor**: does the unit work (read files, run tests, edit code).
2. **Autonomous solver**: observes what the executor produced and emits a canonical checkpoint to disk (`outcome`, `completedItems`, `remainingItems`, PDD, verification evidence).

Both roles are filled by the same model, picked by `model-router.js:computeTaskRequirements` from the unit type (`execute-task`, `plan-slice`, …). The router optimizes for the *executor's* job (cost, coding capability, speed) and may select a small coding-tuned model (Codestral, Devstral, Gemini Flash). Those models are *not* required to be agentic, refusal-resistant, or stable at protocol reasoning.

When the chosen model is incapable of the agentic role, the protocol breaks in a way the repair loop cannot fix:

- **2026-05-12 M001-6377a4/S04/T02:** `mistral/codestral-latest` was routed to execute T02 (Align TUI Dashboard with Headless Status Output). It emitted:
  > "I'm sorry, but I currently don't have the necessary tools to assist with that specific request."

  No tool was called. The runtime logged `Autonomous solver checkpoint missing … repair attempt 1/4 (mentioned-checkpoint-without-tool)`, then prompted the *same* Codestral with stronger "you MUST call the checkpoint tool" wording. Codestral dutifully called `Autonomous Checkpoint` with `outcome=continue` and produced zero file edits, zero work. The protocol layer reported success; the slice made no progress.

The repair logic at `auto/phases-unit.js:720-890` only enforces **protocol shape** ("did the LLM emit a checkpoint tool call?"). It does not check **outcome** ("did the unit progress?") or **refusal** ("did the executor refuse the task?"). And because executor and solver are the same call, retrying the repair just re-asks the broken model.

## Goals

1. The protocol layer must remain functional even when the executor refuses or is incapable.
2. Refusals must surface as blockers that can escalate model tier, not silently synthesize forward progress.
3. No-op iterations (continue with zero work) must not satisfy the repair gate.
4. Solver model choice must be stable and independent of unit-type routing.

## Non-Goals

- Replacing the model router for executors. Routing per `unitType` remains; cheap/specialized models are still desirable for unit work.
- Mandating a specific solver vendor. The locked solver model is a pinned default; ops may override via preferences.
- Reworking the checkpoint schema. The same JSON shape persists; only *who emits it* changes.

## Proposed Architecture

### Two-Layer Loop

```
                   ┌────────────────────────────────────────┐
                   │ runUnit(ctx, unitType, unitId, prompt) │
                   └───────────────────┬────────────────────┘
                                       │
              ┌────────────────────────┴────────────────────────┐
              │                                                 │
              ▼                                                 ▼
┌────────────────────────────┐                   ┌────────────────────────────┐
│ EXECUTOR PASS              │                   │ SOLVER PASS                │
│ model: routed per unit     │   transcript →    │ model: LOCKED kimi-k2.6    │
│ (Codestral, Gemini, ...)   │ ────────────────▶ │ reads agent_end messages,  │
│ does the unit work         │                   │ emits canonical checkpoint │
│ NO checkpoint tool needed  │                   │ classifies refusal/no-op   │
└────────────────────────────┘                   └──────────────┬─────────────┘
                                                                │
                                                                ▼
                                                 ┌────────────────────────────┐
                                                 │ appendAutonomousSolver-    │
                                                 │ Checkpoint(basePath, …)    │
                                                 └────────────────────────────┘
```

### Solver Model Selection

A new helper `resolveSolverModel(preferences)` returns the pinned solver model. It:

- Defaults to `kimi-k2.6` (provider: `kimi-coding`).
- Allows preference override via `preferences.autonomousSolver.model` (operator escape hatch).
- **Never** consults the unit-type router, benchmark selector, Bayesian blender, or learning aggregator. The solver's model is a runtime invariant, not an optimization target.
- Falls back along a small explicit chain (`kimi-k2.6` → `claude-sonnet-4-6` → `claude-opus-4-7`) if the primary is unreachable. If none is reachable, it falls back to "synthesize blocker" rather than silently dropping the protocol layer.
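
The selection rules above can be sketched as follows. This is a minimal illustration, not the contents of `solver-model.js` (which this diff does not show). The function names, the `"provider/id"` override format, the `anthropic` provider label for the two Claude fallbacks, and the caller-supplied `isReachable` predicate are all assumptions made for the example.

```javascript
// Sketch only: names and shapes below are assumptions, not the real solver-model.js.
const DEFAULT_SOLVER_CHAIN = [
  { provider: "kimi-coding", id: "kimi-k2.6" },
  { provider: "anthropic", id: "claude-sonnet-4-6" }, // provider label is assumed
  { provider: "anthropic", id: "claude-opus-4-7" }, // provider label is assumed
];

// Parse an assumed "provider/id" override string into a candidate object.
function parseOverride(value) {
  if (typeof value !== "string") return null;
  const slash = value.indexOf("/");
  if (slash <= 0 || slash === value.length - 1) return null;
  return { provider: value.slice(0, slash), id: value.slice(slash + 1) };
}

// Ordered candidate list: operator override first, then the fixed chain.
// The function takes no unit type, so unit-type routing cannot influence it.
function solverCandidates(preferences) {
  const override = parseOverride(preferences?.autonomousSolver?.model);
  const chain = override
    ? [override, ...DEFAULT_SOLVER_CHAIN]
    : DEFAULT_SOLVER_CHAIN;
  const seen = new Set();
  return chain.filter((candidate) => {
    const key = `${candidate.provider}/${candidate.id}`;
    if (seen.has(key)) return false; // drop duplicates, keep first position
    seen.add(key);
    return true;
  });
}

// First reachable candidate, or null so the caller can synthesize a blocker.
function pickSolverModel(preferences, isReachable) {
  return solverCandidates(preferences).find(isReachable) ?? null;
}
```

Returning `null` instead of throwing keeps the "synthesize blocker" decision with the caller. Whether an operator override should still fall back to the default chain is a design choice this sketch makes on its own; the ADR does not specify it.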

### Solver Pass Contract

Input: `{ unitType, unitId, executorTranscript, lastIteration, projection }`.

Output (a checkpoint, written via `appendAutonomousSolverCheckpoint`):

```json
{
  "outcome": "continue|complete|blocker",
  "summary": "...",
  "completedItems": [...],
  "remainingItems": [...],
  "verificationEvidence": [...],
  "pdd": { "purpose": "...", "consumer": "...", ... },
  "classification": "executor-refused|executor-noop|progress|complete|blocker-...",
  "evidence": "string excerpts proving the classification"
}
```
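
A shape check on the solver's output is one way to keep a malformed checkpoint from reaching disk. The validator below is a hypothetical sketch, not part of the commit. It accepts both `blocker` (the spelling in the schema above) and `blocked` (the string the runtime call sites in this commit pass to `appendAutonomousSolverCheckpoint`). The rule that a `complete` outcome may not list remaining items is an assumption added for illustration.

```javascript
// Sketch only: a hypothetical validator, not part of the commit.
const OUTCOMES = new Set(["continue", "complete", "blocker", "blocked"]);
const ARRAY_FIELDS = ["completedItems", "remainingItems", "verificationEvidence"];

// Returns a list of human-readable problems; an empty list means the shape is acceptable.
function validateCheckpoint(cp) {
  if (!cp || typeof cp !== "object") return ["checkpoint is not an object"];
  const problems = [];
  if (!OUTCOMES.has(cp.outcome)) {
    problems.push(`unknown outcome: ${String(cp.outcome)}`);
  }
  if (typeof cp.summary !== "string" || cp.summary.trim() === "") {
    problems.push("summary must be a non-empty string");
  }
  for (const field of ARRAY_FIELDS) {
    if (!Array.isArray(cp[field])) problems.push(`${field} must be an array`);
  }
  if (!cp.pdd || typeof cp.pdd !== "object") {
    problems.push("pdd must be an object");
  }
  // Assumed rule: a "complete" outcome that still lists remaining work contradicts itself.
  if (
    cp.outcome === "complete" &&
    Array.isArray(cp.remainingItems) &&
    cp.remainingItems.length > 0
  ) {
    problems.push("complete outcome with remaining items");
  }
  return problems;
}
```

Returning a list rather than throwing lets the caller put every problem into a single blocker summary.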

The solver's prompt is a deterministic template at `prompts/autonomous-solver.md` that:

1. Embeds the executor transcript.
2. States the schema and outcome rules.
3. Includes the refusal/no-op classification rubric.
4. Instructs the solver to **never** propose code edits; its job is to observe, classify, and write the checkpoint.

### Refusal Classification

`assessAutonomousSolverTurn` (and the new solver-pass) checks the executor transcript for:

| Pattern | Classification | Action |
|---|---|---|
| "I'm sorry", "I cannot help", "I don't have the necessary tools", "I can't assist with that" | `executor-refused` | Emit `outcome=blocker`; on retry, escalate executor model tier |
| Zero tool calls, zero file edits, transcript < threshold | `executor-noop` | Emit `outcome=blocker` (or `continue` only if executor explicitly states a wait state); on retry, do not treat synthesized continue as progress |
| Tool calls + edits + explicit "I'm done" / completion signal | `progress` or `complete` | Emit `outcome=continue` or `complete` as appropriate |
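
The rubric in the table can be approximated with a small deterministic classifier. The sketch below is illustrative and is not the commit's `classifyExecutorRefusal` or `isNoOpExecutorTranscript`. The function names and the message shape (`role`, `content`, `tool_calls`) are assumptions. The transcript-length threshold is left out because the table does not give a value. Requiring that a refusal be accompanied by zero non-checkpoint tool calls is a choice made here to limit false positives on phrases like "I'm sorry".

```javascript
// Sketch only: hypothetical names; message shape ({ role, content, tool_calls }) is assumed.
const REFUSAL_PATTERNS = [
  /\bI['’]?m sorry\b/i,
  /\bI cannot help\b/i,
  /\bI don['’]?t have the necessary tools\b/i,
  /\bI can['’]?t assist with that\b/i,
];

// Collect the names of every tool call requested in the transcript.
function toolCallNames(messages) {
  const names = [];
  for (const msg of messages) {
    const calls = Array.isArray(msg?.tool_calls) ? msg.tool_calls : [];
    for (const tc of calls) {
      const name = tc?.function?.name ?? tc?.name ?? "";
      if (name) names.push(name);
    }
  }
  return names;
}

// Map a transcript onto one of the table's classifications.
function classifyTranscript(messages) {
  const list = Array.isArray(messages) ? messages : [];
  const assistantText = list
    .filter((m) => m?.role === "assistant" && typeof m.content === "string")
    .map((m) => m.content)
    .join("\n");
  // Checkpoint calls are protocol, not work, so they do not count as progress.
  const workCalls = toolCallNames(list).filter((n) => n !== "checkpoint");

  const refusal = REFUSAL_PATTERNS.find((re) => re.test(assistantText));
  if (refusal && workCalls.length === 0) {
    return { classification: "executor-refused", evidence: refusal.source };
  }
  if (workCalls.length === 0) {
    return { classification: "executor-noop", evidence: "no non-checkpoint tool calls" };
  }
  return { classification: "progress", evidence: workCalls.join(", ") };
}
```

Applied to the incident in the problem statement, the first Codestral turn (refusal text, no tool calls) would classify as `executor-refused`, and the repaired turn (a lone checkpoint call) as `executor-noop`.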

### Model Escalation on Refusal

When the solver classifies `executor-refused`, the loop records the executor's model and unit type in a "no-fly" entry. On the next iteration of the same unit, the router consults this list and selects the next tier up (Sonnet → Opus, or via a model-tier graph). After 2 escalations on the same unit, the loop pauses with a hard blocker.
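
One possible shape for the no-fly list and the two-escalation cap is sketched below. Everything here is hypothetical: the tier names, the in-memory `Map`, and keying entries by unit id are assumptions chosen for a compact example, and the real router integration is not shown in this diff.

```javascript
// Sketch only: tier names and the in-memory store are assumptions.
const TIER_ORDER = ["light", "standard", "heavy"]; // hypothetical tier ladder
const MAX_ESCALATIONS = 2; // from the ADR: pause after 2 escalations on one unit

class NoFlyList {
  constructor() {
    this.entries = new Map(); // unitId -> { models: Set<string>, refusals: number }
  }

  // Record a refusal by `model` on `unitId` and return the updated entry.
  recordRefusal(unitId, model) {
    const entry = this.entries.get(unitId) ?? { models: new Set(), refusals: 0 };
    entry.models.add(model);
    entry.refusals += 1;
    this.entries.set(unitId, entry);
    return entry;
  }

  // Decide what the router should do on the next dispatch of `unitId`.
  nextAction(unitId, currentTier) {
    const entry = this.entries.get(unitId);
    if (!entry) return { action: "dispatch", tier: currentTier };
    // A refusal after both allowed escalations means a hard blocker.
    if (entry.refusals > MAX_ESCALATIONS) return { action: "pause" };
    const idx = TIER_ORDER.indexOf(currentTier);
    const next = TIER_ORDER[idx + 1];
    // Unknown tier or already at the top: nothing left to escalate to.
    if (idx === -1 || next === undefined) return { action: "pause" };
    return { action: "dispatch", tier: next, avoid: [...entry.models] };
  }
}
```

A persistent store would be needed for the list to survive a restart; the `Map` only keeps the example self-contained.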

### Backward Compatibility

- The existing checkpoint shape is preserved; downstream consumers (`auto-post-unit.js`, journal events, learning aggregator) are unchanged.
- The "executor calls the checkpoint tool" path is retained as a **fast path**: if the executor *did* emit a valid checkpoint AND the solver agrees with its classification, the solver pass is a no-op rubber stamp. The solver only synthesizes when the executor failed to checkpoint or classified incorrectly.
- The `mentioned-checkpoint-without-tool` repair attempts collapse to zero: the solver is now the source of truth, so a missing executor checkpoint is normal, not a defect.
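
The fast-path rule can be reduced to a lookup of which outcomes are consistent with each classification. The helper below is a hypothetical sketch. The outcome strings follow the runtime code in this commit (`blocked`, `continue`, `complete`), and the mapping simplifies the rubric: it does not model the case where a no-op legitimately continues because the executor is waiting on an external event.

```javascript
// Sketch only: hypothetical helper; outcome strings follow the runtime code in this commit.
const EXPECTED_OUTCOMES = new Map([
  ["executor-refused", ["blocked"]],
  ["executor-noop", ["blocked"]],
  ["progress", ["continue"]],
  ["complete", ["complete"]],
]);

// Decide whether the solver can accept the executor's own checkpoint as-is.
function decideSolverAction(executorCheckpoint, classification) {
  // A silent executor is the normal case after this change, not a defect.
  if (!executorCheckpoint) return "synthesize";
  const allowed = EXPECTED_OUTCOMES.get(classification) ?? [];
  return allowed.includes(executorCheckpoint.outcome) ? "accept" : "synthesize";
}
```

Under this rule, a checkpoint claiming `continue` on a transcript classified as `executor-noop` is overridden rather than accepted, which is the failure mode the ADR sets out to close.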

## Migration

### Step 1: Pin solver model

Add `resolveSolverModel` to `model-router.js` (or a new `solver-model.js`). It does not participate in the router's capability scoring. Wire it into `runUnit`'s solver-pass invocation only.

### Step 2: Add solver pass

After `runUnit` returns, before `assessAutonomousSolverTurn`, run the solver pass with the executor transcript. The solver pass writes the checkpoint directly. Executor checkpoint tool calls remain accepted but become advisory.

### Step 3: Refusal classifier

Extend `classifyAutonomousSolverMissingCheckpointFailure` (rename to `classifyExecutorTurn`) to detect refusal patterns. Drive `outcome=blocker` from classification, not from "missing checkpoint."

### Step 4: Model escalation

Add a per-(unitId, model) no-fly entry on `executor-refused`. Router consults the list during selection.

### Step 5: Tests

Cover: pinned solver model invariant, refusal pattern detection, no-op detection, solver-pass checkpoint emission when executor is silent, fast-path bypass when executor emits a valid checkpoint, escalation chain.

## Risks

- **Solver-pass cost.** Adds one LLM call per unit. Mitigation: solver pass uses a smaller prompt (transcript summary only) and is skippable when executor emitted a valid checkpoint.
- **Locked model availability.** If `kimi-k2.6` is unreachable, solver pass fails. Mitigation: explicit fallback chain; if all candidates fail, emit a blocker and pause the loop rather than synthesize progress.
- **Solver hallucination.** Solver could mis-classify and over-emit blockers. Mitigation: deterministic prompt template, classification rubric with example transcripts, and self-feedback when classification flips between iterations.

## Open Questions

1. Should the solver pass run *during* the executor turn (streaming observer) or *after* (post-turn observer)? Post-turn is simpler and proposed here; streaming would catch refusals earlier but adds complexity.
2. Should the solver pass also re-evaluate the executor's verification evidence (cite tests that actually exist, etc.), i.e. become a partial verifier, or stay narrowly focused on checkpoint emission?
3. How does this interact with `keepSession: true` in `runUnit`? The solver pass is a separate session by definition; the executor session remains as-is.

## Decision Outcome (when accepted)

To be filled when the ADR is accepted. Initial cut targets steps 1–3 (pinned solver model + solver pass + refusal classifier). Steps 4–5 (escalation + tests) follow in a subsequent slice.

@ -26,17 +26,23 @@ import {
|
|||
appendAutonomousSolverCheckpoint,
|
||||
assessAutonomousSolverTurn,
|
||||
beginAutonomousSolverIteration,
|
||||
buildAutonomousExecutorPromptBlock,
|
||||
buildAutonomousSolverMissingCheckpointRepairPrompt,
|
||||
buildAutonomousSolverPromptBlock,
|
||||
buildAutonomousSolverSteeringPromptBlock,
|
||||
buildSolverPassPrompt,
|
||||
classifyAutonomousSolverMissingCheckpointFailure,
|
||||
classifyExecutorRefusal,
|
||||
consumePendingAutonomousSolverSteering,
|
||||
getConfiguredAutonomousSolverMaxIterations,
|
||||
isNoOpExecutorTranscript,
|
||||
readAutonomousSolverState,
|
||||
recordAutonomousSolverMissingCheckpointRetry,
|
||||
} from "../autonomous-solver.js";
|
||||
import { resumeAutoAfterProviderDelay } from "../bootstrap/provider-error-resume.js";
|
||||
import { debugLog } from "../debug-logger.js";
|
||||
import { PROJECT_FILES } from "../detection.js";
|
||||
import { getErrorMessage } from "../error-utils.js";
|
||||
import { MergeConflictError } from "../git-service.js";
|
||||
import { recordLearnedOutcome } from "../learning/runtime.js";
|
||||
import { sfRoot } from "../paths.js";
|
||||
|
|
@ -73,6 +79,14 @@ import {
|
|||
} from "../sf-db.js";
|
||||
import { getEligibleSlices } from "../slice-parallel-eligibility.js";
|
||||
import { startSliceParallel } from "../slice-parallel-orchestrator.js";
|
||||
import {
|
||||
clearSliceRoutingForUnit,
|
||||
recordSliceRouting,
|
||||
} from "../slice-routing-cache.js";
|
||||
import {
|
||||
resolveSolverModel,
|
||||
resolveSolverModelCandidates,
|
||||
} from "../solver-model.js";
|
||||
import { handleProductAudit } from "../tools/product-audit-tool.js";
|
||||
import { parseUnitId } from "../unit-id.js";
|
||||
import {
|
||||
|
|
@ -114,14 +128,17 @@ import {
|
|||
FINALIZE_PRE_TIMEOUT_MS,
|
||||
withTimeout,
|
||||
} from "./finalize-timeout.js";
|
||||
import {
|
||||
emitCancelledUnitEnd,
|
||||
recordLearningOutcomeForUnit,
|
||||
shouldSkipArtifactVerification,
|
||||
} from "./phases-helpers.js";
|
||||
import { runUnit } from "./run-unit.js";
|
||||
import { getErrorMessage } from "../error-utils.js";
|
||||
import {
|
||||
BUDGET_THRESHOLDS,
|
||||
MAX_FINALIZE_TIMEOUTS,
|
||||
MAX_RECOVERY_CHARS,
|
||||
} from "./types.js";
|
||||
import { emitCancelledUnitEnd, recordLearningOutcomeForUnit, shouldSkipArtifactVerification } from "./phases-helpers.js";
|
||||
|
||||
// ─── Session timeout scheduled resume state ────────────────────────────────────────
|
||||
let consecutiveSessionTimeouts = 0;
|
||||
|
|
@ -458,7 +475,7 @@ export async function runUnitPhase(ic, iterData, loopState, sidecarItem) {
|
|||
if (steeringBlock) {
|
||||
finalPrompt = `${finalPrompt}\n\n---\n\n${steeringBlock}`;
|
||||
}
|
||||
finalPrompt = `${finalPrompt}\n\n---\n\n${buildAutonomousSolverPromptBlock(solverState)}`;
|
||||
finalPrompt = `${finalPrompt}\n\n---\n\n${buildAutonomousExecutorPromptBlock(solverState)}`;
|
||||
deps.emitJournalEvent({
|
||||
ts: new Date().toISOString(),
|
||||
flowId: ic.flowId,
|
||||
|
|
@ -505,8 +522,7 @@ export async function runUnitPhase(ic, iterData, loopState, sidecarItem) {
|
|||
try {
|
||||
finalPrompt = deps.reorderForCaching(finalPrompt);
|
||||
} catch (reorderErr) {
|
||||
const msg =
|
||||
getErrorMessage(reorderErr);
|
||||
const msg = getErrorMessage(reorderErr);
|
||||
logWarning("engine", "Prompt reorder failed", { error: msg });
|
||||
}
|
||||
// Select and apply model (with tier escalation on retry — normal units only)
|
||||
|
|
@ -706,13 +722,227 @@ export async function runUnitPhase(ic, iterData, loopState, sidecarItem) {
|
|||
const unitResult = await runUnit(ctx, pi, s, unitType, unitId, finalPrompt);
|
||||
s.lastUnitAgentEndMessages = unitResult.event?.messages ?? null;
|
||||
let currentUnitResult = unitResult;
|
||||
const executorMessages = unitResult.event?.messages ?? [];
|
||||
const refusal =
|
||||
unitResult.status !== "cancelled"
|
||||
? classifyExecutorRefusal(executorMessages)
|
||||
: null;
|
||||
|
||||
// Short-circuit: if runUnit was cancelled (provider not ready, session
|
||||
// failed, timeout) there is no checkpoint to repair — skip the repair loop
|
||||
// failed, timeout) there is no checkpoint to repair — skip the solver pass
|
||||
// entirely and let the cancelled handler below surface the real cause.
|
||||
let solverAssessment =
|
||||
unitResult.status === "cancelled"
|
||||
? { action: "none" }
|
||||
: assessAutonomousSolverTurn(s.basePath, unitType, unitId);
|
||||
: { action: "pending" };
|
||||
|
||||
// Refusal short-circuit: when the executor model returned a generic refusal,
|
||||
// synthesize a blocked checkpoint immediately and skip the solver pass.
|
||||
if (unitResult.status !== "cancelled" && refusal) {
|
||||
const executorModel =
|
||||
s.currentUnitModel?.provider && s.currentUnitModel?.id
|
||||
? `${s.currentUnitModel.provider}/${s.currentUnitModel.id}`
|
||||
: (s.currentUnitModel?.id ?? "unknown");
|
||||
// Evict the sticky-routing entry for this slice — the model attached
|
||||
// to it refused, so future units in the same slice should NOT re-pin
|
||||
// the broken model.
|
||||
try {
|
||||
clearSliceRoutingForUnit(s.basePath, unitId);
|
||||
} catch {
|
||||
// best-effort
|
||||
}
|
||||
try {
|
||||
appendAutonomousSolverCheckpoint(s.basePath, {
|
||||
unitType,
|
||||
unitId,
|
||||
outcome: "blocked",
|
||||
summary: `Executor (${executorModel}) refused the task. Pattern: ${refusal.pattern}. Repair-prompting the same model cannot produce progress; escalate the executor model or unblock this unit manually.`,
|
||||
completedItems: [],
|
||||
remainingItems: [
|
||||
`Re-run ${unitType} ${unitId} with a more capable executor model — current routing selected an incapable model.`,
|
||||
],
|
||||
verificationEvidence: [
|
||||
`executor-refusal-pattern=${refusal.pattern}`,
|
||||
`executor-model=${executorModel}`,
|
||||
],
|
||||
blockerReason: `executor-refused (${refusal.pattern})`,
|
||||
pdd: {
|
||||
purpose:
|
||||
"Surface executor refusals as protocol-level blockers instead of synthesizing fake progress.",
|
||||
consumer: "autonomous loop pause-handler",
|
||||
contract:
|
||||
"On `executor-refused`, the loop pauses and self-feedback is filed; the operator must escalate the executor model.",
|
||||
failureBoundary:
|
||||
"If the operator does not escalate, the same refusal will recur on next dispatch.",
|
||||
evidence: "classifyExecutorRefusal matched a refusal pattern",
|
||||
nonGoals:
|
||||
"This does not retry the unit automatically — capability mismatches require operator judgement (or a future automatic escalation policy).",
|
||||
invariants: "Refusal never silently synthesizes a continue.",
|
||||
assumptions:
|
||||
"The refusal pattern set in classifyExecutorRefusal is conservative — false positives are rare and require operator review.",
|
||||
},
|
||||
});
|
||||
} catch {
|
||||
// If synthesis fails, the assessment below finds no checkpoint; the solver pass is skipped because `refusal` is set
|
||||
}
|
||||
try {
|
||||
const feedback = recordSelfFeedback(
|
||||
{
|
||||
kind: "executor-refused",
|
||||
severity: "high",
|
||||
summary: `Executor ${executorModel} refused ${unitType} ${unitId} with pattern ${refusal.pattern}; loop paused to prevent fake-progress synthesis.`,
|
||||
evidence: [
|
||||
`unit=${unitType} ${unitId}`,
|
||||
`executor=${executorModel}`,
|
||||
`refusal-pattern=${refusal.pattern}`,
|
||||
"",
|
||||
refusal.evidence ?? "",
|
||||
].join("\n"),
|
||||
suggestedFix:
|
||||
"Escalate the executor model for this unit (or unit type) — the currently routed model lacks the agentic capabilities required. Long-term: separate the executor and autonomous-solver roles per ADR-0079 and pin the solver to a stable agentic model.",
|
||||
acceptanceCriteria: [
|
||||
"Executor model for this unit type is escalated to a model that passes the refusal-resistant tier.",
|
||||
"Refusal pattern is added to classifyExecutorRefusal if a novel phrasing slipped through.",
|
||||
],
|
||||
occurredIn: { unitType, unitId },
|
||||
source: "runtime",
|
||||
},
|
||||
s.basePath,
|
||||
);
|
||||
deps.emitJournalEvent({
|
||||
ts: new Date().toISOString(),
|
||||
flowId: ic.flowId,
|
||||
seq: ic.nextSeq(),
|
||||
eventType: "executor-refused",
|
||||
data: {
|
||||
unitType,
|
||||
unitId,
|
||||
executorModel,
|
||||
pattern: refusal.pattern,
|
||||
selfFeedbackId: feedback?.entry?.id,
|
||||
blocking: feedback?.blocking,
|
||||
},
|
||||
});
|
||||
} catch {
|
||||
// self-feedback is observability; never block loop progression on it
|
||||
}
|
||||
ctx.ui.notify(
|
||||
`Executor ${executorModel} refused ${unitType} ${unitId} (${refusal.pattern}); autonomous loop pausing instead of synthesizing fake progress. See SELF-FEEDBACK.md for escalation guidance.`,
|
||||
"error",
|
||||
);
|
||||
solverAssessment = assessAutonomousSolverTurn(s.basePath, unitType, unitId);
|
||||
}
|
||||
|
||||
// Solver pass: the stable solver model reads the executor transcript and
|
||||
// emits the canonical checkpoint. This separates the executor role (unit
|
||||
// work) from the solver role (protocol checkpoint) per ADR-0079.
|
||||
if (unitResult.status !== "cancelled" && !refusal) {
|
||||
const executorModel = s.currentUnitModel;
|
||||
const solverCandidates = resolveSolverModelCandidates(prefs);
|
||||
let solverPassResult = null;
|
||||
|
||||
for (const candidate of solverCandidates) {
|
||||
const availableModels = ctx.modelRegistry.getAvailable?.() ?? [];
|
||||
const match = availableModels.find(
|
||||
(m) => m.provider === candidate.provider && m.id === candidate.id,
|
||||
);
|
||||
if (!match) continue;
|
||||
|
||||
const ok = await pi.setModel(match, { persist: false });
|
||||
if (!ok) continue;
|
||||
|
||||
s.currentUnitModel = match;
|
||||
ctx.ui.notify(
|
||||
`Running solver pass for ${unitType} ${unitId} with ${match.provider}/${match.id}`,
|
||||
"info",
|
||||
);
|
||||
|
||||
const solverState = readAutonomousSolverState(s.basePath);
|
||||
const solverPrompt = buildSolverPassPrompt(
|
||||
executorMessages,
|
||||
solverState,
|
||||
unitType,
|
||||
unitId,
|
||||
);
|
||||
|
||||
try {
|
||||
const result = await runUnit(
|
||||
ctx,
|
||||
pi,
|
||||
s,
|
||||
unitType,
|
||||
unitId,
|
||||
solverPrompt,
|
||||
{ keepSession: false },
|
||||
);
|
||||
solverPassResult = result;
|
||||
if (result.status !== "cancelled") {
|
||||
currentUnitResult = result;
|
||||
s.lastUnitAgentEndMessages = result.event?.messages ?? null;
|
||||
break; // Solver pass succeeded
|
||||
}
|
||||
} catch {
|
||||
// Try next fallback
|
||||
}
|
||||
}
|
||||
|
||||
if (!solverPassResult || solverPassResult.status === "cancelled") {
|
||||
ctx.ui.notify(
|
||||
`Solver pass failed for ${unitType} ${unitId} — no solver model was reachable. Synthesizing blocked checkpoint.`,
|
||||
"error",
|
||||
);
|
||||
try {
|
||||
appendAutonomousSolverCheckpoint(s.basePath, {
|
||||
unitType,
|
||||
unitId,
|
||||
outcome: "blocked",
|
||||
summary: `Solver pass failed — no solver model was reachable. The executor transcript could not be classified into a canonical checkpoint.`,
|
||||
completedItems: [],
|
||||
remainingItems: [
|
||||
`Retry ${unitType} ${unitId} after verifying solver model availability.`,
|
||||
],
|
||||
verificationEvidence: ["solver-pass-failed"],
|
||||
blockerReason: "solver-pass-failed",
|
||||
pdd: {
|
||||
purpose:
|
||||
"Surface solver-pass failures as blockers rather than silently dropping the protocol layer.",
|
||||
consumer: "autonomous loop pause-handler",
|
||||
contract:
|
||||
"On solver-pass failure, the loop pauses so the operator can fix model availability.",
|
||||
failureBoundary:
|
||||
"If all solver candidates are unreachable, the protocol layer cannot function.",
|
||||
evidence:
|
||||
"All solver candidates were unreachable or setModel failed.",
|
||||
nonGoals:
|
||||
"This does not retry with a different solver candidate automatically beyond the explicit fallback chain.",
|
||||
invariants:
|
||||
"Solver-pass failure never silently synthesizes a continue.",
|
||||
assumptions:
|
||||
"At least one solver candidate (kimi-k2.6 or fallback) is available in the model registry.",
|
||||
},
|
||||
});
|
||||
} catch {
|
||||
// best-effort
|
||||
}
|
||||
}
|
||||
|
||||
solverAssessment = assessAutonomousSolverTurn(
|
||||
s.basePath,
|
||||
unitType,
|
||||
unitId,
|
||||
executorMessages,
|
||||
);
|
||||
|
||||
// Restore executor model after solver pass and assessment
|
||||
if (executorModel) {
|
||||
try {
|
||||
await pi.setModel(executorModel, { persist: false });
|
||||
} catch {
|
||||
// best-effort restore
|
||||
}
|
||||
s.currentUnitModel = executorModel;
|
||||
}
|
||||
}
|
||||
while (solverAssessment.action === "missing-checkpoint-retry") {
|
||||
const diagnosis = classifyAutonomousSolverMissingCheckpointFailure(
|
||||
currentUnitResult.event?.messages ?? [],
|
||||
|
|
@ -779,6 +1009,26 @@ export async function runUnitPhase(ic, iterData, loopState, sidecarItem) {
|
|||
remainingCount: solverCheckpoint.remainingItems?.length ?? 0,
|
||||
},
|
||||
});
|
||||
// Record sticky-routing on successful outcomes only. `continue` is the
|
||||
// usual within-iteration progress signal; `complete` is final success.
|
||||
// We deliberately skip `blocked` and `decide` because attaching a model
|
||||
// to a slice when it's known-stuck or known-undecided would defeat the
|
||||
// fallback path.
|
||||
if (
|
||||
solverCheckpoint.outcome === "continue" ||
|
||||
solverCheckpoint.outcome === "complete"
|
||||
) {
|
||||
try {
|
||||
recordSliceRouting(
|
||||
s.basePath,
|
||||
unitType,
|
||||
unitId,
|
||||
s.currentUnitModel ?? ctx.model ?? null,
|
||||
);
|
||||
} catch {
|
||||
// best-effort; routing cache must never break the loop
|
||||
}
|
||||
}
|
||||
}
|
||||
if (solverAssessment.action === "pause") {
|
||||
const isMissingCheckpoint =
|
||||
|
|
@ -808,7 +1058,7 @@ export async function runUnitPhase(ic, iterData, loopState, sidecarItem) {
|
|||
acceptanceCriteria: [
|
||||
"Missing-checkpoint repair attempts include failure classification in the prompt.",
|
||||
"Repeated repair failures file self-feedback automatically.",
|
||||
"Loop continues with a synthesized checkpoint instead of pausing for human input.",
|
||||
"Loop continues with a synthesized checkpoint instead of pausing for human input — EXCEPT when classifyExecutorRefusal short-circuits with `executor-refused`, in which case the loop emits a `blocked` checkpoint and pauses (synthesizing forward progress over a refusing executor is the bug we are fixing).",
|
||||
],
|
||||
occurredIn: { unitType, unitId },
|
||||
source: "runtime",
|
||||
|
|
@ -1087,8 +1337,7 @@ export async function runUnitPhase(ic, iterData, loopState, sidecarItem) {
|
|||
resume: allowAutoResume
|
||||
? () => {
|
||||
void resumeAutoAfterProviderDelay(pi, ctx).catch((err) => {
|
||||
const message =
|
||||
getErrorMessage(err);
|
||||
const message = getErrorMessage(err);
|
||||
ctx.ui.notify(
|
||||
`Session timeout recovery failed: ${message}`,
|
||||
"error",
|
||||
|
|
@ -1280,10 +1529,7 @@ export async function runUnitPhase(ic, iterData, loopState, sidecarItem) {
|
|||
});
|
||||
} catch (err) {
|
||||
/* non-fatal — anchor is advisory */
|
||||
logWarning(
|
||||
"engine",
|
||||
`phase anchor failed: ${getErrorMessage(err)}`,
|
||||
);
|
||||
logWarning("engine", `phase anchor failed: ${getErrorMessage(err)}`);
|
||||
}
|
||||
}
|
||||
if (currentUnitResult.status !== "completed" || !artifactVerified) {
|
||||
|
|
|
|||
|
|
@ -281,7 +281,7 @@ export function beginAutonomousSolverIteration(
|
|||
*
|
||||
* Consumer: runUnitPhase prompt injection.
|
||||
*/
|
||||
export function buildAutonomousSolverPromptBlock(state) {
|
||||
function _buildAutonomousLoopPromptPrefix(state, header) {
|
||||
const phase = getSolverPhase(state.iteration, state.maxIterations);
|
||||
const stalled =
|
||||
Number(state.iterationsSinceProgress) >= STALL_THRESHOLD_ITERATIONS;
|
||||
|
|
@ -306,7 +306,7 @@ export function buildAutonomousSolverPromptBlock(state) {
|
|||
};
|
||||
|
||||
const lines = [
|
||||
"## Autonomous Solver Loop Contract",
|
||||
`## ${header}`,
|
||||
"",
|
||||
`You are inside /autonomous iteration ${state.iteration} of ${state.maxIterations} for ${state.unitType} ${state.unitId}.`,
|
||||
"",
|
||||
|
|
@ -357,6 +357,25 @@ export function buildAutonomousSolverPromptBlock(state) {
|
|||
);
|
||||
}
|
||||
|
||||
return lines;
|
||||
}
|
||||
|
||||
/**
|
||||
* Build the PDD autonomous solver prompt block appended to unit prompts.
|
||||
*
|
||||
* Purpose: bind every autonomous unit to bounded iterations, evidence, stop
|
||||
* signals, and the eight PDD fields instead of open-ended hidden retries.
|
||||
* Phase-aware: ORIENT (iters 1-2) focuses on reading and planning; EXECUTE
|
||||
* (middle) on implementation; CLOSE (final 3) on verifying and wrapping up.
|
||||
* Stall/loop signals are injected when the system detects no progress.
|
||||
*
|
||||
* Consumer: runUnitPhase prompt injection (solver pass).
|
||||
*/
|
||||
export function buildAutonomousSolverPromptBlock(state) {
|
||||
const lines = _buildAutonomousLoopPromptPrefix(
|
||||
state,
|
||||
"Autonomous Solver Loop Contract",
|
||||
);
|
||||
lines.push(
|
||||
"",
|
||||
"## CHECKPOINT REQUIREMENT",
|
||||
|
|
@ -390,6 +409,142 @@ export function buildAutonomousSolverPromptBlock(state) {
|
|||
return lines.join("\n");
|
||||
}
|
||||
|
||||
/**
|
||||
* Build the executor prompt block (no checkpoint requirement).
|
||||
*
|
||||
* Purpose: the executor focuses on doing the unit work. A separate solver pass
|
||||
* reads the executor transcript and emits the canonical checkpoint.
|
||||
*
|
||||
* Consumer: runUnitPhase prompt injection (executor pass).
|
||||
*/
|
||||
export function buildAutonomousExecutorPromptBlock(state) {
|
||||
const lines = _buildAutonomousLoopPromptPrefix(
|
||||
state,
|
||||
"Autonomous Executor Contract",
|
||||
);
|
||||
lines.push(
|
||||
"",
|
||||
"## EXECUTOR ROLE",
|
||||
"",
|
||||
"Your job is to do the unit work: read files, run tests, edit code, and produce concrete artifacts.",
|
||||
"You do NOT need to call the `checkpoint` tool. A separate solver pass will observe your work and emit the canonical checkpoint.",
|
||||
"Focus entirely on making verifiable progress toward the task goal.",
|
||||
"",
|
||||
"If you are executing an `execute-task` unit and the task is finished, `complete_task` remains mandatory.",
|
||||
"End your turn when the bounded work is done or when you have made meaningful progress and need to wait for the next iteration.",
|
||||
);
|
||||
return lines.join("\n");
|
||||
}
|
||||
|
||||
/**
 * Build the solver pass prompt that reads an executor transcript.
 *
 * Purpose: give the stable solver model the executor transcript and instruct it
 * to classify what happened and emit the canonical checkpoint.
 *
 * Consumer: runUnitPhase after the executor pass returns.
 */
export function buildSolverPassPrompt(
	executorTranscript,
	state,
	unitType,
	unitId,
) {
	const transcriptText = stringifyMessages(executorTranscript);
	const refusal = classifyExecutorRefusal(executorTranscript);

	const lines = [
		"## Autonomous Solver Pass",
		"",
		`You are the protocol solver for ${unitType} ${unitId} · iteration ${state?.iteration ?? "unknown"} of ${state?.maxIterations ?? "unknown"}.`,
		"",
		"Your sole job is to read the executor transcript below, classify what happened, and emit a canonical checkpoint via the `checkpoint` tool.",
		"Do NOT edit files, run commands, or propose code changes. Observe and classify only.",
		"",
		"## Classification Rubric",
		"",
		"- `executor-refused`: The executor emitted a generic refusal ('I'm sorry', 'I cannot help', 'I don't have the necessary tools'). → checkpoint outcome=`blocked`, blockerReason=`executor-refused`.",
		"- `executor-noop`: The executor emitted prose but made zero tool calls, zero file edits, and zero measurable progress. → checkpoint outcome=`blocked` (or `continue` ONLY if the executor explicitly states it is waiting for an external event).",
		"- `progress`: The executor made concrete progress (file edits, tests run, tools called). → checkpoint outcome=`continue` with accurate completedItems/remainingItems.",
		"- `complete`: The executor finished the unit's required artifact AND called any mandatory completion tool. → checkpoint outcome=`complete`.",
		"- `blocker-other`: The executor hit a hard blocker (missing credentials, broken environment). → checkpoint outcome=`blocked` with a precise blockerReason.",
		"",
		"## Executor Transcript",
		"",
		"```",
		transcriptText,
		"```",
		"",
	];

	if (refusal) {
		lines.push(
			`⚠️ Refusal pattern detected: ${refusal.pattern}.`,
			"The executor refused the task. Emit outcome='blocked' with blockerReason='executor-refused'.",
			"",
		);
	}

	lines.push(
		"Call `checkpoint` with all eight PDD fields and accurate completedItems / remainingItems.",
		"Your final action MUST be the checkpoint tool call.",
	);

	return lines.join("\n");
}
|
||||
|
||||
/**
 * Detect whether an executor transcript contains zero meaningful work.
 *
 * Purpose: no-op iterations (continue checkpoint with zero file/tool activity)
 * must not satisfy the repair gate.
 *
 * Consumer: assessAutonomousSolverTurn to reject no-op continues.
 */
export function isNoOpExecutorTranscript(messages) {
	if (!Array.isArray(messages) || messages.length === 0) return true;

	// Refusal is always a no-op
	if (classifyExecutorRefusal(messages)) return true;

	for (const msg of messages) {
		if (!msg || typeof msg !== "object") continue;

		// Assistant requested non-checkpoint tool calls
		if (Array.isArray(msg.tool_calls)) {
			for (const tc of msg.tool_calls) {
				const name = tc?.function?.name ?? tc?.name ?? "";
				if (name && name !== "checkpoint") {
					return false;
				}
			}
		}

		// Tool results from non-checkpoint tools
		if (msg.role === "tool" || msg.role === "tool_result") {
			const name = msg.name ?? "";
			if (name && name !== "checkpoint") {
				return false;
			}
		}

		// Content that shows concrete work was done
		const content = typeof msg.content === "string" ? msg.content : "";
		if (
			content.includes("File edited") ||
			content.includes("File written") ||
			content.includes("File created") ||
			content.includes("```diff") ||
			content.includes("--- a/") ||
			content.includes("+++ b/")
		) {
			return false;
		}
	}

	return true;
}
|
||||
|
||||
/**
|
||||
* Record a solver checkpoint and update the markdown projection.
|
||||
*
|
||||
|
|
@@ -541,6 +696,141 @@ export function recordAutonomousSolverMissingCheckpointRetry(
|
|||
return nextState;
|
||||
}
|
||||
|
||||
/**
 * Detect that the executor model refused the task outright (rather than
 * attempting and failing the protocol).
 *
 * Why: when a routed executor model (e.g. a code-completion model like
 * Codestral) lacks the agentic capabilities required for the unit, it emits a
 * generic refusal — "I'm sorry, I currently don't have the necessary tools to
 * assist with that specific request." The existing missing-checkpoint repair
 * loop will dutifully re-prompt the same model until it emits a syntactically
 * valid checkpoint with zero work, fabricating forward progress. Refusal must
 * be caught earlier and surfaced as a `blocked` outcome so the loop pauses (or
 * the executor model can be escalated on retry) rather than synthesizing a
 * `continue` over no work.
 *
 * Returns null when no refusal pattern is detected.
 *
 * Consumer: runUnitPhase short-circuits the repair loop on a positive match.
 */
export function classifyExecutorRefusal(messages) {
	const text = stringifyMessages(messages);
	if (!text.trim()) return null;
	const lower = text.toLowerCase();
	const patterns = [
		{
			id: "apology-no-tools",
			regex:
				/i(?:'m| am)\s+sorry[^.]{0,80}(?:don't|do not|cannot|can't)\s+have\s+(?:the\s+)?(?:necessary\s+)?tools?/i,
		},
		{
			id: "cannot-assist",
			// \b anchors the pronoun so that narration such as "the API cannot
			// help with this" is not classified as a refusal.
			regex:
				/\bi\s+(?:cannot|can't|am unable to|won't be able to)\s+(?:assist|help)\s+with\s+(?:that|this)/i,
		},
		{
			id: "not-able-to-help",
			regex:
				/i\s+(?:am\s+not\s+able\s+to|do not have the ability to|don't have the ability to)\s+(?:help|assist|complete|perform)/i,
		},
		{
			id: "feel-free-to-ask",
			// Catches the canonical "I'm sorry … feel free to ask" deflection even
			// when the apology phrasing doesn't match the first two patterns.
			regex:
				/(?:i(?:'m| am)\s+sorry|i\s+apologi[sz]e)[\s\S]{0,200}feel\s+free\s+to\s+ask/i,
		},
		{
			id: "outside-capabilities",
			regex:
				/(?:that's|that is|this is)\s+(?:outside|beyond)\s+(?:my|the)\s+(?:capabilities|abilities|scope)/i,
		},
	];
	for (const pattern of patterns) {
		if (pattern.regex.test(lower) || pattern.regex.test(text)) {
			return {
				classification: "executor-refused",
				pattern: pattern.id,
				summary:
					"The executor model refused the task rather than attempting it. This is a capability/routing problem, not a protocol problem — repairing the prompt will not produce progress.",
				evidence: truncateEvidence(text),
			};
		}
	}
	return null;
}
|
||||
|
||||
/**
 * Memoized lookup: is the `checkpoint` tool registered in the SF extension
 * manifest? Used by `classifyAutonomousSolverMissingCheckpointFailure` to
 * disambiguate "agent says tool is unavailable" from "agent mentioned a real
 * tool but did not call it."
 *
 * Why memoized: the previous implementation read the manifest from disk on
 * every classifier call, with CWD-sensitive path probing — surprising hidden
 * I/O inside what reads like a pure function, and a test-unfriendly coupling.
 * The manifest does not change while the process is running, so a single
 * memoized read at first call is correct and fast.
 *
 * Callers that want test-time control (or are running in environments where
 * the manifest can't be located, e.g. CI fixtures) pass an explicit
 * `checkpointToolRegistered` override to the classifier instead — no need to
 * stub the filesystem.
 */
let _checkpointToolRegisteredCache = null;
function isCheckpointToolRegisteredFromManifest() {
	if (_checkpointToolRegisteredCache !== null) {
		return _checkpointToolRegisteredCache;
	}
	try {
		const manifestPath = join(
			process.cwd(),
			"dist",
			"resources",
			"extensions",
			"sf",
			"extension-manifest.json",
		);
		const srcManifestPath = join(
			process.cwd(),
			"src",
			"resources",
			"extensions",
			"sf",
			"extension-manifest.json",
		);
		const manifestContent = existsSync(manifestPath)
			? readFileSync(manifestPath, "utf-8")
			: existsSync(srcManifestPath)
				? readFileSync(srcManifestPath, "utf-8")
				: null;
		if (!manifestContent) {
			_checkpointToolRegisteredCache = false;
			return false;
		}
		const manifest = JSON.parse(manifestContent);
		_checkpointToolRegisteredCache =
			Array.isArray(manifest?.provides?.tools) &&
			manifest.provides.tools.includes("checkpoint");
		return _checkpointToolRegisteredCache;
	} catch {
		_checkpointToolRegisteredCache = false;
		return false;
	}
}

/**
 * Test-only escape hatch to reset the manifest-lookup memoization. Tests that
 * exercise the classifier under different "is checkpoint registered" assumptions
 * should prefer the explicit `options.checkpointToolRegistered` override on
 * `classifyAutonomousSolverMissingCheckpointFailure` — this function exists
 * only as a safety net for tests that need to clear a polluted module cache.
 */
export function _resetCheckpointToolRegisteredCacheForTests() {
	_checkpointToolRegisteredCache = null;
}
|
||||
|
||||
/**
|
||||
* Classify why a solver turn omitted the checkpoint tool.
|
||||
*
|
||||
|
|
@@ -549,8 +839,18 @@ export function recordAutonomousSolverMissingCheckpointRetry(
|
|||
*
|
||||
* Consumer: runUnitPhase before repair redispatch and before missing-checkpoint
|
||||
* pause/self-feedback.
|
||||
*
|
||||
* @param {Array} messages
|
||||
* @param {object} [options]
|
||||
* @param {boolean} [options.checkpointToolRegistered] - Explicit override for
|
||||
* the "is the checkpoint tool registered in our manifest?" question. When
|
||||
* omitted, falls back to the memoized manifest lookup. Tests should pass
|
||||
* this explicitly so they don't depend on CWD or on-disk dist/ state.
|
||||
*/
|
||||
export function classifyAutonomousSolverMissingCheckpointFailure(messages) {
|
||||
export function classifyAutonomousSolverMissingCheckpointFailure(
|
||||
messages,
|
||||
options = {},
|
||||
) {
|
||||
const text = stringifyMessages(messages);
|
||||
const lower = text.toLowerCase();
|
||||
if (!text.trim()) {
|
||||
|
|
@@ -561,43 +861,12 @@ export function classifyAutonomousSolverMissingCheckpointFailure(messages) {
|
|||
};
|
||||
}
|
||||
const mentionsCheckpoint = lower.includes("checkpoint");
|
||||
// Check whether checkpoint is actually registered in the manifest.
|
||||
// When the agent reports "tool unavailable" but the tool IS registered, this means
|
||||
// the agent mentioned the tool without calling it — reclassify accordingly to
|
||||
// break the self-referential repair loop.
|
||||
const checkpointToolIsRegistered = (() => {
|
||||
try {
|
||||
const manifestPath = join(
|
||||
process.cwd(),
|
||||
"dist",
|
||||
"resources",
|
||||
"extensions",
|
||||
"sf",
|
||||
"extension-manifest.json",
|
||||
);
|
||||
const srcManifestPath = join(
|
||||
process.cwd(),
|
||||
"src",
|
||||
"resources",
|
||||
"extensions",
|
||||
"sf",
|
||||
"extension-manifest.json",
|
||||
);
|
||||
const manifestContent = existsSync(manifestPath)
|
||||
? readFileSync(manifestPath, "utf-8")
|
||||
: existsSync(srcManifestPath)
|
||||
? readFileSync(srcManifestPath, "utf-8")
|
||||
: null;
|
||||
if (!manifestContent) return false;
|
||||
const manifest = JSON.parse(manifestContent);
|
||||
return (
|
||||
Array.isArray(manifest?.provides?.tools) &&
|
||||
manifest.provides.tools.includes("checkpoint")
|
||||
);
|
||||
} catch {
|
||||
return false;
|
||||
}
|
||||
})();
|
||||
// Resolve "is checkpoint registered" — explicit override wins, otherwise
|
||||
// fall back to the memoized manifest lookup.
|
||||
const checkpointToolIsRegistered =
|
||||
typeof options.checkpointToolRegistered === "boolean"
|
||||
? options.checkpointToolRegistered
|
||||
: isCheckpointToolRegisteredFromManifest();
|
||||
const mentionsToolUnavailable =
|
||||
/(unknown|unavailable|not available|not found|no such) tool/.test(lower) ||
|
||||
(lower.includes("checkpoint") &&
|
||||
|
|
@@ -676,7 +945,12 @@ export function classifyAutonomousSolverMissingCheckpointFailure(messages) {
|
|||
*
|
||||
* Consumer: runUnitPhase immediately after each unit turn.
|
||||
*/
|
||||
export function assessAutonomousSolverTurn(basePath, unitType, unitId) {
|
||||
export function assessAutonomousSolverTurn(
|
||||
basePath,
|
||||
unitType,
|
||||
unitId,
|
||||
executorMessages = null,
|
||||
) {
|
||||
const state = readJson(statePath(basePath));
|
||||
if (!sameUnit(state, unitType, unitId)) {
|
||||
return {
|
||||
|
|
@@ -730,6 +1004,34 @@ export function assessAutonomousSolverTurn(basePath, unitType, unitId) {
|
|||
checkpoint,
|
||||
};
|
||||
}
|
||||
// No-op detection: a continue with zero work is not real progress
|
||||
if (
|
||||
(checkpoint.outcome === "continue" || checkpoint.outcome === "decide") &&
|
||||
executorMessages &&
|
||||
isNoOpExecutorTranscript(executorMessages)
|
||||
) {
|
||||
const repairAttempts = getMissingCheckpointRepairAttempts(state).filter(
|
||||
(attempt) => Number(attempt.iteration) === Number(state.iteration),
|
||||
).length;
|
||||
if (repairAttempts >= DEFAULT_MISSING_CHECKPOINT_REPAIR_ATTEMPTS) {
|
||||
return {
|
||||
action: "pause",
|
||||
reason: "solver-noop-continue",
|
||||
state,
|
||||
repairAttempts,
|
||||
maxRepairAttempts: DEFAULT_MISSING_CHECKPOINT_REPAIR_ATTEMPTS,
|
||||
checkpoint,
|
||||
};
|
||||
}
|
||||
return {
|
||||
action: "missing-checkpoint-retry",
|
||||
reason: "solver-noop-continue",
|
||||
state,
|
||||
repairAttempt: repairAttempts + 1,
|
||||
maxRepairAttempts: DEFAULT_MISSING_CHECKPOINT_REPAIR_ATTEMPTS,
|
||||
checkpoint,
|
||||
};
|
||||
}
|
||||
// "decide" is treated as "continue": agent reconstructs best-effort and moves on
|
||||
return {
|
||||
action:
|
||||
|
|
|
|||
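The refusal patterns in `classifyExecutorRefusal` can be exercised in isolation. The following is a minimal sketch, assuming plain strings as input rather than message arrays (it skips `stringifyMessages`, whose output format is not shown in this diff). The helper name `firstRefusalPattern` is hypothetical, and only two of the five patterns are copied over.

```javascript
// Two of the five patterns, copied verbatim from classifyExecutorRefusal.
const patterns = [
	{
		id: "apology-no-tools",
		regex:
			/i(?:'m| am)\s+sorry[^.]{0,80}(?:don't|do not|cannot|can't)\s+have\s+(?:the\s+)?(?:necessary\s+)?tools?/i,
	},
	{
		id: "feel-free-to-ask",
		regex:
			/(?:i(?:'m| am)\s+sorry|i\s+apologi[sz]e)[\s\S]{0,200}feel\s+free\s+to\s+ask/i,
	},
];

// Hypothetical helper: returns the id of the first matching pattern, or null.
// Patterns are tried in array order, so the more specific one wins.
function firstRefusalPattern(text) {
	const hit = patterns.find((p) => p.regex.test(text));
	return hit ? hit.id : null;
}

// Refusal text taken from the test suite in this commit.
const refusal = firstRefusalPattern(
	"I'm sorry, but I currently don't have the necessary tools to assist with that specific request. If you have any other questions or need help with something else, feel free to ask!",
);

// Ordinary narration that merely contains the word "Sorry".
const narration = firstRefusalPattern(
	"Sorry for the long output below, here is the full diff of the change. I am about to run the tests now.",
);

console.log(refusal, narration);
```

Both patterns match the refusal text, which is why the corresponding test accepts either id; in this sketch the array order decides which one is reported.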
119
src/resources/extensions/sf/solver-model.js
Normal file
|
|
@@ -0,0 +1,119 @@
|
|||
/**
 * solver-model.js — pinned model selection for the autonomous solver role.
 *
 * Why this exists:
 * The "executor" and "autonomous solver" roles were historically conflated
 * into a single LLM call selected by the unit-type router. When the router
 * picked a coding-tuned or capability-limited model for the executor (e.g.
 * `mistral/codestral-latest`, `google-gemini-cli/gemini-3-flash-preview`),
 * the same model was expected to (a) do the unit work and (b) emit the
 * canonical protocol checkpoint. Models that refuse agentic tasks or fail
 * to follow tool-use contracts broke the protocol layer entirely — and the
 * missing-checkpoint repair loop could only re-prompt the same broken
 * model, synthesizing fake `continue` outcomes over zero progress.
 *
 * The solver role MUST stay on a stable, agentic, refusal-resistant model
 * independent of any per-unit routing choices. This module is the single
 * place that decision is made.
 *
 * Contract:
 * - Default solver model is `kimi-k2.6` (provider: `kimi-coding`).
 * - Preference override is accepted ONLY when the operator has explicitly
 *   opted into it via `preferences.autonomousSolver.model`. Router output,
 *   benchmark scoring, learning blender, and unit-type routing are NEVER
 *   consulted here.
 * - A fallback chain is provided so a brief outage of the primary does not
 *   take the protocol layer with it.
 *
 * Consumers (forthcoming): the solver-pass invocation in auto/phases-unit.js
 * once the two-layer loop lands (see ADR-0079).
 */

/**
 * Default model for the autonomous solver role. Locked. Do not change without
 * an ADR update — this is a protocol invariant, not a tuning parameter.
 */
export const SOLVER_MODEL_DEFAULT = {
	provider: "kimi-coding",
	id: "kimi-k2.6",
};

/**
 * Explicit fallback chain when the default is unreachable. Ordered by
 * preference. Each entry must be a stable agentic model that follows tool-use
 * contracts; nothing on this list is a code-completion-only model.
 */
export const SOLVER_MODEL_FALLBACKS = [
	{ provider: "anthropic", id: "claude-sonnet-4-6" },
	{ provider: "anthropic", id: "claude-opus-4-7" },
];

/**
 * Resolve which model should fill the solver role for the current run.
 *
 * @param {object} [preferences] - Operator preferences object. Only consulted
 *   for `preferences.autonomousSolver.model`. Anything else is ignored.
 * @returns {{ provider: string, id: string }} the selected solver model
 */
export function resolveSolverModel(preferences) {
	const override = preferences?.autonomousSolver?.model;
	if (override && typeof override === "object" && override.id) {
		return {
			provider: String(override.provider ?? SOLVER_MODEL_DEFAULT.provider),
			id: String(override.id),
		};
	}
	if (typeof override === "string" && override.trim()) {
		// Allow "provider/model" short form for ergonomics; default provider when
		// only a model id is supplied.
		const trimmed = override.trim();
		const slash = trimmed.indexOf("/");
		if (slash > 0) {
			return {
				provider: trimmed.slice(0, slash),
				id: trimmed.slice(slash + 1),
			};
		}
		return { provider: SOLVER_MODEL_DEFAULT.provider, id: trimmed };
	}
	return { ...SOLVER_MODEL_DEFAULT };
}

/**
 * Resolve the ordered candidate list for the solver role: primary first, then
 * the fallback chain. Callers iterate until they find a reachable provider.
 *
 * @param {object} [preferences]
 * @returns {Array<{ provider: string, id: string }>}
 */
export function resolveSolverModelCandidates(preferences) {
	const primary = resolveSolverModel(preferences);
	const candidates = [primary];
	for (const fallback of SOLVER_MODEL_FALLBACKS) {
		if (
			fallback.provider === primary.provider &&
			fallback.id === primary.id
		) {
			continue;
		}
		candidates.push({ ...fallback });
	}
	return candidates;
}

/**
 * True if the supplied model would be selected as solver for these preferences.
 * Useful for invariants and tests.
 *
 * @param {{ provider?: string, id?: string }} model
 * @param {object} [preferences]
 * @returns {boolean}
 */
export function isSolverModel(model, preferences) {
	if (!model?.id) return false;
	const solver = resolveSolverModel(preferences);
	return (
		String(model.provider ?? solver.provider) === solver.provider &&
		String(model.id) === solver.id
	);
}
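The doc comment on `resolveSolverModelCandidates` says callers iterate until they find a reachable provider, but the commit does not yet include such a caller. The sketch below shows one way that walk might look. `candidatesFor`, `pickSolverModel`, and the `isReachable` probe are hypothetical names introduced for illustration only; the two constants are copied from `solver-model.js` so the snippet runs on its own.

```javascript
// Copied from solver-model.js so the sketch is self-contained.
const SOLVER_MODEL_DEFAULT = { provider: "kimi-coding", id: "kimi-k2.6" };
const SOLVER_MODEL_FALLBACKS = [
	{ provider: "anthropic", id: "claude-sonnet-4-6" },
	{ provider: "anthropic", id: "claude-opus-4-7" },
];

// Condensed restatement of resolveSolverModelCandidates for an already
// resolved primary: primary first, then every fallback that differs from it.
function candidatesFor(primary) {
	const rest = SOLVER_MODEL_FALLBACKS.filter(
		(m) => !(m.provider === primary.provider && m.id === primary.id),
	);
	return [primary, ...rest.map((m) => ({ ...m }))];
}

// Hypothetical caller: return the first candidate the probe accepts.
// `isReachable` is an assumption, not an API from this commit.
function pickSolverModel(primary, isReachable) {
	for (const candidate of candidatesFor(primary)) {
		if (isReachable(candidate)) return candidate;
	}
	return null; // nothing reachable; a real caller would have to pause here
}

// Simulate an outage of the primary provider.
const picked = pickSolverModel(
	SOLVER_MODEL_DEFAULT,
	(m) => m.provider !== "kimi-coding",
);
console.log(picked);
```

Returning `null` when every candidate fails keeps the decision about pausing or erroring with the caller, which seems consistent with the module's stated role of only selecting models.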
|
||||
|
|
@@ -7,13 +7,17 @@ import {
|
|||
appendAutonomousSolverSteering,
|
||||
assessAutonomousSolverTurn,
|
||||
beginAutonomousSolverIteration,
|
||||
buildAutonomousExecutorPromptBlock,
|
||||
buildAutonomousSolverMissingCheckpointRepairPrompt,
|
||||
buildAutonomousSolverPromptBlock,
|
||||
buildSolverPassPrompt,
|
||||
classifyAutonomousSolverMissingCheckpointFailure,
|
||||
classifyExecutorRefusal,
|
||||
consumePendingAutonomousSolverSteering,
|
||||
detectSolverLoop,
|
||||
getConfiguredAutonomousSolverMaxIterations,
|
||||
getSolverPhase,
|
||||
isNoOpExecutorTranscript,
|
||||
readAutonomousSolverState,
|
||||
readLatestAutonomousSolverCheckpoint,
|
||||
recordAutonomousSolverMissingCheckpointRetry,
|
||||
|
|
@@ -565,3 +569,368 @@ describe("autonomous solver", () => {
|
|||
expect(prompt).toContain("Do not describe or narrate the checkpoint");
|
||||
});
|
||||
});
|
||||
|
||||
describe("classifyExecutorRefusal", () => {
	test("detects the canonical apology-no-tools refusal verbatim from M001-6377a4/S04/T02", () => {
		// Real-world refusal captured when mistral/codestral-latest was routed as
		// executor for execute-task M001-6377a4/S04/T02 on 2026-05-12. The
		// classifier must catch this exact phrasing or the repair loop will
		// re-prompt the same broken model and synthesize fake progress.
		const refusal = classifyExecutorRefusal([
			{
				role: "assistant",
				content:
					"I'm sorry, but I currently don't have the necessary tools to assist with that specific request. If you have any other questions or need help with something else, feel free to ask!",
			},
		]);
		expect(refusal).not.toBeNull();
		expect(refusal.classification).toBe("executor-refused");
		// The "apology-no-tools" pattern is the most specific; "feel-free-to-ask"
		// is a fallback that may match the same string. Either is acceptable as
		// long as the result is a refusal.
		expect(["apology-no-tools", "feel-free-to-ask"]).toContain(refusal.pattern);
	});

	test("detects 'I cannot assist with that' phrasing", () => {
		const refusal = classifyExecutorRefusal([
			{ role: "assistant", content: "I cannot assist with that request." },
		]);
		expect(refusal).not.toBeNull();
		expect(refusal.pattern).toBe("cannot-assist");
	});

	test("detects 'outside my capabilities' phrasing", () => {
		const refusal = classifyExecutorRefusal([
			{
				role: "assistant",
				content:
					"That's outside my capabilities — I am unable to perform file edits.",
			},
		]);
		expect(refusal).not.toBeNull();
	});

	test("returns null on legitimate work transcripts", () => {
		expect(
			classifyExecutorRefusal([
				{ role: "assistant", content: "I read the file and edited line 42." },
			]),
		).toBeNull();
		expect(
			classifyExecutorRefusal([
				{
					role: "assistant",
					content:
						"Checkpoint recorded: outcome=continue, completed steps 1-3.",
				},
			]),
		).toBeNull();
	});

	test("returns null on empty or missing transcripts", () => {
		expect(classifyExecutorRefusal(null)).toBeNull();
		expect(classifyExecutorRefusal([])).toBeNull();
		expect(
			classifyExecutorRefusal([{ role: "assistant", content: "" }]),
		).toBeNull();
	});

	test("does not misfire on the apology word in normal narration", () => {
		// We only want to match refusals, not any sentence containing "sorry".
		// A model saying "Sorry for the long output below — here's the full
		// diff" should not be classified as a refusal.
		const refusal = classifyExecutorRefusal([
			{
				role: "assistant",
				content:
					"Sorry for the long output below — here is the full diff of the change. I am about to run the tests now.",
			},
		]);
		expect(refusal).toBeNull();
	});

	test("evidence is truncated for storage", () => {
		const refusal = classifyExecutorRefusal([
			{
				role: "assistant",
				content:
					"I'm sorry, I currently don't have the necessary tools. " +
					"x".repeat(8000),
			},
		]);
		expect(refusal).not.toBeNull();
		expect(refusal.evidence.length).toBeLessThanOrEqual(4200);
	});
});
|
||||
|
||||
describe("buildAutonomousExecutorPromptBlock", () => {
	test("omits checkpoint requirement but keeps phase guidance", () => {
		const prompt = buildAutonomousExecutorPromptBlock({
			unitType: "execute-task",
			unitId: "M001/S01/T01",
			iteration: 3,
			maxIterations: 12,
		});

		expect(prompt).toContain("Autonomous Executor Contract");
		expect(prompt).toContain("/autonomous iteration 3 of 12");
		expect(prompt).toContain("EXECUTE PHASE");
		expect(prompt).not.toContain("CHECKPOINT REQUIREMENT");
		expect(prompt).not.toContain(
			"Hard requirement: before ending the turn, call the actual `checkpoint` tool",
		);
		expect(prompt).toContain("You do NOT need to call the `checkpoint` tool");
		expect(prompt).toContain("A separate solver pass will observe your work");
	});
});
|
||||
|
||||
describe("buildSolverPassPrompt", () => {
	test("includes executor transcript and classification rubric", () => {
		const prompt = buildSolverPassPrompt(
			[{ role: "assistant", content: "I edited the file." }],
			{ iteration: 2, maxIterations: 10 },
			"execute-task",
			"M001/S01/T01",
		);

		expect(prompt).toContain("Autonomous Solver Pass");
		expect(prompt).toContain("protocol solver for execute-task M001/S01/T01");
		expect(prompt).toContain("Classification Rubric");
		expect(prompt).toContain("executor-refused");
		expect(prompt).toContain("executor-noop");
		expect(prompt).toContain("progress");
		expect(prompt).toContain("I edited the file.");
		expect(prompt).toContain(
			"Your final action MUST be the checkpoint tool call",
		);
	});

	test("injects refusal warning when refusal is detected", () => {
		const prompt = buildSolverPassPrompt(
			[
				{
					role: "assistant",
					content:
						"I'm sorry, but I currently don't have the necessary tools to assist with that specific request.",
				},
			],
			{ iteration: 1, maxIterations: 10 },
			"execute-task",
			"M001/S01/T01",
		);

		expect(prompt).toContain("Refusal pattern detected");
		expect(prompt).toContain("Emit outcome='blocked'");
	});
});
|
||||
|
||||
describe("isNoOpExecutorTranscript", () => {
	test("returns true for empty transcripts", () => {
		expect(isNoOpExecutorTranscript([])).toBe(true);
		expect(isNoOpExecutorTranscript(null)).toBe(true);
		expect(isNoOpExecutorTranscript(undefined)).toBe(true);
	});

	test("returns true for refusal transcripts", () => {
		expect(
			isNoOpExecutorTranscript([
				{
					role: "assistant",
					content:
						"I'm sorry, but I currently don't have the necessary tools to assist with that specific request.",
				},
			]),
		).toBe(true);
	});

	test("returns false for transcripts with tool calls", () => {
		expect(
			isNoOpExecutorTranscript([
				{
					role: "assistant",
					content: "I'll edit the file now.",
					tool_calls: [
						{
							id: "tc_1",
							function: { name: "edit", arguments: "{}" },
						},
					],
				},
			]),
		).toBe(false);
	});

	test("returns false for tool result messages", () => {
		expect(
			isNoOpExecutorTranscript([
				{
					role: "tool",
					name: "bash",
					content: "done",
				},
			]),
		).toBe(false);
	});

	test("returns true for prose-only transcripts", () => {
		expect(
			isNoOpExecutorTranscript([
				{
					role: "assistant",
					content: "I think I understand the problem now.",
				},
			]),
		).toBe(true);
	});

	test("returns true when only checkpoint tool was called", () => {
		expect(
			isNoOpExecutorTranscript([
				{
					role: "assistant",
					content: "Let me checkpoint.",
					tool_calls: [
						{
							id: "tc_1",
							function: { name: "checkpoint", arguments: "{}" },
						},
					],
				},
			]),
		).toBe(true);
	});
});
|
||||
|
||||
describe("assessAutonomousSolverTurn no-op detection", () => {
	test("continue_with_no_op_executor_messages_returns_missing_checkpoint_retry", () => {
		const project = makeProject();
		beginAutonomousSolverIteration(project, "execute-task", "M001/S01/T01");
		appendAutonomousSolverCheckpoint(project, {
			unitType: "execute-task",
			unitId: "M001/S01/T01",
			outcome: "continue",
			summary: "More work remains.",
			completedItems: ["First pass"],
			remainingItems: ["Second pass"],
			verificationEvidence: ["npx vitest run focused.test.mjs"],
			pdd: pdd(),
		});

		const result = assessAutonomousSolverTurn(
			project,
			"execute-task",
			"M001/S01/T01",
			[
				{
					role: "assistant",
					content: "I think I understand the problem now.",
				},
			],
		);
		expect(result.action).toBe("missing-checkpoint-retry");
		expect(result.reason).toBe("solver-noop-continue");
	});

	test("continue_with_real_work_executor_messages_returns_continue", () => {
		const project = makeProject();
		beginAutonomousSolverIteration(project, "execute-task", "M001/S01/T01");
		appendAutonomousSolverCheckpoint(project, {
			unitType: "execute-task",
			unitId: "M001/S01/T01",
			outcome: "continue",
			summary: "More work remains.",
			completedItems: ["First pass"],
			remainingItems: ["Second pass"],
			verificationEvidence: ["npx vitest run focused.test.mjs"],
			pdd: pdd(),
		});

		const result = assessAutonomousSolverTurn(
			project,
			"execute-task",
			"M001/S01/T01",
			[
				{
					role: "assistant",
					content: "I'll edit the file now.",
					tool_calls: [
						{
							id: "tc_1",
							function: { name: "edit", arguments: "{}" },
						},
					],
				},
			],
		);
		expect(result.action).toBe("continue");
		expect(result.reason).toBe("solver-continue");
	});

	test("no_op_continue_after_max_repairs_returns_pause", () => {
		const project = makeProject();
		beginAutonomousSolverIteration(project, "execute-task", "M001/S01/T01");
		appendAutonomousSolverCheckpoint(project, {
			unitType: "execute-task",
			unitId: "M001/S01/T01",
			outcome: "continue",
			summary: "More work remains.",
			completedItems: ["First pass"],
			remainingItems: ["Second pass"],
			verificationEvidence: ["npx vitest run focused.test.mjs"],
			pdd: pdd(),
		});

		for (let i = 0; i < 4; i++) {
			recordAutonomousSolverMissingCheckpointRetry(
				project,
				"execute-task",
				"M001/S01/T01",
			);
		}

		const result = assessAutonomousSolverTurn(
			project,
			"execute-task",
			"M001/S01/T01",
			[
				{
					role: "assistant",
					content: "I think I understand the problem now.",
				},
			],
		);
		expect(result.action).toBe("pause");
		expect(result.reason).toBe("solver-noop-continue");
		expect(result.repairAttempts).toBe(4);
	});

	test("refusal_transcript_returns_missing_checkpoint_retry_even_with_continue_checkpoint", () => {
		const project = makeProject();
		beginAutonomousSolverIteration(project, "execute-task", "M001/S01/T01");
		appendAutonomousSolverCheckpoint(project, {
			unitType: "execute-task",
			unitId: "M001/S01/T01",
			outcome: "continue",
			summary: "More work remains.",
			completedItems: ["First pass"],
			remainingItems: ["Second pass"],
			verificationEvidence: ["npx vitest run focused.test.mjs"],
			pdd: pdd(),
		});

		const result = assessAutonomousSolverTurn(
			project,
			"execute-task",
			"M001/S01/T01",
			[
				{
					role: "assistant",
					content:
						"I'm sorry, but I currently don't have the necessary tools to assist with that specific request.",
				},
			],
		);
		expect(result.action).toBe("missing-checkpoint-retry");
		expect(result.reason).toBe("solver-noop-continue");
	});
});
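The no-op gate these tests exercise boils down to counting repair attempts for the current iteration and comparing against a budget. A minimal sketch of that decision follows, assuming a budget of 4 (the real value of `DEFAULT_MISSING_CHECKPOINT_REPAIR_ATTEMPTS` is not shown in this diff) and a hypothetical `assessNoOpContinue` helper that takes the attempt log directly instead of reading solver state from disk.

```javascript
// Assumed value for illustration only; the real constant is defined elsewhere.
const MAX_REPAIR_ATTEMPTS = 4;

// Hypothetical helper mirroring the no-op branch of assessAutonomousSolverTurn:
// only attempts recorded for the current iteration count against the budget.
function assessNoOpContinue(state, repairLog) {
	const repairAttempts = repairLog.filter(
		(attempt) => Number(attempt.iteration) === Number(state.iteration),
	).length;
	if (repairAttempts >= MAX_REPAIR_ATTEMPTS) {
		return { action: "pause", reason: "solver-noop-continue", repairAttempts };
	}
	return {
		action: "missing-checkpoint-retry",
		reason: "solver-noop-continue",
		repairAttempt: repairAttempts + 1,
	};
}

const state = { iteration: 3 };

// No prior repairs: retry, and this is attempt number 1.
const first = assessNoOpContinue(state, []);

// Repairs from an earlier iteration do not count against this one.
const stale = assessNoOpContinue(
	state,
	Array.from({ length: 4 }, () => ({ iteration: 2 })),
);

// Budget exhausted for the current iteration: pause instead of retrying.
const exhausted = assessNoOpContinue(
	state,
	Array.from({ length: 4 }, () => ({ iteration: 3 })),
);

console.log(first.action, stale.action, exhausted.action);
```

Scoping the count to the current iteration means a unit that stalled earlier but then made real progress starts each new iteration with a fresh repair budget.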