fix(auto/loop): convergence guard breaks the reassess-roadmap redispatch loop

Dogfood today: autonomous mode burned $4.95 / 33.5M tokens / 28 min /
500 unproductive iterations on reassess-roadmap M006/S01, redispatching
the SAME unit ≥45 consecutive times before runaway-guard finally
fired. Each cycle: unit dispatches → swarm planner completes → unit
exits "success" → next iteration sees the same doctor slice-ref
health issue → re-queues the same unit. The auto-post-unit
auto-remediate path (insertArtifact for ASSESSMENT files) is wired
correctly, but the reassess-roadmap unit's success doesn't actually
resolve the doctor's slice-reference issues, so the gate keeps
firing.

SF already has detectStuck Rule 2 ("Same unit 3+ consecutive times →
stuck") in auto/detect-stuck.js, but the
doctor-health-reassess-roadmap shortcut in auto/loop.js:1095-1170
bypasses normal pre-dispatch and unshifts directly to sidecarQueue, so
the unit never goes through the phases-dispatch path that pushes to
loopState.recentUnits, and detectStuck never sees the repetition.
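
A minimal sketch of why the shortcut hides the repetition. Every name
here is a simplified, hypothetical stand-in (sameUnitRepeated,
dispatchViaPhases, dispatchViaShortcut); the real Rule 2 lives in
auto/detect-stuck.js and may differ in detail. The only assumption
taken from this message is that the phases-dispatch path records units
in loopState.recentUnits while the shortcut does not.

```javascript
// Hypothetical stand-in for detectStuck Rule 2: the last `threshold`
// recorded units all share the same key.
function sameUnitRepeated(recentUnits, threshold = 3) {
  if (recentUnits.length < threshold) return false;
  const tail = recentUnits.slice(-threshold);
  return tail.every((u) => u.key === tail[0].key);
}

const loopState = { recentUnits: [] };
const sidecarQueue = [];

// Normal path: the dispatch is recorded, so repetition is visible.
function dispatchViaPhases(key) {
  loopState.recentUnits.push({ key });
}

// Shortcut path: the unit is queued directly and never recorded.
function dispatchViaShortcut(key) {
  sidecarQueue.unshift({ key });
}

const key = "reassess-roadmap:M006/S01";

// 45 redispatches through the shortcut leave recentUnits empty.
for (let i = 0; i < 45; i++) dispatchViaShortcut(key);
const seenViaShortcut = sameUnitRepeated(loopState.recentUnits); // false

// Three dispatches through the normal path are enough to trip the rule.
for (let i = 0; i < 3; i++) dispatchViaPhases(key);
const seenViaPhases = sameUnitRepeated(loopState.recentUnits); // true
```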

Convergence guard: before unshifting reassess-roadmap, check whether
the SAME (unitType + unitId) just ran 3+ consecutive times in
loopState.recentUnits. If yes:
  - Skip the redispatch (don't unshift, don't finishTurn("retry"))
  - File a self-feedback entry with
    kind=engine-loop:non-converging-redispatch so triage sees the
    pattern and can plan a real fix
  - Fall through to normal runPreDispatch so the existing detectStuck
    machinery can break the loop the next time the same key is derived
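
The check described above can be sketched as a pure function, which
makes it easy to exercise in isolation. shouldSkipRedispatch is a
hypothetical name, not part of the codebase, and the sketch assumes
that entries in recentUnits carry a key of the form unitType:unitId,
as the u?.key lookup in the diff suggests. The "plan-slice" unit type
is made up for illustration.

```javascript
// Sketch only: returns true when the last `threshold` recorded units
// all match the unit that is about to be redispatched.
function shouldSkipRedispatch(recentUnits, unitType, unitId, threshold = 3) {
  const newKey = `${unitType}:${unitId}`;
  const tail = (recentUnits || []).slice(-threshold).map((u) => u?.key);
  // Require a full window so a short history never counts as stuck.
  return tail.length === threshold && tail.every((k) => k === newKey);
}

const recent = [
  { key: "plan-slice:M006/S01" }, // hypothetical earlier unit
  { key: "reassess-roadmap:M006/S01" },
  { key: "reassess-roadmap:M006/S01" },
  { key: "reassess-roadmap:M006/S01" },
];

// Three identical keys at the tail: skip the redispatch.
const skipAfterThree = shouldSkipRedispatch(recent, "reassess-roadmap", "M006/S01"); // true

// Only two repeats so far: keep redispatching.
const skipAfterTwo = shouldSkipRedispatch(recent.slice(0, 3), "reassess-roadmap", "M006/S01"); // false

// Same unit type but a different slice: not the same key.
const skipOtherSlice = shouldSkipRedispatch(recent, "reassess-roadmap", "M006/S02"); // false
```

Requiring a full window of identical keys keeps the guard conservative:
a missing or short history leaves the existing redispatch behavior
unchanged.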

This is the user's "Ralph Wiggum loop" pattern: the system observes
its own failure repeatedly without ever escaping. The broader
convergence-detector / solver-handoff / quarantine framework is filed
for slice planning in sf-mp8x32sy-70w298; this commit is the minimum
surgical fix for the specific reassess-roadmap-via-doctor-shortcut
path that actually fired today.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mikael Hugo 2026-05-17 00:31:23 +02:00
parent 6481e54fec
commit 56e8ec6c53


@@ -1136,31 +1136,74 @@ export async function autoLoop(ctx, pi, s, deps) {
const midTitle = sfState.activeMilestone?.title ?? "";
const sliceId = sfState.activeSlice?.id ?? "reassess";
if (mid) {
-  ctx.ui.notify(
-    `Health issues detected with slice references — queuing reassess-roadmap instead of pausing.`,
-    "warning",
-    {
-      noticeKind: NOTICE_KIND.SYSTEM_NOTICE,
-      dedupe_key: "doctor-health-reassess-roadmap",
-    },
-  );
-  const { buildReassessRoadmapPrompt } = await import(
-    "../auto-prompts.js"
-  );
-  const reassessPrompt = await buildReassessRoadmapPrompt(
-    mid,
-    midTitle,
-    sliceId,
-    s.basePath,
-  );
-  s.sidecarQueue.unshift({
-    kind: "hook",
-    unitType: "reassess-roadmap",
-    unitId: `${mid}/${sliceId}`,
-    prompt: `## Doctor Health Issues\n\n${healthCheck.issues.map((i) => `- ${i}`).join("\n")}\n\n${reassessPrompt}`,
-  });
-  finishTurn("retry");
-  continue;
+  // Convergence guard (Ralph Wiggum): if the SAME
+  // reassess-roadmap target just ran 3+ consecutive
+  // times, the doctor's slice-ref issues evidently
+  // aren't being resolved by reassessment. Skip
+  // the redispatch, file self-feedback, and fall
+  // through to normal pre-dispatch so the existing
+  // detectStuck path (Rule 2) can break the loop
+  // instead of looping forever burning tokens.
+  const newKey = `reassess-roadmap:${mid}/${sliceId}`;
+  const recentKeys = (loopState.recentUnits || [])
+    .slice(-3)
+    .map((u) => u?.key);
+  const stuckOnReassess =
+    recentKeys.length === 3 &&
+    recentKeys.every((k) => k === newKey);
+  if (stuckOnReassess) {
+    ctx.ui.notify(
+      `Convergence guard: ${newKey} succeeded 3 consecutive times but doctor's slice-ref issues persist. Skipping redispatch — running normal pre-dispatch so detectStuck can break the loop.`,
+      "warning",
+      {
+        noticeKind: NOTICE_KIND.SYSTEM_NOTICE,
+        dedupe_key: "convergence-guard-reassess",
+      },
+    );
+    try {
+      recordSelfFeedback(
+        {
+          kind: "engine-loop:non-converging-redispatch",
+          severity: "high",
+          summary: `${newKey} dispatched 3 consecutive times with success exit, but doctor's slice-reference health issues persist. Convergence guard skipped further redispatch.`,
+          evidence: `Doctor health issues persisting after 3 successful reassess-roadmap cycles: ${healthCheck.issues.slice(0, 5).join(" | ")}`,
+        },
+        s.basePath,
+      );
+    } catch {
+      // Filing must never block the loop's recovery path.
+    }
+    // Fall through to normal pre-dispatch (no
+    // unshift, no finishTurn): the next phases
+    // will either advance state via a different
+    // unit or hit detectStuck and bail.
+  } else {
+    ctx.ui.notify(
+      `Health issues detected with slice references — queuing reassess-roadmap instead of pausing.`,
+      "warning",
+      {
+        noticeKind: NOTICE_KIND.SYSTEM_NOTICE,
+        dedupe_key: "doctor-health-reassess-roadmap",
+      },
+    );
+    const { buildReassessRoadmapPrompt } = await import(
+      "../auto-prompts.js"
+    );
+    const reassessPrompt = await buildReassessRoadmapPrompt(
+      mid,
+      midTitle,
+      sliceId,
+      s.basePath,
+    );
+    s.sidecarQueue.unshift({
+      kind: "hook",
+      unitType: "reassess-roadmap",
+      unitId: `${mid}/${sliceId}`,
+      prompt: `## Doctor Health Issues\n\n${healthCheck.issues.map((i) => `- ${i}`).join("\n")}\n\n${reassessPrompt}`,
+    });
+    finishTurn("retry");
+    continue;
+  }
}
}
}