fix: verify artifacts on disk before bailing on dispatch loop limit

The loop detection in dispatchNextUnit stops auto-mode when a unit has
been dispatched MAX_UNIT_DISPATCHES (3) times. Previously, only
execute-task had reconciliation logic to check whether the artifact
actually exists on disk before bailing. All other unit types
(complete-slice, plan-slice, research-slice, etc.) would immediately
stop — even if the Nth attempt successfully produced the artifact.

This is a race between the dispatch counter and disk verification:
the counter increments at dispatch time, but artifact verification
only runs during closeout of the NEXT unit. If the last allowed
attempt succeeds, the counter is already at the limit when the next
dispatch tries to run, and nobody checks disk state.

Reproduction scenario:
1. complete-slice dispatched 3 times (LLM missed writing UAT on
   attempts 1-2, succeeded on attempt 3)
2. Attempt 3 produces both SUMMARY and UAT — auto-committed to disk
3. Dispatch 4 fires: prevCount (3) >= MAX_UNIT_DISPATCHES (3)
4. No disk check for complete-slice → pipeline stops with
   'Expected artifact not found' despite artifacts existing
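
The scenario above can be modeled in a few lines. This is a minimal
sketch, not the real dispatchNextUnit: only MAX_UNIT_DISPATCHES and the
unit-type names come from this commit. The unit id "S01", the
in-memory "disk", and dispatchBeforeFix are hypothetical stand-ins.

```typescript
// Hypothetical model of the pre-fix limit check (illustration only).
const MAX_UNIT_DISPATCHES = 3;

type Outcome = "dispatched" | "advanced" | "stopped";

// Simulated disk: keys of units whose expected artifacts exist.
const artifactsOnDisk = new Set<string>();
// Per-unit dispatch counter, incremented at dispatch time.
const unitDispatchCount = new Map<string, number>();

function dispatchBeforeFix(unitType: string, unitId: string): Outcome {
  const key = `${unitType}/${unitId}`;
  const prevCount = unitDispatchCount.get(key) ?? 0;
  if (prevCount >= MAX_UNIT_DISPATCHES) {
    // Before the fix, only execute-task consulted the disk here.
    if (unitType === "execute-task" && artifactsOnDisk.has(key)) {
      unitDispatchCount.delete(key);
      return "advanced";
    }
    return "stopped";
  }
  unitDispatchCount.set(key, prevCount + 1);
  return "dispatched";
}

// Attempts 1-2 miss the artifact; attempt 3 writes it to disk.
const outcomes: Outcome[] = [];
for (let attempt = 1; attempt <= 4; attempt++) {
  outcomes.push(dispatchBeforeFix("complete-slice", "S01"));
  if (attempt === 3) artifactsOnDisk.add("complete-slice/S01");
}
console.log(outcomes.join(", "));
```

The fourth call stops even though the artifact is present, because the
counter check runs before anything looks at the disk.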

Fix: add a general verifyExpectedArtifact() check after the
execute-task-specific reconciliation and before the final bail-out.
If artifacts exist on disk, clear the counter and advance. If not,
stop with the same error as before, so behavior is unchanged for
genuinely stuck units.
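
The intended ordering can be sketched as follows. This is a simplified
model under stated assumptions, not the code in this commit:
checkLoopLimit, the artifactExists callback, and the unit keys are
hypothetical, and the execute-task special case is omitted.

```typescript
// Hypothetical model of the post-fix limit check (illustration only).
const MAX_UNIT_DISPATCHES = 3;

type Outcome = "dispatched" | "advanced" | "stopped";

function checkLoopLimit(
  key: string,
  counts: Map<string, number>,
  artifactExists: (key: string) => boolean,
): Outcome {
  const prevCount = counts.get(key) ?? 0;
  if (prevCount < MAX_UNIT_DISPATCHES) {
    counts.set(key, prevCount + 1);
    return "dispatched";
  }
  // At the limit: verify disk state before giving up.
  if (artifactExists(key)) {
    counts.delete(key); // clear the counter so the pipeline can advance
    return "advanced";
  }
  return "stopped"; // genuinely stuck: same outcome as before the fix
}

// Both units are at the limit; only the first has its artifact on disk.
const counts = new Map<string, number>([
  ["complete-slice/S01", 3],
  ["plan-slice/S02", 3],
]);
const onDisk = new Set<string>(["complete-slice/S01"]);

const recovered = checkLoopLimit("complete-slice/S01", counts, (k) => onDisk.has(k));
const stuck = checkLoopLimit("plan-slice/S02", counts, (k) => onDisk.has(k));
console.log(recovered, stuck);
```

A unit whose artifact exists has its counter cleared and advances; a
unit without one still stops with its counter untouched.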
deseltrus 2026-03-15 10:41:05 +01:00
parent 3bfa444809
commit 271ab39576


@@ -1921,6 +1921,30 @@ async function dispatchNextUnit(
}
}
// General reconciliation: if the last attempt DID produce the expected
// artifact on disk, clear the counter and advance instead of stopping.
// The execute-task path above handles its special case (writing placeholder
// summaries). This catch-all covers complete-slice, plan-slice,
// research-slice, and all other unit types where the Nth attempt at the
// dispatch limit succeeded but the counter check fires before anyone
// verifies disk state. Without this, a successful final attempt is
// indistinguishable from a failed one.
if (verifyExpectedArtifact(unitType, unitId, basePath)) {
  ctx.ui.notify(
    `Loop recovery: ${unitType} ${unitId} — artifact verified after ${prevCount + 1} dispatches. Advancing.`,
    "info",
  );
  // Persist completion so the idempotency check prevents re-dispatch
  // if deriveState keeps returning this unit (see #462).
  persistCompletedKey(basePath, dispatchKey);
  completedKeySet.add(dispatchKey);
  unitDispatchCount.delete(dispatchKey);
  invalidateStateCache();
  await new Promise(r => setImmediate(r));
  await dispatchNextUnit(ctx, pi);
  return;
}
const expected = diagnoseExpectedArtifact(unitType, unitId, basePath);
const remediation = buildLoopRemediationSteps(unitType, unitId, basePath);
await stopAuto(ctx, pi);