fix: verify artifacts on disk before bailing on dispatch loop limit

The loop detection in dispatchNextUnit stops auto-mode when a unit has
been dispatched MAX_UNIT_DISPATCHES (3) times. Previously, only
execute-task had reconciliation logic to check whether the artifact
actually exists on disk before bailing. All other unit types
(complete-slice, plan-slice, research-slice, etc.) would stop
immediately, even if the Nth attempt successfully produced the artifact.

This is a race between the dispatch counter and disk verification: the
counter increments at dispatch time, but artifact verification only runs
during closeout of the NEXT unit. If the last allowed attempt succeeds,
the counter is already at the limit when the next dispatch tries to run,
and nobody checks disk state.

Reproduction scenario:

1. complete-slice dispatched 3 times (LLM missed writing UAT on
   attempts 1-2, succeeded on attempt 3)
2. Attempt 3 produces both SUMMARY and UAT, auto-committed to disk
3. Dispatch 4 fires: prevCount (3) >= MAX_UNIT_DISPATCHES (3)
4. No disk check for complete-slice → pipeline stops with
   'Expected artifact not found' despite artifacts existing

Fix: add a general verifyExpectedArtifact() check after the
execute-task-specific reconciliation and before the final bail-out. If
artifacts exist on disk, clear the counter and advance. If not, the same
error is raised as before, so there is no behavior change for genuinely
stuck units.
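The race described above can be reproduced in a small standalone model. This is a minimal sketch, not the project's code: `guard`, `simulate`, `artifactsOnDisk`, and the unit key `complete-slice/S01` are hypothetical names, and a `Set` stands in for the filesystem check that verifyExpectedArtifact() performs.

```typescript
// Hypothetical, simplified model of the dispatch loop guard (not the real implementation).
const MAX_UNIT_DISPATCHES = 3;

type Outcome = "dispatch" | "advance" | "stop";

// Assumption: a Set of keys stands in for artifacts written to disk.
const artifactsOnDisk = new Set<string>();
const unitDispatchCount = new Map<string, number>();

function guard(key: string, reconcile: boolean): Outcome {
  const prevCount = unitDispatchCount.get(key) ?? 0;
  if (prevCount >= MAX_UNIT_DISPATCHES) {
    // With reconciliation enabled, check "disk" before bailing.
    if (reconcile && artifactsOnDisk.has(key)) {
      unitDispatchCount.delete(key);
      return "advance";
    }
    return "stop";
  }
  // The counter increments at dispatch time, before the attempt's result is known.
  unitDispatchCount.set(key, prevCount + 1);
  return "dispatch";
}

function simulate(reconcile: boolean): Outcome[] {
  unitDispatchCount.clear();
  artifactsOnDisk.clear();
  const key = "complete-slice/S01"; // hypothetical unit key
  const outcomes: Outcome[] = [];
  for (let attempt = 1; attempt <= 4; attempt++) {
    const outcome = guard(key, reconcile);
    outcomes.push(outcome);
    // Attempts 1-2 fail to write the artifact; attempt 3 succeeds.
    if (outcome === "dispatch" && attempt === 3) {
      artifactsOnDisk.add(key);
    }
  }
  return outcomes;
}

const withoutFix = simulate(false); // fourth entry is "stop"
const withFix = simulate(true); // fourth entry is "advance"
console.log(withoutFix, withFix);
```

In this model the fourth call stops the pipeline without the disk check and advances with it, which mirrors steps 3 and 4 of the reproduction scenario.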
parent 3bfa444809
commit 271ab39576
1 changed file with 24 additions and 0 deletions
@@ -1921,6 +1921,30 @@ async function dispatchNextUnit(
         }
       }
 
+      // General reconciliation: if the last attempt DID produce the expected
+      // artifact on disk, clear the counter and advance instead of stopping.
+      // The execute-task path above handles its special case (writing placeholder
+      // summaries). This catch-all covers complete-slice, plan-slice,
+      // research-slice, and all other unit types where the Nth attempt at the
+      // dispatch limit succeeded but the counter check fires before anyone
+      // verifies disk state. Without this, a successful final attempt is
+      // indistinguishable from a failed one.
+      if (verifyExpectedArtifact(unitType, unitId, basePath)) {
+        ctx.ui.notify(
+          `Loop recovery: ${unitType} ${unitId} — artifact verified after ${prevCount + 1} dispatches. Advancing.`,
+          "info",
+        );
+        // Persist completion so the idempotency check prevents re-dispatch
+        // if deriveState keeps returning this unit (see #462).
+        persistCompletedKey(basePath, dispatchKey);
+        completedKeySet.add(dispatchKey);
+        unitDispatchCount.delete(dispatchKey);
+        invalidateStateCache();
+        await new Promise(r => setImmediate(r));
+        await dispatchNextUnit(ctx, pi);
+        return;
+      }
+
       const expected = diagnoseExpectedArtifact(unitType, unitId, basePath);
       const remediation = buildLoopRemediationSteps(unitType, unitId, basePath);
       await stopAuto(ctx, pi);
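The comment in the hunk says that the completion is persisted so that an idempotency check prevents re-dispatch. A minimal sketch of that idea follows. All names here (`recordCompleted`, `isAlreadyCompleted`, `markRecovered`, `persistedKeys`) are hypothetical, and an array stands in for whatever storage persistCompletedKey() actually writes to.

```typescript
// Hypothetical model of the "persist, then skip on re-dispatch" pattern.
const persistedKeys: string[] = []; // assumption: stands in for on-disk storage
const completedKeys = new Set<string>(); // in-memory mirror of completed units

function recordCompleted(key: string): void {
  // Write to "storage" once, then mirror the key in memory.
  if (!persistedKeys.includes(key)) {
    persistedKeys.push(key);
  }
  completedKeys.add(key);
}

function isAlreadyCompleted(key: string): boolean {
  return completedKeys.has(key);
}

function markRecovered(key: string, counts: Map<string, number>): void {
  recordCompleted(key);
  // Clearing the counter means a later, unrelated dispatch of this key
  // does not start from the limit.
  counts.delete(key);
}

const dispatchCounts = new Map<string, number>([["complete-slice/S01", 3]]);
markRecovered("complete-slice/S01", dispatchCounts);
console.log(isAlreadyCompleted("complete-slice/S01"), dispatchCounts.size);
```

Under these assumptions, a state derivation that keeps returning the same unit would be caught by `isAlreadyCompleted` rather than re-entering the dispatch counter path.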