spec(R051): same-unit dispatch-loop detection (Ralph Wiggum safety net)

When the dispatcher resolves the same unit more than N=3 times in a
session without a state change between dispatches, detect the loop,
pause, and file self-feedback. Targets the 2026-05-17 dogfood pattern
where reassess-roadmap M010/S03 ran 70+ times because of the ASSESSMENT
suffix mismatch (now fixed in a737af318).

Even with that bug fixed, this safety net keeps future, still-unknown
variants of the same failure mode from burning hours of compute. R051
makes the failure a first-class, detectable condition instead of
something the operator has to debug by hand.

Owning milestone M038 (future).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mikael Hugo 2026-05-17 04:12:55 +02:00
parent a737af318d
commit e470939723


@@ -636,3 +636,14 @@ ADR-0000 declares SF a **purpose-to-software compiler**. R036–R040 codify that
- Supporting slices: none
- Validation: unmapped
- Notes: Two paths to scores: (a) bulk-import published scores from MMLU/HumanEval/SWE-bench for known models, (b) live-measure via SF's eval suite for unknown models (existing `.sf/evals/autonomous-solver/` framework). Doctor surfaces uncovered models; scheduler treats uncovered as "use cautiously, not for high-stakes units."
### R051 — Same-Unit Dispatch-Loop Detection (Ralph Wiggum Safety Net)
- Class: failure-visibility
- Status: active
- Description: When the dispatcher resolves the same `{unitType, unitId}` more than N=3 times in a single autonomous session without that unit's outcome changing the dispatcher's decision, SF must detect the loop, pause the autoLoop, file a self-feedback entry of kind `dispatch:degenerate-loop` with the unit details, and surface the loop to the operator. Default: stop dispatching that unit; advance to the next-best alternative.
- Why it matters: 2026-05-17 dogfood: SF dispatched `reassess-roadmap M010/S03` 70+ times because `checkNeedsReassessment` had a suffix-mismatch bug (`ASSESS` vs `ASSESSMENT.md`). Each dispatch took 2-7 minutes, so the loop burned hours of compute on the same no-op task. The Ralph-Wiggum-obvious failure mode ("I keep doing the same thing and nothing changes") needs a first-class detector rather than operator hand-debugging.
- Source: spec (responds to dogfood evidence 2026-05-17)
- Primary owning slice: unmapped (future "M038 Dispatch Loop Safety")
- Supporting slices: none
- Validation: unmapped
- Notes: Implementation: per-session counter of `dispatch-resolve` decisions keyed by `${unitType}:${unitId}`. When the count exceeds 3 within 30 minutes of wall-clock time AND there is no measurable state change between dispatches (e.g., milestone status, slice status, and artifact set are all unchanged), trigger the safety net. Doctor surfaces the loop as a project-level issue.
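
The counter described in the R051 notes could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not SF's actual implementation: `DispatchLoopDetector`, `UnitRef`, `record`, and the string "fingerprint" standing in for milestone status, slice status, and artifact set are all hypothetical names, and splitting `reassess-roadmap M010/S03` into `unitType` and `unitId` is a guess at how the unit is keyed.

```typescript
// Hypothetical sketch of the R051 counter; none of these names come from SF itself.
type UnitRef = { unitType: string; unitId: string };

interface LoopEntry {
  count: number;       // consecutive dispatches seen with an unchanged fingerprint
  firstSeenMs: number; // wall-clock time of the first dispatch in the current run
  fingerprint: string; // assumed serialized snapshot of milestone/slice/artifact state
}

const MAX_REPEATS = 3;            // N from R051
const WINDOW_MS = 30 * 60 * 1000; // 30-minute wall-clock window from the notes

class DispatchLoopDetector {
  private entries = new Map<string, LoopEntry>();

  // Record one dispatch-resolve decision. Returns true when the same unit has
  // been resolved more than MAX_REPEATS times inside the window with no state change.
  record(unit: UnitRef, fingerprint: string, nowMs: number = Date.now()): boolean {
    const key = `${unit.unitType}:${unit.unitId}`;
    const prev = this.entries.get(key);
    const stale = prev !== undefined && nowMs - prev.firstSeenMs >= WINDOW_MS;
    if (prev === undefined || stale || prev.fingerprint !== fingerprint) {
      // First dispatch, expired window, or real progress: start a fresh run.
      this.entries.set(key, { count: 1, firstSeenMs: nowMs, fingerprint });
      return false;
    }
    prev.count += 1;
    return prev.count > MAX_REPEATS;
  }
}

// Usage: four dispatches of the same unit, two minutes apart, with identical state.
const detector = new DispatchLoopDetector();
const unit: UnitRef = { unitType: "reassess-roadmap", unitId: "M010/S03" };
const state = JSON.stringify({ milestone: "active", slice: "pending", artifacts: [] });
const results = [0, 1, 2, 3].map((i) => detector.record(unit, state, i * 120000));
console.log(results); // [ false, false, false, true ]
```

Resetting the run whenever the fingerprint changes means a unit that is legitimately re-dispatched after making progress never trips the detector. What the caller does on `true` (pause the autoLoop, file the `dispatch:degenerate-loop` entry, advance to the next-best unit) is left out of the sketch.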