spec(R051): same-unit dispatch-loop detection (Ralph Wiggum safety net)

When the dispatcher resolves the same unit more than N=3 times in a
session without a state change between dispatches, detect the loop,
pause, and file a self-feedback entry. Targets the 2026-05-17 dogfood
pattern where reassess-roadmap M010/S03 ran 70+ times because of the
ASSESSMENT suffix mismatch (now fixed in a737af318).

Even with the immediate fix in place, this safety net keeps future
variants of the same failure mode, caused by bugs nobody knows about
yet, from burning hours of compute. R051 makes the failure detectable
as a first-class condition instead of leaving it to operator
hand-debugging.

Owning milestone: M038 (future).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Parent: a737af318d · Commit: e470939723
1 file changed with 11 additions and 0 deletions
@@ -636,3 +636,14 @@ ADR-0000 declares SF a **purpose-to-software compiler**. R036–R040 codify that
- Supporting slices: none
- Validation: unmapped
- Notes: Two paths to scores: (a) bulk-import published scores from MMLU/HumanEval/SWE-bench for known models, (b) live-measure unknown models via SF's eval suite (the existing `.sf/evals/autonomous-solver/` framework). Doctor surfaces uncovered models; the scheduler treats an uncovered model as "use cautiously, not for high-stakes units."
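
The scheduling rule in the note above can be pictured with a small sketch. It assumes a score map filled by either path (published benchmark import or live eval), treats any model missing from that map as uncovered, drops uncovered models for high-stakes units, and, as an added assumption, ranks them last for everything else. `ModelScore`, `WorkUnit`, `eligibleModels`, and `uncoveredModels` are hypothetical names, not SF's API, and the 0..1 score scale is also an assumption.

```typescript
// Hypothetical sketch of the R050 scheduling rule; none of these names are SF's API.

// Which of the two paths produced a score.
type ScoreSource = "published" | "live-eval";

interface ModelScore {
  model: string;
  score: number; // assumed normalized to 0..1
  source: ScoreSource;
}

interface WorkUnit {
  id: string;
  highStakes: boolean;
}

// Models with no score from either path are "uncovered"; Doctor would list these.
function uncoveredModels(models: string[], scores: Map<string, ModelScore>): string[] {
  return models.filter((m) => !scores.has(m));
}

// Candidate models for a unit, best score first. Uncovered models are dropped
// for high-stakes units and (an assumption) ranked last otherwise.
function eligibleModels(models: string[], scores: Map<string, ModelScore>, unit: WorkUnit): string[] {
  const rank = (m: string): number => scores.get(m)?.score ?? -1;
  return models
    .filter((m) => !unit.highStakes || scores.has(m))
    .sort((a, b) => rank(b) - rank(a));
}
```

Ranking uncovered models last, rather than excluding them everywhere, is one reading of "use cautiously"; a stricter scheduler could also cap how many low-stakes units an uncovered model may take.
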
### R051 — Same-Unit Dispatch-Loop Detection (Ralph Wiggum Safety Net)
- Class: failure-visibility
- Status: active
- Description: When the dispatcher resolves the same `{unitType, unitId}` more than N=3 times in a single autonomous session without that unit's outcome changing the dispatcher's decision, SF must detect the loop, pause the autoLoop, file a self-feedback entry of kind `dispatch:degenerate-loop` with the unit details, and surface the loop to the operator. Default response: stop dispatching that unit and advance to the next-best alternative.
- Why it matters: In the 2026-05-17 dogfood run, SF dispatched `reassess-roadmap M010/S03` 70+ times because `checkNeedsReassessment` had a suffix-mismatch bug (`ASSESS` vs `ASSESSMENT.md`). Each dispatch took 2-7 minutes, so 70+ dispatches added up to roughly 2.3-8 hours or more of compute burned on the same no-op task. The Ralph-Wiggum-obvious failure mode ("I keep doing the same thing and nothing changes") needs a first-class detector instead of operator hand-debugging.
- Source: spec (responds to dogfood evidence 2026-05-17)
- Primary owning slice: unmapped (future "M038 Dispatch Loop Safety")
- Supporting slices: none
- Validation: unmapped
- Notes: Implementation: keep a per-session counter of `dispatch-resolve` decisions keyed by `${unitType}:${unitId}`. When the count exceeds 3 within a 30-minute wall-clock window AND there is no measurable state change between dispatches (e.g., milestone status, slice status, and artifact set all unchanged), trigger the safety net. Doctor surfaces the loop as a project-level issue.
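
The detection and response rules in the Description and Notes above can be sketched as follows. This is a minimal illustration under stated assumptions, not SF's implementation: `DispatchLoopDetector`, `UnitState`, `LoopFeedback`, and `nextDispatchable` are hypothetical names; the state fingerprint covers only the three examples the Notes list (milestone status, slice status, artifact set); and the caller is assumed to pause the autoLoop and surface the returned entry to the operator and Doctor.

```typescript
// Minimal sketch of R051's same-unit dispatch-loop detector.
// All names here are hypothetical illustrations, not SF's actual API.

interface UnitState {
  milestoneStatus: string;
  sliceStatus: string;
  artifacts: string[]; // artifact set observed for the unit
}

interface UnitRef {
  unitType: string;
  unitId: string;
}

interface LoopFeedback {
  kind: "dispatch:degenerate-loop";
  unitType: string;
  unitId: string;
  dispatchCount: number;
  spanMs: number; // wall-clock time from first to latest counted dispatch
}

interface DispatchRecord {
  atMs: number;
  fingerprint: string;
}

const MAX_REPEATS = 3; // trips when the count exceeds 3
const WINDOW_MS = 30 * 60 * 1000; // 30-minute wall-clock window

// Order-insensitive fingerprint of the state the safety net compares.
function fingerprint(s: UnitState): string {
  return JSON.stringify([s.milestoneStatus, s.sliceStatus, [...s.artifacts].sort()]);
}

class DispatchLoopDetector {
  private history = new Map<string, DispatchRecord[]>();
  private blocked = new Set<string>();

  // Records one dispatch-resolve decision. Returns a self-feedback entry when
  // the unit has been resolved more than MAX_REPEATS times inside the window
  // with no state change; the caller is expected to pause and notify the operator.
  record(unit: UnitRef, state: UnitState, nowMs: number): LoopFeedback | null {
    const key = `${unit.unitType}:${unit.unitId}`;
    const fp = fingerprint(state);
    let runs = this.history.get(key) ?? [];
    const last = runs[runs.length - 1];
    if (last !== undefined && last.fingerprint !== fp) {
      runs = []; // state changed since the previous dispatch: reset the count
    }
    runs = runs.filter((r) => nowMs - r.atMs < WINDOW_MS); // drop dispatches outside the window
    runs.push({ atMs: nowMs, fingerprint: fp });
    this.history.set(key, runs);
    if (runs.length <= MAX_REPEATS) return null;
    this.blocked.add(key); // default response: stop dispatching this unit
    const firstAt = runs[0]?.atMs ?? nowMs;
    return {
      kind: "dispatch:degenerate-loop",
      unitType: unit.unitType,
      unitId: unit.unitId,
      dispatchCount: runs.length,
      spanMs: nowMs - firstAt,
    };
  }

  // Skips blocked units and returns the next-best candidate,
  // assuming `candidates` is already sorted best-first.
  nextDispatchable(candidates: UnitRef[]): UnitRef | undefined {
    return candidates.find((c) => !this.blocked.has(`${c.unitType}:${c.unitId}`));
  }
}
```

Any change in the fingerprint restarts the count, so a unit that makes progress on each dispatch never trips. One consequence of this choice: a unit whose state oscillates between two values (A, B, A, B) also never trips; catching that would need a different comparison, such as counting repeats of any previously seen fingerprint.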
|