spec: M038 Wiggums Detector family (R051-R056)

The autonomous loop currently lacks baseline "this is dumb-obvious
stuck" detection. This session alone surfaced 14 such patterns that
required operator grep-archaeology to identify. M038 centralizes a
single Wiggums-Detector orchestrator (R056) that runs 5 detector
questions every 30s:

  R051 — same-unit dispatched >3 times with no state change
  R052 — runtime/units progressCount:0 for >5min (heartbeating ghost)
  R053 — >5 self-feedback entries of same kind/target in 24h
  R054 — artifact predicates flapping between dispatches
  R055 — stale .sf/sf.lock from dead holder + stale inline-fix marker

Each detector pauses + files actionable self-feedback. Trivial cases
auto-fix (e.g. stale-lock rm). New detectors (R057+) plug into the
orchestrator without per-detector lifecycle code.

Anchors to ADR-0000 (purpose-to-software requires self-healing). Builds
on the recurring patterns evidenced 2026-05-17:
  - 70+ degenerate reassess iterations on M010/S03 (R051)
  - 56+ runaway-loop:idle-halt entries accumulated on M005 (R053)
  - Multiple stale-lock incidents requiring manual rm (R055)

56 R-entries total, 54/56 mapped (R049/R050 still future M036-M037).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mikael Hugo 2026-05-17 04:16:52 +02:00
parent e470939723
commit 96d03b33bc

@@ -502,14 +502,20 @@ The next group enforces ADR-0000's contract: **purpose is the driver**, not work
| R048 | core-capability | active | M035/S02 | M035/S01, M035/S03, M035/S04 | unmapped |
| R049 | differentiator | active | unmapped (M036 future) | none | unmapped |
| R050 | quality-attribute | active | unmapped (M037 future) | none | partial — variant-fallback shipped in benchmark-coverage.js |
| R051 | failure-visibility | active | M038/S01 | none | unmapped |
| R052 | failure-visibility | active | M038/S02 | none | unmapped |
| R053 | failure-visibility | active | M038/S03 | none | unmapped |
| R054 | failure-visibility | active | M038/S04 | none | unmapped |
| R055 | differentiator | active | M038/S05 | none | unmapped |
| R056 | core-capability | active | M038 | M038/S01, M038/S02, M038/S03, M038/S04, M038/S05 | unmapped |
## Coverage Summary
-- Active requirements: 50
-- Mapped to slices: **48**
+- Active requirements: 56
+- Mapped to slices: **54**
- Validated: 0
- Unmapped active requirements: **2** (R049 — multi-provider parallel routing; R050 — auto-benchmark uncovered models)
-- Owning milestones: M003 (R001-R006), M005 (R007-R010), M010 (R013-R015, R020), M011 (R011-R012), M012 (R016), M013 (R017), M014 (R018), M015 (R019), M016-M030 (R021-R040), M031 (R041-R044), M032 (R045), M033 (R046), M034 (R047), M035 (R048), [pending] M036-M037 (R049-R050)
+- Owning milestones: M003 (R001-R006), M005 (R007-R010), M010 (R013-R015, R020), M011 (R011-R012), M012 (R016), M013 (R017), M014 (R018), M015 (R019), M016-M030 (R021-R040), M031 (R041-R044), M032 (R045), M033 (R046), M034 (R047), M035 (R048), [pending] M036-M037 (R049-R050), **M038 (R051-R056 — Wiggums Detector family)**
## Purpose Anchor
@@ -647,3 +653,59 @@ ADR-0000 declares SF a **purpose-to-software compiler**. R036-R040 codify that
- Supporting slices: none
- Validation: unmapped
- Notes: Implementation: per-session counter of `dispatch-resolve` decisions keyed by `${unitType}:${unitId}`. When count > 3 in <30min wall AND no measurable state change between dispatches (e.g., milestone status, slice status, artifact set unchanged), trigger the safety net. Doctor surfaces the loop as a project-level issue.
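
The counter described in the Notes can be sketched as follows. This is a minimal illustration, not the SF implementation: `createDispatchLoopDetector`, `record`, and the `fingerprint` argument (any stable string summarizing milestone status, slice status, and artifact set) are hypothetical names, and "no state change" is read here as an unbroken run of identical fingerprints.

```javascript
// Hypothetical sketch: none of these names come from the SF codebase.
const WINDOW_MS = 30 * 60 * 1000; // 30-minute wall-clock window from the Notes
const MAX_REPEATS = 3;            // trigger when the run exceeds 3 dispatches

function createDispatchLoopDetector() {
  const history = new Map(); // key -> [{ at, fingerprint }]

  // Record one dispatch-resolve decision and report whether the unit looks stuck.
  return function record(unitType, unitId, fingerprint, now = Date.now()) {
    const key = `${unitType}:${unitId}`;
    // Keep only dispatches that are still inside the window.
    const recent = (history.get(key) || []).filter((e) => now - e.at < WINDOW_MS);
    recent.push({ at: now, fingerprint });
    history.set(key, recent);

    // Count the trailing run of dispatches that share the current fingerprint.
    let run = 0;
    for (let i = recent.length - 1; i >= 0 && recent[i].fingerprint === fingerprint; i--) {
      run++;
    }
    return { key, run, stuck: run > MAX_REPEATS };
  };
}
```

A caller would invoke `record(...)` once per dispatch and trigger the safety net when `stuck` is true; any state change resets the run because the fingerprint no longer matches.
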
### R052 — Zero-Progress Runtime Unit Detector
- Class: failure-visibility
- Status: active
- Description: Watch active .sf/runtime/units/*.json files. When any unit sits at progressCount:0 for more than 5 minutes of wall-clock time AND its lastHeartbeatAt falls within that window (alive but stuck), surface it as kind=dispatch:zero-progress-stall self-feedback. This catches the "heartbeating ghost" pattern witnessed multiple times during this dogfood session.
- Why it matters: A unit that heartbeats but makes zero progress is invisible to the standard runaway-guard, because the runaway-guard tracks tool-call growth, not unit-state growth. This detector closes that gap.
- Source: spec (responds to 2026-05-17 dogfood evidence)
- Primary owning slice: M038/S02
- Supporting slices: none
- Validation: unmapped
- Notes: 5-minute threshold tunable per unit type (research-slice naturally takes longer than complete-milestone).
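
A rough sketch of the check, under stated assumptions: `progressCount` and `lastHeartbeatAt` come from the description above, while `unitType`, `startedAt`, ISO-8601 timestamps, the function name, and the 15-minute research-slice value are hypothetical and only illustrate the per-type tuning mentioned in the Notes.

```javascript
// Hypothetical sketch: field names other than progressCount and
// lastHeartbeatAt are assumptions, as is the ISO-8601 timestamp format.
const DEFAULT_STALL_MS = 5 * 60 * 1000;
const STALL_MS_BY_TYPE = {
  "research-slice": 15 * 60 * 1000, // illustrative value only
};

// units: parsed contents of the runtime unit files.
function findZeroProgressStalls(units, now = Date.now()) {
  return units.filter((u) => {
    const limit = STALL_MS_BY_TYPE[u.unitType] ?? DEFAULT_STALL_MS;
    const age = now - Date.parse(u.startedAt);
    const sinceHeartbeat = now - Date.parse(u.lastHeartbeatAt);
    // Stuck means: no progress, old enough to matter, yet still heartbeating.
    return u.progressCount === 0 && age > limit && sinceHeartbeat <= limit;
  });
}
```

A unit whose heartbeat is also stale is deliberately excluded here, since that is a dead unit rather than a heartbeating ghost.
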
### R053 — Repeated Self-Feedback Kind/Target Detector
- Class: failure-visibility
- Status: active
- Description: Group unresolved self-feedback entries by {kind, occurredIn.milestone, occurredIn.slice}. When a group has >5 entries within 24h, the same error is being filed repeatedly without resolution — surface as kind=feedback:repeated-failure and trigger triage escalation. This catches the "56+ runaway-loop:idle-halt entries on M005" pattern that nobody acted on for days.
- Why it matters: The self-feedback queue is a write-only audit log if nobody reads it. A cluster of identical entries signals "this is a real recurring bug, not noise", and the detector routes that cluster to humans or triage as a high-signal item.
- Source: spec (responds to 2026-05-17 dogfood evidence — 56+ idle-halt entries accumulated)
- Primary owning slice: M038/S03
- Supporting slices: none
- Validation: unmapped
- Notes: Rollup logic must respect existing maybeRecordRepeatedFailureRollup in doctor.js; this detector extends, not replaces.
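
The grouping rule can be sketched as below. `kind`, `occurredIn.milestone`, and `occurredIn.slice` come from the description; the `resolved` flag, the `occurredAt` ISO timestamp, and the function name are assumptions made for illustration, and the sketch does not model the existing rollup in doctor.js.

```javascript
// Hypothetical sketch: `resolved` and `occurredAt` are assumed field names.
const DAY_MS = 24 * 60 * 60 * 1000;
const MAX_ENTRIES = 5; // a group with more than 5 entries is reported

function findRepeatedFailures(entries, now = Date.now()) {
  const counts = new Map();
  for (const e of entries) {
    if (e.resolved) continue;                               // only unresolved entries
    if (now - Date.parse(e.occurredAt) > DAY_MS) continue;  // only the last 24h
    const { milestone = "", slice = "" } = e.occurredIn || {};
    const key = [e.kind, milestone, slice].join("|");
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts]
    .filter(([, count]) => count > MAX_ENTRIES)
    .map(([key, count]) => ({ key, count }));
}
```

Each returned group would then be surfaced as kind=feedback:repeated-failure.
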
### R054 — Artifact Predicate Flap Detector
- Class: failure-visibility
- Status: active
- Description: Watch key artifact predicates (file exists, vision non-empty, slice status, M*-SUMMARY.md presence). When any predicate flaps true/false between consecutive dispatches of the same unit, the unit is undoing its own work — surface as kind=dispatch:artifact-flap.
- Why it matters: A unit that creates an artifact in one iteration, deletes it in the next, and recreates it in the one after is in a self-destructive loop. Today this is invisible until an operator notices it via mtimes. The detector makes flap loops first-class.
- Source: spec
- Primary owning slice: M038/S04
- Supporting slices: none
- Validation: unmapped
- Notes: Predicates checked per unit type. Initial set: SUMMARY.md / ASSESSMENT.md / PLAN.md exists, slice status, milestone vision non-empty.
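
One possible reading of "flap", sketched under assumptions: a predicate flaps when its value changes and then changes back across three consecutive dispatches of the same unit, so that a single false-to-true transition (normal progress) is not reported. The function names and predicate keys are hypothetical.

```javascript
// Hypothetical sketch: names and the three-observation rule are assumptions.
function createFlapDetector() {
  const history = new Map(); // unitKey -> { predicateName: last three values }

  // predicates: e.g. { summaryExists: true, visionNonEmpty: false }
  return function observe(unitKey, predicates) {
    const seen = history.get(unitKey) || {};
    const flapped = [];
    for (const [name, value] of Object.entries(predicates)) {
      const values = [...(seen[name] || []), value].slice(-3);
      seen[name] = values;
      // a, b, a with a !== b: the unit undid its own work and redid it.
      if (values.length === 3 && values[0] === values[2] && values[0] !== values[1]) {
        flapped.push(name);
      }
    }
    history.set(unitKey, seen);
    return flapped;
  };
}
```

Calling `observe` once per dispatch with the evaluated predicates returns the names that would be surfaced as kind=dispatch:artifact-flap.
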
### R055 — Stale-Lock Auto-Recovery
- Class: differentiator
- Status: active
- Description: Detect a .sf/sf.lock whose holder PID is no longer alive. Auto-fix: remove the lock, log the removal, and file self-feedback kind=lock:stale-recovered. Also detect a .sf/runtime/self-feedback-inline-fix.json whose dispatchedAt is older than 30min with no in-flight dispatcher work, and auto-clear it. Race-safe: verify via a /proc check that the holder is genuinely dead before removing anything.
- Why it matters: This session repeatedly hit "Another autonomous mode session (PID X) appears to be running. Stop it with kill X", but PID X was always already dead. Stale-lock auto-recovery removes the manual cleanup chore and prevents a crashed sf from blocking the next watchdog cycle.
- Source: spec (responds to 2026-05-17 dogfood evidence — multiple stale-lock incidents)
- Primary owning slice: M038/S05
- Supporting slices: none
- Validation: unmapped
- Notes: Watchdog already does this on cycle restart (scripts/sf-autonomous-watchdog.sh); R055 moves the logic into SF core so any sf invocation (not just under watchdog) benefits.
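
A minimal sketch of the liveness check and removal. Two things here are assumptions rather than facts about SF: the lock file is taken to be JSON with a numeric `pid` field, and liveness is tested with Node's `process.kill(pid, 0)` instead of the /proc check named in the description (signal 0 probes for existence without delivering a signal). The function names are hypothetical.

```javascript
// Hypothetical sketch: lock format ({ "pid": <number> }) is an assumption.
const fs = require("node:fs");

function isPidAlive(pid) {
  try {
    process.kill(pid, 0); // signal 0: existence check only, nothing is delivered
    return true;
  } catch (err) {
    // EPERM means the process exists but belongs to another user.
    return err.code === "EPERM";
  }
}

function recoverStaleLock(lockPath) {
  let pid;
  try {
    pid = JSON.parse(fs.readFileSync(lockPath, "utf8")).pid;
  } catch {
    return { recovered: false, reason: "missing-or-unreadable" };
  }
  if (!Number.isInteger(pid) || pid <= 0) {
    return { recovered: false, reason: "no-pid" };
  }
  if (isPidAlive(pid)) {
    return { recovered: false, reason: "holder-alive" };
  }
  fs.rmSync(lockPath, { force: true });
  return { recovered: true, pid };
}
```

This narrows but does not eliminate the race between the liveness check and the removal; a production version would likely need an atomic step, for example renaming the lock before deleting it.
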
### R056 — Wiggums Detector Periodic Orchestrator
- Class: core-capability
- Status: active
- Description: A single periodic loop in auto-timers.js that runs every 30s and evaluates all 5 Wiggums questions (R051, R052, R053, R054, R055). Detector results aggregate into a wiggums-state.json that the operator dashboard reads. With a single orchestrator, adding a detector means adding one question function, with no per-detector lifecycle code.
- Why it matters: Without a single orchestrator, each detector becomes its own scheduled task with its own lifecycle bugs (the watchdog uses the simple "loop forever" pattern but inside SF that's not idiomatic). One orchestrator = one place where ALL stuck-pattern detection lives. Future Wiggums questions (R057+) just plug in.
- Source: spec
- Primary owning slice: M038 (cross-cuts all S01-S05)
- Supporting slices: M038/S01, M038/S02, M038/S03, M038/S04, M038/S05
- Validation: unmapped
- Notes: Pairs with the dashboard surface (R022, R026). Wiggums state is part of the autonomous-loop status snapshot.
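
The single-orchestrator pattern might look like the sketch below. Everything here is illustrative: `runWiggumsTick`, `startWiggums`, the shape of the state object, and the convention that each question function returns an array of findings are assumptions, not the contents of auto-timers.js.

```javascript
// Hypothetical sketch: names and state shape are assumptions.
const fs = require("node:fs");
const TICK_MS = 30 * 1000; // 30s period from the description

// detectors: { R051: (now) => [finding, ...], R052: ..., ... }
function runWiggumsTick(detectors, now = Date.now()) {
  const state = { checkedAt: new Date(now).toISOString(), findings: [], errors: [] };
  for (const [id, ask] of Object.entries(detectors)) {
    try {
      for (const finding of ask(now)) {
        state.findings.push({ detector: id, ...finding });
      }
    } catch (err) {
      // One failing question must not silence the others.
      state.errors.push({ detector: id, message: String(err.message || err) });
    }
  }
  return state;
}

function startWiggums(detectors, statePath) {
  const timer = setInterval(() => {
    const state = runWiggumsTick(detectors);
    fs.writeFileSync(statePath, JSON.stringify(state, null, 2));
  }, TICK_MS);
  timer.unref(); // do not keep the process alive just for detection
  return () => clearInterval(timer); // caller uses this to stop the loop
}
```

Under this shape, a future question such as R057 would be one more entry in the `detectors` object, with scheduling, error isolation, and state writing shared by all of them.
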