diff --git a/.sf/REQUIREMENTS.md b/.sf/REQUIREMENTS.md
index 4cb6e9a68..3d8e68d3d 100644
--- a/.sf/REQUIREMENTS.md
+++ b/.sf/REQUIREMENTS.md
@@ -502,14 +502,20 @@ The next group enforces ADR-0000's contract: **purpose is the driver**, not work
 | R048 | core-capability | active | M035/S02 | M035/S01, M035/S03, M035/S04 | unmapped |
 | R049 | differentiator | active | unmapped (M036 future) | none | unmapped |
 | R050 | quality-attribute | active | unmapped (M037 future) | none | partial — variant-fallback shipped in benchmark-coverage.js |
+| R051 | failure-visibility | active | M038/S01 | none | unmapped |
+| R052 | failure-visibility | active | M038/S02 | none | unmapped |
+| R053 | failure-visibility | active | M038/S03 | none | unmapped |
+| R054 | failure-visibility | active | M038/S04 | none | unmapped |
+| R055 | differentiator | active | M038/S05 | none | unmapped |
+| R056 | core-capability | active | M038 | M038/S01, M038/S02, M038/S03, M038/S04, M038/S05 | unmapped |
 
 ## Coverage Summary
 
-- Active requirements: 50
-- Mapped to slices: **48**
+- Active requirements: 56
+- Mapped to slices: **54**
 - Validated: 0
 - Unmapped active requirements: **2** (R049 — multi-provider parallel routing; R050 — auto-benchmark uncovered models)
-- Owning milestones: M003 (R001-R006), M005 (R007-R010), M010 (R013-R015, R020), M011 (R011-R012), M012 (R016), M013 (R017), M014 (R018), M015 (R019), M016-M030 (R021-R040), M031 (R041-R044), M032 (R045), M033 (R046), M034 (R047), M035 (R048), [pending] M036-M037 (R049-R050)
+- Owning milestones: M003 (R001-R006), M005 (R007-R010), M010 (R013-R015, R020), M011 (R011-R012), M012 (R016), M013 (R017), M014 (R018), M015 (R019), M016-M030 (R021-R040), M031 (R041-R044), M032 (R045), M033 (R046), M034 (R047), M035 (R048), [pending] M036-M037 (R049-R050), **M038 (R051-R056 — Wiggums Detector family)**
 
 ## Purpose Anchor
 
@@ -647,3 +653,59 @@ ADR-0000 declares SF a **purpose-to-software compiler**. R036–R040 codify that
 - Supporting slices: none
 - Validation: unmapped
 - Notes: Implementation: per-session counter of `dispatch-resolve` decisions keyed by `${unitType}:${unitId}`. When count > 3 in <30min wall AND no measurable state change between dispatches (e.g., milestone status, slice status, artifact set unchanged), trigger the safety net. Doctor surfaces the loop as a project-level issue.
+
+
+### R052 — Zero-Progress Runtime Unit Detector
+- Class: failure-visibility
+- Status: active
+- Description: Watch active .sf/runtime/units/*.json files. When any unit sits at progressCount:0 for >5min wall AND has lastHeartbeatAt within that window (alive but stuck), surface as kind=dispatch:zero-progress-stall self-feedback. This catches the "heartbeating ghost" pattern witnessed multiple times this dogfood session.
+- Why it matters: Heartbeats-alive-but-zero-progress is invisible to the standard runaway-guard because runaway tracks tool-call growth, not unit-state growth. This detector closes that gap.
+- Source: spec (responds to 2026-05-17 dogfood evidence)
+- Primary owning slice: M038/S02
+- Supporting slices: none
+- Validation: unmapped
+- Notes: 5-minute threshold tunable per unit type (research-slice naturally takes longer than complete-milestone).
+
+### R053 — Repeated Self-Feedback Kind/Target Detector
+- Class: failure-visibility
+- Status: active
+- Description: Group unresolved self-feedback entries by {kind, occurredIn.milestone, occurredIn.slice}. When a group has >5 entries within 24h, the same error is being filed repeatedly without resolution — surface as kind=feedback:repeated-failure and trigger triage escalation. This catches the "56+ runaway-loop:idle-halt entries on M005" pattern that nobody acted on for days.
+- Why it matters: The self-feedback queue is a write-only audit if nobody reads it. Detecting clustering signals "this is a real recurring bug, not noise" — and routes the entry to humans/triage with high signal.
+- Source: spec (responds to 2026-05-17 dogfood evidence — 56+ idle-halt entries accumulated)
+- Primary owning slice: M038/S03
+- Supporting slices: none
+- Validation: unmapped
+- Notes: Rollup logic must respect existing maybeRecordRepeatedFailureRollup in doctor.js; this detector extends, not replaces.
+
+### R054 — Artifact Predicate Flap Detector
+- Class: failure-visibility
+- Status: active
+- Description: Watch key artifact predicates (file exists, vision non-empty, slice status, M*-SUMMARY.md presence). When any predicate flaps true/false between consecutive dispatches of the same unit, the unit is undoing its own work — surface as kind=dispatch:artifact-flap.
+- Why it matters: A unit that creates an artifact, the next iteration deletes it, the next iteration recreates it is in a self-destructive loop. Today this is invisible until operator notices via mtimes. The detector makes flap-loops first-class.
+- Source: spec
+- Primary owning slice: M038/S04
+- Supporting slices: none
+- Validation: unmapped
+- Notes: Predicates checked per unit type. Initial set: SUMMARY.md / ASSESSMENT.md / PLAN.md exists, slice status, milestone vision non-empty.
+
+### R055 — Stale-Lock Auto-Recovery
+- Class: differentiator
+- Status: active
+- Description: Detect .sf/sf.lock held by no live PID (the holder PID is dead). Auto-fix: rm the lock + log + file self-feedback kind=lock:stale-recovered. Also detect .sf/runtime/self-feedback-inline-fix.json with dispatchedAt older than 30min and no in-flight dispatcher work — auto-clear. Race-safe: verify the holder is genuinely dead via /proc check before removing.
+- Why it matters: This session repeatedly hit "Another autonomous mode session (PID X) appears to be running. Stop it with kill X" — but PID X was always already dead. Stale-lock auto-recovery removes the manual cleanup chore + prevents crashed sf from blocking the next watchdog cycle.
+- Source: spec (responds to 2026-05-17 dogfood evidence — multiple stale-lock incidents)
+- Primary owning slice: M038/S05
+- Supporting slices: none
+- Validation: unmapped
+- Notes: Watchdog already does this on cycle restart (scripts/sf-autonomous-watchdog.sh); R055 moves the logic into SF core so any sf invocation (not just under watchdog) benefits.
+
+### R056 — Wiggums Detector Periodic Orchestrator
+- Class: core-capability
+- Status: active
+- Description: A single periodic loop in auto-timers.js that runs every 30s and evaluates all 5 Wiggums questions (R051, R052, R053, R054, R055). Detector results aggregate into a wiggums-state.json that the operator dashboard reads. Single orchestrator pattern means: add new detectors by adding a question function; no per-detector lifecycle code.
+- Why it matters: Without a single orchestrator, each detector becomes its own scheduled task with its own lifecycle bugs (the watchdog uses the simple "loop forever" pattern but inside SF that's not idiomatic). One orchestrator = one place where ALL stuck-pattern detection lives. Future Wiggums questions (R057+) just plug in.
+- Source: spec
+- Primary owning slice: M038 (cross-cuts all S01-S05)
+- Supporting slices: M038/S01, M038/S02, M038/S03, M038/S04, M038/S05
+- Validation: unmapped
+- Notes: Pairs with the dashboard surface (R022, R026). Wiggums state is part of the autonomous-loop status snapshot.
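
Reviewer sketch (not part of the patch): the "add a question function" pattern from R056 could look roughly like the following, with the R053 grouping rule plugged in as one question. Every name here (`createWiggumsOrchestrator`, `repeatedFeedbackQuestion`, `startWiggumsLoop`, and the `ctx` fields `now`, `feedback`, `filedAt`, `resolved`) is hypothetical and chosen for illustration; none of it reflects the actual contents of auto-timers.js or doctor.js, and writing wiggums-state.json is left to the caller.

```javascript
// Hypothetical sketch of the R056 single-orchestrator pattern.
// All names and the ctx shape are assumptions, not SF's real API.

// questions: { [requirementId]: (ctx) => finding | null }
function createWiggumsOrchestrator(questions) {
  function evaluate(ctx) {
    const findings = [];
    const errors = [];
    for (const [id, ask] of Object.entries(questions)) {
      try {
        const finding = ask(ctx);
        if (finding) findings.push({ id, ...finding });
      } catch (err) {
        // One failing detector must not prevent the others from running.
        errors.push({ id, message: String(err && err.message ? err.message : err) });
      }
    }
    // The caller could serialize this object as wiggums-state.json.
    return { evaluatedAt: ctx.now, findings, errors };
  }
  return { evaluate };
}

// R053-style question: more than 5 unresolved entries sharing
// {kind, milestone, slice} within the last 24h.
function repeatedFeedbackQuestion(ctx) {
  const windowMs = 24 * 60 * 60 * 1000;
  const counts = new Map();
  for (const entry of ctx.feedback) {
    if (entry.resolved || ctx.now - entry.filedAt > windowMs) continue;
    const key = [entry.kind, entry.occurredIn.milestone, entry.occurredIn.slice].join("|");
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  const groups = [...counts]
    .filter(([, count]) => count > 5)
    .map(([key, count]) => ({ key, count }));
  return groups.length > 0 ? { kind: "feedback:repeated-failure", groups } : null;
}

// Periodic driver: one timer for all questions (30s in R056).
// Returns a function that stops the loop.
function startWiggumsLoop(orchestrator, getCtx, onState, intervalMs = 30000) {
  const timer = setInterval(() => onState(orchestrator.evaluate(getCtx())), intervalMs);
  return () => clearInterval(timer);
}
```

Under these assumptions a future R057+ detector would be one more entry in the `questions` object, and the try/catch around each question keeps a bug in one detector from silencing the rest of the cycle.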