- sf-db-schema.js: auto_vacuum INCREMENTAL → NONE. The "Bad ptr map entry"
corruption on 2026-05-17 was incremental-autovacuum ptrmap drift under
concurrent writers. Recovered DB has no ptrmap; future fresh DBs must
match. incremental_vacuum() callers in sf-db-core.js become no-ops.
- bin/sf-from-source: lock allowlist extended to skip readonly sf headless
subcommands (--help, query, status, usage, reflect, feedback list,
triage --list/--json). Previously every sf headless invocation tried
to acquire the project lock, so the operator couldn't even inspect SF
state while autonomous mode was running.
- self-feedback.js triageBlockedEntries: (1) treat empty/null/undefined
sfVersion as unknown, not zero; (2) exempt operator-direction kinds
(improvement-idea, architecture-defect, missing-feature, gap) from
auto-version-bump close. Both were needed to prevent the R124 incident
recurring.
- headless-feedback.ts handleAdd: populate sfVersion via getCurrentSfVersion
+ detect repoIdentity via isForgeRepo, not hardcoded "external"/"". An
empty sfVersion sorts below any real semver, so the resolver retry-closed
every operator-filed entry within seconds.
Net effect: R124 proposal (filed via sf headless feedback add) is no
longer auto-resolved as version-stale. Larger architectural fix (single-
writer SF daemon / RPC for all DB writes — M040 territory) tracked as
follow-up R-entry.
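The two triage guards can be sketched as follows; this is a minimal sketch assuming an entry shape of { sfVersion, kind } and a local semver comparator, neither of which is the real self-feedback.js code:

```javascript
// Kinds exempt from auto-version-bump close (list from the commit text).
const OPERATOR_DIRECTION_KINDS = new Set([
  "improvement-idea", "architecture-defect", "missing-feature", "gap",
]);

function shouldAutoCloseAsVersionStale(entry, currentSfVersion) {
  // (1) empty/null/undefined sfVersion means "unknown", never "version zero"
  if (!entry.sfVersion) return false;
  // (2) operator-direction kinds are never retry-closed on version bumps
  if (OPERATOR_DIRECTION_KINDS.has(entry.kind)) return false;
  return compareSemver(entry.sfVersion, currentSfVersion) < 0;
}

// Plain numeric major.minor.patch comparison for the sketch.
function compareSemver(a, b) {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < 3; i++) {
    if ((pa[i] || 0) !== (pb[i] || 0)) return (pa[i] || 0) - (pb[i] || 0);
  }
  return 0;
}
```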
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- uok/auto-runaway-guard.js: invoke runDetectorSweep alongside the existing
zero-progress check (fire-and-forget for sync-tick compatibility; results
consumed on next tick via sweepState ring buffer). Passes unitId,
unitMetrics, sessionFingerprint, lockPaths, and a 30-min DB-windowed
recentFeedback slice.
- detectors/{same-unit-loop, zero-progress, repeated-feedback-kind,
artifact-flap, stale-lock, periodic-runner}.js: each detector now also
exports a UokGate wrapper (id/type/execute -> GateResult per ADR-0075).
Plain detector functions kept for existing consumers.
- detectors/index.js: single import surface for the gate exports.
- detector-stale-lock.test.mjs (9), detector-periodic-runner.test.mjs (10),
detector-gates-contract.test.mjs: fills the R055/R056 test gap filed
earlier today + proves UokGate contract conformance.
- 41/41 detector tests green; copy-resources clean.
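The UokGate wrapping pattern can be sketched like this; the GateResult field names are assumptions here (the real contract is in ADR-0075), and detectStaleLock below is a toy stand-in, not the shipped detector:

```javascript
// Adapt a plain detector function into the gate shape (id/type/execute).
function makeUokGate(id, detectorFn) {
  return {
    id,
    type: "detector",
    async execute(ctx) {
      const finding = await detectorFn(ctx);
      // No finding → gate passes; a finding → gate fails with detail attached.
      return finding
        ? { pass: false, reason: finding.reason, detail: finding }
        : { pass: true };
    },
  };
}

// Toy detector: plain function kept for existing consumers.
async function detectStaleLock(ctx) {
  return ctx.lockHolderAlive ? null : { reason: "stale lock from dead PID" };
}

const staleLockGate = makeUokGate("stale-lock", detectStaleLock);
```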
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous commit accidentally tracked .sf/meta-status.json +
.sf/meta-supervisor.pid (transient runtime files written by
scripts/sf-meta-supervisor.mjs each tick). Mirror the existing
.sf/runtime/ ignore pattern for these top-level meta-* files; the
daemon keeps writing them on disk but git no longer tracks the churn.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- scripts/sf-meta-supervisor.mjs: pure-node daemon supervising
scripts/sf-autonomous-watchdog.sh. Tick=60s, restarts watchdog if dead,
emits .sf/meta-status.json, halt via .sf/meta-supervisor.halt. Uses
only node builtins (no SF dist deps) so it survives dist breakage.
- src/headless.ts: R091 — gate the per-cycle handleTriage call on a time
interval (SF_TRIAGE_INTERVAL_MS, default 30 min) and bump batch size
(SF_TRIAGE_MAX, default 25, was 5). Drops the ~8min triage hit from
every cycle while letting daily drain capacity rise.
- .sf/REQUIREMENTS.md: R091 (triage sidecar) + R092 (PDD-completeness
as routing signal) + R093 (pin model per orchestration agent.yaml) +
R094 (swarm-role model tier specialization — 8 roles already exist
in uok/swarm-roles.js; model field per role missing).
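The R091 interval gate on handleTriage can be sketched as follows; only the env name and the 30-minute default come from the commit, the state handling is an assumption:

```javascript
// Default 30 min unless SF_TRIAGE_INTERVAL_MS overrides it.
const TRIAGE_INTERVAL_MS =
  Number(process.env.SF_TRIAGE_INTERVAL_MS) || 30 * 60 * 1000;

let lastTriageAt = -Infinity;

// Called once per cycle; returns true only when the interval has elapsed,
// so triage no longer runs (and costs ~8 min) on every cycle.
function shouldRunTriage(now = Date.now()) {
  if (now - lastTriageAt < TRIAGE_INTERVAL_MS) return false;
  lastTriageAt = now;
  return true;
}
```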
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Operator: "should we have a for adrs and d for decisions? any other
type we should habe?" + "and so we use" — yes, file it and adopt.
Adds A (ADR), D (Decision), F (Finding), G (Gate), K (Knowledge),
P (Pattern), E (Evidence) prefixes to the existing R/M/S/T set.
Each gets a source-of-truth location and a mechanical migration path.
R048 (unbroken purpose chain) + R047 (per-R fulfillment validation)
both require typed cross-references to verify integrity. Without
typed IDs, "this M is covered by R, S, T, A, D" is unverifiable
free-text.
Owning milestone M041 (also added) splits the migration into 6
slices: rename ADRs, add D-IDs to DECISIONS, backfill F/G/K-IDs in
DB tables, doctor cross-link integrity check, lint for SF-authored
typed references.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Operator: "db via bus so we dont crash it?" — yes, this is the natural
fit for the lane model (R057 system lane + R046 multi-unit lanes).
Concurrent lanes writing directly to SQLite hit SQLITE_BUSY or worse;
mediating all writes through a single MessageBus-owned db-writer actor
enforces the single-writer invariant operationally.
Reads stay direct (multi-reader WAL is safe). Migration is gradual
(table by table). Also serves as the substrate for multi-repo
federation (R028 / ADR-019/020) where cross-process DB sharing needs
message-based access anyway.
Future M040.
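The single-writer invariant the db-writer actor enforces can be sketched with a serial promise chain; the class name and db.run signature are illustrative, not the M040 design:

```javascript
// All writes funnel through one promise chain, so however many lanes send
// write messages concurrently, the DB sees them strictly one at a time.
class DbWriterActor {
  constructor(db) {
    this.db = db;
    this.queue = Promise.resolve();
  }
  write(sql, params) {
    const next = this.queue.then(() => this.db.run(sql, params));
    this.queue = next.catch(() => {}); // keep the chain alive after a failure
    return next;
  }
}
```

Reads would stay direct against the WAL database; only writes take this path.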
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Operator coined "system lane" — better than my "side-track". Frames
the architecture cleanly. The lane primitive unifies:
- R046 (multi-unit lanes) parallel slice dispatch
- R049 (per-lane model routing) different LLM per lane
- R057 (system lane) non-unit work alongside unit lane
Today autoLoop is 1 unit lane. System lane runs alongside for memory
consolidation, triage drain, doctor audits, log compaction, reflection
assembly, catalog refresh — all currently queued between units.
Single-writer DB met by sf-db.js serial queue.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
legacyPermissionLevelForProfile had a switch with cases for
restricted/trusted/unrestricted only, no case for "normal" (the
DEFAULT autonomous session profile per auto/session.js:377). "normal"
fell through to default → "low" — too restrictive for autonomous work.
Witnessed M010/S04/T01: solver note "TypeScript compilation and git
diff blocked by low permission level" — SF couldn't verify its own
deliverable because permissions were locked down despite running in
autonomous mode.
Fix:
- "normal" → "medium" (allows tsc, git, npm test)
- default → "medium" (was "low"); unknown profiles shouldn't cripple
autonomous executors. Operators wanting strict mode set
profile: "restricted" explicitly.
Per operator intent 2026-05-17: "SF should have permission even if
it can limit its agents and only allow orchestrator or whatever."
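The post-fix switch can be sketched like this; only the "normal" and default cases come from the commit, the levels returned for restricted/trusted/unrestricted are illustrative assumptions:

```javascript
function legacyPermissionLevelForProfile(profile) {
  switch (profile) {
    case "restricted":
      return "low"; // explicit strict mode (assumed mapping)
    case "trusted":
      return "medium"; // assumed mapping
    case "unrestricted":
      return "high"; // assumed mapping
    case "normal":
      return "medium"; // default autonomous profile: allows tsc, git, npm test
    default:
      return "medium"; // was "low"; unknown profiles no longer cripple executors
  }
}
```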
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two guards added after today's 2-hour crash-loop on missing
DEFAULT_STALE_TIMEOUT_MS export:
1. Pre-flight smoke test: `sf --version` must succeed before each
cycle. If dist is broken (missing export, syntax error), pause
5min + log loudly instead of immediately respawning into the same
crash.
2. Crash-loop detection: 3 consecutive <90s failure exits → assume
crash-loop, back off 5min before retry. Prevents the
"100 crashes in 2 hours, 0 useful work" pattern we just hit.
Together: a broken dist causes ONE crash + a 5min pause, not a
2-hour CPU burn. The operator notices the pause in .sf/watchdog.log
and intervenes; in the meantime no resources are wasted.
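The crash-loop heuristic lives in the bash watchdog; sketched in JS for clarity (thresholds from the commit, function names illustrative):

```javascript
const FAST_EXIT_MS = 90_000;        // exits under 90s count as failures
const BACKOFF_MS = 5 * 60_000;      // back off 5 min once looping

// Returns the backoff to apply after each watchdog cycle exit.
function crashLoopGuard() {
  let consecutiveFastExits = 0;
  return function onExit(runDurationMs) {
    if (runDurationMs < FAST_EXIT_MS) {
      consecutiveFastExits += 1;
    } else {
      consecutiveFastExits = 0; // a healthy long run resets the counter
    }
    // 3 consecutive fast failures → assume crash-loop
    return consecutiveFastExits >= 3 ? BACKOFF_MS : 0;
  };
}
```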
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When the halt-watchdog detects stuck state, the autoLoop was logging
"halt-watchdog-break" every iteration but otherwise tight-spinning
through dispatch-resolve at ~2s/iteration. 2026-05-17 dogfood logged
60+ such events in a 30s window — pure CPU burn while the actual
stuck condition stayed stuck.
Fix: exponential backoff (1s → 2s → 4s → 8s → 16s → capped at 30s)
based on how many halt thresholds have elapsed. Heartbeat() resets
when real progress resumes (existing behavior). Backoff costs nothing
when the loop is healthy.
One of the 14 Ralph-Wiggum patterns surfaced this session.
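The backoff schedule above reduces to a one-liner; the function name and the elapsed-thresholds input are illustrative:

```javascript
// 1s → 2s → 4s → 8s → 16s → capped at 30s, keyed on how many halt
// thresholds have elapsed. Costs nothing when the loop is healthy
// because heartbeat() resets the counter on real progress.
function haltBackoffMs(haltThresholdsElapsed) {
  const base = 1000 * 2 ** Math.max(0, haltThresholdsElapsed - 1);
  return Math.min(base, 30_000);
}
```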
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two sites told operator to "kill PID X" without checking X was alive:
- interrupted-session.js:formatInterruptedSessionRunningMessage
- auto.js autonomous-start blocking notification
Both report stale locks from crashed prior sessions as if a live session
exists, confusing operator and blocking restart. Session-lock.js already
has auto-recovery for stale-PID locks; these two surfaces just needed
matching liveness checks to label dead-PID locks correctly.
Now: dead-PID → "Stale lock from dead PID X — will be auto-recovered"
alive-PID → original "kill X" message
Catches one of the 14 Ralph-Wiggum-obvious patterns surfaced this
session. Reduces operator confusion + dovetails with R055 (M038/S05)
when stale-lock auto-recovery becomes a core-loop detector.
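The liveness check both surfaces needed is the standard `process.kill(pid, 0)` idiom; the message strings below paraphrase the commit:

```javascript
function describeLock(pid) {
  try {
    process.kill(pid, 0); // signal 0: existence check only, no signal sent
    return `Another session (PID ${pid}) is running; kill ${pid} if it is stuck`;
  } catch {
    // ESRCH etc.: the holder is gone, so the lock is stale
    return `Stale lock from dead PID ${pid} — will be auto-recovered`;
  }
}
```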
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The autonomous loop currently lacks baseline "this is dumb-obvious
stuck" detection. This session alone surfaced 14 such patterns that
required operator grep-archaeology to identify. M038 centralizes a
single Wiggums-Detector orchestrator (R056) that runs 5 detector
questions every 30s:
R051 — same-unit dispatched >3 times with no state change
R052 — runtime/units progressCount:0 for >5min (heartbeating ghost)
R053 — >5 self-feedback entries of same kind/target in 24h
R054 — artifact predicates flapping between dispatches
R055 — stale .sf/sf.lock from dead holder + stale inline-fix marker
Each detector pauses + files actionable self-feedback. Trivial cases
auto-fix (e.g. stale-lock rm). New detectors (R057+) plug into the
orchestrator without per-detector lifecycle code.
Anchors to ADR-0000 (purpose-to-software requires self-healing). Builds
on the recurring patterns evidenced 2026-05-17:
- 70+ degenerate reassess iterations on M010/S03 (R051)
- 56+ runaway-loop:idle-halt entries accumulated on M005 (R053)
- Multiple stale-lock incidents requiring manual rm (R055)
56 R-entries total, 54/56 mapped (R049/R050 still future M036-M037).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When dispatcher resolves the same unit N>3 times in a session without
state-change between dispatches, detect the loop, pause, file
self-feedback. Targets the 2026-05-17 dogfood pattern where
reassess-roadmap M010/S03 ran 70+ times because of the ASSESSMENT
suffix mismatch (now fixed in a737af318).
Even after the immediate fix, this safety net prevents future
unknown-bug versions of the same failure mode from burning hours of
compute. R051 makes the failure detectable as a first-class event
instead of leaving it to operator hand-debugging.
Owning milestone M038 (future).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
checkNeedsReassessment looked up the slice's assessment file with
suffix='ASSESS', but actual files are 'S03-ASSESSMENT.md'. The
resolveFile pattern requires at least one char before the suffix
(/^S03-.*-ASSESS\.md$/), so 'S03-ASSESSMENT.md' never matched and
the helper returned {sliceId} on every poll → dispatcher kept
firing reassess-roadmap forever.
Fix: try 'ASSESSMENT' first, fall back to legacy 'ASSESS'. Now
S03-ASSESSMENT.md properly satisfies the "already reassessed" check
and the dispatcher advances to the next slice (S04).
Verified: resolveSliceFile('M010','S03','ASSESSMENT') returns the
real path; with the fallback, this resolves on first call. The
70+ degenerate reassess iterations on M010/S03 (witnessed
2026-05-17) won't recur.
Ralph Wiggum approved. (per operator: "sf should clear these stuck
itself ralph wiggums would fix")
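The fallback lookup can be sketched as below; the `(.*-)?` pattern shape is an assumption about how resolveSliceFile matches (the commit only gives the failing /^S03-.*-ASSESS\.md$/ form and the observed fix behavior):

```javascript
// Try 'ASSESSMENT' first, fall back to legacy 'ASSESS'. The optional
// middle group lets a bare "S03-ASSESSMENT.md" match, which the old
// mandatory ".*-" prefix never allowed.
function findAssessmentFile(files, sliceId) {
  for (const suffix of ["ASSESSMENT", "ASSESS"]) {
    const re = new RegExp(`^${sliceId}-(.*-)?${suffix}\\.md$`);
    const hit = files.find((f) => re.test(f));
    if (hit) return hit;
  }
  return null;
}
```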
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Detected via supervisory check 2026-05-17: SF stuck in degenerate reassess-
roadmap loop on M010/S03 (5 iterations in 8min, all returning
outcome=continue). Root cause: synthesized-checkpoint in runUnitViaSwarm
only treats the generic `checkpoint` tool as a completion signal — but
units routinely complete via their unit-specific tool (reassess_roadmap
with verdict=roadmap-confirmed, validate_milestone, complete_milestone,
complete_slice, save_summary). The LLM correctly emitted the unit's
specific completion tool + assistant text "<turn_status>complete</turn_status>",
but workerSignaledOutcome stayed null → synthesized checkpoint fell back
to continue → solver re-iterated.
Fix: recognize UNIT_COMPLETION_TOOLS = {reassess_roadmap,
validate_milestone, complete_milestone, complete_slice, save_summary}
as implicit "complete" signals. The check fires when those tools are
called and an earlier explicit checkpoint hasn't already said
"complete" or "blocked".
This resolves sf-mp94lth4-ew26om and should prevent future
degenerate-iteration loops on reassess-roadmap and milestone completion
units. 13/13 existing M010 tests still pass.
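The implicit-completion check can be sketched as follows; the set contents come from the commit, the synthesis function around it is assumed:

```javascript
const UNIT_COMPLETION_TOOLS = new Set([
  "reassess_roadmap", "validate_milestone", "complete_milestone",
  "complete_slice", "save_summary",
]);

// An explicit checkpoint verdict always wins; otherwise a unit-specific
// completion tool call counts as an implicit "complete".
function synthesizeOutcome(explicitCheckpoint, toolCalls) {
  if (explicitCheckpoint === "complete" || explicitCheckpoint === "blocked") {
    return explicitCheckpoint;
  }
  const implicitlyComplete = toolCalls.some((t) => UNIT_COMPLETION_TOOLS.has(t));
  return implicitlyComplete ? "complete" : "continue";
}
```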
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Now that M010/S01+S02+S03 ship the inline-dispatch path (runUnitInline +
DispatchLayer + autoLoop wiring), the watchdog enables it on every
cycle so the autonomous loop actually exercises the inline scope for
INLINE_ELIGIBLE_UNITS (validate-milestone, complete-milestone,
reassess-roadmap). Other unit types continue to use the swarm path
unchanged.
This dogfoods M010/S03 in every watchdog cycle. If the inline path
regresses, the autonomous solver will surface it via self-feedback
(R015 spawn-failure loud-failure + agent-runner instrumentation).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User caught: flash-lite ≠ flash (different model tier, different scores).
Previous fix counted flash-lite as fully covered via flash proxy, which
overstated coverage and could mislead routing.
benchmarkLookupVariants now tags variants with kind:
- 'exact' → date/version strip + -latest alias (same model line)
- 'approx' → tier strip (flash-lite→flash, X-lite→X) — different model
computeBenchmarkCoverage promotes 'exact' matches to covered; 'approx'
matches stay in uncovered with `approximatedBy` field so operators see
when a real benchmark is still needed.
Honest report: 64 exact covered / 1 proxy-only / 104 genuine uncovered
(was 65/0/104 with the overcount).
R049 + R050 added to traceability (M036/M037 future milestones).
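The exact/approx tagging can be sketched like this; the regexes are illustrative simplifications of the real variant generation:

```javascript
function benchmarkLookupVariants(modelId) {
  const variants = [];
  const dateStripped = modelId.replace(/-\d{4}$/, "");
  if (dateStripped !== modelId) {
    // date/version suffix strip: same model line → exact
    variants.push({ key: dateStripped, kind: "exact" });
  } else {
    // bare name: try the provider's -latest alias → exact
    variants.push({ key: `${modelId}-latest`, kind: "exact" });
  }
  const tierStripped = modelId.replace(/-lite$/, "");
  if (tierStripped !== modelId) {
    // tier strip: a DIFFERENT model, so only an approximation
    variants.push({ key: tierStripped, kind: "approx" });
  }
  return variants;
}
```

Coverage promotion then keys off `kind`: only 'exact' matches count as covered, 'approx' matches stay uncovered with `approximatedBy` recorded.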
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
benchmark-coverage.js: new benchmarkLookupVariants() returns ordered
fallback keys for a model id, and computeBenchmarkCoverage tries each
variant before flagging uncovered. Patterns covered:
- date/version suffix strip ("mistral-medium-2505" → "mistral-medium")
- tier strip ("X-flash-lite" → "X-flash", "Y-lite" → "Y")
- "-latest" append for bare names ("mistral-medium" → "mistral-medium-latest")
The audit reports the matched variant via `matchedVia` so operators can
see when fallback applied (vs adding a real entry).
Verified: coverage 62/169 (37%) → 65/169 (38.4%). Sample fallback matches:
google-gemini-cli/gemini-2.5-flash-lite → gemini-2.5-flash
mistral/mistral-medium → mistral-medium-latest
mistral/magistral-small-2509 → magistral-small
R050 now active: full closure requires auto-benchmark of remaining
104 uncovered models via bulk-import of published scores or live eval.
This step shrinks the gap via cheap structural fallback; future work
adds the real scoring loop.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
run-unit.js: new tryInlineDispatch helper routes inline-eligible unit
types through the M010/S02 DispatchLayer when env SF_INLINE_DISPATCH=1.
Safe by default: without the env var, or for non-eligible unit types
(any unit not in INLINE_ELIGIBLE_UNITS), behavior is byte-identical
to before. With the env var set on validate-milestone / complete-milestone
/ reassess-roadmap, the autoLoop reaches runUnitInline → runSubagent
in-process, no spawn.
The helper translates DispatchLayer's {ok, output, exitCode, stderr}
into the UnitResult shape that autoLoop's resolveAgentEnd/finalize
chain expects, so downstream handling works unchanged.
13/13 M010 tests still pass. M010/S03 marked complete.
R049 added: Multi-Provider Parallel Routing — different concurrent units
route to different LLM providers based on quota/specialty/cost/failover.
Builds on R046 + R017 + model-router scoring. Future M036.
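The shape translation can be sketched as below; the DispatchLayer fields are those named in the commit, while the UnitResult fields beyond them are assumptions:

```javascript
// Translate DispatchLayer's result into the shape autoLoop's
// resolveAgentEnd/finalize chain expects downstream.
function toUnitResult(dispatchResult) {
  const { ok, output, exitCode, stderr } = dispatchResult;
  return {
    status: ok ? "complete" : "failed",
    output,
    exitCode,
    error: ok ? null : stderr || `exit ${exitCode}`,
  };
}
```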
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
REQUIREMENTS.md: traceability table now has 48/48 R-entries mapped to
owning milestone slices (previously 40 of the 48 were unmapped). M031 owns R041-R044
(R-to-Milestone bootstrap with deep research), M032 owns R045 (R-auto-
expansion), M033 owns R046 (autonomous loop parallel dispatch), M034
owns R047 (per-R fulfillment validation), M035 owns R048 (unbroken
purpose chain).
scripts/sf-autonomous-watchdog.sh: also clears .sf/runtime/autonomous-
solver/active.json on cycle restart. Without this, a unit in
status:running from a crashed prior run made the autoLoop spin in
halt-watchdog-break (witnessed in this session: iteration 239+ in 8min
without unit progress).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
scripts/sf-autonomous-watchdog.sh — bash daemon that supervises
`sf headless autonomous` across crashes/timeouts. Per-cycle:
1. Cleans stale state (lock + zombie inline-fix dispatch)
2. Kills orphan sf processes from prior runs
3. Launches sf with 30-min hard timeout (longest sf accepts cleanly)
4. On exit (timeout / dispatch-stop / crash), logs and restarts after
15s cooldown (10min cooldown if all milestones complete)
Run: nohup bash scripts/sf-autonomous-watchdog.sh > .sf/watchdog.log 2>&1 &
Stop: pkill -f sf-autonomous-watchdog
This is the operational mode for the 2-4 week delivery horizon — SF
runs continuously, the watchdog catches all exit conditions, and
progress accumulates across many autonomous cycles.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two SF processes writing to the same .sf/sf.db over WAL caused torn
pages and "database disk image is malformed" corruption (observed
2026-05-17 in dogfood-5 — the project DB ended up with B-tree
pointer-map desync at page 69, requiring a backup restore). The
session-lock in src/resources/extensions/sf/session-lock.js exists
but is only acquired from auto-start.js when autonomous mode starts.
Interactive sf or pre-autonomous-start work did not take it, so a
second sf could open the same DB and contend.
Promote the lock to the shell wrapper so EVERY sf invocation in a
write-capable mode acquires a project-level flock on .sf/sf.lock
BEFORE node is launched. Read-only commands (logs, status, dash,
sessions, list, --version, --help) skip the lock to keep concurrent
read use-cases working. SF_SKIP_LOCK=1 escape hatch for tests that
intentionally exercise concurrent paths.
On collision the wrapper prints the current lock holder (pid + args
+ cwd + started timestamp) so the operator can identify the
conflicting session, then exits with 75 (EX_TEMPFAIL). The lock is
released automatically when the wrapper bash exits — no stale-lock
recovery needed since flock is kernel-owned and dies with the fd.
The fd opens in read+write mode (`<>`) WITHOUT truncating so the
collision branch can still cat the existing holder; truncation
happens only after flock succeeds, preventing two racers from
clobbering each other's metadata.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit 38994d7a2 added a custom bm25-only Sift warmup at session_start.
After investigating, code-intelligence.js already has ensureSiftIndexWarmup
which runs the full hybrid + vector + reranker warmup as a properly-
daemonized process (PPID=1 after init-reparent, 1-hour hard cap, state
tracked in .sf/runtime/sift-index-warmup.json with status/artifactCount/
cacheBytes fields). The existing function is wired to auto-start.js,
init-wizard.js, guided-flow.js, and auto/loop.js — but NOT to plain
session_start. A pure interactive `sf` session (no /autonomous, no init
wizard) was previously getting no warmup at all.
Replace the bm25-only spawn with a call to ensureSiftIndexWarmup so
session_start now gets the same full hybrid+vector treatment the other
entry points already use. Drop sift-prewarm.js — the wrapper is no
longer needed.
User's "we need vector reindex" intent (today): now satisfied at every
SF entry point, not just autonomous/wizard/flow.
The broader "always-on out-of-session daemon + file-watcher incremental
re-warm + bus integration" piece is still tracked in
sf-mp8z9otl-iaqrn2 (missing-feature:sift-persistent-index-daemon) for
slice planning.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sift (~/.cargo/bin/sift) builds its index lazily on first `sift
search` per cache key. In an SF session, the first real Sift query
typically happens deep inside an execute-task unit when an agent
reaches for the search-tool — and that agent pays the full cold-
build cost (tens of seconds on a large repo). Subsequent queries
hit warm cache and are fast.
Hook session_start to fire a cheap detached `sift search` against
the project root. The actual index build runs in parallel with the
rest of session_start (other catalog refreshes, doctor fix, etc.)
and is ready by the time any agent invokes search-tool. Cheapest
possible warmup: bm25-only retriever, no reranking, limit 1 — just
enough to trigger the index build pipeline.
Fully fire-and-forget: failures are swallowed (sift missing, spawn
error, non-zero exit all just resolve(false)); SF carries on as
before.
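The swallow-everything spawn contract can be sketched generically; the sift invocation itself is not reproduced here, command and args are caller-supplied:

```javascript
import { spawn } from "node:child_process";

// Detached, stdio-ignored, unref'd: the child never blocks the session,
// and every failure mode (missing binary, spawn error) resolves false
// instead of throwing.
function fireAndForget(cmd, args, cwd) {
  return new Promise((resolve) => {
    try {
      const child = spawn(cmd, args, { cwd, detached: true, stdio: "ignore" });
      child.once("spawn", () => resolve(true));
      child.once("error", () => resolve(false)); // e.g. binary missing
      child.unref(); // don't keep the parent event loop alive for it
    } catch {
      resolve(false);
    }
  });
}
```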
Also lands the .sf/preferences.yaml git section requested in the
same session: solo-mode defaults (auto_push=true, isolation=none,
merge_strategy=squash) so the autonomous loop doesn't pause for
operator confirmation on commit/push.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1. cooldown failover (sf-mp8w9cg9-arixq7, high)
When a provider hits AUTH_COOLDOWN in unit execution, block the
failing model with an expiry using the existing blockModel() API,
then try a non-cooldowned provider via isProviderRequestReady.
Only stops if every provider is unavailable, with an enumerated
message showing which ones are down. loop.js consecutiveCooldowns
is not touched here (it tracks the loop-level retry budget for
provider-not-ready errors that bypass phases-unit; the cooldown
path in loop.js is separate and handles errors thrown before
runUnitPhase, while this fix handles cancellation returned from
runUnitPhase due to provider error during session creation).
2. redundant reassess-roadmap on completed slices (sf-mp8wa4qr-xw8fjb, medium)
Doctor-triggered reassess path (loop.js P4-A) now checks whether
the target slice already has an ASSESSMENT file before queuing
reassess-roadmap. Mirrors the guard already present in the
normal dispatch path (checkNeedsReassessment).
3. empty structured fields in slice summary (sf-mp8w6s88-ckv4yr, low)
Added explicit instruction in complete-slice.md prompt template
directing the executor to derive key_files, key_decisions, and
patterns_established from task summaries before calling
complete_slice.
Bootstrap drains the triage queue once at session_start (headless.ts:
647 "[headless] autonomous: draining self-feedback triage queue
first..."). Entries filed DURING the autonomous run previously sat
until the next sf restart — defeating the self-heal thesis for
long-running sessions like the 3-day dogfood the user is running now.
dispatchSelfFeedbackInlineFixIfNeeded already exists in the extension
(self-feedback-drain.js:277) and is wired into bootstrap/register-
hooks at session_start. It selects high/critical candidates, debounces
via a claim file (so concurrent invocations skip), and on the headless
surface spawns a child `sf headless triage --apply` fire-and-forget —
the autonomous loop continues unblocked while triage runs in a child.
Hook it into the auto-loop top-of-iteration so it fires every
MID_LOOP_TRIAGE_INTERVAL=5 iterations. The dispatcher's own claim-file
debounce prevents re-dispatch of in-flight entries; pre-bootstrap-
drained entries get re-evaluated only when something new shows up.
Also ignores scripts/tmp-check-test-imports in biome — the check-
test-imports.test.mjs self-test creates regression fixtures there and
they triggered formatter errors on dirty exits.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The check-test-imports drift guard was emitting too many false positives
to be safely integrated into npm run lint (per CLAUDE.md: "NOT integrated
into npm run lint by default — too broad"). Two big classes of FP:
1) TypeScript keywords + utility types treated as undeclared (any, type,
ReturnType, Partial, Record, never, unknown, etc.) — added to the
JS_KEYWORDS set since the script doesn't otherwise distinguish JS
from TS.
2) Identifiers declared locally in the file (function declarations,
const/let/var declarations, destructured patterns, function/arrow
parameters, catch params, class names, type/interface/enum names) —
added a new collectLocalDeclarations() pass that regex-scans these
patterns and feeds the results into the filter chain.
After this patch the script no longer flags makeMockTUI / loader / tui
(local lets), `ReturnType<...>` (TS utility), or `any` (TS keyword) on
the canonical TUI test files. It still flags type-only imports
(`import type { Foo }` lines) and object-literal property names
(`{ recursive: true }`) — those remain as known FP classes documented
in the file's header for a future TS-parser-based pass.
Self-test 5/5 passes. Not yet integrating into npm run lint pending
further FP reduction; see filed self-feedback for the broader
integration plan.
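The collectLocalDeclarations pass can be sketched with a few of its regexes; the real script scans more declaration forms than shown here:

```javascript
function collectLocalDeclarations(source) {
  const names = new Set();
  const patterns = [
    /\bfunction\s+([A-Za-z_$][\w$]*)/g,          // function declarations
    /\b(?:const|let|var)\s+([A-Za-z_$][\w$]*)/g, // simple declarations
    /\bclass\s+([A-Za-z_$][\w$]*)/g,             // class names
    /\bcatch\s*\(\s*([A-Za-z_$][\w$]*)/g,        // catch params
  ];
  for (const re of patterns) {
    for (const m of source.matchAll(re)) names.add(m[1]);
  }
  return names; // fed into the filter chain before flagging undeclared ids
}
```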
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Dogfood today: autonomous mode burned $4.95 / 33.5M tokens / 28 min /
500 unproductive iterations on reassess-roadmap M006/S01 redispatching
the SAME unit ≥45 consecutive times before runaway-guard finally
fired. Each cycle: unit dispatches → swarm planner completes → unit
exits "success" → next iteration sees the same doctor slice-ref
health issue → re-queues the same unit. The auto-post-unit
auto-remediate path (insertArtifact for ASSESSMENT files) is wired
correctly but the reassess-roadmap unit's success doesn't actually
resolve the doctor's slice-reference issues — so the gate keeps
firing.
SF already has detectStuck Rule 2 ("Same unit 3+ consecutive times →
stuck") in auto/detect-stuck.js, but the doctor-health-reassess-
roadmap shortcut in auto/loop.js:1095-1170 bypasses normal pre-dispatch
and unshifts directly to sidecarQueue — so the unit never goes through
the phases-dispatch path that pushes to loopState.recentUnits, and
detectStuck never sees the repetition.
Convergence guard: before unshifting reassess-roadmap, check whether
the SAME (unitType + unitId) just ran 3+ consecutive times in
loopState.recentUnits. If yes:
- Skip the redispatch (don't unshift, don't finishTurn("retry"))
- File a self-feedback entry kind=engine-loop:non-converging-
redispatch so triage sees the pattern and can plan a real fix
- Fall through to normal runPreDispatch so the existing detectStuck
machinery can break the loop the next time the same key derives.
This is the user's "Ralph Wiggum loop" pattern — system observing its
own failure repeatedly without ever escaping. The broader convergence-
detector / solver-handoff / quarantine framework is filed for slice
planning in sf-mp8x32sy-70w298; this commit is the minimum surgical
fix for the specific reassess-roadmap-via-doctor-shortcut path that
actually fired today.
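The convergence-guard predicate can be sketched as follows; the recentUnits entry shape ({ unitType, unitId }) is an assumption from the description:

```javascript
// True when the last n entries in recentUnits are all the same
// (unitType, unitId) key, i.e. the shortcut is about to redispatch a
// unit that just ran 3+ consecutive times without converging.
function justRanConsecutively(recentUnits, unitType, unitId, n = 3) {
  const key = `${unitType}:${unitId}`;
  const tail = recentUnits.slice(-n);
  return tail.length === n &&
    tail.every((u) => `${u.unitType}:${u.unitId}` === key);
}
```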
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The earlier collectInspectData read .sf/sf-db.json, a JSON projection
file SF stopped generating after the DB-first runtime landed.
.sf/sf-db.json no longer exists in any modern repo (verified absent
in this checkout), so /api/inspect was returning an empty payload
every time.
Replace with a read-only node:sqlite query against the live database:
- schemaVersion via MAX(version) FROM schema_version
- counts from COUNT(*) FROM {decisions,requirements,artifacts}
- recentDecisions ordered by decisions.seq DESC LIMIT 5
- recentRequirements ordered by requirements.id DESC LIMIT 5
The DB is opened readOnly so the autonomous loop's writer lock isn't
contested, and any failure (corrupt / locked / schema-drift) returns
an empty payload instead of 500-ing so the operator endpoint stays
available.
This is the small surgical half of the broader web-sf-information-
drift gap: web has no API surfaces for self-feedback, memories,
reflection reports, or uok_messages bus state. That broader integration
work is filed as a separate self-feedback entry for slice planning.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The bash wrapper bin/sf-from-source exports SF_SUBAGENT_VIA_SWARM=1
to make the swarm/messagebus path the default for subagent dispatch.
That covers every sf launch via the wrapper but does NOT cover the
web-launched sf — src/web/cli-entry.ts:resolveSfCliEntry spawns sf by
calling process.execPath (node) directly with src/loader.ts or
dist/loader.js, bypassing the wrapper entirely. So /tmp/sf-web-
onboarding-runtime-* sf processes were still falling through to the
direct-runSubagent subprocess path.
Flip the default in code instead: swarm runs unless
SF_SUBAGENT_VIA_SWARM is explicitly set to "0" or "false". Now every
sf launch — wrapper, web, dev-cli, packaged-standalone — picks up the
same default. The wrapper's export line is now redundant but harmless;
keeping it as defense-in-depth (documents the intent at the wrapper
layer too).
Test update: subagent-via-swarm.test.mjs's "unset → subprocess"
assertion is updated to "=0 → subprocess" — the unset case now means
swarm-by-default. All 13 tests in that file pass. The other tests in
the file that explicitly set the flag to "1"/"true" are unaffected.
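The flipped default reduces to this parse; the helper name is illustrative:

```javascript
// Swarm is the default everywhere (wrapper, web, dev-cli, packaged):
// only an explicit "0" or "false" opts back into the subprocess path.
function subagentViaSwarm(env = process.env) {
  const v = env.SF_SUBAGENT_VIA_SWARM;
  return !(v === "0" || v === "false");
}
```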
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bumps version across the workspace (root + 10 @singularity-forge/*
packages) and lands the pending dependency refresh that had been
sitting uncommitted:
@anthropic-ai/sdk 0.95.1 → 0.96.0
@anthropic-ai/vertex-sdk 0.14.4 → 0.16.0
@google/genai 2.0 → 2.3
@logtape/{file,logtape,pretty,redaction} 2.0.7 → 2.0.9
@smithy/node-http-handler 4.7.0 → 4.7.3
@clack/prompts 1.3 → 1.4
@types/mime-types 2.1 → 3.0
Inter-package refs in packages/{daemon,ai}/package.json bumped to
^2.75.4 so the workspace stays self-consistent. package-lock.json
regenerated via `npm install --package-lock-only --legacy-peer-deps`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The phaseWatchdog at 10s fired "STUCK phase=session.prompt" on every
healthy LLM call longer than 10 seconds. Verified via strace on the
running dogfood sf: bytes were actively flowing on the TLS socket
(fd 29) to the LLM provider while STUCK was being logged — the
session.prompt was never actually stuck; the watchdog was purely
diagnostic and oblivious to stream activity.
The noOutputTimeoutMs watchdog (set to 60s for triage in commit
d80060fec) is the actual kill mechanism. It is already event-aware:
every meaningful subagent event resets the timer via armNoOutputTimer
+ isMeaningfulSubagentOutputEvent. The 10s STUCK warning was added
in commit 67e5ac9db as investigation infrastructure for the
sf-mp8e02m1-zpk903 family of bugs, but now it is just noise that
makes legitimate 30-200s LLM responses look broken.
Keeps the 10s STUCK watchdog for the three setup phases
(resourceLoader.reload, createAgentSession, bindExtensions) where
10s of silence is a real hang signal — those phases normally run in
sub-second.
Also includes:
- biome.json: bump $schema URL from 2.4.14 to 2.4.15 to match the
current biome CLI (clears the deserialize warning)
- scripts/check-test-imports.{,test.}mjs: format + drop a useless
regex escape that biome flagged in landed code
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sets SF_SUBAGENT_VIA_SWARM=1 by default in the wrapper so all sf
launches route subagent calls through runSingleAgentViaSwarm (uok
message-bus / uok_messages table) instead of spawning a child sf
process via runSubagent. Operators can opt out with
SF_SUBAGENT_VIA_SWARM=0 (or =false) in env.
Leaves the runSingleAgent code default (opt-in) unchanged so the
existing tests/subagent-via-swarm.test.mjs "unset → subprocess"
assertion keeps holding. The flip lives at the wrapper layer where
every interactive/headless sf launch picks it up but tests and
direct dev-cli launches stay on documented opt-in semantics.
Note: this is Layer 1 of the inline-execution path. Layer 2 (full
in-process unit dispatch via runUnitInline) is tracked separately
in REQUIREMENTS.md R013/R014 and is not addressed here.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AC1: Document convention in CLAUDE.md — test files over-importing (>5)
from a SF module should use namespace imports to avoid the anti-pattern
where a new describe() block uses an undeclared function (ReferenceError
at vitest run-time, not caught by biome lint).
AC3/AC4: add check-test-imports.mjs — static analysis script that scans
all *.test.{js,mjs,ts} files for itemized imports (≥6) + camelCase
identifier not in the import list. Exposes the failure mode at lint time.
Includes regression test (check-test-imports.test.mjs, 5/5 passing).
Closes sf-mp8ujgry-aoqcx0.
Extend R009 builder ordering safety tests to 6 builders:
- buildPlanSlicePrompt: verifies inlined context and roadmap
- buildRefineSlicePrompt: verifies inlined context and slice-context
- buildExecuteTaskPrompt: verifies task plan inlining and templates
- buildReactiveExecutePrompt: verifies ready task list and templates
- buildCompleteMilestonePrompt: verifies inlined context and roadmap
- buildGateEvaluatePrompt: verifies slice plan context and gates
Note: buildWorkflowPreferencesPrompt and buildReactiveExecutePrompt do not use
{{inlinedContext}} — they use {{inlinedTemplates}} or bespoke template wiring.
Tests assert on the actual template markers these builders produce.
Format-only normalization of files landed in 7d57115a6 — multi-line
object literals and import groupings to match the project's biome
config. No semantic changes (test still passes 4/4).
Also reformats auto-prompts.js whitespace touched by the same pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>