Commit graph

4679 commits

Author SHA1 Message Date
Mikael Hugo
cc32ab79d9 fix(docs): remove stale hashline-{read,edit}.ts rows post-fold
Hashline read/edit tool wrappers were folded into Edit({match}) and
Read({format}) modes in commit ffdec0fee. The two rows in FILE-SYSTEM-MAP.md
pointed to files that no longer exist. Updated the surviving hashline.ts row
to note its new consumer relationship with Edit/Read.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 18:48:34 +02:00
Mikael Hugo
781a7e7319 chore(safety): narrow autonomous-rollback to flag-flip only (R066 D1)
Remove git-revert authority per operator decision M048-D1. Crash-loop
classifier sees runtime evidence, not commit attribution; reverting on
runtime symptoms risks reverting the wrong commit. On quarantine trigger,
smoke_gate is flipped false to halt ledger writes and a self-feedback entry
(kind: crash-loop-detected, severity: high) is filed with a manual-review
suggestion. Operator retains sole authority to git-revert.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 18:24:00 +02:00
Mikael Hugo
c2f101734f feat: enforce purpose-first adversarial review 2026-05-17 18:15:15 +02:00
Mikael Hugo
acafee06e2 fix: iter-completion-reconciler test uses relative timestamps
Test had fixed literal timestamps (TS_X = "2026-05-17T12:42:05.618Z")
that became stale once the calendar moved past them — the reconciler's
default maxAgeMs (1h, "older drift is operator territory") filtered
them out. By 3h after the original write the test failed: reconciled.length
was 0 because no entry passed the age filter.

Switch to NOW-relative timestamps (5/30/1 min back from Date.now()) so
the fixture always lands inside the default age window regardless of
when the test runs.

Sonnet #13 (tool rename) report flagged this test as failing alongside
the 4 known pre-existing failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 17:49:11 +02:00
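The failure mode described in acafee06e2 can be reproduced with a small age filter. This is a minimal sketch, not the reconciler's actual code: `FixtureEntry`, `filterByAge`, `minutesAgo`, and `buildRelativeFixture` are hypothetical names. Only the 1h default window and the 5/30/1 minute offsets come from the commit message.

```typescript
// Hypothetical shapes; only the 1h default window and the 5/30/1 min offsets
// are taken from the commit message.
interface FixtureEntry {
  id: string;
  ts: string; // ISO-8601 timestamp
}

const ONE_HOUR_MS = 60 * 60 * 1000;

// Keep entries whose age relative to `nowMs` is within `maxAgeMs`.
function filterByAge(
  entries: FixtureEntry[],
  nowMs: number,
  maxAgeMs: number = ONE_HOUR_MS,
): FixtureEntry[] {
  return entries.filter((e) => nowMs - Date.parse(e.ts) <= maxAgeMs);
}

// Produce a timestamp a given number of minutes before `nowMs`.
function minutesAgo(nowMs: number, minutes: number): string {
  return new Date(nowMs - minutes * 60000).toISOString();
}

// Fixture built relative to the clock, so it always lands inside the window.
function buildRelativeFixture(nowMs: number): FixtureEntry[] {
  return [
    { id: "a", ts: minutesAgo(nowMs, 5) },
    { id: "b", ts: minutesAgo(nowMs, 30) },
    { id: "c", ts: minutesAgo(nowMs, 1) },
  ];
}
```

A literal timestamp passes the filter only while the wall clock stays within the window; the relative fixture passes regardless of when the test runs.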
Mikael Hugo
623af869b1 remove: SF voice IVR / ElevenLabs paging — migrated to centralcloud
Per operator-direction 2026-05-17 (R089 — Migrate Voice IVR / ElevenLabs
On-Call Paging Infrastructure out of SF). Migration target landed in
centralcloud monorepo:
  - centralcloud_core/lib/centralcloud_core/voice.ex (TwiML + ElevenLabs)
  - centralcloud_staff/lib/.../controllers/voice_controller.ex (Phoenix)
  - centralcloud_staff/lib/.../controllers/voice_prompt_controller.ex
  - centralcloud_staff/lib/.../router.ex (/twilio scope)

SF removal:
  - web/app/api/voice/route.ts
  - web/app/api/voice/prompt/route.ts
  - web/app/api/voice/ directory
  - src/tests/integration/web-voice-ivr-contract.test.ts

Operator-paging infra was historical drift in SF (per-project compiler);
belongs in centralcloud (org-level ops). R088 (Pre-Removal Test-Import
Safety Gate) not yet built — operator manually verified safety scan:
TWILIO_/ELEVENLABS_ env vars only referenced in the deleted files; no
internal SF callers; centralcloud version verified present.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 17:42:16 +02:00
Mikael Hugo
ffdec0feee fold: hashline_edit + hashline_read → Edit({match}) + Read({format}) modes
Per operator R-entry sf-mp9wo7e3-sdxqss + no-compat directive.

- Edit gains `match: "substring"|"anchor"` arg; anchor mode routes to the
  existing applyHashlineEdits logic. Substring stays default.
- Read gains `format: "plain"|"tagged"` arg; tagged mode emits LINE#HASH
  prefixes via formatHashLines.
- Delete hashline-edit.ts, hashline-read.ts. KEEP hashline.ts (helpers
  are now Edit/Read internals).
- tools/index.ts: drop the two tools + the createHashlineCodingTools
  preset.
- agent-session.ts: setEditMode no longer swaps tool instances (single
  tool surface; mode preserved for system-prompt context only).
- sdk.ts + index.ts: remove hashline tool re-exports.
- headless-ui.ts + test: remove hashline_edit case.

Net agent-visible tool surface: -2 tools. Capability preserved as modes.
No backward-compat alias for the removed tool names.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 17:39:59 +02:00
Mikael Hugo
d03758d803 feat: replace launchd with systemd user-unit install path
Operator-direction 2026-05-17 "we will never use mac" — no compat
preservation. Single-cutover replacement.

- new packages/daemon/src/systemd.ts: install/uninstall/status using
  systemctl --user + ~/.config/systemd/user/sf-server.service
- new packages/daemon/src/systemd.test.ts: ports launchd tests, same
  shape, mocked systemctl via RunCommandFn injection + SF_SYSTEMD_USER_DIR
  env override for real filesystem tests
- cli-main.ts: switch import + update help text + status messages
- index.ts: re-export systemd module (installSystemdUnit, uninstallSystemdUnit,
  systemdUnitStatus, generateUnit, getServicePath, SystemdStatus, SystemdUnitOptions)
- DELETED: launchd.ts (253 LOC), launchd.test.ts (379 LOC)
- docs/dev/drafts/M053-per-repo-supervisor.md: remove "launchd" mention
- CHANGELOG.md: document systemd-only install path

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 17:33:34 +02:00
Mikael Hugo
44915b73d4 rename: tool names → Claude-Code-aligned (Bash/Read/Write/Edit/Grep/Glob/LS); remove run_command/read_output/hashline duplicates
Per operator-direction 2026-05-17 (sf-mp9w20y1-nld9hc + "DONT KEE COMPAT" stance + adversarial-review override). Cross-vendor frontier LLMs are trained on PascalCase Claude Code tool names; calling them by SF's lowercase + novel names increases tool-call error rates. Single atomic cutover, no aliases. Internal implementations preserved; only the LLM-facing names + registrations change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 17:26:36 +02:00
Mikael Hugo
57fef5979d feat: make sf server the operator entrypoint 2026-05-17 17:23:46 +02:00
Mikael Hugo
c046ff9a6c fix: auto-rebuild workspace packages when src is newer than dist
Without this, edits to packages/coding-agent/src/* (or any other workspace
package src) silently land while the dist stays stale — agents continue
loading the old compiled JS and operators see "why didn't my edit take
effect?" symptoms. Observed 2026-05-17 wiring in the AST tools: vitest
(reading TS source) passed; runtime smoke test against dist failed because
no auto-rebuild fired.

Extends ensure-source-resources.cjs (which sf-from-source runs on every
launch) to also check workspace packages: agent-core, ai, coding-agent,
daemon, google-gemini-cli-provider, openai-codex-provider, rpc-client, tui.
For each, compare latest src mtime vs latest dist mtime (with a 100ms grace
window). If src is newer, run `npm run build -w @singularity-forge/<pkg>`.

Excludes:
  - packages/native (Rust build is 5–10 min; trigger manually via
    `node rust-engine/scripts/build.js --dev`).
  - Any package in SF_SKIP_WORKSPACE_AUTOBUILD (comma-separated).
  - Whole step disabled by SF_SKIP_WORKSPACE_AUTOBUILD=all.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 17:19:25 +02:00
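The staleness check in c046ff9a6c boils down to an mtime comparison plus a skip list. The sketch below assumes the mtimes have already been collected; `needsRebuild`, `parseSkip`, and `packagesToRebuild` are hypothetical names, and treating a missing dist as "needs build" is an assumption. The 100ms grace window and the `SF_SKIP_WORKSPACE_AUTOBUILD` semantics are from the commit message.

```typescript
const GRACE_MS = 100; // grace window stated in the commit message

// True when the newest source file is newer than the newest build output by
// more than the grace window. A missing dist (null) is assumed to need a build.
function needsRebuild(
  latestSrcMtimeMs: number,
  latestDistMtimeMs: number | null,
  graceMs: number = GRACE_MS,
): boolean {
  if (latestDistMtimeMs === null) return true;
  return latestSrcMtimeMs > latestDistMtimeMs + graceMs;
}

// Parse SF_SKIP_WORKSPACE_AUTOBUILD: "all" disables the whole step,
// otherwise the value is a comma-separated list of package names.
function parseSkip(raw: string | undefined): { skipAll: boolean; skipped: string[] } {
  const value = (raw === undefined ? "" : raw).trim();
  if (value === "all") return { skipAll: true, skipped: [] };
  const skipped = value.split(",").map((p) => p.trim()).filter((p) => p.length > 0);
  return { skipAll: false, skipped };
}

// Return the packages whose src is newer than dist and that are not skipped.
function packagesToRebuild(
  mtimes: { [pkg: string]: { src: number; dist: number | null } },
  rawSkip: string | undefined,
): string[] {
  const { skipAll, skipped } = parseSkip(rawSkip);
  if (skipAll) return [];
  return Object.keys(mtimes).filter((pkg) => {
    const m = mtimes[pkg];
    return m !== undefined && skipped.indexOf(pkg) === -1 && needsRebuild(m.src, m.dist);
  });
}
```

Each returned package would then be built with the `npm run build -w @singularity-forge/<pkg>` command named in the commit.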
Mikael Hugo
6e3b3d3c54 feat: add Serena-style AST tools (ReplaceSymbol, InsertAroundSymbol, AstGrep)
Wraps the native AST primitives from @singularity-forge/native/{edit,ast} as
LLM tools so agents can do tree-sitter-anchored code edits instead of
substring-based Edit or line-anchor hashline.

- replace-symbol.ts (+117): wraps replaceSymbol(file, symbolPath, newBody);
  matches function/class/method declarations via tree-sitter, returns
  matched=false sentinel when the symbol isn't located.
- insert-around-symbol.ts (+122): wraps insertAroundSymbol with position
  enum BeforeDecl/AfterDecl/AtBodyStart/AtBodyEnd.
- ast-grep.ts (+152): wraps astGrep for pattern matching across files with
  $VAR/$$$ARGS meta-variables; returns ranked matches with byte/line/column
  + captured meta-variable bindings.

Each tool:
  - typebox schema matching the existing AgentTool pattern (edit.ts)
  - notifyFileChanged() into the LSP layer on write ops
  - resolveToCwd() for path normalization
  - catches native errors + returns isError result with the
    NativeUnavailableError message pointing operators to
    `nix develop` + `node rust-engine/scripts/build.js --dev`

Wire-in:
- tools/index.ts: re-exports + imports + entries in `allTools` map and
  createAllTools() factory.
- extension-manifest.json: ReplaceSymbol / InsertAroundSymbol / AstGrep
  appended to provides.tools so SF extension agents see them.

Higher value than substring/line-anchor for code in tree-sitter-supported
languages (TS/JS/TSX/Python/Rust). Edit + hashline remain for non-code
files. PascalCase names per the Claude-Code-aligned convention from
sf-mp9w20y1-nld9hc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 17:14:12 +02:00
Mikael Hugo
19b10eb67c feat: make sf-server own swarm registry sync 2026-05-17 17:05:16 +02:00
Mikael Hugo
33560c9b09 fix: auto-open configured web project 2026-05-17 16:38:11 +02:00
Mikael Hugo
0f5a606923 fix: native loader — loud banner on fallback + structured load log + helpers
- Stderr banner on fallback now multi-line with concrete fix steps
  (nix develop → node rust-engine/scripts/build.js --dev) so an operator
  scanning a 280MB cycle log can't miss it. The old single-line warning
  was easy to overlook (today's "WHY HAS NOBODY SEEN IF LOUD" check).
- Structured load record per process at .sf/runtime/native-engine-load.jsonl:
  {ts, pid, platformTag, source, binaryPath, sha256, loaded, errors?}.
  Lets operators audit which binary each SF process loaded — and detect
  ABI mismatches across daemon↔worker boundaries when different sha256
  values appear for the same platformTag (the "rare but real" concern
  flagged earlier today).
- Proxy error message now points to the build/install commands instead
  of just saying "not available". NativeUnavailableError is named for
  consumer try/catch chains.
- Fixed _loadedSuccessfully ordering — was set true BEFORE the require,
  leaving stale-true after a failed first attempt.
- New helpers isNativeLoaded(), nativeBinaryPath(), nativeBinarySha256()
  for diagnostic surfaces (sf headless query, doctor checks).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 16:34:02 +02:00
Mikael Hugo
f87e9bc0d9 fix: attach web server to project without token 2026-05-17 16:25:47 +02:00
Mikael Hugo
eeb80bbbdd fix: register 6 detector gates + add adversarial-finding kind + watchdog log rotation
Three concrete fixes from open self-feedback assessment 2026-05-17:

- uok/gate-registry-bootstrap.js: register all 6 R081 detector gates
  (same-unit-loop, zero-progress, repeated-feedback-kind, artifact-flap,
  stale-lock, periodic-detector-sweep) alongside drift-detection and
  iter-completion-reconciler. Closes the gap reported by
  sf-mp9udspu-fsf7si — bootstrap previously registered 2 of 8 gates.

- self-feedback.js ALLOWED_KIND_DOMAINS: add `adversarial-finding`.
  Closes gap reported by sf-mp9u4i25-fczmcj — R075 (autonomous
  adversarial review) challenge unit had no kind to file findings under.

- sf-autonomous-watchdog.sh: delete watchdog-run-*.log files older than
  60 minutes at each cycle start. Without rotation .sf/ grew to 1.9 GB
  in 24h (today's snapshot). 60 min retention captures last cycle for
  post-incident triage; older state is already in DB + iterations.jsonl.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 16:08:05 +02:00
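The rotation rule from eeb80bbbdd (delete `watchdog-run-*.log` files older than 60 minutes) can be expressed as a pure selection function. The real change lives in a shell script; this TypeScript sketch only illustrates the selection logic, and `LogFileInfo` and `selectExpiredLogs` are hypothetical names.

```typescript
interface LogFileInfo {
  fileName: string;
  mtimeMs: number; // last-modified time in epoch milliseconds
}

const RETENTION_MS = 60 * 60 * 1000; // 60 min retention from the commit message
const WATCHDOG_LOG_PATTERN = /^watchdog-run-.*\.log$/;

// Return the names of watchdog run logs that are older than the retention window.
// Files that do not match the watchdog-run-*.log pattern are never selected.
function selectExpiredLogs(
  files: LogFileInfo[],
  nowMs: number,
  retentionMs: number = RETENTION_MS,
): string[] {
  return files
    .filter((f) => WATCHDOG_LOG_PATTERN.test(f.fileName))
    .filter((f) => nowMs - f.mtimeMs > retentionMs)
    .map((f) => f.fileName);
}
```

Keeping the selection separate from the deletion makes the retention rule easy to test without touching the filesystem.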
Mikael Hugo
077fd0a2a7 remove A2A; swarm enrollment + status projection + web swarms view; headless refactor
- A2A removal per M054/R071 cancellation 2026-05-17 (-2294 lines):
  - docs/plans/A2A_ADOPTION_PLAN.md, MISSION-A2A-ADOPTION.md deleted
  - src/resources/extensions/sf/uok/a2a-agent-server.js,
    a2a-transport.js deleted
  - tests/a2a-auth.test.mjs deleted
  - swarm-dispatch.js purged of A2A-conditional code paths
- New: scripts/sf-swarm-enroll.mjs + test (operator-facing swarm
  enrollment, replaces former A2A pairing flow)
- New: src/status-projection.ts + test, web/lib/swarm-status.ts +
  test, web/components/sf/swarms-view.tsx, web/app/api/swarms/
  (web swarms-view surface — direct visibility into running swarm
  state without requiring TUI; aligns with project_tui_deprecating)
- headless-{answers,query,ui,headless}.ts: coordinated tweaks
  consistent with the headless-as-default direction (R124 proposal)
- docs/dev/drafts/M053-per-repo-supervisor.md: design refinement
- .sf/REQUIREMENTS.md: small text fixes (6/6 churn)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 16:04:06 +02:00
Mikael Hugo
ecac4328bd backfill: restore 48 R-entries from REQUIREMENTS.md to DB
DB corruption recovery (2026-05-17) rebuilt the requirements table
from valid btree pages; ROWID-by-ROWID scan found 60 of 68 R-entries.
The other 8 + 40 historical (R006-R009 + R022-R065) were never in the
DB to begin with — they had drifted into REQUIREMENTS.md only. Backfill
script parsed each ### Rxxx — title block and INSERTed into the
requirements table with proper class/status/description/why/notes
fields. Final DB count: 75 → 123, integrity_check ok, MD↔DB parity
restored.

The .gitignore tweak from the meta-supervisor commit landed earlier;
no functional change here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 15:57:45 +02:00
Mikael Hugo
1cd7890d64 fix: auto-version-bump swallowed operator-direction; ptrmap + lock guards
- sf-db-schema.js: auto_vacuum INCREMENTAL → NONE. The "Bad ptr map entry"
  corruption on 2026-05-17 was incremental-autovacuum ptrmap drift under
  concurrent writers. Recovered DB has no ptrmap; future fresh DBs must
  match. incremental_vacuum() callers in sf-db-core.js become no-ops.
- bin/sf-from-source: lock allowlist extended to skip readonly sf headless
  subcommands (--help, query, status, usage, reflect, feedback list,
  triage --list/--json). Previously every sf headless invocation tried
  to acquire the project lock — operator couldn't even inspect SF state
  while autonomous was running.
- self-feedback.js triageBlockedEntries: (1) treat empty/null/undefined
  sfVersion as unknown, not zero; (2) exempt operator-direction kinds
  (improvement-idea, architecture-defect, missing-feature, gap) from
  auto-version-bump close. Both were needed to prevent the R124 incident
  recurring.
- headless-feedback.ts handleAdd: populate sfVersion via getCurrentSfVersion
  + detect repoIdentity via isForgeRepo, not hardcoded "external"/"". An
  empty sfVersion sorts below any real semver, so the resolver retry-closed
  every operator-filed entry within seconds.

Net effect: R124 proposal (filed via sf headless feedback add) is no
longer auto-resolved as version-stale. Larger architectural fix (single-
writer SF daemon / RPC for all DB writes — M040 territory) tracked as
follow-up R-entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 15:51:36 +02:00
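The two triage guards in 1cd7890d64 (unknown version is not zero; operator-direction kinds are exempt) might look roughly like the sketch below. It assumes plain `major.minor.patch` versions without pre-release tags, and `parseVersion`, `isOlder`, and `shouldAutoCloseAsVersionStale` are hypothetical names, not the functions in self-feedback.js. The four exempt kinds are from the commit message.

```typescript
// Kinds exempted from auto-version-bump close, per the commit message.
const OPERATOR_DIRECTION_KINDS = ["improvement-idea", "architecture-defect", "missing-feature", "gap"];

interface FeedbackEntry {
  kind: string;
  sfVersion?: string | null;
}

// Parse "major.minor.patch"; empty, null, undefined, or malformed input is unknown (null).
function parseVersion(v: string | null | undefined): number[] | null {
  if (v === null || v === undefined) return null;
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v.trim());
  if (m === null) return null;
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

// True when version a sorts strictly below version b.
function isOlder(a: number[], b: number[]): boolean {
  for (let i = 0; i < 3; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    if (x !== y) return x < y;
  }
  return false;
}

// Decide whether an entry may be auto-closed as version-stale.
function shouldAutoCloseAsVersionStale(entry: FeedbackEntry, currentVersion: string): boolean {
  if (OPERATOR_DIRECTION_KINDS.indexOf(entry.kind) !== -1) return false;
  const filed = parseVersion(entry.sfVersion);
  const current = parseVersion(currentVersion);
  // Unknown versions are never treated as "older than everything".
  if (filed === null || current === null) return false;
  return isOlder(filed, current);
}
```

With this shape, an entry filed with an empty `sfVersion` is left open instead of being closed within seconds.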
Mikael Hugo
87e9729c13 fix: shard sift search and project requirements 2026-05-17 15:38:55 +02:00
Mikael Hugo
3e5b6fc511 fix: reconcile iteration completion drift 2026-05-17 15:06:40 +02:00
Mikael Hugo
f643272a91 fix: preserve requirements projection fidelity 2026-05-17 15:02:25 +02:00
Mikael Hugo
4289946e11 fix: clear task verification status on revert 2026-05-17 14:59:20 +02:00
Mikael Hugo
3e002ca698 refactor: consolidate loop signals and gate registry wiring 2026-05-17 14:45:12 +02:00
Mikael Hugo
4d2266e57d fix: consolidate loop supervision gates 2026-05-17 14:35:40 +02:00
Mikael Hugo
625a830d2f wire R053-R056 detectors into auto-runaway-guard + R081 UokGate retrofit
- uok/auto-runaway-guard.js: invoke runDetectorSweep alongside the existing
  zero-progress check (fire-and-forget for sync-tick compatibility; results
  consumed on next tick via sweepState ring buffer). Passes unitId,
  unitMetrics, sessionFingerprint, lockPaths, and a 30-min DB-windowed
  recentFeedback slice.
- detectors/{same-unit-loop, zero-progress, repeated-feedback-kind,
  artifact-flap, stale-lock, periodic-runner}.js: each detector now also
  exports a UokGate wrapper (id/type/execute -> GateResult per ADR-0075).
  Plain detector functions kept for existing consumers.
- detectors/index.js: single import surface for the gate exports.
- detector-stale-lock.test.mjs (9), detector-periodic-runner.test.mjs (10),
  detector-gates-contract.test.mjs: fills the R055/R056 test gap filed
  earlier today + proves UokGate contract conformance.
- 41/41 detector tests green; copy-resources clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 14:18:54 +02:00
Mikael Hugo
527ebfcaa4 gitignore meta-supervisor runtime state
The previous commit accidentally tracked .sf/meta-status.json + .sf/meta-supervisor.pid (transient runtime files written by scripts/sf-meta-supervisor.mjs each tick). Mirror the existing .sf/runtime/ ignore pattern for these top-level meta-* files; the daemon keeps writing them on disk but git no longer tracks the churn.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 14:09:24 +02:00
Mikael Hugo
d5664f7142 meta-supervisor (node daemon) + R091 triage gate + R091-R094 spec
- scripts/sf-meta-supervisor.mjs: pure-node daemon supervising
  scripts/sf-autonomous-watchdog.sh. Tick=60s, restarts watchdog if dead,
  emits .sf/meta-status.json, halt via .sf/meta-supervisor.halt. Uses
  only node builtins (no SF dist deps) so it survives dist breakage.
- src/headless.ts: R091 — gate the per-cycle handleTriage call on a time
  interval (SF_TRIAGE_INTERVAL_MS, default 30 min) and bump batch size
  (SF_TRIAGE_MAX, default 25, was 5). Drops the ~8min triage hit from
  every cycle while letting daily drain capacity rise.
- .sf/REQUIREMENTS.md: R091 (triage sidecar) + R092 (PDD-completeness
  as routing signal) + R093 (pin model per orchestration agent.yaml) +
  R094 (swarm-role model tier specialization — 8 roles already exist
  in uok/swarm-roles.js; model field per role missing).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 14:08:30 +02:00
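The R091 interval gate from d5664f7142 can be sketched as a small state check run once per cycle. This is an illustration under assumptions: `TriageGateState`, `readPositiveInt`, `shouldRunTriage`, and `triageTick` are hypothetical names, and the environment is passed in as a plain object. The variable names `SF_TRIAGE_INTERVAL_MS` and `SF_TRIAGE_MAX` and their defaults (30 min, 25) are from the commit message.

```typescript
const DEFAULT_TRIAGE_INTERVAL_MS = 30 * 60 * 1000;
const DEFAULT_TRIAGE_MAX = 25;

interface TriageGateState {
  lastTriageAtMs: number | null; // null until triage has run once
}

// Read a positive integer from an env-style string, falling back when unset or invalid.
function readPositiveInt(raw: string | undefined, fallback: number): number {
  const n = Number(raw);
  return raw !== undefined && isFinite(n) && n > 0 ? Math.floor(n) : fallback;
}

// Triage runs on the first cycle, then only after the interval has elapsed.
function shouldRunTriage(state: TriageGateState, nowMs: number, intervalMs: number): boolean {
  return state.lastTriageAtMs === null || nowMs - state.lastTriageAtMs >= intervalMs;
}

// One cycle: decide whether to triage and with which batch size.
function triageTick(
  state: TriageGateState,
  nowMs: number,
  env: { [key: string]: string | undefined },
): { ran: boolean; batchSize: number } {
  const intervalMs = readPositiveInt(env["SF_TRIAGE_INTERVAL_MS"], DEFAULT_TRIAGE_INTERVAL_MS);
  const batchSize = readPositiveInt(env["SF_TRIAGE_MAX"], DEFAULT_TRIAGE_MAX);
  if (!shouldRunTriage(state, nowMs, intervalMs)) return { ran: false, batchSize };
  state.lastTriageAtMs = nowMs;
  return { ran: true, batchSize };
}
```

Most cycles skip triage entirely, while each triage pass that does run drains a larger batch.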
Mikael Hugo
e93d17a3b4 spec + ADR annotations + dormant-code cleanup
- .sf/REQUIREMENTS.md: today's R-entries (R066..R090) covering parallel-rescue
  targets — bus deliver verify, drift detection gate, PDD typed contracts,
  lane split, Wiggums detector family, repo supervisor design.
- ADR-014/019/020: SF-first banners (operator direction: get SF working before
  ACE/wire-architecture changes land downstream).
- docs/records + drafts: 2026-05-07 strategy + cli-agent survey index refresh;
  SF/ACE pattern draft annotations.
- roadmap-mutations.js removed (dormant — never imported; reachable shape
  verified against handler-relative + dynamic import audit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 13:45:00 +02:00
Mikael Hugo
d2ff4e84ba land 6 parallel codex/sonnet rescue outputs
- R016 swarm bus deliver verify (uok/swarm-dispatch.js + test): _busDispatch now
  force-refreshes target inbox and verifies messageId visibility before returning
  ok:true; ack-without-deliver class closed.
- R082 drift detection UokGate (uok/drift-detection-gate.js + test): single-task
  + sweep scope; 3 drift classes (artifact-missing, prose-status mismatch,
  broken-import); follows ADR-0075 id/type/execute -> GateResult contract.
- R087 PDD typed contracts (engine-types.js + test): ADR-0000 8 PDD fields +
  7-dim run-control policy + ADR-0075 GateResult typedefs and validators.
- R090 planning-execute lane split (auto/unit-lanes.js + auto/loop.js + 2 tests):
  lane classifier + capacity-aware tick dispatcher; SF_LANES=0 fallback is
  byte-equivalent to pre-R090.
- R053 + R054 Wiggums detectors (detectors/repeated-feedback-kind.js +
  detectors/artifact-flap.js + 2 tests); R055 stale-lock + R056 periodic-runner
  source landed without tests (gap filed as self-feedback).
- M053 per-repo supervisor design + skeleton (supervisor/repo-supervisor.js +
  test + design doc): RepoSupervisor class, zero module-global state, tick
  stub, failure isolation; M056 trust-boundary called out as follow-up.

85/85 tests green across the 8 new test files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 13:44:42 +02:00
Mikael Hugo
eaac4f0bd3 sf snapshot: uncommitted changes after 187m inactivity 2026-05-17 12:04:55 +02:00
Mikael Hugo
6f5e2f0aa9 spec(R060-R065): backup, schema-migration, secrets lifecycle, rate budget, cache policy, deprecation + M042-M047 2026-05-17 08:57:30 +02:00
Mikael Hugo
09687ccd30 spec(R059): typed entity vocabulary (R/A/D/M/S/T/F/G/K/P/E)
Operator: "should we have a for adrs and d for decisions? any other
type we should habe?" + "and so we use" — yes, file it and adopt.

Adds A (ADR), D (Decision), F (Finding), G (Gate), K (Knowledge),
P (Pattern), E (Evidence) prefixes to the existing R/M/S/T set.
Each gets a source-of-truth location and a mechanical migration path.

R048 (unbroken purpose chain) + R047 (per-R fulfillment validation)
both require typed cross-references to verify integrity. Without
typed IDs, "this M is covered by R, S, T, A, D" is unverifiable
free-text.

Owning milestone M041 (also added) splits the migration into 6
slices: rename ADRs, add D-IDs to DECISIONS, backfill F/G/K-IDs in
DB tables, doctor cross-link integrity check, lint for SF-authored
typed references.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 08:52:37 +02:00
Mikael Hugo
72c2ecb2b2 spec(R058): DB writes via MessageBus (single-writer actor)
Operator: "db via bus so we dont crash it?" — yes, this is the natural
fit for the lane model (R057 system lane + R046 multi-unit lanes).
Concurrent lanes writing directly to SQLite hit SQLITE_BUSY or worse;
mediating all writes through a single MessageBus-owned db-writer actor
enforces the single-writer invariant operationally.

Reads stay direct (multi-reader WAL is safe). Migration is gradual
(table by table). Also serves as the substrate for multi-repo
federation (R028 / ADR-019/020) where cross-process DB sharing needs
message-based access anyway.

Future M040.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 08:48:40 +02:00
Mikael Hugo
14fe3fa20a spec(R057): rename to "System Lane" + introduce lane primitive
Operator coined "system lane" — better than my "side-track". Frames
the architecture cleanly. The lane primitive unifies:
- R046 (multi-unit lanes) parallel slice dispatch
- R049 (per-lane model routing) different LLM per lane
- R057 (system lane) non-unit work alongside unit lane

Today autoLoop is 1 unit lane. System lane runs alongside for memory
consolidation, triage drain, doctor audits, log compaction, reflection
assembly, catalog refresh — all currently queued between units.

The single-writer DB requirement is met by the sf-db.js serial queue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 08:45:30 +02:00
Mikael Hugo
9bd7067b69 fix(wiggums): permission level — "normal" + default fallback to "medium"
legacyPermissionLevelForProfile had a switch with cases for
restricted/trusted/unrestricted only, no case for "normal" (the
DEFAULT autonomous session profile per auto/session.js:377). "normal"
fell through to default → "low" — too restrictive for autonomous work.

Witnessed M010/S04/T01: solver note "TypeScript compilation and git
diff blocked by low permission level" — SF couldn't verify its own
deliverable because permissions were locked down despite running in
autonomous mode.

Fix:
- "normal" → "medium" (allows tsc, git, npm test)
- default → "medium" (was "low"); unknown profiles shouldn't cripple
  autonomous executors. Operators wanting strict mode set
  profile: "restricted" explicitly.

Per operator intent 2026-05-17: "SF should have permission even if
it can limit its agents and only allow orchestrator or whatever."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 08:38:10 +02:00
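The fall-through bug fixed in 9bd7067b69 is easiest to see as a before/after pair. The commit does not list what `restricted`, `trusted`, and `unrestricted` map to, so this sketch takes those explicit mappings as a caller-supplied table; `ProfileTable`, `lookupExplicit`, `levelBeforeFix`, and `levelAfterFix` are hypothetical names and do not reproduce `legacyPermissionLevelForProfile` itself.

```typescript
// Explicit profile mappings are passed in because the commit does not list them.
type ProfileTable = { [profile: string]: string };

// Look up an explicitly mapped profile, ignoring inherited object properties.
function lookupExplicit(profile: string, table: ProfileTable): string | undefined {
  return Object.prototype.hasOwnProperty.call(table, profile) ? table[profile] : undefined;
}

// Before the fix: "normal" had no case of its own and fell through to "low".
function levelBeforeFix(profile: string, table: ProfileTable): string {
  const hit = lookupExplicit(profile, table);
  return hit !== undefined ? hit : "low";
}

// After the fix: "normal" maps to "medium", and unknown profiles also get "medium".
function levelAfterFix(profile: string, table: ProfileTable): string {
  if (profile === "normal") return "medium";
  const hit = lookupExplicit(profile, table);
  return hit !== undefined ? hit : "medium";
}
```

For example, with a table that maps `restricted` to `"low"` (an assumed value), `levelAfterFix("normal", table)` yields `"medium"` while `restricted` keeps its explicit level.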
Mikael Hugo
02bac88a63 feat: Spawn-failure watchdog, status=failed transition, doctor signal,…
SF-Task: S04/T01
2026-05-17 08:36:18 +02:00
Mikael Hugo
8122a2b6c7 fix(watchdog): pre-flight smoke + crash-loop backoff
Two guards added after today's 2-hour crash-loop on missing
DEFAULT_STALE_TIMEOUT_MS export:

1. Pre-flight smoke test: `sf --version` must succeed before each
   cycle. If dist is broken (missing export, syntax error), pause
   5min + log loudly instead of immediately respawning into the same
   crash.

2. Crash-loop detection: 3 consecutive <90s failure exits → assume
   crash-loop, back off 5min before retry. Prevents the
   "100 crashes in 2 hours, 0 useful work" pattern we just hit.

Together: a broken dist causes ONE crash + a 5min pause, not a
2-hour CPU burn. Operator notices the pause in .sf/watchdog.log
and intervenes; in the meantime no resources wasted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 08:31:07 +02:00
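The two guards in 8122a2b6c7 can be combined into one delay decision. The watchdog itself is a shell script; this TypeScript sketch only models the decision, and `CycleExit`, `consecutiveFastFailures`, and `nextDelayMs` are hypothetical names. The thresholds (3 consecutive failures under 90s, 5 min pause) are from the commit message.

```typescript
interface CycleExit {
  exitCode: number;
  durationMs: number;
}

const FAST_FAILURE_MS = 90 * 1000; // "<90s" threshold from the commit message
const CRASH_LOOP_THRESHOLD = 3; // consecutive fast failures
const BACKOFF_MS = 5 * 60 * 1000; // 5 min pause

// A fast failure is a non-zero exit that happened in under 90 seconds.
function isFastFailure(e: CycleExit): boolean {
  return e.exitCode !== 0 && e.durationMs < FAST_FAILURE_MS;
}

// Count fast failures at the tail of the history, stopping at the first other exit.
function consecutiveFastFailures(history: CycleExit[]): number {
  let count = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    const e = history[i];
    if (e === undefined || !isFastFailure(e)) break;
    count++;
  }
  return count;
}

// Delay before the next cycle: pause when the pre-flight smoke test fails
// or when the tail of the history looks like a crash loop.
function nextDelayMs(history: CycleExit[], preflightOk: boolean): number {
  if (!preflightOk) return BACKOFF_MS;
  return consecutiveFastFailures(history) >= CRASH_LOOP_THRESHOLD ? BACKOFF_MS : 0;
}
```

A single successful or long-running cycle resets the count, so only an unbroken run of quick failures triggers the pause.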
Mikael Hugo
80ede48f06 sf snapshot: uncommitted changes after 246m inactivity 2026-05-17 08:28:04 +02:00
Mikael Hugo
c7b13607b5 fix(wiggums): exponential backoff on autoLoop halt-watchdog-break
When the halt-watchdog detects stuck state, the autoLoop was logging
"halt-watchdog-break" every iteration but otherwise tight-spinning
through dispatch-resolve at ~2s/iteration. 2026-05-17 dogfood logged
60+ such events in a 30s window — pure CPU burn while the actual
stuck condition stayed stuck.

Fix: exponential backoff (1s → 2s → 4s → 8s → 16s → capped at 30s)
based on how many halt thresholds have elapsed. Heartbeat() resets
when real progress resumes (existing behavior). Backoff costs nothing
when the loop is healthy.

One of the 14 Ralph-Wiggum patterns surfaced this session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 04:21:21 +02:00
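The backoff schedule in c7b13607b5 (1s, 2s, 4s, 8s, 16s, then capped at 30s) is a doubling delay with a ceiling. In this sketch `haltBackoffMs` is a hypothetical name, and counting elapsed thresholds from 1 (with 0 meaning a healthy loop) is an assumption about how the real code indexes them.

```typescript
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 30000;

// Delay for the Nth elapsed halt threshold: 1s, 2s, 4s, 8s, 16s, then capped at 30s.
// Zero or negative means no threshold has elapsed, so a healthy loop pays nothing.
function haltBackoffMs(thresholdsElapsed: number): number {
  const n = Math.floor(thresholdsElapsed);
  if (n <= 0) return 0;
  // Clamp the exponent so very long stalls cannot overflow before the cap applies.
  const exponent = Math.min(n - 1, 16);
  return Math.min(BASE_DELAY_MS * Math.pow(2, exponent), MAX_DELAY_MS);
}
```

Resetting the threshold count to zero when the heartbeat reports progress brings the delay back to zero immediately.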
Mikael Hugo
24d2b37562 fix(wiggums): verify PID liveness before "Another session running" message
Two sites told operator to "kill PID X" without checking X was alive:
- interrupted-session.js:formatInterruptedSessionRunningMessage
- auto.js autonomous-start blocking notification

Both reported stale locks from crashed prior sessions as if a live session
existed, confusing the operator and blocking restart. Session-lock.js already
has auto-recovery for stale-PID locks; these two surfaces just needed
matching liveness checks to label dead-PID locks correctly.

Now: dead-PID → "Stale lock from dead PID X — will be auto-recovered"
     alive-PID → original "kill X" message

Catches one of the 14 Ralph-Wiggum-obvious patterns surfaced this
session. Reduces operator confusion + dovetails with R055 (M038/S05)
when stale-lock auto-recovery becomes a core-loop detector.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 04:20:00 +02:00
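The message selection in 24d2b37562 depends only on whether the lock holder is still alive. The sketch below injects the liveness probe so the logic stays testable; `PidProbe` and `formatLockMessage` are hypothetical names, and the message wording is illustrative, not the strings in interrupted-session.js or auto.js.

```typescript
// A probe that reports whether a process with the given PID is alive.
type PidProbe = (pid: number) => boolean;

// Choose the operator-facing message based on whether the lock holder is alive.
function formatLockMessage(pid: number, isAlive: PidProbe): string {
  if (isAlive(pid)) {
    return "Another session is running (PID " + pid + "). Kill " + pid + " to take over.";
  }
  return "Stale lock from dead PID " + pid + " (will be auto-recovered).";
}
```

In Node.js, a common probe is `process.kill(pid, 0)` inside a try/catch: signal 0 sends nothing and only tests whether the process exists, and an `EPERM` error still means the process is alive.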
Mikael Hugo
96d03b33bc spec: M038 Wiggums Detector family (R051-R056)
The autonomous loop currently lacks baseline "this is dumb-obvious
stuck" detection. This session alone surfaced 14 such patterns that
required operator grep-archaeology to identify. M038 centralizes a
single Wiggums-Detector orchestrator (R056) that runs 5 detector
questions every 30s:

  R051 — same-unit dispatched >3 times with no state change
  R052 — runtime/units progressCount:0 for >5min (heartbeating ghost)
  R053 — >5 self-feedback entries of same kind/target in 24h
  R054 — artifact predicates flapping between dispatches
  R055 — stale .sf/sf.lock from dead holder + stale inline-fix marker

Each detector pauses + files actionable self-feedback. Trivial cases
auto-fix (e.g. stale-lock rm). New detectors (R057+) plug into the
orchestrator without per-detector lifecycle code.

Anchors to ADR-0000 (purpose-to-software requires self-healing). Builds
on the recurring patterns evidenced 2026-05-17:
  - 70+ degenerate reassess iterations on M010/S03 (R051)
  - 56+ runaway-loop:idle-halt entries accumulated on M005 (R053)
  - Multiple stale-lock incidents requiring manual rm (R055)

56 R-entries total, 54/56 mapped (R049/R050 still future M036-M037).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 04:16:52 +02:00
Mikael Hugo
e470939723 spec(R051): same-unit dispatch-loop detection (Ralph Wiggum safety net)
When dispatcher resolves the same unit N>3 times in a session without
state-change between dispatches, detect the loop, pause, file
self-feedback. Targets the 2026-05-17 dogfood pattern where
reassess-roadmap M010/S03 ran 70+ times because of the ASSESSMENT
suffix mismatch (now fixed in a737af318).

Even after the immediate fix, this safety net prevents future
unknown-bug versions of the same failure mode from burning hours of
compute. R051 makes the failure detectable as a first-class condition
instead of something the operator has to debug by hand.

Owning milestone M038 (future).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 04:12:55 +02:00
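The R051 rule in e470939723 (same unit dispatched more than 3 times with no state change) can be sketched as a scan over the tail of the dispatch history. `DispatchRecord`, `stateFingerprint`, and `detectSameUnitLoop` are hypothetical names; how the real detector fingerprints state is not specified in the commit.

```typescript
interface DispatchRecord {
  unitId: string;
  stateFingerprint: string; // assumed: any stable digest of the unit's observable state
}

const MAX_REPEATS = 3; // "N>3" from the spec

// Return the looping unit's id when the most recent dispatches repeat the same
// unit with an unchanged fingerprint more than `maxRepeats` times, else null.
function detectSameUnitLoop(
  history: DispatchRecord[],
  maxRepeats: number = MAX_REPEATS,
): string | null {
  const last = history[history.length - 1];
  if (last === undefined) return null;
  let run = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    const r = history[i];
    if (
      r === undefined ||
      r.unitId !== last.unitId ||
      r.stateFingerprint !== last.stateFingerprint
    ) {
      break;
    }
    run++;
  }
  return run > maxRepeats ? last.unitId : null;
}
```

A non-null result is the point at which the loop would pause and file self-feedback, as the spec describes.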
Mikael Hugo
a737af318d fix(dispatch): reassess-roadmap loop on slice with ASSESSMENT.md
checkNeedsReassessment looked up the slice's assessment file with
suffix='ASSESS', but actual files are 'S03-ASSESSMENT.md'. The
resolveFile pattern requires at least one char before the suffix
(/^S03-.*-ASSESS\.md$/), so 'S03-ASSESSMENT.md' never matched and
the helper returned {sliceId} on every poll → dispatcher kept
firing reassess-roadmap forever.

Fix: try 'ASSESSMENT' first, fall back to legacy 'ASSESS'. Now
S03-ASSESSMENT.md properly satisfies the "already reassessed" check
and the dispatcher advances to the next slice (S04).

Verified: resolveSliceFile('M010','S03','ASSESSMENT') returns the
real path; with the fallback, this resolves on first call. The
70+ degenerate reassess iterations on M010/S03 (witnessed
2026-05-17) won't recur.

Ralph Wiggum approved. (per operator: "sf should clear these stuck
itself ralph wiggums would fix")

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 04:12:15 +02:00
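The suffix mismatch fixed in a737af318d can be demonstrated with a toy resolver. The matching rules here are an assumption: a file matches either the exact name `<id>-<SUFFIX>.md` or the descriptive pattern `<id>-<anything>-<SUFFIX>.md` quoted in the commit. `escapeRegExp`, `findBySuffix`, and `findAssessment` are hypothetical names, not `resolveFile` or `resolveSliceFile`.

```typescript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Assumed rules: exact "<id>-<SUFFIX>.md", or "<id>-<anything>-<SUFFIX>.md".
function findBySuffix(files: string[], sliceId: string, suffix: string): string | null {
  const exact = sliceId + "-" + suffix + ".md";
  const described = new RegExp(
    "^" + escapeRegExp(sliceId) + "-.*-" + escapeRegExp(suffix) + "\\.md$",
  );
  for (const f of files) {
    if (f === exact || described.test(f)) return f;
  }
  return null;
}

// The fix: try the current "ASSESSMENT" suffix first, then the legacy "ASSESS".
function findAssessment(files: string[], sliceId: string): string | null {
  return findBySuffix(files, sliceId, "ASSESSMENT") ?? findBySuffix(files, sliceId, "ASSESS");
}
```

Looking up `ASSESS` alone never matches `S03-ASSESSMENT.md`, which is why the "already reassessed" check kept failing before the fallback order was introduced.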
Mikael Hugo
0c0608fa50 fix(swarm): recognize unit-specific completion tools as implicit complete
Detected via supervisory check 2026-05-17: SF stuck in degenerate reassess-
roadmap loop on M010/S03 (5 iterations in 8min, all returning
outcome=continue). Root cause: synthesized-checkpoint in runUnitViaSwarm
only treats the generic `checkpoint` tool as a completion signal — but
units routinely complete via their unit-specific tool (reassess_roadmap
with verdict=roadmap-confirmed, validate_milestone, complete_milestone,
complete_slice, save_summary). The LLM correctly emitted the unit's
specific completion tool + assistant text "<turn_status>complete</turn_status>",
but workerSignaledOutcome stayed null → synthesized checkpoint fell back
to continue → solver re-iterated.

Fix: recognize UNIT_COMPLETION_TOOLS = {reassess_roadmap,
validate_milestone, complete_milestone, complete_slice, save_summary}
as implicit "complete" signals. The check fires when those tools are
called and an earlier explicit checkpoint hasn't already said
"complete" or "blocked".

This resolves sf-mp94lth4-ew26om and should prevent future
degenerate-iteration loops on reassess-roadmap and milestone completion
units. 13/13 existing M010 tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 03:59:30 +02:00
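The implicit-completion rule from 0c0608fa50 can be sketched as a fold over the worker's tool calls. `ToolCall`, `Outcome`, and `resolveWorkerOutcome` are hypothetical names and the call shape is an assumption; the tool list and the rule that an earlier explicit "complete" or "blocked" checkpoint takes precedence are from the commit message.

```typescript
type Outcome = "complete" | "blocked" | "continue";

// Unit-specific completion tools listed in the commit message.
const UNIT_COMPLETION_TOOLS = [
  "reassess_roadmap",
  "validate_milestone",
  "complete_milestone",
  "complete_slice",
  "save_summary",
];

interface ToolCall {
  tool: string;
  outcome?: Outcome; // only meaningful for explicit `checkpoint` calls (assumed shape)
}

// Derive the worker's signaled outcome from its tool calls, in order.
function resolveWorkerOutcome(calls: ToolCall[]): Outcome {
  let signaled: Outcome | null = null;
  for (const call of calls) {
    if (call.tool === "checkpoint" && call.outcome !== undefined) {
      // An explicit checkpoint always records its outcome.
      signaled = call.outcome;
    } else if (
      UNIT_COMPLETION_TOOLS.indexOf(call.tool) !== -1 &&
      signaled !== "complete" &&
      signaled !== "blocked"
    ) {
      // A unit-specific completion tool counts as an implicit "complete".
      signaled = "complete";
    }
  }
  // With no signal at all, fall back to "continue" as before the fix.
  return signaled ?? "continue";
}
```

Before the fix only the `checkpoint` branch existed, so a lone `reassess_roadmap` call resolved to "continue" and the solver re-iterated.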
Mikael Hugo
460db52504 chore(watchdog): enable SF_INLINE_DISPATCH=1 by default
Now that M010/S01+S02+S03 ship the inline-dispatch path (runUnitInline +
DispatchLayer + autoLoop wiring), the watchdog enables it on every
cycle so the autonomous loop actually exercises the inline scope for
INLINE_ELIGIBLE_UNITS (validate-milestone, complete-milestone,
reassess-roadmap). Other unit types continue to use the swarm path
unchanged.

This dogfoods M010/S03 in every watchdog cycle. If the inline path
regresses, the autonomous solver will surface it via self-feedback
(R015 spawn-failure loud-failure + agent-runner instrumentation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 03:54:47 +02:00
Mikael Hugo
7a273262f1 fix(benchmark-coverage): tier-strip fallbacks downgraded to 'approx' proxy
User caught: flash-lite ≠ flash (different model tier, different scores).
Previous fix counted flash-lite as fully covered via flash proxy, which
overstated coverage and could mislead routing.

benchmarkLookupVariants now tags variants with kind:
  - 'exact'  → date/version strip + -latest alias (same model line)
  - 'approx' → tier strip (flash-lite→flash, X-lite→X) — different model

computeBenchmarkCoverage promotes 'exact' matches to covered; 'approx'
matches stay in uncovered with an `approximatedBy` field so operators can
see when a real benchmark is still needed.
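
A rough sketch of that promotion rule, for illustration only.
`classifyCoverage` is a hypothetical helper, not computeBenchmarkCoverage
itself, and the result shape (including reuse of `matchedVia`) is an
assumption. It takes the tagged variants as input instead of generating them.

```javascript
// Hypothetical helper.
// `variants`: ordered list of { key, kind } with kind 'exact' or 'approx'.
// `benchmarks`: Set of keys that have real benchmark entries.
function classifyCoverage(modelId, variants, benchmarks) {
  // A direct hit needs no fallback at all.
  if (benchmarks.has(modelId)) {
    return { modelId, covered: true, matchedVia: modelId };
  }
  let approximatedBy = null;
  for (const { key, kind } of variants) {
    if (!benchmarks.has(key)) continue;
    // Same model line: safe to count as covered.
    if (kind === "exact") return { modelId, covered: true, matchedVia: key };
    // Different tier: remember the first proxy, but keep looking for an exact match.
    if (approximatedBy === null) approximatedBy = key;
  }
  // Proxy-only models stay uncovered and carry the proxy name.
  return approximatedBy === null
    ? { modelId, covered: false }
    : { modelId, covered: false, approximatedBy };
}

const known = new Set(["gemini-2.5-flash", "mistral-medium-latest"]);
console.log(classifyCoverage(
  "gemini-2.5-flash-lite",
  [{ key: "gemini-2.5-flash", kind: "approx" }],
  known,
).covered); // prints false
```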

Honest report: 64 exact covered / 1 proxy-only / 104 genuine uncovered
(was 65/0/104 with the overcount).

R049 + R050 added to traceability (M036/M037 future milestones).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 03:52:29 +02:00
Mikael Hugo
1dc7c2e278 feat(benchmark-coverage): variant-fallback lookup (R050 step 1)
benchmark-coverage.js: new benchmarkLookupVariants() returns ordered
fallback keys for a model id, and computeBenchmarkCoverage tries each
variant before flagging uncovered. Patterns covered:
  - date/version suffix strip ("mistral-medium-2505" → "mistral-medium")
  - tier strip ("X-flash-lite" → "X-flash", "Y-lite" → "Y")
  - "-latest" append for bare names ("mistral-medium" → "mistral-medium-latest")
The audit reports the matched variant via `matchedVia` so operators can
see when fallback applied (vs adding a real entry).
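
The three patterns can be illustrated with a small sketch. This is not the
benchmarkLookupVariants implementation: `sketchLookupVariants`, the
provider-prefix handling, the variant order, and the regexes (a trailing
4-8 digit group as the date/version suffix) are all assumptions.

```javascript
// Hypothetical stand-in for the lookup described above.
function sketchLookupVariants(modelId) {
  // Drop a "provider/" prefix if present (assumed from the sample matches).
  const bare = modelId.includes("/")
    ? modelId.slice(modelId.lastIndexOf("/") + 1)
    : modelId;
  const variants = [bare];
  const add = (v) => {
    if (v && !variants.includes(v)) variants.push(v);
  };

  // 1. Date/version suffix strip: "mistral-medium-2505" -> "mistral-medium".
  const undated = bare.replace(/-\d{4,8}$/, "");
  add(undated);

  // 2. Tier strip: "X-flash-lite" -> "X-flash", "Y-lite" -> "Y".
  add(undated.replace(/-lite$/, ""));

  // 3. "-latest" append for names that do not already end in it.
  if (!/-latest$/.test(undated)) add(`${undated}-latest`);

  return variants;
}

console.log(sketchLookupVariants("mistral/mistral-medium-2505"));
// prints [ 'mistral-medium-2505', 'mistral-medium', 'mistral-medium-latest' ]
```

A caller would try each key in order against the benchmark table and
record the first hit as `matchedVia`.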

Verified: coverage 62/169 (36.7%) → 65/169 (38.5%). Sample fallback matches:
  google-gemini-cli/gemini-2.5-flash-lite → gemini-2.5-flash
  mistral/mistral-medium → mistral-medium-latest
  mistral/magistral-small-2509 → magistral-small

R050 now active: full closure requires auto-benchmark of remaining
104 uncovered models via bulk-import of published scores or live eval.
This step shrinks the gap via cheap structural fallback; future work
adds the real scoring loop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 03:50:21 +02:00
Mikael Hugo
bbce6827aa feat(dispatch): wire autoLoop to DispatchLayer via SF_INLINE_DISPATCH (M010/S03)
run-unit.js: new tryInlineDispatch helper routes inline-eligible unit
types through the M010/S02 DispatchLayer when env SF_INLINE_DISPATCH=1.
Safe by default — without the env var OR for non-eligible unit types
(any unit not in INLINE_ELIGIBLE_UNITS), behavior is byte-identical
to before. With the env var set on validate-milestone / complete-milestone
/ reassess-roadmap, the autoLoop reaches runUnitInline → runSubagent
in-process, no spawn.

The helper translates DispatchLayer's {ok, output, exitCode, stderr}
into the UnitResult shape that autoLoop's resolveAgentEnd/finalize
chain expects, so downstream handling works unchanged.
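
A minimal sketch of the gate and the translation step, for illustration.
The env var, the eligible unit types, and the {ok, output, exitCode,
stderr} shape come from this message; `shouldDispatchInline`,
`toUnitResult`, and the UnitResult fields (`status`, `output`, `error`)
are assumptions, not the shape used in run-unit.js.

```javascript
// Unit types named in the commit message above.
const INLINE_ELIGIBLE_UNITS = new Set([
  "validate-milestone",
  "complete-milestone",
  "reassess-roadmap",
]);

// Hypothetical gate: inline dispatch only when the env var is exactly "1"
// AND the unit type is eligible; otherwise the caller keeps the old path.
function shouldDispatchInline(unitType, env) {
  return env.SF_INLINE_DISPATCH === "1" && INLINE_ELIGIBLE_UNITS.has(unitType);
}

// Hypothetical translation from the DispatchLayer result into a
// UnitResult-like object (field names assumed for the example).
function toUnitResult(dispatch) {
  if (dispatch.ok) {
    return { status: "completed", output: dispatch.output };
  }
  return {
    status: "failed",
    output: dispatch.output,
    // Prefer stderr; fall back to the exit code when stderr is empty.
    error: dispatch.stderr || `exit code ${dispatch.exitCode}`,
  };
}

console.log(shouldDispatchInline("reassess-roadmap", { SF_INLINE_DISPATCH: "1" })); // prints true
console.log(shouldDispatchInline("reassess-roadmap", {})); // prints false
```

Keeping the gate a pure function of the unit type and the environment is
what makes the "safe by default" claim easy to test in isolation.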

13/13 M010 tests still pass. M010/S03 marked complete.

R049 added: Multi-Provider Parallel Routing — different concurrent units
route to different LLM providers based on quota/specialty/cost/failover.
Builds on R046 + R017 + model-router scoring. Future M036.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 03:46:48 +02:00
Mikael Hugo
1b3e09d527 chore: map R041-R048 to M031-M035 + watchdog clears active.json
REQUIREMENTS.md: traceability table now has 48/48 R-entries mapped to
owning milestone slices (was 40/48; R041-R048 were unmapped). M031 owns R041-R044
(R-to-Milestone bootstrap with deep research), M032 owns R045 (R-auto-
expansion), M033 owns R046 (autonomous loop parallel dispatch), M034
owns R047 (per-R fulfillment validation), M035 owns R048 (unbroken
purpose chain).

scripts/sf-autonomous-watchdog.sh: also clears .sf/runtime/autonomous-
solver/active.json on cycle restart. Without this, a unit in
status:running from a crashed prior run made the autoLoop spin in
halt-watchdog-break (witnessed in this session: iteration 239+ in 8min
without unit progress).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 03:43:23 +02:00