Commit graph

4695 commits

Author SHA1 Message Date
Mikael Hugo
cc67970fa0 fix(sf-db): share open-DB state across module instances via globalThis
Two SQLite connections were being opened in the same Node process when
the same module loaded under two graphs:

  - the autonomous-loop side loads sf-db modules via normal ESM resolution
  - src/headless-feedback.ts re-imports them via jiti.createJiti() so the
    in-server `sf headless feedback ...` drain can call them without
    bringing the agent extension into the rpc-mode bundle

Module-level `let currentDb / currentPath / currentPid` etc. lived on
two independent module instances, so each instance opened its own
SQLite handle to .sf/sf.db. WAL mode lets readers share, but two writer
connections in the same process produced SQLITE_BUSY / writer stalls —
the hang we saw on sf-mpa4g46x and the wedged-drainer recurrence after
the server restart at 19:35.

Fix: hoist the connection slot onto globalThis under a well-known
Symbol so every module instance points at the same record. All five
fields formerly module-level become `_sf.<field>` and live in one
shared object.
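
A minimal sketch of the hoisting pattern described above, not the actual sf-db code. The key string `sf.db.sharedState` and the helper name `getSharedDbState` are hypothetical; only the three field names come from the message, and the remaining fields are left out.

```javascript
// Symbol.for() reads the process-wide symbol registry, so every module
// instance that evaluates this line gets the same key.
// The key string is an assumption; the commit only says "a well-known Symbol".
const SF_DB_STATE = Symbol.for("sf.db.sharedState");

// Return the single shared record, creating it on first use.
function getSharedDbState() {
  if (!globalThis[SF_DB_STATE]) {
    globalThis[SF_DB_STATE] = {
      currentDb: null,   // the open connection handle
      currentPath: null, // path the handle was opened on
      currentPid: null,  // pid that opened it
      // ...remaining formerly module-level fields omitted here
    };
  }
  return globalThis[SF_DB_STATE];
}

// Two "module instances" looking the slot up independently.
const instanceA = getSharedDbState();
instanceA.currentPath = ".sf/sf.db";
const instanceB = globalThis[Symbol.for("sf.db.sharedState")];
```

Both lookups resolve to the same object, so a second module graph sees the already-open handle instead of opening its own.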

Codex's original diagnosis (split module-graph DB-writer contention)
was right; I dismissed it earlier because I missed that
headless-feedback uses jiti even though rpc-mode itself doesn't import
sf-db directly.

Verification:
  - Syntax check: clean
  - sf-db-migration.test.mjs: 12/13 pass. The one failure
    (openDatabase_migrates_v27_tasks_without_created_at_through_spec_backfill
    expects schema version 72, actual 73) is unrelated — a schema
    migration landed elsewhere without bumping that test's expected
    version.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 21:47:01 +02:00
Mikael Hugo
a3469f2334 feat(detectors): wire gate-deadlock-classifier into the autonomous loop
Four changes that close the gap between the gate-deadlock-classifier
that landed in ab2c99686 and a working detection signal.

(1) Detector wrapper now returns outcome=manual-attention (not fail) when
    a deadlock fires. The whole point of detecting the deadlock is to
    escape it — returning `fail` would add another refusal and compound
    the lockout. Same precedent as periodicDetectorSweepGate.

(2) New auto/gate-refusal-recorder.js — in-process ring buffer (cap 32,
    TTL 30 min) that records UokGate refusals from the dispatcher.
    Storage is intentionally in-memory; refusals are operational signals,
    not durable state.

(3) auto/run-unit.js — calls recordGateRefusal() at the inline-route-refused
    branch, passing the rationale (already includes `[gate-id]` prefix +
    R-id status fragments the detector parses) plus unitType/unitId.

(4) detectors/periodic-runner.js — adds a `gate-deadlock` entry to the
    default detector list, pulling ctx.gateRefusals from the caller OR
    falling back to recentGateRefusals() from the recorder. ctx can also
    override requirementCoverageByMilestone + resolveMilestoneId for tests.
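
A rough sketch of the recorder from item (2), assuming a plain array as the ring and timestamps in milliseconds. `createRefusalRecorder` and its method names are hypothetical stand-ins for recordGateRefusal() / recentGateRefusals(); only the cap (32) and TTL (30 min) come from the message.

```javascript
const CAP = 32;                 // max refusals kept
const TTL_MS = 30 * 60 * 1000;  // refusals older than 30 min are ignored

function createRefusalRecorder(cap = CAP, ttlMs = TTL_MS) {
  const entries = []; // in-memory only; nothing is persisted
  return {
    // Append a refusal and evict the oldest entry once the cap is exceeded.
    record(refusal, now = Date.now()) {
      entries.push({ ...refusal, ts: now });
      if (entries.length > cap) entries.shift();
    },
    // Return only refusals still inside the TTL window.
    recent(now = Date.now()) {
      return entries.filter((e) => now - e.ts <= ttlMs);
    },
  };
}
```

Passing `now` explicitly keeps the sketch deterministic under test; a real caller would rely on the Date.now() default.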

After this change, an inline-route refusal flows:

  inlineRuntimeGate.execute → outcome=fail
    → run-unit.js records the refusal in gate-refusal-recorder
    → periodic-runner sweep picks it up via recentGateRefusals()
    → detectGateDeadlock cross-references against milestone coverage
    → if overlap: detectorsFired includes {name:"gate-deadlock", signature}
    → periodicDetectorSweepGate surfaces as manual-attention

Tests: 16 detector + 10 existing periodic-runner = 26/26 pass. The
existing periodic-runner test exercises the default detector list, so
adding the new entry is implicitly validated.

Follow-up still open: have the periodic sweep file a self_feedback entry
when the gate-deadlock detector fires, so the operator and SF's autonomous
triage both see the signal without polling logs. That belongs in the
sweep handler, not the detector — separate commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 21:19:29 +02:00
Mikael Hugo
ab2c996866 feat(detectors): gate-deadlock-classifier — Wiggums detector for R074 self-deadlock
The R074 inlineRuntimeGate refused inline dispatch for M048/S05 reassess-roadmap
because R020 and R066 are still 'active' — but those slices ARE the work that
validates R066. Autonomous mode stopped with no way to escape. Filed earlier as
sf-mpa4f9k1-jm01rc.

This detector classifies the pattern at runtime:

  parseGateRefusal(rationale)
    extracts gateId + refused requirement ids from gate-refusal text
    matching shape "[gate-id] ... R020=active R066=active ..."

  detectGateDeadlock(ctx, options)
    ctx.gateRefusals: recent gate refusal events ({rationale, unitType, unitId})
    ctx.requirementCoverageByMilestone: milestone -> R-ids in its DoD/coverage
    ctx.resolveMilestoneId: optional unit -> milestone resolver
        (default: strip after '/', require M-prefix)
    Returns { stuck, reason: "gate-deadlock", signature: {
      gateId, deadlockedRequirements, refusedUnits, examples, suggestedAction
    }} when any refused unit's milestone coverage overlaps the gate's refused
    requirements. Per-gateId throttle prevents repeat firings within 60s.

  gateDeadlockClassifierGate
    UokGate (type=verification per ADR-0075) wrapping the detector for
    integration into periodicDetectorSweepGate + post-finalize sweeps.
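
For illustration, the parse and overlap steps could look roughly like the following. This is a sketch under assumptions, not the shipped detector: the regexes, the return shape, and `findDeadlockedRequirements` are hypothetical, and throttling and the UokGate wrapper are left out.

```javascript
// Extract the gate id and the deduplicated R-ids from text shaped like
// "[gate-id] ... R020=active R066=active ...".
function parseGateRefusal(rationale) {
  const text = String(rationale ?? "");
  const gate = /^\s*\[([^\]]+)\]/.exec(text);
  const ids = new Set();
  for (const m of text.matchAll(/\b(R\d+)=\w+/g)) ids.add(m[1]);
  return { gateId: gate ? gate[1] : null, requirementIds: [...ids] };
}

// Default resolver as described: strip everything after '/', require an M prefix.
function defaultResolveMilestoneId(unitId) {
  const head = String(unitId ?? "").split("/")[0];
  return head.startsWith("M") ? head : null;
}

// A refusal is a deadlock candidate when the refused requirements overlap
// the requirements the refused unit's own milestone is meant to cover.
function findDeadlockedRequirements(refusal, coverageByMilestone) {
  const { requirementIds } = parseGateRefusal(refusal.rationale);
  const milestone = defaultResolveMilestoneId(refusal.unitId);
  const covered = new Set(milestone ? coverageByMilestone[milestone] ?? [] : []);
  return requirementIds.filter((id) => covered.has(id));
}
```

With a refusal for unit `M048/S05` naming R020 and R066, and coverage `{ M048: ["R066"] }`, the overlap is R066: the gate is blocking the work that would validate the requirement it is waiting on.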

Registered in uok/gate-registry-bootstrap.js between inlineRuntimeGate and the
existing detector chain. Also re-exported from detectors/index.js for the
common detector import surface.

Test coverage:
  - parseGateRefusal: 5 cases (inline shape, dedup, missing reqs, missing gate, empty)
  - detectGateDeadlock: 7 cases (empty input, fire-on-overlap, no-overlap,
                                 empty coverage, throttle, custom resolver,
                                 examples cap)
  - UokGate wrapper: 3 cases (contract shape, pass, fail-with-findings)
  - Threshold export sanity: 1 case
  16/16 tests pass.

The wiring from autonomous-loop output (where gate refusals are emitted) into
the detector's gateRefusals input is a follow-up — this commit lands the
detector with a stable contract and tests it can be wired against.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 21:15:21 +02:00
Mikael Hugo
acd907fec2 fix: harden sf server control loop
2026-05-17 21:13:12 +02:00
Mikael Hugo
70d89eebec feat(dev-server): auto-reload on SF extension + coding-agent + git upgrades
Before: dev-server watched packages/daemon/src + dev scripts + package.json.
SF extension source edits in src/resources/extensions/sf/ AND coding-agent
edits in packages/coding-agent/src/ did NOT trigger restart. Operators had to
restart manually after copy-resources / git pull / coding-agent edits.

Adds three watched paths:

1. packages/coding-agent/src — rpc-mode hosts sf_feedback / start_autonomous
   handlers and lives here. Edits must restart the sf child.

2. dist/resources/.sf-resource-build-stamp — atomic stamp updated by
   copy-resources. Watching the stamp (not the dist tree) avoids heavy
   recursive walk while picking up extension upgrades the moment they land.
   Idempotent: ensure-source-resources only updates the stamp when an actual
   rebuild ran, so no restart-loop on identical re-runs.

3. .git/HEAD — changes on pull / branch switch / commit. Catches upgrade
   flows where source moved outside this process.

Native (packages/native/) intentionally not watched — Rust build is 5–10 min,
auto-trigger would loop. Operator triggers native rebuild manually per the
existing ensure-source-resources policy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 21:03:49 +02:00
Mikael Hugo
1ac2527b36 chore: auto-commit after challenge
SF-Unit: M048/S05/challenge
2026-05-17 20:36:26 +02:00
Mikael Hugo
dd03d17089 chore: auto-commit after challenge
SF-Unit: M048/S04/challenge
2026-05-17 20:33:12 +02:00
Mikael Hugo
d8fd70e57f fix(sf): keep web autonomy on proven routes 2026-05-17 20:24:51 +02:00
Mikael Hugo
8f097f8dca chore: auto-commit after challenge
SF-Unit: M048/S03/challenge
2026-05-17 20:16:24 +02:00
Mikael Hugo
cf2d1a768e feat(sf): route server control through rpc 2026-05-17 20:07:36 +02:00
Mikael Hugo
1f7fa1222c build(ts): update native TypeScript 7 preview 2026-05-17 19:21:25 +02:00
Mikael Hugo
3adcb833ed refactor(sf): separate daemon from server identity 2026-05-17 19:18:33 +02:00
Mikael Hugo
187d736930 fix(sf): run source server with live web host 2026-05-17 19:13:10 +02:00
Mikael Hugo
f7b262f33a fix(sf): harden server pid lifecycle 2026-05-17 19:00:21 +02:00
Mikael Hugo
3568972059 fix(sf): use fixed server port 2026-05-17 18:55:21 +02:00
Mikael Hugo
425bba7d39 fix: restore full content of R074/R075 swarm files from worktrees
The prior commit (cc32ab79d) accidentally landed truncated versions of the
new R074 + R075 files due to a cherry-pick partial-state. Restored:

- inline-runtime-gate.js: 74→96 LOC
- inline-runtime-gate.test.mjs: 115→273 LOC (15 tests; 2 sonnet-imagined
  bootstrapGateRegistry/BOOTSTRAP_GATES tests rewritten to assert SF's
  actual side-effect-on-import registry pattern)
- adversarial-budget.js: 86→106 LOC
- adversarial-budget.test.mjs: 63→132 LOC (9 tests)
- adversarial-finding-bridge.js: 123→191 LOC
- adversarial-finding-bridge.test.mjs: 98→216 LOC (14 tests)

45/45 tests pass across the four affected files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 18:54:39 +02:00
Mikael Hugo
cc32ab79d9 fix(docs): remove stale hashline-{read,edit}.ts rows post-fold
Hashline read/edit tool wrappers were folded into Edit({match}) and
Read({format}) modes in commit ffdec0fee. The two rows in FILE-SYSTEM-MAP.md
pointed to files that no longer exist. Updated the surviving hashline.ts row
to note its new consumer relationship with Edit/Read.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 18:48:34 +02:00
Mikael Hugo
781a7e7319 chore(safety): narrow autonomous-rollback to flag-flip only (R066 D1)
Remove git-revert authority per operator decision M048-D1. Crash-loop
classifier sees runtime evidence, not commit attribution; reverting on
runtime symptoms risks reverting the wrong commit. On quarantine trigger,
smoke_gate is flipped false to halt ledger writes and a self-feedback entry
(kind: crash-loop-detected, severity: high) is filed with a manual-review
suggestion. Operator retains sole authority to git-revert.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 18:24:00 +02:00
Mikael Hugo
c2f101734f feat: enforce purpose-first adversarial review 2026-05-17 18:15:15 +02:00
Mikael Hugo
acafee06e2 fix: iter-completion-reconciler test uses relative timestamps
Test had fixed literal timestamps (TS_X = "2026-05-17T12:42:05.618Z")
that became stale once the calendar moved past them — the reconciler's
default maxAgeMs (1h, "older drift is operator territory") filtered
them out. By 3h after the original write the test failed: reconciled.length
was 0 because no entry passed the age filter.

Switch to NOW-relative timestamps (5/30/1 min back from Date.now()) so
the fixture always lands inside the default age window regardless of
when the test runs.
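
A small sketch of why the literal fixture went stale and the relative one does not. `withinAgeWindow` and `minutesAgo` are hypothetical helpers standing in for the reconciler's age filter and the test fixture builder; the 1h default is from the message above.

```javascript
const MAX_AGE_MS = 60 * 60 * 1000; // default maxAgeMs (1h)

// Keep only entries whose timestamp is within maxAgeMs of `now`.
function withinAgeWindow(entries, now = Date.now(), maxAgeMs = MAX_AGE_MS) {
  return entries.filter((e) => now - Date.parse(e.ts) <= maxAgeMs);
}

// Build an ISO timestamp `mins` minutes before `now`.
function minutesAgo(mins, now = Date.now()) {
  return new Date(now - mins * 60_000).toISOString();
}

// Fixed literal: only passes while the clock is within 1h of the literal.
const staleFixture = [{ ts: "2026-05-17T12:42:05.618Z" }];

// NOW-relative: always inside the window, whenever the test runs.
const freshFixture = [5, 30, 1].map((m) => ({ ts: minutesAgo(m) }));
```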

Sonnet #13 (tool rename) report flagged this test as failing alongside
the 4 known pre-existing failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 17:49:11 +02:00
Mikael Hugo
623af869b1 remove: SF voice IVR / ElevenLabs paging — migrated to centralcloud
Per operator-direction 2026-05-17 (R089 — Migrate Voice IVR / ElevenLabs
On-Call Paging Infrastructure out of SF). Migration target landed in
centralcloud monorepo:
  - centralcloud_core/lib/centralcloud_core/voice.ex (TwiML + ElevenLabs)
  - centralcloud_staff/lib/.../controllers/voice_controller.ex (Phoenix)
  - centralcloud_staff/lib/.../controllers/voice_prompt_controller.ex
  - centralcloud_staff/lib/.../router.ex (/twilio scope)

SF removal:
  - web/app/api/voice/route.ts
  - web/app/api/voice/prompt/route.ts
  - web/app/api/voice/ directory
  - src/tests/integration/web-voice-ivr-contract.test.ts

Operator-paging infra was historical drift in SF (per-project compiler);
belongs in centralcloud (org-level ops). R088 (Pre-Removal Test-Import
Safety Gate) not yet built — operator manually verified safety scan:
TWILIO_/ELEVENLABS_ env vars only referenced in the deleted files; no
internal SF callers; centralcloud version verified present.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 17:42:16 +02:00
Mikael Hugo
ffdec0feee fold: hashline_edit + hashline_read → Edit({match}) + Read({format}) modes
Per operator R-entry sf-mp9wo7e3-sdxqss + no-compat directive.

- Edit gains `match: "substring"|"anchor"` arg; anchor mode routes to the
  existing applyHashlineEdits logic. Substring stays default.
- Read gains `format: "plain"|"tagged"` arg; tagged mode emits LINE#HASH
  prefixes via formatHashLines.
- Delete hashline-edit.ts, hashline-read.ts. KEEP hashline.ts (helpers
  are now Edit/Read internals).
- tools/index.ts: drop the two tools + the createHashlineCodingTools
  preset.
- agent-session.ts: setEditMode no longer swaps tool instances (single
  tool surface; mode preserved for system-prompt context only).
- sdk.ts + index.ts: remove hashline tool re-exports.
- headless-ui.ts + test: remove hashline_edit case.

Net agent-visible tool surface: -2 tools. Capability preserved as modes.
No backward-compat alias for the removed tool names.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 17:39:59 +02:00
Mikael Hugo
d03758d803 feat: replace launchd with systemd user-unit install path
Operator-direction 2026-05-17 "we will never use mac" — no compat
preservation. Single-cutover replacement.

- new packages/daemon/src/systemd.ts: install/uninstall/status using
  systemctl --user + ~/.config/systemd/user/sf-server.service
- new packages/daemon/src/systemd.test.ts: ports launchd tests, same
  shape, mocked systemctl via RunCommandFn injection + SF_SYSTEMD_USER_DIR
  env override for real filesystem tests
- cli-main.ts: switch import + update help text + status messages
- index.ts: re-export systemd module (installSystemdUnit, uninstallSystemdUnit,
  systemdUnitStatus, generateUnit, getServicePath, SystemdStatus, SystemdUnitOptions)
- DELETED: launchd.ts (253 LOC), launchd.test.ts (379 LOC)
- docs/dev/drafts/M053-per-repo-supervisor.md: remove "launchd" mention
- CHANGELOG.md: document systemd-only install path

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-17 17:33:34 +02:00
Mikael Hugo
44915b73d4 rename: tool names → Claude-Code-aligned (Bash/Read/Write/Edit/Grep/Glob/LS); remove run_command/read_output/hashline duplicates
Per operator-direction 2026-05-17 (sf-mp9w20y1-nld9hc + "DONT KEE COMPAT" stance + adversarial-review override). Cross-vendor frontier LLMs are trained on PascalCase Claude Code tool names; calling them by SF's lowercase + novel names increases tool-call error rates. Single atomic cutover, no aliases. Internal implementations preserved; only the LLM-facing names + registrations change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 17:26:36 +02:00
Mikael Hugo
57fef5979d feat: make sf server the operator entrypoint 2026-05-17 17:23:46 +02:00
Mikael Hugo
c046ff9a6c fix: auto-rebuild workspace packages when src is newer than dist
Without this, edits to packages/coding-agent/src/* (or any other workspace
package src) silently land while the dist stays stale — agents continue
loading the old compiled JS and operators see "why didn't my edit take
effect?" symptoms. Observed 2026-05-17 while wiring in the AST tools: vitest
(reading TS source) passed; runtime smoke test against dist failed because
no auto-rebuild fired.

Extends ensure-source-resources.cjs (which sf-from-source runs on every
launch) to also check workspace packages: agent-core, ai, coding-agent,
daemon, google-gemini-cli-provider, openai-codex-provider, rpc-client, tui.
For each, compare latest src mtime vs latest dist mtime (with a 100ms grace
window). If src is newer, run `npm run build -w @singularity-forge/<pkg>`.

Excludes:
  - packages/native (Rust build is 5–10 min; trigger manually via
    `node rust-engine/scripts/build.js --dev`).
  - Any package in SF_SKIP_WORKSPACE_AUTOBUILD (comma-separated).
  - Whole step disabled by SF_SKIP_WORKSPACE_AUTOBUILD=all.
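
The decision logic can be sketched as pure functions, assuming mtimes in milliseconds and that the grace window is added on the dist side (the message does not say which side it applies to). The function names are hypothetical; the real script also walks the src/dist trees and shells out to npm, which is omitted here.

```javascript
const GRACE_MS = 100;

// src counts as newer only if it beats dist by more than the grace window.
function needsRebuild(latestSrcMtimeMs, latestDistMtimeMs, graceMs = GRACE_MS) {
  return latestSrcMtimeMs > latestDistMtimeMs + graceMs;
}

// Parse SF_SKIP_WORKSPACE_AUTOBUILD: comma-separated names, or "all".
function parseSkipList(envValue) {
  const names = String(envValue ?? "")
    .split(",")
    .map((s) => s.trim())
    .filter(Boolean);
  return { skipAll: names.includes("all"), skipped: new Set(names) };
}

// Combine the exclusions with the mtime check for one workspace package.
function shouldAutoBuild(pkg, srcMtimeMs, distMtimeMs, envValue) {
  if (pkg === "native") return false; // Rust build is triggered manually
  const { skipAll, skipped } = parseSkipList(envValue);
  if (skipAll || skipped.has(pkg)) return false;
  return needsRebuild(srcMtimeMs, distMtimeMs);
}
```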

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 17:19:25 +02:00
Mikael Hugo
6e3b3d3c54 feat: add Serena-style AST tools (ReplaceSymbol, InsertAroundSymbol, AstGrep)
Wraps the native AST primitives from @singularity-forge/native/{edit,ast} as
LLM tools so agents can do tree-sitter-anchored code edits instead of
substring-based Edit or line-anchor hashline.

- replace-symbol.ts (+117): wraps replaceSymbol(file, symbolPath, newBody);
  matches function/class/method declarations via tree-sitter, returns
  matched=false sentinel when the symbol isn't located.
- insert-around-symbol.ts (+122): wraps insertAroundSymbol with position
  enum BeforeDecl/AfterDecl/AtBodyStart/AtBodyEnd.
- ast-grep.ts (+152): wraps astGrep for pattern matching across files with
  $VAR/$$$ARGS meta-variables; returns ranked matches with byte/line/column
  + captured meta-variable bindings.

Each tool:
  - typebox schema matching the existing AgentTool pattern (edit.ts)
  - notifyFileChanged() into the LSP layer on write ops
  - resolveToCwd() for path normalization
  - catches native errors + returns isError result with the
    NativeUnavailableError message pointing operators to
    `nix develop` + `node rust-engine/scripts/build.js --dev`

Wire-in:
- tools/index.ts: re-exports + imports + entries in `allTools` map and
  createAllTools() factory.
- extension-manifest.json: ReplaceSymbol / InsertAroundSymbol / AstGrep
  appended to provides.tools so SF extension agents see them.

Higher value than substring/line-anchor for code in tree-sitter-supported
languages (TS/JS/TSX/Python/Rust). Edit + hashline remain for non-code
files. PascalCase names per the Claude-Code-aligned convention from
sf-mp9w20y1-nld9hc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 17:14:12 +02:00
Mikael Hugo
19b10eb67c feat: make sf-server own swarm registry sync 2026-05-17 17:05:16 +02:00
Mikael Hugo
33560c9b09 fix: auto-open configured web project 2026-05-17 16:38:11 +02:00
Mikael Hugo
0f5a606923 fix: native loader — loud banner on fallback + structured load log + helpers
- Stderr banner on fallback now multi-line with concrete fix steps
  (nix develop → node rust-engine/scripts/build.js --dev) so an operator
  scanning a 280MB cycle log can't miss it. The old single-line warning
  was easy to overlook (today's "WHY HAS NOBODY SEEN IF LOUD" check).
- Structured load record per process at .sf/runtime/native-engine-load.jsonl:
  {ts, pid, platformTag, source, binaryPath, sha256, loaded, errors?}.
  Lets operators audit which binary each SF process loaded — and detect
  ABI mismatches across daemon↔worker boundaries when different sha256
  values appear for the same platformTag (the "rare but real" concern
  flagged earlier today).
- Proxy error message now points to the build/install commands instead
  of just saying "not available". NativeUnavailableError is named for
  consumer try/catch chains.
- Fixed _loadedSuccessfully ordering — was set true BEFORE the require,
  leaving stale-true after a failed first attempt.
- New helpers isNativeLoaded(), nativeBinaryPath(), nativeBinarySha256()
  for diagnostic surfaces (sf headless query, doctor checks).
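
One way an operator could audit the load log for the mismatch case, sketched under the assumption that each line of native-engine-load.jsonl is one JSON object with the fields listed above. `parseLoadLog` and `findAbiMismatches` are hypothetical helpers, not part of the loader.

```javascript
// Parse JSONL text into records, skipping blank lines.
function parseLoadLog(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

// Report every platformTag for which successfully loaded processes
// saw more than one distinct binary hash.
function findAbiMismatches(records) {
  const hashesByTag = new Map();
  for (const r of records) {
    if (!r.loaded || !r.sha256) continue;
    if (!hashesByTag.has(r.platformTag)) hashesByTag.set(r.platformTag, new Set());
    hashesByTag.get(r.platformTag).add(r.sha256);
  }
  return [...hashesByTag]
    .filter(([, hashes]) => hashes.size > 1)
    .map(([platformTag, hashes]) => ({ platformTag, sha256s: [...hashes] }));
}
```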

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 16:34:02 +02:00
Mikael Hugo
f87e9bc0d9 fix: attach web server to project without token 2026-05-17 16:25:47 +02:00
Mikael Hugo
eeb80bbbdd fix: register 6 detector gates + add adversarial-finding kind + watchdog log rotation
Three concrete fixes from open self-feedback assessment 2026-05-17:

- uok/gate-registry-bootstrap.js: register all 6 R081 detector gates
  (same-unit-loop, zero-progress, repeated-feedback-kind, artifact-flap,
  stale-lock, periodic-detector-sweep) alongside drift-detection and
  iter-completion-reconciler. Closes the gap reported by
  sf-mp9udspu-fsf7si — bootstrap previously registered 2 of 8 gates.

- self-feedback.js ALLOWED_KIND_DOMAINS: add `adversarial-finding`.
  Closes gap reported by sf-mp9u4i25-fczmcj — R075 (autonomous
  adversarial review) challenge unit had no kind to file findings under.

- sf-autonomous-watchdog.sh: delete watchdog-run-*.log files older than
  60 minutes at each cycle start. Without rotation .sf/ grew to 1.9 GB
  in 24h (today's snapshot). 60 min retention captures last cycle for
  post-incident triage; older state is already in DB + iterations.jsonl.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 16:08:05 +02:00
Mikael Hugo
077fd0a2a7 remove A2A; swarm enrollment + status projection + web swarms view; headless refactor
- A2A removal per M054/R071 cancellation 2026-05-17 (-2294 lines):
  - docs/plans/A2A_ADOPTION_PLAN.md, MISSION-A2A-ADOPTION.md deleted
  - src/resources/extensions/sf/uok/a2a-agent-server.js,
    a2a-transport.js deleted
  - tests/a2a-auth.test.mjs deleted
  - swarm-dispatch.js purged of A2A-conditional code paths
- New: scripts/sf-swarm-enroll.mjs + test (operator-facing swarm
  enrollment, replaces former A2A pairing flow)
- New: src/status-projection.ts + test, web/lib/swarm-status.ts +
  test, web/components/sf/swarms-view.tsx, web/app/api/swarms/
  (web swarms-view surface — direct visibility into running swarm
  state without requiring TUI; aligns with project_tui_deprecating)
- headless-{answers,query,ui,headless}.ts: coordinated tweaks
  consistent with the headless-as-default direction (R124 proposal)
- docs/dev/drafts/M053-per-repo-supervisor.md: design refinement
- .sf/REQUIREMENTS.md: small text fixes (6/6 churn)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 16:04:06 +02:00
Mikael Hugo
ecac4328bd backfill: restore 48 R-entries from REQUIREMENTS.md to DB
DB corruption recovery (2026-05-17) rebuilt the requirements table
from valid btree pages; ROWID-by-ROWID scan found 60 of 68 R-entries.
The other 8 + 40 historical (R006-R009 + R022-R065) were never in the
DB to begin with — they had drifted into REQUIREMENTS.md only. Backfill
script parsed each ### Rxxx — title block and INSERTed into the
requirements table with proper class/status/description/why/notes
fields. Final DB count: 75 → 123, integrity_check ok, MD↔DB parity
restored.

The .gitignore tweak from the meta-supervisor commit landed earlier;
no functional change here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 15:57:45 +02:00
Mikael Hugo
1cd7890d64 fix: auto-version-bump swallowed operator-direction; ptrmap + lock guards
- sf-db-schema.js: auto_vacuum INCREMENTAL → NONE. The "Bad ptr map entry"
  corruption on 2026-05-17 was incremental-autovacuum ptrmap drift under
  concurrent writers. Recovered DB has no ptrmap; future fresh DBs must
  match. incremental_vacuum() callers in sf-db-core.js become no-ops.
- bin/sf-from-source: lock allowlist extended to skip readonly sf headless
  subcommands (--help, query, status, usage, reflect, feedback list,
  triage --list/--json). Previously every sf headless invocation tried
  to acquire the project lock — operator couldn't even inspect SF state
  while autonomous was running.
- self-feedback.js triageBlockedEntries: (1) treat empty/null/undefined
  sfVersion as unknown, not zero; (2) exempt operator-direction kinds
  (improvement-idea, architecture-defect, missing-feature, gap) from
  auto-version-bump close. Both were needed to prevent the R124 incident
  recurring.
- headless-feedback.ts handleAdd: populate sfVersion via getCurrentSfVersion
  + detect repoIdentity via isForgeRepo, not hardcoded "external"/"". An
  empty sfVersion sorts below any real semver, so the resolver retry-closed
  every operator-filed entry within seconds.
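
The two triage guards can be sketched like this. It is an illustration under assumptions: `parseSemver`, `isVersionStale`, and `canAutoClose` are hypothetical names, versions are compared on major.minor.patch only, and the real triageBlockedEntries has more inputs than shown.

```javascript
// Parse "major.minor.patch"; anything else (empty, null, undefined) is unknown.
function parseSemver(v) {
  const m = /^(\d+)\.(\d+)\.(\d+)/.exec(String(v ?? "").trim());
  return m ? m.slice(1).map(Number) : null;
}

// Guard (1): an unknown version is never treated as older than the current one.
function isVersionStale(entryVersion, currentVersion) {
  const a = parseSemver(entryVersion);
  const b = parseSemver(currentVersion);
  if (!a || !b) return false;
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] < b[i];
  }
  return false;
}

// Guard (2): operator-direction kinds are exempt from the version-bump close.
const OPERATOR_DIRECTION_KINDS = new Set([
  "improvement-idea",
  "architecture-defect",
  "missing-feature",
  "gap",
]);

function canAutoClose(entry, currentVersion) {
  if (OPERATOR_DIRECTION_KINDS.has(entry.kind)) return false;
  return isVersionStale(entry.sfVersion, currentVersion);
}
```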

Net effect: R124 proposal (filed via sf headless feedback add) is no
longer auto-resolved as version-stale. Larger architectural fix (single-
writer SF daemon / RPC for all DB writes — M040 territory) tracked as
follow-up R-entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 15:51:36 +02:00
Mikael Hugo
87e9729c13 fix: shard sift search and project requirements 2026-05-17 15:38:55 +02:00
Mikael Hugo
3e5b6fc511 fix: reconcile iteration completion drift 2026-05-17 15:06:40 +02:00
Mikael Hugo
f643272a91 fix: preserve requirements projection fidelity 2026-05-17 15:02:25 +02:00
Mikael Hugo
4289946e11 fix: clear task verification status on revert 2026-05-17 14:59:20 +02:00
Mikael Hugo
3e002ca698 refactor: consolidate loop signals and gate registry wiring 2026-05-17 14:45:12 +02:00
Mikael Hugo
4d2266e57d fix: consolidate loop supervision gates 2026-05-17 14:35:40 +02:00
Mikael Hugo
625a830d2f wire R053-R056 detectors into auto-runaway-guard + R081 UokGate retrofit
- uok/auto-runaway-guard.js: invoke runDetectorSweep alongside the existing
  zero-progress check (fire-and-forget for sync-tick compatibility; results
  consumed on next tick via sweepState ring buffer). Passes unitId,
  unitMetrics, sessionFingerprint, lockPaths, and a 30-min DB-windowed
  recentFeedback slice.
- detectors/{same-unit-loop, zero-progress, repeated-feedback-kind,
  artifact-flap, stale-lock, periodic-runner}.js: each detector now also
  exports a UokGate wrapper (id/type/execute -> GateResult per ADR-0075).
  Plain detector functions kept for existing consumers.
- detectors/index.js: single import surface for the gate exports.
- detector-stale-lock.test.mjs (9), detector-periodic-runner.test.mjs (10),
  detector-gates-contract.test.mjs: fills the R055/R056 test gap filed
  earlier today + proves UokGate contract conformance.
- 41/41 detector tests green; copy-resources clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 14:18:54 +02:00
Mikael Hugo
527ebfcaa4 gitignore meta-supervisor runtime state
The previous commit accidentally tracked .sf/meta-status.json + .sf/meta-supervisor.pid (transient runtime files written by scripts/sf-meta-supervisor.mjs each tick). Mirror the existing .sf/runtime/ ignore pattern for these top-level meta-* files; the daemon keeps writing them on disk but git no longer tracks the churn.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 14:09:24 +02:00
Mikael Hugo
d5664f7142 meta-supervisor (node daemon) + R091 triage gate + R091-R094 spec
- scripts/sf-meta-supervisor.mjs: pure-node daemon supervising
  scripts/sf-autonomous-watchdog.sh. Tick=60s, restarts watchdog if dead,
  emits .sf/meta-status.json, halt via .sf/meta-supervisor.halt. Uses
  only node builtins (no SF dist deps) so it survives dist breakage.
- src/headless.ts: R091 — gate the per-cycle handleTriage call on a time
  interval (SF_TRIAGE_INTERVAL_MS, default 30 min) and bump batch size
  (SF_TRIAGE_MAX, default 25, was 5). Drops the ~8min triage hit from
  every cycle while letting daily drain capacity rise.
- .sf/REQUIREMENTS.md: R091 (triage sidecar) + R092 (PDD-completeness
  as routing signal) + R093 (pin model per orchestration agent.yaml) +
  R094 (swarm-role model tier specialization — 8 roles already exist
  in uok/swarm-roles.js; model field per role missing).
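
The R091 gate from the src/headless.ts item can be sketched as follows. Only the env var names and defaults come from the message; `readPositiveInt`, `triagePolicy`, and `shouldRunTriage` are hypothetical names, and how the last-run time is stored is not specified here.

```javascript
// Read a positive integer from an env string, falling back when unset or invalid.
function readPositiveInt(raw, fallback) {
  const n = Number.parseInt(raw ?? "", 10);
  return Number.isFinite(n) && n > 0 ? n : fallback;
}

// Resolve interval and batch size from the environment.
function triagePolicy(env) {
  return {
    intervalMs: readPositiveInt(env.SF_TRIAGE_INTERVAL_MS, 30 * 60 * 1000),
    max: readPositiveInt(env.SF_TRIAGE_MAX, 25),
  };
}

// Run triage on the first cycle, then only once the interval has elapsed.
function shouldRunTriage(lastRunMs, nowMs, intervalMs) {
  return lastRunMs == null || nowMs - lastRunMs >= intervalMs;
}
```

A per-cycle caller would check `shouldRunTriage(...)` before calling handleTriage and record the run time afterwards.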

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 14:08:30 +02:00
Mikael Hugo
e93d17a3b4 spec + ADR annotations + dormant-code cleanup
- .sf/REQUIREMENTS.md: today's R-entries (R066..R090) covering parallel-rescue
  targets — bus deliver verify, drift detection gate, PDD typed contracts,
  lane split, Wiggums detector family, repo supervisor design.
- ADR-014/019/020: SF-first banners (operator direction: get SF working before
  ACE/wire-architecture changes land downstream).
- docs/records + drafts: 2026-05-07 strategy + cli-agent survey index refresh;
  SF/ACE pattern draft annotations.
- roadmap-mutations.js removed (dormant — never imported; reachable shape
  verified against handler-relative + dynamic import audit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 13:45:00 +02:00
Mikael Hugo
d2ff4e84ba land 6 parallel codex/sonnet rescue outputs
- R016 swarm bus deliver verify (uok/swarm-dispatch.js + test): _busDispatch now
  force-refreshes target inbox and verifies messageId visibility before returning
  ok:true; ack-without-deliver class closed.
- R082 drift detection UokGate (uok/drift-detection-gate.js + test): single-task
  + sweep scope; 3 drift classes (artifact-missing, prose-status mismatch,
  broken-import); follows ADR-0075 id/type/execute -> GateResult contract.
- R087 PDD typed contracts (engine-types.js + test): ADR-0000 8 PDD fields +
  7-dim run-control policy + ADR-0075 GateResult typedefs and validators.
- R090 planning-execute lane split (auto/unit-lanes.js + auto/loop.js + 2 tests):
  lane classifier + capacity-aware tick dispatcher; SF_LANES=0 fallback is
  byte-equivalent to pre-R090.
- R053 + R054 Wiggums detectors (detectors/repeated-feedback-kind.js +
  detectors/artifact-flap.js + 2 tests); R055 stale-lock + R056 periodic-runner
  source landed without tests (gap filed as self-feedback).
- M053 per-repo supervisor design + skeleton (supervisor/repo-supervisor.js +
  test + design doc): RepoSupervisor class, zero module-global state, tick
  stub, failure isolation; M056 trust-boundary called out as follow-up.

85/85 tests green across the 8 new test files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 13:44:42 +02:00
Mikael Hugo
eaac4f0bd3 sf snapshot: uncommitted changes after 187m inactivity 2026-05-17 12:04:55 +02:00
Mikael Hugo
6f5e2f0aa9 spec(R060-R065): backup, schema-migration, secrets lifecycle, rate budget, cache policy, deprecation + M042-M047 2026-05-17 08:57:30 +02:00
Mikael Hugo
09687ccd30 spec(R059): typed entity vocabulary (R/A/D/M/S/T/F/G/K/P/E)
Operator: "should we have a for adrs and d for decisions? any other
type we should habe?" + "and so we use" — yes, file it and adopt.

Adds A (ADR), D (Decision), F (Finding), G (Gate), K (Knowledge),
P (Pattern), E (Evidence) prefixes to the existing R/M/S/T set.
Each gets a source-of-truth location and a mechanical migration path.

R048 (unbroken purpose chain) + R047 (per-R fulfillment validation)
both require typed cross-references to verify integrity. Without
typed IDs, "this M is covered by R, S, T, A, D" is unverifiable
free-text.

Owning milestone M041 (also added) splits the migration into 6
slices: rename ADRs, add D-IDs to DECISIONS, backfill F/G/K-IDs in
DB tables, doctor cross-link integrity check, lint for SF-authored
typed references.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 08:52:37 +02:00
Mikael Hugo
72c2ecb2b2 spec(R058): DB writes via MessageBus (single-writer actor)
Operator: "db via bus so we dont crash it?" — yes, this is the natural
fit for the lane model (R057 system lane + R046 multi-unit lanes).
Concurrent lanes writing directly to SQLite hit SQLITE_BUSY or worse;
mediating all writes through a single MessageBus-owned db-writer actor
enforces the single-writer invariant operationally.

Reads stay direct (multi-reader WAL is safe). Migration is gradual
(table by table). Also serves as the substrate for multi-repo
federation (R028 / ADR-019/020) where cross-process DB sharing needs
message-based access anyway.
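
A toy, in-process sketch of the single-writer idea, assuming a synchronous mailbox rather than the actual MessageBus. `createDbWriter` and `applyWrite` are hypothetical; the point is only that every write goes through one drain loop, so two writes never run concurrently even when a write triggers another.

```javascript
function createDbWriter(applyWrite) {
  const mailbox = [];
  let draining = false;

  // Apply queued writes one at a time, in arrival order.
  function drain() {
    if (draining) return; // a send() made during a write just enqueues
    draining = true;
    try {
      while (mailbox.length > 0) applyWrite(mailbox.shift());
    } finally {
      draining = false;
    }
  }

  return {
    // Lanes call send() instead of touching the database themselves.
    send(message) {
      mailbox.push(message);
      drain();
    },
  };
}
```

Reads are not routed through the writer in this sketch, matching the "reads stay direct" note above.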

Future M040.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 08:48:40 +02:00