Two SQLite connections were being opened in the same Node process when
the same module was loaded under two module graphs:
- the autonomous-loop side loads sf-db modules via normal ESM resolution
- src/headless-feedback.ts re-imports them via jiti.createJiti() so the
in-server `sf headless feedback ...` drain can call them without
bringing the agent extension into the rpc-mode bundle
Module-level `let currentDb / currentPath / currentPid` etc. lived on
two independent module instances, so each instance opened its own
SQLite handle to .sf/sf.db. WAL mode lets readers share, but two writer
connections in the same process produced SQLITE_BUSY / writer stalls —
the hang we saw on sf-mpa4g46x and the wedged-drainer recurrence after
the server restart at 19:35.
Fix: hoist the connection slot onto globalThis under a well-known
Symbol so every module instance points at the same record. All five
fields formerly module-level become `_sf.<field>` and live in one
shared object.
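
A minimal sketch of the hoisting pattern, for illustration only. The
symbol key string, the helper names (getSharedSlot, getOrOpen) and the
openFn parameter are hypothetical; only three of the five fields are
shown because the commit does not name the other two.

```javascript
// Hypothetical key; the commit only says "a well-known Symbol".
// Symbol.for() reads the global symbol registry, so two independent
// module instances (ESM graph and jiti graph) resolve the same key.
const SLOT_KEY = Symbol.for("sf.db.connection-slot");

// Return the process-wide slot, creating it on first use.
function getSharedSlot() {
  if (!globalThis[SLOT_KEY]) {
    globalThis[SLOT_KEY] = {
      currentDb: null,
      currentPath: null,
      currentPid: null,
      // remaining fields elided; not named in the commit message
    };
  }
  return globalThis[SLOT_KEY];
}

// Open at most once per process. `openFn` stands in for the real
// SQLite open call, which is not shown here.
function getOrOpen(path, openFn) {
  const _sf = getSharedSlot();
  if (_sf.currentDb === null) {
    _sf.currentDb = openFn(path);
    _sf.currentPath = path;
    _sf.currentPid = typeof process !== "undefined" ? process.pid : null;
  }
  return _sf.currentDb;
}
```

With this shape, a second module instance calling getOrOpen() gets the
handle the first instance opened instead of creating a second writer.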
Codex's original diagnosis (split module-graph DB-writer contention)
was right; I dismissed it earlier because I missed that
headless-feedback uses jiti even though rpc-mode itself doesn't import
sf-db directly.
Verification:
- Syntax check: clean
- sf-db-migration.test.mjs: 12/13 pass. The one failure
(openDatabase_migrates_v27_tasks_without_created_at_through_spec_backfill
expects schema version 72, actual 73) is unrelated — a schema
migration landed elsewhere without bumping that test's expected
version.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four changes that close the gap between the gate-deadlock-classifier
that landed in ab2c99686 and a working detection signal.
(1) Detector wrapper now returns outcome=manual-attention (not fail) when
a deadlock fires. The whole point of detecting the deadlock is to
escape it — returning `fail` would add another refusal and compound
the lockout. Same precedent as periodicDetectorSweepGate.
(2) New auto/gate-refusal-recorder.js — in-process ring buffer (cap 32,
TTL 30 min) that records UokGate refusals from the dispatcher.
Storage is intentionally in-memory; refusals are operational signals,
not durable state.
(3) auto/run-unit.js — calls recordGateRefusal() at the inline-route-refused
branch, passing the rationale (already includes `[gate-id]` prefix +
R-id status fragments the detector parses) plus unitType/unitId.
(4) detectors/periodic-runner.js — adds a `gate-deadlock` entry to the
default detector list, pulling ctx.gateRefusals from the caller OR
falling back to recentGateRefusals() from the recorder. ctx can also
override requirementCoverageByMilestone + resolveMilestoneId for tests.
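
A rough sketch of the recorder in (2), assuming a plain array used as a
ring buffer. The cap and TTL come from this commit; the argument shape,
the `ts` field and the injectable `now` parameter are assumptions made
for illustration, not the actual gate-refusal-recorder.js code.

```javascript
const CAP = 32;                 // ring-buffer capacity
const TTL_MS = 30 * 60 * 1000;  // 30-minute TTL
const refusals = [];            // in-memory only, never persisted

// Record one refusal; the oldest entry is dropped once CAP is exceeded.
function recordGateRefusal({ rationale, unitType, unitId }, now = Date.now()) {
  refusals.push({ rationale, unitType, unitId, ts: now });
  if (refusals.length > CAP) refusals.shift();
}

// Return the refusals that are still inside the TTL window.
function recentGateRefusals(now = Date.now()) {
  return refusals.filter((r) => now - r.ts <= TTL_MS);
}
```

Passing `now` explicitly keeps the TTL behaviour testable without
faking the clock.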
After this change, an inline-route refusal flows:
inlineRuntimeGate.execute → outcome=fail
→ run-unit.js records the refusal in gate-refusal-recorder
→ periodic-runner sweep picks it up via recentGateRefusals()
→ detectGateDeadlock cross-references against milestone coverage
→ if overlap: detectorsFired includes {name:"gate-deadlock", signature}
→ periodicDetectorSweepGate surfaces as manual-attention
Tests: 16 detector + 10 existing periodic-runner = 26/26 pass. The
existing periodic-runner test exercises the default detector list, so
adding the new entry is implicitly validated.
Follow-up still open: have the periodic sweep file a self_feedback entry
when the gate-deadlock detector fires, so the operator and SF's autonomous
triage both see the signal without polling logs. That belongs in the
sweep handler, not the detector — separate commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The R074 inlineRuntimeGate refused inline dispatch for M048/S05 reassess-roadmap
because R020 and R066 are still 'active' — but those slices ARE the work that
validates R066. Autonomous mode stopped with no way to escape. Filed earlier as
sf-mpa4f9k1-jm01rc.
This detector classifies the pattern at runtime:
parseGateRefusal(rationale)
extracts gateId + refused requirement ids from gate-refusal text
matching shape "[gate-id] ... R020=active R066=active ..."
detectGateDeadlock(ctx, options)
ctx.gateRefusals: recent gate refusal events ({rationale, unitType, unitId})
ctx.requirementCoverageByMilestone: milestone -> R-ids in its DoD/coverage
ctx.resolveMilestoneId: optional unit -> milestone resolver
(default: strip after '/', require M-prefix)
Returns { stuck, reason: "gate-deadlock", signature: {
gateId, deadlockedRequirements, refusedUnits, examples, suggestedAction
}} when any refused unit's milestone coverage overlaps the gate's refused
requirements. Per-gateId throttle prevents repeat firings within 60s.
gateDeadlockClassifierGate
UokGate (type=verification per ADR-0075) wrapping the detector for
integration into periodicDetectorSweepGate + post-finalize sweeps.
Registered in uok/gate-registry-bootstrap.js between inlineRuntimeGate and the
existing detector chain. Also re-exported from detectors/index.js for the
common detector import surface.
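
For illustration, the parsing and overlap check could look roughly like
the sketch below. The regexes, the return field names, returning null
for unparseable input, and findDeadlockedRequirements() are all
assumptions; the throttle and the signature object are left out.

```javascript
// Extract the gate id and the refused requirement ids from text shaped
// like "[gate-id] ... R020=active R066=active ...". Duplicates collapse.
function parseGateRefusal(rationale) {
  if (typeof rationale !== "string") return null;
  const gate = rationale.match(/^\s*\[([^\]]+)\]/);
  if (!gate) return null;
  const ids = new Set();
  for (const m of rationale.matchAll(/\b(R\d+)=\w+/g)) ids.add(m[1]);
  if (ids.size === 0) return null;
  return { gateId: gate[1], refusedRequirements: [...ids] };
}

// Default resolver as described above: strip after '/', require M-prefix.
function defaultResolveMilestoneId(unitId) {
  const head = String(unitId).split("/")[0];
  return /^M\d+/.test(head) ? head : null;
}

// A refusal is deadlock-shaped when the refused unit's own milestone
// covers at least one of the requirements the gate refused on.
function findDeadlockedRequirements(refusal, coverageByMilestone) {
  const parsed = parseGateRefusal(refusal.rationale);
  const milestone = defaultResolveMilestoneId(refusal.unitId);
  if (!parsed || !milestone) return [];
  const covered = new Set(coverageByMilestone[milestone] ?? []);
  return parsed.refusedRequirements.filter((id) => covered.has(id));
}
```

For the incident above, a refusal on unit M048/S05 naming R066, with
R066 in M048's coverage, yields a non-empty overlap.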
Test coverage:
- parseGateRefusal: 5 cases (inline shape, dedup, missing reqs, missing gate, empty)
- detectGateDeadlock: 7 cases (empty input, fire-on-overlap, no-overlap,
empty coverage, throttle, custom resolver,
examples cap)
- UokGate wrapper: 3 cases (contract shape, pass, fail-with-findings)
- Threshold export sanity: 1 case
16/16 tests pass.
The wiring from autonomous-loop output (where gate refusals are emitted) into
the detector's gateRefusals input is a follow-up — this commit lands the
detector with a stable contract and tests it can be wired against.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Before: dev-server watched packages/daemon/src + dev scripts + package.json.
SF extension source edits in src/resources/extensions/sf/ AND coding-agent
edits in packages/coding-agent/src/ did NOT trigger restart. Operators had to
restart manually after copy-resources / git pull / coding-agent edits.
Adds three watched paths:
1. packages/coding-agent/src: rpc-mode, which hosts the sf_feedback /
start_autonomous handlers, lives here. Edits must restart the sf child.
2. dist/resources/.sf-resource-build-stamp — atomic stamp updated by
copy-resources. Watching the stamp (not the dist tree) avoids heavy
recursive walk while picking up extension upgrades the moment they land.
Idempotent: ensure-source-resources only updates the stamp when an actual
rebuild ran, so no restart-loop on identical re-runs.
3. .git/HEAD — changes on pull / branch switch / commit. Catches upgrade
flows where source moved outside this process.
Native (packages/native/) intentionally not watched — Rust build is 5–10 min,
auto-trigger would loop. Operator triggers native rebuild manually per the
existing ensure-source-resources policy.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The prior commit (cc32ab79d) accidentally landed truncated versions of the
new R074 + R075 files because a cherry-pick was left in a partial state.
Restored:
- inline-runtime-gate.js: 74→96 LOC
- inline-runtime-gate.test.mjs: 115→273 LOC (15 tests; 2 sonnet-imagined
bootstrapGateRegistry/BOOTSTRAP_GATES tests rewritten to assert SF's
actual side-effect-on-import registry pattern)
- adversarial-budget.js: 86→106 LOC
- adversarial-budget.test.mjs: 63→132 LOC (9 tests)
- adversarial-finding-bridge.js: 123→191 LOC
- adversarial-finding-bridge.test.mjs: 98→216 LOC (14 tests)
45/45 tests pass across the four affected files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Hashline read/edit tool wrappers were folded into Edit({match}) and
Read({format}) modes in commit ffdec0fee. The two rows in FILE-SYSTEM-MAP.md
pointed to files that no longer exist. Updated the surviving hashline.ts row
to note its new consumer relationship with Edit/Read.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Remove git-revert authority per operator decision M048-D1. Crash-loop
classifier sees runtime evidence, not commit attribution; reverting on
runtime symptoms risks reverting the wrong commit. On quarantine trigger,
smoke_gate is flipped false to halt ledger writes and a self-feedback entry
(kind: crash-loop-detected, severity: high) is filed with a manual-review
suggestion. Operator retains sole authority to git-revert.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Test had fixed literal timestamps (TS_X = "2026-05-17T12:42:05.618Z")
that became stale once the calendar moved past them — the reconciler's
default maxAgeMs (1h, "older drift is operator territory") filtered
them out. By 3h after the original write the test failed: reconciled.length
was 0 because no entry passed the age filter.
Switch to NOW-relative timestamps (5/30/1 min back from Date.now()) so
the fixture always lands inside the default age window regardless of
when the test runs.
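
A small sketch of the difference, assuming a stand-in age filter; the
helper names (minutesAgo, withinAgeWindow) are hypothetical and the real
reconciler is not reproduced here.

```javascript
const MAX_AGE_MS = 60 * 60 * 1000; // reconciler default window (1h)

// Build an ISO timestamp `minutes` before `now`.
function minutesAgo(minutes, now = Date.now()) {
  return new Date(now - minutes * 60 * 1000).toISOString();
}

// Stand-in for the reconciler's age filter: keep entries no older
// than maxAgeMs relative to `now`.
function withinAgeWindow(entries, now = Date.now(), maxAgeMs = MAX_AGE_MS) {
  return entries.filter((e) => now - Date.parse(e.ts) <= maxAgeMs);
}

// NOW-relative fixture: always inside the window, whenever it runs.
function buildFixture(now = Date.now()) {
  return [5, 30, 1].map((m) => ({ ts: minutesAgo(m, now) }));
}
```

A literal such as "2026-05-17T12:42:05.618Z" passes the same filter only
during the first hour after that instant, which is why the old fixture
went stale.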
Sonnet #13 (tool rename) report flagged this test as failing alongside
the 4 known pre-existing failures.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per operator-direction 2026-05-17 (R089 — Migrate Voice IVR / ElevenLabs
On-Call Paging Infrastructure out of SF). Migration target landed in
centralcloud monorepo:
- centralcloud_core/lib/centralcloud_core/voice.ex (TwiML + ElevenLabs)
- centralcloud_staff/lib/.../controllers/voice_controller.ex (Phoenix)
- centralcloud_staff/lib/.../controllers/voice_prompt_controller.ex
- centralcloud_staff/lib/.../router.ex (/twilio scope)
SF removal:
- web/app/api/voice/route.ts
- web/app/api/voice/prompt/route.ts
- web/app/api/voice/ directory
- src/tests/integration/web-voice-ivr-contract.test.ts
Operator-paging infra was historical drift in SF (per-project compiler);
belongs in centralcloud (org-level ops). R088 (Pre-Removal Test-Import
Safety Gate) not yet built — operator manually verified safety scan:
TWILIO_/ELEVENLABS_ env vars only referenced in the deleted files; no
internal SF callers; centralcloud version verified present.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per operator-direction 2026-05-17 (sf-mp9w20y1-nld9hc + "DONT KEE COMPAT"
stance + adversarial-review override). Cross-vendor frontier LLMs are
trained on PascalCase Claude Code tool names; calling them by SF's
lowercase + novel names increases tool-call error rates. Single atomic
cutover, no aliases. Internal implementations preserved; only the
LLM-facing names + registrations change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Without this, edits to packages/coding-agent/src/* (or any other workspace
package src) silently land while the dist stays stale — agents continue
loading the old compiled JS and operators see "why didn't my edit take
effect?" symptoms. Observed 2026-05-17 while wiring in the AST tools: vitest
(reading TS source) passed; runtime smoke test against dist failed because
no auto-rebuild fired.
Extends ensure-source-resources.cjs (which sf-from-source runs on every
launch) to also check workspace packages: agent-core, ai, coding-agent,
daemon, google-gemini-cli-provider, openai-codex-provider, rpc-client, tui.
For each, compare latest src mtime vs latest dist mtime (with a 100ms grace
window). If src is newer, run `npm run build -w @singularity-forge/<pkg>`.
Excludes:
- packages/native (Rust build is 5–10 min; trigger manually via
`node rust-engine/scripts/build.js --dev`).
- Any package in SF_SKIP_WORKSPACE_AUTOBUILD (comma-separated).
- Whole step disabled by SF_SKIP_WORKSPACE_AUTOBUILD=all.
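
The staleness decision can be sketched as below. This is an
illustration, not the ensure-source-resources.cjs code: the function
names and the input record shape are hypothetical, and it assumes the
grace window is added to the dist side of the comparison. Walking the
directories for mtimes and running the build are omitted.

```javascript
const GRACE_MS = 100; // grace window from this commit

// True when the newest source file is newer than the newest dist file
// by more than the grace window.
function needsRebuild(latestSrcMtimeMs, latestDistMtimeMs, graceMs = GRACE_MS) {
  return latestSrcMtimeMs > latestDistMtimeMs + graceMs;
}

// Parse SF_SKIP_WORKSPACE_AUTOBUILD: comma-separated names, or "all".
function parseSkipList(value) {
  const names = String(value ?? "")
    .split(",")
    .map((s) => s.trim())
    .filter(Boolean);
  return { all: names.includes("all"), names: new Set(names) };
}

// Given [{ name, srcMtimeMs, distMtimeMs }], list packages to rebuild.
function packagesToBuild(packages, skipValue) {
  const skip = parseSkipList(skipValue);
  if (skip.all) return [];
  return packages
    .filter((p) => !skip.names.has(p.name))
    .filter((p) => needsRebuild(p.srcMtimeMs, p.distMtimeMs))
    .map((p) => p.name);
}
```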
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wraps the native AST primitives from @singularity-forge/native/{edit,ast} as
LLM tools so agents can do tree-sitter-anchored code edits instead of
substring-based Edit or line-anchor hashline.
- replace-symbol.ts (+117): wraps replaceSymbol(file, symbolPath, newBody);
matches function/class/method declarations via tree-sitter, returns
matched=false sentinel when the symbol isn't located.
- insert-around-symbol.ts (+122): wraps insertAroundSymbol with position
enum BeforeDecl/AfterDecl/AtBodyStart/AtBodyEnd.
- ast-grep.ts (+152): wraps astGrep for pattern matching across files with
$VAR/$$$ARGS meta-variables; returns ranked matches with byte/line/column
+ captured meta-variable bindings.
Each tool:
- typebox schema matching the existing AgentTool pattern (edit.ts)
- notifyFileChanged() into the LSP layer on write ops
- resolveToCwd() for path normalization
- catches native errors + returns isError result with the
NativeUnavailableError message pointing operators to
`nix develop` + `node rust-engine/scripts/build.js --dev`
Wire-in:
- tools/index.ts: re-exports + imports + entries in `allTools` map and
createAllTools() factory.
- extension-manifest.json: ReplaceSymbol / InsertAroundSymbol / AstGrep
appended to provides.tools so SF extension agents see them.
Higher value than substring/line-anchor for code in tree-sitter-supported
languages (TS/JS/TSX/Python/Rust). Edit + hashline remain for non-code
files. PascalCase names per the Claude-Code-aligned convention from
sf-mp9w20y1-nld9hc.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Stderr banner on fallback now multi-line with concrete fix steps
(nix develop → node rust-engine/scripts/build.js --dev) so an operator
scanning a 280MB cycle log can't miss it. The old single-line warning
was easy to overlook (today's "WHY HAS NOBODY SEEN IF LOUD" check).
- Structured load record per process at .sf/runtime/native-engine-load.jsonl:
{ts, pid, platformTag, source, binaryPath, sha256, loaded, errors?}.
Lets operators audit which binary each SF process loaded — and detect
ABI mismatches across daemon↔worker boundaries when different sha256
values appear for the same platformTag (the "rare but real" concern
flagged earlier today).
- Proxy error message now points to the build/install commands instead
of just saying "not available". NativeUnavailableError is named for
consumer try/catch chains.
- Fixed _loadedSuccessfully ordering — was set true BEFORE the require,
leaving stale-true after a failed first attempt.
- New helpers isNativeLoaded(), nativeBinaryPath(), nativeBinarySha256()
for diagnostic surfaces (sf headless query, doctor checks).
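
As an example of the audit the load record enables, the sketch below
scans JSONL text for platform tags that loaded more than one distinct
binary. The function name is hypothetical; the field names are the ones
listed above, and reading the file from disk is left to the caller.

```javascript
// Group successful loads by platformTag and report any tag that saw
// more than one distinct sha256 (a possible ABI mismatch).
function findAbiMismatches(jsonlText) {
  const byPlatform = new Map();
  for (const line of jsonlText.split("\n")) {
    if (!line.trim()) continue;           // skip blank lines
    const rec = JSON.parse(line);
    if (!rec.loaded || !rec.sha256) continue; // ignore failed loads
    if (!byPlatform.has(rec.platformTag)) {
      byPlatform.set(rec.platformTag, new Set());
    }
    byPlatform.get(rec.platformTag).add(rec.sha256);
  }
  return [...byPlatform]
    .filter(([, hashes]) => hashes.size > 1)
    .map(([platformTag, hashes]) => ({ platformTag, sha256: [...hashes] }));
}
```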
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three concrete fixes from open self-feedback assessment 2026-05-17:
- uok/gate-registry-bootstrap.js: register all 6 R081 detector gates
(same-unit-loop, zero-progress, repeated-feedback-kind, artifact-flap,
stale-lock, periodic-detector-sweep) alongside drift-detection and
iter-completion-reconciler. Closes the gap reported by
sf-mp9udspu-fsf7si — bootstrap previously registered 2 of 8 gates.
- self-feedback.js ALLOWED_KIND_DOMAINS: add `adversarial-finding`.
Closes gap reported by sf-mp9u4i25-fczmcj — R075 (autonomous
adversarial review) challenge unit had no kind to file findings under.
- sf-autonomous-watchdog.sh: delete watchdog-run-*.log files older than
60 minutes at each cycle start. Without rotation .sf/ grew to 1.9 GB
in 24h (today's snapshot). 60 min retention captures last cycle for
post-incident triage; older state is already in DB + iterations.jsonl.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DB corruption recovery (2026-05-17) rebuilt the requirements table
from valid btree pages; ROWID-by-ROWID scan found 60 of 68 R-entries.
The other 8 + 40 historical (R006-R009 + R022-R065) were never in the
DB to begin with — they had drifted into REQUIREMENTS.md only. Backfill
script parsed each ### Rxxx — title block and INSERTed into the
requirements table with proper class/status/description/why/notes
fields. Final DB count: 75 → 123, integrity_check ok, MD↔DB parity
restored.
The .gitignore tweak from the meta-supervisor commit landed earlier;
no functional change here.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- sf-db-schema.js: auto_vacuum INCREMENTAL → NONE. The "Bad ptr map entry"
corruption on 2026-05-17 was incremental-autovacuum ptrmap drift under
concurrent writers. Recovered DB has no ptrmap; future fresh DBs must
match. incremental_vacuum() callers in sf-db-core.js become no-ops.
- bin/sf-from-source: lock allowlist extended to skip readonly sf headless
subcommands (--help, query, status, usage, reflect, feedback list,
triage --list/--json). Previously every sf headless invocation tried
to acquire the project lock — operator couldn't even inspect SF state
while autonomous was running.
- self-feedback.js triageBlockedEntries: (1) treat empty/null/undefined
sfVersion as unknown, not zero; (2) exempt operator-direction kinds
(improvement-idea, architecture-defect, missing-feature, gap) from
auto-version-bump close. Both were needed to prevent the R124 incident
recurring.
- headless-feedback.ts handleAdd: populate sfVersion via getCurrentSfVersion
+ detect repoIdentity via isForgeRepo, not hardcoded "external"/"". An
empty sfVersion sorts below any real semver, so the resolver retry-closed
every operator-filed entry within seconds.
Net effect: R124 proposal (filed via sf headless feedback add) is no
longer auto-resolved as version-stale. Larger architectural fix (single-
writer SF daemon / RPC for all DB writes — M040 territory) tracked as
follow-up R-entry.
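
The two triage rules can be sketched as follows. This is a simplified
illustration, not the triageBlockedEntries code: the function names are
hypothetical and versions are compared as plain major.minor.patch
triples.

```javascript
// Kinds exempt from auto-version-bump close, as listed above.
const OPERATOR_DIRECTION_KINDS = new Set([
  "improvement-idea",
  "architecture-defect",
  "missing-feature",
  "gap",
]);

// Parse "1.2.3" into [1, 2, 3]; empty/null/undefined/garbage is unknown.
function parseVersion(v) {
  if (typeof v !== "string") return null;
  const m = v.trim().match(/^(\d+)\.(\d+)\.(\d+)/);
  return m ? [Number(m[1]), Number(m[2]), Number(m[3])] : null;
}

// True when version triple `a` is strictly older than `b`.
function isOlder(a, b) {
  for (let i = 0; i < 3; i += 1) {
    if (a[i] !== b[i]) return a[i] < b[i];
  }
  return false;
}

function shouldAutoCloseAsVersionStale(entry, currentVersion) {
  if (OPERATOR_DIRECTION_KINDS.has(entry.kind)) return false;
  const filed = parseVersion(entry.sfVersion);
  const current = parseVersion(currentVersion);
  // Unknown is not "version zero": never close on a missing version.
  if (!filed || !current) return false;
  return isOlder(filed, current);
}
```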
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- uok/auto-runaway-guard.js: invoke runDetectorSweep alongside the existing
zero-progress check (fire-and-forget for sync-tick compatibility; results
consumed on next tick via sweepState ring buffer). Passes unitId,
unitMetrics, sessionFingerprint, lockPaths, and a 30-min DB-windowed
recentFeedback slice.
- detectors/{same-unit-loop, zero-progress, repeated-feedback-kind,
artifact-flap, stale-lock, periodic-runner}.js: each detector now also
exports a UokGate wrapper (id/type/execute -> GateResult per ADR-0075).
Plain detector functions kept for existing consumers.
- detectors/index.js: single import surface for the gate exports.
- detector-stale-lock.test.mjs (9), detector-periodic-runner.test.mjs (10),
detector-gates-contract.test.mjs: fills the R055/R056 test gap filed
earlier today + proves UokGate contract conformance.
- 41/41 detector tests green; copy-resources clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous commit accidentally tracked .sf/meta-status.json +
.sf/meta-supervisor.pid (transient runtime files written by
scripts/sf-meta-supervisor.mjs each tick). Mirror the existing
.sf/runtime/ ignore pattern for these top-level meta-* files; the daemon
keeps writing them on disk but git no longer tracks the churn.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- scripts/sf-meta-supervisor.mjs: pure-node daemon supervising
scripts/sf-autonomous-watchdog.sh. Tick=60s, restarts watchdog if dead,
emits .sf/meta-status.json, halt via .sf/meta-supervisor.halt. Uses
only node builtins (no SF dist deps) so it survives dist breakage.
- src/headless.ts: R091 — gate the per-cycle handleTriage call on a time
interval (SF_TRIAGE_INTERVAL_MS, default 30 min) and bump batch size
(SF_TRIAGE_MAX, default 25, was 5). Drops the ~8min triage hit from
every cycle while letting daily drain capacity rise.
- .sf/REQUIREMENTS.md: R091 (triage sidecar) + R092 (PDD-completeness
as routing signal) + R093 (pin model per orchestration agent.yaml) +
R094 (swarm-role model tier specialization — 8 roles already exist
in uok/swarm-roles.js; model field per role missing).
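
The R091 interval gate could be sketched like this. The env var names
and defaults are from this commit; createTriageGate, readPositiveInt and
the injected `env` object are hypothetical, and the real src/headless.ts
wiring is not shown.

```javascript
const DEFAULT_INTERVAL_MS = 30 * 60 * 1000; // SF_TRIAGE_INTERVAL_MS default
const DEFAULT_MAX = 25;                     // SF_TRIAGE_MAX default (was 5)

// Parse a positive integer from an env string, else use the fallback.
function readPositiveInt(raw, fallback) {
  const n = Number.parseInt(raw ?? "", 10);
  return Number.isFinite(n) && n > 0 ? n : fallback;
}

// Returns a gate whose shouldRun() is true at most once per interval.
function createTriageGate(env = {}) {
  const intervalMs = readPositiveInt(env.SF_TRIAGE_INTERVAL_MS, DEFAULT_INTERVAL_MS);
  const max = readPositiveInt(env.SF_TRIAGE_MAX, DEFAULT_MAX);
  let lastRunAt = null;
  return {
    max,
    shouldRun(now = Date.now()) {
      if (lastRunAt !== null && now - lastRunAt < intervalMs) return false;
      lastRunAt = now; // the caller is expected to run triage now
      return true;
    },
  };
}
```

Each cycle would call shouldRun() and only invoke handleTriage with a
batch of `max` entries when it returns true.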
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Operator: "should we have a for adrs and d for decisions? any other
type we should habe?" + "and so we use" — yes, file it and adopt.
Adds A (ADR), D (Decision), F (Finding), G (Gate), K (Knowledge),
P (Pattern), E (Evidence) prefixes to the existing R/M/S/T set.
Each gets a source-of-truth location and a mechanical migration path.
R048 (unbroken purpose chain) + R047 (per-R fulfillment validation)
both require typed cross-references to verify integrity. Without
typed IDs, "this M is covered by R, S, T, A, D" is unverifiable
free-text.
Owning milestone M041 (also added) splits the migration into 6
slices: rename ADRs, add D-IDs to DECISIONS, backfill F/G/K-IDs in
DB tables, doctor cross-link integrity check, lint for SF-authored
typed references.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Operator: "db via bus so we dont crash it?" — yes, this is the natural
fit for the lane model (R057 system lane + R046 multi-unit lanes).
Concurrent lanes writing directly to SQLite hit SQLITE_BUSY or worse;
mediating all writes through a single MessageBus-owned db-writer actor
enforces the single-writer invariant operationally.
Reads stay direct (multi-reader WAL is safe). Migration is gradual
(table by table). Also serves as the substrate for multi-repo
federation (R028 / ADR-019/020) where cross-process DB sharing needs
message-based access anyway.
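
A minimal sketch of the single-writer idea, assuming a promise chain in
place of the real MessageBus. createDbWriter and applyWrite are
hypothetical names; nothing here reflects an existing SF API.

```javascript
// Serialize all writes through one queue: each submitted message runs
// only after the previous one has settled, so there is never more than
// one write in flight, whatever the callers do.
function createDbWriter(applyWrite) {
  let tail = Promise.resolve();
  return {
    submit(message) {
      const result = tail.then(() => applyWrite(message));
      // Keep the chain alive even if this write fails; the caller
      // still sees the rejection through `result`.
      tail = result.catch(() => {});
      return result;
    },
  };
}
```

Lanes would call submit() instead of touching SQLite directly, while
reads keep using their own connections.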
Future M040.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>