The file used node:test. Its tests pass (2/2), but vitest reports the
FILE as failed because it can't see node:test suites in its harness.
Rewritten with the same assertions in vitest shape, which keeps the rest
of the test run clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bundles the working-tree state into one coherent commit covering the
upgrade-safety glue that complements today's earlier landings
(orphan-recovery, sf-db single-connection, drain-timer-not-unref'd,
forceShutdown drain, shutdown-state.ts, instrumentation.ts,
shutdown-signal.js, gate-deadlock-classifier).
Modified:
docker/Dockerfile.source-server — image build tweaks for the source-
server variant used by the in-container upgrader.
docker/docker-compose.vega.yaml — env passthroughs for host-side dirs
(SF_SOURCE_HOST_ROOT, SF_WORKSPACE_HOST_DIR, SF_WORKSPACES_HOST_DIR,
SF_HOME_HOST_DIR), docker socket mount, group_add for docker GID,
and SF_RPC_SHUTDOWN_GRACE_MS=600000 matching the 10-min drain.
scripts/run-vega-source-server.mjs — substantial rework supporting
the in-container upgrade flow.
scripts/upgrade-vega-source-server.mjs — buildEnv() + dockerBuildEnv()
helpers, probeBind via SF_VEGA_PROBE_HOST, containerExists()
pre-check before drainContainer, stop timeout now matches the
10-min RPC grace via SF_VEGA_DRAIN_STOP_TIME (default 610s).
src/web/project-discovery-service.ts — calls
recoverProjectRuntimeQueues() on each of the 3 discovery paths
(root monorepo, per-entry, nested SF projects). Closes the
cloud-volume mtime-lag window codex flagged.
web/app/api/ready/route.ts — calls recoverProjectRuntimeQueues() on
every readiness probe, and now also reads shutdown-state so the
probe returns 503 while draining.
web/components/sf/projects-view.tsx — UI wiring for the upgrade
trigger.
web/pages/api/projects.ts — backend API addition for the project
enumeration that feeds projects-view.
docs/specs/sf-self-deploy.md — docs update for the new flow.
package.json — script alias.
Added:
scripts/build-web-host.mjs — new build helper for the standalone web
host artifact consumed by the upgrade flow.
src/resources/extensions/sf/tests/auto-shutdown-signal.test.mjs —
unit test for the cooperative-shutdown signal module (registers /
requests / snapshot).
src/web/project-runtime-recovery.ts — thin wrapper around
recoverOrphanedFeedbackDrains for per-project use from web routes.
web/app/api/drain/route.ts — explicit drain endpoint for operator-
triggered queue flush.
web/app/api/server-upgrade/route.ts — auth-gated endpoint that
spawns the in-container upgrader via docker socket; passes through
host-dir env so the upgrader knows real bind-mount paths.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two upgrade-safety gaps codex flagged in the round before, both now
closed:
1. Next.js HTTP request drain — web/instrumentation.ts.
Next.js calls `register()` once at server boot. Installs one
SIGTERM/SIGINT/SIGHUP listener that:
- marks shutdown-state.ts (so /api/healthz returns 503 immediately
— LB/Traefik readinessProbe drains traffic away within ~4s)
- schedules process.exit after SF_WEB_SHUTDOWN_GRACE_MS (default
30s) — in-flight HTTP requests have time to finish; timer is
NOT unref'd so it keeps the process alive during the drain
Single-install guard via globalThis Symbol so jiti/bundle splits
don't end up with multiple racing timers.
2. Autonomous loop iteration-boundary shutdown awareness —
src/resources/extensions/sf/auto/shutdown-signal.js +
src/resources/extensions/sf/auto/loop.js iteration check.
Before: a SIGTERM mid-iteration killed the loop process before
the current unit's tool calls + DB writes could complete cleanly.
After: shutdown-signal flips a flag on first SIGTERM; loop polls
it at the top of each `while (s.active)` iteration; current unit
finishes, loop exits gracefully, the existing forceShutdown path
takes over to drain the sf_feedback queue and exit.
Includes a force-exit safety timer (SF_AUTONOMOUS_SHUTDOWN_GRACE_MS
or SF_RPC_SHUTDOWN_GRACE_MS, default 10 min) so a hung iteration
doesn't block exit indefinitely.
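A minimal sketch of the iteration-boundary check described in item 2,
assuming a plain in-process flag. This is not the actual shutdown-signal.js;
shutdownState, requestShutdown, isShutdownRequested, registerShutdownSignal
and runLoop are hypothetical names, and the force-exit safety timer is
left out.

```javascript
// Hypothetical cooperative-shutdown flag; names are illustrative only.
const shutdownState = { requested: false, signal: null, requestedAt: null };

function requestShutdown(signal = null) {
  if (shutdownState.requested) return; // first request wins, later ones are no-ops
  shutdownState.requested = true;
  shutdownState.signal = signal;
  shutdownState.requestedAt = Date.now();
}

function isShutdownRequested() {
  return shutdownState.requested;
}

// Installs the SIGTERM listener. `proc` is injectable so tests can pass a fake.
function registerShutdownSignal(proc = process) {
  proc.once("SIGTERM", () => requestShutdown("SIGTERM"));
}

// The flag is checked only at the top of each iteration, so a unit that has
// already started always runs to completion before the loop exits.
function runLoop(units, runUnit) {
  const completed = [];
  for (const unit of units) {
    if (isShutdownRequested()) break;
    runUnit(unit);
    completed.push(unit);
  }
  return completed;
}
```

If the request arrives while the second unit is running, that unit still
completes and the third never starts.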
Test coverage:
- web-shutdown-state.test.ts extended: 6/6 (added ready-route
503-during-drain assertion).
- shutdown-signal: covered indirectly by loop dispatch tests; a
standalone unit test for register/request/snapshot is a small
follow-up.
Net of today's work, the upgrade safety chain for SF on Vega (Layer-1,
Tailscale Serve only) is operationally complete. Layer-2 (cluster
Traefik ingress with weighted blue/green) plugs in via the same
healthz-503 + recovery primitives — no further SF source changes
needed for that path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit removes infra/srv/ that I created in d23b99819. The
docker-compose-Traefik sketch was architecturally wrong:
- Traefik on this host is a Flux-managed Kubernetes DaemonSet at
/srv/infra/clusters/default/infrastructure/traefik/helmrelease.yaml
(hostNetwork: true, ports 80/443/18789/2222)
- Vega's k3s explicitly disables its own bundled Traefik
(--disable=traefik,servicelb,metrics-server) and relies on the
Flux-managed one
- So the correct Traefik integration for sf-server is k8s
IngressRoute + Service + Deployment manifests under
/srv/infra/apps/ or hosts/vega/, NOT a docker-compose stack in
the SF source tree
The sf-server Docker image (docker/Dockerfile.sf-server) and the
production-grade graceful-shutdown/recovery work in
packages/coding-agent/src/modes/rpc/ + src/web/shutdown-state.ts
all remain valid and necessary — they just plug into k8s/Traefik
via manifests in the operator's GitOps repo, not via this compose.
Naming: also moved infra/srv -> docker/vega briefly during this
session at the operator's nudging; both locations are gone now.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New /infra/srv/ tree: production-style Docker compose that puts Traefik
in front of sf-server. Closes the orchestration gaps the bare-docker
upgrader (scripts/upgrade-vega-source-server.mjs) couldn't address:
1. Health-check-driven drain. Traefik polls /api/healthz every 2s.
The moment SF receives SIGTERM, src/web/shutdown-state.ts flips
the in-process flag and the route returns 503 (landed in
f8e53840d). ~4s later Traefik removes the replica from the pool
— new traffic stops, in-flight requests finish.
2. Sticky sessions via the `sf-aff` cookie. /api/session/events SSE
streams (and any other long-lived per-replica state) survive
client reconnects within the upgrade window because Traefik
pins the cookie to the same replica until that replica is gone.
3. Blue/green via the `sf-candidate` service. Guarded by Docker
compose profile=candidate so production traffic keeps flowing to
`sf` until the operator promotes. Image swap is then atomic from
a client perspective — old replica goes 503, new replica picks
up traffic before old container actually stops.
4. stop_grace_period: 610s matching SF_RPC_SHUTDOWN_GRACE_MS=600000.
If a self-feedback queue drain is in flight when SIGTERM lands,
it MUST finish. Losing writes across an upgrade is worse than the
wait. Hard-bypass via `docker kill` if the operator chooses; the
.draining file then gets recovered on the next start via
feedback-queue-recovery's startup scan.
infra/srv/README.md documents the runbook: bring-up, upgrade flow,
env vars, TLS notes, and what this does NOT replace (the existing
Dockerfile, k8s/Forgejo CI flow, and the source-server upgrader).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three fixes addressing codex's adversarial review of the earlier orphan-
recovery / graceful-shutdown landing:
(1) Codex point B — single shutdown path. Removed the parallel
installGracefulShutdown() handler in rpc-mode.ts that was adding
a second SIGTERM listener and racing forceShutdown()'s teardown.
The drain is now the FIRST step inside forceShutdown() (before
killTrackedDetachedChildren / extension session_shutdown / etc.)
so DB writes complete cleanly while child processes are still
alive to flush. Race-free against the existing shutdown ordering.
(2) Codex point D — recovery-before-each-drain. Cloud-volume mtime
visibility lags between containers can mean an orphan `.draining`
file from a previous container isn't visible during the startup
scan but appears moments later. drainQueuedSfFeedbackCommands()
now runs recoverOrphanedFeedbackDrains() as its first step, so
each dispatch's drain sees the latest filesystem state.
(3) Codex point E — healthz returns 503 during shutdown. New module
src/web/shutdown-state.ts holds a per-process flag, auto-registers
SIGTERM/SIGINT/SIGHUP handlers on first read, and exposes a
snapshot (signal, startedAt, elapsedMs) for diagnostics. The
healthz route imports isShuttingDown() and returns 503 when set,
so k8s readinessProbe / Forgejo blue-green probes drain traffic
BEFORE we actually stop responding.
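For illustration, the flag-plus-snapshot shape from (3) could look roughly
like the sketch below. Only isShuttingDown() and the snapshot fields
(signal, startedAt, elapsedMs) come from this commit; markShuttingDown,
shutdownSnapshot and the framework-neutral healthz() return shape are
assumptions, and signal auto-registration is omitted.

```javascript
// Illustrative only; the real src/web/shutdown-state.ts may differ.
let shutdownInfo = null; // stays null until a shutdown is marked

function markShuttingDown(signal = null, now = Date.now()) {
  if (shutdownInfo) return; // idempotent: keep the first signal and timestamp
  shutdownInfo = { signal, startedAt: now };
}

function isShuttingDown() {
  return shutdownInfo !== null;
}

function shutdownSnapshot(now = Date.now()) {
  if (!shutdownInfo) {
    return { shuttingDown: false, signal: null, startedAt: null, elapsedMs: null };
  }
  return {
    shuttingDown: true,
    signal: shutdownInfo.signal, // null means a manual mark, not a signal
    startedAt: shutdownInfo.startedAt,
    elapsedMs: now - shutdownInfo.startedAt,
  };
}

// Framework-neutral health handler: 200 while serving, 503 while draining.
function healthz(now = Date.now()) {
  const snapshot = shutdownSnapshot(now);
  return {
    status: snapshot.shuttingDown ? 503 : 200,
    body: { ok: !snapshot.shuttingDown, shutdown: snapshot },
  };
}
```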
Tests:
- rpc-mode-orphan-recovery.test.ts: 8/8 still green
- web-shutdown-state.test.ts: 5/5 new — default false, mark sets
flag, idempotent, signal exposed via snapshot, null signal for
manual mark
Deferred to a follow-up commit (codex didn't flag, but noted for
completeness): a SIGTERM-drain child-process integration test that
spawns rpc-mode + sends a real signal. The 5 unit tests cover the
flag logic; the integration test would cover the full process tree
and is bulkier than the current commit warrants.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tsgo rejects `.ts` extensions in imports without allowImportingTsExtensions.
Updated the test to import from "./feedback-queue-recovery.js" which is
both ESM-compatible and matches the rest of the package convention.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two related changes to make blue/green upgrades (per scripts/upgrade-vega-
source-server.mjs) safe for in-flight self-feedback writes.
1. Startup orphan recovery (feedback-queue-recovery.ts, extracted module).
Scans .sf/runtime/ for sf-feedback-queue.jsonl.<pid>(.<sid>)?.draining
files left by previous processes. For each:
- if our own session id: leave alone (live drain)
- if PID is alive: leave alone (foreign drainer)
- else: rename back to queue (only if no active queue file exists)
Crash safety: when both an orphan AND an active queue exist, we DEFER
recovery rather than merge — appending then unlinking would risk
duplicate replay on crash. The next restart's recovery picks it up
once the queue is naturally drained. Supports legacy filenames
(.<pid>.draining, pre-session-id) for backward compat.
Added SF_DRAIN_SESSION_ID (per-process 6-byte hex) stamped into the
.draining filename. PID reuse across container restarts is normally
safe because /proc clears, but the session id is a stronger guarantee
that we don't trample a foreign drainer that happens to land on the
same PID.
2. SIGTERM/SIGINT drain-then-exit handler (installGracefulShutdown).
Drains the queue once on signal, then exits. Bounded by
SF_RPC_SHUTDOWN_GRACE_MS (default 600_000 = 10 min). Rationale: if
a drain is in flight, it MUST finish — losing self-feedback writes
across a server upgrade is worse than a long wait. Normal drains
complete in <1s; the 10-min ceiling is for pathological lock
contention. Operator overrides via env var, or docker kill /
kubectl delete --force for hard bypass.
Upgrader script bumped to docker stop --timeout 610 (10s safety
margin past the grace). k8s deployments must set
terminationGracePeriodSeconds≥610 for the rolling-update path.
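The recovery decision in item 1 can be sketched as a pure classifier. This
is a sketch under assumptions, not the code in feedback-queue-recovery.ts:
the regex, the action labels and the ctx shape are hypothetical, and the
actual rename is left to the caller. process.kill(pid, 0) is the standard
Node.js liveness probe.

```javascript
// Matches sf-feedback-queue.jsonl.<pid>(.<sid>)?.draining, including the
// legacy form without a session id. The exact pattern is an assumption.
const DRAINING_RE = /^sf-feedback-queue\.jsonl\.(\d+)(?:\.([0-9a-f]+))?\.draining$/;

function parseDrainingName(name) {
  const m = DRAINING_RE.exec(name);
  if (!m) return null;
  return { pid: Number(m[1]), sessionId: m[2] ?? null };
}

// Signal 0 sends nothing; it only checks whether the PID exists.
function isPidAlive(pid) {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    return err.code === "EPERM"; // exists, but owned by another user
  }
}

// ctx: { ownSessionId, activeQueueExists, isPidAlive } (hypothetical shape)
function classifyOrphan(name, ctx) {
  const parsed = parseDrainingName(name);
  if (!parsed) return "ignore"; // malformed filename
  if (parsed.sessionId !== null && parsed.sessionId === ctx.ownSessionId) {
    return "leave-own"; // our own live drain
  }
  if (ctx.isPidAlive(parsed.pid)) return "leave-foreign";
  if (ctx.activeQueueExists) return "defer"; // never merge into an active queue
  return "recover"; // caller renames the file back to the queue
}
```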
Tests: rpc-mode-orphan-recovery.test.ts — 7 cases covering empty,
no-orphans, dead-PID single recovery, both-files-deferred (codex's
crash-safety fix), live-PID untouched, multiple-dead-PIDs, malformed-
filename ignored.
Refs sf-mpa5kdpu (drainer orphans never recovered), sf-mpa4g46x
(original RPC hang). Codex adversarial-reviewed; the PID-reuse hardening
and crash-safety deferral landed per its feedback. Open follow-ups:
shutdown-aware /api/healthz returning 503 (codex point E), integrate
with existing forceShutdown ordering (codex point C).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The drainer was scheduled via setTimeout(0) with timer.unref(). With the
unref, the pending timer no longer kept the event loop alive. That is fine
in a long-running rpc-mode child where the process has plenty of other
event-loop handles, but fatal in the packaged-standalone path where the rpc
subprocess has nothing else to keep it alive. The process exited before the
timer fired, so the queue file was renamed to .<pid>.draining and then
stranded forever.
Removed timer.unref(). The setTimeout(0) still lets the RPC response go
back to the caller first (no synchronous blocking on the drain), but the
timer now keeps the process alive until the drain handler runs, and the
drain's own async I/O keeps it alive until done.
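The difference is observable with timeout.hasRef(). A small sketch, where
scheduleDrain and its unref option are hypothetical and exist only to show
both behaviours side by side:

```javascript
// Schedules `drain` on the next timer tick so the caller can return first.
// With unref: true the timer alone will not keep the process alive.
function scheduleDrain(drain, { unref = false } = {}) {
  const timer = setTimeout(drain, 0);
  if (unref) timer.unref();
  return timer;
}

const refd = scheduleDrain(() => {});
const unrefd = scheduleDrain(() => {}, { unref: true });
console.log(refd.hasRef(), unrefd.hasRef());
```

A process whose only pending handle is the unref'd timer may exit before
the callback runs, which is the stranded-queue case described above.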
Refs sf-mpa6wuhm-wwddd1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two SQLite connections were being opened in the same Node process when
the same module loaded under two graphs:
- the autonomous-loop side loads sf-db modules via normal ESM resolution
- src/headless-feedback.ts re-imports them via jiti.createJiti() so the
in-server `sf headless feedback ...` drain can call them without
bringing the agent extension into the rpc-mode bundle
Module-level `let currentDb / currentPath / currentPid` etc. lived on
two independent module instances, so each instance opened its own
SQLite handle to .sf/sf.db. WAL mode lets readers share, but two writer
connections in the same process produced SQLITE_BUSY / writer stalls —
the hang we saw on sf-mpa4g46x and the wedged-drainer recurrence after
the server restart at 19:35.
Fix: hoist the connection slot onto globalThis under a well-known
Symbol so every module instance points at the same record. All five
fields formerly module-level become `_sf.<field>` and live in one
shared object.
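A sketch of the shared-slot pattern, assuming a Symbol.for key so that
every module instance resolves the same registry symbol. The key string,
getSharedSlot and openOnce are hypothetical; only three of the five fields
are named in this message, so only those appear here.

```javascript
// Symbol.for() reads the global symbol registry, so two copies of this
// module (normal ESM graph and jiti graph) get the same key.
const SLOT_KEY = Symbol.for("example.sf-db.connection-slot"); // hypothetical key

function getSharedSlot() {
  if (!globalThis[SLOT_KEY]) {
    globalThis[SLOT_KEY] = { currentDb: null, currentPath: null, currentPid: null };
  }
  return globalThis[SLOT_KEY];
}

// Opens at most one connection per (path, pid), whichever module instance
// asks. `open` stands in for the real SQLite open call.
function openOnce(path, open) {
  const slot = getSharedSlot();
  if (slot.currentDb && slot.currentPath === path && slot.currentPid === process.pid) {
    return slot.currentDb;
  }
  slot.currentDb = open(path);
  slot.currentPath = path;
  slot.currentPid = process.pid;
  return slot.currentDb;
}
```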
Codex's original diagnosis (split module-graph DB-writer contention)
was right; I dismissed it earlier because I missed that
headless-feedback uses jiti even though rpc-mode itself doesn't import
sf-db directly.
Verification:
- Syntax check: clean
- sf-db-migration.test.mjs: 12/13 pass. The one failure
(openDatabase_migrates_v27_tasks_without_created_at_through_spec_backfill
expects schema version 72, actual 73) is unrelated — a schema
migration landed elsewhere without bumping that test's expected
version.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four changes that close the gap between the gate-deadlock-classifier
landed in ab2c99686 and a working detection signal.
(1) Detector wrapper now returns outcome=manual-attention (not fail) when
a deadlock fires. The whole point of detecting the deadlock is to
escape it — returning `fail` would add another refusal and compound
the lockout. Same precedent as periodicDetectorSweepGate.
(2) New auto/gate-refusal-recorder.js — in-process ring buffer (cap 32,
TTL 30 min) that records UokGate refusals from the dispatcher.
Storage is intentionally in-memory; refusals are operational signals,
not durable state.
(3) auto/run-unit.js — calls recordGateRefusal() at the inline-route-refused
branch, passing the rationale (already includes `[gate-id]` prefix +
R-id status fragments the detector parses) plus unitType/unitId.
(4) detectors/periodic-runner.js — adds a `gate-deadlock` entry to the
default detector list, pulling ctx.gateRefusals from the caller OR
falling back to recentGateRefusals() from the recorder. ctx can also
override requirementCoverageByMilestone + resolveMilestoneId for tests.
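The recorder in (2) is roughly a capped, TTL-filtered array. A minimal
sketch using the function names from this commit; the entry shape and the
injectable `now` parameter are assumptions added for testability, and the
real gate-refusal-recorder.js may evict differently.

```javascript
const CAP = 32;
const TTL_MS = 30 * 60 * 1000; // 30 minutes
const refusals = []; // in-memory only; lost on restart by design

function recordGateRefusal(entry, now = Date.now()) {
  refusals.push({ ...entry, recordedAt: now });
  // Drop the oldest entries once the cap is exceeded.
  if (refusals.length > CAP) refusals.splice(0, refusals.length - CAP);
}

function recentGateRefusals(now = Date.now()) {
  return refusals.filter((e) => now - e.recordedAt <= TTL_MS);
}
```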
After this change, an inline-route refusal flows:
inlineRuntimeGate.execute → outcome=fail
→ run-unit.js records the refusal in gate-refusal-recorder
→ periodic-runner sweep picks it up via recentGateRefusals()
→ detectGateDeadlock cross-references against milestone coverage
→ if overlap: detectorsFired includes {name:"gate-deadlock", signature}
→ periodicDetectorSweepGate surfaces as manual-attention
Tests: 16 detector + 10 existing periodic-runner = 26/26 pass. The
existing periodic-runner test exercises the default detector list, so
adding the new entry is implicitly validated.
Follow-up still open: have the periodic sweep file a self_feedback entry
when the gate-deadlock detector fires, so the operator and SF's autonomous
triage both see the signal without polling logs. That belongs in the
sweep handler, not the detector — separate commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The R074 inlineRuntimeGate refused inline dispatch for M048/S05 reassess-roadmap
because R020 and R066 are still 'active' — but those slices ARE the work that
validates R066. Autonomous mode stopped with no way to escape. Filed earlier as
sf-mpa4f9k1-jm01rc.
This detector classifies the pattern at runtime:
parseGateRefusal(rationale)
extracts gateId + refused requirement ids from gate-refusal text
matching shape "[gate-id] ... R020=active R066=active ..."
detectGateDeadlock(ctx, options)
ctx.gateRefusals: recent gate refusal events ({rationale, unitType, unitId})
ctx.requirementCoverageByMilestone: milestone -> R-ids in its DoD/coverage
ctx.resolveMilestoneId: optional unit -> milestone resolver
(default: strip after '/', require M-prefix)
Returns { stuck, reason: "gate-deadlock", signature: {
gateId, deadlockedRequirements, refusedUnits, examples, suggestedAction
}} when any refused unit's milestone coverage overlaps the gate's refused
requirements. Per-gateId throttle prevents repeat firings within 60s.
gateDeadlockClassifierGate
UokGate (type=verification per ADR-0075) wrapping the detector for
integration into periodicDetectorSweepGate + post-finalize sweeps.
Registered in uok/gate-registry-bootstrap.js between inlineRuntimeGate and the
existing detector chain. Also re-exported from detectors/index.js for the
common detector import surface.
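To make the contract concrete, here is a sketch of the parsing and overlap
steps only. It assumes the rationale carries a bracketed gate id and
`R<digits>=active` fragments, and that malformed input yields null. The
function bodies and findDeadlockedRequirements are illustrative, not the
shipped detector, and the per-gateId throttle is omitted.

```javascript
// Extracts the gate id and the de-duplicated refused requirement ids.
function parseGateRefusal(rationale) {
  if (typeof rationale !== "string") return null;
  const gate = /\[([^\]]+)\]/.exec(rationale);
  if (!gate) return null;
  const ids = [...rationale.matchAll(/\b(R\d+)=active\b/g)].map((m) => m[1]);
  const requirements = [...new Set(ids)];
  if (requirements.length === 0) return null;
  return { gateId: gate[1], requirements };
}

// Default resolver: keep the part before '/', require an M prefix.
function defaultResolveMilestoneId(unitId) {
  const head = String(unitId ?? "").split("/")[0];
  return head.startsWith("M") ? head : null;
}

// Returns the refused requirements that the refused unit's own milestone is
// supposed to satisfy. A non-empty result is the deadlock signature.
function findDeadlockedRequirements(refusal, coverageByMilestone, resolve = defaultResolveMilestoneId) {
  const parsed = parseGateRefusal(refusal.rationale);
  const milestone = resolve(refusal.unitId);
  if (!parsed || !milestone) return [];
  const covered = new Set(coverageByMilestone[milestone] ?? []);
  return parsed.requirements.filter((r) => covered.has(r));
}
```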
Test coverage:
- parseGateRefusal: 5 cases (inline shape, dedup, missing reqs, missing gate, empty)
- detectGateDeadlock: 7 cases (empty input, fire-on-overlap, no-overlap,
empty coverage, throttle, custom resolver,
examples cap)
- UokGate wrapper: 3 cases (contract shape, pass, fail-with-findings)
- Threshold export sanity: 1 case
16/16 tests pass.
The wiring from autonomous-loop output (where gate refusals are emitted) into
the detector's gateRefusals input is a follow-up — this commit lands the
detector with a stable contract and tests it can be wired against.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Before: dev-server watched packages/daemon/src + dev scripts + package.json.
SF extension source edits in src/resources/extensions/sf/ AND coding-agent
edits in packages/coding-agent/src/ did NOT trigger restart. Operators had to
restart manually after copy-resources / git pull / coding-agent edits.
Adds three watched paths:
1. packages/coding-agent/src — rpc-mode hosts sf_feedback / start_autonomous
handlers, lives here. Edits must restart the sf child.
2. dist/resources/.sf-resource-build-stamp — atomic stamp updated by
copy-resources. Watching the stamp (not the dist tree) avoids heavy
recursive walk while picking up extension upgrades the moment they land.
Idempotent: ensure-source-resources only updates the stamp when an actual
rebuild ran, so no restart-loop on identical re-runs.
3. .git/HEAD — changes on pull / branch switch / commit. Catches upgrade
flows where source moved outside this process.
Native (packages/native/) intentionally not watched — Rust build is 5–10 min,
auto-trigger would loop. Operator triggers native rebuild manually per the
existing ensure-source-resources policy.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The prior commit (cc32ab79d) accidentally landed truncated versions of the
new R074 + R075 files because a cherry-pick was only partially applied.
Restored:
- inline-runtime-gate.js: 74→96 LOC
- inline-runtime-gate.test.mjs: 115→273 LOC (15 tests; 2 sonnet-imagined
bootstrapGateRegistry/BOOTSTRAP_GATES tests rewritten to assert SF's
actual side-effect-on-import registry pattern)
- adversarial-budget.js: 86→106 LOC
- adversarial-budget.test.mjs: 63→132 LOC (9 tests)
- adversarial-finding-bridge.js: 123→191 LOC
- adversarial-finding-bridge.test.mjs: 98→216 LOC (14 tests)
45/45 tests pass across the four affected files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Hashline read/edit tool wrappers were folded into Edit({match}) and
Read({format}) modes in commit ffdec0fee. The two rows in FILE-SYSTEM-MAP.md
pointed to files that no longer exist. Updated the surviving hashline.ts row
to note its new consumer relationship with Edit/Read.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Remove git-revert authority per operator decision M048-D1. Crash-loop
classifier sees runtime evidence, not commit attribution; reverting on
runtime symptoms risks reverting the wrong commit. On quarantine trigger,
smoke_gate is flipped false to halt ledger writes and a self-feedback entry
(kind: crash-loop-detected, severity: high) is filed with a manual-review
suggestion. Operator retains sole authority to git-revert.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Test had fixed literal timestamps (TS_X = "2026-05-17T12:42:05.618Z")
that became stale once the calendar moved past them — the reconciler's
default maxAgeMs (1h, "older drift is operator territory") filtered
them out. By 3h after the original write the test failed: reconciled.length
was 0 because no entry passed the age filter.
Switch to NOW-relative timestamps (5/30/1 min back from Date.now()) so
the fixture always lands inside the default age window regardless of
when the test runs.
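A sketch of the fixture pattern, assuming the reconciler filters on
`now - ts <= maxAgeMs`. minutesAgo and withinMaxAge are hypothetical
helpers, not the reconciler's API.

```javascript
const MINUTE_MS = 60 * 1000;
const DEFAULT_MAX_AGE_MS = 60 * MINUTE_MS; // 1h default age window

// ISO timestamp `minutes` before `now`, so fixtures never go stale.
function minutesAgo(minutes, now = Date.now()) {
  return new Date(now - minutes * MINUTE_MS).toISOString();
}

// Stand-in for the reconciler's age filter.
function withinMaxAge(entries, now = Date.now(), maxAgeMs = DEFAULT_MAX_AGE_MS) {
  return entries.filter((e) => now - Date.parse(e.ts) <= maxAgeMs);
}

const fixtureNow = Date.now();
const fixtures = [5, 30, 1].map((m) => ({ ts: minutesAgo(m, fixtureNow) }));
```

With a fixed literal timestamp the same filter returns nothing once the
clock is more than an hour past it.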
Sonnet #13 (tool rename) report flagged this test as failing alongside
the 4 known pre-existing failures.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per operator-direction 2026-05-17 (R089 — Migrate Voice IVR / ElevenLabs
On-Call Paging Infrastructure out of SF). Migration target landed in
centralcloud monorepo:
- centralcloud_core/lib/centralcloud_core/voice.ex (TwiML + ElevenLabs)
- centralcloud_staff/lib/.../controllers/voice_controller.ex (Phoenix)
- centralcloud_staff/lib/.../controllers/voice_prompt_controller.ex
- centralcloud_staff/lib/.../router.ex (/twilio scope)
SF removal:
- web/app/api/voice/route.ts
- web/app/api/voice/prompt/route.ts
- web/app/api/voice/ directory
- src/tests/integration/web-voice-ivr-contract.test.ts
Operator-paging infra was historical drift in SF (per-project compiler);
belongs in centralcloud (org-level ops). R088 (Pre-Removal Test-Import
Safety Gate) not yet built — operator manually verified safety scan:
TWILIO_/ELEVENLABS_ env vars only referenced in the deleted files; no
internal SF callers; centralcloud version verified present.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per operator-direction 2026-05-17 (sf-mp9w20y1-nld9hc + "DONT KEE COMPAT"
stance + adversarial-review override). Cross-vendor frontier LLMs are
trained on PascalCase Claude Code tool names; calling them by SF's
lowercase + novel names increases tool-call error rates. Single atomic
cutover, no aliases. Internal implementations preserved; only the
LLM-facing names + registrations change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Without this, edits to packages/coding-agent/src/* (or any other workspace
package src) silently land while the dist stays stale — agents continue
loading the old compiled JS and operators see "why didn't my edit take
effect?" symptoms. Observed 2026-05-17 while wiring in the AST tools: vitest
(reading TS source) passed; runtime smoke test against dist failed because
no auto-rebuild fired.
Extends ensure-source-resources.cjs (which sf-from-source runs on every
launch) to also check workspace packages: agent-core, ai, coding-agent,
daemon, google-gemini-cli-provider, openai-codex-provider, rpc-client, tui.
For each, compare latest src mtime vs latest dist mtime (with a 100ms grace
window). If src is newer, run `npm run build -w @singularity-forge/<pkg>`.
Excludes:
- packages/native (Rust build is 5–10 min; trigger manually via
`node rust-engine/scripts/build.js --dev`).
- Any package in SF_SKIP_WORKSPACE_AUTOBUILD (comma-separated).
- Whole step disabled by SF_SKIP_WORKSPACE_AUTOBUILD=all.
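The staleness check and skip list can be sketched as pure functions. This
assumes "src is newer" means newer by more than the grace window and that
a missing dist always triggers a build; needsRebuild, selectPackagesToCheck
and buildCommand are hypothetical names, not the ones in
ensure-source-resources.cjs.

```javascript
const GRACE_MS = 100;

// True when the newest src file is newer than the newest dist file by more
// than the grace window. A missing dist (null) counts as stale (assumption).
function needsRebuild(latestSrcMtimeMs, latestDistMtimeMs, graceMs = GRACE_MS) {
  if (latestDistMtimeMs == null) return true;
  return latestSrcMtimeMs > latestDistMtimeMs + graceMs;
}

// Applies SF_SKIP_WORKSPACE_AUTOBUILD: comma-separated names, or "all".
function selectPackagesToCheck(packages, skipEnv = "") {
  const skip = skipEnv.split(",").map((s) => s.trim()).filter(Boolean);
  if (skip.includes("all")) return [];
  return packages.filter((p) => !skip.includes(p));
}

// argv for the per-package build, matching the npm workspace command above.
function buildCommand(pkg) {
  return ["npm", "run", "build", "-w", `@singularity-forge/${pkg}`];
}
```

For example, SF_SKIP_WORKSPACE_AUTOBUILD="tui, daemon" would leave the
other six packages eligible for the mtime check.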
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wraps the native AST primitives from @singularity-forge/native/{edit,ast} as
LLM tools so agents can do tree-sitter-anchored code edits instead of
substring-based Edit or line-anchor hashline.
- replace-symbol.ts (+117): wraps replaceSymbol(file, symbolPath, newBody);
matches function/class/method declarations via tree-sitter, returns
matched=false sentinel when the symbol isn't located.
- insert-around-symbol.ts (+122): wraps insertAroundSymbol with position
enum BeforeDecl/AfterDecl/AtBodyStart/AtBodyEnd.
- ast-grep.ts (+152): wraps astGrep for pattern matching across files with
$VAR/$$$ARGS meta-variables; returns ranked matches with byte/line/column
+ captured meta-variable bindings.
Each tool:
- typebox schema matching the existing AgentTool pattern (edit.ts)
- notifyFileChanged() into the LSP layer on write ops
- resolveToCwd() for path normalization
- catches native errors + returns isError result with the
NativeUnavailableError message pointing operators to
`nix develop` + `node rust-engine/scripts/build.js --dev`
Wire-in:
- tools/index.ts: re-exports + imports + entries in `allTools` map and
createAllTools() factory.
- extension-manifest.json: ReplaceSymbol / InsertAroundSymbol / AstGrep
appended to provides.tools so SF extension agents see them.
Higher value than substring/line-anchor for code in tree-sitter-supported
languages (TS/JS/TSX/Python/Rust). Edit + hashline remain for non-code
files. PascalCase names per the Claude-Code-aligned convention from
sf-mp9w20y1-nld9hc.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Stderr banner on fallback now multi-line with concrete fix steps
(nix develop → node rust-engine/scripts/build.js --dev) so an operator
scanning a 280MB cycle log can't miss it. The old single-line warning
was easy to overlook (today's "WHY HAS NOBODY SEEN IF LOUD" check).
- Structured load record per process at .sf/runtime/native-engine-load.jsonl:
{ts, pid, platformTag, source, binaryPath, sha256, loaded, errors?}.
Lets operators audit which binary each SF process loaded — and detect
ABI mismatches across daemon↔worker boundaries when different sha256
values appear for the same platformTag (the "rare but real" concern
flagged earlier today).
- Proxy error message now points to the build/install commands instead
of just saying "not available". NativeUnavailableError is named for
consumer try/catch chains.
- Fixed _loadedSuccessfully ordering — was set true BEFORE the require,
leaving stale-true after a failed first attempt.
- New helpers isNativeLoaded(), nativeBinaryPath(), nativeBinarySha256()
for diagnostic surfaces (sf headless query, doctor checks).
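The ordering fix and the load record can be illustrated together. This is a
sketch, not the loader itself: loadNative, its injected requireBinary
callback and buildLoadRecord are hypothetical, the error text is
paraphrased, and the sha256 is passed in rather than computed.

```javascript
class NativeUnavailableError extends Error {
  constructor(message) {
    super(message);
    this.name = "NativeUnavailableError"; // lets callers match on err.name
  }
}

let loadedSuccessfully = false;

function loadNative(requireBinary) {
  try {
    const binding = requireBinary();
    loadedSuccessfully = true; // set only AFTER the require succeeded
    return binding;
  } catch (err) {
    loadedSuccessfully = false; // never leave a stale true behind
    throw new NativeUnavailableError(
      `native engine unavailable (${err.message}); run nix develop, then ` +
        "node rust-engine/scripts/build.js --dev"
    );
  }
}

function isNativeLoaded() {
  return loadedSuccessfully;
}

// One JSONL line per process, with the fields listed in this commit.
function buildLoadRecord({ platformTag, source, binaryPath, sha256, errors }) {
  const record = {
    ts: new Date().toISOString(),
    pid: process.pid,
    platformTag,
    source,
    binaryPath,
    sha256,
    loaded: isNativeLoaded(),
  };
  if (errors && errors.length > 0) record.errors = errors;
  return JSON.stringify(record);
}
```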
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three concrete fixes from open self-feedback assessment 2026-05-17:
- uok/gate-registry-bootstrap.js: register all 6 R081 detector gates
(same-unit-loop, zero-progress, repeated-feedback-kind, artifact-flap,
stale-lock, periodic-detector-sweep) alongside drift-detection and
iter-completion-reconciler. Closes the gap reported by
sf-mp9udspu-fsf7si — bootstrap previously registered 2 of 8 gates.
- self-feedback.js ALLOWED_KIND_DOMAINS: add `adversarial-finding`.
Closes gap reported by sf-mp9u4i25-fczmcj — R075 (autonomous
adversarial review) challenge unit had no kind to file findings under.
- sf-autonomous-watchdog.sh: delete watchdog-run-*.log files older than
60 minutes at each cycle start. Without rotation .sf/ grew to 1.9 GB
in 24h (today's snapshot). 60 min retention captures last cycle for
post-incident triage; older state is already in DB + iterations.jsonl.
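The retention rule from the third fix, sketched in JavaScript for
illustration (the watchdog itself is a shell script). selectExpiredLogs and
the { name, mtimeMs } input shape are assumptions; only the filename
pattern and the 60-minute window come from this commit.

```javascript
const RETENTION_MS = 60 * 60 * 1000; // 60 minutes

// Returns the watchdog-run-*.log names whose mtime is older than the
// retention window. Other files are never selected.
function selectExpiredLogs(files, now = Date.now(), retentionMs = RETENTION_MS) {
  return files
    .filter((f) => /^watchdog-run-.*\.log$/.test(f.name))
    .filter((f) => now - f.mtimeMs > retentionMs)
    .map((f) => f.name);
}
```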
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>