singularity/singularity-forge

Author	SHA1	Message	Date
Mikael Hugo	d03758d803	feat: replace launchd with systemd user-unit install path Operator-direction 2026-05-17 "we will never use mac" — no compat preservation. Single-cutover replacement. - new packages/daemon/src/systemd.ts: install/uninstall/status using systemctl --user + ~/.config/systemd/user/sf-server.service - new packages/daemon/src/systemd.test.ts: ports launchd tests, same shape, mocked systemctl via RunCommandFn injection + SF_SYSTEMD_USER_DIR env override for real filesystem tests - cli-main.ts: switch import + update help text + status messages - index.ts: re-export systemd module (installSystemdUnit, uninstallSystemdUnit, systemdUnitStatus, generateUnit, getServicePath, SystemdStatus, SystemdUnitOptions) - DELETED: launchd.ts (253 LOC), launchd.test.ts (379 LOC) - docs/dev/drafts/M053-per-repo-supervisor.md: remove "launchd" mention - CHANGELOG.md: document systemd-only install path Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-17 17:33:34 +02:00
Mikael Hugo	44915b73d4	rename: tool names → Claude-Code-aligned (Bash/Read/Write/Edit/Grep/Glob/LS); remove run_command/read_output/hashline duplicates Per operator-direction 2026-05-17 (sf-mp9w20y1-nld9hc + "DONT KEE COMPAT" stance + adversarial-review override). Cross-vendor frontier LLMs are trained on PascalCase Claude Code tool names; calling them by SF's lowercase + novel names increases tool-call error rates. Single atomic cutover, no aliases. Internal implementations preserved; only the LLM-facing names + registrations change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 17:26:36 +02:00
Mikael Hugo	57fef5979d	feat: make sf server the operator entrypoint	2026-05-17 17:23:46 +02:00
Mikael Hugo	c046ff9a6c	fix: auto-rebuild workspace packages when src is newer than dist Without this, edits to packages/coding-agent/src/* (or any other workspace package src) silently land while the dist stays stale — agents continue loading the old compiled JS and operators see "why didn't my edit take effect?" symptoms. Observed 2026-05-17 wiring in the AST tools: vitest (reading TS source) passed; runtime smoke test against dist failed because no auto-rebuild fired. Extends ensure-source-resources.cjs (which sf-from-source runs on every launch) to also check workspace packages: agent-core, ai, coding-agent, daemon, google-gemini-cli-provider, openai-codex-provider, rpc-client, tui. For each, compare latest src mtime vs latest dist mtime (with a 100ms grace window). If src is newer, run `npm run build -w @singularity-forge/<pkg>`. Excludes: - packages/native (Rust build is 5–10 min; trigger manually via `node rust-engine/scripts/build.js --dev`). - Any package in SF_SKIP_WORKSPACE_AUTOBUILD (comma-separated). - Whole step disabled by SF_SKIP_WORKSPACE_AUTOBUILD=all. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 17:19:25 +02:00
Mikael Hugo	6e3b3d3c54	feat: add Serena-style AST tools (ReplaceSymbol, InsertAroundSymbol, AstGrep) Wraps the native AST primitives from @singularity-forge/native/{edit,ast} as LLM tools so agents can do tree-sitter-anchored code edits instead of substring-based Edit or line-anchor hashline. - replace-symbol.ts (+117): wraps replaceSymbol(file, symbolPath, newBody); matches function/class/method declarations via tree-sitter, returns matched=false sentinel when the symbol isn't located. - insert-around-symbol.ts (+122): wraps insertAroundSymbol with position enum BeforeDecl/AfterDecl/AtBodyStart/AtBodyEnd. - ast-grep.ts (+152): wraps astGrep for pattern matching across files with $VAR/$$$ARGS meta-variables; returns ranked matches with byte/line/column + captured meta-variable bindings. Each tool: - typebox schema matching the existing AgentTool pattern (edit.ts) - notifyFileChanged() into the LSP layer on write ops - resolveToCwd() for path normalization - catches native errors + returns isError result with the NativeUnavailableError message pointing operators to `nix develop` + `node rust-engine/scripts/build.js --dev` Wire-in: - tools/index.ts: re-exports + imports + entries in `allTools` map and createAllTools() factory. - extension-manifest.json: ReplaceSymbol / InsertAroundSymbol / AstGrep appended to provides.tools so SF extension agents see them. Higher value than substring/line-anchor for code in tree-sitter-supported languages (TS/JS/TSX/Python/Rust). Edit + hashline remain for non-code files. PascalCase names per the Claude-Code-aligned convention from sf-mp9w20y1-nld9hc. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 17:14:12 +02:00
Mikael Hugo	19b10eb67c	feat: make sf-server own swarm registry sync	2026-05-17 17:05:16 +02:00
Mikael Hugo	33560c9b09	fix: auto-open configured web project	2026-05-17 16:38:11 +02:00
Mikael Hugo	0f5a606923	fix: native loader — loud banner on fallback + structured load log + helpers - Stderr banner on fallback now multi-line with concrete fix steps (nix develop → node rust-engine/scripts/build.js --dev) so an operator scanning a 280MB cycle log can't miss it. The old single-line warning was easy to overlook (today's "WHY HAS NOBODY SEEN IF LOUD" check). - Structured load record per process at .sf/runtime/native-engine-load.jsonl: {ts, pid, platformTag, source, binaryPath, sha256, loaded, errors?}. Lets operators audit which binary each SF process loaded — and detect ABI mismatches across daemon↔worker boundaries when different sha256 values appear for the same platformTag (the "rare but real" concern flagged earlier today). - Proxy error message now points to the build/install commands instead of just saying "not available". NativeUnavailableError is named for consumer try/catch chains. - Fixed _loadedSuccessfully ordering — was set true BEFORE the require, leaving stale-true after a failed first attempt. - New helpers isNativeLoaded(), nativeBinaryPath(), nativeBinarySha256() for diagnostic surfaces (sf headless query, doctor checks). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 16:34:02 +02:00
Mikael Hugo	f87e9bc0d9	fix: attach web server to project without token	2026-05-17 16:25:47 +02:00
Mikael Hugo	eeb80bbbdd	fix: register 6 detector gates + add adversarial-finding kind + watchdog log rotation Three concrete fixes from open self-feedback assessment 2026-05-17: - uok/gate-registry-bootstrap.js: register all 6 R081 detector gates (same-unit-loop, zero-progress, repeated-feedback-kind, artifact-flap, stale-lock, periodic-detector-sweep) alongside drift-detection and iter-completion-reconciler. Closes the gap reported by sf-mp9udspu-fsf7si — bootstrap previously registered 2 of 8 gates. - self-feedback.js ALLOWED_KIND_DOMAINS: add `adversarial-finding`. Closes gap reported by sf-mp9u4i25-fczmcj — R075 (autonomous adversarial review) challenge unit had no kind to file findings under. - sf-autonomous-watchdog.sh: delete watchdog-run-*.log files older than 60 minutes at each cycle start. Without rotation .sf/ grew to 1.9 GB in 24h (today's snapshot). 60 min retention captures last cycle for post-incident triage; older state is already in DB + iterations.jsonl. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 16:08:05 +02:00
Mikael Hugo	077fd0a2a7	remove A2A; swarm enrollment + status projection + web swarms view; headless refactor - A2A removal per M054/R071 cancellation 2026-05-17 (-2294 lines): - docs/plans/A2A_ADOPTION_PLAN.md, MISSION-A2A-ADOPTION.md deleted - src/resources/extensions/sf/uok/a2a-agent-server.js, a2a-transport.js deleted - tests/a2a-auth.test.mjs deleted - swarm-dispatch.js purged of A2A-conditional code paths - New: scripts/sf-swarm-enroll.mjs + test (operator-facing swarm enrollment, replaces former A2A pairing flow) - New: src/status-projection.ts + test, web/lib/swarm-status.ts + test, web/components/sf/swarms-view.tsx, web/app/api/swarms/ (web swarms-view surface — direct visibility into running swarm state without requiring TUI; aligns with project_tui_deprecating) - headless-{answers,query,ui,headless}.ts: coordinated tweaks consistent with the headless-as-default direction (R124 proposal) - docs/dev/drafts/M053-per-repo-supervisor.md: design refinement - .sf/REQUIREMENTS.md: small text fixes (6/6 churn) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 16:04:06 +02:00
Mikael Hugo	ecac4328bd	backfill: restore 48 R-entries from REQUIREMENTS.md to DB DB corruption recovery (2026-05-17) rebuilt the requirements table from valid btree pages; ROWID-by-ROWID scan found 60 of 68 R-entries. The other 8 + 40 historical (R006-R009 + R022-R065) were never in the DB to begin with — they had drifted into REQUIREMENTS.md only. Backfill script parsed each ### Rxxx — title block and INSERTed into the requirements table with proper class/status/description/why/notes fields. Final DB count: 75 → 123, integrity_check ok, MD↔DB parity restored. The .gitignore tweak from the meta-supervisor commit landed earlier; no functional change here. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 15:57:45 +02:00
Mikael Hugo	1cd7890d64	fix: auto-version-bump swallowed operator-direction; ptrmap + lock guards - sf-db-schema.js: auto_vacuum INCREMENTAL → NONE. The "Bad ptr map entry" corruption on 2026-05-17 was incremental-autovacuum ptrmap drift under concurrent writers. Recovered DB has no ptrmap; future fresh DBs must match. incremental_vacuum() callers in sf-db-core.js become no-ops. - bin/sf-from-source: lock allowlist extended to skip readonly sf headless subcommands (--help, query, status, usage, reflect, feedback list, triage --list/--json). Previously every sf headless invocation tried to acquire the project lock — operator couldn't even inspect SF state while autonomous was running. - self-feedback.js triageBlockedEntries: (1) treat empty/null/undefined sfVersion as unknown, not zero; (2) exempt operator-direction kinds (improvement-idea, architecture-defect, missing-feature, gap) from auto-version-bump close. Both were needed to prevent the R124 incident recurring. - headless-feedback.ts handleAdd: populate sfVersion via getCurrentSfVersion + detect repoIdentity via isForgeRepo, not hardcoded "external"/"". An empty sfVersion sorts below any real semver, so the resolver retry-closed every operator-filed entry within seconds. Net effect: R124 proposal (filed via sf headless feedback add) is no longer auto-resolved as version-stale. Larger architectural fix (single- writer SF daemon / RPC for all DB writes — M040 territory) tracked as follow-up R-entry. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 15:51:36 +02:00
Mikael Hugo	87e9729c13	fix: shard sift search and project requirements	2026-05-17 15:38:55 +02:00
Mikael Hugo	3e5b6fc511	fix: reconcile iteration completion drift	2026-05-17 15:06:40 +02:00
Mikael Hugo	f643272a91	fix: preserve requirements projection fidelity	2026-05-17 15:02:25 +02:00
Mikael Hugo	4289946e11	fix: clear task verification status on revert	2026-05-17 14:59:20 +02:00
Mikael Hugo	3e002ca698	refactor: consolidate loop signals and gate registry wiring	2026-05-17 14:45:12 +02:00
Mikael Hugo	4d2266e57d	fix: consolidate loop supervision gates	2026-05-17 14:35:40 +02:00
Mikael Hugo	625a830d2f	wire R053-R056 detectors into auto-runaway-guard + R081 UokGate retrofit - uok/auto-runaway-guard.js: invoke runDetectorSweep alongside the existing zero-progress check (fire-and-forget for sync-tick compatibility; results consumed on next tick via sweepState ring buffer). Passes unitId, unitMetrics, sessionFingerprint, lockPaths, and a 30-min DB-windowed recentFeedback slice. - detectors/{same-unit-loop, zero-progress, repeated-feedback-kind, artifact-flap, stale-lock, periodic-runner}.js: each detector now also exports a UokGate wrapper (id/type/execute -> GateResult per ADR-0075). Plain detector functions kept for existing consumers. - detectors/index.js: single import surface for the gate exports. - detector-stale-lock.test.mjs (9), detector-periodic-runner.test.mjs (10), detector-gates-contract.test.mjs: fills the R055/R056 test gap filed earlier today + proves UokGate contract conformance. - 41/41 detector tests green; copy-resources clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 14:18:54 +02:00
Mikael Hugo	527ebfcaa4	gitignore meta-supervisor runtime state The previous commit accidentally tracked .sf/meta-status.json + .sf/meta-supervisor.pid (transient runtime files written by scripts/sf-meta-supervisor.mjs each tick). Mirror the existing .sf/runtime/ ignore pattern for these top-level meta-* files; the daemon keeps writing them on disk but git no longer tracks the churn. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 14:09:24 +02:00
Mikael Hugo	d5664f7142	meta-supervisor (node daemon) + R091 triage gate + R091-R094 spec - scripts/sf-meta-supervisor.mjs: pure-node daemon supervising scripts/sf-autonomous-watchdog.sh. Tick=60s, restarts watchdog if dead, emits .sf/meta-status.json, halt via .sf/meta-supervisor.halt. Uses only node builtins (no SF dist deps) so it survives dist breakage. - src/headless.ts: R091 — gate the per-cycle handleTriage call on a time interval (SF_TRIAGE_INTERVAL_MS, default 30 min) and bump batch size (SF_TRIAGE_MAX, default 25, was 5). Drops the ~8min triage hit from every cycle while letting daily drain capacity rise. - .sf/REQUIREMENTS.md: R091 (triage sidecar) + R092 (PDD-completeness as routing signal) + R093 (pin model per orchestration agent.yaml) + R094 (swarm-role model tier specialization — 8 roles already exist in uok/swarm-roles.js; model field per role missing). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 14:08:30 +02:00
Mikael Hugo	e93d17a3b4	spec + ADR annotations + dormant-code cleanup - .sf/REQUIREMENTS.md: today's R-entries (R066..R090) covering parallel-rescue targets — bus deliver verify, drift detection gate, PDD typed contracts, lane split, Wiggums detector family, repo supervisor design. - ADR-014/019/020: SF-first banners (operator direction: get SF working before ACE/wire-architecture changes land downstream). - docs/records + drafts: 2026-05-07 strategy + cli-agent survey index refresh; SF/ACE pattern draft annotations. - roadmap-mutations.js removed (dormant — never imported; reachable shape verified against handler-relative + dynamic import audit). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 13:45:00 +02:00
Mikael Hugo	d2ff4e84ba	land 6 parallel codex/sonnet rescue outputs - R016 swarm bus deliver verify (uok/swarm-dispatch.js + test): _busDispatch now force-refreshes target inbox and verifies messageId visibility before returning ok:true; ack-without-deliver class closed. - R082 drift detection UokGate (uok/drift-detection-gate.js + test): single-task + sweep scope; 3 drift classes (artifact-missing, prose-status mismatch, broken-import); follows ADR-0075 id/type/execute -> GateResult contract. - R087 PDD typed contracts (engine-types.js + test): ADR-0000 8 PDD fields + 7-dim run-control policy + ADR-0075 GateResult typedefs and validators. - R090 planning-execute lane split (auto/unit-lanes.js + auto/loop.js + 2 tests): lane classifier + capacity-aware tick dispatcher; SF_LANES=0 fallback is byte-equivalent to pre-R090. - R053 + R054 Wiggums detectors (detectors/repeated-feedback-kind.js + detectors/artifact-flap.js + 2 tests); R055 stale-lock + R056 periodic-runner source landed without tests (gap filed as self-feedback). - M053 per-repo supervisor design + skeleton (supervisor/repo-supervisor.js + test + design doc): RepoSupervisor class, zero module-global state, tick stub, failure isolation; M056 trust-boundary called out as follow-up. 85/85 tests green across the 8 new test files. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 13:44:42 +02:00
Mikael Hugo	eaac4f0bd3	sf snapshot: uncommitted changes after 187m inactivity	2026-05-17 12:04:55 +02:00
Mikael Hugo	6f5e2f0aa9	spec(R060-R065): backup, schema-migration, secrets lifecycle, rate budget, cache policy, deprecation + M042-M047	2026-05-17 08:57:30 +02:00
Mikael Hugo	09687ccd30	spec(R059): typed entity vocabulary (R/A/D/M/S/T/F/G/K/P/E) Operator: "should we have a for adrs and d for decisions? any other type we should habe?" + "and so we use" — yes, file it and adopt. Adds A (ADR), D (Decision), F (Finding), G (Gate), K (Knowledge), P (Pattern), E (Evidence) prefixes to the existing R/M/S/T set. Each gets a source-of-truth location and a mechanical migration path. R048 (unbroken purpose chain) + R047 (per-R fulfillment validation) both require typed cross-references to verify integrity. Without typed IDs, "this M is covered by R, S, T, A, D" is unverifiable free-text. Owning milestone M041 (also added) splits the migration into 6 slices: rename ADRs, add D-IDs to DECISIONS, backfill F/G/K-IDs in DB tables, doctor cross-link integrity check, lint for SF-authored typed references. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 08:52:37 +02:00
Mikael Hugo	72c2ecb2b2	spec(R058): DB writes via MessageBus (single-writer actor) Operator: "db via bus so we dont crash it?" — yes, this is the natural fit for the lane model (R057 system lane + R046 multi-unit lanes). Concurrent lanes writing directly to SQLite hit SQLITE_BUSY or worse; mediating all writes through a single MessageBus-owned db-writer actor enforces the single-writer invariant operationally. Reads stay direct (multi-reader WAL is safe). Migration is gradual (table by table). Also serves as the substrate for multi-repo federation (R028 / ADR-019/020) where cross-process DB sharing needs message-based access anyway. Future M040. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 08:48:40 +02:00
Mikael Hugo	14fe3fa20a	spec(R057): rename to "System Lane" + introduce lane primitive Operator coined "system lane" — better than my "side-track". Frames the architecture cleanly. The lane primitive unifies: - R046 (multi-unit lanes) parallel slice dispatch - R049 (per-lane model routing) different LLM per lane - R057 (system lane) non-unit work alongside unit lane Today autoLoop is 1 unit lane. System lane runs alongside for memory consolidation, triage drain, doctor audits, log compaction, reflection assembly, catalog refresh — all currently queued between units. Single-writer DB met by sf-db.js serial queue. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 08:45:30 +02:00
Mikael Hugo	9bd7067b69	fix(wiggums): permission level — "normal" + default fallback to "medium" legacyPermissionLevelForProfile had a switch with cases for restricted/trusted/unrestricted only, no case for "normal" (the DEFAULT autonomous session profile per auto/session.js:377). "normal" fell through to default → "low" — too restrictive for autonomous work. Witnessed M010/S04/T01: solver note "TypeScript compilation and git diff blocked by low permission level" — SF couldn't verify its own deliverable because permissions were locked down despite running in autonomous mode. Fix: - "normal" → "medium" (allows tsc, git, npm test) - default → "medium" (was "low"); unknown profiles shouldn't cripple autonomous executors. Operators wanting strict mode set profile: "restricted" explicitly. Per operator intent 2026-05-17: "SF should have permission even if it can limit its agents and only allow orchestrator or whatever." Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 08:38:10 +02:00
Mikael Hugo	02bac88a63	feat: Spawn-failure watchdog, status=failed transition, doctor signal,… SF-Task: S04/T01	2026-05-17 08:36:18 +02:00
Mikael Hugo	8122a2b6c7	fix(watchdog): pre-flight smoke + crash-loop backoff Two guards added after today's 2-hour crash-loop on missing DEFAULT_STALE_TIMEOUT_MS export: 1. Pre-flight smoke test: `sf --version` must succeed before each cycle. If dist is broken (missing export, syntax error), pause 5min + log loudly instead of immediately respawning into the same crash. 2. Crash-loop detection: 3 consecutive <90s failure exits → assume crash-loop, back off 5min before retry. Prevents the "100 crashes in 2 hours, 0 useful work" pattern we just hit. Together: a broken dist causes ONE crash + a 5min pause, not a 2-hour CPU burn. Operator notices the pause in .sf/watchdog.log and intervenes; in the meantime no resources wasted. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 08:31:07 +02:00
Mikael Hugo	80ede48f06	sf snapshot: uncommitted changes after 246m inactivity	2026-05-17 08:28:04 +02:00
Mikael Hugo	c7b13607b5	fix(wiggums): exponential backoff on autoLoop halt-watchdog-break When the halt-watchdog detects stuck state, the autoLoop was logging "halt-watchdog-break" every iteration but otherwise tight-spinning through dispatch-resolve at ~2s/iteration. 2026-05-17 dogfood logged 60+ such events in a 30s window — pure CPU burn while the actual stuck condition stayed stuck. Fix: exponential backoff (1s → 2s → 4s → 8s → 16s → capped at 30s) based on how many halt thresholds have elapsed. Heartbeat() resets when real progress resumes (existing behavior). Backoff costs nothing when the loop is healthy. One of the 14 Ralph-Wiggum patterns surfaced this session. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 04:21:21 +02:00
Mikael Hugo	24d2b37562	fix(wiggums): verify PID liveness before "Another session running" message Two sites told operator to "kill PID X" without checking X was alive: - interrupted-session.js:formatInterruptedSessionRunningMessage - auto.js autonomous-start blocking notification Both report stale locks from crashed prior sessions as if a live session exists, confusing operator and blocking restart. Session-lock.js already has auto-recovery for stale-PID locks; these two surfaces just needed matching liveness checks to label dead-PID locks correctly. Now: dead-PID → "Stale lock from dead PID X — will be auto-recovered" alive-PID → original "kill X" message Catches one of the 14 Ralph-Wiggum-obvious patterns surfaced this session. Reduces operator confusion + dovetails with R055 (M038/S05) when stale-lock auto-recovery becomes a core-loop detector. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 04:20:00 +02:00
Mikael Hugo	96d03b33bc	spec: M038 Wiggums Detector family (R051-R056) The autonomous loop currently lacks baseline "this is dumb-obvious stuck" detection. This session alone surfaced 14 such patterns that required operator grep-archaeology to identify. M038 centralizes a single Wiggums-Detector orchestrator (R056) that runs 5 detector questions every 30s: R051 — same-unit dispatched >3 times with no state change R052 — runtime/units progressCount:0 for >5min (heartbeating ghost) R053 — >5 self-feedback entries of same kind/target in 24h R054 — artifact predicates flapping between dispatches R055 — stale .sf/sf.lock from dead holder + stale inline-fix marker Each detector pauses + files actionable self-feedback. Trivial cases auto-fix (e.g. stale-lock rm). New detectors (R057+) plug into the orchestrator without per-detector lifecycle code. Anchors to ADR-0000 (purpose-to-software requires self-healing). Builds on the recurring patterns evidenced 2026-05-17: - 70+ degenerate reassess iterations on M010/S03 (R051) - 56+ runaway-loop:idle-halt entries accumulated on M005 (R053) - Multiple stale-lock incidents requiring manual rm (R055) 56 R-entries total, 54/56 mapped (R049/R050 still future M036-M037). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 04:16:52 +02:00
Mikael Hugo	e470939723	spec(R051): same-unit dispatch-loop detection (Ralph Wiggum safety net) When dispatcher resolves the same unit N>3 times in a session without state-change between dispatches, detect the loop, pause, file self-feedback. Targets the 2026-05-17 dogfood pattern where reassess-roadmap M010/S03 ran 70+ times because of the ASSESSMENT suffix mismatch (now fixed in `a737af318`). Even after the immediate fix, this safety net prevents future unknown-bug versions of the same failure mode from burning hours of compute. R051 makes the failure first-class detectable instead of operator-hand-debug. Owning milestone M038 (future). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 04:12:55 +02:00
Mikael Hugo	a737af318d	fix(dispatch): reassess-roadmap loop on slice with ASSESSMENT.md checkNeedsReassessment looked up the slice's assessment file with suffix='ASSESS', but actual files are 'S03-ASSESSMENT.md'. The resolveFile pattern requires at least one char before the suffix (/^S03-.*-ASSESS\.md$/), so 'S03-ASSESSMENT.md' never matched and the helper returned {sliceId} on every poll → dispatcher kept firing reassess-roadmap forever. Fix: try 'ASSESSMENT' first, fall back to legacy 'ASSESS'. Now S03-ASSESSMENT.md properly satisfies the "already reassessed" check and the dispatcher advances to the next slice (S04). Verified: resolveSliceFile('M010','S03','ASSESSMENT') returns the real path; with the fallback, this resolves on first call. The 70+ degenerate reassess iterations on M010/S03 (witnessed 2026-05-17) won't recur. Ralph Wiggum approved. (per operator: "sf should clear these stuck itself ralph wiggums would fix") Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 04:12:15 +02:00
Mikael Hugo	0c0608fa50	fix(swarm): recognize unit-specific completion tools as implicit complete Detected via supervisory check 2026-05-17: SF stuck in degenerate reassess- roadmap loop on M010/S03 (5 iterations in 8min, all returning outcome=continue). Root cause: synthesized-checkpoint in runUnitViaSwarm only treats the generic `checkpoint` tool as a completion signal — but units routinely complete via their unit-specific tool (reassess_roadmap with verdict=roadmap-confirmed, validate_milestone, complete_milestone, complete_slice, save_summary). The LLM correctly emitted the unit's specific completion tool + assistant text "<turn_status>complete</turn_status>", but workerSignaledOutcome stayed null → synthesized checkpoint fell back to continue → solver re-iterated. Fix: recognize UNIT_COMPLETION_TOOLS = {reassess_roadmap, validate_milestone, complete_milestone, complete_slice, save_summary} as implicit "complete" signals. The check fires when those tools are called and an earlier explicit checkpoint hasn't already said "complete" or "blocked". This resolves sf-mp94lth4-ew26om and should prevent future degenerate-iteration loops on reassess-roadmap and milestone completion units. 13/13 existing M010 tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 03:59:30 +02:00
Mikael Hugo	460db52504	chore(watchdog): enable SF_INLINE_DISPATCH=1 by default Now that M010/S01+S02+S03 ship the inline-dispatch path (runUnitInline + DispatchLayer + autoLoop wiring), the watchdog enables it on every cycle so the autonomous loop actually exercises the inline scope for INLINE_ELIGIBLE_UNITS (validate-milestone, complete-milestone, reassess-roadmap). Other unit types continue to use the swarm path unchanged. This dogfoods M010/S03 in every watchdog cycle. If the inline path regresses, the autonomous solver will surface it via self-feedback (R015 spawn-failure loud-failure + agent-runner instrumentation). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 03:54:47 +02:00
Mikael Hugo	7a273262f1	fix(benchmark-coverage): tier-strip fallbacks downgraded to 'approx' proxy User caught: flash-lite ≠ flash (different model tier, different scores). Previous fix counted flash-lite as fully covered via flash proxy, which overstated coverage and could mislead routing. benchmarkLookupVariants now tags variants with kind: - 'exact' → date/version strip + -latest alias (same model line) - 'approx' → tier strip (flash-lite→flash, X-lite→X) — different model computeBenchmarkCoverage promotes 'exact' matches to covered; 'approx' matches stay in uncovered with `approximatedBy` field so operators see when a real benchmark is still needed. Honest report: 64 exact covered / 1 proxy-only / 104 genuine uncovered (was 65/0/104 with the overcount). R049 + R050 added to traceability (M036/M037 future milestones). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 03:52:29 +02:00
Mikael Hugo	1dc7c2e278	feat(benchmark-coverage): variant-fallback lookup (R050 step 1) benchmark-coverage.js: new benchmarkLookupVariants() returns ordered fallback keys for a model id, and computeBenchmarkCoverage tries each variant before flagging uncovered. Patterns covered: - date/version suffix strip ("mistral-medium-2505" → "mistral-medium") - tier strip ("X-flash-lite" → "X-flash", "Y-lite" → "Y") - "-latest" append for bare names ("mistral-medium" → "mistral-medium-latest") The audit reports the matched variant via `matchedVia` so operators can see when fallback applied (vs adding a real entry). Verified: coverage 62/169 (37%) → 65/169 (38.4%). Sample fallback matches: google-gemini-cli/gemini-2.5-flash-lite → gemini-2.5-flash mistral/mistral-medium → mistral-medium-latest mistral/magistral-small-2509 → magistral-small R050 now active: full closure requires auto-benchmark of remaining 104 uncovered models via bulk-import of published scores or live eval. This step shrinks the gap via cheap structural fallback; future work adds the real scoring loop. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 03:50:21 +02:00
Mikael Hugo	bbce6827aa	feat(dispatch): wire autoLoop to DispatchLayer via SF_INLINE_DISPATCH (M010/S03) run-unit.js: new tryInlineDispatch helper routes inline-eligible unit types through the M010/S02 DispatchLayer when env SF_INLINE_DISPATCH=1. Safe by default — without the env var OR for non-eligible unit types (any unit not in INLINE_ELIGIBLE_UNITS), behavior is byte-identical to before. With the env var set on validate-milestone / complete-milestone / reassess-roadmap, the autoLoop reaches runUnitInline → runSubagent in-process, no spawn. The helper translates DispatchLayer's {ok, output, exitCode, stderr} into the UnitResult shape that autoLoop's resolveAgentEnd/finalize chain expects, so downstream handling works unchanged. 13/13 M010 tests still pass. M010/S03 marked complete. R049 added: Multi-Provider Parallel Routing — different concurrent units route to different LLM providers based on quota/specialty/cost/failover. Builds on R046 + R017 + model-router scoring. Future M036. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 03:46:48 +02:00
Mikael Hugo	1b3e09d527	chore: map R041-R048 to M031-M035 + watchdog clears active.json REQUIREMENTS.md: traceability table now has 48/48 R-entries mapped to owning milestone slices (was 40/48 unmapped). M031 owns R041-R044 (R-to-Milestone bootstrap with deep research), M032 owns R045 (R-auto- expansion), M033 owns R046 (autonomous loop parallel dispatch), M034 owns R047 (per-R fulfillment validation), M035 owns R048 (unbroken purpose chain). scripts/sf-autonomous-watchdog.sh: also clears .sf/runtime/autonomous- solver/active.json on cycle restart. Without this, a unit in status:running from a crashed prior run made the autoLoop spin in halt-watchdog-break (witnessed in this session: iteration 239+ in 8min without unit progress). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 03:43:23 +02:00
Mikael Hugo	73a464f574	feat(ops): SF autonomous watchdog for continuous unattended dispatch scripts/sf-autonomous-watchdog.sh — bash daemon that supervises `sf headless autonomous` across crashes/timeouts. Per-cycle: 1. Cleans stale state (lock + zombie inline-fix dispatch) 2. Kills orphan sf processes from prior runs 3. Launches sf with 30-min hard timeout (longest sf accepts cleanly) 4. On exit (timeout / dispatch-stop / crash), logs and restarts after 15s cooldown (10min cooldown if all milestones complete) Run: nohup bash scripts/sf-autonomous-watchdog.sh > .sf/watchdog.log 2>&1 & Stop: pkill -f sf-autonomous-watchdog This is the operational mode for the 2-4 week delivery horizon — SF runs continuously, the watchdog catches all exit conditions, and progress accumulates across many autonomous cycles. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 03:39:08 +02:00
Mikael Hugo	9432dace89	feat: roadmap expansion (M010-M030) + Unified Dispatch v2 scaffold (M010/S01+S02) REQUIREMENTS.md: 15 → 48 R-entries covering self-heal, inline dispatch, MessageBus coherence, multi-model routing, reconciliation, operator tooling, docs sync, test-backed completion, cost accountability, portability, federation, skills marketplace, privacy, ADR enforcement, idempotency, plan determinism, performance budgets, operator steering, purpose-driver enforcement (R036-R040), R-to-milestone bootstrap (R041-R044), R-auto-expansion (R045), parallel dispatch (R046), per-R validation (R047), unbroken purpose chain R→M→S→T→code (R048). 40/48 mapped to milestone slices. PROJECT.md: reconciled with reality (M001/M003/M004/M005/M006 complete; M010-M030 queued; cancelled/skipped properly categorized). New code (M010/S01+S02 delivered): - dispatch/run-unit-inline.js: callable runUnitInline(unitType, unitId, opts) for in-process unit execution. Routes through runSubagent without spawn or worktree. Covers validate-milestone, complete-milestone, reassess-roadmap. - dispatch/dispatch-layer.js: DispatchLayer class with full 4D API per UNIFIED_DISPATCH_V2_PLAN.md. Implements full|managed|inline|single config; other cells return structured not-implemented errors with named owners. Tests: run-unit-inline.test.mjs (5/5), dispatch-layer.test.mjs (8/8), m006-s02-manifest-drift.test.mjs (2/2 regression guard for the manifest drift class). Bug fix: state-db.js cancelled-milestone branch in buildRegistryAndFindActive (resolves sf-mp8aotmq-jxby91). Dispatcher no longer routes plan-milestone at cancelled stubs. M005/M006 honest closeouts via VALIDATION.md + SUMMARY.md with operational verification class evidence. M001-6377a4 SUMMARY retrofit. auto-prompts.js: M005 round-2 remediation — removed manual knowledge/graph re-injection from 4 simple builders + migrated research-milestone to fully declarative composer ordering.
unit-context-manifest.js: research-milestone manifest moved knowledge to inline-position + graph to computed. swarm-dispatch.js: debugLog instrumentation for diagnosis (before-busDispatch / after-busDispatch / before-runAgentTurn / watchdog-about-to-call-runAgentTurn). research-milestone.md prompt + research.md template: tuned for heavy research (deep-mode default, 8-12 web search budget, mandatory Comparable Systems section). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 03:37:00 +02:00
Mikael Hugo	d52c869433	feat(sf-from-source): single-writer project lock via flock on .sf/sf.lock Two SF processes writing to the same .sf/sf.db over WAL caused torn pages and "database disk image is malformed" corruption (observed 2026-05-17 in dogfood-5 — the project DB ended up with B-tree pointer-map desync at page 69, requiring a backup restore). The session-lock in src/resources/extensions/sf/session-lock.js exists but is only acquired from auto-start.js when autonomous mode starts. Interactive sf or pre-autonomous-start work did not take it, so a second sf could open the same DB and contend. Promote the lock to the shell wrapper so EVERY sf invocation in a write-capable mode acquires a project-level flock on .sf/sf.lock BEFORE node is launched. Read-only commands (logs, status, dash, sessions, list, --version, --help) skip the lock to keep concurrent read use-cases working. SF_SKIP_LOCK=1 escape hatch for tests that intentionally exercise concurrent paths. On collision the wrapper prints the current lock holder (pid + args + cwd + started timestamp) so the operator can identify the conflicting session, then exits with 75 (EX_TEMPFAIL). The lock is released automatically when the wrapper bash exits — no stale-lock recovery needed since flock is kernel-owned and dies with the fd.
The fd opens in read+write mode (`<>`) WITHOUT truncating so the collision branch can still cat the existing holder; truncation happens only after flock succeeds, preventing two racers from clobbering each other's metadata. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 01:59:06 +02:00
Mikael Hugo	7b70d35111	refactor(bootstrap): use ensureSiftIndexWarmup at session_start, drop bm25-only prewarm Commit `38994d7a2` added a custom bm25-only Sift warmup at session_start. After investigating, code-intelligence.js already has ensureSiftIndexWarmup which runs the full hybrid + vector + reranker warmup as a properly-daemonized process (PPID=1 after init-reparent, 1-hour hard cap, state tracked in .sf/runtime/sift-index-warmup.json with status/artifactCount/cacheBytes fields). The existing function is wired to auto-start.js, init-wizard.js, guided-flow.js, and auto/loop.js — but NOT to plain session_start. A pure interactive `sf` session (no /autonomous, no init wizard) was previously getting no warmup at all. Replace the bm25-only spawn with a call to ensureSiftIndexWarmup so session_start now gets the same full hybrid+vector treatment the other entry points already use. Drop sift-prewarm.js — the wrapper is no longer needed. User's "we need vector reindex" intent (today): now satisfied at every SF entry point, not just autonomous/wizard/flow. The broader "always-on out-of-session daemon + file-watcher incremental re-warm + bus integration" piece is still tracked in sf-mp8z9otl-iaqrn2 (missing-feature:sift-persistent-index-daemon) for slice planning.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 01:33:50 +02:00
Mikael Hugo	38994d7a20	feat(bootstrap): pre-warm Sift index at session_start Sift (~/.cargo/bin/sift) builds its index lazily on first `sift search` per cache key. In an SF session, the first real Sift query typically happens deep inside an execute-task unit when an agent reaches for the search-tool — and that agent pays the full cold-build cost (tens of seconds on a large repo). Subsequent queries hit warm cache and are fast. Hook session_start to fire a cheap detached `sift search` against the project root. The actual index build runs in parallel with the rest of session_start (other catalog refreshes, doctor fix, etc.) and is ready by the time any agent invokes search-tool. Cheapest possible warmup: bm25-only retriever, no reranking, limit 1 — just enough to trigger the index build pipeline. Fully fire-and-forget: failures are swallowed (sift missing, spawn error, exit non-zero — all just resolve(false). SF carries on as before). Also lands the .sf/preferences.yaml git section requested in the same session: solo-mode defaults (auto_push=true, isolation=none, merge_strategy=squash) so the autonomous loop doesn't pause for operator confirmation on commit/push. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 01:24:51 +02:00
Mikael Hugo	53259aebf1	fix(self-feedback): 3 sf-internal defects resolved 1. cooldown failover (sf-mp8w9cg9-arixq7, high) When a provider hits AUTH_COOLDOWN in unit execution, block the failing model with an expiry using the existing blockModel() API, then try a non-cooldowned provider via isProviderRequestReady. Only stops if every provider is unavailable, with an enumerated message showing which ones are down. loop.js consecutiveCooldowns is not touched here (it tracks the loop-level retry budget for provider-not-ready errors that bypass phases-unit; the cooldown path in loop.js is separate and handles errors thrown before runUnitPhase, while this fix handles cancellation returned from runUnitPhase due to provider error during session creation). 2. redundant reassess-roadmap on completed slices (sf-mp8wa4qr-xw8fjb, medium) Doctor-triggered reassess path (loop.js P4-A) now checks whether the target slice already has an ASSESSMENT file before queuing reassess-roadmap. Mirrors the guard already present in the normal dispatch path (checkNeedsReassessment). 3.
empty structured fields in slice summary (sf-mp8w6s88-ckv4yr, low) Added explicit instruction in complete-slice.md prompt template directing the executor to derive key_files, key_decisions, and patterns_established from task summaries before calling complete_slice.	2026-05-17 00:55:56 +02:00


4673 commits