Commit graph

4629 commits

Author SHA1 Message Date
Mikael Hugo
73a464f574 feat(ops): SF autonomous watchdog for continuous unattended dispatch
scripts/sf-autonomous-watchdog.sh — bash daemon that supervises
`sf headless autonomous` across crashes/timeouts. Per-cycle:
  1. Cleans stale state (lock + zombie inline-fix dispatch)
  2. Kills orphan sf processes from prior runs
  3. Launches sf with a 30-min hard timeout (the longest sf accepts cleanly)
  4. On exit (timeout / dispatch-stop / crash), logs and restarts after
     15s cooldown (10min cooldown if all milestones complete)

Run: nohup bash scripts/sf-autonomous-watchdog.sh > .sf/watchdog.log 2>&1 &
Stop: pkill -f sf-autonomous-watchdog

This is the operational mode for the 2-4 week delivery horizon — SF
runs continuously, the watchdog catches all exit conditions, and
progress accumulates across many autonomous cycles.
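The restart policy above reduces to one decision per cycle. A minimal JS sketch of that policy (the real watchdog is bash; the constant and function names here are illustrative, not from the repo):

```javascript
// Illustrative sketch of the watchdog's restart policy. The real
// implementation is bash; these names are not from the repo.
const HARD_TIMEOUT_S = 30 * 60; // longest timeout sf accepts cleanly
const RETRY_COOLDOWN_S = 15;    // after timeout / dispatch-stop / crash
const IDLE_COOLDOWN_S = 600;    // 10 min when all milestones are complete

function nextCooldownSeconds(allMilestonesComplete) {
  // Every exit restarts the cycle; only the "nothing left to do" state
  // earns the long cooldown so the daemon stops churning.
  return allMilestonesComplete ? IDLE_COOLDOWN_S : RETRY_COOLDOWN_S;
}
```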

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 03:39:08 +02:00
Mikael Hugo
9432dace89 feat: roadmap expansion (M010-M030) + Unified Dispatch v2 scaffold (M010/S01+S02)
REQUIREMENTS.md: 15 → 48 R-entries covering self-heal, inline dispatch,
MessageBus coherence, multi-model routing, reconciliation, operator tooling,
docs sync, test-backed completion, cost accountability, portability, federation,
skills marketplace, privacy, ADR enforcement, idempotency, plan determinism,
performance budgets, operator steering, purpose-driver enforcement (R036-R040),
R-to-milestone bootstrap (R041-R044), R-auto-expansion (R045), parallel
dispatch (R046), per-R validation (R047), unbroken purpose chain R→M→S→T→code
(R048). 40/48 mapped to milestone slices.

PROJECT.md: reconciled with reality (M001/M003/M004/M005/M006 complete;
M010-M030 queued; cancelled/skipped properly categorized).

New code (M010/S01+S02 delivered):
- dispatch/run-unit-inline.js: callable runUnitInline(unitType, unitId, opts)
  for in-process unit execution. Routes through runSubagent without spawn
  or worktree. Covers validate-milestone, complete-milestone, reassess-roadmap.
- dispatch/dispatch-layer.js: DispatchLayer class with full 4D API per
  UNIFIED_DISPATCH_V2_PLAN.md. Implements full|managed|inline|single config;
  other cells return structured not-implemented errors with named owners.
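The not-implemented-cell contract above can be sketched as follows (illustrative shape only, not the real DispatchLayer class; the owner string is a placeholder):

```javascript
// Sketch of the DispatchLayer contract: implemented cells dispatch,
// any other cell returns a structured not-implemented error with a
// named owner. Shape is illustrative, not the real class.
class DispatchLayerSketch {
  constructor(cells) {
    this.cells = cells; // { full, managed, inline, single } handlers
  }
  dispatch(mode, unitType, unitId) {
    const handler = this.cells[mode];
    if (typeof handler !== 'function') {
      return { ok: false, error: 'not-implemented', mode, owner: 'TBD' };
    }
    return handler(unitType, unitId);
  }
}
```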

Tests: run-unit-inline.test.mjs (5/5), dispatch-layer.test.mjs (8/8),
m006-s02-manifest-drift.test.mjs (2/2 regression guard for the manifest
drift class).

Bug fix: state-db.js cancelled-milestone branch in buildRegistryAndFindActive
(resolves sf-mp8aotmq-jxby91). Dispatcher no longer routes plan-milestone at
cancelled stubs.

M005/M006 honest closeouts via VALIDATION.md + SUMMARY.md with operational
verification class evidence. M001-6377a4 SUMMARY retrofit.

auto-prompts.js: M005 round-2 remediation — removed manual knowledge/graph
re-injection from 4 simple builders + migrated research-milestone to fully
declarative composer ordering.

unit-context-manifest.js: research-milestone manifest moved knowledge to
inline-position + graph to computed.

swarm-dispatch.js: debugLog instrumentation for diagnosis (before-busDispatch
/ after-busDispatch / before-runAgentTurn / watchdog-about-to-call-runAgentTurn).

research-milestone.md prompt + research.md template: tuned for heavy research
(deep-mode default, 8-12 web search budget, mandatory Comparable Systems
section).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 03:37:00 +02:00
Mikael Hugo
d52c869433 feat(sf-from-source): single-writer project lock via flock on .sf/sf.lock
Some checks are pending
CI / detect-changes (push) Waiting to run
CI / docs-check (push) Blocked by required conditions
CI / lint (push) Blocked by required conditions
CI / build (push) Blocked by required conditions
CI / integration-tests (push) Blocked by required conditions
CI / windows-portability (push) Blocked by required conditions
CI / rtk-portability (linux, blacksmith-4vcpu-ubuntu-2404) (push) Blocked by required conditions
CI / rtk-portability (macos, macos-15) (push) Blocked by required conditions
CI / rtk-portability (windows, blacksmith-4vcpu-windows-2025) (push) Blocked by required conditions
Two SF processes writing to the same .sf/sf.db over WAL caused torn
pages and "database disk image is malformed" corruption (observed
2026-05-17 in dogfood-5 — the project DB ended up with B-tree
pointer-map desync at page 69, requiring a backup restore). The
session-lock in src/resources/extensions/sf/session-lock.js exists
but is only acquired from auto-start.js when autonomous mode starts.
Interactive sf or pre-autonomous-start work did not take it, so a
second sf could open the same DB and contend.

Promote the lock to the shell wrapper so EVERY sf invocation in a
write-capable mode acquires a project-level flock on .sf/sf.lock
BEFORE node is launched. Read-only commands (logs, status, dash,
sessions, list, --version, --help) skip the lock to keep concurrent
read use-cases working. SF_SKIP_LOCK=1 escape hatch for tests that
intentionally exercise concurrent paths.

On collision the wrapper prints the current lock holder (pid + args
+ cwd + started timestamp) so the operator can identify the
conflicting session, then exits with 75 (EX_TEMPFAIL). The lock is
released automatically when the wrapper bash exits — no stale-lock
recovery needed since flock is kernel-owned and dies with the fd.

The fd opens in read+write mode (`<>`) WITHOUT truncating so the
collision branch can still cat the existing holder; truncation
happens only after flock succeeds, preventing two racers from
clobbering each other's metadata.
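The lock-or-skip rule described above can be sketched as a small predicate (the real gate lives in the bash wrapper; this JS only mirrors the decision):

```javascript
// Sketch of the wrapper's "does this invocation need the flock?" rule.
// The real gate is bash; this only mirrors the logic described above.
const READ_ONLY_COMMANDS = new Set([
  'logs', 'status', 'dash', 'sessions', 'list', '--version', '--help',
]);

function needsProjectLock(argv, env = {}) {
  if (env.SF_SKIP_LOCK === '1') return false;    // escape hatch for tests
  return !READ_ONLY_COMMANDS.has(argv[0] ?? ''); // write-capable -> lock
}
```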

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 01:59:06 +02:00
Mikael Hugo
7b70d35111 refactor(bootstrap): use ensureSiftIndexWarmup at session_start, drop bm25-only prewarm
Commit 38994d7a2 added a custom bm25-only Sift warmup at session_start.
After investigating, code-intelligence.js already has ensureSiftIndexWarmup
which runs the full hybrid + vector + reranker warmup as a properly-
daemonized process (PPID=1 after init-reparent, 1-hour hard cap, state
tracked in .sf/runtime/sift-index-warmup.json with status/artifactCount/
cacheBytes fields). The existing function is wired to auto-start.js,
init-wizard.js, guided-flow.js, and auto/loop.js — but NOT to plain
session_start. A pure interactive `sf` session (no /autonomous, no init
wizard) was previously getting no warmup at all.

Replace the bm25-only spawn with a call to ensureSiftIndexWarmup so
session_start now gets the same full hybrid+vector treatment the other
entry points already use. Drop sift-prewarm.js — the wrapper is no
longer needed.

The user's "we need vector reindex" request from today is now
satisfied at every SF entry point, not just autonomous/wizard/flow.

The broader "always-on out-of-session daemon + file-watcher incremental
re-warm + bus integration" piece is still tracked in
sf-mp8z9otl-iaqrn2 (missing-feature:sift-persistent-index-daemon) for
slice planning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 01:33:50 +02:00
Mikael Hugo
38994d7a20 feat(bootstrap): pre-warm Sift index at session_start
Sift (~/.cargo/bin/sift) builds its index lazily on first `sift
search` per cache key. In an SF session, the first real Sift query
typically happens deep inside an execute-task unit when an agent
reaches for the search-tool — and that agent pays the full cold-
build cost (tens of seconds on a large repo). Subsequent queries
hit warm cache and are fast.

Hook session_start to fire a cheap detached `sift search` against
the project root. The actual index build runs in parallel with the
rest of session_start (other catalog refreshes, doctor fix, etc.)
and is ready by the time any agent invokes search-tool. Cheapest
possible warmup: bm25-only retriever, no reranking, limit 1 — just
enough to trigger the index build pipeline.

Fully fire-and-forget: failures are swallowed (sift missing, spawn
error, non-zero exit — all just resolve(false)), and SF carries on
as before.

Also lands the .sf/preferences.yaml git section requested in the
same session: solo-mode defaults (auto_push=true, isolation=none,
merge_strategy=squash) so the autonomous loop doesn't pause for
operator confirmation on commit/push.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 01:24:51 +02:00
Mikael Hugo
53259aebf1 fix(self-feedback): 3 sf-internal defects resolved
1. cooldown failover (sf-mp8w9cg9-arixq7, high)
   When a provider hits AUTH_COOLDOWN in unit execution, block the
   failing model with an expiry using the existing blockModel() API,
   then try a non-cooldowned provider via isProviderRequestReady.
   Only stops if every provider is unavailable, with an enumerated
   message showing which ones are down. loop.js consecutiveCooldowns
   is left untouched: it tracks the loop-level retry budget for
   provider-not-ready errors that bypass phases-unit. The cooldown
   path in loop.js is separate; it handles errors thrown before
   runUnitPhase, while this fix handles the cancellation runUnitPhase
   returns when a provider errors during session creation.

2. redundant reassess-roadmap on completed slices (sf-mp8wa4qr-xw8fjb, medium)
   Doctor-triggered reassess path (loop.js P4-A) now checks whether
   the target slice already has an ASSESSMENT file before queuing
   reassess-roadmap. Mirrors the guard already present in the
   normal dispatch path (checkNeedsReassessment).

3. empty structured fields in slice summary (sf-mp8w6s88-ckv4yr, low)
   Added explicit instruction in complete-slice.md prompt template
   directing the executor to derive key_files, key_decisions, and
   patterns_established from task summaries before calling
   complete_slice.
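The failover selection in (1) reduces to: skip any provider still inside its block expiry, and if nothing is ready, report an enumerated outage. A sketch under those assumptions (the real path goes through blockModel() and isProviderRequestReady; field names here are illustrative):

```javascript
// Failover sketch for (1): skip providers still blocked with an expiry;
// if none are ready, report an enumerated outage. Illustrative only —
// the real code uses blockModel() and isProviderRequestReady.
function pickReadyProvider(providers, now = Date.now()) {
  const ready = providers.find((p) => !p.blockedUntil || p.blockedUntil <= now);
  if (ready) return { ok: true, provider: ready.name };
  return {
    ok: false,
    error: `all providers unavailable: ${providers.map((p) => p.name).join(', ')}`,
  };
}
```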
2026-05-17 00:55:56 +02:00
Mikael Hugo
41276a7b7a feat(auto/loop): mid-loop self-feedback inline-fix dispatcher
Bootstrap drains the triage queue once at session_start (headless.ts:
647 "[headless] autonomous: draining self-feedback triage queue
first..."). Entries filed DURING the autonomous run previously sat
until the next sf restart — defeating the self-heal thesis for
long-running sessions like the 3-day dogfood the user is running now.

dispatchSelfFeedbackInlineFixIfNeeded already exists in the extension
(self-feedback-drain.js:277) and is wired into bootstrap/register-
hooks at session_start. It selects high/critical candidates, debounces
via a claim file (so concurrent invocations skip), and on the headless
surface spawns a child `sf headless triage --apply` fire-and-forget —
the autonomous loop continues unblocked while triage runs in a child.

Hook it into the auto-loop top-of-iteration so it fires every
MID_LOOP_TRIAGE_INTERVAL=5 iterations. The dispatcher's own claim-file
debounce prevents re-dispatch of in-flight entries; pre-bootstrap-
drained entries get re-evaluated only when something new shows up.

Also ignores scripts/tmp-check-test-imports in biome — the check-
test-imports.test.mjs self-test creates regression fixtures there and
they triggered formatter errors on dirty exits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 00:50:03 +02:00
Mikael Hugo
e8bbb477e6 fix(scripts/check-test-imports): filter TS keywords + local declarations
The check-test-imports drift guard was emitting too many false positives
to be safely integrated into npm run lint (per CLAUDE.md: "NOT integrated
into npm run lint by default — too broad"). Two big classes of FP:

1) TypeScript keywords + utility types treated as undeclared (any, type,
   ReturnType, Partial, Record, never, unknown, etc.) — added to the
   JS_KEYWORDS set since the script doesn't otherwise distinguish JS
   from TS.

2) Identifiers declared locally in the file (function declarations,
   const/let/var declarations, destructured patterns, function/arrow
   parameters, catch params, class names, type/interface/enum names) —
   added a new collectLocalDeclarations() pass that regex-scans these
   patterns and feeds the results into the filter chain.
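The declaration scan in (2) can be approximated like this (patterns heavily simplified relative to the real script, which also covers destructuring, arrow parameters, and type/interface/enum names):

```javascript
// Simplified sketch of the collectLocalDeclarations() pass: regex-scan
// a source string for locally declared identifiers. The real script
// covers more declaration forms than these four patterns.
function collectLocalDeclarationsSketch(src) {
  const names = new Set();
  const patterns = [
    /\bfunction\s+([A-Za-z_$][\w$]*)/g,          // function declarations
    /\b(?:const|let|var)\s+([A-Za-z_$][\w$]*)/g, // simple bindings
    /\bclass\s+([A-Za-z_$][\w$]*)/g,             // class names
    /\bcatch\s*\(\s*([A-Za-z_$][\w$]*)/g,        // catch parameters
  ];
  for (const re of patterns) {
    for (const match of src.matchAll(re)) names.add(match[1]);
  }
  return names;
}
```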

After this patch the script no longer flags makeMockTUI / loader / tui
(local lets), `ReturnType<...>` (TS utility), or `any` (TS keyword) on
the canonical TUI test files. It still flags type-only imports
(`import type { Foo }` lines) and object-literal property names
(`{ recursive: true }`) — those remain as known FP classes documented
in the file's header for a future TS-parser-based pass.

Self-test 5/5 passes. Not yet integrating into npm run lint pending
further FP reduction; see filed self-feedback for the broader
integration plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 00:46:10 +02:00
Mikael Hugo
56e8ec6c53 fix(auto/loop): convergence guard breaks the reassess-roadmap redispatch loop
Dogfood today: autonomous mode burned $4.95 / 33.5M tokens / 28 min /
500 unproductive iterations on reassess-roadmap M006/S01 redispatching
the SAME unit ≥45 consecutive times before runaway-guard finally
fired. Each cycle: unit dispatches → swarm planner completes → unit
exits "success" → next iteration sees the same doctor slice-ref
health issue → re-queues the same unit. The auto-post-unit
auto-remediate path (insertArtifact for ASSESSMENT files) is wired
correctly but the reassess-roadmap unit's success doesn't actually
resolve the doctor's slice-reference issues — so the gate keeps
firing.

SF already has detectStuck Rule 2 ("Same unit 3+ consecutive times →
stuck") in auto/detect-stuck.js, but the doctor-health-reassess-
roadmap shortcut in auto/loop.js:1095-1170 bypasses normal pre-dispatch
and unshifts directly to sidecarQueue — so the unit never goes through
the phases-dispatch path that pushes to loopState.recentUnits, and
detectStuck never sees the repetition.

Convergence guard: before unshifting reassess-roadmap, check whether
the SAME (unitType + unitId) just ran 3+ consecutive times in
loopState.recentUnits. If yes:
  - Skip the redispatch (don't unshift, don't finishTurn("retry"))
  - File a self-feedback entry kind=engine-loop:non-converging-
    redispatch so triage sees the pattern and can plan a real fix
  - Fall through to normal runPreDispatch so the existing detectStuck
    machinery can break the loop the next time the same key derives.
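The guard's repetition check can be sketched as follows (field names are illustrative; recentUnits is assumed newest-last):

```javascript
// Sketch of the convergence check: did the same (unitType + unitId) key
// just run n+ consecutive times at the tail of recentUnits (newest
// last)? Field names are illustrative, not the real loopState shape.
function repeatedConsecutively(recentUnits, unitType, unitId, n = 3) {
  const key = `${unitType}:${unitId}`;
  const tail = recentUnits.slice(-n);
  return tail.length === n && tail.every((u) => `${u.type}:${u.id}` === key);
}
```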

This is the user's "Ralph Wiggum loop" pattern — system observing its
own failure repeatedly without ever escaping. The broader convergence-
detector / solver-handoff / quarantine framework is filed for slice
planning in sf-mp8x32sy-70w298; this commit is the minimum surgical
fix for the specific reassess-roadmap-via-doctor-shortcut path that
actually fired today.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 00:31:23 +02:00
Mikael Hugo
6481e54fec fix(web/inspect): read live .sf/sf.db SQLite instead of obsolete sf-db.json
The earlier collectInspectData read .sf/sf-db.json, a JSON projection
file SF stopped generating after the DB-first runtime landed.
.sf/sf-db.json no longer exists in any modern repo (verified absent
in this checkout), so /api/inspect was returning an empty payload
every time.

Replace with a read-only node:sqlite query against the live database:
  - schemaVersion via MAX(version) FROM schema_version
  - counts from COUNT(*) FROM {decisions,requirements,artifacts}
  - recentDecisions ordered by decisions.seq DESC LIMIT 5
  - recentRequirements ordered by requirements.id DESC LIMIT 5

The DB is opened readOnly so the autonomous loop's writer lock isn't
contested, and any failure (corrupt / locked / schema-drift) returns
an empty payload instead of 500-ing so the operator endpoint stays
available.
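The degrade-to-empty behaviour can be sketched like this (a sketch following the description above, not the actual collectInspectData; the require sits inside the try so a Node build without node:sqlite also degrades to an empty payload):

```javascript
// Degrade-to-empty sketch of the read-only inspect query. Table and
// column names follow the commit description; this is not the real
// collectInspectData. require() inside the try means a Node without
// node:sqlite also degrades instead of throwing.
function collectInspectDataSketch(dbPath) {
  try {
    const { DatabaseSync } = require('node:sqlite');
    const db = new DatabaseSync(dbPath, { readOnly: true }); // no writer contention
    try {
      return {
        schemaVersion:
          db.prepare('SELECT MAX(version) AS v FROM schema_version').get()?.v ?? null,
        decisionCount: db.prepare('SELECT COUNT(*) AS n FROM decisions').get().n,
      };
    } finally {
      db.close();
    }
  } catch {
    return {}; // corrupt / locked / schema drift -> empty, never a 500
  }
}
```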

This is the small surgical half of the broader web-sf-information-
drift gap: web has no API surfaces for self-feedback, memories,
reflection reports, or uok_messages bus state. That broader integration
work is filed as a separate self-feedback entry for slice planning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 00:18:46 +02:00
Mikael Hugo
bde55dfc87 feat(subagent): default subagent dispatch to swarm in code, not just wrapper
The bash wrapper bin/sf-from-source exports SF_SUBAGENT_VIA_SWARM=1
to make the swarm/messagebus path the default for subagent dispatch.
That covers every sf launch via the wrapper but does NOT cover the
web-launched sf — src/web/cli-entry.ts:resolveSfCliEntry spawns sf by
calling process.execPath (node) directly with src/loader.ts or
dist/loader.js, bypassing the wrapper entirely. So /tmp/sf-web-
onboarding-runtime-* sf processes were still falling through to the
direct-runSubagent subprocess path.

Flip the default in code instead: swarm runs unless
SF_SUBAGENT_VIA_SWARM is explicitly set to "0" or "false". Now every
sf launch — wrapper, web, dev-cli, packaged-standalone — picks up the
same default. The wrapper's export line is now redundant but harmless;
keeping it as defense-in-depth (documents the intent at the wrapper
layer too).
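The flipped default parses like this (a sketch of the rule, not the actual source):

```javascript
// The new in-code default: swarm runs unless the flag is explicitly
// "0" or "false". Sketch of the parse, not the actual source.
function subagentViaSwarm(env = {}) {
  const v = String(env.SF_SUBAGENT_VIA_SWARM ?? '').toLowerCase();
  return v !== '0' && v !== 'false';
}
```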

Test update: subagent-via-swarm.test.mjs's "unset → subprocess"
assertion is updated to "=0 → subprocess" — the unset case now means
swarm-by-default. All 13 tests in that file pass. The other tests in
the file that explicitly set the flag to "1"/"true" are unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 00:12:52 +02:00
Mikael Hugo
9a84d82cdb chore(release): 2.75.3 → 2.75.4 + workspace dependency refresh
Bumps version across the workspace (root + 10 @singularity-forge/*
packages) and lands the pending dependency refresh that had been
sitting uncommitted:

  @anthropic-ai/sdk         0.95.1 → 0.96.0
  @anthropic-ai/vertex-sdk  0.14.4 → 0.16.0
  @google/genai             2.0    → 2.3
  @logtape/{file,logtape,pretty,redaction}  2.0.7 → 2.0.9
  @smithy/node-http-handler 4.7.0  → 4.7.3
  @clack/prompts            1.3    → 1.4
  @types/mime-types         2.1    → 3.0

Inter-package refs in packages/{daemon,ai}/package.json bumped to
^2.75.4 so the workspace stays self-consistent. package-lock.json
regenerated via `npm install --package-lock-only --legacy-peer-deps`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:59:14 +02:00
Mikael Hugo
f55d490e1d fix(subagent-runner): drop spurious 10s STUCK warning on session.prompt
The phaseWatchdog at 10s fired "STUCK phase=session.prompt" on every
healthy LLM call longer than 10 seconds. Verified via strace on the
running dogfood sf: bytes were actively flowing on the TLS socket
(fd 29) to the LLM provider while STUCK was being logged — the
session.prompt was never actually stuck; the watchdog was
diagnostic-only and oblivious to stream activity.

The noOutputTimeoutMs watchdog (set to 60s for triage in commit
d80060fec) is the actual kill mechanism. It is already event-aware:
every meaningful subagent event resets the timer via armNoOutputTimer
+ isMeaningfulSubagentOutputEvent. The 10s STUCK warning was added
in commit 67e5ac9db as investigation infrastructure for the
sf-mp8e02m1-zpk903 family of bugs, but now it is just noise that
makes legitimate 30-200s LLM responses look broken.

Keeps the 10s STUCK watchdog for the three setup phases
(resourceLoader.reload, createAgentSession, bindExtensions) where
10s of silence is a real hang signal — those phases normally complete
in well under a second.

Also includes:
- biome.json: bump $schema URL from 2.4.14 to 2.4.15 to match the
  current biome CLI (clears the deserialize warning)
- scripts/check-test-imports.{,test.}mjs: format + drop a useless
  regex escape that biome flagged in landed code

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:49:43 +02:00
Mikael Hugo
c09cad1cf0 chore(sf-from-source): default subagent dispatch to swarm/messagebus path
Sets SF_SUBAGENT_VIA_SWARM=1 by default in the wrapper so all sf
launches route subagent calls through runSingleAgentViaSwarm (uok
message-bus / uok_messages table) instead of spawning a child sf
process via runSubagent. Operators can opt out with
SF_SUBAGENT_VIA_SWARM=0 (or =false) in env.

Leaves the runSingleAgent code default (opt-in) unchanged so the
existing tests/subagent-via-swarm.test.mjs "unset → subprocess"
assertion keeps holding. The flip lives at the wrapper layer where
every interactive/headless sf launch picks it up but tests and
direct dev-cli launches stay on documented opt-in semantics.

Note: this is Layer 1 of the inline-execution path. Layer 2 (full
in-process unit dispatch via runUnitInline) is tracked separately
in REQUIREMENTS.md R013/R014 and is not addressed here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:27:29 +02:00
Mikael Hugo
2eaea85020 fix(cli): add test-import-drift lint guard
AC1: Document convention in CLAUDE.md — test files over-importing (>5)
from a SF module should use namespace imports to avoid the anti-pattern
where a new describe() block uses an undeclared function (ReferenceError
at vitest run-time, not caught by biome lint).

AC3/AC4: add check-test-imports.mjs — static analysis script that scans
all *.test.{js,mjs,ts} files for itemized imports (≥6) + camelCase
identifier not in the import list. Exposes the failure mode at lint time.
Includes regression test (check-test-imports.test.mjs, 5/5 passing).

Closes sf-mp8ujgry-aoqcx0.
2026-05-16 23:26:38 +02:00
Mikael Hugo
950e345085 refactor(tests): use namespace import for auto-prompts to avoid stale exports 2026-05-16 23:18:16 +02:00
Mikael Hugo
4f460b54a3 feat(tests): add 6 new builder tests for M006/S01/T01 R009 ordering coverage
Extend R009 builder ordering safety tests to 6 builders:
- buildPlanSlicePrompt: verifies inlined context and roadmap
- buildRefineSlicePrompt: verifies inlined context and slice-context
- buildExecuteTaskPrompt: verifies task plan inlining and templates
- buildReactiveExecutePrompt: verifies ready task list and templates
- buildCompleteMilestonePrompt: verifies inlined context and roadmap
- buildGateEvaluatePrompt: verifies slice plan context and gates

Note: buildWorkflowPreferencesPrompt and buildReactiveExecutePrompt do not use
{{inlinedContext}} — they use {{inlinedTemplates}} or bespoke template wiring.
Tests assert on the actual template markers these builders produce.
2026-05-16 23:16:51 +02:00
Mikael Hugo
96425e19dc style(biome): apply biome format to skipped-slice-render and reassess-roadmap
Format-only normalization of files landed in 7d57115a6 — multi-line
object literals and import groupings to match the project's biome
config. No semantic changes (test still passes 4/4).

Also reformats auto-prompts.js whitespace touched by the same pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:10:08 +02:00
Mikael Hugo
7d57115a68 fix: render skipped slices distinctly; accurate error messages in reassess_roadmap
Skipped slices now render with *(skipped)* annotation in ROADMAP.md
generated via renderRoadmapFromDb. renderRoadmapCheckboxes now uses
isClosedStatus (covers complete/done/skipped) instead of the narrow
=== 'complete' check.

reassess_roadmap guard error messages now distinguish 'skipped' from
'completed' instead of conflating both under 'cannot modify completed
slice'. The structural enforcement logic (no touch for closed slices)
is unchanged — this is an accuracy fix for error messages and render
behaviour, not a policy change.

Tests added in skipped-slice-render.test.mjs covering:
- renderRoadmapCheckboxes sets [x] for skipped slices
- renderRoadmapCheckboxes unchecks slice that was marked complete but is now pending
- reassess_roadmap error message uses 'skipped' not 'completed' for skipped slices

Refs: sf-mp8p1h0k-b0dcja
2026-05-16 21:57:48 +02:00
Mikael Hugo
767c499d9a fix(markdown-renderer): collapse 3+ consecutive newlines in slice plan
Task descriptions in slice plans sometimes contained double-blanks
(model emits multi-paragraph content with its own paragraph padding,
which survives normalizeMarkdownBlockSpacing's heading-only padding
logic). The double blanks tripped MD012/no-multiple-blanks in
pre-execution checks and blocked the autonomous loop at the
execute-task phase.

Live observation today: SF iter2 completed research-slice and
plan-slice for M006/S01 cleanly, then pre-execution checks failed on
the generated S01-PLAN.md with two MD012 violations at lines 99-100
and 126-127 (both inside task description paragraphs). SF paused
"Autonomous mode paused (Escape)" awaiting user — autonomous loop
stalled.

auto_fix_check_failures: true in prefs should have handled this but
doesn't run for files under .sf/milestones/ (separate bug worth
filing). Fix at source: collapse runs of 3+ newlines to 2 in the
final rendered slice plan. Surgical, no semantic change, defensive
against future model-quirks too.
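The fix is essentially one expression, sketched here for clarity (the function name is illustrative):

```javascript
// The surgical fix in one expression: collapse any run of 3+ newlines
// down to exactly two (a single blank line), which is what
// MD012/no-multiple-blanks permits. Name is illustrative.
const collapseBlankRuns = (markdown) => markdown.replace(/\n{3,}/g, '\n\n');
```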

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:40:52 +02:00
Mikael Hugo
6f4728bdbd chore(lint): clear remaining biome warnings (dead code + signature contracts)
Drop superseded dead code surfaced by biome (knowledgeAbsPath, the
documentation-only SUSPECT_RESOLUTION_KINDS / SELF_FEEDBACK_RECORD_ENTRY
constants, the legacy appendResolutionToJsonl writer that the
regenerate-from-DB flow replaced, OLD_BENCHMARK_KEY_ALIASES which was
never iterated), prefix intentionally-unused params on stub/contract
signatures with _, drop unused locals in tests, and add the missing
backupContent1 ≠ sentinel sanity assertion in the model-learner
overwrite-protection test (without it the second assertion was
vacuously true if the first ctor never wrote anything). Also re-indent
the misformatted assist block in biome.json.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:36:13 +02:00
Mikael Hugo
365c6bbc3b chore: formatter / linter touch-up (230 files)
Pure formatting / lint-fix pass that ran during `npm run build:core`
in the session that landed the agent-runner / quota / coverage /
phase-2 routing work. No logic changes — indentation, trailing
commas, import sort, etc. Captured separately so the actual feature
commits stay scoped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:19:53 +02:00
Mikael Hugo
d80060fec5 fix(headless-triage): 60s no-output watchdog to cap session.prompt hang
When session.prompt() hits the deadlock seam sf-mp8e02m1-zpk903 (Promise
never resolves pre-LLM-dispatch, 0 syscall activity, blocks until outer
abort), the previous triage call had noOutputTimeoutMs=0 — meaning no
fast-fail path. The full 8-minute timeoutMs would burn before the
parent abort fired, wasting 8 minutes of subscription window per stuck
triage attempt.

This adds a 60s no-output watchdog: if no meaningful subagent event
fires for 60s, abort the prompt. Combined with the diagnostic logs in
subagent-runner.ts (commit 67e5ac9db) the operator gets:

  [subagent:triage-decider] phase=session.prompt-entered ...
  [subagent:triage-decider] STUCK phase=session.prompt 10001ms ...
  [forge] triage] apply blocked: triage-decider produced no output for 60000ms
                                                                       ↑ 60s, not 480s

Triage failure stays non-fatal (per the existing handleTriage error
catch in headless.ts:auto-triage path) — the autonomous loop continues
to its main milestone dispatch. Net effect: SF moves forward 8× faster
when the triage deadlock fires.

Doesn't fix the underlying Promise deadlock (still tracked in
sf-mp8e02m1-zpk903 and the new sf-mpmpXXX-... follow-up). This is a
"unblock the autonomous loop now, fix the deadlock later" patch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 20:58:07 +02:00
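The watchdog pattern this commit describes can be sketched as follows. This is an illustrative reconstruction, not the actual headless-triage code: `withNoOutputWatchdog`, `promptFn`, and `onEvent` are hypothetical names standing in for the real session.prompt wiring.

```javascript
// Sketch of a no-output watchdog: abort a long-running prompt if no
// meaningful event arrives within noOutputTimeoutMs. Every event
// pushes the deadline forward; completion disarms the timer.
function withNoOutputWatchdog(promptFn, { noOutputTimeoutMs = 60_000 } = {}) {
  const controller = new AbortController();
  let timer;
  const arm = () => {
    clearTimeout(timer);
    timer = setTimeout(
      () => controller.abort(new Error(`no output for ${noOutputTimeoutMs}ms`)),
      noOutputTimeoutMs,
    );
  };
  arm();
  // The subagent event stream would call onEvent() on each meaningful event.
  const onEvent = () => arm();
  const done = Promise.resolve(promptFn(controller.signal)).finally(() =>
    clearTimeout(timer),
  );
  return { done, onEvent, signal: controller.signal };
}
```

The key property is that a healthy-but-slow prompt survives (events keep resetting the deadline) while a fully stalled one fails in 60s instead of burning the outer 480s timeout.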
Mikael Hugo
67e5ac9db1 diag(subagent-runner): per-phase timing + stuck-watchdog for sf-mp8e02m1-zpk903
Adds visible diagnostics to runSubagent so the next time the
"session initialized but no LLM call" bug fires, the log identifies
which setup phase hangs.

Phases instrumented:
  - resourceLoader.reload()
  - createAgentSession()
  - bindExtensions(runLifecycle=...)
  - session.prompt() entry → return

Output format (stderr, prefixed with [subagent:<name>]):
  phase=resourceLoader.reload 23ms
  phase=createAgentSession 142ms
  phase=bindExtensions 89ms runLifecycle=true
  phase=session.prompt-entered taskLen=8421 timeoutMs=480000 noOutputMs=180000
  phase=session.prompt-returned 16234ms          ← normal completion
  STUCK phase=<X> 10000ms (no completion signal ...)   ← when watchdog fires

Each phase has a soft 10s watchdog that emits a STUCK line if the
await doesn't complete in time. The watchdog never aborts — just
surfaces visibility. Existing timeoutMs / noOutputTimeoutMs handle
actual termination.

This is investigation infrastructure for the third prompt-never-sent
seam (coding-agent/subagent-runner). The agent-runner.js seam
(sf-mp8g4rcd-w01tkh) was fixed in commit 8ee4d8358 with bounded
retries. This commit doesn't fix the underlying bug — it makes the
bug self-reporting next time it fires so operator and autonomous
loop both get actionable signal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 20:40:17 +02:00
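The soft-watchdog idea above (log, never abort) can be sketched as a generic phase wrapper. Names here are assumptions; the real instrumentation lives in subagent-runner.ts.

```javascript
// Sketch of per-phase timing with a soft stuck-watchdog: if the awaited
// phase exceeds stuckAfterMs, emit a STUCK line but keep waiting.
// Termination is left to the existing timeoutMs / noOutputTimeoutMs.
async function timedPhase(name, fn, { stuckAfterMs = 10_000, log = console.error } = {}) {
  const start = Date.now();
  const watchdog = setTimeout(() => {
    log(`[subagent] STUCK phase=${name} ${Date.now() - start}ms (no completion signal)`);
  }, stuckAfterMs);
  try {
    const result = await fn();
    log(`[subagent] phase=${name} ${Date.now() - start}ms`);
    return result;
  } finally {
    clearTimeout(watchdog);
  }
}
```

Usage would be `await timedPhase('createAgentSession', () => createAgentSession(...))` for each setup phase, so a hang self-identifies in the log.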
Mikael Hugo
8ee4d83581 fix(agent-runner): retry inbox refresh + throw loud on missing message
Closes sf-mp8g4rcd-w01tkh (FINAL prompt-never-sent root cause) — the
agent-runner.js:182 silent early-return that has been causing 59+
runaway-loop:idle-halt feedback entries and the recurring "Autonomous
loop stuck — no heartbeat" cascade.

Root cause: when swarm-dispatch's bus delivers a message and SF
kernel marks the unit as dispatched, the consumer agent's inbox
sometimes doesn't see the message immediately (different MessageBus
instance, SQLite read-cache lag). Previous code returned
{turnsProcessed:0, response:null} silently — caller (swarm-dispatch
dispatchAndWait) swallowed it as "no work" — LLM never ran — unit
appeared cancelled with no diagnostic.

Fix: bounded retry on missing-message with exponential backoff:
50, 100, 200, 400, 800 ms (1.55s total max). If target message
appears during retry → log recovery event, proceed normally. If still
missing after the last retry → throw a loud error with full inbox
state in the message. The caller wraps in try/catch and surfaces it
as turnResult.error, so the autonomous loop sees a real failure
instead of phantom forward progress.

What this resolves:
- Earlier today: `sf headless triage --apply` timed out at 480000ms
  because triage-decider subagent hit this bug. With retries, the
  triage-decider has 1.55s of latency tolerance to receive its prompt.
- The 59 backlogged runaway-loop:idle-halt entries are symptoms of
  the same root cause. Future occurrences will surface as loud errors,
  not phantom "stuck" units — operator/auto-supervisor can react.

Validated:
- 578 tests pass (49 files) including agent-runner / swarm-dispatch /
  inbox tests.
- runAgentTurn callers (auto/loop.js, agent-swarm.js, swarm-dispatch
  dispatchAndWait) all already handle thrown errors via try/catch
  with explicit error surfacing — the contract change is safe.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 20:30:58 +02:00
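The bounded-retry contract described above can be sketched like this. It is a minimal reconstruction under stated assumptions: `lookup` stands in for the inbox re-read, `describeInbox` for the diagnostic dump, and the backoff schedule is injectable for testability (mirroring the commit's own optional-dependency style).

```javascript
// Sketch of bounded retry with exponential backoff for a message that
// was dispatched but may not be visible in the consumer inbox yet.
const BACKOFF_MS = [50, 100, 200, 400, 800]; // 1.55s total max

async function fetchMessageWithRetry(lookup, describeInbox, backoffMs = BACKOFF_MS) {
  let msg = lookup();
  if (msg) return msg;
  for (const delayMs of backoffMs) {
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    msg = lookup();
    if (msg) return msg; // recovered during the retry window
  }
  // Loud failure instead of the old silent {turnsProcessed: 0} return:
  // the caller surfaces this as a real error, not phantom progress.
  throw new Error(
    `message never arrived after ${backoffMs.length} retries; inbox: ${describeInbox()}`,
  );
}
```

The contract change is the throw: a missing message becomes a visible failure the autonomous loop can react to, rather than a unit that silently "completes" with no work done.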
Mikael Hugo
7e882b56d0 fix(engine): auto-remediate reassess-roadmap ASSESSMENT artifact (breaks loop)
The rogue-write detector in auto-post-unit.js:detectRogueFileWrites
checks for an `artifacts` table row with artifact_type='ASSESSMENT'
after a reassess-roadmap unit writes the assessment file. Other unit
types (execute-task, complete-slice) had auto-remediation paths that
sync the DB to the filesystem when state is stale. reassess-roadmap
did not.

Effect: the reassess_roadmap MCP tool writes the assessment file but
nothing registers it in the artifacts table. EVERY successful
iteration gets flagged rogue post-hoc; SF re-dispatches the same
unit; same thing happens; infinite loop until --timeout SIGTERM.

Empirically observed today (filed as sf-mpmp8min68-yoy2pa):
  Run 1: success $0.012, 16709 tokens → rogue → redispatch
  Run 2: success $0.017, 18925 tokens → rogue → redispatch
  Run 3: started → SIGTERM at --timeout 480000ms

Each iteration is real work product (real assessment content,
verdict: roadmap-confirmed) — the model is doing its job correctly,
the engine just doesn't recognize completion.

Fix: when assessment file exists on disk and artifacts row is
missing, INSERT into artifacts table via insertArtifact (parallel to
updateTaskStatus / updateSliceStatus auto-remediate in the same
function). Falls back to flagging rogue only if the insert fails.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 19:35:44 +02:00
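The remediation decision above reduces to a small reconcile step. This is a hedged sketch, not the real detectRogueFileWrites code: the three accessors are stand-ins for the engine's filesystem check, artifacts query, and insertArtifact helper.

```javascript
// Sketch: sync DB to filesystem for reassess-roadmap. If the assessment
// file exists but no ASSESSMENT artifacts row does, insert the row and
// report remediation; flag rogue only if the insert itself fails.
function remediateAssessment({ fileExists, findArtifact, insertArtifact }) {
  if (!fileExists()) return { verdict: 'rogue', reason: 'no assessment file on disk' };
  if (findArtifact('ASSESSMENT')) return { verdict: 'ok' }; // state already consistent
  try {
    insertArtifact({ artifact_type: 'ASSESSMENT' });
    return { verdict: 'remediated' };
  } catch (err) {
    return { verdict: 'rogue', reason: `insert failed: ${err.message}` };
  }
}
```

This is the same shape as the existing updateTaskStatus / updateSliceStatus auto-remediate paths: trust the filesystem as ground truth and repair the stale DB row rather than re-dispatching real work.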
Mikael Hugo
7217a24c28 fix(quota/openrouter): suppress usedFraction since SF policy is :free-only
OpenRouter's credit-balance (total_usage / total_credits) was being
used as a quota signal in phase 2's quotaHeadroomMultiplier, demoting
openrouter once credits got high (e.g., 80% used → 0.5 multiplier).

But SF's built-in policy (preferences-models.js:123-131
isModelAllowedByBuiltInProviderPolicy) hard-restricts every OpenRouter
route to `:free` + zero-cost models for ALL SF users — there's no
opt-in, no way to bypass it. Therefore SF dispatches NEVER consume
OpenRouter credits, and the credit balance is purely historical noise.

Fix: stop emitting `usedFraction` for OpenRouter's credit window. The
window is still reported (so `sf headless usage` shows credits state
for awareness) but quotaHeadroomMultiplier now treats OpenRouter as
"no quota signal" → neutral 1.0 — no spurious demotion.

Affects only the routing layer (selector). Display layer unchanged
beyond the label tweak ("info only — SF routes :free").

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 19:26:16 +02:00
Mikael Hugo
f7cd01df0a feat(cli): sf --maintain drains self-feedback triage queue too
Extends the --maintain command (catalog refresh + quota refresh +
coverage audit) to also drain the self-feedback triage queue with
max=10 candidates per invocation. Combined with the daemon's 6h
maintenance timer that spawns `sf --maintain` in every configured
repo, this gives unattended cross-repo triage:

  Repo                       What gets triaged
  ────────────────────────── ─────────────────────────────────
  ~/code/singularity-forge   SF's own backlog (prompt-never-sent,
                             architecture defects, the 3
                             enhancement entries from today)
  ~/code/dr-repo             dr-repo's backlog (M005 flow
                             failures, agent friction, etc.)
  ~/code/centralcloud/*      whatever each subproject accrues

Both --maintain and `headless autonomous` use process.cwd() so they
target the right repo automatically. Interactive mode (plain `sf`)
deliberately does NOT auto-triage — that would spawn subagents while
the user is working in the same session, risking lock contention.

Triage failures stay non-fatal: catalog/quota/coverage work still
completes even if triage subagent dispatch hits the prompt-never-sent
bug.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 19:04:03 +02:00
Mikael Hugo
5a57549591 feat(headless): autonomous mode auto-drains self-feedback triage queue first
Before this change, `sf headless autonomous` only dispatched units for
the active milestone — never touched .sf/self-feedback.jsonl. The
existing `sf headless triage --apply` was a manual operator path
required for self-feedback to become actionable work. That defeats the
"SF self-heals" thesis: 146 entries can sit in the queue indefinitely
while the autonomous loop happily cranks on M005.

Now: at autonomous startup (not on resume, not on initial bootstrap)
SF calls handleTriage({ apply: true, max: 5 }) to drain the top-5
candidates from the triage queue before entering the dispatch loop.
The max=5 cap keeps the upfront cost bounded; remaining items
process on the next session_start.

The comment on the existing triage handler in headless.ts:917-921
explicitly acknowledged the gap — autonomous-loop followUp delivery
was broken (sf-mp4rxkwb-l4baga). Wiring the deterministic triage
path BEFORE the dispatch loop closes that gap.

Opt-out: pass --skip-triage on the autonomous command (e.g. when
debugging a specific milestone without backlog churn).

Triage failures are non-fatal — they log a warning and the
autonomous loop continues with its existing milestone dispatch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 19:01:50 +02:00
Mikael Hugo
8d0f41436b feat(benchmark-selector): phase 2 — quota-aware routing weight
Bias dispatch toward under-used subscriptions ("spend the subs") and
de-prioritize near-exhausted ones (avoid 429 walls). Multiplier is
applied to the benchmark score before sort, so it only re-orders
within the existing score → cost → coverage → preference ladder.
Unknown quota state stays neutral 1.0 — never punish a provider for
having no public quota API.

Curve, keyed on max(usedFraction) across all windows:
  < 0.20 → 1.15  (boost — lots of headroom, prefer to use it)
  < 0.50 → 1.00  (neutral)
  < 0.70 → 0.92  (slight steer away)
  < 0.90 → 0.50  (strong de-prioritize)
  < 0.95 → 0.20  (near-exhaustion)
  ≥ 0.95 → 0.05  (effectively skip)

Max-across-windows means kimi-coding's 5h-rolling window (tighter)
binds the decision even when the weekly is fresh.

New exported helper quotaHeadroomMultiplier(providerKey, getQuotaState?)
takes the resolver as optional dep for testability; defaults to
getProviderQuotaState from provider-quota-cache.js.

16 new tests cover the curve and the selectByBenchmarks integration
(unknown quota → unchanged, demoted high-usage provider, boosted
under-used provider, near-exhausted skipped when alternatives exist).

Previously filed as SF backlog item sf-mpmp8ie6xf-z4cxhg; this commit
closes that loop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 18:09:33 +02:00
Mikael Hugo
b39cf3387e fix(quota/zai): use raw key auth (no Bearer) + correct response shape
Cross-referenced vbgate/opencode-mystatus reference implementation
and found two real bugs in the zai fetcher:

1. Auth header: zai's monitor endpoint expects `Authorization: <key>`
   with NO `Bearer ` prefix. Using Bearer caused the server to treat
   the call as unauthenticated and return the generic "no coding
   plan" response even for active coding-plan users.

2. Response shape: real envelope is
     { code, msg, success, data: { limits: [
       { type: "TOKENS_LIMIT"|"TIME_LIMIT", usage, currentValue,
         percentage, nextResetTime? } ] } }
   Was looking for `data: [...]` directly and using `limit`/`used`
   fields. Now parses `data.data.limits[].usage` / `.currentValue`.

3. Added User-Agent header to match the reference tool.

Live probe finding: this user's z.ai key works fine for inference
(/api/coding/paas/v4/models returns 200 with the full model list)
but the monitor endpoint reports "no coding plan" — meaning their
account uses the regular pay-as-you-go z.ai/zhipu tier, not the
separately-billed "Coding Plan" subscription that the monitor
endpoint serves. The 429s they observe during inference are
rate-limit RPM/TPM errors, not coding-plan window exhaustion.
Code change is correct; the error message is now accurate and
actionable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 18:04:12 +02:00
Mikael Hugo
8fa9a4b8fa fix(quota): match real API shapes for kimi-coding / minimax / zai
Dogfooded `sf headless usage` against live APIs and discovered three
shape mismatches in the phase-1 fetchers:

- kimi-coding returns numeric fields as STRINGS ("limit": "100") and
  uses camelCase `resetTime`. Added toNum() coercion + reset hint
  extraction. Now reports Weekly + 5h rolling windows correctly.

- minimax response is `{ model_remains: [{ model_name,
  current_interval_total_count, current_interval_usage_count,
  current_weekly_total_count, current_weekly_usage_count, end_time,
  weekly_end_time, ...}] }` — per-model rolling + weekly windows, not
  the flat `remaining_tokens`/`total_tokens` shape I had assumed.
  Rewrote parser to emit one window per model entry.

- zai uses a `{ code, msg, success, data }` envelope. When
  `success: false` (e.g. user lacks an active coding plan), parser
  now surfaces vendor msg as the entry error instead of silently
  emitting no windows.

Tests updated to mirror real shapes; added one for zai's failure
envelope. 12 tests pass (was 11).

Live result from re-running `sf headless usage`:
  - openrouter: 80.7% used, $7.71 remaining (real signal — watch this)
  - kimi-coding: Weekly 32%, 5h 4%
  - minimax: MiniMax-M* 5h 1.4% + coding-plan-vlm/search 1.4%
  - gemini-cli: 0.0-0.4% across all models (clean)
  - zai: surfaces "user does not have a coding plan" — may need a
    different endpoint or scope depending on the user's account setup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 17:59:53 +02:00
Mikael Hugo
c0d089f9ca feat(catalog/quota): global model catalog, benchmark coverage audit, provider quota visibility
Phase-1 work shipped together since prior auto-snapshots split it across
several commits. This commit captures the leftover type declarations,
the new provider-quota-cache test suite, and the last register-hooks /
cli wiring.

Highlights now in tree:

- Model catalog moved from per-project to global `~/.sf/model-catalog/`
  via `sfHome()` (one cache shared by all repos; no more 9-dir
  duplication).

- `benchmark-coverage.js` audits the dispatchable model set against
  `learning/data/model-benchmarks.json` at session_start, writes
  `~/.sf/benchmark-coverage.json`, notifies on change.

- `provider-quota-cache.js` introduces phase-1 subscription quota
  visibility for the 5 providers with documented APIs:
  kimi-coding (/coding/v1/usages), openrouter (/api/v1/credits),
  minimax (/v1/token_plan/remains), zai (/api/monitor/usage/quota/limit),
  google-gemini-cli (existing snapshotGeminiCliAccount). 15-min TTL,
  global cache.

- `sf --maintain` CLI flag refreshes catalogs + quotas + coverage audit
  in one idempotent pass. Daemon spawns it every 6h.

- `sf headless usage` rewritten to display all providers from the
  unified cache, with explicit "no public API" notes for mistral,
  ollama-cloud, opencode, opencode-go, xiaomi.

- Awaitable `runXIfStale` variants for model-catalog, gemini-catalog,
  openai-codex-catalog (the schedule* variants now wrap them in
  setImmediate).

- TypeScript declarations added for the new JS modules so the
  dist-redirect pipeline type-checks cleanly.

Phase 2 (quota-aware routing in benchmark-selector) is filed as SF
self-feedback for the backlog.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 17:37:20 +02:00
Mikael Hugo
effbb75f83 sf snapshot: uncommitted changes after 30m inactivity 2026-05-16 17:30:35 +02:00
Mikael Hugo
b5764af27b sf snapshot: uncommitted changes after 33m inactivity 2026-05-16 17:00:13 +02:00
Mikael Hugo
b73e386090 sf snapshot: uncommitted changes after 56m inactivity 2026-05-16 16:26:40 +02:00
Mikael Hugo
4442400d11 sf snapshot: uncommitted changes after 30m inactivity 2026-05-16 15:29:43 +02:00
Mikael Hugo
da0c41d375 sf snapshot: uncommitted changes after 56m inactivity 2026-05-16 14:59:40 +02:00
Mikael Hugo
6071a9207c sf snapshot: uncommitted changes after 53m inactivity 2026-05-16 14:03:34 +02:00
Mikael Hugo
af2f86c3c2 sf snapshot: uncommitted changes after 827m inactivity 2026-05-16 13:10:15 +02:00
Mikael Hugo
6f32f9287a sf snapshot: pre-dispatch, uncommitted changes after 30m inactivity 2026-05-15 23:22:58 +02:00
Mikael Hugo
e2c3d6542c sf snapshot: uncommitted changes after 97m inactivity 2026-05-15 22:52:58 +02:00
Mikael Hugo
a2a6ab767c fix(auto): replan inactive milestones with DB context 2026-05-15 21:15:04 +02:00
Mikael Hugo
ecf6af92e8 fix(auto): avoid resuming blocked no-unit sessions 2026-05-15 20:56:15 +02:00
Mikael Hugo
03f6d4990f fix(auto): set solver iteration default to 25 2026-05-15 20:35:10 +02:00
Mikael Hugo
8e85a6e673 fix(state): skip cancelled slices during dispatch 2026-05-15 20:24:11 +02:00
Mikael Hugo
62d63f111e fix(auto): bound solver iteration defaults 2026-05-15 20:14:30 +02:00
Mikael Hugo
0b187b9f62 fix(headless): remove legacy v1 fallback path 2026-05-15 20:12:00 +02:00
Mikael Hugo
e2e096c5c7 feat(rpc): configurable RPC init timeout via SF_RPC_INIT_TIMEOUT_MS
Add resolveRpcInitTimeoutMs() helper and wire it into RpcClient.init().
Default init timeout increased from 30s to 120s. Override via env var.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 20:00:26 +02:00
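A plausible shape for the helper this commit adds; treat it as a sketch, since the real resolveRpcInitTimeoutMs may validate differently.

```javascript
// Resolve the RPC init timeout: SF_RPC_INIT_TIMEOUT_MS env override,
// falling back to the new 120s default for anything unset or invalid.
const DEFAULT_RPC_INIT_TIMEOUT_MS = 120_000;

function resolveRpcInitTimeoutMs(env = process.env) {
  const raw = env.SF_RPC_INIT_TIMEOUT_MS;
  const n = Number(raw);
  return raw != null && Number.isFinite(n) && n > 0
    ? n
    : DEFAULT_RPC_INIT_TIMEOUT_MS;
}
```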
Mikael Hugo
ced90e84a8 test(headless): update v2 migration tests for fatal-by-default fallback policy
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-15 20:00:02 +02:00