The bash wrapper bin/sf-from-source exports SF_SUBAGENT_VIA_SWARM=1
to make the swarm/messagebus path the default for subagent dispatch.
That covers every sf launch via the wrapper but does NOT cover the
web-launched sf — src/web/cli-entry.ts:resolveSfCliEntry spawns sf by
calling process.execPath (node) directly with src/loader.ts or
dist/loader.js, bypassing the wrapper entirely. So /tmp/sf-web-
onboarding-runtime-* sf processes were still falling through to the
direct-runSubagent subprocess path.
Flip the default in code instead: swarm runs unless
SF_SUBAGENT_VIA_SWARM is explicitly set to "0" or "false". Now every
sf launch — wrapper, web, dev-cli, packaged-standalone — picks up the
same default. The wrapper's export line is now redundant but harmless;
it stays as defense-in-depth and documents the intent at the wrapper
layer too.
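A minimal sketch of the flipped default, assuming a small helper reads the env (the helper name here is illustrative, not the actual SF source):

```javascript
// Sketch: swarm dispatch is on unless the operator explicitly opts out.
// Only "0" or "false" disables it; unset, empty, "1", "true", or any
// other value keeps the new swarm-by-default behavior.
function subagentViaSwarmEnabled(env = process.env) {
  const raw = (env.SF_SUBAGENT_VIA_SWARM ?? "").trim().toLowerCase();
  return raw !== "0" && raw !== "false";
}

console.log(subagentViaSwarmEnabled({}));                             // true (unset → swarm)
console.log(subagentViaSwarmEnabled({ SF_SUBAGENT_VIA_SWARM: "0" })); // false (opt-out)
```

Because the check lives in code rather than the wrapper, every launch path (wrapper, web, dev-cli, packaged) evaluates the same predicate.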
Test update: subagent-via-swarm.test.mjs's "unset → subprocess"
assertion is updated to "=0 → subprocess" — the unset case now means
swarm-by-default. All 13 tests in that file pass. The other tests in
the file that explicitly set the flag to "1"/"true" are unaffected.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bumps version across the workspace (root + 10 @singularity-forge/*
packages) and lands the pending dependency refresh that had been
sitting uncommitted:
@anthropic-ai/sdk 0.95.1 → 0.96.0
@anthropic-ai/vertex-sdk 0.14.4 → 0.16.0
@google/genai 2.0 → 2.3
@logtape/{file,logtape,pretty,redaction} 2.0.7 → 2.0.9
@smithy/node-http-handler 4.7.0 → 4.7.3
@clack/prompts 1.3 → 1.4
@types/mime-types 2.1 → 3.0
Inter-package refs in packages/{daemon,ai}/package.json bumped to
^2.75.4 so the workspace stays self-consistent. package-lock.json
regenerated via `npm install --package-lock-only --legacy-peer-deps`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The phaseWatchdog at 10s fired "STUCK phase=session.prompt" on every
healthy LLM call longer than 10 seconds. Verified via strace on the
running dogfood sf: bytes were actively flowing on the TLS socket
(fd 29) to the LLM provider while STUCK was being logged. The
session.prompt was never actually stuck; the watchdog was
diagnostic-only and blind to stream activity.
The noOutputTimeoutMs watchdog (set to 60s for triage in commit
d80060fec) is the actual kill mechanism. It is already event-aware:
every meaningful subagent event resets the timer via armNoOutputTimer
+ isMeaningfulSubagentOutputEvent. The 10s STUCK warning was added
in commit 67e5ac9db as investigation infrastructure for the
sf-mp8e02m1-zpk903 family of bugs, but now it is just noise that
makes legitimate 30-200s LLM responses look broken.
Keeps the 10s STUCK watchdog for the three setup phases
(resourceLoader.reload, createAgentSession, bindExtensions) where
10s of silence is a real hang signal; those phases normally
complete in under a second.
Also includes:
- biome.json: bump $schema URL from 2.4.14 to 2.4.15 to match the
current biome CLI (clears the deserialize warning)
- scripts/check-test-imports.{,test.}mjs: format + drop a useless
regex escape that biome flagged in landed code
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sets SF_SUBAGENT_VIA_SWARM=1 by default in the wrapper so all sf
launches route subagent calls through runSingleAgentViaSwarm (uok
message-bus / uok_messages table) instead of spawning a child sf
process via runSubagent. Operators can opt out with
SF_SUBAGENT_VIA_SWARM=0 (or =false) in env.
Leaves the runSingleAgent code default (opt-in) unchanged so the
existing tests/subagent-via-swarm.test.mjs "unset → subprocess"
assertion keeps holding. The flip lives at the wrapper layer where
every interactive/headless sf launch picks it up but tests and
direct dev-cli launches stay on documented opt-in semantics.
Note: this is Layer 1 of the inline-execution path. Layer 2 (full
in-process unit dispatch via runUnitInline) is tracked separately
in REQUIREMENTS.md R013/R014 and is not addressed here.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AC1: Document convention in CLAUDE.md — test files over-importing (>5)
from an SF module should use namespace imports to avoid the anti-pattern
where a new describe() block uses an undeclared function (ReferenceError
at vitest runtime, not caught by biome lint).
AC3/AC4: add check-test-imports.mjs — static analysis script that scans
all *.test.{js,mjs,ts} files for itemized imports (≥6) + camelCase
identifier not in the import list. Exposes the failure mode at lint time.
Includes regression test (check-test-imports.test.mjs, 5/5 passing).
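The detection heuristic can be sketched roughly like this (illustrative only, not the shipped check-test-imports.mjs; a real version would also need to whitelist test globals like describe/it/expect):

```javascript
// Flag a test file that itemizes >5 names from a module and then calls
// a camelCase identifier that appears in none of its import lists.
function findSuspectIdentifiers(source) {
  const imported = new Set();
  let itemizedHeavily = false;
  for (const m of source.matchAll(/import\s*\{([^}]+)\}\s*from/g)) {
    const names = m[1].split(",").map((n) => n.trim().split(/\s+as\s+/).pop());
    if (names.length > 5) itemizedHeavily = true;
    for (const n of names) imported.add(n);
  }
  if (!itemizedHeavily) return [];
  // Strip import statements, then look for bare camelCase call sites.
  const body = source.replace(/import[^;]+;/g, "");
  const suspects = new Set();
  for (const m of body.matchAll(/\b([a-z][A-Za-z0-9]*)\s*\(/g)) {
    if (!imported.has(m[1])) suspects.add(m[1]);
  }
  return [...suspects];
}
```

This catches the failure mode at lint time instead of as a ReferenceError mid-run.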
Closes sf-mp8ujgry-aoqcx0.
Extend R009 builder ordering safety tests to 6 builders:
- buildPlanSlicePrompt: verifies inlined context and roadmap
- buildRefineSlicePrompt: verifies inlined context and slice-context
- buildExecuteTaskPrompt: verifies task plan inlining and templates
- buildReactiveExecutePrompt: verifies ready task list and templates
- buildCompleteMilestonePrompt: verifies inlined context and roadmap
- buildGateEvaluatePrompt: verifies slice plan context and gates
Note: buildWorkflowPreferencesPrompt and buildReactiveExecutePrompt do not use
{{inlinedContext}} — they use {{inlinedTemplates}} or bespoke template wiring.
Tests assert on the actual template markers these builders produce.
Format-only normalization of files landed in 7d57115a6 — multi-line
object literals and import groupings to match the project's biome
config. No semantic changes (test still passes 4/4).
Also reformats auto-prompts.js whitespace touched by the same pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Skipped slices now render with *(skipped)* annotation in ROADMAP.md
generated via renderRoadmapFromDb. renderRoadmapCheckboxes now uses
isClosedStatus (covers complete/done/skipped) instead of the narrow
=== 'complete' check.
reassess_roadmap guard error messages now distinguish 'skipped' from
'completed' instead of conflating both under 'cannot modify completed
slice'. The structural enforcement logic (no touch for closed slices)
is unchanged — this is an accuracy fix for error messages and render
behaviour, not a policy change.
Tests added in skipped-slice-render.test.mjs covering:
- renderRoadmapCheckboxes sets [x] for skipped slices
- renderRoadmapCheckboxes unchecks slice that was marked complete but is now pending
- reassess_roadmap error message uses 'skipped' not 'completed' for skipped slices
Refs: sf-mp8p1h0k-b0dcja
Task descriptions in slice plans sometimes contained runs of blank
lines (the model emits multi-paragraph content with its own paragraph
padding, which survives normalizeMarkdownBlockSpacing's heading-only
padding logic). The extra blanks tripped MD012/no-multiple-blanks in
pre-execution checks and blocked the autonomous loop at the
execute-task phase.
Live observation today: SF iter2 completed research-slice and
plan-slice for M006/S01 cleanly, then pre-execution checks failed on
the generated S01-PLAN.md with two MD012 violations at lines 99-100
and 126-127 (both inside task description paragraphs). SF paused
"Autonomous mode paused (Escape)" awaiting user — autonomous loop
stalled.
auto_fix_check_failures: true in prefs should have handled this but
doesn't run for files under .sf/milestones/ (separate bug worth
filing). Fix at source: collapse runs of 3+ newlines to 2 in the
final rendered slice plan. Surgical, no semantic change, and defensive
against future model quirks.
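The collapse itself can be a one-line normalization over the rendered plan (function name illustrative):

```javascript
// Runs of 3+ newlines (two or more consecutive blank lines) become
// exactly 2 newlines (one blank line), the maximum that
// MD012/no-multiple-blanks allows between blocks by default.
function collapseExtraBlankLines(markdown) {
  return markdown.replace(/\n{3,}/g, "\n\n");
}

console.log(JSON.stringify(collapseExtraBlankLines("a\n\n\n\nb"))); // "a\n\nb"
```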
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop superseded dead code surfaced by biome (knowledgeAbsPath, the
documentation-only SUSPECT_RESOLUTION_KINDS / SELF_FEEDBACK_RECORD_ENTRY
constants, the legacy appendResolutionToJsonl writer that the
regenerate-from-DB flow replaced, OLD_BENCHMARK_KEY_ALIASES which was
never iterated), prefix intentionally-unused params on stub/contract
signatures with _, drop unused locals in tests, and add the missing
backupContent1 ≠ sentinel sanity assertion in the model-learner
overwrite-protection test (without it the second assertion was
vacuously true if the first ctor never wrote anything). Also re-indent
the misformatted assist block in biome.json.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pure formatting / lint-fix pass that ran during `npm run build:core`
in the session that landed the agent-runner / quota / coverage /
phase-2 routing work. No logic changes — indentation, trailing
commas, import sort, etc. Captured separately so the actual feature
commits stay scoped.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When session.prompt() hits the deadlock seam sf-mp8e02m1-zpk903 (Promise
never resolves pre-LLM-dispatch, 0 syscall activity, blocks until outer
abort), the previous triage call had noOutputTimeoutMs=0 — meaning no
fast-fail path. The full 8-minute timeoutMs would burn before the
parent abort fired, wasting 8 minutes of subscription window per stuck
triage attempt.
This adds a 60s no-output watchdog: if no meaningful subagent event
fires for 60s, abort the prompt. Combined with the diagnostic logs in
subagent-runner.ts (commit 67e5ac9db) the operator gets:
[subagent:triage-decider] phase=session.prompt-entered ...
[subagent:triage-decider] STUCK phase=session.prompt 10001ms ...
[forge] triage] apply blocked: triage-decider produced no output for 60000ms
↑ 60s, not 480s
Triage failure stays non-fatal (per the existing handleTriage error
catch in headless.ts:auto-triage path) — the autonomous loop continues
to its main milestone dispatch. Net effect: SF moves forward 8× faster
when the triage deadlock fires.
Doesn't fix the underlying Promise deadlock (still tracked in
sf-mp8e02m1-zpk903 and the new sf-mpmpXXX-... follow-up). This is a
"unblock the autonomous loop now, fix the deadlock later" patch.
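The resettable no-output timer can be sketched like this (armNoOutputTimer is named in the commit; the wiring here is an assumption):

```javascript
// Every meaningful subagent event re-arms the timer; sustained silence
// (default 60s) aborts the prompt via an AbortController.
function armNoOutputTimer(controller, timeoutMs = 60_000) {
  let timer = null;
  const rearm = () => {
    if (timer) clearTimeout(timer);
    timer = setTimeout(() => {
      controller.abort(new Error(`no output for ${timeoutMs}ms`));
    }, timeoutMs);
  };
  rearm(); // start counting from dispatch
  return { rearm, cancel: () => clearTimeout(timer) };
}
```

Each meaningful event calls rearm(), pushing the deadline out; only genuine silence trips the abort, so long healthy streams are unaffected.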
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds visible diagnostics to runSubagent so the next time the
"session initialized but no LLM call" bug fires, the log identifies
which setup phase hangs.
Phases instrumented:
- resourceLoader.reload()
- createAgentSession()
- bindExtensions(runLifecycle=...)
- session.prompt() entry → return
Output format (stderr, prefixed with [subagent:<name>]):
phase=resourceLoader.reload 23ms
phase=createAgentSession 142ms
phase=bindExtensions 89ms runLifecycle=true
phase=session.prompt-entered taskLen=8421 timeoutMs=480000 noOutputMs=180000
phase=session.prompt-returned 16234ms ← normal completion
STUCK phase=<X> 10000ms (no completion signal ...) ← when watchdog fires
Each phase has a soft 10s watchdog that emits a STUCK line if the
await doesn't complete in time. The watchdog never aborts — just
surfaces visibility. Existing timeoutMs / noOutputTimeoutMs handle
actual termination.
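The soft watchdog pattern described above can be sketched as follows (illustrative wiring, not the subagent-runner.ts source):

```javascript
// Wrap each awaited setup phase; if it has not settled after the soft
// deadline, emit a STUCK line to stderr. Never aborts: hard
// termination stays with timeoutMs / noOutputTimeoutMs.
async function timedPhase(name, promise, { stuckAfterMs = 10_000 } = {}) {
  const start = Date.now();
  const watchdog = setTimeout(() => {
    process.stderr.write(
      `[subagent] STUCK phase=${name} ${stuckAfterMs}ms (no completion signal)\n`,
    );
  }, stuckAfterMs);
  try {
    const result = await promise;
    process.stderr.write(`[subagent] phase=${name} ${Date.now() - start}ms\n`);
    return result;
  } finally {
    clearTimeout(watchdog); // cancel the warning once the phase settles
  }
}
```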
This is investigation infrastructure for the third prompt-never-sent
seam (coding-agent/subagent-runner). The agent-runner.js seam
(sf-mp8g4rcd-w01tkh) was fixed in commit 8ee4d8358 with bounded
retries. This commit doesn't fix the underlying bug — it makes the
bug self-reporting next time it fires so operator and autonomous
loop both get actionable signal.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes sf-mp8g4rcd-w01tkh (FINAL prompt-never-sent root cause) — the
agent-runner.js:182 silent early-return that has been causing 59+
runaway-loop:idle-halt feedback entries and the recurring "Autonomous
loop stuck — no heartbeat" cascade.
Root cause: when swarm-dispatch's bus delivers a message and SF
kernel marks the unit as dispatched, the consumer agent's inbox
sometimes doesn't see the message immediately (different MessageBus
instance, SQLite read-cache lag). Previous code returned
{turnsProcessed:0, response:null} silently — caller (swarm-dispatch
dispatchAndWait) swallowed it as "no work" — LLM never ran — unit
appeared cancelled with no diagnostic.
Fix: bounded retry on missing-message with exponential backoff:
50, 100, 200, 400, 800 ms (1.55s total max). If target message
appears during retry → log recovery event, proceed normally. If still
missing after the last retry → throw a loud error with full inbox
state in the message. The caller wraps in try/catch and surfaces it
as turnResult.error, so the autonomous loop sees a real failure
instead of phantom forward progress.
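The bounded retry can be sketched as follows (illustrative; the real logic lives in agent-runner.js with its own inbox lookup):

```javascript
const sleep = (ms) => new Promise((r) => setTimeout(r, ms));

// Retry a possibly-lagging lookup with exponential backoff
// (50, 100, 200, 400, 800 ms, about 1.55s total), then throw loudly
// instead of returning a silent empty turn result.
async function fetchMessageWithRetry(lookup, delaysMs = [50, 100, 200, 400, 800]) {
  let message = await lookup();
  for (const delay of delaysMs) {
    if (message) return message;
    await sleep(delay);
    message = await lookup();
  }
  if (!message) {
    throw new Error("target message still missing after bounded retries");
  }
  return message;
}
```

The caller's existing try/catch then surfaces the throw as turnResult.error, so the loop sees a real failure rather than phantom progress.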
What this resolves:
- Earlier today: `sf headless triage --apply` timed out at 480000ms
because triage-decider subagent hit this bug. With retries, the
triage-decider has 1.55s of latency tolerance to receive its prompt.
- The 59 backlogged runaway-loop:idle-halt entries are symptoms of
the same root cause. Future occurrences will surface as loud errors,
not phantom "stuck" units — operator/auto-supervisor can react.
Validated:
- 578 tests pass (49 files) including agent-runner / swarm-dispatch /
inbox tests.
- runAgentTurn callers (auto/loop.js, agent-swarm.js, swarm-dispatch
dispatchAndWait) all already handle thrown errors via try/catch
with explicit error surfacing — the contract change is safe.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The rogue-write detector in auto-post-unit.js:detectRogueFileWrites
checks for an `artifacts` table row with artifact_type='ASSESSMENT'
after a reassess-roadmap unit writes the assessment file. Other unit
types (execute-task, complete-slice) had auto-remediation paths that
sync the DB to the filesystem when state is stale. reassess-roadmap
did not.
Effect: the reassess_roadmap MCP tool writes the assessment file but
nothing registers it in the artifacts table. EVERY successful
iteration gets flagged rogue post-hoc; SF re-dispatches the same
unit; same thing happens; infinite loop until --timeout SIGTERM.
Empirically observed today (filed as sf-mpmp8min68-yoy2pa):
Run 1: success $0.012, 16709 tokens → rogue → redispatch
Run 2: success $0.017, 18925 tokens → rogue → redispatch
Run 3: started → SIGTERM at --timeout 480000ms
Each iteration is real work product (real assessment content,
verdict: roadmap-confirmed) — the model is doing its job correctly,
the engine just doesn't recognize completion.
Fix: when assessment file exists on disk and artifacts row is
missing, INSERT into artifacts table via insertArtifact (parallel to
updateTaskStatus / updateSliceStatus auto-remediate in the same
function). Falls back to flagging rogue only if the insert fails.
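A sketch of the remediation branch, assuming a synchronous DB handle and an insertArtifact callback (both signatures are assumptions, not the real auto-post-unit.js API):

```javascript
import { existsSync } from "node:fs";

// If the assessment file is on disk but the artifacts row is missing,
// sync the DB instead of flagging rogue; flag rogue only when the
// repair insert itself fails.
function remediateMissingAssessment(db, assessmentPath, insertArtifact) {
  const row = db
    .prepare("SELECT 1 FROM artifacts WHERE artifact_type = 'ASSESSMENT'")
    .get();
  if (row || !existsSync(assessmentPath)) return { rogue: !row };
  try {
    insertArtifact(db, { artifact_type: "ASSESSMENT", path: assessmentPath });
    return { rogue: false, remediated: true };
  } catch {
    return { rogue: true }; // fall back to rogue-flagging on insert failure
  }
}
```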
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
OpenRouter's credit-balance (total_usage / total_credits) was being
used as a quota signal in phase 2's quotaHeadroomMultiplier, demoting
openrouter once credits got high (e.g., 80% used → 0.5 multiplier).
But SF's built-in policy (preferences-models.js:123-131
isModelAllowedByBuiltInProviderPolicy) hard-restricts every OpenRouter
route to `:free` + zero-cost models for ALL SF users — there's no
opt-in, no way to bypass it. Therefore SF dispatches NEVER consume
OpenRouter credits, and the credit balance is purely historical noise.
Fix: stop emitting `usedFraction` for OpenRouter's credit window. The
window is still reported (so `sf headless usage` shows credits state
for awareness) but quotaHeadroomMultiplier now treats OpenRouter as
"no quota signal" → neutral 1.0 — no spurious demotion.
Affects only the routing layer (selector). Display layer unchanged
beyond the label tweak ("info only — SF routes :free").
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extends the --maintain command (catalog refresh + quota refresh +
coverage audit) to also drain the self-feedback triage queue with
max=10 candidates per invocation. Combined with the daemon's 6h
maintenance timer that spawns `sf --maintain` in every configured
repo, this gives unattended cross-repo triage:
Repo                        What gets triaged
──────────────────────────  ─────────────────────────────────
~/code/singularity-forge    SF's own backlog (prompt-never-sent,
                            architecture defects, the 3
                            enhancement entries from today)
~/code/dr-repo              dr-repo's backlog (M005 flow
                            failures, agent friction, etc.)
~/code/centralcloud/*       whatever each subproject accrues
Both --maintain and `headless autonomous` use process.cwd() so they
target the right repo automatically. Interactive mode (plain `sf`)
deliberately does NOT auto-triage — that would spawn subagents while
the user is working in the same session, risking lock contention.
Triage failures stay non-fatal: catalog/quota/coverage work still
completes even if triage subagent dispatch hits the prompt-never-sent
bug.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Before this change, `sf headless autonomous` only dispatched units for
the active milestone — never touched .sf/self-feedback.jsonl. The
existing `sf headless triage --apply` was a manual operator path
required for self-feedback to become actionable work. That defeats the
"SF self-heals" thesis: 146 entries can sit in the queue indefinitely
while the autonomous loop happily cranks on M005.
Now: at autonomous startup (not on resume, not on initial bootstrap)
SF calls handleTriage({ apply: true, max: 5 }) to drain the top-5
candidates from the triage queue before entering the dispatch loop.
The max=5 cap keeps the upfront cost bounded; remaining items
drain on the next session_start.
The comment on the existing triage handler in headless.ts:917-921
explicitly acknowledged the gap — autonomous-loop followUp delivery
was broken (sf-mp4rxkwb-l4baga). Wiring the deterministic triage
path BEFORE the dispatch loop closes that gap.
Opt-out: pass --skip-triage on the autonomous command (e.g. when
debugging a specific milestone without backlog churn).
Triage failures are non-fatal — they log a warning and the
autonomous loop continues with its existing milestone dispatch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bias dispatch toward under-used subscriptions ("spend the subs") and
de-prioritize near-exhausted ones (avoid 429 walls). Multiplier is
applied to the benchmark score before sort, so it only re-orders
within the existing score → cost → coverage → preference ladder.
Unknown quota state stays neutral 1.0 — never punish a provider for
having no public quota API.
Curve, keyed on max(usedFraction) across all windows:
< 0.20 → 1.15 (boost — lots of headroom, prefer to use it)
< 0.50 → 1.00 (neutral)
< 0.70 → 0.92 (slight steer away)
< 0.90 → 0.50 (strong de-prioritize)
< 0.95 → 0.20 (near-exhaustion)
≥ 0.95 → 0.05 (effectively skip)
Max-across-windows means kimi-coding's 5h-rolling window (tighter)
binds the decision even when the weekly is fresh.
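The curve as a pure function over max(usedFraction) (a sketch; the exported helper also resolves quota state from a provider key):

```javascript
// Maps the worst (highest) usedFraction across a provider's quota
// windows to a score multiplier. Unknown quota state stays neutral.
function quotaHeadroomMultiplier(maxUsedFraction) {
  if (maxUsedFraction === undefined || maxUsedFraction === null) return 1.0;
  if (maxUsedFraction < 0.2) return 1.15; // lots of headroom: prefer
  if (maxUsedFraction < 0.5) return 1.0;  // neutral
  if (maxUsedFraction < 0.7) return 0.92; // slight steer away
  if (maxUsedFraction < 0.9) return 0.5;  // strong de-prioritize
  if (maxUsedFraction < 0.95) return 0.2; // near exhaustion
  return 0.05;                            // effectively skip
}
```

Because the multiplier is applied to the benchmark score before sorting, it re-orders candidates without overriding the cost/coverage/preference ladder.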
New exported helper quotaHeadroomMultiplier(providerKey, getQuotaState?)
takes the resolver as optional dep for testability; defaults to
getProviderQuotaState from provider-quota-cache.js.
16 new tests cover the curve and the selectByBenchmarks integration
(unknown quota → unchanged, demoted high-usage provider, boosted
under-used provider, near-exhausted skipped when alternatives exist).
Previously filed as SF backlog item sf-mpmp8ie6xf-z4cxhg; this
commit closes that loop.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cross-referenced the vbgate/opencode-mystatus reference implementation
and found two real bugs in the zai fetcher, plus one hardening tweak:
1. Auth header: zai's monitor endpoint expects `Authorization: <key>`
with NO `Bearer ` prefix. Using Bearer caused the server to treat
the call as unauthenticated and return the generic "no coding
plan" response even for active coding-plan users.
2. Response shape: real envelope is
{ code, msg, success, data: { limits: [
{ type: "TOKENS_LIMIT"|"TIME_LIMIT", usage, currentValue,
percentage, nextResetTime? } ] } }
Was looking for `data: [...]` directly and using `limit`/`used`
fields. Now parses `data.data.limits[].usage` / `.currentValue`.
3. Added User-Agent header to match the reference tool.
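A sketch of the corrected parsing and auth handling (field names come from the commit text; everything else, including the probe's User-Agent value, is illustrative):

```javascript
// Parse zai's { code, msg, success, data: { limits: [...] } } envelope.
// On success:false, surface the vendor msg instead of silently
// emitting zero windows.
function parseZaiQuota(body) {
  if (!body.success) {
    return { error: body.msg ?? "zai monitor call failed", windows: [] };
  }
  const limits = body.data?.limits ?? [];
  return {
    windows: limits.map((l) => ({
      kind: l.type, // "TOKENS_LIMIT" | "TIME_LIMIT"
      usedFraction: l.percentage != null ? l.percentage / 100 : undefined,
      resetsAt: l.nextResetTime,
    })),
  };
}

async function fetchZaiQuota(apiKey) {
  const res = await fetch("https://api.z.ai/api/monitor/usage/quota/limit", {
    // NOTE: raw key, no "Bearer " prefix; Bearer makes the server treat
    // the call as unauthenticated.
    headers: { Authorization: apiKey, "User-Agent": "sf-quota-probe" },
  });
  return parseZaiQuota(await res.json());
}
```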
Live probe finding: this user's z.ai key works fine for inference
(/api/coding/paas/v4/models returns 200 with the full model list)
but the monitor endpoint reports "no coding plan" — meaning their
account uses the regular pay-as-you-go z.ai/zhipu tier, not the
separately-billed "Coding Plan" subscription that the monitor
endpoint serves. The 429s they observe during inference are
rate-limit RPM/TPM errors, not coding-plan window exhaustion.
The code change is correct either way; the error message is now
accurate and actionable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Dogfooded `sf headless usage` against live APIs and discovered three
shape mismatches in the phase-1 fetchers:
- kimi-coding returns numeric fields as STRINGS ("limit": "100") and
uses camelCase `resetTime`. Added toNum() coercion + reset hint
extraction. Now reports Weekly + 5h rolling windows correctly.
- minimax response is `{ model_remains: [{ model_name,
current_interval_total_count, current_interval_usage_count,
current_weekly_total_count, current_weekly_usage_count, end_time,
weekly_end_time, ...}] }` — per-model rolling + weekly windows, not
the flat `remaining_tokens`/`total_tokens` shape I had assumed.
Rewrote parser to emit one window per model entry.
- zai uses a `{ code, msg, success, data }` envelope. When
`success: false` (e.g. user lacks an active coding plan), parser
now surfaces vendor msg as the entry error instead of silently
emitting no windows.
Tests updated to mirror real shapes; added one for zai's failure
envelope. 12 tests pass (was 11).
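The string-to-number coercion for kimi-coding can be as simple as this (illustrative, not the shipped toNum()):

```javascript
// kimi-coding returns numeric fields as strings ("limit": "100").
// Coerce both shapes; return undefined for anything non-numeric so
// callers can distinguish "missing" from 0.
function toNum(value) {
  if (typeof value === "number") return Number.isFinite(value) ? value : undefined;
  if (typeof value === "string" && value.trim() !== "") {
    const n = Number(value);
    return Number.isFinite(n) ? n : undefined;
  }
  return undefined;
}
```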
Live result from re-running `sf headless usage`:
- openrouter: 80.7% used, $7.71 remaining (real signal — watch this)
- kimi-coding: Weekly 32%, 5h 4%
- minimax: MiniMax-M* 5h 1.4% + coding-plan-vlm/search 1.4%
- gemini-cli: 0.0-0.4% across all models (clean)
- zai: surfaces "user does not have a coding plan" — may need a
different endpoint or scope depending on the user's account setup.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase-1 work shipped together since prior auto-snapshots split it across
several commits. This commit captures the leftover type declarations,
the new provider-quota-cache test suite, and the last register-hooks /
cli wiring.
Highlights now in tree:
- Model catalog moved from per-project to global `~/.sf/model-catalog/`
via `sfHome()` (one cache shared by all repos; no more 9-dir
duplication).
- `benchmark-coverage.js` audits the dispatchable model set against
`learning/data/model-benchmarks.json` at session_start, writes
`~/.sf/benchmark-coverage.json`, notifies on change.
- `provider-quota-cache.js` introduces phase-1 subscription quota
visibility for the 5 providers with documented APIs:
kimi-coding (/coding/v1/usages), openrouter (/api/v1/credits),
minimax (/v1/token_plan/remains), zai (/api/monitor/usage/quota/limit),
google-gemini-cli (existing snapshotGeminiCliAccount). 15-min TTL,
global cache.
- `sf --maintain` CLI flag refreshes catalogs + quotas + coverage audit
in one idempotent pass. Daemon spawns it every 6h.
- `sf headless usage` rewritten to display all providers from the
unified cache, with explicit "no public API" notes for mistral,
ollama-cloud, opencode, opencode-go, xiaomi.
- Awaitable `runXIfStale` variants for model-catalog, gemini-catalog,
openai-codex-catalog (the schedule* variants now wrap them in
setImmediate).
- TypeScript declarations added for the new JS modules so the
dist-redirect pipeline type-checks cleanly.
Phase 2 (quota-aware routing in benchmark-selector) is filed as SF
self-feedback for the backlog.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add resolveRpcInitTimeoutMs() helper and wire it into RpcClient.init().
Default init timeout increased from 30s to 120s. Override via env var.
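A sketch of what such a helper can look like (the env var name SF_RPC_INIT_TIMEOUT_MS is an assumption; the commit does not name it):

```javascript
// Resolve the RPC init timeout: a positive numeric env override wins,
// anything else falls back to the 120s default.
function resolveRpcInitTimeoutMs(env = process.env) {
  const raw = env.SF_RPC_INIT_TIMEOUT_MS;
  const n = raw === undefined ? NaN : Number(raw);
  return Number.isFinite(n) && n > 0 ? n : 120_000;
}
```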
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Require SF_HEADLESS_ALLOW_V1_FALLBACK=1 to use legacy v1 fallback.
Default behavior now exits with an error when v2 init fails, preventing
silent degradation to the less reliable v1 protocol.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
M004 S01: Update manifests to support knowledge and graph artifacts.
Adds computed: ["knowledge", "graph"] to manifests that did not yet
declare them, matching the actual behavior of their prompt builders:
- execute-task, reactive-execute
- discuss-project, discuss-requirements, research-project
- workflow-preferences (knowledge only — no graph scope)
These unit types already inline knowledge/graph via their builder
functions in auto-prompts.js; the manifest declarations were missing.
This brings the manifest schema into sync with real dispatch behavior.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>